This guide explains how to use the SessionDataSet iterator pattern for efficiently querying large datasets from Apache IoTDB.
The SessionDataSet provides an iterator-based approach to reading query results, similar to a JDBC ResultSet or a database cursor. This offers several advantages over loading all data into memory at once:

- Memory usage stays bounded regardless of result size, since rows are fetched in batches
- The first rows are available as soon as the first batch arrives
- Server-side resources are released explicitly when you close the dataset
```typescript
import { Session } from '@iotdb/client';

const session = new Session({
  host: 'localhost',
  port: 6667,
  username: 'root',
  password: 'root',
});
await session.open();

// Execute query and get SessionDataSet
const dataSet = await session.executeQueryStatement('SELECT * FROM root.test.d1');

// Iterate through results
while (await dataSet.hasNext()) {
  const row = dataSet.next();

  // Access timestamp
  const timestamp = row.getTimestamp();

  // Access values by column name
  const temperature = row.getFloat('temperature');
  const humidity = row.getDouble('humidity');
  const status = row.getString('status');

  console.log(`${timestamp}: temp=${temperature}, humidity=${humidity}, status=${status}`);
}

// Always close the dataset when done
await dataSet.close();
await session.close();
```
`hasNext(): Promise<boolean>`

Checks whether more rows are available. Calling it may trigger fetching the next batch from the server.
```typescript
while (await dataSet.hasNext()) {
  // Process next row
}
```
`next(): RowRecord`

Returns the next row. Call `hasNext()` first to ensure a row is available.
```typescript
if (await dataSet.hasNext()) {
  const row = dataSet.next();
}
```
`close(): Promise<void>`

Closes the dataset and releases server-side resources. Always call this when done.
```typescript
await dataSet.close();
```
`getColumnNames(): string[]`

Returns an array of column names.
```typescript
const columns = dataSet.getColumnNames();
console.log('Columns:', columns); // ['temperature', 'humidity', 'status']
```
`getColumnTypes(): string[]`

Returns an array of column data types.
```typescript
const types = dataSet.getColumnTypes();
console.log('Types:', types); // ['FLOAT', 'DOUBLE', 'TEXT']
```
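Since the type names come back as plain strings, one way to pick the matching typed getter is a small lookup table. A sketch assuming IoTDB's standard data type names; the mapping itself is illustrative, not part of the client API:

```typescript
// Illustrative mapping (assumption) from IoTDB data type names to the
// corresponding typed RowRecord getter names described in this guide.
const getterForType: Record<string, string> = {
  BOOLEAN: 'getBoolean',
  INT32: 'getInt',
  INT64: 'getLong',
  FLOAT: 'getFloat',
  DOUBLE: 'getDouble',
  TEXT: 'getString',
};
```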
`findColumn(columnName: string): number`

Returns the zero-based index of a column by name.
```typescript
const tempIndex = dataSet.findColumn('temperature'); // Returns 0
```
`toArray(): Promise<any[][]>` (Deprecated)

Loads all remaining rows into memory as an array. Use only for small result sets.
```typescript
// ⚠️ Not recommended for large datasets
const allRows = await dataSet.toArray();
```
Represents a single row of data from the query result.
```typescript
const timestamp = row.getTimestamp(); // Returns number (milliseconds)
```
```typescript
// Typed getters
const stringValue = row.getString('name');
const intValue = row.getInt('count');
const longValue = row.getLong('id');
const floatValue = row.getFloat('temperature');
const doubleValue = row.getDouble('humidity');
const boolValue = row.getBoolean('status');

// Generic getter (returns any)
const value = row.getValue('columnName');
```
```typescript
const stringValue = row.getStringByIndex(0);
const intValue = row.getIntByIndex(1);
const floatValue = row.getFloatByIndex(2);
const value = row.getValueByIndex(3);
```
```typescript
if (row.isNull('optionalColumn')) {
  console.log('Column is null');
} else {
  const value = row.getString('optionalColumn');
}

// By index
if (row.isNullByIndex(0)) {
  console.log('First column is null');
}
```
```typescript
const fields = row.getFields(); // Returns array of field values
const array = row.toArray();    // Returns [timestamp, ...fields]
```
Control how many rows are fetched in each batch:
```typescript
const session = new Session({
  host: 'localhost',
  port: 6667,
  fetchSize: 1024, // Fetch 1024 rows at a time
});

const dataSet = await session.executeQueryStatement('SELECT * FROM root.test.d1');
// Rows are automatically fetched in batches of 1024
```
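As a rough mental model (an assumption, not a measured figure), each batch costs one server round trip, so the number of fetches is about the total row count divided by the fetch size, rounded up:

```typescript
// Rough model (assumption): one fetch call per batch of fetchSize rows.
// Larger fetchSize means fewer round trips but more client memory per batch.
function roundTrips(totalRows: number, fetchSize: number): number {
  return Math.ceil(totalRows / fetchSize);
}

// e.g. 1,000,000 rows with fetchSize 1024 -> 977 fetches
console.log(roundTrips(1_000_000, 1024));
```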
For very large result sets, use the iterator pattern to avoid memory issues:
```typescript
const dataSet = await session.executeQueryStatement('SELECT * FROM root.large_dataset');

let count = 0;
let sum = 0;

while (await dataSet.hasNext()) {
  const row = dataSet.next();
  sum += row.getDouble('value');
  count++;

  // Log progress every 10000 rows
  if (count % 10000 === 0) {
    console.log(`Processed ${count} rows...`);
  }
}

console.log(`Total rows: ${count}, Average: ${sum / count}`);
await dataSet.close();
```
Always use try-finally to ensure cleanup:
```typescript
let dataSet;
try {
  dataSet = await session.executeQueryStatement('SELECT * FROM root.test.d1');
  while (await dataSet.hasNext()) {
    const row = dataSet.next();
    // Process row
  }
} catch (error) {
  console.error('Query error:', error);
  throw error;
} finally {
  if (dataSet) {
    await dataSet.close();
  }
}
```
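The try/finally pattern above can be packaged once so every consumer gets cleanup for free. A minimal sketch that wraps any cursor with the `hasNext()`/`next()`/`close()` shape into a native async iterable; the `Cursor` interface and `iterate` helper are illustrative names, not part of the client library:

```typescript
// Minimal shape shared by SessionDataSet-style cursors (assumption:
// matches the methods described in this guide).
interface Cursor<T> {
  hasNext(): Promise<boolean>;
  next(): T;
  close(): Promise<void>;
}

// Adapt a cursor to `for await...of`, guaranteeing close() on exit.
async function* iterate<T>(cursor: Cursor<T>): AsyncGenerator<T> {
  try {
    while (await cursor.hasNext()) {
      yield cursor.next();
    }
  } finally {
    // Runs even if the consumer breaks out of the loop or an error is thrown
    await cursor.close();
  }
}
```

Consumers can then write `for await (const row of iterate(dataSet)) { ... }` without remembering to call `close()` themselves.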
```typescript
const dataSet = await session.executeQueryStatement(`
  SELECT temperature, humidity, pressure, status
  FROM root.weather.station1
  WHERE time > now() - 1h
`);

while (await dataSet.hasNext()) {
  const row = dataSet.next();

  const reading = {
    timestamp: new Date(row.getTimestamp()),
    temperature: row.getFloat('temperature'),
    humidity: row.getFloat('humidity'),
    pressure: row.getFloat('pressure'),
    status: row.getString('status'),
  };

  // Process reading
  if (!row.isNull('status') && reading.temperature > 30) {
    console.log('High temperature alert:', reading);
  }
}

await dataSet.close();
```
```typescript
const dataSet = await session.executeQueryStatement(`
  SELECT AVG(temperature), MAX(temperature), MIN(temperature)
  FROM root.test.d1
  GROUP BY ([2024-01-01, 2024-02-01), 1d)
`);

while (await dataSet.hasNext()) {
  const row = dataSet.next();
  console.log({
    timestamp: new Date(row.getTimestamp()),
    avg: row.getDouble('AVG(temperature)'),
    max: row.getDouble('MAX(temperature)'),
    min: row.getDouble('MIN(temperature)'),
  });
}

await dataSet.close();
```
The old pattern loaded all data into memory at once. The new pattern uses lazy loading with iterators.
```typescript
// ❌ This no longer works - executeQueryStatement now returns SessionDataSet
// const result = await session.executeQueryStatement('SELECT * FROM root.test.d1');
// for (const row of result.rows) {
//   const timestamp = row[0];
//   const value1 = row[1];
//   const value2 = row[2];
//   console.log(timestamp, value1, value2);
// }
```
```typescript
// ✅ New way - lazy loading with iterator
const dataSet = await session.executeQueryStatement('SELECT * FROM root.test.d1');

while (await dataSet.hasNext()) {
  const row = dataSet.next();
  const timestamp = row.getTimestamp();
  const value1 = row.getValueByIndex(0);
  const value2 = row.getValueByIndex(1);
  console.log(timestamp, value1, value2);
}

await dataSet.close();
```
Key changes:

- `executeQueryStatement()` now returns a `SessionDataSet` (not a `QueryResult`)
- `result.rows` no longer exists; iterate with `hasNext()`/`next()` instead
- Read values through `RowRecord` methods instead of array indices
- Always release server-side resources with `await dataSet.close()`

If you need all data at once for a small result set, use `toArray()`:
```typescript
const dataSet = await session.executeQueryStatement('SELECT * FROM root.test.d1');
const allRows = await dataSet.toArray(); // Loads everything into memory
// allRows is [[timestamp, value1, value2], ...]
```
⚠️ Warning: toArray() loads all data into memory. Only use for small result sets.
- Always ensure `dataSet.close()` is called
- Call `hasNext()` before calling `next()`
- Check `isNull()` before accessing nullable columns
- Prefer the typed getters (`getInt`, `getString`, etc.) for type safety
- Avoid `toArray()` for large datasets

Fetch Size: a larger fetch size means fewer network calls but more memory used
Batch Processing: Process rows in batches for better performance
```typescript
const dataSet = await session.executeQueryStatement('SELECT * FROM root.test.d1');

const batch = [];
while (await dataSet.hasNext()) {
  batch.push(dataSet.next());

  if (batch.length >= 1000) {
    await processBatch(batch);
    batch.length = 0;
  }
}

// Process any remaining rows
if (batch.length > 0) {
  await processBatch(batch);
}

await dataSet.close();
```
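The batching logic above can be factored into a reusable helper. A sketch written generically over any async iterable, so it is independent of the client API (the `inBatches` name is illustrative):

```typescript
// Group items from any async iterable into arrays of at most `size`
// elements, yielding the final partial batch if one remains.
async function* inBatches<T>(
  source: AsyncIterable<T>,
  size: number,
): AsyncGenerator<T[]> {
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length >= size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch;
}
```

Combined with an async-iterable view of the dataset, the processing loop reduces to `for await (const batch of inBatches(rows, 1000)) { await processBatch(batch); }`.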
```typescript
// Faster: access by index
const valueByIndex = row.getIntByIndex(0);

// Slightly slower: name lookup on every call
const valueByName = row.getInt('temperature');
```
```typescript
// ❌ Wrong - calling next() without checking hasNext()
const row = dataSet.next(); // May throw error

// ✅ Correct
if (await dataSet.hasNext()) {
  const row = dataSet.next();
}
```
```typescript
// Make sure column names match exactly (case-sensitive)
const columns = dataSet.getColumnNames();
console.log('Available columns:', columns);

// Use the correct column name
const value = row.getString('temperature'); // Not 'Temperature' or 'temp'
```
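A small diagnostic helper can make case mismatches obvious when a lookup fails. Illustrative only; `checkColumn` is not part of the client API:

```typescript
// Given the result of getColumnNames(), report whether a requested
// column exists, and flag near misses that differ only in case.
function checkColumn(columns: string[], name: string): string {
  if (columns.includes(name)) return 'ok';
  const near = columns.find((c) => c.toLowerCase() === name.toLowerCase());
  return near ? `did you mean '${near}'?` : 'not found';
}
```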
```typescript
// ❌ Don't do this with large datasets
const allRows = await dataSet.toArray(); // Loads everything into memory

// ✅ Process iteratively instead
while (await dataSet.hasNext()) {
  const row = dataSet.next();
  // Process and discard each row
}
```