SessionDataSet Iterator Pattern Guide

This guide explains how to use the SessionDataSet iterator pattern for efficiently querying large datasets from Apache IoTDB.

Overview

The SessionDataSet provides an iterator-based approach to reading query results, similar to JDBC ResultSet or database cursors. This approach offers several advantages over loading all data into memory at once:

  • Memory Efficient: Only keeps the current batch in memory
  • Lazy Loading: Fetches data on-demand as you iterate
  • Large Datasets: Can handle result sets larger than available RAM
  • Resource Management: Proper cleanup of server-side resources
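If your runtime supports async generators, the hasNext()/next() pair can be wrapped so results are consumed with `for await` and cleanup happens automatically. This is a sketch, not part of the client API; the interfaces below are minimal stand-ins for illustration, not the real '@iotdb/client' types.

```typescript
// Sketch: wrap hasNext()/next() in an async generator so callers can use
// `for await`. The interfaces are minimal stand-ins, not the real client types.
interface RowRecordLike {
  getTimestamp(): number;
}

interface SessionDataSetLike {
  hasNext(): Promise<boolean>;
  next(): RowRecordLike;
  close(): Promise<void>;
}

async function* rows(dataSet: SessionDataSetLike): AsyncGenerator<RowRecordLike> {
  try {
    while (await dataSet.hasNext()) {
      yield dataSet.next();
    }
  } finally {
    // Runs even if the consumer breaks out of the loop early,
    // so server-side resources are always released.
    await dataSet.close();
  }
}
```

With a wrapper like this, the iteration loop becomes `for await (const row of rows(dataSet)) { ... }` and close() no longer needs to be called by hand.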

Basic Usage

Simple Iteration

import { Session } from '@iotdb/client';

const session = new Session({
  host: 'localhost',
  port: 6667,
  username: 'root',
  password: 'root',
});

await session.open();

// Execute query and get SessionDataSet
const dataSet = await session.executeQueryStatement('SELECT * FROM root.test.d1');

// Iterate through results
while (await dataSet.hasNext()) {
  const row = dataSet.next();
  
  // Access timestamp
  const timestamp = row.getTimestamp();
  
  // Access values by column name
  const temperature = row.getFloat('temperature');
  const humidity = row.getDouble('humidity');
  const status = row.getString('status');
  
  console.log(`${timestamp}: temp=${temperature}, humidity=${humidity}, status=${status}`);
}

// Always close the dataset when done
await dataSet.close();

await session.close();

API Reference

SessionDataSet

Methods

hasNext(): Promise<boolean>

Checks if there are more rows available. This may trigger fetching the next batch from the server.

while (await dataSet.hasNext()) {
  // Process next row
}

next(): RowRecord

Returns the next row. Must call hasNext() first to ensure a row is available.

if (await dataSet.hasNext()) {
  const row = dataSet.next();
}

close(): Promise<void>

Closes the dataset and releases server-side resources. Always call this when done.

await dataSet.close();

getColumnNames(): string[]

Returns an array of column names.

const columns = dataSet.getColumnNames();
console.log('Columns:', columns); // ['temperature', 'humidity', 'status']

getColumnTypes(): string[]

Returns an array of column data types.

const types = dataSet.getColumnTypes();
console.log('Types:', types); // ['FLOAT', 'DOUBLE', 'TEXT']

findColumn(columnName: string): number

Returns the zero-based index of a column by name.

const tempIndex = dataSet.findColumn('temperature'); // Returns 0
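When the same column is read on every row of a large result, findColumn() can be called once before the loop so the hot path uses the faster index-based getters. A sketch using minimal stand-in interfaces (not the real client types):

```typescript
// Sketch: resolve the column index once, then use index-based access in the
// hot loop. Interfaces are minimal stand-ins for illustration.
interface RowLike {
  getFloatByIndex(index: number): number;
}

interface DataSetLike {
  findColumn(columnName: string): number;
  hasNext(): Promise<boolean>;
  next(): RowLike;
}

async function averageColumn(dataSet: DataSetLike, column: string): Promise<number> {
  const idx = dataSet.findColumn(column); // one name lookup instead of one per row
  let sum = 0;
  let count = 0;
  while (await dataSet.hasNext()) {
    sum += dataSet.next().getFloatByIndex(idx);
    count++;
  }
  return count === 0 ? NaN : sum / count;
}
```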

toArray(): Promise<any[][]> (Deprecated)

Loads all remaining rows into memory as an array. Use only for small result sets.

// ⚠️ Not recommended for large datasets
const allRows = await dataSet.toArray();

RowRecord

Represents a single row of data from the query result.

Methods

Timestamp Access
const timestamp = row.getTimestamp(); // Returns number (milliseconds)

Access by Column Name
// Typed getters
const stringValue = row.getString('name');
const intValue = row.getInt('count');
const longValue = row.getLong('id');
const floatValue = row.getFloat('temperature');
const doubleValue = row.getDouble('humidity');
const boolValue = row.getBoolean('status');

// Generic getter (returns any)
const value = row.getValue('columnName');

Access by Column Index
const stringValue = row.getStringByIndex(0);
const intValue = row.getIntByIndex(1);
const floatValue = row.getFloatByIndex(2);
const value = row.getValueByIndex(3);

Null Checking
if (row.isNull('optionalColumn')) {
  console.log('Column is null');
} else {
  const value = row.getString('optionalColumn');
}

// By index
if (row.isNullByIndex(0)) {
  console.log('First column is null');
}

Get All Data
const fields = row.getFields(); // Returns array of field values
const array = row.toArray(); // Returns [timestamp, ...fields]

Advanced Usage

Configuring Fetch Size

Control how many rows are fetched in each batch:

const session = new Session({
  host: 'localhost',
  port: 6667,
  fetchSize: 1024, // Fetch 1024 rows at a time
});

const dataSet = await session.executeQueryStatement('SELECT * FROM root.test.d1');
// Will automatically fetch in batches of 1024 rows

Processing Large Result Sets

For very large result sets, use the iterator pattern to avoid memory issues:

const dataSet = await session.executeQueryStatement('SELECT * FROM root.large_dataset');

let count = 0;
let sum = 0;

while (await dataSet.hasNext()) {
  const row = dataSet.next();
  sum += row.getDouble('value');
  count++;
  
  // Log progress every 10000 rows
  if (count % 10000 === 0) {
    console.log(`Processed ${count} rows...`);
  }
}

console.log(`Total rows: ${count}, Average: ${sum / count}`);

await dataSet.close();

Error Handling

Always use try-finally to ensure cleanup:

let dataSet;
try {
  dataSet = await session.executeQueryStatement('SELECT * FROM root.test.d1');
  
  while (await dataSet.hasNext()) {
    const row = dataSet.next();
    // Process row
  }
} catch (error) {
  console.error('Query error:', error);
  throw error;
} finally {
  if (dataSet) {
    await dataSet.close();
  }
}

Working with Multiple Columns

const dataSet = await session.executeQueryStatement(`
  SELECT temperature, humidity, pressure, status
  FROM root.weather.station1
  WHERE time > now() - 1h
`);

while (await dataSet.hasNext()) {
  const row = dataSet.next();
  
  const reading = {
    timestamp: new Date(row.getTimestamp()),
    temperature: row.getFloat('temperature'),
    humidity: row.getFloat('humidity'),
    pressure: row.getFloat('pressure'),
    status: row.getString('status'),
  };
  
  // Process reading
  if (!row.isNull('status') && reading.temperature > 30) {
    console.log('High temperature alert:', reading);
  }
}

await dataSet.close();

Aggregation Queries

const dataSet = await session.executeQueryStatement(`
  SELECT AVG(temperature), MAX(temperature), MIN(temperature)
  FROM root.test.d1
  GROUP BY ([2024-01-01, 2024-02-01), 1d)
`);

while (await dataSet.hasNext()) {
  const row = dataSet.next();
  
  console.log({
    timestamp: new Date(row.getTimestamp()),
    avg: row.getDouble('AVG(temperature)'),
    max: row.getDouble('MAX(temperature)'),
    min: row.getDouble('MIN(temperature)'),
  });
}

await dataSet.close();

Migration from Old Pattern

The old pattern loaded all data into memory at once. The new pattern uses lazy loading with iterators.

Old Approach (No Longer Supported)

// ❌ This no longer works - executeQueryStatement now returns SessionDataSet
// const result = await session.executeQueryStatement('SELECT * FROM root.test.d1');
// for (const row of result.rows) {
//   const timestamp = row[0];
//   const value1 = row[1];
//   const value2 = row[2];
//   console.log(timestamp, value1, value2);
// }

New Approach (Current Implementation)

// ✅ New way - lazy loading with iterator
const dataSet = await session.executeQueryStatement('SELECT * FROM root.test.d1');

while (await dataSet.hasNext()) {
  const row = dataSet.next();
  
  const timestamp = row.getTimestamp();
  const value1 = row.getValueByIndex(0);
  const value2 = row.getValueByIndex(1);
  
  console.log(timestamp, value1, value2);
}

await dataSet.close();

Migration Steps

  1. Change method call: executeQueryStatement() now returns SessionDataSet (not QueryResult)
  2. Replace array access: Use iterator pattern instead of result.rows
  3. Update row access: Use RowRecord methods instead of array indices
  4. Add cleanup: Always call await dataSet.close()

For Small Datasets

If you need all data at once for small result sets, use toArray():

const dataSet = await session.executeQueryStatement('SELECT * FROM root.test.d1');
const allRows = await dataSet.toArray(); // Loads everything into memory
// allRows is [[timestamp, value1, value2], ...]

⚠️ Warning: toArray() loads all data into memory. Only use for small result sets.

Best Practices

  1. Always Close: Use try-finally to ensure dataSet.close() is called
  2. Check hasNext(): Always call hasNext() before calling next()
  3. Use Fetch Size: Set appropriate fetch size based on memory and network
  4. Handle Nulls: Check isNull() before accessing nullable columns
  5. Typed Access: Use typed getters (getInt, getString) for type safety
  6. Avoid toArray(): Don't use toArray() for large datasets
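Several of these practices combine naturally into one pattern. A sketch using minimal stand-in interfaces (not the real client types):

```typescript
// Sketch: try/finally cleanup, hasNext() before next(), and isNull() checks
// in one place. Interfaces are minimal stand-ins for illustration.
interface RowLike {
  isNull(column: string): boolean;
  getDouble(column: string): number;
}

interface DataSetLike {
  hasNext(): Promise<boolean>;
  next(): RowLike;
  close(): Promise<void>;
}

async function collectNonNull(dataSet: DataSetLike, column: string): Promise<number[]> {
  const values: number[] = [];
  try {
    while (await dataSet.hasNext()) {
      const row = dataSet.next();
      if (!row.isNull(column)) {
        values.push(row.getDouble(column));
      }
    }
  } finally {
    await dataSet.close(); // release server-side resources no matter what
  }
  return values;
}
```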

Performance Tips

  1. Fetch Size: A larger fetch size means fewer network round trips but more memory use

    • Small datasets: 100-1000
    • Large datasets: 1000-10000
    • Very large: 10000-50000
  2. Batch Processing: Process rows in batches for better performance

const dataSet = await session.executeQueryStatement('SELECT * FROM root.test.d1');
const batch = [];

while (await dataSet.hasNext()) {
  batch.push(dataSet.next());
  
  if (batch.length >= 1000) {
    await processBatch(batch);
    batch.length = 0;
  }
}

if (batch.length > 0) {
  await processBatch(batch);
}

await dataSet.close();

  3. Column Access: Access by index is slightly faster than by name

// Faster
const value = row.getIntByIndex(0);

// Slightly slower (name lookup)
const value = row.getInt('temperature');

Troubleshooting

“No more rows available” Error

// ❌ Wrong - calling next() without checking hasNext()
const row = dataSet.next(); // May throw error

// ✅ Correct
if (await dataSet.hasNext()) {
  const row = dataSet.next();
}

“Column not found” Error

// Make sure column names match exactly (case-sensitive)
const columns = dataSet.getColumnNames();
console.log('Available columns:', columns);

// Use correct column name
const value = row.getString('temperature'); // Not 'Temperature' or 'temp'
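A small guard can turn a silent name mismatch into a clear error by consulting getColumnNames() up front. This is a sketch; `requireColumn` is a hypothetical helper, not part of the client API.

```typescript
// Sketch: fail fast with a helpful message when a column name is misspelled.
// `requireColumn` is a hypothetical helper, not part of '@iotdb/client'.
interface DataSetLike {
  getColumnNames(): string[];
}

function requireColumn(dataSet: DataSetLike, name: string): string {
  const columns = dataSet.getColumnNames();
  if (!columns.includes(name)) {
    throw new Error(
      `Column "${name}" not found (names are case-sensitive). Available: ${columns.join(', ')}`
    );
  }
  return name;
}
```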

Memory Issues

// ❌ Don't do this with large datasets
const allRows = await dataSet.toArray(); // Loads everything into memory

// ✅ Process iteratively instead
while (await dataSet.hasNext()) {
  const row = dataSet.next();
  // Process and discard each row
}

See Also