blob: ce1e82680596602c1abb37e2ff76bc533c832be4 [file] [view]
# SessionDataSet Iterator Pattern Guide
This guide explains how to use the SessionDataSet iterator pattern for efficiently querying large datasets from Apache IoTDB.
## Overview
The SessionDataSet provides an iterator-based approach to reading query results, similar to JDBC ResultSet or database cursors. This approach offers several advantages over loading all data into memory at once:
- **Memory Efficient**: Only keeps the current batch in memory
- **Lazy Loading**: Fetches data on-demand as you iterate
- **Large Datasets**: Can handle result sets larger than available RAM
- **Resource Management**: Proper cleanup of server-side resources
## Basic Usage
### Simple Iteration
```typescript
import { Session } from '@iotdb/client';
const session = new Session({
host: 'localhost',
port: 6667,
username: 'root',
password: 'root',
});
await session.open();
// Execute query and get SessionDataSet
const dataSet = await session.executeQueryStatement('SELECT * FROM root.test.d1');
// Iterate through results
while (await dataSet.hasNext()) {
const row = dataSet.next();
// Access timestamp
const timestamp = row.getTimestamp();
// Access values by column name
const temperature = row.getFloat('temperature');
const humidity = row.getDouble('humidity');
const status = row.getString('status');
console.log(`${timestamp}: temp=${temperature}, humidity=${humidity}, status=${status}`);
}
// Always close the dataset when done
await dataSet.close();
await session.close();
```
## API Reference
### SessionDataSet
#### Methods
##### `hasNext(): Promise<boolean>`
Checks if there are more rows available. This may trigger fetching the next batch from the server.
```typescript
while (await dataSet.hasNext()) {
// Process next row
}
```
##### `next(): RowRecord`
Returns the next row. Must call `hasNext()` first to ensure a row is available.
```typescript
if (await dataSet.hasNext()) {
const row = dataSet.next();
}
```
##### `close(): Promise<void>`
Closes the dataset and releases server-side resources. Always call this when done.
```typescript
await dataSet.close();
```
##### `getColumnNames(): string[]`
Returns array of column names.
```typescript
const columns = dataSet.getColumnNames();
console.log('Columns:', columns); // ['temperature', 'humidity', 'status']
```
##### `getColumnTypes(): string[]`
Returns array of column data types.
```typescript
const types = dataSet.getColumnTypes();
console.log('Types:', types); // ['FLOAT', 'DOUBLE', 'TEXT']
```
##### `findColumn(columnName: string): number`
Returns the zero-based index of a column by name.
```typescript
const tempIndex = dataSet.findColumn('temperature'); // Returns 0
```
##### `toArray(): Promise<any[][]>` (Deprecated)
Loads all remaining rows into memory as an array. Use only for small result sets.
```typescript
// ⚠️ Not recommended for large datasets
const allRows = await dataSet.toArray();
```
### RowRecord
Represents a single row of data from the query result.
#### Methods
##### Timestamp Access
```typescript
const timestamp = row.getTimestamp(); // Returns number (milliseconds)
```
##### Access by Column Name
```typescript
// Typed getters
const stringValue = row.getString('name');
const intValue = row.getInt('count');
const longValue = row.getLong('id');
const floatValue = row.getFloat('temperature');
const doubleValue = row.getDouble('humidity');
const boolValue = row.getBoolean('status');
// Generic getter (returns any)
const value = row.getValue('columnName');
```
##### Access by Column Index
```typescript
const stringValue = row.getStringByIndex(0);
const intValue = row.getIntByIndex(1);
const floatValue = row.getFloatByIndex(2);
const value = row.getValueByIndex(3);
```
##### Null Checking
```typescript
if (row.isNull('optionalColumn')) {
console.log('Column is null');
} else {
const value = row.getString('optionalColumn');
}
// By index
if (row.isNullByIndex(0)) {
console.log('First column is null');
}
```
##### Get All Data
```typescript
const fields = row.getFields(); // Returns array of field values
const array = row.toArray(); // Returns [timestamp, ...fields]
```
## Advanced Usage
### Configuring Fetch Size
Control how many rows are fetched in each batch:
```typescript
const session = new Session({
host: 'localhost',
port: 6667,
fetchSize: 1024, // Fetch 1024 rows at a time
});
const dataSet = await session.executeQueryStatement('SELECT * FROM root.test.d1');
// Will automatically fetch in batches of 1024 rows
```
### Processing Large Result Sets
For very large result sets, use iterator pattern to avoid memory issues:
```typescript
const dataSet = await session.executeQueryStatement('SELECT * FROM root.large_dataset');
let count = 0;
let sum = 0;
while (await dataSet.hasNext()) {
const row = dataSet.next();
sum += row.getDouble('value');
count++;
// Log progress every 10000 rows
if (count % 10000 === 0) {
console.log(`Processed ${count} rows...`);
}
}
console.log(`Total rows: ${count}, Average: ${sum / count}`);
await dataSet.close();
```
### Error Handling
Always use try-finally to ensure cleanup:
```typescript
let dataSet;
try {
dataSet = await session.executeQueryStatement('SELECT * FROM root.test.d1');
while (await dataSet.hasNext()) {
const row = dataSet.next();
// Process row
}
} catch (error) {
console.error('Query error:', error);
throw error;
} finally {
if (dataSet) {
await dataSet.close();
}
}
```
### Working with Multiple Columns
```typescript
const dataSet = await session.executeQueryStatement(`
SELECT temperature, humidity, pressure, status
FROM root.weather.station1
WHERE time > now() - 1h
`);
while (await dataSet.hasNext()) {
const row = dataSet.next();
const reading = {
timestamp: new Date(row.getTimestamp()),
temperature: row.getFloat('temperature'),
humidity: row.getFloat('humidity'),
pressure: row.getFloat('pressure'),
status: row.getString('status'),
};
// Process reading
if (!row.isNull('status') && reading.temperature > 30) {
console.log('High temperature alert:', reading);
}
}
await dataSet.close();
```
### Aggregation Queries
```typescript
const dataSet = await session.executeQueryStatement(`
SELECT AVG(temperature), MAX(temperature), MIN(temperature)
FROM root.test.d1
GROUP BY ([2024-01-01, 2024-02-01), 1d)
`);
while (await dataSet.hasNext()) {
const row = dataSet.next();
console.log({
timestamp: new Date(row.getTimestamp()),
avg: row.getDouble('AVG(temperature)'),
max: row.getDouble('MAX(temperature)'),
min: row.getDouble('MIN(temperature)'),
});
}
await dataSet.close();
```
## Migration from Old Pattern
The old pattern loaded all data into memory at once. The new pattern uses lazy loading with iterators.
### Old Approach (No Longer Supported)
```typescript
// ❌ This no longer works - executeQueryStatement now returns SessionDataSet
// const result = await session.executeQueryStatement('SELECT * FROM root.test.d1');
// for (const row of result.rows) {
// const timestamp = row[0];
// const value1 = row[1];
// const value2 = row[2];
// console.log(timestamp, value1, value2);
// }
```
### New Approach (Current Implementation)
```typescript
// ✅ New way - lazy loading with iterator
const dataSet = await session.executeQueryStatement('SELECT * FROM root.test.d1');
while (await dataSet.hasNext()) {
const row = dataSet.next();
const timestamp = row.getTimestamp();
const value1 = row.getValueByIndex(0);
const value2 = row.getValueByIndex(1);
console.log(timestamp, value1, value2);
}
await dataSet.close();
```
### Migration Steps
1. **Change method call**: `executeQueryStatement()` now returns `SessionDataSet` (not `QueryResult`)
2. **Replace array access**: Use iterator pattern instead of `result.rows`
3. **Update row access**: Use `RowRecord` methods instead of array indices
4. **Add cleanup**: Always call `await dataSet.close()`
### For Small Datasets
If you need all data at once for small result sets, use `toArray()`:
```typescript
const dataSet = await session.executeQueryStatement('SELECT * FROM root.test.d1');
const allRows = await dataSet.toArray(); // Loads everything into memory
// allRows is [[timestamp, value1, value2], ...]
```
⚠️ **Warning**: `toArray()` loads all data into memory. Only use for small result sets.
## Best Practices
1. **Always Close**: Use try-finally to ensure `dataSet.close()` is called
2. **Check hasNext()**: Always call `hasNext()` before calling `next()`
3. **Use Fetch Size**: Set appropriate fetch size based on memory and network
4. **Handle Nulls**: Check `isNull()` before accessing nullable columns
5. **Typed Access**: Use typed getters (`getInt`, `getString`) for type safety
6. **Avoid toArray()**: Don't use `toArray()` for large datasets
## Performance Tips
1. **Fetch Size**: Larger fetch size = fewer network calls, more memory used
- Small datasets: 100-1000
- Large datasets: 1000-10000
- Very large: 10000-50000
2. **Batch Processing**: Process rows in batches for better performance
```typescript
const dataSet = await session.executeQueryStatement('SELECT * FROM root.test.d1');
const batch = [];
while (await dataSet.hasNext()) {
batch.push(dataSet.next());
if (batch.length >= 1000) {
await processBatch(batch);
batch.length = 0;
}
}
if (batch.length > 0) {
await processBatch(batch);
}
await dataSet.close();
```
3. **Column Access**: Access by index is slightly faster than by name
```typescript
// Faster
const value = row.getIntByIndex(0);
// Slightly slower (name lookup)
const value = row.getInt('temperature');
```
## Troubleshooting
### "No more rows available" Error
```typescript
// ❌ Wrong - calling next() without checking hasNext()
const row = dataSet.next(); // May throw error
// ✅ Correct
if (await dataSet.hasNext()) {
const row = dataSet.next();
}
```
### "Column not found" Error
```typescript
// Make sure column names match exactly (case-sensitive)
const columns = dataSet.getColumnNames();
console.log('Available columns:', columns);
// Use correct column name
const value = row.getString('temperature'); // Not 'Temperature' or 'temp'
```
### Memory Issues
```typescript
// ❌ Don't do this with large datasets
const allRows = await dataSet.toArray(); // Loads everything into memory
// ✅ Process iteratively instead
while (await dataSet.hasNext()) {
const row = dataSet.next();
// Process and discard each row
}
```
## See Also
- [Implementation Guide](implementation.md)
- [Data Types Reference](data-types.md)
- [TypeScript Examples](typescript-examples.md)