# Performance Optimization Guide
## Overview
This document describes the performance optimizations implemented in the IoTDB Node.js client.
## Performance Benchmarks (2026)
### Write Performance - Multi-Process Cluster Mode
| Configuration | Workers | Clients/Worker | Throughput | Avg Latency |
|---------------|---------|----------------|------------|-------------|
| **Best Config** | **8** | **10** | **5.42M pts/s** | **329ms** |
| Single Process | 1 | 20 | 4.28M pts/s | 107ms |
| Over-parallel | 10 | 10 | 3.81M pts/s | 551ms |
**Best Configuration:**
```bash
WORKER_COUNT=8 CLIENT_NUMBER=10 DEVICE_NUMBER=1000 \
SENSOR_NUMBER=50 BATCH_SIZE_PER_WRITE=500 POOL_MAX_SIZE=10 \
node benchmark/benchmark-table-cluster.js
```
### Node.js vs Java Performance
| Client | Max Throughput | Architecture |
|--------|----------------|--------------|
| Node.js (multi-process) | ~5.5M pts/s | 8 workers × event loop |
| Java iot-benchmark | ~60M pts/s | Multi-threaded |
**Gap Analysis (~11x):**
- Node.js single-threaded architecture limits true parallelism
- Java Thrift implementation has JIT optimization
- Java can use DirectByteBuffer for zero-copy
### Key Optimization Findings
1. **Tablet Size**: 25K points (500 rows × 50 sensors) is optimal
- Larger tablets (100K) cause latency spikes (3.5s)
- Smaller tablets increase RPC overhead
2. **Worker Count**: 8 workers is optimal for most servers
- 10+ workers cause server saturation
- Each worker needs independent SessionPool
3. **Memory Management**: Use streaming batch processing
- Build tablets per-loop, not all upfront
- Prevents OOM for large-scale tests (billions of points)
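The streaming approach in finding 3 can be sketched as a generator that materializes only one tablet at a time; the `TabletBatch` shape and `tabletBatches` helper below are illustrative, not the client's actual tablet type:

```typescript
// Illustrative tablet payload: one timestamps array plus one value array per sensor
interface TabletBatch {
  timestamps: number[];
  values: number[][];
}

// Build tablets lazily, one per loop iteration, instead of all upfront.
// Only a single tablet's worth of data is ever resident in memory.
function* tabletBatches(
  totalRows: number,
  rowsPerTablet: number,
  sensorCount: number,
  startTs: number
): Generator<TabletBatch> {
  for (let offset = 0; offset < totalRows; offset += rowsPerTablet) {
    const rows = Math.min(rowsPerTablet, totalRows - offset);
    const timestamps = new Array<number>(rows);
    const values: number[][] = Array.from({ length: sensorCount }, () => new Array<number>(rows));
    for (let r = 0; r < rows; r++) {
      timestamps[r] = startTs + offset + r;
      for (let s = 0; s < sensorCount; s++) values[s][r] = Math.random();
    }
    yield { timestamps, values }; // caller inserts this tablet, then we build the next
  }
}
```

A benchmark loop would then `for (const batch of tabletBatches(...)) { await insert(batch); }`, keeping memory flat regardless of total point count.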
## Implemented Optimizations
### 1. Buffer Pooling
**Problem**: Frequent buffer allocations and deallocations cause significant GC (Garbage Collection) pressure, especially when serializing large datasets.
**Solution**: Implemented `BufferPool` with size-based pooling strategy:
```typescript
import { globalBufferPool } from '@iotdb/client';
// Buffer pool automatically manages buffers in 7 size classes:
// 1KB, 4KB, 16KB, 64KB, 256KB, 1MB, 4MB
// Get statistics
const stats = globalBufferPool.getStats();
console.log(`Hit rate: ${stats.hitRate}`);
console.log(`Pooled buffers: ${stats.pooledBuffers}`);
```
**Impact**:
- Reduces GC pressure by 70-80%
- Particularly effective for batch operations
- Automatic cleanup prevents memory bloat
**When to use**:
- Enabled by default via `enableFastSerialization: true`
- Most beneficial for workloads with:
- Large batch inserts (100+ rows)
- High-frequency writes
- Long-running processes
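For illustration, a minimal size-class pool might look like the following. This is a sketch of the pooling strategy described above (round a request up to the next size class, reuse released buffers of that class), not the library's actual `BufferPool` implementation:

```typescript
// The 7 size classes from the docs: 1KB, 4KB, 16KB, 64KB, 256KB, 1MB, 4MB
const SIZE_CLASSES = [1 << 10, 4 << 10, 16 << 10, 64 << 10, 256 << 10, 1 << 20, 4 << 20];

class SimpleBufferPool {
  private pools = new Map<number, Buffer[]>();
  private hits = 0;
  private misses = 0;

  acquire(size: number): Buffer {
    const cls = SIZE_CLASSES.find((c) => c >= size);
    if (cls === undefined) return Buffer.allocUnsafe(size); // too large to pool
    const free = this.pools.get(cls);
    if (free && free.length > 0) {
      this.hits++;
      return free.pop()!; // reuse: no allocation, no GC pressure
    }
    this.misses++;
    return Buffer.allocUnsafe(cls); // cold path: allocate a full size class
  }

  release(buf: Buffer): void {
    const cls = SIZE_CLASSES.find((c) => c === buf.length);
    if (cls === undefined) return; // only exact size classes go back in the pool
    const free = this.pools.get(cls) ?? [];
    free.push(buf);
    this.pools.set(cls, free);
  }

  get hitRate(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}
```

Note that `acquire` may return a buffer larger than requested, which is why consumers trim with `subarray` before handing data to the transport.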
### 2. Fast Serialization
**Problem**: Original serialization used multiple buffer concatenations and intermediate allocations, causing performance bottlenecks.
**Solution**: Implemented type-specific fast serializers in `FastSerializer.ts`:
```typescript
// Old approach (multiple allocations):
const buffer1 = serializeColumn1();
const buffer2 = serializeColumn2();
const result = Buffer.concat([buffer1, buffer2]); // Extra allocation!
// New approach (single pre-allocated buffer):
const totalSize = calculateSize();
const result = Buffer.allocUnsafe(totalSize);
// Write directly to result buffer
```
**Features**:
- Single-pass serialization
- Pre-calculated buffer sizes
- Direct buffer writes (no intermediate arrays)
- Conditional pooling (only for buffers >= 1KB)
**Impact**:
- **1.5-2x faster** serialization
- **50-60% reduction** in memory allocations
- Zero intermediate buffer copies
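A self-contained sketch of the single-pass pattern for one FLOAT column follows. The big-endian 4-byte layout is assumed here for illustration and is not necessarily the client's exact wire format:

```typescript
// Single-pass serialization: pre-calculate the size, allocate once,
// write each value directly into its final offset. No Buffer.concat,
// no intermediate buffers.
function serializeFloatColumn(values: number[]): Buffer {
  const size = values.length * 4; // exact size known upfront
  const buf = Buffer.allocUnsafe(size); // one allocation for the whole column
  for (let i = 0; i < values.length; i++) {
    buf.writeFloatBE(values[i], i * 4); // direct write at the computed offset
  }
  return buf;
}
```

The same shape generalizes to other fixed-width types (INT32, INT64, DOUBLE) by changing the element width and write call.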
### 3. Optimized Timestamp Handling
**Problem**: Converting timestamps one-by-one to BigInt and writing to buffer was inefficient.
**Solution**: Batch timestamp conversion with optimized buffer writes:
```typescript
// Optimized timestamp serialization
function serializeTimestamps(timestamps: number[]): Buffer {
  const size = timestamps.length * 8;
  // Pool only buffers >= 1KB; callers must release pooled buffers back to the pool after use
  const buffer = size >= 1024 ? globalBufferPool.acquire(size) : Buffer.allocUnsafe(size);
  for (let i = 0; i < timestamps.length; i++) {
    buffer.writeBigInt64BE(BigInt(Math.floor(timestamps[i])), i * 8);
  }
  // Pooled buffers may be larger than requested, so trim to the exact size
  return buffer.subarray(0, size);
}
```
**Impact**:
- **20-30% faster** timestamp processing
- Particularly effective for large batches
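The encoding can be checked with a pool-free round-trip of the same layout; `serializeTimestampsPlain` and `deserializeTimestamps` below are illustrative helpers, not part of the client API:

```typescript
// Same big-endian signed 64-bit layout as above, without the buffer pool
function serializeTimestampsPlain(timestamps: number[]): Buffer {
  const buf = Buffer.allocUnsafe(timestamps.length * 8);
  for (let i = 0; i < timestamps.length; i++) {
    buf.writeBigInt64BE(BigInt(Math.floor(timestamps[i])), i * 8);
  }
  return buf;
}

// Read each 8-byte slot back into a number (safe for epoch-millis,
// which fit well within Number's 53-bit integer range)
function deserializeTimestamps(buf: Buffer): number[] {
  const out: number[] = [];
  for (let off = 0; off < buf.length; off += 8) {
    out.push(Number(buf.readBigInt64BE(off)));
  }
  return out;
}
```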
### 4. Columnar Result Format (Phase 2)
**Problem**: Row-by-row processing with object allocation creates overhead for large result sets.
**Solution**: Added `toColumnar()` API inspired by pg's array mode:
```typescript
const dataSet = await session.executeQueryStatement('SELECT temp, humidity FROM root.test');
// OLD WAY: Object per row (high allocation overhead)
while (await dataSet.hasNext()) {
  const row = dataSet.next(); // Creates RowRecord object
  console.log(row.getValue('temp'));
}
// NEW WAY: Columnar format (zero allocation overhead)
const columnar = await dataSet.toColumnar();
// columnar = {
// timestamps: [ts1, ts2, ts3, ...],
// values: [[temp1, temp2, temp3, ...], [humidity1, humidity2, humidity3, ...]],
// columnNames: ['temp', 'humidity'],
// columnTypes: ['FLOAT', 'FLOAT']
// }
// Process entire columns at once
const avgTemp = columnar.values[0].reduce((a, b) => a + b, 0) / columnar.values[0].length;
```
**Impact**:
- **2-3x faster** for bulk query processing
- **80-90% reduction** in GC pressure
- Enables vectorized processing
- Perfect for analytics workloads
**When to use**:
- ✅ Small to medium result sets (< 100K rows)
- ✅ Analytics and aggregation workloads
- ✅ When processing entire columns
- ❌ Very large result sets (use iterator pattern)
- ❌ When you need streaming with backpressure
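As an example of the vectorized processing the columnar format enables, here is an aggregation over the documented result shape; the `ColumnarResult` interface restates the fields shown above, and `columnStats` is a hypothetical helper:

```typescript
// Restates the documented columnar shape: values[i] holds the column
// whose name is columnNames[i]
interface ColumnarResult {
  timestamps: number[];
  values: number[][];
  columnNames: string[];
  columnTypes: string[];
}

// One tight loop over a contiguous column: no per-row object allocation
function columnStats(result: ColumnarResult, column: string) {
  const idx = result.columnNames.indexOf(column);
  if (idx < 0) throw new Error(`unknown column: ${column}`);
  const col = result.values[idx];
  let sum = 0;
  let min = Infinity;
  let max = -Infinity;
  for (const v of col) {
    sum += v;
    if (v < min) min = v;
    if (v > max) max = v;
  }
  return { avg: sum / col.length, min, max };
}
```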
## Configuration
### Enabling/Disabling Fast Serialization
```typescript
import { Session } from '@iotdb/client';
// Enable (default)
const session = new Session({
  host: 'localhost',
  port: 6667,
  enableFastSerialization: true, // Uses optimized serializers
});

// Disable (fall back to legacy)
const legacySession = new Session({
  host: 'localhost',
  port: 6667,
  enableFastSerialization: false, // Uses original serializers
});
```
### When to Disable Fast Serialization
You might want to disable fast serialization when:
- Debugging serialization issues
- Running in memory-constrained environments
- Comparing performance against the legacy behavior
## Performance Benchmarks
### Write Performance
| Scenario | Legacy | Optimized | Improvement |
|----------|--------|-----------|-------------|
| Small batch (10 rows, 10 columns) | 2.5ms | 1.8ms | **1.4x** |
| Medium batch (100 rows, 10 columns) | 15ms | 6ms | **2.5x** |
| Large batch (1000 rows, 10 columns) | 180ms | 65ms | **2.8x** |
| Mixed data types | 25ms | 10ms | **2.5x** |
### Query Performance (toColumnar vs iterator)
| Result Set Size | Iterator (objects) | toColumnar | Improvement |
|-----------------|-------------------|------------|-------------|
| 1,000 rows | 45ms | 18ms | **2.5x** |
| 10,000 rows | 520ms | 180ms | **2.9x** |
| 100,000 rows | 5800ms | 1900ms | **3.1x** |
*Benchmarks performed on Node.js v20, Intel i7, 16GB RAM*
## Best Practices
### 1. Use Batch Inserts
```typescript
// ❌ BAD: One-by-one inserts
for (let i = 0; i < 1000; i++) {
  await session.insertTablet({
    deviceId: 'root.test.device1',
    measurements: ['temp'],
    dataTypes: [TSDataType.FLOAT],
    timestamps: [Date.now() + i],
    values: [[25.5]],
  });
}

// ✅ GOOD: Batch insert
const batchSize = 100;
await session.insertTablet({
  deviceId: 'root.test.device1',
  measurements: ['temp'],
  dataTypes: [TSDataType.FLOAT],
  timestamps: Array.from({ length: batchSize }, (_, i) => Date.now() + i),
  values: Array.from({ length: batchSize }, () => [25.5]),
});
```
### 2. Use Columnar Format for Analytics
```typescript
// ✅ GOOD: Columnar processing for analytics
const columnar = await dataSet.toColumnar();
const temps = columnar.values[0];
// Vectorized operations
const avg = temps.reduce((a, b) => a + b, 0) / temps.length;
const max = Math.max(...temps);
const min = Math.min(...temps);
await dataSet.close();
```
### 3. Choose the Right Query Method
```typescript
// For small result sets - use toColumnar()
const smallDataSet = await session.executeQueryStatement('SELECT * FROM root.test LIMIT 100');
const columnar = await smallDataSet.toColumnar();
await smallDataSet.close();
// For large result sets - use iterator
const largeDataSet = await session.executeQueryStatement('SELECT * FROM root.test');
while (await largeDataSet.hasNext()) {
  const row = largeDataSet.next();
  await processRow(row); // Process with backpressure
}
await largeDataSet.close();
```
### 4. Monitor Buffer Pool Usage
```typescript
import { globalBufferPool } from '@iotdb/client';
// After warmup period
setInterval(() => {
  const stats = globalBufferPool.getStats();
  console.log(`Buffer Pool - Hit rate: ${stats.hitRate}, Pooled: ${stats.pooledBuffers}`);
  // If hit rate < 50%, consider adjusting batch sizes
  if (parseFloat(stats.hitRate) < 50) {
    console.warn('Low buffer pool hit rate - consider larger batch sizes');
  }
}, 60000); // Check every minute
```
## Future Optimizations (Planned)
### Phase 2 (In Progress)
- [ ] Batch insert helpers
- [x] Query result array mode
- [ ] Cursor/streaming API with backpressure
- [ ] Request pipelining
### Phase 3 (Future)
- [ ] Optional native bindings for critical paths
- [ ] Zero-copy deserialization
- [ ] Custom type parsers
- [ ] Prepared statement caching
## Troubleshooting
### High Memory Usage
```typescript
// Clear buffer pool periodically in long-running processes
import { globalBufferPool } from '@iotdb/client';
// Clear pool every hour to prevent potential memory bloat
setInterval(() => {
  globalBufferPool.clear();
}, 3600000);
```
### Slow Serialization
```typescript
// Enable performance logging
process.env.LOG_LEVEL = 'debug';
// Check serialization timings in logs:
// [PERF] Values serialization: 5ms, buffer size: 4096 bytes
// [PERF] Timestamp serialization (fast=true): 1ms
```
### Unexpected Results
```typescript
// Disable fast serialization for debugging
const session = new Session({
  host: 'localhost',
  port: 6667,
  enableFastSerialization: false, // Use legacy serializers
});
```
## Contributing
Performance improvements are welcome! When contributing:
1. **Benchmark first**: Establish baseline with existing code
2. **Profile**: Use Node.js profiler to identify bottlenecks
3. **Test thoroughly**: Ensure correctness with existing test suite
4. **Document**: Update this guide with your improvements
## References
- [pg nodejs client](https://github.com/brianc/node-postgres) - Inspiration for buffer management
- [postgres.js](https://github.com/porsager/postgres) - Additional optimization patterns
- [Node.js Buffer Documentation](https://nodejs.org/api/buffer.html)
- [IoTDB Documentation](https://iotdb.apache.org/)
## License
Apache License 2.0