# SessionPool Exception Handling and Health Monitoring
## Overview
The Apache IoTDB C# client library provides comprehensive exception handling and health monitoring capabilities for SessionPool operations. This document explains how to handle pool depletion scenarios, monitor pool health, and implement recovery strategies.
## SessionPoolDepletedException
### Description
`SessionPoolDepletedException` is a specialized exception thrown when the SessionPool cannot provide a client connection. It indicates one of the following:
- All clients in the pool are currently in use
- Client connections have failed and reconnection attempts were unsuccessful
- The pool wait timeout has been exceeded
### Exception Properties
The exception provides detailed diagnostic information through the following properties:
| Property | Type | Description |
| --------------------- | ------ | -------------------------------------------------------------------------- |
| `DepletionReason` | string | A human-readable description of why the pool was depleted |
| `AvailableClients` | int | Number of currently available clients in the pool at the time of exception |
| `TotalPoolSize` | int | The total configured size of the session pool |
| `FailedReconnections` | int | Number of failed reconnection attempts since the pool was opened |
### Example Usage
```csharp
using Apache.IoTDB;
using System;
try
{
    var sessionPool = new SessionPool.Builder()
        .Host("127.0.0.1")
        .Port(6667)
        .PoolSize(4)
        .Build();
    await sessionPool.Open();

    // Perform operations... (`record` is a RowRecord built elsewhere)
    await sessionPool.InsertRecordAsync("root.sg.d1", record);
}
catch (SessionPoolDepletedException ex)
{
    Console.WriteLine($"Pool depleted: {ex.DepletionReason}");
    Console.WriteLine($"Available clients: {ex.AvailableClients}/{ex.TotalPoolSize}");
    Console.WriteLine($"Failed reconnections: {ex.FailedReconnections}");
    // Implement recovery strategy (see below)
}
```
## Pool Health Metrics
### Monitoring Pool Status
The `SessionPool` class exposes real-time health metrics that can be used for monitoring and alerting:
```csharp
var sessionPool = new SessionPool.Builder()
    .Host("127.0.0.1")
    .Port(6667)
    .PoolSize(8)
    .Build();
await sessionPool.Open();
// Check pool health
Console.WriteLine($"Available Clients: {sessionPool.AvailableClients}");
Console.WriteLine($"Total Pool Size: {sessionPool.TotalPoolSize}");
Console.WriteLine($"Failed Reconnections: {sessionPool.FailedReconnections}");
```
### Health Metrics
| Metric | Property | Description | Recommended Threshold |
| -------------------- | --------------------- | ------------------------------------------------ | --------------------------- |
| Available Clients | `AvailableClients` | Number of idle clients ready for use | Alert if < 25% of pool size |
| Total Pool Size | `TotalPoolSize` | Configured maximum pool size | N/A (constant) |
| Failed Reconnections | `FailedReconnections` | Cumulative count of failed reconnection attempts | Alert if > 0 and increasing |
## Failure Scenarios and Recovery Strategies
### Scenario 1: Pool Exhaustion (High Load)
**Symptoms:**
- `SessionPoolDepletedException` with reason "Connection pool is empty and wait time out"
- `AvailableClients` = 0
- `FailedReconnections` = 0 or low
**Root Cause:** Application workload exceeds pool capacity
**Recovery Strategies:**
1. **Increase Pool Size:**
```csharp
var sessionPool = new SessionPool.Builder()
    .Host("127.0.0.1")
    .Port(6667)
    .PoolSize(16) // Increased from 8
    .Build();
```
2. **Implement Connection Retry with Backoff:**
```csharp
int maxRetries = 3;
int retryDelayMs = 1000;
for (int i = 0; i < maxRetries; i++)
{
    try
    {
        await sessionPool.InsertRecordAsync(deviceId, record);
        break; // Success
    }
    catch (SessionPoolDepletedException) when (i < maxRetries - 1)
    {
        // Exponential backoff: the delay doubles on each retry (1 s, then 2 s).
        // The exception from the final attempt is not caught and propagates.
        await Task.Delay(retryDelayMs * (1 << i));
    }
}
```
3. **Optimize Operation Duration:**
- Reduce the time each client is held
- Batch multiple operations together
- Use async operations efficiently
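
The batching advice above can be sketched without committing to a particular IoTDB call. `BatchHelper`, `Split`, `InsertInBatchesAsync`, and the `insertBatch` delegate are hypothetical names introduced for illustration; in a real application the delegate would wrap whichever multi-record insert method your client version provides, so that a pooled client is borrowed once per batch rather than once per record.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BatchHelper
{
    // Splits `items` into consecutive batches of at most `batchSize` elements.
    // This is an iterator, so the argument check runs when enumeration starts.
    public static IEnumerable<List<T>> Split<T>(IEnumerable<T> items, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        foreach (var item in items)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        if (batch.Count > 0) yield return batch; // Trailing partial batch
    }

    // Invokes `insertBatch` once per batch and returns the number of calls made.
    // `insertBatch` is an assumed wrapper around your client's batch insert.
    public static async Task<int> InsertInBatchesAsync<T>(
        IEnumerable<T> items, int batchSize, Func<IReadOnlyList<T>, Task> insertBatch)
    {
        int calls = 0;
        foreach (var batch in Split(items, batchSize))
        {
            await insertBatch(batch);
            calls++;
        }
        return calls;
    }
}
```

With 1,000 records and a batch size of 100, the delegate runs 10 times instead of 1,000, which shortens the total time clients spend checked out of the pool.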
### Scenario 2: Network Connectivity Issues
**Symptoms:**
- `SessionPoolDepletedException` with reason "Reconnection failed"
- `AvailableClients` decreases over time
- `FailedReconnections` > 0 and increasing
**Root Cause:** IoTDB server unreachable or network issues
**Recovery Strategies:**
1. **Reinitialize SessionPool:**
```csharp
catch (SessionPoolDepletedException ex) when (ex.FailedReconnections > 5)
{
    Console.WriteLine($"Critical: {ex.FailedReconnections} failed reconnections");

    // Close existing pool
    await sessionPool.Close();

    // Wait for network recovery
    await Task.Delay(5000);

    // Create new pool
    sessionPool = new SessionPool.Builder()
        .Host("127.0.0.1")
        .Port(6667)
        .PoolSize(8)
        .Build();
    await sessionPool.Open();
}
```
2. **Implement Circuit Breaker Pattern:**
```csharp
// Note: this sketch is not thread-safe; add locking if it is shared across threads.
public class SessionPoolCircuitBreaker
{
    private const int FailureThreshold = 5;
    private static readonly TimeSpan OpenDuration = TimeSpan.FromMinutes(1);

    private readonly SessionPool _pool;
    private int _failureCount = 0;
    private bool _circuitOpen = false;
    private DateTime _lastFailureTime;

    public SessionPoolCircuitBreaker(SessionPool pool)
    {
        _pool = pool;
    }

    public async Task<T> ExecuteAsync<T>(Func<SessionPool, Task<T>> operation)
    {
        // While open, fail fast; once OpenDuration has elapsed, let calls through again to probe the server
        if (_circuitOpen && DateTime.UtcNow - _lastFailureTime < OpenDuration)
        {
            throw new InvalidOperationException("Circuit breaker is open");
        }
        try
        {
            var result = await operation(_pool);
            _failureCount = 0; // Reset on success
            _circuitOpen = false;
            return result;
        }
        catch (SessionPoolDepletedException)
        {
            _failureCount++;
            _lastFailureTime = DateTime.UtcNow;
            if (_failureCount >= FailureThreshold)
            {
                _circuitOpen = true;
                Console.WriteLine("Circuit breaker opened - too many failures");
            }
            throw;
        }
    }
}
```
### Scenario 3: Server Overload
**Symptoms:**
- Intermittent `SessionPoolDepletedException`
- Both connection timeouts and reconnection failures
**Root Cause:** IoTDB server is overloaded
**Recovery Strategies:**
1. **Implement Rate Limiting:**
```csharp
using System.Threading;

// Members of the class that owns `sessionPool`
private readonly SemaphoreSlim _rateLimiter = new SemaphoreSlim(10, 10); // Max 10 concurrent operations

public async Task RateLimitedInsert(string deviceId, RowRecord record)
{
    await _rateLimiter.WaitAsync();
    try
    {
        await sessionPool.InsertRecordAsync(deviceId, record);
    }
    finally
    {
        _rateLimiter.Release(); // Always free the slot, even if the insert throws
    }
}
```
2. **Add Timeout Configuration:**
```csharp
var sessionPool = new SessionPool.Builder()
    .Host("127.0.0.1")
    .Port(6667)
    .Timeout(120) // Increased timeout for slow server
    .Build();
```
## Monitoring and Alerting Recommendations
### Health Check Implementation
```csharp
public class SessionPoolHealthCheck
{
    private readonly SessionPool _pool;

    public SessionPoolHealthCheck(SessionPool pool)
    {
        _pool = pool;
    }

    public HealthStatus CheckHealth()
    {
        var availableRatio = (double)_pool.AvailableClients / _pool.TotalPoolSize;

        if (_pool.FailedReconnections > 10)
        {
            return new HealthStatus
            {
                Status = "Critical",
                Message = $"High reconnection failures: {_pool.FailedReconnections}",
                Recommendation = "Check IoTDB server availability"
            };
        }

        if (availableRatio < 0.25)
        {
            return new HealthStatus
            {
                Status = "Warning",
                Message = $"Low available clients: {_pool.AvailableClients}/{_pool.TotalPoolSize}",
                Recommendation = "Consider increasing pool size"
            };
        }

        return new HealthStatus
        {
            Status = "Healthy",
            Message = $"Pool healthy: {_pool.AvailableClients}/{_pool.TotalPoolSize} available"
        };
    }
}

public class HealthStatus
{
    public string Status { get; set; }
    public string Message { get; set; }
    public string Recommendation { get; set; }
}
```
### Metrics Collection for Monitoring Systems
```csharp
// Example: Export metrics to Prometheus, StatsD, or similar.
// `MetricsCollector` stands in for your metrics library's API.
public class SessionPoolMetricsCollector
{
    private readonly SessionPool _pool;

    public SessionPoolMetricsCollector(SessionPool pool)
    {
        _pool = pool;
    }

    public void CollectMetrics()
    {
        // Gauge: Current available clients
        MetricsCollector.Set("iotdb_pool_available_clients", _pool.AvailableClients);

        // Gauge: Total pool size
        MetricsCollector.Set("iotdb_pool_total_size", _pool.TotalPoolSize);

        // Cumulative count of failed reconnections, exported as its current value
        MetricsCollector.Set("iotdb_pool_failed_reconnections", _pool.FailedReconnections);

        // Calculated: Pool utilization percentage
        var utilization = (1.0 - (double)_pool.AvailableClients / _pool.TotalPoolSize) * 100;
        MetricsCollector.Set("iotdb_pool_utilization_percent", utilization);
    }
}
```
### Recommended Alert Rules
1. **Critical Alerts:**
- `FailedReconnections > 10`: Server connectivity issues
- `AvailableClients == 0` for > 30 seconds: Complete pool exhaustion
2. **Warning Alerts:**
- `AvailableClients < TotalPoolSize * 0.25`: Pool under pressure
- `FailedReconnections > 0` and increasing: Network instability
3. **Info Alerts:**
- Pool utilization > 75% for extended periods: Consider scaling
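
Two of the rules above depend on history rather than a single reading: "`AvailableClients == 0` for > 30 seconds" and "`FailedReconnections > 0` and increasing". The following is a minimal sketch of how such stateful rules might be evaluated from periodic samples. `PoolAlertEvaluator`, `AlertLevel`, and `Evaluate` are hypothetical names, not part of the IoTDB client, and the values are passed in as plain integers so the logic can be tested without a live pool. The Info rule is left out because utilization above 75% is the same condition as `AvailableClients < TotalPoolSize * 0.25`; only the "extended periods" qualifier distinguishes it.

```csharp
using System;

public enum AlertLevel { None, Warning, Critical }

public class PoolAlertEvaluator
{
    private static readonly TimeSpan ExhaustionWindow = TimeSpan.FromSeconds(30);

    private DateTime? _exhaustedSince; // When AvailableClients first dropped to 0, while it stays 0
    private int? _previousFailed;      // FailedReconnections seen in the previous sample

    // Evaluates one sample taken at `now` and returns the highest matching alert level.
    public AlertLevel Evaluate(DateTime now, int available, int total, int failedReconnections)
    {
        // Track how long the pool has been continuously empty.
        if (available == 0)
        {
            if (!_exhaustedSince.HasValue) _exhaustedSince = now;
        }
        else
        {
            _exhaustedSince = null;
        }

        bool exhaustedTooLong =
            _exhaustedSince.HasValue && now - _exhaustedSince.Value > ExhaustionWindow;

        // "Increasing" means the counter grew since the previous sample.
        bool failuresIncreasing =
            failedReconnections > 0 &&
            _previousFailed.HasValue &&
            failedReconnections > _previousFailed.Value;
        _previousFailed = failedReconnections;

        if (failedReconnections > 10 || exhaustedTooLong) return AlertLevel.Critical;
        if (available < total * 0.25 || failuresIncreasing) return AlertLevel.Warning;
        return AlertLevel.None;
    }
}
```

A monitoring loop would call `Evaluate(DateTime.UtcNow, pool.AvailableClients, pool.TotalPoolSize, pool.FailedReconnections)` on each tick and forward any non-`None` result to its alerting channel.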
## Best Practices
1. **Pool Sizing:**
- Start with poolSize = 2 × expected concurrent operations
- Monitor and adjust based on actual usage patterns
- Larger pools consume more server resources but allow more operations to run concurrently
2. **Error Handling:**
- Always catch `SessionPoolDepletedException` specifically
- Log exception properties for debugging
- Implement appropriate retry logic based on depletion reason
3. **Monitoring:**
- Continuously monitor `AvailableClients` metric
- Track `FailedReconnections` as a leading indicator of problems
- Set up alerts before pool is completely depleted
4. **Resource Management:**
- Always call `sessionPool.Close()` when done
- Use `using` statements or try-finally blocks for proper cleanup
- Don't create multiple SessionPool instances unnecessarily
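
As one way to act on the "retry logic based on depletion reason" advice, the sketch below chooses a recovery action from the `FailedReconnections` value carried by the exception and computes a capped exponential backoff delay. `DepletionTriage`, `RecoveryAction`, `Choose`, and `BackoffDelay` are hypothetical helpers, not part of the IoTDB client. The threshold of 5 is borrowed from the reinitialization example in Scenario 2, and the 30-second cap is an assumption.

```csharp
using System;

public enum RecoveryAction { RetryWithBackoff, ReinitializePool }

public static class DepletionTriage
{
    // Threshold taken from the Scenario 2 reinitialization example.
    public const int ReconnectFailureThreshold = 5;

    // Many failed reconnections suggest a server or network problem (Scenario 2);
    // otherwise the pool is most likely just busy (Scenario 1).
    public static RecoveryAction Choose(int failedReconnections)
    {
        return failedReconnections > ReconnectFailureThreshold
            ? RecoveryAction.ReinitializePool
            : RecoveryAction.RetryWithBackoff;
    }

    // Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ...
    // capped at `maxDelayMs` (the cap is an assumed default).
    public static TimeSpan BackoffDelay(int attempt, int baseDelayMs = 1000, int maxDelayMs = 30000)
    {
        if (attempt < 0) throw new ArgumentOutOfRangeException(nameof(attempt));

        // Clamp the shift so the multiplication cannot overflow a long.
        long delay = (long)baseDelayMs << Math.Min(attempt, 20);
        return TimeSpan.FromMilliseconds(Math.Min(delay, maxDelayMs));
    }
}
```

Inside a `catch (SessionPoolDepletedException ex)` block, `DepletionTriage.Choose(ex.FailedReconnections)` would decide between waiting for `BackoffDelay(attempt)` and rebuilding the pool.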
## Example: Complete Production-Ready Implementation
```csharp
using Apache.IoTDB;
using System;
using System.Threading;
using System.Threading.Tasks;

public class ProductionSessionPoolManager
{
    private SessionPool _pool;
    private Task _monitorTask;
    private readonly SemaphoreSlim _reinitLock = new SemaphoreSlim(1, 1);
    private readonly CancellationTokenSource _shutdown = new CancellationTokenSource();

    public async Task Initialize()
    {
        await OpenPool();

        // Start health monitoring once; reinitialization reuses the same monitor
        if (_monitorTask == null)
        {
            _monitorTask = Task.Run(() => MonitorHealth(_shutdown.Token));
        }
    }

    private async Task OpenPool()
    {
        _pool = new SessionPool.Builder()
            .Host("127.0.0.1")
            .Port(6667)
            .PoolSize(8)
            .Timeout(60)
            .Build();
        await _pool.Open();
    }

    public async Task<T> ExecuteWithRetry<T>(Func<SessionPool, Task<T>> operation)
    {
        const int maxRetries = 3;
        const int baseDelayMs = 1000;

        for (int attempt = 0; attempt < maxRetries; attempt++)
        {
            try
            {
                return await operation(_pool);
            }
            catch (SessionPoolDepletedException ex)
            {
                Console.WriteLine($"Attempt {attempt + 1} failed: {ex.Message}");
                Console.WriteLine($"Pool state - Available: {ex.AvailableClients}/{ex.TotalPoolSize}, Failed reconnections: {ex.FailedReconnections}");

                if (attempt == maxRetries - 1)
                {
                    // Last attempt failed
                    if (ex.FailedReconnections > 5)
                    {
                        // Reinitialize pool so later calls start from a fresh pool
                        await ReinitializePool();
                    }
                    throw;
                }

                // Exponential backoff: 1 s, then 2 s
                await Task.Delay(baseDelayMs * (int)Math.Pow(2, attempt));
            }
        }
        throw new InvalidOperationException("Should not reach here");
    }

    private async Task ReinitializePool()
    {
        // Serialize reinitialization so concurrent failures do not rebuild the pool in parallel
        await _reinitLock.WaitAsync();
        try
        {
            try
            {
                if (_pool != null)
                {
                    await _pool.Close();
                }
            }
            catch (Exception ex)
            {
                // A failed close should not prevent building a new pool
                Console.WriteLine($"Error closing pool: {ex.Message}");
            }

            await Task.Delay(5000); // Wait for server recovery
            await OpenPool();
        }
        finally
        {
            _reinitLock.Release();
        }
    }

    private async Task MonitorHealth(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            try
            {
                await Task.Delay(10000, token); // Check every 10 seconds

                var pool = _pool; // Snapshot, since the pool may be replaced during reinitialization
                var availableRatio = (double)pool.AvailableClients / pool.TotalPoolSize;
                if (pool.FailedReconnections > 10)
                {
                    Console.WriteLine($"CRITICAL: {pool.FailedReconnections} failed reconnections");
                }
                else if (availableRatio < 0.25)
                {
                    Console.WriteLine($"WARNING: Low available clients - {pool.AvailableClients}/{pool.TotalPoolSize}");
                }
            }
            catch (OperationCanceledException)
            {
                break; // Shutdown requested
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Health check failed: {ex.Message}");
            }
        }
    }

    public async Task Cleanup()
    {
        _shutdown.Cancel(); // Stop the health monitor
        if (_pool != null)
        {
            await _pool.Close();
        }
    }
}
```
## Summary
The SessionPool exception handling and health monitoring features provide comprehensive tools for building robust IoTDB applications:
- Use `SessionPoolDepletedException` to understand and react to pool issues
- Monitor `AvailableClients`, `TotalPoolSize`, and `FailedReconnections` metrics
- Implement appropriate recovery strategies based on failure scenarios
- Set up proactive monitoring and alerting to prevent issues
- Follow best practices for pool sizing and resource management