This document describes the internal architecture and design decisions of the Apache Ignite 3 client.
IgniteImpl)
IgniteClient (see TcpIgniteClient)
TcpClientChannel)
Connection management involves three primary concerns:
These concerns are handled concurrently:
When IgniteClient.builder().addresses("foo:10800", "192.168.0.1").build() is called:
Build endpoint list - For each provided address:
Establish first connection - Iterate over the endpoints and attempt to connect to each in turn until one attempt succeeds
Once the first connection is established, the client is fully functional and the IgniteClient instance is returned to the user.
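The two startup steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the client's actual code: the class and method names are hypothetical, the default port of 10800 is assumed for addresses that omit a port, and the `connector` function stands in for a real socket connect and handshake. DNS resolution and IPv6 literals are deliberately left out.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch of client startup; not the actual Ignite client code. */
final class FirstConnectionSketch {
    /** Assumed default port, applied when an address omits the port. */
    static final int DEFAULT_PORT = 10800;

    /** Parses "host" or "host:port" strings, preserving the order given by the user. */
    static List<InetSocketAddress> buildEndpoints(String... addresses) {
        List<InetSocketAddress> endpoints = new ArrayList<>();
        for (String address : addresses) {
            int colon = address.lastIndexOf(':');
            if (colon < 0) {
                endpoints.add(InetSocketAddress.createUnresolved(address, DEFAULT_PORT));
            } else {
                String host = address.substring(0, colon);
                int port = Integer.parseInt(address.substring(colon + 1));
                endpoints.add(InetSocketAddress.createUnresolved(host, port));
            }
        }
        return endpoints;
    }

    /**
     * Tries endpoints in order and returns the first connection that succeeds.
     * The connector returns an empty Optional when an endpoint is unreachable.
     */
    static <C> C connectFirst(List<InetSocketAddress> endpoints,
                              Function<InetSocketAddress, Optional<C>> connector) {
        for (InetSocketAddress endpoint : endpoints) {
            Optional<C> connection = connector.apply(endpoint);
            if (connection.isPresent()) {
                return connection.get();
            }
        }
        throw new IllegalStateException("Failed to connect to any of: " + endpoints);
    }
}
```

Because `connectFirst` returns as soon as one endpoint answers, the cost of startup is bounded by the first reachable server rather than by the size of the address list.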
Additional connections are established in the background after the initial connection is made. This provides robustness, load balancing, and partition awareness.
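One way to picture the background phase is a set of asynchronous connection attempts that never block the caller. The sketch below is an assumption-laden illustration, not the client's implementation: endpoints and connections are modeled as plain strings, `BackgroundConnectSketch` and `connectRemaining` are hypothetical names, and a failed attempt is simply skipped rather than retried.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of background connection establishment. */
final class BackgroundConnectSketch {
    /** Active connections keyed by endpoint address (illustrative structure). */
    final Map<String, String> active = new ConcurrentHashMap<>();

    /**
     * Starts an asynchronous attempt for every endpoint that has no connection yet.
     * The returned future completes once all attempts have finished, successfully or not.
     */
    CompletableFuture<Void> connectRemaining(List<String> endpoints,
                                             Function<String, String> connector) {
        CompletableFuture<?>[] attempts = endpoints.stream()
                .filter(endpoint -> !active.containsKey(endpoint))
                .map(endpoint -> CompletableFuture
                        .supplyAsync(() -> connector.apply(endpoint))
                        .thenAccept(connection -> active.put(endpoint, connection))
                        // A failed endpoint is ignored here; a real client would retry later.
                        .exceptionally(error -> null))
                .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(attempts);
    }
}
```

The caller can ignore the returned future entirely, which mirrors the idea that the client is already usable while the remaining connections come up.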
Each connection exchanges periodic heartbeats with the server:
New server addresses can be discovered through two mechanisms:
IgniteClientConfiguration.addressesFinder
Address discovery happens periodically (IgniteClientConfiguration#backgroundReResolveAddressesInterval) and on specific events (connection loss, partition assignment change).
When new addresses are discovered:
For each operation, the client selects a connection using the following algorithm:
Check operation type: Is it a single-key operation (e.g., get, put, remove)?
Round-robin selection: Pick an active connection using round-robin strategy
Fallback: If no active connections exist, attempt to establish a new connection to one of the known server endpoints. The connectTimeout and retryPolicy configuration properties control this attempt.
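The selection steps above can be sketched roughly as follows. The names are hypothetical and connections are modeled as strings. The partition-aware branch is an assumption: it supposes that for a single-key operation the caller already knows which node is believed to own the key and passes it as `preferredNode`. An empty result signals that the caller should fall back to opening a new connection.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of per-operation connection selection. */
final class ConnectionSelectorSketch {
    private final AtomicInteger counter = new AtomicInteger();

    /**
     * @param preferredNode node believed to own the key for a single-key operation, or null
     * @param byNode        connections keyed by node name
     * @param active        currently active connections
     * @return the chosen connection, or empty if none is active
     */
    Optional<String> select(String preferredNode, Map<String, String> byNode, List<String> active) {
        // Step 1: use the connection to the owning node when it is known and still active.
        if (preferredNode != null) {
            String connection = byNode.get(preferredNode);
            if (connection != null && active.contains(connection)) {
                return Optional.of(connection);
            }
        }
        // Step 2: otherwise rotate through the active connections.
        if (!active.isEmpty()) {
            int index = Math.floorMod(counter.getAndIncrement(), active.size());
            return Optional.of(active.get(index));
        }
        // Step 3: nothing is active, so the caller must establish a new connection.
        return Optional.empty();
    }
}
```

`Math.floorMod` keeps the index non-negative even after the counter overflows, so the rotation stays valid for a long-lived client.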
Endpoints are iterated in the order they were provided. This design allows users to prioritize certain addresses (e.g., prefer local servers over remote ones).
In scenarios where many client instances initialize simultaneously (e.g., application startup), the first server in the list could be flooded with connection attempts. Despite this concern, endpoint randomization was rejected for the following reasons:
The client is returned as soon as one connection is established. This minimizes startup time and allows applications to begin operations immediately.
It is fine to “miss” partition-aware routing occasionally (stale assignment, broken connection, etc.). The server will redirect the request if necessary. We optimize for the happy path where the cluster is stable and connections are healthy.