title: S4 0.6.0 overview

What is S4?

S4 is a general-purpose, near real-time, distributed, decentralized, scalable, event-driven, modular platform that allows programmers to easily implement applications for processing continuous, unbounded streams of data.

S4 0.5 focused on providing a complete, functional refactoring.

S4 0.6 builds on this basis and brings plenty of exciting features, in particular:

major performance improvements: stream throughput improved by 1000 % (200k+ messages / s / stream)
major configurability and usability improvements, for both the S4 platform and deployed applications

What are the cool features?

Flexible deployment:

Application packages are standard jar files (suffixed .s4r)
Platform modules for customizing the platform are standard jar files
By default, keys are spread homogeneously over the cluster, which helps balance the load, especially for fine-grained partitioning

Modular design:

both the platform and the applications are built by dependency injection, and configured through independent modules.
makes it easy to customize the system according to specific requirements
pluggable event serving policies: load shedding, throttling, blocking
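
To make the difference between serving policies concrete, here is a minimal sketch of load shedding versus blocking on a bounded queue. The `ServingPolicy` interface and both implementations are hypothetical names chosen for illustration; they are not the S4 API, and throttling is left out for brevity.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ServingPolicySketch {

    /** Hypothetical policy interface (an assumption, not the S4 API). */
    interface ServingPolicy<E> {
        /** Returns true if the event was accepted into the queue. */
        boolean submit(BlockingQueue<E> queue, E event) throws InterruptedException;
    }

    /** Load shedding: drop the event when the queue is full. */
    static final class LoadShedding<E> implements ServingPolicy<E> {
        @Override
        public boolean submit(BlockingQueue<E> queue, E event) {
            // offer() never blocks; false means the event was dropped.
            return queue.offer(event);
        }
    }

    /** Blocking: make the producer wait until there is room. */
    static final class Blocking<E> implements ServingPolicy<E> {
        @Override
        public boolean submit(BlockingQueue<E> queue, E event) throws InterruptedException {
            // put() blocks the caller while the queue is full.
            queue.put(event);
            return true;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        LoadShedding<String> shedding = new LoadShedding<>();
        System.out.println(shedding.submit(queue, "e1")); // true
        System.out.println(shedding.submit(queue, "e2")); // true
        System.out.println(shedding.submit(queue, "e3")); // false: queue is full, event shed
    }
}
```

Shedding favors latency at the cost of dropped events, while blocking preserves every event but pushes back-pressure onto the sender.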

Dynamic and loose coupling of S4 applications:

through a pub-sub mechanism
makes it easy to:
- assemble subsystems into larger systems
- reuse applications
- separate pre-processing
- provision, control and update subsystems independently
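
The loose coupling described above can be illustrated with a tiny in-memory publish/subscribe registry. This is only a sketch under simplifying assumptions: `PubSubSketch`, its methods, and the use of plain strings as events are hypothetical, and a real deployment would route events between separate applications over the network.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class PubSubSketch {

    // Stream name -> subscribers. Publishers only know the stream name,
    // never the applications that consume it.
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    /** Registers a subscriber for the named stream. */
    public void subscribe(String stream, Consumer<String> subscriber) {
        subscribers.computeIfAbsent(stream, k -> new ArrayList<>()).add(subscriber);
    }

    /** Delivers the event to every subscriber; returns how many received it. */
    public int publish(String stream, String event) {
        List<Consumer<String>> targets =
                subscribers.getOrDefault(stream, Collections.<Consumer<String>>emptyList());
        for (Consumer<String> target : targets) {
            target.accept(event);
        }
        return targets.size();
    }

    public static void main(String[] args) {
        PubSubSketch bus = new PubSubSketch();
        // Two independent "applications" consume the same stream.
        bus.subscribe("clicks", e -> System.out.println("counter app got " + e));
        bus.subscribe("clicks", e -> System.out.println("archive app got " + e));
        System.out.println(bus.publish("clicks", "click-1")); // 2
        System.out.println(bus.publish("views", "view-1"));   // 0: nobody subscribed
    }
}
```

Because the publisher depends only on a stream name, a consumer can be added, replaced, or removed without touching the producing application.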

Fault tolerant:

Fail-over mechanism for high availability
Checkpointing and recovery mechanism for minimizing state loss
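
As a rough illustration of why checkpointing minimizes (rather than eliminates) state loss, the sketch below snapshots the state of a counter and restores it after a simulated failure. The `CounterState` class and its serialization format are assumptions made for this example; they do not describe how S4 stores checkpoints.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class CheckpointSketch {

    /** Hypothetical per-key state, e.g. an event counter. */
    static final class CounterState {
        final String key;
        long count;

        CounterState(String key, long count) {
            this.key = key;
            this.count = count;
        }

        void onEvent() {
            count++;
        }

        /** Serializes the current state into a snapshot. */
        byte[] checkpoint() {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeUTF(key);
                out.writeLong(count);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return bytes.toByteArray();
        }

        /** Rebuilds the state from the last snapshot. */
        static CounterState recover(byte[] snapshot) {
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(snapshot))) {
                return new CounterState(in.readUTF(), in.readLong());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        CounterState state = new CounterState("user-1", 0);
        state.onEvent();
        state.onEvent();
        state.onEvent();
        byte[] snapshot = state.checkpoint(); // count == 3 is saved
        state.onEvent();
        state.onEvent();                      // these two updates are not checkpointed
        // Simulated failure: only the snapshot survives.
        CounterState recovered = CounterState.recover(snapshot);
        System.out.println(recovered.key + " -> " + recovered.count); // user-1 -> 3
    }
}
```

Only the updates made since the last snapshot are lost, so the checkpoint frequency bounds how much state a failure can cost.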

Pure Java: statically typed, easy to understand, to refactor, and to extend

How does it work?

Some definitions

Platform

S4 provides a runtime distributed platform that handles communication, scheduling and distribution across containers.
Distributed containers are called S4 nodes
S4 nodes are deployed on S4 clusters
S4 clusters define named ensembles of S4 nodes
- by default, the size of the cluster is fixed
- the size of an S4 cluster corresponds to the number of logical partitions (sometimes referred to as tasks)

An ongoing integration with Apache Helix will remove these limitations and allow a variable number of nodes as well as rebalancing of the partitions.

Applications

Users develop applications and deploy them on S4 clusters
Applications are built as a graph of:
- Processing elements (PEs)
- Streams that interconnect PEs
PEs communicate asynchronously by sending events on streams.
Events are dispatched to nodes according to their key
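
Key-based dispatch can be sketched as hashing the key and mapping the result onto a fixed number of partitions. The hash function and the `partitionFor` helper below are assumptions for illustration only; this document does not specify which hash S4 actually uses.

```java
public class KeyDispatchSketch {

    /**
     * Maps a key to a partition index in [0, partitionCount).
     * Hypothetical helper: uses String.hashCode(), which may differ from S4's hash.
     */
    static int partitionFor(String key, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        int partitions = 4; // fixed cluster size, as in the default setup described above
        String[] keys = {"user-1", "user-2", "user-3", "user-1"};
        for (String key : keys) {
            // The same key always lands on the same partition, so all events
            // for that key reach the same node.
            System.out.println(key + " -> partition " + partitionFor(key, partitions));
        }
    }
}
```

Since the mapping depends on the partition count, changing the cluster size would move keys between nodes, which is one reason the number of partitions is fixed by default.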

External streams are a special kind of stream that:

send events outside of the application
receive events from external sources
can interconnect and assemble applications into larger systems.

Adapters are S4 applications that can convert external streams into streams of S4 events. Since adapters are also S4 applications, they can be scaled easily.

A hierarchical perspective on S4

The following diagram sums up the key concepts in a hierarchical fashion: