| --- |
| title: S4 0.6.0 overview |
| --- |
| |
| |
| |
| # What is S4? |
| |
| S4 is a general-purpose,near real-time, distributed, decentralized, scalable, event-driven, modular platform that allows programmers to easily implement applications for processing continuous unbounded streams of data. |
| |
| |
| S4 0.5 focused on providing a functional complete refactoring. |
| |
| S4 0.6 builds on this basis and brings plenty of exciting features, in particular: |
| |
| * **major performance improvements**: stream throughput improved by 1000 % (200k+ messages / s / stream) |
| * major [configurability](../configuration) and usability improvements, for both the S4 platform and deployed applications |
| |
| |
| # What are the cool features? |
| |
| **Flexible deployment**: |
| |
| * Application packages are standard jar files (suffixed `.s4r`) |
| * Platform modules for customizing the platform are standard jar files |
| * By default keys are homogeneously sparsed over the cluster: helps balance the load, especially for fine grained partitioning |
| |
| **Modular design**: |
| |
| * both the platform and the applications are built by dependency injection, and configured through independent modules. |
| * makes it **easy to customize** the system according to specific requirements |
| * pluggable event serving policies: **load shedding, throttling, blocking** |
| |
| **Dynamic and loose coupling of S4 applications**: |
| |
| * through a pub-sub mechanism |
| * makes it easy to: |
| * assemble subsystems into larger systems |
| * reuse applications |
| * separate pre-processing |
| * provision, control and update subsystems independently |
| |
| |
| **[Fault tolerant](../fault_tolerance)** |
| |
| |
| * **Fail-over** mechanism for high availability |
| * **Checkpointing and recovery** mechanism for minimizing state loss |
| |
| **Pure Java**: statically typed, easy to understand, to refactor, and to extend |
| |
| |
| |
| |
| # How does it work? |
| |
| ## Some definitions |
| |
| **Platform** |
| |
| * S4 provides a runtime distributed platform that handles communication, scheduling and distribution across containers. |
| * Distributed containers are called **S4 nodes** |
| * S4 nodes are deployed on **S4 clusters** |
| * S4 clusters define named ensembles of S4 nodes |
| * by default, the size of the cluster is fixed |
| * the size of an S4 cluster corresponds to the number of logical **partitions** (sometimes referred to as **tasks**) |
| |
| > an ongoing integration with [Apache Helix](http://helix.apache.org) will remove these limitations and allow a variable number of nodes and rebalancing the partitions |
| |
| **Applications** |
| |
| * Users develop applications and deploy them on S4 clusters |
| * Applications are built as a graph of: |
| * **Processing elements** (PEs) |
| * **Streams** that interconnect PEs |
| * PEs communicate asynchronously by sending **events** on streams. |
| * Events are dispatched to nodes according to their **key** |
| |
| **External streams** are a special kind of stream that: |
| |
| * send events outside of the application |
| * receive events from external sources |
| * can interconnect and assemble applications into larger systems. |
| |
| **Adapters** are S4 applications that can convert external streams into streams of S4 events. Since adapters are also S4 applications, they can be scaled easily. |
| |
| |
| |
| ## A hierarchical perspective on S4 |
| |
| The following diagram sums-up the key concepts in a hierarchical fashion: |
| |
|  |
| |
| ## Note |
| |
| S4 0.6 works exclusively with TCP |
| |