
Example: Running a state machine for pattern detection

This example illustrates a minimal roll-your-own event pattern detection scenario, using a simple state machine that is evaluated over the stream.

While this example is much simpler and more manual than what the CEP library supports, it illustrates the use of event processing and state management for a moderately complex scenario.

Scenario Description

Events in streams are expected to occur in certain patterns. Any deviation from these patterns indicates an anomaly that the streaming system should recognize and that should trigger an alert.

You can, for example, think of the events as being generated by network devices and services, such as firewalls, or logins and registrations with an authentication service. A deviation from the expected pattern might indicate an intrusion.

The event patterns are tracked per interacting party (here simplified per source IP address) and are validated by a state machine. The state machine's states define what possible events may occur next, and what new states these events will result in.

The following diagram depicts the state machine used in this example.

           +--<a>--> W --<b>--> Y --<e>---+
           |                    ^         |
   INITIAL-+                    |         |
           |                    |         +--> (Z) -----<g>---> TERM
           +--<c>--> X --<b>----+         |
                     |                    |
                     +--------<d>---------+
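As a minimal sketch, the diagram above can be encoded as a plain-Java transition table. The class name `StateMachineSketch` and the representation of events as single characters (`a` through `g`) are illustrative assumptions here; the actual example defines its own state and event types in the `dfa` and `event` packages.

```java
import java.util.EnumMap;
import java.util.Map;

class StateMachineSketch {
    enum State { INITIAL, W, X, Y, Z, TERM, INVALID }

    // Transition table mirroring the diagram: unknown (state, event)
    // pairs fall through to INVALID.
    static final Map<State, Map<Character, State>> TRANSITIONS = new EnumMap<>(State.class);
    static {
        TRANSITIONS.put(State.INITIAL, Map.of('a', State.W, 'c', State.X));
        TRANSITIONS.put(State.W, Map.of('b', State.Y));
        TRANSITIONS.put(State.X, Map.of('b', State.Y, 'd', State.Z));
        TRANSITIONS.put(State.Y, Map.of('e', State.Z));
        TRANSITIONS.put(State.Z, Map.of('g', State.TERM));
    }

    static State transition(State current, char event) {
        return TRANSITIONS.getOrDefault(current, Map.of())
                .getOrDefault(event, State.INVALID);
    }

    public static void main(String[] args) {
        State s = State.INITIAL;
        for (char e : "abeg".toCharArray()) {
            s = transition(s, e);
        }
        System.out.println(s);                        // valid sequence a,b,e,g reaches TERM
        System.out.println(transition(State.W, 'x')); // unexpected event -> INVALID
    }
}
```

Reaching `INVALID` is what signals an anomaly: the event was not among those the current state allows.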

Example Program

The main class of this example program is org.apache.flink.streaming.examples.statemachine.StateMachineExample. The core logic is in the flatMap function, which runs the state machines per IP address.
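The real example keeps one machine per IP address in Flink's keyed state inside the flatMap function. As a dependency-free sketch of that per-key bookkeeping, the same idea can be mimicked with a map keyed by IP; the class name `PerKeyDetector`, the `flatMap` signature, and the alert format below are illustrative assumptions, not the example's actual API.

```java
import java.util.HashMap;
import java.util.Map;

class PerKeyDetector {
    enum State { INITIAL, W, X, Y, Z, TERM, INVALID }

    // Transitions from the state machine diagram above.
    static State next(State s, char e) {
        switch (s) {
            case INITIAL: return e == 'a' ? State.W : e == 'c' ? State.X : State.INVALID;
            case W:       return e == 'b' ? State.Y : State.INVALID;
            case X:       return e == 'b' ? State.Y : e == 'd' ? State.Z : State.INVALID;
            case Y:       return e == 'e' ? State.Z : State.INVALID;
            case Z:       return e == 'g' ? State.TERM : State.INVALID;
            default:      return State.INVALID;
        }
    }

    // Per-key current state, analogous to a keyed ValueState<State> in Flink.
    private final Map<String, State> states = new HashMap<>();

    /** Feeds one (ip, event) pair; returns an alert string for invalid events, else null. */
    String flatMap(String ip, char event) {
        State cur = states.getOrDefault(ip, State.INITIAL);
        State nxt = next(cur, event);
        if (nxt == State.INVALID) {
            states.remove(ip);
            return "ALERT " + ip + ": unexpected event '" + event + "' in state " + cur;
        }
        if (nxt == State.TERM) {
            states.remove(ip); // sequence completed cleanly; drop the state
        } else {
            states.put(ip, nxt);
        }
        return null;
    }

    public static void main(String[] args) {
        PerKeyDetector detector = new PerKeyDetector();
        detector.flatMap("10.0.0.1", 'a');                    // INITIAL -> W, no alert
        System.out.println(detector.flatMap("10.0.0.1", 'd')); // 'd' is invalid in W -> alert
    }
}
```

In the Flink version, the stream is keyed by source address so that each parallel flatMap instance only sees, and only stores state for, its own partition of the keys.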

The streaming data flow is as shown below, where the source stream may come from either an embedded data generator or from a Kafka topic:

 [ stream partition 1] --> source --> partition -+---> flatMap(state machine) --> sink
                                                 \/
                                                 /\
 [ stream partition 2] --> source --> partition -+---> flatMap(state machine) --> sink