{
"docs": [
{
"location": "/index.html",
"text": "Gearpump is a real-time big data streaming engine.\nIt is inspired by recent advances in the \nAkka\n framework and a desire to improve on existing streaming frameworks.\nGearpump is event/message based and featured as low latency handling, high performance, exactly once semantics,\ndynamic topology update, \nApache Storm\n compatibility, etc.\n\n\nThe name Gearpump is a reference to the engineering term \"gear pump,\" which is a super simple\npump that consists of only two gears, but is very powerful at streaming water.\n\n\nGearpump Technical Highlights\n\n\nGearpump's feature set includes:\n\n\n\n\nExtremely high performance\n\n\nLow latency\n\n\nConfigurable message delivery guarantee (at least once, exactly once).\n\n\nHighly extensible\n\n\nDynamic DAG\n\n\nStorm compatibility\n\n\nSamoa compatibility\n\n\nBoth high level and low level API\n\n\n\n\nGearpump Performance\n\n\nPer initial benchmarks we are able to process 18 million messages/second (100 bytes per message) with a 8ms latency on a 4-node cluster.\n\n\n\n\nGearpump and Akka\n\n\nGearpump is a 100% Akka based platform. We model big data streaming within the Akka actor hierarchy.",
"title": "Overview"
},
{
"location": "/introduction/submit-your-1st-application/index.html",
"text": "Before you can submit and run your first Gearpump application, you will need a running Gearpump service.\nThere are multiple ways to run Gearpump \nLocal mode\n, \nStandalone mode\n, \nYARN mode\n or \nDocker mode\n.\n\n\nThe easiest way is to run Gearpump in \nLocal mode\n.\nAny Linux, MacOSX or Windows desktop can be used with zero configuration.\n\n\nIn the example below, we assume your are running in \nLocal mode\n.\nIf you running Gearpump in one of the other modes, you will need to configure the Gearpump client to\nconnect to the Gearpump service by setting the \ngear.conf\n configuration path in classpath.\nWithin this file, you will need to change the parameter \ngearpump.cluster.masters\n to the correct Gearpump master(s).\nSee \nConfiguration\n for details.\n\n\nSteps to submit your first Application\n\n\nStep 1: Submit application\n\n\nAfter the cluster is started, you can submit an example wordcount application to the cluster\n\n\nOpen another shell,\n\n\n### To run WordCount example\nbin/gear app -jar examples/wordcount-2.11-0.8.3-assembly.jar org.apache.gearpump.streaming.examples.wordcount.WordCount\n\n\n\n\nStep 2: Congratulations, you've submitted your first application.\n\n\nTo view the application status and metrics, start the Web UI services, and browse to \nhttp://127.0.0.1:8090\n to check the status.\nThe default username and password is \"admin:admin\", you can check\n\nUI Authentication\n to find how to manage users.\n\n\n\n\nNOTE:\n the UI port setting can be defined in configuration, please check section \nConfiguration\n.\n\n\nA quick Look at the Web UI\n\n\nTBD\n\n\nOther Application Examples\n\n\nBesides wordcount, there are several other example applications. Please check the source tree examples/ for detail information.",
"title": "Submit Your 1st Application"
},
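The following is a minimal sketch of what the submitted WordCount application looks like as code, modeled on the wordcount example in the Gearpump 0.8 source tree (examples/). The `Graph`/`StreamApplication` DSL and imports follow that example; treat the exact constructor and method signatures as assumptions to verify against your Gearpump version.

```scala
// A minimal sketch of the WordCount application submitted above, modeled on
// org.apache.gearpump.streaming.examples.wordcount in the source tree.
// API names follow Gearpump 0.8 and should be verified against your version.
import org.apache.gearpump.cluster.UserConfig
import org.apache.gearpump.cluster.client.ClientContext
import org.apache.gearpump.partitioner.HashPartitioner
import org.apache.gearpump.streaming.examples.wordcount.{Split, Sum}
import org.apache.gearpump.streaming.{Processor, StreamApplication}
import org.apache.gearpump.util.Graph
import org.apache.gearpump.util.Graph.Node  // enables the ~ and ~> edge DSL

object MyWordCount {
  def main(args: Array[String]): Unit = {
    val context = ClientContext()    // reads gearpump.cluster.masters from gear.conf
    val split = Processor[Split](1)  // one Split task
    val sum = Processor[Sum](1)      // one Sum task
    val app = StreamApplication("wordCount",
      Graph(split ~ new HashPartitioner ~> sum), UserConfig.empty)
    context.submit(app)
    context.close()
  }
}
```

Packaged into an assembly jar, this is exactly the kind of main class you would pass to `bin/gear app -jar ...` as shown above.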
{
"location": "/introduction/submit-your-1st-application/index.html#steps-to-submit-your-first-application",
"text": "Step 1: Submit application After the cluster is started, you can submit an example wordcount application to the cluster Open another shell, ### To run WordCount example\nbin/gear app -jar examples/wordcount-2.11-0.8.3-assembly.jar org.apache.gearpump.streaming.examples.wordcount.WordCount Step 2: Congratulations, you've submitted your first application. To view the application status and metrics, start the Web UI services, and browse to http://127.0.0.1:8090 to check the status.\nThe default username and password is \"admin:admin\", you can check UI Authentication to find how to manage users. NOTE: the UI port setting can be defined in configuration, please check section Configuration .",
"title": "Steps to submit your first Application"
},
{
"location": "/introduction/submit-your-1st-application/index.html#a-quick-look-at-the-web-ui",
"text": "TBD",
"title": "A quick Look at the Web UI"
},
{
"location": "/introduction/submit-your-1st-application/index.html#other-application-examples",
"text": "Besides wordcount, there are several other example applications. Please check the source tree examples/ for detail information.",
"title": "Other Application Examples"
},
{
"location": "/introduction/commandline/index.html",
"text": "The commands can be found at: \"bin\" folder of Gearpump binary.\n\n\nNOTE:\n on MS Windows platform, please use window shell gear.bat script instead. bash script doesn't work well in cygwin/mingw.\n\n\nCreating an uber-jar\n\n\nIf you use Maven you can have a look \nhere\n whereas SBT users may find \nthis\n useful.\n\n\nSubmit an new application\n\n\nYou can use the command \ngear\n under the bin directory to submit, query and terminate an application:\n\n\ngear app [-namePrefix <application name prefix>] [-executors <number of executors to launch>] [-conf <custom gearpump config file>] -jar xx.jar MainClass <arg1> <arg2> ...\n\n\n\n\nList all running applications\n\n\nTo list all running applications:\n\n\ngear info [-conf <custom gearpump config file>]\n\n\n\n\nKill a running application\n\n\nTo kill an application:\n\n\ngear kill -appid <application id> [-conf <custom gearpump config file>]\n\n\n\n\nSubmit a storm application to Gearpump Cluster\n\n\nFor example, to submit a storm application jar:\n\n\nstorm -verbose -config storm.yaml -jar storm-starter-${STORM_VERSION}.jar storm.starter.ExclamationTopology exclamation\n\n\n\n\nStorm Compatibility Guide\n\n\nStart Gearpump Cluster on YARN\n\n\nTo start a Gearpump Cluster on YARN, you can:\n\n\nyarnclient launch -package /usr/lib/gearpump/gearpump-2.11-0.8.3.zip\n\n\n\n\n/usr/lib/gearpump/gearpump-2.11-0.8.3.zip\n should be available on HDFS.\n\n\nPlease check \nYARN Deployment Guide\n for more information.\n\n\nStart a local cluster\n\n\nMasters and workers will be started in one machine:\n\n\nlocal\n\n\n\n\nCheck \nDeployment Guide for Local Cluster\n for more information.\n\n\nStart master daemons\n\n\nmaster -ip <Ip address> -port <port where this master is hooking>\n\n\n\n\nPlease check \nDeployment for Standalone mode\n for more information.\n\n\nStart worker daemons\n\n\nworker\n\n\n\n\nPlease check \nDeployment for Standalone mode\n for more information.\n\n\nStart UI server\n\n\nTo start UI server, you can:\n\n\nservices [-master <host:port>]\n\n\n\n\nThe default username and password is \"admin:admin\", you can check\n\nUI Authentication\n to find how to manage users.",
"title": "Client Command Line"
},
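To complement the Maven/SBT links in the uber-jar section above, here is a minimal build.sbt sketch using the sbt-assembly plugin. The Gearpump artifact coordinates and versions below are assumptions; match them to the cluster version shown in the commands above (0.8.3, Scala 2.11).

```scala
// build.sbt — minimal sketch for producing an uber-jar with sbt-assembly.
// Add to project/plugins.sbt:  addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
name := "my-gearpump-app"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // "provided" keeps Gearpump's own classes out of the uber-jar; the cluster
  // supplies them at runtime when you submit with `gear app -jar ...`.
  "org.apache.gearpump" %% "gearpump-core" % "0.8.3" % "provided",
  "org.apache.gearpump" %% "gearpump-streaming" % "0.8.3" % "provided"
)
```

Running `sbt assembly` then produces the jar you pass to `gear app -jar`.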
{
"location": "/introduction/basic-concepts/index.html",
"text": "System timestamp and Application timestamp\n\n\nSystem timestamp denotes the time of backend cluster system. Application timestamp denotes the time at which message is generated. For example, for IoT edge device, the timestamp at which field sensor device creates a message is type of application timestamp, while the timestamp at which that message get received by the backend is type of system time.\n\n\nMaster, and Worker\n\n\nGearpump follow master slave architecture. Every cluster contains one or more Master node, and several worker nodes. Worker node is responsible to manage local resources on single machine, and Master node is responsible to manage global resources of the whole cluster.\n\n\n\n\nApplication\n\n\nApplication is what we want to parallel and run on the cluster. There are different application types, for example MapReduce application and streaming application are different application types. Gearpump natively supports Streaming Application types, it also contains several templates to help user to create custom application types, like distributedShell.\n\n\nAppMaster and Executor\n\n\nIn runtime, every application instance is represented by a single AppMaster and a list of Executors. AppMaster represents the command and controls center of the Application instance. It communicates with user, master, worker, and executor to get the job done. Each executor is a parallel unit for distributed application. Typically AppMaster and Executor will be started as JVM processes on worker nodes.\n\n\nApplication Submission Flow\n\n\nWhen user submits an application to Master, Master will first find an available worker to start the AppMaster. After AppMaster is started, AppMaster will request Master for more resources (worker) to start executors. The Executor now is only an empty container. After the executors are started, the AppMaster will then distribute real computation tasks to the executor and run them in parallel way.\n\n\nTo submit an application, a Gearpump client specifies a computation defined within a DAG and submits this to an active master. The SubmitApplication message is sent to the Master who then forwards this to an AppManager.\n\n\n\nFigure: User Submit Application\n\n\nThe AppManager locates an available worker and launches an AppMaster in a sub-process JVM of the worker. The AppMaster will then negotiate with the Master for Resource allocation in order to distribute the DAG as defined within the Application. The allocated workers will then launch Executors (new JVMs).\n\n\n\nFigure: Launch Executors and Tasks\n\n\nStreaming Topology, Processor, and Task\n\n\nFor streaming application type, each application contains a topology, which is a DAG (directed acyclic graph) to describe the data flow. Each node in the DAG is a processor. For example, for word count it contains two processors, Split and Sum. The Split processor splits a line to a list of words, and then the Sum processor summarize the frequency of each word.\nAn application is a DAG of processors. Each processor handles messages.\n\n\n\nFigure: Processor DAG\n\n\nStreaming Task and Partitioner\n\n\nFor streaming application type, Task is the minimum unit of parallelism. In runtime, each Processor is paralleled to a list of tasks, with different tasks running in different executor. You can define Partitioner to denote the data shuffling rule between upstream processor tasks and downstream processor tasks.\n\n\n\nFigure: Task Data Shuffling",
"title": "Basic Concepts"
},
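As referenced above, here is a sketch of what the Split processor looks like at the task level, illustrating how a task consumes one message and shuffles its output downstream. The task API names follow the Gearpump 0.8 low-level API and are assumptions to verify against your version.

```scala
// Sketch of the Split processor's task: one line in, many word messages out.
// The edge's Partitioner (e.g. HashPartitioner) decides which downstream
// Sum task receives each word.
import org.apache.gearpump.Message
import org.apache.gearpump.cluster.UserConfig
import org.apache.gearpump.streaming.task.{Task, TaskContext}

class Split(taskContext: TaskContext, conf: UserConfig)
  extends Task(taskContext, conf) {

  override def onNext(msg: Message): Unit = {
    // Keep the original timestamp so downstream windowing and clock
    // tracking still see the application time of the source line.
    msg.msg.asInstanceOf[String].split("\\s+").foreach { word =>
      taskContext.output(Message(word, msg.timestamp))
    }
  }
}
```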
{
"location": "/introduction/features/index.html",
"text": "Technical highlights of Gearpump\n\n\nGearpump is a high performance, flexible, fault-tolerant, and responsive streaming platform with a lot of nice features, its technical highlights include:\n\n\nActors everywhere\n\n\nThe Actor model is a concurrency model proposed by Carl Hewitt at 1973. The Actor model is like a micro-service which is cohesive in the inside and isolated from other outside actors. Actors are the cornerstone of Gearpump, they provide facilities to do message passing, error handling, liveliness monitoring. Gearpump uses Actors everywhere; every entity within the cluster that can be treated as a service.\n\n\n\n\nExactly once Message Processing\n\n\nExactly once is defined as: the effect of a message will be calculated only once in the persisted state and computation errors in the history will not be propagated to future computations.\n\n\n\n\nTopology DAG DSL\n\n\nUser can submit to Gearpump a computation DAG, which contains a list of nodes and edges, and each node can be parallelized to a set of tasks. Gearpump will then schedule and distribute different tasks in the DAG to different machines automatically. Each task will be started as an actor, which is long running micro-service.\n\n\n\n\nFlow control\n\n\nGearpump has built-in support for flow control. For all message passing between different tasks, the framework will assure the upstream tasks will not flood the downstream tasks.\n\n\n\nNo inherent end to end latency\n\n\nGearpump is a message level streaming engine, which means every task in the DAG will process messages immediately upon receiving, and deliver messages to downstream immediately without waiting. Gearpump doesn't do batching when data sourcing.\n\n\nHigh Performance message passing\n\n\nBy implementing smart batching strategies, Gearpump is extremely effective in transferring small messages. In one test of 4 machines, the whole cluster throughput can reach 18 million messages per second, with message size of 100 bytes.\n\n\n\nHigh availability, No single point of failure\n\n\nGearpump has a careful design for high availability. We have considered message loss, worker machine crash, application crash, master crash, brain-split, and have made sure Gearpump recovers when these errors may occur. When there is message loss, the lost message will be replayed; when there is a worker machine crash or application crash, the related computation tasks will be rescheduled on new machines. For master high availability, several master nodes will form a Akka cluster, and CRDTs (conflict free data types) are used to exchange the state, so as long as there is still a quorum, the master will stay functional. When one master node fails, other master nodes in the cluster will take over and state will be recovered.\n\n\n\n\nDynamic Computation DAG\n\n\nGearpump provides a feature which allows the user to dynamically add, remove, or replace a sub graph at runtime, without the need to restart the whole computation topology.\n\n\n\n\nAble to handle out of order messages\n\n\nFor a window operation like moving average on a sliding window, it is important to make sure we have received all messages in that time window so that we can get an accurate result, but how do we handle stranglers or late arriving messages? 
Gearpump solves this problem by tracking the low watermark of timestamp of all messages, so it knows whether we've received all the messages in the time window or not.\n\n\n\n\nCustomizable platform\n\n\nDifferent applications have different requirements related to performance metrics, some may want higher throughput, some may require strong eventual data consistency; and different applications have different resource requirements profiles, some may demand high CPU performance, some may require data locality. Gearpump meets these requirements by allowing the user to arbitrate between different performance metrics and define customized resource scheduling strategies.\n\n\nBuilt-in Dashboard UI\n\n\nGearpump has a built-in dashboard UI to manage the cluster and visualize the applications. The UI uses REST calls to connect with backend, so it is easy to embed the UI within other dashboards.\n\n\n\n\nData connectors for Kafka and HDFS\n\n\nGearpump has built-in data connectors for Kafka and HDFS. For the Kafka connector, we support message replay from a specified timestamp.",
"title": "Technical Highlights"
},
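The out-of-order handling described above relies on a low watermark over message timestamps. The following is an illustrative, self-contained sketch (plain Scala, not Gearpump internals) of how a low watermark determines when a window result is safe to emit.

```scala
// Illustrative only: a window [start, end) can be trusted once the minimum
// clock across all tasks has passed `end` — every message with a timestamp
// below the low watermark has already been processed.
final case class TaskClock(taskId: Int, minPendingTimestamp: Long)

object Watermark extends App {
  def lowWatermark(clocks: Seq[TaskClock]): Long =
    clocks.map(_.minPendingTimestamp).min

  def windowClosed(windowEnd: Long, clocks: Seq[TaskClock]): Boolean =
    lowWatermark(clocks) >= windowEnd  // stragglers can no longer arrive

  val clocks = Seq(TaskClock(0, 1200L), TaskClock(1, 950L), TaskClock(2, 1500L))
  println(windowClosed(900L, clocks))   // true:  watermark 950 >= 900
  println(windowClosed(1000L, clocks))  // false: task 1 may still hold older messages
}
```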
{
"location": "/introduction/message-delivery/index.html",
"text": "What is At Least Once Message Delivery?\n\n\nMessages could be lost on delivery due to network partitions. \nAt Least Once Message Delivery\n (at least once) means the lost messages are delivered one or more times such that at least one is processed and acknowledged by the whole flow. \n\n\nGearpump guarantees at least once for any source that is able to replay message from a past timestamp. In Gearpump, each message is tagged with a timestamp, and the system tracks the minimum timestamp of all pending messages (the global minimum clock). On message loss, application will be restarted to the global minimum clock. Since the source is able to replay from the global minimum clock, all pending messages before the restart will be replayed. Gearpump calls that kind of source \nTimeReplayableSource\n and already provides a built in\n\nKafkaSource\n. With the KafkaSource to ingest data into Gearpump, users are guaranteed at least once message delivery.\n\n\nWhat is Exactly Once Message Delivery?\n\n\nAt least once delivery doesn't guarantee the correctness of the application result. For instance, for a task keeping the count of received messages, there could be overcount with duplicated messages and the count is lost on task failure.\n In that case, \nExactly Once Message Delivery\n (exactly once) is required, where state is updated by a message exactly once. This further requires that duplicated messages are filtered out and in-memory states are persisted.\n\n\nUsers are guaranteed exactly once in Gearpump if they use both a \nTimeReplayableSource\n to ingest data and the Persistent API to manage their in memory states. With the Persistent API, user state is periodically checkpointed by the system to a persistent store (e.g HDFS) along with its checkpointed time. Gearpump tracks the global minimum checkpoint timestamp of all pending states (global minimum checkpoint clock), which is persisted as well. On application restart, the system restores states at the global minimum checkpoint clock and source replays messages from that clock. This ensures that a message updates all states exactly once.\n\n\nPersistent API\n\n\nPersistent API consists of \nPersistentTask\n and \nPersistentState\n.\n\n\nHere is an example of using them to keep count of incoming messages.\n\n\nclass CountProcessor(taskContext: TaskContext, conf: UserConfig)\n extends PersistentTask[Long](taskContext, conf) {\n\n override def persistentState: PersistentState[Long] = {\n import com.twitter.algebird.Monoid.longMonoid\n new NonWindowState[Long](new AlgebirdMonoid(longMonoid), new ChillSerializer[Long])\n }\n\n override def processMessage(state: PersistentState[Long], message: Message): Unit = {\n state.update(message.timestamp, 1L)\n }\n}\n\n\n\n\nThe \nCountProcessor\n creates a customized \nPersistentState\n which will be managed by \nPersistentTask\n and overrides the \nprocessMessage\n method to define how the state is updated on a new message (each new message counts as \n1\n, which is added to the existing value)\n\n\nGearpump has already offered two types of states\n\n\n\n\nNonWindowState - state with no time or other boundary\n\n\nWindowState - each state is bounded by a time window\n\n\n\n\nThey are intended for states that satisfy monoid laws.\n\n\n\n\nhas binary associative operation, like \n+\n \n\n\nhas an identity element, like \n0\n\n\n\n\nIn the above example, we make use of the \nlongMonoid\n from \nTwitter's Algebird\n library which provides a bunch of useful monoids.",
"title": "Reliable Message Delivery"
},
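Since the state types above are required to satisfy the monoid laws, here is a small, plain-Scala illustration (no Gearpump or Algebird dependency) of the two laws for the Long addition monoid used by CountProcessor.

```scala
// Illustrative check of the monoid laws named above, for (Long, +, 0).
object MonoidLaws extends App {
  val zero = 0L                             // identity element
  def plus(a: Long, b: Long): Long = a + b  // binary associative operation

  // associativity: (a + b) + c == a + (b + c)
  assert(plus(plus(1L, 2L), 3L) == plus(1L, plus(2L, 3L)))
  // identity: 0 + a == a + 0 == a
  assert(plus(zero, 42L) == 42L && plus(42L, zero) == 42L)
  println("monoid laws hold for (Long, +, 0)")
}
```

These laws are what let checkpointed partial states be merged deterministically, regardless of how messages were grouped.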
{
"location": "/introduction/message-delivery/index.html#what-is-at-least-once-message-delivery",
"text": "Messages could be lost on delivery due to network partitions. At Least Once Message Delivery (at least once) means the lost messages are delivered one or more times such that at least one is processed and acknowledged by the whole flow. Gearpump guarantees at least once for any source that is able to replay message from a past timestamp. In Gearpump, each message is tagged with a timestamp, and the system tracks the minimum timestamp of all pending messages (the global minimum clock). On message loss, application will be restarted to the global minimum clock. Since the source is able to replay from the global minimum clock, all pending messages before the restart will be replayed. Gearpump calls that kind of source TimeReplayableSource and already provides a built in KafkaSource . With the KafkaSource to ingest data into Gearpump, users are guaranteed at least once message delivery.",
"title": "What is At Least Once Message Delivery?"
},
{
"location": "/introduction/message-delivery/index.html#what-is-exactly-once-message-delivery",
"text": "At least once delivery doesn't guarantee the correctness of the application result. For instance, for a task keeping the count of received messages, there could be overcount with duplicated messages and the count is lost on task failure.\n In that case, Exactly Once Message Delivery (exactly once) is required, where state is updated by a message exactly once. This further requires that duplicated messages are filtered out and in-memory states are persisted. Users are guaranteed exactly once in Gearpump if they use both a TimeReplayableSource to ingest data and the Persistent API to manage their in memory states. With the Persistent API, user state is periodically checkpointed by the system to a persistent store (e.g HDFS) along with its checkpointed time. Gearpump tracks the global minimum checkpoint timestamp of all pending states (global minimum checkpoint clock), which is persisted as well. On application restart, the system restores states at the global minimum checkpoint clock and source replays messages from that clock. This ensures that a message updates all states exactly once. Persistent API Persistent API consists of PersistentTask and PersistentState . Here is an example of using them to keep count of incoming messages. class CountProcessor(taskContext: TaskContext, conf: UserConfig)\n extends PersistentTask[Long](taskContext, conf) {\n\n override def persistentState: PersistentState[Long] = {\n import com.twitter.algebird.Monoid.longMonoid\n new NonWindowState[Long](new AlgebirdMonoid(longMonoid), new ChillSerializer[Long])\n }\n\n override def processMessage(state: PersistentState[Long], message: Message): Unit = {\n state.update(message.timestamp, 1L)\n }\n} The CountProcessor creates a customized PersistentState which will be managed by PersistentTask and overrides the processMessage method to define how the state is updated on a new message (each new message counts as 1 , which is added to the existing value) Gearpump has already offered two types of states NonWindowState - state with no time or other boundary WindowState - each state is bounded by a time window They are intended for states that satisfy monoid laws. has binary associative operation, like + has an identity element, like 0 In the above example, we make use of the longMonoid from Twitter's Algebird library which provides a bunch of useful monoids.",
"title": "What is Exactly Once Message Delivery?"
},
{
"location": "/introduction/performance-report/index.html",
"text": "Performance Evaluation\n\n\nTo illustrate the performance of Gearpump, we mainly focused on two aspects, throughput and latency, using a micro benchmark called SOL (an example in the Gearpump package) whose topology is quite simple. SOLStreamProducer delivers messages to SOLStreamProcessor constantly and SOLStreamProcessor does nothing. We set up a 4-nodes cluster with 10GbE network and each node's hardware is briefly shown as follows:\n\n\nProcessor: 32 core Intel(R) Xeon(R) CPU E5-2690 2.90GHz\nMemory: 64GB\n\n\nThroughput\n\n\nWe tried to explore the upper bound of the throughput, after launching 48 SOLStreamProducer and 48 SOLStreamProcessor the Figure below shows that the whole throughput of the cluster can reach about 18 million messages/second(100 bytes per message)\n\n\nLatency\n\n\nWhen we transfer message at the max throughput above, the average latency between two tasks is 8ms.\n\n\nFault Recovery time\n\n\nWhen the corruption is detected, for example the Executor is down, Gearpump will reallocate the resource and restart the application. It takes about 10 seconds to recover the application.\n\n\n\n\nHow to setup the benchmark environment?\n\n\nPrepare the env\n\n\n1). Set up a 4-nodes Gearpump cluster with 10GbE network which have 4 Workers on each node. In our test environment, each node has 64GB memory and Intel(R) Xeon(R) 32-core processor E5-2690 2.90GHz. Make sure the metrics is enabled in Gearpump.\n\n\n2). Submit a SOL application with 48 StreamProducers and 48 StreamProcessors:\n\n\nbin/gear app -jar ./examples/sol-2.11-0.8.3-assembly.jar -streamProducer 48 -streamProcessor 48\n\n\n\n\n3). Launch Gearpump's dashboard and browser http://$HOST:8090/, switch to the Applications tab and you can see the detail information of your application. The HOST should be the node runs dashboard.",
"title": "Performance"
},
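As a sanity check on the numbers above (illustrative arithmetic only): 18 million 100-byte messages per second corresponds to roughly 14.4 Gbit/s of aggregate payload traffic, which fits within a 4-node cluster of 10GbE links.

```scala
// Back-of-the-envelope check of the benchmark figures quoted above.
object ThroughputCheck extends App {
  val messagesPerSecond = 18e6   // 18 million msg/s across the cluster
  val bytesPerMessage = 100.0    // benchmark message size
  val gbitPerSecond = messagesPerSecond * bytesPerMessage * 8 / 1e9
  println(f"aggregate payload = $gbitPerSecond%.1f Gbit/s")  // 14.4 Gbit/s
}
```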
{
"location": "/introduction/performance-report/index.html#performance-evaluation",
"text": "To illustrate the performance of Gearpump, we mainly focused on two aspects, throughput and latency, using a micro benchmark called SOL (an example in the Gearpump package) whose topology is quite simple. SOLStreamProducer delivers messages to SOLStreamProcessor constantly and SOLStreamProcessor does nothing. We set up a 4-nodes cluster with 10GbE network and each node's hardware is briefly shown as follows: Processor: 32 core Intel(R) Xeon(R) CPU E5-2690 2.90GHz\nMemory: 64GB",
"title": "Performance Evaluation"
},
{
"location": "/introduction/performance-report/index.html#throughput",
"text": "We tried to explore the upper bound of the throughput, after launching 48 SOLStreamProducer and 48 SOLStreamProcessor the Figure below shows that the whole throughput of the cluster can reach about 18 million messages/second(100 bytes per message)",
"title": "Throughput"
},
{
"location": "/introduction/performance-report/index.html#latency",
"text": "When we transfer message at the max throughput above, the average latency between two tasks is 8ms.",
"title": "Latency"
},
{
"location": "/introduction/performance-report/index.html#fault-recovery-time",
"text": "When the corruption is detected, for example the Executor is down, Gearpump will reallocate the resource and restart the application. It takes about 10 seconds to recover the application.",
"title": "Fault Recovery time"
},
{
"location": "/introduction/performance-report/index.html#how-to-setup-the-benchmark-environment",
"text": "Prepare the env 1). Set up a 4-nodes Gearpump cluster with 10GbE network which have 4 Workers on each node. In our test environment, each node has 64GB memory and Intel(R) Xeon(R) 32-core processor E5-2690 2.90GHz. Make sure the metrics is enabled in Gearpump. 2). Submit a SOL application with 48 StreamProducers and 48 StreamProcessors: bin/gear app -jar ./examples/sol-2.11-0.8.3-assembly.jar -streamProducer 48 -streamProcessor 48 3). Launch Gearpump's dashboard and browser http://$HOST:8090/, switch to the Applications tab and you can see the detail information of your application. The HOST should be the node runs dashboard.",
"title": "How to setup the benchmark environment?"
},
{
"location": "/introduction/gearpump-internals/index.html",
"text": "Actor Hierarchy?\n\n\n\n\nEverything in the diagram is an actor; they fall into two categories, Cluster Actors and Application Actors.\n\n\nCluster Actors\n\n\nWorker\n: Maps to a physical worker machine. It is responsible for managing resources and report metrics on that machine.\n\n\nMaster\n: Heart of the cluster, which manages workers, resources, and applications. The main function is delegated to three child actors, App Manager, Worker Manager, and Resource Scheduler.\n\n\nApplication Actors:\n\n\nAppMaster\n: Responsible to schedule the tasks to workers and manage the state of the application. Different applications have different AppMaster instances and are isolated.\n\n\nExecutor\n: Child of AppMaster, represents a JVM process. Its job is to manage the life cycle of tasks and recover the tasks in case of failure.\n\n\nTask\n: Child of Executor, does the real job. Every task actor has a global unique address. One task actor can send data to any other task actors. This gives us great flexibility of how the computation DAG is distributed.\n\n\nAll actors in the graph are weaved together with actor supervision, and actor watching and every error is handled properly via supervisors. In a master, a risky job is isolated and delegated to child actors, so it's more robust. In the application, an extra intermediate layer \"Executor\" is created so that we can do fine-grained and fast recovery in case of task failure. A master watches the lifecycle of AppMaster and worker to handle the failures, but the life cycle of Worker and AppMaster are not bound to a Master Actor by supervision, so that Master node can fail independently. Several Master Actors form an Akka cluster, the Master state is exchanged using the Gossip protocol in a conflict-free consistent way so that there is no single point of failure. With this hierarchy design, we are able to achieve high availability.\n\n\nApplication Clock and Global Clock Service\n\n\nGlobal clock service will track the minimum time stamp of all pending messages in the system. Every task will update its own minimum-clock to global clock service; the minimum-clock of task is decided by the minimum of:\n\n\n\n\nMinimum time stamp of all pending messages in the inbox.\n\n\nMinimum time stamp of all un-acked outgoing messages. When there is message loss, the minimum clock will not advance.\n\n\nMinimum clock of all task states. If the state is accumulated by a lot of input messages, then the clock value is decided by the oldest message's timestamp. The state clock will advance by doing snapshots to persistent storage or by fading out the effect of old messages.\n\n\n\n\n\n\nThe global clock service will keep track of all task minimum clocks effectively and maintain a global view of minimum clock. The global minimum clock value is monotonically increasing; it means that all source messages before this clock value have been processed. If there is message loss or task crash, the global minimum clock will stop.\n\n\nHow do we optimize the message passing performance?\n\n\nFor streaming application, message passing performance is extremely important. For example, one streaming platform may need to process millions of messages per second with millisecond level latency. High throughput and low latency is not that easy to achieve. There are a number of challenges:\n\n\nFirst Challenge: Network is not efficient for small messages\n\n\nIn streaming, typical message size is very small, usually less than 100 bytes per message, like the floating car GPS data. 
But network efficiency is very bad when transferring small messages. As you can see in below diagram, when message size is 50 bytes, it can only use 20% bandwidth. How to improve the throughput?\n\n\n\n\nSecond Challenge: Message overhead is too big\n\n\nFor each message sent between two actors, it contains sender and receiver actor path. When sending over the wire, the overhead of this ActorPath is not trivial. For example, the below actor path takes more than 200 bytes.\n\n\nakka.tcp://system1@192.168.1.53:51582/remote/akka.tcp/2120193a-e10b-474e-bccb-8ebc4b3a0247@192.168.1.53:48948/remote/akka.tcp/system2@192.168.1.54:43676/user/master/Worker1/app_0_executor_0/group_1_task_0#-768886794\n\n\n\n\nHow do we solve this?\n\n\nWe implement a custom Netty transportation layer with Akka extension. In the below diagram, Netty Client will translate ActorPath to TaskId, and Netty Server will translate it back. Only TaskId will be passed on wire, it is only about 10 bytes, the overhead is minimized. Different Netty Client Actors are isolated; they will not block each other.\n\n\n\n\nFor performance, effective batching is really the key! We group multiple messages to a single batch and send it on the wire. The batch size is not fixed; it is adjusted dynamically based on network status. If the network is available, we will flush pending messages immediately without waiting; otherwise we will put the message in a batch and trigger a timer to flush the batch later.\n\n\nHow do we do flow Control?\n\n\nWithout flow control, one task can easily flood another task with too many messages, causing out of memory error. Typical flow control will use a TCP-like sliding window, so that source and target can run concurrently without blocking each other.\n\n\n\nFigure: Flow control, each task is \"star\" connected to input tasks and output tasks\n\n\nThe difficult part for our problem is that each task can have multiple input tasks and output tasks. The input and output must be geared together so that the back pressure can be properly propagated from downstream to upstream. The flow control also needs to consider failures, and it needs to be able to recover when there is message loss.\nAnother challenge is that the overhead of flow control messages can be big. If we ack every message, there will be huge amount of acked messages in the system, degrading streaming performance. The approach we adopted is to use explicit AckRequest message. The target tasks will only ack back when they receive the AckRequest message, and the source will only send AckRequest when it feels necessary. With this approach, we can largely reduce the overhead.\n\n\nHow do we detect message loss?\n\n\nFor example, for web ads, we may charge for every click, we don't want to miscount. The streaming platform needs to effectively track what messages have been lost, and recover as fast as possible.\n\n\n\nFigure: Message Loss Detection\n\n\nWe use the flow control message AckRequest and Ack to detect message loss. The target task will count how many messages has been received since last AckRequest, and ack the count back to source task. The source task will check the count and find message loss.\nThis is just an illustration, the real case is more difficulty, we need to handle zombie tasks, and in-the-fly stale messages.\n\n\nHow Gearpump know what messages to replay?\n\n\nIn some applications, a message cannot be lost, and must be replayed. For example, during the money transfer, the bank will SMS us the verification code. 
If that message is lost, the system must replay it so that money transfer can continue. We made the decision to use \nsource end message storage\n and \ntime stamp based replay\n.\n\n\n\nFigure: Replay with Source End Message Store\n\n\nEvery message is immutable, and tagged with a timestamp. We have an assumption that the timestamp is approximately incremental (allow small ratio message disorder).\n\n\nWe assume the message is coming from a replay-able source, like Kafka queue; otherwise the message will be stored at customizable source end \"message store\". When the source task sends the message downstream, the timestamp and offset of the message is also check-pointed to offset-timestamp storage periodically. During recovery, the system will first retrieve the right time stamp and offset from the offset-timestamp storage, then it will replay the message store from that time stamp and offset. A Timestamp Filter will filter out old messages in case the message in message store is not strictly time-ordered.\n\n\nMaster High Availability\n\n\nIn a distributed streaming system, any part can fail. The system must stay responsive and do recovery in case of errors.\n\n\n\nFigure: Master High Availability\n\n\nWe use Akka clustering to implement the Master high availability. The cluster consists of several master nodes, but no worker nodes. With clustering facilities, we can easily detect and handle the failure of master node crash. The master state is replicated on all master nodes with the Typesafe akka-data-replication library, when one master node crashes, another standby master will read the master state and take over. The master state contains the submission data of all applications. If one application dies, a master can use that state to recover that application. CRDT LwwMap is used to represent the state; it is a hash map that can converge on distributed nodes without conflict. To have strong data consistency, the state read and write must happen on a quorum of master nodes.\n\n\nHow we do handle failures?\n\n\nWith Akka's powerful actor supervision, we can implement a resilient system relatively easy. In Gearpump, different applications have a different AppMaster instance, they are totally isolated from each other. For each application, there is a supervision tree, AppMaster->Executor->Task. With this supervision hierarchy, we can free ourselves from the headache of zombie process, for example if AppMaster is down, Akka supervisor will ensure the whole tree is shutting down.\n\n\nThere are multiple possible failure scenarios\n\n\n\nFigure: Possible Failure Scenarios and Error Supervision Hierarchy\n\n\nWhat happens when the Master crashes?\n\n\nIn case of a master crash, other standby masters will be notified, they will resume the master state, and take over control. Worker and AppMaster will also be notified, They will trigger a process to find the new active master, until the resolution complete. If AppMaster or Worker cannot resolve a new Master in a time out, they will make suicide and kill themselves.\n\n\nWhat happens when a worker crashes?\n\n\nIn case of a worker crash, the Master will get notified and stop scheduling new computation to this worker. 
All supervised executors on current worker will be killed, AppMaster can treat it as recovery of executor crash like \nWhat happen when an executor crashes?\n\n\nWhat happens when the AppMaster crashes?\n\n\nIf an AppMaster crashes, Master will schedule a new resource to create a new AppMaster Instance elsewhere, and then the AppMaster will handle the recovery inside the application. For streaming, it will recover the latest min clock and other state from disk, request resources from master to start executors, and restart the tasks with recovered min clock.\n\n\nWhat happen when an executor crashes?\n\n\nIf an executor crashes, its supervisor AppMaster will get notified, and request a new resource from the active master to start a new executor, to run the tasks which were located on the crashed executor.\n\n\nWhat happen when tasks crash?\n\n\nIf a task throws an exception, its supervisor executor will restart that Task.\n\n\nWhen \"at least once\" message delivery is enabled, it will trigger the message replaying in the case of message loss. First AppMaster will read the latest minimum clock from the global clock service(or clock storage if the clock service crashes), then AppMaster will restart all the task actors to get a fresh task state, then the source end tasks will replay messages from that minimum clock.\n\n\nHow does \"exactly-once\" message delivery work?\n\n\nFor some applications, it is extremely important to do \"exactly once\" message delivery. For example, for a real-time billing system, we will not want to bill the customer twice. The goal of \"exactly once\" message delivery is to make sure:\n The error doesn't accumulate, today's error will not be accumulated to tomorrow.\n Transparent to application developer\nWe use global clock to synchronize the distributed transactions. We assume every message from the data source will have a unique timestamp, the timestamp can be a part of the message body, or can be attached later with system clock when the message is injected into the streaming system. With this global synchronized clock, we can coordinate all tasks to checkpoint at same timestamp.\n\n\n\nFigure: Checkpointing and Exactly-Once Message delivery\n\n\nWorkflow to do state checkpointing:\n\n\n\n\nThe coordinator asks the streaming system to do checkpoint at timestamp Tc.\n\n\nFor each application task, it will maintain two states, checkpoint state and current state. Checkpoint state only contains information before timestamp Tc. Current state contains all information.\n\n\nWhen global minimum clock is larger than Tc, it means all messages older than Tc has been processed; the checkpoint state will no longer change, so we will then persist the checkpoint state to storage safely.\n\n\nWhen there is message loss, we will start the recovery process.\n\n\nTo recover, load the latest checkpoint state from store, and then use it to restore the application status.\n\n\nData source replays messages from the checkpoint timestamp.\n\n\n\n\nThe checkpoint interval is determined by global clock service dynamically. Each data source will track the max timestamp of input messages. Upon receiving min clock updates, the data source will report the time delta back to global clock service. The max time delta is the upper bound of the application state timespan. 
The checkpoint interval is bigger than max delta time:\n\n\n\n\n\nFigure: How to determine Checkpoint Interval\n\n\nAfter the checkpoint interval is notified to tasks by global clock service, each task will calculate its next checkpoint timestamp autonomously without global synchronization.\n\n\n\n\nFor each task, it contains two states, checkpoint state and current state. The code to update the state is shown in listing below.\n\n\nTaskState(stateStore, initialTimeStamp):\n currentState = stateStore.load(initialTimeStamp)\n checkpointState = currentState.clone\n checkpointTimestamp = nextCheckpointTimeStamp(initialTimeStamp)\nonMessage(msg):\n if (msg.timestamp < checkpointTimestamp):\n checkpointState.updateMessage(msg)\n currentState.updateMessage(msg) \n maxClock = max(maxClock, msg.timeStamp)\n\nonMinClock(minClock):\n if (minClock > checkpointTimestamp):\n stateStore.persist(checkpointState)\n checkpointTimeStamp = nextCheckpointTimeStamp(maxClock)\n checkpointState = currentState.clone\n\nonNewCheckpointInterval(newStep):\n step = newStep \nnextCheckpointTimeStamp(timestamp):\n checkpointTimestamp = (1 + timestamp/step) * step\n\n\n\n\nList 1: Task Transactional State Implementation\n\n\nWhat is dynamic graph, and how it works?\n\n\nThe DAG can be modified dynamically. We want to be able to dynamically add, remove, and replace a sub-graph.\n\n\n\nFigure: Dynamic Graph, Attach, Replace, and Remove\n\n\nAt least once message delivery and Kafka\n\n\nThe Kafka source example project and tutorials can be found at:\n- \nKafka connector example project\n\n- \nConnect with Kafka source\n\n\nIn this doc, we will talk about how the at least once message delivery works.\n\n\nWe will use the WordCount example of \nsource tree\n to illustrate.\n\n\nHow the kafka WordCount DAG looks like:\n\n\nIt contains three processors:\n\n\n\n\n\nKafkaStreamProducer(or KafkaSource) will read message from kafka queue.\n\n\nSplit will split lines to words\n\n\nSum will summarize the words to get a count for each word.\n\n\n\n\nHow to read data from Kafka\n\n\nWe use KafkaSource, please check \nConnect with Kafka source\n for the introduction.\n\n\nPlease note that we have set a startTimestamp for the KafkaSource, which means KafkaSource will read from Kafka queue starting from messages whose timestamp is near startTimestamp.\n\n\nWhat happen where there is Task crash or message loss?\n\n\nWhen there is message loss, the AppMaster will first pause the global clock service so that the global minimum timestamp no longer change, then it will restart the Kafka source tasks. Upon restart, Kafka Source will start to replay. It will first read the global minimum timestamp from AppMaster, and start to read message from that timestamp.\n\n\nWhat method KafkaSource used to read messages from a start timestamp? As we know Kafka queue doesn't expose the timestamp information.\n\n\nKafka queue only expose the offset information for each partition. What KafkaSource do is to maintain its own mapping from Kafka offset to Application timestamp, so that we can map from a application timestamp to a Kafka offset, and replay Kafka messages from that Kafka offset.\n\n\nThe mapping between Application timestamp with Kafka offset is stored in a distributed file system or as a Kafka topic.",
"title": "Gearpump Internals"
},
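As referenced above, the pseudocode in List 1 translates almost line-for-line into Scala. Below is a runnable transcription, specialized to a Long counter state for concreteness; `StateStore` is a stand-in trait for the persistent store (e.g. HDFS-backed), not a Gearpump class.

```scala
// Runnable Scala transcription of "List 1: Task Transactional State
// Implementation", with the state specialized to a Long counter.
trait StateStore {
  def load(timestamp: Long): Long  // restore state as of a timestamp
  def persist(state: Long): Unit   // write a finished checkpoint
}

class TaskState(stateStore: StateStore, initialTimestamp: Long, private var step: Long) {
  private var currentState = stateStore.load(initialTimestamp)
  private var checkpointState = currentState
  private var checkpointTimestamp = nextCheckpointTimestamp(initialTimestamp)
  private var maxClock = initialTimestamp

  // Messages before the checkpoint boundary update both states; later
  // messages update only the current state.
  def onMessage(timestamp: Long, delta: Long): Unit = {
    if (timestamp < checkpointTimestamp) checkpointState += delta
    currentState += delta
    maxClock = math.max(maxClock, timestamp)
  }

  // Once the global minimum clock passes the boundary, the checkpoint
  // state can no longer change and is safe to persist.
  def onMinClock(minClock: Long): Unit = {
    if (minClock > checkpointTimestamp) {
      stateStore.persist(checkpointState)
      checkpointTimestamp = nextCheckpointTimestamp(maxClock)
      checkpointState = currentState
    }
  }

  def onNewCheckpointInterval(newStep: Long): Unit = step = newStep

  private def nextCheckpointTimestamp(timestamp: Long): Long =
    (1 + timestamp / step) * step
}
```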
{
"location": "/introduction/gearpump-internals/index.html#at-least-once-message-delivery-and-kafka",
"text": "The Kafka source example project and tutorials can be found at:\n- Kafka connector example project \n- Connect with Kafka source In this doc, we will talk about how the at least once message delivery works. We will use the WordCount example of source tree to illustrate. How the kafka WordCount DAG looks like: It contains three processors: KafkaStreamProducer(or KafkaSource) will read message from kafka queue. Split will split lines to words Sum will summarize the words to get a count for each word. How to read data from Kafka We use KafkaSource, please check Connect with Kafka source for the introduction. Please note that we have set a startTimestamp for the KafkaSource, which means KafkaSource will read from Kafka queue starting from messages whose timestamp is near startTimestamp. What happen where there is Task crash or message loss? When there is message loss, the AppMaster will first pause the global clock service so that the global minimum timestamp no longer change, then it will restart the Kafka source tasks. Upon restart, Kafka Source will start to replay. It will first read the global minimum timestamp from AppMaster, and start to read message from that timestamp. What method KafkaSource used to read messages from a start timestamp? As we know Kafka queue doesn't expose the timestamp information. Kafka queue only expose the offset information for each partition. What KafkaSource do is to maintain its own mapping from Kafka offset to Application timestamp, so that we can map from a application timestamp to a Kafka offset, and replay Kafka messages from that Kafka offset. The mapping between Application timestamp with Kafka offset is stored in a distributed file system or as a Kafka topic.",
"title": "At least once message delivery and Kafka"
},
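The offset-to-timestamp mapping described above can be pictured with a small sketch. This is illustrative only, not Gearpump's actual KafkaSource implementation: record (timestamp, offset) checkpoints per partition, and on recovery replay from the latest recorded offset at or before the target timestamp, letting the Timestamp Filter drop the older messages that get replayed again.

```scala
// Illustrative sketch of a per-partition timestamp -> offset index used to
// replay a Kafka partition from an application timestamp.
import scala.collection.immutable.TreeMap

final class OffsetTimestampIndex {
  // application timestamp -> first Kafka offset seen at/after that timestamp
  private var index = TreeMap.empty[Long, Long]

  def record(timestamp: Long, offset: Long): Unit =
    if (!index.contains(timestamp)) index += (timestamp -> offset)

  // Latest checkpoint at or before the replay timestamp; replaying from
  // there guarantees no message at/after that timestamp is skipped, while
  // the Timestamp Filter downstream discards the re-replayed older ones.
  def offsetToReplayFrom(timestamp: Long): Option[Long] =
    index.to(timestamp).lastOption.map(_._2)
}
```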
{
"location": "/deployment/deployment-local/index.html",
"text": "You can start the Gearpump service in a single JVM(local mode), or in a distributed cluster(cluster mode). To start the cluster in local mode, you can use the local /local.bat helper scripts, it is very useful for developing or troubleshooting.\n\n\nBelow are the steps to start a Gearpump service in \nLocal\n mode:\n\n\nStep 1: Get your Gearpump binary ready\n\n\nTo get your Gearpump service running in local mode, you first need to have a Gearpump distribution binary ready.\nPlease follow \nthis guide\n to have the binary. \n\n\nStep 2: Start the cluster\n\n\nYou can start a local mode cluster in single line\n\n\n## start the master and 2 workers in single JVM. The master will listen on 3000\n## you can Ctrl+C to kill the local cluster after you finished the startup tutorial.\nbin/local\n\n\n\n\nNOTE:\n You may need to execute \nchmod +x bin/*\n in shell to make the script file \nlocal\n executable.\n\n\nNOTE:\n You can change the default port by changing config \ngearpump.cluster.masters\n in \nconf/gear.conf\n.\n\n\nNOTE: Change the working directory\n. Log files by default will be generated under current working directory. So, please \"cd\" to required working directly before running the shell commands.\n\n\nNOTE: Run as Daemon\n. You can run it as a background process. For example, use \nnohup\n on Linux.\n\n\nStep 3: Start the Web UI server\n\n\nOpen another shell,\n\n\nbin/services\n\n\n\n\nYou can manage the applications in UI \nhttp://127.0.0.1:8090\n or by \nCommand Line tool\n.\nThe default username and password is \"admin:admin\", you can check\n\nUI Authentication\n to find how to manage users.",
"title": "Local Mode"
},
{
"location": "/deployment/deployment-standalone/index.html",
"text": "Standalone mode is a distributed cluster mode. That is, Gearpump runs as service without the help from other services (e.g. YARN).\n\n\nTo deploy Gearpump in cluster mode, please first check that the \nPre-requisites\n are met.\n\n\nHow to Install\n\n\nYou need to have Gearpump binary at hand. Please refer to \nHow to get gearpump distribution\n to get the Gearpump binary.\n\n\nYou are suggested to unzip the package to same directory path on every machine you planned to install Gearpump.\nTo install Gearpump, you at least need to change the configuration in \nconf/gear.conf\n.\n\n\n\n\n\n\n\n\nConfig\n\n\nDefault value\n\n\nDescription\n\n\n\n\n\n\n\n\n\n\ngearpump.hostname\n\n\n\"127.0.0.1\"\n\n\nHost or IP address of current machine. The ip/host need to be reachable from other machines in the cluster.\n\n\n\n\n\n\ngearpump.cluster.masters\n\n\n[\"127.0.0.1:3000\"]\n\n\nList of all master nodes, with each item represents host and port of one master.\n\n\n\n\n\n\ngearpump.worker.slots\n\n\n1000\n\n\nhow many slots this worker has\n\n\n\n\n\n\n\n\nBesides this, there are other optional configurations related with logs, metrics, transports, ui. You can refer to \nConfiguration Guide\n for more details.\n\n\nStart the Cluster Daemons in Standlone mode\n\n\nIn Standalone mode, you can start master and worker in different JVMs.\n\n\nTo start master:\n\n\nbin/master -ip xx -port xx\n\n\n\n\nThe ip and port will be checked against settings under \nconf/gear.conf\n, so you need to make sure they are consistent.\n\n\nNOTE:\n You may need to execute \nchmod +x bin/*\n in shell to make the script file \nmaster\n executable.\n\n\nNOTE\n: for high availability, please check \nMaster HA Guide\n\n\nTo start worker:\n\n\nbin/worker\n\n\n\n\nStart UI\n\n\nbin/services\n\n\n\n\nAfter UI is started, you can browse to \nhttp://{web_ui_host}:8090\n to view the cluster status.\nThe default username and password is \"admin:admin\", you can check\n\nUI Authentication\n to find how to manage users.\n\n\n\n\nNOTE:\n The UI port can be configured in \ngear.conf\n. Check \nConfiguration Guide\n for information.\n\n\nBash tool to start cluster\n\n\nThere is a bash tool \nbin/start-cluster.sh\n can launch the cluster conveniently. You need to change the file \nconf/masters\n, \nconf/workers\n and \nconf/dashboard\n to specify the corresponding machines.\nBefore running the bash tool, please make sure the Gearpump package is already unzipped to the same directory path on every machine.\n\nbin/stop-cluster.sh\n is used to stop the whole cluster of course.\n\n\nThe bash tool is able to launch the cluster without changing the \nconf/gear.conf\n on every machine. The bash sets the \ngearpump.cluster.masters\n and other configurations using JAVA_OPTS.\nHowever, please note when you log into any these unconfigured machine and try to launch the dashboard or submit the application, you still need to modify \nconf/gear.conf\n manually because the JAVA_OPTS is missing.",
"title": "Standalone Mode"
},
{
"location": "/deployment/deployment-yarn/index.html",
"text": "How to launch a Gearpump cluster on YARN\n\n\n\n\n\n\nUpload the \ngearpump-2.11-0.8.3.zip\n to remote HDFS Folder, suggest to put it under \n/usr/lib/gearpump/gearpump-2.11-0.8.3.zip\n\n\n\n\n\n\nMake sure the home directory on HDFS is already created and all read-write rights are granted for user. For example, user gear's home directory is \n/user/gear\n\n\n\n\n\n\nPut the YARN configurations under classpath.\n Before calling \nyarnclient launch\n, make sure you have put all yarn configuration files under classpath. Typically, you can just copy all files under \n$HADOOP_HOME/etc/hadoop\n from one of the YARN Cluster machine to \nconf/yarnconf\n of gearpump. \n$HADOOP_HOME\n points to the Hadoop installation directory. \n\n\n\n\n\n\nLaunch the gearpump cluster on YARN\n\n\nyarnclient launch -package /usr/lib/gearpump/gearpump-2.11-0.8.3.zip\n\n\n\n\nIf you don't specify package path, it will read default package-path (\ngearpump.yarn.client.package-path\n) from \ngear.conf\n.\n\n\nNOTE:\n You may need to execute \nchmod +x bin/*\n in shell to make the script file \nyarnclient\n executable.\n\n\n\n\n\n\nAfter launching, you can browser the Gearpump UI via YARN resource manager dashboard.\n\n\n\n\n\n\nHow to configure the resource limitation of Gearpump cluster\n\n\nBefore launching a Gearpump cluster, please change configuration section \ngearpump.yarn\n in \ngear.conf\n to configure the resource limitation, like:\n\n\n\n\nThe number of worker containers. \n\n\nThe YARN container memory size for worker and master.\n\n\n\n\nHow to submit a application to Gearpump cluster.\n\n\nTo submit the jar to the Gearpump cluster, we first need to know the Master address, so we need to get\na active configuration file first.\n\n\nThere are two ways to get an active configuration file:\n\n\n\n\n\n\nOption 1: specify \"-output\" option when you launch the cluster.\n\n\nyarnclient launch -package /usr/lib/gearpump/gearpump-2.11-0.8.3.zip -output /tmp/mycluster.conf\n\n\n\n\nIt will return in console like this:\n\n\n==Application Id: application_1449802454214_0034\n\n\n\n\n\n\n\n\nOption 2: Query the active configuration file\n\n\nyarnclient getconfig -appid <yarn application id> -output /tmp/mycluster.conf\n\n\n\n\nyarn application id can be found from the output of step1 or from YARN dashboard.\n\n\n\n\n\n\nAfter you downloaded the configuration file, you can launch application with that config file.\n\n\ngear app -jar examples/wordcount-2.11-0.8.3.jar -conf /tmp/mycluster.conf\n\n\n\n\n\n\n\n\nTo run Storm application over Gearpump on YARN, please store the configuration file with \n-output application.conf\n \n and then launch Storm application with\n\n\nstorm -jar examples/storm-2.11-0.8.3.jar storm.starter.ExclamationTopology exclamation\n\n\n\n\n\n\n\n\nNow the application is running. To check this:\n\n\ngear info -conf /tmp/mycluster.conf\n\n\n\n\n\n\n\n\nTo Start a UI server, please do:\n\n\nservices -conf /tmp/mycluster.conf\n\n\n\n\nThe default username and password is \"admin:admin\", you can check \nUI Authentication\n to find how to manage users.\n\n\n\n\n\n\nHow to add/remove machines dynamically.\n\n\nGearpump yarn tool allows to dynamically add/remove machines. Here is the steps:\n\n\n\n\n\n\nFirst, query to get active resources.\n\n\nyarnclient query -appid <yarn application id>\n\n\n\n\nThe console output will shows how many workers and masters there are. 
For example, I have output like this:\n\n\nmasters:\ncontainer_1449802454214_0034_01_000002(IDHV22-01:35712)\nworkers:\ncontainer_1449802454214_0034_01_000003(IDHV22-01:35712)\ncontainer_1449802454214_0034_01_000006(IDHV22-01:35712)\n\n\n\n\n\n\n\n\nTo add a new worker machine, you can do:\n\n\nyarnclient addworker -appid <yarn application id> -count 2\n\n\n\n\nThis will add two new workers machines. Run the command in first step to check whether the change is effective.\n\n\n\n\n\n\nTo remove old machines, use:\n\n\nyarnclient removeworker -appid <yarn application id> -container <worker container id>\n\n\n\n\nThe worker container id can be found from the output of step 1. For example \"container_1449802454214_0034_01_000006\" is a good container id.\n\n\n\n\n\n\nOther usage:\n\n\n\n\n\n\nTo kill a cluster,\n\n\nyarnclient kill -appid <yarn application id>\n\n\n\n\nNOTE:\n If the application is not launched successfully, then this command won't work. Please use \"yarn application -kill \n\" instead.\n\n\n\n\n\n\nTo check the Gearpump version\n\n\nyarnclient version -appid <yarn application id>",
"title": "YARN Mode"
},
{
"location": "/deployment/deployment-yarn/index.html#how-to-launch-a-gearpump-cluster-on-yarn",
"text": "Upload the gearpump-2.11-0.8.3.zip to remote HDFS Folder, suggest to put it under /usr/lib/gearpump/gearpump-2.11-0.8.3.zip Make sure the home directory on HDFS is already created and all read-write rights are granted for user. For example, user gear's home directory is /user/gear Put the YARN configurations under classpath.\n Before calling yarnclient launch , make sure you have put all yarn configuration files under classpath. Typically, you can just copy all files under $HADOOP_HOME/etc/hadoop from one of the YARN Cluster machine to conf/yarnconf of gearpump. $HADOOP_HOME points to the Hadoop installation directory. Launch the gearpump cluster on YARN yarnclient launch -package /usr/lib/gearpump/gearpump-2.11-0.8.3.zip If you don't specify package path, it will read default package-path ( gearpump.yarn.client.package-path ) from gear.conf . NOTE: You may need to execute chmod +x bin/* in shell to make the script file yarnclient executable. After launching, you can browser the Gearpump UI via YARN resource manager dashboard.",
"title": "How to launch a Gearpump cluster on YARN"
},
{
"location": "/deployment/deployment-yarn/index.html#how-to-configure-the-resource-limitation-of-gearpump-cluster",
"text": "Before launching a Gearpump cluster, please change configuration section gearpump.yarn in gear.conf to configure the resource limitation, like: The number of worker containers. The YARN container memory size for worker and master.",
"title": "How to configure the resource limitation of Gearpump cluster"
},
{
"location": "/deployment/deployment-yarn/index.html#how-to-submit-a-application-to-gearpump-cluster",
"text": "To submit the jar to the Gearpump cluster, we first need to know the Master address, so we need to get\na active configuration file first. There are two ways to get an active configuration file: Option 1: specify \"-output\" option when you launch the cluster. yarnclient launch -package /usr/lib/gearpump/gearpump-2.11-0.8.3.zip -output /tmp/mycluster.conf It will return in console like this: ==Application Id: application_1449802454214_0034 Option 2: Query the active configuration file yarnclient getconfig -appid <yarn application id> -output /tmp/mycluster.conf yarn application id can be found from the output of step1 or from YARN dashboard. After you downloaded the configuration file, you can launch application with that config file. gear app -jar examples/wordcount-2.11-0.8.3.jar -conf /tmp/mycluster.conf To run Storm application over Gearpump on YARN, please store the configuration file with -output application.conf \n and then launch Storm application with storm -jar examples/storm-2.11-0.8.3.jar storm.starter.ExclamationTopology exclamation Now the application is running. To check this: gear info -conf /tmp/mycluster.conf To Start a UI server, please do: services -conf /tmp/mycluster.conf The default username and password is \"admin:admin\", you can check UI Authentication to find how to manage users.",
"title": "How to submit a application to Gearpump cluster."
},
{
"location": "/deployment/deployment-yarn/index.html#how-to-addremove-machines-dynamically",
"text": "Gearpump yarn tool allows to dynamically add/remove machines. Here is the steps: First, query to get active resources. yarnclient query -appid <yarn application id> The console output will shows how many workers and masters there are. For example, I have output like this: masters:\ncontainer_1449802454214_0034_01_000002(IDHV22-01:35712)\nworkers:\ncontainer_1449802454214_0034_01_000003(IDHV22-01:35712)\ncontainer_1449802454214_0034_01_000006(IDHV22-01:35712) To add a new worker machine, you can do: yarnclient addworker -appid <yarn application id> -count 2 This will add two new workers machines. Run the command in first step to check whether the change is effective. To remove old machines, use: yarnclient removeworker -appid <yarn application id> -container <worker container id> The worker container id can be found from the output of step 1. For example \"container_1449802454214_0034_01_000006\" is a good container id.",
"title": "How to add/remove machines dynamically."
},
{
"location": "/deployment/deployment-yarn/index.html#other-usage",
"text": "To kill a cluster, yarnclient kill -appid <yarn application id> NOTE: If the application is not launched successfully, then this command won't work. Please use \"yarn application -kill \" instead. To check the Gearpump version yarnclient version -appid <yarn application id>",
"title": "Other usage:"
},
{
"location": "/deployment/deployment-docker/index.html",
"text": "Gearpump Docker Container\n\n\nThere is pre-built docker container available at \nDocker Repo\n\n\nCheck the documents there to find how to launch a Gearpump cluster in one line.",
"title": "Docker Mode"
},
{
"location": "/deployment/deployment-docker/index.html#gearpump-docker-container",
"text": "There is pre-built docker container available at Docker Repo Check the documents there to find how to launch a Gearpump cluster in one line.",
"title": "Gearpump Docker Container"
},
{
"location": "/deployment/deployment-ui-authentication/index.html",
"text": "What is this about?\n\n\nHow to enable UI authentication?\n\n\n\n\n\n\nChange config file gear.conf, find entry \ngearpump-ui.gearpump.ui-security.authentication-enabled\n, change the value to true\n\n\ngearpump-ui.gearpump.ui-security.authentication-enabled = true\n\n\n\n\nRestart the UI dashboard, and then the UI authentication is enabled. It will prompt for user name and password.\n\n\n\n\n\n\nHow many authentication methods Gearpump UI server support?\n\n\nCurrently, It supports:\n\n\n\n\nUsername-Password based authentication and\n\n\nOAuth2 based authentication.\n\n\n\n\nUser-Password based authentication is enabled when \ngearpump-ui.gearpump.ui-security.authentication-enabled\n,\n and \nCANNOT\n be disabled.\n\n\nUI server admin can also choose to enable \nauxiliary\n OAuth2 authentication channel.\n\n\nUser-Password based authentication\n\n\nUser-Password based authentication covers all authentication scenarios which requires\n user to enter an explicit username and password.\n\n\nGearpump provides a built-in ConfigFileBasedAuthenticator which verify user name and password\n against password hashcode stored in config files.\n\n\nHowever, developer can choose to extends the \norg.apache.gearpump.security.Authenticator\n to provide a custom\n User-Password based authenticator, to support LDAP, Kerberos, and Database-based authentication...\n\n\nConfigFileBasedAuthenticator: built-in User-Password Authenticator\n\n\nConfigFileBasedAuthenticator store all user name and password hashcode in configuration file gear.conf. Here\nis the steps to configure ConfigFileBasedAuthenticator.\n\n\nHow to add or remove user?\n\n\nFor the default authentication plugin, it has three categories of users: admins, users, and guests.\n\n\n\n\nadmins: have unlimited permission, like shutdown a cluster, add/remove machines.\n\n\nusers: have limited permission to submit an application and etc..\n\n\nguests: can not submit/kill applications, but can view the application status.\n\n\n\n\nSystem administrator can add or remove user by updating config file \nconf/gear.conf\n. \n\n\nSuppose we want to add user jerry as an administrator, here are the steps:\n\n\n\n\n\n\nPick a password, and generate the digest for this password. Suppose we use password \nilovegearpump\n, \n to generate the digest:\n\n\nbin/gear org.apache.gearpump.security.PasswordUtil -password ilovegearpump\n\n\n\n\nIt will generate a digest value like this:\n\n\nCgGxGOxlU8ggNdOXejCeLxy+isrCv0TrS37HwA==\n\n\n\n\n\n\n\n\nChange config file conf/gear.conf at path \ngearpump-ui.gearpump.ui-security.config-file-based-authenticator.admins\n,\n add user \njerry\n in this list:\n\n\nadmins = {\n ## Default Admin. Username: admin, password: admin\n ## !!! Please replace this builtin account for production cluster for security reason. !!!\n \"admin\" = \"AeGxGOxlU8QENdOXejCeLxy+isrCv0TrS37HwA==\"\n \"jerry\" = \"CgGxGOxlU8ggNdOXejCeLxy+isrCv0TrS37HwA==\"\n}\n\n\n\n\n\n\n\n\nRestart the UI dashboard by \nbin/services\n to make the change effective.\n\n\n\n\n\n\nGroup \"admins\" have very unlimited permission, you may want to restrict the permission. In that case \n you can modify \ngearpump-ui.gearpump.ui-security.config-file-based-authenticator.users\n or\n \ngearpump-ui.gearpump.ui-security.config-file-based-authenticator.guests\n.\n\n\n\n\n\n\nSee description at \nconf/gear.conf\n to find more information. 
\n\n\n\n\n\n\nWhat is the default user and password?\n\n\nFor ConfigFileBasedAuthenticator, Gearpump distribution is shipped with two default users:\n\n\n\n\nusername: admin, password: admin\n\n\nusername: guest, password: guest\n\n\n\n\nUser \nadmin\n has unlimited permissions, while \nguest\n can only view the application status.\n\n\nFor security reason, you need to remove the default users \nadmin\n and \nguest\n for cluster in production.\n\n\nIs this secure?\n\n\nFirstly, we will NOT store any user password in any way so only the user himself knows the password. \nWe will use one-way hash digest to verify the user input password.\n\n\nHow to develop a custom User-Password Authenticator for LDAP, Database, and etc..\n\n\nIf developer choose to define his/her own User-Password based authenticator, it is required that user\n modify configuration option:\n\n\n## Replace \"org.apache.gearpump.security.CustomAuthenticator\" with your real authenticator class.\ngearpump.ui-security.authenticator = \"org.apache.gearpump.security.CustomAuthenticator\"\n\n\n\n\nMake sure CustomAuthenticator extends interface:\n\n\ntrait Authenticator {\n\n def authenticate(user: String, password: String, ec: ExecutionContext): Future[AuthenticationResult]\n}\n\n\n\n\nOAuth2 based authentication\n\n\nOAuth2 based authentication is commonly use to achieve social login with social network account.\n\n\nGearpump provides generic OAuth2 Authentication support which allow user to extend to support new authentication sources.\n\n\nBasically, OAuth2 based Authentication contains these steps:\n 1. User accesses Gearpump UI website, and choose to login with OAuth2 server.\n 2. Gearpump UI website redirects user to OAuth2 server domain authorization endpoint.\n 3. End user complete the authorization in the domain of OAuth2 server.\n 4. OAuth2 server redirects user back to Gearpump UI server.\n 5. Gearpump UI server verify the tokens and extract credentials from query\n parameters and form fields.\n\n\nTerminologies\n\n\nFor terms like client Id, and client secret, please refers to guide \nRFC 6749\n\n\nEnable web proxy for UI server\n\n\nTo enable OAuth2 authentication, the Gearpump UI server should have network access to OAuth2 server, as\n some requests are initiated directly inside Gearpump UI server. So, if you are behind a firewall, make\n sure you have configured the proxy properly for UI server.\n\n\nIf you are on Windows\n\n\nset JAVA_OPTS=-Dhttp.proxyHost=xx.com -Dhttp.proxyPort=8088 -Dhttps.proxyHost=xx.com -Dhttps.proxyPort=8088\nbin/services\n\n\n\n\nIf you are on Linux\n\n\nexport JAVA_OPTS=\"-Dhttp.proxyHost=xx.com -Dhttp.proxyPort=8088 -Dhttps.proxyHost=xx.com -Dhttps.proxyPort=8088\"\nbin/services\n\n\n\n\nGoogle Plus OAuth2 Authenticator\n\n\nGoogle Plus OAuth2 Authenticator does authentication with Google OAuth2 service. It extracts the email address\nfrom Google user profile as credentials.\n\n\nTo use Google OAuth2 Authenticator, there are several steps:\n\n\n\n\nRegister your application (Gearpump UI server here) as an application to Google developer console.\n\n\nConfigure the Google OAuth2 information in gear.conf\n\n\nConfigure network proxy for Gearpump UI server if applies.\n\n\n\n\nStep1: Register your website as an OAuth2 Application on Google\n\n\n\n\nCreate an application representing your website at \nhttps://console.developers.google.com\n\n\nIn \"API Manager\" of your created application, enable API \"Google+ API\"\n\n\nCreate OAuth client ID for this application. 
In \"Credentials\" tab of \"API Manager\",\nchoose \"Create credentials\", and then select OAuth client ID. Follow the wizard\nto set callback URL, and generate client ID, and client Secret.\n\n\n\n\nNOTE:\n Callback URL is NOT optional.\n\n\nStep2: Configure the OAuth2 information in gear.conf\n\n\n\n\nEnable OAuth2 authentication by setting \ngearpump.ui-security.oauth2-authenticator-enabled\n\nas true.\n\n\nConfigure section \ngearpump.ui-security.oauth2-authenticators.google\n in gear.conf. Please make sure\nclass name, client ID, client Secret, and callback URL are set properly.\n\n\n\n\nNOTE:\n Callback URL set here should match what is configured on Google in step1.\n\n\nStep3: Configure the network proxy if applies.\n\n\nTo enable OAuth2 authentication, the Gearpump UI server should have network access to Google service, as\n some requests are initiated directly inside Gearpump UI server. So, if you are behind a firewall, make\n sure you have configured the proxy properly for UI server.\n\n\nFor guide of how to configure web proxy for UI server, please refer to section \"Enable web proxy for UI server\" above.\n\n\nStep4: Restart the UI server and try to click the Google login icon on UI server.\n\n\nCloudFoundry UAA server OAuth2 Authenticator\n\n\nCloudFoundryUaaAuthenticator does authentication by using CloudFoundry UAA OAuth2 service. It extracts the email address\n from Google user profile as credentials.\n\n\nFor what is UAA (User Account and Authentication Service), please see guide: \nUAA\n\n\nTo use Google OAuth2 Authenticator, there are several steps:\n\n\n\n\nRegister your application (Gearpump UI server here) as an application to UAA with helper tool \nuaac\n.\n\n\nConfigure the Google OAuth2 information in gear.conf\n\n\nConfigure network proxy for Gearpump UI server if applies.\n\n\n\n\nStep1: Register your application to UAA with \nuaac\n\n\n\n\nCheck tutorial on uaac at \nhttps://docs.cloudfoundry.org/adminguide/uaa-user-management.html\n\n\n\n\nOpen a bash shell, set the UAA server by command \nuaac target\n\n\nuaac target [your uaa server url]\n\n\n\n\n\n\n\n\nLogin in as user admin by\n\n\nuaac token client get admin -s MyAdminPassword\n\n\n\n\n\n\n\n\nCreate a new Application (Client) in UAA,\n\n\nuaac client add [your_client_id]\n --scope \"openid cloud_controller.read\"\n --authorized_grant_types \"authorization_code client_credentials refresh_token\"\n --authorities \"openid cloud_controller.read\"\n --redirect_uri [your_redirect_url]\n --autoapprove true\n --secret [your_client_secret]\n\n\n\n\n\n\n\n\nStep2: Configure the OAuth2 information in gear.conf\n\n\n\n\nEnable OAuth2 authentication by setting \ngearpump.ui-security.oauth2-authenticator-enabled\n as true.\n\n\nNavigate to section \ngearpump.ui-security.oauth2-authenticators.cloudfoundryuaa\n\n\nConfig gear.conf \ngearpump.ui-security.oauth2-authenticators.cloudfoundryuaa\n section.\nPlease make sure class name, client ID, client Secret, and callback URL are set properly.\n\n\n\n\nNOTE:\n The callback URL here should match what you set on CloudFoundry UAA in step1.\n\n\nStep3: Configure network proxy for Gearpump UI server if applies\n\n\nTo enable OAuth2 authentication, the Gearpump UI server should have network access to Google service, as\n some requests are initiated directly inside Gearpump UI server. 
So, if you are behind a firewall, make\n sure you have configured the proxy properly for UI server.\n\n\nFor guide of how to configure web proxy for UI server, please refer to please refer to section \"Enable web proxy for UI server\" above.\n\n\nStep4: Restart the UI server and try to click the CloudFoundry login icon on UI server.\n\n\nStep5: You can also enable additional authenticator for CloudFoundry UAA by setting config:\n\n\nadditional-authenticator-enabled = true\n\n\n\n\nPlease see description in gear.conf for more information.\n\n\nExtends OAuth2Authenticator to support new Authorization service like Facebook, or Twitter.\n\n\nYou can follow the Google OAuth2 example code to define a custom OAuth2Authenticator. Basically, the steps includes:\n\n\n\n\n\n\nDefine an OAuth2Authenticator implementation.\n\n\n\n\n\n\nAdd an configuration entry under \ngearpump.ui-security.oauth2-authenticators\n. For example:\n\n\n## name of this authenticator \n\"socialnetworkx\" {\n \"class\" = \"org.apache.gearpump.services.security.oauth2.impl.SocialNetworkXAuthenticator\"\n\n ## Please make sure this URL matches the name \n \"callback\" = \"http://127.0.0.1:8090/login/oauth2/socialnetworkx/callback\"\n\n \"clientId\" = \"gearpump_test2\"\n \"clientSecret\" = \"gearpump_test2\"\n \"defaultUserRole\" = \"guest\"\n\n ## Make sure socialnetworkx.png exists under dashboard/icons \n \"icon\" = \"/icons/socialnetworkx.png\"\n}\n\n\n\n\n\n\n\n\nThe configuration entry is supposed to be used by class \nSocialNetworkXAuthenticator\n.",
"title": "UI Authentication"
},
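To make the extension point concrete, here is a minimal Scala sketch of a custom User-Password authenticator implementing the trait above. It is a sketch, not Gearpump's actual code: the hard-coded credential map is hypothetical, and the exact way to construct an AuthenticationResult (assumed here to be predefined constants such as Authenticator.User and Authenticator.UnAuthenticated) may differ in your Gearpump version.

import scala.concurrent.{ExecutionContext, Future}

import org.apache.gearpump.security.Authenticator
import org.apache.gearpump.security.Authenticator.AuthenticationResult

// Hypothetical authenticator accepting a single hard-coded account.
// A real LDAP or database authenticator would replace the map lookup
// with a query against its backing store.
class CustomAuthenticator extends Authenticator {

  // illustrative credential store: username -> password
  private val users = Map("jerry" -> "ilovegearpump")

  override def authenticate(user: String, password: String,
      ec: ExecutionContext): Future[AuthenticationResult] = {
    implicit val context: ExecutionContext = ec
    Future {
      if (users.get(user).contains(password)) {
        Authenticator.User // assumed predefined result constant
      } else {
        Authenticator.UnAuthenticated // assumed predefined result constant
      }
    }
  }
}

Remember to point gearpump.ui-security.authenticator at this class, as described above.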
{
"location": "/deployment/deployment-ui-authentication/index.html#what-is-this-about",
"text": "",
"title": "What is this about?"
},
{
"location": "/deployment/deployment-ui-authentication/index.html#how-to-enable-ui-authentication",
"text": "Change config file gear.conf, find entry gearpump-ui.gearpump.ui-security.authentication-enabled , change the value to true gearpump-ui.gearpump.ui-security.authentication-enabled = true Restart the UI dashboard, and then the UI authentication is enabled. It will prompt for user name and password.",
"title": "How to enable UI authentication?"
},
{
"location": "/deployment/deployment-ui-authentication/index.html#how-many-authentication-methods-gearpump-ui-server-support",
"text": "Currently, It supports: Username-Password based authentication and OAuth2 based authentication. User-Password based authentication is enabled when gearpump-ui.gearpump.ui-security.authentication-enabled ,\n and CANNOT be disabled. UI server admin can also choose to enable auxiliary OAuth2 authentication channel.",
"title": "How many authentication methods Gearpump UI server support?"
},
{
"location": "/deployment/deployment-ui-authentication/index.html#user-password-based-authentication",
"text": "User-Password based authentication covers all authentication scenarios which requires\n user to enter an explicit username and password. Gearpump provides a built-in ConfigFileBasedAuthenticator which verify user name and password\n against password hashcode stored in config files. However, developer can choose to extends the org.apache.gearpump.security.Authenticator to provide a custom\n User-Password based authenticator, to support LDAP, Kerberos, and Database-based authentication... ConfigFileBasedAuthenticator: built-in User-Password Authenticator ConfigFileBasedAuthenticator store all user name and password hashcode in configuration file gear.conf. Here\nis the steps to configure ConfigFileBasedAuthenticator. How to add or remove user? For the default authentication plugin, it has three categories of users: admins, users, and guests. admins: have unlimited permission, like shutdown a cluster, add/remove machines. users: have limited permission to submit an application and etc.. guests: can not submit/kill applications, but can view the application status. System administrator can add or remove user by updating config file conf/gear.conf . Suppose we want to add user jerry as an administrator, here are the steps: Pick a password, and generate the digest for this password. Suppose we use password ilovegearpump , \n to generate the digest: bin/gear org.apache.gearpump.security.PasswordUtil -password ilovegearpump It will generate a digest value like this: CgGxGOxlU8ggNdOXejCeLxy+isrCv0TrS37HwA== Change config file conf/gear.conf at path gearpump-ui.gearpump.ui-security.config-file-based-authenticator.admins ,\n add user jerry in this list: admins = {\n ## Default Admin. Username: admin, password: admin\n ## !!! Please replace this builtin account for production cluster for security reason. !!!\n \"admin\" = \"AeGxGOxlU8QENdOXejCeLxy+isrCv0TrS37HwA==\"\n \"jerry\" = \"CgGxGOxlU8ggNdOXejCeLxy+isrCv0TrS37HwA==\"\n} Restart the UI dashboard by bin/services to make the change effective. Group \"admins\" have very unlimited permission, you may want to restrict the permission. In that case \n you can modify gearpump-ui.gearpump.ui-security.config-file-based-authenticator.users or\n gearpump-ui.gearpump.ui-security.config-file-based-authenticator.guests . See description at conf/gear.conf to find more information. What is the default user and password? For ConfigFileBasedAuthenticator, Gearpump distribution is shipped with two default users: username: admin, password: admin username: guest, password: guest User admin has unlimited permissions, while guest can only view the application status. For security reason, you need to remove the default users admin and guest for cluster in production. Is this secure? Firstly, we will NOT store any user password in any way so only the user himself knows the password. \nWe will use one-way hash digest to verify the user input password. How to develop a custom User-Password Authenticator for LDAP, Database, and etc.. If developer choose to define his/her own User-Password based authenticator, it is required that user\n modify configuration option: ## Replace \"org.apache.gearpump.security.CustomAuthenticator\" with your real authenticator class.\ngearpump.ui-security.authenticator = \"org.apache.gearpump.security.CustomAuthenticator\" Make sure CustomAuthenticator extends interface: trait Authenticator {\n\n def authenticate(user: String, password: String, ec: ExecutionContext): Future[AuthenticationResult]\n}",
"title": "User-Password based authentication"
},
{
"location": "/deployment/deployment-ui-authentication/index.html#oauth2-based-authentication",
"text": "OAuth2 based authentication is commonly use to achieve social login with social network account. Gearpump provides generic OAuth2 Authentication support which allow user to extend to support new authentication sources. Basically, OAuth2 based Authentication contains these steps:\n 1. User accesses Gearpump UI website, and choose to login with OAuth2 server.\n 2. Gearpump UI website redirects user to OAuth2 server domain authorization endpoint.\n 3. End user complete the authorization in the domain of OAuth2 server.\n 4. OAuth2 server redirects user back to Gearpump UI server.\n 5. Gearpump UI server verify the tokens and extract credentials from query\n parameters and form fields. Terminologies For terms like client Id, and client secret, please refers to guide RFC 6749 Enable web proxy for UI server To enable OAuth2 authentication, the Gearpump UI server should have network access to OAuth2 server, as\n some requests are initiated directly inside Gearpump UI server. So, if you are behind a firewall, make\n sure you have configured the proxy properly for UI server. If you are on Windows set JAVA_OPTS=-Dhttp.proxyHost=xx.com -Dhttp.proxyPort=8088 -Dhttps.proxyHost=xx.com -Dhttps.proxyPort=8088\nbin/services If you are on Linux export JAVA_OPTS=\"-Dhttp.proxyHost=xx.com -Dhttp.proxyPort=8088 -Dhttps.proxyHost=xx.com -Dhttps.proxyPort=8088\"\nbin/services Google Plus OAuth2 Authenticator Google Plus OAuth2 Authenticator does authentication with Google OAuth2 service. It extracts the email address\nfrom Google user profile as credentials. To use Google OAuth2 Authenticator, there are several steps: Register your application (Gearpump UI server here) as an application to Google developer console. Configure the Google OAuth2 information in gear.conf Configure network proxy for Gearpump UI server if applies. Step1: Register your website as an OAuth2 Application on Google Create an application representing your website at https://console.developers.google.com In \"API Manager\" of your created application, enable API \"Google+ API\" Create OAuth client ID for this application. In \"Credentials\" tab of \"API Manager\",\nchoose \"Create credentials\", and then select OAuth client ID. Follow the wizard\nto set callback URL, and generate client ID, and client Secret. NOTE: Callback URL is NOT optional. Step2: Configure the OAuth2 information in gear.conf Enable OAuth2 authentication by setting gearpump.ui-security.oauth2-authenticator-enabled \nas true. Configure section gearpump.ui-security.oauth2-authenticators.google in gear.conf. Please make sure\nclass name, client ID, client Secret, and callback URL are set properly. NOTE: Callback URL set here should match what is configured on Google in step1. Step3: Configure the network proxy if applies. To enable OAuth2 authentication, the Gearpump UI server should have network access to Google service, as\n some requests are initiated directly inside Gearpump UI server. So, if you are behind a firewall, make\n sure you have configured the proxy properly for UI server. For guide of how to configure web proxy for UI server, please refer to section \"Enable web proxy for UI server\" above. Step4: Restart the UI server and try to click the Google login icon on UI server. CloudFoundry UAA server OAuth2 Authenticator CloudFoundryUaaAuthenticator does authentication by using CloudFoundry UAA OAuth2 service. It extracts the email address\n from Google user profile as credentials. 
For what is UAA (User Account and Authentication Service), please see guide: UAA To use Google OAuth2 Authenticator, there are several steps: Register your application (Gearpump UI server here) as an application to UAA with helper tool uaac . Configure the Google OAuth2 information in gear.conf Configure network proxy for Gearpump UI server if applies. Step1: Register your application to UAA with uaac Check tutorial on uaac at https://docs.cloudfoundry.org/adminguide/uaa-user-management.html Open a bash shell, set the UAA server by command uaac target uaac target [your uaa server url] Login in as user admin by uaac token client get admin -s MyAdminPassword Create a new Application (Client) in UAA, uaac client add [your_client_id]\n --scope \"openid cloud_controller.read\"\n --authorized_grant_types \"authorization_code client_credentials refresh_token\"\n --authorities \"openid cloud_controller.read\"\n --redirect_uri [your_redirect_url]\n --autoapprove true\n --secret [your_client_secret] Step2: Configure the OAuth2 information in gear.conf Enable OAuth2 authentication by setting gearpump.ui-security.oauth2-authenticator-enabled as true. Navigate to section gearpump.ui-security.oauth2-authenticators.cloudfoundryuaa Config gear.conf gearpump.ui-security.oauth2-authenticators.cloudfoundryuaa section.\nPlease make sure class name, client ID, client Secret, and callback URL are set properly. NOTE: The callback URL here should match what you set on CloudFoundry UAA in step1. Step3: Configure network proxy for Gearpump UI server if applies To enable OAuth2 authentication, the Gearpump UI server should have network access to Google service, as\n some requests are initiated directly inside Gearpump UI server. So, if you are behind a firewall, make\n sure you have configured the proxy properly for UI server. For guide of how to configure web proxy for UI server, please refer to please refer to section \"Enable web proxy for UI server\" above. Step4: Restart the UI server and try to click the CloudFoundry login icon on UI server. Step5: You can also enable additional authenticator for CloudFoundry UAA by setting config: additional-authenticator-enabled = true Please see description in gear.conf for more information. Extends OAuth2Authenticator to support new Authorization service like Facebook, or Twitter. You can follow the Google OAuth2 example code to define a custom OAuth2Authenticator. Basically, the steps includes: Define an OAuth2Authenticator implementation. Add an configuration entry under gearpump.ui-security.oauth2-authenticators . For example: ## name of this authenticator \n\"socialnetworkx\" {\n \"class\" = \"org.apache.gearpump.services.security.oauth2.impl.SocialNetworkXAuthenticator\"\n\n ## Please make sure this URL matches the name \n \"callback\" = \"http://127.0.0.1:8090/login/oauth2/socialnetworkx/callback\"\n\n \"clientId\" = \"gearpump_test2\"\n \"clientSecret\" = \"gearpump_test2\"\n \"defaultUserRole\" = \"guest\"\n\n ## Make sure socialnetworkx.png exists under dashboard/icons \n \"icon\" = \"/icons/socialnetworkx.png\"\n} The configuration entry is supposed to be used by class SocialNetworkXAuthenticator .",
"title": "OAuth2 based authentication"
},
{
"location": "/deployment/deployment-ha/index.html",
"text": "To support HA, we allow to start master on multiple nodes. They will form a quorum to decide consistency. For example, if we start master on 5 nodes and 2 nodes are down, then the cluster is still consistent and functional.\n\n\nHere are the steps to enable the HA mode:\n\n\n1. Configure.\n\n\nSelect master machines\n\n\nDistribute the package to all nodes. Modify \nconf/gear.conf\n on all nodes. You MUST configure\n\n\ngearpump.hostname\n\n\n\n\nto make it point to your hostname(or ip), and\n\n\ngearpump.cluster.masters\n\n\n\n\nto a list of master nodes. For example, if I have 3 master nodes (node1, node2, and node3), then the \ngearpump.cluster.masters\n can be set as\n\n\ngearpump.cluster {\n masters = [\"node1:3000\", \"node2:3000\", \"node3:3000\"]\n}\n\n\n\n\nConfigure distributed storage to store application jars.\n\n\nIn \nconf/gear.conf\n, For entry \ngearpump.jarstore.rootpath\n, please choose the storage folder for application jars. You need to make sure this jar storage is highly available. We support two storage systems:\n\n\n1). HDFS\n\n\nYou need to configure the \ngearpump.jarstore.rootpath\n like this\n\n\nhdfs://host:port/path/\n\n\n\n\nFor HDFS HA,\n\n\n\n\n\n\nhdfs://namespace/path/\n\n\n2). Shared NFS folder\n\n\nFirst you need to map the NFS directory to local directory(same path) on all machines of master nodes.\nThen you need to set the \ngearpump.jarstore.rootpath\n like this:\n\n\nfile:///your_nfs_mapping_directory\n\n\n\n\n3). If you don't set this value, we will use the local directory of master node.\n NOTE! There is no HA guarantee in this case, which means we are unable to recover running applications when master goes down.\n\n\n2. Start Daemon.\n\n\nOn node1, node2, node3, Start Master\n\n\n## on node1\nbin/master -ip node1 -port 3000\n\n## on node2\nbin/master -ip node2 -port 3000\n\n## on node3\nbin/master -ip node3 -port 3000\n\n\n\n\n3. Done!\n\n\nNow you have a highly available HA cluster. You can kill any node, the master HA will take effect.\n\n\nNOTE\n: It can take up to 15 seconds for master node to fail-over. You can change the fail-over timeout time by adding config in \ngear.conf\n \ngearpump-master.akka.cluster.auto-down-unreachable-after=10s\n or set it to a smaller value",
"title": "High Availability"
},
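Putting the pieces together, a gear.conf fragment for a 3-master HA setup on node1 might look like the following sketch; the hostnames and the HDFS path are examples.

gearpump {
  ## this node's own address
  hostname = "node1"

  cluster {
    masters = ["node1:3000", "node2:3000", "node3:3000"]
  }

  ## highly available jar store on HDFS
  ## (use hdfs://namespace/path/ instead for HDFS HA)
  jarstore.rootpath = "hdfs://namenode:9000/usr/lib/gearpump/jarstore/"
}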
{
"location": "/deployment/deployment-msg-delivery/index.html",
"text": "How to deploy for At Least Once Message Delivery?\n\n\nAs introduced in the \nWhat is At Least Once Message Delivery\n, Gearpump has a built in KafkaSource. To get at least once message delivery, users should deploy a Kafka cluster as the offset store along with the Gearpump cluster. \n\n\nHere's an example to deploy a local Kafka cluster. \n\n\n\n\n\n\ndownload the latest Kafka from the official website and extract to a local directory (\n$KAFKA_HOME\n)\n\n\n\n\n\n\nBoot up the single-node Zookeeper instance packaged with Kafka. \n\n\n$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties\n\n\n\n\n\n\n\n\nStart a Kafka broker\n\n\n$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/kafka.properties\n\n\n\n\n\n\n\n\nWhen creating a offset store for \nKafkaSource\n, set the zookeeper connect string to \nlocalhost:2181\n and broker list to \nlocalhost:9092\n in \nKafkaStorageFactory\n.\n\n\nval offsetStorageFactory = new KafkaStorageFactory(\"localhost:2181\", \"localhost:9092\")\nval source = new KafkaSource(\"topic1\", \"localhost:2181\", offsetStorageFactory)\n\n\n\n\n\n\n\n\nHow to deploy for Exactly Once Message Delivery?\n\n\nExactly Once Message Delivery requires both an offset store and a checkpoint store. For the offset store, a Kafka cluster should be deployed as in the previous section. As for the checkpoint store, Gearpump has built-in support for Hadoop file systems, like HDFS. Hence, users should deploy a HDFS cluster alongside the Gearpump cluster. \n\n\nHere's an example to deploy a local HDFS cluster.\n\n\n\n\n\n\ndownload Hadoop 2.6 from the official website and extracts it to a local directory \nHADOOP_HOME\n\n\n\n\n\n\nadd following configuration to \n$HADOOP_HOME/etc/core-site.xml\n\n\n<configuration>\n <property>\n <name>fs.defaultFS</name>\n <value>hdfs://localhost:9000</value>\n </property>\n</configuration>\n\n\n\n\n\n\n\n\nstart HDFS\n\n\n$HADOOP_HOME/sbin/start-dfs.sh\n\n\n\n\n\n\n\n\nWhen creating a \nHadoopCheckpointStore\n, set the hadoop configuration as in the \ncore-site.xml\n\n\nval hadoopConfig = new Configuration\nhadoopConfig.set(\"fs.defaultFS\", \"hdfs://localhost:9000\")\nval checkpointStoreFactory = new HadoopCheckpointStoreFactory(\"MessageCount\", hadoopConfig, new FileSizeRotation(1000))",
"title": "Reliable Message Delivery"
},
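As a consolidated sketch, the two snippets above can be combined when building an exactly-once pipeline. The constructors are the ones shown above; the import paths are assumptions that may differ across Gearpump versions, so adjust them to your distribution.

import org.apache.hadoop.conf.Configuration

// import paths below are assumptions; check your Gearpump version
import org.apache.gearpump.streaming.kafka.{KafkaSource, KafkaStorageFactory}
import org.apache.gearpump.streaming.hadoop.HadoopCheckpointStoreFactory
import org.apache.gearpump.streaming.hadoop.lib.rotation.FileSizeRotation

// offset store backed by the local Kafka/Zookeeper deployed above
val offsetStorageFactory = new KafkaStorageFactory("localhost:2181", "localhost:9092")
val source = new KafkaSource("topic1", "localhost:2181", offsetStorageFactory)

// checkpoint store backed by the local HDFS deployed above
val hadoopConfig = new Configuration
hadoopConfig.set("fs.defaultFS", "hdfs://localhost:9000")
val checkpointStoreFactory = new HadoopCheckpointStoreFactory(
  "MessageCount", hadoopConfig, new FileSizeRotation(1000))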
{
"location": "/deployment/deployment-msg-delivery/index.html#how-to-deploy-for-at-least-once-message-delivery",
"text": "As introduced in the What is At Least Once Message Delivery , Gearpump has a built in KafkaSource. To get at least once message delivery, users should deploy a Kafka cluster as the offset store along with the Gearpump cluster. Here's an example to deploy a local Kafka cluster. download the latest Kafka from the official website and extract to a local directory ( $KAFKA_HOME ) Boot up the single-node Zookeeper instance packaged with Kafka. $KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties Start a Kafka broker $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/kafka.properties When creating a offset store for KafkaSource , set the zookeeper connect string to localhost:2181 and broker list to localhost:9092 in KafkaStorageFactory . val offsetStorageFactory = new KafkaStorageFactory(\"localhost:2181\", \"localhost:9092\")\nval source = new KafkaSource(\"topic1\", \"localhost:2181\", offsetStorageFactory)",
"title": "How to deploy for At Least Once Message Delivery?"
},
{
"location": "/deployment/deployment-msg-delivery/index.html#how-to-deploy-for-exactly-once-message-delivery",
"text": "Exactly Once Message Delivery requires both an offset store and a checkpoint store. For the offset store, a Kafka cluster should be deployed as in the previous section. As for the checkpoint store, Gearpump has built-in support for Hadoop file systems, like HDFS. Hence, users should deploy a HDFS cluster alongside the Gearpump cluster. Here's an example to deploy a local HDFS cluster. download Hadoop 2.6 from the official website and extracts it to a local directory HADOOP_HOME add following configuration to $HADOOP_HOME/etc/core-site.xml <configuration>\n <property>\n <name>fs.defaultFS</name>\n <value>hdfs://localhost:9000</value>\n </property>\n</configuration> start HDFS $HADOOP_HOME/sbin/start-dfs.sh When creating a HadoopCheckpointStore , set the hadoop configuration as in the core-site.xml val hadoopConfig = new Configuration\nhadoopConfig.set(\"fs.defaultFS\", \"hdfs://localhost:9000\")\nval checkpointStoreFactory = new HadoopCheckpointStoreFactory(\"MessageCount\", hadoopConfig, new FileSizeRotation(1000))",
"title": "How to deploy for Exactly Once Message Delivery?"
},
{
"location": "/deployment/deployment-configuration/index.html",
"text": "Master and Worker configuration\n\n\nMaster and Worker daemons will only read configuration from \nconf/gear.conf\n.\n\n\nMaster reads configuration from section master and gearpump:\n\n\nmaster {\n}\ngearpump{\n}\n\n\n\n\nWorker reads configuration from section worker and gearpump:\n\n\nworker {\n}\ngearpump{\n}\n\n\n\n\nConfiguration for user submitted application job\n\n\nFor user application job, it will read configuration file \ngear.conf\n and \napplication.conf\n from classpath, while \napplication.conf\n has higher priority.\nThe default classpath contains:\n\n\n\n\nconf/\n\n\ncurrent working directory.\n\n\n\n\nFor example, you can put a \napplication.conf\n on your working directory, and then it will be effective when you submit a new job application.\n\n\nLogging\n\n\nTo change the log level, you need to change both \ngear.conf\n, and \nlog4j.properties\n.\n\n\nTo change the log level for master and worker daemon\n\n\nPlease change \nlog4j.rootLevel\n in \nlog4j.properties\n, \ngearpump-master.akka.loglevel\n and \ngearpump-worker.akka.loglevel\n in \ngear.conf\n.\n\n\nTo change the log level for application job\n\n\nPlease change \nlog4j.rootLevel\n in \nlog4j.properties\n, and \nakka.loglevel\n in \ngear.conf\n or \napplication.conf\n.\n\n\nGearpump Default Configuration\n\n\nThis is the default configuration for \ngear.conf\n.\n\n\n\n\n\n\n\n\nconfig item\n\n\ndefault value\n\n\ndescription\n\n\n\n\n\n\n\n\n\n\ngearpump.hostname\n\n\n\"127.0.0.1\"\n\n\nhostname of current machine. If you are using local mode, then set this to 127.0.0.1. If you are using cluster mode, make sure this hostname can be accessed by other machines.\n\n\n\n\n\n\ngearpump.cluster.masters\n\n\n[\"127.0.0.1:3000\"]\n\n\nConfig to set the master nodes of the cluster. If there are multiple master in the list, then the master nodes runs in HA mode. For example, you may start three master, on node1: \nbin/master -ip node1 -port 3000\n, on node2: \nbin/master -ip node2 -port 3000\n, on node3: \nbin/master -ip node3 -port 3000\n, then you need to set \ngearpump.cluster.masters = [\"node1:3000\",\"node2:3000\",\"node3:3000\"]\n\n\n\n\n\n\ngearpump.task-dispatcher\n\n\n\"gearpump.shared-thread-pool-dispatcher\"\n\n\ndefault dispatcher for task actor\n\n\n\n\n\n\ngearpump.metrics.enabled\n\n\ntrue\n\n\nflag to enable the metrics system\n\n\n\n\n\n\ngearpump.metrics.sample-rate\n\n\n1\n\n\nWe will take one sample every \ngearpump.metrics.sample-rate\n data points. Note it may have impact that the statistics on UI portal is not accurate. Change it to 1 if you want accurate metrics in UI\n\n\n\n\n\n\ngearpump.metrics.report-interval-ms\n\n\n15000\n\n\nwe will report once every 15 seconds\n\n\n\n\n\n\ngearpump.metrics.reporter\n\n\n\"akka\"\n\n\navailable value: \"graphite\", \"akka\", \"logfile\" which write the metrics data to different places.\n\n\n\n\n\n\ngearpump.retainHistoryData.hours\n\n\n72\n\n\nmax hours of history data to retain, Note: Due to implementation limitation(we store all history in memory), please don't set this to too big which may exhaust memory.\n\n\n\n\n\n\ngearpump.retainHistoryData.intervalMs\n\n\n3600000\n\n\ntime interval between two data points for history data (unit: ms). Usually this is set to a big value so that we only store coarse-grain data\n\n\n\n\n\n\ngearpump.retainRecentData.seconds\n\n\n300\n\n\nmax seconds of recent data to retain. 
This is for the fine-grain data\n\n\n\n\n\n\ngearpump.retainRecentData.intervalMs\n\n\n15000\n\n\ntime interval between two data points for recent data (unit: ms)\n\n\n\n\n\n\ngearpump.log.daemon.dir\n\n\n\"logs\"\n\n\nThe log directory for daemon processes(relative to current working directory)\n\n\n\n\n\n\ngearpump.log.application.dir\n\n\n\"logs\"\n\n\nThe log directory for applications(relative to current working directory)\n\n\n\n\n\n\ngearpump.serializers\n\n\na map\n\n\ncustom serializer for streaming application, e.g. \n\"scala.Array\" = \"\"\n\n\n\n\n\n\ngearpump.worker.slots\n\n\n1000\n\n\nHow many slots each worker contains\n\n\n\n\n\n\ngearpump.appmaster.vmargs\n\n\n\"-server -Xss1M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseParNewGC -XX:NewRatio=3 -Djava.rmi.server.hostname=localhost\"\n\n\nJVM arguments for AppMaster\n\n\n\n\n\n\ngearpump.appmaster.extraClasspath\n\n\n\"\"\n\n\nJVM default class path for AppMaster\n\n\n\n\n\n\ngearpump.executor.vmargs\n\n\n\"-server -Xss1M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseParNewGC -XX:NewRatio=3 -Djava.rmi.server.hostname=localhost\"\n\n\nJVM arguments for executor\n\n\n\n\n\n\ngearpump.executor.extraClasspath\n\n\n\"\"\n\n\nJVM default class path for executor\n\n\n\n\n\n\ngearpump.jarstore.rootpath\n\n\n\"jarstore/\"\n\n\nDefine where the submitted jar file will be stored. This path follows the hadoop path schema. For HDFS, use \nhdfs://host:port/path/\n, and HDFS HA, \nhdfs://namespace/path/\n; if you want to store on master nodes, then use local directory. \njarstore.rootpath = \"jarstore/\"\n will point to relative directory where master is started. \njarstore.rootpath = \"/jarstore/\"\n will point to absolute directory on master server\n\n\n\n\n\n\ngearpump.scheduling.scheduler-class\n\n\n\"org.apache.gearpump.cluster.scheduler.PriorityScheduler\"\n\n\nClass to schedule the applications.\n\n\n\n\n\n\ngearpump.services.host\n\n\n\"127.0.0.1\"\n\n\ndashboard UI host address\n\n\n\n\n\n\ngearpump.services.port\n\n\n8090\n\n\ndashboard UI host port\n\n\n\n\n\n\ngearpump.netty.buffer-size\n\n\n5242880\n\n\nnetty connection buffer size\n\n\n\n\n\n\ngearpump.netty.max-retries\n\n\n30\n\n\nmaximum number of retries for a netty client to connect to remote host\n\n\n\n\n\n\ngearpump.netty.base-sleep-ms\n\n\n100\n\n\nbase sleep time for a netty client to retry a connection. Actual sleep time is a multiple of this value\n\n\n\n\n\n\ngearpump.netty.max-sleep-ms\n\n\n1000\n\n\nmaximum sleep time for a netty client to retry a connection\n\n\n\n\n\n\ngearpump.netty.message-batch-size\n\n\n262144\n\n\nnetty max batch size\n\n\n\n\n\n\ngearpump.netty.flush-check-interval\n\n\n10\n\n\nmax flush interval for the netty layer, in milliseconds\n\n\n\n\n\n\ngearpump.netty.dispatcher\n\n\n\"gearpump.shared-thread-pool-dispatcher\"\n\n\ndefault dispatcher for netty client and server\n\n\n\n\n\n\ngearpump.shared-thread-pool-dispatcher\n\n\ndefault Dispatcher with \"fork-join-executor\"\n\n\ndefault shared thread pool dispatcher\n\n\n\n\n\n\ngearpump.single-thread-dispatcher\n\n\nPinnedDispatcher\n\n\ndefault single thread dispatcher\n\n\n\n\n\n\ngearpump.serialization-framework\n\n\n\"org.apache.gearpump.serializer.FastKryoSerializationFramework\"\n\n\nGearpump has built-in serialization framework using Kryo. Users are allowed to use a different serialization framework, like Protobuf. 
See \norg.apache.gearpump.serializer.FastKryoSerializationFramework\n to find how a custom serialization framework can be defined\n\n\n\n\n\n\nworker.executor-share-same-jvm-as-worker\n\n\nfalse\n\n\nwhether the executor actor is started in the same jvm(process) from which running the worker actor, the intention of this setting is for the convenience of single machine debugging, however, the app jar need to be added to the worker's classpath when you set it true and have a 'real' worker in the cluster",
"title": "Configuration"
},
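For example, a minimal application.conf placed in your working directory might override just the application log level and the metrics sample rate; both keys are described above, and the values here are examples.

## application.conf: overrides gear.conf for jobs submitted from this directory
akka.loglevel = "DEBUG"

gearpump {
  metrics.sample-rate = 1
}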
{
"location": "/deployment/deployment-configuration/index.html#master-and-worker-configuration",
"text": "Master and Worker daemons will only read configuration from conf/gear.conf . Master reads configuration from section master and gearpump: master {\n}\ngearpump{\n} Worker reads configuration from section worker and gearpump: worker {\n}\ngearpump{\n}",
"title": "Master and Worker configuration"
},
{
"location": "/deployment/deployment-configuration/index.html#configuration-for-user-submitted-application-job",
"text": "For user application job, it will read configuration file gear.conf and application.conf from classpath, while application.conf has higher priority.\nThe default classpath contains: conf/ current working directory. For example, you can put a application.conf on your working directory, and then it will be effective when you submit a new job application.",
"title": "Configuration for user submitted application job"
},
{
"location": "/deployment/deployment-configuration/index.html#logging",
"text": "To change the log level, you need to change both gear.conf , and log4j.properties . To change the log level for master and worker daemon Please change log4j.rootLevel in log4j.properties , gearpump-master.akka.loglevel and gearpump-worker.akka.loglevel in gear.conf . To change the log level for application job Please change log4j.rootLevel in log4j.properties , and akka.loglevel in gear.conf or application.conf .",
"title": "Logging"
},
{
"location": "/deployment/deployment-configuration/index.html#gearpump-default-configuration",
"text": "This is the default configuration for gear.conf . config item default value description gearpump.hostname \"127.0.0.1\" hostname of current machine. If you are using local mode, then set this to 127.0.0.1. If you are using cluster mode, make sure this hostname can be accessed by other machines. gearpump.cluster.masters [\"127.0.0.1:3000\"] Config to set the master nodes of the cluster. If there are multiple master in the list, then the master nodes runs in HA mode. For example, you may start three master, on node1: bin/master -ip node1 -port 3000 , on node2: bin/master -ip node2 -port 3000 , on node3: bin/master -ip node3 -port 3000 , then you need to set gearpump.cluster.masters = [\"node1:3000\",\"node2:3000\",\"node3:3000\"] gearpump.task-dispatcher \"gearpump.shared-thread-pool-dispatcher\" default dispatcher for task actor gearpump.metrics.enabled true flag to enable the metrics system gearpump.metrics.sample-rate 1 We will take one sample every gearpump.metrics.sample-rate data points. Note it may have impact that the statistics on UI portal is not accurate. Change it to 1 if you want accurate metrics in UI gearpump.metrics.report-interval-ms 15000 we will report once every 15 seconds gearpump.metrics.reporter \"akka\" available value: \"graphite\", \"akka\", \"logfile\" which write the metrics data to different places. gearpump.retainHistoryData.hours 72 max hours of history data to retain, Note: Due to implementation limitation(we store all history in memory), please don't set this to too big which may exhaust memory. gearpump.retainHistoryData.intervalMs 3600000 time interval between two data points for history data (unit: ms). Usually this is set to a big value so that we only store coarse-grain data gearpump.retainRecentData.seconds 300 max seconds of recent data to retain. This is for the fine-grain data gearpump.retainRecentData.intervalMs 15000 time interval between two data points for recent data (unit: ms) gearpump.log.daemon.dir \"logs\" The log directory for daemon processes(relative to current working directory) gearpump.log.application.dir \"logs\" The log directory for applications(relative to current working directory) gearpump.serializers a map custom serializer for streaming application, e.g. \"scala.Array\" = \"\" gearpump.worker.slots 1000 How many slots each worker contains gearpump.appmaster.vmargs \"-server -Xss1M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseParNewGC -XX:NewRatio=3 -Djava.rmi.server.hostname=localhost\" JVM arguments for AppMaster gearpump.appmaster.extraClasspath \"\" JVM default class path for AppMaster gearpump.executor.vmargs \"-server -Xss1M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseParNewGC -XX:NewRatio=3 -Djava.rmi.server.hostname=localhost\" JVM arguments for executor gearpump.executor.extraClasspath \"\" JVM default class path for executor gearpump.jarstore.rootpath \"jarstore/\" Define where the submitted jar file will be stored. This path follows the hadoop path schema. For HDFS, use hdfs://host:port/path/ , and HDFS HA, hdfs://namespace/path/ ; if you want to store on master nodes, then use local directory. jarstore.rootpath = \"jarstore/\" will point to relative directory where master is started. 
jarstore.rootpath = \"/jarstore/\" will point to absolute directory on master server gearpump.scheduling.scheduler-class \"org.apache.gearpump.cluster.scheduler.PriorityScheduler\" Class to schedule the applications. gearpump.services.host \"127.0.0.1\" dashboard UI host address gearpump.services.port 8090 dashboard UI host port gearpump.netty.buffer-size 5242880 netty connection buffer size gearpump.netty.max-retries 30 maximum number of retries for a netty client to connect to remote host gearpump.netty.base-sleep-ms 100 base sleep time for a netty client to retry a connection. Actual sleep time is a multiple of this value gearpump.netty.max-sleep-ms 1000 maximum sleep time for a netty client to retry a connection gearpump.netty.message-batch-size 262144 netty max batch size gearpump.netty.flush-check-interval 10 max flush interval for the netty layer, in milliseconds gearpump.netty.dispatcher \"gearpump.shared-thread-pool-dispatcher\" default dispatcher for netty client and server gearpump.shared-thread-pool-dispatcher default Dispatcher with \"fork-join-executor\" default shared thread pool dispatcher gearpump.single-thread-dispatcher PinnedDispatcher default single thread dispatcher gearpump.serialization-framework \"org.apache.gearpump.serializer.FastKryoSerializationFramework\" Gearpump has built-in serialization framework using Kryo. Users are allowed to use a different serialization framework, like Protobuf. See org.apache.gearpump.serializer.FastKryoSerializationFramework to find how a custom serialization framework can be defined worker.executor-share-same-jvm-as-worker false whether the executor actor is started in the same jvm(process) from which running the worker actor, the intention of this setting is for the convenience of single machine debugging, however, the app jar need to be added to the worker's classpath when you set it true and have a 'real' worker in the cluster",
"title": "Gearpump Default Configuration"
},
{
"location": "/deployment/deployment-resource-isolation/index.html",
"text": "CGroup (abbreviated from control groups) is a Linux kernel feature to limit, account, and isolate resource usage (CPU, memory, disk I/O, etc.) of process groups.In Gearpump, we use cgroup to manage CPU resources.\n\n\nStart CGroup Service\n\n\nCGroup feature is only supported by Linux whose kernel version is larger than 2.6.18. Please also make sure the SELinux is disabled before start CGroup.\n\n\nThe following steps are supposed to be executed by root user.\n\n\n\n\n\n\nCheck \n/etc/cgconfig.conf\n exist or not. If not exists, please \nyum install libcgroup\n.\n\n\n\n\n\n\nRun following command to see whether the \ncpu\n subsystem is already mounted to the file system.\n\n\nlssubsys -m\n\n\n\n\nEach subsystem in CGroup will have a corresponding mount file path in local file system. For example, the following output shows that \ncpu\n subsystem is mounted to file path \n/sys/fs/cgroup/cpu\n\n\ncpu /sys/fs/cgroup/cpu\nnet_cls /sys/fs/cgroup/net_cls\nblkio /sys/fs/cgroup/blkio\nperf_event /sys/fs/cgroup/perf_event\n\n\n\n\n\n\n\n\nIf you want to assign permission to user \ngear\n to launch Gearpump Worker and applications with resource isolation enabled, you need to check gear's uid and gid in \n/etc/passwd\n file, let's take \n500\n for example.\n\n\n\n\n\n\nAdd following content to \n/etc/cgconfig.conf\n\n\n# The mount point of cpu subsystem.\n# If your system already mounted it, this segment should be eliminated.\nmount { \n cpu = /cgroup/cpu;\n}\n\n# Here the group name \"gearpump\" represents a node in CGroup's hierarchy tree.\n# When the CGroup service is started, there will be a folder generated under the mount point of cpu subsystem,\n# whose name is \"gearpump\".\n\ngroup gearpump {\n perm {\n task {\n uid = 500;\n gid = 500;\n }\n admin {\n uid = 500;\n gid = 500;\n }\n }\n cpu {\n }\n}\n\n\n\n\n\n\n\n\nPlease note that if the output of step 2 shows that \ncpu\n subsystem is already mounted, then the \nmount\n segment should not be included.\n\n\n\n\n\n\nThen Start cgroup service\n\n\nsudo service cgconfig restart\n\n\n\n\n\n\n\n\nThere should be a folder \ngearpump\n generated under the mount point of cpu subsystem and its owner is \ngear:gear\n. \n\n\n\n\n\n\nRepeat the above-mentioned steps on each machine where you want to launch Gearpump. 
\n\n\n\n\n\n\nEnable Cgroups in Gearpump\n\n\n\n\n\n\nLogin into the machine which has CGroup prepared with user \ngear\n.\n\n\nssh gear@node\n\n\n\n\n\n\n\n\nEnter into Gearpump's home folder, edit gear.conf under folder \n${GEARPUMP_HOME}/conf/\n\n\ngearpump.worker.executor-process-launcher = \"org.apache.gearpump.cluster.worker.CGroupProcessLauncher\"\n\ngearpump.cgroup.root = \"gearpump\"\n\n\n\n\n\n\n\n\nPlease note the gearpump.cgroup.root \ngearpump\n must be consistent with the group name in /etc/cgconfig.conf.\n\n\n\n\n\n\nRepeat the above-mentioned steps on each machine where you want to launch Gearpump\n\n\n\n\n\n\nStart the Gearpump cluster, please refer to \nDeploy Gearpump in Standalone Mode\n\n\n\n\n\n\nLaunch Application From Command Line\n\n\n\n\n\n\nLogin into the machine which has Gearpump distribution.\n\n\n\n\n\n\nEnter into Gearpump's home folder, edit gear.conf under folder \n${GEARPUMP_HOME}/conf/\n\n\ngearpump.cgroup.cpu-core-limit-per-executor = ${your_preferred_int_num}\n\n\n\n\n\n\n\n\nHere the configuration is the number of CPU cores per executor can use and -1 means no limitation\n\n\n\n\n\n\nSubmit application\n\n\nbin/gear app -jar examples/sol-2.11-0.8.3-assembly.jar -streamProducer 10 -streamProcessor 10\n\n\n\n\n\n\n\n\nThen you can run command \ntop\n to monitor the cpu usage.\n\n\n\n\n\n\nLaunch Application From Dashboard\n\n\nIf you want to submit the application from dashboard, by default the \ngearpump.cgroup.cpu-core-limit-per-executor\n is inherited from Worker's configuration. You can provide your own conf file to override it.\n\n\nLimitations\n\n\nWindows and Mac OS X don't support CGroup, so the resource isolation will not work even if you turn it on. There will not be any limitation for single executor's cpu usage.",
"title": "Resource Isolation"
},
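To double-check the setup after restarting the cgconfig service, the standard commands below can be used. The mount point /sys/fs/cgroup/cpu is an example; use the path reported by lssubsys -m on your machine.

## confirm the cpu subsystem mount point
lssubsys -m

## confirm the gearpump group folder exists and is owned by user gear
ls -ld /sys/fs/cgroup/cpu/gearpump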
{
"location": "/deployment/deployment-resource-isolation/index.html#start-cgroup-service",
"text": "CGroup feature is only supported by Linux whose kernel version is larger than 2.6.18. Please also make sure the SELinux is disabled before start CGroup. The following steps are supposed to be executed by root user. Check /etc/cgconfig.conf exist or not. If not exists, please yum install libcgroup . Run following command to see whether the cpu subsystem is already mounted to the file system. lssubsys -m Each subsystem in CGroup will have a corresponding mount file path in local file system. For example, the following output shows that cpu subsystem is mounted to file path /sys/fs/cgroup/cpu cpu /sys/fs/cgroup/cpu\nnet_cls /sys/fs/cgroup/net_cls\nblkio /sys/fs/cgroup/blkio\nperf_event /sys/fs/cgroup/perf_event If you want to assign permission to user gear to launch Gearpump Worker and applications with resource isolation enabled, you need to check gear's uid and gid in /etc/passwd file, let's take 500 for example. Add following content to /etc/cgconfig.conf # The mount point of cpu subsystem.\n# If your system already mounted it, this segment should be eliminated.\nmount { \n cpu = /cgroup/cpu;\n}\n\n# Here the group name \"gearpump\" represents a node in CGroup's hierarchy tree.\n# When the CGroup service is started, there will be a folder generated under the mount point of cpu subsystem,\n# whose name is \"gearpump\".\n\ngroup gearpump {\n perm {\n task {\n uid = 500;\n gid = 500;\n }\n admin {\n uid = 500;\n gid = 500;\n }\n }\n cpu {\n }\n} Please note that if the output of step 2 shows that cpu subsystem is already mounted, then the mount segment should not be included. Then Start cgroup service sudo service cgconfig restart There should be a folder gearpump generated under the mount point of cpu subsystem and its owner is gear:gear . Repeat the above-mentioned steps on each machine where you want to launch Gearpump.",
"title": "Start CGroup Service"
},
{
"location": "/deployment/deployment-resource-isolation/index.html#enable-cgroups-in-gearpump",
"text": "Login into the machine which has CGroup prepared with user gear . ssh gear@node Enter into Gearpump's home folder, edit gear.conf under folder ${GEARPUMP_HOME}/conf/ gearpump.worker.executor-process-launcher = \"org.apache.gearpump.cluster.worker.CGroupProcessLauncher\"\n\ngearpump.cgroup.root = \"gearpump\" Please note the gearpump.cgroup.root gearpump must be consistent with the group name in /etc/cgconfig.conf. Repeat the above-mentioned steps on each machine where you want to launch Gearpump Start the Gearpump cluster, please refer to Deploy Gearpump in Standalone Mode",
"title": "Enable Cgroups in Gearpump"
},
{
"location": "/deployment/deployment-resource-isolation/index.html#launch-application-from-command-line",
"text": "Login into the machine which has Gearpump distribution. Enter into Gearpump's home folder, edit gear.conf under folder ${GEARPUMP_HOME}/conf/ gearpump.cgroup.cpu-core-limit-per-executor = ${your_preferred_int_num} Here the configuration is the number of CPU cores per executor can use and -1 means no limitation Submit application bin/gear app -jar examples/sol-2.11-0.8.3-assembly.jar -streamProducer 10 -streamProcessor 10 Then you can run command top to monitor the cpu usage.",
"title": "Launch Application From Command Line"
},
{
"location": "/deployment/deployment-resource-isolation/index.html#launch-application-from-dashboard",
"text": "If you want to submit the application from dashboard, by default the gearpump.cgroup.cpu-core-limit-per-executor is inherited from Worker's configuration. You can provide your own conf file to override it.",
"title": "Launch Application From Dashboard"
},
{
"location": "/deployment/deployment-resource-isolation/index.html#limitations",
"text": "Windows and Mac OS X don't support CGroup, so the resource isolation will not work even if you turn it on. There will not be any limitation for single executor's cpu usage.",
"title": "Limitations"
},
{
"location": "/deployment/deployment-security/index.html",
"text": "Until now Gearpump supports deployment in a secured Yarn cluster and writing to secured HBase, where \"secured\" means Kerberos enabled. \nFurther security related feature is in progress.\n\n\nHow to launch Gearpump in a secured Yarn cluster\n\n\nSuppose user \ngear\n will launch gearpump on YARN, then the corresponding principal \ngear\n should be created in KDC server.\n\n\n\n\n\n\nCreate Kerberos principal for user \ngear\n, on the KDC machine\n\n\nsudo kadmin.local\n\n\n\n\nIn the kadmin.local or kadmin shell, create the principal\n\n\nkadmin: addprinc gear/fully.qualified.domain.name@YOUR-REALM.COM\n\n\n\n\nRemember that user \ngear\n must exist on every node of Yarn. \n\n\n\n\n\n\nUpload the gearpump-2.11-0.8.3.zip to remote HDFS Folder, suggest to put it under \n/usr/lib/gearpump/gearpump-2.11-0.8.3.zip\n\n\n\n\n\n\nCreate HDFS folder /user/gear/, make sure all read-write rights are granted for user \ngear\n\n\ndrwxr-xr-x - gear gear 0 2015-11-27 14:03 /user/gear\n\n\n\n\n\n\n\n\nPut the YARN configurations under classpath.\n Before calling \nyarnclient launch\n, make sure you have put all yarn configuration files under classpath. Typically, you can just copy all files under \n$HADOOP_HOME/etc/hadoop\n from one of the YARN cluster machine to \nconf/yarnconf\n of gearpump. \n$HADOOP_HOME\n points to the Hadoop installation directory. \n\n\n\n\n\n\nGet Kerberos credentials to submit the job:\n\n\nkinit gearpump/fully.qualified.domain.name@YOUR-REALM.COM\n\n\n\n\nHere you can login with keytab or password. Please refer Kerberos's document for details.\n\n\nyarnclient launch -package /usr/lib/gearpump/gearpump-2.11-0.8.3.zip\n\n\n\n\n\n\n\n\nHow to write to secured HBase\n\n\nWhen the remote HBase is security enabled, a kerberos keytab and the corresponding principal name need to be\nprovided for the gearpump-hbase connector. Specifically, the \nUserConfig\n object passed into the HBaseSink should contain\n\n{(\"gearpump.keytab.file\", \"\\\\$keytab\"), (\"gearpump.kerberos.principal\", \"\\\\$principal\")}\n. example code of writing to secured HBase:\n\n\nval principal = \"gearpump/fully.qualified.domain.name@YOUR-REALM.COM\"\nval keytabContent = Files.toByteArray(new File(\"path_to_keytab_file\"))\nval appConfig = UserConfig.empty\n .withString(\"gearpump.kerberos.principal\", principal)\n .withBytes(\"gearpump.keytab.file\", keytabContent)\nval sink = new HBaseSink(appConfig, \"$tableName\")\nval sinkProcessor = DataSinkProcessor(sink, \"$sinkNum\")\nval split = Processor[Split](\"$splitNum\")\nval computation = split ~> sinkProcessor\nval application = StreamApplication(\"HBase\", Graph(computation), UserConfig.empty)\n\n\n\n\nNote here the keytab file set into config should be a byte array.\n\n\nFuture Plan\n\n\nMore external components support\n\n\n\n\nHDFS\n\n\nKafka\n\n\n\n\nAuthentication(Kerberos)\n\n\nSince Gearpump\u2019s Master-Worker structure is similar to HDFS\u2019s NameNode-DataNode and Yarn\u2019s ResourceManager-NodeManager, we may follow the way they use.\n\n\n\n\nUser creates kerberos principal and keytab for Gearpump.\n\n\nDeploy the keytab files to all the cluster nodes.\n\n\nConfigure Gearpump\u2019s conf file, specify kerberos principal and local keytab file location.\n\n\nStart Master and Worker.\n\n\n\n\nEvery application has a submitter/user. We will separate the application from different users, like different log folders for different applications. 
\nOnly authenticated users can submit the application to Gearpump's Master.\n\n\nAuthorization\n\n\nHopefully more on this soon",
"title": "YARN Security Guide"
},
{
"location": "/deployment/deployment-security/index.html#how-to-launch-gearpump-in-a-secured-yarn-cluster",
"text": "Suppose user gear will launch gearpump on YARN, then the corresponding principal gear should be created in KDC server. Create Kerberos principal for user gear , on the KDC machine sudo kadmin.local In the kadmin.local or kadmin shell, create the principal kadmin: addprinc gear/fully.qualified.domain.name@YOUR-REALM.COM Remember that user gear must exist on every node of Yarn. Upload the gearpump-2.11-0.8.3.zip to remote HDFS Folder, suggest to put it under /usr/lib/gearpump/gearpump-2.11-0.8.3.zip Create HDFS folder /user/gear/, make sure all read-write rights are granted for user gear drwxr-xr-x - gear gear 0 2015-11-27 14:03 /user/gear Put the YARN configurations under classpath.\n Before calling yarnclient launch , make sure you have put all yarn configuration files under classpath. Typically, you can just copy all files under $HADOOP_HOME/etc/hadoop from one of the YARN cluster machine to conf/yarnconf of gearpump. $HADOOP_HOME points to the Hadoop installation directory. Get Kerberos credentials to submit the job: kinit gearpump/fully.qualified.domain.name@YOUR-REALM.COM Here you can login with keytab or password. Please refer Kerberos's document for details. yarnclient launch -package /usr/lib/gearpump/gearpump-2.11-0.8.3.zip",
"title": "How to launch Gearpump in a secured Yarn cluster"
},
{
"location": "/deployment/deployment-security/index.html#how-to-write-to-secured-hbase",
"text": "When the remote HBase is security enabled, a kerberos keytab and the corresponding principal name need to be\nprovided for the gearpump-hbase connector. Specifically, the UserConfig object passed into the HBaseSink should contain {(\"gearpump.keytab.file\", \"\\\\$keytab\"), (\"gearpump.kerberos.principal\", \"\\\\$principal\")} . example code of writing to secured HBase: val principal = \"gearpump/fully.qualified.domain.name@YOUR-REALM.COM\"\nval keytabContent = Files.toByteArray(new File(\"path_to_keytab_file\"))\nval appConfig = UserConfig.empty\n .withString(\"gearpump.kerberos.principal\", principal)\n .withBytes(\"gearpump.keytab.file\", keytabContent)\nval sink = new HBaseSink(appConfig, \"$tableName\")\nval sinkProcessor = DataSinkProcessor(sink, \"$sinkNum\")\nval split = Processor[Split](\"$splitNum\")\nval computation = split ~> sinkProcessor\nval application = StreamApplication(\"HBase\", Graph(computation), UserConfig.empty) Note here the keytab file set into config should be a byte array.",
"title": "How to write to secured HBase"
},
{
"location": "/deployment/deployment-security/index.html#future-plan",
"text": "More external components support HDFS Kafka Authentication(Kerberos) Since Gearpump\u2019s Master-Worker structure is similar to HDFS\u2019s NameNode-DataNode and Yarn\u2019s ResourceManager-NodeManager, we may follow the way they use. User creates kerberos principal and keytab for Gearpump. Deploy the keytab files to all the cluster nodes. Configure Gearpump\u2019s conf file, specify kerberos principal and local keytab file location. Start Master and Worker. Every application has a submitter/user. We will separate the application from different users, like different log folders for different applications. \nOnly authenticated users can submit the application to Gearpump's Master. Authorization Hopefully more on this soon",
"title": "Future Plan"
},
{
"location": "/deployment/get-gearpump-distribution/index.html",
"text": "Prepare the binary\n\n\nYou can either download pre-build release package or choose to build from source code.\n\n\nDownload Release Binary\n\n\nIf you choose to use pre-build package, then you don't need to build from source code. The release package can be downloaded from:\n\n\nDownload page\n\n\nBuild from Source code\n\n\nIf you choose to build the package from source code yourself, you can follow these steps:\n\n\n1). Clone the Gearpump repository\n\n\ngit clone https://github.com/apache/incubator-gearpump.git\n\n\n\n\ncd gearpump\n\n\n2). Build package\n\n\n## Please use scala 2.11\n## The target package path: output/target/gearpump-2.11-0.8.3.zip\nsbt clean assembly packArchiveZip\n\n\n\n\nAfter the build, there will be a package file gearpump-2.11-0.8.3.zip generated under output/target/ folder.\n\n\nNOTE:\n\n Please set JAVA_HOME environment before the build.\n\n\nOn linux:\n\n\nexport JAVA_HOME={path/to/jdk/root/path}\n\n\n\n\nOn Windows:\n\n\nset JAVA_HOME={path/to/jdk/root/path}\n\n\n\n\nNOTE:\n\nThe build requires network connection. If you are behind an enterprise proxy, make sure you have set the proxy in your env before running the build commands.\nFor windows:\n\n\nset HTTP_PROXY=http://host:port\nset HTTPS_PROXY= http://host:port\n\n\n\n\nFor Linux:\n\n\nexport HTTP_PROXY=http://host:port\nexport HTTPS_PROXY= http://host:port\n\n\n\n\nGearpump package structure\n\n\nYou need to flatten the \n.zip\n file to use it. On Linux, you can\n\n\nunzip gearpump-2.11-0.8.3.zip\n\n\n\n\nAfter decompression, the directory structure looks like picture 1.\n\n\n\n\nUnder bin/ folder, there are script files for Linux(bash script) and Windows(.bat script).\n\n\n\n\n\n\n\n\nscript\n\n\nfunction\n\n\n\n\n\n\n\n\n\n\nlocal\n\n\nYou can start the Gearpump cluster in single JVM(local mode), or in a distributed cluster(cluster mode). To start the cluster in local mode, you can use the local /local.bat helper scripts, it is very useful for developing or troubleshooting.\n\n\n\n\n\n\nmaster\n\n\nTo start Gearpump in cluster mode, you need to start one or more master nodes, which represent the global resource management center. master/master.bat is launcher script to boot the master node.\n\n\n\n\n\n\nworker\n\n\nTo start Gearpump in cluster mode, you also need to start several workers, with each worker represent a set of local resources. worker/worker.bat is launcher script to start the worker node.\n\n\n\n\n\n\nservices\n\n\nThis script is used to start backend REST service and other services for frontend UI dashboard (Default user \"admin, admin\").\n\n\n\n\n\n\n\n\nPlease check \nCommand Line Syntax\n for more information for each script.",
"title": "How to Get Your Gearpump Distribution"
},
{
"location": "/deployment/hardware-requirement/index.html",
"text": "Pre-requisite\n\n\nGearpump cluster can be installed on Windows OS and Linux.\n\n\nBefore installation, you need to decide how many machines are used to run this cluster.\n\n\nFor each machine, the requirements are listed in table below.\n\n\nTable: Environment requirement on single machine\n\n\n\n\n\n\n\n\nResource\n\n\nRequirements\n\n\n\n\n\n\n\n\n\n\nMemory\n\n\n2GB free memory is required to run the cluster. For any production system, 32GB memory is recommended.\n\n\n\n\n\n\nJava\n\n\nJRE 6 or above\n\n\n\n\n\n\nUser permission\n\n\nRoot permission is not required\n\n\n\n\n\n\nNetwork Ethernet\n\n\n(TCP/IP)\n\n\n\n\n\n\nCPU\n\n\nNothing special\n\n\n\n\n\n\nHDFS installation\n\n\nDefault is not required. You only need to install it when you want to store the application jars in HDFS.\n\n\n\n\n\n\nKafka installation\n\n\nDefault is not required. You need to install Kafka when you want the at-least once message delivery feature. Currently, the only supported data source for this feature is Kafka\n\n\n\n\n\n\n\n\nTable: The default port used in Gearpump:\n\n\n\n\n\n\n\n\nusage\n\n\nPort\n\n\nDescription\n\n\n\n\n\n\n\n\n\n\nDashboard UI\n\n\n8090\n\n\nWeb UI.\n\n\n\n\n\n\nDashboard web socket service\n\n\n8091\n\n\nUI backend web socket service for long connection.\n\n\n\n\n\n\nMaster port\n\n\n3000\n\n\nEvery other role like worker, appmaster, executor, user use this port to communicate with Master.\n\n\n\n\n\n\n\n\nYou need to ensure that your firewall has not banned these ports to ensure Gearpump can work correctly.\nAnd you can modify the port configuration. Check \nConfiguration\n section for details.",
"title": "Hardware Requirement"
},
{
"location": "/dev/dev-write-1st-app/index.html",
"text": "Write your first Gearpump Application\n\n\nWe'll use the classical \nwordcount\n example to illustrate how to write Gearpump applications.\n\n\n/** WordCount with High level DSL */\nobject WordCount extends AkkaApp with ArgumentsParser {\n\n override val options: Array[(String, CLIOption[Any])] = Array.empty\n\n override def main(akkaConf: Config, args: Array[String]): Unit = {\n val context = ClientContext(akkaConf)\n val app = StreamApp(\"dsl\", context)\n val data = \"This is a good start, bingo!! bingo!!\"\n\n //count for each word and output to log\n app.source(data.lines.toList, 1, \"source\").\n // word => (word, count)\n flatMap(line => line.split(\"[\\\\s]+\")).map((_, 1)).\n // (word, count1), (word, count2) => (word, count1 + count2)\n groupByKey().sum.log\n\n context.submit(app).waitUntilFinish()\n context.close()\n }\n}\n\n\n\n\nThe example is written in our [Stream DSL]\n(http://gearpump.apache.org/releases/latest/api/scala/index.html#org.apache.gearpump.streaming.dsl.Stream), which provides you with convenient combinators (e.g. \nflatMap\n, \ngroupByKey\n) to easily write up transformations.\n\n\nIDE Setup (Optional)\n\n\nYou can get your preferred IDE ready for Gearpump by following \nthis guide\n.\n\n\nSubmit application\n\n\nFinally, you need to package everything into a uber jar with \nproper dependencies\n and submit it to a Gearpump cluster. Please check out the \napplication submission tool\n.",
"title": "Write Your 1st App"
},
{
"location": "/dev/dev-write-1st-app/index.html#write-your-first-gearpump-application",
"text": "We'll use the classical wordcount example to illustrate how to write Gearpump applications. /** WordCount with High level DSL */\nobject WordCount extends AkkaApp with ArgumentsParser {\n\n override val options: Array[(String, CLIOption[Any])] = Array.empty\n\n override def main(akkaConf: Config, args: Array[String]): Unit = {\n val context = ClientContext(akkaConf)\n val app = StreamApp(\"dsl\", context)\n val data = \"This is a good start, bingo!! bingo!!\"\n\n //count for each word and output to log\n app.source(data.lines.toList, 1, \"source\").\n // word => (word, count)\n flatMap(line => line.split(\"[\\\\s]+\")).map((_, 1)).\n // (word, count1), (word, count2) => (word, count1 + count2)\n groupByKey().sum.log\n\n context.submit(app).waitUntilFinish()\n context.close()\n }\n} The example is written in our [Stream DSL]\n(http://gearpump.apache.org/releases/latest/api/scala/index.html#org.apache.gearpump.streaming.dsl.Stream), which provides you with convenient combinators (e.g. flatMap , groupByKey ) to easily write up transformations. IDE Setup (Optional) You can get your preferred IDE ready for Gearpump by following this guide . Submit application Finally, you need to package everything into a uber jar with proper dependencies and submit it to a Gearpump cluster. Please check out the application submission tool .",
"title": "Write your first Gearpump Application"
},
{
"location": "/dev/dev-custom-serializer/index.html",
"text": "Gearpump has a built-in serialization framework with a shaded Kryo version, which allows you to customize how a specific message type can be serialized. \n\n\nRegister a class before serialization.\n\n\nNote, to use built-in kryo serialization framework, Gearpump requires all classes to be registered explicitly before using, no matter you want to use a custom serializer or not. If not using custom serializer, Gearpump will use default com.esotericsoftware.kryo.serializers.FieldSerializer to serialize the class. \n\n\nTo register a class, you need to change the configuration file gear.conf(or application.conf if you want it only take effect for single application).\n\n\ngearpump {\n serializers {\n ## We will use default FieldSerializer to serialize this class type\n \"org.apache.gearpump.UserMessage\" = \"\"\n\n ## we will use custom serializer to serialize this class type\n \"org.apache.gearpump.UserMessage2\" = \"org.apache.gearpump.UserMessageSerializer\"\n }\n}\n\n\n\n\nHow to define a custom serializer for built-in kryo serialization framework\n\n\nWhen you decide that you want to define a custom serializer, you can do this in two ways.\n\n\nPlease note that Gearpump shaded the original Kryo dependency. The package name \ncom.esotericsoftware\n was relocated to \norg.apache.gearpump.esotericsoftware\n. So in the following customization, you should import corresponding shaded classes, the example code will show that part.\n\n\nIn general you should use the shaded version of a library whenever possible in order to avoid binary incompatibilities, eg don't use:\n\n\nimport com.google.common.io.Files\n\n\n\n\nbut rather\n\n\nimport org.apache.gearpump.google.common.io.Files\n\n\n\n\nSystem Level Serializer\n\n\nIf the serializer is widely used, you can define a global serializer which is available to all applications(or worker or master) in the system.\n\n\nStep1: you first need to develop a java library which contains the custom serializer class. here is an example:\n\n\npackage org.apache.gearpump\n\nimport org.apache.gearpump.esotericsoftware.kryo.{Kryo, Serializer}\nimport org.apache.gearpump.esotericsoftware.kryo.io.{Input, Output}\n\nclass UserMessage(longField: Long, intField: Int)\n\nclass UserMessageSerializer extends Serializer[UserMessage] {\n override def write(kryo: Kryo, output: Output, obj: UserMessage) = {\n output.writeLong(obj.longField)\n output.writeInt(obj.intField)\n }\n\n override def read(kryo: Kryo, input: Input, typ: Class[UserMessage]): UserMessage = {\n val longField = input.readLong()\n val intField = input.readInt()\n new UserMessage(longField, intField)\n }\n}\n\n\n\n\nStep2: Distribute the libraries\n\n\nDistribute the jar file to lib/ folder of every Gearpump installation in the cluster.\n\n\nStep3: change gear.conf on every machine of the cluster:\n\n\ngearpump {\n serializers {\n \"org.apache.gearpump.UserMessage\" = \"org.apache.gearpump.UserMessageSerializer\"\n }\n}\n\n\n\n\nAll set!\n\n\nDefine Application level custom serializer\n\n\nIf all you want is to define an application level serializer, which is only visible to current application AppMaster and Executors(including tasks), you can follow a different approach.\n\n\nStep1: Define your custom Serializer class\n\n\nYou should include the Serializer class in your application jar. 
Here is an example to define a custom serializer:\n\n\npackage org.apache.gearpump\n\nimport org.apache.gearpump.esotericsoftware.kryo.{Kryo, Serializer}\nimport org.apache.gearpump.esotericsoftware.kryo.io.{Input, Output}\n\nclass UserMessage(longField: Long, intField: Int)\n\nclass UserMessageSerializer extends Serializer[UserMessage] {\n override def write(kryo: Kryo, output: Output, obj: UserMessage) = {\n output.writeLong(obj.longField)\n output.writeInt(obj.intField)\n }\n\n override def read(kryo: Kryo, input: Input, typ: Class[UserMessage]): UserMessage = {\n val longField = input.readLong()\n val intField = input.readInt()\n new UserMessage(longField, intField)\n }\n}\n\n\n\n\nStep2: Put a application.conf in your classpath on Client machine where you submit the application,\n\n\n### content of application.conf\ngearpump {\n serializers {\n \"org.apache.gearpump.UserMessage\" = \"org.apache.gearpump.UserMessageSerializer\"\n }\n}\n\n\n\n\nStep3: All set!\n\n\nAdvanced: Choose another serialization framework\n\n\nNote: This is only for advanced user which require deep customization of Gearpump platform.\n\n\nThere are other serialization framework besides Kryo, like Protobuf. If user don't want to use the built-in kryo serialization framework, he can customize a new serialization framework. \n\n\nbasically, user need to define in gear.conf(or application.conf for single application's scope) file like this:\n\n\ngearpump.serialization-framework = \"org.apache.gearpump.serializer.CustomSerializationFramework\"\n\n\n\n\nPlease find an example in gearpump storm module, search \"StormSerializationFramework\" in source code.",
"title": "Customized Message Passing"
},
{
"location": "/dev/dev-connectors/index.html",
"text": "Basic Concepts\n\n\nDataSource\n and \nDataSink\n are the two main concepts Gearpump use to connect with the outside world.\n\n\nDataSource\n\n\nDataSource\n is the start point of a streaming processing flow. \n\n\nDataSink\n\n\nDataSink\n is the end point of a streaming processing flow.\n\n\nImplemented Connectors\n\n\nDataSource\n implemented\n\n\nCurrently, we have following \nDataSource\n supported.\n\n\n\n\n\n\n\n\nName\n\n\nDescription\n\n\n\n\n\n\n\n\n\n\nCollectionDataSource\n\n\nConvert a collection to a recursive data source. E.g. \nseq(1, 2, 3)\n will output \n1,2,3,1,2,3...\n.\n\n\n\n\n\n\nKafkaSource\n\n\nRead from Kafka.\n\n\n\n\n\n\n\n\nDataSink\n implemented\n\n\nCurrently, we have following \nDataSink\n supported.\n\n\n\n\n\n\n\n\nName\n\n\nDescription\n\n\n\n\n\n\n\n\n\n\nHBaseSink\n\n\nWrite the message to HBase. The message to write must be HBase \nPut\n or a tuple of \n(rowKey, family, column, value)\n.\n\n\n\n\n\n\nKafkaSink\n\n\nWrite to Kafka.\n\n\n\n\n\n\n\n\nUse of Connectors\n\n\nUse of Kafka connectors\n\n\nTo use Kafka connectors in your application, you first need to add the \ngearpump-external-kafka\n library dependency in your application:\n\n\nSBT\n\n\n\"org.apache.gearpump\" %% \"gearpump-external-kafka\" % 0.8.3\n\n\n\n\nXML\n\n\n<dependency>\n <groupId>org.apache.gearpump</groupId>\n <artifactId>gearpump-external-kafka</artifactId>\n <version>0.8.3</version>\n</dependency>\n\n\n\n\nThis is a simple example to read from Kafka and write it back using \nKafkaSource\n and \nKafkaSink\n. Users can optionally set a \nCheckpointStoreFactory\n such that Kafka offsets are checkpointed and at-least-once message delivery is guaranteed. \n\n\nLow level API\n\n\nval appConfig = UserConfig.empty\nval props = new Properties\nprops.put(KafkaConfig.ZOOKEEPER_CONNECT_CONFIG, zookeeperConnect)\nprops.put(KafkaConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList)\nprops.put(KafkaConfig.CHECKPOINT_STORE_NAME_PREFIX_CONFIG, appName)\nval source = new KafkaSource(sourceTopic, props)\nval checkpointStoreFactory = new KafkaStoreFactory(props)\nsource.setCheckpointStore(checkpointStoreFactory)\nval sourceProcessor = DataSourceProcessor(source, sourceNum)\nval sink = new KafkaSink(sinkTopic, props)\nval sinkProcessor = DataSinkProcessor(sink, sinkNum)\nval partitioner = new ShufflePartitioner\nval computation = sourceProcessor ~ partitioner ~> sinkProcessor\nval app = StreamApplication(appName, Graph(computation), appConfig)\n\n\n\n\nHigh level API\n\n\nval props = new Properties\nval appName = \"KafkaDSL\"\nprops.put(KafkaConfig.ZOOKEEPER_CONNECT_CONFIG, zookeeperConnect)\nprops.put(KafkaConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList)\nprops.put(KafkaConfig.CHECKPOINT_STORE_NAME_PREFIX_CONFIG, appName)\n\nval app = StreamApp(appName, context)\n\nif (atLeastOnce) {\n val checkpointStoreFactory = new KafkaStoreFactory(props)\n KafkaDSL.createAtLeastOnceStream(app, sourceTopic, checkpointStoreFactory, props, sourceNum)\n .writeToKafka(sinkTopic, props, sinkNum)\n} else {\n KafkaDSL.createAtMostOnceStream(app, sourceTopic, props, sourceNum)\n .writeToKafka(sinkTopic, props, sinkNum)\n}\n\n\n\n\nIn the above example, configurations are set through Java properties and shared by \nKafkaSource\n, \nKafkaSink\n and \nKafkaCheckpointStoreFactory\n.\nTheir configurations can be defined differently as below. 
\n\n\nKafkaSource\n configurations\n\n\n\n\n\n\n\n\nName\n\n\nDescriptions\n\n\nType\n\n\nDefault\n\n\n\n\n\n\n\n\n\n\nKafkaConfig.ZOOKEEPER_CONNECT_CONFIG\n\n\nZookeeper connect string for Kafka topics management\n\n\nString\n\n\n\n\n\n\n\n\nKafkaConfig.CLIENT_ID_CONFIG\n\n\nAn id string to pass to the server when making requests\n\n\nString\n\n\n\"\"\n\n\n\n\n\n\nKafkaConfig.GROUP_ID_CONFIG\n\n\nA string that uniquely identifies a set of consumers within the same consumer group\n\n\n\"\"\n\n\n\n\n\n\n\n\nKafkaConfig.FETCH_SLEEP_MS_CONFIG\n\n\nThe amount of time(ms) to sleep when hitting fetch.threshold\n\n\nInt\n\n\n100\n\n\n\n\n\n\nKafkaConfig.FETCH_THRESHOLD_CONFIG\n\n\nSize of internal queue to keep Kafka messages. Stop fetching and go to sleep when hitting the threshold\n\n\nInt\n\n\n10000\n\n\n\n\n\n\nKafkaConfig.PARTITION_GROUPER_CLASS_CONFIG\n\n\nPartition grouper class to group partitions among source tasks\n\n\nClass\n\n\nDefaultPartitionGrouper\n\n\n\n\n\n\nKafkaConfig.MESSAGE_DECODER_CLASS_CONFIG\n\n\nMessage decoder class to decode raw bytes from Kafka\n\n\nClass\n\n\nDefaultMessageDecoder\n\n\n\n\n\n\nKafkaConfig.TIMESTAMP_FILTER_CLASS_CONFIG\n\n\nTimestamp filter class to filter out late messages\n\n\nClass\n\n\nDefaultTimeStampFilter\n\n\n\n\n\n\n\n\nKafkaSink\n configurations\n\n\n\n\n\n\n\n\nName\n\n\nDescriptions\n\n\nType\n\n\nDefault\n\n\n\n\n\n\n\n\n\n\nKafkaConfig.BOOTSTRAP_SERVERS_CONFIG\n\n\nA list of host/port pairs to use for establishing the initial connection to the Kafka cluster\n\n\nString\n\n\n\n\n\n\n\n\nKafkaConfig.CLIENT_ID_CONFIG\n\n\nAn id string to pass to the server when making requests\n\n\nString\n\n\n\"\"\n\n\n\n\n\n\n\n\nKafkaCheckpointStoreFactory\n configurations\n\n\n\n\n\n\n\n\nName\n\n\nDescriptions\n\n\nType\n\n\nDefault\n\n\n\n\n\n\n\n\n\n\nKafkaConfig.ZOOKEEPER_CONNECT_CONFIG\n\n\nZookeeper connect string for Kafka topics management\n\n\nString\n\n\n\n\n\n\n\n\nKafkaConfig.BOOTSTRAP_SERVERS_CONFIG\n\n\nA list of host/port pairs to use for establishing the initial connection to the Kafka cluster\n\n\nString\n\n\n\n\n\n\n\n\nKafkaConfig.CHECKPOINT_STORE_NAME_PREFIX\n\n\nName prefix for checkpoint store\n\n\nString\n\n\n\"\"\n\n\n\n\n\n\nKafkaConfig.REPLICATION_FACTOR\n\n\nReplication factor for checkpoint store topic\n\n\nInt\n\n\n1\n\n\n\n\n\n\n\n\nUse of \nHBaseSink\n\n\nTo use \nHBaseSink\n in your application, you first need to add the \ngearpump-external-hbase\n library dependency in your application:\n\n\nSBT\n\n\n\"org.apache.gearpump\" %% \"gearpump-external-hbase\" % 0.8.3\n\n\n\n\nXML\n\n\n<dependency>\n <groupId>org.apache.gearpump</groupId>\n <artifactId>gearpump-external-hbase</artifactId>\n <version>0.8.3</version>\n</dependency>\n\n\n\n\nTo connect to HBase, you need to provide following info:\n\n\n\n\nthe HBase configuration to tell which HBase service to connect\n\n\nthe table name (you must create the table yourself, see the \nHBase documentation\n)\n\n\n\n\nThen, you can use \nHBaseSink\n in your application:\n\n\n//create the HBase data sink\nval sink = HBaseSink(UserConfig.empty, tableName, HBaseConfiguration.create())\n\n//create Gearpump Processor\nval sinkProcessor = DataSinkProcessor(sink, parallelism)\n\n\n:::scala\n//assume stream is a normal `Stream` in DSL\nstream.writeToHbase(UserConfig.empty, tableName, parallelism, \"write to HBase\")\n\n\n\n\nYou can tune the connection to HBase via the HBase configuration passed in. 
If not passed, Gearpump will try to check local classpath to find a valid HBase configuration (\nhbase-site.xml\n).\n\n\nAttention, due to the issue discussed \nhere\n you may need to create additional configuration for your HBase sink:\n\n\ndef hadoopConfig = {\n val conf = new Configuration()\n conf.set(\"hbase.zookeeper.quorum\", \"zookeeperHost\")\n conf.set(\"hbase.zookeeper.property.clientPort\", \"2181\")\n conf\n}\nval sink = HBaseSink(UserConfig.empty, tableName, hadoopConfig)\n\n\n\n\nHow to implement your own \nDataSource\n\n\nTo implement your own \nDataSource\n, you need to implement two things:\n\n\n\n\nThe data source itself\n\n\na helper class to easy the usage in a DSL\n\n\n\n\nImplement your own \nDataSource\n\n\nYou need to implement a class derived from \norg.apache.gearpump.streaming.transaction.api.TimeReplayableSource\n.\n\n\nImplement DSL helper (Optional)\n\n\nIf you would like to have a DSL at hand you may start with this customized stream; it is better if you can implement your own DSL helper.\nYou can refer \nKafkaDSLUtil\n as an example in Gearpump source.\n\n\nBelow is some code snippet from \nKafkaDSLUtil\n:\n\n\nobject KafkaDSLUtil {\n\n def createStream[T](\n app: StreamApp,\n topics: String,\n parallelism: Int,\n description: String,\n properties: Properties): dsl.Stream[T] = {\n app.source[T](new KafkaSource(topics, properties), parallelism, description)\n }\n}\n\n\n\n\nHow to implement your own \nDataSink\n\n\nTo implement your own \nDataSink\n, you need to implement two things:\n\n\n\n\nThe data sink itself\n\n\na helper class to make it easy use in DSL\n\n\n\n\nImplement your own \nDataSink\n\n\nYou need to implement a class derived from \norg.apache.gearpump.streaming.sink.DataSink\n.\n\n\nImplement DSL helper (Optional)\n\n\nIf you would like to have a DSL at hand you may start with this customized stream; it is better if you can implement your own DSL helper.\nYou can refer \nHBaseDSLSink\n as an example in Gearpump source.\n\n\nBelow is some code snippet from \nHBaseDSLSink\n:\n\n\nclass HBaseDSLSink[T](stream: Stream[T]) {\n def writeToHbase(userConfig: UserConfig, table: String, parallism: Int, description: String): Stream[T] = {\n stream.sink(HBaseSink[T](userConfig, table), parallism, userConfig, description)\n }\n\n def writeToHbase(userConfig: UserConfig, configuration: Configuration, table: String, parallism: Int, description: String): Stream[T] = {\n stream.sink(HBaseSink[T](userConfig, table, configuration), parallism, userConfig, description)\n } \n}\n\nobject HBaseDSLSink {\n implicit def streamToHBaseDSLSink[T](stream: Stream[T]): HBaseDSLSink[T] = {\n new HBaseDSLSink[T](stream)\n }\n}",
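\n\n\nFor illustration, here is a minimal sketch of a custom \nDataSink\n: a hypothetical console sink that just prints each message payload. It assumes the open/write/close methods of \norg.apache.gearpump.streaming.sink.DataSink\n as named above; please check the trait in your Gearpump version before relying on the exact signatures.\n\n\nimport org.apache.gearpump.Message\nimport org.apache.gearpump.streaming.sink.DataSink\nimport org.apache.gearpump.streaming.task.TaskContext\n\n// Hypothetical sink for illustration only: prints every message payload.\nclass ConsoleSink extends DataSink {\n  override def open(context: TaskContext): Unit = ()\n\n  override def write(message: Message): Unit = println(message.msg)\n\n  override def close(): Unit = ()\n}\n\n\n\n\nSuch a sink can then be wrapped into a processor with \nDataSinkProcessor(new ConsoleSink, parallelism)\n, just like \nHBaseSink\n above.",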
"title": "Gearpump Connectors"
},
{
"location": "/dev/dev-connectors/index.html#basic-concepts",
"text": "DataSource and DataSink are the two main concepts Gearpump use to connect with the outside world. DataSource DataSource is the start point of a streaming processing flow. DataSink DataSink is the end point of a streaming processing flow.",
"title": "Basic Concepts"
},
{
"location": "/dev/dev-connectors/index.html#implemented-connectors",
"text": "DataSource implemented Currently, we have following DataSource supported. Name Description CollectionDataSource Convert a collection to a recursive data source. E.g. seq(1, 2, 3) will output 1,2,3,1,2,3... . KafkaSource Read from Kafka. DataSink implemented Currently, we have following DataSink supported. Name Description HBaseSink Write the message to HBase. The message to write must be HBase Put or a tuple of (rowKey, family, column, value) . KafkaSink Write to Kafka.",
"title": "Implemented Connectors"
},
{
"location": "/dev/dev-connectors/index.html#use-of-connectors",
"text": "Use of Kafka connectors To use Kafka connectors in your application, you first need to add the gearpump-external-kafka library dependency in your application: SBT \"org.apache.gearpump\" %% \"gearpump-external-kafka\" % 0.8.3 XML <dependency>\n <groupId>org.apache.gearpump</groupId>\n <artifactId>gearpump-external-kafka</artifactId>\n <version>0.8.3</version>\n</dependency> This is a simple example to read from Kafka and write it back using KafkaSource and KafkaSink . Users can optionally set a CheckpointStoreFactory such that Kafka offsets are checkpointed and at-least-once message delivery is guaranteed. Low level API val appConfig = UserConfig.empty\nval props = new Properties\nprops.put(KafkaConfig.ZOOKEEPER_CONNECT_CONFIG, zookeeperConnect)\nprops.put(KafkaConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList)\nprops.put(KafkaConfig.CHECKPOINT_STORE_NAME_PREFIX_CONFIG, appName)\nval source = new KafkaSource(sourceTopic, props)\nval checkpointStoreFactory = new KafkaStoreFactory(props)\nsource.setCheckpointStore(checkpointStoreFactory)\nval sourceProcessor = DataSourceProcessor(source, sourceNum)\nval sink = new KafkaSink(sinkTopic, props)\nval sinkProcessor = DataSinkProcessor(sink, sinkNum)\nval partitioner = new ShufflePartitioner\nval computation = sourceProcessor ~ partitioner ~> sinkProcessor\nval app = StreamApplication(appName, Graph(computation), appConfig) High level API val props = new Properties\nval appName = \"KafkaDSL\"\nprops.put(KafkaConfig.ZOOKEEPER_CONNECT_CONFIG, zookeeperConnect)\nprops.put(KafkaConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList)\nprops.put(KafkaConfig.CHECKPOINT_STORE_NAME_PREFIX_CONFIG, appName)\n\nval app = StreamApp(appName, context)\n\nif (atLeastOnce) {\n val checkpointStoreFactory = new KafkaStoreFactory(props)\n KafkaDSL.createAtLeastOnceStream(app, sourceTopic, checkpointStoreFactory, props, sourceNum)\n .writeToKafka(sinkTopic, props, sinkNum)\n} else {\n KafkaDSL.createAtMostOnceStream(app, sourceTopic, props, sourceNum)\n .writeToKafka(sinkTopic, props, sinkNum)\n} In the above example, configurations are set through Java properties and shared by KafkaSource , KafkaSink and KafkaCheckpointStoreFactory .\nTheir configurations can be defined differently as below. KafkaSource configurations Name Descriptions Type Default KafkaConfig.ZOOKEEPER_CONNECT_CONFIG Zookeeper connect string for Kafka topics management String KafkaConfig.CLIENT_ID_CONFIG An id string to pass to the server when making requests String \"\" KafkaConfig.GROUP_ID_CONFIG A string that uniquely identifies a set of consumers within the same consumer group \"\" KafkaConfig.FETCH_SLEEP_MS_CONFIG The amount of time(ms) to sleep when hitting fetch.threshold Int 100 KafkaConfig.FETCH_THRESHOLD_CONFIG Size of internal queue to keep Kafka messages. 
Stop fetching and go to sleep when hitting the threshold Int 10000 KafkaConfig.PARTITION_GROUPER_CLASS_CONFIG Partition grouper class to group partitions among source tasks Class DefaultPartitionGrouper KafkaConfig.MESSAGE_DECODER_CLASS_CONFIG Message decoder class to decode raw bytes from Kafka Class DefaultMessageDecoder KafkaConfig.TIMESTAMP_FILTER_CLASS_CONFIG Timestamp filter class to filter out late messages Class DefaultTimeStampFilter KafkaSink configurations Name Descriptions Type Default KafkaConfig.BOOTSTRAP_SERVERS_CONFIG A list of host/port pairs to use for establishing the initial connection to the Kafka cluster String KafkaConfig.CLIENT_ID_CONFIG An id string to pass to the server when making requests String \"\" KafkaCheckpointStoreFactory configurations Name Descriptions Type Default KafkaConfig.ZOOKEEPER_CONNECT_CONFIG Zookeeper connect string for Kafka topics management String KafkaConfig.BOOTSTRAP_SERVERS_CONFIG A list of host/port pairs to use for establishing the initial connection to the Kafka cluster String KafkaConfig.CHECKPOINT_STORE_NAME_PREFIX Name prefix for checkpoint store String \"\" KafkaConfig.REPLICATION_FACTOR Replication factor for checkpoint store topic Int 1 Use of HBaseSink To use HBaseSink in your application, you first need to add the gearpump-external-hbase library dependency in your application: SBT \"org.apache.gearpump\" %% \"gearpump-external-hbase\" % 0.8.3 XML <dependency>\n <groupId>org.apache.gearpump</groupId>\n <artifactId>gearpump-external-hbase</artifactId>\n <version>0.8.3</version>\n</dependency> To connect to HBase, you need to provide following info: the HBase configuration to tell which HBase service to connect the table name (you must create the table yourself, see the HBase documentation ) Then, you can use HBaseSink in your application: //create the HBase data sink\nval sink = HBaseSink(UserConfig.empty, tableName, HBaseConfiguration.create())\n\n//create Gearpump Processor\nval sinkProcessor = DataSinkProcessor(sink, parallelism)\n\n\n:::scala\n//assume stream is a normal `Stream` in DSL\nstream.writeToHbase(UserConfig.empty, tableName, parallelism, \"write to HBase\") You can tune the connection to HBase via the HBase configuration passed in. If not passed, Gearpump will try to check local classpath to find a valid HBase configuration ( hbase-site.xml ). Attention, due to the issue discussed here you may need to create additional configuration for your HBase sink: def hadoopConfig = {\n val conf = new Configuration()\n conf.set(\"hbase.zookeeper.quorum\", \"zookeeperHost\")\n conf.set(\"hbase.zookeeper.property.clientPort\", \"2181\")\n conf\n}\nval sink = HBaseSink(UserConfig.empty, tableName, hadoopConfig)",
"title": "Use of Connectors"
},
{
"location": "/dev/dev-connectors/index.html#how-to-implement-your-own-datasource",
"text": "To implement your own DataSource , you need to implement two things: The data source itself a helper class to easy the usage in a DSL Implement your own DataSource You need to implement a class derived from org.apache.gearpump.streaming.transaction.api.TimeReplayableSource . Implement DSL helper (Optional) If you would like to have a DSL at hand you may start with this customized stream; it is better if you can implement your own DSL helper.\nYou can refer KafkaDSLUtil as an example in Gearpump source. Below is some code snippet from KafkaDSLUtil : object KafkaDSLUtil {\n\n def createStream[T](\n app: StreamApp,\n topics: String,\n parallelism: Int,\n description: String,\n properties: Properties): dsl.Stream[T] = {\n app.source[T](new KafkaSource(topics, properties), parallelism, description)\n }\n}",
"title": "How to implement your own DataSource"
},
{
"location": "/dev/dev-connectors/index.html#how-to-implement-your-own-datasink",
"text": "To implement your own DataSink , you need to implement two things: The data sink itself a helper class to make it easy use in DSL Implement your own DataSink You need to implement a class derived from org.apache.gearpump.streaming.sink.DataSink . Implement DSL helper (Optional) If you would like to have a DSL at hand you may start with this customized stream; it is better if you can implement your own DSL helper.\nYou can refer HBaseDSLSink as an example in Gearpump source. Below is some code snippet from HBaseDSLSink : class HBaseDSLSink[T](stream: Stream[T]) {\n def writeToHbase(userConfig: UserConfig, table: String, parallism: Int, description: String): Stream[T] = {\n stream.sink(HBaseSink[T](userConfig, table), parallism, userConfig, description)\n }\n\n def writeToHbase(userConfig: UserConfig, configuration: Configuration, table: String, parallism: Int, description: String): Stream[T] = {\n stream.sink(HBaseSink[T](userConfig, table, configuration), parallism, userConfig, description)\n } \n}\n\nobject HBaseDSLSink {\n implicit def streamToHBaseDSLSink[T](stream: Stream[T]): HBaseDSLSink[T] = {\n new HBaseDSLSink[T](stream)\n }\n}",
"title": "How to implement your own DataSink"
},
{
"location": "/dev/dev-storm/index.html",
"text": "Gearpump provides \nbinary compatibility\n for Apache Storm applications. That is to say, users could easily grab an existing Storm jar and run it \non Gearpump. This documentation illustrates Gearpump's compatibility with Storm. \n\n\nWhat Storm features are supported on Gearpump\n\n\nStorm 0.9.x\n\n\n\n\n\n\n\n\nFeature\n\n\nSupport\n\n\n\n\n\n\n\n\n\n\nbasic topology\n\n\nyes\n\n\n\n\n\n\nDRPC\n\n\nyes\n\n\n\n\n\n\nmulti-lang\n\n\nyes\n\n\n\n\n\n\nstorm-kafka\n\n\nyes\n\n\n\n\n\n\nTrident\n\n\nno\n\n\n\n\n\n\n\n\nStorm 0.10.x\n\n\n\n\n\n\n\n\nFeature\n\n\nSupport\n\n\n\n\n\n\n\n\n\n\nbasic topology\n\n\nyes\n\n\n\n\n\n\nDRPC\n\n\nyes\n\n\n\n\n\n\nmulti-lang\n\n\nyes\n\n\n\n\n\n\nstorm-kafka\n\n\nyes\n\n\n\n\n\n\nstorm-hdfs\n\n\nyes\n\n\n\n\n\n\nstorm-hbase\n\n\nyes\n\n\n\n\n\n\nstorm-hive\n\n\nyes\n\n\n\n\n\n\nstorm-jdbc\n\n\nyes\n\n\n\n\n\n\nstorm-redis\n\n\nyes\n\n\n\n\n\n\nflux\n\n\nyes\n\n\n\n\n\n\nstorm-eventhubs\n\n\nnot verified\n\n\n\n\n\n\nTrident\n\n\nno\n\n\n\n\n\n\n\n\nAt Least Once support\n\n\nWith Ackers enabled, there are two kinds of At Least Once support in both Storm 0.9.x and Storm 0.10.x.\n\n\n\n\nspout will replay messages on message loss as long as spout is alive\n\n\nIf \nKafkaSpout\n is used, messages could be replayed from Kafka even if the spout crashes. \n\n\n\n\nGearpump supports the second for both Storm versions. \n\n\nSecurity support\n\n\nStorm 0.10.x adds security support for following connectors \n\n\n\n\nstorm-hdfs\n\n\nstorm-hive\n\n\nstorm-hbase\n\n\n\n\nThat means users could access kerberos enabled HDFS, Hive and HBase with these connectors. Generally, Storm provides two approaches (please refer to above links for more information)\n\n\n\n\nconfigure nimbus to automatically get delegation tokens on behalf of the topology submitter user\n\n\nkerberos keytabs are already distributed on worker hosts; users configure keytab path and principal\n\n\n\n\nGearpump supports the second approach and users needs to add classpath of HDFS/Hive/HBase to \ngearpump.executor.extraClasspath\n in \ngear.conf\n on each node. For example, \n\n\n###################\n### Executor argument configuration\n### Executor JVM can contains multiple tasks\n###################\nexecutor {\nvmargs = \"-server -Xms512M -Xmx1024M -Xss1M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseParNewGC -XX:NewRatio=3 -Djava.rmi.server.hostname=localhost\"\nextraClasspath = \"/etc/hadoop/conf\"\n}\n\n\n\n\nHow to run a Storm application on Gearpump\n\n\nThis section shows how to run an existing Storm jar in a local Gearpump cluster.\n\n\n\n\n\n\nlaunch a local cluster\n\n\nbin/local\n\n\n\n\n\n\n\n\nstart a Gearpump Nimbus server \n\n\nUsers need server's address(\nnimbus.host\n and \nnimbus.thrift.port\n) to submit topologies later. The address is written to a yaml config file set with \n-output\n option. \nUsers can provide an existing config file where only the address will be overwritten. If not provided, a new file \napp.yaml\n is created with the config.\n\n\nbin/storm nimbus -output [conf <custom yaml config>]\n\n\n\n\n\n\n\n\nsubmit Storm applications\n\n\nUsers can either submit Storm applications through command line or UI. \n\n\na. 
submit Storm applications through command line\n\n\nbin/storm app -verbose -config app.yaml -jar storm-starter-${STORM_VERSION}.jar storm.starter.ExclamationTopology exclamation\n\n\n\n\nUsers are able to configure their applications through following options\n\n\n\n\njar\n - set the path of a Storm application jar\n\n\nconfig\n - submit the custom configuration file generated when launching Nimbus\n\n\n\n\nb. submit Storm application through UI\n\n\n\n\nClick on the \"Create\" button on the applications page on UI. \n\n\nClick on the \"Submit Storm Application\" item in the pull down menu.\n\n\nIn the popup console, upload the Storm application jar and the configuration file generated when launching Nimbus,\n and fill in \nstorm.starter.ExclamationTopology exclamation\n as arguments.\n\n\nClick on the \"Submit\" button \n\n\n\n\n\n\n\n\nEither way, check the dashboard and you should see data flowing through your topology. \n\n\nHow is it different from running on Storm\n\n\nTopology submission\n\n\nWhen a client submits a Storm topology, Gearpump launches locally a simplified version of Storm's Nimbus server \nGearpumpNimbus\n. \nGearpumpNimbus\n then translates topology to a directed acyclic graph (DAG) of Gearpump, which is submitted to Gearpump master and deployed as a Gearpump application. \n\n\n\n\nGearpumpNimbus\n supports the following methods\n\n\n\n\nsubmitTopology\n / \nsubmitTopologyWithOpts\n\n\nkillTopology\n / \nkillTopologyWithOpts\n\n\ngetTopology\n / \ngetUserTopology\n\n\ngetClusterInfo\n\n\n\n\nTopology translation\n\n\nHere's an example of \nWordCountTopology\n with acker bolts (ackers) being translated into a Gearpump DAG.\n\n\n\n\nGearpump creates a \nStormProducer\n for each Storm spout and a \nStormProcessor\n for each Storm bolt (except for ackers) with the same parallelism, and wires them together using the same grouping strategy (partitioning in Gearpump) as in Storm. \n\n\nAt runtime, spouts and bolts are running inside \nStormProducer\n tasks and \nStormProcessor\n tasks respectively. Messages emitted by spout are passed to \nStormProducer\n, transferred to \nStormProcessor\n and passed down to bolt. Messages are serialized / de-serialized with Storm serializers.\n\n\nStorm ackers are dropped since Gearpump has a different mechanism of message tracking and flow control. \n\n\nTask execution\n\n\nEach Storm task is executed by a dedicated thread while all Gearpump tasks of an executor share a thread pool. Generally, we can achieve better performance with a shared thread pool. It's possible, however, some tasks block and take up all the threads. In that case, we can \nfall back to the Storm way by setting \ngearpump.task-dispatcher\n to \n\"gearpump.single-thread-dispatcher\"\n in \ngear.conf\n.\n\n\nMessage tracking\n\n\nStorm tracks the lineage of each message with ackers to guarantee at-least-once message delivery. Failed messages are re-sent from spout.\n\n\nGearpump \ntracks messages between a sender and receiver in an efficient way\n. Message loss causes the whole application to replay from the \nminimum timestamp of all pending messages in the system\n. \n\n\nFlow control\n\n\nStorm throttles flow rate at spout, which stops sending messages if the number of unacked messages exceeds \ntopology.max.spout.pending\n. 
\n\n\nGearpump has flow control between tasks such that \nsender cannot flood receiver\n, which is backpressured till the source.\n\n\nConfigurations\n\n\nAll Storm configurations are respected with the following priority order \n\n\ndefaults.yaml < custom file config < application config < component config\n\n\n\n\nwhere\n\n\n\n\napplication config is submit from Storm application along with the topology \n\n\ncomponent config is set in spout / bolt with \ngetComponentConfiguration\n\n\ncustom file config is specified with the \n-config\n option when submitting Storm application from command line or uploaded from UI\n\n\n\n\nStreamCQL Support\n\n\nStreamCQL\n is a Continuous Query Language on RealTime Computation System open sourced by Huawei.\nSince StreamCQL already supports Storm, it's straightforward to run StreamCQL over Gearpump.\n\n\n\n\n\n\nInstall StreamCQL as in the official \nREADME\n\n\n\n\n\n\nLaunch Gearpump Nimbus Server as before \n\n\n\n\n\n\nGo to the installed stream-cql-binary, and change following settings in \nconf/streaming-site.xml\n with the output Nimbus configs in Step 2.\n\n\n<property>\n <name>streaming.storm.nimbus.host</name>\n <value>${nimbus.host}</value>\n</property>\n<property>\n <name>streaming.storm.nimbus.port</name>\n <value>${nimbus.thrift.port}</value>\n</property>\n\n\n\n\n\n\n\n\nOpen CQL client shell with \nbin/cql\n and execute a simple cql example \n\n\nStreaming> CREATE INPUT STREAM s\n (id INT, name STRING, type INT)\nSOURCE randomgen\n PROPERTIES ( timeUnit = \"SECONDS\", period = \"1\",\n eventNumPerperiod = \"1\", isSchedule = \"true\" );\n\nCREATE OUTPUT STREAM rs\n (type INT, cc INT)\nSINK consoleOutput;\n\nINSERT INTO STREAM rs SELECT type, COUNT(id) as cc\n FROM s[RANGE 20 SECONDS BATCH]\n WHERE id > 5 GROUP BY type;\n\nSUBMIT APPLICATION example;\n\n\n\n\n\n\n\n\nCheck the dashboard and you should see data flowing through a topology of 3 components.",
"title": "Storm Compatibility"
},
{
"location": "/dev/dev-storm/index.html#what-storm-features-are-supported-on-gearpump",
"text": "Storm 0.9.x Feature Support basic topology yes DRPC yes multi-lang yes storm-kafka yes Trident no Storm 0.10.x Feature Support basic topology yes DRPC yes multi-lang yes storm-kafka yes storm-hdfs yes storm-hbase yes storm-hive yes storm-jdbc yes storm-redis yes flux yes storm-eventhubs not verified Trident no At Least Once support With Ackers enabled, there are two kinds of At Least Once support in both Storm 0.9.x and Storm 0.10.x. spout will replay messages on message loss as long as spout is alive If KafkaSpout is used, messages could be replayed from Kafka even if the spout crashes. Gearpump supports the second for both Storm versions. Security support Storm 0.10.x adds security support for following connectors storm-hdfs storm-hive storm-hbase That means users could access kerberos enabled HDFS, Hive and HBase with these connectors. Generally, Storm provides two approaches (please refer to above links for more information) configure nimbus to automatically get delegation tokens on behalf of the topology submitter user kerberos keytabs are already distributed on worker hosts; users configure keytab path and principal Gearpump supports the second approach and users needs to add classpath of HDFS/Hive/HBase to gearpump.executor.extraClasspath in gear.conf on each node. For example, ###################\n### Executor argument configuration\n### Executor JVM can contains multiple tasks\n###################\nexecutor {\nvmargs = \"-server -Xms512M -Xmx1024M -Xss1M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseParNewGC -XX:NewRatio=3 -Djava.rmi.server.hostname=localhost\"\nextraClasspath = \"/etc/hadoop/conf\"\n}",
"title": "What Storm features are supported on Gearpump"
},
{
"location": "/dev/dev-storm/index.html#how-to-run-a-storm-application-on-gearpump",
"text": "This section shows how to run an existing Storm jar in a local Gearpump cluster. launch a local cluster bin/local start a Gearpump Nimbus server Users need server's address( nimbus.host and nimbus.thrift.port ) to submit topologies later. The address is written to a yaml config file set with -output option. \nUsers can provide an existing config file where only the address will be overwritten. If not provided, a new file app.yaml is created with the config. bin/storm nimbus -output [conf <custom yaml config>] submit Storm applications Users can either submit Storm applications through command line or UI. a. submit Storm applications through command line bin/storm app -verbose -config app.yaml -jar storm-starter-${STORM_VERSION}.jar storm.starter.ExclamationTopology exclamation Users are able to configure their applications through following options jar - set the path of a Storm application jar config - submit the custom configuration file generated when launching Nimbus b. submit Storm application through UI Click on the \"Create\" button on the applications page on UI. Click on the \"Submit Storm Application\" item in the pull down menu. In the popup console, upload the Storm application jar and the configuration file generated when launching Nimbus,\n and fill in storm.starter.ExclamationTopology exclamation as arguments. Click on the \"Submit\" button Either way, check the dashboard and you should see data flowing through your topology.",
"title": "How to run a Storm application on Gearpump"
},
{
"location": "/dev/dev-storm/index.html#how-is-it-different-from-running-on-storm",
"text": "Topology submission When a client submits a Storm topology, Gearpump launches locally a simplified version of Storm's Nimbus server GearpumpNimbus . GearpumpNimbus then translates topology to a directed acyclic graph (DAG) of Gearpump, which is submitted to Gearpump master and deployed as a Gearpump application. GearpumpNimbus supports the following methods submitTopology / submitTopologyWithOpts killTopology / killTopologyWithOpts getTopology / getUserTopology getClusterInfo Topology translation Here's an example of WordCountTopology with acker bolts (ackers) being translated into a Gearpump DAG. Gearpump creates a StormProducer for each Storm spout and a StormProcessor for each Storm bolt (except for ackers) with the same parallelism, and wires them together using the same grouping strategy (partitioning in Gearpump) as in Storm. At runtime, spouts and bolts are running inside StormProducer tasks and StormProcessor tasks respectively. Messages emitted by spout are passed to StormProducer , transferred to StormProcessor and passed down to bolt. Messages are serialized / de-serialized with Storm serializers. Storm ackers are dropped since Gearpump has a different mechanism of message tracking and flow control. Task execution Each Storm task is executed by a dedicated thread while all Gearpump tasks of an executor share a thread pool. Generally, we can achieve better performance with a shared thread pool. It's possible, however, some tasks block and take up all the threads. In that case, we can \nfall back to the Storm way by setting gearpump.task-dispatcher to \"gearpump.single-thread-dispatcher\" in gear.conf . Message tracking Storm tracks the lineage of each message with ackers to guarantee at-least-once message delivery. Failed messages are re-sent from spout. Gearpump tracks messages between a sender and receiver in an efficient way . Message loss causes the whole application to replay from the minimum timestamp of all pending messages in the system . Flow control Storm throttles flow rate at spout, which stops sending messages if the number of unacked messages exceeds topology.max.spout.pending . Gearpump has flow control between tasks such that sender cannot flood receiver , which is backpressured till the source. Configurations All Storm configurations are respected with the following priority order defaults.yaml < custom file config < application config < component config where application config is submit from Storm application along with the topology component config is set in spout / bolt with getComponentConfiguration custom file config is specified with the -config option when submitting Storm application from command line or uploaded from UI",
"title": "How is it different from running on Storm"
},
{
"location": "/dev/dev-storm/index.html#streamcql-support",
"text": "StreamCQL is a Continuous Query Language on RealTime Computation System open sourced by Huawei.\nSince StreamCQL already supports Storm, it's straightforward to run StreamCQL over Gearpump. Install StreamCQL as in the official README Launch Gearpump Nimbus Server as before Go to the installed stream-cql-binary, and change following settings in conf/streaming-site.xml with the output Nimbus configs in Step 2. <property>\n <name>streaming.storm.nimbus.host</name>\n <value>${nimbus.host}</value>\n</property>\n<property>\n <name>streaming.storm.nimbus.port</name>\n <value>${nimbus.thrift.port}</value>\n</property> Open CQL client shell with bin/cql and execute a simple cql example Streaming> CREATE INPUT STREAM s\n (id INT, name STRING, type INT)\nSOURCE randomgen\n PROPERTIES ( timeUnit = \"SECONDS\", period = \"1\",\n eventNumPerperiod = \"1\", isSchedule = \"true\" );\n\nCREATE OUTPUT STREAM rs\n (type INT, cc INT)\nSINK consoleOutput;\n\nINSERT INTO STREAM rs SELECT type, COUNT(id) as cc\n FROM s[RANGE 20 SECONDS BATCH]\n WHERE id > 5 GROUP BY type;\n\nSUBMIT APPLICATION example; Check the dashboard and you should see data flowing through a topology of 3 components.",
"title": "StreamCQL Support"
},
{
"location": "/dev/dev-ide-setup/index.html",
"text": "Intellij IDE Setup\n\n\n\n\nIn Intellij, download scala plugin. We are using scala version 2.11\n\n\nOpen menu \"File->Open\" to open Gearpump root project, then choose the Gearpump source folder.\n\n\nAll set.\n\n\n\n\nNOTE:\n Intellij Scala plugin is already bundled with sbt. If you have Scala plugin installed, please don't install additional sbt plugin. Check your settings at \"Settings -> Plugins\"\n\nNOTE:\n If you are behind a proxy, to speed up the build, please set the proxy for sbt in \"Settings -> Build Tools > SBT\". in input field \"VM parameters\", add \n\n\n-Dhttp.proxyHost=<proxy host>\n-Dhttp.proxyPort=<port like 911>\n-Dhttps.proxyHost=<proxy host>\n-Dhttps.proxyPort=<port like 911>\n\n\n\n\nEclipse IDE Setup\n\n\nI will show how to do this in eclipse LUNA.\n\n\nThere is a sbt-eclipse plugin to generate eclipse project files, but seems there are some bugs, and some manual fix is still required. Here is the steps that works for me:\n\n\n\n\nInstall latest version eclipse luna\n\n\nInstall latest scala-IDE http://scala-ide.org/download/current.html I use update site address: http://download.scala-ide.org/sdk/lithium/e44/scala211/stable/site\n\n\nOpen a sbt shell under the root folder of Gearpump. enter \"eclipse\", then we get all eclipse project file generated.\n\n\nUse eclipse import wizard. File->Import->Existing projects into Workspace, make sure to tick the option \"Search for nested projects\"\n\n\nThen it may starts to complain about encoding error, like \"IO error while decoding\". You need to fix the eclipse default text encoding by changing configuration at \"Window->Preference->General->Workspace->Text file encoding\" to UTF-8.\n\n\nThen the project gearpump-external-kafka may still cannot compile. The reason is that there is some dependencies missing in generated .classpath file by sbt-eclipse. We need to do some manual fix. Right click on project icon of gearpump-external-kafka in eclipse, then choose menu \"Build Path->Configure Build Path\". A window will popup. Under the tab \"projects\", click add, choose \"gearpump-streaming\"\n\n\nAll set. Now the project should compile OK in eclipse.",
"title": "IDE Setup"
},
{
"location": "/dev/dev-non-streaming-example/index.html",
"text": "We'll use \nDistributed Shell\n as an example to illustrate how to do that.\n\n\nWhat Distributed Shell do is that user send a shell command to the cluster and the command will the executed on each node, then the result will be return to user.\n\n\nMaven/Sbt Settings\n\n\nRepository and library dependencies can be found at \nMaven Setting\n\n\nDefine Executor Class\n\n\nclass ShellExecutor(executorContext: ExecutorContext, userConf : UserConfig) extends Actor{\n import executorContext._\n\n override def receive: Receive = {\n case ShellCommand(command, args) =>\n val process = Try(s\"$command $args\" !!)\n val result = process match {\n case Success(msg) => msg\n case Failure(ex) => ex.getMessage\n }\n sender ! ShellCommandResult(executorId, result)\n }\n}\n\n\n\n\nSo ShellExecutor just receive the ShellCommand and try to execute it and return the result to the sender, which is quite simple.\n\n\nDefine AppMaster Class\n\n\nFor a non-streaming application, you have to write your own AppMaster.\n\n\nHere is a typical user defined AppMaster, please note that some trivial codes are omitted.\n\n\nclass DistShellAppMaster(appContext : AppMasterContext, app : Application) extends ApplicationMaster {\n protected var currentExecutorId = 0\n\n override def preStart(): Unit = {\n ActorUtil.launchExecutorOnEachWorker(masterProxy, getExecutorJvmConfig, self)\n }\n\n override def receive: Receive = {\n case ExecutorSystemStarted(executorSystem) =>\n import executorSystem.{address, worker, resource => executorResource}\n val executorContext = ExecutorContext(currentExecutorId, worker.workerId, appId, self, executorResource)\n val executor = context.actorOf(Props(classOf[ShellExecutor], executorContext, app.userConfig)\n .withDeploy(Deploy(scope = RemoteScope(address))), currentExecutorId.toString)\n executorSystem.bindLifeCycleWith(executor)\n currentExecutorId += 1\n case StartExecutorSystemTimeout =>\n masterProxy ! ShutdownApplication(appId)\n context.stop(self)\n case msg: ShellCommand =>\n Future.fold(context.children.map(_ ? msg))(new ShellCommandResultAggregator) { (aggregator, response) =>\n aggregator.aggregate(response.asInstanceOf[ShellCommandResult])\n }.map(_.toString()) pipeTo sender\n }\n\n private def getExecutorJvmConfig: ExecutorSystemJvmConfig = {\n val config: Config = Option(app.clusterConfig).map(_.getConfig).getOrElse(ConfigFactory.empty())\n val jvmSetting = Util.resolveJvmSetting(config.withFallback(context.system.settings.config)).executor\n ExecutorSystemJvmConfig(jvmSetting.classPath, jvmSetting.vmargs,\n appJar, username, config)\n }\n}\n\n\n\n\nSo when this \nDistShellAppMaster\n started, first it will request resources to launch one executor on each node, which is done in method \npreStart\n\n\nThen the DistShellAppMaster's receive handler will handle the allocated resource to launch the \nShellExecutor\n we want. If you want to write your application, you can just use this part of code. The only thing needed is replacing the Executor class.\n\n\nThere may be a situation that the resource allocation failed which will bring the message \nStartExecutorSystemTimeout\n, the normal pattern to handle that is just what we do: shut down the application.\n\n\nThe real application logic part is in \nShellCommand\n message handler, which is specific to different applications. 
Here we distribute the shell command to each executor and aggregate the results to the client.\n\n\nFor method \ngetExecutorJvmConfig\n, you can just use this part of code in your own application.\n\n\nDefine Application\n\n\nNow its time to launch the application.\n\n\nobject DistributedShell extends App with ArgumentsParser {\n private val LOG: Logger = LogUtil.getLogger(getClass)\n\n override val options: Array[(String, CLIOption[Any])] = Array.empty\n\n LOG.info(s\"Distributed shell submitting application...\")\n val context = ClientContext()\n val appId = context.submit(Application[DistShellAppMaster](\"DistributedShell\", UserConfig.empty))\n context.close()\n LOG.info(s\"Distributed Shell Application started with appId $appId !\")\n}\n\n\n\n\nThe application class extends \nApp\n and `ArgumentsParser which make it easier to parse arguments and run main functions. This part is similar to the streaming applications.\n\n\nThe main class \nDistributeShell\n will submit an application to \nMaster\n, whose \nAppMaster\n is \nDistShellAppMaster\n.\n\n\nDefine an optional Client class\n\n\nNow, we can define a \nClient\n class to talk with \nAppMaster\n to pass our commands to it.\n\n\nobject DistributedShellClient extends App with ArgumentsParser {\n implicit val timeout = Constants.FUTURE_TIMEOUT\n import scala.concurrent.ExecutionContext.Implicits.global\n private val LOG: Logger = LoggerFactory.getLogger(getClass)\n\n override val options: Array[(String, CLIOption[Any])] = Array(\n \"master\" -> CLIOption[String](\"<host1:port1,host2:port2,host3:port3>\", required = true),\n \"appid\" -> CLIOption[Int](\"<the distributed shell appid>\", required = true),\n \"command\" -> CLIOption[String](\"<shell command>\", required = true),\n \"args\" -> CLIOption[String](\"<shell arguments>\", required = true)\n )\n\n val config = parse(args)\n val context = ClientContext(config.getString(\"master\"))\n val appid = config.getInt(\"appid\")\n val command = config.getString(\"command\")\n val arguments = config.getString(\"args\")\n val appMaster = context.resolveAppID(appid)\n (appMaster ? ShellCommand(command, arguments)).map { reslut =>\n LOG.info(s\"Result: $reslut\")\n context.close()\n }\n}\n\n\n\n\nIn the \nDistributedShellClient\n, it will resolve the appid to the real appmaster(the application id will be printed when launching \nDistributedShell\n).\n\n\nOnce we got the \nAppMaster\n, then we can send \nShellCommand\n to it and wait for the result.\n\n\nSubmit application\n\n\nAfter all these, you need to package everything into a uber jar and submit the jar to Gearpump Cluster. Please check \nApplication submission tool\n to command line tool syntax.",
"title": "Non Streaming Examples"
},
{
"location": "/dev/dev-rest-api/index.html",
"text": "Authentication.\n\n\nFor all REST API calls, We need authentication by default. If you don't want authentication, you can disable them.\n\n\nHow to disable Authentication\n\n\nTo disable Authentication, you can set \ngearpump-ui.gearpump.ui-security.authentication-enabled = false\n\nin gear.conf, please check \nUI Authentication\n for details.\n\n\nHow to authenticate if Authentication is enabled.\n\n\nFor User-Password based authentication\n\n\nIf Authentication is enabled, then you need to login before calling REST API.\n\n\ncurl -X POST --data username=admin --data password=admin --cookie-jar outputAuthenticationCookie.txt http://127.0.0.1:8090/login\n\n\n\n\nThis will use default user \"admin:admin\" to login, and store the authentication cookie to file outputAuthenticationCookie.txt.\n\n\nIn All subsequent Rest API calls, you need to add the authentication cookie. For example\n\n\ncurl --cookie outputAuthenticationCookie.txt http://127.0.0.1/api/v1.0/master\n\n\n\n\nfor more information, please check \nUI Authentication\n.\n\n\nFor OAuth2 based authentication\n\n\nFor OAuth2 based authentication, it requires you to have an access token in place.\n\n\nDifferent OAuth2 service provider have different way to return an access token.\n\n\nFor Google\n, you can refer to \nOAuth Doc\n.\n\n\nFor CloudFoundry UAA\n, you can use the uaac command to get the access token.\n\n\n$ uaac target http://login.gearpump.gotapaas.eu/\n$ uaac token get <user_email_address>\n\n### Find access token\n$ uaac context\n\n[0]*[http://login.gearpump.gotapaas.eu]\n\n [0]*[<user_email_address>]\n user_id: 34e33a79-42c6-479b-a8c1-8c471ff027fb\n client_id: cf\n token_type: bearer\n access_token: eyJhbGciOiJSUzI1NiJ9.eyJqdGkiOiI\n expires_in: 599\n scope: password.write openid cloud_controller.write cloud_controller.read\n jti: 74ea49e4-1001-4757-9f8d-a66e52a27557\n\n\n\n\nFor more information on uaac, please check \nUAAC guide\n\n\nNow, we have the access token, then let's login to Gearpump UI server with this access token:\n\n\n## Please replace cloudfoundryuaa with actual OAuth2 service name you have configured in gear.conf\ncurl -X POST --data accesstoken=eyJhbGciOiJSUzI1NiJ9.eyJqdGkiOiI --cookie-jar outputAuthenticationCookie.txt http://127.0.0.1:8090/login/oauth2/cloudfoundryuaa/accesstoken\n\n\n\n\nThis will use user \nuser_email_address\n to login, and store the authentication cookie to file outputAuthenticationCookie.txt.\n\n\nIn All subsequent Rest API calls, you need to add the authentication cookie. For example\n\n\ncurl --cookie outputAuthenticationCookie.txt http://127.0.0.1/api/v1.0/master\n\n\n\n\nNOTE:\n You can default the default permission level for OAuth2 user. 
for more information,\nplease check \nUI Authentication\n.\n\n\nQuery version\n\n\nGET version\n\n\nExample:\n\n\ncurl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/version\n\n\n\n\nSample Response:\n\n\n0.8.3\n\n\n\n\nMaster Service\n\n\nGET api/v1.0/master\n\n\nGet information of masters\n\n\nExample:\n\n\ncurl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/master\n\n\n\n\nSample Response:\n\n\n{\n \"masterDescription\": {\n \"leader\":{\"host\":\"master@127.0.0.1\",\"port\":3000},\n \"cluster\":[{\"host\":\"127.0.0.1\",\"port\":3000}]\n \"aliveFor\": \"642941\",\n \"logFile\": \"/Users/foobar/gearpump/logs\",\n \"jarStore\": \"jarstore/\",\n \"masterStatus\": \"synced\",\n \"homeDirectory\": \"/Users/foobar/gearpump\"\n }\n}\n\n\n\n\nGET api/v1.0/master/applist\n\n\nQuery information of all applications\n\n\nExample:\n\n\ncurl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/master/applist\n\n\n\n\nSample Response:\n\n\n{\n \"appMasters\": [\n {\n \"status\": \"active\",\n \"appId\": 1,\n \"appName\": \"wordCount\",\n \"appMasterPath\": \"akka.tcp://app1-executor-1@127.0.0.1:52212/user/daemon/appdaemon1/$c\",\n \"workerPath\": \"akka.tcp://master@127.0.0.1:3000/user/Worker0\",\n \"submissionTime\": \"1450758114766\",\n \"startTime\": \"1450758117294\",\n \"user\": \"lisa\"\n }\n ]\n}\n\n\n\n\nGET api/v1.0/master/workerlist\n\n\nQuery information of all workers\n\n\nExample:\n\n\ncurl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/master/workerlist\n\n\n\n\nSample Response:\n\n\n[\n {\n \"workerId\": \"1\",\n \"state\": \"active\",\n \"actorPath\": \"akka.tcp://master@127.0.0.1:3000/user/Worker0\",\n \"aliveFor\": \"431565\",\n \"logFile\": \"logs/\",\n \"executors\": [\n {\n \"appId\": 1,\n \"executorId\": -1,\n \"slots\": 1\n },\n {\n \"appId\": 1,\n \"executorId\": 0,\n \"slots\": 1\n }\n ],\n \"totalSlots\": 1000,\n \"availableSlots\": 998,\n \"homeDirectory\": \"/usr/lisa/gearpump/\",\n \"jvmName\": \"11788@lisa\"\n },\n {\n \"workerId\": \"0\",\n \"state\": \"active\",\n \"actorPath\": \"akka.tcp://master@127.0.0.1:3000/user/Worker1\",\n \"aliveFor\": \"431546\",\n \"logFile\": \"logs/\",\n \"executors\": [\n {\n \"appId\": 1,\n \"executorId\": 1,\n \"slots\": 1\n }\n ],\n \"totalSlots\": 1000,\n \"availableSlots\": 999,\n \"homeDirectory\": \"/usr/lisa/gearpump/\",\n \"jvmName\": \"11788@lisa\"\n }\n]\n\n\n\n\nGET api/v1.0/master/config\n\n\nGet the configuration of all masters\n\n\nExample:\n\n\ncurl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/master/config\n\n\n\n\nSample Response:\n\n\n{\n \"extensions\": [\n \"akka.contrib.datareplication.DataReplication$\"\n ]\n \"akka\": {\n \"loglevel\": \"INFO\"\n \"log-dead-letters\": \"off\"\n \"log-dead-letters-during-shutdown\": \"off\"\n \"actor\": {\n ## Master forms a akka cluster\n \"provider\": \"akka.cluster.ClusterActorRefProvider\"\n }\n \"cluster\": {\n \"roles\": [\"master\"]\n \"auto-down-unreachable-after\": \"15s\"\n }\n \"remote\": {\n \"log-remote-lifecycle-events\": \"off\"\n }\n }\n}\n\n\n\n\nGET api/v1.0/master/metrics/<query_path>?readLatest=<true|false>\n\n\nGet the master node metrics.\n\n\nExample:\n\n\ncurl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/master/metrics/master?readLatest=true\n\n\n\n\nSample Response:\n\n\n{\n \"path\"\n:\n \"master\", \"metrics\"\n:\n [{\n \"time\": \"1450758725070\",\n \"value\": {\"$type\": 
\"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"master:memory.heap.used\", \"value\": \"59764272\"}\n }, {\n \"time\": \"1450758725070\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"master:thread.daemon.count\", \"value\": \"18\"}\n }, {\n \"time\": \"1450758725070\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"master:memory.total.committed\",\n \"value\": \"210239488\"\n }\n }, {\n \"time\": \"1450758725070\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"master:memory.heap.max\", \"value\": \"880017408\"}\n }, {\n \"time\": \"1450758725070\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"master:memory.total.max\", \"value\": \"997457920\"}\n }, {\n \"time\": \"1450758725070\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"master:memory.heap.committed\",\n \"value\": \"179830784\"\n }\n }, {\n \"time\": \"1450758725070\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"master:memory.total.used\", \"value\": \"89117352\"}\n }, {\n \"time\": \"1450758725070\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"master:thread.count\", \"value\": \"28\"}\n }]\n}\n\n\n\n\nPOST api/v1.0/master/submitapp\n\n\nSubmit a streaming job jar to Gearpump cluster. It functions like command line\n\n\ngear app -jar xx.jar -conf yy.conf -executors 1 <command line arguments>\n\n\n\n\nRequired MIME type: \"multipart/form-data\"\n\n\nRequired post form fields:\n\n\n\n\nfield name \"jar\", job jar file.\n\n\n\n\nOptional post form fields:\n\n\n\n\n\"configfile\", configuration file, in UTF8 format.\n\n\n\"configstring\", text body of configuration file, in UTF8 format.\n\n\n\"executorcount\", The count of JVM process to start across the cluster for this application job\n\n\n\"args\", command line arguments for this job jar.\n\n\n\n\nExample html:\n\n\n<form id=\"submitapp\" action=\"http://127.0.0.1:8090/api/v1.0/master/submitapp\"\nmethod=\"POST\" enctype=\"multipart/form-data\">\n\nJob Jar (*.jar) [Required]: <br/>\n<input type=\"file\" name=\"jar\"/> <br/> <br/>\n\nConfig file (*.conf) [Optional]: <br/>\n<input type=\"file\" name=\"configfile\"/> <br/> <br/>\n\nConfig String, Config File in string format. [Optional]: <br/>\n<input type=\"text\" name=\"configstring\" value=\"a.b.c.d=1\"/> <br/><br/>\n\nExecutor count (integer, how many process to start for this streaming job) [Optional]: <br/>\n<input type=\"text\" name=\"executorcount\" value=\"1\"/> <br/><br/>\n\nApplication arguments (String) [Optional]: <br/>\n<input type=\"text\" name=\"args\" value=\"\"/> <br/><br/>\n\n<input type=\"submit\" value=\"Submit\"/>\n\n</table>\n\n</form>\n\n\n\n\nPOST api/v1.0/master/submitstormapp\n\n\nSubmit a storm jar to Gearpump cluster. 
It functions like the command line\n\n\nstorm app -jar xx.jar -conf yy.yaml <command line arguments>\n\n\n\n\nRequired MIME type: \"multipart/form-data\"\n\n\nRequired post form fields:\n\n\n\n\nfield name \"jar\", job jar file.\n\n\n\n\nOptional post form fields:\n\n\n\n\n\"configfile\", .yaml configuration file, in UTF8 format.\n\n\n\"args\", command line arguments for this job jar.\n\n\n\n\nExample html:\n\n\n<form id=\"submitstormapp\" action=\"http://127.0.0.1:8090/api/v1.0/master/submitstormapp\"\nmethod=\"POST\" enctype=\"multipart/form-data\">\n\nJob Jar (*.jar) [Required]: <br/>\n<input type=\"file\" name=\"jar\"/> <br/> <br/>\n\nConfig file (*.yaml) [Optional]: <br/>\n<input type=\"file\" name=\"configfile\"/> <br/> <br/>\n\nApplication arguments (String) [Optional]: <br/>\n<input type=\"text\" name=\"args\" value=\"\"/> <br/><br/>\n\n<input type=\"submit\" value=\"Submit\"/>\n\n</form>\n\n\n\n\nWorker service\n\n\nGET api/v1.0/worker/<workerId>\n\n\nQuery worker information.\n\n\nExample:\n\n\ncurl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/worker/0\n\n\n\n\nSample Response:\n\n\n{\n \"workerId\": \"0\",\n \"state\": \"active\",\n \"actorPath\": \"akka.tcp://master@127.0.0.1:3000/user/Worker1\",\n \"aliveFor\": \"831069\",\n \"logFile\": \"logs/\",\n \"executors\": [\n {\n \"appId\": 1,\n \"executorId\": 1,\n \"slots\": 1\n }\n ],\n \"totalSlots\": 1000,\n \"availableSlots\": 999,\n \"homeDirectory\": \"/usr/lisa/gearpump/\",\n \"jvmName\": \"11788@lisa\"\n}\n\n\n\n\nGET api/v1.0/worker/<workerId>/config\n\n\nQuery worker config\n\n\nExample:\n\n\ncurl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/worker/0/config\n\n\n\n\nSample Response:\n\n\n{\n \"extensions\": [\n \"akka.contrib.datareplication.DataReplication$\"\n ]\n \"akka\": {\n \"loglevel\": \"INFO\"\n \"log-dead-letters\": \"off\"\n \"log-dead-letters-during-shutdown\": \"off\"\n \"actor\": {\n ## Master forms a akka cluster\n \"provider\": \"akka.cluster.ClusterActorRefProvider\"\n }\n \"cluster\": {\n \"roles\": [\"master\"]\n \"auto-down-unreachable-after\": \"15s\"\n }\n \"remote\": {\n \"log-remote-lifecycle-events\": \"off\"\n }\n }\n}\n\n\n\n\nGET api/v1.0/worker/<workerId>/metrics/<query_path>?readLatest=<true|false>\n\n\nGet the worker node metrics.\n\n\nExample:\n\n\ncurl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/worker/0/metrics/worker?readLatest=true\n\n\n\n\nSample Response:\n\n\n{\n \"path\"\n:\n \"worker\", \"metrics\"\n:\n [{\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.total.used\",\n \"value\": \"152931440\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker1:thread.daemon.count\", \"value\": \"18\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.heap.used\",\n \"value\": \"123139640\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.total.max\",\n \"value\": \"997457920\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.heap.committed\",\n \"value\": \"179830784\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": 
\"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker0:thread.count\", \"value\": \"28\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker0:memory.heap.max\", \"value\": \"880017408\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker1:memory.heap.max\", \"value\": \"880017408\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.total.committed\",\n \"value\": \"210239488\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.total.used\",\n \"value\": \"152931440\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker1:thread.count\", \"value\": \"28\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.total.max\",\n \"value\": \"997457920\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.heap.committed\",\n \"value\": \"179830784\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.total.committed\",\n \"value\": \"210239488\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker0:thread.daemon.count\", \"value\": \"18\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.heap.used\",\n \"value\": \"123139640\"\n }\n }]\n}\n\n\n\n\nSupervisor Service\n\n\nSupervisor service allows user to add or remove a worker machine.\n\n\nPOST api/v1.0/supervisor/status\n\n\nQuery whether the supervisor service is enabled. If Supervisor service is disabled, you are not allowed to use API like addworker/removeworker.\n\n\nExample:\n\n\ncurl -X POST [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/supervisor/status\n\n\n\n\nSample Response:\n\n\n{\"enabled\":true}\n\n\n\n\nGET api/v1.0/supervisor\n\n\nGet the supervisor path\n\n\nExample:\n\n\ncurl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/supervisor\n\n\n\n\nSample Response:\n\n\n{path: \"supervisor actor path\"}\n\n\n\n\nPOST api/v1.0/supervisor/addworker/<worker-count>\n\n\nAdd workerCount new workers in the cluster. It will use the low level resource scheduler like\nYARN to start new containers and then boot Gearpump worker process.\n\n\nExample:\n\n\ncurl -X POST [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/supervisor/addworker/2\n\n\n\n\nSample Response:\n\n\n{success: true}\n\n\n\n\nPOST api/v1.0/supervisor/removeworker/<worker-id>\n\n\nRemove single worker instance by specifying a worker Id.\n\n\n*\nNOTE:\n Use with caution!\n\n\nNOTE:\n All executors JVMs under this worker JVM will also be destroyed. 
It will trigger failover for all\napplications that have executors started under this worker.\n\n\nExample:\n\n\ncurl -X POST [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/supervisor/removeworker/3\n\n\n\n\nSample Response:\n\n\n{success: true}\n\n\n\n\nApplication service\n\n\nGET api/v1.0/appmaster/<appId>?detail=<true|false>\n\n\nQuery information of a specific application with Id appId\n\n\nExample:\n\n\ncurl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/appmaster/1?detail=true\n\n\n\n\nSample Response:\n\n\n{\n \"appId\": 1,\n \"appName\": \"wordCount\",\n \"processors\": [\n [\n 0,\n {\n \"id\": 0,\n \"taskClass\": \"org.apache.gearpump.streaming.examples.wordcount.Split\",\n \"parallelism\": 1,\n \"description\": \"\",\n \"taskConf\": {\n \"_config\": {}\n },\n \"life\": {\n \"birth\": \"0\",\n \"death\": \"9223372036854775807\"\n },\n \"executors\": [\n 1\n ],\n \"taskCount\": [\n [\n 1,\n {\n \"count\": 1\n }\n ]\n ]\n }\n ],\n [\n 1,\n {\n \"id\": 1,\n \"taskClass\": \"org.apache.gearpump.streaming.examples.wordcount.Sum\",\n \"parallelism\": 1,\n \"description\": \"\",\n \"taskConf\": {\n \"_config\": {}\n },\n \"life\": {\n \"birth\": \"0\",\n \"death\": \"9223372036854775807\"\n },\n \"executors\": [\n 0\n ],\n \"taskCount\": [\n [\n 0,\n {\n \"count\": 1\n }\n ]\n ]\n }\n ]\n ],\n \"processorLevels\": [\n [\n 0,\n 0\n ],\n [\n 1,\n 1\n ]\n ],\n \"dag\": {\n \"vertexList\": [\n 0,\n 1\n ],\n \"edgeList\": [\n [\n 0,\n \"org.apache.gearpump.partitioner.HashPartitioner\",\n 1\n ]\n ]\n },\n \"actorPath\": \"akka.tcp://app1-executor-1@127.0.0.1:52212/user/daemon/appdaemon1/$c/appmaster\",\n \"clock\": \"1450759382430\",\n \"executors\": [\n {\n \"executorId\": 0,\n \"executor\": \"akka.tcp://app1system0@127.0.0.1:52240/remote/akka.tcp/app1-executor-1@127.0.0.1:52212/user/daemon/appdaemon1/$c/appmaster/executors/0#-1554950276\",\n \"workerId\": \"1\",\n \"status\": \"active\"\n },\n {\n \"executorId\": 1,\n \"executor\": \"akka.tcp://app1system1@127.0.0.1:52241/remote/akka.tcp/app1-executor-1@127.0.0.1:52212/user/daemon/appdaemon1/$c/appmaster/executors/1#928082134\",\n \"workerId\": \"0\",\n \"status\": \"active\"\n },\n {\n \"executorId\": -1,\n \"executor\": \"akka://app1-executor-1/user/daemon/appdaemon1/$c/appmaster\",\n \"workerId\": \"1\",\n \"status\": \"active\"\n }\n ],\n \"startTime\": \"1450758117306\",\n \"uptime\": \"1268472\",\n \"user\": \"lisa\",\n \"homeDirectory\": \"/usr/lisa/gearpump/\",\n \"logFile\": \"logs/\",\n \"historyMetricsConfig\": {\n \"retainHistoryDataHours\": 72,\n \"retainHistoryDataIntervalMs\": 3600000,\n \"retainRecentDataSeconds\": 300,\n \"retainRecentDataIntervalMs\": 15000\n }\n}\n\n\n\n\nDELETE api/v1.0/appmaster/<appId>\n\n\nShut down application appId\n\n\nGET api/v1.0/appmaster/<appId>/stallingtasks\n\n\nQuery the list of unhealthy tasks of a specific application with Id appId\n\n\nExample:\n\n\ncurl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/appmaster/2/stallingtasks\n\n\n\n\nSample Response:\n\n\n{\n \"tasks\": [\n {\n \"processorId\": 0,\n \"index\": 0\n }\n ]\n}\n\n\n\n\nGET api/v1.0/appmaster/<appId>/config\n\n\nQuery the configuration of a specific application appId\n\n\nExample:\n\n\ncurl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/appmaster/1/config\n\n\n\n\nSample Response:\n\n\n{\n \"gearpump\" : {\n \"appmaster\" : {\n \"extraClasspath\" : \"\",\n \"vmargs\" : \"-server -Xms512M -Xmx1024M -Xss1M 
-XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseParNewGC -XX:NewRatio=3\"\n },\n \"cluster\" : {\n \"masters\" : [\n \"127.0.0.1:3000\"\n ]\n },\n \"executor\" : {\n \"extraClasspath\" : \"\",\n \"vmargs\" : \"-server -Xms512M -Xmx1024M -Xss1M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseParNewGC -XX:NewRatio=3\"\n },\n \"jarstore\" : {\n \"rootpath\" : \"jarstore/\"\n },\n \"log\" : {\n \"application\" : {\n \"dir\" : \"logs\"\n },\n \"daemon\" : {\n \"dir\" : \"logs\"\n }\n },\n \"metrics\" : {\n \"enabled\" : true,\n \"graphite\" : {\n \"host\" : \"127.0.0.1\",\n \"port\" : 2003\n },\n \"logfile\" : {},\n \"report-interval-ms\" : 15000,\n \"reporter\" : \"akka\",\n \"retainHistoryData\" : {\n \"hours\" : 72,\n \"intervalMs\" : 3600000\n },\n \"retainRecentData\" : {\n \"intervalMs\" : 15000,\n \"seconds\" : 300\n },\n \"sample-rate\" : 10\n },\n \"netty\" : {\n \"base-sleep-ms\" : 100,\n \"buffer-size\" : 5242880,\n \"flush-check-interval\" : 10,\n \"max-retries\" : 30,\n \"max-sleep-ms\" : 1000,\n \"message-batch-size\" : 262144\n },\n \"netty-dispatcher\" : \"akka.actor.default-dispatcher\",\n \"scheduling\" : {\n \"scheduler-class\" : \"org.apache.gearpump.cluster.scheduler.PriorityScheduler\"\n },\n \"serializers\" : {\n \"[B\" : \"\",\n \"[C\" : \"\",\n \"[D\" : \"\",\n \"[F\" : \"\",\n \"[I\" : \"\",\n \"[J\" : \"\",\n \"[Ljava.lang.String;\" : \"\",\n \"[S\" : \"\",\n \"[Z\" : \"\",\n \"org.apache.gearpump.Message\" : \"org.apache.gearpump.streaming.MessageSerializer\",\n \"org.apache.gearpump.streaming.task.Ack\" : \"org.apache.gearpump.streaming.AckSerializer\",\n \"org.apache.gearpump.streaming.task.AckRequest\" : \"org.apache.gearpump.streaming.AckRequestSerializer\",\n \"org.apache.gearpump.streaming.task.LatencyProbe\" : \"org.apache.gearpump.streaming.LatencyProbeSerializer\",\n \"org.apache.gearpump.streaming.task.TaskId\" : \"org.apache.gearpump.streaming.TaskIdSerializer\",\n \"scala.Tuple1\" : \"\",\n \"scala.Tuple2\" : \"\",\n \"scala.Tuple3\" : \"\",\n \"scala.Tuple4\" : \"\",\n \"scala.Tuple5\" : \"\",\n \"scala.Tuple6\" : \"\",\n \"scala.collection.immutable.$colon$colon\" : \"\",\n \"scala.collection.immutable.List\" : \"\"\n },\n \"services\" : {\n # gear.conf: 112\n \"host\" : \"127.0.0.1\",\n # gear.conf: 113\n \"http\" : 8090,\n # gear.conf: 114\n \"ws\" : 8091\n },\n \"task-dispatcher\" : \"akka.actor.pined-dispatcher\",\n \"worker\" : {\n # reference.conf: 100\n # # How many slots each worker contains\n \"slots\" : 100\n }\n }\n}\n\n\n\n\nGET api/v1.0/appmaster/<appId>/metrics/<query_path>?readLatest=<true|false>&aggregator=<aggregator_class>\n\n\nQuery metrics information of a specific application appId.\nFilter metrics with the given metrics path.\n\n\naggregator points to an aggregator class, which aggregates the current metrics and returns a smaller set.\n\n\nExample:\n\n\ncurl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/appmaster/1/metrics/app1?readLatest=true&aggregator=org.apache.gearpump.streaming.metrics.ProcessorAggregator\n\n\n\n\nSample Response:\n\n\n{\n \"path\"\n:\n \"worker\", \"metrics\"\n:\n [{\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.total.used\",\n \"value\": \"152931440\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": 
\"worker1:thread.daemon.count\", \"value\": \"18\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.heap.used\",\n \"value\": \"123139640\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.total.max\",\n \"value\": \"997457920\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.heap.committed\",\n \"value\": \"179830784\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker0:thread.count\", \"value\": \"28\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker0:memory.heap.max\", \"value\": \"880017408\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker1:memory.heap.max\", \"value\": \"880017408\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.total.committed\",\n \"value\": \"210239488\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.total.used\",\n \"value\": \"152931440\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker1:thread.count\", \"value\": \"28\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.total.max\",\n \"value\": \"997457920\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.heap.committed\",\n \"value\": \"179830784\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.total.committed\",\n \"value\": \"210239488\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker0:thread.daemon.count\", \"value\": \"18\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.heap.used\",\n \"value\": \"123139640\"\n }\n }]\n}\n\n\n\n\nGET api/v1.0/appmaster/<appId>/errors\n\n\nGet task error messages\n\n\nExample:\n\n\ncurl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/appmaster/1/errors\n\n\n\n\nSample Response:\n\n\n{\"time\":\"0\",\"error\":null}\n\n\n\n\nPOST api/v1.0/appmaster/<appId>/restart\n\n\nRestart the application\n\n\nExecutor Service\n\n\nGET api/v1.0/appmaster/<appId>/executor/<executorid>/config\n\n\nGet executor config\n\n\nExample:\n\n\ncurl http://127.0.0.1:8090/api/v1.0/appmaster/1/executor/1/config\n\n\n\n\nSample Response:\n\n\n{\n \"extensions\": [\n \"akka.contrib.datareplication.DataReplication$\"\n ]\n \"akka\": {\n \"loglevel\": \"INFO\"\n \"log-dead-letters\": \"off\"\n \"log-dead-letters-during-shutdown\": \"off\"\n \"actor\": {\n ## Master forms a akka cluster\n \"provider\": \"akka.cluster.ClusterActorRefProvider\"\n }\n \"cluster\": {\n \"roles\": [\"master\"]\n \"auto-down-unreachable-after\": 
\"15s\"\n }\n \"remote\": {\n \"log-remote-lifecycle-events\": \"off\"\n }\n }\n}\n\n\n\n\nGET api/v1.0/appmaster/<appId>/executor/<executorid>\n\n\nGet executor information.\n\n\nExample:\n\n\ncurl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/appmaster/1/executor/1\n\n\n\n\nSample Response:\n\n\n{\n \"id\": 1,\n \"workerId\": \"0\",\n \"actorPath\": \"akka.tcp://app1system1@127.0.0.1:52241/remote/akka.tcp/app1-executor-1@127.0.0.1:52212/user/daemon/appdaemon1/$c/appmaster/executors/1\",\n \"logFile\": \"logs/\",\n \"status\": \"active\",\n \"taskCount\": 1,\n \"tasks\": [\n [\n 0,\n [\n {\n \"processorId\": 0,\n \"index\": 0\n }\n ]\n ]\n ],\n \"jvmName\": \"21304@lisa\"\n}",
"title": "REST API"
},
{
"location": "/dev/dev-rest-api/index.html#authentication",
"text": "For all REST API calls, We need authentication by default. If you don't want authentication, you can disable them. How to disable Authentication To disable Authentication, you can set gearpump-ui.gearpump.ui-security.authentication-enabled = false \nin gear.conf, please check UI Authentication for details. How to authenticate if Authentication is enabled. For User-Password based authentication If Authentication is enabled, then you need to login before calling REST API. curl -X POST --data username=admin --data password=admin --cookie-jar outputAuthenticationCookie.txt http://127.0.0.1:8090/login This will use default user \"admin:admin\" to login, and store the authentication cookie to file outputAuthenticationCookie.txt. In All subsequent Rest API calls, you need to add the authentication cookie. For example curl --cookie outputAuthenticationCookie.txt http://127.0.0.1/api/v1.0/master for more information, please check UI Authentication . For OAuth2 based authentication For OAuth2 based authentication, it requires you to have an access token in place. Different OAuth2 service provider have different way to return an access token. For Google , you can refer to OAuth Doc . For CloudFoundry UAA , you can use the uaac command to get the access token. $ uaac target http://login.gearpump.gotapaas.eu/\n$ uaac token get <user_email_address>\n\n### Find access token\n$ uaac context\n\n[0]*[http://login.gearpump.gotapaas.eu]\n\n [0]*[<user_email_address>]\n user_id: 34e33a79-42c6-479b-a8c1-8c471ff027fb\n client_id: cf\n token_type: bearer\n access_token: eyJhbGciOiJSUzI1NiJ9.eyJqdGkiOiI\n expires_in: 599\n scope: password.write openid cloud_controller.write cloud_controller.read\n jti: 74ea49e4-1001-4757-9f8d-a66e52a27557 For more information on uaac, please check UAAC guide Now, we have the access token, then let's login to Gearpump UI server with this access token: ## Please replace cloudfoundryuaa with actual OAuth2 service name you have configured in gear.conf\ncurl -X POST --data accesstoken=eyJhbGciOiJSUzI1NiJ9.eyJqdGkiOiI --cookie-jar outputAuthenticationCookie.txt http://127.0.0.1:8090/login/oauth2/cloudfoundryuaa/accesstoken This will use user user_email_address to login, and store the authentication cookie to file outputAuthenticationCookie.txt. In All subsequent Rest API calls, you need to add the authentication cookie. For example curl --cookie outputAuthenticationCookie.txt http://127.0.0.1/api/v1.0/master NOTE: You can default the default permission level for OAuth2 user. for more information,\nplease check UI Authentication .",
"title": "Authentication."
},
{
"location": "/dev/dev-rest-api/index.html#query-version",
"text": "GET version Example: curl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/version Sample Response: 0.8.3",
"title": "Query version"
},
{
"location": "/dev/dev-rest-api/index.html#master-service",
"text": "GET api/v1.0/master Get information of masters Example: curl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/master Sample Response: {\n \"masterDescription\": {\n \"leader\":{\"host\":\"master@127.0.0.1\",\"port\":3000},\n \"cluster\":[{\"host\":\"127.0.0.1\",\"port\":3000}]\n \"aliveFor\": \"642941\",\n \"logFile\": \"/Users/foobar/gearpump/logs\",\n \"jarStore\": \"jarstore/\",\n \"masterStatus\": \"synced\",\n \"homeDirectory\": \"/Users/foobar/gearpump\"\n }\n} GET api/v1.0/master/applist Query information of all applications Example: curl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/master/applist Sample Response: {\n \"appMasters\": [\n {\n \"status\": \"active\",\n \"appId\": 1,\n \"appName\": \"wordCount\",\n \"appMasterPath\": \"akka.tcp://app1-executor-1@127.0.0.1:52212/user/daemon/appdaemon1/$c\",\n \"workerPath\": \"akka.tcp://master@127.0.0.1:3000/user/Worker0\",\n \"submissionTime\": \"1450758114766\",\n \"startTime\": \"1450758117294\",\n \"user\": \"lisa\"\n }\n ]\n} GET api/v1.0/master/workerlist Query information of all workers Example: curl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/master/workerlist Sample Response: [\n {\n \"workerId\": \"1\",\n \"state\": \"active\",\n \"actorPath\": \"akka.tcp://master@127.0.0.1:3000/user/Worker0\",\n \"aliveFor\": \"431565\",\n \"logFile\": \"logs/\",\n \"executors\": [\n {\n \"appId\": 1,\n \"executorId\": -1,\n \"slots\": 1\n },\n {\n \"appId\": 1,\n \"executorId\": 0,\n \"slots\": 1\n }\n ],\n \"totalSlots\": 1000,\n \"availableSlots\": 998,\n \"homeDirectory\": \"/usr/lisa/gearpump/\",\n \"jvmName\": \"11788@lisa\"\n },\n {\n \"workerId\": \"0\",\n \"state\": \"active\",\n \"actorPath\": \"akka.tcp://master@127.0.0.1:3000/user/Worker1\",\n \"aliveFor\": \"431546\",\n \"logFile\": \"logs/\",\n \"executors\": [\n {\n \"appId\": 1,\n \"executorId\": 1,\n \"slots\": 1\n }\n ],\n \"totalSlots\": 1000,\n \"availableSlots\": 999,\n \"homeDirectory\": \"/usr/lisa/gearpump/\",\n \"jvmName\": \"11788@lisa\"\n }\n] GET api/v1.0/master/config Get the configuration of all masters Example: curl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/master/config Sample Response: {\n \"extensions\": [\n \"akka.contrib.datareplication.DataReplication$\"\n ]\n \"akka\": {\n \"loglevel\": \"INFO\"\n \"log-dead-letters\": \"off\"\n \"log-dead-letters-during-shutdown\": \"off\"\n \"actor\": {\n ## Master forms a akka cluster\n \"provider\": \"akka.cluster.ClusterActorRefProvider\"\n }\n \"cluster\": {\n \"roles\": [\"master\"]\n \"auto-down-unreachable-after\": \"15s\"\n }\n \"remote\": {\n \"log-remote-lifecycle-events\": \"off\"\n }\n }\n} GET api/v1.0/master/metrics/<query_path>?readLatest=<true|false> Get the master node metrics. 
Example: curl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/master/metrics/master?readLatest=true Sample Response: {\n \"path\"\n:\n \"master\", \"metrics\"\n:\n [{\n \"time\": \"1450758725070\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"master:memory.heap.used\", \"value\": \"59764272\"}\n }, {\n \"time\": \"1450758725070\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"master:thread.daemon.count\", \"value\": \"18\"}\n }, {\n \"time\": \"1450758725070\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"master:memory.total.committed\",\n \"value\": \"210239488\"\n }\n }, {\n \"time\": \"1450758725070\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"master:memory.heap.max\", \"value\": \"880017408\"}\n }, {\n \"time\": \"1450758725070\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"master:memory.total.max\", \"value\": \"997457920\"}\n }, {\n \"time\": \"1450758725070\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"master:memory.heap.committed\",\n \"value\": \"179830784\"\n }\n }, {\n \"time\": \"1450758725070\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"master:memory.total.used\", \"value\": \"89117352\"}\n }, {\n \"time\": \"1450758725070\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"master:thread.count\", \"value\": \"28\"}\n }]\n} POST api/v1.0/master/submitapp Submit a streaming job jar to the Gearpump cluster. It functions like the command line gear app -jar xx.jar -conf yy.conf -executors 1 <command line arguments> Required MIME type: \"multipart/form-data\" Required post form fields: field name \"jar\", job jar file. Optional post form fields: \"configfile\", configuration file, in UTF8 format. \"configstring\", text body of configuration file, in UTF8 format. \"executorcount\", the number of JVM processes to start across the cluster for this application job. \"args\", command line arguments for this job jar. Example html: <form id=\"submitapp\" action=\"http://127.0.0.1:8090/api/v1.0/master/submitapp\"\nmethod=\"POST\" enctype=\"multipart/form-data\">\n\nJob Jar (*.jar) [Required]: <br/>\n<input type=\"file\" name=\"jar\"/> <br/> <br/>\n\nConfig file (*.conf) [Optional]: <br/>\n<input type=\"file\" name=\"configfile\"/> <br/> <br/>\n\nConfig String, Config File in string format. [Optional]: <br/>\n<input type=\"text\" name=\"configstring\" value=\"a.b.c.d=1\"/> <br/><br/>\n\nExecutor count (integer, how many processes to start for this streaming job) [Optional]: <br/>\n<input type=\"text\" name=\"executorcount\" value=\"1\"/> <br/><br/>\n\nApplication arguments (String) [Optional]: <br/>\n<input type=\"text\" name=\"args\" value=\"\"/> <br/><br/>\n\n<input type=\"submit\" value=\"Submit\"/>\n\n</form> POST api/v1.0/master/submitstormapp Submit a storm jar to the Gearpump cluster. It functions like the command line storm app -jar xx.jar -conf yy.yaml <command line arguments> Required MIME type: \"multipart/form-data\" Required post form fields: field name \"jar\", job jar file. Optional post form fields: \"configfile\", .yaml configuration file, in UTF8 format. \"args\", command line arguments for this job jar. 
Example html: <form id=\"submitstormapp\" action=\"http://127.0.0.1:8090/api/v1.0/master/submitstormapp\"\nmethod=\"POST\" enctype=\"multipart/form-data\">\n\nJob Jar (*.jar) [Required]: <br/>\n<input type=\"file\" name=\"jar\"/> <br/> <br/>\n\nConfig file (*.yaml) [Optional]: <br/>\n<input type=\"file\" name=\"configfile\"/> <br/> <br/>\n\nApplication arguments (String) [Optional]: <br/>\n<input type=\"text\" name=\"args\" value=\"\"/> <br/><br/>\n\n<input type=\"submit\" value=\"Submit\"/>\n\n</form>
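\n\nEquivalently, you can post the multipart form with curl (a hedged sketch; the jar file name is only an assumption, while the form field names are the ones documented above):\n\ncurl -X POST [--cookie outputAuthenticationCookie.txt] -F \"jar=@wordcount-2.11-0.8.3-assembly.jar\" -F \"executorcount=1\" http://127.0.0.1:8090/api/v1.0/master/submitapp\n\nFor submitstormapp, the same pattern applies with an optional .yaml \"configfile\" field.",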
"title": "Master Service"
},
{
"location": "/dev/dev-rest-api/index.html#worker-service",
"text": "GET api/v1.0/worker/<workerId> Query worker information. Example: curl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/worker/0 Sample Response: {\n \"workerId\": \"0\",\n \"state\": \"active\",\n \"actorPath\": \"akka.tcp://master@127.0.0.1:3000/user/Worker1\",\n \"aliveFor\": \"831069\",\n \"logFile\": \"logs/\",\n \"executors\": [\n {\n \"appId\": 1,\n \"executorId\": 1,\n \"slots\": 1\n }\n ],\n \"totalSlots\": 1000,\n \"availableSlots\": 999,\n \"homeDirectory\": \"/usr/lisa/gearpump/\",\n \"jvmName\": \"11788@lisa\"\n} GET api/v1.0/worker/<workerId>/config Query worker config Example: curl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/worker/0/config Sample Response: {\n \"extensions\": [\n \"akka.contrib.datareplication.DataReplication$\"\n ]\n \"akka\": {\n \"loglevel\": \"INFO\"\n \"log-dead-letters\": \"off\"\n \"log-dead-letters-during-shutdown\": \"off\"\n \"actor\": {\n ## Master forms a akka cluster\n \"provider\": \"akka.cluster.ClusterActorRefProvider\"\n }\n \"cluster\": {\n \"roles\": [\"master\"]\n \"auto-down-unreachable-after\": \"15s\"\n }\n \"remote\": {\n \"log-remote-lifecycle-events\": \"off\"\n }\n }\n} GET api/v1.0/worker/<workerId>/metrics/<query_path>?readLatest=<true|false> Get the worker node metrics. Example: curl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/worker/0/metrics/worker?readLatest=true Sample Response: {\n \"path\"\n:\n \"worker\", \"metrics\"\n:\n [{\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.total.used\",\n \"value\": \"152931440\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker1:thread.daemon.count\", \"value\": \"18\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.heap.used\",\n \"value\": \"123139640\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.total.max\",\n \"value\": \"997457920\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.heap.committed\",\n \"value\": \"179830784\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker0:thread.count\", \"value\": \"28\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker0:memory.heap.max\", \"value\": \"880017408\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker1:memory.heap.max\", \"value\": \"880017408\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.total.committed\",\n \"value\": \"210239488\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.total.used\",\n \"value\": \"152931440\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker1:thread.count\", \"value\": \"28\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": 
\"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.total.max\",\n \"value\": \"997457920\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.heap.committed\",\n \"value\": \"179830784\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.total.committed\",\n \"value\": \"210239488\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker0:thread.daemon.count\", \"value\": \"18\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.heap.used\",\n \"value\": \"123139640\"\n }\n }]\n}",
"title": "Worker service"
},
{
"location": "/dev/dev-rest-api/index.html#supervisor-service",
"text": "Supervisor service allows user to add or remove a worker machine. POST api/v1.0/supervisor/status Query whether the supervisor service is enabled. If Supervisor service is disabled, you are not allowed to use API like addworker/removeworker. Example: curl -X POST [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/supervisor/status Sample Response: {\"enabled\":true} GET api/v1.0/supervisor Get the supervisor path Example: curl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/supervisor Sample Response: {path: \"supervisor actor path\"} POST api/v1.0/supervisor/addworker/<worker-count> Add workerCount new workers in the cluster. It will use the low level resource scheduler like\nYARN to start new containers and then boot Gearpump worker process. Example: curl -X POST [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/supervisor/addworker/2 Sample Response: {success: true} POST api/v1.0/supervisor/removeworker/<worker-id> Remove single worker instance by specifying a worker Id. * NOTE: Use with caution! NOTE: All executors JVMs under this worker JVM will also be destroyed. It will trigger failover for all\napplications that have executor started under this worker. Example: curl -X POST [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/supervisor/removeworker/3 Sample Response: {success: true}",
"title": "Supervisor Service"
},
{
"location": "/dev/dev-rest-api/index.html#application-service",
"text": "GET api/v1.0/appmaster/<appId>?detail=<true|false> Query information of an specific application of Id appId Example: curl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/appmaster/1?detail=true Sample Response: {\n \"appId\": 1,\n \"appName\": \"wordCount\",\n \"processors\": [\n [\n 0,\n {\n \"id\": 0,\n \"taskClass\": \"org.apache.gearpump.streaming.examples.wordcount.Split\",\n \"parallelism\": 1,\n \"description\": \"\",\n \"taskConf\": {\n \"_config\": {}\n },\n \"life\": {\n \"birth\": \"0\",\n \"death\": \"9223372036854775807\"\n },\n \"executors\": [\n 1\n ],\n \"taskCount\": [\n [\n 1,\n {\n \"count\": 1\n }\n ]\n ]\n }\n ],\n [\n 1,\n {\n \"id\": 1,\n \"taskClass\": \"org.apache.gearpump.streaming.examples.wordcount.Sum\",\n \"parallelism\": 1,\n \"description\": \"\",\n \"taskConf\": {\n \"_config\": {}\n },\n \"life\": {\n \"birth\": \"0\",\n \"death\": \"9223372036854775807\"\n },\n \"executors\": [\n 0\n ],\n \"taskCount\": [\n [\n 0,\n {\n \"count\": 1\n }\n ]\n ]\n }\n ]\n ],\n \"processorLevels\": [\n [\n 0,\n 0\n ],\n [\n 1,\n 1\n ]\n ],\n \"dag\": {\n \"vertexList\": [\n 0,\n 1\n ],\n \"edgeList\": [\n [\n 0,\n \"org.apache.gearpump.partitioner.HashPartitioner\",\n 1\n ]\n ]\n },\n \"actorPath\": \"akka.tcp://app1-executor-1@127.0.0.1:52212/user/daemon/appdaemon1/$c/appmaster\",\n \"clock\": \"1450759382430\",\n \"executors\": [\n {\n \"executorId\": 0,\n \"executor\": \"akka.tcp://app1system0@127.0.0.1:52240/remote/akka.tcp/app1-executor-1@127.0.0.1:52212/user/daemon/appdaemon1/$c/appmaster/executors/0#-1554950276\",\n \"workerId\": \"1\",\n \"status\": \"active\"\n },\n {\n \"executorId\": 1,\n \"executor\": \"akka.tcp://app1system1@127.0.0.1:52241/remote/akka.tcp/app1-executor-1@127.0.0.1:52212/user/daemon/appdaemon1/$c/appmaster/executors/1#928082134\",\n \"workerId\": \"0\",\n \"status\": \"active\"\n },\n {\n \"executorId\": -1,\n \"executor\": \"akka://app1-executor-1/user/daemon/appdaemon1/$c/appmaster\",\n \"workerId\": \"1\",\n \"status\": \"active\"\n }\n ],\n \"startTime\": \"1450758117306\",\n \"uptime\": \"1268472\",\n \"user\": \"lisa\",\n \"homeDirectory\": \"/usr/lisa/gearpump/\",\n \"logFile\": \"logs/\",\n \"historyMetricsConfig\": {\n \"retainHistoryDataHours\": 72,\n \"retainHistoryDataIntervalMs\": 3600000,\n \"retainRecentDataSeconds\": 300,\n \"retainRecentDataIntervalMs\": 15000\n }\n} DELETE api/v1.0/appmaster/<appId> shutdown application appId GET api/v1.0/appmaster/<appId>/stallingtasks Query list of unhealthy tasks of an specific application of Id appId Example: curl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/appmaster/2/stallingtasks Sample Response: {\n \"tasks\": [\n {\n \"processorId\": 0,\n \"index\": 0\n }\n ]\n} GET api/v1.0/appmaster/<appId>/config Query the configuration of specific application appId Example: curl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/appmaster/1/config Sample Response: {\n \"gearpump\" : {\n \"appmaster\" : {\n \"extraClasspath\" : \"\",\n \"vmargs\" : \"-server -Xms512M -Xmx1024M -Xss1M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseParNewGC -XX:NewRatio=3\"\n },\n \"cluster\" : {\n \"masters\" : [\n \"127.0.0.1:3000\"\n ]\n },\n \"executor\" : {\n \"extraClasspath\" : \"\",\n \"vmargs\" : \"-server -Xms512M -Xmx1024M -Xss1M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseParNewGC -XX:NewRatio=3\"\n },\n 
\"jarstore\" : {\n \"rootpath\" : \"jarstore/\"\n },\n \"log\" : {\n \"application\" : {\n \"dir\" : \"logs\"\n },\n \"daemon\" : {\n \"dir\" : \"logs\"\n }\n },\n \"metrics\" : {\n \"enabled\" : true,\n \"graphite\" : {\n \"host\" : \"127.0.0.1\",\n \"port\" : 2003\n },\n \"logfile\" : {},\n \"report-interval-ms\" : 15000,\n \"reporter\" : \"akka\",\n \"retainHistoryData\" : {\n \"hours\" : 72,\n \"intervalMs\" : 3600000\n },\n \"retainRecentData\" : {\n \"intervalMs\" : 15000,\n \"seconds\" : 300\n },\n \"sample-rate\" : 10\n },\n \"netty\" : {\n \"base-sleep-ms\" : 100,\n \"buffer-size\" : 5242880,\n \"flush-check-interval\" : 10,\n \"max-retries\" : 30,\n \"max-sleep-ms\" : 1000,\n \"message-batch-size\" : 262144\n },\n \"netty-dispatcher\" : \"akka.actor.default-dispatcher\",\n \"scheduling\" : {\n \"scheduler-class\" : \"org.apache.gearpump.cluster.scheduler.PriorityScheduler\"\n },\n \"serializers\" : {\n \"[B\" : \"\",\n \"[C\" : \"\",\n \"[D\" : \"\",\n \"[F\" : \"\",\n \"[I\" : \"\",\n \"[J\" : \"\",\n \"[Ljava.lang.String;\" : \"\",\n \"[S\" : \"\",\n \"[Z\" : \"\",\n \"org.apache.gearpump.Message\" : \"org.apache.gearpump.streaming.MessageSerializer\",\n \"org.apache.gearpump.streaming.task.Ack\" : \"org.apache.gearpump.streaming.AckSerializer\",\n \"org.apache.gearpump.streaming.task.AckRequest\" : \"org.apache.gearpump.streaming.AckRequestSerializer\",\n \"org.apache.gearpump.streaming.task.LatencyProbe\" : \"org.apache.gearpump.streaming.LatencyProbeSerializer\",\n \"org.apache.gearpump.streaming.task.TaskId\" : \"org.apache.gearpump.streaming.TaskIdSerializer\",\n \"scala.Tuple1\" : \"\",\n \"scala.Tuple2\" : \"\",\n \"scala.Tuple3\" : \"\",\n \"scala.Tuple4\" : \"\",\n \"scala.Tuple5\" : \"\",\n \"scala.Tuple6\" : \"\",\n \"scala.collection.immutable.$colon$colon\" : \"\",\n \"scala.collection.immutable.List\" : \"\"\n },\n \"services\" : {\n # gear.conf: 112\n \"host\" : \"127.0.0.1\",\n # gear.conf: 113\n \"http\" : 8090,\n # gear.conf: 114\n \"ws\" : 8091\n },\n \"task-dispatcher\" : \"akka.actor.pined-dispatcher\",\n \"worker\" : {\n # reference.conf: 100\n # # How many slots each worker contains\n \"slots\" : 100\n }\n }\n} GET api/v1.0/appmaster/<appId>/metrics/<query_path>?readLatest=<true|false>&aggregator=<aggregator_class> Query metrics information of a specific application appId\nFilter metrics with path metrics path aggregator points to a aggregator class, which will aggregate on the current metrics, and return a smaller set. 
Example: curl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/appmaster/1/metrics/app1?readLatest=true&aggregator=org.apache.gearpump.streaming.metrics.ProcessorAggregator Sample Response: {\n \"path\"\n:\n \"worker\", \"metrics\"\n:\n [{\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.total.used\",\n \"value\": \"152931440\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker1:thread.daemon.count\", \"value\": \"18\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.heap.used\",\n \"value\": \"123139640\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.total.max\",\n \"value\": \"997457920\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.heap.committed\",\n \"value\": \"179830784\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker0:thread.count\", \"value\": \"28\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker0:memory.heap.max\", \"value\": \"880017408\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker1:memory.heap.max\", \"value\": \"880017408\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.total.committed\",\n \"value\": \"210239488\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker0:memory.total.used\",\n \"value\": \"152931440\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker1:thread.count\", \"value\": \"28\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.total.max\",\n \"value\": \"997457920\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.heap.committed\",\n \"value\": \"179830784\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.total.committed\",\n \"value\": \"210239488\"\n }\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\", \"name\": \"worker0:thread.daemon.count\", \"value\": \"18\"}\n }, {\n \"time\": \"1450759137860\",\n \"value\": {\n \"$type\": \"org.apache.gearpump.metrics.Metrics.Gauge\",\n \"name\": \"worker1:memory.heap.used\",\n \"value\": \"123139640\"\n }\n }]\n} GET api/v1.0/appmaster/<appId>/errors Get task error messages Example: curl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/appmaster/1/errors Sample Response: {\"time\":\"0\",\"error\":null} POST api/v1.0/appmaster/<appId>/restart Restart the application
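\n\nThe shutdown (DELETE) and restart endpoints follow the same curl conventions as the examples above; hedged sketches (appId 1 is an assumption):\n\ncurl -X DELETE [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/appmaster/1\n\ncurl -X POST [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/appmaster/1/restart",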
"title": "Application service"
},
{
"location": "/dev/dev-rest-api/index.html#executor-service",
"text": "GET api/v1.0/appmaster/<appId>/executor/<executorid>/config Get executor config Example: curl http://127.0.0.1:8090/api/v1.0/appmaster/1/executor/1/config Sample Response: {\n \"extensions\": [\n \"akka.contrib.datareplication.DataReplication$\"\n ]\n \"akka\": {\n \"loglevel\": \"INFO\"\n \"log-dead-letters\": \"off\"\n \"log-dead-letters-during-shutdown\": \"off\"\n \"actor\": {\n ## Master forms a akka cluster\n \"provider\": \"akka.cluster.ClusterActorRefProvider\"\n }\n \"cluster\": {\n \"roles\": [\"master\"]\n \"auto-down-unreachable-after\": \"15s\"\n }\n \"remote\": {\n \"log-remote-lifecycle-events\": \"off\"\n }\n }\n} GET api/v1.0/appmaster/<appId>/executor/<executorid> Get executor information. Example: curl [--cookie outputAuthenticationCookie.txt] http://127.0.0.1:8090/api/v1.0/appmaster/1/executor/1 Sample Response: {\n \"id\": 1,\n \"workerId\": \"0\",\n \"actorPath\": \"akka.tcp://app1system1@127.0.0.1:52241/remote/akka.tcp/app1-executor-1@127.0.0.1:52212/user/daemon/appdaemon1/$c/appmaster/executors/1\",\n \"logFile\": \"logs/\",\n \"status\": \"active\",\n \"taskCount\": 1,\n \"tasks\": [\n [\n 0,\n [\n {\n \"processorId\": 0,\n \"index\": 0\n }\n ]\n ]\n ],\n \"jvmName\": \"21304@lisa\"\n}",
"title": "Executor Service"
},
{
"location": "/api/scala/index.html",
"text": "Placeholder",
"title": "Scala API"
},
{
"location": "/api/java/index.html",
"text": "Placeholder",
"title": "Java API"
}
]
}