Only the submit
and kill
commands are currently supported. Other CLI commands (activate
, deactivate
, restart
, update
) are currently in development and will be released soon.
To run Heron on k8s, you will need a Zookeeper cluster. You can choose to use a zookeeper cluster outside of k8s if you‘d like (only if it’s accessible from the k8s cluster nodes), but most often you will probably want to deploy your own Zookeeper cluster inside of k8s. This can be done and a good example of how to deploy one can be found here.
Heron uses an uploader to upload the topology to a shared location so that a worker can fetch the topology to its sandbox. The configuration for an uploader is in the uploader.yaml
config file. For k8s deployments, Heron can use the S3Uploader
. Details on configuring the S3 Uploader can be found in the documentation here.
To configure Heron to use the Kubernetes scheduler, modify the scheduler.yaml
config file specific for the Heron cluster. The following must be specified for each cluster:
heron.class.scheduler
--- Indicates the class to be loaded for Kubernetes scheduler. You should set this to org.apache.heron.scheduler.kubernetes.KubernetesScheduler
heron.class.launcher
--- Specifies the class to be loaded for launching and submitting topologies. To configure the Kubernetes launcher, set this to org.apache.heron.scheduler.kubernetes.KubernetesLauncher
heron.directory.sandbox.java.home
--- Specifies the location of the Java 8 JRE. Depending on the Docker container you‘re using for executor image, this may be different. If you’re using the Streaml.io-hosted containter, this will already be available within the environment as $JAVA_HOME
heron.directory.home
--- Specifies the directory for the core binaries. This should be set to $HERON_HOME
heron.director.conf
--- Specifies the directory where the Heron configuration files are located. This should be set to /opt/heron/heron-conf/
heron.kubernetes.scheduler.uri
--- Specifies the URI for the Kubernetes API service. If you‘re using kubectl proxy
to access your API, this will need to be set to http://localhost:8001
if you’re using the standard proxy configuration.
heron.scheduler.is.service
--- This config indicates whether the scheduler is a service. In the case of Kubernetes, it should be set to False
.
heron.executor.docker.image
--- Specified the Docker image to be used for running the executor. To use the Docker image created for the latest release, you can set this value to streamlio/heron:<version>-<os>
. Example: streamlio/heron:0.14.8-ubuntu14.04
# scheduler class for distributing the topology for execution heron.class.scheduler: org.apache.heron.scheduler.kubernetes.KubernetesScheduler # launcher class for submitting and launching the topology heron.class.launcher: org.apache.heron.scheduler.kubernetes.KubernetesLauncher # location of java - pick it up from shell environment heron.directory.sandbox.java.home: $JAVA_HOME heron.directory.conf: "/opt/heron/heron-conf/" heron.directory.home: $HERON_HOME # The URI of kubernetes API heron.kubernetes.scheduler.uri: "http://localhost:8001" # Invoke the IScheduler as a library directly heron.scheduler.is.service: False # docker repo for executor heron.executor.docker.image: 'streamlio/heron:0.14.8-ubuntu14.04'
To configure Heron to use the correct State Manager configuration, modify the statemgr.yaml
configuration file for your Kubernetes cluster.
heron.class.state.manager
--- Specifies the class of the state manager you want to use. In a Kubernetes cluster, you'll want to use Zookeeper so set this to org.apache.heron.statemgr.zookeeper.curator.CuratorStateManager
heron.statemgr.connection.string
--- Specifies the connection string to the zookeeper cluster. This will look something like <zookeeper_ip>:2181
.
heron.statemgr.root.path
--- Specifies the node within Zookeeper where information should be stored.
heron.statemgr.zookeeper.is.initialize.tree
--- Specifies whether or not the State Manager can initialize the tree in Zookeeper. This should most often be set to true
Most often with Heron deployments in a cluster environment, you are going to need to create a tunneled connection to the cluster to submit and kill topology jars since communication with TMaster instances is needed.
To tunnel, you will need to add these extra configurations within your statemgr.yaml
file.
heron.statemgr.is.tunnel.needed
--- Flag for enabling the tunnel to a server within your cluster. If you need the tunnel, which you most likely will, set this flag to true
heron.statemgr.tunnel.connection.timeout.ms
--- The connection timeout in ms when trying to connect to zk server
heron.statemgr.tunnel.connection.retry.count
--- The count of retry to check whether has direct access on zk server
heron.statemgr.tunnel.retry.interval.ms
--- The interval in ms between two retry checking whether has direct access on zk server
heron.statemgr.tunnel.verify.count
--- The count of retry to verify whether seting up a tunnel process
heron.statemgr.tunnel.host
--- The host to SSH tunnel through in <user>@<host>
format
# local state manager class for managing state in a persistent fashion heron.class.state.manager: org.apache.heron.statemgr.zookeeper.curator.CuratorStateManager # local state manager connection string heron.statemgr.connection.string: "<zk_ip>:2181" # path of the root address to store the state in a local file system heron.statemgr.root.path: "/heron" # create the zookeeper nodes, if they do not exist heron.statemgr.zookeeper.is.initialize.tree: True # timeout in ms to wait before considering zookeeper session is dead heron.statemgr.zookeeper.session.timeout.ms: 30000 # timeout in ms to wait before considering zookeeper connection is dead heron.statemgr.zookeeper.connection.timeout.ms: 30000 # timeout in ms to wait before considering zookeeper connection is dead heron.statemgr.zookeeper.retry.count: 10 # duration of time to wait until the next retry heron.statemgr.zookeeper.retry.interval.ms: 10000 heron.statemgr.is.tunnel.needed: True # The connection timeout in ms when trying to connect to zk server heron.statemgr.tunnel.connection.timeout.ms: 1000 # The count of retry to check whether has direct access on zk server heron.statemgr.tunnel.connection.retry.count: 2 # The interval in ms between two retry checking whether has direct access on zk server heron.statemgr.tunnel.retry.interval.ms: 1000 # The count of retry to verify whether seting up a tunnel process heron.statemgr.tunnel.verify.count: 10 # SSH tunnel host heron.statemgr.tunnel.host: "<user>@<host>"
After setting up ZooKeeper and setting your configurations like above, you can run topologies on any machine in the cluster. Since every deployment is done via Docker, there's no need to worry about individual agent configurations. When launching topologies using the heron submit
command, you can look in your Kubernetes UI to see the launched topology pods.