| ---+Configuring Falcon |
| |
| By default, the config directory used by Falcon is {package dir}/conf. To override this (for example, to reuse the same |
| conf across multiple Falcon upgrades), set the environment variable FALCON_CONF to the path of the conf dir. |
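| |
| For example, to keep the conf in a shared location across upgrades (the path below is an illustration, not a default): |
| |
| <verbatim> |
| export FALCON_CONF=/etc/falcon/conf |
| </verbatim> |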
| |
| falcon-env.sh has been added to the Falcon conf directory. This file can be used to set the various environment |
| variables that you need for your services. |
| In addition, you can set any other environment variables you might need. This file is sourced by the Falcon scripts |
| before any commands are executed. The following environment variables are available to set. |
| |
| <verbatim> |
| # The java implementation to use. If JAVA_HOME is not found we expect java and jar to be in path |
| #export JAVA_HOME= |
| |
| # any additional java opts you want to set. This will apply to both client and server operations |
| #export FALCON_OPTS= |
| |
| # any additional java opts that you want to set for client only |
| #export FALCON_CLIENT_OPTS= |
| |
| # java heap size we want to set for the client. Default is 1024MB |
| #export FALCON_CLIENT_HEAP= |
| |
| # any additional opts you want to set for prism service. |
| #export FALCON_PRISM_OPTS= |
| |
| # java heap size we want to set for the prism service. Default is 1024MB |
| #export FALCON_PRISM_HEAP= |
| |
| # any additional opts you want to set for falcon service. |
| #export FALCON_SERVER_OPTS= |
| |
| # java heap size we want to set for the falcon server. Default is 1024MB |
| #export FALCON_SERVER_HEAP= |
| |
| # What is considered the falcon home dir. Default is the base location of the installed software |
| #export FALCON_HOME_DIR= |
| |
| # Where log files are stored. Default is logs directory under the base install location |
| #export FALCON_LOG_DIR= |
| |
| # Where pid files are stored. Default is logs directory under the base install location |
| #export FALCON_PID_DIR= |
| |
| # Where the falcon active mq data is stored. Default is logs/data directory under the base install location |
| #export FALCON_DATA_DIR= |
| |
| # Where do you want to expand the war file. By default it is in the /server/webapp dir under the base install dir. |
| #export FALCON_EXPANDED_WEBAPP_DIR= |
| |
| # Any additional classpath elements to be added to the Falcon server/client classpath |
| #export FALCON_EXTRA_CLASS_PATH= |
| </verbatim> |
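| |
| As a minimal sketch, falcon-env.sh might set just the Java implementation and custom log/pid locations (the paths |
| below are placeholders, not shipped defaults): |
| |
| <verbatim> |
| export JAVA_HOME=/usr/java/default |
| export FALCON_LOG_DIR=/var/log/falcon |
| export FALCON_PID_DIR=/var/run/falcon |
| </verbatim> |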
| |
| ---++Advanced Configurations |
| |
| ---+++Configuring Monitoring plugin to register catalog partitions |
| Falcon comes with a monitoring plugin that registers catalog partitions. This comes in handy during migration from |
| filesystem based feeds to hcatalog based feeds. |
| The plugin lets the user decouple partition registration from the migration and assume that all partitions are already |
| on hcatalog even before the migration, simplifying the hcatalog migration. |
| |
| By default this plugin is disabled. |
| To enable this plugin and leverage the feature, there are 3 prerequisites: |
| <verbatim> |
| In {package dir}/conf/startup.properties, add |
| *.workflow.execution.listeners=org.apache.falcon.catalog.CatalogPartitionHandler |
| |
| In the cluster definition, ensure registry endpoint is defined. |
| Ex: |
| <interface type="registry" endpoint="thrift://localhost:1109" version="0.13.3"/> |
| |
| In the feed definition, ensure the corresponding catalog table is mentioned in feed-properties |
| Ex: |
| <properties> |
| <property name="catalog.table" value="catalog:default:in_table#year={YEAR};month={MONTH};day={DAY};hour={HOUR}; |
| minute={MINUTE}"/> |
| </properties> |
| </verbatim> |
| |
| *NOTE: for Mac OS users* |
| <verbatim> |
| If you are using Mac OS, you will need to configure FALCON_SERVER_OPTS (explained above). |
| |
| In {package dir}/conf/falcon-env.sh uncomment the following line |
| #export FALCON_SERVER_OPTS= |
| |
| and change it to look as below |
| export FALCON_SERVER_OPTS="-Djava.awt.headless=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc=" |
| </verbatim> |
| |
| ---+++Activemq |
| * The Falcon server starts an embedded ActiveMQ broker by default. To control this behaviour, set the following system |
| properties using the -D option in the environment variable FALCON_OPTS (see the example after the list below): |
| * falcon.embeddedmq=<true/false> - Should server start embedded active mq, default true |
| * falcon.embeddedmq.port=<port> - Port for embedded active mq, default 61616 |
| * falcon.embeddedmq.data=<path> - Data path for embedded active mq, default {package dir}/logs/data |
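| |
| For example, to disable the embedded broker entirely (a sketch using only the properties listed above): |
| |
| <verbatim> |
| export FALCON_OPTS="-Dfalcon.embeddedmq=false" |
| </verbatim> |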
| |
| ---+++Falcon System Notifications |
| |
| Some Falcon features, such as late data handling, retries and the metadata service, depend on the JMS notifications |
| sent when the Oozie workflow completes. Falcon listens to Oozie notifications via JMS. You need to enable Oozie JMS |
| notification as explained below. The Falcon post-processing feature only sends user notifications, so enabling Oozie |
| JMS notification is important. |
| |
| ---+++Enable Oozie JMS notification |
| |
| * Please add/change the following properties in oozie-site.xml in the oozie installation dir. |
| |
| <verbatim> |
| <property> |
| <name>oozie.jms.producer.connection.properties</name> |
| <value>java.naming.factory.initial#org.apache.activemq.jndi.ActiveMQInitialContextFactory;java.naming.provider.url#tcp://<activemq-host>:<port></value> |
| </property> |
| |
| <property> |
| <name>oozie.service.EventHandlerService.event.listeners</name> |
| <value>org.apache.oozie.jms.JMSJobEventListener</value> |
| </property> |
| |
| <property> |
| <name>oozie.service.JMSTopicService.topic.name</name> |
| <value>WORKFLOW=ENTITY.TOPIC,COORDINATOR=ENTITY.TOPIC</value> |
| </property> |
| |
| <property> |
| <name>oozie.service.JMSTopicService.topic.prefix</name> |
| <value>FALCON.</value> |
| </property> |
| |
| <!-- add org.apache.oozie.service.JMSAccessorService to the other existing services if any --> |
| <property> |
| <name>oozie.services.ext</name> |
| <value>org.apache.oozie.service.JMSAccessorService,org.apache.oozie.service.PartitionDependencyManagerService,org.apache.oozie.service.HCatAccessorService</value> |
| </property> |
| </verbatim> |
| |
| * In falcon startup.properties, set JMS broker url to be the same as the one set in oozie-site.xml property |
| oozie.jms.producer.connection.properties (see above) |
| |
| <verbatim> |
| *.broker.url=tcp://<activemq-host>:<port> |
| </verbatim> |
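| |
| For example, if the embedded ActiveMQ broker runs on the Falcon host with its default port (61616, as noted above): |
| |
| <verbatim> |
| *.broker.url=tcp://localhost:61616 |
| </verbatim> |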
| |
| ---+++Configuring Oozie for Falcon |
| |
| Falcon uses HCatalog for data availability notification when Hive tables are replicated. Make the following configuration |
| changes to Oozie to ensure Hive table replication in Falcon: |
| |
| * Stop the Oozie service on all Falcon clusters. Run the following commands on the Oozie host machine. |
| |
| <verbatim> |
| su - $OOZIE_USER |
| |
| <oozie-install-dir>/bin/oozie-stop.sh |
| |
| where $OOZIE_USER is the Oozie user. For example, oozie. |
| </verbatim> |
| |
| * Copy each cluster's hadoop conf directory to a different location. For example, if you have two clusters, copy one to /etc/hadoop/conf-1 and the other to /etc/hadoop/conf-2. |
| |
| * For each oozie-site.xml file, modify the oozie.service.HadoopAccessorService.hadoop.configurations property, specifying the clusters, the RPC ports of the NameNodes, and the ResourceManagers accordingly. For example, if Falcon connects to three clusters, specify: |
| |
| <verbatim> |
| |
| <property> |
| <name>oozie.service.HadoopAccessorService.hadoop.configurations</name> |
| <value>*=/etc/hadoop/conf,$NameNode1:$rpcPortNN=$hadoopConfDir1,$ResourceManager1:$rpcPortRM=$hadoopConfDir1,$NameNode2:$rpcPortNN=$hadoopConfDir2,$ResourceManager2:$rpcPortRM=$hadoopConfDir2,$NameNode3:$rpcPortNN=$hadoopConfDir3,$ResourceManager3:$rpcPortRM=$hadoopConfDir3</value> |
| <description> |
| Comma separated AUTHORITY=HADOOP_CONF_DIR, where AUTHORITY is the HOST:PORT of |
| the Hadoop service (JobTracker, HDFS). The wildcard '*' configuration is |
| used when there is no exact match for an authority. The HADOOP_CONF_DIR contains |
| the relevant Hadoop *-site.xml files. If the path is relative, it is looked up within |
| the Oozie configuration directory; the path can also be absolute (i.e. pointing |
| to Hadoop client conf/ directories in the local filesystem). |
| </description> |
| </property> |
| |
| </verbatim> |
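| |
| As an illustration, for the two clusters copied to /etc/hadoop/conf-1 and /etc/hadoop/conf-2 above (the hostnames are |
| hypothetical; 8020 and 8032 are the usual NameNode and ResourceManager RPC ports): |
| |
| <verbatim> |
| <property> |
| <name>oozie.service.HadoopAccessorService.hadoop.configurations</name> |
| <value>*=/etc/hadoop/conf,nn1.example.com:8020=/etc/hadoop/conf-1,rm1.example.com:8032=/etc/hadoop/conf-1,nn2.example.com:8020=/etc/hadoop/conf-2,rm2.example.com:8032=/etc/hadoop/conf-2</value> |
| </property> |
| </verbatim> |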
| |
| * Add the following properties to the /etc/oozie/conf/oozie-site.xml file: |
| |
| <verbatim> |
| |
| <property> |
| <name>oozie.service.ProxyUserService.proxyuser.falcon.hosts</name> |
| <value>*</value> |
| </property> |
| |
| <property> |
| <name>oozie.service.ProxyUserService.proxyuser.falcon.groups</name> |
| <value>*</value> |
| </property> |
| |
| <property> |
| <name>oozie.service.URIHandlerService.uri.handlers</name> |
| <value>org.apache.oozie.dependency.FSURIHandler, org.apache.oozie.dependency.HCatURIHandler</value> |
| </property> |
| |
| <property> |
| <name>oozie.services.ext</name> |
| <value>org.apache.oozie.service.JMSAccessorService, org.apache.oozie.service.PartitionDependencyManagerService, |
| org.apache.oozie.service.HCatAccessorService</value> |
| </property> |
| |
| <!-- Coord EL Functions Properties --> |
| |
| <property> |
| <name>oozie.service.ELService.ext.functions.coord-job-submit-instances</name> |
| <value>now=org.apache.oozie.extensions.OozieELExtensions#ph1_now_echo, |
| today=org.apache.oozie.extensions.OozieELExtensions#ph1_today_echo, |
| yesterday=org.apache.oozie.extensions.OozieELExtensions#ph1_yesterday_echo, |
| currentMonth=org.apache.oozie.extensions.OozieELExtensions#ph1_currentMonth_echo, |
| lastMonth=org.apache.oozie.extensions.OozieELExtensions#ph1_lastMonth_echo, |
| currentYear=org.apache.oozie.extensions.OozieELExtensions#ph1_currentYear_echo, |
| lastYear=org.apache.oozie.extensions.OozieELExtensions#ph1_lastYear_echo, |
| formatTime=org.apache.oozie.coord.CoordELFunctions#ph1_coord_formatTime_echo, |
| latest=org.apache.oozie.coord.CoordELFunctions#ph2_coord_latest_echo, |
| future=org.apache.oozie.coord.CoordELFunctions#ph2_coord_future_echo |
| </value> |
| </property> |
| |
| <property> |
| <name>oozie.service.ELService.ext.functions.coord-action-create-inst</name> |
| <value>now=org.apache.oozie.extensions.OozieELExtensions#ph2_now_inst, |
| today=org.apache.oozie.extensions.OozieELExtensions#ph2_today_inst, |
| yesterday=org.apache.oozie.extensions.OozieELExtensions#ph2_yesterday_inst, |
| currentMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_currentMonth_inst, |
| lastMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_lastMonth_inst, |
| currentYear=org.apache.oozie.extensions.OozieELExtensions#ph2_currentYear_inst, |
| lastYear=org.apache.oozie.extensions.OozieELExtensions#ph2_lastYear_inst, |
| latest=org.apache.oozie.coord.CoordELFunctions#ph2_coord_latest_echo, |
| future=org.apache.oozie.coord.CoordELFunctions#ph2_coord_future_echo, |
| formatTime=org.apache.oozie.coord.CoordELFunctions#ph2_coord_formatTime, |
| user=org.apache.oozie.coord.CoordELFunctions#coord_user |
| </value> |
| </property> |
| |
| <property> |
| <name>oozie.service.ELService.ext.functions.coord-action-start</name> |
| <value> |
| now=org.apache.oozie.extensions.OozieELExtensions#ph2_now, |
| today=org.apache.oozie.extensions.OozieELExtensions#ph2_today, |
| yesterday=org.apache.oozie.extensions.OozieELExtensions#ph2_yesterday, |
| currentMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_currentMonth, |
| lastMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_lastMonth, |
| currentYear=org.apache.oozie.extensions.OozieELExtensions#ph2_currentYear, |
| lastYear=org.apache.oozie.extensions.OozieELExtensions#ph2_lastYear, |
| latest=org.apache.oozie.coord.CoordELFunctions#ph3_coord_latest, |
| future=org.apache.oozie.coord.CoordELFunctions#ph3_coord_future, |
| dataIn=org.apache.oozie.extensions.OozieELExtensions#ph3_dataIn, |
| instanceTime=org.apache.oozie.coord.CoordELFunctions#ph3_coord_nominalTime, |
| dateOffset=org.apache.oozie.coord.CoordELFunctions#ph3_coord_dateOffset, |
| formatTime=org.apache.oozie.coord.CoordELFunctions#ph3_coord_formatTime, |
| user=org.apache.oozie.coord.CoordELFunctions#coord_user |
| </value> |
| </property> |
| |
| <property> |
| <name>oozie.service.ELService.ext.functions.coord-sla-submit</name> |
| <value> |
| instanceTime=org.apache.oozie.coord.CoordELFunctions#ph1_coord_nominalTime_echo_fixed, |
| user=org.apache.oozie.coord.CoordELFunctions#coord_user |
| </value> |
| </property> |
| |
| <property> |
| <name>oozie.service.ELService.ext.functions.coord-sla-create</name> |
| <value> |
| instanceTime=org.apache.oozie.coord.CoordELFunctions#ph2_coord_nominalTime, |
| user=org.apache.oozie.coord.CoordELFunctions#coord_user |
| </value> |
| </property> |
| |
| </verbatim> |
| |
| * Copy the existing Oozie WAR file to <oozie-install-dir>/oozie.war. This will ensure that all existing items in the WAR file are still present after the current update. |
| |
| <verbatim> |
| su - root |
| cp $CATALINA_BASE/webapps/oozie.war <oozie-install-dir>/oozie.war |
| |
| where $CATALINA_BASE is the path for the Oozie web app. By default, $CATALINA_BASE is: <oozie-install-dir> |
| </verbatim> |
| |
| * Add the Falcon EL extensions to Oozie. |
| |
| Copy the extension JAR files provided with the Falcon Server to a temporary directory on the Oozie server. For example, if your standalone Falcon Server is on the same machine as your Oozie server, you can just copy the JAR files. |
| |
| <verbatim> |
| |
| mkdir /tmp/falcon-oozie-jars |
| cp <falcon-install-dir>/oozie/ext/falcon-oozie-el-extension-<$version>.jar /tmp/falcon-oozie-jars |
| cp /tmp/falcon-oozie-jars/falcon-oozie-el-extension-<$version>.jar <oozie-install-dir>/libext |
| |
| </verbatim> |
| |
| * Package the Oozie WAR file as the Oozie user |
| |
| <verbatim> |
| su - $OOZIE_USER |
| cd <oozie-install-dir>/bin |
| ./oozie-setup.sh prepare-war |
| |
| Where $OOZIE_USER is the Oozie user. For example, oozie. |
| </verbatim> |
| |
| * Start the Oozie service on all Falcon clusters. Run these commands on the Oozie host machine. |
| |
| <verbatim> |
| su - $OOZIE_USER |
| <oozie-install-dir>/bin/oozie-start.sh |
| |
| Where $OOZIE_USER is the Oozie user. For example, oozie. |
| </verbatim> |
| |
| |
| ---+++Enabling Falcon Native Scheduler |
| You can choose to schedule entities either using Oozie's coordinator or using Falcon's native scheduler. To be able to |
| schedule entities natively on Falcon, you will need to add some additional properties |
| to <verbatim>$FALCON_HOME/conf/startup.properties</verbatim> before starting the Falcon Server. |
| For details, refer to [[FalconNativeScheduler][Falcon Native Scheduler]]. |
| |
| ---+++Titan GraphDB backend |
| A GraphDB backend needs to be configured for the Falcon server to start properly. |
| You can choose either version 5.0.73 of BerkeleyDB (the default for Falcon for the last few releases) or HBase version |
| 1.1.x or later as the backend database. |
| Falcon release distributions include the Titan storage plugins for both BerkeleyDB and HBase. |
| |
| ---++++Using BerkeleyDB backend |
| Falcon distributions may not package the BerkeleyDB artifacts (je-5.0.73.jar), depending on the build profile. |
| If BerkeleyDB is not packaged, you can download the BerkeleyDB jar file from the URL: |
| <verbatim>http://download.oracle.com/otn/berkeley-db/je-5.0.73.zip</verbatim>. |
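| |
| A minimal sketch, assuming the downloaded archive unpacks to a local je-5.0.73 directory (the paths are placeholders): |
| <verbatim> |
| unzip je-5.0.73.zip |
| # make the jar visible to the Falcon server via the extra classpath hook described above |
| export FALCON_EXTRA_CLASS_PATH=/path/to/je-5.0.73/lib/je-5.0.73.jar |
| </verbatim> |
| |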
| The following properties describe an example BerkeleyDB graph storage backend that can be specified in the configuration file |
| <verbatim>$FALCON_HOME/conf/startup.properties</verbatim>. |
| |
| <verbatim> |
| # Graph Storage |
| *.falcon.graph.storage.directory=${user.dir}/target/graphdb |
| *.falcon.graph.storage.backend=berkeleyje |
| *.falcon.graph.serialize.path=${user.dir}/target/graphdb |
| </verbatim> |
| |
| ---++++Using HBase backend |
| |
| To use HBase as the backend, it is recommended that an HBase cluster be provisioned in distributed mode, primarily because of the support for kerberos enabled clusters and HA considerations. Depending on the build profile, a standalone HBase version can be packaged with the Falcon binary distribution. Along with this, a template for <verbatim>hbase-site.xml</verbatim> is provided, which can be used to start a standalone mode HBase environment for development/testing purposes. |
| |
| ---++++ Basic configuration |
| |
| <verbatim> |
| ##### Falcon startup.properties |
| *.falcon.graph.storage.backend=hbase |
| #For standalone mode, specify localhost |
| #For distributed mode, specify the zookeeper quorum here - for more information refer to http://s3.thinkaurelius.com/docs/titan/current/hbase.html#_remote_server_mode_2 |
| *.falcon.graph.storage.hostname=<ZooKeeper Quorum> |
| </verbatim> |
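| |
| For a distributed cluster, the value is the comma-separated ZooKeeper quorum, for example (hypothetical hosts): |
| |
| <verbatim> |
| *.falcon.graph.storage.hostname=zk1.example.com,zk2.example.com,zk3.example.com |
| </verbatim> |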
| |
| The HBase configuration file (hbase-site.xml) and the HBase libraries need to be added to the classpath when Falcon starts up. The following must be appended to the environment variable <verbatim>FALCON_EXTRA_CLASS_PATH</verbatim> in <verbatim>$FALCON_HOME/conf/falcon-env.sh</verbatim>. Additionally, the correct HBase client libraries need to be added. For example, |
| <verbatim> |
| export FALCON_EXTRA_CLASS_PATH=`${HBASE_HOME}/bin/hbase classpath` |
| </verbatim> |
| |
| ---++++ Table name |
| |
| We recommend that the Titan storage table in the startup config be named <verbatim>falcon_titan</verbatim>, so that multiple applications using Titan can share the same HBase cluster. This can be set by specifying the table name using the startup property given below. The default value is shown. |
| |
| <verbatim> |
| *.falcon.graph.storage.hbase.table=falcon_titan |
| </verbatim> |
| |
| ---++++Starting standalone HBase for testing |
| |
| HBase can be started in standalone mode for testing as a backend for Titan. The following steps outline the config changes required: |
| <verbatim> |
| 1. Build Falcon as below to package hbase binaries |
| $ export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m" && mvn clean assembly:assembly -Ppackage-standalone-hbase |
| 2. Configure HBase |
| a. When the falcon tar file is expanded, the HBase binaries are under ${FALCON_HOME}/hbase |
| b. Copy ${FALCON_HOME}/conf/hbase-site.xml.template to ${FALCON_HOME}/hbase/conf/hbase-site.xml |
| c. Set the {hbase_home} property to point to a local dir |
| d. Standalone HBase starts zookeeper on the default port (2181). This port can be changed by adding the following to hbase-site.xml |
| <property> |
| <name>hbase.zookeeper.property.clientPort</name> |
| <value>2223</value> |
| </property> |
| |
| <property> |
| <name>hbase.zookeeper.quorum</name> |
| <value>localhost</value> |
| </property> |
| e. set JAVA_HOME to point to Java 1.7 or above |
| f. Start hbase as ${FALCON_HOME}/hbase/bin/start-hbase.sh |
| 3. Configure Falcon |
| a. In ${FALCON_HOME}/conf/startup.properties, uncomment the following to enable HBase as the backend |
| *.falcon.graph.storage.backend=hbase |
| ### specify the zookeeper host and port name with which standalone hbase is started (see step 2) |
| ### by default, it will be localhost and port 2181 |
| *.falcon.graph.storage.hostname=<zookeeper-host-name>:<zookeeper-host-port> |
| *.falcon.graph.serialize.path=${user.dir}/target/graphdb |
| *.falcon.graph.storage.hbase.table=falcon_titan |
| *.falcon.graph.storage.transactions=false |
| 4. Add HBase jars to Falcon classpath in ${FALCON_HOME}/conf/falcon-env.sh as: |
| FALCON_EXTRA_CLASS_PATH=`${FALCON_HOME}/hbase/bin/hbase classpath` |
| 5. Set the following in ${FALCON_HOME}/conf/startup.properties to disable SSL if needed |
| *.falcon.enableTLS=false |
| 6. Start Falcon |
| </verbatim> |
| |
| ---++++Permissions |
| |
| When Falcon is configured with HBase as the storage backend, Titan needs sufficient authorization to create and access an HBase table. On a secure cluster it may be necessary to grant permissions to the <verbatim>falcon</verbatim> user for the <verbatim>falcon_titan</verbatim> table (or whatever table name was specified for the property <verbatim>*.falcon.graph.storage.hbase.table</verbatim>). |
| |
| With Ranger, a policy can be configured for <verbatim>falcon_titan</verbatim>. |
| |
| Without Ranger, the HBase shell can be used to set the permissions. |
| |
| <verbatim> |
| su hbase |
| kinit -k -t <hbase keytab> <hbase principal> |
| echo "grant 'falcon', 'RWXCA', 'falcon_titan'" | hbase shell |
| </verbatim> |
| |
| ---++++Advanced configuration |
| |
| The HBase storage backend support in Titan has a few other configuration options, which can be set in <verbatim>$FALCON_HOME/conf/startup.properties</verbatim> by prefixing the Titan property name with <verbatim>*.falcon.graph.</verbatim>. |
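| |
| For example, the Titan property <verbatim>storage.hbase.table</verbatim> (used above) is expressed as: |
| |
| <verbatim> |
| *.falcon.graph.storage.hbase.table=falcon_titan |
| </verbatim> |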
| |
| Please refer to <verbatim>http://s3.thinkaurelius.com/docs/titan/0.5.4/titan-config-ref.html#_storage</verbatim> for generic storage properties, <verbatim>http://s3.thinkaurelius.com/docs/titan/0.5.4/titan-config-ref.html#_storage_berkeleydb</verbatim> for BerkeleyDB properties and <verbatim>http://s3.thinkaurelius.com/docs/titan/0.5.4/titan-config-ref.html#_storage_hbase</verbatim> for HBase storage backend properties. |
| |
| |
| |
| ---+++Adding Extension Libraries |
| |
| Library extensions allow users to add custom libraries to entity lifecycles such as feed retention, feed replication |
| and process execution. This is useful for use cases such as adding filesystem extensions. To enable this, add the |
| following configs to startup.properties: |
| |
| <verbatim> |
| *.libext.paths=<paths to be added to all entity lifecycles> |
| |
| *.libext.feed.paths=<paths to be added to all feed lifecycles> |
| |
| *.libext.feed.retentions.paths=<paths to be added to feed retention workflow> |
| |
| *.libext.feed.replication.paths=<paths to be added to feed replication workflow> |
| |
| *.libext.process.paths=<paths to be added to process workflow> |
| </verbatim> |
| |
| The configured jars are added to the Falcon classpath and the corresponding workflows. |
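| |
| As an illustration (the paths are hypothetical), a deployment that only needs extra jars for feed replication might set: |
| |
| <verbatim> |
| *.libext.paths=/projects/falcon/libext |
| *.libext.feed.replication.paths=/projects/falcon/libext/replication |
| </verbatim> |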