These instructions cover installing Apache Fluo and starting a Fluo application on a cluster where Accumulo, Hadoop, and Zookeeper are running. If you need help setting up these dependencies, see the related projects page for external projects that may help.
Before you install Fluo, the following software must be installed and running on your local machine or cluster:
Software | Recommended Version | Minimum Version |
---|---|---|
Accumulo | 1.7.2 | 1.6.1 |
Hadoop | 2.7.2 | 2.6.0 |
Zookeeper | 3.4.8 | |
Java | JDK 8 | JDK 8 |
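A quick way to compare an installed version against the minimums in the table is a small version-comparison helper. The sketch below is illustrative only: the function names version_ge and check are hypothetical, the "installed" values are placeholders rather than detected versions, and it assumes a sort that supports -V (as in GNU coreutils).

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Arguments: software name, installed version, minimum version.
check() {
  if version_ge "$2" "$3"; then
    echo "$1 $2: OK (minimum $3)"
  else
    echo "$1 $2: too old (minimum $3)"
  fi
}

# Placeholder installed versions; substitute what your cluster reports.
check Accumulo 1.7.2 1.6.1
check Hadoop 2.5.0 2.6.0
```

With these placeholder values, the Accumulo line reports OK and the Hadoop line reports that 2.5.0 is below the 2.6.0 minimum.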
Before you can install Fluo, you will need to obtain a distribution tarball. It is recommended that you download the latest release. You can also build a distribution from the master branch by following these steps, which create a tarball in modules/distribution/target:
git clone https://github.com/apache/fluo.git
cd fluo/
mvn package
After you obtain a Fluo distribution tarball, follow these steps to install Fluo.
Choose a directory with plenty of space and untar the distribution:
tar -xvzf fluo-1.1.0-incubating-bin.tar.gz
cd fluo-1.1.0-incubating
The distribution contains a fluo script in bin/ that administers Fluo and the following configuration files in conf/:
Configuration file | Description |
---|---|
fluo-env.sh | Configures classpath for fluo script. Required for all commands. |
fluo-conn.properties | Configures connection to Fluo. Required for all commands. |
fluo-app.properties | Template for configuration file passed to fluo init when initializing Fluo application. |
log4j.properties | Configures logging |
fluo.properties.deprecated | Deprecated Fluo configuration file. Replaced by fluo-conn.properties and fluo-app.properties |
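To confirm that the extracted distribution has the layout described above, a short check like the following may help. It is only a sketch: the function name missing_files is hypothetical, and the file list is taken from the table above (the deprecated file is left out on purpose).

```shell
# Print each expected file that is missing under the directory given as $1.
missing_files() {
  for f in bin/fluo conf/fluo-env.sh conf/fluo-conn.properties \
           conf/fluo-app.properties conf/log4j.properties; do
    [ -e "$1/$f" ] || echo "$f"
  done
}

# Run from inside the extracted directory; prints nothing when all files exist.
missing_files .
```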
Configure fluo-env.sh to set up your classpath using jars from the versions of Hadoop, Accumulo, and Zookeeper that you are using. Choose one of the two ways below to make these jars available to Fluo:
1. Set HADOOP_PREFIX, ACCUMULO_HOME, and ZOOKEEPER_HOME in your environment or configure these variables in fluo-env.sh. Fluo will look in these locations for jars.
2. Run ./lib/fetch.sh ahz to download Hadoop, Accumulo, and Zookeeper jars to lib/ahz and configure fluo-env.sh to look in this directory. By default, this command downloads the versions set in lib/ahz/pom.xml. If you are not using the default versions, you can override them:
./lib/fetch.sh ahz -Daccumulo.version=1.7.2 -Dhadoop.version=2.7.2 -Dzookeeper.version=3.4.8
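If you choose to rely on HADOOP_PREFIX, ACCUMULO_HOME, and ZOOKEEPER_HOME instead, it can be worth checking that all three are set and point at real directories before running any Fluo command. The following is a minimal sketch, not part of the Fluo distribution: the function name check_home_vars is hypothetical, and the commented export paths are placeholders.

```shell
# Report, for each variable, whether it is set and points at an existing directory.
# Returns non-zero if any variable is unset or points at a missing directory.
check_home_vars() {
  status=0
  for var in HADOOP_PREFIX ACCUMULO_HOME ZOOKEEPER_HOME; do
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "$var is not set"
      status=1
    elif [ ! -d "$value" ]; then
      echo "$var points to a missing directory: $value"
      status=1
    else
      echo "$var=$value"
    fi
  done
  return "$status"
}

# Placeholder paths; substitute your own install locations.
# export HADOOP_PREFIX=/path/to/hadoop
# export ACCUMULO_HOME=/path/to/accumulo
# export ZOOKEEPER_HOME=/path/to/zookeeper
check_home_vars || echo "Set the missing variables or use ./lib/fetch.sh ahz instead."
```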
Fluo needs more dependencies than what is available from Hadoop, Accumulo, and Zookeeper. These extra dependencies need to be downloaded to lib/ using the command below:
./lib/fetch.sh extra
You are now ready to use the fluo script.
The Fluo command script is located at bin/fluo in your Fluo installation. All Fluo commands are invoked by this script.
Modify and add the following to your ~/.bashrc if you want to be able to execute the fluo script from any directory:
export PATH=/path/to/fluo-1.1.0-incubating/bin:$PATH
Source your .bashrc for the changes to take effect and test the script:
source ~/.bashrc
fluo
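Because .bashrc may be sourced more than once in a session, a plain export can add the same directory to PATH repeatedly. One possible alternative, sketched below, prepends the directory only when it is absent. The helper name path_prepend and the FLUO_BIN variable are hypothetical, and the path is the same placeholder used above.

```shell
# Placeholder; point this at the bin/ directory of your installation.
FLUO_BIN=/path/to/fluo-1.1.0-incubating/bin

# Prepend directory $1 to PATH unless PATH already contains it.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;
    *) PATH="$1:$PATH" ;;
  esac
}

path_prepend "$FLUO_BIN"
export PATH
```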
Running the script without any arguments prints a description of all commands.
./bin/fluo
Fluo will reread the same data frequently when it checks conditions on mutations. When Fluo initializes a table, it enables data caching to make this more efficient. However, you may need to increase the amount of memory available for caching in the tserver by increasing tserver.cache.data.size. Increasing this may require increasing the maximum tserver Java heap size in accumulo-env.sh.
Fluo runs many client threads, so make sure the tablet servers have enough threads to serve them. Consider increasing the tserver.server.threads.minimum Accumulo setting.
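Both settings can be changed from the Accumulo shell with its config -s command. The sketch below only prints the commands so they can be reviewed before being run in the shell. The function name tserver_tuning_commands is hypothetical, and the values 1G and 64 are illustrative assumptions, not recommendations; suitable values depend on your workload and tserver heap size.

```shell
# Print Accumulo shell commands that set the two properties discussed above.
# Arguments: data cache size, minimum tserver thread count.
tserver_tuning_commands() {
  printf 'config -s tserver.cache.data.size=%s\n' "$1"
  printf 'config -s tserver.server.threads.minimum=%s\n' "$2"
}

# 1G and 64 are illustrative values only.
tserver_tuning_commands 1G 64
```

Printing rather than executing keeps the sketch free of side effects; the resulting lines can be pasted into an Accumulo shell session once the values have been chosen.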
Using at least Accumulo 1.6.1 is recommended because it includes fixes for multiple performance bugs.