Below are instructions for running Fluo in a production environment where Accumulo, Hadoop & Zookeeper are installed and running. If you want to avoid setting up these dependencies, consider using fluo-dev or MiniFluo.
Before you install Fluo, you will need the following installed and running on your local machine or cluster:
Before you can install Fluo, you will need to obtain a distribution tarball that works for your environment. Fluo distributions are built for specific releases of Hadoop and Accumulo. If you are using Accumulo 1.6.4 and Hadoop 2.6.3, you can download the latest release. If you need a release for a different environment or one built from the master branch, follow these steps:
First, clone Fluo:
git clone https://github.com/fluo-io/fluo.git cd fluo/
Optionally, check out a stable tag (if you don't want to build a release from master):
git checkout 1.0.0-beta-1
Next, build a distribution for your environment. The tarball will be created in modules/distribution/target
.
mvn package -Daccumulo.version=1.6.4 -Dhadoop.version=2.7.1
When you have a distribution tarball built for your environment, follow these steps to install Fluo.
First, choose a directory with plenty of space and untar the distribution:
tar -xvzf fluo-1.0.0-beta-1-bin.tar.gz
Verify that your distribution has the same versions of Hadoop & Accumulo as your environment:
cd fluo-1.0.0-beta-1 ls lib/hadoop-* lib/accumulo-*
Next, copy the example configuration to the base of your configuration directory to create the default configuration for your Fluo install:
cp conf/examples/* conf/
The default configuration will be used as the base configuration for each new application. Therefore, you should modify fluo.properties for your environment. However, you should not configure any application settings (like observers).
NOTE - All properties that have a default are set with it. Uncomment a property if you want to use a value different than the default. Properties that are unset and uncommented must be set by the user.
The Fluo command script is located at bin/fluo
of your Fluo installation. All Fluo commands are invoked by this script.
Modify and add the following to your ~/.bashrc
if you want to be able to execute the fluo script from any directory:
export PATH=/path/to/fluo-1.0.0-beta-1/bin:$PATH
Source your .bashrc
for the changes to take effect and test the script
source ~/.bashrc fluo
Running the script without any arguments prints a description of all commands.
./bin/fluo
You are now ready to configure a Fluo application. Use the command below to create the configuration necessary for a new application. Feel free to pick a different name (other than myapp
) for your application:
fluo new myapp
This command will create a directory for your application at apps/myapp
of your Fluo install which will contain a conf
and lib
.
The apps/myapp/conf
directory contains a copy of the fluo.properties
from your default configuration. This should be configured for your application:
vim apps/myapp/fluo.properties
When configuring the observer section in fluo.properties, you can configure your instance for the phrasecount application if you have not created your own application. See the phrasecount example for instructions. You can also choose not to configure any observers but your workers will be idle when started.
The apps/myapp/lib
directory should contain any observer jars for your application. If you configured fluo.properties for observers, copy any jars containing these observer classes to this directory.
After your application has been configured, use the command below to initialize it:
fluo init myapp
This only needs to be called once and stores configuration in Zookeeper.
A Fluo application consists of one oracle process and multiple worker processes. Before starting your application, you can configure the number of worker process in your fluo.properties file.
When you are ready to start your Fluo application on your YARN cluster, run the command below:
fluo start myapp
The start command above will work for a single-node or a large cluster. By using YARN, you do not need to deploy the Fluo binaries to every node on your cluster or start processes on every node.
You can use the following command to check the status of your instance:
fluo status myapp
For more detailed information on the YARN containers running Fluo:
fluo info myapp
You can also use yarn application -list
to check the status of your Fluo instance in YARN. Logs are viewable within YARN.
When you have data in your fluo instance, you can view it using the command fluo scan
. Pipe the output to less
using the command fluo scan | less
if you want to page through the data.
Use the following command to stop your Fluo application:
fluo stop myapp
If stop fails, there is also a kill command.
fluo kill myapp
Fluo will reread the same data frequently when it checks conditions on mutations. When Fluo initializes a table it enables data caching to make this more efficient. However you may need to increase the amount of memory available for caching in the tserver by increasing tserver.cache.data.size
. Increasing this may require increasing the maximum tserver java heap size in accumulo-env.sh
.
Fluo will run many client threads, will want to ensure the tablet server has enough threads. Should probably increase the tserver.server.threads.minimum
Accumulo setting.
Using at least Accumulo 1.6.1 is recommended because multiple performance bugs were fixed.
If you do not have YARN set up, you can start the oracle and worker as a local Fluo process using the following commands:
local-fluo start-oracle local-fluo start-worker
Use the following commands to stop a local Fluo process:
local-fluo stop-worker local-fluo stop-oracle
In a distributed environment, you will need to deploy and configure a Fluo distribution on every node in your cluster.