Ignite In-Memory Accelerator For Apache Hadoop is designed to deliver uncompromised performance for existing Apache Hadoop 2.2 or above applications with zero code change as well as simplicity of installation and configuration across all the supported platforms.
Ignite distribution comes in a ZIP file that simply needs to be unzipped. The Accelerator requires Apache Hadoop of version 2.2 or above to be already installed on the system either using Apache Bigtop packages or manually (manual installation just means that Apache Hadoop binary distribution must be unpacked somewhere on the system). In case of manual installation HADOOP_HOME
environment variable must point to the installation directory of Apache Hadoop.
NOTE: You do not need any Apache Hadoop processes to be started, you only need to deploy the Apache Hadoop distribution on your system. Nevertheless you can run Apache Hadoop jobs with Ignite Accelerator over HDFS, in this case up and running HDFS infrastructure will be needed.
The Accelerator comes with command line setup tool bin/setup-hadoop.sh
(bin/setup-hadoop.bat
on Windows) which will guide you through all the needed setup steps (note that the setup tool will require write permissions to the Apache Hadoop installation directory).
Installation requirements:
JAVA_HOME
environment variable to your JDK or JRE installation.HADOOP_HOME
environment variable to the installation directory of Apache Hadoop.bin/setup-hadoop.{sh|bat}
setup script and follow instructions.NOTE: On Windows platform Apache Hadoop client requires
JAVA_HOME
path to not contain space characters. Java installed toC:\\Program Files\
will not work, install JRE to correct location and pointJAVA_HOME
there.
After setup script successfully completed, you can execute the Ignite startup script. The following command will startup Ignite node with default configuration using multicast node discovery.
bin/ignite.{sh|bat}
If Ignite was installed successfully, the output from above commands should produce no exceptions or errors. Note that you may see some other warnings during startup, but this is OK as they are meant to inform that certain functionality is turned on or off by default.
You can execute the above commands multiple times on the same machine and make sure that nodes discover each other. Here is an example of log printout when 2 nodes join topology:
... Topology snapshot [nodes=2, CPUs=8, hash=0xD551B245]
You can also start Ignite Management Console, called Visor, and observe started nodes. To startup Visor, you should execute the following script:
/bin/ignitevisorcmd.{sh|bat}
To configure Ignite nodes you can change configuration files at config
directory of Ignite installation. Those are conventional Spring files. Please refer to shipped configuration files and Ignite javadocs for more details.
Ignite has it's own distributed in-memory file system called IgniteFS. Hadoop jobs can use it instead of HDFS to achieve maximum performance and scalability. Setting up IGFS is much simpler than HDFS, it requires just few tweaks of Ignite node configuration and does not require starting any additional processes. Default configuration shipped with the Accelerator contains one configured instance named “ignitefs” which can be used as reference.
Generally URI for IgniteFS which will be used by Apache Hadoop looks like:
igfs://igfs_name@host_name
Where igfs_name
is IgniteFS instance name, host_name
is any host running Ignite node with that IgniteFS instance configured. For more details please refer to IgniteFS documentation.
To run Apache Hadoop jobs with Ignite cluster you need to configure core-site.xml
and mapred-site.xml
at $HADOOP_HOME/etc/hadoop
directory the same way as it is done in templates shipped with the Accelerator. The setup tool bin/setup-hadoop.{sh|bat}
will ask you to replace those files with Ignite templates or you can find these templates at config/hadoop/core-site.ignite.xml
and config/hadoop/mapred-site.ignite.xml
respectively and perform the needed configuration manually.
Apache Hadoop client will need to have Ignite jar files in classpath, the setup tool will care of that as well.
To run Apache Hadoop job with Ignite cluster you have to start one or multiple Ignite nodes and make sure they successfully discovered each other.
When all the configuration is complete and Ignite nodes are started, running Apache Hadoop job will be the same as with conventional Apache Hadoop distribution except that all Ignite nodes are equal and any of them can be treated as Job Tracker and DFS Name Node.
To run “Word Count” example you can load some text files to IGFS using standard Apache Hadoop tools:
cd $HADOOP_HOME/bin ./hadoop fs -mkdir /input ./hadoop fs -copyFromLocal $HADOOP_HOME/README.txt /input/WORD_COUNT_ME.txt
Run the job:
./hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/*-mapreduce-examples-*.jar wordcount /input /output
Check results:
./hadoop fs -ls /output ./hadoop fs -cat /output/part-r-00000
A job can be ran on multiple nodes on localhost or in cluster environment the same way. The only changes needed to switch Apache Hadoop client to a cluster are to fix host in default DFS URI in core-site.xml
and host in job tracker address in mapred-site.xml
.
Ignite comes with CLI (command) based DevOps Managements Console, called Visor, delivering advance set of management and monitoring capabilities.
To start Visor in console mode you should execute the following command:
`bin/ignitevisorcmd.sh`
On Windows, run the same commands with .bat
extension.