blob: 44cdcd89b9a9093c19d9912b091dd34f90104965 [file] [log] [blame]
Hyracks is a data-parallel platform that allows users to run jobs on a cluster of shared-nothing computers.
QUICKSTART
__________
Hyracks is made up of two parts that need to run on a cluster to accept and run users' jobs. The Hyracks Cluster Controller must be run on one machine designated as the
master node. This machine should be able to be accessed from the other nodes in the cluster that would do work as well as the client machines that would submit jobs to Hyracks.
The worker nodes (machines) must run a Hyracks Node Controller.
1. Starting the Hyracks Cluster Controller
The simplest way to start the cluster controller is to run bin/hyrackscc.
By default, the cluster controller listens on port 1099 for connections from Node Controllers. However, it can be made to listen on a different port by passing an optional
parameter as shown below:
bin/hyrackscc -port <port>
2. Starting the Hyracks Node Controller
The node controller is started by running bin/hyracksnc. It requires at least the following two command line arguments.
-cluster-address VAL : Cluster Controller host name
-data-listen-address VAL : IP Address to bind data listener
If the cluster controller was directed to listen on a port other than the default, you will need to pass one more argument to hyracksnc.
-cluster-port N : Cluster Controller port (default: 1099)
The data-listen-address is the interface on which the Node Controller must listen on -- in the event the machine is multi-homed it must listen on an IP that is reachable from
other Node Controllers. Make sure that the value passed to the data-listen-address is a valid IPv4 address (four octets separated by .).
3. Running a job on Hyracks
There are a few examples in the source distribution under src/test/integration that outline the construction and issue of a Job to the Hyracks Cluster Controller. The
basic steps that need to be performed on the client to execute jobs are:
Registry registry = LocateRegistry.getRegistry(ccHost, ccPort);
IClusterController cc = registry.lookup(IClusterController.class.getName()); // Get a handle to the Cluster Controller
JobSpecification spec = createJob(); // User code to create a Job
UUID jobId = cc.createJob(spec); // Install the Job on the Cluster Controller
cc.start(jobId); // Start the job
cc.waitForCompletion(jobId); // Jobs run asynchronously -- Wait for this one to complete
BUILDING FROM SOURCE
____________________
Prerequisites:
1. JDK 1.6
2. Maven2 (maven.apache.org)
Steps:
1. Checkout source from SVN
2. Download dcache-client-0.0.1.jar from the Downloads page at http://code.google.com/p/hyracks/downloads/list
3. Run the following command: (Replacing /path/to/file with the path to the dcache-client jar file).
mvn install:install-file -DgroupId=org.apache.dcache -DartifactId=dcache-client -Dversion=0.0.1 -Dpackaging=jar -Dfile=/path/to/file/dcache-client-0.0.1.jar
4. Download jol.jar from the Downloads page at http://code.google.com/p/hyracks/downloads/list
5. Run the following command: (Replacing /path/to/file with the path to the jol jar file).
mvn install:install-file -DgroupId=jol -DartifactId=jol -Dversion=0.0.1 -Dpackaging=jar -Dfile=/path/to/file/jol.jar
6. cd into the hyracks folder where the source was checked out
7. Run
mvn package
6. That's it!
IMPORTING INTO ECLIPSE
______________________
Prerequisites:
You will need to have the Eclipse Maven plugin (http://m2eclipse.sonatype.org/)
1. Open eclipse
2. Right click in the Package Explorer pane.
3. Click on Import...
4. Under "General" choose "Existing Projects into Workspace"
5. Pick the project root option and browse to the hyracks folder
6. Import all projects.
7. Click on Finish