| <!--- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| Introduction |
| ------------ |
Hadoop requires a client that can be used to interact remotely with the services provided by a Hadoop cluster.
This will also be true when using the Apache Knox Gateway to provide perimeter security and centralized access for these services.
The two primary existing clients for Hadoop are the CLI (i.e. Command Line Interface, hadoop) and HUE (i.e. Hadoop User Environment).
For several reasons, however, neither of these clients can *currently* be used to access Hadoop services via the Apache Knox Gateway.
| |
This led to thinking about a very simple client that could help people use and evaluate the gateway.
The list below outlines the general requirements for such a client.
| |
| 1. Promote the evaluation and adoption of the Apache Knox Gateway |
2. Simple to deploy and use on data worker desktops to access remote Hadoop clusters
| 3. Simple to extend with new commands both by other Hadoop projects and by the end user |
4. Support the notion of an SSO session for multiple Hadoop interactions
| 5. Support the multiple authentication and federation token capabilities of the Apache Knox Gateway |
| 6. Promote the use of REST APIs as the dominant remote client mechanism for Hadoop services |
7. Promote the sense of Hadoop as a single unified product
| 8. Aligned with the Apache Knox Gateway's overall goals for security |
| |
| The result is a very simple DSL ([Domain Specific Language][1]) of sorts that is used via [Groovy][2] scripts. |
| Here is an example of a command that copies a file from the local file system to HDFS. |
| *Note: The variables session, localFile and remoteFile are assumed to be defined.* |
| |
| Hdfs.put( session ).file( localFile ).to( remoteFile ).now() |
| |
| ***This work is very early in development but is also very useful in its current state.*** |
| ***We are very interested in receiving feedback about how to improve this feature and the DSL in particular.*** |
| |
| A note of thanks to [REST-assured][3] which provides a [Fluent interface][4] style DSL for testing REST services. |
| It served as the initial inspiration for the creation of this DSL. |
| |
| |
| Assumptions |
| ----------- |
| This document assumes a few things about your environment in order to simplify the examples. |
| |
| 1. The JVM is executable as simply java. |
| 2. The Apache Knox Gateway is installed and functional. |
3. The example commands are executed with the GATEWAY_HOME directory as the current directory.
| The GATEWAY_HOME directory is the directory within the Apache Knox Gateway installation that contains the README file and the bin, conf and deployments directories. |
| 4. A few examples require the use of commands from a standard Groovy installation. These examples are optional but to try them you will need Groovy [installed][15]. |
| |
| |
| Usage |
| ----- |
| The DSL requires a shell to interpret the Groovy script. |
| The shell can either be used interactively or to execute a script file. |
| To simplify use, the distribution contains an embedded version of the Groovy shell. |
| |
| The shell can be run interactively. |
| |
| java -jar bin/shell.jar |
| |
| The shell can also be used to execute a script by passing a single filename argument. |
| |
| java -jar bin/shell.jar sample/SmokeTestJob.groovy |
| |
| When running interactively it may be helpful to reduce some of the output generated by the shell console. |
| Use the following command in the interactive shell to reduce that output. |
| This only needs to be done once as these preferences are persisted. |
| |
| set verbosity QUIET |
| set show-last-result false |
| |
Also, when running interactively, use the `exit` command to terminate the shell.
Using `^C` to exit can sometimes leave the parent shell in a problematic state.
| |
| |
| Examples |
| -------- |
Once the shell has been launched, the DSL can be used to interact with the gateway and Hadoop.
| Below is a very simple example of an interactive shell session to upload a file to HDFS. |
| |
| java -jar bin/shell.jar |
| knox:000> hadoop = Hadoop.login( "https://localhost:8443/gateway/sample", "hdfs", "hdfs-password" ) |
| knox:000> Hdfs.put( hadoop ).file( "README" ).to( "/tmp/example/README" ).now() |
| |
| The `knox:000>` in the example above is the prompt from the embedded Groovy console. |
If your output doesn't look like this you may need to set the verbosity and show-last-result preferences as described above in the Usage section.
| |
Without using some other tool to browse HDFS it is impossible to tell that this command did anything.
| Execute this to get a bit more feedback. |
| |
| knox:000> println "Status=" + Hdfs.put( hadoop ).file( "README" ).to( "/tmp/example/README2" ).now().statusCode |
| Status=201 |
| |
| Notice that a different filename is used for the destination. |
| Without this an error would have resulted. |
| Of course the DSL also provides a command to list the contents of a directory. |
| |
| knox:000> println Hdfs.ls( hadoop ).dir( "/tmp/example" ).now().string |
| {"FileStatuses":{"FileStatus":[{"accessTime":1363711366977,"blockSize":134217728,"group":"hdfs","length":19395,"modificationTime":1363711366977,"owner":"hdfs","pathSuffix":"README","permission":"644","replication":1,"type":"FILE"},{"accessTime":1363711375617,"blockSize":134217728,"group":"hdfs","length":19395,"modificationTime":1363711375617,"owner":"hdfs","pathSuffix":"README2","permission":"644","replication":1,"type":"FILE"}]}} |
| |
| It is a design decision of the DSL to not provide type safe classes for various request and response payloads. |
| Doing so would provide an undesirable coupling between the DSL and the service implementation. |
| It also would make adding new commands much more difficult. |
See the Groovy section below for a variety of capabilities and tools for working with JSON and XML that make this easy.
| The example below shows the use of JsonSlurper and GPath to extract content from a JSON response. |
| |
| knox:000> import groovy.json.JsonSlurper |
| knox:000> text = Hdfs.ls( hadoop ).dir( "/tmp/example" ).now().string |
| knox:000> json = (new JsonSlurper()).parseText( text ) |
| knox:000> println json.FileStatuses.FileStatus.pathSuffix |
| [README, README2] |
| |
| *In the future, "built-in" methods to slurp JSON and XML may be added to make this a bit easier.* |
*This would allow for this type of single line interaction.*
| `println Hdfs.ls(hadoop).dir("/tmp").now().json().FileStatuses.FileStatus.pathSuffix` |
| |
A shell session should always be ended by shutting down the session.
The examples above do not touch on it, but the DSL supports the simple execution of commands asynchronously.
The shutdown command attempts to ensure that all asynchronous commands have completed before exiting the shell.
| |
| knox:000> hadoop.shutdown() |
| knox:000> exit |
| |
All of the commands above could have been combined into a script file and executed with a single command.
| |
| java -jar bin/shell.jar samples/Example.groovy |
| |
This script file is available in the distribution, but for convenience its content is included here.
| |
| import org.apache.hadoop.gateway.shell.Hadoop |
| import org.apache.hadoop.gateway.shell.hdfs.Hdfs |
| import groovy.json.JsonSlurper |
| |
| gateway = "https://localhost:8443/gateway/sample" |
| username = "mapred" |
| password = "mapred-password" |
| dataFile = "README" |
| |
| hadoop = Hadoop.login( gateway, username, password ) |
| Hdfs.rm( hadoop ).file( "/tmp/example" ).recursive().now() |
| Hdfs.put( hadoop ).file( dataFile ).to( "/tmp/example/README" ).now() |
| text = Hdfs.ls( hadoop ).dir( "/tmp/example" ).now().string |
| json = (new JsonSlurper()).parseText( text ) |
| println json.FileStatuses.FileStatus.pathSuffix |
| hadoop.shutdown() |
| |
| Notice the Hdfs.rm command. This is included simply to ensure that the script can be rerun. |
| Without this an error would result the second time it is run. |
| |
| |
| Constructs |
| ---------- |
There are three primary constructs that need to be understood in order to use the DSL.
| |
| ${HHH} Hadoop |
| This construct encapsulates the client side session state that will be shared between all command invocations. |
| In particular it will simplify the management of any tokens that need to be presented with each command invocation. |
It also manages a thread pool that is used by all asynchronous commands, which is why it is important to call one of the shutdown methods.
| |
The syntax associated with this is expected to change; we anticipate that credentials will not need to be provided directly to the gateway.
Rather, it is expected that some form of access token will be used to initialize the session.
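
For illustration, here is a minimal sketch of that session life cycle, using the same placeholder gateway URL and credentials as the earlier examples.

    hadoop = Hadoop.login( "https://localhost:8443/gateway/sample", "hdfs", "hdfs-password" )
    // ... execute any number of synchronous or asynchronous commands against the session ...
    hadoop.shutdown()  // attempts to let outstanding asynchronous commands complete and releases the thread pool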
| |
| ${HHH} Services |
| Services are the primary extension point for adding new suites of commands. |
| The built in examples are: Hdfs, Job and Workflow. |
| The desire for extensibility is the reason for the slightly awkward Hdfs.ls(hadoop) syntax. |
| Certainly something more like hadoop.hdfs().ls() would have been preferred but this would prevent adding new commands easily. |
| At a minimum it would result in extension commands with a different syntax from the "built-in" commands. |
| |
| The service objects essentially function as a factory for a suite of commands. |
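
To make the factory role concrete, the fluent one-liners used throughout this document can also be written in an expanded form. This is only a sketch, assuming (as fluent interfaces typically do) that each request method returns the request object itself.

    request = Hdfs.ls( hadoop )      // the service creates a command request bound to the session
    request.dir( "/tmp/example" )    // request methods populate the command
    response = request.now()         // the invocation method executes it and returns a response
    println response.string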
| |
| ${HHH} Commands |
| Commands provide the behavior of the DSL. |
| They typically follow a Fluent interface style in order to allow for single line commands. |
| There are really three parts to each command: Request, Invocation, Response |
| |
| ${HHHH} Request |
The request is populated by all of the methods between the "verb" method and the "invoke" method.
For example, in `Hdfs.rm( hadoop ).file( dir ).now()` the request is populated between the "verb" method rm() and the "invoke" method now().
| |
| ${HHHH} Invocation |
| The invocation method controls how the request is invoked. |
Currently, synchronous and asynchronous invocation are supported.
| The now() method executes the request and returns the result immediately. |
| The later() method submits the request to be executed later and returns a future from which the result can be retrieved. |
In addition, the later() invocation method can optionally be provided a closure to execute when the request is complete.
| See the Futures and Closures sections below for additional detail and examples. |
| |
| ${HHHH} Response |
| The response contains the results of the invocation of the request. |
| In most cases the response is a thin wrapper over the HTTP response. |
| In fact many commands will share a single BasicResponse type that only provides a few simple methods. |
| |
| public int getStatusCode() |
| public long getContentLength() |
| public String getContentType() |
| public String getContentEncoding() |
| public InputStream getStream() |
| public String getString() |
| public byte[] getBytes() |
| public void close(); |
| |
| Thanks to Groovy these methods can be accessed as attributes. |
In some of the earlier examples the statusCode attribute was retrieved this way.
| |
    println Hdfs.put( hadoop ).file( localFile ).to( remoteFile ).now().statusCode
| |
| Groovy will invoke the getStatusCode method to retrieve the statusCode attribute. |
| |
The three methods getStream(), getBytes() and getString() deserve special attention.
Care must be taken that the HTTP body is read only once.
Therefore, one (and only one) of these methods should be called, and called only once.
Calling one of them more than once will cause an error.
Failing to call one of these methods will result in lingering open HTTP connections.
| The close() method may be used if the caller is not interested in reading the result body. |
| Most commands that do not expect a response body will call close implicitly. |
| If the body is retrieved via getBytes() or getString(), the close() method need not be called. |
| When using getStream(), care must be taken to consume the entire body otherwise lingering open HTTP connections will result. |
| The close() method may be called after reading the body partially to discard the remainder of the body. |
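
To illustrate these rules, below is a sketch that streams a remote file to the local file system and then closes the response explicitly; the file names are examples only.

    response = Hdfs.get( hadoop ).from( "/tmp/example/README" ).now()
    try {
      new File( "README.local" ).withOutputStream { out ->
        out << response.stream  // consume the entire body
      }
    } finally {
      response.close()  // discard any remaining body content and release the connection
    }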
| |
| |
| Services |
| -------- |
| There are three basic DSL services and commands bundled with the shell. |
| |
| ${HHH} HDFS |
| Provides basic HDFS commands. |
| ***Using these DSL commands requires that WebHDFS be running in the Hadoop cluster.*** |
| |
| ${HHH} Jobs (Templeton/WebHCat) |
| Provides basic job submission and status commands. |
| ***Using these DSL commands requires that Templeton/WebHCat be running in the Hadoop cluster.*** |
| |
| ${HHH} Workflow (Oozie) |
| Provides basic workflow submission and status commands. |
| ***Using these DSL commands requires that Oozie be running in the Hadoop cluster.*** |
| |
| |
| HDFS Commands (WebHDFS) |
| ----------------------- |
${HHH} ls() - List the contents of an HDFS directory.
| * Request |
| * dir (String) - The HDFS directory to list. |
| * Response |
| * BasicResponse |
| * Example |
    * `Hdfs.ls(hadoop).dir("/").now()`
| |
${HHH} rm() - Remove an HDFS file or directory.
| * Request |
| * file (String) - The HDFS file or directory to remove. |
| * recursive (Boolean) - If the file is a directory also remove any contained files and directories. Optional: default=false |
| * Response |
| * EmptyResponse - Implicit close(). |
| * Example |
| * `Hdfs.rm(hadoop).file("/tmp/example").recursive().now()` |
| |
| ${HHH} put() - Copy a file from the local file system to HDFS. |
| * Request |
| * text (String) - The text to copy to the remote file. |
| * file (String) - The name of a local file to copy to the remote file. |
    * to (String) - The name of the remote file to create.
| * Response |
| * EmptyResponse - Implicit close(). |
| * Example |
| * `Hdfs.put(hadoop).file("localFile").to("/tmp/example/remoteFile").now()` |
| |
| ${HHH} get() - Copy a file from HDFS to the local file system. |
| * Request |
| * file (String) - The name of the local file to create from the remote file. If this isn't specified the file content must be read from the response. |
| * from (String) - The name of the remote file to copy. |
| * Response |
| * BasicResponse |
| * Example |
| * `Hdfs.get(hadoop).file("localFile").from("/tmp/example/remoteFile").now()` |
| |
| ${HHH} mkdir() - Create a directory in HDFS. |
| * Request |
| * dir (String) - The name of the remote directory to create. |
| * perm (String) - The permissions to create the remote directory with. Optional: default="777" |
| * Response |
| * EmptyResponse - Implicit close(). |
| * Example |
| * `Hdfs.mkdir(hadoop).dir("/tmp/example").perm("777").now()` |
| |
| |
| Job Commands (WebHCat/Templeton) |
| -------------------------------- |
| ${HHH} submitJava() - Submit a Java MapReduce job. |
| * Request |
| * jar (String) - The remote file name of the JAR containing the app to execute. |
    * app (String) - The app name to execute. For example this is "wordcount", not the class name.
| * input (String) - The remote directory name to use as input for the job. |
| * output (String) - The remote directory name to store output from the job. |
| * Response |
| * jobId : String - The job ID of the submitted job. Consumes body. |
| * Example |
| * `Job.submitJava(hadoop).jar(remoteJarName).app(appName).input(remoteInputDir).output(remoteOutputDir).now().jobId` |
| |
| ${HHH} submitPig() - Submit a Pig job. |
| * Request |
| * file (String) - The remote file name of the pig script. |
| * arg (String) - An argument to pass to the script. |
| * statusDir (String) - The remote directory to store status output. |
| * Response |
| * jobId : String - The job ID of the submitted job. Consumes body. |
| * Example |
| * `Job.submitPig(hadoop).file(remotePigFileName).arg("-v").statusDir(remoteStatusDir).now()` |
| |
| ${HHH} submitHive() - Submit a Hive job. |
| * Request |
| * file (String) - The remote file name of the hive script. |
| * arg (String) - An argument to pass to the script. |
| * statusDir (String) - The remote directory to store status output. |
| * Response |
| * jobId : String - The job ID of the submitted job. Consumes body. |
| * Example |
| * `Job.submitHive(hadoop).file(remoteHiveFileName).arg("-v").statusDir(remoteStatusDir).now()` |
| |
| ${HHH} queryQueue() - Return a list of all job IDs registered to the user. |
| * Request |
| * No request parameters. |
| * Response |
| * BasicResponse |
| * Example |
| * `Job.queryQueue(hadoop).now().string` |
| |
| ${HHH} queryStatus() - Check the status of a job and get related job information given its job ID. |
| * Request |
| * jobId (String) - The job ID to check. This is the ID received when the job was created. |
| * Response |
| * BasicResponse |
| * Example |
| * `Job.queryStatus(hadoop).jobId(jobId).now().string` |
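
Putting these commands together, the sketch below submits a Java MapReduce job and polls its status until completion. The JAR path, directory names and the status.jobComplete field are assumptions about a typical WebHCat setup; inspect the queryStatus() response for your cluster before relying on them.

    import groovy.json.JsonSlurper

    jarFile = "/tmp/example/hadoop-examples.jar"  // assumed to already exist in HDFS
    jobId = Job.submitJava(hadoop).jar(jarFile).app("wordcount").input("/tmp/example/input").output("/tmp/example/output").now().jobId
    println "Submitted job " + jobId

    // Poll WebHCat until the job reports completion.
    done = false
    while( !done ) {
      sleep( 1000 )
      json = (new JsonSlurper()).parseText( Job.queryStatus(hadoop).jobId(jobId).now().string )
      done = ( json.status != null && json.status.jobComplete )  // field name assumed; verify against your WebHCat version
    }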
| |
| |
| Workflow Commands (Oozie) |
| ------------------------- |
| ${HHH} submit() - Submit a workflow job. |
| * Request |
| * text (String) - XML formatted workflow configuration string. |
| * file (String) - A filename containing XML formatted workflow configuration. |
| * action (String) - The initial action to take on the job. Optional: Default is "start". |
| * Response |
| * BasicResponse |
| * Example |
| * `Workflow.submit(hadoop).file(localFile).action("start").now()` |
| |
| ${HHH} status() - Query the status of a workflow job. |
| * Request |
| * jobId (String) - The job ID to check. This is the ID received when the job was created. |
| * Response |
| * BasicResponse |
| * Example |
| * `Workflow.status(hadoop).jobId(jobId).now().string` |
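
As a sketch of how these fit together, the configuration for a workflow job can be read from a local file and submitted with text() rather than file(); the local file name here is only an example, and the configuration must reference a workflow application already deployed in HDFS.

    configText = new File( "workflow-configuration.xml" ).text  // example local file containing the XML job configuration
    println Workflow.submit( hadoop ).text( configText ).now().statusCode
    // The job ID for a later status() call would be extracted from the submit response body.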
| |
| |
| Futures |
| ------- |
| The DSL supports the ability to invoke commands asynchronously via the later() invocation method. |
The object returned from the later() method is a java.util.concurrent.Future parameterized with the response type of the command.
| This is an example of how to asynchronously put a file to HDFS. |
| |
| future = Hdfs.put(hadoop).file("README").to("tmp/example/README").later() |
| println future.get().statusCode |
| |
| The future.get() method will block until the asynchronous command is complete. |
To illustrate the usefulness of this, however, multiple concurrent commands are required.
| |
| readmeFuture = Hdfs.put(hadoop).file("README").to("tmp/example/README").later() |
| licenseFuture = Hdfs.put(hadoop).file("LICENSE").to("tmp/example/LICENSE").later() |
| hadoop.waitFor( readmeFuture, licenseFuture ) |
| println readmeFuture.get().statusCode |
| println licenseFuture.get().statusCode |
| |
| The hadoop.waitFor() method will wait for one or more asynchronous commands to complete. |
| |
| |
| Closures |
| -------- |
| Futures alone only provide asynchronous invocation of the command. |
What if some processing should also occur asynchronously once the command is complete?
| Support for this is provided by closures. |
| Closures are blocks of code that are passed into the later() invocation method. |
| In Groovy these are contained within {} immediately after a method. |
| These blocks of code are executed once the asynchronous command is complete. |
| |
| Hdfs.put(hadoop).file("README").to("tmp/example/README").later(){ println it.statusCode } |
| |
| In this example the put() command is executed on a separate thread and once complete the { println it.statusCode } block is executed on that thread. |
| The it variable is automatically populated by Groovy and is a reference to the result that is returned from the future or now() method. |
| The future example above can be rewritten to illustrate the use of closures. |
| |
| readmeFuture = Hdfs.put(hadoop).file("README").to("tmp/example/README").later() { println it.statusCode } |
| licenseFuture = Hdfs.put(hadoop).file("LICENSE").to("tmp/example/LICENSE").later() { println it.statusCode } |
| hadoop.waitFor( readmeFuture, licenseFuture ) |
| |
| Again, the hadoop.waitFor() method will wait for one or more asynchronous commands to complete. |
| |
| Extension |
| --------- |
| Extensibility is a key design goal of the KnoxShell and DSL. |
| There are two ways to provide extended functionality for use with the shell. |
| The first is to simply create Groovy scripts that use the DSL to perform a useful task. |
| The second is to add new services and commands. |
In order to add new services and commands, new classes must be written in either Groovy or Java and added to the classpath of the shell.
| Fortunately there is a very simple way to add classes and JARs to the shell classpath. |
| The first time the shell is executed it will create a configuration file in the same directory as the JAR with the same base name and a `.cfg` extension. |
| |
| bin/shell-${gateway-version}.jar |
| bin/shell-${gateway-version}.cfg |
| |
| That file contains both the main class for the shell as well as a definition of the classpath. |
| Currently that file will by default contain the following. |
| |
| main.class=org.apache.hadoop.gateway.shell.Shell |
| class.path=../lib; ../lib/*.jar; ../ext; ../ext/*.jar |
| |
Therefore, to extend the shell, copy any new service and command classes to the `ext` directory, or if they are packaged within a JAR, copy the JAR to the `ext` directory.
| The `lib` directory is reserved for JARs that may be delivered with the product. |
| |
| Below are samples for the service and command classes that would need to be written to add new commands to the shell. |
| These happen to be Groovy source files but could with very minor changes be Java files. |
The easiest way to add these to the shell is to compile them directly into the `ext` directory.
| *Note: This command depends upon having the Groovy compiler installed and available on the execution path.* |
| |
| groovyc -d ext -cp bin/shell-${gateway-version}.jar samples/SampleService.groovy samples/SampleSimpleCommand.groovy samples/SampleComplexCommand.groovy |
| |
These source files are available in the samples directory of the distribution, but they are included here for convenience.
| |
| ${HHH} Sample Service (Groovy) |
| import org.apache.hadoop.gateway.shell.Hadoop |
| |
| class SampleService { |
| |
| static String PATH = "/namenode/api/v1" |
| |
| static SimpleCommand simple( Hadoop hadoop ) { |
| return new SimpleCommand( hadoop ) |
| } |
| |
| static ComplexCommand.Request complex( Hadoop hadoop ) { |
| return new ComplexCommand.Request( hadoop ) |
| } |
| |
| } |
| |
| ${HHH} Sample Simple Command (Groovy) |
| import org.apache.hadoop.gateway.shell.AbstractRequest |
| import org.apache.hadoop.gateway.shell.BasicResponse |
| import org.apache.hadoop.gateway.shell.Hadoop |
| import org.apache.http.client.methods.HttpGet |
| import org.apache.http.client.utils.URIBuilder |
| |
| import java.util.concurrent.Callable |
| |
| class SimpleCommand extends AbstractRequest<BasicResponse> { |
| |
| SimpleCommand( Hadoop hadoop ) { |
| super( hadoop ) |
| } |
| |
| private String param |
| SimpleCommand param( String param ) { |
| this.param = param |
| return this |
| } |
| |
| @Override |
| protected Callable<BasicResponse> callable() { |
| return new Callable<BasicResponse>() { |
| @Override |
| BasicResponse call() { |
| URIBuilder uri = uri( SampleService.PATH, param ) |
| addQueryParam( uri, "op", "LISTSTATUS" ) |
| HttpGet get = new HttpGet( uri.build() ) |
| return new BasicResponse( execute( get ) ) |
| } |
| } |
| } |
| |
| } |
| |
| ${HHH} Sample Complex Command (Groovy) |
| import com.jayway.jsonpath.JsonPath |
| import org.apache.hadoop.gateway.shell.AbstractRequest |
| import org.apache.hadoop.gateway.shell.BasicResponse |
| import org.apache.hadoop.gateway.shell.Hadoop |
| import org.apache.http.HttpResponse |
| import org.apache.http.client.methods.HttpGet |
| import org.apache.http.client.utils.URIBuilder |
| |
| import java.util.concurrent.Callable |
| |
| class ComplexCommand { |
| |
| static class Request extends AbstractRequest<Response> { |
| |
| Request( Hadoop hadoop ) { |
| super( hadoop ) |
| } |
| |
| private String param; |
| Request param( String param ) { |
| this.param = param; |
| return this; |
| } |
| |
| @Override |
| protected Callable<Response> callable() { |
| return new Callable<Response>() { |
| @Override |
| Response call() { |
| URIBuilder uri = uri( SampleService.PATH, param ) |
| addQueryParam( uri, "op", "LISTSTATUS" ) |
| HttpGet get = new HttpGet( uri.build() ) |
| return new Response( execute( get ) ) |
| } |
| } |
| } |
| |
| } |
| |
| static class Response extends BasicResponse { |
| |
| Response(HttpResponse response) { |
| super(response) |
| } |
| |
| public List<String> getNames() { |
| return JsonPath.read( string, "\$.FileStatuses.FileStatus[*].pathSuffix" ) |
| } |
| |
| } |
| |
| } |
| |
| |
| Groovy |
| ------ |
| The shell included in the distribution is basically an unmodified packaging of the Groovy shell. |
Therefore these commands are functionally equivalent if you have Groovy [installed][15].
| |
| java -jar bin/shell.jar sample/SmokeTestJob.groovy |
| groovy -cp bin/shell.jar sample/SmokeTestJob.groovy |
| |
| The interactive shell isn't exactly equivalent. |
However, the only difference is that the shell-${gateway-version}.jar automatically executes some additional imports that are useful for the KnoxShell DSL.
So these two sets of commands should be functionally equivalent.
***However, there is currently a class loading issue that prevents the groovysh command from working properly.***
| |
| java -jar bin/shell.jar |
| |
| groovysh -cp bin/shell-${gateway-version}.jar |
| import org.apache.hadoop.gateway.shell.Hadoop |
| import org.apache.hadoop.gateway.shell.hdfs.Hdfs |
| import org.apache.hadoop.gateway.shell.job.Job |
| import org.apache.hadoop.gateway.shell.workflow.Workflow |
| import java.util.concurrent.TimeUnit |
| |
| Alternatively, you can use the Groovy Console which does not appear to have the same class loading issue. |
| |
| groovyConsole -cp bin/shell.jar |
| |
| import org.apache.hadoop.gateway.shell.Hadoop |
| import org.apache.hadoop.gateway.shell.hdfs.Hdfs |
| import org.apache.hadoop.gateway.shell.job.Job |
| import org.apache.hadoop.gateway.shell.workflow.Workflow |
| import java.util.concurrent.TimeUnit |
| |
In addition, because the DSL can be used via standard Groovy, the Groovy integrations in many popular IDEs (e.g. IntelliJ, Eclipse) can also be used.
| This makes it particularly nice to develop and execute scripts to interact with Hadoop. |
| The code-completion feature in particular provides immense value. |
All that is required is to add the shell-${gateway-version}.jar to the project's class path.
| |
There are a variety of Groovy tools that make it very easy to work with the standard interchange formats (i.e. JSON and XML).
In Groovy, the creation of XML or JSON is typically done via a "builder" and parsing is done via a "slurper".
In addition, once JSON or XML is "slurped", GPath, an XPath-like feature built into Groovy, can be used to access data; a small sketch follows the reference list below.
| * XML |
| * Markup Builder [Overview][5], [API][6] |
| * XML Slurper [Overview][7], [API][8] |
| * XPath [Overview][9], [API][10] |
| * JSON |
| * JSON Builder [API][11] |
| * JSON Slurper [API][12] |
| * JSON Path [API][14] |
| * GPath [Overview][13] |
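
For example, here is a small sketch of slurping an XML document and navigating it with GPath, analogous to the JSON examples earlier in this document; the XML content is made up for illustration.

    import groovy.util.XmlSlurper

    text = "<FileStatuses><FileStatus><pathSuffix>README</pathSuffix></FileStatus></FileStatuses>"
    xml = new XmlSlurper().parseText( text )
    println xml.FileStatus.pathSuffix  // GPath expression; prints README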
| |
| |
| Disclaimer |
| ---------- |
| The Apache Knox Gateway is an effort undergoing incubation at the |
| Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC. |
| |
| Incubation is required of all newly accepted projects until a further review |
| indicates that the infrastructure, communications, and decision making process |
| have stabilized in a manner consistent with other successful ASF projects. |
| |
| While incubation status is not necessarily a reflection of the completeness |
| or stability of the code, it does indicate that the project has yet to be |
| fully endorsed by the ASF. |
| |
| [1]: http://en.wikipedia.org/wiki/Domain-specific_language |
| [2]: http://groovy.codehaus.org/ |
| [3]: https://code.google.com/p/rest-assured/ |
| [4]: http://en.wikipedia.org/wiki/Fluent_interface |
| [5]: http://groovy.codehaus.org/Creating+XML+using+Groovy's+MarkupBuilder |
| [6]: http://groovy.codehaus.org/api/groovy/xml/MarkupBuilder.html |
| [7]: http://groovy.codehaus.org/Reading+XML+using+Groovy's+XmlSlurper |
| [8]: http://groovy.codehaus.org/api/groovy/util/XmlSlurper.html |
| [9]: http://groovy.codehaus.org/GPath |
| [10]: http://docs.oracle.com/javase/1.5.0/docs/api/javax/xml/xpath/XPath.html |
| [11]: http://groovy.codehaus.org/gapi/groovy/json/JsonBuilder.html |
| [12]: http://groovy.codehaus.org/gapi/groovy/json/JsonSlurper.html |
| [13]: http://groovy.codehaus.org/GPath |
| [14]: https://code.google.com/p/json-path/ |
| [15]: http://groovy.codehaus.org/Installing+Groovy |