blob: 891ecee9d650f3efa43cf34d5448bdaaafaa5b3f [file] [log] [blame] [view]
Apache Apex Packages
==================================================
# Application Packages
An Apache Apex Application Package is a zip file that contains all the
necessary files to launch an application in Apache Apex. It is the
standard way for assembling and sharing an Apache Apex application.
## Requirements
You will need have the following installed:
1. Apache Maven 3.0 or later (for assembling the App Package)
2. Apache Apex 3.2.0 or later (for launching the App Package in your cluster)
## Creating Your First Apex App Package
You can create an Apex Application Package using your Linux command
line, or using your favorite IDE.
### Using Command Line
First, change to the directory where you put your projects, and create
an Apex application project using Maven by running the following
command. Replace "com.example", "mydtapp" and "1.0-SNAPSHOT" with the
appropriate values (make sure this is all on one line):
$ mvn archetype:generate \
-DarchetypeGroupId=org.apache.apex \
-DarchetypeArtifactId=apex-app-archetype -DarchetypeVersion=3.4.0 \
-DgroupId=com.example -Dpackage=com.example.mydtapp -DartifactId=mydtapp \
-Dversion=1.0-SNAPSHOT
This creates a Maven project named "mydtapp". Open it with your favorite
IDE (e.g. NetBeans, Eclipse, IntelliJ IDEA). In the project, there is a
sample DAG that generates a number of tuples with a random number and
prints out "hello world" and the random number in the tuples. The code
that builds the DAG is in
src/main/java/com/example/mydtapp/Application.java, and the code that
runs the unit test for the DAG is in
src/test/java/com/example/mydtapp/ApplicationTest.java. Try it out by
running the following command:
$cd mydtapp; mvn package
This builds the App Package runs the unit test of the DAG. You should
be getting test output similar to this:
```
-------------------------------------------------------
TESTS
-------------------------------------------------------
Running com.example.mydtapp.ApplicationTest
hello world: 0.8015370953286478
hello world: 0.9785359225545481
hello world: 0.6322611586644047
hello world: 0.8460953663451775
hello world: 0.5719372906929072
hello world: 0.6361174312337172
hello world: 0.14873007534816318
hello world: 0.8866986277418261
hello world: 0.6346526809866057
hello world: 0.48587295703904465
hello world: 0.6436832429676687
...
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.863
sec
Results :
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
```
The "mvn package" command creates the App Package file in target
directory as target/mydtapp-1.0-SNAPSHOT.apa. You will be able to use
that App Package file to launch this sample application in your actual
Apex installation.
Alternatively you can perform the same steps within your IDE (IDEA IntelliJ, Eclipse, NetBeans all support it). Please check the IDE documentation for details.
Group ID: org.apache.apex
Artifact ID: apex-app-archetype
Version: 3.4.0 (or any later version)
## Writing Your Own App Package
Please refer to the [Application Developer Guide][application_development.md] on the basics on how to write an Apache Apex application. In your AppPackage project, you can add custom operators (refer to [Operator Development Guide](operator_development.md), project dependencies, default and required configuration properties, pre-set configurations and other metadata. Note that you can also specify the DAG using Java, JSON or properties files.
### Adding (and removing) project dependencies
Under the project, you can add project dependencies in pom.xml, or do it
through your IDE. Heres the section that describes the dependencies in
the default pom.xml:
```
<dependencies>
<!-- add your dependencies here -->
<dependency>
<groupId>org.apache.apex</groupId>
<artifactId>malhar-library</artifactId>
<version>${apex.version}</version>
<!--
If you know your application do not need the transitive dependencies that are pulled in by malhar-library,
Uncomment the following to reduce the size of your app package.
-->
<!--
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
-->
</dependency>
<dependency>
<groupId>org.apache.apex</groupId>
<artifactId>apex-engine</artifactId>
<version>${apex.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.10</version>
<scope>test</scope>
</dependency>
</dependencies>
```
By default, as shown above, the default dependencies include
malhar-library in compile scope, dt-engine in provided scope, and junit
in test scope. Do not remove these three dependencies since they are
necessary for any Apex application. You can, however, exclude
transitive dependencies from malhar-library to reduce the size of your
App Package, provided that none of the operators in malhar-library that
need the transitive dependencies will be used in your application.
In the sample application, it is safe to remove the transitive
dependencies from malhar-library, by uncommenting the "exclusions"
section. It will reduce the size of the sample App Package from 8MB to
700KB.
Note that if we exclude \*, in some versions of Maven, you may get
warnings similar to the following:
```
[WARNING] 'dependencies.dependency.exclusions.exclusion.groupId' for
org.apache.apex:malhar-library:jar with value '*' does not match a
valid id pattern.
[WARNING]
[WARNING] It is highly recommended to fix these problems because they
threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support
building such malformed projects.
[WARNING]
```
This is a bug in early versions of Maven 3. The dependency exclusion is
still valid and it is safe to ignore these warnings.
### Application Configuration
A configuration file can be used to configure an application. Different
kinds of configuration parameters can be specified. They are application
attributes, operator attributes and properties, port attributes, stream
properties and application specific properties. They are all specified
as name value pairs, in XML format, like the following.
```
<?xml version="1.0"?>
<configuration>
<property>
<name>some_name_1</name>
<value>some_default_value</value>
</property>
<property>
<name>some_name_2</name>
<value>some_default_value</value>
</property>
</configuration>
```
### Application attributes
Application attributes are used to specify the platform behavior for the
application. They can be specified using the parameter
```dt.attr.<attribute>```. The prefix dt is a constant, attr is a
constant denoting an attribute is being specified and ```<attribute>```
specifies the name of the attribute. Below is an example snippet setting
the streaming windows size of the application to be 1000 milliseconds.
```
<property>
<name>dt.attr.STREAMING_WINDOW_SIZE_MILLIS</name>
<value>1000</value>
</property>
```
The name tag specifies the attribute and value tag specifies the
attribute value. The name of the attribute is a JAVA constant name
identifying the attribute. The constants are defined in
com.datatorrent.api.Context.DAGContext and the different attributes can
be specified in the format described above.
### Operator attributes
Operator attributes are used to specify the platform behavior for the
operator. They can be specified using the parameter
```dt.operator.<operator-name>.attr.<attribute>```. The prefix dt is a
constant, operator is a constant denoting that an operator is being
specified, ```<operator-name>``` denotes the name of the operator, attr is
the constant denoting that an attribute is being specified and
```<attribute>``` is the name of the attribute. The operator name is the
same name that is specified when the operator is added to the DAG using
the addOperator method. An example illustrating the specification is
shown below. It specifies the number of streaming windows for one
application window of an operator named input to be 10
```
<property>
<name>dt.operator.input.attr.APPLICATION_WINDOW_COUNT</name>
<value>10</value>
</property>
```
The name tag specifies the attribute and value tag specifies the
attribute value. The name of the attribute is a JAVA constant name
identifying the attribute. The constants are defined in
com.datatorrent.api.Context.OperatorContext and the different attributes
can be specified in the format described above.
### Operator properties
Operators can be configured using operator specific properties. The
properties can be specified using the parameter
```dt.operator.<operator-name>.prop.<property-name>```. The difference
between this and the operator attribute specification described above is
that the keyword prop is used to denote that it is a property and
```<property-name>``` specifies the property name. An example illustrating
this is specified below. It specifies the property hostname of the
redis server for a redis output operator.
```
<property>
<name>dt.operator.redis.prop.host</name>
<value>127.0.0.1</value>
</property>
```
The name tag specifies the property and the value specifies the property
value. The property name is converted to a setter method which is called
on the actual operator. The method name is composed by appending the
word set and the property name with the first character of the name
capitalized. In the above example the setter method would become
setHost. The method is called using JAVA reflection and the property
value is passed as an argument. In the above example the method setHost
will be called on the redis operator with 127.0.0.1 as the argument.
### Port attributes
Port attributes are used to specify the platform behavior for input and
output ports. They can be specified using the parameter ```dt.operator.<operator-name>.inputport.<port-name>.attr.<attribute>```
for input port and ```dt.operator.<operator-name>.outputport.<port-name>.attr.<attribute>```
for output port. The keyword inputport is used to denote an input port
and outputport to denote an output port. The rest of the specification
follows the conventions described in other specifications above. An
example illustrating this is specified below. It specifies the queue
capacity for an input port named input of an operator named range to
be 4k.
```
<property>
<name>dt.operator.range.inputport.input.attr.QUEUE_CAPACITY</name>
<value>4000</value>
</property>
```
The name tag specifies the attribute and value tag specifies the
attribute value. The name of the attribute is a JAVA constant name
identifying the attribute. The constants are defined in
com.datatorrent.api.Context.PortContext and the different attributes can
be specified in the format described above.
The attributes for an output port can also be specified in a similar way
as described above with a change that keyword outputport is used
instead of intputport”. A generic keyword port can be used to specify
either an input or an output port. It is useful in the wildcard
specification described below.
### Stream properties
Streams can be configured using stream properties. The properties can be
specified using the parameter
```dt.stream.<stream-name>.prop.<property-name>``` The constant stream
specifies that it is a stream, ```<stream-name>``` specifies the name of the
stream and ```<property-name>``` the name of the property. The name of the
stream is the same name that is passed when the stream is added to the
DAG using the addStream method. An example illustrating the
specification is shown below. It sets the locality of the stream named
stream1 to container local indicating that the operators the stream is
connecting be run in the same container.
```
<property>
<name>dt.stream.stream1.prop.locality</name>
<value>CONTAINER_LOCAL</value>
</property>
```
The property name is converted into a set method on the stream in the
same way as described in operator properties section above. In this case
the method would be setLocality and it will be called in the stream
stream1 with the value as the argument.
Along with the above system defined parameters, the applications can
define their own specific parameters they can be specified in the
configuration file. The only condition is that the names of these
parameters dont conflict with the system defined parameters or similar
application parameters defined by other applications. To this end, it is
recommended that the application parameters have the format
```<full-application-class-name>.<param-name>.``` The
full-application-class-name is the full JAVA class name of the
application including the package path and param-name is the name of the
parameter within the application. The application will still have to
still read the parameter in using the configuration API of the
configuration object that is passed in populateDAG.
### Wildcards
Wildcards and regular expressions can be used in place of names to
specify a group for applications, operators, ports or streams. For
example, to specify an attribute for all ports of an operator it can be
done as follows
```
<property>
<name>dt.operator.range.port.*.attr.QUEUE_CAPACITY</name>
<value>4000</value>
</property>
```
The wildcard \* was used instead of the name of the port. Wildcard can
also be used for operator name, stream name or application name. Regular
expressions can also be used for names to specify attributes or
properties for a specific set.
### Adding configuration properties
It is common for applications to require configuration parameters to
run. For example, the address and port of the database, the location of
a file for ingestion, etc. You can specify them in
src/main/resources/META-INF/properties.xml under the App Package
project. The properties.xml may look like:
```
<?xml version="1.0"?>
<configuration>
<property>
<name>some_name_1</name>
</property>
<property>
<name>some_name_2</name>
<value>some_default_value</value>
</property>
</configuration>
```
The name of an application-specific property takes the form of:
```dt.operator.{opName}.prop.{propName} ```
The first represents the property with name propName of operator opName.
Or you can set the application name at run time by setting this
property:
dt.attr.APPLICATION_NAME
In this example, property some_name_1 is a required property which
must be set at launch time, or it must be set by a pre-set configuration
(see next section). Property some\_name\_2 is a property that is
assigned with value some\_default\_value unless it is overridden at
launch time.
### Adding pre-set configurations
At build time, you can add pre-set configurations to the App Package by
adding configuration XML files under ```src/site/conf/<conf>.xml```in your
project. You can then specify which configuration to use at launch
time. The configuration XML is of the same format of the properties.xml
file.
### Application-specific properties file
You can also specify properties.xml per application in the application
package. Just create a file with the name properties-{appName}.xml and
it will be picked up when you launch the application with the specified
name within the application package. In short:
properties.xml: Properties that are global to the Configuration
Package
properties-{appName}.xml: Properties that are specific when launching
an application with the specified appName.
### Properties source precedence
If properties with the same key appear in multiple sources (e.g. from
app package default configuration as META-INF/properties.xml, from app
package configuration in the conf directory, from launch time defines,
etc), the precedence of sources, from highest to lowest, is as follows:
1. Launch time defines (using -D option in CLI)
2. Launch time specified configuration file in file system (using -conf
option in CLI)
3. Launch time specified package configuration (using -apconf option in
CLI)
4. Configuration from \$HOME/.dt/dt-site.xml
5. Application defaults within the package as
META-INF/properties-{appname}.xml
6. Package defaults as META-INF/properties.xml
7. dt-site.xml in local DT installation
8. dt-site.xml stored in HDFS
### Other meta-data
In a Apex App Package project, the pom.xml file contains a
section that looks like:
```
<properties>
<apex.version>3.4.0</apex.version>
<apex.apppackage.classpath\>lib*.jar</apex.apppackage.classpath>
</properties>
```
apex.version is the Apache Apex version that are to be used
with this Application Package.
apex.apppackage.classpath is the classpath that is used when
launching the application in the Application Package. The default is
lib/\*.jar, where lib is where all the dependency jars are kept within
the Application Package. One reason to change this field is when your
Application Package needs the classpath in a specific order.
### Logging configuration
Just like other Java projects, you can change the logging configuration
by having your log4j.properties under src/main/resources. For example,
if you have the following in src/main/resources/log4j.properties:
```
log4j.rootLogger=WARN,CONSOLE
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} [%t] %-5p
%c{2} %M - %m%n
```
The root loggers level is set to WARN and the output is set to the console (stdout).
Note that by default from project created from the maven archetype,
there is already a log4j.properties file under src/test/resources and
that file is only used for the unit test.
## Zip Structure of Application Package
Apache Apex Application Package files are zip files. You can examine the content of any Application Package by using unzip -t on your Linux command line.
There are four top level directories in an Application Package:
1. "app" contains the jar files of the DAG code and any custom operators, and any JSON or properties files that specify a DAG.
2. "lib" contains all dependency jars
3. "conf" contains all the pre-set configuration XML files.
4. "META-INF" contains the MANIFEST.MF file and the properties.xml file.
5. resources contains any other files
## Examining and Launching Application Packages Through CLI
If you are working with Application Packages in the local filesystem, you can use the Apex Command Line Interface (apex).
### Getting Application Package Meta Information
You can get the meta information about the Application Package using
this Apex CLI command.
```
apex> get-app-package-info <app-package-file>
```
### Getting Available Operators In Application Package
You can get the list of available operators in the Application Package
using this command.
```
apex> get-app-package-operators <app-package-file> <package-prefix>
[parent-class]
```
### Getting Properties of Operators in Application Package
You can get the list of properties of any operator in the Application
Package using this command.
apex> get-app-package-operator-properties <app-package-file> <operator-class>
### Launching an Application Package
You can launch an application within an Application Package.
```
apex> launch [-D property-name=property-value, ...] [-conf config-name]
[-apconf config-file-within-app-package] <app-package-file>
[matching-app-name]
```
Note that -conf expects a configuration file in the file system, while -apconf expects a configuration file within the app package.
# Configuration Packages
Sometimes just a configuration file is not enough for launching an application package. If a configuration requires
additional files to be packaged, you can use an Apex Configuration Package.
## Creating Configuration Packages
Creating Configuration Packages is similar to creating Application Packages. You can create a configuration
package project using Maven by running the following command. Replace "com.example", "mydtconfig" and "1.0-SNAPSHOT" with the appropriate values:
```
$ mvn archetype:generate -DarchetypeGroupId=org.apache.apex \
-DarchetypeArtifactId=apex-conf-archetype -DarchetypeVersion=3.4.0 \
-DgroupId=com.example -Dpackage=com.example.mydtconfig -DartifactId=mydtconfig \
-Dversion=1.0-SNAPSHOT
```
And create the configuration package file by running:
```
$ mvn package
```
The "mvn package" command creates the Config Package file in target
directory as target/mydtconfig.apc. You will be able to use that
Configuration Package file to launch an Apache Apex application.
## Assembling your own configuration package
Inside the project created by the archetype, these are the files that
you should know about when assembling your own configuration package:
./pom.xml
./src/main/resources/classpath
./src/main/resources/files
./src/main/resources/META-INF/properties.xml
./src/main/resources/META-INF/properties-{appname}.xml
### pom.xml
Example:
```xml
<groupId>com.example</groupId>
<version>1.0.0</version>
<artifactId>mydtconf</artifactId>
<packaging>jar</packaging>
<!-- change these to the appropriate values -->
<name>My Apex Application Configuration</name>
<description>My Custom Application Configuration Description</description>
<properties>
<apex.apppackage.name>myapexapp</apex.apppackage.name>
<apex.apppackage.minversion>1.0.0</apex.apppackage.minversion>
<apex.apppackage.maxversion>1.9999.9999</apex.apppackage.maxversion>
<apex.appconf.classpath>classpath/*</apex.appconf.classpath>
<apex.appconf.files>files/*</apex.appconf.files>
</properties>
```
In pom.xml, you can change the following keys to your desired values
* ```<groupId>```
* ```<version>```
* ```<artifactId>```
* ```<name> ```
* ```<description>```
You can also change the values of
* ```<apex.apppackage.name>```
* ```<apex.apppackage.minversion>```
* ```<apex.apppackage.maxversion>```
to reflect what Application Packages can be used with this configuration package. Apex will use this information to check whether a
configuration package is compatible with the Application Package when you issue a launch command.
### ./src/main/resources/classpath
Place any file in this directory that youd like to be copied to the
compute machines when launching an application and included in the
classpath of the application. Example of such files are Java properties
files and jar files.
### ./src/main/resources/files
Place any file in this directory that youd like to be copied to the
compute machines when launching an application but not included in the
classpath of the application.
### Properties XML file
A properties xml file consists of a set of key-value pairs. The set of
key-value pairs specifies the configuration options the application
should be launched with.
Example:
```xml
<configuration>
<property>
<name>some-property-name</name>
<value>some-property-value</value>
</property>
...
</configuration>
```
Names of properties XML file:
* **properties.xml:** Properties that are global to the Configuration
Package
* **properties-{appName}.xml:** Properties that are specific when launching
an application with the specified appName within the Application
Package.
After you are done with the above, remember to do mvn package to
generate a new configuration package, which will be located in the
target directory in your project.
### Zip structure of configuration package
Apex Application Configuration Package files are zip files. You
can examine the content of any Application Configuration Package by
using unzip -t on your Linux command line. The structure of the zip
file is as follow:
```
META-INF
MANIFEST.MF
properties.xml
properties-{appname}.xml
classpath
{classpath files}
files
{files}
```
### Launching with CLI
`-conf` option of the launch command in CLI supports specifying configuration package in the local filesystem. Example:
dt\> launch mydtapp-1.0.0.apa -conf mydtconfig.apc
This command expects both the application package and the configuration package to be in the local file system.