blob: 6ce72d324f0f542987768862f9b2ff044680cae9 [file] [log] [blame]
////
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
////
[[Variables]]
:imagesdir: ../assets/images
:openvar: ${
:closevar: }
:description: Well-designed Hop solutions never use hard coded values. Hop offers variables on the global, project and environment levels, and allows workflows and parameters to provide parameters, to set and read variables in runtime etc.
= Variables
== What is a Hop variable?
You don't want to hardcode your solutions.
It's simply bad form to hardcode host names, user names, passwords, directories and so on.
Variables allow your solutions to adapt to a changing environment.
If for example the database server is different when developing than it is when running in production, you set it as a variable.
== How do I use a variable?
In the Hop user interface all places where you can enter a variable have a '$' symbol to the right of the input field:
image::variable-indicator.png[The variable indicator]
You can specify a variable by referencing it like this:
[source]
${VARIABLE_NAME}
*TIP* _You can see a list of defined variables by using CTRL-SPACE (CMD-SPACE on OSX) in the input field.
This helper will insert a selected variable into the input field._
Please note that you don't *have* to specify a variable in these places and that you can combine the variable with other information like:
[source]
${ENVIRONMENT_HOME}/input/source-file.txt
== Hexadecimal values
In rare cases you might have a need to enter non-character values as separators in 'binary' text files with for example a zero byte as a separator.
In those scenarios you can use a special 'variable' format:
[source]
$[<hexadecimal value>]
A few examples:
|===
|Value |Meaning
|`$[00]`
|A single byte with decimal value 0
|`$[FFFF]`
|Two bytes with decimal value 65535
|`$[CEA3]`
|The 2 bytes representing UTF-8 character Σ
|`$[e28ca8]`
|The 3 bytes representing UTF-8 character ⌨
|`$[f09d849e]`
|The 4 bytes representing UTF-8 character 𝄞
|===
== How can I define variables?
Variables can be defined and set in all the places where it makes sense:
* in hop-config.json when it applies to the installation
* in an environment configuration file when it concerns a specific lifecycle environment
* In a xref:projects/projects-environments.adoc[project]
* In a xref:pipeline/pipeline-run-configurations/pipeline-run-configurations.adoc[pipeline run configuration]
* As a default parameter value in a pipeline or workflow
* Using the xref:pipeline/transforms/setvariable.adoc[Set Variables] transform in a pipeline
* Using the xref:workflow/actions/setvariables.adoc[Set Variables] action in a workflow
* When executing with xref:hop-run/index.adoc[Hop run]
== Locality
Variables are local to the place where they are defined.
This means that setting them in a certain place means that it's going to be inherited from that place on.
This also means that's important to know where variables can be set and used and the hierarchy behind it.
== Hierarchy
.Variables Hierarchy
[width="90%",cols="2*",options="header"]
|===
|Place|Inheritance
|System properties|Inherited by the JVM and all other places in Hop
|Environment|Inherited by run configurations
|Run Configurations|Pipeline run configurations are inherited by the environment
|Pipeline|Inherited by the pipeline run configuration
|Workflow|Inherited by the workflow run configuration
|Metadata objects|Inherited by the place where it is loaded
|===
=== System properties
All system properties defined in the Hop configuration file 'hop-config.json' are available as variables as well as all Java system properties.
You can define new system properties in the Hop GUI using the system properties editor:
image::system-properties-menu.png[The system properties menu in Hop GUI]
You can also edit the 'hop-config.json' file manually:
[source,json]
{
"systemProperties" : {
"MY_SYSTEM_PROPERTY" : "SomeValue"
}
}
You can also use the hop-config command line tool to define system properties:
[source,bash]
sh hop-config.sh -s MY_SYSTEM_PROPERTY=SomeValue
System properties get set in Java Virtual Machine that Hop is running.
This means that you should limit yourself to only those variables which are really system specific.
=== Environment Variables
As you can read in the documentation regarding environments you can set variables there as well.
This helps you configure folders and other things which are environment specific.
You can set those in the Environment settings dialog or using the command line:
[source,bash]
sh hop-config.sh -e MyEnvironment -em -ev VARIABLE1=value1
=== Run Configurations
You can specify variables here to make a pipeline or workflow run in an engine agnostic way.
As an example you can have the same pipeline run on Hadoop with Spark and specify an input directory using hdfs:// and on Google DataFlow using gs://
=== Workflow
You can define variables in a workflow either with the "Set Variables", "JavaScript" actions or by defining parameters.
Parameters are variables which have a description and can have a default value.
=== Pipelines
You can define variables in a pipeline either with the "Set Variables", "JavaScript" transforms or by defining parameters.
Parameters are variables which have a description and can have a default value.
*IMPORTANT* Since in pipelines all transforms run in parallel you should never try to set and use the same variable in the same pipeline.
== Available global variables
The following variables are available in Hop through `Tools -> Edit config variables`
[options="header",width="90%"]
|===
|Variable name|Default Value|Description
|HOP_AGGREGATION_ALL_NULLS_ARE_ZERO|N|Set this variable to Y to return 0 when all values within an aggregate are NULL.
Otherwise by default a NULL is returned when all values are NULL.
|HOP_AGGREGATION_MIN_NULL_IS_VALUED|N|Set this variable to Y to set the minimum to NULL if NULL is within an aggregate.
Otherwise by default NULL is ignored by the MIN aggregate and MIN is set to the minimum value that is not NULL.
See also the variable HOP_AGGREGATION_ALL_NULLS_ARE_ZERO.
|HOP_ALLOW_EMPTY_FIELD_NAMES_AND_TYPES|false|Set this variable to TRUE to allow your pipeline to pass 'null' fields and/or empty types.
|HOP_COMPATIBILITY_DB_IGNORE_TIMEZONE|N|System wide flag to ignore timezone while writing date/timestamp value to the database.
|HOP_COMPATIBILITY_MERGE_ROWS_USE_REFERENCE_STREAM_WHEN_IDENTICAL|N|Set this variable to Y for backward compatibility for the Merge Rows (diff) transform.
Setting this to Y will use the data from the reference stream (instead of the comparison stream) in case the compared rows are identical.
|HOP_COMPATIBILITY_TEXT_FILE_OUTPUT_APPEND_NO_HEADER|N|Set this variable to Y for backward compatibility for the Text File Output transform.
Setting this to Ywill add no header row at all when the append option is enabled, regardless if the file is existing or not.
|HOP_CORE_TRANSFORMS_FILE||The name of the project variable that will contain the alternative location of the hop-transforms.xml file.
You can use this to customize the list of available internal transforms outside of the codebase.
|HOP_CORE_WORKFLOW_ACTIONS_FILE ||The name of the project variable that will contain the alternative location of the hop-workflow-actions.xml file.
|HOP_DEFAULT_BIGNUMBER_FORMAT||The name of the variable containing an alternative default bignumber format
|HOP_DEFAULT_DATE_FORMAT||The name of the variable containing an alternative default date format
|HOP_DEFAULT_INTEGER_FORMAT||The name of the variable containing an alternative default integer format
|HOP_DEFAULT_NUMBER_FORMAT||The name of the variable containing an alternative default number format
|HOP_DEFAULT_SERVLET_ENCODING||Defines the default encoding for servlets, leave it empty to use Java default encoding
|HOP_DEFAULT_TIMESTAMP_FORMAT||The name of the variable containing an alternative default timestamp format
|HOP_DISABLE_CONSOLE_LOGGING|N|Set this variable to Y to disable standard Hop logging to the console. (stdout)
|HOP_EMPTY_STRING_DIFFERS_FROM_NULL|N|NULL vs Empty String.
If this setting is set to Y, an empty string and null are different.
Otherwise they are not.
|HOP_FAIL_ON_LOGGING_ERROR|N|Set this variable to Y when you want the workflow/pipeline fail with an error when the related logging process (e.g. to a database) fails.
|HOP_FILE_OUTPUT_MAX_STREAM_COUNT|1024|This project variable is used by the Text File Output transform.
It defines the max number of simultaneously open files within the transform.
The transform will close/reopen files as necessary to insure the max is not exceeded
|HOP_FILE_OUTPUT_MAX_STREAM_LIFE|0|This project variable is used by the Text File Output transform.
It defines the max number of milliseconds between flushes of files opened by the transform.
|HOP_GLOBAL_LOG_VARIABLES_CLEAR_ON_EXPORT|false|Set this variable to false to preserve global log variables defined in pipeline / workflow Properties -> Log panel.
Changing it to true will clear it when export pipeline / workflow.
|HOP_LENIENT_STRING_TO_NUMBER_CONVERSION|N|System wide flag to allow lenient string to number conversion for backward compatibility.
If this setting is set to "Y", an string starting with digits will be converted successfully into a number. (example: 192.168.1.1 will be converted into 192 or 192.168 or 192168 depending on the decimal and grouping symbol).
The default (N) will be to throw an error if non-numeric symbols are found in the string.
|HOP_LOG_SIZE_LIMIT|0|The log size limit for all pipelines and workflows that don't have the "log size limit" property set in their respective properties.
|HOP_LOG_TAB_REFRESH_DELAY|1000|The hop log tab refresh delay.
|HOP_LOG_TAB_REFRESH_PERIOD|1000|The hop log tab refresh period.
|HOP_MAX_ACTIONS_LOGGED|5000|The maximum number of action results kept in memory for logging purposes.
|HOP_MAX_LOGGING_REGISTRY_SIZE|10000|The maximum number of logging registry entries kept in memory for logging purposes.
|HOP_MAX_LOG_SIZE_IN_LINES|0|The maximum number of log lines that are kept internally by Hop.
Set to 0 to keep all rows (default)
|HOP_MAX_LOG_TIMEOUT_IN_MINUTES|1440|The maximum age (in minutes) of a log line while being kept internally by Hop.
Set to 0 to keep all rows indefinitely (default)
|HOP_MAX_WORKFLOW_TRACKER_SIZE|5000|The maximum number of workflow trackers kept in memory
|HOP_PASSWORD_ENCODER_PLUGIN|Hop|Specifies the password encoder plugin to use by ID (Hop is the default).
|HOP_PIPELINE_PAN_JVM_EXIT_CODE||Set this variable to an integer that will be returned as the Pan JVM exit code.
|HOP_PLUGIN_CLASSES||A comma delimited list of classes to scan for plugin annotations
|HOP_PLUGIN_PACKAGES||A comma delimited list of packages to scan for plugin annotations (warning: slow!!)
|HOP_REDIRECT_STDERR|N|Set this variable to Y to redirect stderr to Hop logging.
|HOP_REDIRECT_STDOUT|N|Set this variable to Y to redirect stdout to Hop logging.
|HOP_ROWSET_GET_TIMEOUT|50|The name of the variable that optionally contains an alternative rowset get timeout (in ms).
This only makes a difference for extremely short lived pipelines.
|HOP_ROWSET_PUT_TIMEOUT|50|The name of the variable that optionally contains an alternative rowset put timeout (in ms).
This only makes a difference for extremely short lived pipelines.
|HOP_SERVER_JETTY_ACCEPTORS||A variable to configure jetty option: acceptors for Carte
|HOP_SERVER_JETTY_ACCEPT_QUEUE_SIZE||A variable to configure jetty option: acceptQueueSize for Carte
|HOP_SERVER_JETTY_RES_MAX_IDLE_TIME||A variable to configure jetty option: lowResourcesMaxIdleTime for Carte
|HOP_SERVER_OBJECT_TIMEOUT_MINUTES|1440|This project variable will set a time-out after which waiting, completed or stopped pipelines and workflows will be automatically cleaned up.
The default value is 1440 (one day).
|HOP_SPLIT_FIELDS_REMOVE_ENCLOSURE|false|Set this variable to false to preserve enclosure symbol after splitting the string in the Split fields transform.
Changing it to true will remove first and last enclosure symbol from the resulting string chunks.
|HOP_SYSTEM_HOSTNAME||You can use this variable to speed up hostname lookup.
Hostname lookup is performed by Hop so that it is capable of logging the server on which a workflow or pipeline is executed.
|HOP_TRANSFORM_PERFORMANCE_SNAPSHOT_LIMIT|0|The maximum number of transform performance snapshots to keep in memory.
Set to 0 to keep all snapshots indefinitely (default)
|HOP_USE_NATIVE_FILE_DIALOG|N|Set this value to Y if you want to use the system file open/save dialog when browsing files
|NEO4J_LOGGING_CONNECTION||Set this variable to the name of an existing Neo4j connection to enable execution logging to a Neo4j database.
|===
== Environment variables
Set the environment variables listed below in your operating system to configure Hop's startup behavior:
[options="header",width="90%"]
|===
|Variable name|Default Value|Description
|HOP_AUDIT_FOLDER||Set this variable to a valid path on your machine to store Hop's audit information.
This information includes last opened files per project, zoom size and lots more.
|HOP_CONFIG_FOLDER||Set this variable to a valid path on your machine to store Hop's configuration outside of your Hop installation's `config` folder
|HOP_PLUGIN_BASE_FOLDERS||Set this variable to point Hop to a comma separated list of folders where you want Hop to look for additional plugins.
|===