| % |
| % Licensed to the Apache Software Foundation (ASF) under one |
| % or more contributor license agreements. See the NOTICE file |
| % distributed with this work for additional information |
| % regarding copyright ownership. The ASF licenses this file |
| % to you under the Apache License, Version 2.0 (the |
| % "License"); you may not use this file except in compliance |
| % with the License. You may obtain a copy of the License at |
| % |
| % http://www.apache.org/licenses/LICENSE-2.0 |
| % |
| % Unless required by applicable law or agreed to in writing, |
| % software distributed under the License is distributed on an |
| % "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| % KIND, either express or implied. See the License for the |
| % specific language governing permissions and limitations |
| % under the License. |
| % |
| % Create well-known link to this spot for HTML version |
| \ifpdf |
| \else |
| \HCode{<a name='DUCC_CLI_SUBMIT'></a>} |
| \fi |
| |
| \section{ducc\_submit} |
| \label{sec:cli.ducc-submit} |
| % The source for this section is ducc\_duccbook/documents/part-user/cli/submit.xml. |
| \paragraph{Description:} |
| The submit CLI is used to submit work for execution by DUCC. DUCC assigns a unique id to the |
| job and schedules it for execution. The submitter may optionally request that the progress of |
| the job be monitored, in which case the state of the job as it progresses through its |
| lifetime is printed on the console. |
| \paragraph{Usage:} |
| \begin{description} |
| \item[Script wrapper] \ducchome/bin/ducc\_submit {\em options} |
| \item[Java Main] java -cp \ducchome/lib/uima-ducc-cli.jar org.apache.uima.ducc.cli.DuccJobSubmit {\em options} |
| \end{description} |
| |
| \paragraph{Options:} |
| \begin{description} |
| |
| \item[$--$all\_in\_one $<$local $|$ remote $>$] |
| Run the driver and pipeline in a single process. If {\em local} is specified, the |
| process is executed on the local machine, for example, in the current Eclipse session. |
| If {\em remote} is specified, the job is submitted to DUCC as a {\em managed reservation} |
| and run on some (presumably larger) machine allocated by DUCC. |
| |
| \item[$--$attach\_console] If specified, redirect remote stdout and stderr |
| to the local submitting console. |
| |
| \item[$--$cancel\_on\_interrupt] If specified, the job is monitored |
| and will be canceled if the submit command is interrupted, e.g. with CTRL-C. |
| This option always implies {\em $--$wait\_for\_completion}. |
| |
| \item[$--$classpath {[path-string]}] The CLASSPATH used for the job. If specified, this is used |
| for both the Job Driver and each Job Process. If not specified, the CLASSPATH of the |
| process invoking this request is used. |
| |
| \item[$--$classpath\_order {[user-before-ducc $|$ ducc-before-user]} ] |
| OBSOLETE - ignored. |
| |
| \item[$--$debug] Enable debugging messages. |
| This is primarily for debugging DUCC itself. |
| |
| \item[$--$description {[text]}] The text is any string used to describe the job. It is |
| displayed in the Web Server. When specified on a command-line the text usually |
| must be surrounded by quotes to protect it from the shell. The default is ``none''. |
| |
| \item[$--$driver\_debug {[debug-port]}] Append JVM debug flags to the JVM arguments |
| to start the JobDriver in remote debug mode. The remote process debugger will attempt |
| to contact the specified port. |
| |
| \item[$--$driver\_descriptor\_CR {[descriptor.xml]} ] This is the XML descriptor for the |
| Collection Reader. This |
| descriptor is a resource that is searched for in the filesystem or Java classpath as described |
| in the ~\hyperref[par:cli.submit.notes]{notes below}. (Required) |
| |
| \item[$--$driver\_descriptor\_CR\_overrides {[list]} ] |
| These are the configuration overrides for the Job Driver's collection reader. They are specified as |
| name/value pairs in a whitespace-delimited list. Example: |
| \begin{verbatim} |
| --driver_descriptor_CR_overrides name1=value1 name2=value2... |
| \end{verbatim} |
| |
| \item[$--$driver\_exception\_handler {[classname]}] |
| This specifies a developer-supplied exception handler for the Job Driver. |
| It must implement {\em org.apache.uima.ducc.IErrorHandler} or extend |
| {\em org.apache.uima.ducc.ErrorHandler}. A built-in default exception handler is provided. |
| |
| \item[$--$driver\_exception\_handler\_arguments {[argument-string]}] This is a string |
| containing arguments for the exception handler. The contents of |
| the string are entirely a function of the specified exception handler. If not specified, |
| a {\em null} is passed in. |
| \\The built-in default exception handler supports an argument string of the following form |
| (with NO embedded blanks): |
| \begin{verbatim} |
| max_job_errors=15 max_timeout_retrys_per_workitem=0 |
| \end{verbatim} |
| Note: When used as a CLI option, the string must usually be |
| quoted to protect it from the shell, if it contains blanks. |
| The built-in default exception handler supports two arguments, whose |
| default values are shown above. The max\_job\_errors limit specifies the number |
| of work item errors allowed before forcibly terminating the job. The |
| max\_timeout\_retrys\_per\_workitem limit specifies the number of times each |
| work item is retried in the event of a time-out. |
| |
| \item[$--$driver\_jvm\_args {[list]} ] |
| This specifies extra JVM arguments to be provided to the Job Driver process. It is a blank-delimited |
| list of strings. Example: |
| \begin{verbatim} |
| --driver_jvm_args -Xmx100M -Xms50M |
| \end{verbatim} |
| Note: When used as a CLI option, the list must usually be |
| quoted to protect it from the shell. |
| |
| \item[$--$environment {[env vars]}] Blank-delimited list of environment variables and |
| variable assignments. |
| Entries will be copied from the user's environment if just the variable name is |
| specified, optionally with a final '*' for those with the same prefix. |
| If specified, this is used for all DUCC processes in the job. Example: |
| \begin{verbatim} |
| --environment TERM=xterm DISPLAY=:1.0 LANG UIMA_* |
| \end{verbatim} |
| Additional entries may be copied from the user's environment based on the setting of |
| {\em ducc.submit.environment.propagated} |
| in the global DUCC configuration ducc.properties. |
| \\Note: When used as a CLI option, the environment string must usually be |
| quoted to protect it from the shell. |
| |
| |
| The following cause special runtime behavior. |
| They are considered experimental and are not guaranteed |
| to be effective from release to release. |
| |
| \begin{enumerate} |
| \item[DUCC\_USER\_CP\_PREPEND {[path-to-ducc-jars-and-classes]} ] |
| If specified, this path is used to supply the DUCC classes required for running |
| the Job Driver and Job Process(es), normally set to \$DUCC\_HOME/lib/uima-ducc/users/*. |
| \end{enumerate} |
| |
| |
| \item[$--$help ] |
| Prints the usage text to the console. |
| |
| \item[$--$jvm {[path-to-java]} ] |
| Specifies the JVM to use. If not specified, the same JVM used by the Agents is used. This is |
| the full path to the JVM, not the JAVA\_HOME. |
| Example: |
| \begin{verbatim} |
| --jvm /share/jdk1.6/bin/java |
| \end{verbatim} |
| |
| \item[$--$log\_directory {[path-to-log-directory]} ] |
| This specifies the path to the directory for the user logs. |
| If not fully specified the path is made relative to the value of the {\em $--$working\_directory}. |
| If omitted, the default is \$HOME/ducc/logs. |
| Example: |
| \begin{verbatim} |
| --log_directory /home/bob |
| \end{verbatim} |
| Within this directory DUCC creates a sub-directory for each job, using the unique numerical |
| ID of the job. The format of the generated log file names is described |
| \hyperref[chap:job-logs]{here}. |
| \\Note: The {\em $--$log\_directory} specifies only the path to a directory where |
| logs are to be stored. In order to manage multiple processes running in multiple |
| machines, sub-directory and file names are generated by DUCC and may |
| not be directly specified. |
| |
| \item[$--$process\_debug {[debug-port]}] Append JVM debug flags to the JVM |
| arguments to start the Job Process in remote debug mode. The remote process will start |
| its debugger and attempt to contact the debugger (usually Eclipse) on the specified |
| port. |
| |
| \item[$--$process\_deployments\_max {[integer]} ] |
| This specifies the maximum number of Job Processes to deploy at any given time. If not |
| specified, DUCC will attempt to provide the largest number of processes within the |
| constraints of fair\_share scheduling and the amount of work remaining |
| in the job. Example: |
| \begin{verbatim} |
| --process_deployments_max 66 |
| \end{verbatim} |
| |
| \item[$--$process\_descriptor\_AE {[descriptor]} ] |
| This specifies the Analysis Engine descriptor to be deployed in the Job Processes. This |
| descriptor is a resource that is searched for in the filesystem or Java classpath as described |
| in the ~\hyperref[par:cli.submit.notes]{notes below}. |
| It is mutually exclusive with {\em $--$process\_descriptor\_DD}. |
| Example: |
| \begin{verbatim} |
| --process_descriptor_AE /home/billy/resource/AE_foo.xml |
| \end{verbatim} |
| |
| |
| \item[$--$process\_descriptor\_AE\_overrides {[list]} ] |
| This specifies AE overrides. It is a whitespace-delimited list of name/value pairs. Example: |
| \begin{verbatim} |
| --process_descriptor_AE_overrides name1=value1 name2=value2 |
| \end{verbatim} |
| |
| \item[$--$process\_descriptor\_CC {[descriptor]} ] |
| This specifies the CAS Consumer descriptor to be deployed in the Job Processes. This |
| descriptor is a resource that is searched for in the filesystem or Java classpath as described |
| in the ~\hyperref[par:cli.submit.notes]{notes below}. |
| It is mutually exclusive with {\em $--$process\_descriptor\_DD}. |
| Example: |
| \begin{verbatim} |
| --process_descriptor_CC /home/billy/resource/CCE_foo.xml |
| \end{verbatim} |
| |
| \item[$--$process\_descriptor\_CC\_overrides {[list]} ] |
| This specifies CC overrides. It is a whitespace-delimited list of name/value pairs. Example: |
| \begin{verbatim} |
| --process_descriptor_CC_overrides name1=value1 name2=value2 |
| \end{verbatim} |
| |
| \item[$--$process\_descriptor\_CM {[descriptor]} ] |
| This specifies the CAS Multiplier descriptor to be deployed in the Job Processes. This |
| descriptor is a resource that is searched for in the filesystem or Java classpath as described |
| in the ~\hyperref[par:cli.submit.notes]{notes below}. |
| It is mutually exclusive with {\em $--$process\_descriptor\_DD}. |
| Example: |
| \begin{verbatim} |
| --process_descriptor_CM /home/billy/resource/CM_foo.xml |
| \end{verbatim} |
| |
| \item[$--$process\_descriptor\_CM\_overrides {[list]} ] |
| This specifies CM overrides. It is a whitespace-delimited list of name/value pairs. Example: |
| \begin{verbatim} |
| --process_descriptor_CM_overrides name1=value1 name2=value2 |
| \end{verbatim} |
| |
| \item[$--$process\_descriptor\_DD {[descriptor]} ] |
| This specifies a UIMA Deployment Descriptor for the job processes for DD-style jobs. |
| This is mutually exclusive with {\em $--$process\_descriptor\_AE}, {\em $--$process\_descriptor\_CM}, |
| and {\em $--$process\_descriptor\_CC}. This |
| descriptor is a resource that is searched for in the filesystem or Java classpath as described |
| in the ~\hyperref[par:cli.submit.notes]{notes below}. |
| Example: |
| \begin{verbatim} |
| --process_descriptor_DD /home/billy/resource/DD_foo.xml |
| \end{verbatim} |
| Alias: $--$process\_DD |
| |
| \item[$--$process\_failures\_limit {[integer]} ] |
| This specifies the maximum number of individual Job Process (JP) failures allowed |
| before killing the job. The default is twenty (20). If this limit is exceeded over the lifetime |
| of a job, DUCC terminates the entire job. |
| Example: |
| \begin{verbatim} |
| --process_failures_limit 23 |
| \end{verbatim} |
| |
| \item[$--$process\_initialization\_failures\_cap {[integer]} ] This specifies the maximum |
| number of failures during a UIMA process's initialization phase. If the number is |
| exceeded, the system will allow processes which are already running to continue, but |
| will assign no new processes to the job. The default is ninety-nine (99). Example: |
| \begin{verbatim} |
| --process_initialization_failures_cap 62 |
| \end{verbatim} |
| Note that the job is NOT killed if there are processes that have passed initialization and are |
| running. If this limit is reached, the only action is to not start new processes for the job. |
| |
| \item[$--$process\_initialization\_time\_max {[integer]}] This is the maximum time in minutes that |
| a process is allowed to remain in the ``initializing'' state, before DUCC terminates it. The |
| error counts as an initialization error towards the initialization failure cap. |
| |
| \item[$--$process\_jvm\_args {[list]} ] This specifies additional arguments to be passed to |
| all of the job processes as a blank-delimited list of strings. Example: |
| \begin{verbatim} |
| --process_jvm_args -Xmx400M -Xms100M |
| \end{verbatim} |
| Note: When used as a CLI option, the arguments must usually be |
| quoted to protect them from the shell. |
| |
| \item[$--$process\_memory\_size {[size]} ] This specifies the maximum amount of RAM in GB |
| to be allocated to each Job Process. This value is used by the Resource Manager to |
| allocate resources. |
| |
| \item[$--$process\_per\_item\_time\_max {[integer]} ] This specifies the maximum time in |
| minutes that the Job Driver will wait for a Job Process to process a CAS. If a |
| timeout occurs, the process is terminated and the CAS is marked in error (not retried). If |
| not specified, the default is 24 hours. Example: |
| \begin{verbatim} |
| --process_per_item_time_max 60 |
| \end{verbatim} |
| |
| \item[$--$process\_pipeline\_count {[integer]} ] This specifies the number of pipelines per |
| process to be deployed, i.e. the number of work-items each JP will process simultaneously. |
| It is used by the Resource Manager to determine how many |
| processes are needed, by the Job Process wrapper to determine how many threads to |
| spawn, and by the Job Driver to determine how many CASs to dispatch. If not specified, |
| the default is 4. Example: |
| \begin{verbatim} |
| --process_pipeline_count 7 |
| \end{verbatim} |
| Alias: $--$process\_thread\_count |
| |
| \item[$--$scheduling\_class {[classname]} ] This specifies the name of the scheduling class |
| used to determine the resource allocation for each process. The names of the |
| classes are installation dependent. |
| If not specified, the FAIR\_SHARE default is taken from the site class definitions file |
| described \hyperref[subsubsec:class.configuration]{here.} |
| Example: |
| \begin{verbatim} |
| --scheduling_class normal |
| \end{verbatim} |
| |
| \item[$--$service\_dependency{[list]}] This specifies a blank-delimited list of services the job |
| processes are dependent upon. Service dependencies are discussed in detail |
| \hyperref[sec:service.endpoints]{here}. Example: |
| \begin{verbatim} |
| --service_dependency UIMA-AS:Service1:tcp:host1:61616 UIMA-AS:Service2:tcp:host2:123 |
| \end{verbatim} |
| |
| \item[$--$specification, $-$f {[file]} ] |
| All the parameters used to submit a job may be placed in a standard Java properties file. |
| This file may then be used to submit the job (rather than providing all the parameters |
| directly to submit). The leading $--$ is omitted from the keywords. |
| |
| For example, |
| \begin{verbatim} |
| ducc_submit --specification job.props |
| ducc_submit -f job.props |
| \end{verbatim} |
| where job.props contains: |
| \begin{verbatim} |
| working_directory = /home/bob/projects/ducc/ducc_test/test/bin |
| process_failures_limit = 20 |
| driver_descriptor_CR = org.apache.uima.ducc.test.randomsleep.FixedSleepCR |
| environment = AE_INIT_TIME=10000 UIMA LD_LIBRARY_PATH=/a/bogus/path |
| log_directory = /home/bob/ducc/logs/ |
| process_pipeline_count = 1 |
| driver_descriptor_CR_overrides = jobfile:../simple/jobs/1.job compression:10 |
| process_initialization_failures_cap = 99 |
| process_per_item_time_max = 60 |
| driver_jvm_args = -Xmx500M |
| process_descriptor_AE = org.apache.uima.ducc.test.randomsleep.FixedSleepAE |
| classpath = /home/bob/duccapps/ducky_process.jar |
| description = ../simple/jobs/1.job[AE] |
| process_jvm_args = -Xmx100M -DdefaultBrokerURL=tcp://localhost:61616 |
| scheduling_class = normal |
| process_memory_size = 15 |
| \end{verbatim} |
| Note that properties in a specification file may be overridden by other command-line |
| parameters, as discussed \hyperref[chap:cli]{here}. |
| |
| \item[$--$suppress\_console\_log] If specified, suppress creation of the log files that |
| normally hold the redirected stdout and stderr. |
| |
| \item[$--$timestamp ] |
| If specified, messages from the submit process are timestamped. This is intended primarily |
| for use when monitoring a job with {\em $--$wait\_for\_completion}. |
| |
| \item[$--$wait\_for\_completion ] |
| If specified, the submit command monitors the job and prints periodic |
| state and progress information to the console. When the job completes, the monitor |
| is terminated and the submit command returns. If the command is interrupted, e.g. with CTRL-C, |
| the job will not be canceled unless {\em $--$cancel\_on\_interrupt} is also specified. |
| |
| \item[$--$working\_directory ] |
| This specifies the working directory to be set for the Job Driver and the Job Processes. |
| If not specified, the current directory is used. |
| \end{description} |
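
Several of the options above take blank-delimited values that usually must be quoted on a
command line. The following shell sketch is illustrative only: it assembles a
{\em ducc\_submit} command line from options documented in this section and prints one
argument per line instead of submitting anything, so the effect of the quoting can be
inspected without a DUCC installation. The {\em /opt/ducc} fallback for DUCC\_HOME, the
description text, and the function name {\em show\_submit\_command} are assumptions made
for this example; the descriptor names are borrowed from the sample specification file above.
\begin{verbatim}
#!/bin/sh
# Dry-run sketch: prints the command line, one argument per line,
# instead of submitting a job.
# Assumption: /opt/ducc is only a placeholder used when DUCC_HOME is unset.
DUCC_HOME="${DUCC_HOME:-/opt/ducc}"

show_submit_command() {
    # Each blank-delimited list is double-quoted so that the shell
    # passes it to ducc_submit as a single argument.
    set -- "$DUCC_HOME/bin/ducc_submit" \
        --description "random sleep test" \
        --driver_descriptor_CR org.apache.uima.ducc.test.randomsleep.FixedSleepCR \
        --process_descriptor_AE org.apache.uima.ducc.test.randomsleep.FixedSleepAE \
        --driver_jvm_args "-Xmx100M -Xms50M" \
        --process_jvm_args "-Xmx400M -Xms100M" \
        --environment "TERM=xterm DISPLAY=:1.0 LANG UIMA_*" \
        --wait_for_completion
    # Print the arguments; a quoted list appears on a single line.
    printf '%s\n' "$@"
}

show_submit_command
\end{verbatim}
Each quoted list shows up as exactly one line of output, which is how the submit command
would receive it. Replacing the final printf with an invocation of the assembled arguments
would perform the actual submission.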
| |
| \paragraph{Notes:} |
| \phantomsection\label{par:cli.submit.notes} |
| When searching for UIMA XML resource files such as descriptors, DUCC searches either the |
| filesystem or Java classpath according to the following rules: |
| \begin{enumerate} |
| \item If the resource ends in .xml it is assumed the resource is a file in the filesystem |
| and the path is either an absolute path or a path relative to the specified working directory. [by location] |
| \item If the resource does not end in .xml, it is assumed the resource is in the Java |
| classpath. DUCC creates a resource name by replacing the ``.'' separators with ``/'' and appending ``.xml''. [by name] |
| \end{enumerate} |
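
The two rules above can be sketched as a small shell function. This illustrates the rules
as stated here and is not DUCC's actual implementation; the function name
{\em resolve\_descriptor} and the {\em file:} and {\em classpath:} output prefixes are
invented for the example.
\begin{verbatim}
#!/bin/sh
# resolve_descriptor NAME [WORKING_DIR]
# Hypothetical helper: reports where a descriptor would be looked up.
# WORKING_DIR defaults to the current directory.
resolve_descriptor() {
    name=$1
    workdir=${2:-$PWD}
    case $name in
        *.xml)
            # Rule 1 (by location): a file in the filesystem, either
            # absolute or relative to the working directory.
            case $name in
                /*) printf 'file:%s\n' "$name" ;;
                *)  printf 'file:%s/%s\n' "$workdir" "$name" ;;
            esac
            ;;
        *)
            # Rule 2 (by name): a classpath resource; replace "." with "/"
            # and append ".xml".
            printf 'classpath:%s.xml\n' "$(printf '%s' "$name" | tr . /)"
            ;;
    esac
}

resolve_descriptor /home/billy/resource/AE_foo.xml
resolve_descriptor org.apache.uima.ducc.test.randomsleep.FixedSleepAE
\end{verbatim}
Under these rules the name org.apache.uima.ducc.test.randomsleep.FixedSleepAE is looked up
in the classpath as org/apache/uima/ducc/test/randomsleep/FixedSleepAE.xml, while
/home/billy/resource/AE\_foo.xml is read directly from the filesystem.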