| % |
| % Licensed to the Apache Software Foundation (ASF) under one |
| % or more contributor license agreements. See the NOTICE file |
| % distributed with this work for additional information |
| % regarding copyright ownership. The ASF licenses this file |
| % to you under the Apache License, Version 2.0 (the |
| % "License"); you may not use this file except in compliance |
| % with the License. You may obtain a copy of the License at |
| % |
| % http://www.apache.org/licenses/LICENSE-2.0 |
| % |
| % Unless required by applicable law or agreed to in writing, |
| % software distributed under the License is distributed on an |
| % "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| % KIND, either express or implied. See the License for the |
| % specific language governing permissions and limitations |
| % under the License. |
| % |
| \section{Overview} |
| |
| DUCC is a multi-user, multi-system distributed application. |
| For first-time users a staged installation/verification methodology is recommended, |
| roughly as follows: |
| |
| \begin{itemize} |
| \item Single system installation - single node - all work runs with the credentials of the installer. |
| |
| \item Optionally add more nodes. |
| |
| \item Enable multiple-user support - processes run with the credentials of the submitting user, |
| while DUCC runs as user {\em ducc}. |
| This step requires root authority on one or more machines. |
| |
| \item Enable CGroup containers. This step requires root authority on every DUCC machine. |
| \end{itemize} |
| |
| When upgrading from an existing installation the {\em ducc\_update} script may be used |
| to replace the system files while leaving the site-specific configuration files in place. |
| For more information see |
| \ifdefined\DUCCSTANDALONE |
| {\em ``ducc\_update''} in the Administrative Commands section of the DuccBook. |
| \else |
| \hyperref[subsec:admin.ducc-update] {\em ducc\_update}. |
| \fi |
| |
| With this release, persistent data about completed work is stored in a database, |
| so additional upgrade steps are required to convert the older file-based data and preserve information |
| about past work. |
| For more information see |
| \ifdefined\DUCCSTANDALONE |
| {\em ``db\_create''} and {\em ``db\_loader''} in the Administrative Commands section of the DuccBook. |
| \else |
| \hyperref[subsec:admin.db-create] {\em db\_create} and \hyperref[subsec:admin.db-loader] {\em db\_loader}. |
| \fi |
| |
| |
| DUCC is distributed as a compressed tar file. If building from source, this file will be created in your svn |
| trunk/target directory. The distribution file is in the form |
| \begin{verbatim} |
| uima-ducc-[version]-bin.tar.gz |
| \end{verbatim} |
| where [version] is the DUCC version; for example, {\em uima-ducc-2.1.0-bin.tar.gz}. This document will refer to the distribution |
| file as the ``$<$distribution.file$>$''. |
| |
| \section{Software Prerequisites} |
| \label{sec:install.prerequisites} |
| |
| Single system installation: |
| |
| \begin{itemize} |
| \item Reasonably current Linux. DUCC has been tested on SLES 11, RHEL 6 \& 7, and Ubuntu 14.04 |
| on 64-bit Intel and IBM Power (Big and Little Endian) hardware. |
| |
| {\em Note:} On some systems the default {\em user limits} |
| for max user processes (ulimit -u) and nfiles (ulimit -n) are set too |
| low for DUCC. The shell login profile for user {\em ducc} should set the |
| soft limit for max user processes to be the same as the hard limit |
| (\verb+ulimit -u `ulimit -Hu`+), and |
| should raise the nfiles limit above 1024, to at least twice the number of user |
| processes running on the cluster. |
| |
| \item Python 2.x, where 'x' is 4 or greater. DUCC has not been tested on Python 3.x. |
| \item Java 7 or 8. DUCC has been tested and run using IBM and Oracle JDK 1.7 \& 1.8. |
| \item Passwordless ssh for the userid running DUCC |
| \end{itemize} |
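| |
| As a minimal sketch of the user-limit note above, the login profile of user {\em ducc} might contain |
| lines such as the following. The value 8192 is an assumption chosen only for illustration; the |
| appropriate value depends on the number of user processes running on the cluster. |
| \begin{verbatim} |
| # In the shell login profile of user ducc |
| # Raise the soft limit for max user processes to the hard limit |
| ulimit -u `ulimit -Hu` |
| # Raise the open-files (nfiles) limit; 8192 is an illustrative value only |
| # and must not exceed the hard limit reported by "ulimit -Hn" |
| ulimit -n 8192 |
| \end{verbatim} |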
| |
| Additional requirements for multiple system installation: |
| |
| \begin{itemize} |
| \item All systems must have a shared filesystem (such as NFS or GPFS) and common user space. |
| The \$DUCC\_HOME directory must be located on a shared filesystem. |
| \end{itemize} |
| |
| Additional requirements for running multiple user processes with their own credentials: |
| |
| \begin{itemize} |
| \item A userid {\em ducc}, and group {\em ducc}. User {\em ducc} must be the only member of group {\em ducc}. |
| \item DUCC must run with user {\em ducc} credentials. |
| \item Root access is required to setuid-root the DUCC process launcher. |
| \end{itemize} |
| |
| Additional requirements for CGroup containers: |
| |
| \begin{itemize} |
| \item A userid {\em ducc}, and group {\em ducc}. User {\em ducc} must be the only member of group {\em ducc}. |
| \item DUCC must run with user {\em ducc} credentials. |
| \item libcgroup1-0.37+ on SLES, libcgroup-0.37+ on RHEL, and on Ubuntu all of cgroup-bin, cgroup-lite, and libcgroup1, |
| along with a customized /etc/cgconfig.conf. |
| \end{itemize} |
| |
| |
| In order to build DUCC from source the following software is also required: |
| \begin{itemize} |
| \item A Subversion client, from \url{http://subversion.apache.org/packages.html}. The |
| svn url is \url{https://svn.apache.org/repos/asf/uima/uima-ducc/trunk}. |
| \item Apache Maven, from \url{http://maven.apache.org/index.html} |
| \end{itemize} |
| |
| The DUCC webserver optionally supports direct ``jconsole'' attach to DUCC job processes. To install |
| this, the following is required: |
| \begin{itemize} |
| \item Apache Ant, any reasonably current version. |
| \end{itemize} |
| |
| To (optionally) build the documentation, the following is also required: |
| \begin{itemize} |
| \item LaTeX, including the \emph{pdflatex} and \emph{htlatex} packages. A good place |
| to start if you need to install it is \url{https://www.tug.org/texlive/}. |
| \end{itemize} |
| |
| More detailed one-time setup instructions for source-level builds via subversion can be found here: |
| \url{http://uima.apache.org/one-time-setup.html#svn-setup} |
| |
| \section{Building from Source} |
| |
| To build from source, ensure you have |
| Subversion and Maven installed. Extract the source from the SVN repository named above. |
| |
| Then change from your extract directory into |
| the root directory (usually current-directory/trunk) and run the command |
| \begin{verbatim} |
| mvn install |
| \end{verbatim} |
| or |
| \begin{verbatim} |
| mvn install -Pbuild-duccdocs |
| \end{verbatim} |
| if you have LaTeX installed and wish to do the optional build of documentation. |
| The build-duccdocs profile can also be activated by setting the environment variable BUILD\_DUCCDOCS to true. |
| |
| If this is your first Maven build, it may take quite a while as Maven downloads all the |
| open-source prerequisites. (The prerequisites are stored in the Maven repository, usually |
| your \$HOME/.m2.) |
| |
| When the build is complete, a tarball is placed in your current-directory/trunk/target |
| directory. |
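| |
| Put together, the build steps above might look like the following sketch. It assumes the checkout is |
| made into a directory named {\em trunk} under the current directory (the default name when checking |
| out the svn url given earlier). |
| \begin{verbatim} |
| # Check out the source; creates ./trunk (directory name assumed) |
| svn checkout https://svn.apache.org/repos/asf/uima/uima-ducc/trunk |
| cd trunk |
| # Build without documentation ... |
| mvn install |
| # ... or, with LaTeX installed, build the documentation as well |
| mvn install -Pbuild-duccdocs |
| # The distribution tarball is then found in ./target |
| ls target/uima-ducc-*-bin.tar.gz |
| \end{verbatim} |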
| |
| \section{Documentation} |
| \begin{sloppypar} |
| After installation the DUCC documentation is found (in both PDF and HTML format) in the directory |
| ducc\_runtime/docs. In addition, the DUCC webserver contains a link to the full documentation on each major page. |
| The API is documented only via JavaDoc, distributed in the webserver's root directory |
| {\tt \duccruntime/webserver/root/doc/api}. |
| \end{sloppypar} |
| |
| If building from source, Maven places the documentation in |
| \begin{itemize} |
| \item {\tt trunk/uima-ducc-duccdocs/target/site} (main documentation), and |
| \item {\tt trunk/target/site/apidocs} (API Javadoc) |
| \end{itemize} |
| |
| \section{Single System Installation and Verification} |
| |
| Although any user ID can be used to run a single-system DUCC, creating a ``ducc'' userid is recommended |
| to enable the later use of cgroups as well as running processes with the credentials of the submitting user. |
| |
| If multiple nodes are going to be added later, the ducc runtime tree should be installed |
| on a shared filesystem so that it can be mounted on the additional nodes. |
| |
| Verification submits a very simple UIMA pipeline for execution under DUCC. Once this is shown to be |
| working, one may proceed to install additional features. |
| |
| |
| \section{Minimal Hardware Requirements for Single System Installation} |
| \begin{itemize} |
| \item One Intel-based or IBM Power-based system (Big or Little Endian). (More systems may be added later.) |
| |
| \item 8GB of memory. 16GB or more is preferable for developing and testing applications beyond |
| the trivial. |
| |
| \item 1GB disk space to hold the DUCC runtime, system logs, and job logs. More is |
| usually needed for larger installations. |
| \end{itemize} |
| |
| Please note: DUCC is intended for scaling out memory-intensive UIMA applications over computing |
| clusters consisting of multiple nodes with large (16GB-256GB or more) memory. The minimal |
| requirements are for initial test and evaluation purposes, but will not be sufficient to run actual |
| workloads. |
| |
| \section{Single System Installation} |
| \label{subsec:install.single-user} |
| \begin{enumerate} |
| \item Expand the distribution file with the appropriate umask: |
| \begin{verbatim} |
| (umask 022 && tar -zxf <distribution.file>) |
| \end{verbatim} |
| |
| This creates a directory with a name of the form ``apache-uima-ducc-[version]''. |
| |
| This directory contains the full DUCC runtime which |
| you may use ``in place'' but it is highly recommended that you move it |
| into a standard location on a shared filesystem; for example, under ducc's HOME directory: |
| \begin{verbatim} |
| mv apache-uima-ducc-[version] /home/ducc/ducc_runtime |
| \end{verbatim} |
| |
| We refer to this directory, regardless of its location, as \duccruntime. For simplicity, |
| some of the examples in this document assume it has been moved to /home/ducc/ducc\_runtime. |
| |
| \item Change directories into the admin sub-directory of \duccruntime: |
| \begin{verbatim} |
| cd $DUCC_HOME/admin |
| \end{verbatim} |
| |
| \item Run the post-installation script: |
| \begin{verbatim} |
| ./ducc_post_install |
| \end{verbatim} |
| If this script fails, correct any problems it identifies and run it again. |
| |
| Note that {\em ducc\_post\_install} initializes various default parameters which |
| may be changed later by the system administrator. Therefore it usually should be |
| run only during this first installation step. |
| |
| \item If you wish to install jconsole support from the webserver, make sure Apache Ant |
| is installed, and run |
| \begin{verbatim} |
| ./sign_jconsole_jar |
| \end{verbatim} |
| This step may be run at any time if you wish to defer it. |
| |
| \end{enumerate} |
| |
| That's it: DUCC is installed and ready to run. (If errors were displayed during ducc\_post\_install |
| they must be corrected before continuing.) |
| |
| \section{Initial System Verification} |
| |
| Here we verify the system configuration, start DUCC, run a test Job, and then shut down DUCC. |
| |
| To run the verification, issue these commands: |
| \begin{enumerate} |
| \item cd \duccruntime/admin |
| \item ./check\_ducc |
| |
| Examine the output of check\_ducc. If any errors are shown, correct the errors and rerun |
| check\_ducc until there are no errors. |
| \item Finally, start ducc: ./start\_ducc |
| \end{enumerate} |
| |
| Start\_ducc will first perform a number of consistency checks. |
| It then starts the ActiveMQ broker, the DUCC control processes, and a single DUCC agent on the |
| local node. |
| |
| You will see some startup messages similar to the following: |
| |
| \begin{verbatim} |
| ENV: Java is configured as: /share/jdk1.7/bin/java |
| ENV: java full version "1.7.0_40-b43" |
| ENV: Threading enabled: True |
| MEM: memory is 15 gB |
| ENV: system is Linux |
| allnodes /home/ducc/ducc_runtime/resources/ducc.nodes |
| Class definition file is ducc.classes |
| OK: Class and node definitions validated. |
| OK: Class configuration checked |
| Starting broker on ducchead.biz.org |
| Waiting for broker ..... 0 |
| Waiting for broker ..... 1 |
| ActiveMQ broker is found on configured host and port: ducchead.biz.org:61616 |
| Starting 1 agents |
| ********** Starting agents from file /home/ducc/ducc_runtime/resources/ducc.nodes |
| Starting warm |
| Waiting for Completion |
| ducchead.biz.org Starting rm |
| PID 14198 |
| ducchead.biz.org Starting pm |
| PID 14223 |
| ducchead.biz.org Starting sm |
| PID 14248 |
| ducchead.biz.org Starting or |
| PID 14275 |
| ducchead.biz.org Starting ws |
| PID 14300 |
| ducchead.biz.org |
| ducc_ling OK |
| DUCC Agent started PID 14325 |
| All threads returned |
| \end{verbatim} |
| |
| Now open a browser and go to the DUCC webserver's url, http://$<$hostname$>$:42133 where $<$hostname$>$ is |
| the name of the host where DUCC is started. Navigate to the Reservations page via the links in |
| the upper-left corner. You should see the DUCC JobDriver reservation in state |
| WaitingForResources. In a few minutes this should change to Assigned. |
| Now jobs can be submitted. |
| |
| To submit a job, |
| \begin{enumerate} |
| \item \duccruntime/bin/ducc\_submit --specification \duccruntime/examples/simple/1.job |
| \end{enumerate} |
| |
| Open the DUCC jobs page in the browser. You should see the job progress through a series of |
| transitions: Waiting For Driver, Waiting For Services, Waiting For Resources, Initializing, and |
| finally, Running. You'll see the number of work items submitted (15) and the number of work |
| items completed grow from 0 to 15. Finally, the job will move into Completing and then |
| Completed. |
| |
| Since this example does not specify a log directory, DUCC will create a log directory in your HOME directory under |
| \begin{verbatim} |
| $HOME/ducc/logs/job-id |
| \end{verbatim} |
| |
| In this directory, you will find a log for the sample job's JobDriver (JD), JobProcess (JP), and |
| a number of other files relating to the job. |
| |
| This is a good time to explore the DUCC web pages. Notice that the job id is a link to a set of |
| pages with details about the execution of the job. |
| |
| Notice also, in the upper-right corner is a link to the full DUCC documentation, the ``DuccBook''. |
| |
| Finally, stop DUCC: |
| \begin{enumerate} |
| \item cd \duccruntime/admin |
| \item ./stop\_ducc -a |
| \end{enumerate} |
| |
| |
| \section{Add additional nodes to the DUCC cluster} |
| Additional nodes must meet all |
| \ifdefined\DUCCSTANDALONE |
| {\em prerequisites} (listed above). |
| \else |
| \hyperref[sec:install.prerequisites]{\em prerequisites}. |
| \fi |
| |
| \$DUCC\_HOME must be on a shared filesystem and mounted at the same location |
| on all DUCC nodes. |
| |
| If users' home directories are on local filesystems, the location for user logfiles |
| should be specified to be on a shared filesystem. |
| |
| Additional nodes are normally added to a worker node group. Note that the |
| DUCC head node does not have to be a worker node. |
| In addition, the webserver node can be separate from the DUCC head node |
| (see webserver configuration options in ducc.properties). |
| |
| For worker nodes, DUCC needs to know which node group |
| each machine belongs to, and on which nodes an Agent process must be started. |
| |
| The configuration shipped with DUCC has all nodes in the same ``default'' node pool. |
| Worker nodes are listed in the file |
| \begin{verbatim} |
| $DUCC_HOME/resources/ducc.nodes |
| \end{verbatim} |
| |
| During initial installation, this file was initialized with the node DUCC is installed on. |
| Additional nodes may be added to the file using a text editor to increase the size of the DUCC |
| cluster. |
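| |
| As an illustration, a ducc.nodes file for a three-node cluster might look like the sketch below. |
| It assumes the file lists one node name per line and treats lines starting with \# as comments. |
| The head node name is taken from the startup messages shown earlier; the two worker node names |
| are hypothetical. |
| \begin{verbatim} |
| # $DUCC_HOME/resources/ducc.nodes |
| # Node written by the initial installation |
| ducchead.biz.org |
| # Additional worker nodes (hypothetical names) |
| duccnode1.biz.org |
| duccnode2.biz.org |
| \end{verbatim} |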
| |
| |
| \section{Ducc\_ling Configuration - Running with credentials of submitting user} |
| \label{sec:duccling.install} |
| |
| DUCC launches user processes through ducc\_ling, a small native C application. |
| By default the resultant process runs with the credentials of the user ID of |
| the DUCC application. It is possible for multiple users to submit work to |
| DUCC in this configuration, but it requires that the user ID running DUCC has |
| write access to all directories to which the user process outputs data. |
| By configuring the ducc user ID and ducc\_ling correctly, work submitted by |
| all users will run with their own credentials. |
| |
| Before proceeding with this step, please note: |
| \begin{itemize} |
| \item The sequence of {\em chown} and {\em chmod} operations MUST be performed |
| in the exact order given below. If the {\em chmod} operation is performed before |
| the {\em chown} operation, Linux will regress the permissions granted by {\em chmod} |
| and ducc\_ling will be incorrectly installed. |
| \end{itemize} |
| |
| ducc\_ling is designed to be a setuid-root program whose function is to run user processes with the identity of |
| the submitting user. This must be installed correctly; incorrect installation can prevent jobs from running as |
| their submitters, and in the worst case, can introduce security problems into the system. |
| |
| ducc\_ling can either be installed on a local disk on every system in the DUCC cluster, |
| or on a shared-filesystem that does not suppress setuid-root permissions on client nodes. |
| The path to ducc\_ling must be the same on each DUCC node. |
| The default path configuration is |
| \$\{DUCC\_HOME\}/admin/\$\{os.arch\}/ in order to handle clusters with mixed OS platforms. |
| \$\{os.arch\} is the architecture specific value of the Java system property with that name; |
| examples are amd64 and ppc64. |
| |
| The steps are: build ducc\_ling for each node architecture to be added to the cluster, |
| copy ducc\_ling to the desired location, and then configure ducc\_ling to give user |
| ducc the ability to spawn a process as a different user. |
| |
| In the example below ducc\_ling is left under \$DUCC\_HOME, where it is built. |
| |
| As user {\em ducc}, build ducc\_ling for necessary architectures (this is done |
| automatically for the DUCC head machine by the ducc\_post\_install script). |
| For each unique OS platform: |
| \begin{enumerate} |
| \item cd \$DUCC\_HOME/admin |
| \item ./build\_duccling |
| \end{enumerate} |
| |
| Then, as user {\em root} on the shared filesystem, {\em cd \$DUCC\_HOME/admin}, and for each unique OS architecture: |
| \begin{enumerate} |
| \item chown ducc.ducc \$\{os.arch\} |
| \\ (set directory ownership to be user ducc, group ducc) |
| \item chmod 700 \$\{os.arch\} |
| \\ (only user ducc can read contents of directory) |
| \item chown root.ducc \$\{os.arch\}/ducc\_ling |
| \\ (make root owner of ducc\_ling, and let users in group ducc access it) |
| \item chmod 4750 \$\{os.arch\}/ducc\_ling |
| \\ (ducc\_ling runs as user root when started by users in group ducc) |
| \end{enumerate} |
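| |
| The result of the four steps above can be checked with {\em ls}. The sketch below assumes the |
| architecture directory is {\em amd64} (one of the example values of \$\{os.arch\} given earlier); |
| only the mode, owner, and group columns of the expected output are shown. |
| \begin{verbatim} |
| cd $DUCC_HOME/admin |
| ls -ld amd64 amd64/ducc_ling |
| # Expected mode, owner and group (other columns omitted): |
| #   drwx------  ducc ducc  amd64 |
| #   -rwsr-x---  root ducc  amd64/ducc_ling |
| # The "s" in the owner permissions is the setuid bit set by chmod 4750. |
| # If it is missing, chmod was probably run before chown; repeat the |
| # steps in the order given. |
| \end{verbatim} |
| The {\em owner.group} form used above is accepted by GNU chown for backward compatibility; the |
| equivalent {\em owner:group} form (for example, {\tt chown root:ducc}) may be used instead. |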
| |
| If these steps are correctly performed, ONLY user {\em ducc} may use the ducc\_ling program in |
| a privileged way. ducc\_ling contains checks to prevent even user {\em root} from using it for |
| privileged operations. |
| |
| If a different location is chosen for ducc\_ling the new path needs to be specified |
| for ducc.agent.launcher.ducc\_spawn\_path in \$DUCC\_HOME/resources/site.ducc.properties. |
| For more information see |
| \ifdefined\DUCCSTANDALONE |
| {\em ``Properties merging''} in the DuccBook. |
| \else |
| \hyperref[sec:admin.properties-merge] {Properties merging}. |
| \fi |
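| |
| For example, if ducc\_ling were copied to a local disk on every node, the override might look like |
| the following sketch. The directory /local/ducc/bin is hypothetical, and the sketch assumes the |
| property holds the full path of the ducc\_ling executable. |
| \begin{verbatim} |
| # $DUCC_HOME/resources/site.ducc.properties |
| # /local/ducc/bin is a hypothetical local-disk location; the path must |
| # be the same on every DUCC node |
| ducc.agent.launcher.ducc_spawn_path = /local/ducc/bin/ducc_ling |
| \end{verbatim} |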
| |
| |
| \section{CGroups Installation and Configuration} |
| |
| \begin{description} |
| \item[Note:] A key feature of DUCC is to run user processes in CGroups in order to guarantee |
| each process always has the amount of RAM requested. RAM allocated to the managed process |
| (and any child processes) that exceeds the requested DUCC memory size will be forced into swap space. |
| Without CGroups a process that exceeds its requested memory size by N\% is killed |
| (default N=5 in ducc.properties), and memory use by child processes is ignored. |
| |
| DUCC's CGroup configuration also allocates CPU resources to managed processes based on |
| relative memory size. A process with 50\% of a machine's RAM will be guaranteed at least |
| 50\% of the machine's CPU resources as well. |
| \end{description} |
| |
| The steps in this task must be done as user root and user ducc. |
| |
| To install and configure CGroups for DUCC: |
| \begin{enumerate} |
| \item Install the appropriate |
| \ifdefined\DUCCSTANDALONE |
| libcgroup package at level 0.37 or above (see {\em Installation Prerequisites}). |
| \else |
| \hyperref[sec:install.prerequisites]{libcgroup package} at level 0.37 or above. |
| \fi |
| |
| \item Configure /etc/cgconfig.conf as follows: |
| \begin{verbatim} |
| # Mount cgroups for older OS (e.g. RHEL v6) |
| # For newer OS, remove entire mount block |
| mount { |
| cpuset = /cgroup/cpuset; |
| cpu = /cgroup/cpu; |
| cpuacct = /cgroup/cpuacct; |
| memory = /cgroup/memory; |
| devices = /cgroup/devices; |
| freezer = /cgroup/freezer; |
| net_cls = /cgroup/net_cls; |
| blkio = /cgroup/blkio; |
| } |
| # Define cgroup ducc and setup permissions |
| group ducc { |
| perm { |
| task { |
| uid = ducc; |
| } |
| admin { |
| uid = ducc; |
| } |
| } |
| memory {} |
| cpu{} |
| } |
| \end{verbatim} |
| \item Start the cgconfig service: |
| \begin{verbatim} |
| service cgconfig start |
| \end{verbatim} |
| |
| \item Verify that the cgconfig service is running by checking for the existence of the directory: |
| \begin{verbatim} |
| /cgroups/ducc |
| \end{verbatim} |
| |
| \item Configure the cgconfig service to start on reboot: |
| \begin{verbatim} |
| chkconfig cgconfig on |
| \end{verbatim} |
| \end{enumerate} |
| |
| {\em Note:} if CGroups is not installed on a machine the DUCC Agent will detect this and not |
| attempt to use the feature. CGroups can also be disabled for all machines |
| (see |
| \ifdefined\DUCCSTANDALONE |
| {\em ducc.agent.launcher.cgroups.enable} in ducc.properties, described in the DuccBook) |
| \else |
| \hyperref[itm:props-agent.cgroups.enable] {\em ducc.agent.launcher.cgroups.enable}) |
| \fi |
| or it can be disabled for individual machines (see |
| \ifdefined\DUCCSTANDALONE |
| {\em ducc.agent.exclusion.file} in ducc.properties, described in the DuccBook). |
| \else |
| \hyperref[itm:props-agent.cgroups.exclusion]{\em ducc.agent.exclusion.file}). |
| \fi |
| |
| |
| |
| \section{Full DUCC Verification} |
| |
| This is identical to initial verification, with the one difference that the job ``1.job'' should be |
| submitted as any user other than ducc. Watch the webserver and check that the job executes |
| under the correct identity. Once this completes, DUCC is installed and verified. |
| |
| \section{Enable DUCC webserver login} |
| |
| This step is optional. As shipped, the webserver is disabled for |
| logins. This can be seen by hovering over the Login text located in the |
| upper right of most webserver pages: |
| \begin{verbatim} |
| System is configured to disallow logins |
| \end{verbatim} |
| |
| To enable logins, a Java-based authenticator must be plugged in and the |
| login feature must be enabled in the ducc.properties file by the DUCC |
| administrator. Also, ducc\_ling should be properly deployed (see the |
| Ducc\_ling Configuration section above). |
| |
| A beta version of a Linux-based authentication plug-in is shipped with DUCC. |
| It can be found in the source tree: |
| \begin{verbatim} |
| org.apache.uima.ducc.ws.authentication.LinuxAuthenticationManager |
| \end{verbatim} |
| |
| The Linux-based authentication plug-in will attempt to validate webserver |
| login requests by appealing to the host OS. The user who wishes to |
| log in provides a userid and password to the webserver via https, which |
| in turn are handed off to the OS for a success/failure reply. |
| |
| To have the webserver employ the beta Linux-based authentication plug-in, |
| the DUCC administrator should perform the following as user ducc: |
| \begin{verbatim} |
| 1. edit ducc.properties |
| 2. locate: ducc.ws.login.enabled = false |
| 3. modify: ducc.ws.login.enabled = true |
| 4. save |
| \end{verbatim} |
| |
| Note: The beta Linux-based authentication plug-in has had limited testing. |
| In particular, it was tested using: |
| \begin{verbatim} |
| Red Hat Enterprise Linux Workstation release 6.4 (Santiago) |
| \end{verbatim} |
| |
| Alternatively, you can provide your own authentication plug-in. To do so: |
| \begin{verbatim} |
| 1. author a Java class that implements |
| org.apache.uima.ducc.common.authentication.IAuthenticationManager |
| 2. create a jar file comprising your authentication class |
| 3. put the jar file in a location accessible by the DUCC webserver, such as |
| $DUCC_HOME/lib/authentication |
| 4. put any authentication dependency jar files there as well |
| 5. edit ducc.properties |
| 6. add the following: |
| ducc.local.jars = authentication/* |
| ducc.authentication.implementer=<your.authenticator.class.Name> |
| 7. locate: ducc.ws.login.enabled = false |
| 8. modify: ducc.ws.login.enabled = true |
| 9. save |
| \end{verbatim} |
| |