| % |
| % Licensed to the Apache Software Foundation (ASF) under one |
| % or more contributor license agreements. See the NOTICE file |
| % distributed with this work for additional information |
| % regarding copyright ownership. The ASF licenses this file |
| % to you under the Apache License, Version 2.0 (the |
| % "License"); you may not use this file except in compliance |
| % with the License. You may obtain a copy of the License at |
| % |
| % http://www.apache.org/licenses/LICENSE-2.0 |
| % |
| % Unless required by applicable law or agreed to in writing, |
| % software distributed under the License is distributed on an |
| % "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| % KIND, either express or implied. See the License for the |
| % specific language governing permissions and limitations |
| % under the License. |
| % |
| |
| \section{Administrative Commands} |
| |
The administrative commands include commands to start DUCC, stop it, verify the
configuration, and query the state of the cluster.
| |
Note: The scripting that supports some of these functions runs (by default) in multi-threaded mode so that
large clusters can be started, stopped, and queried quickly. If DUCC is running on an older
system where the threading is not supported, the scripts detect this and fall back to
single-threaded operation. In addition, all of these commands support a ``--nothreading'' option to manually
disable the threading.
| |
| \subsection{start\_ducc} |
| \label{subsec:admin.start-ducc} |
| |
| \subsubsection{{\em Description}} |
| The command \ducchome/admin/start\_ducc is used to start DUCC processes. |
| It must be run from the head node. |
If run with no parameters, it takes the following actions:
| \begin{itemize} |
| \item Starts the ActiveMQ server. |
| \item Starts the database. |
| \item Starts the management processes Resource Manager, Orchestrator, Process Manager, |
| Services Manager, and Web Server on the local node. |
| \item Starts an agent process on every node named in the default node list. |
| \end{itemize} |
| |
| \subsubsection{{\em Usage}} |
| |
| \begin{description} |
| \item[start\_ducc {[options]}] \hfill \\ |
| If no options are given, all DUCC processes are started, using the default node list, |
| {\em ducc.nodes}. |
| |
| \end{description} |
| |
| \subsubsection{{\em Options: }} |
| \begin{description} |
| |
| \item[-n, --nodelist {[nodefile] }] \hfill \\ |
| Start agents on the nodes in the nodefile. Multiple nodefiles may be specified: |
| \begin{verbatim} |
start_ducc -n foo.nodes -n bar.nodes -n baz.nodes
| \end{verbatim} |
| |
| |
| \item[-c, --component {[component] }] \hfill \\ |
| Start a specific DUCC component, optionally on a specific node. If the component |
| name is qualified with a nodename, the component is started on that node. To qualify |
| a component name with a destination node, use the notation component@nodename. |
| Multiple components may be specified: |
| \begin{verbatim} |
start_ducc -c sm -c pm -c rm -c or@bj22 -c agent@n1 -c agent@n2
| \end{verbatim} |
| |
| Components include: |
| \begin{description} |
\item[rm] The Resource Manager
\item[or] The Orchestrator
\item[pm] The Process Manager
\item[sm] The Service Manager
\item[ws] The Web Server
\item[agent@node] Node Agents
\item[broker] The ActiveMQ broker
\item[db] The database
| \end{description} |
| |
| \item[--nothreading] If specified, the command does not run in multi-threaded mode |
| even if it is supported on the local platform. |
| |
| \end{description} |
| |
| \subsubsection{{\em Notes: }} |
A different nodelist may be used to specify where Agents are started. In addition, multiple node
lists may be specified, in which case Agents are started on all the nodes in all of the
lists.
| |
| To start only agents, run start\_ducc specifying a nodelist explicitly. Note that the broker |
| must have already been started. |
| |
To start a specific management process, run start\_ducc with the -c component option,
specifying the component that should be started.
| |
| \subsubsection{{\em Examples: }} |
| |
Start agents on the nodes in two different nodelists and start the broker:
| \begin{verbatim} |
start_ducc -n foo.nodes -n bar.nodes -c broker
| \end{verbatim} |
| |
| Start an agent on a specific node: |
| \begin{verbatim} |
start_ducc -c agent@a.specific.node
| \end{verbatim} |
| |
| Start the webserver on node 'bingle': |
| \begin{verbatim} |
start_ducc -c ws@bingle
| \end{verbatim} |
| |
| \subsubsection{{\em Debugging:}} |
| |
Sometimes a component will not start, and it can be difficult to understand why. To diagnose, it is
helpful to know that {\em start\_ducc} is simply a wrapper around a lower-level bit of scripting
that does the actual work. That lower-level code can be invoked stand-alone, in which case
console messages that {\em start\_ducc} would have suppressed are presented on the console.
| |
The lower-level script is called {\em ducc.py} and accepts the same {\em -c component} flag as
start\_ducc. If some component will not start, try running {\em ducc.py -c component} directly.
It starts in the foreground, and usually the cause of the problem becomes evident from
the console output.
| |
| For example, suppose the Resource Manager will not start. Run the following: |
| \begin{verbatim} |
| ./ducc.py -c rm |
| \end{verbatim} |
and examine the output. Use {\em CTRL-C} to stop the component when done.
| |
| |
| \subsection{stop\_ducc} |
| \label{subsec:admin.stop-ducc} |
| |
| \subsubsection{{\em Description:}} |
| Stop\_ducc is used to stop DUCC processes. At least one parameter is required. |
| When {\em -a} is specified, the following actions are taken: |
| \begin{itemize} |
\item Uses the ActiveMQ broker to broadcast a shutdown request to all
DUCC components other than the ActiveMQ broker itself and the database.
\item Waits for the daemons to stop (60 seconds by default; see the {\em -w} option).
| \item Stops the database. |
| \item Stops the ActiveMQ broker. |
| \end{itemize} |
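
For example, a complete shutdown of the cluster is a single command (a minimal sketch; like
the other administrative commands, it is run from the head node):
\begin{verbatim}
stop_ducc -a
\end{verbatim}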
| |
| |
| \subsubsection{\em Usage:} |
| |
| \begin{description} |
| \item[stop\_ducc {[options]}] \hfill \\ |
| If no options are given, help text is presented. At least one option is required, to avoid |
| accidental cluster shutdown. |
| \end{description} |
| |
| |
\subsubsection{{\em Options:}}
| \begin{description} |
| |
\item[-a, --all] \hfill \\
Stop all the DUCC processes, including agents and management processes. This
broadcasts a ``shutdown'' command to all DUCC processes. Shutdown is normally
performed gracefully, with all processes given time to save state.
All user processes, both jobs and services, are sent shutdown signals. Job and service
processes that do not shut down within a designated grace period are then forcibly
terminated with kill -9.
| |
\item[-n, --nodelist {[nodefile]}] \hfill \\
Only the DUCC agents in the designated nodelists are shut down. The processes are sent
kill -INT signals, which trigger the Java shutdown hooks and enable graceful shutdown.
All user processes on the indicated nodes, both jobs and services, are sent ``shutdown''
signals and are given a minute to shut down gracefully. Job and service processes that do
not shut down within the designated grace period are then forcibly terminated with kill -9.
| |
| \begin{verbatim} |
stop_ducc -n foo.nodes -n bar.nodes -n baz.nodes
| \end{verbatim} |
| |
| \item[-c, --component {[component]}] \hfill \\ |
| Stop a specific DUCC component. |
| |
| This may be used to stop an errant management component and subsequently restart it |
| (with start\_ducc). |
| |
| This may also be used to stop a specific agent and the job and services processes it is |
| managing, without the need to specify a nodelist. |
| |
| Examples: |
| |
| Stop agents on nodes n1 and n2: |
| \begin{verbatim} |
| stop_ducc -c agent@n1 -c agent@n2 |
| \end{verbatim} |
| |
| Stop and restart the rm: |
| \begin{verbatim} |
| stop_ducc -c rm |
| start_ducc -c rm |
| \end{verbatim} |
| |
| Components include: |
| \begin{description} |
| \item[rm] The Resource Manager. |
| \item[or] The Orchestrator. |
| \item[pm] The Process Manager. |
| \item[sm] The Service Manager. |
| \item[ws] The Web Server. |
| \item[db] The database. |
| \item[broker] The ActiveMQ broker (only if the broker is auto-managed). |
| \item[agent@node] Node Agent on the specified node. |
| \end{description} |
| |
\item[-w, --wait {[time in seconds]}] If given, this specifies the time to wait
after broadcasting the shutdown signal and before stopping the ActiveMQ broker itself.
If not specified, the default is 60 seconds.
| |
NOTE: In production systems it is generally wise to use the default of 60 seconds. For
test systems a shorter wait speeds cycle time. Be sure to use {\em check\_ducc -k} after
{\em stop\_ducc} if you change the wait time, to ensure all processes are actually stopped.
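
For example, on a test system (a sketch; the wait time is illustrative, and the
{\em check\_ducc -k} step verifies nothing was left running):
\begin{verbatim}
stop_ducc -a -w 30
check_ducc -k
\end{verbatim}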
| |
| \item[--nothreading] If specified, the command does not run in multi-threaded mode |
| even if it is supported on the local platform. |
| |
| \end{description} |
| |
| \subsubsection{{\em Notes:}} |
Sometimes problems in the network or elsewhere prevent the DUCC components from stopping properly. The
{\em check\_ducc} command, described in the following section, contains options to query the
existence of DUCC processes in the cluster, to forcibly terminate them ({\em kill -9}), and to
terminate them more gracefully ({\em kill -INT}).
| |
| |
| |
| \subsection{check\_ducc} |
| \label{subsec:admin.check-ducc} |
| \subsubsection{{\em Description:}} |
| |
| Check\_ducc is used to verify the integrity of the DUCC installation and to find and report on |
| DUCC processes. It identifies processes owned by ducc (management processes, agents, |
| and job processes), and processes started by DUCC on behalf of users. |
| |
Check\_ducc can also be used to clean up errant DUCC processes when stop\_ducc is unable
to do so. The difference is that stop\_ducc generally tries to stop processes more gracefully;
check\_ducc is used as a last resort, or when a fast but graceless shutdown is desired.
| |
| \subsubsection{\em{Usage: }} |
| |
| \begin{description} |
| \item[check\_ducc {[options]}] |
If no options are given, this is the equivalent of:
| \begin{verbatim} |
| check_ducc -c -n ../resources/ducc.nodes |
| \end{verbatim} |
| |
| This verifies the integrity of the DUCC installation and searches for all the |
| processes owned by user {\em ducc} and started by DUCC on all the nodes in ducc.nodes. |
| \end{description} |
| |
| \subsubsection{\em{Options:}} |
| \begin{description} |
\item[-n, --nodelist {[nodefile]}]
Only the nodes specified in the nodefile are searched. The option may be specified
multiple times for multiple nodefiles. Note that the ``local'' node is always checked as well.
| \begin{verbatim} |
| check_ducc -n nlist1 -n nlist2 |
| \end{verbatim} |
| |
\item[-c, --configuration]
| Verify the \hyperref[sec:ducc.classes]{Resource Manager configuration}. |
| |
\item[-p, --pids]
| Rewrite the PID file. The PID file contains the process ids of all known DUCC |
| management and agent processes. The PID file is normally managed by start\_ducc and |
| stop\_ducc and is stored in the file {\em ducc.pids} in directory {\em ducc\_runtime/state}. |
| |
| Occasionally the PID file can become partially or fully corrupted; for example, if a DUCC |
| process dies spontaneously. Use check\_ducc -p to search the cluster for processes and |
| refresh the PID file. |
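
For example, to rebuild the PID file after a daemon has died unexpectedly:
\begin{verbatim}
check_ducc -p
\end{verbatim}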
| |
| \item[-i, --int] \hfill \\ |
Use this to send a shutdown signal ({\em kill -INT}) to all the DUCC processes. The DUCC processes
catch this signal, close their resources, and exit. Some resources take time to close or, in
case of problems, may be unable to close, in which case the DUCC processes exit unconditionally.
| |
| Sometimes problems in the network or elsewhere prevent {\em check\_ducc -i} from terminating |
| the DUCC processes. In this case, use {\em check\_ducc -k}, described below. |
| |
| \item[-k, --kill] \hfill \\ |
| Use this to forcibly kill a component using kill -9. This should only be used if {\em stop\_ducc} |
| or {\em check\_ducc -i} does not work. |
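
A typical escalation sequence when the cluster will not stop cleanly might be (a sketch:
first attempt a graceful stop, then force termination of anything that remains):
\begin{verbatim}
check_ducc -i
check_ducc -k
\end{verbatim}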
| |
| \item[--nothreading] If specified, the command does not run in multi-threaded mode |
| even if it is supported on the local platform. |
| |
| \item[-v, --verbose] \hfill \\ |
| When specified with {\em -c} to check the configuration, this emits a formatted version |
| of the node list showing the full structure of the scheduling classes. |
| |
| |
| \end{description} |
| |
| |
| \subsection{ducc\_post\_install} |
| \label{subsec:admin.ducc-post-install} |
| |
| \paragraph{Description:} |
| The post-installation script must be run only after the first installation of DUCC. |
| When updating an existing installation use \hyperref[subsec:admin.ducc-update]{\em ducc\_update}. |
| ducc\_post\_install performs these tasks: |
| \begin{enumerate} |
\item Verifies that the correct levels of Java and Python are installed and available.
| \item Creates a default nodelist, \duccruntime/resources/ducc.nodes, containing the name of the node you are installing on. |
| \item Defines the ``ducc head'' node to be the node you are installing from. |
| \item Initializes the database. |
| \item Sets up the default https keystore for the webserver. |
| \item Installs the DUCC documentation ``ducc book'' into the DUCC webserver root. |
| \item Builds and installs the C program, ``ducc\_ling'', into the default location. |
| \item Ensures that the (supplied) ActiveMQ broker is runnable. |
| \end{enumerate} |
| |
Once the script completes successfully, \hyperref[subsec:admin.start-ducc]{\em start\_ducc} will run a single-user/unprivileged DUCC.
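
For example, a first-time setup and startup might look like this (a sketch; the
installation path is illustrative):
\begin{verbatim}
cd /home/ducc/ducc_runtime/admin    # your DUCC_HOME/admin
./ducc_post_install
./start_ducc
\end{verbatim}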
| |
| \paragraph{Notes:} |
If the script is rerun, it renames the previously created files so that any customizations
applied to them can be recovered.
| |
| \subsection{ducc\_update} |
| \label{subsec:admin.ducc-update} |
| |
| \paragraph{Description:} |
This command is used to unpack a new release of DUCC and create a new installation or update
an existing one.
| For a new installation it simply unpacks the tar file with the appropriate permissions. |
| The setup must be completed by running \hyperref[subsec:admin.ducc-post-install]{\em ducc\_post\_install}. |
| |
| When updating an existing installation it performs the following actions: |
| \begin{enumerate} |
| \item Checks that DUCC is not running. |
| \item Creates a site.ducc.properties file if updating from DUCC 1.1.0. |
| \item Creates a time-stamped archive directory to hold the old runtime. |
| \item Archives current files before updating them, except for the customizable ones. |
\item Leaves in place any files added to the directories that may hold site-specific files.
\item Reports which files are replaced, added, or kept.
| \item Rebuilds the non-privileged ducc\_ling. |
| \end{enumerate} |
| |
| The site-specific files, those holding customizations such as node and class definitions |
| as well as logs and job history, are left in place, |
| while all replaced files are archived under a folder called {\em ducc\_archives} |
| so the previous installation can be restored if necessary. |
| |
Note that the update does not create the database. After updating to 2.1.0 from an earlier
version that used the file-based persistence scheme, the database should be created with
\hyperref[subsec:admin.db-create]{\em db\_create},
and the files holding state such as job history and service registrations should be loaded into the database with
\hyperref[subsec:admin.db-loader]{\em db\_loader}, as shown below.
If this conversion is omitted, DUCC will continue to use the file-based scheme, but with some
loss of the functionality that the database design provides.
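
A minimal conversion sequence might look like this (a sketch; the DUCC\_HOME path is
illustrative):
\begin{verbatim}
db_create
db_loader -i /home/ducc/ducc_runtime
\end{verbatim}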
| |
| \paragraph{Usage:} |
This command takes two parameters: the DUCC\_HOME to be updated or created,
and the name of the tar file containing the new build.
| \begin{description} |
| \item[ducc\_update {\em some-ducc-home} {\em binary-tar-file}] |
| Update an existing DUCC installation or install a new one. |
| \end{description} |
| |
| \paragraph{Arguments:} |
| \begin{description} |
| \item[{\em some-ducc-home}] |
| This specifies the DUCC\_HOME you wish to create or update. If it doesn't exist a new |
| installation is created, otherwise it is updated. |
| \item[{\em binary-tar-file}] |
| The name of the binary tar file containing the new build. |
| \end{description} |
| |
| \paragraph{Example:} |
| \begin{verbatim} |
ducc_update /home/ducc/ducc_runtime Downloads/uima-ducc-2.1.0-bin.tar.gz
| \end{verbatim} |
| |
| \subsection{rm\_reconfigure} |
| \label{subsec:admin.rm-reconfigure} |
| |
| \subsubsection{{\em Description:}} |
| Rm\_reconfigure is used to force the Resource Manager (RM) to reread all its configuration |
| files and reconfigure itself accordingly, without the need to fully stop and restart RM. |
| This is generally much faster than RM restart and avoids losing most state messages from |
| the other DUCC processes. |
| |
The {\em rm\_reconfigure} command first performs a
\hyperref[sec:admin.properties-merge]{properties merge}.
| |
| RM then validates the new |
| configuration, and if no errors are found, saves certain information such as current node |
| online-offline status. It then rereads the following configuration files and rebuilds its |
| internal structures accordingly: |
| \begin{itemize} |
\item ducc.properties (after merging default.ducc.properties and site.ducc.properties),
| \item ducc.classes, |
| \item log4j.xml. |
| \end{itemize} |
| The saved configuration is then restored into the newly configured structures. |
| On receipt of the next Orchestrator state, the RM fully rebuilds its state from the current |
| DUCC load and scheduling restarts. |
| |
| Depending on the nature of the new configuration, the current load may be adjusted; for |
| example, if the weight of a fair-share class is changed, preemptions or extra allocations |
| may be performed. |
| |
| If the new configuration is not consistent with the current load, a number of more drastic |
| adjustments will be performed: |
| \begin{itemize} |
| \item If a fair-share class is deleted, all existing jobs for that class are preempted |
| and a {\em refusal} message is sent to the Orchestrator for each affected job. |
\item If a fair-share class is redefined over a different nodepool such that existing
work is no longer legally scheduled, any shares allocated on inappropriate
hosts are {\em preempted}. As soon as the preemptions are acknowledged, the RM
reschedules the shares over the differently-configured resources.
\item If a non-preemptable class is deleted or reconfigured so that existing non-preemptable
work is no longer allocated correctly, the following will occur:
| \begin{itemize} |
| \item If the shares are for services, they are deallocated and a {\em refusal} is |
| sent to the Orchestrator. The Service Manager will observe this and restart the |
| processes, causing them to be reallocated over the changed configuration. |
\item Otherwise, the RM leaves the allocation in place, but places the hosts on an
internal {\em blacklist}, preventing subsequent scheduling to those hosts. Once
the (now) incorrectly placed shares are freed (e.g.\ by canceling a reservation or
by the exit of a managed reservation), the hosts are whitelisted again and made available
for scheduling.
| \end{itemize} |
| \end{itemize} |
| |
| In short, the RM makes every effort to avoid disturbing existing allocations, and blacklists |
| hosts that are no longer consistently configured for the current load, until such time as |
| the allocations on those hosts are released. |
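
For example, to apply a configuration change without restarting the RM (a sketch; the
edit step stands for whatever change is being made):
\begin{verbatim}
# edit DUCC_HOME/resources/ducc.classes or site.ducc.properties, then:
rm_reconfigure
\end{verbatim}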
| |
| \subsubsection{\em Usage:} |
| |
| \begin{description} |
| \item[rm\_reconfigure] \hfill \\ |
| This command has no options. |
| \end{description} |
| |
| |
| \iffalse % Dropped this script for 2.1 .... needs work |
| |
| \subsection{rm\_qload} |
| \label{subsec:admin.rm-qload} |
| |
| \subsubsection{{\em Description:}} |
| Rm\_qload is used to query the Resource Manager's scheduling tables to determine the |
| current demand and capacity of the system, as the RM sees it. The primary purpose |
| is to provide information to adjunct resource managers (such as a ``cloud'') to |
| determine the current needs, or lack thereof, of the system. The administrative |
| command is implemented as a Python script that interacts with the underlying |
| Java ``RmQueryLoadReply'' API and is provided mostly as an example of how |
| scripting can be used to interact with the RM. |
| |
| After displaying the current scheduling quantum, the response is provided in two sections: |
| \begin{enumerate} |
| \item Information showing the current demand and usage of resource classes, and |
| \item Information showing the current nodepool usage. |
| \end{enumerate} |
| |
| \subsubsection{\em Class section} |
| Three lines are emitted per class: |
| \begin{enumerate} |
| \item The name of the class and its scheduling policy, |
| \item A line showing the {\em demand}, or {\em request} by quantum, on the class, and |
| \item A line showing the {\em usage}, or {\em award}, by quantum on the class. |
| \end{enumerate} |
| |
| The numbers shown for {\em request} and {\em award} show the number of processes, by |
| memory, in terms of scheduling quantum, for each class. For example, assuming the |
| scheduling quantum is 15GB, the following shows: |
| \begin{itemize} |
| \item Five processes of quantum 2 (15-30GB) are requested, but only two have been awarded, |
| \item Three processes of quantum 3 (31-45GB) are requested and all have been awarded, |
| \item Four processes of quantum 4 (46-60GB) are requested, and two have been awarded. |
| \end{itemize} |
| \begin{verbatim} |
| Class normal policy FAIR_SHARE |
| requested 0 0 5 3 4 0 0 0 0 |
| awarded 0 0 2 3 2 0 0 0 0 |
| \end{verbatim} |
| |
| \subsubsection{\em Nodepool section} |
| Six lines are displayed for each nodepool: |
| \begin{enumerate} |
| \item The name of the nodepool, |
\item A summary showing the number of hosts in the pool which are online, dead (unresponsive), and
| varied-off, the total quantum shares available to the nodepool, and the total unscheduled or |
| {\em free} shares. |
| \item The number of hosts known to the nodepool, by quantum, similar to the class listings above, |
\item The number of online hosts, by quantum,
| \item The number of completely free hosts by quantum (no work currently scheduled), and |
| \item The number of {\em virtual} hosts, by quantum. A {\em virtual host} is created when a |
host is partially scheduled. For example, if a 32G process is scheduled on a 64G host, this
| creates one free 32G {\em virtual host}. |
| \end{enumerate} |
| To determine the number of processes, by quantum, that can be scheduled, one must {\em sum} the |
| ``free'' and ``virtual'' columns. |
| |
| For example, (assuming a 15GB quantum), the following listing shows |
| that nodepool ``power'' contains fourteen hosts with at least 45GB each (3 quanta). Two |
| of these hosts have something scheduled on them (the ``free |
| machines'' line), leaving unused space of one 15G quantum on one |
| host, and one 30GB quantum on another host. |
| |
| \begin{verbatim} |
| Nodepool power |
| online 14 dead 0 offline 0 total-shares 42 free-shares 42 |
| all machines: 0 0 0 14 0 0 0 0 0 |
| online machines: 0 0 0 0 0 0 0 0 0 |
| free machines: 0 0 0 12 0 0 0 0 0 |
| virtual machines: 0 1 1 0 0 0 0 0 0 |
| \end{verbatim} |
| |
| \subsubsection{\em Usage:} |
| \begin{description} |
| \item[rm\_qload] \hfill \\ |
| This command has no options. |
| \end{description} |
| |
| \fi % End of dropped rm_qload |
| |
| \subsection{rm\_qoccupancy} |
| \label{subsec:admin.rm-qoccupancy} |
| |
| \subsubsection{{\em Description:}} |
Rm\_qoccupancy provides a list of all hosts known to the RM and, for each host, the following information:
\begin{itemize}
\item The name of the host,
\item Whether the host has any blacklisted processes on it,
\item Whether the host is currently online (responsive),
\item The status of the host: whether it is schedulable ({\em up}) or not ({\em down}). A responsive host becomes
unschedulable ({\em down}) if it is varied off,
| \item The nodepool the host is a member of, |
| \item The reported memory size of the host, |
| \item The {\em order} of the host. The {\em order} is defined to be the maximum number of quantum shares |
| supported by the host, |
| \item The number of unscheduled quantum shares on the host, and |
\item If work is scheduled on the host, information relevant to the scheduled processes (or reservations).
| \end{itemize} |
| |
| If work is scheduled on a host, the work summary is keyed thus: |
| \begin{description} |
| \item[J] The Orchestrator-assigned job id of the work, |
| \item[S] The RM-assigned share id of the work, |
| \item[O] The {\em order} of the allocation; that is, the number of quantum shares the allocation occupies, |
\item[II] The {\em initialization investment}; the number of milliseconds the allocated work spent in its
initialization phase, if any (usually only UIMA-AS processes display this),
\item[IR] The {\em runtime investment}; the number of milliseconds spent processing the current CASs, if this
is a UIMA-AS process. Note that this number can change dramatically, as it is the sum of time spent only
on the current CASs. When a CAS completes, it no longer contributes to the investment of the process. The RM
uses this information to determine the best candidate for eviction, if needed to maintain fair share.
\item[E] Whether the RM has preempted (evicted) the process, but it has not yet exited,
\item[P] Whether the RM has purged the process (evicted because the host is non-responsive), but the
eviction has not yet been confirmed,
\item[F] Whether the process is {\em fixed}; that is, non-preemptable,
\item[I] Whether the initialization phase is complete (usually only for UIMA-AS processes).
| \end{description} |
| |
| The following example shows seven hosts, one with a preemptable share in the {\em --default--} |
| nodepool (on bluej290-5), and one with a non-preemptable share in the {\em jobdriver} nodepool. |
| \begin{verbatim} |
| Node Blacklisted Online Status Nodepool Memory Order Free |
| bluej290-5 False True up --default-- 32505856 2 0 |
| J[ 6006] S[ 189] O[2] II[ 0] IR[ 0] E[False] P[False] F[False] I[False] |
| |
| bluej290-6 False True up --default-- 32505856 2 2 |
| bluej290-7 False True up --default-- 32505856 2 2 |
| bluej291-26 False True up nightly-test 32505856 2 2 |
| bluej291-27 False True up nightly-test 32505856 2 2 |
| bluej293-60 False True up intel 32505856 2 2 |
| bluej537-73 False True up jobdriver 32505856 2 1 |
| J[ 5973] S[ 1] O[1] II[ 0] IR[ 0] E[False] P[False] F[ True] I[False] |
| |
| |
| \end{verbatim} |
| |
| |
| \subsubsection{\em Usage:} |
| |
| \begin{description} |
| \item[rm\_qoccupancy] \hfill \\ |
| This command has no options. |
| \end{description} |
| |
| |
| |
| \subsection{vary\_off} |
| \label{subsec:admin.vary-off} |
| \subsubsection{{\em Description:}} |
| |
Vary\_off is used to remove a host from scheduling and to evict the preemptable work running on it.
This allows for the graceful clearance of a host so that it can be taken offline for maintenance,
or for any other purpose (such as sharing the host with other applications).
The DUCC agent is NOT stopped; use \hyperref[subsec:admin.stop-ducc]{stop\_ducc} to stop the
agent.
Managed and unmanaged reservations are not canceled by {\em vary\_off}.
| |
| Only the userid that started DUCC may issue {\em vary\_off}; attempts from other userids |
| are rejected. |
| |
| \subsubsection{\em{Usage: }} |
| |
| \begin{description} |
| \item[vary\_off list-of-hosts] |
The {\em list-of-hosts} is a space-delimited list of hosts to be removed from
scheduling in the DUCC cluster.
| \end{description} |
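
For example, to clear two hosts for maintenance (hostnames are illustrative):
\begin{verbatim}
vary_off node290-1 node290-2
\end{verbatim}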
| |
| \subsection{vary\_on} |
| \label{subsec:admin.vary-on} |
| \subsubsection{{\em Description:}} |
| |
| Vary\_on is used to restore a host to scheduling by DUCC. If the agent is still |
| alive the host becomes immediately available. The agent is not started by |
{\em vary\_on}; use
| \hyperref[subsec:admin.start-ducc]{start\_ducc} to start the agent if needed. |
| |
| Only the userid that started DUCC may issue {\em vary\_on}; attempts from other userids |
| are rejected. |
| |
| \subsubsection{\em{Usage: }} |
| |
| \begin{description} |
| \item[vary\_on list-of-hosts] |
The {\em list-of-hosts} is a space-delimited list of hosts to be restored for
scheduling in the DUCC cluster.
| \end{description} |
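
A typical maintenance cycle combines the two commands (a sketch; the hostname is
illustrative):
\begin{verbatim}
vary_off node290-1
# ... perform maintenance on node290-1 ...
vary_on node290-1
\end{verbatim}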
| |
| |
| \subsection{ducc\_properties\_manager} |
| \label{sec:cli.ducc-properties-manager} |
| |
| \paragraph{Description:} |
| This CLI is used to manually merge or difference two properties files. |
| |
Normally, the DUCC scripts {\em start\_ducc}, {\em check\_ducc}, and {\em rm\_reconfigure} automatically
merge the files {\em default.ducc.properties} and {\em site.ducc.properties} when invoked.
| |
| \paragraph{Usage:} |
| \begin{description} |
| \item[ducc\_props\_manager --merge file1 --with file2 --to file3] |
| Merge two properties files into one. Properties added to, or changed in, the second file |
| are used to override those in the first file, with the result written to the third file. |
| \item[ducc\_props\_manager --delta file1 --with file2 --to file3] |
| Compare two properties files and write the differences into a third file. The first file is |
| considered a ``master'' file. Properties with different values in the second file, or which |
| do not occur in the first file, are written into the third file. |
| \end{description} |
| |
| \paragraph{Options:} |
| \begin{description} |
\item[$--$merge file1]
In this form, {\em file1} is merged with the file specified by {\em $--$with}, and the
result is placed in the file specified by {\em $--$to}. Overrides are flagged with a change tag and the date of the merge.
| |
| {\em file1} is considered the ``master'' properties file and is usually the unmodified file provided |
| with the DUCC distribution, {\em default.ducc.properties}. |
| |
| {\em file2} is considered a set of override or additional properties and is usually the site local |
| properties file, {\em site.ducc.properties.} |
| |
\item[$--$delta file1]
In this form, {\em file1} is compared with the file specified by {\em $--$with}, and the
differences are placed in the file specified by {\em $--$to}.
| |
| {\em file1} is considered the ``master'' properties file and is usually the unmodified file provided |
| with the DUCC distribution, {\em default.ducc.properties}. |
| |
| {\em file2} is considered the ``external'' properties file and is usually the properties file from |
| an older version of DUCC. |
| |
Differences are placed in the file specified by {\em $--$to}, which may be a viable first cut at a new {\em site.ducc.properties.}
| |
| \item[$--$with file2] This specifies the properties file to merge with the master, or to difference |
| with the master properties file. |
| |
| \item[$--$to file3] This specifies the file to which the results of the merge or delta are written. |
| \end{description} |
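
\paragraph{Examples:}
The following sketches show typical invocations; the file names follow the conventions
described above, and the {\em old/ducc.properties} path is illustrative:
\begin{verbatim}
ducc_props_manager --merge default.ducc.properties \
    --with site.ducc.properties --to ducc.properties
ducc_props_manager --delta default.ducc.properties \
    --with old/ducc.properties --to site.ducc.properties
\end{verbatim}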
| |
| \paragraph{Notes:} |
| None. |
| |
| \subsection{db\_create} |
| \label{subsec:admin.db-create} |
| |
| \paragraph{Description:} |
| This command is used to initialize the database. Normally the database is initialized |
| during {\em ducc\_post\_install} but if this is an existing DUCC installation that is |
| being migrated from a version that does not use the database, it will be necessary to |
| initialize the database with this command. |
| |
| This command performs the following actions: |
| \begin{enumerate} |
| \item Starts the database. |
| \item Disables the default database superuser. |
| \item Installs a database superuser as ``ducc'' and sets the password |
| to a random string. The password is saved |
| in DUCC\_HOME/resources.private/ducc.private.properties. |
| \item Installs the DUCC database schema. |
| \item Stops the database. |
| \end{enumerate} |
| |
| |
| This command takes no parameters. |
| |
NOTE: The database user and password are NOT RELATED to any login ID on the system;
they are used and maintained by the database only.
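
For example, a minimal sketch of initializing the database and inspecting the generated
credentials (the DUCC\_HOME path is illustrative; the credentials file is the one named above):
\begin{verbatim}
db_create
cat /home/ducc/ducc_runtime/resources.private/ducc.private.properties
\end{verbatim}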
| |
| \subsection{db\_loader} |
| \label{subsec:admin.db-loader} |
| |
| \paragraph{Description:} |
| This command is used to copy the data from DUCC's older (pre 2.1.0) file-based persistence |
| into the database. The database schema must already exist, created either |
| with {\em ducc\_post\_install} or with {\em db\_create}. |
| |
| This command performs the following actions: |
| \begin{enumerate} |
| \item Starts the database. |
| \item Drops some of the indexes in the database. |
| \item Loads the Orchestrator checkpoint file from {\em DUCC\_HOME/state/orchestrator.chkpt}. |
| \item Loads all job history from {\em DUCC\_HOME/history/jobs}. |
| \item Loads all reservation history from {\em DUCC\_HOME/history/reservations}. |
| \item Loads all service instance and AP history from {\em DUCC\_HOME/history/services}. |
| \item Loads the service registry from {\em DUCC\_HOME/state/services}. |
\item Loads the service registry history from {\em DUCC\_HOME/history/service-registry}.
\item Reloads the Orchestrator checkpoint, as a spot-check of the loader's instrumentation (to ensure
load times stay reasonable).
| \item Re-installs the DUCC database schema. |
| \item Stops the database. |
\item Optionally renames the file-based state so that, if you rerun the command, the data is not reloaded.
| \end{enumerate} |
| |
| When the command exits, DUCC should be ready to run with all its state in the database. |
| |
This command takes two parameters: the DUCC\_HOME you want to load from, and
a flag to disable the rename of the file-based state.
| |
| \paragraph{Usage:} |
| \begin{description} |
| \item[db\_loader -i {\em some-ducc-home} {[--no-archive]}] |
| Load the database from the specified DUCC\_HOME, and optionally do not archive the original files |
| by renaming them. |
| \end{description} |
| |
| \paragraph{Options:} |
| \begin{description} |
\item[-i {\em some-ducc-home}]
| This specifies the DUCC\_HOME you wish to load. Most of the time it is the DUCC\_HOME you |
| are running within, but it can be some other DUCC\_HOME if you have multiple installations and |
| want other history and state loaded. |
\item[$--$no-archive]
| If specified, the original files are not renamed. Note that only the directories in {\em history} |
| and {\em state} are renamed. To restore these, simply rename them back without the {\em archive} |
| suffix. |
| \end{description} |
| |
| \paragraph{Example:} |
| \begin{verbatim} |
| db_loader -i /home/ducc/ducc_runtime |
| db_loader -i /home/ducc.old/ducc_runtime --no-archive |
| \end{verbatim} |
| |
| \paragraph{Notes:} |
The console shows the progress of the loader. Full details of the load are written to a log, {\em db-loader-log},
in the usual DUCC log directory, for reference and problem determination if something goes wrong.
| |