blob: caacdf11a3a3e8ae0236301c844a87b43c182721 [file] [log] [blame]
%
% Licensed to the Apache Software Foundation (ASF) under one
% or more contributor license agreements. See the NOTICE file
% distributed with this work for additional information
% regarding copyright ownership. The ASF licenses this file
% to you under the Apache License, Version 2.0 (the
% "License"); you may not use this file except in compliance
% with the License. You may obtain a copy of the License at
%
% http://www.apache.org/licenses/LICENSE-2.0
%
% Unless required by applicable law or agreed to in writing,
% software distributed under the License is distributed on an
% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
% KIND, either express or implied. See the License for the
% specific language governing permissions and limitations
% under the License.
%
\section{DUCC Database Integration}
\label{sec:ducc.database}
As of Version 2.1.0, DUCC uses the \href{https://cassandra.apache.org/}{Apache Cassandra}
database instead of the filesystem to manage
history and the service registry. Additionally, the Resource Manager maintains
current scheduling and node state in the database.
\subsection{Overview}
During first-time installation, the \hyperref[subsec:admin.ducc-post-install]{\em ducc\_post\_install} utility
randomly generates a (database) super-user password, which is kept in the protected file {\em DUCC\_HOME/resources.private/ducc.private.properties}.
The utility proceeds to configure the database and install the schema.
If DUCC is being upgraded, generally \hyperref[subsec:admin.ducc-post-intall]{\em ducc\_post\_install} is not used, in
which case, again, \hyperref[subsec:admin.db-create]{\em db\_create} and \hyperref[subsec:admin.db-loader]{\em db\_loader} may be used to
convert the older file-based state to the database.
\subsubsection{Orchestrator use of the Database}
The Orchestrator persists two types of work:
\begin{enumerate}
\item All work history. This includes jobs, reservations, service instances, and
arbitrary processes. This history is what the webserver uses to display details
on previously run jobs. Prior to the database, this data was saved in the
{\em DUCC\_HOME/history directory}.
\item Checkpoint. On every state change, the Orchestrator saves the state of
all running and allocated work in the system. This is used to recover reservations
when DUCC is started, and to allow hot-start of the Orchestrator without losing work.
Prior to the database, this data was saved in the file {\em DUCC\_HOME/state/orchestrator.ckpt}.
\end{enumerate}
\subsubsection{Service Manager use of the Database}
The service manager uses the database to store the service registry and all state
of active services. Prior to the database, this data was saved in Java properties files
in the directory {\em DUCC\_HOME/state/services}.
When a service is ``unregistered'' it is not physically removed from the database. Instead,
a bit is set indicating the service is no long active. These registrations may be
recovered if needed by querying the database. Prior to the database, this data was saved
in {\em DUCC\_HOME/history/service-registry}.
\subsubsection{Resource Manager use of the Database}
The resource manager saves its entire runtime state in the database. Prior to the
database, this dynamnic state was not saved or directly accessible.
\subsubsection{Webserver use of the Database}
The web server uses the database in read-only mode to fetch work history, service
registrations, and node status. Previously to the database most of this information
was fetched from the filesystem. Node status was inferred using the Agent publications;
with the database, the webserver has direct access to the Resource Manager's view of the
DUCC nodes, providing a much more accurate picture of the system.
\subsection{Database Scripting Utilities}
Database support is fully integrated with the DUCC start, stop, and check utilities as
well as the post installation scripting.
In addition two utilities are supplied to enable migration of older installations to
enable the database:
\begin{description}
\item[db\_create] The \hyperref[subsec:cli.db.create]{db\_create} utility creates the database schema, disables the
default database superuser, installs a read-only guest id, and installs the
main DUCC super user ID. Note that database IDs are in no way related to
operating system IDs.
\item[db\_loader] The \hyperref[subsec:cli.db.loader]{db\_loader} utility migrates an existing file-based DUCC
system to use the database. It copies in the job history, Orchestrator checkpoint,
and the service registry.
\end{description}
Use the cross-references above for additional details on the utilities.
\subsection{Database Configuration}
Most database configuration is accomplished by setting appropriate values into
your local \hyperref[subsec:ducc.database.properties]{\em site.ducc.properties}. See
the linked section for details.
For existing installations, the {\em db\_create} utility installs the
database scheme and updates your {\em site.ducc.properties} with reasonable
defaults.