blob: 3dae8248293d5378aee0a06a5967cdf673a83bcd [file] [view]
# For Developers
To install the libraries necessary for running scripts or testing, a system python of 2.7 must be available, the version of gcc and g++ used to compile python must be available.
On most distributions, python will compiled with the same gcc and g++ verion available from the base packages "gcc" and "gcc-c++".
The command `python -VV` will show the compiler used to compile the version of python being used.
A `make` in from gpMgmt will install the proper libraries provided a gcc and gcc-c++ are present.
To run any of these python scripts, necessary libraries must be installed, and PYTHONPATH must be modified to use the libraries in this path.
```
PYTHONPATH="\$GPHOME/lib/python:${PYTHONPATH}"
```
This will be set automatically with a `source $GPHOME/cloudberry-env.sh`
## Python Version
System python 3 is currently required.
Where Things Go
---------------
* If you are adding a GP module, please add it to the gppylib dir.
* If you are adding a 3rd party module, please add it to the ext dir.
List of Management Scripts Written in Bash
------------------------------------------
bin/gpinitsystem - Creates a new Apache Cloudberry
bin/gpload - Sets env variables and calls gpload.py
List of Management Scripts Written in Python (no libraries)
-----------------------------------------------------------
bin/gpload.py - Loads data into a Apache Cloudberry
List of Management Scripts Written in Python (gpmlib - old libraries)
---------------------------------------------------------------------
bin/gpaddmirrors - Adds mirrors to an array (needs rewrite)
bin/gprecoverseg - Recovers a failed segment (needs rewrite)
bin/gpcheckperf - Checks the hardware for Apache Cloudberry
bin/gpsync - Copies files to many hosts
bin/gpssh - Remote shell to many hosts
bin/gpssh-exkeys - Exchange ssh keys between many hosts
List of Management Scripts Written in Python (gppylib - current libraries)
--------------------------------------------------------------------------
bin/gpactivatestandby - Activates the Standby Coordinator
bin/gpconfig_helper - Edits postgresql.conf file for all segments
bin/gpdeletesystem - Deletes a Apache Cloudberry
bin/gpexpand - Adds additional segments to a Apache Cloudberry
bin/gpinitstandby - Initializes standby coordinator
bin/gplogfilter - Filters log files
bin/gpstart - Start a Apache Cloudberry
bin/gpstop - Stop a Apache Cloudberry
sbin/gpconfig_helper.py - Helper script for gpconfig
sbin/gpsegcopy - Helper script for gpexpand
sbin/gpsegstart.py - Helper script for gpstart
sbin/gpsegstop.py - Helper script for gpstop
Overview of gppylib
-------------------
dattimeutils.py - Several utility functions for dealing with date/time data
gparray.py
|
+- Segment - Configuration information for a single dbid
|
+- SegmentPair - Configuration information for a single content id
| \- Contains multiple Segment objects
|
+- GpArray - Configuration information for a Apache Cloudberry
\- Contains multiple SegmentPair objects
gplog.py - Utility functions to assist in Cloudberry standard logging
gpparseopts.py - Wrapper around optparse library to aid in locating help files
gpsubprocess.py - Wrapper around python subprocess (?)
\- Used by commands/base.py
- Should move to the commands submodule?
logfilter.py - Contains numerous odd utility functions mostly not specific to logfilter
pgconf.py - Contains helper functions for reading postgresql.conf files
|
+- gucdict - dictionary of guc->value pairs
| \- Contains setting objects
|
+- setting - the setting of a single guc and some type coercion funcs
|
+- ConfigurationError - subclass of EnvironmentError, raised by type coercion functions
segcopy.py - code for copying a segment from one location to another
\- should be subclass of command ???
userinput.py - wrapper functions around raw_input
commands/base.py - Core of commands submodule (could use some work)
|
+- WorkerPool - Multithreading to execute multiple Command objects
| \- Spawns multiple Worker objects
|
+- Worker - A single thread used to execute Command objects
|
+- CommandResult - Packages results of a Command object
|
+- ExecutionError - subclass of Exception
|
+- ExecutionContext - Abstract class
| |
| +- LocalExecutionContext - execute a command locally
| |
| +- RemoteExecutionContext - execute a command remotely
|
+- Command - abstract class for executing (shell level) commands
|
+- SQLCommand - abstract class for executing SQL commands
commands/gp.py - Implements lots of subclasses of Command for various tasks
commands/pg.py - Like gp.py, not clear what the separation is, if any.
commands/unix.py - Platform information + more subclasses of Command
commands/test_pg - some tests for commands/pg.py
db/catalog.py - Wrappers for executing certain queries
\- also contains some goofy wrappers for catalog tables
db/dbconn.py - Connections to the database
|
+- ConnectionError - subclass of a StandardError (unused?)
|
+- Pgpass - wrapper for handling a .pgpass file
|
+- DbURL - descriptor of how to connect to a database
|
+- functions for returning a pygresql.connection object
|
+- Should have a wrapper class around a pygresql connection object!
util/gp_utils.py - Cloudberry related utility functions that are not Commands
util/ssh_session.py - SSH and RSYNC related utility functions brought in from gpmlib.py/gplib.py
that are used by gpssh, gpsync and gpssh-exkeys
## Testing Management Scripts (unit tests)
This directory contains the unit tests for the management scripts. These tests
require the following Python modules to be installed: mock and pygresql.
These modules can be installed by running "git submodule update --init --recursive"
if they are not already installed on your machine.
If you installed the dependencies using the above git command, you can run the tests with
make, using the following commands in the current directory:
"make check" will run all of the unit tests, some of which require a GPDB cluster to
be installed and currently running.
"make unitdevel" will run only the unit tests that do not require a running cluster.
If you did not install the dependencies using the git submodule, use the following commands in
place of the above make commands, still in the current directory:
"python -m unittest discover --verbose -s gppylib -p 'test_unit*.py' -p 'test_cluster*.py'" will
run all of the unit tests.
"python -m unittest discover --verbose -s gppylib -p 'test_unit*.py'" will run only the unit
tests that do not require a running cluster.
## Testing Management Scripts (behave tests)
Behave tests require a running Cloudberry cluster, and additional python libraries for testing, available to gpadmin.
Thus, you can install these additional python libraries using any of the following methods:
1. As root user, to be available globally:
```
sudo pip install -r gpMgmt/requirements-dev.txt
```
2. As gpadmin user, to be available only to gpadmin user, but overriding any overlapping libraries with the specific verions in this requirements file:
```
pip install --user -r gpMgmt/requirements-dev.txt
```
3. As gpadmin, using a virtual env - see additional documentation on using a virtual env on python.org