MXNet aims to support a variety of frontends, e.g. Python, Java, Perl, R, etc., as well as environments (Windows, Linux, Mac, with or without GPU, with or without oneDNN support, etc.). This package contains a small continuous delivery (CD) framework used to automate the delivery of nightly and release builds across our delivery channels.
The CD process is driven by the CD pipeline job, which orchestrates the order in which the artifacts are delivered. For instance, first publish the libmxnet library before publishing the pip package. It does this by triggering the release job with a specific set of parameters for each delivery channel. The release job executes the specific release pipeline for a delivery channel across all MXNet variants.
A variant is a specific environment or feature set for which MXNet is compiled, for instance CPU, GPU with CUDA v10.1, CUDA v10.2 with oneDNN support, etc.
Currently, the following variants are supported. All of these variants except native have the oneDNN backend enabled.
For more on variants, see here.
The CD pipeline job takes two parameters:
This job defines and executes the CD pipeline. For example, first publish the MXNet library, then, in parallel, execute the python and maven releases. Every step of the pipeline executes a trigger for a release job.
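A minimal sketch of what such a trigger step could look like, assuming a CD_RELEASE_JOB_NAME environment variable that holds the release job's name, and showing only the parameters mentioned in this document (RELEASE_JOB_TYPE, MXNET_VARIANTS, COMMIT_ID):

```groovy
// Illustrative only: the job name, the variant identifiers and the reduced
// parameter set are assumptions; the release job takes more parameters than shown here.
stage("Release: python/pypi") {
  build(
    job: env.CD_RELEASE_JOB_NAME,
    parameters: [
      string(name: 'RELEASE_JOB_TYPE', value: 'python/pypi'),
      string(name: 'MXNET_VARIANTS', value: 'cpu,cu102'),
      string(name: 'COMMIT_ID', value: params.COMMIT_ID)
    ],
    wait: true
  )
}
```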
The release job takes five parameters:
The release job executes, in parallel, the release pipeline for each of the variants (MXNET_VARIANTS) for the given job type (RELEASE_JOB_TYPE). The job type is the path to a directory, relative to the cd directory, that includes a Jenkins_pipeline.groovy file.
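For example, assuming the python/pypi channel described later in this document, the layout could look roughly like this (all other file and directory names are illustrative):

```
cd
├── Jenkinsfile_cd_pipeline       # CD pipeline job (name assumed for illustration)
├── Jenkinsfile_release_job       # release job (name assumed for illustration)
└── python
    └── pypi
        └── Jenkins_pipeline.groovy   # selected with RELEASE_JOB_TYPE = python/pypi
```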
NOTE: The COMMIT_ID is a little tricky and we must be very careful with it. It is necessary to ensure that the same commit is built throughout the pipeline, but at the same time, it has the potential to change the current state of the release job configuration, specifically the parameter configuration. Any changes to this configuration will require a “dry-run” of the release job to ensure Jenkins has the current (master) version. This is acceptable as there will be few changes to the parameter configuration for the job, if any at all. But it's something to keep in mind.
To avoid potential issues as much as possible, the CD pipeline executes this “dry run” and ensures that Jenkins' state of the release job matches what is defined for the release job in the specified COMMIT_ID. This is done by setting the RELEASE_JOB_TYPE to Status Update.
It should be noted that the ‘Pipeline’ section of the configuration should use the $COMMIT_ID parameter as the specifier and have ‘lightweight checkout’ unchecked. For example:
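Roughly, the relevant parts of the job configuration would look like this (the repository URL and script path are assumptions for illustration):

```
Definition:           Pipeline script from SCM
  SCM:                Git
  Repository URL:     https://github.com/apache/mxnet.git
  Branch Specifier:   ${COMMIT_ID}
  Script Path:        cd/Jenkinsfile_release_job   (assumed)
  Lightweight checkout: unchecked
```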
The Jenkins_pipeline.groovy file defines the release pipeline for a particular release channel. It defines a function get_pipeline(mxnet_variant), which returns a closure with the pipeline to be executed. For instance:
```groovy
def get_pipeline(mxnet_variant) {
  return {
    stage("${mxnet_variant}") {
      stage("Build") {
        timeout(time: max_time, unit: 'MINUTES') {
          build(mxnet_variant)
        }
      }
      stage("Test") {
        timeout(time: max_time, unit: 'MINUTES') {
          test(mxnet_variant)
        }
      }
      stage("Publish") {
        timeout(time: max_time, unit: 'MINUTES') {
          publish(mxnet_variant)
        }
      }
    }
  }
}

def build(mxnet_variant) {
  node(UBUNTU_CPU) {
    ...
  }
}

...
```
The “first mile” of the CD process is posting the mxnet binaries to the artifact repository. Once this step is complete, the pipelines for the different release channels (PyPI, Maven, etc.) can begin from the compiled binary, and focus solely on packaging it, testing the package, and posting it to the particular distribution channel.
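As a sketch of what this separation could look like inside a channel's pipeline, assuming a hypothetical fetch_artifact helper and packaging script:

```groovy
// Hypothetical sketch: a channel's build stage starts from the published
// libmxnet binary rather than compiling MXNet from source.
def build(mxnet_variant) {
  node(UBUNTU_CPU) {
    checkout scm
    // fetch_artifact stands in for whatever utility downloads the libmxnet
    // binary for this variant from the artifact repository
    fetch_artifact(mxnet_variant)
    // package and test against the fetched binary (script name is illustrative)
    sh "cd/python/pypi/build_package.sh ${mxnet_variant}"
  }
}
```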
To add a new release channel:

1. Create a new directory under cd which represents your release channel, e.g. python/pypi.
2. Create a Jenkins_pipeline.groovy file there with a get_pipeline(mxnet_variant) function that describes your pipeline.

We shouldn't set global timeouts for the pipelines. Rather, the step being executed should be wrapped with a timeout function (as in the pipeline example above). The max_time is a global variable set at the release job level.
Ensure that either your steps or the whole pipeline are wrapped in a node call. The jobs execute on a utility node; if you don't wrap your pipeline, or its individual steps, in a node call, this will lead to problems.
Examples of the two approaches:
Whole pipeline
The release pipeline is executed on a single node, chosen depending on the variant being released. This approach is fine as long as the stages that don't need specialized hardware (e.g. compilation, packaging, publishing) are short lived.
```groovy
def get_pipeline(mxnet_variant) {
  def node_type = mxnet_variant.startsWith('cu') ? NODE_LINUX_GPU : NODE_LINUX_CPU
  return {
    node(node_type) {
      stage("${mxnet_variant}") {
        stage("Build") {
          ...
        }
        stage("Test") {
          ...
        }
        ...
      }
    }
  }
}
```
Examples:
Per step
Use this approach in cases where you have long running stages that don't depend on specialized/expensive hardware.
```groovy
def get_pipeline(mxnet_variant) {
  return {
    stage("${mxnet_variant}") {
      stage("Build") {
        ...
      }
      ...
    }
  }
}

def build(mxnet_variant) {
  node(UBUNTU_CPU) {
    ...
  }
}

def test(mxnet_variant) {
  def node_type = mxnet_variant.startsWith('cu') ? NODE_LINUX_GPU : NODE_LINUX_CPU
  node(node_type) {
    ...
  }
}
```
Examples:
The libmxnet pipeline has long running compilation and testing stages that do not require specialized/expensive hardware (e.g. GPUs). Therefore, as much as possible, it is important to run each stage on its own node, and to design the pipeline to spend the least amount of time possible on expensive hardware. E.g. for GPU builds, only run GPU tests on GPU instances; all other stages can be executed on CPU nodes.