---
title: Apache Mesos - Nvidia GPU Support
layout: documentation
---

# Nvidia GPU Support

Mesos 1.0.0 added first-class support for Nvidia GPUs.
The minimum required Nvidia driver version is `340.29`.

## Overview
Getting up and running with GPU support in Mesos is fairly
straightforward once you know the necessary steps. On one side, this
includes setting the necessary agent flags to enumerate GPUs and
advertise them to the Mesos master. On the other side, this includes
setting the proper framework capabilities so that the Mesos master
will actually include GPUs in the resource offers it sends to a
framework. So long as all of these constraints are met, accepting
offers that contain GPUs and launching tasks that consume them should
be just as straightforward as launching a traditional task that only
consumes CPUs, memory, and disk.

Mesos exposes GPUs as a simple `SCALAR` resource in the same
way it always has for CPUs, memory, and disk. That is, a resource
offer such as the following is now possible:

    cpus:8; mem:1024; disk:65536; gpus:4;

However, unlike CPUs, memory, and disk, *only* whole numbers of GPUs
can be selected. If a fractional amount is selected, launching the
task will result in a `TASK_ERROR`.
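
As a rough illustration (a sketch, not code from the Mesos source), a
C++ framework using the v0 scheduler API might attach a whole-number
`gpus` resource to a task as follows, assuming a `TaskInfo` named
`task` has already been constructed:

    #include <mesos/mesos.pb.h>

    // Request exactly one GPU for this task. A fractional value
    // (e.g., 0.5) would cause the launch to fail with TASK_ERROR.
    mesos::Resource* gpus = task.add_resources();
    gpus->set_name("gpus");
    gpus->set_type(mesos::Value::SCALAR);
    gpus->mutable_scalar()->set_value(1);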

At the time of this writing, Nvidia GPU support is only available for
tasks launched through the Mesos containerizer (i.e., no support exists
for launching GPU-capable tasks through the Docker containerizer).
That said, the Mesos containerizer now supports running docker
images natively, so this limitation should not affect most users.

Moreover, we mimic the support provided by [nvidia-docker](
https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-driver) to
automatically mount the proper Nvidia drivers and tools directly into
your docker container. This means you can easily test your GPU-enabled
docker containers locally and deploy them to Mesos with the assurance
that they will work without modification.

In the following sections we walk through all of the flags and
framework capabilities necessary to enable Nvidia GPU support in
Mesos. We then show an example of setting up and running a test
cluster that launches tasks both with and without docker containers.
Finally, we conclude with a step-by-step guide of how to install any
necessary Nvidia GPU drivers on your machine.

## Agent Flags
The following isolation flags are required to enable Nvidia GPU
support on an agent.

    --isolation="filesystem/linux,cgroups/devices,gpu/nvidia"

The `filesystem/linux` flag tells the agent to use Linux-specific
commands to prepare the root filesystem and volumes (e.g., persistent
volumes) for containers that require them. Specifically, it relies on
Linux mount namespaces to prevent the mounts of a container from being
propagated to the host mount table. In the case of GPUs, we require
this flag to properly mount certain Nvidia binaries (e.g.,
`nvidia-smi`) and libraries (e.g., `libnvidia-ml.so`) into a container
when necessary.

The `cgroups/devices` flag tells the agent to restrict access to a
specific set of devices for each task that it launches (i.e., a subset
of all devices listed in `/dev`). When used in conjunction with the
`gpu/nvidia` flag, the `cgroups/devices` flag allows us to grant /
revoke access to specific GPUs on a per-task basis.
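
For a concrete sense of what this isolation does (an illustration of
the kernel's devices cgroup, not output produced by Mesos), granting a
container access to GPU 0 amounts to whitelisting character-device
entries like the following, where major number `195` is the one
registered to Nvidia devices:

    # <devices_cgroup>/devices.list (illustrative)
    c 195:0 rwm       # /dev/nvidia0
    c 195:255 rwm     # /dev/nvidiactl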

By default, all GPUs on an agent are automatically discovered and sent
to the Mesos master as part of its resource offer. However, it may
sometimes be necessary to restrict access to only a subset of the GPUs
available on an agent. This is useful, for example, if you want to
exclude a specific GPU device because an unwanted Nvidia graphics card
is listed alongside a more powerful set of GPUs. When this is
required, the following additional agent flags can be used:

    --nvidia_gpu_devices="<list_of_gpu_ids>"

    --resources="gpus:<num_gpus>"

For the `--nvidia_gpu_devices` flag, you need to provide a
comma-separated list of GPUs, as determined by running `nvidia-smi` on
the host where the agent is to be launched ([see
below](#external-dependencies) for instructions on what external
dependencies must be installed on these hosts to run this command).
Example output from running `nvidia-smi` on a machine with four GPUs
can be seen below:

    +------------------------------------------------------+
    | NVIDIA-SMI 352.79     Driver Version: 352.79         |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla M60           Off  | 0000:04:00.0     Off |                    0 |
    | N/A   34C    P0    39W / 150W |     34MiB /  7679MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla M60           Off  | 0000:05:00.0     Off |                    0 |
    | N/A   35C    P0    39W / 150W |     34MiB /  7679MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   2  Tesla M60           Off  | 0000:83:00.0     Off |                    0 |
    | N/A   38C    P0    40W / 150W |     34MiB /  7679MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   3  Tesla M60           Off  | 0000:84:00.0     Off |                    0 |
    | N/A   34C    P0    39W / 150W |     34MiB /  7679MiB |     97%      Default |
    +-------------------------------+----------------------+----------------------+

The GPU `id` to choose can be seen at the far left of each row. Any
subset of these `ids` can be listed in the `--nvidia_gpu_devices`
flag (i.e., all of the following values of this flag are valid):

    --nvidia_gpu_devices="0"
    --nvidia_gpu_devices="0,1"
    --nvidia_gpu_devices="0,1,2"
    --nvidia_gpu_devices="0,1,2,3"
    --nvidia_gpu_devices="0,2,3"
    --nvidia_gpu_devices="3,1"
    etc...

For the `--resources=gpus:<num_gpus>` flag, the value passed to
`<num_gpus>` must equal the number of GPUs listed in
`--nvidia_gpu_devices`. If these numbers do not match, launching the
agent will fail. This can sometimes be a source of confusion, so it
is important to emphasize it here for clarity.
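
For example, the following pairing is consistent because two device
`ids` are listed and two GPUs are advertised; keeping the same device
list but passing `--resources="gpus:1"` would cause the agent to fail
at startup:

    --nvidia_gpu_devices="0,2"
    --resources="gpus:2"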

## Framework Capabilities
Once you launch an agent with the flags above, GPU resources will be
advertised to the Mesos master alongside all of the traditional
resources such as CPUs, memory, and disk. However, the master will
only forward offers that contain GPUs to frameworks that have
explicitly enabled the `GPU_RESOURCES` framework capability.

The choice to make frameworks explicitly opt in to the `GPU_RESOURCES`
capability was made to keep legacy frameworks (which are unaware of
GPUs) from accidentally consuming the non-GPU resources on GPU-capable
machines (and thus preventing your GPU jobs from running). This is not
a big deal if all of your nodes have GPUs, but in a mixed-node
environment it can be a big problem.

An example of setting this capability in a C++-based framework can be
seen below:

    FrameworkInfo framework;
    framework.add_capabilities()->set_type(
        FrameworkInfo::Capability::GPU_RESOURCES);

    // `GpuScheduler` stands in for your own `Scheduler` implementation.
    GpuScheduler scheduler;

    // The master address must be passed as a string.
    MesosSchedulerDriver* driver = new MesosSchedulerDriver(
        &scheduler,
        framework,
        "127.0.0.1:5050");

    driver->run();

## Minimal GPU Capable Cluster
In this section we walk through two examples of configuring GPU-capable
clusters and running tasks on them. The first example demonstrates the
minimal setup required to run a command that consumes GPUs on a
GPU-capable agent. The second example demonstrates the setup necessary
to launch a docker container that does the same.

**Note**: Both of these examples assume you have installed the
external dependencies required for Nvidia GPU support on Mesos. Please
see [below](#external-dependencies) for more information.

### Minimal Setup Without Support for Docker Containers
The commands below show a minimal example of bringing up a GPU-capable
Mesos cluster on `localhost` and executing a task on it. The required
agent flags are set as described above, and the `mesos-execute`
command has been told to enable the `GPU_RESOURCES` framework
capability so it can receive offers containing GPU resources.

    $ mesos-master \
        --ip=127.0.0.1 \
        --work_dir=/var/lib/mesos

    $ mesos-agent \
        --master=127.0.0.1:5050 \
        --work_dir=/var/lib/mesos \
        --isolation="filesystem/linux,cgroups/devices,gpu/nvidia"

    $ mesos-execute \
        --master=127.0.0.1:5050 \
        --name=gpu-test \
        --command="nvidia-smi" \
        --framework_capabilities="GPU_RESOURCES" \
        --resources="gpus:1"

If all goes well, you should see something like the following in the
`stdout` of your task:

    +------------------------------------------------------+
    | NVIDIA-SMI 352.79     Driver Version: 352.79         |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla M60           Off  | 0000:04:00.0     Off |                    0 |
    | N/A   34C    P0    39W / 150W |     34MiB /  7679MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

### Minimal Setup With Support for Docker Containers
The commands below show a minimal example of bringing up a GPU-capable
Mesos cluster on `localhost` and running a docker container on it. The
required agent flags are set as described above, and the
`mesos-execute` command has been told to enable the `GPU_RESOURCES`
framework capability so it can receive offers containing GPU
resources. Additionally, the required flags to enable support for
docker containers (as described [here](container-image.md)) have been
set up as well.

    $ mesos-master \
        --ip=127.0.0.1 \
        --work_dir=/var/lib/mesos

    $ mesos-agent \
        --master=127.0.0.1:5050 \
        --work_dir=/var/lib/mesos \
        --image_providers=docker \
        --executor_environment_variables="{}" \
        --isolation="docker/runtime,filesystem/linux,cgroups/devices,gpu/nvidia"

    $ mesos-execute \
        --master=127.0.0.1:5050 \
        --name=gpu-test \
        --docker_image=nvidia/cuda \
        --command="nvidia-smi" \
        --framework_capabilities="GPU_RESOURCES" \
        --resources="gpus:1"

If all goes well, you should see something like the following in the
`stdout` of your task:

    +------------------------------------------------------+
    | NVIDIA-SMI 352.79     Driver Version: 352.79         |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla M60           Off  | 0000:04:00.0     Off |                    0 |
    | N/A   34C    P0    39W / 150W |     34MiB /  7679MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

<a name="external-dependencies"></a>
## External Dependencies

Any host running a Mesos agent with Nvidia GPU support **MUST** have a
valid Nvidia kernel driver installed. It is also *highly* recommended
to install the corresponding user-level libraries and tools available
as part of the Nvidia CUDA toolkit. Many jobs that use Nvidia GPUs
rely on CUDA, and not including it will severely limit the kinds of
GPU-aware jobs you can run on Mesos.

**Note:** The minimum supported version of CUDA is `6.5`.

### Installing the Required Tools

The Nvidia kernel driver can be downloaded at the link below. Make
sure to choose the proper model of GPU, operating system, and CUDA
toolkit you plan to install on your host:

    http://www.nvidia.com/Download/index.aspx

Unfortunately, most Linux distributions come preinstalled with an open
source video driver called `Nouveau`. This driver conflicts with the
Nvidia driver we are trying to install. The following guides may prove
useful when uninstalling `Nouveau` before installing the Nvidia driver
on CentOS or Ubuntu:

    http://www.dedoimedo.com/computers/centos-7-nvidia.html
    http://www.allaboutlinux.eu/remove-nouveau-and-install-nvidia-driver-in-ubuntu-15-04/

After installing the Nvidia kernel driver, you can follow the
instructions in the link below to install the Nvidia CUDA toolkit:

    http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/

In addition to the steps listed in the link above, it is *highly*
recommended to add CUDA's `lib` directory into your `ldcache` so that
tasks launched by Mesos will know where these libraries exist and link
with them properly.

    sudo bash -c "cat > /etc/ld.so.conf.d/cuda-lib64.conf << EOF
    /usr/local/cuda/lib64
    EOF"

    sudo ldconfig
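
As an optional sanity check (not required by Mesos), you can confirm
that the CUDA runtime library is now visible in the cache:

    ldconfig -p | grep libcudart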

If you choose **not** to add CUDA's `lib` directory to your `ldcache`,
you **MUST** add it to the `LD_LIBRARY_PATH` of every task that
requires it.
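
For example, a command task might set it inline (here `./gpu-app` is a
hypothetical CUDA binary, not something shipped with Mesos):

    mesos-execute \
        --master=127.0.0.1:5050 \
        --name=gpu-test \
        --command="LD_LIBRARY_PATH=/usr/local/cuda/lib64 ./gpu-app" \
        --framework_capabilities="GPU_RESOURCES" \
        --resources="gpus:1"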

**Note:** This is *not* the recommended method. You have been warned.

### Verifying the Installation

Once the kernel driver has been installed, you can make sure
everything is working by trying to run the bundled `nvidia-smi` tool.

    nvidia-smi

You should see output similar to the following:

    Thu Apr 14 11:58:17 2016
    +------------------------------------------------------+
    | NVIDIA-SMI 352.79     Driver Version: 352.79         |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla M60           Off  | 0000:04:00.0     Off |                    0 |
    | N/A   34C    P0    39W / 150W |     34MiB /  7679MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla M60           Off  | 0000:05:00.0     Off |                    0 |
    | N/A   35C    P0    39W / 150W |     34MiB /  7679MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   2  Tesla M60           Off  | 0000:83:00.0     Off |                    0 |
    | N/A   38C    P0    38W / 150W |     34MiB /  7679MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   3  Tesla M60           Off  | 0000:84:00.0     Off |                    0 |
    | N/A   34C    P0    38W / 150W |     34MiB /  7679MiB |     99%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID  Type  Process name                               Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

To verify your CUDA installation, it is recommended to go through the
instructions at the link below:

    http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/#install-samples
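
For example, building and running the bundled `deviceQuery` sample
should report each GPU on the machine (the path below assumes the
samples were installed to their default location, which varies by
CUDA version):

    cd /usr/local/cuda/samples/1_Utilities/deviceQuery
    sudo make
    ./deviceQuery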

Finally, you should run Mesos's Nvidia GPU-related unit tests on your
machine to ensure that everything passes (as described below).

### Running Mesos Unit Tests

At the time of this writing, the following Nvidia GPU-specific unit
tests exist in Mesos:

    DockerTest.ROOT_DOCKER_NVIDIA_GPU_DeviceAllow
    DockerTest.ROOT_DOCKER_NVIDIA_GPU_InspectDevices
    NvidiaGpuTest.ROOT_CGROUPS_NVIDIA_GPU_VerifyDeviceAccess
    NvidiaGpuTest.ROOT_INTERNET_CURL_CGROUPS_NVIDIA_GPU_NvidiaDockerImage
    NvidiaGpuTest.ROOT_CGROUPS_NVIDIA_GPU_FractionalResources
    NvidiaGpuTest.NVIDIA_GPU_Discovery
    NvidiaGpuTest.ROOT_CGROUPS_NVIDIA_GPU_FlagValidation
    NvidiaGpuTest.NVIDIA_GPU_Allocator
    NvidiaGpuTest.ROOT_NVIDIA_GPU_VolumeCreation
    NvidiaGpuTest.ROOT_NVIDIA_GPU_VolumeShouldInject

The capitalized words following the `.` specify test filters to
apply when running the unit tests. In our case the filters that apply
are `ROOT`, `CGROUPS`, and `NVIDIA_GPU`. This means that these tests
must be run as `root` on Linux machines with `cgroups` support that
have Nvidia GPUs installed on them. The check to verify that Nvidia
GPUs exist is to look for the existence of the Nvidia System
Management Interface (`nvidia-smi`) on the machine where the tests are
being run. This binary should already be installed if the instructions
above have been followed correctly.

So long as these filters are satisfied, you can run the following to
execute these unit tests:

    [mesos]$ GTEST_FILTER="" make -j check
    [mesos]$ sudo bin/mesos-tests.sh --gtest_filter="*NVIDIA_GPU*"