4. NVIDIA-GPU¶
4.1. Cluster Configuration¶
Note
This section covers cluster configuration, so it applies only to administrators. Normal users do not need to follow these steps.
To enable the nvidia-container-toolkit (previously nvidia-docker), edit /etc/docker/daemon.json to set the default runtime, since Kubernetes does not support Docker's --gpus option:
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
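Docker must be restarted before the new default runtime takes effect. A minimal sketch for a systemd host (assuming sudo access on each GPU node):

```shell
# Restart Docker so the daemon re-reads /etc/docker/daemon.json
sudo systemctl restart docker

# Confirm the default runtime is now "nvidia"
docker info | grep -i 'default runtime'
```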
Then install the NVIDIA device plugin via Helm:
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin \
&& helm repo update \
&& helm install --generate-name nvdp/nvidia-device-plugin
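Once the device plugin is running on the GPU nodes, the scheduler should see the GPUs as an extended resource. A quick check (assuming kubectl access to the cluster):

```shell
# Nodes with working GPUs should now advertise the nvidia.com/gpu resource
# in their Capacity/Allocatable sections
kubectl describe nodes | grep 'nvidia.com/gpu'
```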
4.2. Pod Specification¶
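As a sketch of what a pod spec needs: GPUs are requested through the extended resource nvidia.com/gpu under the container's resource limits. GPU resources can only be specified in limits (requests, if given, must equal limits) and cannot be overcommitted. The name and image below are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod                      # hypothetical name
spec:
  containers:
  - name: app
    image: nvidia/cuda:11.0-base     # any CUDA-enabled image
    resources:
      limits:
        nvidia.com/gpu: 1            # GPUs are requested via limits only
```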
4.3. Test Pods¶
Test pod for CUDA functionality
- Manifest example
apiVersion: v1
kind: Pod
metadata:
  name: gpu-operator-test
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    image: "nvidia/samples:vectoradd-cuda10.2"
    resources:
      limits:
        nvidia.com/gpu: 1
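To run the test, apply the manifest and read the pod's logs once it completes (gpu-test.yaml is an assumed filename; the vectoradd sample is expected to report success in its output):

```shell
kubectl apply -f gpu-test.yaml   # assumed filename for the manifest above

# After the pod completes, its log should show the sample passed
kubectl logs gpu-operator-test
```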
Test job for nvidia-smi
- Manifest example
apiVersion: batch/v1
kind: Job
metadata:
  name: smi
spec:
  template:
    spec:
      containers:
      - name: smi
        image: docker.io/nvidia/cuda:11.0-base
        command: ['nvidia-smi']
      restartPolicy: OnFailure
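Similarly, assuming the job manifest is saved as smi-job.yaml, run it and read the nvidia-smi output from the completed pod:

```shell
kubectl apply -f smi-job.yaml    # assumed filename for the manifest above
kubectl wait --for=condition=complete job/smi --timeout=120s

# Prints the nvidia-smi device table from the job's pod
kubectl logs job/smi
```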