Passing an NVIDIA GPU through to a QEMU virtual machine with Linux VFIO


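On Linux, GPU passthrough to a virtual machine is done through VFIO, mainly because VFIO handles device permissions and DMA isolation better. The drawback is that the physical device can no longer be used by the host or by any other virtual machine.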

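The QEMU command line for using a physical device directly is simple. The real work is configuring the host system, the kernel, and the physical device beforehand.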

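Looking only at the QEMU command line, the one difference from an ordinary VM launch is the final -device option. It is easy to read: it hands the host device 0000:00:01.0 to the virtual machine.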

$ qemu-system-x86_64 -m 4096 -smp 4 --enable-kvm \
    -drive file=~/guest/fedora.img \
    -device vfio-pci,host=0000:00:01.0

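System and hardware preparation

Enable the IOMMU in the BIOS

Device passthrough on x86 requires the IOMMU to be enabled. On Intel platforms this is part of VT-d (Virtualization Technology for Directed I/O), and it is sometimes left disabled.

In the BIOS setup it is under Security -> Virtualization -> VT-d, although the exact location varies between BIOSes. Make sure this option is turned on before using passthrough devices.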

Enable the IOMMU in the kernel configuration

INTEL_IOMMU
  Location:
    -> Device Drivers
      -> IOMMU Hardware Support (IOMMU_SUPPORT [=y])

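Enable the IOMMU with kernel boot parameters

Turning it on in the BIOS and in the kernel build configuration is not enough. The kernel boot parameters also have to be added in the boot loader.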

# Edit /etc/default/grub and set GRUB_CMDLINE_LINUX=
$ cat /etc/default/grub
...
GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt vfio_iommu_type1.allow_unsafe_interrupts=1 rdblacklist=nouveau nouveau.modeset=0"
...

# Regenerate the grub boot configuration
$ grub2-mkconfig -o /boot/grub2/grub.cfg

# Load the vfio modules at boot
$ cat /etc/modules-load.d/vfio.conf
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
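After the next reboot it can be worth confirming that the parameters actually reached the running kernel. The following is a minimal sketch rather than part of the original procedure: `missing_iommu_params` is a hypothetical helper name, and it only looks for the two parameters `intel_iommu=on` and `iommu=pt` from the GRUB line.

```shell
#!/bin/sh
# Print each expected IOMMU parameter that is missing from a kernel command line.
# $1: the command line as a single string, e.g. "$(cat /proc/cmdline)"
# (missing_iommu_params is a hypothetical name used for illustration)
missing_iommu_params() (
    cmdline=$1
    for want in intel_iommu=on iommu=pt; do
        case " $cmdline " in
            *" $want "*) ;;          # present as a whole word, nothing to report
            *) echo "$want" ;;       # absent, print it
        esac
    done
)

# Usage on the host:
#   missing_iommu_params "$(cat /proc/cmdline)"
```

No output means both parameters are present; any printed name points at a GRUB change that did not take effect.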

Setting up IOMMU Kernel parameters

Find the NVIDIA GPU bus ID

Record the PCI addresses and hardware IDs of the GPU.

$ lspci -k | grep -i nvidia -A 3
41:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
        Subsystem: Device 1b4c:11bf
        Kernel driver in use: vfio-pci
        Kernel modules: nouveau
41:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
        Subsystem: Device 1b4c:11bf
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel
# pci address  => 41:00.0,41:00.1
# subsystem id => 1b4c:11bf
# (the vendor:device pair that vfio-pci matches on, such as 10de:1c82, is shown by lspci -nn)

# Two NVIDIA devices show up here, both reporting 1b4c:11bf, and one of them is the Audio device.
# They cannot be passed through as-is, because:
# vfio-pci uses the vendor and device ID pair to identify which device it needs to bind to at boot;
# if you have two GPUs sharing such an ID pair, you will not be able to get the passthrough driver to bind to just one of them.
# The following script handles this case:

$ cat /usr/bin/vfio-pci-override.sh
#!/bin/sh

# Force vfio-pci onto every VGA device that is not the boot VGA device,
# and onto function 1 of the same slot (the GPU's audio function) if it exists.
for i in $(find /sys/devices/pci* -name boot_vga); do
    if [ "$(cat "$i")" -eq 0 ]; then
        GPU="${i%/boot_vga}"
        AUDIO="$(echo "$GPU" | sed -e "s/0$/1/")"
        echo "vfio-pci" > "$GPU/driver_override"
        if [ -d "$AUDIO" ]; then
            echo "vfio-pci" > "$AUDIO/driver_override"
        fi
    fi
done

modprobe -i vfio-pci

# Hook the script into /etc/modprobe.d/vfio.conf
$ cat /etc/modprobe.d/vfio.conf
install vfio-pci /usr/bin/vfio-pci-override.sh
options vfio-pci ids=10de:1c82 disable_vga=1
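The `ids=` option takes vendor:device pairs, which `lspci -k` does not print; `lspci -nn` appends them in square brackets at the end of each line. As a hedged sketch, the hypothetical helper below pulls those pairs out of `lspci -nn` output and joins them with commas. It assumes the usual `[vvvv:dddd]` suffix format and has only been reasoned through against sample lines, not every lspci version.

```shell
#!/bin/sh
# Read `lspci -nn` lines on stdin and print their vendor:device pairs,
# comma-separated, in the form expected by "options vfio-pci ids=...".
# (ids_from_lspci is a hypothetical name used for illustration)
ids_from_lspci() {
    # The greedy leading .* makes sed keep the last [xxxx:xxxx] on each line;
    # the class code such as [0300] has no colon and is skipped.
    sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' | paste -sd, -
}

# Usage on the host (all functions of bus 41, slot 00):
#   lspci -nn -s 41:00 | ids_from_lspci
```

The result can be pasted after `ids=` in /etc/modprobe.d/vfio.conf.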

Manage the GPU with vfio

# /etc/modprobe.d/vfio.conf: ids is the vendor:device ID found with lspci; separate multiple devices with ','
$ cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1c82 disable_vga=1

# Disable the open-source NVIDIA nouveau driver, /etc/modprobe.d/blacklist.conf
$ cat /etc/modprobe.d/blacklist.conf
blacklist nouveau

# kvm module options, /etc/modprobe.d/kvm.conf
$ cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1

Reboot. Once the system is up, check that the NVIDIA GPU is bound to the vfio-pci module and confirm that the IOMMU really is enabled.

$ dmesg | grep -e DMAR -e IOMMU | grep enabled
# If this finds "DMAR: IOMMU enabled",
# the configuration above worked.

# Check that the GPU is bound to vfio-pci.
# Also check that the 41:00.1 Audio device is bound to vfio-pci.
$ lspci -k | grep -i -e nvidia -A 3
41:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
    Subsystem: Device 1b4c:11bf
    Kernel driver in use: vfio-pci # the GTX 1050 Ti GPU is bound to vfio-pci
    Kernel modules: nouveau
41:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
    Subsystem: Device 1b4c:11bf
    Kernel driver in use: vfio-pci # the Audio device is bound to vfio-pci as well
    Kernel modules: snd_hda_intel
...
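The same driver check can be scripted by reading sysfs instead of parsing lspci output. This is a sketch under assumptions: `pci_driver` is a hypothetical helper, and the optional second argument exists only so the function can be pointed at a test directory instead of /sys/bus/pci/devices.

```shell
#!/bin/sh
# Print the kernel driver bound to a PCI device, or "none" if it is unbound.
# $1: full PCI address, e.g. 0000:41:00.0
# $2: device directory (optional, defaults to /sys/bus/pci/devices)
# (pci_driver is a hypothetical name used for illustration)
pci_driver() (
    dev="${2:-/sys/bus/pci/devices}/$1"
    # A bound device has a "driver" symlink pointing at its driver directory.
    [ -L "$dev/driver" ] || { echo "none"; exit 0; }
    basename "$(readlink "$dev/driver")"
)

# Usage on the host:
#   pci_driver 0000:41:00.0
#   pci_driver 0000:41:00.1
```

Both functions of the card should report vfio-pci before the VM is started.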

# List the GPU's IOMMU group
$ find /sys/kernel/iommu_groups/ -type l | grep 41:00
/sys/kernel/iommu_groups/27/devices/0000:41:00.0
/sys/kernel/iommu_groups/27/devices/0000:41:00.1

# List the PCI devices in every IOMMU group
#!/bin/bash
shopt -s nullglob
for d in /sys/kernel/iommu_groups/*/devices/*; do
  n=${d#*/iommu_groups/*}; n=${n%%/*}
  printf 'IOMMU Group %s ' "$n"
  lspci -nns "${d##*/}"
done
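VFIO hands devices to a guest one IOMMU group at a time, so it helps to see exactly which devices share the GPU's group. The sketch below lists the members of one device's group through its `iommu_group` symlink in sysfs. `iommu_group_members` is a hypothetical helper, and its second argument is only there to allow testing against a fake directory tree.

```shell
#!/bin/sh
# Print the PCI addresses that share an IOMMU group with the given device.
# $1: full PCI address, e.g. 0000:41:00.0
# $2: device directory (optional, defaults to /sys/bus/pci/devices)
# (iommu_group_members is a hypothetical name used for illustration)
iommu_group_members() (
    dev="${2:-/sys/bus/pci/devices}/$1"
    for peer in "$dev"/iommu_group/devices/*; do
        # Skip the unexpanded pattern when the device has no IOMMU group.
        [ -e "$peer" ] && basename "$peer"
    done
    :
)

# Usage on the host:
#   iommu_group_members 0000:41:00.0
```

If the list contains anything other than the GPU and its audio function, those extra devices need attention before the group can be handed to a guest.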

Pass the NVIDIA GPU through with QEMU

Prepare a CentOS 7 image, then install the official proprietary NVIDIA driver and the CUDA SDK inside the virtual machine.

# The image I copied from the server is a vmdk; convert it to qcow2 first
$ /usr/local/qemu-2.9.0/bin/qemu-img convert -f vmdk -O qcow2 centos-7.3.1611-20180104.vmdk centos-7.3.1611-20180104.qcow2

# Start it with qemu; note that -cpu needs the kvm=off parameter.
# kvm=off hides the KVM hypervisor signature. This is required for NVIDIA cards,
# since the driver refuses to work under a hypervisor and results in Code 43 on Windows.
$ cat startvm.sh
#!/bin/sh
/usr/local/qemu-2.9.0/bin/qemu-system-x86_64 -enable-kvm \
    -m 4096 -cpu host,kvm=off -smp 4,sockets=1,cores=4,threads=1 \
    -drive file=./centos-7.3.1611-20180104.qcow2 \
    -device vfio-pci,host=41:00.0,multifunction=on,addr=0x16 \
    -device vfio-pci,host=41:00.1 \
    -net nic,model=e1000 -net user,hostfwd=tcp::5022-:22 \
    -vnc :1

# This VM has VNC and SSH port forwarding enabled, so it can be reached over either one.

# Log in to the VM from the host
$ ssh 127.0.0.1 -p 5022

# Look at the graphics card passed through to the VM
$ lspci -k | grep -i nvidia -A 3
00:04.0 Audio device: NVIDIA Corporation Device 0fb9 (rev a1)
    Subsystem: Device 1b4c:11bf
    Kernel driver in use: snd_hda_intel
    Kernel modules: snd_hda_intel
00:16.0 VGA compatible controller: NVIDIA Corporation GP107 (rev a1)
    Subsystem: Device 1b4c:11bf
    Kernel modules: nouveau
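When several functions have to be passed through, the repeated `-device vfio-pci,host=...` arguments can be generated from a list of PCI addresses. This is only an illustrative sketch: `vfio_device_args` is a hypothetical helper, and it deliberately leaves out extras such as `multifunction=on,addr=0x16`.

```shell
#!/bin/sh
# Print one "-device vfio-pci,host=<addr>" argument pair per PCI address given.
# (vfio_device_args is a hypothetical name used for illustration)
vfio_device_args() (
    args=
    for bdf in "$@"; do
        # Separate successive pairs with a single space, with no trailing space.
        args="${args:+$args }-device vfio-pci,host=$bdf"
    done
    printf '%s\n' "$args"
)

# Usage (left unquoted on purpose so the shell splits it into separate arguments):
#   qemu-system-x86_64 -enable-kvm -m 4096 $(vfio_device_args 41:00.0 41:00.1)
```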

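Install the NVIDIA driver and CUDA

The NVIDIA driver has to be downloaded from NVIDIA. Installing CUDA first would also install an NVIDIA driver along with it. Here the driver is installed in the virtual machine first, and CUDA afterwards.

References: installing-nvidia-drivers-centos-7, NVIDIA CUDA Getting Started Guide for Linux

Install the NVIDIA driver

Download: http://www.nvidia.com/object/unix.html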

# If the update installs a new kernel, reboot afterwards
$ yum -y update

# Install gcc, make, glibc and other tools and libraries
$ yum -y groupinstall "Development Tools"
$ yum -y install kernel-devel

# Download the latest NVIDIA driver for unix.
$ wget http://us.download.nvidia.com/XFree86/Linux-x86_64/390.42/NVIDIA-Linux-x86_64-390.42.run
$ yum -y install epel-release
$ yum -y install dkms

# Edit /etc/default/grub. Append the following to GRUB_CMDLINE_LINUX
rd.driver.blacklist=nouveau nouveau.modeset=0

# Generate a new grub configuration to include the above changes.
$ grub2-mkconfig -o /boot/grub2/grub.cfg

# Edit/create /etc/modprobe.d/blacklist.conf and append:
blacklist nouveau

# Back up your old initramfs and then build a new one
$ mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
$ dracut /boot/initramfs-$(uname -r).img $(uname -r)

# Reboot again

# Run the NVIDIA driver installer and enter yes to all options.
$ sh NVIDIA-Linux-x86_64-*.run

# Reboot once more after the install, then use lspci -k to check that the GPU is using the nvidia driver
$ lspci -k | grep -i nvidia -A 3
00:04.0 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
00:16.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
    Kernel driver in use: nvidia # the nvidia driver is now in use
    Kernel modules: nouveau, nvidia_drm, nvidia

# Run nvidia-smi to look at its output and the temperature
$ nvidia-smi
Thu Mar 15 01:31:09 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.42                 Driver Version: 390.42                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:00:16.0 Off |                  N/A |
| 40%   32C    P0    N/A / 100W |      0MiB /  4040MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

$ nvidia-smi -q -d TEMPERATURE

==============NVSMI LOG==============

Timestamp                           : Thu Mar 15 01:32:42 2018
Driver Version                      : 390.42

Attached GPUs                       : 1
GPU 00000000:00:16.0
    Temperature
        GPU Current Temp            : 32 C
        GPU Shutdown Temp           : 102 C
        GPU Slowdown Temp           : 99 C
        GPU Max Operating Temp      : N/A
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
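Before running the NVIDIA installer, and again after the final reboot, it is useful to confirm which of the modules are actually loaded in the guest. The sketch below checks a /proc/modules-style listing. `module_loaded` is a hypothetical helper, and the optional file argument is only there so it can be exercised against sample data.

```shell
#!/bin/sh
# Succeed if the named kernel module appears in a /proc/modules-style listing.
# $1: module name, e.g. nouveau
# $2: listing file (optional, defaults to /proc/modules)
# (module_loaded is a hypothetical name used for illustration)
module_loaded() {
    # Each line starts with the module name followed by a space, so anchoring
    # on "^name " keeps "nvidia" from matching "nvidia_drm".
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Usage inside the guest:
#   module_loaded nouveau && echo "nouveau is still loaded"
#   module_loaded nvidia  || echo "nvidia is not loaded"
```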

Install CUDA

Download: https://developer.nvidia.com/cuda-downloads
The runfile is used here. For convenience you can also choose the rpm (network) option later, which installs the NVIDIA driver automatically.

$ wget https://developer.nvidia.com/compute/cuda/9.1/Prod/local_installers/cuda_9.1.85_387.26_linux

# Say no to installing the NVIDIA driver.
# The standalone driver you already installed is typically newer than what is packaged with CUDA.
# Use the default option for all other choices.
$ sh cuda_9.1.85_387.26_linux

# Add the CUDA environment variables
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# make samples
$ cd ~/NVIDIA_CUDA-9.1_Samples; make -j 4
$ cd bin/x86_64/linux/release
$ ./deviceQuery # query GPU information
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1050 Ti"
  CUDA Driver Version / Runtime Version          9.1 / 9.1
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 4040 MBytes (4236312576 bytes)
  ( 6) Multiprocessors, (128) CUDA Cores/MP:     768 CUDA Cores
  GPU Max Clock rate:                            1481 MHz (1.48 GHz)
  Memory Clock rate:                             3504 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 1048576 bytes
...

$ ./bandwidthTest # measure GPU bandwidth with CUDA
Running on...

 Device 0: GeForce GTX 1050 Ti
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432            9719.0

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432            9215.8

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432            95525.1

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
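If the CUDA exports end up in a shell startup file, every re-source of that file appends the same directory again. A small guard avoids that. This is a sketch, not something the CUDA installer provides: `path_append` is a hypothetical helper that adds a directory to a colon-separated list only when it is not already there.

```shell
#!/bin/sh
# Print a colon-separated path list with a directory appended,
# unless the directory is already one of its entries.
# $1: current list (may be empty)   $2: directory to add
# (path_append is a hypothetical name used for illustration)
path_append() (
    list=$1
    dir=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;              # already present, unchanged
        *)          printf '%s\n' "${list:+$list:}$dir" ;; # append, no leading colon
    esac
)

# Usage, for example in ~/.bashrc:
#   PATH=$(path_append "$PATH" /usr/local/cuda/bin); export PATH
#   LD_LIBRARY_PATH=$(path_append "$LD_LIBRARY_PATH" /usr/local/cuda/lib64); export LD_LIBRARY_PATH
```

Unlike the export lines in the walkthrough, this always appends, so an existing entry earlier in the list keeps its priority.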

Author: 李泽玺 (Li Zexi), 云联壹云

GitHub: https://github.com/yunionio/cloudpods

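Project site: https://www.cloudpods.org/

Cloudpods is an open-source, cloud-native platform written in Golang that unifies multi-cloud and hybrid-cloud management. Besides managing local virtual machines and physical machines, it can also manage resources on other public and private cloud platforms.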
