2013 MBP with an external RX 580 GPU, running Keras (TensorFlow) + PlaidML

Senior member: IgniteWhite

MacBook Pro configuration

My main machine is an early-2013 15-inch MBP:

2.7 GHz Intel Core i7
Thunderbolt 1 port
Intel HD 4000 integrated graphics
GT 650M discrete GPU
macOS Mojave 10.14.6

All in all, it is a fairly old machine. The discrete GPU is weak, and under load the fans spin fast and the machine runs very hot.

eGPU configuration

Sapphire RX 580 8GB
Razer Core X GPU enclosure
USB-C to Thunderbolt 1/2 adapter
Thunderbolt 1/2 extension cable

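Connecting an eGPU to this generation of MBP requires:

purge-wrangler.sh, which lets older machines run an eGPU
purge-nvda.sh, one of the scripts needed to disable the discrete GPU
A hacked Ubuntu GNU GRUB grub.cfg boot file, one of the steps needed to disable the discrete GPU

For the full tutorial, see this eGPU.io thread.

This setup limits the RX 580's data transfer to Thunderbolt 1 speeds, which is a significant handicap. Even so, it has no trouble driving an LG 4K 60 Hz display (connected directly to the GPU enclosure).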

PlaidML

To use PlaidML, I installed the following in a Python 3.8.6 virtual environment created with pyenv:

ipykernel
h5py<3.0.0 (must be entered in quotes as "h5py<3.0.0"; this actually installs 2.10.0, the version tensorflow requires)
plaidml-keras (only available on PyPI; keras is included in this package)
plaidbench (only available on PyPI)
tensorflow (installing plaidml-keras alone was not enough, so I had to install this as well; I installed it after the packages above)

The notes in parentheses are copied straight from my personal wiki.
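To confirm that the environment matches the list above, a minimal sketch along these lines can report what is installed. It uses only the standard-library importlib.metadata module; the helper names installed_versions and h5py_is_pre_3 are hypothetical and exist only for this illustration.

```python
from importlib import metadata  # standard library since Python 3.8

# Distribution names taken from the list above.
REQUIRED = ["ipykernel", "h5py", "plaidml-keras", "plaidbench", "tensorflow"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def h5py_is_pre_3(version):
    """True if an h5py version string satisfies the h5py<3.0.0 pin."""
    return version is not None and int(version.split(".")[0]) < 3


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not installed'}")
```

Running this inside the pyenv virtual environment shows at a glance whether tensorflow was pulled in and whether h5py stayed below 3.0.0.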

VGG-19 benchmark results

On the CPU:

2020-12-05 15:00:42.457154: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fde5c7c9110 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-05 15:00:42.457188: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Running initial batch (compiling tile program)
Timing inference...
Ran in 35.34518790245056 seconds

On the RX 580 eGPU:

Using plaidml.keras.backend backend.
INFO:plaidml:Opening device "metal_amd_radeon_rx_580.0"
Running initial batch (compiling tile program)
Timing inference...
Ran in 2.5728609561920166 seconds
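The size of the gap between the two runs is easy to work out from the two "Ran in" values. The sketch below only does that arithmetic. One caveat to keep in mind: the CPU log shows TensorFlow's XLA service while the eGPU log shows the PlaidML backend, so the two runs may not be using the same backend.

```python
# "Ran in ..." values copied from the two logs above.
cpu_seconds = 35.34518790245056    # CPU run
gpu_seconds = 2.5728609561920166   # RX 580 eGPU run (Metal)

# Ratio of wall-clock inference times; higher means the eGPU is faster.
speedup = cpu_seconds / gpu_seconds
print(f"eGPU speedup over CPU: {speedup:.1f}x")  # prints 13.7x
```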

I'm not a machine learning professional. If anyone wants to know more about any step above, I'm happy to explain in detail.

Replies (11)

Senior member (OP): IgniteWhite

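In day-to-day use, I run JupyterLab on a Raspberry Pi 4B with Manjaro ARM as an auxiliary system, and use a package called remote_ikernel to reach the Keras IPython kernel on the MBP over the LAN. The remote kernel connects over SSH, so it can reach all kinds of machines, including cloud services dedicated to machine learning.

The Raspberry Pi is cooled by a heat pipe plus a fan, and an external Samsung SSD holds its boot and root partitions. It sits right on top of my GPU enclosure.
Senior member (OP): IgniteWhite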

One more detail: PlaidML can run on either Metal or OpenCL here, and I use Metal.
For more benchmark results, see @YUX's thread: /t/660085
Senior member: volvo007

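@IgniteWhite I followed the post and set this up too, on a 13-inch machine that only has integrated graphics. A question: how do I switch the computation over to the GPU? The external GPU is connected, but everything still runs on the CPU.
Senior member: volvo007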

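@IgniteWhite Hmm, a quick follow-up: the problem above is solved. Running plaidml-setup once more so that the script detects the Vega card was enough. Choose Metal there, and later runs use the GPU automatically.

So my question now becomes: is there a way to choose the compute device (CPU or GPU) from Python code? Running plaidml-setup before every model run is not a real solution.
Senior member: volvo007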

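I got busy with something else for a while, then found the relevant settings in the manual:
> export PLAIDML_EXPERIMENTAL=1
> export PLAIDML_DEVICE_IDS=opencl_intel_uhd_graphics_630.0

Setting these in your shell rc file is enough. The value after PLAIDML_DEVICE_IDS is one of the device IDs that plaidml-setup lists.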

So one option is to define a few aliases in the rc file and switch before running code. Mine, for example:

alias tfcpu='export PLAIDML_EXPERIMENTAL=1 && export PLAIDML_DEVICE_IDS=llvm_cpu.0'

alias tfint='export PLAIDML_EXPERIMENTAL=1 && export PLAIDML_DEVICE_IDS=metal_intel(r)_iris(tm)_plus_graphics.0'

alias tfgpu='export PLAIDML_EXPERIMENTAL=1 && export PLAIDML_DEVICE_IDS=metal_amd_radeon_rx_vega_64.0'
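The same switch can probably be made from Python, which is what the earlier question asked for. The sketch below sets the same two variables through os.environ before Keras is imported. It assumes PlaidML reads them when the backend is first loaded, and it also sets KERAS_BACKEND to plaidml.keras.backend, the backend name that appears in the eGPU log. The helper select_plaidml_device and the DEVICES table are hypothetical names for this illustration; the device IDs are the ones from the aliases above and will differ per machine.

```python
import os

# Device IDs as quoted in the aliases above; yours come from plaidml-setup.
DEVICES = {
    "cpu": "llvm_cpu.0",
    "int": "metal_intel(r)_iris(tm)_plus_graphics.0",
    "gpu": "metal_amd_radeon_rx_vega_64.0",
}


def select_plaidml_device(key):
    """Set the PlaidML environment variables for one device (hypothetical helper)."""
    os.environ["PLAIDML_EXPERIMENTAL"] = "1"
    os.environ["PLAIDML_DEVICE_IDS"] = DEVICES[key]
    # Ask Keras to load the PlaidML backend instead of its default one.
    os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"
    return os.environ["PLAIDML_DEVICE_IDS"]


select_plaidml_device("gpu")
# Import Keras only after the variables are set (assumption: they are read
# at import time), for example:
# import keras
```

Because environment variables only affect the current process, this has to run at the top of the script or notebook, before the first Keras import.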

For the same plaidbench keras mobilenet benchmark command:

--------
tfcpu runs it on the CPU:
Example finished, elapsed: 2.923s (compile), 99.922s (execution)

tfint runs it on the integrated GPU:
Example finished, elapsed: 0.401s (compile), 18.213s (execution)

tfgpu runs it on the external GPU (a Vega 56 flashed with the Vega 64 BIOS):
Example finished, elapsed: 0.413s (compile), 8.597s (execution)

The results are pretty good.
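To compare the three runs above, a short sketch can pull the timings out of the result lines and express them relative to the CPU run. The regular expression is written against the exact line format quoted above and is an assumption about plaidbench output in general; parse_elapsed is a hypothetical helper.

```python
import re

# The three result lines quoted above, keyed by alias name.
RESULTS = {
    "tfcpu": "Example finished, elapsed: 2.923s (compile), 99.922s (execution)",
    "tfint": "Example finished, elapsed: 0.401s (compile), 18.213s (execution)",
    "tfgpu": "Example finished, elapsed: 0.413s (compile), 8.597s (execution)",
}

PATTERN = re.compile(r"elapsed: ([\d.]+)s \(compile\), ([\d.]+)s \(execution\)")


def parse_elapsed(line):
    """Return (compile_seconds, execution_seconds) from one result line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognised result line: {line!r}")
    return float(match.group(1)), float(match.group(2))


# Use the CPU execution time as the baseline for the speedup figures.
baseline = parse_elapsed(RESULTS["tfcpu"])[1]
for name, line in RESULTS.items():
    _, execution = parse_elapsed(line)
    print(f"{name}: {execution:.3f}s, {baseline / execution:.1f}x vs CPU")
```

On these numbers the integrated GPU comes out at roughly 5.5x and the Vega at roughly 11.6x the CPU's execution speed.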
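Senior member (OP): IgniteWhite

@volvo007 Nice, I hadn't even read the manual, haha. Thanks for sharing! I've seen people say that Metal is slower than OpenCL on an eGPU, and that for a discrete GPU (dGPU) it is still an open question which one is better.
Senior member: volvo007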

@IgniteWhite I'm happy enough with such a big improvement over the CPU. This whole approach of translating CUDA is a nice idea.
Senior member (OP): IgniteWhite

@volvo007 I just ran vgg19 on the eGPU through OpenCL and it took 7.6 seconds... so Metal is faster after all.
Senior member: fx777

Kudos to the OP for the tinkering spirit.
Senior member: BugenZhao

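I tried plaidml before. Some of its Keras implementations may have bugs: values often turned into NaN during training, and switching to tensorflow on CUDA made the problem go away.
Senior member (OP): IgniteWhite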

@BugenZhao Thanks for the heads-up.