System version compatibility requirements
- CentOS 7.2, CUDA 9.0, cuDNN 7.4
- CentOS 7.5, CUDA 9.2, cuDNN 7.4
- yum -y install gcc gcc-c++ kernel-devel
-
- Package manager overview:
- https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#package-manager-overview
1. Install the GPU driver
Check the NVIDIA GPU information:
# nvidia-smi
2. Install the NVIDIA detection tool
2.1 Add the ELRepo repository
- # rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
-
- # rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
2.2 Install the driver detection tool
yum install nvidia-detect
2.3 Run it
- # nvidia-detect -v
- Probing for supported NVIDIA devices...
- [10de:15f8] NVIDIA Corporation Device 15f8
- This device requires the current 410.78 NVIDIA driver kmod-nvidia
- [10de:15f8] NVIDIA Corporation Device 15f8
- This device requires the current 410.78 NVIDIA driver kmod-nvidia
- [102b:0538] Matrox Electronics Systems Ltd. Device 0538
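The detection output above can also be read by a script. The following is a minimal sketch, assuming the output keeps the "This device requires the current <version> NVIDIA driver <package>" wording shown above; the function name required_driver is hypothetical and not part of nvidia-detect.

```shell
#!/bin/sh
# required_driver (hypothetical helper): read nvidia-detect output on stdin and
# print each distinct "<package> <version>" pair that it recommends.
# Assumes the line layout shown above: the package name is the last field and
# the driver version sits three fields before it.
required_driver() {
    awk '/This device requires/ { print $NF, $(NF - 3) }' | sort -u
}

# Usage (on a machine with nvidia-detect installed):
#   nvidia-detect -v | required_driver
```

Identical GPUs produce identical lines, so sort -u collapses them into one recommendation.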
2.4 Edit the GRUB file
vim /etc/default/grub
Add the following to GRUB_CMDLINE_LINUX:
rd.driver.blacklist=nouveau nouveau.modeset=0
After the change, the file looks like this:
- GRUB_TIMEOUT=5
- GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
- GRUB_DEFAULT=saved
- GRUB_DISABLE_SUBMENU=true
- GRUB_TERMINAL_OUTPUT="console"
- GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rd.driver.blacklist=nouveau nouveau.modeset=0 rhgb quiet"
- GRUB_DISABLE_RECOVERY="true"
Then regenerate the GRUB configuration:
grub2-mkconfig -o /boot/grub2/grub.cfg
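The manual edit in this step can be scripted so that it is safe to repeat. This is a minimal sketch, not the only way to do it: the function name add_nouveau_params and the ".bak" backup suffix are assumptions, and the parameters are appended at the end of GRUB_CMDLINE_LINUX rather than before "rhgb quiet" (the order of kernel parameters does not matter here).

```shell
#!/bin/sh
# add_nouveau_params (hypothetical helper): append the nouveau-disabling kernel
# parameters to GRUB_CMDLINE_LINUX in the given file (default /etc/default/grub).
# Does nothing if the parameters are already present.
add_nouveau_params() {
    grub_file="${1:-/etc/default/grub}"
    params="rd.driver.blacklist=nouveau nouveau.modeset=0"

    # Already configured: leave the file untouched.
    if grep -q 'rd.driver.blacklist=nouveau' "$grub_file"; then
        return 0
    fi

    # Keep a backup, then rewrite the file with the parameters inserted just
    # before the closing quote of the GRUB_CMDLINE_LINUX line.
    cp "$grub_file" "$grub_file.bak"
    sed "s/^\(GRUB_CMDLINE_LINUX=\".*\)\"\$/\1 $params\"/" "$grub_file.bak" > "$grub_file"
}

# Usage (as root), followed by the grub2-mkconfig command from this step:
#   add_nouveau_params /etc/default/grub
```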
2.5 Create the blacklist
vim /etc/modprobe.d/blacklist.conf
Add:
blacklist nouveau
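If this step is repeated, editing by hand can leave duplicate entries. A small sketch of an idempotent version follows; the function name ensure_blacklist is hypothetical.

```shell
#!/bin/sh
# ensure_blacklist (hypothetical helper): make sure the given modprobe config
# file (default /etc/modprobe.d/blacklist.conf) contains "blacklist nouveau"
# exactly once, creating the file if it does not exist.
ensure_blacklist() {
    conf="${1:-/etc/modprobe.d/blacklist.conf}"
    touch "$conf"
    # -x matches the whole line, so a commented-out entry does not count.
    grep -qx 'blacklist nouveau' "$conf" || echo 'blacklist nouveau' >> "$conf"
}

# Usage (as root):
#   ensure_blacklist
```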
2.6 Update the configuration (back up and rebuild the initramfs)
- mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
- dracut /boot/initramfs-$(uname -r).img $(uname -r)
2.7 Reboot
reboot
2.8 Confirm that nouveau is disabled
lsmod | grep nouveau
If there is no output, nouveau has been disabled successfully.
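For unattended setups, the same check can return an explicit message and exit status instead of relying on empty output. A minimal sketch, assuming the usual lsmod layout where the module name starts each line; the function name check_nouveau is hypothetical.

```shell
#!/bin/sh
# check_nouveau (hypothetical helper): read lsmod output on stdin and report
# whether the nouveau module is loaded. Returns 1 if it is still loaded.
check_nouveau() {
    # Match only lines whose module name is exactly "nouveau".
    if grep -q '^nouveau[[:space:]]'; then
        echo "nouveau still loaded"
        return 1
    fi
    echo "nouveau disabled"
}

# Usage:
#   lsmod | check_nouveau
```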
3. Install CUDA
CUDA download page:
- https://developer.nvidia.com/cuda-toolkit
-
- # sh cuda_9.0.176_384.81_linux.run
If the message "You appear to be running an X server; please exit X before installing" appears,
run init 3 to switch to command-line mode, which stops the X server, and then run the install command again.
- ===========
- = Summary =
- ===========
- Driver: Installed
- Toolkit: Installed in /usr/local/cuda-9.0
- Samples: Installed in /root, but missing recommended libraries
-
- Please make sure that
- - PATH includes /usr/local/cuda-9.0/bin
- - LD_LIBRARY_PATH includes /usr/local/cuda-9.0/lib64, or, add /usr/local/cuda-9.0/lib64 to /etc/ld.so.conf and run ldconfig as root
-
- To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.0/bin
- To uninstall the NVIDIA Driver, run nvidia-uninstall
-
- Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.0/doc/pdf for detailed information on setting up CUDA.
-
- Logfile is /tmp/cuda_install_7874.log
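The summary asks for PATH and LD_LIBRARY_PATH to include the CUDA directories. One way to make that permanent is a profile script; the sketch below only generates the two export lines. The function name cuda_env and the target file /etc/profile.d/cuda.sh are assumptions, not something the installer creates.

```shell
#!/bin/sh
# cuda_env (hypothetical helper): print the export lines that the installer
# summary asks for, for the given CUDA prefix (default /usr/local/cuda-9.0).
# The variables are left unexpanded so they are evaluated at login time.
cuda_env() {
    cuda_home="${1:-/usr/local/cuda-9.0}"
    printf 'export PATH=%s/bin:$PATH\n' "$cuda_home"
    # Add a ":" separator only when LD_LIBRARY_PATH is already set.
    printf 'export LD_LIBRARY_PATH=%s/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}\n' "$cuda_home"
}

# Usage (as root; the file name is an assumption):
#   cuda_env /usr/local/cuda-9.0 > /etc/profile.d/cuda.sh
```

Writing the lines to a file under /etc/profile.d applies them to every login shell; appending them to a single user's ~/.bashrc works as well.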
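Verify that CUDA 9.0 was installed successfully
In a terminal, run:
nvcc -V
This prints the CUDA version information.
Next, try running one of the samples bundled with CUDA: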
- cd /usr/local/cuda-9.0/samples/1_Utilities/deviceQuery
- make
- ./deviceQuery
The output should report success:
- deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 2
- Result = PASS
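When this check runs as part of a provisioning script, the two values of interest can be pulled out of the deviceQuery output. A minimal sketch, assuming the "NumDevs = N" and "Result = PASS" wording shown above; the function names are hypothetical.

```shell
#!/bin/sh
# num_devs (hypothetical helper): print the device count reported by
# deviceQuery, read from stdin.
num_devs() {
    sed -n 's/.*NumDevs = \([0-9][0-9]*\).*/\1/p'
}

# query_result (hypothetical helper): print the word after "Result = ",
# read from stdin.
query_result() {
    sed -n 's/.*Result = \([A-Z][A-Z]*\).*/\1/p'
}

# Usage:
#   ./deviceQuery > /tmp/deviceQuery.log
#   [ "$(query_result < /tmp/deviceQuery.log)" = "PASS" ] && echo "GPUs: $(num_devs < /tmp/deviceQuery.log)"
```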
Uninstall
- To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.0/bin
- To uninstall the NVIDIA Driver, run nvidia-uninstall
4. Install cuDNN v7
https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html
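After the download finishes, extract the archive and copy its files into the CUDA directory by running the following commands in order: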
- tar -xzvf cudnn-9.0-linux-x64-v7.4.1.5.tgz
- sudo cp cuda/include/cudnn.h /usr/local/cuda/include
- sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
- sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
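A quick way to confirm which cuDNN version was copied is to read the version macros from the header. This is a sketch that assumes cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as the cuDNN 7 header does; the function name cudnn_version is hypothetical.

```shell
#!/bin/sh
# cudnn_version (hypothetical helper): print "major.minor.patch" from the
# version macros in the given header (default /usr/local/cuda/include/cudnn.h).
cudnn_version() {
    header="${1:-/usr/local/cuda/include/cudnn.h}"
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$header"
}

# Usage:
#   cudnn_version /usr/local/cuda/include/cudnn.h
```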
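To verify the installation, run a small demo.
If the code samples and user guide package is installed, the mnistCUDNN sample can be found under /usr/src/cudnn_samples_v7.
Copy the samples to any directory under your home directory:
$ cp -r /usr/src/cudnn_samples_v7/ $HOME
Enter the mnistCUDNN directory:
$ cd $HOME/cudnn_samples_v7/mnistCUDNN
Compile:
$ make clean && make
Run:
$ ./mnistCUDNN
If the installation succeeded, you will see this result:
Test passed!
Alternatively, running cmake in your caffe/build directory is another quick way to test whether the installation succeeded.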
5. Install the GPU build of TensorFlow (configure the pip mirror first)
$ sudo pip install tensorflow-gpu
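As root, create a .pip directory in the root user's home directory and create the file pip.conf inside it (/root/.pip/pip.conf) with the following content. This uses the Tsinghua mirror, which is quite fast: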
- [global]
- index-url=https://pypi.tuna.tsinghua.edu.cn/simple
Once this is configured, nothing else is required: pip install can now install any package through the mirror, and pip install tensorflow starts downloading quickly right away.
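The mirror configuration above can be written by a short script. This is a sketch using the same file content; the function name write_pip_conf is hypothetical, and the directory argument exists only so the helper can be tried out somewhere other than the real ~/.pip.

```shell
#!/bin/sh
# write_pip_conf (hypothetical helper): create pip.conf with the Tsinghua
# mirror in the given directory (default $HOME/.pip, i.e. /root/.pip for root).
# Note: this overwrites an existing pip.conf in that directory.
write_pip_conf() {
    pip_dir="${1:-$HOME/.pip}"
    mkdir -p "$pip_dir"
    cat > "$pip_dir/pip.conf" <<'EOF'
[global]
index-url=https://pypi.tuna.tsinghua.edu.cn/simple
EOF
}

# Usage (as root):
#   write_pip_conf
#   pip install tensorflow-gpu
```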
6. Test TensorFlow
After getting through all the bumps above, we have finally reached the testing step.
- [root@gpuserver ~]# python
- Python 2.7.5 (default, Nov 20 2015, 02:00:19)
- [GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
- Type "help", "copyright", "credits" or "license" for more information.
- >>> import tensorflow as tf
- >>> hello = tf.constant('Hello, TensorFlow!')
- >>> sess = tf.Session()
- 2018-12-12 17:10:51.572488: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
- >>> sess = tf.Session()
- >>> print(sess.run(hello))
- Hello, TensorFlow!
- >>>
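If the small example above runs correctly, congratulations: the GPU build of TensorFlow is installed. What are you waiting for? Go build something!
Install pip on CentOS 7.2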
- yum install -y epel-release
- yum install -y python-pip
7. Install kernel-devel
yum -y install kernel-devel
Configure CentOS 7.2 to boot into the graphical interface
- # systemctl get-default
- multi-user.target
- # systemctl set-default graphical.target
Appendix:
1. CUDA installation log
- Installing the NVIDIA display driver...
- Installing the CUDA Toolkit in /usr/local/cuda-10.0 ...
- Missing recommended library: libGLU.so
- Missing recommended library: libX11.so
- Missing recommended library: libXi.so
- Missing recommended library: libXmu.so
-
- Installing the CUDA Samples in /root ...
- Copying samples to /root/NVIDIA_CUDA-10.0_Samples now...
- Finished copying samples.
-
- ===========
- = Summary =
- ===========
-
- Driver: Installed
- Toolkit: Installed in /usr/local/cuda-10.0
- Samples: Installed in /root, but missing recommended libraries
-
- Please make sure that
- - PATH includes /usr/local/cuda-10.0/bin
- - LD_LIBRARY_PATH includes /usr/local/cuda-10.0/lib64, or, add /usr/local/cuda-10.0/lib64 to /etc/ld.so.conf and run ldconfig as root
-
- To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-10.0/bin
- To uninstall the NVIDIA Driver, run nvidia-uninstall
-
- Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.0/doc/pdf for detailed information on setting up CUDA.
-
- Logfile is /tmp/cuda_install_16878.log