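While surveying instance segmentation methods, I came across the TensorMask model, which requires the Detectron2 framework to be installed first. Detectron2's installation document, INSTALL.md, has no instructions for Windows and assumes a Linux gcc & g++ environment. This post documents how I compiled it on Windows.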
First, my setup: a Y9000P bought in March 2022, with an i7-12700H processor and an Nvidia RTX 3060 6GB graphics card.
I use a Python 3.8.8 virtual environment under Anaconda3 4.12.0, with CUDA 11.3, cuDNN 8.2.0, PyTorch 1.10.2+cu113, and Torchvision 0.11.3+cu113 installed.
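Before building anything, it can help to confirm that the PyTorch and Torchvision wheels in the active environment were built for the same CUDA version. The following is a minimal sketch that only reads package metadata; the helper names (`installed_version`, `split_local_tag`, `same_cuda_build`) are my own and not part of any library.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def split_local_tag(version):
    """Split '1.10.2+cu113' into ('1.10.2', 'cu113'); the tag is None when absent."""
    base, _, tag = version.partition("+")
    return base, (tag or None)


def same_cuda_build(torch_version, torchvision_version):
    """True when both version strings carry the same non-empty tag such as 'cu113'."""
    tag_a = split_local_tag(torch_version)[1]
    tag_b = split_local_tag(torchvision_version)[1]
    return tag_a is not None and tag_a == tag_b


if __name__ == "__main__":
    versions = {name: installed_version(name) for name in ("torch", "torchvision")}
    for name, version in versions.items():
        print(f"{name}: {version or 'not installed'}")
    if all(versions.values()):
        print("same CUDA build:", same_cuda_build(versions["torch"], versions["torchvision"]))
```

If the two tags differ (for example one package is a CPU-only build), the C++/CUDA extensions compiled later are likely to fail or misbehave, so it is worth fixing the environment first.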
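First, install the MSVC C++ build tools. I installed the "Desktop development with C++" workload for Visual Studio Enterprise 2019 through the Visual Studio Installer.
Microsoft Visual Studio Community 2022 also works; choose according to your needs.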
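After installation, add \<install path>\2019\Enterprise\VC\Auxiliary\Build\ to the PATH environment variable. For example, with Visual Studio installed in the Softwares folder on drive D, the entry is:
D:\Softwares\Microsoft Visual Studio\2019\Enterprise\VC\Auxiliary\Build\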
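After confirming, press Win+R, type cmd to open the Command Prompt, then run vcvars64 followed by cl to verify the configuration. If cl prints the Microsoft C/C++ compiler version banner rather than a "not recognized" error, the configuration is complete.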
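The same check can be scripted. This is a small sketch, assuming it is run from the same Command Prompt session in which vcvars64 was already executed (cl is only placed on PATH by that script); `locate_tools` is a hypothetical helper name.

```python
import shutil


def locate_tools(names):
    """Map each executable name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # vcvars64 is a batch file shipped with Visual Studio; cl is the MSVC compiler.
    for tool, path in locate_tools(["vcvars64", "cl"]).items():
        print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

A missing vcvars64 usually means the PATH entry from the previous step was not saved; a missing cl usually means vcvars64 has not been run in the current session.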
Clone the source with Git (or download the zip archive from GitHub and extract it), then run the following commands in order:
pip install opencv-python
git clone https://github.com/facebookresearch/detectron2.git detectron2
cd detectron2
SET DISTUTILS_USE_SDK=1
vcvars64
pip install -e .
The installation is complete when output like the following appears:
Installing collected packages: termcolor, tensorboard-plugin-wit, pywin32, pyasn1, mypy-extensions, antlr4-python3-runtime, zipp, urllib3, tomli, tensorboard-data-server, tabulate, six, rsa, pyyaml, pyparsing, pyasn1-modules, protobuf, portalocker, platformdirs, pathspec, oauthlib, MarkupSafe, kiwisolver, idna, future, fonttools, cycler, colorama, cloudpickle, charset-normalizer, cachetools, absl-py, yacs, werkzeug, tqdm, requests, python-dateutil, pydot, packaging, omegaconf, importlib-resources, importlib-metadata, grpcio, google-auth, fairscale, click, timm, requests-oauthlib, matplotlib, markdown, iopath, hydra-core, black, pycocotools, google-auth-oauthlib, fvcore, tensorboard, detectron2
Running setup.py develop for detectron2
Successfully installed detectron2-0.6
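Try running the demo program. Put 000000439715.jpg from COCO 2017 into the demo folder. To save the result, add the --output argument; without it, the result is displayed with OpenCV's imshow.
Run the following commands: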
cd demo
python demo.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --input 000000439715.jpg --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
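When switching between displaying and saving the result, it can be convenient to assemble the demo command programmatically. The sketch below only builds the argument list from the flags used above; `build_demo_command` and the output file name `result.jpg` are hypothetical, and the script is assumed to be started from the demo folder.

```python
import subprocess
import sys

CONFIG = "../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
WEIGHTS = (
    "detectron2://COCO-InstanceSegmentation/"
    "mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"
)


def build_demo_command(image, output=None):
    """Build the demo.py argument list; add --output only when a path is given."""
    cmd = [sys.executable, "demo.py", "--config-file", CONFIG, "--input", image]
    if output is not None:
        cmd += ["--output", output]
    # Keep --opts last, mirroring the command shown above.
    cmd += ["--opts", "MODEL.WEIGHTS", WEIGHTS]
    return cmd


if __name__ == "__main__":
    command = build_demo_command("000000439715.jpg", output="result.jpg")
    print(" ".join(command))
    # To actually run it from inside the demo folder, uncomment:
    # subprocess.run(command, check=True)
```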
From the root of the detectron2 repository, go to the projects/TensorMask directory and run the simple install command:
cd projects\TensorMask
pip install -e .
Unsurprisingly, this fails with an error:
note: This error originates from a subprocess, and is likely not a problem with pip.
Scrolling up reveals this error:
× python setup.py develop did not run successfully.
│ exit code: 1
╰─> [91 lines of output]
running develop
running egg_info
writing tensormask.egg-info\PKG-INFO
writing dependency_links to tensormask.egg-info\dependency_links.txt
writing top-level names to tensormask.egg-info\top_level.txt
reading manifest file 'tensormask.egg-info\SOURCES.txt'
writing manifest file 'tensormask.egg-info\SOURCES.txt'
running build_ext
building 'tensormask._C' extension
Emitting ninja build file D:\Workspace\exps\detectron2\projects\TensorMask\build\temp.win-amd64-3.8\Release\build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
D:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
D:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
[1/1] D:\Softwares\CUDA\v11.3\bin\nvcc --generate-dependencies-with-compile --dependency-output D:\Workspace\exps\detectron2\projects\TensorMask\build\temp.win-amd64-3.8\Release\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -DWITH_CUDA -ID:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc -ID:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\include -ID:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\include\torch\csrc\api\include -ID:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\include\TH -ID:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\include\THC -ID:\Softwares\CUDA\v11.3\include -ID:\Softwares\Anaconda3\envs\detectron2\include -ID:\Softwares\Anaconda3\envs\detectron2\Include "-ID:\Softwares\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\ATLMFC\include" "-ID:\Softwares\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include" "-ID:\Windows Kits\10\include\10.0.19041.0\ucrt" "-ID:\Windows Kits\10\include\10.0.19041.0\shared" "-ID:\Windows Kits\10\include\10.0.19041.0\um" "-ID:\Windows Kits\10\include\10.0.19041.0\winrt" "-ID:\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu -o D:\Workspace\exps\detectron2\projects\TensorMask\build\temp.win-amd64-3.8\Release\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.obj 
-D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86
FAILED: D:/Workspace/exps/detectron2/projects/TensorMask/build/temp.win-amd64-3.8/Release/Workspace/exps/detectron2/projects/TensorMask/tensormask/layers/csrc/SwapAlign2Nat/SwapAlign2Nat_cuda.obj
D:\Softwares\CUDA\v11.3\bin\nvcc --generate-dependencies-with-compile --dependency-output D:\Workspace\exps\detectron2\projects\TensorMask\build\temp.win-amd64-3.8\Release\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -DWITH_CUDA -ID:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc -ID:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\include -ID:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\include\torch\csrc\api\include -ID:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\include\TH -ID:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\include\THC -ID:\Softwares\CUDA\v11.3\include -ID:\Softwares\Anaconda3\envs\detectron2\include -ID:\Softwares\Anaconda3\envs\detectron2\Include "-ID:\Softwares\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\ATLMFC\include" "-ID:\Softwares\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include" "-ID:\Windows Kits\10\include\10.0.19041.0\ucrt" "-ID:\Windows Kits\10\include\10.0.19041.0\shared" "-ID:\Windows Kits\10\include\10.0.19041.0\um" "-ID:\Windows Kits\10\include\10.0.19041.0\winrt" "-ID:\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu -o D:\Workspace\exps\detectron2\projects\TensorMask\build\temp.win-amd64-3.8\Release\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.obj 
-D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86
D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu(438): error: no instance of function template "at::cuda::ATenCeilDiv" matches the argument list
argument types are: (int64_t, long)
D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu(438): error: no instance of overloaded function "std::min" matches the argument list
argument types are: (<error-type>, long)
D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu(495): error: no instance of function template "at::cuda::ATenCeilDiv" matches the argument list
argument types are: (int64_t, long)
D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu(495): error: no instance of overloaded function "std::min" matches the argument list
argument types are: (<error-type>, long)
4 errors detected in the compilation of "D:/Workspace/exps/detectron2/projects/TensorMask/tensormask/layers/csrc/SwapAlign2Nat/SwapAlign2Nat_cuda.cu".
SwapAlign2Nat_cuda.cu
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "D:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\utils\cpp_extension.py", line 1717, in _run_ninja_build
subprocess.run(
File "D:\Softwares\Anaconda3\envs\detectron2\lib\subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
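In output this long, the final CalledProcessError is only a symptom; the lines that matter are the compiler diagnostics buried in the middle. As a rough aid, the sketch below filters a saved build log for lines of the form `file(line): error: message`, which is the format nvcc uses in the log above. The function name `compiler_errors` and the idea of saving the log to a text file are my own assumptions.

```python
import re

# Matches lines such as "D:\path\file.cu(438): error: message"
ERROR_LINE = re.compile(r"^(?P<file>.+)\((?P<line>\d+)\): error: (?P<msg>.+)$")


def compiler_errors(log_text):
    """Return (file, line, message) for each compiler error line in a build log."""
    found = []
    for raw in log_text.splitlines():
        match = ERROR_LINE.match(raw.strip())
        if match:
            found.append(
                (match.group("file"), int(match.group("line")), match.group("msg"))
            )
    return found


if __name__ == "__main__":
    sample = "\n".join(
        [
            r"D:\Workspace\SwapAlign2Nat_cuda.cu(438): error: no instance of "
            r'function template "at::cuda::ATenCeilDiv" matches the argument list',
            "argument types are: (int64_t, long)",
            "ninja: build stopped: subcommand failed.",
        ]
    )
    for path, line, message in compiler_errors(sample):
        print(f"{path}:{line}: {message}")
```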
Note that, contrary to advice found online, changing ['ninja', '-v'] to ['ninja', '-V'] or ['ninja', '--version'] does not solve the problem. The actual cause of the error is here:
D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu(438): error: no instance of function template "at::cuda::ATenCeilDiv" matches the argument list
argument types are: (int64_t, long)
D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu(438): error: no instance of overloaded function "std::min" matches the argument list
argument types are: (<error-type>, long)
D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu(495): error: no instance of function template "at::cuda::ATenCeilDiv" matches the argument list
argument types are: (int64_t, long)
D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu(495): error: no instance of overloaded function "std::min" matches the argument list
argument types are: (<error-type>, long)
4 errors detected in the compilation of "D:/Workspace/exps/detectron2/projects/TensorMask/tensormask/layers/csrc/SwapAlign2Nat/SwapAlign2Nat_cuda.cu".
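As the messages show, the problem is in detectron2/projects/TensorMask/tensormask/layers/csrc/SwapAlign2Nat/SwapAlign2Nat_cuda.cu. The call mixes two different integer types, int64_t (the result of numel()) and long (the 512L and 4096L literals), so the compiler cannot deduce a single template type and the C++ source fails to compile. Open the file and change the data types on lines 438 and 495: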
// Line 438: original call commented out, fixed call below.
// The (int) cast assumes numel() fits in a 32-bit int.
// dim3 grid(std::min(at::cuda::ATenCeilDiv(Y.numel(), 512L), 4096L));
dim3 grid(std::min(at::cuda::ATenCeilDiv((int)Y.numel(), 512), 4096));
// Line 495: same change for the gradient tensor.
// dim3 grid(std::min(at::cuda::ATenCeilDiv(gY.numel(), 512L), 4096L));
dim3 grid(std::min(at::cuda::ATenCeilDiv((int)gY.numel(), 512), 4096));
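To make clear what these two lines compute, here is the same arithmetic as a Python sketch: a ceiling division of the element count by 512, capped at 4096 blocks. It also makes the limitation of the (int) cast explicit. The names `ceil_div`, `grid_blocks`, `chunk`, and `max_blocks` are my own labels, and the ceiling-division formula is my understanding of what ATenCeilDiv does rather than a quotation of its source.

```python
INT32_MAX = 2**31 - 1


def ceil_div(a, b):
    """Integer ceiling division: the smallest n such that n * b >= a."""
    return (a + b - 1) // b


def grid_blocks(numel, chunk=512, max_blocks=4096):
    """Mirror min(ceil_div(numel, 512), 4096) from the patched lines."""
    if numel > INT32_MAX:
        # The (int) cast in the C++ fix would silently truncate such a value.
        raise OverflowError("numel does not fit in a 32-bit int")
    return min(ceil_div(numel, chunk), max_blocks)


if __name__ == "__main__":
    for n in (1, 512, 513, 100_000_000):
        print(n, "->", grid_blocks(n))
```

For tensors with fewer than 2^31 elements the cast is harmless, which covers typical feature maps; a cast of the literals to int64_t instead would avoid the restriction altogether.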
Run pip install -e . again, and the build and installation succeed:
Obtaining file:///D:/Workspace/exps/detectron2/projects/TensorMask
Preparing metadata (setup.py) ... done
Installing collected packages: tensormask
Running setup.py develop for tensormask
Successfully installed tensormask-0.1
Try running TensorMask's training script, `train_net.py`:
python train_net.py --config-file configs/tensormask_R_50_FPN_1x.yaml SOLVER.IMS_PER_BATCH 1
Unsurprisingly, it fails again:
Traceback (most recent call last):
File "train_net.py", line 63, in <module>
launch(
File "d:\workspace\exps\detectron2\detectron2\engine\launch.py", line 82, in launch
main_func(*args)
File "train_net.py", line 55, in main
trainer = Trainer(cfg)
File "d:\workspace\exps\detectron2\detectron2\engine\defaults.py", line 378, in __init__
data_loader = self.build_train_loader(cfg)
File "d:\workspace\exps\detectron2\detectron2\engine\defaults.py", line 547, in build_train_loader
return build_detection_train_loader(cfg)
File "d:\workspace\exps\detectron2\detectron2\config\config.py", line 207, in wrapped
explicit_args = _get_args_from_config(from_config, *args, **kwargs)
File "d:\workspace\exps\detectron2\detectron2\config\config.py", line 245, in _get_args_from_config
ret = from_config_func(*args, **kwargs)
File "d:\workspace\exps\detectron2\detectron2\data\build.py", line 344, in _train_loader_from_config
dataset = get_detection_dataset_dicts(
File "d:\workspace\exps\detectron2\detectron2\data\build.py", line 241, in get_detection_dataset_dicts
dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in names]
File "d:\workspace\exps\detectron2\detectron2\data\build.py", line 241, in <listcomp>
dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in names]
File "d:\workspace\exps\detectron2\detectron2\data\catalog.py", line 58, in get
return f()
File "d:\workspace\exps\detectron2\detectron2\data\datasets\coco.py", line 500, in <lambda>
DatasetCatalog.register(name, lambda: load_coco_json(json_file, image_root, name))
File "d:\workspace\exps\detectron2\detectron2\data\datasets\coco.py", line 69, in load_coco_json
coco_api = COCO(json_file)
File "D:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\pycocotools\coco.py", line 81, in __init__
with open(annotation_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'datasets\\coco/annotations/instances_train2017.json'
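The message says that the COCO annotation file instances_train2017.json cannot be found. Here we create a directory junction named datasets (it behaves much like a symbolic link) that points to the local dataset path: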
REM D:\Workspace\data is where I keep my datasets; it must contain coco\annotations\instances_train2017.json
mklink /j datasets D:\Workspace\data
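Before launching training again, the dataset layout can be checked with a few lines of Python. This is a sketch under two assumptions: the annotation path is taken from the traceback above, while the train2017 image folder follows the standard COCO layout and is not confirmed by this post. To my knowledge Detectron2 also reads the DETECTRON2_DATASETS environment variable as an alternative root, so the sketch falls back to it; `missing_coco_train_files` is a hypothetical helper name.

```python
import os
from pathlib import Path


def missing_coco_train_files(root=None):
    """Return the expected COCO training paths that do not exist under root."""
    root = Path(root or os.getenv("DETECTRON2_DATASETS", "datasets"))
    expected = [
        # Path reported in the FileNotFoundError above.
        root / "coco" / "annotations" / "instances_train2017.json",
        # Assumption: images live in the standard COCO train2017 folder.
        root / "coco" / "train2017",
    ]
    return [str(path) for path in expected if not path.exists()]


if __name__ == "__main__":
    missing = missing_coco_train_files()
    if missing:
        print("Missing:")
        for path in missing:
            print("  ", path)
    else:
        print("COCO training files found.")
```

Run it from projects\TensorMask, the directory where the datasets junction was created.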
Running the training command again produces yet another error:
Traceback (most recent call last):
File "train_net.py", line 63, in <module>
launch(
File "d:\workspace\exps\detectron2\detectron2\engine\launch.py", line 82, in launch
main_func(*args)
File "train_net.py", line 55, in main
trainer = Trainer(cfg)
File "d:\workspace\exps\detectron2\detectron2\engine\defaults.py", line 396, in __init__
self.register_hooks(self.build_hooks())
File "d:\workspace\exps\detectron2\detectron2\engine\defaults.py", line 463, in build_hooks
ret.append(hooks.PeriodicWriter(self.build_writers(), period=20))
File "d:\workspace\exps\detectron2\detectron2\engine\defaults.py", line 475, in build_writers
return default_writers(self.cfg.OUTPUT_DIR, self.max_iter)
File "d:\workspace\exps\detectron2\detectron2\engine\defaults.py", line 248, in default_writers
TensorboardXWriter(output_dir),
File "d:\workspace\exps\detectron2\detectron2\utils\events.py", line 145, in __init__
from torch.utils.tensorboard import SummaryWriter
File "D:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\utils\tensorboard\__init__.py", line 4, in <module>
LooseVersion = distutils.version.LooseVersion
AttributeError: module 'distutils' has no attribute 'version'
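The cause is that the setuptools version is too new. My environment had 61.2.0, so I force-installed 59.5.0 (pip install setuptools==59.5.0) [3] and ran the training command again.
Training now starts, which concludes the installation tutorial.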
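To check in advance whether an environment is likely to hit this distutils error, the installed setuptools version can be compared with the one that worked here. A minimal sketch follows; treating every version above 59.5.0 as suspect is my own conservative assumption, since this post only establishes that 61.2.0 fails and 59.5.0 works. The helper names are hypothetical.

```python
import re
from importlib import metadata

KNOWN_GOOD = (59, 5, 0)  # the version reported as working in this post


def parse_version(text):
    """Turn '61.2.0' into (61, 2, 0); missing or non-numeric parts become 0."""
    numbers = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def newer_than_known_good(version_text):
    """True when the given setuptools version is above the known-good one."""
    return parse_version(version_text) > KNOWN_GOOD


if __name__ == "__main__":
    try:
        current = metadata.version("setuptools")
    except metadata.PackageNotFoundError:
        current = None
    if current is None:
        print("setuptools is not installed")
    elif newer_than_known_good(current):
        print(f"setuptools {current} is newer than 59.5.0; the error may appear")
    else:
        print(f"setuptools {current} should be fine")
```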
2022.8.7
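I was also tormented by detectron2 for several days. After carefully reading through a lot of material, I found that many proposed solutions amount to changing the command to ['ninja', '-V']. After that change, the build fails with LINK : fatal error LNK1181: cannot open input file ... instead. In fact, that error only appears because the ninja compilation step failed and never produced the compiled object files the linker needs.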
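So it is necessary to trace back to the origin of the error, which turned out to be lines 438 and 495 of tensormask/layers/csrc/SwapAlign2Nat/SwapAlign2Nat_cuda.cu, where mismatched data types cause the compilation to fail. The likely reason is that on 64-bit Linux with gcc & g++, int64_t is defined as long, so the two types are interchangeable, whereas under MSVC on Windows long is a 32-bit integer distinct from int64_t (long long), so the argument types must be made to match exactly.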
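The platform difference is easy to observe from Python through ctypes, which reports the sizes the platform's C compiler uses. The sketch below is only an illustration: equal size does not by itself make two C++ types identical, but a 4-byte long (as on Windows) can never be the same type as the 8-byte int64_t. The function names are my own.

```python
import ctypes


def c_integer_sizes():
    """Return the size in bytes of several C integer types on this platform."""
    return {
        "int": ctypes.sizeof(ctypes.c_int),
        "long": ctypes.sizeof(ctypes.c_long),
        "long long": ctypes.sizeof(ctypes.c_longlong),
        "int64_t": ctypes.sizeof(ctypes.c_int64),
    }


def long_is_64_bit():
    """True on platforms where C long is 8 bytes, such as 64-bit Linux."""
    return ctypes.sizeof(ctypes.c_long) == 8


if __name__ == "__main__":
    for name, size in c_integer_sizes().items():
        print(f"{name}: {size} bytes")
    print("long is 64-bit:", long_is_64_bit())
```

On 64-bit Windows this reports 4 bytes for long, while on 64-bit Linux it reports 8, which matches the behavior seen in the build.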
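Tracing the error to its root made me re-examine myself: under time pressure I had forgotten the original purpose of learning, which is that when you find an error, you trace it to its source and propose a solution.
Throughout this period I kept finding time to read up on C++. Someone who learns Python from scratch may never touch C++ and therefore may not find the right solution. I am sharing this to mark rediscovering my original motivation for learning and research, and also to help readers who are struggling with compiling on Windows. If I have time soon, I will post an update on training and validating detectron2 with a custom dataset.
If you have questions, feel free to reply in the comments or send me a private message; I usually respond right away. I am a beginner too, haha.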
[1] The latest Detectron 2 (0.6) installation procedure for 2022 (Lenovo Y9000K laptop + Anaconda + Win 11 + RTX3070)
[2] Detectron2 0.2.1 installation (Windows 10)
[3] Solution for AttributeError: module 'distutils' has no attribute 'version'