• MindSpore: [resnet_thor model] "Could not convert to" error when trying to run resnet_thor


    Problem description:

    [Functional module]

    Running resnet_thor (repository: https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet_thor) with mindspore-ascend-1.1.1 fails with an error.

    [Steps to reproduce & symptoms]

    1. Extract the imagenet2012 dataset

    2. Comment out lines 160-162 in src/dataset_helper.py (otherwise an exception is thrown there)

    3. cd resnet_thor && python train.py --dataset_path=/home/ImageNet2012_origin

    Error message:
    WARNING: 'ControlDepend' is deprecated from version 1.1 and will be removed in a future version, use 'Depend' instead.
    [ERROR] CORE(167346,python):2021-03-31-17:06:03.564.646 [mindspore/core/utils/status.cc:43] Status] Thread ID 281470327271920 Unexpected error. Could not convert to CV Tensor
    Line of code : 142
    File : /home/jenkins/agent-working-dir/workspace/Compile_Ascend_ARM_Ubuntu/mindspore/mindspore/ccsrc/minddata/dataset/kernels/image/image_utils.cc

    Traceback (most recent call last):
    File "train.py", line 143, in <module>
    model.train(config.epoch_size, dataset, callbacks=cb)
    File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/model.py", line 592, in train
    sink_size=sink_size)
    File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/model.py", line 391, in _train
    self._train_dataset_sink_process(epoch, train_dataset, list_callback, cb_params, sink_size)
    File "/home/resnet_thor/src/model_thor.py", line 183, in _train_dataset_sink_process
    iter_first_order=iter_first_order)
    File "/home/resnet_thor/src/model_thor.py", line 122, in _exec_preprocess
    dataset_helper = DatasetHelper(dataset, dataset_sink_mode, sink_size, epoch_num, iter_first_order)
    File "/home/resnet_thor/src/dataset_helper.py", line 72, in __init__
    self.iter = iterclass(dataset, sink_size, epoch_num, iter_first_order)
    File "/home/resnet_thor/src/dataset_helper.py", line 156, in __init__
    super().__init__(dataset, sink_size, epoch_num)
    File "/home/resnet_thor/src/dataset_helper.py", line 106, in __init__
    dataset.transfer_dataset = _exec_datagraph(dataset, self.sink_size)
    File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/_utils.py", line 62, in _exec_datagraph
    dataset_types, dataset_shapes = _get_types_and_shapes(exec_dataset)
    File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/_utils.py", line 51, in _get_types_and_shapes
    dataset_types = _convert_type(dataset.output_types())
    File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/dataset/engine/datasets.py", line 1443, in output_types
    self.saved_output_shapes = runtime_getter[0].GetOutputShapes()
    RuntimeError: Thread ID 281470327271920 Unexpected error. Could not convert to CV Tensor
    Line of code : 142
    File : /home/jenkins/agent-working-dir/workspace/Compile_Ascend_ARM_Ubuntu/mindspore/mindspore/ccsrc/minddata/dataset/kernels/image/image_utils.cc



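    Solution:

    Judging from the error, the dataset is probably being used incorrectly: the dataset path most likely does not point at the training-level directory. Check the dataset layout; you could try: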

    python train.py --dataset_path=/home/ImageNet2012_origin/train
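Before launching training, the directory layout can be checked with a small script. The following is only a minimal sketch: it assumes the loader expects a `root/<class_name>/<image files>` layout, and the helper name `check_image_folder` and the extension list are hypothetical, not part of MindSpore or the resnet_thor scripts.

```python
from pathlib import Path

# Extensions treated as images here; an assumption made for this sketch.
IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png", ".bmp"}


def check_image_folder(root):
    """Return a list of layout problems found under root.

    Assumed layout: root/<class_name>/<image files>.
    An empty list means nothing suspicious was found.
    """
    root = Path(root)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = []
    class_dirs = [p for p in sorted(root.iterdir()) if p.is_dir()]
    if not class_dirs:
        problems.append(f"{root} contains no class subdirectories")

    for class_dir in class_dirs:
        for entry in sorted(class_dir.iterdir()):
            if entry.is_dir():
                # A directory where an image is expected suggests the path
                # passed in is one level too high (e.g. the parent of train/).
                problems.append(
                    f"{entry} is a nested directory; is {root} one level too high?"
                )
            elif entry.suffix.lower() not in IMAGE_EXTENSIONS:
                problems.append(f"{entry} does not look like an image file")
    return problems
```

Calling `check_image_folder("/home/ImageNet2012_origin")` on a dataset that still has `train/` and `val/` at the top level would report nested directories, which hints that `--dataset_path` should point one level deeper.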

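    After making the change suggested by @zhaoting_731, the original problem was resolved, but I ran into a new error.

    It looks like it may be related to HCCL multi-device training, but the command I ran was: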

    python train.py --dataset_path=/home/ImageNet2012_origin/ilsvrc

    So run_distribute keeps its default value of False, and training should be taking the single-device path.
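The reasoning about the default can be illustrated with a small argparse sketch. The argument names come from this thread, but the exact definitions below (types, defaults) are an assumption and may differ from what train.py actually declares.

```python
import argparse
import ast


def build_parser():
    # Argument names mirror those mentioned in this thread; the exact
    # definitions in train.py are assumed, not copied from it.
    parser = argparse.ArgumentParser(description="flag-default sketch")
    parser.add_argument("--run_distribute", type=ast.literal_eval, default=False)
    parser.add_argument("--dataset_path", type=str, default=None)
    return parser


# Only --dataset_path is passed, as in the command above.
args = build_parser().parse_args(
    ["--dataset_path=/home/ImageNet2012_origin/ilsvrc"]
)
print(args.run_distribute)  # the flag was omitted, so the default applies
```

Under these assumptions the script itself does not request distributed training, which is why the HCCL error that follows is surprising.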

    Error message:

    WARNING: 'ControlDepend' is deprecated from version 1.1 and will be removed in a future version, use 'Depend' instead.

    WARNING: 'ControlDepend' is deprecated from version 1.1 and will be removed in a future version, use 'Depend' instead.

    [ERROR] HCCL_ADPT(78728,python):2021-04-06-20:10:05.673.721 [mindspore/ccsrc/runtime/hccl_adapter/hccl_adapter.cc:124] GenTask] : The pointer[ops_kernel_builder] is null.

    Traceback (most recent call last):

      File "train.py", line 143, in <module>

        model.train(config.epoch_size, dataset, callbacks=cb)

      File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/model.py", line 592, in train

        sink_size=sink_size)

      File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/model.py", line 391, in _train

        self._train_dataset_sink_process(epoch, train_dataset, list_callback, cb_params, sink_size)

      File "/home/thor/mindspore/model_zoo/official/cv/resnet_thor/src/model_thor.py", line 254, in _train_dataset_sink_process

        outputs = self._train_network(*inputs)

      File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/nn/cell.py", line 322, in __call__

        out = self.compile_and_run(*inputs)

      File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/nn/cell.py", line 578, in compile_and_run

        self.compile(*inputs)

      File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/nn/cell.py", line 565, in compile

        _executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)

      File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/common/api.py", line 505, in compile

        result = self._executor.compile(obj, args_list, phase, use_vm)

    RuntimeError: mindspore/ccsrc/runtime/hccl_adapter/hccl_adapter.cc:124 GenTask] : The pointer[ops_kernel_builder] is null.

     

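    This model zoo example mainly targets the multi-device scenario. We have since merged the resnet and resnet_thor scripts into resnet. If you want to run single-device training, we recommend using the code in the resnet directory: change the optimizer in src/config.py to Thor, then run training as described in the README. For example: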

    python train.py --net=resnet50 --dataset=imagenet2012 --device_target=Ascend --dataset_path=[DATASET_PATH]
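The optimizer change in src/config.py might look roughly like the sketch below. It is purely illustrative: the dictionary, the `optimizer` key name, the `Momentum` default, and the `epoch_size` value are all assumptions, and the real config file may use different keys and a different container type.

```python
import copy

# Hypothetical stand-in for the settings in src/config.py; the real file may
# use different keys, values, and container types.
base_config = {
    "optimizer": "Momentum",  # assumed key name and default value
    "epoch_size": 90,         # illustrative value only
}


def with_optimizer(config, name):
    """Return a copy of config with the optimizer entry replaced."""
    updated = copy.deepcopy(config)
    updated["optimizer"] = name
    return updated


# Switch the (assumed) optimizer entry to Thor, as the reply recommends.
thor_config = with_optimizer(base_config, "Thor")
```

In practice the edit would be made directly in src/config.py; check the README in the resnet directory for the exact setting name.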
  • Original article: https://blog.csdn.net/weixin_45666880/article/details/126059791