    Huawei Bug Report: Is the Huawei NPU Really "Far Ahead"?

    This post is a bug report I filed with the Ascend / pytorch community. In it I test the NPU's actual compute performance, and I also found a sizeable gap between the Huawei NPU's actual device memory and what was advertised at the point of sale (see Problem 1 for the compute issue and Problem 2 for the memory issue).
    So is it really "far ahead", or something else?

    Bug summary

    The NPU in this machine is an Atlas 300I Pro.
    This issue reports two problems in total, with my environment information attached at the end; I hope to get an explanation soon.
    Problem 1: inference is slow and eventually crashes, at only 1.20 it/s.
    Problem 2: NPU memory accounting seems to differ from GPUs. Why?

    Problem 1:

    Code that triggers the error:

    fp16 inference on the NPU:

    import torch
    import torch_npu
    from accelerate import Accelerator
    accelerator = Accelerator()
    from accelerate import dispatch_model
    
    device = accelerator.device
    
    # source '/home/HwHiAiUser/Ascend/ascend-toolkit/set_env.sh'
    
    x = torch.randn(2, 2).npu()
    y = torch.randn(2, 2).npu()
    z = x.mm(y)
    
    print(z)
    print(device)
    
    # from modelscope import Model, AutoTokenizer
    
    
    # model = Model.from_pretrained("modelscope/Llama-2-7b-ms", revision='v1.0.1', device_map=device, torch_dtype=torch.float16)
    # tokenizer = AutoTokenizer.from_pretrained("modelscope/Llama-2-7b-ms", revision='v1.0.1')
    
    # prompt = "Hey, are you conscious? Can you talk to me?"
    # inputs = tokenizer(prompt, return_tensors="pt")
    
    # # Generate
    # generate_ids = model.generate(inputs.input_ids.to(model.device), max_length=30)
    # print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
    from modelscope import AutoModelForCausalLM, AutoTokenizer, snapshot_download
    from modelscope import GenerationConfig
    
    # Note: The default behavior now has injection attack prevention off.
    model_dir = snapshot_download("qwen/Qwen-7B-Chat", revision='v1.1.4')
    
    tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
    
    # use bf16
    # model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, bf16=True).eval()
    # use fp16
    model = AutoModelForCausalLM.from_pretrained(model_dir, device_map=device, trust_remote_code=True, fp16=True).eval()
    # use cpu only
    # model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="cpu", trust_remote_code=True).eval()
    # use auto mode, automatically select precision based on the device.
    # model = AutoModelForCausalLM.from_pretrained(model_dir, device_map=device, trust_remote_code=True).eval()
    
    # Specify hyperparameters for generation
    model.generation_config = GenerationConfig.from_pretrained(model_dir, trust_remote_code=True) # generation length, top_p, and other hyperparameters can be set here
    
    # 1st dialogue turn
    response, history = model.chat(tokenizer, "你好", history=None)
    print(response)
    # 你好!很高兴为你提供帮助。
    
    # 2nd dialogue turn
    response, history = model.chat(tokenizer, "给我讲一个年轻人奋斗创业最终取得成功的故事。", history=history)
    print(response)
    # 这是一个关于一个年轻人奋斗创业最终取得成功的故事。
    # 故事的主人公叫李明,他来自一个普通的家庭,父母都是普通的工人。从小,李明就立下了一个目标:要成为一名成功的企业家。
    # 为了实现这个目标,李明勤奋学习,考上了大学。在大学期间,他积极参加各种创业比赛,获得了不少奖项。他还利用课余时间去实习,积累了宝贵的经验。
    # 毕业后,李明决定开始自己的创业之路。他开始寻找投资机会,但多次都被拒绝了。然而,他并没有放弃。他继续努力,不断改进自己的创业计划,并寻找新的投资机会。
    # 最终,李明成功地获得了一笔投资,开始了自己的创业之路。他成立了一家科技公司,专注于开发新型软件。在他的领导下,公司迅速发展起来,成为了一家成功的科技企业。
    # 李明的成功并不是偶然的。他勤奋、坚韧、勇于冒险,不断学习和改进自己。他的成功也证明了,只要努力奋斗,任何人都有可能取得成功。
    
    # 3rd dialogue turn
    response, history = model.chat(tokenizer, "给这个故事起一个标题", history=history)
    print(response)
    # 《奋斗创业:一个年轻人的成功之路》
    
    
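    A side note on the throughput number: the "1.20it/s" figure matches the checkpoint-shard loading bar in the error log further below, not generation itself. To put a number on generation speed directly, here is a minimal timing sketch; it assumes `model` and `tokenizer` are already loaded as above, and `measure_tokens_per_second` is a hypothetical helper of mine, not part of the original reproduction:

    import time
    import torch
    import torch_npu  # registers the `npu` device and the torch.npu namespace

    # Hypothetical helper (my sketch): time greedy decoding and report decoded
    # tokens per second. Assumes `model` and `tokenizer` from the code above.
    def measure_tokens_per_second(model, tokenizer, prompt="你好", max_new_tokens=64):
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        torch.npu.synchronize()            # drain any pending NPU work first
        start = time.perf_counter()
        out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
        torch.npu.synchronize()            # wait until generation really finishes
        elapsed = time.perf_counter() - start
        new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
        return new_tokens / elapsed

    # print(f"{measure_tokens_per_second(model, tokenizer):.2f} tokens/s")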

    Addendum: I modified the code so that I can tell at which step the error occurs; the full console output is included under the error output below.

    import torch
    import torch_npu
    from accelerate import Accelerator
    accelerator = Accelerator()
    from accelerate import dispatch_model
    
    device = accelerator.device
    
    # source '/home/HwHiAiUser/Ascend/ascend-toolkit/set_env.sh'
    
    x = torch.randn(2, 2).npu()
    y = torch.randn(2, 2).npu()
    z = x.mm(y)
    
    print(z)
    print(device)
    
    # from modelscope import Model, AutoTokenizer
    
    
    # model = Model.from_pretrained("modelscope/Llama-2-7b-ms", revision='v1.0.1', device_map=device, torch_dtype=torch.float16)
    # tokenizer = AutoTokenizer.from_pretrained("modelscope/Llama-2-7b-ms", revision='v1.0.1')
    
    # prompt = "Hey, are you conscious? Can you talk to me?"
    # inputs = tokenizer(prompt, return_tensors="pt")
    
    # # Generate
    # generate_ids = model.generate(inputs.input_ids.to(model.device), max_length=30)
    # print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
    from modelscope import AutoModelForCausalLM, AutoTokenizer, snapshot_download
    from modelscope import GenerationConfig
    
    # Note: The default behavior now has injection attack prevention off.
    model_dir = snapshot_download("qwen/Qwen-7B-Chat", revision='v1.1.4')
    
    tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
    
    # use bf16
    # model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, bf16=True).eval()
    # use fp16
    model = AutoModelForCausalLM.from_pretrained(model_dir, device_map=device, trust_remote_code=True, fp16=True).eval()
    # use cpu only
    # model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="cpu", trust_remote_code=True).eval()
    # use auto mode, automatically select precision based on the device.
    # model = AutoModelForCausalLM.from_pretrained(model_dir, device_map=device, trust_remote_code=True).eval()
    
    # Specify hyperparameters for generation
    model.generation_config = GenerationConfig.from_pretrained(model_dir, trust_remote_code=True) # generation length, top_p, and other hyperparameters can be set here
    response, history = model.chat(tokenizer, input("请输入问题:"), history=None)
    print(response)
    while True:
    # subsequent dialogue turns
        response, history = model.chat(tokenizer, input("请输入问题:"), history=history)
        print(response)
    
    
    Error output:

    Inference with the code above takes roughly as long as pure CPU inference, and the third dialogue turn crashes. Console output:

    (NPU) (base) [HwHiAiUser@bogon Code]$ /home/HwHiAiUser/下载/yes/envs/NPU/bin/python /home/HwHiAiUser/Code/main.py
    Warning: Device do not support double dtype now, dtype cast repalce with float.
    tensor([[-1.8972, -0.0742],
            [-0.6470, -0.0174]], device='npu:0')
    npu
    2023-10-23 00:27:21,121 - modelscope - INFO - PyTorch version 2.1.0+cpu Found.
    2023-10-23 00:27:21,122 - modelscope - INFO - Loading ast index from /home/HwHiAiUser/.cache/modelscope/ast_indexer
    2023-10-23 00:27:21,141 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 068f7e60e6f05d224ec8ad9a969f5922 and a total number of 943 components indexed
    2023-10-23 00:27:21,675 - modelscope - INFO - Use user-specified model revision: v1.1.4
    /home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/tiktoken/core.py:50: ResourceWarning: unclosed 
      self._core_bpe = _tiktoken.CoreBPE(mergeable_ranks, special_tokens, pat_str)
    Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023.请使用最新模型和代码,尤其如果你在9月25日前已经开始使用Qwen-7B,千万注意不要使用错误代码和模型。
    Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
    Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
    Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
    Loading checkpoint shards: 100%|██████████████████████| 8/8 [00:06<00:00,  1.20it/s]
    [W OpCommand.cpp:75] Warning: [Check][offset] Check input storage_offset[%ld] = 0 failed, result is untrustworthy4096 (function operator())
    /home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/transformers/generation/logits_process.py:407: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/_internal/cpython-3.9.0/lib/python3.9/site-packages/torch/include/ATen/core/LegacyTypeDispatch.h:74.)
      sorted_indices_to_remove[..., -self.min_tokens_to_keep :] = 0
    [W AddKernelNpu.cpp:86] Warning: The oprator of add is executed, Currently High Accuracy but Low Performance OP with 64-bit has been used, Please Do Some Cast at Python Functions with 32-bit for Better Performance! (function operator())
    [W NeKernelNpu.cpp:28] Warning: The oprator of ne is executed, Currently High Accuracy but Low Performance OP with 64-bit has been used, Please Do Some Cast at Python Functions with 32-bit for Better Performance! (function operator())
    你好!有什么我能为你做的吗?
    好的,我给你讲一个年轻人奋斗创业最终取得成功的故事。这个故事叫做《奋斗》。
    
    故事的主人公是一个叫做李明的年轻人,他出生在一个普通的家庭,但他有一个梦想,那就是成为一名企业家。他从小就对创业有着浓厚的兴趣,经常参加各种创业比赛,也曾经在大学期间创办过一家小型的创业公司。
    
    然而,创业的道路并不容易,李明经历了许多挫折和困难。他的公司一度面临破产的危险,但他并没有放弃,而是更加努力地工作,寻找新的机会和资源。
    
    最终,李明的努力得到了回报,他的公司开始慢慢发展起来,他也因此获得了许多荣誉和奖励。他的故事告诉我们,只要有梦想,有勇气,有毅力,就一定能够实现自己的创业梦想。
    EZ9999: Inner Error!
    EZ9999  Kernel task happen error, retCode=0x28, [aicpu timeout].[FUNC:PreCheckTaskErr][FILE:task_info.cc][LINE:1574]
            TraceBack (most recent call last):
            rtStreamSynchronizeWithTimeout execute failed, reason=[aicpu timeout][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:50]
            synchronize stream failed, runtime result = 507017[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
    
    
    DEVICE[0] PID[18400]: 
    EXCEPTION STREAM:
      Exception info:TGID=18400, model id=65535, stream id=3, stream phase=3
      Message info[0]:RTS_HWTS: Aicpu timeout, slot_id=12, stream_id=3, task_id=6200
        Other info[0]:time=2023-10-23-00:50:30.892.993, function=process_hwts_timeout_exception, line=3745, error code=0x28
    Traceback (most recent call last):
      File "/home/HwHiAiUser/Code/main.py", line 66, in 
        response, history = model.chat(tokenizer, "给这个故事起一个标题", history=history)
      File "/home/HwHiAiUser/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 1199, in chat
        outputs = self.generate(
      File "/home/HwHiAiUser/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 1318, in generate
        return super().generate(
      File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/transformers/generation/utils.py", line 1652, in generate
        return self.sample(
      File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/transformers/generation/utils.py", line 2793, in sample
        if unfinished_sequences.max() == 0:
    RuntimeError: ACL stream synchronize failed.
    [W NPUStream.cpp:372] Warning: NPU warning, error code is 507017[Error]: 
    [Error]: The aicpu execution times out. 
            Rectify the fault based on the error information in the log, or you can ask us at follwing gitee link by issues: https://gitee.com/ascend/pytorch/issue
    EH9999: Inner Error!
            rtDeviceSynchronize execute failed, reason=[aicpu timeout][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:50]
    EH9999  wait for compute device to finish failed, runtime result = 507017.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
            TraceBack (most recent call last):
     (function npuSynchronizeDevice)
    [W NPUStream.cpp:372] Warning: NPU warning, error code is 507017[Error]: 
    [Error]: The aicpu execution times out. 
            Rectify the fault based on the error information in the log, or you can ask us at follwing gitee link by issues: https://gitee.com/ascend/pytorch/issue
    EH9999: Inner Error!
            rtDeviceSynchronize execute failed, reason=[aicpu timeout][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:50]
    EH9999  wait for compute device to finish failed, runtime result = 507017.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
            TraceBack (most recent call last):
     (function npuSynchronizeDevice)
    /home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/tempfile.py:821: ResourceWarning: Implicitly cleaning up 
      _warnings.warn(warn_message, ResourceWarning)
    
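    An aside on the AddKernelNpu / NeKernelNpu warnings in the log above: they report that 64-bit operands fall onto a high-accuracy but low-performance path and suggest casting to 32-bit on the Python side. A minimal illustration of that suggestion (my own sketch, not from the original report):

    import torch
    import torch_npu

    # The warnings complain about int64 operands; keeping index/mask tensors
    # in int32 on the NPU should hit the faster 32-bit kernel path instead.
    ids64 = torch.arange(10, dtype=torch.int64)
    ids32 = ids64.to(torch.int32).npu()   # cast before the op runs on the NPU
    mask = ids32.ne(0)                    # `ne` now sees 32-bit inputs
    print(mask)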

    Output of the modified code:

    (NPU) (base) [HwHiAiUser@bogon Code]$ /home/HwHiAiUser/下载/yes/envs/NPU/bin/python /home/HwHiAiUser/Code/main.py
    Warning: Device do not support double dtype now, dtype cast repalce with float.
    tensor([[-0.3798,  0.5290],
            [-0.7580, -0.6727]], device='npu:0')
    npu
    2023-10-23 08:28:35,064 - modelscope - INFO - PyTorch version 2.1.0+cpu Found.
    2023-10-23 08:28:35,065 - modelscope - INFO - Loading ast index from /home/HwHiAiUser/.cache/modelscope/ast_indexer
    2023-10-23 08:28:35,084 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 068f7e60e6f05d224ec8ad9a969f5922 and a total number of 943 components indexed
    2023-10-23 08:28:35,538 - modelscope - INFO - Use user-specified model revision: v1.1.4
    /home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/tiktoken/core.py:50: ResourceWarning: unclosed 
      self._core_bpe = _tiktoken.CoreBPE(mergeable_ranks, special_tokens, pat_str)
    Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023.请使用最新模型和代码,尤其如果你在9月25日前已经开始使用Qwen-7B,千万注意不要使用错误代码和模型。
    Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
    Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
    Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
    Loading checkpoint shards: 100%|██████████████████████| 8/8 [00:05<00:00,  1.37it/s]
    请输入问题:你好!你是谁
    [W OpCommand.cpp:75] Warning: [Check][offset] Check input storage_offset[%ld] = 0 failed, result is untrustworthy4096 (function operator())
    /home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/transformers/generation/logits_process.py:407: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/_internal/cpython-3.9.0/lib/python3.9/site-packages/torch/include/ATen/core/LegacyTypeDispatch.h:74.)
      sorted_indices_to_remove[..., -self.min_tokens_to_keep :] = 0
    [W AddKernelNpu.cpp:86] Warning: The oprator of add is executed, Currently High Accuracy but Low Performance OP with 64-bit has been used, Please Do Some Cast at Python Functions with 32-bit for Better Performance! (function operator())
    [W NeKernelNpu.cpp:28] Warning: The oprator of ne is executed, Currently High Accuracy but Low Performance OP with 64-bit has been used, Please Do Some Cast at Python Functions with 32-bit for Better Performance! (function operator())
    我是通义千问,由阿里云开发的AI助手。我被设计用来回答各种问题、提供信息和与用户进行对话。有什么我可以帮助你的吗?
    请输入问题:
    

    Problem 2:

    Code that triggers the error:

    import torch
    import torch_npu
    from accelerate import Accelerator
    accelerator = Accelerator()
    from accelerate import dispatch_model
    
    device = accelerator.device
    
    # source '/home/HwHiAiUser/Ascend/ascend-toolkit/set_env.sh'
    
    x = torch.randn(2, 2).npu()
    y = torch.randn(2, 2).npu()
    z = x.mm(y)
    
    print(z)
    print(device)
    
    # from modelscope import Model, AutoTokenizer
    
    
    # model = Model.from_pretrained("modelscope/Llama-2-7b-ms", revision='v1.0.1', device_map=device, torch_dtype=torch.float16)
    # tokenizer = AutoTokenizer.from_pretrained("modelscope/Llama-2-7b-ms", revision='v1.0.1')
    
    # prompt = "Hey, are you conscious? Can you talk to me?"
    # inputs = tokenizer(prompt, return_tensors="pt")
    
    # # Generate
    # generate_ids = model.generate(inputs.input_ids.to(model.device), max_length=30)
    # print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
    from modelscope import AutoModelForCausalLM, AutoTokenizer, snapshot_download
    from modelscope import GenerationConfig
    
    # Note: The default behavior now has injection attack prevention off.
    model_dir = snapshot_download("qwen/Qwen-7B-Chat", revision='v1.1.4')
    
    tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
    
    # use bf16
    # model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, bf16=True).eval()
    # use fp16
    # model = AutoModelForCausalLM.from_pretrained(model_dir, device_map=device, trust_remote_code=True, fp16=True).eval()
    # use cpu only
    # model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="cpu", trust_remote_code=True).eval()
    # use auto mode, automatically select precision based on the device.
    model = AutoModelForCausalLM.from_pretrained(model_dir, device_map=device, trust_remote_code=True).eval()
    
    # Specify hyperparameters for generation
    model.generation_config = GenerationConfig.from_pretrained(model_dir, trust_remote_code=True) # generation length, top_p, and other hyperparameters can be set here
    
    # 1st dialogue turn
    response, history = model.chat(tokenizer, "你好", history=None)
    print(response)
    # 你好!很高兴为你提供帮助。
    
    # 2nd dialogue turn
    response, history = model.chat(tokenizer, "给我讲一个年轻人奋斗创业最终取得成功的故事。", history=history)
    print(response)
    # 这是一个关于一个年轻人奋斗创业最终取得成功的故事。
    # 故事的主人公叫李明,他来自一个普通的家庭,父母都是普通的工人。从小,李明就立下了一个目标:要成为一名成功的企业家。
    # 为了实现这个目标,李明勤奋学习,考上了大学。在大学期间,他积极参加各种创业比赛,获得了不少奖项。他还利用课余时间去实习,积累了宝贵的经验。
    # 毕业后,李明决定开始自己的创业之路。他开始寻找投资机会,但多次都被拒绝了。然而,他并没有放弃。他继续努力,不断改进自己的创业计划,并寻找新的投资机会。
    # 最终,李明成功地获得了一笔投资,开始了自己的创业之路。他成立了一家科技公司,专注于开发新型软件。在他的领导下,公司迅速发展起来,成为了一家成功的科技企业。
    # 李明的成功并不是偶然的。他勤奋、坚韧、勇于冒险,不断学习和改进自己。他的成功也证明了,只要努力奋斗,任何人都有可能取得成功。
    
    # 3rd dialogue turn
    response, history = model.chat(tokenizer, "给这个故事起一个标题", history=history)
    print(response)
    # 《奋斗创业:一个年轻人的成功之路》
    

    Error output:

    Loading an unquantized, full-precision 7B model for inference would never take more than 20 GB of memory on a GPU, yet here the NPU runs out of memory.
    Moreover, the Atlas 300I Pro was advertised with 24 GB of memory when I bought it, but the card I received only exposes about 20 GB; I would like a reasonable explanation for that.
    Below are the execution output and the error from the first code snippet:

    (NPU) (base) [HwHiAiUser@bogon Code]$ /home/HwHiAiUser/下载/yes/envs/NPU/bin/python /home/HwHiAiUser/Code/main.py
    Warning: Device do not support double dtype now, dtype cast repalce with float.
    tensor([[ 0.0766,  0.2028],
            [-2.3419, -1.6132]], device='npu:0')
    npu
    2023-10-23 07:56:42,901 - modelscope - INFO - PyTorch version 2.1.0+cpu Found.
    2023-10-23 07:56:42,901 - modelscope - INFO - Loading ast index from /home/HwHiAiUser/.cache/modelscope/ast_indexer
    2023-10-23 07:56:42,919 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 068f7e60e6f05d224ec8ad9a969f5922 and a total number of 943 components indexed
    2023-10-23 07:56:43,931 - modelscope - INFO - Use user-specified model revision: v1.1.4
    /home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/tiktoken/core.py:50: ResourceWarning: unclosed 
      self._core_bpe = _tiktoken.CoreBPE(mergeable_ranks, special_tokens, pat_str)
    Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023.请使用最新模型和代码,尤其如果你在9月25日前已经开始使用Qwen-7B,千万注意不要使用错误代码和模型。
    Flash attention will be disabled because it does NOT support fp32.
    Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
    Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
    Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
    Loading checkpoint shards:  62%|█████████████▊        | 5/8 [00:06<00:04,  1.38s/it]
    Traceback (most recent call last):
      File "/home/HwHiAiUser/Code/main.py", line 45, in 
        model = AutoModelForCausalLM.from_pretrained(model_dir, device_map=device, trust_remote_code=True).eval()
      File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/modelscope/utils/hf_util.py", line 181, in from_pretrained
        module_obj = module_class.from_pretrained(model_dir, *model_args,
      File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 560, in from_pretrained
        return model_class.from_pretrained(
      File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/modelscope/utils/hf_util.py", line 78, in from_pretrained
        return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
      File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3307, in from_pretrained
        ) = cls._load_pretrained_model(
      File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3695, in _load_pretrained_model
        new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
      File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/transformers/modeling_utils.py", line 741, in _load_state_dict_into_meta_model
        set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
      File "/home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 317, in set_module_tensor_to_device
        new_value = value.to(device)
    RuntimeError: NPU out of memory. Tried to allocate 66.00 MiB (NPU 0; 0 bytes total capacity; 19.09 GiB already allocated; 19.09 GiB current active; 0 bytes free; 19.31 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
    /home/HwHiAiUser/下载/yes/envs/NPU/lib/python3.9/tempfile.py:821: ResourceWarning: Implicitly cleaning up 
      _warnings.warn(warn_message, ResourceWarning)
    
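    To cross-check what PyTorch itself believes about device memory, torch_npu mirrors the torch.cuda memory API; a minimal sketch, assuming the counters behave like their CUDA counterparts:

    import torch
    import torch_npu

    # Mirrors torch.cuda.memory_allocated / memory_reserved; compare against
    # the 21527 MB total that `npu-smi info` reports below.
    torch.npu.set_device("npu:0")
    print("allocated MiB:", torch.npu.memory_allocated() / 2**20)
    print("reserved  MiB:", torch.npu.memory_reserved() / 2**20)

    # If torch_npu honors the CUDA-style allocator config (an assumption on my
    # part), the fragmentation hint in the OOM message above can be tried with:
    #   export PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:128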

    Environment information

    Firmware version check

    (NPU) [HwHiAiUser@localhost ~]$ sudo /usr/local/Ascend/driver/tools/upgrade-tool --device_index -1 --component -1 --version
    {
    Get component version(6.4.12.1.241) succeed for deviceId(0), componentType(11).
    	{"device_id":0, "component":hboot1a, "version":6.4.12.1.241}
    Get component version(6.4.12.1.241) succeed for deviceId(0), componentType(12).
    	{"device_id":0, "component":hboot1b, "version":6.4.12.1.241}
    Get component version(6.4.12.1.241) succeed for deviceId(0), componentType(18).
    	{"device_id":0, "component":hlink, "version":6.4.12.1.241}
    }
    

    npu-smi info

    (NPU) [HwHiAiUser@localhost ~]$ npu-smi info
    +--------------------------------------------------------------------------------------------------------+
    | npu-smi 23.0.rc2                                 Version: 23.0.rc2                                     |
    +-------------------------------+-----------------+------------------------------------------------------+
    | NPU     Name                  | Health          | Power(W)     Temp(C)           Hugepages-Usage(page) |
    | Chip    Device                | Bus-Id          | AICore(%)    Memory-Usage(MB)                        |
    +===============================+=================+======================================================+
    | 8       310P3                 | OK              | NA           37                0     / 0             |
    | 0       0                     | 0000:01:00.0    | 0            1700 / 21527                            |
    +===============================+=================+======================================================+
    +-------------------------------+-----------------+------------------------------------------------------+
    | NPU     Chip                  | Process id      | Process name             | Process memory(MB)        |
    +===============================+=================+======================================================+
    | No running processes found in NPU 8                                                                    |
    +===============================+=================+======================================================+
    
    

    CANN installation
    CANN 7.0.RC1.alpha003, which matches PyTorch 2.1.0, is installed, and the environment is configured correctly; the following code runs fine:

    import torch
    import torch_npu
    
    # source '/home/HwHiAiUser/Ascend/ascend-toolkit/set_env.sh'
    
    x = torch.randn(2, 2).npu()
    y = torch.randn(2, 2).npu()
    z = x.mm(y)
    
    print(z)
    

    Output:

    (NPU) (base) [HwHiAiUser@bogon Code]$ /home/HwHiAiUser/下载/yes/envs/NPU/bin/python /home/HwHiAiUser/Code/main.py
    Warning: Device do not support double dtype now, dtype cast repalce with float.
    tensor([[ 0.0766,  0.2028],
            [-2.3419, -1.6132]], device='npu:0')
    
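    Incidentally, the "PyTorch version 2.1.0+cpu Found" lines in the logs above are expected: torch_npu plugs the NPU backend into the stock CPU wheel. A few extra sanity checks along the same lines (a sketch; the calls mirror their torch.cuda equivalents):

    import torch
    import torch_npu

    print(torch.__version__)         # "2.1.0+cpu" is normal; NPU support comes from torch_npu
    print(torch.npu.is_available())  # True once the driver/CANN stack is visible
    print(torch.npu.device_count())  # should list the Atlas 300I Pro as one device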


  • Source: https://blog.csdn.net/weixin_52292970/article/details/133971947