PyTorch Distributed Overview
- Distributed Data-Parallel Training (DDP), see the minimal sketch after this list
- torch.nn.parallel.DistributedDataParallel
- RPC-Based Distributed Training (RPC)
- Collective Communication
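
A minimal DDP sketch, assuming a single machine launched with `torchrun` (which sets `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` in the environment); the model, data, and hyperparameters below are placeholders, not part of the original notes:

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun provides RANK / WORLD_SIZE / LOCAL_RANK for each worker process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 1).cuda(local_rank)        # placeholder model
    ddp_model = DDP(model, device_ids=[local_rank])         # wraps the model for gradient all-reduce

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    inputs = torch.randn(32, 10).cuda(local_rank)            # placeholder batch
    targets = torch.randn(32, 1).cuda(local_rank)

    loss = torch.nn.functional.mse_loss(ddp_model(inputs), targets)
    loss.backward()                                          # gradients are synchronized across ranks here
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, for example, `torchrun --nproc_per_node=NUM_GPUS script.py`, each process owns one GPU and DDP averages gradients across processes during `backward()`.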
Do not write tensors directly into log output; use `.item()` to convert them to Python scalar types:
logging.warning(f"epoch_index: {
epoch_index}, batch_index:
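
A fuller sketch of the same idea (the loss tensor and loop counters here are stand-ins): `.item()` turns a zero-dimensional tensor into a plain Python float, so the log line shows a number rather than a tensor repr, and the logged value does not keep a reference to the autograd graph:

```python
import logging

import torch

epoch_index, batch_index = 3, 120                           # hypothetical loop counters
loss = (torch.randn(8, requires_grad=True) ** 2).mean()      # stand-in for a real training loss

# .item() converts the 0-dim tensor to a Python float before it reaches the log.
logging.warning(
    f"epoch_index: {epoch_index}, batch_index: {batch_index}, loss: {loss.item():.4f}"
)
```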