• Compiling models in TensorFlow 2: how the required input data format changes with the choice of loss function


    I. Common loss functions in TF2

    In TensorFlow 2, you can choose among different loss functions when compiling a model to define its training objective. Different loss functions suit different problem types and model architectures. The following are several common loss functions, what they do, and where they apply:

    1. Mean Squared Error (MSE): the standard loss for regression. It measures the average squared difference between predictions and ground-truth values, so larger errors are penalized more heavily. Use it for regression tasks.

    model.compile(loss='mse', ...)
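
    A minimal sketch (with made-up values) evaluating the loss directly, so the averaging of squared errors is visible:

    import tensorflow as tf

    # Hypothetical regression targets and predictions, just to illustrate the loss value
    y_true = tf.constant([[1.0], [2.0], [3.0]])
    y_pred = tf.constant([[1.1], [1.9], [3.5]])

    mse = tf.keras.losses.MeanSquaredError()
    print(mse(y_true, y_pred).numpy())  # (0.1^2 + 0.1^2 + 0.5^2) / 3, approximately 0.09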

    2. Binary Cross Entropy: the usual loss for binary classification. It measures the divergence between the predicted probability and the true binary label. Use it for two-class problems where the model outputs a single probability through a sigmoid activation.

    model.compile(loss='binary_crossentropy', ...)
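
    A minimal sketch (made-up values) showing the expected input format: 0/1 labels and one sigmoid probability per sample:

    import tensorflow as tf

    # Hypothetical binary labels and sigmoid-activated probabilities
    y_true = tf.constant([[1.0], [0.0], [1.0]])
    y_pred = tf.constant([[0.9], [0.2], [0.6]])  # outputs of a sigmoid layer

    bce = tf.keras.losses.BinaryCrossentropy()
    print(bce(y_true, y_pred).numpy())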

    3. Categorical Cross Entropy: the usual loss for multi-class classification. It measures the divergence between the predicted class distribution and the true class. Use it for multi-class problems where the model outputs a probability distribution over the classes through a softmax activation.

    model.compile(loss='categorical_crossentropy', ...)
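
    A minimal sketch (made-up values) showing the expected input format: one-hot labels and a softmax distribution per sample:

    import tensorflow as tf

    # Hypothetical one-hot labels and softmax outputs for a 3-class problem
    y_true = tf.constant([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
    y_pred = tf.constant([[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]])

    cce = tf.keras.losses.CategoricalCrossentropy()
    print(cce(y_true, y_pred).numpy())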

    4. Sparse Categorical Cross Entropy: like categorical cross entropy, but for multi-class problems whose labels are given as integer class indices rather than one-hot vectors.

    model.compile(loss='sparse_categorical_crossentropy', ...)
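
    A minimal sketch with the same two samples as in the previous example, but with the labels given as plain integer class indices:

    import tensorflow as tf

    # Integer labels instead of one-hot vectors; predictions unchanged
    y_true = tf.constant([1, 0])
    y_pred = tf.constant([[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]])

    scce = tf.keras.losses.SparseCategoricalCrossentropy()
    print(scce(y_true, y_pred).numpy())  # same value as the one-hot version above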

    5. Kullback-Leibler Divergence (KL divergence): measures how much one probability distribution differs from another. It is often used in generative models, for example together with autoencoders, to push the model's output distribution toward a predefined target distribution.

    model.compile(loss='kullback_leibler_divergence', ...)
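
    A minimal sketch (made-up distributions) showing the KL divergence between a target distribution and a model output:

    import tensorflow as tf

    # Two hypothetical probability distributions over 3 outcomes
    p = tf.constant([[0.4, 0.4, 0.2]])  # target distribution
    q = tf.constant([[0.3, 0.5, 0.2]])  # model output

    kld = tf.keras.losses.KLDivergence()
    print(kld(p, q).numpy())  # sum over classes of p * log(p / q)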

    Beyond the common losses above, you can also define custom loss functions tailored to a specific task. The tf.keras.losses module lists the available built-in losses; pick one according to your task type, data distribution, and model design to get the best training results.
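
    As a hedged sketch of what a custom loss can look like (the penalty term and its weight here are made up purely for illustration), any function taking (y_true, y_pred) and returning a loss value can be passed to compile():

    import tensorflow as tf

    # Hypothetical custom loss: MSE plus a small penalty on large predictions
    def mse_with_penalty(y_true, y_pred):
        mse = tf.reduce_mean(tf.square(y_true - y_pred))
        penalty = 0.01 * tf.reduce_mean(tf.square(y_pred))
        return mse + penalty

    # model.compile(optimizer='adam', loss=mse_with_penalty)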

    II. Comparing the two loss functions

    Categorical Cross Entropy vs. Sparse Categorical Cross Entropy

    What they share: both can be used for multi-class classification.

    How they differ: they expect labels in different formats. Categorical Cross Entropy requires one-hot encoded labels. Sparse Categorical Cross Entropy takes integer class labels directly: for example, if the labels read from the dataset are the digits 0-9 stored as a single column, you can compile with Sparse Categorical Cross Entropy and train on them as they are.

    A note on one-hot encoding:

    One-hot encoding represents each element as a binary vector in which exactly one entry is 1 and all the others are 0. If we have a list of N elements, its one-hot encoding is an N×N matrix whose i-th row is the encoding of the i-th element. For example, a list of three colors ["red", "blue", "green"] is encoded as:

    red: [1, 0, 0]   blue: [0, 1, 0]   green: [0, 0, 1]

    This encoding is widely used in machine learning: each class label is converted into a one-hot vector before training.
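
    A minimal sketch reproducing the color example above with tf.one_hot (the integer mapping is assumed):

    import tensorflow as tf

    # Map the colors to integer indices first: 0 = red, 1 = blue, 2 = green
    labels = tf.constant([0, 1, 2])
    one_hot = tf.one_hot(labels, depth=3)
    print(one_hot.numpy())
    # [[1. 0. 0.]
    #  [0. 1. 0.]
    #  [0. 0. 1.]]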

    If Categorical Cross Entropy is used as the loss function, the labels must be one-hot encoded first. Some code does this with:

    y_train = tf.keras.utils.to_categorical(y_train)  # this line raised an error
    y_test = tf.keras.utils.to_categorical(y_test)

    In a TensorFlow 2.5 environment this failed with:

    tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [12,16] and labels shape [204]
    [[node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at /PycharmProjects/pythonProject/ML_New/MLP_Classifier_tf/MLP_Classifier_tf_imgVali.py:286) ]] [Op:__inference_train_function_762]
    Function call stack:
    train_function

    The following can be used instead:

    y_train_one_hot = tf.one_hot(y_train, depth=num_classes)
    y_test_one_hot = tf.one_hot(y_test, depth=num_classes)
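
    As a side note, for a plain 1-D integer label array the two helpers normally produce equivalent encodings (to_categorical returns a NumPy array, tf.one_hot returns a tensor), so a quick sanity check like the sketch below (num_classes is assumed to be known) can help pin down where a shape mismatch comes from:

    import numpy as np
    import tensorflow as tf

    num_classes = 10                 # assumed number of classes
    y = np.array([0, 3, 9])          # hypothetical integer labels

    a = tf.keras.utils.to_categorical(y, num_classes=num_classes)  # NumPy array
    b = tf.one_hot(y, depth=num_classes)                           # tf.Tensor
    print(np.allclose(a, b.numpy()))  # True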

    III. Example code analysis

    The loss-function identifiers for Sparse Categorical Cross Entropy and Categorical Cross Entropy are:

    loss='sparse_categorical_crossentropy'    loss='categorical_crossentropy'

    Here is a simple MLP classifier on the MNIST dataset, first using the Sparse Categorical Cross Entropy loss. The code is as follows:

    import tensorflow as tf
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    # Prepare the dataset
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 784) / 255.0
    x_test = x_test.reshape(-1, 784) / 255.0

    # Build the model
    model = Sequential()
    model.add(Dense(64, activation='relu', input_dim=784))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(10, activation='softmax'))

    # Compile the model
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # Train the model
    model.fit(x_train, y_train, epochs=10, batch_size=32, validation_data=(x_test, y_test))

    # Evaluate the model
    loss, accuracy = model.evaluate(x_test, y_test)
    print('Test Loss:', loss)
    print('Test Accuracy:', accuracy)

    # Use the model to make predictions
    predictions = model.predict(x_test[:5])
    print('Predictions:', tf.argmax(predictions, axis=1))
    print('Labels:', y_test[:5])

    The run produces the following output:

    D:\PycharmProjects\pythonProject\venv\Scripts\python.exe D:/PycharmProjects/pythonProject/ML_New/MLP_Classifier_tf/MLP_TEST_MINIST.py
    2023-10-14 22:28:27.465600: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
    2023-10-14 22:28:30.610122: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
    2023-10-14 22:28:30.637119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
    pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 computeCapability: 7.5
    coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
    2023-10-14 22:28:30.637445: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
    2023-10-14 22:28:30.648571: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
    2023-10-14 22:28:30.648748: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
    2023-10-14 22:28:30.652682: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
    2023-10-14 22:28:30.654729: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
    2023-10-14 22:28:30.657643: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
    2023-10-14 22:28:30.661178: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
    2023-10-14 22:28:30.662311: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
    2023-10-14 22:28:30.662510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
    2023-10-14 22:28:30.662864: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
    To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2023-10-14 22:28:30.663583: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
    pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 computeCapability: 7.5
    coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
    2023-10-14 22:28:30.663941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
    2023-10-14 22:28:31.130464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
    2023-10-14 22:28:31.130645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
    2023-10-14 22:28:31.130748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
    2023-10-14 22:28:31.130967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6001 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
    2023-10-14 22:28:31.709522: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
    Epoch 1/10
    2023-10-14 22:28:31.920032: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
    2023-10-14 22:28:32.369951: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
    1875/1875 [==============================] - 5s 2ms/step - loss: 0.2845 - accuracy: 0.9174 - val_loss: 0.1443 - val_accuracy: 0.9547
    Epoch 2/10
    1875/1875 [==============================] - 5s 3ms/step - loss: 0.1261 - accuracy: 0.9633 - val_loss: 0.1085 - val_accuracy: 0.9646
    Epoch 3/10
    1875/1875 [==============================] - 5s 3ms/step - loss: 0.0937 - accuracy: 0.9716 - val_loss: 0.1034 - val_accuracy: 0.9690
    Epoch 4/10
    1875/1875 [==============================] - 5s 3ms/step - loss: 0.0731 - accuracy: 0.9772 - val_loss: 0.0987 - val_accuracy: 0.9714
    Epoch 5/10
    1875/1875 [==============================] - 5s 3ms/step - loss: 0.0612 - accuracy: 0.9810 - val_loss: 0.0828 - val_accuracy: 0.9749
    Epoch 6/10
    1875/1875 [==============================] - 5s 3ms/step - loss: 0.0507 - accuracy: 0.9835 - val_loss: 0.0955 - val_accuracy: 0.9702
    Epoch 7/10
    1875/1875 [==============================] - 5s 3ms/step - loss: 0.0430 - accuracy: 0.9859 - val_loss: 0.0863 - val_accuracy: 0.9746
    Epoch 8/10
    1875/1875 [==============================] - 5s 3ms/step - loss: 0.0374 - accuracy: 0.9874 - val_loss: 0.0935 - val_accuracy: 0.9737
    Epoch 9/10
    1875/1875 [==============================] - 6s 3ms/step - loss: 0.0328 - accuracy: 0.9894 - val_loss: 0.0902 - val_accuracy: 0.9754
    Epoch 10/10
    1875/1875 [==============================] - 6s 3ms/step - loss: 0.0287 - accuracy: 0.9900 - val_loss: 0.0902 - val_accuracy: 0.9771
    313/313 [==============================] - 1s 2ms/step - loss: 0.0902 - accuracy: 0.9771
    Test Loss: 0.09022707492113113
    Test Accuracy: 0.9771000146865845
    Predictions: tf.Tensor([7 2 1 0 4], shape=(5,), dtype=int64)
    Labels: [7 2 1 0 4]
    Process finished with exit code 0

    If we change the loss function to Categorical Cross Entropy and run the same code, it fails with:

     ValueError: Shapes (32, 1) and (32, 10) are incompatible

    This happens because the label data has not been converted to one-hot encoding. To convert it, add the following before the model.fit() call:

    y_train = tf.one_hot(y_train, depth=10)
    y_test = tf.one_hot(y_test, depth=10)

    The output is now:

    D:\PycharmProjects\pythonProject\venv\Scripts\python.exe D:/PycharmProjects/pythonProject/ML_New/MLP_Classifier_tf/MLP_TEST_MINIST.py
    2023-10-14 23:20:04.708405: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
    2023-10-14 23:20:07.803493: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
    2023-10-14 23:20:07.833164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
    pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 computeCapability: 7.5
    coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
    2023-10-14 23:20:07.833480: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
    2023-10-14 23:20:07.840527: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
    2023-10-14 23:20:07.840689: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
    2023-10-14 23:20:07.844132: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
    2023-10-14 23:20:07.845657: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
    2023-10-14 23:20:07.848488: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
    2023-10-14 23:20:07.852061: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
    2023-10-14 23:20:07.853130: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
    2023-10-14 23:20:07.853317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
    2023-10-14 23:20:07.853652: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
    To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2023-10-14 23:20:07.854467: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
    pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 computeCapability: 7.5
    coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
    2023-10-14 23:20:07.854879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
    2023-10-14 23:20:08.326771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
    2023-10-14 23:20:08.326942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
    2023-10-14 23:20:08.327041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
    2023-10-14 23:20:08.327252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6001 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
    2023-10-14 23:20:08.914697: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
    Epoch 1/10
    2023-10-14 23:20:09.138669: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
    1/1875 [..............................] - ETA: 21:57 - loss: 2.4066 - accuracy: 0.12502023-10-14 23:20:09.626287: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
    1875/1875 [==============================] - 6s 3ms/step - loss: 0.2784 - accuracy: 0.9182 - val_loss: 0.1517 - val_accuracy: 0.9510
    Epoch 2/10
    1875/1875 [==============================] - 5s 3ms/step - loss: 0.1217 - accuracy: 0.9633 - val_loss: 0.1258 - val_accuracy: 0.9611
    Epoch 3/10
    1875/1875 [==============================] - 5s 3ms/step - loss: 0.0874 - accuracy: 0.9731 - val_loss: 0.1045 - val_accuracy: 0.9666
    Epoch 4/10
    1875/1875 [==============================] - 5s 3ms/step - loss: 0.0701 - accuracy: 0.9778 - val_loss: 0.0929 - val_accuracy: 0.9718
    Epoch 5/10
    1875/1875 [==============================] - 5s 3ms/step - loss: 0.0569 - accuracy: 0.9821 - val_loss: 0.0853 - val_accuracy: 0.9751
    Epoch 6/10
    1875/1875 [==============================] - 5s 3ms/step - loss: 0.0488 - accuracy: 0.9844 - val_loss: 0.0911 - val_accuracy: 0.9706
    Epoch 7/10
    1875/1875 [==============================] - 5s 3ms/step - loss: 0.0402 - accuracy: 0.9868 - val_loss: 0.0847 - val_accuracy: 0.9748
    Epoch 8/10
    1875/1875 [==============================] - 5s 3ms/step - loss: 0.0355 - accuracy: 0.9882 - val_loss: 0.0975 - val_accuracy: 0.9723
    Epoch 9/10
    1875/1875 [==============================] - 5s 3ms/step - loss: 0.0307 - accuracy: 0.9895 - val_loss: 0.1027 - val_accuracy: 0.9743
    Epoch 10/10
    1875/1875 [==============================] - 5s 3ms/step - loss: 0.0281 - accuracy: 0.9907 - val_loss: 0.1004 - val_accuracy: 0.9734
    313/313 [==============================] - 1s 2ms/step - loss: 0.1004 - accuracy: 0.9734
    Test Loss: 0.10037881135940552
    Test Accuracy: 0.9733999967575073
    Predictions: tf.Tensor([7 2 1 0 4], shape=(5,), dtype=int64)
    Labels: tf.Tensor(
    [[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
     [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
     [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
     [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]], shape=(5, 10), dtype=float32)
    Process finished with exit code 0
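
    Note that 'Labels' now prints as one-hot rows, because y_test was converted with tf.one_hot. To compare the labels directly with the predicted class indices, you can take the argmax along the class axis (continuing the script above):

    # Recover integer class indices from the one-hot test labels
    print('Labels:', tf.argmax(y_test[:5], axis=1))  # tf.Tensor([7 2 1 0 4], shape=(5,), dtype=int64)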

    Notes:

    A model compiled with the sparse cross-entropy loss predicts labels that start from 0. If your own labels start from 1, keep the two consistent in any later validation or analysis.

    When training a multi-class model with the sparse cross-entropy loss, the labels are given as integers in the range 0 to (number of classes - 1), and the model's output should be a probability distribution over the classes.

    For example, with 3 classes the labels are encoded as 0, 1 and 2, and the model outputs a length-3 probability vector giving the predicted probability of each class.

    At prediction time the model returns a probability for each class, and taking the index of the largest probability gives the predicted class, again in the range 0 to (number of classes - 1). With the ordinary (non-sparse) cross-entropy loss, by contrast, the labels are one-hot encoded: each class is represented by a vector with a 1 at the position of the true class and 0 everywhere else. In that case the predicted label indices also start from 0.
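
    A hedged sketch of the label bookkeeping when your own labels start at 1 (the label values and variable names here are made up for illustration):

    import numpy as np
    import tensorflow as tf

    # Hypothetical labels that run 1..10 instead of 0..9
    y_raw = np.array([1, 4, 10, 3])
    y_train = y_raw - 1              # shift to 0..9 before training with
                                     # loss='sparse_categorical_crossentropy'

    # ... build, compile and fit the model on (x_train, y_train) ...

    # At prediction time, shift the argmax index back to the original 1..10 labels:
    # pred_class = tf.argmax(model.predict(x_new), axis=1) + 1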

  • Original article: https://blog.csdn.net/soderayer/article/details/133832597