• How to encode labels for machine learning classification problems


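    As the saying goes, "read more when you have nothing else to do."
    It turns out to be true. I had always assumed that One-hot was the only way to encode labels in machine learning classification problems, mainly because it is the one I use most, although hashing should work as well.
    Then I came across the category_encoders library, and it felt like discovering a new continent.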
    

    https://github.com/scikit-learn-contrib/category_encoders
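    From here on you can let your imagination run and encode things however you like. How you understand the labels (the target) determines the encoding scheme. Machine learning is good at encoding and decoding; finding the encoding between input and output that takes the least work and gives the best performance is where the research value lies.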
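
    To make the two encodings mentioned above concrete, here is a minimal sketch of One-hot and hash encoding using only the Python standard library. It does not use the category_encoders API; the function names `one_hot` and `hash_encode`, the `n_buckets` parameter, and the sample labels are all assumptions made for illustration.

```python
import hashlib


def one_hot(labels):
    """Map each label to a 0/1 vector with one position per distinct class."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    vectors = []
    for label in labels:
        vec = [0] * len(classes)
        vec[index[label]] = 1
        vectors.append(vec)
    return classes, vectors


def hash_encode(labels, n_buckets=8):
    """Map each label to a fixed-width 0/1 vector using a stable hash.

    MD5 is used instead of the built-in hash() so the bucket assignment
    is the same across runs. Different labels may collide in one bucket.
    """
    vectors = []
    for label in labels:
        digest = hashlib.md5(str(label).encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % n_buckets
        vec = [0] * n_buckets
        vec[bucket] = 1
        vectors.append(vec)
    return vectors


# Hypothetical sample labels, for illustration only.
labels = ["cat", "dog", "bird", "cat"]
classes, onehot_vectors = one_hot(labels)
hashed_vectors = hash_encode(labels, n_buckets=8)
```

    One-hot grows by one column per distinct class, while the hashed vector keeps a fixed width regardless of how many classes appear, at the cost of possible collisions. That trade-off is one reason it pays to look beyond One-hot.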

  • Original article: https://blog.csdn.net/xbs150/article/details/126268882