Practical Data Science on AWS 1 - AutoML | Analyze Datasets and Train ML Models using AutoML


    Contents

    • Week 1
    • 1 Specialization overview
    • 2 Working with Data
      • 2.1 Data ingestion and exploration
      • 2.2 Data visualization
    • Week 2
    • 1 Statistical bias
      • 1.1 Statistical bias
      • 1.2 Statistical bias causes
      • 1.3 Measuring statistical bias
      • 1.4 Detecting statistical bias
      • 1.5 Detect statistical bias with Amazon SageMaker Clarify
      • 1.6 Approaches to statistical bias detection
      • 1.7 Feature importance: SHAP
    • Week 3
    • 1 Automated Machine Learning
      • 1.1 Automated Machine Learning (AutoML)
      • 1.2 AutoML Workflow
      • 1.3 Amazon SageMaker Autopilot
      • 1.4 Running experiments with Amazon SageMaker Autopilot
      • 1.5 Amazon SageMaker Autopilot: evaluating output
      • 1.6 Model Hosting
    • Week 4
    • 1 Built-in algorithms
      • 1.1 Built-in algorithms
      • 1.2 Use cases and algorithms
      • 1.3 Text analysis
      • 1.4 Train a text classifier
      • 1.5 Deploy the text classifier
    • HW Answer

    Week 1

    1 Specialization overview

    2 Working with Data

    2.1 Data ingestion and exploration

    AWS S3

    • Data lakes are often built on top of object storage, such as AWS S3.

    • With object storage, data is stored and managed as objects, each of which consists of the data itself, relevant metadata (such as when the object was last modified), and a unique identifier.
    • Object storage is particularly well suited to storing and retrieving growing amounts of data of any type, which makes it a natural foundation for data lakes.
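The object model described above can be sketched in a few lines of Python. This is a hypothetical in-memory stand-in, not the S3 API: the classes `StoredObject` and `ToyObjectStore` and their methods are invented for illustration. Each object bundles its data, metadata such as a last-modified timestamp, and a unique key in a flat namespace.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StoredObject:
    """One object: the data itself, its metadata, and its unique identifier (key)."""
    key: str
    data: bytes
    metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object store bucket (not the S3 API)."""

    def __init__(self):
        # Flat namespace: every object is addressed by its unique key.
        self._objects = {}

    def put(self, key, data, **user_metadata):
        # System metadata is recorded alongside any user-supplied metadata.
        metadata = {
            "last_modified": datetime.now(timezone.utc),
            "size": len(data),
            **user_metadata,
        }
        self._objects[key] = StoredObject(key, data, metadata)

    def get(self, key):
        return self._objects[key]

    def list_keys(self, prefix=""):
        # "Folders" are just shared key prefixes such as "raw/".
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ToyObjectStore()
store.put("raw/reviews/part-0000.csv", b"review_id,rating\n1,5\n", source="web")
store.put("curated/reviews/part-0000.parquet", b"PAR1")
print(store.list_keys("raw/"))
```

Keeping the namespace flat and treating prefixes as pseudo-folders mirrors why object stores scale well for data lakes: any type of data can be added under a new key without restructuring existing objects.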

    AWS Data Wrangler

    AWS Glue

    AWS Athena

    • Athena is serverless, so there is no infrastructure to set up or manage.

    • Athena is based on Presto, an open-source distributed SQL engine developed for exactly this use case: running interactive queries against data sources of all sizes.

    2.2 Data visualization

    Week 2

    1 Statistical bias

    1.1 Statistical bias

    1.2 Statistical bias causes

    • Societal bias: Biases can be introduced by preconceived notions that exist in society. Data generated by humans can be biased because all of us have unconscious biases.
    • Data drift (data shift): Data drift occurs when the distribution of the incoming data differs significantly from the distribution of the training data that was originally used to train the model.
      • Covariate drift: The distribution of the independent variables (the features that make up your dataset) changes.
      • Prior probability drift: The distribution of the labels (the target variable) changes.
      • Concept drift: The relationship between the features and the labels changes.
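The first two kinds of drift can be checked by comparing distributions from training time and from production. A minimal sketch, assuming one categorical feature and a categorical label stored in plain dicts; the function names, the use of total variation distance, and the 0.1 threshold are illustrative assumptions, not what any AWS service does. Concept drift cannot be detected this way, because the marginal distributions can stay the same while the feature-label relationship changes.

```python
from collections import Counter


def distribution(values):
    """Empirical distribution of discrete values as {value: proportion}."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def total_variation(p, q):
    """Total variation distance: 0.0 for identical distributions, 1.0 for disjoint ones."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


def drift_report(train_rows, live_rows, feature, label, threshold=0.1):
    """Flag covariate drift (on `feature`) and prior probability drift (on `label`).

    Concept drift, a change in the feature-label relationship, is not detected here.
    """
    covariate = total_variation(
        distribution([row[feature] for row in train_rows]),
        distribution([row[feature] for row in live_rows]),
    )
    prior = total_variation(
        distribution([row[label] for row in train_rows]),
        distribution([row[label] for row in live_rows]),
    )
    return {
        "covariate_distance": covariate,
        "prior_distance": prior,
        "covariate_drift": covariate > threshold,
        "prior_probability_drift": prior > threshold,
    }


# Hypothetical data: the product mix shifts, but the share of positive labels does not.
train = ([{"category": "books", "sentiment": 1}] * 25 + [{"category": "books", "sentiment": 0}] * 25
         + [{"category": "toys", "sentiment": 1}] * 25 + [{"category": "toys", "sentiment": 0}] * 25)
live = ([{"category": "books", "sentiment": 1}] * 45 + [{"category": "books", "sentiment": 0}] * 45
        + [{"category": "toys", "sentiment": 1}] * 5 + [{"category": "toys", "sentiment": 0}] * 5)
print(drift_report(train, live, feature="category", label="sentiment"))
```

In this example the feature distribution moves from 50/50 to 90/10 while the label distribution stays at 50/50, so only covariate drift is flagged.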

    1.3 Measuring statistical bias
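As an illustration, two pre-training bias metrics documented for SageMaker Clarify are class imbalance, CI = (n_a - n_d) / (n_a + n_d), and difference in proportions of labels, DPL = q_a - q_d. Here n_a and n_d are the number of records in the advantaged facet a and the disadvantaged facet d, and q is the share of positive labels within a facet. A minimal pure-Python sketch, assuming a binary facet (every value other than `advantaged` counts as facet d) and that both facets are non-empty; the function names and sample data are hypothetical.

```python
def class_imbalance(facets, advantaged):
    """CI = (n_a - n_d) / (n_a + n_d); ranges from -1 to 1, 0 means equal facet sizes."""
    n_a = sum(1 for f in facets if f == advantaged)
    n_d = len(facets) - n_a
    return (n_a - n_d) / (n_a + n_d)


def difference_in_proportions_of_labels(facets, labels, advantaged, positive=1):
    """DPL = q_a - q_d, where q is the share of positive labels within a facet."""
    a = [y for f, y in zip(facets, labels) if f == advantaged]
    d = [y for f, y in zip(facets, labels) if f != advantaged]
    q_a = sum(1 for y in a if y == positive) / len(a)
    q_d = sum(1 for y in d if y == positive) / len(d)
    return q_a - q_d


# Hypothetical dataset: facet "A" has six records, facet "B" has two.
facets = ["A"] * 6 + ["B"] * 2
labels = [1, 1, 1, 0, 0, 0, 1, 1]
print(class_imbalance(facets, advantaged="A"))  # 0.5: facet A is over-represented
print(difference_in_proportions_of_labels(facets, labels, advantaged="A"))  # -0.5: facet A gets positive labels less often
```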

    1.4 Detecting statistical bias

    1.5 Detect statistical bias with Amazon SageMaker Clarify

    1.6 Approaches to statistical bias detection

    • SageMaker Data Wrangler:
      • Connects to multiple data sources and lets you explore data in a more visual format.
      • Uses only a subset of your data to detect bias.
    • SageMaker Clarify:
      • Handles large volumes of data.

    1.7 Feature importance: SHAP
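SHAP explains a single prediction by giving each feature its Shapley value from cooperative game theory: the feature's weighted average marginal contribution over all coalitions of the other features. The brute-force sketch below computes exact Shapley values for a tiny hypothetical model, assuming that a feature left out of a coalition is replaced by a single baseline value. Practical implementations rely on faster algorithms (for example Kernel SHAP), because exact enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction, enumerating every coalition of features.

    A feature outside the coalition takes its baseline value; a feature inside
    takes the instance's value. The cost grows exponentially with the number of features.
    """
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical model: a linear effect of feature 0 plus an interaction of features 1 and 2.
def model(x):
    return 2 * x[0] + x[1] * x[2]


phi = shapley_values(model, instance=[1, 1, 1], baseline=[0, 0, 0])
print([round(p, 6) for p in phi])  # [2.0, 0.5, 0.5]: the interaction is split evenly
```

The values add up to model(instance) - model(baseline) = 3, which is the additivity property that gives SHAP its name.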

    Week 3

    1 Automated Machine Learning

    1.1 Automated Machine Learning (AutoML)

    1.2 AutoML Workflow

    Ingest & Analyze
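At its core, the search that AutoML automates is a loop: fit several candidate models, score each on held-out data, and rank them on a leaderboard. A toy pure-Python sketch of just that loop, assuming a one-feature dataset and three hypothetical candidates; a managed service such as Autopilot additionally automates data analysis, feature engineering, and hyperparameter tuning, so treat this only as an illustration of the idea.

```python
from collections import Counter


def fit_majority_class(train):
    """Baseline candidate: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def knn_candidate(k):
    """Return a fit function for k-nearest neighbours on one numeric feature."""
    def fit(train):
        def predict(x):
            nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
            return Counter(y for _, y in nearest).most_common(1)[0][0]
        return predict
    return fit


def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)


def auto_select(train, validation, candidates):
    """Fit every candidate, score it on held-out data, and return a sorted leaderboard."""
    leaderboard = [(accuracy(fit(train), validation), name) for name, fit in candidates.items()]
    return sorted(leaderboard, reverse=True)


# Hypothetical (feature, label) pairs.
train = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1)]
validation = [(1.5, 0), (2.5, 0), (7.5, 1), (9, 1)]
candidates = {
    "majority_class": fit_majority_class,
    "knn_k1": knn_candidate(1),
    "knn_k3": knn_candidate(3),
}
for score, name in auto_select(train, validation, candidates):
    print(f"{name}: {score:.2f}")
```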

    1.3 Amazon SageMaker Autopilot

    1.4 Running experiments with Amazon SageMaker Autopilot

    1.5 Amazon SageMaker Autopilot: evaluating output

    1.6 Model Hosting

    Week 4

    1 Built-in algorithms

    1.1 Built-in algorithms

    1.2 Use cases and algorithms

    1.3 Text analysis

    • One challenge with Word2Vec: out-of-vocabulary (OOV) words
      • The vocabulary of the pretrained Word2Vec model contains only three million words. The vocabulary is the set of words the model learned during the training phase; out-of-vocabulary words are words that were not present in the text dataset the model was originally trained on. If a word is not found in the vocabulary, the model assigns it a zero, which effectively discards the word.
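The effect of assigning zero to an out-of-vocabulary word can be seen with a toy lookup table. A minimal sketch, assuming a hypothetical two-dimensional vocabulary and OOV words mapped to an all-zero vector: when word vectors are summed into a sentence representation, the misspelled token contributes nothing, so the sentence is represented as if the word were absent.

```python
def embed(tokens, vectors, dim):
    """Look up each token; out-of-vocabulary tokens map to an all-zero vector."""
    zero = [0.0] * dim
    return [vectors.get(token, zero) for token in tokens]


def sentence_sum(tokens, vectors, dim):
    """Sum the token vectors element-wise into one sentence vector."""
    total = [0.0] * dim
    for vec in embed(tokens, vectors, dim):
        total = [t + v for t, v in zip(total, vec)]
    return total


def oov_tokens(tokens, vectors):
    return [token for token in tokens if token not in vectors]


# Hypothetical two-dimensional vocabulary learned during training.
vectors = {"great": [0.9, 0.1], "product": [0.2, 0.7]}

tokens = ["greatt", "product"]  # "greatt" is a typo, so it is not in the vocabulary
print(oov_tokens(tokens, vectors))
print(sentence_sum(tokens, vectors, 2) == sentence_sum(["product"], vectors, 2))
```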

    1.4 Train a text classifier

    1.5 Deploy the text classifier

    HW Answer

    • https://github.com/Alex2Yang97/Analyze-Datasets-and-Train-ML-Models-using-AutoML
Original post: https://blog.csdn.net/qq_41103204/article/details/126382537