• Python Web Scraping in Practice: JD.com Product Data


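    Preface

    Good morning, good afternoon, or good evening, everyone ❤ ~ and welcome to this article.

    Today we will look at how to use Python to collect JD.com product information in bulk.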

    Third-party libraries:

    • requests >>> pip install requests

    • parsel >>> pip install parsel

    Development environment:

    • Python 3.8

    • PyCharm Professional

    How the scraper is implemented

    I. Analysis

    Find the data source (the URL that actually returns the data)

    https://api.m.jd.com/?appid=search-pc-java&functionId=pc_search_s_new&client=pc&clientVersion=1.0.0&t=1697545127305&body=%7B%22keyword%22%3A%22iPhone%22%2C%22qrst%22%3A%221%22%2C%22wq%22%3A%22iPhone%22%2C%22ev%22%3A%22exbrand_Apple%5E%22%2C%22pvid%22%3A%22c2a8f09dbfa044a6a12f860e20edb6c7%22%2C%22isList%22%3A0%2C%22page%22%3A%223%22%2C%22s%22%3A%2256%22%2C%22click%22%3A%220%22%2C%22log_id%22%3A%221697544397338.9790%22%2C%22show_items%22%3A%22%22%7D&loginType=3&uuid=122270672.1675327822068798256204.1675327822.1696749738.1697544369.7&area=18_1482_48942_49058&h5st=20231017201847323%3Bg5giig9tnm63gij2%3Bf06cc%3Btk03wbde31c7218nmTOuI4vmUG1gibUwyDKLNpF6B_t1uk9ukpSq3k_k19h74PyUWE_Fz9mV-ggz4JCtsVbQZVSId9dC%3B7710a41bb85a10fe65109f794fb3b815%3B4.1%3B1697545127323%3Bee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980b96c3953b1ab788029ae792b39e113ccac142f09e3a1fa8c3f25055353b835ed0bf65228424626b8a9e1d2c030999d9be97a9dee9fb20116ceb0deb8736546109bc1cf5b91d1dfa2b39c79b3b0f0a5a036cdc921a1f147179b291c830dc87a6d3d0c3885fe721d5f0391a55bb4bf663963282084e04c7f24e6d3bcb219f4cbb08d3f76f13bca81938336d1934b88ded260caabac20e37a63dd3f6a093fd5dd2d936e95b67fee9654732d8a2908d96fe4b8d0a0b9d9b65996563d4cb94925fd651106c8e7c1234f63f57b1baa40324d6e8969e5c7b48e35e2c4bc5d325e88db237e42c33d6b256ebc720e76f574f34b&x-api-eid-token=jdd03BFXLLB72GO2GWA4OW3JSYXJPOVRF3WAKAKETOTSMNISZ6VIJTLEVQKEHWUA6VLD7ORS2QYC55PWBVUZVPZTXPDCHZUAAAAMLHWDXLUYAAAAACL4BJCT4CQASEEX
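    The URL above is easier to read once its query string is split apart and the percent-encoded body parameter is decoded as JSON. Below is a minimal sketch using only the standard library; the helper name decode_search_url is made up for illustration, and the sample URL is a shortened stand-in that keeps only a few of the real parameters.

```python
import json
from urllib.parse import urlsplit, parse_qs


def decode_search_url(url):
    """Return (query parameters, decoded body dict) for a search API URL."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; each key appears once here.
    params = {name: values[0] for name, values in query.items()}
    body = json.loads(params["body"])
    return params, body


# Shortened stand-in for the captured URL (hypothetical: most parameters dropped).
sample_url = (
    "https://api.m.jd.com/?appid=search-pc-java&functionId=pc_search_s_new"
    "&client=pc&clientVersion=1.0.0&t=1697545127305"
    "&body=%7B%22keyword%22%3A%22iPhone%22%2C%22page%22%3A%223%22%2C%22s%22%3A%2256%22%7D"
)

params, body = decode_search_url(sample_url)
print(params["functionId"])
print(body)
```

    Running the same helper on the full captured URL should expose every field of body, including the page and s values discussed in the pagination section.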
    

    II. Implementation

    1. Send the request (access the site)

    2. Extract the data: pull out the fields we need

    3. Save the data
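    Step 1 can fail for ordinary network reasons, so it may help to wrap the request in a small retry loop. The sketch below is not part of the original script: fetch_text, its retry count, delay, and timeout are all assumptions chosen for illustration. The function takes the HTTP call as a parameter, so in practice requests.get would be passed in.

```python
import time


def fetch_text(get, url, params=None, headers=None, retries=3, delay=1.0, timeout=10):
    """Call `get` until it returns HTTP 200 or the attempts run out.

    `get` is any callable with the same keyword interface as requests.get.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            response = get(url, params=params, headers=headers, timeout=timeout)
            if response.status_code == 200:
                return response.text
            last_error = RuntimeError(f"unexpected status {response.status_code}")
        except OSError as exc:
            # requests' exceptions derive from IOError, an alias of OSError.
            last_error = exc
        if attempt < retries:
            time.sleep(delay)  # brief pause before the next attempt
    raise last_error
```

    With requests installed, a call would look like fetch_text(requests.get, url, params=params, headers=headers).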

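    Detail page: number of reviews, sales volume, product description, shop rating

    Pagination: how to scrape multiple pages

    Each results page loads in two parts, with 30 items per part

    The pagination pattern:

    Comparing the first and second requests of page 2 with those of page 3 (labels read page-part, so 2-1 is the first request of page 2)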

    t: 1697545127305
    2-1-body: {"keyword":"iPhone","qrst":"1","wq":"iPhone","ev":"exbrand_Apple^","pvid":"c2a8f09dbfa044a6a12f860e20edb6c7","isList":0,"page":"3","s":"56","click":"0","log_id":"1697544397338.9790","show_items":""}
    3-1-body: {"keyword":"iPhone","qrst":"1","wq":"iPhone","ev":"exbrand_Apple^","pvid":"c2a8f09dbfa044a6a12f860e20edb6c7","isList":0,"page":"5","s":"116","click":"0","log_id":"1697546973358.2929","show_items":""}
    2-2-body: {"keyword":"iPhone","qrst":"1","wq":"iPhone","ev":"exbrand_Apple^","pvid":"c2a8f09dbfa044a6a12f860e20edb6c7","page":"4","s":"86","scrolling":"y","log_id":"1697545127114.3155","tpl":"3_M","isList":0,"show_items":""}
    3-2-body: {"keyword":"iPhone","qrst":"1","wq":"iPhone","ev":"exbrand_Apple^","pvid":"c2a8f09dbfa044a6a12f860e20edb6c7","page":"6","s":"146","scrolling":"y","log_id":"1697547015728.2990","tpl":"3_M","isList":0,"show_items":""}
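    The four captured bodies suggest two templates: odd values of page carry "click":"0", while even values carry "scrolling":"y" and "tpl":"3_M". A minimal sketch of building either string with json.dumps instead of manual concatenation follows. The function name build_body is made up here, and applying the scrolling template to page 2 as well is an assumption, since no capture of that request is shown (the article's own script uses the click template for page 2).

```python
import json

KEYWORD = "iPhone"
PVID = "c2a8f09dbfa044a6a12f860e20edb6c7"


def build_body(page, s, log_id):
    """Build the JSON text for the `body` query parameter."""
    common = {
        "keyword": KEYWORD,
        "qrst": "1",
        "wq": KEYWORD,
        "ev": "exbrand_Apple^",
        "pvid": PVID,
    }
    if page % 2 == 1:
        # First half of a results page.
        fields = {**common, "isList": 0, "page": str(page), "s": str(s),
                  "click": "0", "log_id": log_id, "show_items": ""}
    else:
        # Second half, loaded while scrolling.
        fields = {**common, "page": str(page), "s": str(s), "scrolling": "y",
                  "log_id": log_id, "tpl": "3_M", "isList": 0, "show_items": ""}
    # Compact separators reproduce the captured strings without extra spaces.
    return json.dumps(fields, separators=(",", ":"))


print(build_body(3, 56, "1697544397338.9790"))
print(build_body(4, 86, "1697545127114.3155"))
```

    Building the dictionary first keeps the quoting correct if the keyword ever contains characters that need escaping.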
    

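    page increases by 1 with each request

    s increases by 30 with each request, except between the first two requests (1, then 26)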

    1-1: s 1
    1-2: s 26
    2-1: s 56
    2-2: s 86
    ...
    ...
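    The observed values can be written as two small formulas: the request number is 2 * (results page - 1) + part, and s is 1 for the first request and 26 + 30 * (page - 2) afterwards. The sketch below only encodes the numbers listed above; the function names are made up, and whether the pattern holds beyond the captured requests is an assumption.

```python
def request_page(result_page, part):
    """Map a results page and its part (1 or 2) to the `page` parameter."""
    return 2 * (result_page - 1) + part


def start_offset(page):
    """Return the `s` parameter for a given `page` parameter."""
    if page == 1:
        return 1
    # 26 for page 2, then +30 per request: 56, 86, 116, 146, ...
    return 26 + 30 * (page - 2)


for result_page in (1, 2, 3):
    for part in (1, 2):
        page = request_page(result_page, part)
        print(f"{result_page}-{part}: page {page}, s {start_offset(page)}")
```

    This reproduces the captured pairs, for example 2-1 gives page 3 with s 56 and 3-2 gives page 6 with s 146.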
    
    h5st: 20231017201847323;g5giig9tnm63gij2;f06cc;tk03wbde31c7218nmTOuI4vmUG1gibUwyDKLNpF6B_t1uk9ukpSq3k_k19h74PyUWE_Fz9mV-ggz4JCtsVbQZVSId9dC;7710a41bb85a10fe65109f794fb3b815;4.1;1697545127323;ee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980b96c3953b1ab788029ae792b39e113ccac142f09e3a1fa8c3f25055353b835ed0bf65228424626b8a9e1d2c030999d9be97a9dee9fb20116ceb0deb8736546109bc1cf5b91d1dfa2b39c79b3b0f0a5a036cdc921a1f147179b291c830dc87a6d3d0c3885fe721d5f0391a55bb4bf663963282084e04c7f24e6d3bcb219f4cbb08d3f76f13bca81938336d1934b88ded260caabac20e37a63dd3f6a093fd5dd2d936e95b67fee9654732d8a2908d96fe4b8d0a0b9d9b65996563d4cb94925fd651106c8e7c1234f63f57b1baa40324d6e8969e5c7b48e35e2c4bc5d325e88db237e42c33d6b256ebc720e76f574f34b
    h5st: 20231017204933583;g5giig9tnm63gij2;f06cc;tk03wbde31c7218nmTOuI4vmUG1gibUwyDKLNpF6B_t1uk9ukpSq3k_k19h74PyUWE_Fz9mV-ggz4JCtsVbQZVSId9dC;0ed5b74f81ac6ded4aeee2f615d6e03f;4.1;1697546973583;ee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980b96c3953b1ab788029ae792b39e113ccac142f09e3a1fa8c3f25055353b835ed0bf65228424626b8a9e1d2c030999d9be97a9dee9fb20116ceb0deb8736546109bc1cf5b91d1dfa2b39c79b3b0f0a5a036cdc921a1f147179b291c830dc87a6d3d0c3885fe721d5f0391a55bb4bf663963282084e04c7f24e6d3bcb219f4cbfe11ef406022b163c00824a22a034ed25520965f3f71ba25eca1fe340990d9a3c4d0100fbc84b1e9094cbe21ed8f59acc7a3bfd1bdd706f19bc06fd1d9a10233e68a2c851f66633c3357188dfeec7cc88dc36ba5ab73fac1ee81fd17352694c31f5a0096b50478e73a7b645153333271
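    Splitting the two h5st values on ";" makes it easier to see which parts change between requests. The sketch below compares the first seven fields of the two values above (the long final field is left out for brevity; it also differs between the two). In these two samples the first field appears to be the seventh field, a millisecond timestamp, rendered as a UTC+8 date string. That reading is an observation from this data, not a documented format, and the helper names are made up.

```python
from datetime import datetime, timedelta, timezone

TOKEN = ("tk03wbde31c7218nmTOuI4vmUG1gibUwyDKLNpF6B_t1uk9ukpSq3k_k19h74PyUWE"
         "_Fz9mV-ggz4JCtsVbQZVSId9dC")

# First seven fields of the two captured h5st values.
FIRST = ";".join(["20231017201847323", "g5giig9tnm63gij2", "f06cc", TOKEN,
                  "7710a41bb85a10fe65109f794fb3b815", "4.1", "1697545127323"])
SECOND = ";".join(["20231017204933583", "g5giig9tnm63gij2", "f06cc", TOKEN,
                   "0ed5b74f81ac6ded4aeee2f615d6e03f", "4.1", "1697546973583"])


def changed_fields(a, b, sep=";"):
    """Return the indexes of the fields that differ between a and b."""
    pairs = zip(a.split(sep), b.split(sep))
    return [i for i, (x, y) in enumerate(pairs) if x != y]


def format_ms(ms, utc_offset_hours=8):
    """Render a millisecond epoch timestamp as yyyyMMddHHmmssSSS."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    stamp = datetime.fromtimestamp(ms // 1000, tz=tz)
    return stamp.strftime("%Y%m%d%H%M%S") + f"{ms % 1000:03d}"


print(changed_fields(FIRST, SECOND))   # [0, 4, 6]
print(format_ms(1697545127323))        # 20231017201847323
```

    Because several fields change with every request, a hard-coded h5st copied from the browser may stop being accepted after a while.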
    

    Code

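    import requests     # third-party library for sending HTTP requests (must be installed)
    import parsel       # third-party library for extracting data from the page source
    import csv          # standard-library module, no installation needed
    import time         # standard-library module, used for the millisecond timestamp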
    
    
    with open("jingdong.csv", mode='w', newline='', encoding='utf-8') as f:
        csv.writer(f).writerow(['title', 'price', 'shop', 'detail_url'])
    headers = {
        'Cookie': '__jdu=1675327822068798256204; shshshfpa=a8c4d3ab-4de2-1594-07c6-96937703bc48-1675511732; shshshfpx=a8c4d3ab-4de2-1594-07c6-96937703bc48-1675511732; shshshfp=df23b3178a68c52485e728025047439d; areaId=18; _pst=jd_7449b8b770c1a; unick=u_y14qxm7bysay; pin=jd_7449b8b770c1a; _tp=vZPPhy6cqARc6L2%2B3nOzUq3kCs2OWuApKpEwLezV01A%3D; unpl=JF8EAMhnNSttDRsGBx9XExcQHAlVWw4ATx4LP2JXXFpYSVwHS1VPGhl7XlVdXxRLFh9vYRRXXFNKUw4aCysSEXteXVdZDEsWC2tXVgQFDQ8VXURJQlZAFDNVCV9dSRZRZjJWBFtdT1xWSAYYRRMfDlAKDlhCR1FpMjVkXlh7VAQrAhwUFEleUldeC0oQCmlvDFdZX0hVACsDKxUge21UWloLQxczblcEZB8MF1EHGwcZFV1LWlJaXwtNHgBsZgJdW1BCVwEcARoXIEptVw; __jdv=76161171|baidu-pinzhuan|t_288551095_baidupinzhuan|cpc|0f3d30c8dba7459bb52f2eb5eba8ac7d_0_dac35d941fe04b9589a4c961393afe98|1697544369451; PCSYCityID=CN_430000_430100_0; jsavif=1; __jda=122270672.1675327822068798256204.1675327822.1696749738.1697544369.7; __jdc=122270672; wlfstk_smdl=zqjf27ll62rd5uge85230utp29qi2wv2; logintype=qq; npin=jd_7449b8b770c1a; thor=459E9A0707CDD36020E74D14717A705AD6CEE67A8D55FEDAACBD33B9D31511E6AA1AEEA695BDBF1921A135769B716889400BBD0DCF1CCB0F3B325202A6A3E27AD6388CDB3EBDB3F0B59C1377A16E8774FACFD9FCFC04AEE31844B7ABFC6C39EE9C2F52540A2CCF902FCA67B460688F87FCAC3279B369769DBB94CCADFE20BF7EE14A8666D30DEFBBA7837A308B8165AD71D91B839EF96E5CCB7F2F0026C5679B; flash=2_ZrWfSfPGSnxmE-YDUlWOCIWikxr51SV82QCigp8WUVY6X70ebZL51YYs2-iD8o1O6FnCUtUnKJhz7L-PsPM9Ts6kNGDO2_sAyca7PjZdqqN*; pinId=f_SKjtPUQ3D1_NrwwoSZkrV9-x-f3wj7; shshshsID=e63b3af9ee1f8ba7e59ca5c63186d670_3_1697544398707; 3AB9D23F7A4B3C9B=BFXLLB72GO2GWA4OW3JSYXJPOVRF3WAKAKETOTSMNISZ6VIJTLEVQKEHWUA6VLD7ORS2QYC55PWBVUZVPZTXPDCHZU; token=49a2f429c466477218207fee65086990,3,943080; __tk=IsupJpIwkDtzkvnzkDjFJsAwjiJTIskqlsuoJpt1jpSykpfojUbTIS,3,943080; 3AB9D23F7A4B3CSS=jdd03BFXLLB72GO2GWA4OW3JSYXJPOVRF3WAKAKETOTSMNISZ6VIJTLEVQKEHWUA6VLD7ORS2QYC55PWBVUZVPZTXPDCHZUAAAAMLHWJEJRYAAAAACH56UEPWVWMVU4X; _gia_d=1; __jdb=122270672.5.1675327822068798256204|7.1697544369; shshshfpb=AAidGkj2LEsTTq03iFZQHxpaTdwO8SBZ1URcyTgAAAAA; ipLoc-djd=18-1482-48942-49058',
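        'Origin': '**redacted in the original**',   # fill in the value from your browser's request headers
        'Referer': '**redacted in the original**',  # fill in the value from your browser's request headers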
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36'
    }
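    s = 1  # start offset sent with the first request
    for page in range(1, 121):  # `page` counts requests: two per results page
        t = int(time.time() * 1000)  # current time in milliseconds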
        body = '{"keyword":"iPhone","qrst":"1","wq":"iPhone","ev":"exbrand_Apple^","pvid":"c2a8f09dbfa044a6a12f860e20edb6c7","isList":0,"page":"'+str(page)+'","s":"'+str(s)+'","click":"0","log_id":"1697547020245.6899","show_items":""}'
        if page == 2:
            s = 26
            body = '{"keyword":"iPhone","qrst":"1","wq":"iPhone","ev":"exbrand_Apple^","pvid":"c2a8f09dbfa044a6a12f860e20edb6c7","isList":0,"page":"' + str(
                page) + '","s":"' + str(s) + '","click":"0","log_id":"1697547020245.6899","show_items":""}'
        elif page > 2:
            s += 30
            if page % 2 == 0:
                body = '{"keyword":"iPhone","qrst":"1","wq":"iPhone","ev":"exbrand_Apple^","pvid":"c2a8f09dbfa044a6a12f860e20edb6c7","page":"'+str(page)+'","s":"'+str(s)+'","scrolling":"y","log_id":"1697545127114.3155","tpl":"3_M","isList":0,"show_items":""}'
            else:
                body = '{"keyword":"iPhone","qrst":"1","wq":"iPhone","ev":"exbrand_Apple^","pvid":"c2a8f09dbfa044a6a12f860e20edb6c7","isList":0,"page":"'+str(page)+'","s":"'+str(s)+'","click":"0","log_id":"1697544397338.9790","show_items":""}'
        params = {
            'appid': 'search-pc-java',
            'functionId': 'pc_search_s_new',
            'client': 'pc',
            'clientVersion': '1.0.0',
            't': str(t),
            'body': body,
            'loginType': '3',
            'uuid': '122270672.1675327822068798256204.1675327822.1696749738.1697544369.7',
            'area': '18_1482_48942_49058',
            'h5st': '20231017205657848;g5giig9tnm63gij2;f06cc;tk03wbde31c7218nmTOuI4vmUG1gibUwyDKLNpF6B_t1uk9ukpSq3k_k19h74PyUWE_Fz9mV-ggz4JCtsVbQZVSId9dC;825dbf6bd60713fa1ddad5e95d169108;4.1;1697547417848;ee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980b96c3953b1ab788029ae792b39e113ccac142f09e3a1fa8c3f25055353b835ed0bf65228424626b8a9e1d2c030999d9be97a9dee9fb20116ceb0deb8736546109bc1cf5b91d1dfa2b39c79b3b0f0a5a036cdc921a1f147179b291c830dc87a6d3d0c3885fe721d5f0391a55bb4bf663963282084e04c7f24e6d3bcb219f4cb08a33c86f2c515c368479ab2fffd0f4935b373832965c1ba9aa292710f7023e99dac2e1bde15cd796fe1601c5425e954a8cebb66dc24031fb337c7d79d2a6f46c875d77cbc102770fd5125f99aaa366d5abac9c006c2f0275731844dd1353f808489e029e35b485616771b972ae3bb95',
            'x-api-eid-token': 'jdd03BFXLLB72GO2GWA4OW3JSYXJPOVRF3WAKAKETOTSMNISZ6VIJTLEVQKEHWUA6VLD7ORS2QYC55PWBVUZVPZTXPDCHZUAAAAMLHWDXLUYAAAAACL4BJCT4CQASEEX',
        }
        url = 'https://api.m.jd.com/'  # host and path of the request URL found in section I
        # 1. Send the request (access the site)
        response = requests.get(url=url, params=params, headers=headers)
        # 2. Extract the data: pull out the fields we need
        html_data = response.text
        # Parse the page source so it can be queried with XPath
        select = parsel.Selector(html_data)
        # //ul[@class="gl-warp clearfix"]/li
        # Each <li> is the tag that holds one product
        lis = select.xpath('//ul[@class="gl-warp clearfix"]/li')
        for li in lis:
            # li.xpath('string(.//div[@class="p-name p-name-type-2"])').get()
            title = li.xpath('string(.//div[@class="p-name p-name-type-2"])').get("").strip()
            price = li.xpath('string(.//div[@class="p-price"])').get("").strip()
            shop = li.xpath('string(.//div[@class="p-shop"])').get("").strip()
            detail_url = "https:"+li.xpath('.//div[@class="p-name p-name-type-2"]/a/@href').get("")
            print(title, price, shop, detail_url)
            # 3. Save the data: append one row per product
            with open("jingdong.csv", mode='a', newline='', encoding='utf-8') as f:
                csv.writer(f).writerow([title, price, shop, detail_url])
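    The script above reopens jingdong.csv in append mode for every product. An alternative, sketched below under the assumption that all rows can be produced by one iterable, is to open the file once and stream the rows through a single csv.writer. The function name save_rows and the sample rows are placeholders, not scraped data.

```python
import csv

FIELDS = ["title", "price", "shop", "detail_url"]


def save_rows(path, rows):
    """Write the header and every row to `path`; return the number of rows written."""
    count = 0
    with open(path, mode="w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)
        for row in rows:  # `rows` may be a list or a generator
            writer.writerow(row)
            count += 1
    return count


if __name__ == "__main__":
    # Placeholder rows for illustration only.
    demo = [
        ("example title A", "100.00", "example shop", "https://example.invalid/a"),
        ("example title B", "200.00", "example shop", "https://example.invalid/b"),
    ]
    print(save_rows("demo.csv", demo))
```

    In the scraper, the loop over pages and products could be turned into a generator that yields (title, price, shop, detail_url) tuples and is passed to save_rows.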
    

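    Closing remarks

    That's about it for today's walkthrough!

    Let me know in the comments what you would like to see next, and I will write it up when I see it (ง •_•)ง

    If you enjoyed this, follow the blog, or like, bookmark, and comment on the article!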


  • Original article: https://blog.csdn.net/weixin_62853513/article/details/133908495