After WeChat upgraded its anti-scraping defenses yet again, the old Python + mitmproxy man-in-the-middle proxy approach could no longer capture WeChat mini-program traffic. As the joke goes, nothing is impossible as long as you are willing to give up. Just when I thought I was out of options, I stumbled on a tool that can still capture mini-program packets, so I set out to automate the scraping.



The tool scans the process pool directly and captures packets by monitoring processes.
The plan: use Python to drive the mouse, much like the macro tool 按键精灵, copy the JSON responses out, and then analyze the data.
import time
import json
import pymongo
import win32api
import win32con
from pymouse import PyMouse

myMouse = PyMouse()

# Open the database connections
client = pymongo.MongoClient()
collection = client['美团外卖']['药店']
collection2 = client['美团外卖']['药店_shop_info']
collection3 = client['美团外卖']['药店_spuList']
# Attach "static" attributes to a function (used below to remember the store id)
def static_vars(**kwargs):
    def decorate(func):
        for k in kwargs:
            setattr(func, k, kwargs[k])
        return func
    return decorate

# Get the current mouse position
def getcoord():
    return myMouse.position()

# Capture hook: runs for every intercepted response
@static_vars(storeid='')
def response(flow):
    target_url = 'poi/sputag/products'
    shop_info_url = 'poi/food'
    if shop_info_url in flow.request.url:
        item = {}
        item['data'] = json.loads(flow.response.text)['data']
        item['id'] = item['data']['poi_info']['id']
        response.storeid = item['id']
        item['shop_name'] = item['data']['poi_info']['name']
        pageCategoryDict = {}
        for cate in item['data']['food_spu_tags']:
            tag = cate['tag']
            pageCategoryDict[tag] = cate
        item['pageCategoryDict'] = pageCategoryDict
        print('shop_name:', item['shop_name'])
        if not collection2.find_one({'id': item['id']}):
            collection2.insert_one(item)
        else:
            collection2.update_one({'id': item['id']}, {'$set': item})
    if target_url in flow.request.url:
        shop_id = response.storeid
        print('>>>>>> store id:', shop_id)
        data = json.loads(flow.response.text)['data']
        tag_id = data['product_tag_id']
        nextPageIndex = data['current_page']
        id = f'{shop_id}_{tag_id}_{nextPageIndex}'
        spuList = data['product_spu_list']
        item = {}
        item['shop_id'] = shop_id
        item['tag_id'] = tag_id
        item['id'] = id
        item['spuList'] = spuList
        if spuList:
            for spu in spuList:
                spu_name = spu['name']
                print('>>>>>> inserting:', spu_name)
                item1 = {}
                item1['shopid'] = shop_id
                spu_id = spu['id']
                sputableid = f'{shop_id}_{tag_id}_{spu_id}'
                print('>>>>>> inserting:', sputableid)
                item1['id'] = sputableid
                item1['name'] = spu_name
                item1['minprice'] = spu['min_price']
                item1['promotioninfo'] = spu['promotion_info']
                item1['monthsale'] = spu['month_saled_content']
                goods = spu['skus']
                if goods:
                    for sku in goods:
                        item1['stock'] = sku['stock']
                        item1['status'] = sku['status']
                collection3.insert_one(item1)
        if not collection.find_one({'id': id}):
            collection.insert_one(item)
        else:
            collection.update_one({'id': id}, {'$set': item})
# Mouse coordinates for entering the first store
x = 1121
y = 389

# Enter a store
def enterStore(x, y):
    time.sleep(2)
    myMouse.click(x, y, 1, 1)

# Exit the current store
def exitStore(x, y):
    time.sleep(2)
    myMouse.click(776, 147, 1, 1)  # the "back" button
    print("Left the store")
    # Move back to the store list
    myMouse.move(x, y)

FirstStore = True
storenum = 0

# Leave the current store and enter the next one.
# Returns the updated state so the caller can pass it back in.
def nextStore(FirstStore, storenum, x, y):
    exitStore(x, y)
    time.sleep(2)
    print("Ready to enter the next store")
    if FirstStore:
        # Scroll the store list past the first screen
        win32api.mouse_event(win32con.MOUSEEVENTF_WHEEL, 0, 0, -400)
        FirstStore = False
    else:
        if storenum <= 3:
            storenum = storenum + 1
            y = y - 30
            myMouse.move(x, y)
        else:
            win32api.mouse_event(win32con.MOUSEEVENTF_WHEEL, 0, 0, -50)
    enterStore(x, y)
    return FirstStore, storenum, x, y

nowP = getcoord()
print(nowP)
# myMouse.move(x, y) moves the mouse to (x, y)
# myMouse.click(x, y, button, n): button 1 = left click, 2 = right click; n = number of clicks (2 = double click)
myMouse.click(1123, 387, 1, 1)
print("Click done")
time.sleep(3)
win32api.mouse_event(win32con.MOUSEEVENTF_WHEEL, 0, 0, -300)
print("Scroll done")
nowP = getcoord()
# Enter the first store
enterStore(nowP[0], nowP[1])
FirstStore, storenum, x, y = nextStore(FirstStore, storenum, nowP[0], nowP[1])
FirstStore, storenum, x, y = nextStore(FirstStore, storenum, x, y)
FirstStore, storenum, x, y = nextStore(FirstStore, storenum, x, y)
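The static_vars decorator in the script simply attaches attributes to the function object, giving the response hook a place to remember the current store id between calls. Here is a minimal self-contained sketch of the same pattern; the count_calls function is made up purely for illustration:

```python
def static_vars(**kwargs):
    """Attach the given keyword arguments as attributes on the decorated function."""
    def decorate(func):
        for k in kwargs:
            setattr(func, k, kwargs[k])
        return func
    return decorate

# Hypothetical example: the function keeps its own call counter.
@static_vars(counter=0)
def count_calls():
    count_calls.counter += 1
    return count_calls.counter

print(count_calls())  # 1
print(count_calls())  # 2
```

Because the state lives on the function object itself, there is no global variable and no class needed, which is why the scraper can stash the store id from one intercepted response and read it back in the next.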
Another route: decompile the capture program, or find its cache files. These request records must be cached somewhere; locate where they are stored and read the files directly for analysis.
(I haven't found them yet.)
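If those cache files ever do turn up, reading them could look something like the sketch below. The directory layout and the assumption that each record is a standalone JSON file are both hypothetical, so adjust to whatever the tool actually writes:

```python
import json
from pathlib import Path

def load_cached_responses(cache_dir):
    """Walk a (hypothetical) cache directory and yield every valid JSON record found."""
    for path in sorted(Path(cache_dir).rglob('*.json')):
        try:
            with open(path, encoding='utf-8') as f:
                yield path.name, json.load(f)
        except (json.JSONDecodeError, OSError):
            # Skip files that are not valid JSON records
            continue

# Usage example with a temporary directory standing in for the real cache location
import tempfile
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'resp_001.json'
    sample.write_text(json.dumps({'data': {'poi_info': {'id': 1}}}), encoding='utf-8')
    for name, record in load_cached_responses(tmp):
        print(name, record['data']['poi_info']['id'])  # resp_001.json 1
```

The point of yielding records lazily is that a cache directory could hold thousands of files; the same parsing logic used in the response hook could then be applied to each record offline.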
Wherever data is exchanged, the packets can be captured. Don't give up too easily; sometimes a change of perspective gets you the answer you want. Before finding this tool I also tried better-known analyzers such as Proxifier and Wireshark, but the packets they capture are raw, unprocessed data, which to a novice like me might as well be gibberish. I was ready to give up at that point.