After WeChat upgraded its anti-scraping defenses yet again, the old Python + mitmproxy man-in-the-middle proxy approach could no longer capture WeChat mini-program traffic. As the joke goes, nothing is impossible as long as you are willing to give up. Just when I thought I was out of options, I stumbled on a tool that can still capture mini-program packets, so I set out to automate the scraping.



The tool scans the process pool directly and captures packets by monitoring processes.
The plan: use Python to drive the mouse, much like the macro tool 按键精灵, copy the JSON responses out, and then analyze the data.
import time
import json
import pymongo
import win32api
import win32con
from pymouse import PyMouse

myMouse = PyMouse()

# Open the database connections
client = pymongo.MongoClient()
collection = client['美团外卖']['药店']
collection2 = client['美团外卖']['药店_shop_info']
collection3 = client['美团外卖']['药店_spuList']
# Attach "static" attributes to a function (used below to remember the store id)
def static_vars(**kwargs):
    def decorate(func):
        for k in kwargs:
            setattr(func, k, kwargs[k])
        return func
    return decorate

# Get the current mouse position
def getcoord():
    return myMouse.position()

# Capture hook: runs for every intercepted response
@static_vars(storeid='')
def response(flow):
    target_url = 'poi/sputag/products'
    shop_info_url = 'poi/food'
    if shop_info_url in flow.request.url:
        item = {}
        item['data'] = json.loads(flow.response.text)['data']
        item['id'] = item['data']['poi_info']['id']
        response.storeid = item['id']
        item['shop_name'] = item['data']['poi_info']['name']
        pageCategoryDict = {}
        for cate in item['data']['food_spu_tags']:
            tag = cate['tag']
            pageCategoryDict[tag] = cate
        item['pageCategoryDict'] = pageCategoryDict
        print('shop_name:', item['shop_name'])
        if not collection2.find_one({'id': item['id']}):
            collection2.insert_one(item)
        else:
            collection2.update_one({'id': item['id']}, {'$set': item})
    if target_url in flow.request.url:
        shop_id = response.storeid
        print('>>>>>> store id:', shop_id)
        data = json.loads(flow.response.text)['data']
        tag_id = data['product_tag_id']
        nextPageIndex = data['current_page']
        id = f'{shop_id}_{tag_id}_{nextPageIndex}'
        spuList = data['product_spu_list']
        item = {}
        item['shop_id'] = shop_id
        item['tag_id'] = tag_id
        item['id'] = id
        item['spuList'] = spuList
        if spuList:
            for spu in spuList:
                spu_name = spu['name']
                print('>>>>>> inserting:', spu_name)
                item1 = {}
                item1['shopid'] = shop_id
                spu_id = spu['id']
                sputableid = f'{shop_id}_{tag_id}_{spu_id}'
                print('>>>>>> inserting:', sputableid)
                item1['id'] = sputableid
                item1['name'] = spu_name
                item1['minprice'] = spu['min_price']
                item1['promotioninfo'] = spu['promotion_info']
                item1['monthsale'] = spu['month_saled_content']
                goods = spu['skus']
                if goods:
                    for sku in goods:
                        item1['stock'] = sku['stock']
                        item1['status'] = sku['status']
                collection3.insert_one(item1)
        if not collection.find_one({'id': id}):
            collection.insert_one(item)
        else:
            collection.update_one({'id': id}, {'$set': item})
# Mouse coordinates for entering the first store
x = 1121
y = 389

# Enter a store
def enterStore(x, y):
    time.sleep(2)
    myMouse.click(x, y, 1, 1)

# Exit the current store
def exitStore(x, y):
    time.sleep(2)
    myMouse.click(776, 147, 1, 1)  # the "back" button
    print("Left the store")
    # Move back to the store list
    myMouse.move(x, y)

FirstStore = True
storenum = 0

# Leave the current store and enter the next one.
# Returns the updated state so the caller can pass it back in.
def nextStore(FirstStore, storenum, x, y):
    exitStore(x, y)
    time.sleep(2)
    print("Ready to enter the next store")
    if FirstStore:
        # Scroll the store list past the first screen
        win32api.mouse_event(win32con.MOUSEEVENTF_WHEEL, 0, 0, -400)
        FirstStore = False
    else:
        if storenum <= 3:
            storenum = storenum + 1
            y = y - 30
            myMouse.move(x, y)
        else:
            win32api.mouse_event(win32con.MOUSEEVENTF_WHEEL, 0, 0, -50)
    enterStore(x, y)
    return FirstStore, storenum, x, y

nowP = getcoord()
print(nowP)
# myMouse.move(x, y) moves the mouse to (x, y)
# myMouse.click(x, y, button, n): button 1 = left click, 2 = right click; n = number of clicks (2 = double click)
myMouse.click(1123, 387, 1, 1)
print("Click done")
time.sleep(3)
win32api.mouse_event(win32con.MOUSEEVENTF_WHEEL, 0, 0, -300)
print("Scroll done")
nowP = getcoord()
# Enter the first store
enterStore(nowP[0], nowP[1])
FirstStore, storenum, x, y = nextStore(FirstStore, storenum, nowP[0], nowP[1])
FirstStore, storenum, x, y = nextStore(FirstStore, storenum, x, y)
FirstStore, storenum, x, y = nextStore(FirstStore, storenum, x, y)
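The static_vars decorator in the script simply attaches attributes to the function object, giving the response hook a place to remember the current store id between calls. Here is a minimal self-contained sketch of the same pattern; the count_calls function is made up purely for illustration:

```python
def static_vars(**kwargs):
    """Attach the given keyword arguments as attributes on the decorated function."""
    def decorate(func):
        for k in kwargs:
            setattr(func, k, kwargs[k])
        return func
    return decorate

# Hypothetical example: the function keeps its own call counter.
@static_vars(counter=0)
def count_calls():
    count_calls.counter += 1
    return count_calls.counter

print(count_calls())  # 1
print(count_calls())  # 2
```

Because the state lives on the function object itself, there is no global variable and no class needed, which is why the scraper can stash the store id from one intercepted response and read it back in the next.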
Another route: decompile the capture program, or find its cache files. These request records must be cached somewhere; locate where they are stored and read the files directly for analysis.
(I haven't found them yet.)
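If those cache files ever do turn up, reading them could look something like the sketch below. The directory layout and the assumption that each record is a standalone JSON file are both hypothetical, so adjust to whatever the tool actually writes:

```python
import json
from pathlib import Path

def load_cached_responses(cache_dir):
    """Walk a (hypothetical) cache directory and yield every valid JSON record found."""
    for path in sorted(Path(cache_dir).rglob('*.json')):
        try:
            with open(path, encoding='utf-8') as f:
                yield path.name, json.load(f)
        except (json.JSONDecodeError, OSError):
            # Skip files that are not valid JSON records
            continue

# Usage example with a temporary directory standing in for the real cache location
import tempfile
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'resp_001.json'
    sample.write_text(json.dumps({'data': {'poi_info': {'id': 1}}}), encoding='utf-8')
    for name, record in load_cached_responses(tmp):
        print(name, record['data']['poi_info']['id'])  # resp_001.json 1
```

The point of yielding records lazily is that a cache directory could hold thousands of files; the same parsing logic used in the response hook could then be applied to each record offline.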
Wherever data is exchanged, the packets can be captured. Don't give up too easily; sometimes a change of perspective gets you the answer you want. Before finding this tool I also tried better-known analyzers such as Proxifier and Wireshark, but the packets they capture are raw, unprocessed data, which to a novice like me might as well be gibberish. I was ready to give up at that point.