这个网易云JS解密,老网抑云看了都直呼内行
source link: http://www.happyhong.cn/ni-xiang/10034.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
最近更新频率慢了,这不是因为CK3发售了嘛,一个字就是“肝”。今天来看一下网易云音乐两个加密参数params和encSecKey,顺便抓取一波某歌单的粉丝,有入库哦,使用mysql存储,觉得有帮助的别忘了关注一下公众号啊,完整的JS代码都已整理好,请关注知识图谱与大数据公众号,找到本文点击文末阅读更多获取,也可以点击网易云音乐JS代码
网易云音乐只需要解密params
和encSecKey
就可以开始快乐的抓取了,当然你没有足够的代理IP的话,还是要慢点爬,废话不多说,直接开整。
老规矩,先看看我们要抓的页面样子:
POST
请求,那看看它提交了哪些参数吧,FormData
如下:params
和encSecKey
,是加密过的,看看返回的内容的格式:调试前我们得先找到这两个值的位置,那就搜索呗,先定位JS
文件再定位代码位置。怎么搜索应该都知道的吧,搜params
或encSecKey
均可,搜出来多个结果不确定是哪个文件的话可以每个点进去再搜索一下关键参数,以此确定是否是目标文件,这里我直接标记了正确文件,大家点击进去即可。
params
或encSecKey
:encSecKey
的位置,把这几行代码抠出来分析一下:
var bVZ8R = window.asrsea(JSON.stringify(i0x), bqN0x(["流泪", "强"]), bqN0x(Wx5C.md), bqN0x(["爱心", "女孩", "惊恐", "大笑"]));
e0x.data = j0x.cs1x({
params: bVZ8R.encText,
encSecKey: bVZ8R.encSecKey
})
粗略看,params
和encSecKey
来自bVZ8R.encText
和bVZ8R.encSecKey
,而bVZ8R
是window.asrsea
的结果,window.asrsea
有四个参数,JSON.stringify(i0x), bqN0x(["流泪", "强"]), bqN0x(Wx5C.md), bqN0x(["爱心", "女孩", "惊恐", "大笑"]
,先看后面三个参数,从它们固定的值可以大胆推测这三个值也是固定的。之所以说都是固定的,看看Wx5C.md
:
Wx5C.md
是一个固定好的数组,而bqN0x(["流泪", "强"])
和bqN0x(["爱心", "女孩", "惊恐", "大笑"]
这结果肯定也是不会变的,如下图,测试了一下:
window.asrsea
的具体实现方式,还有i0x
是什么样的,进入调试环节。
window.asrsea
打上断点,我的代码位置是13133
行,点击粉丝列表下一页就会激活断点,在激活断点的同时我们也能一睹i0x
的芳容,在控制台中输入i0x
:
limit、offset、total、userId
这些其实都是可知的,而csrf_token
的产生可以看这里,细心的童靴应该早就发现了:
}
i0x["csrf_token"] = v0x.gP3x("__csrf"); ## csrf_token在这里产生
X0x = X0x.replace("api", "weapi");
e0x.method = "post";
delete e0x.query;
var bVZ8R = window.asrsea(JSON.stringify(i0x), bqN0x(["流泪", "强"]), bqN0x(Wx5C.md), bqN0x(["爱心", "女孩", "惊恐", "大笑"]));
我点进v0x.gP3x
函数看了看:
csrf_token
来自于Cookie
中的__csrf
:window.asrsea
吧。一路点击下一步,进入函数中。d(d,e,f,g)
函数里,稍微往下一看,发现window.asrsea
就等于这个d
函数,哦了,那就调试这个d
函数就行:
function d(d, e, f, g) {
var h = {}
, i = a(16);
return h.encText = b(d, g),
h.encText = b(h.encText, i),
h.encSecKey = c(i, e, f),
h
}
进入a
函数:
a
函数是产生随机数的,继续运行,进入b
函数:AES
加密,继续运行进入c
函数:
RSA
加密,网易可真谨慎,各种加密。到这里总的框架已经调试完了,剩下的无非就是抠JS
代码了。
python运行
这次不是单单运行了结果哦,还带上了爬取与入库:
获取params
和encSecKey
def get_enc(self,a):
with open('..//js//wangyiyun.js', encoding='utf-8') as f:
wangyiyun = f.read()
js = execjs.compile(wangyiyun)
logid = js.call('get_pwd', a)
print(logid)
return logid
def get_fans(self):
resp = self.get_home_page()
print(resp.cookies)
print(resp.status_code)
time.sleep(6)
limit = 20
for i in range(1,110):
print("第{}页".format(i+1))
offset = limit*i
a = {"userId": "46991111", "offset": str(offset), "total": "false", "limit": str(limit), "csrf_token": ""}
print(a)
logid = self.get_enc(a)
data = {
"params":logid["encText"],
"encSecKey":logid["encSecKey"],
}
print(data)
fans_url = "https://music.163.com/weapi/user/getfolloweds?csrf_token="
resp = self.session.post(url=fans_url,data=data,headers=self.headers)
followed = json.loads(resp.text)
followed_list = []
for foll in followed["followeds"]:
foll_dict = {}
foll_dict["short_name"] = foll.get("py","") #缩写
foll_dict["userId"] = foll.get("userId","") #用户ID
foll_dict["nickname"] = foll.get("nickname","") #昵称
foll_dict["vipType"] = foll.get("vipType","") # vip
foll_dict["eventCount"] = foll.get("eventCount","")#动态
foll_dict["vipRights"] = str(foll.get("vipRights","")) #VIP权益
foll_dict["gender"] = foll.get("gender","") #性别
foll_dict["avatarUrl"] = foll.get("avatarUrl","") #头像
foll_dict["followed"] = foll.get("followed","")
foll_dict["followeds"] = foll.get("followeds","") #粉丝
foll_dict["follows"] = foll.get("follows","") #关注
foll_dict["playlistCount"] = foll.get("playlistCount","") #歌单
foll_dict["mutual"] = foll.get("mutual","") #
foll_dict["expertTags"] = str(foll.get("expertTags",""))
foll_dict["experts"] = str(foll.get("experts",""))
print(foll_dict)
followed_list.append(foll_dict)
self.mysql.insert("music",followed_list)
tm = random.randint(10,30)
time.sleep(tm)
这里要注意一下,要抓取指定的页面你还得先访问这个页面,不能直接请求"https://music.163.com/weapi/user/getfolloweds?csrf_token=
这个链接,因为它根本就没有带关于哪个页面的信息。
请求指定网页
def get_home_page(self):
url = "https://music.163.com/#/user/home?id=1737833656"
resp = self.session.get(url)
return resp
@property
def create_table_sql(self):
create_table = """
CREATE TABLE IF NOT EXISTS music (
short_name varchar(30) ,
userId varchar(100) NOT NULL,
nickname varchar(30),
vipType varchar(30) ,
eventCount varchar(200),
vipRights varchar(900),
gender varchar(900),
avatarUrl varchar(200),
followed varchar(30),
followeds varchar(30),
follows varchar(30),
playlistCount varchar(30),
mutual varchar(30),
expertTags varchar(30),
experts varchar(30),
PRIMARY KEY (userId)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4"""
return create_table
def insert(self,table,data_list):
if len(data_list) > 0:
data_list = [{k: v
for k, v in data.items() if v is not None}
for data in data_list]
keys = ", ".join(data_list[0].keys())
values = ", ".join(["%s"] * len(data_list[0]))
sql = """INSERT INTO {table}({keys}) VALUES ({values}) ON
DUPLICATE KEY UPDATE""".format(table=table,
keys=keys,
values=values)
update = ",".join([
" {key} = values({key})".format(key=key)
for key in data_list[0]
])
sql += update
print(sql)
self.connect()
try:
ret = self.cursor.executemany(sql, [tuple(data.values()) for data in data_list])
self.conn.commit()
except Exception as e:
self.conn.rollback()
print("Error: ", e)
traceback.print_exc()
finally:
self.close()
快乐的时光过得真快,到这里就结束了,抓取过程与入库大家可以作为一个参考。完整的JS
代码都已整理好,请关注知识图谱与大数据公众号,找到本文点击文末阅读更多获取,创作不易,请多关注,当然不关注也无所谓。
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK