
Extracting direct watermark-free video links from Xiaohongshu

 6 months ago
source link: https://iecho.cc/2024/03/03/decode-xiaohongshu-video-url/

Take this note as an example; its note ID is 65e2c4fb00000000030367bd.

Getting the watermarked video URL

Find the Open Graph video tag in the page:

<meta name="og:video" content="">

Its content attribute holds the watermarked video URL, which has the following format:

https://sns-video-hw.xhscdn.com/stream/110/259/01e5e2b96c2b17eb010371038dfdd2b1c0_259.mp4
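As a rough illustration of this step, the tag can be read with Python's standard-library HTML parser instead of searching by eye. The class name OgVideoParser, the helper find_og_video, and the sample HTML string are made up for this sketch; checking the property attribute in addition to name is also an assumption, since the page shown above uses name.

```python
from html.parser import HTMLParser
from typing import Optional


class OgVideoParser(HTMLParser):
    """Collects the content attribute of the first og:video meta tag."""

    def __init__(self) -> None:
        super().__init__()
        self.video_url: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.video_url is not None:
            return
        attr = dict(attrs)
        # The page above uses name="og:video"; property= is checked too (assumption).
        if attr.get("name") == "og:video" or attr.get("property") == "og:video":
            self.video_url = attr.get("content")


def find_og_video(html: str) -> Optional[str]:
    """Return the og:video URL found in the HTML, or None if there is none."""
    parser = OgVideoParser()
    parser.feed(html)
    return parser.video_url


# Hypothetical page fragment built from the example URL in this article.
sample = (
    '<html><head><meta name="og:video" content="https://sns-video-hw.xhscdn.com/'
    'stream/110/259/01e5e2b96c2b17eb010371038dfdd2b1c0_259.mp4"></head></html>'
)
print(find_og_video(sample))
```

A parser is a little more tolerant of attribute order and spacing than a fixed regular expression, at the cost of a few more lines.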

Getting the watermark-free video URL

Search the page source for originVideoKey to find the following JSON field:

{
"originVideoKey":"spectrum\u002F1040g35830vr3bg1860005o6qr60o57fr83a7isg"
}

The \u002F here is the Unicode escape for /; you can decode it with the jq command.

 ~ % echo '{"originVideoKey":"spectrum\u002F1040g35830vr3bg1860005o6qr60o57fr83a7isg"}' | jq
{
"originVideoKey": "spectrum/1040g35830vr3bg1860005o6qr60o57fr83a7isg"
}
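If jq is not at hand, the same decoding can be done in Python, because json.loads resolves \uXXXX escapes while parsing. This is only a small sketch; the variable names raw and key are arbitrary.

```python
import json

# Raw string literal, so the backslash escape reaches json.loads unchanged.
raw = r'{"originVideoKey":"spectrum\u002F1040g35830vr3bg1860005o6qr60o57fr83a7isg"}'

# json.loads turns the \u002F escape into a plain "/".
key = json.loads(raw)["originVideoKey"]
print(key)
```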

Then append it to https://sns-video-bd.xhscdn.com/ to get the watermark-free video URL:

https://sns-video-bd.xhscdn.com/spectrum/1040g35830vr3bg1860005o6qr60o57fr83a7isg
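The concatenation can be wrapped in a small helper. The names BASE_URL and build_no_watermark_url are invented for this sketch, and stripping a leading slash from the key is a defensive assumption rather than something the page is known to need.

```python
BASE_URL = "https://sns-video-bd.xhscdn.com/"


def build_no_watermark_url(origin_video_key: str) -> str:
    """Append an already decoded originVideoKey to the CDN host."""
    # Strip any leading slash so the result never contains "//" after the host.
    return BASE_URL + origin_video_key.lstrip("/")


print(build_no_watermark_url("spectrum/1040g35830vr3bg1860005o6qr60o57fr83a7isg"))
```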

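Online Xiaohongshu parsing tools return a URL with 258 in its path. It differs from the URL above but still works, and I don't know how it is constructed. Comparing it with the watermarked URL shows the following differences: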

Differs                                                   1                               1111   1
Watermarked  https://sns-video-hw.xhscdn.com/stream/110/259/01e5e2b96c2b17eb010371038dfdd2b1c0_259.mp4
No watermark https://sns-video-hw.xhscdn.com/stream/110/258/01e5e2b96c2b17eb010371038dfdd239f3_258.mp4
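The comparison can be reproduced with a few lines of Python. This sketch only locates the characters that differ; it says nothing about how the 258 URL is derived. The helper name differing_positions is made up for the example.

```python
from typing import List


def differing_positions(a: str, b: str) -> List[int]:
    """Return the 0-based indices at which two strings differ (up to the shorter length)."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


watermarked = "https://sns-video-hw.xhscdn.com/stream/110/259/01e5e2b96c2b17eb010371038dfdd2b1c0_259.mp4"
no_watermark = "https://sns-video-hw.xhscdn.com/stream/110/258/01e5e2b96c2b17eb010371038dfdd239f3_258.mp4"

# Print each differing index with the character from each URL.
for i in differing_positions(watermarked, no_watermark):
    print(i, watermarked[i], no_watermark[i])
```

The differences fall in three places: the last digit of the 259/258 path segment, the last four characters of the hexadecimal file name, and the last digit of the _259/_258 suffix.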

A simple Python script

import json
import re

import requests

link = 'https://www.xiaohongshu.com/explore/65e2c4fb00000000030367bd'


def work(url: str) -> dict:
    """Return the watermarked and watermark-free video URLs of a note page."""
    result = {"url_with_watermark": None, "url_without_watermark": None}

    r = requests.get(url, timeout=10)
    if r.status_code != 200:
        print(f"status code: {r.status_code}")
        return result

    # The watermarked URL is the content of the Open Graph video tag.
    og_video = re.findall(r'<meta name="og:video" content="(.*?)">', r.text)
    if og_video:
        result["url_with_watermark"] = og_video[0]

    # json.loads decodes the \u002F escape inside the key to "/".
    key = re.findall(r'\{"originVideoKey":".*?"\}', r.text)
    if key:
        result["url_without_watermark"] = (
            "https://sns-video-bd.xhscdn.com/" + json.loads(key[0])["originVideoKey"]
        )
    return result


print(work(link))
