Python - Scraping an Amazon product page with Requests: headers get identified as a robot

1 year ago

source link: https://www.v2ex.com/t/886930

wyzh97 · 8 hours 32 minutes ago · 1265 views

I am trying to scrape an Amazon product page (https://www.amazon.com/dp/B0B6TR2GTJ) with the following code:


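import requests
# pq was never imported in the original snippet; pyquery's PyQuery is assumed here
from pyquery import PyQuery as pq

url = "https://www.amazon.com/dp/B0B6TR2GTJ"

# Browser-like headers sent with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'Accept-Language': 'en-US, en;q=0.5'
}
r = requests.get(url, headers=headers)

print(r.status_code)
print("-------------------")
doc = pq(r.text)  # parse the returned HTML

print(doc("title"))  # print the page's <title> element
print("-------------------")
print(r.text)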

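The result is below (the request was flagged as a robot). I have tried writing the headers in various ways, and the result is always the same.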

503
-------------------
<title>Sorry! Something went wrong!</title>
  
-------------------
<!--
        To discuss automated access to Amazon data please contact api-services-support@amazon.com.
        For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.com/ref=rm_5_sv, or our Product Advertising API at https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html/ref=rm_5_ac for advertising use cases.
-->
<!doctype html>
......
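
The output above carries two recognizable signals: the 503 status code and the fixed strings in the title and the HTML comment. As a rough sketch, a script could check for them before trying to parse the page. The function name `looks_blocked` and the choice of markers are assumptions made for illustration, and a 503 on its own can also mean a genuine server error, so treat this as a heuristic only.

```python
def looks_blocked(status_code: int, body: str) -> bool:
    """Heuristic: does this response resemble the block page shown above?"""
    # Strings copied from the output in this post; Amazon may change them.
    markers = (
        "To discuss automated access to Amazon data",
        "Sorry! Something went wrong!",
    )
    # Either the 503 status or one of the known strings counts as a hit.
    return status_code == 503 or any(marker in body for marker in markers)
```

With the code from the post, this would be called as `looks_blocked(r.status_code, r.text)` before building `doc`.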

I am still a beginner at web scraping. Could anyone more experienced help me out? Many thanks.
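
One thing that could be tried, sketched here under stated assumptions, is sending a fuller set of browser-like headers and retrying with an increasing delay when a 503 comes back. Nothing in this thread confirms that this gets past the check, and the block page itself points to the Marketplace APIs and the Product Advertising API as the supported route. The names `BROWSER_HEADERS` and `fetch_with_backoff`, the extra `Accept` and `Accept-Encoding` values, and the retry counts are all illustrative choices, not something from the original post. The `get` argument is any callable with the same call shape as `requests.get`.

```python
import time

# The User-Agent is the one from the post; the other values are assumed,
# typical browser defaults added for illustration.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
}


def fetch_with_backoff(get, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call get(url, ...) up to `attempts` times, backing off after each 503."""
    response = None
    for attempt in range(attempts):
        response = get(url, headers=BROWSER_HEADERS, timeout=10)
        if response.status_code != 503:
            return response
        # Wait 1x, 2x, 4x ... base_delay, but not after the final attempt.
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return response  # still a 503 after all attempts
```

Usage would look like `r = fetch_with_backoff(requests.get, url)`, or `fetch_with_backoff(session.get, url)` with a `requests.Session` so that cookies persist between attempts.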
