
Python - Requests scraper for an Amazon product page: headers get flagged as a bot

source link: https://www.v2ex.com/t/886930

  wyzh97 · 8 hours 32 minutes ago · 1265 views

I'm trying to scrape an Amazon product page (https://www.amazon.com/dp/B0B6TR2GTJ). Here is my code:


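import requests
from pyquery import PyQuery as pq  # HTML parser with jQuery-style CSS selectors

url = "https://www.amazon.com/dp/B0B6TR2GTJ"

# Browser-like headers: the User-Agent claims to be Chrome 86 on Windows 10
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'Accept-Language': 'en-US, en;q=0.5'
}
r = requests.get(url, headers=headers)

print(r.status_code)
print("-------------------")
doc = pq(r.text)  # parse the returned HTML

print(doc("title"))  # the page's <title> element
print("-------------------")
print(r.text)  # raw HTML of the response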
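One thing commonly tried at this point is to reuse a single `requests.Session` and send a fuller set of browser-like headers. The sketch below only illustrates that idea: the helper names `make_browser_session` and `fetch` and the exact header values are hypothetical, and there is no guarantee Amazon will accept the request, since its bot detection may rely on signals other than headers (request rate, IP address, and so on).

```python
import requests

# Hypothetical header set; values are illustrative, not known to pass Amazon's checks.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Upgrade-Insecure-Requests": "1",
}


def make_browser_session() -> requests.Session:
    """Return a Session whose default headers mimic a desktop browser (hypothetical helper)."""
    session = requests.Session()
    # update() merges into requests' defaults, so Accept-Encoding and
    # Connection keep the values requests chooses itself.
    session.headers.update(BROWSER_HEADERS)
    return session


def fetch(url: str, session: requests.Session, timeout: float = 10.0) -> requests.Response:
    """GET a page through the shared session, reusing any cookies it has collected."""
    return session.get(url, timeout=timeout)
```

Usage would look like `r = fetch("https://www.amazon.com/dp/B0B6TR2GTJ", make_browser_session())`. If `r.status_code` is still 503, headers were probably not the deciding factor.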

This is the result (I've been flagged as a bot). I've tried all sorts of variations on the headers and always get the same result.

503
-------------------
<title>Sorry! Something went wrong!</title>
  
-------------------
<!--
        To discuss automated access to Amazon data please contact api-services-support@amazon.com.
        For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.com/ref=rm_5_sv, or our Product Advertising API at https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html/ref=rm_5_ac for advertising use cases.
-->
<!doctype html>
......
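Before trying more header variations, it can help to detect this block page in code rather than by reading the printed output. A minimal sketch, assuming the marker strings in the response above stay stable (Amazon may change them at any time); `looks_blocked` and `BLOCK_MARKERS` are hypothetical names:

```python
# Marker strings copied from the block page shown above.
BLOCK_MARKERS = (
    "To discuss automated access to Amazon data",
    "Sorry! Something went wrong!",
)


def looks_blocked(status_code: int, body: str) -> bool:
    """Heuristic check for Amazon's automated-access page.

    A 503 on its own could be a genuine outage, so one of the
    marker strings must also appear in the response body.
    """
    return status_code == 503 and any(marker in body for marker in BLOCK_MARKERS)
```

With the code from the question, this would be called as `looks_blocked(r.status_code, r.text)`. Requiring both the status code and a marker keeps ordinary server errors from being mistaken for bot detection.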

I'm still a beginner at web scraping. Could anyone with more experience help me out? Many thanks.
