Python Crawler Programming Ideas (47): Hands-On Project: Scraping the Douban Top 250 Books List

2 years ago

source link: https://blog.csdn.net/nokiaguy/article/details/120679836

This article uses the requests library, the lxml library, and XPath to scrape Douban's Top 250 books ranking. You can view the list at https://book.douban.com/top250, as shown in Figure 1.

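Before writing the crawler, we need to analyze the markup of the Top 250 list and the pattern behind page switching. Start with the paging rule. At the bottom of the page there is a pagination bar; switching to page 1, page 2, page 3, and page 4 in turn shows the following four URLs in the address bar: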

https://book.douban.com/top250?start=0

https://book.douban.com/top250?start=25

https://
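The two complete URLs above suggest that the `start` query parameter advances by 25 per page. As a minimal sketch under that assumption, the page URLs could be generated as follows. The names `BASE_URL`, `PAGE_SIZE`, `TOTAL_BOOKS`, and `build_page_urls` are hypothetical and chosen only for illustration; the step of 25 is inferred from `start=0` and `start=25`, and the total of 250 is taken from the list's name rather than confirmed against the site.

```python
from urllib.parse import urlencode

BASE_URL = "https://book.douban.com/top250"
PAGE_SIZE = 25     # assumption: inferred from start=0 -> start=25
TOTAL_BOOKS = 250  # assumption: taken from the name "Top 250"


def build_page_urls(base_url=BASE_URL, page_size=PAGE_SIZE, total=TOTAL_BOOKS):
    """Return one URL per page, advancing `start` by `page_size` each time."""
    return [
        f"{base_url}?{urlencode({'start': offset})}"
        for offset in range(0, total, page_size)
    ]


if __name__ == "__main__":
    # Print every page URL the crawler would visit.
    for url in build_page_urls():
        print(url)
```

Keeping the page size and total as parameters means the sketch only needs a one-line change if the site turns out to paginate differently.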
