Python Crawler Programming Ideas (47): Hands-on Project: Scraping the Douban Top 250 Books List

 2 years ago
source link: https://blog.csdn.net/nokiaguy/article/details/120679836


This article uses the requests library, the lxml library, and XPath to scrape Douban's Top 250 book ranking. You can open the list at https://book.douban.com/top250, as shown in Figure 1.

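Before writing the crawler, we need to analyze the markup of the Top 250 list and the rule that governs page switching. Start with the paging rule. The pagination bar sits at the bottom of the page; switching to pages 1, 2, 3, and 4 in turn shows the following four URLs in the address bar: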

https://book.douban.com/top250?start=0

https://book.douban.com/top250?start=25

https://
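The paging rule visible in the URLs above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the step of 25 is inferred from the first two URLs (`start=0` and `start=25`), and the total of 10 pages is inferred from 250 entries at 25 per page. The names `PAGE_SIZE`, `page_url`, and `all_page_urls` are hypothetical helpers introduced for illustration only.

```python
BASE_URL = "https://book.douban.com/top250"

# Assumption: each page lists 25 books, inferred from start=0 and start=25.
PAGE_SIZE = 25


def page_url(page: int) -> str:
    """Return the URL for a 1-based page number (hypothetical helper)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return f"{BASE_URL}?start={(page - 1) * PAGE_SIZE}"


def all_page_urls(total_items: int = 250) -> list[str]:
    """Return the URLs of every page, assuming total_items entries overall."""
    page_count = total_items // PAGE_SIZE
    return [page_url(page) for page in range(1, page_count + 1)]


if __name__ == "__main__":
    for url in all_page_urls():
        print(url)
```

A crawler could then loop over `all_page_urls()` and fetch each page in turn, rather than following the pagination links one by one.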

