
Python Crawler Programming Ideas (41): XPath in Practice: Selecting DOM Nodes

 2 years ago
source link: https://blog.csdn.net/nokiaguy/article/details/120678672


Contents

1. Selecting all nodes

2. Selecting child nodes

3. Selecting parent nodes


1. Selecting all nodes

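An XPath rule that begins with two slashes (//) selects every node that matches the rule, at any depth in the document. For example, '//*' selects all element nodes in the entire HTML document.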
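As a rough illustration of the double-slash rule, here is a minimal sketch that uses only Python's standard library. Several things in it are assumptions rather than part of the original article: the sample markup and the example.com links are made up, and `xml.etree.ElementTree` is used in place of a full XPath engine. ElementTree supports only a subset of XPath and evaluates paths relative to an element, so the sketch writes `.//*` (all descendants of the root) instead of the absolute `//*`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample document (not taken from the article).
SAMPLE = """
<html>
  <head><title>Demo</title></head>
  <body>
    <ul>
      <li><a href="https://example.com/a">A</a></li>
      <li><a href="https://example.com/b">B</a></li>
    </ul>
  </body>
</html>
""".strip()

root = ET.fromstring(SAMPLE)

# './/*' matches every element below the root, at any depth, in document order.
all_nodes = root.findall(".//*")
all_tags = [node.tag for node in all_nodes]

# './/a' narrows the same any-depth search to <a> elements only.
links = [a.get("href") for a in root.findall(".//a")]

print(all_tags)
print(links)
```

Note that the root `<html>` element itself is not part of the `.//*` result here, because the search starts below the element it is called on. A full XPath engine evaluating `//*` against the document would include the root element as well.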

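This article uses urllib3 to scrape the Maoyan Movies Top 100 board. Open the Top 100 page with the following URL: https://maoyan.com/board/4

The Top 100 page is shown in Figure 1. Each page lists 10 movies, and there are 10 pages, for a total of 100 movies. The navigation bar at the bottom of the page switches between pages 1 to 10. The goal of this crawler is to scrape the information for these 100 movies (such as the cover image URL, movie title, cast list, rating, and release date) and then save the data in JSON format to a text file named board.txt...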
