Python Crawler Programming Ideas (151): Scraping Data with Scrapy and Saving a Single Scraped Item with ItemLoader

2 years ago

source link: https://blog.csdn.net/nokiaguy/article/details/125464683

The previous articles saved scraped data to a specified file by returning a MyscrapyItem object from the parse method. This article introduces another way to save data: ItemLoader.

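In essence, an ItemLoader object also saves data by returning an item. The difference is that the ItemLoader wraps the item together with the response (the object that holds the response data fetched from the server).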

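The ItemLoader constructor has two commonly used parameters: item and response. item specifies the Item object (MyscrapyItem in this example), and response specifies the object that holds the data fetched from the server (in this example, the response that is the second parameter of the parse method).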

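The example in this article uses an ItemLoader object and XPath to extract the title, abstract, and URL of the first article in an article list. These values are stored in the title, abstract, and href fields, respectively. When the crawler is run, the "-o" command-line argument specifies the output file, and the file type is determined by the file extension. After a successful run, the scraped data is saved to that file.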
