
Scraping the Zhubajie Website (zbj.com) - wx634e10232b539's Tech Blog - 51CTO Blog

 1 year ago
source link: https://blog.51cto.com/u_15834166/5852667


朝暮与庸碌 2022-11-15 14:36:25 © Copyright


Scraping the Zhubajie Website

1. Site analysis

First, enter saas in the search box.


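We mainly extract the price, title, rating, sales volume, positive reviews, and company name. When using XPath, the path copied from the page in the browser differs from the path in the HTML that the request actually returns, so we locate each field by its class attribute instead.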
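To illustrate why a class-based lookup is more robust than a copied positional path, here is a minimal sketch using lxml on two hand-written HTML fragments. The markup, the extra banner element, and the value 100 are invented for illustration and are not taken from zbj.com; only the price class name comes from the code in this post.

```python
from lxml import etree

# Hypothetical markup: the DOM as the browser might show it after scripts run.
BROWSER_DOM = """
<div id="root">
  <div class="banner">ad</div>
  <div class="card"><div class="price"><span>100</span></div></div>
</div>
"""

# Hypothetical markup: what the server might return to requests (no banner).
SERVER_HTML = """
<div id="root">
  <div class="card"><div class="price"><span>100</span></div></div>
</div>
"""

POSITIONAL = '//*[@id="root"]/div[2]/div/span/text()'  # copied-style path
BY_CLASS = '//div[@class="price"]/span/text()'         # class-based path


def extract(markup, path):
    """Return all text nodes matched by an XPath expression."""
    return etree.HTML(markup).xpath(path)


print(extract(BROWSER_DOM, POSITIONAL))  # matches in the browser-style markup
print(extract(SERVER_HTML, POSITIONAL))  # empty: div[2] does not exist here
print(extract(SERVER_HTML, BY_CLASS))    # still matches by class
```

The positional path depends on how many sibling elements come before the target, while the class-based path only depends on the element's own attribute.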


2. Code implementation

import pandas as pd
import requests
from lxml import etree

url = 'https://shijiazhuang.zbj.com/search/service/?kw=saas&r=2'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
}
resp = requests.get(url=url, headers=headers)

# Parse the returned HTML and select one div per service card
html = etree.HTML(resp.text)
datas = html.xpath('//*[@id="__layout"]/div/div[3]/div/div[3]/div[4]/div[1]/div')
info_list = []
for data in datas:
    # The path shown in the browser differs from the path in the returned HTML,
    # so each field is located by class, relative to the current card
    price = data.xpath('.//div[@class="price"]/span/text()')[0]  # price
    title = data.xpath('.//div[@class="name-pic-box"]/a/text()')[0]  # title
    score = data.xpath('.//div[@class="fraction"]/span[1]/text()')[0]  # rating
    sale = data.xpath('.//div[@class="sales"]//span[@class="num"]/text()')[0]  # sales volume
    good = data.xpath('.//div[@class="evaluate"]//span[@class="num"]/text()')[0]  # positive reviews
    com_name = data.xpath('.//div[@class="shop-info text-overflow-line"]/text()')[0]  # company name
    # The keys below become the column headers of the CSV file
    info = {
        '价格': price,
        '标题': title,
        '评分': score,
        '销量': sale,
        '好评': good,
        '公司名': com_name
    }
    info_list.append(info)

# Write all rows to a CSV file (the ../data directory must already exist)
pd.DataFrame(info_list).to_csv('../data/猪八戒.csv')
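Each [0] in the loop raises IndexError when a card lacks one of the fields, which stops the whole run. A minimal sketch of a guard, assuming a hypothetical helper named first_or_default (it is not part of lxml or of the original post) that takes the list returned by xpath() and falls back to a default value:

```python
def first_or_default(values, default=''):
    """Return the first item of an xpath() result list, stripped, or a default.

    `values` is any list of strings, such as the result of a text() query.
    """
    if not values:
        return default
    return str(values[0]).strip()


# Plain lists stand in for xpath() results here (made-up values)
print(first_or_default([' 4.9 ']))
print(first_or_default([], default='N/A'))
```

Inside the loop, a call such as first_or_default(data.xpath('.//div[@class="price"]/span/text()')) would then leave an empty cell for a missing field instead of aborting the scrape.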

3. Viewing the results

Open the generated file, ../data/猪八戒.csv, to check the scraped data.
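The file can also be checked from Python rather than in a spreadsheet program. The following is a minimal sketch using only the standard library csv module. The helper name preview_csv and the sample row are made up so that the snippet runs on its own; in practice the path would point at the CSV written by the scraper. Reading with utf-8-sig is an assumption that tolerates a byte-order mark and also reads plain UTF-8.

```python
import csv
import os
import tempfile


def preview_csv(path, limit=5):
    """Return up to `limit` rows of a CSV file as dictionaries."""
    with open(path, newline='', encoding='utf-8-sig') as f:
        reader = csv.DictReader(f)
        return [row for _, row in zip(range(limit), reader)]


# Build a stand-in file with made-up values (not real scraped data)
sample_path = os.path.join(tempfile.mkdtemp(), 'sample.csv')
with open(sample_path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['价格', '标题'])
    writer.writeheader()
    writer.writerow({'价格': '100', '标题': 'demo'})

rows = preview_csv(sample_path)
print(rows)
```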