
[Notes] Crawler IP Proxy Pool

source link: https://loli.fj.cn/2023/05/24/%E7%88%AC%E8%99%ABIP%E4%BB%A3%E7%90%86%E6%B1%A0/

Build an IP proxy pool for a crawler with the jhao104/proxy_pool project, so the crawler can fetch proxy IPs from the pool.

Requirements:

  • Redis

Download the source code:

git clone https://github.com/jhao104/proxy_pool.git
cd proxy_pool

Docker deployment


docker run --env DB_CONN=redis://127.0.0.1:6379/1 -p 5010:5010 jhao104/proxy_pool:latest
Source deployment

pip3 install -r requirements.txt

HOST: the address the service binds to
PORT: the port the service listens on
DB_CONN: the database connection string

redis://:password@host:port/db — connect over the redis protocol; the password can be omitted if Redis has no password set

setting.py

HOST = "0.0.0.0"
PORT = 5010
DB_CONN = 'redis://127.0.0.1:6379/1'
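If the Redis instance does require a password, it goes into the connection string as described above. A hypothetical example (the password shown is only a placeholder):

DB_CONN = 'redis://:mypassword@127.0.0.1:6379/1'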

Start the scheduler

  • Automatically fetches free proxy IPs from the web and stores them in Redis
python3 proxyPool.py schedule

Start the web API service

  • Provides an HTTP interface for working with the proxy IPs stored in Redis
python3 proxyPool.py server
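Once the server is running, the pool can be queried over HTTP. A minimal check from Python, assuming the default HOST/PORT from setting.py; the /get/ and /delete/ endpoints also appear in the usage example below, while /count/ is assumed from the project's API and may differ between versions:

import requests

# Fetch one random proxy from the pool
print(requests.get("http://127.0.0.1:5010/get/").json())

# Number of proxies currently stored (assumed endpoint)
print(requests.get("http://127.0.0.1:5010/count/").json())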

Using a proxy IP

<url>: the URL to request

import requests


def get_proxy():
    # Ask the local proxy_pool web API for a random proxy
    return requests.get("http://127.0.0.1:5010/get/").json()


def delete_proxy(proxy):
    # Remove a proxy from the pool, e.g. after it has failed
    requests.get("http://127.0.0.1:5010/delete/?proxy={}".format(proxy))


proxy = get_proxy().get("proxy")
response = requests.get(<url>, proxies={"http": "http://{}".format(proxy)})
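A fetched proxy may already be dead, so in practice it is common to retry with a fresh proxy and delete the ones that fail. A minimal sketch built on the get_proxy and delete_proxy helpers above; the retry count and timeout are arbitrary choices and the target URL is a placeholder:

import requests


def fetch_with_proxy(url, retries=5):
    # Try up to `retries` different proxies, dropping each one that fails
    for _ in range(retries):
        proxy = get_proxy().get("proxy")
        if not proxy:
            break  # pool is empty
        try:
            return requests.get(url,
                                proxies={"http": "http://{}".format(proxy)},
                                timeout=10)
        except requests.exceptions.RequestException:
            delete_proxy(proxy)  # discard the dead proxy and try another
    return None


response = fetch_with_proxy("http://example.com")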


