[Notes] Crawler IP Proxy Pool
source link: https://loli.fj.cn/2023/05/24/%E7%88%AC%E8%99%ABIP%E4%BB%A3%E7%90%86%E6%B1%A0/
Use the jhao104/proxy_pool project so that a crawler can obtain proxy IPs from an IP proxy pool.
- Redis
git clone https://github.com/jhao104/proxy_pool.git
cd proxy_pool
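Docker deployment
redis://:<password>@<IP address>:<port>/<database>: connects over the redis protocol. A password can be specified; if Redis has no password, the :<password>@ part can be omitted. Inside a container, 127.0.0.1 refers to the container itself, so use an address for Redis that the container can actually reach.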
docker run --env DB_CONN=redis://127.0.0.1:6379/1 -p 5010:5010 jhao104/proxy_pool:latest
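The DB_CONN format can be sketched in Python as follows. The helper names build_db_conn and parse_db_conn are hypothetical and are not part of proxy_pool; they only illustrate how the pieces of the connection string fit together. The sketch assumes the password contains no characters such as @ or / that would need percent-encoding.

```python
from typing import Optional
from urllib.parse import urlsplit


def build_db_conn(host: str, port: int, db: int, password: Optional[str] = None) -> str:
    """Assemble a redis:// connection string (hypothetical helper)."""
    # The ":password@" part is left out entirely when no password is set.
    auth = ":{}@".format(password) if password else ""
    return "redis://{}{}:{}/{}".format(auth, host, port, db)


def parse_db_conn(db_conn: str) -> dict:
    """Split a redis:// connection string into its parts (illustrative only)."""
    parts = urlsplit(db_conn)
    return {
        "password": parts.password,  # None when no password is given
        "host": parts.hostname,
        "port": parts.port,
        "db": parts.path.lstrip("/"),
    }


print(build_db_conn("127.0.0.1", 6379, 1))
print(parse_db_conn("redis://:secret@127.0.0.1:6379/1"))
```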
pip3 install -r requirements.txt
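HOST: the address on which the service is served
PORT: the port the service listens on
DB_CONN: the database connection string
redis://:<password>@<IP address>:<port>/<database>: connects over the redis protocol. A password can be specified; if Redis has no password, the :<password>@ part can be omitted.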
setting.py
HOST = "0.0.0.0"
PORT = 5010
DB_CONN = 'redis://127.0.0.1:6379/1'
Start the scheduler
- Automatically collects free IP proxies from the web and stores them in Redis
python3 proxyPool.py schedule
Start the web API service
- Exposes the IP proxies stored in Redis through an HTTP service
python3 proxyPool.py server
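Once the server is running, the API is reached over plain HTTP. A minimal sketch for building request URLs for the /get/ and /delete/ endpoints used in this note follows. The helper name api_url is hypothetical, and the default host and port are assumptions taken from the PORT value in setting.py and a client running on the same machine.

```python
from urllib.parse import urlencode


def api_url(path: str, host: str = "127.0.0.1", port: int = 5010, **params: str) -> str:
    """Build a URL for the proxy pool web API (hypothetical helper)."""
    url = "http://{}:{}/{}/".format(host, port, path.strip("/"))
    if params:
        # urlencode escapes characters such as ":" in the proxy address
        url += "?" + urlencode(params)
    return url


print(api_url("get"))
print(api_url("delete", proxy="1.2.3.4:8080"))
```

Encoding the query string this way avoids problems if a parameter value ever contains characters that are not safe in a URL.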
Using a proxy
<url>: the URL to request
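import requests

def get_proxy():
    # Ask the proxy pool's web API for one proxy
    return requests.get("http://127.0.0.1:5010/get/").json()

def delete_proxy(proxy):
    # Remove a proxy from the pool, for example after it fails
    requests.get("http://127.0.0.1:5010/delete/?proxy={}".format(proxy))

proxy = get_proxy().get("proxy")
# Only http:// requests are routed through the proxy with this mapping
response = requests.get(<url>, proxies={"http": "http://{}".format(proxy)})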
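Free proxies fail often, so a common pattern is to retry a few times and delete a proxy from the pool once it keeps failing. The sketch below shows one way to do that. The function fetch_with_retry and its parameters are hypothetical; the request, lookup and delete steps are passed in as callables so the logic does not depend on any particular HTTP library.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[str], T],
    get_proxy: Callable[[], Optional[str]],
    delete_proxy: Callable[[str], None],
    proxies_to_try: int = 3,
    attempts_per_proxy: int = 2,
) -> Optional[T]:
    """Try several proxies; drop each one from the pool once it keeps failing.

    Hypothetical helper: fetch(proxy) performs the request through the proxy
    and raises an exception on failure.
    """
    for _proxy_round in range(proxies_to_try):
        proxy = get_proxy()
        if not proxy:
            return None  # the pool returned nothing usable
        for _attempt in range(attempts_per_proxy):
            try:
                return fetch(proxy)
            except Exception:
                continue  # retry the same proxy
        delete_proxy(proxy)  # every attempt failed, so remove it from the pool
    return None
```

With requests, fetch could be a small function that calls requests.get with the proxies mapping and a timeout, get_proxy could return the "proxy" field of the /get/ response, and delete_proxy could call the /delete/ endpoint.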