
Speed comparison: Golang goroutines vs. Python coroutines

 4 years ago
source link: https://www.tuicool.com/articles/Az6jEfQ

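This experiment fetches 50 listing pages of poems, then fetches the HTML page behind each a tag on every listing page (40 a tags per page) and does some simple parsing of that HTML. In total that is 50 + 50*40 = 2050 page requests, each followed by parsing the returned HTML.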
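
The request count can be checked with a few lines of Python. This is only a sketch of the arithmetic; the helper name total_requests is made up for illustration, and the figure of 40 links per listing page is the article's own number, not something verified here.

```python
def total_requests(pages, links_per_page):
    # One request per listing page, plus one request per link found on it.
    return pages + pages * links_per_page


if __name__ == '__main__':
    print(total_requests(50, 40))  # 50 + 50*40
```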

1. Python speed

Total time: 31.947 s. Repeated runs came in at around 32 s.


Source code:

from bs4 import BeautifulSoup
import time
import aiohttp
import asyncio

async def fetch(url):
    # Download one page and return its body as bytes.
    # A new session is opened per request, as in the original test.
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            if resp.status != 200:
                raise Exception('http error, url:{} code:{}'.format(url, resp.status))
            return await resp.read()


async def do_task(domain, pageUrl):
    # Fetch one listing page, then fetch and parse each linked poem page in turn.
    html = await fetch(pageUrl)
    soup = BeautifulSoup(html, 'html.parser')
    for h in soup.select('h3>a'):
        url = ''.join([domain, h.get('href')])
        html = await fetch(url)
        print('url:{} title:{}'.format(url, parse_text(html)))


def parse_text(html):
    soup = BeautifulSoup(html, 'html.parser')
    return str(soup.select('.shici-title')[0].get_text())


async def main():
    domain = 'http://www.shicimingju.com'
    urlTemplate = domain + '/chaxun/zuozhe/9_{0}.html'
    pageNum = 50  # fetch 50 listing pages for the test
    # One coroutine per listing page (pages 1..50), all scheduled concurrently.
    tasks = [do_task(domain, urlTemplate.format(num)) for num in range(1, pageNum + 1)]
    await asyncio.gather(*tasks)


if __name__ == '__main__':
    start = time.time()
    asyncio.run(main())  # start the event loop and run until every task is done
    print('Total time: %.3f s' % (time.time() - start))
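
To make the concurrency shape of this script easier to see, here is a minimal simulation that uses only the standard library. It is a sketch under stated assumptions, not a reproduction of the benchmark: fake_fetch, crawl_page, crawl_all, run_demo and the 10 ms LATENCY are all hypothetical stand-ins, and no network traffic is involved. What it mirrors from the listing above is that every listing page gets its own coroutine, while the linked pages inside one coroutine are awaited one after another.

```python
import asyncio
import time

LATENCY = 0.01  # assumed per-request latency in seconds (hypothetical)


async def fake_fetch(url):
    # Stand-in for an HTTP request: it only waits, nothing is downloaded.
    await asyncio.sleep(LATENCY)
    return url


async def crawl_page(page, links_per_page):
    # One listing request, then the linked pages one after another,
    # mirroring the sequential inner loop of do_task.
    await fake_fetch('list-{}'.format(page))
    for i in range(links_per_page):
        await fake_fetch('poem-{}-{}'.format(page, i))
    return 1 + links_per_page


async def crawl_all(pages, links_per_page):
    # All listing pages run concurrently, one coroutine per page.
    counts = await asyncio.gather(
        *(crawl_page(p, links_per_page) for p in range(1, pages + 1)))
    return sum(counts)


def run_demo(pages=50, links_per_page=40):
    # Returns (number of simulated requests, elapsed wall-clock seconds).
    start = time.perf_counter()
    total = asyncio.run(crawl_all(pages, links_per_page))
    return total, time.perf_counter() - start


if __name__ == '__main__':
    total, elapsed = run_demo()
    print('requests: {}  elapsed: {:.2f}s  fully serial: about {:.2f}s'.format(
        total, elapsed, total * LATENCY))
```

In this model the wall-clock time is governed by the chain of 41 sequential requests inside one coroutine rather than by all 2050 requests, which suggests that the measured times depend heavily on per-request latency and not only on the language runtime.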

2. Golang speed

Total time: 15.366 s. Repeated runs were mostly around 15 s; the fastest few got down to 12-13 s, and the worst was 22 s.


Source code:

package main

import (
    "fmt"
    "github.com/PuerkitoBio/goquery"
    "strconv"
    "strings"
    "sync"
    "time"
)

// do_task fetches one listing page, then fetches every poem page it links to
// and prints the poem title. Requests inside one call run sequentially.
func do_task(url string, domain string) {
    defer wg.Done() // deferred so the WaitGroup is released on every return path
    p, err := goquery.NewDocument(url)
    if err != nil {
        panic(err)
    }
    p.Find("h3").Find("a").Each(func(i int, selection *goquery.Selection) {
        href, _ := selection.Attr("href")
        link := domain + href
        h, err := goquery.NewDocument(link)
        if err != nil {
            panic(err)
        }
        title := h.Find(".shici-title").Text()
        fmt.Printf("url:%s title:%s \n", link, title)
    })
}

var wg sync.WaitGroup

func main() {
    start := time.Now()
    domain := "http://www.shicimingju.com"
    urlTemplate := domain + "/chaxun/zuozhe/9_{:num}.html"
    pageNum := 50 // fetch 50 listing pages for the test
    wg.Add(pageNum)
    for page := 1; page <= pageNum; page++ {
        url := strings.Replace(urlTemplate, "{:num}", strconv.Itoa(page), -1)
        go do_task(url, domain) // one goroutine per listing page
    }
    wg.Wait()
    fmt.Printf("Total time: %.3f s \n", time.Since(start).Seconds())
}

3. Conclusion

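In this test Golang finished in about 15 s against about 32 s for Python, roughly twice as fast. Golang's performance is comfortably ahead of Python and PHP (PHP was not measured here); it is, after all, a language built for concurrency.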
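
For a rough sense of scale, the two reported timings can be turned into requests per second and a speedup ratio. This is a small sketch only: summarize is a hypothetical helper, the inputs are the single-run figures quoted above (31.947 s and 15.366 s) together with the article's count of 2050 requests, and run-to-run variation is ignored.

```python
def summarize(requests, python_seconds, go_seconds):
    # Requests per second for each run, and the ratio of the two wall-clock times.
    return {
        'python_rps': requests / python_seconds,
        'go_rps': requests / go_seconds,
        'speedup': python_seconds / go_seconds,
    }


if __name__ == '__main__':
    stats = summarize(2050, 31.947, 15.366)
    for name, value in stats.items():
        print('{}: {:.2f}'.format(name, value))
```

With these inputs the ratio comes out at a little over 2, which matches the "roughly twice as fast" reading of the two totals.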

