One Trick to Dramatically Speed Up requests

Author: kingname

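I built an HTTP endpoint that filters spam. Now I have ten million messages that need to be run through this endpoint for spam detection.

At first, my code looked like this: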

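import requests

# `url` is the address of the spam-filter endpoint; the original does not give it.
messages = ['first message', 'second message', 'third message']
for message in messages:
    resp = requests.post(url, json={'msg': message}).json()
    if resp['trash']:
        print('This is a spam message')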

Let's write some code to measure how fast this runs:
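The listing that originally followed seems to be missing from this copy. As a rough sketch only, assuming it simply timed 100 plain requests.get calls to https://baidu.com (the URL used in the Session version later in the article), it might have looked like the following. The helper name time_fetches is hypothetical, and the real network call is left commented out so the snippet also runs offline.

```python
import time

import requests


def time_fetches(fetch, url, count=100):
    """Call fetch(url) `count` times and return the elapsed seconds."""
    start = time.time()
    for _ in range(count):
        fetch(url)
    end = time.time()
    return end - start


# Plain requests.get sets up a new connection for every call.
# Uncomment to run against the real site (needs network access):
# print(f'Fetched the page 100 times, elapsed: {time_fetches(requests.get, "https://baidu.com")}')
```

Passing the fetch function in as a parameter makes it easy to time requests.get and a Session's get method with the same loop.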

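Fetching Baidu one hundred times takes a full 20 seconds. With ten million messages to process, that is far too slow. Is there a way to speed this up? Besides the approaches covered in our earlier articles (multithreading, aiohttp, or simply switching to Scrapy), you can also have requests keep the connection open, which saves the time otherwise spent repeating the TCP three-way handshake for every request. So how do you make requests keep the connection open? It is actually very simple: use a Session object. The modified code: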

import requests
import time

start = time.time()
session = requests.Session()  # one Session, so the connection is reused
for _ in range(100):
    resp = session.get('https://baidu.com').content.decode()

end = time.time()
print(f'Fetched the page 100 times, elapsed: {end - start}')

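The result is shown in the figure below:

Performance improved significantly: fetching the page 100 times now takes only 5 seconds. The official documentation [1] for requests also states that a Session object keeps the connection alive: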

“The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3’s connection pooling. So if you’re making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).”

“Excellent news — thanks to urllib3, keep-alive is 100% automatic within a session! Any requests that you make within a session will automatically reuse the appropriate connection!”
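To see the connection reuse directly, rather than inferring it from timings, one option is to count TCP connections on the server side. The sketch below is not the author's code: it assumes a local stand-in for the spam-filter endpoint built on Python's http.server, with a toy rule that flags any message containing "spam". The names StubSpamHandler, connection_count, and the /check path are all made up for illustration. It sends the same messages once with plain requests.post and once through a Session.

```python
import json
import os
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests

# Make sure a system-wide proxy is not used for the local stub server.
os.environ['no_proxy'] = '127.0.0.1'

connection_count = 0  # TCP connections accepted by the stub server
count_lock = threading.Lock()


class StubSpamHandler(BaseHTTPRequestHandler):
    protocol_version = 'HTTP/1.1'  # lets the server keep connections alive

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request.
        global connection_count
        with count_lock:
            connection_count += 1
        super().setup()

    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        msg = json.loads(self.rfile.read(length))['msg']
        # Toy rule for this demo only: flag messages containing "spam".
        body = json.dumps({'trash': 'spam' in msg}).encode()
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = ThreadingHTTPServer(('127.0.0.1', 0), StubSpamHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/check'
messages = ['first message', 'buy spam now', 'third message']

# Without a Session: every call sets up its own connection.
for message in messages:
    requests.post(url, json={'msg': message}).json()
connections_without_session = connection_count

# With a Session: the pooled connection is reused across calls.
spam = []
with requests.Session() as session:
    for message in messages:
        if session.post(url, json={'msg': message}).json()['trash']:
            spam.append(message)
connections_with_session = connection_count - connections_without_session

server.shutdown()
server.server_close()
print(connections_without_session, connections_with_session, spam)
```

Against this stub, the plain calls should open one connection per message, while the Session should get by with a single connection. A real endpoint may still close idle connections on its own schedule, so the saving in practice depends on the server's keep-alive settings.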
