2018-10-18
Common Exceptions in Python Web Crawlers

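When writing a Python crawler, you will often hit exceptions that abort the run and terminate the crawler unexpectedly. Ideally, a crawler should keep running when it encounters them. Below are several common exceptions and ways to handle them: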

Exception 1: requests.exceptions.ProxyError

Stack Overflow gives the following explanation for this error:

The ProxyError exception is not actually the requests.exceptions exception; it [is] an exception with the same name from the embedded urllib3 library, and it is wrapped in a MaxRetryError exception.

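In other words, the error described there is not actually the one defined in requests.exceptions; it is an exception of the same name from the urllib3 library bundled with Requests, and it is wrapped in a MaxRetryError. One addition: this exception usually appears when the proxy server cannot be reached.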
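Since an unreachable proxy is the usual cause, one way to keep the crawler alive is to fall back to another proxy. The sketch below assumes a reasonably recent Requests release, where the urllib3 error is re-raised as requests.exceptions.ProxyError (a subclass of ConnectionError). The helper name get_via_proxies and the proxy addresses are hypothetical and are not part of the original post.

```python
import requests


def get_via_proxies(url, proxy_urls, timeout=10):
    """Try each proxy in turn; return the first response, or None if all fail."""
    for proxy in proxy_urls:
        proxies = {"http": proxy, "https": proxy}
        try:
            return requests.get(url, proxies=proxies, timeout=timeout)
        except requests.exceptions.ProxyError as e:
            # This proxy is unreachable or refused the connection:
            # log it and move on to the next candidate.
            print(f"proxy {proxy} failed: {e}")
    return None


# Hypothetical usage (placeholder addresses):
# resp = get_via_proxies("http://example.com/", ["http://10.0.0.1:8080", "http://10.0.0.2:8080"])
```

Returning None instead of raising lets the caller skip the URL and continue with the rest of the crawl queue.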

Exception 2: requests.exceptions.ConnectionError

Stack Overflow gives the following explanation for this error:

In the event of a network problem (e.g. DNS failure, refused connection, etc), Requests will raise a ConnectionError exception.

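In other words, this exception signals a network problem (such as a DNS failure or a refused connection), and it is one of the exceptions defined by the Requests library itself.

One solution is to catch the base-class exception, which covers every exception that Requests itself raises: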

    import sys
    import requests

    try:
        r = requests.get(url, params={'s': thing})  # url and thing are defined by the caller
    except requests.exceptions.RequestException as e:  # "as e" is the correct syntax
        print(e)
        sys.exit(1)
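The reason a single except clause is enough can be checked directly: the exceptions discussed in this post all derive from RequestException. The short check below is only an illustration; the tuple name classes is introduced here and is not from the original post.

```python
import requests

exc = requests.exceptions

# The exception types discussed in this post.
classes = (
    exc.ConnectionError,
    exc.ProxyError,
    exc.Timeout,
    exc.TooManyRedirects,
    exc.ChunkedEncodingError,
)

for cls in classes:
    # Prints the class name and True for each entry.
    print(cls.__name__, issubclass(cls, exc.RequestException))
```

Because the more specific classes are subclasses, any except clauses for them must come before the one for RequestException; otherwise the base-class clause catches everything first.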

Another solution is to handle the exceptions separately. Three are handled here:

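    import sys
    import requests

    try:
        r = requests.get(url, params={'s': thing})
    except requests.exceptions.Timeout:
        pass  # e.g. retry the request
    except requests.exceptions.TooManyRedirects:
        pass  # e.g. report a bad URL and try a different one
    except requests.exceptions.RequestException as e:
        # Anything else is treated as fatal
        print(e)
        sys.exit(1)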
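Calling sys.exit(1) stops the whole crawler, which is the opposite of the goal stated at the start of this post. A minimal sketch of a fetch helper that retries transient failures and otherwise skips the URL might look like the following. The function name fetch, the retry count, and the linear backoff are assumptions made for illustration; treating HTTP error statuses as non-retryable is also a simplification.

```python
import time

import requests


def fetch(url, retries=3, backoff=1.0, timeout=10):
    """Return the response, or None if the URL could not be fetched."""
    for attempt in range(retries):
        try:
            resp = requests.get(url, timeout=timeout)
            resp.raise_for_status()  # turn 4xx/5xx statuses into HTTPError
            return resp
        except (requests.exceptions.Timeout, requests.exceptions.ConnectionError):
            # Possibly transient network trouble: wait a little longer
            # after each failed attempt, then try again.
            time.sleep(backoff * (attempt + 1))
        except requests.exceptions.RequestException as e:
            # Too many redirects, invalid URL, HTTP error status, ...:
            # treated here as non-retryable, so skip this URL.
            print(f"giving up on {url}: {e}")
            return None
    return None
```

A crawl loop can then call fetch(url) for each URL and simply continue when the result is None.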

Exception 3: requests.exceptions.ChunkedEncodingError

Stack Overflow gives the following explanation for this error:

The link you included in your question is simply a wrapper that executes urllib’s read() function, which catches any incomplete read exceptions for you. If you don’t want to implement this entire patch, you could always just throw in a try/catch loop where you read your links.

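In other words, the link given in the question is just a wrapper around urllib's read() function that catches incomplete-read exceptions for you. If you do not want to apply the whole patch, you can wrap the read of each link in a try/except block: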

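    import http.client
    import urllib.request

    try:
        page = urllib.request.urlopen(urls).read()
    except http.client.IncompleteRead as e:
        # Keep the bytes that arrived before the connection was cut
        page = e.partial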
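The snippet above does not go through Requests. When the page is fetched with Requests, a truncated chunked response surfaces as requests.exceptions.ChunkedEncodingError, and the exception carries no partial attribute. One possible way to keep the data received so far is to stream the body and collect the chunks yourself. The helper name read_body and the chunk size below are assumptions made for illustration.

```python
import requests


def read_body(url, timeout=10):
    """Return (data, complete); data holds whatever arrived before a truncated read."""
    chunks = []
    try:
        # stream=True defers the download so the body can be read chunk by chunk.
        with requests.get(url, stream=True, timeout=timeout) as resp:
            for chunk in resp.iter_content(chunk_size=8192):
                chunks.append(chunk)
    except requests.exceptions.ChunkedEncodingError:
        # The connection was cut mid-body: return the partial data and flag it.
        return b"".join(chunks), False
    return b"".join(chunks), True
```

The complete flag lets the caller decide whether a partial page is still worth parsing or should be queued for another attempt.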
