
Problems encountered with the Python remote-control module paramiko, and how I solved them

source link: https://zhang.ge/5122.html
Jager · June 16, 2017 · python · shell · shell scripts · 663 views

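Lately I have been building an automated operations and release platform whose command-line and file-transfer channels are built mainly on paramiko. I ran into all sorts of problems along the way, so this post collects them and their solutions for future reference.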

I. "Error reading SSH protocol banner" connection error

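Search Baidu or Google for this message and you will find plenty of questions and a few proposed fixes, none of which worked for me. After repeated experiments and reading the paramiko source, I finally fixed it, so I am recording and sharing the solution here.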

1. The exact error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.linux-x86_64/egg/paramiko/client.py", line 307, in connect
  File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 465, in start_client
paramiko.SSHException: Error reading SSH protocol banner

2. Solution:

Download the paramiko source again, extract it, and edit transport.py in the build directory:

vim build/lib/paramiko/transport.py

Search for self.banner_timeout and increase its value, for example to 300 seconds:

self.banner_timeout = 300

Finally, reinstall paramiko.
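Newer paramiko releases expose the same setting without patching the library: SSHClient.connect() accepts a banner_timeout argument. This article worked from the 1.9 source tree, whose connect() may not have it, so check the signature of your installed version first. A minimal sketch under that assumption; connect_with_banner_timeout is a hypothetical helper name, not part of paramiko:

```python
def connect_with_banner_timeout(client, hostname, username, password,
                                port=22, banner_timeout=300):
    """Connect `client` (assumed to be a paramiko.SSHClient), giving a slow
    server up to `banner_timeout` seconds to send its "SSH-..." banner line.

    Assumes a paramiko release whose SSHClient.connect() accepts
    banner_timeout; older releases need the transport.py patch instead.
    """
    client.connect(
        hostname=hostname,
        port=port,
        username=username,
        password=password,
        timeout=300,                    # TCP connect timeout
        banner_timeout=banner_timeout,  # replaces editing self.banner_timeout
        allow_agent=False,
        look_for_keys=False,
    )
    return client

# Usage (requires paramiko and a reachable host; values are placeholders):
#   import paramiko
#   ssh = paramiko.SSHClient()
#   ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
#   connect_with_banner_timeout(ssh, "192.168.1.10", "root", "123456")
```

Passing the value per connection also keeps the longer wait limited to the slow hosts that need it, instead of changing the default for every Transport.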

3. The long and winding way I got there (skip it if you like):

On Google I found a related question from someone else. It was about pysftp, but pysftp is also built on paramiko:

https://stackoverflow.com/questions/34288526/pysft-paramiko-grequests-error-reading-ssh-protocol-banner/44493465#44493465

He eventually posted his own solution:

UPDATE:

It seems the problem is caused by importing the package grequests. If I do not import grequests, pysftp works as expected. The issue was raised before but has not been solved

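In other words, the error is triggered by importing grequests, and without that import pysftp works. I tried this, but it made no difference in my production environment, probably because the error there had a different cause.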

However, his description of the problem pointed me to the actual fix. He wrote:

I have already tried changing the banner timeout from 15 seconds to 60 secs in the transport.py, but it did not solve the problem.

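Seeing a timeout and transport.py mentioned, I remembered that the production machines reporting Error reading SSH protocol banner were all very sluggish, and that the time from starting the paramiko connection to the error was roughly the same every time.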

So I searched the system and found the transport.py file:

/usr/lib/python2.7/site-packages/paramiko/transport.py

Searching it for banner, I found a timeout setting that matched the delay I had observed!

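So I changed it to 300 seconds and retested, with no effect: it still timed out after 15 seconds. I then added breakpoints and even moved the file away, and the problem persisted. Clearly this file was not being used at all...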

Going back to the original error message, the path it showed was:

build/bdist.linux-x86_64/egg/paramiko/transport.py

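That path does not exist on the system. Then it dawned on me: the module had been built and installed as an egg, so the traceback shows the build-time path, and the copy I had edited was not the one being imported. The change had to go into the source tree, followed by a reinstall.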
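A quicker way to see which copy of a module the interpreter really loads is to ask the module itself. A minimal sketch; imported_from is a hypothetical helper, and it must be run with the same interpreter as the failing program:

```python
import importlib


def imported_from(module_name):
    """Return the file the running interpreter actually loads for `module_name`.

    For a module inside a zipped egg this is typically a path into the .egg
    archive, which is why editing a same-named file elsewhere has no effect.
    Returns None for modules without a __file__ (e.g. built-ins).
    """
    module = importlib.import_module(module_name)
    return getattr(module, "__file__", None)

# Usage: print(imported_from("paramiko.transport"))
```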

So I went to the paramiko source directory, searched, and found two transport.py files:

[root@localhost:/data/software/paramiko-1.9]# find . -name transport.py
./paramiko/transport.py
./build/lib/paramiko/transport.py

I changed self.banner_timeout to 300 in the file, reinstalled paramiko, and the test passed on the first try!

Afterwards I posted an answer in that Stack Overflow thread (please excuse the clumsy English) as a small way of giving back!

II. paramiko "blocking" when a remote script starts a background job

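After my remote command channel went live, I found that when a remote script starts another script in the background, the channel waits for the background script to finish before returning, and sometimes even hangs.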

1. Reproducing it:

① Write the test scripts

Script 1: test.sh

#!/bin/bash
sleep 30
echo test end
exit 0

Script 2: run.sh

#!/bin/bash
bash /tmp/test.sh &
echo run ok!
exit 0

Script 3: test.py

import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(hostname='192.168.1.10', port=22, username='root', password='123456',
               timeout=300, allow_agent=False, look_for_keys=False)
stdin, stdout, stderr = client.exec_command("bash /tmp/run.sh")
result_info = ""
# readlines() returns only once the remote stdout stream reaches EOF
for line in stdout.readlines():
    result_info += line
print(result_info)
client.close()

Copy test.sh and run.sh to the remote server, for example to 192.168.1.10:/tmp/.

② Run it remotely

Run python test.py locally and you will see that it does not print run ok! right away; only after 30 seconds does it print everything, including test.sh's output.

2. Solution

Redirect the remote script's standard output (stdout) to standard error (stderr), and read stderr instead. Change test.py as follows:

import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(hostname='192.168.1.10', port=22, username='root', password='123456',
               timeout=300, allow_agent=False, look_for_keys=False)
# 1>&2 sends run.sh's stdout (and that of anything it starts) to stderr
stdin, stdout, stderr = client.exec_command("bash /tmp/run.sh 1>&2")
result_info = ""
for line in stderr.readlines():
    result_info += line
print(result_info)
client.close()

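Run it now and the result comes back immediately. It is tempting to blame buffering (stdout is line-buffered, stderr is unbuffered), but that does not fit what we saw: echo run ok! already ends with a newline, and it is the whole readlines() call that waits the full 30 seconds, not a partial line. A more likely cause is file-descriptor inheritance. The backgrounded test.sh inherits run.sh's stdout, so the remote stdout pipe stays open, and stdout.readlines() sees no EOF, until test.sh exits. With 1>&2, both scripts write to stderr instead, the stdout pipe closes as soon as run.sh exits, and, judging by the result, paramiko then ends the stream so that stderr.readlines() returns what run.sh printed right away.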
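One way to see what readlines() is waiting for is to reproduce the effect locally, without SSH or paramiko: a reader that waits for EOF on a pipe does not get it until every process holding the pipe's write end has exited, and a backgrounded child inherits that write end. A minimal sketch using subprocess and a POSIX sh; time_until_eof is a hypothetical helper, and sleep 1 stands in for test.sh's sleep 30:

```python
import subprocess
import time


def time_until_eof(command):
    """Run `command` under sh, read its stdout until EOF, and return
    (decoded output, seconds elapsed)."""
    start = time.monotonic()
    proc = subprocess.Popen(["sh", "-c", command], stdout=subprocess.PIPE)
    # communicate() keeps reading until *every* process holding the pipe's
    # write end has closed it, not merely until `sh` itself exits.
    out, _ = proc.communicate()
    return out.decode(), time.monotonic() - start

# Background job inherits the pipe: EOF arrives only after `sleep 1` exits.
#   time_until_eof("sleep 1 & echo run ok!")                 -> ('run ok!\n', ~1.0)
# Background job's output redirected: EOF arrives as soon as `sh` exits.
#   time_until_eof("sleep 1 >/dev/null 2>&1 & echo run ok!") -> ('run ok!\n', ~0.0)
```

The same idea should apply remotely: redirecting the background job's own output inside run.sh, e.g. bash /tmp/test.sh >/dev/null 2>&1 &, lets stdout.readlines() return as soon as run.sh exits, at the cost of discarding test.sh's output.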

III. Fixing the "This operation would block forever" error

While scaling out a paramiko-based automation apiserver, I found that remote commands and file transfers in the new environment threw the following error:

2017-08-04 12:38:31,243 [ERROR] Exception: Error reading SSH protocol banner('This operation would block forever', <Hub at 0x38b02d0 epoll pending=0 ref=0 fileno=28>)
2017-08-04 12:38:31,244 [ERROR] Traceback (most recent call last):
2017-08-04 12:38:31,244 [ERROR] File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1555, in run
2017-08-04 12:38:31,245 [ERROR] self._check_banner()
2017-08-04 12:38:31,245 [ERROR] File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1681, in _check_banner
2017-08-04 12:38:31,245 [ERROR] raise SSHException('Error reading SSH protocol banner' + str(x))
2017-08-04 12:38:31,245 [ERROR] SSHException: Error reading SSH protocol banner('This operation would block forever', <Hub at 0x38b02d0 epoll pending=0 ref=0 fileno=28>)
2017-08-04 12:38:31,245 [ERROR]
2017-08-04 12:38:31,247 [INFO] Error reading SSH protocol banner('This operation would block forever', <Hub at 0x38b02d0 epoll pending=0 ref=0 fileno=28>)

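I assumed something was wrong with the Python component installation and checked it over and over, only to find in the end that an extra installed package was to blame!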

Solution:

Remove the installed greenlet package; the reason is explained below:

rm -r /usr/local/python2.7.5/lib/python2.7/site-packages/greenlet*
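The <Hub ...> in the error message comes from gevent, which suggests paramiko's socket calls were running through gevent. Before deleting packages, it may help to confirm that at startup. A minimal sketch, assuming gevent 1.1 or later for gevent.monkey.is_module_patched(); both helper names are hypothetical:

```python
import socket
import sys


def socket_is_gevent_patched():
    """Return True if gevent.monkey has replaced the socket module in this process."""
    monkey = sys.modules.get("gevent.monkey")
    if monkey is None:
        # gevent.monkey was never imported, so it cannot have patched socket.
        return False
    if hasattr(monkey, "is_module_patched"):  # available in gevent >= 1.1
        return monkey.is_module_patched("socket")
    # Older gevent (assumption): check where socket.socket now comes from.
    return socket.socket.__module__.startswith("gevent")


def green_modules_loaded():
    """List already-imported gevent/greenlet modules, to help spot who pulled them in."""
    return sorted(name for name in sys.modules
                  if name.split(".")[0] in ("gevent", "greenlet"))

# Usage: call both after importing your application's modules and before
# opening any paramiko connection; a True result means paramiko's sockets
# will go through gevent's hub.
```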

Below is the "arduous" road to the fix; skip it if you are not interested:

1. Seeing the error, I lazily searched for the keyword 【This operation would block forever', <Hub】 first, but found no solution.

2. Going by experience, I first located the _check_banner function named in the traceback:

def _check_banner(self):
    # this is slow, but we only have to do it once
    for i in range(100):
        # give them 15 seconds for the first line, then just 2 seconds
        # each additional line. (some sites have very high latency.)
        if i == 0:
            timeout = self.banner_timeout
        else:
            timeout = 2
        try:
            buf = self.packetizer.readline(timeout)
        except ProxyCommandFailure:
            raise
        except Exception, x:
            raise SSHException('Error reading SSH protocol banner' + str(x))
        if buf[:4] == 'SSH-':
            break
        self._log(DEBUG, 'Banner: ' + buf)
    if buf[:4] != 'SSH-':
        raise SSHException('Indecipherable protocol version "' + buf + '"')
    # save this server version string for later
    self.remote_version = buf
    # pull off any attached comment
    comment = ''
    i = string.find(buf, ' ')
    if i >= 0:
        comment = buf[i+1:]
        buf = buf[:i]
    # parse out version string and make sure it matches
    segs = buf.split('-', 2)
    if len(segs) < 3:
        raise SSHException('Invalid SSH banner')
    version = segs[1]
    client = segs[2]
    if version != '1.99' and version != '2.0':
        raise SSHException('Incompatible version (%s instead of 2.0)' % (version,))
    self._log(INFO, 'Connected (version %s, client %s)' % (version, client))
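What _check_banner does with the line it reads can be restated in plain Python. Per the SSH transport protocol (RFC 4253), the server identifies itself with a line of the form SSH-protoversion-softwareversion, optionally followed by a space and a comment. The sketch below mirrors the excerpt's checks; parse_banner is hypothetical and not part of paramiko's API. Note that in the excerpt only the first readline() gets banner_timeout; each later line gets 2 seconds.

```python
def parse_banner(line):
    """Split an SSH identification line such as 'SSH-2.0-OpenSSH_7.4 comment'
    into (protocol_version, software_version, comment), applying the same
    checks as the _check_banner excerpt."""
    line = line.rstrip("\r\n")
    if not line.startswith("SSH-"):
        raise ValueError("Indecipherable protocol version %r" % line)
    # Anything after the first space is a free-form comment.
    head, _, comment = line.partition(" ")
    segs = head.split("-", 2)
    if len(segs) < 3:
        raise ValueError("Invalid SSH banner")
    version, software = segs[1], segs[2]
    if version not in ("1.99", "2.0"):
        raise ValueError("Incompatible version (%s instead of 2.0)" % version)
    return version, software, comment

# parse_banner("SSH-2.0-OpenSSH_7.4 Debian\r\n") -> ('2.0', 'OpenSSH_7.4', 'Debian')
```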

3. The exception is clearly raised by buf = self.packetizer.readline(timeout). My quick-and-dirty way to pin it down was to also run that statement once outside the try block:

def _check_banner(self):
    # this is slow, but we only have to do it once
    for i in range(100):
        # give them 15 seconds for the first line, then just 2 seconds
        # each additional line. (some sites have very high latency.)
        if i == 0:
            timeout = self.banner_timeout
        else:
            timeout = 2
        buf = self.packetizer.readline(timeout)  # added outside the try to see where the exception really comes from
        try:
            buf = self.packetizer.readline(timeout)
        except ProxyCommandFailure:
            raise
        except Exception, x:
            raise SSHException('Error reading SSH protocol banner' + str(x))
        if buf[:4] == 'SSH-':
            break
        self._log(DEBUG, 'Banner: ' + buf)
    ...

This made the error much more specific:

2017-08-04 13:23:26,085 [ERROR] Unknown exception: ('This operation would block forever', <Hub at 0x20390f0 epoll pending=0 ref=0 fileno=27>)
2017-08-04 13:23:26,087 [ERROR] Traceback (most recent call last):
2017-08-04 13:23:26,088 [ERROR] File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1555, in run
2017-08-04 13:23:26,088 [ERROR] self._check_banner()
2017-08-04 13:23:26,088 [ERROR] File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1676, in _check_banner
2017-08-04 13:23:26,088 [ERROR] buf = self.packetizer.readline(timeout)
2017-08-04 13:23:26,088 [ERROR] File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 280, in readline
2017-08-04 13:23:26,088 [ERROR] buf += self._read_timeout(timeout)
2017-08-04 13:23:26,088 [ERROR] File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 468, in _read_timeout
2017-08-04 13:23:26,089 [ERROR] x = self.__socket.recv(128)
2017-08-04 13:23:26,089 [ERROR] File "/usr/local/python2.7.5/lib/python2.7/site-packages/gevent-1.1.2-py2.7-linux-x86_64.egg/gevent/_socket2.py", line 280, in recv
2017-08-04 13:23:26,089 [ERROR] self._wait(self._read_event)
2017-08-04 13:23:26,089 [ERROR] File "/usr/local/python2.7.5/lib/python2.7/site-packages/gevent-1.1.2-py2.7-linux-x86_64.egg/gevent/_socket2.py", line 179, in _wait
2017-08-04 13:23:26,089 [ERROR] self.hub.wait(watcher)
2017-08-04 13:23:26,089 [ERROR] File "/usr/local/python2.7.5/lib/python2.7/site-packages/gevent-1.1.2-py2.7-linux-x86_64.egg/gevent/hub.py", line 630, in wait
2017-08-04 13:23:26,089 [ERROR] result = waiter.get()
2017-08-04 13:23:26,089 [ERROR] File "/usr/local/python2.7.5/lib/python2.7/site-packages/gevent-1.1.2-py2.7-linux-x86_64.egg/gevent/hub.py", line 878, in get
2017-08-04 13:23:26,090 [ERROR] return self.hub.switch()
2017-08-04 13:23:26,090 [ERROR] File "/usr/local/python2.7.5/lib/python2.7/site-packages/gevent-1.1.2-py2.7-linux-x86_64.egg/gevent/hub.py", line 609, in switch
2017-08-04 13:23:26,090 [ERROR] return greenlet.switch(self)
2017-08-04 13:23:26,090 [ERROR] LoopExit: ('This operation would block forever', <Hub at 0x20390f0 epoll pending=0 ref=0 fileno=27>)
2017-08-04 13:23:26,090 [ERROR]
2017-08-04 13:23:26,093 [INFO] ('This operation would block forever', <Hub at 0x20390f0 epoll pending=0 ref=0 fileno=27>)
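Instead of duplicating the call outside the try block, the original traceback can be kept at the point where the exception is wrapped. A minimal Python 3 sketch (the 1.x paramiko code above is Python 2 and cannot use raise ... from); read_banner_line and BannerError are hypothetical stand-ins, not paramiko names:

```python
import logging
import traceback

log = logging.getLogger("banner-debug")


class BannerError(Exception):
    """Stand-in for paramiko.SSHException in this sketch."""


def read_banner_line(readline, timeout):
    """Call readline(timeout); on failure, keep the original traceback.

    Logging traceback.format_exc() and chaining with `raise ... from` keeps
    the innermost frames visible, instead of only the one-line message of
    the wrapping exception.
    """
    try:
        return readline(timeout)
    except Exception as exc:
        log.error("banner read failed:\n%s", traceback.format_exc())
        raise BannerError("Error reading SSH protocol banner: %s" % exc) from exc
```

With chaining, the final traceback prints the original error first, followed by "The above exception was the direct cause of the following exception", so the library itself does not need editing to find the real source.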

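That pointed squarely at gevent and greenlet as the culprits! I assumed my apiserver was calling gevent, but after a long investigation I confirmed it was not. And as far as I knew, paramiko does not use gevent either. So where did this exception come from?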

Only when I searched Google again for 【LoopExit: ('This operation would block forever', <Hub at】 did I find a blog post, http://www.hongquan.me/?p=178, that finally explained the cause!

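The cause, according to that post: greenlet has a run function that overrides the run function of the same name in paramiko's transport.py, so when paramiko ran _check_banner it actually ended up in greenlet's run, hence the error. Unbelievable! The traceback above also shows the path concretely: paramiko's self.__socket.recv(128) goes into gevent/_socket2.py, meaning the socket module in this process had been replaced by gevent's cooperative version, and the wait then fails inside gevent's hub with LoopExit.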

(To be continued; more problems will be added later...)

