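Notes from an earlier investigation of a service problem.
Symptom: a Python process hung.
Analysis: strace shows the process blocked in recvfrom(3, that is, a read on fd 3. lsof shows that fd 3 is an established connection to work.notsobad.work:8086, but the work server has no such connection on its side.
Cause: the server side had already lost the connection, and the client had no timeout configured, so it kept waiting indefinitely.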
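The client code in main.py is not shown in these notes, so the following is only a minimal sketch of the general fix: give the socket a timeout so that a read on a dead connection fails instead of blocking forever. The helper names connect_with_timeout and recv_or_none are hypothetical, and the timeout value is an assumption to be tuned per service.

```python
import socket


def connect_with_timeout(host, port, timeout_s):
    """Open a TCP connection whose connect and later recv calls are bounded.

    socket.create_connection applies timeout_s to the connect attempt and
    leaves the same timeout set on the returned socket.
    """
    return socket.create_connection((host, port), timeout=timeout_s)


def recv_or_none(sock, nbytes):
    """Read up to nbytes; return None if nothing arrives within the timeout."""
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        # The peer stayed silent for the whole timeout window. The caller
        # can now close the socket and reconnect instead of hanging.
        return None
```

With something like this in place, a caller that gets None back can close the socket and reconnect, rather than sitting in recvfrom indefinitely as the process above did.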
db / # ps aux|grep work/main.py
root 10820 0.0 0.0 112724 2232 pts/1 S+ 16:57 0:00 grep --colour=auto work/main.py
root 47609 2.8 0.0 1498876 74116 ? S 2020 7736:31 /home/work/venv/bin/python2.7 /home/work/main.py
db / # strace -vv -p 47609
Process 47609 attached
recvfrom(3,
^CProcess 47609 detached
<detached ...>
db / #
db / # sudo lsof -i | grep 47609
redis-ser 7281 redis 20u IPv4 390954150 0t0 TCP db.notsobad.work:6384->ui.notsobad.work:47609 (ESTABLISHED)
python2.7 47609 root 3u IPv4 338023916 0t0 TCP db.notsobad.work:48258->work.notsobad.work:8086 (ESTABLISHED)
python2.7 47609 root 4u IPv4 3828936519 0t0 TCP db.notsobad.work:55164->db.notsobad.work:6378 (ESTABLISHED)
python2.7 47609 root 6u IPv4 3828936523 0t0 TCP db.notsobad.work:40212->db.notsobad.work:6379 (ESTABLISHED)
The db server's IP is 10.255.1.1. On work, there is no connection from 10.255.1.1:48258:
app@work ~ $ netstat -ant |grep 10.255.1.1:48258
app@work ~ $
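A connection that exists on only one side, as seen here, is something TCP keepalive is designed to detect. The sketch below is one possible mitigation and not something taken from the original service: enable_keepalive is a hypothetical helper, the idle/interval/probe values are assumptions, and the TCP_KEEP* options are platform-specific (available on Linux), so they are guarded with hasattr.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive so the kernel probes an idle connection.

    If the peer no longer knows the connection, the probes fail and a
    blocked recv returns an error instead of waiting forever.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning knobs below are not available on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between successive probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes allowed before the connection is dropped.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

Keepalive complements an application-level timeout rather than replacing it: the timeout bounds how long a single read may wait, while keepalive lets the kernel notice that the peer is gone even when the application is idle.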