Python's Unicode handling is really painful. I only wanted to write a small script, but nearly all of the time went into dealing with Unicode. My Python version:
    python -c 'import sys;print sys.version'
    2.6.4 (r264:75706, Dec 7 2009, 18:45:15)
    [GCC 4.4.1]

The script:

    import os,sys
    from BeautifulSoup import BeautifulSoup, SoupStrainer

    def get_info(cont):
        print type(cont)
        soup = BeautifulSoup(cont)
        a = soup.findAll('a')
        print type(a)
        print(a)

    if __name__ == "__main__":
        s = sys.stdin.read()
        s = unicode(s, 'utf-8')
        get_info(s)

The error message:
    505 ~/script/notsobad/python/tool>cat /tmp/book_2742/index.html | ./book_res.py
    Traceback (most recent call last):
      File "./book_res.py", line 28, in <module>
        get_info(s)
      File "./book_res.py", line 20, in get_info
        print(a)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 79-82: ordinal not in range(128)

type(a) is unicode, yet print a raises an error. In the end I found these two posts on the Python mailing list:

http://mail.python.org/pipermail/tutor/2005-August/040991.html
http://mail.python.org/pipermail/tutor/2005-August/040993.html
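The error above can be reproduced without BeautifulSoup, which makes it easier to see what is going on. The sketch below assumes the cause is that the text gets converted to bytes with the 'ascii' codec somewhere inside print, and it shows one common workaround: encoding to UTF-8 explicitly before writing. The helper names (`ascii_encode_fails`, `write_text`) and the sample string are made up for illustration, and this is not necessarily the fix described in the linked threads.

```python
# -*- coding: utf-8 -*-
# Minimal sketch; helper names and the sample string are hypothetical.
import io


def ascii_encode_fails(text):
    """Return True if `text` cannot be encoded with the 'ascii' codec."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError:
        # Same exception type as in the traceback above.
        return True
    return False


def write_text(stream, text):
    """Write unicode `text` to a binary stream as UTF-8, plus a newline."""
    # Encoding explicitly means the 'ascii' default is never consulted.
    stream.write(text.encode('utf-8') + b'\n')


# A link whose text is non-ASCII, similar in spirit to the scraped page.
sample = u'<a href="/book">\u4e66\u7c4d</a>'

buf = io.BytesIO()
write_text(buf, sample)
```

Here `buf` stands in for the real output stream. On Python 2 `sys.stdout` accepts byte strings directly, so `write_text(sys.stdout, sample)` would work; on Python 3 the binary stream is `sys.stdout.buffer`.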
...