At some point the following started to happen in my small httplib2-based script:
    Traceback: ...
      File "/usr/lib/python2.7/httplib.py", line 996, in _send_request
        self.endheaders(body)
      File "/usr/lib/python2.7/httplib.py", line 958, in endheaders
        self._send_output(message_body)
      File "/usr/lib/python2.7/httplib.py", line 816, in _send_output
        msg += message_body
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 245: ordinal not in range(128)
Earlier I would have started sprinkling .encode() and .decode() around at random until it ran. Now I understand the reason for the failure much better, after watching an awesome talk by Ned Batchelder titled “Pragmatic Unicode, or, How do I stop the pain?”
This time it took me mere seconds to find the reason. In the traceback above, UnicodeDecodeError was raised because msg was already a unicode object, while message_body was a str. That happened because the URL supplied to the request method was unicode. When Python 2.7 concatenates unicode and str, it first tries to turn the str operand (here message_body) into a unicode string using the default ascii encoding, but the body was full of bytes outside the ASCII range. Converting the URL to str fixed the issue, since URLs are not good candidates to be passed around decoded.
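The mechanism can be sketched in a few lines. This is only an illustration, not httplib's actual code: the request line, the body, and the helpers implicit_concat and url_to_bytes are all made up for the example. It spells out the ASCII decode that Python 2 performs silently, then shows that keeping the URL as a byte string avoids the decode altogether.

```python
# Illustrative sketch; runs on Python 2.7 and Python 3. All names are hypothetical.

# Headers end up as unicode when the URL passed in was unicode.
msg = u"POST /upload HTTP/1.1\r\n\r\n"
# A UTF-8 encoded body; its first byte is 0xd0.
message_body = u"\u041f\u0440\u0438\u0432\u0435\u0442".encode("utf-8")


def implicit_concat(text, raw):
    """Spell out what Python 2 does for unicode + str: decode the bytes as ASCII first."""
    return text + raw.decode("ascii")


def url_to_bytes(url, encoding="utf-8"):
    """Return the URL as a byte string (str on Python 2), encoding only if it is text."""
    if isinstance(url, bytes):
        return url
    return url.encode(encoding)


try:
    implicit_concat(msg, message_body)
    error = None
except UnicodeDecodeError as exc:
    error = exc  # same kind of failure as in the traceback

# With a byte-string URL both operands are bytes, so nothing gets decoded.
request_line = b"POST " + url_to_bytes(u"/upload") + b" HTTP/1.1\r\n\r\n"
payload = request_line + message_body
```

On Python 2, url_to_bytes returns a plain str, which is the form the fix above hands to the request method.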
This won’t happen in Python 3, because it no longer performs this kind of implicit str-to-unicode conversion. It is the programmer’s responsibility to make sure the application handles unicode and bytes correctly.
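A small sketch of the Python 3 behaviour, assuming a Python 3 interpreter; the variable names and sample values are made up. Mixing text and bytes fails immediately with a TypeError regardless of the content, so the mismatch shows up on the first run instead of only when non-ASCII data arrives.

```python
# Python 3 only: text (str) and bytes never mix implicitly.
text = "GET / HTTP/1.1\r\n"
raw = b"\xd0\x9f"  # two non-ASCII bytes

try:
    text + raw
    mixed_ok = True
except TypeError:
    mixed_ok = False  # Python 3 refuses instead of guessing an encoding

# The conversion has to be explicit, with an encoding the programmer chose.
combined = text.encode("ascii") + raw
```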