python3 UnicodeDecodeError

遇到 python UnicodeDecodeError 其實有點頭大。

#!/usr/bin/python3

# -*- coding: utf-8 -*-


import subprocess
import os
import sys
import locale

if __name__ == "__main__":
    locale.setlocale(locale.LC_ALL, 'en_US.UTF8')
    proc = subprocess.Popen(['date'], stdout=subprocess.PIPE,
                                     stderr=subprocess.PIPE,
                                     stdin=subprocess.PIPE,
                                     env={'LANG': 'zh_TW.UTF8', 'LANGUAGE': 'zh_TW:zh'},
                                     universal_newlines=True, shell=True)
    (out, err) = proc.communicate()
    import codecs
    sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
    print(out)
    print(err)
$ ./date.py

再進一步查了一下,發現這跟環境變數有關。

$ env -i LANG='en_US.UTF8' /usr/bin/python3 -c 'import locale; print(locale.getpreferredencoding(False))'               
UTF-8
$ env -i  /usr/bin/python3 -c 'import locale; print(locale.getpreferredencoding(False))'
ANSI_X3.4-1968

看完這份 slides ,才比較知道 Unicode 在 Python 中較好的處理方法 Unicode In Python, Completely Demystified

Solution

  1. Decode early
  2. Unicode everywhere
  3. Encode late

或是類似 Ubiquity 的直接強制使用 C.UTF
http://bazaar.launchpad.net/~ubuntu-installer/ubiquity/trunk/view/head:/bin/ubiquity#L44

comments powered by Disqus