
[Python] Automatically Convert Traditional Chinese PO file to Simplified Chinese

 2 years ago
source link: https://siongui.github.io/2016/01/08/python-automatically-convert-zhtw-po-file-to-zhcn/

In this post, we write a Python script that automatically converts a Traditional Chinese (zh_TW) PO file to Simplified Chinese (zh_CN) using OpenCC (Open Chinese Convert) and pyOpenCC (the OpenCC Python binding). Please read my previous post [1] first to install OpenCC and pyOpenCC.

Source Code

The zh_TW PO file used for testing:

messages.po
# Chinese translations for PACKAGE package.
# Copyright (C) 2013 THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# Automatically generated, 2013.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2013-06-04 10:20+0800\n"
"PO-Revision-Date: 2013-03-10 05:19+0800\n"
"Last-Translator: Automatically generated\n"
"Language-Team: none\n"
"Language: zh_TW\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

msgid "Definition and Meaning"
msgstr "定義與意義"

msgid "Words Start with"
msgstr "單字,開頭為"

msgid "Home"
msgstr "首頁"

msgid "Canon"
msgstr "經典"

msgid "About"
msgstr "關於"

msgid "Setting"
msgstr "設定"

msgid "Translation"
msgstr "翻譯"

The Python script:

tw2cn.py
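#!/usr/bin/env python
# -*- coding:utf-8 -*-

import re
import pyopencc

# Traditional-to-Simplified converter, configured by OpenCC's zht2zhs.ini
tw2cn = pyopencc.OpenCC('zht2zhs.ini').convert


def convert_msgstr(line):
  # Convert only the quoted text; an empty msgstr "" does not match and is kept
  return re.sub(r'msgstr "(.+)"', lambda m: 'msgstr "%s"' % tw2cn(m.group(1)), line)


if __name__ == '__main__':
  with open("locale/zh_TW/LC_MESSAGES/messages.po", 'r') as ftw:
    with open("locale/zh_CN/LC_MESSAGES/messages.po", "w") as fcn:
      for line in ftw:
        if 'zh_TW' in line:
          # Rename the locale, e.g. in the "Language:" header line
          fcn.write(line.replace('zh_TW', 'zh_CN'))
        elif line.startswith('msgstr'):
          converted = convert_msgstr(line)
          try:
            fcn.write(converted)
          except UnicodeEncodeError:
            # Python 2: a unicode result must be encoded before writing
            fcn.write(converted.encode('utf-8'))
        else:
          # msgid lines, comments, blank lines: copy unchanged
          fcn.write(line)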
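
The msgstr substitution can be tried on its own, without OpenCC installed. The following is a minimal sketch, not the script itself: `fake_tw2cn` and its tiny `SAMPLE_TABLE` are hypothetical stand-ins for `pyopencc.OpenCC('zht2zhs.ini').convert`, and `convert_msgstr_line` is a name introduced here for illustration. It shows that the pattern needs at least one character between the quotes, so the empty `msgstr ""` of the PO header passes through unchanged.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical stand-in for the pyopencc converter: it maps only a few
# characters from the sample PO file. A real run would use OpenCC instead.
SAMPLE_TABLE = {u'設': u'设', u'關': u'关', u'於': u'于', u'譯': u'译'}


def fake_tw2cn(text):
    """Replace each character found in SAMPLE_TABLE, keep the rest as is."""
    return u''.join(SAMPLE_TABLE.get(ch, ch) for ch in text)


def convert_msgstr_line(line, convert=fake_tw2cn):
    """Convert the quoted text of a single-line msgstr entry."""
    return re.sub(u'msgstr "(.+)"',
                  lambda m: u'msgstr "%s"' % convert(m.group(1)),
                  line)


if __name__ == '__main__':
    print(convert_msgstr_line(u'msgstr "設定"\n'))
    print(convert_msgstr_line(u'msgstr ""\n'))
```

Note that a continuation line of a multi-line msgstr starts with a quote rather than `msgstr`, so the script copies such lines without converting them. The sample file has continuation lines only in its header, and none of them contain Chinese text.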

Tested on: Ubuntu Linux 15.10, Python 2.7.10, opencc 0.4.3-2build1, pyopencc-0.4.2.2.
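
The `UnicodeEncodeError` fallback in the script is tied to Python 2, where the file object expects byte strings. As a hedged sketch of the same loop that decodes and encodes explicitly, the version below uses `io.open` so it behaves the same on Python 2.7 and Python 3. `convert_po` is a hypothetical helper name, and the converter is passed in as a parameter; it is an assumption that the OpenCC binding in use accepts and returns text strings (this was not checked against pyopencc 0.4.2.2).

```python
# -*- coding: utf-8 -*-
import io
import re

MSGSTR_RE = re.compile(u'msgstr "(.+)"')


def convert_po(src_path, dst_path, convert):
    """Copy a zh_TW PO file to dst_path, converting msgstr text with `convert`.

    `convert` is any callable that takes and returns a text string
    (assumption: the OpenCC binding in use can be wrapped to do so).
    """
    with io.open(src_path, 'r', encoding='utf-8') as src, \
         io.open(dst_path, 'w', encoding='utf-8') as dst:
        for line in src:
            if u'zh_TW' in line:
                # Same rule as the script: rename the locale where it appears
                dst.write(line.replace(u'zh_TW', u'zh_CN'))
            elif line.startswith(u'msgstr'):
                dst.write(MSGSTR_RE.sub(
                    lambda m: u'msgstr "%s"' % convert(m.group(1)), line))
            else:
                # msgid lines, comments and blank lines are copied unchanged
                dst.write(line)
```

For example, `convert_po('locale/zh_TW/LC_MESSAGES/messages.po', 'locale/zh_CN/LC_MESSAGES/messages.po', tw2cn)` would mirror the paths used in the script, provided `tw2cn` handles text strings.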


References:

[2] Python Regular Expressions | Google for Education | Google Developers
[3] Regex replace (in Python) - a simpler way? - Stack Overflow
[4] python - Import a module from a relative path - Stack Overflow

