Chinese Pinyin Conversion Tool (Python Edition)¶

Converts Chinese characters to pinyin. Useful for phonetic annotation, sorting, and search.

Documentation: http://pypinyin.rtfd.org
GitHub: https://github.com/mozillazg/python-pinyin
License: MIT license
PyPI: https://pypi.python.org/pypi/pypinyin
Python version: 2.6, 2.7, pypy, 3.3, 3.4

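Features¶

Picks the most suitable pinyin for a character based on the phrase it appears in.
Supports heteronyms (characters with more than one reading).
Basic support for Traditional Chinese.
Supports several pinyin styles.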

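Installation¶

$ pip install pypinyin

To handle strings that contain heteronyms or non-Chinese characters better, installing the jieba word segmentation module as well is recommended.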

Usage examples¶

>>> from pypinyin import pinyin, lazy_pinyin
>>> import pypinyin
>>> pinyin(u'中心')
[[u'zh\u014dng'], [u'x\u012bn']]
>>> pinyin(u'中心', heteronym=True)  # enable heteronym mode
[[u'zh\u014dng', u'zh\xf2ng'], [u'x\u012bn']]
>>> pinyin(u'中心', style=pypinyin.INITIALS)  # choose a pinyin style
[['zh'], ['x']]
>>> pinyin(u'中心', style=pypinyin.TONE2, heteronym=True)
[['zho1ng', 'zho4ng'], ['xi1n']]
>>> lazy_pinyin(u'中心')  # ignore heteronyms
['zhong', 'xin']

Command-line tool:

$ pypinyin 音乐
yīn yuè
$ pypinyin -h

Word segmentation¶

If the jieba segmentation module is installed, the program calls it automatically.

Using a different segmentation module:

Install a segmentation module, for example pip install snownlp;

Pass a list of already segmented strings as the argument:

>>> from pypinyin import lazy_pinyin, TONE2
>>> from snownlp import SnowNLP
>>> hans = u'音乐123'
>>> lazy_pinyin(hans, style=TONE2)
[u'yi1n', u'le4', u'1', u'2', u'3']
>>> hans_seg = SnowNLP(hans).words  # segment into words
>>> hans_seg
[u'\u97f3\u4e50', u'123']
>>> lazy_pinyin(hans_seg, style=TONE2)
[u'yi1n', u'yue4', u'123']
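
The example above passes a pre-segmented list instead of a plain string. As a rough illustration of what such a list looks like, the sketch below splits a string into runs of Chinese characters and runs of everything else. The helper `split_runs` and its character range are assumptions made for this example. It is not a real word segmenter and is not part of pypinyin, jieba, or snownlp.

```python
import re

# Matches either a run of CJK Unified Ideographs or a run of anything else.
# The range U+4E00..U+9FFF is a simplification chosen for this sketch.
_RUNS = re.compile(u'[\u4e00-\u9fff]+|[^\u4e00-\u9fff]+')


def split_runs(text):
    """Split text into alternating Chinese and non-Chinese runs (hypothetical helper)."""
    return _RUNS.findall(text)


# Each Chinese run stays together, so it can be looked up as one unit.
print(split_runs(u'音乐123'))
```

A list produced this way could be passed wherever the documentation says a list of strings is accepted. A real segmenter would additionally split long Chinese runs into words.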

Custom pinyin dictionaries¶

If the results are not what you want, you can correct them with a custom pinyin dictionary:

With jieba installed, for phrases that the segmenter recognizes:

>>> from pypinyin import lazy_pinyin, load_phrases_dict, TONE2
>>> hans = u'桔子'
>>> lazy_pinyin(hans, style=TONE2)
[u'jie2', u'zi3']
>>> load_phrases_dict({u'桔子': [[u'jú'], [u'zǐ']]})
>>> lazy_pinyin(hans, style=TONE2)
[u'ju2', u'zi3']

Without jieba installed, or for phrases that the segmenter does not recognize:

>>> from pypinyin import lazy_pinyin, load_phrases_dict, TONE2, load_single_dict
>>> hans = u'还没'
>>> lazy_pinyin(hans, style=TONE2)
['hua2n', 'me2i']
>>> # Method 1: add a custom phrase
>>> load_phrases_dict({u'还没': [[u'hái'], [u'méi']]})
>>> lazy_pinyin(u'还没', style=TONE2)
['hua2n', 'me2i']
>>> lazy_pinyin([u'还没'], style=TONE2)  # explicitly mark "还没" as one phrase
['ha2i', 'me2i']
>>> # Method 2: change the single-character dictionary
>>> load_single_dict({ord(u'还'): u'hái,huán'})  # reorder the readings of "还"
>>> lazy_pinyin(u'还没', style=TONE2)
['ha2i', 'me2i']
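
To make the behavior above easier to follow, here is a toy model of a phrase-first lookup with a single-character fallback. Everything in it (`single`, `phrases`, `toy_lazy_pinyin`, and the sample readings) is invented for illustration and only mimics the documented results. It is not pypinyin's actual implementation.

```python
text_type = type(u'')

# Toy dictionaries. Keys of `single` are code points, values are
# comma-separated readings with the preferred reading first.
single = {ord(u'还'): u'huán,hái', ord(u'没'): u'méi'}
phrases = {}


def toy_lazy_pinyin(hans):
    """Return one reading per character (hypothetical, simplified lookup)."""
    # A plain string is treated character by character, as if no segmenter
    # were available. A list is taken as already segmented words.
    words = list(hans) if isinstance(hans, text_type) else hans
    result = []
    for word in words:
        if word in phrases:
            # Phrase entries win over single-character entries.
            result.extend(readings[0] for readings in phrases[word])
            continue
        for ch in word:
            readings = single.get(ord(ch))
            result.append(readings.split(u',')[0] if readings else ch)
    return result


before = toy_lazy_pinyin(u'还没')               # single-character lookup
phrases[u'还没'] = [[u'hái'], [u'méi']]
as_string = toy_lazy_pinyin(u'还没')            # phrase is never consulted
as_list = toy_lazy_pinyin([u'还没'])            # phrase entry is used
```

In this model, adding a phrase only helps when the input reaches the lookup as a whole word, which is why the list form behaves differently from the plain string. Reordering the readings in `single` changes the result for both forms.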

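API¶

Pinyin styles:

Style	Value	Meaning
pypinyin.NORMAL	0	Normal style, without tones. Example: `pin yin`
pypinyin.TONE	1	Tone style 1: the tone mark is placed on the first letter of the final (default style). Example: `pīn yīn`
pypinyin.TONE2	2	Tone style 2: the tone is written as a digit [0-4] after the vowel that carries it. Example: `pi1n yi1n`
pypinyin.INITIALS	3	Initials style: returns only the initial of each syllable. Example: `zh g` for `中国`
pypinyin.FIRST_LETTER	4	First-letter style: returns only the first letter of each syllable. Example: `p y`
pypinyin.FINALS	5	Finals style 1: returns only the final of each syllable, without tones. Example: `ong uo`
pypinyin.FINALS_TONE	6	Finals style 2: with tones, the tone mark is placed on the first letter of the final. Example: `ōng uó`
pypinyin.FINALS_TONE2	7	Finals style 3: with tones, written as a digit [0-4] after the vowel that carries it. Example: `o1ng uo2`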
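
The relationship between the tone-mark, tone-digit, and toneless forms in the table can be sketched with the standard `unicodedata` module. The functions `tone_to_tone2` and `tone_to_normal` are hypothetical helpers written for this illustration. They are not how pypinyin produces its styles, and the treatment of `ü` here may differ from the library's.

```python
import unicodedata

# Combining marks used for tones 1-4: macron, acute, caron, grave.
_TONE_MARKS = {u'\u0304': u'1', u'\u0301': u'2', u'\u030c': u'3', u'\u0300': u'4'}


def tone_to_tone2(syllable):
    """Rewrite a tone-marked syllable with a digit after the marked vowel."""
    out = []
    for ch in syllable:
        decomposed = unicodedata.normalize('NFD', ch)
        # Keep everything except the tone mark (e.g. the diaeresis of u-umlaut).
        kept = u''.join(c for c in decomposed if c not in _TONE_MARKS)
        digits = u''.join(_TONE_MARKS[c] for c in decomposed if c in _TONE_MARKS)
        out.append(unicodedata.normalize('NFC', kept) + digits)
    return u''.join(out)


def tone_to_normal(syllable):
    """Drop the tone entirely."""
    return u''.join(c for c in tone_to_tone2(syllable) if c not in u'1234')


print(tone_to_tone2(u'zhōng'), tone_to_normal(u'zhōng'))
```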

pypinyin.pinyin(hans, style=1, heteronym=False, errors=u'default')¶

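Convert Chinese characters to pinyin.

Parameters:
hans (unicode string or list of strings) – a string of Chinese characters ( `u'你好吗'` ) or a list ( `[u'你好', u'吗']` ). If `jieba` is installed, it is used to segment the string into words; pass a list to disable this behavior. You can also segment the string with a module of your choice and pass in the resulting list of strings.
style – the pinyin style
errors – how to handle characters that have no pinyin
- `'default'`: keep the original character
- `'ignore'`: drop the character
- `'replace'`: replace it with its unicode escape string with the `\u` removed (`u'\u90aa'` => `u'90aa'`)
heteronym – whether to enable heteronym mode
Returns:	list of pinyin
Return type:	list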

Usage:

>>> from pypinyin import pinyin
>>> import pypinyin
>>> pinyin(u'中心')
[[u'zhōng'], [u'xīn']]
>>> pinyin(u'中心', heteronym=True)  # enable heteronym mode
[[u'zhōng', u'zhòng'], [u'xīn']]
>>> pinyin(u'中心', style=pypinyin.INITIALS)  # choose a pinyin style
[[u'zh'], [u'x']]
>>> pinyin(u'中心', style=pypinyin.TONE2)
[[u'zho1ng'], [u'xi1n']]
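
The three `errors` modes described for `pinyin()` can be pictured with a small stand-alone function. `handle_no_pinyin` is a hypothetical name used only in this sketch, and zero-padding the hex value to four digits is an assumption based on the `\uXXXX` escape form. The library's own behavior for other code points may differ.

```python
def handle_no_pinyin(char, errors=u'default'):
    """Mimic the documented `errors` modes for one character without pinyin."""
    if errors == u'default':
        return char                      # keep the original character
    if errors == u'ignore':
        return None                      # caller drops the character
    if errors == u'replace':
        # Hex code point, i.e. the \uXXXX escape without the leading \u.
        # Four-digit zero padding is an assumption of this sketch.
        return u'%04x' % ord(char)
    raise ValueError(u'unknown errors mode: %r' % (errors,))


print(handle_no_pinyin(u'\u90aa', errors=u'replace'))
```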

pypinyin.slug(hans, style=0, heteronym=False, separator=u'-', errors=u'default')¶

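Generate a slug string.

Parameters:	hans (unicode or list) – Chinese characters style – the pinyin style heteronym – whether to enable heteronym mode separator – the separator placed between two syllables errors – how to handle characters that have no pinyin
Returns:	a slug string.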

>>> import pypinyin
>>> pypinyin.slug(u'中国人')
u'zhong-guo-ren'
>>> pypinyin.slug(u'中国人', separator=u' ')
u'zhong guo ren'
>>> pypinyin.slug(u'中国人', style=pypinyin.INITIALS)
u'zh-g-r'

pypinyin.lazy_pinyin(hans, style=0, errors=u'default')¶

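Return a list of pinyin without heteronyms.

Unlike pinyin(), each element of the returned list is a string rather than a list, and each character gets only one reading.

Parameters:	hans (unicode or list) – Chinese characters style – the pinyin style errors – how to handle characters that have no pinyin
Returns:	list of pinyin (e.g. `['zhong', 'guo', 'ren']`)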
Return type:	list

Usage:

>>> from pypinyin import lazy_pinyin
>>> import pypinyin
>>> lazy_pinyin(u'中心')
[u'zhong', u'xin']
>>> lazy_pinyin(u'中心', style=pypinyin.TONE)
[u'zhōng', u'xīn']
>>> lazy_pinyin(u'中心', style=pypinyin.INITIALS)
[u'zh', u'x']
>>> lazy_pinyin(u'中心', style=pypinyin.TONE2)
[u'zho1ng', u'xi1n']
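
The difference in result shape between `pinyin()` (a list of lists) and `lazy_pinyin()` (a flat list) can be shown without the library. The helper `first_readings` below is hypothetical. It works on hand-written sample data copied from the examples in this document and says nothing about how pypinyin is implemented internally.

```python
def first_readings(nested):
    """Flatten pinyin()-shaped data, keeping the first reading of each character."""
    return [readings[0] for readings in nested if readings]


# Shape returned by pinyin(u'中心', heteronym=True) in the examples above.
nested = [[u'zhōng', u'zhòng'], [u'xīn']]

flat = first_readings(nested)      # same shape as lazy_pinyin() output
joined = u'-'.join(flat)           # joining with a separator gives a slug-like string
print(flat, joined)
```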

pypinyin.load_single_dict(pinyin_dict)¶

Load a user-defined single-character pinyin dictionary.

Parameters:	pinyin_dict (dict) – single-character pinyin dictionary, for example: `{0x963F: u"ā,ē"}`

pypinyin.load_phrases_dict(phrases_dict)¶

Load a user-defined phrase pinyin dictionary.

Parameters:	phrases_dict (dict) – phrase pinyin dictionary, for example: `{u"阿爸": [[u"ā"], [u"bà"]]}`

Changelog¶

0.5.6 (2015-02-26)¶

Fix the incorrect pinyin for “苹果”. #11
Slim down phrases_dict
Fix jieba being imported repeatedly
Update documentation

0.5.5 (2015-01-27)¶

fix phrases_dict error

0.5.4 (2014-12-26)¶

Fix incorrect handling of mixed Chinese/English phrases produced by segmentation modules (for example: B超, 维生素C). #8

0.5.3 (2014-12-07)¶

Update the pinyin dictionary

0.5.2 (2014-09-21)¶

When loading a pinyin dictionary, load a copy of it so that the built-in dictionary cannot be corrupted
Fix the tone marks for 胜败乃兵家常事

0.5.1 (2014-03-09)¶

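Add the errors parameter to control how characters without pinyin are handled:
- 'default': keep the original character
- 'ignore': drop the character
- 'replace': replace it with its unicode escape string with the \u removed (u'\u90aa' => u'90aa')
Only characters matching [^a-zA-Z0-9_] are processed.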

0.5.0 (2014-03-01)¶

Use new content and a new format for the single-character pinyin dictionary

New format: {0x963F: u"ā,ē"}

Old format: {u'阿': u"ā,ē"}
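
For anyone holding a custom dictionary in the old character-keyed format, re-keying it by code point is a one-line transformation. `to_codepoint_keys` is a hypothetical helper for this illustration. The library is not documented here as providing such a converter.

```python
def to_codepoint_keys(old_dict):
    """Re-key a {character: readings} dict as {code point: readings}."""
    return dict((ord(char), readings) for char, readings in old_dict.items())


old_style = {u'\u963f': u'ā,ē'}          # u'\u963f' is the character with code point 0x963F
new_style = to_codepoint_keys(old_style)
print(new_style)
```

The result has the same shape as the `{0x963F: u"ā,ē"}` example given for `load_single_dict`.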

0.4.4 (2014-01-16)¶

Clean up the command-line output and remove irrelevant information
Fix “ImportError: No module named runner”

0.4.3 (2014-01-10)¶

Fix compatibility problems of the command-line tool under Python 3

0.4.2 (2014-01-10)¶

Drop the STYLE_ prefix from pinyin style names (names with the STYLE_ prefix remain supported)
Add a command-line tool; see pypinyin -h for usage

0.4.1 (2014-01-04)¶

Support custom pinyin dictionaries so that users can correct the program's results

0.4.0 (2014-01-03)¶

Make jieba an optional dependency; users can segment Chinese text with a segmentation module of their choice
Support Python 3

0.3.1 (2013-12-24)¶

Add lazy_pinyin

>>> lazy_pinyin(u'中心')
['zhong', 'xin']

0.3.0 (2013-09-26)¶

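Fix the first-letter style mishandling characters whose pinyin has only a final
Add three pinyin styles:
- pypinyin.STYLE_FINALS: finals style 1, returns only the final of each syllable, without tones. Example: ong uo
- pypinyin.STYLE_FINALS_TONE: finals style 2, with tones, the tone mark is placed on the first letter of the final. Example: ōng uó
- pypinyin.STYLE_FINALS_TONE2: finals style 3, with tones, written as a digit [0-4] after the vowel that carries it. Example: o1ng uo2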

0.2.0 (2013-09-22)¶

Improve support for strings that mix Chinese and English:

>>> pypinyin.pinyin(u'你好abc')
[[u'n\u01d0'], [u'h\u01ceo'], [u'abc']]

0.1.0 (2013-09-21)¶

Initial Release
