Welcome to the Chinese Encoding Webpage
GB 18030 Encoding Table
Because GB 18030 is designed to map every Unicode code point, its code table is as large as Unicode itself.
Differences between GBK and GB 2312
GB 2312 uses two-byte code positions: the first byte ranges over 0xA1–FE (0xAA–AF and 0xF8–FE are not actually used), and the second byte over 0xA1–FE.
GBK also uses two-byte code positions, but its first byte ranges over 0x81–FE and its second byte over 0x40–7E and 0x80–FE.
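The byte ranges above can be checked mechanically. The following Python sketch (the function names are illustrative, not from any library) tests whether a two-byte sequence falls inside the GB 2312 grid or only inside the wider GBK grid. It checks structure only and does not consult a character table, so the unused GB 2312 rows still pass the GB 2312 test.

```python
def in_gb2312_range(b1: int, b2: int) -> bool:
    """True if (b1, b2) lies in the GB 2312 grid: both bytes 0xA1-FE.

    Structural check only: rows 0xAA-AF and 0xF8-FE are inside the grid
    even though GB 2312 assigns no characters there.
    """
    return 0xA1 <= b1 <= 0xFE and 0xA1 <= b2 <= 0xFE


def in_gbk_range(b1: int, b2: int) -> bool:
    """True if (b1, b2) lies in the GBK grid.

    First byte 0x81-FE; second byte 0x40-7E or 0x80-FE (0x7F is excluded).
    """
    return 0x81 <= b1 <= 0xFE and (0x40 <= b2 <= 0x7E or 0x80 <= b2 <= 0xFE)


def classify(pair: bytes) -> str:
    """Label a two-byte sequence by the narrowest grid that contains it."""
    if len(pair) != 2:
        raise ValueError("expected exactly two bytes")
    b1, b2 = pair  # iterating over bytes yields ints
    if in_gb2312_range(b1, b2):
        return "GB 2312 grid"
    if in_gbk_range(b1, b2):
        return "GBK extension"
    return "invalid"


print(classify(b"\xb0\xa1"))  # GB 2312 grid
print(classify(b"\x81\x40"))  # GBK extension
print(classify(b"\x81\x7f"))  # invalid
```

Because the GB 2312 grid is a subset of the GBK grid, testing the narrower range first is enough to separate the two cases.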
GB 2312 contains only 6,763 Chinese characters, whereas GBK includes all 20,902 CJK Unified Ideographs of Unicode 1.1 (U+4E00–9FA5).
0x8140–A0FE adds 6,080 Chinese characters; 0xAA40–FD9B (trail bytes 0x40–A0 only, i.e. outside the original GB 2312 area) adds 8,059 Chinese characters; 0xFD9C–FE4F adds 21 compatibility ideographs.
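As an arithmetic check on these counts, the sketch below uses a hypothetical helper that counts valid two-byte positions in a range, assuming the GBK trail-byte rules given above (0x40–7E and 0x80 upward, with the upper bound lowered to 0xA0 for the areas that use only the low trail bytes). It also confirms that 6,763 + 6,080 + 8,059 = 20,902, the number of ideographs in U+4E00–9FA5.

```python
def count_positions(start: int, end: int, max_trail: int = 0xFE) -> int:
    """Count valid GBK two-byte positions from start to end (inclusive).

    A position counts when its trail byte is 0x40-7E or 0x80-max_trail.
    Assumes start and end already lie within valid lead bytes (0x81-FE).
    """
    count = 0
    for code in range(start, end + 1):
        trail = code & 0xFF
        if 0x40 <= trail <= max_trail and trail != 0x7F:
            count += 1
    return count


print(count_positions(0x8140, 0xA0FE))        # 6080
print(count_positions(0xAA40, 0xFD9B, 0xA0))  # 8059
print(count_positions(0xFD9C, 0xFE4F, 0xA0))  # 21
print(6763 + 6080 + 8059)                     # 20902
```

Each full lead byte contributes 190 positions (63 + 127), so 32 lead bytes give 6,080; the 0xAA40–FD9B area contributes 96 positions per lead byte, minus the positions past 0xFD9B.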
GB 2312 has only 682 symbols. Later font standards such as GB 5007.1 and GB 6345.1 add six pinyin symbols, ɑ ḿ ń ň ǹ ɡ, at 0xA8BB–A8C0, and GBK inherits them.
GBK adds 10 lowercase Roman numerals ⅰ–ⅹ (0xA2A1–A2AA).
GBK adds 29 vertical punctuation marks (0xA6D9–A6F5), derived from the GB 12345 standard.
GBK adds symbols used in Taiwanese computer systems (0xA840–A895 and 0xA940–A988, excluding A958, A95B and A95D–A95F).
In reality, however, Taiwanese systems do not contain the characters at 0xA844 (―), 0xA891 (☉) and 0xA95C (‐).
Conversely, the Big5 characters at 0xA145 (‧), 0xA15A (╴), 0xA1C2 (¯ or ‾) and 0xA1C5 (ˍ) do not appear in GBK.
GBK adds the ideographic variation indicator 〾 (0xA989), the twelve ideographic description characters ⿰–⿻ (0xA98A–A995) and the ideographic number zero 〇 (0xA996).
GBK adds 52 simplified characters from the "General Table of Simplified Characters" and 28 components from the "Kangxi Dictionary" and "Cihai" that were not yet in Unicode at the time (0xFE50–FEA0).
Note: standards such as GB 5007.1 and GB 6345.1 add 94 half-width graphic characters (i.e. the ASCII symbols) in row 10 (internal codes 0xAAA1–AAFE), and 32 half-width pinyin characters (a, e, i, o, u, ü with the four tones, plus ê, ɑ, ḿ, ń, ň, ǹ, ɡ) in row 11 (internal codes 0xABA1–ABC0). GBK and GB 18030 do not adopt these additions.
Differences between GBK and Microsoft CP936
Microsoft CP936 adds the euro sign € at the single byte 0x80 (the euro did not yet exist when GBK was released in 1995).
Microsoft CP936 has no characters at 0xA6D9–A6DF, A6EC–A6ED, A6F3, A8BC, A8BF, A989–A995 or FE50–FEA0 (GB 13000.1, i.e. Unicode 1.1, did not contain those characters).
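The two lists above can be turned into a lookup. This Python sketch is hypothetical (the names `CP936_GAPS`, `CP936_EXTRA` and `missing_in_cp936` are illustrative) and encodes only the ranges stated in this section; it does not consult Microsoft's actual mapping files, and the ranges are not filtered for invalid trail bytes such as 0x7F.

```python
# GBK two-byte positions (inclusive ranges) that, per the list above,
# have no character in Microsoft CP936.
CP936_GAPS = [
    (0xA6D9, 0xA6DF),
    (0xA6EC, 0xA6ED),
    (0xA6F3, 0xA6F3),
    (0xA8BC, 0xA8BC),
    (0xA8BF, 0xA8BF),
    (0xA989, 0xA995),
    (0xFE50, 0xFEA0),
]

# Conversely, CP936 assigns the euro sign to the single byte 0x80,
# which GBK leaves undefined.
CP936_EXTRA = {0x80: "\u20ac"}


def missing_in_cp936(code: int) -> bool:
    """True if the GBK position `code` is one that CP936 leaves empty."""
    return any(lo <= code <= hi for lo, hi in CP936_GAPS)


print(missing_in_cp936(0xA6D9))  # True
print(missing_in_cp936(0xA6E0))  # False
```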
Differences between GB 18030-2000 and GBK
GB 18030-2000 adds four-byte code positions: the first byte ranges over 0x81–FE, the second over 0x30–39, the third over 0x81–FE and the fourth over 0x30–39. Every Unicode code point is mapped to one GB 18030 code position.
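Because the four bytes take 126, 10, 126 and 10 values respectively, the four-byte area behaves like a mixed-radix number with 126 × 10 × 126 × 10 = 1,587,600 positions. The Python sketch below (function names are illustrative) converts between a four-byte sequence and its zero-based linear index; it validates byte ranges only and says nothing about which character, if any, a position holds.

```python
def four_byte_index(seq: bytes) -> int:
    """Return the zero-based linear index of a GB 18030 four-byte sequence."""
    if len(seq) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = seq
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError(f"not a GB 18030 four-byte sequence: {seq.hex()}")
    # Radices from most to least significant byte: 126, 10, 126, 10.
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


def sequence_at(index: int) -> bytes:
    """Inverse of four_byte_index: rebuild the four bytes for a linear index."""
    if not 0 <= index < 126 * 10 * 126 * 10:
        raise ValueError("index out of range")
    index, b4 = divmod(index, 10)
    index, b3 = divmod(index, 126)
    b1, b2 = divmod(index, 10)
    return bytes([b1 + 0x81, b2 + 0x30, b3 + 0x81, b4 + 0x30])


print(four_byte_index(b"\x81\x30\x81\x30"))  # 0
print(sequence_at(1_587_599).hex())         # fe39fe39
```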
GB 18030-2000 includes all CJK unified ideographs in Extension A.
GB 18030-2000 includes the euro symbol at 0xA2E3.
Unfortunately, on Microsoft's Simplified Chinese systems 0x80 is still the euro sign, while 0xA2E3 holds a second euro sign mapped to the private-use code point U+E76C.
Because Unicode 3.0 and later include the following characters, Appendix E of GB 18030-2000 and Table E.1 in Appendix E of GB 18030-2005 list their positions in the next edition of GB 13000 (equivalent to ISO/IEC 10646:2003). In practice, GB 18030-2000 and GB 18030-2005 changed the Unicode mappings of these characters accordingly.
Columns: GB code position, character, GBK mapping (private use area), GB 18030 mapping (Unicode).
A8BF ǹ U+E7C8 U+01F9
A989 〾 U+E7E7 U+303E
A98A ⿰ U+E7E8 U+2FF0
A98B ⿱ U+E7E9 U+2FF1
A98C ⿲ U+E7EA U+2FF2
A98D ⿳ U+E7EB U+2FF3
A98E ⿴ U+E7EC U+2FF4
A98F ⿵ U+E7ED U+2FF5
A990 ⿶ U+E7EE U+2FF6
A991 ⿷ U+E7EF U+2FF7
A992 ⿸ U+E7F0 U+2FF8
A993 ⿹ U+E7F1 U+2FF9
A994 ⿺ U+E7F2 U+2FFA
A995 ⿻ U+E7F3 U+2FFB
FE50 ⺁ U+E815 U+2E81
FE54 ⺄ U+E819 U+2E84
FE55 㑳 U+E81A U+3473
FE56 㑇 U+E81B U+3447
FE57 ⺈ U+E81C U+2E88
FE58 ⺋ U+E81D U+2E8B
FE5A 㖞 U+E81F U+359E
FE5B 㘚 U+E820 U+361A
FE5C 㘎 U+E821 U+360E
FE5D ⺌ U+E822 U+2E8C
FE5E ⺗ U+E823 U+2E97
FE5F 㥮 U+E824 U+396E
FE60 㤘 U+E825 U+3918
FE62 㧏 U+E827 U+39CF
FE63 㧟 U+E828 U+39DF
FE64 㩳 U+E829 U+3A73
FE65 㧐 U+E82A U+39D0
FE68 㭎 U+E82D U+3B4E
FE69 㱮 U+E82E U+3C6E
FE6A 㳠 U+E82F U+3CE0
FE6B ⺧ U+E830 U+2EA7
FE6E ⺪ U+E833 U+2EAA
FE6F 䁖 U+E834 U+4056
FE70 䅟 U+E835 U+415F
FE71 ⺮ U+E836 U+2EAE
FE72 䌷 U+E837 U+4337
FE73 ⺳ U+E838 U+2EB3
FE74 ⺶ U+E839 U+2EB6
FE75 ⺷ U+E83A U+2EB7
FE77 䎱 U+E83C U+43B1
FE78 䎬 U+E83D U+43AC
FE79 ⺻ U+E83E U+2EBB
FE7A 䏝 U+E83F U+43DD
FE7B 䓖 U+E840 U+44D6
FE7C 䙡 U+E841 U+4661
FE7D 䙌 U+E842 U+464C
FE80 䜣 U+E844 U+4723
FE81 䜩 U+E845 U+4729
FE82 䝼 U+E846 U+477C
FE83 䞍 U+E847 U+478D
FE84 ⻊ U+E848 U+2ECA
FE85 䥇 U+E849 U+4947
FE86 䥺 U+E84A U+497A
FE87 䥽 U+E84B U+497D
FE88 䦂 U+E84C U+4982
FE89 䦃 U+E84D U+4983
FE8A 䦅 U+E84E U+4985
FE8B 䦆 U+E84F U+4986
FE8C 䦟 U+E850 U+499F
FE8D 䦛 U+E851 U+499B
FE8E 䦷 U+E852 U+49B7
FE8F 䦶 U+E853 U+49B6
FE92 䲣 U+E856 U+4CA3
FE93 䲟 U+E857 U+4C9F
FE94 䲠 U+E858 U+4CA0
FE95 䲡 U+E859 U+4CA1
FE96 䱷 U+E85A U+4C77
FE97 䲢 U+E85B U+4CA2
FE98 䴓 U+E85C U+4D13
FE99 䴔 U+E85D U+4D14
FE9A 䴕 U+E85E U+4D15
FE9B 䴖 U+E85F U+4D16
FE9C 䴗 U+E860 U+4D17
FE9D 䴘 U+E861 U+4D18
FE9E 䴙 U+E862 U+4D19
FE9F 䶮 U+E863 U+4DAE
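Text decoded with a GBK table may still contain the private-use code points in the third column. A minimal Python sketch, assuming such input, folds them to the Unicode code points that GB 18030 assigns, using the standard `str.translate`. The dict `GBK_PUA_TO_GB18030` is hypothetical and only partially filled from the table above; whether a given decoder actually emits these private-use code points depends on its mapping table.

```python
# Partial fold table taken from the rows above: GBK private-use code points
# mapped to the Unicode code points GB 18030 assigns. Only a few FE50-FE9F
# rows are listed; the remaining rows can be added the same way.
GBK_PUA_TO_GB18030 = {
    0xE7C8: 0x01F9,  # A8BF ǹ
    0xE7E7: 0x303E,  # A989 〾
    0xE815: 0x2E81,  # FE50 ⺁
    0xE81A: 0x3473,  # FE55 㑳
    0xE863: 0x4DAE,  # FE9F 䶮
}
# A98A-A995: the twelve ideographic description characters are contiguous
# on both sides (U+E7E8-E7F3 -> U+2FF0-2FFB).
GBK_PUA_TO_GB18030.update({0xE7E8 + i: 0x2FF0 + i for i in range(12)})


def fold_gbk_pua(text: str) -> str:
    """Replace the listed GBK private-use characters with standard code points."""
    return text.translate(GBK_PUA_TO_GB18030)


print(hex(ord(fold_gbk_pua("\ue7c8"))))  # 0x1f9
```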
Differences between GB 18030-2005 and GB 18030-2000
GB 18030-2005 includes glyph tables for CJK Unified Ideographs Extension B, Korean, Mongolian (including the Manchu, Todo, Xibe and Ali Gali scripts), Dehong Dai, Tibetan, Uyghur/Kazakh/Kyrgyz and Yi. Korean comprises 3,376 characters, 69 letters and 51 compatibility letters; Mongolian 149 characters; Dai 35 characters; Tibetan 193 characters; Uyghur 49 characters and 153 letter forms; and Yi 1,215 characters (excluding U+A4A2, U+A4A3, U+A4B4, U+A4C1 and U+A4C5).
GB 18030-2000 did not map ḿ (0xA8BC) to its standard Unicode code point; this was finally corrected in GB 18030-2005 (see Table E.2 in Appendix E of the standard).
Columns: GB code position, character, GB 18030-2000 mapping (private use area), GB 18030-2005 mapping (Unicode).
A8BC ḿ U+E7C7 U+1E3F
Characters not yet corrected in GB 18030
When GB 18030-2000 was released, CJK Unified Ideographs Extension B did not yet exist, so the following characters were mapped to the private use area. By the time GB 18030-2005 was released Unicode had added Extension B, but the 2005 standard left these mappings unchanged (see WG2 document N2773). As a result, GB 18030-2005 encodes each of the following six characters twice: once at its two-byte position, which maps to the private use area, and once at the four-byte position that corresponds to its Extension B code point.
Columns: GB code position, character, GB 18030 mapping (private use area), Unicode ≥3.1 code point, duplicate four-byte GB 18030 position.
FE51 𠂇 U+E816 U+20087 95329031
FE52 𠂉 U+E817 U+20089 95329033
FE53 𠃌 U+E818 U+200CC 95329730
FE6C 𡗗 U+E831 U+215D7 9536B937
FE76 𢦏 U+E83B U+2298F 9630BA35
FE91 𤇾 U+E855 U+241FE 9635B630
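The duplicate four-byte positions in the last column follow from GB 18030's algorithmic mapping of supplementary-plane code points: U+10000 sits at 0x90308130, and each later code point takes the next four-byte position in the mixed-radix order (10, 126, 10 values for the last three bytes). The Python sketch below (the function name is illustrative) reproduces that arithmetic; for the six code points above it yields 95329031, 95329033, 95329730, 9536B937, 9630BA35 and 9635B630.

```python
# Linear index of 0x90308130, the four-byte position GB 18030 assigns to U+10000.
SUPPLEMENTARY_BASE = (0x90 - 0x81) * 10 * 126 * 10  # 189000


def encode_supplementary(cp: int) -> bytes:
    """Return the GB 18030 four-byte sequence for a code point in U+10000-10FFFF."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary-plane code point")
    index = SUPPLEMENTARY_BASE + (cp - 0x10000)
    # Peel off the mixed-radix digits: 10 values for byte 4, 126 for byte 3, 10 for byte 2.
    index, b4 = divmod(index, 10)
    index, b3 = divmod(index, 126)
    b1, b2 = divmod(index, 10)
    return bytes([0x81 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


for cp in (0x20087, 0x20089, 0x200CC, 0x215D7, 0x2298F, 0x241FE):
    print(f"U+{cp:X} -> {encode_supplementary(cp).hex().upper()}")
```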
The following characters were already in GB 18030-2000, but Unicode did not contain them at the time. Although Unicode 4.1 added all of them, GB 18030-2005 still maps them to the private use area (see WG2 document N2773).
Columns: GB code position, character, GB 18030 mapping (private use area), Unicode ≥4.1 code point.
A6D9 ︐ U+E78D U+FE10
A6DA ︒ U+E78E U+FE12
A6DB ︑ U+E78F U+FE11
A6DC ︓ U+E790 U+FE13
A6DD ︔ U+E791 U+FE14
A6DE ︕ U+E792 U+FE15
A6DF ︖ U+E793 U+FE16
A6EC ︗ U+E794 U+FE17
A6ED ︘ U+E795 U+FE18
A6F3 ︙ U+E796 U+FE19
FE59 龴 U+E81E U+9FB4
FE61 龵 U+E826 U+9FB5
FE66 龶 U+E82B U+9FB6
FE67 龷 U+E82C U+9FB7
FE6D 龸 U+E832 U+9FB8
FE7E 龹 U+E843 U+9FB9
FE90 龺 U+E854 U+9FBA
FEA0 龻 U+E864 U+9FBB
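To find such leftovers in decoded text, a small Python sketch can scan for Basic Multilingual Plane private-use characters (U+E000–F8FF) and, for the eighteen rows above, suggest the Unicode 4.1 equivalent. The dict and function names are illustrative. Folding is one-way: since GB 18030-2005 maps these two-byte positions to the private-use code points, re-encoding folded text is not expected to reproduce the original bytes.

```python
# Every row of the table above: private-use code points still used by
# GB 18030-2005, paired with the code points Unicode 4.1 added for them.
PUA_TO_UNICODE_41 = {
    0xE78D: 0xFE10, 0xE78E: 0xFE12, 0xE78F: 0xFE11, 0xE790: 0xFE13,
    0xE791: 0xFE14, 0xE792: 0xFE15, 0xE793: 0xFE16, 0xE794: 0xFE17,
    0xE795: 0xFE18, 0xE796: 0xFE19,
    0xE81E: 0x9FB4, 0xE826: 0x9FB5, 0xE82B: 0x9FB6, 0xE82C: 0x9FB7,
    0xE832: 0x9FB8, 0xE843: 0x9FB9, 0xE854: 0x9FBA, 0xE864: 0x9FBB,
}


def scan_pua(text: str) -> list:
    """Return (position, PUA code point, Unicode 4.1 equivalent or None)
    for each BMP private-use character in text."""
    findings = []
    for pos, ch in enumerate(text):
        cp = ord(ch)
        if 0xE000 <= cp <= 0xF8FF:
            findings.append((pos, cp, PUA_TO_UNICODE_41.get(cp)))
    return findings


def fold_to_unicode_41(text: str) -> str:
    """Replace the listed private-use characters with their Unicode 4.1 code points."""
    return text.translate(PUA_TO_UNICODE_41)


sample = "\u9f99\ue78d\ue000"
print(scan_pua(sample))
print(fold_to_unicode_41(sample) == "\u9f99\ufe10\ue000")  # True
```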
Saturday, June 25, 2016 9:15:04 pm
http://code.web.idv.hk/gb18030/gb18030.php
[Last edited by zzz19760225 on 2017-11-28 at 11:24]