中 Traditional Chinese

臺 Taiwanese character sets

Big 5

Although this is not historically accurate (Big 5 in fact predates CNS 11643), we treat Big 5 as an encoding of CNS 11643 Planes 1 and 2. The correspondence between the two is not entirely straightforward, however: the columns in Big 5 have 157 cells whereas those in CNS 11643 have the usual 94, and certain characters appear out of order in Big 5 (owing to incorrect stroke counts), are missing or appear more than once. Separate character tables are provided for Big 5 to avoid a complex reordering/mapping scheme.
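The figure of 157 cells can be made concrete with a small sketch. It assumes the usual Big 5 byte layout (lead bytes from 0xA1, trail bytes 0x40–0x7E and 0xA1–0xFE) and numbers columns and cells from 1; the function name is purely illustrative and not taken from any implementation discussed here.

```python
def big5_column_cell(lead: int, trail: int) -> tuple[int, int]:
    """Return the 1-based (column, cell) of a two-byte Big 5 code.

    Assumption: columns are counted from lead byte 0xA1, and the two trail
    ranges 0x40-0x7E (63 cells) and 0xA1-0xFE (94 cells) are concatenated.
    """
    if not 0xA1 <= lead <= 0xFE:
        raise ValueError("lead byte outside the range assumed here")
    if 0x40 <= trail <= 0x7E:
        cell = trail - 0x40 + 1        # cells 1..63
    elif 0xA1 <= trail <= 0xFE:
        cell = trail - 0xA1 + 64       # cells 64..157
    else:
        raise ValueError("invalid trail byte")
    return lead - 0xA1 + 1, cell


print(big5_column_cell(0xA4, 0x40))  # (4, 1)
print(big5_column_cell(0xA4, 0xFE))  # (4, 157)
```

Under this numbering the lead byte 0xC6 falls in column 38, which is where the E-Ten radicals discussed further down are placed.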

CNS 11643

「中文標準交換碼」 (Chinese Standard Interchange Code).
Previously:
「國家中文標準交換碼」 (National Chinese Standard Interchange Code)

This very large character set has its own governmental web site. The official standard can be obtained from the CNS Online Service (complete low-resolution preview available).

Only the first two planes are widely supported.

第一字面 Plane 1

Ref.: ISO-IR 171 Chinese Standard Interchange Code (CSIC) — Set 1.

5,401 hanzi (Chinese characters or 漢字) in Columns 36–93; 684 miscellaneous characters in Columns 1–9 and 34.

Unihan (統漢字) T1 includes 6 hanzi in Column 2 (viz, Cells 89 兙, 90 兛, 91 兞, 92 兝, 93 兡 and 94 兣), 3 hanzi in Column 3 (viz, Cells 1 嗧, 2 瓩 and 3 糎) and 3 radicals in Column 7 (viz, Cells 8 亠, 15 冫 and 20 勹), which gives 5,413 hanzi in total (PDF — hanzi).

Apart from the 12 hanzi included in Unihan T1, columns 1–9 and 34 contain 672 characters (PDF — non-hanzi).

Unihan CNS1992-1 = CNS1986-1 includes the same 6 hanzi in Column 2 and 3 hanzi in Column 3, as well as 1 hanzi in Column 4 (viz, Cell 31 卄), for 5,411 hanzi in total (PDF). This leaves 674 characters outside of Unihan, but the full set of characters remains the same.

Unihan BigFive (level 1) contains the same hanzi as CNS1992-1 except that U+5F5D 彝 replaces U+5F5E 彞 (PDF).

BigFive itself does not include the 30 numerals in Column 6 or the 213 radicals in Columns 7–9, which gives a set of 431 non-hanzi (PDF — non-hanzi). Many of the missing characters are included in common extensions.
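The character counts quoted for Plane 1 can be cross-checked with a few lines of arithmetic. The sketch below only restates figures from the text; the variable names are illustrative.

```python
# Figures quoted in the text for CNS 11643 Plane 1.
hanzi_main = 5401          # hanzi in Columns 36-93
misc = 684                 # miscellaneous characters in Columns 1-9 and 34

t1_extra = 6 + 3 + 3       # Unihan T1: Column 2, Column 3, Column 7 radicals
cns1992_extra = 6 + 3 + 1  # Unihan CNS1992-1: Column 2, Column 3, Column 4

assert hanzi_main + t1_extra == 5413
assert misc - t1_extra == 672
assert hanzi_main + cns1992_extra == 5411
assert misc - cns1992_extra == 674

# BigFive drops the numerals in Column 6 and the radicals in Columns 7-9.
numerals, radicals = 30, 213
big5_non_hanzi = misc - cns1992_extra - numerals - radicals
print(big5_non_hanzi)  # 431
```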

* * *

A certain number of the characters in Columns 1–6 have different Unicode mappings in different implementations. It is not uncommon that more than one Unicode character matches the pictorial representation in the standard, so this is not entirely unexpected. The leftmost column shows the Unicode mapping used for Big5 in all four browsers (except encircled numerals, which are not included in Big5 but have only one possible Unicode mapping [each]). The following table shows the differences:

IE Safari Firefox Opera
2022 EUC
  3000  20
, ff0c, 2c
. ff0e․ 2024. 2e
‧ 2027• 2022· b7・ 30fb・ 30fb
; ff1b; 3b
: ff1a: 3a
? ff1f? 3f
! ff01! 21
﹒ fe52‧ 2027
· b7﹒ fe52‧ 2027
| ff5c︱ fe31� fffd︱ fe31︱ fe31︱ fe31
– 2013� fffd— 2014— 2014— 2014
︱ fe31︲ fe32� fffd︲ fe32︲ fe32︲ fe32
— 2014� fffd﹘ fe58– 2013– 2013
︳ fe33� fffd� fffd
╴ 2574_ ff3f� fffd� fffd
︴ fe34� fffd� fffd
﹏ fe4f﹋ fe4b� fffd
( ff08( 28
) ff09) 29
{ ff5b{ 7b
} ff5d} 7d
‵ 2035′ 2032′ 2032′ 2032
′ 2032‵ 2035‵ 2035‵ 2035
# ff03# 23
& ff06& 26
* ff0a✳ 2733
¯ af‾ 203e� fffd‾ 203e‾ 203e‾ 203e
 ̄ ffe3� fffd� fffd
_ ff3f� fffd
ˍ 2cd_ ff3f_ 5f� fffd
﹋ fe4b� fffd
﹌ fe4c� fffd
+ ff0b+ 2b
- ff0d- 2d
< ff1c< 3c
> ff1e> 3e
= ff1d= 3d
﹥ fe65﹦ fe66﹦ fe66
﹦ fe66﹥ fe65﹥ fe65
~ ff5e∼ 223c〜 301c∼ 223c∼ 223c∼ 223c
⊕ 2295♁ 2641♁ 2641
⊙ 2299☉ 2609☉ 2609
∥ 2225‖ 2016‖ 2016‖ 2016
∣ 2223| 7c| ff5c| ff5c| ff5c
/ ff0f⁄ 2044
\ ff3c\ 5c
∕ 2215/ ff0f/ 2f
$ ff04$ 24
¥ ffe5¥ a5
¢ ffe0¢ a2
£ ffe1£ a3
% ff05% 25
@ ff20@ 40
° b0゜ 309c
0 ff100 30
⋮ (+8)
9 ff199 39
十 5341� fffd〸 3038� fffd
卄 5344� fffd〹 3039
卅 5345� fffd〺 303a� fffd
A ff21A 41
⋮ (+24)
Z ff3aZ 5a
a ff41a 61
⋮ (+24)
z ff5az 7a
ˉ 2c9˟ 2df  2003
④ 2463⑤ 2464
⋮ (+5)
⑩ 2469⑪ 246a
1512071823

Most of the alternative mappings are reasonable; a few are clearly at odds with the published standard, but may be more common in practice; some are wrong for no apparent reason.

* * *

Columns 7–9 enumerate 213 of the 214 Kangxi radicals (no. 34 ⼡, marked with an asterisk below, is missing because official Taiwanese sources have unified it with the similar-looking no. 35 ⼢). 166 of the radicals appear as ‘normal’ hanzi elsewhere in Plane 1, 20 appear in Plane 2, and 25 (including no. 34) appear in Plane 3, whereas three radicals are not encoded anywhere else (nos. 8, 15 and 20). (Note: Lunde’s numbers are slightly different, probably because he maps radical no. 174 to U+9752 青 in Plane 1 instead of U+9751 靑 in Plane 3.)

Big 5 encoding only includes Planes 1 and 2, but 22 radicals from Plane 3 as well as the ones not encoded elsewhere are included in a very common E-Ten extension. 3 Kangxi radicals from Plane 3 are still missing (nos. 2, 34 and 174).

Unicode encodes Kangxi radicals twice: both in a dedicated radicals block and in the standard ideographic block.
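The relationship between the two encodings of a radical can be explored with Python's standard `unicodedata` module. This is a minimal sketch which assumes that the compatibility decompositions in the Unicode Character Database are an acceptable stand-in for the radical-to-ideograph pairing; they need not agree with every browser or with every source cited on this page. The function name is illustrative.

```python
import unicodedata


def radical_to_ideograph(number: int) -> str:
    """Map a Kangxi radical number (1-214) to a unified ideograph via NFKC."""
    if not 1 <= number <= 214:
        raise ValueError("Kangxi radical numbers run from 1 to 214")
    radical = chr(0x2F00 + number - 1)   # Kangxi Radicals block starts at U+2F00
    return unicodedata.normalize("NFKC", radical)


for n in (1, 214):
    ideograph = radical_to_ideograph(n)
    print(n, f"U+{0x2F00 + n - 1:04X}", f"U+{ord(ideograph):04X}")
```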

The table below gives further details:

No.  Kangxi radical (Unicode)  Unified ideograph (Unicode)  CNS 11643 plane  E-Ten Column 38 cell  (- = none)
1.  U+2F00  U+4E00  1  -
2.  U+2F01  U+4E28  3  -
3.  U+2F02  U+4E36  3  bf
4.  U+2F03  U+4E3F  3  c0
5.  U+2F04  U+4E59  1  -
6.  U+2F05  U+4E85  3  c1
7.  U+2F06  U+4E8C  1  -
8.  U+2F07  U+4EA0  -  c2
9.  U+2F08  U+4EBA  1  -
10.  U+2F09  U+513F  1  -
11.  U+2F0A  U+5165  1  -
12.  U+2F0B  U+516B  1  -
13.  U+2F0C  U+5182  3  c3
14.  U+2F0D  U+5196  3  c4
15.  U+2F0E  U+51AB  -  c5
16.  U+2F0F  U+51E0  1  -
17.  U+2F10  U+51F5  2  -
18.  U+2F11  U+5200  1  -
19.  U+2F12  U+529B  1  -
20.  U+2F13  U+52F9  -  c6
21.  U+2F14  U+5315  1  -
22.  U+2F15  U+531A  2  -
23.  U+2F16  U+5338  3  c7
24.  U+2F17  U+5341  1  -
25.  U+2F18  U+535C  1  -
26.  U+2F19  U+5369  3  c8
27.  U+2F1A  U+5382  2  -
28.  U+2F1B  U+53B6  3  c9
29.  U+2F1C  U+53C8  1  -
30.  U+2F1D  U+53E3  1  -
31.  U+2F1E  U+56D7  2  -
32.  U+2F1F  U+571F  1  -
33.  U+2F20  U+58EB  1  -
34.*  U+2F21  U+5902  3  -
35.  U+2F22  U+590A  3  ca
36.  U+2F23  U+5915  1  -
37.  U+2F24  U+5927  1  -
38.  U+2F25  U+5973  1  -
39.  U+2F26  U+5B50  1  -
40.  U+2F27  U+5B80  3  cb
41.  U+2F28  U+5BF8  1  -
42.  U+2F29  U+5C0F  1  -
43.  U+2F2A  U+5C22  1  -
44.  U+2F2B  U+5C38  1  -
45.  U+2F2C  U+5C6E  2  -
46.  U+2F2D  U+5C71  1  -
47.  U+2F2E  U+5DDB  3  cc
48.  U+2F2F  U+5DE5  1  -
49.  U+2F30  U+5DF1  1  -
50.  U+2F31  U+5DFE  1  -
51.  U+2F32  U+5E72  1  -
52.  U+2F33  U+5E7A  3  cd
53.  U+2F34  U+5E7F  3  ce
54.  U+2F35  U+5EF4  3  cf
55.  U+2F36  U+5EFE  1  -
56.  U+2F37  U+5F0B  1  -
57.  U+2F38  U+5F13  1  -
58.  U+2F39  U+5F50  3  d0
59.  U+2F3A  U+5F61  3  d1
60.  U+2F3B  U+5F73  2  -
61.  U+2F3C  U+5FC3  1  -
62.  U+2F3D  U+6208  1  -
63.  U+2F3E  U+6236  1  -
64.  U+2F3F  U+624B  1  -
65.  U+2F40  U+652F  1  -
66.  U+2F41  U+6534  3  d2
67.  U+2F42  U+6587  1  -
68.  U+2F43  U+6597  1  -
69.  U+2F44  U+65A4  1  -
70.  U+2F45  U+65B9  1  -
71.  U+2F46  U+65E0  3  d3
72.  U+2F47  U+65E5  1  -
73.  U+2F48  U+66F0  1  -
74.  U+2F49  U+6708  1  -
75.  U+2F4A  U+6728  1  -
76.  U+2F4B  U+6B20  1  -
77.  U+2F4C  U+6B62  1  -
78.  U+2F4D  U+6B79  1  -
79.  U+2F4E  U+6BB3  2  -
80.  U+2F4F  U+6BCB  1  -
81.  U+2F50  U+6BD4  1  -
82.  U+2F51  U+6BDB  1  -
83.  U+2F52  U+6C0F  1  -
84.  U+2F53  U+6C14  2  -
85.  U+2F54  U+6C34  1  -
86.  U+2F55  U+706B  1  -
87.  U+2F56  U+722A  1  -
88.  U+2F57  U+7236  1  -
89.  U+2F58  U+723B  1  -
90.  U+2F59  U+723F  2  -
91.  U+2F5A  U+7247  1  -
92.  U+2F5B  U+7259  1  -
93.  U+2F5C  U+725B  1  -
94.  U+2F5D  U+72AC  1  -
95.  U+2F5E  U+7384  1  -
96.  U+2F5F  U+7389  1  -
97.  U+2F60  U+74DC  1  -
98.  U+2F61  U+74E6  1  -
99.  U+2F62  U+7518  1  -
100.  U+2F63  U+751F  1  -
101.  U+2F64  U+7528  1  -
102.  U+2F65  U+7530  1  -
103.  U+2F66  U+758B  1  -
104.  U+2F67  U+7592  3  d4
105.  U+2F68  U+7676  3  d5
106.  U+2F69  U+767D  1  -
107.  U+2F6A  U+76AE  1  -
108.  U+2F6B  U+76BF  1  -
109.  U+2F6C  U+76EE  1  -
110.  U+2F6D  U+77DB  1  -
111.  U+2F6E  U+77E2  1  -
112.  U+2F6F  U+77F3  1  -
113.  U+2F70  U+793A  1  -
114.  U+2F71  U+79B8  2  -
115.  U+2F72  U+79BE  1  -
116.  U+2F73  U+7A74  1  -
117.  U+2F74  U+7ACB  1  -
118.  U+2F75  U+7AF9  1  -
119.  U+2F76  U+7C73  1  -
120.  U+2F77  U+7CF8  1  -
121.  U+2F78  U+7F36  1  -
122.  U+2F79  U+7F51  2  -
123.  U+2F7A  U+7F8A  1  -
124.  U+2F7B  U+7FBD  1  -
125.  U+2F7C  U+8001  1  -
126.  U+2F7D  U+800C  1  -
127.  U+2F7E  U+8012  1  -
128.  U+2F7F  U+8033  1  -
129.  U+2F80  U+807F  1  -
130.  U+2F81  U+8089  1  -
131.  U+2F82  U+81E3  1  -
132.  U+2F83  U+81EA  1  -
133.  U+2F84  U+81F3  1  -
134.  U+2F85  U+81FC  1  -
135.  U+2F86  U+820C  1  -
136.  U+2F87  U+821B  1  -
137.  U+2F88  U+821F  1  -
138.  U+2F89  U+826E  1  -
139.  U+2F8A  U+8272  1  -
140.  U+2F8B  U+8278  2  -
141.  U+2F8C  U+864D  2  -
142.  U+2F8D  U+866B  1  -
143.  U+2F8E  U+8840  1  -
144.  U+2F8F  U+884C  1  -
145.  U+2F90  U+8863  1  -
146.  U+2F91  U+897E  2  -
147.  U+2F92  U+898B  1  -
148.  U+2F93  U+89D2  1  -
149.  U+2F94  U+8A00  1  -
150.  U+2F95  U+8C37  1  -
151.  U+2F96  U+8C46  1  -
152.  U+2F97  U+8C55  1  -
153.  U+2F98  U+8C78  2  -
154.  U+2F99  U+8C9D  1  -
155.  U+2F9A  U+8D64  1  -
156.  U+2F9B  U+8D70  1  -
157.  U+2F9C  U+8DB3  1  -
158.  U+2F9D  U+8EAB  1  -
159.  U+2F9E  U+8ECA  1  -
160.  U+2F9F  U+8F9B  1  -
161.  U+2FA0  U+8FB0  1  -
162.  U+2FA1  U+8FB5  3  d6
163.  U+2FA2  U+9091  1  -
164.  U+2FA3  U+9149  1  -
165.  U+2FA4  U+91C6  1  -
166.  U+2FA5  U+91CC  1  -
167.  U+2FA6  U+91D1  1  -
168.  U+2FA7  U+9577  1  -
169.  U+2FA8  U+9580  1  -
170.  U+2FA9  U+961C  1  -
171.  U+2FAA  U+96B6  3  d7
172.  U+2FAB  U+96B9  1  -
173.  U+2FAC  U+96E8  1  -
174.  U+2FAD  U+9751  3  -
175.  U+2FAE  U+975E  1  -
176.  U+2FAF  U+9762  1  -
177.  U+2FB0  U+9769  1  -
178.  U+2FB1  U+97CB  1  -
179.  U+2FB2  U+97ED  1  -
180.  U+2FB3  U+97F3  1  -
181.  U+2FB4  U+9801  1  -
182.  U+2FB5  U+98A8  1  -
183.  U+2FB6  U+98DB  1  -
184.  U+2FB7  U+98DF  1  -
185.  U+2FB8  U+9996  1  -
186.  U+2FB9  U+9999  1  -
187.  U+2FBA  U+99AC  1  -
188.  U+2FBB  U+9AA8  1  -
189.  U+2FBC  U+9AD8  1  -
190.  U+2FBD  U+9ADF  2  -
191.  U+2FBE  U+9B25  1  -
192.  U+2FBF  U+9B2F  2  -
193.  U+2FC0  U+9B32  1  -
194.  U+2FC1  U+9B3C  1  -
195.  U+2FC2  U+9B5A  1  -
196.  U+2FC3  U+9CE5  1  -
197.  U+2FC4  U+9E75  1  -
198.  U+2FC5  U+9E7F  1  -
199.  U+2FC6  U+9EA5  1  -
200.  U+2FC7  U+9EBB  1  -
201.  U+2FC8  U+9EC3  1  -
202.  U+2FC9  U+9ECD  1  -
203.  U+2FCA  U+9ED1  1  -
204.  U+2FCB  U+9EF9  2  -
205.  U+2FCC  U+9EFD  2  -
206.  U+2FCD  U+9F0E  1  -
207.  U+2FCE  U+9F13  1  -
208.  U+2FCF  U+9F20  1  -
209.  U+2FD0  U+9F3B  1  -
210.  U+2FD1  U+9F4A  1  -
211.  U+2FD2  U+9F52  1  -
212.  U+2FD3  U+9F8D  1  -
213.  U+2FD4  U+9F9C  1  -
214.  U+2FD5  U+9FA0  2  -

Safari EUC substitutes U+9752 青 for U+9751 靑 as radical no. 174. This is actually a better match for the glyph in ISO-IR 171 and Lunde.

Firefox maps to Unicode’s Kangxi Radicals block.

In Opera, all 213 radicals are missing. In Safari’s ISO 2022 encoding, 210 radicals are missing; only the 3 radicals that are in Unihan (and nowhere else in CNS 11643) are included.

Internet Explorer includes the 25 E-Ten radicals (although no. 35 ⼢ is mapped to no. 34 ⼡), which means that 189 of the 213 radicals are missing.

* * *

In Internet Explorer (Big 5 and DEC, possibly E-Ten), the 33 Control Pictures U+2400–U+241F and U+2421 in Column 34 are replaced by real control characters 0x00–0x1F and 0x7F, which turn into question marks when they appear in HTML (as opposed to plain text). These characters are missing from Safari (Big 5) as well.
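The substitution described above amounts to a fixed offset between the Control Pictures block and the C0 control codes, plus one special case for DEL. A minimal sketch (the function name is illustrative):

```python
def control_picture_to_control(ch: str) -> str:
    """Return the control character that a Control Picture stands for.

    U+2400-U+241F picture 0x00-0x1F, and U+2421 pictures DEL (0x7F).
    Any other character is returned unchanged.
    """
    cp = ord(ch)
    if 0x2400 <= cp <= 0x241F:
        return chr(cp - 0x2400)
    if cp == 0x2421:
        return "\x7f"
    return ch


# The 33 pictures mentioned in the text: U+2400-U+241F plus U+2421.
pictures = [chr(cp) for cp in range(0x2400, 0x2420)] + ["\u2421"]
print(len(pictures))  # 33
```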

For the hanzi in Columns 36–93, all browsers follow Unihan almost perfectly. Internet Explorer however substitutes U+5F5D 彝 for U+5F5E 彞, thus making it Big5-compatible.

All four browsers include the euro symbol at the end of the symbols range in Big 5.

第二字面 Plane 2

Ref.: ISO-IR 172 Chinese Standard Interchange Code (CSIC) — Set 2.

7,650 hanzi in Columns 1–82.

Unihan T2 = CNS1992-2 = CNS1986-2 (PDF).

Unihan BigFive (level 2) contains the same hanzi as well as two duplicates, viz, U+FA0C 兀 in addition to U+5140 兀 and U+FA0D 嗀 in addition to U+55C0 嗀 (PDF).
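The two duplicates are CJK compatibility ideographs, so Unicode normalisation folds them back onto the unified characters. The sketch below relies on the canonical decompositions shipped with Python's `unicodedata` module; the helper name is illustrative.

```python
import unicodedata


def unify(ch: str) -> str:
    """Fold a CJK compatibility ideograph onto its unified counterpart (NFC)."""
    return unicodedata.normalize("NFC", ch)


for duplicate in ("\uFA0C", "\uFA0D"):
    unified = unify(duplicate)
    print(f"U+{ord(duplicate):04X} -> U+{ord(unified):04X}")
```

A consequence is that a Big 5 round trip through normalised Unicode text cannot preserve the distinction between the duplicate and the regular code.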

第三字面 Plane 3

Ref.: ISO-IR 183 Chinese Standard Interchange Code — Set 3.

6,148 hanzi in Columns 1–66.

Unihan T3 covers all but 1, 6,147 hanzi (PDF).

Unihan T3 additionally contains a number of hanzi in Columns 68–71 which are not part of the published CNS standard. Lunde refers to this as a ‘fictitious extension’ (PDF).

Safari’s ISO-2022 implementation is fairly complete (and also includes the fictitious extension, which is perhaps not a good idea), whereas over a third of the characters are missing from the ISO-2022 and EUC implementations in Firefox (which exclude the fictitious extension). Neither Opera nor Internet Explorer implements Planes 3–7. Opera however implements Plane 14 (see below).

第十四字面 Plane 14

The 1986 version of the standard had a Plane 14 which was mostly identical to Plane 3 in the 1992 version, the only difference being the additional 171 hanzi later assigned to Plane 4, which gives a total of 6,319 hanzi. Lunde provides the mapping from Plane 14 to Plane 4 for these additional characters, which, in combination with Unihan T4, enables us to make a character chart for 170 of these (PDF — Plane 14 extension), the last one being mapped to a Plane 4 character missing from Unihan.

Unihan CNS1992-3 = CNS1986-E is an incomplete subset of Plane 3 with Plane 14 and fictitious extensions. This collection of characters is of little interest except that it seems to form the basis for Opera’s implementation of Plane 14. Around one third of the characters are missing.

第四字面 Plane 4

Ref.: ISO-IR 184 Chinese Standard Interchange Code — Set 4.

7,298 hanzi in Columns 1–78.

Unihan T4 only includes 7,286 hanzi, which means that 12 are missing (PDF).

Nearly half of the characters are missing in Safari, and almost nine tenths are missing in Firefox.

第五字面 Plane 5

Ref.: ISO-IR 185 Chinese Standard Interchange Code — Set 5.

8,603 hanzi in Columns 1–92.

Unihan T5 enumerates 8,601 hanzi; 2 are missing (PDF).

Safari implements around five per cent, Firefox less than one per cent of the characters.

第六字面 Plane 6

Ref.: ISO-IR 186 Chinese Standard Interchange Code — Set 6.

6,388 hanzi in Columns 1–68.

Unihan T6 covers 6,386 of these; 2 are missing (PDF).

Safari implements under four per cent, Firefox just over four per mille of the characters.

第七字面 Plane 7

Ref.: ISO-IR 187 Chinese Standard Interchange Code — Set 7.

6,539 hanzi in Columns 1–70.

Unihan T7 includes 6,537 hanzi; again, 2 are missing (PDF).

Safari implements around two and a half per cent, Firefox around two and a half per mille of the characters.
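As a sanity check on the hanzi counts and column ranges quoted for Planes 1–7, the following sketch verifies that each range is exactly as long as the count requires at 94 cells per column. It assumes that the hanzi fill their columns contiguously, which the text does not state explicitly.

```python
import math

CELLS_PER_COLUMN = 94

# (plane, first column, last column, hanzi count) as quoted in the text
planes = [
    (1, 36, 93, 5401),
    (2, 1, 82, 7650),
    (3, 1, 66, 6148),
    (4, 1, 78, 7298),
    (5, 1, 92, 8603),
    (6, 1, 68, 6388),
    (7, 1, 70, 6539),
]


def columns_needed(count: int) -> int:
    """Smallest number of 94-cell columns that can hold `count` characters."""
    return math.ceil(count / CELLS_PER_COLUMN)


for plane, first, last, count in planes:
    used = last - first + 1
    print(plane, used, columns_needed(count), used == columns_needed(count))
```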

Further planes

The current version of the standard furthermore includes Planes 10, 11, 12, 13, 14 (unrelated to the old Plane 14 described above) and 15 (already present in the 1986 version; certain characters seem to have been moved to other planes). We are not aware of any support for these planes in any browser.

Common extensions

E-Ten 1 contains 365 characters added at the end of Plane 1 in Big 5: the 30 numerals from CNS 11643 Plane 1 Column 6, 25 radicals (see table above), 169 Japanese hiragana/katakana, 66 Cyrillic letters, 40 E-Ten input codes (not included in the PDF or in any browser implementation) and 35 hanzi and symbols (PDF). 29 of the radicals/hanzi can be found in Unihan H.

This extension is missing from IE as well as from Safari’s Big 5 (non-HKSCS) implementation. Differences between implementations are summarised below for plain Big 5 (B) as well as Big 5 with HKSCS extensions (H).

Safari (H), Firefox (B, H), Opera (B, H)
丶 4e36⼂ 2f02
丿 4e3f⼃ 2f03
亅 4e85⼅ 2f05
亠 4ea0⼇ 2f07
冂 5182⼌ 2f0c
冖 5196⼍ 2f0d
冫 51ab⼎ 2f0e
勹 52f9⼓ 2f13
匸 5338⼖ 2f16
卩 5369⼙ 2f19
厶 53b6⼛ 2f1b
夊 590a⼢ 2f22
宀 5b80⼧ 2f27
巛 5ddb⼮ 2f2e
幺 5e7a⼳ 2f33⼳ 2f33⼳ 2f33
广 5e7f⼴ 2f34
廴 5ef4� fffd⼵ 2f35� fffd
彐 5f50⼹ 2f39
彡 5f61⼺ 2f3a
攴 6534⽁ 2f41
无 65e0� fffd⽆ 2f46� fffd
疒 7592⽧ 2f67
癶 7676� fffd⽨ 2f68� fffd
辵 8fb5⾡ 2fa1
隶 96b6� fffd⾪ 2faa� fffd
ˆ 2c6^ ff3e
〃 3003� fffd� fffd
仝 4edd� fffd� fffd
А 410� fffd
⋮ (+31)
Я 42f� fffd
а 430� fffd
⋮ (+31)
я 44f� fffd
⇧ 21e7� fffd
↸ 21b8� fffd
↹ 21b9� fffd
㇏ 31cf f7e5 f7e5� fffd
𠃌 200cc𠃌 d840 f7e6 f7e6� fffd𠃌 d840
乚 4e5a� fffd
𠂊 2008a𠂊 d840 f7e8 f7e8� fffd𠂊 d840
刂 5202� fffd
䒑 4491� fffd
龰 9fb0 f7eb f7eb� fffd
冈 5188� fffd
龱 9fb1 f7ed f7ed� fffd
𧘇 27607𧘇 d85d f7ee f7ee� fffd𧘇 d85d
¬ ffe2� fffd
¦ ffe4� fffd
' ff07� fffd
" ff02� fffd
㈱ 3231� fffd
№ 2116� fffd
℡ 2121� fffd
133611210
†) Included elsewhere in HKSCS; not listed here in the official table (4 characters).
ℎ) Mapped to the Unicode Kangxi Radicals block in the official HKSCS table (1 character). U+5E7A is already included elsewhere in HKSCS, just like the characters marked †.
ℏ) Missing from the official HKSCS table (2 characters).
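The values d840 and d85d in the table appear to be the high (leading) halves of UTF-16 surrogate pairs for the three characters outside the Basic Multilingual Plane; that reading is our assumption. The arithmetic can be sketched as follows (the function name is illustrative):

```python
def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


for cp in (0x200CC, 0x2008A, 0x27607):
    high, low = utf16_surrogates(cp)
    print(f"U+{cp:X}: {high:04x} {low:04x}")
```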

There is also a less common version of E-Ten 1 which fills in empty cells in CNS 11643 Plane 1. It does not include the numerals or radicals (which are already in CNS 11643 Plane 1 itself) but otherwise encodes an almost identical set of characters (PDF). We have no reference for this version, so there are probably some errors and missing characters in the PDF.

E-Ten 2 contains 41 characters added at the end of Plane 2 in Big 5: 7 hanzi and 34 line-drawing characters (PDF). The hanzi can be found in Unihan H. The four quarter-circles are supposed to have a double line (according to Lunde), but the corresponding characters appear to be missing from Unicode.

All four browsers include the E-Ten 2 extension. However, the official HKSCS table substitutes U+FFED for U+2593, and Safari and Opera follow this for their Big 5 HKSCS implementations.

Uncommon extensions

Less common extensions include Big 5 Plus, Big 5 E and Unicode-at-on (a version of which appears to have been implemented in Firefox). The lack of documentation, implementations or both makes it difficult to provide much useful information.

港 Hong Kong character sets

HKSCS

「香港增補字符集」 (Hong Kong Supplementary Character Set).
Previously:
「政府通用字庫」 (Government Chinese Character Set)

Ref.: HKSCS specification published by the Office of the Government Chief Information Officer, Hong Kong.

The Hong Kong Supplementary Character Set is an extension to Big 5 which encodes a number of hanzi needed in Hong Kong, as well as a few Latin letters and symbols, 5,009 characters in total.

Unihan H includes 4,543 hanzi (PDF) in addition to the ones found in E-Ten extensions as mentioned above. Safari, Firefox and Opera all implement around three fifths of these hanzi in accordance with Unihan.

HKSCS furthermore includes 66 additional hanzi and extended Latin letters in Column 8 (PDF). 17 of these are mapped to PUA characters in Firefox and Safari:

㇀ 31c0 f303
㇁ 31c1 f304
㇂ 31c2 f305
㇃ 31c3 f306
㇄ 31c4 f307
㇅ 31c5 f309
㇆ 31c6 f30c
㇇ 31c7 f30d
㇈ 31c8 f310
㇉ 31c9 f312
㇊ 31ca f313
㇋ 31cb f314
㇌ 31cc f315
㇍ 31cd f317
㇎ 31ce f318
⏚ 23da f34a
⏛ 23db f34b

The doubly accented E/e with circumflex and caron and E/e with circumflex and macron do not exist as precomposed characters in Unicode. No browser uses the correct character sequences.
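For illustration, the four sequences can be built from a precomposed E/e with circumflex followed by a combining macron or caron. That this is what is meant by the correct sequences is our assumption; the names below are illustrative.

```python
import unicodedata

BASES = {"E": "\u00CA", "e": "\u00EA"}            # E/e with circumflex
MARKS = {"macron": "\u0304", "caron": "\u030C"}   # combining marks


def doubly_accented(base: str, mark: str) -> str:
    """Return the NFC form of E/e with circumflex plus a second accent."""
    return unicodedata.normalize("NFC", BASES[base] + MARKS[mark])


for base in BASES:
    for mark in MARKS:
        seq = doubly_accented(base, mark)
        print(base, mark, " ".join(f"U+{ord(c):04X}" for c in seq))
```

Each result stays two code points long after normalisation, which reflects the absence of precomposed characters.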

40 additional radicals and phonetic letters have been added at the end of the E-Ten 1 extension (PDF). All but 6 of the 366 characters in the two E-Ten extensions are included as well (see above for details).

84 hanzi included in previous versions of HKSCS have been unified with characters found elsewhere in the extension or (more often) in Big 5 itself, i.e., in CNS 11643 Planes 1 and 2 (PDF). For 22 hanzi included in previous versions of the standard, no Unicode mapping is provided. These are currently characterised as ‘non verifiable’. Safari does not implement these compatibility mappings.

IE (default version, Western locale) shows no evidence of implementing HKSCS.

ISO-2022 encoding

PDF. MIME charset labels: iso-2022-cn-ext (Safari and Firefox), iso-2022-cn (Safari, Firefox and Opera; limited number of character sets in Firefox and Opera).

Note: This encoding includes Latin and Simplified Chinese as well. Only Traditional Chinese character sets are mentioned on this page.

CNS 11643

The designator sequence ESC $ ) G selects Plane 1 as G1, which can be invoked by shift out (SO, 0x0E); shift in (SI, 0x0F) switches back to G0, which always encodes ISO646-US. (iso-2022-cn-ext), (iso-2022-cn).

The designator sequence ESC $ * H selects Plane 2 as G2, which can be invoked (for the following character only) by single shift 2 (SS2, ESC N). (iso-2022-cn-ext), (iso-2022-cn).

The designator sequence ESC $ + I–M selects one of Planes 3–7 as G3, which can be invoked (for the following character only) by single shift 3 (SS3, ESC O).
Plane 3: iso-2022-cn-ext, iso-2022-cn.
Plane 4: iso-2022-cn-ext, iso-2022-cn.
Plane 5: iso-2022-cn-ext, iso-2022-cn.
Plane 6: iso-2022-cn-ext, iso-2022-cn.
Plane 7: iso-2022-cn-ext, iso-2022-cn.
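The designator and shift sequences described above can be sketched as a toy encoder for a single character. It takes a plane and a 1-based column and cell, always emits the designator, and ignores the rules a real encoder must follow about when designations need to be repeated. The function name is illustrative, and Python's standard library does not, to our knowledge, ship an ISO-2022-CN codec to compare against.

```python
ESC, SO, SI = b"\x1b", b"\x0e", b"\x0f"


def encode_cns_iso2022(plane: int, column: int, cell: int) -> bytes:
    """Encode one CNS 11643 character (1-based column/cell) in isolation."""
    if not (1 <= column <= 94 and 1 <= cell <= 94):
        raise ValueError("column and cell must be in 1..94")
    code = bytes([0x20 + column, 0x20 + cell])    # 7-bit byte pair
    if plane == 1:
        # Designate Plane 1 as G1, shift out, emit, shift back in to G0.
        return ESC + b"$)G" + SO + code + SI
    if plane == 2:
        # Designate Plane 2 as G2; SS2 (ESC N) covers the next character only.
        return ESC + b"$*H" + ESC + b"N" + code
    if 3 <= plane <= 7:
        final = bytes([ord("I") + plane - 3])     # I..M for Planes 3..7
        # Designate as G3; SS3 (ESC O) covers the next character only.
        return ESC + b"$+" + final + ESC + b"O" + code
    raise ValueError("plane not covered by this sketch")


print(encode_cns_iso2022(1, 36, 1))
```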

EUC encoding

PDF. MIME charset labels: euc-tw (Safari and Opera), x-euc-tw (Firefox).

ISO646-US (ASCII)

Code set 0 (7-bit characters) encodes ISO646-US (euc-tw), (x-euc-tw).

CNS 11643

Code set 1 (unprefixed 8-bit characters) encodes Plane 1: euc-tw, x-euc-tw.

Code set 2 (8-bit characters prefixed by two bytes: SS2, i.e. 0x8E, followed by 0xAn) encodes Plane n.
Plane 1: euc-tw, x-euc-tw.
Plane 2: euc-tw, x-euc-tw.
Plane 3: euc-tw, x-euc-tw.
Plane 14: euc-tw, x-euc-tw.
Plane 4: euc-tw, x-euc-tw.
Plane 5: euc-tw, x-euc-tw.
Plane 6: euc-tw, x-euc-tw.
Plane 7: euc-tw, x-euc-tw.
Plane 15: euc-tw, x-euc-tw.
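The two code sets can be sketched as follows, again for a plane plus a 1-based column and cell. The limit of 15 planes reflects the 0xAn notation and the planes listed above, not necessarily the full range permitted by EUC; the function and parameter names are illustrative.

```python
SS2 = 0x8E


def encode_euc_tw(plane: int, column: int, cell: int,
                  force_code_set_2: bool = False) -> bytes:
    """Encode one CNS 11643 character in EUC (code set 1 or 2)."""
    if not (1 <= column <= 94 and 1 <= cell <= 94):
        raise ValueError("column and cell must be in 1..94")
    pair = bytes([0xA0 + column, 0xA0 + cell])    # 8-bit byte pair
    if plane == 1 and not force_code_set_2:
        return pair                               # code set 1: no prefix
    if not 1 <= plane <= 15:
        raise ValueError("plane not covered by this sketch")
    return bytes([SS2, 0xA0 + plane]) + pair      # code set 2: SS2, 0xAn


print(encode_euc_tw(1, 36, 1).hex())        # c4a1
print(encode_euc_tw(2, 1, 1).hex())         # 8ea2a1a1
print(encode_euc_tw(1, 36, 1, True).hex())  # 8ea1c4a1
```

Plane 1 therefore has two possible encodings, which is why it appears under both code sets in the lists above.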

DEC Hanyu encoding

PDF. MIME charset labels: x-chinese-cns (IE).

ISO646-US (ASCII) etc.

The one-byte range encodes ISO646-US. The Internet Explorer implementation also includes characters from Windows-1252 in columns 8 and 9.

CNS 11643

Plane 1 is encoded just like in EUC. Plane 2 is encoded as an 8-bit byte (the same as in EUC) followed by a 7-bit byte (the same as in ISO 2022). The IE implementation includes the E-Ten 1 (CNS) extension (test, 2nd byte ‘<’ not handled properly).
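A minimal sketch of the two-byte forms just described, limited to Planes 1 and 2 (the function name is illustrative):

```python
def encode_dec_hanyu(plane: int, column: int, cell: int) -> bytes:
    """Encode a Plane 1 or Plane 2 character as described in the text."""
    if not (1 <= column <= 94 and 1 <= cell <= 94):
        raise ValueError("column and cell must be in 1..94")
    if plane == 1:
        return bytes([0xA0 + column, 0xA0 + cell])   # both bytes 8-bit, as in EUC
    if plane == 2:
        return bytes([0xA0 + column, 0x20 + cell])   # 8-bit lead, 7-bit trail
    raise ValueError("only Planes 1 and 2 are covered by this sketch")


print(encode_dec_hanyu(2, 1, 28))  # the trail byte is 0x3C, i.e. '<'
```

Because the second byte of a Plane 2 character is a 7-bit value, it can coincide with markup-significant ASCII characters such as ‘<’.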

DEC Hanyu encodes Planes 3 and 4 with the prefix 0xC2 0xCB. The IE implementation does not seem to support this, however.

E-Ten encoding

PDF. MIME charset label: x-chinese-eten (IE).

ISO646-US (ASCII)

The one-byte range encodes ISO646-US.

CNS 11643

The two-byte range encodes Planes 1 and 2. The IE implementation includes the E-Ten 1 (CNS) extension.

Big5 encoding

PDF. MIME charset labels: big5, big5-hkscs.

ISO646-US (ASCII)

The one-byte range encodes ISO646-US: big5, big5-hkscs.

CNS 11643

The two-byte range encodes Planes 1 and 2 with E-Ten 1 and 2 extensions as detailed above: big5.

Given the appropriate MIME label, HKSCS extensions are included as well: big5-hkscs.
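For quick experiments outside a browser, Python's built-in `big5` and `big5hkscs` codecs can be used. Their tables are not necessarily identical to any of the browser implementations compared on this page, so the sketch below is only a way to probe individual characters. U+35CE is used merely as an example of a hanzi outside CNS 11643 Planes 1 and 2; the helper name is illustrative.

```python
def try_encode(text: str, codec: str):
    """Return the encoded bytes, or None if the codec cannot encode the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


print(try_encode("一", "big5"))       # b'\xa4@'
print(b"\xa4\x40".decode("big5"))    # 一

# Whether this character encodes depends on the codec's tables.
for codec in ("big5", "big5hkscs"):
    print(codec, try_encode("\u35ce", codec))
```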

Contact

temp-ock1@coq.no