Although it is not historically accurate (Big 5 predates CNS 11643), we treat Big 5 here as an encoding of CNS 11643 Planes 1 and 2. The correspondence between the two is not entirely straightforward, however, partly because the planes in Big 5 have 157 cells whereas the ones in CNS 11643 have the usual 94, and partly because certain characters appear out of order in Big 5 (owing to incorrect stroke counts), are missing, or appear more than once. Separate character tables are therefore provided for Big 5 to avoid a complex reordering/mapping scheme.
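The 157-cell figure can be illustrated with a small sketch. It assumes the trail-byte ranges usually quoted for Big 5 (0x40–0x7E and 0xA1–0xFE) and a first lead byte of 0xA1; the function names are hypothetical, and the resulting index is only a position within Big 5, not a CNS 11643 position.

```python
# Assumed Big 5 trail-byte ranges: 0x40-0x7E (63 cells) and 0xA1-0xFE (94 cells).
TRAIL_LOW = range(0x40, 0x7F)
TRAIL_HIGH = range(0xA1, 0xFF)
CELLS_PER_LEAD = len(TRAIL_LOW) + len(TRAIL_HIGH)  # 63 + 94 = 157


def big5_cell(trail):
    """Return the 0-based cell (0..156) addressed by a Big 5 trail byte."""
    if trail in TRAIL_LOW:
        return trail - 0x40
    if trail in TRAIL_HIGH:
        return trail - 0xA1 + len(TRAIL_LOW)
    raise ValueError(f"not a Big 5 trail byte: {trail:#04x}")


def big5_index(lead, trail, first_lead=0xA1):
    """Linear index of a two-byte code, counted from the first cell of first_lead."""
    if not first_lead <= lead <= 0xFE:
        raise ValueError(f"lead byte out of range: {lead:#04x}")
    return (lead - first_lead) * CELLS_PER_LEAD + big5_cell(trail)
```

Because 157 is not a multiple of 94, such an index does not line up with the 94-cell grid of CNS 11643 without further bookkeeping, which is one reason separate tables are simpler.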
「中文標準交換碼」 (Chinese Standard Interchange Code).
Previously:
「國家中文標準交換碼」 (National Chinese Standard Interchange Code)
This very large character set has its own governmental web site. The official standard can be obtained from the CNS Online Service (complete low-resolution preview available).
Only the first two planes are widely supported.
Ref.: ISO-IR 171 Chinese Standard Interchange Code (CSIC) — Set 1.
5,401 hanzi (Chinese characters or 漢字) in Columns 36–93; 684 miscellaneous characters in Columns 1–9 and 34.
Unihan (統漢字) T1 includes 6 hanzi in Column 2 (viz, Cells 89 兙, 90 兛, 91 兞, 92 兝, 93 兡 and 94 兣), 3 hanzi in Column 3 (viz, Cells 1 嗧, 2 瓩 and 3 糎) and 3 radicals in Column 7 (viz, Cells 8 亠, 15 冫 and 20 勹), which gives 5,413 hanzi in total (PDF — hanzi).
Apart from the 12 hanzi included in Unihan T1, columns 1–9 and 34 contain 672 characters (PDF — non-hanzi).
Unihan CNS1992-1 = CNS1986-1 includes the same 6 hanzi in Column 2 and 3 hanzi in Column 3 as well as 1 hanzi in Column 4 (viz, Cell 31 卄), 5,411 hanzi in total (PDF). This gives 674 characters outside of Unihan, but the full set of characters remains the same.
Unihan BigFive (level 1) contains the same hanzi as CNS1992-1 except that U+5F5D 彝 replaces U+5F5E 彞 (PDF).
BigFive itself does not include the 30 numerals in Column 6 or the 213 radicals in Columns 7–9, which gives a set of 431 non-hanzi (PDF — non-hanzi). Many of the missing characters are included in common extensions.
* * *
A certain number of the characters in Columns 1–6 have different Unicode mappings in different implementations. It is not uncommon that more than one Unicode character matches the pictorial representation in the standard, so this is not entirely unexpected. The leftmost column shows the Unicode mapping used for Big5 in all four browsers (except encircled numerals, which are not included in Big5 but have only one possible Unicode mapping [each]). The following table shows the differences:
| IE | Safari | Firefox | Opera | ||
|---|---|---|---|---|---|
| 2022 | EUC | ||||
| 3000 | 20 | ||||
| , ff0c | , 2c | ||||
| . ff0e | ․ 2024 | . 2e | |||
| ‧ 2027 | • 2022 | · b7 | ・ 30fb | ・ 30fb | |
| ; ff1b | ; 3b | ||||
| : ff1a | : 3a | ||||
| ? ff1f | ? 3f | ||||
| ! ff01 | ! 21 | ||||
| ﹒ fe52 | ‧ 2027 | ||||
| · b7 | ﹒ fe52 | ‧ 2027 | |||
| | ff5c | ︱ fe31 | � fffd | ︱ fe31 | ︱ fe31 | ︱ fe31 |
| – 2013 | � fffd | — 2014 | — 2014 | — 2014 | |
| ︱ fe31 | ︲ fe32 | � fffd | ︲ fe32 | ︲ fe32 | ︲ fe32 |
| — 2014 | � fffd | ﹘ fe58 | – 2013 | – 2013 | |
| ︳ fe33 | � fffd | � fffd | |||
| ╴ 2574 | _ ff3f | � fffd | � fffd | ||
| ︴ fe34 | � fffd | � fffd | |||
| ﹏ fe4f | ﹋ fe4b | � fffd | |||
| ( ff08 | ( 28 | ||||
| ) ff09 | ) 29 | ||||
| { ff5b | { 7b | ||||
| } ff5d | } 7d | ||||
| ‵ 2035 | ′ 2032 | ′ 2032 | ′ 2032 | ||
| ′ 2032 | ‵ 2035 | ‵ 2035 | ‵ 2035 | ||
| # ff03 | # 23 | ||||
| & ff06 | & 26 | ||||
| * ff0a | ✳ 2733 | ||||
| ¯ af | ‾ 203e | � fffd | ‾ 203e | ‾ 203e | ‾ 203e |
|  ̄ ffe3 | � fffd | � fffd | |||
| _ ff3f | � fffd | ||||
| ˍ 2cd | _ ff3f | _ 5f | � fffd | ||
| ﹋ fe4b | � fffd | ||||
| ﹌ fe4c | � fffd | ||||
| + ff0b | + 2b | ||||
| - ff0d | - 2d | ||||
| < ff1c | < 3c | ||||
| > ff1e | > 3e | ||||
| = ff1d | = 3d | ||||
| ﹥ fe65 | ﹦ fe66 | ﹦ fe66 | |||
| ﹦ fe66 | ﹥ fe65 | ﹥ fe65 | |||
| ~ ff5e | ∼ 223c | 〜 301c | ∼ 223c | ∼ 223c | ∼ 223c |
| ⊕ 2295 | ♁ 2641 | ♁ 2641 | |||
| ⊙ 2299 | ☉ 2609 | ☉ 2609 | |||
| ∥ 2225 | ‖ 2016 | ‖ 2016 | ‖ 2016 | ||
| ∣ 2223 | | 7c | | ff5c | | ff5c | | ff5c | |
| / ff0f | ⁄ 2044 | ||||
| \ ff3c | \ 5c | ||||
| ∕ 2215 | / ff0f | / 2f | |||
| $ ff04 | $ 24 | ||||
| ¥ ffe5 | ¥ a5 | ||||
| ¢ ffe0 | ¢ a2 | ||||
| £ ffe1 | £ a3 | ||||
| % ff05 | % 25 | ||||
| @ ff20 | @ 40 | ||||
| ° b0 | ゜ 309c | ||||
| 0 ff10 | 0 30 | ||||
| ⋮ | ⋮ (+8) | ||||
| 9 ff19 | 9 39 | ||||
| 十 5341 | � fffd | 〸 3038 | � fffd | ||
| 卄 5344 | � fffd | 〹 3039 | |||
| 卅 5345 | � fffd | 〺 303a | � fffd | ||
| A ff21 | A 41 | ||||
| ⋮ | ⋮ (+24) | ||||
| Z ff3a | Z 5a | ||||
| a ff41 | a 61 | ||||
| ⋮ | ⋮ (+24) | ||||
| z ff5a | z 7a | ||||
| ˉ 2c9 | ˟ 2df | 2003 | |||
| ④ 2463 | ⑤ 2464 | ||||
| ⋮ | ⋮ (+5) | ||||
| ⑩ 2469 | ⑪ 246a | ||||
| 15 | 120 | 7 | 18 | 23 | |
Most of the alternative mappings are reasonable; a few are clearly at odds with the published standard, but may be more common in practice; some are wrong for no apparent reason.
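Many rows in the table above differ only in that one implementation uses a fullwidth form (U+FF01–U+FF5E, or U+3000 for the space) where another uses the ASCII character. A minimal sketch of that relationship follows; the function name is hypothetical, and the fixed offset 0xFEE0 applies only to this fullwidth range, not to the other substitutions in the table.

```python
FULLWIDTH_OFFSET = 0xFEE0  # U+FF01..U+FF5E mirror U+0021..U+007E


def to_ascii_variant(ch):
    """Return the ASCII counterpart of a fullwidth character, else ch unchanged."""
    cp = ord(ch)
    if 0xFF01 <= cp <= 0xFF5E:
        return chr(cp - FULLWIDTH_OFFSET)
    if cp == 0x3000:  # ideographic space
        return " "
    return ch


# A few pairs taken from the table: (fullwidth mapping, ASCII mapping).
for wide, narrow in [(0xFF0C, 0x2C), (0xFF1B, 0x3B), (0xFF21, 0x41), (0xFF5A, 0x7A)]:
    assert to_ascii_variant(chr(wide)) == chr(narrow)
```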
* * *
Columns 7–9 enumerate 213 of the 214 Kangxi radicals (no. 34 ⼡, marked with an asterisk below, is missing because official Taiwanese sources have unified it with the similar-looking no. 35 ⼢). 166 of the radicals appear as ‘normal’ hanzi elsewhere in Plane 1, 20 appear in Plane 2, and 25 (including no. 34) appear in Plane 3, whereas three radicals are not encoded anywhere else (nos. 8, 15 and 20). (Note: Lunde’s numbers are slightly different, probably because he maps radical no. 174 to U+9752 青 in Plane 1 instead of U+9751 靑 in Plane 3.)
Big 5 encoding only includes Planes 1 and 2, but 22 radicals from Plane 3 as well as the ones not encoded elsewhere are included in a very common E-Ten extension. 3 Kangxi radicals from Plane 3 are still missing (nos. 2, 34 and 174).
Unicode encodes Kangxi radicals twice: both in a dedicated radicals block and in the standard ideographic block.
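The link between the two Unicode encodings can be checked with Python's standard `unicodedata` module, since characters in the Kangxi Radicals block carry a compatibility decomposition to a unified ideograph. This is a sketch: the helper name is hypothetical, and the result depends on the Unicode data shipped with the Python build.

```python
import unicodedata


def radical_to_unified(radical):
    """Map a Kangxi Radicals character (U+2F00..U+2FD5) to its unified ideograph."""
    cp = ord(radical)
    if not 0x2F00 <= cp <= 0x2FD5:
        raise ValueError(f"not in the Kangxi Radicals block: U+{cp:04X}")
    # NFKC applies the compatibility decomposition recorded in the Unicode data.
    return unicodedata.normalize("NFKC", radical)


# Radical no. 1 and radical no. 214, as listed in the table.
assert radical_to_unified("\u2f00") == "\u4e00"
assert radical_to_unified("\u2fd5") == "\u9fa0"
```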
The table below gives further details:
| Unicode | CNS | E-Ten | |||
|---|---|---|---|---|---|
| Kangxi radicals | Unified ideographs | 11643 Plane | Column 38 Cell | ||
| 1. | ⼀ | U+2F00 | U+4E00 | 1 | |
| 2. | ⼁ | U+2F01 | U+4E28 | 3 | |
| 3. | ⼂ | U+2F02 | U+4E36 | 3 | bf |
| 4. | ⼃ | U+2F03 | U+4E3F | 3 | c0 |
| 5. | ⼄ | U+2F04 | U+4E59 | 1 | |
| 6. | ⼅ | U+2F05 | U+4E85 | 3 | c1 |
| 7. | ⼆ | U+2F06 | U+4E8C | 1 | |
| 8. | ⼇ | U+2F07 | U+4EA0 | — | c2 |
| 9. | ⼈ | U+2F08 | U+4EBA | 1 | |
| 10. | ⼉ | U+2F09 | U+513F | 1 | |
| 11. | ⼊ | U+2F0A | U+5165 | 1 | |
| 12. | ⼋ | U+2F0B | U+516B | 1 | |
| 13. | ⼌ | U+2F0C | U+5182 | 3 | c3 |
| 14. | ⼍ | U+2F0D | U+5196 | 3 | c4 |
| 15. | ⼎ | U+2F0E | U+51AB | — | c5 |
| 16. | ⼏ | U+2F0F | U+51E0 | 1 | |
| 17. | ⼐ | U+2F10 | U+51F5 | 2 | |
| 18. | ⼑ | U+2F11 | U+5200 | 1 | |
| 19. | ⼒ | U+2F12 | U+529B | 1 | |
| 20. | ⼓ | U+2F13 | U+52F9 | — | c6 |
| 21. | ⼔ | U+2F14 | U+5315 | 1 | |
| 22. | ⼕ | U+2F15 | U+531A | 2 | |
| 23. | ⼖ | U+2F16 | U+5338 | 3 | c7 |
| 24. | ⼗ | U+2F17 | U+5341 | 1 | |
| 25. | ⼘ | U+2F18 | U+535C | 1 | |
| 26. | ⼙ | U+2F19 | U+5369 | 3 | c8 |
| 27. | ⼚ | U+2F1A | U+5382 | 2 | |
| 28. | ⼛ | U+2F1B | U+53B6 | 3 | c9 |
| 29. | ⼜ | U+2F1C | U+53C8 | 1 | |
| 30. | ⼝ | U+2F1D | U+53E3 | 1 | |
| 31. | ⼞ | U+2F1E | U+56D7 | 2 | |
| 32. | ⼟ | U+2F1F | U+571F | 1 | |
| 33. | ⼠ | U+2F20 | U+58EB | 1 | |
| 34. | ⼡* | U+2F21 | U+5902 | 3 | |
| 35. | ⼢ | U+2F22 | U+590A | 3 | ca |
| 36. | ⼣ | U+2F23 | U+5915 | 1 | |
| 37. | ⼤ | U+2F24 | U+5927 | 1 | |
| 38. | ⼥ | U+2F25 | U+5973 | 1 | |
| 39. | ⼦ | U+2F26 | U+5B50 | 1 | |
| 40. | ⼧ | U+2F27 | U+5B80 | 3 | cb |
| 41. | ⼨ | U+2F28 | U+5BF8 | 1 | |
| 42. | ⼩ | U+2F29 | U+5C0F | 1 | |
| 43. | ⼪ | U+2F2A | U+5C22 | 1 | |
| 44. | ⼫ | U+2F2B | U+5C38 | 1 | |
| 45. | ⼬ | U+2F2C | U+5C6E | 2 | |
| 46. | ⼭ | U+2F2D | U+5C71 | 1 | |
| 47. | ⼮ | U+2F2E | U+5DDB | 3 | cc |
| 48. | ⼯ | U+2F2F | U+5DE5 | 1 | |
| 49. | ⼰ | U+2F30 | U+5DF1 | 1 | |
| 50. | ⼱ | U+2F31 | U+5DFE | 1 | |
| 51. | ⼲ | U+2F32 | U+5E72 | 1 | |
| 52. | ⼳ | U+2F33 | U+5E7A | 3 | cd |
| 53. | ⼴ | U+2F34 | U+5E7F | 3 | ce |
| 54. | ⼵ | U+2F35 | U+5EF4 | 3 | cf |
| 55. | ⼶ | U+2F36 | U+5EFE | 1 | |
| 56. | ⼷ | U+2F37 | U+5F0B | 1 | |
| 57. | ⼸ | U+2F38 | U+5F13 | 1 | |
| 58. | ⼹ | U+2F39 | U+5F50 | 3 | d0 |
| 59. | ⼺ | U+2F3A | U+5F61 | 3 | d1 |
| 60. | ⼻ | U+2F3B | U+5F73 | 2 | |
| 61. | ⼼ | U+2F3C | U+5FC3 | 1 | |
| 62. | ⼽ | U+2F3D | U+6208 | 1 | |
| 63. | ⼾ | U+2F3E | U+6236 | 1 | |
| 64. | ⼿ | U+2F3F | U+624B | 1 | |
| 65. | ⽀ | U+2F40 | U+652F | 1 | |
| 66. | ⽁ | U+2F41 | U+6534 | 3 | d2 |
| 67. | ⽂ | U+2F42 | U+6587 | 1 | |
| 68. | ⽃ | U+2F43 | U+6597 | 1 | |
| 69. | ⽄ | U+2F44 | U+65A4 | 1 | |
| 70. | ⽅ | U+2F45 | U+65B9 | 1 | |
| 71. | ⽆ | U+2F46 | U+65E0 | 3 | d3 |
| 72. | ⽇ | U+2F47 | U+65E5 | 1 | |
| 73. | ⽈ | U+2F48 | U+66F0 | 1 | |
| 74. | ⽉ | U+2F49 | U+6708 | 1 | |
| 75. | ⽊ | U+2F4A | U+6728 | 1 | |
| 76. | ⽋ | U+2F4B | U+6B20 | 1 | |
| 77. | ⽌ | U+2F4C | U+6B62 | 1 | |
| 78. | ⽍ | U+2F4D | U+6B79 | 1 | |
| 79. | ⽎ | U+2F4E | U+6BB3 | 2 | |
| 80. | ⽏ | U+2F4F | U+6BCB | 1 | |
| 81. | ⽐ | U+2F50 | U+6BD4 | 1 | |
| 82. | ⽑ | U+2F51 | U+6BDB | 1 | |
| 83. | ⽒ | U+2F52 | U+6C0F | 1 | |
| 84. | ⽓ | U+2F53 | U+6C14 | 2 | |
| 85. | ⽔ | U+2F54 | U+6C34 | 1 | |
| 86. | ⽕ | U+2F55 | U+706B | 1 | |
| 87. | ⽖ | U+2F56 | U+722A | 1 | |
| 88. | ⽗ | U+2F57 | U+7236 | 1 | |
| 89. | ⽘ | U+2F58 | U+723B | 1 | |
| 90. | ⽙ | U+2F59 | U+723F | 2 | |
| 91. | ⽚ | U+2F5A | U+7247 | 1 | |
| 92. | ⽛ | U+2F5B | U+7259 | 1 | |
| 93. | ⽜ | U+2F5C | U+725B | 1 | |
| 94. | ⽝ | U+2F5D | U+72AC | 1 | |
| 95. | ⽞ | U+2F5E | U+7384 | 1 | |
| 96. | ⽟ | U+2F5F | U+7389 | 1 | |
| 97. | ⽠ | U+2F60 | U+74DC | 1 | |
| 98. | ⽡ | U+2F61 | U+74E6 | 1 | |
| 99. | ⽢ | U+2F62 | U+7518 | 1 | |
| 100. | ⽣ | U+2F63 | U+751F | 1 | |
| 101. | ⽤ | U+2F64 | U+7528 | 1 | |
| 102. | ⽥ | U+2F65 | U+7530 | 1 | |
| 103. | ⽦ | U+2F66 | U+758B | 1 | |
| 104. | ⽧ | U+2F67 | U+7592 | 3 | d4 |
| 105. | ⽨ | U+2F68 | U+7676 | 3 | d5 |
| 106. | ⽩ | U+2F69 | U+767D | 1 | |
| 107. | ⽪ | U+2F6A | U+76AE | 1 | |
| 108. | ⽫ | U+2F6B | U+76BF | 1 | |
| 109. | ⽬ | U+2F6C | U+76EE | 1 | |
| 110. | ⽭ | U+2F6D | U+77DB | 1 | |
| 111. | ⽮ | U+2F6E | U+77E2 | 1 | |
| 112. | ⽯ | U+2F6F | U+77F3 | 1 | |
| 113. | ⽰ | U+2F70 | U+793A | 1 | |
| 114. | ⽱ | U+2F71 | U+79B8 | 2 | |
| 115. | ⽲ | U+2F72 | U+79BE | 1 | |
| 116. | ⽳ | U+2F73 | U+7A74 | 1 | |
| 117. | ⽴ | U+2F74 | U+7ACB | 1 | |
| 118. | ⽵ | U+2F75 | U+7AF9 | 1 | |
| 119. | ⽶ | U+2F76 | U+7C73 | 1 | |
| 120. | ⽷ | U+2F77 | U+7CF8 | 1 | |
| 121. | ⽸ | U+2F78 | U+7F36 | 1 | |
| 122. | ⽹ | U+2F79 | U+7F51 | 2 | |
| 123. | ⽺ | U+2F7A | U+7F8A | 1 | |
| 124. | ⽻ | U+2F7B | U+7FBD | 1 | |
| 125. | ⽼ | U+2F7C | U+8001 | 1 | |
| 126. | ⽽ | U+2F7D | U+800C | 1 | |
| 127. | ⽾ | U+2F7E | U+8012 | 1 | |
| 128. | ⽿ | U+2F7F | U+8033 | 1 | |
| 129. | ⾀ | U+2F80 | U+807F | 1 | |
| 130. | ⾁ | U+2F81 | U+8089 | 1 | |
| 131. | ⾂ | U+2F82 | U+81E3 | 1 | |
| 132. | ⾃ | U+2F83 | U+81EA | 1 | |
| 133. | ⾄ | U+2F84 | U+81F3 | 1 | |
| 134. | ⾅ | U+2F85 | U+81FC | 1 | |
| 135. | ⾆ | U+2F86 | U+820C | 1 | |
| 136. | ⾇ | U+2F87 | U+821B | 1 | |
| 137. | ⾈ | U+2F88 | U+821F | 1 | |
| 138. | ⾉ | U+2F89 | U+826E | 1 | |
| 139. | ⾊ | U+2F8A | U+8272 | 1 | |
| 140. | ⾋ | U+2F8B | U+8278 | 2 | |
| 141. | ⾌ | U+2F8C | U+864D | 2 | |
| 142. | ⾍ | U+2F8D | U+866B | 1 | |
| 143. | ⾎ | U+2F8E | U+8840 | 1 | |
| 144. | ⾏ | U+2F8F | U+884C | 1 | |
| 145. | ⾐ | U+2F90 | U+8863 | 1 | |
| 146. | ⾑ | U+2F91 | U+897E | 2 | |
| 147. | ⾒ | U+2F92 | U+898B | 1 | |
| 148. | ⾓ | U+2F93 | U+89D2 | 1 | |
| 149. | ⾔ | U+2F94 | U+8A00 | 1 | |
| 150. | ⾕ | U+2F95 | U+8C37 | 1 | |
| 151. | ⾖ | U+2F96 | U+8C46 | 1 | |
| 152. | ⾗ | U+2F97 | U+8C55 | 1 | |
| 153. | ⾘ | U+2F98 | U+8C78 | 2 | |
| 154. | ⾙ | U+2F99 | U+8C9D | 1 | |
| 155. | ⾚ | U+2F9A | U+8D64 | 1 | |
| 156. | ⾛ | U+2F9B | U+8D70 | 1 | |
| 157. | ⾜ | U+2F9C | U+8DB3 | 1 | |
| 158. | ⾝ | U+2F9D | U+8EAB | 1 | |
| 159. | ⾞ | U+2F9E | U+8ECA | 1 | |
| 160. | ⾟ | U+2F9F | U+8F9B | 1 | |
| 161. | ⾠ | U+2FA0 | U+8FB0 | 1 | |
| 162. | ⾡ | U+2FA1 | U+8FB5 | 3 | d6 |
| 163. | ⾢ | U+2FA2 | U+9091 | 1 | |
| 164. | ⾣ | U+2FA3 | U+9149 | 1 | |
| 165. | ⾤ | U+2FA4 | U+91C6 | 1 | |
| 166. | ⾥ | U+2FA5 | U+91CC | 1 | |
| 167. | ⾦ | U+2FA6 | U+91D1 | 1 | |
| 168. | ⾧ | U+2FA7 | U+9577 | 1 | |
| 169. | ⾨ | U+2FA8 | U+9580 | 1 | |
| 170. | ⾩ | U+2FA9 | U+961C | 1 | |
| 171. | ⾪ | U+2FAA | U+96B6 | 3 | d7 |
| 172. | ⾫ | U+2FAB | U+96B9 | 1 | |
| 173. | ⾬ | U+2FAC | U+96E8 | 1 | |
| 174. | ⾭ | U+2FAD | U+9751 | 3 | |
| 175. | ⾮ | U+2FAE | U+975E | 1 | |
| 176. | ⾯ | U+2FAF | U+9762 | 1 | |
| 177. | ⾰ | U+2FB0 | U+9769 | 1 | |
| 178. | ⾱ | U+2FB1 | U+97CB | 1 | |
| 179. | ⾲ | U+2FB2 | U+97ED | 1 | |
| 180. | ⾳ | U+2FB3 | U+97F3 | 1 | |
| 181. | ⾴ | U+2FB4 | U+9801 | 1 | |
| 182. | ⾵ | U+2FB5 | U+98A8 | 1 | |
| 183. | ⾶ | U+2FB6 | U+98DB | 1 | |
| 184. | ⾷ | U+2FB7 | U+98DF | 1 | |
| 185. | ⾸ | U+2FB8 | U+9996 | 1 | |
| 186. | ⾹ | U+2FB9 | U+9999 | 1 | |
| 187. | ⾺ | U+2FBA | U+99AC | 1 | |
| 188. | ⾻ | U+2FBB | U+9AA8 | 1 | |
| 189. | ⾼ | U+2FBC | U+9AD8 | 1 | |
| 190. | ⾽ | U+2FBD | U+9ADF | 2 | |
| 191. | ⾾ | U+2FBE | U+9B25 | 1 | |
| 192. | ⾿ | U+2FBF | U+9B2F | 2 | |
| 193. | ⿀ | U+2FC0 | U+9B32 | 1 | |
| 194. | ⿁ | U+2FC1 | U+9B3C | 1 | |
| 195. | ⿂ | U+2FC2 | U+9B5A | 1 | |
| 196. | ⿃ | U+2FC3 | U+9CE5 | 1 | |
| 197. | ⿄ | U+2FC4 | U+9E75 | 1 | |
| 198. | ⿅ | U+2FC5 | U+9E7F | 1 | |
| 199. | ⿆ | U+2FC6 | U+9EA5 | 1 | |
| 200. | ⿇ | U+2FC7 | U+9EBB | 1 | |
| 201. | ⿈ | U+2FC8 | U+9EC3 | 1 | |
| 202. | ⿉ | U+2FC9 | U+9ECD | 1 | |
| 203. | ⿊ | U+2FCA | U+9ED1 | 1 | |
| 204. | ⿋ | U+2FCB | U+9EF9 | 2 | |
| 205. | ⿌ | U+2FCC | U+9EFD | 2 | |
| 206. | ⿍ | U+2FCD | U+9F0E | 1 | |
| 207. | ⿎ | U+2FCE | U+9F13 | 1 | |
| 208. | ⿏ | U+2FCF | U+9F20 | 1 | |
| 209. | ⿐ | U+2FD0 | U+9F3B | 1 | |
| 210. | ⿑ | U+2FD1 | U+9F4A | 1 | |
| 211. | ⿒ | U+2FD2 | U+9F52 | 1 | |
| 212. | ⿓ | U+2FD3 | U+9F8D | 1 | |
| 213. | ⿔ | U+2FD4 | U+9F9C | 1 | |
| 214. | ⿕ | U+2FD5 | U+9FA0 | 2 |
Safari EUC substitutes U+9752 青 for U+9751 靑 as radical no. 174. This is actually a better match for the glyph in ISO-IR 171 and Lunde.
Firefox maps to Unicode’s Kangxi Radicals block.
In Opera, all 213 radicals are missing. In Safari’s ISO 2022 encoding, 210 radicals are missing; only the 3 radicals that appear in Unihan (and nowhere else in CNS 11643) are included.
Internet Explorer includes the 25 E-Ten radicals (although no. 35 ⼢ is mapped to no. 34 ⼡), which means that 189 of the 213 radicals are missing.
* * *
In Internet Explorer (Big 5 and DEC, possibly E-Ten), the 33 Control Pictures U+2400–U+241F and U+2421 in Column 34 are replaced by real control characters 0x00–0x1F and 0x7F, which turn into question marks when they appear in HTML (as opposed to plain text). These characters are missing from Safari (Big 5) as well.
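The relationship between the 33 Control Pictures and the control characters they depict is regular, as the following sketch shows. The function name is hypothetical; the code illustrates the code-point arithmetic only, not any browser's implementation.

```python
def control_picture(byte):
    """Return the Control Pictures character that depicts a control byte."""
    if 0x00 <= byte <= 0x1F:
        return chr(0x2400 + byte)  # U+2400..U+241F
    if byte == 0x7F:
        return "\u2421"  # SYMBOL FOR DELETE; U+2420 (space) is skipped
    raise ValueError(f"not a control byte covered here: {byte:#04x}")


# 32 C0 controls plus DEL give the 33 pictures in Column 34.
pictures = [control_picture(b) for b in list(range(0x20)) + [0x7F]]
assert len(pictures) == 33
```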
For the hanzi in Columns 36–93, all browsers follow Unihan almost perfectly. Internet Explorer however substitutes U+5F5D 彝 for U+5F5E 彞, thus making it Big5-compatible.
All four browsers include the euro symbol at the end of the symbols range in Big 5.
Ref.: ISO-IR 172 Chinese Standard Interchange Code (CSIC) — Set 2.
7,650 hanzi in Columns 1–82.
Unihan T2 = CNS1992-2 = CNS1986-2 (PDF).
Unihan BigFive (level 2) contains the same hanzi as well as two duplicates, viz, U+FA0C 兀 in addition to U+5140 兀 and U+FA0D 嗀 in addition to U+55C0 嗀 (PDF).
Ref.: ISO-IR 183 Chinese Standard Interchange Code — Set 3.
6,148 hanzi in Columns 1–66.
Unihan T3 covers all but 1, 6,147 hanzi (PDF).
Unihan T3 additionally contains a number of hanzi in Columns 68–71 which are not part of the published CNS standard. Lunde refers to this as a ‘fictitious extension’ (PDF).
Safari’s ISO-2022 implementation is fairly complete (and also includes the fictitious extension, which is perhaps not a good idea), whereas over a third of the characters are missing from the ISO-2022 and EUC implementations in Firefox (which exclude the fictitious extension). Neither Opera nor Internet Explorer implements Planes 3–7. Opera does, however, implement Plane 14 (see below).
The 1986 version of the standard had a Plane 14 which was mostly identical to Plane 3 in the 1992 version, the only difference being the additional 171 hanzi later assigned to Plane 4, which gives a total of 6,319 hanzi. Lunde provides the mapping from Plane 14 to Plane 4 for these additional characters, which, in combination with Unihan T4, enables us to make a character chart for 170 of these (PDF — Plane 14 extension), the last one being mapped to a Plane 4 character missing from Unihan.
Unihan CNS1992-3 = CNS1986-E is an incomplete subset of Plane 3 with Plane 14 and fictitious extensions. This collection of characters is of little interest except that it seems to form the basis for Opera’s implementation of Plane 14. Around one third of the characters are missing.
Ref.: ISO-IR 184 Chinese Standard Interchange Code — Set 4.
7,298 hanzi in Columns 1–78.
Unihan T4 only includes 7,286 hanzi, which means that 12 are missing (PDF).
Nearly half of the characters are missing in Safari, and almost nine tenths are missing in Firefox.
Ref.: ISO-IR 185 Chinese Standard Interchange Code — Set 5.
8,603 hanzi in Columns 1–92.
Unihan T5 enumerates 8,601 hanzi; 2 are missing (PDF).
Safari implements around five per cent, Firefox less than one per cent of the characters.
Ref.: ISO-IR 186 Chinese Standard Interchange Code — Set 6.
6,388 hanzi in Columns 1–68.
Unihan T6 covers 6,386 of these; 2 are missing (PDF).
Safari implements under four per cent, Firefox just over four per mille of the characters.
Ref.: ISO-IR 187 Chinese Standard Interchange Code — Set 7.
6,539 hanzi in Columns 1–70.
Unihan T7 includes 6,537 hanzi; again, 2 are missing (PDF).
Safari implements around two and a half per cent, Firefox around two and a half per mille of the characters.
The current version of the standard furthermore includes Planes 10, 11, 12, 13, 14 (unrelated to the old Plane 14 described above) and 15 (already present in the 1986 version; certain characters seem to have been moved to other planes). We are not aware of any support for these planes in any browser.
E-Ten 1 contains 365 characters added at the end of Plane 1 in Big 5: the 30 numerals from CNS 11643 Plane 1 Column 6, 25 radicals (see table above), 169 Japanese hiragana/katakana, 66 Cyrillic letters, 40 E-Ten input codes (not included in the PDF or in any browser implementation) and 35 hanzi and symbols (PDF). 29 of the radicals/hanzi can be found in Unihan H.
This extension is missing from IE as well as from Safari’s Big 5 (non-HKSCS) implementation. Differences between implementations are summarised below for plain Big 5 (B) as well as Big 5 with HKSCS extensions (H).
| Safari | Firefox | Opera | |||
|---|---|---|---|---|---|
| H | B | H | B | H | |
| 丶 4e36 | ⼂ 2f02 | ||||
| 丿 4e3f | ⼃ 2f03 | ||||
| 亅 4e85 | ⼅ 2f05 | ||||
| 亠 4ea0 | ⼇ 2f07 | ||||
| 冂 5182 | ⼌ 2f0c | ||||
| 冖 5196 | ⼍ 2f0d | ||||
| 冫 51ab | ⼎ 2f0e | ||||
| 勹 52f9 | ⼓ 2f13 | ||||
| 匸 5338 | ⼖ 2f16 | ||||
| 卩 5369 | ⼙ 2f19 | ||||
| 厶 53b6 | ⼛ 2f1b | ||||
| 夊 590a | ⼢ 2f22 | ||||
| 宀 5b80 | ⼧ 2f27 | ||||
| 巛 5ddb | ⼮ 2f2e | ||||
| 幺 5e7aℎ | ⼳ 2f33 | ⼳ 2f33 | ⼳ 2f33 | ||
| 广 5e7f | ⼴ 2f34 | ||||
| 廴 5ef4† | � fffd | ⼵ 2f35 | � fffd | ||
| 彐 5f50 | ⼹ 2f39 | ||||
| 彡 5f61 | ⼺ 2f3a | ||||
| 攴 6534 | ⽁ 2f41 | ||||
| 无 65e0† | � fffd | ⽆ 2f46 | � fffd | ||
| 疒 7592 | ⽧ 2f67 | ||||
| 癶 7676† | � fffd | ⽨ 2f68 | � fffd | ||
| 辵 8fb5 | ⾡ 2fa1 | ||||
| 隶 96b6† | � fffd | ⾪ 2faa | � fffd | ||
| ˆ 2c6 | ^ ff3e | ||||
| 〃 3003ℏ | � fffd | � fffd | |||
| 仝 4eddℏ | � fffd | � fffd | |||
| А 410 | � fffd | ||||
| ⋮ | ⋮ (+31) | ||||
| Я 42f | � fffd | ||||
| а 430 | � fffd | ||||
| ⋮ | ⋮ (+31) | ||||
| я 44f | � fffd | ||||
| ⇧ 21e7 | � fffd | ||||
| ↸ 21b8 | � fffd | ||||
| ↹ 21b9 | � fffd | ||||
| ㇏ 31cf | f7e5 | f7e5 | � fffd | ||
| 𠃌 200cc | 𠃌 d840 | f7e6 | f7e6 | � fffd | 𠃌 d840 |
| 乚 4e5a | � fffd | ||||
| 𠂊 2008a | 𠂊 d840 | f7e8 | f7e8 | � fffd | 𠂊 d840 |
| 刂 5202 | � fffd | ||||
| 䒑 4491 | � fffd | ||||
| 龰 9fb0 | f7eb | f7eb | � fffd | ||
| 冈 5188 | � fffd | ||||
| 龱 9fb1 | f7ed | f7ed | � fffd | ||
| 𧘇 27607 | 𧘇 d85d | f7ee | f7ee | � fffd | 𧘇 d85d |
| ¬ ffe2 | � fffd | ||||
| ¦ ffe4 | � fffd | ||||
| ' ff07 | � fffd | ||||
| " ff02 | � fffd | ||||
| ㈱ 3231 | � fffd | ||||
| № 2116 | � fffd | ||||
| ℡ 2121 | � fffd | ||||
| 13 | 3 | 6 | 112 | 10 | |
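The `d840` and `d85d` entries in the table look like the leading UTF-16 surrogates of the supplementary-plane characters in the left column. That reading is an assumption rather than something stated here, but the arithmetic is easy to verify with a short sketch (the function name is hypothetical):

```python
def utf16_surrogates(cp):
    """Split a supplementary-plane code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"not a supplementary-plane code point: U+{cp:04X}")
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


# The three supplementary-plane characters in the table.
assert utf16_surrogates(0x200CC)[0] == 0xD840
assert utf16_surrogates(0x2008A)[0] == 0xD840
assert utf16_surrogates(0x27607)[0] == 0xD85D
```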
There is also a less common version of E-Ten 1 which fills in empty cells in CNS 11643 Plane 1. It does not include numerals or radicals (which are already in CNS 11643 Plane 1 itself) but otherwise encodes an almost identical set of characters (PDF). We have no reference for this version, so the PDF probably contains some errors and omissions.
E-Ten 2 contains 41 characters added at the end of Plane 2 in Big 5: 7 hanzi and 34 line-drawing characters (PDF). The hanzi can be found in Unihan H. The four quarter-circles are supposed to have a double line (according to Lunde), but the corresponding characters appear to be missing from Unicode.
All four browsers include the E-Ten 2 extension. However, the official HKSCS table substitutes U+FFED for U+2593, and Safari and Opera follow this for their Big 5 HKSCS implementations.
Less common extensions include Big 5 Plus, Big 5 E and Unicode-at-on (a version of which appears to have been implemented in Firefox). The lack of documentation, implementations or both makes it difficult to provide much useful information.
「香港增補字符集」 (Hong Kong Supplementary Character Set).
Previously:
「政府通用字庫」 (Government Chinese Character Set)
Ref.: HKSCS specification published by the Office of the Government Chief Information Officer, Hong Kong.
The Hong Kong Supplementary Character Set is an extension to Big 5 which encodes a number of hanzi needed in Hong Kong, as well as a few Latin letters and symbols, 5,009 characters in total.
Unihan H includes 4,543 hanzi (PDF) in addition to the ones found in E-Ten extensions as mentioned above. Safari, Firefox and Opera all implement around three fifths of these hanzi in accordance with Unihan.
HKSCS furthermore includes 66 additional hanzi and extended Latin letters in Column 8 (PDF). 17 of these are mapped to PUA characters in Firefox and Safari:
| ㇀ 31c0 | f303 |
|---|---|
| ㇁ 31c1 | f304 |
| ㇂ 31c2 | f305 |
| ㇃ 31c3 | f306 |
| ㇄ 31c4 | f307 |
| ㇅ 31c5 | f309 |
| ㇆ 31c6 | f30c |
| ㇇ 31c7 | f30d |
| ㇈ 31c8 | f310 |
| ㇉ 31c9 | f312 |
| ㇊ 31ca | f313 |
| ㇋ 31cb | f314 |
| ㇌ 31cc | f315 |
| ㇍ 31cd | f317 |
| ㇎ 31ce | f318 |
| ⏚ 23da | f34a |
| ⏛ 23db | f34b |
40 additional radicals and phonetic letters have been added at the end of the E-Ten 1 extension (PDF). All but 6 of the 366 characters in the two E-Ten extensions are included as well (see above for details).
84 hanzi included in previous versions of HKSCS have been unified with characters found elsewhere in the extension or (more often) in Big 5 itself, i.e., in CNS 11643 Planes 1 and 2 (PDF). For 22 hanzi included in previous versions of the standards, no Unicode mapping is provided. These are currently characterised as ‘non verifiable’. Safari does not implement these compatibility mappings.
IE (default version, Western locale) shows no evidence of implementing HKSCS.
Note: This encoding includes Latin and Simplified Chinese as well. Only Traditional Chinese character sets are mentioned on this page.
The designator sequence ESC $ ) G selects Plane 1 as G1, which can be invoked by shift out (SO, 0x0E); shift in (SI, 0x0F) switches back to G0, which always encodes ISO646-US. (iso-2022-cn-ext), (iso-2022-cn).
The designator sequence ESC $ * H selects Plane 2 as G2, which can be invoked (for the following character only) by single shift 2 (SS2, ESC N). (iso-2022-cn-ext), (iso-2022-cn).
The designator sequence ESC $ + I–M selects Plane 3–7 as G3, which can be invoked (for the following character only) by single shift 3 (SS3, ESC O).
Plane 3:
iso-2022-cn-ext,
iso-2022-cn.
Plane 4:
iso-2022-cn-ext,
iso-2022-cn.
Plane 5:
iso-2022-cn-ext,
iso-2022-cn.
Plane 6:
iso-2022-cn-ext,
iso-2022-cn.
Plane 7:
iso-2022-cn-ext,
iso-2022-cn.
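The designator and shift sequences described above can be put together in a small sketch that produces the bytes for a single character. The function name is hypothetical. For simplicity the sketch repeats the designator for every character and assumes that a Column/Cell position (1–94 each) maps to the bytes 0x20 + Column and 0x20 + Cell; a real encoder would emit each designator only when needed.

```python
ESC = b"\x1b"
SO, SI = b"\x0e", b"\x0f"            # shift out / shift in
SS2, SS3 = ESC + b"N", ESC + b"O"    # single shift 2 / single shift 3

# Final bytes I..M of the G3 designator select Planes 3..7.
G3_FINAL = {3: b"I", 4: b"J", 5: b"K", 6: b"L", 7: b"M"}


def iso2022cn_char(plane, column, cell):
    """Return designator + shift + two code bytes for one CNS 11643 character."""
    if not (1 <= column <= 94 and 1 <= cell <= 94):
        raise ValueError("column and cell must be in 1..94")
    code = bytes([0x20 + column, 0x20 + cell])
    if plane == 1:
        # Designate Plane 1 as G1, shift out, then shift back in to G0.
        return ESC + b"$)G" + SO + code + SI
    if plane == 2:
        # Designate Plane 2 as G2; SS2 applies to the next character only.
        return ESC + b"$*H" + SS2 + code
    if plane in G3_FINAL:
        # Designate Plane 3..7 as G3; SS3 applies to the next character only.
        return ESC + b"$+" + G3_FINAL[plane] + SS3 + code
    raise ValueError(f"plane {plane} has no designator in this sketch")
```

For example, `iso2022cn_char(1, 36, 1)` yields the bytes for the first hanzi position of Plane 1 (Column 36, Cell 1), wrapped in SO and SI.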
Code set 0 (7-bit characters) encodes ISO646-US (euc-tw), (x-euc-tw).
Code set 1 (unprefixed 8-bit characters) encodes Plane 1: euc-tw, x-euc-tw.
Code set 2 (8-bit characters prefixed by two bytes: SS2, i.e. 0x8E, followed by 0xAn) encodes Plane n.
Plane 1:
euc-tw,
x-euc-tw.
Plane 2:
euc-tw,
x-euc-tw.
Plane 3:
euc-tw,
x-euc-tw.
Plane 14:
euc-tw,
x-euc-tw.
Plane 4:
euc-tw,
x-euc-tw.
Plane 5:
euc-tw,
x-euc-tw.
Plane 6:
euc-tw,
x-euc-tw.
Plane 7:
euc-tw,
x-euc-tw.
Plane 15:
euc-tw,
x-euc-tw.
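The code sets above can be sketched as a single encoding function. The function name and the `use_code_set_1` parameter are hypothetical. The sketch assumes that a Column/Cell position (1–94 each) maps to the bytes 0xA0 + Column and 0xA0 + Cell, and that the plane byte is 0xA0 + n, so that Planes 14 and 15 become 0xAE and 0xAF.

```python
SS2 = 0x8E  # single shift 2, introduces code set 2


def euctw_char(plane, column, cell, use_code_set_1=True):
    """Return the EUC bytes for one CNS 11643 character."""
    if not (1 <= column <= 94 and 1 <= cell <= 94):
        raise ValueError("column and cell must be in 1..94")
    code = bytes([0xA0 + column, 0xA0 + cell])
    if plane == 1 and use_code_set_1:
        return code  # code set 1: unprefixed two-byte form
    if 1 <= plane <= 15:
        return bytes([SS2, 0xA0 + plane]) + code  # code set 2: four-byte form
    raise ValueError(f"plane {plane} is outside the range covered by this sketch")
```

Plane 1 therefore has two possible encodings, which is why it appears under both code set 1 and code set 2 in the lists above.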
The one-byte range encodes ISO646-US. The Internet Explorer implementation also includes characters from Windows-1252 in columns 8 and 9.
Plane 1 is encoded just like in EUC. Plane 2 is encoded as an 8-bit byte (the same as in EUC) followed by a 7-bit byte (the same as in ISO 2022). The IE implementation includes the E-Ten 1 (CNS) extension (test, 2nd byte ‘<’ not handled properly).
DEC Hanyu encodes Planes 3 and 4 with the prefix 0xC2 0xCB. The IE implementation does not seem to support this, however.
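The two-byte part of DEC Hanyu described above can be sketched as follows. The function name is hypothetical, and the byte arithmetic (0xA0 or 0x20 added to a Column/Cell position in 1–94) is an assumption derived from the description of Planes 1 and 2. Planes 3 and 4 are left out because the text does not say how the two planes are told apart after the 0xC2 0xCB prefix.

```python
def dec_hanyu_char(plane, column, cell):
    """Return the DEC Hanyu bytes for a Plane 1 or Plane 2 character."""
    if not (1 <= column <= 94 and 1 <= cell <= 94):
        raise ValueError("column and cell must be in 1..94")
    if plane == 1:
        # Same as EUC code set 1: both bytes have the high bit set.
        return bytes([0xA0 + column, 0xA0 + cell])
    if plane == 2:
        # First byte as in EUC (8-bit), second byte as in ISO 2022 (7-bit).
        return bytes([0xA0 + column, 0x20 + cell])
    raise ValueError("only Planes 1 and 2 are covered by this sketch")
```

A second byte in the 7-bit range can coincide with ASCII characters that are significant in markup, which may be relevant to the ‘<’ problem noted above.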
The one-byte range encodes ISO646-US.
The one-byte range encodes ISO646-US: big5, big5-hkscs.
The two-byte range encodes Planes 1 and 2 with E-Ten 1 and 2 extensions as detailed above: big5.
Provided the appropriate MIME label, HKSCS extensions are included as well: big5-hkscs.