韓 Korean

South Korean character sets

KS X 1001 (KS C 5601)

《정보 교환용 부호계(한글 및 한자)》 [Code for information interchange (Hangeul and hanja)]

Ref.: ISO-IR 149. Korean Graphic Character Set for Information Interchange

8,224 characters: 4,888 hanja, 2,350 hangul and 986 letters/symbols.

Unihan KSC0 enumerates all the 4,888 hanja (Chinese characters or 한자/漢字) in Columns 42–93 (PDF — hanja). Unihan K0 differs in four positions:

Column-Cell    KSC0       K0
50-11          蓼 f9c2
62-62          沙 6c99    蓼 f9c2
62-63          泗 6cd7    隸 f9b8
71-70          隸 f9b8

Columns 16–40 contain a set of 2,350 precomposed (wansung 완성/完成) hangul (한글) syllables (PDF — hangul) in Unicode order, selected one by one based on frequency of use in Korean without any apparent systematicity.

Columns 1–12 cover a large set of 986 alphabetic characters (Latin, Greek, Cyrillic, Japanese and Korean), numerals, abbreviations and symbols (PDF — others). The overview of characters in the reference above includes four additional Roman numerals (xi, xii, XI and XII) which are missing both from the table and from the total character count. Most browser implementations include the 1998 additions (€ at 2–70 and ® at 2–71), but none includes the 2002 addition (postal code mark, circled 우 at 2–72).

Eight characters have diverging Unicode mappings in Safari’s ISO-2022 implementation:

Usual mapping    Safari ISO-2022
· b7             ・ 30fb
­ ad             ‐ 2010
― 2015           — 2014
∼ 223c           〜 301c
～ ff5e           ˜ 2dc
⊙ 2299           ◉ 25c9
€ 20ac           � fffd
® ae             � fffd

The first part of Column 4 (modern jamo) is missing from Internet Explorer’s Johab implementation. (These characters are covered elsewhere as part of Johab’s systematic encoding of individual jamo and composed hangul.) The following 51 characters are affected:

Column 4    IE Johab
ㄱ 3131      ? 3f
ㄲ 3132      ? 3f
ㄳ 3133      ? 3f
ㄴ 3134      ? 3f
ㄵ 3135      ? 3f
ㄶ 3136      ? 3f
ㄷ 3137      ? 3f
ㄸ 3138      ? 3f
ㄹ 3139      ? 3f
ㄺ 313a      ? 3f
ㄻ 313b      ? 3f
ㄼ 313c      ? 3f
ㄽ 313d      ? 3f
ㄾ 313e      ? 3f
ㄿ 313f      ? 3f
ㅀ 3140      ? 3f
ㅁ 3141      ? 3f
ㅂ 3142      ? 3f
ㅃ 3143      ? 3f
ㅄ 3144      ? 3f
ㅅ 3145      ? 3f
ㅆ 3146      ? 3f
ㅇ 3147      ? 3f
ㅈ 3148      ? 3f
ㅉ 3149      ? 3f
ㅊ 314a      ? 3f
ㅋ 314b      ? 3f
ㅌ 314c      ? 3f
ㅍ 314d      ? 3f
ㅎ 314e      ? 3f
ㅏ 314f      ? 3f
ㅐ 3150      ? 3f
ㅑ 3151      ? 3f
ㅒ 3152      ? 3f
ㅓ 3153      ? 3f
ㅔ 3154      ? 3f
ㅕ 3155      ? 3f
ㅖ 3156      ? 3f
ㅗ 3157      ? 3f
ㅘ 3158      ? 3f
ㅙ 3159      ? 3f
ㅚ 315a      ? 3f
ㅛ 315b      ? 3f
ㅜ 315c      ? 3f
ㅝ 315d      ? 3f
ㅞ 315e      ? 3f
ㅟ 315f      ? 3f
ㅠ 3160      ? 3f
ㅡ 3161      ? 3f
ㅢ 3162      ? 3f
ㅣ 3163      ? 3f

The second part of Column 4 (hangul filler and archaic jamo) is included in IE’s Johab implementation, but it is mapped to the Hangul Jamo block rather than the Hangul Compatibility Jamo block, unlike in all other implementations in IE and other browsers. The following table shows the alternative mappings for all 43 characters concerned:

Other implementations    IE Johab
ㅤ 3164                   ᅟ 115f
ㅥ 3165                   ᄔ 1114
ㅦ 3166                   ᄕ 1115
ㅧ 3167                   ᇇ 11c7
ㅨ 3168                   ᇈ 11c8
ㅩ 3169                   ᇌ 11cc
ㅪ 316a                   ᇎ 11ce
ㅫ 316b                   ᇓ 11d3
ㅬ 316c                   ᇗ 11d7
ㅭ 316d                   ᇙ 11d9
ㅮ 316e                   ᄜ 111c
ㅯ 316f                   ᇝ 11dd
ㅰ 3170                   ᇟ 11df
ㅱ 3171                   ᄝ 111d
ㅲ 3172                   ᄞ 111e
ㅳ 3173                   ᄠ 1120
ㅴ 3174                   ᄢ 1122
ㅵ 3175                   ᄣ 1123
ㅶ 3176                   ᄧ 1127
ㅷ 3177                   ᄨ 1128
ㅸ 3178                   ᄫ 112b
ㅹ 3179                   ᄬ 112c
ㅺ 317a                   ᄭ 112d
ㅻ 317b                   ᄮ 112e
ㅼ 317c                   ᄯ 112f
ㅽ 317d                   ᄲ 1132
ㅾ 317e                   ᄶ 1136
ㅿ 317f                   ᅀ 1140
ㆀ 3180                   ᅇ 1147
ㆁ 3181                   ᅌ 114c
ㆂ 3182                   ᅅ 1145
ㆃ 3183                   ᅆ 1146
ㆄ 3184                   ᅗ 1157
ㆅ 3185                   ᅘ 1158
ㆆ 3186                   ᅙ 1159
ㆇ 3187                   ᆄ 1184
ㆈ 3188                   ᆅ 1185
ㆉ 3189                   ᆈ 1188
ㆊ 318a                   ᆑ 1191
ㆋ 318b                   ᆒ 1192
ㆌ 318c                   ᆔ 1194
ㆍ 318d                   ᆞ 119e
ㆎ 318e                   ᆡ 11a1

Supplementary hangul

KS X 1001 contains only 2,350 of the possible 11,172 hangul (19 × 21 × (27+1): 19 initial consonants, 21 vowels, 27 final consonants or none) as precomposed characters. The full set of 11,172 hangul can be encoded as four characters (eight bytes) from Column 4: 1) filler, 2) initial consonant, 3) vowel and 4) final consonant or filler. Unfortunately, this encoding is not widely supported.
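The arithmetic and the four-character sequence described above can be sketched in Python. This is an illustration under stated assumptions, not a reference implementation: the names `column4_bytes` and `eight_byte_hangul` are made up for this example, the jamo lists are the modern letters of Column 4, and the byte values assume the EUC form, in which cell n of Column 4 is written as 0xA4 followed by 0xA0 + n (so the filler at 4-52 becomes 0xA4 0xD4).

```python
# Modern jamo as Hangul Compatibility Jamo code points (Column 4, cells 1-51).
INITIALS = [0x3131, 0x3132, 0x3134, 0x3137, 0x3138, 0x3139, 0x3141, 0x3142,
            0x3143, 0x3145, 0x3146, 0x3147, 0x3148, 0x3149, 0x314A, 0x314B,
            0x314C, 0x314D, 0x314E]                       # 19 initial consonants
VOWELS = list(range(0x314F, 0x3164))                      # 21 vowels
FINALS = [0x3131, 0x3132, 0x3133, 0x3134, 0x3135, 0x3136, 0x3137, 0x3139,
          0x313A, 0x313B, 0x313C, 0x313D, 0x313E, 0x313F, 0x3140, 0x3141,
          0x3142, 0x3144, 0x3145, 0x3146, 0x3147, 0x3148, 0x314A, 0x314B,
          0x314C, 0x314D, 0x314E]                         # 27 final consonants
FILLER = 0x3164                                           # hangul filler, cell 52

TOTAL = len(INITIALS) * len(VOWELS) * (len(FINALS) + 1)   # 19 * 21 * (27 + 1)


def column4_bytes(code_point):
    """EUC bytes of a Column 4 character: 0xA4, then 0xA0 + cell number."""
    cell = code_point - 0x3131 + 1
    return bytes([0xA4, 0xA0 + cell])


def eight_byte_hangul(syllable):
    """Return filler + initial + vowel + (final or filler) for one syllable."""
    index = ord(syllable) - 0xAC00        # position among the 11,172 syllables
    if not 0 <= index < TOTAL:
        raise ValueError("not a precomposed hangul syllable")
    initial, rest = divmod(index, 21 * 28)
    vowel, final = divmod(rest, 28)       # final == 0 means no final consonant
    parts = [FILLER, INITIALS[initial], VOWELS[vowel],
             FINALS[final - 1] if final else FILLER]
    return b"".join(column4_bytes(cp) for cp in parts)


print(TOTAL)                              # 11172
print(eight_byte_hangul("똠").hex())      # a4d4a4a8a4c7a4b1
```

The syllable 똠 is used because it is not among the 2,350 precomposed characters; the sequence decomposes it into filler, ㄸ, ㅗ and ㅁ.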

Unified Hangul Code enumerates, in Unicode order, all the 8,822 hangul missing from KS X 1001 (PDF).
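The split between the 2,350 KS X 1001 syllables and the 8,822 additions can be checked with a short Python sketch. It assumes that Python's `cp949` codec is a faithful implementation of Unified Hangul Code, and it relies on the fact that KS X 1001 characters occupy the 94 × 94 grid in which both bytes are 0xA1 or higher; the function name `split_hangul_by_origin` is hypothetical.

```python
def split_hangul_by_origin(codec="cp949"):
    """Partition U+AC00..U+D7A3 into KS X 1001 syllables and UHC additions."""
    ks_x_1001, uhc_only = [], []
    for cp in range(0xAC00, 0xD7A4):
        lead, trail = chr(cp).encode(codec)   # every syllable is two bytes
        # Both bytes >= 0xA1: inside the KS X 1001 grid.
        # Anything else: inside the UHC extension area.
        if lead >= 0xA1 and trail >= 0xA1:
            ks_x_1001.append(cp)
        else:
            uhc_only.append(cp)
    return ks_x_1001, uhc_only


ks, uhc = split_hangul_by_origin()
print(len(ks), len(uhc), len(ks) + len(uhc))  # 2350 8822 11172
```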

Johab (조합/組合, ‘combining’) instead encodes the full set of 11,172 hangul and 67 jamo (19 + 21 + 27) systematically.
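The text above does not spell out what "systematically" means. As Johab is commonly described, a hangul syllable is a 16-bit value with the top bit set, followed by three 5-bit fields for the initial consonant, the vowel and the final consonant. The following Python sketch works under that assumption. The 5-bit code tables are not taken from this document and should be treated as assumptions; the usage lines compare the result with Python's built-in `johab` codec as a sanity check. The function name `johab_syllable` is hypothetical.

```python
# Assumed 5-bit Johab codes, indexed by the Unicode jamo index of each part.
INITIAL_CODES = list(range(2, 21))                              # 19 initials
VOWEL_CODES = [3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15,
               18, 19, 20, 21, 22, 23, 26, 27, 28, 29]          # 21 vowels
FINAL_CODES = [1] + list(range(2, 18)) + list(range(19, 30))    # none + 27 finals


def johab_syllable(syllable):
    """Pack one precomposed hangul syllable into its two Johab bytes."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 19 * 21 * 28:
        raise ValueError("not a precomposed hangul syllable")
    initial, rest = divmod(index, 21 * 28)
    vowel, final = divmod(rest, 28)
    value = (0x8000
             | (INITIAL_CODES[initial] << 10)
             | (VOWEL_CODES[vowel] << 5)
             | FINAL_CODES[final])
    return value.to_bytes(2, "big")


for ch in "가한":
    print(ch, johab_syllable(ch).hex(), ch.encode("johab").hex())
```

The jamo count in the text follows from the same tables: 19 + 21 + 27 = 67.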

Firefox’s implementation maps Johab jamo to Unicode characters in the Hangul Compatibility Jamo block rather than the Hangul Jamo block. This makes it possible to keep Column 15 in KS X 1001 unmodified without introducing duplicates (cf. the description of IE’s Johab implementation above). It causes problems, however, for consonants that are used both syllable-initially and syllable-finally, since the Hangul Compatibility Jamo block unifies positional variants. Firefox resolves this by excluding the final variant when both exist. The following table shows how all 67 Johab jamo are mapped in Firefox.

Hangul Jamo    Firefox Johab
ᆨ 11a8         (excluded)
ᆩ 11a9         (excluded)
ᆪ 11aa         ㄳ 3133
ᆫ 11ab         (excluded)
ᆬ 11ac         ㄵ 3135
ᆭ 11ad         ㄶ 3136
ᆮ 11ae         (excluded)
ᆯ 11af         (excluded)
ᆰ 11b0         ㄺ 313a
ᆱ 11b1         ㄻ 313b
ᆲ 11b2         ㄼ 313c
ᆳ 11b3         ㄽ 313d
ᆴ 11b4         ㄾ 313e
ᆵ 11b5         ㄿ 313f
ᆶ 11b6         ㅀ 3140
ᆷ 11b7         (excluded)
ᆸ 11b8         (excluded)
ᆹ 11b9         ㅄ 3144
ᆺ 11ba         (excluded)
ᆻ 11bb         (excluded)
ᆼ 11bc         (excluded)
ᆽ 11bd         (excluded)
ᆾ 11be         (excluded)
ᆿ 11bf         (excluded)
ᇀ 11c0         (excluded)
ᇁ 11c1         (excluded)
ᇂ 11c2         (excluded)
ᅡ 1161         ㅏ 314f
ᅢ 1162         ㅐ 3150
ᅣ 1163         ㅑ 3151
ᅤ 1164         ㅒ 3152
ᅥ 1165         ㅓ 3153
ᅦ 1166         ㅔ 3154
ᅧ 1167         ㅕ 3155
ᅨ 1168         ㅖ 3156
ᅩ 1169         ㅗ 3157
ᅪ 116a         ㅘ 3158
ᅫ 116b         ㅙ 3159
ᅬ 116c         ㅚ 315a
ᅭ 116d         ㅛ 315b
ᅮ 116e         ㅜ 315c
ᅯ 116f         ㅝ 315d
ᅰ 1170         ㅞ 315e
ᅱ 1171         ㅟ 315f
ᅲ 1172         ㅠ 3160
ᅳ 1173         ㅡ 3161
ᅴ 1174         ㅢ 3162
ᅵ 1175         ㅣ 3163
ᄀ 1100         ㄱ 3131
ᄁ 1101         ㄲ 3132
ᄂ 1102         ㄴ 3134
ᄃ 1103         ㄷ 3137
ᄄ 1104         ㄸ 3138
ᄅ 1105         ㄹ 3139
ᄆ 1106         ㅁ 3141
ᄇ 1107         ㅂ 3142
ᄈ 1108         ㅃ 3143
ᄉ 1109         ㅅ 3145
ᄊ 110a         ㅆ 3146
ᄋ 110b         ㅇ 3147
ᄌ 110c         ㅈ 3148
ᄍ 110d         ㅉ 3149
ᄎ 110e         ㅊ 314a
ᄏ 110f         ㅋ 314b
ᄐ 1110         ㅌ 314c
ᄑ 1111         ㅍ 314d
ᄒ 1112         ㅎ 314e
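The rule behind the Firefox table can be restated as a small Python sketch: initials and vowels map to their compatibility letters, and a final is mapped only when its compatibility letter is not already taken by an initial. This reproduces the table and is not Firefox's actual code; the function name `firefox_style_jamo_map` and the use of `None` for an excluded final are choices made for this example.

```python
# Compatibility letters for the 19 initials (U+1100..U+1112), in order.
INITIAL_COMPAT = [0x3131, 0x3132, 0x3134, 0x3137, 0x3138, 0x3139, 0x3141,
                  0x3142, 0x3143, 0x3145, 0x3146, 0x3147, 0x3148, 0x3149,
                  0x314A, 0x314B, 0x314C, 0x314D, 0x314E]
# Compatibility letters for the 27 finals (U+11A8..U+11C2), in order.
FINAL_COMPAT = [0x3131, 0x3132, 0x3133, 0x3134, 0x3135, 0x3136, 0x3137,
                0x3139, 0x313A, 0x313B, 0x313C, 0x313D, 0x313E, 0x313F,
                0x3140, 0x3141, 0x3142, 0x3144, 0x3145, 0x3146, 0x3147,
                0x3148, 0x314A, 0x314B, 0x314C, 0x314D, 0x314E]


def firefox_style_jamo_map():
    """Map the 67 Johab jamo (as Hangul Jamo code points) per the rule above."""
    mapping = {}
    for i, compat in enumerate(INITIAL_COMPAT):
        mapping[0x1100 + i] = compat
    for i in range(21):                       # vowels U+1161..U+1175
        mapping[0x1161 + i] = 0x314F + i
    taken = set(INITIAL_COMPAT)
    for i, compat in enumerate(FINAL_COMPAT):
        # A final that shares its letter with an initial is excluded.
        mapping[0x11A8 + i] = None if compat in taken else compat
    return mapping


jamo_map = firefox_style_jamo_map()
mapped = {target for target in jamo_map.values() if target is not None}
print(len(jamo_map), len(mapped))             # 67 51
```

The 51 distinct targets are exactly the modern jamo U+3131 to U+3163, so each compatibility letter is used once.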

ISO-2022 encoding

PDF. The Korean version of ISO 2022 is defined in RFC 1557.

MIME charset label: iso-2022-kr.

ISO646-US (ASCII) / ISO646-KR (KS-Roman)

G0 encodes ISO646.

Only the positions 0x5C and 0x7E differ between ISO646-US and ISO646-KR. The following table summarises standards and implementations:

        ISO646-US    ISO646-KR    IE       Safari     Firefox    Opera
0x5C    \ 5C         ₩ 20A9       ₩ 5C     ₩ 20A9     \ 5C       \ 5C
0x7E    ~ 7E         ¯ AF         ~ 7E     ~ 7E       ~ 7E       ~ 7E
Note the character ‘₩ 5C’ found in IE, which appears as a won sign but has the Unicode scalar value of a backslash.
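The difference between the two ISO646 variants amounts to two overrides on top of ASCII, which the following Python sketch makes explicit. The override values are copied from the ISO646-KR column of the table above; the function name `decode_iso646` and the dictionary name are hypothetical.

```python
# Positions where ISO646-KR differs from ISO646-US, as listed in the table.
ISO646_KR_OVERRIDES = {0x5C: 0x20A9, 0x7E: 0xAF}


def decode_iso646(data, overrides=None):
    """Decode 7-bit bytes as ASCII, applying per-position overrides."""
    overrides = overrides or {}
    chars = []
    for byte in data:
        if byte > 0x7F:
            raise ValueError("not a 7-bit byte: 0x%02X" % byte)
        chars.append(chr(overrides.get(byte, byte)))
    return "".join(chars)


sample = b"C:\\dir~"
print(decode_iso646(sample))                        # ISO646-US reading
print(decode_iso646(sample, ISO646_KR_OVERRIDES))   # ISO646-KR reading
```

In these terms, Firefox and Opera apply no overrides, Safari applies only the 0x5C override, and IE applies none at the code-point level but draws the backslash with a won sign glyph.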

Opera includes ISO 8859/1 characters.

KS X 1001

G1 encodes KS X 1001. None of the browsers supports 8-byte hangul.

Only Opera requires the designation sequence ESC $ ) C to appear before this character set is selected by means of a shift out character (SO, 0x0E).
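A minimal Python sketch of such a byte stream follows. It assumes the layout from RFC 1557: the designation sequence ESC $ ) C once at the start, SO to switch to G1, two 7-bit bytes per character (0x20 plus the column number, 0x20 plus the cell number), and SI (0x0F) to return to G0. The function name `iso2022kr_bytes` is hypothetical, and the sample position 16-1 is the syllable 가.

```python
ESC_DESIGNATE = b"\x1b$)C"     # ESC $ ) C: designate KS X 1001 to G1
SO, SI = b"\x0e", b"\x0f"      # shift out to G1, shift in to G0


def iso2022kr_bytes(positions):
    """Encode a run of KS X 1001 (column, cell) positions as ISO-2022-KR."""
    body = b"".join(bytes([0x20 + column, 0x20 + cell])
                    for column, cell in positions)
    return ESC_DESIGNATE + SO + body + SI


stream = iso2022kr_bytes([(16, 1)])
print(stream)                         # b'\x1b$)C\x0e0!\x0f'
print(stream.decode("iso2022_kr"))    # decoded with Python's own codec
```

Dropping `ESC_DESIGNATE` from the stream gives a test case for the Opera behaviour noted above.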

EUC encoding

PDF. MIME charset label: euc-kr.

ISO646-US (ASCII) / ISO646-KR (KS-Roman)

Code set 0 (7-bit characters) encodes ISO646-US with the exception that IE displays the backslash as a won sign.

KS X 1001

Code set 1 encodes KS X 1001 with the UHC extension.

Firefox supports the 8-byte encoding of hangul.
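In code set 1, a KS X 1001 position becomes two bytes with the high bit set: 0xA0 plus the column number, then 0xA0 plus the cell number. The Python sketch below shows that conversion and a probe that asks a codec what it assigns to a position. The function names `euc_bytes` and `probe` are hypothetical, and Python's `euc_kr` codec stands in for a browser implementation only as an example.

```python
def euc_bytes(column, cell):
    """Return the two EUC bytes for a KS X 1001 column-cell position."""
    if not (1 <= column <= 94 and 1 <= cell <= 94):
        raise ValueError("column and cell must be in 1..94")
    return bytes([0xA0 + column, 0xA0 + cell])


def probe(column, cell, codec="euc_kr"):
    """Return the character a codec assigns to a position, or None."""
    try:
        return euc_bytes(column, cell).decode(codec)
    except UnicodeDecodeError:
        return None


# 16-1 is the first hangul syllable; 2-70 to 2-72 are the 1998 and 2002 additions.
for position in [(16, 1), (2, 70), (2, 71), (2, 72)]:
    print(position, euc_bytes(*position).hex(), probe(*position))
```

What the probe reports for 2-70, 2-71 and 2-72 depends on the codec and its version, which is the kind of variation between implementations described earlier for browsers.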

Johab encoding

PDF. MIME charset labels: johab (IE), x-johab (Firefox).

ISO646-US (ASCII) / ISO646-KR (KS-Roman)

The one-byte range encodes ISO646-US (IE displays the backslash as a won sign).

KS X 1001

The two-byte range encodes the symbol and hanja parts of KS X 1001 as well as the complete set of Johab hangul.
