Ref.: ISO-IR 013 The Japanese KATAKANA graphic set of characters / Jeu de caractères graphiques japonais KATAKANA.
This small character set contains only katakana (apart from a few Japanese punctuation marks), mapped to halfwidth characters in Unicode (PDF).
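The mapping to halfwidth forms can be illustrated with a minimal sketch. It assumes the 63 characters of the set occupy the 7-bit positions 0x21 to 0x5F and map in order onto U+FF61 to U+FF9F; the function name is hypothetical.

```python
def katakana_to_unicode(code: int) -> str:
    """Map a 7-bit JIS X 0201 katakana code to its halfwidth Unicode character.

    Assumption: positions 0x21..0x5F map linearly onto U+FF61..U+FF9F.
    """
    if not 0x21 <= code <= 0x5F:
        raise ValueError(f"0x{code:02X} is outside the katakana set")
    return chr(0xFF61 + (code - 0x21))
```

For example, position 0x31 yields U+FF71 (halfwidth katakana A).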
Ref.: ISO-IR 168 Japanese Graphic Character Set for Information Interchange.
Unihan J0 = Jis0 enumerates the 2,965 JIS Level 1 kanji (Chinese characters or 漢字) in Rows 16–47, the 3,390 JIS Level 2 kanji in Rows 48–84 and the single kanji 仝 in Row 1 (considered a symbol rather than a kanji by this Japanese standard), 6,356 kanji in total (PDF — kanji).
Apart from 仝 (mentioned above), Rows 1–8 consist of non-kanji, 523 characters in all: alphabetic characters (Hiragana, Katakana, Latin, Greek and Cyrillic), miscellaneous symbols and line-drawing elements (PDF — non-kanji).
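To look up the character at a given row and cell, one option is to go through an EUC-JP codec. The sketch below uses Python's built-in `euc_jp` codec and relies on the fact that EUC-JP encodes a JIS X 0208 position as the two bytes 0xA0 + row and 0xA0 + cell. The function name is hypothetical, and the result reflects Python's mapping tables rather than those of any browser discussed here.

```python
def kuten_to_char(row: int, cell: int) -> str:
    """Return the JIS X 0208 character at the given row and cell (1..94 each).

    Unassigned positions raise UnicodeDecodeError.
    """
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    return bytes([0xA0 + row, 0xA0 + cell]).decode("euc_jp")
```

With this helper, `kuten_to_char(16, 1)` gives 亜, the first Level 1 kanji, and `kuten_to_char(1, 24)` gives 仝, the kanji located in Row 1.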
Firefox has different Unicode mappings for seven characters:
|― 2015||— 2014|
|～ FF5E||〜 301C|
|∥ 2225||‖ 2016|
|－ FF0D||− 2212|
|￠ FFE0||¢ A2|
|￡ FFE1||£ A3|
|￢ FFE2||¬ AC|
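When comparing text decoded under the two mappings, it can help to normalise one form to the other. The following sketch pairs the seven code points from the table (using U+FFE2 for the fullwidth not sign ￢) and converts in either direction with `str.translate`. The names are hypothetical, and the sketch deliberately makes no claim about which column belongs to which browser.

```python
# First-column code point -> second-column code point, as listed in the table.
FIRST_TO_SECOND = {
    0x2015: 0x2014,  # horizontal bar      -> em dash
    0xFF5E: 0x301C,  # fullwidth tilde     -> wave dash
    0x2225: 0x2016,  # parallel to         -> double vertical line
    0xFF0D: 0x2212,  # fullwidth hyphen    -> minus sign
    0xFFE0: 0x00A2,  # fullwidth cent      -> cent sign
    0xFFE1: 0x00A3,  # fullwidth pound     -> pound sign
    0xFFE2: 0x00AC,  # fullwidth not sign  -> not sign
}
SECOND_TO_FIRST = {v: k for k, v in FIRST_TO_SECOND.items()}


def to_second_column(text: str) -> str:
    """Replace first-column characters with their second-column counterparts."""
    return text.translate(FIRST_TO_SECOND)


def to_first_column(text: str) -> str:
    """Replace second-column characters with their first-column counterparts."""
    return text.translate(SECOND_TO_FIRST)
```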
Unihan IBMJapan enumerates 360 kanji known as ‘IBM Selected Kanji’ encoded in Rows 115–119 (PDF — IBM kanji). There is also a set of 28 ‘IBM Selected Non-kanji’ encoded in Row 115 (PDF — IBM non-kanji). This extension is only available in the Shift-JIS encoding, since ISO-2022 and EUC do not include rows beyond 94.
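To see why Shift-JIS can reach rows beyond 94, consider the usual row/cell to Shift-JIS arithmetic: each lead byte covers two rows, and the lead bytes 0xF0 to 0xFC extend the scheme to Rows 95 to 120. The sketch below is an illustration under that assumption; the function name is hypothetical.

```python
def row_cell_to_sjis(row: int, cell: int) -> bytes:
    """Convert a row (1..120) and cell (1..94) to a two-byte Shift-JIS code."""
    if not (1 <= row <= 120 and 1 <= cell <= 94):
        raise ValueError("row must be in 1..120 and cell in 1..94")
    # Lead bytes 0x81..0x9F cover Rows 1..62; 0xE0..0xFC cover Rows 63..120.
    lead = (row - 1) // 2 + (0x81 if row <= 62 else 0xC1)
    if row % 2:
        # Odd row: trail bytes 0x40..0x7E and 0x80..0x9E (0x7F is skipped).
        trail = cell + (0x3F if cell <= 63 else 0x40)
    else:
        # Even row: trail bytes 0x9F..0xFC.
        trail = cell + 0x9E
    return bytes([lead, trail])
```

Under this arithmetic, Row 115 Cell 1 becomes the bytes 0xFA 0x40, which is where the IBM extension begins in Shift-JIS.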
Nippon Electric Company (NEC) has defined an alternative encoding of the IBM extension, with the kanji in Rows 89–92 (PDF — IBM–NEC kanji) and the non-kanji split between Row 92 and Row 13, to which a number of additional non-kanji and ligatures have been added (PDF — IBM–NEC non-kanji).
Ref.: ISO-IR 159 Supplementary Japanese Graphic Character Set for Information Interchange.
Unihan J1 = Jis1 enumerates the 5,801 Supplemental JIS kanji (Chinese characters or 漢字) in Rows 16–77 (PDF — kanji).
Rows 2–11 contain 266 non-kanji (accented Latin, Greek and Cyrillic letters, diacritics, punctuation and a few other symbols) (PDF — non-kanji).
Firefox and Opera have different Unicode mappings for one character:
|～ FF5E||~ 7E|
MIME charset label: iso-2022-jp.
For historical reasons unknown to the writer, ESC ( H (technically reserved for ISO646-SE2, a Swedish character set of little use in a Japanese context) may also be used to designate ISO646-JP.
Only the positions 0x5C and 0x7E differ between ISO646-US and ISO646-JP. The following table summarises standards and implementations:
|0x5C||\ 5C||¥ A5||¥ 5C||¥ 5C||¥ A5||\ 5C||\ 5C||¥ A5|
|0x7E||~ 7E||¯ AF||~ 7E||~ 7E||‾ 203E||~ 7E||~ 7E||‾ 203E|
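As a concrete instance of one of the variants in the table, the sketch below decodes ISO646-JP bytes by mapping 0x5C to U+00A5 (yen sign) and 0x7E to U+203E (overline) and leaving every other 7-bit byte as in ISO646-US. The function name is hypothetical, and the other columns of the table would need different replacement values.

```python
# The only two positions that differ from ISO646-US.
ISO646_JP_DIFFERENCES = {0x5C: 0x00A5, 0x7E: 0x203E}


def decode_iso646_jp(data: bytes) -> str:
    """Decode 7-bit ISO646-JP bytes; raises UnicodeDecodeError on 8-bit input."""
    return data.decode("ascii").translate(ISO646_JP_DIFFERENCES)
```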
The escape sequences ESC $ @, officially designating the 1978 version, and ESC $ B, officially designating the 1983 version, both select the newer 1990 vintage, whose official two-part escape sequence ESC & @ ESC $ B remains unrecognised.
Implementation error: In IE, the escape sequence ESC $ ( D (see below) also designates this character set, thus making JIS X 0212 inaccessible.
All browsers include IBM and NEC extensions. IE additionally includes 63 half-width katakana (the ones from JIS X 0201) in Row 10, presumably a subset of NEC’s Row 10.
The escape sequence ESC $ ( D works as expected in Firefox and Opera. IE misinterprets it (see above). Safari does not recognise it at all.
All browsers recognise the escape sequence ESC ( I.
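The escape sequences discussed above can be recognised with a small segmenter. This is a minimal sketch, not a description of any browser's decoder: it treats ESC $ @ and ESC $ B as the same set, accepts ESC ( H as an alias for ISO646-JP, and ignores the two-part sequence ESC & @ ESC $ B. The sequences ESC ( B (ISO646-US) and ESC ( J (ISO646-JP) are not quoted in the text above and are assumed here from the ISO-2022-JP definition. All names are hypothetical.

```python
ESC = 0x1B

# Bytes following ESC -> name of the designated character set.
DESIGNATIONS = {
    b"(B": "ISO646-US",
    b"(J": "ISO646-JP",
    b"(H": "ISO646-JP",            # historical alias
    b"(I": "JIS X 0201 katakana",
    b"$@": "JIS X 0208",
    b"$B": "JIS X 0208",
    b"$(D": "JIS X 0212",
}


def segment(data: bytes) -> list[tuple[str, bytes]]:
    """Split an ISO-2022-JP byte stream into (character set, raw bytes) runs."""
    charset = "ISO646-US"          # initial state at the start of the stream
    segments: list[tuple[str, bytes]] = []
    run = bytearray()
    i = 0
    while i < len(data):
        if data[i] != ESC:
            run.append(data[i])
            i += 1
            continue
        for seq, name in DESIGNATIONS.items():
            if data.startswith(seq, i + 1):
                if run:            # close the run collected so far
                    segments.append((charset, bytes(run)))
                    run = bytearray()
                charset = name
                i += 1 + len(seq)
                break
        else:
            raise ValueError(f"unrecognised escape sequence at offset {i}")
    if run:
        segments.append((charset, bytes(run)))
    return segments
```

A real decoder would go on to interpret each run according to its character set; the sketch stops at identifying the runs.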
Only Internet Explorer recognises SO (shift out) as a method of switching to half-width katakana.
Opera and Internet Explorer interpret 8-bit characters as half-width katakana; Opera does so only while in ISO646-JP mode.
Opera interprets 8-bit characters in ISO646-US according to ISO 8859/1.
Internet Explorer often interprets 8-bit bytes and other undefined bytes according to (or inspired by) Shift-JIS. A hybrid of ISO-2022-JP and Shift-JIS might not be a bad idea to deal with mislabelled material, but the actual implementation is much more complex than seems to be necessary. Details may be added later.
Code set 0 (7-bit characters) is assigned to ISO646-US. Safari and IE display the backslash as a yen sign (cf. ISO-2022-JP above).
Code set 1 (unprefixed 8-bit characters) encodes JIS X 0208-1990 with NEC extensions.
Code set 2 (8-bit characters prefixed by SS2, 0x8E) encodes half-width katakana.
Safari includes the following characters in the range 0xE0–0xE4: ¢, £, ¬, ¥, ~.
Code set 3 (8-bit characters prefixed by SS3, 0x8F) encodes JIS X 0212-1990. Firefox and Opera provide complete implementations of this code set, whereas Safari’s implementation only covers around five per cent of the characters and Internet Explorer has no implementation at all.
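The four code sets described above can be told apart from the first byte of each character. The sketch below splits an EUC-JP byte stream accordingly; it checks only the first byte and the length, not whether the following bytes are valid or assigned. The function name is hypothetical.

```python
SS2 = 0x8E  # prefix for code set 2 (half-width katakana)
SS3 = 0x8F  # prefix for code set 3 (JIS X 0212)


def split_euc_jp(data: bytes) -> list[tuple[int, bytes]]:
    """Split EUC-JP bytes into (code set number, raw bytes) pairs."""
    tokens: list[tuple[int, bytes]] = []
    i = 0
    while i < len(data):
        first = data[i]
        if first < 0x80:
            code_set, size = 0, 1      # 7-bit character
        elif first == SS2:
            code_set, size = 2, 2      # SS2 + one byte
        elif first == SS3:
            code_set, size = 3, 3      # SS3 + two bytes
        elif 0xA1 <= first <= 0xFE:
            code_set, size = 1, 2      # two unprefixed 8-bit bytes
        else:
            raise ValueError(f"unexpected byte 0x{first:02X} at offset {i}")
        chunk = data[i:i + size]
        if len(chunk) < size:
            raise ValueError(f"truncated character at offset {i}")
        tokens.append((code_set, chunk))
        i += size
    return tokens
```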
7-bit bytes are assigned to ISO646-US characters. Safari and IE display the backslash as a yen sign (exactly as for EUC-JP).
8-bit bytes defined in JIS X 0201 encode half-width katakana.
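Assuming the encoding described here is Shift-JIS, the role of a byte can be read off its value. The sketch below classifies a single byte; the katakana range 0xA1 to 0xDF follows from JIS X 0201, while the lead-byte ranges 0x81 to 0x9F and 0xE0 to 0xFC are the customary Shift-JIS ranges and are not stated in the text above. The function name and the labels are hypothetical.

```python
def classify_sjis_byte(value: int) -> str:
    """Classify one byte as it would appear at the start of a Shift-JIS character."""
    if not 0 <= value <= 0xFF:
        raise ValueError("not a byte value")
    if value < 0x80:
        return "7-bit"
    if 0xA1 <= value <= 0xDF:
        return "half-width katakana"
    if 0x81 <= value <= 0x9F or 0xE0 <= value <= 0xFC:
        return "lead byte"
    return "undefined"
```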