Ref.: ISO-IR 013 The Japanese KATAKANA graphic set of characters / Jeu de caractères graphiques japonais KATAKANA.
This small character set contains only katakana (apart from a few Japanese punctuation marks), mapped to halfwidth characters in Unicode (PDF).
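The 63 katakana are contiguous in both JIS X 0201 and Unicode, so mapping them to half-width forms is a fixed offset. The following Python sketch assumes the usual byte ranges: 0x21–0x5F in the 7-bit form and 0xA1–0xDF in the 8-bit form. The function name is hypothetical.

```python
# Hypothetical helper: map one JIS X 0201 katakana byte to Unicode.
# The 63 characters are contiguous in both standards, so a fixed offset
# from U+FF61 (HALFWIDTH IDEOGRAPHIC FULL STOP) is enough.

def jisx0201_kana_to_unicode(byte: int) -> str:
    if 0xA1 <= byte <= 0xDF:      # 8-bit form (e.g. Shift-JIS, EUC-JP after SS2)
        offset = byte - 0xA1
    elif 0x21 <= byte <= 0x5F:    # 7-bit form, selected in ISO-2022-JP by ESC ( I
        offset = byte - 0x21
    else:
        raise ValueError(f"0x{byte:02X} is not a JIS X 0201 katakana byte")
    return chr(0xFF61 + offset)


print(jisx0201_kana_to_unicode(0xB1))  # ｱ (U+FF71)
```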
Ref.: ISO-IR 168 Japanese Graphic Character Set for Information Interchange.
Unihan J0 = Jis0 enumerates the 2,965 JIS Level 1 kanji (Chinese characters or 漢字) in Rows 16–47, the 3,390 JIS Level 2 kanji in Rows 48–84 and the single kanji 仝 in Row 1 (considered a symbol rather than a kanji by this Japanese standard), 6,356 kanji in total (PDF — kanji).
Apart from 仝 (mentioned above), Rows 1–8 consist of 523 non-kanji: alphabetic characters (Hiragana, Katakana, Latin, Greek and Cyrillic), miscellaneous symbols and line-drawing elements (PDF — non-kanji).
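A row-cell position translates mechanically into the byte pairs used by the three common encodings. This minimal Python sketch follows the usual conventions: ISO-2022-JP adds 0x20 to each coordinate, EUC-JP adds 0xA0, and Shift-JIS packs two rows into each lead byte. The function name and dictionary keys are hypothetical.

```python
# Hypothetical helper: JIS X 0208 row-cell (ku-ten) -> encoded byte pairs.

def rowcell_to_bytes(row: int, cell: int) -> dict[str, bytes]:
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")

    # ISO-2022-JP (after ESC $ B): add 0x20, giving bytes 0x21-0x7E.
    jis = bytes([row + 0x20, cell + 0x20])

    # EUC-JP code set 1: the same bytes with the high bit set (0xA1-0xFE).
    euc = bytes([row + 0xA0, cell + 0xA0])

    # Shift-JIS: two rows share one lead byte; lead bytes jump from 0x9F to
    # 0xE0 so that 0xA1-0xDF stay free for single-byte half-width katakana.
    lead = ((row - 1) >> 1) + (0x81 if row <= 62 else 0xC1)
    if row % 2 == 1:
        # Odd row: trail bytes 0x40-0x9E, skipping 0x7F.
        trail = cell + 0x3F if cell <= 63 else cell + 0x40
    else:
        # Even row: trail bytes 0x9F-0xFC.
        trail = cell + 0x9E
    sjis = bytes([lead, trail])

    return {"iso-2022-jp": jis, "euc-jp": euc, "shift_jis": sjis}
```

For example, `rowcell_to_bytes(16, 1)` (亜, the first Level 1 kanji) gives `b'0!'`, `b'\xb0\xa1'` and `b'\x88\x9f'` respectively.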
Firefox has different Unicode mappings for seven characters:
| Other browsers | Firefox |
|---|---|
| ― 2015 | — 2014 |
| ～ FF5E | 〜 301C |
| ∥ 2225 | ‖ 2016 |
| － FF0D | − 2212 |
| ￠ FFE0 | ¢ A2 |
| ￡ FFE1 | £ A3 |
| ￢ FFE2 | ¬ AC |
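When text decoded under both conventions has to be compared, the seven differing characters can be folded onto one form. This Python sketch maps Firefox's choices onto those of the other browsers. The table and function names are hypothetical, and ¬ is assumed to map to U+FFE2 FULLWIDTH NOT SIGN, as in Microsoft's code page 932. Applied to arbitrary text, it would also rewrite genuine Latin-1 ¢, £ and ¬, so it is only safe on text known to come from JIS X 0208.

```python
# Hypothetical folding table: Firefox's mapping -> other browsers' mapping.
FIREFOX_TO_OTHERS = {
    "\u2014": "\u2015",  # EM DASH -> HORIZONTAL BAR
    "\u301C": "\uFF5E",  # WAVE DASH -> FULLWIDTH TILDE
    "\u2016": "\u2225",  # DOUBLE VERTICAL LINE -> PARALLEL TO
    "\u2212": "\uFF0D",  # MINUS SIGN -> FULLWIDTH HYPHEN-MINUS
    "\u00A2": "\uFFE0",  # CENT SIGN -> FULLWIDTH CENT SIGN
    "\u00A3": "\uFFE1",  # POUND SIGN -> FULLWIDTH POUND SIGN
    "\u00AC": "\uFFE2",  # NOT SIGN -> FULLWIDTH NOT SIGN (assumed, as in CP932)
}

# str.maketrans accepts a dict of single-character keys.
_FOLD = str.maketrans(FIREFOX_TO_OTHERS)


def normalise_jis0208_text(text: str) -> str:
    """Rewrite Firefox-style code points to the other browsers' choices."""
    return text.translate(_FOLD)
```

Python's own codecs show the same split: `shift_jis` decodes 0x8160 as U+301C, while `cp932` gives U+FF5E.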
Unihan IBMJapan enumerates 360 kanji known as ‘IBM Selected Kanji’ encoded in Rows 115–119 (PDF — IBM kanji). There is also a set of 28 ‘IBM Selected Non-kanji’ encoded in Row 115 (PDF — IBM non-kanji). This extension is available only in the Shift-JIS encoding, since ISO-2022 and EUC do not provide rows beyond 94.
NEC (Nippon Electric Company) has defined an alternative encoding of the IBM extension, with the kanji in Rows 89–92 (PDF — IBM–NEC kanji) and the non-kanji split between Row 92 and Row 13, to which NEC has added a number of further non-kanji and ligatures (PDF — IBM–NEC non-kanji).
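To see why these vendor rows are reachable only in Shift-JIS, it helps to invert the Shift-JIS arithmetic. Lead bytes 0x81–0x9F cover rows 1–62, and 0xE0–0xFC cover rows 63–120, two rows per lead byte, so anything above row 94 has no ISO-2022 or EUC counterpart. The helper below is a hypothetical Python sketch. In Microsoft's code page 932, the IBM extension begins at 0xFA40 and NEC's copy of it at 0xED40, which this function places at the start of Rows 115 and 89.

```python
# Hypothetical helper: invert the Shift-JIS arithmetic to recover (row, cell).
# Each lead byte covers two consecutive rows; the trail byte says which one.

def sjis_to_rowcell(lead: int, trail: int) -> tuple[int, int]:
    if 0x81 <= lead <= 0x9F:
        row = (lead - 0x81) * 2 + 1        # rows 1-62
    elif 0xE0 <= lead <= 0xFC:
        row = (lead - 0xC1) * 2 + 1        # rows 63-120 (95 and up: beyond JIS X 0208)
    else:
        raise ValueError(f"0x{lead:02X} is not a Shift-JIS lead byte")

    if 0x40 <= trail <= 0x9E and trail != 0x7F:
        # First (odd) row of the pair; 0x7F is skipped, so cells 64+ shift by one.
        cell = trail - 0x3F if trail < 0x7F else trail - 0x40
    elif 0x9F <= trail <= 0xFC:
        # Second (even) row of the pair.
        row += 1
        cell = trail - 0x9E
    else:
        raise ValueError(f"0x{trail:02X} is not a Shift-JIS trail byte")
    return row, cell
```

For example, `sjis_to_rowcell(0xFA, 0x40)` gives `(115, 1)` and `sjis_to_rowcell(0xED, 0x40)` gives `(89, 1)`.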
Ref.: ISO-IR 159 Supplementary Japanese Graphic Character Set for Information Interchange.
Unihan J1 = Jis1 enumerates the 5,801 Supplemental JIS kanji (Chinese characters or 漢字) in Rows 16–77 (PDF — kanji).
Rows 2–11 contain 266 non-kanji (accented Latin, Greek and Cyrillic letters, diacritics, punctuation and a few other symbols) (PDF — non-kanji).
Firefox and Opera have different Unicode mappings for one character:
| Firefox | Opera |
|---|---|
| ～ FF5E | ~ 7E |
MIME charset label: iso-2022-jp.
In theory, the escape sequence ESC ( B designates ISO646-US and ESC ( J designates ISO646-JP.
For historical reasons unknown to the writer, ESC ( H (technically reserved for ISO646-SE2, a Swedish character set of little use in a Japanese context) may also be used to designate ISO646-JP.
Only the positions 0x5C and 0x7E differ between ISO646-US and ISO646-JP. The following table summarises standards and implementations:
| Byte | ISO646-US | ISO646-JP | IE (B/J/H) | Safari (B) | Safari (J/H) | Firefox (J) | Opera (B) | Opera (J) |
|---|---|---|---|---|---|---|---|---|
| 0x5C | \ 5C | ¥ A5 | ¥ 5C | ¥ 5C | ¥ A5 | \ 5C | \ 5C | ¥ A5 |
| 0x7E | ~ 7E | ¯ AF | ~ 7E | ~ 7E | ‾ 203E | ~ 7E | ~ 7E | ‾ 203E |
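Since only two positions differ, an ISO646-JP decoder is an ISO646-US decoder with two overrides. This hypothetical Python sketch follows the Unicode consortium's JIS0201 mapping: 0x5C is mapped to U+00A5, and 0x7E to U+203E OVERLINE, which is what Safari and Opera do for ESC ( J. It does not use the U+00AF shown in the ISO646-JP column, and a browser-compatible decoder would instead have to pick one column of the table.

```python
# Hypothetical decoder for 7-bit bytes under ISO646-US or ISO646-JP.
# Assumption: ISO646-JP follows the Unicode JIS0201 mapping, i.e.
# 0x5C -> U+00A5 YEN SIGN and 0x7E -> U+203E OVERLINE.

ISO646_JP_DIFFERENCES = {0x5C: "\u00A5", 0x7E: "\u203E"}


def decode_iso646(data: bytes, variant: str = "US") -> str:
    if variant not in ("US", "JP"):
        raise ValueError("variant must be 'US' or 'JP'")
    chars = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"0x{b:02X} is not a 7-bit byte")
        if variant == "JP" and b in ISO646_JP_DIFFERENCES:
            chars.append(ISO646_JP_DIFFERENCES[b])
        else:
            # Every other position is identical in both variants.
            chars.append(chr(b))
    return "".join(chars)
```

For example, `decode_iso646(b"\\100", "JP")` returns `"¥100"`, while the same bytes decode to `"\\100"` under `"US"`.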
The escape sequences ESC $ @, officially designating the 1978 version, and ESC $ B, officially designating the 1983 version, both select the newer 1990 vintage, whose official two-part escape sequence ESC & @ ESC $ B remains unrecognised.
Implementation error: in IE, the escape sequence ESC $ ( D (see below) also designates this character set, which makes JIS X 0212 inaccessible.
All browsers include IBM and NEC extensions. IE additionally includes 63 half-width katakana (the ones from JIS X 0201) in Row 10, presumably a subset of NEC’s Row 10.
The escape sequence ESC $ ( D works as expected in Firefox and Opera. IE misinterprets it (see above). Safari does not recognise it at all.
All browsers recognise the escape sequence ESC ( I, which designates the JIS X 0201 katakana set.
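Taken together, the designation rules amount to a small state machine. Each escape sequence switches the character set used for the bytes that follow, until the next escape. The Python sketch below only splits a byte stream into runs. The charset labels are our own. Following the browser behaviour described here, it treats ESC $ @ and ESC $ B alike, maps ESC ( H to ISO646-JP, and rejects ESC & @. It does not handle SO/SI or 8-bit bytes.

```python
# Hypothetical ISO-2022-JP run splitter: returns (charset, payload) pairs.
ESCAPES = {
    b"\x1b(B": "ISO646-US",
    b"\x1b(J": "ISO646-JP",
    b"\x1b(H": "ISO646-JP",            # historical alias, see above
    b"\x1b(I": "JIS X 0201 katakana",
    b"\x1b$@": "JIS X 0208",           # nominally 1978, treated as 1990
    b"\x1b$B": "JIS X 0208",           # nominally 1983, treated as 1990
    b"\x1b$(D": "JIS X 0212",
}


def split_iso2022jp(data: bytes) -> list[tuple[str, bytes]]:
    runs = []
    charset = "ISO646-US"              # initial state
    start = i = 0
    while i < len(data):
        if data[i] != 0x1B:
            i += 1
            continue
        for seq, name in ESCAPES.items():
            if data.startswith(seq, i):
                if i > start:          # flush the run before the escape
                    runs.append((charset, data[start:i]))
                charset = name
                i += len(seq)
                start = i
                break
        else:
            raise ValueError(f"unrecognised escape sequence at offset {i}")
    if len(data) > start:              # flush the final run
        runs.append((charset, data[start:]))
    return runs
```

For example, `split_iso2022jp(b"abc\x1b$B0!\x1b(B")` gives `[("ISO646-US", b"abc"), ("JIS X 0208", b"0!")]`.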
Only Internet Explorer recognises SO (shift out, 0x0E) as a method of switching to half-width katakana.
Opera and Internet Explorer interpret 8-bit bytes as half-width katakana; Opera does so only in ISO646-JP mode.
Firefox recognises the escape sequences ESC . A for ISO 8859-1 (Latin-1), ESC . F for ISO 8859-7 (Greek), etc. [add Chinese and Korean here].
In ISO646-US mode, Opera interprets 8-bit bytes according to ISO 8859-1.
Internet Explorer often interprets 8-bit bytes and other undefined bytes according to (or inspired by) Shift-JIS. A hybrid of ISO-2022-JP and Shift-JIS might not be a bad idea to deal with mislabelled material, but the actual implementation is much more complex than seems to be necessary. Details may be added later.
Code set 0 (7-bit characters) is assigned to ISO646-US. Safari and IE display the backslash as a yen sign (cf. ISO-2022-JP above).
Code set 1 (unprefixed 8-bit characters) encodes JIS X 0208-1990 with NEC extensions.
Code set 2 (8-bit characters prefixed by SS2, 0x8E) encodes half-width katakana.
Safari includes the following characters in the range 0xE0–0xE4: ¢, £, ¬, ¥, ~.
Code set 3 (8-bit characters prefixed by SS3, 0x8F) encodes JIS X 0212-1990. Firefox and Opera provide complete implementations of this code set, whereas Safari’s implementation only covers around five per cent of the characters and Internet Explorer has no implementation at all.
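The first byte alone identifies the code set, which makes EUC-JP easy to segment. This minimal Python sketch, with a hypothetical function name, returns each character's code set together with its bytes. It only checks that trail bytes fall in 0xA1–0xFE and does not validate individual positions.

```python
# Hypothetical EUC-JP segmenter: returns (code_set, bytes) per character.
def split_euc_jp(data: bytes) -> list[tuple[int, bytes]]:
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            code_set, size = 0, 1      # ISO646-US
        elif b == 0x8E:
            code_set, size = 2, 2      # SS2 + one half-width katakana byte
        elif b == 0x8F:
            code_set, size = 3, 3      # SS3 + two JIS X 0212 bytes
        elif 0xA1 <= b <= 0xFE:
            code_set, size = 1, 2      # two JIS X 0208 bytes
        else:
            raise ValueError(f"invalid EUC-JP byte 0x{b:02X} at offset {i}")
        unit = data[i:i + size]
        # Every byte after the first must be in 0xA1-0xFE.
        if len(unit) < size or any(not 0xA1 <= t <= 0xFE for t in unit[1:]):
            raise ValueError(f"truncated or malformed sequence at offset {i}")
        units.append((code_set, unit))
        i += size
    return units
```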
7-bit bytes are assigned to ISO646-US characters. Safari and IE display the backslash as a yen sign (exactly as for EUC-JP).
Single 8-bit bytes in the range 0xA1–0xDF, as defined in JIS X 0201, encode half-width katakana.
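Unlike EUC-JP, Shift-JIS has to decide byte by byte whether a character takes one byte or two. The sketch below assumes the common ranges:

- 0x00–0x7F and 0xA1–0xDF are single bytes.
- Lead bytes 0x81–0x9F and 0xE0–0xFC are followed by a trail byte in 0x40–0xFC other than 0x7F.

Other bytes are rejected, although real decoders may be more lenient. The function name and labels are hypothetical.

```python
# Hypothetical Shift-JIS segmenter: returns (kind, bytes) per character.
def split_shift_jis(data: bytes) -> list[tuple[str, bytes]]:
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            units.append(("ISO646-US", data[i:i + 1]))
            i += 1
        elif 0xA1 <= b <= 0xDF:
            units.append(("half-width katakana", data[i:i + 1]))
            i += 1
        elif 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:
            if i + 1 >= len(data):
                raise ValueError(f"lead byte 0x{b:02X} at end of input")
            t = data[i + 1]
            if not (0x40 <= t <= 0xFC and t != 0x7F):
                raise ValueError(f"invalid trail byte 0x{t:02X} at offset {i + 1}")
            units.append(("double-byte", data[i:i + 2]))
            i += 2
        else:
            # 0x80, 0xA0 and 0xFD-0xFF are left undefined here.
            raise ValueError(f"undefined byte 0x{b:02X} at offset {i}")
    return units
```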