| Name | Tables | ∆ |
|---|---|---|
| dos-864 (ibm864) | PDF JS | |
| dos-866 (ibm866) | PDF JS | |
| iso-8859-2 | PDF JS | |
| iso-8859-3 | PDF JS | |
| iso-8859-4 | PDF JS | |
| iso-8859-5 | PDF JS | |
| iso-8859-6 | PDF JS | |
| iso-8859-7 | PDF JS | |
| iso-8859-8 | PDF JS | |
| iso-8859-10 | PDF JS | |
| iso-8859-13 | PDF JS | |
| iso-8859-14 | PDF JS | |
| iso-8859-15 | PDF JS | |
| iso-8859-16 | PDF JS | |
| koi8-r | PDF JS | |
| koi8-u | PDF JS | |
| macintosh | PDF JS | |
| windows-874 | PDF JS | |
| windows-1250 | PDF JS | |
| windows-1251 | PDF JS | |
| windows-1252 | PDF JS | |
| windows-1253 | PDF JS | |
| windows-1254 | PDF JS | |
| windows-1255 | PDF JS | |
| windows-1256 | PDF JS | |
| windows-1257 | PDF JS | |
| windows-1258 | PDF JS | |
| mac-cyrillic (x-mac-cyrillic) | PDF JS | |
The inclusion of Code page 864 (Arabic) is somewhat surprising. No proper reference has been found, a few positions differ between implementations, and it is unclear whether the mapping to presentation forms is essential or incidental.
It may be that KOI8-RU should be included instead of KOI8-U.
(Previous versions of Van Kesteren’s draft included spurious mappings for a few undefined positions in Windows-874 and Windows-1253. This has now been corrected.)
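
As a rough illustration of why the choice of single-byte table matters, the following Python sketch decodes one byte under several of the encodings listed above. The Python codec names (`cp1252`, `cp1251`, `koi8_r`, `iso8859_5`, `cp866`) are assumptions standing in for the table names; they agree with the tables at this position, but that need not hold for every position or every implementation.

```python
# Map the table names used above to Python codec names (an assumption:
# Python's bundled codecs are not guaranteed to match the tables everywhere).
CODECS = {
    "windows-1252": "cp1252",
    "windows-1251": "cp1251",
    "koi8-r": "koi8_r",
    "iso-8859-5": "iso8859_5",
    "dos-866": "cp866",
}


def decode_everywhere(byte_value: int) -> dict:
    """Decode one byte under each codec; undefined positions become None."""
    result = {}
    for name, codec in CODECS.items():
        try:
            result[name] = bytes([byte_value]).decode(codec)
        except UnicodeDecodeError:
            result[name] = None
    return result


if __name__ == "__main__":
    # The same byte yields a different character in almost every table.
    for name, char in decode_everywhere(0xE4).items():
        print(f"{name:14} 0xE4 -> {char!r}")
```

The byte 0xE4 is a Latin letter in windows-1252 and three different Cyrillic letters across the other four tables, so a wrong guess at the encoding silently produces the wrong text rather than an error.
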
The character set tables whose name starts with ‘u-’ are derived from Unihan, whereas ‘o-’ indicates a complementary table listing characters missing from Unihan.
Character sets: u-g0 o-g0 o-gbk1 €.
IE decodes 8-bit bytes according to (a version of) EUC/GBK.
A (circle) character sets:
u-g0
o-g0
o-gbk1
€.
B (diamond) character set:
u-gbk3.
C (square) character sets:
u-gbk4
u-g9
o-g9
o-gbk5.
GB18030 includes additional characters not mentioned above.
A (circle) character sets:
u-big5-1
u-big5-2
o-big5-1
€
eten1
eten2
eten1-hk.
B (diamond) character sets:
u-h
o-h
o-h-comp.
ESC $ @ and ESC $ B character sets:
u-j0
o-j0
u-nec
o-nec.
ESC $ ( D character sets:
u-j1
o-j1.
ESC ( I and SI and 8-bit (single-octet) character set:
jis-x-0201.
Non-Japanese character sets are not mentioned here.
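
The escape sequences above can be located in a byte stream with a small scanner. The sketch below is illustrative only: the function name `designations` and the lookup table are hypothetical, the mapping to table names simply restates the lists above, and the `ESC ( B` entry (return to ASCII) is added from general knowledge of ISO-2022-JP rather than from the lists.

```python
ESC = 0x1B

# Bytes following ESC, mapped to the table groups listed above.
# "ascii" for ESC ( B is an addition for completeness, not one of the tables.
DESIGNATIONS = {
    b"$@": ("u-j0", "o-j0", "u-nec", "o-nec"),
    b"$B": ("u-j0", "o-j0", "u-nec", "o-nec"),
    b"$(D": ("u-j1", "o-j1"),
    b"(I": ("jis-x-0201",),
    b"(B": ("ascii",),
}


def designations(data: bytes) -> list:
    """Return (escape tail, table names) for each known escape sequence."""
    found = []
    i = 0
    while i < len(data):
        step = 1
        if data[i] == ESC:
            for tail, tables in DESIGNATIONS.items():
                if data.startswith(tail, i + 1):
                    found.append((tail, tables))
                    step = 1 + len(tail)
                    break
        i += step
    return found


if __name__ == "__main__":
    # Python's iso2022_jp codec is used here only to produce sample input.
    sample = "\u65e5\u672c".encode("iso2022_jp")
    for tail, tables in designations(sample):
        print(b"ESC " + tail, "->", ", ".join(tables))
```
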
IE decodes 8-bit bytes according to (a version of) Shift-JIS.
Character sets 1 (unprefixed):
u-j0
o-j0
u-nec
o-nec.
Character set 2 (SS2 prefix, single-octet):
jis-x-0201.
Character sets 3 (SS3 prefix):
u-j1
o-j1.
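
The three groups above are distinguished purely by the lead byte, which a short splitter makes concrete. This is a minimal sketch, not a validating decoder: the function name `split_euc_jp` and the labels `set1`, `set2`, `set3` are hypothetical, SS2 and SS3 are taken to be the bytes 0x8E and 0x8F, and trail bytes are not range-checked.

```python
SS2 = 0x8E  # single-shift 2: prefixes one jis-x-0201 byte
SS3 = 0x8F  # single-shift 3: prefixes a two-byte code from character sets 3


def split_euc_jp(data: bytes) -> list:
    """Split a byte string into (kind, unit) pairs based on the lead byte."""
    units = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            kind, size = "ascii", 1
        elif lead == SS2:
            kind, size = "set2", 2
        elif lead == SS3:
            kind, size = "set3", 3
        else:
            kind, size = "set1", 2
        # A truncated final unit is returned as-is rather than rejected.
        units.append((kind, data[i:i + size]))
        i += size
    return units


if __name__ == "__main__":
    sample = b"A\x8e\xb1\x8f\xb0\xa1\xc6\xfc"
    for kind, unit in split_euc_jp(sample):
        print(kind, unit.hex())
```
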
Character set (single-octet):
jis-x-0201.
Character sets:
u-j0
o-j0
u-ibmjapan
o-ibmjapan
u-nec
o-nec.
Character sets: u-ksc0 o-k0 wansung.
8-byte hangul sequences are usually not supported.
IE decodes 8-bit bytes according to (a version of) EUC-KR.
A (circle) character sets:
u-ksc0
o-k0
wansung.
B (diamond) character set:
uhc.
8-byte hangul?
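
To make the 8-byte hangul question concrete, the sketch below compares how one syllable is encoded under two Python codecs. It rests on assumptions: that CPython's bundled `euc_kr` codec emits the 8-byte sequence for syllables outside the basic set, that `cp949` corresponds to the uhc table, and that U+B620 is such a syllable. Other implementations may simply reject the character. The helper name `encoded_lengths` is hypothetical.

```python
# U+B620 is assumed to lie outside the basic two-byte hangul repertoire;
# U+AC00 is the first hangul syllable and is assumed to lie inside it.
OUTSIDE = "\ub620"
INSIDE = "\uac00"


def encoded_lengths(text: str) -> dict:
    """Return the encoded byte length of text under each codec."""
    return {codec: len(text.encode(codec)) for codec in ("euc_kr", "cp949")}


if __name__ == "__main__":
    for label, syllable in (("inside", INSIDE), ("outside", OUTSIDE)):
        print(label, encoded_lengths(syllable))
        print("  euc_kr bytes:", syllable.encode("euc_kr").hex())
```

Under these assumptions the syllable outside the basic set takes eight bytes in `euc_kr` but only two in `cp949`, which is one way to see why the extended table is the more practical target for a decoder.
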