DOS encodings

The DOS encodings listed below define the upper half of eight-bit encoding vectors (i.e., positions 128..255); the lower half, 0..127, maps to the corresponding Unicode range U+0000..U+007F (with one possible exception: 0x25 in code page 864 should perhaps map to the Arabic percent sign, U+066A, rather than the standard one). Safari’s implementation of most of these encodings appears to be based on tables from IBM that map 0x1A to U+001C, 0x1C to U+007F and 0x7F to U+001A (‘big blue control code carousel’).
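The split between the two halves, and the control code rotation, can be illustrated with a minimal Python sketch. It assumes Python's bundled `cp437` codec as a stand-in for any of the code pages below; the helper name `ibm_control_remap` is hypothetical, and the rotation is taken from the description above, not from Safari's source.

```python
# Lower half (0..127) and upper half (128..255) of an eight-bit vector.
lower = bytes(range(0x80))
upper = bytes(range(0x80, 0x100))

# Python's cp437 codec serves purely as an example table here.
lower_is_ascii = lower.decode("cp437") == lower.decode("ascii")
upper_text = upper.decode("cp437")

# The three-way rotation described above:
# 0x1A -> U+001C, 0x1C -> U+007F, 0x7F -> U+001A.
CAROUSEL = {0x1A: "\u001c", 0x1C: "\u007f", 0x7F: "\u001a"}


def ibm_control_remap(data: bytes, codec: str = "cp437") -> str:
    """Decode data byte by byte, applying the IBM-style rotation."""
    out = []
    for b in data:
        if b in CAROUSEL:
            out.append(CAROUSEL[b])
        else:
            out.append(bytes([b]).decode(codec))
    return "".join(out)
```

With this sketch, `ibm_control_remap(b"A\x1a")` yields `"A\x1c"`, whereas a plain `b"A\x1a".decode("cp437")` keeps U+001A in place.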

The main reference is Nadine Kano’s Developing International Software for Windows 95 and Windows NT, 1st Edition, 1995, referred to as DIS1 below. Many encodings are also documented on the Microsoft Developer Network website.

Most of these encodings see little use today. Firefox supports around a third of the encodings below, whereas only one has been implemented in Opera. Safari and Internet Explorer, both of which use external libraries, provide more complete support.

Code page 437: MS-DOS Latin US

ibm437 250
1 + 1 + 3
252
3
PDF. Refs: DIS1 p. 488, MSDN.

Safari substitutes (real) Greek mu for micro sign.
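The distinction between the two characters can be made concrete with a short Python sketch. It assumes Python's bundled `cp437` table, which decodes byte 0xE6 to the micro sign; the variable names are illustrative only.

```python
import unicodedata

# In Python's cp437 table, byte 0xE6 decodes to U+00B5 MICRO SIGN.
micro = b"\xe6".decode("cp437")

# U+03BC GREEK SMALL LETTER MU is a separate code point.
mu = "\u03bc"

# The two only coincide under compatibility normalization (NFKC);
# canonical normalization (NFC) keeps them apart.
same_after_nfkc = unicodedata.normalize("NFKC", micro) == mu
same_after_nfc = unicodedata.normalize("NFC", micro) == mu
```

A decoder that emits U+03BC for this byte therefore produces text that compares unequal to the Microsoft-style result unless both sides are NFKC-normalized first.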

Code page 708: MS-DOS Arabic ASMO

asmo-708 245
3 + 7
PDF. Ref.: DIS1 p. 489.

This encoding is an almost compatible superset of ISO 8859/6 (all Arabic letters in the same positions, only one incompatible assignment), adding line-drawing characters and lowercase French accented vowels. Only Internet Explorer actually supports this encoding; the others take ‘asmo-708’ to mean ISO 8859/6.

Code page 720: MS-DOS Arabic

dos-720 252
3
PDF. Ref.: MSDN.

Support for this encoding has only been confirmed in Internet Explorer. Safari incorrectly takes ‘dos-720’ to mean code page 864.

Code page 737: MS-DOS Greek

ibm737 254
1
252
3
PDF. Refs: DIS1 p. 490, MSDN.

Code page 775: MS-DOS Baltic Rim

ibm775 251
1 + 3
252
3
PDF. Refs: DIS1 p. 491, MSDN.

Code page 850: MS-DOS Latin 1

ibm850 251
1 + 3
252
3
253
1 + 1
PDF. Refs: DIS1 p. 492, MSDN.

Firefox decodes this as Code page 858 (Latin 1 + Euro).

Code page 851: MS-DOS Greek 1

ibm851 249
1 + 1 + 1 + 3
PDF. Ref.: DIS1 p. 493.

Safari substitutes acute accent for the visually similar Greek tonos. No support in Internet Explorer, which suggests that this particular MS-DOS encoding is not all that popular. Code page 869 (Greek 2) decodes all Greek letters in this encoding correctly.

Code page 852: MS-DOS Latin 2

ibm852 250
1 + 1 + 3
252
3
253
1 + 1
PDF. Refs: DIS1 p. 494, MSDN.

There appears to be some confusion surrounding position 0xAA: Microsoft (and Internet Explorer) assigns a not sign (U+00AC), IBM’s original table left it undefined (as does Safari), and IBM’s updated table (and Firefox) maps it to the euro sign.
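A disputed position like this one can be probed directly in any decoder. The following Python sketch assumes Python's bundled `cp852` table, which appears to follow the Microsoft assignment; `probe` is a hypothetical helper name, and the `ascii` call only demonstrates how an undefined byte is reported.

```python
def probe(byte: int, codec: str):
    """Return the character a codec assigns to one byte, or None if undefined."""
    try:
        return bytes([byte]).decode(codec)
    except UnicodeDecodeError:
        return None


# Python's cp852 table yields the not sign at 0xAA.
at_aa = probe(0xAA, "cp852")

# An unassigned byte is reported as None, e.g. 0x80 in plain ASCII.
undefined = probe(0x80, "ascii")
```

Running the same probe against a table derived from IBM's original or updated mapping would be expected to give `None` or the euro sign instead, as described above.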

Code page 855: MS-DOS Cyrillic

ibm855 251
1 + 3
252
3
254
1
PDF. Refs: DIS1 p. 495 (0xFD undefined), MSDN.

Code page 857: MS-DOS Turkish

ibm857 251
1 + 3
249
3 + 3
251
1 + 3
PDF. Refs: DIS1 p. 496, MSDN.

Firefox uses IBM’s updated table with euro and incorrectly assigns characters already encoded elsewhere to two (other) undefined bytes.

Code page 858: MS-DOS Latin 1 + Euro

ibm00858 251
1 + 3
252
3
ibm850 254
1
PDF. Ref.: MSDN.

This encoding is almost identical to Code page 850 (Latin 1), the only difference being the euro sign replacing dotless i. Firefox incorrectly takes ibm850 to mean this encoding.
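The single-position difference can be verified mechanically. This is a minimal Python sketch assuming Python's bundled `cp850` and `cp858` codecs; `table_diff` is a hypothetical helper, not part of any library.

```python
def table_diff(codec_a: str, codec_b: str) -> dict:
    """Map each byte that decodes differently to its (a, b) character pair."""
    diffs = {}
    for b in range(256):
        raw = bytes([b])
        # errors="replace" turns undefined bytes into U+FFFD instead of raising.
        ch_a = raw.decode(codec_a, errors="replace")
        ch_b = raw.decode(codec_b, errors="replace")
        if ch_a != ch_b:
            diffs[b] = (ch_a, ch_b)
    return diffs


diff_850_858 = table_diff("cp850", "cp858")
```

With Python's tables the result has a single entry, at byte 0xD5: dotless i (U+0131) in code page 850 and the euro sign (U+20AC) in code page 858.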

Code page 860: MS-DOS Portuguese

ibm860 250
1 + 1 + 3
252
3
PDF. Ref.: DIS1 p. 497.

Based on Code page 437. Safari substitutes (real) Greek mu for micro sign.

Code page 861: MS-DOS Icelandic

ibm861 250
1 + 1 + 3
252
3
PDF. Ref.: DIS1 p. 498.

Based on Code page 437. Safari substitutes (real) Greek mu for micro sign.

Code page 862: MS-DOS Hebrew

ibm862 250
1 + 1 + 3
252
3
242
1 + 12
PDF. Refs: DIS1 p. 499, MSDN.

Safari substitutes (real) Greek mu for micro sign. Firefox actually implements code page 867 instead.

Code page 863: MS-DOS French Canada

ibm863 250
1 + 1 + 3
252
3
PDF. Ref.: DIS1 p. 500.

Based on Code page 437. Safari substitutes (real) Greek mu for micro sign.

Code page 864: [Arabic]

ibm864 245
1 + 4 + 2 + 3
249
3 + 3
247
1 + 7
PDF. Ref. missing.

Firefox uses IBM’s updated table with euro and incorrectly assigns characters already encoded elsewhere to four or five (other) undefined bytes. There is some further variation.

Code page 865: MS-DOS Nordic

ibm865 250
1 + 1 + 3
252
3
PDF. Ref.: DIS1 p. 501.

Based on Code page 437 (only differs in three positions). Safari substitutes (real) Greek mu for micro sign.
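The three differing positions can be listed with a few lines of Python, again assuming Python's bundled `cp437` and `cp865` codecs as the reference tables (both define all 256 bytes, so a strict decode is sufficient).

```python
all_bytes = bytes(range(256))

# Decode the full byte range once with each table.
cp437 = all_bytes.decode("cp437")
cp865 = all_bytes.decode("cp865")

# Collect the byte positions where the two tables disagree.
differing = [hex(i) for i, (a, b) in enumerate(zip(cp437, cp865)) if a != b]
```

With Python's tables, `differing` is `['0x9b', '0x9d', '0xaf']`: the Nordic page puts ø, Ø and ¤ where code page 437 has ¢, ¥ and ».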

Code page 866: MS-DOS Cyrillic CIS 1

ibm866 251
1 + 3
252
3
254
1
255
PDF. Refs: DIS1 p. 502, MSDN.

Code page 869: MS-DOS Greek 2

ibm869 240
1 + 2 + 9 + 3
252
3
PDF. Ref.: DIS1 p. 503.

Based on Code page 851 (Greek 1); removes accented Latin letters and adds some symbols and Greek uppercase Ϊ and Ϋ. Safari substitutes acute accent for the visually similar Greek tonos and uses U+0387 instead of the canonically equivalent U+00B7 middle dot.
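The two substitutions differ in kind, which Unicode normalization makes visible. This is a minimal sketch using Python's standard `unicodedata` module; the variable names are illustrative only.

```python
import unicodedata

ano_teleia, middle_dot = "\u0387", "\u00b7"
tonos, acute = "\u0384", "\u00b4"

# U+0387 is canonically equivalent to U+00B7: NFC folds it away.
dot_canonical = unicodedata.normalize("NFC", ano_teleia) == middle_dot

# Tonos and acute accent are NOT canonically equivalent ...
accent_canonical = (
    unicodedata.normalize("NFD", tonos) == unicodedata.normalize("NFD", acute)
)

# ... but both share the same compatibility decomposition (space + U+0301).
accent_compat = (
    unicodedata.normalize("NFKD", tonos) == unicodedata.normalize("NFKD", acute)
)
```

The middle dot substitution therefore disappears after canonical normalization, while the tonos substitution survives NFC and NFD and only vanishes under the lossier compatibility forms.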
