Windows encodings

The Windows encodings listed below define the upper half of eight-bit encoding vectors (i.e., positions 128..255), the lower half 0..127 being mapped to the corresponding Unicode range U+0..U+7F.

Otherwise undefined bytes in the range 128..159 are mapped to control characters U+80..U+9F by Microsoft software, an approach generally followed by others. On the other hand, Microsoft’s mapping of undefined bytes in the range 160..255 to Unicode private-use characters is probably best avoided.
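
As a minimal sketch of this convention (not any particular browser's implementation), the following Python function decodes a byte string one byte at a time with one of the standard library's Windows codecs. When the codec leaves a byte in 128..159 undefined, it falls back to the control character with the same code point. The function name `decode_windows` and the use of U+FFFD for undefined bytes in 160..255 are assumptions made for illustration. The sketch also relies on Python's codecs rejecting undefined bytes, which is the case for `cp1252` bytes such as 0x81.

```python
def decode_windows(data: bytes, encoding: str = "cp1252") -> str:
    """Decode single-byte Windows text, tolerating undefined bytes.

    Hypothetical helper: undefined bytes in 128..159 become the C1
    control characters U+80..U+9F; undefined bytes in 160..255 become
    U+FFFD (an assumption, chosen to avoid private-use characters).
    """
    out = []
    for b in data:
        try:
            # Defined positions decode normally through the codec.
            out.append(bytes([b]).decode(encoding))
        except UnicodeDecodeError:
            if 0x80 <= b <= 0x9F:
                # Undefined in 128..159: map to the same code point.
                out.append(chr(b))
            else:
                # Undefined in 160..255: use the replacement character.
                out.append("\ufffd")
    return "".join(out)
```

For example, `decode_windows(b"\x80\x81")` yields the euro sign followed by the control character U+81, since 0x80 is defined in Windows-1252 and 0x81 is not.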

Windows-1250 (Central Europe)

windows-1250 | 254 | 1 | 252 | 3 | 249 | 1 + 5 | 250 | 5
PDF. Ref.: Microsoft Windows 1250

Windows-1251 (Cyrillic)

windows-1251 | 254 | 1 | 252 | 3 | 253 | 1 + 1 | 254 | 1
PDF. Ref.: Microsoft Windows 1251

Windows-1252 (Latin I)

windows-1252 | 254 | 1 | 252 | 3 | 254 | 1 | 250 | 5
iso-8859-1 | 254 | 1 | 252 | 3 | 254 | 1 | 250 | 5
PDF. Ref.: Microsoft Windows 1252

All browsers interpret the label ISO 8859/1 as this encoding rather than as ISO 8859/1 itself.
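
The practical effect of this substitution can be sketched in Python. The override table `LABEL_OVERRIDES` and the function `decode_as_browser` are hypothetical names introduced for illustration only. The codec labels `windows-1252` and `iso-8859-1` are ones the Python standard library accepts.

```python
# Hypothetical override table: labels that are decoded as a Windows
# encoding rather than as the ISO standard they nominally name.
LABEL_OVERRIDES = {"iso-8859-1": "windows-1252"}


def decode_as_browser(data: bytes, label: str) -> str:
    """Decode data, applying the label override first if one exists."""
    name = LABEL_OVERRIDES.get(label.strip().lower(), label)
    return data.decode(name)


# Byte 0x80 shows the difference: ISO 8859/1 proper gives the control
# character U+80, whereas Windows-1252 gives the euro sign U+20AC.
strict = b"\x80".decode("iso-8859-1")
browser = decode_as_browser(b"\x80", "ISO-8859-1")
```

The two results differ only for bytes in 128..159, where ISO 8859/1 has control characters and Windows-1252 has printable ones.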

Windows-1253 (Greek)

windows-1253 | 253 | 1 + 1 | 249 | 3 + 3 | 240 | 1 + 14 | 241 | 14
PDF. Ref.: Microsoft Windows 1253

Safari incorrectly maps 0xAA to U+AA. The bug has been reported to WebKit and to ICU, which has acknowledged the error.

Windows-1254 (Turkish)

windows-1254 | 254 | 1 | 252 | 3 | 247 | 1 + 7 | 248 | 7
iso-8859-9 | 254 | 1 | 252 | 3 | 221 | 1 + 25 + 8 | 230 | 25
PDF. Ref.: Microsoft Windows 1254

Some browsers interpret the label ISO 8859/9 as this encoding rather than as ISO 8859/9 itself.

Windows-1255 (Hebrew)

windows-1255 | 254 | 1 | 241 | 3 + 1 + 10 | 242 | 1 + 12 | 243 | 12
PDF. Ref.: Microsoft Windows 1255

Internet Explorer maps 0xCA to U+5BA, which seems reasonable. This contradicts Microsoft’s standard character-set reference, but matches the ‘best-fit’ mapping.

Windows-1256 (Arabic)

windows-1256 | 254 | 1 | 252 | 3 | 254 | 1 | 255
PDF. Ref.: Microsoft Windows 1256

Windows-1257 (Baltic)

windows-1257 | 254 | 1 | 250 | 3 + 2 | 244 | 1 + 10 | 245 | 10
PDF. Ref.: Microsoft Windows 1257

Windows-1258 (Vietnam)

windows-1258 | 254 | 1 | 252 | 3 | 245 | 1 + 9 | 246 | 9
PDF. Ref.: Microsoft Windows 1258

Windows-874 (Thai)

windows-874 | 246 | 1 + 8 | 244 | 3 + 8 | 231 | 1 + 23 | 232 | 23
iso-8859-11 | 246 | 1 + 8 | 244 | 3 + 8 | 231 | 1 + 23 | 232 | 23
PDF. Ref.: Microsoft Windows 874

All browsers interpret the label ISO 8859/11 as this encoding rather than as ISO 8859/11 itself.
