The KOI-8 family of encodings encodes basic Cyrillic letters in such a way that stripping the eighth bit and interpreting the resulting bytes as ISO 646 or ASCII yields a case-swapped Latin transliteration.
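The bit-stripping property can be tried out with a short Python sketch. It assumes Python's built-in `koi8_r` codec; the helper name `strip_eighth_bit` is made up for this illustration.

```python
def strip_eighth_bit(text: str, encoding: str = "koi8_r") -> str:
    """Encode text in a KOI8 variant, clear the high bit of every byte,
    and read the result back as ASCII."""
    raw = text.encode(encoding)
    # Masking with 0x7F clears the eighth (most significant) bit.
    return bytes(b & 0x7F for b in raw).decode("ascii")


print(strip_eighth_bit("Привет"))  # pRIWET
print(strip_eighth_bit("Москва"))  # mOSKWA
```

Uppercase Cyrillic comes out as lowercase Latin and vice versa, which is the case swap mentioned above.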
| koi8-r | ![]() | 254 1 | ![]() | 252 3 | ![]() | 253 1 + 1 | ![]() | 255 |
Roman Czyborra surmises that position 0x95 should actually be U+2022 (general bullet) rather than U+2219 (mathematical bullet operator) as defined in RFC 1489. Whatever the case may be, it is almost certainly too late now to correct this possible error.
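Which of the two code points a given implementation uses at 0x95 can be checked directly. The sketch below uses Python's built-in `koi8_r` codec as one example implementation; the helper name `describe_byte` is hypothetical.

```python
import unicodedata


def describe_byte(byte: int, encoding: str = "koi8_r") -> str:
    """Return the code point and Unicode name that a codec assigns to one byte."""
    ch = bytes([byte]).decode(encoding)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe_byte(0x95))  # U+2219 BULLET OPERATOR
```

Python's codec follows the mapping defined in RFC 1489 here; other implementations may differ.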
| koi8-u | ![]() | 254 1 | ![]() | 250 3 + 2 | ![]() | 253 1 + 1 | ![]() | 255 |
KOI8-U replaces 8 line-drawing characters from columns 10 and 11 in KOI8-R with the Ukrainian letters Є/є, І/і, Ї/ї and Ґ/ґ.
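The eight replaced positions can be listed by comparing the two code pages byte by byte. This is a minimal sketch that assumes Python's built-in `koi8_r` and `koi8_u` codecs reflect the mappings discussed here; `differing_positions` is a name invented for the example. A column number is the high nibble of the byte, so columns 10 and 11 cover 0xA0–0xBF.

```python
def differing_positions(enc_a: str = "koi8_r", enc_b: str = "koi8_u"):
    """Return (byte, char_in_a, char_in_b) for every high-half byte
    that the two single-byte encodings decode differently."""
    diffs = []
    for byte in range(0x80, 0x100):
        a = bytes([byte]).decode(enc_a)
        b = bytes([byte]).decode(enc_b)
        if a != b:
            diffs.append((byte, a, b))
    return diffs


for byte, a, b in differing_positions():
    # byte >> 4 is the column, byte & 0x0F the row within that column.
    print(f"0x{byte:02X}  column {byte >> 4}, row {byte & 0x0F}: {a} -> {b}")
```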
Internet Explorer interprets KOI8-U as KOI8-RU (see below).
| koi8-ru | ![]() | ![]() | 252 3 | ![]() | ![]() | |||
| koi8-u | ![]() | ![]() | 252 3 | ![]() | ![]() |
KOI8-RU can be considered a further modification of KOI8-U obtained by replacing 2 line-drawing characters with the Byelorussian letter Ў/ў. Some sources (e.g., Валентин Нечаев’s exposé) show versions with typographical symbols replacing a number of line-drawing characters in column 9.
Internet Explorer interprets KOI8-U as KOI8-RU.
| iso-ir-111 | ![]() | ![]() | ![]() | 254 1 | ![]() |
NB: The character set referred to as ‘iso-ir-111’ in RFC 1345 is completely different.
| viscii | ![]() | ![]() | ![]() | 254 1 | ![]() | 255 |
VISCII fills columns 8–15 with Vietnamese letters and puts the remaining six relatively infrequent accented uppercase letters in columns 0 and 1. The potentially problematic columns 8 and 9 are reserved for uppercase letters to ensure that a full set of lowercase letters will always be available.
This encoding was designed to be compatible with ISO 8859/1 in the sense that all accented letters present in both would be encoded in the same positions, which was indeed the case for version 1.0 of the standard. Unfortunately, the lowercase letter ạ was put at 0xA0, which may be problematic under Windows since this position is normally used for a non-breaking space. To solve the problem, it was swapped with Õ (originally at 0xD5) in version 1.1. Full ISO 8859/1 compatibility could easily have been preserved by choosing another uppercase letter from columns 10–15, so it is not clear why Õ was selected.
| tcvn | ![]() | ![]() | ![]() | ![]() | 255 | |||
| x-viet-tcvn5712 | ![]() | ![]() | ![]() | 254 1 | ![]() |
In addition to a full set of precomposed Vietnamese letters, TCVN-5712 includes non-breaking space at 0xA0 and the five tone marks as combining diacritics at 0xB0–0xB4, which means that twelve letters are relegated to columns 0 and 1. Uppercase and lowercase Vietnamese letters without tone marks are placed at 0xA1–0xAE in column 10, and all lowercase letters with tone marks can be found in columns 11–15.
There is a third variant of this encoding which contains no uppercase letters with tone marks. This is intended to be used with a dedicated uppercase font.
| vps | ![]() | ![]() | ![]() | ![]() | 255 | |||
| x-viet-vps | ![]() | ![]() | ![]() | 254 1 | ![]() |
The Windows version of the Vietnamese Professionals Society’s encoding, as implemented in their fonts, contains, in addition to a full set of precomposed Vietnamese letters, inverted commas (‘ and ’), non-breaking space and an eclectic collection of five lowercase European letters (ß, ö, ü, î and ç), possibly an attempt to satisfy the basic needs for French and German (although the absence of û and ä makes this explanation less plausible). These eight positions are all undefined in the Society’s Unix fonts, however, and it is not clear why they were not used to encode Vietnamese letters, fourteen of which were instead put in columns 0 and 1.
| windows-sami-2 | ![]() | ![]() | ![]() | ![]() | 248 7 |
In Firefox, the MIME charset string t.61 (amongst others) selects this encoding. Errors: Ż replaces Ź; Ņ and ¤ are missing. Alternative mappings: ^ 2C and ~ 7E replace ˆ 2C6 and ˜ 2DC, the visual mapping to ǵ (g with acute) is used instead of the logical mapping to ģ (g with cedilla), and Đ 110 (d with stroke) has been chosen instead of the visually identical Ð D0 (eth). $ is encoded twice, but # is not.
IE has an implementation associated with the MIME charset string x-cp20269. Some characters are missing, and accented letters are handled incorrectly (diacritics will appear either in front of the letter they are meant to modify or above/below the preceding letter).
Compared to T.51, the following characters are missing from the left-hand side: \, ^, `, {, }, ~ and delete. The right-hand side excludes the following: non-breaking space, ‘, “, ←, ↑, →, ↓, ’, ”, —, ¹, ®, ©, ™, ♪, ¬, ¦, ⅛, ⅜, ⅝, ⅞, soft hyphen. The non-spacing underscore used for underlining, since deprecated, is included.
IE implements this encoding under the MIME label x-cp20261. As in Firefox’s T.51 implementation, ǵ, ^ and ~ replace ģ, ˆ and ˜. A number of additional accented letters are included.