ISO/IEC 8859-1
ISO 8859-1, more formally cited as ISO/IEC 8859-1 or less formally as Latin-1, is part 1 of ISO/IEC 8859, a standard character encoding originally developed by ISO and later maintained jointly by ISO and IEC. The standard, when supplemented with additional character assignments, is the basis of two widely used character maps known as ISO-8859-1 (note the extra hyphen) and Windows-1252.
In June 2004, the ISO/IEC working group responsible for maintaining eight-bit coded character sets disbanded and ceased all maintenance of ISO 8859, including ISO 8859-1, in order to concentrate on the Universal Character Set and Unicode. In computing applications, encodings that provide full UCS support (such as UTF-8 and UTF-16) are finding increasing favor over encodings based on ISO 8859-1.
Coverage
ISO 8859-1 encodes what it refers to as "Latin alphabet no. 1," consisting of 191 characters from the Latin script. Each character is encoded as a single eight-bit code value. These code values can be used in almost any data interchange system to communicate in the following European languages (with a few exceptions due to missing characters, as noted):
- Albanian,
- Basque,
- Catalan,
- Danish,
- Dutch (missing IJ, ij),
- English,
- Estonian (missing Š, š, Ž, ž for loan words; Windows-1252 and ISO-8859-15 do contain these),
- Faroese,
- French (missing Œ, œ and the rare Ÿ; Windows-1252 and ISO-8859-15 do contain these),
- Finnish (missing Š, š, Ž, ž for loan words; Windows-1252 and ISO-8859-15 do contain these),
- German,
- Icelandic,
- Irish (new orthography),
- Italian,
- Latin,
- Norwegian (Bokmål and Nynorsk),
- Portuguese,
- Rhaeto-Romanic,
- Scottish,
- Spanish,
- Swedish.
Other languages covered include
Thus, this character encoding is used throughout the Americas, Western Europe, Oceania, and much of Africa. For some languages the correct typographical quotation marks are missing, since only « and » are included.
See also: Alphabets derived from the Latin
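As a rough illustration of the coverage described above, the following Python sketch reports which characters of a string have no single-byte ISO 8859-1 code value. The helper name `unrepresentable` is made up for this example. It assumes Python's built-in `latin-1` codec, which implements the IANA ISO-8859-1 map and therefore also accepts control characters that the standard itself leaves unassigned.

```python
def unrepresentable(text: str) -> list:
    """Return the characters of `text` that cannot be encoded as Latin-1."""
    missing = []
    for ch in text:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


print(unrepresentable("Bokmål"))       # every character is covered
print(unrepresentable("œuvre"))        # the French ligature œ is not
print(len("Bokmål".encode("latin-1")))  # one byte per character
```

A check of this kind can be useful before writing text to a legacy file or protocol that only accepts eight-bit Latin-1 data.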
Differences with ISO/IEC 8859-15
ISO/IEC 8859-1 suffers from a number of deficiencies. It omits a few French letters, a single-glyph representation of the letter IJ, and two Finnish letters used for transcribing some foreign names and in a few loanwords, and it lacks common glyphs such as the dagger †, typographic quotation marks and dashes, and other characters. The euro symbol is not encoded either. For this reason, ISO/IEC 8859-15 was developed as an update of ISO/IEC 8859-1 that adds the euro sign and other required characters. This required, however, the removal of some less-used characters from ISO/IEC 8859-1, including fraction symbols and letter-free diacritics: ¤, ¦, ¨, ´, ¸, ¼, ½, and ¾.
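The replaced positions can be listed by decoding the same byte values under both encodings and comparing the results. This is a minimal sketch that assumes Python's standard `iso-8859-1` and `iso-8859-15` codecs; the variable names are illustrative only.

```python
# Decode the upper half (A0-FF) under both encodings and keep the differences.
raw = bytes(range(0xA0, 0x100))
latin1 = raw.decode("iso-8859-1")
latin9 = raw.decode("iso-8859-15")

changed = [
    (0xA0 + offset, old, new)
    for offset, (old, new) in enumerate(zip(latin1, latin9))
    if old != new
]

for code, old, new in changed:
    print(f"{code:02X}: {old} -> {new}")
```

Only eight code values differ, so text that avoids those eight characters reads the same under either encoding.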
Code table
Because all 191 characters encoded by ISO/IEC 8859-1 are graphic and compatible with most web browsers, they can be shown as glyphs in the following table. The space character, the no-break space character, and the soft hyphen character would not normally be visible, so they are represented by abbreviations of their names. All other characters are represented literally. In the table, the row and column headings give the two hexadecimal digits that together form the eight-bit code value; e.g., the letter L is at code point 4C (hex), or binary 01001100.
| ISO/IEC 8859-1 | x0 | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | xA | xB | xC | xD | xE | xF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0x | unused | | | | | | | | | | | | | | | |
| 1x | unused | | | | | | | | | | | | | | | |
| 2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | |
| 8x | unused | | | | | | | | | | | | | | | |
| 9x | unused | | | | | | | | | | | | | | | |
| Ax | NBSP | ¡ | ¢ | £ | ¤ | ¥ | ¦ | § | ¨ | © | ª | « | ¬ | SHY | ® | ¯ |
| Bx | ° | ± | ² | ³ | ´ | µ | ¶ | · | ¸ | ¹ | º | » | ¼ | ½ | ¾ | ¿ |
| Cx | À | Á | Â | Ã | Ä | Å | Æ | Ç | È | É | Ê | Ë | Ì | Í | Î | Ï |
| Dx | Ð | Ñ | Ò | Ó | Ô | Õ | Ö | × | Ø | Ù | Ú | Û | Ü | Ý | Þ | ß |
| Ex | à | á | â | ã | ä | å | æ | ç | è | é | ê | ë | ì | í | î | ï |
| Fx | ð | ñ | ò | ó | ô | õ | ö | ÷ | ø | ù | ú | û | ü | ý | þ | ÿ |
Code values 00-1F, 7F, and 80-9F are not assigned to characters by ISO/IEC 8859-1.
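The table layout and the unassigned ranges can be expressed in a few lines of Python. This is only a sketch: the function names `cell` and `is_assigned` are invented for the example, and the ranges are taken directly from the statement above.

```python
def cell(code: int) -> tuple:
    """Split an eight-bit code value into its (row, column) hex digits."""
    return (code >> 4, code & 0x0F)


def is_assigned(code: int) -> bool:
    """True if ISO/IEC 8859-1 assigns a character to this code value."""
    return 0x20 <= code <= 0x7E or 0xA0 <= code <= 0xFF


row, column = cell(ord("L"))
print(f"row {row:X}x, column x{column:X}")  # table position of the letter L
print(format(ord("L"), "08b"))              # the same value in binary
print(sum(is_assigned(code) for code in range(256)))  # assigned characters
```

Counting the assigned values gives 95 in the range 20-7E plus 96 in the range A0-FF, which matches the 191 characters stated for the standard.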
ISO 8859-1 was based on the Multinational Character Set used by Digital Equipment Corporation in the popular VT220 terminal. It was developed within ECMA, the European Computer Manufacturers Association, and published along with ISO 8859-2, ISO 8859-3, and ISO 8859-4 as part of the specification ECMA-94, by which name it is still sometimes known.
Related character maps
The ISO/IEC 8859-1 standard has long been the basis of a number of character maps, also known as character sets, charsets, or code pages, the most popular being ISO-8859-1 (note the extra hyphen) and Windows-1252. Both of these maps are supersets of ISO/IEC 8859-1; they supplement the standard's 191 character assignments by mapping additional characters to at least some portion of the code value ranges 00-1F, 7F, and 80-9F. Mac-Roman is another popular character map that has some similarities to the others, but is not based on ISO/IEC 8859-1.
The distinction between ISO-8859-1, Windows-1252, Mac-Roman, and the ISO/IEC 8859-1 standard is a common source of confusion among computer programmers and other users of the Internet.
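One way to see why the maps are confused is to decode the same bytes under each of them. The sketch below assumes Python's `latin-1` codec (the IANA ISO-8859-1 map) and its `cp1252` codec (Windows-1252); the sample bytes are made up for the example.

```python
import unicodedata

# Bytes as a Windows program might write them: curly quotes and a euro sign.
sample = b"\x93caf\xe9\x94 \x80"

as_cp1252 = sample.decode("cp1252")
as_latin1 = sample.decode("latin-1")

print(as_cp1252)                       # printable quotes, é, and euro sign
print([hex(ord(ch)) for ch in as_latin1])
# Under ISO-8859-1 the bytes 93, 94 and 80 become invisible control characters.
print([unicodedata.category(ch) for ch in as_latin1 if ord(ch) >= 0x80])
```

The byte E9 decodes to é under both maps, while the bytes in the range 80-9F do not, which is why mislabelled Windows-1252 text often looks correct apart from its quotation marks, dashes, and euro signs.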
ISO-8859-1
In 1992, the IANA registered the character map ISO-8859-1 (note the extra hyphen), a superset of ISO/IEC 8859-1, for use on the Internet. This map assigns control characters to the code values 00-1F, 7F, and 80-9F. It thus provides for 256 characters via every possible 8-bit value. See ISO-8859-1.
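The effect of filling every code value can be checked with Python's `latin-1` codec, which is assumed here to implement the IANA map. The sketch also relies on the fact that this map coincides with the first 256 Unicode code points.

```python
import unicodedata

# Every one of the 256 possible byte values decodes without error.
decoded = bytes(range(256)).decode("latin-1")

# Each byte value maps to the Unicode code point with the same number.
identity = all(ord(ch) == value for value, ch in enumerate(decoded))

# Control characters (Unicode category "Cc") occupy 00-1F, 7F and 80-9F.
controls = [ord(ch) for ch in decoded if unicodedata.category(ch) == "Cc"]

print(identity, len(decoded), len(controls), 256 - len(controls))
```

The 65 control positions plus the 191 characters of the standard account for all 256 values.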
Windows-1252
See Windows-1252.
Similar character sets
- Main article: Western Latin character sets (computing)
The Apple Macintosh computer introduced a character encoding called Mac Roman, or Mac-Roman, in 1984. It was meant to be suitable for Western European desktop publishing. It is a superset of ASCII, like ISO-8859-1, and has most of the characters that are in ISO-8859-1, but in a totally different arrangement. A later version, registered with IANA as "Macintosh", replaced the generic currency symbol with the euro symbol. The few printable characters that are in ISO 8859-1 but not in this set are often a source of trouble when editing text on websites using older Macintosh browsers.
DOS had code page 850, which had all printable characters that ISO-8859-1 had (albeit in a totally different arrangement) plus the most widely used graphics characters from code page 437.
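The different arrangements, and the characters missing from Mac Roman, can be explored with the sketch below. It assumes Python's `mac_roman` and `cp850` codecs are faithful to the character sets described here (Python's `mac_roman` follows the later version with the euro symbol); `missing_from` is a helper written for this example.

```python
# The same letter lands on a different byte in each character set.
for name in ("latin-1", "mac_roman", "cp850"):
    print(name, "é".encode(name).hex())

# The printable upper half of ISO 8859-1 (code values A0-FF).
latin1_upper = bytes(range(0xA0, 0x100)).decode("latin-1")


def missing_from(codec: str) -> list:
    """Return the Latin-1 upper-half characters that `codec` cannot encode."""
    missing = []
    for ch in latin1_upper:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


print(missing_from("cp850"))      # expected to be empty
print(missing_from("mac_roman"))  # includes Ð, among others
```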
External links
- ISO/IEC 8859-1:1998 final draft of the standard (PDF)
- Windows Codepages
- Differences between ANSI, ISO-8859-1 and MacRoman Character Sets
- The Letter Database
- ASCII - ISO 8859-1 Table with HTML Entity Names
- The ISO 8859 Alphabet Soup - Roman Czyborra's history of ISO character sets



