UTF-EBCDIC

UTF-EBCDIC is an encoding of Unicode designed to be EBCDIC-friendly, so that older EBCDIC applications can handle some Unicode data. It offers existing EBCDIC-based systems advantages similar to those that UTF-8 offers existing ASCII-based systems. UTF-EBCDIC is defined in Unicode Technical Report #16.

To produce the UTF-EBCDIC encoding of a sequence of Unicode code points, an intermediate encoding based on UTF-8 (called UTF-8-Mod in the specification) is applied first. The main difference from UTF-8 is that code points U+0080 through U+009F (which map to EBCDIC control codes) are represented as a single byte. To make room for this, the later bytes of a multi-byte sequence use the format 101XXXXX instead of 10XXXXXX. Because each of these bytes holds only 5 bits rather than 6, UTF-EBCDIC generally produces larger output than UTF-8 for the same input data.
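The first stage can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `encode_utf8_mod` is hypothetical, and the lead-byte layouts for two- to five-byte sequences are assumed to mirror UTF-8 with 5-bit trailing bytes, so they should be checked against Unicode Technical Report #16 before being relied on.

```python
def encode_utf8_mod(text: str) -> bytes:
    """Encode text into the intermediate UTF-8-Mod byte sequence (sketch).

    Assumed layout, to be verified against UTR #16:
      U+0000..U+009F     one byte, value equal to the code point
      U+00A0..U+03FF     110xxxxx 101xxxxx
      U+0400..U+3FFF     1110xxxx 101xxxxx 101xxxxx
      U+4000..U+3FFFF    11110xxx followed by three 101xxxxx bytes
      U+40000..U+10FFFF  111110xx followed by four 101xxxxx bytes
    No special handling of surrogate code points is attempted.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp <= 0x9F:
            # ASCII and the controls U+0080..U+009F stay single bytes.
            out.append(cp)
            continue
        if cp <= 0x3FF:
            n_trail, lead = 1, 0xC0
        elif cp <= 0x3FFF:
            n_trail, lead = 2, 0xE0
        elif cp <= 0x3FFFF:
            n_trail, lead = 3, 0xF0
        else:
            n_trail, lead = 4, 0xF8
        # The lead byte carries the high bits left over after the trail bytes.
        out.append(lead | (cp >> (5 * n_trail)))
        # Each trail byte is 101xxxxx: the marker 0xA0 plus 5 payload bits.
        for i in range(n_trail - 1, -1, -1):
            out.append(0xA0 | ((cp >> (5 * i)) & 0x1F))
    return bytes(out)
```

Under these assumptions, a code point such as U+4E2D takes four bytes here but three in UTF-8, while U+0085 takes one byte here but two in UTF-8.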

Finally, a reversible byte-to-byte transform is applied to this intermediate data using a lookup table, making the result as close to ordinary EBCDIC code pages as feasible.
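The mechanics of such a table-driven transform can be sketched as below. The 256-entry table used here is a placeholder permutation chosen only for illustration; it is not the table from Unicode Technical Report #16, and the names `FORWARD`, `INVERSE`, `apply_table`, and `undo_table` are hypothetical. The point is only that any one-to-one table over the 256 byte values has an inverse table that undoes it.

```python
# PLACEHOLDER table, NOT the UTR #16 mapping: multiplying by an odd number
# modulo 256 is a bijection, so every byte value appears exactly once.
FORWARD = bytes((b * 7 + 3) % 256 for b in range(256))

# Build the inverse table: if FORWARD maps i to b, INVERSE maps b back to i.
_inverse = bytearray(256)
for i, b in enumerate(FORWARD):
    _inverse[b] = i
INVERSE = bytes(_inverse)


def apply_table(data: bytes) -> bytes:
    """Map each byte through the forward lookup table."""
    return data.translate(FORWARD)


def undo_table(data: bytes) -> bytes:
    """Map each byte back through the inverse lookup table."""
    return data.translate(INVERSE)
```

With the real table substituted for the placeholder, `apply_table` would correspond to the final encoding step and `undo_table` to the first decoding step.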

Since all the stages in the encoding process are reversible, the original sequence of Unicode code points can be recovered exactly from the UTF-EBCDIC encoding.
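As a rough illustration of reversing the UTF-8-Mod stage, the sketch below decodes intermediate bytes back to code points. It assumes the table lookup has already been undone, uses the hypothetical name `decode_utf8_mod`, and assumes lead-byte layouts that mirror UTF-8 with 5-bit trailing bytes (to be verified against Unicode Technical Report #16). It does not reject overlong forms, which a strict decoder would need to do.

```python
def decode_utf8_mod(data: bytes) -> str:
    """Decode UTF-8-Mod bytes back to text (sketch, assumed layout)."""
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x9F:
            cp, n_trail = b, 0          # single byte: value is the code point
        elif b <= 0xBF:
            raise ValueError(f"unexpected trailing byte at offset {i}")
        elif b <= 0xDF:
            cp, n_trail = b & 0x1F, 1   # 110xxxxx
        elif b <= 0xEF:
            cp, n_trail = b & 0x0F, 2   # 1110xxxx
        elif b <= 0xF7:
            cp, n_trail = b & 0x07, 3   # 11110xxx
        elif b <= 0xFB:
            cp, n_trail = b & 0x03, 4   # 111110xx
        else:
            raise ValueError(f"unsupported lead byte at offset {i}")
        trail = data[i + 1 : i + 1 + n_trail]
        # Every trailing byte must match the pattern 101xxxxx.
        if len(trail) != n_trail or any((t & 0xE0) != 0xA0 for t in trail):
            raise ValueError(f"malformed sequence at offset {i}")
        for t in trail:
            cp = (cp << 5) | (t & 0x1F)
        chars.append(chr(cp))           # chr() rejects values above U+10FFFF
        i += 1 + n_trail
    return "".join(chars)
```

For example, under these assumptions the byte pair C5 A0 decodes to U+00A0, and a lone byte 85 decodes to U+0085.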

This encoding form is rarely used, even on the EBCDIC-based mainframes for which it was designed. IBM EBCDIC-based mainframe operating systems, such as z/OS, usually use UTF-16 for complete Unicode support. For example, DB2 UDB, COBOL, PL/I, Java, and the IBM XML toolkit support UTF-16 on IBM mainframes.
