How does Unicode relate to ASCII?

Best Answer

UNICODE and ASCII are related in that both are used to exchange information, primarily in the form of text (plain text as opposed to typography). That is, when we want to exchange the character 'A' between systems, we do not transmit the entire bitmap for the glyph; we simply transmit a character code. Both systems must know what each character code represents, and this is achieved through a "code page" which maps individual character codes to their respective glyphs. In this way we minimise the amount of information that needs to be transmitted.
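
As a rough illustration, here is a minimal Python sketch (using only the built-in ord() and chr() functions; the character 'A' is just an example) of the idea that text travels as codes rather than as glyph images:

```python
# Text is exchanged as agreed-upon character codes, not pictures of glyphs.
code = ord('A')    # character -> code number (65)
char = chr(65)     # code number -> character ('A')

print(code, char)  # 65 A
```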

The problem is that different languages use different symbols. The letter 'A' is a Latin symbol common to many European languages, but not all languages use the Latin alphabet. To cater for every language worldwide we would need to encode more than 110,000 symbols, which would require at least 17 bits per character.
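
That bit count follows from simple arithmetic: 2^16 = 65,536 is too few for 110,000 symbols, while 2^17 = 131,072 suffices. A quick check in Python:

```python
import math

symbols = 110_000
bits = math.ceil(math.log2(symbols))  # smallest n such that 2**n >= symbols

print(bits)       # 17
print(2 ** bits)  # 131072
```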

Prior to multi-language support, most information was transmitted in English. To cater for this we needed to encode 26 symbols for the upper-case alphabet, 26 for the lower-case alphabet, 10 digits, a handful of common punctuation marks such as periods, commas and parentheses, plus some common symbols such as %, & and @. Transmitting information to a printer, screen or some other device also required some non-printing control characters, such as carriage return, line feed, tab and transmission begin/end. Thus the American Standard Code for Information Interchange (ASCII) settled on 128 characters, enough to encode the entire Latin alphabet plus control codes using just 7 bits, and systems were standardised to accommodate this encoding. Although most systems today use an 8-bit byte, many older printers and transmission protocols used just 7 bits to maintain the highest possible throughput. Some even used specialised encodings with fewer bits (and fewer symbols) to speed up transfers even further, each of which required its own standard.
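
A small Python sketch (the characters chosen are arbitrary examples) confirms that the standard ASCII repertoire fits in 7 bits:

```python
import string

# Every printable ASCII character (letters, digits, punctuation,
# whitespace) has a code below 128, i.e. it fits in 7 bits.
assert all(ord(c) < 128 for c in string.printable)

print(f"{ord('A'):07b}")     # 1000001 -- 'A' is code 65 in 7 bits
print(ord('\r'), ord('\n'))  # 13 10 -- carriage return, line feed
```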

To cater for more specialised symbols and to provide support for some foreign languages, an 8-bit extended character set was used, yielding an additional 128 symbols. The first 128 characters (codes 0 through 127) in every ASCII code page are always the same, but the extended character set could be switched simply by changing the code page. However, only one code page can be in effect at any one time, so systems were not only limited to 256 characters in total, they also had to use the same code page to ensure extended-character information was correctly decoded.
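
Python ships decoding tables for many legacy code pages, so the effect is easy to demonstrate: the same extended byte decodes to a different glyph under each page (a sketch; the byte 0xE9 is an arbitrary example):

```python
b = bytes([0xE9])  # one byte from the extended (128-255) range

print(b.decode('latin-1'))  # é  (ISO/IEC 8859-1)
print(b.decode('cp437'))    # Θ  (original IBM PC code page)
print(b.decode('cp1253'))   # ι  (Windows Greek code page)
```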

Today, when we speak of ASCII, we are generally speaking of the ISO/IEC 8859 family of standards (most commonly ISO/IEC 8859-1, also known as Latin-1). The majority of programming languages define their symbols in terms of this standard or plain 7-bit ASCII, making it possible to transmit the same source code between machines.

UNICODE addresses the limitations of 8-bit ASCII by using more bits per character. A key aspect of UNICODE is that the first 128 code points always match the 7-bit standard ASCII encodings, regardless of how many bits are employed in the actual encoding. While it would be relatively simple to encode every symbol used by every language using just 17 bits, this would cap the character set at 131,072 and leave no room for expansion. More importantly, it is helpful to space the symbols out so that the most significant bits of a code point identify a particular set of symbols. Thus UNICODE defines a much larger code space (code points up to U+10FFFF, comfortably representable in a 32-bit value), with individual character sets (known as "blocks") spread throughout the range.
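
The grouping is easy to observe from a code point's numeric value (a Python sketch; the characters are arbitrary examples drawn from different blocks):

```python
# Each character's code point falls inside a named contiguous block.
for ch in ['A', 'é', 'Ω', '€', '😀']:
    print(f"{ch}  U+{ord(ch):04X}")

# A   U+0041   Basic Latin (matches ASCII)
# é   U+00E9   Latin-1 Supplement
# Ω   U+03A9   Greek and Coptic
# €   U+20AC   Currency Symbols
# 😀  U+1F600  Emoticons (beyond the 16-bit range)
```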

This immediately puts an overhead upon English-based text transmissions, because storing every character in 32 bits means transmitting four times as many bits as the ASCII equivalent. To get around this, UNICODE introduced variable-width encodings such as UTF-8, in which the first 128 characters are encoded using 8 bits, exactly mirroring ASCII when the most significant bit is 0. If the most significant bit is 1, the symbol is encoded using two to four bytes (the original design allowed up to six), with the remaining high-order bits of the first byte indicating the length of the sequence. Each of these multi-byte sequences maps back to a single UNICODE code point.
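
The variable width can be seen by inspecting the first byte of each UTF-8 sequence (a Python sketch; the characters are arbitrary examples):

```python
# The leading bits of the first byte announce the sequence length:
# 0xxxxxxx = 1 byte (plain ASCII), 110xxxxx = 2 bytes,
# 1110xxxx = 3 bytes, 11110xxx = 4 bytes.
for ch in ['A', 'é', '€', '😀']:
    encoded = ch.encode('utf-8')
    print(f"{ch}  {len(encoded)} byte(s), first byte {encoded[0]:08b}")

# A   1 byte(s), first byte 01000001
# é   2 byte(s), first byte 11000011
# €   3 byte(s), first byte 11100010
# 😀  4 byte(s), first byte 11110000
```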

UTF-8 is the most common form of UNICODE in use today because it has no overhead compared with 8-bit standard ASCII and, for most transmissions, has less overhead than 32-bit UNICODE (also known as UTF-32). UTF-16 uses 16-bit code units throughout; characters outside the first 65,536 code points are represented by pairs of code units (surrogate pairs), so it too covers the complete UNICODE range, at the cost of being variable-width itself.
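
The trade-off is easy to measure by encoding the same text in each form (a Python sketch; the sample string is arbitrary):

```python
text = "Hello, Unicode! 😀"  # 16 ASCII characters plus one emoji

for enc in ('utf-8', 'utf-16-le', 'utf-32-le'):
    print(enc, len(text.encode(enc)), 'bytes')

# utf-8     20 bytes (ASCII stays at 1 byte each; the emoji takes 4)
# utf-16-le 36 bytes (2 bytes each; the emoji needs a surrogate pair)
# utf-32-le 68 bytes (a fixed 4 bytes per character)
```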

Related questions

What is the cousin of ASCII?

Unicode


Why do you need ASCII?

You don't need ASCII; you need Unicode.


Research and report on the issue of how ASCII coding and Unicode coexist?

ASCII and Unicode coexist because Unicode's first 128 code points are identical to ASCII, and UTF-8 encodes those code points as the same single bytes, so valid ASCII text is also valid UTF-8.


Why do we need ASCII and Unicode?

ASCII is compact for plain English text, while Unicode is needed to represent every other writing system; UTF-8 combines the two by matching ASCII for the first 128 characters.


What is the ASCII code of U?

Upper case U in ASCII/Unicode is binary 01010101; U is code number 85. Lower case u in ASCII/Unicode is binary 01110101; u is code number 117.


What is a 16-bit coding scheme called?

Unicode (specifically its 16-bit encoding form, UTF-16).


This coding system is designed to support international languages like Chinese and Japanese?

Unicode


What is the difference between ASCII and Unicode?

ASCII is a set of digital codes widely used as a standard format in the transfer of text. Unicode is an international encoding standard for use with different languages and scripts.


What is the difference between Unicode and ASCII code?

ASCII encodes 128 characters (256 with an extended code page) using 7 or 8 bits, whereas Unicode is a much larger standard covering all writing systems; its first 128 code points match ASCII exactly.


What are the ASCII codes for subset and proper subset?

Since ASCII ⊊ Unicode, there are no ASCII codes for subset and proper subset. There are Unicode characters for them, though. Subset: ⊂ (U+2282); subset or equal: ⊆ (U+2286); proper subset: ⊊ (U+228A).


What is the ASCII code for letter D?

The ASCII code for the letter D is 68 in decimal, 0x44 in hexadecimal (U+0044 in Unicode).