How does Unicode relate to ASCII?

Best Answer

UNICODE and ASCII are related in that both are used to exchange information, primarily in the form of text (plain text as opposed to typography). That is, when we want to exchange the character 'A' between systems, we do not transmit the entire bitmap for the glyph; we simply transmit a character code. Both systems must know what each character code represents, and this is achieved through a "code page" which maps individual character codes to their respective glyphs. In this way we minimise the amount of information that needs to be transmitted.
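
As a rough illustration, here is a minimal Python sketch (using only the built-in ord() and chr() functions; the character 'A' is just an example) of the idea that text travels as codes rather than as glyph images:

```python
# Text is exchanged as agreed-upon character codes, not pictures of glyphs.
code = ord('A')    # character -> code number (65)
char = chr(65)     # code number -> character ('A')

print(code, char)  # 65 A
```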

The problem is that different languages use different symbols. The letter 'A' is a Latin symbol common to many European languages, but not all languages use the Latin alphabet. To cater for every language worldwide we would need to encode more than 110,000 symbols, which would require at least 17 bits per character.
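
That bit count follows from simple arithmetic: 2^16 = 65,536 is too few for 110,000 symbols, while 2^17 = 131,072 suffices. A quick check in Python:

```python
import math

symbols = 110_000
bits = math.ceil(math.log2(symbols))  # smallest n such that 2**n >= symbols

print(bits)       # 17
print(2 ** bits)  # 131072
```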

Prior to multi-language support, most information was transmitted in English. To cater for this we needed to encode 26 symbols for the upper-case alphabet, 26 for the lower-case alphabet, 10 digits, a handful of common punctuation marks such as periods, commas and parentheses, plus some common symbols such as %, & and @. Transmitting information to a printer, screen or some other device also required some non-printing control characters, such as carriage return, line feed, tab and transmission begin/end. Thus the American Standard Code for Information Interchange (ASCII) settled on 128 characters, enough to encode the entire Latin alphabet plus control codes using just 7 bits, and systems were standardised to accommodate this encoding. Although most systems today use an 8-bit byte, many older printers and transmission protocols used just 7 bits to maintain the highest possible throughput. Some even used specialised encodings with fewer bits (and fewer symbols) to speed up transfers even further, each of which required its own standard.
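
A small Python sketch (the characters chosen are arbitrary examples) confirms that the standard ASCII repertoire fits in 7 bits:

```python
import string

# Every printable ASCII character (letters, digits, punctuation,
# whitespace) has a code below 128, i.e. it fits in 7 bits.
assert all(ord(c) < 128 for c in string.printable)

print(f"{ord('A'):07b}")     # 1000001 -- 'A' is code 65 in 7 bits
print(ord('\r'), ord('\n'))  # 13 10 -- carriage return, line feed
```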

To cater for more specialised symbols and to provide support for some foreign languages, an 8-bit extended character set was used, yielding an additional 128 symbols. The first 128 characters (codes 0 through 127) in every ASCII code page are always the same, but the extended character set could be switched simply by changing the code page. However, only one code page can be in effect at any one time, so systems were not only limited to 256 characters in total, they also had to use the same code page to ensure extended-character information was correctly decoded.
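
Python ships decoding tables for many legacy code pages, so the effect is easy to demonstrate: the same extended byte decodes to a different glyph under each page (a sketch; the byte 0xE9 is an arbitrary example):

```python
b = bytes([0xE9])  # one byte from the extended (128-255) range

print(b.decode('latin-1'))  # é  (ISO/IEC 8859-1)
print(b.decode('cp437'))    # Θ  (original IBM PC code page)
print(b.decode('cp1253'))   # ι  (Windows Greek code page)
```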

Today, when we speak of ASCII, we are generally speaking of the ISO/IEC 8859 family of standards (most commonly ISO/IEC 8859-1, also known as Latin-1). The majority of programming languages define their symbols in terms of this standard or plain 7-bit ASCII, making it possible to transmit the same source code between machines.

UNICODE addresses the limitations of 8-bit ASCII by using more bits per character. A key aspect of UNICODE is that the first 128 code points always match the 7-bit standard ASCII encodings, regardless of how many bits are employed in the actual encoding. While it would be relatively simple to encode every symbol used by every language using just 17 bits, this would cap the character set at 131,072 and leave no room for expansion. More importantly, it is helpful to space the symbols out so that the most significant bits of a code point identify a particular set of symbols. Thus UNICODE defines a much larger code space (code points up to U+10FFFF, comfortably representable in a 32-bit value), with individual character sets (known as "blocks") spread throughout the range.
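
The grouping is easy to observe from a code point's numeric value (a Python sketch; the characters are arbitrary examples drawn from different blocks):

```python
# Each character's code point falls inside a named contiguous block.
for ch in ['A', 'é', 'Ω', '€', '😀']:
    print(f"{ch}  U+{ord(ch):04X}")

# A   U+0041   Basic Latin (matches ASCII)
# é   U+00E9   Latin-1 Supplement
# Ω   U+03A9   Greek and Coptic
# €   U+20AC   Currency Symbols
# 😀  U+1F600  Emoticons (beyond the 16-bit range)
```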

This immediately puts an overhead upon English-based text transmissions, because storing every character in 32 bits means transmitting four times as many bits as the ASCII equivalent. To get around this, UNICODE introduced variable-width encodings such as UTF-8, in which the first 128 characters are encoded using 8 bits, exactly mirroring ASCII when the most significant bit is 0. If the most significant bit is 1, the symbol is encoded using two to four bytes (the original design allowed up to six), with the remaining high-order bits of the first byte indicating the length of the sequence. Each of these multi-byte sequences maps back to a single UNICODE code point.
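
The variable width can be seen by inspecting the first byte of each UTF-8 sequence (a Python sketch; the characters are arbitrary examples):

```python
# The leading bits of the first byte announce the sequence length:
# 0xxxxxxx = 1 byte (plain ASCII), 110xxxxx = 2 bytes,
# 1110xxxx = 3 bytes, 11110xxx = 4 bytes.
for ch in ['A', 'é', '€', '😀']:
    encoded = ch.encode('utf-8')
    print(f"{ch}  {len(encoded)} byte(s), first byte {encoded[0]:08b}")

# A   1 byte(s), first byte 01000001
# é   2 byte(s), first byte 11000011
# €   3 byte(s), first byte 11100010
# 😀  4 byte(s), first byte 11110000
```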

UTF-8 is the most common form of UNICODE in use today because it has no overhead compared with 8-bit standard ASCII and, for most transmissions, has less overhead than 32-bit UNICODE (also known as UTF-32). UTF-16 uses 16-bit code units throughout; characters outside the first 65,536 code points are represented by pairs of code units (surrogate pairs), so it too covers the complete UNICODE range, at the cost of being variable-width itself.
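
The trade-off is easy to measure by encoding the same text in each form (a Python sketch; the sample string is arbitrary):

```python
text = "Hello, Unicode! 😀"  # 16 ASCII characters plus one emoji

for enc in ('utf-8', 'utf-16-le', 'utf-32-le'):
    print(enc, len(text.encode(enc)), 'bytes')

# utf-8     20 bytes (ASCII stays at 1 byte each; the emoji takes 4)
# utf-16-le 36 bytes (2 bytes each; the emoji needs a surrogate pair)
# utf-32-le 68 bytes (a fixed 4 bytes per character)
```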

Related questions

What is the cousin of ASCII?

Unicode


Why do you need ASCII?

You don't need ASCII; you need Unicode.


Research and report on the issue of how ASCII coding and Unicode coexist?

ASCII and Unicode coexist because Unicode's first 128 code points are identical to ASCII, and UTF-8 encodes those code points as the same single bytes, so valid ASCII text is also valid UTF-8.


Why do we need ASCII and Unicode?

ASCII is compact for plain English text, while Unicode is needed to represent every other writing system; UTF-8 combines the two by matching ASCII for the first 128 characters.


What is the ASCII code of U?

Upper case U in ASCII/Unicode is binary 01010101; U is code number 85. Lower case u in ASCII/Unicode is binary 01110101; u is code number 117.


What is a 16-bit coding scheme called?

Unicode (specifically its 16-bit encoding form, UTF-16).


This coding system is designed to support international languages like Chinese and Japanese?

Unicode


What is the difference between ASCII and Unicode?

ASCII is a set of digital codes widely used as a standard format in the transfer of text. Unicode is an international encoding standard for use with different languages and scripts.


What is the difference between Unicode and ASCII code?

ASCII encodes 128 characters (256 with an extended code page) using 7 or 8 bits, whereas Unicode is a much larger standard covering all writing systems; its first 128 code points match ASCII exactly.


What are the ASCII codes for subset and proper subset?

Since ASCII ⊊ Unicode, there are no ASCII codes for subset and proper subset. There are Unicode characters for them, though. Subset: ⊂ (U+2282); subset or equal: ⊆ (U+2286); proper subset: ⊊ (U+228A).


What is the ASCII code for letter D?

The ASCII code for the letter D is 68 in decimal, 0x44 in hexadecimal (U+0044 in Unicode).