Best Answer

There is no standard way to tell the two apart, because every file is a binary file. Even a text file is stored as binary-encoded bytes, since that is the only form the computer can actually use. However, there are a few heuristics you can use to try to determine the encoding.

Text files differ only in that they contain nothing but text (plain text), so one way to differentiate is to examine the individual byte values, because there are certain characters we simply would not expect to find in a plain text file. All but one of the ASCII control codes lie in the range 0x00 through 0x1F (0 to 31 decimal); the remaining one is 0x7F (delete). The majority of these should not be present. The exceptions are 0x09 (horizontal tab), 0x0A (new line) and 0x0D (carriage return), all of which can reasonably be expected in a text file.
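As a minimal sketch of the check described above, the function below scans a buffer and rejects it as soon as it meets a byte that plain ASCII text should not contain. The name looks_like_ascii_text is hypothetical, and treating 0x7F and every byte above it as "not text" is an assumption that only holds for pure ASCII; it will reject UTF-8 and other encodings.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte in buf is printable ASCII
   or one of the three expected whitespace controls, 0 otherwise.
   An empty buffer is reported as text. */
int looks_like_ascii_text(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if (c == 0x09 || c == 0x0A || c == 0x0D)
            continue;   /* horizontal tab, new line, carriage return */
        if (c < 0x20 || c >= 0x7F)
            return 0;   /* control code, DEL, or a non-ASCII byte */
    }
    return 1;
}
```

In practice you would read the first few kilobytes of the file with fopen(name, "rb") and fread, then pass that buffer to the function; scanning the whole file is more reliable but slower.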

However, plain text files may use encodings other than ASCII. Most of these alternatives encode the Unicode character set, and several of them are multi-byte encodings in which a single character occupies one or more bytes rather than exactly one byte as in ASCII. Unicode has several encoding forms, such as UTF-8, UTF-16 and UTF-32. A Unicode text file may (but often does not, particularly with UTF-8) begin with a byte order mark (BOM). The BOM is primarily used to determine whether a multi-byte encoding is big-endian or little-endian, but it also identifies the specific encoding. So if the first few bytes form a recognised BOM, it is reasonable to assume (with a high degree of confidence) that the remainder of the file is text in that encoding, and the BOM itself tells us how to interpret it correctly.

The most common BOMs in use today are:

0xEFBBBF : UTF-8

0xFEFF : UTF-16BE (big-endian)

0xFFFE : UTF-16LE (little-endian)

0x0000FEFF : UTF-32BE

0xFFFE0000 : UTF-32LE

0xF7644C : UTF-1

0xDD736673 : UTF-EBCDIC

0x0EFEFF : SCSU

0xFBEE28 : BOCU-1

0x84319533 : GB-18030

UTF-7 has several BOMs, each of which begins 0x2B2F76:

0x2B2F7638

0x2B2F7639

0x2B2F762B

0x2B2F762F

0x2B2F76382D

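When looking for a BOM, always test for the longest BOM first. Some shorter BOMs are prefixes of longer ones: the UTF-32LE BOM (0xFFFE0000) begins with the UTF-16LE BOM (0xFFFE), so testing the shorter one first would misidentify a UTF-32LE file as UTF-16LE.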
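The BOM list can be turned into a lookup table that is ordered from longest to shortest, so the first match is always the longest one. This is only a sketch: the names bom_entry, bom_table and detect_bom are hypothetical, and the returned strings are just labels taken from the list above. Note that a UTF-16LE file whose first character after the BOM is U+0000 would still be reported as UTF-32LE; a BOM alone cannot resolve that case.

```c
#include <stddef.h>
#include <string.h>

struct bom_entry {
    const char *name;        /* label for the encoding */
    unsigned char bytes[5];  /* BOM byte sequence */
    size_t len;              /* number of bytes used in the sequence */
};

/* Ordered longest first so that, for example, UTF-32LE wins over UTF-16LE. */
static const struct bom_entry bom_table[] = {
    { "UTF-7",      {0x2B, 0x2F, 0x76, 0x38, 0x2D}, 5 },
    { "UTF-32BE",   {0x00, 0x00, 0xFE, 0xFF},       4 },
    { "UTF-32LE",   {0xFF, 0xFE, 0x00, 0x00},       4 },
    { "UTF-EBCDIC", {0xDD, 0x73, 0x66, 0x73},       4 },
    { "GB-18030",   {0x84, 0x31, 0x95, 0x33},       4 },
    { "UTF-7",      {0x2B, 0x2F, 0x76, 0x38},       4 },
    { "UTF-7",      {0x2B, 0x2F, 0x76, 0x39},       4 },
    { "UTF-7",      {0x2B, 0x2F, 0x76, 0x2B},       4 },
    { "UTF-7",      {0x2B, 0x2F, 0x76, 0x2F},       4 },
    { "UTF-8",      {0xEF, 0xBB, 0xBF},             3 },
    { "UTF-1",      {0xF7, 0x64, 0x4C},             3 },
    { "SCSU",       {0x0E, 0xFE, 0xFF},             3 },
    { "BOCU-1",     {0xFB, 0xEE, 0x28},             3 },
    { "UTF-16BE",   {0xFE, 0xFF},                   2 },
    { "UTF-16LE",   {0xFF, 0xFE},                   2 },
};

/* Returns the label of the first (longest) matching BOM, or NULL if the
   buffer does not start with any BOM in the table. */
const char *detect_bom(const unsigned char *buf, size_t len)
{
    size_t n = sizeof bom_table / sizeof bom_table[0];
    for (size_t i = 0; i < n; i++) {
        if (len >= bom_table[i].len &&
            memcmp(buf, bom_table[i].bytes, bom_table[i].len) == 0)
            return bom_table[i].name;
    }
    return NULL;
}
```

Reading the first five bytes of the file in binary mode is enough input for this function, since no BOM in the table is longer than that.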

If there is no BOM present and the file does not appear to be ASCII plain text, there may be an HTML-style header at the start of the file that provides us the actual encoding. For instance, the following header tells us that this file uses UTF-8 encoding:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
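One rough way to pick the encoding name out of such a header is to search the start of the file for the text "charset=" and copy what follows. The function name find_charset is hypothetical, and the sketch assumes the header has already been read into a NUL-terminated string in an ASCII-compatible encoding. The match is case-sensitive, so a header written as "CHARSET=" would be missed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the value that follows "charset=" in head
   into out (at most outsize - 1 characters plus a terminating NUL).
   Returns 1 on success, 0 if no usable charset value was found. */
int find_charset(const char *head, char *out, size_t outsize)
{
    const char *p = strstr(head, "charset=");
    if (p == NULL || outsize == 0)
        return 0;
    p += strlen("charset=");
    if (*p == '"' || *p == '\'')
        p++;                              /* form: <meta charset="UTF-8"> */
    size_t n = strcspn(p, "\"' ;/>");     /* value ends at a delimiter */
    if (n == 0 || n >= outsize)
        return 0;
    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}
```

Called on the header line shown above, the function stores "UTF-8" in out.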

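UTF-16 and UTF-32 text that has neither a BOM nor a plain-text header can often be identified by looking at where the null bytes (0x00) fall within each group of two or four bytes. This works when most of the characters are in the ASCII or Latin range, because their high-order bytes are then zero. Counting offsets from zero, if most of the null bytes fall at odd offsets the encoding is probably UTF-16LE, and if they fall at even offsets it is probably UTF-16BE. Similarly for UTF-32: if the last two bytes of most four-byte groups are 0x0000 the encoding is probably UTF-32LE, and if the first two bytes are 0x0000 it is probably UTF-32BE. Check for UTF-32 before checking for UTF-16, because UTF-32 data can also satisfy the UTF-16 test.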
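The null-byte heuristic can be sketched as follows. The enum, the function name guess_utf_by_nulls and the "more than half" threshold are all assumptions made for illustration; a real detector would probably want a larger sample and a stricter threshold. Any trailing bytes that do not fill a complete four-byte group are ignored.

```c
#include <stddef.h>

enum utf_guess {
    GUESS_UNKNOWN, GUESS_UTF16LE, GUESS_UTF16BE, GUESS_UTF32LE, GUESS_UTF32BE
};

/* True when strictly more than half of total positions were null. */
static int mostly(size_t n, size_t total)
{
    return n * 2 > total;
}

/* Hypothetical helper: guesses a BOM-less UTF-16/UTF-32 encoding from the
   positions of the null bytes within each four-byte group. */
enum utf_guess guess_utf_by_nulls(const unsigned char *buf, size_t len)
{
    size_t zeros[4] = {0, 0, 0, 0};
    size_t groups = len / 4;
    if (groups == 0)
        return GUESS_UNKNOWN;

    for (size_t i = 0; i < groups * 4; i++)
        if (buf[i] == 0x00)
            zeros[i % 4]++;

    /* UTF-32 first: two null bytes at one end of most groups. */
    if (mostly(zeros[2], groups) && mostly(zeros[3], groups) &&
        !mostly(zeros[0], groups))
        return GUESS_UTF32LE;
    if (mostly(zeros[0], groups) && mostly(zeros[1], groups) &&
        !mostly(zeros[3], groups))
        return GUESS_UTF32BE;

    /* UTF-16: null bytes mostly at odd or mostly at even offsets. */
    size_t odd = zeros[1] + zeros[3];
    size_t even = zeros[0] + zeros[2];
    size_t pairs = groups * 2;
    if (mostly(odd, pairs) && !mostly(even, pairs))
        return GUESS_UTF16LE;
    if (mostly(even, pairs) && !mostly(odd, pairs))
        return GUESS_UTF16BE;

    return GUESS_UNKNOWN;
}
```

Text written mainly in non-Latin scripts has few null bytes in UTF-16, so this function will return GUESS_UNKNOWN for it; that is a limit of the heuristic itself rather than of the code.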

If none of these methods can determine the encoding, the best thing to do is to alert the user that the encoding could not be deduced, and perhaps allow them to choose the correct encoding themselves.

Note also that when presenting plain text, it is important that you also use the appropriate character set in combination with the correct decoding method. Even with the correct decoding method, the text may still appear garbled if the wrong characters are being represented.

Wiki User

8y ago
Q: How do you check whether a file is a binary or a text file in a C program?
Continue Learning about Engineering

What is the difference between binary file and executable file?

Windows distinguishes two file formats: text files and binary files. In a Windows text file, each line is terminated by a carriage return followed by a line feed character. When a C program reads such a file in text mode, the C library converts each carriage return/line feed pair into a single line feed character. When the file is read in binary mode, the program sees both the carriage return and the line feed character.


How do you do binary search in file?

A binary search on a random-access file is performed much in the same way as a binary search in memory is performed, with the exception that instead of pointers to items in memory file seek operations are used to locate individual items within the file, then load into memory for further examination. The key aspects of the binary search algorithm do not depend on the specifics of the set of searchable items: the set is expected to be sorted, and it must be possible to determine an order between any two items A and B. Finally, the binary search algorithm requires that the set of searchable items is finite in size, and of a known size.
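As an illustration of the description above, here is a sketch that searches a binary file of fixed-size records using fseek and fread. It assumes the file holds nothing but int values written in ascending order by the same program on the same machine (so size and byte order match), and that the caller already knows the record count. The function name file_binary_search is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: looks for key in a file containing count sorted
   int records. Returns the zero-based record index, or -1 if the key is
   absent or a seek/read fails. */
long file_binary_search(FILE *fp, long count, int key)
{
    long lo = 0;
    long hi = count - 1;

    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        int value;

        /* Seek to the record instead of following a pointer in memory. */
        if (fseek(fp, mid * (long)sizeof value, SEEK_SET) != 0)
            return -1;
        if (fread(&value, sizeof value, 1, fp) != 1)
            return -1;

        if (value == key)
            return mid;
        if (value < key)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}
```

The file should be opened in binary mode (for example with "rb") so that the library performs no line-ending translation, and the record count can be obtained by dividing the file size by sizeof(int).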


What are example files of data encoding?

Not to be flippant, but every file is an example of data encoding. Before data can be stored in computer memory or in a disk file, it first has to be digitally encoded in binary. The binary encodings can then be further encoded using encryption or compression.