A filename extension is a suffix to the name of a computer file, applied to indicate its type. It is commonly used to infer what sort of data is stored in the file. This description mostly explains the intent of filename extensions: a true definition, giving the criterion for deciding which part of the file name is its extension, belongs to the rules of the specific filesystem used. Most often the extension is the substring that follows the last occurrence, if any, of the dot character (e.g. "txt" is the extension of the filename "readme.txt", and "html" the extension of "mysite.index.html").
On the filesystems of mainframe and minicomputer systems such as MVS and VMS, and of PC systems such as CP/M and derivatives such as MS-DOS, the extension is a separate namespace from the filename. Unix-like operating systems differ: their filesystems do not support the notion of an extension, a suffix is not a separate namespace, and even having a suffix is optional for executables, because permissions decide whether a file is executable.
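The "substring after the last dot" rule described above can be sketched in a few lines of Python. This is only an illustration of the common convention, not the rule of any particular filesystem; the helper name extension_of is hypothetical.

```python
import os.path


def extension_of(filename: str) -> str:
    """Return the text after the last dot, or '' if the name has no dot."""
    _base, dot, ext = filename.rpartition(".")
    # rpartition yields ('', '', filename) when no dot is present.
    return ext if dot else ""


print(extension_of("readme.txt"))         # txt
print(extension_of("mysite.index.html"))  # html
print(repr(extension_of("Makefile")))     # ''

# The standard library applies a similar rule but keeps the dot
# and treats a leading dot as part of the base name.
print(os.path.splitext("mysite.index.html"))  # ('mysite.index', '.html')
print(os.path.splitext(".bashrc"))            # ('.bashrc', '')
```

Note that the naive rule and os.path.splitext disagree on names such as ".bashrc", which shows why the exact definition depends on the system or library in use.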
With the advent of the GUI, the question arose of how a file manager should decide which application handles a given file. The Windows platform allowed multiple applications to be associated with a given filename extension, with different "actions" using those applications (opening, editing, viewing, and so forth) offered through a context menu. File managers such as Windows Explorer can have applications assigned for any extension: for example, a text editor for .txt, a word processor for .doc or .odt, a web browser for .htm or .html, a PDF viewer or editor for .pdf, a graphics program for .png, .gif or .jpg, and a spreadsheet program for .xls or .ods.
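As a rough model of such associations, the sketch below maps an extension to a set of named actions, each handled by some application. The table contents, the action names and the function application_for are all hypothetical; real file managers store this information in their own registries or configuration databases.

```python
import os.path

# Hypothetical association table: extension -> action -> application.
ASSOCIATIONS = {
    ".txt": {"open": "text editor", "edit": "text editor"},
    ".html": {"open": "web browser", "edit": "text editor"},
    ".pdf": {"open": "PDF viewer"},
}


def application_for(filename, action="open"):
    """Look up the application registered for this file's extension and action."""
    ext = os.path.splitext(filename)[1].lower()
    # Returns None when the extension or the action is not registered.
    return ASSOCIATIONS.get(ext, {}).get(action)


print(application_for("index.html"))          # web browser
print(application_for("index.html", "edit"))  # text editor
print(application_for("notes.xyz"))           # None
```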
Under Microsoft's operating systems DOS and Windows, some extensions, including .exe, .com, .bat, and .cmd, indicate that a file is an
executable program.
Filename extensions have been in use for decades, but they gained common usage largely because the file systems included with DOS and Windows severely limited filename lengths for many years, which strongly encouraged their use. Filename extensions can be considered a type of metadata.
Pre-OS X versions of the Mac OS dispensed with filename extensions entirely, instead using a file type code to identify the file format. Additionally, a creator code was specified to determine which application would be launched when the file's icon was double-clicked. Mac OS X, however, uses filename suffixes as a consequence of being derived from the Unix-like NEXTSTEP operating system, whose file system did not support type or creator codes.
Historical limitations
Filename extensions were used in Digital Equipment Corporation (DEC)
operating systems (for example, TOPS-10, OS/8 and RT-11). CP/M adopted the convention, and MS-DOS, which was closely modeled on CP/M, did so as well.
The DEC operating systems internally split the filename into a "base name" and a filename extension, with the "base name"
limited to five to eight characters and the extension limited to two or three characters; when a filename/filename extension
combination was typed in commands, a period (.) was placed between the filename and filename
extension. CP/M worked the same way; the filename was limited to eight characters and the filename extension was limited to three
characters, with a period between them. Early versions of the FAT filesystem used
in MS-DOS and Microsoft Windows imposed the same limitations. This is sometimes
referred to as the 8.3 filename convention, and since the word filename is eight
letters long and ext is a reasonable abbreviation for extension, it can be generalized as:
FILENAME.EXT
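A minimal sketch of the 8.3 limits follows, assuming only the length rules stated above and ignoring the restrictions on which characters are permitted. The function names split_8_3 and padded_field are hypothetical. The second function shows the space-padded, dot-free form in which a name and extension can be laid out as two fixed-width fields.

```python
def split_8_3(name: str) -> tuple[str, str]:
    """Split NAME.EXT into (base, ext), enforcing the 8 and 3 character limits."""
    base, _dot, ext = name.partition(".")
    if not 1 <= len(base) <= 8:
        raise ValueError(f"base name must be 1 to 8 characters: {name!r}")
    if len(ext) > 3 or "." in ext:
        raise ValueError(f"extension must be at most 3 characters: {name!r}")
    return base.upper(), ext.upper()


def padded_field(name: str) -> str:
    """Return the 11-character form: base padded to 8, extension padded to 3."""
    base, ext = split_8_3(name)
    return base.ljust(8) + ext.ljust(3)


print(split_8_3("readme.txt"))             # ('README', 'TXT')
print(repr(padded_field("ldlinux.sys")))   # 'LDLINUX SYS'
```

A name such as "index.html" is rejected by this check because its extension has four characters.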
When doing a file listing, the base name and extension would be separated by spaces, much like this:
 Volume in drive A: is LINUX BOOT
 Volume Serial Number is 2410-07EF
 Directory for A:\
LDLINUX  SYS      5480 1999-04-19  23:24
VMLINUZ         530921 1999-04-19  23:24
BOOT     MSG       559 1999-04-19  23:24
EXPERT   MSG       668 1999-04-19  23:24
GENERAL  MSG       986 1999-04-19  23:24
KICKIT   MSG       979 1999-04-19  23:24
PARAM    MSG       875 1999-04-19  23:24
RESCUE   MSG      1020 1999-04-19  23:24
SYSLINUX CFG       420 1999-04-19  23:24
INITRD   IMG    878502 1999-04-19  23:24
       10 files      1,420,410 bytes
                        35,840 bytes free
This use of spaces often led to confusion among novice DOS users, who thought of the "." as part of the file's identifier rather than merely a convention for separating the two components of that identifier.
The need for more
The filename extension was originally used to easily determine the file's generic type. The need to condense a file's type
into three characters frequently led to inscrutable extensions. Examples include using .GFX for graphics files, .TXT for plain text, and .MUS for
music. However, because many different programs handle these data types (and others) in a variety of ways, filename extensions became closely associated with certain products, and even with specific product versions. For example, early WordStar files used .WS or .WSn, where n was the program's version number. The same extension also came to be used for unrelated file formats. One example is .rpm, used for both RPM Package Manager packages and RealPlayer Media files; another is .qif, shared by Quicken Interchange Format files (financial ledgers) and QuickTime Image Format files (pictures).
As time went on, hundreds of different extensions came into use as software developers invented more and more file formats.
This led to reference manuals being published, devoted entirely to listing the extensions and the type (or types) of data that
might be found in files so named. These issues led to the need for alternative systems with significantly lower chances of
conflicts.
Some other operating systems that used filename extensions, such as Multics, generally had
much more liberal sizes for filenames. Many allowed full filename lengths of 14 or more characters, and maximum name lengths up
to 255 were not uncommon. The file systems in operating systems such as Unix stored the file name
as a single string, not split into base name and extension components, with the '.' being just another character allowed in file
names. Such systems generally allow for variable-length filenames, permitting more than one dot, and hence multiple suffixes.
Some components of Multics and Unix, and applications running on them, used suffixes in some cases to indicate file types, but they did not rely on them as much; for example, programs and ordinary text files had no suffixes in their names.
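When the name is a single string and the dot is just another character, a name may carry several suffixes or none at all. Python's pathlib module exposes this view directly; the file names below are illustrative only.

```python
from pathlib import PurePosixPath

archive = PurePosixPath("backup.tar.gz")
print(archive.suffix)    # .gz  (text from the last dot onward)
print(archive.suffixes)  # ['.tar', '.gz']
print(archive.stem)      # backup.tar

# A name without any dot simply has no suffix.
print(repr(PurePosixPath("Makefile").suffix))  # ''
```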
The High Performance File System (HPFS), used in Microsoft and IBM's OS/2, also supported long file names and did not divide the file name into a name and an extension. However, the convention of using suffixes continued, even though HPFS supported extended attributes for files, allowing a file's type to be stored with the file as an extended attribute.
In addition, NTFS, the native file system of Microsoft's Windows NT, supported long file names and did not divide the file name into a name and an extension, but again the convention of using suffixes to simulate extensions continued, for compatibility with existing versions of Windows.
As the Internet age arrived, it was possible to discern who was using Windows systems to edit their web pages and who used Macintosh or Unix computers, since the Windows users were generally restricted to ending their web page filenames in .HTM (instead of .html). The three-character limit also became a problem for programmers experimenting with the Java programming language, since it required source code files to have the four-letter suffix .java and compiled object code output files to have the five-letter suffix .class.
Eventually, Microsoft introduced support for long file names, and removed the 8.3 name/extension split in file names, in an extended version of the commonly used FAT file system called VFAT. VFAT first appeared in Windows NT 3.5 and Windows 95. The internal implementation of long file names in VFAT is widely considered to be an ugly kludge, but it removed the important length restriction and allowed files to have a mix of upper case and lower case letters on machines that would not run Windows NT well. However, the use of three-character extensions under Windows has continued, originally for backward compatibility with older versions of Windows and now by habit, along with the problems it creates.
Security issues
The default behavior of Windows Explorer, the Microsoft file browser, is for file extensions not to be shown. Malicious users
have tried to spread computer viruses and computer
worms by using file names formed like LOVE-LETTER-FOR-YOU.TXT.vbs. The hope is that this will appear as
LOVE-LETTER-FOR-YOU.TXT, a harmless text file, without alerting the user to the fact that it is a harmful computer
program, in this case written in VBScript.
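The effect of hiding the final extension can be modeled with a short sketch. This is a simplified model that hides whatever follows the last dot, not a description of how Windows Explorer actually decides what to hide. The function names and the list of executable extensions are assumptions made for illustration.

```python
import os.path

# Assumed, non-exhaustive set of extensions treated as executable.
EXECUTABLE_EXTS = frozenset({".vbs", ".exe", ".com", ".bat", ".cmd"})


def displayed_name(filename: str) -> str:
    """Name as it would appear with the final extension hidden (simplified)."""
    root, ext = os.path.splitext(filename)
    return root if ext else filename


def looks_double_extended(filename: str) -> bool:
    """True if an executable extension follows another, decoy extension."""
    root, ext = os.path.splitext(filename)
    inner_ext = os.path.splitext(root)[1]
    return ext.lower() in EXECUTABLE_EXTS and inner_ext != ""


name = "LOVE-LETTER-FOR-YOU.TXT.vbs"
print(displayed_name(name))          # LOVE-LETTER-FOR-YOU.TXT
print(looks_double_extended(name))   # True
print(looks_double_extended("readme.txt"))  # False
```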
Some similar historical Microsoft Windows security issues are discussed under COM file.
Later Windows versions (starting with Windows XP Service Pack 2 and Windows Server 2003) included customizable lists of file extensions that should be considered 'dangerous' in certain 'zones' of operation, such as when downloaded from the web or received as an e-mail attachment. Modern antivirus software also helps to defend users against such attempted attacks where possible.
There have been instances of malware crafted to exploit vulnerabilities in some Windows
applications which could cause a stack-based buffer overflow when opening a file with an
overly long, unhandled file extension.
Relation to Internet content types
In network contexts, files are regarded as streams of bits and do not have filenames or
extensions.
In the Internet protocol suite, the type of a given bitstream is declared by its MIME Content-type, represented by a line of text in the block of header text preceding the stream, such as:
Content-type: text/plain
BeOS, whose BFS file system supports extended
attributes, would tag a file with its MIME Content-type as an extended attribute. The KDE and
GNOME desktop environments associate a MIME
Content-type with a file by examining the filename suffix and examining the contents of the file, in the fashion of the
file command, as a heuristic. They choose the application
to launch when a file is opened based on the MIME Content-type, reducing the dependency on filename extensions. Mac OS X uses filename extensions and MIME types, as well as file type codes, to select a Uniform Type Identifier by which to identify the file type internally.
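The combined heuristic described above, looking at both the suffix and the leading bytes of the file, might be sketched as follows. The two tables are deliberately tiny and hypothetical, the function name guess_content_type is invented for this example, and the choice to let content take priority over the suffix is an assumption; real desktop environments consult much larger databases with their own precedence rules.

```python
import os.path

# Hypothetical suffix table.
SUFFIX_TYPES = {
    ".txt": "text/plain",
    ".htm": "text/html",
    ".html": "text/html",
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".gif": "image/gif",
}

# Well-known leading byte signatures ("magic numbers") for a few formats.
MAGIC_TYPES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def guess_content_type(filename: str, head: bytes) -> str:
    """Guess a MIME type from the first bytes of a file, then from its suffix."""
    for magic, mime in MAGIC_TYPES:
        if head.startswith(magic):
            return mime
    ext = os.path.splitext(filename)[1].lower()
    # Fall back to a generic binary type when nothing matches.
    return SUFFIX_TYPES.get(ext, "application/octet-stream")


print(guess_content_type("readme.txt", b"Hello"))     # text/plain
print(guess_content_type("report.txt", b"%PDF-1.4"))  # application/pdf
```

In the second call the content check overrides a misleading suffix, which is the sense in which such heuristics reduce the dependency on filename extensions.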
This entry is from Wikipedia, the leading user-contributed encyclopedia.