A document (noun) is a bounded physical representation of a body of information
designed with the capacity (and usually intent) to communicate. A document may manifest
symbolic, diagrammatic or sensory-representational information. To document (verb) is to produce a document artifact by
collecting and representing information. In prototypical usage, a document is understood as a paper artifact, containing
information in the form of ink marks. Increasingly, documents are also understood as digital artifacts.
Colloquial usage is revealed by the connotations and denotations that appear in a search for document. From these usages, one can infer the following typical
connotations:
- Writing that provides information (especially information of an official nature)
- Anything serving as a representation of a person's thinking by means of symbolic marks
- A written account of ownership or obligation
- To record in detail; "The parents documented every step of their child's development"
- A digital file in a particular format
- To support or supply with references; "Can you document your claims?"
- An artifact that meets a legal notion of document for purposes of discovery in litigation
The variety of usage reveals that the notion of document has rich social and cultural aspects besides the physical, functional
and operational aspects.
Conceptualization in analytical philosophy
The notion of document admits both an empirical (in terms of a fuzzy set of real-world instances) and an analytical
characterization. The analytical characterization hinges on the semantic character of the word document, as well as the
use of a primitive notion of document in accounts of larger communication constructs such as discourses, or related
constructs such as language games.
The nominal 'document', like other nominals, exhibits familiar patterns of polysemy (a kind of ambiguity). For example,
"document" might be used on an occasion to denote a certain body of information independently of how that information is
physically rendered (as in 'the Bible is my favorite document.'; 'Have you finished reading all the documents for Monday's class
yet?'), or it might be used to denote a particular physical instantiation of a body of information (as in 'that document is worn
and needs to be re-bound.'; 'Return the documents you borrowed to the reference desk.'). This kind of polysemy bears some
similarity to what Nunberg (1979) termed "container/contents polysemy" (as in 'Mary broke the
bottle' versus 'the baby finished the bottle'). These patterns of polysemy exhibited by 'document' matter for the following
reason. A certain document qua body of information (e.g. the Bible, not a particular bound copy thereof) will have different
properties than a document qua physical rendering of a body of information (e.g. a particular bound copy of the Bible).
Importantly, the latter would have the property of being a static, physically bounded thing. The former would have the properties
of being able to evolve over time, being susceptible of certain changes to information content, and being capable of supporting
multiple physical instantiations that have allowable differences in information content. This distinction is relevant to the
discussion of aspects and history of documents below.
Empirical characterization
In light of the polysemy of the core concept of document, it is useful to note a number of examples ranging from instances
commonly understood as prototypical documents, to instances that are understood as documents only in specialized or rare
situations.
- Prototypical Documents: Letters, memos, legal forms, instruction manuals
- Documents of Record: Newspapers and magazines
- Books: Textbooks, novels, recipe books, encyclopedias, comic books
- Canonical Documents: The Bible, Iliad and Odyssey, Vedas, Ramayana, Mahabharata, Quran, Code of Hammurabi, Tao Te Ching
- Transactional Documents: Cheques, contracts, prescriptions, receipts, forms, postage stamps
- Functional Documents: PDF files, PostScript files, XML files, Email
- Non-Prototypical Documents: Post-it notes, Fortune cookie strips, Maps, Paintings,
milk cartons, cereal boxes
- Non-Classical Digital Documents: Web pages, weblogs, wikis
- Boundary Examples: The plaque on the Pioneer 11 spacecraft, designed by astronomer
Carl Sagan and using information assumed to be universal, is an extreme example of a document that is intended to communicate
with aliens. Conversely, the recorded and printed signals of the SETI project would constitute documents if they were discovered
to contain alien communication.
Social aspects of documents
Documents play a key role in the construction of social reality (Searle, 1997) and therefore play a part in accounts of every
important aspect of human society and culture. One example is the seminal account of the role of print
in political evolution, Imagined Communities (Anderson, B., 2006). More direct examples include the works of
Marshall McLuhan (McLuhan, 1964 and 1969). Many key social aspects of documents arise
from their historically unchanging character. This aspect leads to a definition of a document as a talking thing
(Levy, D., 2003), whose strengths and weaknesses both arise from its relative (historical) immutability with respect to oral
forms of communication. The relative immutability of documents has thus historically been important for establishing a record of
transient events, or for preserving information whose precise linguistic form is of ritual or practical importance (such as
religious texts or legal documents). Note though, that historically many societies have accorded greater authority to disciplined
oral traditions as more reliable than parallel written ones. With this caveat in mind, the following social aspects of
information may be noted.
- Social Value: The information in documents as well as documents themselves are often valuable; the information because
of the influence represented, and the document itself when it is believed to be a rare or unique and authentic representation of
the information it contains.
- Manifestation of authority: Documents are often produced to provide a record that will be considered authoritative in
the future, particularly with respect to government. Consider receipts, titles, and deeds as examples of proof of ownership, and
passports or driver's licenses as proof of identity.
- Conventional: Documents inherit a key feature of language-based communication in general: they are denoted as
documents by convention (Lewis, 2002). Virtually any medium can constitute a document provided the people involved can agree on
the meaning represented. Hence cave drawings, hieroglyphics, scrolls of sheepskin, sheets of papyrus, ink on paper, magnetic tape
and electronic files are all documents under certain accounts.
- Manifestation of economic labor: Historically, the effort required to produce a document has been significant, so only
the most important documents were created. The illuminated manuscripts of the
pre-Gutenberg era demonstrate the cost (and associated imputed value) of documents. Historically, the cost of producing
documents has declined, while their functional characteristics ("affordances" in the sense of Sellen and Harper, 2001) have
become richer.
- Manifestation of business processes: Documents play many roles in the internal management of a business as well as in the
interfaces between businesses and their suppliers, employees, and customers. Current trends toward longer value chains and
increased regulation increase the number of documents that must be generated and processed.
- Instruments of Governance and Law: The unchanging aspect of documents is crucial to the consistent communication of
policy and administration of law to citizens. Documents that play such roles include constitutions, corporate annual reports and
religious texts.
- Analytical philosophical character: The notion of document plays a role in political philosophy (for example, the notion
of the social contract as a primitive construct), as well as in the philosophy of law.
- Role in Religion: Documents play a key role in religion, and constitute canonical content. Document-related terms such
as dogma and doctrine have today acquired pejorative connotations primarily due to historical events associated
with religious documents.
- Cultural Significance: Documents play a central role in art of all varieties. In the movie Office Space, for instance, central plot elements are frustration with a bureaucratic process involving the
fictional "TPS reports" and a malfunctioning printer.
- Metaphoric Significance: Metaphors based on documents permeate our thinking, ranging from the obvious ("let's start
with a clean sheet for this design", "this is a new chapter in my life" and "she wrote the book on that") to the highly
allegorical ("All mankind is of one author, and is one volume; when one man dies, one chapter is not torn out of the book, but
translated into a better language; and every chapter must be so translated" — John Donne).
Functional characteristics
Documents also manifest several, more localized characteristics that determine how we use them in everyday life:
- Manifest nature: Information is physical, i.e. it always must exist in a tangible form, even when digital. IBM
computer scientist Rolf Landauer is credited with this observation and working out its implications. By virtue of being
realizations of chunks of information, documents are necessarily physical in all their forms.
- Contextuality and Situatedness: All communication takes place in a context, which includes at least the shared
understanding of the parties communicating (Lewis, 2002). Explicit and implicit references to the context can convey a large
amount of meaning by building on the shared understanding, but that meaning is lost to another party that does not share that
context. For example, Shakespeare in the original would be incomprehensible to modern readers simply because of the evolution of
language and spelling since the seventeenth century, and modern readers (besides Shakespeare scholars) normally read modernized
versions. Similarly, hypertext documents exist in a context which is lost if printed, leading to a different offline reading
context.
- Evolvability: When we think of a document as a definitive source containing the best known information about a topic,
there is a need to change that information as more is learned. This is frequently done by revising the document into a new version
or edition. Typically, older versions are archived to facilitate understanding how the document has changed. In modern contexts,
when technologies such as wikis or software source code are under discussion, this evolvability can require very sophisticated
version control technologies.
- Renderability: Every abstract entity that is understood to be a document in some context can be rendered, often in
more than one way. A rendition of a document refers to a particular physical or electronic representation of the information from
the document. For example, a Portable Document Format (PDF) representation and
a web page may contain the same information but have substantially different properties and appearances. We think of them as
different renditions (or renderings) of the same document. We might similarly consider different translations of a document to be
the same document although differences in language context and structure may make it impossible to express precisely the same
meaning in both languages.
- Affordances: Documents in digital and physical forms manifest various "affordances" (Sellen and Harper, 2001;
Gladwell, 2002). The affordances of a particular rendition of a document determine its uses. For example, paper has the
affordances of allowing flipping and easy tactile manipulation, while digital forms are easier to edit.
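The evolvability characteristic above can be illustrated with a minimal sketch. It assumes two hypothetical versions of a short document held as lists of lines (the content is invented for illustration) and uses Python's standard difflib module to report what changed between them. Real version control systems are considerably more elaborate; this only shows the basic idea of comparing an archived version with its revision.

```python
import difflib

# Two hypothetical versions of the same document (illustrative content only).
version_1 = [
    "A document is a paper artifact.",
    "It contains information in the form of ink marks.",
]
version_2 = [
    "A document is a paper or digital artifact.",
    "It contains information in the form of ink marks.",
]

def changed_lines(old, new):
    """Return (removed, added): lines dropped from `old` and lines new in `new`."""
    matcher = difflib.SequenceMatcher(a=old, b=new)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(new[j1:j2])
    return removed, added

removed, added = changed_lines(version_1, version_2)
print("removed:", removed)
print("added:", added)
```

Keeping the older version alongside the list of changes is what allows a reader to understand how the document has evolved.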
Classical roles and workflows in document production
There are a number of roles in which people are involved in the creation and distribution of traditional paper documents
(Romano, 1989); some, but not all documents are processed by people acting in each role, each of which may be performed by an
individual or a group. Books are a well-known example of documents that require an extensive publication process, but many other
documents undergo at least some of the same processes. Each of these roles is considered to improve
or add value to a document. These roles are generally understood as being clustered in various phases in the production of a
classical document, including authorship, editing and prepress. Roles and workflows in the production of modern digital documents
are more variable and are discussed in the section on future documents.
- An author selects the content to be communicated and performs the initial organization
and recording of the content. A document in this state is often called a manuscript.
- A reviewer reads the content and evaluates it with respect to the intended audience.
Reviewers often recommend only the best documents to be published. Documented reviews are frequently published as guidelines for
document consumers as well.
- An editor helps to organize and express the content so that the meaning is clear and
understandable, and follows the conventions of the symbolic representation such as spelling and grammar.
- A publisher orchestrates the process of producing a document, often decides
whether a document is worth the effort of publishing (usually an economic decision), and collects and disseminates the profits
from sales of a produced document.
- A printer formats the document into a comfortable form such as a bound
book. Printing can be a very complex and elaborate process, including:
  - pagination: a function performed by an individual who takes on the tasks of
  organizing text, fonts, images, headings, footnotes, chapters and sections to accommodate the physical constraints of a printed
  page aesthetically.
  - pre-press: a function performed by print shops in preparing paper documents for production.
  - imposition: organizing desired pages on a larger medium such that when folded and
  trimmed the pages will be upright and in order.
  - printing: marking paper with ink or toner
  - folding pages into sections
  - binding pages together and covering
  - trimming
  - packaging
- A distributor manages inventory and physical distribution of printed documents to
retailers.
- A retailer manages a local inventory and sales to consumers, and often is familiar
with the content and can make appropriate recommendations.
- A librarian organizes, tracks borrowing of, and archives documents.
A publication process enables a consumer to purchase or borrow, read and learn from
documents. Consumers are often the intended audience of the publication process.
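The imposition step described in the printing workflow above can be made concrete with a small sketch. It assumes the simplest case, a saddle-stitched booklet in which each sheet carries two pages per side and the sheets are nested inside one another before folding; the function name and the returned structure are illustrative choices, not an industry-standard interface. Commercial imposition involves larger sheets, more pages per side and page rotation, none of which is modelled here.

```python
def saddle_stitch_imposition(page_count):
    """Return, for each sheet, the (left, right) page numbers on front and back.

    Assumes a simple saddle-stitched booklet: every sheet holds four pages
    and sheets are nested, outermost sheet first.
    """
    if page_count <= 0 or page_count % 4 != 0:
        raise ValueError("page_count must be a positive multiple of 4")
    sheets = []
    for i in range(page_count // 4):
        # The outer side pairs a late page with an early one; the inner side
        # continues with the next early page and the previous late page.
        front = (page_count - 2 * i, 2 * i + 1)
        back = (2 * i + 2, page_count - 2 * i - 1)
        sheets.append({"front": front, "back": back})
    return sheets

layout = saddle_stitch_imposition(8)
for number, sheet in enumerate(layout, start=1):
    print(f"sheet {number}: front {sheet['front']}, back {sheet['back']}")
```

For an eight-page booklet this places pages 8 and 1 on the outside of the first sheet, so that after folding and nesting the pages read in order.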
Document production technology
Document production technology has evolved significantly through history. While a great deal can be said about ancient
production technologies including papyrus, palm leaves, stone tablets and marking devices ranging from quills to chisels, the
modern form of the document has evolved largely under the influence of printing technologies. The Illuminated manuscript of Europe is a useful prototypical instance of the document at the end of
its evolution before the widespread use of printing. The associated technology was largely a human one. Other cultures at this
stage used other forms of pre-print era documents. The history of printing can be traced as follows:
Bronze Age civilizations made extensive use of seals for commercial and transactional
purposes. The signet ring was of particular importance, and seals are still in use in place of signatures in East
Asian countries like Korea, where it is common for individuals to carry a seal.
Chinese Woodblock printing was the first widespread technology that automated
important parts of the document production process.
The Gutenberg Printing Press (McLuhan, M., 1969) enabled the mass production of
faithful copies of documents, and hence the widespread dissemination of information. The widespread access to information enabled
(and necessitated) fundamental changes to society in religion, government, law, business, and entertainment. Prior to the press,
the huge effort required to hand-copy documents faithfully severely limited the number of documents available, and hence access to the
information contained therein. The effort to set type and prepare a document for reproduction was still high, but many high
fidelity copies could be produced.
The development of Lithography constituted the next great advance in document production
technology and continues today to dominate the economic landscape of document production, an economic sector estimated to be of
the order of $1 trillion. Lithography brought economies of scale and extremely high quality and low cost to documents.
The typewriter improved the accessibility of document production technologies and enabled
it to enter mainstream workplaces. Carbon paper enabled a modest number of copies to be
produced concurrently with the original. A brief era of photography-based technologies flourished (including the photostat and
cyclostyle processes) in parallel with the age of typewriters.
The Xerox Copier became a major milestone in document production by eliminating the
typesetting effort required by a printing press. The Xerographic ("dry writing") technology (also referred to as
electrophotography) could produce durable and economical copies of a paper document easily and quickly. Modern digital printers
from Xerox and other companies such as HP, Canon and Ricoh can produce more than 240 black-and-white or 170 color copies of a page
each minute, and work with up to 6 colors and dry and wet inks. This technology supports a $100 billion market in digital
printing, particularly in domains where lithography has clear limitations.
Computers enabled information to be stored electronically in databases and electronic files
on magnetic tapes, drums, and disks. This led to a radical disruption of all document production technologies. Initially most of
this information was printed onto paper by teletypes (automated typewriters), but computer
printers rapidly became faster and more sophisticated. Computers, by controlling lasers in xerography, micro-nozzles in inkjet
systems, and tiny solenoids in mechanical systems, became capable of being serially embedded in the document production process.
Computers are also critical to modern lithography.
Today, epaper is viewed as one potential future evolutionary physical form of the
prototypical document.
Document life cycle management technology
Technology to manage documents has evolved in parallel with documents themselves. Of particular importance are practices
concerning the preservation, archival, destruction and management of documents. These constitute what is known as the "document
life cycle".
- Physical preservation: Documents in both traditional physical forms and in digital physical forms such as magnetic
media must be physically preserved. This aspect of document management deals with such issues as the aging of paper (the
innovation of acid-free paper is an advance in preservation) and obsolescence of magnetic media.
- Storage: This aspect includes management of scarce resources such as shelf space and disk space, and associated
technologies such as optimal space utilization. Modern libraries such as those at the University of Nevada and the University of Michigan
often use complex space-saving technologies such as robotic retrieval systems for stacks and moving bookshelves. In the digital
realm, the entire discipline of compression technologies can be viewed as concerned with the storage of documents.
- Cultural Preservation: This function, traditionally ascribed to librarians, involves the selection, arrangement and
storage of documents in safe places. The importance of this part of document life cycle management can be seen in the impact of
historical events such as the destruction of books in ancient China and the burning of the library at
Alexandria. Today, library and information science has evolved into an important academic discipline.
- Bibliometrics: This aspect of document management involves functions of indexing, generating statistics and
taxonomies, and improving the usability of large collections of documents. The modern history of this management technology dates
back to Melvil Dewey and the Dewey Decimal System. Today, the science of
bibliometrics is largely concerned with managing the impact of electronic technologies. This aspect must also deal with ISBN
numbers, Library of Congress data and other standards.
- Digital Content Management: The explosion of digital content has resulted in technologies to manage large collections
of digital information generated by organizations. Such systems must manage access control and privileges, support multiple electronic
formats, interface with printing infrastructures and enable collaborative workflows around documents.
- Digital-Physical Interaction Management: As long as both paper and digital documents continue to have value,
technologies to manage their interaction will remain necessary. Key to this is the large-scale,
systematic scanning of physical documents (such as the Google book scanning project).
- Destruction: With the increased cost of identity theft, corporate scandals and privacy concerns, the destruction of
both paper and electronic documents has become increasingly important to manage. Technologies such as shredders play a role, as
do verifiable processes of destruction of electronic documents to ensure compliance with privacy laws.
- Security: Shannon's information theory has led to an entire discipline that concerns itself with the security of
documents, and associated technologies such as encryption, as well as more physical security features such as watermarks and
making currency documents safe from counterfeiting.
- Transportation: The entire postal system, as well as modern courier systems, is largely built on the need to move
documents physically from one location to the other.
The document economy
The economics of the production and management of documents indirectly impacts every economic sector. While the total economic
value of the document economy is hard to estimate, the economic sectors with business models directly dependent on documents
include:
- Document Authoring Technology: This sector supports a huge variety of digital and physical production technologies,
ranging from Microsoft Word to LaTeX to advanced layout
software.
- Education: The production and processing of documents is so critical that entire educational disciplines have evolved
around writing, editing, layout and design of documents. The information sciences are also part of the document economy.
- Electronic Document Management: Managing documents within organizations and in public and personal contexts supports a
huge industry in content management systems, ranging from free public infrastructure such as Wikipedia to proprietary enterprise applications such as DocuShare and Documentum.
- Physical Document Management: Large manufacturing sectors producing everything from 3-ring binders to filing cabinets
and office desks exist largely due to the need to process documents.
- Media: The paper industry exists to support the document economy.
- Print equipment: From lithography and xerography to pencils and crayons, an extraordinarily diverse set of equipment
industries depend on documents.
- Document Services: In large organizations, the life of documents in the workflows and processes of daily activity
represents an enormous locus of value addition and cost reduction, which has led to a burgeoning industry in managed document
services, ranging from specialized niches (such as payroll management by Paychex Inc.) to managed office printing.
- Retail Production: From large chains such as Kinko's in the United States to small copy shops and offset print shops,
documents support a large production sector for the end user.
- Publishing: All publishing, ranging from offset-based newspaper and magazine printing, to highly customized modern
publishing using publish-on-demand digital print technology, is part of the document economy. The publishing industry includes
major sub-areas such as the writer's market, small, medium and large publishing houses, small and large distributors and a vast
network of independent and chain bookstores, online retailers, a large used-documents market and subscription-based markets.
- Document Transportation: The international postal system, as well as the commercial package transportation systems
represented by companies such as DHL and UPS have economic models based largely on the demand for document transportation.
Future of documents
Since the advent of the digital era, documents have been changing radically, requiring fundamental
reconceptualization (Wesch, 2006). Efforts at reconceptualization range from Vannevar Bush's initial conceptualization
of hypertext (Bush, V., 1945) to modern treatments of hypertext. The impact of digital
technology can be understood in terms of several key aspects:
- Blurring of the notion of document boundary: hypertext and Web content make it hard to determine what is being denoted
by the term document. While the early days of the Web resulted in documents that mimicked their physical ancestors, Web
content rapidly took on new characteristics. Reconceptualization of the notion of "boundary" is a key intellectual challenge
(Sweet, 2003).
- Increasing structure and openness: The document is going from an opaque container of information to a much more open,
structured document. XML underlies most document formats today (for example, OpenDocument and Office Open XML). In the future, the document will become
even more queryable, with the actual elements of a document being tagged (e.g. HR-XML).
- Dynamic nature: Web analogs of traditional paper documents like a newspaper column have taken on a dynamic character
due to the impact of technology enabling the addition of comments from readers. The document will increasingly become "virtual",
bringing up-to-date information from various sources in one container (a la "mash-up"); as such, it will be kept evergreen.
- Paper and electronic are reconciling: Paper has traditionally been a gap in document processing workflows.
Technologies such as OCR, OMR, or 2D barcodes are helping get its content back into the electronic
world. In the future, however, not only will that transition be seamless, but it will also be possible to track a document while it is in the
"physical" world through RFID or MemorySpot.
- Hybrid automated/human authorship: authorship workflows for digital documents have evolved to include the computer in
a key role. Dynamic Web pages may be viewed as the joint output of a human author (who produces a template) and a software system
(that fills in the template). Sophisticated examples of this phenomenon can be found in recent evolutions in paper documents as
well. Variable data technology, for instance, allows creators of direct mail marketing documents to vary the content of every
piece in a print run using technologies such as XMPie.
- Prosumer workflows: Content repositories such as Wikipedia radically alter
traditional document production workflows by blurring roles such as author and editor.
- Customizability: Digital technology allows users to actively participate in the construction of documents they see,
realizing the postmodern notion of construction of meaning in an unexpectedly literal way.
- Long Tail Economics: Technologies such as blogs have allowed document production economics to operate with such
radically cheap cost structures that single individuals can derive an income from a global audience with low capital expenses.
This has led to an explosion of niche content.
- Blurring of Documents and Interfaces: Technologies such as Ajax or Apollo
blur the distinction between documents and user interfaces to "intelligent" technologies, leading to a whole class of smart
documents that can go beyond the passive nature of traditional documents.
- Fluidity and Dynamic Microstructure: Distinct from the impact of hypertext on the notion of document is the fluid
potential of modern documents at the microlevel, which allows an enormous variety of word and sentence level dynamic
phenomenology (Kelly, K., 2006).
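The point made above about increasing structure and openness, namely that tagged elements make a document queryable, can be sketched briefly. The example assumes a small, hypothetical XML document whose element and attribute names are invented for illustration; they do not follow HR-XML or any other real schema. It uses Python's standard xml.etree.ElementTree module to select elements by tag and attribute.

```python
import xml.etree.ElementTree as ET

# A hypothetical tagged document; the element names are invented for
# illustration and do not follow HR-XML or any other real schema.
SAMPLE = """
<resume>
  <name>Pat Example</name>
  <skills>
    <skill level="expert">typesetting</skill>
    <skill level="novice">lithography</skill>
  </skills>
</resume>
"""

def skills_at_level(xml_text, level):
    """Return the text of every <skill> element whose level attribute matches."""
    root = ET.fromstring(xml_text.strip())
    return [s.text for s in root.findall(f".//skill[@level='{level}']")]

expert_skills = skills_at_level(SAMPLE, "expert")
print(expert_skills)
```

Because the skills are tagged rather than buried in running text, software can extract them directly, which is not possible with an opaque rendition such as a scanned page.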
References
- Sellen, A. J. and Harper, R. H. R., 2001, The Myth of the Paperless Office
- McLuhan, M., 1969, The Gutenberg Galaxy
- McLuhan, M., 1964, Understanding Media: The Extensions of Man
- Landow, G. P., 2006, Hypertext 3.0: Critical Theory and New Media in an Era of Globalization
- Bush, V., 1945, As We May Think, Atlantic Monthly, http://www.theatlantic.com/doc/194507/bush
- Kelly, K., 2006, Scan This Book!, New York Times Magazine, http://www.kk.org/writings/scan_this_book.php
- Owen, D., 2004, Copies in Seconds: How a Lone Inventor and an Unknown Company Created the Biggest Communication
Breakthrough Since Gutenberg—Chester Carlson and the Birth of the Xerox Machine
- Searle, J. R., 1997, The Construction of Social Reality
- Anderson, B., 2006, Imagined Communities: Reflections on the Origin and Spread of Nationalism, New Edition
- Levy, D., 2003, Scrolling Forward: Making Sense of Documents in the Digital Age
- Gladwell, M., 2002, The Social Life of Paper, New Yorker Magazine, http://www.gladwell.com/2002/2002_03_25_a_paper.htm
- Lewis, D. K., 2002, Convention: A Philosophical Study (Revised edition)
- Pedauque, R. T., Document: Form, Sign and Medium, as Reformulated for Electronic Documents
- Romano, F., 1989, Pocket Guide to Digital Prepress
- Sweet, J., 2003, Document Boundaries, Master's Thesis, Rochester Institute of Technology
- Wesch, M., 2006, The Machine is Us/ing Us, video short documentary, http://www.youtube.com/watch?v=6gmP4nk0EOE
This entry is from Wikipedia, the leading user-contributed encyclopedia. It may not have been reviewed by professional editors.