Share on Facebook Share on Twitter Email
Answers.com

Distributed computing

 
Sci-Tech Dictionary: distributed computing
(di′strib·yəd·əd kəm′pyüd·iŋ)

(computer science) The use of multiple network-connected computers for solving a problem or for information processing.


Search unanswered questions...
Enter a question here...
Search: All sources Community Q&A Reference topics
Sci-Tech Encyclopedia: Distributed systems
Top

A distributed system consists of a collection of autonomous computers linked by a computer network and equipped with distributed system software. This software enables computers to coordinate their activities and to share the resources of the system hardware, software, and data. Users of a distributed system should perceive a single, integrated computing facility even though it may be implemented by many computers in different locations. This is in contrast to a network, where the user is aware that there are several machines whose locations, storage replications, load balancing, and functionality are not transparent. Benefits of distributed systems include bridging geographic distances, improving performance and availability, maintaining autonomy, reducing cost, and allowing for interaction. See also Local-area networks; Wide-area networks.

The object-oriented model for a distributed system is based on the model supported by object-oriented programming languages. Distributed object systems generally provide remote method invocation (RMI) in an object-oriented programming language together with operating systems support for object sharing and persistence. Remote procedure calls, which are used in client-server communication, are replaced by remote method invocation in distributed object systems. See also Object-oriented programming.

The state of an object consists of the values of its instance variables. In the object-oriented paradigm, the state of a program is partitioned into separate parts, each of which is associated with an object. Since object-based programs are logically partitioned, the physical distribution of objects into different processes or computers in a distributed system is a natural extension. The Object Management Group's Common Object Request Broker (CORBA) is a widely used standard for distributed object systems. Other object management systems include the Open Software Foundation's Distributed Computing Environment (DCE) and Microsoft's Distributed Common Object Manager (DCOM).

CORBA specifies a system that provides interoperability among objects in a heterogeneous, distributed environment in a way that is transparent to the programmer. Its design is based on the Object Management Group's object model.

This model defines common object semantics for specifying the externally visible characteristics of objects in a standard and implementation-independent way. In this model, clients request services from objects (which will also be called servers) through a well-defined interface. This interface is specified in Object Management Group Interface Definition Language (IDL). The request is an event, and it carries information including an operation, the object reference of the service provider, and actual parameters (if any). The object reference is a name that defines an object reliably.

The central component of CORBA is the object request broker (ORB). It encompasses the entire communication infrastructure necessary to identify and locate objects, handle connection management, and deliver data. In general, the object request broker is not required to be a single component; it is simply defined by its interfaces. The core is the most crucial part of the object request broker; it is responsible for communication of requests.

The basic functionality provided by the object request broker consists of passing the requests from clients to the object implementations on which they are invoked. In order to make a request, the client can communicate with the ORB core through the Interface Definition Language stub or through the dynamic invocation interface (DII). The stub represents the mapping between the language of implementation of the client and the ORB core. Thus the client can be written in any language as long as the implementation of the object request broker supports this mapping. The ORB core then transfers the request to the object implementation which receives the request as an up-call through either an Interface Definition Language (IDL) skeleton (which represents the object interface at the server side and works with the client stub) or a dynamic skeleton (a skeleton with multiple interfaces).

Many different ORB products are currently available; this diversity is very wholesome since it allows the vendors to gear their products toward the specific needs of their operational environment. It also creates the need for different object request brokers to interoperate. Furthermore, there are distributed and client-server systems that are not CORBA-compliant, and there is a growing need to provide interoperability between those systems and CORBA. In order to answer these needs, the Object Management Group has formulated the ORB interoperability architecture.

The interoperability approaches can be divided into mediated and immediate bridging. With mediated bridging, interacting elements of one domain are transformed at the boundary of each domain between the internal form specific to this domain and some other form mutually agreed on by the domains. This common form could be either standard (specified by the Object Management Group, for example, Internet Inter-ORB Protocol or IIOP), or a private agreement between the two parties. With immediate bridging, elements of interaction are transformed directly between the internal form of one domain and the other. The second solution has the potential to be much faster, but is the less general one; it therefore should be possible to use both. Furthermore, if the mediation is internal to one execution environment (for example, TCP/IP), it is known as a full bridge; otherwise, if the execution environment of one object request broker is different from the common protocol, each object request broker is said to be a half bridge.


Computer Desktop Encyclopedia: distributed computing
Top

(1) The use of multiple computers networked throughout a wide geographical area, or the world via the Internet, in order to solve a single problem. See grid computing.

(2) The use of multiple computers in an enterprise rather than one centralized system. This use of the term was coined in the late 1970s when minicomputers were first installed in departments throughout a company instead of deploying terminals to a mainframe.

Download Computer Desktop Encyclopedia to your iPhone/iTouch

Wikipedia: Distributed computing
Top

Distributed computing is a field of computer science that studies distributed systems. A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal. A computer program that runs in a distributed system is called a distributed program, and distributed programming is the process of writing such programs.[1]

Distributed computing also refers to the use of distributed systems to solve computational problems. In distributed computing, a problem is divided into many tasks, each of which is solved by one computer.[2]

Contents

Introduction

The word distributed in terms such as "distributed system", "distributed programming", and "distributed algorithm" originally referred to computer networks where individual computers were physically distributed within some geographical area.[3] The terms are nowadays used in a much wider sense, even when referring to autonomous processes that run on the same physical computer and interact with each other by message passing.[4]

While there is no single definition of a distributed system,[5] the following defining properties are commonly used:

  • There are several autonomous computational entities, each of which has its own local memory.[6]
  • The entities communicate with each other by message passing.[7]

In this article, the computational entities are called computers or nodes.

A distributed system may have a common goal, such as solving a large computational problem.[8] Alternatively, each computer may have its own user with individual needs, and the purpose of the distributed system is to coordinate the use of shared resources or provide communication services to the users.[9]

Other typical properties of distributed systems include the following:

  • The system has to tolerate failures in individual computers.[10]
  • The structure of the system (network topology, network latency, number of computers) is not known in advance, the system may consist of different kinds of computers and network links, and the system may change during the execution of a distributed program.[11]
  • Each computer has only a limited, incomplete view of the system. Each computer may know only one part of the input.[12]
(a)–(b) A distributed system.
(c) A parallel system.

Parallel or distributed computing?

The terms "concurrent computing", "parallel computing", and "distributed computing" have a lot of overlap, and no clear distinction exists between them.[13] The same system may be characterised both as "parallel" and "distributed"; the processors in a typical distributed system run concurrently in parallel.[14] Parallel computing may be seen as a particular tightly-coupled form of distributed computing,[15] and distributed computing may be seen as a loosely-coupled form of parallel computing.[16] Nevertheless, it is possible to roughly classify concurrent systems as "parallel" or "distributed" using the following criteria:

  • In parallel computing, all processors have access to a shared memory. Shared memory can be used to exchange information between processors.[17]
  • In distributed computing, each processor has its own private memory (distributed memory). Information is exchanged by passing messages between the processors.[18]

The figure on the right illustrates the difference between distributed and parallel systems. Figure (a) is a schematic view of a typical distributed system; as usual, the system is represented as a graph in which each node (vertex) is a computer and each edge (line between two nodes) is a communication link. Figure (b) shows the same distributed system in more detail: each computer has its own local memory, and information can be exchanged only by passing messages from one node to another by using the available communication links. Figure (c) shows a parallel system in which each processor has a direct access to a shared memory.

The situation is further complicated by the traditional uses of the terms parallel and distributed algorithm that do not quite match the above definitions of parallel and distributed systems; see the section Theoretical foundations below for more detailed discussion. Nevertheless, as a rule of thumb, high-performance parallel computation in a shared-memory multiprocessor uses parallel algorithms while the coordination of a large-scale distributed system uses distributed algorithms.

History

The use of concurrent processes that communicate by message-passing has its roots in operating system architectures studied in 1960s.[19] The first wide-spread distributed systems were local-area networks such as Ethernet that was invented in 1970s.[20]

ARPANET, the predecessor of the Internet, was introduced in the late 1960s, and ARPANET e-mail was invented in the early 1970s. E-mail became the most successful application of ARPANET,[21] and it is probably the earliest example of a large-scale distributed application. In addition to ARPANET and its successor Internet, other early worldwide computer networks included Usenet and FidoNet from 1980s, both of which were used to support distributed discussion systems.

The study of distributed computing became its own branch of computer science in the late 1970s and early 1980s. The first conference in the field, Symposium on Principles of Distributed Computing (PODC), dates back to 1982, and its European counterpart International Symposium on Distributed Computing (DISC) was first held in 1985.

Applications

There are two main reasons for using distributed systems and distributed computing. First, the very nature of the application may require the use of a communication network that connects several computers. For example, data is produced in one physical location and it is needed in another location.

Second, there are many cases in which the use of a single computer would be possible in principle, but the use of a distributed system is beneficial for practical reasons. For example, it may be more cost-efficient to obtain the desired level of performance by using a cluster of several low-end computers, in comparison with a single high-end computer. A distributed system can be more reliable than a non-distributed system, as there is no single point of failure. Moreover, a distributed system may be easier to expand and manage than a monolithic uniprocessor system.[22]

Examples of distributed systems and applications of distributed computing include the following:[23]

Theoretical foundations

Models

Many tasks that we would like to automate by using a computer are of question–answer type: we would like to ask a question and the computer should produce an answer. In theoretical computer science, such tasks are called computational problems. Formally, a computational problem consists of instances together with a solution for each instance. Instances are questions that we can ask, and solutions are desired answers to these questions.

Theoretical computer science seeks to understand which computational problems can be solved by using a computer (computability theory) and how efficiently (computational complexity theory). Traditionally, it is said that a problem can be solved by using a computer if we can design an algorithm that produces a correct solution for any given instance. Such an algorithm can be implemented as a computer program that runs on a general-purpose computer: the program reads a problem instance from input, performs some computation, and produces the solution as output. Formalisms such as random access machines or universal Turing machines can be used as abstract models of a sequential general-purpose computer executing such an algorithm.

The field of concurrent and distributed computing studies similar questions in the case of either multiple computers, or a computer that executes a network of interacting processes: which computational problems can be solved in such a network and how efficiently? However, it is not at all obvious what is meant by “solving a problem” in the case of a concurrent or distributed system: for example, what is the task of the algorithm designer, and what is the concurrent and/or distributed equivalent of a sequential general-purpose computer?

The discussion below focusses on the case of multiple computers, although many of the issues are the same for concurrent processes running on a single computer.

Three viewpoints are commonly used:

Parallel algorithms in shared-memory model
  • All computers have access to a shared memory. The algorithm designer chooses the program executed by each computer.
  • One theoretical model is the parallel random access machines (PRAM) are used.[24] However, the classical PRAM model assumes synchronous access to the shared memory.
  • A model that is closer to the behavior of real-world multiprocessor machines and takes into account the use of machine instructions such as Compare-and-swap (CAS) is that of asynchronous shared memory. There is a wide body of work on this model, a summary of which can be found in the literature. [25][26]


Parallel algorithms in message-passing model
  • The algorithm designer chooses the structure of the network, as well as the program executed by each computer.
  • Models such as Boolean circuits and sorting networks are used.[27] A Boolean circuit can be seen as a computer network: each gate is a computer that runs an extremely simple computer program. Similarly, a sorting network can be seen as a computer network: each comparator is a computer.
Distributed algorithms in message-passing model
  • The algorithm designer only chooses the computer program. All computers run the same program. The system must work correctly regardless of the structure of the network.
  • A commonly used model is a graph with one finite-state machine per node.

In the case of distributed algorithms, computational problems are typically related to graphs. Often the graph that describes the structure of the computer network is the problem instance. This is illustrated in the following example.

An example

Consider the computational problem of finding a coloring of a given graph G. Different fields might take the following approaches:

Centralized algorithms
  • The graph G is encoded as a string, and the string is given as input to a computer. The computer program finds a coloring of the graph, encodes the coloring as a string, and outputs the result.
Parallel algorithms
  • Again, the graph G is encoded as a string. However, multiple computers can access the same string in parallel. Each computer might focus on one part of the graph and produce a colouring for that part.
  • The main focus is on high-performance computation that exploits the processing power of multiple computers in parallel.
Distributed algorithms
  • The graph G is the structure of the computer network. There is one computer for each node of G and one communication link for each edge of G. Initially, each computer only knows about its immediate neighbours in the graph G; the computers must exchange messages with each other to discover more about the structure of G. Each computer must produce its own colour as output.
  • The main focus is on coordinating the operation of an arbitrary distributed system.

While the field of parallel algorithms has a different focus than the field of distributed algorithms, there is a lot of interaction between the two fields. For example, the Cole–Vishkin algorithm for graph colouring[28] was originally presented as a parallel algorithm, but the same technique can also be used directly as a distributed algorithm.

Moreover, a parallel algorithm can be implemented either in a parallel system (using shared memory) or in a distributed system (using message passing).[29] The traditional boundary between parallel and distributed algorithms (choose a suitable network vs. run in any given network) does not lie in the same place as the boundary between parallel and distributed systems (shared memory vs. message passing).

Complexity measures

A centralised algorithm is efficient if it does not require much time (number of computational steps) or space (amount of memory). These complexity measures give rise to complexity classes such as P (decision problems solvable in polynomial time) and PSPACE (decision problems solvable in polynomial space).

In parallel algorithms, yet another resource in addition to time and space is the number of computers. Indeed, often there is a trade-off between the running time and the number of computers: the problem can be solved faster if there are more computers running in parallel (see speedup). If a decision problem can be solved in polylogarithmic time by using a polynomial number of processors, then the problem is said to be in the class NC.[30] The class NC can be defined equally well by using the PRAM formalism or Boolean circuits – PRAM machines can simulate Boolean circuits efficiently and vice versa.[31]

In the analysis of distributed algorithms, more attention is usually paid on communication operations than computational steps. Perhaps the simplest model of distributed computing is a synchronous system where all nodes operate in a lockstep fashion. During each communication round, all nodes in parallel (1) receive the latest messages from their neighbours, (2) perform arbitrary local computation, and (3) send new messages to their neighbours. In such systems, a central complexity measure is the number of synchronous communication rounds required to complete the task.[32]

This complexity measure is closely related to the diameter of the network. Let D be the diameter of the network. On the one hand, any computable problem can be solved trivially in a synchronous distributed system in approximately 2D communication rounds: simply gather all information in one location (D rounds), solve the problem, and inform each node about the solution (D rounds).

On the other hand, if the running time of the algorithm is much smaller than D communication rounds, then the nodes in the network must produces their output without having the possibility to obtain information about distant parts of the network. In other words, the nodes must make globally consistent decisions based on information that is available in their local neighbourhood. Many distributed algorithms are known with the running time much smaller than D rounds, and understanding which problems can be solved by such algorithms is one of the central research questions of the field.[33]

Other commonly used measures are the total number of bits transmitted in the network (cf. communication complexity).

Other problems

Traditional computational problems take the perspective that we ask a question, a computer (or a distributed system) processes the question for a while, and then produces an answer and stops. However, there are also problems where we do not want the system to ever stop. Examples of such problems include the dining philosophers problem and other similar mutual exclusion problems. In these problems, the distributed system is supposed to continuously coordinate the use of shared resources so that no conflicts or deadlocks occur.

There are also fundamental challenges that are unique to distributed computing. The first example is challenges that are related to fault-tolerance. Examples of related problems include consensus problems,[34] Byzantine fault tolerance,[35] and self-stabilisation.[36]

A lot of research is also focused on understanding the asynchronous nature of distributed systems:

Properties of distributed systems

So far the focus has been on designing a distributed system that solves a given problem. A complementary research problem is studying the properties of a given distributed system.

The halting problem is an analogous example from the field of centralised computation: we are given a computer program and the task is to decide whether it halts or runs forever. The halting problem is undecidable in the general case, and naturally understanding the behaviour of a computer network is at least as hard as understanding the behaviour of one computer.

However, there are many interesting special cases that are decidable. In particular, it is possible to reason about the behaviour of a network of finite-state machines. One example is telling whether a given network of interacting (asynchronous and non-deterministic) finite-state machines can reach a deadlock. This problem is PSPACE-complete,[40] i.e., it is decidable, but it is not likely that there is an efficient (centralised, parallel or distributed) algorithm that solves the problem in the case of large networks.

Architectures

Various hardware and software architectures are used for distributed computing. At a lower level, it is necessary to interconnect multiple CPUs with some sort of network, regardless of whether that network is printed onto a circuit board or made up of loosely-coupled devices and cables. At a higher level, it is necessary to interconnect processes running on those CPUs with some sort of communication system.

Distributed programming typically falls into one of several basic architectures or categories: Client-server, 3-tier architecture, N-tier architecture, Distributed objects, loose coupling, or tight coupling.

  • Client-server — Smart client code contacts the server for data, then formats and displays it to the user. Input at the client is committed back to the server when it represents a permanent change.
  • 3-tier architecture — Three tier systems move the client intelligence to a middle tier so that stateless clients can be used. This simplifies application deployment. Most web applications are 3-Tier.
  • N-tier architecture — N-Tier refers typically to web applications which further forward their requests to other enterprise services. This type of application is the one most responsible for the success of application servers.
  • Tightly coupled (clustered) — refers typically to a cluster of machines that closely work together, running a shared process in parallel. The task is subdivided in parts that are made individually by each one and then put back together to make the final result.
  • Peer-to-peer — an architecture where there is no special machine or machines that provide a service or manage the network resources. Instead all responsibilities are uniformly divided among all machines, known as peers. Peers can serve both as clients and servers.
  • Space based — refers to an infrastructure that creates the illusion (virtualization) of one single address-space. Data are transparently replicated according to application needs. Decoupling in time, space and reference is achieved.

Another basic aspect of distributed computing architecture is the method of communicating and coordinating work among concurrent processes. Through various message passing protocols, processes may communicate directly with one another, typically in a master/slave relationship. Alternatively, a "database-centric" architecture can enable distributed computing to be done without any form of direct inter-process communication, by utilizing a shared database.[41]

See also

Notes

  1. ^ Andrews (2000). Dolev (2000). Ghosh (2007), p. 10.
  2. ^ Godfrey (2002).
  3. ^ Lynch (1996), p. 1.
  4. ^ Andrews (2000), p. 291–292. Dolev (2000), p. 5.
  5. ^ Ghosh (2007), p. 10.
  6. ^ Andrews (2000), p. 8–9, 291. Dolev (2000), p. 5. Ghosh (2007), p. 3. Lynch (1996), p. xix, 1. Peleg (2000), p. xv.
  7. ^ Andrews (2000), p. 291. Ghosh (2007), p. 3. Peleg (2000), p. 4.
  8. ^ Ghosh (2007), p. 3–4. Peleg (2000), p. 1.
  9. ^ Ghosh (2007), p. 4. Peleg (2000), p. 2.
  10. ^ Ghosh (2007), p. 4, 8. Lynch (1996), p. 2–3. Peleg (2000), p. 4.
  11. ^ Lynch (1996), p. 2. Peleg (2000), p. 1.
  12. ^ Ghosh (2007), p. 7. Lynch (1996), p. xix, 2. Peleg (2000), p. 4.
  13. ^ Ghosh (2007), p. 10. Keidar (2008).
  14. ^ Lynch (1996), p. xix, 1–2. Peleg (2000), p. 1.
  15. ^ Peleg (2000), p. 1.
  16. ^ Ghosh (2007), p. 10.
  17. ^ Papadimitriou (1994), Chapter 15. Keidar (2008).
  18. ^ See references in Introduction.
  19. ^ Andrews (2000), p. 348.
  20. ^ Andrews (2000), p. 32.
  21. ^ Peter (2004), The history of email.
  22. ^ Elmasri & Navathe (2000), Section 24.1.2.
  23. ^ Andrews (2000), p. 10–11. Ghosh (2007), p. 4–6. Lynch (1996), p. xix, 1. Peleg (2000), p. xv. Elmasri & Navathe (2000), Section 24.
  24. ^ Cormen, Leiserson & Rivest (1990), Section 30.
  25. ^ Herlihy & Shavit (2008), Chapters 2-6.
  26. ^ Lynch (1996)
  27. ^ Cormen, Leiserson & Rivest (1990), Sections 28 and 29.
  28. ^ Cole & Vishkin (1986). Cormen, Leiserson & Rivest (1990), Section 30.5.
  29. ^ Andrews (2000), p. ix.
  30. ^ Arora & Barak (2009), Section 6.7. Papadimitriou (1994), Section 15.3.
  31. ^ Papadimitriou (1994), Section 15.2.
  32. ^ Lynch (1996), p. 17–23.
  33. ^ Peleg (2000), Sections 2.3 and 7. Linial (1992). Naor & Stockmeyer (1995).
  34. ^ Lynch (1996), Sections 5–7. Ghosh (2007), Chapter 13.
  35. ^ Lynch (1996), p. 99–102. Ghosh (2007), p. 192–193.
  36. ^ Dolev (2000). Ghosh (2007), Chapter 17.
  37. ^ Lynch (1996), Section 16. Peleg (2000), Section 6.
  38. ^ Lynch (1996), Section 18. Ghosh (2007), Sections 6.2–6.3.
  39. ^ Ghosh (2007), Section 6.4.
  40. ^ Papadimitriou (1994), Section 19.3.
  41. ^ A database-centric virtual chemistry system, J Chem Inf Model. 2006 May-Jun;46(3):1034-9

References

Books
Articles
Web sites

Further reading

Books
  • Tel, Gerard (1994). Introduction to Distributed Algorithms. Cambridge University Press. 
  • Attiya, Hagit and Welch, Jennifer (2004). Distributed Computing: Fundamentals, Simulations, and Advanced Topics. Wiley-Interscience.  ISBN 0471453242.
Articles

External links


 
 

 

Copyrights:

Sci-Tech Dictionary. McGraw-Hill Dictionary of Scientific and Technical Terms. Copyright © 2003, 1994, 1989, 1984, 1978, 1976, 1974 by McGraw-Hill Companies, Inc. All rights reserved.  Read more
Sci-Tech Encyclopedia. McGraw-Hill Encyclopedia of Science and Technology. Copyright © 2005 by The McGraw-Hill Companies, Inc. All rights reserved.  Read more
Computer Desktop Encyclopedia. THIS COPYRIGHTED DEFINITION IS FOR PERSONAL USE ONLY.
All other reproduction is strictly prohibited without permission from the publisher.
© 1981-2009 Computer Language Company Inc.  All rights reserved.  Read more
Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "Distributed computing" Read more