Share on Facebook Share on Twitter Email
Answers.com

Self-stabilization

 
Wikipedia: Self-stabilization

Self-stabilization is a concept of fault-tolerance in distributed computing. Distributed computing systems are challenging to debug and analyze, because their behaviour can change significantly with a small change in the environment. As a result, strong properties (properties that hold under a variety of circumstances) of such systems are especially important to simplify systems analysis and to prove system correctness. Self-stabilization is considered a highly desirable property. A distributed system that is self-stabilizing will end up in a correct state no matter what state it is initialized with. That correct state is reached after a finite number of execution steps.

At first glance, the guarantee of self stabilization may seem less promising than that of the more traditional fault-tolerance of algorithms, that guarantee that the system always remains in a correct state, under certain kinds of state transitions. However, that traditional fault tolerance cannot always be achieved, for example, it cannot be achieved when the system starts in an incorrect state. Traditional fault tolerance is also impossible under certain kinds of strong faults, such as outsider external interventions.

The ability to recover without external intervention is very desirable in modern computer and telecommunications networks, since it gives them the ability to cope with faults that were not foreseen in the design of the algorithm. No matter what were the faults, as long as no new faults happen, system can repair errors and return to normal operation on its own. Hence, many years after the seminal paper of Dijkstra, this concept remains important as it presents an important foundation for self-managing computer systems and fault-tolerant systems. As a result, Dijkstra's paper received the 2002 ACM PODC Influential-Paper Award - one of the highest recognitions in the distributed computing community.[1]

Contents

History

E.W. Dijkstra in 1974 presented the concept of self-stabilization, prompting further research in this area.[2] He also presented the first self-stabilizing algorithms that did not rely on strong assumptions on the system. Some previous protocols used in practice did actually stabilize, but only assuming the existence of a clock that was global to the system, and assuming a known upper bound on the duration of each system transition.

Overview

A distributed algorithm is self-stabilizing if, starting from an arbitrary state, it is guaranteed to converge to a legitimate state and remain in a legitimate set of states thereafter. A state is legitimate if starting from this state the algorithm satisfies its specification. The property of self-stabilization enables a distributed algorithm to recover from a transient fault regardless of its nature. Moreover, a self-stabilizing algorithm does not have to be initialized as it eventually starts to behave correctly regardless of its initial state.

Dijkstra's paper, which introduces the concept of self-stabilization, presents an example in the context of a "token ring" — a network of computers ordered in a circle, such that exactly one of them is supposed to "hold a token" at any given time.

  • Not holding a token is a correct state for each computer in this network, since the token can be held by another computer. However, if every computer is in the state of "not holding a token" then the network as a whole is not in a correct state.
  • Similarly, if more than one computer "has a token" then this is not a correct state for the network, although it cannot be observed to be incorrect by viewing any computer individually. Since every computer can "see" only the states of its two neighbors, it is hard for the computers to decide whether the network as a whole is in a correct state.

The first self-stabilizing algorithms did not detect errors explicitly in order to subsequently repair them. Instead, they constantly pushed the system towards a legitimate state, even without explicitly detecting error states. Since traditional methods for detecting an error[3][not in citation given] were often very difficult and time-consuming, such a behavior was considered desirable.

Efficiency improvements

More recently, researchers have presented newer methods for light-weight error detection for self-stabilizing systems using local checking [4][5]. The term local refers to a part of a computer network. When local detection is used, a computer in a network is not required to communicate with the entire network in order to detect an error — the error can be detected by having each computer communicate only with its nearest neighbors. These local detection methods simplified the task of designing self-stabilizing algorithms considerably. This is because the error detection mechanism and the recovery mechanism can be designed separately. Newer algorithms based on these detection methods also turned out to be much more efficient. Moreover, these papers suggested rather efficient general transformers to transform non self stabilizing algorithms to become self stabilizing. The idea is to

  1. Run the non self stabilizing protocol, while
  2. detect faults (during the execution of the given protocol) using the above mentioned detection methods,
  3. then, apply a (self stabilizing) "reset" protocol to return the system to some predetermined initial state, and, finally,
  4. restart the given protocol.

The combination of these 4 parts is self stabilizing. Initial self stabilizing protocols were also presented in the above papers. More efficient reset protocols were presented later, e.g. [6].

Additional efficiency was introduced with the notion of time-adaptive protocols.[7] The idea behind these is that when only a small number of errors occurs, the recovery time can (and should) be made short. Dijkstra's original self-stabilization algorithms do not have this property.

A useful property of self-stabilizing algorithms is that they can be composed of layers if the layers do not exhibit any circular dependencies. The stabilization time of the composition is then bounded by the sum of the individual stabilization times of each layer.

Time complexity

The time complexity of a self-stabilizing algorithm is measured in (asynchronous) rounds or cycles.

  • A round is a shortest execution trace in which each processor executes at least one step.
  • Similarly, a cycle is a shortest execution trace in which each processor executes at least one complete iteration of its repeatedly executed list of commands.

It is also interesting to measure the output stabilization time. For that, a subset of the state variables is defined to be externally visible (the output). Certain states of outputs are defined to be correct (legitimate). The set of the outputs of all the components of the system is said to have stabilized at the time that it starts to be correct, provided it stays correct indefinitely, unless additional faults occur. The output stabilization time is the time (the number of (asynchronous) rounds) until the output stabilized.[4]

Definition

A system is self-stabilizing if and only if:

  1. Starting from any state, it is guaranteed that the system will eventually reach a correct state (convergence).
  2. Given that the system is in a correct state, it is guaranteed to stay in a correct state, provided that no fault happens (closure).

A system is said to be randomized self-stabilizing if and only if it is self-stabilizing and the expected number of rounds needed to reach a correct state is bounded by some constant k.[8]

A self-stabilizing algorithm is silent if and only if it converges to a global state where the values of communication registers used by the algorithm remain fixed.[9]

Related work

An extension of the concept of self-stabilization is that of superstabilization.[10] The intent here is to cope with dynamic distributed systems that undergo topological changes. In classical self-stabilization theory, arbitrary changes are viewed as errors where no guarantees are given until the system has stabilized again. With superstabilizing systems, there is a passage predicate that is always satisfied while the system's topology is reconfigured.

References

  1. ^ "PODC Influential Paper Award: 2002". ACM Symposium on Principles of Distributed Computing. http://www.podc.org/influential/2002.html. Retrieved 2009-09-01. 
  2. ^ Dijkstra, Edsger W. (1974), "Self-stabilizing systems in spite of distributed control", Communications of the ACM 17 (11): 643–644, doi:10.1145/361179.361202, http://www.cs.utexas.edu/~EWD/ewd04xx/EWD426.PDF .
  3. ^ Katz, Shmuel; Perry, Kenneth J. (1993), "Self-stabilizing extensions for meassage-passing systems", Distributed Computing 7 (1): 17–26, doi:10.1007/BF02278852 .
  4. ^ a b Awerbuch, Baruch; Patt-Shamir, Boaz; Varghese, George (1991), "Self-stabilization by local checking and correction", Proc. 32nd Symposium on Foundations of Computer Science (FOCS), pp. 268–277, doi:10.1109/SFCS.1991.185378 .
  5. ^ Yehuda Afek, Shay Kutten, Moti Yung. The Local Detection Paradigm and Its Application to Self-Stabilization. Theor. Comput. Sci. 186(1-2): 199-229 (1997).
  6. ^ [Baruch Awerbuch, Shay Kutten, Yishay Mansour, Boaz Patt-Shamir, George Varghese. Time optimal self-stabilizing synchronization. ACM STOC 1993: 652-661.]
  7. ^ Shay Kutten, Boaz Patt-Shamir: Stabilizing Time-Adaptive Protocols. Theor. Comput. Sci. 220(1): 93-111 (1999).
  8. ^ Dolev, Shlomi (2000), Self-Stabilization, MIT Press, ISBN 0-262-04178-2 .
  9. ^ Shlomi Dolev, Mohamed G. Gouda, and Marco Schneider. Memory requirements for silent stabilization. In PODC '96: Proceedings of the fifteenth annual ACM Symposium on Principles of Distributed Computing, pages 27--34, New York, NY, USA, 1996. ACM Press. Online extended abstract.
  10. ^ Dolev, Shlomi; Herman, Ted (1997), "Superstabilizing protocols for dynamic distributed systems", Chicago Journal of Theoretical Computer Science, http://cjtcs.cs.uchicago.edu/articles/1997/4/contents.html , article 4.

Search unanswered questions...
Enter a question here...
Search: All sources Community Q&A Reference topics
 
 
Learn More
Shlomi Dolev
Superstabilization
Dijkstra Prize

What is a stability room? Read answer...
What is a stabilizer bar? Read answer...
Matt stabile is? Read answer...

Help us answer these
What you stability?
What are stabilizer?
When is stabilizes?

Post a question - any question - to the WikiAnswers community:

 

Copyrights:

Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "Self-stabilization" Read more