There is no easy answer here. Designing a fault-tolerant system requires experience and a solid understanding of how the system works, the service it is intended to provide, and the probable ways it can fail to provide that service (software crash, hardware failure, network outage, power outage, etc.).
Many buzzwords come out of this careful study and analysis. Here are a few to think about:
-Single Points of Failure
-Redundancy
-Active/Standby
-Active/Active
-Crosschecking
-Fencing off (making sure bad hardware stays down and doesn't come back online to cause another disastrous failure or service outage)
-Hot Swap (as in hot-swappable power supplies and hard drives)
-Disaster Recovery Center
-Continuous Backup
That should be a good start to understanding the subject. It is not for the weak or inexperienced: there is normally a lot of money (or liability) riding on fault-tolerant systems, and a service or system failure can have very serious consequences.
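To make two of the terms above concrete, here is a minimal sketch of an active/standby pair with fencing. The `Node` and `ActiveStandbyPair` classes and the `healthy` flag are hypothetical names invented for illustration; a real system would use actual health checks and a cluster manager rather than an in-memory flag.

```python
class Node:
    """Hypothetical service node; `healthy` stands in for a real health check."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.fenced = False

    def handle(self, request):
        # A fenced node refuses work even if it looks healthy again.
        if self.fenced or not self.healthy:
            raise RuntimeError(f"{self.name} is unavailable")
        return f"{self.name} served {request}"


class ActiveStandbyPair:
    """Sends requests to the active node and fails over to the standby."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def handle(self, request):
        try:
            return self.active.handle(request)
        except RuntimeError:
            # Fence the failed node so it cannot rejoin on its own,
            # then promote the standby and retry once.
            self.active.fenced = True
            self.active, self.standby = self.standby, self.active
            return self.active.handle(request)


pair = ActiveStandbyPair(Node("node-a"), Node("node-b"))
print(pair.handle("request-1"))
pair.active.healthy = False      # simulate a failure of the active node
print(pair.handle("request-2"))  # the standby takes over
```

In this sketch the failed node stays fenced until someone clears the flag by hand, which mirrors the idea that bad hardware should not come back online unattended.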
Fault tolerant commonly refers to systems designed for uptimes of 99.999% ("five nines") or higher, which allows only about five minutes of downtime per year. RAID disks, multi-pathing, and RAIN networking are some of the technologies used in fault-tolerant systems so that they can continue operating during a failure. System backups are NOT part of a fault-tolerant plan per se, but they are required in a fault-tolerant system in case of a complete failure. System backups should always be kept outside of the fault-tolerant environment.
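As a rough illustration of what an uptime percentage means in practice, the sketch below converts an availability figure into allowed downtime per year. The function name and the use of a 365.25-day year are assumptions made for this example.

```python
# Assumption: an average year of 365.25 days.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(availability_percent):
    """Return the yearly downtime budget, in minutes, for a given uptime percentage."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for target in (99.9, 99.99, 99.999):
    print(f"{target}% uptime allows {allowed_downtime_minutes(target):.2f} minutes of downtime per year")
```

At 99.999% the budget works out to a little over five minutes per year, which is why unplanned reboots or manual repairs alone can exhaust it.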
P. N Marinos has written: 'A simulator for reliability predictions of fault-tolerant system architectures' -- subject(s): Redundancy (Engineering), Fault-tolerant computing
COMPUTERS, LRT's
RAID 1 (mirroring) is the most fault tolerant, since every drive in the mirror has to fail before data is lost.
RAID (Redundant Array of Independent Disks) uses two or more drives in combination to create a fault-tolerant system. RAID configurations distribute data across multiple drives to improve performance, redundancy, or a combination of both.
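The difference between mirroring and plain striping can be sketched with simple probabilities. This is a simplified model, not a reliability prediction: it assumes each drive fails independently with the same probability `p` over some period and ignores rebuilds and replacements. The function names are made up for this example.

```python
def mirror_loss_probability(p, n):
    """RAID 1 style mirror: data is lost only if all n drives fail."""
    return p ** n


def stripe_loss_probability(p, n):
    """Striping without redundancy: data is lost if any one of n drives fails."""
    return 1 - (1 - p) ** n


p = 0.05  # assumed per-drive failure probability, chosen only for illustration
print(f"2-drive mirror: {mirror_loss_probability(p, 2):.4f}")
print(f"2-drive stripe: {stripe_loss_probability(p, 2):.4f}")
```

Under these assumptions, adding drives to a mirror lowers the chance of data loss, while adding drives to a stripe without redundancy raises it.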
Raymond S. Lim has written: 'Fault-tolerant computing' -- subject(s): Fault-tolerant computing
Parallel Backbone
Yes True
Brendan Tangney has written: 'Some ideas on support for fault tolerance in COMANDOS, an object oriented distributed system' -- subject(s): Fault-tolerant computing, Artificial intelligence
Fault-tolerant computer