Performance Analysis of Fault Tolerant Multistage Interconnection Networked Parallel Instrumentation with Concurrent Testing and Diagnosis
Performance and reliability are two of the most crucial issues in today's high-performance instrumentation and measurement systems. Such systems have improved their performance through parallel and distributed processing. High-speed, high-density Multistage Interconnection Networks (MINs) are a widely used subsystem of parallel processing and communication systems. This paper proposes new performance models for evaluating fault tolerant MINs, thereby establishing a sound foundation for assuring, with a high confidence level, the performance and reliability of fault tolerant MINs during parallel instrumentation. A concurrent fault detection and recovery scheme for MINs is introduced to enable a generic approach to fault tolerance by rerouting over the redundant interconnection links. A switch architecture that realizes the concurrent testing and diagnosis is shown. The proposed performance models are used to evaluate the compound effect of fault tolerant operations such as testing, diagnosis, and recovery on throughput and delay. Results are shown for single transient and permanent stuck-at faults on links and on storage units in switching elements. The performance degradation caused by the fault tolerance overhead is shown to be quite graceful, whereas the performance degradation without fault recovery is unacceptable.
M. Choi et al., "Performance Analysis of Fault Tolerant Multistage Interconnection Networked Parallel Instrumentation with Concurrent Testing and Diagnosis," Proceedings of the 19th IEEE Instrumentation and Measurement Technology Conference (2002, Anchorage, AK), vol. 2, pp. 1481-1486, Institute of Electrical and Electronics Engineers (IEEE), May 2002.
The definitive version is available at http://dx.doi.org/10.1109/IMTC.2002.1007177
19th IEEE Instrumentation and Measurement Technology Conference: IMTC (2002: May 21-23, Anchorage, AK)
Electrical and Computer Engineering
Keywords and Phrases
Computer System Recovery; Data Communication Systems; Distributed Computer Systems; Failure Analysis; Fault Tolerant Computer Systems; Instrument Testing; Mathematical Models; Parallel Processing Systems; Performance; Telecommunication Links; Concurrent Fault Detection; Fault Detection; Multistage Interconnection Network; Parallel Instrumentation; Interconnection Networks; Diagnosis; Distributed Systems; Instrumentation; Parallel Processing; Performance Analysis
Article - Conference proceedings
© 2002 Institute of Electrical and Electronics Engineers (IEEE), All rights reserved.