Abstract

Error-detecting algorithms can determine when, at run time, a program deviates from its expected behavior due to a hardware, software or communication error. In a fixed interconnect multiprocessor system, the error detecting ability heavily depends on the number of faults, which is bounded, and their spatial distribution. Otherwise multiple fault occurrences can mask each other. This paper provides a general method for computing the overall system failure bound, the maximal fault index, from the system topology and local communication patterns. The result of the computation is used to design a mapping of processes to processor groups such that multiple processor failures preserve the error-detecting ability of the algorithm. We show the problem of finding the maximal fault index to be NP-Hard and show, for certain regular topologies, that the problem yields polynomially computable embeddings. Finally, we give an example of mapping an error detecting matrix relaxation algorithm derived from program verification using Changeling.

Department(s)

Computer Science

Comments

This work was supported in part by the National Science Foundation under Grant Numbers MSS-9216479 and CDA-9222827, and, in part, from the Air Force Office of Scientific Research under contract numbers F49620-92-J-0546 and F49620-93-I-0409.

Report Number

CSC-93-16

Document Type

Technical Report

Document Version

Final Version

File Type

text

Language(s)

English

Rights

© 1993 University of Missouri--Rolla, All rights reserved.

Publication Date

23 Jun 1993

Share

 
COinS