Abstract
Error-detecting algorithms can determine when, at run time, a program deviates from its expected behavior due to a hardware, software or communication error. In a fixed interconnect multiprocessor system, the error detecting ability heavily depends on the number of faults, which is bounded, and their spatial distribution. Otherwise multiple fault occurrences can mask each other. This paper provides a general method for computing the overall system failure bound, the maximal fault index, from the system topology and local communication patterns. The result of the computation is used to design a mapping of processes to processor groups such that multiple processor failures preserve the error-detecting ability of the algorithm. We show the problem of finding the maximal fault index to be NP-Hard and show, for certain regular topologies, that the problem yields polynomially computable embeddings. Finally, we give an example of mapping an error detecting matrix relaxation algorithm derived from program verification using Changeling.
Recommended Citation
Schollmeyer, Martina and McMillin, Bruce M., "A General Method for Maximizing the Error-Detecting Ability of Distributd Algorithms" (1993). Computer Science Technical Reports. 38.
https://scholarsmine.mst.edu/comsci_techreports/38
Department(s)
Computer Science
Report Number
CSC-93-16
Document Type
Technical Report
Document Version
Final Version
File Type
text
Language(s)
English
Rights
© 1993 University of Missouri--Rolla, All rights reserved.
Publication Date
23 Jun 1993
Comments
This work was supported in part by the National Science Foundation under Grant Numbers MSS-9216479 and CDA-9222827, and, in part, from the Air Force Office of Scientific Research under contract numbers F49620-92-J-0546 and F49620-93-I-0409.