Abstract

Error-detecting algorithms can determine when, at run time, a program deviates from its expected behavior due to a hardware, software or communication error. In a fixed interconnect multiprocessor system, the error detecting ability heavily depends on the number of faults, which is bounded, and their spatial distribution. Otherwise multiple fault occurrences can mask each other. This paper provides a general method for computing the overall system failure bound, the maximal fault index, from the system topology and local communication patterns. The result of the computation is used to design a mapping of processes to processor groups such that multiple processor failures preserve the error-detecting ability of the algorithm. We show the problem of finding the maximal fault index to be NP-Hard and show, for certain regular topologies, that the problem yields polynomially computable embeddings. Finally, we give an example of mapping an error detecting matrix relaxation algorithm derived from program verification using Changeling.

Recommended Citation

Schollmeyer, Martina and McMillin, Bruce M., "A General Method for Maximizing the Error-Detecting Ability of Distributd Algorithms" (1993). Computer Science Technical Reports. 38.
https://scholarsmine.mst.edu/comsci_techreports/38

Department(s)

Computer Science

Comments

This work was supported in part by the National Science Foundation under Grant Numbers MSS-9216479 and CDA-9222827, and, in part, from the Air Force Office of Scientific Research under contract numbers F49620-92-J-0546 and F49620-93-I-0409.

Report Number

CSC-93-16

Document Type

Technical Report

Document Version

Final Version

File Type

text

Language(s)

English

Rights

Publication Date

23 Jun 1993

Download

Included in

Computer Sciences Commons

COinS

Computer Science Technical Reports

A General Method for Maximizing the Error-Detecting Ability of Distributd Algorithms

Abstract

Recommended Citation

Department(s)

Comments

Report Number

Document Type

Document Version

File Type

Language(s)

Rights

Publication Date

Included in

Search

Browse

Author Corner

Related Content

Useful Links

Article Locations

Computer Science Technical Reports

A General Method for Maximizing the Error-Detecting Ability of Distributd Algorithms

Author

Abstract

Recommended Citation

Department(s)

Comments

Report Number

Document Type

Document Version

File Type

Language(s)

Rights

Publication Date

Included in

Share

Search

Browse

Author Corner

Related Content

Useful Links

Article Locations