Abstract

The checksum technique is a low-cost method for detecting errors in matrix operations performed by processor arrays. Because this method detects faults only at problem termination, it is not an effective fault tolerance technique for large-scale matrix multiplication. This paper presents a new algorithm, the ID algorithm, which minimizes the fault-detection latency. In the ID algorithm, a fault is detected as soon as it occurs instead of at problem termination. For n^2 processors, the fault-latency time of the ID algorithm is 1/n of that of the checksum algorithm, with a run-time penalty of O(n log_2 n) in an n × n matrix operation. This new algorithm has better performance in terms of error coverage and expected run time in large-scale matrix multiplications such as those in signal and image processing, weather prediction, and finite element analysis.
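For readers unfamiliar with the baseline, the following is a minimal sketch of the classical checksum scheme for matrix multiplication as it is commonly described, not the ID algorithm and not code from the paper. The function names and the small integer matrices are illustrative assumptions. The consistency check runs only after the full product has been computed, which is the detection latency the abstract refers to.

```python
def matmul(a, b):
    """Plain product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def column_checksum(a):
    """Append a row holding the sum of each column of a."""
    return [row[:] for row in a] + [[sum(col) for col in zip(*a)]]


def row_checksum(b):
    """Append a column holding the sum of each row of b."""
    return [row[:] + [sum(row)] for row in b]


def inconsistent_cells(c_full):
    """Return (rows, cols) whose checksum entries disagree with the data."""
    n_rows, n_cols = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n_rows)
                if sum(c_full[i][:n_cols]) != c_full[i][n_cols]]
    bad_cols = [j for j in range(n_cols)
                if sum(c_full[i][j] for i in range(n_rows)) != c_full[n_rows][j]]
    return bad_rows, bad_cols


# Hypothetical 2 x 2 inputs, chosen only for illustration.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# The product of a column-checksum matrix and a row-checksum matrix
# carries both row and column checksums of the result.
c_full = matmul(column_checksum(a), row_checksum(b))
clean = inconsistent_cells(c_full)   # checked only after the whole product

c_full[0][1] += 1                    # inject a single faulty result element
faulty = inconsistent_cells(c_full)  # the bad row and column locate the fault
```

With integer data the comparison can be exact; floating-point data would need a tolerance in place of `!=`.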

Department(s)

Computer Science

Keywords and Phrases

Application-oriented fault tolerance; Multicomputers

International Standard Serial Number (ISSN)

0730-3157

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2023 Institute of Electrical and Electronics Engineers, All rights reserved.

Publication Date

01 Jan 1991
