Fault-tolerant Parallel Matrix Multiplication With One Iteration Fault Detection Latency

Chul Eui Hong
Bruce M. McMillin, Missouri University of Science and Technology

Abstract

The checksum technique is a low-cost method for detecting errors in matrix operations performed by processor arrays. Because its fault detection is done only at problem termination, it is not an effective fault-tolerance technique for large-scale matrix multiplication. This paper presents a new algorithm, the ID algorithm, which minimizes the fault-detection latency. In the ID algorithm, a fault is detected as soon as it occurs instead of at problem termination. For n^2 processors, the fault-latency time of the ID algorithm is 1/n of that of the checksum algorithm, with a run-time penalty of O(n log_2 n) in an n × n matrix operation. The new algorithm has better performance in terms of error coverage and expected run time in large-scale matrix multiplications such as those found in signal and image processing, weather prediction, and finite element analysis.
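The checksum idea mentioned in the abstract can be illustrated with a small sequential sketch. This is not the paper's ID algorithm, whose details are not given on this page. It only shows the classic checksum encoding (an extra row of column sums on the left operand, an extra column of row sums on the right operand) and the fact that, when the product is accumulated as a sum of outer products, the checksum property also holds for every partial sum, so it could in principle be checked after each iteration rather than only at the end. All function names and the `fault_at` parameter are hypothetical, and integer entries are assumed so that comparisons are exact.

```python
def with_column_checksum(a):
    """Append a row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Append to every row of b the sum of that row."""
    return [row + [sum(row)] for row in b]


def is_full_checksum(c):
    """True if the last column holds the row sums and the last row the column sums."""
    rows_ok = all(sum(row[:-1]) == row[-1] for row in c)
    cols_ok = all(sum(col[:-1]) == col[-1] for col in zip(*c))
    return rows_ok and cols_ok


def checked_multiply(a, b, fault_at=None):
    """Multiply a by b as a sum of outer products, checking after each step.

    fault_at = (k, i, j) is a hypothetical fault injector: it corrupts
    entry (i, j) of the running result during iteration k.
    Returns (full-checksum product, first iteration whose check failed or None).
    """
    ac, br = with_column_checksum(a), with_row_checksum(b)
    c = [[0] * len(br[0]) for _ in ac]
    detected_at = None
    for k in range(len(b)):
        # Accumulate the k-th outer product: column k of ac times row k of br.
        for i, row in enumerate(ac):
            for j in range(len(br[0])):
                c[i][j] += row[k] * br[k][j]
        if fault_at is not None and fault_at[0] == k:
            c[fault_at[1]][fault_at[2]] += 1  # simulated error
        # Per-iteration check; a termination-only scheme would test once after the loop.
        if detected_at is None and not is_full_checksum(c):
            detected_at = k
    return c, detected_at


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    product, detected = checked_multiply(a, b, fault_at=(0, 1, 1))
    print("fault first detected at iteration:", detected)
```

In this sketch a fault injected during iteration 0 is flagged in that same iteration, whereas a check performed only at termination would flag it after the last iteration. The cost and latency figures quoted in the abstract refer to the authors' parallel algorithm and cannot be read off this sequential example.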

Recommended Citation

C. E. Hong and B. M. McMillin, "Fault-tolerant Parallel Matrix Multiplication With One Iteration Fault Detection Latency," Proceedings - International Computer Software and Applications Conference, pp. 665 - 672, article no. 170258, Institute of Electrical and Electronics Engineers, Jan 1991.

The definitive version is available at https://doi.org/10.1109/CMPSAC.1991.170258

Department(s)

Computer Science

Keywords and Phrases

Application-oriented fault tolerance; Multicomputers

International Standard Serial Number (ISSN)

0730-3157

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Rights

Publication Date

01 Jan 1991

Computer Science Faculty Research & Creative Works
