Reliable Distributed Sorting Through the Application-oriented Fault Tolerance Paradigm

Bruce M. McMillin, Missouri University of Science and Technology
L. M. Ni

This document has been relocated to http://scholarsmine.mst.edu/comsci_facwork/273

There were 7 downloads as of 28 Jun 2016.

Abstract

The design and implementation of a reliable version of the distributed bitonic sorting algorithm using the application-oriented fault tolerance paradigm on a commercial multicomputer is described. Sorting assertions in general are discussed and the bitonic sort algorithm is introduced. Faulty behavior is discussed and a fault-tolerant parallel bitonic sort developed using this paradigm is presented. The error coverage and the response of the fault-tolerant algorithm to faulty behavior are presented. Both asymptotic complexity and the results of run-time experimental measurements on an Ncube multicomputer are given. The authors demonstrate that the application-oriented fault tolerance paradigm is applicable to problems of a noniterative nature