Abstract

We consider the problem of identifying the source of failure in a network after receiving alarms or having observed symptoms. To locate the root cause accurately and timely in a large communication system is challenging because a single fault can often result in a large number of alarms, and multiple faults can occur concurrently. In this paper, we present a new fault localization method using a machine-learning approach. We propose to use logistic regression to study the correlation among network events based on end-to-end measurements. Then based on the regression model, we develop fault hypothesis that best explains the observed symptoms. Unlike previous work, the machine-learning algorithm requires neither the knowledge of dependencies among network events, nor the probabilities of faults, nor the conditional probabilities of fault propagation as input. The 'low requirement' feature makes it suitable for large complex networks where accurate dependencies and prior probabilities are difficult to obtain. We then evaluate the performance of the learning algorithm with respect to the accuracy of fault hypothesis and the concentration property. Experimental results and theoretical analysis both show satisfactory performance.

Department(s)

Computer Science

Keywords and Phrases

Complex networks; computer network reliability; fault diagnosis; fault location; logistic regression; machine learning

International Standard Serial Number (ISSN)

2327-4662

Document Type

Article - Journal

Document Version

Citation

File Type

text

Language(s)

English

Rights

© 2024 Institute of Electrical and Electronics Engineers, All rights reserved.

Publication Date

01 Oct 2016

Share

 
COinS