Computer Science Faculty Research & Creative Works

Identifying Character Non-Independence in Phylogenetic Data using Data Mining Techniques

Jennifer Leopold, Missouri University of Science and Technology
Anne M. Maglia, Missouri University of Science and Technology
M. Thakur
B. Patel
Fikret Erçal, Missouri University of Science and Technology

Abstract

Undiscovered relationships in a data set may confound analyses, particularly those that assume data independence. Such problems occur when characters used for phylogenetic analyses are not independent of one another. A main assumption of phylogenetic inference methods such as maximum likelihood and parsimony is that each character serves as an independent hypothesis of evolution. When this assumption is violated, the resulting phylogeny may not reflect true evolutionary history. Therefore, it is imperative that character non-independence be identified prior to phylogenetic analyses. To identify dependencies between phylogenetic characters, we applied three data mining techniques: 1) Bayesian networks, 2) decision tree induction, and 3) rule induction from coverings. We briefly discuss the main ideas behind each strategy, show how each technique performs on a small sample data set, and apply each method to an existing phylogenetic data set. We discuss the interestingness of the results of each method, and show that, although each method has its own strengths and weaknesses, rule induction from coverings presents the most useful solution for determining dependencies among phylogenetic data at this time.

Recommended Citation

J. Leopold et al., "Identifying Character Non-Independence in Phylogenetic Data using Data Mining Techniques," Proceedings of the 2nd Asia-Pacific Bioinformatics Conference (2004: Jan. 18-22, Dunedin, New Zealand), Australian Computer Society, Inc., Jan 2004.

The definitive version is available at https://doi.org/10.2495/DATA070051

Meeting Name

2nd Asia-Pacific Bioinformatics Conference, APBC2004 (2004: Jan. 18-22, Dunedin, New Zealand)

Department(s)

Computer Science

Keywords and Phrases

Character Independence; Data Mining; Machine Learning; Phylogenetic Data

Document Type

Article - Conference proceedings

Document Version

Final Version

File Type

text

Language(s)

English

Rights

Publication Date

22 Jan 2004

Included in

Biology Commons
