Computer Science Faculty Research & Creative Works

Evolving Decision Trees for the Categorization of Software

Jasenko Hosic
Daniel R. Tauritz, Missouri University of Science and Technology
Samuel A. Mulder

Abstract

Current manual techniques of static reverse engineering are inefficient at providing semantic program understanding. We have developed an automated method to categorize applications in order to quickly determine pertinent characteristics. Prior work in this area has had some success, but a major strength of our approach is that it produces heuristics that can be reused for quick analysis of new data. Our method relies on a genetic programming algorithm to evolve decision trees which can be used to categorize software. The terminals, or leaf nodes, within the trees each contain values based on selected features from one of several attributes: system calls, byte n-grams, opcode n-grams, cyclomatic complexity, and bonding. The evolved decision trees are reusable and achieve average accuracies above 95% when categorizing programs based on compiler origin and versions. Developing new decision trees simply requires more labeled datasets and potentially different feature selection algorithms for other attributes, depending on the data being classified.
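To make the approach concrete, the following is a minimal sketch of evolving decision trees with genetic programming. It is not the authors' implementation. The abstract does not give the tree representation, variation operators, or parameters, so everything here is an assumption made for illustration: the tuple-based tree encoding, the function names (`random_tree`, `classify`, `evolve`, and so on), truncation selection, the population size, and the toy two-feature dataset, whose numbers merely stand in for extracted attributes such as opcode n-gram frequencies. The sketch also uses a conventional layout, with feature tests at internal nodes and class labels at leaves, which may differ from the terminal design described above.

```python
import random

# Hypothetical encoding: a tree is ("leaf", label) or
# ("split", feature_index, threshold, left_subtree, right_subtree).

def random_tree(rng, n_features, labels, depth):
    """Grow a random tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return ("leaf", rng.choice(labels))
    return ("split", rng.randrange(n_features), rng.random(),
            random_tree(rng, n_features, labels, depth - 1),
            random_tree(rng, n_features, labels, depth - 1))

def classify(tree, x):
    """Walk from the root to a leaf and return that leaf's label."""
    while tree[0] == "split":
        _, feat, thr, left, right = tree
        tree = left if x[feat] <= thr else right
    return tree[1]

def accuracy(tree, samples):
    """Fitness: fraction of labeled samples classified correctly."""
    return sum(classify(tree, x) == y for x, y in samples) / len(samples)

def random_subtree(rng, tree):
    """Pick a subtree by descending randomly from the root."""
    while tree[0] == "split" and rng.random() < 0.5:
        tree = rng.choice(tree[3:])
    return tree

def replace_random(rng, tree, new):
    """Return a copy of tree with one randomly chosen subtree replaced."""
    if tree[0] == "leaf" or rng.random() < 0.3:
        return new
    _, feat, thr, left, right = tree
    if rng.random() < 0.5:
        return ("split", feat, thr, replace_random(rng, left, new), right)
    return ("split", feat, thr, left, replace_random(rng, right, new))

def evolve(samples, n_features, labels, pop_size=40, generations=30, seed=0):
    """Assumed GP loop: truncation selection, subtree crossover, mutation."""
    rng = random.Random(seed)
    pop = [random_tree(rng, n_features, labels, 3) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: accuracy(t, samples), reverse=True)
        survivors = pop[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            # Crossover: graft a subtree of one parent into the other.
            donor = random_subtree(rng, rng.choice(survivors))
            child = replace_random(rng, rng.choice(survivors), donor)
            if rng.random() < 0.2:
                # Mutation: replace a subtree with a fresh random one.
                fresh = random_tree(rng, n_features, labels, 2)
                child = replace_random(rng, child, fresh)
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda t: accuracy(t, samples))

def make_toy_data(n=200, seed=1):
    """Toy stand-in for labeled programs: class depends on feature 0 only."""
    rng = random.Random(seed)
    rows = [[rng.random(), rng.random()] for _ in range(n)]
    return [(x, "A" if x[0] > 0.5 else "B") for x in rows]

data = make_toy_data()
best = evolve(data, n_features=2, labels=["A", "B"])
print("training accuracy of best tree:", accuracy(best, data))
```

Once evolved, a tree like `best` can be applied to unseen feature vectors with `classify` alone, without rerunning the search, which is the sense in which such trees are reusable heuristics. A realistic evaluation would score trees on held-out data rather than the training set used here.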

Recommended Citation

J. Hosic et al., "Evolving Decision Trees for the Categorization of Software," Proceedings of the IEEE 38th Annual International Computers, Software and Applications Conference Workshops, COMPSACW 2014, p. 337, Institute of Electrical and Electronics Engineers (IEEE), Jan 2014.

The definitive version is available at https://doi.org/10.1109/COMPSACW.2014.59

Meeting Name

38th Annual IEEE Computer Software and Applications Conference Workshops, COMPSACW 2014 (2014: Jul. 27-29, Vasteras, Sweden)

Department(s)

Computer Science

Research Center/Lab(s)

Center for High Performance Computing Research

Sponsor(s)

Missouri University of Science and Technology. Natural Computation Laboratory

Keywords and Phrases

Genetic Programming; Program Understanding

International Standard Book Number (ISBN)

978-1479935789

Document Type

Article - Conference proceedings

Document Version

Citation

File Type

text

Language(s)

English

Rights

Publication Date

01 Jan 2014

