Masters Theses
Keywords and Phrases
Decision Trees; Evolutionary Computation; Feature Selection; Genetic Programming; Software Classification
Abstract
"Current manual techniques of static reverse engineering are inefficient at providing semantic program understanding. An automated method to categorize applications was developed in order to quickly determine pertinent characteristics. Prior work in this area has had some success, but a major strength of the approach detailed in this thesis is that it produces heuristics that can be reused for quick analysis of new data. The method relies on a genetic programming algorithm to evolve decision trees which can be used to categorize software. The terminals, or leaf nodes, within the trees each contain values based on selected features from one of several attributes: system calls, byte N-grams, opcode N-grams, registers, opcode collocation, cyclomatic complexity, and bonding. The evolved decision trees are reusable and achieve average accuracies above 90% when categorizing programs based on compiler origin, authorship, and versions. Developing new decision trees simply requires more labeled datasets and potentially different feature selection algorithms for other attributes, depending on the data being classified. The genetic programming algorithm used to evolve the decision trees was compared against C4.5, a classic decision tree technique. In all experiments, the genetic programming approach outperformed C4.5.
This thesis is an extension and expansion of the work published in the Computer Forensics in Software Engineering workshop at COMPSAC 2014 - the Annual 38th IEEE International Conference on Computer Software and Applications. This thesis is also being prepared as a journal article to be submitted for publication."--Abstract, page iii.
Advisor(s)
Tauritz, Daniel R.
Committee Member(s)
Mulder, Samuel A.
Chellappan, Sriram
Department(s)
Computer Science
Degree Name
M.S. in Computer Science
Sponsor(s)
Sandia Laboratories
United States. National Nuclear Security Administration
Publisher
Missouri University of Science and Technology
Publication Date
Summer 2014
Pagination
ix, 56 pages
Note about bibliography
Includes bibliographical references (pages 52-55).
Rights
© 2014 Jasenko Hosic, All rights reserved.
Document Type
Thesis - Open Access
File Type
text
Language
English
Subject Headings
Decision trees
Computational intelligence -- Simulation methods
Genetic programming (Computer science)
Thesis Number
T 10512
Electronic OCLC #
894579837
Recommended Citation
Hosic, Jasenko, "Evolving decision trees for the categorization of software" (2014). Masters Theses. 7308.
https://scholarsmine.mst.edu/masters_theses/7308