Effective and Real-Time In-App Activity Analysis in Encrypted Internet Traffic Streams
Abstract
Mobile in-App service analysis, which aims to classify mobile internet traffic into different types of service usage, has become a challenging and emerging task for mobile service providers due to the increasing adoption of secure protocols for in-App services. While some efforts have been made to classify mobile internet traffic, existing methods rely on complex feature construction and a large storage cache, which lead to low processing speed and make them impractical for online real-time scenarios. To this end, we develop an iterative analyzer for classifying encrypted mobile traffic in real time. Specifically, we first select an optimal set of the most discriminative features from raw features extracted from traffic packet sequences, using a novel Maximizing Inner activity similarity and Minimizing Different activity similarity (MIMD) measurement. To develop the online analyzer, we represent a traffic flow as a series of time windows, which are described by the optimal feature vector and are updated iteratively at the packet level. Instead of extracting feature elements from a series of raw traffic packets, our feature elements are updated whenever a new traffic packet is observed, so the storage of raw traffic packets is not required. The time windows generated from the same service usage activity are grouped by our proposed method, namely, recursive time continuity constrained KMeans clustering (rCKC). The feature vectors of the cluster centers are then fed into a random forest classifier to identify the corresponding service usages. Finally, we provide extensive experiments on real-world traffic data from WeChat, WhatsApp, and Facebook to demonstrate the effectiveness and efficiency of our approach. The results show that the proposed analyzer provides high accuracy in real-world scenarios, and has a low storage cache requirement as well as fast processing speed.
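The packet-level update described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `WindowStats`, the function `windows_from_stream`, the one-second window length, and the five features (packet count, total bytes, mean, standard deviation, and maximum packet size) are all assumptions chosen for illustration, not the MIMD-selected feature set. The sketch only shows how a window's feature vector can be maintained from running statistics, so that raw packets never need to be stored; it does not cover rCKC clustering or the random forest step.

```python
from dataclasses import dataclass
import math


@dataclass
class WindowStats:
    """Running statistics for one time window (hypothetical feature set)."""
    count: int = 0
    total_bytes: int = 0
    mean_size: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean
    max_size: int = 0

    def update(self, packet_size: int) -> None:
        """Fold one packet into the statistics without keeping the packet."""
        self.count += 1
        self.total_bytes += packet_size
        # Welford's online update for mean and variance.
        delta = packet_size - self.mean_size
        self.mean_size += delta / self.count
        self.m2 += delta * (packet_size - self.mean_size)
        self.max_size = max(self.max_size, packet_size)

    def feature_vector(self) -> list:
        """Return [count, total bytes, mean, std, max] for this window."""
        variance = self.m2 / self.count if self.count else 0.0
        return [
            float(self.count),
            float(self.total_bytes),
            self.mean_size,
            math.sqrt(variance),
            float(self.max_size),
        ]


def windows_from_stream(packets, window_seconds=1.0):
    """Yield one feature vector per non-empty time window.

    `packets` is an iterable of (timestamp_seconds, size_bytes) pairs,
    assumed to arrive in time order. The window length is an assumption.
    """
    current_index = None
    stats = WindowStats()
    for timestamp, size in packets:
        index = int(timestamp // window_seconds)
        if current_index is not None and index != current_index:
            # The previous window is complete: emit it and start a new one.
            yield stats.feature_vector()
            stats = WindowStats()
        current_index = index
        stats.update(size)
    if stats.count:
        yield stats.feature_vector()


if __name__ == "__main__":
    demo = [(0.1, 100), (0.5, 300), (1.2, 50)]
    for vector in windows_from_stream(demo):
        print(vector)
```

Memory use in this sketch is constant per window, regardless of how many packets the window contains, which is the property the abstract attributes to updating features per packet.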
Recommended Citation
J. Liu et al., "Effective and Real-Time In-App Activity Analysis in Encrypted Internet Traffic Streams," Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2017, Halifax, Canada), pp. 335-344, Association for Computing Machinery (ACM), Aug 2017.
The definitive version is available at https://doi.org/10.1145/3097983.3098049
Meeting Name
23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017 (2017: Aug. 13-17, Halifax, Canada)
Department(s)
Computer Science
Keywords and Phrases
Cache memory; Cryptography; Data mining; Decision trees; Digital storage; Internet; Iterative methods; Mobile devices; Discriminative features; Extracting features; Feature construction; Mobile service providers; Random forest classifier; Real-world scenario; Service usage; Time-series segmentation; Time series analysis; In-app analytics; Internet traffic analysis; Service usage classification; Time series segmentation
International Standard Book Number (ISBN)
978-1-4503-4887-4
Document Type
Article - Conference proceedings
Document Version
Citation
File Type
text
Language(s)
English
Rights
© 2017 Association for Computing Machinery (ACM), All rights reserved.
Publication Date
01 Aug 2017
Comments
This research was supported in part by the Natural Science Foundation of China (71329201) and Futurewei Technologies, Inc.