Unsupervised anomaly detection with non-numerical sequence data by average index difference, with application to masquerade detection


Stefan Jan Skudlarek, Hirosuke Yamamoto

Applied Stochastic Models in Business and Industry, vol.30, no.5, pp.632-656, September/October 2014


Anomaly detection within non-numerical sequence data has developed into an important topic of data mining, but comparatively little research has been done regarding anomaly detection without training data (unsupervised anomaly detection). One application found in computer security is the detection of a so-called masquerade attack, which consists of an attacker abusing a regular account. This leaves only the session input, which is basically a string of non-numerical commands, for analysis. Our previous approach to this problem introduced the use of the so-called average index difference function for mapping the non-numerical symbol data to a numerical space. In the present paper, we examine the theoretical properties of the average index difference function, present an enhanced unsupervised anomaly detection algorithm based on the average index difference function, show the parameters to be theoretically inferable, and evaluate the performance using real-world data.


Index Terms: unsupervised; anomaly detection; sequence data; non-numerical; masquerade; average index difference

DOI: 10.1002/asmb.2057