Node Similarity For Anomaly Detection in Attributed Graphs
Abstract
Most graph-based anomaly detection work uses either the structural connectivity of a graph or the information stored in its nodes to discover anomalies. Approaches relying solely on node information do not exploit the structural information, and approaches relying only on structural connectivity do not exploit node label values or attribute information. Little research has used both structural connectivity and node attributes to find anomalies in data represented as a graph. In this work, we attempt to use the node attribute information together with the structural connectivity to discover anomalies in a graph. Existing approaches treat all attribute values as discrete, so numeric attributes lose their measure of similarity, that is, how close two values are to each other. To preserve this closeness information, we consider similarities between node values over not only single attributes but also multiple attributes. To discover the similarity between attribute values, we use discretization, distance-based similarity measures, and a k-means clustering approach. After identifying nodes with similar label values, we use the revised labels together with structural properties to discover anomalies in a graph. Our hypothesis is that using node label similarity information together with the structural properties of the graph allows us to detect anomalies that would be missed by approaches relying only on structural connectivity or only on node attribute information.
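To make the relabeling step concrete, the following is a minimal sketch of how numeric node attributes might be turned into revised discrete labels, either by equal-width discretization of a single attribute or by k-means clustering over one or more attributes. The abstract does not specify bin counts, the value of k, the distance measure, or the initialization, so the choices here are assumptions: Euclidean distance, equal-width bins, and a deterministic farthest-first initialization. The function names `equal_width_labels` and `kmeans_labels` are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed distance-based similarity: plain Euclidean distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def equal_width_labels(values: Dict[str, float], bins: int) -> Dict[str, str]:
    """Hypothetical helper: discretize one numeric attribute into equal-width bins."""
    lo, hi = min(values.values()), max(values.values())
    width = (hi - lo) / bins or 1.0  # avoid division by zero when all values are equal
    # The maximum value would fall into bin index `bins`, so clamp it to the last bin.
    return {n: f"B{min(int((v - lo) / width), bins - 1)}" for n, v in values.items()}


def kmeans_labels(points: Dict[str, Vector], k: int, iters: int = 100) -> Dict[str, str]:
    """Hypothetical helper: relabel nodes by k-means cluster over numeric attributes."""
    names = sorted(points)
    # Assumed deterministic farthest-first initialization (not stated in the source).
    centroids: List[Vector] = [points[names[0]]]
    while len(centroids) < k:
        far = max(names, key=lambda n: min(euclidean(points[n], c) for c in centroids))
        if min(euclidean(points[far], c) for c in centroids) == 0.0:
            break  # fewer than k distinct attribute vectors
        centroids.append(points[far])
    assign: Dict[str, int] = {}
    for _ in range(iters):
        # Assignment step: each node joins its nearest centroid.
        new_assign = {
            n: min(range(len(centroids)), key=lambda c: euclidean(points[n], centroids[c]))
            for n in names
        }
        if new_assign == assign:
            break  # converged
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [points[n] for n in names if assign[n] == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return {n: f"C{assign[n]}" for n in names}


if __name__ == "__main__":
    # Two-attribute example: nodes with close attribute values share a revised label.
    attrs = {"x": (0.0, 0.0), "y": (0.1, 0.2), "z": (5.0, 5.0), "w": (5.1, 4.9)}
    print(kmeans_labels(attrs, k=2))
```

Under these assumptions, the revised labels (`B*` or `C*`) would replace the raw numeric values as node labels before a structure-aware anomaly detector is run, so that nodes with close but unequal values are no longer treated as entirely different. The farthest-first initialization was chosen only to keep the sketch deterministic; random or k-means++ seeding are common alternatives.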