Active Learning for Anomaly Detection in Large-Scale Systems
Abstract: As networks grow more complex, effective algorithms for information gathering become increasingly critical for learning tasks. In active learning (first considered by Chernoff in 1959 under the name sequential design of experiments), a decision maker actively chooses which processes to observe, shaping the quality of the observations so as to optimize a given objective in the system. In this talk, I will present several results from my research on active learning for anomaly detection in cyber-systems. We consider a system with a large number of processes (each associated with a router, path, etc.), a few of which are anomalous (e.g., infected). At each time, a subset of the processes can be observed, and the observations from each chosen process follow one of two distributions, depending on whether the process is normal or anomalous. The problem is to find a sequential search strategy that optimizes a certain objective in the network (e.g., detection delay or cost) subject to reliability constraints. I will present (asymptotically) optimal algorithms that solve the anomaly detection problem under different objectives. The problems considered in this work also find applications in target search, spectrum scanning in cognitive-radio networks, and event detection in sensor networks.
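
To make the problem setup concrete, here is a minimal sketch of a sequential search. It is an illustration only, not one of the algorithms from the talk, and it rests on assumptions the abstract does not state: exactly one anomalous process, one process probed per time step, unit-variance Gaussian observations with mean 0 (normal) or `mu` (anomalous), and a simple "probe the currently most suspicious process" rule with a fixed stopping threshold. The function name `sequential_search` and all of its parameters are hypothetical.

```python
import math
import random


def sequential_search(num_processes, anomaly_index, mu=2.0, alpha=1e-3, seed=0):
    """Locate a single anomalous process by sequential probing.

    Assumed model (not from the abstract): a normal process emits N(0, 1)
    samples and the anomalous one emits N(mu, 1) samples.
    Returns (declared_index, number_of_samples_used).
    """
    if num_processes < 2:
        raise ValueError("need at least two processes")
    rng = random.Random(seed)
    # Stop once one process's evidence exceeds this level; by a standard
    # union-bound argument the false-declaration probability is at most alpha.
    threshold = math.log((num_processes - 1) / alpha)
    # Accumulated log-likelihood ratio (anomalous vs. normal) per process.
    llr = [0.0] * num_processes
    samples = 0
    while True:
        # Probe the process that currently looks most anomalous.
        k = max(range(num_processes), key=lambda i: llr[i])
        mean = mu if k == anomaly_index else 0.0
        x = rng.gauss(mean, 1.0)
        # Log-likelihood ratio of N(mu, 1) against N(0, 1) for sample x.
        llr[k] += mu * x - mu * mu / 2.0
        samples += 1
        if llr[k] >= threshold:
            return k, samples


if __name__ == "__main__":
    declared, used = sequential_search(num_processes=10, anomaly_index=7)
    print(f"declared process {declared} after {used} samples")
```

Lowering `alpha` tightens the reliability constraint and raises the stopping threshold, so the search takes more samples on average. That is the delay-versus-reliability trade-off the abstract refers to.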