Anomalies are “items, events or observations which do not conform to an expected pattern or other items in a dataset”. In a data-driven business, finding anomalies in data is a critical process; if data can be thought of as the lifeblood of a data-driven enterprise, anomaly detection is its heart.
Automated, machine-learning-based anomaly detection is a key requirement for any business designed to scale, because data grows faster than the human ability to read it. The average enterprise holds roughly 200 terabytes of data, and that figure is growing at a geometric rate, tripling every five years by one estimate.
Until now, the race of data intelligence against data growth has centered on applying ever more human workers to ‘tag’ and tokenize data, making it manipulable and meaningful to data analytics. Modern exemplars of this human-centered tagging approach include the personalization work of vendors like A Touch of Modern and Gilt Group; in advertising technology, data intelligence vendors like Adchemy, recently acquired by Walmart Labs; and the crowdsourced data curation services Bing uses in conjunction with Lionbridge, a global outsourcing company.
These companies represent an empirically tested but fundamentally non-scalable means of increasing data digestion, because human tagging is a losing race against big data growth. Given today’s data volumes, only a small fraction of the data contained in an enterprise is tagged, and even less is analyzed. One IDC estimate suggests that 23% of the world’s data would have value if tagged and analyzed, yet only around 3% of that useful data is tagged, and still less is analyzed. Data grows at exponentially increasing rates, while human tagging at its very best achieves a much slower rate of increase in data digestion, and only across small, limited sub-sections of an enterprise’s overall universe of data.
Fortunately for big data businesses and data workers in general, fast RAM, SSDs, and cloud storage solutions are available at low cost today that allow anomaly detection and machine learning to run faster than ever before: fast enough, in fact, to produce relevant data intelligence without the requirement for a laboriously tended “private garden” of manually tagged data.
Anomaly detection becomes more complex as operations scale in size, involving multivariate regressions and correlations across many simultaneous measurements. The threshold-based alarms that predominate in the business intelligence market are therefore becoming irrelevant, and people are looking to machine intelligence to give them the right answers.
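The contrast can be sketched in a few lines. A threshold alarm judges each metric against a fixed limit in isolation, while a multivariate approach scores an observation against the joint distribution of past data, catching combinations of values that are individually unremarkable. This is a minimal illustration with made-up metrics (requests per minute and error rate), not LogBase’s actual method:

```python
# Threshold alarm vs. a simple multivariate anomaly score.
# Metrics and numbers below are hypothetical, for illustration only.
from statistics import mean

def threshold_alarm(value, limit):
    """Classic threshold alarm: fires whenever one metric crosses a fixed limit."""
    return value > limit

def mahalanobis_2d(history, point):
    """Squared Mahalanobis distance of a 2-D observation from past data.

    Large values mean the *combination* of metrics is unusual,
    even when each metric is individually within normal range.
    """
    xs = [p[0] for p in history]
    ys = [p[1] for p in history]
    mx, my = mean(xs), mean(ys)
    n = len(history)
    # Sample covariance matrix entries
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in history) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # (dx, dy) * inverse-covariance * (dx, dy)^T, expanded for the 2x2 case
    return (syy * dx ** 2 - 2 * sxy * dx * dy + sxx * dy ** 2) / det

# Past observations: (requests per minute, error rate %), strongly correlated.
history = [(100, 1.0), (110, 1.2), (95, 0.9), (105, 1.1), (98, 1.0)]

print(threshold_alarm(104, limit=120))       # False: each metric alone looks fine
print(mahalanobis_2d(history, (104, 1.05)))  # small score: a typical combination
print(mahalanobis_2d(history, (96, 1.4)))    # large score: low traffic with a
                                             # high error rate breaks the pattern
```

The point of the sketch is the last two lines: both observations pass the naive threshold, but only the joint model flags the second one.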
The benefits of a truly scalable and fast machine-intelligence-driven anomaly detection system are profound, and they cash out into very real competitive advantages for enterprises. For instance, a rise in shopping cart views and cancellations at a small retailer can be simply and temporally contextualized if it occurs during the holiday season. The same phenomenon at a major online retailer, coinciding with anomalous activity in security systems and vendor accounts, might instead signal the data exfiltration stage of a hack; in the context of a higher-magnitude increase in total page views, it may indicate something different again. Smart, context-aware, machine-learning anomaly detection gives you this information and more.
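The retailer example can be made concrete: a context-aware detector scores a spike against the baseline for its own context (say, holiday weeks versus ordinary weeks) rather than against a single global norm. The sketch below is an illustrative simplification with invented numbers, not LogBase’s implementation:

```python
# Context-aware scoring: the same raw value is judged against the
# baseline for its context, not a single global threshold.
# All context names and numbers are hypothetical.
from statistics import mean, stdev

def context_zscore(value, history_by_context, context):
    """Z-score of `value` relative to past observations from the same context."""
    past = history_by_context[context]
    return (value - mean(past)) / stdev(past)

# Daily shopping-cart cancellations, grouped by context.
cart_cancellations = {
    "ordinary_week": [120, 130, 125, 118, 127],
    "holiday_week":  [320, 340, 310, 335, 330],
}

spike = 330
# Against the ordinary-week baseline this value is many standard
# deviations out; against the holiday-week baseline it is routine.
print(context_zscore(spike, cart_cancellations, "ordinary_week"))
print(context_zscore(spike, cart_cancellations, "holiday_week"))
```

The same number is an alarm in one context and business as usual in another, which is exactly why fixed thresholds mislead.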
As we continue iterating on our pre-release version, maximizing the informational benefits of anomaly detection and making the process context-aware are our key development concerns. At LogBase, we’re not using the history of tagging to improve tagging; we’re using the history of the data itself to revolutionize the entire data intelligence process.
We’re always interested in learning how our vision of anomaly detection can serve in new and varied contexts. To learn more about how LogBase can optimize data-driven businesses, contact us.