LogBase Security – How do we keep things secure?

LogBase Corporate Security

Compromising an employee laptop or account is one of the most common ways attackers gain entry to a corporate environment, so we take extra care to protect our corporate IT systems.

  • All employee laptops are encrypted.
  • Email and other accounts are protected using multi-factor authentication.
  • Employees use secure VPN connections to connect to the backend infrastructure.

Anomaly Detection and the Business Value of Machine Learning

Anomalies are “items, events or observations which do not conform to an expected pattern or other items in a dataset”. In a data-driven business, finding anomalies is a critical process: if data is the lifeblood of the enterprise, anomaly detection is its heart.

Automated, machine-learning-based anomaly detection is a key requirement for any business designed to scale, because data grows faster than the human ability to read it. By one estimate, the average enterprise holds 200 terabytes of data, and that volume is growing geometrically, tripling every five years.
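
To make the idea concrete, here is a minimal, illustrative sketch (not LogBase's production approach): a simple Gaussian z-score detector that flags an observation as anomalous when it falls more than a few standard deviations from the historical mean. Real machine-learning detectors are far richer, but the shape of the problem is the same; the class and variable names are ours, chosen for illustration.

    import java.util.List;

    /** Minimal Gaussian (z-score) anomaly detector, an illustrative
     *  stand-in for the richer ML models a production system would use. */
    public class ZScoreDetector {
        private final double mean;
        private final double std;
        private final double threshold;

        public ZScoreDetector(List<Double> history, double threshold) {
            double sum = 0;
            for (double v : history) sum += v;
            this.mean = sum / history.size();
            double sq = 0;
            for (double v : history) sq += (v - mean) * (v - mean);
            this.std = Math.sqrt(sq / history.size());
            this.threshold = threshold;
        }

        /** An observation is anomalous if it lies more than `threshold`
         *  standard deviations from the historical mean. */
        public boolean isAnomaly(double value) {
            return std > 0 && Math.abs(value - mean) / std > threshold;
        }

        public static void main(String[] args) {
            List<Double> requestsPerMinute = List.of(98.0, 103.0, 97.0, 101.0, 99.0, 102.0);
            ZScoreDetector detector = new ZScoreDetector(requestsPerMinute, 3.0);
            System.out.println(detector.isAnomaly(100.0)); // false: fits the pattern
            System.out.println(detector.isAnomaly(450.0)); // true: a traffic spike
        }
    }

A single-metric threshold like this is the simplest possible detector; the point of applying machine learning is to learn such thresholds, seasonality and cross-metric correlations automatically rather than hand-tuning them.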

Machine Data Intelligence: On-Premises or In-Cloud?

Humans are not always perfectly rational decision-makers, and more often than not log file analysis is an investment made only after conditions have forced the decision-maker's hand. Complicating matters, there is a great deal of media attention on the disruptive impact of the cloud computing paradigm and the technologies it enables: commodity computing, software as a service, infrastructure as a service, and big data. In this article we will discuss service delivery modes, specifically the question of on-premises vs. cloud.

on-premises doesn’t automatically mean ‘secure’
On-premises installations are traditionally thought of as more secure and less prone to attack than cloud-based systems, although keeping the hardware in-house does not by itself guarantee security. Even so, security-conscious organizations strongly prefer on-premises solutions to the exclusion of the public cloud. Often, security requirements dictate that this be an “air-gapped” private cloud infrastructure; machine data intelligence within the defense and aerospace spheres is a prime example.

The Uses of Log File Analysis: Log File Analysis in Security

Log file analysis is critical in modern IT operations, development and security. Yet it is difficult and manually intensive. Unlike many other cognitive aspects of working on software, which are addressed by everything from IDEs to coding guides to APIs and libraries, the human searching, reading and inferential thinking that log file analysis requires is difficult to automate. Perhaps most importantly, the information it produces is difficult for non-technical decision-makers to use or even appreciate.

From discussions with system administrators and data scientists in the field, we’ve heard a range of common issues arising in modern businesses facing the difficulty and the necessity of log file analysis. The most common of them crystallize into two issues:

  • Prospect-based risk: Management is reluctant to invest in log file analysis until the prospect of some exigent circumstance forces their hand — usually a site crash, data breach or IT crisis of some kind.
  • Data accessibility for decision-makers: Business intelligence is fundamentally useless in the long run if it’s not made accessible and comprehensible to non-technical decision-makers.

The clearest examples of prospect-based risk assessment and decision-maker data accessibility occur in IT security.

Why LogBase: Understanding the value of log data accessibility

“Those who cannot remember the past are condemned to repeat it.” – George Santayana

why log data?

The founders of LogBase have chosen log data as a core area of innovation because log data encapsulates, directly or inferentially, the most significant aspects of a business’s operation. Is it working? Is it working well? Could it work better? The answers to all of these questions usually lie in log data.
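
As a small, hypothetical illustration of pulling one such answer out of raw logs, the sketch below computes an HTTP 5xx error rate from web access-log lines (“is it working?”). The log format and names are assumptions for the example, not a LogBase interface.

    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /** Illustrative only: answering "is it working?" from access logs
     *  by computing the rate of HTTP 5xx responses. */
    public class ErrorRate {
        // Matches the status-code field of a Common Log Format line.
        private static final Pattern STATUS = Pattern.compile("\" (\\d{3}) ");

        public static void main(String[] args) {
            List<String> lines = List.of(
                "10.0.0.1 - - [01/Jan/2014:12:00:01] \"GET /api/users HTTP/1.1\" 200 512",
                "10.0.0.2 - - [01/Jan/2014:12:00:02] \"GET /api/orders HTTP/1.1\" 500 87",
                "10.0.0.3 - - [01/Jan/2014:12:00:03] \"POST /api/cart HTTP/1.1\" 201 64");

            long errors = lines.stream()
                .map(STATUS::matcher)
                .filter(Matcher::find)
                .filter(m -> m.group(1).startsWith("5"))
                .count();

            System.out.printf("5xx error rate: %.1f%%%n", 100.0 * errors / lines.size());
        }
    }

Multiply this trivial question across latency, throughput, conversion and security events, and the value of making log-derived answers accessible to decision-makers becomes clear.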

SQL over anything with Optiq

Of late there has been a lot of attention on SQL query planning engines, especially with the rise of “SQL on Hadoop”. With the proliferation of storage solutions like Hadoop, HBase and NoSQL databases has also come the problem of accessing data in a uniform way. Each storage solution has come up with its own set of APIs and “SQL-like” query languages. This poses a serious challenge and steepens the learning curve for scientists and researchers accessing data at scale. Many players in the Big Data space have realized this and are converging on the ANSI SQL standard.
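
As a rough sketch of what this looks like from the application side, the snippet below assumes Optiq's JDBC driver and a JSON model file that maps a schema onto a storage adapter; the path, schema and table names here are illustrative. (Optiq was later renamed Apache Calcite, where the URL prefix is jdbc:calcite:.)

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    /** Sketch: querying a non-relational backend through Optiq's JDBC
     *  driver. The JSON model file tells Optiq which adapter exposes
     *  the underlying data as tables. */
    public class OptiqQuery {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:optiq:model=/path/to/model.json"; // illustrative path
            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                     "SELECT level, COUNT(*) AS n FROM LOGS.EVENTS GROUP BY level")) {
                while (rs.next()) {
                    System.out.println(rs.getString("level") + ": " + rs.getLong("n"));
                }
            }
        }
    }

The appeal is that the same ANSI SQL, issued through the standard JDBC interface, works whether the bytes live in HBase, flat files or a conventional RDBMS; only the model and adapter change.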

Interactively analyzing a large JSON file in memory

I have been doing a comparative study of different ways to analyze a large JSON file in memory. Our basic requirement is to do interactive analysis on nested data; for test purposes I am refraining from using a distributed/big data setup. For this use case, what’s interesting is the variance in test results between row-oriented and columnar analysis.
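
To clarify what row vs. columnar means in this context, here is a toy Java sketch (field names and sizes are arbitrary, and it deliberately ignores the JSON parsing step): the same records held as row objects and as a parallel column array.

    import java.util.ArrayList;
    import java.util.List;

    /** Toy illustration of row vs. columnar layout: the same records
     *  held as row objects and as one primitive array per field
     *  (only the latency column is shown). */
    public class RowVsColumn {
        record Event(String user, long latencyMs) {}  // row representation

        public static void main(String[] args) {
            int n = 1_000_000;
            List<Event> rows = new ArrayList<>(n);  // row-oriented store
            long[] latencyColumn = new long[n];     // columnar store

            for (int i = 0; i < n; i++) {
                rows.add(new Event("user" + (i % 100), i % 500));
                latencyColumn[i] = i % 500;
            }

            long rowSum = 0;
            for (Event e : rows) rowSum += e.latencyMs();  // dereferences every object

            long colSum = 0;
            for (long l : latencyColumn) colSum += l;      // tight scan of one long[]

            System.out.println(rowSum == colSum);  // true: same data, different layout
        }
    }

For an aggregation like this, the columnar scan reads one contiguous array and is far friendlier to the CPU cache, which is exactly the kind of variance the row-vs-columnar tests surface.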