Why LogBase: Understanding the value of log data accessibility

“Those who cannot remember the past are condemned to repeat it.” – George Santayana

Why log data?

The founders of LogBase have chosen log data as a core area of innovation because log data encapsulates, directly or by inference, the most significant aspects of a business’s operation. Is it working? Is it working well? Could it work better? The answers to all of these questions usually lie in log data.

What is log data?

Today, log data spans a wide variety of formats, data types, and business processes. Logs of every type share one basic dynamic: a log accumulates records of events, and as events occur and are recorded, they are appended to the log. Log files might include:

Application logs: Software and devices such as web servers, network switches, and firewalls generate log files recording the actions they take and the crashes and exceptions they encounter.

Debug logs: These are messages that developers log from their code to track down and debug issues. They are usually logged as plain text without any specific schema, to keep logging simple and easy.

Metrics: Metrics are time-series data generated by applications to measure some aspect of performance, such as CPU or disk utilization percentages. They are usually stored in a time-series database like OpenTSDB or as RRD files.

Tracking events: These events are generated as a result of user actions on a site or an app. These are mainly used by business analysts to generate business metrics and by data scientists for building machine learning models. These events are usually logged in a well-structured format like Avro, JSON, or protobuf.
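The append-only dynamic described above, combined with a structured format, can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product stores logs: the file name, the field names (ts, user, action), and the two events are all hypothetical, and JSON stands in for whichever structured format an organization actually uses.

```python
import json
import os
import tempfile


def append_event(path, event):
    """Append one event to the log as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def read_events(path):
    """Read every event back in the order it was recorded."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical tracking events; the field names are illustrative only.
log_path = os.path.join(tempfile.mkdtemp(), "events.log")
append_event(log_path, {"ts": "2015-01-01T12:00:00Z", "user": "u1", "action": "view"})
append_event(log_path, {"ts": "2015-01-01T12:00:05Z", "user": "u1", "action": "purchase"})

events = read_events(log_path)
print([e["action"] for e in events])
```

Because each record is only ever added to the end of the file, the log preserves the order in which events happened, and a structured format means every record can be read back as named fields rather than free text.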

Here’s the problem, though. This is an example of a typical log:

A sample log file.

If you’ve worked in IT for your entire career, you might not fully appreciate how incomprehensible this is to the vast majority of people. It looks like… English? …but there are numbers mixed in, and IP addresses for things you don’t know, connoting events you don’t understand.

And herein lies the problem: There is business history that everyone needs to know encoded in a massive store of files that look like this, and no one except for data scientists specializing in logs can help you figure out what it is.
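To make the gap concrete, here is a rough sketch of the translation step a specialist performs by hand. The log line below is hypothetical (it is not the sample shown above) and is assumed to follow the Common Log Format that many web servers emit; the regular expression and the field names are illustrative choices, not a complete parser.

```python
import re

# Pattern for a Common Log Format line (an assumed format for this sketch).
CLF = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

# Hypothetical log line; 192.0.2.10 is an address reserved for documentation.
line = '192.0.2.10 - - [01/Jan/2015:12:00:00 +0000] "GET /checkout HTTP/1.1" 500 1043'

match = CLF.match(line)
fields = match.groupdict()

# Turn the named fields into a sentence a non-specialist can read.
summary = (
    f"At {fields['time']}, a visitor from {fields['ip']} requested "
    f"{fields['path']} and the server answered with status {fields['status']}."
)
print(summary)
```

Once the raw line has been split into named fields, the business question ("did a checkout fail?") becomes a simple lookup instead of an exercise in decoding.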

Why does log data matter?

When log data is fully exploited for its business value, the benefits are profound. Analysis of log data underpins basic operational intelligence: you literally cannot know what you have, who you’re serving, what you’re selling, who’s looking at you, or whether your business is even running without going through logs.

Business self-awareness: Business self-awareness is a business’ ability to perceive and regulate its business processes. As a business scales, its ability to self-perceive diminishes because data flow (a machine-driven process) naturally grows in volume and importance faster than the capacity to organize and curate it does (a human-driven process). Human data scientists, in contrast to machines, are relatively expensive, slow to spin up and thus difficult to scale.

Corporate memory: The log comprises the only record that exists of the results that can be expected from basic business systems – critical information that everyone in the organization needs. But data scientists are the only people in the organization who can understand it and process it into a format that decision-makers (or, for that matter, literally the rest of humanity) can even understand.

Predictive/inferential value: Unlike much of the business data your organization lives in, log data becomes more valuable to analyze as it gets bigger, because the statistical measures that drive intelligence are working over a larger and larger sample. It’s Statistics 101: the bigger the sample size, the more confident we can be in our analytics. This is a critical scaling dynamic to harness in a data business: even as your ability to follow individual events by hand falls away, your aggregate knowledge of operations, and your ability to optimize and improve them, can keep growing.
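The sample-size effect can be sketched with the textbook formula for the margin of error of an observed proportion. The 2% error rate and the sample sizes are hypothetical, and the normal approximation with z = 1.96 (a 95% confidence level) is an assumption made for illustration.

```python
import math


def standard_error(p, n):
    """Standard error of an observed proportion p over n records."""
    return math.sqrt(p * (1 - p) / n)


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error (normal approximation)."""
    return z * standard_error(p, n)


error_rate = 0.02  # hypothetical: 2% of logged requests failed

for n in (1_000, 100_000, 10_000_000):
    print(f"{n:>10} records: 2% +/- {margin_of_error(error_rate, n):.5%}")
```

The margin shrinks in proportion to the square root of the number of records, so a log 100 times larger gives an estimate about 10 times tighter. The gain is steady rather than explosive, but it compounds for as long as the log keeps growing.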

Data discovery for the people: making logs work for your organization

At LogBase, we believe that when you (and by “you” we mean “the rest of humanity other than data scientists”) are empowered to find the truth sitting in your logs yourself, you’ll find problems and answers that might not even have occurred to the data scientists. Data-empowered decision-makers don’t have to repeat history when they can see it themselves from logs.

It all comes back to the individual data worker: maybe you, or someone in your organization, sitting with the massive universe of the organization’s log files in front of you, organized under a database schema that may not even be appropriate. You might be building a report that will probably have to be redone, or analyzing an incident for root-cause hypotheses by poring through line after line of a log file.

You may wonder to yourself — as our founders have — “Why can’t we just give these people a tool to find the answers themselves?” It’s the kind of tool that anyone who’s ever worked professionally with log file analysis has wished for, and a key focus for our research. If you’re working on similar issues, contact us and sign up for our beta.
