18 Jun 2019

DRAIN: A fixed depth tree based online log parsing method

With the increasing scale and complexity of services, managing vastly distributed services (or even many different services which might not be distributed) has become challenging. Logs are pervasive data sources that contain useful information about the health of these processes and abnormalities in their functioning. However, the ratio of noise to useful information in log files is usually very high, which makes this way of health estimation tiresome. Drain, a highly regarded log parsing algorithm, provides an efficient way of reducing these millions or even billions of log lines into a few recurring and useful log templates, which can then be used to train an anomaly detection model or to highlight useful information in case of an unhealthy service. This technique is summarised below.

Formal Problem Definition and expectations from a good log parser

  • Goal: Transform raw log messages into structured log templates as shown below:

08/11/09 204608 Receiving block blk_3587 src: /10.251.42.84:57069 dest: /10.251.42.84:50010
08/11/09 204655 PacketResponder 0 for block blk_4003 terminating
08/11/09 204655 Received block blk_3587 of size 67108864 from /10.251.42.84


blk_3587 Receiving block * src: * dest: *
blk_4003 PacketResponder * for block * terminating
blk_3587 Received block * of size * from *

  • More loosely defined, the primary goal of log parsing is to distinguish the constant parts of a log message from the variable parts, while also clustering similar messages to a reasonable extent.

  • Why Do This:

    • From the generated log patterns/templates you can identify the important ones and write a wrapper on top of them that hides the useless logs. This makes log files more readable.
    • A machine-learnt anomaly detection model can further be trained over these recurring, frequent log patterns.
    • Other log analysis models can also be built and employed to improve the quality of services over time.
  • Expectations:

    • Log parsers have to be extremely simple and can generally only afford linear time complexity, since even algorithms with quadratic time complexity might not be able to process hundreds of millions of log lines.
    • Log parsers should be dynamic. A simple log parser would be a human manually tagging/creating log patterns; however, as lines of code change, so do log lines, and we would need more and more humans looking at ever-growing log files to identify the correct patterns and disregard older, invalid ones.
    • The log parser should preferably be usable online, so that log lines can be matched to a log pattern as they arrive and can be used to mark anomalies in real time if required.

Drain: How does it accomplish log parsing?

  • Drain is a fixed depth tree based log parser.
  • Drain uses a tree structure to bound the number of comparisons it has to attempt when classifying a newly arriving line into one of the log groups/templates/patterns.
  • Structure of the Tree:

    • The root node is the topmost node.
    • The internal nodes encode special rules that guide the search process.
    • The leaf nodes contain log templates/groups along with some metadata.
    • Each path of the tree ends in a leaf node that stores the log groups corresponding to that path. A log group contains a log event (the generic log template) along with log IDs (the lines on which that template occurs).
    • The fixed tree depth (along with the maxChild parameter that restricts the number of children per node) bounds the number of comparisons that Drain needs to make for each new log line that arrives. A sketch of these structures in code follows the figure below.
[Figure: structure of the fixed depth parse tree in Drain]
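One plausible way to picture these pieces in code is the Python sketch below. This is only illustrative; the class and field names are my own, not from the paper or its reference implementation.

```python
from dataclasses import dataclass, field

@dataclass
class LogGroup:
    # Leaf payload: a log event (the template, kept as a token list)
    # plus the log IDs of the lines that matched it.
    log_event: list
    log_ids: list = field(default_factory=list)

@dataclass
class Node:
    # Internal node: children are keyed by token (or '*').
    # Nodes at the fixed maximum depth hold log groups
    # instead of further children.
    children: dict = field(default_factory=dict)
    log_groups: list = field(default_factory=list)
```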
  • Stage 1: Preprocess the log line using simple regular expressions based on domain knowledge. This step substitutes tokens that match these regexes (e.g. block IDs, IP addresses) with placeholders. (An end-to-end sketch covering this and the following stages appears after the time complexity notes below.)
  • Stage 2: Search by log message length: the first level of nodes in the parse tree partitions log messages by their length in tokens. (This is based on the assumption that log messages with the same log event will probably have the same length, which might not always be true.)
  • Stage 3: Search by preceding tokens: Drain then traverses from the first-layer node of Stage 2 towards a leaf node based on the tokens at the beginning of the log line (consuming one token at each level). This is again based on the assumption that the beginning tokens of a log message are more likely to be constants.

    • Specifically, Drain selects the next internal node by matching the current token of the log line against the children of the current node.
    • If this token is a number, it is replaced by a '*', since numbers are likely to be variables and would otherwise cause branch explosion. This forms the '*' node group.
    • Besides numbers, exceeding the maxChild parameter also results in the formation of a '*' group: if a node already has maxChild children, any non-matched token will match the '*' node among its children.
  • Stage 4: Search by token similarity: now that Drain has reached a leaf node, it needs to identify whether the log line complies with any of the log groups stored there, or else form a new log group. The most suitable log group is selected based on the formula below:
simSeq = (1/n) · Σᵢ₌₁ⁿ equ(seq1(i), seq2(i))
  • Here, seq1 refers to the log message and seq2 to the log template, both of token length n. equ is an indicator function such that equ(t1, t2) = 1 if t1 == t2 and 0 otherwise.
  • If the similarity exceeds a preset similarity threshold, the line is assigned to the most suitable log group; otherwise it forms a new log group. A direct transcription of the formula is sketched below.
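As a minimal sketch, the formula transcribes directly into Python, assuming both sequences are already tokenised and of equal length n (which Drain guarantees by grouping messages by length at the first tree level):

```python
def sim_seq(seq1, seq2):
    # simSeq = (1/n) * sum_i equ(seq1(i), seq2(i)),
    # where equ(t1, t2) is 1 iff the two tokens are identical.
    n = len(seq1)
    return sum(t1 == t2 for t1, t2 in zip(seq1, seq2)) / n

# e.g. sim_seq(['Received', 'block', 'blk_3587'],
#              ['Received', 'block', '*']) == 2/3
```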

  • Stage 5: Update the parse tree: the log ID is added to the most suitable log group (or to the newly created one). If the log line matched an existing log group, the log event/template is also updated as follows (see the sketch after the sub-point below):

    • Scan the log message and the log template token by token, and replace the token at every position where the two disagree with a wildcard in the log template.
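This update amounts to a positionwise merge of the matched template with the incoming message. A minimal sketch, again assuming equal-length token lists (the function name is mine, not the paper's):

```python
def update_template(template, message):
    # Keep tokens where template and message agree; replace
    # every disagreeing position with the wildcard '*'.
    return [t if t == m else '*' for t, m in zip(template, message)]

# e.g. update_template(['PacketResponder', '0', 'for', 'block', 'blk_4003', 'terminating'],
#                      ['PacketResponder', '2', 'for', 'block', 'blk_1608', 'terminating'])
#      -> ['PacketResponder', '*', 'for', 'block', '*', 'terminating']
```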
  • The time complexity of Drain is O((d + cm)n), which is essentially linear in the number of log messages:

    • d: depth of parse tree
    • c: number of candidate log groups in leaf node
    • m: length of the log messages
    • n: number of log messages
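To make the five stages concrete, here is a minimal end-to-end sketch in Python. It deliberately simplifies: there is no maxChild handling, the similarity threshold is fixed, the preprocessing regexes are illustrative, and none of the names below come from the paper or its reference implementation.

```python
import re

def preprocess(line, regexes):
    # Stage 1: replace domain-specific variables (block IDs,
    # IP addresses, ...) with '*' before tokenising.
    for pattern in regexes:
        line = pattern.sub('*', line)
    return line.split()

def sim_seq(message, template):
    # Stage 4 similarity: the fraction of positions at which
    # the message token equals the template token.
    return sum(a == b for a, b in zip(message, template)) / len(message)

def parse(lines, depth=3, sim_threshold=0.5, regexes=()):
    root = {}      # level 1 keys: message length; deeper keys: tokens
    groups = []    # all log groups discovered so far
    for log_id, line in enumerate(lines):
        tokens = preprocess(line, regexes)
        if not tokens:
            continue
        # Stages 2-3: descend by length, then by the leading tokens,
        # mapping numeric tokens to the '*' child.
        node = root.setdefault(len(tokens), {})
        for token in tokens[:depth]:
            if any(ch.isdigit() for ch in token):
                token = '*'
            node = node.setdefault(token, {})
        leaf = node.setdefault('<groups>', [])  # sentinel key for the leaf
        # Stage 4: pick the most similar existing template, if any.
        best, best_sim = None, 0.0
        for group in leaf:
            s = sim_seq(tokens, group['template'])
            if s >= best_sim:
                best, best_sim = group, s
        if best is None or best_sim < sim_threshold:
            best = {'template': list(tokens), 'log_ids': []}
            leaf.append(best)
            groups.append(best)
        # Stage 5: merge disagreeing positions into wildcards
        # and record the log ID.
        best['template'] = [t if t == m else '*'
                            for t, m in zip(best['template'], tokens)]
        best['log_ids'].append(log_id)
    return groups

if __name__ == '__main__':
    logs = [
        "Receiving block blk_3587 src: /10.251.42.84:57069 dest: /10.251.42.84:50010",
        "Receiving block blk_9008 src: /10.250.19.102:54106 dest: /10.250.19.102:50010",
        "PacketResponder 0 for block blk_4003 terminating",
    ]
    regexes = [re.compile(r'blk_-?\d+'),
               re.compile(r'/(\d{1,3}\.){3}\d{1,3}(:\d+)?')]
    for g in parse(logs, regexes=regexes):
        print(' '.join(g['template']), g['log_ids'])
```

Running this over the HDFS-style lines from the beginning of this post groups the two 'Receiving block' lines under one template and keeps the 'PacketResponder' line in its own group.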

Conclusion

Drain has proven to be an effective, accurate and efficient online log parsing algorithm. It has turned out to give extremely good accuracy scores compared to other log parsing algorithms on similar datasets. It has also proven effective in real-world anomaly detection scenarios, as the authors point out in their case study.

On a personal note, we had the opportunity to use a log parsing system while working on an overnight Nokia Log Anomaly Detection Hackathon in 2018. The repository for the same can be found here. A preprocessing step such as log parsing with Drain gave us a great head start, and we were able to secure the winner's position at the Hackathon. We did try a few other popular log parsing methods, but finally decided to stick with Drain given our F1-score results. The other contributor was Mayank Rajoria.

References

Pinjia He, Jieming Zhu, Zibin Zheng, and Michael R. Lyu. "Drain: An Online Log Parsing Approach with Fixed Depth Tree." Proceedings of the IEEE International Conference on Web Services (ICWS), 2017.

