11. June. 2012 by Markus
“Correlation” - noun: mutual relation of two or more things. Correlation has been in the English language since the 16th century. Its French cousin, corrélation, comes from a Latin root that literally means “restoring things together.”
It is clear from the latest industry reports that the collection, storage and archiving of large volumes of log data from a wide variety of devices is no longer a major issue for modern log management products. However, this very success has aggravated a growing problem: the lack of efficient tools for extracting actionable information from those vast amounts of data. This issue can only be resolved by improving the searching, analysing and reporting capabilities of log management products. An important aspect of these capabilities is the process of Correlation.
Having searched, parsed and normalized all those various log entries, your normal log management tool can still only identify anomalies individually, within a single log file. Applying intelligence to identify possible connections between them and find patterns can only be done by human analysis, some time after the event. However, actionable information that is critical to the security or performance of your systems needs to be available in near real-time. Unfortunately, real Correlation, or finding connections between things, is a creative process and not something that can be automated easily. Still, we can find genuine operational and security benefits from automating as much of the Correlation process as possible. The clearest explanation of Correlation we have found to date is in Brian Singer's blog post “Not All Correlation is Created Equal. Don’t Fall Into the Alerting Trap”, in which he provides three example scenarios:
A tool sends an alert to your beeper every time someone fails to log in
An alert happens after 5 failed logins on one system
An alert happens after 5 failed logins on one system followed by a success, and then 5 failures with the same user name on another, more critical system, all within a 30-minute window of time
He then goes on to explain that in the first two cases there is no actual correlation taking place; they are simply alerts. The third case, however, ties together multiple events and finds a pattern of activity across multiple systems within a specified time window. This alert is a product of real Correlation. So any attempt to automate the Correlation process effectively would have to begin by improving the efficiency of the processes used to create and update the set of rules that your tool uses to identify anomalies.
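To make the difference concrete, here is a minimal Python sketch of what a rule for the third scenario might look like. It is only an illustration under stated assumptions, not how any particular product implements correlation: the `LoginEvent` record, the `correlate_logins` function, the host names and the idea of passing in a set of "critical" hosts are all hypothetical, and the logs are assumed to be already parsed and normalized.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginEvent:
    """Hypothetical normalized login record (not any real product's schema)."""
    timestamp: float  # seconds since some epoch
    user: str
    host: str
    success: bool


def correlate_logins(events, critical_hosts, threshold=5, window=30 * 60):
    """Return (user, first_host, critical_host, time) tuples for scenario three:
    `threshold` failures then a success on one host, then `threshold` failures
    by the same user on a different, critical host, all within `window` seconds."""
    by_user = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.timestamp):
        by_user[ev.user].append(ev)

    alerts = []
    for user, user_events in by_user.items():
        streak = defaultdict(list)  # host -> times of consecutive failures
        breaches = []               # (streak_start, success_time, host)
        for ev in user_events:
            if ev.success:
                # A success ends the failure streak on that host.
                fails = streak.pop(ev.host, [])
                if len(fails) >= threshold:
                    breaches.append((fails[-threshold], ev.timestamp, ev.host))
                continue
            streak[ev.host].append(ev.timestamp)
            if ev.host not in critical_hosts:
                continue
            for start, success_time, first_host in breaches:
                # Only count critical-host failures that follow the success.
                later = [t for t in streak[ev.host] if t > success_time]
                if (first_host != ev.host
                        and len(later) >= threshold
                        and ev.timestamp - start <= window):
                    alerts.append((user, first_host, ev.host, ev.timestamp))
                    streak[ev.host].clear()  # avoid re-alerting on every failure
                    break
    return alerts


# Illustrative data: 5 failures and a success on "web1", then 5 failures on "db1".
sample_events = (
    [LoginEvent(t, "alice", "web1", False) for t in (0, 60, 120, 180, 240)]
    + [LoginEvent(300, "alice", "web1", True)]
    + [LoginEvent(t, "alice", "db1", False) for t in (400, 500, 600, 700, 800)]
)
alerts = correlate_logins(sample_events, critical_hosts={"db1"})
# alerts == [("alice", "web1", "db1", 800)]
```

Note that the rule has to remember state per user and per host across two systems, which is exactly what the first two scenarios never need to do: a per-event alert keeps no state at all, and a simple threshold alert only counts events on one system.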
So, the whole purpose of Correlation is to help you refine and reduce incoming messages, making it easier to find pertinent information in the mass of data and, more importantly, to identify links to other messages that, when combined, may indicate potential anomalies, and finally to produce the appropriate notifications. The ability to collect log data and present it in a single, consolidated view is nothing new. It is the ability to take raw log data from different sources and apply logical correlation rules to it, in near real-time, that is driving the development of Correlation tools. Companies such as CorreLog are developing advanced Correlation Engines able to perform semantic analysis of messages in real-time, employing neural networks with “auto-learning” to automatically adjust alert thresholds. These correlation engines are dramatically improving the ability of log management tools to confirm or, ideally, discover relationships that can reveal additional actionable information. They are also having some success in automating the generation of the real-time alerts produced by this information. So, it seems that the automation of the Correlation process would, as Terence Craig posits in his blog post “Tales of Beers and Diapers”, “lay the foundation for the holy grail – automatic EDA. The system telling users automatically what relationships are worth paying attention to”.
SANS Seventh Annual Log Management Survey Report, April 2011, Jerry Shenk. Available from: http://www.sans.org/reading_room/analysts_program/logmgt_survey-2011.pdf
Not All Correlation is Created Equal. Don’t Fall Into the Alerting Trap, March 31, 2011, Brian Singer. Available from: http://archive.feedblitz.com/722160/~4000821#3
Tales of Beers and Diapers, March 2, 2011, Terence Craig. Available from: http://blog.patternbuilders.com/2011/03/02/tales-of-beers-and-diapers/