Data collection and correlation
This is a follow-up to the article I wrote last week about the critical capabilities of a modern SOC. In this article, I cover the first capability discussed, namely, “data collection and correlation.”
The purpose of data collection and correlation is to make sure that the right data flows to the right places at the right time. This is essential to the success of a SOC – enabling the detection of threats and providing real‑time situational awareness for SOC analysts to carry out investigations accurately and in a timely fashion.
The tasks involved in data collection and correlation can demand significant time, effort, and expertise, which organisations should not underestimate. I have seen this phase of a project take several years to finish in large businesses with many locations and event sources.
Specific sensors can be deployed to supplement and enhance other data sources. Sensors are typically installed at the infrastructure or host level, e.g. intrusion detection systems (IDS), endpoint detection and response (EDR) agents and other systems, such as Network Behaviour Analysis (NBA) tools.
Requisite logging information and other data should be determined based on threat scenarios and associated use cases, taking into account which systems are important and relevant to the detection of potential security events or incidents. The exact data to be collected from these sources should be carefully chosen to provide the detail and context needed to address the use cases. Onboarding everything risks data overload without adding value.
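To make this concrete, here is a minimal sketch of how detection use cases might drive the onboarding scope. The use cases, source names and field lists below are invented for illustration; in practice they would come from your own threat scenarios and the systems you actually run.

```python
# Hypothetical illustration: deriving an onboarding scope from detection use cases.
# The use cases, source names and field lists are invented examples, not a schema.

USE_CASES = {
    "brute_force_login": {
        "sources": ["active_directory", "vpn_gateway"],
        "fields": ["timestamp", "username", "src_ip", "action"],
    },
    "malware_beaconing": {
        "sources": ["dns_resolver", "proxy"],
        "fields": ["timestamp", "src_ip", "domain", "bytes_out"],
    },
}

def onboarding_scope(use_cases: dict) -> dict:
    """Collect only the sources and fields the agreed use cases actually need."""
    scope = {}
    for case in use_cases.values():
        for source in case["sources"]:
            scope.setdefault(source, set()).update(case["fields"])
    return scope

if __name__ == "__main__":
    for source, fields in onboarding_scope(USE_CASES).items():
        print(f"{source}: collect {sorted(fields)}")
```

The point is simply that collection is derived from the use cases, not the other way around.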
Data should be collected and stored securely in a central system, such as an Extended Detection & Response (XDR) or Security Information and Event Management (SIEM) platform, so it can be normalised, correlated and presented in real time, in a format that is useful for review and analysis. Since breaches may not be discovered for some time, data needs to be retained for a long enough period to support retrospective investigation. Increasingly, organisations are using data lakes for this purpose, as they are better suited to long retrospective searches and other big data techniques.
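As an illustration of what normalisation means in practice, the sketch below maps two differently shaped raw events onto one common field layout before correlation. The field names and target schema are assumptions made for this example, not the format of any particular SIEM or XDR product.

```python
# Minimal sketch of normalising differently shaped raw events into one common
# layout before correlation. Field names and the target schema are illustrative.
from datetime import datetime, timezone

def normalise_firewall(raw: dict) -> dict:
    return {
        "timestamp": datetime.fromtimestamp(raw["epoch"], tz=timezone.utc).isoformat(),
        "src_ip": raw["src"],
        "dst_ip": raw["dst"],
        "action": raw["action"].lower(),
        "source": "firewall",
    }

def normalise_windows_logon(raw: dict) -> dict:
    # Windows Security event 4624 is a successful logon, 4625 a failed one.
    return {
        "timestamp": raw["TimeCreated"],
        "src_ip": raw.get("IpAddress", ""),
        "user": raw["TargetUserName"],
        "action": "logon_success" if raw["EventID"] == 4624 else "logon_failure",
        "source": "windows_security",
    }

events = [
    normalise_firewall({"epoch": 1700000000, "src": "10.0.0.5", "dst": "8.8.8.8", "action": "ALLOW"}),
    normalise_windows_logon({"TimeCreated": "2023-11-14T22:13:20Z", "IpAddress": "10.0.0.5",
                             "TargetUserName": "alice", "EventID": 4625}),
]
print(events)
```

Once events share a layout, correlation and search no longer need to care where each record came from.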
Data feeds need to be connected, configured and tuned to achieve the right level of logging. They should also be continuously maintained to ensure that data keeps flowing; otherwise relevant alerts can fail to trigger. Maintaining feeds requires close collaboration with the owners of the systems generating the data.
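A simple way to catch broken feeds is to track when each source last sent an event and alert when a source stays silent for longer than is normal for it. The sketch below assumes such "last seen" timestamps are available; the source names and thresholds are illustrative only.

```python
# Hedged sketch: flag log sources that have gone quiet, assuming we can look up
# the time of the last event received per source. Thresholds are examples.
from datetime import datetime, timedelta, timezone

MAX_SILENCE = {
    "firewall": timedelta(minutes=15),   # chatty source: silence quickly becomes suspicious
    "hr_feed": timedelta(days=2),        # batch source: longer gaps are normal
}

def stale_feeds(last_seen: dict, now: datetime) -> list:
    """Return sources whose last event is older than their allowed silence window."""
    return [
        source for source, seen in last_seen.items()
        if now - seen > MAX_SILENCE.get(source, timedelta(hours=1))
    ]

now = datetime.now(timezone.utc)
last_seen = {
    "firewall": now - timedelta(hours=3),    # broken feed: should be flagged
    "hr_feed": now - timedelta(hours=12),    # within tolerance
}
print(stale_feeds(last_seen, now))  # ['firewall']
```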
Correlation rules, or machine learning models, are used to configure the SIEM to raise an alert when a given set of conditions is met or an anomaly occurs. The combinations and sequencing of events across log and data sources that may indicate a potential security incident need to be well understood to create these rules, which should be tested and refined to verify that they deliver the expected results.
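For example, a classic correlation is "several failed logons followed by a success from the same source within a short window". The sketch below expresses that logic in plain Python rather than any vendor's rule language; the threshold and window are examples only and would need tuning against real data.

```python
# Illustrative correlation logic (not a real SIEM rule language): raise an alert
# when one source IP generates several failed logons followed by a success
# within a short window. Thresholds and window are example values.
from collections import defaultdict
from datetime import timedelta

WINDOW = timedelta(minutes=10)
FAILURE_THRESHOLD = 5

def correlate_brute_force(events: list) -> list:
    """events: dicts with 'timestamp' (datetime), 'src_ip' and 'action' fields."""
    failures = defaultdict(list)
    alerts = []
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        ip, ts = ev["src_ip"], ev["timestamp"]
        if ev["action"] == "logon_failure":
            failures[ip].append(ts)
        elif ev["action"] == "logon_success":
            recent = [t for t in failures[ip] if ts - t <= WINDOW]
            if len(recent) >= FAILURE_THRESHOLD:
                alerts.append({"src_ip": ip, "time": ts,
                               "reason": f"{len(recent)} failures then success"})
    return alerts
```

Running logic like this over historical data is exactly the testing and refinement step described above: thresholds that look sensible on paper often generate noise, or miss slow attacks, until they are tuned.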
There is a diverse range of useful data that a SOC may acquire from various sources. At a very high level, a SOC will consume the following three types of data:
- Event data: data from log sources, network devices, analytical tools and sensors that help to build a picture of what is happening on an organisation’s infrastructure.
- Threat intelligence: information about adversaries’ past, present and predicted attacks that informs decisions or actions, often produced as threat intelligence feeds that can be integrated with SOC tools, such as the SIEM.
- Contextual information: information that adds the context necessary for security analysis and investigation, e.g. vulnerability scans, asset information (including, but not limited to, the Configuration Management Database (CMDB)), penetration testing results and HR feeds. A short enrichment sketch follows this list.
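To show how these three data types come together, the sketch below enriches a single event with asset context (as might be exported from a CMDB) and a threat intelligence indicator list. All of the data, field names and the ASSETS / THREAT_INTEL_IPS structures are invented for illustration.

```python
# Hypothetical enrichment sketch: fuse an event with asset context (e.g. from a
# CMDB export) and a threat intelligence indicator list. All values are invented.

ASSETS = {  # keyed by IP, as might be exported from a CMDB
    "10.0.0.5": {"hostname": "hr-laptop-17", "owner": "HR", "criticality": "high"},
}
THREAT_INTEL_IPS = {"203.0.113.66"}  # indicators taken from a subscribed feed

def enrich(event: dict) -> dict:
    enriched = dict(event)
    enriched["asset"] = ASSETS.get(event.get("src_ip"), {"criticality": "unknown"})
    enriched["known_bad_destination"] = event.get("dst_ip") in THREAT_INTEL_IPS
    return enriched

print(enrich({"src_ip": "10.0.0.5", "dst_ip": "203.0.113.66", "action": "allow"}))
```

An analyst seeing a high-criticality HR asset talking to a known-bad destination can triage far faster than one looking at raw IP addresses.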
Organisations may choose to build data lakes (centralised repositories that hold large volumes of raw data and support big data analytics and machine learning) to store and query vast amounts of data over increased retention periods.
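One benefit of this approach is that a newly published indicator can be searched across months or years of history. The sketch below assumes events are kept as date-partitioned Parquet files and uses DuckDB to query them; the path, partition layout, column names and the 18-month window are all assumptions made for illustration.

```python
# Hedged sketch: retrospective search over a Parquet-based data lake with DuckDB.
# The path, partition layout, column names and indicator value are hypothetical.
import duckdb

con = duckdb.connect()
rows = con.execute(
    """
    SELECT event_time, src_ip, dst_ip, action
    FROM read_parquet('/data/lake/events/year=*/month=*/day=*/*.parquet')
    WHERE dst_ip = '203.0.113.66'                   -- newly published indicator
      AND event_time >= now() - INTERVAL 18 MONTH   -- assumes a typed timestamp column
    ORDER BY event_time
    """
).fetchall()
print(f"{len(rows)} historical events matched the new indicator")
```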
Finally, due to the cost and complexity of operating a SIEM, small to medium-sized organisations have typically failed to get meaningful security outcomes from SIEM deployments, which is why “SIEM” has become such a dirty word in the industry. You don’t want to be another statistic of a failed SIEM project. If you are not a major enterprise, or if you don’t have the resources, don’t start a SIEM journey. Remember, every major enterprise that has been breached had a SOC and a major SIEM deployment, and, as research shows, 99% of attacks went undiscovered by logs.
How we do it at LMNTRIX: As I mentioned in my last article, we use our own proprietary XDR technology stack behind the controls that the client already has. We assume the client-owned security controls will be continually breached, so relying on logs from these controls is of limited use for threat detection. We do, however, rely on the client to block all the common and known threat vectors using their existing controls (e.g. NGFW, Web+Email Security) and hand us the cleanest possible network.
Our XDR tech stack is made up of machine and underground intelligence, EDR, NDR, Network Forensics, Security Analytics and Deceptions Everywhere, together with Mobile and Cloud threat detection coverage. This stack is complemented by extensive automation and a 24/7 human element, which includes continuous monitoring, hunting and response. Each of the XDR elements is powerful on its own, but when combined to share context and intelligence, they form a force multiplier.
By focusing on operationalising our tech stack, we have become experts at it, and naturally, what we detect has already bypassed the client’s existing controls. There is no alert or log for what we detect in any existing SIEM or MSSP service owned by the client. Once a threat is detected, all we have to do next is validate it as a true positive before initiating the automated containment process built into our XDR platform. This approach detects and responds to threats missed by existing security controls while delivering zero false positives.