Monitoring
This is the fourth article in a series that began with the critical capabilities of a modern SOC. In the last article, I covered the “detection” capability; in this article, I cover the third capability on that list: “monitoring.”
Activities underpinning this capability include monitoring the alert queue, ascertaining the validity of alerts, and determining whether further investigation is necessary. This is traditionally a Tier 1 analyst task, but in many cases, it can benefit from automation, particularly if the organisation already has mature processes in place.
Alerts delivered into a queue can be dealt with through automated actions or selected by a SOC analyst (typically a Tier/Level 1 analyst) for initial assessment according to a predefined analysis flow. This involves the triage of alerts—a process to sort, categorise, and prioritise alerts within a short timeframe.
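The “sort and prioritise” part of triage can be pictured as a priority queue. The Python sketch below is illustrative only: the severity labels, the `AlertQueue` class and its methods are hypothetical names, not part of any particular SIEM or SOAR product. It hands the analyst the most severe alert first and, within the same severity, the oldest one.

```python
import heapq
import itertools

# Hypothetical severity labels; a lower number is handled sooner.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


class AlertQueue:
    """Alert queue ordered by severity, then by arrival order (illustrative)."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        # Monotonic counter used as a tie-breaker so that alerts of equal
        # severity come out in the order they arrived.
        self._arrival = itertools.count()

    def push(self, alert_id: str, severity: str) -> None:
        heapq.heappush(self._heap, (SEVERITY_ORDER[severity], next(self._arrival), alert_id))

    def pop(self) -> str:
        """Remove and return the ID of the next alert an analyst should triage."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = AlertQueue()
queue.push("A-1", "low")
queue.push("A-2", "critical")
queue.push("A-3", "high")
queue.push("A-4", "critical")
print([queue.pop() for _ in range(len(queue))])  # ['A-2', 'A-4', 'A-3', 'A-1']
```

Alerts handled by automated actions would simply never reach this queue; only those needing human assessment are pushed onto it.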
A key step in performing triage is to quickly determine whether the alert is a false positive (i.e., legitimate activity erroneously identified as suspicious) or a true positive (i.e., a valid event of interest that merits further analysis). This may require the collection of additional information.
To facilitate rapid decisions, a playbook should be produced for each use case. These playbooks usually provide a checklist of steps (e.g., leveraging threat intelligence for additional context) as well as information about different threats that can be compared with the attributes of a given alert.
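As a rough illustration, a playbook can be held as a small data structure keyed by use case: a checklist for the analyst plus a set of threat indicators to compare against the alert. The `Playbook` class, the `suspicious_login` use case and the indicator names below are all hypothetical, chosen only to show the shape.

```python
from dataclasses import dataclass, field


@dataclass
class Playbook:
    """Triage playbook for one detection use case (hypothetical structure)."""

    use_case: str
    checklist: list[str]
    # Attributes associated with known threats for this use case, to be
    # compared against the attributes observed on an incoming alert.
    threat_indicators: set[str] = field(default_factory=set)


# Illustrative example only; real playbooks would be written per use case.
PLAYBOOKS = {
    "suspicious_login": Playbook(
        use_case="suspicious_login",
        checklist=[
            "Look up the source IP in threat intelligence feeds",
            "Check whether the user normally signs in from this location",
            "Search for other alerts on the same account in the past 24 hours",
        ],
        threat_indicators={"tor_exit_node", "impossible_travel"},
    ),
}


def matching_indicators(use_case: str, alert_attributes: set[str]) -> set[str]:
    """Return the known-threat indicators that the alert shares with its playbook."""
    return PLAYBOOKS[use_case].threat_indicators & alert_attributes


print(matching_indicators("suspicious_login", {"impossible_travel", "mfa_success"}))
# {'impossible_travel'}
```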
If an alert is validated as a true positive, a cursory analysis should be undertaken to determine whether it meets a specified threshold, which may be based on a matrix classification (e.g., criticality and urgency) or another classification scheme (e.g., severity of the threat, incident type, or the number and nature of systems affected). Some alerts may be classified automatically (or have a score calculated), which helps to prioritise critical alerts.
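A criticality-and-urgency matrix is essentially a lookup table. The sketch below assumes a three-level scale on both axes; the specific cell values are illustrative, not a recommended scheme. The optional `score` function shows one simple way a numeric score could be calculated instead (the product of the two ranks).

```python
# Hypothetical three-level scale used for both axes of the matrix.
LEVELS = ("low", "medium", "high")

# Illustrative priority matrix: key is (criticality, urgency).
PRIORITY_MATRIX = {
    ("low", "low"): "low",
    ("low", "medium"): "low",
    ("low", "high"): "medium",
    ("medium", "low"): "low",
    ("medium", "medium"): "medium",
    ("medium", "high"): "high",
    ("high", "low"): "medium",
    ("high", "medium"): "high",
    ("high", "high"): "high",
}


def classify(criticality: str, urgency: str) -> str:
    """Look up the priority for an alert from its criticality and urgency."""
    if criticality not in LEVELS or urgency not in LEVELS:
        raise ValueError(f"expected one of {LEVELS}, got {criticality!r}, {urgency!r}")
    return PRIORITY_MATRIX[(criticality, urgency)]


def score(criticality: str, urgency: str) -> int:
    """Alternative numeric score from 1 to 9: product of the two level ranks."""
    return (LEVELS.index(criticality) + 1) * (LEVELS.index(urgency) + 1)


print(classify("high", "medium"), score("high", "medium"))  # high 6
```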
If the threshold is met, a case should be opened, and the alert should be escalated for a more detailed investigation in line with its assigned priority level (e.g., high, medium, or low). Alerts that do not reach the threshold should be closed.
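The escalate-or-close decision can be sketched in a few lines, assuming priorities are already assigned and ordered low < medium < high. The `Case` record, `handle_validated_alert` and the default `"medium"` threshold are hypothetical; a real deployment would create the case in its ticketing or case management system rather than return an object.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ordering of priority levels.
PRIORITY_RANK = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Case:
    alert_id: str
    priority: str
    status: str = "open"


def handle_validated_alert(alert_id: str, priority: str, threshold: str = "medium") -> Optional[Case]:
    """Open a case for escalation if the alert meets the threshold, otherwise close it.

    Returns the new Case, or None when the alert is closed without a case.
    """
    if PRIORITY_RANK[priority] >= PRIORITY_RANK[threshold]:
        return Case(alert_id=alert_id, priority=priority)
    return None


print(handle_validated_alert("A-7", "high"))  # Case(alert_id='A-7', priority='high', status='open')
print(handle_validated_alert("A-8", "low"))   # None
```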
For alerts deemed to be false positives, feedback should be shared with the alert development team so rules can be fine-tuned to reduce the number of false positive alerts without creating blind spots.
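One way to turn triage outcomes into tuning feedback is to track the false-positive ratio per detection rule. In the sketch below, the function name, the `(rule_id, was_false_positive)` input format and both thresholds are assumptions for illustration. It only flags rules for the alert development team to review rather than disabling them, since switching off a noisy rule outright is exactly how blind spots get created.

```python
from collections import Counter
from typing import Iterable


def rules_needing_review(
    outcomes: Iterable[tuple[str, bool]],
    min_alerts: int = 20,
    fp_ratio_threshold: float = 0.8,
) -> list[str]:
    """Return IDs of rules whose false-positive ratio is at or above the threshold.

    `outcomes` holds (rule_id, was_false_positive) pairs from closed triage.
    Rules with fewer than `min_alerts` outcomes are skipped, since a handful
    of alerts says little about a rule's quality.
    """
    totals: Counter = Counter()
    false_positives: Counter = Counter()
    for rule_id, was_fp in outcomes:
        totals[rule_id] += 1
        if was_fp:
            false_positives[rule_id] += 1
    return sorted(
        rule_id
        for rule_id, total in totals.items()
        if total >= min_alerts and false_positives[rule_id] / total >= fp_ratio_threshold
    )


outcomes = [("R1", True)] * 19 + [("R1", False)] + [("R2", True)] * 5 + [("R3", False)] * 25
print(rules_needing_review(outcomes))  # ['R1']
```

Here `R2` is not flagged despite being all false positives, because five alerts are too few to judge the rule.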
Alert Fatigue
The Tier 1 analyst role of monitoring the alert queue and performing triage can lead to alert fatigue, caused by an overwhelming volume of daily alerts and too many false positives. Alert fatigue erodes analysts’ vigilance and their ability to distinguish alerts that matter from false alarms or redundant alerts, so valuable information or significant threats can be overlooked. The 2013 Target breach is a frequently cited case study of the negative effect alert fatigue can have on SOC operations.
Diligent tuning can help significantly: making correlation rules more specific reduces false positives and improves the quality of alerts. Effective tuning requires an understanding of what is worth alerting on, which can be established by drawing on a baseline of normal, expected behaviour, threat analysis, and threat intelligence.
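A very simple form of baseline is a statistical threshold over historical counts: alert only when an observation sits well above what is normal for that entity. The sketch below uses mean plus `k` standard deviations, which is one common simplification and ignores seasonality such as weekday and weekend patterns. The function names, the failed-login example and the choice of `k = 3` are assumptions for illustration.

```python
import statistics


def build_baseline(history: list[float]) -> tuple[float, float]:
    """Summarise historical observations (e.g. hourly counts) as mean and std dev."""
    return statistics.mean(history), statistics.pstdev(history)


def exceeds_baseline(observed: float, baseline: tuple[float, float], k: float = 3.0) -> bool:
    """True if the observation is more than k standard deviations above the mean."""
    mean, std_dev = baseline
    return observed > mean + k * std_dev


# Hypothetical hourly counts of failed logins for one host.
baseline = build_baseline([10, 12, 11, 9, 13, 10, 11, 12])
print(exceeds_baseline(14, baseline), exceeds_baseline(40, baseline))  # False True
```

With this history the mean is 11 and the threshold is roughly 14.7, so a small bump is ignored while a spike to 40 would produce an alert.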
A ticketing or case management system is typically used to track alerts until closure. More advanced or mature organisations are adopting Security Orchestration, Automation and Response (SOAR) platforms to help the SOC automate triage and enrichment tasks and to provide case management features. Automating some of the simpler, routine tasks that consume analysts’ time and energy helps to reduce alert fatigue.
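Tracking an alert “until closure” amounts to enforcing a small lifecycle. The states and transitions below are hypothetical (real case management tools define their own), but the sketch shows the idea: each ticket records its history, and invalid moves such as reopening a closed ticket are rejected.

```python
# Hypothetical ticket lifecycle; real case management tools define their own states.
ALLOWED_TRANSITIONS = {
    "new": {"triage"},
    "triage": {"escalated", "closed"},
    "escalated": {"closed"},
    "closed": set(),
}


class Ticket:
    """Tracks one alert from creation to closure, recording every status change."""

    def __init__(self, alert_id: str) -> None:
        self.alert_id = alert_id
        self.status = "new"
        self.history = ["new"]

    def move_to(self, new_status: str) -> None:
        if new_status not in ALLOWED_TRANSITIONS[self.status]:
            raise ValueError(f"cannot move {self.alert_id} from {self.status!r} to {new_status!r}")
        self.status = new_status
        self.history.append(new_status)


ticket = Ticket("A-1")
ticket.move_to("triage")
ticket.move_to("escalated")
ticket.move_to("closed")
print(ticket.history)  # ['new', 'triage', 'escalated', 'closed']
```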
By providing analysts with enriched data to inform their decision making, automation facilitates the movement and escalation of potential security incidents through the investigation and incident response stages. However, it is important that automation be supported by robust processes that are repeatable, consistent, and tested. For the most part, automation systems rely on playbooks (or runbooks) to execute tasks, although artificial intelligence may lessen this reliance in the future.
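To show what a “repeatable, consistent” automated playbook might look like, here is a minimal sketch in which enrichment steps run in a fixed order over an alert. Everything here is hypothetical: the lookup tables stand in for real threat intelligence and asset inventory queries, and the step functions are not any SOAR product’s API. Each step returns a new dictionary rather than mutating its input, which makes individual steps easy to test in isolation.

```python
from typing import Callable

Alert = dict

# Stand-in lookup tables. A real deployment would query threat intelligence,
# asset inventory and identity systems; these names and values are hypothetical.
KNOWN_BAD_IPS = {"203.0.113.7"}
ASSET_CRITICALITY = {"db-prod-01": "high", "laptop-042": "low"}


def enrich_threat_intel(alert: Alert) -> Alert:
    """Flag whether the alert's source IP appears in the known-bad list."""
    return {**alert, "src_ip_known_bad": alert.get("src_ip") in KNOWN_BAD_IPS}


def enrich_asset(alert: Alert) -> Alert:
    """Attach the criticality of the affected host, or 'unknown'."""
    return {**alert, "asset_criticality": ASSET_CRITICALITY.get(alert.get("host"), "unknown")}


def run_playbook(alert: Alert, steps: list[Callable[[Alert], Alert]]) -> Alert:
    """Apply each step in a fixed order, so the same input always gives the same output."""
    for step in steps:
        alert = step(alert)
    return alert


ENRICHMENT_PLAYBOOK = [enrich_threat_intel, enrich_asset]

enriched = run_playbook({"id": "A-1", "src_ip": "203.0.113.7", "host": "db-prod-01"}, ENRICHMENT_PLAYBOOK)
print(enriched["src_ip_known_bad"], enriched["asset_criticality"])  # True high
```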
How we do it at LMNTRIX: At LMNTRIX, we don’t use a SOAR platform; instead, we have developed similar capabilities using bots and code built directly into our XDR platform, so alerts are automatically validated, scored and enriched, and the corresponding incidents are created, ready for analysts to contain and escalate to clients. This automation is what allows us to meet our MTTD and MTTR SLAs and to scale our operations.