How to Deploy a Security Information and Event Management Solution Successfully


SIEM deployments may stall or fail if not implemented with the right scope, use cases, data sources, architecture, expertise or staff size. Security and risk management leaders deploying a SIEM solution should follow this structured approach to ensure a successful implementation.

Overview

Key Challenges

  • Deploying a SIEM solution effectively is predicated on a clear understanding of the scope, objectives, associated use cases, and the availability of trained personnel, a managed security service provider (MSSP) or a co-managed SIEM service provider.
  • Throwing all possible event and data sources at the SIEM solution at once, without considering what each source will be used for, sets the SIEM initiative up for failure.
  • A poor architecture choice can have wide-ranging consequences, from insufficient capacity and a lack of redundancy and disaster recovery capabilities to an inability to meet future objectives, while spending too much.

Recommendations

Security and risk management leaders responsible for security monitoring and operations should:
  • Use a phased, output-driven approach to deploy a SIEM solution. Determine the scope and use cases, and build requirements from those inputs.
  • Prioritize use cases and iteratively onboard those use cases in a methodical manner to gain experience and build competence with the SIEM solution.
  • Work with line of business (LOB) owners and IT systems operators to refine logging and audit capabilities to ensure the data sources are generating logs/events specific to security use cases.
  • Implement a suitable deployment architecture, whether on-premises or cloud, to address the specified use cases with acceptable performance, and to enable future expansion.

Introduction

Before embarking on a security information and event management (SIEM) implementation project, an organization needs to understand the overall project life cycle. This life cycle has multiple stages, and each is integral to a successful SIEM solution deployment (see Figure 1).

Figure 1. SIEM Implementation Stages

Source: Gartner (March 2018)

In “Establish Scope and Requirements for a Successful Security Information and Event Management Deployment,” Gartner defines the tasks that are needed to plan for a successful SIEM deployment. This research focuses on the deploy stage. It provides specific guidance for organizations that have understood the requirements and responsibilities of maintaining and running a SIEM solution, and that have decided to deploy it either on-premises, self-hosted in a public cloud service (PCS), or hosted by the vendor or a third party. However, if your organization has neither the expertise nor the resources to implement, manage, maintain and monitor a SIEM solution, other operating models might be more appropriate, such as using:
  • Managed security services (MSSs) (see “Magic Quadrant for Managed Security Services, Worldwide”)
  • Managed detection and response (MDR) services (see “Market Guide for Managed Detection and Response Services”)
  • Co-managed SIEM services (see “How and When to Use Co-managed Security Information and Event Management”)
You could also invest in something else first, for lower effort and reduced cost-risk (see “Use Central Log Management for Security Event Monitoring Use Cases”).

Analysis

Use a Phased Approach

The inherent complexity of an enterprisewide SIEM deployment can be lessened by taking a tactical and phased deployment approach. Implementing specific use cases sequentially, with a focus on quick wins, allows the deployment to ultimately reach the desired scale and strategic objectives. Except in very small and restricted environments, the “collect all at once and sort it later” approach never results in a successful and effective deployment.
There are two approaches to a phased SIEM deployment.

Deploy Log Management First

Security and risk management leaders need to carefully assess whether the log management and collection architecture should be deployed first, either as a separate central log management (CLM) solution or by using the SIEM’s log management capability. Indeed, SIEMs have two main components: a base layer of CLM functionality and an additional layer of security analytics. Most SIEM vendors can sell the CLM layer separately (as its own stock keeping unit [SKU]) to clients who want only CLM functionality. CLM is a critical IT capability that has value for all parts of IT operations, including security, and it provides several benefits to the organization if deployed first:
  • It is relatively easy to deploy and can deliver visibility into user and resource access activities, including security-related ones, and it aligns well with compliance use cases (see “Security Event Monitoring Options for Midsize Enterprises”).
  • Event management is easier to deploy from a scalability and performance perspective when log management functions enable selective forwarding of events to the event management function. This will also impact the ultimate specifications and subsequent price for a SIEM solution.
  • Already-collected log data can then be used to develop security-specific correlation rules and use cases, and event data can then be selectively forwarded to the SIEM system.
Operationally, deploying log management first:
  • Allows for later forensic usage or fulfillment of nonsecurity requirements, which can be done on the log collection and management tier without polluting the SIEM tool.
  • Significantly reduces the time to search a collection of logs, from hours (and sometimes days) to minutes, when this is needed during incident response. Without it, the task can become slow and onerous, because even an ad hoc, manual collection and centralization of logs is very time-consuming when the logs are distributed across the enterprise and across the organization’s technology stacks.
Your organization may already have a log management infrastructure in place. If so, a gap analysis will be required to ensure its scope is consistent with the objectives for the SIEM deployment.
It may seem counterintuitive to differentiate between SIEM and log management. However, SIEM is not suited to be an enterprisewide log management solution per se, where every log source regardless of security use-case value is collected and stored for the purpose of forensics, operational monitoring or regulatory compliance. SIEM is about monitoring and identifying security-related violations and incidents. It does not benefit from additional data for which there are no corresponding correlation rules, dashboards or reports, but it can suffer for it. There are also more cost-effective ways of providing pure log collection and storage than a full-fledged SIEM solution (see “Use Central Log Management for Security Event Monitoring Use Cases”).
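The selective forwarding described in this section can be sketched as a simple filter on the log management tier. This is a minimal illustration only: the event fields (`source`, `event_type`) and the `USE_CASE_FILTERS` set are hypothetical names chosen for the example, not any vendor's schema or API.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical allow-list: (source, event_type) pairs that are backed by a
# correlation rule, dashboard or report in the SIEM.
USE_CASE_FILTERS = {
    ("active_directory", "logon_failure"),
    ("active_directory", "account_lockout"),
    ("firewall", "deny"),
}


def split_events(events: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Return (forward_to_siem, clm_only).

    Every event is assumed to be retained by the log management tier;
    only events tied to a use case are additionally forwarded to the SIEM.
    """
    forward, clm_only = [], []
    for event in events:
        key = (event.get("source"), event.get("event_type"))
        if key in USE_CASE_FILTERS:
            forward.append(event)
        else:
            clm_only.append(event)
    return forward, clm_only


sample_events = [
    {"source": "active_directory", "event_type": "logon_failure", "user": "jdoe"},
    {"source": "firewall", "event_type": "deny", "src_ip": "203.0.113.7"},
    {"source": "print_server", "event_type": "job_completed"},
]

to_siem, clm_only = split_events(sample_events)
print(len(to_siem), "forwarded;", len(clm_only), "kept in log management only")
```

Keeping the allow-list as data rather than code means a new use case only adds entries to it, which mirrors the idea that SIEM ingestion should grow with use cases rather than with available log sources.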

Use-Case-by-Use-Case (Output-Driven) Approach

In this scenario, use cases are planned and implemented one by one. Supporting log management and SIEM components are deployed in support of each use case, and the process is repeated. This is an output-driven approach (see “Overcoming Common Causes for SIEM Solution Deployment Failures” and Figure 2).
The sequence of use-case implementation should balance critical need against quick wins, favoring use cases that deliver the highest risk reduction for the lowest effort. These quick wins can then provide early metrics for business justification of the investment in SIEM, and also give security and risk management leaders responsible for security monitoring and operations the confidence to expand the use cases further. As the process matures in a “crawl, walk, run” fashion, the organization can move to implement batches of use cases as part of sprints, especially when the use cases share similarities in chosen technology, data sources and objectives (see “How to Develop and Maintain Security Monitoring Use Cases”). Furthermore, a phased, use-case-by-use-case approach will build momentum within the organization for more complex use cases with greater scope.
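One way to make this sequencing explicit is to score each candidate use case and sort the backlog. The sketch below is an assumption-laden illustration: the 1-to-5 scales, the `critical` flag and the ratio used for ranking are hypothetical choices, not a scoring method prescribed by the research.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    risk_reduction: int  # assumed scale: 1 (low) to 5 (high)
    effort: int          # assumed scale: 1 (low) to 5 (high)
    critical: bool = False  # a critical need overrides the quick-win ranking


def onboarding_order(use_cases):
    """Critical needs first, then highest risk reduction per unit of effort."""
    return sorted(
        use_cases,
        key=lambda u: (not u.critical, -u.risk_reduction / u.effort, u.name),
    )


backlog = [
    UseCase("Data exfiltration via web proxy", risk_reduction=5, effort=4),
    UseCase("Brute-force authentication", risk_reduction=4, effort=1),
    UseCase("Privileged account misuse", risk_reduction=5, effort=3, critical=True),
    UseCase("Malware beaconing", risk_reduction=3, effort=2),
]

for position, use_case in enumerate(onboarding_order(backlog), start=1):
    print(position, use_case.name)
```

Once the team has gained experience, adjacent entries in the sorted list that share data sources are natural candidates for being batched into the same sprint.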

Figure 2. Output-Driven SIEM

From “Security Information and Event Management Architecture and Operational Processes”

Source: Gartner (March 2018)

Nested Phasing
Beyond implementing a use-case-based approach to evolve and expand the SIEM deployment, real-world users will often find that additional organization units, network environments or even geographies have to be included in the future security monitoring scope. There are two ways to approach this. You can apply the phased, output-driven approach to each new environment, implementing each use case until all have been applied before moving on to the next environment, essentially nesting the phases. You can also begin managing correlation rule sets as policies to facilitate a more rapid rate of scope expansion. The effectiveness of this will depend on how standardized your environments are. If a correlation rule set is applied to an environment and a large percentage of the rules need extensive modification, then the phased use-case-by-use-case approach may be better-suited, even if just individually enabling the associated rules.
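The choice between the two expansion paths can be framed as a simple check on how much of a rule set survives contact with a new environment. This is only a sketch: the 30% threshold and the function name are hypothetical, and each organization would need to choose its own cut-off.

```python
def rollout_strategy(rules_needing_rework: int, total_rules: int,
                     threshold: float = 0.3) -> str:
    """Suggest how to expand SIEM scope into a new environment.

    threshold is an assumed cut-off: the share of rules that may need
    extensive modification before policy-style rollout stops paying off.
    """
    if total_rules <= 0:
        raise ValueError("total_rules must be positive")
    if not 0 <= rules_needing_rework <= total_rules:
        raise ValueError("rules_needing_rework must be between 0 and total_rules")
    share = rules_needing_rework / total_rules
    if share > threshold:
        return "use-case-by-use-case"
    return "rule-set-as-policy"


# A standardized branch office versus a heavily customized acquired subsidiary.
print(rollout_strategy(rules_needing_rework=4, total_rules=40))
print(rollout_strategy(rules_needing_rework=25, total_rules=40))
```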

Refine Your Monitoring Policy

Logging policies implemented to monitor event sources in support of general IT systems will often be very different from those required for effective security monitoring. After the use cases have been developed, a better understanding will emerge of the data sources that need to be collected and managed, and of the level of detail required for logs and events from these sources (also known as the “verbosity level”). Security and risk management leaders need to work closely with LOB owners and IT systems operators (as well as the team responsible for threat intelligence, regarding newer threats) to assess the current logging level and apply audit-level changes to in-scope event sources. This enables the generation of events relevant to security use cases. It is important to note that attempting to send all of your logs to the SIEM solution at once, and hoping to be able to clean up later, is one of the main causes of SIEM solution deployment failure. Doing so drastically increases the cost to maintain the solution, while reducing its effectiveness in detecting threats and delivering mandated compliance outcomes. Unfortunately, no universal “habitable zone” exists that can be adopted as a default: sending too many logs to the SIEM solution is just as bad as not sending enough. Instead, align your SIEM solution to the specific use cases, which should already be prioritized according to the organization’s needs, then instantiate a logging regime that meets those needs. Part of this activity requires an assessment of the logging policies of the required log sources to verify whether necessary security events are being generated and, if not, to identify the required changes to allow that to happen (see “How to Develop and Maintain Security Monitoring Use Cases”).
Enabling extensive logging and auditing on any system can affect the performance of that system and of your SIEM solution. This needs to be carefully assessed. In some cases, the infrastructure may simply be unable to cope with the additional load without detrimental effect on hosted services or applications. For example, the network can be overloaded with traffic from too many sources logging at too high a verbosity level to the CLM and/or SIEM. Databases are especially susceptible due to the large amount of resulting disk activity. They can benefit from prioritizing the categories of events and/or alternate approaches to getting the required information, such as third-party database audit and protection (DAP) tools, workstations, and endpoint detection and response (EDR) solutions. Also, increasing logging horizontally (for example, by adding more context to each event, such as usernames or the original client address from X-Forwarded-For [XFF] headers) can have more advantages than adding extra events.
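The logging-policy assessment described in this section amounts to comparing what each use case needs with what each source currently emits. The sketch below shows that comparison with set arithmetic; the use-case names, source names and event types are hypothetical placeholders, and a real assessment would draw them from the audit configuration of each system.

```python
# Hypothetical requirements: use case -> log source -> event types needed.
REQUIRED_EVENTS = {
    "account_compromise": {
        "active_directory": {"logon_failure", "logon_success", "account_lockout"},
        "vpn": {"session_start"},
    },
    "malware_callback": {
        "web_proxy": {"request_blocked", "request_allowed"},
    },
}

# Hypothetical current state: log source -> event types it generates today.
CURRENTLY_LOGGED = {
    "active_directory": {"logon_success"},
    "web_proxy": {"request_blocked", "request_allowed"},
}


def logging_gaps(required, current):
    """Return {source: missing event types} across all use cases."""
    gaps = {}
    for sources in required.values():
        for source, events in sources.items():
            missing = events - current.get(source, set())
            if missing:
                gaps.setdefault(source, set()).update(missing)
    return gaps


gaps = logging_gaps(REQUIRED_EVENTS, CURRENTLY_LOGGED)
for source in sorted(gaps):
    print(source, "needs:", ", ".join(sorted(gaps[source])))
```

The result is a list of audit-level changes to discuss with LOB owners and system operators; anything a source logs that no use case requires is equally worth reviewing before it is forwarded to the SIEM.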

Follow Log Source Sequence

Use cases should determine security data collection decisions. For example, if you are trying to detect account compromise by monitoring authentication events, then Active Directory (AD) logs should be collected. The sequence of log source integration should be determined on the basis of importance and feasibility. Establishing some quick wins can build credibility and momentum for the monitoring program, and provide valuable experience to enable you to tune the SIEM deployment. Furthermore, doing so can give you the experience and trust in the solution itself to create more use cases. It is important to note that not every log source type will be relevant to every organization; the sources collected should support the desired use cases. Based on these criteria, we can list the log source integration sequence that we generally see implemented by most organizations (the list below does not imply any specific order or priority):
  • Network firewalls
  • Intrusion detection system (IDS)/intrusion prevention system (IPS) devices
  • Network sandboxing
  • Network and host data loss prevention (DLP) solutions
  • Web proxy logs
  • Authentication server logs, such as Windows Active Directory and virtual private network (VPN) access logs
  • Internal DNS server logs
  • Server activity, such as UNIX and/or Windows
  • Cloud service application programming interfaces (APIs)
  • Endpoint security logs (such as antivirus and host IPS)
  • Web server and web application logs
  • NetFlow data
  • Database logs
  • Application logs
Although this exact log source integration sequence should not be construed as a best practice or as a recommendation, it is something Gartner still observes with its more successful clients, and can be used as a rough guide. It should also be noted that this sequence could be used for a subset of some event source types (for example, critical web servers and critical database instances first, followed by moderate-criticality resources). For more information on logging policies, see “Security Information and Event Management Architecture and Operational Processes.”

Use a Deployment Architecture

There are several broad architecture choices that can be used for an enterprise SIEM solution deployment, such as:
  • A single SIEM solution with all components, including log collection and retention, contained within the same system (“appliance SIEM” architecture, suitable for small organizations)
  • A single SIEM solution with collection components distributed across the organization and centralized log storage and analysis (“single SIEM” architecture)
  • Multiple SIEM solutions, one for each region or each data center (“multiple SIEM” architecture, where individual systems may or may not be connected)
  • A federated SIEM solution, in which a higher-tier SIEM system gathers select information from individual SIEM deployments at lower tiers (“federated SIEM” architecture)
  • A SIEM solution delivered as a service (simplifies and reduces the time to implement, administer, maintain and scale, while reducing licensing complexity — see “Innovation Insight for SIEM as a Service”)
  • A SIEM solution hosted in a service provider’s data center or public cloud service (PCS), such as Amazon Web Services or Microsoft Azure
  • A SIEM solution delivered from the cloud as a software as a service (SaaS) application
How the deployment will need to be managed and maintained, and how to expand the scope going forward, will be the indirect result of the choice of architecture, whether deployed on-premises or in the cloud. Also, each architecture type poses slightly different challenges in regard to redundancy and backup requirements, and considerations such as data governance or cross-border privacy issues.
Since SIEM solution architectures vary, some vendors provide collectors that only forward or buffer data, while others provide horizontally and vertically scalable distributed collection, processing and analysis capabilities.
In a SIEM implementation, log management systems (or components) may be deployed in different locations for on-premises deployments. The purposes of log collectors are to:
  • Feed SIEM solutions and log management solutions.
  • Feed into log management solutions, which then send filtered sets into SIEM solutions.
  • Feed SIEM solutions with unfiltered sets of data that are then streamed into log management solutions.
On the other hand, if deployed in the cloud, the organization accepts storing its data either in the SaaS SIEM vendor’s cloud or on a collector controlled by the vendor. In this case, the customer needs to keep in mind that if access to logs requires API keys, or requires credentials on particular systems, then these keys and credentials need to be shared and stored outside of their premises.
Ideally, log collectors are placed closest to the data they need to collect. Organizations should plan at least one collector per site, with distributed storage locations for raw data in the largest environments. In some instances, a security data lake can be implemented, in which a log collection and management tier leverages commercial and open-source log tools, as well as big data platforms. This typically occurs in organizations that already have those technologies deployed and have internal expertise available to the security team. A distributed architecture also requires a secure communication channel — for organizations leveraging the internet — to ensure that the data confidentiality, integrity and availability attributes are respected (usually requires encryption, and store and forward mechanisms). Lastly, the performance (or connectivity for cloud deployments) has to be considered, as well. A vertical architecture with too many tiers can introduce delays that will make real-time monitoring difficult.
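The store-and-forward mechanism mentioned above can be illustrated with a small in-memory buffer. This is a simplified sketch under stated assumptions: `send` is a hypothetical callable that returns `True` when the upstream tier accepts an event, encryption of the channel is out of scope, and a production collector would persist its buffer to disk rather than hold it in memory.

```python
from collections import deque


class StoreAndForwardCollector:
    """Buffers events locally and forwards them in order when the link allows."""

    def __init__(self, send, max_buffered=10_000):
        self._send = send
        # When the buffer is full, deque(maxlen=...) silently drops the
        # oldest event; sizing the buffer is therefore a design decision.
        self._buffer = deque(maxlen=max_buffered)

    def __len__(self):
        return len(self._buffer)

    def collect(self, event):
        self._buffer.append(event)
        self.flush()

    def flush(self):
        """Forward buffered events oldest-first; stop at the first failure."""
        delivered = 0
        while self._buffer:
            if not self._send(self._buffer[0]):
                break
            self._buffer.popleft()
            delivered += 1
        return delivered


# Simulated upstream link that is down at first and then recovers.
link = {"up": False}
received = []


def send(event):
    if link["up"]:
        received.append(event)
    return link["up"]


collector = StoreAndForwardCollector(send)
collector.collect("event-1")
collector.collect("event-2")  # link is down: both events stay buffered
link["up"] = True
collector.collect("event-3")  # link is back: buffer drains in order
print(received, len(collector))
```

An event is removed from the buffer only after `send` reports success, so a failure mid-flush leaves the remaining events queued for the next attempt.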
Single SIEM systems will often suffer from performance issues when high ingestion rates, analysis and reporting are done concurrently. SIEM systems are mainly designed for ingestion reliability at the cost of processing. This type of architecture also lacks redundancy and requires an external backup facility, which is a valid concern, especially if used for regulatory compliance purposes. An alternative, or more importantly a best practice, would be running a production/development system as a failover. In other words, have a smaller production/development system in place that doesn’t take large log feeds, and can test the log feeds that have been ingested without affecting the actual SIEM solution. However, if deploying in the cloud, compensate for cloud-related issues by designing the architecture and associated processes to include additional resilience factors, such as connectivity (see “Selecting and Deploying SaaS SIEM for Security Monitoring”).

Context Data Source Integration

Beyond event and log data, SIEM systems also ingest contextual data, such as:
  • User context from identity and access management (IAM) solutions
  • Asset context from configuration management databases (CMDBs) or asset inventory solutions
  • Threat context from threat intelligence platforms (TIPs) or services
  • Vulnerability context from vulnerability management/assessments
There are specific use cases that may require these, but these sources also provide general context for event data (for example, by adding user context to a specific event).
These may have different functional requirements depending on the integration mechanism (for example, deep integration via bidirectional APIs or even just a flat-file import). Connectivity between the SIEM system and the contextual data source must be considered, as well as organizational aspects, such as including context that identifies the LOB owners and/or system operators (see “How to Develop and Maintain Security Monitoring Use Cases”).
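As an illustration of adding context to an event, the sketch below joins a raw event against two lookup tables, such as might be loaded from a flat-file export. The field names (`user`, `dest_ip`) and the contents of `USER_CONTEXT` and `ASSET_CONTEXT` are hypothetical; a real integration would populate them from the IAM and CMDB sources through whatever mechanism the SIEM supports.

```python
# Hypothetical context tables, for example loaded from flat-file exports.
USER_CONTEXT = {
    "jdoe": {"department": "Finance", "status": "terminated"},
}
ASSET_CONTEXT = {
    "10.0.5.20": {"hostname": "erp-db-01", "criticality": "high",
                  "owner": "Finance IT"},
}


def enrich(event: dict) -> dict:
    """Return a copy of the event with user and asset context attached.

    Unknown users or assets get an empty context rather than an error,
    so missing context never blocks event processing.
    """
    enriched = dict(event)
    enriched["user_context"] = USER_CONTEXT.get(event.get("user"), {})
    enriched["asset_context"] = ASSET_CONTEXT.get(event.get("dest_ip"), {})
    return enriched


raw_event = {"event_type": "logon_success", "user": "jdoe", "dest_ip": "10.0.5.20"}
enriched_event = enrich(raw_event)
print(enriched_event["user_context"].get("status"),
      enriched_event["asset_context"].get("criticality"))
```

With context attached, a routine successful logon becomes notable: the account belongs to a former employee and the target is a high-criticality asset with a named owner to contact.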
Gartner defines several types of context that are useful for security monitoring (see Table 1).

Table 1: Sources of Context Data

Context Type | Typical Source
User context | IAM systems and directory services, and other user repository information, such as human resources information (for example, employees who have joined and left the organization)
Asset context | Asset management systems, directory services and internal SIEM asset subsystem
Vulnerability context | Vulnerability assessment (VA) tools, dynamic application security testing (DAST) and static application security testing (SAST) tools
Threat context | Tactical threat intelligence (TI) feeds that present lists of “bad” entities, such as Internet Protocol (IP) addresses, domain name system (DNS) names, URLs and file hash sums
Configuration context | CMDB and VA tools with security configuration assessment capability
Data context | DAP and DLP tools and data management systems
External context | Public and private TI feeds and social media monitoring
Application context | Infrastructure and business applications, and DAST and SAST tools
Business context | Business unit managers, personnel and business applications, and integrated risk management (IRM) tools
Location and physical context | GPS sensors built into systems, network location data and physical access control systems
Source: Gartner (March 2018)

SIEM Integration Considerations

When planning for a SIEM deployment, organizations should also consider how users will interact with the SIEM, and where and how the SIEM integrates with the overall organization security workflows and processes. Specifically:
  • What is the organization’s overall workflow for end-to-end security monitoring and remediation?
  • How are the SIEM alerts managed? Is the triage, investigation and remediation done manually by people, or are the alerts fed into other tools?
  • If the level of analytics of the SIEM is not sufficient, are the SIEM alerts and event context dynamically fed into User and Entity Behavior Analytics (UEBA) systems (see “Market Guide for User and Entity Behavior Analytics”)?
  • If the alerts need to be triaged, and extensive real-time context needs to be applied for this investigation, does the SIEM integrate with Security Orchestration, Automation and Response (SOAR) tools (see “Innovation Insight for Security Orchestration, Automation and Response”)?
  • Does the SIEM need to integrate with the organization’s case management or IT Service Management (ITSM) corporate solutions?

SIEM Architecture Dimensions

The following attributes may affect SIEM architecture choices and need to be assessed and considered:
  • Number of log sources
  • Volume of logged data to be collected and analyzed
  • Types of collection mechanisms utilized
  • Specific set of use cases (across all phases of the project)
  • Network topology
  • Available bandwidth
  • Regulatory compliance issues, including log retention period mandates
  • Log retention locations, both physically and logically (for example, physically stored in a country, but by an outsourcer), as well as log retention duration
  • Data governance, cross-border privacy issues and legal limitations
Choosing which data to selectively forward in a distributed environment (for example, to facilitate centralized organization-wide threat monitoring) will be a key decision. The output-driven SIEM approach, based on use cases, will provide a framework for this, as well. Generally, log data not immediately required for a specific use case should not be forwarded. It will not provide any immediate security benefit, but it will add noise and impact performance. The organization can still access the raw log data itself in its original form, if required for investigative purposes. The “let’s collect it just in case, you never know” approach needs to be balanced with pragmatic cost, efficiency and scalability considerations.
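Several of the attributes listed above (log volume, retention period and storage) are linked by simple arithmetic that is worth working through before choosing an architecture. The figures in this sketch (2,000 events per second, 500 bytes per event, 365 days of retention, 8:1 compression) are assumed values for illustration only; real inputs should come from measurements of the in-scope sources.

```python
SECONDS_PER_DAY = 86_400


def daily_volume_gb(events_per_second: float, avg_event_bytes: float) -> float:
    """Raw daily log volume in gigabytes (1 GB = 10**9 bytes)."""
    return events_per_second * avg_event_bytes * SECONDS_PER_DAY / 1e9


def retention_storage_tb(events_per_second: float, avg_event_bytes: float,
                         retention_days: int,
                         compression_ratio: float = 1.0) -> float:
    """Storage needed for the retention period in terabytes.

    compression_ratio is an assumed figure (8.0 means 8:1); use 1.0 when
    the storage tier does not compress.
    """
    raw_gb = daily_volume_gb(events_per_second, avg_event_bytes) * retention_days
    return raw_gb / compression_ratio / 1000


per_day = daily_volume_gb(2_000, 500)
one_year_raw = retention_storage_tb(2_000, 500, retention_days=365)
one_year_compressed = retention_storage_tb(2_000, 500, retention_days=365,
                                           compression_ratio=8.0)
print(f"{per_day:.1f} GB/day, {one_year_raw:.2f} TB raw, "
      f"{one_year_compressed:.2f} TB compressed")
```

Running the same calculation separately for the subset of events forwarded to the SIEM and for the full set retained in log management shows how much selective forwarding reduces the analytics tier's footprint.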
Table 2 provides an outline to create a high-level plan for a SIEM architecture.

Table 2: SIEM Architecture Dimensions

Architecture Dimension | Description | SIEM Architecture Impact
Goals | What problems will a SIEM solution solve? | A high-level list of problems that a SIEM solution is purchased and deployed to solve has a primary architecture impact.
Use cases | Why is the system here, and how is it used (depends on goals)? | Specific use cases for initial and later phases affect where and how different components are deployed.
Users | Who would be using the system? | The number of users of the system affects information presentation architecture.
Collection scope | What data will be collected (depends on use cases)? | The scope of data collection, itself defined by the use cases, affects the number, size and distribution of collection components, and determines which data should be collected for alerting versus reporting.
Event source topology | What data sources are located at various remote sites? | The location of different types of log sources at different data centers and remote offices affects the number and placement of collectors and distributed storage.
Retention scope | What data is stored, and for how long (depends on use cases)? | Log retention scope defines what types of data are stored and where and how that data is stored.
Organization’s size | What is the overall size and complexity of organizational networks? | The size of an organization, and specifically the size of its IT environment, affects scalability requirements.
Organization’s distributed nature | Is the organization globally distributed? | The distributed nature of an organization affects not just collection architecture, but also retention and data analysis of a SIEM product.
Organization’s IT approach | Will there be somebody using a tool in each region or just in a central location? | A centralized or distributed approach to IT maps to SIEM architecture and affects a choice of single or federated SIEM deployment.
Integration with upstream systems | What systems will consume SIEM output? | SIEM data may flow to a UEBA system, SOAR tool or an IRM tool, and the methods for data selection and transfer need to be decided on and implemented or configured.
False positive reduction | Does the organization have processes to adequately understand how its content is performing? | Continuous feedback can help with reducing false positives. Data that is aligned to the SIEM solution’s performance should be analyzed annually. The processes in place to reduce false positives and remove inefficient rule sets should be carried out on a regular basis.
Context data integration | What additional data will need to be collected to enable the required analysis? | SIEM may collect all the right logs, but it may need links to identity management, asset management and other enterprise systems. Such links need to be defined and implemented.
Source: Gartner (March 2018)

Additional Architecture Questions to Be Answered Before the Deployment Stage

Organizations should be asking themselves additional questions before deploying their SIEM solution, such as:
  • Where should we use agents versus agentless collection of log and context data (if there is a choice)?
  • What collector form factor (appliance, software or virtual image) should we use? How many collectors? Which collector types?
  • Can we use independent collection and routing technologies, such as Apache Kafka and NiFi?
  • How do we decide which log sources go into which collector if there are many collectors per site?
  • How do we deal with super-high-volume and super-low-volume log sources?
  • How do we architect log collection around network architecture boundaries, such as zones and access control lists (ACLs)? Specifically, how do we run demilitarized zone (DMZ) log collection?
  • Is there a separate audit zone in network security architecture?
  • Can correlation be distributed? Tiered? How will the log data be routed to different correlation engines, and where should each rule run?
  • How can storage be distributed across sites?
  • What is being stored: structured data, unstructured data or both? Can many data stores of structured or unstructured data be queried from one place?
  • What do we do when any particular component becomes oversubscribed?
  • How will redundancy, availability and recovery be architected? How will the data be backed up? How fast can a reserve instance be brought online?
  • Is there data caching in lower-reliability areas of the network to ensure data will not be lost?
  • What network architecture constraints (such as connectivity and link bandwidth) are in place, and how do we work around them for log data transport? How is log transport redundancy architected given the above constraints?
  • How, and from where, can you get NetFlow data?
  • How do we deal with other external constraints on architecture: firewall rules, security policy, available servers and user population?