Comprehensive Explanation: What is a SIEM (in 2020 and beyond.)
[I have not had the time to proof read nor correct grammatical errors, spelling mistakes and typos. ]
SIEM unifies Threat Detection and Hunting.
This is an old topic worth revisiting and level-setting with the latest advancements, concepts and lessons from decades of unsuccessful SIEM deployments! A lot of people don’t understand the value of a SIEM, and even fewer understand how to operationalise one effectively and achieve business outcomes with it.
After reading this you will gain enough insight into the basics of SIEM.
I am continually asked the same questions around SIEM design, so I am glad to finally brain dump this knowledge and share it with the community.
(SIEM in Public Cloud is beyond the scope of this article. While all the information here is relevant, I will write another article focusing specifically on Threat Detection for Public Cloud environments.)
Security Information and Event Management
A SIEM seeks to provide a holistic approach to an organisation’s IT security. A SIEM represents a combination of services, appliances, and software products. It performs real-time collection of log data from devices, applications and hosts. It also processes the collected log data, enabling real-time analysis of security alerts generated by network hardware and applications, advanced correlation of security and operational events, as well as real-time alarming and scheduled reporting.
SIEM technology is used in many enterprise organizations to provide real time reporting and long term analysis of security events. SIEM products evolved from two previously distinct product categories, namely security information management (SIM) and security event management (SEM).
Table 1 shows this evolution.
Table 1. SIM and SEM Product Features Incorporated into SIEM
Separate SIM and SEM Products
Security Information Management:
Log collection, archiving, historical reporting, forensics
Security Event Management:
Real time reporting, log collection, normalization, correlation, aggregation
Combined SIEM Product
SIEM combines the essential functions of SIM and SEM products to provide a comprehensive view of the enterprise network using the following functions:
- Log collection of event records from sources throughout the organization provides important forensic tools and helps to address compliance reporting requirements.
- Normalization maps log messages from different systems into a common data model, enabling the organization to connect and analyze related events, even if they are initially logged in different source formats.
- Correlation links logs and events from disparate systems or applications, speeding detection of and reaction to security threats.
- Aggregation reduces the volume of event data by consolidating duplicate event records.
- Reporting presents the correlated aggregated event data in real-time monitoring and long-term summaries.
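To make normalization and aggregation a little more concrete, here is a minimal Python sketch. The record layouts, the field names and the `login_failure` label are all assumptions made up for illustration; a real SIEM uses a much richer data model. The only real-world details are Windows Event ID 4625 (failed logon) and the sshd "Failed password" message.

```python
from collections import Counter

# Hypothetical raw records: two sources logging the same kind of event differently.
RAW_RECORDS = [
    {"source": "linux", "user": "alice", "msg": "Failed password"},
    {"source": "windows", "TargetUserName": "alice", "EventID": 4625},
    {"source": "linux", "user": "alice", "msg": "Failed password"},
]


def normalise(record: dict) -> tuple:
    """Map a source-specific record onto a common (action, user) model."""
    if record["source"] == "windows":
        action = "login_failure" if record["EventID"] == 4625 else "other"
        return (action, record["TargetUserName"])
    action = "login_failure" if "Failed password" in record["msg"] else "other"
    return (action, record["user"])


def aggregate(records: list) -> Counter:
    """Consolidate duplicate normalised events into one entry with a count."""
    return Counter(normalise(record) for record in records)


summary = aggregate(RAW_RECORDS)
print(summary)
```

Once both sources are mapped to the same model, the three raw records collapse into a single entry with a count, which is the volume reduction the aggregation bullet describes.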
An internal IT environment consists of services, networking equipment, applications and components that the organisation wants to protect from intrusion. To protect these assets and data, you can deploy protection in the form of firewalls, antivirus, IPS/IDS and authentication. Other protection examples include:
- Web Security
- Email Security
- Traffic Capture
- Secure Access Service Edge
Despite all of the systems and effort put into these solutions, those trying to breach that environment will get in. Once they are in, detecting and responding to their attack is time critical.
A SIEM receives or taps into all of this activity, as it is continually receiving thousands of logs per second from all of the devices and systems within the environment. The SIEM processes log data to make sense of what is actually happening on a device (aka Detection), and analytics are used to analyse activity in the data, providing more insight into what is actually happening.
SIEM solutions also provide the ability to analyse historical log data and generate reports for compliance purposes, as well as supporting digital forensics and fulfilling additional parts of the overall information security strategy.
SIEM solutions centralise log data within IT environments, augmenting security measures and enabling real-time analysis. A SIEM is constantly watching, monitoring and analysing events and alerts within the environment in an effort to detect attacks and intrusions.
Fourth Wave of SIEM
SIEMs sometimes get a bad name because they are incredibly powerful yet take an enormous amount of skill and effort to get working. That is not because of the SIEM itself: it requires data from all of your IT environment, and collecting that data is what causes massive delays in successful SIEM deployments. (This can be easily solved. Keep reading.) SIEM has evolved into very mature platforms, e.g. ArcSight, with 20+ years of evolution. Read ArcSight History here
- First Wave
- PCI-DSS really drove the first phase of SIEM deployment, with compliance as the business outcome.
- Second Wave
- Then people started to detect bad things in network activity.
- Third Wave
- This phase was when customers started to build SOCs.
- Fourth Wave
- This is about SOCs developing Threat Hunting utilising NDR, EDR, SIEM and SOAR
SIEM processes all types of machine data produced by devices in an IT environment.
Machine data is one of the most underused and undervalued assets of any organization. But some of the most important insights that you can gain—across IT and the business—are hidden in this data: where things went wrong, how to optimize the customer experience, the fingerprints of fraud. All of these insights can be found in the machine data that’s generated by the normal operations of your organization.
Machine data is valuable because it contains a definitive record of all the activity and behavior of your customers, users, transactions, applications, servers, networks and mobile devices. It includes configurations, data from APIs, message queues, change events, the output of diagnostic commands, call detail records and sensor data from industrial systems, and more.
The challenge with leveraging machine data is that it comes in a dizzying array of unpredictable formats, and traditional monitoring and analysis tools weren’t designed for the variety, velocity, volume or variability of this data.
In computing, syslog /ˈsɪslɒɡ/ is a standard for message logging. It allows separation of the software that generates messages, the system that stores them, and the software that reports and analyzes them. Each message is labeled with a facility code, indicating the software type generating the message, and assigned a severity level.
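As a small illustration of the facility code and severity level, the sketch below decodes the priority (PRI) value that prefixes a syslog message. In both RFC 3164 and RFC 5424 the PRI is the facility multiplied by 8 plus the severity. The function names are my own, not from any library.

```python
import re

# Severity names in RFC order, index 0 (most severe) to 7 (least severe).
SEVERITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]


def decode_pri(pri: int) -> tuple:
    """Split a PRI value into (facility number, severity name)."""
    if not 0 <= pri <= 191:  # 23 facilities * 8 + 7 is the largest valid PRI
        raise ValueError("PRI must be between 0 and 191")
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity]


def pri_from_message(message: str) -> int:
    """Extract the numeric PRI from the '<n>' prefix of a syslog message."""
    match = re.match(r"<(\d{1,3})>", message)
    if match is None:
        raise ValueError("message has no <PRI> prefix")
    return int(match.group(1))


sample = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
print(decode_pri(pri_from_message(sample)))
```

The sample line is the example message from RFC 3164; PRI 34 decodes to facility 4 (security/authorization) with severity critical.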
The syslog protocol, defined in RFC 3164, provides a transport to allow a device to send event notification messages across IP networks to event message collectors, also known as syslog servers. The protocol is simply designed to transport these event messages from the generating device to the collector. The collector doesn’t send back an acknowledgment of the receipt of the messages.
Syslog uses the User Datagram Protocol (UDP), port 514, for communication. Being a connectionless protocol, UDP does not provide acknowledgments. Additionally, at the application layer, syslog servers do not send acknowledgments back to the sender for receipt of syslog messages. Consequently, the sending device generates syslog messages without knowing whether the syslog server has received the messages. In fact, the sending devices send messages even if the syslog server does not exist.
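The fire-and-forget behaviour is easy to see in code. This is a minimal sketch using only Python's standard `socket` module; the function name and the default host are my own assumptions, and the 1024-byte cap follows the RFC 3164 packet limit.

```python
import socket


def send_syslog_udp(message: str, host: str = "127.0.0.1", port: int = 514) -> int:
    """Send one syslog datagram and return the number of bytes handed to the OS.

    A successful return says nothing about delivery: UDP has no
    acknowledgment, so the call succeeds even when no collector is listening.
    """
    # Crude byte-level truncation to the RFC 3164 limit; a production sender
    # would avoid cutting a multi-byte character in half.
    payload = message.encode("utf-8")[:1024]
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


# Example (not executed here):
# send_syslog_udp("<34>Oct 11 22:14:15 mymachine su: 'su root' failed")
```

This is why collectors that matter for compliance are often fed over TCP or TLS (RFC 6587, RFC 5425) instead.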
The syslog packet size is limited to 1024 bytes and carries the following information: facility, severity, hostname, timestamp and message.
Computer system designers may use syslog for system management and security auditing as well as general informational, analysis, and debugging messages. A wide variety of devices, such as printers, routers, and message receivers across many platforms use the syslog standard. This permits the consolidation of logging data from different types of systems in a central repository. Implementations of syslog exist for many operating systems.
When operating over a network, syslog uses a client-server architecture where a syslog server listens for and logs messages coming from clients.
The Syslog protocol is defined by Request for Comments (RFC) documents published by the Internet Engineering Task Force (Internet standards). The following is a list of RFCs that define the syslog protocol:
- The BSD syslog Protocol. RFC3164. (obsoleted by The Syslog Protocol. RFC5424.)
- Reliable Delivery for syslog. RFC3195.
- The Syslog Protocol. RFC5424.
- TLS Transport Mapping for Syslog. RFC5425.
- Transmission of Syslog Messages over UDP. RFC5426.
- Textual Conventions for Syslog Management. RFC5427.
- Signed Syslog Messages. RFC5848.
- Datagram Transport Layer Security (DTLS) Transport Mapping for Syslog. RFC6012.
- Transmission of Syslog Messages over TCP. RFC6587.
More reading on Syslog;
- An Overview of the syslog Protocol
- Machine Data
Log collection and monitoring, typically delivered through a SIEM, is effectively mandatory for compliance audits such as PCI-DSS, ISO 27001, the Sarbanes–Oxley Act of 2002 (thanks, Enron) and other standards.
The Payment Card Industry (PCI) Security Standards Council was founded by five global payment brands: American Express, Discover Financial Services, JCB International, MasterCard, and Visa. These five payment brands had a common vision of strengthening security policies across the industry to prevent data breaches for businesses that accept and process payment cards. Together they drafted and released the first version of PCI Data Security Standard (PCI DSS 1.0) on December 15, 2004.
PCI DSS is a regulation with twelve requirements that serve as a security baseline to secure payment card data.
- PCI-DSS v 3.2.1 Requirements;
- Requirement 10: Track and monitor all access to network resources and cardholder data.
- Requirement 11.5: Deploy a change detection mechanism (for example, file integrity monitoring tools) to alert personnel to unauthorized modification (including changes, additions, and deletions) of critical system files, configuration files or content files. Configure the software to perform critical file comparisons at least weekly. Implement a process to respond to any alerts generated by the change-detection solution.
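The change detection idea behind Requirement 11.5 can be sketched in a few lines of Python: hash the critical files, keep the result as a baseline, and compare a later snapshot against it. This is only an illustration under my own assumptions (the function names and the watched paths are hypothetical), not a replacement for a real file integrity monitoring product.

```python
import hashlib
from pathlib import Path


def snapshot(paths) -> dict:
    """Return {path: SHA-256 hex digest} for every watched file that exists."""
    return {
        str(path): hashlib.sha256(Path(path).read_bytes()).hexdigest()
        for path in paths
        if Path(path).is_file()
    }


def compare(baseline: dict, current: dict) -> dict:
    """Report files added, deleted or changed since the baseline snapshot."""
    return {
        "added": sorted(set(current) - set(baseline)),
        "deleted": sorted(set(baseline) - set(current)),
        "changed": sorted(
            path for path in baseline.keys() & current.keys()
            if baseline[path] != current[path]
        ),
    }


# Example (hypothetical watch list):
# baseline = snapshot(["/etc/passwd", "/etc/ssh/sshd_config"])
# ... a week later ...
# print(compare(baseline, snapshot(["/etc/passwd", "/etc/ssh/sshd_config"])))
```

Any non-empty list in the result is what would be forwarded to the SIEM as an alert for someone to respond to.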
Depending on your PCI-DSS merchant level and number of Credit Card transactions you process, you will need to adhere to different levels of PCI-Auditing.
Cyber Threat Intelligence
Threat intelligence, or cyber threat intelligence, is information an organization uses to understand the threats that have targeted, will target, or are currently targeting the organization. This information is used to prepare for, prevent, and identify cyber threats looking to take advantage of valuable resources.
Cyber Threat Intelligence consists of many kinds of information, including Indicators of Compromise and Indicators of Attack.
Indicators of compromise (IOCs) are “pieces of forensic data, such as data found in system log entries or files, that identify potentially malicious activity on a system or network.” Indicators of compromise aid information security and IT professionals in detecting data breaches, malware infections, or other threat activity. By monitoring for indicators of compromise, organizations can detect attacks and act quickly to prevent breaches from occurring or limit damages by stopping attacks in earlier stages.
Indicators of compromise act as breadcrumbs that lead infosec and IT pros to detect malicious activity early in the attack sequence. These unusual activities are the red flags that indicate a potential or in-progress attack that could lead to a data breach or systems compromise.
Indicators of attack are similar to IOCs, but rather than focusing on forensic analysis of a compromise that has already taken place, indicators of attack focus on identifying attacker activity while an attack is in process. Indicators of compromise help answer the question “What happened?” while indicators of attack can help answer questions like “What is happening and why?” A proactive approach to detection uses both IOAs and IOCs to discover security incidents or threats in as close to real time as possible.
Common indicators of compromise include:
- Unusual Outbound Network Traffic
- Anomalies in Privileged User Account Activity
- Geographical Irregularities
- Log-In Red Flags
- Increases in Database Read Volume
- HTML Response Sizes
- Large Numbers of Requests for the Same File
- Mismatched Port-Application Traffic
- Suspicious Registry or System File Changes
- Unusual DNS Requests
- Unexpected Patching of Systems
- Mobile Device Profile Changes
- Bundles of Data in the Wrong Place
- Web Traffic with Unhuman Behavior
- Signs of DDoS Activity
APTs and Tactics, Techniques and Procedures (TTPs)
SIEM can utilise cyber threat intelligence (IoCs, IoAs and TTPs) and correlate it with log data from the IT environment to detect threats both in real time and in historical log data.
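At its simplest, correlating threat intelligence with log data is a lookup of event fields against an indicator list. The sketch below assumes a tiny made-up feed and made-up event field names (`dst_ip`, `domain`); the IP addresses come from the reserved documentation ranges, so none of this is real intelligence.

```python
# Hypothetical indicator feed (documentation-range addresses, example domain).
IOC_IPS = {"203.0.113.7", "198.51.100.23"}
IOC_DOMAINS = {"bad.example"}


def match_iocs(events: list) -> list:
    """Return (event, matched_fields) for every event that hits an indicator."""
    hits = []
    for event in events:
        matched = []
        if event.get("dst_ip") in IOC_IPS:
            matched.append("dst_ip")
        if event.get("domain", "").lower() in IOC_DOMAINS:
            matched.append("domain")
        if matched:
            hits.append((event, matched))
    return hits


events = [
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7"},
    {"src_ip": "10.0.0.6", "dst_ip": "192.0.2.10", "domain": "BAD.example"},
    {"src_ip": "10.0.0.7", "dst_ip": "192.0.2.11", "domain": "intranet.example"},
]
for event, fields in match_iocs(events):
    print(event["src_ip"], "matched on", fields)
```

The same function works on a live stream or on years of archived logs, which is the real-time versus historical point above.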
Correlation Rules, Behaviour patterns, Pattern matching, Anomaly detection, Conditions, Thresholds, Network Modelling and Machine learning (Phew give me a pay rise. )
Correlation is one of the key components of any effective SIEM tool. As information from across your digital environment feeds into a SIEM, it uses correlation to identify any possible issues. It does so by comparing sequences of activity against preset rules, conditions and thresholds. SIEMs allow sophisticated ways to implement risk based rules.
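To show what a rule with a condition and a threshold looks like, here is a minimal sketch of a classic one: alert when one source produces a number of failed logins inside a sliding time window. The event tuple layout, the function name and the default threshold of 5 failures in 60 seconds are assumptions for illustration, not any vendor's rule syntax.

```python
from collections import defaultdict, deque


def failed_login_rule(events, threshold: int = 5, window: int = 60) -> list:
    """Return (source, timestamp) alerts for bursts of failed logins.

    `events` is an iterable of (timestamp_seconds, source, outcome) tuples.
    An alert fires when `threshold` failures from one source fall within
    `window` seconds; the counter for that source then starts again.
    """
    recent = defaultdict(deque)  # source -> timestamps of recent failures
    alerts = []
    for timestamp, source, outcome in sorted(events):
        if outcome != "failure":
            continue
        failures = recent[source]
        failures.append(timestamp)
        # Drop failures that have aged out of the sliding window.
        while timestamp - failures[0] > window:
            failures.popleft()
        if len(failures) >= threshold:
            alerts.append((source, timestamp))
            failures.clear()
    return alerts


burst = [(t, "10.0.0.5", "failure") for t in range(0, 50, 10)]
slow = [(t * 100, "10.0.0.9", "failure") for t in range(5)]
print(failed_login_rule(burst + slow))
```

Five failures in 40 seconds trip the rule for the first source, while the second source, with the same number of failures spread over several minutes, stays under the threshold.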
The latest SIEMs can now implement anomaly detection via machine learning.
All of this is integrated with Threat Intelligence information.
The brains inside a SIEM are based on Correlation Rules, Pattern Matching, Conditions, Thresholds and now Machine Learning via Unsupervised and Supervised Models.
- Correlation Rules
- Pattern Matching
- Supervised Machine Learning
- Unsupervised Machine Learning
- Network Modelling and Risk Scoring
Use case is the term for Threat Detection expressed in business context. It combines value and context in the SIEM platform.
Leading SIEM platforms such as ArcSight have built-in ESM Default Content use cases for 80% of your Threat Detection requirements. There are also 3rd-party use case libraries, including SOCPrime – https://my.socprime.com/en/integrations/ , MITRE ATT&CK® and SIGMA, a generic SIEM rule format. SIGMA Rules
You can catch just about everything with ArcSight Default Content and SIGMA Rules! The rest you need to pay someone like me to workshop and write.
Machine Data Sources
|Data Type||Use Cases||Examples|
|Amazon Web Services||Security & Compliance, IT Operations||Data from AWS can support service monitoring, alarms and dashboards for metrics, and can also track security-relevant activities, such as login and logout events.|
|APM Tool Logs||Security & Compliance, IT Operations||APM tool logs can provide end-to-end measurement of complex, multi-tier applications, and be used to perform post-hoc forensic analytics on security incidents that span multiple systems.|
|Authentication||Security & Compliance, IT Operations, Application Delivery||Authentication data can help identify users that are struggling to log in to applications and provide insight into potentially anomalous behaviors, such as activities from different locations within a specified time period.|
|Firewall||Security & Compliance, IT Operations||Firewall data can provide visibility into blocked traffic in case an application is having communication problems. It can also be used to help identify traffic to malicious and unknown domains.|
|Industrial Control Systems (ICS)||Security & Compliance, Internet of Things, Business Analytics||ICS data provides visibility into the uptime and availability of critical assets, and can play a major role in identifying when these systems have fallen victim to malicious activity.|
|Medical Devices||Security & Compliance, Internet of Things, Business Analytics||Medical device data can support patient monitoring and provide insights to optimize patient care. It can also help identify compromised protected health information.|
|Network Protocols||Security & Compliance, IT Operations||Network protocol data can provide visibility into the network’s role in overall availability and performance of critical services. It’s also an important source for identifying advanced persistent threats.|
|Sensor Data||Security & Compliance, IT Operations, Internet of Things||Sensor data can provide visibility into system performance and support compliance reporting of devices. It can also be used to proactively identify systems that require maintenance.|
|System Logs||Security & Compliance, IT Operations||System logs are key to troubleshooting system problems and can be used to alert security teams to network attacks, a security breach or compromised software.|
|Web Server||Security & Compliance, IT Operations, Business Analytics||Web logs are critical in debugging web application and server problems, and can also be used to detect attacks, such as SQL injections.|
SIEM Data formats
Typical formats supported by SIEM platforms to ingest log data:
Syslog, Secure syslog, SNMP, SMTP, SCP, FTP, flat file, SQL query, Database Reader, JDBC, cloud APIs, REST API, XML, Cisco FireSIGHT and SDEE, Check Point LEA, AWS GuardDuty, CloudWatch, AWS S3, etc.
Common Event Format (CEF)
In the realm of security event management, a myriad of event formats streaming from disparate devices makes for a complex integration. The Common Event Format (CEF) by ArcSight promotes interoperability between various event- or log-generating devices.
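A CEF record is a pipe-delimited header of seven fields followed by an optional extension of `key=value` pairs. The parser below is a minimal sketch of that layout; the function name and the snake_case field names are my own, and it deliberately ignores some escaping rules (such as escaped `=` inside extension values) that a production parser must handle.

```python
import re

HEADER_FIELDS = [
    "version", "device_vendor", "device_product", "device_version",
    "device_event_class_id", "name", "severity",
]


def parse_cef(line: str) -> dict:
    """Parse one CEF record into a dict with header fields and an extension dict."""
    if not line.startswith("CEF:"):
        raise ValueError("not a CEF record")
    # Split on pipes that are not escaped with a backslash; at most 8 parts.
    parts = re.split(r"(?<!\\)\|", line[4:], maxsplit=7)
    if len(parts) < 7:
        raise ValueError("CEF header needs seven pipe-delimited fields")
    event = {key: value.replace("\\|", "|") for key, value in zip(HEADER_FIELDS, parts)}
    extension = parts[7] if len(parts) == 8 else ""
    # Each value runs until the next ' key=' or the end of the line.
    event["extension"] = dict(re.findall(r"(\w+)=(.*?)(?=\s+\w+=|$)", extension))
    return event


record = parse_cef(
    "CEF:0|Security|threatmanager|1.0|100|worm successfully stopped|10|"
    "src=10.0.0.1 dst=2.1.2.2 spt=1232"
)
print(record["name"], record["extension"])
```

Because every vendor fills the same header positions, a rule can ask for `severity` or `src` without caring which device produced the event.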
- ArcSight Common Event Format Implementation
- Time Normalisation
- Ensures timestamps all reflect the same time zone to correlate events from different timezones.
- Time is an important piece for threat detection. Some time zones around the world don’t observe Daylight Savings Time (DST) and some time zones are actually a half hour different than others. In addition to time zone issues, some devices don’t include a time in the log message. A SIEM needs to timestamp a log with a single time zone.
- Data Enrichment (Meta data extracting, tagging and enrichment)
- The SIEM parses each log message, breaks it down into its core components and adds context, e.g. a customer tag.
- Log data is not uniform. Sources follow a standard transport protocol, but the information inside each message is not standardised across log source vendors, so a SIEM has to map each log into a unified threat detection taxonomy and universal schema before it can run mathematical rules.
- Log information needs to be assigned to a common schema so that a [User Log on] message from Unix, Windows, Active Directory, AWS, etc. is always tagged as User Log on, which assists threat detection search rules.
- Threat and Risk Contextualisation
- Evaluate each log and assign a risk-based priority value, e.g. for information from Edge services / DMZ or authentication sources such as Active Directory, DNS information, etc.
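Time normalisation is the easiest of these steps to sketch. The Python below converts timestamps to UTC, including a half-hour offset, and falls back to a per-source default offset when the log carries no time zone at all. The function name and the idea of a configured default offset are my assumptions about how a collector might handle it.

```python
from datetime import datetime, timedelta, timezone


def to_utc(timestamp: str, default_offset_hours: float = 0.0) -> datetime:
    """Parse an ISO 8601 timestamp and return it as an aware UTC datetime.

    If the timestamp has no offset, assume the source logs in the zone
    given by `default_offset_hours` (configured per log source).
    """
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone(timedelta(hours=default_offset_hours)))
    return parsed.astimezone(timezone.utc)


# A source that states its own half-hour offset.
print(to_utc("2020-05-11T10:00:39+05:30").isoformat())
# A source with no offset, configured as UTC+09:30.
print(to_utc("2020-05-11 10:00:39", default_offset_hours=9.5).isoformat())
```

Once every event carries a UTC timestamp, events from different time zones can be ordered and correlated without guessing.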
May 11 10:00:39 scrooge SG_child: [ID 748625 user.info] m:WR-SG-SUMMARY c:X vhost:iscrooge61.seclutions.com:80 (http) GET / => http://bali/ , status:200 , redirection URL: , referer: , mapping:bali , request size: 421 , backend response size: 12960 , audit token:- , time statistics (microseconds): [request total 16617 , allow/deny filters 1290 , backend responsiveness 11845 , response processing 1643 , ICAP reqmod , ICAP respmod ] timestamp: [2012-05-11 10:00:39] [ rid:T6zHJ38AAAEAAAo2BCwAAAMk sid:910e5dd02df49434d0db9b445ebba975 ip:172.18.61.2 ]
Events are collections of syslogs created after processing with Threat Intelligence and/or correlation rules. An Event is an actionable log item sent to human Analysts for further triage, investigation and reporting.
Sizing SIEM solutions
Sizing a SIEM solution begins with a basic list of the devices that you want to monitor. See the example device list collection tool:
|Device||Vendor||Quantity|
|Windows Server (Active Directory)||Microsoft||1|
|Windows Server (DNS)||Microsoft||1|
|Fortinet Firewall (IDS/IPS/VPN)||Fortinet||1|
|Citrix Access Gateway||Citrix||1|
SIEM Sizing (Events Per Second)
Critical to the sizing and design of a SIEM platform is determining the Events Per Second (EPS) produced by the devices you want to monitor.
You need to determine or estimate the following SIEM fundamentals:
- Events Per Second
- Events Per Day
- Online Retention Period and required Storage in GBs
- Retention Period and required Storage in GBs
- Network Bandwidth Peak requirements (GB per second for all devices)
- EPS Peak
- EPS average (Day, Week, Month, etc.)
- Estimated Device Growth over 3 years
- EPS Headroom (Allow 10-30%)
- Recovery Point Objective
- Recovery Time Objective
- Uptime requirement
- Event / Alert Size (512 bytes per event is a rough estimate.)
SIEM Sizing Rosetta Stone
|GB (1 GB = 1,000,000,000 BYTES)||EPS (1 EVENT = 600 BYTES)|
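The two conversion factors in the header above (1 GB = 1,000,000,000 bytes and 1 event = 600 bytes) are enough to convert between EPS and daily volume. This is a back-of-the-envelope sketch; the function names and the optional headroom parameter are my own, and your real average event size will differ.

```python
BYTES_PER_EVENT = 600            # rough average from the table header
BYTES_PER_GB = 1_000_000_000     # decimal gigabyte, as in the table header
SECONDS_PER_DAY = 86_400


def eps_to_gb_per_day(eps: float, headroom: float = 0.0) -> float:
    """Daily log volume in GB for a sustained EPS, with optional headroom (0.2 = 20%)."""
    return eps * (1 + headroom) * SECONDS_PER_DAY * BYTES_PER_EVENT / BYTES_PER_GB


def gb_per_day_to_eps(gb_per_day: float) -> float:
    """Sustained EPS that produces the given daily volume in GB."""
    return gb_per_day * BYTES_PER_GB / (SECONDS_PER_DAY * BYTES_PER_EVENT)


print(round(eps_to_gb_per_day(1000), 2))        # sustained 1,000 EPS
print(round(eps_to_gb_per_day(1000, 0.2), 2))   # same, with 20% headroom
print(round(gb_per_day_to_eps(1), 1))           # EPS behind 1 GB per day
```

At these assumptions, a sustained 1,000 EPS works out to roughly 52 GB per day, and 1 GB per day corresponds to about 19 EPS.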
Storage and Archival are critical for any Security Logging platform
- Raw Event Size
- Normalised Event Size
- Retention Time
- Online Retention Period
- Events Per Day
- Compression Ratio
- GB Storage per day/Retention time.
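Putting the storage factors together, here is a rough sketch of the retention arithmetic. The function name, the 10:1 compression ratio and the 90-day and 365-day retention periods in the example are assumptions picked for illustration; plug in the numbers for your own platform.

```python
def retained_storage_gb(events_per_day: float, event_bytes: float,
                        retention_days: int, compression_ratio: float = 1.0) -> float:
    """Storage in GB needed to keep `retention_days` of events.

    `compression_ratio` is raw size divided by stored size (10.0 means 10:1).
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    raw_bytes = events_per_day * event_bytes * retention_days
    return raw_bytes / compression_ratio / 1_000_000_000


events_per_day = 1000 * 86_400   # a sustained 1,000 EPS
online = retained_storage_gb(events_per_day, 600, retention_days=90, compression_ratio=10)
total = retained_storage_gb(events_per_day, 600, retention_days=365, compression_ratio=10)
print(round(online, 2), round(total, 2))
```

Run it once with the raw event size and once with the normalised event size, since normalised events are usually stored alongside the raw ones and are a different size.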
It is vital to understand the way your SIEM platform receives and processes data: what the schema format is, whether it uses Schema on Read or Schema on Write, whether it uses distributed search or in-memory real-time processing, etc. The last thing you want to do is hoard data without understanding what you are collecting, be scared of getting rid of it, and still not be able to get any value from it. Don’t turn into a data hoarder, because the Finance department will start knocking on your door, and the day will come when you have to provide justification and prove business results. If you ever get breached and can’t even extract useful information after storing tons of data, you might need to find another job.
An overwhelming number of log sources without proper sanitisation and normalisation can lead to a massive amount of useless information in the SIEM, leading to alert fatigue.
False Positives and False Negatives
A false positive state is when the SIEM identifies an activity as an attack but the activity is acceptable behavior. A false positive is a false alarm.
A false negative state is the most serious and dangerous state. This is when the SIEM identifies an activity as acceptable when the activity is actually an attack. That is, a false negative is when the SIEM fails to catch an attack. This is the most dangerous state since the security professional has no idea that an attack took place.
False positives, on the other hand, are an inconvenience at best and can cause significant issues. However, with the right amount of overhead, false positives can be successfully adjudicated; false negatives cannot.
- Airport Security: a “false positive” is when ordinary items such as keys or coins get mistaken for weapons (machine goes “beep”)
- Medical screening: low-cost tests given to a large group can give many false positives (saying you have a disease when you don’t), and then ask you to get more accurate tests.
- Antivirus software: a “false positive” is when a normal file is thought to be a virus
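If you review a sample of activity after the fact, the four possible outcomes are easy to tally. The sketch below assumes each reviewed item is a pair of booleans (did the SIEM alert, was it really an attack); the function name and data layout are hypothetical.

```python
def triage_outcomes(results) -> dict:
    """Count outcomes from (alert_raised, was_attack) pairs."""
    counts = {"true_positive": 0, "false_positive": 0,
              "false_negative": 0, "true_negative": 0}
    for alert_raised, was_attack in results:
        if alert_raised and was_attack:
            counts["true_positive"] += 1
        elif alert_raised:
            counts["false_positive"] += 1   # false alarm
        elif was_attack:
            counts["false_negative"] += 1   # missed attack
        else:
            counts["true_negative"] += 1
    return counts


reviewed = [(True, True), (True, False), (True, False), (False, True), (False, False)]
print(triage_outcomes(reviewed))
```

The catch is that false negatives only show up in this tally if something other than the SIEM (a pen test, an incident review) tells you the attack happened.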
Popular SYSLOG Servers
- ArcSight Logger
Log Sources Categories
- Operating Systems
- Windows Phone
- err no clue
- Policy Devices
- Network Devices
- Public Cloud
SIEM – Real-Time vs Search
As the volume of data keeps increasing, it becomes increasingly difficult for SIEMs and other data analytics platforms to gain critical insights from it. SIEMs need to detect threats in real time and search years of log source archives at the same time. So you are trying to solve two critical problems at once:
- Security Event Management
- Real-Time Streaming Data Analytics
- Security Information Management
- Searching Large Data sets at scale and speed
These two requirements are incredibly difficult to solve at scale. So, lo and behold, Open source to the rescue; Apache Kafka and Apache Hadoop provide solutions for both of these requirements.
A streaming platform has three key capabilities:
- Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
- Store streams of records in a fault-tolerant durable way.
- Process streams of records as they occur.
Kafka is generally used for two broad classes of applications:
- Building real-time streaming data pipelines that reliably get data between systems or applications
- Building real-time streaming applications that transform or react to the streams of data
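To illustrate the publish, store and process capabilities without needing a broker, here is a toy in-memory sketch. To be clear, this is not Kafka's API and the class and method names are invented; it only shows the core idea of an append-only log where each consumer tracks its own read position.

```python
class TopicLog:
    """Toy append-only log with per-consumer offsets (illustration only)."""

    def __init__(self):
        self._records = []   # stored stream of records, never rewritten
        self._offsets = {}   # consumer name -> index of next record to read

    def publish(self, record) -> int:
        """Append a record and return its offset in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer: str, max_records: int = 10) -> list:
        """Return the next unread records for this consumer and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


topic = TopicLog()
for line in ["login failure alice", "login failure alice", "login success alice"]:
    topic.publish(line)

realtime = topic.poll("correlation-engine", max_records=2)   # real-time consumer
archive = topic.poll("archiver")                             # independent consumer
print(realtime, archive)
```

Because records stay in the log, the real-time correlation consumer and the archive consumer can read the same stream at their own pace, which is how one pipeline can feed both the SEM and the SIM side.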
Apache Hadoop (aka Data Lake)
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
Security Orchestration, Automation and Response (SOAR)
This subject is beyond the scope of this article. I will dive into this in the near future.
Leading SIEM Vendor Solutions
- ArcSight Data Platform
- ArcSight almost invented the SIEM industry, with a 20+ year product portfolio, and invented the CEF format for cyber security. It now supports Apache Kafka and Apache Hadoop, and integrates Unsupervised Machine Learning via Vertica, IDOL and Interset.
- Splunk
- While gaining popularity for general-purpose IT monitoring, Splunk does have some capability in Security and Big Data Analytics. Splunk Enterprise is the base solution, with Splunk Enterprise Security, Splunk UBA, Splunk Cloud, Splunk Phantom and the Splunk Machine Learning Toolkit. Splunk uses the Common Information Model.
- IBM QRadar
- Another original SIEM vendor.
- I don’t have any experience with QRadar.
- ELK Security Onion / HELK
- Fastest growing open source search stack. Elastic is a very powerful open source platform and recently acquired Endgame. The ELK stack consists of Elasticsearch, Logstash, Kibana and Beats, with the Elastic Common Schema (ECS).
- McAfee Nitro
- Popular due to McAfee Enterprise license agreements.
- 100% Windows Server based, no Linux edition. Very complex to deploy and requires high resources and application administration. Does have SYSMON, FIM, NETMON, UEBA and SOAR as part of the solution.
- FireEye / Mandiant
- Premium products for Banking and Defence Grade Technology combined with 24/7 DFIR SOC services. So this is a product solution plus arguably the best DFIR team (Mandiant). Very expensive. HX, NX, MX product lines, for Endpoint, Network and Cloud SIEM.
Thank you for reading this article. Please support my sharing. In the next article, I will look at Log collection and SIEM Design patterns in Cloud.
If you would like to sponsor my next article or this blog, please get in touch.