Using Stats in Splunk Part 1: Basic Anomaly Detection (2024)

One of the most powerful uses of Splunk lies in its ability to take large amounts of data and pick out the outliers. For some events this can be done simply: commands like rare and top surface the least common and most common values of a field. However, more subtle anomalies, or anomalies that occur over a span of time, require a more advanced approach.

This article explains the standard score (also known as the z-score) in statistics, how to implement it in Splunk’s Search Processing Language (SPL), and some caveats associated with the technique. By the end of this article you should be more familiar with these statistical concepts and have some intuition about when such techniques are appropriate.

Commands and subcommands

There are several commands and subcommands that this technique uses. Below is a brief overview of these; feel free to skip this section if you’re already familiar with them.

bin/bucket

The bin command and its alias bucket (the two can be used interchangeably) break timestamps down into fixed-size chunks, such as 15-minute spans, that we can then group by in the stats command.
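To make the idea of binning concrete, here is a minimal Python sketch rather than SPL. It floors each timestamp to the start of its span and sums a value per bucket, which is roughly what a bin followed by stats does. The 15-minute span, the function names, and the sample events are all assumptions made for illustration.

```python
from collections import defaultdict

SPAN_SECONDS = 15 * 60  # assumed 15-minute span


def bin_time(epoch_seconds, span=SPAN_SECONDS):
    """Floor a Unix timestamp to the start of the span it falls in."""
    return epoch_seconds - (epoch_seconds % span)


def bytes_per_bucket(events, span=SPAN_SECONDS):
    """Sum the bytes of (timestamp, bytes) events within each time bucket."""
    totals = defaultdict(int)
    for timestamp, size in events:
        totals[bin_time(timestamp, span)] += size
    return dict(totals)


# Hypothetical events: (Unix timestamp, bytes indexed)
events = [(1700000000, 500), (1700000300, 250), (1700000950, 100)]
buckets = bytes_per_bucket(events)
print(buckets)
```

The last two events land in the same bucket because both fall within the same 15-minute window, so their byte counts are added together.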

Avg/stdev/count/sum

  • Avg: calculates the average (the sum of all values divided by the number of events) of a particular numerical field.
  • Stdev: calculates the standard deviation of a numerical field. Standard deviation is a measure of how variable the data is. If the standard deviation is low, you can expect most data to be very close to the average. If it is high, the data is more spread out.
  • Count: provides a count of occurrences of field values within a field. You’ll want to use this if you’re dealing with text data.
  • Sum: provides a sum of all values of data within a given field. You’ll want to use this for numerical data (e.g. if the field contains the number of bytes transferred in the event).
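As a rough illustration of what these four functions compute, the following Python sketch applies the equivalent calculations to a small, made-up set of values. The field contents are hypothetical. Python's statistics.stdev computes the sample standard deviation, which is also what Splunk's stdev function is documented to return (stdevp is the population variant).

```python
import statistics
from collections import Counter

# Hypothetical values of a numeric field (e.g. bytes per event)
bytes_per_event = [120, 150, 130, 170, 140]
# Hypothetical values of a text field
sourcetypes = ["syslog", "syslog", "access", "syslog", "access"]


def summarize(values):
    """Return avg, sample stdev, count, and sum for a list of numbers."""
    return {
        "avg": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "count": len(values),
        "sum": sum(values),
    }


summary = summarize(bytes_per_event)
# Counting occurrences is the natural statistic for text data
occurrences = Counter(sourcetypes)
print(summary)
print(occurrences)
```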

How many events do we need?

When calculating the statistics mentioned above, we need to make sure the sample size we’re choosing accurately represents the data. If we choose too small a timeframe, we might not get a representative sample of the data. Our calculations could produce a lot of false positives or miss some anomalous events as a result.

Luckily, two results from statistics offer us some insight into how many events we need for a good sample. The Law of Large Numbers states that as sample size increases, the mean (average) of the sample data gets closer to the mean of the overall population. The Central Limit Theorem adds that the distribution of those sample means approaches a normal distribution, and a common rule of thumb is that a sample size of about 30 is enough for that approximation to be reasonable. Since getting an average for all your data is likely impractical computationally, we can use these results to our advantage. If we can create a search that has around 30 data points per time span, we’ll likely have enough data for a reasonably accurate sample.
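The effect of sample size can be checked with a small simulation. The Python sketch below is an illustration under assumed conditions, not a proof. It draws repeated samples from a deliberately skewed (exponential) population whose true mean is 1 and looks at how tightly the sample means cluster. The population, the number of trials, and the seed are arbitrary choices.

```python
import random
import statistics


def sample_means(sample_size, trials=2000, seed=0):
    """Draw `trials` samples of `sample_size` values from an exponential
    population (true mean 1.0) and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.mean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(trials)
    ]


means_5 = sample_means(5)
means_30 = sample_means(30)

# Spread of the sample means: smaller is better
spread_5 = statistics.stdev(means_5)
spread_30 = statistics.stdev(means_30)
print(f"n=5:  spread of sample means = {spread_5:.3f}")
print(f"n=30: spread of sample means = {spread_30:.3f}")
```

With 30 points per sample, the sample means sit much closer to the true mean than with 5. Theory predicts a spread of about 1/sqrt(30) for the larger samples.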

Applying what we learned

Given this information, we can calculate some statistics about the normal indexing of data and save them into a lookup for future reference. The resulting lookup contains the amount of data indexed for an index in each 15m period.

From this we can begin to work on our detection search. We’ll join the historical statistical data we saved to the lookup with a new search that will look for drops. After we do so, we can calculate the z-score, which tells us the number of standard deviations a particular value is from the average.
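The z-score calculation itself is simple: subtract the historical average from the new value and divide by the historical standard deviation. The Python sketch below shows that arithmetic and a check for drops. It is not the article's SPL search. The function names, the sample history, and the threshold of 3 are assumptions, and the history is kept to five points only for brevity.

```python
import statistics


def build_baseline(history):
    """Compute the average and sample standard deviation of past values."""
    return {"avg": statistics.mean(history), "stdev": statistics.stdev(history)}


def z_score(value, baseline):
    """Number of standard deviations `value` lies from the baseline average.
    Returns None when the baseline has no variation (stdev of 0)."""
    if baseline["stdev"] == 0:
        return None
    return (value - baseline["avg"]) / baseline["stdev"]


def is_drop(value, baseline, threshold=3.0):
    """Flag values that fall more than `threshold` deviations below average."""
    z = z_score(value, baseline)
    return z is not None and z < -threshold


# Hypothetical bytes indexed per 15m period for one index
history = [1000, 1100, 900, 1050, 950]
baseline = build_baseline(history)

for current in (950, 700):
    print(current, z_score(current, baseline), is_drop(current, baseline))
```

Only negative z-scores are flagged here because the goal is to detect drops. Comparing the absolute value of the z-score against the threshold would catch spikes as well.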

More about z-score

How do we determine what value of z-score to set for our threshold? The answer is a bit complicated. There are, however, a few rules that we can take into consideration to help us decide:

1. 68–95–99.7 rule

This rule applies to normally distributed data, where the data looks like a standard bell curve (see https://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg for a good chart). About 68% of values fall within one standard deviation of the mean, 95% within two, and 99.7% within three. The quick takeaway is that if the distribution is normal, we can expect 99.7% of values to have a z-score between -3 and 3.
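The percentages in this rule can be reproduced from the normal distribution's cumulative distribution function. As a short sketch, the fraction of a normal distribution within k standard deviations of the mean equals erf(k / sqrt(2)), which Python's math module can evaluate directly. The function name is chosen here for illustration.

```python
import math


def normal_coverage(k):
    """Fraction of a normal distribution within k standard deviations
    of the mean, i.e. P(|z| < k)."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"|z| < {k}: {normal_coverage(k):.4f}")
```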

2. Chebyshev’s inequality

This is a more general rule stating that for any probability distribution with a finite mean and variance, no more than 1/k² of the values can lie k or more standard deviations from the mean (see https://en.wikipedia.org/wiki/Chebyshev%27s_inequality).
The quick takeaway is that, whatever the shape of the distribution, we expect at least 99% of values to have a z-score between -10 and 10.
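Chebyshev's inequality bounds the fraction of values at least k standard deviations from the mean by 1/k², so the guaranteed coverage within k deviations is at least 1 - 1/k². The Python sketch below evaluates that bound and inverts it to find the threshold needed for a desired coverage. The function names are hypothetical.

```python
import math


def chebyshev_min_coverage(k):
    """Lower bound on the fraction of values within k standard deviations
    of the mean. The bound is uninformative for k <= 1."""
    if k <= 1:
        return 0.0
    return 1 - 1 / k ** 2


def chebyshev_threshold(coverage):
    """Smallest k for which Chebyshev guarantees at least `coverage`
    (a fraction between 0 and 1, exclusive) of values within k deviations."""
    return 1 / math.sqrt(1 - coverage)


for k in (3, 5, 10):
    print(f"|z| < {k}: at least {chebyshev_min_coverage(k):.4f}")
print(f"k for 99% coverage: {chebyshev_threshold(0.99):.2f}")
```

Note how much weaker the guarantee is than for a normal distribution: at a threshold of 3, Chebyshev only guarantees about 89% coverage rather than 99.7%.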

In the above example, we’re assuming that the data follows a normal distribution, but your data may be different. In that case, you should apply Chebyshev’s inequality to determine the threshold to use.

Conclusion

Hopefully this article provided some insight into how to perform basic anomaly detection using some of Splunk’s built-in SPL commands. It should also give you an idea of what thresholds to use to determine what constitutes an anomaly. Happy Splunking!

Article information

Author: Geoffrey Lueilwitz
