Descriptive Statistics: Mean, Median, Standard Deviation, and Z-Scores in Business Applications
Sarah needs a way to measure exactly how unusual each transaction is. She can't just look at numbers and guess - she needs an objective standard. That's where z-scores come in.
A z-score tells you how many standard deviations away from the average a particular value sits. It's like a "weirdness meter" - the farther from zero, the more unusual the value.
z = (x - μ) / σ
x = the individual transaction amount
μ (mu) = the mean (average) of all transactions
σ (sigma) = the standard deviation
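As a minimal sketch of the formula above, the following Python computes the mean, the standard deviation, and a z-score for each value. The `amounts` list is made-up illustration data, not Sarah's actual transactions, and the choice of the population standard deviation (`pstdev`) is an assumption made to match σ in the formula.

```python
from statistics import mean, pstdev

def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

# Hypothetical transaction amounts, for illustration only.
amounts = [4.50, 9.00, 12.50, 16.00, 20.50]

mu = mean(amounts)       # μ: the average transaction
sigma = pstdev(amounts)  # σ: population standard deviation (assumed)

# One z-score per transaction; the value equal to the mean scores 0.
z_scores = [z_score(x, mu, sigma) for x in amounts]
```

If the transactions are treated as a sample of a larger population, `statistics.stdev` (the sample standard deviation) would be the usual substitute for `pstdev`.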
What the Score Means
- z = 0: Exactly average - totally normal
- z between -1 and +1: Within normal range
- z approaching ±2: Unusual - worth checking
- z > 2 or z < -2: Likely outlier - needs investigation
Why 2 is the Magic Number
In a normal distribution, about 95% of values fall within 2 standard deviations of the mean. Anything beyond that is statistically unusual - only about 5% of data points. That's rare enough to investigate!
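The 95% figure can be checked directly. For a normal distribution, the share of values within k standard deviations of the mean equals erf(k / √2). The short sketch below uses Python's standard `math.erf`; the helper name `share_within` is just an illustrative choice.

```python
import math

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} standard deviations: {share_within(k):.1%}")
```

This prints roughly 68.3%, 95.4%, and 99.7%, so about 4.6% of normally distributed values fall beyond ±2, which the lesson rounds to 5%.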
Let's walk through Sarah's calculation for the catering order of $127.50:
Step 1: Find the mean and standard deviation
From earlier analysis, Sarah knows: Mean (μ) = $12.50, Standard Deviation (σ) = $8.25
Step 2: Calculate the z-score
z = (127.50 - 12.50) / 8.25
z = 115.00 / 8.25
z ≈ 13.94
Step 3: Interpret the result
A z-score of 13.94 is far beyond 2. This transaction is roughly 14 standard deviations above average - a value that extreme would essentially never occur by chance if the data followed a normal distribution. Sarah knows immediately this needs investigation.
Now calculate the z-score for the suspicious $0.05 transaction:
Given: Mean = $12.50, Standard Deviation = $8.25, Transaction = $0.05
z = (0.05 - 12.50) / 8.25 = ?
Your calculation:
z = -12.45 / 8.25 = -1.51
Wait, that's only -1.51, which is inside the ±2 range... so it's not an outlier by the numbers?
But wait!
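That's right: -1.51 sits inside the ±2 range, so the z-score alone does not flag this transaction. In fact, with a mean of $12.50 and a standard deviation of $8.25, even a $0.00 transaction would score only about -1.52, so no low amount can ever reach -2 in this data. More importantly, $0.05 makes no business sense - no menu item costs that little. This is clearly a data entry error that needs correction regardless of the z-score.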
Key Insight: Statistics + Business Context
Z-scores tell you what's unusual mathematically. But you also need business judgment to decide what's unusual practically. The $127.50 order (z = 13.94) is flagged by the math, yet it is a real transaction (a catering order). The $0.05 entry (z = -1.51) passes the math, yet it is clearly an error (no items cost 5 cents).
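To see both cases side by side, here is a small sketch using Sarah's mean and standard deviation from the worked example. The $0.00 row is an added illustration, not one of Sarah's transactions: it shows the lowest z-score any non-negative amount can reach with these numbers.

```python
MEAN = 12.50     # μ from Sarah's earlier analysis
STD_DEV = 8.25   # σ from Sarah's earlier analysis
THRESHOLD = 2.0  # the ±2 cutoff used in this lesson

def z_score(amount):
    """Z-score of one transaction against Sarah's mean and standard deviation."""
    return (amount - MEAN) / STD_DEV

# 127.50 = catering order, 0.05 = suspicious entry, 0.00 = lowest possible amount
for amount in (127.50, 0.05, 0.00):
    z = z_score(amount)
    flagged = abs(z) > THRESHOLD
    print(f"${amount:>7.2f}  z = {z:6.2f}  flagged by z-score: {flagged}")
```

Only the catering order is flagged. Because the mean is about 1.5 standard deviations above zero, the ±2 rule cannot catch suspiciously small amounts in this data, which is why the business check matters.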
Once Sarah identifies an outlier, she has three choices:
1. KEEP - If it's legitimate business
The $127.50 catering order is real business. Sarah keeps it but notes it separately for planning.
2. FLAG - If it's uncertain
Sarah marks unusual values so she can analyze both with and without them to see the impact.
3. CORRECT/REMOVE - If it's clearly an error
The $0.05 data entry error should be corrected to the actual value or removed from analysis.
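The three choices can be sketched as a simple decision function. This is one possible way to encode the logic, not a definitive rule set: the `min_price` value of $1.00 is an assumed stand-in for the cheapest menu item (the lesson does not state it), and `known_legit` represents a human judgment, such as Sarah confirming a catering order.

```python
def triage(amount, mean, std_dev, min_price, known_legit=False, threshold=2.0):
    """Suggest KEEP, FLAG, or CORRECT/REMOVE for one transaction."""
    z = (amount - mean) / std_dev
    # Business rule first: nothing on the menu costs less than min_price.
    if amount < min_price:
        return "CORRECT/REMOVE"
    # Statistical rule: beyond the threshold, keep only if confirmed legitimate.
    if abs(z) > threshold:
        return "KEEP" if known_legit else "FLAG"
    return "KEEP"

# min_price=1.00 is a hypothetical value chosen for illustration.
print(triage(127.50, 12.50, 8.25, min_price=1.00, known_legit=True))  # catering order
print(triage(0.05, 12.50, 8.25, min_price=1.00))                      # data entry error
print(triage(40.00, 12.50, 8.25, min_price=1.00))                     # unconfirmed large order
```

The business rule is checked before the z-score so that an impossible amount like $0.05 is caught even though its z-score looks ordinary.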
Now you'll practice the complete outlier detection workflow with the actual café data - calculating z-scores, making business decisions, and defending your choices with evidence.