Work through structured examples applying Descriptive statistics: mean, median, standard deviation, and z-scores with teacher support
Sarah selects a sample of 10 transactions from last Saturday. She calculates the statistics and now needs your help to identify and decide what to do with the outliers.
| Transaction | Amount ($) |
|---|---|
| Coffee | 4.25 |
| Muffin | 2.75 |
| Latte | 5.25 |
| Lunch Combo | 12.95 |
| Catering Order | 127.50 |
| Tea | 3.50 |
| Salad | 8.75 |
| Data Error | 0.05 |
| Breakfast Sandwhich | 6.95 |
| Cookie | 2.25 |
| Mean (μ) | 17.42 |
| Standard Deviation (σ) | 38.47 |
Using z = (x - μ) / σ, calculate the z-score for each transaction and decide:
Analysis Template
| Transaction | x | z = (x-μ)/σ | |z| > 2? | Decision |
|---|---|---|---|---|
| Coffee | 4.25 | -0.34234468416948277 | No | Keep |
| Muffin | 2.75 | -0.38133610605666757 | No | Keep |
| Catering Order | 127.50 | 13.94 | YES | Investigate |
| Data Error | 0.05 | -1.51 | Close | Correct |
Discussion Questions
- Why does the catering order have such a huge z-score while the data error doesn't?
- Despite the different z-scores, why should both be treated as outliers?
- How would your analysis change if the catering order was actually $12.75 (a typo)?
- What additional information would help Sarah make better decisions about these outliers?
Here's something important Sarah noticed: the catering order changes everything about the statistics.
With Outliers
- Mean = $17.42
- Std Dev = $38.47
Without Outliers
- Mean = $5.19
- Std Dev = $3.32
Why This Matters
The mean more than tripled when the outlier is included! This is why Sarah must decide what to do with outliers before calculating statistics for planning. Her recommendation to the café depends on this choice.
In the next phase, you'll work with a larger dataset, develop your own outlier detection workflow, and practice defending your data quality decisions with clear reasoning.