Lesson Progress: Phase 3 of 6 (Guided Practice)
Guided Practice: Outliers and Data Quality

Work through structured examples that apply descriptive statistics (mean, median, standard deviation, and z-scores) with teacher support.

Sarah's First Investigation

Sarah selects a sample of 10 transactions from last Saturday. She has calculated the summary statistics and now needs your help to identify the outliers and decide what to do with them.

The Sample Data

Transaction           Amount ($)
Coffee                      4.25
Muffin                      2.75
Latte                       5.25
Lunch Combo                12.95
Catering Order            127.50
Tea                         3.50
Salad                       8.75
Data Error                  0.05
Breakfast Sandwich          6.95
Cookie                      2.25

Mean (μ): $17.42
Standard Deviation (σ): $38.47

Your Task: Complete the Analysis

Using z = (x - μ) / σ, calculate the z-score for each transaction and decide whether to keep, investigate, or correct it.

Analysis Template

Transaction       x ($)     z = (x-μ)/σ   |z| > 2?   Decision
Coffee              4.25      -0.34        No         Keep
Muffin              2.75      -0.38        No         Keep
Catering Order    127.50       2.86        Yes        Investigate
Data Error          0.05      -0.45        No         Correct
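
The template's calculation can also be sketched in code. The Python below is a minimal illustration, not part of Sarah's original workflow. It takes μ = 17.42 and σ = 38.47 as given from the lesson rather than recomputing them, and the names `z_score`, `flag_outliers`, and `THRESHOLD` are hypothetical helpers chosen for this example.

```python
# Transaction amounts from Sarah's sample (in dollars).
transactions = {
    "Coffee": 4.25,
    "Muffin": 2.75,
    "Latte": 5.25,
    "Lunch Combo": 12.95,
    "Catering Order": 127.50,
    "Tea": 3.50,
    "Salad": 8.75,
    "Data Error": 0.05,
    "Breakfast Sandwich": 6.95,
    "Cookie": 2.25,
}

# Summary statistics taken as given from the lesson (assumption: not recomputed).
MU = 17.42
SIGMA = 38.47
THRESHOLD = 2.0  # the |z| > 2 rule of thumb used in the template


def z_score(x, mu=MU, sigma=SIGMA):
    """Return z = (x - mu) / sigma."""
    return (x - mu) / sigma


def flag_outliers(amounts, threshold=THRESHOLD):
    """Map each transaction name to a (z, is_flagged) pair."""
    results = {}
    for name, x in amounts.items():
        z = z_score(x)
        results[name] = (z, abs(z) > threshold)
    return results


if __name__ == "__main__":
    for name, (z, flagged) in flag_outliers(transactions).items():
        print(f"{name:<20} z = {z:6.2f}  flagged: {flagged}")
```

A screen like this only flags values that sit far from the mean in units of σ. The $0.05 entry is not flagged by the rule, so a z-score check does not replace a separate check for implausible values.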

Discussion Questions

  1. Why does the catering order have such a large z-score while the data error does not?
  2. Despite the different z-scores, why should both be treated as outliers?
  3. How would your analysis change if the catering order were actually $12.75 (a typo)?
  4. What additional information would help Sarah make better decisions about these outliers?
Complication: The Impact of Outliers on Statistics

Here's something important Sarah noticed: the catering order alone shifts both the mean and the standard deviation dramatically.

With Outliers

  • Mean = $17.42
  • Std Dev = $38.47

Without the Catering Order

  • Mean = $5.19
  • Std Dev = $3.32

Why This Matters

The mean more than triples, from $5.19 to $17.42, when the catering order is included. This is why Sarah must decide how to handle outliers before calculating the statistics she will use for planning. Her recommendation to the café depends on this choice.
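
The effect on the mean can be checked directly. The sketch below uses Python's standard `statistics` module and drops only the catering order (the $0.05 entry stays in), which reproduces the $5.19 mean quoted above. The median comparison is an addition for illustration, and `summarize` is a hypothetical helper name. Standard deviation is left out because its value depends on whether the sample or population formula is used.

```python
from statistics import mean, median

# Sarah's 10 sampled transaction amounts (in dollars).
amounts = [4.25, 2.75, 5.25, 12.95, 127.50, 3.50, 8.75, 0.05, 6.95, 2.25]
CATERING_ORDER = 127.50

# Assumption: "without" means removing only the catering order.
without_catering = [x for x in amounts if x != CATERING_ORDER]


def summarize(values):
    """Return the mean and median of the values, rounded to cents."""
    return {
        "mean": round(mean(values), 2),
        "median": round(median(values), 2),
    }


with_stats = summarize(amounts)
without_stats = summarize(without_catering)

if __name__ == "__main__":
    print("With catering order:   ", with_stats)
    print("Without catering order:", without_stats)
```

Under these assumptions the median moves far less than the mean, which is one reason to report it alongside the mean when a sample contains extreme values.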

Next: Your Turn

In the next phase, you'll work with a larger dataset, develop your own outlier detection workflow, and practice defending your data quality decisions with clear reasoning.