Phase 2 of 6
Introduction: Outliers and Data Quality

This phase introduces descriptive statistics (mean, median, standard deviation, and z-scores) and connects them to business applications.

The Z-Score: A "Weirdness Meter" for Data

Sarah needs a way to measure exactly how unusual each transaction is. She can't just eyeball the numbers and guess; she needs an objective standard. That's where z-scores come in.

A z-score tells you how many standard deviations a particular value sits away from the average, and its sign tells you whether the value is above (+) or below (-) average. It works like a "weirdness meter": the farther the score is from zero, the more unusual the value.

The Z-Score Formula

z = (x - μ) / σ

x = the individual transaction amount

μ (mu) = the mean (average) of all transactions

σ (sigma) = the standard deviation
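The formula translates directly into a few lines of Python. This is a minimal sketch; the function name `z_score` and the guard against a zero or negative standard deviation are our own illustrative choices, not part of the lesson.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Return how many standard deviations x lies from the mean mu."""
    # A standard deviation of zero would mean every value is identical,
    # so "how many SDs away" is undefined.
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

# A value exactly one standard deviation above the mean scores 1.0.
print(z_score(20.75, 12.50, 8.25))  # prints 1.0
```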

What the Score Means

  • z = 0: Exactly average, totally normal
  • z between -1 and +1: Within the normal range
  • z near ±2: Unusual, worth checking
  • z > 2 or z < -2: Likely outlier, needs investigation
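The bands above can be written as a small lookup. In this sketch, the exact band edges (1 and 2, with values between them counted as "worth checking") are our assumption, since the list only names the anchor points; `interpret` is a hypothetical helper name.

```python
def interpret(z: float) -> str:
    """Map a z-score to the interpretation bands (band edges are an assumption)."""
    size = abs(z)  # only distance from the mean matters here, not direction
    if size <= 1:
        return "within normal range"
    if size <= 2:
        return "unusual, worth checking"
    return "likely outlier, needs investigation"

for z in (0.0, -1.51, 13.94):
    print(z, interpret(z))
```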

Why 2 is the Magic Number

In a normal distribution, about 95% of values fall within 2 standard deviations of the mean. Anything beyond that is statistically unusual: only about 5% of data points land there. That's rare enough to investigate!
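You can check the "about 95%" figure yourself. For a normal distribution, the share of values within k standard deviations of the mean is erf(k / √2), which Python's standard `math.erf` computes. This assumes the data really is normally distributed; real transaction amounts are often skewed, so treat the percentages as a rule of thumb.

```python
import math

def share_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.4f}")
# prints 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3
```

So roughly 1 - 0.9545 ≈ 4.6% of values fall outside ±2 standard deviations, which is where the "about 5%" comes from.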

Step-by-Step: Sarah Calculates Z-Scores

Let's walk through Sarah's calculation for the catering order of $127.50:

Step 1: Find the mean and standard deviation

From earlier analysis, Sarah knows: Mean (μ) = $12.50, Standard Deviation (σ) = $8.25
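If you need to compute the mean and standard deviation yourself, Python's built-in `statistics` module does it. The amounts below are made up for illustration and are not Sarah's café data, so the standard deviation will not match her $8.25. Whether to use the population SD (`pstdev`) or the sample SD (`stdev`) is a choice the lesson doesn't specify; we use `pstdev` here as an assumption.

```python
import statistics

# Hypothetical transaction amounts, for illustration only.
amounts = [8.50, 12.00, 15.25, 9.75, 17.00]

mu = statistics.mean(amounts)
sigma = statistics.pstdev(amounts)  # population SD; statistics.stdev gives the sample SD
print(f"mean = {mu:.2f}, sd = {sigma:.2f}")
```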

Step 2: Calculate the z-score

z = (127.50 - 12.50) / 8.25
z = 115.00 / 8.25
z = 13.94

Step 3: Interpret the result

A z-score of 13.94 is far more than 2. This transaction sits roughly 14 standard deviations above average; if transaction amounts followed a normal distribution, a value this extreme would be essentially impossible. Sarah knows immediately that it needs investigation.
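The same three steps in Python, using the mean and standard deviation from Step 1. This is just a sketch of the arithmetic above, with the ±2 cutoff from the interpretation list.

```python
mu, sigma = 12.50, 8.25  # Sarah's mean and standard deviation from Step 1
catering = 127.50

z = (catering - mu) / sigma
print(f"z = {z:.2f}")    # prints z = 13.94
print(abs(z) > 2)        # prints True: far beyond the ±2 cutoff
```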

Now Try It: Calculate the Z-Score

Now calculate the z-score for the suspicious $0.05 transaction:

Given: Mean = $12.50, Standard Deviation = $8.25, Transaction = $0.05

z = (0.05 - 12.50) / 8.25 = ?

Your calculation:

z = -12.45 / 8.25 = -1.51
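Checking the answer in Python confirms the arithmetic, and the ±2 test shows this value is not a statistical outlier. The variable name `tiny` is our own.

```python
mu, sigma = 12.50, 8.25
tiny = 0.05

z = (tiny - mu) / sigma
print(f"z = {z:.2f}")                                    # prints z = -1.51
print("outlier" if abs(z) > 2 else "inside the ±2 range")  # prints inside the ±2 range
```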

Wait, that's only -1.51, which falls inside the ±2 range... so it's not an outlier by the numbers?

But wait!

Right: -1.51 is inside the ±2 range, so by the z-score rule alone this transaction is not an outlier. But $0.05 makes no business sense, because no menu item costs that little. This is clearly a data entry error that needs correction regardless of the z-score.

Key Insight: Statistics + Business Context

Z-scores tell you what's unusual mathematically, but you also need business judgment to decide what's unusual practically. The $127.50 transaction (z = 13.94) is a real catering order, while the $0.05 transaction (z = -1.51) is clearly an error, since no item costs 5 cents. The z-score flagged the first but missed the second.
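One way to combine the two kinds of judgment is to run a statistical check and a business-rule check side by side. In this sketch, `MIN_MENU_PRICE = 1.00` is a hypothetical value standing in for the café's cheapest item (the lesson doesn't give one), and `review` is a hypothetical helper name.

```python
MIN_MENU_PRICE = 1.00     # hypothetical: price of the café's cheapest menu item
MU, SIGMA = 12.50, 8.25   # mean and standard deviation from the lesson

def review(amount: float) -> list[str]:
    """Return the reasons a transaction deserves a closer look (empty if none)."""
    reasons = []
    z = (amount - MU) / SIGMA
    if abs(z) > 2:  # statistical check: beyond the ±2 cutoff
        reasons.append(f"statistical outlier (z = {z:.2f})")
    if amount < MIN_MENU_PRICE:  # business check: cheaper than anything on the menu
        reasons.append("below the cheapest menu price")
    return reasons

for amt in (127.50, 0.05, 14.00):
    print(amt, review(amt))
```

The catering order is caught only by the statistical check and the 5-cent entry only by the business rule, which is exactly why you need both.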

The Three Data Quality Decisions

Once Sarah identifies an outlier, she has three choices:

1. KEEP - If it's legitimate business

The $127.50 catering order is real business. Sarah keeps it but notes it separately for planning.

2. FLAG - If it's uncertain

Sarah marks unusual values so she can analyze the data both with and without them and see the impact.

3. CORRECT/REMOVE - If it's clearly an error

The $0.05 data entry error should be corrected to the actual value or removed from analysis.
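The three decisions can be sketched as a tiny function, plus the with-and-without comparison that a FLAG decision calls for. The function `decide`, its two yes/no inputs, and the sample amounts are all hypothetical illustrations; in practice the inputs come from Sarah's business judgment, not from code.

```python
import statistics

def decide(is_clear_error: bool, is_legitimate: bool) -> str:
    """Pick one of the three data quality decisions for an outlier."""
    if is_clear_error:
        return "CORRECT/REMOVE"
    if is_legitimate:
        return "KEEP"
    return "FLAG"  # uncertain: analyze with and without the value

print(decide(is_clear_error=False, is_legitimate=True))  # $127.50 catering order -> KEEP
print(decide(is_clear_error=True, is_legitimate=False))  # $0.05 entry error -> CORRECT/REMOVE

# For a FLAG decision, compare a summary statistic with and without the value.
# Hypothetical amounts; suppose Sarah were unsure about the $127.50 order and flagged it.
amounts = [8.50, 12.00, 15.25, 9.75, 17.00, 127.50]
flagged = {127.50}
kept = [a for a in amounts if a not in flagged]
print(f"mean with flagged:    {statistics.mean(amounts):.2f}")
print(f"mean without flagged: {statistics.mean(kept):.2f}")
```

A single large value can move the mean a long way, which is why seeing both versions helps Sarah judge its impact.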

Next: Applied Outlier Detection

Now you'll practice the complete outlier detection workflow with the actual café data: calculating z-scores, making business decisions, and defending your choices with evidence.