Hook: Data Cleaning and Analysis

Business scenario where data cleaning matters for investor decisions

Phase 1: Hook

Why Clean Data Matters to Investors

Before you can forecast, you need data you can trust. Investors lose confidence when analysts present messy, unverified data.

The Café Data Problem

Sarah's café client needs a weekend forecast to plan inventory and staffing. The POS system exports data that looks ready but has hidden problems that will wreck any analysis.

What's Wrong

  • Dates stored as text (can't be sorted or grouped by day)
  • Prices include $ signs (read as text, so formulas break)
  • Duplicate transaction rows (inflate sales totals)
  • Product names with inconsistent spacing
  • Missing values where the system timed out
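
The five problems above can be sketched as one small cleaning pass. This is a minimal illustration in plain Python, not the lesson's own workflow: the column names (`txn_id`, `date`, `product`, `price`), the MM/DD/YYYY date format, and the sample rows are all assumptions made up for the example.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical raw POS export: every value arrives as text.
raw_rows = [
    {"txn_id": "1001", "date": "03/02/2024", "product": " Latte ", "price": "$4.50"},
    {"txn_id": "1001", "date": "03/02/2024", "product": " Latte ", "price": "$4.50"},  # duplicate row
    {"txn_id": "1002", "date": "03/02/2024", "product": "Latte", "price": "$4.50"},
    {"txn_id": "1003", "date": "03/03/2024", "product": "Muffin", "price": ""},  # missing price
]


def clean_row(row):
    """Return a cleaned copy of one row; unparseable values become None."""
    cleaned = {"txn_id": row["txn_id"].strip()}
    # Trim the ends and collapse repeated inner spaces: ' Latte ' -> 'Latte'.
    cleaned["product"] = " ".join(row["product"].split())
    # Convert text dates to real date objects (assumed format: MM/DD/YYYY).
    try:
        cleaned["date"] = datetime.strptime(row["date"].strip(), "%m/%d/%Y").date()
    except ValueError:
        cleaned["date"] = None
    # Strip the currency symbol and thousands separators before converting.
    try:
        cleaned["price"] = Decimal(row["price"].replace("$", "").replace(",", "").strip())
    except InvalidOperation:
        cleaned["price"] = None
    # Flag the row instead of silently dropping it.
    cleaned["has_missing"] = cleaned["date"] is None or cleaned["price"] is None
    return cleaned


def clean_rows(rows):
    """Clean every row, keeping only the first copy of exact duplicates."""
    seen = set()
    result = []
    for row in rows:
        cleaned = clean_row(row)
        key = (cleaned["txn_id"], cleaned["date"], cleaned["product"], cleaned["price"])
        if key in seen:
            continue
        seen.add(key)
        result.append(cleaned)
    return result


cleaned_rows = clean_rows(raw_rows)
```

Rows with missing values are flagged rather than deleted in this sketch, so the gap stays visible and can be reported as a data limitation.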

Why Investors Care

  • Dirty data → unreliable forecasts
  • Unverified data → audit failures
  • Documented cleaning → trust and credibility
  • Clean pipeline → reproducible analysis

The Investor Question

"How do you know this data is accurate enough to base decisions on?"

Your answer shapes whether investors trust your forecast. Show documented cleaning steps, before/after metrics, and data quality flags. If you can't prove the data is clean, they won't trust your model.
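
One way to back that answer with numbers is a short before/after summary. The following is a minimal sketch, assuming the raw and cleaned rows are available as lists of dictionaries and that unparseable values were stored as `None`; the `quality_report` helper, the field names, and the sample rows are hypothetical.

```python
def quality_report(raw_rows, cleaned_rows, required_fields):
    """Summarize what cleaning changed so the steps can be shown to a reviewer."""
    # A row is flagged when any required field is still missing after cleaning.
    flagged = sum(
        1
        for row in cleaned_rows
        if any(row.get(field) is None for field in required_fields)
    )
    total = len(cleaned_rows)
    pct_complete = round(100 * (total - flagged) / total, 1) if total else 0.0
    return {
        "rows_before": len(raw_rows),
        "rows_after": total,
        "rows_removed": len(raw_rows) - total,
        "rows_flagged_missing": flagged,
        "pct_complete": pct_complete,
    }


# Hypothetical example: four raw rows, one duplicate removed, one missing price.
raw = [{"txn_id": "1001"}, {"txn_id": "1001"}, {"txn_id": "1002"}, {"txn_id": "1003"}]
cleaned = [
    {"txn_id": "1001", "price": 4.5},
    {"txn_id": "1002", "price": 4.5},
    {"txn_id": "1003", "price": None},
]

report = quality_report(raw, cleaned, required_fields=["txn_id", "price"])
for metric, value in report.items():
    print(f"{metric}: {value}")
```

A summary like this does not prove the data is correct, but it documents what was removed and what is still incomplete.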

Diagnostic: Can You Spot the Cleaning Priority?

Which data problems matter most, and in what order should you fix them? Think like an analyst preparing for investor review.

1. Sarah receives the café's raw weekend sales data. The dates are formatted as text, prices include currency symbols, and some rows are duplicated. What do you clean first?

2. An investor asks: 'How do you know this data is reliable?' What data cleaning evidence should you show?

3. The POS system exported product names with extra spaces: ' Latte ' vs 'Latte'. Why does this matter for analysis?

Turn and Talk

Discussion Prompt (3 minutes):

What would you do if you had 30 minutes to clean this data before a 2 p.m. investor meeting?

  • Which cleaning steps give you the most confidence quickly?
  • What would you skip if time runs out?
  • What would you tell the investor about your data limitations?