Data Pipeline: A Mini ETL

Read messy CSV and JSON records, clean and reshape them, and emit a tidy summary report. The exact pipeline every data engineer builds on day one.

Python · Intermediate · Portfolio piece

What you'll be able to build

Along the way, you pick up transferable Python skills that apply well beyond this one project:

  • reading CSV with the csv module
  • parsing JSON into dicts/lists
  • normalising & validating records
  • grouping + aggregating with dicts
  • handling missing/dirty fields
  • writing a formatted summary report
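To make the skills above concrete, here is a minimal sketch of how they might fit together in one script. It is not the course's own solution: the sample data, the field names (region, amount), and the helper names (clean, run_pipeline, format_report) are all assumptions made up for illustration.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample inputs; the field names are illustrative only.
CSV_TEXT = """region,amount
North, 10.5
south,4
North,
,7
"""
JSON_TEXT = '[{"region": "South", "amount": "6"}, {"region": "north", "amount": null}]'


def clean(record):
    """Return a normalised (region, amount) pair, or None if the record is unusable."""
    region = (record.get("region") or "").strip().title()
    raw = record.get("amount")
    if not region or raw is None or str(raw).strip() == "":
        return None  # missing region or amount
    try:
        return region, float(raw)
    except (TypeError, ValueError):
        return None  # amount is present but not numeric


def run_pipeline(csv_text, json_text):
    """Extract records from both sources, clean them, and total amounts per region."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))  # CSV rows as dicts of strings
    rows += json.loads(json_text)                       # JSON array as a list of dicts
    totals = defaultdict(float)
    skipped = 0
    for row in rows:
        cleaned = clean(row)
        if cleaned is None:
            skipped += 1
            continue
        region, amount = cleaned
        totals[region] += amount
    return dict(totals), skipped


def format_report(totals, skipped):
    """Render one aligned line per region, followed by a count of dropped records."""
    lines = [f"{region:<10}{total:>8.2f}" for region, total in sorted(totals.items())]
    lines.append(f"skipped records: {skipped}")
    return "\n".join(lines)


totals, skipped = run_pipeline(CSV_TEXT, JSON_TEXT)
print(format_report(totals, skipped))
```

With this sample data, the three usable records roll up into totals for North and South, and the three records with a missing region or amount are counted as skipped rather than crashing the run.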

A course like this one

Yours is built from your own placement, so module count and depth will differ. This map shows what an intermediate-level Python learner building Data Pipeline actually gets.

  1. Module 1: Values and output (5 lessons)

    Builds the script for your data pipeline.

  2. Module 2: Collections and data (5 lessons)

    Builds the data flow for your data pipeline.

  3. Module 3: Branching and state (5 lessons)

    Builds the function that powers your data pipeline.

  4. Module 4: Functions and tests (5 lessons)

    Builds the reusable module for your data pipeline.

  5. Module 5: Files, APIs, and persistence (5 lessons)

    Builds the service boundary for your data pipeline.

  6. Module 6: Packaging and review (3 lessons)

    Builds the release package for your data pipeline.

How the lessons actually work

Leans on: csv, json

Every lesson has you predict what a piece of Python code will output before you run it, then run it for real in your browser and fix what you got wrong. Each module ends in a challenge gate with hidden tests, so you can't advance until your code actually works. The course closes with a capstone that assembles everything into Data Pipeline, and a runnable proof page tied to your own code.
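As a rough idea of what a predict-then-run exercise could look like, here is a hypothetical one built on the csv module. It is not taken from the course, and the sample data is invented. Decide what the first print shows before you run it.

```python
import csv
import io

# Hypothetical data for illustration; not taken from the course.
reader = csv.DictReader(io.StringIO("item,qty\nwidget,3\n"))
row = next(reader)

# Predict first: what does this print?
doubled = row["qty"] * 2
print(doubled)

# The csv module returns every field as a string, so the line above
# repeats the text instead of doing arithmetic. Converting first fixes it.
fixed = int(row["qty"]) * 2
print(fixed)
```

The gap between the guess and the real output is the point: fields read from CSV arrive as text, so a pipeline has to convert them before it can aggregate.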

Common questions

How long does the Data Pipeline: A Mini ETL course take?

About 7 hours, across 6 modules and 28 lessons, at roughly 15 minutes per lesson. Your own course may run shorter or longer, since it's sized to your placement result, not a fixed template.

Do I need experience?

Some. This is an intermediate-tier Python project, so it assumes you're comfortable with Python basics and pushes past them.

How much does it cost?

$15 one-time, no subscription. The first module is free, so you can see exactly how the course teaches before you pay for the rest.
