Data Quality Checks Skill
This skill guides the creation of data quality checks in Dagster.
Workflow
- •Discovery: Ask the user which critical asset(s) they want to validate, or identify the critical assets yourself (e.g. bronze/silver layer tables).
- •Proposal:
  - •Query the asset data to understand its shape and common values (use duckdb or polars).
  - •List potential quality checks (e.g., "column id should be unique", "column status should be one of ['active', 'inactive']", "no null values in timestamp").
  - •Present this list to the user for confirmation.
- •Implementation:
  - •Create a new Python file in src/validation/asset_checks/ (create directories if needed).
  - •Implement checks using the @asset_check decorator.
  - •Ensure the new module is discoverable by src/main.py. This means ensuring it's imported in src/validation/asset_checks/__init__.py or that load_asset_checks_from_package_module scans it recursively.
Coding Standards
- •Use polars for data processing within checks if possible.
- •Return AssetCheckResult with metadata (e.g., number of failing rows).
- •Follow the project's linting rules.
Example
python
import dagster as dg
import polars as pl

from src.resources.io_managers import PolarsDeltaIOManager


@dg.asset_check(asset=dg.AssetKey(["bronze", "work", "github", "github_repository_stats"]))
def check_repo_stats_positive(context, io_manager: PolarsDeltaIOManager):
    df = io_manager.load_input(context)
    # logic
    return dg.AssetCheckResult(passed=True)