Narwhals - DataFrame Agnostic API

Narwhals is a lightweight, zero-dependency compatibility layer for dataframe libraries in Python that provides a unified interface across different backends.

Docs: https://narwhals-dev.github.io/narwhals/

What is Narwhals?

Narwhals enables writing dataframe-agnostic code that works seamlessly across multiple Python dataframe libraries:

Full API Support:

•cuDF
•Modin
•pandas
•Polars
•PyArrow

Lazy-Only Support:

•Dask
•DuckDB
•Ibis
•PySpark
•SQLFrame

Core Philosophy

Why Narwhals?

•Resolves subtle differences between libraries (e.g., pandas checking index vs Polars checking values)
•Provides unified, simple, and predictable API
•Handles backwards compatibility internally
•Tests against nightly builds of supported libraries
•Maintains negligible performance overhead
•Full static typing support
•Zero dependencies

Target Use Case: Anyone building libraries, applications, or services that consume dataframes and need complete backend independence.

Key Features

•Backend Agnostic: Write once, run on any supported dataframe library
•Polars-Like API: Uses a subset of the Polars API for consistency
•Lazy & Eager Execution: Separate APIs for both execution modes
•Expression Support: Full expression API for complex operations
•Type Safety: Perfect static typing support
•100% Branch Coverage: Thoroughly tested

Basic Usage Pattern

Three-Step Workflow

python

import narwhals as nw

# 1. Convert to Narwhals
df_nw = nw.from_native(df)  # Works with pandas, Polars, PyArrow, etc.

# 2. Perform operations using Polars-like API
result = df_nw.select(a_sum=nw.col("a").sum(), a_mean=nw.col("a").mean(), b_std=nw.col("b").std())

# 3. Convert back to original library
result_native = result.to_native()

Using the @narwhalify Decorator

Simplifies function definitions for automatic conversion:

python

@nw.narwhalify
def my_func(df: IntoDataFrameT):
    return df.select(nw.col("a").sum(), nw.col("b").mean()).filter(nw.col("a") > 0)


# Automatically handles conversion to/from Narwhals
result = my_func(pandas_df)  # Works!
result = my_func(polars_df)  # Also works!

Top-Level Functions

Conversion Functions

•
from_native(df, ...): Convert native DataFrame/Series to Narwhals object
- •Parameters: pass_through, backend, eager_only, allow_series
•to_native(nw_obj): Convert Narwhals object back to native library type
•narwhalify(): Decorator for automatic dataframe-agnostic functions

Data Creation

•new_series(name, values, dtype): Create a new Series
•from_dict(data): Create DataFrame from dictionary
•from_dicts(data): Create DataFrame from sequence of dictionaries

File I/O

Eager Loading:

•read_csv(source, **kwargs): Read CSV file into DataFrame
•read_parquet(source, **kwargs): Read Parquet file into DataFrame

Lazy Loading:

•scan_csv(source, **kwargs): Lazily scan CSV file
•scan_parquet(source, **kwargs): Lazily scan Parquet file

Aggregation Functions

•sum(), mean(), min(), max(), median()
•sum_horizontal(), mean_horizontal(), etc.

Expression Creation

•col(name): Reference column by name
•lit(value): Create literal expression
•when(condition): Create conditional expression
•format(template, *args): Format expression as string

Utilities

•generate_temporary_column_name(): Generate unique column names
•get_native_namespace(obj): Get the native library of an object
•show_versions(): Print debugging information

DataFrame Methods

Properties

•columns: List of column names
•schema: Ordered mapping of column names to dtypes
•shape: Tuple of (rows, columns)
•implementation: Name of native implementation

Column Operations

•select(*exprs): Select columns using expressions
•with_columns(*exprs): Add or modify columns
•drop(*columns): Remove specified columns
•rename(mapping): Rename columns

Row Operations

•filter(predicate): Filter rows based on conditions
•head(n): Get first n rows
•tail(n): Get last n rows
•sample(n): Randomly sample n rows
•drop_nulls(): Drop rows with null values
•unique(): Remove duplicate rows

Inspection

•is_empty(): Check if DataFrame has no rows
•is_duplicated(): Identify duplicated rows
•is_unique(): Identify unique rows
•null_count(): Count null values per column
•estimated_size(): Estimate memory usage

Transformations

•sort(*by): Sort by one or more columns
•group_by(*by): Group by columns for aggregation
•join(other, on, how): Perform SQL-style joins
•pivot(on, index, values): Create pivot table
•explode(*columns): Expand list columns to long format
•lazy(): Convert to LazyFrame

Export

•to_native(): Convert to original library type
•to_numpy(): Convert to NumPy array
•to_pandas(): Convert to pandas DataFrame
•to_polars(): Convert to Polars DataFrame
•clone(): Create a copy

LazyFrame Methods

LazyFrame provides the same API as DataFrame but with lazy evaluation:

Key Differences

•Operations build an execution plan without computing
•collect(): Materialize the LazyFrame into a DataFrame
•collect_schema(): Get schema without collecting data
•sink_parquet(path): Write results directly to Parquet

Common Methods

All DataFrame methods are available on LazyFrame:

•select(), filter(), with_columns(), drop()
•group_by(), join(), sort(), unique()
•head(), tail(), top_k()
•gather_every(): Select rows at regular intervals
•unpivot(): Convert from wide to long format
•with_row_index(): Add row index column
•pipe(): Apply function to LazyFrame

Expression (Expr) API

Expressions are the building blocks for column operations.

Creation

python

nw.col("column_name")  # Reference column
nw.lit(42)  # Literal value

Filtering

•filter(predicate): Filter elements
•is_in(values): Check membership
•is_between(lower, upper): Check range
•drop_nulls(): Remove nulls

Aggregations

•count(): Count non-null elements
•null_count(): Count null values
•n_unique(): Count unique values
•sum(), mean(), median(): Statistical aggregations
•min(), max(): Extremes
•std(), var(): Spread measures
•quantile(q): Quantile values

Transformations

Mathematical:

•abs(): Absolute value
•round(), floor(), ceil(): Rounding
•sqrt(), log(), exp(): Mathematical functions

Type/Value Operations:

•cast(dtype): Change data type
•fill_null(value): Replace null values
•replace_strict(old, new): Replace specific values

Window Operations:

•rolling_mean(window_size): Moving average
•rolling_sum(window_size): Moving sum
•rolling_std(window_size): Moving standard deviation
•shift(n): Shift values by n positions
•over(*by): Compute expression over groups

Ranking/Uniqueness:

•rank(): Assign ranks
•unique(): Get unique values
•is_duplicated(): Identify duplicates
•is_first_distinct(): Mark first distinct occurrences

Namespace Methods

Expressions have specialized namespaces for specific data types:

String Operations (Expr.str)

•String manipulation methods

DateTime Operations (Expr.dt)

•Date/time manipulation methods

List Operations (Expr.list)

•List column operations

Categorical Operations (Expr.cat)

•Categorical data methods

Struct Operations (Expr.struct)

•Struct/nested data methods

Name Operations (Expr.name)

•Column name operations

Series API

Series represents a single column:

Properties

•Same as DataFrame: shape, dtype, name

Methods

•Similar to DataFrame but for single column operations
•Has specialized namespaces: str, dt, list, cat, struct

Type Hints

Full docs: narwhals.typing

TLDR:

DataFrameT module-attribute

DataFrameT = TypeVar('DataFrameT', bound='DataFrame[Any]') TypeVar bound to Narwhals DataFrame.

Use this if your function can accept a Narwhals DataFrame and returns a Narwhals DataFrame backed by the same backend.

Examples:

>>> import narwhals as nw
>>> from narwhals.typing import DataFrameT
>>> @nw.narwhalify
>>> def func(df: DataFrameT) -> DataFrameT:
...     return df.with_columns(c=df["a"] + 1)
Frame module-attribute

Frame: TypeAlias = Union["DataFrame[Any]", "LazyFrame[Any]"] Narwhals DataFrame or Narwhals LazyFrame.

Use this if your function can work with either and your function doesn't care about its backend.

Examples:

>>> import narwhals as nw
>>> from narwhals.typing import Frame
>>> @nw.narwhalify
... def agnostic_columns(df: Frame) -> list[str]:
...     return df.columns
FrameT module-attribute

FrameT = TypeVar( "FrameT", "DataFrame[Any]", "LazyFrame[Any]" ) TypeVar bound to Narwhals DataFrame or Narwhals LazyFrame.

Use this if your function accepts either nw.DataFrame or nw.LazyFrame and returns an object of the same kind.

Examples:

>>> import narwhals as nw
>>> from narwhals.typing import FrameT
>>> @nw.narwhalify
... def agnostic_func(df: FrameT) -> FrameT:
...     return df.with_columns(c=nw.col("a") + 1)

IntoDataFrame module-attribute

IntoDataFrame: TypeAlias = NativeDataFrame Anything which can be converted to a Narwhals DataFrame.

Use this if your function accepts a narwhalifiable object but doesn't care about its backend.

Examples:

>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> def agnostic_shape(df_native: IntoDataFrame) -> tuple[int, int]:
...     df = nw.from_native(df_native, eager_only=True)
...     return df.shape

IntoDataFrameT module-attribute

IntoDataFrameT = TypeVar( "IntoDataFrameT", bound=IntoDataFrame ) TypeVar bound to object convertible to Narwhals DataFrame.

Use this if your function accepts an object which can be converted to nw.DataFrame and returns an object of the same class.

Examples:

>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrameT
>>> def agnostic_func(df_native: IntoDataFrameT) -> IntoDataFrameT:
...     df = nw.from_native(df_native, eager_only=True)
...     return df.with_columns(c=df["a"] + 1).to_native()

Common Patterns

Group By and Aggregate

python

result = df.group_by("category").agg(
    count=nw.col("id").count(),
    total=nw.col("amount").sum(),
    average=nw.col("amount").mean(),
)

Conditional Operations

python

result = df.with_columns(category=nw.when(nw.col("value") > 100).then(nw.lit("high")).otherwise(nw.lit("low")))

Joins

python

result = df1.join(
    df2,
    on="key_column",
    how="left",  # inner, left, outer, cross
)

Chain Operations

python

result = (
    df.filter(nw.col("status") == "active")
    .select("user_id", "amount")
    .group_by("user_id")
    .agg(total=nw.col("amount").sum())
    .sort("total", descending=True)
    .head(10)
)

Best Practices

•
Use @nw.narwhalify for library functions: Simplifies API and handles conversion automatically

•

Prefer expressions over method chaining: More flexible and composable

python

# Good
df.select(nw.col("a").sum(), nw.col("b").mean())

# Also fine, but less composable
df.select("a", "b")

•
Use lazy evaluation when possible: Better performance for complex pipelines
python
```
result = df.lazy().select(...).filter(...).collect()
```
•
Always convert back to native: Remember to call .to_native() when returning from library functions (unless using @narwhalify)
•
Type hint your functions: Use IntoDataFrame and FrameT for better IDE support
•
Check supported backends: Some operations may not be available on all backends

Important Constraints

•
Zero Dependencies: Narwhals has no dependencies, keeping it lightweight
•
Polars API Subset: Uses Polars-style API but may not support all Polars features
•
Backend Limitations: Some backends (lazy-only) have restricted functionality

Tips

Checking whether a Narwhals frame is a Polars frame:

import polars as pl
import narwhals as nw

df_native = pl.DataFrame({"a": [1, 2, 3]})
df = nw.from_native(df_native)
df.implementation.is_polars()

Asserting series equality:

import pandas as pd
import narwhals as nw
from narwhals.testing import assert_series_equal
s1 = nw.from_native(pd.Series([1, 2, 3]), series_only=True)
s2 = nw.from_native(pd.Series([1, 5, 3]), series_only=True)
assert_series_equal(s1, s2)
Traceback (most recent call last):
...
AssertionError: Series are different (exact value mismatch)
[left]:
┌───────────────┐
|Narwhals Series|
|---------------|
| 0    1        |
| 1    2        |
| 2    3        |
| dtype: int64  |
└───────────────┘
[right]:
┌───────────────┐
|Narwhals Series|
|---------------|
| 0    1        |
| 1    5        |
| 2    3        |
| dtype: int64  |
└───────────────┘