flashback

"Why did my metric change?"

You ran a 6-hour training job. The Sharpe ratio dropped from 1.4 to 0.9. Somewhere between the raw tick data and the feature matrix, a silent transformation introduced look-ahead bias or row filtering. You have no idea where. Your logs don't track it. Git can't diff binary Parquet files. And by the time you discover the bug, you've wasted days.

Why Existing Tools Don't Work

❌ DVC is too heavy

Versions entire files with S3 backends, CI pipelines, and YAML configs. You don't want to learn a new orchestration system—you want to know what happened to column price_lag1 between step 3 and step 7.

✓ flashback is surgical

Tracks at the column and row level. Zero I/O overhead. Records every transformation in an in-memory DAG. Instant time-travel to any checkpoint.

❌ Git doesn't understand columns

git diff on a Parquet file is binary noise. It cannot tell you "this .filter() removed 412 rows" or "this .with_columns() introduced a null in 3% of rows."

✓ flashback speaks DataFrame

Shows you exactly which rows were added/removed, which columns changed, and how many nulls appeared—all in human-readable format.

The Solution: flashback

flashback wraps your DataFrame in a zero-cost proxy that records every transformation as a node in an in-memory Directed Acyclic Graph (DAG). Each node is identified by a deterministic SHA-256 hash of its parent node IDs, the operation name, the operation arguments, and the schema.

The key insight: identical transformations applied to identical data always produce the same node ID. Transformations are deterministic by construction, which makes them reproducible and cacheable.

What You Get

⏪ Instant Time-Travel

fb.checkout("before-lag") returns the exact frame at that checkpoint with no I/O unless you ask for it.

📊 Structural Diffing

frame.diff(other) shows exactly which rows were added or removed between any two checkpoints.

🎨 Beautiful Lineage Views

fb.visualize() renders a git-log-style tree in your terminal or an SVG graph in Jupyter.

🔄 Reproducibility

Identical transformations on identical data always produce the same node ID—no hidden state, no surprises.

How It Works

The Architecture

┌──────────────────────────────────────────────────────────┐
│  FlashbackFrame                                          │
│                                                          │
│  ┌──────────────┐    intercept    ┌───────────────────┐  │
│  │  Polars API  │ ─────────────▶ │   LineageDAG      │  │
│  │  .filter()   │                │                   │  │
│  │  .sort()     │  record node   │  root ──▶ filter  │  │
│  │  .join()     │ ◀──────────── │         ──▶ sort  │  │
│  └──────────────┘                │         ──▶ join  │  │
│         │                        └───────────────────┘  │
│         ▼                                               │
│  polars.DataFrame  (unchanged; Polars still optimises)  │
└──────────────────────────────────────────────────────────┘
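The interception step in the diagram can be sketched in plain Python. This is only an illustration of the proxy pattern, not flashback's actual internals: `Node`, `Lineage`, `TrackedFrame`, `ToyFrame`, and this `load` helper are hypothetical names, and `ToyFrame` stands in for `polars.DataFrame` so the sketch runs without Polars installed.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Node:
    """One recorded transformation (hypothetical structure)."""
    op: str
    args: tuple
    kwargs: dict
    parent: Optional[int]  # index of the parent node, None for the root


@dataclass
class Lineage:
    """Append-only node list plus label -> (node index, frame) tags."""
    nodes: list = field(default_factory=list)
    tags: dict = field(default_factory=dict)

    def record(self, op: str, args: tuple, kwargs: dict, parent: Optional[int]) -> int:
        self.nodes.append(Node(op, args, kwargs, parent))
        return len(self.nodes) - 1


class TrackedFrame:
    """Forwards method calls to the wrapped frame and records each one."""

    def __init__(self, inner: Any, lineage: Lineage, node: int) -> None:
        self._inner = inner
        self._lineage = lineage
        self._node = node

    def __getattr__(self, name: str) -> Any:
        # Only reached for attributes that TrackedFrame does not define itself.
        attr = getattr(self._inner, name)
        if not callable(attr):
            return attr

        def tracked(*args: Any, **kwargs: Any) -> Any:
            result = attr(*args, **kwargs)
            if isinstance(result, type(self._inner)):
                # The call produced a new frame: record it and keep wrapping.
                node = self._lineage.record(name, args, kwargs, self._node)
                return TrackedFrame(result, self._lineage, node)
            return result  # non-frame results pass through untouched

        return tracked

    def tag(self, label: str) -> "TrackedFrame":
        # Remember the current node and frame under a readable label.
        self._lineage.tags[label] = (self._node, self._inner)
        return self


class ToyFrame:
    """Stand-in for polars.DataFrame holding a list of numbers."""

    def __init__(self, rows: list) -> None:
        self.rows = rows

    def filter(self, keep) -> "ToyFrame":
        return ToyFrame([r for r in self.rows if keep(r)])

    def sort(self) -> "ToyFrame":
        return ToyFrame(sorted(self.rows))


def load(rows: list) -> TrackedFrame:
    lineage = Lineage()
    root = lineage.record("load", (), {}, None)
    return TrackedFrame(ToyFrame(rows), lineage, root)
```

Calling `load([3, -1, 2]).filter(lambda r: r > 0).tag("positive").sort()` on this sketch leaves three nodes in the lineage (`load`, `filter`, `sort`) while the wrapped frame does the actual work, which mirrors the "unchanged" box at the bottom of the diagram.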

Node Identity & Determinism

Each node is identified by a SHA-256 digest, truncated to 20 hex characters, of:

{
  "parents": ["<parent_node_id>"],
  "op": "filter",
  "kwargs": {"arg_0": "[(col(\"price\")) > (0)]"},
  "schema": {"id": "Int64", "price": "Float64", ...}
}

This means:

Identical pipelines on identical data always hash to the same node → instant cache hits
Changing any argument or parent state produces a different hash → no silent collisions
The hash is deterministic → reproducible across runs and environments
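The properties listed above follow from hashing a canonical serialisation of the payload. A minimal sketch, assuming the payload is serialised as JSON with sorted keys and that the ID is the leading 20 hex characters of the digest; the function name `node_id` and both of those choices are assumptions, not confirmed details of flashback:

```python
import hashlib
import json


def node_id(parents, op, kwargs, schema, length=20):
    """Derive a short, deterministic ID from a node's description."""
    payload = {
        "parents": list(parents),
        "op": op,
        "kwargs": kwargs,
        "schema": schema,
    }
    # Sorted keys and fixed separators give one canonical byte string per
    # payload, independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return digest[:length]
```

One design consequence of sorting keys: two schemas with the same columns in a different order hash identically. If column order should be part of a node's identity, the schema could be encoded as a list of `[name, dtype]` pairs instead of a mapping.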

Visualization

Terminal output with fb.visualize():

╭─ flashback lineage  •  4 commits  •  HEAD → rolling_mean ─────────────────╮
│                                                                             │
│  📂 LOAD  5,000 rows × 4 cols  [14:03:01]                                  │
│  │                                                                          │
│  ├─ 🔍 filter  arg_0=...col("price")...  4,823 rows × 4 cols  #a1b2c3d4   │
│  │                                                                          │
│  ├─ ➕ with_columns  arg_0=...alias("notional")  4,823 rows × 5  #e5f6a7  │
│  │                                                                          │
│  ├─ ⏪ lag  column='price'  n=1  4,823 rows × 6  [before-lag]  #b8c9d0    │
│  │                                                                          │
│  └─ 📈 rolling_mean  window=5  4,823 rows × 7 ● HEAD  #01e2f3a4           │
│                                                                             │
╰─────────────────────────────────────────────────────────────────────────────╯

Quickstart

Installation

pip install flashback-df
# or with uv (recommended):
uv add flashback-df

5-Minute Example

import flashback as fb

# ── 1. Load any source (pick one of the following) ──────────────────────────
df = fb.load("trades.parquet")          # Parquet
df = fb.load("prices.csv")              # CSV
df = fb.load(my_polars_df)              # Polars DataFrame
df = fb.load(my_pandas_df)              # Pandas DataFrame

# ── 2. Transform — every step is recorded automatically ─────────────────────
df = df.filter(fb.col("price") > 0)
df = df.with_columns(
    (fb.col("price") * fb.col("volume")).alias("notional")
)

# Tag a checkpoint before a risky operation
df = df.tag("before-lag")

df = df.lag("price", 1)                 # sugar for shift(1) + tracking
df = df.rolling_mean("notional", 5)

# ── 3. Time-travel ──────────────────────────────────────────────────────────
df_clean = fb.checkout("before-lag")    # instant; no disk I/O

# ── 4. See what broke your Sharpe ratio ─────────────────────────────────────
fb.visualize()
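The direction of the shift in the lag step is exactly where look-ahead bias tends to creep in, so it is worth seeing on a tiny series. The helper below is a hypothetical plain-Python stand-in that follows the same sign convention as Polars' `shift`: a positive `n` moves values towards later rows (a lag), a negative `n` pulls future values into earlier rows (a lead).

```python
def shift(values, n):
    """Shift a list by n positions, padding the vacated slots with None."""
    size = len(values)
    if n >= 0:
        k = min(n, size)
        # Lag: row t receives the value from row t - n.
        return [None] * k + list(values[: size - k])
    k = min(-n, size)
    # Lead: row t receives the value from row t + |n| (future information).
    return list(values[k:]) + [None] * k


prices = [10, 11, 12, 13]
lagged = shift(prices, 1)    # safe feature: yesterday's price
leaked = shift(prices, -1)   # look-ahead: tomorrow's price
```

Here `lagged` starts with a missing value and `leaked` ends with one, which is a quick way to tell the two apart when inspecting a checkpoint.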

API Reference

Core Functions

Function	Description
`fb.load(source, *, label=None)`	Load from a file (.parquet, .csv, .json) or from a Polars/Pandas DataFrame. Returns a FlashbackFrame.
`fb.col(name)`	Alias for `polars.col`. Use for IDE-friendly imports.
`fb.commit(frame, label, *, message="")`	Tag the current state with a human-readable label (like `git tag`).
`fb.checkout(label, *, frame=None)`	Time-travel to a named checkpoint. Returns a fully materialised FlashbackFrame.
`fb.visualize(frame=None, *, style="tree")`	Render the transformation lineage. Styles: "tree" (default), "dag", "svg" (Jupyter).

FlashbackFrame Methods

Method	Description
`.lag(column, n=1, *, alias=None)`	Shift `column` by `n` periods with tracking. Creates `column_lag_n`.
`.rolling_mean(column, window, *, alias=None)`	Rolling mean with tracking. Creates `column_rmean_window`.
`.tag(label, *, message="")`	Same as `fb.commit()` but called on the frame.
`.diff(other)`	Structural diff. Returns Polars DataFrame with `_diff` column.
`.history()`	Full transformation chain as list of dicts (root → HEAD).

Persistence & Serialisation

Lineage graphs can be saved to and loaded from disk for reproducible research:

from flashback.storage import Storage

# Save a lineage DAG
store = Storage(".flashback")  # or Storage.from_cwd()
store.save(df, frame_id="experiment-001")

# Later, in another session:
df = store.load("experiment-001")

The .flashback/ directory layout:

.flashback/
├── config.json
├── graphs/
│   └── experiment-001.json       # serialised DAG
└── cache/
    └── <node_id>.parquet        # materialised node snapshots

Efficient storage: Only materialises nodes you explicitly checkpoint or request. Intermediate DAG nodes are compressed and stored as metadata only.
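The metadata-only part of this layout can be approximated with the standard library. The sketch below writes and reads a graph file under `graphs/`; the helper names `save_graph` and `load_graph` and the JSON shape (a `frame_id` plus a list of node dicts) are assumptions made for illustration, not the format the `Storage` class actually uses.

```python
import json
from pathlib import Path


def save_graph(root, frame_id, nodes):
    """Write node metadata to <root>/graphs/<frame_id>.json and return the path."""
    graphs = Path(root) / "graphs"
    graphs.mkdir(parents=True, exist_ok=True)
    path = graphs / f"{frame_id}.json"
    document = {"frame_id": frame_id, "nodes": nodes}
    path.write_text(json.dumps(document, indent=2), encoding="utf-8")
    return path


def load_graph(root, frame_id):
    """Read back the node metadata saved by save_graph."""
    path = Path(root) / "graphs" / f"{frame_id}.json"
    return json.loads(path.read_text(encoding="utf-8"))["nodes"]
```

Because only node descriptions are written here, the file stays small regardless of frame size; materialised snapshots would live separately under `cache/`, keyed by node ID.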

Development & Testing

Quick Start for Contributors

git clone https://github.com/flashback-dev/flashback
cd flashback
pip install -e ".[dev]"

# Lint
ruff check flashback tests
ruff format --check flashback tests

# Type-check
mypy flashback

# Test with coverage
pytest

CI/CD Matrix

The CI pipeline validates across:

OS: Ubuntu, macOS, Windows (3 platforms)
Python: 3.10 – 3.13
Coverage: 90% (hard threshold)

Roadmap

Branching

fb.branch("experiment-A") for parallel pipeline exploration

Merge

Reconcile two branches at the DAG level with conflict detection

Remote Storage

Push/pull lineage graphs to S3, GCS, or Hugging Face Hub

Lazy Plans

Track Polars lazy evaluation before .collect()

Notebook Magic

%load_ext flashback with live DAG sidebar in Jupyter

DVC Export

Generate .dvc stage files from a flashback DAG

Why Use flashback?

🔍 Debug Pipeline Mysteries

Instantly see which transformation caused your metric to drop. No guessing. No log archaeology.

⏱️ Save Debugging Time

Time-travel to any checkpoint in milliseconds. No re-running 6-hour jobs.

📊 Reproducible Research

Deterministic node hashing means the same pipeline yields the same node IDs across runs and environments.

🚀 Zero Overhead

Wraps Polars without slowing it down. Lazy materialisation keeps memory use minimal.

🐼 Works with Pandas & Polars

Load from either. Convert between them. Mix and match in your pipeline.

📦 Lightweight

Pure Python. No database. No external services. Runs entirely locally.