KagglePipe

Workflow orchestration for Kaggle. Turn Kaggle into a remote feature-engineering cluster from your terminal.

kagglepipe feature run user_features --gpu t4x2

That's it. KagglePipe handles the rest: it packages your source code, generates a parameterized notebook, pushes the kernel, waits for completion, downloads the artifact, and stores it as a Parquet file.

Kaggle CLI manages Kaggle resources. KagglePipe manages Kaggle workflows.

Live Monitoring

kagglepipe monitor              # auto-refresh every 5s
kagglepipe monitor --refresh 2  # tighter refresh
kagglepipe monitor --once       # one-shot snapshot (CI / scripts)

The monitor is read-only — no forms, no editing, no mutations. It reads the existing .kagglepipe/ state (runs, submissions, experiments, manifests) and renders a clean 2x3 dashboard that answers the questions every Kaggle competitor has:

What is running? What completed? What produced my best score? Are my artifacts fresh?

┌─────────────────────────────────────────────────────────────────────────────┐
│   KagglePipe Monitor    Project: demotest    User: holamigohello          │
└─────────────────────────────────────────────────────────────────────────────┘
┌───────── Active Jobs ─────────┐┌─ Pipeline Overview ─┐┌─ Latest Artifacts ──┐
│branch             status      ││ --------- --------- ││ arti…   size   when │
│baseline           ●DONE · t4 x2││    50.0% complete   ││ base…    4.3    59m │
│user_features      ◐RUN · t4 x2││      Total  2       ││           MB    ago │
│                               ││   Complete  1       ││                     │
│                               ││    Running  1       ││                     │
│                               ││     Failed  0       ││                     │
└───────────────────────────────┘└─────────────────────┘└─────────────────────┘
┌── Latest Submission ───┐┌─ ★ Best Submission Eve─┐┌── Experiment Summary ───┐
│  Competi…  titanic-2…  ││     Score  0.87234     ││    Experiments  5       │
│     Score  0.87234     ││      Rank  #15         ││      Manifests  4       │
│      Rank  #15         ││  Git commit  a7d9c13   ││     Cache hits  0       │
│  Submitt…  1h ago      ││   Experiment  exp-04   ││       Features  3       │
└────────────────────────┘└────────────────────────┘└────────────────────────┘

Empty states are handled gracefully — a freshly-initialized project shows "No active jobs" / "No submissions recorded" / "No artifacts yet" panels that look just as good as the populated ones.

The Problem

Most serious Kaggle competitors eventually end up with the same mess:

A local codebase with feature engineering scripts
Multiple feature branches — each run differently
GPU training jobs scattered across manual notebooks
Notebook generation hell: copy-paste-edit-repeat per branch
Dataset versioning by hand: src-v1, src-v2, src-v3...
Source code that needs to be synced to Kaggle before every run
Waiting for kernels to finish — then checking the web UI
Downloading outputs, renaming files, organizing artifacts
kernel-metadata.json that needs to stay in sync with your local config
The same 12-step workflow repeated every time you want to iterate

It's not a Kaggle problem. It's a workflow problem. And everyone solves it the same way: custom scripts, Makefiles, CI pipelines, shell aliases — eventually building their own internal tooling.

KagglePipe is that tooling, built for everyone.

What KagglePipe Does

Instead of manually:

Packaging source code into a tarball
Uploading it as a Kaggle Dataset
Generating a parameterized notebook per branch
Creating kernel-metadata.json
Pushing a kernel
Polling kaggle kernels status until it completes
Downloading output artifacts
Organizing everything into a feature store

You run one command:

kagglepipe feature run user_features --gpu t4x2

And KagglePipe orchestrates the entire pipeline — end to end, from your terminal.

KagglePipe treats Kaggle kernels as remote workers and Kaggle datasets as versioned artifacts.
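
For orientation, the manual steps above map roughly onto the following Kaggle CLI invocations. This is a minimal sketch, not KagglePipe's actual implementation: the function name `manual_steps`, its parameters, and the commit message are hypothetical, and only the `kaggle` subcommands and flags themselves come from the official CLI.

```python
from __future__ import annotations


def manual_steps(src_dir: str, kernel_dir: str, kernel_slug: str,
                 out_dir: str, dataset_exists: bool) -> list[list[str]]:
    """Return the Kaggle CLI calls behind one feature run, in order."""
    if dataset_exists:
        # An existing dataset gets a new version (v2+).
        upload = ["kaggle", "datasets", "version", "-p", src_dir,
                  "-m", "sync source"]
    else:
        # A missing dataset has to be created first (v1).
        upload = ["kaggle", "datasets", "create", "-p", src_dir]
    return [
        upload,
        ["kaggle", "kernels", "push", "-p", kernel_dir],
        # In practice this call is repeated until the kernel finishes.
        ["kaggle", "kernels", "status", kernel_slug],
        ["kaggle", "kernels", "output", kernel_slug, "-p", out_dir],
    ]


for cmd in manual_steps("build/src", "kaggle_notebooks",
                        "user/myproj-user_features", "features_kaggle",
                        dataset_exists=True):
    print(" ".join(cmd))
```

Notebook rendering and writing kernel-metadata.json happen locally before the push, so they do not appear as CLI calls in this list.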

Why Not Just Use Kaggle CLI?

Kaggle CLI gives you primitives — raw API operations like datasets version, kernels push, kernels output. You can wire these together yourself. People do. That's how every serious competitor ends up with a Makefile or a run_kaggle.sh script by month two.

The objection is fair: why not just script it yourself?

You can, and many competitors do. But the do-it-yourself route has recurring costs:

Scripting dataset versioning yourself is brittle — kaggle datasets list --search doesn't reliably match slugs, so version detection breaks silently
Notebook generation by copy-paste doesn't scale — add a new branch, update 4 files
Polling loops are tedious — and they crash on Windows cp1252 consoles when the CLI emits box-drawing characters
Source upload + kernel push + artifact download is 6 steps that should be 1

KagglePipe packages the patterns that experienced Kaggle competitors build anyway — into a reusable, versioned, configurable tool.
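
As one illustration of such a pattern, the cp1252 crash can be avoided by running the CLI in Python's UTF-8 mode and decoding its output leniently. The sketch below is an assumption about how that might look, based only on the `python -X utf8 -m kaggle` note in the project layout; the helper names `kaggle_command` and `run_utf8` are hypothetical.

```python
import subprocess
import sys


def kaggle_command(*args: str) -> list:
    """Build a command that runs the Kaggle CLI under Python's UTF-8 mode."""
    return [sys.executable, "-X", "utf8", "-m", "kaggle", *args]


def run_utf8(cmd: list) -> subprocess.CompletedProcess:
    """Run a command and decode its output as UTF-8.

    errors="replace" means undecodable bytes become U+FFFD instead of
    raising, so a polling loop does not die on unexpected output.
    """
    return subprocess.run(
        cmd,
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=False,
    )


# Demo with a child process that prints box-drawing characters,
# standing in for the Kaggle CLI so no credentials are needed.
demo = run_utf8([sys.executable, "-X", "utf8", "-c",
                 "print(chr(0x250c) + chr(0x2500) + chr(0x2510))"])
print(demo.stdout.strip())
```

The key design choice is that the parent never relies on the console code page: the child is forced into UTF-8 and the parent decodes explicitly.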

Mental Model

Git                        ~  Kaggle CLI
GitHub Actions             ~  KagglePipe

GitHub Actions builds on Git to add workflow orchestration.
KagglePipe builds on the Kaggle CLI to add workflow orchestration.

GitHub Actions doesn't replace Git — it sits on top of it.
KagglePipe doesn't replace the Kaggle CLI — it sits on top of it.

Kaggle CLI is the engine. KagglePipe is the vehicle.

Install

git clone https://github.com/AdityaProCoder/kagglepipe && cd kagglepipe
python -m venv .venv
.venv\Scripts\python.exe -m pip install -e ".[dev]"   # Windows
.venv/bin/python -m pip install -e ".[dev]"           # Linux/macOS

kagglepipe --version        # -> kagglepipe 0.1.0
kagglepipe whoami           # verify credentials

Credentials via ~/.kaggle/kaggle.json, or set KAGGLE_USERNAME / KAGGLE_KEY.

Configure

cd ~/my-kaggle-project
kagglepipe config init --name myproj
$EDITOR kaggle.toml

[project]
name = "myproj"

[source]
include = ["src", "configs", "scripts", "pyproject.toml"]
exclude_dirs = [".venv", "data", "models", ".git", "__pycache__"]
exclude_exts = [".parquet", ".lgb", ".pt", ".bin"]
src_dataset_slug = "{username}/myproj-src"

[data]
dataset_slug = "{username}/myproj-data"

[feature]
branches = ["user_features", "graph_features", "embedding_features"]
heavy_branches = ["graph_features", "embedding_features"]
default_gpu = "t4x2"
kernel_slug_template = "{username}/myproj-{branch}"
kernel_title_prefix = "myproj"
notebook_command = "python scripts/run.py --out {out_dir}"
output_glob = "{branch}.parquet"

[kernels]
is_private = true
enable_internet = true

[paths]
notebooks_dir = "kaggle_notebooks"
features_dir  = "features_kaggle"

Every field accepts env-var overrides: KAGGLEPIPE_<SECTION>__<FIELD> (e.g. KAGGLEPIPE_FEATURE__DEFAULT_GPU=p100).
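
A minimal sketch of how such overrides could be resolved is shown below. It assumes the loaded kaggle.toml is a nested dict keyed by lowercase section and field names, and it keeps override values as plain strings; the function name `apply_env_overrides` is hypothetical, and any type coercion KagglePipe performs for booleans or lists is not modeled here.

```python
from __future__ import annotations

import copy
from typing import Mapping

PREFIX = "KAGGLEPIPE_"


def apply_env_overrides(config: dict, environ: Mapping[str, str]) -> dict:
    """Return a copy of config with KAGGLEPIPE_<SECTION>__<FIELD> applied."""
    merged = copy.deepcopy(config)
    for key, value in environ.items():
        if not key.startswith(PREFIX):
            continue
        # Split "FEATURE__DEFAULT_GPU" into section and field at the
        # first double underscore.
        section, sep, field = key[len(PREFIX):].partition("__")
        if not sep or not section or not field:
            continue  # not of the documented form, ignore it
        merged.setdefault(section.lower(), {})[field.lower()] = value
    return merged


base = {"feature": {"default_gpu": "t4x2"}}
env = {"KAGGLEPIPE_FEATURE__DEFAULT_GPU": "p100", "HOME": "/home/user"}
print(apply_env_overrides(base, env)["feature"]["default_gpu"])
```

In real use the second argument would be `os.environ`; passing a plain mapping keeps the function easy to test.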

Core Workflows

Source Dataset Management

Package your local codebase and upload it as a versioned Kaggle Dataset. KagglePipe auto-detects whether to create (v1) or version (v2+).

kagglepipe src upload
# Packaging . -> user/myproj-src v3
# Built tarball: /tmp/src.tar.gz
# Uploaded: user/myproj-src v3
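
The packaging step can be pictured with the sketch below, which applies the `include`, `exclude_dirs`, and `exclude_exts` settings from the [source] section using the standard `tarfile` module. It is an illustration under assumptions, not the project's code: the `root` and `out_path` parameters are added here for self-containment, and the real `build_tarball` may differ in signature and behavior.

```python
from __future__ import annotations

import tarfile
from pathlib import Path


def build_tarball(root: Path, out_path: Path, include: list[str],
                  exclude_dirs: list[str], exclude_exts: list[str]) -> list[str]:
    """Write a gzip tarball of the included paths; return archived names."""
    archived: list[str] = []
    with tarfile.open(out_path, "w:gz") as tar:
        for entry in include:
            top = root / entry
            if not top.exists():
                continue  # silently skip include entries that are absent
            if top.is_file():
                files = [top]
            else:
                files = sorted(p for p in top.rglob("*") if p.is_file())
            for path in files:
                rel = path.relative_to(root)
                # Skip files that live under any excluded directory name.
                if any(part in exclude_dirs for part in rel.parts[:-1]):
                    continue
                if path.suffix in exclude_exts:
                    continue
                tar.add(path, arcname=rel.as_posix())
                archived.append(rel.as_posix())
    return archived
```

Returning the archived names makes it easy to log or assert exactly what was shipped to Kaggle.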

Single Feature Branch Execution

Render a parameterized notebook, push it as a kernel, poll until complete, download the output artifact — in one command.

kagglepipe feature run user_features --gpu t4x2
# Wrote notebook: kaggle_notebooks/extract_user_features.ipynb
# Pushed kernel: user/myproj-user_features
# Kernel state: complete
# Downloaded: features_kaggle/user_features.parquet
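
The "poll until complete" part of this command can be sketched as a loop with an interval and a timeout. The version below is an assumption about the general shape, not the project's `poll_kernel_status`: the function name, defaults, and the set of terminal states are hypothetical (only `complete`, `error`, and `timeout` appear elsewhere in this README), and the status lookup is injected so the loop can be exercised without calling Kaggle.

```python
from __future__ import annotations

import time
from typing import Callable

# Assumed terminal state names; the real set may be larger.
TERMINAL_STATES = {"complete", "error"}


def poll_until_done(get_state: Callable[[], str],
                    interval: float = 30.0,
                    timeout: float = 3600.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> str:
    """Call get_state() until it is terminal or the timeout would pass."""
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        # Give up rather than sleep past the deadline.
        if clock() + interval > deadline:
            return "timeout"
        sleep(interval)


# Demo with a scripted sequence of states and no real sleeping.
states = iter(["queued", "running", "complete"])
print(poll_until_done(lambda: next(states), sleep=lambda s: None))
```

In real use `get_state` would wrap a `kaggle kernels status <slug>` call and parse its output.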

Full Feature Pipeline Execution

Run all configured branches sequentially, with a summary.

kagglepipe feature all --gpu t4x2
# === user_features ===
# === graph_features ===
# === embedding_features ===
# === Summary (2,180s) ===
# Total: 3, OK: 3, Failed: 0

ls features_kaggle/
# embedding_features.parquet  graph_features.parquet  user_features.parquet
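
The sequential run with a summary might look like the sketch below. It assumes a failing branch is recorded and the remaining branches still run; whether KagglePipe continues or stops on failure is not stated here, and `run_all` and `run_branch` are hypothetical names.

```python
from __future__ import annotations

import time
from typing import Callable


def run_all(branches: list[str], run_branch: Callable[[str], bool]) -> dict:
    """Run each branch in order and return counts of OK and failed runs."""
    started = time.monotonic()
    results: dict[str, bool] = {}
    for branch in branches:
        print(f"=== {branch} ===")
        try:
            results[branch] = bool(run_branch(branch))
        except Exception as exc:
            # Assumption: one failing branch should not stop the others.
            print(f"{branch} failed: {exc}")
            results[branch] = False
    elapsed = time.monotonic() - started
    ok = sum(results.values())
    summary = {"total": len(results), "ok": ok, "failed": len(results) - ok}
    print(f"=== Summary ({elapsed:,.0f}s) ===")
    print(f"Total: {summary['total']}, OK: {summary['ok']}, "
          f"Failed: {summary['failed']}")
    return summary


summary = run_all(
    ["user_features", "graph_features", "embedding_features"],
    run_branch=lambda branch: True,  # stand-in for a real feature run
)
```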

Real-World Example

A competition team with three feature branches working in parallel:

kagglepipe src upload              # sync source (auto-versioned)
kagglepipe feature all --gpu t4x2  # run all three feature pipelines

After the run:

features_kaggle/
  embedding_features.parquet   # 384-dim embeddings from a vision model
  graph_features.parquet        # graph connectivity features
  user_features.parquet         # hand-crafted user signals

# Each parquet is ready to feed directly into a LightGBM stacker

No manual notebook editing. No checking the web UI. No renaming files. Two commands, three feature pipelines on Kaggle's free GPU hardware.

End-to-End Flow

Local codebase                      Kaggle infrastructure
────────────────────                ─────────────────────────────────────
src/                            ──► │  Kaggle Dataset (versioned source)
configs/                            │
scripts/                            │
                                    │
kagglepipe feature run <branch>     │  Kernel (GPU) executes the pipeline
                                    │
                                    ▼
features_kaggle/                ◄── │  Output artifacts downloaded
  branch-a.parquet
  branch-b.parquet

Kaggle CLI vs KagglePipe

| Task | Kaggle CLI | KagglePipe |
| --- | --- | --- |
| Upload source code | `kaggle datasets create` / `kaggle datasets version` | `kagglepipe src upload` |
| Detect next version | Manual | Auto (queries existing versions) |
| Generate a notebook | Manual (copy-paste-edit) | Template rendering (Jinja2) |
| Push a kernel | `kaggle kernels push` | `kagglepipe feature run` |
| Poll for completion | `kaggle kernels status` (manual loop) | Auto (configurable interval + timeout) |
| Download outputs | `kaggle kernels output` | Auto (glob-matched, placed in features dir) |
| Run multiple branches | Sequential manual calls | `kagglepipe feature all` |
| Orchestrate the whole pipeline | DIY scripts + Makefiles | `kagglepipe feature run <branch>` |

Kaggle CLI is the engine. KagglePipe is the vehicle.

Who Should Use KagglePipe?

Good fit:

Serious Kaggle competitors running multi-branch feature pipelines
Competition teams with shared feature engineering codebases
Users running GPU-heavy feature extraction on Kaggle's free hardware
ML engineers who want to develop locally and execute remotely

Not necessary:

Casual Kaggle users who submit a few notebooks manually
People who only use Kaggle's web editor
Simple single-submission workflows

Design Philosophy

Thin layer over the official Kaggle CLI — no API magic, just better workflow
Configuration-driven — kaggle.toml encodes your workflow, not your code
Reproducible workflows — same config, same result every run
Local-first development — iterate on your code, push when ready
Remote execution on Kaggle infrastructure — free GPU time, no local hardware needed

Roadmap

Workflow features shipped:

Parallel branch execution ✅ — feature all --parallel 3
Retry/resume failed runs ✅ — feature retry failed / --resume
Submission automation ✅ — kagglepipe submit + submissions list/latest
Dependency graphs ✅ — feature build <target>
Artifact caching ✅ — cache status / cache clear
Experiment tracking ✅ — experiments record/list/show
Feature registry ✅ — features list/show
Dataset lineage ✅ — lineage show <feature>
Dry-run mode ✅ — feature run --dry-run / src upload --dry-run
Pre-flight validation ✅ — kagglepipe validate
Leaderboard tracking ✅ — submissions watch / leaderboard latest
Submission provenance ✅ — submissions best / submissions show <id> (P11.5)
Project templates ✅ — template init tabular|cv|nlp
Strong run manifests ✅ — every run writes a JSON manifest to .kagglepipe/manifests/
Reproducibility bundles ✅ — run export <branch> / run reproduce <bundle.tar.gz>
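
To make the dependency-graph item concrete, the sketch below shows one way a build order for `feature build <target>` could be derived with the standard library's `graphlib`. The dependency mapping and the function name `build_plan` are hypothetical; how KagglePipe actually declares and resolves dependencies is not described in this README.

```python
from __future__ import annotations

from graphlib import TopologicalSorter


def build_plan(target: str, deps: dict[str, list[str]]) -> list[str]:
    """Return target plus its transitive dependencies, dependencies first."""
    needed: dict[str, list[str]] = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        # Features with no entry in deps are treated as having none.
        needed[name] = list(deps.get(name, []))
        stack.extend(needed[name])
    # TopologicalSorter takes node -> predecessors and yields
    # predecessors before the nodes that depend on them.
    return list(TopologicalSorter(needed).static_order())


# Hypothetical dependency declarations between the example branches.
deps = {
    "embedding_features": ["graph_features"],
    "graph_features": ["user_features"],
}
print(build_plan("embedding_features", deps))
```

A cyclic declaration raises `graphlib.CycleError`, which is a reasonable point to fail before any kernel is pushed.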

Roadmap for v1.0 (no more major features, focus on adoption):

PyPI release
Screencast / GIF demo
Bug fixes from real-user feedback
Performance improvements
Better error messages
More template types (recsys, time-series, RL)

The goal is no longer more features. The goal is adoption, reliability, and becoming the standard workflow tool for serious Kaggle users.

Visual Demo

A terminal recording of kagglepipe feature run in action would convey the workflow faster than any documentation. Contributions of a GIF or screen recording showing the full src upload → kernel polling → artifact download cycle are welcome; a recording would make the project much easier to evaluate at a glance.

Full Command Reference

| Command | Description |
| --- | --- |
| `kagglepipe whoami` | Print verified username |
| `kagglepipe login` | Bootstrap `~/.kaggle/kaggle.json` |
| `kagglepipe config init` | Scaffold `kaggle.toml` |
| `kagglepipe config show [--json]` | Print effective config (human or JSON) |
| `kagglepipe validate [--json]` | Pre-flight checks (P10); `--json` emits machine-readable output |
| `kagglepipe template init <type>` | Scaffold a starter project (tabular/cv/nlp) (P12) |
| `kagglepipe template list` | List available templates |
| `kagglepipe src upload [--version N] [--dry-run]` | Package & push source dataset |
| `kagglepipe feature run <branch> [--dry-run]` | Render notebook → push → poll → download |
| `kagglepipe feature all [--parallel N] [--resume]` | Run all configured branches; N>=2 for concurrency |
| `kagglepipe feature retry [selector]` | Re-run failed/error/timeout/all branches (P2) |
| `kagglepipe feature resume` | Resume, skipping branches that already completed (P2) |
| `kagglepipe feature build <target>` | Run a feature plus its declared dependencies (P4) |
| `kagglepipe feature plan <target>` | Print the dependency plan for `<target>` (P4) |
| `kagglepipe status [--all] [--csv]` | List your kernels |
| `kagglepipe kernels list` | List kernels with filters |
| `kagglepipe kernels status <slug>` | Live kernel status |
| `kagglepipe kernels output <slug>` | Download kernel output directory |
| `kagglepipe kernels logs <slug>` | Print logs URL |
| `kagglepipe kernels stop <slug>` | Cancel a running kernel |
| `kagglepipe datasets list` | List your datasets |
| `kagglepipe datasets get <slug> <path>` | Download a dataset |
| `kagglepipe datasets create <dir>` | Create a new dataset |
| `kagglepipe datasets version <dir> -m "msg"` | New version of existing dataset |
| `kagglepipe competitions list` | Active competitions |
| `kagglepipe competitions submit <comp> <file> -m "msg"` | Submit to a competition |
| `kagglepipe competitions leaderboard <comp>` | Competition leaderboard |
| `kagglepipe submit [--competition X] [--file f] [--train]` | Submit a file (P3) |
| `kagglepipe submissions list\|latest\|watch\|best\|show` | Submission history + provenance (P3/P11/P11.5) |
| `kagglepipe leaderboard latest <competition> [--top N] [--json]` | Top-of-leaderboard view (P11) |
| `kagglepipe cache status\|clear` | Artifact cache (P5) |
| `kagglepipe experiments record\|list\|show` | Experiment tracking (P6) |
| `kagglepipe features list\|show` | Feature registry (P7) |
| `kagglepipe lineage show\|add-parent\|remove` | Dataset lineage (P8) |
| `kagglepipe run export <branch\|manifest>` | Export a run as a portable tarball (P14) |
| `kagglepipe run reproduce <bundle.tar.gz>` | Reproduce a run from a bundle (P14) |

Run kagglepipe <cmd> --help for all flags.

Project Layout

src/kagglepipe/
  cli.py              argparse root + dispatch
  config.py           kaggle.toml loader + env overrides
  credentials.py      ~/.kaggle/kaggle.json + KAGGLE_USERNAME/KEY
  runner.py           subprocess wrapper (UTF-8 safe, python -X utf8 -m kaggle)
  slug.py             {username}/{branch} template resolver
  tarball.py          build_tarball(include, exclude_dirs, exclude_exts)
  notebook.py         render Jinja2 notebook + kernel-metadata.json
  polling.py          poll_kernel_status(...)
  kaggle_api.py       high-level wrappers around the kaggle CLI
  commands/           one module per command group
  templates/          default notebook template
tests/                80 unit tests + 1 live integration test
docs/quickstart.md    step-by-step walkthrough

License

MIT
