Workflow orchestration for Kaggle. Turn Kaggle into a remote feature-engineering cluster from your terminal.
kagglepipe feature run user_features --gpu t4x2
That's it. KagglePipe handles the rest: package source code, generate a parameterized notebook, push the kernel, wait for completion, download the artifact, and store it as a parquet file.
Kaggle CLI manages Kaggle resources. KagglePipe manages Kaggle workflows.
kagglepipe monitor # auto-refresh every 5s
kagglepipe monitor --refresh 2 # tighter refresh
kagglepipe monitor --once # one-shot snapshot (CI / scripts)
The monitor is read-only: no forms, no editing, no mutations. It reads
the existing .kagglepipe/ state (runs, submissions, experiments, manifests)
and renders a clean 2x3 dashboard that answers the questions every Kaggle
competitor has:
What is running? What completed? What produced my best score? Are my artifacts fresh?
┌─────────────────────────────────────────────────────────────────────────────┐
│ KagglePipe Monitor Project: demotest User: holamigohello │
└─────────────────────────────────────────────────────────────────────────────┘
┌───────── Active Jobs ─────────┐┌─ Pipeline Overview ─┐┌─ Latest Artifacts ──┐
│branch status ││ --------- --------- ││ arti… size when │
│baseline ●DONE · t4 x2││ 50.0% complete ││ base… 4.3 59m │
│user_features ◐RUN · t4 x2││ Total 2 ││ MB ago │
│ ││ Complete 1 ││ │
│ ││ Running 1 ││ │
│ ││ Failed 0 ││ │
└───────────────────────────────┘└─────────────────────┘└─────────────────────┘
┌── Latest Submission ───┐┌─ ★ Best Submission Eve─┐┌── Experiment Summary ───┐
│ Competi… titanic-2… ││ Score 0.87234 ││ Experiments 5 │
│ Score 0.87234 ││ Rank #15 ││ Manifests 4 │
│ Rank #15 ││ Git commit a7d9c13 ││ Cache hits 0 │
│ Submitt… 1h ago ││ Experiment exp-04 ││ Features 3 │
└────────────────────────┘└────────────────────────┘└────────────────────────┘
Empty states are handled gracefully — a freshly-initialized project shows "No active jobs" / "No submissions recorded" / "No artifacts yet" panels that look just as good as the populated ones.
Most serious Kaggle competitors eventually end up with the same mess:
- A local codebase with feature engineering scripts
- Multiple feature branches, each run differently
- GPU training jobs scattered across manual notebooks
- Notebook generation hell: copy-paste-edit-repeat per branch
- Dataset versioning by hand: src-v1, src-v2, src-v3...
- Source code that needs to be synced to Kaggle before every run
- Waiting for kernels to finish, then checking the web UI
- Downloading outputs, renaming files, organizing artifacts
- kernel-metadata.json that needs to stay in sync with your local config
- The same 12-step workflow repeated every time you want to iterate
It's not a Kaggle problem. It's a workflow problem. And everyone solves it the same way: custom scripts, Makefiles, CI pipelines, shell aliases — eventually building their own internal tooling.
KagglePipe is that tooling, built for everyone.
Instead of manually:
- Packaging source code into a tarball
- Uploading it as a Kaggle Dataset
- Generating a parameterized notebook per branch
- Creating kernel-metadata.json
- Pushing a kernel
- Polling kaggle kernels status until it completes
- Downloading output artifacts
- Organizing everything into a feature store
You run one command:
kagglepipe feature run user_features --gpu t4x2
And KagglePipe orchestrates the entire pipeline, end to end, from your terminal.
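For orientation, the manual steps listed above map roughly onto the Kaggle CLI calls below. This is a sketch of the sequence a hand-written script would chain together, not KagglePipe's actual implementation: `build_manual_steps` and its parameters are hypothetical names, and the function only builds argument lists without executing anything.

```python
from typing import List


def build_manual_steps(src_dir: str, kernel_dir: str, kernel_slug: str,
                       out_dir: str, first_upload: bool = False) -> List[List[str]]:
    """Return the Kaggle CLI calls a DIY script would run, in order.

    Hypothetical helper for illustration only; nothing is executed here.
    """
    if first_upload:
        # A dataset that does not exist yet has to be created (v1).
        upload = ["kaggle", "datasets", "create", "-p", src_dir]
    else:
        # An existing dataset gets a new version (v2+).
        upload = ["kaggle", "datasets", "version", "-p", src_dir, "-m", "sync source"]
    return [
        upload,
        ["kaggle", "kernels", "push", "-p", kernel_dir],
        ["kaggle", "kernels", "status", kernel_slug],  # repeated until done
        ["kaggle", "kernels", "output", kernel_slug, "-p", out_dir],
    ]


for cmd in build_manual_steps("build/src", "kaggle_notebooks",
                              "user/myproj-user_features", "features_kaggle"):
    print(" ".join(cmd))
```

Packaging the tarball, rendering the notebook, and writing kernel-metadata.json still have to happen before these calls, which is where most of the per-branch copy-paste work comes from.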
KagglePipe treats Kaggle kernels as remote workers and Kaggle datasets as versioned artifacts.
Kaggle CLI gives you primitives — raw API operations like
datasets version, kernels push, kernels output. You can wire these together
yourself. People do. That's how every serious competitor ends up with a
Makefile or a run_kaggle.sh script by month two.
The objection is fair: why not just script it myself?
The answer: you can. People do. But:
- Scripting dataset versioning yourself is brittle: kaggle datasets list --search doesn't reliably match slugs, so version detection breaks silently
- Notebook generation by copy-paste doesn't scale: add a new branch, update 4 files
- Polling loops are tedious, and they crash on Windows cp1252 consoles when the CLI emits box-drawing characters
- Source upload + kernel push + artifact download is 6 steps that should be 1
KagglePipe packages the patterns that experienced Kaggle competitors build anyway — into a reusable, versioned, configurable tool.
Git ~ Kaggle CLI
GitHub Actions ~ KagglePipe
GitHub Actions builds on Git to add workflow orchestration.
KagglePipe builds on the Kaggle CLI to add workflow orchestration.
GitHub Actions doesn't replace Git — it sits on top of it.
KagglePipe doesn't replace the Kaggle CLI — it sits on top of it.
Kaggle CLI is the engine. KagglePipe is the vehicle.
git clone https://github.com/AdityaProCoder/kagglepipe && cd kagglepipe
python -m venv .venv
.venv\Scripts\python.exe -m pip install -e ".[dev]" # Windows
.venv/bin/python -m pip install -e ".[dev]" # Linux/macOS
kagglepipe --version # -> kagglepipe 0.1.0
kagglepipe whoami # verify credentials
Credentials are read from ~/.kaggle/kaggle.json, or from the KAGGLE_USERNAME / KAGGLE_KEY environment variables.
cd ~/my-kaggle-project
kagglepipe config init --name myproj
$EDITOR kaggle.toml

[project]
name = "myproj"
[source]
include = ["src", "configs", "scripts", "pyproject.toml"]
exclude_dirs = [".venv", "data", "models", ".git", "__pycache__"]
exclude_exts = [".parquet", ".lgb", ".pt", ".bin"]
src_dataset_slug = "{username}/myproj-src"
[data]
dataset_slug = "{username}/myproj-data"
[feature]
branches = ["user_features", "graph_features", "embedding_features"]
heavy_branches = ["graph_features", "embedding_features"]
default_gpu = "t4x2"
kernel_slug_template = "{username}/myproj-{branch}"
kernel_title_prefix = "myproj"
notebook_command = "python scripts/run.py --out {out_dir}"
output_glob = "{branch}.parquet"
[kernels]
is_private = true
enable_internet = true
[paths]
notebooks_dir = "kaggle_notebooks"
features_dir = "features_kaggle"

Every field accepts env-var overrides: KAGGLEPIPE_<SECTION>__<FIELD>
(e.g. KAGGLEPIPE_FEATURE__DEFAULT_GPU=p100).
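One way to picture the KAGGLEPIPE_<SECTION>__<FIELD> convention is as a merge of environment variables over the parsed kaggle.toml. The sketch below is an assumption about how such an override could work, not the code in config.py: `apply_env_overrides` is a hypothetical name, and values are kept as raw strings with no type coercion.

```python
import os
from typing import Any, Dict, Mapping, Optional

PREFIX = "KAGGLEPIPE_"


def apply_env_overrides(config: Dict[str, Dict[str, Any]],
                        environ: Optional[Mapping[str, str]] = None
                        ) -> Dict[str, Dict[str, Any]]:
    """Return a copy of `config` with KAGGLEPIPE_<SECTION>__<FIELD> applied."""
    environ = os.environ if environ is None else environ
    merged = {section: dict(fields) for section, fields in config.items()}
    for key, value in environ.items():
        if not key.startswith(PREFIX) or "__" not in key:
            continue
        # "KAGGLEPIPE_FEATURE__DEFAULT_GPU" -> ("feature", "default_gpu")
        section, field = key[len(PREFIX):].lower().split("__", 1)
        # Assumption: values stay strings; the real loader may coerce types.
        merged.setdefault(section, {})[field] = value
    return merged


config = {"feature": {"default_gpu": "t4x2"}}
env = {"KAGGLEPIPE_FEATURE__DEFAULT_GPU": "p100"}
print(apply_env_overrides(config, env)["feature"]["default_gpu"])  # p100
```

The double underscore separates section from field, so field names that contain a single underscore (default_gpu, is_private) stay unambiguous.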
Package your local codebase and upload it as a versioned Kaggle Dataset.
KagglePipe auto-detects whether to create (v1) or version (v2+).
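The packaging step can be sketched with the standard library as follows. This is a minimal illustration of how the [source] include / exclude_dirs / exclude_exts fields might be applied, not the project's tarball.py: the `root` and `out_path` parameters are assumptions added to make the example self-contained.

```python
import tarfile
from pathlib import Path
from typing import Iterable, List


def build_tarball(root: Path, out_path: Path, include: Iterable[str],
                  exclude_dirs: Iterable[str],
                  exclude_exts: Iterable[str]) -> List[str]:
    """Write a gzip tarball of the included paths; return the archived names."""
    skip_dirs, skip_exts = set(exclude_dirs), set(exclude_exts)
    archived: List[str] = []
    with tarfile.open(out_path, "w:gz") as tar:
        for entry in include:
            base = root / entry
            if not base.exists():
                continue  # silently skip include entries that are absent
            if base.is_file():
                files = [base]
            else:
                files = sorted(p for p in base.rglob("*") if p.is_file())
            for path in files:
                rel = path.relative_to(root)
                # Drop anything under an excluded directory or with an
                # excluded extension (e.g. __pycache__, .parquet).
                if skip_dirs.intersection(rel.parts) or path.suffix in skip_exts:
                    continue
                tar.add(path, arcname=rel.as_posix())
                archived.append(rel.as_posix())
    return archived
```

Writing the archive outside the included directories (for example to a temp directory) avoids the tarball picking itself up on the next run.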
kagglepipe src upload
# Packaging . -> user/myproj-src v3
# Built tarball: /tmp/src.tar.gz
# Uploaded: user/myproj-src v3

Render a parameterized notebook, push it as a kernel, poll until complete, and download the output artifact in one command.
kagglepipe feature run user_features --gpu t4x2
# Wrote notebook: kaggle_notebooks/extract_user_features.ipynb
# Pushed kernel: user/myproj-user_features
# Kernel state: complete
# Downloaded: features_kaggle/user_features.parquet

Run all configured branches sequentially, with a summary.
kagglepipe feature all --gpu t4x2
# === user_features ===
# === graph_features ===
# === embedding_features ===
# === Summary (2,180s) ===
# Total: 3, OK: 3, Failed: 0
ls features_kaggle/
# embedding_features.parquet graph_features.parquet user_features.parquet

A competition team with three feature branches working in parallel:
kagglepipe src upload # sync source (auto-versioned)
kagglepipe feature all --gpu t4x2 # run all three feature pipelines

After the run:
features_kaggle/
embedding_features.parquet # 384-dim embeddings from a vision model
graph_features.parquet # graph connectivity features
user_features.parquet # hand-crafted user signals
# Each parquet is ready to feed directly into a LightGBM stacker
No manual notebook editing. No checking the web UI. No renaming files. One command, three feature pipelines on Kaggle's free GPU hardware.
Local codebase Kaggle infrastructure
──────────────────── ─────────────────────────────────────
│ │
src/ ──► │ Kaggle Dataset (versioned source)
configs/ │
scripts/ │
│
kagglepipe feature run <branch> │ Kernel (GPU) executes the pipeline
│
▼
features_kaggle/ ◄── │ Output artifacts downloaded
branch-a.parquet
branch-b.parquet
| Task | Kaggle CLI | KagglePipe |
|---|---|---|
| Upload source code | datasets create / datasets version | kagglepipe src upload |
| Detect next version | Manual | Auto (queries existing versions) |
| Generate a notebook | Manual (copy-paste-edit) | Template rendering (Jinja2) |
| Push a kernel | kernels push | kagglepipe feature run |
| Poll for completion | kaggle kernels status (manual loop) | Auto (configurable interval + timeout) |
| Download outputs | kaggle kernels output | Auto (glob-matched, placed in features dir) |
| Run multiple branches | Sequential manual calls | kagglepipe feature all |
| Orchestrate the whole pipeline | DIY scripts + Makefiles | kagglepipe feature run <branch> |
Kaggle CLI is the engine. KagglePipe is the vehicle.
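The "configurable interval + timeout" polling in the table can be pictured as a loop like the one below. It is a generic sketch, not the contents of polling.py: `poll_until_done` is a hypothetical name, the status source is injected as a callable so no Kaggle call is made here, and only the "complete" state appears in this README (the other terminal state names are assumptions).

```python
import time
from typing import Callable

# "complete" is shown in the example output above; the others are assumed.
TERMINAL_STATES = {"complete", "error", "cancelled"}


def poll_until_done(get_status: Callable[[], str],
                    interval: float = 30.0,
                    timeout: float = 3600.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> str:
    """Call `get_status` every `interval` seconds until a terminal state.

    Raises TimeoutError if the next wait would run past `timeout`.
    """
    deadline = clock() + timeout
    while True:
        state = get_status()
        if state in TERMINAL_STATES:
            return state
        if clock() + interval > deadline:
            raise TimeoutError(f"kernel still '{state}' after {timeout:.0f}s")
        sleep(interval)


# Usage with a canned sequence of states instead of a real kernel:
states = iter(["queued", "running", "complete"])
print(poll_until_done(lambda: next(states), interval=0, timeout=5))  # complete
```

Injecting `sleep` and `clock` keeps the loop testable without waiting in real time.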
Good fit:
- Serious Kaggle competitors running multi-branch feature pipelines
- Competition teams with shared feature engineering codebases
- Users running GPU-heavy feature extraction on Kaggle's free hardware
- ML engineers who want to develop locally and execute remotely
Not necessary:
- Casual Kaggle users who submit a few notebooks manually
- People who only use Kaggle's web editor
- Simple single-submission workflows
- Thin layer over the official Kaggle CLI: no API magic, just better workflow
- Configuration-driven: kaggle.toml encodes your workflow, not your code
- Reproducible workflows: same config, same result every run
- Local-first development: iterate on your code, push when ready
- Remote execution on Kaggle infrastructure: free GPU time, no local hardware needed
Workflow features shipped:
- Parallel branch execution ✅: feature all --parallel 3
- Retry/resume failed runs ✅: feature retry failed / --resume
- Submission automation ✅: kagglepipe submit + submissions list/latest
- Dependency graphs ✅: feature build <target>
- Artifact caching ✅: cache status / cache clear
- Experiment tracking ✅: experiments record/list/show
- Feature registry ✅: features list/show
- Dataset lineage ✅: lineage show <feature>
- Dry-run mode ✅: feature run --dry-run / src upload --dry-run
- Pre-flight validation ✅: kagglepipe validate
- Leaderboard tracking ✅: submissions watch / leaderboard latest
- Submission provenance ✅: submissions best / submissions show <id> (P11.5)
- Project templates ✅: template init tabular|cv|nlp
- Strong run manifests ✅: every run writes a JSON manifest to .kagglepipe/manifests/
- Reproducibility bundles ✅: run export <branch> / run reproduce <bundle.tar.gz>
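As an illustration of what "run a feature plus its declared dependencies" involves, the sketch below orders a target and its transitive dependencies with the standard-library graphlib module. How dependencies are actually declared is not shown in this README, so the `deps` mapping and the `build_plan` name are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def build_plan(target: str, deps: Dict[str, List[str]]) -> List[str]:
    """Return `target` and everything it depends on, dependencies first."""
    needed: Dict[str, List[str]] = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed[name] = list(deps.get(name, []))
            stack.extend(needed[name])
    # TopologicalSorter takes {node: predecessors} and raises
    # graphlib.CycleError if the declared dependencies form a cycle.
    return list(TopologicalSorter(needed).static_order())


# Hypothetical dependency declarations between the example branches:
deps = {
    "graph_features": ["user_features"],
    "embedding_features": ["graph_features"],
}
print(build_plan("embedding_features", deps))
# ['user_features', 'graph_features', 'embedding_features']
```

Branches that the target does not depend on are left out of the plan, which is the difference between building one target and running every configured branch.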
Roadmap for v1.0 (no more major features, focus on adoption):
- PyPI release
- Screencast / GIF demo
- Bug fixes from real-user feedback
- Performance improvements
- Better error messages
- More template types (recsys, time-series, RL)
The goal is no longer more features. The goal is adoption, reliability, and becoming the standard workflow tool for serious Kaggle users.
A terminal recording of kagglepipe feature run in action would convey the
workflow faster than any documentation. If you'd like to contribute a
GIF/screen recording showing the full src upload → kernel polling → artifact
download cycle, it would significantly improve first-impression conversion.
| Command | Description |
|---|---|
| kagglepipe whoami | Print verified username |
| kagglepipe login | Bootstrap ~/.kaggle/kaggle.json |
| kagglepipe config init | Scaffold kaggle.toml |
| kagglepipe config show [--json] | Print effective config (human or JSON) |
| kagglepipe validate [--json] | Pre-flight checks (P10); --json emits machine-readable output |
| kagglepipe template init <type> | Scaffold a starter project (tabular/cv/nlp) (P12) |
| kagglepipe template list | List available templates |
| kagglepipe src upload [--version N] [--dry-run] | Package & push source dataset |
| kagglepipe feature run <branch> [--dry-run] | Render notebook → push → poll → download |
| kagglepipe feature all [--parallel N] [--resume] | Run all configured branches; N>=2 for concurrency |
| kagglepipe feature retry [selector] | Re-run failed/error/timeout/all branches (P2) |
| kagglepipe feature resume | Resume, skipping branches that already completed (P2) |
| kagglepipe feature build <target> | Run a feature plus its declared dependencies (P4) |
| kagglepipe feature plan <target> | Print the dependency plan for <target> (P4) |
| kagglepipe status [--all] [--csv] | List your kernels |
| kagglepipe kernels list | List kernels with filters |
| kagglepipe kernels status <slug> | Live kernel status |
| kagglepipe kernels output <slug> | Download kernel output directory |
| kagglepipe kernels logs <slug> | Print logs URL |
| kagglepipe kernels stop <slug> | Cancel a running kernel |
| kagglepipe datasets list | List your datasets |
| kagglepipe datasets get <slug> <path> | Download a dataset |
| kagglepipe datasets create <dir> | Create a new dataset |
| kagglepipe datasets version <dir> -m "msg" | New version of existing dataset |
| kagglepipe competitions list | Active competitions |
| kagglepipe competitions submit <comp> <file> -m "msg" | Submit to a competition |
| kagglepipe competitions leaderboard <comp> | Competition leaderboard |
| kagglepipe submit [--competition X] [--file f] [--train] | Submit a file (P3) |
| kagglepipe submissions list\|latest\|watch\|best\|show | Submission history + provenance (P3/P11/P11.5) |
| kagglepipe leaderboard latest <competition> [--top N] [--json] | Top-of-leaderboard view (P11) |
| kagglepipe cache status\|clear | Artifact cache (P5) |
| kagglepipe experiments record\|list\|show | Experiment tracking (P6) |
| kagglepipe features list\|show | Feature registry (P7) |
| kagglepipe lineage show\|add-parent\|remove | Dataset lineage (P8) |
| kagglepipe run export <branch\|manifest> | Export a run as a portable tarball (P14) |
| kagglepipe run reproduce <bundle.tar.gz> | Reproduce a run from a bundle (P14) |
Run kagglepipe <cmd> --help for all flags.
src/kagglepipe/
cli.py argparse root + dispatch
config.py kaggle.toml loader + env overrides
credentials.py ~/.kaggle/kaggle.json + KAGGLE_USERNAME/KEY
runner.py subprocess wrapper (UTF-8 safe, python -X utf8 -m kaggle)
slug.py {username}/{branch} template resolver
tarball.py build_tarball(include, exclude_dirs, exclude_exts)
notebook.py render Jinja2 notebook + kernel-metadata.json
polling.py poll_kernel_status(...)
kaggle_api.py high-level wrappers around the kaggle CLI
commands/ one module per command group
templates/ default notebook template
tests/ 80 unit tests + 1 live integration test
docs/quickstart.md step-by-step walkthrough
MIT