This directory contains backup workflows for various platforms. Each workflow is a Prefect flow that can be run directly or scheduled as a deployment.

| Workflow | Status | Description |
|---|---|---|
| GitHub | Working | Back up repositories and commit history |
| Twitter/X | Working | Back up tweets, bookmarks, and likes with media |
| YouTube | Working | Download YouTube videos (Twilio SMS trigger) |
| Crunchyroll | Working (requires auth) | Download anime from Crunchyroll |
| Reddit | Working | Back up saved posts, comments, and upvoted content |
| Google Drive | Working | Download files and folders with Workspace exports |
| Amazon | Working (Python 3.12/3.11) | Download order history |
| Discord | Working | Back up messages, attachments, and metadata |
| Example | Template | Basic Prefect flow reference |
| Google Photos (`cannot-automate/google_photos.py`) | Cannot automate | Google deprecated the Library API scopes on April 1, 2025 |
| Instagram (`to-fix/instagram.py`) | Broken | Needs repair |
| Notion (`to-fix/notion.py`) | Broken | Needs repair |
These must be installed on the host system (or in the Docker image):

| Dependency | Required By | Install (macOS) | Install (Ubuntu) |
|---|---|---|---|
| ffmpeg | youtube, crunchyroll | `brew install ffmpeg` | `apt install ffmpeg` |
| git | github | Pre-installed | `apt install git` |
| mkvtoolnix | crunchyroll | `brew install mkvtoolnix` | `apt install mkvtoolnix` |
| Node.js 18+ | crunchyroll | `brew install node` | `apt install nodejs` |
| pnpm | crunchyroll | `npm install -g pnpm` | `npm install -g pnpm` |

Note: on Ubuntu releases older than 24.04, `apt install nodejs` installs a Node.js version older than 18; use the NodeSource setup script (as in the Dockerfile below) instead.
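A workflow could check for these executables up front instead of failing midway through a download. The sketch below uses `shutil.which` from the standard library; the `REQUIRED_BINARIES` mapping and `missing_binaries` function are hypothetical (mkvtoolnix is detected through its `mkvmerge` binary), and the actual workflows may not perform such a check.

```python
import shutil
from typing import List

# Hypothetical mapping from workflow name to the executables it shells out to.
# mkvtoolnix is detected via its `mkvmerge` binary.
REQUIRED_BINARIES = {
    "youtube": ["ffmpeg"],
    "github": ["git"],
    "crunchyroll": ["ffmpeg", "mkvmerge", "node", "pnpm", "multi-downloader-nx"],
}


def missing_binaries(workflow: str) -> List[str]:
    """Return the required executables for `workflow` that are not on PATH."""
    return [name for name in REQUIRED_BINARIES.get(workflow, []) if shutil.which(name) is None]


if __name__ == "__main__":
    for wf in REQUIRED_BINARIES:
        missing = missing_binaries(wf)
        print(f"{wf}: {'OK' if not missing else 'missing ' + ', '.join(missing)}")
```

Unknown workflow names simply report nothing missing, so the check only ever narrows down problems for the workflows it knows about.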
The Crunchyroll workflow requires multi-downloader-nx. No npm package exists, so build it from source:

```bash
# Clone the repo
git clone https://github.com/anidl/multi-downloader-nx.git ~/tools/multi-downloader-nx
cd ~/tools/multi-downloader-nx

# Install dependencies (requires pnpm)
pnpm install

# Build CLI
pnpm run prebuild-cli

# Create wrapper script
cat > ~/tools/multi-downloader-nx/multi-downloader-nx << 'EOF'
#!/bin/bash
SCRIPT_DIR="$HOME/tools/multi-downloader-nx/lib"
cd "$SCRIPT_DIR"
exec node index.js "$@"
EOF
chmod +x ~/tools/multi-downloader-nx/multi-downloader-nx

# Add to PATH (create ~/bin and symlink)
mkdir -p ~/bin
ln -sf ~/tools/multi-downloader-nx/multi-downloader-nx ~/bin/multi-downloader-nx
echo 'export PATH="$HOME/bin:$PATH"' >> ~/.zshrc  # or ~/.bashrc
source ~/.zshrc

# Verify installation
multi-downloader-nx --version
```

Current installation: `~/tools/multi-downloader-nx/` (v5.6.9)
Note: Crunchyroll content is DRM-protected, and downloading it requires a Crunchyroll Premium subscription. Some content may also need additional decryption tools (mp4decrypt or shaka-packager).
For running workflows in Docker containers with Prefect, see: https://docs.prefect.io/v3/how-to-guides/deployment_infra/docker
```dockerfile
FROM python:3.12-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    ffmpeg \
    git \
    mkvtoolnix \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Install Node.js (for crunchyroll)
RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
    && apt-get install -y nodejs \
    && npm install -g pnpm

# Install multi-downloader-nx
WORKDIR /tools
RUN git clone https://github.com/anidl/multi-downloader-nx.git \
    && cd multi-downloader-nx \
    && pnpm install \
    && pnpm run prebuild-cli

# lib/ contains index.js rather than a `multi-downloader-nx` executable,
# so put a wrapper on PATH (mirrors the host wrapper script above)
RUN printf '#!/bin/sh\ncd /tools/multi-downloader-nx/lib\nexec node index.js "$@"\n' \
        > /usr/local/bin/multi-downloader-nx \
    && chmod +x /usr/local/bin/multi-downloader-nx

# Install Python dependencies
WORKDIR /app
COPY pyproject.toml .
RUN pip install --no-cache-dir -e .

# Copy workflow code
COPY . .

# Default command
CMD ["prefect", "worker", "start", "--pool", "aqueduct-docker-pool"]
```

```bash
# Create a Docker work pool (Docker workers need the prefect-docker package)
prefect work-pool create --type docker aqueduct-docker-pool

# Start a worker
prefect worker start --pool aqueduct-docker-pool
```

```python
from workflows.youtube import download_youtube_video
from prefect.docker import DockerImage

download_youtube_video.deploy(
    name="youtube-download",
    work_pool_name="aqueduct-docker-pool",
    image=DockerImage(
        name="aqueduct",
        tag="latest",
        dockerfile="Dockerfile",
    ),
    push=False,  # Set True to push to a registry
)
```

File: `github.py`
Backs up repositories and commit history using the GitHub GraphQL API.

Setup:

- Create a GitHub personal access token with the `repo` scope
- Set `GITHUB_TOKEN` in `.env`
- Register the Prefect block:
  ```bash
  prefect block register -m prefect_github
  ```

Notes:

- Uses the GraphQL API for efficient data fetching
- Supports an `until_date` parameter for incremental backups
- Clones repositories locally in addition to fetching metadata
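A minimal sketch of a paginated GraphQL call for repository metadata, using only the standard library. The query follows GitHub's public GraphQL schema (`viewer.repositories`, `pageInfo.endCursor`); the function names and the selected fields are illustrative assumptions, not what `github.py` actually sends. Commit history would need a further query (for example on the default branch's commit `history`), which is omitted here.

```python
import json
import urllib.request
from typing import List, Optional

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# One page of the authenticated user's repositories. `after` is the previous
# page's pageInfo.endCursor (null on the first request).
REPOS_QUERY = """
query($after: String) {
  viewer {
    repositories(first: 100, after: $after) {
      nodes { nameWithOwner url pushedAt }
      pageInfo { hasNextPage endCursor }
    }
  }
}
"""


def build_repos_request(token: str, after: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the POST request for one page of repositories."""
    body = json.dumps({"query": REPOS_QUERY, "variables": {"after": after}}).encode("utf-8")
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def fetch_all_repos(token: str) -> List[dict]:
    """Follow pageInfo cursors until every repository node has been collected."""
    repos: List[dict] = []
    after: Optional[str] = None
    while True:
        with urllib.request.urlopen(build_repos_request(token, after)) as resp:
            page = json.load(resp)["data"]["viewer"]["repositories"]
        repos.extend(page["nodes"])
        if not page["pageInfo"]["hasNextPage"]:
            return repos
        after = page["pageInfo"]["endCursor"]
```

Cursor pagination keeps each request bounded at 100 repositories, which is the maximum `first` value GitHub accepts for a connection.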
Usage:

```bash
python workflows/github.py
```

File: `twitter.py`
Downloads tweets, bookmarks, and likes with media files using the X API v2 (xdk SDK).

Setup:

- Create a Twitter/X Developer account and app at https://developer.x.com
- Generate OAuth 2.0 credentials with read permissions
- Set the following in `.env`:
  ```
  TWITTER_CLIENT_ID=your_client_id
  TWITTER_CLIENT_SECRET=your_client_secret
  TWITTER_REDIRECT_URI=http://localhost:8080/callback
  ```
- Register the Prefect block:
  ```bash
  python blocks/twitter_block.py
  ```

Notes:

- Uses the OAuth 2.0 PKCE flow for authentication
- The first run opens a browser for authorization
- Saves tokens at `~/.twitter-tokens/token.json` for subsequent runs
- Downloads all media (photos, videos) associated with tweets
- Preserves full tweet metadata in JSON format
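The PKCE step mentioned above rests on a verifier/challenge pair defined in RFC 7636. This is a generic sketch of that computation with the standard library; it is not taken from `twitter.py` or the xdk SDK, which may generate the pair internally, and the function names are hypothetical.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    """Random verifier: 86 URL-safe characters, within RFC 7636's 43-128 limit."""
    return secrets.token_urlsafe(64)


def code_challenge_s256(verifier: str) -> str:
    """S256 challenge: base64url(sha256(verifier)) with '=' padding stripped."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    verifier = make_code_verifier()
    # The challenge (with code_challenge_method=S256) goes in the authorize URL;
    # the verifier is kept and sent later when exchanging the code for tokens.
    print(code_challenge_s256(verifier))
```

Because only the hash travels in the browser redirect, an intercepted authorization code is useless without the verifier, which never leaves the machine running the flow.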
Usage:

```bash
python workflows/twitter.py
```

File: `reddit.py`
Downloads saved posts, comments, and upvoted content using PRAW (Python Reddit API Wrapper).

Setup:

- Create a Reddit app at https://www.reddit.com/prefs/apps
  - App type: "script"
  - Redirect URI: http://localhost:8080
- Set the following in `.env`:
  ```
  REDDIT_CLIENT_ID=your_client_id
  REDDIT_CLIENT_SECRET=your_client_secret
  REDDIT_USER_AGENT=aqueduct-backup/1.0
  REDDIT_USERNAME=your_username
  REDDIT_PASSWORD=your_password
  ```
- Register the Prefect block:
  ```bash
  python blocks/reddit_block.py
  ```

Notes:

- Downloads saved posts, saved comments, and upvoted posts
- Saves media files (images, videos) alongside metadata
- Uses date-segmented directories for organization
- Implements rate limiting to avoid Reddit API throttling
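PRAW already honors the rate-limit headers Reddit returns; a workflow that also fetches media from other hosts may want its own spacing between requests. A minimal sketch of a minimum-interval throttle, assuming the workflow adds such a delay itself; the class name and interval are hypothetical. The clock and sleep functions are injectable so the behavior can be tested without real waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Block so that successive wait() calls are at least `interval` seconds apart."""

    def __init__(self, interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                # Sleep only for the part of the interval that has not yet elapsed.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

For example, calling `throttle.wait()` before each media download with `MinIntervalThrottle(1.0)` caps the workflow at roughly one request per second; `time.monotonic` is used so wall-clock adjustments cannot shorten the gap.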
Usage:

```bash
python workflows/reddit.py
```

File: `google_drive.py`
Downloads files and folders, including Google Workspace exports, using the Drive API.

Setup:

- Create a Google Cloud project and enable the Drive API
- Create OAuth2 credentials (Desktop app type)
- Download the credentials JSON and set `GOOGLE_DRIVE_CREDENTIALS_PATH` in `.env`
- Register the Prefect block:
  ```bash
  python blocks/google_drive_block.py
  ```

Notes:

- Exports Google Workspace files (Docs, Sheets, Slides) to standard formats
- The first run opens a browser for OAuth authorization
- Tokens are cached at `~/.google-drive-tokens/token.json`
- Preserves folder structure
- Date-segmented backups for idempotency
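Google-native files (Docs, Sheets, Slides, Drawings) have no downloadable bytes and must be exported to another MIME type, while ordinary files are downloaded as-is. A sketch of that decision; the chosen target formats (Office formats, PNG for drawings) and the function name are assumptions, not necessarily what `google_drive.py` uses.

```python
from typing import Optional, Tuple

# Source Workspace MIME type -> (export MIME type, file extension).
# The target formats are an assumption; Drive supports other exports (e.g. PDF).
EXPORT_FORMATS = {
    "application/vnd.google-apps.document": (
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document", ".docx"),
    "application/vnd.google-apps.spreadsheet": (
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", ".xlsx"),
    "application/vnd.google-apps.presentation": (
        "application/vnd.openxmlformats-officedocument.presentationml.presentation", ".pptx"),
    "application/vnd.google-apps.drawing": ("image/png", ".png"),
}

# Folders are recursed into rather than downloaded or exported.
FOLDER_MIME = "application/vnd.google-apps.folder"


def export_target(mime_type: str) -> Optional[Tuple[str, str]]:
    """Return (export MIME type, extension) for Workspace files, else None.

    None means the item is a regular binary file (download it directly)
    or a folder (walk its children instead).
    """
    return EXPORT_FORMATS.get(mime_type)
```

With google-api-python-client, an exported file would then be fetched through `service.files().export_media(fileId=..., mimeType=export_mime)` and a binary file through `service.files().get_media(fileId=...)`.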
Usage:

```bash
python workflows/google_drive.py
```

File: `amazon.py`
Downloads order history from Amazon.

Important: Requires Python 3.12 or 3.11 due to dependency constraints (amazon-orders → amazoncaptcha → pillow<9.6.0, which cannot build on Python 3.13).

Setup:

- Install with a compatible Python version:
  ```bash
  uv venv --python 3.12
  source .venv/bin/activate
  uv pip install -e .
  ```
- Set Amazon credentials in `.env`:
  ```
  AMAZON_EMAIL=your_email
  AMAZON_PASSWORD=your_password
  ```
- Register the Prefect block:
  ```bash
  python blocks/amazon_block.py
  ```

Notes:

- May require CAPTCHA solving on the first run
- Downloads order details and metadata
- Preserves order history in structured JSON format
Usage:

```bash
python workflows/amazon.py
```

File: `discord.py`
Backs up Discord messages, attachments, and metadata.

Setup:

- Get your Discord user token (see the instructions in `discord.py`)
- Set `DISCORD_TOKEN` in `.env`
- Register the Prefect block:
  ```bash
  python blocks/discord_block.py
  ```

Notes:

- Backs up messages from specified channels
- Downloads all attachments (images, videos, files)
- Preserves message metadata and thread structure
- Rate-limited to respect Discord API limits
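Discord message, channel, and user IDs are snowflakes whose top 42 bits encode a millisecond timestamp counted from Discord's epoch (2015-01-01 UTC). That makes it possible to date-segment messages, or pick `before`/`after` pagination bounds, from the ID alone. The helper below is a hypothetical sketch, not code from the `discord.py` workflow.

```python
from datetime import datetime, timezone

DISCORD_EPOCH_MS = 1420070400000  # 2015-01-01T00:00:00Z, Discord's snowflake epoch


def snowflake_to_datetime(snowflake: int) -> datetime:
    """Recover the UTC creation time encoded in a Discord snowflake ID."""
    # Bits 22 and up hold milliseconds since the Discord epoch; the low 22 bits
    # are worker, process, and sequence counters.
    ms = (snowflake >> 22) + DISCORD_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```

A side effect of this layout is that sorting messages by ID also sorts them by creation time, so no separate timestamp lookup is needed to order a backup.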
Usage:

```bash
python workflows/discord.py
```

File: `youtube.py`
Downloads YouTube videos using yt-dlp. Supports a Twilio SMS webhook trigger.

System dependencies:

```bash
brew install ffmpeg  # macOS
# or
apt install ffmpeg   # Ubuntu
```

Twilio SMS Webhook (optional):

- Get a Twilio account and phone number
- Run the webhook server:
  ```bash
  python workflows/youtube.py --serve
  ```
- Expose the local server with ngrok:
  ```bash
  ngrok http 5000
  ```
- Configure the Twilio webhook URL: `https://your-ngrok-url.ngrok.io/sms`
- Text a YouTube URL to your Twilio number
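Twilio delivers an incoming SMS as a form-encoded POST (fields such as `Body` and `From`) and expects a TwiML XML document in reply. This framework-independent sketch shows that exchange; the function names and the URL regex are assumptions, and `youtube.py --serve` may parse messages differently.

```python
import re
from typing import Optional
from urllib.parse import parse_qs
from xml.sax.saxutils import escape

# Matches youtube.com/watch?v=..., youtube.com/shorts/..., and youtu.be/... links.
YOUTUBE_URL_RE = re.compile(
    r"https?://(?:www\.|m\.)?(?:youtube\.com/(?:watch\?v=|shorts/)|youtu\.be/)[\w-]{11}\S*"
)


def extract_youtube_url(sms_body: str) -> Optional[str]:
    match = YOUTUBE_URL_RE.search(sms_body)
    return match.group(0) if match else None


def handle_sms(form_encoded: str) -> str:
    """Parse a Twilio webhook POST body and return a TwiML reply."""
    fields = parse_qs(form_encoded)
    body = fields.get("Body", [""])[0]
    url = extract_youtube_url(body)
    reply = f"Queued: {url}" if url else "No YouTube URL found in your message."
    # A real server would start the download here, e.g. download_youtube_video(url=url).
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f"<Response><Message>{escape(reply)}</Message></Response>"
    )
```

Escaping the reply matters because YouTube URLs often contain `&` (for example `&t=5`), which would otherwise produce invalid XML.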
Usage:

```bash
# Interactive download
python workflows/youtube.py

# Run Twilio webhook server
python workflows/youtube.py --serve
```

```python
# In code
from workflows.youtube import download_youtube_video

download_youtube_video(url="https://youtube.com/watch?v=...")
```

Notes:

- No authentication required (public videos only)
- Uses `download_archive.txt` for idempotency (won't re-download)
- Saves video + metadata JSON + thumbnail + subtitles
- Default quality: 1080p max
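The behaviors listed above correspond to standard yt-dlp options. A sketch of an options dict that would produce them; the function name and the exact output template are assumptions chosen to match the `videos/{uploader}/` layout, not necessarily what `youtube.py` passes.

```python
from pathlib import Path


def build_ydl_opts(root: Path, max_height: int = 1080) -> dict:
    """yt-dlp options mirroring the notes above (paths are illustrative)."""
    return {
        # Best video up to max_height merged with best audio (merging needs ffmpeg),
        # falling back to the best single file under the same height cap.
        "format": f"bestvideo[height<={max_height}]+bestaudio/best[height<={max_height}]",
        "outtmpl": str(root / "videos" / "%(uploader)s" / "%(title)s [%(id)s].%(ext)s"),
        # Finished downloads are recorded here; recorded videos are skipped next run.
        "download_archive": str(root / "download_archive.txt"),
        "writeinfojson": True,   # metadata JSON
        "writethumbnail": True,  # thumbnail image
        "writesubtitles": True,  # uploaded subtitles (auto-generated ones need writeautomaticsub)
    }
```

With yt-dlp installed, `yt_dlp.YoutubeDL(build_ydl_opts(Path("backups/local/youtube"))).download([url])` would apply these options; the download archive is what makes repeated runs idempotent.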
File: `crunchyroll.py`

Downloads anime from Crunchyroll using multi-downloader-nx.

Setup:

- Install the system dependencies (see the dependency table above)
- Build multi-downloader-nx from source (see above)
- Have a Crunchyroll Premium subscription
- Authenticate with Crunchyroll. This prompts you to log in; credentials are saved locally:
  ```bash
  multi-downloader-nx --service crunchy --auth
  ```
- Search for a series:
  ```bash
  multi-downloader-nx --service crunchy --search "solo leveling"
  ```

Edit `config/crunchyroll_series.json`:

```json
{
  "crunchyroll_config": {
    "quality": "1080",
    "audio_lang": "jaJP",
    "subtitle_lang": "enUS"
  },
  "series": [
    {
      "name": "Solo Leveling",
      "url": "https://www.crunchyroll.com/series/GDKHZEJ0K/solo-leveling",
      "episodes": "1-12",
      "enabled": true
    }
  ]
}
```

Usage:

```bash
# Run backup of all configured series
python workflows/crunchyroll.py

# Add a series interactively
python workflows/crunchyroll.py --add

# List configured series
python workflows/crunchyroll.py --list

# Download a single series directly
python workflows/crunchyroll.py --download "Solo Leveling" "https://crunchyroll.com/series/..." "1-12"
```

Notes:

- Crunchyroll uses DRM; additional decryption tools may be required
- Episode ranges: `1-12` (range), `1-` (all from 1), `1,5,10` (specific episodes)
- Output: MKV with Japanese audio and English subtitles by default
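A sketch of reading the series config and expanding the episode-range syntax above. How `crunchyroll.py` or multi-downloader-nx actually interprets these specs is not shown in this README; the function names are hypothetical, and the sketch assumes `1-` means "from 1 through the last available episode", which the caller must supply.

```python
import json
from pathlib import Path
from typing import List, Optional, Set


def parse_episode_spec(spec: str, last_episode: Optional[int] = None) -> List[int]:
    """Expand "1-12", "1-" or "1,5,10" into a sorted list of episode numbers.

    An open-ended range such as "1-" needs `last_episode` to know where to stop.
    """
    episodes: Set[int] = set()
    for part in spec.replace(" ", "").split(","):
        if not part:
            continue
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start = int(start_s)
            if end_s:
                end = int(end_s)
            elif last_episode is not None:
                end = last_episode
            else:
                raise ValueError(f"open-ended range {part!r} needs last_episode")
            episodes.update(range(start, end + 1))
        else:
            episodes.add(int(part))
    return sorted(episodes)


def enabled_series(config_path: Path) -> List[dict]:
    """Return the "series" entries of crunchyroll_series.json that have enabled=true."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return [s for s in config.get("series", []) if s.get("enabled", False)]
```

Collecting into a set means overlapping specs such as `1-3,2` still yield each episode once.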
File: `example.py`

A template workflow showing the basic Prefect flow structure. Use it as a reference when creating new workflows.

```bash
python workflows/example.py
```

File: `cannot-automate/google_photos.py`

Status: Cannot be automated as of April 1, 2025

Google deprecated the Photos Library API scopes required for programmatic backup. The Library API now only lets an app access media items that the app itself created; selecting existing photos requires the user-driven Picker API.

See `workflows/cannot-automate/README.md` for details.
All backups are stored in `./backups/local/`:

```
backups/local/
├── github/
│   └── {username}/
│       └── repositories/
│           └── {date}/
├── twitter/
│   └── {username}/
│       └── {date}/
│           ├── tweets/
│           ├── bookmarks/
│           └── likes/
├── reddit/
│   └── {username}/
│       └── {date}/
│           ├── saved_posts/
│           ├── saved_comments/
│           └── upvoted/
├── google_drive/
│   └── {email}/
│       └── {date}/
├── amazon/
│   └── {email}/
│       └── orders/
│           └── {date}/
├── discord/
│   └── {username}/
│       └── {date}/
├── youtube/
│   ├── videos/
│   │   └── {uploader}/
│   ├── download_archive.txt
│   └── _records/
└── crunchyroll/
    └── {series_name}/
```
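Most of the layout follows a `platform/account/date/content_type` pattern. A small helper for building such paths; the function name is hypothetical, the ISO `YYYY-MM-DD` date format is an assumption (the tree only says `{date}`), and the GitHub and Amazon layouts insert an extra level (`repositories/`, `orders/`) before the date that this sketch does not model.

```python
from datetime import date
from pathlib import Path
from typing import Optional

BACKUP_ROOT = Path("backups/local")


def backup_dir(platform: str, account: str, content_type: Optional[str] = None,
               day: Optional[date] = None, root: Path = BACKUP_ROOT) -> Path:
    """Build a date-segmented path such as backups/local/twitter/alice/2025-01-31/likes.

    Re-running on the same day resolves to the same directory, so repeated runs
    overwrite that day's snapshot instead of scattering duplicates.
    """
    day = day or date.today()
    path = root / platform / account / day.isoformat()
    return path / content_type if content_type else path
```

The function only computes the path; a caller would create it with `.mkdir(parents=True, exist_ok=True)` right before writing, which is itself safe to repeat.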
Before starting, research the platform's APIs:

- Ensure the APIs are not deprecated (e.g., the Google Photos Library API scopes were deprecated on April 1, 2025)
- Verify the workflow can be fully automated without manual intervention
- If manual steps are required (e.g., CAPTCHA, manual auth), place the workflow in the `cannot-automate/` directory
Development process:

- Use the workflow-builder agent to create the initial workflow (see `.claude/agents/workflow-builder.md`)
- Create a credentials block in `blocks/` that extends `prefect.blocks.core.Block` with `SecretStr` fields
- Add the corresponding environment variables to `.env.example`
- Implement the workflow following these patterns:
  - Use the `@task` decorator for granular operations (auth, API calls, downloads, processing)
  - Create a main `@flow`-decorated function to orchestrate the tasks
  - Follow the backup structure: `./backups/local/platform/username/content_type/`
  - Save metadata as JSON for future querying
  - Use `cache_policy=NO_CACHE` to ensure fresh data
- Use the idempotency-guardian agent to ensure the workflow is idempotent
- Use the workflow-testing-agent to test the workflow end-to-end
- Add documentation to this README with setup instructions, caveats, and usage examples
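One way to satisfy both the "save metadata as JSON" pattern and the idempotency requirement is a write-if-changed helper. This is a standard-library sketch; the function name is hypothetical, and in a real workflow it would typically be called from inside a `@task`.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def save_json_idempotent(path: Path, data: Any) -> bool:
    """Write `data` as JSON unless the file already holds identical content.

    Returns True if the file was (re)written, False if it was left untouched.
    Writing to a temp file and os.replace()-ing it means an interrupted run
    never leaves a half-written JSON file behind.
    """
    # sort_keys makes the serialized text stable, so equal data compares equal.
    text = json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False) + "\n"
    if path.exists() and path.read_text(encoding="utf-8") == text:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
        os.replace(tmp, path)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp)
        raise
    return True
```

Running a workflow twice against unchanged data then leaves every file, including its modification time, exactly as it was.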