AI Codebase Intelligence Platform

An AI-powered developer tool that helps teams understand, query, and collaborate on large GitHub codebases using Retrieval-Augmented Generation (RAG).

The platform reduces onboarding friction and context switching by transforming an entire repository into a searchable, explainable knowledge base backed by source-grounded AI responses.

Live Demo → (add deployed URL)

What It Does

Onboarding to a new codebase or switching context between repositories is expensive. This platform lets developers ask natural-language questions about any GitHub repository and get answers grounded in the actual source files — not hallucinated summaries.

Ask "How does authentication work in this repo?" and get a sourced answer with file references
Get AI-generated summaries of commit diffs without reading raw diffs
Share project workspaces with teammates under access-controlled projects
Track AI usage per project with a credit-based billing model

Features

Repository ingestion & indexing — connects to GitHub via API, parses source files and commits, builds a semantic vector index
Contextual code Q&A (RAG) — natural-language questions answered with references to the exact source files used
Commit summarization — AI-generated summaries of diffs to understand changes at a glance
Meeting transcription — AssemblyAI integration for transcribing and indexing team discussions alongside code context
Collaborative projects — shared workspaces with team access control and archived project support
Credit-based usage model — tracks AI inference cost per project and user, with Stripe-powered credit purchases

Architecture

GitHub Repository
       ↓
  GitHub API (file tree, commits, diffs)
       ↓
  Chunking + Embedding Pipeline (Google Gemini)
       ↓
  pgvector (PostgreSQL) ──── semantic vector index
       ↓
  RAG Query
       ↓
  Gemini LLM → source-grounded response with file references
 
  AssemblyAI ──── meeting transcripts → same vector index
  Clerk      ──── auth + user/project management
  Stripe     ──── credit purchases + webhook-based usage tracking
  Prisma ORM ──── relational state (users, projects, permissions, credits)
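The chunking step in the pipeline above could look something like the sketch below. It is a minimal illustration, not the project's actual code: the `CodeChunk` shape, the `chunkFile` name, and the 40-line window with a 10-line overlap are all assumptions. Keeping the file path and line range on each chunk is what lets an answer point back to its source.

```typescript
// Hypothetical chunk shape: the path and line range let an answer cite its source.
interface CodeChunk {
  filePath: string;
  startLine: number; // 1-indexed, inclusive
  endLine: number; // 1-indexed, inclusive
  text: string;
}

// Split a file into fixed-size line windows that overlap, so a short block of
// code that straddles a window boundary still appears whole in one chunk
// (as long as the block is no longer than the overlap).
function chunkFile(
  filePath: string,
  source: string,
  windowLines = 40,
  overlapLines = 10,
): CodeChunk[] {
  if (overlapLines >= windowLines) {
    throw new Error("overlapLines must be smaller than windowLines");
  }
  const lines = source.split("\n");
  const step = windowLines - overlapLines;
  const chunks: CodeChunk[] = [];
  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + windowLines, lines.length);
    chunks.push({
      filePath,
      startLine: start + 1,
      endLine: end,
      text: lines.slice(start, end).join("\n"),
    });
    if (end === lines.length) break; // the last window reached the end of the file
  }
  return chunks;
}
```

Each chunk's `text` would then be sent to the embedding model and stored with its metadata. A syntax-aware splitter (by function or class) would likely retrieve better than fixed line windows, at the cost of a parser per language.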

Tech Stack

Layer	Technology
Framework	Next.js (App Router), TypeScript
AI / LLM	Google Gemini API
RAG & Embeddings	Google Gemini embeddings, stored and searched in pgvector (PostgreSQL-native vector search)
Transcription	AssemblyAI
Database	PostgreSQL via Prisma ORM (NeonDB hosted)
Auth	Clerk
Payments	Stripe (credits + webhooks)
GitHub Integration	GitHub API
Package Manager	Bun
Deployment	Fly.io + Docker

Project Structure

├── src/
│   ├── app/          # Next.js App Router — pages and API routes
│   ├── components/   # UI components
│   ├── lib/          # RAG pipeline, GitHub ingestion, embedding logic
│   └── env.js        # Environment variable schema and validation
├── prisma/           # Schema and migrations
├── Dockerfile        # Container build
├── fly.toml          # Fly.io deployment config
├── start-database.sh # Local PostgreSQL + pgvector setup script
└── .env.example      # Environment variable reference

Running Locally

Prerequisites: Node.js 18+, Bun, PostgreSQL with the pgvector extension, a GitHub OAuth app, and accounts for Clerk, Stripe, Google Gemini, and AssemblyAI

git clone https://github.com/VinayakMaharaj/AI-Developer-Collaboration-Platform.git
cd AI-Developer-Collaboration-Platform
bun install

Set up local PostgreSQL with pgvector:

./start-database.sh

Create a .env file (copy from .env.example):

# Database (PostgreSQL with pgvector)
DATABASE_URL="postgresql://postgres:password@localhost:5432/yourdbname"
 
# Clerk
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=
CLERK_SECRET_KEY=
 
# Google Gemini
GEMINI_API_KEY=
 
# AssemblyAI
ASSEMBLYAI_API_KEY=
 
# GitHub OAuth
GITHUB_CLIENT_ID=
GITHUB_CLIENT_SECRET=
 
# Stripe
STRIPE_SECRET_KEY=
STRIPE_WEBHOOK_SECRET=
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=

Apply the database migrations and start the dev server:

bunx prisma migrate dev
bun dev

Open http://localhost:3000.
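If the app fails at startup, a missing environment variable is a likely cause. The sketch below shows the kind of check a file like src/env.js might perform. It is an assumption for illustration only: the real file may use a schema library, and `findMissingEnv` is a hypothetical name. The variable list is taken from the .env example above.

```typescript
// Variable names copied from the .env example; treating all of them as
// required is an assumption.
const REQUIRED_VARS = [
  "DATABASE_URL",
  "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
  "CLERK_SECRET_KEY",
  "GEMINI_API_KEY",
  "ASSEMBLYAI_API_KEY",
  "GITHUB_CLIENT_ID",
  "GITHUB_CLIENT_SECRET",
  "STRIPE_SECRET_KEY",
  "STRIPE_WEBHOOK_SECRET",
  "NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY",
] as const;

// Return the names of required variables that are unset or blank.
// In the app this would be called with process.env.
function findMissingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VARS.filter((name) => !env[name]?.trim());
}
```

Failing fast with the full list of missing names is usually friendlier than discovering them one request at a time.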

Deployment

The app is containerised and deployed on Fly.io:

fly deploy

The Dockerfile and fly.toml are pre-configured. Set your production secrets in Fly's dashboard or via fly secrets set KEY=value.

Design Decisions

Why pgvector over Pinecone? pgvector keeps the vector index in the same PostgreSQL database as the relational data — no additional managed service, no extra network hop, and transactional consistency between the relational and vector layers. For a project where the access-controlled project structure is tightly coupled to what vectors are retrievable, co-location was the right call.
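To make the co-location argument concrete, here is a sketch of a similarity search that enforces project membership in the same SQL statement. The table and column names (`code_chunks`, `project_members`, `embedding`) and the `buildScopedSearch` helper are hypothetical; the real schema lives in prisma/. The `<=>` operator is pgvector's cosine-distance operator.

```typescript
// A parameterized query: the SQL text plus the values for $1..$4.
interface VectorQuery {
  text: string;
  values: (string | number)[];
}

// Build a top-k search restricted to one project and to a user who is a
// member of that project. Table and column names are illustrative only.
function buildScopedSearch(
  projectId: string,
  userId: string,
  queryEmbedding: number[],
  limit = 5,
): VectorQuery {
  if (queryEmbedding.length === 0 || !queryEmbedding.every((x) => Number.isFinite(x))) {
    throw new Error("queryEmbedding must be a non-empty array of finite numbers");
  }
  // pgvector accepts a vector as a text literal such as '[0.1,0.2]'.
  const vectorLiteral = `[${queryEmbedding.join(",")}]`;
  const text = [
    "SELECT c.file_path, c.content, c.embedding <=> $1::vector AS distance",
    "FROM code_chunks c",
    "JOIN project_members m ON m.project_id = c.project_id",
    "WHERE c.project_id = $2 AND m.user_id = $3",
    "ORDER BY distance", // smaller cosine distance means more similar
    "LIMIT $4",
  ].join("\n");
  return { text, values: [vectorLiteral, projectId, userId, limit] };
}
```

Because the membership join and the vector ordering run in one statement, a user who is not in the project gets zero rows back. With a separate vector service, the same guarantee would need a second lookup or metadata filters kept in sync across two systems.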

Why RAG over fine-tuning for code understanding? Fine-tuning would bake in knowledge of a specific codebase at training time, requiring retraining for every repo or every significant change. RAG retrieves context dynamically at inference time — any repo, any version, zero retraining.
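The retrieval step can be sketched in memory as follows. This is an illustration under stated assumptions: in the platform the similarity search runs in pgvector and the embeddings come from Gemini, whereas here the vectors are supplied by the caller, and the names `IndexedChunk`, `retrieveTopK`, and `buildGroundedPrompt` are hypothetical.

```typescript
interface IndexedChunk {
  filePath: string;
  text: string;
  embedding: number[];
}

// Cosine similarity: 1 for vectors pointing the same way, 0 for orthogonal ones.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Rank every chunk against the query embedding and keep the k best.
function retrieveTopK(query: number[], index: IndexedChunk[], k: number): IndexedChunk[] {
  return index
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Assemble the prompt: retrieved code first, labelled by file, then the question.
function buildGroundedPrompt(question: string, chunks: IndexedChunk[]): string {
  const context = chunks
    .map((chunk) => `--- ${chunk.filePath} ---\n${chunk.text}`)
    .join("\n\n");
  return [
    "Answer using only the context below and cite the file paths you relied on.",
    context,
    `Question: ${question}`,
  ].join("\n\n");
}
```

Nothing here depends on a particular repository: supporting a new repo or a new commit means re-embedding the changed files and updating the index, with no change to the model.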

Why AssemblyAI for transcription? Building accurate speech-to-text from scratch is a months-long problem. AssemblyAI's API gives speaker diarization, timestamped transcripts, and high accuracy out of the box. The transcripts are then embedded and indexed alongside code — so developers can query across both.

Why a credit model over flat subscriptions? AI inference costs scale with usage, so under a flat subscription the heaviest users cost the most to serve while paying the same as everyone else. Credits let users pay in proportion to the AI compute they actually consume, which keeps the unit economics sustainable.
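A minimal sketch of usage-proportional charging is shown below. The rate of 1,000 tokens per credit and the names `creditsForTokens` and `chargeCredits` are assumptions made for illustration; the source does not state how credits map to usage.

```typescript
// Hypothetical rate: the source does not state how credits map to usage.
const TOKENS_PER_CREDIT = 1000;

// Round up so that any non-zero usage costs at least one credit.
function creditsForTokens(tokens: number, tokensPerCredit = TOKENS_PER_CREDIT): number {
  if (tokens < 0 || tokensPerCredit <= 0) throw new Error("invalid usage figures");
  return Math.ceil(tokens / tokensPerCredit);
}

type ChargeResult =
  | { ok: true; balance: number }
  | { ok: false; balance: number; shortfall: number };

// Deduct the cost if the balance covers it; otherwise leave the balance
// untouched and report how many credits are missing.
function chargeCredits(balance: number, cost: number): ChargeResult {
  if (cost > balance) return { ok: false, balance, shortfall: cost - balance };
  return { ok: true, balance: balance - cost };
}
```

In a multi-user setting the check and the deduction would need to happen atomically, for example inside a database transaction, so that two concurrent requests cannot both spend the same credits.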

Known Limitations

RAG quality depends on chunking strategy and embedding quality — long files with sparse comments chunk poorly
Large repositories increase indexing time and storage costs proportionally
AI-generated commit summaries may miss architectural intent that isn't captured in the diff itself

These are intentional tradeoffs to balance usability, performance, and cost.
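The storage side of the large-repository limitation can be estimated with simple arithmetic. The pgvector documentation gives the size of a stored vector as 4 bytes per dimension plus 8 bytes. The sketch below applies that figure; the 768-dimension embedding size in the usage note is an assumption, and `estimateVectorBytes` is a hypothetical helper.

```typescript
// Raw vector storage only: excludes the chunk text, row overhead,
// and any vector index built on top of the column.
function estimateVectorBytes(chunkCount: number, dimensions: number): number {
  const bytesPerVector = 4 * dimensions + 8;
  return chunkCount * bytesPerVector;
}
```

For example, 10,000 chunks at an assumed 768 dimensions come to 10,000 × 3,080 = 30,800,000 bytes, or roughly 30.8 MB of raw vectors. Doubling the number of chunks doubles that figure, which is the proportional growth described above.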

What I Learned

Designing a production-grade RAG pipeline over real, heterogeneous codebases
When pgvector is preferable to a dedicated vector DB (and when it isn't)
Combining AI systems with authentication, billing, and multi-user collaboration
Tradeoffs between retrieval depth, latency, and per-query AI cost at the SaaS layer

Author

Vinayak Maharaj — LinkedIn · Portfolio
