AI Codebase Intelligence Platform

An AI-powered developer tool that helps teams understand, query, and collaborate on large GitHub codebases using Retrieval-Augmented Generation (RAG).

The platform reduces onboarding friction and context switching by transforming an entire repository into a searchable, explainable knowledge base backed by source-grounded AI responses.

Live Demo: (add deployed URL)


What It Does

Onboarding to a new codebase or switching context between repositories is expensive. This platform lets developers ask natural-language questions about any GitHub repository and get answers grounded in the actual source files — not hallucinated summaries.

  • Ask "How does authentication work in this repo?" and get a sourced answer with file references
  • Get AI-generated summaries of commit diffs without reading raw diffs
  • Share project workspaces with teammates under access-controlled projects
  • Track AI usage per project with a credit-based billing model

Features

  • Repository ingestion & indexing — connects to GitHub via API, parses source files and commits, builds a semantic vector index
  • Contextual code Q&A (RAG) — natural-language questions answered with references to the exact source files used
  • Commit summarization — AI-generated summaries of diffs to understand changes at a glance
  • Meeting transcription — AssemblyAI integration for transcribing and indexing team discussions alongside code context
  • Collaborative projects — shared workspaces with team access control and archived project support
  • Credit-based usage model — tracks AI inference cost per project and user, with Stripe-powered credit purchases

Architecture

GitHub Repository
       ↓
  GitHub API (file tree, commits, diffs)
       ↓
  Chunking + Embedding Pipeline (Google Gemini)
       ↓
  pgvector (PostgreSQL) ──── semantic vector index
       ↓
  RAG Query
       ↓
  Gemini LLM → source-grounded response with file references
 
  AssemblyAI ──── meeting transcripts → same vector index
  Clerk      ──── auth + user/project management
  Stripe     ──── credit purchases + webhook-based usage tracking
  Prisma ORM ──── relational state (users, projects, permissions, credits)
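The chunking step in the pipeline above can be sketched as a line-based splitter with overlap. This is a minimal illustration, not the repository's actual implementation: the `Chunk` shape, the `chunkFile` name, and the default window sizes are all assumptions chosen for the example. Each chunk keeps its file path and line range so a later answer can cite where its context came from.

```typescript
// Hypothetical chunk shape; the repository's real schema may differ.
interface Chunk {
  filePath: string;
  startLine: number; // 1-indexed, inclusive
  endLine: number; // 1-indexed, inclusive
  text: string;
}

// Split a source file into overlapping windows of lines.
// maxLines and overlap are illustrative defaults, not tuned values.
function chunkFile(
  filePath: string,
  source: string,
  maxLines = 40,
  overlap = 10,
): Chunk[] {
  if (maxLines <= 0 || overlap < 0 || overlap >= maxLines) {
    throw new RangeError("require maxLines > 0 and 0 <= overlap < maxLines");
  }
  const lines = source.split("\n");
  const step = maxLines - overlap; // how far each window advances
  const chunks: Chunk[] = [];
  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + maxLines, lines.length);
    chunks.push({
      filePath,
      startLine: start + 1,
      endLine: end,
      text: lines.slice(start, end).join("\n"),
    });
    if (end === lines.length) break; // last window reached the end of the file
  }
  return chunks;
}
```

The overlap means a function that straddles a window boundary still appears whole in at least one chunk when it is shorter than the overlap. A syntax-aware splitter would likely retrieve better on long files, at the cost of a parser per language.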

Tech Stack

Layer               Technology
Framework           Next.js (App Router), TypeScript
AI / LLM            Google Gemini API
RAG & Embeddings    pgvector (PostgreSQL-native vector search)
Transcription       AssemblyAI
Database            PostgreSQL via Prisma ORM (NeonDB hosted)
Auth                Clerk
Payments            Stripe (credits + webhooks)
GitHub Integration  GitHub API
Package Manager     Bun
Deployment          Fly.io + Docker

Project Structure

├── src/
│   ├── app/          # Next.js App Router — pages and API routes
│   ├── components/   # UI components
│   ├── lib/          # RAG pipeline, GitHub ingestion, embedding logic
│   └── env.js        # Environment variable schema and validation
├── prisma/           # Schema and migrations
├── Dockerfile        # Container build
├── fly.toml          # Fly.io deployment config
├── start-database.sh # Local PostgreSQL + pgvector setup script
└── .env.example      # Environment variable reference

Running Locally

Prerequisites: Node.js 18+, Bun, PostgreSQL with pgvector extension, accounts for Clerk, Stripe, Google Gemini, AssemblyAI, and a GitHub OAuth app

git clone https://github.com/VinayakMaharaj/AI-Developer-Collaboration-Platform.git
cd AI-Developer-Collaboration-Platform
bun install

Set up local PostgreSQL with pgvector:

./start-database.sh

Create a .env file (copy from .env.example):

# Database (PostgreSQL with pgvector)
DATABASE_URL="postgresql://postgres:password@localhost:5432/yourdbname"
 
# Clerk
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=
CLERK_SECRET_KEY=
 
# Google Gemini
GEMINI_API_KEY=
 
# AssemblyAI
ASSEMBLYAI_API_KEY=
 
# GitHub OAuth
GITHUB_CLIENT_ID=
GITHUB_CLIENT_SECRET=
 
# Stripe
STRIPE_SECRET_KEY=
STRIPE_WEBHOOK_SECRET=
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=

Apply the database migrations and start the dev server:

bunx prisma migrate dev
bun dev

Open http://localhost:3000.
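If the app fails on startup, a missing or blank variable in .env is a likely cause. The sketch below is a dependency-free stand-in for the kind of check that src/env.js is described as performing. It is not that file's actual contents, and the `missingEnvKeys` helper is a hypothetical name. The key list mirrors the variables in the .env listing above.

```typescript
// Keys taken from the .env listing in this README.
const REQUIRED_ENV_KEYS = [
  "DATABASE_URL",
  "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
  "CLERK_SECRET_KEY",
  "GEMINI_API_KEY",
  "ASSEMBLYAI_API_KEY",
  "GITHUB_CLIENT_ID",
  "GITHUB_CLIENT_SECRET",
  "STRIPE_SECRET_KEY",
  "STRIPE_WEBHOOK_SECRET",
  "NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY",
] as const;

// Hypothetical helper: returns the names of required variables that are
// unset or contain only whitespace.
function missingEnvKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}
```

Calling `missingEnvKeys(process.env)` early in startup and throwing when the result is non-empty gives one clear error instead of a failure deep inside a Clerk, Stripe, or Gemini call.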


Deployment

The app is containerised and deployed on Fly.io:

fly deploy

The Dockerfile and fly.toml are pre-configured. Set your production secrets in Fly's dashboard or via fly secrets set KEY=value.


Design Decisions

Why pgvector over Pinecone? pgvector keeps the vector index in the same PostgreSQL database as the relational data — no additional managed service, no extra network hop, and transactional consistency between the relational and vector layers. For a project where the access-controlled project structure is tightly coupled to what vectors are retrievable, co-location was the right call.
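To make the co-location argument concrete, here is a sketch of a single query that applies the access check and the similarity search together. The table and column names (`source_embedding`, `project_member`, `file_path`, and so on) and the `buildScopedSearch` helper are assumptions for illustration; the real Prisma schema may look different. The `<=>` operator is pgvector's cosine distance, so `1 - distance` gives cosine similarity.

```typescript
// Hypothetical query shape; names below are illustrative, not the real schema.
interface VectorQuery {
  sql: string;
  params: [string, string, string, number];
}

function buildScopedSearch(
  queryEmbedding: number[],
  projectId: string,
  userId: string,
  limit = 5,
): VectorQuery {
  if (
    queryEmbedding.length === 0 ||
    !queryEmbedding.every((x) => Number.isFinite(x))
  ) {
    throw new Error("embedding must be a non-empty array of finite numbers");
  }
  // pgvector accepts a text literal of the form '[0.1,0.2,...]' cast to vector.
  const vectorLiteral = `[${queryEmbedding.join(",")}]`;
  const sql = `
    SELECT e.file_path, e.content,
           1 - (e.embedding <=> $1::vector) AS similarity
    FROM source_embedding e
    JOIN project_member m ON m.project_id = e.project_id
    WHERE e.project_id = $2 AND m.user_id = $3
    ORDER BY e.embedding <=> $1::vector
    LIMIT $4`;
  return { sql, params: [vectorLiteral, projectId, userId, limit] };
}
```

Because the membership join and the vector ordering run in one statement, a user who loses access to a project stops seeing its chunks immediately, with no separate index to keep in sync. The returned `sql` and `params` could be passed to any parameterized raw-query API, for example Prisma's `$queryRawUnsafe`.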

Why RAG over fine-tuning for code understanding? Fine-tuning would bake in knowledge of a specific codebase at training time, requiring retraining for every repo or every significant change. RAG retrieves context dynamically at inference time — any repo, any version, zero retraining.
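The inference-time step can be sketched as a prompt builder that packs the best-matching chunks into the context and records which files were used. This is an illustrative sketch under assumptions: `RetrievedChunk`, `buildGroundedPrompt`, the character budget, and the instruction wording are all hypothetical, and the actual prompt sent to Gemini in this project may differ.

```typescript
// Hypothetical shape of a row returned by the vector search.
interface RetrievedChunk {
  filePath: string;
  content: string;
  similarity: number; // higher means closer to the question
}

// Build a prompt from the highest-similarity chunks that fit in maxChars,
// and return the file paths included so the UI can show them as references.
function buildGroundedPrompt(
  question: string,
  chunks: RetrievedChunk[],
  maxChars = 6000,
): { prompt: string; sources: string[] } {
  const ranked = [...chunks].sort((a, b) => b.similarity - a.similarity);
  const sections: string[] = [];
  const sources: string[] = [];
  let used = 0;
  for (const chunk of ranked) {
    const section = `--- ${chunk.filePath} ---\n${chunk.content}`;
    if (used + section.length > maxChars) break; // context budget exhausted
    sections.push(section);
    used += section.length;
    if (!sources.includes(chunk.filePath)) sources.push(chunk.filePath);
  }
  const prompt = [
    "Answer the question using only the context below.",
    "Cite the file paths you relied on. If the context is insufficient, say so.",
    "",
    "CONTEXT:",
    sections.join("\n\n"),
    "",
    `QUESTION: ${question}`,
  ].join("\n");
  return { prompt, sources };
}
```

Nothing here is specific to one repository: re-indexing after a push changes what the search returns, and the same builder produces an up-to-date prompt without touching the model.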

Why AssemblyAI for transcription? Building accurate speech-to-text from scratch is a months-long problem. AssemblyAI's API gives speaker diarization, timestamped transcripts, and high accuracy out of the box. The transcripts are then embedded and indexed alongside code — so developers can query across both.

Why a credit model over flat subscriptions? AI inference costs scale with usage, so under a flat subscription the heaviest users cost more to serve than they pay, and lighter users end up subsidizing them. Credits let users pay in proportion to the AI compute they actually consume, which keeps the unit economics sustainable.
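A usage-proportional charge could look like the sketch below. The rate of one credit per 1,000 tokens and the function names are assumptions made for the example. The README does not state the project's actual pricing, so treat the numbers as placeholders.

```typescript
// Hypothetical rate: 1 credit per 1,000 tokens, rounded up.
const TOKENS_PER_CREDIT = 1000;

// Convert token usage for one AI call into a whole number of credits.
function creditsForUsage(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  return Math.ceil((inputTokens + outputTokens) / TOKENS_PER_CREDIT);
}

type ChargeResult =
  | { ok: true; remaining: number; charged: number }
  | { ok: false; remaining: number; shortfall: number };

// Deduct a cost from a balance, refusing the charge if credits are insufficient.
function chargeCredits(balance: number, cost: number): ChargeResult {
  if (cost > balance) {
    return { ok: false, remaining: balance, shortfall: cost - balance };
  }
  return { ok: true, remaining: balance - cost, charged: cost };
}
```

For example, a question that uses 1,200 input tokens and 300 output tokens costs 2 credits under this rate. In a real deployment the balance check and the deduction would need to happen in one database transaction so that concurrent requests cannot overspend.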


Known Limitations

  • RAG quality depends on chunking strategy and embedding quality — long files with sparse comments chunk poorly
  • Large repositories increase indexing time and storage costs proportionally
  • AI-generated commit summaries may miss architectural intent that isn't captured in the diff itself

These are intentional tradeoffs to balance usability, performance, and cost.


What I Learned

  • Designing a production-grade RAG pipeline over real, heterogeneous codebases
  • When pgvector is preferable to a dedicated vector DB (and when it isn't)
  • Combining AI systems with authentication, billing, and multi-user collaboration
  • Tradeoffs between retrieval depth, latency, and per-query AI cost at the SaaS layer

Author

Vinayak Maharaj · LinkedIn · Portfolio
