GolamV2 is a low-memory web crawler designed for maximum throughput in resource-constrained environments. It supports multiple hunting modes, including email extraction, keyword searching, and dead link detection. It is a rewrite of the Python version, Gollum Spyder, and includes a custom interactive CLI explorer for its BadgerDB database.
- Multi-Purpose Crawling: Email hunting, keyword searching, dead link detection
- Memory Efficiency: Runs with decent throughput in low-resource environments
- Robots.txt Compliant: Respects robots.txt and crawl delays
- Real-time Dashboard: Web-based monitoring interface
- Interactive CLI Explorer: Comprehensive data exploration and analysis tool
- Clean Architecture: Modular, maintainable codebase
- Efficient Storage: BadgerDB for persistent storage
- Bloom Filter: Memory-efficient duplicate URL detection
- Priority Queue: Smart URL queuing with database fallback
- URL Queue: Priority-based queue (100k URLs limit) with automatic database refilling and spilling
- Bloom Filter: Duplicate URL detection
- Storage Layer: BadgerDB for persistent URL and result storage
- Worker Pool: Configurable concurrent workers
- Content Extractor:
- Robots Checker: Compliant robots.txt parsing and enforcement. Also parses sitemaps
- Dashboard: Real-time web interface for monitoring
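
The URL queue described above (a capped in-memory priority queue that spills to the database and refills from it) could be sketched in Go roughly as follows. This is a minimal illustration and not GolamV2's actual code: `queuedURL`, `boundedQueue`, and the `overflow` slice standing in for BadgerDB are hypothetical names, and the 40% refill threshold is the figure quoted in the storage notes of this README.

```go
package main

import (
	"container/heap"
	"fmt"
)

// queuedURL is a hypothetical queue entry; a higher Priority is crawled first.
type queuedURL struct {
	URL      string
	Priority int
}

// urlHeap implements heap.Interface as a max-heap on Priority.
type urlHeap []queuedURL

func (h urlHeap) Len() int           { return len(h) }
func (h urlHeap) Less(i, j int) bool { return h[i].Priority > h[j].Priority }
func (h urlHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *urlHeap) Push(x any)        { *h = append(*h, x.(queuedURL)) }
func (h *urlHeap) Pop() any {
	old := *h
	n := len(old)
	item := old[n-1]
	*h = old[:n-1]
	return item
}

// boundedQueue keeps at most limit URLs in memory. Anything beyond that is
// spilled to overflow, a plain slice standing in for the on-disk store.
type boundedQueue struct {
	mem      urlHeap
	overflow []queuedURL
	limit    int
}

func newBoundedQueue(limit int) *boundedQueue { return &boundedQueue{limit: limit} }

// Push adds u to the in-memory heap, or spills it when the heap is full.
func (q *boundedQueue) Push(u queuedURL) {
	if q.mem.Len() >= q.limit {
		q.overflow = append(q.overflow, u)
		return
	}
	heap.Push(&q.mem, u)
}

// Pop returns the highest-priority in-memory URL. When the heap drops below
// 40% of limit, it is refilled from overflow.
func (q *boundedQueue) Pop() (queuedURL, bool) {
	if q.mem.Len() == 0 {
		return queuedURL{}, false
	}
	u := heap.Pop(&q.mem).(queuedURL)
	if q.mem.Len()*100 < q.limit*40 {
		q.refill()
	}
	return u, true
}

// refill moves spilled URLs back into memory until the heap is full again.
func (q *boundedQueue) refill() {
	for len(q.overflow) > 0 && q.mem.Len() < q.limit {
		heap.Push(&q.mem, q.overflow[0])
		q.overflow = q.overflow[1:]
	}
}

func main() {
	q := newBoundedQueue(5) // tiny limit so the spill is visible; the README cites 100k
	for i := 1; i <= 8; i++ {
		q.Push(queuedURL{URL: fmt.Sprintf("https://example.com/page%d", i), Priority: i})
	}
	for {
		u, ok := q.Pop()
		if !ok {
			break
		}
		fmt.Println(u.Priority, u.URL)
	}
}
```

In this demo the priorities come out as 5, 4, 3, 2, 8, 7, 6, 1: spilled URLs only compete on priority once they have been refilled into memory. That is the trade-off a bounded queue makes in exchange for a fixed memory footprint.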
# Clone the repository
git clone https://github.com/nobrainghost/golamv2
cd golamv2
# Install dependencies
go mod tidy
# Build the application
go build -o golamv2 main.go

./golamv2 --email --url https://example.com --workers 25

./golamv2 --keywords "password,login,admin" --url https://example.com --workers 30

./golamv2 --domains --url https://example.com --workers 20

./golamv2 --email --domains --keywords "smeagol,ring" --url https://example.com --workers 40

# Explore crawl data interactively
./golamv2 explore
# Explore with custom data directory
./golamv2 explore --data /path/to/data

./golamv2 \
--email \
--url https://example.com \
--workers 50 \
--memory 400 \
--depth 5 \
--dashboard 8080

| Flag | Description | Default |
|---|---|---|
| `--email` | Hunt for email addresses | false |
| `--domains` | Hunt for dead URLs and domains | false |
| `--keywords` | Hunt for specific keywords (comma-separated) | [] |
| `--url` | Starting URL to crawl (required) | - |
| `--workers` | Maximum number of concurrent workers | 50 |
| `--memory` | Maximum memory usage in MB | 500 |
| `--depth` | Maximum crawling depth | 5 |
| `--dashboard` | Dashboard port | 8080 |
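
To make the defaults in the table concrete, here is a minimal sketch of parsing the same flags with Go's standard `flag` package. It is an assumption that the real CLI is wired this way (it may well use a different flag library); `crawlConfig` and `parseCrawlFlags` are hypothetical names, and only the flag names and defaults come from the table.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"strings"
)

// crawlConfig is a hypothetical holder for the documented flags.
type crawlConfig struct {
	Email, Domains bool
	Keywords       []string
	URL            string
	Workers        int
	MemoryMB       int
	Depth          int
	DashboardPort  int
}

// parseCrawlFlags parses args using the defaults listed in the flag table.
// Go's flag package accepts both -name and --name spellings.
func parseCrawlFlags(args []string) (crawlConfig, error) {
	var cfg crawlConfig
	var keywords string
	fs := flag.NewFlagSet("golamv2", flag.ContinueOnError)
	fs.BoolVar(&cfg.Email, "email", false, "hunt for email addresses")
	fs.BoolVar(&cfg.Domains, "domains", false, "hunt for dead URLs and domains")
	fs.StringVar(&keywords, "keywords", "", "comma-separated keywords to hunt for")
	fs.StringVar(&cfg.URL, "url", "", "starting URL to crawl (required)")
	fs.IntVar(&cfg.Workers, "workers", 50, "maximum number of concurrent workers")
	fs.IntVar(&cfg.MemoryMB, "memory", 500, "maximum memory usage in MB")
	fs.IntVar(&cfg.Depth, "depth", 5, "maximum crawling depth")
	fs.IntVar(&cfg.DashboardPort, "dashboard", 8080, "dashboard port")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	if cfg.URL == "" {
		return cfg, errors.New("--url is required")
	}
	// Split the comma-separated keyword list, dropping empty entries.
	for _, k := range strings.Split(keywords, ",") {
		if k = strings.TrimSpace(k); k != "" {
			cfg.Keywords = append(cfg.Keywords, k)
		}
	}
	return cfg, nil
}

func main() {
	// Fixed arguments keep the example deterministic; a real binary would
	// pass os.Args[1:] instead.
	cfg, err := parseCrawlFlags([]string{
		"--email", "--keywords", "smeagol,ring",
		"--url", "https://example.com", "--workers", "25",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Flags that are not passed (here `--memory`, `--depth`, and `--dashboard`) keep the defaults from the table.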
Access the real-time dashboard at http://localhost:8080 (or your specified port). The /db paths currently do not work.
- Real-time Metrics: Live updates via a WebSocket
- Performance Monitoring: URLs/second, memory usage, uptime
- Queue Status: URLs in queue, database, active workers
- Findings Summary: Emails, keywords, dead links found
- Success Rate: Error tracking and success percentage
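
As a rough illustration of how two of these metrics might be derived, the sketch below computes a success percentage and a URLs/second figure from raw counters. The formula is an assumption (errors counted as a share of stored results), `crawlStats` is a hypothetical type, and the uptime value is made up for the example. The result and error counts are the ones shown in the explorer sample session in this README.

```go
package main

import "fmt"

// crawlStats is a hypothetical snapshot of the counters a dashboard could show.
type crawlStats struct {
	Results       int     // pages processed
	Errors        int     // pages that failed
	UptimeSeconds float64 // time since the crawl started
}

// SuccessRate returns the percentage of results that did not error.
// Assumption: errors are counted as a subset of results.
func (s crawlStats) SuccessRate() float64 {
	if s.Results == 0 {
		return 0
	}
	return 100 * float64(s.Results-s.Errors) / float64(s.Results)
}

// URLsPerSecond returns the average throughput over the whole run.
func (s crawlStats) URLsPerSecond() float64 {
	if s.UptimeSeconds <= 0 {
		return 0
	}
	return float64(s.Results) / s.UptimeSeconds
}

func main() {
	// 37,635 results and 226 errors are from the sample session;
	// the one-hour uptime is an illustrative value only.
	s := crawlStats{Results: 37635, Errors: 226, UptimeSeconds: 3600}
	fmt.Printf("success rate: %.1f%%\n", s.SuccessRate()) // prints "success rate: 99.4%"
	fmt.Printf("throughput: %.2f URLs/s\n", s.URLsPerSecond())
}
```

Under that assumption the computed rate matches the 99.4% reported in the sample session.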
GolamV2 includes an interactive CLI tool for exploring and analyzing crawl data stored in its BadgerDB databases.
# Use default data directory (golamv2_data)
./golamv2 explore
# Specify custom data directory
./golamv2 explore --data /path/to/data
# With output file for exports
./golamv2 explore --output results.json

| Command | Description | Example |
|---|---|---|
| `help` | Show all available commands | `help` |
| `stats` | Display database statistics | `stats` |
| `urls [limit]` | List URLs (default: 10) | `urls 20` |
| `results [limit]` | List crawl results (default: 10) | `results 50` |
| `search <term>` | Search in results content | `search "admin panel"` |
| `emails [limit]` | Show found emails | `emails 25` |
| `keywords [limit]` | Show found keywords | `keywords 15` |
| `deadlinks [limit]` | Show dead links found | `deadlinks 30` |
| `export <type>` | Export data to JSON | `export emails` |
| `raw <key>` | Show raw data for specific key | `raw url:example.com` |
| `analyze` | Detailed data analysis | `analyze` |
| `timeline` | Show crawling timeline | `timeline` |
| `domains` | Show domain statistics | `domains` |
| `clear` | Clear terminal screen | `clear` |
| `quit` / `exit` | Exit explorer | `quit` |
- Full-text search across all results
- Search in titles, content, emails, and keywords
- Filter by status, domain, or content type
- Export URLs, results, emails, or keywords to JSON
- Configurable output files
- Data formatting for further analysis (note: not fully tested)
- Domain-based statistics and analysis
- Timeline visualization of crawling activity
- Success rate analysis by domain
- Error categorization and reporting
- Performance metrics and trends
$ ./golamv2 explore
🕸️ GolamV2 Data Explorer
========================
Interactive tool to explore crawl data
Data path: golamv2_data
golamv2> stats
Database Statistics
=====================
URLs in database: 2,767
Results in database: 37,635
Emails found: 118,613
Keywords found: 1,258
Dead links found: 20,422
Errors encountered: 226
Success rate: 99.4%
golamv2> search "login"
Search Results for "login":
=============================
Found 45 results containing "login"
- example.com/admin - Admin Login Portal
- test.org/user - User Login Page
...
golamv2> export emails
Exporting emails...
Exported 118,613 emails to emails_export.json
golamv2> quit
Goodbye! 👋

| Flag | Short | Description | Default |
|---|---|---|---|
| `--data` | `-d` | Path to GolamV2 data directory | golamv2_data |
| `--output` | `-o` | Output file for exports | (none) |
- Stores pending URLs for crawling
- Automatic queue refilling when memory queue is <40% full
- Optimized for fast retrieval and batch operations
- Stores crawling results based on mode:
  - finds_email: Email hunting results
  - finds_keywords: Keyword search results
  - finds_domains: Dead link detection results
  - finds: All-mode results
- Bloom Filter: 10M URL capacity, 1% false positive rate
- Priority Queue: 100k URL limit with smart refilling
- BadgerDB: Tuned for low memory; the limits can be raised to suit your environment
- HTTP Responses: 10MB size limit to prevent memory exhaustion
- Worker Pool: Configurable concurrent processing
- Rate Limiting: Respectful crawling (10 req/sec default)
- Batch Operations: Efficient database operations
- Connection Pooling: Reused HTTP connections
- Automatic Parsing: Fetches and caches robots.txt
- Crawl Delays: Respects specified delays
- Sitemap Discovery: Extracts sitemap URLs for better crawling
- User-Agent Specific: Follows rules for GolamV2-Crawler/1.0
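
To give a feel for what "10M URL capacity, 1% false positive rate" costs in memory, the sketch below applies the standard Bloom filter sizing formulas, m = -n·ln(p)/(ln 2)² bits and k = (m/n)·ln 2 hash functions, and pairs them with a minimal filter built on FNV hashing. It is an illustration only: the README does not say which Bloom filter implementation or hash GolamV2 uses, and `bloomParams`, `bloom`, and `TestAndAdd` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

// bloomParams returns the bit count m and hash count k for n items at
// false-positive rate p, using the standard sizing formulas.
func bloomParams(n uint64, p float64) (m, k uint64) {
	mf := math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2))
	kf := math.Round(mf / float64(n) * math.Ln2)
	return uint64(mf), uint64(kf)
}

// bloom is a minimal Bloom filter backed by a bit set.
type bloom struct {
	bits []uint64
	m, k uint64
}

func newBloom(n uint64, p float64) *bloom {
	m, k := bloomParams(n, p)
	return &bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes derives two 64-bit values from s for double hashing.
func (b *bloom) hashes(s string) (uint64, uint64) {
	h := fnv.New64a()
	h.Write([]byte(s))
	h1 := h.Sum64()
	h.Write([]byte{0xff}) // extend the input to get a second, different hash
	h2 := h.Sum64() | 1   // force a non-zero step
	return h1, h2
}

// TestAndAdd reports whether s was possibly seen before, then records it.
// A false result is definitive; a true result may be a false positive.
func (b *bloom) TestAndAdd(s string) bool {
	h1, h2 := b.hashes(s)
	seen := true
	for i := uint64(0); i < b.k; i++ {
		idx := (h1 + i*h2) % b.m
		word, mask := idx/64, uint64(1)<<(idx%64)
		if b.bits[word]&mask == 0 {
			seen = false
			b.bits[word] |= mask
		}
	}
	return seen
}

func main() {
	m, k := bloomParams(10_000_000, 0.01)
	fmt.Printf("bits=%d (%.1f MB), hashes=%d\n", m, float64(m)/8/1e6, k)

	seen := newBloom(1000, 0.01) // small filter for the demo
	for _, u := range []string{"https://example.com/", "https://example.com/a", "https://example.com/"} {
		fmt.Println(u, "seen before:", seen.TestAndAdd(u))
	}
}
```

For 10 million URLs at 1% this works out to roughly 96 million bits (about 12 MB) and 7 hash functions, which is far smaller than storing the URLs themselves.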
export GOLAMV2_DB_PATH="./golamv2_data"
export GOLAMV2_USER_AGENT="GolamV2-Crawler/1.0"
export GOLAMV2_RATE_LIMIT="10"

- 70%: URL storage and processing
- 30%: Results storage and caching
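
The environment variables and the 70/30 split above could be consumed along the following lines. This is a hedged sketch and not the project's code: it assumes the exported values double as fallbacks when a variable is unset, and that the percentages apply to the `--memory` budget. `envConfig`, `loadEnvConfig`, and `splitMemory` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envConfig is a hypothetical holder for the documented environment variables.
type envConfig struct {
	DBPath    string
	UserAgent string
	RateLimit int // requests per second
}

// loadEnvConfig reads the variables through getenv (os.Getenv in production),
// falling back to the values shown in the README when a variable is unset.
func loadEnvConfig(getenv func(string) string) (envConfig, error) {
	cfg := envConfig{
		DBPath:    "./golamv2_data",
		UserAgent: "GolamV2-Crawler/1.0",
		RateLimit: 10,
	}
	if v := getenv("GOLAMV2_DB_PATH"); v != "" {
		cfg.DBPath = v
	}
	if v := getenv("GOLAMV2_USER_AGENT"); v != "" {
		cfg.UserAgent = v
	}
	if v := getenv("GOLAMV2_RATE_LIMIT"); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil || n <= 0 {
			return cfg, fmt.Errorf("GOLAMV2_RATE_LIMIT must be a positive integer, got %q", v)
		}
		cfg.RateLimit = n
	}
	return cfg, nil
}

// splitMemory divides a budget in MB into 70% for URL storage and processing
// and the remainder for results storage and caching.
func splitMemory(totalMB int) (urlMB, resultsMB int) {
	urlMB = totalMB * 70 / 100
	return urlMB, totalMB - urlMB
}

func main() {
	cfg, err := loadEnvConfig(os.Getenv)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)

	urlMB, resultsMB := splitMemory(500) // 500 MB is the documented --memory default
	fmt.Printf("URL storage: %d MB, results: %d MB\n", urlMB, resultsMB)
}
```

With the default 500 MB budget, the split comes to 350 MB for URL storage and processing and 150 MB for results storage and caching.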

- Memory Usage Too High
  - Reduce the `--workers` count
  - Lower the `--memory` limit
  - Reduce the crawling `--depth`
- Slow Performance
  - Increase the `--workers` count
  - Check network connectivity
  - Monitor robots.txt delays
- Database Issues
  - Ensure sufficient disk space
  - Check file permissions
  - Restart the application if you suspect database corruption

- For High-Memory Systems
  `./golamv2 --workers 100 --memory 800 --url https://example.com`
- For Low-Memory Systems
  `./golamv2 --workers 20 --memory 200 --url https://example.com`
- For Maximum Throughput
  `./golamv2 --workers 200 --memory 400 --depth 3 --url https://example.com`

MIT License - see LICENSE file for details.
For issues, questions, or contributions, please use the GitHub issue tracker or email golam@benar.me.
