User Agents CloudFront

Serves daily-updated JSON files of realistic browser user agents via CloudFront. The user agents are sorted by popularity (most common first), which makes the files useful for web scrapers, crawlers, and automation tools that need to rotate user agents.

Architecture

  • S3 - Private bucket stores JSON files
  • CloudFront - Public HTTPS distribution that reads from S3 through Origin Access Control (OAC), with a 24-hour cache TTL
  • Lambda - Generates unique user agents that follow the real-world usage distribution
  • EventBridge - Triggers Lambda daily at 3:00 AM UTC

Prerequisites

  • AWS CLI configured with appropriate permissions
  • Node.js 20+ and npm
  • jq (for teardown script)

Data Source

User agent data comes from intoli/user-agents, which provides realistic browser fingerprints based on real-world usage statistics.

Usage

# Deploy (uses defaults: bucket=rjpr-user-agents, region=us-east-2)
./deploy.sh

# Deploy with custom config
BUCKET_NAME=my-user-agents AWS_REGION=us-west-2 ./deploy.sh

# Teardown all resources
./teardown.sh

Configuration

Variable         | Default          | Description
-----------------|------------------|------------------------------------------
BUCKET_NAME      | rjpr-user-agents | S3 bucket name
AWS_REGION       | us-east-2        | AWS region for S3 bucket
OUTPUT_FILES     | all              | Comma-separated list of files to generate. Valid values: combined, combinedFull, desktop, desktopFull, mobile, mobileFull
SAMPLE_COUNT     | 100              | Number of user agents per category
INVALIDATE_CACHE | false            | Invalidate CloudFront cache after upload

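CloudFront uses the AWS managed "CachingOptimized" cache policy, whose default TTL is 24 hours. Because the files are regenerated daily, cache invalidation is optional; set INVALIDATE_CACHE=true if clients should get each new file without waiting for the cached copy to expire.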
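
With invalidation off, a client may receive a cached file that is older than the latest run. As a minimal sketch, a consumer could check the `generatedAt` field before trusting a payload. The helper names and the 48-hour threshold (one day between runs plus one 24-hour cache TTL) are assumptions for illustration, not part of this repository.

```javascript
// Hypothetical client-side helpers; not part of this repository.
const HOUR_MS = 60 * 60 * 1000;

// Age of a payload in hours, based on its `generatedAt` ISO timestamp.
function payloadAgeHours(payload, now = new Date()) {
  const generated = new Date(payload.generatedAt);
  if (Number.isNaN(generated.getTime())) {
    throw new Error(`Invalid generatedAt: ${payload.generatedAt}`);
  }
  return (now.getTime() - generated.getTime()) / HOUR_MS;
}

// Assumed threshold: one day between runs plus one 24-hour cache TTL.
function isStale(payload, now = new Date(), maxAgeHours = 48) {
  return payloadAgeHours(payload, now) > maxAgeHours;
}

const sample = { generatedAt: "2024-01-15T03:00:00.000Z" };
console.log(payloadAgeHours(sample, new Date("2024-01-16T03:00:00.000Z"))); // 24
console.log(isStale(sample, new Date("2024-01-18T03:00:00.000Z"))); // true
```

A stale result suggests checking the Lambda logs or triggering an update manually.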

Output Files

Up to six JSON files are generated. Each filename includes the SAMPLE_COUNT value (100 by default):

File                              | Description
----------------------------------|-----------------------------------
user-agents-100.json              | Combined desktop + mobile strings
user-agents-100-full.json         | Combined with full fingerprint data
user-agents-100-desktop.json      | Desktop only strings
user-agents-100-desktop-full.json | Desktop only with full data
user-agents-100-mobile.json       | Mobile only strings
user-agents-100-mobile-full.json  | Mobile only with full data
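
The relationship between OUTPUT_FILES, SAMPLE_COUNT, and the filenames above can be sketched as follows. The suffix mapping is inferred from the two tables in this README and the function name is hypothetical; the Lambda's actual naming logic may differ.

```javascript
// Suffix per OUTPUT_FILES value, inferred from the tables in this README
// (an assumption, not taken from the Lambda source).
const SUFFIXES = {
  combined: "",
  combinedFull: "-full",
  desktop: "-desktop",
  desktopFull: "-desktop-full",
  mobile: "-mobile",
  mobileFull: "-mobile-full",
};

// Hypothetical helper: expands an OUTPUT_FILES value into filenames.
function outputFileNames(outputFiles = "all", sampleCount = 100) {
  const keys =
    outputFiles.trim() === "all"
      ? Object.keys(SUFFIXES)
      : outputFiles.split(",").map((key) => key.trim()).filter(Boolean);
  return keys.map((key) => {
    if (!Object.hasOwn(SUFFIXES, key)) {
      throw new Error(`Unknown OUTPUT_FILES value: ${key}`);
    }
    return `user-agents-${sampleCount}${SUFFIXES[key]}.json`;
  });
}

// Prints one filename per line for a custom selection and count.
console.log(outputFileNames("desktop,mobileFull", 50).join("\n"));
```

This is handy for clients that need to build a download URL from the same settings used at deploy time.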

Simple format

{
  "generatedAt": "2024-01-15T03:00:00.000Z",
  "desktop": ["Mozilla/5.0 (Windows NT 10.0; Win64; x64)..."],
  "mobile": ["Mozilla/5.0 (iPhone; CPU iPhone OS 17_0...)..."]
}
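
A minimal sketch of consuming the simple format from Node.js 20, assuming the file is reachable at the root of your CloudFront distribution. The URL is a placeholder, and `fetchUserAgents` and `pickUserAgent` are hypothetical helpers, not part of this repository. Because each list is sorted most common first, limiting the choice to the first few entries keeps the rotation biased toward popular browsers.

```javascript
// Downloads and parses one of the JSON files (uses Node's built-in fetch).
async function fetchUserAgents(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.json();
}

// Picks one user agent from the `topN` most popular entries of a category.
// `random` is injectable so that a choice can be reproduced.
function pickUserAgent(payload, category = "desktop", topN = 10, random = Math.random) {
  const list = payload[category];
  if (!Array.isArray(list) || list.length === 0) {
    throw new Error(`No user agents for category: ${category}`);
  }
  const pool = list.slice(0, Math.max(1, topN));
  return pool[Math.floor(random() * pool.length)];
}

// Usage (placeholder domain; substitute your own distribution):
// const data = await fetchUserAgents("https://<your-distribution>.cloudfront.net/user-agents-100.json");
// const userAgent = pickUserAgent(data, "mobile");
```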

Full format (for Puppeteer/Playwright)

{
  "generatedAt": "2024-01-15T03:00:00.000Z",
  "desktopFull": [
    {
      "platform": "Win32",
      "pluginsLength": 3,
      "vendor": "Google Inc.",
      "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)...",
      "viewportHeight": 660,
      "viewportWidth": 1260,
      "deviceCategory": "desktop",
      "screenHeight": 800,
      "screenWidth": 1280
    }
  ],
  "mobileFull": []
}
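
As a rough sketch, one full-format entry could be turned into options for Playwright's `browser.newContext()`. The field names on the input side come from the sample above; the mapping and the function name `toContextOptions` are assumptions for illustration.

```javascript
// Hypothetical mapping from a full-format entry to Playwright context options.
function toContextOptions(entry) {
  return {
    userAgent: entry.userAgent,
    viewport: { width: entry.viewportWidth, height: entry.viewportHeight },
    screen: { width: entry.screenWidth, height: entry.screenHeight },
    // Treat anything not labeled "mobile" as a desktop-style context.
    isMobile: entry.deviceCategory === "mobile",
  };
}

// Usage with Playwright (requires the `playwright` package):
// const { chromium } = require("playwright");
// const browser = await chromium.launch();
// const context = await browser.newContext(toContextOptions(data.desktopFull[0]));
```

Fields such as `platform`, `vendor`, and `pluginsLength` have no direct context option, so matching them would need extra work such as an init script. Playwright's `isMobile` option may not be supported by every browser engine, so check its documentation before using this with Firefox.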

Troubleshooting

View Lambda logs:

aws logs tail /aws/lambda/update-user-agents --follow

Manually trigger an update:

aws lambda invoke --function-name update-user-agents /tmp/output.json && cat /tmp/output.json

Test locally:

cd src && npm run test:local

Cost

At low volume, this runs essentially for free:

  • Lambda: ~1 invocation/day, ~5 seconds each
  • S3: A few KB of JSON files
  • CloudFront: Depends on request volume (the first 1 TB/month of data transfer is free)

License

MIT. See THIRD-PARTY-LICENSES for data source attribution.
