Skip to content

reory/web-atlas

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

3 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ•ΈοΈ WebAtlas: Interactive SEO Link Visualizer

Last Commit Repo Size License

PyVis NetworkX Openpyxl

Interactive SEO Link Visualizer for Screaming Frog Exports. WebAtlas is a Python-based technical SEO tool that transforms flat Screaming Frog Excel exports into dynamic, interactive network graphs. It allows SEOs to visualize internal linking structures, identify crawl depth issues, and discover topical silos that are invisible in standard spreadsheets.


πŸ“Ή Demo video

Recording.2026-04-02.235205.mp4

πŸš€ The Core Problem & Solution

  • The "Graphviz Nightmare"Initially, this project attempted to use Graphviz (the industry standard for diagramming). However, I encountered a major hurdle:
  • Graphviz is designed for Hierarchical Trees (like org charts).
  • Websites are Non-Linear Networks (where any page can link to any other page). When Graphviz tries to force a website into a tree structure, it results in an "infinite vertical strip" that is impossible to read or navigate.
  • The "PyVis" Evolution I pivoted to NetworkX (for the mathematical graph logic) and PyVis (for the browser-based visualization).
  • Physics-Based Layout: Instead of a static image, I now have a "force-directed" graph. Nodes push each other away until they find space, naturally "untangling" the website's structure.
  • Interactivity: Users can zoom, drag nodes, and hover over elements to see hidden metadata.

✨ Key Features

  • Interactive Hover Tooltips: Nodes are labeled with short, clean URLs (slugs), but hovering over them reveals the full URL and the exact XPath/Link Position where the link was found.
  • Noise Reduction: Automatically filters out non-hyperlinks (Images, CSS, JS, etc.) to focus specifically on the "Internal Link Ecosystem.
  • Performance Optimized: Uses the Barnes-Hut physics engine and stabilization iterations to ensure that even large sites (the "Mothball" effect) load quickly in the browser.
  • Dark - Mode Visualization: Designed with a high-contrast dark theme for better visibility of complex link paths.

πŸ› οΈ Installation

Ensure you have Python 3.10+ installed.

Clone the repository:

git clone https://github.com/reory/web-atlas.git
cd web-atlas

Install dependencies:

pip install pyvis networkx openpyxl

πŸ“‚ Usage

  • To generate your interactive map, you need two exports from Screaming Frog (XLSX format):Internal (All HTML pages)All Outlinks (The massive list of every link found)
  • Run the following command: PowerShell
python main.py --xlsx internal.xlsx All_Outlinks.xlsx atlas.html
  • Open atlas.html in any modern web browser to explore your site's architecture.

Tech Stack

  • PyVis is a Python library built specifically for creating interactive network graphs that run in a web browser.
  • Networkx Performs advanced SEO analysis, such as identifying "Orphan Pages" (nodes with no incoming edges) or calculating "PageRank" to see which pages hold the most authority.
  • OpenPyxl is a library used to read and write Excel (.xlsx) files.

πŸ“ Project Structure

web-atlas/
β”œβ”€β”€ webatlas/                 # Core package
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ sf_import.py          # Screaming Frog XLSX importer
β”‚   β”œβ”€β”€ graph_builder.py      # NetworkX graph construction + SEO logic
β”‚                             # PyVis HTML visualization
β”‚
β”œβ”€β”€ lib/                      # Additional assets or helper modules
β”‚   └── ...                   
β”‚
β”œβ”€β”€ tests/                    # Pytest suite
β”‚   β”œβ”€β”€ test_graph_builder.py
β”‚   
β”‚
β”œβ”€β”€ main.py                   # CLI entry point
β”œβ”€β”€ README.md                 # Project documentation
β”œβ”€β”€ CONTRIBUTING.md           # Contribution guidelines
β”œβ”€β”€ LICENSE.md                # MIT license
β”œβ”€β”€ requirements.txt          # Python dependencies
β”œβ”€β”€ pyproject.toml            # Build + metadata config
β”œβ”€β”€ demo.mp4                  # Demo video

🧠 Why this Architecture?

  • Component Why I chose it
  • Openpyxl Allows us to read massive Excel files directly without needing a database
  • NetworkX The "brain" of the project. It handles the complex math of linking thousands of nodes.
  • PyVis The "beauty." It turns the math into an interactive HTML/JavaScript experience that runs in any browser.

πŸ•΅οΈβ€β™‚οΈ How to Read the "Mothball"

  • When you first open your graph, you might see a dense cluster (a "mothball"). Here’s how to audit it:

  • The Core: The center of the ball usually contains your Header and Footer links.

  • The Planets: Small clusters orbiting the main ball are often your Blog categories or Service silos.

  • The Drifters: Nodes floating far away with only one connection are "Deep Pages" or near-orphans that need better internal linking to rank.

  • Built for SEOs who want to see their data, not just read it.


πŸ§ͺ Tests

  • The project includes a Pytest suite to validate graph initialization and data integrity.
  • Run python -m pytest tests/ to ensure the "mothball" logic and XPath mapping are functioning correctly before deployment.

πŸ›£οΈ Roadmap Features

  • [ ]🚦Status Code Color-Coding Right now, every node is blue. In a real audit, you want the "danger" to pop out visually.

The Feature: Modify sf_import.py to pull the Status Code column from the "Internal" export.

The Visual: * Green: 200 OK

Yellow: 301/302 Redirects

Red: 404/500 Errors

  • [ ]🎈 Node Scaling by "Internal PageRank"

The Feature: Use NetworkX to calculate the pagerank or in_degree (number of incoming links) for every node.

The Visual: Important pages with lots of "Link Juice" become large, dominant circles, while deep, unimportant pages stay tiny.

  • [ ]πŸ“‚ Folder-Based "Silo" Grouping Websites are usually organized by folders (e.g., /blog/, /products/, /services/).

Use Python's string splitting to group URLs by their first subfolder.

The Visual: Add a "Legend" or use different colors for different folders. You can even add a toggle to "Hide all Blog posts" to see only the core service architecture.


  • Built by Roy Peters 😁 LinkedIn

About

WebAtlas is a Python-based technical SEO tool that transforms flat Screaming Frog Excel exports into dynamic, interactive network graphs. It allows SEOs to visualize internal linking structures, identify crawl depth issues, and discover topical silos that are invisible in standard spreadsheets.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors