| 1 | +--- |
| 2 | +title: "Networks: Basic Components" |
| 3 | +subtitle: "Nodes and Edges" |
| 4 | +format: html |
| 5 | +--- |
| 6 | + |
| 7 | +A graph is a mathematical structure used to model pairwise relations between objects. It consists of **nodes** (also called vertices) and **edges** (also called links) that connect pairs of nodes. |
| 8 | + |
| 9 | +On this page, we will explore the basic components of a graph using the `networkx` library in Python. We will cover: |
| 10 | + |
| 11 | +- What are nodes and edges? |
| 12 | +- How to create a graph using `networkx`. |
| 13 | + |
| 14 | +First, let's import the necessary library: |
| 15 | + |
| 16 | +```{python} |
| 17 | +#| echo: true |
| 18 | +#| message: false |
| 19 | +import networkx as nx |
| 20 | +``` |
| 21 | + |
| 22 | +::: {.callout-tip collapse="true"} |
| 23 | +## Module not found error? |
| 24 | + |
| 25 | +If you get a `ModuleNotFoundError` for `networkx`, you may need to install it first. |
| 26 | + |
| 27 | +If you are working on **Google Colab**, you can run: |
| 28 | +```python |
| 29 | +!pip install networkx |
| 30 | +``` |
| 31 | + |
| 32 | +If you are working in a local Python environment, use conda or run: |
| 33 | +```bash |
| 34 | +pip install networkx |
| 35 | +``` |
| 36 | +::: |
| 37 | + |
| 38 | +And now we can initialize an empty graph: |
| 39 | + |
| 40 | +```{python} |
| 41 | +#| echo: true |
| 42 | +#| message: true |
| 43 | +G = nx.Graph() |
| 44 | +
|
| 45 | +print(G) |
| 46 | +``` |
| 47 | + |
| 48 | +Our variable `G` is now an empty graph object. We can add nodes and edges to it, which we will see in the next sections. |
| 49 | + |
| 50 | +## Nodes (Vertices) |
| 51 | + |
| 52 | +Nodes represent the entities in a graph. They can be anything: people in a social network, airports in a flight network, or pages on the web. Each node can have attributes that provide additional information about it. For example, in a social network, a node might represent a person and have attributes like name, age, or location. |
| 53 | + |
| 54 | +To add nodes to our graph, we can use the `add_node(<id>)` method. The `<id>` can be any hashable Python object. We can list the nodes in the graph using the `nodes()` method. |
| 55 | + |
| 56 | +```{python} |
| 57 | +#| echo: true |
| 58 | +#| message: true |
| 59 | +# Add three nodes to the graph |
| 60 | +G.add_node("Spain") |
| 61 | +G.add_node("Portugal") |
| 62 | +G.add_node("France") |
| 63 | + |
| 64 | +# Show the nodes in the graph |
| 65 | +print(G.nodes()) |
| 66 | +``` |
| 67 | + |
| 68 | +::: {.callout-tip collapse="true"} |
| 69 | +## What does "hashable" mean? |
| 70 | + |
| 71 | +In Python, a hashable object is an object that has a hash value that remains constant during its lifetime. This means that the object can be used as a key in a dictionary or as an element in a set. Examples of hashable objects include integers, strings, and tuples (as long as they contain only hashable types). Lists and dictionaries are not hashable because they are mutable (their contents can change). |
| 72 | +::: |
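
As a quick illustration of the note above, the following sketch checks a few candidate node IDs for hashability using only built-in Python. The helper `is_hashable` and the sample values are made up for this example and are not part of `networkx`.

```python
# Candidate node IDs: a string, an integer, a tuple, a list, and a dictionary
candidates = ["Spain", 42, ("Spain", "Madrid"), ["Spain", "Madrid"], {"name": "Spain"}]


def is_hashable(obj):
    """Return True if Python can compute a hash for obj (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# One True/False flag per candidate, in the same order as the list above
flags = [is_hashable(obj) for obj in candidates]

for obj, ok in zip(candidates, flags):
    print(f"{obj!r}: {'hashable' if ok else 'not hashable'}")
```

The string, the integer, and the tuple can serve as node IDs, while the list and the dictionary cannot.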
| 73 | + |
| 74 | +### Node Attributes |
| 75 | + |
| 76 | +We can also add attributes to nodes to store additional information. Think of `G.nodes` as a dictionary where the keys are the node IDs and the values are dictionaries of attributes. We can add attributes to a node by accessing it through `G.nodes[<id>]` and assigning values to the attributes. |
| 77 | + |
| 78 | +For example, we can add a "population" attribute to our country nodes: |
| 79 | + |
| 80 | +```{python} |
| 81 | +#| echo: true |
| 82 | +#| message: true |
| 83 | +# Add population attribute to the nodes |
| 84 | +G.nodes["Spain"]["population"] = 47_000_000 |
| 85 | +G.nodes["Portugal"]["population"] = 10_000_000 |
| 86 | +G.nodes["France"]["population"] = 67_000_000 |
| 87 | + |
| 88 | +# Show the nodes with their attributes |
| 89 | +population = nx.get_node_attributes(G, 'population') |
| 90 | +for node, pop in population.items(): |
| 91 | + print(f"{node}: {pop} inhabitants") |
| 92 | +``` |
| 93 | + |
| 94 | +Using `nx.get_node_attributes(G, 'population')`, we can retrieve the population attribute for all nodes in the graph as a dictionary. |
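
To see every attribute of every node at once, one option is to iterate over `G.nodes(data=True)`, which yields `(node, attributes)` pairs. The sketch below uses a separate small graph, `G_demo`, so that it does not modify the graph built on this page; the variable names are only illustrative.

```python
import networkx as nx

# A small stand-alone graph used only for this illustration
G_demo = nx.Graph()
G_demo.add_node("Spain", population=47_000_000)
G_demo.add_node("Portugal")  # no attributes yet

# Each item is a (node_id, attribute_dictionary) pair
for node, attrs in G_demo.nodes(data=True):
    print(node, attrs)

# The same information collected into a plain dictionary
node_data = dict(G_demo.nodes(data=True))
```

Nodes without attributes show up with an empty dictionary, which makes missing values easy to spot.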
| 95 | + |
| 96 | +If you are adding a new node and want to set its attributes at the same time, you can use the `add_node()` method with keyword arguments. For example: |
| 97 | + |
| 98 | +```{python} |
| 99 | +#| echo: true |
| 100 | +#| message: true |
| 101 | +# Add a new node with attributes |
| 102 | +G.add_node("Italy", population=60_000_000) |
| 103 | + |
| 104 | +# Show the nodes with their attributes |
| 105 | +population = nx.get_node_attributes(G, 'population') |
| 106 | +for node, pop in population.items(): |
| 107 | + print(f"{node}: {pop} inhabitants") |
| 108 | +``` |
| 109 | + |
| 110 | +## Edges (Links) |
| 111 | + |
| 112 | +Edges represent the connections between nodes in a graph. They can also have attributes, such as weight, which might represent the strength of the connection. For example, in a social network, an edge might represent a friendship between two people, and the weight could represent how close they are. |
| 113 | + |
| 114 | +To add edges to our graph, we can use the `add_edge(<node1>, <node2>)` method. This will create an undirected edge between `node1` and `node2` (we use their IDs here). We can see the list of edges in the graph using the `edges()` method. |
| 115 | + |
| 116 | +```{python} |
| 117 | +#| echo: true |
| 118 | +#| message: true |
| 119 | +# Add edges between the nodes (neighboring countries) |
| 120 | +G.add_edge("Spain", "Portugal") |
| 121 | +G.add_edge("Spain", "France") |
| 122 | +# Show the edges in the graph |
| 123 | +print(G.edges()) |
| 124 | +``` |
| 125 | + |
| 126 | +### Edge Attributes |
| 127 | + |
| 128 | +Just like nodes, edges can also have attributes. We can add attributes to an edge by accessing it through `G.edges[<node1>, <node2>]` and assigning values to the attributes. For example, we can add a "distance" attribute that stores the approximate distance between the capitals of the two countries: |
| 129 | + |
| 130 | +```{python} |
| 131 | +#| echo: true |
| 132 | +#| message: true |
| 133 | +# Add distance attribute to the edges |
| 134 | +G.edges["Spain", "Portugal"]["distance"] = 600 # distance in kilometers |
| 135 | +G.edges["Spain", "France"]["distance"] = 1000 # distance in kilometers |
| 136 | + |
| 137 | +# Show the edges with their attributes |
| 138 | +distance = nx.get_edge_attributes(G, 'distance') |
| 139 | +for edge, dist in distance.items(): |
| 140 | + print(f"{edge}: {dist} km") |
| 141 | +``` |
| 142 | + |
| 143 | +Using `nx.get_edge_attributes(G, 'distance')`, we can retrieve the distance attribute for all edges in the graph as a dictionary. |
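
Because the graph is undirected, the order in which we name the two endpoints should not matter when we access an edge through `G.edges`. The following sketch, on a separate illustrative graph `G_demo`, checks this. Note that the dictionary returned by `nx.get_edge_attributes` stores each edge only once, so its keys use a single orientation.

```python
import networkx as nx

# A small stand-alone graph used only for this illustration
G_demo = nx.Graph()
G_demo.add_edge("Spain", "Portugal", distance=600)

# Both orientations reach the same attribute dictionary
d_forward = G_demo.edges["Spain", "Portugal"]["distance"]
d_reverse = G_demo.edges["Portugal", "Spain"]["distance"]
print(d_forward, d_reverse)

# Each undirected edge appears once in the attribute dictionary
distance = nx.get_edge_attributes(G_demo, "distance")
print(distance)
```

If you need to look up an edge without knowing how it was stored, `G.edges[u, v]` or `G.has_edge(u, v)` is the safer route than indexing the attribute dictionary directly.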
| 144 | + |
| 145 | +Again, you can also add attributes to an edge at the same time as you create it using the `add_edge()` method with keyword arguments. For example: |
| 146 | + |
| 147 | +```{python} |
| 148 | +#| echo: true |
| 149 | +#| message: true |
| 150 | +# Add a new edge with attributes |
| 151 | +G.add_edge("France", "Italy", distance=800) |
| 152 | + |
| 153 | +# Show the edges with their attributes |
| 154 | +distance = nx.get_edge_attributes(G, 'distance') |
| 155 | +for edge, dist in distance.items(): |
| 156 | + print(f"{edge}: {dist} km") |
| 157 | +``` |
| 158 | + |
| 159 | +### Adding Nodes and Edges Together |
| 160 | + |
| 161 | +We can also add nodes and edges together using the `add_edge()` method. If we try to add an edge between two nodes that do not exist in the graph, `networkx` will automatically create those nodes for us. For example: |
| 162 | + |
| 163 | +```{python} |
| 164 | +#| echo: true |
| 165 | +#| message: true |
| 166 | +# Add an edge between two nodes that do not exist |
| 167 | +G.add_edge("USA", "Canada", distance=3000) |
| 168 | + |
| 169 | +# In this case, we will have to add the population attribute for the new nodes separately |
| 170 | +G.nodes["USA"]["population"] = 331_000_000 |
| 171 | +G.nodes["Canada"]["population"] = 38_000_000 |
| 172 | + |
| 173 | +# Show the nodes and edges in the graph |
| 174 | +print("Nodes:", G.nodes()) |
| 175 | +print("Edges:", G.edges()) |
| 176 | +``` |
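
When several nodes need the same attribute, assigning values one by one becomes repetitive. A possible alternative is `nx.set_node_attributes`, which takes a dictionary mapping node IDs to values. The sketch below uses a separate illustrative graph `G_demo` and a `populations` dictionary defined just for this example.

```python
import networkx as nx

# A small stand-alone graph used only for this illustration
G_demo = nx.Graph()
G_demo.add_edge("USA", "Canada", distance=3000)  # creates both nodes

# Map each node ID to the value of the attribute we want to set
populations = {"USA": 331_000_000, "Canada": 38_000_000}
nx.set_node_attributes(G_demo, populations, name="population")

print(nx.get_node_attributes(G_demo, "population"))
```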
| 177 | + |
| 178 | + |
| 179 | +## Visualization |
| 180 | + |
| 181 | +Printing the graph object gives us a summary of its structure, but it doesn't show us the actual connections. To visualize the graph, we can use the `draw()` function from `networkx`, which uses Matplotlib to display the graph. |
| 182 | + |
| 183 | +```{python} |
| 184 | +#| echo: true |
| 185 | +#| message: true |
| 186 | +import matplotlib.pyplot as plt |
| 187 | + |
| 188 | +# Draw the graph |
| 189 | +nx.draw( |
| 190 | + G, |
| 191 | + with_labels=True, # show node labels (IDs) |
| 192 | + node_color='lightblue', # color of the nodes (vertices) |
| 193 | + edge_color='gray', # color of the edges (links) |
| 194 | + node_size=2000, # size of the nodes (vertices) |
| 195 | + font_size=12 # size of the labels (IDs) |
| 196 | + ) |
| 197 | +plt.show() |
| 198 | +``` |
| 199 | + |
| 200 | +### Layouts |
| 201 | + |
| 202 | +The `draw()` function has a `pos` parameter that controls the layout of the graph, that is, where each node is placed in the drawing. `pos` is a dictionary that maps each node to its (x, y) coordinates. `networkx` provides several built-in layout functions that compute such a dictionary, such as `spring_layout`, `circular_layout`, and `shell_layout`. For example, we can use the spring layout, which positions the nodes with a force-directed algorithm: |
| 203 | + |
| 204 | +```{python} |
| 205 | +#| echo: true |
| 206 | +#| message: true |
| 207 | +# Use the spring layout for visualization |
| 208 | +pos = nx.spring_layout(G) |
| 209 | +nx.draw( |
| 210 | + G, |
| 211 | + pos=pos, # specify the layout |
| 212 | + with_labels=True, |
| 213 | + node_color='lightblue', |
| 214 | + edge_color='gray', |
| 215 | + node_size=2000, |
| 216 | + font_size=12 |
| 217 | + ) |
| 218 | +plt.show() |
| 219 | +``` |
| 220 | + |
| 221 | +Playing with different layouts can help us better understand the structure of the graph and the relationships between nodes. Try it yourself! |
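
As a starting point for experimenting, the sketch below computes three layouts for a small illustrative graph `G_demo` and prints the coordinates instead of drawing them. Each layout function returns a dictionary from node to coordinates, which can then be passed to `nx.draw` through `pos`. The `seed` argument of `spring_layout` is used here so that the otherwise random spring layout is reproducible; the value 42 is arbitrary.

```python
import networkx as nx

# A small stand-alone graph used only for this illustration
G_demo = nx.Graph()
G_demo.add_edges_from([("Spain", "Portugal"), ("Spain", "France"), ("France", "Italy")])

# Each layout function returns a dictionary: node -> array with x and y
layouts = {
    "circular": nx.circular_layout(G_demo),
    "shell": nx.shell_layout(G_demo),
    "spring": nx.spring_layout(G_demo, seed=42),  # fixed seed for reproducibility
}

for name, pos in layouts.items():
    rounded = {node: tuple(round(float(c), 2) for c in xy) for node, xy in pos.items()}
    print(name, rounded)
```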
| 222 | + |
| 223 | +### Visualizing Node Attributes |
| 224 | + |
| 225 | +We can also visualize the attributes of nodes and edges by using different colors or sizes. For example, we can size the nodes based on their population attribute: |
| 226 | + |
| 227 | +```{python} |
| 228 | +#| echo: true |
| 229 | +#| message: true |
| 230 | +# Get the population attribute for each node |
| 231 | +population = nx.get_node_attributes(G, 'population') |
| 232 | +# Draw the graph with node sizes proportional to population |
| 233 | +node_sizes = [population[node] / 1_000_000 for node in G.nodes()] # scale down for visualization |
| 234 | + |
| 235 | +pos = nx.spring_layout(G) |
| 236 | + |
| 237 | +nx.draw( |
| 238 | + G, |
| 239 | + pos=pos, |
| 240 | + with_labels=True, |
| 241 | + node_color='lightblue', |
| 242 | + edge_color='gray', |
| 243 | + node_size=node_sizes, # size of the nodes (vertices) proportional to population |
| 244 | + font_size=12, |
| 245 | + ) |
| 246 | +plt.show() |
| 247 | +``` |
| 248 | + |
| 249 | +::: {.callout-tip collapse="true"} |
| 250 | +## What happens if a node is missing an attribute? |
| 251 | + |
| 252 | +In this case, the `population` dictionary will not have an entry for that node, and trying to access it will raise a `KeyError`. To avoid this, we can use the `get()` method of the dictionary, which allows us to specify a default value if the key is not found. For example: |
| 253 | + |
| 254 | +```{python} |
| 255 | +#| echo: true |
| 256 | +#| message: true |
| 257 | + |
| 258 | +# Adding this edge creates the node "Germany", which has no population attribute |
| 259 | +G.add_edge("France", "Germany", distance=900) |
| 260 | +# Get the population attribute ("Germany" will be missing from this dictionary) |
| 261 | +population = nx.get_node_attributes(G, 'population') |
| 262 | +# Node sizes proportional to population, using 0 as the default if not found |
| 263 | +node_sizes = [population.get(node, 0) / 1_000_000 for node in G.nodes()] # scale down for visualization |
| 264 | + |
| 265 | +pos = nx.spring_layout(G) |
| 266 | + |
| 267 | +nx.draw( |
| 268 | + G, |
| 269 | + pos=pos, |
| 270 | + with_labels=True, |
| 271 | + node_color='lightblue', |
| 272 | + edge_color='gray', |
| 273 | + node_size=node_sizes, # size of the nodes (vertices) proportional to population |
| 274 | + font_size=12, |
| 275 | + ) |
| 276 | +plt.show() |
| 277 | +``` |
| 278 | +::: |
| 279 | + |
| 280 | +**Exercise:** Add a new attribute to the nodes, called "visited", which is a boolean that indicates whether you have visited that country or not. Then, visualize the graph by coloring the nodes differently based on whether you have visited them or not: use blue for visited countries and red for unvisited countries. |
| 281 | + |
| 282 | +::: {.callout-tip collapse="true"} |
| 283 | +## Solution to the Exercise |
| 284 | + |
| 285 | +```{python} |
| 286 | +#| echo: true |
| 287 | +#| message: true |
| 288 | +# Add the "visited" attribute to the nodes |
| 289 | +G.nodes["Spain"]["visited"] = True |
| 290 | +G.nodes["Portugal"]["visited"] = True |
| 291 | +G.nodes["France"]["visited"] = True |
| 292 | +G.nodes["Italy"]["visited"] = True |
| 293 | +G.nodes["USA"]["visited"] = False |
| 294 | +G.nodes["Canada"]["visited"] = True |
| 295 | + |
| 296 | +# Get the "visited" attribute for each node |
| 297 | +visited = nx.get_node_attributes(G, 'visited') |
| 298 | +# Define node colors; nodes without a "visited" value (such as "Germany") count as unvisited |
| 299 | +node_colors = ['blue' if visited.get(node, False) else 'red' for node in G.nodes()] |
| 300 | + |
| 301 | +pos = nx.spring_layout(G) |
| 302 | + |
| 303 | +# Draw the graph with node colors based on the "visited" attribute |
| 304 | +nx.draw( |
| 305 | + G, |
| 306 | + pos=pos, |
| 307 | + with_labels=True, |
| 308 | + node_color=node_colors, # color of the nodes based on "visited" attribute |
| 309 | + edge_color='gray', |
| 310 | + node_size=2000, |
| 311 | + font_size=12, |
| 312 | + ) |
| 313 | +plt.show() |
| 314 | +``` |
| 315 | +::: |
| 316 | + |
| 317 | +### Visualizing Edge Attributes |
| 318 | + |
| 319 | +We can also visualize edge attributes by showing them as labels on the edges. For example, we can show the distance attribute on the edges: |
| 320 | + |
| 321 | +```{python} |
| 322 | +#| echo: true |
| 323 | +#| message: true |
| 324 | +# Get the distance attribute for each edge |
| 325 | +distance = nx.get_edge_attributes(G, 'distance') |
| 326 | +# Draw the graph |
| 327 | +pos = nx.spring_layout(G) |
| 328 | +nx.draw( |
| 329 | + G, |
| 330 | + pos=pos, |
| 331 | + with_labels=True, |
| 332 | + node_color='lightblue', |
| 333 | + edge_color='gray', |
| 334 | + node_size=2000, |
| 335 | + font_size=12, |
| 336 | + ) |
| 337 | +# Draw edge labels for the distance attribute |
| 338 | +nx.draw_networkx_edge_labels(G, pos, edge_labels=distance) |
| 339 | +plt.show() |
| 340 | +``` |
| 341 | + |
| 342 | +## Creating a Graph from an Edge List |
| 343 | + |
| 344 | +In practice, we often have data in the form of an edge list, which is a list of pairs of nodes that are connected by edges. We can create a graph directly from an edge list using the `nx.from_edgelist()` function. For example: |
| 345 | + |
| 346 | +```{python} |
| 347 | +#| echo: true |
| 348 | +#| message: true |
| 349 | +# Define our edge list (actors that have worked together in movies) |
| 350 | +edge_list = [ |
| 351 | + ("Antonio Banderas", "Brad Pitt"), # Interview with the Vampire (1994) |
| 352 | + ("Antonio Banderas", "Javier Bardem"), # Automata (2014) |
| 353 | + ("Antonio Banderas", "Penelope Cruz"), # Dolor y Gloria (2019) |
| 354 | + ("Antonio Banderas", "Tom Holland"), # Uncharted (2022) |
| 355 | + ("Brad Pitt", "Javier Bardem"), # F1 (2025) |
| 356 | + ("Javier Bardem", "Timothée Chalamet"), # Dune (2021) |
| 357 | + ("Timothée Chalamet", "Zendaya"), # Dune (2021) |
| 358 | + ("Tom Holland", "Zendaya"), # Spider-Man: No Way Home (2021) |
| 359 | +] |
| 360 | + |
| 361 | +# Create a graph from the edge list |
| 362 | +G_actors = nx.from_edgelist(edge_list) |
| 363 | +# Draw the graph |
| 364 | +pos = nx.shell_layout(G_actors) # use shell layout for visualization |
| 365 | +nx.draw( |
| 366 | + G_actors, |
| 367 | + pos=pos, |
| 368 | + with_labels=True, |
| 369 | + node_color='lightgreen', |
| 370 | + edge_color='gray', |
| 371 | + node_size=2000, |
| 372 | + font_size=12 |
| 373 | + ) |
| 374 | +plt.show() |
| 375 | +``` |
| 376 | + |
| 377 | +**Exercise:** In the code above, I included the movies in the comments next to the edges. Can you create a graph where the edges are labeled with the movie titles? |