|
| 1 | +# Low-level memory model |
| 2 | + |
| 3 | +SPy aims to be a "two-level" language, with a low-level, possibly unsafe core, upon |
| 4 | +which to build higher-level abstractions which we expect end users to use. |
| 5 | + |
| 6 | +It might help to draw a parallel with CPython: the core of the interpreter and of many |
| 7 | +libraries is written in C, which is low level and inherently unsafe. The result is a |
| 8 | +safe high-level language which is what most people see and use. |
| 9 | + |
| 10 | +This document describes the low-level memory model of SPy. |
| 11 | + |
| 12 | +While reading this document, it is worth remembering that SPy has two main modes of |
| 13 | +execution: interpreted and compiled. |
| 14 | + |
| 15 | +The SPy compiler works by translating `*.spy` code into `*.c` code (after |
| 16 | +[redshifting](https://antocuni.eu/2025/10/29/inside-spy-part-1-motivations-and-goals/#redshifting)), |
| 17 | +which is then compiled by e.g. `gcc` or `clang`. In the next sections, we will also |
| 18 | +explain how SPy types are translated into C. |
| 19 | + |
| 20 | +## Safe vs unsafe code |
| 21 | + |
| 22 | +By default, SPy code is **safe**, meaning that: |
| 23 | + |
| 24 | + 1. memory and lifetimes are managed automatically |
| 25 | + |
| 26 | + 2. you cannot corrupt memory or access memory which was already freed |
| 27 | + |
| 28 | +However, SPy also offers the `unsafe` module, for writing low-level code and for |
| 29 | +specialized cases. At the time of writing, the `unsafe` module can be imported |
| 30 | +freely, but the plan is to allow unsafe code only in a few specific and clearly labeled |
| 31 | +sections of the program. |
| 32 | + |
| 33 | + |
| 34 | +## Primitive types |
| 35 | + |
| 36 | +At the core, we have primitive numeric types such as `i32`, `f32`, `f64`, etc. These are |
| 37 | +translated into their C equivalent `int32_t`, `float`, `double`, etc. |
| 38 | + |
| 39 | +Moreover, SPy defines the `int` and `float` aliases, which map to `i32` and `f64` |
| 40 | +respectively. At the moment this is hardcoded, but eventually the precise mapping will |
| 41 | +depend on the target platform: |
| 42 | + |
| 43 | + |
| 44 | +<div align="right"><sub><a href="https://github.com/spylang/spy/blob/37ee3e29a7707618adf107ca7d8d19de2942ab55/spy/vm/modules/builtins.py#L222-L230">See on GitHub</a></sub></div> |
| 45 | + |
| 46 | +```python title="spy/vm/modules/builtins.py @ 37ee3e29" linenums="222" |
| 47 | +# add aliases for common types. For now we map: |
| 48 | +# int -> i32 |
| 49 | +# float -> f64 |
| 50 | +# |
| 51 | +# We might want to map int to different concrete types, depending on the |
| 52 | +# platform? Or maybe have some kind of "configure step"? |
| 53 | +BUILTINS.add("int", BUILTINS.w_i32) |
| 54 | +BUILTINS.add("float", BUILTINS.w_f64) |
| 55 | +``` |
| 56 | + |
| 57 | +## Stack-allocated structs |
| 58 | + |
| 59 | +We can define C-like structs: |
| 60 | + |
| 61 | +```python |
| 62 | +@struct |
| 63 | +class Point: |
| 64 | + x: int |
| 65 | + y: int |
| 66 | +``` |
| 67 | + |
| 68 | +Structs can be instantiated directly, and are **immutable**: |
| 69 | + |
| 70 | +```python |
| 71 | +p = Point(1, 2) |
| 72 | +print(p.x, p.y) |
| 73 | + |
| 74 | +p.x = 3 # TypeError |
| 75 | +``` |
| 76 | + |
| 77 | +Structs have **inline storage**: |
| 78 | + |
| 79 | + 1. if used as local variables, they are allocated "on the stack"; |
| 80 | + |
| 81 | + 2. if used as fields of a bigger struct, they are allocated "inline" in the bigger |
| 82 | + struct; |
| 83 | + |
| 84 | + 3. they are passed by value, which means that passing around big structs can be |
| 85 | + costly. |
| 86 | + |
| 87 | + |
| 88 | +For example: |
| 89 | + |
| 90 | +```python |
| 91 | +@struct |
| 92 | +class Rect: |
| 93 | + a: Point |
| 94 | + b: Point |
| 95 | + |
| 96 | +assert sizeof(Point) == sizeof(int) * 2 |
| 97 | +assert sizeof(Rect) == sizeof(int) * 4 |
| 98 | + |
| 99 | +r = Rect(Point(1, 2), Point(3, 4)) |
| 100 | +``` |
| 101 | + |
| 102 | +The compiler translates them into plain C structs, something along these lines: |
| 103 | + |
| 104 | +```c |
| 105 | +typedef struct { |
| 106 | + int32_t x; |
| 107 | + int32_t y; |
| 108 | +} Point; |
| 109 | + |
| 110 | +typedef struct { |
| 111 | + Point a; |
| 112 | + Point b; |
| 113 | +} Rect; |
| 114 | + |
| 115 | +Point p = {1, 2}; |
| 116 | +Rect r = {(Point){1, 2}, (Point){3, 4}}; |
| 117 | +``` |
| 118 | +
|
| 119 | +Stack-allocated structs are always safe to use. |
| 120 | +
|
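The inline-storage claim can be checked outside SPy as well. The following is a minimal sketch in plain CPython using `ctypes` (it is not SPy code): it declares the same layout as the C structs above and inspects the resulting sizes and field offsets. The variable names (`INT_SIZE`, `point_size`, `rect_size`, `b_offset`) are only illustrative.

```python
import ctypes


class Point(ctypes.Structure):
    # Two 32-bit signed integers, mirroring the `int32_t x, y` fields above.
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]


class Rect(ctypes.Structure):
    # The two Points are stored inline, not behind pointers.
    _fields_ = [("a", Point), ("b", Point)]


INT_SIZE = ctypes.sizeof(ctypes.c_int32)
point_size = ctypes.sizeof(Point)
rect_size = ctypes.sizeof(Rect)

# Offset of field `b` inside Rect: with inline storage it starts right after `a`.
b_offset = Rect.b.offset
```

With this layout, `Rect` is exactly as large as four `int32_t` values, and `b` sits at the offset where `a` ends, which is what "inline" means in practice.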
| 121 | +## Raw and GC memory |
| 122 | +
|
| 123 | +The heap is conceptually divided into two main regions: **raw memory** and **GC memory**. |
| 124 | +The low-level manipulation of both areas of memory is **unsafe**. |
| 125 | +
|
| 126 | +Raw memory is "C style": |
| 127 | +
|
| 128 | + - memory is allocated with `raw_alloc[T]`; pointers are of type `raw_ptr[T]`; |
| 129 | +
|
| 130 | + - the memory must be explicitly released by calling `raw_free[T]` (NOT IMPLEMENTED |
| 131 | + YET!) |
| 132 | +
|
| 133 | + - it is the responsibility of the programmer to avoid use-after-free and out-of-bounds |
| 134 | + access; |
| 135 | +
|
| 136 | + - once allocated, the address of the memory is non-movable and can be safely passed to |
| 137 | + 3rd party libraries |
| 138 | +
|
| 139 | +GC memory: |
| 140 | +
|
| 141 | + - memory is allocated with `gc_alloc[T]`; pointers are of type `gc_ptr[T]`; |
| 142 | +
|
| 143 | + - the memory is automatically released by the GC when it's no longer needed; |
| 144 | +
|
| 145 | + - it is *still* the responsibility of the programmer to avoid out-of-bounds access to |
| 146 | + arrays; |
| 147 | +
|
| 148 | + - objects are potentially **movable** (depending on the GC strategy), so their address |
| 149 | + might change; |
| 150 | +
|
| 151 | + - it is possible to get a temporary non-movable `raw_ptr` by "pinning" a `gc_ptr` (NOT |
| 152 | + IMPLEMENTED YET!). |
| 153 | +
|
| 154 | +/// warning |
| 155 | +GC is not implemented yet; currently `gc_alloc` is just an alias for `raw_alloc`, |
| 156 | +meaning that it leaks memory. |
| 157 | +/// |
| 158 | +
|
| 159 | +
|
| 160 | +## Heap-allocated structs |
| 161 | +
|
| 162 | +We can allocate structs "on the heap". This is a lower-level functionality which |
| 163 | +requires the use of `unsafe` functions; they can be allocated both in raw and GC memory: |
| 164 | +
|
| 165 | +```python |
| 166 | +from unsafe import raw_ptr, raw_alloc |
| 167 | +
|
| 168 | +p1: raw_ptr[Point] = raw_alloc[Point](1) |
| 169 | +p1.x = 1 |
| 170 | +p1.y = 2 |
| 171 | +``` |
| 172 | + |
| 173 | +Unlike their stack-allocated counterparts, heap-allocated structs are mutable. |
| 174 | +You should think of heap-allocated structs as the basic building block for all |
| 175 | +higher-level types. |
| 176 | + |
| 177 | +It might be helpful to draw a parallel with CPython again: in CPython, objects of type |
| 178 | +`tuple` and `str` are immutable, but under the hood they are implemented by mutable |
| 179 | +heap-allocated structs written in C. |
| 180 | + |
| 181 | + |
| 182 | + |
| 183 | +## Raw allocation |
| 184 | + |
| 185 | +`raw_alloc[T](n)` allocates an **array** of `T` on the heap. To allocate a single |
| 186 | +element, you just pass `n = 1`. For convenience, if `T` is a struct you can access its |
| 187 | +fields without having to use `[0]`, exactly as in C: |
| 188 | + |
| 189 | +```python |
| 190 | +def test(p: raw_ptr[Point]) -> None: |
| 191 | + assert p.x == p[0].x |
| 192 | + assert p.y == p[0].y |
| 193 | +``` |
| 194 | + |
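The equivalence between accessing the pointed-to struct directly and indexing element `0` is the same one C provides (`p->x` versus `p[0].x`). As a rough illustration in plain CPython with `ctypes` (not SPy, and not how SPy implements `raw_alloc`), a pointer to the first element of an array behaves the same way; `arr` and `p` are illustrative names:

```python
import ctypes


class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]


# An array of 3 Points, zero-initialised; stands in for a 3-element allocation.
arr = (Point * 3)()

# A pointer to the first element, similar in spirit to a raw_ptr[Point].
p = ctypes.cast(arr, ctypes.POINTER(Point))

p[0].x = 1        # write through index 0
p.contents.y = 2  # write through the dereferenced pointer (like `p->y` in C)
p[1].x = 7        # the second element is reachable only by indexing
```

Both spellings reach the same memory: after these writes, `p.contents.x` and `p[0].x` are the same value, and the write to `p[1]` is visible through `arr[1]`.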
| 195 | +The low-level representation of pointers depends on the execution mode. |
| 196 | + |
| 197 | +The interpreter keeps track of the address **and the length** of the allocated region, |
| 198 | +and checks for out-of-bounds access: |
| 199 | + |
| 200 | +<div align="right"><sub><a href="https://github.com/spylang/spy/blob/8a360bc11d95db09fee34964ce3cab6639c06f1f/spy/vm/modules/unsafe/ptr.py#L128-L150">See on GitHub</a></sub></div> |
| 201 | +```python title="spy/vm/modules/unsafe/ptr.py @ 8a360bc1" linenums="128" |
| 202 | +@UNSAFE.builtin_type("__base_ptr") |
| 203 | +class W_BasePtr(W_Object): |
| 204 | + [...] |
| 205 | + w_ptrtype: W_BasePtrType |
| 206 | + addr: fixedint.Int32 |
| 207 | + length: fixedint.Int32 # how many items in the array |
| 208 | +``` |
| 209 | + |
| 210 | +The same happens in **debug compiled mode**, where `raw_ptr[T]` is translated to a fat |
| 211 | +pointer. Finally, in **release compiled mode**, `raw_ptr[T]` is translated as a plain C |
| 212 | +pointer, and there is no out-of-bounds check: |
| 213 | + |
| 214 | + |
| 215 | +<div align="right"><sub><a href="https://github.com/spylang/spy/blob/8a360bc11d95db09fee34964ce3cab6639c06f1f/spy/libspy/include/spy/unsafe.h#L12-L18">See on GitHub</a></sub></div> |
| 216 | +```c title="spy/libspy/include/spy/unsafe.h @ 8a360bc1" linenums="12" |
| 217 | + typedef struct Ptr_T { |
| 218 | + T *p; |
| 219 | + #ifdef SPY_PTR_CHECKED |
| 220 | + size_t length; |
| 221 | + #endif |
| 222 | + } Ptr_T; |
| 223 | + |
| 224 | +``` |
| 225 | +
|
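To make the difference between the checked and unchecked representations concrete, here is a small model in plain Python. It is only a sketch: `CheckedPtr`, `UncheckedPtr`, `load` and the list-based `heap` are hypothetical names invented for this illustration and do not correspond to SPy's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class CheckedPtr:
    """Hypothetical fat pointer: base address plus number of items."""
    memory: list   # stands in for the heap
    addr: int      # index of the first item of the allocation
    length: int    # how many items the allocation holds

    def load(self, i: int):
        # The extra `length` field is what makes this check possible.
        if not 0 <= i < self.length:
            raise IndexError(f"index {i} out of bounds for length {self.length}")
        return self.memory[self.addr + i]


@dataclass
class UncheckedPtr:
    """Hypothetical thin pointer: address only, like a plain C pointer."""
    memory: list
    addr: int

    def load(self, i: int):
        # No length is stored, so no bounds check can be performed.
        return self.memory[self.addr + i]


heap = [10, 20, 30, 99]  # the last item belongs to a different allocation
checked = CheckedPtr(heap, addr=0, length=3)
unchecked = UncheckedPtr(heap, addr=0)
```

In this model `checked.load(3)` raises an `IndexError`, while `unchecked.load(3)` silently reads the neighbouring value. That is the trade-off between the checked and unchecked pointer representations: the check costs one extra word per pointer and one comparison per access.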
| 226 | +## GC allocation |
| 227 | +
|
| 228 | +**Not implemented yet** |
| 229 | +
|
| 230 | +See the **very rough** [plan](https://github.com/antocuni/spy-memory-model) |
| 231 | +
|
| 232 | +
|
| 233 | +## Raw references: `raw_ref[T]` |
| 234 | +
|
| 235 | +Structs and pointers are loosely modeled after C, but there is a big semantic |
| 236 | +difference between Python and C that we need to take into account in order to provide an |
| 237 | +intuitive way to deal with structs. |
| 238 | +
|
| 239 | +Consider the following example, using the `Rect` and `Point` structs defined above. It |
| 240 | +modifies a **nested** struct: |
| 241 | +
|
| 242 | +```python |
| 243 | +def test(r: raw_ptr[Rect]) -> None: |
| 244 | + r.a.x = 0 |
| 245 | +``` |
| 246 | + |
| 247 | +In Python (and thus SPy) the above expression decomposes to: |
| 248 | + |
| 249 | +```python |
| 250 | +tmp = r.a |
| 251 | +tmp.x = 0 |
| 252 | +``` |
| 253 | + |
| 254 | +or, more explicitly: |
| 255 | + |
| 256 | +```python |
| 257 | +tmp = getattr(r, "a") |
| 258 | +setattr(tmp, "x", 0) |
| 259 | +``` |
| 260 | + |
| 261 | +The naive implementation of `r.a` would be to return a *copy* of the `Point`, but this |
| 262 | +means that `tmp.x = 0` would modify the copy, not the original. |
| 263 | + |
| 264 | +To solve the problem, we return a **reference** instead: |
| 265 | + |
| 266 | +```python |
| 267 | +tmp: raw_ref[Point] = r.a |
| 268 | +tmp.x = 0 |
| 269 | +``` |
| 270 | + |
| 271 | +Unlike pointers, references cannot be indexed and cannot be `NULL`. Moreover, a |
| 272 | +`raw_ref[T]` can be automatically converted into a `T`. E.g. consider the following: |
| 273 | + |
| 274 | +```python |
| 275 | +def foo(r: raw_ptr[Rect]) -> None: |
| 276 | + r2: Rect = r # ERROR: cannot convert raw_ptr[Rect] to Rect |
| 277 | + p: Point = r.a # works: r.a is raw_ref[Point], and it's converted to a Point |
| 278 | +``` |
| 279 | + |
| 280 | +In the C backend, `raw_ref[T]` is implemented in the exact same way as `raw_ptr[T]`. |
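The copy-versus-reference problem is not specific to SPy. CPython's `ctypes` faces the same question for nested structures and, as far as this sketch shows, resolves it in a similar spirit: reading a nested struct field yields an object that shares memory with its parent. The example below is plain CPython, not SPy, and the variable names are illustrative.

```python
import ctypes


class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]


class Rect(ctypes.Structure):
    _fields_ = [("a", Point), ("b", Point)]


r = Rect(Point(1, 2), Point(3, 4))

# Copy semantics: build an independent Point from the current field values.
copy = Point(r.a.x, r.a.y)
copy.x = 100
x_after_copy_write = r.a.x   # the original is untouched

# Reference semantics: `r.a` is a view onto the memory owned by `r`.
ref = r.a
ref.x = 0
x_after_ref_write = r.a.x    # the original has changed
```

Writing through the copy leaves `r.a.x` at its original value, while writing through the view updates `r` itself, which is the behaviour `raw_ref[T]` is meant to provide for `r.a.x = 0`.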