You write your pipeline against a single API, then decide how it runs. Point it at one engine and it runs there — or hand Wayang's cost-based optimizer the choice and let it pick the best platform for each step across your laptop, Apache Spark, Apache Flink, or a database, even splitting a single job across several. Either way, when your data outgrows one machine you don't rewrite anything — you just make another engine available.
35
+
You write your pipeline against a single API, then decide how it runs. Point it at one engine and it runs there. Or hand Wayang's cost-based optimizer the choice and let it pick the best platform for each step across your laptop, Apache Spark, Apache Flink, or a database, even splitting a single job across several. Either way, when your data outgrows one machine you don't rewrite anything; you just make another engine available.
36
36
37
37
<p align="center">
38
38
<img src="guides/img/wayang-architecture.svg" alt="A single pipeline, written once, feeds the Wayang optimizer, which routes each step to the best available engine: Local, Spark, Flink, Postgres, and others." width="720" />
@@ -52,9 +52,9 @@ You write your pipeline against a single API, then decide how it runs. Point it
52
52
53
53
## How it works
54
54
55
-
Most data processing systems are designed around a single execution engine. That keeps things simple, but your pipeline ends up tied to that engine's API — so combining engines, or moving to another, typically means rewriting.
55
+
Most data processing systems are designed around a single execution engine. That keeps things simple, but your pipeline ends up tied to that engine's API. So combining engines, or moving to another, typically means rewriting code and gluing systems together, which is costly and time-consuming.
56
56
57
-
Wayang sits one level up. You write a pipeline against Wayang's API and register the engines you *have* — then it's your call. Want control? Register one engine and it runs there. Want it handled? Register several and let the cost-based optimizer pick the best one for each step, even splitting a single job across engines.
57
+
Wayang sits one level up. You write a pipeline against Wayang's API and register the engines you *have*. Then it's your call. Want control? Register one engine and it runs there. Want it handled? Register several and let the cost-based optimizer pick the best one for each step, even splitting a single job across engines.
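To make the "one engine versus several" choice concrete, here is a minimal, self-contained sketch. It does not use Wayang's API: the class `RegistrationSketch`, the method `place`, the step names, and all cost numbers are hypothetical and exist only for illustration. It also ignores the cost of moving data between engines, which a real optimizer would have to weigh.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy illustration only; none of these names are part of Wayang's API.
final class RegistrationSketch {

    // For each step, pick the cheapest engine among those that are registered.
    // costs: step name -> (engine name -> made-up cost estimate).
    static Map<String, String> place(Map<String, Map<String, Double>> costs, Set<String> registered) {
        Map<String, String> placement = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Double>> step : costs.entrySet()) {
            String best = null;
            double bestCost = Double.POSITIVE_INFINITY;
            for (Map.Entry<String, Double> option : step.getValue().entrySet()) {
                if (registered.contains(option.getKey()) && option.getValue() < bestCost) {
                    best = option.getKey();
                    bestCost = option.getValue();
                }
            }
            if (best == null) {
                throw new IllegalStateException("no registered engine can run step " + step.getKey());
            }
            placement.put(step.getKey(), best);
        }
        return placement;
    }

    public static void main(String[] args) {
        // Made-up estimates for a three-step pipeline.
        Map<String, Map<String, Double>> costs = new LinkedHashMap<>();
        costs.put("filter", Map.of("java", 8.0, "spark", 5.0, "postgres", 1.0));
        costs.put("join", Map.of("java", 90.0, "spark", 12.0));
        costs.put("collect", Map.of("java", 1.0, "spark", 4.0));

        // One engine registered: every step runs there.
        System.out.println(place(costs, Set.of("java")));
        // Several engines registered: each step goes to its cheapest option.
        System.out.println(place(costs, Set.of("java", "spark", "postgres")));
    }
}
```

With only `java` registered, every step lands on `java`. With all three registered, the same pipeline is split: `filter` on `postgres`, `join` on `spark`, `collect` on `java`.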
58
58
59
59
**Supported platforms today**
60
60
@@ -117,17 +117,17 @@ It executes locally. Good for development, tests, and small data.
117
117
Now run the *exact same pipeline* on Spark instead of locally. You don't touch the pipeline — you change which platform you register: comment out Java and register Spark.
118
118
119
119
```java
120
-
import org.apache.wayang.spark.Spark; // ← swap the import
120
+
import org.apache.wayang.spark.Spark; // swap the import
121
121
122
122
// Same pipeline as before — only the registered platform changed.
123
123
WayangContext wayang = new WayangContext(new Configuration())
124
-
// .withPlugin(Java.basicPlugin()) // ← comment out the local engine
Run it again. The same pipeline now executes on Spark — you changed *where* it runs without changing *what* it does. Switch to Flink or any other supported platform the same way: swap the import and the registered plugin.
128
+
Run it again. The same pipeline now executes on Spark. You changed *where* it runs without changing *what* it does. Switch to Flink or any other supported platform the same way: swap the import and the registered plugin.
129
129
130
-
> **Why register only Spark here?** Wayang's real power is registering several platforms and letting the optimizer pick. But on small test data the optimizer will almost always pick the local engine — Spark's startup overhead isn't worth it for a tiny file — so you'd never actually see Spark run. Registering Spark alone forces the issue so you can confirm it works. Step 3 shows the production pattern.
130
+
> **Why register only Spark here?** Wayang's real power is registering several platforms and letting the optimizer pick. But on small test data the optimizer will almost always pick the local engine (Spark's startup overhead isn't worth it for a tiny file) so you'd never actually see Spark run. Registering Spark alone forces the issue so you can confirm it works. Step 3 shows the production pattern.
131
131
132
132
### 3. Register both and let the optimizer choose
133
133
@@ -140,7 +140,7 @@ WayangContext wayang = new WayangContext(new Configuration())
140
140
.withPlugin(Spark.basicPlugin());
141
141
```
142
142
143
-
Now Wayang owns the placement decision. For each operator it estimates the cost on every registered platform and picks the cheapest — keeping a small job entirely local, pushing a large one onto Spark, or mixing both within the same job as the data demands. On a tiny input you'll see it keep everything local (that's the optimizer working correctly, not ignoring Spark); cross-platform splits show up once the data is big enough to justify them.
143
+
Now Wayang owns the placement decision. For each operator it estimates the cost on every registered platform and picks the cheapest, keeping a small job entirely local, pushing a large one onto Spark, or mixing both within the same job as the data and query demand. On a tiny input you'll see it keep everything local (that's the optimizer working correctly, not ignoring Spark); cross-platform splits show up once the data is big enough to justify them.
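The reason a tiny input stays local can be sketched with a toy cost model. This is not Wayang's actual cost model: the class `PlacementSketch`, the `Platform` type, and the startup and per-record numbers are assumptions chosen only to show how a fixed startup overhead can dominate on small data and become negligible on large data.

```java
import java.util.Comparator;
import java.util.List;

// Toy illustration only: Wayang's real optimizer uses its own cost model.
final class PlacementSketch {

    // Hypothetical platform description: a fixed startup cost plus a cost per record.
    static final class Platform {
        final String name;
        final double startupCost;
        final double costPerRecord;

        Platform(String name, double startupCost, double costPerRecord) {
            this.name = name;
            this.startupCost = startupCost;
            this.costPerRecord = costPerRecord;
        }

        double estimate(long records) {
            return startupCost + costPerRecord * records;
        }
    }

    // Return the registered platform with the lowest estimated cost for this input size.
    static Platform cheapest(List<Platform> registered, long records) {
        return registered.stream()
                .min(Comparator.comparingDouble((Platform p) -> p.estimate(records)))
                .orElseThrow(() -> new IllegalStateException("no platform registered"));
    }

    public static void main(String[] args) {
        // Made-up numbers: the local engine starts instantly but is slower per record.
        List<Platform> registered = List.of(
                new Platform("local", 0.0, 1.0),
                new Platform("spark", 5_000.0, 0.1));

        for (long records : new long[] {100, 1_000_000}) {
            System.out.println(records + " records -> " + cheapest(registered, records).name);
        }
    }
}
```

With these made-up numbers, 100 records stay on `local` (estimated cost 100 versus about 5,010 on `spark`), while 1,000,000 records move to `spark` (about 105,000 versus 1,000,000).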
144
144
145
145
## Install
146
146
@@ -280,8 +280,6 @@ If you're looking for somewhere to begin, doc improvements, new operators, and a
280
280
281
281
-**Mailing lists** — [https://wayang.apache.org/docs/community/mailinglist](https://wayang.apache.org/docs/community/mailinglist) (user and dev)