Commit f2e4647

committed
minor
1 parent 0edac89 commit f2e4647

1 file changed

Lines changed: 9 additions & 11 deletions

README.md

@@ -32,7 +32,7 @@
3232

3333
[![Tweet](https://img.shields.io/twitter/url/http/shields.io.svg?style=social)](https://twitter.com/intent/tweet?text=Apache%20Wayang%20enables%20cross%20platform%20data%20processing,%20star%20it%20via:%20&url=https://github.com/apache/wayang&via=apachewayang&hashtags=dataprocessing,bigdata,analytics,hybridcloud,developers) [![LinkedIn](https://img.shields.io/badge/LinkedIn-Follow-0A66C2?style=social&logo=linkedin)](https://www.linkedin.com/company/apachewayang)
3434

35-
You write your pipeline against a single API, then decide how it runs. Point it at one engine and it runs there — or hand Wayang's cost-based optimizer the choice and let it pick the best platform for each step across your laptop, Apache Spark, Apache Flink, or a database, even splitting a single job across several. Either way, when your data outgrows one machine you don't rewrite anything you just make another engine available.
35+
You write your pipeline against a single API, then decide how it runs. Point it at one engine and it runs there. Or hand Wayang's cost-based optimizer the choice and let it pick the best platform for each step across your laptop, Apache Spark, Apache Flink, or a database, even splitting a single job across several. Either way, when your data outgrows one machine you don't rewrite anything; you just make another engine available.
3636

3737
<p align="center">
3838
<img src="guides/img/wayang-architecture.svg" alt="A single pipeline, written once, feeds the Wayang optimizer, which routes each step to the best available engine — Local, Spark, Flink, Postgres, and others." width="720" />
@@ -52,9 +52,9 @@ You write your pipeline against a single API, then decide how it runs. Point it
5252

5353
## How it works
5454

55-
Most data processing systems are designed around a single execution engine. That keeps things simple, but your pipeline ends up tied to that engine's API — so combining engines, or moving to another, typically means rewriting.
55+
Most data processing systems are designed around a single execution engine. That keeps things simple, but your pipeline ends up tied to that engine's API. So combining engines, or moving to another, typically means rewriting pipelines and gluing systems together, which is costly and time-consuming.
5656

57-
Wayang sits one level up. You write a pipeline against Wayang's API and register the engines you *have* — then it's your call. Want control? Register one engine and it runs there. Want it handled? Register several and let the cost-based optimizer pick the best one for each step, even splitting a single job across engines.
57+
Wayang sits one level up. You write a pipeline against Wayang's API and register the engines you *have*. Then it's your call. Want control? Register one engine and it runs there. Want it handled? Register several and let the cost-based optimizer pick the best one for each step, even splitting a single job across engines.
5858

5959
**Supported platforms today**
6060

@@ -117,17 +117,17 @@ It executes locally. Good for development, tests, and small data.
117117
Now run the *exact same pipeline* on Spark instead of locally. You don't touch the pipeline — you change which platform you register: comment out Java and register Spark.
118118

119119
```java
120-
import org.apache.wayang.spark.Spark; // swap the import
120+
import org.apache.wayang.spark.Spark; // swap the import
121121

122122
// Same pipeline as before — only the registered platform changed.
123123
WayangContext wayang = new WayangContext(new Configuration())
124-
// .withPlugin(Java.basicPlugin()) // comment out the local engine
125-
.withPlugin(Spark.basicPlugin()); // register Spark instead
124+
// .withPlugin(Java.basicPlugin()) // comment out the local engine
125+
.withPlugin(Spark.basicPlugin()); // register Spark instead
126126
```
127127

128-
Run it again. The same pipeline now executes on Spark — you changed *where* it runs without changing *what* it does. Switch to Flink or any other supported platform the same way: swap the import and the registered plugin.
128+
Run it again. The same pipeline now executes on Spark. You changed *where* it runs without changing *what* it does. Switch to Flink or any other supported platform the same way: swap the import and the registered plugin.
129129

130-
> **Why register only Spark here?** Wayang's real power is registering several platforms and letting the optimizer pick. But on small test data the optimizer will almost always pick the local engine Spark's startup overhead isn't worth it for a tiny file so you'd never actually see Spark run. Registering Spark alone forces the issue so you can confirm it works. Step 3 shows the production pattern.
130+
> **Why register only Spark here?** Wayang's real power is registering several platforms and letting the optimizer pick. But on small test data the optimizer will almost always pick the local engine (Spark's startup overhead isn't worth it for a tiny file) so you'd never actually see Spark run. Registering Spark alone forces the issue so you can confirm it works. Step 3 shows the production pattern.
131131
132132
### 3. Register both and let the optimizer choose
133133

@@ -140,7 +140,7 @@ WayangContext wayang = new WayangContext(new Configuration())
140140
.withPlugin(Spark.basicPlugin());
141141
```
142142

143-
Now Wayang owns the placement decision. For each operator it estimates the cost on every registered platform and picks the cheapestkeeping a small job entirely local, pushing a large one onto Spark, or mixing both within the same job as the data demands. On a tiny input you'll see it keep everything local (that's the optimizer working correctly, not ignoring Spark); cross-platform splits show up once the data is big enough to justify them.
143+
Now Wayang owns the placement decision. For each operator it estimates the cost on every registered platform and picks the cheapest, keeping a small job entirely local, pushing a large one onto Spark, or mixing both within the same job as the data and query demand. On a tiny input you'll see it keep everything local (that's the optimizer working correctly, not ignoring Spark); cross-platform splits show up once the data is big enough to justify them.
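
To make the "estimate every platform, pick the cheapest" idea concrete, here is a toy sketch in plain Java. It does not use Wayang's API and is not Wayang's actual cost model: the `Platform` class, the linear formula (a fixed startup cost plus a per-record cost), and every number in it are assumptions made up for illustration.

```java
import java.util.List;

public class PlacementSketch {

    /** Hypothetical platform description; the cost figures are illustrative only. */
    static final class Platform {
        final String name;
        final double startupMs;   // assumed fixed cost paid once per job
        final double perRecordMs; // assumed cost per processed record

        Platform(String name, double startupMs, double perRecordMs) {
            this.name = name;
            this.startupMs = startupMs;
            this.perRecordMs = perRecordMs;
        }

        /** Toy linear estimate: startup + perRecord * records. */
        double estimate(long records) {
            return startupMs + perRecordMs * records;
        }
    }

    /** Returns the name of the registered platform with the lowest estimate. */
    static String cheapest(List<Platform> registered, long records) {
        if (registered.isEmpty()) {
            throw new IllegalArgumentException("no platforms registered");
        }
        Platform best = registered.get(0);
        for (Platform candidate : registered) {
            if (candidate.estimate(records) < best.estimate(records)) {
                best = candidate;
            }
        }
        return best.name;
    }

    public static void main(String[] args) {
        // Made-up numbers: local starts instantly but is slower per record,
        // the cluster engine pays a large startup cost but scales better.
        List<Platform> registered = List.of(
                new Platform("local", 5.0, 0.01),
                new Platform("spark", 3000.0, 0.001));

        for (long records : new long[] {1_000L, 10_000_000L}) {
            System.out.println(records + " records -> " + cheapest(registered, records));
        }
    }
}
```

With these assumed numbers the small input stays on `local` and the large one moves to `spark`. The crossover sits at roughly 330,000 records, where the startup cost is finally amortized. This mirrors why a tiny test file never leaves the local engine.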
144144

145145
## Install
146146

@@ -280,8 +280,6 @@ If you're looking for somewhere to begin, doc improvements, new operators, and a
280280

281281
- **Mailing lists** [https://wayang.apache.org/docs/community/mailinglist](https://wayang.apache.org/docs/community/mailinglist) (user and dev)
282282
- **LinkedIn** [Apache Wayang](https://www.linkedin.com/company/apachewayang)
283-
- **Twitter**[@apachewayang](https://twitter.com/apachewayang)
284-
285283

286284
## Authors
287285
