Curriculum Learning for Wordle RL agents via Latent Action Space Exploitation

This is a mini-project for the course CS-439: Optimization for Machine Learning at EPFL.

Setup

pip install -r requirements.txt

Project Structure

├── results                     <- Directory for saving results
│
│── src                         <- Source code
│   ├── data                    <- Data directory
│   ├── env                     <- Environment directory
│   ├── models                  <- Model directory
│
│── curriculum.py               <- File for training using a curriculum of increasing difficulty
│
│── naive.py                    <- File for training without curriculum
│
│── warmup.py                   <- File for training using a randomized curriculum
│
├── results.ipynb               <- Notebook with curriculum definition, final results and plots
│
├── requirements.txt            <- File for installing python dependencies
│
├── literature.bib              <- Literature references
│
└── README.md

Project Description

This project implements a Reinforcemnt Learning agent to play the popular word-guessing game Wordle and explores pathways of training optimization via transfer learning.

Environment

State: A 391-dimensional vector encodes the number of attempts left and each letter's correctness and position relative to the target word.
Latent action space: 130 dimensions.
Action: The actor outputs are multiplied with an OHE matrix of all words in the vocabulary. The results are passed as logits to a categorical distribution and an action is sampled.
Rewards: +2 for green, +1 for yellow, +10 $\times$ (1 + # of attempts left) for winning to encourage early winning.
Penalty for losing: -50 to discourage exploitation of the reward system.

Agents:

Advantage Actor Critic: An actor network that learns to maximize the expected return, and a critic network that estimates the value function to reduce variance in the policy gradient updates.

Model Architecture:

Actor: 391-dimensional input, one hidden layer of 256 units with ReLU activation, and a 130-dimensional output layer.
Critic: The same architecture as the actor, but with a single output unit for the value function.

A note on hardware optimization:

A significant amount of time was spent exploring the best hardware configuration for the project. Our testing across different hardware configurations (modern consumer-grade CPUs, older server-class CPUs, and GPUs) revealed that the performance bottleneck of the project was single-thread CPU performance and memory bandwidth. After trying to optimize the code for multi-threading, we found that the overhead of thread management outweighed the benefits. Similarly, due to the small number of low-dimensional layers in our neural networks, GPU utilization hindered performance. While it would be possible to further increase the dimensionality of the network's layers or the batch size, we believe that in a setting with many possible actions such as Wordle, the most important factor is experience. We therefore decided to keep the neural network minimal to facilitate faster learning per episode.

For the reasons stated above, we recommend using a modern CPU with high single-core performance for training.

Team Members:

Farhan Ali, Jean Fregeville, Vasileios Gkikas

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Curriculum Learning for Wordle RL agents via Latent Action Space Exploitation

Setup

Project Structure

Project Description

Environment

Agents:

Model Architecture:

A note on hardware optimization:

Team Members:

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 74 Commits
results		results
src		src
README.md		README.md
curriculum.py		curriculum.py
literature.bib		literature.bib
naive.py		naive.py
requirements.txt		requirements.txt
results.ipynb		results.ipynb
warmup.py		warmup.py

Folders and files

Latest commit

History

Repository files navigation

Curriculum Learning for Wordle RL agents via Latent Action Space Exploitation

Setup

Project Structure

Project Description

Environment

Agents:

Model Architecture:

A note on hardware optimization:

Team Members:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages