Are there any plans to release a paper or technical report?

Could you also share a bit more info about the **data processing and training parameters** for your models? For example:

* Were there any specific data cleaning or tokenizer preparation steps?
* What were the main training settings?
* How were the models evaluated?

Thanks a lot!