Skip to content

Add a GPT-2 training example #19

Description

@bwdGitHub

We would like to use these issues to gauge user interest.

It is possible to use the GPT-2 implementation for further language model training. There is no example demonstrating this on the repo or otherwise.

To make this possible on a typical consumer GPU will likely require some technique to reduce the amount of GPU memory required to train. There are a number of options:

  1. Add support for a smaller GPT-2 model.
  2. Only train a subset of the GPT-2 parameters.
  3. Use gradient accumulation.
  4. Gradient checkpointing.
  5. Reduced precision gradients.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions