First, set up the conda environment on NYU GREENE:
- Read the `JUPYTER.md` file for detailed instructions on setting up the conda environment on NYU GREENE: `cat JUPYTER.md`
- Follow the instructions in `JUPYTER.md` to create and activate your conda environment.

After setting up the conda environment, install the required packages:
- Run the `req.sbatch` script to download and install all necessary packages for the conda environment: `sbatch req.sbatch`
- Check the status of your job using `squeue -u $USER` to ensure it completes successfully.
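The submit-and-check step can also be scripted. The following is a minimal sketch, assuming a standard Slurm installation with `squeue` on the `PATH`; the helper name `wait_for_job` and the 30-second polling interval are illustrative and not part of the repository.

```shell
# Hypothetical helper (not part of the repository): block until a Slurm job
# has left the queue.
wait_for_job() {
    job_id="$1"
    interval="${2:-30}"   # polling interval in seconds (assumed default)
    # "squeue -h" suppresses the header line, so empty output means the job
    # is no longer pending or running.
    while [ -n "$(squeue -h -j "$job_id" 2>/dev/null)" ]; do
        sleep "$interval"
    done
}

# Example usage (commented out so the snippet only defines the helper):
# job_id=$(sbatch --parsable req.sbatch)   # --parsable prints just the job ID
# wait_for_job "$job_id"
# sacct -j "$job_id" --format=JobID,State,ExitCode
```

A job leaving the queue does not by itself mean it succeeded. If job accounting is enabled on the cluster, the `sacct` line reports the final state and exit code; otherwise, inspect the job's output log.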

Generate the training data:
- Navigate to the `data_gen` folder: `cd data_gen`
- Run the `rearc.sbatch` script to generate 10,000 examples for each of the 400 ARC problems: `sbatch rearc.sbatch`
- This process may take some time to complete. You can monitor the job status using: `squeue -u $USER`

Verify that the data generation was successful:
- Open and run the `visualization.ipynb` notebook to inspect the generated data: `jupyter notebook visualization.ipynb`
- Ensure that the notebook shows the correct number of examples (10,000 for each problem) and that the data format is as expected.
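A quick command-line check can complement the notebook. The sketch below assumes the generator writes one JSON file per ARC problem into a single output directory; that layout, the placeholder path in the usage comment, and the helper name `count_task_files` are all assumptions, so adjust them to match what `rearc.sbatch` actually produces.

```shell
# Hypothetical helper: count the *.json files directly inside a directory.
# Assumes one file per ARC problem, which the source does not confirm.
count_task_files() {
    find "$1" -maxdepth 1 -type f -name '*.json' | wc -l | tr -d ' '
}

# Example usage (the path is a placeholder):
# n=$(count_task_files path/to/generated_data)
# [ "$n" -eq 400 ] || echo "expected 400 task files, found $n"
```

This only checks the number of problem files. Confirming that each problem has 10,000 examples still requires opening the data, for instance in the notebook.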

After data generation and verification, proceed with the model training:
- Navigate to the `llada` folder: `cd ../llada`
- Run the parallel processing script: `sbatch llada_parallel.sbatch`
- After the parallel processing completes, run the Supervised Fine-Tuning (SFT) script: `sbatch llada_sft.sbatch`
- Monitor both jobs using `squeue -u $USER` and check the output logs for any errors.
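Because the SFT job should start only after the parallel processing job finishes, the two submissions can be chained with a Slurm job dependency instead of waiting manually. This is a sketch, assuming both scripts are submitted from the `llada` folder; the helper name `submit_chain` is illustrative and not part of the repository.

```shell
# Hypothetical helper: submit two sbatch scripts so that the second starts
# only if the first one exits successfully (afterok).
submit_chain() {
    first_id=$(sbatch --parsable "$1") || return 1
    # On multi-cluster setups --parsable prints "jobid;cluster"; keep the ID.
    first_id=${first_id%%;*}
    # Prints the job ID of the dependent (second) job.
    sbatch --parsable --dependency=afterok:"$first_id" "$2"
}

# Example usage:
# sft_id=$(submit_chain llada_parallel.sbatch llada_sft.sbatch)
# squeue -u "$USER"
```

With `afterok`, the second job stays pending if the first one fails, so check `squeue` and cancel it with `scancel` if that happens.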

If you encounter any issues:
- Check the job output logs in the Slurm output files (typically named `slurm-JOBID.out`).
- Verify that all paths in the sbatch scripts are correct.
- Ensure that the conda environment is properly activated in each sbatch script.
- Check for sufficient disk space and compute resources.

Keep the following resource requirements in mind:
- The data generation step (`rearc.sbatch`) creates 10,000 examples for each of the 400 ARC problems, which will require significant disk space.
- The `llada_parallel.sbatch` and `llada_sft.sbatch` scripts use GPU resources, so make sure your allocation has sufficient GPU time available.
- Depending on your resource allocation, you may need to adjust the resource requests in the sbatch scripts.
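The log check can be partly automated. As a rough sketch, the helper below searches Slurm output files for common failure markers; the helper name `scan_slurm_logs` and the keyword list are assumptions, and the default `slurm-JOBID.out` naming applies only if the sbatch scripts do not set their own `--output` path.

```shell
# Hypothetical helper: print lines in slurm-*.out files that look like errors.
scan_slurm_logs() {
    dir="${1:-.}"
    # -i: ignore case, -n: show line numbers, -H: show file names,
    # -E: extended regular expression
    grep -inHE 'error|traceback|out of memory|cancelled' "$dir"/slurm-*.out
}

# Example usage:
# scan_slurm_logs .     # scan logs in the current directory
# df -h .               # free space on the current filesystem
# du -sh data_gen       # space used by a folder (path is an example)
```

A keyword search is only a first pass; a job can fail without printing any of these words, so read the end of the relevant log as well.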