13. NVIDIA NeMo Workflow#

This section documents NVIDIA NeMo workflows designed for training and fine-tuning large language models (LLMs) on the Kempner AI Cluster using NVIDIA data center GPUs (e.g., A100, H100, and H200).

13.1. Overview#

This section describes a modular, cluster-ready workflow collection built on top of the NVIDIA NeMo framework. It supports:

  • Pretraining and fine-tuning of LLMs such as GPT2 and Llama 3

  • Scalable distributed training via SLURM

  • Containerized execution with reproducible environments (a minimal job script sketch follows this list)

The goal is to package NeMo’s capabilities in a form optimized for the Kempner AI Cluster infrastructure, while retaining the flexibility to adapt to new models and datasets.
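As a concrete illustration of this containerized, SLURM-based pattern, here is a minimal job script sketch. The partition, container image path, resource counts, and training script below are hypothetical placeholders rather than values from the repository; consult the repository’s own job scripts for the exact settings used on the cluster.

```bash
#!/bin/bash
# Minimal sketch of a containerized NeMo training job under SLURM.
# Partition, image path, resource counts, and script name are hypothetical
# placeholders -- substitute the values from the workflow repository.
#SBATCH --job-name=nemo-pretrain
#SBATCH --partition=<gpu_partition>   # e.g., an A100/H100/H200 partition
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4           # one task per GPU
#SBATCH --gpus-per-node=4
#SBATCH --cpus-per-task=16
#SBATCH --time=12:00:00

# Launch one process per GPU inside the container; --nv exposes the GPUs.
srun singularity exec --nv /path/to/nemo.sif \
    python /path/to/pretraining_script.py
```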

GitHub Repository: KempnerInstitute/nvidia-nemo-workflows

13.2. About NeMo#

NVIDIA NeMo is a scalable, research-oriented AI framework for:

  • Large Language Models (LLMs)

  • Multimodal Models (MM)

  • Automatic Speech Recognition (ASR)

  • Text-to-Speech (TTS)

  • Computer Vision (CV)

It is designed to streamline experimentation and deployment with modular components and out-of-the-box support for distributed training and mixed precision.
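For example, NeMo’s training scripts expose distributed training and precision as Hydra-style configuration overrides rather than code changes. The sketch below follows the conventions of NeMo’s GPT pretraining example; the script path and exact option names can vary between NeMo releases, so treat it as illustrative only.

```bash
# Illustrative launch of a NeMo example script with Hydra-style overrides.
# The script path and option names follow NeMo's GPT pretraining example
# but may differ across NeMo versions -- check your installed release.
python examples/nlp/language_modeling/megatron_gpt_pretraining.py \
    trainer.devices=8 \
    trainer.num_nodes=1 \
    trainer.precision=bf16-mixed \
    model.tensor_model_parallel_size=2
```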

13.3. Prerequisites#

To use these workflows, ensure the following:

  • Access to a SLURM partition with data center GPUs (e.g., A100, H100, or H200)

  • A container image with NeMo installed (see the pull example below)

  • Access to pretrained models (e.g., Llama 3, Megatron GPT2)

  • Approved access to any gated repositories on Hugging Face

Note

Some models require token-based authentication and acceptance of usage terms on Hugging Face before use.
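One common way to satisfy the container and Hugging Face prerequisites is to pull NVIDIA’s NeMo container from the NGC catalog with Singularity/Apptainer and authenticate with a Hugging Face access token. The container tag below is a placeholder; choose a current tag from the NGC catalog, and note that the exact container tooling available depends on the cluster.

```bash
# Pull the NVIDIA NeMo container from NGC as a Singularity/Apptainer image.
# The tag is a placeholder -- pick a current one from the NGC catalog.
singularity pull nemo.sif docker://nvcr.io/nvidia/nemo:<tag>

# Authenticate to Hugging Face so gated models (e.g., Llama 3) can be
# downloaded. Requires a token from https://huggingface.co/settings/tokens
# and prior acceptance of the model's usage terms.
huggingface-cli login --token "$HF_TOKEN"
```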

13.4. Available NeMo Workflows#

13.4.1. Pretraining Workflows#

13.4.2. Finetuning Workflows#

13.4.3. Reinforcement Learning (RL) Workflows#

| Type | Workflow Name | Model | Dataset |
|------|---------------|-------|---------|
| DPO | dpo_llama3-8b | Meta-Llama-3-8B-Instruct | email response |

Note

Follow this repository for regular updates to the instructions for the latest training, fine-tuning, and RL workflows on the AI cluster.