13. NVIDIA NeMo Workflow#

This section documents NVIDIA NeMo workflows designed for training and fine-tuning large language models (LLMs) on the Kempner AI Cluster using NVIDIA data center GPUs (e.g., A100, H100, and H200).

13.1. Overview#

This section describes a modular, cluster-ready workflow collection built on top of the NVIDIA NeMo framework. It supports:

  • Pretraining and fine-tuning of LLMs such as GPT2 and Llama 3

  • Scalable distributed training via SLURM

  • Containerized execution with reproducible environments (a minimal job script sketch follows this list)

The goal is to package NeMo’s capabilities in a form optimized for the Kempner AI Cluster infrastructure, while retaining the flexibility to adapt to new models and datasets.
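As a concrete illustration of this containerized, SLURM-based pattern, here is a minimal job script sketch. The partition, container image path, resource counts, and training script below are hypothetical placeholders rather than values from the repository; consult the repository’s own job scripts for the exact settings used on the cluster.

```bash
#!/bin/bash
# Minimal sketch of a containerized NeMo training job under SLURM.
# Partition, image path, resource counts, and script name are hypothetical
# placeholders -- substitute the values from the workflow repository.
#SBATCH --job-name=nemo-pretrain
#SBATCH --partition=<gpu_partition>   # e.g., an A100/H100/H200 partition
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4           # one task per GPU
#SBATCH --gpus-per-node=4
#SBATCH --cpus-per-task=16
#SBATCH --time=12:00:00

# Launch one process per GPU inside the container; --nv exposes the GPUs.
srun singularity exec --nv /path/to/nemo.sif \
    python /path/to/pretraining_script.py
```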

GitHub Repository: KempnerInstitute/nvidia-nemo-workflows

13.2. About NeMo#

NVIDIA NeMo is a scalable, research-oriented AI framework for:

  • Large Language Models (LLMs)

  • Multimodal Models (MM)

  • Automatic Speech Recognition (ASR)

  • Text-to-Speech (TTS)

  • Computer Vision (CV)

It is designed to streamline experimentation and deployment with modular components and out-of-the-box support for distributed training and mixed precision.
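For example, NeMo’s training scripts expose distributed training and precision as Hydra-style configuration overrides rather than code changes. The sketch below follows the conventions of NeMo’s GPT pretraining example; the script path and exact option names can vary between NeMo releases, so treat it as illustrative only.

```bash
# Illustrative launch of a NeMo example script with Hydra-style overrides.
# The script path and option names follow NeMo's GPT pretraining example
# but may differ across NeMo versions -- check your installed release.
python examples/nlp/language_modeling/megatron_gpt_pretraining.py \
    trainer.devices=8 \
    trainer.num_nodes=1 \
    trainer.precision=bf16-mixed \
    model.tensor_model_parallel_size=2
```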

13.3. Prerequisites#

To use these workflows, ensure the following:

  • Access to a SLURM partition with data center GPUs (e.g., A100, H100, or H200)

  • A container image with NeMo installed (see the pull example below)

  • Access to pretrained models (e.g., Llama 3, Megatron GPT2)

  • Approved access to any gated repositories on Hugging Face

Note

Some models require token-based authentication and acceptance of usage terms on Hugging Face before use.
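One common way to satisfy the container and Hugging Face prerequisites is to pull NVIDIA’s NeMo container from the NGC catalog with Singularity/Apptainer and authenticate with a Hugging Face access token. The container tag below is a placeholder; choose a current tag from the NGC catalog, and note that the exact container tooling available depends on the cluster.

```bash
# Pull the NVIDIA NeMo container from NGC as a Singularity/Apptainer image.
# The tag is a placeholder -- pick a current one from the NGC catalog.
singularity pull nemo.sif docker://nvcr.io/nvidia/nemo:<tag>

# Authenticate to Hugging Face so gated models (e.g., Llama 3) can be
# downloaded. Requires a token from https://huggingface.co/settings/tokens
# and prior acceptance of the model's usage terms.
huggingface-cli login --token "$HF_TOKEN"
```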

13.4. Available NeMo Workflows#

13.4.1. Pretraining Workflows#

13.4.2. Finetuning Workflows#

13.4.3. Reinforcement Learning (RL) Workflows#

| Type | Workflow Name | Model | Dataset |
|------|---------------|-------|---------|
| DPO | dpo_llama3-8b | Meta-Llama-3-8B-Instruct | email response |

Note

Follow this repository for regular updates to the instructions for the latest training, fine-tuning, and RL workflows on the AI cluster.