23.6. Large Language Model Distributed Inference Workshop#

23.6.1. Workshop Summary#

This workshop provides hands-on training on hosting and running inference for large language models that exceed the memory capacity of a single GPU. Participants use the vLLM library to host large Llama models, including those with 70 billion and 405 billion parameters, on an HPC cluster. The workshop covers prompting these models, extracting logits, and batching prompts to make efficient use of GPU resources.
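As a rough sketch of what this looks like in practice, the snippet below uses vLLM's offline inference API to shard a model across several GPUs with tensor parallelism, batch a list of prompts, and request per-token log-probabilities. The model name, prompt texts, and `tensor_parallel_size` value are illustrative assumptions, not the workshop's exact configuration.

```python
# Minimal sketch of multi-GPU inference with vLLM. The model name and
# tensor_parallel_size are assumptions; adjust them to the model you are
# hosting and the number of GPUs available on your cluster node.
from vllm import LLM, SamplingParams

# Shard the model's weights across 4 GPUs with tensor parallelism.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # hypothetical example model
    tensor_parallel_size=4,
)

# Request the top-5 log-probabilities for each generated token so that
# per-token logit information can be inspected alongside the text.
sampling_params = SamplingParams(temperature=0.7, max_tokens=64, logprobs=5)

# Passing a list of prompts lets vLLM batch them, keeping the GPUs busy.
prompts = [
    "Explain tensor parallelism in one sentence.",
    "What is an HPC cluster?",
]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    completion = output.outputs[0]
    print(completion.text)
    print(completion.logprobs)  # one {token_id: Logprob} dict per generated token
```

Tensor parallelism splits each weight matrix across the GPUs on a node, which is what makes 70B- and 405B-parameter models fit when no single GPU can hold them; vLLM handles this automatically once `tensor_parallel_size` is set.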

23.6.1.1. Prerequisites#

  • Familiarity with Python programming

  • Familiarity with LLMs

  • Familiarity with high-performance computing (HPC)

  • Access to the FASRC cluster

23.6.2. Workshop Slides#

To download the “Large Language Model Distributed Inference” workshop slides, click the link below.

Kempner LLM Distributed Inference Workshop