Kempner Institute Computing Handbook

High Performance Computing

  • 1. Kempner AI Cluster
    • 1.1. Introduction and Basics
    • 1.2. Overview of Cluster
    • 1.3. Cluster Usage Policies
    • 1.4. Accessing the Cluster
    • 1.5. Access by FASRC Users
    • 1.6. Fairshare Policy
  • 2. Job Submission
    • 2.1. Understanding SLURM
    • 2.2. Job Submission Basics
    • 2.3. Array Jobs
    • 2.4. Job Dependencies
    • 2.5. Advanced SLURM Features
    • 2.6. Open OnDemand
  • 3. Development Environment
    • 3.1. Software Modules
    • 3.2. Containerization
    • 3.3. VSCode for Remote Dev
    • 3.4. Conda Environment
    • 3.5. Spack Package Manager
    • 3.6. Shell Configuration
  • 4. Storage and Data Transfer
    • 4.1. Storage Options
    • 4.2. Data Transfer
    • 4.3. Shared Data/Model Repository

Software Engineering for Research

  • 5. Collaborative Code Development
  • 6. Software Design Principles
  • 7. Documentation and Readability
  • 8. Testing and Continuous Integration
  • 9. Package Development
  • 10. Reproducible Research

AI Tools and Workflows

  • 11. Data Discovery and Tokenization
  • 12. Distributed Inference
  • 13. NVIDIA NeMo Workflow
  • 14. Scalable Vision Workflows
  • 15. Hugging Face Models

Neuro AI Workflows

  • 16. Spike Sorting

AI Scaling and Engineering

  • 17. Scalability
    • 17.1. Intro to Parallel Computing
    • 17.2. GPU Computing
    • 17.3. GPU Profiling
    • 17.4. Distributed GPU Computing
    • 17.5. Parallel I/O
  • 18. Efficiency
    • 18.1. ML Efficiency
    • 18.2. Performance Monitoring
    • 18.3. Deployment and Inference
    • 18.4. Experiment Management
  • 19. Experiment Management
    • 19.1. Weights & Biases - Intro
    • 19.2. Weights & Biases - Sweeps

Security and Compliance

  • 20. Security and Compliance

Open Source Hub

  • 21. Open Source Hub

Support

  • 22. Support and Troubleshooting
    • 22.1. FAQ

Workshops

  • 23. About Workshops @ Kempner
    • 23.2. Introduction to the Kempner AI Cluster Workshop
    • 23.3. Introduction to Distributed Computing Workshop
    • 23.4. Large Language Model Distributed Training Workshop
    • 23.5. Building a Transformer from Scratch Workshop
    • 23.6. Large Language Model Distributed Inference Workshop
    • 23.7. Spike Sorting on an HPC Cluster Workshop
    • 23.8. Optimizing ML Workflows on an AI Cluster Workshop


By Kempner Institute

© Copyright 2026, The President and Fellows of Harvard College.