Kempner Institute Computing Handbook
  • Kempner Computing Handbook

High Performance Computing

  • 1. Kempner AI Cluster
    • 1.1. New User Checklist
    • 1.2. Introduction and Basics
    • 1.3. Overview of Cluster
    • 1.4. Cluster Usage Policies
    • 1.5. Accessing the Cluster
    • 1.6. Access by FASRC Users
    • 1.7. Fairshare Policy
  • 2. Job Submission
    • 2.1. Understanding SLURM
    • 2.2. Job Submission Basics
    • 2.3. Array Jobs
    • 2.4. Job Dependencies
    • 2.5. Advanced SLURM Features
    • 2.6. Open OnDemand
  • 3. Development Environment
    • 3.1. Software Modules
    • 3.2. Containerization
    • 3.3. VSCode for Remote Dev
    • 3.4. Conda Environment
    • 3.5. Spack Package Manager
    • 3.6. Shell Configuration
  • 4. Storage and Data Transfer
    • 4.1. Storage Options
    • 4.2. Data Transfer
    • 4.3. Shared Data/Model Repository
    • 4.4. Data Management Plan
    • 4.5. FAQ: Data Management and Offboarding

Software Engineering for Research

  • 5. Collaborative Code Development
  • 6. Software Design Principles
  • 7. Documentation and Readability
  • 8. Testing and Continuous Integration
  • 9. Package Development
  • 10. Reproducible Research

AI Tools and Workflows

  • 11. Data Discovery and Tokenization
  • 12. Distributed Inference
  • 13. NVIDIA NeMo Workflow
  • 14. Scalable Vision Workflows
  • 15. Hugging Face Models
  • 16. KempnerPulse
  • 17. KempnerForge

Neuro AI Workflows

  • 18. Spike Sorting

AI Scaling and Engineering

  • 19. Scalability
    • 19.1. Intro to Parallel Computing
    • 19.2. GPU Computing
    • 19.3. GPU Profiling
    • 19.4. Distributed GPU Computing
    • 19.5. Parallel I/O
  • 20. Efficiency
    • 20.1. ML Efficiency
    • 20.2. Performance Monitoring
    • 20.3. Deployment and Inference
    • 20.4. Experiment Management
  • 21. Experiment Management
    • 21.1. Weights & Biases - Intro
    • 21.2. Weights & Biases - Sweeps

Security and Compliance

  • 22. Security and Compliance

Open Source Hub

  • 23. Open Source Hub

Support

  • 24. Support and Troubleshooting
    • 24.1. FAQ

Workshops

  • 25. About Workshops @ Kempner
    • 25.2. Introduction to the Kempner AI cluster Workshop
    • 25.3. Introduction to Distributed Computing Workshop
    • 25.4. Large Language Model Distributed Training Workshop
    • 25.5. Building a Transformer from Scratch Workshop
    • 25.6. Large Language Model Distributed Inference Workshop
    • 25.7. Spike Sorting on an HPC Cluster Workshop
    • 25.8. Optimizing ML Workflows on an AI Cluster Workshop

Index

By Kempner Institute

© Copyright 2026, The President and Fellows of Harvard College.