Installation Instructions
Setting up everything needed for GainSight can be a long and tedious process, so please read the instructions carefully and follow them step by step. We recommend using Docker to set up this project, though a manual setup may be easier if you plan to contribute.
There are also additional workloads not included in the GainSight submodule hierarchy or the Docker image that you can download and run.
Prerequisites
- *nix-based operating system with the x86_64 processor architecture. The following instructions are tailored for Ubuntu 22.04 with the apt package manager and the bash shell.
- At least 33 GB of free disk space and at least 16 GB of DRAM.
- NVIDIA GPU from the Kepler through Hopper architecture generations.
- Git with SSH access to the repository.
If you plan to use Docker, this project was originally built with Docker version 27.3.1 and NVIDIA Container Toolkit version 1.17.4. Please refer to the official Docker documentation for instructions on how to install Docker and the official NVIDIA Container Toolkit documentation for instructions on how to install the NVIDIA Container Toolkit.
If you are planning to build and run the tool from source, you will need to install CUDA Toolkit version 11.8 and NVIDIA Nsight Compute v2025.1.0. Please refer to the NVIDIA CUDA Toolkit documentation for instructions on how to install the CUDA Toolkit and the NVIDIA Nsight Compute documentation for instructions on how to install Nsight Compute.
Docker Installation
Docker is the preferred method of running this project. You can access the container registry at https://code.stanford.edu/tambe-lab/gainsight/container_registry/481.
1. Pulling a Prebuilt Docker Image
A prebuilt Docker image is available on Stanford GitLab at scr.svc.stanford.edu/tambe-lab/gainsight:latest.
To pull the prebuilt Docker image, run the following command.
Note that you will need a personal access token to log in to the Stanford GitLab container registry.
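A sketch of the login-and-pull sequence, using the image path given above:

```bash
# Log in with your GitLab username and a personal access token as the password
docker login scr.svc.stanford.edu

# Pull the prebuilt image
docker pull scr.svc.stanford.edu/tambe-lab/gainsight:latest
```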
2. Running the Docker Image
Use the following command to run the Docker image.
Note that the --gpus all flag is required to enable GPU acceleration; the --rm flag is optional but recommended so that the container is removed after it stops; and the -it flag is required to interact with the container through the terminal.
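A sketch using the flags described above and the prebuilt image:

```bash
docker run --gpus all --rm -it \
    scr.svc.stanford.edu/tambe-lab/gainsight:latest
```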
If you want to run workloads under the /gainsight/workloads/mlperf-hugging-face directory, you will need to set the HUGGINGFACE_TOKEN environment variable to your Hugging Face token.
You can set the token by running the following commands.
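One way to do this is to export the token on the host and forward it into the container:

```bash
# Replace <your-token> with a token from your Hugging Face account settings
export HUGGINGFACE_TOKEN=<your-token>
# -e HUGGINGFACE_TOKEN forwards the host value into the container
docker run --gpus all --rm -it -e HUGGINGFACE_TOKEN \
    scr.svc.stanford.edu/tambe-lab/gainsight:latest
```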
3. Building the Docker Image
If you intend to contribute to the project, you will need to build the Docker image from source. This project was originally built with Docker version 27.3.1 and NVIDIA Container Toolkit version 1.17.4, on a *nix-based operating system with the x86_64 architecture and an NVIDIA GPU from the Kepler through Hopper generations. Please refer to the official Docker documentation for instructions on how to install Docker and the official NVIDIA Container Toolkit documentation for instructions on how to install the NVIDIA Container Toolkit.
Please first clone the repository.
It is essential to clone this repository with the --recursive flag to ensure that all submodules are cloned as well.
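A sketch of the clone step; the SSH URL is an assumption based on the project's GitLab location:

```bash
git clone --recursive git@code.stanford.edu:tambe-lab/gainsight.git
cd gainsight
# If you forgot --recursive, fetch the submodules afterwards:
git submodule update --init --recursive
```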
After cloning, please set up the PROJECT_ROOT environment variable to point to the root of the cloned repository.
This is essential for the build process and for running the workloads since many other environment variables are set relative to this path.
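For example:

```bash
export PROJECT_ROOT="$(pwd)"
# Optionally persist it across shells:
echo "export PROJECT_ROOT=$PROJECT_ROOT" >> ~/.bashrc
```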
After installing Docker and the NVIDIA Container Toolkit, you can build the Docker image by running the following command from the root of the cloned repository.
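A sketch of the build command; the image tag is illustrative:

```bash
docker build -t gainsight:latest .
```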
The build process can take up to an hour and requires up to 64 GB of disk space.
Manual Installation
Please follow the instructions carefully if you prefer a manual installation over the Docker method or if you need to develop or modify the code.
1. Install System Dependencies
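A sketch of this step; the package list is an assumption based on typical build requirements for Accel-Sim-style projects, so cross-check it against the project's Dockerfile:

```bash
sudo apt-get update
# Package list is an assumption; adjust to match the Dockerfile
sudo apt-get install -y \
    build-essential cmake git wget \
    bison flex zlib1g-dev libglu1-mesa-dev \
    libssl-dev libxml2-dev libboost-all-dev \
    python3-dev python3-pip
```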
2. Install CUDA and cuDNN
If you haven't already installed CUDA 11.8 and cuDNN:
- Download CUDA 11.8 from NVIDIA's website
- Follow the installation instructions provided by NVIDIA
- Install NVIDIA Nsight Compute 2025.1.0 from NVIDIA's website
  - Ensure you have the correct version for your system
  - Follow the installation instructions provided by NVIDIA
- (Optional) Install cuDNN 9 for CUDA 11.8 following NVIDIA's cuDNN installation guide
3. Set Up Conda Environment
Follow the instructions on the Anaconda website to install Anaconda or Miniconda.
Then create a conda environment for GainSight. Note that the environment must be created with Python 3.12.
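For example:

```bash
conda create -n gainsight python=3.12 -y
conda activate gainsight
```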
It is assumed that the environment variable CONDA_PREFIX is set to the path of the conda environment.
Manually set it if necessary.
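A sketch, assuming a Miniconda install in your home directory:

```bash
# Adjust the prefix to wherever your conda environments live
export CONDA_PREFIX="$HOME/miniconda3/envs/gainsight"
```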
4. Clone the Repository
It is essential to clone this repository with the --recursive flag to ensure that all submodules are cloned as well.
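As in the Docker instructions, the SSH URL is an assumption based on the project's GitLab location:

```bash
git clone --recursive git@code.stanford.edu:tambe-lab/gainsight.git
cd gainsight
```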
After cloning, please set up the PROJECT_ROOT environment variable to point to the root of the cloned repository.
This is essential for the build process and for running the workloads since many other environment variables are set relative to this path.
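For example:

```bash
export PROJECT_ROOT="$(pwd)"
echo "export PROJECT_ROOT=$PROJECT_ROOT" >> ~/.bashrc
```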
Also create a log directory to store the logs generated by the workloads.
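A sketch; the directory name is an assumption, so use whatever path the rest of your configuration expects:

```bash
mkdir -p "$PROJECT_ROOT/logs"
```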
5. Set Up Environment Variables
The script setup_template.sh is provided to set up the environment variables and symbolic links required for the project.
You can copy the template file to setup.sh and modify it as needed.
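For example:

```bash
cp setup_template.sh setup.sh
# Edit setup.sh to match your local paths, then load it
source setup.sh
```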
Other components of the project assume that setup.sh has been sourced and that sourcing setup.sh alone is sufficient to configure the environment, so edit setup.sh accordingly.
5.1. Set Up Environment Variables for CUDA and Nsight Compute
First set CUDA_INSTALL_PATH and PTXAS_CUDA_INSTALL_PATH to the path where CUDA is installed.
Also add the Nsight Compute Python API to PYTHONPATH by appending the appropriate directory under the Nsight Compute install path.
If you installed cuDNN in a previous step, set CUDNN_PATH to the path where cuDNN is installed.
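A sketch assuming a default CUDA 11.8 install location; the cuDNN path is an assumption:

```bash
export CUDA_INSTALL_PATH=/usr/local/cuda-11.8
export PTXAS_CUDA_INSTALL_PATH="$CUDA_INSTALL_PATH"
# Only if you installed cuDNN; point this at your cuDNN install
export CUDNN_PATH=/usr/local/cudnn
```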
Since the Nsight Compute install path depends on the installed version, you will need to set an additional environment variable that points to the version you installed.
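Both the variable name and the install prefix below are illustrative; match them to what setup_template.sh expects:

```bash
export NSIGHT_COMPUTE_PATH=/opt/nvidia/nsight-compute/2025.1.0
```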
5.2. Set up PATH and loader/linker variables
Add the CUDA binaries to the system PATH and set Python to use the Nsight Compute Python API.
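A sketch, reusing the illustrative variable from above; extras/python is where Nsight Compute ships its Python report API:

```bash
export PATH="$CUDA_INSTALL_PATH/bin:$PATH"
export PYTHONPATH="$NSIGHT_COMPUTE_PATH/extras/python:$PYTHONPATH"
```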
Set the LD_LIBRARY_PATH and LDFLAGS environment variables to include the paths to the CUDA libraries and the conda environment libraries.
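For example:

```bash
export LD_LIBRARY_PATH="$CUDA_INSTALL_PATH/lib64:$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
export LDFLAGS="-L$CUDA_INSTALL_PATH/lib64 -L$CONDA_PREFIX/lib $LDFLAGS"
```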
5.3. Set up additional environment variables
The following additional environment variables are required to point to various components of the GPU simulator backend.
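The exact variable names are project-specific; the sketch below is an assumption, with paths inferred from the directory layout referenced elsewhere in these instructions, so take the authoritative list from setup_template.sh:

```bash
# Illustrative names and paths only
export ACCELSIM_ROOT="$PROJECT_ROOT/backend/accel-sim"
export SCALESIM_ROOT="$PROJECT_ROOT/backend/scalesim/scale-sim-v2"
```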
If you want to run workloads under the /gainsight/workloads/mlperf-hugging-face directory, you will need to set the HUGGINGFACE_TOKEN environment variable to your Hugging Face token.
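For example:

```bash
# Replace <your-token> with a token from your Hugging Face account settings;
# consider adding this line to setup.sh
export HUGGINGFACE_TOKEN=<your-token>
```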
6. Set Up Symbolic Links
Certain components of the project require symbolic links to be set up for proper functionality, including but not limited to the dynamically linked libraries and the NVBit runtime.
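The full list of links is project-specific and belongs in setup.sh; the sketch below shows only two representative examples with hypothetical link targets:

```bash
# Alias lib64 to lib in the conda environment (also needed in step 8.4)
ln -sfn "$CONDA_PREFIX/lib" "$CONDA_PREFIX/lib64"
# Hypothetical link exposing the NVBit runtime to the tracer
ln -sfn "$ACCELSIM_ROOT/util/tracer_nvbit/nvbit_release" "$PROJECT_ROOT/nvbit_release"
```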
7. Install Python Dependencies
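A sketch; the requirements file name is an assumption, and the index URL selects PyTorch wheels built against CUDA 11.8:

```bash
pip install --upgrade pip
pip install -r "$PROJECT_ROOT/requirements.txt"
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```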
8. Build Required Components
Even though most of this project is written in Python, there are some components that require compilation, specifically the GPU simulator backend forked from the Accel-Sim project, as well as its various CUDA-based workloads.
8.0. Build NVBit Runtime
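A sketch based on the upstream Accel-Sim layout, where install_nvbit.sh downloads the NVBit release; the backend path is an assumption:

```bash
cd "$PROJECT_ROOT/backend/accel-sim/util/tracer_nvbit"
./install_nvbit.sh
```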
8.1. Build Accel-Sim Tracer
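A sketch based on the upstream Accel-Sim tracer build; the path is an assumption:

```bash
cd "$PROJECT_ROOT/backend/accel-sim/util/tracer_nvbit"
make
```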
8.2. Build Accel-Sim
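This mirrors the upstream Accel-Sim build procedure; the backend path is an assumption:

```bash
cd "$PROJECT_ROOT/backend/accel-sim/gpu-simulator"
source setup_environment.sh
make -j
```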
8.3. Build Polybench Benchmark Workloads
The Polybench workloads are located in the workloads/polybenchGpu/CUDA directory. Clone the repository from https://code.stanford.edu/tambe-lab/polybench-gpu.git if it is not already present.
Then compile the workloads using the provided script.
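A sketch; the script name is an assumption based on the upstream polybench-gpu repository:

```bash
cd "$PROJECT_ROOT/workloads/polybenchGpu/CUDA"
./compileCodes.sh
```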
8.4. Build SCALE-Sim
Note that you need to create a symbolic link between the lib and lib64 directories in the conda environment to avoid issues with the linker after building the simulator.
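A sketch; the editable pip install is an assumption, and the path follows the scale.py location given below:

```bash
ln -sfn "$CONDA_PREFIX/lib" "$CONDA_PREFIX/lib64"
cd "$PROJECT_ROOT/backend/scalesim/scale-sim-v2"
pip install -e .
```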
If you create a new fork of SCALE-Sim based on the original repository, you will need to edit backend/scalesim/scale-sim-v2/scalesim/scale.py and change line 34 (or the relevant line defining save_disk_space) to:
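Based on the surrounding description, the line should disable the disk-saving behavior, along these lines:

```python
save_disk_space = False  # keep the generated memory access traces on disk
```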
This is crucial to ensure that the simulator saves the memory access traces generated during the simulation.
Downloading and Running Additional Workloads
These additional workloads are not required for the main functionality of the project, are not included in the arXiv preprint artifacts, and are not fully tested to work.
CUDA Samples
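A sketch using the upstream NVIDIA samples repository; the v11.8 tag matches the CUDA Toolkit version used here:

```bash
git clone https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples
git checkout v11.8
make -j"$(nproc)"
```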
PyTorch Examples
pytorch/examples is a repository showcasing examples of using PyTorch. The goal is to provide curated, short, high-quality examples with few or no dependencies that are substantially different from each other.
You can clone our fork of the repository from https://code.stanford.edu/tambe-lab/pytorch-examples.git.
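For example (the destination directory is an assumption):

```bash
cd "$PROJECT_ROOT/workloads"
git clone https://code.stanford.edu/tambe-lab/pytorch-examples.git
```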
Additional Accel-Sim Benchmarks
Clone and build the additional Accel-Sim benchmarks as follows:
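A sketch based on the upstream Accel-Sim gpu-app-collection; the repository URL and build target are assumptions, so substitute the project's fork if one exists:

```bash
git clone https://github.com/accel-sim/gpu-app-collection.git
source gpu-app-collection/src/setup_environment
make -j -C gpu-app-collection/src all
```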
Some of the workloads in this repository require additional data files to run. To download the necessary data files, run the following command. Note that you need to have an additional 20 GB of free disk space to download the data files.
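A sketch; the data target is an assumption based on the upstream gpu-app-collection Makefile:

```bash
make -C gpu-app-collection/src data
```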