
Installation Instructions

Setting up everything needed for GainSight can be a long and tedious process, so please read the instructions carefully and follow them step by step. We recommend using Docker to set up this project, but you may find a manual setup easier if you are planning to contribute to the project.

There are also additional workloads, not included in the GainSight submodule hierarchy or the Docker image, that you can download and run.

Prerequisites

  • *nix-based operating system with the x86_64 processor architecture. The following instructions are tailored for Ubuntu 22.04 with the apt package manager and the bash shell.
  • At least 33 GB of free disk space and at least 16 GB of DRAM.
  • NVIDIA GPU with architecture generations ranging from Kepler to Hopper.
  • Git with SSH access to the repository.

If you plan to use Docker, this project was originally built with Docker version 27.3.1 and NVIDIA Container Toolkit version 1.17.4. Please refer to the official Docker documentation for instructions on how to install Docker and the official NVIDIA Container Toolkit documentation for instructions on how to install the NVIDIA Container Toolkit.
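Before proceeding, you can confirm that both tools are installed and that GPU passthrough works. A minimal check, assuming docker and nvidia-ctk are already on your PATH:

docker --version       # originally built with 27.3.1
nvidia-ctk --version   # originally built with 1.17.4
# Sanity-check GPU passthrough into a container
docker run --rm --gpus all ubuntu:22.04 nvidia-smi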

If you are planning to build and run the tool from source, you will need to install CUDA Toolkit version 11.8 and NVIDIA Nsight Compute v2025.1.0. Please refer to the NVIDIA CUDA Toolkit documentation for instructions on how to install the CUDA Toolkit and the NVIDIA Nsight Compute documentation for instructions on how to install Nsight Compute.

Docker Installation

Docker is the preferred method of running this project. You can access the container registry at https://code.stanford.edu/tambe-lab/gainsight/container_registry/481.

1. Pulling a Prebuilt Docker Image

A prebuilt Docker image is available on Stanford GitLab at scr.svc.stanford.edu/tambe-lab/gainsight:latest. To pull the prebuilt Docker image, run the following command. Note that you will need a personal access token to log in to the Stanford GitLab container registry.

docker pull scr.svc.stanford.edu/tambe-lab/gainsight:latest
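If the pull is rejected because you are not authenticated, log in first. A typical login flow, assuming a Stanford GitLab personal access token with at least the read_registry scope:

docker login scr.svc.stanford.edu
# Enter your GitLab username, then paste the personal access token as the password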

2. Running the Docker Image

Use the following command to run the Docker image. Note that the --gpus all flag is required to enable GPU acceleration, the -it flags are required to interact with the container through the terminal, and the --rm flag is optional but recommended to remove the container after it is stopped. The --cap-add=SYS_ADMIN flag grants the elevated privileges needed to access GPU performance counters during profiling.

docker run --gpus all -it --rm --cap-add=SYS_ADMIN scr.svc.stanford.edu/tambe-lab/gainsight:latest
cd /gainsight && conda activate gainsight
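Once inside the container, it is worth sanity-checking that the GPU is visible before launching any workloads:

# Should list your GPU; if it errors out, revisit the NVIDIA Container Toolkit setup
nvidia-smi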

If you want to run workloads under the /gainsight/workloads/mlperf-hugging-face directory, you will need to set the HUGGINGFACE_TOKEN environment variable to your Hugging Face token. You can set the token by running the following commands.

export HUGGINGFACE_TOKEN=<your_huggingface_token>
echo 'export HUGGINGFACE_TOKEN="<your_huggingface_token>"' >> /gainsight/setup.sh

3. Building the Docker Image

If you intend to contribute to the project, you will need to build the Docker image from source. This project was originally built with Docker version 27.3.1 and NVIDIA Container Toolkit version 1.17.4. It is assumed that you are running a *nix-based operating system on the x86_64 architecture with an NVIDIA GPU from the Kepler through Hopper architecture generations. Please refer to the official Docker documentation for instructions on how to install Docker and the official NVIDIA Container Toolkit documentation for instructions on how to install the NVIDIA Container Toolkit.

First, clone the repository. It is essential to clone this repository with the --recursive flag to ensure that all submodules are cloned as well.

# Clone the repository with submodules
git clone --recursive git@code.stanford.edu:tambe-lab/gainsight.git
cd gainsight

After cloning, please set up the PROJECT_ROOT environment variable to point to the root of the cloned repository. This is essential for the build process and for running the workloads since many other environment variables are set relative to this path.

# Set PROJECT_ROOT to the path of the cloned repository
export PROJECT_ROOT=$(pwd)

After installing Docker and the NVIDIA Container Toolkit, you can build the Docker image by running the following command from the root of the cloned repository.

DOCKER_BUILDKIT=1 docker build --target final -t gainsight:latest .

The build process will take up to 1 hour and requires up to 64 GB of disk space.
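Once built, the local image is run the same way as the prebuilt one, substituting the local tag:

docker run --gpus all -it --rm --cap-add=SYS_ADMIN gainsight:latest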

Manual Installation

Please follow the instructions carefully if you prefer a manual installation over the Docker method or if you need to develop or modify the code.

1. Install System Dependencies

# Update package lists
sudo apt-get update && sudo apt-get -y upgrade

# Install essential packages
sudo apt-get -y install git openssh-client wget build-essential g++ vim \
    cmake xutils-dev bison zlib1g-dev flex libglu1-mesa-dev libssl-dev \
    libxml2-dev libboost-all-dev freeglut3-dev libxmu-dev libxi-dev

2. Install CUDA, Nsight Compute, and cuDNN

If you haven't already installed CUDA 11.8, Nsight Compute, and cuDNN:

  1. Download CUDA 11.8 from NVIDIA's website, making sure to pick the correct version for your system.
  2. Follow the installation instructions provided by NVIDIA.
  3. Install NVIDIA Nsight Compute 2025.1.0 from NVIDIA's website, again following NVIDIA's installation instructions.
  4. (Optional) Install cuDNN 9 for CUDA 11.8 following NVIDIA's cuDNN installation guide.
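A quick way to confirm the toolchain afterwards, assuming nvcc and ncu are on your PATH:

nvcc --version   # should report CUDA 11.8
ncu --version    # should report Nsight Compute 2025.1.0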

3. Set Up Conda Environment

Follow the instructions on the Anaconda website to install Anaconda or Miniconda.

Then create a conda environment for Gainsight. Note that the conda environment must be created with Python 3.12.

# Create a conda environment for Gainsight
conda create -y -n gainsight python=3.12
conda activate gainsight

It is assumed that the environment variable CONDA_PREFIX is set to the path of the conda environment. Manually set it if necessary.

# Set CONDA_PREFIX to the path of the conda environment
export CONDA_PREFIX=$(conda info --base)/envs/gainsight

4. Clone the Repository

It is essential to clone this repository with the --recursive flag to ensure that all submodules are cloned as well.

# Clone the repository with submodules
git clone --recursive git@code.stanford.edu:tambe-lab/gainsight.git
cd gainsight

After cloning, please set up the PROJECT_ROOT environment variable to point to the root of the cloned repository. This is essential for the build process and for running the workloads since many other environment variables are set relative to this path.

# Set PROJECT_ROOT to the path of the cloned repository
export PROJECT_ROOT=$(pwd)

Also create a log directory to store the logs generated by the workloads.

mkdir -p logs

5. Set Up Environment Variables

The script setup_template.sh is provided to set up the environment variables and symbolic links required for the project. You can copy the template file to setup.sh and modify it as needed.

# Copy the template file to setup.sh
cp setup_template.sh setup.sh

Other components of the project assume that setup.sh has been sourced and that sourcing setup.sh alone is sufficient to configure the environment, so please adjust setup.sh accordingly.
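For example, a typical new shell session would start with:

cd $PROJECT_ROOT
source setup.sh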

5.1. Set Up Environment Variables for CUDA and Nsight Compute

First set CUDA_INSTALL_PATH and PTXAS_CUDA_INSTALL_PATH to the path where CUDA is installed. If you installed cuDNN in a previous step, also set CUDNN_PATH to the path where cuDNN is installed. (The Nsight Compute Python API is appended to PYTHONPATH in step 5.2 below.)

export CUDA_INSTALL_PATH=/usr/local/cuda-11.8
export PTXAS_CUDA_INSTALL_PATH=/usr/local/cuda-11.8
export CUDNN_PATH=/usr/include  # Optional

Since the Nsight Compute install path depends on the installed version, you will need to set an additional environment variable to point to the Nsight Compute version you installed.

export NCU_VERSION=$(ncu --version | grep "Version" | sed -E 's/Version ([0-9]+\.[0-9]+\.[0-9]+).*/\1/')
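Since the PYTHONPATH setting in the next step assumes Nsight Compute lives under /opt/nvidia/nsight-compute, it is worth verifying that the detected version matches an installed directory (a sanity check assuming the default install location):

echo $NCU_VERSION
ls -d /opt/nvidia/nsight-compute/${NCU_VERSION}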

5.2. Set Up PATH and Loader/Linker Variables

Add the CUDA binaries to the system PATH and set Python to use the Nsight Compute Python API.

export PATH=${CUDA_INSTALL_PATH}/bin:${PATH}
export PYTHONPATH=/opt/nvidia/nsight-compute/${NCU_VERSION}/extras/python:${PYTHONPATH}

Set the LD_LIBRARY_PATH and LDFLAGS environment variables to include the paths to the CUDA libraries and the conda environment libraries.

export LD_LIBRARY_PATH=$CUDA_INSTALL_PATH/lib64/stubs:$CUDA_INSTALL_PATH/lib64:$CUDA_INSTALL_PATH/lib:$CUDNN_PATH:/usr/lib/x86_64-linux-gnu:/usr/lib:$LD_LIBRARY_PATH
export LDFLAGS="-L/usr/local/cuda/lib64/stubs/ -L$CUDA_INSTALL_PATH/lib64 -L$CUDA_INSTALL_PATH/lib -L$CUDNN_PATH -L/usr/lib/x86_64-linux-gnu -L/usr/lib"

5.3. Set Up Additional Environment Variables

The following additional environment variables are required to point to various components of the GPU simulator backend.

export ACCELSIM_ROOT=$PROJECT_ROOT/backend/accel-sim/gpu-simulator
export GPGPUSIM_ROOT=$PROJECT_ROOT/backend/accel-sim/gpu-simulator/gpgpu-sim
export GPUAPPS_ROOT=$PROJECT_ROOT/workloads/accel-sim-benchmarks

If you want to run workloads under the $PROJECT_ROOT/workloads/mlperf-hugging-face directory, you will need to set the HUGGINGFACE_TOKEN environment variable to your Hugging Face token.

export HUGGINGFACE_TOKEN=<your_huggingface_token>
echo 'export HUGGINGFACE_TOKEN="<your_huggingface_token>"' >> $PROJECT_ROOT/setup.sh

6. Set Up Symbolic Links

Certain components of the project require symbolic links to be set up for proper functionality, including but not limited to the dynamically linked libraries and the NVBit runtime.

# Navigate to project root
cd $PROJECT_ROOT

# Create symbolic links for NVBit
rm -rf $PROJECT_ROOT/backend/accel-sim/util/tracer_nvbit/nvbit_release
ln -sfn $PROJECT_ROOT/lib/nvbit_release $PROJECT_ROOT/backend/accel-sim/util/tracer_nvbit/nvbit_release
ln -sfn $PROJECT_ROOT/lib/nvbit_release/core/libnvbit.a $PROJECT_ROOT/backend/ncu-nvbit/libnvbit.a

# Create symbolic links for PyBind11
rm -rf $PROJECT_ROOT/backend/accel-sim/gpu-simulator/extern/pybind11
mkdir -p $PROJECT_ROOT/backend/accel-sim/gpu-simulator/extern
ln -sfn $PROJECT_ROOT/lib/pybind11 $PROJECT_ROOT/backend/accel-sim/gpu-simulator/extern/pybind11

# Create symbolic link so the conda environment exposes lib64
ln -sfn $CONDA_PREFIX/lib $CONDA_PREFIX/lib64
# Create symbolic link for cuDNN if installed
ln -sfn $CUDNN_PATH/lib $CUDNN_PATH/lib64

7. Install Python Dependencies

# Activate conda environment if not already activated
conda activate gainsight

# Install PyTorch with CUDA 11.8 support
pip install torch torchdata torchtext torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install other required packages
pip install -I -r requirements.txt
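To confirm that the CUDA-enabled PyTorch build was installed correctly, a quick check:

# Should print the installed version and True
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"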

8. Build Required Components

Even though most of this project is written in Python, there are some components that require compilation, specifically the GPU simulator backend forked from the Accel-Sim project, as well as its various CUDA-based workloads.

8.0. Build NVBit Runtime

cd $PROJECT_ROOT/backend/ncu-nvbit
make -j$(nproc)

8.1. Build Accel-Sim Tracer

cd $PROJECT_ROOT/backend/accel-sim/util/tracer_nvbit
export BASH_ROOT=$PROJECT_ROOT/backend/accel-sim/util/tracer_nvbit
make -j$(nproc)

# Set additional environment variables specific to the tracer
export TRACER_PATH=$PROJECT_ROOT/backend/accel-sim/util/tracer_nvbit
export TRACER_TOOLS=$PROJECT_ROOT/backend/accel-sim/util/tracer_nvbit/tracer_tool
export TRACER_LIB=$PROJECT_ROOT/backend/accel-sim/util/tracer_nvbit/tracer_tool/tracer_tool.so
export TRACER_POST_PROCESS=$PROJECT_ROOT/backend/accel-sim/util/tracer_nvbit/tracer_tool/traces-processing/post-traces-processing
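As a rough usage sketch (NVBit tools are injected via LD_PRELOAD; your_cuda_app and the trace output path below are placeholders), tracing an application looks something like:

# Trace a CUDA application
LD_PRELOAD=$TRACER_LIB ./your_cuda_app
# Post-process the resulting kernel traces
$TRACER_POST_PROCESS ./traces/kernelslist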

8.2. Build Accel-Sim

cd $PROJECT_ROOT/backend/accel-sim/gpu-simulator/
source setup_environment.sh
make -j$(nproc)
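As a rough sketch of how the built simulator is invoked in the upstream Accel-Sim framework (the trace and config paths here are placeholders; consult the project's run scripts for the exact invocation used by GainSight):

$ACCELSIM_ROOT/bin/release/accel-sim.out \
    -trace ./traces/kernelslist.g \
    -config ./gpgpusim.config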

8.3. Build Polybench Benchmark Workloads

The Polybench workloads are located in the workloads/polybenchGpu/CUDA directory. Clone the repository from https://code.stanford.edu/tambe-lab/polybench-gpu.git if it is not already present. Then compile the workloads using the provided script.

cd $PROJECT_ROOT/workloads/polybenchGpu/CUDA
bash compileCodes.sh

8.4. Build SCALE-Sim

Note that you need to create a symbolic link between the lib and lib64 directories in the conda environment to avoid issues with the linker after building the simulator.

cd $PROJECT_ROOT/backend/scalesim/scale-sim-v2
python3 setup.py install
# Or run pip3 install . if setup.py fails
ln -sfn $CONDA_PREFIX/lib $CONDA_PREFIX/lib64

If you create a new fork of SCALE-Sim based on the original repository, you will need to edit backend/scalesim/scale-sim-v2/scalesim/scale.py and change line 34 (or the relevant line defining save_disk_space) to:

save_disk_space=False

This is crucial to ensure that the simulator saves the memory access traces generated during the simulation.

Downloading and Running Additional Workloads

These additional workloads are not required for the main functionality of the project, are not included in the arXiv preprint artifacts, and are not fully tested to work.

CUDA Samples

cd $PROJECT_ROOT/workloads
wget https://github.com/NVIDIA/cuda-samples/archive/refs/tags/v11.8.tar.gz
tar -xzvf v11.8.tar.gz && rm -rf v11.8.tar.gz
cd $PROJECT_ROOT/workloads/cuda-samples-11.8
make -j$(nproc)

PyTorch Examples

pytorch/examples is a repository showcasing examples of using PyTorch. The goal is to provide curated, short, high-quality examples with few or no dependencies that are substantially different from each other. You can clone our fork of the repository from https://code.stanford.edu/tambe-lab/pytorch-examples.git.

cd $PROJECT_ROOT/workloads
git clone https://code.stanford.edu/tambe-lab/pytorch-examples.git

Additional Accel-Sim Benchmarks

Clone and build the additional Accel-Sim benchmarks as follows:

cd $PROJECT_ROOT/workloads
git clone https://code.stanford.edu/tambe-lab/accel-sim-benchmarks.git
cd accel-sim-benchmarks
source ./src/setup_environment
make all -i -j -C ./src

Some of the workloads in this repository require additional data files to run. To download the necessary data files, run the following command. Note that you need to have an additional 20 GB of free disk space to download the data files.

make data -C ./src
rm -rf all.gpgpu-sim-app-data.tgz