GainSight with SCALE-Sim-v2 Systolic Array Simulator Backend
This document describes how to use the SCALE-Sim backend within the GainSight profiling framework. This backend utilizes a fork of SCALE-Sim v2 to simulate deep learning accelerators and extract memory access patterns for lifetime analysis.
Description
The SCALE-Sim backend, located in backend/scalesim/, allows GainSight to analyze memory lifetimes based on simulations of systolic array accelerators. It processes the memory access traces generated by SCALE-Sim v2.
Key components:
- backend/scalesim/scale-sim-v2/: Contains the forked SCALE-Sim v2 simulator code.
- backend/scalesim/python/: Contains Python scripts (run.py, parse_lifetimes.py, etc.) to process SCALE-Sim output traces.
- frontend/scale_sim_frontend.py: Frontend script to perform final analysis and generate reports from the processed data.
Usage Workflow
The complete workflow for using the SCALE-Sim backend in GainSight involves three main phases:
1. Generate Memory Access Traces with SCALE-Sim
- Generate memory access traces using the SCALE-Sim simulator.
- You need a topology file describing the network layers (e.g., resnet-50.csv) and a configuration file defining hardware parameters (e.g., gainsight.cfg). Examples might be found in backend/scalesim/scale-sim-v2/configs and backend/scalesim/scale-sim-v2/topologies. Adjust the configuration file (.cfg) for your target hardware (PE array size, SRAM size, etc.).
- Execute the simulation:
```bash
# Navigate to the scale-sim-v2 directory if not already there
cd backend/scalesim/scale-sim-v2
# Run the simulation
python3 scalesim/scale.py -t <path_to_topology_file.csv> -c <path_to_config_file.cfg> -p <path_to_scalesim_output_directory/>
```
- This command generates detailed trace files in the specified output directory, including:
  - COMPUTE_REPORT.csv: Layer-by-layer computational statistics
  - BANDWIDTH_REPORT.csv: Memory bandwidth requirements for each layer
  - Layer-specific trace files showing cycle-accurate memory accesses
2. Process Memory Access Traces with run.py
The run.py script in backend/scalesim/python/ is the primary entry point for processing SCALE-Sim traces into GainSight-compatible data:
This script performs the following operations for each layer in the network:
- Parse Memory Lifetimes: Calls parse_lifetimes.py to analyze the trace files and calculate data lifetimes
  - Processes IFMAP, Filter, and OFMAP traces separately
  - Tracks both reads and writes to calculate memory lifetime (time between write and last read)
  - Outputs detailed CSV files with address-specific lifetime data
- Generate Aggregate Statistics: For each data type (IFMAP, OFMAP, Filter), calculates:
  - Average, median, 90th percentile, and maximum lifetimes
  - Read and write frequencies
  - Total read and write counts
  - Unique address count (memory footprint)
  - Outputs an _aggregate_data.csv file with these statistics
- Create Visualizations: Uses the create_graphs.py module to generate lifetime distribution plots for each layer and data type
The script organizes outputs in directories mirroring the structure of the SCALE-Sim results, with each network layer having its own subdirectory containing:
- <layer_name>_lifetime_data.csv: Raw lifetime data for each memory address
- <layer_name>_aggregate_data.csv: Aggregated memory statistics
- <layer_name>_graph.png: Visualization of lifetime distributions
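The core lifetime computation described above can be sketched as follows. This is a simplified illustration, not the actual code of parse_lifetimes.py: the function and parameter names are hypothetical, and it assumes the traces have already been parsed into (cycle, address) pairs.

```python
from statistics import mean, median

def compute_lifetimes(writes, reads):
    """Lifetime of an address = cycles between its (first) write and its
    last read. `writes` and `reads` are lists of (cycle, address) pairs.
    Addresses written but never read get no lifetime entry."""
    write_cycle = {}
    for cycle, addr in writes:
        write_cycle.setdefault(addr, cycle)  # keep the first write
    last_read = {}
    for cycle, addr in reads:
        last_read[addr] = max(last_read.get(addr, cycle), cycle)
    return {addr: last_read[addr] - write_cycle[addr]
            for addr in write_cycle if addr in last_read}

def aggregate(lifetimes):
    """Aggregate statistics akin to the _aggregate_data.csv columns."""
    values = sorted(lifetimes.values())
    p90 = values[int(0.9 * (len(values) - 1))]
    return {"avg": mean(values), "median": median(values),
            "p90": p90, "max": values[-1], "footprint": len(values)}
```

For example, an address written at cycle 0 and last read at cycle 9 has a lifetime of 9 cycles, and the footprint is simply the number of unique addresses with a recorded lifetime.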
3. Generate Final Analysis with scale_sim_frontend.py
After processing the raw data, run the frontend analysis script to calculate key memory technology metrics:
Details of what the frontend does can be found in the frontend documentation; in summary, it performs the following tasks:
- Import and Combine Layer Data: Concatenates data from all neural network layers
  - Aggregates statistics across all memory types or per memory type (IFMAP, OFMAP, Filter)
- Calculate Memory Technology Metrics: Performs analysis based on different memory cell technologies
  - Reads cell technology parameters from simple_gc_list.json (gain cell retention times)
  - Reads area and power parameters from area_power.json
  - Calculates required refresh rates based on data lifetimes and cell retention times
  - Determines area requirements based on memory footprint and cell size
  - Estimates energy consumption accounting for reads, writes, and refresh operations
- Output Results: Generates a comprehensive JSON report with:
  - Overall workload information (name, size, dataflow style)
  - Data lifetime statistics across all memory subdivisions
  - Memory technology comparisons (SRAM vs. different gain cell variants)
  - Area and energy estimates for different memory technologies
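The refresh, area, and energy calculations above can be illustrated with a simplified model. The formulas and names below are a sketch of the general approach, not the frontend's actual implementation or the real keys of simple_gc_list.json / area_power.json:

```python
import math

def refresh_count(lifetime_cycles, retention_cycles):
    """Refreshes needed to keep a value alive in a gain cell: data that
    outlives the cell's retention time must be refreshed once per
    retention window. SRAM-like cells (retention >= lifetime) need none."""
    if lifetime_cycles <= retention_cycles:
        return 0
    return math.ceil(lifetime_cycles / retention_cycles) - 1

def array_area(footprint_bits, cell_area_um2):
    """Area scales with the memory footprint times the per-cell area."""
    return footprint_bits * cell_area_um2

def total_energy(n_reads, n_writes, n_refreshes, e_read, e_write):
    """A refresh is modeled here as a read followed by a write-back."""
    return (n_reads * e_read + n_writes * e_write
            + n_refreshes * (e_read + e_write))
```

For example, a value with a 100-cycle lifetime in a cell with 40-cycle retention needs two refreshes (at cycles 40 and 80), while the same value in SRAM needs none; this is the kind of trade-off the final report quantifies.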
Output Files and Data Format
SCALE-Sim Raw Output
The SCALE-Sim simulator generates these files in the output directory:
- COMPUTE_REPORT.csv: Layer-wise computational statistics
- BANDWIDTH_REPORT.csv: Memory bandwidth requirements
- DETAILED_ACCESS_REPORT.csv: Access patterns summary
- Per-layer trace directories containing:
- IFMAP_SRAM_TRACE.csv, IFMAP_DRAM_TRACE.csv: Input feature map memory accesses
- FILTER_SRAM_TRACE.csv, FILTER_DRAM_TRACE.csv: Filter/weight memory accesses
- OFMAP_SRAM_TRACE.csv, OFMAP_DRAM_TRACE.csv: Output feature map memory accesses
These traces record cycle-accurate memory accesses when save_disk_space=False is configured in SCALE-Sim.
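A minimal reader for such a trace might look like the sketch below. It assumes each row holds a cycle count in the first column followed by the addresses accessed that cycle, with empty cells for idle lanes; verify this layout against your generated trace files before relying on it:

```python
import csv
import io

def read_trace(fileobj):
    """Yield (cycle, [addresses]) per row of a SCALE-Sim-style trace.

    Assumed row layout: cycle in column 0, one address per remaining
    column; empty cells mean no access on that lane this cycle.
    """
    for row in csv.reader(fileobj):
        cells = [c.strip() for c in row if c.strip()]
        if not cells:
            continue
        cycle = int(float(cells[0]))
        addrs = [int(float(c)) for c in cells[1:]]
        yield cycle, addrs

# Tiny synthetic trace standing in for e.g. IFMAP_SRAM_TRACE.csv
sample = io.StringIO("0,100,101\n1,102,\n")
parsed = list(read_trace(sample))  # [(0, [100, 101]), (1, [102])]
```

Feeding the read and write traces of one data type through a reader like this yields the (cycle, address) streams that the lifetime analysis consumes.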
GainSight Processed Data
The run.py script generates these files for each layer:
- <layer>_lifetime_data.csv: Raw lifetime data with columns:
  - On-chip memory subdivision (ifmap/filter/ofmap)
  - Memory address
  - Lifetime in cycles (time between write and last read)
- <layer>_aggregate_data.csv: Statistical summary with columns:
  - On-chip memory subdivision (ifmap/filter/ofmap)
  - Lifetime statistics (avg, median, 90th%, max)
  - Access frequencies (read/write)
  - Operation counts (reads/writes)
  - Memory footprint (unique addresses)
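These per-layer files are what the frontend later concatenates. Combining them is straightforward, as in this sketch (the column names here are illustrative, not necessarily the exact headers run.py emits):

```python
import csv
import io

def combine_aggregates(files):
    """Concatenate per-layer aggregate rows, tagging each with its layer."""
    combined = []
    for layer, fobj in files.items():
        for row in csv.DictReader(fobj):
            row["layer"] = layer
            combined.append(row)
    return combined

# Two tiny stand-ins for <layer>_aggregate_data.csv files
layer0 = io.StringIO("subdivision,avg_lifetime\nifmap,12.5\n")
layer1 = io.StringIO("subdivision,avg_lifetime\nifmap,20.0\n")
rows = combine_aggregates({"conv1": layer0, "conv2": layer1})
```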
GainSight Frontend Output
scale_sim_frontend.py produces a JSON file with:
- Workload identification (name, size, dataflow style)
- Write frequency analysis per data type
- Refresh count estimations for different memory technologies
- Area projections for different memory technologies
- Energy consumption estimates for different memory technologies
This final output enables architects to make technology-aware decisions about memory hierarchy design for specific AI workloads.