GainSight Analytical Frontend

This document describes the GainSight analytical frontend, which consumes the memory access traces produced by the backend, together with device models, to project the performance of emerging devices used as on-chip memories in existing accelerators. This initial version of GainSight specifically evaluates gain cell random access memory (GCRAM) as a replacement for SRAM for storing short-lived data in the memory hierarchies of GPUs and systolic-array DNN accelerators.

The GainSight frontend analytics are implemented in Python and operationalize the methodology described above by processing memory access traces, correlating them with device models, and generating structured outputs for further analysis and visualization.

For more details, see the source code in the Stanford GitLab repository and the methodology described in the GainSight paper.

Command-Line Usage

The script is designed to be run from the command line. It accepts arguments specifying the paths to profiling results, device model dictionaries, and output files. Example usage:

python gain_cell_frontend.py <profile_results_path> [--simulation] [--sample] [--cluster_path <path>] [--freq_retention_dict_path <path>] [--area_power_dict_path <path>] [--output_path <path>]
  • profile_results_path: Path to the profiling results CSV file (required).
  • --sample: Indicates if sampling was used in the backend simulation (default: False).
  • --cluster_path: Path to cluster data if sampling is enabled.
  • --freq_retention_dict_path: Path to the gain cell frequency-retention dictionary.
  • --area_power_dict_path: Path to the area and power dictionary.
  • --output_path: Path to write the output JSON report.
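The command-line interface above maps naturally onto Python's standard `argparse` module. The sketch below is illustrative, not the actual parser in `gain_cell_frontend.py`; the flag names come from the usage line, while the help strings and defaults are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the command-line interface described above."""
    parser = argparse.ArgumentParser(description="GainSight analytical frontend")
    parser.add_argument("profile_results_path",
                        help="Path to the profiling results CSV file (required)")
    parser.add_argument("--simulation", action="store_true",
                        help="Backend results come from simulation")
    parser.add_argument("--sample", action="store_true",
                        help="Sampling was used in the backend simulation")
    parser.add_argument("--cluster_path",
                        help="Cluster data (needed when --sample is set)")
    parser.add_argument("--freq_retention_dict_path",
                        help="Gain cell frequency-retention dictionary")
    parser.add_argument("--area_power_dict_path",
                        help="Area and power dictionary")
    parser.add_argument("--output_path",
                        help="Path to write the output JSON report")
    return parser


# Example invocation, parsed from an explicit argument list:
args = build_parser().parse_args(
    ["results.csv", "--sample", "--cluster_path", "clusters.csv"])
```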

Output

The script generates a JSON file summarizing all computed statistics, including write frequencies, retention times, refresh counts, area, and energy for each memory device. This output can be used for further analysis or visualization (e.g., in Tableau).
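Because the report is plain JSON, downstream tools can reload it with the standard `json` module. The structure and field names below are illustrative assumptions based on the statistics listed above, not the actual schema emitted by the script.

```python
import json

# Hypothetical report shape: one entry per memory device, with the
# statistics listed above (field names are illustrative assumptions).
report = {
    "Si_GCRAM": {"retention_time_s": 1e-3, "refreshes": 120,
                 "area_mm2": 0.42, "dynamic_energy_j": 3.1e-6},
}

with open("gainsight_report.json", "w") as f:
    json.dump(report, f, indent=2)

# A downstream analysis or visualization step reloads the same file:
with open("gainsight_report.json") as f:
    loaded = json.load(f)
```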

Main Analysis Workflow

Upon execution, the script:

  1. Loads memory access traces and device model data.
  2. Computes key statistics for each memory hierarchy level (L1, L2 in GPUs) or subpartition (input/output/weight scratchpad buffers in systolic arrays): read/write frequencies, data lifetimes, and capacity utilization.

  3. Correlates these statistics with device models to estimate retention times, refresh requirements, area, and energy consumption.

Device Model Projection

The script uses device model data to project the performance of both fully silicon and hybrid Si-ITO gain cell random access memory (GCRAM) as a replacement for SRAM in the memory hierarchy of GPUs and systolic array DNN accelerators. The analysis includes the following:

Retention Time and Refresh Requirements

For each device, the script determines the minimum retention time required to support observed write frequencies, using a frequency-retention dictionary. For each memory block, the number of refresh operations is computed as the integer division of data lifetime by device retention time, summed across all accesses. This follows the formula: $$ R = \sum \left\lfloor \frac{T_k}{t_r} \right\rfloor \cdot B_k $$ where \(T_k\) is the data lifetime, \(t_r\) is the device retention time, and \(B_k\) is the bit-width per data value.
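The sum above can be transcribed directly, assuming per-access records of data lifetime and bit-width are available (the function name and argument layout are illustrative, not the script's actual API):

```python
import math


def refresh_count(lifetimes_s, retention_s, bits_per_value):
    """R = sum_k floor(T_k / t_r) * B_k: total refreshes over all accesses.

    lifetimes_s    -- data lifetime T_k of each tracked value, in seconds
    retention_s    -- device retention time t_r, in seconds
    bits_per_value -- bit-width B_k of each value (same length as lifetimes_s)
    """
    return sum(math.floor(T / retention_s) * B
               for T, B in zip(lifetimes_s, bits_per_value))


# A 32-bit value living 2.5 retention periods needs 2 refreshes per bit:
# refresh_count([2.5], 1.0, [32]) -> 64
```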

Memory Array Area Requirements

The script calculates the required area for each device as: $$ A = A_{cell} \cdot 2^{\lceil \log_2 (N_{addr} \cdot B) \rceil} $$ where \(A_{cell}\) is the bit cell area, \(N_{addr}\) is the number of unique addresses accessed, and \(B\) is the bit-width per data value. The exponent rounds the required capacity up to the next power of two.
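This rounding-to-a-power-of-two step is easy to get wrong, so a minimal sketch may help (the function name is illustrative; area units simply follow those of the cell area):

```python
import math


def array_area(cell_area, n_addr, bits_per_value):
    """A = A_cell * 2**ceil(log2(N_addr * B)): memory array area after
    rounding the required capacity up to the next power of two."""
    total_bits = n_addr * bits_per_value
    return cell_area * 2 ** math.ceil(math.log2(total_bits))


# 1000 addresses x 32 bits = 32000 bits, rounded up to 2**15 = 32768 cells.
```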

Dynamic Energy Consumption

The total active energy is computed by summing the energy for all reads, writes, and refreshes: $$ E_{dynamic} = E_r \cdot (N_r + R) + E_w \cdot (N_w + R) $$ where \(E_r\) and \(E_w\) are the per-bit read/write energies, and \(N_r\), \(N_w\), and \(R\) are the counts of reads, writes, and refreshes, respectively.
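A direct transcription of the energy formula follows; note that each refresh contributes both a read and a write of the cell, which is why \(R\) appears in both terms (the function name and argument layout are illustrative):

```python
def dynamic_energy(e_read, e_write, n_reads, n_writes, n_refresh):
    """E_dynamic = E_r * (N_r + R) + E_w * (N_w + R).

    Each refresh costs one read plus one write, so the refresh count R is
    added to both the read and the write terms (per-bit energies, bit counts).
    """
    return e_read * (n_reads + n_refresh) + e_write * (n_writes + n_refresh)
```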

Key Methods in gain_cell_frontend.py

  • analyze_write_freq(): Calculates write frequencies for the different memory hierarchy levels or subpartitions.
  • analyze_retention(): Determines the minimum device retention time required for observed write frequencies, using device model data.
  • analyze_refresh(): Computes the number of refresh operations required for each device and on-chip memory level, based on data lifetimes and device retention times.
  • analyze_area(): Estimates the physical area required for each device, based on the number of unique addresses and device bit cell area.
  • analyze_energy(): Calculates the total dynamic energy consumed, including the cost of refresh operations, for each device and each memory level.
  • run(): Executes the full analysis pipeline and outputs a structured JSON report.
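The method list suggests a pipeline along the following lines. This skeleton is a hedged sketch of that structure, not the actual class in gain_cell_frontend.py; the per-step statistics are stubbed with placeholder values.

```python
import json


class GainCellFrontend:
    """Illustrative skeleton of the analysis pipeline described above.

    The real methods operate on memory access traces and device models;
    here each step merely records a placeholder statistic.
    """

    def __init__(self, output_path):
        self.output_path = output_path
        self.stats = {}

    def analyze_write_freq(self):
        self.stats["write_freq_hz"] = 1.0e6       # placeholder value

    def analyze_retention(self):
        self.stats["retention_s"] = 1.0e-3        # placeholder value

    def analyze_refresh(self):
        self.stats["refreshes"] = 0               # placeholder value

    def analyze_area(self):
        self.stats["area_um2"] = 0.0              # placeholder value

    def analyze_energy(self):
        self.stats["dynamic_energy_j"] = 0.0      # placeholder value

    def run(self):
        # Execute the steps in the order listed above, then serialize all
        # computed statistics to a structured JSON report.
        for step in (self.analyze_write_freq, self.analyze_retention,
                     self.analyze_refresh, self.analyze_area,
                     self.analyze_energy):
            step()
        with open(self.output_path, "w") as f:
            json.dump(self.stats, f, indent=2)
        return self.stats
```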