GainSight Accel-Sim Backend Python API Documentation
This page, generated with mkdocstrings, documents the Python scripts in gainsight/backend/python-scripts/ that support the Accel-Sim backend.
Please refer to the backend wiki for a summary of implementation details and usage instructions.
Accel-Sim Runner Script
The accel_sim.py script is the main entry point for running the Accel-Sim simulator. It handles command-line arguments, initializes the simulation environment, and manages the execution of the simulation.
Simulation-based, cache line-level analysis of GPU programs using Accel-Sim.
This script runs the Accel-Sim simulator on a given program with the specified arguments, by first generating SASS traces using the NVBit tracer and then running the simulator on the generated traces. The script also provides an option to run the program with kernel sampling using principal kernel selection (PKS) to reduce the number of kernels in the traces.
TODO: Output redirection to log files, error handling, and more detailed documentation.
Usage
python3 accel_sim.py
Pre-requisites
- The program must exist and be executable.
- The program must be run within its desired working directory.
- The Accel-Sim tools must be compiled and available in the environment.
- The PROJECT_ROOT and CUDA_INSTALL_PATH environment variables must be set to the project root directory and the CUDA installation path, respectively.
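The prerequisites above can be checked up front before launching a long simulation. The following is a minimal sketch, not the script's own validation logic; the helper names `missing_env_vars` and `is_executable` are hypothetical and introduced only for illustration.

```python
import os
from typing import List, Mapping, Optional

# Environment variables named in the prerequisites list.
REQUIRED_ENV_VARS = ("PROJECT_ROOT", "CUDA_INSTALL_PATH")


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def is_executable(path: str) -> bool:
    """Return True if path is an existing file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)
```

Calling `missing_env_vars()` with no argument inspects the real environment; passing a dictionary makes the check easy to exercise in isolation.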
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `program` | `str` | The program to profile. | required |
| `args` | `List[str]` | The arguments to pass to the program. | required |
| `--sample` | `bool` | Run the program with kernel sampling using PKS. | `False` |
| `--arch` | `str` | The architecture to simulate (default: "SM90_H100"). | `"SM90_H100"` |
| `--delete` | `bool` | Delete the traces directory. | `False` |
| `--verbose` | `bool` | Store verbose output from Accel-Sim. | `False` |
Output
The log files are saved under $PROJECT_ROOT/logs/
AccelSimRunner
Manages the Accel-Sim workflow: tracing, sampling, simulation, and post-processing.
This class encapsulates the steps required to profile a GPU program using Accel-Sim:
1. Optionally run Nsight Compute and Principal Kernel Selection (PKS) for sampling.
2. Run the NVBit tracer to generate SASS instruction traces.
3. Run the Accel-Sim GPGPU simulator on the generated (or sampled) traces.
4. Run post-processing scripts to analyze simulation output and generate reports.
Attributes:
| Name | Type | Description |
|---|---|---|
| `program` | `str` | Absolute path to the executable program or Python interpreter. |
| `program_args` | `List[str]` | List of arguments to pass to the program. |
| `arch` | `str` | Target GPU architecture for simulation (e.g., "SM90_H100"). |
| `verbose` | `bool` | Flag to enable verbose logging during simulation. |
| `delete` | `bool` | Flag to delete existing trace directory before tracing. |
| `pwd` | `str` | The original working directory from where the script was invoked. |
| `original_cwd` | `str` | The original current working directory (same as pwd). |
| `log_file_name` | `str` | Base name for log files (program_timestamp). |
| `log_file_path` | `str` | Path to the directory where logs are stored. |
| `kernelslist_g` | `str` | Path to the kernelslist.g file within the trace directory. |
| `ncu_file` | `str` | Path to an existing NCU report file to use for sampling. |
Source code in python-scripts/accel_sim.py
__init__(program, args, arch='SM90_H100', verbose=False, delete=False, rename=None, ncu_file=None)
Initialize the AccelSimRunner.
Sets up paths, log naming, and execution environment based on input parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `program` | `str` | Path or name of the program to profile. | required |
| `args` | `List[str]` | Arguments to pass to the program. | required |
| `arch` | `str` | GPU architecture (default: "SM90_H100"). | `'SM90_H100'` |
| `verbose` | `bool` | Enable verbose simulator logging. | `False` |
| `delete` | `bool` | Remove existing traces directory. | `False` |
| `rename` | `str` | Custom base name for log files. | `None` |
| `ncu_file` | `str` | Existing Nsight Compute report for sampling. | `None` |
Returns:
| Type | Description |
|---|---|
| `None` | Initializes runner state and prepares directories. |
Source code in python-scripts/accel_sim.py
run_accel_sim(no_write_allocate=False)
Run the Accel-Sim GPGPU simulator on the generated traces.
Sets up the environment for Accel-Sim, constructs the command line with
appropriate configuration files (including handling the write-allocate setting),
and executes the simulator. It captures and redirects the simulator's output
to different log files (.sim_cache.log, .sim.log, .sim_verbose.log)
based on regex patterns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `no_write_allocate` | `bool` | Disable cache write-allocate configuration. Defaults to False. | `False` |
Returns:
| Type | Description |
|---|---|
| `None` | Executes the simulation and writes log files. |
Source code in python-scripts/accel_sim.py
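The regex-based redirection that run_accel_sim describes can be pictured with a small sketch. This is an illustration under stated assumptions, not the script's code: the two patterns (`CACHE_LINE`, `SUMMARY_LINE`) and the sink keys are hypothetical, and the real patterns are defined in accel_sim.py.

```python
import io
import re
from typing import Dict, Iterable, TextIO

# Hypothetical patterns for illustration only.
CACHE_LINE = re.compile(r"^CACHE_TRACE\b")
SUMMARY_LINE = re.compile(r"^(gpu_|kernel_)")


def route_lines(lines: Iterable[str], sinks: Dict[str, TextIO]) -> None:
    """Write each simulator output line to exactly one sink, chosen by regex."""
    for line in lines:
        if CACHE_LINE.search(line):
            sinks["cache"].write(line)    # stands in for .sim_cache.log
        elif SUMMARY_LINE.search(line):
            sinks["sim"].write(line)      # stands in for .sim.log
        else:
            sinks["verbose"].write(line)  # stands in for .sim_verbose.log


# In-memory sinks make the routing easy to inspect.
demo_sinks = {name: io.StringIO() for name in ("cache", "sim", "verbose")}
route_lines(
    ["CACHE_TRACE l1 hit\n", "gpu_sim_cycle = 10\n", "some debug text\n"],
    demo_sinks,
)
```

In practice the sinks would be open file handles and the lines would be read from the simulator's standard output as it runs.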
run_post_processing(config_file, sample=False)
Post-process simulation outputs into CSV and JSON reports.
First, runs the accel_sim_parser.run_parser function to generate CSV summaries
(*.sim.csv, *.sim_l1.csv, *.sim_l2.csv) from the simulation cache log.
Second, runs the gain_cell_frontend.py script to generate a JSON file
(*.frontend.json) for visualization or further analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config_file` | `str` | Path to GPGPU-Sim configuration used during simulation. | required |
| `sample` | `bool` | Indicates if sampling was used. Defaults to False. | `False` |
Returns:
| Type | Description |
|---|---|
| `None` | Outputs |
Source code in python-scripts/accel_sim.py
run_sampling(sample_delete=False)
Perform kernel sampling using Nsight Compute and Principal Kernel Selection (PKS).
If an NCU report (self.ncu_file) is not provided, it first runs Nsight Compute
to generate one. Then, it runs the PrincipalKernelSelector on the NCU report
to identify representative kernels. It renames the original kernelslist.g
(if it exists) to kernelslist.old.g and generates a new kernelslist.g
containing only the selected kernels. Optionally deletes trace files not
corresponding to the selected kernels.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sample_delete` | `bool` | Remove traces not selected by PKS. Defaults to False. | `False` |
Returns:
| Type | Description |
|---|---|
| `str` | Path to the directory with updated |
Raises:
| Type | Description |
|---|---|
| `Exception` | Propagates errors from NCU or PKS execution. |
Source code in python-scripts/accel_sim.py
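The renaming step in run_sampling (keep the old kernelslist.g as kernelslist.old.g, then write a new list) might look like the sketch below. The function name `replace_kernelslist` is hypothetical, and the one-entry-per-line layout is an assumption about the file format made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def replace_kernelslist(trace_dir: Path, selected: Iterable[str]) -> Path:
    """Back up an existing kernelslist.g, then write the selected entries."""
    current = trace_dir / "kernelslist.g"
    if current.exists():
        # Path.replace overwrites an older backup if one is already present.
        current.replace(trace_dir / "kernelslist.old.g")
    current.write_text("".join(f"{name}\n" for name in selected))
    return current


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "kernelslist.g").write_text("kernel-1.traceg\nkernel-2.traceg\n")
    replace_kernelslist(demo_dir, ["kernel-2.traceg"])
    demo_backup = (demo_dir / "kernelslist.old.g").read_text()
```

Keeping the original list on disk means a sampled run can be reverted by renaming the backup back to kernelslist.g.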
run_tracer(sample=False)
Run the NVBit tracer to generate SASS instruction traces for the program.
Sets up the environment for the NVBit tracer tool, executes the target program
under the tracer, and then runs the post-processing script to convert raw
traces into the format required by Accel-Sim (.traceg files). Optionally
deletes the intermediate .trace files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sample` | `bool` | Enable kernel sampling via environment variable. Defaults to False. | `False` |
Returns:
| Type | Description |
|---|---|
| `str` | Path to the directory containing processed traces. |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the program executable is not found. |
Source code in python-scripts/accel_sim.py
parse_args()
Parse command-line arguments for the Accel-Sim runner script.
Defines and parses arguments related to program execution, simulation options, tracing, sampling, and post-processing.
Returns:
| Type | Description |
|---|---|
| `argparse.Namespace` | An object containing the parsed command-line arguments. Includes attributes like |
Source code in python-scripts/accel_sim.py
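A parser covering the arguments listed in the accel_sim.py parameter table could be sketched as follows. This is an assumption-laden illustration: the actual parse_args may define further options (for tracing, sampling, or post-processing), and the positional layout and the use of `argparse.REMAINDER` here are guesses, not the script's confirmed behavior.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the documented options only (illustrative sketch)."""
    parser = argparse.ArgumentParser(prog="accel_sim.py")
    parser.add_argument("--sample", action="store_true",
                        help="Run the program with kernel sampling using PKS.")
    parser.add_argument("--arch", default="SM90_H100",
                        help="The architecture to simulate.")
    parser.add_argument("--delete", action="store_true",
                        help="Delete the traces directory.")
    parser.add_argument("--verbose", action="store_true",
                        help="Store verbose output from Accel-Sim.")
    parser.add_argument("program", help="The program to profile.")
    # Everything after the program name is passed through to the program.
    parser.add_argument("args", nargs=argparse.REMAINDER,
                        help="The arguments to pass to the program.")
    return parser


demo_ns = build_parser().parse_args(["--sample", "./app", "input.txt", "4"])
```

Placing the runner's own flags before the program name keeps them from being confused with the profiled program's arguments.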
Sampling via Principal Kernel Selection (PKS)
The pks.py script implements the Principal Kernel Selection (PKS) algorithm for sampling in the Accel-Sim simulator. It provides functions to select a subset of kernels based on their execution characteristics and performance metrics.
Principal Kernel Selection implementation for CUDA kernel profiling.
This module implements the Principal Kernel Selection (PKS) algorithm, which uses PCA and K-means clustering to identify representative CUDA kernels from an NVIDIA Nsight Compute report. This helps reduce simulation time by focusing on a smaller set of representative kernels.
The functionality of this module is an adaptation and reproduction of the following work: Cesar Avalos Baddouh, Mahmoud Khairy, Roland N. Green, Mathias Payer, and Timothy G. Rogers. 2021. Principal Kernel Analysis: A Tractable Methodology to Simulate Scaled GPU Workloads. In MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '21). Association for Computing Machinery, New York, NY, USA, 724–737. https://doi.org/10.1145/3466752.3480100
TODO: Implement real-time monitoring of read/write frequencies, bursts, and lifetime trends.
Typical usage

    pks = PrincipalKernelSelector(
        ncu_input_file="path/to/report.ncu-rep",
        output_dir="path/to/traces_dir"
    )
    pks.run_pks()
PrincipalKernelSelector
Analyzes CUDA kernel metrics to identify representative kernels.
Implements the Principal Kernel Selection (PKS) algorithm using PCA and K-means clustering to select representative CUDA kernels for simulation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `ncu_input_file` | `str` | Path to the NCU report file to analyze. | required |
| `output_dir` | `str` | Directory for saving output files. Defaults to script directory. | `None` |
Returns:
| Type | Description |
|---|---|
| `None` | |
Attributes:
| Name | Type | Description |
|---|---|---|
| `metrics` | `List[str]` | List of collected NCU metrics for analysis. |
| `ncu_data` | `DataFrame` | DataFrame containing raw kernel metrics. |
| `pca_df` | `DataFrame` | DataFrame with PCA-transformed kernel data. |
| `cluster_count` | `int` | Optimal number of clusters determined. |
| `kernel_df` | `DataFrame` | DataFrame with kernel details and cluster assignments. |
| `cluster_df` | `DataFrame` | DataFrame with cluster details and centroids. |
| `output_dir` | `str` | Directory for saving output files. |
| `bypass` | `bool` | Flag indicating if PKS should be bypassed. |
| `sum_lts_t_sectors_op_write` | `float` | Sum of 'lts__t_sectors_op_write.sum' for all kernels. |
Source code in python-scripts/pks.py
__init__(ncu_input_file, output_dir=None)
Initialize the Principal Kernel Selector with an NCU report.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `ncu_input_file` | `str` | Path to the NCU report file to analyze. | required |
| `output_dir` | `str` | Directory for saving output files. Should be the directory containing trace files (kernel-1.traceg, etc.). Defaults to the directory of this script. | `None` |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the input file contains fewer than 2 kernels (unless bypass is triggered). |
| `FileNotFoundError` | If the metrics_list.json file is not found. |
| `ImportError` | If the ncu_report module cannot be imported. |
Source code in python-scripts/pks.py
generate_kernels_csv()
Generate CSV files containing the kernel and cluster data.
Creates two CSV files in the output_dir:
- kernels.csv: Contains information about all kernels, including their original ID, name, assigned cluster ID, and the ID of the centroid representing their cluster.
- clusters.csv: Contains information about each cluster, including its ID, the number of kernels it contains, and the ID and name of its representative centroid kernel.
Returns:
| Type | Description |
|---|---|
| `None` | Writes |
Source code in python-scripts/pks.py
generate_kernelslist(delete=False)
Generate a list file containing the selected representative kernels.
Creates a kernelslist.g file in the output_dir. This file lists the
trace file names (e.g., kernel-1.traceg, kernel-5.traceg) corresponding
to the selected centroid kernels. Optionally, deletes trace files in the
output_dir that do not correspond to the selected centroids.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `delete` | `bool` | If True, deletes trace files ( | `False` |
Returns:
| Type | Description |
|---|---|
| `None` | Writes |
Source code in python-scripts/pks.py
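The behavior described for generate_kernelslist (list the centroid traces, optionally delete the rest) can be sketched with the standard library alone. The function `write_kernelslist` and its integer-ID input are hypothetical; only the file names kernelslist.g and kernel-N.traceg come from the description above.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def write_kernelslist(output_dir: Path, centroid_ids: Iterable[int],
                      delete: bool = False) -> List[str]:
    """Write kernelslist.g for the centroid kernels; optionally prune other traces."""
    keep = [f"kernel-{kid}.traceg" for kid in sorted(set(centroid_ids))]
    (output_dir / "kernelslist.g").write_text("".join(f"{n}\n" for n in keep))
    if delete:
        # Materialize the listing first so unlinking does not disturb iteration.
        for trace in list(output_dir.glob("kernel-*.traceg")):
            if trace.name not in keep:
                trace.unlink()
    return keep


# Demonstration: five traces, two of them selected as centroids.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    for i in range(1, 6):
        (demo_dir / f"kernel-{i}.traceg").write_text("")
    demo_kept = write_kernelslist(demo_dir, [5, 1], delete=True)
```

Deleting unselected traces is irreversible, which is presumably why the option defaults to False.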
kmeans(data, n_clusters=3)
Perform K-means clustering with a specified number of clusters.
Calculates cluster assignments, centroids, and a custom score based on the difference between the sum of 'lts__t_sectors_op_write.sum' for centroid-representative kernels (weighted by cluster size) and the total sum for all kernels.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `DataFrame` | Data to cluster (typically PCA-transformed). | required |
| `n_clusters` | `int` | Number of clusters to form. Defaults to 3. | `3` |
Returns:
| Type | Description |
|---|---|
| `Tuple[ndarray, ndarray, float]` | A tuple containing: the array of cluster labels for each data point (`np.ndarray`); the array of cluster centers, i.e. centroids (`np.ndarray`); and the custom score representing the relative error in 'lts__t_sectors_op_write.sum' (`float`). |
Source code in python-scripts/pks.py
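The custom score returned by kmeans compares a size-weighted sum over the representative kernels against the total over all kernels. A minimal sketch of one plausible form follows; the function name `write_sector_error` is hypothetical, and taking the absolute difference and dividing by the total are assumptions, since the exact formula is defined in pks.py.

```python
from typing import Sequence


def write_sector_error(rep_values: Sequence[float],
                       cluster_sizes: Sequence[int],
                       total: float) -> float:
    """Relative error between the cluster-weighted estimate and the true total.

    rep_values[i] is the 'lts__t_sectors_op_write.sum' value of cluster i's
    representative kernel; cluster_sizes[i] is the number of kernels in it.
    """
    if len(rep_values) != len(cluster_sizes):
        raise ValueError("one representative value is needed per cluster")
    if total == 0:
        raise ValueError("total must be non-zero to form a relative error")
    estimate = sum(value * size for value, size in zip(rep_values, cluster_sizes))
    return abs(estimate - total) / abs(total)


# Two clusters: 3 kernels represented by 10 sectors, 2 kernels by 100 sectors.
demo_error = write_sector_error([10.0, 100.0], [3, 2], total=240.0)
```

A score near zero means that simulating only the representatives and scaling by cluster size would reproduce the workload's total L2 write traffic closely.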
kmeans_scan(data, lower_bound=2, upper_bound=20)
Find optimal number of clusters by scanning a range of values.
Tries K-means with different numbers of clusters (from lower_bound to upper_bound).
Selects the number of clusters corresponding to the lowest custom score calculated by kmeans.
If multiple cluster counts yield scores within 5% of the minimum, the one with the
fewest clusters is chosen. Updates class attributes cluster_count, kernel_df,
and cluster_df with the results of the best clustering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `DataFrame` | Data to cluster (typically PCA-transformed). | required |
| `lower_bound` | `int` | Minimum number of clusters to try. Defaults to 2. | `2` |
| `upper_bound` | `int` | Maximum number of clusters to try. Defaults to 20. | `20` |
Returns:
| Type | Description |
|---|---|
| `None` | Updates class attributes with clustering results. |
Source code in python-scripts/pks.py
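The selection rule in kmeans_scan (lowest score, with ties inside a 5% band resolved toward fewer clusters) can be sketched as below. The helper `pick_cluster_count` is hypothetical, and reading "within 5% of the minimum" as `score <= 1.05 * minimum` is an assumption about the intended comparison.

```python
from typing import Mapping


def pick_cluster_count(scores: Mapping[int, float], tolerance: float = 0.05) -> int:
    """Return the smallest cluster count whose score is within tolerance of the best."""
    if not scores:
        raise ValueError("at least one (cluster count, score) pair is required")
    best = min(scores.values())
    near_best = [k for k, score in scores.items() if score <= best * (1 + tolerance)]
    return min(near_best)


# k=4 has the lowest score, but k=3 is within 5% of it and uses fewer clusters.
demo_choice = pick_cluster_count({2: 0.30, 3: 0.102, 4: 0.100, 5: 0.20})
```

Preferring the smaller count trades a marginally higher error for fewer kernels to simulate.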
pca(data, var_threshold=0.95)
Perform Principal Component Analysis on kernel metrics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `DataFrame` | DataFrame containing kernel metrics. | required |
| `var_threshold` | `float` | Variance threshold for PCA dimensionality reduction. Defaults to 0.95. | `0.95` |
Returns:
| Type | Description |
|---|---|
| `Tuple[DataFrame, ndarray]` | A tuple containing: the DataFrame with transformed data, columns named 'PC0', 'PC1', etc. (`pd.DataFrame`); and the raw NumPy array of the transformed data (`np.ndarray`). |
Source code in python-scripts/pks.py
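A variance threshold such as var_threshold=0.95 is commonly interpreted as "keep the fewest principal components whose explained-variance ratios sum to at least the threshold". The sketch below shows only that counting step in plain Python; it assumes the ratios are already computed and sorted in descending order, and both helper names are hypothetical. Whether pks.py applies the threshold in exactly this way is not confirmed by this page.

```python
from itertools import accumulate
from typing import List, Sequence


def components_for_variance(ratios: Sequence[float],
                            var_threshold: float = 0.95) -> int:
    """Smallest number of leading components whose cumulative ratio meets the threshold."""
    if not 0 < var_threshold <= 1:
        raise ValueError("var_threshold must be in (0, 1]")
    for count, cumulative in enumerate(accumulate(ratios), start=1):
        if cumulative >= var_threshold:
            return count
    return len(ratios)  # threshold never reached: keep every component


def pc_column_names(count: int) -> List[str]:
    """Column names in the 'PC0', 'PC1', ... style used by the returned DataFrame."""
    return [f"PC{i}" for i in range(count)]


demo_count = components_for_variance([0.6, 0.25, 0.12, 0.03])
demo_columns = pc_column_names(demo_count)
```

With the ratios above, the first two components explain 85% of the variance, so a third is needed to pass 95%.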
run_pks(output_file=None, delete=False)
Execute the complete Principal Kernel Selection workflow.
Performs PCA on the kernel metrics, finds the optimal number of clusters
using K-means scanning, selects representative centroid kernels for each cluster,
and generates output files (kernelslist.g, kernels.csv, clusters.csv)
summarizing the results. If the number of kernels is small (<= 20),
it bypasses the PCA and clustering steps, treating each kernel as its own cluster.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `output_file` | `str` | Path to the output | `None` |
| `delete` | `bool` | If True, deletes non-centroid trace files during the | `False` |
Returns:
| Type | Description |
|---|---|
| `None` | Generates output files with the results of the analysis. |
Source code in python-scripts/pks.py
select_centroid()
Select representative kernels by finding points closest to cluster centers.
For each cluster identified by K-means, this method finds the kernel
(data point) in the PCA space that is closest to the cluster's center
(centroid) using Euclidean distance. It marks this kernel as the
representative for that cluster. Updates both kernel_df and cluster_df
with centroid information (Kernel ID and Name).
Returns:
| Type | Description |
|---|---|
| `None` | Updates |
Source code in python-scripts/pks.py
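The nearest-to-center selection that select_centroid describes can be sketched without any numerical library. This is an illustration only: the function `closest_to_centers` and its plain-sequence inputs are hypothetical, whereas the real method operates on kernel_df and cluster_df.

```python
import math
from typing import Dict, Sequence


def closest_to_centers(points: Sequence[Sequence[float]],
                       labels: Sequence[int],
                       centers: Sequence[Sequence[float]]) -> Dict[int, int]:
    """Map each cluster label to the index of the point nearest its center."""
    best_index: Dict[int, int] = {}
    best_dist: Dict[int, float] = {}
    for idx, (point, label) in enumerate(zip(points, labels)):
        dist = math.dist(point, centers[label])  # Euclidean distance
        if label not in best_index or dist < best_dist[label]:
            best_index[label] = idx
            best_dist[label] = dist
    return best_index


# Five kernels in a 2-D PCA space, split into two clusters.
demo_points = [(0, 0), (1, 1), (10, 10), (11, 11), (12, 12)]
demo_labels = [0, 0, 1, 1, 1]
demo_centers = [(0.4, 0.4), (11, 11)]
demo_reps = closest_to_centers(demo_points, demo_labels, demo_centers)
```

A K-means center is a mean and generally does not coincide with any real kernel, which is why an actual data point has to be chosen as the representative.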
Sampling Utilities: NVIDIA Nsight Compute Coarse-Grained Profiling
The two scripts nsight_nvbit.py and ncu_exec.py are used for coarse-grained profiling of GPU kernels using NVIDIA Nsight Compute.
The profile results are used in the sampling process to select the most representative kernels for simulation.
Class to run Nsight Compute and NVBit on a given program with the specified arguments.
Provides methods to execute NVIDIA Nsight Compute (ncu) and a custom NVBit tool for profiling GPU applications. It handles environment setup, command construction, and log file management.
The ability to run this script as a standalone program is deprecated. Please use ncu_exec.py instead.
NsightNVBitRunner
A runner class to execute Nsight Compute and NVBit profiling tools.
Manages the configuration and execution of Nsight Compute (ncu) and a custom
NVBit tool (ncu-nvbit.so) on a specified target program. It sets up
necessary environment variables, constructs command lines, runs the tools,
and manages output log files.
Attributes:
| Name | Type | Description |
|---|---|---|
| `program` | `str` | Path to the executable program or interpreter. |
| `program_args` | `List[str]` | Arguments for the target program. |
| `log_file_name` | `str` | Base name for generated log files (e.g., program_timestamp). |
| `log_file_path` | `str` | Directory path where log files are stored. |
| `mangled` | `bool` | Flag indicating whether to use mangled kernel names (primarily for NVBit). |
Source code in python-scripts/nsight_nvbit.py
__init__()
Initialize the NsightNVBitRunner with default None values.
Source code in python-scripts/nsight_nvbit.py
get_metrics()
Load and return the list of Nsight Compute metrics from a JSON file.
Reads metrics specified in 'metrics_list.json' located in the same directory as this script. The JSON file should contain a dictionary where values are lists of metric names.
Returns:
| Type | Description |
|---|---|
| `str` | A comma-separated string of all metric names found in the JSON file. |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If 'metrics_list.json' is not found. |
| `ValueError` | If 'metrics_list.json' is empty or contains no metrics. |
| `JSONDecodeError` | If 'metrics_list.json' is not valid JSON. |
Source code in python-scripts/nsight_nvbit.py
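The loading behavior described for get_metrics (a JSON dictionary whose values are lists of metric names, flattened into one comma-separated string) can be sketched as follows. The function `load_metrics`, the explicit path argument, and the group key "l2" in the demonstration are hypothetical; the real method reads metrics_list.json next to the script.

```python
import json
import tempfile
from pathlib import Path


def load_metrics(json_path: Path) -> str:
    """Flatten {group: [metric, ...]} into 'metric,metric,...'."""
    # open() raises FileNotFoundError if the file is missing;
    # json.load() raises json.JSONDecodeError if the content is not valid JSON.
    with open(json_path, encoding="utf-8") as handle:
        groups = json.load(handle)
    metrics = [name for names in groups.values() for name in names]
    if not metrics:
        raise ValueError(f"{json_path} contains no metrics")
    return ",".join(metrics)


# Demonstration with a temporary metrics file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "metrics_list.json"
    demo_path.write_text(json.dumps(
        {"l2": ["lts__t_sectors_op_read.sum", "lts__t_sectors_op_write.sum"]}
    ))
    demo_metrics = load_metrics(demo_path)
```

A single comma-separated string is the form that a command-line metrics option typically expects, so the result can be passed on without further processing.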
init_from_params(program_name, program_args, log_file_name, log_file_path, mangled=True)
Initialize the runner with explicitly provided parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `program_name` | `str` | The program executable or interpreter name/path. | required |
| `program_args` | `List[str]` | List of arguments for the program. | required |
| `log_file_name` | `str` | Base name for log files. | required |
| `log_file_path` | `str` | Directory path for log files. | required |
| `mangled` | `bool` | Whether to use mangled kernel names. Defaults to True. | `True` |
Returns:
| Type | Description |
|---|---|
| `None` | |
Source code in python-scripts/nsight_nvbit.py
init_from_program(program_name, program_args, mangled=True)
Initialize the runner based on program name and arguments.
Determines the actual executable (handling Python scripts), generates log file names and paths based on the program name and timestamp, and prints a configuration summary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `program_name` | `str` | The program executable or script name/path. | required |
| `program_args` | `List[str]` | List of arguments for the program/script. | required |
| `mangled` | `bool` | Whether to use mangled kernel names. Defaults to True. | `True` |
Returns:
| Type | Description |
|---|---|
| `None` | |
Source code in python-scripts/nsight_nvbit.py
run_ncu(dry_run=True)
Run NVIDIA Nsight Compute (ncu) on the target program.
Constructs the ncu command line with specified metrics (from get_metrics),
configuration flags (e.g., --force-overwrite, --replay-mode), and output
options (--export). Sets the TMPDIR environment variable. If dry_run is False,
it executes ncu and captures its command-line output to a .exec_ncu.log file.
The main report is saved to a .ncu-rep file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dry_run` | `bool` | If True, print the command without executing it. Defaults to True. | `True` |
Returns:
| Type | Description |
|---|---|
| `str` | The full path to the generated |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the |
| `CalledProcessError` | If the |
Source code in python-scripts/nsight_nvbit.py
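The command construction and dry-run behavior of run_ncu can be pictured with the sketch below. The flags --force-overwrite, --replay-mode, and --export are the ones named above; passing the metric string through --metrics, the "kernel" replay mode, the argument order, and both function names are assumptions made for illustration rather than the script's actual command line.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_ncu_command(ncu_path: str, report_base: str, metrics: str,
                      program: str, program_args: Sequence[str],
                      replay_mode: str = "kernel") -> List[str]:
    """Assemble an ncu invocation as an argument list (no shell involved)."""
    return [
        ncu_path,
        "--force-overwrite",
        "--replay-mode", replay_mode,
        "--metrics", metrics,
        "--export", report_base,
        program, *program_args,
    ]


def run_or_print(cmd: Sequence[str], log_path: str,
                 dry_run: bool = True) -> Optional[int]:
    """Print the command when dry_run is True; otherwise run it and log its output."""
    if dry_run:
        print(shlex.join(cmd))
        return None
    with open(log_path, "w", encoding="utf-8") as log:
        # check=True raises subprocess.CalledProcessError on a non-zero exit.
        done = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=True)
    return done.returncode


demo_cmd = build_ncu_command("ncu", "/tmp/logs/app_run", "lts__t_sectors_op_write.sum",
                             "./app", ["-n", "4"])
run_or_print(demo_cmd, "/tmp/logs/app_run.exec_ncu.log")  # dry run: prints only
```

Building the command as a list avoids shell quoting problems when program arguments contain spaces.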
run_nvbit(dry_run=True)
Run the custom NVBit tool (ncu-nvbit.so) on the target program.
Sets up the environment (CUDA_INJECTION64_PATH, etc.), constructs the
command, and executes the program under the NVBit tool. If dry_run is False,
it captures the output to a .nvbit log file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dry_run` | `bool` | If True, print the command without executing it. Defaults to True. | `True` |
Returns:
| Type | Description |
|---|---|
| `str` | The full path to the generated |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the |
| `CalledProcessError` | If the NVBit execution fails (when dry_run=False). |
Source code in python-scripts/nsight_nvbit.py
Execution-based, kernel-level, runtime cache analysis using Nsight Compute and NVBit.
This script profiles a specified program using NVIDIA Nsight Compute CLI and a custom NVBit tool. It runs both tools on the target program, parses their outputs, computes derived cache metrics (like lifetime, frequency, utilization), and generates CSV reports and optional plots.
Usage
python3 ncu_exec.py [--dry-run] [--mangled] [--histogram]
Pre-requisites
- The program must exist and be executable.
- The PROJECT_ROOT and CUDA_INSTALL_PATH environment variables must be set.
- The Nsight Compute CLI must be installed and available in CUDA_INSTALL_PATH.
- The NVBit library (ncu-nvbit.so) must be compiled and available in $PROJECT_ROOT/backend/ncu-nvbit.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `program` | `str` | The program to profile. | required |
| `args` | `List[str]` | Arguments to pass to the program. | required |
| `--dry-run` | `bool` | Print commands without running tools. Defaults to False. | `False` |
| `--mangled` | `bool` | Use mangled kernel names in output. Defaults to False. | `False` |
| `--histogram` | `bool` | Generate plots of computed metrics. Defaults to False. | `False` |
Output
Log files are saved under $PROJECT_ROOT/logs/<program_name>/<program_name>_<timestamp>:
- .ncu-rep: Raw Nsight Compute report file.
- .nvbit: Raw NVBit log file.
- .exec_ncu.log: Nsight Compute CLI command output log.
- .exec_cmd.log: Command used to run the script.
- .exec.kernels: (If generated by ncu) Kernel name mapping file.
- .exec.csv: Computed metrics for each kernel (CSV format).
- .exec_l1.png: (Optional) L1 cache metrics plots.
- .exec_l2.png: (Optional) L2 cache metrics plots.
compute_kernel_metrics(kernel_id, kernel_action, unique_sector_counts)
Compute derived cache metrics for a single kernel based on NCU and NVBit data.
Calculates metrics like cache active time, read/write frequency, lifetime, and
utilization for both L1 and L2 caches using raw counter values from the Nsight
Compute report (kernel_action) and unique sector counts from NVBit data
(unique_sector_counts).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| kernel_id | int | The ID (index) of the kernel being processed. | required |
| kernel_action | IAction | The Nsight Compute action object containing metrics for this kernel. | required |
| unique_sector_counts | List[int] | A list containing | required |

Returns:
| Type | Description |
|---|---|
| Optional[pd.DataFrame] | A one-row DataFrame containing the computed metrics for the kernel, or None if the kernel execution time or relevant cache access times are zero. Columns include "Kernel ID", "Function Name", "Total Cycles", "Kernel Execution Time", "L1 Active Time", "L1 Read Frequency", "L1 Write Frequency", "L1 Lifetime", "L1 Utilization", "L2 Active Time", "L2 Read Frequency", "L2 Write Frequency", "L2 Lifetime", "L2 Utilization". Times are in microseconds, frequencies in MHz. |
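The unit convention above (times in microseconds, frequencies in MHz) can be illustrated with a minimal sketch. The helper name `frequency_mhz` and the formula (access count divided by active time) are assumptions made for illustration and are not taken from ncu_exec.py; only the zero-time guard mirrors the documented None return.

```python
from typing import Optional


def frequency_mhz(access_count: int, active_time_us: float) -> Optional[float]:
    """Hypothetical helper: accesses per microsecond, which is numerically MHz."""
    if active_time_us == 0:
        # Mirrors the documented behaviour of returning None when a time is zero.
        return None
    # 1 access per microsecond == 1e6 accesses per second == 1 MHz.
    return access_count / active_time_us


print(frequency_mhz(5000, 250.0))  # 5000 accesses over 250 us
print(frequency_mhz(5000, 0.0))    # zero active time, no frequency can be derived
```

Because one event per microsecond is one million events per second, no extra scaling factor is needed when the time is already expressed in microseconds.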
Source code in python-scripts/ncu_exec.py
get_correspondence_table(results, mangled)
Extract and print a mapping between kernel IDs and kernel names.
Creates a DataFrame containing either the mangled or unmangled kernel names
based on the mangled flag, alongside their corresponding kernel IDs. Prints
this table to the console.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| results | DataFrame | DataFrame containing computed metrics and kernel names (Mangled Names, Unmangled Names columns). | required |
| mangled | bool | If True, use 'Mangled Names' column; otherwise, use 'Unmangled Names'. | required |

Returns:
| Type | Description |
|---|---|
| pd.DataFrame | A DataFrame with two columns: 'Kernel ID' and either 'Mangled Names' or 'Unmangled Names'. |
Source code in python-scripts/ncu_exec.py
parse_arguments()
Parse command-line arguments for the NCU/NVBit execution script.
Defines and parses arguments for specifying the target program, its arguments, and options like dry run, mangled names, and histogram generation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| None | | Uses defined CLI options. | required |

Returns:
| Type | Description |
|---|---|
| argparse.Namespace | Parsed arguments including: program (str): Program to profile; args (List[str]): Arguments to the program; dry_run (bool): Print commands without running tools; mangled (bool): Use mangled kernel names; histogram (bool): Generate metric plots. |
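A parser producing the namespace described above could look roughly like the following sketch. The function name `build_parser`, the help strings, and the use of `argparse.REMAINDER` for the program's arguments are assumptions; the actual definitions in ncu_exec.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser with the options documented for ncu_exec.py."""
    parser = argparse.ArgumentParser(
        description="Profile a program with Nsight Compute and NVBit (sketch)."
    )
    parser.add_argument("program", help="The program to profile.")
    # Assumption: everything after the program name is forwarded to it.
    parser.add_argument("args", nargs=argparse.REMAINDER,
                        help="Arguments to pass to the program.")
    parser.add_argument("--dry-run", action="store_true",
                        help="Print commands without running tools.")
    parser.add_argument("--mangled", action="store_true",
                        help="Use mangled kernel names in output.")
    parser.add_argument("--histogram", action="store_true",
                        help="Generate plots of computed metrics.")
    return parser


# "./my_app", "input.bin" and "4" are placeholder values.
ns = build_parser().parse_args(["--dry-run", "./my_app", "input.bin", "4"])
print(ns.program, ns.args, ns.dry_run, ns.mangled, ns.histogram)
```

Note that argparse converts `--dry-run` into the attribute `dry_run`, which matches the field names listed in the Returns table.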
Source code in python-scripts/ncu_exec.py
plot_l1_metrics(results, basename)
Create and save plots summarizing L1 cache metrics across kernels.
Generates a PNG image file containing bar plots for L1 Utilization, L1 Lifetime, and L1 Refreshes (calculated based on a fixed retention time).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| results | DataFrame | DataFrame containing computed metrics for all kernels (output of | required |
| basename | str | The base path and filename prefix for the output PNG file (e.g., | required |

Returns:
| Name | Type | Description |
|---|---|---|
| None | | Saves the plot to a file. |
Source code in python-scripts/ncu_exec.py
plot_l2_metrics(results, basename)
Create and save plots summarizing L2 cache metrics across kernels.
Generates a PNG image file containing bar plots for L2 Utilization and L2 Lifetime. Adds a horizontal line indicating the assumed retention time on the Lifetime plot.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| results | DataFrame | DataFrame containing computed metrics for all kernels (output of | required |
| basename | str | The base path and filename prefix for the output PNG file (e.g., | required |

Returns:
| Name | Type | Description |
|---|---|---|
| None | | Saves the plot to a file. |
Source code in python-scripts/ncu_exec.py
read_nvbit_data(nvbit_input_file, kernel_count)
Read and parse the NVBit log file to extract unique sector counts and kernel names.
Parses the .nvbit output file generated by the custom NVBit tool. It extracts
the number of unique L1 and L2 cache sectors accessed by each kernel, as well
as the mangled and unmangled names for each kernel ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| nvbit_input_file | str | Path to the NVBit log file ( | required |
| kernel_count | int | The expected number of kernels (obtained from NCU report). | required |

Returns:
| Type | Description |
|---|---|
| Tuple[List[List[int]], List[str], List[str]] | A tuple containing: List[List[int]]: A list where each inner list contains |

Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If |
| AssertionError | If the kernel count reported in the NVBit log does not match |
Source code in python-scripts/ncu_exec.py
Parser for Accel-Sim Trace Files
The accel_sim_parser.py script is responsible for parsing the trace files generated by the Accel-Sim simulator. It extracts relevant information about data lifetime, read and write operations, and other performance metrics from the trace files.
Module for parsing Accel-Sim simulation logs and generating cache lifetime statistics.
This module provides classes and functions to process GPGPU-Sim simulation logs (specifically the cache access logs generated when running Accel-Sim) to calculate cache line lifetime metrics. It parses log lines, tracks cache line residency, computes lifetimes, and aggregates statistics per kernel and across the entire run. It outputs results in CSV format.
LifetimeType
Bases: object
Represents the lifetime of a cache line (sector) at a specific address.
Stores the start and end simulation cycles for a cache line's residency.
Attributes:
| Name | Type | Description |
|---|---|---|
| address | int | The memory address of the cache line. |
| start | int | The simulation cycle when the cache line entered the cache. |
| end | Optional[int] | The simulation cycle when the cache line was evicted or the last cycle it was accessed before the simulation ended. Initially None. |
Source code in python-scripts/accel_sim_parser.py
__dict__()
Return a dictionary representation of the lifetime entry.
Returns:
| Name | Type | Description |
|---|---|---|
| dict | | A dictionary with 'address' (hex), 'start', and 'end' keys. |
Source code in python-scripts/accel_sim_parser.py
__init__(address, start, end)
Initialize a LifetimeType object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| address | int | The memory address. | required |
| start | int | The start cycle. | required |
| end | Optional[int] | The end cycle (can be None initially). | required |
Source code in python-scripts/accel_sim_parser.py
calculate_lifetime()
Calculate the duration of the cache line lifetime in cycles.
Returns:
| Name | Type | Description |
|---|---|---|
| int | | The difference between the end and start cycles. |

Raises:
| Type | Description |
|---|---|
| TypeError | If |
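The behaviour of a lifetime entry can be sketched with a small stand-in class. `Lifetime` and `as_dict` below are illustrative names, not the actual LifetimeType implementation, and the sketch assumes the TypeError arises from subtracting while `end` is still None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lifetime:
    """Illustrative stand-in for LifetimeType (not the actual class)."""
    address: int
    start: int
    end: Optional[int]

    def as_dict(self) -> dict:
        # The address is rendered in hex, as in the documented dictionary form.
        return {"address": hex(self.address), "start": self.start, "end": self.end}

    def calculate_lifetime(self) -> int:
        # If end is still None, the subtraction itself raises TypeError.
        return self.end - self.start


entry = Lifetime(address=0x7F00, start=120, end=None)
entry.end = 450  # the line was last accessed (or evicted) at cycle 450
print(entry.as_dict())
print(entry.calculate_lifetime())
```

Leaving `end` as None until the lifetime is closed makes an unfinished entry easy to detect, at the cost of requiring callers to finalize entries before measuring them.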
Source code in python-scripts/accel_sim_parser.py
SimulationParser
Parses Accel-Sim cache log lines for a single kernel to calculate lifetime statistics.
Processes log lines associated with one kernel launch, tracks cache line entries and exits based on load/store operations and cache status (hit/miss), considering the configured cache policies (write-allocate, write-back). It calculates the lifetime for each cache line instance and aggregates statistics.
Attributes:
| Name | Type | Description |
|---|---|---|
| GPU_FREQ | int | GPU frequency in MHz. |
| CYCLE_TIME | float | GPU cycle time in nanoseconds. |
| log_path | str | Path to the log directory. |
| log_file | str | Base name of the log file. |
| kernel_name | str | Name of the kernel being processed. |
| kernel_id | int | ID of the kernel being processed. |
| sim_cycles | int | Simulation cycles for this kernel. |
| sim_insn | int | Instructions executed by this kernel. |
| ipc | float | Instructions per cycle for this kernel. |
| total_sim_cycles | int | Total simulation cycles up to this kernel. |
| total_sim_insn | int | Total instructions executed up to this kernel. |
| log_lines | list[str] | Log lines from |
| sector_size | int | Cache line size in bytes. |
| l1_size | int | L1 cache size in bytes. |
| l1_cache_lines | float | Number of cache lines in L1. |
| l2_size | int | L2 cache size in bytes. |
| l2_cache_lines | float | Number of cache lines in L2. |
| l1_lifetimes | list[LifetimeType] | List of completed L1 lifetime objects. |
| l2_lifetimes | list[LifetimeType] | List of completed L2 lifetime objects. |
| l1_most_recent_read | dict[int, int] | Maps L1 address to the cycle of its most recent read hit. |
| l2_most_recent_read | dict[int, int] | Maps L2 address to the cycle of its most recent read hit. |
| l1_current_lifetime_index | dict[int, int] | Maps L1 address to the index of its currently active lifetime in the internal list. |
| l2_current_lifetime_index | dict[int, int] | Maps L2 address to the index of its currently active lifetime in the internal list. |
| l1_lifetime_cycles | ndarray | Array of completed L1 lifetimes in cycles. |
| l1_lifetime_ns | ndarray | Array of completed L1 lifetimes in nanoseconds. |
| l2_lifetime_cycles | ndarray | Array of completed L2 lifetimes in cycles. |
| l2_lifetime_ns | ndarray | Array of completed L2 lifetimes in nanoseconds. |
| l1_read_count | int | Total L1 read operations. |
| l1_write_count | int | Total L1 write operations. |
| l2_read_count | int | Total L2 read operations. |
| l2_write_count | int | Total L2 write operations. |
| l1_read_cycles | list[int] | List of unique cycles with L1 reads. |
| l1_write_cycles | list[int] | List of unique cycles with L1 writes. |
| l2_read_cycles | list[int] | List of unique cycles with L2 reads. |
| l2_write_cycles | list[int] | List of unique cycles with L2 writes. |
| l1_read_cycle_count | int | Count of unique cycles with L1 reads. |
| l1_write_cycle_count | int | Count of unique cycles with L1 writes. |
| l2_read_cycle_count | int | Count of unique cycles with L2 reads. |
| l2_write_cycle_count | int | Count of unique cycles with L2 writes. |
| l1_unique_addrs | int | Count of unique addresses seen in L1. |
| l2_unique_addrs | int | Count of unique addresses seen in L2. |
| l1_zero_count | int | Count of L1 lifetimes calculated as zero or incomplete. |
| l2_zero_count | int | Count of L2 lifetimes calculated as zero or incomplete. |
| l1_lifetimes_count | int | Count of valid, non-zero L1 lifetimes calculated. |
| l2_lifetimes_count | int | Count of valid, non-zero L2 lifetimes calculated. |
| l1_write_policy | WritePolicy | L1 write policy enum. |
| l1_write_allocation | WriteAllocation | L1 write allocation enum. |
| l2_write_policy | WritePolicy | L2 write policy enum. |
| l2_write_allocation | WriteAllocation | L2 write allocation enum. |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| kernel | dict | Dictionary containing kernel metadata and log lines from | required |
| log_file_path | str | Path to the log directory. | required |
| log_file_base | str | Base name of the log files. | required |
| kernel_name | str | Actual kernel name. Defaults to None. | None |
| config_file_path | str | Path to the GPGPU-Sim config file. Required. | None |

Raises:
| Type | Description |
|---|---|
| AssertionError | If |
| FileNotFoundError | If the config file cannot be read. |

Returns:
| Name | Type | Description |
|---|---|---|
| None | | Initializes parser state for lifetime analysis. |
Source code in python-scripts/accel_sim_parser.py
__init__(kernel, log_file_path, log_file_base, kernel_name=None, config_file_path=None)
Initialize the SimulationParser for a specific kernel.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| kernel | dict | Dictionary containing kernel metadata and log lines from | required |
| log_file_path | str | Path to the log directory. | required |
| log_file_base | str | Base name of the log files. | required |
| kernel_name | str | Actual kernel name (e.g., from | None |
| config_file_path | str | Path to the GPGPU-Sim config file. Required. | None |

Raises:
| Type | Description |
|---|---|
| AssertionError | If |
| FileNotFoundError | If the config file cannot be read by |
Source code in python-scripts/accel_sim_parser.py
coarse_grained_dict()
Generate a dictionary containing coarse-grained, kernel-level statistics.
Aggregates lifetime data (mean, median, 90th percentile, max) and combines
it with instruction statistics from get_instruction_stats() and cache
configuration details into a single dictionary summarizing the kernel's behavior.
Returns:
| Name | Type | Description |
|---|---|---|
| dict | | A dictionary containing aggregated statistics for the kernel, including lifetime metrics (mean, median, etc. in microseconds), instruction counts and frequencies, utilization, cache configuration, and lifetime counts. Keys match the column names used for the coarse-grained CSV output. |
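The lifetime aggregation described above (mean, median, 90th percentile, max, reported in microseconds) can be sketched with the standard library. The function name, the dictionary keys, and the nearest-rank percentile method are assumptions; the actual script may use a different percentile definition and different column names.

```python
import math
import statistics
from typing import Dict, List


def summarize_lifetimes(lifetime_cycles: List[int],
                        gpu_freq_mhz: float) -> Dict[str, float]:
    """Summarize a non-empty list of lifetimes (in cycles) in microseconds."""
    # A clock of f MHz runs f cycles per microsecond, so cycles / f gives us.
    lifetimes_us = sorted(c / gpu_freq_mhz for c in lifetime_cycles)
    # Nearest-rank 90th percentile (assumed method for this sketch).
    rank = math.ceil(0.9 * len(lifetimes_us))
    return {
        "mean_us": statistics.mean(lifetimes_us),
        "median_us": statistics.median(lifetimes_us),
        "p90_us": lifetimes_us[rank - 1],
        "max_us": lifetimes_us[-1],
    }


# Hypothetical lifetimes for one kernel on a 1000 MHz clock.
stats = summarize_lifetimes([100, 200, 300, 400, 1000], gpu_freq_mhz=1000.0)
print(stats)
```

With only five samples the 90th percentile coincides with the maximum, which shows why percentile statistics are most informative for kernels with many completed lifetimes.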
Source code in python-scripts/accel_sim_parser.py
fine_grained_df()
Create DataFrames containing fine-grained lifetime data for L1 and L2 caches.
Generates two pandas DataFrames, one for L1 and one for L2, containing individual cache line lifetime entries. Each row represents a completed lifetime instance.
Returns:
| Type | Description |
|---|---|
| Tuple[pd.DataFrame, pd.DataFrame] | A tuple containing: l1_df: DataFrame with columns 'kernel_id', 'address' (hex), 'lifetime_cycles', 'lifetime_ns'; l2_df: DataFrame with columns 'kernel_id', 'address' (hex), 'lifetime_cycles', 'lifetime_ns'. |
Source code in python-scripts/accel_sim_parser.py
get_instruction_stats()
Calculate instruction frequencies and cache utilization statistics.
Computes L1/L2 load/store frequencies (in MHz) based on the number of unique cycles with corresponding operations and the total simulation cycles for the kernel. Also calculates overall load/store frequencies and L1/L2 cache utilization based on unique addresses accessed.
Returns:
| Name | Type | Description |
|---|---|---|
| dict | | A dictionary containing various statistics: 'kernel_name', 'kernel_id'; 'l1_load_count', 'l1_store_count': Total L1 operations; 'l1_read_cycles', 'l1_write_cycles': Unique cycles with L1 ops; 'l1_load_frequency', 'l1_store_frequency': Frequencies in MHz; 'l2_load_count', 'l2_store_count': Total L2 operations; 'l2_read_cycles', 'l2_write_cycles': Unique cycles with L2 ops; 'l2_load_frequency', 'l2_store_frequency': Frequencies in MHz; 'l1_unique_addrs', 'l2_unique_addrs': Unique addresses accessed; 'l1_utilization', 'l2_utilization': Cache utilization ratios; 'load_count', 'store_count': Total unique cycles with loads/stores; 'load_frequency', 'store_frequency': Overall frequencies in MHz; 'l1_zero_count', 'l2_zero_count': Counts of zero/incomplete lifetimes; 'l1_lifetimes_count', 'l2_lifetimes_count': Counts of valid lifetimes. |
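One plausible reading of the frequency and utilization definitions above is sketched below. Both formulas are assumptions inferred from the description (the fraction of kernel cycles containing an access, scaled by the clock frequency, and unique addresses divided by the number of cache lines); the function names are hypothetical.

```python
def access_frequency_mhz(unique_cycles: int, sim_cycles: int,
                         gpu_freq_mhz: float) -> float:
    """Fraction of kernel cycles with an access, scaled to MHz (assumed formula)."""
    return unique_cycles / sim_cycles * gpu_freq_mhz


def cache_utilization(unique_addrs: int, cache_lines: float) -> float:
    """Share of cache lines touched by unique addresses (assumed formula)."""
    return unique_addrs / cache_lines


# Hypothetical kernel: 10,000 cycles at 1200 MHz, loads in 2,500 distinct cycles.
print(access_frequency_mhz(2500, 10000, 1200.0))
# Hypothetical cache: 2,048 lines, 512 unique addresses accessed.
print(cache_utilization(512, 2048))
```

Counting unique cycles rather than total operations bounds the frequency by the clock rate, since several accesses in the same cycle are counted once.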
Source code in python-scripts/accel_sim_parser.py
import_cache_states(cache_states)
Import cache state from the previous kernel to continue lifetime tracking.
Takes the state returned by parse_log_file from the previously processed
kernel and initializes the current parser instance with it. This allows
lifetimes spanning across kernel boundaries to be tracked correctly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cache_states | list | The list returned by | required |

Returns:
| Name | Type | Description |
|---|---|---|
| None | | Updates internal state ( |
Source code in python-scripts/accel_sim_parser.py
parse_log_file()
Parse all log lines for the kernel and finalize lifetime calculations.
Iterates through self.log_lines, calling process_line for each.
After processing all lines, it finalizes any remaining active lifetimes
using the l1_most_recent_read and l2_most_recent_read dictionaries.
Calculates lifetime durations in cycles and nanoseconds, storing them in
l1_lifetime_cycles, l1_lifetime_ns, etc. Updates final counts for
unique addresses and zero/incomplete lifetimes.
Returns:
| Name | Type | Description |
|---|---|---|
| list | list | A list containing the state needed to continue parsing for the next kernel: |
Source code in python-scripts/accel_sim_parser.py
process_cycle(line)
Extract the simulation cycle number from a log line.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| line | str | A log line containing 'Cycle | required |

Returns:
| Name | Type | Description |
|---|---|---|
| int | int | The extracted cycle number. |

Raises:
| Type | Description |
|---|---|
| ValueError | If the cycle number cannot be parsed as an integer. |
| IndexError | If the line format is unexpected. |
Source code in python-scripts/accel_sim_parser.py
process_line(line)
Process a single cache log line to update lifetime tracking.
Parses the line to identify L1 or L2 cache accesses, address, cycle,
status (hit/miss), and operation type (load/store). Updates the internal
lifetime tracking structures (l1_lifetimes, l2_lifetimes,
l1_current_lifetime_index, l2_current_lifetime_index,
l1_most_recent_read, l2_most_recent_read) based on the access and
configured cache policies. Also updates read/write counters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| line | str | The cache log line to process. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| None | | Modifies internal state. |
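The interplay between the lifetime list, the current-lifetime index, and the most-recent-read map can be sketched on already-parsed accesses. This is a deliberately simplified model and not the actual process_line logic: the record layout `(cycle, address, is_hit, is_store)` is hypothetical, write policies are ignored, and the rule that any miss or store begins a new lifetime while a load hit extends the current one is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical pre-parsed access record: (cycle, address, is_hit, is_store).
Access = Tuple[int, int, bool, bool]


def track_lifetimes(accesses: List[Access]) -> List[Tuple[int, int, Optional[int]]]:
    lifetimes: List[List] = []             # each entry: [address, start, end]
    current_index: Dict[int, int] = {}     # address -> index of its open lifetime
    most_recent_read: Dict[int, int] = {}  # address -> cycle of latest load hit

    def close(address: int) -> None:
        # End the open lifetime at the last load hit; None if it was never read.
        index = current_index.pop(address)
        lifetimes[index][2] = most_recent_read.pop(address, None)

    for cycle, address, is_hit, is_store in accesses:
        if is_hit and not is_store and address in current_index:
            most_recent_read[address] = cycle  # a load hit extends the lifetime
            continue
        if address in current_index:
            close(address)                     # the old data is replaced
        lifetimes.append([address, cycle, None])
        current_index[address] = len(lifetimes) - 1

    # Finalize whatever is still open, as done after the last log line.
    for address in list(current_index):
        close(address)
    return [tuple(entry) for entry in lifetimes]


result = track_lifetimes([
    (10, 0xA0, False, False),  # load miss: lifetime starts
    (25, 0xA0, True, False),   # load hit
    (40, 0xA0, True, False),   # load hit: lifetime currently ends here
    (55, 0xA0, False, True),   # store: new lifetime for the same address
    (60, 0xB0, False, False),  # load miss on another address
])
print(result)
```

Entries whose end stays None were never read after being brought in, which corresponds to the zero or incomplete lifetimes counted separately in the attributes above.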
Source code in python-scripts/accel_sim_parser.py
WriteAllocation
Bases: Enum
Cache write allocation policies.
Source code in python-scripts/accel_sim_parser.py
WritePolicy
Bases: Enum
Cache write policies.
Source code in python-scripts/accel_sim_parser.py
convert_size(size_str)
Convert a size string (e.g., "128KB", "50MB") to bytes.
Parses strings with suffixes KB, MB, GB (case-insensitive) and returns the equivalent size in bytes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| size_str | str | The size string to convert (e.g., "256KB", "1GB"). | required |

Returns:
| Name | Type | Description |
|---|---|---|
| int | int | The size in bytes. |

Raises:
| Type | Description |
|---|---|
| ValueError | If the string format or suffix is invalid. |
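A conversion along these lines could be sketched as follows. The function name `size_to_bytes`, the regular expression, and the use of binary multiples (1 KB = 1024 bytes) are assumptions; the documentation does not state which multiplier convert_size uses.

```python
import re

# Assumption: binary multiples, as is common for cache sizes.
_UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def size_to_bytes(size_str: str) -> int:
    """Convert strings such as "128KB" or "50mb" to a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", size_str)
    if match is None:
        raise ValueError(f"Unrecognized size string: {size_str!r}")
    number, suffix = match.groups()
    try:
        factor = _UNITS[suffix.upper()]  # case-insensitive suffix lookup
    except KeyError:
        raise ValueError(f"Unknown size suffix: {suffix!r}") from None
    return int(float(number) * factor)


print(size_to_bytes("128KB"))
print(size_to_bytes("50mb"))
```

Both a malformed string and an unknown suffix surface as ValueError, matching the single exception type listed in the Raises table.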
Source code in python-scripts/accel_sim_parser.py
find_and_append(kernels, current_index, key, line, dtype=int)
Find a key-value pair in a log line and append it to the current kernel's data.
Searches for the key in line. If found, converts the associated value
to the specified dtype and adds it to the dictionary at kernels[current_index]
with the given key. Handles potential errors during conversion.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| kernels | list | A list of dictionaries, where each dictionary holds data for a kernel. | required |
| current_index | int | The index in the | required |
| key | str | The key string to search for in the line (e.g., "gpu_sim_cycle"). | required |
| line | str | The log line to parse. | required |
| dtype | type | The data type to convert the found value to (e.g., int, float). Defaults to int. | int |

Returns:
| Name | Type | Description |
|---|---|---|
| None | | Modifies the |
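The key lookup and conversion can be sketched as below. The helper name `append_stat`, the assumed line shape `<key> = <value>`, and the choice to silently skip values that fail to convert are all assumptions for illustration; the actual error handling in find_and_append is not documented here.

```python
from typing import Any, Callable, Dict, List


def append_stat(kernels: List[Dict[str, Any]], current_index: int, key: str,
                line: str, dtype: Callable[[str], Any] = int) -> None:
    """Store the value following `key` in the current kernel's dictionary."""
    # Assumed line shape: "<key> = <value>", e.g. "gpu_sim_cycle = 5120".
    name, separator, value = line.partition("=")
    if not separator or name.strip() != key:
        return  # this line does not carry the requested key
    try:
        kernels[current_index][key] = dtype(value.strip())
    except ValueError:
        # Assumption: leave the entry untouched when the value does not convert.
        pass


kernels: List[Dict[str, Any]] = [{}]
append_stat(kernels, 0, "gpu_sim_cycle", "gpu_sim_cycle = 5120")
append_stat(kernels, 0, "gpu_ipc", "gpu_ipc =      12.5", dtype=float)
append_stat(kernels, 0, "gpu_sim_insn", "gpu_sim_insn = n/a")  # skipped
print(kernels[0])
```

Passing the converter as `dtype` lets the same helper collect integer counters and floating-point values such as IPC.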
Source code in python-scripts/accel_sim_parser.py
get_cache_policy(line)
Extract cache write policy and allocation policy from a GPGPU-Sim config line.
Parses a GPGPU-Sim configuration line (e.g., starting with '-gpgpu_cache:dl1') to determine the cache's write policy (Write-Back/Write-Through) and write allocation policy (Write-Allocate/No-Write-Allocate).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| line | str | A configuration line string (e.g., "-gpgpu_cache:dl1 S:4:128:256,L:T:m:L:L,A:384:48,16:0,32"). | required |

Returns:
| Name | Type | Description |
|---|---|---|
| dict | dict | A dictionary containing: 'write_policy' (WritePolicy): The parsed write policy enum; 'write_allocation' (WriteAllocation): The parsed write allocation enum. |
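As a rough sketch of where the two policy fields sit in such a line, the snippet below splits the example configuration string. The field order (replacement, write policy, allocation, write allocation, set index function in the second comma-separated group) and the single-letter codes reflect one reading of the GPGPU-Sim cache configuration format and should be checked against the simulator's documentation. The function returns descriptive strings rather than the WritePolicy and WriteAllocation enums, since how get_cache_policy maps the remaining codes onto its two-valued enums is not shown here.

```python
from typing import Dict

# Assumed single-letter codes for the write policy and write allocation fields.
_WRITE_POLICY = {
    "R": "read-only",
    "B": "write-back",
    "T": "write-through",
    "E": "write-evict",
    "L": "local write-back, global write-through",
}
_WRITE_ALLOC = {
    "N": "no-write-allocate",
    "W": "write-allocate",
    "F": "fetch-on-write",
    "L": "lazy-fetch-on-read",
}


def cache_policy_fields(line: str) -> Dict[str, str]:
    """Extract the write policy and write allocation codes from a config line."""
    _option, value = line.split(maxsplit=1)
    # Second comma-separated group, e.g. "L:T:m:L:L".
    policy_group = value.split(",")[1]
    _replacement, write, _alloc, write_alloc, _set_index = policy_group.split(":")
    return {
        "write_policy": _WRITE_POLICY[write],
        "write_allocation": _WRITE_ALLOC[write_alloc],
    }


fields = cache_policy_fields(
    "-gpgpu_cache:dl1 S:4:128:256,L:T:m:L:L,A:384:48,16:0,32"
)
print(fields)
```

Unpacking the group into exactly five names makes the sketch fail loudly on a line with a different number of fields instead of silently reading the wrong position.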
Source code in python-scripts/accel_sim_parser.py
read_cache_log(log_file_path, basename=None)
Read Accel-Sim cache and simulation logs, grouping lines by kernel.
Parses the .sim_cache.log and .sim.log files. It identifies kernel
launches and groups subsequent log lines belonging to each kernel. It also
extracts kernel metadata like name, ID, simulation cycles, instructions, IPC,
and detected cache configuration changes from the log lines.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| log_file_path | str | The directory containing the log files. | required |
| basename | str | The base name of the log files (e.g., "program_timestamp"). | None |

Returns:
| Type | Description |
|---|---|
| list[dict] | A list of dictionaries. Each dictionary represents a kernel and contains: 'kernel_name' (str): The name of the kernel; 'kernel_id' (int): The ID of the kernel; 'lines' (list[str]): A list of log lines associated with this kernel from |
Source code in python-scripts/accel_sim_parser.py
read_config_file(config_file_path=None)
Read L1 and L2 cache policies from a GPGPU-Sim configuration file.
Parses the specified configuration file (or the default configs/gpgpusim.config)
to find lines defining the L1 data cache (-gpgpu_cache:dl1) and L2 data cache
(-gpgpu_cache:dl2) and extracts their write and allocation policies using
get_cache_policy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config_file_path | str | Path to the GPGPU-Sim config file. Defaults to | None |

Returns:
| Type | Description |
|---|---|
| Tuple[dict, dict] | A tuple containing two dictionaries: l1_config: Dictionary with 'write_policy' and 'write_allocation' for L1; l2_config: Dictionary with 'write_policy' and 'write_allocation' for L2. Returns empty dictionaries if config lines are not found. |

Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the specified or default config file does not exist. |
Source code in python-scripts/accel_sim_parser.py
read_function_names(log_file_path, basename)
Read kernel ID to kernel name mapping from the kernels.csv file.
Parses the kernels.csv file (typically generated by PKS or tracing)
to create a dictionary mapping kernel IDs (as integers) to their
corresponding names (as strings).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| log_file_path | str | The directory containing the | required |
| basename | str | The base name of the log files (unused in this function, but kept for consistency). | required |

Returns:
| Type | Description |
|---|---|
| dict[int, str] | A dictionary mapping kernel IDs to kernel names. Returns an empty dictionary if |
Source code in python-scripts/accel_sim_parser.py
run_parser(log_file_path, log_file_base, config_file_path)
Main function to run the simulation log parsing process for all kernels.
Reads the grouped kernel data using read_cache_log, iterates through each
kernel, creates a SimulationParser instance, imports state from the previous
kernel (if any), parses the current kernel's logs, and collects both
fine-grained (per-lifetime) and coarse-grained (per-kernel) statistics.
Finally, saves the aggregated statistics to CSV files using Dask DataFrames.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| log_file_path | str | Path to the directory containing the log files. | required |
| log_file_base | str | Base name of the log files (e.g., "program_timestamp"). | required |
| config_file_path | str | Path to the GPGPU-Sim configuration file used for simulation. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| None | | Generates CSV output files: |

Raises:
| Type | Description |
|---|---|
| SystemExit | If no kernels are found in the log file. |
Source code in python-scripts/accel_sim_parser.py