mgpusim

v2.0.2
Published: Feb 17, 2022 License: MIT

MGPUSIM

MGPUSim is a high-flexibility, high-performance, high-accuracy GPU simulator. It models GPUs that run the AMD GCN3 instruction set. A key feature of MGPUSim is its support for multi-GPU simulation, though it can also be used for single-GPU architecture research.

Communication

Slack: Slack

Discord: Discord Chat

Getting Started

  • Install the most recent version of Go from golang.org.
  • Clone this repository, assuming the path is [mgpusim_home].
  • Change your current directory to [mgpusim_home]/samples/fir.
  • Compile the simulator together with the benchmark by running go build. The compiler generates an executable named fir (on Linux or macOS) or fir.exe (on Windows).
  • Run the simulation with ./fir -timing --report-all.
  • Check the generated metrics.csv file for high-level metrics output.

Benchmark Support

  • AMD APP SDK: Bitonic Sort, Fast Walsh Transform, Floyd-Warshall, Matrix Multiplication, Matrix Transpose, NBody, Simple Convolution
  • DNN Mark: MaxPooling, ReLU
  • HeteroMark: AES, FIR, KMeans, PageRank
  • Polybench: ATAX, BICG
  • Rodinia: Needleman-Wunsch
  • SHOC: BFS, FFT, SPMV, Stencil2D

Default Performance Metrics Supported

You can run a simulation with the --report-all argument to enable all the performance metrics.

  • Total execution time
  • Total kernel time
  • Per-GPU kernel time
  • Instruction count on each Compute Unit
  • Average request latency on all the cache components
  • Number of read-misses, read-mshr-hits, read-hits, write-misses, write-mshr-hits, and write-hits on all the cache components
  • Number of incoming transactions and outgoing transactions on all the RDMA components.
  • Number of transactions on each DRAM controller.
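
The six cache counters listed above can be combined into derived figures such as a read hit rate. The sketch below shows one possible definition, assuming that MSHR hits count as accesses but not as hits; other conventions exist. The type CacheCounters and its field names are hypothetical and do not come from MGPUSim.

```go
package main

import "fmt"

// CacheCounters holds the six per-cache counts named in the metric list.
// The struct and its field names are hypothetical.
type CacheCounters struct {
	ReadHits, ReadMSHRHits, ReadMisses    uint64
	WriteHits, WriteMSHRHits, WriteMisses uint64
}

// ReadHitRate returns read hits divided by all read accesses.
// Assumption: MSHR hits are included in the denominator only.
func (c CacheCounters) ReadHitRate() float64 {
	total := c.ReadHits + c.ReadMSHRHits + c.ReadMisses
	if total == 0 {
		return 0 // no read traffic recorded
	}
	return float64(c.ReadHits) / float64(total)
}

func main() {
	// Made-up counts for illustration only.
	c := CacheCounters{ReadHits: 750, ReadMSHRHits: 50, ReadMisses: 200}
	fmt.Printf("read hit rate: %.2f\n", c.ReadHitRate())
}
```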

How to Prepare Your Own Experiment

  • Create a new git repository. Typically we create one repo for each project, which may contain multiple experiments.
  • Create a folder in your repo for each experiment. Run go mod init [git repo path]/[directory_name] to initialize the folder as a new Go module. For example, if your git repository is hosted at https://gitlab.com/syifan/fancy_project and your experiment folder is named exp1, your module path should be gitlab.com/syifan/fancy_project/exp1.
  • Copy all the files under the directory samples/experiment to your experiment folder. In the main.go file, change the benchmark and the problem size to run, or use an argument to select which benchmark to run. The files runner.go, platform.go, r9nano.go, and shaderarray.go serve as configuration files, so change them according to your needs.
  • It is also possible to modify an existing component or add a new component. First, copy the folder that includes the component you want to modify to your repo. Then, modify the configuration scripts to link the system with your new component. You can add some print statements to check that your local component is used. Finally, you can start to modify the component code.
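
The option of selecting a benchmark through a command-line argument could be sketched as follows. This is a self-contained illustration, not MGPUSim code: the Benchmark interface, the stub type, the registry, and the -benchmark and -size flags are all hypothetical stand-ins for whatever your main.go actually wires up.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"sort"
)

// Benchmark is a hypothetical stand-in for a runnable workload.
type Benchmark interface {
	Run()
}

// stub only prints what it would run; a real experiment would
// construct an actual benchmark here instead.
type stub struct {
	name string
	size int
}

func (s stub) Run() {
	fmt.Printf("running %s with problem size %d\n", s.name, s.size)
}

// registry maps a benchmark name to a constructor (hypothetical).
var registry = map[string]func(size int) Benchmark{
	"fir": func(size int) Benchmark { return stub{name: "fir", size: size} },
	"aes": func(size int) Benchmark { return stub{name: "aes", size: size} },
}

// selectBenchmark looks up a benchmark by name and builds it.
func selectBenchmark(name string, size int) (Benchmark, error) {
	ctor, ok := registry[name]
	if !ok {
		names := make([]string, 0, len(registry))
		for n := range registry {
			names = append(names, n)
		}
		sort.Strings(names)
		return nil, fmt.Errorf("unknown benchmark %q (available: %v)", name, names)
	}
	return ctor(size), nil
}

func main() {
	name := flag.String("benchmark", "fir", "benchmark to run (hypothetical flag)")
	size := flag.Int("size", 4096, "problem size (hypothetical flag)")
	flag.Parse()

	b, err := selectBenchmark(*name, *size)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	b.Run()
}
```

Keeping the name-to-constructor mapping in one table means a new benchmark needs only one extra registry entry, and an unknown name produces an error that lists the valid choices.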

Contributing

  • If you find a bug in the simulator (e.g., the simulator does not accurately model some behavior or does not produce the correct emulation result), please raise an issue in the MGPUSim issue tab.
  • If you want a new feature (e.g., you need to implement some new instructions or you want to model some new components), please also raise an issue.
  • If you want to add a feature or fix a bug, create a merge request using the "Create merge request" button in the corresponding issue. GitLab will create a branch for you and you can develop your code there. Feel free to commit and push often; individual commits do not need to meet any particular quality bar.
  • When you are done developing, click the "Mark as ready" button in the merge request. Someone will review your code and decide whether it can be merged. If nobody responds to you within 2 days, please notify us on Slack.
  • There is no style requirement beyond the default Go style. Please run gofmt, goimports, or goreturns before marking your merge request as ready. Running golangci-lint run in the root directory will also point out most style errors.

Citation

If you use MGPUSim in your research, please cite our ISCA '19 paper.

@inproceedings{sun19mgpusim, 
    author = {Sun, Yifan and Baruah, Trinayan and Mojumder, Saiful A. and Dong, Shi and Gong, Xiang and Treadway, Shane and Bao, Yuhui and Hance, Spencer and McCardwell, Carter and Zhao, Vincent and Barclay, Harrison and Ziabari, Amir Kavyan and Chen, Zhongliang and Ubal, Rafael and Abell\'{a}n, Jos\'{e} L. and Kim, John and Joshi, Ajay and Kaeli, David}, 
    title = {MGPUSim: Enabling Multi-GPU Performance Modeling and Optimization}, 
    year = {2019}, 
    isbn = {9781450366694}, 
    publisher = {Association for Computing Machinery}, 
    address = {New York, NY, USA}, 
    url = {https://doi.org/10.1145/3307650.3322230}, 
    doi = {10.1145/3307650.3322230}, 
    booktitle = {Proceedings of the 46th International Symposium on Computer Architecture}, 
    pages = {197–209}, 
    numpages = {13}, 
    keywords = {simulation, multi-GPU systems, memory management}, 
    location = {Phoenix, Arizona}, 
    series = {ISCA '19} 
}

Papers that use MGPUSim:

  • Dynamic GMMU Bypass for Address Translation in Multi-GPU Systems
  • Valkyrie: Leveraging Inter-TLB Locality to Enhance GPU Performance
  • MGPU-TSM: A Multi-GPU System with Truly Shared Memory
  • Griffin: Hardware-Software Support for Efficient Page Migration in Multi-GPU Systems
  • HALCONE: A Hardware-Level Timestamp-based Cache Coherence Scheme for Multi-GPU systems
  • Priority-Based PCIe Scheduling for Multi-Tenant Multi-GPU Systems
  • Exploiting Adaptive Data Compression to Improve Performance and Energy-efficiency of Compute Workloads in Multi-GPU Systems

License

MIT © Project Akita Developers.

Directories

Path Synopsis
Package benchmarks defines the Benchmark interface.
amdappsdk/bitonicsort
Package bitonicsort implements the bitonicsort benchmark from AMDAPPSDK.
amdappsdk/fastwalshtransform
Package fastwalshtransform implements the fastwalshtransform benchmark from AMDAPPSDK.
amdappsdk/floydwarshall
Package floydwarshall implements the Floyd-Warshall benchmark from AMDAPPSDK.
amdappsdk/matrixmultiplication
Package matrixmultiplication implements the matrix multiplication benchmark from AMDAPPSDK.
amdappsdk/matrixtranspose
Package matrixtranspose implements the matrix transpose benchmark from AMDAPPSDK.
amdappsdk/nbody
Package nbody includes the NBody sample benchmark, derived from the SDKSample base class.
amdappsdk/simpleconvolution
Package simpleconvolution implements the Simple Convolution benchmark from AMDAPPSDK.
dnn/conv2d
Package conv2d defines a benchmark for the Convolutional Layer.
dnn/gputraining
Package gputraining defines the GPU training process.
dnn/im2col
Package im2col defines a benchmark for the im2col operation.
dnn/lenet
Package lenet implements lenet network training.
dnn/minerva
Package minerva implements minerva network training.
dnn/relu
Package relu implements the relu algorithm as a benchmark.
dnn/tensor
Package tensor provides GPU tensor and tensor operation implementations.
dnn/vgg16
Package vgg16 implements VGG16 network training.
dnn/xor
Package xor implements an extremely simple network that can perform the xor operation.
heteromark/aes
Package aes implements the AES benchmark from Hetero-Mark.
heteromark/fir
Package fir implements the FIR benchmark from Hetero-Mark.
heteromark/kmeans
Package kmeans implements the Kmeans benchmark from Hetero-Mark.
heteromark/pagerank
Package pagerank implements the PageRank benchmark from Hetero-Mark.
matrix/csr
Package csr provides a csr matrix definition.
mccl
Package mccl provides a collective communication library implementation.
polybench/atax
Package atax implements the ATAX benchmark from Polybench.
polybench/bicg
Package bicg implements the bicg benchmark from Polybench.
rodinia/nw
Package nw defines the Needleman–Wunsch benchmark.
shoc/bfs
Package bfs implements the bfs benchmark from the SHOC suite.
shoc/fft
Package fft includes the benchmark of Fourier
shoc/spmv
Package spmv includes the benchmark of sparse matrix-vector multiplication.
shoc/stencil2d
Package stencil2d implements the stencil2d benchmark from the SHOC suite.
Package bitops defines commonly used bit operations.
Package driver implements a GPU driver that interfaces the benchmarks with the simulator.
internal
Package internal provides support for the driver implementation.
Package emu emulates GCN3 instructions.
Package insts provides the definition for GCN3 instructions.
Package kernels provides basic data definitions related to GPU kernels.
Package protocol defines the common messages used in MGPUSim.
samples
aes
bfs
fft
fir
nw
runner
Package runner defines how default benchmark samples are executed.
xor
Package server defines a server that can receive commands from external applications.
tests
timing
cp
Package cp defines the Command Processor component of a GCN3 GPU.
cp/internal/dispatching
Package dispatching defines how work-groups and wavefronts are dispatched to compute units.
cp/internal/resource
Package resource manages the Compute Unit resources.
cu
Package cu provides an implementation of detailed Compute Unit modeling.
pagemigrationcontroller
Package pagemigrationcontroller provides an implementation of a PageMigrationController.
rdma
Package rdma provides the implementation of an RDMA engine.
rob
Package rob implements a reorder buffer for memory requests.
wavefront
Package wavefront defines concepts related to a wavefront.
