The highest tagged major version is v3.

mgpusim

module

v2.0.2 (Published: Feb 17, 2022; License: MIT)

Repository

gitlab.com/akita/mgpusim/v2

README

MGPUSIM

MGPUSim is a high-flexibility, high-performance, high-accuracy GPU simulator. It models GPUs that run the AMD GCN3 instruction set. A key feature of MGPUSim is its support for multi-GPU simulation, although you can still use it for single-GPU architecture research.

Communication

Slack:

Discord:

Getting Started

Install the most recent version of Go from golang.org.
Clone this repository. The steps below refer to its path as [mgpusim_home].
Change your current directory to [mgpusim_home]/samples/fir.
Compile the simulator together with the benchmark by running go build. The compiler generates an executable called fir (on Linux or macOS) or fir.exe (on Windows).
Run the simulation with ./fir -timing --report-all.
Check the generated metrics.csv file for high-level metrics.

Benchmark Support

AMD APP SDK: Bitonic Sort, Fast Walsh Transform, Floyd-Warshall, Matrix Multiplication, Matrix Transpose, NBody, Simple Convolution
DNN Mark: MaxPooling, ReLU
HeteroMark: AES, FIR, KMeans, PageRank
Polybench: ATAX, BICG
Rodinia: Needleman-Wunsch
SHOC: BFS, FFT, SPMV, Stencil2D

Default Performance Metrics Supported

You can run a simulation with the --report-all argument to enable all the performance metrics.

Total execution time
Total kernel time
Per-GPU kernel time
Instruction count on each Compute Unit
Average request latency on all the cache components
Number of read-misses, read-mshr-hits, read-hits, write-misses, write-mshr-hits, and write-hits on all the cache components
Number of incoming and outgoing transactions on all the RDMA components
Number of transactions on each DRAM controller
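The cache counters above can be combined into derived metrics such as a hit rate. This sketch assumes you have already extracted the six per-cache counters, for example by parsing metrics.csv. The CacheCounts type and the counter values are hypothetical. The sketch counts MSHR hits as misses, because the data was still in flight when the request arrived. That is one modeling choice among several, not an MGPUSim definition.

```go
package main

import "fmt"

// CacheCounts holds the six per-cache counters reported with --report-all.
// This type is a hypothetical container, not an MGPUSim API.
type CacheCounts struct {
	ReadHits, ReadMSHRHits, ReadMisses    uint64
	WriteHits, WriteMSHRHits, WriteMisses uint64
}

// HitRate returns the fraction of accesses that hit in the cache.
// MSHR hits are counted as misses here (a modeling choice).
func (c CacheCounts) HitRate() float64 {
	hits := c.ReadHits + c.WriteHits
	total := hits + c.ReadMSHRHits + c.WriteMSHRHits + c.ReadMisses + c.WriteMisses
	if total == 0 {
		return 0 // avoid dividing by zero for an unused cache
	}
	return float64(hits) / float64(total)
}

func main() {
	// Hypothetical counter values for a single L2 cache.
	l2 := CacheCounts{
		ReadHits: 700, ReadMSHRHits: 50, ReadMisses: 150,
		WriteHits: 80, WriteMSHRHits: 0, WriteMisses: 20,
	}
	fmt.Printf("L2 hit rate: %.2f\n", l2.HitRate())
}
```

With these example counters, 780 of 1000 accesses hit, so the program prints L2 hit rate: 0.78. To treat MSHR hits as hits instead, add ReadMSHRHits and WriteMSHRHits to hits rather than leaving them only in total.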

How to Prepare Your Own Experiment

Create a new Git repository. Typically, we create one repository per project, and it may contain multiple experiments.
Create a folder in your repository for each experiment. Inside the folder, run go mod init [git repo path]/[directory_name] to initialize it as a new Go module. For example, if your Git repository is hosted at https://gitlab.com/syifan/fancy_project and your experiment folder is named exp1, your module path should be gitlab.com/syifan/fancy_project/exp1.
Copy all the files under samples/experiment to your experiment folder. In main.go, change the benchmark and the problem size to run, or add a command-line argument that selects the benchmark. The files runner.go, platform.go, r9nano.go, and shaderarray.go serve as configuration files; modify them as needed.
You can also modify an existing component or add a new one. First, copy the folder that contains the component you want to modify into your repository. Then, modify the configuration files to connect the system to your copy of the component. Adding a few print statements is a quick way to confirm that your local component is actually used. Finally, modify the component code.
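If you want to pick the benchmark at run time instead of editing main.go, one approach is a command-line flag plus a map from names to constructors. This sketch uses only the standard flag package. The Benchmark interface, the registry map, and the -benchmark flag name are hypothetical placeholders for whatever your experiment actually constructs. They are not MGPUSim's benchmarks API.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"sort"
)

// Benchmark is a hypothetical stand-in for the benchmark type your
// experiment builds; it is not the MGPUSim benchmarks.Benchmark interface.
type Benchmark interface {
	Name() string
}

// namedBenchmark is a placeholder implementation that only records its name.
type namedBenchmark string

func (b namedBenchmark) Name() string { return string(b) }

// registry maps a command-line name to a constructor. In a real experiment,
// each constructor would create and configure an MGPUSim benchmark.
var registry = map[string]func() Benchmark{
	"fir":  func() Benchmark { return namedBenchmark("fir") },
	"aes":  func() Benchmark { return namedBenchmark("aes") },
	"atax": func() Benchmark { return namedBenchmark("atax") },
}

// selectBenchmark looks up a benchmark by name and lists the valid names on error.
func selectBenchmark(name string) (Benchmark, error) {
	ctor, ok := registry[name]
	if !ok {
		names := make([]string, 0, len(registry))
		for n := range registry {
			names = append(names, n)
		}
		sort.Strings(names)
		return nil, fmt.Errorf("unknown benchmark %q; choose one of %v", name, names)
	}
	return ctor(), nil
}

func main() {
	name := flag.String("benchmark", "fir", "name of the benchmark to run")
	flag.Parse()

	b, err := selectBenchmark(*name)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("selected:", b.Name())
}
```

Running the program with -benchmark=aes prints selected: aes. An unknown name exits with the list of valid choices. Because the constructors live in a map, adding a benchmark is a one-line change. If other configuration code also registers flags, define the -benchmark flag before flag.Parse is called so that every flag is recognized.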

Contributing

If you find a bug in the simulator (e.g., the simulator does not accurately model some behavior or does not produce the correct emulation result), please raise an issue in the MGPUSim issue tracker.
If you want a new feature (e.g., you need to implement new instructions or model new components), please also raise an issue.
If you want to add a feature or fix a bug, create a merge request using the "Create merge request" button in the corresponding issue. GitLab will create a branch for you to develop your code in. Feel free to commit and push often; individual commits do not need to be polished.
When you are done developing, click the "Mark as ready" button in the merge request. Someone will review your code and decide whether it can be merged. If nobody responds within 2 days, please notify us on Slack.
There are no particular style requirements beyond the default Go style. Please run gofmt, goimports, or goreturns before marking your merge request as ready. Running golangci-lint run in the root directory will also point out most style errors.
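You can also check formatting from Go itself, for example in a pre-commit helper. The standard library's go/format package applies gofmt's formatting rules to source code. The sketch below reports every file whose contents would change under format.Source. It is a hypothetical helper, not part of MGPUSim, and it does not replace goimports or golangci-lint. Note that format.Source matches plain gofmt, not gofmt -s.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
	"os"
)

// isGofmtClean reports whether src is already formatted the way gofmt
// would format it. It returns an error if src is not valid Go.
func isGofmtClean(src []byte) (bool, error) {
	formatted, err := format.Source(src)
	if err != nil {
		return false, err
	}
	return bytes.Equal(src, formatted), nil
}

func main() {
	// Check every file path given on the command line.
	for _, path := range os.Args[1:] {
		src, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		clean, err := isGofmtClean(src)
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", path, err)
			os.Exit(1)
		}
		if !clean {
			fmt.Println("needs gofmt:", path)
		}
	}
}
```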

Citation

If you use MGPUSim in your research, please cite our ISCA '19 paper.

@inproceedings{sun19mgpusim, 
    author = {Sun, Yifan and Baruah, Trinayan and Mojumder, Saiful A. and Dong, Shi and Gong, Xiang and Treadway, Shane and Bao, Yuhui and Hance, Spencer and McCardwell, Carter and Zhao, Vincent and Barclay, Harrison and Ziabari, Amir Kavyan and Chen, Zhongliang and Ubal, Rafael and Abell\'{a}n, Jos\'{e} L. and Kim, John and Joshi, Ajay and Kaeli, David}, 
    title = {MGPUSim: Enabling Multi-GPU Performance Modeling and Optimization}, 
    year = {2019}, 
    isbn = {9781450366694}, 
    publisher = {Association for Computing Machinery}, 
    address = {New York, NY, USA}, 
    url = {https://doi.org/10.1145/3307650.3322230}, 
    doi = {10.1145/3307650.3322230}, 
    booktitle = {Proceedings of the 46th International Symposium on Computer Architecture}, 
    pages = {197--209}, 
    numpages = {13}, 
    keywords = {simulation, multi-GPU systems, memory management}, 
    location = {Phoenix, Arizona}, 
    series = {ISCA '19} 
}

Papers that use MGPUSim:

Dynamic GMMU Bypass for Address Translation in Multi-GPU Systems
Valkyrie: Leveraging Inter-TLB Locality to Enhance GPU Performance
MGPU-TSM: A Multi-GPU System with Truly Shared Memory
Griffin: Hardware-Software Support for Efficient Page Migration in Multi-GPU Systems
HALCONE: A Hardware-Level Timestamp-based Cache Coherence Scheme for Multi-GPU systems
Priority-Based PCIe Scheduling for Multi-Tenant Multi-GPU Systems
Exploiting Adaptive Data Compression to Improve Performance and Energy-efficiency of Compute Workloads in Multi-GPU Systems

License

Directories

Path	Synopsis
benchmarks	Package benchmarks defines the Benchmark interface.
  amdappsdk/bitonicsort	Package bitonicsort implements the bitonicsort benchmark from AMDAPPSDK.
  amdappsdk/fastwalshtransform	Package fastwalshtransform implements the fastwalshtransform benchmark from AMDAPPSDK.
  amdappsdk/floydwarshall	Package floydwarshall implements the Floyd-Warshall benchmark from AMDAPPSDK.
  amdappsdk/matrixmultiplication	Package matrixmultiplication implements the matrix multiplication benchmark from AMDAPPSDK.
  amdappsdk/matrixtranspose	Package matrixtranspose implements the matrix transpose benchmark from AMDAPPSDK.
  amdappsdk/nbody	Package nbody includes the NBody benchmark sample, derived from the SDKSample base class.
  amdappsdk/simpleconvolution	Package simpleconvolution implements the Simple Convolution benchmark from AMDAPPSDK.
  dnn/conv2d	Package conv2d defines a benchmark for the convolutional layer.
  dnn/gputraining	Package gputraining defines the GPU training process.
  dnn/im2col	Package im2col defines a benchmark for the im2col operation.
  dnn/lenet	Package lenet implements LeNet network training.
  dnn/minerva	Package minerva implements Minerva network training.
  dnn/relu	Package relu implements the ReLU algorithm as a benchmark.
  dnn/tensor	Package tensor provides GPU tensor and tensor operation implementations.
  dnn/vgg16	Package vgg16 implements VGG16 network training.
  dnn/xor	Package xor implements an extremely simple network that can perform the XOR operation.
  heteromark/aes	Package aes implements the AES benchmark from Hetero-Mark.
  heteromark/fir	Package fir implements the FIR benchmark from Hetero-Mark.
  heteromark/kmeans	Package kmeans implements the KMeans benchmark from Hetero-Mark.
  heteromark/pagerank	Package pagerank implements the PageRank benchmark from Hetero-Mark.
  matrix/csr	Package csr provides a CSR matrix definition.
  mccl	Package mccl provides a collective communication library implementation.
  polybench/atax	Package atax implements the ATAX benchmark from Polybench.
  polybench/bicg	Package bicg implements the BICG benchmark from Polybench.
  rodinia/nw	Package nw defines the Needleman–Wunsch benchmark.
  shoc/bfs	Package bfs implements the BFS benchmark from the SHOC suite.
  shoc/fft	Package fft includes the FFT (fast Fourier transform) benchmark.
  shoc/spmv	Package spmv includes the sparse matrix-vector multiplication benchmark.
  shoc/stencil2d	Package stencil2d implements the stencil2d benchmark from the SHOC suite.
bitops	Package bitops defines commonly used bit operations.
driver	Package driver implements a GPU driver that interfaces the benchmarks with the simulator.
  internal	Package internal provides support for the driver implementation.
emu	Package emu emulates GCN3 instructions.
insts	Package insts provides the definition for GCN3 instructions.
  gcn3disassembler
kernels	Package kernels provides basic data definitions related to GPU kernels.
protocol	Package protocol defines the common messages used in MGPUSim.
samples
  aes
  atax
  bfs
  bicg
  bitonicsort
  concurrentkernel
  concurrentworkload
  conv2d
  fastwalshtransform
  fft
  fir
  floydwarshall
  im2col
  kmeans
  lenet
  matrixmultiplication
  matrixtranspose
  memcopy
  minerva
  nbody
  nw
  pagerank
  relu
  runner	Package runner defines how default benchmark samples are executed.
  server
  simpleconvolution
  spmv
  stencil2d
  vgg16
  xor
server	Package server defines a server that can receive commands from external applications.
tests
  acceptance
  deterministic/empty_kernel
  deterministic/memcopy
timing
  cp	Package cp defines the Command Processor component of a GCN3 GPU.
  cp/internal/dispatching	Package dispatching defines how work-groups and wavefronts are dispatched to compute units.
  cp/internal/resource	Package resource manages the Compute Unit resources.
  cu	Package cu provides an implementation of detailed Compute Unit modeling.
  pagemigrationcontroller	Package pagemigrationcontroller provides an implementation of a PageMigrationController.
  rdma	Package rdma provides the implementation of an RDMA engine.
  rob	Package rob implements a reorder buffer for memory requests.
  wavefront	Package wavefront defines concepts related to a wavefront.