goperf

Version: v0.0.0-...-c6ef057
Published: Mar 4, 2021 License: MIT Imports: 7 Imported by: 0

README

Performance counters for Go

A thin wrapper around perf_event_open.

Useful when you want performance counters for a specific code snippet.
Normally you would use 'perf' itself, but it is not always possible to extract
a piece of code from your project and isolate it for measurement.

Notes

  • CPUs have a limited number of PMU registers, so the number of performance counters that can be active at the same time is limited. Some counters can be scheduled only on a specific PMU register, so some combinations of counters may not work or may be multiplexed. Check the "Measurement time" column in the report to see whether a counter was multiplexed (less than 100%). Also, not all counters are supported by Linux or by your CPU. If you are surprised that some counters do not work, search online for the scheduling algorithm of performance counters.
  • This tool measures all threads of the process, including GC threads.
  • This tool is a direct translation of the C version, for experimental purposes only: https://github.com/tezc/sc/tree/master/perf
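
To make the multiplexing note concrete, here is a minimal sketch of the usual perf convention for a multiplexed counter: the kernel can report how long an event was enabled and how long it was actually running, and a raw count is extrapolated by the ratio of the two. The function names and the sample numbers are hypothetical, and the source does not say whether goperf reports raw or scaled values, so treat this as an illustration of the idea rather than of goperf's internals.

```go
package main

import "fmt"

// measurementPercent returns the share of the enabled time during which the
// counter was actually running on the PMU. A value below 100 means the
// counter was multiplexed with others.
func measurementPercent(timeEnabled, timeRunning uint64) float64 {
	if timeEnabled == 0 {
		return 0
	}
	return 100 * float64(timeRunning) / float64(timeEnabled)
}

// scaledCount extrapolates a raw count to the full enabled time.
func scaledCount(raw, timeEnabled, timeRunning uint64) float64 {
	if timeRunning == 0 {
		return 0
	}
	return float64(raw) * float64(timeEnabled) / float64(timeRunning)
}

func main() {
	// Hypothetical reading: the counter ran for half of the measured interval.
	var raw, enabled, running uint64 = 1000, 200, 100
	fmt.Printf("measurement time: %.2f%%\n", measurementPercent(enabled, running))
	fmt.Printf("scaled count: %.2f\n", scaledCount(raw, enabled, running))
}
```

A scaled value is only an estimate: it assumes the event rate was the same while the counter was not scheduled.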

Config

To allow recording kernel events, you may need to run:

sudo sh -c 'echo 1 >/proc/sys/kernel/perf_event_paranoid'
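
Before changing the setting, it can help to check the current value. The sketch below reads the same sysctl file and reports whether kernel events are likely to be permitted for an unprivileged process. The helper names are hypothetical, and the threshold (restricted at level 2 and above) follows the mainline kernel documentation; some distributions patch in stricter levels.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseParanoid parses the contents of /proc/sys/kernel/perf_event_paranoid.
func parseParanoid(contents string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(contents))
}

// kernelEventsAllowed reports whether an unprivileged process may measure
// kernel events. Assumption: mainline semantics, where level 2 and above
// disallow kernel profiling without extra privileges.
func kernelEventsAllowed(level int) bool {
	return level < 2
}

func main() {
	raw, err := os.ReadFile("/proc/sys/kernel/perf_event_paranoid")
	if err != nil {
		fmt.Println("cannot read perf_event_paranoid:", err)
		return
	}
	level, err := parseParanoid(string(raw))
	if err != nil {
		fmt.Println("unexpected contents:", err)
		return
	}
	fmt.Printf("perf_event_paranoid=%d, kernel events allowed: %t\n",
		level, kernelEventsAllowed(level))
}
```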

Usage:

package main

import (
	"github.com/tezc/goperf"
	"hash/crc32"
	"os"
)

func main() {

	var x uint32 = 0
	data := []byte("hello world!")

	goperf.Start()

	// long running code
	for i := 0; i < 10000000; i++ {
		x += crc32.ChecksumIEEE(data)
	}

	goperf.End()

	os.Exit(int(x))
}

Output:

| Event                     | Value              | Measurement time  
---------------------------------------------------------------
| time (seconds)            | 0.22               | (100,00%)  
| cpu-clock                 | 219,754,463.00     | (100.00%)  
| task-clock                | 219,754,469.00     | (100.00%)  
| page-faults               | 4.00               | (100.00%)  
| context-switches          | 0.00               | (100.00%)  
| cpu-migrations            | 0.00               | (100.00%)  
| page-fault-minor          | 4.00               | (100.00%)  
| cpu-cycles                | 911,266,695.00     | (100.00%)  
| instructions              | 2,300,365,430.00   | (100.00%)  
| cache-misses              | 14,812.00          | (100.00%)  
| L1D-read-miss             | 11,385.00          | (100.00%)  
| L1I-read-miss             | 48,424.00          | (100.00%)  
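
The raw counts are often easier to interpret as ratios. As a small worked example, the sketch below derives instructions per cycle (IPC) and cache misses per million instructions from the sample report above. The helper name ratio is hypothetical and not part of goperf; the numbers are copied from the report.

```go
package main

import "fmt"

// ratio returns num/den, or 0 when den is 0.
func ratio(num, den float64) float64 {
	if den == 0 {
		return 0
	}
	return num / den
}

func main() {
	// Values copied from the sample report.
	const (
		cycles       = 911266695.0
		instructions = 2300365430.0
		cacheMisses  = 14812.0
	)

	// Instructions retired per CPU cycle.
	fmt.Printf("IPC: %.2f\n", ratio(instructions, cycles))

	// Cache misses normalized per one million instructions.
	fmt.Printf("cache misses per 1M instructions: %.2f\n",
		ratio(cacheMisses, instructions)*1e6)
}
```

For this run the IPC comes out at roughly 2.5, which suggests the loop is not dominated by memory stalls.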
 

You can enable or disable counters; check the Counters table for the full list:

package main

import (
	"github.com/tezc/goperf"
	"hash/crc32"
	"os"
)

func main() {

	var x uint32 = 0
	data := []byte("hello world!")

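	// Turn off two counters that are enabled by default.
	goperf.Disable("page-faults")
	goperf.Disable("context-switches")
	// L1D-read-miss is already enabled by default in the Counters table;
	// pass a default-disabled name such as "branch-misses" to add a counter.
	goperf.Enable("L1D-read-miss")
	goperf.Start()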

	// long running code
	for i := 0; i < 10000000; i++ {
		x += crc32.ChecksumIEEE(data)
	}

	goperf.End()

	os.Exit(int(x))
}

Run multiple times:

package main

import (
	"github.com/tezc/goperf"
	"hash/crc32"
	"os"
)

func main() {

	var x uint32 = 0
	data := []byte("hello world!")
	
	// First measured region.
	goperf.Start()
	// long running code
	for i := 0; i < 10000000; i++ {
		x += crc32.ChecksumIEEE(data)
	}
	goperf.End()

	test := []byte("test!")

	// Second measured region.
	goperf.Start()
	// long running code
	for i := 0; i < 10000000; i++ {
		x += crc32.ChecksumIEEE(test)
	}
	goperf.End()

	os.Exit(int(x))
}

Documentation

Index

Constants

This section is empty.

Variables

var Counters = [...]counter{
	{"cpu-clock", unix.PERF_TYPE_SOFTWARE, unix.PERF_COUNT_SW_CPU_CLOCK, true},
	{"task-clock", unix.PERF_TYPE_SOFTWARE, unix.PERF_COUNT_SW_TASK_CLOCK, true},
	{"page-faults", unix.PERF_TYPE_SOFTWARE, unix.PERF_COUNT_SW_PAGE_FAULTS, true},
	{"context-switches", unix.PERF_TYPE_SOFTWARE, unix.PERF_COUNT_SW_CONTEXT_SWITCHES, true},
	{"cpu-migrations", unix.PERF_TYPE_SOFTWARE, unix.PERF_COUNT_SW_CPU_MIGRATIONS, true},
	{"page-fault-minor", unix.PERF_TYPE_SOFTWARE, unix.PERF_COUNT_SW_PAGE_FAULTS_MIN, true},
	{"page-fault-major", unix.PERF_TYPE_SOFTWARE, unix.PERF_COUNT_SW_PAGE_FAULTS_MAJ, false},
	{"alignment-faults", unix.PERF_TYPE_SOFTWARE, unix.PERF_COUNT_SW_ALIGNMENT_FAULTS, false},
	{"emulation-faults", unix.PERF_TYPE_SOFTWARE, unix.PERF_COUNT_SW_EMULATION_FAULTS, false},
	{"cpu-cycles", unix.PERF_TYPE_HARDWARE, unix.PERF_COUNT_HW_CPU_CYCLES, true},
	{"instructions", unix.PERF_TYPE_HARDWARE, unix.PERF_COUNT_HW_INSTRUCTIONS, true},
	{"cache-references", unix.PERF_TYPE_HARDWARE, unix.PERF_COUNT_HW_CACHE_REFERENCES, false},
	{"cache-misses", unix.PERF_TYPE_HARDWARE, unix.PERF_COUNT_HW_CACHE_MISSES, true},
	{"branch-instructions", unix.PERF_TYPE_HARDWARE, unix.PERF_COUNT_HW_BRANCH_INSTRUCTIONS, false},
	{"branch-misses", unix.PERF_TYPE_HARDWARE, unix.PERF_COUNT_HW_BRANCH_MISSES, false},
	{"bus-cycles", unix.PERF_TYPE_HARDWARE, unix.PERF_COUNT_HW_BUS_CYCLES, false},
	{"stalled-cycles-frontend", unix.PERF_TYPE_HARDWARE, unix.PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, false},
	{"stalled-cycles-backend", unix.PERF_TYPE_HARDWARE, unix.PERF_COUNT_HW_STALLED_CYCLES_BACKEND, false},
	{"ref-cpu-cycles", unix.PERF_TYPE_HARDWARE, unix.PERF_COUNT_HW_REF_CPU_CYCLES, false},
	{"L1D-read-access", unix.PERF_TYPE_HW_CACHE, cache(pL1D, pREAD, pACCESS), false},
	{"L1D-read-miss", unix.PERF_TYPE_HW_CACHE, cache(pL1D, pREAD, pMISS), true},
	{"L1D-write-access", unix.PERF_TYPE_HW_CACHE, cache(pL1D, pWRITE, pACCESS), false},
	{"L1D-write-miss", unix.PERF_TYPE_HW_CACHE, cache(pL1D, pWRITE, pMISS), false},
	{"L1D-prefetch-access", unix.PERF_TYPE_HW_CACHE, cache(pL1D, pPREFETCH, pACCESS), false},
	{"L1D-prefetch-miss", unix.PERF_TYPE_HW_CACHE, cache(pL1D, pPREFETCH, pMISS), false},
	{"L1I-read-access", unix.PERF_TYPE_HW_CACHE, cache(pL1I, pREAD, pACCESS), false},
	{"L1I-read-miss", unix.PERF_TYPE_HW_CACHE, cache(pL1I, pREAD, pMISS), true},
	{"L1I-write-access", unix.PERF_TYPE_HW_CACHE, cache(pL1I, pWRITE, pACCESS), false},
	{"L1I-write-miss", unix.PERF_TYPE_HW_CACHE, cache(pL1I, pWRITE, pMISS), false},
	{"L1I-prefetch-access", unix.PERF_TYPE_HW_CACHE, cache(pL1I, pPREFETCH, pACCESS), false},
	{"L1I-prefetch-miss", unix.PERF_TYPE_HW_CACHE, cache(pL1I, pPREFETCH, pMISS), false},
	{"LL-read-access", unix.PERF_TYPE_HW_CACHE, cache(pLL, pREAD, pACCESS), false},
	{"LL-read-miss", unix.PERF_TYPE_HW_CACHE, cache(pLL, pREAD, pMISS), false},
	{"LL-write-access", unix.PERF_TYPE_HW_CACHE, cache(pLL, pWRITE, pACCESS), false},
	{"LL-write-miss", unix.PERF_TYPE_HW_CACHE, cache(pLL, pWRITE, pMISS), false},
	{"LL-prefetch-access", unix.PERF_TYPE_HW_CACHE, cache(pLL, pPREFETCH, pACCESS), false},
	{"LL-prefetch-miss", unix.PERF_TYPE_HW_CACHE, cache(pLL, pPREFETCH, pMISS), false},
	{"DTLB-read-access", unix.PERF_TYPE_HW_CACHE, cache(pDTLB, pREAD, pACCESS), false},
	{"DTLB-read-miss", unix.PERF_TYPE_HW_CACHE, cache(pDTLB, pREAD, pMISS), false},
	{"DTLB-write-access", unix.PERF_TYPE_HW_CACHE, cache(pDTLB, pWRITE, pACCESS), false},
	{"DTLB-write-miss", unix.PERF_TYPE_HW_CACHE, cache(pDTLB, pWRITE, pMISS), false},
	{"DTLB-prefetch-access", unix.PERF_TYPE_HW_CACHE, cache(pDTLB, pPREFETCH, pACCESS), false},
	{"DTLB-prefetch-miss", unix.PERF_TYPE_HW_CACHE, cache(pDTLB, pPREFETCH, pMISS), false},
	{"ITLB-read-access", unix.PERF_TYPE_HW_CACHE, cache(pITLB, pREAD, pACCESS), false},
	{"ITLB-read-miss", unix.PERF_TYPE_HW_CACHE, cache(pITLB, pREAD, pMISS), false},
	{"ITLB-write-access", unix.PERF_TYPE_HW_CACHE, cache(pITLB, pWRITE, pACCESS), false},
	{"ITLB-write-miss", unix.PERF_TYPE_HW_CACHE, cache(pITLB, pWRITE, pMISS), false},
	{"ITLB-prefetch-access", unix.PERF_TYPE_HW_CACHE, cache(pITLB, pPREFETCH, pACCESS), false},
	{"ITLB-prefetch-miss", unix.PERF_TYPE_HW_CACHE, cache(pITLB, pPREFETCH, pMISS), false},
	{"BPU-read-access", unix.PERF_TYPE_HW_CACHE, cache(pBPU, pREAD, pACCESS), false},
	{"BPU-read-miss", unix.PERF_TYPE_HW_CACHE, cache(pBPU, pREAD, pMISS), false},
	{"BPU-write-access", unix.PERF_TYPE_HW_CACHE, cache(pBPU, pWRITE, pACCESS), false},
	{"BPU-write-miss", unix.PERF_TYPE_HW_CACHE, cache(pBPU, pWRITE, pMISS), false},
	{"BPU-prefetch-access", unix.PERF_TYPE_HW_CACHE, cache(pBPU, pPREFETCH, pACCESS), false},
	{"BPU-prefetch-miss", unix.PERF_TYPE_HW_CACHE, cache(pBPU, pPREFETCH, pMISS), false},
	{"NODE-read-access", unix.PERF_TYPE_HW_CACHE, cache(pNODE, pREAD, pACCESS), false},
	{"NODE-read-miss", unix.PERF_TYPE_HW_CACHE, cache(pNODE, pREAD, pMISS), false},
	{"NODE-write-access", unix.PERF_TYPE_HW_CACHE, cache(pNODE, pWRITE, pACCESS), false},
	{"NODE-write-miss", unix.PERF_TYPE_HW_CACHE, cache(pNODE, pWRITE, pMISS), false},
	{"NODE-prefetch-access", unix.PERF_TYPE_HW_CACHE, cache(pNODE, pPREFETCH, pACCESS), false},
	{"NODE-prefetch-miss", unix.PERF_TYPE_HW_CACHE, cache(pNODE, pPREFETCH, pMISS), false},
}
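
The unexported cache helper and the pL1D, pREAD and pMISS constants are not shown on this page. As a sketch of what such a helper most likely computes, the perf_event_open(2) man page defines the config value for PERF_TYPE_HW_CACHE events as the cache id, the operation id shifted left by 8 bits, and the result id shifted left by 16 bits, all OR-ed together. The names below are hypothetical stand-ins, with values taken from the Linux perf_event.h enums.

```go
package main

import "fmt"

// Cache ids (perf_hw_cache_id in the Linux UAPI header).
const (
	cacheL1D uint64 = iota
	cacheL1I
	cacheLL
	cacheDTLB
	cacheITLB
	cacheBPU
	cacheNODE
)

// Operation ids (perf_hw_cache_op_id).
const (
	opRead uint64 = iota
	opWrite
	opPrefetch
)

// Result ids (perf_hw_cache_op_result_id).
const (
	resultAccess uint64 = iota
	resultMiss
)

// cacheConfig builds the config value for a PERF_TYPE_HW_CACHE event:
// id | (op << 8) | (result << 16).
func cacheConfig(id, op, result uint64) uint64 {
	return id | op<<8 | result<<16
}

func main() {
	// Corresponds to the "L1D-read-miss" entry in the table.
	fmt.Printf("L1D-read-miss:  %#x\n", cacheConfig(cacheL1D, opRead, resultMiss))
	// Corresponds to the "LL-write-miss" entry in the table.
	fmt.Printf("LL-write-miss:  %#x\n", cacheConfig(cacheLL, opWrite, resultMiss))
}
```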

Functions

func Disable

func Disable(counter string)

Disable disables a counter by name; see the Counters array for the list of names.

func Enable

func Enable(counter string)

Enable enables a counter by name; see the Counters array for the list of names. Hardware counters are limited on your CPU (about 7 these days), and some performance counters cannot be enabled at the same time. Counters that are unsupported by the OS or by your CPU will fail. Some counters are scheduled on the same PMU register and will therefore be multiplexed; check the measurement time in the output to see whether this is the case.

func End

func End()

func Pause

func Pause()

func Start

func Start()

Types

This section is empty.
