compile

package
v0.9.2202
Published: Apr 21, 2024 License: BSD-3-Clause Imports: 18 Imported by: 0

README

Introduction to the Go compiler

cmd/compile contains the main packages that form the Go compiler. The compiler may be logically split into four phases, which we will briefly describe alongside the list of packages that contain their code.

You may sometimes hear the terms "front-end" and "back-end" when referring to the compiler. Roughly speaking, these translate to the first two and last two phases we are going to list here. A third term, "middle-end", often refers to much of the work that happens in the second phase.

Note that the go/* family of packages, such as go/parser and go/types, are mostly unused by the compiler. Since the compiler was initially written in C, the go/* packages were developed to enable writing tools working with Go code, such as gofmt and vet. However, over time the compiler's internal APIs have slowly evolved to be more familiar to users of the go/* packages.

It should be clarified that the name "gc" stands for "Go compiler", and has little to do with uppercase "GC", which stands for garbage collection.

1. Parsing

  • cmd/compile/internal/syntax (lexer, parser, syntax tree)

In the first phase of compilation, source code is tokenized (lexical analysis), parsed (syntax analysis), and a syntax tree is constructed for each source file.

Each syntax tree is an exact representation of the respective source file, with nodes corresponding to the various elements of the source such as expressions, declarations, and statements. The syntax tree also includes position information which is used for error reporting and the creation of debugging information.
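
The compiler's syntax package is internal and cannot be imported by ordinary programs, but the exported go/parser and go/ast packages expose a closely related model. Below is a minimal sketch (the file name and source text are made up) of tokenizing and parsing a file into a position-annotated tree:

    package main

    import (
        "fmt"
        "go/ast"
        "go/parser"
        "go/token"
    )

    func main() {
        src := "package p\nfunc add(a, b int) int { return a + b }\n"

        // A FileSet records the position information used for error reporting.
        fset := token.NewFileSet()
        f, err := parser.ParseFile(fset, "demo.go", src, 0)
        if err != nil {
            panic(err)
        }

        // Walk the tree and report each function declaration with its position.
        ast.Inspect(f, func(n ast.Node) bool {
            if fn, ok := n.(*ast.FuncDecl); ok {
                fmt.Printf("%s: func %s\n", fset.Position(fn.Pos()), fn.Name.Name)
            }
            return true
        })
    }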

2. Type checking

  • cmd/compile/internal/types2 (type checking)

The types2 package is a port of go/types to use the syntax package's AST instead of go/ast.
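
Like syntax, types2 is internal to the compiler, but its exported counterpart go/types behaves analogously (operating on go/ast trees). A minimal sketch (the snippet being checked is made up) that type-checks a parsed file and prints the objects it declares:

    package main

    import (
        "fmt"
        "go/ast"
        "go/parser"
        "go/token"
        "go/types"
    )

    func main() {
        src := "package p\nfunc add(a, b int) int { return a + b }\nvar answer = add(40, 2)\n"
        fset := token.NewFileSet()
        f, err := parser.ParseFile(fset, "demo.go", src, 0)
        if err != nil {
            panic(err)
        }

        // Defs records the object declared by each identifier in the file.
        info := &types.Info{Defs: make(map[*ast.Ident]types.Object)}
        var conf types.Config
        if _, err := conf.Check("p", fset, []*ast.File{f}, info); err != nil {
            panic(err)
        }
        for id, obj := range info.Defs {
            if obj != nil { // nil for identifiers that do not denote objects, e.g. the package name
                fmt.Printf("%s has type %s\n", id.Name, obj.Type())
            }
        }
    }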

3. IR construction ("noding")

  • cmd/compile/internal/types (compiler types)
  • cmd/compile/internal/ir (compiler AST)
  • cmd/compile/internal/noder (create compiler AST)

The compiler middle end uses its own AST definition and representation of Go types carried over from when it was written in C. All of its code is written in terms of these, so the next step after type checking is to convert the syntax and types2 representations to ir and types. This process is referred to as "noding."

Noding uses a process called Unified IR, which builds a node representation from a serialized version of the type-checked code from step 2. Unified IR is also involved in import/export of packages and inlining.

4. Middle end

  • cmd/compile/internal/deadcode (dead code elimination)
  • cmd/compile/internal/inline (function call inlining)
  • cmd/compile/internal/devirtualize (devirtualization of known interface method calls)
  • cmd/compile/internal/escape (escape analysis)

Several optimization passes are performed on the IR representation: dead code elimination, (early) devirtualization, function call inlining, and escape analysis.
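
The effect of these passes on ordinary code can be observed through the compiler's diagnostics. In the sketch below (the package, type, and function names are made up), building with go build -gcflags=-m reports the inlining and escape-analysis decisions noted in the comments; the exact messages vary between releases.

    package p

    type point struct{ x, y int }

    // Small and simple, so the inliner will normally inline calls to it.
    func sum(pt *point) int { return pt.x + pt.y }

    func stackOnly() int {
        pt := point{1, 2} // &pt does not escape: pt can stay on the stack
        return sum(&pt)
    }

    func escapes() *point {
        pt := point{3, 4} // pt escapes to the heap: its address is returned
        return &pt
    }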

5. Walk

  • cmd/compile/internal/walk (order of evaluation, desugaring)

The final pass over the IR representation is "walk," which serves two purposes:

  1. It decomposes complex statements into individual, simpler statements, introducing temporary variables and respecting order of evaluation. This step is also referred to as "order."

  2. It desugars higher-level Go constructs into more primitive ones. For example, switch statements are turned into binary search or jump tables, and operations on maps and channels are replaced with runtime calls.
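
As a concrete sketch of the second point (the names here are made up), after walk the map and channel operations below are ordinary calls into the runtime, which can be confirmed by inspecting the output of go build -gcflags=-S:

    package p

    func desugared(m map[string]int, ch chan int) {
        m["k"] = 1 // lowered to a runtime map-assignment helper such as runtime.mapassign_faststr
        ch <- 2    // lowered to a call to runtime.chansend1
    }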

6. Generic SSA

  • cmd/compile/internal/ssa (SSA passes and rules)
  • cmd/compile/internal/ssagen (converting IR to SSA)

In this phase, IR is converted into Static Single Assignment (SSA) form, a lower-level intermediate representation with specific properties that make it easier to implement optimizations and to eventually generate machine code from it.

During this conversion, function intrinsics are applied. These are special functions that the compiler has been taught to replace with heavily optimized code on a case-by-case basis.
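
Functions in packages such as math/bits and sync/atomic are commonly intrinsified. In the sketch below (the package and function names are made up), the call typically compiles to the target's bit-manipulation instructions rather than to a function call; GOSSAFUNC=trailing go build shows the replacement happening during SSA construction.

    package p

    import "math/bits"

    func trailing(x uint64) int {
        // Intrinsified: replaced during SSA construction with dedicated
        // count-trailing-zeros instructions instead of a call into math/bits.
        return bits.TrailingZeros64(x)
    }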

Certain nodes are also lowered into simpler components during the AST to SSA conversion, so that the rest of the compiler can work with them. For instance, the copy builtin is replaced by memory moves, and range loops are rewritten into for loops. Some of these currently happen before the conversion to SSA due to historical reasons, but the long-term plan is to move all of them here.

Then, a series of machine-independent passes and rules are applied. These do not concern any single computer architecture, and thus run on all GOARCH variants. These passes include dead code elimination, removal of unneeded nil checks, and removal of unused branches. The generic rewrite rules mainly concern expressions, such as replacing some expressions with constant values, and optimizing multiplications and float operations.
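
A sketch of code these generic passes simplify (the package and function names are made up): the multiplication by a constant power of two is strength-reduced, the second implicit nil check on p is removed as redundant, and the bounds check inside the loop is eliminated once the index is proven to be in range (see the bounds-check flag in the Tips section below).

    package p

    type pair struct{ a, b int }

    func shifted(x int) int {
        return x * 8 // multiplication by a power of two is strength-reduced, e.g. to a shift
    }

    func fields(p *pair) int {
        return p.a + p.b // one nil check on p suffices; the second is removed as redundant
    }

    func sum(xs []int) (s int) {
        for i := 0; i < len(xs); i++ {
            s += xs[i] // bounds check eliminated: i is provably within range
        }
        return s
    }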

7. Generating machine code

  • cmd/compile/internal/ssa (SSA lowering and arch-specific passes)
  • cmd/internal/obj (machine code generation)

The machine-dependent phase of the compiler begins with the "lower" pass, which rewrites generic values into their machine-specific variants. For example, on amd64 memory operands are possible, so many load-store operations may be combined.

Note that the lower pass runs all machine-specific rewrite rules, and thus it currently applies lots of optimizations too.

Once the SSA has been "lowered" and is more specific to the target architecture, the final code optimization passes are run. This includes yet another dead code elimination pass, moving values closer to their uses, the removal of local variables that are never read from, and register allocation.

Other important pieces of work done as part of this step include stack frame layout, which assigns stack offsets to local variables, and pointer liveness analysis, which computes which on-stack pointers are live at each GC safe point.

At the end of the SSA generation phase, Go functions have been transformed into a series of obj.Prog instructions. These are passed to the assembler (cmd/internal/obj), which turns them into machine code and writes out the final object file. The object file will also contain reflect data, export data, and debugging information.

8. Tips

Getting Started
  • If you have never contributed to the compiler before, a simple way to begin can be adding a log statement or panic("here") to get some initial insight into whatever you are investigating.

  • The compiler itself provides logging, debugging and visualization capabilities, such as:

    $ go build -gcflags=-m=2                   # print optimization info, including inlining, escape analysis
    $ go build -gcflags=-d=ssa/check_bce/debug # print bounds check info
    $ go build -gcflags=-W                     # print internal parse tree after type checking
    $ GOSSAFUNC=Foo go build                   # generate ssa.html file for func Foo
    $ go build -gcflags=-S                     # print assembly
    $ go tool compile -bench=out.txt x.go      # print timing of compiler phases
    

    Some flags alter the compiler behavior, such as:

    $ go tool compile -h file.go               # panic on first compile error encountered
    $ go build -gcflags=-d=checkptr=2          # enable additional unsafe pointer checking
    

    There are many additional flags. Some descriptions are available via:

    $ go tool compile -h              # compiler flags, e.g., go build -gcflags='-m=1 -l'
    $ go tool compile -d help         # debug flags, e.g., go build -gcflags=-d=checkptr=2
    $ go tool compile -d ssa/help     # ssa flags, e.g., go build -gcflags=-d=ssa/prove/debug=2
    

    There are some additional details about -gcflags and the differences between go build and go tool compile in a section below.

  • In general, when investigating a problem in the compiler you usually want to start with the simplest possible reproduction and understand exactly what is happening with it.

Testing your changes
  • Be sure to read the Quickly testing your changes section of the Go Contribution Guide.

  • Some tests live within the cmd/compile packages and can be run by go test ./... or similar, but many cmd/compile tests are in the top-level test directory:

    $ go test cmd/internal/testdir                           # all tests in 'test' dir
    $ go test cmd/internal/testdir -run='Test/escape.*.go'   # test specific files in 'test' dir
    

    For details, see the testdir README. The errorCheck method in testdir_test.go provides a useful description of the ERROR comments used in many of those tests.

    In addition, the go/types package from the standard library and cmd/compile/internal/types2 have shared tests in src/internal/types/testdata, and both type checkers should be checked if anything changes there.

  • The new application-based coverage profiling can be used with the compiler, such as:

    $ go install -cover -coverpkg=cmd/compile/... cmd/compile  # build compiler with coverage instrumentation
    $ mkdir /tmp/coverdir                                      # pick location for coverage data
    $ GOCOVERDIR=/tmp/coverdir go test [...]                   # use compiler, saving coverage data
    $ go tool covdata textfmt -i=/tmp/coverdir -o coverage.out # convert to traditional coverage format
    $ go tool cover -html coverage.out                         # view coverage via traditional tools
    
Juggling compiler versions
  • Many of the compiler tests use the version of the go command found in your PATH and its corresponding compile binary.

  • If you are in a branch and your PATH includes <go-repo>/bin, doing go install cmd/compile will build the compiler using the code from your branch and install it to the proper location so that subsequent go commands like go build or go test ./... will exercise your freshly built compiler.

  • toolstash provides a way to save, run, and restore a known good copy of the Go toolchain. For example, it can be a good practice to initially build your branch, save that version of the toolchain, then restore the known good version of the tools to compile your work-in-progress version of the compiler.

    Sample set up steps:

    $ go install golang.org/x/tools/cmd/toolstash@latest
    $ git clone https://go.googlesource.com/go
    $ cd go
    $ git checkout -b mybranch
    $ ./src/all.bash               # build and confirm good starting point
    $ export PATH=$PWD/bin:$PATH
    $ toolstash save               # save current tools
    

    After that, your edit/compile/test cycle can be similar to:

    <... make edits to cmd/compile source ...>
    $ toolstash restore && go install cmd/compile   # restore known good tools to build compiler
    <... 'go build', 'go test', etc. ...>           # use freshly built compiler
    
  • toolstash also allows comparing the installed vs. stashed copy of the compiler, such as if you expect equivalent behavior after a refactor. For example, to check that your changed compiler produces identical object files to the stashed compiler while building the standard library:

    $ toolstash restore && go install cmd/compile   # build latest compiler
    $ go build -toolexec "toolstash -cmp" -a -v std # compare latest vs. saved compiler
    
  • If versions appear to get out of sync (for example, with errors like linked object header mismatch with version strings like devel go1.21-db3f952b1f), you might need to do toolstash restore && go install cmd/... to update all the tools under cmd.

Additional helpful tools
  • compilebench benchmarks the speed of the compiler.

  • benchstat is the standard tool for reporting performance changes resulting from compiler modifications, including whether any improvements are statistically significant:

    $ go test -bench=SomeBenchmarks -count=20 > new.txt   # use new compiler
    $ toolstash restore                                   # restore old compiler
    $ go test -bench=SomeBenchmarks -count=20 > old.txt   # use old compiler
    $ benchstat old.txt new.txt                           # compare old vs. new
    
  • bent facilitates running a large set of benchmarks from various community Go projects inside a Docker container.

  • perflock helps obtain more consistent benchmark results, including by manipulating CPU frequency scaling settings on Linux.

  • view-annotated-file (from the community) overlays inlining, bounds check, and escape info back onto the source code.

  • godbolt.org is widely used to examine and share assembly output from many compilers, including the Go compiler. It can also compare assembly for different versions of a function or across Go compiler versions, which can be helpful for investigations and bug reports.

-gcflags and 'go build' vs. 'go tool compile'
  • -gcflags is a go command build flag. go build -gcflags=<args> passes the supplied <args> to the underlying compile invocation(s) while still doing everything that the go build command normally does (e.g., handling the build cache, modules, and so on). In contrast, go tool compile <args> asks the go command to invoke compile <args> a single time without involving the standard go build machinery. In some cases, it can be helpful to have fewer moving parts by doing go tool compile <args>, such as if you have a small standalone source file that can be compiled without any assistance from go build. In other cases, it is more convenient to pass -gcflags to a build command like go build, go test, or go install.

  • -gcflags by default applies to the packages named on the command line, but can use package patterns such as -gcflags='all=-m=1 -l', or multiple package patterns such as -gcflags='all=-m=1' -gcflags='fmt=-m=2'. For details, see the cmd/go documentation.
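
    For example (the package pattern and file name here are only illustrative):

    $ go build -gcflags='all=-m=1 -l' ./...   # pass "-m=1 -l" to the compile of every package in the build
    $ go tool compile -p=p -m p.go            # invoke the compiler directly on a standalone, import-free file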

Further reading

To dig deeper into how the SSA package works, including its passes and rules, head to cmd/compile/internal/ssa/README.md.

Finally, if something in this README or the SSA README is unclear or if you have an idea for an improvement, feel free to leave a comment in issue 30074.

Documentation

Overview

Compile, typically invoked as “go tool compile,” compiles a single Go package comprising the files named on the command line. It then writes a single object file named for the basename of the first source file with a .o suffix. The object file can then be combined with other objects into a package archive or passed directly to the linker (“go tool link”). If invoked with -pack, the compiler writes an archive directly, bypassing the intermediate object file.

The generated files contain type information about the symbols exported by the package and about types used by symbols imported by the package from other packages. It is therefore not necessary when compiling client C of package P to read the files of P's dependencies, only the compiled output of P.
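
For example, a small package in a single file with no imports can be compiled directly (the names p and p.go are illustrative; recent versions of the compiler also require the -p flag giving the package's import path):

go tool compile -p=p p.go        # writes the object file p.o
go tool compile -p=p -pack p.go  # writes the archive p.a directly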

Command Line

Usage:

go tool compile [flags] file...

The specified files must be Go source files and all part of the same package. The same compiler is used for all target operating systems and architectures. The GOOS and GOARCH environment variables set the desired target.

Flags:

-D path
	Set relative path for local imports.
-I dir1 -I dir2
	Search for imported packages in dir1, dir2, etc,
	after consulting $GOROOT/pkg/$GOOS_$GOARCH.
-L
	Show complete file path in error messages.
-N
	Disable optimizations.
-S
	Print assembly listing to standard output (code only).
-S -S
	Print assembly listing to standard output (code and data).
-V
	Print compiler version and exit.
-asmhdr file
	Write assembly header to file.
-asan
	Insert calls to C/C++ address sanitizer.
-buildid id
	Record id as the build id in the export metadata.
-blockprofile file
	Write block profile for the compilation to file.
-c int
	Concurrency during compilation. Set 1 for no concurrency (default is 1).
-complete
	Assume package has no non-Go components.
-cpuprofile file
	Write a CPU profile for the compilation to file.
-dynlink
	Allow references to Go symbols in shared libraries (experimental).
-e
	Remove the limit on the number of errors reported (default limit is 10).
-goversion string
	Specify required go tool version of the runtime.
	Exits when the runtime go version does not match goversion.
-h
	Halt with a stack trace at the first error detected.
-importcfg file
	Read import configuration from file.
	In the file, set importmap, packagefile to specify import resolution.
-installsuffix suffix
	Look for packages in $GOROOT/pkg/$GOOS_$GOARCH_suffix
	instead of $GOROOT/pkg/$GOOS_$GOARCH.
-l
	Disable inlining.
-lang version
	Set language version to compile, as in -lang=go1.12.
	Default is current version.
-linkobj file
	Write linker-specific object to file and compiler-specific
	object to usual output file (as specified by -o).
	Without this flag, the -o output is a combination of both
	linker and compiler input.
-m
	Print optimization decisions. Higher values or repetition
	produce more detail.
-memprofile file
	Write memory profile for the compilation to file.
-memprofilerate rate
	Set runtime.MemProfileRate for the compilation to rate.
-msan
	Insert calls to C/C++ memory sanitizer.
-mutexprofile file
	Write mutex profile for the compilation to file.
-nolocalimports
	Disallow local (relative) imports.
-o file
	Write object to file (default file.o or, with -pack, file.a).
-p path
	Set expected package import path for the code being compiled,
	and diagnose imports that would cause a circular dependency.
-pack
	Write a package (archive) file rather than an object file.
-race
	Compile with race detector enabled.
-s
	Warn about composite literals that can be simplified.
-shared
	Generate code that can be linked into a shared library.
-spectre list
	Enable spectre mitigations in list (all, index, ret).
-traceprofile file
	Write an execution trace to file.
-trimpath prefix
	Remove prefix from recorded source file paths.

Flags related to debugging information:

-dwarf
	Generate DWARF symbols.
-dwarflocationlists
	Add location lists to DWARF in optimized mode.
-gendwarfinl int
	Generate DWARF inline info records (default 2).

Flags to debug the compiler itself:

-E
	Debug symbol export.
-K
	Debug missing line numbers.
-d list
	Print debug information about items in list. Try -d help for further information.
-live
	Debug liveness analysis.
-v
	Increase debug verbosity.
-%
	Debug non-static initializers.
-W
	Debug parse tree after type checking.
-f
	Debug stack frames.
-i
	Debug line number stack.
-j
	Debug runtime-initialized variables.
-r
	Debug generated wrappers.
-w
	Debug type checking.

Compiler Directives

The compiler accepts directives in the form of comments. To distinguish them from non-directive comments, directives require no space between the comment opening and the name of the directive. However, since they are comments, tools unaware of the directive convention or of a particular directive can skip over a directive like any other comment.

Line directives come in several forms:

//line :line
//line :line:col
//line filename:line
//line filename:line:col
/*line :line*/
/*line :line:col*/
/*line filename:line*/
/*line filename:line:col*/

In order to be recognized as a line directive, the comment must start with //line or /*line followed by a space, and must contain at least one colon. The //line form must start at the beginning of a line. A line directive specifies the source position for the character immediately following the comment as having come from the specified file, line and column: For a //line comment, this is the first character of the next line, and for a /*line comment this is the character position immediately following the closing */. If no filename is given, the recorded filename is empty if there is also no column number; otherwise it is the most recently recorded filename (actual filename or filename specified by previous line directive). If a line directive doesn't specify a column number, the column is "unknown" until the next directive and the compiler does not report column numbers for that range. The line directive text is interpreted from the back: First the trailing :ddd is peeled off from the directive text if ddd is a valid number > 0. Then the second :ddd is peeled off the same way if it is valid. Anything before that is considered the filename (possibly including blanks and colons). Invalid line or column values are reported as errors.

Examples:

//line foo.go:10      the filename is foo.go, and the line number is 10 for the next line
//line C:foo.go:10    colons are permitted in filenames, here the filename is C:foo.go, and the line is 10
//line  a:100 :10     blanks are permitted in filenames, here the filename is " a:100 " (excluding quotes)
/*line :10:20*/x      the position of x is in the current file with line number 10 and column number 20
/*line foo: 10 */     this comment is recognized as invalid line directive (extra blanks around line number)

Line directives typically appear in machine-generated code, so that compilers and debuggers will report positions in the original input to the generator.

The line directive is a historical special case; all other directives are of the form //go:name, indicating that they are defined by the Go toolchain. Each directive must be placed on its own line, with only leading spaces and tabs allowed before the comment. Each directive applies to the Go code that immediately follows it, which typically must be a declaration.

//go:noescape

The //go:noescape directive must be followed by a function declaration without a body (meaning that the function has an implementation not written in Go). It specifies that the function does not allow any of the pointers passed as arguments to escape into the heap or into the values returned from the function. This information can be used during the compiler's escape analysis of Go code calling the function.
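
A hypothetical sketch: dotprod below is implemented in an assembly file that is not shown, and the directive records that the slices' underlying pointers do not escape through the call, so callers' backing arrays may remain on the stack.

package vec

//go:noescape
func dotprod(a, b []float64) float64 // implemented in assembly, not in Go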

//go:uintptrescapes

The //go:uintptrescapes directive must be followed by a function declaration. It specifies that the function's uintptr arguments may be pointer values that have been converted to uintptr and must be on the heap and kept alive for the duration of the call, even though from the types alone it would appear that the object is no longer needed during the call. The conversion from pointer to uintptr must appear in the argument list of any call to this function. This directive is necessary for some low-level system call implementations and should be avoided otherwise.
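
A hypothetical, Unix-flavored sketch: the arg parameter may really be a pointer converted at the call site, and the directive keeps the referenced object alive (allocating it on the heap if necessary) for the duration of the call.

package sys

import "syscall"

//go:uintptrescapes
func ioctl(fd int, req uint, arg uintptr) error {
	_, _, errno := syscall.Syscall(syscall.SYS_IOCTL, uintptr(fd), uintptr(req), arg)
	if errno != 0 {
		return errno
	}
	return nil
}

// Callers perform the conversion directly in the argument list, for example:
//
//	err := ioctl(fd, request, uintptr(unsafe.Pointer(&buf)))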

//go:noinline

The //go:noinline directive must be followed by a function declaration. It specifies that calls to the function should not be inlined, overriding the compiler's usual optimization rules. This is typically only needed for special runtime functions or when debugging the compiler.
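
A minimal sketch (the names are made up): keeping add as a real call, for instance when measuring call overhead or isolating a suspected inlining problem; go build -gcflags=-m shows the changed inlining decision.

package p

//go:noinline
func add(a, b int) int { return a + b }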

//go:norace

The //go:norace directive must be followed by a function declaration. It specifies that the function's memory accesses must be ignored by the race detector. This is most commonly used in low-level code invoked at times when it is unsafe to call into the race detector runtime.

//go:nosplit

The //go:nosplit directive must be followed by a function declaration. It specifies that the function must omit its usual stack overflow check. This is most commonly used by low-level runtime code invoked at times when it is unsafe for the calling goroutine to be preempted.

//go:linkname localname [importpath.name]

The //go:linkname directive conventionally precedes the var or func declaration named by “localname”, though its position does not change its effect. This directive determines the object-file symbol used for a Go var or func declaration, allowing two Go symbols to alias the same object-file symbol, thereby enabling one package to access a symbol in another package even when this would violate the usual encapsulation of unexported declarations, or even type safety. For that reason, it is only enabled in files that have imported "unsafe".

It may be used in two scenarios. Let's assume that package upper imports package lower, perhaps indirectly. In the first scenario, package lower defines a symbol whose object file name belongs to package upper. Both packages contain a linkname directive: package lower uses the two-argument form and package upper uses the one-argument form. In the example below, lower.f is an alias for the function upper.g:

package upper
import _ "unsafe"
//go:linkname g
func g()

package lower
import _ "unsafe"
//go:linkname f upper.g
func f() { ... }

The linkname directive in package upper suppresses the usual error for a function that lacks a body. (That check may alternatively be suppressed by including a .s file, even an empty one, in the package.)

In the second scenario, package upper unilaterally creates an alias for a symbol in package lower. In the example below, upper.g is an alias for the function lower.f.

package upper
import _ "unsafe"
//go:linkname g lower.f
func g()

package lower
func f() { ... }

The declaration of lower.f may also have a linkname directive with a single argument, f. This is optional, but helps alert the reader that the function is accessed from outside the package.

//go:wasmimport importmodule importname

The //go:wasmimport directive is wasm-only and must be followed by a function declaration. It specifies that the function is provided by a wasm module identified by “importmodule” and “importname”.

//go:wasmimport a_module f
func g()

The types of parameters and return values to the Go function are translated to Wasm according to the following table:

Go types        Wasm types
int32, uint32   i32
int64, uint64   i64
float32         f32
float64         f64
unsafe.Pointer  i32

Any other parameter types are disallowed by the compiler.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Main

func Main()

Types

This section is empty.

Directories

Path Synopsis
Package flag implements command-line flag parsing.
internal
abi
abt
arm
compare
Package compare contains code for generating comparison routines for structs, strings and interfaces.
devirtualize
Package devirtualize implements two "devirtualization" optimization passes:
gc
importer
package importer implements Import for gc-generated object files.
inline/interleaved
Package interleaved implements the interleaved devirtualization and inlining pass.
ir
loopvar
Package loopvar applies the proper variable capture, according to experiment, flags, language version, etc.
pgo
pgo/internal/graph
Package graph represents a pprof profile as a directed graph.
rangefunc
Package rangefunc rewrites range-over-func to code that doesn't use range-over-funcs.
rttype
Package rttype allows the compiler to share type information with the runtime.
ssa
ssa/_gen
The gen command generates Go code (in the parent directory) for all the architecture-specific opcodes, blocks, and rewrites.
types2
Package types declares the data types and implements the algorithms for type-checking of Go packages.
x86
