Originally, Go object files were Plan 9 object files, but no longer. Now they are more like standard object files, in that each symbol is defined by an associated memory image (bytes) and a list of relocations to apply during linking. We do not (yet?) use a standard file format, however. For now, the format is chosen to be as simple as possible to read and write. It may change for reasons of efficiency, or we may even switch to a standard file format if there are compelling benefits to doing so. See golang.org/s/go13linker for more background.
The file format is:
- magic header: "\x00\x00go19ld"
- byte 1 - version number
- sequence of strings giving dependencies (imported packages)
- empty string (marks end of sequence)
- sequence of symbol references used by the defined symbols
- byte 0xff (marks end of sequence)
- sequence of integer lengths:
    - total data length
    - total number of relocations
    - total number of pcdata
    - total number of automatics
    - total number of funcdata
    - total number of files
- data, the content of the defined symbols
- sequence of defined symbols
- byte 0xff (marks end of sequence)
- magic footer: "\xff\xffgo19ld"
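The framing described above can be checked with a few lines of Go. This is a minimal sketch, not the toolchain's reader: hasObjMagic is a hypothetical helper, and it only looks at the header and footer strings given in the list, not at the contents in between.

```go
package main

import (
	"bytes"
	"fmt"
)

// Magic strings as given in the format description.
const (
	magicHeader = "\x00\x00go19ld"
	magicFooter = "\xff\xffgo19ld"
)

// hasObjMagic is a hypothetical helper. It reports whether data starts
// with the magic header and ends with the magic footer.
func hasObjMagic(data []byte) bool {
	return bytes.HasPrefix(data, []byte(magicHeader)) &&
		bytes.HasSuffix(data, []byte(magicFooter))
}

func main() {
	// Only the framing is modeled here: header, version byte 1, footer.
	obj := []byte(magicHeader + "\x01" + magicFooter)
	fmt.Println(hasObjMagic(obj))                          // true
	fmt.Println(hasObjMagic([]byte("not an object file"))) // false
}
```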
All integers are stored in a zigzag varint format. See golang.org/s/go12symtab for a definition.
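As an illustration of the zigzag varint encoding, the sketch below uses encoding/binary, whose signed varint functions fold the sign into the low bit and then emit 7 bits per byte. It assumes the object file integers follow that same scheme; encodeInt and decodeInt are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeInt appends v to dst in zigzag varint form.
// binary.PutVarint maps v to an unsigned value with the sign in bit 0,
// then writes it 7 bits at a time, least significant group first.
func encodeInt(dst []byte, v int64) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutVarint(buf[:], v)
	return append(dst, buf[:n]...)
}

// decodeInt reads one zigzag varint from src. It returns the value and
// the number of bytes consumed (n <= 0 signals an error).
func decodeInt(src []byte) (int64, int) {
	return binary.Varint(src)
}

func main() {
	for _, v := range []int64{0, -1, 1, -64, 64, 300} {
		enc := encodeInt(nil, v)
		dec, n := decodeInt(enc)
		fmt.Printf("%4d -> % x -> %d (%d bytes)\n", v, enc, dec, n)
	}
}
```

Small magnitudes of either sign fit in one byte, which is the reason for zigzag coding: -1 becomes the single byte 01 rather than ten bytes of sign extension.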
Data blocks and strings are both stored as an integer followed by that many bytes.
A symbol reference is a string name followed by a version.
A symbol points to other symbols using an index into the symbol reference sequence. Index 0 corresponds to a nil symbol pointer. In the symbol layout described below "symref index" stands for this index.
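The indexing rule can be sketched as below. SymRef and resolve are hypothetical names, and the sketch assumes the first reference in the sequence has index 1, so that index 0 stays free for the nil symbol pointer.

```go
package main

import "fmt"

// SymRef is a hypothetical in-memory form of a symbol reference:
// a string name followed by a version.
type SymRef struct {
	Name    string
	Version int64
}

// resolve maps a symref index to an entry of refs. Index 0 is the nil
// symbol pointer; index i (i >= 1) is assumed to select refs[i-1].
func resolve(refs []SymRef, index int64) (*SymRef, error) {
	if index == 0 {
		return nil, nil
	}
	if index < 0 || index > int64(len(refs)) {
		return nil, fmt.Errorf("symref index %d out of range", index)
	}
	return &refs[index-1], nil
}

func main() {
	refs := []SymRef{{"runtime.main", 0}, {"main.helper", 1}}
	for _, i := range []int64{0, 1, 2, 3} {
		s, err := resolve(refs, i)
		fmt.Println(i, s, err)
	}
}
```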
Each symbol is laid out as the following fields:
- byte 0xfe (sanity check for synchronization)
- type [byte]
- name & version [symref index]
- flags [int]
    1<<0 dupok
    1<<1 local
    1<<2 add to typelink table
- size [int]
- gotype [symref index]
- p [data block]
- nr [int]
- r [nr relocations, sorted by off]
If type == STEXT, there are a few more fields:
- args [int]
- locals [int]
- nosplit [int]
- flags [int]
    1<<0 leaf
    1<<1 C function
    1<<2 function may call reflect.Type.Method
    1<<3 function compiled with -shared
- nlocal [int]
- local [nlocal automatics]
- pcln [pcln table]
Each relocation has the encoding:
- off [int]
- siz [int]
- type [int]
- add [int]
- sym [symref index]
Each local has the encoding:
- asym [symref index]
- offset [int]
- type [int]
- gotype [symref index]
The pcln table has the encoding:
- pcsp [data block]
- pcfile [data block]
- pcline [data block]
- pcinline [data block]
- npcdata [int]
- pcdata [npcdata data blocks]
- nfuncdata [int]
- funcdata [nfuncdata symref index]
- funcdatasym [nfuncdata ints]
- nfile [int]
- file [nfile symref index]
- ninlinedcall [int]
- inlinedcall [ninlinedcall int symref int symref]
The file layout and meaning of type integers are architecture-independent.
TODO(rsc): The file format is good for a first pass but needs work.
- There are SymID in the object file that should really just be strings.
const (
	PCDATA_StackMapIndex       = 0
	PCDATA_InlTreeIndex        = 1
	PCDATA_RegMapIndex         = 2
	FUNCDATA_ArgsPointerMaps   = 0
	FUNCDATA_LocalsPointerMaps = 1
	FUNCDATA_InlTree           = 2
	FUNCDATA_RegPointerMaps    = 3

	// ArgsSizeUnknown is set in Func.argsize to mark all functions
	// whose argument size is unknown (C vararg functions, and
	// assembly code without an explicit specification).
	// This value is generated by the compiler, assembler, or linker.
	ArgsSizeUnknown = -0x80000000
)
const (
	STACKSYSTEM = 0
	StackSystem = STACKSYSTEM
	StackBig    = 4096
	StackGuard  = 880*stackGuardMultiplier + StackSystem
	StackSmall  = 128
	StackLimit  = StackGuard - StackSystem - StackSmall
)
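The arithmetic behind StackLimit can be worked through as follows. The value of stackGuardMultiplier is not shown in this listing, so the sketch takes it as a parameter; stackLimit is a hypothetical helper that mirrors the constant expressions above.

```go
package main

import "fmt"

const stackSmall = 128 // StackSmall from the listing

// stackLimit mirrors the constant expressions:
//	StackGuard = 880*stackGuardMultiplier + StackSystem
//	StackLimit = StackGuard - StackSystem - StackSmall
// The multiplier is a parameter because its value is not given here.
func stackLimit(multiplier, stackSystem int) int {
	guard := 880*multiplier + stackSystem
	return guard - stackSystem - stackSmall
}

func main() {
	fmt.Println(stackLimit(1, 0))   // 752
	fmt.Println(stackLimit(2, 0))   // 1632
	fmt.Println(stackLimit(1, 512)) // 752: StackSystem cancels out
}
```

StackSystem is added into StackGuard and subtracted again in StackLimit, so the limit depends only on the multiplier and StackSmall.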
const (
	KindBool = 1 + iota
	KindInt
	KindInt8
	KindInt16
	KindInt32
	KindInt64
	KindUint
	KindUint8
	KindUint16
	KindUint32
	KindUint64
	KindUintptr
	KindFloat32
	KindFloat64
	KindComplex64
	KindComplex128
	KindArray
	KindChan
	KindFunc
	KindInterface
	KindMap
	KindPtr
	KindSlice
	KindString
	KindStruct
	KindUnsafePointer
	KindDirectIface = 1 << 5
	KindGCProg      = 1 << 6
	KindNoPointers  = 1 << 7
	KindMask        = (1 << 5) - 1
)
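The low five bits of a kind byte hold the base kind and the upper three bits are flags, so KindMask separates the two. The sketch below shows one way to split such a byte; splitKind is a hypothetical helper, and KindPtr is written out as 22, its position in the iota sequence above.

```go
package main

import "fmt"

const (
	KindPtr         = 22 // 22nd entry of the 1 + iota sequence
	KindDirectIface = 1 << 5
	KindGCProg      = 1 << 6
	KindNoPointers  = 1 << 7
	KindMask        = (1 << 5) - 1
)

// splitKind separates the base kind (low five bits) from the flag bits.
func splitKind(k uint8) (base uint8, directIface, gcProg, noPointers bool) {
	return k & KindMask,
		k&KindDirectIface != 0,
		k&KindGCProg != 0,
		k&KindNoPointers != 0
}

func main() {
	k := uint8(KindPtr | KindDirectIface | KindNoPointers)
	base, direct, gcprog, noptr := splitKind(k)
	fmt.Println(base, direct, gcprog, noptr) // 22 true false true
}
```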
const (
	StackPreempt = -1314 // 0xfff...fade
)
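The comment's 0xfff...fade is the two's complement bit pattern of -1314. A short sketch, with hypothetical helper names, makes the pattern visible at both 64 and 32 bits:

```go
package main

import "fmt"

const StackPreempt = -1314

// asUint64 and asUint32 reinterpret a signed value as unsigned, which
// exposes its two's complement bit pattern.
func asUint64(v int64) uint64 { return uint64(v) }
func asUint32(v int32) uint32 { return uint32(v) }

func main() {
	fmt.Printf("%#x\n", asUint64(StackPreempt)) // 0xfffffffffffffade
	fmt.Printf("%#x\n", asUint32(StackPreempt)) // 0xfffffade
}
```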
var (
	GOROOT   = envOr("GOROOT", defaultGOROOT)
	GOARCH   = envOr("GOARCH", defaultGOARCH)
	GOOS     = envOr("GOOS", defaultGOOS)
	GO386    = envOr("GO386", defaultGO386)
	GOARM    = goarm()
	GOMIPS   = gomips()
	GOMIPS64 = gomips64()
	Version  = version
)
AbsFile returns the absolute filename for file in the given directory. It also removes a leading pathPrefix, or else rewrites a leading $GOROOT prefix to the literal "$GOROOT". If the resulting path is the empty string, the result is "??".
func Flagparse(usage func())
PathToPrefix converts raw string to the prefix that will be used in the symbol table. All control characters, space, '%' and '"', as well as non-7-bit clean bytes turn into %xx. The period needs escaping only in the last segment of the path, and it makes for happier users if we escape that as little as possible.
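The escaping rule can be sketched as follows. pathToPrefix is an illustrative reimplementation written from the description above, not the package's own code, and the use of lowercase hex digits is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// pathToPrefix escapes control characters, space, '%', '"' and bytes
// that are not 7-bit clean as %xx. A period is escaped only when it
// appears after the last slash, that is, in the final path segment.
func pathToPrefix(s string) string {
	const hex = "0123456789abcdef" // lowercase digits are an assumption
	lastSlash := strings.LastIndex(s, "/")
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c <= ' ' || c == '%' || c == '"' || c >= 0x7f || (c == '.' && i > lastSlash) {
			b.WriteByte('%')
			b.WriteByte(hex[c>>4])
			b.WriteByte(hex[c&0xf])
			continue
		}
		b.WriteByte(c)
	}
	return b.String()
}

func main() {
	fmt.Println(pathToPrefix("example.com/my pkg/v1.2")) // example.com/my%20pkg/v1%2e2
	fmt.Println(pathToPrefix("main"))                    // main
}
```

The period in example.com is left alone because it precedes the last slash, while the one in v1.2 is escaped.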
WorkingDir returns the current working directory (or "/???" if the directory cannot be identified), with "/" as separator.
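A function with this behavior could be written along these lines. workingDir is a sketch based only on the description above, using os.Getwd and filepath.ToSlash from the standard library.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// workingDir returns the current directory with "/" as the separator,
// or "/???" if the directory cannot be determined.
func workingDir() string {
	dir, err := os.Getwd()
	if err != nil {
		return "/???"
	}
	return filepath.ToSlash(dir)
}

func main() {
	fmt.Println(workingDir())
}
```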
A FuncID identifies particular functions that need to be treated specially by the runtime. Note that in some situations involving plugins, there may be multiple copies of a particular special runtime function. Note: this list must match the list in runtime/symtab.go.
const (
	FuncID_normal FuncID = iota // not a special function
	FuncID_runtime_main
	FuncID_goexit
	FuncID_jmpdefer
	FuncID_mcall
	FuncID_morestack
	FuncID_mstart
	FuncID_rt0_go
	FuncID_asmcgocall
	FuncID_sigpanic
	FuncID_runfinq
	FuncID_gcBgMarkWorker
	FuncID_systemstack_switch
	FuncID_systemstack
	FuncID_cgocallback_gofunc
	FuncID_gogo
	FuncID_externalthreadhandler
	FuncID_debugCallV1
)
HeadType is the executable header type.
const (
	R_ADDR RelocType = 1 + iota
	// R_ADDRPOWER relocates a pair of "D-form" instructions (instructions with 16-bit
	// immediates in the low half of the instruction word), usually addis followed by
	// another add or a load, inserting the "high adjusted" 16 bits of the address of
	// the referenced symbol into the immediate field of the first instruction and the
	// low 16 bits into that of the second instruction.
	R_ADDRPOWER
	// R_ADDRARM64 relocates an adrp, add pair to compute the address of the
	// referenced symbol.
	R_ADDRARM64
	// R_ADDRMIPS (only used on mips/mips64) resolves to the low 16 bits of an external
	// address, by encoding it into the instruction.
	R_ADDRMIPS
	// R_ADDROFF resolves to a 32-bit offset from the beginning of the section
	// holding the data being relocated to the referenced symbol.
	R_ADDROFF
	// R_WEAKADDROFF resolves just like R_ADDROFF but is a weak relocation.
	// A weak relocation does not make the symbol it refers to reachable,
	// and is only honored by the linker if the symbol is in some other way
	// reachable.
	R_WEAKADDROFF
	R_SIZE
	R_CALL
	R_CALLARM
	R_CALLARM64
	R_CALLIND
	R_CALLPOWER
	// R_CALLMIPS (only used on mips64) resolves to non-PC-relative target address
	// of a CALL (JAL) instruction, by encoding the address into the instruction.
	R_CALLMIPS
	R_CONST
	R_PCREL
	// R_TLS_LE, used on 386, amd64, and ARM, resolves to the offset of the
	// thread-local symbol from the thread local base and is used to implement the
	// "local exec" model for tls access (r.Sym is not set on intel platforms but is
	// set to a TLS symbol -- runtime.tlsg -- in the linker when externally linking).
	R_TLS_LE
	// R_TLS_IE, used on 386, amd64, and ARM, resolves to the PC-relative offset to a GOT
	// slot containing the offset of the thread-local symbol from the thread local
	// base and is used to implement the "initial exec" model for tls access (r.Sym
	// is not set on intel platforms but is set to a TLS symbol -- runtime.tlsg -- in
	// the linker when externally linking).
	R_TLS_IE
	R_GOTOFF
	R_PLT0
	R_PLT1
	R_PLT2
	R_USEFIELD
	// R_USETYPE resolves to an *rtype, but no relocation is created. The
	// linker uses this as a signal that the pointed-to type information
	// should be linked into the final binary, even if there are no other
	// direct references. (This is used for types reachable by reflection.)
	R_USETYPE
	// R_METHODOFF resolves to a 32-bit offset from the beginning of the section
	// holding the data being relocated to the referenced symbol.
	// It is a variant of R_ADDROFF used when linking from the uncommonType of a
	// *rtype, and may be set to zero by the linker if it determines the method
	// text is unreachable by the linked program.
	R_METHODOFF
	R_POWER_TOC
	R_GOTPCREL
	// R_JMPMIPS (only used on mips64) resolves to non-PC-relative target address
	// of a JMP instruction, by encoding the address into the instruction.
	// The stack nosplit check ignores this since it is not a function call.
	R_JMPMIPS

	// R_DWARFSECREF resolves to the offset of the symbol from its section.
	// Target of relocation must be size 4 (in current implementation).
	R_DWARFSECREF

	// R_DWARFFILEREF resolves to an index into the DWARF .debug_line
	// file table for the specified file symbol. Must be applied to an
	// attribute of form DW_FORM_data4.
	R_DWARFFILEREF

	// Set a MOV[NZ] immediate field to bits [15:0] of the offset from the thread
	// local base to the thread local variable defined by the referenced (thread
	// local) symbol. Error if the offset does not fit into 16 bits.
	R_ARM64_TLS_LE

	// Relocates an ADRP; LD64 instruction sequence to load the offset between
	// the thread local base and the thread local variable defined by the
	// referenced (thread local) symbol from the GOT.
	R_ARM64_TLS_IE

	// R_ARM64_GOTPCREL relocates an adrp, ld64 pair to compute the address of the GOT
	// slot of the referenced symbol.
	R_ARM64_GOTPCREL

	// R_POWER_TLS_LE is used to implement the "local exec" model for tls
	// access. It resolves to the offset of the thread-local symbol from the
	// thread pointer (R13) and inserts this value into the low 16 bits of an
	// instruction word.
	R_POWER_TLS_LE

	// R_POWER_TLS_IE is used to implement the "initial exec" model for tls access. It
	// relocates a D-form, DS-form instruction sequence like R_ADDRPOWER_DS. It
	// inserts the offset of the GOT slot for the thread-local symbol from the TOC (the
	// GOT slot is filled by the dynamic linker with the offset of the thread-local
	// symbol from the thread pointer (R13)).
	R_POWER_TLS_IE

	// R_POWER_TLS marks an X-form instruction such as "MOVD 0(R13)(R31*1), g" as
	// accessing a particular thread-local symbol. It does not affect code generation
	// but is used by the system linker when relaxing "initial exec" model code to
	// "local exec" model code.
	R_POWER_TLS

	// R_ADDRPOWER_DS is similar to R_ADDRPOWER above, but assumes the second
	// instruction is a "DS-form" instruction, which has an immediate field occupying
	// bits [15:2] of the instruction word. Bits [15:2] of the address of the
	// relocated symbol are inserted into this field; it is an error if the last two
	// bits of the address are not 0.
	R_ADDRPOWER_DS

	// R_ADDRPOWER_GOT relocates a D-form, DS-form instruction sequence like
	// R_ADDRPOWER_DS but inserts the offset of the GOT slot for the referenced symbol
	// from the TOC rather than the symbol's address.
	R_ADDRPOWER_GOT

	// R_ADDRPOWER_PCREL relocates two D-form instructions like R_ADDRPOWER, but
	// inserts the displacement from the place being relocated to the address of the
	// relocated symbol instead of just its address.
	R_ADDRPOWER_PCREL

	// R_ADDRPOWER_TOCREL relocates two D-form instructions like R_ADDRPOWER, but
	// inserts the offset from the TOC to the address of the relocated symbol
	// rather than the symbol's address.
	R_ADDRPOWER_TOCREL

	// R_ADDRPOWER_TOCREL_DS relocates a D-form, DS-form instruction sequence like
	// R_ADDRPOWER_DS but inserts the offset from the TOC to the address of the
	// relocated symbol rather than the symbol's address.
	R_ADDRPOWER_TOCREL_DS

	// R_PCRELDBL relocates s390x 2-byte aligned PC-relative addresses.
	// TODO(mundaym): remove once variants can be serialized - see issue 14218.
	R_PCRELDBL

	// R_ADDRMIPSU (only used on mips/mips64) resolves to the sign-adjusted "upper" 16
	// bits (bit 16-31) of an external address, by encoding it into the instruction.
	R_ADDRMIPSU
	// R_ADDRMIPSTLS (only used on mips64) resolves to the low 16 bits of a TLS
	// address (offset from thread pointer), by encoding it into the instruction.
	R_ADDRMIPSTLS
	// R_ADDRCUOFF resolves to a pointer-sized offset from the start of the
	// symbol's DWARF compile unit.
	R_ADDRCUOFF

	// R_WASMIMPORT resolves to the index of the WebAssembly function import.
	R_WASMIMPORT
)
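The "high adjusted" split mentioned for R_ADDRPOWER (and the "sign-adjusted upper" half for R_ADDRMIPSU) can be illustrated with a little arithmetic. The sketch below assumes a 32-bit address and a second instruction that sign-extends its 16-bit immediate; splitHighAdjusted and recombine are hypothetical helpers, not linker code.

```go
package main

import "fmt"

// splitHighAdjusted splits addr into two 16-bit immediates. Because the
// low half is sign-extended when it is added back, the high half is
// computed from addr + 0x8000 to compensate whenever bit 15 is set.
func splitHighAdjusted(addr uint32) (ha, lo uint16) {
	lo = uint16(addr)
	ha = uint16((addr + 0x8000) >> 16)
	return ha, lo
}

// recombine mimics what the instruction pair computes: (ha << 16) plus
// the sign-extended low half, modulo 2^32.
func recombine(ha, lo uint16) uint32 {
	return uint32(ha)<<16 + uint32(int32(int16(lo)))
}

func main() {
	for _, addr := range []uint32{0x12340010, 0x12348000} {
		ha, lo := splitHighAdjusted(addr)
		fmt.Printf("%#x -> ha=%#x lo=%#x -> %#x\n", addr, ha, lo, recombine(ha, lo))
	}
}
```

For 0x12348000 the low half is 0x8000, which sign-extends to a negative value, so the high half becomes 0x1235 rather than 0x1234.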
//go:generate stringer -type=RelocType
IsDirectJump returns whether r is a relocation for a direct jump. A direct jump is a CALL or JMP instruction that takes the target address as immediate. The address is embedded into the instruction, possibly with limited width. An indirect jump is a CALL or JMP instruction that takes the target address in register or memory.
A SymKind describes the kind of memory represented by a symbol.
const (
	// An otherwise invalid zero value for the type
	Sxxx SymKind = iota
	// Executable instructions
	STEXT
	// Read only static data
	SRODATA
	// Static data that does not contain any pointers
	SNOPTRDATA
	// Static data
	SDATA
	// Static data that is initially all 0s
	SBSS
	// Static data that is initially all 0s and does not contain pointers
	SNOPTRBSS
	// Thread-local data that is initially all 0s
	STLSBSS
	// Debugging data
	SDWARFINFO
	SDWARFRANGE
	SDWARFLOC
	SDWARFMISC
)
Defined SymKind values. These are used to index into cmd/link/internal/sym/AbiSymKindToSymKind
TODO(rsc): Give idiomatic Go names.
//go:generate stringer -type=SymKind