package asm

import "github.com/db47h/ngaro/asm"

Package asm provides utility functions to assemble and disassemble Ngaro VM code.

Supported assembler mnemonics:

TOS is the value on top of the data stack. NOS is the next value on the data stack.
Instructions with a check mark in the "arg" column expect an argument in the cell
following them.

opcode	asm	alias	arg	stack	description
------	---	-----	---	-----	------------------------------------------------------------------------
0	nop				no-op
1	lit		✓	-n	push the value in the following memory location to the data stack.
2	dup			n-nn	duplicate TOS
3	drop			n-	drop TOS
4	swap			xy-yx	swap TOS and NOS
5	push			n-	push TOS to address stack
6	pop			-n	pop value on top of address stack and place it on TOS
7	loop		✓	n-?	decrement TOS. If >0 jump to address in next cell, else drop TOS and do nothing
8	jump	jmp	✓		jump to address in next cell
9	;	ret			return: pop address from address stack, add 1 and jump to it.
10	>jump	jgt	✓	xy-	jump to address in next cell if NOS > TOS
11	<jump	jlt	✓	xy-	jump to address in next cell if NOS < TOS
12	!jump	jne	✓	xy-	jump to address in next cell if NOS != TOS
13	=jump	jeq	✓	xy-	jump to address in next cell if NOS == TOS
14	@			a-n	fetch: get the value at the address on TOS and place it on TOS.
15	!			na-	store: store the value in NOS at address in TOS
16	+	add		xy-z	add NOS to TOS and place result on TOS
17	-	sub		xy-z	subtract TOS from NOS and place result on TOS
18	*	mul		xy-z	multiply NOS with TOS and place result on TOS
19	/mod	div		xy-rq	divide NOS by TOS and place remainder in NOS, quotient in TOS
20	and			xy-z	do a bitwise and of NOS and TOS and place result on TOS
21	or			xy-z	do a bitwise or of NOS and TOS and place result on TOS
22	xor			xy-z	do a bitwise xor of NOS and TOS and place result on TOS
23	<<	shl		xy-z	do a logical left shift of NOS by TOS and place result on TOS
24	>>	asr		xy-z	do an arithmetic right shift of NOS by TOS and place result on TOS
25	0;	0ret		n-?	ZeroExit: if TOS is 0, drop it and do a return, else do nothing
26	1+	inc		n-n	increment TOS
27	1-	dec		n-n	decrement TOS
28	in			p-n	I/O in (see Ngaro VM spec)
29	out			np-	I/O out (see Ngaro VM spec)
30	wait			?-	I/O wait (see Ngaro VM spec)
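
To make the stack notation concrete, here is a minimal Go sketch of an interpreter for a handful of the opcodes listed above. It is not the ngaro VM implementation: the Cell type (assumed here to be 32 bits wide), the op constants and the run function are hypothetical names chosen for this illustration, and bounds checks, the address stack and I/O are left out.

```go
package main

import "fmt"

// Cell is a hypothetical stand-in for the VM cell type (assumed 32-bit here).
type Cell int32

// A few opcodes, numbered as in the mnemonic table.
const (
	opNop  Cell = 0
	opLit  Cell = 1
	opDup  Cell = 2
	opDrop Cell = 3
	opSwap Cell = 4
	opAdd  Cell = 16
	opMul  Cell = 18
)

// run executes img until it runs off the end and returns the data stack.
// Illustration only: no bounds checks, no address stack, no I/O.
func run(img []Cell) []Cell {
	var data []Cell
	for pc := 0; pc < len(img); pc++ {
		n := len(data)
		switch img[pc] {
		case opNop:
			// nothing to do
		case opLit:
			pc++ // the argument lives in the cell following the opcode
			data = append(data, img[pc])
		case opDup:
			data = append(data, data[n-1])
		case opDrop:
			data = data[:n-1]
		case opSwap:
			data[n-1], data[n-2] = data[n-2], data[n-1]
		case opAdd:
			data[n-2] += data[n-1] // result replaces NOS, then TOS is dropped
			data = data[:n-1]
		case opMul:
			data[n-2] *= data[n-1]
			data = data[:n-1]
		}
	}
	return data
}

func main() {
	// Equivalent to the assembly: lit 6 dup * lit 6 +
	img := []Cell{opLit, 6, opDup, opMul, opLit, 6, opAdd}
	fmt.Println(run(img)) // [42]
}
```

Note how lit is the only opcode in this sketch that advances pc a second time: that is what the check mark in the "arg" column stands for.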

Comments:

Comments are placed between parentheses, i.e. '(' and ')'. The body of the comment must be separated from the enclosing parentheses by a space.

Some valid comments:

( this is a valid comment )
( this is a
  rather long
  multiline comment )

The following are invalid comments:

(this will be seen by the parser as label "(this" and will not work )
( comments may ( not be nested ) here, the parser will complain trying to resolve
  "here," as a label )

Literals and label/const identifiers:

The parser behaves almost like a Forth parser: input is split at white space (space, tab or new line) into tokens. The parser then does the following:

- If a token can be converted to a Go integer (see strconv.ParseInt), it will
  be converted to an integer literal.
- If it is a Go character literal between single quotes, it will be converted to
  the corresponding integer literal. Watch out with unicode chars: they will be
  converted to the proper rune (int32), but they are not natively supported by
  the VM I/O code.
- If a token is the name of a defined constant, it will be replaced internally by
  the constant's value and can be used anywhere an integer literal is expected.

- Then name resolution applies:
  - if an instruction is expected, the token is looked up in the assembler
    mnemonics and if no match is found, it is considered to be a label.
  - if an argument is expected, the token is always considered a label.

You may therefore define unusual labels or constant names (at least for Go programmers) such as "2dup", "(weird" or "end-weird)". Also, more than one instruction may appear on the same line and comments can be placed anywhere between instructions.
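
The literal and constant rules above can be approximated in a few lines of Go. This is a sketch under assumptions, not the package's parser: parseLiteral and resolve are hypothetical helper names, base 0 is assumed for strconv.ParseInt (which would explain why 0666 and 0x27 are accepted elsewhere in this documentation), and cells are assumed to be 32 bits wide.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// parseLiteral tries to interpret tok as an integer or character literal.
func parseLiteral(tok string) (int32, bool) {
	// Base 0 lets ParseInt infer the base from the prefix (0x for hex, 0 for octal).
	if n, err := strconv.ParseInt(tok, 0, 32); err == nil {
		return int32(n), true
	}
	// A Go character literal between single quotes, escape sequences included.
	if len(tok) >= 3 && tok[0] == '\'' {
		if s, err := strconv.Unquote(tok); err == nil {
			r, _ := utf8.DecodeRuneInString(s)
			return int32(r), true
		}
	}
	return 0, false
}

// resolve applies the documented order: literals first, then named constants.
// A false result means the token falls through to name resolution
// (mnemonic or label).
func resolve(tok string, consts map[string]int32) (int32, bool) {
	if n, ok := parseLiteral(tok); ok {
		return n, true
	}
	n, ok := consts[tok]
	return n, ok
}

func main() {
	consts := map[string]int32{"SOMECONST": 42}
	for _, tok := range []string{"0666", "0x27", "'a'", `'\u2033'`, "SOMECONST", "2dup"} {
		n, ok := resolve(tok, consts)
		fmt.Printf("%-10s value=%d literal-or-const=%v\n", tok, n, ok)
	}
}
```

A token such as "2dup" fails both conversions, which is why it remains available as a label or constant name.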

Implicit "lit":

Where the parser is expecting an instruction, integer literals, character literals and constants will be compiled with an implicit "lit":

lit 42
42	( will compile as "lit 42", just like above )
( like ) 'a' ( compiles as ) lit 'a' ( which in fact compiles as ) lit 97

Labels:

Labels are defined by prefixing them with a colon (:) and can be used as an address in any lit, jump or loop instruction (without the ':' prefix). For example:

foo		( forward references are ok. This will be compiled as a call to foo )
lit foo		( this will compile as lit <address of foo>. This is actually the
		  only way to place the address of a label on the stack. )

:foo		( foo defined here )
nop
;

:bar	nop	( label definitions can be grouped with other instructions on the same line )
	;

:foobar	nop ;	( we can actually place any number of instructions on the same line )

Local labels:

Local labels work in the same way as in the GNU assembler. They are defined as a colon followed by a sequence of digits (i.e. :007, :0, :42). Although they can be defined multiple times, the compiler internally assigns them a unique name of the form N·counter (the middle character is '\u00b7'). References to such labels must be suffixed with either a '-' (meaning backward reference to the last definition of this label), or a '+' (meaning a forward reference to the next definition of this label). For example, in the following code:

:1	jump 1+	( not to be confused with the '1+' mnemonic. Here it means next occurrence of :1 )
:2	jump 1-
:1	jump 2+
:2	jump 1-

the labels will be internally converted to:

:1·1	jump 1·2
:2·1	jump 1·1
:1·2	jump 2·2
:2·2	jump 1·2

As a consequence, you should not use or define labels of the form N·N where N is any non-empty sequence of digits. This also prevents the definition of labels of the form N+ or N- because they will not be addressable.
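
The renaming scheme can be sketched in Go as follows. This is an illustration of the rule as described, not the assembler's actual code: renameLocals, isDigits and the takesArg table are hypothetical, comments are not handled, and a token is treated as a local reference only when it follows a mnemonic or directive that expects an argument (so a bare 1+ stays the increment mnemonic). In this sketch the forward reference 2+ on the third line resolves to the second definition of :2, which agrees with the "jump 6" shown in the disassembly of the local labels example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// takesArg lists the tokens after which the next token is an argument.
var takesArg = map[string]bool{
	"lit": true, "loop": true, "jump": true, "jmp": true,
	">jump": true, "jgt": true, "<jump": true, "jlt": true,
	"!jump": true, "jne": true, "=jump": true, "jeq": true,
	".dat": true,
}

// isDigits reports whether s is a non-empty sequence of ASCII digits.
func isDigits(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// renameLocals rewrites local label definitions and references to the
// unique N·counter form and returns the tokens joined by single spaces.
func renameLocals(src string) string {
	count := map[string]int{} // number of definitions seen so far, per label
	var out []string
	argNext := false
	for _, tok := range strings.Fields(src) {
		renamed := tok
		switch {
		case strings.HasPrefix(tok, ":") && isDigits(tok[1:]):
			n := tok[1:]
			count[n]++
			renamed = ":" + n + "\u00b7" + strconv.Itoa(count[n])
		case argNext && len(tok) > 1 && isDigits(tok[:len(tok)-1]):
			n := tok[:len(tok)-1]
			switch tok[len(tok)-1] {
			case '-': // last definition seen so far
				renamed = n + "\u00b7" + strconv.Itoa(count[n])
			case '+': // next definition still to come
				renamed = n + "\u00b7" + strconv.Itoa(count[n]+1)
			}
		}
		argNext = takesArg[tok]
		out = append(out, renamed)
	}
	return strings.Join(out, " ")
}

func main() {
	fmt.Println(renameLocals(":1 jump 1+ :2 jump 1- :1 jump 2+ :2 jump 1-"))
	// :1·1 jump 1·2 :2·1 jump 1·1 :1·2 jump 2·2 :2·2 jump 1·2
}
```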

Note also that the parser does not prevent you from using or defining labels with the same name as an instruction. The only caveat, besides confusing yourself, is that you will not be able to use implicit calls to such labels:

:drop	'D' 1 1 out 0 0 out wait ( print 'D' )
	drop ;	( this will not loop forever, drop will be compiled as opcode 3, not a call )
drop		( still opcode 3 )
.dat drop	( will compile an implicit call to our custom drop )

Assembler directives:

The assembler supports the following directives:

.equ <identifier> <value>

defines a constant value. <identifier> can be any valid identifier (any combination of letters, symbols, digits and punctuation). The value must be an integer value, named constant or character literal. Constants must be defined before being used. Constants can be redefined; the compiler will always use the last assigned value.

.org <value>

Will place the next instruction at the address specified by the given integer literal or named constant.

.dat <value>

Will compile the specified integer value, named constant, character literal or string as-is (i.e. with no implicit "lit"). This is primarily used for data storage structures:

:table	.dat 65
	.dat 'B'
	.dat "Hello,\n      world!"

The cells at addresses table+0 and table+1 will contain 65 and 66 respectively.

Strings are any text enclosed between a pair of double quotes ("). They are encoded as utf-8, one byte per Cell and zero terminated. Go escape sequences are supported. Strings cannot span multiple lines.
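
As an illustration of that encoding, the following Go sketch turns a double-quoted token into cells. encodeString is a hypothetical helper and cells are assumed to be int32; the real assembler may proceed differently. strconv.Unquote is used here because it interprets Go escape sequences and rejects a raw newline inside the quotes.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// encodeString converts a double-quoted token into cells: one UTF-8 byte
// per cell, followed by a terminating zero.
func encodeString(tok string) ([]int32, error) {
	if len(tok) < 2 || tok[0] != '"' {
		return nil, errors.New("not a double-quoted string")
	}
	s, err := strconv.Unquote(tok) // interprets Go escape sequences
	if err != nil {
		return nil, err
	}
	cells := make([]int32, 0, len(s)+1)
	for i := 0; i < len(s); i++ { // iterate over bytes, not runes
		cells = append(cells, int32(s[i]))
	}
	return append(cells, 0), nil
}

func main() {
	cells, err := encodeString(`"é\n"`)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(cells) // [195 169 10 0]
}
```

The two-byte UTF-8 encoding of 'é' occupies two cells, so the number of cells used by a string is its byte length plus one, not its character count plus one.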

.opcode <identifier> <value>

defines a custom opcode. <identifier> can be any valid identifier (any combination of letters, symbols, digits and punctuation). The value must be an integer value, named constant or character literal. Custom opcodes must be defined before being used. They can be redefined; the compiler will always use the last assigned value. Default opcodes can also be redefined (think override) with this directive, so it should be used with caution.

For example, suppose that we have a VM implementation that maps opcode -42 to a function that computes the square root of the number on top of the data stack:

.opcode sqrt -42

lit 49
sqrt		( this compiles as .dat -42 )
7 !jump error

Note that there is no mechanism to tell the assembler that a given custom opcode expects an argument from the next memory location (like lit or jump). Should you need to implement this type of opcode, constant and integer arguments would have to be prefixed with a .dat directive. For example, a compare instruction would look like:

.opcode cmp -1		( compares TOS with value in next memory location )

cmp 0		( Wrong: would compile as ".dat -1 lit 0" )
cmp .dat 0	( Correct: will compile as ".dat -1 0" )

Demonstrates use of local labels

Code:

code := `
	:1	jump 1+
	:2	jump 1-
	:1	jump 2+
	:2	jump 1-
	`

img, err := asm.Assemble("locals", strings.NewReader(code))
if err != nil {
    fmt.Println(err)
    return
}

asm.DisassembleAll(img, 0, os.Stdout)

Output:

         0	jump 4
         2	jump 0
         4	jump 6
         6	jump 4

Package Files

asm.go doc.go parser.go

func Assemble

func Assemble(name string, r io.Reader) (img []vm.Cell, err error)

Assemble compiles assembly read from the supplied io.Reader and returns the resulting memory image and an error, if any.

The name parameter is used only in error messages to name the source of the error. If the io.Reader is a file, name should be the file name.

The returned error, if not nil, can safely be type-asserted to an ErrAsm value that will contain up to 10 entries.
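
A caller might walk the individual entries along the following lines. To keep the sketch self-contained, ErrAsm below is a local stand-in that copies the documented declaration (with scanner assumed to be text/scanner); real code would assert to asm.ErrAsm instead. The describe and sampleError helpers and the formatting in Error are hypothetical.

```go
package main

import (
	"fmt"
	"text/scanner"
)

// ErrAsm is a local stand-in mirroring the documented asm.ErrAsm declaration.
type ErrAsm []struct {
	Pos scanner.Position
	Msg string
}

// Error joins all entries; the real method may format them differently.
func (e ErrAsm) Error() string {
	s := ""
	for i, entry := range e {
		if i > 0 {
			s += "\n"
		}
		s += entry.Pos.String() + ": " + entry.Msg
	}
	return s
}

// describe returns one line per assembler error entry, or the plain
// error text if err is not an ErrAsm.
func describe(err error) []string {
	ae, ok := err.(ErrAsm)
	if !ok {
		return []string{err.Error()}
	}
	lines := make([]string, 0, len(ae))
	for _, entry := range ae {
		lines = append(lines, entry.Pos.String()+": "+entry.Msg)
	}
	return lines
}

// sampleError builds a made-up error value for demonstration purposes.
func sampleError() error {
	return ErrAsm{
		{Pos: scanner.Position{Filename: "demo", Line: 3, Column: 7}, Msg: "undefined label"},
	}
}

func main() {
	for _, line := range describe(sampleError()) {
		fmt.Println(line) // demo:3:7: undefined label
	}
}
```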

Shows off some of the assembler features (although the example assembly program is complete nonsense).

Code:

code := `
		( this is a comment. brackets must be separated by spaces )
		
		( a constant definition. Does not generate any code on its own )
		.equ SOMECONST 42
		
		nop
		123			( implicit literal )
		SOMECONST   ( const literal )
		drop
		drop
		foo			( implicit call to label foo )
		pop
		lit table	( address of table )
		'x'			( char literal, compiles as lit 'x' )
		
		.org 32 ( set compilation address )
		
:foo	42 bar drop ;
:bar	1+ ;  ( several instructions on the same line )

		.opcode sqrt -1	( test custom opcode )
		sqrt			( should compile like .dat -1 )
		
:table	( data structure )
		.dat -100		( will appear in the disassembly as "call -100" )
		.dat 0666		( octal )
		.dat 0x27		( hex )
		.dat '\u2033'	( unicode char )
		.dat SOMECONST
		.dat foo		( address of some label )
`

img, err := asm.Assemble("raw_string", strings.NewReader(code))
if err != nil {
    fmt.Println(err)
    return
}

asm.DisassembleAll(img, 0, os.Stdout)

Output:

         0	nop
         1	123
         3	42
         5	drop
         6	drop
         7	.dat 32	( call 32 )
         8	pop
         9	40
        11	120
        13	nop
        14	nop
        15	nop
        16	nop
        17	nop
        18	nop
        19	nop
        20	nop
        21	nop
        22	nop
        23	nop
        24	nop
        25	nop
        26	nop
        27	nop
        28	nop
        29	nop
        30	nop
        31	nop
        32	42
        34	.dat 37	( call 37 )
        35	drop
        36	;
        37	1+
        38	;
        39	.dat -1	( call -1 )
        40	.dat -100	( call -100 )
        41	.dat 438	( call 438 )
        42	.dat 39	( call 39 )
        43	.dat 8243	( call 8243 )
        44	.dat 42	( call 42 )
        45	.dat 32	( call 32 )

func Disassemble

func Disassemble(i []vm.Cell, pc int, w io.Writer) (next int, err error)

Disassemble writes a disassembly of the cells in the given slice at position pc to the specified io.Writer and returns the position of the next valid opcode and any write error.

Note that some instructions will disassemble like:

.dat 860	( call 860 )

This is because the cell value is 860 and the disassembler cannot determine whether it is an implicit call or raw data. Disassembling it this way reminds you that this could be a call, while allowing the output to be passed as-is to the assembler.

Disassemble is pretty straightforward. Here we disassemble a hand-crafted Fibonacci function.

Code:

fibS := `
	:fib
		push 0 1 pop	( like [ 0 1 ] dip )
		jump 1+		( jump forward to the next :1 )
	:0  push		( local label )
		dup	push
		+
		pop	swap
		pop
	:1  loop 0-		( local label back )
		swap drop ;
		lit		( lit deliberately unterminated at end of image for testing purposes.
		 		  should disassemble unequivocally as .dat 1 )
		`
img, err := asm.Assemble("fib", strings.NewReader(fibS))
if err != nil {
    panic(err)
}

for pc := 0; pc < len(img); {
    fmt.Printf("% 4d\t", pc)
    pc, err = asm.Disassemble(img, pc, os.Stdout)
    if err != nil {
        panic(err)
    }
    fmt.Println()
}

fmt.Println("Partial disassembly with DisassembleAll:")

// Partial disassembly. Set base accordingly so that the address column is correct.
asm.DisassembleAll(img[15:20], 15, os.Stdout)

Output:

   0	push
   1	0
   3	1
   5	pop
   6	jump 15
   8	push
   9	dup
  10	push
  11	+
  12	pop
  13	swap
  14	pop
  15	loop 8
  17	swap
  18	drop
  19	;
  20	.dat 1
Partial disassembly with DisassembleAll:
        15	loop 8
        17	swap
        18	drop
        19	;

func DisassembleAll

func DisassembleAll(i []vm.Cell, base int, w io.Writer) error

DisassembleAll writes a disassembly of all cells in the given slice to the specified io.Writer. The base argument specifies the real address of the first cell (i[0]). It will return any write error.

type ErrAsm

type ErrAsm []struct {
    // Position of the first character of the token that caused the error
    Pos scanner.Position
    // Detailed error description
    Msg string
}

ErrAsm encapsulates errors generated by the assembler. Some errors like duplicate declarations are followed by an entry giving the location of the first declaration.

func (ErrAsm) Error

func (e ErrAsm) Error() string

Error returns a string representation of the error.

Package asm imports 7 packages. Updated 2016-09-08. This is an inactive package (no imports and no commits in at least two years).