commonmark

Version: v0.2.0 (latest) | Published: Apr 30, 2023 | License: Apache-2.0 | Imports: 11 | Imported by: 0

Repository

github.com/zombiezen/go-commonmark

README ¶

`zombiezen.com/go/commonmark`

This Go package provides a parser for CommonMark, a specific dialect of Markdown. It allows you to parse, analyze, and render CommonMark/Markdown documents through an abstract syntax tree API.

This implementation conforms to Version 0.30 of the CommonMark Specification.

Goals

A few other Markdown/CommonMark packages exist for Go. This package prioritizes (in order):

1. Ability to connect the parse tree to CommonMark input losslessly, to enable tools that reformat or manipulate CommonMark documents.
2. Adherence to the CommonMark specification.
3. Comprehensibility of implementation.
4. Performance.

Install

go get zombiezen.com/go/commonmark

Getting Started

package main

import (
  "fmt"
  "io"
  "os"

  "zombiezen.com/go/commonmark"
)

func main() {
  commonmarkSource, err := io.ReadAll(os.Stdin)
  if err != nil {
    fmt.Fprintln(os.Stderr, err)
    os.Exit(1)
  }
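  // Parse the document into blocks plus a map of link reference definitions.
  blocks, refMap := commonmark.Parse(commonmarkSource)
  // Render the blocks as HTML, reporting any write error.
  if err := commonmark.RenderHTML(os.Stdout, blocks, refMap); err != nil {
    fmt.Fprintln(os.Stderr, err)
    os.Exit(1)
  }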
}

License

Apache 2.0

Documentation ¶

Overview ¶

Package commonmark provides a CommonMark parser.

Example ¶

package main

import (
	"os"

	"zombiezen.com/go/commonmark"
)

func main() {
	// Convert CommonMark to a parse tree and any link references.
	blocks, refMap := commonmark.Parse([]byte("Hello, **World**!\n"))
	// Render parse tree to HTML.
	commonmark.RenderHTML(os.Stdout, blocks, refMap)
}

Output:

<p>Hello, <strong>World</strong>!
</p>

Index ¶

func IsEmailAddress(s string) bool
func NormalizeURI(s string) string
func Parse(source []byte) ([]*RootBlock, ReferenceMap)
func RenderHTML(w io.Writer, blocks []*RootBlock, refMap ReferenceMap) error
type Block
- func (b *Block) AsNode() Node
- func (b *Block) Child(i int) Node
- func (b *Block) ChildCount() int
- func (b *Block) HeadingLevel() int
- func (b *Block) InfoString() *Inline
- func (b *Block) IsOrderedList() bool
- func (b *Block) IsTightList() bool
- func (b *Block) Kind() BlockKind
- func (b *Block) ListItemNumber(source []byte) int
- func (b *Block) Span() Span
type BlockKind
- func (k BlockKind) IsCode() bool
- func (k BlockKind) IsHeading() bool
- func (i BlockKind) String() string
type BlockParser
- func NewBlockParser(r io.Reader) *BlockParser
- func (p *BlockParser) NextBlock() (*RootBlock, error)
type Inline
- func (inline *Inline) AsNode() Node
- func (inline *Inline) Child(i int) *Inline
- func (inline *Inline) ChildCount() int
- func (inline *Inline) IndentWidth() int
- func (inline *Inline) Kind() InlineKind
- func (inline *Inline) LinkDestination() *Inline
- func (inline *Inline) LinkReference() string
- func (inline *Inline) LinkTitle() *Inline
- func (inline *Inline) Span() Span
- func (inline *Inline) Text(source []byte) string
type InlineKind
- func (i InlineKind) String() string
type InlineParser
- func (p *InlineParser) Rewrite(root *RootBlock)
type LinkDefinition
type Node
- func (n Node) Block() *Block
- func (n Node) Inline() *Inline
- func (n Node) Span() Span
type ReferenceMap
- func (m ReferenceMap) Extract(source []byte, node Node)
- func (m ReferenceMap) MatchReference(normalizedLabel string) bool
type ReferenceMatcher
type RootBlock
type Span
- func NullSpan() Span
- func (span Span) Intersect(span2 Span) Span
- func (span Span) IsValid() bool
- func (span Span) Len() int
- func (span Span) String() string

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func IsEmailAddress ¶

func IsEmailAddress(s string) bool

IsEmailAddress reports whether the string is a CommonMark email address.
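
As a rough illustration of what "a CommonMark email address" means, the sketch below applies the email autolink pattern given in the CommonMark 0.30 specification using only the standard library. The name looksLikeEmail is hypothetical, and whether the package implements IsEmailAddress with a regular expression is an assumption, not something this documentation states.

```go
package main

import (
	"fmt"
	"regexp"
)

// emailPattern is the email autolink pattern from the CommonMark 0.30
// specification (which borrows it from the HTML5 specification).
// Assumption: commonmark.IsEmailAddress may be implemented differently.
var emailPattern = regexp.MustCompile(
	"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+" +
		"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?" +
		"(?:\\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$",
)

// looksLikeEmail is a hypothetical stand-in for commonmark.IsEmailAddress.
func looksLikeEmail(s string) bool {
	return emailPattern.MatchString(s)
}

func main() {
	for _, s := range []string{"foo@bar.example.com", "foo@", "two words@example.com"} {
		fmt.Printf("%q: %t\n", s, looksLikeEmail(s))
	}
}
```

In real code, calling commonmark.IsEmailAddress directly is preferable; the sketch only shows the shape of strings the specification accepts.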

func NormalizeURI ¶

func NormalizeURI(s string) string

NormalizeURI percent-encodes any characters in a string that are not reserved or unreserved URI characters. This is commonly used for transforming CommonMark link destinations into strings suitable for href or src attributes.
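
The description above can be sketched with the standard library alone. The function percentEncodeSketch is hypothetical; it uses the reserved and unreserved character sets from RFC 3986 and, as an assumption of this sketch, leaves '%' untouched so that existing escapes are not encoded twice. The package's actual handling of edge cases may differ.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// Unreserved and reserved URI characters as listed in RFC 3986.
	unreserved = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
	reserved   = ":/?#[]@!$&'()*+,;="
)

// percentEncodeSketch is a hypothetical approximation of NormalizeURI:
// bytes outside the two sets above are written as %XX.
// Assumption: '%' is passed through so existing escapes survive.
func percentEncodeSketch(s string) string {
	const hexDigits = "0123456789ABCDEF"
	var sb strings.Builder
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c == '%' || strings.IndexByte(unreserved, c) >= 0 || strings.IndexByte(reserved, c) >= 0 {
			sb.WriteByte(c)
			continue
		}
		sb.WriteByte('%')
		sb.WriteByte(hexDigits[c>>4])
		sb.WriteByte(hexDigits[c&0x0f])
	}
	return sb.String()
}

func main() {
	// "\u00e4" is the letter a with diaeresis, two bytes in UTF-8.
	fmt.Println(percentEncodeSketch("/url with space?q=\u00e4"))
	// prints "/url%20with%20space?q=%C3%A4"
}
```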

func Parse ¶

func Parse(source []byte) ([]*RootBlock, ReferenceMap)

Parse parses an in-memory UTF-8 CommonMark document and returns its blocks along with a map of its link reference definitions. As long as source does not contain NUL bytes, the blocks will use the original byte slice as their source.

func RenderHTML ¶

func RenderHTML(w io.Writer, blocks []*RootBlock, refMap ReferenceMap) error

RenderHTML writes the given sequence of parsed blocks to the given writer as HTML. It will return the first error encountered, if any.

Types ¶

type Block ¶

type Block struct {
	// contains filtered or unexported fields
}

A Block is a structural element in a CommonMark document.

func (*Block) AsNode ¶

func (b *Block) AsNode() Node

AsNode converts the block node to a Node pointer.

func (*Block) Child ¶

func (b *Block) Child(i int) Node

Child returns the i'th child of the node.

func (*Block) ChildCount ¶

func (b *Block) ChildCount() int

ChildCount returns the number of children the node has. Calling ChildCount on nil returns 0.

func (*Block) HeadingLevel ¶

func (b *Block) HeadingLevel() int

HeadingLevel returns the 1-based level for an ATXHeadingKind or SetextHeadingKind, or zero otherwise.

func (*Block) InfoString ¶

func (b *Block) InfoString() *Inline

InfoString returns the info string node for a FencedCodeBlockKind block or nil otherwise.

func (*Block) IsOrderedList ¶

func (b *Block) IsOrderedList() bool

IsOrderedList reports whether the block is an ordered list or an ordered list item.

func (*Block) IsTightList ¶

func (b *Block) IsTightList() bool

IsTightList reports whether the block is a tight list or a tight list item.

func (*Block) Kind ¶

func (b *Block) Kind() BlockKind

Kind returns the type of block node or zero if the node is nil.

func (*Block) ListItemNumber ¶

func (b *Block) ListItemNumber(source []byte) int

ListItemNumber returns the number of a ListItemKind block or -1 if the block does not represent an ordered list item.

func (*Block) Span ¶

func (b *Block) Span() Span

Span returns the position information relative to [RootBlock.Source].

type BlockKind ¶

type BlockKind uint16

BlockKind is an enumeration of values returned by *Block.Kind.

const (
	// ParagraphKind is used for a block of text.
	ParagraphKind BlockKind = 1 + iota
	// ThematicBreakKind is used for a thematic break, also known as a horizontal rule.
	// It will not contain children.
	ThematicBreakKind
	// ATXHeadingKind is used for headings that start with hash marks.
	ATXHeadingKind
	// SetextHeadingKind is used for headings that end with a divider.
	SetextHeadingKind
	// IndentedCodeBlockKind is used for code blocks started by indentation.
	IndentedCodeBlockKind
	// FencedCodeBlockKind is used for code blocks started by backticks or tildes.
	FencedCodeBlockKind
	// HTMLBlockKind is used for blocks of raw HTML.
	// It should not be wrapped by any tags in rendered HTML output.
	HTMLBlockKind
	// LinkReferenceDefinitionKind is used for a [link reference definition].
	// The first child is always a [LinkLabelKind],
	// the second child is always a [LinkDestinationKind],
	// and it may end with an optional [LinkTitleKind].
	//
	// [link reference definition]: https://spec.commonmark.org/0.30/#link-reference-definition
	LinkReferenceDefinitionKind
	// BlockQuoteKind is used for block quotes.
	BlockQuoteKind
	// ListItemKind is used for items in an ordered or unordered list.
	// The first child will always be of [ListMarkerKind].
	// If the item contains a paragraph and the item is "tight",
	// then the paragraph tag should be stripped.
	ListItemKind
	// ListKind is used for ordered or unordered lists.
	ListKind
	// ListMarkerKind is used to contain the marker in a [ListItemKind] node.
	// It is typically not rendered directly.
	ListMarkerKind
)

func (BlockKind) IsCode ¶

func (k BlockKind) IsCode() bool

IsCode reports whether the kind is IndentedCodeBlockKind or FencedCodeBlockKind.

func (BlockKind) IsHeading ¶

func (k BlockKind) IsHeading() bool

IsHeading reports whether the kind is ATXHeadingKind or SetextHeadingKind.

func (BlockKind) String ¶ added in v0.2.0

func (i BlockKind) String() string

type BlockParser ¶

type BlockParser struct {
	// contains filtered or unexported fields
}

A BlockParser splits a CommonMark document into blocks.

Example ¶

package main

import (
	"io"
	"os"
	"strings"

	"zombiezen.com/go/commonmark"
)

func main() {
	input := strings.NewReader(
		"Hello, [World][]!\n" +
			"\n" +
			"[World]: https://www.example.com/\n",
	)

	// Parse document into blocks (e.g. paragraphs, lists, etc.)
	// and collect link reference definitions.
	parser := commonmark.NewBlockParser(input)
	var blocks []*commonmark.RootBlock
	refMap := make(commonmark.ReferenceMap)
	for {
		block, err := parser.NextBlock()
		if err == io.EOF {
			break
		}
		if err != nil {
			// Not expecting an error from a string.
			panic(err)
		}

		// Add block to list.
		blocks = append(blocks, block)
		// Add any link reference definitions to map.
		refMap.Extract(block.Source, block.AsNode())
	}

	// Finish parsing inside blocks.
	inlineParser := &commonmark.InlineParser{
		ReferenceMatcher: refMap,
	}
	for _, block := range blocks {
		inlineParser.Rewrite(block)
	}

	// Render blocks as HTML.
	commonmark.RenderHTML(os.Stdout, blocks, refMap)
}

Output:

<p>Hello, <a href="https://www.example.com/">World</a>!
</p>

func NewBlockParser ¶

func NewBlockParser(r io.Reader) *BlockParser

NewBlockParser returns a block parser that reads from r.

Block parsers maintain their own buffering and may read data from r beyond the blocks requested.

func (*BlockParser) NextBlock ¶

func (p *BlockParser) NextBlock() (*RootBlock, error)

NextBlock reads the next top-level block in the document, returning the first error encountered. As in the BlockParser example, an io.EOF error marks the end of the document. Blocks returned by NextBlock will typically contain UnparsedKind nodes for any text: use *InlineParser.Rewrite to complete parsing.

type Inline ¶

type Inline struct {
	// contains filtered or unexported fields
}

Inline represents CommonMark content elements like text, links, or emphasis.

func (*Inline) AsNode ¶

func (inline *Inline) AsNode() Node

AsNode converts the inline node to a Node pointer.

func (*Inline) Child ¶

func (inline *Inline) Child(i int) *Inline

Child returns the i'th child of the node.

func (*Inline) ChildCount ¶

func (inline *Inline) ChildCount() int

ChildCount returns the number of children the node has. Calling ChildCount on nil returns 0.

func (*Inline) IndentWidth ¶

func (inline *Inline) IndentWidth() int

IndentWidth returns the number of spaces the IndentKind span represents, or zero if the node is nil or of a different type.
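
One case where the number of logical spaces can differ from the source bytes is a tab: the CommonMark specification treats tabs as advancing to the next tab stop of 4 columns. The sketch below shows that arithmetic. The helper spacesForTab is hypothetical, and whether this is exactly the situation in which the package emits an IndentKind node is an assumption.

```go
package main

import "fmt"

// tabStop is the tab stop width defined by the CommonMark specification.
const tabStop = 4

// spacesForTab is a hypothetical helper: it returns how many columns
// a tab occupies when it begins at the given 0-based column.
func spacesForTab(column int) int {
	return tabStop - column%tabStop
}

func main() {
	for _, col := range []int{0, 1, 2, 3, 4} {
		fmt.Printf("tab at column %d spans %d column(s)\n", col, spacesForTab(col))
	}
}
```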

func (*Inline) Kind ¶

func (inline *Inline) Kind() InlineKind

Kind returns the type of inline node or zero if the node is nil.

func (*Inline) LinkDestination ¶

func (inline *Inline) LinkDestination() *Inline

LinkDestination returns the destination child of a LinkKind node or nil if none is present or the node is not a link.

func (*Inline) LinkReference ¶

func (inline *Inline) LinkReference() string

LinkReference returns the normalized form of a link label.
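
For orientation, the CommonMark 0.30 specification normalizes a label by case-folding it, trimming surrounding whitespace, and collapsing internal runs of whitespace to a single space. The sketch below approximates that with the standard library. The name normalizeLabelSketch is hypothetical, strings.ToLower stands in for full Unicode case folding, and strings.Fields splits on a slightly wider set of whitespace than the specification names, so results can differ from LinkReference for unusual labels.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeLabelSketch approximates CommonMark label normalization
// for a label that has already had its square brackets removed.
func normalizeLabelSketch(label string) string {
	// strings.Fields drops leading and trailing whitespace and splits on
	// runs of whitespace; joining with one space collapses those runs.
	collapsed := strings.Join(strings.Fields(label), " ")
	// Assumption: ToLower is a simplification of Unicode case folding.
	return strings.ToLower(collapsed)
}

func main() {
	fmt.Printf("%q\n", normalizeLabelSketch("  Foo\n\tBAR "))
	// prints "foo bar"
}
```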

func (*Inline) LinkTitle ¶

func (inline *Inline) LinkTitle() *Inline

LinkTitle returns the title child of a LinkKind node or nil if none is present or the node is not a link.

func (*Inline) Span ¶

func (inline *Inline) Span() Span

Span returns the position information relative to [RootBlock.Source].

func (*Inline) Text ¶

func (inline *Inline) Text(source []byte) string

Text converts a non-container inline node into a string.

type InlineKind ¶

type InlineKind uint16

InlineKind is an enumeration of values returned by *Inline.Kind.

const (
	// TextKind is used for literal text.
	TextKind InlineKind = 1 + iota
	// SoftLineBreakKind is rendered as either a space or as a hard line break,
	// depending on the renderer.
	SoftLineBreakKind
	// HardLineBreakKind is rendered as a line break.
	HardLineBreakKind
	// IndentKind represents one or more space characters
	// (the exact number can be retrieved by [*Inline.IndentWidth]).
	// It's placed in the parse tree
	// in situations where the number of logical spaces does not match the source.
	IndentKind
	// CharacterReferenceKind is used for ampersand escape characters
	// (e.g. "&amp;").
	CharacterReferenceKind
	// InfoStringKind is used for the [info string] of a fenced code block.
	// It's typically not rendered directly and its contents are implementation-defined.
	//
	// [info string]: https://spec.commonmark.org/0.30/#info-string
	InfoStringKind
	// EmphasisKind is used for text that has stress emphasis.
	EmphasisKind
	// StrongKind is used for text that has strong emphasis.
	StrongKind
	// LinkKind is used for hyperlinks.
	// The [*Inline.LinkDestination], [*Inline.LinkTitle], and [*Inline.LinkReference] methods
	// can be used to retrieve specific parts of the link.
	LinkKind
	// ImageKind is used for images.
	// The contents of the node are used as the image's text description.
	// Otherwise, ImageKind is similar to [LinkKind].
	ImageKind
	// LinkDestinationKind is used as part of links and images
	// to indicate the destination or image source, respectively.
	LinkDestinationKind
	// LinkTitleKind is used as part of links and images
	// to hold advisory text typically rendered as a tooltip.
	LinkTitleKind
	// LinkLabelKind is used as either a link reference definition label
	// or in a link or image to reference a link reference definition.
	LinkLabelKind
	// CodeSpanKind is used for inline code in a non-code-block context.
	CodeSpanKind
	// AutolinkKind is used for [autolinks].
	// The node's content is also the link's destination.
	//
	// [autolinks]: https://spec.commonmark.org/0.30/#autolinks
	AutolinkKind

	// HTMLTagKind is a container for one or more [RawHTMLKind] nodes
	// that represents an open tag, a closing tag, an HTML comment,
	// a processing instruction, a declaration, or a CDATA section.
	HTMLTagKind
	// RawHTMLKind is a text node that should be reproduced in HTML verbatim.
	RawHTMLKind

	// UnparsedKind is used for inline text that has not been tokenized.
	UnparsedKind
)

func (InlineKind) String ¶ added in v0.2.0

func (i InlineKind) String() string

type InlineParser ¶

type InlineParser struct {
	ReferenceMatcher ReferenceMatcher
}

An InlineParser converts UnparsedKind Inline nodes into inline trees.

func (*InlineParser) Rewrite ¶

func (p *InlineParser) Rewrite(root *RootBlock)

Rewrite replaces any UnparsedKind nodes in the given root block with parsed versions of the node.

type LinkDefinition ¶

type LinkDefinition struct {
	Destination  string
	Title        string
	TitlePresent bool
}

LinkDefinition is the data of a link reference definition.

type Node ¶

type Node struct {
	// contains filtered or unexported fields
}

Node is a pointer to a Block or an Inline.

func (Node) Block ¶

func (n Node) Block() *Block

Block returns the referenced block or nil if the pointer does not reference a block.

func (Node) Inline ¶

func (n Node) Inline() *Inline

Inline returns the referenced inline or nil if the pointer does not reference an inline.

func (Node) Span ¶ added in v0.2.0

func (n Node) Span() Span

Span returns the span of the referenced node or an invalid span if the pointer is nil.

type ReferenceMap ¶

type ReferenceMap map[string]LinkDefinition

ReferenceMap is a mapping of normalized labels to link definitions.

func (ReferenceMap) Extract ¶

func (m ReferenceMap) Extract(source []byte, node Node)

Extract adds any link reference definitions contained in node to the map. In case of conflicts, Extract will not replace any existing definitions in the map and will use the first definition in source order.
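
The conflict rule can be pictured as "insert only if absent". The sketch below uses local stand-in types (linkDef, refMap) and a hypothetical addIfAbsent method; it does not call the package and is not its implementation.

```go
package main

import "fmt"

// linkDef and refMap are local stand-ins that mirror the documented
// shapes of LinkDefinition and ReferenceMap.
type linkDef struct {
	Destination  string
	Title        string
	TitlePresent bool
}

type refMap map[string]linkDef

// addIfAbsent is a hypothetical helper that keeps the first definition
// seen for a normalized label and ignores later ones.
// It reports whether the definition was stored.
func (m refMap) addIfAbsent(normalizedLabel string, def linkDef) bool {
	if _, exists := m[normalizedLabel]; exists {
		return false
	}
	m[normalizedLabel] = def
	return true
}

func main() {
	m := make(refMap)
	m.addIfAbsent("foo", linkDef{Destination: "/first"})
	m.addIfAbsent("foo", linkDef{Destination: "/second"})
	fmt.Println(m["foo"].Destination) // prints "/first"
}
```

A consequence of this rule is that calling Extract on blocks in document order, as the BlockParser example does, preserves the first definition of each label.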

func (ReferenceMap) MatchReference ¶

func (m ReferenceMap) MatchReference(normalizedLabel string) bool

MatchReference reports whether the normalized label appears in the map.

type ReferenceMatcher ¶

type ReferenceMatcher interface {
	MatchReference(normalizedLabel string) bool
}

A type that implements ReferenceMatcher can be checked for the presence of link reference definitions.
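
Because the interface has a single method, any type with a matching MatchReference method satisfies it. The sketch below declares a local copy of the interface so that it compiles without the package; labelSet and anyOf are hypothetical types, not part of the API.

```go
package main

import "fmt"

// referenceMatcher is a local copy of the documented interface shape.
type referenceMatcher interface {
	MatchReference(normalizedLabel string) bool
}

// labelSet is a hypothetical matcher backed by a set of normalized labels.
type labelSet map[string]struct{}

func (s labelSet) MatchReference(normalizedLabel string) bool {
	_, ok := s[normalizedLabel]
	return ok
}

// anyOf is a hypothetical matcher that reports a match
// if any of its members does.
type anyOf []referenceMatcher

func (ms anyOf) MatchReference(normalizedLabel string) bool {
	for _, m := range ms {
		if m.MatchReference(normalizedLabel) {
			return true
		}
	}
	return false
}

func main() {
	var m referenceMatcher = anyOf{labelSet{"foo": {}}, labelSet{"bar": {}}}
	fmt.Println(m.MatchReference("foo"), m.MatchReference("baz"))
	// prints "true false"
}
```

A type written this way could be assigned to the ReferenceMatcher field of an InlineParser. Note that RenderHTML takes a ReferenceMap rather than a ReferenceMatcher, so a custom matcher only influences inline parsing.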

type RootBlock ¶

type RootBlock struct {
	// Source holds the bytes of the block read from the original source.
	// Any NUL bytes will have been replaced with the Unicode Replacement Character.
	Source []byte
	// StartLine is the 1-based line number of the first line of the block.
	StartLine int
	// StartOffset is the byte offset from the beginning of the original source
	// that this block starts at.
	StartOffset int64
	// EndOffset is the byte offset from the beginning of the original source
	// that this block ends at.
	// Unless the original source contained NUL bytes,
	// EndOffset = StartOffset + len(Source).
	EndOffset int64

	Block
}

RootBlock represents a "top-level" block, that is, a block whose parent is the document. Root blocks store their CommonMark source and document position information. All other position information in the tree is relative to the beginning of the root block.
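
Since spans inside a root block are relative to that block, converting one to a document offset means adding StartOffset. The sketch below shows the arithmetic with local stand-in types (span, rootBlock) and a hypothetical helper, absoluteOffsets. The block boundaries used in main are assumed for illustration rather than produced by the parser, and the conversion only holds when the original source contains no NUL bytes.

```go
package main

import "fmt"

// span and rootBlock are local stand-ins for the fields documented
// on Span and RootBlock.
type span struct{ Start, End int }

type rootBlock struct {
	Source      []byte
	StartOffset int64
}

// absoluteOffsets is a hypothetical helper that converts a span relative
// to a root block into byte offsets within the whole document.
func absoluteOffsets(root rootBlock, s span) (start, end int64) {
	return root.StartOffset + int64(s.Start), root.StartOffset + int64(s.End)
}

func main() {
	doc := []byte("# Title\n\nHello, World!\n")
	// Assumption: the paragraph is a root block that starts at byte 9.
	root := rootBlock{Source: doc[9:], StartOffset: 9}
	word := span{Start: 7, End: 12} // "World" within root.Source
	start, end := absoluteOffsets(root, word)
	fmt.Printf("%q at [%d,%d)\n", root.Source[word.Start:word.End], start, end)
	fmt.Printf("%q\n", doc[start:end])
}
```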

type Span ¶

type Span struct {
	// Start is the index of the first byte of the span,
	// relative to the beginning of the [RootBlock].
	Start int
	// End is the end index of the span (exclusive),
	// relative to the beginning of the [RootBlock].
	End int
}

Span is a contiguous region of a document, expressed relative to a RootBlock.

func NullSpan ¶

func NullSpan() Span

NullSpan returns an invalid span.

func (Span) Intersect ¶

func (span Span) Intersect(span2 Span) Span

Intersect returns the intersection of two spans or an invalid span if none exists.

func (Span) IsValid ¶

func (span Span) IsValid() bool

IsValid reports whether the span is valid.

func (Span) Len ¶

func (span Span) Len() int

Len returns the length of the span or zero if the span is invalid.

func (Span) String ¶

func (span Span) String() string

String formats the span indices as a mathematical range like "[12,34)".
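
The half-open interval semantics of these methods can be sketched with a local type. Everything below is hypothetical: the representation of an invalid span (here Start and End of -1) and the treatment of spans that merely touch (here an empty but valid result) are assumptions of the sketch, not documented behavior of the package.

```go
package main

import "fmt"

// span is a local stand-in for Span: Start is inclusive, End is exclusive.
type span struct{ Start, End int }

// nullSpan returns the sketch's chosen invalid value.
func nullSpan() span { return span{Start: -1, End: -1} }

func (s span) isValid() bool { return s.Start >= 0 && s.End >= s.Start }

// length returns End - Start, or zero for an invalid span.
func (s span) length() int {
	if !s.isValid() {
		return 0
	}
	return s.End - s.Start
}

// intersect returns the overlap of two spans, or an invalid span
// if they do not overlap.
func (s span) intersect(t span) span {
	if !s.isValid() || !t.isValid() {
		return nullSpan()
	}
	r := s
	if t.Start > r.Start {
		r.Start = t.Start
	}
	if t.End < r.End {
		r.End = t.End
	}
	if r.End < r.Start {
		return nullSpan()
	}
	return r
}

// String formats the span as a mathematical range.
func (s span) String() string { return fmt.Sprintf("[%d,%d)", s.Start, s.End) }

func main() {
	a := span{Start: 0, End: 10}
	b := span{Start: 5, End: 15}
	c := span{Start: 20, End: 30}
	both := a.intersect(b)
	fmt.Println(both, both.length())     // prints "[5,10) 5"
	fmt.Println(a.intersect(c).isValid()) // prints "false"
}
```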
