commonmark

Version: v0.2.0
Published: Apr 30, 2023 License: Apache-2.0 Imports: 11 Imported by: 0

README

zombiezen.com/go/commonmark

This Go package provides a parser for CommonMark, a specific dialect of Markdown. It lets you parse, analyze, and render CommonMark/Markdown documents through an abstract syntax tree API.

This implementation conforms to Version 0.30 of the CommonMark Specification.

Goals

A few other Markdown/CommonMark packages exist for Go. This package prioritizes (in order):

  1. Lossless mapping between the parse tree and the CommonMark input, to enable tools that reformat or manipulate CommonMark documents.
  2. Adherence to the CommonMark specification.
  3. Comprehensibility of implementation.
  4. Performance.

Install

go get zombiezen.com/go/commonmark

Getting Started

package main

import (
  "fmt"
  "io"
  "os"

  "zombiezen.com/go/commonmark"
)

func main() {
  commonmarkSource, err := io.ReadAll(os.Stdin)
  if err != nil {
    fmt.Fprintln(os.Stderr, err)
    os.Exit(1)
  }
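  // Parse the document into blocks and collect link reference definitions.
  blocks, refMap := commonmark.Parse(commonmarkSource)
  // Render the parse tree as HTML, reporting any write error.
  if err := commonmark.RenderHTML(os.Stdout, blocks, refMap); err != nil {
    fmt.Fprintln(os.Stderr, err)
    os.Exit(1)
  }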
}

License

Apache 2.0

Documentation

Overview

Package commonmark provides a CommonMark parser.

Example
package main

import (
	"os"

	"zombiezen.com/go/commonmark"
)

func main() {
	// Convert CommonMark to a parse tree and any link references.
	blocks, refMap := commonmark.Parse([]byte("Hello, **World**!\n"))
	// Render parse tree to HTML.
	commonmark.RenderHTML(os.Stdout, blocks, refMap)
}
Output:

<p>Hello, <strong>World</strong>!
</p>

Functions

func IsEmailAddress

func IsEmailAddress(s string) bool

IsEmailAddress reports whether the string is a CommonMark email address.
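
The CommonMark 0.30 specification defines an email address, in its section on autolinks, with a regular expression. The following is a minimal sketch of such a check using only the standard library. The name looksLikeEmail is hypothetical, and this is not necessarily how IsEmailAddress is implemented.

```go
package main

import (
	"fmt"
	"regexp"
)

// emailPattern transcribes the email address pattern from the autolinks
// section of the CommonMark 0.30 specification. Illustration only.
var emailPattern = regexp.MustCompile(
	"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+" +
		"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?" +
		"(?:\\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$",
)

// looksLikeEmail is a hypothetical stand-in for an email address check.
func looksLikeEmail(s string) bool {
	return emailPattern.MatchString(s)
}

func main() {
	inputs := []string{
		"foo@bar.example.com",
		"not an email",
		"foo\\+@bar.example.com", // backslashes are not allowed
	}
	for _, s := range inputs {
		fmt.Printf("%q: %t\n", s, looksLikeEmail(s))
	}
}
```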

func NormalizeURI

func NormalizeURI(s string) string

NormalizeURI percent-encodes any characters in a string that are not reserved or unreserved URI characters. This is commonly used for transforming CommonMark link destinations into strings suitable for href or src attributes.
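
To illustrate the kind of transformation described, here is a rough sketch of percent-encoding with the standard library. The helper percentEncode is hypothetical and is not the package's implementation. In particular, passing '%' through unchanged (so that existing escapes survive) is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// uriSafe lists the RFC 3986 unreserved and reserved characters.
// The trailing '%' is an assumption of this sketch, so that existing
// percent-escapes are not encoded a second time.
const uriSafe = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789" +
	"-._~" + // unreserved punctuation
	":/?#[]@" + // reserved: gen-delims
	"!$&'()*+,;=" + // reserved: sub-delims
	"%"

// percentEncode is a hypothetical helper that escapes every byte of s
// that does not appear in uriSafe.
func percentEncode(s string) string {
	const hexDigits = "0123456789ABCDEF"
	var sb strings.Builder
	for i := 0; i < len(s); i++ {
		c := s[i]
		if strings.IndexByte(uriSafe, c) >= 0 {
			sb.WriteByte(c)
			continue
		}
		sb.WriteByte('%')
		sb.WriteByte(hexDigits[c>>4])
		sb.WriteByte(hexDigits[c&0x0f])
	}
	return sb.String()
}

func main() {
	fmt.Println(percentEncode("https://example.com/a b?q=é"))
	// Prints: https://example.com/a%20b?q=%C3%A9
}
```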

func Parse

func Parse(source []byte) ([]*RootBlock, ReferenceMap)

Parse parses an in-memory UTF-8 CommonMark document and returns its blocks. As long as source does not contain NUL bytes, the blocks will use the original byte slice as their source.
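
One way to see why the original slice can only be shared when it has no NUL bytes: replacing a NUL with the Unicode Replacement Character changes the length, so a copy is needed. The sketch below shows that idea with a hypothetical sanitize helper. It is an assumption about the general technique, not the package's code.

```go
package main

import (
	"bytes"
	"fmt"
)

// sanitize is a hypothetical helper. It returns source itself (same
// backing array) when it contains no NUL bytes, and otherwise a new
// slice with each NUL replaced by U+FFFD.
func sanitize(source []byte) []byte {
	if bytes.IndexByte(source, 0) < 0 {
		return source
	}
	return bytes.ReplaceAll(source, []byte{0}, []byte("\uFFFD"))
}

func main() {
	clean := []byte("Hello\n")
	dirty := []byte("He\x00llo\n")

	// The clean input is returned as is, so both slices share memory.
	fmt.Println(&sanitize(clean)[0] == &clean[0]) // true

	// The dirty input yields a longer copy: one NUL byte becomes the
	// three-byte UTF-8 encoding of U+FFFD.
	fmt.Println(len(dirty), len(sanitize(dirty))) // 7 9
}
```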

func RenderHTML

func RenderHTML(w io.Writer, blocks []*RootBlock, refMap ReferenceMap) error

RenderHTML writes the given sequence of parsed blocks to the given writer as HTML. It will return the first error encountered, if any.

Types

type Block

type Block struct {
	// contains filtered or unexported fields
}

A Block is a structural element in a CommonMark document.

func (*Block) AsNode

func (b *Block) AsNode() Node

AsNode converts the block node to a Node pointer.

func (*Block) Child

func (b *Block) Child(i int) Node

Child returns the i'th child of the node.

func (*Block) ChildCount

func (b *Block) ChildCount() int

ChildCount returns the number of children the node has. Calling ChildCount on nil returns 0.

func (*Block) HeadingLevel

func (b *Block) HeadingLevel() int

HeadingLevel returns the 1-based level for an ATXHeadingKind or SetextHeadingKind, or zero otherwise.
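
For reference, the CommonMark specification gives ATX headings a level equal to the number of opening '#' characters (1 to 6), and gives setext headings level 1 for an '=' underline and level 2 for a '-' underline. The sketch below covers only the ATX case. The helper atxLevel is hypothetical and works on a raw line rather than on a parsed Block.

```go
package main

import "fmt"

// atxLevel is a hypothetical helper, not part of the package.
// It returns the heading level (1-6) of an ATX heading line,
// or 0 if the line is not an ATX heading.
// The line is assumed to have no trailing line ending.
func atxLevel(line string) int {
	// Up to three spaces of indentation are allowed.
	i := 0
	for i < len(line) && line[i] == ' ' {
		i++
	}
	if i > 3 {
		return 0
	}
	// Count the opening run of '#' characters.
	n := 0
	for i+n < len(line) && line[i+n] == '#' {
		n++
	}
	if n < 1 || n > 6 {
		return 0
	}
	// The run must be followed by a space, a tab, or the end of the line.
	if rest := line[i+n:]; rest != "" && rest[0] != ' ' && rest[0] != '\t' {
		return 0
	}
	return n
}

func main() {
	for _, line := range []string{"# Title", "### Section", "#hashtag", "####### seven"} {
		fmt.Printf("%q -> %d\n", line, atxLevel(line))
	}
}
```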

func (*Block) InfoString

func (b *Block) InfoString() *Inline

InfoString returns the info string node for a FencedCodeBlockKind block or nil otherwise.

func (*Block) IsOrderedList

func (b *Block) IsOrderedList() bool

IsOrderedList reports whether the block is an ordered list or an ordered list item.

func (*Block) IsTightList

func (b *Block) IsTightList() bool

IsTightList reports whether the block is a tight list or a tight list item.

func (*Block) Kind

func (b *Block) Kind() BlockKind

Kind returns the type of block node or zero if the node is nil.

func (*Block) ListItemNumber

func (b *Block) ListItemNumber(source []byte) int

ListItemNumber returns the number of a ListItemKind block or -1 if the block does not represent an ordered list item.
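
As an illustration of the -1 convention, the sketch below extracts the number from an ordered list marker string. Per the CommonMark specification, such a marker is 1 to 9 digits followed by '.' or ')'. The helper markerNumber is hypothetical. The real method operates on a Block and its source bytes.

```go
package main

import (
	"fmt"
	"strconv"
)

// markerNumber is a hypothetical helper. It returns the number in an
// ordered list marker such as "12." or "3)", or -1 for anything else,
// including bullet markers like "-" or "*".
func markerNumber(marker string) int {
	// One to nine digits plus one delimiter character.
	if len(marker) < 2 || len(marker) > 10 {
		return -1
	}
	delim := marker[len(marker)-1]
	if delim != '.' && delim != ')' {
		return -1
	}
	digits := marker[:len(marker)-1]
	for i := 0; i < len(digits); i++ {
		if digits[i] < '0' || digits[i] > '9' {
			return -1
		}
	}
	n, err := strconv.Atoi(digits)
	if err != nil {
		return -1
	}
	return n
}

func main() {
	for _, m := range []string{"1.", "42)", "007.", "-", "a."} {
		fmt.Printf("%q -> %d\n", m, markerNumber(m))
	}
}
```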

func (*Block) Span

func (b *Block) Span() Span

Span returns the position information relative to [RootBlock.Source].

type BlockKind

type BlockKind uint16

BlockKind is an enumeration of values returned by *Block.Kind.

const (
	// ParagraphKind is used for a block of text.
	ParagraphKind BlockKind = 1 + iota
	// ThematicBreakKind is used for a thematic break, also known as a horizontal rule.
	// It will not contain children.
	ThematicBreakKind
	// ATXHeadingKind is used for headings that start with hash marks.
	ATXHeadingKind
	// SetextHeadingKind is used for headings that end with a divider.
	SetextHeadingKind
	// IndentedCodeBlockKind is used for code blocks started by indentation.
	IndentedCodeBlockKind
	// FencedCodeBlockKind is used for code blocks started by backticks or tildes.
	FencedCodeBlockKind
	// HTMLBlockKind is used for blocks of raw HTML.
	// It should not be wrapped by any tags in rendered HTML output.
	HTMLBlockKind
	// LinkReferenceDefinitionKind is used for a [link reference definition].
	// The first child is always a [LinkLabelKind],
	// the second child is always a [LinkDestinationKind],
	// and it may end with an optional [LinkTitleKind].
	//
	// [link reference definition]: https://spec.commonmark.org/0.30/#link-reference-definition
	LinkReferenceDefinitionKind
	// BlockQuoteKind is used for block quotes.
	BlockQuoteKind
	// ListItemKind is used for items in an ordered or unordered list.
	// The first child will always be of [ListMarkerKind].
	// If the item contains a paragraph and the item is "tight",
	// then the paragraph tag should be stripped.
	ListItemKind
	// ListKind is used for ordered or unordered lists.
	ListKind
	// ListMarkerKind is used to contain the marker in a [ListItemKind] node.
	// It is typically not rendered directly.
	ListMarkerKind
)

func (BlockKind) IsCode

func (k BlockKind) IsCode() bool

IsCode reports whether the kind is IndentedCodeBlockKind or FencedCodeBlockKind.

func (BlockKind) IsHeading

func (k BlockKind) IsHeading() bool

IsHeading reports whether the kind is ATXHeadingKind or SetextHeadingKind.
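
The enumeration above starts at 1 + iota, which leaves the zero value free to mean "no kind" (as returned for a nil node). The sketch below reproduces that pattern with a local stand-in type so it compiles on its own. The lowercase names are hypothetical and only mirror a subset of the constants listed above.

```go
package main

import "fmt"

// kind is a local stand-in for an enumeration such as BlockKind.
// Starting at 1 + iota keeps the zero value distinct from every kind.
type kind uint16

const (
	paragraphKind kind = 1 + iota
	thematicBreakKind
	atxHeadingKind
	setextHeadingKind
	indentedCodeBlockKind
	fencedCodeBlockKind
)

// isHeading reports whether k is one of the two heading kinds.
func (k kind) isHeading() bool {
	return k == atxHeadingKind || k == setextHeadingKind
}

// isCode reports whether k is one of the two code block kinds.
func (k kind) isCode() bool {
	return k == indentedCodeBlockKind || k == fencedCodeBlockKind
}

func main() {
	var unset kind // zero value: matches none of the constants
	fmt.Println(unset == 0, unset.isHeading(), unset.isCode()) // true false false
	fmt.Println(setextHeadingKind.isHeading())                 // true
	fmt.Println(fencedCodeBlockKind.isCode())                  // true
	fmt.Println(paragraphKind.isHeading())                     // false
}
```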

func (BlockKind) String (added in v0.2.0)

func (i BlockKind) String() string

type BlockParser

type BlockParser struct {
	// contains filtered or unexported fields
}

A BlockParser splits a CommonMark document into blocks.

Example
package main

import (
	"io"
	"os"
	"strings"

	"zombiezen.com/go/commonmark"
)

func main() {
	input := strings.NewReader(
		"Hello, [World][]!\n" +
			"\n" +
			"[World]: https://www.example.com/\n",
	)

	// Parse document into blocks (e.g. paragraphs, lists, etc.)
	// and collect link reference definitions.
	parser := commonmark.NewBlockParser(input)
	var blocks []*commonmark.RootBlock
	refMap := make(commonmark.ReferenceMap)
	for {
		block, err := parser.NextBlock()
		if err == io.EOF {
			break
		}
		if err != nil {
			// Not expecting an error from a string.
			panic(err)
		}

		// Add block to list.
		blocks = append(blocks, block)
		// Add any link reference definitions to map.
		refMap.Extract(block.Source, block.AsNode())
	}

	// Finish parsing inside blocks.
	inlineParser := &commonmark.InlineParser{
		ReferenceMatcher: refMap,
	}
	for _, block := range blocks {
		inlineParser.Rewrite(block)
	}

	// Render blocks as HTML.
	commonmark.RenderHTML(os.Stdout, blocks, refMap)
}
Output:

<p>Hello, <a href="https://www.example.com/">World</a>!
</p>

func NewBlockParser

func NewBlockParser(r io.Reader) *BlockParser

NewBlockParser returns a block parser that reads from r.

Block parsers maintain their own buffering and may read data from r beyond the blocks requested.

func (*BlockParser) NextBlock

func (p *BlockParser) NextBlock() (*RootBlock, error)

NextBlock reads the next top-level block in the document, returning the first error encountered. Blocks returned by NextBlock will typically contain UnparsedKind nodes for any text: use *InlineParser.Rewrite to complete parsing.

type Inline

type Inline struct {
	// contains filtered or unexported fields
}

Inline represents CommonMark content elements like text, links, or emphasis.

func (*Inline) AsNode

func (inline *Inline) AsNode() Node

AsNode converts the inline node to a Node pointer.

func (*Inline) Child

func (inline *Inline) Child(i int) *Inline

Child returns the i'th child of the node.

func (*Inline) ChildCount

func (inline *Inline) ChildCount() int

ChildCount returns the number of children the node has. Calling ChildCount on nil returns 0.

func (*Inline) IndentWidth

func (inline *Inline) IndentWidth() int

IndentWidth returns the number of spaces the IndentKind span represents, or zero if the node is nil or of a different type.

func (*Inline) Kind

func (inline *Inline) Kind() InlineKind

Kind returns the type of inline node or zero if the node is nil.

func (*Inline) LinkDestination

func (inline *Inline) LinkDestination() *Inline

LinkDestination returns the destination child of a LinkKind node or nil if none is present or the node is not a link.

func (*Inline) LinkReference

func (inline *Inline) LinkReference() string

LinkReference returns the normalized form of a link label.
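
The CommonMark specification normalizes a link label by stripping the brackets, case-folding, trimming surrounding whitespace, and collapsing internal whitespace runs to a single space. The sketch below approximates this with the standard library. The helper normalizeLabel is hypothetical, and strings.ToLower is only an approximation of full Unicode case folding, so results may differ from the package for some non-ASCII labels.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeLabel is a hypothetical helper approximating the label
// normalization described in the CommonMark specification.
func normalizeLabel(label string) string {
	// Strip the opening and closing brackets, if present.
	label = strings.TrimPrefix(label, "[")
	label = strings.TrimSuffix(label, "]")
	// strings.Fields drops leading and trailing whitespace and splits on
	// whitespace runs, so joining with one space collapses each run.
	label = strings.Join(strings.Fields(label), " ")
	// Approximation: lowercasing instead of a full Unicode case fold.
	return strings.ToLower(label)
}

func main() {
	fmt.Printf("%q\n", normalizeLabel("[ Foo\n  BAR ]")) // "foo bar"
	fmt.Println(normalizeLabel("[World]") == normalizeLabel("[WORLD]"))
	// Prints: true
}
```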

func (*Inline) LinkTitle

func (inline *Inline) LinkTitle() *Inline

LinkTitle returns the title child of a LinkKind node or nil if none is present or the node is not a link.

func (*Inline) Span

func (inline *Inline) Span() Span

Span returns the position information relative to [RootBlock.Source].

func (*Inline) Text

func (inline *Inline) Text(source []byte) string

Text converts a non-container inline node into a string.

type InlineKind

type InlineKind uint16

InlineKind is an enumeration of values returned by *Inline.Kind.

const (
	// TextKind is used for literal text.
	TextKind InlineKind = 1 + iota
	// SoftLineBreakKind is rendered as either a space or as a hard line break,
	// depending on the renderer.
	SoftLineBreakKind
	// HardLineBreakKind is rendered as a line break.
	HardLineBreakKind
	// IndentKind represents one or more space characters
	// (the exact number can be retrieved by [*Inline.IndentWidth]).
	// It's placed in the parse tree
	// in situations where the number of logical spaces does not match the source.
	IndentKind
	// CharacterReferenceKind is used for ampersand escape characters
	// (e.g. "&amp;").
	CharacterReferenceKind
	// InfoStringKind is used for the [info string] of a fenced code block.
	// It's typically not rendered directly and its contents are implementation-defined.
	//
	// [info string]: https://spec.commonmark.org/0.30/#info-string
	InfoStringKind
	// EmphasisKind is used for text that has stress emphasis.
	EmphasisKind
	// StrongKind is used for text that has strong emphasis.
	StrongKind
	// LinkKind is used for hyperlinks.
	// The [*Inline.LinkDestination], [*Inline.LinkTitle], and [*Inline.LinkReference] methods
	// can be used to retrieve specific parts of the link.
	LinkKind
	// ImageKind is used for images.
	// The contents of the node are used as the image's text description.
	// Otherwise, ImageKind is similar to [LinkKind].
	ImageKind
	// LinkDestinationKind is used as part of links and images
	// to indicate the destination or image source, respectively.
	LinkDestinationKind
	// LinkTitleKind is used as part of links and images
	// to hold advisory text typically rendered as a tooltip.
	LinkTitleKind
	// LinkLabelKind is used as either a link reference definition label
	// or in a link or image to reference a link reference definition.
	LinkLabelKind
	// CodeSpanKind is used for inline code in a non-code-block context.
	CodeSpanKind
	// AutolinkKind is used for [autolinks].
	// The node's content is also the link's destination.
	//
	// [autolinks]: https://spec.commonmark.org/0.30/#autolinks
	AutolinkKind

	// HTMLTagKind is a container for one or more [RawHTMLKind] nodes
	// that represents an open tag, a closing tag, an HTML comment,
	// a processing instruction, a declaration, or a CDATA section.
	HTMLTagKind
	// RawHTMLKind is a text node that should be reproduced in HTML verbatim.
	RawHTMLKind

	// UnparsedKind is used for inline text that has not been tokenized.
	UnparsedKind
)

func (InlineKind) String (added in v0.2.0)

func (i InlineKind) String() string

type InlineParser

type InlineParser struct {
	ReferenceMatcher ReferenceMatcher
}

An InlineParser converts UnparsedKind Inline nodes into inline trees.

func (*InlineParser) Rewrite

func (p *InlineParser) Rewrite(root *RootBlock)

Rewrite replaces any UnparsedKind nodes in the given root block with parsed versions of the node.

type LinkDefinition

type LinkDefinition struct {
	Destination  string
	Title        string
	TitlePresent bool
}

LinkDefinition is the data of a link reference definition.

type Node

type Node struct {
	// contains filtered or unexported fields
}

Node is a pointer to a Block or an Inline.

func (Node) Block

func (n Node) Block() *Block

Block returns the referenced block or nil if the pointer does not reference a block.

func (Node) Inline

func (n Node) Inline() *Inline

Inline returns the referenced inline or nil if the pointer does not reference an inline.

func (Node) Span (added in v0.2.0)

func (n Node) Span() Span

Span returns the span of the referenced node or an invalid span if the pointer is nil.

type ReferenceMap

type ReferenceMap map[string]LinkDefinition

ReferenceMap is a mapping of normalized labels to link definitions.

func (ReferenceMap) Extract

func (m ReferenceMap) Extract(source []byte, node Node)

Extract adds any link reference definitions contained in node to the map. In case of conflicts, Extract will not replace any existing definitions in the map and will use the first definition in source order.
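
The "first definition wins" rule amounts to an insert-if-absent on the map. The sketch below shows that rule with local stand-in types that mirror the documented shapes of LinkDefinition and ReferenceMap. The addIfAbsent method is hypothetical and is not part of the package API.

```go
package main

import "fmt"

// linkDefinition and referenceMap are local stand-ins that mirror the
// documented LinkDefinition and ReferenceMap types.
type linkDefinition struct {
	Destination  string
	Title        string
	TitlePresent bool
}

type referenceMap map[string]linkDefinition

// addIfAbsent is a hypothetical method: it stores def under label only
// if the label has no definition yet, and reports whether it stored it.
func (m referenceMap) addIfAbsent(label string, def linkDefinition) bool {
	if _, exists := m[label]; exists {
		return false
	}
	m[label] = def
	return true
}

func main() {
	refs := make(referenceMap)
	// Definitions arrive in source order; the later duplicate is ignored.
	refs.addIfAbsent("world", linkDefinition{Destination: "https://first.example/"})
	refs.addIfAbsent("world", linkDefinition{Destination: "https://second.example/"})
	fmt.Println(refs["world"].Destination) // https://first.example/
}
```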

func (ReferenceMap) MatchReference

func (m ReferenceMap) MatchReference(normalizedLabel string) bool

MatchReference reports whether the normalized label appears in the map.

type ReferenceMatcher

type ReferenceMatcher interface {
	MatchReference(normalizedLabel string) bool
}

A type that implements ReferenceMatcher can be checked for the presence of link reference definitions.
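
Because ReferenceMatcher is a one-method interface, any type with a MatchReference method can stand in for a ReferenceMap, for example to combine labels from several sources. The sketch below declares a local copy of the interface so that it compiles without the package. The labelSet and anyOf types are hypothetical.

```go
package main

import "fmt"

// referenceMatcher is a local copy of the documented interface.
type referenceMatcher interface {
	MatchReference(normalizedLabel string) bool
}

// labelSet is a hypothetical matcher backed by a set of labels.
type labelSet map[string]struct{}

func (s labelSet) MatchReference(normalizedLabel string) bool {
	_, ok := s[normalizedLabel]
	return ok
}

// anyOf is a hypothetical matcher that succeeds if any of its
// members recognizes the label.
type anyOf []referenceMatcher

func (ms anyOf) MatchReference(normalizedLabel string) bool {
	for _, m := range ms {
		if m.MatchReference(normalizedLabel) {
			return true
		}
	}
	return false
}

func main() {
	local := labelSet{"world": {}}
	shared := labelSet{"glossary": {}}
	var m referenceMatcher = anyOf{local, shared}
	fmt.Println(m.MatchReference("world"), m.MatchReference("glossary"), m.MatchReference("missing"))
	// Prints: true true false
}
```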

type RootBlock

type RootBlock struct {
	// Source holds the bytes of the block read from the original source.
	// Any NUL bytes will have been replaced with the Unicode Replacement Character.
	Source []byte
	// StartLine is the 1-based line number of the first line of the block.
	StartLine int
	// StartOffset is the byte offset from the beginning of the original source
	// that this block starts at.
	StartOffset int64
	// EndOffset is the byte offset from the beginning of the original source
	// that this block ends at.
	// Unless the original source contained NUL bytes,
	// EndOffset = StartOffset + len(Source).
	EndOffset int64

	Block
}

RootBlock represents a "top-level" block, that is, a block whose parent is the document. Root blocks store their CommonMark source and document position information. All other position information in the tree is relative to the beginning of the root block.
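
Since spans inside the tree are relative to the root block, converting one to a document offset means adding the root's StartOffset. The sketch below shows the arithmetic with local stand-in types and hand-built values. The absolute helper is hypothetical, and the result only lines up with the original bytes when the source contained no NUL bytes.

```go
package main

import "fmt"

// span and rootBlock are local stand-ins that mirror a subset of the
// documented Span and RootBlock fields.
type span struct {
	Start, End int
}

type rootBlock struct {
	Source      []byte
	StartOffset int64
}

// absolute is a hypothetical helper that converts a root-relative span
// into byte offsets from the beginning of the original document.
func absolute(root rootBlock, s span) (start, end int64) {
	return root.StartOffset + int64(s.Start), root.StartOffset + int64(s.End)
}

func main() {
	doc := "# Title\n\nHello, World!\n"
	// Hand-built root block for the paragraph, which starts at byte 9.
	para := rootBlock{Source: []byte("Hello, World!\n"), StartOffset: 9}
	word := span{Start: 7, End: 12} // "World" within para.Source

	start, end := absolute(para, word)
	fmt.Println(start, end)                          // 16 21
	fmt.Println(string(para.Source[word.Start:word.End])) // World
	fmt.Println(doc[start:end])                      // World
}
```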

type Span

type Span struct {
	// Start is the index of the first byte of the span,
	// relative to the beginning of the [RootBlock].
	Start int
	// End is the end index of the span (exclusive),
	// relative to the beginning of the [RootBlock].
	End int
}

Span is a contiguous region of a document, expressed as byte indices relative to the beginning of a RootBlock.

func NullSpan

func NullSpan() Span

NullSpan returns an invalid span.

func (Span) Intersect

func (span Span) Intersect(span2 Span) Span

Intersect returns the intersection of two spans or an invalid span if none exists.

func (Span) IsValid

func (span Span) IsValid() bool

IsValid reports whether the span is valid.

func (Span) Len

func (span Span) Len() int

Len returns the length of the span or zero if the span is invalid.

func (Span) String

func (span Span) String() string

String formats the span indices as a mathematical range like "[12,34)".
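
The half-open interval behavior described above can be sketched with a local stand-in type. How the package represents an invalid span is not documented here, so the choice of {-1, -1} below is an assumption, as is treating a zero-length intersection of touching spans as valid. Only the "[12,34)" output format comes from the documentation.

```go
package main

import "fmt"

// span is a local stand-in for a half-open byte range [Start, End).
type span struct {
	Start, End int
}

// nullSpan returns an invalid span. Using -1 for both fields is an
// assumption of this sketch.
func nullSpan() span { return span{Start: -1, End: -1} }

func (s span) isValid() bool { return s.Start >= 0 && s.End >= s.Start }

// length returns the number of bytes covered, or zero if invalid.
func (s span) length() int {
	if !s.isValid() {
		return 0
	}
	return s.End - s.Start
}

// intersect returns the overlap of s and o, or an invalid span if
// either input is invalid or the two do not meet.
func (s span) intersect(o span) span {
	if !s.isValid() || !o.isValid() {
		return nullSpan()
	}
	start, end := s.Start, s.End
	if o.Start > start {
		start = o.Start
	}
	if o.End < end {
		end = o.End
	}
	if end < start {
		return nullSpan()
	}
	return span{Start: start, End: end}
}

// String formats the span as a mathematical range.
func (s span) String() string { return fmt.Sprintf("[%d,%d)", s.Start, s.End) }

func main() {
	a, b := span{0, 10}, span{5, 20}
	fmt.Println(span{12, 34})                    // [12,34)
	fmt.Println(a.intersect(b))                  // [5,10)
	fmt.Println(a.intersect(b).length())         // 5
	fmt.Println(a.intersect(span{15, 20}).isValid()) // false
}
```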
