gtoken

v0.0.0-...-9bb2f31
Published: Apr 25, 2024 License: MIT Imports: 12 Imported by: 3

README

gtoken

Generic markup tokens for mixed content (such as LwDITA)

Documentation

Overview

Package gtoken provides generic markup tokens for mixed content (such as LwDITA).

Constants

This section is empty.

Variables

This section is empty.

Functions

func DumpTo

func DumpTo(rGTkns []*GToken, w io.Writer)

DumpTo writes out the `GToken`s to the `io.Writer`, one per line, and each line is prefixed with the token type. The output should parse the same as the input file, except perhaps for the treatment of all-whitespace CDATA.
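The "one per line, prefixed with the token type" layout can be sketched as follows. This is only an illustration: the `token` type, its `Type` and `Text` fields, the type labels, the `"type: text"` line format, and the skipping of nil entries are all assumptions made here, not the package's actual definitions or output format.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// token is a hypothetical, stripped-down stand-in for GToken.
type token struct {
	Type string // made-up type label, e.g. "Elm", "ChD", "End"
	Text string // echoed source text of the token
}

// dumpTo writes one token per line, each line prefixed with the token type.
// Skipping nil entries is an assumption of this sketch.
func dumpTo(tokens []*token, w io.Writer) {
	for _, t := range tokens {
		if t == nil {
			continue
		}
		fmt.Fprintf(w, "%s: %s\n", t.Type, t.Text)
	}
}

// dumpString collects the dump in a string, which is handy in tests.
func dumpString(tokens []*token) string {
	var sb strings.Builder
	dumpTo(tokens, &sb)
	return sb.String()
}

func main() {
	tokens := []*token{
		{Type: "Elm", Text: "<p>"},
		{Type: "ChD", Text: "hello"},
		nil,
		{Type: "End", Text: "</p>"},
	}
	dumpTo(tokens, os.Stdout)
}
```

Taking an `io.Writer` rather than returning a string lets the same routine target a file, a buffer, or standard output.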

func HasDoctype

func HasDoctype(GTs []*GToken) (bool, string)

Types

type GToken

type GToken struct {
	// ==========================================
	// CToken has all the info about the original
	// source token, when considered in isolation.
	// ==========================================
	// Fields:
	//  - CT.SourceToken interface{}: "source code" token
	//  - SU.MarkupType: one of SU.MU_type_(XML/HTML/MKDN/BIN)
	//  - CT.FilePosition: char position, and line nr & column nr
	//  - CT.TDType: type of [xml.Token] or subtype of [xml.Directive]
	//  - CT.CName: alias of [xml.Name], only for elements
	//  - CT.CAtts: alias of slice of [xml.Attr], only for start-elm
	//  - Text string: CDATA / PI Instr / DOCTYPE root elm decl
	//  - ControlStrings []string: XML PI Target / XML Drctv subtype
	CT.CToken

	// Depth is the level of nesting of the source tag.
	Depth int
	// IsBlock and IsInline are
	// dupes of TagalogEntry ?
	IsBlock, IsInline bool
	NodeLevel         int
	// Key stuff
	*lwdx.TagalogEntry
	// DitaTag and HtmlTag are
	// dupes of TagalogEntry ?
	NodeKind, DitaTag, HtmlTag, NodeText string
}

GToken is meant to simplify & unify tokenisation across LwDITA's three supported input formats: XDITA XML, HDITA HTML5, and MDITA-XP Markdown. It also serves to represent all the various kinds of XML Directives, including DTDs(!).

To do this, the tokens produced by each parsing API are reduced to their essentials:

  • tag/token type (defined by the enumeration [GTagTokType], named TT_type_*, values are strings)
  • tag name (iff a markup element; is stored in a [GName], incl. NS)
  • token text (non-tag text content)
  • tag attributes
  • whatever additional stuff is available for Markdown tokens (to include Pandoc-style attributes)

NOTE that XML Directives are later "normalized", but that's another story.

func DeleteNils

func DeleteNils(inGTzn []*GToken) (outGTzn []*GToken)

func DoGTokens_html

func DoGTokens_html(pCPR *PU.ParserResults_html) ([]*GToken, error)

DoGTokens_html turns every html.Node (from golang.org/x/net/html) into a GToken. It's pretty simple because no tree building is done yet. Basically it just copies in the Node type and the Node's data, and sets the [TTType] field. For reference, html.Node is declared as:

type Node struct {
	Parent, FirstChild, LastChild, PrevSibling, NextSibling *Node
	Type      NodeType
	DataAtom  atom.Atom
	Data      string
	Namespace string
	Attr      []Attribute
}

Data is unescaped, so that it looks like "a<b" rather than "a&lt;b". For element nodes, DataAtom is the atom for Data, or zero if Data is not a known tag name.

type Attribute struct {
	Namespace, Key, Val string
}

func DoGTokens_mkdn

func DoGTokens_mkdn(pCPR *PU.ParserResults_mkdn) ([]*GToken, error)

DoGTokens_mkdn turns every Goldmark ast.Node Markdown token into a GToken. It's pretty simple because no tree building is done yet. However, it does merge text tokens into their preceding tokens, which leaves some nils in the list of tokens.
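A merge of this kind leaves holes in the slice, which is presumably what DeleteNils is for. The following is a minimal sketch of that compaction step, assuming a hypothetical `token` type with a single `Text` field. The real merge logic and the real DeleteNils implementation may differ.

```go
package main

import "fmt"

// token is a hypothetical stand-in for GToken.
type token struct {
	Text string
}

// deleteNils returns a new slice holding only the non-nil entries,
// in their original order. The input slice is left untouched.
func deleteNils(in []*token) []*token {
	out := make([]*token, 0, len(in))
	for _, t := range in {
		if t != nil {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	tokens := []*token{{Text: "<p>"}, {Text: "hello"}, {Text: "</p>"}}

	// Simulate the merge: fold the text token into its predecessor
	// and leave a nil in the slot it used to occupy.
	tokens[0].Text += tokens[1].Text
	tokens[1] = nil

	compact := deleteNils(tokens)
	fmt.Println(len(tokens), len(compact))
}
```

Building a fresh slice keeps the original token positions intact for any caller that still refers to them by index.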

func DoGTokens_xml

func DoGTokens_xml(pCPR *XU.ParserResults_xml) ([]*GToken, error)

DoGTokens_xml turns every xml.Token (from stdlib) into a GToken. It's pretty simple because no tree building is done yet. Basically it just copies in the token type and the token's data, and sets the [TDType] field.

xml.Token is an "any" interface holding one of the token types: StartElement, EndElement, CharData, Comment, ProcInst, Directive. Note that gtoken.TDType is a superset of these types.
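The six concrete types behind xml.Token can be told apart with a type switch. The sketch below uses only the standard library's encoding/xml. The helper names `kindOf` and `kinds` are made up for illustration and are not part of gtoken, and the string labels stand in for whatever values TDType actually uses.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// kindOf names the concrete type held in an xml.Token.
func kindOf(tok xml.Token) string {
	switch tok.(type) {
	case xml.StartElement:
		return "StartElement"
	case xml.EndElement:
		return "EndElement"
	case xml.CharData:
		return "CharData"
	case xml.Comment:
		return "Comment"
	case xml.ProcInst:
		return "ProcInst"
	case xml.Directive:
		return "Directive"
	}
	return "unknown"
}

// kinds tokenizes src and returns the kind of every token, in order.
// No tree is built; the tokens are consumed as a flat stream.
func kinds(src string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	var out []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, kindOf(tok))
	}
}

func main() {
	ks, err := kinds(`<!DOCTYPE topic><topic><!-- note -->hi</topic>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ks)
}
```

A DOCTYPE declaration arrives as a single xml.Directive, so any finer distinction between directive subtypes has to be made by inspecting the directive's bytes.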

func GetAllByTag

func GetAllByTag(gTkzn []*GToken, s string) []*GToken

GetAllByTag returns a new GTokenization. It checks the basic tag only, not any namespace.

func GetFirstByTag

func GetFirstByTag(gTkzn []*GToken, s string) *GToken

GetFirstByTag checks the basic tag only, not any namespace.
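Matching on "the basic tag only" can be read as comparing the local part of the name and ignoring the namespace. The sketch below illustrates that reading with a hypothetical `token` type carrying `Space` and `Local` fields (mirroring xml.Name). Skipping nil entries and returning nil when nothing matches are assumptions of this sketch, not documented behavior.

```go
package main

import "fmt"

// token is a hypothetical stand-in for GToken; Space and Local
// mirror the two fields of xml.Name.
type token struct {
	Space, Local string
}

// getAllByTag returns every token whose local name equals tag.
// The namespace is deliberately not compared.
func getAllByTag(tokens []*token, tag string) []*token {
	var out []*token
	for _, t := range tokens {
		if t != nil && t.Local == tag {
			out = append(out, t)
		}
	}
	return out
}

// getFirstByTag returns the first token whose local name equals tag,
// or nil if there is none.
func getFirstByTag(tokens []*token, tag string) *token {
	for _, t := range tokens {
		if t != nil && t.Local == tag {
			return t
		}
	}
	return nil
}

func main() {
	tokens := []*token{
		{Space: "urn:example", Local: "p"},
		{Local: "b"},
		nil,
		{Local: "p"},
	}
	fmt.Println(len(getAllByTag(tokens, "p")))
	fmt.Println(getFirstByTag(tokens, "b") == tokens[1])
}
```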

func (GToken) DumpTo

func (T GToken) DumpTo(w io.Writer)

DumpTo writes the token to the `io.Writer`.

func (GToken) Echo

func (T GToken) Echo() string

Echo implements Markupper.

func (GToken) EchoTo

func (T GToken) EchoTo(w io.Writer)

EchoTo implements Markupper.

func (*GToken) SourceTokenType

func (p *GToken) SourceTokenType() string

SourceTokenType returns `XML`, `MKDN`, `HTML`, or future stuff TBD.

func (GToken) String

func (T GToken) String() string

String implements Markupper.
