gtree

Version: v0.0.0-...-e2f8bfb
Published: Apr 25, 2024 | License: MIT | Imports: 18 | Imported by: 1

README

gtree

Generalized Golang trees for mixed content from diverse source formats

Documentation

Overview

Package mmmc contains generic Golang XML stuff: names, attributes, tags, elements, trees, files, documents.

Files in this directory use Markdown, so use `godoc2md` on 'em.

We make our own versions of Golang XML structures so that we can give them sensible new names and we can define our own methods for them.

We also use a couple of shortened names (Att *Attribute*, Elm *Element*) to keep code readable.

### Method naming

- `NewFoo(..)` always allocates memory.
- `Echo()` echoes an object back in source XML form, but normalized.
- `EchoCommented()` also outputs XML source form, but possibly with additional annotations added by processing.
- `String()` outputs a form that is useful for development and debugging but cannot be processed by an XML parser.

### About encoding/decoding and XML mixed content

When working with XML we can generally distinguish between three types of files:

- Files containing record-oriented data, expressed using XML elements
- Files containing natural language documents, also expressed using XML elements
- Files containing validation rules, generally expressed as XSD, RNG, or DTD

It is interesting to note that DTDs actually obey the same syntax rules as the other two; the typical file extensions (`.dtd .mod`) are helpful to humans but are not required by a parser that fully understands XML syntax.

Package gtree defines low-level structures for Generic Golang XML analysis, structures that are built directly atop (or map'd directly to) Golang's own XML structures. We do this for three reasons:

- So that we can define our own helper methods in our own golang namespace;
- Cos golang's XML package uses lousy naming (`Name Name`, anyone?);
- Cos golang's XML is written for XML data records, not for mixed content.

This repo implements a protocol stack that goes:

- the input text file (i.e. XML or other markup)
- package encoding/xml (golang's XML package)
- package gparse (low-level stuff)
- package gfile (deep analysis of individual markup files)
- package mmmc (processing of heterogeneous markup files)
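The second layer of the stack above is the standard library tokenizer. As a rough illustration of what the higher layers receive from it, here is a minimal sketch that walks a mixed-content string with `encoding/xml` and describes each token. The helper name `tokenKinds` is hypothetical and is not part of this repo.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// tokenKinds (a hypothetical helper) walks src with encoding/xml and
// returns a short description of each token, in document order.
func tokenKinds(src string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	var out []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			// In encoding/xml this field is declared as "Name Name".
			out = append(out, "start:"+t.Name.Local)
		case xml.EndElement:
			out = append(out, "end:"+t.Name.Local)
		case xml.CharData:
			// string(t) copies the bytes, which the decoder may reuse.
			out = append(out, "text:"+string(t))
		case xml.Comment:
			out = append(out, "comment")
		case xml.ProcInst:
			out = append(out, "pi:"+t.Target)
		case xml.Directive:
			out = append(out, "directive")
		}
	}
}

func main() {
	kinds, err := tokenKinds(`<p>Hello <b>world</b></p>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, k := range kinds {
		fmt.Println(k)
	}
}
```

Note how the text run `Hello ` arrives as its own token between the two start tags; this interleaving is what makes mixed content awkward to handle with record-oriented decoding.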

In general, all go files in this protocol stack should be organised as:

- struct definition
- constructors (named `New*`)
- printf stuff (`Raw()`, `Echo()`, `String()`)

Some characteristic methods:

- `Raw()` returns the original string passed from the golang XML parser
- `Echo()` returns a string of the item in normalised form, altho the presence of terminating newlines is not uniform
- `String()` returns a string suitable for runtime monitoring and debugging
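The file layout and the three output methods can be sketched on a toy type. Everything here is an assumption made for illustration: the `Att` type, its fields, and the `name=value` input format are hypothetical and do not reflect the package's actual attribute type.

```go
package main

import (
	"fmt"
	"strings"
)

// 1. Struct definition.
// Att is a hypothetical attribute type, shown only to illustrate the layout.
type Att struct {
	raw   string // the text exactly as it was received
	Name  string
	Value string
}

// 2. Constructor (named New*), which always allocates.
// NewAtt expects raw text of the form name=value, optionally quoted.
func NewAtt(raw string) *Att {
	name, value, _ := strings.Cut(strings.TrimSpace(raw), "=")
	return &Att{
		raw:   raw,
		Name:  strings.TrimSpace(name),
		Value: strings.Trim(strings.TrimSpace(value), `"'`),
	}
}

// 3. Printf stuff.

// Raw returns the original string unchanged.
func (a *Att) Raw() string { return a.raw }

// Echo returns the attribute in normalised source form.
func (a *Att) Echo() string { return a.Name + `="` + a.Value + `"` }

// String returns a debugging form that is not meant to be parsed as XML.
func (a *Att) String() string { return fmt.Sprintf("Att<%s|%s>", a.Name, a.Value) }

func main() {
	a := NewAtt(" id = 'intro' ")
	fmt.Printf("%q\n", a.Raw())
	fmt.Println(a.Echo())
	fmt.Println(a) // fmt picks up String() automatically
}
```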

NOTE:1280 the use of shorthand in variable names: Doc, Elm, Att.

NOTE:1220 that we store non-nil namespaces with a colon appended, for easy output.
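To show why storing the colon helps, here is a minimal sketch. The `XName` type and `NewXName` constructor below are hypothetical stand-ins written for this example, not the package's own definitions.

```go
package main

import "fmt"

// XName is a hypothetical name type. Space is either "" or "prefix:".
type XName struct {
	Space string
	Local string
}

// NewXName appends the colon once, at construction time.
func NewXName(space, local string) XName {
	if space != "" {
		space += ":"
	}
	return XName{Space: space, Local: local}
}

// Echo needs no conditional: an empty Space contributes nothing.
func (n XName) Echo() string { return n.Space + n.Local }

func main() {
	fmt.Println(NewXName("xml", "lang").Echo())
	fmt.Println(NewXName("", "topic").Echo())
}
```

The trade-off is that any code comparing namespaces has to remember that the stored value includes the trailing colon.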

NOTE:1230 that we use `godoc2md`, so we can use Markdown in these code comments.

NOTE:1030 that like other godoc comments, this package comment must be *right* above the target statement (`package`) if it is to be included by `godoc2md`.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func DumpBFnode

func DumpBFnode(p AST.Node, iLvl int)

func DumpCdBlk

func DumpCdBlk(cb AST.CodeBlock) string

	IsFenced    bool   // Fenced code block, or else an indented one
	Info        []byte // This holds the info string
	FenceChar   byte
	FenceLength int
	FenceOffset int

func DumpGTag

func DumpGTag(p AST.Node) string

func DumpHdg

func DumpHdg(h AST.Heading) string

	Level        int    // This holds the heading level number
	HeadingID    string // This might hold heading ID, if present
	IsTitleblock bool   // Specifies whether it's a title block

func DumpLink

func DumpLink(L AST.Link) string

	Destination []byte // Destination is what goes into a href
	Title       []byte // The tooltip thing that goes in a title attribute
	NoteID      int    // The S/N of a footnote, or 0 if not a footnote
	Footnote    *Node  // If footnote, a direct link to the FN Node, else nil.

func DumpList

func DumpList(L AST.List) string

	ListFlags       ListType
	Tight           bool   // Skip <p>s around list item data if true
	BulletChar      byte   // '*', '+' or '-' in bullet lists
	Delimiter       byte   // '.' or ')' after the number in ordered lists
	RefLink         []byte // If not nil, turns this list item into a footnote item and triggers different rendering
	IsFootnotesList bool   // This is a list of footnotes

func GTokenizeMDbuffer

func GTokenizeMDbuffer(inString string) (GTokzn []*gtoken.GToken, err error)

GTokenizeMDbuffer takes the raw XML and parses it into a slice of `GToken`s. It takes a string, not an `io.Reader`, so we know that the caller already had access to the full file contents and verified that it is in fact an XML file.

func KidsAsSlice

func KidsAsSlice(p AST.Node) []AST.Node

func ListKids

func ListKids(p AST.Node) string

func NewGTokenFromHtmlToken

func NewGTokenFromHtmlToken(inT html.Token) (outT *gtoken.GToken, e error)

NewGTokenFromHtmlToken does not recognise or return Processing Instructions!

Types

type GRootTag

type GRootTag GTag

GRootTag makes sure that assignments to/from a root node are explicit.
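The mechanism behind this is Go's rule that a defined type is distinct from its underlying type, so values must be converted explicitly. A minimal sketch, using hypothetical `Tag` and `RootTag` types in place of GTag and GRootTag:

```go
package main

import "fmt"

// Tag and RootTag are hypothetical stand-ins for GTag and GRootTag.
type Tag struct{ Name string }

// RootTag has the same underlying type as Tag but is a distinct type.
type RootTag Tag

// describeRoot accepts only a *RootTag, never a plain *Tag.
func describeRoot(r *RootTag) string { return "root <" + r.Name + ">" }

func main() {
	t := &Tag{Name: "topic"}
	// describeRoot(t) would not compile; the conversion must be spelled out.
	r := (*RootTag)(t)
	fmt.Println(describeRoot(r))
}
```

One consequence worth keeping in mind: methods declared directly on `Tag` are not part of `RootTag`'s method set, so calling them requires converting back.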

func NewGTagTreeFromBFtree

func NewGTagTreeFromBFtree(p AST.Node) *GRootTag

type GTag

type GTag struct {
	// Nord provides tree structure
	ON.Nord
	// [GToken] includes Name and Attribute info for XML
	// tags. For a simple tag that cannot be namespaced,
	// such as a "tag" in Markdown, the tag name is in
	// [GToken.XName.Local].
	//
	// NOTE: For LwDITA's Markdown-XP, we could use
	// the Attributes to store Pandoc-style attributes.
	// TODO: Every node needs both NAMESPACE and LANGUAGE,
	// because they are inherited.
	gtoken.GToken

	// TODO: Should TagalogEntry be moved elsewhere ?
	*lwdx.TagalogEntry
	// EntityIsParameter is a bool field used for XML ENTITYs only.
	// It indicates whether the entity defined using a "%" or not.
	// This distinguishes a parameter/DTD entity from a general/data
	// entity. This is recorded during parsing, for later use when
	// we fully process the entity declaration.
	EntityIsParameter bool
	// contains filtered or unexported fields
}

GTag is a generic golang XML tag, used mainly for representing XML tags (or their Markdown equivalents) in a mixed content document. Child elements (called "Kids") are referenced by the embedded [Nord].

Note that this is the appropriate struct for indicating block/inline, via func [IsBlock].

(GTag might also be useful tho for holding multi-level attribute info in DTDs, but then again we also define a very different [DTag].)

GTag is also used to represent non-tag XML items, including PIs, Comments, Directives, and CDATA character data items. Therefore a GTag is created for every XML token (even [EndElement]s), and they are linked into a tree structure (a GTree).

GTag's methods use pointer receivers rather than value receivers (String is the exception). For its kids it uses a linked list, not a slice.
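As an illustration of kids held in a linked list, here is a minimal sketch. The `Node` type and its field names are assumptions made for this example; the actual layout of the embedded Nord is not shown on this page.

```go
package main

import "fmt"

// Node is a hypothetical tree node that links its kids in a list.
type Node struct {
	Name     string
	parent   *Node
	firstKid *Node
	lastKid  *Node
	nextSib  *Node
}

// AddKid appends k as the last kid in constant time and returns it.
func (n *Node) AddKid(k *Node) *Node {
	k.parent = n
	if n.firstKid == nil {
		n.firstKid = k
	} else {
		n.lastKid.nextSib = k
	}
	n.lastKid = k
	return k
}

// KidsAsSlice walks the sibling links and collects the kids in order.
func (n *Node) KidsAsSlice() []*Node {
	var kids []*Node
	for k := n.firstKid; k != nil; k = k.nextSib {
		kids = append(kids, k)
	}
	return kids
}

func main() {
	root := &Node{Name: "topic"}
	root.AddKid(&Node{Name: "title"})
	body := root.AddKid(&Node{Name: "body"})
	body.AddKid(&Node{Name: "p"})
	for _, k := range root.KidsAsSlice() {
		fmt.Println(k.Name, len(k.KidsAsSlice()))
	}
}
```

Pointer receivers matter here because `AddKid` mutates the node; with a value receiver the links would be written to a copy.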

func CDataTagMD

func CDataTagMD(content string) *GTag

func EndTagMD

func EndTagMD(tag string) *GTag

func MakeGTagsFromGTokens

func MakeGTagsFromGTokens(GTs []*gtoken.GToken) (GEs []*GTag, err error)

func NewGTag

func NewGTag(aNS, aName string) *GTag

NewGTag initializes the node with parser results.

func NewGTagFromBFnode

func NewGTagFromBFnode(p AST.Node) *GTag

NewGTagFromBFnode basically just assigns to this field:

- gparse.GToken which comprises:
  - GTagTokType
  - XName
  - GAttList

func NewGTagFromGToken

func NewGTagFromGToken(inGTkn gtoken.GToken) (pTag *GTag, e error)

NewGTagFromGToken embeds the GToken and processes it. NOTE: Returns (`nil,nil`) if the token is valid but useless, and should be skipped, i.e. an `xml.CharData` that is all whitespace.

func NewGTagFromHtmlToken

func NewGTagFromHtmlToken(T html.Token) (pTag *GTag, e error)

NewGTagFromHtmlToken is TODO.

TODO: Pass a writer for Echo.

NOTE: Returns `nil` if the token is valid but useless and can be skipped, such as an `xml.CharData` that is all whitespace.

NOTE: this might cause problems.

func StartTagMD

func StartTagMD(tag string) *GTag

func (*GTag) Echo

func (p *GTag) Echo() string

Echo implements Markupper.

func (*GTag) IsBlock

func (p *GTag) IsBlock() bool

IsBlock needs to check for which schema, because some tags occur in multiple schemata but with differing values for block/inline.

func (*GTag) NewKid

func (anE *GTag) NewKid(aNS, aName string) *GTag

NewKid initializes the node with parser results and adds it to the receiver as the last kid.

func (GTag) String

func (p GTag) String() string

String implements Markupper.

type GTree

type GTree struct {
	// This data structure should know where its own root tag is, but
	// try not to use this field a lot because it might be redundant.
	RootTagIndex   int
	RootTagCount   int
	RootTagsDiffer bool
	// Scratch variables for matching start and end tags
	NrOpenTags int
	Tagstack
}

GTree is the workspace for AND the results of parsing a hierarchically organised markup file. The file should *not* have to be XML, but *should* have (or be modelable with) hierarchical (tree) structure.

Currently a GTree contains XML. There is not (yet) any higher-order semantics imposed or added, but it is entirely possible that a GTree could instead be based on (say) the Pandoc AST.

GTree maintains a 1-to-1 mapping between the tokens returned by the Golang XML parser and its own "Tag" elements. This makes it easy to sort out errors, and to provide meaningful error messages that directly quote inputs.

NOTE:1050 that the file does not have to be a well-formed XML file (or other markup file) with a single root element. It can also be

- A DTD file (`*.dtd`, `*.mod`)
- An XML data file that happens not to have a single top-level root element (this makes it an "XML fragment")
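Detecting a fragment comes down to counting elements that open at depth zero. A minimal sketch using `encoding/xml`, whose token-level decoder does not itself insist on a single root element; `countRoots` is a hypothetical helper, and how the package actually fills `RootTagCount` is not shown here.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// countRoots (a hypothetical helper) returns the number of top-level
// elements in src.
func countRoots(src string) (int, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	depth, roots := 0, 0
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return roots, nil
		}
		if err != nil {
			return roots, err
		}
		switch tok.(type) {
		case xml.StartElement:
			if depth == 0 {
				roots++ // an element opening at depth 0 is a root
			}
			depth++
		case xml.EndElement:
			depth--
		}
	}
}

func main() {
	for _, src := range []string{`<a><b/></a>`, `<a/><b/>`} {
		n, err := countRoots(src)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %d root(s), fragment=%t\n", src, n, n > 1)
	}
}
```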

Thus this function makes no assumptions about the top-down structure of the XML file, but it does expect that it is basically well-formed.

The file is read entirely into memory and parsed as a unit, in several passes, which implies that

- a GTree is returned as the complete end result of parsing a single file, and no intermediate results are exposed to the caller (altho after the parsing function returns, the total output of each pass is available as fields in the GTree struct)
- every transcluded file (i.e. external entity reference, of type general "&foo;" or parameter "%foo;") is processed as a separate GTree and then merged (i.e. transcluded) into the file's GTree

Each data structure in this structure represents the results of another processing pass, and a further refinement of our run-time representation of the content.

TODO:540 Make sure that comments are properly associated with markup tags (XML document data) and markup declarations (DTD stuff).

func NewGTreeFromGTags

func NewGTreeFromGTags(GEs []*GTag) (pGT *GTree, err error)

NewGTreeFromGTags is TBS.

TODO: FIXME Check that root Tag matches DOCTYPE.

TODO: FIXME Provide a slice of dirpaths, for resolving external IDs.

TODO: FIXME Multiple root Tags, set Xml contype to Fragment

TODO: FIXME If has DOCTYPE, set XML contype to document (unless is Fragment)

TODO: FIXME If has LwDITA DOCTYPE, set DITA contype.

func NewGTreeFromMarkdownFile

func NewGTreeFromMarkdownFile(path FU.AbsFilePath) (pET *GTree, err error)

NewGTreeFromMarkdownFile is a convenience function that reads in the file, which is presumed to be Markdown (MDITA flavor), then tokenizes it, and then passes the buffered file contents to the next function, below.

TODO:670 Provide a slice of dirpaths, for resolving external IDs.

func (*GTree) EchoTo

func (T *GTree) EchoTo(w io.Writer)

func (GTree) String

func (et GTree) String() string

type Tagentry

type Tagentry struct {
	// contains filtered or unexported fields
}

func NewTagentry

func NewTagentry(aTag string, anIndex int) Tagentry

func (*Tagentry) Index

func (pTE *Tagentry) Index() int

func (*Tagentry) Tag

func (pTE *Tagentry) Tag() string

type Tagstack

type Tagstack []Tagentry

Tagstack is a LIFO stack for GTags.

func (Tagstack) IsEmpty

func (ts Tagstack) IsEmpty() bool

IsEmpty is a no-brainer.

func (Tagstack) Peek

func (ts Tagstack) Peek() Tagentry

Peek will barf on an empty stack.

func (*Tagstack) Pop

func (ts *Tagstack) Pop() Tagentry

Pop will barf on an empty stack.

func (*Tagstack) Push

func (ts *Tagstack) Push(te Tagentry)

Push reslices the stack.
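To show how a stack like this is typically used for matching start and end tags, here is a minimal sketch. The `entry` and `stack` types are hypothetical reimplementations written for this example, not the package's Tagentry and Tagstack, and the convention of writing end tags as `/name` is likewise an assumption.

```go
package main

import "fmt"

// entry and stack are hypothetical stand-ins for Tagentry and Tagstack.
type entry struct {
	tag   string
	index int
}

type stack []entry

// IsEmpty reports whether the stack holds no entries.
func (s stack) IsEmpty() bool { return len(s) == 0 }

// Peek returns the top entry; it panics on an empty stack.
func (s stack) Peek() entry { return s[len(s)-1] }

// Push needs a pointer receiver because append may reslice.
func (s *stack) Push(e entry) { *s = append(*s, e) }

// Pop removes and returns the top entry; it panics on an empty stack.
func (s *stack) Pop() entry {
	old := *s
	e := old[len(old)-1]
	*s = old[:len(old)-1]
	return e
}

// balanced reports whether every "/x" end tag closes the most recent open "x".
func balanced(tags []string) bool {
	var s stack
	for i, t := range tags {
		if len(t) > 0 && t[0] == '/' {
			if s.IsEmpty() || s.Pop().tag != t[1:] {
				return false
			}
			continue
		}
		s.Push(entry{tag: t, index: i})
	}
	return s.IsEmpty()
}

func main() {
	fmt.Println(balanced([]string{"p", "b", "/b", "/p"}))
	fmt.Println(balanced([]string{"p", "b", "/p", "/b"}))
}
```

Checking `IsEmpty` before `Pop` or `Peek` is the caller's job in this sketch, which mirrors the warnings on the methods above.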
