mcfile

package module
v0.0.0-...-860260b
Published: May 3, 2024 License: MIT Imports: 26 Imported by: 3

Documentation

Overview

Package mcfile defines a per-file structure [MCFile] that holds all relevant per-file information. This includes:

  • file path info
  • file content (UTF-8, of course)
  • file type information (MIME and more)
  • the results of markup-specific file analysis (in the most analysable case, i.e. XML, this comprises tokens, gtokens, gelms, gtree)

For a discussion of tree walk functions, see `doc_wfn.go`

Note that if we do not get an explicit XML DOCTYPE declaration, there is some educated guesswork required.

The first workflow was based on XML, and comprises: `text => XML tokens => GTokens => GTags => GTree`

First, package `gparse` gets as far as the `GToken`s, which can only be in a list: they have no tree structure. Then package `gtree` handles the rest.

XML analysis starts off with tokenization (by the stdlib), so it makes sense to then have separate steps for making `GToken`s, `GTag`s, and the `GTree`. MKDN and HTML analyses use higher-level libraries that deliver CSTs (Concrete Syntax Trees, i.e. parse trees). We choose to do this processing in `package gparse` rather than in `package gtree`.

MKDN gets a tree of `yuin/goldmark/ast/Node`, and HTML gets a tree of stdlib `golang.org/x/net/html/Node`. Since a CST is delivered fully formed, it makes sense to have Step 1 that attaches to each node its `GToken` and `GTag`, and then Step 2 that builds a `GTree`.

There are three major types of `MCFile`, corresponding to how we process the file content:

  • "XML"
      • (§1) Use stdlib `encoding/xml` to get `[]XU.XToken`
      • (§1) Convert `[]XU.XToken` to `[]gparse.GToken`
      • (§2) Build `GTree`
  • "MKDN"
      • (§1) Use `yuin/goldmark` to get tree of `yuin/goldmark/ast/Node`
      • (§1) From each Node make a `MkdnToken` (in a list?) incl. `GToken` and `GTag`
      • (§2) Build `GTree`
  • "HTML"
      • (§1) Use `golang.org/x/net/html` to get a tree of `html.Node`
      • (§1) From each Node make a `HtmlToken` (in a list?) incl. `GToken` and `GTag`
      • (§2) Build `GTree`
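The three pipelines above can be sketched as a dispatch on markup type. This is a hypothetical illustration (the constant names and stage strings are stand-ins, not the package's actual API):

```go
package main

import "fmt"

// Hypothetical markup-type tags mirroring the three MCFile kinds.
const (
	MT_XML  = "XML"
	MT_MKDN = "MKDN"
	MT_HTML = "HTML"
)

// processSketch returns the processing steps for a markup type,
// mirroring the three pipelines described above.
func processSketch(markupType string) (steps []string) {
	switch markupType {
	case MT_XML:
		steps = []string{"stdlib xml tokens", "[]GToken", "GTree"}
	case MT_MKDN:
		steps = []string{"goldmark AST", "GToken+GTag per node", "GTree"}
	case MT_HTML:
		steps = []string{"x/net/html tree", "GToken+GTag per node", "GTree"}
	}
	return steps
}

func main() {
	fmt.Println(processSketch(MT_XML))
}
```

Note that the XML pipeline has an extra token-conversion step, because the stdlib delivers a flat token list rather than a tree.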

In general, all Go files in this protocol stack should be organised as:

  • struct definition(s)
  • constructors (named `New*`)
  • printf stuff (Raw(), Echo(), String())

Some characteristic methods:

  • Raw() returns the original string passed from the golang XML parser (with whitespace trimmed)
  • Echo() returns a string of the item in normalised form, although be aware that the presence of terminating newlines is not treated uniformly
  • String() returns a string suitable for runtime monitoring and debugging
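The three characteristic methods can be captured as an interface. This is a hypothetical sketch (the interface name `Markupper` and the `demoItem` implementation are illustrative, not part of the package):

```go
package main

import (
	"fmt"
	"strings"
)

// Markupper is a hypothetical interface capturing the three
// characteristic methods described above.
type Markupper interface {
	Raw() string    // original parser string, whitespace-trimmed
	Echo() string   // the item in normalised form
	String() string // for runtime monitoring and debugging
}

// demoItem is an illustrative stand-in implementation.
type demoItem struct{ raw string }

func (d demoItem) Raw() string    { return strings.TrimSpace(d.raw) }
func (d demoItem) Echo() string   { return "<p>" + d.Raw() + "</p>" }
func (d demoItem) String() string { return fmt.Sprintf("demoItem[%q]", d.Raw()) }

func main() {
	var m Markupper = demoItem{raw: "  hello  "}
	fmt.Println(m.Raw(), "/", m.Echo(), "/", m)
}
```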

NOTE The use of shorthand in variable names: Doc, Elm, Att.

NOTE We use `godoc2md`, so we can use Markdown in these code comments.

Index

Constants

This section is empty.

Variables

var GlobalAttCount int

var GlobalTagCount int

var LwDitaAttsForGLinks = []string{
	"name",
	"href",
	"id",
	"idref",
	"idrefs",
	"conref",
	"data-conref",
	"keys",
	"data-keys",
	"keyref",
	"data-keyref",
}
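A typical use of `LwDitaAttsForGLinks` would be a membership check while walking attributes. The helper below is hypothetical (the package may do this differently), but the slice contents are copied from above:

```go
package main

import "fmt"

// LwDitaAttsForGLinks, copied from the package variable above.
var LwDitaAttsForGLinks = []string{
	"name", "href", "id", "idref", "idrefs", "conref",
	"data-conref", "keys", "data-keys", "keyref", "data-keyref",
}

// isGLinkAtt reports whether an attribute name is link-related.
// (Hypothetical helper, for illustration only.)
func isGLinkAtt(att string) bool {
	for _, a := range LwDitaAttsForGLinks {
		if a == att {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isGLinkAtt("href"), isGLinkAtt("class"))
}
```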

Functions

func AddInXName

func AddInXName(ElmT StringTally, AttT StringTally, gT *gtoken.GToken)

func DumpGElm

func DumpGElm(p AST.Node) string

func KidsAsSlice

func KidsAsSlice(p AST.Node) []AST.Node

func ListKids

func ListKids(p AST.Node) string

func NormalizeTextLeaves

func NormalizeTextLeaves(rootNode AST.Node)

Types

type Contentity

type Contentity struct {
	ON.Nord
	MU.Errer

	LogInfo

	// ContentityRow is what gets persisted to the DB (and has Raw)
	m5db.ContentityRow

	// ParserResults is parseutils.ParserResults_ffs
	// (ffs = file format -specific)
	ParserResults interface{}

	GTokens      []*gtoken.GToken
	GTags        []*gtree.GTag
	*gtree.GTree // maybe not need GRootTag or RootOfASTptr
	GTknsWriter, GTreeWriter,
	GEchoWriter io.Writer

	GLinks
	// GEnts is "ENTITY" directives (both with "%" and without).
	GEnts map[string]*gparse.GEnt
	// DElms is "ELEMENT" directives.
	DElms map[string]*gtree.GTag

	TagTally StringTally
	AttTally StringTally
	// contains filtered or unexported fields
}

Contentity is the central per-file (or per-directory) structure: an ordered-tree node that aggregates DB persistence fields (ContentityRow), parser results, the GToken/GTag/GTree analysis, link data (GLinks), and tag/attribute tallies.

func NewContentity

func NewContentity(aPath string) (*Contentity, error)

NewContentity returns a Contentity Nord (i.e. node with ordered children) that can NOT be the root of a Contentity tree.

NOTE: because of interface hassles, BOTH return values might be non-nil, in which case, ignore the error.

We want everything to be in a nice tree of Nords, and that means that we have to create Contentities for directories too (where MarkupType == SU.MU_type_DIRLIKE).

When this func is called while walking a DIRECTORY given on the command line, aPath is a simple file (or dir) name, with no path separators.

When this func is called for a FILE given on the command line, aPath can be either absolute or relative, depending on what was on the CLI (altho probably a relFP has been upgraded to an absFP).

Alternative hack to achieve a similar end: `if pPP, e := NewPP(path); e == nil; pPA, e := NewPA(pPP); e == nil; pCR, e := NewCR(pPA); e == nil { ... }` .

func (*Contentity) DoBlockList

func (p *Contentity) DoBlockList() *Contentity

DoBlockList makes a list of all the nodes that are blocks, so that they can be traversed for rendering, and targeted for references. .

func (*Contentity) DoEntitiesList

func (p *Contentity) DoEntitiesList() error

DoEntitiesList collects all entity definitions. Note that each Token has been normalized:

  • rtType:ENTITY string1:foo string2:"FOO" entityIsParameter:false
  • rtType:ENTITY string1:bar string2:"BAR" entityIsParameter:true

func (p *Contentity) DoGLinks() *Contentity

DoGLinks gathers links. .

func (*Contentity) DoTableOfContents

func (p *Contentity) DoTableOfContents() *Contentity

DoTableOfContents makes a ToC. .

func (*Contentity) DoValidation

func (p *Contentity) DoValidation(pXCF *XU.XmlCatalogFile) (dtdS string, docS string, errS string)

DoValidation TODO: if there is no DOCTYPE, make a guess based on Filext, but a failure can't be fatal.

func (*Contentity) ExecuteStages

func (p *Contentity) ExecuteStages() *Contentity

ExecuteStages processes a Contentity to completion in an isolated thread, and can easily be converted to run as a goroutine. Summary:

  • st0_Init()
  • st1_Read()
  • st2_Tree()
  • st3_Refs()
  • st4_Done() (not currently called, but will work on all input files at once !)

An interesting question is, how can we indicate an error and terminate a thread prematurely ? The method currently chosen is to use interface github.com/fbaube/miscutils/Errer. This has to be checked for at the start of a func. But then we can chain functions by writing them left-to-right. Winning!

(If functions accept and return a ptr+error pair then they chain right-to-left, which is a big fail for readability.)

We could also pass in a `Context` and use its cancellation capability. Yet another way might be simply to `panic`, and so this function already has code to catch panics. .

func (p *Contentity) GatherLinks() error

GatherLinks is: @conref to reuse block-level content, @keyref to reuse phrase-level content. TODO: Each type of link (i.e. elm/att where it occurs) has to be categorised. TODO: Each format of link target has to be categorised.

  • Cross ref : <xref> : <a href> : [link](/URI "title")
  • Key def : <keydef> : <div data-class="keydef"> : <div data-class="keydef"> in HDITA syntax
  • Map : <map> : <nav> : See Example of an MDITA map (20)
  • Topic ref : <topicref> : <a href> inside a <li> : [link](/URI "title") inside a list item

TODO Stuff to get:

  • XDITA map
      • topicref @href (w @format)
      • task @id
  • HDITA
      • article @id
      • span @data-keyref
      • p @data-conref
  • MDITA
      • has YAML "id"
      • uses <p @data-conref>
      • uses <span @data-keyref>
      • uses MD [link_text](link_target.dita)
      • uses ![The remote](../images/remote-control-callouts.png "The remote")
  • XDITA
      • topic @id
      • ph @keyref
      • image @href
      • p @id
      • video/source @value
      • section @id
      • p @conref

func (p *Contentity) GatherXmlGLinks() *Contentity

GatherXmlGLinks gathers XmlItems: (in documents) IDs & IDREFs; (in DTDs) Elm defs (incl. Att defs) & Ent defs. See *xmlfile.XmlItems // *IDinfo


func (*Contentity) IsDir

func (p *Contentity) IsDir() bool

func (*Contentity) IsDirlike

func (p *Contentity) IsDirlike() bool

func (*Contentity) L

func (p *Contentity) L(level LL, format string, a ...interface{})

func (*Contentity) LogPrefix

func (p *Contentity) LogPrefix(mid string) string

func (*Contentity) LogTextQuote

func (p *Contentity) LogTextQuote(level LL, textquote string, format string, a ...interface{})

func (*Contentity) NewEntitiesList

func (p *Contentity) NewEntitiesList() (gEnts map[string]*gparse.GEnt, err error)

NewEntitiesList collects all entity definitions. Note that each Token is normalized:

  • rtType:ENTITY string1:foo string2:"FOO" entityIsParameter:false
  • rtType:ENTITY string1:bar string2:"BAR" entityIsParameter:true

Called by ProcessEntities only.

func (*Contentity) ProcessEntities_

func (p *Contentity) ProcessEntities_() error

func (*Contentity) RefineDirectives

func (p *Contentity) RefineDirectives() error

RefineDirectives scans to patch Directives with correct keyword.

func (*Contentity) SetError

func (p *Contentity) SetError(s string)

func (Contentity) String

func (p Contentity) String() string

String is developer output. It has to dump: FU.InputFile, FU.OutputFiles, GTree, GRefs, *XmlFileMeta, *XmlItems, *DitaInfo

func (*Contentity) SubstituteEntities

func (p *Contentity) SubstituteEntities() error

SubstituteEntities does replacement in Entities for simple (single-token) entity references, i.e. that begin with "%" or "&".
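The idea of simple entity substitution can be illustrated as below. This is a hypothetical sketch operating on raw strings for clarity; the real method presumably works on parsed tokens, and the function name and map layout are assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// substituteEntities replaces simple single-token entity references
// (general "&name;" and parameter "%name;") with their definitions.
// Illustrative only; not the package's actual implementation.
func substituteEntities(s string, defs map[string]string) string {
	for name, val := range defs {
		s = strings.ReplaceAll(s, "&"+name+";", val)
		s = strings.ReplaceAll(s, "%"+name+";", val)
	}
	return s
}

func main() {
	defs := map[string]string{"foo": "FOO"}
	fmt.Println(substituteEntities("a &foo; b", defs))
}
```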

func (*Contentity) TallyTags

func (p *Contentity) TallyTags()

func (*Contentity) WrapError

func (p *Contentity) WrapError(s string, e error)

type ContentityEngine

type ContentityEngine struct {
	// contains filtered or unexported fields
}

ContentityEngine tracks the (oops, global) state of a ContentityFS tree being assembled, for example when a directory is specified for recursive analysis.

FIXME: ID assignment should be offloaded to the DB ? .

CntyEng is a package global, which is dodgy and not re-entrant. The solution probably involves currying.

NOTE: Is the call to new(..) unnecessary? This variable should NOT be reinitialized for every new ContentityFS.

type ContentityError

type ContentityError struct {
	PE fs.PathError
	*Contentity
}

ContentityError is Contentity + SrcLoc (in source code) + PathError struct { Op, Path string; Err error }

Maybe use the format pkg.filename.methodname.Lnn

In code where package `mcfile` is not available, try a fileutils.PathPropsError

func NewContentityError

func NewContentityError(ermsg string, op string, cty *Contentity) ContentityError

func WrapAsContentityError

func WrapAsContentityError(e error, op string, cty *Contentity) ContentityError

func (ContentityError) Error

func (ce ContentityError) Error() string

func (*ContentityError) String

func (ce *ContentityError) String() string

type ContentityFS

type ContentityFS struct {
	// FS will be set from func [os.DirFS]
	fs.FS
	// contains filtered or unexported fields
}

ContentityFS is an instance of an fs.FS where every node is an mcfile.Contentity.

Note that directories ARE included in the tree, because the instances of [orderednodes.Nord] in each Contentity must properly interconnect in forming a complete tree.

Note that the file system is stored as a tree AND as a slice AND as a map. If any of these is modified without also modifying the others to match, there WILL be problems. For that reason, we use unexported instance variables that are accessible only via getters.

It ain't bulletproof tho. And in any case, users of a ContentityFS should feel free to use the functions on the embedded ordered nodes ("Nords"s). .

var CntyFS *ContentityFS

CntyFS is a global, which is a mistake.

func NewContentityFS

func NewContentityFS(aPath string, okayFilexts []string) *ContentityFS

(OBS?) NewContentityFS takes an absolute filepath. Passing in a relative filepath is going to cause major problems. .

func (*ContentityFS) AsSlice

func (p *ContentityFS) AsSlice() []*Contentity

func (*ContentityFS) DirCount

func (p *ContentityFS) DirCount() int

func (*ContentityFS) DoForEvery

func (p *ContentityFS) DoForEvery(stgprocsr ContentityStage)

func (*ContentityFS) FileCount

func (p *ContentityFS) FileCount() int

func (*ContentityFS) ItemCount

func (p *ContentityFS) ItemCount() int

func (*ContentityFS) RootAbsPath

func (p *ContentityFS) RootAbsPath() string

func (*ContentityFS) RootContentity

func (p *ContentityFS) RootContentity() *RootContentity

func (*ContentityFS) Size

func (p *ContentityFS) Size() int

type ContentityStage

type ContentityStage func(*Contentity) *Contentity
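A ContentityStage is what gets passed to DoForEvery. The sketch below uses simplified stand-in types (the `doForEvery` helper and the fields on `Contentity` are illustrative, not the real API), but shows the ptr-in/ptr-out stage shape:

```go
package main

import "fmt"

// Minimal stand-ins for the real types.
type Contentity struct {
	name    string
	visited bool
}
type ContentityStage func(*Contentity) *Contentity

// doForEvery applies a stage to every node; a hypothetical
// mirror of (*ContentityFS).DoForEvery.
func doForEvery(nodes []*Contentity, stg ContentityStage) {
	for _, n := range nodes {
		stg(n)
	}
}

func main() {
	nodes := []*Contentity{{name: "a.xml"}, {name: "b.md"}}
	// A stage is just a func(*Contentity) *Contentity.
	markVisited := func(p *Contentity) *Contentity {
		p.visited = true
		return p
	}
	doForEvery(nodes, markVisited)
	fmt.Println(nodes[0].visited, nodes[1].visited)
}
```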

type Flags

type Flags int
const (
	IsRef      Flags = 1 << iota // 1 << 0 i.e. 0000 0001
	IsExtl                       // 1 << 1 i.e. 0000 0010
	IsURI                        // 1 << 2 i.e. 0000 0100
	IsKey                        // 1 << 3 i.e. 0000 1000
	IsResolved                   // 1 << 4 i.e. 0001 0000
)

func (Flags) IsSet

func (b Flags) IsSet(flag Flags) bool

func (Flags) Reset

func (b Flags) Reset(flag Flags) Flags

func (Flags) Set

func (b Flags) Set(flag Flags) Flags

func (Flags) String

func (f Flags) String() string
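Usage of the Flags bitmask looks like the sketch below. The constants are copied from above; the method bodies shown are the conventional bitmask idiom and are an assumption about the actual implementation:

```go
package main

import "fmt"

type Flags int

const (
	IsRef      Flags = 1 << iota // 0000 0001
	IsExtl                       // 0000 0010
	IsURI                        // 0000 0100
	IsKey                        // 0000 1000
	IsResolved                   // 0001 0000
)

// Conventional bitmask implementations (assumed, not verified).
func (b Flags) IsSet(flag Flags) bool  { return b&flag != 0 }
func (b Flags) Set(flag Flags) Flags   { return b | flag }
func (b Flags) Reset(flag Flags) Flags { return b &^ flag }

func main() {
	var f Flags
	f = f.Set(IsRef).Set(IsURI) // Set returns a value; reassign it
	fmt.Println(f.IsSet(IsRef), f.IsSet(IsKey))
	f = f.Reset(IsRef)
	fmt.Println(f.IsSet(IsRef))
}
```

Note that Set and Reset have value receivers and return a new Flags, so the result must be assigned back.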
type GLink struct {
	// IsRefnc - else is Refnt (Referents are much more numerous)
	IsRefnc bool
	// IsExtl - else is Intl (which are more numerous)
	IsExtl bool
	// AddressMode is "http", "key", "idref", "uri"
	AddressMode string
	// Att is the XML attribute - id, idref, href, xref, keyref, etc.
	Att string
	// Tag is the tag that has this link-related attribute of interest
	Tag string
	// Link_raw as read in during parsing
	Link_raw string
	// RelFP can be a URI or the resolution of a keyref.
	// "" if target is in same file; NOTE This is relative to the
	// sourcing file, NOT to the current working directory during parsing!
	RelFP string
	// AbsFP can be a URI or the resolution of a keyref.
	// "" if target is in same file
	AbsFP FU.AbsFilePath
	// TopicID iff present (but isn't it mandatory ?)
	TopicID string
	// FragID is peeled off from Raw
	FragID string
	// Resolved is used to narrow in on difficult cases
	Resolved bool
	// LinkedFrom is the GTag where the GLink is defined
	LinkedFrom *gtree.GTag
	// Original can be nil: it is the tag where the GLink is resolved to,
	// i.e. the REFERENT, and is quite possibly in another file, which we
	// hope we also have available in memory.
	Original *gtree.GTag
}

GLink summarizes a link (or key) (or reference) found in markup content. It is either URI-based (`href conref id`) or key-based (`key keyref`). It applies to all LwDITA formats, but not all fields apply to all LwDITA formats.

type GLinks struct {
	// OwnerP points back to the owning struct, so that
	// `GLink`s can be processed easily as simple data structures.
	OwnerP interface{}
	// KeyRefncs are outgoing key-based links/references
	KeyRefncs []*GLink // (Extl|Intl)KeyReferences
	// KeyRefnts are unique key-based definitions that are possible
	// referents (resolution targets) of same or other files' [KeyRefncs]
	KeyRefnts []*GLink // (Extl|Intl)KeyDefs
	// UriRefncs are outgoing URI-based links/references
	UriRefncs []*GLink // (Extl|Intl)UriReferences
	// UriRefnts are unique URI-based definitions that are possible
	// referents(resolution targets) of same or other files' [UriRefncs]
	UriRefnts []*GLink // (Extl|Intl)UriDefs
}

GLinks is used for (1) intra-file ref resolution, (2) inter-file ptr resolution, (3) ToC generation.
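Use (1), intra-file ref resolution, amounts to matching references against referents. The sketch below is hypothetical (it uses a cut-down GLink and a made-up `resolveKeys` helper; the real resolution logic is surely more involved):

```go
package main

import "fmt"

// Cut-down GLink: just the raw link text and the resolved flag.
type GLink struct {
	LinkRaw  string
	Resolved bool
}

// resolveKeys marks each key reference as resolved when a referent
// with the same raw key exists. Illustrative only.
func resolveKeys(refncs, refnts []*GLink) {
	defs := make(map[string]bool, len(refnts))
	for _, d := range refnts {
		defs[d.LinkRaw] = true
	}
	for _, r := range refncs {
		r.Resolved = defs[r.LinkRaw]
	}
}

func main() {
	refnts := []*GLink{{LinkRaw: "prodname"}}
	refncs := []*GLink{{LinkRaw: "prodname"}, {LinkRaw: "missing"}}
	resolveKeys(refncs, refnts)
	fmt.Println(refncs[0].Resolved, refncs[1].Resolved)
}
```

Unresolved references would then be retried against other files' KeyRefnts (use (2), inter-file resolution).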

type LL

type LL LU.Level
var LDebug, LInfo, LOkay, LWarning, LError, LPanic LL

type LinkInfo

type LinkInfo struct {
}

type LinkInfos

type LinkInfos struct {
	Conrefs  []LinkInfo
	Keyrefs  []LinkInfo
	Datarefs []LinkInfo
	// contains filtered or unexported fields
}

LinkInfos is: @conref to reuse block-level content, @keyref to reuse phrase-level content. TODO: Each type of link (i.e. elm/att where it occurs) has to be categorised. TODO: Each format of link target has to be categorised.

  • Cross ref : <xref> : <a href> : [link](/URI "title")
  • Key def : <keydef> : <div data-class="keydef"> : <div data-class="keydef"> in HDITA syntax
  • Map : <map> : <nav> : See Example of an MDITA map (20)
  • Topic ref : <topicref> : <a href> inside a <li> : [link](/URI "title") inside a list item

TODO Stuff to get:

  • XDITA map
      • topicref @href (w @format)
      • task @id
  • HDITA
      • article @id
      • span @data-keyref
      • p @data-conref
  • MDITA
      • has YAML "id"
      • uses <p @data-conref>
      • uses <span @data-keyref>
      • uses MD [link_text](link_target.dita)
      • uses ![The remote](../images/remote-control-callouts.png "The remote")
  • XDITA
      • topic @id
      • ph @keyref
      • image @href
      • p @id
      • video/source @value
      • section @id
      • p @conref

In GFile: LinkInfos:

type LogInfo

type LogInfo struct {
	W io.Writer
	// contains filtered or unexported fields
}

LogInfo exists mainly to provide a grep'able string - for example "(01:4a)". The io.Writer exists outside of the [github.com/fbaube/mlog] logging subsystem and should only be used if `mlog` is not.

func (*LogInfo) String

func (p *LogInfo) String() string

type NodeStringser

type NodeStringser interface {
	NodeEcho(int) string
	NodeInfo(int) string
	NodeDebug(int) string
	NodeCount() int
}

type RootContentity

type RootContentity Contentity

RootContentity makes assignments to/from root node explicit.

func NewRootContentity

func NewRootContentity(aRootPath string) (*RootContentity, error)

NewRootContentity returns a RootContentity Nord (i.e. node with ordered children) that can be the root of a new Contentity tree. It requires that argument aRootPath is an absolute filepath and is a directory. .

type StringTally

type StringTally map[string]int
var GlobalAttTally StringTally
var GlobalTagTally StringTally

func (StringTally) StringSortedValues

func (st StringTally) StringSortedValues() string
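StringTally is a plain map-based counter; tallying (as TallyTags presumably does for GlobalTagTally) looks like the sketch below. The `tallyTags` helper is hypothetical:

```go
package main

import "fmt"

type StringTally map[string]int

// tallyTags counts occurrences of each tag name.
// Illustrative helper; the real TallyTags walks parsed content.
func tallyTags(tags []string) StringTally {
	st := StringTally{}
	for _, t := range tags {
		st[t]++
	}
	return st
}

func main() {
	st := tallyTags([]string{"p", "p", "li"})
	fmt.Println(st["p"], st["li"])
}
```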
