Documentation ¶
Overview ¶
Package mcfile defines a per-file structure [MCFile] that holds all relevant per-file information. This includes:
- file path info
- file content (UTF-8, of course)
- file type information (MIME and more)
- the results of markup-specific file analysis (in the most analysable case, i.e. XML, this comprises tokens, gtokens, gelms, gtree)
For a discussion of tree walk functions, see `doc_wfn.go`
Note that if we do not get an explicit XML DOCTYPE declaration, there is some educated guesswork required.
The first workflow was based on XML, and comprises: `text => XML tokens => GTokens => GTags => GTree`
First, package `gparse` gets as far as the `GToken`s, which can only be in a list: they have no tree structure. Then package `gtree` handles the rest.
XML analysis starts off with tokenization (by the stdlib), so it makes sense to then have separate steps for making `GToken`s, `GTag`s, and the `GTree`. <br/> MKDN and HTML analyses use higher-level libraries that deliver CSTs (Concrete Syntax Trees, i.e. parse trees). We choose to do this processing in `package gparse` rather than in `package gtree`.
MKDN gets a tree of `yuin/goldmark/ast.Node`, and HTML gets a tree of stdlib `golang.org/x/net/html.Node`. Since a CST is delivered fully-formed, it makes sense to have Step 1 attach to each node its `GToken` and `GTag`, and then Step 2 build a `GTree`.
There are three major types of `MCFile`, corresponding to how we process the file content:
- "XML"
  - (§1) Use stdlib `encoding/xml` to get `[]XU.XToken`
  - (§1) Convert `[]XU.XToken` to `[]gparse.GToken`
  - (§2) Build `GTree`
- "MKDN"
  - (§1) Use `yuin/goldmark` to get a tree of `yuin/goldmark/ast.Node`
  - (§1) From each Node make a `MkdnToken` (in a list?) incl. `GToken` and `GTag`
  - (§2) Build `GTree`
- "HTML"
  - (§1) Use `golang.org/x/net/html` to get a tree of `html.Node`
  - (§1) From each Node make a `HtmlToken` (in a list?) incl. `GToken` and `GTag`
  - (§2) Build `GTree`
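The first (§1) step of the "XML" path can be sketched using only the stdlib tokenizer. This is a minimal illustration, not the package's real code: `tokenizeXML` is a hypothetical name, and the real pipeline goes on to convert these tokens to `gparse.GToken`s.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// tokenizeXML drains the stdlib tokenizer into a flat slice,
// mirroring step §1 of the "XML" workflow (text => XML tokens).
func tokenizeXML(s string) ([]xml.Token, error) {
	d := xml.NewDecoder(strings.NewReader(s))
	var toks []xml.Token
	for {
		t, err := d.Token()
		if err == io.EOF {
			return toks, nil
		}
		if err != nil {
			return nil, err
		}
		// The decoder may reuse a token's memory;
		// CopyToken makes it safe to keep.
		toks = append(toks, xml.CopyToken(t))
	}
}

func main() {
	toks, _ := tokenizeXML(`<p id="x">hi</p>`)
	for _, t := range toks {
		fmt.Printf("%T\n", t) // StartElement, CharData, EndElement
	}
}
```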
In general, all go files in this protocol stack should be organised as:
- struct definition
- constructors (named `New*`)
- printf stuff (`Raw()`, `Echo()`, `String()`)
Some characteristic methods:
- `Raw()` returns the original string passed from the golang XML parser (with whitespace trimmed)
- `Echo()` returns a string of the item in normalised form, altho be aware that the presence of terminating newlines is not treated uniformly
- `String()` returns a string suitable for runtime monitoring and debugging
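The three-method convention can be sketched with a hypothetical minimal `item` type; the trimming/normalisation rules shown are illustrative only, not the package's exact ones.

```go
package main

import (
	"fmt"
	"strings"
)

// item is a hypothetical minimal type following the
// Raw()/Echo()/String() convention described above.
type item struct{ raw string }

// Raw returns the original string, whitespace-trimmed.
func (it item) Raw() string { return strings.TrimSpace(it.raw) }

// Echo returns a normalised form (here: inner whitespace collapsed).
func (it item) Echo() string { return strings.Join(strings.Fields(it.raw), " ") }

// String returns a rendering suitable for runtime monitoring.
func (it item) String() string { return fmt.Sprintf("item[%q]", it.Echo()) }

func main() {
	it := item{raw: "  hello   world \n"}
	fmt.Println(it.Raw())  // hello   world
	fmt.Println(it.Echo()) // hello world
	fmt.Println(it)        // item["hello world"]
}
```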
NOTE The use of shorthand in variable names: Doc, Elm, Att.
NOTE We use `godoc2md`, so we can use Markdown in these code comments.
Index ¶
- Variables
- func AddInXName(ElmT StringTally, AttT StringTally, gT *gtoken.GToken)
- func DumpGElm(p AST.Node) string
- func KidsAsSlice(p AST.Node) []AST.Node
- func ListKids(p AST.Node) string
- func NormalizeTextLeaves(rootNode AST.Node)
- type Contentity
- func (p *Contentity) DoBlockList() *Contentity
- func (p *Contentity) DoEntitiesList() error
- func (p *Contentity) DoGLinks() *Contentity
- func (p *Contentity) DoTableOfContents() *Contentity
- func (p *Contentity) DoValidation(pXCF *XU.XmlCatalogFile) (dtdS string, docS string, errS string)
- func (p *Contentity) ExecuteStages() *Contentity
- func (p *Contentity) GatherLinks() error
- func (p *Contentity) GatherXmlGLinks() *Contentity
- func (p *Contentity) IsDir() bool
- func (p *Contentity) IsDirlike() bool
- func (p *Contentity) L(level LL, format string, a ...interface{})
- func (p *Contentity) LogPrefix(mid string) string
- func (p *Contentity) LogTextQuote(level LL, textquote string, format string, a ...interface{})
- func (p *Contentity) NewEntitiesList() (gEnts map[string]*gparse.GEnt, err error)
- func (p *Contentity) ProcessEntities_() error
- func (p *Contentity) RefineDirectives() error
- func (p *Contentity) SetError(s string)
- func (p Contentity) String() string
- func (p *Contentity) SubstituteEntities() error
- func (p *Contentity) TallyTags()
- func (p *Contentity) WrapError(s string, e error)
- type ContentityEngine
- type ContentityError
- type ContentityFS
- func (p *ContentityFS) AsSlice() []*Contentity
- func (p *ContentityFS) DirCount() int
- func (p *ContentityFS) DoForEvery(stgprocsr ContentityStage)
- func (p *ContentityFS) FileCount() int
- func (p *ContentityFS) ItemCount() int
- func (p *ContentityFS) RootAbsPath() string
- func (p *ContentityFS) RootContentity() *RootContentity
- func (p *ContentityFS) Size() int
- type ContentityStage
- type Flags
- type GLink
- type GLinks
- type LL
- type LinkInfo
- type LinkInfos
- type LogInfo
- type NodeStringser
- type RootContentity
- type StringTally
Constants ¶
This section is empty.
Variables ¶
var GlobalAttCount int
var GlobalTagCount int
var LwDitaAttsForGLinks = []string{
"name",
"href",
"id",
"idref",
"idrefs",
"conref",
"data-conref",
"keys",
"data-keys",
"keyref",
"data-keyref",
}
Functions ¶
func AddInXName ¶
func AddInXName(ElmT StringTally, AttT StringTally, gT *gtoken.GToken)
func NormalizeTextLeaves ¶
func NormalizeTextLeaves(rootNode AST.Node)
Types ¶
type Contentity ¶
    type Contentity struct {
        ON.Nord
        MU.Errer
        LogInfo
        // ContentityRow is what gets persisted to the DB (and has Raw)
        m5db.ContentityRow
        // ParserResults is parseutils.ParserResults_ffs
        // (ffs = file format -specific)
        ParserResults interface{}
        GTokens       []*gtoken.GToken
        GTags         []*gtree.GTag
        *gtree.GTree  // maybe not need GRootTag or RootOfASTptr
        GTknsWriter, GTreeWriter, GEchoWriter io.Writer
        GLinks
        // GEnts is "ENTITY" directives (both with "%" and without).
        GEnts map[string]*gparse.GEnt
        // DElms is "ELEMENT" directives.
        DElms map[string]*gtree.GTag
        TagTally StringTally
        AttTally StringTally
        // contains filtered or unexported fields
    }
Contentity is the central per-file (and per-directory) structure of this package.
func NewContentity ¶
func NewContentity(aPath string) (*Contentity, error)
NewContentity returns a Contentity Nord (i.e. node with ordered children) that can NOT be the root of a Contentity tree.
NOTE: because of interface hassles, BOTH return values might be non-nil, in which case, ignore the error.
We want everything to be in a nice tree of Nords, and that means that we have to create Contentities for directories too (where MarkupType == SU.MU_type_DIRLIKE).
When this func is called while walking a DIRECTORY given on the command line, aPath is a simple file (or dir) name, with no path separators.
When this func is called for a FILE given on the command line, aPath can be either absolute or relative, depending on what was on the CLI (altho probably a relFP has been upgraded to an absFP).
Alternative hack to achieve a similar end (a sketch, not compiling code):

    if pPP, e := NewPP(path); e == nil {
        if pPA, e := NewPA(pPP); e == nil {
            if pCR, e := NewCR(pPA); e == nil { ... }
        }
    }
func (*Contentity) DoBlockList ¶
func (p *Contentity) DoBlockList() *Contentity
DoBlockList makes a list of all the nodes that are blocks, so that they can be traversed for rendering, and targeted for references.
func (*Contentity) DoEntitiesList ¶
func (p *Contentity) DoEntitiesList() error
DoEntitiesList collects all entity definitions. Note that each Token has been normalized:

    rtType:ENTITY string1:foo string2:"FOO" entityIsParameter:false
    rtType:ENTITY string1:bar string2:"BAR" entityIsParameter:true
func (*Contentity) DoTableOfContents ¶
func (p *Contentity) DoTableOfContents() *Contentity
DoTableOfContents makes a ToC.
func (*Contentity) DoValidation ¶
func (p *Contentity) DoValidation(pXCF *XU.XmlCatalogFile) (dtdS string, docS string, errS string)
DoValidation validates the document. TODO: if there is no DOCTYPE declaration, make a guess based on the file extension (Filext), but a failed guess can't be fatal.
func (*Contentity) ExecuteStages ¶
func (p *Contentity) ExecuteStages() *Contentity
ExecuteStages processes a Contentity to completion in an isolated thread, and can easily be converted to run as a goroutine. Summary:
- st0_Init()
- st1_Read()
- st2_Tree()
- st3_Refs()
- st4_Done() (not currently called, but will work on all input files at once!)
An interesting question is: how can we indicate an error and terminate a thread prematurely? The method currently chosen is to use the interface github.com/fbaube/miscutils/Errer. This has to be checked for at the start of each func, but then we can chain functions by writing them left-to-right. Winning!
(If functions accept and return a ptr+error pair then they chain right-to-left, which is a big fail for readability.)
We could also pass in a `Context` and use its cancellation capability. Yet another way might be simply to `panic`, and this function already has code to catch panics.
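The latched-error pattern described above can be sketched as follows. The names (`ent`, `stInit`, etc.) are stand-ins, not the real `miscutils` Errer API: each stage first checks the stored error and no-ops if a prior stage failed, which is what lets stages chain left-to-right.

```go
package main

import (
	"errors"
	"fmt"
)

// ent carries a latched error so that stage methods can chain
// left-to-right. (Hypothetical type; the real Contentity embeds MU.Errer.)
type ent struct {
	name string
	err  error
}

func (p *ent) stInit() *ent {
	if p.err != nil {
		return p // a prior stage failed: no-op
	}
	return p
}

func (p *ent) stRead() *ent {
	if p.err != nil {
		return p
	}
	if p.name == "" {
		p.err = errors.New("stRead: empty name")
	}
	return p
}

func (p *ent) stTree() *ent {
	if p.err != nil {
		return p
	}
	return p
}

func main() {
	good := (&ent{name: "doc.xml"}).stInit().stRead().stTree()
	bad := (&ent{}).stInit().stRead().stTree()
	fmt.Println(good.err, bad.err)
}
```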
func (*Contentity) GatherLinks ¶
func (p *Contentity) GatherLinks() error
GatherLinks gathers links: `@conref` to reuse block-level content, `@keyref` to reuse phrase-level content. TODO: Each type of link (i.e. the elm/att where it occurs) has to be categorised. TODO: Each format of link target has to be categorised.

| | XDITA | HDITA | MDITA |
|---|---|---|---|
| Cross ref | `<xref>` | `<a href>` | `[link](/URI "title")` |
| Key def | `<keydef>` | `<div data-class="keydef">` | `<div data-class="keydef">` in HDITA syntax |
| Map | `<map>` | `<nav>` | See Example of an MDITA map (20) |
| Topic ref | `<topicref>` | `<a href>` inside a `<li>` | `[link](/URI "title")` inside a list item |

TODO Stuff to get:
- XDITA map: `topicref @href` (w `@format`), `task @id`
- HDITA: `article @id`, `span @data-keyref`, `p @data-conref`
- MDITA: has YAML "id"; uses `<p @data-conref>`, `<span @data-keyref>`, MD `[link_text](link_target.dita)`, and `![The remote](../images/remote-control-callouts.png "The remote")`
- XDITA: `topic @id`, `ph @keyref`, `image @href`, `p @id`, `video/source @value`, `section @id`, `p @conref`
func (*Contentity) GatherXmlGLinks ¶
func (p *Contentity) GatherXmlGLinks() *Contentity
GatherXmlGLinks gathers XmlItems: (DOCS) IDs & IDREFs, (DTDs) Elm defs (incl. Att defs) & Ent defs (`*xmlfile.XmlItems // *IDinfo`).
func (*Contentity) IsDir ¶
func (p *Contentity) IsDir() bool
func (*Contentity) IsDirlike ¶
func (p *Contentity) IsDirlike() bool
func (*Contentity) L ¶
func (p *Contentity) L(level LL, format string, a ...interface{})
func (*Contentity) LogPrefix ¶
func (p *Contentity) LogPrefix(mid string) string
func (*Contentity) LogTextQuote ¶
func (p *Contentity) LogTextQuote(level LL, textquote string, format string, a ...interface{})
func (*Contentity) NewEntitiesList ¶
func (p *Contentity) NewEntitiesList() (gEnts map[string]*gparse.GEnt, err error)
NewEntitiesList collects all entity definitions. Note that each Token is normalized:

    rtType:ENTITY string1:foo string2:"FOO" entityIsParameter:false
    rtType:ENTITY string1:bar string2:"BAR" entityIsParameter:true

Called by ProcessEntities only.
func (*Contentity) ProcessEntities_ ¶
func (p *Contentity) ProcessEntities_() error
func (*Contentity) RefineDirectives ¶
func (p *Contentity) RefineDirectives() error
RefineDirectives scans to patch Directives with correct keyword.
func (*Contentity) SetError ¶
func (p *Contentity) SetError(s string)
func (Contentity) String ¶
func (p Contentity) String() string
String is developer output. Hafta dump: FU.InputFile, FU.OutputFiles, GTree, GRefs, *XmlFileMeta, *XmlItems, *DitaInfo
func (*Contentity) SubstituteEntities ¶
func (p *Contentity) SubstituteEntities() error
SubstituteEntities does replacement in Entities for simple (single-token) entity references, i.e. that begin with "%" or "&".
func (*Contentity) TallyTags ¶
func (p *Contentity) TallyTags()
func (*Contentity) WrapError ¶
func (p *Contentity) WrapError(s string, e error)
type ContentityEngine ¶
type ContentityEngine struct {
// contains filtered or unexported fields
}
ContentityEngine tracks the (oops, global) state of a ContentityFS tree being assembled, for example when a directory is specified for recursive analysis.
FIXME: should ID assignment be offloaded to the DB?
var CntyEng *ContentityEngine = new(ContentityEngine)
CntyEng is a package global, which is dodgy and not re-entrant. The solution probably involves currying.
NOTE: Is the call to new(..) unnecessary? This variable should NOT be reinitialized for every new ContentityFS.
type ContentityError ¶
    type ContentityError struct {
        PE fs.PathError
        *Contentity
    }
ContentityError is a Contentity plus a SrcLoc (in source code) plus an `fs.PathError` (i.e. `struct { Op, Path string; Err error }`).

Maybe use the format pkg.filename.methodname.Lnn

In code where package `mcfile` is not available, try a `fileutils.PathPropsError`.
func NewContentityError ¶
func NewContentityError(ermsg string, op string, cty *Contentity) ContentityError
func WrapAsContentityError ¶
func WrapAsContentityError(e error, op string, cty *Contentity) ContentityError
func (ContentityError) Error ¶
func (ce ContentityError) Error() string
func (*ContentityError) String ¶
func (ce *ContentityError) String() string
type ContentityFS ¶
    type ContentityFS struct {
        // FS will be set from func [os.DirFS]
        fs.FS
        // contains filtered or unexported fields
    }
ContentityFS is an instance of an fs.FS where every node is an mcfile.Contentity.
Note that directories ARE included in the tree, because the instances of [orderednodes.Nord] in each Contentity must properly interconnect in forming a complete tree.
Note that the file system is stored as a tree AND as a slice AND as a map. If any of these is modified without also modifying the others to match, there WILL be problems. For that reason, we use unexported instance variables that are accessible only via getters.
It ain't bulletproof tho. And in any case, users of a ContentityFS should feel free to use the functions on the embedded ordered nodes ("Nords").
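The tree-AND-slice-AND-map bookkeeping described above is easiest to keep consistent by funnelling all mutation through a single registrar func. A sketch with hypothetical names (`miniFS` and `node` are stand-ins, not the real ContentityFS API):

```go
package main

import "fmt"

// node stands in for a Contentity. The real ContentityFS stores its
// nodes as a tree AND a slice AND a map, so all mutation should go
// through one registrar to keep the three views consistent.
type node struct {
	path string
	kids []*node
}

type miniFS struct {
	root   *node
	slice  []*node          // traversal order
	byPath map[string]*node // fast lookup
}

// add is the single choke point: it wires the tree link and updates
// the slice and the map in one step.
func (fs *miniFS) add(parent *node, path string) *node {
	n := &node{path: path}
	if parent == nil {
		fs.root = n
	} else {
		parent.kids = append(parent.kids, n)
	}
	fs.slice = append(fs.slice, n)
	fs.byPath[path] = n
	return n
}

func main() {
	fs := &miniFS{byPath: map[string]*node{}}
	root := fs.add(nil, "/docs")
	fs.add(root, "/docs/a.xml")
	fs.add(root, "/docs/b.md")
	fmt.Println(len(fs.slice), len(fs.byPath), len(fs.root.kids)) // 3 3 2
}
```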
var CntyFS *ContentityFS
CntyFS is a global, which is a mistake.
func NewContentityFS ¶
func NewContentityFS(aPath string, okayFilexts []string) *ContentityFS
(OBS?) NewContentityFS takes an absolute filepath. Passing in a relative filepath is going to cause major problems.
func (*ContentityFS) AsSlice ¶
func (p *ContentityFS) AsSlice() []*Contentity
func (*ContentityFS) DirCount ¶
func (p *ContentityFS) DirCount() int
func (*ContentityFS) DoForEvery ¶
func (p *ContentityFS) DoForEvery(stgprocsr ContentityStage)
func (*ContentityFS) FileCount ¶
func (p *ContentityFS) FileCount() int
func (*ContentityFS) ItemCount ¶
func (p *ContentityFS) ItemCount() int
func (*ContentityFS) RootAbsPath ¶
func (p *ContentityFS) RootAbsPath() string
func (*ContentityFS) RootContentity ¶
func (p *ContentityFS) RootContentity() *RootContentity
func (*ContentityFS) Size ¶
func (p *ContentityFS) Size() int
type ContentityStage ¶
type ContentityStage func(*Contentity) *Contentity
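Since a ContentityStage is just a func from node to node, a pipeline of stages is plain function composition. A sketch with stand-in types (`item`, `stage`, `chain`, and `bump` are illustrative names, not part of this package):

```go
package main

import "fmt"

// stage mirrors the shape of ContentityStage: a func from node to
// node, so a pipeline is just function composition.
type item struct{ tags int }
type stage func(*item) *item

// chain folds stages into one stage, applied left-to-right.
func chain(stages ...stage) stage {
	return func(p *item) *item {
		for _, s := range stages {
			p = s(p)
		}
		return p
	}
}

// bump is a trivial stage for demonstration.
func bump(p *item) *item {
	p.tags++
	return p
}

func main() {
	all := chain(bump, bump, bump)
	fmt.Println(all(&item{}).tags) // 3
}
```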
type GLink ¶
    type GLink struct {
        // IsRefnc - else is Refnt (Referents are much more numerous)
        IsRefnc bool
        // IsExtl - else is Intl (which are more numerous)
        IsExtl bool
        // AddressMode is "http", "key", "idref", "uri"
        AddressMode string
        // Att is the XML attribute - id, idref, href, xref, keyref, etc.
        Att string
        // Tag is the tag that has this link-related attribute of interest
        Tag string
        // Link_raw as read in during parsing
        Link_raw string
        // RelFP can be a URI or the resolution of a keyref.
        // "" if target is in same file; NOTE This is relative to the
        // sourcing file, NOT to the current working directory during parsing!
        RelFP string
        // AbsFP can be a URI or the resolution of a keyref.
        // "" if target is in same file
        AbsFP FU.AbsFilePath
        // TopicID iff present (but isn't it mandatory?)
        TopicID string
        // FragID is peeled off from Raw
        FragID string
        // Resolved is used to narrow in on difficult cases
        Resolved bool
        // LinkedFrom is the GTag where the GLink is defined
        LinkedFrom *gtree.GTag
        // Original can be nil: it is the tag where the GLink is resolved to,
        // i.e. the REFERENT, and is quite possibly in another file, which we
        // hope we also have available in memory.
        Original *gtree.GTag
    }
GLink summarizes a link (or key) (or reference) found in markup content. It is either URI-based (`href conref id`) or key-based (`key keyref`). It applies to all LwDITA formats, but not all fields apply to all LwDITA formats.
type GLinks ¶
    type GLinks struct {
        // OwnerP points back to the owning struct, so that
        // `GLink`s can be processed easily as simple data structures.
        OwnerP interface{}
        // KeyRefncs are outgoing key-based links/references
        KeyRefncs []*GLink // (Extl|Intl)KeyReferences
        // KeyRefnts are unique key-based definitions that are possible
        // referents (resolution targets) of same or other files' [KeyRefncs]
        KeyRefnts []*GLink // (Extl|Intl)KeyDefs
        // UriRefncs are outgoing URI-based links/references
        UriRefncs []*GLink // (Extl|Intl)UriReferences
        // UriRefnts are unique URI-based definitions that are possible
        // referents (resolution targets) of same or other files' [UriRefncs]
        UriRefnts []*GLink // (Extl|Intl)UriDefs
    }
GLinks is used for (1) intra-file ref resolution, (2) inter-file ptr resolution, (3) ToC generation.
type LinkInfos ¶
    type LinkInfos struct {
        Conrefs  []LinkInfo
        Keyrefs  []LinkInfo
        Datarefs []LinkInfo
        // contains filtered or unexported fields
    }
LinkInfos records: `@conref` to reuse block-level content, `@keyref` to reuse phrase-level content. TODO: Each type of link (i.e. the elm/att where it occurs) has to be categorised. TODO: Each format of link target has to be categorised.

| | XDITA | HDITA | MDITA |
|---|---|---|---|
| Cross ref | `<xref>` | `<a href>` | `[link](/URI "title")` |
| Key def | `<keydef>` | `<div data-class="keydef">` | `<div data-class="keydef">` in HDITA syntax |
| Map | `<map>` | `<nav>` | See Example of an MDITA map (20) |
| Topic ref | `<topicref>` | `<a href>` inside a `<li>` | `[link](/URI "title")` inside a list item |

TODO Stuff to get:
- XDITA map: `topicref @href` (w `@format`), `task @id`
- HDITA: `article @id`, `span @data-keyref`, `p @data-conref`
- MDITA: has YAML "id"; uses `<p @data-conref>`, `<span @data-keyref>`, MD `[link_text](link_target.dita)`, and `![The remote](../images/remote-control-callouts.png "The remote")`
- XDITA: `topic @id`, `ph @keyref`, `image @href`, `p @id`, `video/source @value`, `section @id`, `p @conref`
In GFile: LinkInfos:
type LogInfo ¶
LogInfo exists mainly to provide a grep'able string - for example "(01:4a)". The io.Writer exists outside of the [github.com/fbaube/mlog] logging subsystem and should only be used if `mlog` is not.
type NodeStringser ¶
type RootContentity ¶
type RootContentity Contentity
RootContentity makes assignments to/from root node explicit.
func NewRootContentity ¶
func NewRootContentity(aRootPath string) (*RootContentity, error)
NewRootContentity returns a RootContentity Nord (i.e. a node with ordered children) that can be the root of a new Contentity tree. It requires that argument aRootPath is an absolute filepath and is a directory.
type StringTally ¶
var GlobalAttTally StringTally
var GlobalTagTally StringTally
func (StringTally) StringSortedValues ¶
func (st StringTally) StringSortedValues() string
Source Files ¶
- contentity.go
- contentity_new.go
- contentity_newroot.go
- contentityengine.go
- contentityerror.go
- contentityfs.go
- contentityfs_new.go
- contentityfswalker.go
- doc.go
- doc_wfn.go
- getglinks-mkdn.go
- getglinks-xml.go
- glink.go
- log.go
- mkdn-textleaves.go
- nodestringser.go
- pathexclusions.go
- seterror.go
- st-exec.go
- st0-init.go
- st1-read.go
- st2-tree.go
- st3-refs.go
- st4-done.go
- tallytags.go
- utils-mkdn.go
- validation.go
- xmldoentities.go
- xmlprocentities.go
- xmlprocmeta.go