contentanalysis

v0.0.0-...-3dbfd7a
Warning: This package is not in the latest version of its module.
Published: Apr 25, 2024 License: MIT Imports: 11 Imported by: 3

README

contentanalysis

Examine a content item to determine the MIME type of its content.

Documentation

Overview

Package contentanalysis is TBS.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func CollectKeysOfNonNilMapValues

func CollectKeysOfNonNilMapValues(M map[string]*CT.FilePosition) []string
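The function has no doc comment; its name suggests it returns the keys of M whose values are non-nil. A minimal sketch under that assumption follows. FilePosition is a hypothetical stand-in for CT.FilePosition, and the sort step is an addition made only so the output is deterministic (Go map iteration order is unspecified); whether the real function sorts is not documented.

```go
package main

import (
	"fmt"
	"sort"
)

// FilePosition is a hypothetical stand-in for CT.FilePosition.
type FilePosition struct {
	Pos int
}

// CollectKeysOfNonNilMapValues returns the keys of M whose values are
// non-nil. Sorting is an assumption added for deterministic output.
func CollectKeysOfNonNilMapValues(M map[string]*FilePosition) []string {
	keys := make([]string, 0, len(M))
	for k, v := range M {
		if v != nil {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	m := map[string]*FilePosition{
		"html": {Pos: 0},
		"head": nil,
		"body": {Pos: 42},
	}
	fmt.Println(CollectKeysOfNonNilMapValues(m)) // [body html]
}
```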

Types

type Doctype

type Doctype string

type MimeType

type MimeType string

type PathAnalysis

type PathAnalysis struct {
	// ContypingInfo holds simple fields:
	// FileExt, MType, MimeType(s)
	XU.ContypingInfo
	// ContentityBasics does NOT include Raw
	// (the entire input content)
	XU.ContentityBasics
	// KeyElms is: (Root,Meta,Text)ElmExtent
	// KeyElmsWithRanges
	// ContentitySections is: Text_raw, Meta_raw, MetaFormat; MetaProps SU.PropSet
	// ContentityRawSections
	// XmlInfo is: XmlPreambleFields, XmlDoctype, XmlDoctypeFields, ENTITY stuff
	// ** XmlInfo **
	// XmlContype is an enum: "Unknown", "DTD", "DTDmod", "DTDent",
	// "RootTagData", "RootTagMixedContent", "MultipleRootTags", "INVALID"
	XmlContype string
	// ParsedPreamble (XmlPreambleFields) is nil if there is no preamble;
	// it can always default to xmlutils.STD_PreambleFields (from stdlib)
	*XU.ParsedPreamble
	// ParsedDoctype (XmlDoctypeFields) is a ptr: nil if ContypingInfo.Doctype
	// is "", i.e. if there is no DOCTYPE declaration
	*XU.ParsedDoctype
	// DitaInfo
	DitaFlavor  string
	DitaContype string
}

PathAnalysis holds the results of content analysis on the contents of a non-embedded [FSItem].
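Because PathAnalysis embeds *XU.ParsedDoctype as a pointer that is nil when there is no DOCTYPE, reading one of its promoted fields on a DOCTYPE-less result would panic. The sketch below shows the nil check this implies. ParsedDoctype, its TopTag field, the analysis type, and the HasDoctype method are all hypothetical stand-ins, not the package's API.

```go
package main

import "fmt"

// ParsedDoctype is a hypothetical stand-in for XU.ParsedDoctype.
type ParsedDoctype struct {
	TopTag string
}

// analysis mimics how PathAnalysis embeds a *ParsedDoctype.
type analysis struct {
	XmlContype string
	*ParsedDoctype
}

// HasDoctype reports whether a DOCTYPE declaration was parsed. Reading a
// promoted field such as a.TopTag while the embedded pointer is nil would
// panic, so callers should check this first.
func (a analysis) HasDoctype() bool {
	return a.ParsedDoctype != nil
}

func main() {
	plain := analysis{XmlContype: "Unknown"}
	withDT := analysis{
		XmlContype:    "RootTagMixedContent",
		ParsedDoctype: &ParsedDoctype{TopTag: "html"},
	}
	fmt.Println(plain.HasDoctype(), withDT.HasDoctype()) // false true
	if withDT.HasDoctype() {
		fmt.Println(withDT.TopTag) // html
	}
}
```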

func NewPathAnalysis

func NewPathAnalysis(pFSI *FU.FSItem) (*PathAnalysis, error)

NewPathAnalysis is called only by NewContentityRecord(..). It handles XML content very differently from non-XML content, and most of the function consists of checks for the presence of XML. When a file is identified as XML, much more information is available, so processing becomes both simpler and more complicated.

Binary content is tagged as such and is not examined further. So the basic top-level classification of content is:

  • Binary
  • XML (when a DOCTYPE is detected)
  • Everything else (incl. plain text, Markdown, and XML/HTML that lacks DOCTYPE)
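The three-way split above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: sniffedMIME is assumed to come from a MIME sniffer, any type outside "text/" is treated as binary, and "a DOCTYPE is detected" is approximated by a case-insensitive search for "<!DOCTYPE". The ContentClass type and classify function are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// ContentClass is a hypothetical label for the top-level classification.
type ContentClass string

const (
	ClassBinary ContentClass = "Binary"
	ClassXML    ContentClass = "XML"
	ClassOther  ContentClass = "Other"
)

// classify assumes sniffedMIME is already known and that anything not
// "text/*" is binary. XML is reported only when a DOCTYPE is present;
// XML or HTML without one falls into ClassOther.
func classify(content, sniffedMIME string) ContentClass {
	if !strings.HasPrefix(sniffedMIME, "text/") {
		return ClassBinary
	}
	if strings.Contains(strings.ToUpper(content), "<!DOCTYPE") {
		return ClassXML
	}
	return ClassOther
}

func main() {
	fmt.Println(classify("\x89PNG\r\n\x1a\n", "image/png"))                             // Binary
	fmt.Println(classify("<!doctype html><html></html>", "text/html; charset=utf-8"))   // XML
	fmt.Println(classify("# Title\n\nSome Markdown.", "text/plain; charset=utf-8"))     // Other
}
```

A real detector would likely inspect only the prolog rather than search the whole content; the substring search here keeps the sketch short.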

If the argument is "dirlike" (dir, symlink, etc.), then NewPathAnalysis returns (nil, nil).

If the item's content is shorter than six bytes, NewPathAnalysis returns (nil, nil) to indicate that there is not enough content to do anything productive or informative.
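The two early exits described above, both returning (nil, nil), can be sketched like this. fsItem is a hypothetical stand-in for FU.FSItem (its IsDirlike and Content fields are assumed), pathAnalysis is a placeholder result type, and the order of the two checks is a guess; the real analysis that follows them is omitted.

```go
package main

import "fmt"

// fsItem is a hypothetical stand-in for FU.FSItem.
type fsItem struct {
	IsDirlike bool   // dir, symlink, etc.
	Content   []byte // the item's content
}

// pathAnalysis is a placeholder for the real result type.
type pathAnalysis struct {
	Size int
}

const minContentLen = 6

// newPathAnalysis sketches only the documented early exits.
func newPathAnalysis(p *fsItem) (*pathAnalysis, error) {
	if p.IsDirlike {
		return nil, nil // nothing to analyse
	}
	if len(p.Content) < minContentLen {
		return nil, nil // too little content to be informative
	}
	// ... the binary / XML / other analysis would go here ...
	return &pathAnalysis{Size: len(p.Content)}, nil
}

func main() {
	for _, it := range []*fsItem{
		{IsDirlike: true},
		{Content: []byte("hi")},
		{Content: []byte("<!DOCTYPE html>")},
	} {
		pa, err := newPathAnalysis(it)
		fmt.Println(pa != nil, err)
	}
}
```

Returning (nil, nil) rather than an error lets callers treat "nothing worth analysing" as a normal outcome instead of a failure.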

func (*PathAnalysis) DoAnalysis_bin

func (pAR *PathAnalysis) DoAnalysis_bin() error

DoAnalysis_bin does no further processing for binary content, because we trust that the sniffed MIME type is sufficient; it simply returns.
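The page does not say which sniffer produces the MIME type. As one possibility, Go's standard library offers net/http.DetectContentType, which examines at most the first 512 bytes. The sketch below uses it and treats every non-"text/" result as binary; both the choice of sniffer and that rule are assumptions, and sniff is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// sniff returns the sniffed MIME type and whether it looks binary.
// Treating every non-"text/" type as binary is an assumption.
func sniff(content []byte) (mimeType string, isBinary bool) {
	mimeType = http.DetectContentType(content) // uses at most 512 bytes
	return mimeType, !strings.HasPrefix(mimeType, "text/")
}

func main() {
	png := []byte("\x89PNG\r\n\x1a\n")
	fmt.Println(sniff(png))                                  // image/png true
	fmt.Println(sniff([]byte("<?xml version=\"1.0\"?><a/>"))) // text/xml; charset=utf-8 false
	fmt.Println(sniff([]byte("plain words")))                // text/plain; charset=utf-8 false
}
```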

func (*PathAnalysis) DoAnalysis_sch

func (pAR *PathAnalysis) DoAnalysis_sch() error

DoAnalysis_sch will handle DTDs and related files. That code is mostly written but not yet integrated, so for now this function does not really deal with them.
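The XmlContype enum already reserves "DTD", "DTDmod", and "DTDent" for schema-related files. One simple way to assign them is by file extension, sketched below; schemaContype is hypothetical, and the .dtd/.mod/.ent mapping is an assumption (it matches how DITA distributes DTD modules and entity files), not the package's confirmed logic.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// schemaContype maps a file extension to one of the DTD-related
// XmlContype values. The mapping itself is an assumption.
func schemaContype(path string) string {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".dtd":
		return "DTD"
	case ".mod":
		return "DTDmod"
	case ".ent":
		return "DTDent"
	default:
		return "Unknown"
	}
}

func main() {
	for _, p := range []string{"topic.dtd", "highlightDomain.mod", "isonum.ENT", "notes.txt"} {
		fmt.Println(p, schemaContype(p))
	}
}
```

Extension checks are cheap but easy to fool; content inspection (for example, looking for `<!ELEMENT` or `<!ENTITY` declarations) would be a more robust follow-up.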

func (*PathAnalysis) DoAnalysis_txt

func (pAR *PathAnalysis) DoAnalysis_txt(sCont string) error

DoAnalysis_txt is called when the content is identified as non-XML. It does not expect to see binary content.
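Since DoAnalysis_txt does not expect binary content, a caller (or the function itself) could guard its input. The hypothetical checkTextInput below rejects content that is not valid UTF-8 or that contains a NUL byte, two common binary tells. This is an illustrative precondition check, not the package's behavior; note that it would also reject legitimate non-UTF-8 text such as Latin-1.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

var errLooksBinary = errors.New("content looks binary; DoAnalysis_txt expects text")

// checkTextInput is a hypothetical guard for text-only processing.
func checkTextInput(sCont string) error {
	if !utf8.ValidString(sCont) || strings.IndexByte(sCont, 0) >= 0 {
		return errLooksBinary
	}
	return nil
}

func main() {
	fmt.Println(checkTextInput("# Heading\nSome text.")) // <nil>
	fmt.Println(checkTextInput("abc\x00def") != nil)      // true
}
```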

func (*PathAnalysis) DoAnalysis_xml

func (pAR *PathAnalysis) DoAnalysis_xml(pXP *XU.XmlPeek, sCont string) error

func (PathAnalysis) IsXML

func (p PathAnalysis) IsXML() bool

IsXML is true for all XML, including all HTML.

func (PathAnalysis) MarkupType

func (p PathAnalysis) MarkupType() SU.MarkupType

MarkupType returns an enum with values of SU.MU_type_*
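IsXML and MarkupType are documented together: IsXML is true for all XML including HTML, and MarkupType returns one of the SU.MU_type_* values. The sketch below shows one way IsXML could be derived from such an enum. The MarkupType type here and its constants (MarkupXML, MarkupHTML, MarkupMarkdown, MarkupBinary) are hypothetical stand-ins; the real SU.MU_type_* names and values are not listed on this page.

```go
package main

import "fmt"

// MarkupType is a hypothetical stand-in for SU.MarkupType.
type MarkupType string

const (
	MarkupXML      MarkupType = "XML"
	MarkupHTML     MarkupType = "HTML"
	MarkupMarkdown MarkupType = "Markdown"
	MarkupBinary   MarkupType = "Binary"
)

// analysis is a placeholder for PathAnalysis.
type analysis struct{ mu MarkupType }

// MarkupType returns the stored enum value.
func (p analysis) MarkupType() MarkupType { return p.mu }

// IsXML is true for all XML, including all HTML.
func (p analysis) IsXML() bool {
	switch p.MarkupType() {
	case MarkupXML, MarkupHTML:
		return true
	}
	return false
}

func main() {
	for _, mu := range []MarkupType{MarkupXML, MarkupHTML, MarkupMarkdown, MarkupBinary} {
		fmt.Println(mu, analysis{mu: mu}.IsXML())
	}
}
```

Deriving IsXML from MarkupType keeps a single source of truth, so adding a new XML-family markup type means updating only one switch.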

func (*PathAnalysis) String

func (p *PathAnalysis) String() string
