epub

Version: v0.3.0
Published: Feb 26, 2023 License: BSD-2-Clause Imports: 9 Imported by: 3

README

EPUB

epub package provides a way to retrieve stored metadata from epub files.

The package also ships a minimal command-line tool that prints the metadata of a given epub file to standard output.

INSTALLATION

The standard Go commands (build, get, install...) should work as usual.

To install the metadata reading utility, run go install ./cmd/epub.

USAGE

Running godoc should give you helpful guidelines on available features.

Using the metadata reading utility is straightforward: type epub <epub>, where <epub> is the path to the epub file you want to read metadata from.

CONTRIBUTION

If you would like to contribute, follow the GitHub guidelines on forking, then send a pull request.

Documentation

Overview

Package epub provides a way to retrieve stored metadata from epub files.

Constants

This section is empty.

Variables

var (
	// ErrStopWalk is used as a return value from WalkFunc to
	// indicate that the Walkxxx operation needs to be
	// stopped. It is not returned as an error by any Walkxxx
	// function.
	ErrStopWalk = errors.New("stop walk")
)

Functions

func WalkFiles added in v0.2.0

func WalkFiles(path string, walkFn WalkFunc) error

WalkFiles walks EPUB's files, calling walkFn for each visited resource.

func WalkPublicationResources added in v0.2.0

func WalkPublicationResources(path string, walkFn WalkFunc) error

WalkPublicationResources walks the EPUB's publication resources as listed in the EPUB's Manifest, calling walkFn for each visited resource. Limitation: resources that do not belong to the EPUB archive itself (like remote resources) are silently ignored.

func WalkReadingContent added in v0.2.0

func WalkReadingContent(path string, walkFn WalkFunc) error

WalkReadingContent walks the EPUB's publication resources as listed in the EPUB's Spine, calling walkFn for each visited resource. Limitation: resources that do not belong to the EPUB archive itself (like remote resources) are silently ignored.
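
The relationship between Spine and Manifest that this function relies on can be illustrated with a small standalone sketch. It does not use the package itself: item, itemref and readingOrder are hypothetical local names, and the rule for skipping remote resources is an assumption about one reasonable way to do it, not a description of the package's internals.

```go
package main

import (
	"fmt"
	"net/url"
)

// item and itemref are simplified local stand-ins for the package's
// Item and Itemref types (only the fields needed here).
type item struct{ ID, Href string }
type itemref struct{ IDref string }

// readingOrder (hypothetical helper) returns the hrefs of the manifest
// items referenced by the spine, in spine order. Unknown idrefs and
// hrefs that carry a scheme or a host (remote resources) are skipped.
func readingOrder(manifest []item, spine []itemref) []string {
	byID := make(map[string]string, len(manifest))
	for _, it := range manifest {
		byID[it.ID] = it.Href
	}

	var hrefs []string
	for _, ref := range spine {
		href, ok := byID[ref.IDref]
		if !ok {
			continue // idref does not match any manifest item
		}
		u, err := url.Parse(href)
		if err != nil || u.IsAbs() || u.Host != "" {
			continue // not a resource stored inside the archive
		}
		hrefs = append(hrefs, href)
	}
	return hrefs
}

func main() {
	manifest := []item{
		{ID: "ch1", Href: "text/ch1.xhtml"},
		{ID: "ch2", Href: "text/ch2.xhtml"},
		{ID: "clip", Href: "https://example.com/clip.mp4"},
	}
	spine := []itemref{{IDref: "ch2"}, {IDref: "clip"}, {IDref: "ch1"}}
	fmt.Println(readingOrder(manifest, spine))
}
```

The spine decides the order and the manifest supplies the location, which is why the result follows the spine even when the manifest lists items differently.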

Types

type Author

type Author struct {
	FullName string
	FileAs   string
	Role     string
}

Author represents an author.

type AuthorElt added in v0.2.0

type AuthorElt struct {
	*Element

	// FileAs attribute is used to specify a normalized form of the contents,
	// suitable for machine processing.
	FileAs string `xml:"file-as,attr,omitempty"`
	// Role attribute is used to refine the Author role. It's usually a
	// 3-character registered MARC value (http://www.loc.gov/marc/relators/).
	Role string `xml:"role,attr,omitempty"`
}

AuthorElt is a specific Element that provides information about a creator or contributor. It 'extends' EPUB3 Element to capture possible 'opf:role' and 'opf:file-as' attributes that can be found in older EPUB versions.
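
As an illustration of why tags such as `xml:"role,attr"` are enough to capture the prefixed OPF2 attributes, here is a minimal sketch using only encoding/xml. The creator type is a flattened local stand-in for AuthorElt, and the sample element is made up for the example.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// creator is a flattened, local stand-in for AuthorElt. A field tag
// without a namespace matches an attribute by its local name, so
// "role" also matches the OPF2 form "opf:role".
type creator struct {
	FileAs string `xml:"file-as,attr"`
	Role   string `xml:"role,attr"`
	Value  string `xml:",chardata"`
}

// parseCreator decodes a single creator or contributor element.
func parseCreator(src string) (creator, error) {
	var c creator
	err := xml.Unmarshal([]byte(src), &c)
	return c, err
}

// opf2Creator is an illustrative OPF2-style element.
const opf2Creator = `<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:opf="http://www.idpf.org/2007/opf"
  opf:role="aut" opf:file-as="Austen, Jane">Jane Austen</dc:creator>`

func main() {
	c, err := parseCreator(opf2Creator)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("name=%q role=%q file-as=%q\n", c.Value, c.Role, c.FileAs)
}
```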

type Collection added in v0.2.0

type Collection struct {
	// Dir attribute specifies the base text direction of the content and
	// attribute values of the carrying element and its descendants.
	// Allowed values are ltr (left-to-right) and rtl (right-to-left).
	Dir string `xml:"dir,attr,omitempty"`
	// ID attribute provides the ID [XML] of the element, which MUST be
	// unique within the document scope.
	ID string `xml:"id,attr,omitempty"`
	// Role uniquely identifies all conformant collection elements.
	Role string `xml:"role,attr"`
	// Lang specifies the language used in the contents and attribute
	// values of the carrying element and its descendants.
	Lang string `xml:"xml:lang,attr,omitempty"`

	Metadata    *Metadata    `xml:"metadata,omitempty"`
	Collections []Collection `xml:"collection,omitempty"`
	Links       []Link       `xml:"link,omitempty"`
}

Collection element defines a related group of resources.

type Date

type Date struct {
	Stamp string
	Event string
}

Date represents an event.

type DateElt added in v0.2.0

type DateElt struct {
	*Element

	// Event attribute further details the event to which the date
	// corresponds.
	Event string `xml:"event,attr,omitempty"`
}

DateElt is a specific Element that provides information about the date of publication.

type Element added in v0.2.0

type Element struct {
	// Dir attribute specifies the base text direction of the content and
	// attribute values of the carrying element and its descendants.
	// Allowed values are ltr (left-to-right) and rtl (right-to-left).
	Dir string `xml:"dir,attr,omitempty"`
	// ID attribute provides the ID [XML] of the element, which MUST be
	// unique within the document scope.
	ID string `xml:"id,attr,omitempty"`
	// Lang specifies the language used in the contents and attribute
	// values of the carrying element and its descendants, as defined in
	// section 2.12 Language Identification of [XML].
	Lang string `xml:"xml:lang,attr,omitempty"`

	// Value is the Element's value
	Value string `xml:",chardata"`
}

Element is a generic Metadata element.

type Epub added in v0.3.0

type Epub struct {
	*zip.ReadCloser
	// contains filtered or unexported fields
}

Epub represents a read-only EPUB document.

func Open added in v0.3.0

func Open(path string) (*Epub, error)

Open opens an EPUB from a file. The returned Epub must be closed when no longer needed.

func (*Epub) Information added in v0.3.0

func (e *Epub) Information() (*Information, error)

Information returns a simplified but easier to use version of PackageDocument.Metadata.

func (*Epub) OpenItem added in v0.3.0

func (e *Epub) OpenItem(href string) (fs.File, error)

OpenItem opens an EPUB Publication Resource identified by its href, as usually found in the Manifest. OpenItem tries to unescape href first. Opening an Item whose Href points outside of the EPUB archive will fail.
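
The two behaviours described here (unescaping the href, refusing paths that leave the archive) can be sketched with the standard library alone. resolveHref and errOutsideArchive are hypothetical names, and resolving the href against the directory of the package document is an assumption made for the example; the package may proceed differently.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"path"
	"strings"
)

// errOutsideArchive is a hypothetical error used by this sketch only.
var errOutsideArchive = errors.New("href points outside of the archive")

// resolveHref (hypothetical helper) percent-decodes href, resolves it
// against the directory holding the package document, and rejects any
// result that climbs above the archive root.
func resolveHref(opfPath, href string) (string, error) {
	unescaped, err := url.PathUnescape(href)
	if err != nil {
		return "", err
	}

	// path.Join also cleans the result, collapsing "." and ".." segments.
	name := path.Join(path.Dir(opfPath), unescaped)
	if name == ".." || strings.HasPrefix(name, "../") {
		return "", errOutsideArchive
	}
	return name, nil
}

func main() {
	name, err := resolveHref("OEBPS/content.opf", "Text/chapter%201.xhtml")
	fmt.Println(name, err)

	_, err = resolveHref("OEBPS/content.opf", "../../secret.txt")
	fmt.Println(err)
}
```

This sketch only covers path traversal. A remote href such as an https URL would produce a name that simply does not exist in the archive, so opening it would fail at lookup time.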

func (*Epub) Package added in v0.3.0

func (e *Epub) Package() (*PackageDocument, error)

Package returns the EPUB PackageDocument.

type GenericMetadata added in v0.2.0

type GenericMetadata struct {
	Name    string
	Content string
}

GenericMetadata represents a generic metadata.

type Identifier

type Identifier struct {
	Scheme string
	Value  string
}

Identifier represents an identifier.

type IdentifierElt added in v0.2.0

type IdentifierElt struct {
	*Element

	// Scheme attribute names the system or authority that generated or
	// assigned the text contained within the identifier element, for example
	// "ISBN" or "DOI".
	Scheme string `xml:"scheme,attr,omitempty"`
}

IdentifierElt is a specific Element that provides a string or number used to uniquely identify the resource. It 'extends' EPUB3 Element to capture possible 'opf:scheme' attribute that can be found in older EPUB version.

type Information added in v0.2.0

type Information struct {
	// Identifier contains an identifier associated with the given
	// Rendition, such as a UUID, DOI or ISBN.
	Identifier []Identifier
	// Title represents the EPUB titles.
	Title []string
	// SubTitle represents the EPUB sub-titles.
	SubTitle []string `json:",omitempty"`

	// Language element specifies the language of the content of the
	// given Rendition.
	Language []string

	// Contributor represents the name of a person, organization, etc.
	// that played a secondary role in the creation of the content of an
	// EPUB Publication.
	Contributor []Author `json:",omitempty"`
	// Coverage gives the extent or scope of the publication’s content.
	Coverage []string `json:",omitempty"`
	// Creator represents the name of a person, organization, etc.
	// responsible for the creation of the content of the Rendition.
	Creator []Author
	// Date lists events associated with the EPUB, such as publication or creation.
	Date []Date `json:",omitempty"`
	// Description provides a description of the publication's content.
	Description []string `json:",omitempty"`
	// Format identifies the media type or dimensions of the resource.
	Format []string `json:",omitempty"`
	// Publisher identifies the publication's publisher.
	Publisher []string `json:",omitempty"`
	// Relation is an identifier of an auxiliary resource and its
	// relationship to the publication.
	Relation []string `json:",omitempty"`
	// Rights provides a statement about rights, or a reference to one.
	Rights []string `json:",omitempty"`
	// Source provides information regarding a prior resource from which
	// the publication was derived.
	Source []string `json:",omitempty"`
	// Subject identifies the subject of the EPUB Publication.
	Subject []string `json:",omitempty"`
	// Type is used to indicate that the given EPUB Publication is of a
	// specialized type.
	Type []string `json:",omitempty"`

	// Meta element provides a generic means of including package
	// metadata.
	Meta []GenericMetadata `json:",omitempty"`

	// Series is the series to which this book belongs.
	Series string `json:",omitempty"`
	// SeriesIndex is the position of the book within its series.
	SeriesIndex string `json:",omitempty"`
}

Information gathers meta information about an epub as a simpler version of Metadata to offer a more direct access to an Epub's metadata for simple use cases.
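
The json tags on Information determine which fields appear when a value is serialized. The sketch below reproduces that behaviour with a local info type that mirrors only four of the fields and their tags; it is an illustration of the tag semantics, not the package's own type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// info mirrors a handful of Information fields with the same JSON
// tags. Fields tagged omitempty disappear when empty; untagged
// fields are always emitted.
type info struct {
	Title    []string
	SubTitle []string `json:",omitempty"`
	Language []string
	Series   string `json:",omitempty"`
}

// encode serializes i to a JSON string.
func encode(i info) (string, error) {
	b, err := json.Marshal(i)
	return string(b), err
}

func main() {
	out, err := encode(info{Title: []string{"Emma"}, Language: []string{"en"}})
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(out) // {"Title":["Emma"],"Language":["en"]}
}
```

Note that an untagged nil slice is encoded as null rather than dropped, so consumers of the JSON should be prepared for `"Title":null` when an EPUB has no title.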

func GetMetadataFromFile

func GetMetadataFromFile(path string) (*Information, error)

GetMetadataFromFile reads metadata from an epub file.

type Item added in v0.2.0

type Item struct {
	// Fallback attribute takes an IDREF [XML] that identifies a
	// fallback for the Publication Resource referenced from the item
	// element.
	Fallback string `xml:"fallback,attr,omitempty"`
	// Href is an absolute or relative IRI reference [RFC3987] to a
	// resource.
	Href string `xml:"href,attr"`
	// ID attribute provides the ID [XML] of the element, which MUST be
	// unique within the document scope.
	ID string `xml:"id,attr"`
	// MediaOverlay attribute takes an IDREF [XML] that identifies
	// the Media Overlay Document for the resource described by this
	// item.
	MediaOverlay string `xml:"media-overlay,attr,omitempty"`
	// MediaType indicates the MIME media type the Publication Resource
	// identified by Item MUST conform to.
	MediaType string `xml:"media-type,attr"`
	// Properties is a space-separated list of property values.
	Properties string `xml:"properties,attr,omitempty"`
}

Item element represents a Publication Resource.

type Itemref added in v0.2.0

type Itemref struct {
	// ID attribute provides the ID [XML] of the element, which MUST be
	// unique within the document scope.
	ID string `xml:"id,attr,omitempty"`
	// IDref references the ID [XML] of a unique item in the manifest via
	// the IDREF [XML] in its idref attribute (i.e., two or more itemref
	// elements cannot reference the same item).
	IDref string `xml:"idref,attr"`
	// Linear attribute indicates whether the referenced item contains
	// content that contributes to the primary reading order and has to
	// be read sequentially ("yes") or auxiliary content that enhances or
	// augments the primary content and can be accessed out of sequence
	// ("no").
	Linear string `xml:"linear,attr,omitempty"`
	// Properties is a space-separated list of property values.
	Properties string `xml:"properties,attr,omitempty"`
}

Itemref element references a Publication Resource listed in the Manifest.

type Link

type Link struct {
	// Href is an absolute or relative IRI reference [RFC3987] to a
	// resource.
	Href string `xml:"href,attr"`
	// ID attribute provides the ID [XML] of the element, which MUST be
	// unique within the document scope.
	ID string `xml:"id,attr,omitempty"`
	// MediaType indicates the MIME media type the Publication Resource
	// identified by Item MUST conform to.
	MediaType string `xml:"media-type,attr,omitempty"`
	// Properties takes a space-separated list of property values.
	Properties string `xml:"properties,attr,omitempty"`
	// Refines identifies the expression or resource augmented by the
	// element. The value of the attribute must be a relative IRI
	// [RFC3987] referencing the resource or element being described.
	Refines string `xml:"refines,attr,omitempty"`
	// Rel attribute takes a space-separated list of property values that
	// establish the relationship the resource has with the Rendition.
	Rel string `xml:"rel,attr"`
}

Link element is used to associate resources with the given Rendition, such as metadata records.

type Manifest added in v0.2.0

type Manifest struct {
	// ID attribute provides the ID [XML] of the element, which MUST be
	// unique within the document scope.
	ID string `xml:"id,attr,omitempty"`
	// Items lists Publication Resources
	Items []Item `xml:"item"`
}

Manifest element provides an exhaustive list of the Publication Resources that constitute the given Rendition, each represented by an item element.

type Meta

type Meta struct {
	// Dir attribute specifies the base text direction of the content and
	// attribute values of the carrying element and its descendants.
	// Allowed values are ltr (left-to-right) and rtl (right-to-left).
	Dir string `xml:"dir,attr,omitempty"`
	// ID attribute provides the ID [XML] of the element, which MUST be
	// unique within the document scope.
	ID string `xml:"id,attr,omitempty"`
	// Property takes a property data type value that defines the
	// statement being made in the expression, and the text content of
	// the element represents the assertion.
	Property string `xml:"property,attr"`
	// Refines identifies the expression or resource augmented by the
	// element.
	Refines string `xml:"refines,attr,omitempty"`
	// Scheme attribute identifies the system or scheme that the
	// element's value is drawn from.
	Scheme string `xml:"scheme,attr,omitempty"`
	// Lang specifies the language used in the contents and attribute
	// values of the carrying element and its descendants.
	Lang string `xml:"xml:lang,attr,omitempty"`

	// Value is the Element's value
	Value string `xml:",chardata"`
}

Meta element provides a generic means of including package metadata.

type MetaLegacy added in v0.2.0

type MetaLegacy struct {
	*Meta

	// Name identifies the user-defined metadata.
	Name string `xml:"name,attr"`
	// Content is the value of the metadata.
	Content string `xml:"content,attr"`
}

MetaLegacy extends Meta to adapt to a possible OPF2 meta statement.
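
To make the two meta forms concrete, the following sketch decodes an OPF2-style statement (name and content attributes) and an EPUB3-style one (property attribute plus text) into the same local struct. The meta and metadata types, the keyValue helper and the sample document are all hypothetical; the normalization into a name/content pair is one plausible approach and is not claimed to be what the package does.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// meta is a flattened local stand-in for MetaLegacy: it accepts both
// the OPF2 attributes (name, content) and the EPUB3 ones (property,
// text content).
type meta struct {
	Property string `xml:"property,attr"`
	Name     string `xml:"name,attr"`
	Content  string `xml:"content,attr"`
	Value    string `xml:",chardata"`
}

type metadata struct {
	Meta []meta `xml:"meta"`
}

// keyValue (hypothetical helper) reduces either form to a name and
// content pair, preferring the OPF2 attributes when present.
func keyValue(m meta) (name, content string) {
	if m.Name != "" {
		return m.Name, m.Content
	}
	return m.Property, m.Value
}

// sample is an illustrative metadata fragment mixing both forms.
const sample = `<metadata xmlns="http://www.idpf.org/2007/opf">
  <meta name="cover" content="cover-image"/>
  <meta property="dcterms:modified">2023-02-26T00:00:00Z</meta>
</metadata>`

func parseMetadata(src string) (metadata, error) {
	var md metadata
	err := xml.Unmarshal([]byte(src), &md)
	return md, err
}

func main() {
	md, err := parseMetadata(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, m := range md.Meta {
		name, content := keyValue(m)
		fmt.Printf("%s = %s\n", name, content)
	}
}
```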

type Metadata

type Metadata struct {
	// Identifier contains an identifier associated with the given
	// Rendition, such as a UUID, DOI or ISBN.
	Identifier []IdentifierElt `xml:"identifier"`
	// Title represents an instance of a name given to the EPUB
	// Publication.
	Title []Element `xml:"title"`
	// Language element specifies the language of the content of the
	// given Rendition.
	Language []Element `xml:"language"`

	// Contributor represents the name of a person, organization, etc.
	// that played a secondary role in the creation of the content of an
	// EPUB Publication.
	Contributor []AuthorElt `xml:"contributor,omitempty"`
	// Coverage gives the extent or scope of the publication’s content.
	Coverage []Element `xml:"coverage,omitempty"`
	// Creator represents the name of a person, organization, etc.
	// responsible for the creation of the content of the Rendition.
	Creator []AuthorElt `xml:"creator,omitempty"`
	// Date is only used to define the publication date of the EPUB
	// Publication.
	Date []DateElt `xml:"date,omitempty"`
	// Description provides a description of the publication's content.
	Description []Element `xml:"description,omitempty"`
	// Format identifies the media type or dimensions of the resource.
	Format []Element `xml:"format,omitempty"`
	// Publisher identifies the publication's publisher.
	Publisher []Element `xml:"publisher,omitempty"`
	// Relation is an identifier of an auxiliary resource and its
	// relationship to the publication.
	Relation []Element `xml:"relation,omitempty"`
	// Rights provides a statement about rights, or a reference to one.
	Rights []Element `xml:"rights,omitempty"`
	// Source provides information regarding a prior resource from which
	// the publication was derived.
	Source []Element `xml:"source,omitempty"`
	// Subject identifies the subject of the EPUB Publication.
	Subject []Element `xml:"subject,omitempty"`
	// Type is used to indicate that the given EPUB Publication is of a
	// specialized type.
	Type []Element `xml:"type,omitempty"`

	// Meta element provides a generic means of including package
	// metadata.
	Meta []MetaLegacy `xml:"meta,omitempty"`
	// Link element is used to associate resources with the given
	// Rendition, such as metadata records.
	Link []Link `xml:"link,omitempty"`
}

Metadata encapsulates metadata information for the given Rendition.

type PackageDocument added in v0.2.0

type PackageDocument struct {
	XMLName xml.Name `xml:"http://www.idpf.org/2007/opf package"`

	// Dir attribute specifies the base text direction of the content and
	// attribute values of the carrying element and its descendants.
	// Allowed values are ltr (left-to-right) and rtl (right-to-left).
	Dir string `xml:"dir,attr,omitempty" json:",omitempty"`
	// ID attribute provides the ID [XML] of the element, which MUST be
	// unique within the document scope.
	ID string `xml:"id,attr,omitempty" json:",omitempty"`
	// Prefix attribute provides a declaration mechanism for prefixes not
	// reserved by this specification.
	Prefix string `xml:"prefix,attr,omitempty" json:",omitempty"`
	// Lang specifies the language used in the contents and attribute
	// values of the carrying element and its descendants.
	Lang string `xml:"xml:lang,attr,omitempty" json:",omitempty"`
	// UniqueIdentifier attribute takes an IDREF [XML] that identifies
	// the dc:identifier element that provides the preferred, or primary,
	// identifier.
	UniqueIdentifier string `xml:"unique-identifier,attr"`
	// The version attribute specifies the EPUB specification version to
	// which the given EPUB Package conforms.
	Version string `xml:"version,attr"`

	Metadata   *Metadata   `xml:"metadata"`
	Manifest   *Manifest   `xml:"manifest"`
	Spine      *Spine      `xml:"spine"`
	Collection *Collection `xml:"collection,omitempty" json:",omitempty"`
}

PackageDocument carries meta information about the Rendition, provides a manifest of resources and defines the default reading order. PackageDocument is an implementation of a Package Document that intends to meet the specification at https://www.w3.org/publishing/epub32/epub-packages.html. Known differences mainly aim at allowing information to be read from OPF2-based EPUBs.
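
As a rough illustration of how such a structure maps onto an OPF file, the sketch below decodes a tiny package document with encoding/xml and looks up the primary identifier through unique-identifier. The pkg type is a heavily reduced local stand-in for PackageDocument, primaryIdentifier is a hypothetical helper, and the sample document and its identifiers are invented.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

type identifier struct {
	ID    string `xml:"id,attr"`
	Value string `xml:",chardata"`
}

// pkg is a reduced local stand-in for PackageDocument. The XMLName
// tag requires a root element named package in the OPF namespace.
type pkg struct {
	XMLName          xml.Name `xml:"http://www.idpf.org/2007/opf package"`
	UniqueIdentifier string   `xml:"unique-identifier,attr"`
	Version          string   `xml:"version,attr"`
	Metadata         struct {
		Identifier []identifier `xml:"identifier"`
		Title      []string     `xml:"title"`
	} `xml:"metadata"`
}

// primaryIdentifier (hypothetical helper) returns the identifier whose
// id matches the package's unique-identifier attribute.
func primaryIdentifier(p pkg) (string, bool) {
	for _, id := range p.Metadata.Identifier {
		if id.ID == p.UniqueIdentifier {
			return id.Value, true
		}
	}
	return "", false
}

// sampleOPF is an illustrative, minimal package document.
const sampleOPF = `<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="isbn">urn:isbn:0000000000000</dc:identifier>
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Sample</dc:title>
  </metadata>
</package>`

func parsePackage(src string) (pkg, error) {
	var p pkg
	err := xml.Unmarshal([]byte(src), &p)
	return p, err
}

func main() {
	p, err := parsePackage(sampleOPF)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	id, _ := primaryIdentifier(p)
	fmt.Println(p.Version, p.Metadata.Title, id)
}
```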

func GetPackageFromFile added in v0.2.0

func GetPackageFromFile(path string) (*PackageDocument, error)

GetPackageFromFile reads an epub's Open Package Document from an epub file.

type Spine added in v0.2.0

type Spine struct {
	// ID attribute provides the ID [XML] of the element, which MUST be
	// unique within the document scope.
	ID string `xml:"id,attr,omitempty"`
	// PageProgression attribute sets the global direction in which the
	// content flows. Allowed values are ltr (left-to-right), rtl
	// (right-to-left) and default.
	PageProgression string `xml:"page-progression-direction,attr,omitempty"`
	// Toc is a legacy feature that previously provided the table of
	// contents for EPUB Publications.
	Toc string `xml:"toc,attr,omitempty"`
	// Itemrefs lists Publication Resources. The order of the Itemrefs
	// elements defines the default reading order of the given Rendition.
	Itemrefs []Itemref `xml:"itemref"`
}

Spine element defines an ordered list of manifest item references that represents the default reading order of the given Rendition.

type WalkFunc added in v0.2.0

type WalkFunc func(r io.Reader, info fs.FileInfo) error

WalkFunc is the signature of the function called by Walkxxx on each of the EPUB's resources. If walkFn returns an error, Walkxxx stops and returns that error. The only exception is ErrStopWalk, which stops the walk without being reported as an error.

Directories

Path Synopsis
cmd
