Documentation ¶
Index ¶
- type Asset
- type Assets
- type Chapter
- type ChapterContent
- type CleanupOptions
- type General
- type NovelConfig
- type Pagination
- type Parser
- type Replacement
- type SiteConfiguration
- type Source
- type SourceContent
- type TemplateChapter
- type TemplateToC
- type Templates
- type TitleContent
- type Toc
- type Translator
- type WaybackMachine
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Asset ¶
Asset contains the path on the host system and, once the asset has been added, its internal path within the epub
type Chapter ¶
type Chapter struct {
	URL           string `yaml:"url"`
	SourceContent `yaml:",inline"`
}
Chapter is the struct for a single chapter. It requires only the URL and embeds the SourceContent struct, which includes the ChapterContent struct
type ChapterContent ¶
type ChapterContent struct {
	ContentSelector *string `yaml:"content-selector"`
	CleanupOptions  `yaml:",inline"`
}
ChapterContent contains the content selector and the content cleanup options
type CleanupOptions ¶
type CleanupOptions struct {
	PrefixSelectors *[]string `yaml:"prefix-selectors"`
	SuffixSelectors *[]string `yaml:"suffix-selectors"`
	StripRegex      string    `yaml:"strip-regex"`
	CleanupRegex    string    `yaml:"cleanup-regex"`
}
CleanupOptions are all options related to cleaning up the extracted content of titles and chapters
type General ¶
type General struct {
	Title       string        `yaml:"title"`
	AltTitle    string        `yaml:"alt-title"`
	Author      string        `yaml:"author"`
	Description string        `yaml:"description"`
	Cover       string        `yaml:"cover"`
	Language    string        `yaml:"language"`
	Raw         string        `yaml:"raw"`
	Translators []*Translator `yaml:"translators"`
}
General contains the general information about the novel
type NovelConfig ¶
type NovelConfig struct {
	BaseDirectory string
	General       General             `yaml:"general"`
	Sites         []SiteConfiguration `yaml:"sites"`
	Chapters      []Source            `yaml:"chapters"`
	Assets        Assets              `yaml:"assets"`
	BackList      []string            `yaml:"blacklist"`
	Replacements  []Replacement       `yaml:"replacements"`
	Templates     Templates           `yaml:"templates"`
}
NovelConfig contains the configuration of the novel scraper
func (*NovelConfig) DoURLReplacements ¶
func (s *NovelConfig) DoURLReplacements(checkedURL string) (chapterUrl string, changed bool)
DoURLReplacements checks whether the passed URL is replaced by the configuration. It returns the resulting chapter URL and whether it was changed
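The return contract of DoURLReplacements can be illustrated with a minimal sketch. The `replacement` type, its `From` and `To` fields, and the exact-match comparison are assumptions made for illustration; the fields of the package's Replacement struct are not shown in this documentation, and the real method may compare URLs differently.

```go
package main

import "fmt"

// replacement is a hypothetical stand-in for the package's Replacement type.
// The field names From and To are assumptions, not the package's API.
type replacement struct {
	From string
	To   string
}

// replaceURL returns the replacement for checkedURL and true when a
// configured entry matches exactly, otherwise the unchanged URL and false.
func replaceURL(replacements []replacement, checkedURL string) (chapterURL string, changed bool) {
	for _, r := range replacements {
		if r.From == checkedURL {
			return r.To, true
		}
	}
	return checkedURL, false
}

func main() {
	replacements := []replacement{
		{From: "https://example.com/old-chapter", To: "https://example.com/new-chapter"},
	}

	// A configured URL is swapped for its replacement.
	fmt.Println(replaceURL(replacements, "https://example.com/old-chapter"))
	// An unknown URL is passed through untouched.
	fmt.Println(replaceURL(replacements, "https://example.com/other"))
}
```

Returning the original URL together with `false` lets callers use the result unconditionally and only log or branch on the flag.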
func (*NovelConfig) GetSiteConfigFromURL ¶
func (s *NovelConfig) GetSiteConfigFromURL(url *url.URL) *SiteConfiguration
GetSiteConfigFromURL retrieves the site configuration for the passed URL. It returns an empty site configuration with nil values if no site configuration exists for the host
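The lookup described above, including the empty fallback, might look roughly like the following. This is a sketch under assumptions: `siteConfig` is a trimmed-down, hypothetical stand-in for SiteConfiguration, and `siteConfigForURL` and `mustParse` are illustrative helpers rather than functions of the package.

```go
package main

import (
	"fmt"
	"net/url"
)

// siteConfig is a trimmed-down, hypothetical stand-in for SiteConfiguration.
type siteConfig struct {
	Host            string
	ContentSelector *string
}

// siteConfigForURL returns the first configuration whose Host equals the
// host of u, or an empty configuration whose pointer fields are all nil.
func siteConfigForURL(sites []siteConfig, u *url.URL) *siteConfig {
	for i := range sites {
		if sites[i].Host == u.Host {
			return &sites[i]
		}
	}
	return &siteConfig{}
}

// mustParse parses raw and panics on malformed input; for example use only.
func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

func main() {
	selector := "div.entry-content"
	sites := []siteConfig{{Host: "example.com", ContentSelector: &selector}}

	known := siteConfigForURL(sites, mustParse("https://example.com/chapter-1"))
	unknown := siteConfigForURL(sites, mustParse("https://example.org/chapter-1"))

	fmt.Println(known.ContentSelector != nil)   // configured host: selector is set
	fmt.Println(unknown.ContentSelector == nil) // unknown host: empty configuration
}
```

Returning an empty value instead of nil means callers can read fields without a nil check on the configuration itself and only need to test the individual pointer fields.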
func (*NovelConfig) IsURLBlacklisted ¶
func (s *NovelConfig) IsURLBlacklisted(checkedURL string) bool
IsURLBlacklisted checks whether the passed URL is blacklisted. It parses both the passed URL and the blacklisted URLs to ignore minor differences such as a trailing slash
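One way to compare parsed URLs while ignoring a trailing slash is sketched below. The helpers `normalizeURL` and `isBlacklisted` are hypothetical, and the normalization chosen here (lower-cased host plus path, with scheme and query ignored) is an assumption; the package may normalize differently.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeURL parses raw and returns a comparable form: the lower-cased
// host followed by the path without a trailing slash. Scheme, query and
// fragment are ignored in this sketch.
func normalizeURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return strings.ToLower(u.Host) + strings.TrimSuffix(u.Path, "/"), nil
}

// isBlacklisted reports whether checkedURL matches any blacklist entry
// after normalization. Entries that fail to parse are skipped.
func isBlacklisted(blacklist []string, checkedURL string) bool {
	checked, err := normalizeURL(checkedURL)
	if err != nil {
		return false
	}
	for _, entry := range blacklist {
		normalized, err := normalizeURL(entry)
		if err != nil {
			continue
		}
		if normalized == checked {
			return true
		}
	}
	return false
}

func main() {
	blacklist := []string{"https://example.com/novel/chapter-1/"}

	// Same page, written without the trailing slash.
	fmt.Println(isBlacklisted(blacklist, "https://example.com/novel/chapter-1"))
	// A different chapter is not affected.
	fmt.Println(isBlacklisted(blacklist, "https://example.com/novel/chapter-2"))
}
```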
type Pagination ¶
type Pagination struct {
	ReversePosts     *bool   `yaml:"reverse-posts"`
	NextPageSelector *string `yaml:"next-page-selector"`
}
Pagination contains all implemented options for the pagination of websites
type Parser ¶
type Parser struct{}
Parser is a struct that exists solely to prevent exposing functions without setting it up first
func NewParser ¶
func NewParser() *Parser
NewParser returns a pointer to an initialized parser struct
func (*Parser) ReadConfigurationFile ¶
func (p *Parser) ReadConfigurationFile(fileName string) (novelConfig *NovelConfig, err error)
ReadConfigurationFile tries to read the passed configuration file and parse it into a NovelConfig struct
type Replacement ¶
Replacement contains the replaced URL and its replacement
type SiteConfiguration ¶
type SiteConfiguration struct {
	Host           string     `yaml:"host"`
	Pagination     Pagination `yaml:"pagination"`
	SourceContent  `yaml:",inline"`
	Redirects      []string       `yaml:"redirects"`
	WaybackMachine WaybackMachine `yaml:"wayback-machine"`
}
SiteConfiguration is an optional configuration that combines the Pagination struct and the SourceContent struct into one configuration object, allowing multiple sources of the same Host to reuse them. Source options have a higher priority than the SiteConfiguration options
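The priority rule is likely the reason many option fields are pointers: a nil pointer can mean "not set here, fall back to the site configuration". A minimal sketch of such a fallback follows; `firstSet` is a hypothetical helper, not part of the package, and the package's actual merge logic is not shown in this documentation.

```go
package main

import "fmt"

// firstSet returns the first non-nil pointer from values, or nil when none
// is set. Passing the source option before the site option gives the
// source-level value the higher priority.
func firstSet[T any](values ...*T) *T {
	for _, v := range values {
		if v != nil {
			return v
		}
	}
	return nil
}

func main() {
	siteSelector := "div.content"
	sourceSelector := "article.post"
	siteReverse := true

	// Both levels define a selector: the source-level value wins.
	fmt.Println(*firstSet(&sourceSelector, &siteSelector))

	// The source leaves reverse-posts unset: the site-level value is used.
	var sourceReverse *bool
	fmt.Println(*firstSet(sourceReverse, &siteReverse))
}
```

With plain `string` or `bool` fields, an unset option would be indistinguishable from an explicit empty string or `false`, so the fallback could not be decided reliably.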
type Source ¶
Source is the option to define the source of the chapter content: either a table of contents or a single chapter
type SourceContent ¶
type SourceContent struct {
	TitleContent   `yaml:"title-content"`
	ChapterContent `yaml:"chapter-content"`
}
SourceContent contains all configurations required for any type of source
type TemplateChapter ¶
TemplateChapter contains all templates related to the chapter pages
type TemplateToC ¶
type TemplateToC struct {
	Content    string `yaml:"content"`
	AltTitle   string `yaml:"alt-title"`
	Translator string `yaml:"translator"`
}
TemplateToC contains all templates related to the table of contents page
type Templates ¶
type Templates struct {
	ToC     TemplateToC     `yaml:"toc"`
	Chapter TemplateChapter `yaml:"chapter"`
}
Templates contains a collection of templates to style the generated epub file
type TitleContent ¶
type TitleContent struct {
	AddPrefix      *bool   `yaml:"add-prefix"`
	TitleSelector  *string `yaml:"title-selector"`
	CleanupOptions `yaml:",inline"`
}
TitleContent contains the title selector and the title cleanup options
type Toc ¶
type Toc struct {
	URL             string `yaml:"url"`
	ChapterSelector string `yaml:"chapter-selector"`
	Pagination      `yaml:"pagination"`
	SourceContent   `yaml:",inline"`
}
Toc is the struct for a table of contents. It requires the URL and the ChapterSelector, and embeds the Pagination struct and the SourceContent struct
type Translator ¶
Translator contains the name and website of a translator
type WaybackMachine ¶
WaybackMachine contains the usage and version option of a site