ao3

package module
v0.0.0-...-d9a2bf9 Latest
Warning

This package is not in the latest version of its module.

Published: Jul 28, 2018 License: MIT Imports: 10 Imported by: 0

README

ao3-go

ao3-go is a Go client for Archive of Our Own. Work in progress.

Because Archive of Our Own does not offer an HTTP API, this package uses goquery to scrape the website. The package's reliability is therefore checked with integration tests that compare processed live data against expected values.

This package is designed to be the backend API for the fanficowl project. As a result, the API endpoints are tailored towards fanficowl's requirements.

Supported Endpoints

  • GetFandomCategories retrieves the list of fandom categories
    • Actual endpoint: https://archiveofourown.org/media
  • GetFandomCategory retrieves the fandoms under a category
    • Actual endpoint: https://archiveofourown.org/media/[category]/fandoms
  • GetTaggedWorks retrieves a paginated list of works for a tag with optional search parameters
    • Actual endpoint: https://archiveofourown.org/tags/[tag]/works?page=[page]
  • GetTagSearchOptions retrieves the possible search options for a tag's works
    • Actual endpoint: https://archiveofourown.org/tags/[tag]/works
  • GetAuthorWorks retrieves a list of works for an author with optional search parameters
    • Actual endpoint: https://archiveofourown.org/users/[author]/works
  • GetAuthorSearchOptions retrieves the possible search options for an author's works
    • Actual endpoint: https://archiveofourown.org/users/[author]/works
  • GetSeriesWorks retrieves a series' works and its metadata
    • Actual endpoint: https://archiveofourown.org/series/[series]
  • GetWork retrieves the details for a work
    • Actual endpoint: https://archiveofourown.org/works/[work]?view_adult=true
  • DownloadWork downloads the entire work and returns a byte array
    • Actual endpoint: https://archiveofourown.org/downloads/[path]
  • Authenticate authenticates the user and retrieves the session cookie
    • Actual endpoint: https://archiveofourown.org/user_sessions
    • An initial GET request is required by the scraper in order to obtain the authenticity (CSRF) token
  • AddKudos adds kudos to a work
    • Actual endpoint: https://archiveofourown.org/works/[work]/kudos
  • SearchWorks searches works
    • Actual endpoint: https://archiveofourown.org/works/search
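
The Authenticate entry above notes that an initial GET request is needed to obtain the authenticity (CSRF) token before logging in. The following is a minimal sketch of that token-extraction step, not the package's actual code: the package scrapes with goquery, whereas this sketch uses regexp from the standard library to stay self-contained. The form field names ("authenticity_token", "login", "password"), the attribute order in the HTML, and the helper names extractToken and loginForm are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// tokenPattern matches a hidden input carrying the CSRF token.
// Assumption: the field is named "authenticity_token" and the name
// attribute appears directly before the value attribute.
var tokenPattern = regexp.MustCompile(`name="authenticity_token"\s+value="([^"]+)"`)

// extractToken returns the token found in the HTML of the page fetched
// by the initial GET request, or false if no token is present.
func extractToken(page string) (string, bool) {
	m := tokenPattern.FindStringSubmatch(page)
	if m == nil {
		return "", false
	}
	return m[1], true
}

// loginForm builds the body of the follow-up POST request.
// The "login" and "password" field names are hypothetical.
func loginForm(token, user, password string) url.Values {
	form := url.Values{}
	form.Set("authenticity_token", token)
	form.Set("login", user)
	form.Set("password", password)
	return form
}

func main() {
	// Stand-in for the HTML returned by the initial GET request.
	page := `<form><input type="hidden" name="authenticity_token" value="tok123"></form>`

	token, ok := extractToken(page)
	if !ok {
		fmt.Println("token not found")
		return
	}
	fmt.Println(loginForm(token, "user", "secret").Encode())
}
```

In a real client, the GET and the POST would share one http.Client with a cookie jar so that the session cookie returned by the POST is retained for later requests.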

Error Handling

See ao3_error.go for the format of all errors handled by this package.

Known Issues

Priority | Affected | Description
High | Lists of works | Listing works for a tag may return a different work count depending on whether the user is logged in. Solution to implement: add option to authenticate users.
High | Most functions | UTF-8 support is not yet implemented in functions which take in parameters for the URL's endpoint.
Medium | IndexedWorkNode | IndexedWorkNode is missing integration tests. As a fundamental part of this codebase, extensive tests should be written.
Low | Author links | Authors that are orphan accounts (e.g., Lumeilleur at https://archiveofourown.org/works/4664616) will link to the orphan_account user, as pseudonyms are ignored.
Low | Works part of collections | On the website, if a work is part of a collection, its tags' links are prepended with /collections/[collection]/. This package ignores these prefixes, so slugs point to the main tags instead, i.e., /collections/[collection]/tags/[tag]/works => /tags/[tag]/works.
Low | GetWork | GetWork does not link to collections.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func AtoiWithComma

func AtoiWithComma(s string) (int, error)

AtoiWithComma removes commas from the string and then performs strconv.Atoi.
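
As an illustration of the behavior described above, here is a minimal sketch of a comma-tolerant integer parser. It is an assumption about how such a helper could be written, not the package's source; the lowercase name atoiWithComma is used to make clear that it is a stand-in.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// atoiWithComma strips thousands separators (commas) from s and then
// parses the remainder with strconv.Atoi.
func atoiWithComma(s string) (int, error) {
	return strconv.Atoi(strings.ReplaceAll(s, ",", ""))
}

func main() {
	n, err := atoiWithComma("12,345")
	fmt.Println(n, err) // 12345 <nil>
}
```

Counts such as words, kudos, and hits are typically displayed with thousands separators, which is why a plain strconv.Atoi call would fail on them.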

func InitAO3Client

func InitAO3Client(client *http.Client, sanitizationPolicy SanitizationPolicy) (*AO3Client, *AO3Error)

InitAO3Client takes two optional parameters:

  • client, the HTTP client to use (especially useful for configuring timeouts or supplying a custom client, for example, with Google Cloud Platform);
  • sanitizationPolicy, the policy used to sanitize blurbs, etc. (NonePolicy performs no sanitization).

Types

type AO3Client

type AO3Client struct {
	HttpClient    *http.Client
	HtmlSanitizer *Sanitizer
}

AO3Client contains configuration parameters for the package

func (*AO3Client) DownloadWork

func (client *AO3Client) DownloadWork(path string) ([]byte, *AO3Error)

DownloadWork downloads the work and returns a byte array

func (*AO3Client) GetFandomCategories

func (client *AO3Client) GetFandomCategories() ([]FandomCategory, *AO3Error)

GetFandomCategories scrapes the fandoms list.

Endpoint: https://archiveofourown.org/media

func (*AO3Client) GetFandomCategory

func (client *AO3Client) GetFandomCategory(category string) ([]Fandom, *AO3Error)

GetFandomCategory returns a list of all the fandoms under a category.

Endpoint: https://archiveofourown.org/media/[category]/fandoms

Example: https://archiveofourown.org/media/Anime%20*a*%20Manga/fandoms

func (*AO3Client) GetSeries

func (client *AO3Client) GetSeries(id string) (*Series, *AO3Error)

GetSeries returns the metadata and works for a series.

Endpoint: https://archiveofourown.org/series/[series]

func (*AO3Client) GetTagWorks

func (client *AO3Client) GetTagWorks(tag string, page int) (*TagWorks, *AO3Error)

GetTagWorks returns a paginated list of works from a tag. A tag can represent fandoms, characters, etc.

Endpoint: https://archiveofourown.org/tags/[tag]/works?page=[page]

Example: https://archiveofourown.org/tags/Action*s*Adventure/works
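
The example URLs for GetFandomCategory and GetTagWorks suggest that the site writes "&" as *a* and "/" as *s* inside slugs. The sketch below builds a tag URL under that assumption. The substitution rules are inferred only from those two examples, and encodeSlug and tagWorksURL are hypothetical helpers, not functions exported by this package.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

const baseURL = "https://archiveofourown.org"

// encodeSlug percent-encodes name as a single path segment and then
// applies the two substitutions seen in the documented examples.
// url.PathEscape turns "/" into "%2F" and leaves "&" untouched, so the
// replacements are applied to the escaped form.
func encodeSlug(name string) string {
	escaped := url.PathEscape(name)
	return strings.NewReplacer("%2F", "*s*", "&", "*a*").Replace(escaped)
}

// tagWorksURL builds the address of one page of a tag's works listing.
func tagWorksURL(tag string, page int) string {
	return fmt.Sprintf("%s/tags/%s/works?page=%d", baseURL, encodeSlug(tag), page)
}

func main() {
	fmt.Println(encodeSlug("Anime & Manga"))
	fmt.Println(tagWorksURL("Action/Adventure", 2))
}
```

Because url.PathEscape also percent-encodes non-ASCII bytes, an approach like this might be one starting point for the UTF-8 limitation listed under Known Issues, though that has not been verified against the live site.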

func (*AO3Client) GetWork

func (client *AO3Client) GetWork(id string) (*Work, *AO3Error)

GetWork retrieves a work from its page

Endpoint: https://archiveofourown.org/works/[work]?view_adult=true

type AO3Error

type AO3Error struct {
	// contains filtered or unexported fields
}

func NewError

func NewError(code int, message string) *AO3Error

func WrapError

func WrapError(code int, err error, message string) *AO3Error

func (*AO3Error) Code

func (e *AO3Error) Code() int

func (*AO3Error) Error

func (e *AO3Error) Error() string
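
The client methods return the concrete type *AO3Error rather than the error interface. In Go, a nil pointer of a concrete type stored in an interface value makes that interface non-nil, so callers should compare the returned pointer with nil directly and convert it with care. The sketch below demonstrates this with apiError, a hypothetical stand-in whose fields are assumptions (the real AO3Error fields are unexported).

```go
package main

import "fmt"

// apiError is a hypothetical stand-in for AO3Error.
type apiError struct {
	code    int
	message string
}

func (e *apiError) Code() int     { return e.code }
func (e *apiError) Error() string { return fmt.Sprintf("%d: %s", e.code, e.message) }

// fetch mimics the package convention of returning a concrete pointer.
func fetch(fail bool) ([]byte, *apiError) {
	if fail {
		return nil, &apiError{code: 404, message: "work not found"}
	}
	return []byte("ok"), nil
}

// asError converts the pointer to the error interface without turning
// a nil pointer into a non-nil interface value.
func asError(e *apiError) error {
	if e == nil {
		return nil
	}
	return e
}

func main() {
	_, apiErr := fetch(false)

	// Direct assignment wraps the nil pointer in a non-nil interface.
	var err error = apiErr
	fmt.Println(apiErr == nil, err == nil) // true false

	// The guarded conversion keeps nil as nil.
	fmt.Println(asError(apiErr) == nil) // true

	if _, apiErr := fetch(true); apiErr != nil {
		fmt.Println(apiErr.Code(), apiErr.Error())
	}
}
```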

type Fandom

type Fandom struct {
	Name   string
	Letter string
	Slug   string
	Count  int
}

type FandomCategory

type FandomCategory struct {
	Name string
	Slug string
}

type IndexedWork

type IndexedWork struct {
	Title       string
	Slug        string
	LastUpdated string

	IsAnonymous bool
	Authors     []Link
	Recipients  []Link

	Rating   string
	Warnings string
	Category string
	Status   string

	FandomTags       []Link
	WarningTags      []Link
	RelationshipTags []Link
	CharacterTags    []Link
	FreeformTags     []Link

	IsSeries   bool
	Series     Link
	SeriesPart int

	Summary string

	Language  string
	Words     int
	Chapters  string
	Comments  int
	Kudos     int
	Bookmarks int
	Hits      int
}

IndexedWork represents a work as it appears in a listing, such as a tag's works page or a series page.

type Link

type Link struct {
	Text string
	Slug string
}

Link is an internal representation of links parsed from the website

type SanitizationPolicy

type SanitizationPolicy int

const (
	// NonePolicy instructs the sanitizer not to perform any sanitization
	NonePolicy SanitizationPolicy = 0
	// AO3Policy instructs the sanitizer to only keep AO3's limited HTML tags
	AO3Policy SanitizationPolicy = 1
	// AO3AndroidPolicy instructs the sanitizer to keep the AO3Policy tags
	// which are supported by Android's TextView
	AO3AndroidPolicy SanitizationPolicy = 2
)

type Sanitizer

type Sanitizer struct {
	// contains filtered or unexported fields
}

func NewSanitizer

func NewSanitizer(strength SanitizationPolicy) (*Sanitizer, error)

func (*Sanitizer) Sanitize

func (sanitizer *Sanitizer) Sanitize(html string) string

type Series

type Series struct {
	Title       string
	IsAnonymous bool
	Creators    []Link
	Begun       string
	Updated     string
	Description string
	Notes       string
	Words       int
	NumWorks    int
	IsComplete  bool
	Bookmarks   int

	Works []IndexedWork
}

Series is a representation of the series page

type TagWorks

type TagWorks struct {
	Works []IndexedWork
	Count int

	// Pagination-related values
	IsPaginated bool
	CurrentPage int
	LastPage    int
}

TagWorks is a representation of a paginated /tags/.../works page
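
The pagination fields make it possible to walk every page of a listing. The following sketch shows one way to do so. It assumes that pages are numbered from 1 and that LastPage holds the number of the final page. The tagWorks type is a trimmed local replica (with work titles in place of IndexedWork values), and fakeFetch stands in for a call such as GetTagWorks; both are hypothetical.

```go
package main

import "fmt"

// tagWorks is a trimmed local replica of TagWorks for illustration.
type tagWorks struct {
	Works       []string
	IsPaginated bool
	CurrentPage int
	LastPage    int
}

// fetchPage abstracts over a call that returns one page of results.
type fetchPage func(page int) (*tagWorks, error)

// collectAll requests pages in order until the last page is reached
// and returns the works from all of them.
func collectAll(fetch fetchPage) ([]string, error) {
	var all []string
	for page := 1; ; page++ {
		res, err := fetch(page)
		if err != nil {
			return nil, err
		}
		all = append(all, res.Works...)
		if !res.IsPaginated || res.CurrentPage >= res.LastPage {
			return all, nil
		}
	}
}

// fakeFetch serves three in-memory pages in place of live requests.
func fakeFetch(page int) (*tagWorks, error) {
	pages := [][]string{{"A", "B"}, {"C", "D"}, {"E"}}
	if page < 1 || page > len(pages) {
		return nil, fmt.Errorf("page %d out of range", page)
	}
	return &tagWorks{
		Works:       pages[page-1],
		IsPaginated: true,
		CurrentPage: page,
		LastPage:    len(pages),
	}, nil
}

func main() {
	works, err := collectAll(fakeFetch)
	fmt.Println(works, err)
}
```

Since each page is a scrape of the live website, a real loop would likely pause between requests to avoid putting load on the site.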

type Work

type Work struct {
	Title       string
	IsAnonymous bool
	Authors     []Link

	RatingTags    []Link
	FandomTags    []Link
	WarningTags   []Link
	CategoryTags  []Link
	CharacterTags []Link
	FreeformTags  []Link

	IsSeries   bool
	Series     Link
	SeriesPart int

	Language  string
	Published string
	Updated   string
	Words     int
	Chapters  string
	Comments  int
	Kudos     int
	Bookmarks int
	Hits      int

	Summary string

	HTMLDownloadSlug string
}
