archiver

v0.0.0-...-d1a9080

This package is not in the latest version of its module.
Published: May 5, 2021 License: AGPL-3.0 Imports: 23 Imported by: 0

README

This package is a fork of Obelisk.

It adds the following features:

  • A dedicated flag to fetch only images rather than all media.
  • Callbacks for URL and content processing.
  • The ability to use your own HTTP client.
  • The ability to use any logger.

Obelisk was originally written by RadhiFadlillah and released under the MIT License.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func DefaultImageProcessor

func DefaultImageProcessor(_ context.Context, _ *Archiver,
	input io.Reader, contentType string, _ *url.URL) ([]byte, string, error)

DefaultImageProcessor is the default image processor. It reads the input and returns its content without modification.

func DefaultURLProcessor

func DefaultURLProcessor(_ string, content []byte, contentType string) string

DefaultURLProcessor is the default URL processor. It returns content encoded as a base64 URL; its first (URL) argument is ignored.
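As an illustration, here is a URL processor with the same signature that builds a `data:` URL. The exact output format of DefaultURLProcessor is not documented here; this sketch assumes the common `data:<contentType>;base64,<payload>` form, and `dataURLProcessor` is a hypothetical name.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// dataURLProcessor is a hypothetical URL processor with the same signature
// as DefaultURLProcessor. It assumes a "data:<type>;base64,<payload>" output;
// the first (URL) argument is ignored, as in the documented signature.
func dataURLProcessor(_ string, content []byte, contentType string) string {
	return "data:" + contentType + ";base64," + base64.StdEncoding.EncodeToString(content)
}

func main() {
	fmt.Println(dataURLProcessor("https://example.com/a.txt", []byte("hi"), "text/plain"))
	// prints "data:text/plain;base64,aGk="
}
```

A custom processor could instead upload `content` somewhere and return the resulting remote URL, which is the kind of hook the URL callback makes possible.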

Types

type ArchiveFlag

type ArchiveFlag uint8

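ArchiveFlag is a bit flag that selects an archiver feature to enable. Because each constant occupies its own bit, flags can be combined with bitwise OR.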

const (
	// EnableCSS enables extraction of CSS files and tags.
	EnableCSS ArchiveFlag = 1 << iota

	// EnableEmbeds enables extraction of embedded contents.
	EnableEmbeds

	// EnableJS enables extraction of JavaScript contents.
	EnableJS

	// EnableMedia enables extraction of media contents
	// other than images.
	EnableMedia

	// EnableImages enables extraction of images.
	EnableImages
)
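Since the constants are defined with `1 << iota`, each flag is a separate bit. The sketch below copies the type locally so it compiles without the package and shows combining and testing flags; the `has` helper is hypothetical, not part of the package.

```go
package main

import "fmt"

// ArchiveFlag and its constants are copied from the documentation above
// so this sketch compiles standalone.
type ArchiveFlag uint8

const (
	EnableCSS ArchiveFlag = 1 << iota // 1
	EnableEmbeds                      // 2
	EnableJS                          // 4
	EnableMedia                       // 8
	EnableImages                      // 16
)

// has reports whether every bit in want is set in flags.
// It is a hypothetical helper, not part of the package.
func has(flags, want ArchiveFlag) bool {
	return flags&want == want
}

func main() {
	// Archive CSS and images only, skipping embeds, JS and other media.
	flags := EnableCSS | EnableImages
	fmt.Println(flags, has(flags, EnableImages), has(flags, EnableJS))
	// prints "17 true false"
}
```

With the real package, a value like `flags` would be assigned to the Archiver's `Flags` field.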

type Archiver

type Archiver struct {
	sync.RWMutex

	Cache   map[string]Asset
	Request *Request
	Result  []byte

	Flags ArchiveFlag

	ImageProcessor imageProcessor
	URLProcessor   urlProcessor
	EventHandler   eventHandler

	RequestTimeout        time.Duration
	SkipTLSVerification   bool
	MaxConcurrentDownload int64
	// contains filtered or unexported fields
}

Archiver is the core of the package: it downloads a web page and then embeds its assets.
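Archiver embeds a `sync.RWMutex` next to its `Cache` map, which suggests the cache is shared by concurrent downloads. The sketch below illustrates that general pattern with a hypothetical `assetCache` type; it is not the package's code, and how the archiver actually locks its cache is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// Asset mirrors the documented Asset type.
type Asset struct {
	Data        []byte
	ContentType string
}

// assetCache is a hypothetical stand-in for the Archiver's embedded
// sync.RWMutex and Cache map; it is not the package's code.
type assetCache struct {
	sync.RWMutex
	Cache map[string]Asset
}

// get takes a read lock so concurrent lookups do not block each other.
func (c *assetCache) get(url string) (Asset, bool) {
	c.RLock()
	defer c.RUnlock()
	a, ok := c.Cache[url]
	return a, ok
}

// put takes the write lock before mutating the map.
func (c *assetCache) put(url string, a Asset) {
	c.Lock()
	defer c.Unlock()
	c.Cache[url] = a
}

func main() {
	c := &assetCache{Cache: map[string]Asset{}}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c.put(fmt.Sprintf("https://example.com/%d.css", i), Asset{ContentType: "text/css"})
		}(i)
	}
	wg.Wait()
	_, ok := c.get("https://example.com/3.css")
	fmt.Println(len(c.Cache), ok) // prints "10 true"
}
```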

func New

func New(req *Request) (*Archiver, error)

New creates a new Archiver using a Request instance.

func (*Archiver) Archive

func (arc *Archiver) Archive(ctx context.Context) error

Archive starts the archival process for the archiver's request and returns an error if one occurs. The signature returns no content or content type; the archived output is expected in the Archiver's Result field.

func (*Archiver) SendEvent

func (arc *Archiver) SendEvent(ctx context.Context, event Event)

SendEvent sends an archiver event.

type Asset

type Asset struct {
	Data        []byte
	ContentType string
}

Asset is an asset used in a web page, stored as raw data together with its content type.

type Event

type Event interface {
	Fields() map[string]interface{}
}

Event is the interface for events emitted by the archiver.
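Any type with a `Fields() map[string]interface{}` method satisfies Event, which is what lets events be routed to an arbitrary logger. The sketch below mirrors the interface locally, defines a hypothetical `eventDone` event, and formats fields deterministically for a standard-library logger. How the package's unexported event handler type is wired up is not shown here and would be an assumption.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"sort"
	"strings"
)

// Event mirrors the documented interface.
type Event interface {
	Fields() map[string]interface{}
}

// eventDone is a hypothetical custom event; any type with a Fields
// method satisfies Event.
type eventDone struct {
	URL   string
	Bytes int
}

func (e eventDone) Fields() map[string]interface{} {
	return map[string]interface{}{"url": e.URL, "bytes": e.Bytes}
}

// formatEvent renders an event's fields as sorted key=value pairs so log
// output is deterministic (Go map iteration order is randomized).
func formatEvent(e Event) string {
	f := e.Fields()
	keys := make([]string, 0, len(f))
	for k := range f {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, len(keys))
	for i, k := range keys {
		parts[i] = fmt.Sprintf("%s=%v", k, f[k])
	}
	return strings.Join(parts, " ")
}

func main() {
	logger := log.New(os.Stdout, "archiver: ", 0)
	logger.Println(formatEvent(eventDone{URL: "https://example.com", Bytes: 512}))
	// prints "archiver: bytes=512 url=https://example.com"
}
```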

type EventError

type EventError struct {
	Err error
	URI string
}

EventError is the event emitted when errors occur.

func (*EventError) Fields

func (e *EventError) Fields() map[string]interface{}

Fields returns the field map.

type EventFetchURL

type EventFetchURL struct {
	// contains filtered or unexported fields
}

EventFetchURL is the event emitted when the archiver loads a remote resource.

func (*EventFetchURL) Fields

func (e *EventFetchURL) Fields() map[string]interface{}

Fields returns the field map.

type EventInfo

type EventInfo map[string]interface{}

EventInfo is a simple event for any type of data.
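Because EventInfo is itself a `map[string]interface{}`, a natural Fields implementation returns the map unchanged. The sketch below assumes that behavior; the method body is not taken from the package.

```go
package main

import "fmt"

// Event mirrors the documented interface.
type Event interface {
	Fields() map[string]interface{}
}

// EventInfo is copied from the documentation; the Fields body below
// (returning the map as-is) is an assumption, not the package's source.
type EventInfo map[string]interface{}

func (e EventInfo) Fields() map[string]interface{} { return e }

// Compile-time check that EventInfo satisfies Event.
var _ Event = EventInfo(nil)

func main() {
	var ev Event = EventInfo{"stage": "css", "count": 3}
	fmt.Println(ev.Fields()["stage"]) // prints "css"
}
```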

func (EventInfo) Fields

func (e EventInfo) Fields() map[string]interface{}

Fields returns the field map.

type EventStartHTML

type EventStartHTML string

EventStartHTML is the event emitted at the beginning of the archiving process.

func (EventStartHTML) Fields

func (e EventStartHTML) Fields() map[string]interface{}

Fields returns the field map.

type Request

type Request struct {
	Input  io.Reader
	URL    *url.URL
	Client *http.Client
}

Request holds the data for an archival request.
