archivetoday


Version: v0.0.0-...-582bcb0 | Published: Jun 10, 2021 | License: MIT | Imports: 15 | Imported by: 4


Repository

github.com/jaytaylor/archive.today


README ¶

archive.today

About

archivetoday is a Go package for archiving web pages via archive.today.

It includes two command-line tools: archive.today for creating new captures and archive.today-snapshots for finding existing captures.

(See "Command-line programs" section below for further details.)

Please be mindful and responsible, and go easy on the site. We want archive.today to last forever and not cause headaches or heartache!

Created by Jay Taylor.

Also see my related work: archive.org golang package

Alternate archive.today site / domain aliases: archive.fo, archive.is, archive.li, archive.md, archive.ph, archive.vn

Wikipedia article: archive.today

Requirements

Go version 1.9 or newer

Installation

go get jaytaylor.com/archive.today/...

Usage

Command-line programs

archive.today <url>

Archive a fresh new copy of an HTML page

archive.today-snapshots <url>

Search for existing page snapshots

Search query examples:

microsoft.com for snapshots from the host microsoft.com
*.microsoft.com for snapshots from microsoft.com and all its subdomains (e.g. www.microsoft.com)
http://twitter.com/burgerking for snapshots from exact url (search is case-sensitive)
http://twitter.com/burg* for snapshots from urls starting with http://twitter.com/burg
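
The four query forms above can be illustrated with a small local matcher. This is only a sketch of the semantics as described in the list; the actual matching is done by the archive.today service, and matchesQuery is a hypothetical helper that is not part of this package.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// matchesQuery reports whether rawURL would be selected by query under the
// four query forms listed above. It is a local illustration only (an
// assumption about the semantics); the real matching happens server-side.
func matchesQuery(query, rawURL string) bool {
	switch {
	case strings.Contains(query, "://") && strings.HasSuffix(query, "*"):
		// URL prefix: anything starting with the text before the "*".
		return strings.HasPrefix(rawURL, strings.TrimSuffix(query, "*"))
	case strings.Contains(query, "://"):
		// Exact URL, compared case-sensitively.
		return rawURL == query
	}
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	host := u.Hostname()
	if strings.HasPrefix(query, "*.") {
		// The domain itself plus all of its subdomains.
		domain := strings.TrimPrefix(query, "*.")
		return host == domain || strings.HasSuffix(host, "."+domain)
	}
	// Bare host: only that exact host.
	return host == query
}

func main() {
	fmt.Println(matchesQuery("*.microsoft.com", "https://www.microsoft.com/en-us"))
	fmt.Println(matchesQuery("http://twitter.com/burg*", "http://twitter.com/burgerking"))
	fmt.Println(matchesQuery("http://twitter.com/burgerking", "http://twitter.com/BurgerKing"))
}
```

The last call differs from its query only in letter case, which is the situation the "search is case-sensitive" note warns about.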

Go package interfaces

Capture URL HTML Page Content

capture.go:

package main

import (
	"fmt"

	"github.com/jaytaylor/archive.today"
)

var captureURL = "https://jaytaylor.com/"

func main() {
	archiveURL, err := archivetoday.Capture(captureURL)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Successfully archived %v via archive.today: %v\n", captureURL, archiveURL)
}

// Output:
//
// Successfully archived https://jaytaylor.com/ via archive.today: https://archive.today/i2PiW

Search for Existing Snapshots

search.go:

package main

import (
	"fmt"
	"time"

	"github.com/jaytaylor/archive.today"
)

var searchURL = "https://jaytaylor.com/"

func main() {
	snapshots, err := archivetoday.Search(searchURL, 10*time.Second)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%# v\n", snapshots)
}

// Output:
//
//

Running the test suite

go test ./...

TODO

Add timeout to .Capture.
Consider unifying into a single binary.

License

Permissive MIT license, see the LICENSE file for more information.

Documentation ¶

Index ¶

Variables
func Capture(u string, cfg ...Config) (string, error)
type Config
type Snapshot
func Search(url string, timeout time.Duration) ([]Snapshot, error)

Constants ¶

This section is empty.

Variables ¶


var (
	BaseURL               = "https://archive.today"                                                                                                    // Overrideable default package value.
	HTTPHost              = "archive.today"                                                                                                            // Overrideable default package value.
	UserAgent             = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36" // Overrideable default package value.
	DefaultRequestTimeout = 10 * time.Second                                                                                                           // Overrideable default package value.
	DefaultPollInterval   = 5 * time.Second                                                                                                            // Overrideable default package value.

)

Functions ¶

func Capture ¶

func Capture(u string, cfg ...Config) (string, error)

Capture archives the provided URL using the archive.today service.
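
Because cfg is variadic, callers can omit it entirely or pass a single Config. The sketch below shows one plausible way such an optional argument could be resolved against the package-level defaults. The Config type and the two default variables are local copies of the documented declarations so the example compiles on its own, and effectiveConfig is a hypothetical helper; the package's real internals may differ.

```go
package main

import (
	"fmt"
	"time"
)

// Config mirrors the documented Config type (local copy for this sketch).
type Config struct {
	Anyway         bool
	Wait           bool
	WaitTimeout    time.Duration
	PollInterval   time.Duration
	RequestTimeout time.Duration
	SubmitID       string
}

// Local copies of the documented package-level defaults.
var (
	DefaultRequestTimeout = 10 * time.Second
	DefaultPollInterval   = 5 * time.Second
)

// effectiveConfig is a hypothetical helper: it takes the first Config if one
// was supplied and fills zero-valued durations from the defaults.
func effectiveConfig(cfg ...Config) Config {
	var c Config
	if len(cfg) > 0 {
		c = cfg[0]
	}
	if c.RequestTimeout == 0 {
		c.RequestTimeout = DefaultRequestTimeout
	}
	if c.PollInterval == 0 {
		c.PollInterval = DefaultPollInterval
	}
	return c
}

func main() {
	// No Config supplied: defaults apply.
	fmt.Printf("%+v\n", effectiveConfig())
	// One Config supplied: explicit fields win, the rest fall back.
	fmt.Printf("%+v\n", effectiveConfig(Config{Wait: true, PollInterval: time.Second}))
}
```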

Types ¶

type Config ¶

type Config struct {
	Anyway         bool          // Force archival even if there is already a recent snapshot of the page.
	Wait           bool          // Wait until the crawl has been completed.
	WaitTimeout    time.Duration // Max time to wait for crawl completion.  Default is unlimited.
	PollInterval   time.Duration // Interval between crawl completion checks.  Defaults to 5s.
	RequestTimeout time.Duration // Overrides default request timeout.
	SubmitID       string        // Accepts a user-provided submitid.
}

Config settings for page capture client behavior.
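
To make the interplay of Wait, WaitTimeout, and PollInterval concrete, here is a minimal polling loop. It is a sketch under stated assumptions, not the package's implementation: waitUntilDone and errWaitTimeout are hypothetical names, the done callback stands in for the crawl-completion check, and treating a zero timeout as "unlimited" is inferred from the field comment above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errWaitTimeout is a hypothetical sentinel error for this sketch.
var errWaitTimeout = errors.New("timed out waiting for crawl completion")

// waitUntilDone calls done, sleeping pollInterval between calls, until it
// returns true. A waitTimeout of zero means wait indefinitely (assumed to
// correspond to the documented "Default is unlimited").
func waitUntilDone(done func() bool, pollInterval, waitTimeout time.Duration) error {
	var deadline time.Time
	if waitTimeout > 0 {
		deadline = time.Now().Add(waitTimeout)
	}
	for !done() {
		if !deadline.IsZero() && time.Now().After(deadline) {
			return errWaitTimeout
		}
		time.Sleep(pollInterval)
	}
	return nil
}

func main() {
	checks := 0
	// Stand-in for "has the crawl finished?"; succeeds on the third check.
	done := func() bool { checks++; return checks >= 3 }
	err := waitUntilDone(done, 10*time.Millisecond, time.Second)
	fmt.Println(checks, err)
}
```

A longer PollInterval means fewer requests to the service while waiting, which fits the README's request to go easy on the site.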

type Snapshot ¶

type Snapshot struct {
	URL          string
	ThumbnailURL string
	Timestamp    time.Time
}

Snapshot represents an instance of a URL page snapshot on archive.is.
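
Since each Snapshot carries a Timestamp, a common follow-up to Search is picking the most recent capture. The sketch below sorts a slice newest first. The Snapshot type is a local copy of the declaration above so the example compiles on its own, newestFirst and sampleSnapshots are hypothetical helpers, and the sample URLs are made-up placeholders rather than real captures. The documentation does not say in what order Search returns results, so the sort makes no assumption about it.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Snapshot mirrors the documented Snapshot type (local copy for this sketch).
type Snapshot struct {
	URL          string
	ThumbnailURL string
	Timestamp    time.Time
}

// newestFirst returns a sorted copy of snapshots, most recent first,
// leaving the input slice untouched.
func newestFirst(snapshots []Snapshot) []Snapshot {
	sorted := append([]Snapshot(nil), snapshots...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Timestamp.After(sorted[j].Timestamp)
	})
	return sorted
}

// sampleSnapshots returns placeholder data; real values would come from Search.
func sampleSnapshots() []Snapshot {
	return []Snapshot{
		{URL: "https://example.invalid/older", Timestamp: time.Date(2019, time.March, 1, 0, 0, 0, 0, time.UTC)},
		{URL: "https://example.invalid/newest", Timestamp: time.Date(2021, time.June, 10, 0, 0, 0, 0, time.UTC)},
		{URL: "https://example.invalid/middle", Timestamp: time.Date(2020, time.July, 4, 0, 0, 0, 0, time.UTC)},
	}
}

func main() {
	for _, s := range newestFirst(sampleSnapshots()) {
		fmt.Println(s.Timestamp.Format("2006-01-02"), s.URL)
	}
}
```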

func Search ¶

func Search(url string, timeout time.Duration) ([]Snapshot, error)

Search for URL snapshots.


Directories ¶

Path	Synopsis
_examples/capture
_examples/search
cmd/archive.today
cmd/archive.today-snapshots
