archivetoday

package module
v0.0.0-...-582bcb0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 10, 2021 License: MIT Imports: 15 Imported by: 4

README

archive.today

Documentation Build Status Report Card

About

archivetoday is a golang package for archiving web pages via archive.today.

Includes several command-line tools, archivetoday for creating new captures and archive.today-snapshots for finding existing captures.

(See "Command-line programs" section below for further details.)

Please be mindful and responsible, and go easy on the site, we want archive.today to last forever and not cause headaches or heartache!

Created by Jay Taylor.

Also see my related work: archive.org golang package

Alternate archive.today site / domain aliases: archive.fo, archive.is, archive.li, archive.md, archive.ph, archive.vn

Wikipedia article: archive.today

Requirements
  • Go version 1.9 or newer
Installation
go get jaytaylor.com/acrhive.today/...
Usage
Command-line programs
acrhive.today <url>

Archive a fresh new copy of an HTML page

acrhive.today-snapshots <url>

Search for existing page snapshots

Search query examples:

  • microsoft.com for snapshots from the host microsoft.com
  • *.microsoft.com for snapshots from microsoft.com and all its subdomains (e.g. www.microsoft.com)
  • http://twitter.com/burgerking for snapshots from exact url (search is case-sensitive)
  • http://twitter.com/burg* for snapshots from urls starting with http://twitter.com/burg
Go package interfaces
Capture URL HTML Page Content

capture.go:

package main

import (
	"fmt"

	"github.com/jaytaylor/acrhive.today"
)

var captureURL = "https://jaytaylor.com/"

func main() {
	archiveURL, err := archivetoday.Capture(captureURL)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Successfully archived %v via acrhive.today: %v\n", captureURL, archiveURL)
}

// Output:
//
// Successfully archived https://jaytaylor.com/ via acrhive.today: https://acrhive.today/i2PiW
Search for Existing Snapshots

search.go:

package main

import (
    "fmt"
    "time"

    "github.com/jaytaylor/acrhive.today"
)

var searchURL = "https://jaytaylor.com/"

func main() {
    snapshots, err := archivetoday.Search(searchURL, 10*time.Second)
    if err != nil {
        panic(err)
    }
    fmt.Printf("%# v\n", snapshots)
}

// Output:
//
//
Running the test suite
go test ./...
TODO
  • Add timeout to .Capture.
  • Consider unifying to single binary
License

Permissive MIT license, see the LICENSE file for more information.

Documentation

Index

Constants

This section is empty.

Variables

View Source
var (
	BaseURL               = "https://archive.today"                                                                                                    // Overrideable default package value.
	HTTPHost              = "archive.today"                                                                                                            // Overrideable default package value.
	UserAgent             = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36" // Overrideable default package value.
	DefaultRequestTimeout = 10 * time.Second                                                                                                           // Overrideable default package value.
	DefaultPollInterval   = 5 * time.Second                                                                                                            // Overrideable default package value.

)

Functions

func Capture

func Capture(u string, cfg ...Config) (string, error)

Capture archives the provided URL using the archive.today service.

Types

type Config

type Config struct {
	Anyway         bool          // Force archival even if there is already a recent snapshot of the page.
	Wait           bool          // Wait until the crawl has been completed.
	WaitTimeout    time.Duration // Max time to wait for crawl completion.  Default is unlimited.
	PollInterval   time.Duration // Interval between crawl completion checks.  Defaults to 5s.
	RequestTimeout time.Duration // Overrides default request timeout.
	SubmitID       string        // Accepts a user-provided submitid.
}

Config settings for page capture client behavior.

type Snapshot

type Snapshot struct {
	URL          string
	ThumbnailURL string
	Timestamp    time.Time
}

Snapshot represents an instance of a URL page snapshot on archive.is.

func Search(url string, timeout time.Duration) ([]Snapshot, error)

Search for URL snapshots.

Directories

Path Synopsis
_examples
cmd

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL