webpackager

Version: v1.0.0
Published: Sep 9, 2022 License: Apache-2.0 Imports: 18 Imported by: 0

README

Web Packager

Web Packager is a command-line tool to "package" websites in accordance with the specifications proposed at WICG/webpackage. It may look like gen-signedexchange, but it is built on top of gen-signedexchange and focuses on automating the generation of Signed HTTP Exchanges (a.k.a. SXGs) and optimizing page loading.

Web Packager HTTP Server is an HTTP server built on top of Web Packager. It functions like a reverse-proxy, receiving signing requests over HTTP. For more detail, see cmd/webpkgserver/README.md. This README focuses on the command-line tool.

Web Packager retrieves HTTP responses from servers and turns them into signed exchanges. Those signed exchanges are written into files in a way that preserves the URL path structure, so they can be deployed easily in some typical cases. In addition, Web Packager applies some optimizations to the signed exchanges to help the content render more quickly.

Web Packager is intended primarily as a showcase of how to speed up page loading with privacy-preserving prefetch. Web developers may port the logic from this codebase to their systems or integrate Web Packager into their systems. Web Packager's code is designed to allow injection of custom logic; see the GoDoc comments for details. Note, however, that Web Packager is still at an early stage: see Limitations below.

Web Packager is not related to webpack.

Prerequisite

Web Packager is written in Go and thus requires a Go installation to build and run. See Getting Started on golang.org for how to install Go on your computer.

You will also need a certificate and private key pair to use for signing the exchanges. Note the certificate must:

(For example, DigiCert offers the right kind of certificates.)

Then you will need to convert your certificate into the application/cert-chain+cbor format, which you can do using the instructions at:

Limitations

In this early phase, we may make backward-incompatible changes to the command line or the API.

Web Packager aims to automatically meet most but not all Google SXG Cache requirements. In particular, pages that do not use responsive design should specify a supported-media annotation.

Web Packager does not handle request matching correctly. This should not matter unless your web server implements content negotiation using the Variants and Variant-Key headers (not the Vary header). We plan to support request matching in the future, but there is no ETA (estimated time of availability) at this moment.

Note: The above limitation is not expected to be a big deal even if your server serves signed exchanges conditionally using content negotiation: if you already have signed exchanges, you should not need Web Packager.

Install

go get -u github.com/google/webpackager/cmd/...

Usage

The simplest command looks like:

webpackager \
    --cert_cbor=cert.cbor \
    --private_key=priv.key \
    --cert_url=https://example.com/cert.cbor \
    --url=https://example.com/hello.html

It will retrieve an HTTP response from https://example.com/hello.html, generate a signed exchange with the given pair of certificate (cert.cbor) and private key (priv.key), then write it to ./sxg/hello.html.sxg. If hello.html had subresources that could be preloaded together, webpackager would also retrieve those resources and generate their signed exchanges under ./sxg. Web Packager recognizes <link rel="preload"> and equivalent Link HTTP headers. It also adds preload links for CSS (stylesheets) used in HTML, and may use more heuristics in the future. See the defaultproc package to find how exactly the HTTP response is processed.

--cert_url specifies where the client will expect to find the CBOR-format certificate chain. --cert_cbor is optional when it can be fetched from --cert_url. Note the reverse is not true: --cert_url is always required.

The --url flag can be repeated as many times as you want. For example:

webpackager \
    --cert_cbor=cert.cbor \
    --private_key=priv.key \
    --cert_url=https://example.com/cert.cbor \
    --url=https://example.com/foo/ \
    --url=https://example.com/bar/ \
    --url=https://example.com/baz/

would generate the following three files:

  • ./sxg/foo/index.html.sxg for https://example.com/foo/
  • ./sxg/bar/index.html.sxg for https://example.com/bar/
  • ./sxg/baz/index.html.sxg for https://example.com/baz/

Note: webpackager expects all target URLs to have the same origin. In particular, the output files collide if you specify more than one URL that has the same path but a different domain.
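
The mapping from target URLs to output files can be pictured with a small sketch. This is not Web Packager's own code: sxgPath is a hypothetical helper, and the rule "a path ending in / maps to index.html" is inferred only from the three examples above. Because the host is ignored, the sketch also shows why two URLs with the same path but different domains collide.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// sxgPath is a hypothetical helper (not part of Web Packager). It maps a
// target URL to an output file under dir, following the examples above.
func sxgPath(dir, rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	p := u.Path
	if p == "" {
		p = "/"
	}
	// Assumption: a directory URL is stored as its index.html.
	if strings.HasSuffix(p, "/") {
		p += "index.html"
	}
	// Only the path is used; the host does not appear in the result.
	return path.Join(dir, p) + ".sxg", nil
}

func main() {
	for _, raw := range []string{
		"https://example.com/foo/",
		"https://example.com/hello.html",
		"https://example.org/foo/", // same path, different domain: collides
	} {
		out, err := sxgPath("./sxg", raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(raw, "->", out)
	}
}
```

The first and third URLs map to the same file, which is the collision the note above warns about.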

Using URL File

webpackager also accepts --url_file=FILE. FILE is a plain text file with one URL on each line. For example, you could create urls.txt with:

# This is a comment.
https://example.com/foo/
https://example.com/bar/
https://example.com/baz/

then run:

webpackager \
    --cert_cbor=cert.cbor \
    --private_key=priv.key \
    --cert_url=https://example.com/cert.cbor \
    --url_file=urls.txt

Changing Output Directory

You can change the output directory with the --sxg_dir flag:

webpackager \
    --cert_cbor=cert.cbor \
    --private_key=priv.key \
    --cert_url=https://example.com/cert.cbor \
    --sxg_dir=/tmp/sxg \
    --url=https://example.com/hello.html

Setting Expiration

The signed exchanges last one hour by default. You can change the duration with the --expiry flag. For example:

webpackager \
    --cert_cbor=cert.cbor \
    --private_key=priv.key \
    --cert_url=https://example.com/cert.cbor \
    --expiry=72h \
    --url=https://example.com/hello.html

would make the signed exchanges valid for 72 hours (3 days). The maximum is 168h (7 days), due to the specification.

Other Flags

webpackager provides more flags for advanced usage (e.g. to set request headers). Run the tool with --help to see those flags.

Appendix: Deploying SXGs

The steps below illustrate an example of deploying Signed HTTP Exchanges on an Apache server.

  1. Upload cert.cbor to your server. Make it available at --cert_url.

  2. Upload *.sxg files to your server. Put them next to the original files (e.g. hello.html.sxg should stay in the same directory as hello.html). For example, if you are using the sftp command to upload, you can:

    sftp> cd public_html
    sftp> put -r sxg/*
    

    assuming public_html to be the document root and sxg to be where you generated the *.sxg files.

  3. Edit or create .htaccess in public_html (or Apache's config file) to add the following settings:

    AddType application/signed-exchange;v=b3 .sxg
    
    <Files "cert.cbor">
      AddType application/cert-chain+cbor .cbor
    </Files>
    
    RewriteEngine On
    RewriteCond %{HTTP:Accept} application/signed-exchange
    RewriteCond %{REQUEST_FILENAME} !\.sxg$
    RewriteCond %{REQUEST_FILENAME}\.sxg -s
    RewriteRule .+ %{REQUEST_URI}.sxg [L]
    
    Header set X-Content-Type-Options: "nosniff"
    

Documentation

Overview

Package webpackager implements the control flow of Web Packager.

The code below illustrates the usage of this package:

packager := webpackager.NewPackager(webpackager.Config{
	ExchangeFactory: &exchange.Factory{
		Version:      version.Version1b3,
		MIRecordSize: 4096,
		// ... (you need to set other fields)
	},
	ResourceCache: filewrite.NewFileWriteCache(&filewrite.Config{
		BaseCache: cache.NewOnMemoryCache(),
		ExchangeMapping: filewrite.AddBaseDir(
			filewrite.AppendExt(filewrite.UsePhysicalURLPath(), *flagSXGExt), *flagSXGDir),
	}),
})
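for _, url := range urls {
	// Run takes the target URL and the signing date (sxgDate) and returns
	// the produced resource along with an error. Using the current time
	// as sxgDate is an assumption made for this illustration.
	if _, err := packager.Run(url, time.Now()); err != nil {
		log.Print(err)
	}
}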

Config allows you to change some behaviors of the Packager. packager.Run retrieves an HTTP response using FetchClient, processes it using Processor, and turns it into a signed exchange using ExchangeFactory. Processor inspects the HTTP response to check its eligibility for signed exchanges and manipulates it to optimize page loading. The generated signed exchanges are stored in ResourceCache to prevent duplicates.

The code above sets just two parameters, ExchangeFactory and ResourceCache, and uses the defaults for other parameters. With this setup, the packager retrieves the content just through HTTP, applies the recommended set of optimizations, generates signed exchanges compliant with version b3, and saves them in files named like "index.html.sxg" under "/tmp/sxg" (assuming *flagSXGExt is ".sxg" and *flagSXGDir is "/tmp/sxg").

Config has a few more parameters. See its definition for the details.

You can also pass your own implementations to Config to inject custom logic into the packaging flow. You could write, for example, a custom FetchClient to retrieve the content from a database table instead of web servers, a custom Processor or HTMLTask to apply your optimization techniques, a ResourceCache to store the produced signed exchanges into another database table in addition to a local drive, and so on.

The cmd/webpackager package provides a command line interface to execute the packaging flow without writing the driver code on your own.

Functions

func WrapError

func WrapError(err error, url *url.URL) error

WrapError wraps err into an Error. url is the URL which err was raised for.

Types

type Config

type Config struct {
	// RequestTweaker specifies the mutation applied to every http.Request
	// before it is passed to FetchClient. RequestTweaker is applied
	// both to the http.Request instances passed to the Packager and those
	// generated internally (e.g. for subresources). Note that, however,
	// some RequestTweakers have effect only to subresource requests.
	//
	// nil implies fetch.DefaultRequestTweaker.
	RequestTweaker fetch.RequestTweaker

	// FetchClient specifies how to retrieve the resources which Packager
	// produces the signed exchanges for.
	//
	// nil implies fetch.DefaultFetchClient, which is just an http.Client
	// properly configured.
	FetchClient fetch.FetchClient

	// PhysicalURLRule specifies the rule(s) to simulate the URL rewriting
	// on the server side, such as appending "index.html" to the path when
	// it points to a directory.
	//
	// nil implies urlrewrite.DefaultRules, which contains a reasonable set
	// of rules to simulate static web servers.
	//
	// See package urlrewrite for details.
	PhysicalURLRule urlrewrite.Rule

	// ValidityURLRule specifies the rule to determine the validity URL,
	// where the validity data should be served.
	//
	// nil implies validity.DefaultURLRule, which appends ".validity" plus
	// the last modified time (in UNIX time) to the document URL.
	ValidityURLRule validity.URLRule

	// Processor specifies the processor(s) applied to each HTTP response
	// before turning it into a signed exchange. The processors make sure
	// the response can be distributed as signed exchanges and optionally
	// adjust the response for optimized page loading.
	//
	// nil implies complexproc.DefaultProcessor, a composite of relatively
	// conservative processors.
	//
	// See package processor for details.
	Processor processor.Processor

	// ValidPeriodRule specifies the rule to determine the validity period
	// of signed exchanges.
	//
	// nil implies vprule.DefaultRule.
	ValidPeriodRule vprule.Rule

	// ExchangeFactory specifies encoding parameters and signing materials
	// for producing signed exchanges. If you use the same certificate and
	// private key for the whole lifetime of the Packager, you can specify
	// an *exchange.Factory directly.
	//
	// ExchangeFactory must be set to non-nil.
	ExchangeFactory exchange.FactoryProvider

	// ResourceCache specifies the cache to store the signed exchanges and
	// the validity data.
	//
	// It is typically initialized with filewrite.NewFileWriteCache so the
	// signed exchanges are saved into files. See the package document for
	// sample usage.
	//
	// nil implies cache.NewOnMemoryCache(). It is not likely useful:
	// the process would produce signed exchanges and store them in memory,
	// then throw them away at the termination.
	ResourceCache cache.ResourceCache
}

Config defines injection points to Packager.

type Error

type Error struct {
	// Err represents the actual error.
	Err error
	// URL represents the URL that caused this Error.
	URL *url.URL
}

Error represents an error from Packager.Run.

func (*Error) Error

func (e *Error) Error() string

Error implements the error interface.

func (*Error) Unwrap

func (e *Error) Unwrap() error

Unwrap returns the wrapped error.

type Packager

type Packager struct {
	Config
}

Packager implements the control flow of Web Packager.

func NewPackager

func NewPackager(config Config) *Packager

NewPackager creates and initializes a new Packager with the provided Config. It panics when config.ExchangeFactory is nil.

func (*Packager) Run

func (pkg *Packager) Run(url *url.URL, sxgDate time.Time) (*resource.Resource, error)

Run runs the process to obtain the signed exchange for url: fetches the content from the server, processes it, and produces the signed exchange from it. Run also takes care of subresources (external resources referenced from the fetched content), such as stylesheets, provided they are good for preloading.

The process stops when it encounters any error with processing the main resource (specified by url), but keeps running and produces the signed exchange for the main resource if it just fails with the subresources. In either case, Run returns a multierror.Error (hashicorp/go-multierror), which consists of webpackager.Errors.

Run does not run the process when ResourceCache already has an entry for url.

func (*Packager) RunForRequest

func (pkg *Packager) RunForRequest(req *http.Request, sxgDate time.Time) (*resource.Resource, error)

RunForRequest is like Run, but takes an http.Request instead of a URL and thus provides more flexibility to the caller.

RunForRequest uses req directly: RequestTweaker mutates req; FetchClient sends req to retrieve the HTTP response.

Directories

Path                        Synopsis
certchain                   Package certchain handles signed exchange certificates.
  certchainutil             Package certchainutil complements the certchain package.
  certmanager               Package certmanager manages signed exchange certificates.
  certmanager/acmeclient    Package acmeclient provides a RawChainSource to acquire a signed exchange certificate using the ACME protocol.
  certmanager/futureevent   Package futureevent defines interface to handle future events.
cmd
  webpackager               webpackager is a command to "package" websites in accordance with https://github.com/WICG/webpackage/.
  webpkgserver              webpkgserver is a command to run Web Packager HTTP Server.
exchange                    Package exchange provides high-level interface to generate signed exchanges.
  exchangetest              Package exchangetest provides utilities for exchange testing.
  vprule                    Package vprule defines how to determine the validity period of signed exchanges.
fetch                       Package fetch defines interface to retrieve contents to package.
  fetchtest                 Package fetchtest provides FetchClient implementations for use in testing.
internal
  certchaintest             Package certchaintest provides utilities for certificate chain testing.
  customflag                Package customflag provides additional flag.Value implementations.
  timeutil                  Package timeutil provides time.Now that can be monkey-patched.
  urlutil                   Package urlutil provides URL-related utility functions.
processor                   Package processor defines the Processor interface.
  commonproc                Package commonproc implements processors applicable to any HTTP responses.
  complexproc               Package complexproc provides a factory of fully-featured processors, namely NewComprehensiveProcessor.
  htmlproc                  Package htmlproc implements a Processor to process HTML documents.
  htmlproc/htmldoc          Package htmldoc provides interface to handle HTML documents.
  htmlproc/htmltask         Package htmltask implements some optimization logic for HTML documents.
  preverify                 Package preverify implements processors to verify that HTTP responses can be distributed as signed exchanges.
resource                    Package resource defines representations of resources to generate signed exchanges for.
  cache                     Package cache defines the ResourceCache interface and provides the most basic implementation.
  cache/filewrite           Package filewrite provides ResourceCache that also saves signed exchanges to files on the Store operations to the cache.
  httplink                  Package httplink defines a representation of Web Linkings.
  preload                   Package preload defines representations of preload links.
  preload/preloadtest       Package preloadtest provides utilities for preload link testing.
server                      Package server implements Web Packager HTTP Server (webpkgserver).
  tomlconfig                Package tomlconfig defines the TOML config for Web Packager HTTP Server.
urlmatcher                  Package urlmatcher defines interface for URL matching.
urlrewrite                  Package urlrewrite reproduces server-side URL rewrite logic.
validity                    Package validity handles the validity data of signed exchanges.
