hibp

package module
v0.2.1 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 7, 2024 License: MIT Imports: 18 Imported by: 0

README

go-hibp-sync

go-hibp-sync provides functionality to keep a local copy of the HIBP leaked password database in sync with the upstream version at https://haveibeenpwned.com. In addition to syncing the "database", the library allows exporting it into a single list (the former distribution format of the database) and querying it for a given k-proximity range.

This local copy consists of one file per range/prefix, grouped into 256 directories named after the first 2 of the 5 prefix characters. As an uncompressed copy of the database currently requires around 40 GiB of disk space, a moderate level of zstd compression is applied, which cuts storage consumption by 50%. Compression can be disabled if its small computational overhead outweighs the advantage of requiring only half the space.
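The layout described above can be sketched as follows. The function name `rangePath` and the exact file naming (the last three prefix characters, without extension) are assumptions made for illustration; the library's actual on-disk names may differ.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// rangePath maps a 5-character hex prefix to a hypothetical location on disk:
// the first two characters select one of 256 directories, the remaining three
// name the file. The real file naming used by go-hibp-sync is not confirmed here.
func rangePath(dataDir, prefix string) (string, error) {
	if len(prefix) != 5 {
		return "", fmt.Errorf("prefix must have 5 characters, got %d", len(prefix))
	}
	return filepath.Join(dataDir, prefix[:2], prefix[2:]), nil
}

func main() {
	const totalRanges = 1 << 20 // 16^5 possible five-character hex prefixes
	const directories = 1 << 8  // 16^2 possible two-character directory names
	fmt.Println("ranges:", totalRanges)
	fmt.Println("directories:", directories)
	fmt.Println("files per directory:", totalRanges/directories)

	p, err := rangePath(".hibp-data", "ABCDE")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("path:", p)
}
```

With around 40 GiB spread over 1,048,576 ranges, an uncompressed range file averages roughly 40 KiB.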

To avoid unnecessary network transfers and to speed things up, go-hibp-sync additionally stores the ETag returned by the upstream CDN. Subsequent requests include it, which should allow for more frequent syncs that do not necessarily result in full re-downloads. Of course, this can be disabled too.
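The general mechanism behind this is an HTTP conditional request. The following is a minimal sketch using only the standard library, not the library's own code: `fetchRange` and `newFakeUpstream` are hypothetical names, and the fake server with its `"v1"` tag and sample line merely stands in for the upstream CDN.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchRange requests url and sends a previously stored ETag, if any.
// It reports whether the content changed and returns the ETag to store.
func fetchRange(client *http.Client, url, etag string) (body []byte, newETag string, changed bool, err error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, "", false, err
	}
	if etag != "" {
		req.Header.Set("If-None-Match", etag)
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, "", false, err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusNotModified:
		return nil, etag, false, nil // nothing to download
	case http.StatusOK:
		body, err = io.ReadAll(resp.Body)
		if err != nil {
			return nil, "", false, err
		}
		return body, resp.Header.Get("ETag"), true, nil
	default:
		return nil, "", false, fmt.Errorf("unexpected status %s", resp.Status)
	}
}

// newFakeUpstream starts a local test server that answers with 304 when the
// client already holds the current ETag. It is a stand-in for the real CDN.
func newFakeUpstream() *httptest.Server {
	const tag = `"v1"`
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("If-None-Match") == tag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		w.Header().Set("ETag", tag)
		io.WriteString(w, "0018A45C4D1DEF81644B54AB7F969B88D65:10\r\n")
	}))
}

func main() {
	srv := newFakeUpstream()
	defer srv.Close()

	body, etag, changed, err := fetchRange(srv.Client(), srv.URL, "")
	fmt.Println(len(body), etag, changed, err)

	body, etag, changed, err = fetchRange(srv.Client(), srv.URL, etag)
	fmt.Println(len(body), etag, changed, err)
}
```

The second request transfers no body because the server recognizes the stored ETag.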

The library can continue from where it left off; the sync command mentioned below demonstrates this.

API

The API is really simple; one type, holding three methods, is exported (and additionally, typed configuration options):

New(options ...CommonOption) *HIBP
HIBP#Sync(options ...SyncOption) error // Syncs the local copy with the upstream database
HIBP#Export(w io.Writer) error // Writes a continuous, decompressed and "free-of-etags" stream to the given io.Writer with the lines being prefixed by the k-proximity range
HIBP#Query("ABCDE") (io.ReadCloser, error) // Returns the k-proximity API result as the upstream API would (without the k-proximity range as prefix)

All of them operate on disk but, depending on the medium, should provide access times that are good enough for most scenarios. A memory-backed tmpfs will speed things up when necessary.
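A prefix such as "ABCDE" is the first five characters of a password's hash. As a small sketch, assuming the SHA-1 variant of the dataset and upper-case hex encoding as used by the upstream API, a password can be split into the prefix to query and the suffix to look for. The helper name `splitHash` is hypothetical.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"strings"
)

// splitHash returns the 5-character range prefix and the 35-character suffix
// of the upper-case hex SHA-1 digest of password.
func splitHash(password string) (prefix, suffix string) {
	sum := sha1.Sum([]byte(password))
	digest := strings.ToUpper(hex.EncodeToString(sum[:]))
	return digest[:5], digest[5:]
}

func main() {
	prefix, suffix := splitHash("password")
	fmt.Println("query this prefix:", prefix)
	fmt.Println("look for this suffix:", suffix)
}
```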

Attention: The official API states the following regarding the format:

Each password is stored as both a SHA-1 and an NTLM hash of a UTF-8 encoded password. The downloadable source data delimits the hash and the password count with a colon (:) and each line with a CRLF.

The crucial part is that lines are terminated with \r\n. In order to be compatible with the upstream API, this library sticks to this format.
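When consuming such data, the line ending needs to be handled explicitly or by a reader that tolerates it. The sketch below uses `bufio.Scanner`, whose default line splitting drops a trailing `\r`. The names `lookupCount`, `sampleRange` and `sampleReader` are hypothetical, and the counts in the sample data are made up.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// sampleRange stands in for the content of one range; the counts are made up.
const sampleRange = "0018A45C4D1DEF81644B54AB7F969B88D65:10\r\n" +
	"1E4C9B93F3F0682250B6CF8331B7EE68FD8:42\r\n"

// sampleReader returns a fresh reader over the sample data.
func sampleReader() io.Reader { return strings.NewReader(sampleRange) }

// lookupCount scans "<suffix>:<count>" lines terminated by CRLF and returns
// the count stored for suffix, if present.
func lookupCount(r io.Reader, suffix string) (count int64, found bool, err error) {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		// bufio.ScanLines already strips "\r\n"; TrimRight guards other splitters.
		line := strings.TrimRight(sc.Text(), "\r")
		s, c, ok := strings.Cut(line, ":")
		if !ok || !strings.EqualFold(s, suffix) {
			continue
		}
		count, err = strconv.ParseInt(c, 10, 64)
		return count, err == nil, err
	}
	return 0, false, sc.Err()
}

func main() {
	count, found, err := lookupCount(sampleReader(), "1E4C9B93F3F0682250B6CF8331B7EE68FD8")
	fmt.Println(count, found, err)
}
```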

CLI

There are two basic CLI commands, sync and export, that can be used for manual tasks and serve as minimal examples of how to use the library. They are basic but should play well with other tooling. sync tracks its progress and is able to continue from where it last left off.

Run them with:

go run github.com/exaring/go-hibp-sync/cmd/sync
# and
go run github.com/exaring/go-hibp-sync/cmd/export

Documentation


Constants

const (
	DefaultDataDir       = "./.hibp-data"
	DefaultStateFileName = "state"
)

Variables

This section is empty.

Functions

This section is empty.

Types

type CommonOption added in v0.2.1

type CommonOption func(config *commonConfig)

func WithDataDir added in v0.2.1

func WithDataDir(dataDir string) CommonOption

WithDataDir sets the data directory for all operations. The directory will be created if it does not exist. Default: "./.hibp-data"

func WithoutCompression added in v0.2.1

func WithoutCompression() CommonOption

WithoutCompression disables compression when writing/reading the file-based database. When the local dataset exists already, this can only be used if the dataset has been created with the same setting. This seriously increases the amount of storage required. Default: false

type HIBP added in v0.2.1

type HIBP struct {
	// contains filtered or unexported fields
}

HIBP bundles the functionality of the HIBP package. In order to allow concurrent operations on the local, file-based dataset efficiently and safely, a shared set of locks is required - this gets managed by the HIBP type.

func New added in v0.2.1

func New(options ...CommonOption) *HIBP

func (*HIBP) Export added in v0.2.1

func (h *HIBP) Export(w io.Writer) error

Export writes the dataset to the given writer. The data is written as a continuous stream with no indication of the "prefix boundaries"; the format therefore differs from the official Have-I-Been-Pwned API and from `Query`, which mimics the API. Lines have the schema "<prefix><suffix>:<count>".
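The relation between the two line schemas can be illustrated with a short sketch that turns API-style lines of one range into export-style lines. This is not the library's implementation: `exportRange` and `exportString` are hypothetical helpers, and keeping CRLF line endings in the output is an assumption.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// exportRange copies one range to w, turning "<suffix>:<count>" lines into
// "<prefix><suffix>:<count>" lines. CRLF endings are kept (an assumption).
func exportRange(w io.Writer, prefix string, r io.Reader) error {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text() // the scanner has already removed "\r\n"
		if line == "" {
			continue
		}
		if _, err := fmt.Fprintf(w, "%s%s\r\n", prefix, line); err != nil {
			return err
		}
	}
	return sc.Err()
}

// exportString is a convenience wrapper around exportRange for strings.
func exportString(prefix, body string) (string, error) {
	var out strings.Builder
	err := exportRange(&out, prefix, strings.NewReader(body))
	return out.String(), err
}

func main() {
	// The count below is made up for illustration.
	out, err := exportString("5BAA6", "1E4C9B93F3F0682250B6CF8331B7EE68FD8:42\r\n")
	fmt.Printf("%q %v\n", out, err)
}
```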

func (*HIBP) Query added in v0.2.1

func (h *HIBP) Query(prefix string) (io.ReadCloser, error)

Query queries the local dataset for the given prefix. The function returns an io.ReadCloser that can be used to read the data. It is the responsibility of the caller to close the returned io.ReadCloser, and this should happen as soon as possible to release the read lock on the file. The resulting lines do NOT start with the prefix; they follow the schema "<suffix>:<count>". This is equivalent to the response of the official Have-I-Been-Pwned API.
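One way to keep the reader open only briefly is to copy the range into memory and close the reader before doing any further work. The sketch below assumes that pattern is acceptable for the caller; `readRange`, `trackedReader` and `fakeQuery` are hypothetical, and `fakeQuery` only stands in for a function with Query's signature.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// readRange calls query, copies the whole response into memory and closes the
// reader before returning, so a lock tied to the reader is held only briefly.
func readRange(query func(prefix string) (io.ReadCloser, error), prefix string) ([]byte, error) {
	rc, err := query(prefix)
	if err != nil {
		return nil, err
	}
	data, readErr := io.ReadAll(rc)
	closeErr := rc.Close() // close immediately, before any further processing
	if readErr != nil {
		return nil, readErr
	}
	return data, closeErr
}

// trackedReader records whether Close was called; it exists only for the demo.
type trackedReader struct {
	io.Reader
	closed *bool
}

func (t trackedReader) Close() error {
	*t.closed = true
	return nil
}

// fakeQuery returns a stand-in with the same signature as Query.
// The returned line is made-up sample data.
func fakeQuery(closed *bool) func(prefix string) (io.ReadCloser, error) {
	return func(prefix string) (io.ReadCloser, error) {
		r := strings.NewReader("0018A45C4D1DEF81644B54AB7F969B88D65:10\r\n")
		return trackedReader{Reader: r, closed: closed}, nil
	}
}

func main() {
	var closed bool
	data, err := readRange(fakeQuery(&closed), "ABCDE")
	fmt.Println(len(data), closed, err)
}
```

Given a *HIBP value h, the method value h.Query has the signature expected by the query parameter.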

func (*HIBP) Sync added in v0.2.1

func (h *HIBP) Sync(options ...SyncOption) error

Sync copies the ranges, i.e., the HIBP data, from the upstream API to the local storage. The function will start from the lowest prefix and continue until the highest prefix. See the set of SyncOption functions for customizing the behavior of the sync operation.

type ProgressFunc

type ProgressFunc func(lowest, current, to, processed, remaining int64) error

ProgressFunc represents a type of function that can be used to report progress of a sync operation. The parameters are as follows:
- lowest: The lowest prefix that has been processed so far (due to concurrent operations, there is a window of prefixes that are possibly being processed at the same time, "lowest" refers to the range with the lowest prefix).
- current: The current prefix that is being processed, i.e. for which the ProgressFunc gets invoked.
- to: The highest prefix that will be processed.
- processed: The number of prefixes that have been processed so far.
- remaining: The number of prefixes that are remaining to be processed.
The function should return an error if the operation should be aborted.
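A progress function along these lines might look as follows. The type is redeclared locally so the sketch compiles on its own. `newProgress`, its `shouldStop` and `report` callbacks, and the assumption that processed plus remaining equals the total number of prefixes are all illustrative. Prefixes are printed as five hex digits, matching the 0x00000 to 0xFFFFF range.

```go
package main

import (
	"errors"
	"fmt"
)

// ProgressFunc mirrors the documented signature so this sketch is self-contained.
type ProgressFunc func(lowest, current, to, processed, remaining int64) error

// newProgress returns a ProgressFunc that reports a one-line summary and
// aborts the operation by returning an error once shouldStop reports true.
func newProgress(shouldStop func() bool, report func(string)) ProgressFunc {
	return func(lowest, current, to, processed, remaining int64) error {
		if shouldStop() {
			return errors.New("sync aborted by caller")
		}
		pct := 0.0
		if total := processed + remaining; total > 0 { // assumed to be the total
			pct = 100 * float64(processed) / float64(total)
		}
		report(fmt.Sprintf("lowest=%05X current=%05X to=%05X %.1f%%", lowest, current, to, pct))
		return nil
	}
}

func main() {
	stop := false
	fn := newProgress(func() bool { return stop }, func(s string) { fmt.Println(s) })

	fmt.Println(fn(0x00010, 0x00012, 0xFFFFF, 1, 3))
	stop = true
	fmt.Println(fn(0x00010, 0x00013, 0xFFFFF, 2, 2))
}
```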

type SyncOption

type SyncOption func(config *syncConfig)

SyncOption represents a type of function that can be used to customize the behavior of the Sync function.

func SyncWithContext

func SyncWithContext(ctx context.Context) SyncOption

SyncWithContext sets the context for the sync operation.

func SyncWithEndpoint

func SyncWithEndpoint(endpoint string) SyncOption

SyncWithEndpoint sets a custom endpoint instead of the default Have-I-Been-Pwned API endpoint. Default: "https://api.pwnedpasswords.com/range/"

func SyncWithLastRange

func SyncWithLastRange(to int64) SyncOption

SyncWithLastRange sets the last range to be processed. Aside from tests, this is rarely useful. Default: 0xFFFFF

func SyncWithMinWorkers

func SyncWithMinWorkers(workers int) SyncOption

SyncWithMinWorkers sets the minimum number of worker goroutines that will be used to process the ranges. Default: 50
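For readers unfamiliar with the pattern, a generic worker pool over a range of prefixes can be sketched as below. This is a simplified illustration with a fixed number of workers and is not taken from the library, whose scheduling may differ; `processRanges` and `errAbort` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// errAbort is a sample error a handler may return to stop processing.
var errAbort = errors.New("aborted")

// processRanges calls handle for every prefix in [from, to] using a fixed
// number of worker goroutines and returns the first error encountered.
func processRanges(from, to int64, workers int, handle func(prefix string) error) error {
	if workers < 1 {
		workers = 1
	}
	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
		failed   atomic.Bool
	)
	jobs := make(chan int64)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs {
				if failed.Load() {
					continue // drain remaining jobs without doing work
				}
				if err := handle(fmt.Sprintf("%05X", n)); err != nil {
					once.Do(func() { firstErr = err })
					failed.Store(true)
				}
			}
		}()
	}
	for n := from; n <= to; n++ {
		if failed.Load() {
			break // stop handing out work after the first failure
		}
		jobs <- n
	}
	close(jobs)
	wg.Wait() // after this point, reading firstErr is safe
	return firstErr
}

func main() {
	var done atomic.Int64
	err := processRanges(0x00000, 0x000FF, 8, func(prefix string) error {
		done.Add(1)
		return nil
	})
	fmt.Println(done.Load(), err)
}
```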

func SyncWithProgressFn

func SyncWithProgressFn(progressFn ProgressFunc) SyncOption

SyncWithProgressFn sets a custom progress function that will be called regularly. The function should return an error if the operation should be aborted. Note, there is no guarantee that the function will be called for every prefix. Default: no-op function

func SyncWithStateFile

func SyncWithStateFile(stateFile io.ReadWriteSeeker) SyncOption

SyncWithStateFile sets the state file to be used for tracking progress. This can either be an os.File or any other implementation of io.ReadWriteSeeker. Seeking is only used to jump back to the start of the "virtual file". It should be easy enough to decorate a bytes.Buffer with the necessary methods to make it work. Default: nil; meaning no state will be tracked.
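An in-memory state "file" in that spirit could look like the sketch below. Instead of decorating a bytes.Buffer it uses a plain byte slice, which keeps re-reading after a seek simple. The type name `memState` is hypothetical, the bytes written in `main` are made up, and how exactly the library reads and writes the state is not assumed beyond seeking back to the start.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// memState is a minimal in-memory io.ReadWriteSeeker. Like a file that is not
// truncated, writes overwrite existing bytes from the current position.
type memState struct {
	data []byte
	pos  int
}

// Compile-time check that memState satisfies the required interface.
var _ io.ReadWriteSeeker = (*memState)(nil)

func (m *memState) Read(p []byte) (int, error) {
	if m.pos >= len(m.data) {
		return 0, io.EOF
	}
	n := copy(p, m.data[m.pos:])
	m.pos += n
	return n, nil
}

func (m *memState) Write(p []byte) (int, error) {
	end := m.pos + len(p)
	if end > len(m.data) {
		m.data = append(m.data, make([]byte, end-len(m.data))...)
	}
	copy(m.data[m.pos:end], p)
	m.pos = end
	return len(p), nil
}

// Seek only supports jumping back to the start of the "virtual file".
func (m *memState) Seek(offset int64, whence int) (int64, error) {
	if offset != 0 || whence != io.SeekStart {
		return 0, errors.New("memState: only Seek(0, io.SeekStart) is supported")
	}
	m.pos = 0
	return 0, nil
}

func main() {
	state := &memState{}
	fmt.Fprint(state, "0001F") // made-up state content
	state.Seek(0, io.SeekStart)
	content, err := io.ReadAll(state)
	fmt.Printf("%s %v\n", content, err)
}
```

A pointer to such a value can be passed wherever an io.ReadWriteSeeker is accepted.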

Directories

Path: Synopsis
cmd/export: Package main contains a small utility to export the HIBP data to stdout.
cmd/sync: Package main contains a small utility to sync the HIBP data to the default data directory or to the directory specified as the first argument.
