hashedrpz

Version: v0.0.0-...-a387d25
Published: Sep 22, 2021 License: BSD-3-Clause Imports: 4 Imported by: 0

README

HashedRPZ by Jeroen Massar

This repository provides Golang and C code for the HashedRPZ implementation.

HashedRPZ provides a method for distributing RPZ and normal domain block lists without exposing their real contents to the world. It can be used, for instance, to block malicious and illegal domains without revealing the actual domains to anybody who is able to see the entries of the list.

HashedRPZ hashes domain names, thus making it hard to find out what the original domain is. It hashes per sub-domain/label, which enables inclusion in RPZ and allows wildcard matching.

HashedRPZ uses BLAKE3, which was selected because it is secure, fast, keyed, and usable on shorter strings.

Documentation

The documentation for this tool, extracted by the Go documentation tools, can be found on go.dev.

The C version is rather similar, but it shows that C lacks built-in string handling: one has to use good old memmove to prepend strings. The C edition is intended for integration with tools like unbound.

Usage

These hashed domains can be used in RPZ and also in plain blocklists, provided that the software that checks the RPZ or blocklist supports them.

The example hasher command can be used to take a list of domains on stdin and produce a hashed version on stdout with the provided key.

Example Code (Golang)

package main

import (
        "fmt"

        "github.com/massar/hashedrpz"
)

func main() {
        // The key string is composed of the in-band and out-of-band keys.
        h := hashedrpz.New("rpz.example.net: bCHn57T5 HHT6oM4e ... 34KycTqD")

        // The origin domain is only used to limit the length of the result.
        o, err := h.Hash("host.example.com", "rpz.example.net", hashedrpz.NoCallback)
        if err != nil {
                fmt.Printf("Hashing gave error %s\n", err)
                return
        }

        fmt.Printf("Hashed to:\n%s\n", o)
}

The C edition is similar, see hashedrpz_test.c for details.

Key selection & distribution

The blake3 key can be based on any string; please select a long and complex one.

The blake3 key is typically derived from a combination of two strings. One is optionally included in-band in the zone as a TXT record in _rpzhashkey.<domain>. The other is a configuration-time per-zone key that is out-of-band and thus not public.

Depending on one's level of paranoia, these keys can be as simple as the domain of the RPZ zone or as complex as a randomly generated 256-character string.
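For the complex end of that range, a random key string could be generated along the following lines. This is only a sketch: `randomKey` and the alphanumeric alphabet are assumptions made for this example, not something the package provides or requires.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// alphabet is an assumption for this sketch; any character set would do.
const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

// randomKey is a hypothetical helper that returns n characters drawn
// uniformly from alphabet using the operating system's secure random source.
func randomKey(n int) (string, error) {
	limit := big.NewInt(int64(len(alphabet)))
	buf := make([]byte, n)
	for i := range buf {
		idx, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		buf[i] = alphabet[idx.Int64()]
	}
	return string(buf), nil
}

func main() {
	key, err := randomKey(256)
	if err != nil {
		fmt.Println("could not generate key:", err)
		return
	}
	fmt.Println(key)
}
```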

The in-band key is rotated often, because an adversary could grab it. With frequent rotation, the key has already changed before an adversary can finish constructing a rainbow table, which makes that table useless.

The out-of-band key exists so that knowing the in-band key (which is included in the clear in the zone file) is not enough either, especially as it rotates.

Adversary Model

The adversary model is that if somebody wants to obtain the list, the best they can do is monitor DNS and log on the recursor which labels are being NXDOMAINed, while separately checking that those domains really exist. This means they have to wait for a hit to find a single entry, and they cannot retrieve the complete list in clear text.

Even with the current key, one could attempt to build a rainbow-style list and try to guess all the domains that are on it. That list would take quite a bit of time to materialize, as every candidate has to be hashed separately, and with a rotating key it becomes rather hard.

HashedRPZ Algorithm

The algorithm is relatively simple (the hard part lives in blake3):

  • Split the domain into its labels
  • If a label is a wildcard (*), keep it verbatim
  • Hash each sublabel with blake3, keyed with the key, as the complete domain up to that point
  • Output each hash as lowercase base32hex (RFC 4648)

e.g. www.example.com is actually hash(www.example.com) + '.' + hash(example.com) + '.' + hash(com)

Indeed, TLDs can thus be identified, but as there are comparatively 'few' TLDs and the most common one is '.com', this is not a huge worry.

Example

Given for instance the domains (and depending on the key):

www.example.net
one.example.com
two.example.com

Results:

9mgrvf8.qa4gjtuvuia82ubhh705n29hm0.0hjg4h0
fca618e.r939194s2f5m5rdougo4rvc0gg.u32p0s0
w21jice.r939194s2f5m5rdougo4rvc0gg.u32p0s0

The same domain level is thus encoded the same way (which also makes it obvious when TLDs are the same).

Short labels can thus be identified (one could guess that a short one is 'www'), as they produce shorter sub-hashes. But even then, one does not learn enough about the label to reverse it to the real domain.

Thanks

I'd like to thank the BLAKE3 team (Jack O'Connor, Samuel Neves, Jean-Philippe Aumasson, and Zooko) for handling the cryptography. I still have lots to learn there, thus I am not 'rolling my own crypto'. I recommend Serious Cryptography by Jean-Philippe Aumasson as a very good primer, background read, and reference on these subjects.

Thanks to Paul Vixie and Vernon Schryver for RPZ, and to the many implementors of RPZ for enabling the blocking of malicious domains and for the amazing work they have put into making the Internet a better place.

Last but not least, thanks to Peter van Dijk for many inputs and improvement suggestions.

Documentation

Constants

This section is empty.

Variables

var ErrEmptyLabel = errors.New("Empty Label provided (RPZ the root?)")

ErrEmptyLabel is returned when an attempt is made to encode an empty label.

var ErrEmptySublabel = errors.New("Empty Sub Label (eg. dom..example.com)")

ErrEmptySublabel is returned when a situation like "dom..example.com" is encountered

var ErrInvalidOriginDomain = errors.New("Invalid Origin Domain (empty/root/leading-dot)")

ErrInvalidOriginDomain is returned when the provided origin domain is empty, is the root (.), or has a leading dot.

var ErrTooLong = errors.New("Domain too long to hash")

ErrTooLong indicates that the input domain was too long when hashed. In that case, the final result returned by the Hash function contains only the hashed labels up to the point of the error. The caller can decide whether or not to wildcard the domain.

This is a very simple check that ensures the maximum domain length is never exceeded. One already has to subtract the $ORIGIN of the domain (e.g. ```.rpz.example.net```) that this RPZ ownername is part of. Thus typically, e.g. given ```.rpz.example.net``` of length 16, it already becomes 255-16-15 = 224; hence 200 is normally the suggested value, but one can determine this value accurately given the origin.

var ErrWildcardNotAtStart = errors.New("Wildcard (*) not at start of left hand side")

ErrWildcardNotAtStart is returned when there is a wildcard in the middle of the left hand side

Functions

This section is empty.

Types

type HashCallback

type HashCallback func(subdomain string, hash string)

HashCallback is called by Hash after each sublabel has been hashed, allowing a caller to inspect each part of the left hand side as it is hashed.

var NoCallback HashCallback = nil

NoCallback can be used to clearly show in the calling function that no callback is being used (as opposed to passing a bare 'nil' and having to check what that nil is for).
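A callback that needs its own state can be written as a closure. The sketch below redeclares the `HashCallback` type locally so that it compiles without the package, and `collector` is a hypothetical helper. The calls in `main` merely simulate what Hash is documented to do; the exact form of the `subdomain` argument is an assumption, and the hash values are illustrative ones taken from the README example.

```go
package main

import "fmt"

// HashCallback mirrors the documented type so this sketch is self-contained.
type HashCallback func(subdomain string, hash string)

// collector is a hypothetical helper: it returns a callback together with
// the map that the callback fills in, showing how a closure carries state.
func collector() (HashCallback, map[string]string) {
	seen := make(map[string]string)
	cb := func(subdomain string, hash string) {
		seen[subdomain] = hash
	}
	return cb, seen
}

func main() {
	cb, seen := collector()

	// Simulated invocations (assumption): the real Hash method would call
	// cb itself once per hashed sublabel.
	cb("com", "u32p0s0")
	cb("example.com", "r939194s2f5m5rdougo4rvc0gg")

	for sub, h := range seen {
		fmt.Printf("%s -> %s\n", sub, h)
	}
}
```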

type HashedRPZ

type HashedRPZ struct {
	sync.Mutex
	// contains filtered or unexported fields
}

HashedRPZ represents a hasher. It has a mutex to ensure only a single caller at a time.

func New

func New(key string) (h HashedRPZ)

New creates a new HashedRPZ, deriving the BLAKE3 key from the given string. The string should be composed of both an in-band and an out-of-band key.

func (*HashedRPZ) Hash

func (h *HashedRPZ) Hash(lefthandside string, origindomain string, callback HashCallback) (final string, err error)

Hash hashes the lefthandside that should be in domain format (thus ```host.example.org```) and returns the HashedRPZ hashed variant of that.

Lefthandside is allowed to be fully qualified (ending in a '.'), but the trailing dot will be ignored.

The origindomain (e.g. ```rpz.example.com```) is supplied to limit the length of the resulting ownername to ensure it does not exceed the full length of a domain name.

The origindomain is not used for hashing, only for limiting/detecting length issues.

A mutex ensures that only one hasher runs at the same time. Create multiple HashedRPZ instances, e.g. one per goroutine, for parallel operation.

The callback will be called for every hashed label, thus allowing the user to do intermediate lookups. One can use a function closure to pass parameters that the callback might need.

Will return ErrInvalidOriginDomain if the origin domain is empty, is the root, or starts with a '.'.

Will return ErrEmptyLabel if the label to hash is empty, this to avoid blocking the root of DNS.

Will return ErrWildcardNotAtStart when there is a wildcard not at the start of the left hand side.

Might return ErrTooLong (see description for details on how to handle it), thus do check for error returns.

Will return ErrEmptySublabel if an empty sublabel is found.

func (*HashedRPZ) HashWildcard

func (h *HashedRPZ) HashWildcard(lefthandside string, origindomain string, callback HashCallback) (final string, iswildcard bool, err error)

HashWildcard calls Hash(), but when the maximum domain length is exceeded, it encodes the remaining labels as a wildcard under the part of the domain that did fit.

Thus, for example, an input of ```host.v.e.r.y.l.o.n.g.example.com``` would encode as ```*.n.g.example.com``` (assuming the domain name were much longer than in this example; see the test cases for the real version).

Directories

Path Synopsis
cmd
