util

package module
v1.0.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 2, 2024 License: GPL-3.0 Imports: 16 Imported by: 0

README

base53a - a visually unambiguous alphabet for URLs

How to use

Python
How to actually use it
$ cd base53a/
$ ls
LICENSE  python  README.md
$ cd python
$ python
>>> from base53a import Base53ID, b53_generate_random_Base53ID, b53_generate_next_Base53ID
>>> b53_generate_random_Base53ID(5)
Base53ID(string_without_checksum='4gG6g', checksum_char='F')
>>> b53_generate_next_Base53ID(Base53ID(string_without_checksum='4gG6g', checksum_char='F'))
Base53ID(string_without_checksum='4gG6h', checksum_char='8')
How to run the tests
$ cd base53a/
$ ls
LICENSE  python  README.md
$ cd python
$ python -m unittest 
..........
----------------------------------------------------------------------
Ran 10 tests in 3.757s

OK
How to type check
$ cd base53a
$ python3 -m venv venv
$ . venv/bin/activate
(venv) $ pip install mypy
(venv) $ mypy .

Background

I was creating a URL shortener and of course, short URLs are meant to be read and typed by humans. This creates a problem when characters are visually similar, such as O and 0.

Although this alphabet was designed for URLs, it can be used for other purposes.

For more background details see: https://1f604.blogspot.com/2023/03/introducing-base52a-visually.html

Scheme

Dealing with visually ambiguous character groups:

  • 0 and O => choose 0 as canonical
  • 9 and g => choose g as canonical
  • 1, I, and l => ban 1, I, and l
  • W and VV => ban both W and VV
  • w and vv => ban both w and vv
  • m, rn, and nn => ban m, rn, and nn
  • d and cl => ban d (l is already banned)

So the following characters are banned from the alphabet:

  • O => automatically remapped to 0
  • 9 => automatically remapped to g
  • 1 => Error: Invalid ID
  • I => Error: Invalid ID
  • l => Error: Invalid ID
  • W => Error: Invalid ID
  • w => Error: Invalid ID
  • m => Error: Invalid ID
  • n => Error: Invalid ID
  • c => Error: Invalid ID
  • d => Error: Invalid ID
  • VV => Error: Invalid ID
  • vv => Error: Invalid ID
  • rn => Error: Invalid ID
  • nn => Error: Invalid ID

So, 9 characters are banned and 4 pairings are banned. Out of the 62 characters therefore we only have 53 characters.

It is important to note the pairings of characters that are banned.

How the checksum works

The checksum is a generalization of the ISBN-10 checksum (one could perhaps call it the ISBN-n checksum). The scheme is as follows:

Choose any prime p. Our alphabet will consist of p symbols which map to the set of integers from 0 to p-1. For example, if we choose p to be 17, then our alphabet will be {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f, g} where 0 maps to 0 and g maps to 16.

The maximum valid length of a string (not including the checksum character) in this scheme shall be p-2.

Given a string of length n where n is less than or equal to p-2 consisting of symbols from the alphabet, we want to compute a single-character error-detection checksum for the string. Denote characters in the string as x_i such that x_1 is the first character of the string, x_2 is the second character and x_n is the last character (n is the length of the string). Denote the checksum character as c.

The checksum character c shall be computed as the sum of products of multipliers with the characters of the string mod p, as follows:

c = sum mod p

Where sum = (p-2) * x_1 + (p-3) * x_2 + ... (p-1-n) * x_n

Where:

n is the length of the string
n is less than or equal to p-2
the multipliers are (p-2), (p-3), ...

As c is a number modulo p, its minimum value is 0 and maximum value is p-1, which means it can be represented using the alphabet.

For example, for p=17 the checksum will be computed as:

c = 15 * x_1 + 14 * x_2 + ... + (p-1-n) * x_n mod 17

Where n is less than or equal to 15.

The verification of a string will simply consist of recomputing the checksum of the string without the checksum character (the checksum character will be the last character in the string because we append the checksum character to the end of the string) and seeing if it matches the checksum character.

We want to be able to detect 2 types of errors:

Single character typo, where a single character is changed
Adjacent transposition, where two adjacent characters are swapped resulting in a different string.

The checksum character will be appended to the end of the string, therefore an adjacent transposition could swap the checksum character with the character next to it. We want to be able to detect this.

Here is my proof:

Lemma: A change in the string will go undetected iff the original and new strings have the same checksum, which means that the sums are equal mod p.

Therefore: A change will only go undetected iff the sum changes by a multiple of p.

Property 1. This checksum detects all single-character changes

Proof:

Since none of the multipliers are a multiple of p (they are all smaller than p) this means that in order for a single character change to produce a change by a multiple of p, it has to be a character that changes by a multiple of p. But as the characters are in the range 0 to p-1, it is impossible for them to change by a multiple of p (except for 0 * p, which isn't a change).

Therefore, it is impossible for a single-character change to go undetected.

Property 2. This checksum detects all adjacent transpositions in the string excluding the checksum character

Proof:

Transposition of two adjacent characters will go undetected iff the resulting sum is equal to the original sum plus a multiple of p.

Let r be the multiplier of x, x and y be the adjacent characters (where x precedes y in the string), and k be any integer.

The other characters in the string remain in their positions unchanged, therefore their contribution to the sum remains unchanged therefore the sum only changes if the sum of the products involving the two transposed characters changes.

In order for the change to go undetected the sum of the products involving the two transposed characters must remain unchanged mod p:

r * x + (r-1) * y = r * y + (r-1) * x + p * k where k is any integer

rx + ry - y = ry + rx - x + pk

rx - rx - y = ry - ry - x + pk

-y = -x + pk

x = y + pk

Thus, the only time when a transposition results in an unchanged checksum is when x is equal to y plus a multiple of p. This is impossible if k is nonzero because the characters are in the range 0 to p-1. But if k is 0 then x equals y and there is no change in the string.

Therefore the code detects all adjacent transpositions excluding the checksum character.

Property 3. This checksum detects the transposition between the checksum character and the last character

Denote the checksum character as c.

Denote the length of the string as n where n is less than or equal to p-2.

Denote the last character (not the checksum) as x_n.

Denote the sum not including the last character (p-2) * x_1 + (p-3) * x_2 + ... + x_{n-1} as s.

Denote the last multiplier (the multiplier for x_n) as r.

Then the following equation holds:

s + r * x_n = c + p * k where k is some integer

When the last character x_n is transposed with c, the string will be valid iff:

s + r * c = x_n + p * k where k is some integer. Otherwise the string would be detected as invalid.

Therefore in order for the transposition to go undetected, the following simultaneous equations must hold:

s + r * x_n = c + p * u where u is some integer

s + r * c = x_n + p * v where v is some integer.

Now we can subtract them like this:

s - s + r * x_n - r * c = c - x_n + p(u-v)

r * x_n - r * c - c + x_n = p(u-v)

(r + 1) * x_n - (r + 1) * c = p(u-v)

(r + 1) * (x_n - c) = p(u-v)

Now, p(u-v) is a multiple of p which means (r + 1) * (x_n - c) must be a multiple of p. This means that either r+1 is a multiple of p or (x_n - c) is a multiple of p or both.

r cannot be zero because we specified that the length of the string is less than or equal to p-2 which means r cannot be less than 1. The maximum possible value of r is p-2, which means r+1 cannot be a multiple of p.

Since r+1 cannot be a multiple of p, that means x_n - c must be a multiple of p. But the characters go from 0 to p-1, so the difference between two characters must be less than p, which means that if x_n - c is a multiple of p then x_n - c = 0, which means they're the same and therefore the string is unchanged.

Therefore the last transposition cannot go undetected.

Documentation

Overview

We use byte instead of rune because this alphabet contains only printable ASCII characters.

Index

Constants

View Source
const BASE53_ALPHABET_SIZE = 53

Variables

This section is empty.

Functions

func BuildStruct

func BuildStruct[T any]() *T

func Check_err

func Check_err(err error)

func Convert_str_to_uint64

func Convert_str_to_uint64(input_str string) uint64

func Convert_uint64_to_str

func Convert_uint64_to_str(bigendian_uint64 uint64, length int) string

func Copy_Slice_Into_150_Arr

func Copy_Slice_Into_150_Arr(slice []byte, arr [150]byte)

func Crypto_RandString

func Crypto_RandString(length int) string

Returns random string consisting of letters and numbers

func Crypto_Randint

func Crypto_Randint(max int) (int, error)

This function works, I've manually tested it. Returns integers from 0 up to AND NOT INCLUDING max

func Crypto_Random_Choice

func Crypto_Random_Choice[T any](arr *[]T) (T, error)

This function works, I've manually tested it.

func Divmod

func Divmod(numerator, denominator int) (int, int)

func Get_file_size

func Get_file_size(f *os.File) int64

this function assumes file pointer is valid. We could probably make this more efficient by calculating the file size in-process instead of making syscall each time.

func Int64_to_string

func Int64_to_string(num int64) string

func IsSameType

func IsSameType(a, b interface{}) bool

This function works, I've manually tested it.

func Power_Naive

func Power_Naive(a, b int) int

Naive algorithm, only suitable for small b.

func Power_Slow

func Power_Slow(a, b, m int) int

calculates a to the power of b mod m. If m is 0 then just returns a to the power of b. This function seems to create a memory leak, but it doesn't. Anyway, it's better to use custom power

func PrintMemUsage

func PrintMemUsage()

PrintMemUsage outputs the current, total and OS memory being used. As well as the number of garage collection cycles completed.

func ReplaceString

func ReplaceString(str string, replacement rune, index int) string

func Retryfunc

func Retryfunc(taskname string, dotask retrylib_task, expected_duration time.Duration, max_wait time.Duration)

func Retryproc

func Retryproc(procname string, expected_duration time.Duration, max_wait time.Duration)

func ReverseString

func ReverseString(s string) string

func RunFuncEveryXSeconds

func RunFuncEveryXSeconds(fn fn_type, run_interval_seconds int)

Does not tick shift - will run function precisely every X seconds even if function takes some time to run - as long as the function doesn't take too long of course.

Synchronous - next call cannot start until previous call has finished.

func String_to_int64

func String_to_int64(s string) (int64, error)

func Validate_Timestamp_Common

func Validate_Timestamp_Common(timestamp_unix int64) error

Returns error if unix timestamp is before 2023 or after the year 20,000

Otherwise returns nil

Types

type Base53ErrorChecksumMismatch

type Base53ErrorChecksumMismatch struct{}

func (Base53ErrorChecksumMismatch) Error

type Base53ErrorIllegalCharacter

type Base53ErrorIllegalCharacter struct{}

func (Base53ErrorIllegalCharacter) Error

type Base53ErrorIllegalPair

type Base53ErrorIllegalPair struct{}

func (Base53ErrorIllegalPair) Error

func (e Base53ErrorIllegalPair) Error() string

type Base53ErrorStrWithoutCsumTooLong

type Base53ErrorStrWithoutCsumTooLong struct{}

func (Base53ErrorStrWithoutCsumTooLong) Error

type Base53ErrorStrWithoutCsumTooShort

type Base53ErrorStrWithoutCsumTooShort struct{}

Custom error types

func (Base53ErrorStrWithoutCsumTooShort) Error

type Base53ID

type Base53ID interface {
	GetStrWithoutCsum() string
	GetCsum() byte
	GetCombinedString() string
	Length() int
}

See https://stackoverflow.com/questions/57993809/how-to-hide-the-default-type-constructor-in-golang

type Base53IDManager

type Base53IDManager struct {
	// contains filtered or unexported fields
}

func NewBase53IDManager

func NewBase53IDManager() *Base53IDManager

pregenerate means strings up to n characters will be pre-generated and stored in RandomBags for fast PopRandom and Push later.

func (*Base53IDManager) B53_generate_all_Base53IDs

func (b53m *Base53IDManager) B53_generate_all_Base53IDs(n int) ([]Base53ID, error)

Generate all IDs of length n

func (*Base53IDManager) B53_generate_all_Base53IDs_int64

func (b53m *Base53IDManager) B53_generate_all_Base53IDs_int64(n int) ([]uint64, error)

func (*Base53IDManager) B53_generate_all_Base53IDs_int64_optimized

func (b53m *Base53IDManager) B53_generate_all_Base53IDs_int64_optimized(n int, should_be_added_fn ShouldBase53IDBePlacedIntoSliceFn) ([]uint64, error)

Doesn't push it into slice if it's already in map.

func (*Base53IDManager) B53_generate_all_Base53IDs_int64_test

func (b53m *Base53IDManager) B53_generate_all_Base53IDs_int64_test(n int) ([]uint64, error)

func (*Base53IDManager) B53_generate_next_Base53ID

func (b53m *Base53IDManager) B53_generate_next_Base53ID(old_id Base53ID) (Base53ID, error)

func (*Base53IDManager) B53_generate_random_Base53ID

func (b53m *Base53IDManager) B53_generate_random_Base53ID(n int) (Base53ID, error)

func (*Base53IDManager) Convert_uint64_to_Base53ID

func (b53m *Base53IDManager) Convert_uint64_to_Base53ID(bigendian_uint64 uint64, length int) (*_base53ID_impl, error)

func (*Base53IDManager) Convert_uint64_to_byte_array

func (b53m *Base53IDManager) Convert_uint64_to_byte_array(bigendian_uint64 uint64) []byte

func (*Base53IDManager) NewBase53ID

func (b53m *Base53IDManager) NewBase53ID(str_without_csum string, csum byte, remap bool) (*_base53ID_impl, error)

Construction is validation.

type CryptoRandomChoiceEmptySliceError

type CryptoRandomChoiceEmptySliceError struct{}

Custom error types

func (CryptoRandomChoiceEmptySliceError) Error

type NewBase53IDParams

type NewBase53IDParams struct {
	Str_without_csum string
	Csum             byte
	Remap            bool
}

inspired by StripeIntentParams

type ShouldBase53IDBePlacedIntoSliceFn

type ShouldBase53IDBePlacedIntoSliceFn func(string) bool

type ValidationResult

type ValidationResult struct {
	Success bool
	Message string
}

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL