util

package module

v1.0.0 Latest Latest Go to latest Published: Jan 2, 2024 License: GPL-3.0 Imports: 16 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/1f604/base53a

Links

Open Source Insights

README ¶

base53a - a visually unambiguous alphabet for URLs

How to use

Python

How to actually use it

$ cd base53a/
$ ls
LICENSE  python  README.md
$ cd python
$ python
>>> from base53a import Base53ID, b53_generate_random_Base53ID, b53_generate_next_Base53ID
>>> b53_generate_random_Base53ID(5)
Base53ID(string_without_checksum='4gG6g', checksum_char='F')
>>> b53_generate_next_Base53ID(Base53ID(string_without_checksum='4gG6g', checksum_char='F'))
Base53ID(string_without_checksum='4gG6h', checksum_char='8')

How to run the tests

$ cd base53a/
$ ls
LICENSE  python  README.md
$ cd python
$ python -m unittest 
..........
----------------------------------------------------------------------
Ran 10 tests in 3.757s

OK

How to type check

$ cd base53a
$ python3 -m venv venv
$ . venv/bin/activate
(venv) $ pip install mypy
(venv) $ mypy .

Background

I was creating a URL shortener and of course, short URLs are meant to be read and typed by humans. This creates a problem when characters are visually similar, such as O and 0.

Although this alphabet was designed for URLs, it can be used for other purposes.

For more background details see: https://1f604.blogspot.com/2023/03/introducing-base52a-visually.html

Scheme

Dealing with visually ambiguous character groups:

0 and O => choose 0 as canonical
9 and g => choose g as canonical
1, I, and l => ban 1, I, and l
W and VV => ban both W and VV
w and vv => ban both w and vv
m, rn, and nn => ban m, rn, and nn
d and cl => ban d (l is already banned)

So the following characters are banned from the alphabet:

O => automatically remapped to 0
9 => automatically remapped to g
1 => Error: Invalid ID
I => Error: Invalid ID
l => Error: Invalid ID
W => Error: Invalid ID
w => Error: Invalid ID
m => Error: Invalid ID
n => Error: Invalid ID
c => Error: Invalid ID
d => Error: Invalid ID
VV => Error: Invalid ID
vv => Error: Invalid ID
rn => Error: Invalid ID
nn => Error: Invalid ID

So, 9 characters are banned and 4 pairings are banned. Out of the 62 characters therefore we only have 53 characters.

It is important to note the pairings of characters that are banned.

How the checksum works

The checksum is a generalization of the ISBN-10 checksum (one could perhaps call it the ISBN-n checksum). The scheme is as follows:

Choose any prime p. Our alphabet will consist of p symbols which map to the set of integers from 0 to p-1. For example, if we choose p to be 17, then our alphabet will be {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f, g} where 0 maps to 0 and g maps to 16.

The maximum valid length of a string (not including the checksum character) in this scheme shall be p-2.

Given a string of length n where n is less than or equal to p-2 consisting of symbols from the alphabet, we want to compute a single-character error-detection checksum for the string. Denote characters in the string as x_i such that x_1 is the first character of the string, x_2 is the second character and x_n is the last character (n is the length of the string). Denote the checksum character as c.

The checksum character c shall be computed as the sum of products of multipliers with the characters of the string mod p, as follows:

c = sum mod p

Where sum = (p-2) * x_1 + (p-3) * x_2 + ... (p-1-n) * x_n

Where:

n is the length of the string
n is less than or equal to p-2
the multipliers are (p-2), (p-3), ...

As c is a number modulo p, its minimum value is 0 and maximum value is p-1, which means it can be represented using the alphabet.

For example, for p=17 the checksum will be computed as:

c = 15 * x_1 + 14 * x_2 + ... + (p-1-n) * x_n mod 17

Where n is less than or equal to 15.

The verification of a string will simply consist of recomputing the checksum of the string without the checksum character (the checksum character will be the last character in the string because we append the checksum character to the end of the string) and seeing if it matches the checksum character.

We want to be able to detect 2 types of errors:

Single character typo, where a single character is changed
Adjacent transposition, where two adjacent characters are swapped resulting in a different string.

The checksum character will be appended to the end of the string, therefore an adjacent transposition could swap the checksum character with the character next to it. We want to be able to detect this.

Here is my proof:

Lemma: A change in the string will go undetected iff the original and new strings have the same checksum, which means that the sums are equal mod p.

Therefore: A change will only go undetected iff the sum changes by a multiple of p.

Property 1. This checksum detects all single-character changes

Proof:

Since none of the multipliers are a multiple of p (they are all smaller than p) this means that in order for a single character change to produce a change by a multiple of p, it has to be a character that changes by a multiple of p. But as the characters are in the range 0 to p-1, it is impossible for them to change by a multiple of p (except for 0 * p, which isn't a change).

Therefore, it is impossible for a single-character change to go undetected.

Property 2. This checksum detects all adjacent transpositions in the string excluding the checksum character

Proof:

Transposition of two adjacent characters will go undetected iff the resulting sum is equal to the original sum plus a multiple of p.

Let r be the multiplier of x, x and y be the adjacent characters (where x precedes y in the string), and k be any integer.

The other characters in the string remain in their positions unchanged, therefore their contribution to the sum remains unchanged therefore the sum only changes if the sum of the products involving the two transposed characters changes.

In order for the change to go undetected the sum of the products involving the two transposed characters must remain unchanged mod p:

r * x + (r-1) * y = r * y + (r-1) * x + p * k where k is any integer

rx + ry - y = ry + rx - x + pk

rx - rx - y = ry - ry - x + pk

-y = -x + pk

x = y + pk

Thus, the only time when a transposition results in an unchanged checksum is when x is equal to y plus a multiple of p. This is impossible if k is nonzero because the characters are in the range 0 to p-1. But if k is 0 then x equals y and there is no change in the string.

Therefore the code detects all adjacent transpositions excluding the checksum character.

Property 3. This checksum detects the transposition between the checksum character and the last character

Denote the checksum character as c.

Denote the length of the string as n where n is less than or equal to p-2.

Denote the last character (not the checksum) as x_n.

Denote the sum not including the last character (p-2) * x_1 + (p-3) * x_2 + ... + x_{n-1} as s.

Denote the last multiplier (the multiplier for x_n) as r.

Then the following equation holds:

s + r * x_n = c + p * k where k is some integer

When the last character x_n is transposed with c, the string will be valid iff:

s + r * c = x_n + p * k where k is some integer. Otherwise the string would be detected as invalid.

Therefore in order for the transposition to go undetected, the following simultaneous equations must hold:

s + r * x_n = c + p * u where u is some integer

s + r * c = x_n + p * v where v is some integer.

Now we can subtract them like this:

s - s + r * x_n - r * c = c - x_n + p(u-v)

r * x_n - r * c - c + x_n = p(u-v)

(r + 1) * x_n - (r + 1) * c = p(u-v)

(r + 1) * (x_n - c) = p(u-v)

Now, p(u-v) is a multiple of p which means (r + 1) * (x_n - c) must be a multiple of p. This means that either r+1 is a multiple of p or (x_n - c) is a multiple of p or both.

r cannot be zero because we specified that the length of the string is less than or equal to p-2 which means r cannot be less than 1. The maximum possible value of r is p-2, which means r+1 cannot be a multiple of p.

Since r+1 cannot be a multiple of p, that means x_n - c must be a multiple of p. But the characters go from 0 to p-1, so the difference between two characters must be less than p, which means that if x_n - c is a multiple of p then x_n - c = 0, which means they're the same and therefore the string is unchanged.

Therefore the last transposition cannot go undetected.

Documentation ¶

Overview ¶

We use byte instead of rune because this alphabet contains only printable ASCII characters.

Index ¶

Constants
func BuildStruct[T any]() *T
func Check_err(err error)
func Convert_str_to_uint64(input_str string) uint64
func Convert_uint64_to_str(bigendian_uint64 uint64, length int) string
func Copy_Slice_Into_150_Arr(slice []byte, arr [150]byte)
func Crypto_RandString(length int) string
func Crypto_Randint(max int) (int, error)
func Crypto_Random_Choice[T any](arr *[]T) (T, error)
func Divmod(numerator, denominator int) (int, int)
func Get_file_size(f *os.File) int64
func Int64_to_string(num int64) string
func IsSameType(a, b interface{}) bool
func Power_Naive(a, b int) int
func Power_Slow(a, b, m int) int
func PrintMemUsage()
func ReplaceString(str string, replacement rune, index int) string
func Retryfunc(taskname string, dotask retrylib_task, expected_duration time.Duration, ...)
func Retryproc(procname string, expected_duration time.Duration, max_wait time.Duration)
func ReverseString(s string) string
func RunFuncEveryXSeconds(fn fn_type, run_interval_seconds int)
func SetLogFile(filename string) *os.File
func String_to_int64(s string) (int64, error)
func Use(vals ...interface{})
func Validate_Timestamp_Common(timestamp_unix int64) error
type Base53ErrorChecksumMismatch
- func (e Base53ErrorChecksumMismatch) Error() string
type Base53ErrorIllegalCharacter
- func (e Base53ErrorIllegalCharacter) Error() string
type Base53ErrorIllegalPair
- func (e Base53ErrorIllegalPair) Error() string
type Base53ErrorStrWithoutCsumTooLong
- func (e Base53ErrorStrWithoutCsumTooLong) Error() string
type Base53ErrorStrWithoutCsumTooShort
- func (e Base53ErrorStrWithoutCsumTooShort) Error() string
type Base53ID
type Base53IDManager
- func NewBase53IDManager() *Base53IDManager
- func (b53m *Base53IDManager) B53_generate_all_Base53IDs(n int) ([]Base53ID, error)
- func (b53m *Base53IDManager) B53_generate_all_Base53IDs_int64(n int) ([]uint64, error)
- func (b53m *Base53IDManager) B53_generate_all_Base53IDs_int64_optimized(n int, should_be_added_fn ShouldBase53IDBePlacedIntoSliceFn) ([]uint64, error)
- func (b53m *Base53IDManager) B53_generate_all_Base53IDs_int64_test(n int) ([]uint64, error)
- func (b53m *Base53IDManager) B53_generate_next_Base53ID(old_id Base53ID) (Base53ID, error)
- func (b53m *Base53IDManager) B53_generate_random_Base53ID(n int) (Base53ID, error)
- func (b53m *Base53IDManager) Convert_uint64_to_Base53ID(bigendian_uint64 uint64, length int) (*_base53ID_impl, error)
- func (b53m *Base53IDManager) Convert_uint64_to_byte_array(bigendian_uint64 uint64) []byte
- func (b53m *Base53IDManager) NewBase53ID(str_without_csum string, csum byte, remap bool) (*_base53ID_impl, error)
type CryptoRandomChoiceEmptySliceError
- func (e CryptoRandomChoiceEmptySliceError) Error() string
type NewBase53IDParams
type ShouldBase53IDBePlacedIntoSliceFn
type ValidationResult

Constants ¶

View Source

const BASE53_ALPHABET_SIZE = 53

Variables ¶

This section is empty.

Functions ¶

func BuildStruct ¶

func BuildStruct[T any]() *T

func Check_err ¶

func Check_err(err error)

func Convert_str_to_uint64 ¶

func Convert_str_to_uint64(input_str string) uint64

func Convert_uint64_to_str ¶

func Convert_uint64_to_str(bigendian_uint64 uint64, length int) string

func Copy_Slice_Into_150_Arr ¶

func Copy_Slice_Into_150_Arr(slice []byte, arr [150]byte)

func Crypto_RandString ¶

func Crypto_RandString(length int) string

Returns random string consisting of letters and numbers

func Crypto_Randint ¶

func Crypto_Randint(max int) (int, error)

This function works, I've manually tested it. Returns integers from 0 up to AND NOT INCLUDING max

func Crypto_Random_Choice ¶

func Crypto_Random_Choice[T any](arr *[]T) (T, error)

This function works, I've manually tested it.

func Divmod ¶

func Divmod(numerator, denominator int) (int, int)

func Get_file_size ¶

func Get_file_size(f *os.File) int64

this function assumes file pointer is valid. We could probably make this more efficient by calculating the file size in-process instead of making syscall each time.

func Int64_to_string ¶

func Int64_to_string(num int64) string

func IsSameType ¶

func IsSameType(a, b interface{}) bool

This function works, I've manually tested it.

func Power_Naive ¶

func Power_Naive(a, b int) int

Naive algorithm, only suitable for small b.

func Power_Slow ¶

func Power_Slow(a, b, m int) int

calculates a to the power of b mod m. If m is 0 then just returns a to the power of b. This function seems to create a memory leak, but it doesn't. Anyway, it's better to use custom power

func PrintMemUsage ¶

func PrintMemUsage()

PrintMemUsage outputs the current, total and OS memory being used. As well as the number of garage collection cycles completed.

func ReplaceString ¶

func ReplaceString(str string, replacement rune, index int) string

func Retryfunc ¶

func Retryfunc(taskname string, dotask retrylib_task, expected_duration time.Duration, max_wait time.Duration)

func Retryproc ¶

func Retryproc(procname string, expected_duration time.Duration, max_wait time.Duration)

func ReverseString ¶

func ReverseString(s string) string

func RunFuncEveryXSeconds ¶

func RunFuncEveryXSeconds(fn fn_type, run_interval_seconds int)

Does not tick shift - will run function precisely every X seconds even if function takes some time to run - as long as the function doesn't take too long of course.

Synchronous - next call cannot start until previous call has finished.

func SetLogFile ¶

func SetLogFile(filename string) *os.File

https://stackoverflow.com/questions/19965795/how-to-write-log-to-file

func String_to_int64 ¶

func String_to_int64(s string) (int64, error)

func Use ¶

func Use(vals ...interface{})

https://stackoverflow.com/questions/21743841/how-to-avoid-annoying-error-declared-and-not-used

func Validate_Timestamp_Common ¶

func Validate_Timestamp_Common(timestamp_unix int64) error

Returns error if unix timestamp is before 2023 or after the year 20,000

Otherwise returns nil

Types ¶

type Base53ErrorChecksumMismatch ¶

type Base53ErrorChecksumMismatch struct{}

func (Base53ErrorChecksumMismatch) Error ¶

func (e Base53ErrorChecksumMismatch) Error() string

type Base53ErrorIllegalCharacter ¶

type Base53ErrorIllegalCharacter struct{}

func (Base53ErrorIllegalCharacter) Error ¶

func (e Base53ErrorIllegalCharacter) Error() string

type Base53ErrorIllegalPair ¶

type Base53ErrorIllegalPair struct{}

func (Base53ErrorIllegalPair) Error ¶

func (e Base53ErrorIllegalPair) Error() string

type Base53ErrorStrWithoutCsumTooLong ¶

type Base53ErrorStrWithoutCsumTooLong struct{}

func (Base53ErrorStrWithoutCsumTooLong) Error ¶

func (e Base53ErrorStrWithoutCsumTooLong) Error() string

type Base53ErrorStrWithoutCsumTooShort ¶

type Base53ErrorStrWithoutCsumTooShort struct{}

Custom error types

func (Base53ErrorStrWithoutCsumTooShort) Error ¶

func (e Base53ErrorStrWithoutCsumTooShort) Error() string

type Base53ID ¶

type Base53ID interface {
	GetStrWithoutCsum() string
	GetCsum() byte
	GetCombinedString() string
	Length() int
}

See https://stackoverflow.com/questions/57993809/how-to-hide-the-default-type-constructor-in-golang

type Base53IDManager ¶

type Base53IDManager struct {
	// contains filtered or unexported fields
}

func NewBase53IDManager ¶

func NewBase53IDManager() *Base53IDManager

pregenerate means strings up to n characters will be pre-generated and stored in RandomBags for fast PopRandom and Push later.

func (*Base53IDManager) B53_generate_all_Base53IDs ¶

func (b53m *Base53IDManager) B53_generate_all_Base53IDs(n int) ([]Base53ID, error)

Generate all IDs of length n

func (*Base53IDManager) B53_generate_all_Base53IDs_int64 ¶

func (b53m *Base53IDManager) B53_generate_all_Base53IDs_int64(n int) ([]uint64, error)

func (*Base53IDManager) B53_generate_all_Base53IDs_int64_optimized ¶

func (b53m *Base53IDManager) B53_generate_all_Base53IDs_int64_optimized(n int, should_be_added_fn ShouldBase53IDBePlacedIntoSliceFn) ([]uint64, error)

Doesn't push it into slice if it's already in map.

func (*Base53IDManager) B53_generate_all_Base53IDs_int64_test ¶

func (b53m *Base53IDManager) B53_generate_all_Base53IDs_int64_test(n int) ([]uint64, error)

func (*Base53IDManager) B53_generate_next_Base53ID ¶

func (b53m *Base53IDManager) B53_generate_next_Base53ID(old_id Base53ID) (Base53ID, error)

func (*Base53IDManager) B53_generate_random_Base53ID ¶

func (b53m *Base53IDManager) B53_generate_random_Base53ID(n int) (Base53ID, error)

func (*Base53IDManager) Convert_uint64_to_Base53ID ¶

func (b53m *Base53IDManager) Convert_uint64_to_Base53ID(bigendian_uint64 uint64, length int) (*_base53ID_impl, error)

func (*Base53IDManager) Convert_uint64_to_byte_array ¶

func (b53m *Base53IDManager) Convert_uint64_to_byte_array(bigendian_uint64 uint64) []byte

func (*Base53IDManager) NewBase53ID ¶

func (b53m *Base53IDManager) NewBase53ID(str_without_csum string, csum byte, remap bool) (*_base53ID_impl, error)

Construction is validation.

type CryptoRandomChoiceEmptySliceError ¶

type CryptoRandomChoiceEmptySliceError struct{}

Custom error types

func (CryptoRandomChoiceEmptySliceError) Error ¶

func (e CryptoRandomChoiceEmptySliceError) Error() string

type NewBase53IDParams ¶

type NewBase53IDParams struct {
	Str_without_csum string
	Csum             byte
	Remap            bool
}

inspired by StripeIntentParams

type ShouldBase53IDBePlacedIntoSliceFn ¶

type ShouldBase53IDBePlacedIntoSliceFn func(string) bool

type ValidationResult ¶

type ValidationResult struct {
	Success bool
	Message string
}

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL