_generate

command
v0.0.0-...-86e9f11
Warning

This package is not in the latest version of its module.

Published: Jan 7, 2024 License: Apache-2.0 Imports: 15 Imported by: 0

Documentation

Overview

Alternative to-{upper,lower} approach
--------------------------------------------------

Overall, the design is straightforward:

1. We consider only characters in the range 0..0x1ffff, i.e. 17 bits.
2. We split the character code into two parts: the lower 8 bits (col) and the higher 9 bits (row).
3. We then look the character up as lookup[row][col], so there are two dereferences.
4. The lookup stores either the difference between character codes (2 bytes) or the pre-encoded UTF-8 character (4 bytes).
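Steps 1 and 2 can be sketched in Go as follows. This is a minimal illustration, assuming the code point arrives as a `rune`; the names `split` and `maxRune` are hypothetical and not taken from the package. For example, 'Ж' (U+0416) lands in row 4, col 0x16, and the largest accepted code point 0x1ffff lands in row 0x1ff, the last of the 512 rows that 9 bits allow.

```go
package main

import "fmt"

// maxRune is the largest code point the tables cover: 0x1ffff (17 bits).
// Hypothetical name, used only for this illustration.
const maxRune = 0x1ffff

// split breaks a code point into its 9-bit row (high bits) and its
// 8-bit col (low bits). ok is false for code points outside 0..0x1ffff,
// which the scheme leaves unchanged.
func split(r rune) (row, col uint32, ok bool) {
	if r < 0 || r > maxRune {
		return 0, 0, false
	}
	return uint32(r) >> 8, uint32(r) & 0xff, true
}

func main() {
	for _, r := range []rune{'A', 'Ж', 0x1e900, 0x20000} {
		row, col, ok := split(r)
		fmt.Printf("U+%04X row=%d col=%d ok=%v\n", r, row, col, ok)
	}
}
```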

The only trick is that the second-level table is compressed. Each entry of lookup[row] contains three values:

- the minimum col value
- the maximum col value
- the offset in the values table
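One way to build such an entry is to trim the no-change cells from both ends of a dense 256-cell row and append only the remaining span to the shared values table. The Go sketch below assumes differences are stored as `int16` (matching the 2-byte case) and that 0 means "no change"; `entry` and `compressRow` are hypothetical names, not the package's own.

```go
package main

import "fmt"

// entry describes one compressed second-level row: only cols lo..hi are
// stored, starting at values[offset].
type entry struct {
	lo, hi uint8
	offset uint32
}

// compressRow trims the leading and trailing zero (no-change) cells of a
// dense 256-cell row and appends the remaining span to values.
// A row with no non-zero cells gets lo > hi, so every lookup misses.
func compressRow(dense *[256]int16, values []int16) (entry, []int16) {
	lo, hi := -1, -1
	for col, v := range dense {
		if v != 0 {
			if lo < 0 {
				lo = col
			}
			hi = col
		}
	}
	if lo < 0 {
		return entry{lo: 1, hi: 0}, values // empty row: lo > hi
	}
	e := entry{lo: uint8(lo), hi: uint8(hi), offset: uint32(len(values))}
	return e, append(values, dense[lo:hi+1]...)
}

func main() {
	// Row 0 of a toy to-upper table: only 'a'..'z' change, by -32.
	var row0 [256]int16
	for c := 'a'; c <= 'z'; c++ {
		row0[c] = 'A' - 'a'
	}
	e, values := compressRow(&row0, nil)
	fmt.Println(e.lo, e.hi, e.offset, len(values)) // 97 122 0 26
}
```

In this toy case the row shrinks from 256 cells to the 26 cells between 'a' and 'z'. Cells inside the span that happen to be 0 are still stored, which is the price of keeping a single contiguous range per row.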

Thus, the real lookup looks like this:

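// row and col come from splitting the code point; noChange means
// the character maps to itself.
if row > maxRow {
    return noChange
}

entry := lookup[row]
if col >= entry.lo && col <= entry.hi {
    return values[col-entry.lo+entry.offset]
}
return noChange // col lies outside the stored lo..hi span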

For the detailed implementation, see the method `LookupDiff.translate` below.

Comparison with the current approach
--------------------------------------------------

The current approach stores only the difference between character codes. As a result, each character takes three steps: 1) decode UTF-8 to a rune; 2) adjust the rune; 3) encode the rune back to UTF-8.

The new approach lets us skip the last step, because the UTF-8 results are precomputed.
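The difference between the two pipelines can be illustrated with the Go sketch below. To keep the focus on the per-character steps, a plain map stands in for the two-level table, and the pre-encoded result is held in a `string` rather than a fixed 4-byte slot; both are simplifications, and the names `diffs`, `encoded`, `translateDiff`, and `translatePre` are hypothetical. The toy tables contain only 'é', so "café" becomes "cafÉ".

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Current approach: the table holds a code-point difference.
var diffs = map[rune]int32{'é': 'É' - 'é'}

// New approach: the table holds the already-encoded UTF-8 result.
var encoded = map[rune]string{'é': "É"}

// translateDiff: 1) UTF-8 -> rune, 2) adjust rune, 3) rune -> UTF-8.
func translateDiff(dst []byte, s string) []byte {
	for len(s) > 0 {
		r, n := utf8.DecodeRuneInString(s)
		if d, ok := diffs[r]; ok {
			dst = utf8.AppendRune(dst, r+d) // step 3: re-encode
		} else {
			dst = append(dst, s[:n]...) // unchanged bytes are copied
		}
		s = s[n:]
	}
	return dst
}

// translatePre copies the stored bytes as-is, so step 3 disappears.
func translatePre(dst []byte, s string) []byte {
	for len(s) > 0 {
		r, n := utf8.DecodeRuneInString(s)
		if e, ok := encoded[r]; ok {
			dst = append(dst, e...)
		} else {
			dst = append(dst, s[:n]...)
		}
		s = s[n:]
	}
	return dst
}

func main() {
	fmt.Println(string(translateDiff(nil, "café")), string(translatePre(nil, "café"))) // cafÉ cafÉ
}
```

Both functions still decode each input rune to find its table entry; only the re-encoding step is saved. `utf8.AppendRune` requires Go 1.18 or later.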

Lookup table size comparison:

* to-lower: current = 9665, new = 12892
* to-upper: current = 10356, new = 13260

The new tables are roughly 30% bigger: about 33% for to-lower and 28% for to-upper.
