username

package module
v1.0.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 17, 2021 License: BSD-3-Clause Imports: 7 Imported by: 1

README ยถ

go-ucd-username

Generates ASCII URL-safe aliases for Unicode usernames containing non-alphanumeric characters.

What is this thing?

go-ucd-username generates ASCII URI-safe aliases for Unicode usernames containing non-alphanumeric (plus dashes) characters by converting them in to ASCII equivalents using the go-ucd package.

Specifically any character that is not [a-zA-Z0-9\-] is passed to go-ucd to determine its equivalent Unicode name string. By default those strings are simple descriptive ASCII English names, but some names in the Unihan dataset contain non-ASCII characters so a final pass to filter out any character that is not [a-zA-Z0-9\-] is applied.

By default any whitespace or punctuation characters are ignored entirely but you can toggle these defaults if you choose (at which point they will be processed by go-ucd).

All strings are lower-cased.

Example

Given the string mr. ๐Ÿ˜ / ../test ๐Ÿš€ ใ– the following would happen:

2016/10/03 07:51:15 PARSE mr. ๐Ÿ˜ / ../test ๐Ÿš€ ใ–
2016/10/03 07:51:15 RUNE 0 U+006D 'm'
2016/10/03 07:51:15 RUNE 1 U+0072 'r'
2016/10/03 07:51:15 RUNE 2 U+002E '.'
2016/10/03 07:51:15 RUNE 2 U+002E '.' is punctuation	SKIPPING
2016/10/03 07:51:15 RUNE 3 U+0020 ' '
2016/10/03 07:51:15 RUNE 3 U+0020 ' ' is space	SKIPPING
2016/10/03 07:51:15 RUNE 4 U+1F601 '๐Ÿ˜'
2016/10/03 07:51:15 RUNE 4 U+1F601 '๐Ÿ˜' is not whitelisted	PROCESSING
2016/10/03 07:51:15 RUNE 4 U+1F601 '๐Ÿ˜' return string 'GRINNING FACE WITH SMILING EYES'	PROCESSING
2016/10/03 07:51:15 RUNE 4:0 U+0047 'G'
2016/10/03 07:51:15 RUNE 4:1 U+0052 'R'
2016/10/03 07:51:15 RUNE 4:2 U+0049 'I'
2016/10/03 07:51:15 RUNE 4:3 U+004E 'N'
2016/10/03 07:51:15 RUNE 4:4 U+004E 'N'
2016/10/03 07:51:15 RUNE 4:5 U+0049 'I'
2016/10/03 07:51:15 RUNE 4:6 U+004E 'N'
2016/10/03 07:51:15 RUNE 4:7 U+0047 'G'
2016/10/03 07:51:15 RUNE 4:8 U+0020 ' '
2016/10/03 07:51:15 RUNE 4:9 U+0046 'F'
2016/10/03 07:51:15 RUNE 4:10 U+0041 'A'
2016/10/03 07:51:15 RUNE 4:11 U+0043 'C'
2016/10/03 07:51:15 RUNE 4:12 U+0045 'E'
2016/10/03 07:51:15 RUNE 4:13 U+0020 ' '
2016/10/03 07:51:15 RUNE 4:14 U+0057 'W'
2016/10/03 07:51:15 RUNE 4:15 U+0049 'I'
2016/10/03 07:51:15 RUNE 4:16 U+0054 'T'
2016/10/03 07:51:15 RUNE 4:17 U+0048 'H'
2016/10/03 07:51:15 RUNE 4:18 U+0020 ' '
2016/10/03 07:51:15 RUNE 4:19 U+0053 'S'
2016/10/03 07:51:15 RUNE 4:20 U+004D 'M'
2016/10/03 07:51:15 RUNE 4:21 U+0049 'I'
2016/10/03 07:51:15 RUNE 4:22 U+004C 'L'
2016/10/03 07:51:15 RUNE 4:23 U+0049 'I'
2016/10/03 07:51:15 RUNE 4:24 U+004E 'N'
2016/10/03 07:51:15 RUNE 4:25 U+0047 'G'
2016/10/03 07:51:15 RUNE 4:26 U+0020 ' '
2016/10/03 07:51:15 RUNE 4:27 U+0045 'E'
2016/10/03 07:51:15 RUNE 4:28 U+0059 'Y'
2016/10/03 07:51:15 RUNE 4:29 U+0045 'E'
2016/10/03 07:51:15 RUNE 4:30 U+0053 'S'
2016/10/03 07:51:15 RUNE 8 U+0020 ' '
2016/10/03 07:51:15 RUNE 8 U+0020 ' ' is space	SKIPPING
2016/10/03 07:51:15 RUNE 9 U+002F '/'
2016/10/03 07:51:15 RUNE 9 U+002F '/' is punctuation	SKIPPING
2016/10/03 07:51:15 RUNE 10 U+0020 ' '
2016/10/03 07:51:15 RUNE 10 U+0020 ' ' is space	SKIPPING
2016/10/03 07:51:15 RUNE 11 U+002E '.'
2016/10/03 07:51:15 RUNE 11 U+002E '.' is punctuation	SKIPPING
2016/10/03 07:51:15 RUNE 12 U+002E '.'
2016/10/03 07:51:15 RUNE 12 U+002E '.' is punctuation	SKIPPING
2016/10/03 07:51:15 RUNE 13 U+002F '/'
2016/10/03 07:51:15 RUNE 13 U+002F '/' is punctuation	SKIPPING
2016/10/03 07:51:15 RUNE 14 U+0074 't'
2016/10/03 07:51:15 RUNE 15 U+0065 'e'
2016/10/03 07:51:15 RUNE 16 U+0073 's'
2016/10/03 07:51:15 RUNE 17 U+0074 't'
2016/10/03 07:51:15 RUNE 18 U+0020 ' '
2016/10/03 07:51:15 RUNE 18 U+0020 ' ' is space	SKIPPING
2016/10/03 07:51:15 RUNE 19 U+1F680 '๐Ÿš€'
2016/10/03 07:51:15 RUNE 19 U+1F680 '๐Ÿš€' is not whitelisted	PROCESSING
2016/10/03 07:51:15 RUNE 19 U+1F680 '๐Ÿš€' return string 'ROCKET'	PROCESSING
2016/10/03 07:51:15 RUNE 19:0 U+0052 'R'
2016/10/03 07:51:15 RUNE 19:1 U+004F 'O'
2016/10/03 07:51:15 RUNE 19:2 U+0043 'C'
2016/10/03 07:51:15 RUNE 19:3 U+004B 'K'
2016/10/03 07:51:15 RUNE 19:4 U+0045 'E'
2016/10/03 07:51:15 RUNE 19:5 U+0054 'T'
2016/10/03 07:51:15 RUNE 23 U+0020 ' '
2016/10/03 07:51:15 RUNE 23 U+0020 ' ' is space	SKIPPING
2016/10/03 07:51:15 RUNE 24 U+3416 'ใ–'
2016/10/03 07:51:15 RUNE 24 U+3416 'ใ–' is not whitelisted	PROCESSING
2016/10/03 07:51:15 RUNE 24 U+3416 'ใ–' return string 'ใ–ๆฏ’, AN OLD NAME FOR INDIA'	PROCESSING
2016/10/03 07:51:15 RUNE 24:0 U+3416 'ใ–'
2016/10/03 07:51:15 RUNE 24:3 U+6BD2 'ๆฏ’'
2016/10/03 07:51:15 RUNE 24:6 U+002C ','
2016/10/03 07:51:15 RUNE 24:7 U+0020 ' '
2016/10/03 07:51:15 RUNE 24:8 U+0041 'A'
2016/10/03 07:51:15 RUNE 24:9 U+004E 'N'
2016/10/03 07:51:15 RUNE 24:10 U+0020 ' '
2016/10/03 07:51:15 RUNE 24:11 U+004F 'O'
2016/10/03 07:51:15 RUNE 24:12 U+004C 'L'
2016/10/03 07:51:15 RUNE 24:13 U+0044 'D'
2016/10/03 07:51:15 RUNE 24:14 U+0020 ' '
2016/10/03 07:51:15 RUNE 24:15 U+004E 'N'
2016/10/03 07:51:15 RUNE 24:16 U+0041 'A'
2016/10/03 07:51:15 RUNE 24:17 U+004D 'M'
2016/10/03 07:51:15 RUNE 24:18 U+0045 'E'
2016/10/03 07:51:15 RUNE 24:19 U+0020 ' '
2016/10/03 07:51:15 RUNE 24:20 U+0046 'F'
2016/10/03 07:51:15 RUNE 24:21 U+004F 'O'
2016/10/03 07:51:15 RUNE 24:22 U+0052 'R'
2016/10/03 07:51:15 RUNE 24:23 U+0020 ' '
2016/10/03 07:51:15 RUNE 24:24 U+0049 'I'
2016/10/03 07:51:15 RUNE 24:25 U+004E 'N'
2016/10/03 07:51:15 RUNE 24:26 U+0044 'D'
2016/10/03 07:51:15 RUNE 24:27 U+0049 'I'
2016/10/03 07:51:15 RUNE 24:28 U+0041 'A'

Resulting in the string mrgrinningfacewithsmilingeyestestrocketanoldnameforindia. What you do (or don't do) with that string afterwards is entirely up to you!

But... why?

Principally so that you can provide meaningful "pretty" URL-safe aliases for users with full Unicode usernames.

For example, a user with the name Admiral๐Ÿฆ would have the URL alias /admiralsofticecream. If you don't already have users with names like that... you will.

While it's true that an Internalized Resource Identifier (IRI) would allow you to create an equivalent /Admiral%20๐Ÿฆ URL it's still early days for IRIs meaning browser support is uneven and there are a number of security concerns that haven't been fully sorted yet.

go-ucd-username side-steps all those issues and allows for a better-than-nothing alternative.

Usage

package main

import (
	"flag"
	"fmt"
	"github.com/aaronland/go-ucd-username"
	"log"
	"os"
	"strings"
)

func main() {

	var spaces = flag.Bool("spaces", false, "Do not filter out whitespace during processing")
	var punct = flag.Bool("punct", false, "Do not filter out punctuation during processing")
	var debug = flag.Bool("debug", false, "Enable verbose logging during processing")

	flag.Parse()
	args := flag.Args()

	pretty := strings.Join(args, " ")

	uname, err := username.NewUCDUsername()

	if err != nil {
		log.Fatal(err)
	}

	uname.Debug = *debug
	uname.AllowSpaces = *spaces
	uname.AllowPunctuation = *punct

	safe, err := uname.Translate(pretty)

	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(safe)
	os.Exit(0)
}

Tools

> make cli
GOARCH=wasm GOOS=js go build -mod vendor -o http/wasm/ucd.wasm cmd/ucd-wasm/main.go
go build -mod vendor -o bin/ucd-username cmd/ucd-username/main.go
go build -mod vendor -o bin/ucd-username-server cmd/ucd-username-server/main.go
ucd-username

Command line tool for converting strings in to valid UCD usernames.

$> ./bin/ucd-username -h
Command line tool for converting strings in to valid UCD usernames.

Usage:
	 ./bin/ucd-username [options] string(N) string(N) string(N)

For example:
	./bin/ucd-username captain ๐Ÿง โœจ 
	aptainastandingpersonsarkles

Valid options are:
  -debug
    	Enable verbose logging during processing
  -punct
    	Do not filter out punctuation during processing
  -spaces
    	Do not filter out whitespace during processing

For example:

./bin/ucd-username mr. ๐Ÿ˜
mrgrinningfacewithsmilingeyes
ucd-username-server

HTTP server exposing the ucd-username functionality.

$> ./bin/ucd-username-server -h
HTTP server exposing the ucd-username functionality.

Usage:
	 ./bin/ucd-username-server [options] 

For example:
	./bin/ucd-username-server
	2021/02/17 08:55:00 Listening on http://localhost:8080

Valid options are:
  -debug
    	Enable verbose logging during processing
  -enable-api
    	Enable the /api endpoint (default true)
  -enable-www
    	Enable the / endpoint (default true)
  -host string
    	What host to bind ucd-username-server to. This fs is DEPRECATED. Please use -server-uri instead. (default "localhost")
  -port int
    	What port to bind ucd-username-server to. This fs is DEPRECATED. Please use -server-uri instead. (default 8080)
  -punct
    	Do not filter out punctuation during processing
  -server-uri string
    	A valid aaronland/go-http-server URI. (default "http://localhost:8080")
  -spaces
    	Do not filter out whitespace during processing

For example:

./bin/ucd-username-server -port 8080
2017/04/07 18:02:51 listening on localhost:8080

And then:

# as in: http://localhost:8080?username=mr. ๐Ÿ˜

$> curl -s -i 'http://localhost:8080/api?username=mr.+%F0%9F%98%81'
HTTP/1.1 200 OK
Content-Length: 29
Content-Type: text/plain
Date: Sat, 08 Apr 2017 01:02:58 GMT

mrgrinningfacewithsmilingeyes
WASM

The ucd-wasm.go tool exports the ucd-username functionality was a WebAssembly binary. This binary is bundled with the ucd-username-server application and is exposed on the / endpoint, assuming the -enable-www flag is true.

As of this writing space and punctuation are explicitly disallowed when converting usernames. That could/should be modified to check input variables (in JavaScript-land) but today it does not.

Docker

Yes.

docker build -t ucd-username .
docker run -p 6161:8080 -e HOST='0.0.0.0' ucd-username

curl 'localhost:6161?/apiusername=\U+01F937'
shrug

Versions

go-ucd-username supports Unicode 13.0 (specifically aaronland/go-ucd/v13 as of February, 2021 and requires Go 1.16 or higher to compile.

See also

Documentation ยถ

Index ยถ

Constants ยถ

This section is empty.

Variables ยถ

This section is empty.

Functions ยถ

This section is empty.

Types ยถ

type UCDUsername ยถ

type UCDUsername struct {
	AllowSpaces      bool
	AllowPunctuation bool
	Debug            bool
	// contains filtered or unexported fields
}

func NewUCDUsername ยถ

func NewUCDUsername() (*UCDUsername, error)

func (*UCDUsername) Translate ยถ

func (u *UCDUsername) Translate(raw string) (string, error)

Directories ยถ

Path Synopsis
cmd
http
api

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL