charsetdetect

Version: v0.0.0-...-c125e74
Published: Dec 20, 2015 | License: MIT | Imports: 3 | Imported by: 0

README

Golang bindings for libcharsetdetect

These are Go bindings for the C library libcharsetdetect, which is itself a C wrapper for Mozilla's C++ Universal Charset Detector.

I was interested in comparing the performance of several available charset detectors and created these bindings for that purpose.

Performance

I benchmarked four charset detectors available in Go: Enca, CharsetDetect, ICU and Chardet.

The test was conducted on a MacBook Air 2013 (1.3 GHz Intel Core i5) using a single CPU core. The test data consisted of four HTML pages:

  • 106 KB, UTF-8
  • 105 KB, incorrect UTF-8
  • 521 KB, CP1251
  • 61 KB, KOI8-R
Detector (4x, full text)      ns/op    B/op  allocs/op
Enca                        7144856      64          4
CharsetDetect               8777701      64          4
ICU                        36366853    2204         60
Chardet                    68107438   53828         32

Detector (4x, 4096 bytes)     ns/op    B/op  allocs/op
Enca                          89757      64          4
CharsetDetect                187303      64          4
ICU                         4393345   53824         32
Chardet                     8906792    2165         60
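The ns/op, B/op and allocs/op columns are the units Go's testing package reports when allocation reporting is enabled. The following is a minimal sketch of a harness that produces numbers in that shape for both the full-text and 4096-byte variants. It assumes that "4x" means every operation runs the detector over all test pages, and that the 4096-byte variant truncates each page to its first 4096 bytes; the `Detector` type, `benchDetector` and `stubDetect` are hypothetical, and the stub only checks UTF-8 validity so the sketch runs without cgo. To measure this package you would pass `charsetdetect.DetectCharset` instead.

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
	"unicode/utf8"
)

// Detector has the same shape as charsetdetect.DetectCharset.
type Detector func(text []byte) (charset string, err error)

// stubDetect is a hypothetical stand-in: it only distinguishes valid UTF-8
// from everything else.
func stubDetect(text []byte) (string, error) {
	if utf8.Valid(text) {
		return "UTF-8", nil
	}
	return "unknown", nil
}

// benchDetector runs detect over every page in each iteration. If limit > 0,
// each page is truncated to its first limit bytes before detection.
func benchDetector(detect Detector, pages [][]byte, limit int) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs() // adds B/op and allocs/op to the result
		for i := 0; i < b.N; i++ {
			for _, p := range pages {
				if limit > 0 && len(p) > limit {
					p = p[:limit]
				}
				if _, err := detect(p); err != nil {
					b.Fatal(err)
				}
			}
		}
	})
}

func main() {
	// Synthetic pages; the benchmark above used four real HTML pages.
	pages := [][]byte{
		bytes.Repeat([]byte("<p>привет</p>"), 8000),
		append(bytes.Repeat([]byte("<p>hello</p>"), 8000), 0xff),
	}
	full := benchDetector(stubDetect, pages, 0)
	prefix := benchDetector(stubDetect, pages, 4096)
	fmt.Println("full text: ", full.String(), full.MemString())
	fmt.Println("4096 bytes:", prefix.String(), prefix.MemString())
}
```

`testing.Benchmark` is documented for custom benchmarks run outside `go test`, which keeps the harness a plain program; inside a `_test.go` file the same loop body would go into a `BenchmarkXxx` function run with `-benchmem`.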

endevit/enca was the clear winner in both speed and quality, with aglyzov/charsetdetect coming second. So I personally recommend using endevit/enca instead.

Documentation

Overview

Package charsetdetect provides minimal cgo bindings for libcharsetdetect (the library source is included).

Source code and project home: https://github.com/aglyzov/go-charsetdetect

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func DetectCharset

func DetectCharset(text []byte) (charset string, err error)
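DetectCharset takes raw bytes and returns a charset name or an error. The benchmark tables show detection on 4096 bytes running far faster than on full pages, so one usage pattern is to pass only a bounded prefix. The following is a minimal sketch of that pattern, assuming you pass `charsetdetect.DetectCharset` as `detect`; `detectFunc`, `detectPrefix` and `stubDetect` are hypothetical names, and the stub (which only recognises valid UTF-8) exists so the example builds without cgo. The exact charset strings the real function returns are not documented on this page.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// detectFunc matches the signature of charsetdetect.DetectCharset.
type detectFunc func(text []byte) (charset string, err error)

// detectPrefix (hypothetical helper) runs detect on at most limit bytes of
// text; limit <= 0 means the whole input is used.
func detectPrefix(detect detectFunc, text []byte, limit int) (string, error) {
	if limit > 0 && len(text) > limit {
		text = text[:limit]
	}
	return detect(text)
}

// stubDetect stands in for charsetdetect.DetectCharset in this sketch.
func stubDetect(text []byte) (string, error) {
	if utf8.Valid(text) {
		return "UTF-8", nil
	}
	return "", errors.New("charset not detected")
}

func main() {
	page := []byte("<html><body>Привет, мир</body></html>")
	charset, err := detectPrefix(stubDetect, page, 4096)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("detected:", charset)
}
```

A fixed byte limit can cut a multi-byte character in half at the end of the prefix, so the choice of limit trades some input fidelity for speed.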

Types

This section is empty.
