reddit

command
v0.0.0-...-58dda5d Latest
Warning

This package is not in the latest version of its module.

Published: Jun 10, 2017 License: MIT Imports: 17 Imported by: 0

README

Reddit comment analysis

Notes:

  • bzip2 is horrible: decompression is painfully slow. Files are downloaded and recompressed using gzip. All the files together take about 150GB.
  • Uses a poor man's Map-Reduce. Each file is converted to a word/count file, and at the end these are merged.
  • The scoring is really slow and O(n^2). This could be done in parallel.
  • Tried to eliminate the huge number of brand names, TV show characters, and fantasy stuff.
  • Some misspellings are so common they make the 90th percentile!

Documentation

There is no documentation for this package.
