Still
A command-line tool that filters out needless text using a statistical classifier.
Installation
To install Still, run the following command:
$ go get github.com/mitsuse/still/cmd/still
Dependencies
Still includes the following libraries as vendored packages:
Usage
Build a model
Still requires a model file to filter out text.
The model file contains the weights of a binary linear classifier.
To build the model, use the still build command:
$ still build -m model.still -e examples.json -i 3
-m specifies the output path of the built model.
-e specifies the path of the training data.
The training data must be a single JSON array of objects, each with a "text" field and a "class" field, as follows:
[
{
"text": "Go 1.5 is released https://blog.golang.org/go1.5 #go_blog",
"class": 1
},
{
"text": "OnHub – Google https://on.google.com/hub/",
"class": 0
}
]
The "text" field holds the text of one classification example.
The "class" field holds the correct label for that example (1 or 0 in the sample above).
To set the number of iterations, use -i.
When N is given as the value for -i, the training data are read N times.
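This README does not say which learning algorithm or features Still uses. Purely to illustrate what reading the training data N times means for a binary linear classifier, the sketch below trains a plain perceptron over whitespace-separated tokens. The example type and the score, predict, and train functions are assumptions made for this illustration, not Still's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// example is a hypothetical in-memory form of one training object.
type example struct {
	text  string
	class int
}

// score sums the weights of the whitespace-separated tokens in text.
func score(weights map[string]float64, text string) float64 {
	var s float64
	for _, tok := range strings.Fields(text) {
		s += weights[tok]
	}
	return s
}

// predict returns 1 when the score is positive and 0 otherwise.
func predict(weights map[string]float64, text string) int {
	if score(weights, text) > 0 {
		return 1
	}
	return 0
}

// train passes over all examples `iterations` times and applies the
// perceptron update whenever an example is misclassified.
func train(examples []example, iterations int) map[string]float64 {
	weights := make(map[string]float64)
	for n := 0; n < iterations; n++ {
		for _, ex := range examples {
			if predict(weights, ex.text) == ex.class {
				continue
			}
			// Move the token weights toward the correct class.
			delta := 1.0
			if ex.class == 0 {
				delta = -1.0
			}
			for _, tok := range strings.Fields(ex.text) {
				weights[tok] += delta
			}
		}
	}
	return weights
}

func main() {
	examples := []example{
		{"go release notes", 1},
		{"router product launch", 0},
	}
	weights := train(examples, 3) // corresponds to -i 3
	fmt.Println(predict(weights, "go release"), predict(weights, "router launch"))
}
```

More passes give the weights more chances to correct mistakes made on earlier passes, at the cost of longer training time.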
Test a model
Still can test the trained model on test data with the following command:
$ still test -m model.still -e examples.json
-m specifies the path of the trained model.
-e specifies the path of the test data.
The test data has the same format as the training data.
The test command reports precision and recall.
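For reference, precision and recall can be computed from gold labels and predictions as sketched below. This assumes class 1 is treated as the positive class, which this README does not state; the precisionRecall function is a hypothetical helper, not part of Still.

```go
package main

import "fmt"

// precisionRecall computes precision and recall for the positive class (1).
// It assumes gold and predicted have the same length and contain only 0 or 1.
// A metric with a zero denominator is reported as 0.
func precisionRecall(gold, predicted []int) (precision, recall float64) {
	var tp, fp, fn int
	for i := range gold {
		switch {
		case predicted[i] == 1 && gold[i] == 1:
			tp++ // true positive
		case predicted[i] == 1 && gold[i] == 0:
			fp++ // false positive
		case predicted[i] == 0 && gold[i] == 1:
			fn++ // false negative
		}
	}
	if tp+fp > 0 {
		precision = float64(tp) / float64(tp+fp)
	}
	if tp+fn > 0 {
		recall = float64(tp) / float64(tp+fn)
	}
	return precision, recall
}

func main() {
	gold := []int{1, 1, 0, 0, 1}
	predicted := []int{1, 0, 0, 1, 1}
	p, r := precisionRecall(gold, predicted)
	fmt.Printf("precision=%.2f recall=%.2f\n", p, r)
}
```

Precision is the share of examples predicted as class 1 that really are class 1; recall is the share of true class 1 examples that the model found.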
Filter out text
Still works as a filter on standard input and output, like grep:
$ cat input.txt | still filter -m model.still
In the above command, each line of input.txt is classified as a separate example.
The -f option can be used to print the filtered-out text instead.
License
Please read LICENSE.txt.