cntblank

v0.4.1
Published: Nov 28, 2015 License: MIT Imports: 15 Imported by: 0

Count Blank Cells

Count blank cells on text-based tabular data.

Short examples:

$ ./cntblank --output-meta testdata/prefecture_jp.tsv
INFO[0000] start parsing with 2 columns.                 path=testdata/prefecture_jp.tsv
INFO[0000] finish parsing 48 lines to get 47 records with 2 columns. 0 errors detected.  path=testdata/prefecture_jp.tsv
# File          testdata/prefecture_jp.tsv      prefecture_jp.tsv
# Field         2                               (has header)
# Record        47
seq     Name    #Blank  %Blank  MinLength       MaxLength       #Int    #Float  #Bool   #Time    Minimum Maximum #True   #False
1       都道府県コード  0       0.0000  2       2       47       47               1       47
2       都道府県        0       0.0000  3       4

Because the file argument is optional, cntblank also reads from standard input.

$ curl -sL https://raw.githubusercontent.com/datasets/language-codes/master/data/ietf-language-tags.csv |
      ./cntblank --input-delimiter=, 2>/dev/null | cut -f1-6 | expand -t 15
seq            Name           #Blank         %Blank         MinLength      MaxLength
1              lang           0              0.0000         2              14
2              langType       0              0.0000         2              4
3              territory      213            0.3096         2              3
4              revGenDate     0              0.0000         10             10
5              defs           0              0.0000         1              1
6              file           0              0.0000         6              18

Full Usage

--help shows the details.

  • The default input/output encoding is "UTF-8"; the only other value the encoding options accept is "sjis".
  • The default input/output delimiter is TAB.
  • Meta-information consists of the file path, the number of fields, and the number of records.
  • If no file path arguments are given, standard input is processed.

usage: cntblank [<flags>] [<tabfile>...]

Count blank cells on text-based tabular data.

Flags:
  --help               Show help (also see --help-long and --help-man).
  -v, --verbose        Set verbose mode on.
  -e, --input-encoding=INPUT-ENCODING
                       Input encoding.
  -E, --output-encoding=OUTPUT-ENCODING
                       Output encoding.
  --input-delimiter=INPUT-DELIMITER
                       Input field delimiter.
  --output-delimiter=OUTPUT-DELIMITER
                       Output field delimiter.
  --without-header     Tabular does not have header line.
  --strict             Check column size strictly.
  --output-meta        Put meta information.
  -o, --output=OUTPUT  Output file.
  --version            Show application version.

Args:
  [<tabfile>]  Tabular data files.
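
The defaults described above (TAB as the field delimiter, standard input when no file is given) can be sketched in Go roughly as follows. This is a minimal illustration built on the standard library's encoding/csv, not cntblank's actual parser; parseTabular is a hypothetical helper, and the lenient settings (ragged rows allowed, lazy quotes) are assumptions rather than documented behaviour.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"os"
	"strings"
)

// parseTabular splits text into records, using delim as the field separator.
// Hypothetical helper for illustration; cntblank's own parser may differ.
func parseTabular(text string, delim rune) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(text))
	r.Comma = delim
	r.FieldsPerRecord = -1 // assumption: accept rows with differing column counts
	r.LazyQuotes = true    // assumption: tolerate stray quote characters
	return r.ReadAll()
}

func main() {
	// With no file argument, fall back to standard input.
	var in io.Reader = os.Stdin
	if len(os.Args) > 1 {
		f, err := os.Open(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		defer f.Close()
		in = f
	}

	data, err := io.ReadAll(in)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}

	records, err := parseTabular(string(data), '\t')
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("%d lines read\n", len(records))
}
```

Passing ',' instead of '\t' to parseTabular corresponds to what --input-delimiter=, does in the curl example.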

Output fields

Name       Description
seq        Sequential number, starting at one.
Name       Field name taken from the header line; otherwise "ColumnNNN", where NNN is the sequential number.
#Blank     Count of blank cells.
%Blank     Proportion of blank cells, printed as a fraction (0.3096 means about 31%).
MinLength  Minimum length of valid cells.
MaxLength  Maximum length of valid cells.
#Int       Count of integer type cells. This may be blank.
#Float     Count of float type cells. This may be blank.
#Bool      Count of bool type cells. This may be blank.
#Time      Count of time type cells. This may be blank.
Minimum    Minimum value after guessing the data type.
Maximum    Maximum value after guessing the data type.
#True      Count of cells that should be treated as boolean true.
#False     Count of cells that should be treated as boolean false.
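
As a rough illustration of how the first few output fields relate to the input, the sketch below computes Name, #Blank, %Blank, MinLength and MaxLength per column. It is not taken from cntblank's source: columnStat and countBlank are hypothetical names, "blank" is assumed to mean empty after trimming whitespace, lengths are assumed to be counted in characters rather than bytes (consistent with the 3 and 4 reported for the Japanese prefecture names above), and the zero-padded "Column%03d" fallback is a guess at the "ColumnNNN" format.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// columnStat holds per-column counters (hypothetical type for illustration).
type columnStat struct {
	Name      string
	Blank     int
	Total     int
	MinLength int
	MaxLength int
}

// blankRatio returns the share of blank cells as a fraction between 0 and 1.
func (s columnStat) blankRatio() float64 {
	if s.Total == 0 {
		return 0
	}
	return float64(s.Blank) / float64(s.Total)
}

// countBlank scans records column by column. The column count is taken
// from the first row; cells missing from short rows are counted as blank.
func countBlank(records [][]string, hasHeader bool) []columnStat {
	if len(records) == 0 {
		return nil
	}
	stats := make([]columnStat, len(records[0]))
	for i := range stats {
		stats[i].Name = fmt.Sprintf("Column%03d", i+1) // assumed fallback format
		if hasHeader {
			stats[i].Name = records[0][i]
		}
	}
	rows := records
	if hasHeader {
		rows = records[1:]
	}
	for _, row := range rows {
		for i := range stats {
			s := &stats[i]
			s.Total++
			cell := ""
			if i < len(row) {
				cell = row[i]
			}
			if strings.TrimSpace(cell) == "" {
				s.Blank++
				continue
			}
			n := utf8.RuneCountInString(cell) // characters, not bytes
			if s.MaxLength == 0 || n < s.MinLength {
				s.MinLength = n
			}
			if n > s.MaxLength {
				s.MaxLength = n
			}
		}
	}
	return stats
}

func main() {
	records := [][]string{
		{"code", "name"},
		{"01", "北海道"},
		{"02", ""},
		{"03", "神奈川県"},
	}
	for i, s := range countBlank(records, true) {
		fmt.Printf("%d\t%s\t%d\t%.4f\t%d\t%d\n",
			i+1, s.Name, s.Blank, s.blankRatio(), s.MinLength, s.MaxLength)
	}
}
```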

Note that "1" is interpreted as boolean true and "0" as boolean false. In an integer column, "#True" is therefore the count of "1" cells and "#False" is the count of "0" cells. Because buggy data files sometimes use "0" in place of null, these counts can help you spot pseudo-blank cells.
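
The overlap between integer and boolean cells can be illustrated with Go's standard strconv package. This is only a sketch under assumptions: the dependency list suggests cntblank relies on govalidator for type detection, so its exact rules may differ, and typeCount and guessTypes are hypothetical names. strconv.ParseBool happens to accept "1" and "0" (alongside "true", "false", "t", "f" and their capitalised forms), which mirrors the behaviour described in the note.

```go
package main

import (
	"fmt"
	"strconv"
)

// typeCount mirrors the #Int, #Float, #Bool, #True and #False fields
// (hypothetical type for illustration).
type typeCount struct {
	Int, Float, Bool, True, False int
}

// guessTypes tests every cell against each type independently, so a
// single cell such as "1" counts as an integer, a float and a bool.
func guessTypes(cells []string) typeCount {
	var c typeCount
	for _, cell := range cells {
		if _, err := strconv.ParseInt(cell, 10, 64); err == nil {
			c.Int++
		}
		if _, err := strconv.ParseFloat(cell, 64); err == nil {
			c.Float++
		}
		if b, err := strconv.ParseBool(cell); err == nil {
			c.Bool++
			if b {
				c.True++
			} else {
				c.False++
			}
		}
	}
	return c
}

func main() {
	// An integer column in which "0" may have been used in place of null.
	column := []string{"0", "1", "1", "7", "0"}
	c := guessTypes(column)
	fmt.Printf("#Int=%d #Float=%d #Bool=%d #True=%d #False=%d\n",
		c.Int, c.Float, c.Bool, c.True, c.False)
}
```

In this sketch every integer cell also counts as a float, which matches the first sample output, where the code column reports 47 for both #Int and #Float.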

Development setup

  • Golang 1.5

Library dependencies

  • github.com/Sirupsen/logrus
  • gopkg.in/alecthomas/kingpin.v2
  • github.com/asaskevich/govalidator

Set up the dependencies through the Makefile, which calls go get -d:

$ make setup

Build

$ go build -o cntblank

build.sh is a build script that generates binaries for multiple architectures using a Docker container. make wraps the whole process of generating the binaries.
