tarball_divide

command
v0.0.0-...-2d2ca90
Published: May 29, 2022 | License: MIT | Imports: 8 | Imported by: 0

Documentation

Overview

tarball_divide takes a .tar.gz file of images and writes one or more .tar.gz files, each containing the images that share the same base directory name. For example, the input tarball might contain three image files:

another_dir/dog/b.png
Yet_another_dir/dog/c.png
some_dir/0/a.png

The output consists of two tarballs:

- 0.tar.gz includes one file:

  • some_dir/0/a.png

- dog.tar.gz includes two files:

  • another_dir/dog/b.png
  • Yet_another_dir/dog/c.png
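The grouping rule in the example above can be sketched in a few lines of Go. This is a minimal illustration that works on path strings only, not the tool's actual source; the helper names labelOf and groupByLabel are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// labelOf returns the name of the directory that directly contains the file.
// By the convention described in this overview, that name is the label.
// (Hypothetical helper, not taken from the tarball_divide source.)
func labelOf(name string) string {
	return path.Base(path.Dir(name))
}

// groupByLabel maps each label to the paths that carry it,
// keeping the paths in their input order within each group.
func groupByLabel(names []string) map[string][]string {
	groups := make(map[string][]string)
	for _, n := range names {
		l := labelOf(n)
		groups[l] = append(groups[l], n)
	}
	return groups
}

func main() {
	names := []string{
		"another_dir/dog/b.png",
		"Yet_another_dir/dog/c.png",
		"some_dir/0/a.png",
	}
	groups := groupByLabel(names)

	// Sort the labels so the printed order is deterministic.
	labels := make([]string, 0, len(groups))
	for l := range groups {
		labels = append(labels, l)
	}
	sort.Strings(labels)

	for _, l := range labels {
		fmt.Printf("%s.tar.gz: %v\n", l, groups[l])
	}
}
```

Each map key corresponds to one output tarball name (the label plus .tar.gz), and each value lists the entries that would go into it.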

tarball_merge can then merge the two tarballs into one, with an interleaved order of images:

another_dir/dog/b.png
some_dir/0/a.png
Yet_another_dir/dog/c.png
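One way to produce such an interleaved order is a round-robin pass over the per-label lists. The sketch below assumes that strategy and operates on path strings only; the function name interleave is hypothetical, and tarball_merge may order entries differently.

```go
package main

import "fmt"

// interleave takes one entry from each list in turn until all lists are
// exhausted. Lists that run out early are skipped in later rounds.
// (Hypothetical helper; round-robin is an assumption about the merge order.)
func interleave(lists ...[]string) []string {
	var out []string
	for i := 0; ; i++ {
		took := false
		for _, l := range lists {
			if i < len(l) {
				out = append(out, l[i])
				took = true
			}
		}
		if !took {
			return out
		}
	}
}

func main() {
	dog := []string{"another_dir/dog/b.png", "Yet_another_dir/dog/c.png"}
	zero := []string{"some_dir/0/a.png"}

	for _, name := range interleave(dog, zero) {
		fmt.Println(name)
	}
}
```

With the dog list passed first, this reproduces the order shown above: consecutive entries come from different labels for as long as more than one list still has entries.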

We need this pair of tools because training an image classification model often reads images from a container file, such as a tarball, and we want the successive images in each minibatch to belong to different labels. By convention, the base directory name is the label.

You can download the MNIST PNG dataset from https://github.com/myleott/mnist_png as /tmp/mnist_png.tar.gz and divide it using the following commands:

go install ./...
tarball_divide -out=/tmp /tmp/mnist_png.tar.gz

The above commands generate /tmp/[0-9].tar.gz, each of which contains only regular image files and no directory entries.
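To check that an output tarball holds only regular files, you can walk its entries with the standard archive/tar and compress/gzip packages. The sketch below is an assumption about how such a check might look, not part of the tool; regularFiles and sampleArchive are hypothetical helpers, and the entry names in the sample archive are made up for illustration.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// regularFiles returns the names of the regular-file entries in a gzipped
// tar stream, skipping directories and any other entry types.
func regularFiles(r io.Reader) ([]string, error) {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return nil, err
	}
	defer gz.Close()

	tr := tar.NewReader(gz)
	var names []string
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		if hdr.Typeflag == tar.TypeReg {
			names = append(names, hdr.Name)
		}
	}
}

// sampleArchive builds a tiny in-memory .tar.gz holding one directory entry
// and one regular file. The names are illustrative only.
func sampleArchive() (*bytes.Buffer, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)

	body := []byte("png")
	entries := []*tar.Header{
		{Name: "mnist_png/0/", Typeflag: tar.TypeDir, Mode: 0o755},
		{Name: "mnist_png/0/a.png", Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(body))},
	}
	for _, h := range entries {
		if err := tw.WriteHeader(h); err != nil {
			return nil, err
		}
		if h.Typeflag == tar.TypeReg {
			if _, err := tw.Write(body); err != nil {
				return nil, err
			}
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	buf, err := sampleArchive()
	if err != nil {
		panic(err)
	}
	names, err := regularFiles(buf)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

To inspect a real output file instead of the in-memory sample, pass the result of os.Open on a path such as /tmp/0.tar.gz to regularFiles.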
