oxfordflowers102

package
v0.9.1

This package is not in the latest version of its module.

Published: Apr 20, 2024 License: Apache-2.0 Imports: 23 Imported by: 0

README

Oxford Flowers 102 Dataset

https://www.robots.ox.ac.uk/~vgg/data/flowers/102/

A dataset of 102 flower categories. The flowers were chosen from species commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images. The images have large scale, pose and light variations. In addition, some categories have large variations within the category, and several categories are very similar to each other.

The dataset is divided into a training set, a validation set and a test set. The training set and validation set each consist of 10 images per class (totalling 1020 images each). The test set consists of the remaining 6149 images (minimum 20 per class). The total download is ~330 MB.
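The split sizes above can be checked with a little arithmetic. The following is only a sketch of that calculation; the constant and function names are made up for illustration and are not part of this package.

```go
package main

import "fmt"

// Figures taken from the dataset description above.
const (
	numClasses     = 102
	imagesPerClass = 10   // per class, in each of the train and validation sets
	testImages     = 6149 // the remaining images
)

// splitSizes returns the train, validation and test sizes and their total.
func splitSizes() (train, valid, test, total int) {
	train = numClasses * imagesPerClass
	valid = numClasses * imagesPerClass
	test = testImages
	return train, valid, test, train + valid + test
}

func main() {
	train, valid, test, total := splitSizes()
	fmt.Printf("train=%d valid=%d test=%d total=%d\n", train, valid, test, total)
	// prints: train=1020 valid=1020 test=6149 total=8189
}
```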

More information in the TensorFlow Datasets page:

https://www.tensorflow.org/datasets/catalog/oxford_flowers102

This package provides a train.Dataset with the images.

Under it you will also find a demo that trains a diffusion model, following the Keras example in:

https://keras.io/examples/generative/ddim/

Documentation

Overview

Package oxfordflowers102 provides tools to download and cache the dataset and a `train.Dataset` implementation that can be used to train models using GoMLX (http://github.com/gomlx/gomlx/).

See the README.md file for details. The dataset's home page is at https://www.robots.ox.ac.uk/~vgg/data/flowers/102/

Usage example:

Constants

This section is empty.

Variables

var (
	DownloadBaseURL           = "https://www.robots.ox.ac.uk/~vgg/data/flowers/102/"
	DownloadSubdir            = "downloads"
	DownloadFilesAndChecksums = []struct {
		File, Checksum, UntarDir string
	}{

		{"102flowers.tgz", "", "jpg"},
		{"imagelabels.mat", "4903e94206bac23bf772aadf06451916df56b58fc483a62db32a97b82656651d", ""},
		{"setid.mat", "46b8678f91fd95d3c8f4feab80d271a6c834a1dd896fe29fd3e6ad9ce5c8dccd", ""},
	}
)
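The checksums listed above are 64 hexadecimal characters long, which suggests SHA-256 digests, although this page does not say so. Under that assumption, a downloaded file could be verified roughly as sketched below. The helper `checksumMatches` is hypothetical (it is not part of this package), and treating an empty checksum as "skip verification" is likewise an assumption based on the empty entry for `102flowers.tgz`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// checksumMatches reports whether data hashes to the expected hex-encoded
// SHA-256 digest. An empty expected value skips verification (assumption).
func checksumMatches(data []byte, expected string) bool {
	if expected == "" {
		return true
	}
	sum := sha256.Sum256(data)
	return strings.EqualFold(hex.EncodeToString(sum[:]), expected)
}

func main() {
	// Standard SHA-256 test vector for the input "abc".
	const abcDigest = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

	fmt.Println(checksumMatches([]byte("abc"), abcDigest)) // matching content
	fmt.Println(checksumMatches([]byte("abd"), abcDigest)) // corrupted content
	fmt.Println(checksumMatches([]byte("anything"), "")) // no checksum listed
}
```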
var (
	// AllLabels of the dataset. Converted to 0-based (0 to 101).
	// Only available after DownloadAndParse is successfully called.
	AllLabels []int32

	// AllImages of the dataset, the path to the images that is.
	// Only available after DownloadAndParse is successfully called.
	AllImages []string

	// NumExamples is the number of examples (images and labels) in the dataset.
	// Only available after DownloadAndParse is successfully called.
	NumExamples int

	// ImagesDir where images are stored. Only available after DownloadAndParse is
	// successfully called.
	ImagesDir string

	// NumLabels is 102, hence the name.
	NumLabels = 102

	// Names of all the 102 flowers in the dataset.
	Names = []string{}/* 102 elements not displayed */

)

Functions

func DownloadAndParse

func DownloadAndParse(baseDir string) error

DownloadAndParse downloads the "Oxford Flowers 102" dataset files to baseDir and untars them. If the files were already downloaded, the existing copies are used.

After the download, the contents of the files are parsed and the package-level variables (AllLabels, AllImages, NumExamples and ImagesDir) are set.

func InMemoryDataset

func InMemoryDataset(manager *Manager, baseDir string, imageSize int, name string,
	partitionSeed int64, partitionFrom, partitionTo float64) (
	inMemoryDataset *data.InMemoryDataset, err error)

InMemoryDataset creates a `data.InMemoryDataset` with the Oxford Flowers 102 images, using the given `imageSize` for both height and width -- each image is resized and then cropped at the center.

A cached version is automatically saved in `baseDir`, prefixed with `name`, if `name` is not empty. If a cache file is found, it is used instead of re-reading and processing all the images.

It takes a partition of the data, defined by `partitionFrom` and `partitionTo`. They take values from 0.0 to 1.0 and represent the fraction of the dataset to take. They enable selection of arbitrary train/validation/test sizes. The `partitionSeed` can be used to generate different assignments -- the same seed should be used for the different partitions of the dataset.

If the cache is not found, it automatically calls DownloadAndParse to download and untar the original images, if they are not yet downloaded.

func ParseImages

func ParseImages(dirPath string) error

func ParseLabels

func ParseLabels(filePath string) error

func ReadExample

func ReadExample(idx int) (img image.Image, label int32, err error)

ReadExample reads the image and label for the example idx. The index must satisfy 0 <= idx < NumExamples.

Types

type Dataset

type Dataset struct {
	// contains filtered or unexported fields
}

Dataset implements train.Dataset, and yields one image at a time. It pre-transforms the image to the target `imageSize`.

func NewDataset

func NewDataset(dtype shapes.DType, imageSize int) *Dataset

NewDataset returns a Dataset for one epoch that yields one image at a time. It reads the images from disk, and the parsing can be parallelized. See `data.NewParallelDataset`.

The images are resized and cropped to `imageSize x imageSize` pixels, cut from the middle.

It doesn't support batching, but you can use GoMLX's `data.Batch` for that.

func (*Dataset) Name

func (ds *Dataset) Name() string

Name implements train.Dataset interface.

func (*Dataset) Partition added in v0.4.0

func (ds *Dataset) Partition(seed int64, from, to float64) *Dataset

Partition allows one to partition the dataset into different parts -- typically "train", "validation" and "test". This should be called before the start of an epoch.

It takes a seed number based on which the partitions will be selected, and the range of elements specified as `from` and `to`: these are float values that represent the slice (from 0.0 to 1.0) of the examples that go into this dataset.

Example:

seed := int64(42)
dsTrain := oxfordflowers102.NewDataset(shapes.F32, 75).Partition(seed, 0, 0.8)   // 80%
dsValid := oxfordflowers102.NewDataset(shapes.F32, 75).Partition(seed, 0.8, 0.9) // 10%
dsTest := oxfordflowers102.NewDataset(shapes.F32, 75).Partition(seed, 0.9, 1.0)  // 10%
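The reason the same seed must be used for every partition is easiest to see in a small model. The sketch below is an assumption about how such a scheme can work, not this package's actual algorithm: a hypothetical `partitionIndices` shuffles the example indices with a seeded generator and returns the slice of that permutation selected by `from` and `to`. With a shared seed the permutation is identical each time, so the ranges never overlap and together cover every example.

```go
package main

import (
	"fmt"
	"math/rand"
)

// partitionIndices shuffles 0..n-1 with a generator seeded by seed and returns
// the part of that permutation covering the fraction [from, to).
// It assumes 0 <= from <= to <= 1.
func partitionIndices(n int, seed int64, from, to float64) []int {
	perm := rand.New(rand.NewSource(seed)).Perm(n)
	start := int(from * float64(n))
	end := int(to * float64(n))
	return perm[start:end]
}

func main() {
	const n = 100
	seed := int64(42)
	train := partitionIndices(n, seed, 0, 0.8)
	valid := partitionIndices(n, seed, 0.8, 0.9)
	test := partitionIndices(n, seed, 0.9, 1.0)

	// Count how often each example index appears across the three partitions.
	seen := make(map[int]int)
	for _, part := range [][]int{train, valid, test} {
		for _, idx := range part {
			seen[idx]++
		}
	}
	fmt.Println(len(train), len(valid), len(test), "distinct indices:", len(seen))
}
```

Using a different seed for one of the partitions would shuffle the indices differently, so an example could then land in more than one partition or in none.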

func (*Dataset) Reset

func (ds *Dataset) Reset()

Reset implements train.Dataset interface.

func (*Dataset) Shuffle

func (ds *Dataset) Shuffle() *Dataset

Shuffle will shuffle the order of the images. This should be called before the start of an epoch.

Once shuffled, every time the dataset is reset, it is reshuffled.

func (*Dataset) Yield

func (ds *Dataset) Yield() (spec any, inputs []tensor.Tensor, labels []tensor.Tensor, err error)

Yield implements train.Dataset interface. It returns `ds` (the Dataset pointer) as spec.

It yields one example at a time, each consisting of:

  • `inputs`: three values: the image itself, a scalar `int32` with the index of the example, and the type of flower (from 0 to `NumLabels-1` = 101). The index of the example can be used, for instance, to split the dataset (into training/validation/test).
  • `labels`: the type of flower (same as `inputs[2]`), an `int32` value from 0 to `NumLabels-1`.

Directories

Path Synopsis
Package diffusion contains an example diffusion model, trained on Oxford Flowers 102 dataset.
