lvis_cifar10_axon

command
v0.0.0-...-a39741f Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Sep 13, 2023 License: BSD-3-Clause Imports: 58 Imported by: 0

README

LVis CIFAR-10 TE16deg

This is a "standard" version of the LVis model with a 16 degree receptive field width for the highest TE level neurons, implemented using the axon spiking activation algorithm, applied to the standard CIFAR-10 image dataset: https://www.cs.toronto.edu/~kriz/cifar.html

The cemer lvix_fix8.proj version has "blob" color filters in addition to the monochrome gabor filters, which are also used here.

CIFAR-10 Image handling

The cifar_images.go file manages access to the original binary (.bin) files, loading them whole into memory and providing lists of train and test items sorted by category as well as a flat list used in the environment.

The ImagesEnv environment in images_env.go applies V1-like gabor filtering to make the pixel images much sparser and non-overlapping in ways that reflect the way the visual system actually works. It can add in-plane affine transformations (translation, scale, rotation) (now known as "data augmentation"). By default these are not used.

Benchmark Results

Old (circa 2015) results: https://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html -- ranged from roughly 20% error for early models, 10% for basic early DCNN's, and roughly 4% error for the best as of 2015.

A manual analysis of the dataset: http://karpathy.github.io/2011/04/27/manually-classifying-cifar10/ suggested that the human performance level might be about 6% error, and noted that the dataset is quite varied in terms of images within each category. Karpathy estimated that a 20% error level was "reasonable" given how inconsistent the images are. It seems likely given these analyses that models are learning pure pattern discriminations, not actually recognizing the shapes, and relying on things like texture -- very similar overall to the issues with ImageNet.

In contrast, the CU3D100 dataset of rendered 3D objects on uniform backgrounds really tests the systematic representation of shape per se, and the same model used here achieves high levels of training and generalization.

Axon Results

Current results are not very good overall, only getting roughly 50% correct on training and testing (very similar performance between the two), which improves to roughly 25% if you count the top two responses. Need to do more repeat testing to determine noise effects -- the raw testing performance does exhibit significant variability.

Documentation

Overview

lvis explores how a hierarchy of areas in the ventral stream of visual processing (up to inferotemporal (IT) cortex) can produce robust object recognition that is invariant to changes in position, size, etc of retinal input images.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL