dsbldr

package module
v0.0.0-...-e4a8277 Latest
Warning

This package is not in the latest version of its module.

Published: Nov 3, 2023 License: MIT Imports: 5 Imported by: 0

README

# MLDatasetCreator
A Golang tool that provides efficient creation of Machine Learning Datasets from public Social Network APIs. Its main function is to manage concurrent operations and save data, offering a user-friendly and intuitive API.

## Overview
Social APIs are frequently used for creating datasets for training Machine Learning models. For instance, models like Tweet2Vec aim to extract features or create embeddings from such data.

Many models, particularly NLP-oriented ones, can benefit from a large repository of structured text that may or may not carry labeling.

Often, creating such datasets takes time away from feature engineering and model formulation work. This tool aims to simplify the process.

Although the author is new to the Machine Learning space, they are open to feedback and hope to share something genuinely useful in the near future.

## TODO
- [x] Feature-based API
- [x] Concurrency using Goroutines, channels
- [ ] Caching operations to prevent repeated requests
- [ ] Save data in various formats (CSV, JSON)
- [ ] Support different API data formats (JSON, XML)
- [ ] Authentication
- [ ] Command line functionality
- [ ] Demo
- Additional ideas and suggestions are welcome

Documentation

Index

Constants

const (
	SingleRetrieve = iota
	RepeatedRetrieve
)

Constants representing a Feature's RetrieveType. SingleRetrieve features require only one request to create the JSON dump that is passed to the RunFunc. RepeatedRetrieve features require one request per value set of their parent features; the responses are concatenated into a JSON array and then passed to the feature's RunFunc. Almost as a given, all dependent features will be RepeatedRetrieve, with one request per value set of their parent features.
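To make the difference between the two retrieve types concrete, here is a minimal sketch. The constants are redeclared locally so the example compiles on its own, and `requestCount` is a hypothetical helper that is not part of the package; it only illustrates the request counts described above.

```go
package main

import "fmt"

// Local copies of the documented constants, redeclared so this
// sketch is self-contained.
const (
	SingleRetrieve   = iota // one request in total
	RepeatedRetrieve        // one request per parent value set
)

// requestCount is a hypothetical helper (not part of dsbldr). It returns
// how many requests a feature would need under the rules described above.
func requestCount(retrieveType, parentValueSets int) int {
	if retrieveType == RepeatedRetrieve {
		return parentValueSets
	}
	return 1
}

func main() {
	fmt.Println(requestCount(SingleRetrieve, 50))   // prints 1
	fmt.Println(requestCount(RepeatedRetrieve, 50)) // prints 50
}
```

A dependent feature with 50 parent value sets would therefore issue 50 requests, which is why caching appears on the README's TODO list.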

Variables

This section is empty.

Functions

func BasicOAuthHeader

func BasicOAuthHeader(consumerKey, nonce, signature, signatureMethod,
	timestamp, token string) string

BasicOAuthHeader returns a basic OAuth header string built from the given consumer key, nonce, signature, signature method, timestamp, and access token.

func WithBasicAuth

func WithBasicAuth(username, password string)

WithBasicAuth is a Builder option that adds a username and password for Basic API authentication

Types

type Builder

type Builder struct {
	BaseURL string
	// contains filtered or unexported fields
}

Builder is the main type for this tool.

func NewBuilder

func NewBuilder(featureCount, recordCount int, options ...func(*Builder)) *Builder

NewBuilder creates a new Builder struct and applies the given options to it.
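The `options ...func(*Builder)` parameter suggests the functional options pattern. The sketch below shows that pattern with a simplified local `builder` type, since the real Builder has unexported fields that are not documented here. The field names, `withBaseURL`, and the return type of `withBasicAuth` are all assumptions made for illustration.

```go
package main

import "fmt"

// builder is a simplified local stand-in for dsbldr.Builder.
// Every field except BaseURL is an assumption.
type builder struct {
	BaseURL      string
	featureCount int
	recordCount  int
	username     string
	password     string
}

// withBasicAuth mirrors the idea of WithBasicAuth: it returns an option
// that stores credentials on the builder.
func withBasicAuth(username, password string) func(*builder) {
	return func(b *builder) {
		b.username = username
		b.password = password
	}
}

// withBaseURL is a hypothetical option that sets the exported BaseURL field.
func withBaseURL(u string) func(*builder) {
	return func(b *builder) { b.BaseURL = u }
}

// newBuilder applies each option in order after setting the counts.
func newBuilder(featureCount, recordCount int, options ...func(*builder)) *builder {
	b := &builder{featureCount: featureCount, recordCount: recordCount}
	for _, opt := range options {
		opt(b)
	}
	return b
}

func main() {
	b := newBuilder(4, 100,
		withBaseURL("https://api.example.com"),
		withBasicAuth("user", "secret"),
	)
	fmt.Println(b.BaseURL, b.username) // prints https://api.example.com user
}
```

Options keep the constructor signature stable: new settings can be added later without changing existing calls.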

func (*Builder) AddFeatures

func (b *Builder) AddFeatures(features ...*Feature)

AddFeatures adds one or more Feature structs to the "Features" field on Builder

func (*Builder) GetFeature

func (b *Builder) GetFeature(name string) *Feature

GetFeature returns a feature in the dataset based on its name

func (*Builder) Run

func (b *Builder) Run(client endpointClient) error

Run runs the Builder, aggregating all features and managing concurrent operations

func (*Builder) Save

func (b *Builder) Save(writer csv.Writer) error

Save commits the downloaded features to a file

func (*Builder) SaveIf

func (b *Builder) SaveIf(writer csv.Writer, saveCond func(r []string) bool) error

SaveIf saves only those records for which saveCond evaluates to true

type Feature

type Feature struct {
	Name         string
	Endpoint     string  // API Endpoint
	RunFunc      RunFunc // function that performs ad-hoc computation
	RetrieveType int     // Determines if multiple or single requests are made to the api
	// contains filtered or unexported fields
}

Feature is a feature in the dataset, on which other features can be based

func NewFeature

func NewFeature() *Feature

NewFeature creates new Feature with defaults

type RunFunc

type RunFunc func(responses []string) []string // parents map[string]string

RunFunc holds the computation that turns API responses into feature values. It receives an array of JSON strings as the responses. Passing a map of data from the feature's parent features as well is still an open question, as the commented-out parents parameter indicates. In practice, a RunFunc takes serialized API data (JSON or XML), parses it on its own or with utility functions, performs whatever computation is needed, and returns an array of strings to be written to CSV or JSON.
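As an illustration of the description above, here is a minimal sketch of a RunFunc that parses JSON responses and returns one string per response. The `RunFunc` type is redeclared locally so the example is self-contained, and the `text` field is an assumed response shape, not something a particular API guarantees.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RunFunc is redeclared locally with the documented signature.
type RunFunc func(responses []string) []string

// extractText is a hypothetical RunFunc. It assumes each response is a
// JSON object with a "text" field and returns that field's value.
func extractText(responses []string) []string {
	out := make([]string, 0, len(responses))
	for _, resp := range responses {
		var payload struct {
			Text string `json:"text"`
		}
		if err := json.Unmarshal([]byte(resp), &payload); err != nil {
			// Emit an empty value so output stays aligned with input.
			out = append(out, "")
			continue
		}
		out = append(out, payload.Text)
	}
	return out
}

func main() {
	var run RunFunc = extractText
	values := run([]string{`{"text":"hello"}`, `not json`, `{"text":"world"}`})
	fmt.Printf("%q\n", values) // prints ["hello" "" "world"]
}
```

Returning an empty string for malformed input is one possible choice; it pairs well with a SaveIf condition that drops incomplete records.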
