mlflow

Version: v0.0.2
Published: Oct 3, 2023 License: BSD-3-Clause Imports: 23 Imported by: 0

README

mlflow-go

Go MLFlow client.

Supports the Tracking API, with local files and HTTP.

Usage

See the examples in conformance/main.go, or the fully rendered documentation on pkg.go.dev.

Development

Install Bazel using Bazelisk. Some tests require Bazel to run (i.e. they are not run by go test).

If you want to use the go tool instead of / in addition to Bazel, you can install Go on your own or use the version that Bazel downloads.

After a bazel test //..., you should be able to find the Go binary like so:

find -L bazel-mlflow-go/external -wholename "*/bin/go"

Install pre-commit.

Manual tests

Some tests make assumptions about the environment. They can be run with go test -tags manual, or by specifying the exact target to bazel test. When changing code that is not well covered by the unit tests, please run the manual tests.

You can list the manual tests with:

bazel query "attr(tags, '\\bmanual\\b', //...)"

Documentation

Overview

Package mlflow implements an MLFlow client.

It supports the Tracking API, with local files and HTTP. The API is modeled after the official Python client, so the official MLFlow docs may be useful.

Authentication to Databricks-hosted MLFlow is only supported via access token, not via Databricks username and password. Follow the personal access token instructions to get one.

The API is organized into a hierarchy of interfaces:
- Tracking: represents an MLFlow tracking server.
- Experiment: represents an MLFlow experiment.
- Run: represents an MLFlow run.
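The hierarchy above can be walked top down: a Tracking value yields an Experiment, which yields a Run. The following is a minimal sketch under stated assumptions. The three interfaces are trimmed local copies of the documented ones so that the snippet compiles on its own, and memTracking, memExperiment, memRun, and startRun are hypothetical names that are not part of the package. In real code you would import the package and obtain a Tracking from NewTracking instead.

```go
package main

import "fmt"

// Trimmed local copies of the documented interfaces (assumption: only the
// methods this sketch calls are listed; the real interfaces have more).
type Run interface {
	ID() string
	ExperimentID() string
	End() error
}

type Experiment interface {
	CreateRun(name string) (Run, error)
	ID() string
}

type Tracking interface {
	GetOrCreateExperimentWithName(name string) (Experiment, error)
}

// startRun walks the hierarchy: tracking server -> experiment -> run.
func startRun(t Tracking, experimentName, runName string) (Run, error) {
	exp, err := t.GetOrCreateExperimentWithName(experimentName)
	if err != nil {
		return nil, fmt.Errorf("get or create experiment %q: %w", experimentName, err)
	}
	run, err := exp.CreateRun(runName)
	if err != nil {
		return nil, fmt.Errorf("create run %q: %w", runName, err)
	}
	return run, nil
}

// Hypothetical in-memory fakes, used only to make the sketch runnable.
type memRun struct{ id, expID string }

func (r memRun) ID() string           { return r.id }
func (r memRun) ExperimentID() string { return r.expID }
func (r memRun) End() error           { return nil }

type memExperiment struct {
	id   string
	runs int
}

func (e *memExperiment) ID() string { return e.id }

func (e *memExperiment) CreateRun(name string) (Run, error) {
	e.runs++
	return memRun{id: fmt.Sprintf("%s-run-%d", e.id, e.runs), expID: e.id}, nil
}

type memTracking struct{ byName map[string]*memExperiment }

func (t *memTracking) GetOrCreateExperimentWithName(name string) (Experiment, error) {
	if e, ok := t.byName[name]; ok {
		return e, nil
	}
	e := &memExperiment{id: fmt.Sprint(len(t.byName))}
	t.byName[name] = e
	return e, nil
}

func main() {
	tracking := &memTracking{byName: map[string]*memExperiment{}}
	run, err := startRun(tracking, "demo-experiment", "first-run")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("run", run.ID(), "in experiment", run.ExperimentID())
	if err := run.End(); err != nil {
		fmt.Println("error ending run:", err)
	}
}
```

Because startRun only depends on the interfaces, the same function should work unchanged against a file-backed or HTTP-backed store.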

Constants

const (
	TrackingURIEnvName  = "MLFLOW_TRACKING_URI"
	ExperimentIDEnvName = "MLFLOW_EXPERIMENT_ID"
	RunIDEnvName        = "MLFLOW_RUN_ID"
	BearerTokenEnvName  = "MLFLOW_TRACKING_TOKEN"

	// https://www.mlflow.org/docs/latest/tracking.html#system-tags
	GitCommitTagKey   = "mlflow.source.git.commit"
	ParentRunIDTagKey = "mlflow.parentRunId"
	UserTagKey        = "mlflow.user"
	SourceNameTagKey  = "mlflow.source.name"
	SourceTypeTagKey  = "mlflow.source.type"

	SourceTypeJob   = "JOB"
	SourceTypeLocal = "LOCAL"

	HostTagKey = "host"
)
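The tag key constants are meant to be paired with values and set on a run. The sketch below builds a slice of tags describing where a run came from. It assumes the constants and the Tag type are redeclared locally so the snippet compiles on its own. sourceTags is a hypothetical helper, and the commit hash and source name passed in main are placeholders.

```go
package main

import (
	"fmt"
	"os"
)

// Redeclared from the constants above so the snippet is self-contained.
const (
	GitCommitTagKey  = "mlflow.source.git.commit"
	UserTagKey       = "mlflow.user"
	SourceNameTagKey = "mlflow.source.name"
	SourceTypeTagKey = "mlflow.source.type"
	SourceTypeLocal  = "LOCAL"
	HostTagKey       = "host"
)

// Tag mirrors the package's Tag struct.
type Tag struct {
	Key string
	Val string
}

// sourceTags (hypothetical helper) pairs each tag key with a value for a
// run started from a local checkout.
func sourceTags(commit, user, sourceName, host string) []Tag {
	return []Tag{
		{Key: GitCommitTagKey, Val: commit},
		{Key: UserTagKey, Val: user},
		{Key: SourceNameTagKey, Val: sourceName},
		{Key: SourceTypeTagKey, Val: SourceTypeLocal},
		{Key: HostTagKey, Val: host},
	}
}

func main() {
	host, err := os.Hostname()
	if err != nil {
		host = "unknown"
	}
	// "0123abc" and "train.go" are placeholder values.
	for _, t := range sourceTags("0123abc", os.Getenv("USER"), "train.go", host) {
		fmt.Printf("%s=%s\n", t.Key, t.Val)
	}
}
```

With the real package, a slice like this could be passed to the SetTags method of a Run.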

Variables

var (
	ErrUnsupported = errors.New("this operation not supported by this tracking client")
)
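A sentinel error like ErrUnsupported is normally checked with errors.Is, which also matches when the error has been wrapped. The sketch below redeclares the variable locally so it compiles on its own. searchRuns is a hypothetical stand-in, since the documentation does not say which operations return this error. In real code, compare against the package's own ErrUnsupported rather than a local copy.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-in for the package's ErrUnsupported (assumption: real code
// would reference the package variable instead).
var ErrUnsupported = errors.New("this operation not supported by this tracking client")

// searchRuns is a hypothetical operation that a store does not implement.
func searchRuns() error {
	return fmt.Errorf("search runs: %w", ErrUnsupported)
}

// isUnsupported reports whether err is, or wraps, ErrUnsupported.
func isUnsupported(err error) bool {
	return errors.Is(err, ErrUnsupported)
}

func main() {
	if err := searchRuns(); isUnsupported(err) {
		fmt.Println("operation unavailable, skipping:", err)
	}
}
```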

Functions

func LogStructAsParams

func LogStructAsParams(run Run, obj interface{}) error

LogStructAsParams logs the fields of the given obj as params.
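The documentation does not say how field names and values are turned into param keys and values. As an illustration only, the sketch below shows one plausible way to flatten a struct into params with reflection. structToParams and hyperparams are hypothetical names, the choice of the Go field name as key and fmt.Sprint for the value is an assumption, and none of this is the package's actual implementation.

```go
package main

import (
	"fmt"
	"reflect"
)

// Param mirrors the package's Param struct.
type Param struct {
	Key string
	Val string
}

// hyperparams is a hypothetical config struct used for the demonstration.
type hyperparams struct {
	LearningRate float64
	Epochs       int
	Optimizer    string
}

// structToParams (hypothetical) converts the exported fields of a struct,
// or of a pointer to a struct, into Params in field order.
// Assumption: key = Go field name, value = fmt.Sprint of the field.
func structToParams(obj interface{}) ([]Param, error) {
	v := reflect.ValueOf(obj)
	for v.Kind() == reflect.Ptr {
		if v.IsNil() {
			return nil, fmt.Errorf("nil pointer")
		}
		v = v.Elem()
	}
	if v.Kind() != reflect.Struct {
		return nil, fmt.Errorf("expected a struct, got %s", v.Kind())
	}
	t := v.Type()
	params := make([]Param, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if f.PkgPath != "" { // skip unexported fields
			continue
		}
		params = append(params, Param{Key: f.Name, Val: fmt.Sprint(v.Field(i).Interface())})
	}
	return params, nil
}

func main() {
	params, err := structToParams(hyperparams{LearningRate: 0.01, Epochs: 10, Optimizer: "adam"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range params {
		fmt.Printf("%s=%s\n", p.Key, p.Val)
	}
}
```

With the real package, the struct would be passed directly to LogStructAsParams together with a Run.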

func ToURI

func ToURI(path string) string

Types

type ArtifactRepo

type ArtifactRepo interface {
	// LogArtifact logs a single file as an artifact.
	// localPath is the path to the file on the local filesystem.
	// artifactPath is the directory in the artifact repo to upload the file to.
	// Kinda weird, but this is how the Python client does it.
	LogArtifact(localPath, artifactPath string) error
	// LogArtifacts logs all of the files in a directory tree as artifacts.
	// localDir is the path to the directory on the local filesystem.
	// artifactPath is the directory in the artifact repo to upload to.
	LogArtifacts(localDir, artifactPath string) error
}

ArtifactRepo is an interface for logging artifacts. It is generally used indirectly via [Run.LogArtifact].
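To make the LogArtifact argument convention concrete (localPath names a file, artifactPath names a destination directory), here is a minimal sketch of a directory-backed repo. dirRepo and demo are hypothetical names. This is a simplified illustration of the convention described in the interface comments, not the package's FileArtifactRepo.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// dirRepo is a hypothetical, simplified artifact repo rooted at a local
// directory.
type dirRepo struct{ root string }

// LogArtifact copies the file at localPath into root/artifactPath/,
// keeping the file's base name. An empty artifactPath means the root.
func (r dirRepo) LogArtifact(localPath, artifactPath string) error {
	destDir := filepath.Join(r.root, artifactPath)
	if err := os.MkdirAll(destDir, 0o755); err != nil {
		return err
	}
	src, err := os.Open(localPath)
	if err != nil {
		return err
	}
	defer src.Close()
	dst, err := os.Create(filepath.Join(destDir, filepath.Base(localPath)))
	if err != nil {
		return err
	}
	if _, err := io.Copy(dst, src); err != nil {
		dst.Close()
		return err
	}
	return dst.Close()
}

// demo writes a temporary file, logs it under "models", and returns the
// content that ended up in the repo.
func demo() (string, error) {
	tmp, err := os.MkdirTemp("", "artifact-demo-*")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(tmp)

	local := filepath.Join(tmp, "weights.txt")
	if err := os.WriteFile(local, []byte("hello\n"), 0o644); err != nil {
		return "", err
	}
	repo := dirRepo{root: filepath.Join(tmp, "repo")}
	if err := repo.LogArtifact(local, "models"); err != nil {
		return "", err
	}
	data, err := os.ReadFile(filepath.Join(tmp, "repo", "models", "weights.txt"))
	return string(data), err
}

func main() {
	content, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("stored artifact content: %q\n", content)
}
```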

func NewDBFSArtifactRepo

func NewDBFSArtifactRepo(restStore *RESTStore, uri string) (ArtifactRepo, error)

NewDBFSArtifactRepo assumes uri is for the root of a run. Sub-directories are not handled in the same way the Python client handles them.

func NewFileArtifactRepo

func NewFileArtifactRepo(rootDir string) (ArtifactRepo, error)

type DBFSArtifactRepo

type DBFSArtifactRepo struct {
	// contains filtered or unexported fields
}

DBFSArtifactRepo uploads to DBFS (Databricks File System). Generally it is used indirectly via [Run.LogArtifact].

func (*DBFSArtifactRepo) LogArtifact

func (repo *DBFSArtifactRepo) LogArtifact(localPath, artifactPath string) error

Implements [ArtifactRepo.LogArtifact].

func (*DBFSArtifactRepo) LogArtifacts

func (repo *DBFSArtifactRepo) LogArtifacts(localPath, artifactPath string) error

Implements [ArtifactRepo.LogArtifacts].

type Experiment

type Experiment interface {
	CreateRun(name string) (Run, error)
	GetRun(runId string) (Run, error)
	ID() string
}

type FileArtifactRepo

type FileArtifactRepo struct {
	// contains filtered or unexported fields
}

FileArtifactRepo writes to a local file system. Generally it is used indirectly via [Run.LogArtifact].

func (*FileArtifactRepo) LogArtifact

func (repo *FileArtifactRepo) LogArtifact(localPath, artifactPath string) error

func (*FileArtifactRepo) LogArtifacts

func (repo *FileArtifactRepo) LogArtifacts(localPath, artifactPath string) error

type FileStore

type FileStore struct {
	// contains filtered or unexported fields
}

FileStore implements the Tracking interface.

func NewFileStore

func NewFileStore(rootDir string) (*FileStore, error)

func (*FileStore) CreateExperiment

func (fs *FileStore) CreateExperiment(name string) (Experiment, error)

func (*FileStore) ExperimentsByName

func (f *FileStore) ExperimentsByName() (map[string]Experiment, error)

func (*FileStore) GetExperiment

func (fs *FileStore) GetExperiment(id string) (Experiment, error)

func (*FileStore) GetOrCreateExperimentWithName

func (fs *FileStore) GetOrCreateExperimentWithName(name string) (Experiment, error)

Gets or creates an experiment and returns it.

func (*FileStore) SearchRuns

func (fs *FileStore) SearchRuns(experimentIDs []string, filter string, orderBy []string, pageToken string) ([]Run, string, error)

func (*FileStore) UIURL

func (fs *FileStore) UIURL() string

func (*FileStore) URI

func (s *FileStore) URI() string

type Metric

type Metric struct {
	Key string
	Val float64
}

type Param

type Param struct {
	Key string
	Val string
}

type RESTStore

type RESTStore struct {
	// contains filtered or unexported fields
}

RESTStore implements the Tracking interface. See https://www.mlflow.org/docs/latest/rest-api.html for the REST API documentation.

func (*RESTStore) CreateExperiment

func (rs *RESTStore) CreateExperiment(name string) (Experiment, error)

func (*RESTStore) ExperimentsByName

func (rs *RESTStore) ExperimentsByName() (map[string]Experiment, error)

func (*RESTStore) GetExperiment

func (rs *RESTStore) GetExperiment(id string) (Experiment, error)

func (*RESTStore) GetOrCreateExperimentWithName

func (rs *RESTStore) GetOrCreateExperimentWithName(name string) (Experiment, error)

func (*RESTStore) SearchRuns

func (rs *RESTStore) SearchRuns(experimentIDs []string, filter string, orderBy []string, pageToken string) ([]Run, string, error)

func (*RESTStore) UIURL

func (rs *RESTStore) UIURL() string

func (*RESTStore) URI

func (rs *RESTStore) URI() string

type Run

type Run interface {
	SetName(name string) error
	Name() string
	SetTag(key, value string) error
	SetTags(tags []Tag) error
	GetTag(key string) (string, error)
	LogArtifact(localPath, artifactPath string) error
	LogMetric(key string, val float64, step int64) error
	LogMetrics(metrics []Metric, step int64) error
	LogParam(key, value string) error
	LogParams(params []Param) error
	GetParam(key string) (string, error)
	End() error
	Fail() error
	UIURL() string
	ID() string
	ExperimentID() string
}
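Code that only logs metrics rarely needs all of Run. The sketch below declares a narrow interface with three of the methods listed above, so any value that satisfies Run also satisfies it, and uses it to log one metric per step and close the run. metricRun, trainAndLog, and recordingRun are hypothetical names. The sketch assumes End and Fail set the run's terminal status the way their names suggest, and the status strings belong to the fake only.

```go
package main

import "fmt"

// metricRun lists only the Run methods this sketch needs.
type metricRun interface {
	LogMetric(key string, val float64, step int64) error
	End() error
	Fail() error
}

// trainAndLog logs one "loss" value per step. If logging fails it marks
// the run as failed and returns the error; otherwise it ends the run.
func trainAndLog(run metricRun, losses []float64) error {
	for step, loss := range losses {
		if err := run.LogMetric("loss", loss, int64(step)); err != nil {
			if failErr := run.Fail(); failErr != nil {
				return fmt.Errorf("log metric: %v (marking the run failed also failed: %v)", err, failErr)
			}
			return fmt.Errorf("log metric: %w", err)
		}
	}
	return run.End()
}

// recordingRun is a hypothetical in-memory fake for demonstration.
type recordingRun struct {
	steps  []int64
	status string
}

func (r *recordingRun) LogMetric(key string, val float64, step int64) error {
	r.steps = append(r.steps, step)
	return nil
}

func (r *recordingRun) End() error  { r.status = "FINISHED"; return nil }
func (r *recordingRun) Fail() error { r.status = "FAILED"; return nil }

func main() {
	run := &recordingRun{}
	err := trainAndLog(run, []float64{0.9, 0.5, 0.2})
	fmt.Println("status:", run.status, "steps logged:", len(run.steps), "error:", err)
}
```

Accepting the narrow interface keeps functions like trainAndLog easy to unit test without a tracking server.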

func ActiveRunFromConfig

func ActiveRunFromConfig(experimentName string, l *log.Logger, config interface{}) (Run, error)

Same as ActiveRunFromEnv, but uses the given struct as the source of config values. The struct must have string fields whose names match the environment variable names, e.g. TrackingURIEnvName.

func ActiveRunFromEnv

func ActiveRunFromEnv(experimentName string, l *log.Logger) (Run, error)

Returns the singleton active run. If it has not been set, a new run will be created in the experiment named experimentName. If experimentName is not set, falls back to:
1. The value of the ExperimentIDEnvName environment variable.
2. The experiment with ID "0".

Differences from the Python client:
- Doesn't create nested runs.
- No automatic switching to a new run if the active run finishes.
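The fallback order for choosing the experiment can be sketched as a small pure function. This illustrates the documented order only and is not the package's implementation. resolveExperiment and experimentRef are hypothetical names, and the sketch assumes that an empty string means "not set" for both the experimentName argument and the environment variable.

```go
package main

import (
	"fmt"
	"os"
)

// Redeclared from the package constants so the snippet is self-contained.
const ExperimentIDEnvName = "MLFLOW_EXPERIMENT_ID"

// experimentRef (hypothetical) says how to look an experiment up:
// by name when ByName is true, otherwise by ID.
type experimentRef struct {
	ByName bool
	Value  string
}

// resolveExperiment follows the documented order:
// explicit name, then the experiment ID environment variable, then "0".
func resolveExperiment(experimentName string, getenv func(string) string) experimentRef {
	if experimentName != "" {
		return experimentRef{ByName: true, Value: experimentName}
	}
	if id := getenv(ExperimentIDEnvName); id != "" {
		return experimentRef{Value: id}
	}
	return experimentRef{Value: "0"}
}

func main() {
	ref := resolveExperiment("", os.Getenv)
	fmt.Printf("by name: %t, value: %q\n", ref.ByName, ref.Value)
}
```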

Example
run, err := ActiveRunFromEnv("", log.Default())
if err != nil {
	panic(err)
}
for i := int64(0); i < 10; i++ {
	run.LogMetric("metric0", float64(i+1), i)
}
run.SetTag("tag0", "value0")
run.LogParam("param0", "value0")

tempDir, err := os.MkdirTemp("", "*")
if err != nil {
	panic(err)
}
artifactPath := filepath.Join(tempDir, "artifact0.txt")
if err = os.WriteFile(artifactPath, []byte("hello\n"), 0644); err != nil {
	panic(err)
}
if err = run.LogArtifact(artifactPath, ""); err != nil {
	panic(err)
}

if err = run.End(); err != nil {
	panic(err)
}
Output:

type Tag

type Tag struct {
	Key string
	Val string
}

type Tracking

type Tracking interface {
	ExperimentsByName() (map[string]Experiment, error)
	CreateExperiment(name string) (Experiment, error)
	GetOrCreateExperimentWithName(name string) (Experiment, error)
	GetExperiment(id string) (Experiment, error)
	URI() string
	UIURL() string
	// Returns (matching runs, next page token, error)
	SearchRuns(experimentIDs []string, filter string, orderBy []string, pageToken string) ([]Run, string, error)
}

Tracking is an interface for an MLFlow tracking server. It is generally used indirectly via ActiveRunFromEnv, or you can create one with NewTracking.
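SearchRuns returns one page of runs plus a token for the next page, so collecting every match takes a loop. The sketch below assumes that an empty pageToken requests the first page and that an empty returned token means there are no more pages. The documentation above does not state either convention. Run is reduced to a one-method local interface so the snippet compiles on its own, and allRuns, runSearcher, fakeStore, and fakeRun are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
)

// Run is a trimmed local stand-in for the package's Run interface.
type Run interface{ ID() string }

// runSearcher is the single Tracking method this sketch depends on.
type runSearcher interface {
	SearchRuns(experimentIDs []string, filter string, orderBy []string, pageToken string) ([]Run, string, error)
}

// allRuns follows page tokens until the store returns an empty token
// (assumption: empty token means the last page was reached).
func allRuns(s runSearcher, experimentIDs []string, filter string) ([]Run, error) {
	var all []Run
	token := ""
	for {
		page, next, err := s.SearchRuns(experimentIDs, filter, nil, token)
		if err != nil {
			return nil, err
		}
		all = append(all, page...)
		if next == "" {
			return all, nil
		}
		token = next
	}
}

// fakeRun and fakeStore are in-memory fakes; the token is a page index.
type fakeRun string

func (r fakeRun) ID() string { return string(r) }

type fakeStore struct{ pages [][]Run }

func (f fakeStore) SearchRuns(_ []string, _ string, _ []string, pageToken string) ([]Run, string, error) {
	i := 0
	if pageToken != "" {
		n, err := strconv.Atoi(pageToken)
		if err != nil || n < 0 || n >= len(f.pages) {
			return nil, "", fmt.Errorf("bad page token %q", pageToken)
		}
		i = n
	}
	if i >= len(f.pages) {
		return nil, "", nil
	}
	next := ""
	if i+1 < len(f.pages) {
		next = strconv.Itoa(i + 1)
	}
	return f.pages[i], next, nil
}

func main() {
	store := fakeStore{pages: [][]Run{{fakeRun("a"), fakeRun("b")}, {fakeRun("c")}}}
	runs, err := allRuns(store, []string{"0"}, "")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range runs {
		fmt.Println(r.ID())
	}
}
```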

func NewRESTStore

func NewRESTStore(baseURL, bearerToken string) (Tracking, error)

func NewTracking

func NewTracking(uri, bearerToken string, l *log.Logger) (Tracking, error)
