lmr

command module
v0.0.0-...-bd4732b
Published: Feb 10, 2024 License: MIT Imports: 7 Imported by: 0

README

LMR: Little MapReduce (or Local MapReduce)

lmr is a tool for executing MapReduce-shaped tasks on a workstation. It's similar to tools like xargs or parallel but provides a bit more structure around job execution, failure handling, caching, and output management.

LMR is currently experimental, and may undergo significant API changes before a proper v1 release. Depend on it at your own risk.

Reference

Mapper Protocol

The mapper is the first argument and is required. It can be a binary on the PATH, any executable file, or a shell command. lmr executes the mapper once per input chunk and provides the chunk on stdin. By default, anything the mapper writes to stdout is grouped under a "default" key. Additional output keys can be created by writing files to the "results" directory, whose path is passed to the mapper in the LMR_RESULTS_DIR environment variable. For example, executing echo ... > $LMR_RESULTS_DIR/foo would produce an output with the foo key.

Roadmap

  • Mapper
    • Better error messages on stage failure
    • Configurable parallelism
    • Optional Resubmission
    • Chunk output caching
      • Cache management commands
      • Configurable Cache Size
    • Progress Bar
    • Performance stats on map stages
  • Reducer
    • Keyed output protocol. E.g. what happens when there are multiple keys?
    • Script reducer
    • Canned reducers
      • Concat + custom separator
      • JSON array
      • Sum
  • Project
    • Code Health
      • CI
      • Tests
      • Lints
      • Separate Modules
    • Example in docs
    • Binary builds

Documentation

Overview

Command lmr is the main entrypoint for Little MapReduce.
