hercules

v1.0.0-...-af2d8db
Published: Jun 11, 2017 License: MIT Imports: 15 Imported by: 0

README

Hercules

This tool calculates the line burnout stats in a Git repository. It does exactly what git-of-theseus does, but using go-git. Why? source{d} builds its own data pipeline to process every Git repository in the world, and the calculation of the annual burnout ratio will be embedded into it. This project is an open source implementation of the specific git blame flavour on top of go-git. Blaming is done incrementally using the custom RB tree tracking algorithm, and only the last modification date is recorded.

There are two tools: hercules and labours.py. The first is a program written in Go which collects the burnout stats from a Git repository. The second is a Python script which draws the stack area plot and optionally resamples the time series. The two tools are normally used together through a pipe. hercules prints its results in plain text. The first line holds four numbers: the UNIX timestamp of the repository's creation, the UNIX timestamp of the last commit, the granularity, and the sampling. Granularity is the number of days each band in the stack spans. Sampling is the interval in days between consecutive snapshots of the burnout state. The smaller the sampling value, the smoother the plot but the more work is done.
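To make the header line concrete, here is a minimal Go sketch that reads the four numbers. It assumes the numbers are whitespace-separated integers, which the text above does not state explicitly; the Header type, its field names and ParseHeader are illustrative and not part of hercules.

```go
package main

import (
	"fmt"
	"time"
)

// Header mirrors the four numbers on the first line of hercules' output.
// The type and field names are hypothetical, chosen for this sketch only.
type Header struct {
	Start       int64 // UNIX timestamp of the repository's creation
	Last        int64 // UNIX timestamp of the last commit
	Granularity int   // days spanned by each band
	Sampling    int   // days between consecutive snapshots
}

// ParseHeader reads four whitespace-separated integers from line.
// Whitespace separation is an assumption of this sketch.
func ParseHeader(line string) (Header, error) {
	var h Header
	_, err := fmt.Sscan(line, &h.Start, &h.Last, &h.Granularity, &h.Sampling)
	return h, err
}

func main() {
	// The sample line is made up for illustration.
	h, err := ParseHeader("1112911993 1497139200 30 30")
	if err != nil {
		panic(err)
	}
	days := time.Unix(h.Last, 0).Sub(time.Unix(h.Start, 0)).Hours() / 24
	fmt.Printf("%.0f days of history, %d-day bands, a snapshot every %d days\n",
		days, h.Granularity, h.Sampling)
}
```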

git/git image

torvalds/linux burndown (granularity 30, sampling 30, resampled by year)

There is an option to resample the bands inside labours.py, so that you can define a very precise distribution and visualize it in different ways. Besides, resampling aligns the bands across periodic boundaries, e.g. months or years. Unresampled bands are not aligned and start from the project's birth date.

There is a presentation available.

Installation

You are going to need Go and Python 2 or 3.

go get gopkg.in/src-d/hercules.v1/cmd/hercules
pip install pandas seaborn
wget https://github.com/src-d/hercules/raw/master/labours.py

Usage

# Use "memory" go-git backend and display the plot. This is the fastest but the repository data must fit into RAM.
hercules https://github.com/src-d/go-git | python3 labours.py --resample month
# Use "file system" go-git backend and print the raw data.
hercules /path/to/cloned/go-git
# Use "file system" go-git backend, cache the cloned repository to /tmp/repo-cache and display the unresampled plot.
hercules https://github.com/git/git /tmp/repo-cache | python3 labours.py --resample raw

# Now something fun
# Get the linear history from git rev-list, reverse it
# Pipe to hercules, produce the snapshots for every 30 days grouped by 30 days
# Save the raw data to cache.txt, so that later you can simply run cat cache.txt | python3 labours.py
# Pipe the raw data to labours.py, set text font size to 16pt, use Agg matplotlib backend and save the plot to git.png
git rev-list HEAD | tac | hercules -commits - https://github.com/git/git | tee cache.txt | python3 labours.py --font-size 16 --backend Agg --output git.png

Option -files additionally prints the corresponding burndown table for every file in the repository.

Caveats

  1. Currently, go-git's "file system" backend is much slower than the in-memory one, so you should clone repos instead of reading them from disk whenever possible.

License

MIT.

Documentation

Overview

Package hercules contains the functions which are needed to gather the line burndown statistics from a Git repository.

Analyser is the main object which concentrates the high level logic. It provides Commits() and Analyse() methods to get the work done. The following example was taken from cmd/hercules:

var repository *git.Repository
// ... initialize repository ...
analyser := hercules.Analyser{
	Repository: repository,
	OnProgress: func(commit, length int) {
		fmt.Fprintf(os.Stderr, "%d / %d\r", commit, length)
	},
	Granularity:         30,
	Sampling:            15,
	SimilarityThreshold: 90,
	Debug:               false,
}
commits := analyser.Commits()  // or specify a custom list
// Analyse returns two values; the second one holds the per-file statistics.
statuses, _ := analyser.Analyse(commits)
// statuses is [y][x]int64 where y is the snapshot index and x is the granulated time index.

As commented in the code, the list of commits can be any valid slice of *object.Commit. The returned statuses slice of slices is a rectangular 2D matrix: the number of rows equals the repository's lifetime divided by the sampling value (detail factor), and the number of columns equals the repository's lifetime divided by the granularity value (number of bands).
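As a rough illustration of the matrix shape described above, the following Go sketch estimates the row and column counts from the lifetime in days. Rounding up is an assumption of this sketch; the exact rounding and any off-by-one in hercules itself may differ. matrixShape is a hypothetical helper.

```go
package main

import "fmt"

// matrixShape estimates the dimensions of the statuses matrix:
// rows ~ lifetime / sampling, columns ~ lifetime / granularity.
// Rounding up is an assumption, not confirmed behaviour of hercules.
func matrixShape(lifetimeDays, sampling, granularity int) (rows, cols int) {
	ceilDiv := func(a, b int) int { return (a + b - 1) / b }
	return ceilDiv(lifetimeDays, sampling), ceilDiv(lifetimeDays, granularity)
}

func main() {
	rows, cols := matrixShape(365, 15, 30)
	fmt.Printf("about %d snapshots of about %d bands each\n", rows, cols)
}
```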

Analyser depends heavily on https://github.com/src-d/go-git and leverages the diff algorithm through https://github.com/sergi/go-diff.

Besides, hercules defines File and RBTree. These are low level data structures required by Analyser. File carries an instance of RBTree and the current line burndown state. RBTree implements the red-black balanced binary tree and is based on https://github.com/yasushi-saito/rbtree.

Constants

const TreeEnd int = -1

TreeEnd denotes the value of the last leaf in the tree.

Variables

This section is empty.

Functions

This section is empty.

Types

type Analyser

type Analyser struct {
	// Repository points to the analysed Git repository struct from go-git.
	Repository *git.Repository
	// Granularity sets the size of each band - the number of days it spans.
	// Smaller values provide better resolution but require more work and eat more
	// memory. 30 days is usually enough.
	Granularity int
	// Sampling sets how detailed is the statistic - the size of the interval in
	// days between consecutive measurements. It is usually a good idea to set it
	// <= Granularity. Try 15 or 30.
	Sampling int
	// SimilarityThreshold adjusts the heuristic to determine file renames.
	// It has the same units as cgit's -X rename-threshold or -M. Better to
	// set it to the default value of 90 (90%).
	SimilarityThreshold int
	// Debug activates the debugging mode. Analyse() runs slower in this mode
	// but it accurately checks all the intermediate states for invariant
	// violations.
	Debug bool
	// OnProgress is the callback which is invoked in Analyse() to output its
	// progress. The first argument is the number of processed commits and the
	// second is the total number of commits.
	OnProgress func(int, int)
}

Analyser gathers the line burndown statistics for a Git repository.

func (*Analyser) Analyse

func (analyser *Analyser) Analyse(commits []*object.Commit) ([][]int64, map[string][][]int64)

Analyse calculates the line burndown statistics for the bound repository.

commits is a slice with the sequential commit history. It shall start from the root (ascending order).

Returns the list of snapshots of the cumulative line edit times and similar lists for every file which is alive in HEAD. The number of snapshots (the first dimension of [][]int64) depends on Analyser.Sampling: the larger Sampling is, the fewer snapshots there are. The length of each snapshot depends on Analyser.Granularity: the larger Granularity is, the shorter each snapshot.

func (*Analyser) Commits

func (analyser *Analyser) Commits() []*object.Commit

Commits returns the critical path in the repository's history. It starts from HEAD and traces commits backwards till the root. When it encounters a merge (more than one parent), it always chooses the first parent.
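The first-parent walk can be sketched without go-git by using a toy commit type. The commit struct and firstParentPath below are stand-ins invented for this sketch. Reversing the result to root-first order is also an assumption, made because Analyse() expects ascending order.

```go
package main

import "fmt"

// commit is a toy stand-in for go-git's *object.Commit.
type commit struct {
	id      string
	parents []*commit
}

// firstParentPath walks from head back to the root, always following the
// first parent at merges, then reverses the path so the root comes first.
func firstParentPath(head *commit) []*commit {
	var path []*commit
	for c := head; c != nil; {
		path = append(path, c)
		if len(c.parents) == 0 {
			break // reached the root
		}
		c = c.parents[0]
	}
	for i, j := 0, len(path)-1; i < j; i, j = i+1, j-1 {
		path[i], path[j] = path[j], path[i]
	}
	return path
}

func main() {
	root := &commit{id: "root"}
	a := &commit{id: "a", parents: []*commit{root}}
	b := &commit{id: "b", parents: []*commit{root}}
	merge := &commit{id: "merge", parents: []*commit{a, b}}
	for _, c := range firstParentPath(merge) {
		fmt.Println(c.id) // the side branch "b" is never visited
	}
}
```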

type File

type File struct {
	// contains filtered or unexported fields
}

File encapsulates a balanced binary tree to store line intervals and a cumulative mapping of values to the corresponding length counters. Users are not supposed to create File-s directly; instead, they should call NewFile(). NewFileFromTree() is the special constructor which is useful in the tests.

Len() returns the number of lines in File.

Update() mutates File by introducing tree structural changes and updating the length mapping.

Dump() writes the tree to a string and Validate() checks the tree integrity.

func NewFile

func NewFile(time int, length int, statuses ...map[int]int64) *File

NewFile initializes a new instance of File struct.

time is the starting value of the first node;

length is the starting length of the tree (the key of the second and the last node);

statuses are the attached interval length mappings.

func NewFileFromTree

func NewFileFromTree(keys []int, vals []int, statuses ...map[int]int64) *File

NewFileFromTree is an alternative constructor for File which is used in tests. The resulting tree is validated with Validate() to ensure the initial integrity.

keys is a slice with the starting tree keys.

vals is a slice with the starting tree values. Must match the size of keys.

statuses are the attached interval length mappings.
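One way to read the keys and vals slices is as a compact list of line intervals: each node starts an interval at line index key carrying the value val, and the last node only marks the end. The Go sketch below expands such a list into one value per line. This interpretation is inferred from the descriptions of NewFile() and Validate(), and expand is a hypothetical helper.

```go
package main

import "fmt"

// expand turns interval nodes into one value per line.
// Node i covers lines keys[i] .. keys[i+1]-1 and the last node is only
// an end marker, so its value is never emitted.
func expand(keys, vals []int) []int {
	var lines []int
	for i := 0; i+1 < len(keys); i++ {
		for line := keys[i]; line < keys[i+1]; line++ {
			lines = append(lines, vals[i])
		}
	}
	return lines
}

func main() {
	// Lines 0-2 were last modified at time 5, lines 3-9 at time 7;
	// the node with key 10 carries -1 (TreeEnd) and closes the file.
	lines := expand([]int{0, 3, 10}, []int{5, 7, -1})
	fmt.Println(len(lines), lines)
}
```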

func (*File) Dump

func (file *File) Dump() string

Dump formats the underlying line interval tree into a string. Useful for error messages, panic()-s and debugging.

func (*File) Len

func (file *File) Len() int

Len returns the File's size - that is, the maximum key in the tree of line intervals.

func (*File) Status

func (file *File) Status(index int) map[int]int64

func (*File) Update

func (file *File) Update(time int, pos int, ins_length int, del_length int)

Update modifies the underlying tree to adapt to the specified line changes.

time is the time when the requested changes are made. Sets the values of the inserted nodes.

pos is the index of the line at which the changes are introduced.

ins_length is the number of inserted lines after pos.

del_length is the number of removed lines after pos. Deletions come before the insertions.

The code inside this function is probably the most important one throughout the project. It is extensively covered with tests. If you find a bug, please add the corresponding case in file_test.go.

func (*File) Validate

func (file *File) Validate()

Validate checks the underlying line interval tree integrity. The checks are as follows:

1. The minimum key must be 0 because the first line index is always 0.

2. The last node must carry TreeEnd value. This is the maintained invariant which marks the ending of the last line interval.

3. Node keys must monotonically increase and never duplicate.
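The three checks above can be sketched over parallel key and value slices. Unlike File.Validate(), which returns nothing, this hypothetical validate function returns an error so that it is easy to experiment with. The treeEnd constant mirrors hercules.TreeEnd.

```go
package main

import (
	"errors"
	"fmt"
)

const treeEnd = -1 // mirrors hercules.TreeEnd

// validate applies the three integrity checks to nodes given in key order.
func validate(keys, vals []int) error {
	if len(keys) == 0 || len(keys) != len(vals) {
		return errors.New("keys and vals must be non-empty and of equal length")
	}
	if keys[0] != 0 {
		return errors.New("the minimum key must be 0")
	}
	if vals[len(vals)-1] != treeEnd {
		return errors.New("the last node must carry TreeEnd")
	}
	for i := 1; i < len(keys); i++ {
		if keys[i] <= keys[i-1] {
			return errors.New("keys must strictly increase")
		}
	}
	return nil
}

func main() {
	fmt.Println(validate([]int{0, 3, 10}, []int{5, 7, treeEnd}))
	fmt.Println(validate([]int{0, 3, 3}, []int{5, 7, treeEnd}))
}
```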

type Item

type Item struct {
	// contains filtered or unexported fields
}

Item is the object stored in each tree node.

type Iterator

type Iterator struct {
	// contains filtered or unexported fields
}

Iterator allows scanning tree elements in sort order.

Iterator invalidation rule is the same as C++ std::map<>'s. That is, if you delete the element that an iterator points to, the iterator becomes invalid. For other operation types, the iterator remains valid.

func (Iterator) Equal

func (iter Iterator) Equal(iter_ Iterator) bool

func (Iterator) Item

func (iter Iterator) Item() *Item

Return the current element. Allows mutating the node (key to be changed with care!).

REQUIRES: !iter.Limit() && !iter.NegativeLimit()

func (Iterator) Limit

func (iter Iterator) Limit() bool

Check if the iterator points beyond the max element in the tree

func (Iterator) Max

func (iter Iterator) Max() bool

Check if the iterator points to the maximum element in the tree

func (Iterator) Min

func (iter Iterator) Min() bool

Check if the iterator points to the minimum element in the tree

func (Iterator) NegativeLimit

func (iter Iterator) NegativeLimit() bool

Check if the iterator points before the minimum element in the tree

func (Iterator) Next

func (iter Iterator) Next() Iterator

Create a new iterator that points to the successor of the current element.

REQUIRES: !iter.Limit()

func (Iterator) Prev

func (iter Iterator) Prev() Iterator

Create a new iterator that points to the predecessor of the current node.

REQUIRES: !iter.NegativeLimit()

type RBTree

type RBTree struct {
	// contains filtered or unexported fields
}

RBTree created by Yaz Saito on 06/10/12.

A red-black tree with an API similar to C++ STL's.

The implementation is inspired (read: stolen) from: http://en.literateprograms.org/Red-black_tree_(C)#chunk use:private function prototypes.

The code was optimized for the simple integer types of key and value.

func (*RBTree) DeleteWithIterator

func (root *RBTree) DeleteWithIterator(iter Iterator)

Delete the current item.

REQUIRES: !iter.Limit() && !iter.NegativeLimit()

func (*RBTree) DeleteWithKey

func (root *RBTree) DeleteWithKey(key int) bool

Delete an item with the given key. Return true iff the item was found.

func (*RBTree) FindGE

func (root *RBTree) FindGE(key int) Iterator

Find the smallest element N such that N >= key, and return the iterator pointing to the element. If no such element is found, return root.Limit().

func (*RBTree) FindLE

func (root *RBTree) FindLE(key int) Iterator

Find the largest element N such that N <= key, and return the iterator pointing to the element. If no such element is found, return root.NegativeLimit().
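The semantics of FindGE and FindLE can be illustrated on a sorted slice of keys with the standard sort package. In this sketch an index stands in for an Iterator: len(keys) plays the role of Limit() and -1 plays the role of NegativeLimit(). The findGE and findLE functions are hypothetical and do not touch RBTree.

```go
package main

import (
	"fmt"
	"sort"
)

// findGE returns the index of the smallest key >= key,
// or len(keys) if there is none (the analogue of Limit()).
func findGE(keys []int, key int) int {
	return sort.SearchInts(keys, key)
}

// findLE returns the index of the largest key <= key,
// or -1 if there is none (the analogue of NegativeLimit()).
func findLE(keys []int, key int) int {
	return sort.Search(len(keys), func(i int) bool { return keys[i] > key }) - 1
}

func main() {
	keys := []int{0, 3, 10} // must be sorted in ascending order
	fmt.Println(findGE(keys, 4), findLE(keys, 4))
	fmt.Println(findGE(keys, 11), findLE(keys, -1))
}
```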

func (*RBTree) Get

func (root *RBTree) Get(key int) *int

A convenience function for finding an element equal to key. Return nil if not found.

func (*RBTree) Insert

func (root *RBTree) Insert(item Item) (bool, Iterator)

Insert an item. If the item is already in the tree, do nothing and return false. Else return true.

func (*RBTree) Len

func (root *RBTree) Len() int

Return the number of elements in the tree.

func (*RBTree) Limit

func (root *RBTree) Limit() Iterator

Create an iterator that points beyond the maximum item in the tree

func (*RBTree) Max

func (root *RBTree) Max() Iterator

Create an iterator that points at the maximum item in the tree

If the tree is empty, return NegativeLimit()

func (*RBTree) Min

func (root *RBTree) Min() Iterator

Create an iterator that points to the minimum item in the tree.

If the tree is empty, return Limit().

func (*RBTree) NegativeLimit

func (root *RBTree) NegativeLimit() Iterator

Create an iterator that points before the minimum item in the tree

Directories

Path: cmd/hercules
Synopsis: Package main provides the command line tool to gather the line burndown statistics from Git repositories.
