gitdb

v0.0.0-...-4703812
Published: Sep 17, 2015 License: MIT Imports: 16 Imported by: 0

README

gitdb

A lightweight Go library that syncs git objects between a database and the filesystem.

Features

  • Sync git objects between filesystem and database, incrementally.
  • Read git trees and blobs from database directly.

Dependencies

  • git binary, 2.5.1 tested
  • go, 1.5 linux/amd64 tested

Core API by examples

To create the database table, which is required by gitdb:

db, err := sql.Open(...)
gitdb.CreateTable(db)

To import the git repo located at "/foo/bar" into the database:

gitdb.Import(db, "/foo/bar", "HEAD")

To export git objects to filesystem and update its HEAD:

oid := gitdb.Oid("d18eb8215851573416b558cdf224c49580731249")
gitdb.Export(db, "/foo/bar", oid, "HEAD")

To read file paths and contents of a tree (and all subtrees) from database:

// oid can be either a commit or a tree
oid := gitdb.Oid("d18eb8215851573416b558cdf224c49580731249")
_, oids, paths, err := gitdb.ReadTree(db, oid)
if err != nil {
    panic(err)
}
contents, err := gitdb.ReadBlobs(db, oids)
if err != nil {
    panic(err)
}
for i, p := range paths {
    fmt.Println(p, oids[i], string(contents[i]))
}

FAQ

Q: Why sync git objects to database?

A: Easier deployment, especially for applications that use git, run on multiple instances, and have a central database but no central filesystem.

Q: Does gitdb scale?

A: Sadly git does not scale and neither does gitdb. Repos with thousands of commits probably won't perform well. A lot of small repos should be okay, as long as GC performance is not important.

Things could be much better with recursive SQL queries (Common Table Expressions). However, gitdb must support MySQL 5.6, which does not support them. MySQL stored procedures could help, but they would take extra and probably non-portable work. Without recursive queries, gitdb needs more round trips to the database, so database latency is extremely important to gitdb performance. Keep the database and the application as close together as possible.

Q: Will Import and Export ignore existing objects?

A: Yes. Import and Export skip objects that already exist on the destination side. This means that even for a relatively large repo, performance is still acceptable when it is synced frequently.
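
To make the idea of skipping existing objects concrete, here is a minimal sketch of the set difference behind an incremental sync. It is not gitdb's implementation: the function name `missingOids`, the use of plain strings for object IDs, and the in-memory `existing` set are all assumptions made for illustration.

```go
package main

import "fmt"

// missingOids returns the candidates that are not in existing, preserving
// their order and dropping duplicates. Hypothetical helper, not part of gitdb.
func missingOids(existing map[string]bool, candidates []string) []string {
	seen := make(map[string]bool, len(candidates))
	var missing []string
	for _, oid := range candidates {
		if existing[oid] || seen[oid] {
			continue // already on the destination, or already queued
		}
		seen[oid] = true
		missing = append(missing, oid)
	}
	return missing
}

func main() {
	// Object IDs are shortened here for readability.
	existing := map[string]bool{"aaa": true, "bbb": true}
	candidates := []string{"aaa", "ccc", "bbb", "ddd", "ccc"}
	fmt.Println(missingOids(existing, candidates)) // prints [ccc ddd]
}
```

Only the objects in the returned slice would need to be transferred, which is why a frequently synced repo has little work to do on each run.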

Q: Can I use gitdb as a general purpose git library?

A: No. The package is designed to be simple. It even runs the external git binary for some complex tasks. For unsupported tasks such as adding a commit, you can export, make the modifications with another git library or the git binary itself, then import.

Q: Why not use a native git library, instead of executing external git?

A: Because a git library is not simple. A decent Go git library would probably make the codebase much larger. libgit2 is good but not widely installed, and I tried not to introduce non-Go dependencies.

Q: Can I modify the gitobjects table on my own?

A: Please do it only when you understand what you are doing. Deleting or altering rows in gitobjects may break gitdb in several ways.

Documentation

Overview

Package gitdb syncs git objects between database and filesystem.

Example (ExampleCliTool)

An example CLI tool to test Import, Export and GC.

package main

import (
	"database/sql"
	"fmt"
	"github.com/quark-zju/gitdb"
	"os"

	_ "github.com/mattn/go-sqlite3"
)

var db *sql.DB

func main() {
	// To test it, build the program, rename the binary to `gitdbc`, and try:
	//  mkdir -p /tmp/repo/
	//  pushd /tmp/repo && git clone https://gitlab.com/quark/gitdb && popd
	//  gitdbc import /tmp/repo/gitdb
	//
	//  mkdir -p /tmp/repo/gitdb2
	//  pushd /tmp/repo/gitdb2 && git init && popd
	//  OID=`git --git-dir /tmp/repo/gitdb/.git rev-parse HEAD`
	//  gitdbc export /tmp/repo/gitdb2 $OID
	//
	//  gitdbc gc $OID
	usage := func() {
		fmt.Printf("%s i[mport] dir [ref=master]    # import objects from filesystem\n"+
			"%s e[xport] dir oid           # export objects to existing git repo. update master.\n"+
			"%s g[c] oid1 [oid [oid] ...]  # give some reachable oids, delete others\n",
			os.Args[0], os.Args[0], os.Args[0])
		os.Exit(1)
	}

	if len(os.Args) < 3 {
		usage()
	}

	var err error
	db, err = sql.Open("sqlite3", "/tmp/gitdb.sqlite3")
	if err != nil {
		panic(err)
	}
	_, err = gitdb.CreateTable(db)
	if err != nil {
		panic(err)
	}
	defer db.Close()

	switch os.Args[1][0] {
	case 'i': // Import
		ref := "master"
		if len(os.Args) >= 4 {
			ref = os.Args[3]
		}
		dir := os.Args[2]
		oids, rOid, err := gitdb.Import(db, dir, ref)
		if err != nil {
			panic(err)
		}
		fmt.Printf("Imported from %s: %d objects; %s is '%s'.\n", dir, len(oids), ref, rOid)
	case 'e': // Export
		if len(os.Args) < 4 {
			usage()
		}
		dir := os.Args[2]
		oid := gitdb.Oid(os.Args[3])
		oids, err := gitdb.Export(db, dir, oid, "refs/heads/master")
		if err != nil {
			panic(err)
		}
		fmt.Printf("Exported to %s: %d objects; master set to '%s'.\n", dir, len(oids), oid)
	case 'g': // GC
		var oids []gitdb.Oid
		for _, o := range os.Args[2:] {
			oids = append(oids, gitdb.Oid(o))
		}
		tx, err := db.Begin()
		if err != nil {
			panic(err)
		}
		delOids, err := gitdb.GC(tx, oids)
		if err != nil {
			panic(err)
		}
		if err := tx.Commit(); err != nil {
			panic(err)
		}
		fmt.Printf("GC completed. %d objects deleted.\n", len(delOids))
	default:
		usage()
	}
}

Functions

func CreateTable

func CreateTable(db *sql.DB) (sql.Result, error)

CreateTable creates the required git objects table on demand.

func Import

func Import(dt dbOrTx, path string, ref string) (oids []Oid, refOid Oid, err error)

Import syncs git objects from filesystem to database. It is like `git push` running from the filesystem.

dt is either *sql.DB or *sql.Tx. path is the path of the git repository. It can be the `.git` directory, or its parent. ref is the reference string. It can be "HEAD", a tag name, a branch name, a commit hash or its prefix.

Returns oids, refOid, err. oids are imported object IDs. If nothing is imported (the database is up-to-date), oids will be an empty array. refOid is the parsed git object ID (40-char hex string) of the given ref.

func ReadBlobs

func ReadBlobs(dt dbOrTx, oids []Oid) ([][]byte, error)

ReadBlobs reads blob contents from database. It is like `git cat-file --batch` but only returns contents, without type or size information.

dt is either *sql.DB or *sql.Tx. oids are the git object IDs of the blobs to be read.

It is often used after ReadTree.

Note: ReadBlobs does not check git object type. It can be used to read raw contents of other git objects.

Example

Works like `git cat-file --batch='--'`

package main

import (
	"database/sql"
	"fmt"
	"github.com/quark-zju/gitdb"

	_ "github.com/mattn/go-sqlite3"
)

var db *sql.DB

func main() {
	var oids []gitdb.Oid
	for {
		var s string
		if n, _ := fmt.Scan(&s); n != 1 {
			break
		}
		oids = append(oids, gitdb.Oid(s))
	}

	blobs, err := gitdb.ReadBlobs(db, oids)
	if err != nil {
		panic(err)
	}

	for _, b := range blobs {
		fmt.Printf("--\n%s\n", string(b))
	}
	// Sample Output:
	// --
	// foo
	// --
	// bar
}

Types

type Oid

type Oid string

Oid is a 40-char sha1sum in hex form, used as the ID of a git object.
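
For readers unfamiliar with where these IDs come from, the sketch below shows how git derives the ID of a blob: the SHA-1 of the header `blob <size>\x00` followed by the raw content. The function name `blobOid` is hypothetical and not part of gitdb; it only illustrates what the 40 hex characters represent.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// blobOid computes the git object ID of a blob: the SHA-1 of
// "blob <size>\x00" followed by the content, in lowercase hex.
// Hypothetical helper for illustration; gitdb does not export it.
func blobOid(content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content))
	h.Write(content)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// The empty blob has a well-known ID.
	fmt.Println(blobOid(nil)) // prints e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
}
```

The empty-blob ID printed here is the same one that appears for file `a` in the ReadTree sample output.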

func Export

func Export(dt dbOrTx, path string, oid Oid, ref string) ([]Oid, error)

Export syncs git objects from database to filesystem. It is like `git pull` running from the filesystem.

dt is either *sql.DB or *sql.Tx. path is the path of the git repository. It can be the `.git` directory, or its parent. oid is the git object ID in database. ref is the reference string which will be written to the filesystem. It is usually "HEAD". It could also be "refs/tags/foo", or "refs/heads/bar". If ref is an empty string, a generated tag name will be used to make the newly written objects not orphaned.

Returns oids and error. oids is a list of git object IDs exported. If nothing is exported (the git repository in the filesystem is up-to-date), oids will be an empty array.

func GC

func GC(tx *sql.Tx, oids []Oid) ([]Oid, error)

GC removes all objects from database except for oids and their parents and ancestors.

Returns deleted git object IDs.
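
Conceptually, this kind of garbage collection is a mark-and-sweep over the object graph. The following sketch runs on an in-memory map rather than a database and is not gitdb's implementation; the `sweep` function, the `refs` map, and the shortened object names are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// sweep marks every object reachable from roots by following refs, removes
// all other objects from refs, and returns the removed IDs in sorted order.
// refs maps each object ID to the IDs it references (for a commit: its
// parents and its tree; for a tree: its entries). Hypothetical helper.
func sweep(refs map[string][]string, roots []string) []string {
	reachable := make(map[string]bool)
	stack := append([]string(nil), roots...)
	for len(stack) > 0 {
		oid := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if reachable[oid] {
			continue
		}
		reachable[oid] = true
		stack = append(stack, refs[oid]...)
	}

	var deleted []string
	for oid := range refs {
		if !reachable[oid] {
			deleted = append(deleted, oid)
		}
	}
	for _, oid := range deleted {
		delete(refs, oid)
	}
	sort.Strings(deleted)
	return deleted
}

func main() {
	refs := map[string][]string{
		"commit2": {"commit1", "tree2"},
		"commit1": {"tree1"},
		"tree1":   {"blobA"},
		"tree2":   {"blobA", "blobB"},
		"commit3": {"tree3"}, // not reachable from commit2
		"tree3":   {"blobC"},
		"blobA":   nil,
		"blobB":   nil,
		"blobC":   nil,
	}
	fmt.Println(sweep(refs, []string{"commit2"})) // prints [blobC commit3 tree3]
}
```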

func ReadTree

func ReadTree(dt dbOrTx, oid Oid) (modes []int32, oids []Oid, paths []string, err error)

ReadTree reads trees and sub-trees recursively from database. It returns the modes, oids, and full paths of non-tree objects. It is like `git ls-tree -r` but works directly in database.

dt is either *sql.DB or *sql.Tx. oid is the git object ID of a git tree or commit.
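
Because ReadTree returns three parallel slices, callers may find it convenient to group them into one value per file. The sketch below shows one way to do that; the `entry` type and `zipEntries` function are hypothetical, and plain strings stand in for the package's Oid type so the code runs without a database.

```go
package main

import "fmt"

// entry groups one row of a ReadTree-style result. Hypothetical type;
// Oid is a plain string here instead of gitdb.Oid.
type entry struct {
	Mode int32
	Oid  string
	Path string
}

// zipEntries combines three parallel slices into a slice of entries.
// It returns an error if the slices do not have the same length.
func zipEntries(modes []int32, oids, paths []string) ([]entry, error) {
	if len(modes) != len(oids) || len(modes) != len(paths) {
		return nil, fmt.Errorf("length mismatch: %d modes, %d oids, %d paths",
			len(modes), len(oids), len(paths))
	}
	entries := make([]entry, len(modes))
	for i := range modes {
		entries[i] = entry{Mode: modes[i], Oid: oids[i], Path: paths[i]}
	}
	return entries, nil
}

// String formats the entry as octal mode, the literal "blob", oid, a tab,
// and the path. It assumes every entry is a blob.
func (e entry) String() string {
	return fmt.Sprintf("%o blob %s\t%s", e.Mode, e.Oid, e.Path)
}

func main() {
	entries, err := zipEntries(
		[]int32{0100644, 0100644},
		[]string{
			"e69de29bb2d1d6434b8b29ae775ad8c2e48c5391",
			"6a69f92020f5df77af6e8813ff1232493383b708",
		},
		[]string{"a", "b/c"},
	)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Println(e)
	}
}
```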

Example

Works like `git ls-tree -r`

package main

import (
	"database/sql"
	"fmt"
	"github.com/quark-zju/gitdb"

	_ "github.com/mattn/go-sqlite3"
)

var db *sql.DB

func main() {
	modes, oids, paths, err := gitdb.ReadTree(db, "9864be5e4fac9b4108b3412b60ed55e3c7095559")
	if err != nil {
		panic(err)
	}
	for i, mode := range modes {
		fmt.Printf("%o blob %s\t%s\n", mode, oids[i], paths[i])
	}
	// Sample Output:
	// 100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	a
	// 100644 blob 6a69f92020f5df77af6e8813ff1232493383b708	b/c
}

func (Oid) IsValid

func (o Oid) IsValid() bool

IsValid tests whether o is valid by checking whether it is a 40-char sha1sum.
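
A check of this kind could look like the sketch below. This is an assumption about the behaviour, not the package's source: it uses a local stand-in `Oid` type and accepts only lowercase hex digits, whereas the real IsValid may treat case differently.

```go
package main

import "fmt"

// Oid is a local stand-in for gitdb.Oid so this sketch is self-contained.
type Oid string

// isValidOid reports whether o is exactly 40 lowercase hex characters.
// Hypothetical helper; the real IsValid may apply different rules.
func isValidOid(o Oid) bool {
	if len(o) != 40 {
		return false
	}
	for i := 0; i < len(o); i++ {
		c := o[i]
		isDigit := c >= '0' && c <= '9'
		isHexLetter := c >= 'a' && c <= 'f'
		if !isDigit && !isHexLetter {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isValidOid("d18eb8215851573416b558cdf224c49580731249")) // prints true
	fmt.Println(isValidOid("d18eb82"))                                  // prints false
}
```

Validating user-supplied IDs this way before passing them to ReadTree or Export gives an early, clear failure for truncated or mistyped values.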
