gha

v0.2.0
Published: Jun 6, 2023 License: Apache-2.0 Imports: 0 Imported by: 0

README

gha

Research project of go-faster.

Content based on www.gharchive.org used under the CC-BY-4.0 license.

Utilities for working with GH Archive, a project that records the public GitHub timeline, archives it, and makes it easily accessible for further analysis.
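
GH Archive publishes one gzip-compressed JSON file per hour. As a minimal sketch of how a tool might enumerate those files, the snippet below builds the download URLs for a given day. It assumes the `https://data.gharchive.org/YYYY-MM-DD-H.json.gz` naming scheme described on www.gharchive.org (the hour is not zero-padded). The names `archiveURL` and `dayURLs` are hypothetical and are not part of this package's API.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// baseURL is the public GH Archive download host (assumed from gharchive.org).
const baseURL = "https://data.gharchive.org/"

// archiveURL returns the URL of the hourly archive that contains t.
// GH Archive file names use UTC and a non-padded hour (0..23).
func archiveURL(t time.Time) string {
	t = t.UTC()
	return baseURL + t.Format("2006-01-02") + "-" + strconv.Itoa(t.Hour()) + ".json.gz"
}

// dayURLs returns the 24 hourly archive URLs for a date given as YYYY-MM-DD.
func dayURLs(date string) ([]string, error) {
	day, err := time.Parse("2006-01-02", date)
	if err != nil {
		return nil, fmt.Errorf("parse date: %w", err)
	}
	urls := make([]string, 0, 24)
	for h := 0; h < 24; h++ {
		urls = append(urls, archiveURL(day.Add(time.Duration(h)*time.Hour)))
	}
	return urls, nil
}

func main() {
	urls, err := dayURLs("2015-01-01")
	if err != nil {
		panic(err)
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

Each URL corresponds to one hourly file, so a downloader built on this could record a per-file state for every hour it attempts to fetch.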

[
  {
    "input": "1693 GB",
    "content": "13 TB",
    "output": "1191 GB"
  }
]
[
  {
    "state": "NotFound",
    "count": 319
  },
  {
    "state": "Ready",
    "count": 68952
  }
]

Results

Missing chunks

319 chunks are missing (state NotFound), against 68952 chunks that are Ready. It is unclear whether the missing chunks can be restored, but this is not critical.
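
To put the missing chunks in proportion, here is a minimal sketch that decodes a state report shaped like the JSON above and computes the missing share. It assumes that `NotFound` and `Ready` are the only states and that the total is their sum. The `stateCount` type and `summarize` function are hypothetical names used for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stateCount mirrors one entry of the state report shown in the README.
type stateCount struct {
	State string `json:"state"`
	Count int    `json:"count"`
}

// summarize returns the number of missing chunks and the total number of
// chunks, treating every "NotFound" entry as missing (an assumption).
func summarize(data []byte) (missing, total int, err error) {
	var rows []stateCount
	if err = json.Unmarshal(data, &rows); err != nil {
		return 0, 0, fmt.Errorf("decode report: %w", err)
	}
	for _, r := range rows {
		total += r.Count
		if r.State == "NotFound" {
			missing += r.Count
		}
	}
	return missing, total, nil
}

func main() {
	report := []byte(`[{"state":"NotFound","count":319},{"state":"Ready","count":68952}]`)
	missing, total, err := summarize(report)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d of %d chunks missing (%.2f%%)\n",
		missing, total, 100*float64(missing)/float64(total))
}
```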

Incomplete repo language data

Language data is not included in events.

There is an incomplete public dataset (only about 3 million repos):

SELECT * FROM `bigquery-public-data.github_repos.languages`;

However, many popular repositories are missing from it, so their language data has to be retrieved manually.

Source

Programming languages by repository as reported by GitHub's https://developer.github.com/v3/repos/#list-languages API

Properties
  • No repo id, just name
  • Probably no removed or renamed repos
  • ~3 million entries
  • Language data is an array of (language name, bytes) pairs
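
The properties above suggest a simple in-memory shape for one dataset row. The following is a sketch only: the `Language` and `RepoLanguages` types and the `primary` helper are hypothetical, and the sample row is made up for illustration. Because the dataset has no repo id, the repository name is the only key.

```go
package main

import "fmt"

// Language is one (language name, bytes) pair from the dataset.
type Language struct {
	Name  string
	Bytes int64
}

// RepoLanguages is one row: a repository name (no id is available)
// and the languages reported for it.
type RepoLanguages struct {
	RepoName  string
	Languages []Language
}

// primary returns the language with the most bytes.
// The boolean is false when the row has no language data.
func primary(r RepoLanguages) (Language, bool) {
	var best Language
	found := false
	for _, l := range r.Languages {
		if !found || l.Bytes > best.Bytes {
			best, found = l, true
		}
	}
	return best, found
}

func main() {
	// Hypothetical sample row, not taken from the real dataset.
	row := RepoLanguages{
		RepoName:  "example/repo",
		Languages: []Language{{"Shell", 3000}, {"Go", 120000}},
	}
	if lang, ok := primary(row); ok {
		fmt.Printf("%s: %s (%d bytes)\n", row.RepoName, lang.Name, lang.Bytes)
	}
}
```

Joining such rows with events has to go through the repository name, which is one reason renamed or removed repositories are hard to match.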

Documentation

Overview

Package gha implements GitHub archive tools.

Directories

Path      Synopsis
cmd
internal
app
calc      Package calc implements analysis and calculations on top of github archive data.
ent
oas
