criticality_score

module

v2.0.4 Latest Latest Go to latest Published: Apr 30, 2024 License: Apache-2.0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/ossf/criticality_score

Links

Open Source Insights

README ¶

Open Source Project Criticality Score (Beta)

This project is maintained by members of the Securing Critical Projects WG.

Goals

Generate a criticality score for every open source project.
Create a list of critical projects that the open source community depends on.
Use this data to proactively improve the security posture of these critical projects.

Criticality Score

A project's criticality score defines the influence and importance of a project. It is a number between 0 (least-critical) and 1 (most-critical). It is based on the following algorithm by Rob Pike:

We use the following default parameters to derive the criticality score for an open source project:

Parameter (S_i)	Weight (α_i)	Max threshold (T_i)	Description	Reasoning
created_since	1	120	Time since the project was created (in months)	Older project has higher chance of being widely used or being dependent upon.
updated_since	-1	120	Time since the project was last updated (in months)	Unmaintained projects with no recent commits have higher chance of being less relied upon.
contributor_count	2	5000	Count of project contributors (with commits)	Different contributors involvement indicates project's importance.
org_count	1	10	Count of distinct organizations that contributors belong to	Indicates cross-organization dependency.
commit_frequency	1	1000	Average number of commits per week in the last year	Higher code churn has slight indication of project's importance. Also, higher susceptibility to vulnerabilities.
recent_releases_count	0.5	26	Number of releases in the last year	Frequent releases indicates user dependency. Lower weight since this is not always used.
closed_issues_count	0.5	5000	Number of issues closed in the last 90 days	Indicates high contributor involvement and focus on closing user issues. Lower weight since it is dependent on project contributors.
updated_issues_count	0.5	5000	Number of issues updated in the last 90 days	Indicates high contributor involvement. Lower weight since it is dependent on project contributors.
comment_frequency	1	15	Average number of comments per issue in the last 90 days	Indicates high user activity and dependence.
dependents_count	2	500000	Number of project mentions in the commit messages	Indicates repository use, usually in version rolls. This parameter works across all languages, including C/C++ that don't have package dependency graphs (though hack-ish). Plan to add package dependency trees in the near future.

NOTE:

You can override those defaut values at runtime as described below.
We are looking for community ideas to improve upon these parameters.
There will always be exceptions to the individual reasoning rules.

Usage

$ go install github.com/ossf/criticality_score/v2/cmd/criticality_score@latest

$ export GITHUB_TOKEN=...         # requires a GitHub token to work
$ gcloud auth login --update-adc  # optional, add -depsdev-disable to skip

$ criticality_score -gcp-project-id=[your projectID] https://github.com/kubernetes/kubernetes
repo.name: kubernetes
repo.url: https://github.com/kubernetes/kubernetes
repo.language: Go
repo.license: Apache License 2.0
legacy.created_since: 87
legacy.updated_since: 0
legacy.contributor_count: 3999
legacy.watchers_count: 79583
legacy.org_count: 5
legacy.commit_frequency: 97.2
legacy.recent_releases_count: 70
legacy.updated_issues_count: 5395
legacy.closed_issues_count: 3062
legacy.comment_frequency: 5.5
legacy.dependents_count: 454393
default_score: 0.99107

The score can be changed by using the -scoring-config parameter and supplying a different configuration file to specify how the score is calculated.

By default the original_pike.yml configuration is used to calculate the score. However, other config files can be supplied to produce different scores. See config/scorer for more.

Feel free to copy one of the configurations and adjust the weights and thresholds to suit your needs.

Authentication

Before running criticality score, you need to:

For GitHub repos, you need to create a GitHub access token and set it in environment variable GITHUB_AUTH_TOKEN. This helps to avoid the GitHub's api rate limits with unauthenticated requests.

# For posix platforms, e.g. linux, mac:
export GITHUB_AUTH_TOKEN=<your access token>

# For windows:
set GITHUB_AUTH_TOKEN=<your access token>

Formatting Results

There are three formats currently: text, json, and csv. Others may be added in the future.

These may be specified with the -format flag.

Other Commands

The criticality score project also has other commands for generating and working with criticality score data.

enumerate_github: a tool for accurately collecting a set of GitHub repos with a minimum number of stars
collect_signals: a worker for collecting raw signals at scale by leveraging the Scorecard project's infrastructure.
scorer: a tool for recalculating criticality scores based on an input CSV file.

Public Data

If you're interested in seeing a list of critical projects with their criticality score, we publish them in csv format and a BigQuery dataset.

This data is generated using a production instance of the criticality score project running in GCP. Details for how this is deployed can be found in the infra directory.

NOTE: Currently, these lists are derived from projects hosted on GitHub ONLY. We do plan to expand them in near future to account for projects hosted on other source control systems.

CSV data

The data is available on Google Cloud Storage and can be downloaded via:

web browser: commondatastorage.googleapis.com/ossf-criticality-score/index.html
gsutil command-line tool: gsutil ls gs://ossf-criticality-score/

BigQuery Dataset

This data is available in the public BigQuery dataset.

With a GCP account you can run queries across the data. For example, here is a query returning the top 100 repos by score:

  SELECT repo.url, default_score
    FROM `openssf.criticality_score_cron.criticality-score-v0-latest`
ORDER BY default_score DESC
   LIMIT 100;

Contributing

If you want to get involved or have ideas you'd like to chat about, we discuss this project in the Securing Critical Projects WG meetings.

See the Community Calendar for the schedule and meeting invitations.

See the Contributing documentation for guidance on how to contribute.

Directories ¶

Path	Synopsis
cmd
collect_signals
collect_signals/localworker
collect_signals/vcs
criticality_score
criticality_score/inputiter
csv_transfer
enumerate_github
enumerate_github/githubsearch
enumerate_github/marker
enumerate_github/repowriter
scorer The scorer command is used for calculating the criticality score for signals generated by the collect_signals command.	The scorer command is used for calculating the criticality score for signals generated by the collect_signals command.
internal
cloudstorage
collector Package collector is used to collect signals for a given repository from a variety of sources.	Package collector is used to collect signals for a given repository from a variety of sources.
collector/depsdev
collector/github
collector/github/legacy
collector/githubmentions Package githubmentions provides a Collector that returns a Set for the number of mentions a given repository has in commit messages as returned by GitHub's search interface.	Package githubmentions provides a Collector that returns a Set for the number of mentions a given repository has in commit messages as returned by GitHub's search interface.
collector/projectrepo
collector/signal
envflag Package envflag is a simple library for associating environment variables with flags.	Package envflag is a simple library for associating environment variables with flags.
githubapi
githubapi/pagination
infile
iterator
log
outfile
retry
scorer
scorer/algorithm
scorer/algorithm/wam Package wam implements the Weighted Arithmetic Mean, which forms the basis of Rob Pike's criticality score algorithm as documented in Quantifying_criticality_algorithm.pdf.	Package wam implements the Weighted Arithmetic Mean, which forms the basis of Rob Pike's criticality score algorithm as documented in Quantifying_criticality_algorithm.pdf.
signalio
workerpool

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL