
package eval

import ""

Package eval implements evaluation of a pre-submit RTS algorithm.


RTS stands for Regression Test Selection. It is a technique to intelligently select tests to run, such that bad code is detected without spending too many resources on testing. An RTS algorithm for pre-submit accepts a list of changed files as input and returns a list of tests to run as output.

Why evaluation

An RTS algorithm can significantly increase the efficiency of CQ, but it can also start letting bad code into the repository and create havoc for sheriffs. Thus, before an RTS algorithm is deployed, its safety and efficiency must be evaluated.

Also, there are many possible RTS algorithms, and we need objective metrics to choose among them.

Finally, an algorithm developer needs objective metrics in order to make iterative improvements to the algorithm.

Quick start

The primary entry point to this package is the function Main(), which accepts an RTS algorithm as input and prints its safety and efficiency metrics to stdout. The following is a coinflip RTS algorithm, which drops a test with probability 0.5:

func main() {
	ctx := context.Background()
	eval.Main(ctx, func(ctx context.Context, in eval.Input) (eval.Output, error) {
		// Flip a coin: run the tests with probability 0.5.
		return eval.Output{
			ShouldRunAny: rand.Intn(2) == 0,
		}, nil
	})
}

This compiles to a program that evaluates the coinflip algorithm. Program execution requires a history file as input:

./rts-random -history cq.hist

Example of stdout + stderr:

Lost rejection:
  Failed and not selected tests:
    - builder:linux-chromeos-rel | os:Ubuntu-16.04 | test_suite:interactive_ui_tests
      - ninja://chrome/test:interactive_ui_tests/FeaturePromoSnoozeInteractiveTest.DismissDoesNotSnooze
        in //chrome/browser/ui/views/in_product_help/
  Score: 74%
  # of eligible rejections: 5190
  # of them preserved by this RTS: 3851
  Saved: 50%
  Compute time in the sample: 131h44m38.864038382s
  Forecasted compute time: 65h58m34.295175147s
Total records: 605372

This tells us that the safety of coinflip is only 74%, meaning it would let 26% of bad CLs through. Unsurprisingly, it would also save 50% of compute time.
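The percentages can be reproduced from the raw numbers in the sample output. The following is a minimal sketch of that arithmetic; the helper names ratio and mustParse are illustrative and not part of this package.

```go
package main

import (
	"fmt"
	"time"
)

// ratio returns part/whole as a fraction.
func ratio(part, whole float64) float64 {
	return part / whole
}

// mustParse parses a Go duration string and panics on malformed input.
func mustParse(s string) time.Duration {
	d, err := time.ParseDuration(s)
	if err != nil {
		panic(err)
	}
	return d
}

func main() {
	// Numbers copied from the sample output above.
	safety := ratio(3851, 5190) // preserved / eligible rejections

	sample := mustParse("131h44m38.864038382s")
	forecast := mustParse("65h58m34.295175147s")
	saved := 1 - ratio(forecast.Seconds(), sample.Seconds())

	fmt.Printf("safety: %.0f%%\n", safety*100) // safety: 74%
	fmt.Printf("saved:  %.0f%%\n", saved*100)  // saved:  50%
}
```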

The coinflip algorithm is notable because it represents the bare minimum. Any RTS algorithm we consider must have much higher safety than 74%. The algorithm is implemented in

History files

RTS evaluation emulates CQ with the given RTS algorithm, based on historical records read from a history file. Different LUCI projects acquire the history files differently. For Chromium, use

For history file format, see

Safety evaluation

Safety is evaluated as a ratio preserved_rejections/total_rejections, where

- total_rejections is the number of patchsets rejected by CQ due to test
  failures.
- preserved_rejections is how many of them would still be rejected
  if the given RTS algorithm were deployed.

A rejection is considered preserved iff the RTS algorithm selects at least one test that caused the rejection.

The ideal safety score is 1.0, meaning all historical rejections would be preserved.
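The definition above can be sketched as follows. This is an illustration under simplifying assumptions, not the package's implementation: the rejection and selector types are hypothetical stand-ins, and a test is identified by a plain string.

```go
package main

import "fmt"

// rejection is a hypothetical stand-in for one historical CQ rejection:
// the set of tests whose failures caused the patchset to be rejected.
type rejection struct {
	failedTests []string
}

// selector reports whether the candidate RTS algorithm would run a test.
type selector func(test string) bool

// safetyScore returns preserved_rejections/total_rejections.
// A rejection counts as preserved if at least one failed test is selected.
// With no rejections the result is NaN (0/0).
func safetyScore(rejections []rejection, selected selector) float64 {
	preserved := 0
	for _, r := range rejections {
		for _, t := range r.failedTests {
			if selected(t) {
				preserved++
				break // one selected failing test is enough
			}
		}
	}
	return float64(preserved) / float64(len(rejections))
}

func main() {
	rejections := []rejection{
		{failedTests: []string{"a_test", "b_test"}},
		{failedTests: []string{"c_test"}},
	}
	// A toy algorithm that only ever selects a_test.
	onlyA := func(test string) bool { return test == "a_test" }
	fmt.Println(safetyScore(rejections, onlyA)) // 0.5
}
```

The first rejection is preserved because a_test is selected; the second is lost because c_test is never run.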

Efficiency evaluation

Efficiency is evaluated as the fraction of compute time saved: 1 - forecast_duration/total_duration, where

- total_duration is the sum of test durations found in the history file.
- forecast_duration is the duration sum for those tests that the RTS
  algorithm decides to select.


Package Files

algo.go cache.go doc.go efficiency.go eval.go gerrit.go init.go main.go run.go safety.go util.go

func Main

func Main(ctx context.Context, algo Algorithm)

Main evaluates an RTS algorithm, prints results and exits the process.

type Algorithm

type Algorithm func(context.Context, Input) (Output, error)

Algorithm accepts a list of changed files and a test description and decides whether to run it.

type Efficiency

type Efficiency struct {
    // SampleDuration is the sum of test durations in the analyzed data sample.
    SampleDuration time.Duration

    // ForecastDuration is the sum of test durations for tests selected by the RTS
    // algorithm. It is a value between 0 and SampleDuration.
    // The lower the number the better.
    ForecastDuration time.Duration
}

Efficiency is the result of evaluating how much compute time the RTS algorithm could save.

func (*Efficiency) Score

func (e *Efficiency) Score() float64

Score returns the efficiency score. May return NaN.

type Eval

type Eval struct {
    // The algorithm to evaluate.
    Algorithm Algorithm

    // The number of goroutines to spawn for each metric.
    // If <=0, defaults to 100.
    Concurrency int

    // Directory where to cache fetched data.
    // If "", defaults to ${systemCacheDir}/chrome-rts.
    CacheDir string

    // Maximum QPS to send to Gerrit.
    // If <=0, defaults to 10.
    GerritQPSLimit int

    // Historical records to use for evaluation.
    History *history.Reader

    // How often to report progress. Defaults to 5s.
    ProgressReportInterval time.Duration
}

Eval estimates safety and efficiency of a given RTS algorithm.

func (*Eval) RegisterFlags

func (e *Eval) RegisterFlags(fs *flag.FlagSet) error

RegisterFlags registers flags for the Eval fields.

func (*Eval) Run

func (e *Eval) Run(ctx context.Context) (*Result, error)

Run evaluates the algorithm.

func (*Eval) ValidateFlags

func (e *Eval) ValidateFlags() error

ValidateFlags validates values of flags registered using RegisterFlags.

type Input

type Input struct {
    // ChangedFiles is a list of files changed in a patchset.
    ChangedFiles []*SourceFile

    // The algorithm needs to decide whether to run these test variants.
    TestVariants []*evalpb.TestVariant
}

Input is input to an RTS Algorithm.

type Output

type Output struct {
    // ShouldRunAny is true if any of the test variants should run as a
    // part of the suite.
    ShouldRunAny bool
}

Output is the output of an RTS algorithm.

type Result

type Result struct {
    Safety       Safety
    Efficiency   Efficiency
    TotalRecords int
}

Result is the result of evaluation.

func (*Result) Print

func (r *Result) Print(w io.Writer) error

Print prints the results to w.

type Safety

type Safety struct {
    // TotalRejections is the total number of analyzed rejections.
    TotalRejections int

    // EligibleRejections is the number of rejections eligible for safety
    // evaluation.
    EligibleRejections int

    // LostRejections are the rejections that would not be preserved
    // by the candidate algorithm, i.e. the bad patchsets would land.
    // The candidate RTS algorithm did not select any of the failed tests
    // in these rejections.
    // Ideally this slice is empty.
    LostRejections []*evalpb.Rejection
}

Safety is the result of algorithm safety evaluation. A safe algorithm does not let bad CLs pass CQ.

func (*Safety) Score

func (s *Safety) Score() float64

Score returns the safety score. May return NaN.

type SourceFile

type SourceFile struct {
    // Repo is a repository identifier.
    // For repositories, it is a canonical URL, e.g.
    Repo string

    // Path to the file relative to the repo root.
    // Starts with "//".
    Path string
}

SourceFile identifies a source file.
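Since Path is documented as repo-relative and starting with "//", a caller building SourceFile values from ordinary relative paths needs to normalize them. The sketch below shows one way to do that; newSourceFile is a hypothetical helper, SourceFile is a local stand-in, and the repository URL is a placeholder.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// SourceFile is a local stand-in mirroring the documented fields.
type SourceFile struct {
	Repo string
	Path string
}

// newSourceFile is a hypothetical helper that normalizes a repo-relative
// path into the "//"-prefixed form described in the field comment.
func newSourceFile(repo, relPath string) SourceFile {
	// Root the path at "/" so Clean cannot escape the repo root,
	// then add the second slash.
	clean := path.Clean("/" + strings.TrimLeft(relPath, "/"))
	return SourceFile{Repo: repo, Path: "/" + clean}
}

func main() {
	// The repo URL below is a placeholder, not a real repository.
	f := newSourceFile("https://example.com/repo", "chrome/browser/../test/a.cc")
	fmt.Println(f.Path) // //chrome/test/a.cc
}
```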


history: Package history implements serialization and deserialization of historical records used for RTS evaluation.

Package eval imports 45 packages and is imported by 1 package. Updated 2020-10-28.