disco

Version: v1.1.2
Published: Jun 10, 2023 License: Apache-2.0

Utility for bulk image, license, package, and vulnerability discovery in containerized workloads on GCP.

Note: this is a personal project, not an official Google product.

Features:

  • Discover currently deployed container images in Cloud Run, GCF, and GKE
    • supports multiple projects and regions
    • resolves deployed images to their digests
  • Report on vulnerabilities, packages, or licenses in these images
    • scans base images and packages
    • supports filters (e.g. CVE, package name, license type)
  • Available as a CLI or a Service (for continuous discovery)

Additionally, when deployed as a service, disco will:

  • Publish custom metrics (time-series) in Cloud Monitoring to support:
    • custom charts and dashboards (e.g. image vulnerability over time)
    • metric threshold alerts (e.g. page on CRITICAL vulnerability in project X)
  • Export image license, package, and vulnerability data to BigQuery
    • query data using SQL (e.g. package versions over time)
    • create ML models (e.g. vulnerability source classification model)
    • build custom visualizations using Google Sheets, Data Studio, or Looker
  • Archive raw license, package, and vulnerability scanner outputs into a GCS bucket
    • each file is stored in a "folder" named after the image SHA

Why

It's easy to end up with a large number of containerized workloads across many GCP projects and regions: Cloud Run, GKE, or even Cloud Functions (yes, those end up running as containers too). You can scan these containers in Artifact Registry using the Container Analysis service, but currently it only covers the base OS. It's also not easy to know which of these images (and which versions) are actually being used in active services. Services like Cloud Run also support multiple revisions, each potentially using a different version of an image, so identifying the container images currently underpinning your services can get complicated.

disco provides an easy way to discover which of these container images are currently deployed, and automates the vulnerability/license scanning.

Install

You can use disco either as CLI or Service:

  • CLI - Supports most common distribution methods (Homebrew, RPM, DEB, Go install, Binary etc).
  • Service - Deploys as a Cloud Run service via Terraform.

Usage

CLI
disco command [command options] [arguments...]

You can use the --help flag at any level to get more information about the runtime, the commands, or disco itself.

Images

Discover deployed images from a specific runtime. To see all of the commands available for img (or image):

disco image --help

To discover container images currently deployed in all of the supported runtimes:

disco img

Options:

  • --output - path where to save the output (stdout by default)
  • --format - output format (yaml or json which is the default)
  • --project - scope discovery to a single project using project ID

The resulting report in JSON format will look something like this (abbreviated):

{
  "meta": {
    "kind": "image",
    "version": "v0.3.19-next",
    "created": "2022-12-28T21:20:15Z"
  },
  "items": [
    {
      "uri": "us-west1-docker.pkg.dev/cloudy-demos/gcf-artifacts/test--func@sha256:d22bfc69913190ff9d274553bc55f782b5056b0d2ed62b52eb327a34c90d7203",
      "context": {
        "container-name": "test--func-1",
        "location-id": "us-west1",
        "location-name": "Oregon",
        "project-id": "cloudy-demos",
        "project-number": "799736955886",
        "runtime": "gcf",
        "service-id": "projects/cloudy-demos/locations/us-west1/services/test-func",
        "service-name": "test-func",
        "service-revision": "projects/cloudy-demos/locations/us-west1/services/test-func/revisions/test-func-00001-fiz"
      }
    },
      ...
  ]
}
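
As a rough illustration of consuming this report, the Go sketch below decodes it and counts images per runtime. The ImageReport type and the CountByRuntime function are hypothetical names that mirror only the fields visible in the abbreviated sample; they are not taken from disco's own packages, and the real report may carry more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// ImageReport is a hypothetical type covering only the fields
// shown in the abbreviated sample report.
type ImageReport struct {
	Meta struct {
		Kind    string `json:"kind"`
		Version string `json:"version"`
		Created string `json:"created"`
	} `json:"meta"`
	Items []struct {
		URI     string            `json:"uri"`
		Context map[string]string `json:"context"`
	} `json:"items"`
}

// CountByRuntime tallies discovered images per runtime (e.g. "gcf").
func CountByRuntime(data []byte) (map[string]int, error) {
	var r ImageReport
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, fmt.Errorf("parsing image report: %w", err)
	}
	counts := make(map[string]int)
	for _, it := range r.Items {
		counts[it.Context["runtime"]]++
	}
	return counts, nil
}

func main() {
	// Trimmed copy of the sample report, reduced to valid JSON.
	sample := []byte(`{
	  "meta": {"kind": "image", "version": "v0.3.19-next", "created": "2022-12-28T21:20:15Z"},
	  "items": [
	    {"uri": "us-west1-docker.pkg.dev/cloudy-demos/gcf-artifacts/test--func@sha256:d22bfc69913190ff9d274553bc55f782b5056b0d2ed62b52eb327a34c90d7203",
	     "context": {"runtime": "gcf", "project-id": "cloudy-demos"}}
	  ]
	}`)
	counts, err := CountByRuntime(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Sort runtime names so the output order is stable.
	runtimes := make([]string, 0, len(counts))
	for rt := range counts {
		runtimes = append(runtimes, rt)
	}
	sort.Strings(runtimes)
	for _, rt := range runtimes {
		fmt.Printf("%s: %d\n", rt, counts[rt])
	}
}
```

Decoding the context into a map of strings assumes every context value is a string, which holds for the sample (even project-number is quoted) but is not guaranteed by anything stated here.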

Vulnerabilities

Discover potential vulnerabilities in container images. To see all of the commands available for vul (or vulnerability):

disco vulnerability --help

Options:

  • --file - image list input file path to serve as a source (instead of discovery) (e.g. disco img --output images.json)
  • --image - specific image URI to scan. Note: --file and --image are mutually exclusive
  • --output - saves report to file at this path (stdout by default)
  • --format - output format (yaml or json which is the default)
  • --project - during discovery, runs only on specific project (project ID)
  • --min-severity - minimum severity of vulnerability to include in report (e.g. low, medium, high, critical, default: all)
  • --cve - filter results on a specific CVE ID (e.g. CVE-2020-22046)
  • --target - target data store to save the results to (e.g. bq://my-project.some-dataset or bq://my-project.some-dataset.table-name)
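
The --target value follows a bq://project[.dataset[.table]] shape in the examples given for this flag. The Go sketch below shows one way such a string could be split into its parts. BQTarget and ParseBQTarget are hypothetical names for illustration, not disco's actual parser, and the sketch assumes the project ID itself contains no dots.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// BQTarget is a hypothetical holder for the parts of a bq:// target.
// Dataset and Table are empty when the target does not include them.
type BQTarget struct {
	Project string
	Dataset string
	Table   string
}

// ParseBQTarget splits "bq://project[.dataset[.table]]" into its parts.
func ParseBQTarget(s string) (BQTarget, error) {
	const scheme = "bq://"
	if !strings.HasPrefix(s, scheme) {
		return BQTarget{}, errors.New("target must start with bq://")
	}
	parts := strings.Split(strings.TrimPrefix(s, scheme), ".")
	for _, p := range parts {
		if p == "" {
			return BQTarget{}, fmt.Errorf("empty segment in target %q", s)
		}
	}
	switch len(parts) {
	case 1:
		return BQTarget{Project: parts[0]}, nil
	case 2:
		return BQTarget{Project: parts[0], Dataset: parts[1]}, nil
	case 3:
		return BQTarget{Project: parts[0], Dataset: parts[1], Table: parts[2]}, nil
	default:
		return BQTarget{}, fmt.Errorf("too many segments in target %q", s)
	}
}

func main() {
	for _, s := range []string{
		"bq://my-project.some-dataset",
		"bq://my-project.some-dataset.table-name",
	} {
		t, err := ParseBQTarget(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("project=%s dataset=%s table=%s\n", t.Project, t.Dataset, t.Table)
	}
}
```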

Using the --cve filter, you can quickly check whether any of the currently deployed images are affected by a specific vulnerability.

The resulting report in JSON format will look something like this (abbreviated):

{
  "meta": {
    "kind": "vulnerability",
    "version": "v0.3.19-next",
    "created": "2022-12-28T21:32:34Z",
    "count": 5
  },
  "items": [
    {
      "image": "gcr.io/cloudy-demos/hello-broken@sha256:0900c08e7d40f94...",
      "context": {
        "container-name": "hello-broken-1",
        "location-id": "us-central1",
        "location-name": "Iowa",
        ...
      },
      "vulnerabilities": [
        {
          "source": "CVE-2021-28165",
          "severity": "HIGH",
          "package": "org.eclipse.jetty:jetty-util",
          "version": "9.4.31.v20200723",
          "title": "jetty: Resource exhaustion when receiving an invalid large TLS frame",
          "description": "In Eclipse Jetty 7.2.2 to 9.4.38, 10.0.0.alpha0 to 10.0.1, and 11.0.0.alpha0 to 11.0.1, CPU usage can reach 100% upon receiving a large invalid TLS frame.",
          "url": "https://avd.aquasec.com/nvd/cve-2021-28165",
          "updated": "2022-07-29T17:05:00Z"
        },
        ...
      ]
    },
    ...
  ]
}
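
To make the idea behind --min-severity concrete, the Go sketch below ranks the four severity levels and keeps only entries at or above a threshold. This is an assumption about how such a filter could work, not disco's implementation. Vulnerability, severityRank, and FilterBySeverity are hypothetical names, and the second sample entry is made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// severityRank orders the levels named by the --min-severity option.
var severityRank = map[string]int{"low": 1, "medium": 2, "high": 3, "critical": 4}

// Vulnerability is a hypothetical type holding a subset of the report fields.
type Vulnerability struct {
	Source   string `json:"source"`
	Severity string `json:"severity"`
	Package  string `json:"package"`
}

// FilterBySeverity keeps entries whose severity is at or above minSeverity.
// Entries with an unrecognized severity rank as 0 and are dropped.
func FilterBySeverity(vulns []Vulnerability, minSeverity string) ([]Vulnerability, error) {
	threshold, ok := severityRank[strings.ToLower(minSeverity)]
	if !ok {
		return nil, fmt.Errorf("unknown severity %q", minSeverity)
	}
	var kept []Vulnerability
	for _, v := range vulns {
		if severityRank[strings.ToLower(v.Severity)] >= threshold {
			kept = append(kept, v)
		}
	}
	return kept, nil
}

func main() {
	vulns := []Vulnerability{
		// Taken from the sample report.
		{Source: "CVE-2021-28165", Severity: "HIGH", Package: "org.eclipse.jetty:jetty-util"},
		// Made-up entry, for illustration only.
		{Source: "CVE-0000-00000", Severity: "LOW", Package: "example-package"},
	}
	kept, err := FilterBySeverity(vulns, "high")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, v := range kept {
		fmt.Println(v.Source, v.Severity, v.Package)
	}
}
```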

Licenses

Discover licenses for the OS and packages used in container images. To see all of the commands available for lic (or license):

disco license --help

Options:

  • --file - image list input file path to serve as a source (instead of discovery) (e.g. disco img --output images.json)
  • --image - specific image URI to scan. Note: --file and --image are mutually exclusive
  • --output - saves report to file at this path (stdout by default)
  • --format - output format (yaml or json which is the default)
  • --project - during discovery, runs only on specific project (project ID)
  • --type - license type filter (supports prefix: e.g. apache, bsd, mit, etc.)
  • --target - target data store to save the results to (e.g. bq://my-project)

Using the --type filter, you can quickly check whether any of your currently deployed images are using a specific license.

The resulting report in JSON format will look something like this (abbreviated):

{
  "meta": {
    "kind": "license",
    "version": "v0.3.19-next",
    "created": "2022-12-28T21:23:20Z"
  },
  "items": [
    {
      "image": "us-docker.pkg.dev/cloudrun/container/hello@sha256:2e70803dbc92...",
      "context": {
        "container-name": "hello-1",
        "project-id": "cloudy-demos",
        "project-number": "799736955886",
        ...
      },
      "licenses": [
        {
          "name": "GPL-2.0",
          "source": "alpine-baselayout-data"
        },
        {
          "name": "MIT",
          "source": "alpine-keys"
        },
        ...
      ]
    },
    ...
  ]
}
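
The --type filter is described as a prefix match. The Go sketch below applies that idea to the license entries of a report. License and FilterLicenses are hypothetical names, and the case-insensitive comparison is an assumption, since the README does not say how case is handled.

```go
package main

import (
	"fmt"
	"strings"
)

// License is a hypothetical type matching the entries in the sample report.
type License struct {
	Name   string `json:"name"`
	Source string `json:"source"`
}

// FilterLicenses keeps licenses whose name starts with prefix.
// The comparison ignores case (an assumption, see the lead-in).
func FilterLicenses(licenses []License, prefix string) []License {
	want := strings.ToLower(prefix)
	var kept []License
	for _, l := range licenses {
		if strings.HasPrefix(strings.ToLower(l.Name), want) {
			kept = append(kept, l)
		}
	}
	return kept
}

func main() {
	// Both entries come from the sample report.
	licenses := []License{
		{Name: "GPL-2.0", Source: "alpine-baselayout-data"},
		{Name: "MIT", Source: "alpine-keys"},
	}
	for _, l := range FilterLicenses(licenses, "gpl") {
		fmt.Println(l.Name, l.Source)
	}
}
```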

Packages

Discover packages used in container images. To see all of the commands available for pkg (or packages):

disco packages --help

Options:

  • --file - image list input file path to serve as a source (instead of discovery) (e.g. disco img --output images.json)
  • --image - specific image URI to scan. Note: --file and --image are mutually exclusive
  • --output - saves report to file at this path (stdout by default)
  • --format - output format (yaml or json which is the default)
  • --project - during discovery, runs only on specific project (project ID)
  • --name - package name filter (uses contains, e.g. libgcc, gobinary, express, etc.)
  • --target - target data store to save the results to (e.g. bq://my-project)

Using the --name filter, you can quickly check whether any of your currently deployed images are using a specific package.

The resulting report in JSON format will look something like this (abbreviated):

{
  "meta": {
    "kind": "package",
    "version": "v0.9.4",
    "created": "2023-01-08T00:37:26Z"
  },
  "items": [
    {
      "image": "us-central1-docker.pkg.dev/cloudy-labz/gcf-artifacts/test--go119@sha256:80be8e3c174...",
      "context": {
        "container-name": "test--go119",
        "location-id": "us-central1",
        "location-name": "Iowa",
        ...
      },
      "packages": [
        {
          "package": "minipass-sized",
          "version": "1.0.3",
          "source": "pkg:npm/minipass-sized@1.0.3",
          "license": "ISC",
          "format": "SPDX-2.2",
          "provider": "trivy"
        },
        ...
      ],
    ...
    }
  ]
}

Service

Instructions on how to deploy the disco service are here.

When running as a service, disco automatically exports metrics and report data:

Metrics

disco metrics can be found in Metrics Explorer.

Custom time-series metrics created by disco:

  • disco/vulnerability/image - count of images scanned for vulnerability (labels: project, version)
  • disco/vulnerability/severity - vulnerability severity count (labels: project, version, kind)
  • disco/license/image - count of images scanned for licenses (labels: project, version)
  • disco/license/count - count of licenses (labels: project, version)
  • disco/package/image - count of images scanned for packages (labels: project, version)
  • disco/package/count - count of packages (labels: project, version)

Licenses and packages have too high a cardinality for more detailed labels.

Data

The disco service automatically exports its data to three BigQuery tables (licenses, vulnerabilities, and packages).

Common elements:

  • batch_id is the unique ID of each discovery operation
  • image is the image URI sans tag or sha
  • sha is the image digest prefixed with sha:
  • updated is the timestamp when the data element was extracted
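
To illustrate how a full image URI relates to the image and sha columns, the Go sketch below splits a URI into the name (without tag or digest) and the digest. SplitImageURI is a hypothetical helper, not disco's code. It returns the digest exactly as it appears after the @ sign; whether the service stores it in precisely that form is not confirmed here.

```go
package main

import (
	"fmt"
	"strings"
)

// SplitImageURI separates an image URI into its name (sans tag or digest)
// and its digest. The digest is empty when the URI has no "@" part.
func SplitImageURI(uri string) (image, digest string) {
	image, digest, _ = strings.Cut(uri, "@")
	// Strip a tag, if any: a colon that appears after the last slash.
	// A colon before the last slash belongs to a registry port instead.
	if i := strings.LastIndex(image, ":"); i > strings.LastIndex(image, "/") {
		image = image[:i]
	}
	return image, digest
}

func main() {
	// URI taken from the image report sample in this README.
	uri := "us-west1-docker.pkg.dev/cloudy-demos/gcf-artifacts/test--func@sha256:d22bfc69913190ff9d274553bc55f782b5056b0d2ed62b52eb327a34c90d7203"
	image, digest := SplitImageURI(uri)
	fmt.Println("image:", image)
	fmt.Println("sha:  ", digest)
}
```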

licenses

{name: "batch_id", type: "integer", required: true},
{name: "image", type: "string", required: true},
{name: "sha", type: "string"},
{name: "name", type: "string", required: true},
{name: "package", type: "string"},
{name: "updated", type: "timestamp", required: true}

vulnerabilities

{name: "batch_id", type: "integer", required: true},
{name: "image", type: "string", required: true},
{name: "sha", type: "string"},
{name: "cve", type: "string", required: true},
{name: "severity", type: "string"},
{name: "package", type: "string"},
{name: "version", type: "string"},
{name: "title", type: "string"},
{name: "description", type: "string"},
{name: "url", type: "string"},
{name: "updated", type: "timestamp", required: true}

packages

{name: "batch_id", type: "integer", required: true},
{name: "image", type: "string", required: true},
{name: "sha", type: "string"},
{name: "format", type: "string", required: true},
{name: "provider", type: "string", required: true},
{name: "package", type: "string", required: true},
{name: "version", type: "string"},
{name: "source", type: "string"},
{name: "license", type: "string"},
{name: "updated", type: "timestamp", required: true}

You can use these tables in your custom queries (a sample of SQL queries is available here), or in Google Sheets, Data Studio, or Looker reports.

Disclaimer

This is my personal project and it does not represent my employer. While I do my best to ensure that everything works, I take no responsibility for issues caused by this code.

Directories

Path Synopsis
cmd
pkg