pmap

Version: v0.0.4 (latest) | Published: Aug 14, 2023 | License: Apache-2.0

Repository

github.com/abcxyz/pmap

This is not an official Google product.

Background

Privacy data management is the process of collecting, storing, using, and disposing of data in a way that protects the privacy of users. It is a critical part of any organization that collects or uses user data, and organizations are typically required to maintain compliance with policies set by regulatory bodies.

To maintain that compliance, an organization needs to know the following:

-  The requirements for what teams must do, driven by legal requirements or external commitments (a.k.a. policy and compliance controls). This includes translating comprehensive external legal requirements into requirements tailored to the organization's products and services.
-  Where user data is stored or processed. This includes understanding the different systems and databases that store or process user data, as well as the physical locations where user data is stored or processed.
-  Which policy or compliance control applies to each system that stores or processes user data (a.k.a. data mapping). This includes understanding how the organization's policies and compliance controls are applied to different systems and databases.
-  The visibility of privacy compliance. This includes being able to track and monitor the organization's compliance with its policies/controls and with applicable laws and regulations.

PMAP provides a solution for the first three problems. We are working on a solution to provide visibility of privacy compliance in the near future.

Architecture

[Figure: pmap architecture diagram]

-  Registration - Data owners and policy owners register data mappings and policies/controls in a central GitHub repository.
-  GCS Snapshots - The data mappings and policies/controls are snapshotted from GitHub to GCS using Workload Identity Federation.
-  Additional Processors - Extension point for validating and enriching data mappings.
-  Processing Service - The service responsible for ingesting, validating, and storing the data mappings and policies/controls.
-  Storage and Analysis - The data warehouse for processed data mappings and policies/controls, and the UI for dashboarding.

Why GitHub

We chose GitHub because it preserves change history and enables multi-person review and approval, both of which are crucial in privacy data management.

Why BigQuery

We chose BigQuery for its analytics support:

-  Data can be visualized to reveal meaningful insights.
-  Data can be joined with other data sources in the future to enable privacy compliance monitoring.

Set Up

The central privacy/compliance eng team needs to complete the steps below.

Workload Identity Federation

Set up Workload Identity Federation and a service account with adequate conditions and permissions; see the guide here.

-  The service account used to authenticate via Workload Identity Federation
   needs roles/storage.objectCreator
   to snapshot the data mappings and policies/controls from GitHub to GCS.
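
As a rough sketch of the role grant described above, the binding could be added on the snapshot bucket with gcloud. The function name and both arguments are illustrative placeholders, not part of pmap, and your setup may grant the role at a different level (for example through Terraform).

```shell
# Sketch only: grant the snapshot service account permission to create
# objects in the snapshot bucket.
#   $1 = bucket name (placeholder, supply your own)
#   $2 = service account email used with Workload Identity Federation (placeholder)
grant_snapshot_writer() {
  gcloud storage buckets add-iam-policy-binding "gs://$1" \
    --member="serviceAccount:$2" \
    --role="roles/storage.objectCreator"
}
```

For example: grant_snapshot_writer YOUR_BUCKET_NAME YOUR_SERVICE_ACCOUNT_EMAIL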

GitHub Central Repository

The central privacy/compliance eng team can decide how to group data mappings and policies/controls, but at least one level of grouping is required: files containing data mappings or policies/controls must live in subfolders of the central GitHub repository and can’t be stored directly in its root.
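
The layout rule can be expressed as a small check, for instance in a presubmit step. This is a minimal sketch rather than the check pmap itself ships; the function name and the example paths are hypothetical.

```shell
# Sketch only: succeed (exit status 0) when a repository-relative path sits
# inside at least one subfolder, and fail when it is directly in the root.
is_grouped_path() {
  p="${1#./}"              # tolerate a leading "./"
  case "$p" in
    /*|*/) return 1 ;;     # reject absolute paths and bare directories
    */*)   return 0 ;;     # at least one folder level, e.g. group-a/mapping.yaml
    *)     return 1 ;;     # file stored directly in the repository root
  esac
}
```

With this sketch, is_grouped_path group-a/mapping.yaml succeeds, while is_grouped_path mapping.yaml fails.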

Data Mapping

-  Presubmit workflows for sanity checks; see example here.
-  Postsubmit workflows to snapshot added_files and modified_files of data mappings to GCS; see example here.
-  Cron workflows to snapshot all data mapping files to GCS; see example here.

Policy and Control

-  Postsubmit workflows to snapshot added_files and modified_files of policies/controls to GCS; see example here.
-  Cron workflows to snapshot all policy/control files to GCS; see example here.

Infrastructure for pmap

You can use the provided Terraform module to set up the basic infrastructure needed for this service. Alternatively, you can use the provided module as a reference for building your own Terraform from scratch.

module "pmap" {
  source = "git::https://github.com/abcxyz/pmap.git//terraform/e2e?ref=main" # this should be pinned to the SHA desired

  project_id = "YOUR_PROJECT_ID"

  gcs_bucket_name                  = "pmap"
  pmap_container_image             = "us-docker.pkg.dev/abcxyz-artifacts/docker-images/pmap:0.0.4-amd64"
  pmap_prober_image                = "us-docker.pkg.dev/abcxyz-artifacts/docker-images/pmap-prober:0.0.4-amd64"
  bigquery_table_delete_protection = true
  # This is used when searching global Cloud Resources like GCS bucket.
  pmap_specific_envvars            = { "PMAP_MAPPING_DEFAULT_RESOURCE_SCOPE" : "YOUR_DEFAULT_RESOURCE_SCOPE" }
  notification_channel_email       = "YOUR_NOTIFICATION_CHANNEL_EMAIL"
}

Make sure the service account used by the Cloud Run service for Data Mapping is granted roles/cloudasset.viewer at the scope set in PMAP_MAPPING_DEFAULT_RESOURCE_SCOPE, following the docs here.

# Look up the service account used by the Cloud Run service for Data Mapping
gcloud run services describe <NAME_OF_DATA_MAPPING_CLOUD_RUN_SERVICE> \
  --format="value(spec.template.spec.serviceAccountName)"
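
If PMAP_MAPPING_DEFAULT_RESOURCE_SCOPE points at a single project, the grant could look like the sketch below. The function name and both arguments are illustrative assumptions. A folder-level or organization-level scope would need the corresponding gcloud resource-manager folders or gcloud organizations command instead.

```shell
# Sketch only: grant roles/cloudasset.viewer on a project, assuming the
# default resource scope is that project.
#   $1 = project ID (placeholder, supply your own)
#   $2 = service account email of the Data Mapping Cloud Run service (placeholder)
grant_asset_viewer() {
  gcloud projects add-iam-policy-binding "$1" \
    --member="serviceAccount:$2" \
    --role="roles/cloudasset.viewer"
}
```

For example: grant_asset_viewer YOUR_PROJECT_ID YOUR_SERVICE_ACCOUNT_EMAIL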

End User Workflows

Policy/Control Owner

Create a policy/control (e.g. a wipeout plan) by opening a PR that adds a YAML file under the subfolder that stores the policies/controls. See example here.

Data Owner

Register and annotate resources to associate them with their specific policies/controls by opening a PR that adds a mapping YAML file under the subfolder that stores the data mappings. A resource is associated with its policies/controls through the annotations field. See example here.

Data Governor (TODO)

Directories

Path	Synopsis
apis
v1alpha1	Package v1alpha1 contains versioned pmap contracts, e.g.
cmd
pmap	Package main is the main entrypoint to the application.
internal
testhelper	Package testhelper provides utilities that are intended to enable easier and more concise writing of test and prober code.
version
pkg
cli	Package cli implements the commands for the PMAP CLI.
mapping/processors	Package processors provides essential processors for pmap.
pmaperrors	Package pmaperrors defines the error wrapper for user facing errors.
server	Package server is the base server for the pmap event ingestion.
prober
test
integration
