pmap

Version: v0.0.4
Published: Aug 14, 2023
License: Apache-2.0

This is not an official Google product.

Background

Privacy data management is the process of collecting, storing, using, and disposing of data in a way that protects the privacy of users. It is a critical responsibility for any organization that collects or uses user data, and such organizations are typically required to comply with policies set by regulatory bodies.

To stay compliant with policies set by regulatory bodies, organizations need to know the following:

  • The requirements teams must meet, driven by legal requirements or external commitments (a.k.a. policy and compliance controls). This includes translating comprehensive external legal requirements into requirements tailored to the organization's products and services.

  • Where the user data is stored or processed. This includes understanding the different systems and databases that store or process user data, as well as the physical locations where user data is stored or processed.

  • Which policy or compliance control applies to each system that stores or processes user data (a.k.a. data mapping). This includes understanding how the organization's policies and compliance controls apply to different systems and databases.

  • Visibility into privacy compliance. This includes being able to track and monitor the organization's compliance with its policies/controls and with applicable laws and regulations.

pmap addresses the first three needs. We are working on a solution that provides visibility into privacy compliance.

Architecture

[Figure: pmap architecture]

  • Registration - Data owners and policy owners register data mappings and policies/controls in a central GitHub repository.
  • GCS Snapshots - The data mappings and policies/controls are snapshotted from GitHub to GCS, authenticating with Workload Identity Federation.
  • Additional Processors - An extension point for validating and enriching data mappings.
  • Processing Service - The service responsible for ingesting, validating, and storing the data mappings and policies/controls.
  • Storage and Analysis - The data warehouse for processed data mappings and policies/controls, plus a UI for dashboarding.

Why GitHub

We chose GitHub because it preserves change history and supports multi-person review and approval, both of which are crucial in privacy data management.

Why BigQuery

We chose BigQuery for its strong analytics support:

  • Visualizing data to reveal meaningful insights.
  • Joining data from other data sources in the future to enable privacy compliance monitoring.

Set Up

The central privacy/compliance eng team needs to complete the steps below.

Workload Identity Federation

Set up Workload Identity Federation and a service account with the appropriate conditions and permissions; see the guide here.

  • The service account used for authenticating via Workload Identity Federation needs roles/storage.objectCreator to snapshot the data mappings and policies/controls from GitHub to GCS.

GitHub Central Repository

The central privacy/compliance eng team can decide how to group data mappings and policies/controls, with one constraint: at least one level of grouping is required. Files containing data mappings or policies/controls must live in subfolders of the central GitHub repository and can't be stored directly in its root.
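
A presubmit check could enforce this layout rule by rejecting any file that has no parent folder. This is a minimal sketch, assuming repository-relative paths with forward slashes; `inGroupFolder` is a hypothetical helper, not part of pmap.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// inGroupFolder reports whether a repository-relative path (forward slashes)
// points into at least one subfolder rather than the repository root.
// Hypothetical helper; pmap's own checks may differ.
func inGroupFolder(p string) bool {
	clean := path.Clean(strings.TrimPrefix(p, "/"))
	// Reject the root itself and paths that escape the repository.
	if clean == "." || clean == ".." || strings.HasPrefix(clean, "../") {
		return false
	}
	// path.Dir returns "." for a file that sits directly in the root.
	return path.Dir(clean) != "."
}

func main() {
	for _, f := range []string{"policy.yaml", "wipeout/policy.yaml", "team-a/mappings/bucket.yaml"} {
		fmt.Printf("%-28s allowed=%v\n", f, inGroupFolder(f))
	}
}
```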

Data Mapping
  • Presubmit workflows for sanity checks; see example here.

  • Postsubmit workflows to snapshot the added_files and modified_files of data mappings to GCS; see example here.

  • Cron workflows to snapshot all data mapping files to GCS; see example here.

Policy and Control
  • Postsubmit workflows to snapshot the added_files and modified_files of policies/controls to GCS; see example here.

  • Cron workflows to snapshot all policy/control files to GCS; see example here.

Infrastructure for pmap
  • You can use the provided Terraform module to set up the basic infrastructure needed for this service. Alternatively, use the provided module as a reference for building your own Terraform configuration from scratch.
```hcl
module "pmap" {
  # Pin ref to the desired commit SHA instead of main.
  source = "git::https://github.com/abcxyz/pmap.git//terraform/e2e?ref=main"

  project_id = "YOUR_PROJECT_ID"

  gcs_bucket_name                  = "pmap"
  pmap_container_image             = "us-docker.pkg.dev/abcxyz-artifacts/docker-images/pmap:0.0.4-amd64"
  pmap_prober_image                = "us-docker.pkg.dev/abcxyz-artifacts/docker-images/pmap-prober:0.0.4-amd64"
  bigquery_table_delete_protection = true
  # Default scope used when searching global Cloud resources such as GCS buckets.
  pmap_specific_envvars            = { "PMAP_MAPPING_DEFAULT_RESOURCE_SCOPE" : "YOUR_DEFAULT_RESOURCE_SCOPE" }
  notification_channel_email       = "YOUR_NOTIFICATION_CHANNEL_EMAIL"
}
```
  • Make sure the service account used by the Data Mapping Cloud Run service is granted roles/cloudasset.viewer at the scope set in PMAP_MAPPING_DEFAULT_RESOURCE_SCOPE, following the docs here.
```shell
# Print the service account used by the Data Mapping Cloud Run service.
gcloud run services describe <NAME_OF_DATA_MAPPING_CLOUD_RUN_SERVICE> \
  --region=<REGION> \
  --format='value(spec.template.spec.serviceAccountName)'
```
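
A misconfigured PMAP_MAPPING_DEFAULT_RESOURCE_SCOPE is easier to diagnose if it is checked at startup. This sketch assumes the value follows the Cloud Asset Inventory search scope formats (`projects/{PROJECT_ID}`, `folders/{FOLDER_NUMBER}`, `organizations/{ORGANIZATION_NUMBER}`); `defaultScope` is a hypothetical helper, and pmap's own handling of this variable may differ.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"regexp"
)

// scopePattern loosely matches Cloud Asset Inventory search scopes:
// projects/{PROJECT_ID or PROJECT_NUMBER}, folders/{FOLDER_NUMBER},
// organizations/{ORGANIZATION_NUMBER}.
var scopePattern = regexp.MustCompile(`^(projects/[a-z0-9-]+|folders/[0-9]+|organizations/[0-9]+)$`)

// defaultScope reads PMAP_MAPPING_DEFAULT_RESOURCE_SCOPE through getenv and
// checks its shape. getenv is injected so the check is easy to test.
func defaultScope(getenv func(string) string) (string, error) {
	s := getenv("PMAP_MAPPING_DEFAULT_RESOURCE_SCOPE")
	if s == "" {
		return "", errors.New("PMAP_MAPPING_DEFAULT_RESOURCE_SCOPE is not set")
	}
	if !scopePattern.MatchString(s) {
		return "", fmt.Errorf("unexpected scope %q: want projects/, folders/, or organizations/ followed by an ID", s)
	}
	return s, nil
}

func main() {
	if scope, err := defaultScope(os.Getenv); err != nil {
		fmt.Println("config error:", err)
	} else {
		fmt.Println("resource search scope:", scope)
	}
}
```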

End User Workflows

Policy/Control Owner
  • Create a policy/control (e.g., a wipeout plan) by opening a PR that adds a YAML file to the subfolder that stores all policies/controls. See example here.
Data Owner
  • Register resources and associate each one with its policies/controls by opening a PR that adds a mapping YAML file to the subfolder that stores all data mappings. The association between a resource and its policies/controls is expressed in the annotations field. See example here.
Data Governor (TODO)

Directories

  • apis/v1alpha1 - Package v1alpha1 contains versioned pmap contracts, e.g.
  • cmd/pmap - Package main is the main entrypoint to the application.
  • internal/testhelper - Package testhelper provides utilities that are intended to enable easier and more concise writing of test and prober code.
  • pkg/cli - Package cli implements the commands for the PMAP CLI.
  • pkg/mapping/processors - Package processors provides essential processors for pmap.
  • pkg/pmaperrors - Package pmaperrors defines the error wrapper for user-facing errors.
  • pkg/server - Package server is the base server for the pmap event ingestion.
  • test
