differential-privacy: github.com/google/differential-privacy/privacy-on-beam/pbeam Index | Examples | Files

package pbeam

import "github.com/google/differential-privacy/privacy-on-beam/pbeam"

Package pbeam provides an API for building differentially private data processing pipelines using Apache Beam (https://beam.apache.org) with its Go SDK (https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam).

It introduces the concept of a PrivatePCollection, an interface mirroring Apache Beam's PCollection concept. PrivatePCollection implements additional restrictions and aggregations to facilitate differentially private analysis. This API is meant to be used by developers without differential privacy expertise.

For a step-by-step introduction to differential privacy, Apache Beam, and example usage of this library, see: https://codelabs.developers.google.com/codelabs/privacy-on-beam/index.html; a codelab meant for developers who want to get started on using this library and generating differentially private metrics.

The rest of this package-level comment goes into more detail about the precise guarantees offered by this API, and assumes some familiarity with the Apache Beam model, its Go SDK, and differential privacy.

To understand the main API contract provided by PrivatePCollection, consider the following example pipeline.

p := beam.NewPipeline()
s := p.Root()
// The input is a series of files in which each line contains the data of a privacy unit (e.g. an individual).
input := textio.Read(s, "/path/to/files/*.txt") // input is a PCollection<string>
// Extracts the privacy ID and the data associated with each line: extractID is a func(string) (userID,data).
icol := beam.ParDo(s, input, extractID) // icol is a PCollection<privacyUnitID,data>
// Transforms the input PCollection into a PrivatePCollection with parameters ε=1 and δ=10⁻¹⁰.
// The privacy ID is "hidden" by the operation: pcol behaves as if it were a PCollection<data>.
pcol := MakePrivate(s, icol, NewPrivacySpec(1, 1e-10)) // pcol is a PrivatePCollection<data>
// Arbitrary transformations can be applied to the data…
pcol = ParDo(s, pcol, someDoFn)
pcol = ParDo(s, pcol, otherDoFn)
// …and to retrieve PCollection outputs, differentially private aggregations must be used.
// For example, assuming pcol is now a PrivatePCollection<field,float64>:
sumParams := SumParams{MaxPartitionsContributed: 10, MaxValue: 5}
ocol := SumPerKey(s, pcol2, sumParams) // ocol is a PCollection<field,float64>
// And it is now possible to output this data.
textio.Write(s, "/path/to/output/file", ocol)

The behavior of PrivatePCollection is similar to the behavior of PCollection. In particular, it implements arbitrary per-record transformations via ParDo. However, the contents of a PrivatePCollection cannot be written to disk. For example, there is no equivalent of:

textio.Write(s, "/path/to/output/file", pcol)

In order to retrieve data encapsulated in a PrivatePCollection, it is necessary to use one of the differentially private aggregations provided with this library (e.g., count, sum, mean), which transforms the PrivatePCollection back into a PCollection.

This is because of the API contract provided by this library: once data is encapsulated in a PrivatePCollection, all its outputs are differentially private. More precisely, suppose a PrivatePCollection pcol is created from a PCollection<K,V> icol with privacy parameters (ε,δ), and output in one or several PCollections (ocol1, ocol2, ocol3). Let f be the corresponding randomized transformation, associating icol with (ocol1, ocol2, ocol3). Then f is (ε,δ)-differentially private in the following sense. Let icol' be the PCollection obtained by removing all records associated with a given value of K in icol. Then, for any set S of possible outputs:

P[f(icol) ∈ S] ≤ exp(ε) * P[f(icol') ∈ S] + δ.

The K, in the example above, is userID, representing a user identifier. This means that the full list of contributions of any given user is protected. However, this does not need to be the case; the protected property might be different than a user identifier. In this library, we use the more general terminology of "privacy unit" to refer to the type of this identifier (for example, user ID, event ID, a pair (user ID, day)); and "privacy identifier" to refer to a particular instance of this identifier (for example, user n°4217, event n°99, or the pair (user n°4127,2020-06-24)).

Note that the interface contract of PrivatePCollection has limitations. this library assumes that the user of the library is trusted with access to the underlying raw data. This intended user is a well-meaning developer trying to produce anonymized metrics about data using differential privacy. The API tries to make it easy to anonymize metrics that are safe to publish to untrusted parties; and difficult to break the differential privacy privacy guarantees by mistake.

However, this API does not attempt to protect against malicious library users. In particular, nothing prevents a user of this library from adding a side-effect to a ParDo function to leak raw data and bypass differential privacy guarantees. Similarly, ParDo functions are allowed to return errors that crash the pipeline, which could be abused to leak raw data. There is no protection against timing or side-channel attacks, as we assume that the only thing malicious users have access to is the output data.

Code:

// This example computes the "Sum-up revenue per day of the week" example
// from the Go Differential Privacy Library documentation, available at
// https://github.com/google/differential-privacy/go/README.md.
//
// It assumes that the input file, "week_data.csv", has the same format as
// the data used in the above example:
// https://github.com/google/differential-privacy/go/examples/data/week_data.csv

// visit contains the data corresponding to a single restaurant visit.
type visit struct {
    visitorID  string
    eurosSpent int
    weekday    int
}

// Initialize the pipeline.
beam.Init()
p := beam.NewPipeline()
s := p.Root()

// Load the data and parse each visit, ignoring parsing errors.
icol := textio.Read(s, "week_data.csv")
icol = beam.ParDo(s, func(s string, emit func(visit)) {
    data := strings.Split(s, ",")
    euros, err := strconv.Atoi(data[3])
    if err != nil {
        return
    }
    weekday, err := strconv.Atoi(data[4])
    if err != nil {
        return
    }
    emit(visit{data[0], euros, weekday})
}, icol)

// Transform the input PCollection into a PrivatePCollection.
epsilon, delta := 1.0, 1e-3
privacySpec := pbeam.NewPrivacySpec(epsilon, delta)
pcol := pbeam.MakePrivateFromStruct(s, icol, privacySpec, "visitorID")
// pcol is now a PrivatePCollection<visit>.

// Compute the differentially private sum-up revenue per weekday. To do so,
// we extract a KV pair, where the key is weekday and the value is the
// money spent.
pWeekdayEuros := pbeam.ParDo(s, func(v visit) (int, int) {
    return v.weekday, v.eurosSpent
}, pcol)
sumParams := pbeam.SumParams{
    // There is only a single differentially private aggregation in this
    // pipeline, so the entire privacy budget will be consumed (ε=1 and
    // δ=10⁻³). If multiple aggregations are present, we would need to
    // manually specify the privacy budget used by each.

    // If a visitor of the restaurant is present in more
    // than 4 weekdays, some of these contributions will be randomly dropped.
    // This is necessary to achieve differential privacy.
    MaxPartitionsContributed: 4,

    // If a visitor of the restaurant spends more than 50 euros, or less
    // than 0 euros, their contribution will be clamped. This is necessary
    // to achieve differential privacy.
    MinValue: 0,
    MaxValue: 50,
}
ocol := pbeam.SumPerKey(s, pWeekdayEuros, sumParams)

// ocol is a regular PCollection; it can be written to disk.
formatted := beam.ParDo(s, func(weekday int, sum int64) string {
    return fmt.Sprintf("Weekday n°%d: total spend is %d euros", weekday, sum)
}, ocol)
textio.Write(s, "spend_per_weekday.txt", formatted)

// Execute the pipeline.
if err := direct.Execute(context.Background(), p); err != nil {
    fmt.Printf("Pipeline failed: %v", err)
}

Index

Examples

Package Files

aggregations.go coders.go count.go distinct_id.go distinct_per_key.go mean.go pardo.go pbeam.go select_partitions.go sum.go

func Count Uses

func Count(s beam.Scope, pcol PrivatePCollection, params CountParams) beam.PCollection

Count counts the number of times a value appears in a PrivatePCollection, adding differentially private noise to the counts and doing pre-aggregation thresholding to remove counts with a low number of distinct privacy identifiers. It is also possible to manually specify the list of partitions present in the output, in which case the partition selection/thresholding step is skipped.

Note: Do not use when your results may cause overflows for int64 values. This aggregation is not hardened for such applications yet.

Count transforms a PrivatePCollection<V> into a PCollection<V, int64>.

func DistinctPerKey Uses

func DistinctPerKey(s beam.Scope, pcol PrivatePCollection, params DistinctPerKeyParams) beam.PCollection

DistinctPerKey estimates the number of distinct values associated to each key in a PrivatePCollection, adding differentially private noise to the estimates and doing pre-aggregation thresholding to remove estimates with a low number of distinct privacy identifiers.

DistinctPerKey does not support public partitions yet.

Note: Do not use when your results may cause overflows for Int64 values. This aggregation is not hardened for such applications yet.

DistinctPerKey transforms a PrivatePCollection<K,V> into a PCollection<K,int64>.

func DistinctPrivacyID Uses

func DistinctPrivacyID(s beam.Scope, pcol PrivatePCollection, params DistinctPrivacyIDParams) beam.PCollection

DistinctPrivacyID counts the number of distinct privacy identifiers associated with each value in a PrivatePCollection, adding differentially private noise to the counts and doing post-aggregation thresholding to remove low counts. It is conceptually equivalent to calling Count with MaxValue=1, but is specifically optimized for this use case. Client can also specify a PCollection of partitions.

Note: Do not use when your results may cause overflows for int64 values. This aggregation is not hardened for such applications yet.

DistinctPrivacyID transforms a PrivatePCollection<V> into a PCollection<V,int64>.

func MeanPerKey Uses

func MeanPerKey(s beam.Scope, pcol PrivatePCollection, params MeanParams) beam.PCollection

MeanPerKey obtains the mean of the values associated with each key in a PrivatePCollection<K,V>, adding differentially private noise to the means and doing pre-aggregation thresholding to remove means with a low number of distinct privacy identifiers. Client can also specify a PCollection of partitions.

Note: Do not use when your results may cause overflows for float64 values. This aggregation is not hardened for such applications yet.

MeanPerKey transforms a PrivatePCollection<K,V> into a PCollection<K,float64>.

func SelectPartitions Uses

func SelectPartitions(s beam.Scope, pcol PrivatePCollection, params SelectPartitionsParams) beam.PCollection

SelectPartitions performs differentially private partition selection using dpagg.PreAggSelectPartitions and returns the list of partitions to keep as a PCollection.

In a PrivatePCollection<K,V>, K is the partition key and in a PrivatePCollection<V>, V is the partition key. SelectPartitions transforms a PrivatePCollection<K,V> into a PCollection<K> and a PrivatePCollection<V> into a PCollection<V>.

func SumPerKey Uses

func SumPerKey(s beam.Scope, pcol PrivatePCollection, params SumParams) beam.PCollection

SumPerKey sums the values associated with each key in a PrivatePCollection<K,V>, adding differentially private noise to the sums and doing pre-aggregation thresholding to remove sums with a low number of distinct privacy identifiers. Client can also specify a PCollection of partitions.

Note: Do not use when your results may cause overflows for int64 and float64 values. This aggregation is not hardened for such applications yet.

SumPerKey transforms a PrivatePCollection<K,V> either into a PCollection<K,int64> or a PCollection<K,float64>, depending on whether its input is an integer type or a float type.

type CountParams Uses

type CountParams struct {
    // Noise type (which is either LaplaceNoise{} or GaussianNoise{}).
    //
    // Defaults to LaplaceNoise{}.
    NoiseKind NoiseKind
    // Differential privacy budget consumed by this aggregation. If there is
    // only one aggregation, both Epsilon and Delta can be left 0; in that
    // case, the entire budget of the PrivacySpec is consumed.
    Epsilon, Delta float64
    // The maximum number of distinct values that a given privacy identifier
    // can influence. If a privacy identifier is associated with more values,
    // random values will be dropped. There is an inherent trade-off when
    // choosing this parameter: a larger MaxPartitionsContributed leads to less
    // data loss due to contribution bounding, but since the noise added in
    // aggregations is scaled according to maxPartitionsContributed, it also
    // means that more noise is added to each count.
    //
    // Required.
    MaxPartitionsContributed int64
    // The maximum number of times that a privacy identifier can contribute to
    // a single count (or, equivalently, the maximum value that a privacy
    // identifier can add to a single count in total). If MaxValue=10 and a
    // privacy identifier is associated with the same value in 15 records, Count
    // ignores 5 of these records and only adds 10 to the count for this value.
    // There is an inherent trade-off when choosing MaxValue: a larger
    // parameter means that less records are lost, but a larger noise.
    //
    // Required.
    MaxValue int64
    // You can input the list of partitions present in the output if you know
    // them in advance. When you specify partitions, partition selection /
    // thresholding will be disabled and partitions will appear in the output
    // if and only if they appear in the set of public partitions.
    //
    // You should not derive the list of partitions non-privately from private
    // data. You should only use this in either of the following cases:
    // 	1. The list of partitions is data-independent. For example, if you are
    // 	aggregating a metric by hour, you could provide a list of all possible
    // 	hourly period.
    // 	2. You use a differentially private operation to come up with the list of
    // 	partitions. For example, you could use the keys of a DistinctPrivacyID
    // 	operation as the list of public partitions.
    //
    // Note that current implementation limitations only allow up to millions of
    // public partitions.
    //
    // Optional.
    PublicPartitions beam.PCollection
}

CountParams specifies the parameters associated with a Count aggregation.

type DistinctPerKeyParams Uses

type DistinctPerKeyParams struct {
    // Noise type (which is either LaplaceNoise{} or GaussianNoise{}).
    //
    // Defaults to LaplaceNoise{}.
    NoiseKind NoiseKind
    // Differential privacy budget consumed by this aggregation. If there is
    // only one aggregation, both Epsilon and Delta can be left 0; in that
    // case, the entire budget of the PrivacySpec is consumed.
    Epsilon, Delta float64
    // The maximum number of distinct keys that a given privacy identifier
    // can influence. If a privacy identifier is associated to more keys,
    // random keys will be dropped. There is an inherent trade-off when
    // choosing this parameter: a larger MaxPartitionsContributed leads to less
    // data loss due to contribution bounding, but since the noise added in
    // aggregations is scaled according to maxPartitionsContributed, it also
    // means that more noise is added to each count.
    //
    // Required.
    MaxPartitionsContributed int64
    // The maximum number of distinct values a given privacy identifier can
    // contribute to for each key. There is an inherent trade-off when choosing this
    // parameter: a larger MaxContributionsPerPartition leads to less data loss due
    // to contribution bounding, but since the noise added in aggregations is
    // scaled according to maxContributionsPerPartition, it also means that more
    // noise is added to each mean.
    //
    // Required.
    MaxContributionsPerPartition int64
}

DistinctPerKeyParams specifies the parameters associated with a DistinctPerKeyParams aggregation.

type DistinctPrivacyIDParams Uses

type DistinctPrivacyIDParams struct {
    // Noise type (which is either LaplaceNoise{} or GaussianNoise{}).
    //
    // Defaults to LaplaceNoise{}.
    NoiseKind NoiseKind
    // Differential privacy budget consumed by this aggregation. If there is
    // only one aggregation, both Epsilon and Delta can be left 0; in that
    // case, the entire budget of the PrivacySpec is consumed.
    Epsilon, Delta float64
    // The maximum number of distinct values that a given privacy identifier
    // can influence. If a privacy identifier is associated with more values,
    // random values will be dropped. There is an inherent trade-off when
    // choosing this parameter: a larger MaxPartitionsContributed leads to less
    // data loss due to contribution bounding, but since the noise added in
    // aggregations is scaled according to maxPartitionsContributed, it also
    // means that more noise is added to each count.
    //
    // Required.
    MaxPartitionsContributed int64
    // You can input the list of partitions present in the output if you know
    // them in advance. When you specify partitions, partition selection /
    // thresholding will be disabled and partitions will appear in the output
    // if and only if they appear in the set of public partitions.
    //
    // You should not derive the list of partitions non-privately from private
    // data. You should only use this in either of the following cases:
    // 	1. The list of partitions is data-independent. For example, if you are
    // 	aggregating a metric by hour, you could provide a list of all possible
    // 	hourly period.
    // 	2. You use a differentially private operation to come up with the list of
    // 	partitions. For example, you could use the keys of a DistinctPrivacyID
    // 	operation as the list of public partitions.
    //
    // Note that current implementation limitations only allow up to millions of
    // public partitions.
    //
    // Optional.
    PublicPartitions beam.PCollection
}

DistinctPrivacyIDParams specifies the parameters associated with a DistinctPrivacyID aggregation.

type GaussianNoise Uses

type GaussianNoise struct{}

GaussianNoise is an aggregations param that makes them use Gaussian Noise.

type LaplaceNoise Uses

type LaplaceNoise struct{}

LaplaceNoise is an aggregations param that makes them use Laplace Noise.

type MeanParams Uses

type MeanParams struct {
    // Noise type (which is either LaplaceNoise{} or GaussianNoise{}).
    //
    // Defaults to LaplaceNoise{}.
    NoiseKind NoiseKind
    // Differential privacy budget consumed by this aggregation. If there is
    // only one aggregation, both Epsilon and Delta can be left 0; in that
    // case, the entire budget of the PrivacySpec is consumed.
    Epsilon, Delta float64
    // The maximum number of distinct values that a given privacy identifier
    // can influence. There is an inherent trade-off when choosing this
    // parameter: a larger MaxPartitionsContributed leads to less data loss due
    // to contribution bounding, but since the noise added in aggregations is
    // scaled according to maxPartitionsContributed, it also means that more
    // noise is added to each mean.
    //
    // Required.
    MaxPartitionsContributed int64
    // The maximum number of contributions from a given privacy identifier
    // for each key. There is an inherent trade-off when choosing this
    // parameter: a larger MaxContributionsPerPartition leads to less data loss due
    // to contribution bounding, but since the noise added in aggregations is
    // scaled according to maxContributionsPerPartition, it also means that more
    // noise is added to each mean.
    //
    // Required.
    MaxContributionsPerPartition int64
    // The total contribution of a given privacy identifier to partition can be
    // at at least MinValue, and at most MaxValue; otherwise it will be clamped
    // to these bounds. For example, if a privacy identifier is associated with
    // the key-value pairs [("a", -5), ("a", 2), ("b", 7), ("c", 3)] and the
    // (MinValue, MaxValue) bounds are (0, 5), the contribution for "a" will be
    // clamped up to 0, the contribution for "b" will be clamped down to 5, and
    // the contribution for "c" will be untouched. There is an inherent
    // trade-off when choosing MinValue and MaxValue: a small MinValue and a
    // large MaxValue means that less records will be clamped, but that more
    // noise will be added.
    //
    // Required.
    MinValue, MaxValue float64
    // You can input the list of partitions present in the output if you know
    // them in advance. When you specify partitions, partition selection /
    // thresholding will be disabled and partitions will appear in the output
    // if and only if they appear in the set of public partitions.
    //
    // You should not derive the list of partitions non-privately from private
    // data. You should only use this in either of the following cases:
    // 	1. The list of partitions is data-independent. For example, if you are
    // 	aggregating a metric by hour, you could provide a list of all possible
    // 	hourly period.
    // 	2. You use a differentially private operation to come up with the list of
    // 	partitions. For example, you could use the keys of a DistinctPrivacyID
    // 	operation as the list of public partitions.
    //
    // Note that current implementation limitations only allow up to millions of
    // public partitions.
    //
    // Optional.
    PublicPartitions beam.PCollection
}

MeanParams specifies the parameters associated with a Mean aggregation.

type NoiseKind Uses

type NoiseKind interface {
    // contains filtered or unexported methods
}

NoiseKind represents the kind of noise to be used in an aggregations.

type PartitionsMapFn Uses

type PartitionsMapFn struct {
    PartitionType beam.EncodedType
    // contains filtered or unexported fields
}

PartitionsMapFn makes a map consisting of public partitions.

func (*PartitionsMapFn) AddInput Uses

func (fn *PartitionsMapFn) AddInput(m mapAccum, partitionKey beam.X) mapAccum

AddInput adds the public partition key to the map

func (*PartitionsMapFn) CreateAccumulator Uses

func (fn *PartitionsMapFn) CreateAccumulator() mapAccum

CreateAccumulator creates a new accumulator for the appropriate data type

func (*PartitionsMapFn) ExtractOutput Uses

func (fn *PartitionsMapFn) ExtractOutput(m mapAccum) pMap

ExtractOutput returns the completed partition map

func (*PartitionsMapFn) MergeAccumulators Uses

func (fn *PartitionsMapFn) MergeAccumulators(a, b mapAccum) mapAccum

MergeAccumulators adds the keys from a to b

func (*PartitionsMapFn) Setup Uses

func (fn *PartitionsMapFn) Setup()

Setup is our "constructor"

type PrivacySpec Uses

type PrivacySpec struct {
    // contains filtered or unexported fields
}

PrivacySpec contains information about the privacy parameters used in a PrivatePCollection. It encapsulates a privacy budget that must be shared between all aggregations on PrivatePCollections using this PrivacySpec. If you have multiple pipelines in the same binary, and want them to use different privacy budgets, call NewPrivacySpec multiple times and give a different PrivacySpec to each PrivatePCollection.

func NewPrivacySpec Uses

func NewPrivacySpec(epsilon, delta float64, options ...PrivacySpecOption) *PrivacySpec

NewPrivacySpec creates a new PrivacySpec with the specified privacy budget and options.

The epsilon and delta arguments are the total (ε,δ)-differential privacy budget for the pipeline. If there is only one aggregation, the entire budget will be used for this aggregation. Otherwise, the user must specify how the privacy budget is split across aggregations.

type PrivacySpecOption Uses

type PrivacySpecOption interface {
    // contains filtered or unexported methods
}

PrivacySpecOption is used for customizing PrivacySpecs. In the typical use case, PrivacySpecOptions are passed into the NewPrivacySpec constructor to create a further customized PrivacySpec.

type PrivatePCollection Uses

type PrivatePCollection struct {
    // contains filtered or unexported fields
}

A PrivatePCollection embeds a PCollection, associating each element to a privacy identifier, and ensures that its content can only be written to a sink after being anonymized using differentially private aggregations.

We call "privacy identifier" the value of the identifier associated with a record (e.g. 62934947), and "privacy unit" the semantic type of this identifier (e.g. "user ID"). Typical choices for privacy units include user IDs or session IDs. This choice determines the privacy unit protected by differential privacy. For example, if the privacy unit is user ID, then the output of aggregations will be (ε,δ)-indistinguishable from the output obtained via PrivatePCollection in which all records associated with a single user ID have been removed, or modified.

Some operations on PCollections are also available on PrivatePCollection, for example a limited subset of ParDo operations. They transparently propagate privacy identifiers, preserving the privacy guarantees of the PrivatePCollection.

func MakePrivate Uses

func MakePrivate(_ beam.Scope, col beam.PCollection, spec *PrivacySpec) PrivatePCollection

MakePrivate transforms a PCollection<K,V> into a PrivatePCollection<V>, where <K> is the privacy unit.

func MakePrivateFromProto Uses

func MakePrivateFromProto(s beam.Scope, col beam.PCollection, spec *PrivacySpec, idFieldPath string) PrivatePCollection

MakePrivateFromProto creates a PrivatePCollection from a PCollection of proto messages and the qualified name of the field to use as a privacy key. The field and all its parents must be non-repeated, and the field itself cannot be a submessage.

func MakePrivateFromStruct Uses

func MakePrivateFromStruct(s beam.Scope, col beam.PCollection, spec *PrivacySpec, idFieldPath string) PrivatePCollection

MakePrivateFromStruct creates a PrivatePCollection from a PCollection of structs and the qualified path (seperated by ".") of the struct field to use as a privacy key. For example:

  type exampleStruct1 struct {
    IntField int
		 StructField exampleStruct2
  }

  type  exampleStruct2 struct {
    StringField string
  }

If col is a PCollection of exampleStruct1, you could use "IntField" or "StructField.StringField" as idFieldPath.

Caution

The privacy key field must be a simple type (e.g. int, string, etc.), or a pointer to a simple type and all its parents must be structs or pointers to structs.

If the privacy key field is not set, all elements without a set field will be attributed to the same (default) privacy unit, likely degrading utility of future DP aggregations. Similarly, if the idFieldPath or any of its parents are nil, those elements will be attributed to the same (default) privacy unit as well.

func ParDo Uses

func ParDo(s beam.Scope, doFn interface{}, pcol PrivatePCollection) PrivatePCollection

ParDo applies the given function to all records, propagating privacy identifiers. For now, it only works if doFn is a function that has one of the following types.

Transforms a PrivatePCollection<X> into a PrivatePCollection<Y>:
	- func(X) Y
	- func(context.Context, X) Y
	- func(X) (Y, error)
	- func(context.Context, X) (Y, error)
	- func(X, emit), where emit has type func(Y)
	- func(context.Context, X, emit), where emit has type func(Y)
	- func(X, emit) error, where emit has type func(Y)
	- func(context.Context, X, emit) error, where emit has type func(Y)

Transforms a PrivatePCollection<X> into a PrivatePCollection<Y,Z>:
	- func(X) (Y, Z)
	- func(context.Context, X) (Y, Z)
	- func(X) (Y, Z, error)
	- func(context.Context, X) (Y, Z, error)
	- func(X, emit), where emit has type func(Y, Z)
	- func(context.Context, X, emit), where emit has type func(Y, Z)
	- func(X, emit) error, where emit has type func(Y, Z)
	- func(context.Context, X, emit) error, where emit has type func(Y, Z)

Transforms a PrivatePCollection<W,X> into a PrivatePCollection<Y>:
	- func(W, X) Y
	- func(context.Context, W, X) Y
	- func(W, X) (Y, error)
	- func(context.Context, W, X) (Y, error)
	- func(W, X, emit), where emit has type func(Y)
	- func(context.Context, W, X, emit), where emit has type func(Y)
	- func(W, X, emit) error, where emit has type func(Y)
	- func(context.Context, W, X, emit) error, where emit has type func(Y)

Transforms a PrivatePCollection<W,X> into a PrivatePCollection<Y,Z>:
	- func(W, X) (Y, Z)
	- func(context.Context, W, X) (Y, Z)
	- func(W, X) (Y, Z, error)
	- func(context.Context, W, X) (Y, Z, error)
	- func(W, X, emit), where emit has type func(Y, Z)
	- func(context.Context, W, X, emit), where emit has type func(Y, Z)
	- func(W, X, emit) error, where emit has type func(Y, Z)
	- func(context.Context error, W, X, emit), where emit has type func(Y, Z)

type SelectPartitionsParams Uses

type SelectPartitionsParams struct {
    // Differential privacy budget consumed by this aggregation. If there is
    // only one aggregation, both Epsilon and Delta can be left 0; in that
    // case, the entire budget of the PrivacySpec is consumed.
    Epsilon, Delta float64
    // The maximum number of distinct keys that a given privacy identifier
    // can influence. If a privacy identifier is associated to more keys,
    // random keys will be dropped. There is an inherent trade-off when
    // choosing this parameter: a larger MaxPartitionsContributed leads to less
    // data loss due to contribution bounding, but since the noise added in
    // aggregations is scaled according to maxPartitionsContributed, it also
    // means that more noise is added to each count.
    //
    // Required.
    MaxPartitionsContributed int64
}

SelectPartitionsParams specifies the parameters associated with a SelectPartitions aggregation.

type SumParams Uses

type SumParams struct {
    // Noise type (which is either LaplaceNoise{} or GaussianNoise{}).
    //
    // Defaults to LaplaceNoise{}.
    NoiseKind NoiseKind
    // Differential privacy budget consumed by this aggregation. If there is
    // only one aggregation, both Epsilon and Delta can be left 0; in that
    // case, the entire budget of the PrivacySpec is consumed.
    Epsilon, Delta float64
    // The maximum number of distinct values that a given privacy identifier
    // can influence. There is an inherent trade-off when choosing this
    // parameter: a larger MaxPartitionsContributed leads to less data loss due
    // to contribution bounding, but since the noise added in aggregations is
    // scaled according to maxPartitionsContributed, it also means that more
    // noise is added to each count.
    //
    // Required.
    MaxPartitionsContributed int64
    // The total contribution of a given privacy identifier to partition can be
    // at at least MinValue, and at most MaxValue; otherwise it will be clamped
    // to these bounds. For example, if a privacy identifier is associated with
    // the key-value pairs [("a", -5), ("a", 2), ("b", 7), ("c", 3)] and the
    // (MinValue, MaxValue) bounds are (0, 5), the contribution for "a" will be
    // clamped up to 0, the contribution for "b" will be clamped down to 5, and
    // the contribution for "c" will be untouched. There is an inherent
    // trade-off when choosing MinValue and MaxValue: a small MinValue and a
    // large MaxValue means that less records will be clamped, but that more
    // noise will be added.
    //
    // Required.
    MinValue, MaxValue float64
    // You can input the list of partitions present in the output if you know
    // them in advance. When you specify partitions, partition selection /
    // thresholding will be disabled and partitions will appear in the output
    // if and only if they appear in the set of public partitions.
    //
    // You should not derive the list of partitions non-privately from private
    // data. You should only use this in either of the following cases:
    // 	1. The list of partitions is data-independent. For example, if you are
    // 	aggregating a metric by hour, you could provide a list of all possible
    // 	hourly period.
    // 	2. You use a differentially private operation to come up with the list of
    // 	partitions. For example, you could use the keys of a DistinctPrivacyID
    // 	operation as the list of public partitions.
    //
    // Note that current implementation limitations only allow up to millions of
    // public partitions.
    //
    // Optional.
    PublicPartitions beam.PCollection
}

SumParams specifies the parameters associated with a Sum aggregation.

Package pbeam imports 23 packages (graph). Updated 2021-01-19. Refresh now. Tools for package owners.