Pseudonymization and Disabling Personalization

How Promoted protects Personally Identifiable Information (PII) and user data

Promoted takes the utmost care in processing our customers' user data. Our prior experience and the purpose-built foundations of our system enforce our data protection policies through strong, centralized design collaboration and review.

Avoid Collection

The simplest, most reliable data protection method is never to collect data in the first place. Promoted only has access to the data explicitly shared by our customers. Our defined interfaces avoid collecting identifiable customer information wherever possible, and customers are discouraged from submitting identifying information in semi-structured inputs, for example, in the User CMS.

Pseudonymization Before Persistence

Promoted implements “pseudonymization” as recommended by the European Parliament in the GDPR. User identifiers such as User IDs are substituted with a replacement key before activity data is written in a durable format. The replacement key cannot be reversed without the substitution mapping, which is stored separately from the subject data. Upon a deletion request, or after a retention period configured by the customer, the mapping is deleted, after which it is no longer feasible to identify records from the replacement key.
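The flow above can be sketched as follows. This is a minimal illustration, not Promoted's implementation: the in-memory stores and function names are hypothetical, standing in for the separate mapping store and durable event storage.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical stores: the substitution mapping is kept separate from the
// durable activity data it pseudonymizes.
const substitutionMap = new Map<string, string>(); // userId -> replacement key
const eventStore: { pseudonym: string; event: string }[] = []; // durable records

// Substitute the user ID with a replacement key before persisting an event.
function recordEvent(userId: string, event: string): void {
  let pseudonym = substitutionMap.get(userId);
  if (pseudonym === undefined) {
    // Random key: cannot be reversed to the user ID without the mapping.
    pseudonym = randomUUID();
    substitutionMap.set(userId, pseudonym);
  }
  eventStore.push({ pseudonym, event });
}

// On a deletion request (or after the configured retention period), delete
// the mapping; persisted events can no longer be tied back to the user.
function deleteMapping(userId: string): void {
  substitutionMap.delete(userId);
}
```

Because only the mapping is deleted, granular records remain available for aggregate analytics while no longer identifying the user.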

Restricted Distribution and Retention

Mistaken use of free text and unstructured customer data can evade reasonable ingestion safeguards. Additionally, replication of denormalized mappings could undermine the efficacy of pseudonymization. We address these risks with an opinionated engineering design for data processing used in modeling and analytics. Derived datasets written to storage at Promoted are produced by source-controlled code subject to the same design, source control, review, testing, and monitoring as any other engineered system. This attention to review and detection provides straightforward guards against the propagation of protected fields, whether explicit or mistaken.

Designing derived datasets around the concerns of their end use case is a subtle but important safeguard against inadvertently retaining identifiable information in free text. For example, we separate derived datasets for machine learning training from those for analytics display.

  • Machine learning training datasets are more likely to use free text fields as features, but since new models are trained every few days, those training examples are short-lived and not useful long-term. Therefore, Promoted retains training data for at most one year, or less depending on customer configuration.
  • Analytics datasets used for long-term conclusions over years are aggregates that use free text data only after reducing it to low cardinality. It is important that these aggregates are stable or predictably restated, and pseudonymization patterns serve the need to retain granular facts without identifiable information. Any dataset retained indefinitely for reaggregation by customers has the design constraint that it includes only dimensions sufficiently conformed that they can be validated as non-identifiable.
  • Cases where free text provides context for long-term analytics are staff- or customer-facing tools in which presentation-time retrieval via the pseudonymization mapping is sufficiently performant. These cases do not include data used for protecting and managing identifiable or sensitive user data.

User Deletion Requests

Promoted supports user deletion in the User CMS API. In addition to deleting the primary record, we use data lake tables that support row-level deletion (e.g., Paimon) to delete any derived records associated with the user. To initiate a full user deletion for use cases that include GDPR compliance, send the necessary information to [email protected], and the request will be executed within 24 hours.

Disable Personalization (disablePersonalization)

Personalization is when Promoted uses user-specific data as ML features to optimize listings in search, feed, and ads. Personalization can be disabled per Delivery request via the Delivery API and per user via the User CMS. When personalization is disabled, no user- or session-specific information is used to make the machine-learning inferences used to optimize delivery. When this happens, the optimization is similar to when a new user starts a new session — the results are still optimized, but they are not personalized because no information about the user or session is used in that optimization.

To disable personalization dynamically per Delivery Request: set disablePersonalization=true on the Delivery.Request object. See the example in the Typescript SDK.
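A minimal sketch of the request shape. Only the disablePersonalization flag comes from this document; the other fields and values are illustrative placeholders, and SDK client setup is omitted (consult the Typescript SDK for the exact call).

```typescript
// Illustrative Delivery.Request payload. disablePersonalization is the
// documented flag; userInfo and useCase values are placeholders.
const deliveryRequest = {
  userInfo: { anonUserId: "anon-123" }, // placeholder user info
  useCase: "SEARCH",                    // placeholder use case
  // No user- or session-specific features will be used for this request;
  // results are still optimized, as if a new user started a new session.
  disablePersonalization: true,
};
```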

To disable personalization persistently per User: set the key-value pair disablePersonalization=1 anywhere in the user’s User CMS record. Note that changes to the User CMS record may take a few seconds to propagate to live delivery.

ML Opt-Out (ignoreUsage)

Promoted supports entirely excluding traffic from use in machine learning, both as training examples and as features, using the ignoreUsage flag. When the ignoreUsage flag is set, engagement directly attributed to a Delivery Request insertion, like impressions and clicks, is excluded from machine learning as training examples, as features, and as parts of aggregate metrics used as features. Engagement that is indirectly attributed, like purchases, will never be attributed to an insertion attached to a Delivery Request with ignoreUsage. See the Ruby SDK for an example of the ignoreUsage parameter.
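As a sketch, a request flagged for exclusion from machine learning. The ignoreUsage flag is from this document, but its exact placement in the payload is an assumption here; consult the Ruby SDK example for the authoritative shape.

```typescript
// Illustrative request for traffic that should never model real user behavior
// (e.g., integration tests, internal employees, bots). Placement of the
// ignoreUsage flag is assumed; other fields are placeholders.
const testTrafficRequest = {
  userInfo: { anonUserId: "internal-test-1" }, // placeholder: internal traffic
  ignoreUsage: true, // exclude attributed engagement from ML training and features
};
```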

ignoreUsage is useful for traffic that should never be used to model real user behavior, for example, integration testing, internal employee traffic, and bots. It is not intended to disable personalization: some personalization may still be enabled when the ignoreUsage flag is set. Use the disablePersonalization flag to disable personalization.