Ads Data Hub: User-Provided Data Matching (UPDM)

Ads Data Hub (ADH) lets you join first-party data with Google's ad impression data inside a privacy-controlled environment. The feature that makes this possible is user-provided data matching (UPDM).

UPDM is not a simple join. It involves specific hashing requirements, a match table architecture that differs from ordinary BigQuery tables, privacy thresholds that determine whether query results are returned at all, and consent requirements that vary by geography.

What follows is a technical reference based on Google's official ADH documentation, covering what an engineer needs to know before implementing UPDM.

What UPDM does

UPDM joins first-party data you have collected about your users (from your website, app, CRM, or physical store) with the same users' signed-in activity across Google ad inventory.

Two things make this different from a standard BigQuery join:

The data lives in Google's infrastructure. You cannot directly access the raw impression data. All joins happen inside ADH's controlled environment, and only aggregated results are returned to your BigQuery dataset.
It only covers signed-in users. UPDM matches against Google accounts. Signed-out users, minors, and users who have not given consent (in applicable regions) cannot be matched - they are excluded from match tables regardless of whether you have their data.

Importantly: UPDM is not affected by third-party cookie deprecation because it operates on signed-in identity, not browser cookies.

Data you can match on

ADH accepts seven fields for matching. Each has specific formatting and hashing requirements:

Field	Hashing required	Format requirements
User ID	None (plain text)	String
Email	SHA256 → Base64	Lowercase, trimmed, no accents, remove periods before `@gmail.com` / `@googlemail.com` domain
Phone	SHA256 → Base64	E.164 format (`+[country][number]`), no spaces or special characters other than the leading `+`
First name	SHA256 → Base64	Lowercase, trimmed, no prefixes (Mr./Dr.), preserve accents
Last name	SHA256 → Base64	Lowercase, trimmed, no suffixes (Jr./III), preserve accents
Country	None	ISO 3166-1 alpha-2 (e.g., `US`, `DE`, `GB`)
Zip code	None	5-digit US or international format
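The formatting rules in the table can be sketched in Python as a pre-hashing step. This is a minimal illustration, not Google's validation script: the function names and the prefix and suffix sets are assumptions made for the example, and a real dataset will need a longer list of titles.

```python
import unicodedata

GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}
# Illustrative sets only; extend them for your own data.
NAME_PREFIXES = {"mr", "mrs", "ms", "miss", "dr"}
NAME_SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def strip_accents(text: str) -> str:
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_email(raw: str) -> str:
    """Trim, lowercase, remove accents, and drop periods in Gmail local parts."""
    email = strip_accents(raw.strip().lower())
    if "@" not in email:
        raise ValueError(f"email has no domain: {raw!r}")
    local, _, domain = email.rpartition("@")
    if domain in GMAIL_DOMAINS:
        local = local.replace(".", "")
    return f"{local}@{domain}"


def normalize_first_name(raw: str) -> str:
    """Trim, lowercase, and drop leading titles. Accents are preserved."""
    words = raw.strip().lower().split()
    while len(words) > 1 and words[0].rstrip(".") in NAME_PREFIXES:
        words = words[1:]
    return " ".join(words)


def normalize_last_name(raw: str) -> str:
    """Trim, lowercase, and drop trailing suffixes. Accents are preserved."""
    words = raw.strip().lower().split()
    while len(words) > 1 and words[-1].rstrip(".") in NAME_SUFFIXES:
        words = words[:-1]
    return " ".join(words)
```

Note the asymmetry the table calls for: accents are removed from emails but kept in names, so the two cannot share one normalizer.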

The hashing transformation Google specifies is:

TO_BASE64(SHA256(user_data))

This is SHA256 followed by Base64 encoding - not plain SHA256 hex. A phone number hashed as plain hex SHA256 will not match.
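If you hash outside BigQuery, the equivalent of TO_BASE64(SHA256(...)) in Python looks roughly like the sketch below. The function names are made up for this example; it assumes the value has already been normalized and is encoded as UTF-8 before hashing.

```python
import base64
import hashlib


def hash_for_updm(normalized_value: str) -> str:
    """Base64 of the raw 32-byte SHA256 digest (44 characters, '=' padded)."""
    digest = hashlib.sha256(normalized_value.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def hex_sha256(normalized_value: str) -> str:
    """Plain hex SHA256 (64 characters). Shown only for contrast."""
    return hashlib.sha256(normalized_value.encode("utf-8")).hexdigest()


phone = "+14155552671"
print(hash_for_updm(phone))  # 44-character Base64 string
print(hex_sha256(phone))     # 64-character hex string, a different encoding
```

Both strings encode the same 32 bytes, but they are different text, so a join on one against the other finds nothing.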

Join key strength

Not all matching combinations are equal. Google ranks combinations by matching effectiveness from strongest to weakest:

Email + Phone + Address (first name + last name + country + zip)
Phone + Address
Email + Address
Email + Phone
Address alone
Phone alone
Email alone (weakest)

When using address matching, all four address components (first name, last name, country, zip) must be present. A partial address does not qualify as an address match.
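To see which rung of that ranking a given record can reach, a small helper like the following can be run over a CRM export. It is a sketch under assumptions: the field names (`email`, `phone`, `first_name`, `last_name`, `country`, `zip_code`) are hypothetical column names, and the ranking is copied from the list above.

```python
from typing import Optional, Tuple

ADDRESS_FIELDS = ("first_name", "last_name", "country", "zip_code")

# Strongest combination first, as ranked above.
JOIN_KEY_RANKING = [
    ("email", "phone", "address"),
    ("phone", "address"),
    ("email", "address"),
    ("email", "phone"),
    ("address",),
    ("phone",),
    ("email",),
]


def strongest_join_key(record: dict) -> Optional[Tuple[str, ...]]:
    """Return the strongest combination this record can support, or None."""
    available = {name for name in ("email", "phone") if record.get(name)}
    # Address only counts when all four components are present.
    if all(record.get(field) for field in ADDRESS_FIELDS):
        available.add("address")
    for combination in JOIN_KEY_RANKING:
        if set(combination) <= available:
            return combination
    return None
```

A record with an email and a first name but no last name, country, or zip returns `("email",)`: the partial address contributes nothing.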

For most CRM datasets, email is the most commonly available field. If phone numbers are also collected, include both: Email + Phone ranks above either field alone.

How match tables work

When you run a match table generation query in ADH, it produces a companion table for each standard ADH table. For example:

Source table	Match table
`adh.google_ads_impressions`	`adh.google_ads_impressions_updm`
`adh.youtube_impressions`	`adh.youtube_impressions_updm`

The _updm table contains only the rows where a match was found between your first-party data and Google's data. Rows where no match exists are excluded.

Each match table includes a customer_data_user_id column that stores your user identifiers as BYTES.

To query against a match table, you cast your join key to BYTES:

SELECT
  imp.campaign_id,
  COUNT(*) AS impressions
FROM adh.google_ads_impressions_updm AS imp
JOIN my_dataset.my_users AS u
  ON imp.customer_data_user_id = CAST(u.user_id AS BYTES)
GROUP BY 1

String comparisons are case-sensitive in SQL, and the BYTES comparison is exact. If your user IDs mix case, normalize them the same way on both sides before casting.
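The effect of the cast is easy to reproduce outside BigQuery, because casting a STRING to BYTES encodes it as UTF-8. The sketch below uses hypothetical helper names and assumes your user IDs are case-insensitive in your own system, which is what makes lowercasing safe.

```python
def cast_as_bytes(user_id: str) -> bytes:
    """Mirror of CAST(user_id AS BYTES): the UTF-8 encoding of the string."""
    return user_id.encode("utf-8")


def normalize_user_id(user_id: str) -> str:
    """Assumption: IDs differing only in case or padding are the same user."""
    return user_id.strip().lower()


# Two spellings of the same ID produce different bytes, so they never join.
print(cast_as_bytes("User-42") == cast_as_bytes("user-42"))
# After normalization, the bytes are identical.
print(cast_as_bytes(normalize_user_id("User-42")) == cast_as_bytes(normalize_user_id(" user-42")))
```

Whatever normalization you choose has to be the one applied to the IDs that went into the match table, otherwise the two sides still disagree.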

Match tables must be manually refreshed by rerunning the generation query. Each run overwrites the previous version - there is no incremental update.

Calculating match rate

Match rate tells you what fraction of eligible impressions were matched to your first-party data.

Not all impressions are eligible: signed-out users, minors, and unconsented users cannot be matched regardless of what data you have. ADH provides an is_updm_eligible field (available from October 1, 2024 onwards) that identifies the eligible population.

The match rate query:

WITH total_events AS (
  SELECT COUNT(*) AS n
  FROM adh.google_ads_impressions
  WHERE is_updm_eligible = TRUE
),
matched_events AS (
  SELECT COUNT(*) AS n
  FROM adh.google_ads_impressions_updm
)
SELECT
  SAFE_DIVIDE(matched_events.n, total_events.n) AS match_rate
FROM total_events, matched_events

SAFE_DIVIDE prevents division-by-zero if the eligible population is empty.

Interpreting match rate: because only signed-in, consented users are eligible, a match rate of 30–50% on the eligible population is often realistic even with good data quality. A very low rate (under 10%) suggests a data quality issue - misformatted emails, incorrect phone number formatting, or incomplete address data. A rate that seems too high (near 100%) may indicate the eligible population is very small.
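Those rules of thumb can be turned into a quick sanity check on the two counts the query produces. This is only a sketch: the 10% floor comes from the paragraph above, while the 95% cutoff for "near 100%" and the function names are assumptions chosen for the example.

```python
from typing import Optional

LOW_RATE = 0.10               # "under 10%" from the text above
SUSPICIOUSLY_HIGH_RATE = 0.95  # assumed cutoff for "near 100%"


def safe_divide(numerator: int, denominator: int) -> Optional[float]:
    """Like SAFE_DIVIDE: return None instead of raising on a zero denominator."""
    if denominator == 0:
        return None
    return numerator / denominator


def interpret_match_rate(matched: int, eligible: int) -> str:
    """Classify a match rate using the rough bands described above."""
    rate = safe_divide(matched, eligible)
    if rate is None:
        return "no eligible impressions"
    if rate < LOW_RATE:
        return "low: check email, phone, and address formatting"
    if rate >= SUSPICIOUSLY_HIGH_RATE:
        return "suspiciously high: check the size of the eligible population"
    return "plausible"
```

For example, `interpret_match_rate(4_000, 10_000)` falls in the plausible band, while `interpret_match_rate(300, 10_000)` points at a data quality problem.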

Privacy thresholds

ADH enforces aggregation thresholds to protect individual user privacy. Rows that do not meet the threshold are excluded from query results.

The thresholds depend on the privacy mode and data type:

Mode	Minimum unique users per result row
Noise injection (default)	~20
Difference checks (legacy)	~50
Click and conversion data only	~10

Noise injection (the default for most ADH accounts) adds calibrated random noise to aggregated results. A true value of 35 impressions might return 37.8 after noise injection. This is the privacy mechanism that lets results be returned at all - without it, the threshold enforcement would simply suppress rows.

Difference checks are the older mechanism and are more aggressive about suppressing data. They are less predictable and lead to more data loss.

Rows with zeroed or null user IDs do not count toward aggregation thresholds.

Practical implication: do not expect ADH queries to return results for small audience segments. A segment of 15 users will not produce a result row. Design your analysis around audience sizes large enough to clear the threshold consistently.
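When planning an analysis, it can help to estimate in advance which segments are likely to survive. The sketch below applies the approximate minimums from the table to your own segment sizes. It is a planning aid only: the mode names are labels invented here, the numbers are approximate, and the real checks run inside ADH and are not reproduced by this code.

```python
# Approximate minimum unique users per result row, taken from the table above.
MIN_USERS = {
    "noise_injection": 20,
    "difference_checks": 50,
    "clicks_conversions_only": 10,
}


def rows_likely_returned(segment_user_counts: dict, mode: str) -> dict:
    """Keep only segments whose unique-user count clears the mode's minimum."""
    minimum = MIN_USERS[mode]
    return {
        segment: users
        for segment, users in segment_user_counts.items()
        if users >= minimum
    }


segments = {"loyalty_gold": 15, "loyalty_silver": 25, "all_customers": 60}
print(rows_likely_returned(segments, "noise_injection"))
print(rows_likely_returned(segments, "difference_checks"))
```

Here the 15-user segment is dropped in both modes, and the 25-user segment survives only under noise injection.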

EEA consent requirements

For users in the European Economic Area (EEA), additional consent requirements apply:

You must acknowledge that you have obtained proper consent from EEA end users, per Google's EU User Consent Policy.
This acknowledgment is required for each ADH account and again upon each new first-party data upload.
For UPDM specifically: EU regulations mean UPDM is no longer available for policy-isolated non-network tables. Remove table suffixes to limit queries to consented users only.
Cross-service queries (joining data across multiple Google services) are incompatible with EEA consent mode during match table creation.

Failure to acknowledge EEA consent before an upload blocks the upload from being used in UPDM queries.

Setup checklist

Before running your first UPDM query:

Verify data location: first-party data must be in BigQuery. If you have VPC Service Controls enabled, the data must be within the VPC-SC perimeter.
Grant access: your ADH service account needs BigQuery read access to the dataset containing your first-party data.
Format and hash: apply TO_BASE64(SHA256(field)) to the fields that require hashing (email, phone, first name, last name). User ID, country, and zip code are not hashed. Google provides validation scripts in JavaScript, Python, Go, Java, and SQL.
Check minimums: you must upload at least 1,000 records for UPDM to be eligible.
Acknowledge EEA consent: required before uploading data if you have EEA users.
Generate match table: navigate to Create > Report > "Private cloud match table generation" template. Edit with your column names and run.
Verify and query: join against the generated _updm table using CAST(user_id AS BYTES).
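Two items in this checklist, the 1,000-record minimum and the hashing format, can be checked locally before anything is uploaded. The following is a rough pre-flight sketch, not one of Google's validation scripts. It assumes records are dicts with hypothetical keys `email`, `phone`, `first_name`, and `last_name` holding already-hashed values.

```python
import base64
import binascii

MIN_RECORDS = 1000  # minimum from the checklist above
HASHED_FIELDS = ("email", "phone", "first_name", "last_name")


def looks_like_base64_sha256(value: str) -> bool:
    """True if the value is valid Base64 that decodes to exactly 32 bytes."""
    try:
        return len(base64.b64decode(value, validate=True)) == 32
    except (binascii.Error, ValueError):
        return False


def preflight_problems(records: list) -> list:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    if len(records) < MIN_RECORDS:
        problems.append(f"only {len(records)} records, need at least {MIN_RECORDS}")
    for index, record in enumerate(records):
        for field in HASHED_FIELDS:
            value = record.get(field)
            if value and not looks_like_base64_sha256(value):
                problems.append(f"record {index}: {field} is not Base64-encoded SHA256")
    return problems
```

A 64-character hex digest passes as valid Base64 but decodes to 48 bytes, so the length check is what catches the hex-instead-of-Base64 mistake.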

Common implementation mistakes

Using hex SHA256 instead of Base64-encoded SHA256. The correct transformation is TO_BASE64(SHA256(value)) - Base64 of the raw SHA256 bytes. Plain SHA256 hex output will not match Google's hash.

Not normalizing email before hashing. Emails must be lowercase and have periods stripped from the local part of Gmail addresses (j.smith@gmail.com becomes jsmith@gmail.com) before hashing. Case and format variations produce different hashes that will not match.

Not using E.164 format for phone numbers. +14155552671 is valid. 415-555-2671 is not. Phones without country codes will produce hashes that do not match.

Querying the match table without casting to BYTES. The customer_data_user_id column is BYTES. Joining without CAST(my_data.user_id AS BYTES) produces no results - not an error, just zero matches.

Expecting match tables to update automatically. They do not. Each refresh requires manually rerunning the generation query. For pipelines that upload new first-party data regularly, build the match table regeneration step into the workflow.
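The phone formatting mistake is the easiest of these to catch mechanically. Below is a minimal sketch of an E.164 check and a naive converter. The function names are hypothetical, and the converter assumes that any number without a leading `+` is a national number belonging to the default country code you pass in, which will not hold for every dataset.

```python
import re

# "+" followed by up to 15 digits, the first of which is not zero.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(phone: str) -> bool:
    """True if the string is already in E.164 form, e.g. +14155552671."""
    return E164_PATTERN.fullmatch(phone) is not None


def to_e164(raw: str, default_country_code: str) -> str:
    """Strip formatting and prepend the country code when it is missing."""
    stripped = raw.strip()
    digits = re.sub(r"\D", "", stripped)
    if stripped.startswith("+"):
        candidate = "+" + digits
    else:
        candidate = "+" + default_country_code + digits
    if not is_e164(candidate):
        raise ValueError(f"cannot convert {raw!r} to E.164")
    return candidate
```

With a default country code of `"1"`, `to_e164("415-555-2671", "1")` yields `+14155552671`, the valid form shown above.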

UPDM in a production ADH pipeline

In a production context, the UPDM workflow is typically:

First-party data (CRM, sales, attribution data) arrives in BigQuery on a schedule.
A preprocessing step hashes and formats the required fields.
The match table generation query runs, overwriting the previous _updm table.
ADH queries run against the refreshed match table.
Aggregated results land in your BigQuery dataset.
A downstream step reads the results for reporting or audience analysis.

In the DV360 & Ads Data Hub pipeline I built, UPDM match table generation was one state in a six-state async state machine driven by Cloud Tasks. That design was needed because match table generation is asynchronous and subject to privacy-check failures, which carry a mandatory 7-hour cooldown before retry.
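To make the cooldown concrete, here is a rough sketch of how a transition function for such a state machine might decide what to run next and when. Everything in it is illustrative: the state names, the five-minute backoff for other failures, and the function itself are assumptions for this example, not the actual pipeline. Only the 7-hour cooldown comes from the description above.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Tuple

PRIVACY_COOLDOWN = timedelta(hours=7)     # cooldown described above
TRANSIENT_BACKOFF = timedelta(minutes=5)  # assumed delay for other failures


class State(Enum):
    # Hypothetical state names, chosen only for illustration.
    PREPROCESS = "preprocess"
    GENERATE_MATCH_TABLE = "generate_match_table"
    AWAIT_MATCH_TABLE = "await_match_table"
    RUN_QUERIES = "run_queries"
    AWAIT_QUERIES = "await_queries"
    DONE = "done"


NEXT_ON_SUCCESS = {
    State.PREPROCESS: State.GENERATE_MATCH_TABLE,
    State.GENERATE_MATCH_TABLE: State.AWAIT_MATCH_TABLE,
    State.AWAIT_MATCH_TABLE: State.RUN_QUERIES,
    State.RUN_QUERIES: State.AWAIT_QUERIES,
    State.AWAIT_QUERIES: State.DONE,
}

# A failed asynchronous job is resubmitted from the step that launched it.
RESUBMIT_FROM = {
    State.AWAIT_MATCH_TABLE: State.GENERATE_MATCH_TABLE,
    State.AWAIT_QUERIES: State.RUN_QUERIES,
}


def next_transition(
    state: State, succeeded: bool, privacy_failure: bool, now: datetime
) -> Tuple[State, datetime]:
    """Return the next state and the earliest time it should be scheduled."""
    if succeeded:
        return NEXT_ON_SUCCESS.get(state, State.DONE), now
    retry_state = RESUBMIT_FROM.get(state, state)
    delay = PRIVACY_COOLDOWN if privacy_failure else TRANSIENT_BACKOFF
    return retry_state, now + delay
```

The returned timestamp is the kind of value a task queue with delayed delivery can use as its schedule time, so the worker never has to sleep through the cooldown itself.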

If you are implementing Ads Data Hub user-provided data matching and running into hashing mismatches, privacy threshold failures, or EEA consent blocking, I can help diagnose and build a production-ready workflow.
