Ahmad HumayunGet in touch
Data Engineering

Normalizing Affiliate Network Data: Awin, Rakuten, and TradeTracker

How to normalize affiliate network transaction data from Awin, Rakuten, TradeTracker, and similar networks into one reporting-ready dataset. The schema differences, status lifecycles, and validation checks that matter.

Affiliate networks track the same thing - publisher-driven sales and commissions - but they do it differently.

Each network has its own API, its own schema, its own commission status lifecycle, and its own date and currency handling.

When you have transactions across Awin, Rakuten, TradeTracker, or similar networks, you cannot put them side by side without normalization.

The core problem

At first glance, affiliate transaction data looks uniform: a transaction happened on a date, earned a commission, and has a status. That is true at a high level, but the details diverge quickly:

  • Transaction IDs: Each network assigns its own. There is no universal identifier, so deduplication has to happen per-network.
  • Commission status: Networks use different status names for the same lifecycle states. "Approved" on Awin is not always equivalent to "Validated" on TradeTracker. The status lifecycle (pending → approved → declined, or pending → validated → rejected) differs by network.
  • Currency: Networks report commissions in the publisher's or advertiser's account currency. If you aggregate across networks without converting to a base currency, totals are wrong.
  • Date fields: Networks typically expose a transaction date and a validation date. Some expose a click date. Which date to use for reporting depends on what question you are answering.
  • Attribution windows: Networks have different default attribution windows for click and impression tracking. A sale attributed on Awin may not be attributed the same way on another network.
  • Merchant and order identifiers: Networks may use different terms for the same concept (merchant vs. advertiser vs. program). Your pipeline needs a consistent key.

How to structure the pipeline

For affiliate normalization work, I build source-specific modules that handle each network's extraction and field mapping, then feed into shared normalization utilities.

Source modules handle:

  • Authentication (API keys, OAuth, or export download)
  • Pagination (most affiliate APIs are paginated)
  • Rate limit handling
  • Raw response parsing (JSON or XML)

Normalization layer handles:

  • Mapping network-specific status values to a shared status vocabulary (e.g., pending, approved, declined)
  • Normalizing date fields to UTC
  • Currency conversion if needed
  • Transaction ID deduplication per network
  • Merchant and order identifier mapping to shared keys

Output is a transaction table where one row represents one affiliate transaction with consistent field names, status vocabulary, and currency.

Commission status handling

The status lifecycle is where most affiliate reporting bugs live.

A transaction typically starts as pending, then moves to approved (or validated) when the network or advertiser confirms the commission, or declined (or rejected) if the sale is reversed or found invalid.

The tricky part:

  • Status changes happen after the transaction date. A transaction from last month may still be pending.
  • A declined transaction should not be included in commission totals.
  • If you load transaction data and then load it again next week, the status may have changed. Reruns need to update status, not just append new rows.

The correct pattern is a merge on transaction ID + network, not an append. Each run should update the status of existing rows and add new ones.

Validation checks

Before affiliate transaction data reaches a reporting table:

  • Total check: The sum of commissions in the loaded table should match the network's own reporting for that date range.
  • Status distribution: The split of pending, approved, and declined should look reasonable. If everything is pending or everything is approved, something is wrong with the status mapping.
  • Duplicate check: Transaction IDs should be unique per network. Duplicates mean the deduplication logic is broken.
  • Date range coverage: Rows should exist for the full date range requested. Gaps may indicate a pagination bug or API failure.

Awin-specific notes

Awin's API uses a REST endpoint with date-range filtering. Commission data is available at transaction level. The status field uses values like pending, approved, declined. Awin also exposes click data separately from transaction data.

Rakuten-specific notes

Rakuten's reporting API (formerly LinkShare) tends to deliver data in batch. The status lifecycle uses values like approved, returned, void. Returned and voided transactions represent declined commissions and should not be summed in positive commission totals.

TradeTracker-specific notes

TradeTracker uses an XML-based API in some versions. XML parsing is more fragile than JSON - schema changes can break parsers silently. The status values tend to use accepted, pending, rejected. TradeTracker's transaction data includes separate click and conversion timestamps.

When you need this

Affiliate network normalization is necessary when:

  • You report on commissions across more than one affiliate network
  • Your finance or operations team needs a single source of truth for affiliate transactions
  • You are building a dashboard that shows affiliate performance alongside other channel data
  • Commission totals need to reconcile against network invoices

It is not necessary when you only use one network and the network's own reporting UI is sufficient.

Related project

See the project write-up: Affiliate Network Data Pipelines.


If you need affiliate transaction data from Awin, Rakuten, TradeTracker, or similar networks normalized into a single reporting dataset, I can build the extraction, normalization, and validation pipeline.

AH

Ahmad Humayun

Data Engineering Consultant

Freelance data engineering consultant specialising in BigQuery, Dataform/dbt, marketing data pipelines, API automation, and AI-ready analytics layers. Based in Lahore, Pakistan — available worldwide.

Working through a messy reporting workflow, API integration, or BigQuery pipeline?

I can help design and build the reliable version.