For CDOs + data platform + marketing-ops leadership
Two hundred franchisee POS instances, four POS vendors, one canonical master record. Fivetran does not handle that.
Toast at some locations, Square at others, NCR at others, Lightspeed at others. Same fragmentation across CRM, booking, loyalty, and the local-marketing point solutions each franchisee chose independently before the brand standardized. Enterprise ELT connects to one system at a time. The multi-source orchestration that lands every signal in one canonical master record at 200-location scale is operator-side architecture.
What this gets you
- Per-source adapter pattern for franchisee-owned systems: one adapter per source vendor (Toast POS, Square, NCR, Lightspeed, and others); per-franchisee configuration is applied through the adapter, with no vendor-specific code in the pipeline core.
- Schema canonicalization on ingest: 200 per-franchisee source schemas land in one canonical parent schema. The per-location attribution metadata (franchisee ID + location ID + banner ID + vertical tag + jurisdiction tag) follows every row through the pipeline.
- Real-time vs batch routing per source business value: POS receipts sync near-real-time; review-platform sync hourly; ad-platform-aggregate sync daily; high-volume low-value clickstream sampled at the source rather than ingested in full.
- Source-API drift detection and recovery: integration-health-monitoring runs continuous schema validation against expected source schemas; drift that exceeds tolerance triggers an alert plus the adapter-update workflow. Source-API failures fall back to buffer-and-retry with exponential backoff plus a dead-letter queue.
- Cost control via change-data-capture + per-source frequency tuning + sample-then-ingest: incremental sync ships only changed rows; per-source frequency aligns with business value; high-volume low-value sources sample at source. Naive full-refresh sync at 200-location scale produces a material monthly cost; this architecture avoids it.
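The drift check in the list above can be sketched as a comparison between the fields an adapter expects and the fields a source actually returns. This is a minimal Python sketch under stated assumptions: the `EXPECTED_RECEIPT` map, the `DriftReport` type, and the count-based tolerance are hypothetical names for illustration, not part of any vendor API or of the monitoring skill itself.

```python
from dataclasses import dataclass, field


@dataclass
class DriftReport:
    missing: set = field(default_factory=set)       # expected fields absent from the payload
    unexpected: set = field(default_factory=set)    # fields the adapter has never seen
    type_changed: set = field(default_factory=set)  # fields whose Python type changed

    @property
    def drift_count(self) -> int:
        return len(self.missing) + len(self.unexpected) + len(self.type_changed)


def detect_drift(expected: dict, record: dict) -> DriftReport:
    """Compare one source record against the expected field -> type map."""
    report = DriftReport()
    report.missing = set(expected) - set(record)
    report.unexpected = set(record) - set(expected)
    report.type_changed = {
        name for name, typ in expected.items()
        if name in record and not isinstance(record[name], typ)
    }
    return report


def needs_adapter_update(report: DriftReport, tolerance: int = 0) -> bool:
    """Hypothetical policy: alert once the drift count exceeds the tolerance."""
    return report.drift_count > tolerance


# Hypothetical expected schema for a POS receipt payload.
EXPECTED_RECEIPT = {"receipt_id": str, "total_cents": int, "closed_at": str}

# A drifted payload: one field dropped, one added, one changed from int to str.
drifted = {"receipt_id": "r-1", "total_cents": "1299", "tip_cents": 200}
report = detect_drift(EXPECTED_RECEIPT, drifted)
```

A production check would likely validate nested structures and sample many records per sync; the point here is only that drift becomes a measurable quantity that can be compared against a per-source tolerance.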
Two hundred franchisees chose four POS vendors before the brand standardized
A franchise system grew from 20 locations to 200 locations over a decade. The early franchisees chose their own POS systems — Toast at the Phoenix location, Square at the Tampa location, NCR at the Denver location, Lightspeed at the Boston location. Same fragmentation across CRM (Salesforce at some locations, HubSpot at others, none at the smaller franchisees), booking (Acuity at some, Mindbody at others, Vagaro at others), loyalty (the parent brand consolidated to one platform last year but legacy franchisee-chosen loyalty platforms still run during the migration), and the long tail of local-marketing point-solutions each franchisee deployed independently.
The CDO buys Fivetran for the parent-level ad-platforms-and-email-and-SMS-and-GBP integration. Fivetran connects to those parent-owned systems cleanly. The data lands in the warehouse. The CDO then needs to land the franchisee-owned POS plus CRM plus booking plus loyalty data in the same warehouse with per-location attribution. Fivetran has connectors for Toast and Square and NCR but not for the franchisee-specific configuration of each instance. Some franchisees run Toast on-premise with custom API endpoints. Some Square instances use the legacy v1 API. The NCR locations vary by region. The Lightspeed instances split between Lightspeed Restaurant and Lightspeed Retail with different schemas.
The data team builds custom ingestion code per franchisee. The code works on day one. The Toast Phoenix franchisee upgrades to a new Toast version six months later and the custom code breaks. The NCR Denver location switches to a different NCR product line and the custom code breaks. The Square Tampa franchisee migrates from Square v1 to v2 API and the custom code breaks. Each break requires data-team intervention. The data team spends 60 percent of its time maintaining ingestion code instead of producing analytics.
The per-source adapter pattern moves the per-source quirks out of the pipeline core and into versioned adapters per source vendor. One Toast adapter handles every Toast franchisee through per-franchisee configuration; same pattern for Square, NCR, and Lightspeed. Source-version drift triggers an adapter-update workflow rather than a custom-code rewrite. Schema canonicalization on ingest lands every franchisee in one canonical parent schema with the per-location attribution metadata attached. The downstream master record sees uniform shape regardless of which POS the franchisee runs.
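The adapter pattern described above might look like the following in Python. It is a sketch, not the production adapter library: the `FranchiseeConfig` fields, the registry, and the payload field names (`amount`, `total`, `total_money`) are assumptions chosen for illustration and do not reflect the real Toast or Square payloads.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class FranchiseeConfig:
    """Per-franchisee configuration record; all field names are illustrative."""
    franchisee_id: str
    location_id: str
    vendor: str                      # selects the adapter, e.g. "toast" or "square"
    api_endpoint: str
    schema_variant: str = "default"


# An adapter is a function: (config, raw rows) -> vendor-neutral rows.
Adapter = Callable[[FranchiseeConfig, Iterable[dict]], List[dict]]
ADAPTERS: Dict[str, Adapter] = {}


def register(vendor: str):
    """Register one adapter per source vendor."""
    def wrap(fn: Adapter) -> Adapter:
        ADAPTERS[vendor] = fn
        return fn
    return wrap


@register("toast")
def toast_adapter(cfg, rows):
    # Hypothetical Toast-like payload: amounts in dollars under "amount".
    return [{"location_id": cfg.location_id,
             "total_cents": round(r["amount"] * 100)} for r in rows]


@register("square")
def square_adapter(cfg, rows):
    # Hypothetical Square-like payload; the variant flag picks the field name.
    key = "total_money" if cfg.schema_variant == "v2" else "total"
    return [{"location_id": cfg.location_id,
             "total_cents": int(r[key])} for r in rows]


def ingest(cfg: FranchiseeConfig, rows: Iterable[dict]) -> List[dict]:
    """Pipeline core: looks up the adapter by vendor and holds no vendor logic."""
    return ADAPTERS[cfg.vendor](cfg, rows)
```

Under this shape, a franchisee migrating between API versions is a one-field change to its configuration record (`schema_variant`), and a new vendor is one new registered adapter; `ingest` itself never changes.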
What is in market — and what each category leaves to you
The ELT plus iPaaS plus marketing-data-specific connector primitive is mature. The multi-location multi-vendor orchestration at operator scale is architecture.
Modern ELT — Fivetran, Stitch (Talend), Airbyte, Hevo Data, Matillion, dbt
Excellent at extract-load-transform for parent-level systems with stable APIs. Pre-built connectors to ad platforms + ESPs + SMS providers + CRMs + warehouses + reverse-ETL targets. The per-franchisee source-vendor diversity (200 franchisees on four POS vendors with per-instance configuration), the per-source adapter pattern for the long tail of franchisee-owned systems, and the schema canonicalization with per-location attribution metadata are operator-side architecture on top of the connector primitive.
Enterprise iPaaS — MuleSoft, Workato, Boomi, Tray.io, Informatica, Talend Data Fabric
Strong at complex enterprise integration patterns with workflow orchestration, API management, and hybrid on-premise plus cloud connectivity. The multi-location-specific orchestration at 200-franchisee scale fits inside the iPaaS framework but requires the per-source adapter library, the schema canonicalization rules, and the per-location attribution metadata model to be designed and maintained as operator-side IP.
Marketing-data specialists — Funnel.io, Improvado, Adverity, Supermetrics, Cervinodata
Strong at ad-platform plus marketing-platform data consolidation with marketing-specific schema normalization. The offline-and-operational sources (POS + CRM + booking + loyalty) sit outside the marketing-data-specialist scope; those land via separate ingestion pipelines that the operator orchestrates.
Open-source ingestion — Apache Kafka + Connect, Airbyte OSS, Singer, Meltano, dbt + Snowflake / BigQuery / Redshift
Capable of building the full multi-source ingestion pipeline for operators with in-house data-engineering capacity. The per-source adapter library, the schema canonicalization rules, the per-location attribution metadata model, the source-API drift detection plus recovery workflow, and the cost-control rules all become operator-side IP. The orchestration partner can run the open-source pipeline against operator data without the operator carrying in-house engineering capacity.
The 200 custom ingestion scripts the data team maintains
The status quo at most multi-location operators that grew through franchisee acquisition over a decade. The data team spends 60 percent of its time maintaining custom ingestion code instead of producing analytics. Every source-API change triggers another rewrite. The adapter pattern moves the maintenance burden from custom code per franchisee to versioned adapters per source vendor.
The pipeline, end to end
- Position in the 3-skill Parallel-Inputs → Process bundle on master-record. Custom-system-adapters (per-source connectors) + multi-source ingestion (this skill — orchestration) + per-vertical-schema-validation (downstream Process stage). The bundle is the 2nd Parallel-Inputs-family instance in arc (joins walk-in-phone-attribution Parallel-Inputs → Resolve).
- Per-source adapter library. One versioned adapter per source vendor (Toast POS, Square, NCR, Lightspeed for POS; Salesforce, HubSpot for CRM; Acuity, Mindbody, Vagaro for booking; franchisee-specific loyalty platforms; ESPs, SMS providers, ad platforms). Adapters handle vendor quirks (auth, pagination, rate limits, schema variants) so the pipeline core stays vendor-agnostic.
- Per-franchisee adapter configuration. Each franchisee instance loads its adapter through a configuration record (API endpoint, auth credential reference, schema variant flag, sync schedule override). No franchisee-specific code in the pipeline core; only configuration data per franchisee in the adapter library.
- Schema canonicalization on ingest. Every row from every source converts to the canonical parent schema at adapter output. Field-name mapping, type conversion, normalization (phone format, address format, timestamp timezone, currency unit), and missing-field defaults apply per source per franchisee. The downstream master record sees uniform shape.
- Per-location attribution metadata. Every ingested row carries franchisee ID + location ID + banner ID + vertical tag + jurisdiction tag in addition to the canonical row content. Downstream consumers (per-location attribution + customer journey resolution + compliance overlay) filter on the metadata without re-deriving it.
- Change-data-capture incremental sync. Each source-adapter combination ships only changed rows since the last sync watermark rather than full table refresh. Watermarks persist per source per franchisee. Source vendors that do not support CDC (some legacy systems) fall back to differential fetch-and-diff at the adapter layer.
- Per-source sync-frequency tuning. Sync frequency aligns with business value: POS receipts near-real-time, review-platform hourly, ad-platform-aggregate daily, ecommerce-clickstream sampled at source. The tuning controls ELT cost against business need rather than syncing everything near-real-time.
- Sample-then-ingest for high-volume low-value sources. Clickstream events at high volume cost more to ingest than the analytical value they produce. The sampling pattern selects a representative subset at the source before ingestion. Sampling rate tunes per source per cycle against downstream model-accuracy impact.
- Conflict resolution. The same record may arrive from multiple sources at slightly different times (a customer record from POS plus the booking system plus the loyalty platform). The resolution layer applies a priority rule per field (most-recent timestamp wins, source-authoritative field wins, manually curated field wins). Conflicts that exceed the rule library route to analyst review.
- Source-API drift detection (loop 43 integration). Continuous schema validation against expected source schemas runs alongside each adapter. Drift detection triggers alert + adapter-update workflow + (optional) temporary fall-back to schemaless ingestion until the adapter updates.
- Source-API failure recovery. Rate-limit exhaustion + source outage + auth expiration + network partition fall back to buffer-and-retry with exponential backoff. Failed records land in a dead-letter queue for analyst review. Idempotency keys prevent double-write on retry.
- Data lineage tracking. Every row in the master record carries lineage metadata (source vendor + source instance + ingest timestamp + adapter version + canonicalization version). Lineage queryable per row supports regulator audit, attribution-model validation, and upstream-failure root-cause analysis.
- Handoff to per-vertical-schema-validation Process stage. Every ingested row passes to the per-vertical-schema-validation Process stage before commit. HIPAA + FDA + FINRA + cannabis-state + PCI + GDPR + CCPA libraries evaluate. Hard violations reject or quarantine; soft violations route to remediation; auto-fixable issues fix in place. The Input → Process pair completes.
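The canonicalization, attribution, and lineage steps in the list above can be illustrated together. The sketch below is a simplified Python version under assumptions: the `FIELD_MAP` entries, the source field names, and the rule that one source reports dollars while another reports cents are all hypothetical, chosen only to show two differently shaped rows converging on one canonical shape.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Hypothetical per-source mapping: source field -> canonical field.
FIELD_MAP = {
    "toast": {"guid": "receipt_id", "amount": "total", "closedDate": "closed_at"},
    "square": {"id": "receipt_id", "total_money": "total", "closed_at": "closed_at"},
}
# Hypothetical unit rule: sources that report money in dollars rather than cents.
DOLLAR_SOURCES = {"toast"}


@dataclass(frozen=True)
class Attribution:
    franchisee_id: str
    location_id: str
    banner_id: str
    vertical: str
    jurisdiction: str


def canonicalize(source: str, raw: dict, attr: Attribution, adapter_version: str) -> dict:
    """Map one raw source row to the canonical shape and attach metadata."""
    row = {canon: raw.get(src) for src, canon in FIELD_MAP[source].items()}
    # Currency normalization: the canonical unit is integer cents.
    total = row.pop("total")
    row["total_cents"] = round(total * 100) if source in DOLLAR_SOURCES else int(total)
    # Timestamp normalization: ISO 8601 with an offset, converted to UTC.
    row["closed_at"] = (
        datetime.fromisoformat(row["closed_at"]).astimezone(timezone.utc).isoformat()
    )
    # Attribution and lineage travel with the row so consumers never re-derive them.
    row["attribution"] = asdict(attr)
    row["lineage"] = {
        "source_vendor": source,
        "adapter_version": adapter_version,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    return row
```

Downstream consumers can then filter on `row["attribution"]["location_id"]` or audit by `row["lineage"]["adapter_version"]` regardless of which POS produced the row.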
Frequently asked
What is marketing data integration?
Marketing data integration is the practice of unifying customer-facing and operations data from every system that produces it — POS, CRM, ad platforms (Google + Meta + TikTok + others), email service provider, SMS provider, Google Business Profile API, review platforms, booking systems, loyalty platforms, ecommerce, analytics platforms — into a single canonical master record per customer per location. Enterprise ELT platforms (Fivetran, Stitch, Airbyte, Hevo, Matillion, dbt), iPaaS (MuleSoft, Workato, Boomi, Tray.io, Informatica, Talend Data Fabric), and marketing-data specialists (Funnel.io, Improvado, Adverity, Supermetrics) ship the ETL primitive. The multi-source orchestration that lands the data in a canonical master record per location per franchisee at scale is operator-side architecture.
Why does single-tenant ETL fail multi-location operators?
A 200-location operator has 200 franchisee-owned POS instances, 200 franchisee-owned booking calendars, parent-owned ad platforms with per-location campaign segmentation, parent-owned email and SMS providers with per-location list segmentation, 200 GBP listings, 200 review-platform listings, and a centralized loyalty platform. Fivetran connects to a Shopify store or a Salesforce instance. It does not orchestrate 200 franchisee POS instances each running a different POS vendor. The multi-source ingestion layer that handles per-franchisee source diversity, per-source schema canonicalization at the parent level, and per-location attribution metadata on every ingested row is the multi-location-specific architecture.
How is this different from Fivetran, Stitch, Airbyte, Hevo, MuleSoft, Workato, Funnel.io, or Improvado?
Those platforms ship the connector primitive — pre-built integrations to common sources, scheduled or event-driven sync, schema-mapping configuration, observability. They are excellent at the connector layer. The per-source adapter pattern for franchisee-owned systems that vary across the 200-location footprint, the schema canonicalization that converts 200 per-franchisee POS schemas into one parent canonical schema, the per-location attribution metadata that follows every row through the pipeline, the conflict resolution when the same record arrives from multiple sources at slightly different times, and the integration with the per-vertical-schema-validation Process stage and the customer-data-orchestration emit-change broadcast are operator-side wiring on top of the connector primitive.
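The conflict-resolution step mentioned above, where the same record arrives from several sources, can be sketched as a per-field rule chain. This Python sketch rests on assumptions: the precedence order (manual curation, then the authoritative source, then recency), the `AUTHORITATIVE` map, and the source labels are hypothetical, and a real rule library would be configured per field.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FieldValue:
    value: Any         # assumed hashable in this sketch
    source: str        # e.g. "pos", "booking", "loyalty", "manual"
    updated_at: str    # ISO 8601 UTC string; one format, so strings sort by time


# Hypothetical rule library: field -> authoritative source.
AUTHORITATIVE = {"loyalty_tier": "loyalty"}


def resolve_field(name: str, candidates: List[FieldValue]) -> Optional[FieldValue]:
    """Pick one value for a field, or None when the rules cannot decide."""
    manual = [c for c in candidates if c.source == "manual"]
    if manual:
        return max(manual, key=lambda c: c.updated_at)
    owner = AUTHORITATIVE.get(name)
    owned = [c for c in candidates if c.source == owner]
    if owned:
        return max(owned, key=lambda c: c.updated_at)
    newest = max(candidates, key=lambda c: c.updated_at)
    # Sources disagreeing at the same timestamp fall outside the rule library.
    tied = {c.value for c in candidates if c.updated_at == newest.updated_at}
    return newest if len(tied) == 1 else None


def resolve_record(fields: Dict[str, List[FieldValue]]) -> Tuple[dict, list]:
    """Return the merged record plus the field names routed to analyst review."""
    merged, review = {}, []
    for name, candidates in fields.items():
        winner = resolve_field(name, candidates)
        if winner is None:
            review.append(name)
        else:
            merged[name] = winner.value
    return merged, review
```

Resolving per field rather than per record lets a newer POS email coexist with a loyalty-owned tier value in the same merged row.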
How does the Parallel-Inputs → Process architecture work?
The master-record-canonicalization agent owns a 3-skill bundle. Parallel-Input 1 (custom-system-adapters connects to each franchisee-owned source — Toast POS at some locations, Square at others, NCR at others; same pattern across CRM, booking, loyalty). Parallel-Input 2 (multi-source ingestion — this skill — orchestrates the parallel adapters, handles schema canonicalization, applies per-location attribution metadata). Process (per-vertical-schema-validation evaluates every ingested row against HIPAA, FDA, FINRA, cannabis-state, PCI, GDPR, CCPA libraries before commit). The bundle is the 2nd Parallel-Inputs-family instance in arc (joins walk-in-phone-attribution Parallel-Inputs → Resolve).
How do you handle source-API drift and source-API failures?
Source-API drift (Toast POS adds a new field, Meta Ads changes pagination semantics, Klaviyo deprecates an endpoint) surfaces through the integration-health-monitoring skill on the compliance-overlay-manager cluster. The monitoring runs continuous schema-validation against expected source schemas and alerts when drift exceeds the tolerance threshold. Source-API failures (rate-limit exhaustion, source-outage, auth expiration, network partition) fall back to a buffer-and-retry pattern with exponential backoff plus dead-letter queue for unrecoverable failures. The pipeline ensures at-least-once delivery to the master record with idempotency keys preventing double-write.
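The recovery path described above (buffer-and-retry, exponential backoff, dead-letter queue, idempotency keys) can be sketched in a few lines of Python. Everything named here is an assumption for illustration: `TransientError`, the `write` callable standing in for the master-record sink, and the in-memory `committed` set and `dead_letters` list, which would be durable stores in practice.

```python
import hashlib
import json
import random
import time


class TransientError(Exception):
    """Hypothetical error type for rate limits, outages, and expired auth."""


def idempotency_key(source: str, franchisee_id: str, record: dict) -> str:
    """Stable key: the same record from the same instance always hashes the same."""
    payload = json.dumps([source, franchisee_id, record], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def deliver(record, key, write, committed, dead_letters,
            max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """At-least-once delivery with exponential backoff and a dead-letter queue."""
    if key in committed:                 # replay of an already-written record
        return True
    for attempt in range(max_attempts):
        try:
            write(record)
        except TransientError:
            if attempt < max_attempts - 1:
                # Full jitter: wait a random time up to base_delay * 2^attempt.
                sleep(random.uniform(0, base_delay * 2 ** attempt))
        else:
            committed.add(key)
            return True
    dead_letters.append((key, record))   # retries exhausted: route to analyst review
    return False
```

The `sleep` parameter is injectable so the retry loop can be exercised without real delays. Note that the sketch records the key only after a successful write; a crash between the two would cause one duplicate attempt, which is why the sink itself should also treat the key as a uniqueness constraint.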
How do you handle multi-source ingestion cost at scale?
Cost grows with row volume across most enterprise ELT platforms (Fivetran prices by monthly active rows; Stitch charges by row volume; Airbyte open-source carries compute cost). At 200-location scale with 10-plus high-volume sources per location, naive sync produces a material monthly cost. The architecture controls cost three ways. First, change-data-capture incremental sync ships only changed rows rather than full table refreshes. Second, per-source sync frequency is tuned against business value (POS receipts sync near-real-time; review-platform sync hourly; ad-platform-aggregate sync daily). Third, a sample-then-ingest pattern handles high-volume low-business-value sources (clickstream events are sampled at the source rather than ingested in full).
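Two of the three cost controls above, watermark-based incremental sync and sampling at the source, are simple enough to sketch. The Python below is illustrative only: the `updated_at` field, the in-memory watermark dict, and hash-based sampling are assumptions, and a real adapter would push the watermark filter into the source query rather than filter rows after fetching them.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

# Watermarks persist per (source, franchisee); a dict stands in for durable storage.
Watermarks = Dict[Tuple[str, str], str]


def incremental_sync(source: str, franchisee_id: str, rows: Iterable[dict],
                     watermarks: Watermarks) -> List[dict]:
    """Return only rows changed since the last watermark, then advance it."""
    key = (source, franchisee_id)
    last = watermarks.get(key, "")
    # ISO 8601 timestamps in one fixed format compare correctly as strings.
    changed = [r for r in rows if r["updated_at"] > last]
    if changed:
        watermarks[key] = max(r["updated_at"] for r in changed)
    return changed


def keep_sample(event_id: str, rate: float) -> bool:
    """Deterministic sampling: hash the event ID into [0, 1) and compare to rate."""
    digest = hashlib.sha256(event_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2 ** 32
    return bucket < rate
```

Hashing the event ID instead of drawing a random number means a replayed event gets the same keep-or-drop decision, so retries do not skew the sample.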
Hire the agent that owns the master record + ingestion
The master-record-canonicalization agent owns the 3-skill Parallel-Inputs → Process bundle (custom-system-adapters + multi-source ingestion + per-vertical-schema-validation), sitting on top of whichever ELT (Fivetran, Stitch, Airbyte, Hevo, Matillion, dbt), iPaaS (MuleSoft, Workato, Boomi, Tray.io, Informatica, Talend Data Fabric), or marketing-data specialist (Funnel.io, Improvado, Adverity, Supermetrics) you license. The bundle covers the per-source adapter library, schema canonicalization on ingest, per-location attribution metadata, change-data-capture incremental sync, per-source frequency tuning, sample-then-ingest for high-volume sources, source-API drift detection, and data lineage tracking.
We scope on the call and send a private checkout link after.
Related reading: Location master-record sync · Per-vertical data validation · Customer data orchestration