Completions

Architecture swarm · Master-record-canonicalization agent · Multi-source-ingestion skill · Build pillar · Published June 1, 2026

How to build multi-source canonical ingestion

A 200-location operator running 20 per-location source systems faces 4,000 source-location pairs that must stay coherent. Spreadsheet, CRM, listing-platform, POS, loyalty, helpdesk, and review-platform sources each ship a different field shape, different update frequency, different rate-limit, and different consent posture. The Pull + Resolve + Gate + Audit skill bundle on the master-record-canonicalization agent sits above your existing Zapier + Make + Workato + Boomi + MuleSoft + Microsoft Power Automate + n8n + Tray.io + Fivetran + Airbyte + Stitch + Hevo + Meltano integration surface and writes a canonical per-location master record with named regulatory anchors preserved in every audit record: per-API license terms, GDPR Article 28 data processor + Article 30 records of processing, EU-US Data Privacy Framework + Standard Contractual Clauses, CCPA service-provider + CPRA contractor + 17-state processor distinctions, DSAR fulfillment chain, SOC 2 Type II, ISO 27001 / 27701, NIST AI RMF.

The 4-skill bundle on the master-record-canonicalization agent

Pull

Per-source adapters translate each vendor schema into the canonical master-record schema. Per-spreadsheet (Google Sheets + Airtable + Notion + Smartsheet + Excel Graph + Coda). Per-CRM (Salesforce REST + Bulk + Streaming + HubSpot Custom Objects + Pipedrive + Zoho + Dynamics 365 + SugarCRM). Per-listing (Yext Knowledge Graph + Google Business Profile + BrightLocal + Whitespark + Synup + Rallio + SOCi + Uberall + Brandify + Moz Local + ChatMeter). Per-POS + per-loyalty + per-review-platform. Per-API polling + per-API webhook + per-API streaming + per-API conditional GET (If-Modified-Since + If-None-Match against the stored ETag) + per-API rate-limit coordination + per-API pagination + per-API retry (exponential backoff + circuit breaker + dead-letter queue) + per-API token rotation.
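The retry chain named above (exponential backoff, circuit breaker, dead-letter queue) can be sketched roughly as follows. This is a minimal illustration, not the bundle's implementation: `PullWorker`, `CircuitBreaker`, the thresholds, and the delay values are all hypothetical, and the sleep function is injected so the sketch runs without waiting.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


def _no_sleep(_seconds: float) -> None:
    """Placeholder; pass time.sleep in real use."""


@dataclass
class CircuitBreaker:
    """Opens after `threshold` consecutive failures; a success resets the count."""
    threshold: int = 3
    failures: int = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.threshold

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1


@dataclass
class PullWorker:
    breaker: CircuitBreaker = field(default_factory=CircuitBreaker)
    dead_letters: List[str] = field(default_factory=list)
    base_delay: float = 0.5          # seconds; hypothetical default
    max_attempts: int = 4            # hypothetical default
    sleep: Callable[[float], None] = _no_sleep

    def backoff(self, attempt: int) -> float:
        """Exponential backoff with full jitter: uniform in [0, base * 2**attempt]."""
        return random.uniform(0, self.base_delay * (2 ** attempt))

    def pull(self, job_id: str, fetch: Callable[[], dict]) -> Optional[dict]:
        for attempt in range(self.max_attempts):
            if self.breaker.is_open:
                break  # stop calling a source that keeps failing
            try:
                result = fetch()
            except Exception:
                self.breaker.record(ok=False)
                self.sleep(self.backoff(attempt))
                continue
            self.breaker.record(ok=True)
            return result
        self.dead_letters.append(job_id)  # park the job for later inspection
        return None
```

A job that exhausts its attempts, or hits an open breaker, lands in `dead_letters` instead of being silently dropped, so an operator can replay it once the source recovers.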

Resolve

Per-field source-priority (legal name: Yext priority; PostalAddress: Google Business Profile priority; Telephone E.164: Google Business Profile priority; OpeningHoursSpecification: Google Business Profile priority + per-location override; PaymentAccepted: POS priority; AggregateRating: Google priority; Review: platform-of-origin priority). Per-field conflict resolution (most-recent-wins + highest-trust-source-wins + manual review + vote-majority + LLM-augmented tie-breakers, run under per-vendor zero-retention, that flag conflicts for operator review). Data-quality validation (per-field format + per-field required + per-field enumeration + per-record cross-field + per-record completeness score + per-record quality score).
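As a rough sketch of per-field source-priority, the resolver below walks an ordered source list per field and falls back to manual review when no prioritized source has a value and the remaining sources disagree. The field keys, source keys, and the `resolve_field` helper are illustrative assumptions, not a published schema.

```python
from typing import Dict, List, Optional

# Highest-priority source first. Keys are illustrative names only.
SOURCE_PRIORITY: Dict[str, List[str]] = {
    "legal_name": ["yext", "salesforce", "hubspot"],
    "postal_address": ["google_business_profile"],
    "telephone": ["google_business_profile"],
    "payment_accepted": ["pos"],
}


def resolve_field(field_name: str, candidates: Dict[str, Optional[str]]) -> dict:
    """Pick the value from the highest-priority source that supplied one."""
    for source in SOURCE_PRIORITY.get(field_name, []):
        value = candidates.get(source)
        if value:
            return {"field": field_name, "value": value,
                    "source": source, "needs_review": False}
    # No prioritized source answered: accept only unanimous agreement.
    distinct = {v for v in candidates.values() if v}
    if len(distinct) == 1:
        return {"field": field_name, "value": distinct.pop(),
                "source": "consensus", "needs_review": False}
    return {"field": field_name, "value": None,
            "source": None, "needs_review": True}
```

For example, `resolve_field("legal_name", {"yext": None, "salesforce": "Acme, LLC", "hubspot": "Acme LLC"})` selects the Salesforce value because Yext supplied nothing, while two non-prioritized sources that disagree on a phone number are routed to review.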

Gate

Five anchors are evaluated before a canonical master-record write commits:

  1. Per-API license terms + per-source DPA + per-API rate-limit honoring + hiQ Labs v LinkedIn 9th Cir 2022 + Van Buren 2021 + Meta v Bright Data ND Cal 2024 (when third-party citation tracker involved).
  2. GDPR Article 5 + 6 + 28 data processor + Article 30 records of processing + Article 32 + Article 35 DPIA + Article 44-49 international transfers + EU-US Data Privacy Framework + Standard Contractual Clauses + UK IDTA + Swiss-US DPF + CCPA service-provider + CPRA contractor + 17-state comprehensive privacy + WA My Health My Data Act 2024 + Texas SCOPE Act 2024 + LGPD + DPDP + PIPEDA + CASL.
  3. DSAR fulfillment chain (GDPR Article 15 + 16 + 17 + 20 + CCPA right to know + CPRA right to correct + 17-state DSAR).
  4. FTC Section 5 substantiation (Pfizer 1972) + SOC 2 Type II CC2 + CC3 + CC6 + CC7 + CC8 + ISO 27001 + ISO 27701 + NIST SP 800-53 + NIST CSF 2.0.
  5. EU AI Act Article 10 data governance + Article 50 + Article 12 record-keeping + NIST AI RMF + ISO 42001 + per-vendor LLM zero-retention.

Policy-as-code via OPA Rego + AWS Cedar + Casbin + Cerbos + Oso + Styra DAS + Permit.io.
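The gate can be pictured as one predicate per anchor, all of which must pass before the write commits. The sketch below uses plain Python as a stand-in for a policy engine such as OPA; the `WriteRequest` fields and the boolean reductions of each anchor are simplifying assumptions, since real anchor evaluation involves contract and legal review that a flag cannot capture.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical set of transfer mechanisms the operator has approved.
ALLOWED_TRANSFERS = {"DPF", "SCC", "IDTA", "not_applicable"}


@dataclass(frozen=True)
class WriteRequest:
    source: str
    license_attested: bool           # per-API license terms reviewed
    has_dpa: bool                    # per-source DPA on file
    transfer_mechanism: str          # e.g. "DPF", "SCC", "none"
    erasure_pending: bool            # an erasure request covers this record
    controls_evidence_current: bool  # security-control evidence is up to date
    llm_zero_retention: bool         # zero-retention confirmed for LLM calls


# One predicate per anchor; True means the write may proceed.
ANCHORS: Dict[str, Callable[[WriteRequest], bool]] = {
    "license_terms": lambda r: r.license_attested and r.has_dpa,
    "data_protection": lambda r: r.transfer_mechanism in ALLOWED_TRANSFERS,
    "dsar": lambda r: not r.erasure_pending,
    "security_controls": lambda r: r.controls_evidence_current,
    "ai_governance": lambda r: r.llm_zero_retention,
}


def evaluate_gate(request: WriteRequest) -> List[str]:
    """Return the failed anchor names; an empty list means the write may commit."""
    return [name for name, check in ANCHORS.items() if not check(request)]
```

Returning the list of failed anchors, rather than a single boolean, keeps the per-anchor outcome available for the audit record.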

Audit

Per-record immutable event log + per-record snapshot + per-record time-travel query (as-of-timestamp + as-of-version) + per-record rollback + per-record regulatory-defense export (FTC audit + state-AG audit + GDPR Article 15 + CCPA right to know + 17-state DSAR + EU AI Act Article 12 + SOC 2 + ISO 27001 / 27701 evidence). Storage on AWS S3 Object Lock + Azure Blob immutable + Google Cloud Storage Bucket Lock + Wasabi WORM. Retention stacks: 7-year FTC + 7-year IRS + GDPR Article 30 + CCPA 12-month look-back + per-state DSAR retention + EU AI Act Article 12 + SOC 2 CC7 / CC8. Change-event emission to Kafka + AWS Kinesis + Google Pub/Sub + Azure Event Hubs + RabbitMQ + Redpanda + Confluent classified by per-NAP change + hours / phone / rebrand / URL / service-area / image change, with sibling-handoff pointers downstream.
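One way to read the event log plus time-travel query is as an append-only list of diffs from which any past state is rebuilt by replay. The in-memory sketch below illustrates that idea only; `RecordLog` and `Event` are hypothetical names, and durable WORM storage is out of scope here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    version: int
    timestamp: float   # epoch seconds
    source: str
    actor: str
    diff: Tuple[Tuple[str, Optional[str]], ...]  # (field, new value); None removes the field


class RecordLog:
    """Append-only log for one record; state is derived by replaying events."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, timestamp: float, source: str, actor: str,
               diff: Dict[str, Optional[str]]) -> Event:
        if self._events and timestamp < self._events[-1].timestamp:
            raise ValueError("events must be appended in timestamp order")
        event = Event(len(self._events) + 1, timestamp, source, actor,
                      tuple(diff.items()))
        self._events.append(event)
        return event

    @staticmethod
    def _replay(events: List[Event]) -> Dict[str, str]:
        state: Dict[str, str] = {}
        for event in events:
            for name, value in event.diff:
                if value is None:
                    state.pop(name, None)
                else:
                    state[name] = value
        return state

    def as_of_timestamp(self, timestamp: float) -> Dict[str, str]:
        return self._replay([e for e in self._events if e.timestamp <= timestamp])

    def as_of_version(self, version: int) -> Dict[str, str]:
        return self._replay(self._events[:max(version, 0)])
```

Under this model a rollback is not a deletion: it is a new event whose diff restores an earlier state, so the history stays intact.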

The real vendor ecosystem this sits above

Source-system APIs

Spreadsheets (Google Sheets + Airtable + Notion + Smartsheet + Microsoft Excel Graph + Coda + ClickUp Docs); CRMs (Salesforce REST + Bulk + Streaming + HubSpot Custom Objects + Pipedrive + Zoho + Microsoft Dynamics 365 + SugarCRM + SAP CRM); listings (Yext Knowledge Graph + Google Business Profile API + BrightLocal + Whitespark + Synup + Rallio + SOCi + Uberall + Brandify + Moz Local + ChatMeter); POS (Square + Toast + Clover + Lightspeed Retail + Shopify POS + NCR Aloha + Revel + Lavu + Vend + TouchBistro); loyalty (Smile.io + LoyaltyLion + Yotpo Loyalty + Annex Cloud + Stamped + Marigold + Talon.One + Punchh + Paytronix).

Integration + ELT + event bus

Zapier + Make + Workato + Boomi + MuleSoft CloudHub + IBM App Connect + Microsoft Power Automate + n8n + Tray.io integration platforms remain the orchestration substrate. Fivetran + Airbyte + Stitch + Hevo + Meltano land raw data in a warehouse. Kafka + AWS Kinesis + Google Pub/Sub + Azure Event Hubs + RabbitMQ + Redpanda + Confluent move change events downstream. OpenAI + Anthropic LLM tie-breakers, run under per-vendor zero-retention, support conflict resolution.

Policy-as-code + WORM + sibling skills

OPA Rego + AWS Cedar + Casbin + Cerbos + Oso + Styra DAS + Permit.io policy-as-code expresses every Gate rule including per-API license terms, GDPR Article 28 / 30, EU-US DPF, SCCs, CCPA / CPRA / 17-state processor distinctions, DSAR propagation, and EU AI Act Article 10 data governance. AWS S3 Object Lock + Azure Blob immutable + Google Cloud Storage Bucket Lock + Wasabi compliance WORM holds the per-record audit substrate. Sibling skills on the same agent: conflict-resolution policy + per-vertical schema validation + change-event emission + versioned history + per-location custom system adapters.

The 6-workstream reporting cycle

Numeric uplift commitments are not made up-front. The engagement ships a pre-engagement baseline across six workstreams; the cycle tracks delta against that baseline. Reporting is the substrate, not the promise.

  1. Pull coverage. Per-source adapter coverage across 20+ source systems; per-API polling + webhook + streaming distribution; per-API rate-limit adherence; per-API retry telemetry.
  2. Resolve quality. Per-field source-priority consistency; per-field conflict-resolution distribution; LLM tie-breaker escalation rate; data-quality validation completeness + per-record completeness + quality score distribution.
  3. Gate quality. Per-anchor evaluation completeness (per-API license terms + GDPR Article 28 / 30 + EU-US DPF + SCCs + CCPA / CPRA / 17-state processor + DSAR + SOC 2 + ISO 27001 / 27701 + EU AI Act Article 10); per-anchor pass / fail / route-to-counsel distribution; per-API ToS violation count; cross-border transfer adequacy posture.
  4. Audit quality. Per-record event-log completeness; time-travel query latency; per-record rollback success rate; retention-window coverage; end-to-end replay success rate.
  5. Compliance posture. Per-API license-terms attestation cadence; DPA refresh cadence; GDPR Article 30 records-of-processing coverage; EU-US DPF + SCCs + UK IDTA coverage by data-transfer pair; CCPA / CPRA / 17-state processor-distinction posture; DSAR fulfillment turnaround; SOC 2 evidence cadence.
  6. Audit-trail completeness. Per-anchor regulatory citation completeness; change-event emission coverage to downstream event bus; sibling-handoff pointer completeness (conflict-resolution policy + per-vertical schema validation + versioned history + per-location custom system adapters within the master-record bundle; schema-audit-remediation agent; customer-graph agent; rollup-reporting agent).

Frequently asked questions

What is multi-source canonical ingestion for multi-location master-record operations — and why does 200 locations × 20 sources break the per-Zap pattern?

Multi-source canonical ingestion is the upstream substrate that feeds the per-location master record (legal name + DBA + PostalAddress + Telephone E.164 + Email + OpeningHoursSpecification + PaymentAccepted + PriceRange + CurrenciesAccepted + Languages + AreaServed + ServiceArea + Department + Brand + Parent Organization + Service children + AggregateRating + Review). A 200-location operator running 20 per-location source systems faces 4,000 source-location pairs that must stay coherent. Per-spreadsheet sources (Google Sheets, Airtable, Notion, Smartsheet, Microsoft Excel Graph, Coda) carry string-typed fields. Per-CRM sources (Salesforce REST + Bulk + Streaming, HubSpot Custom Objects, Pipedrive, Zoho, Microsoft Dynamics 365, SugarCRM) carry typed fields with different vocabulary. Per-listing sources (Yext Knowledge Graph, Google Business Profile, BrightLocal Citation Tracker, Whitespark, Synup, Rallio, SOCi LocalManager, Uberall CoreX, Brandify, Moz Local, ChatMeter) each ship a different attribute schema. Per-POS sources (Square, Toast, Clover, Lightspeed Retail, Shopify POS, NCR Aloha, Revel, Lavu, TouchBistro) and per-loyalty sources (Smile.io, LoyaltyLion, Yotpo Loyalty, Annex Cloud, Talon.One, Punchh, Paytronix) ship their own field shapes. The four-skill bundle on the master-record-canonicalization agent — Pull, Resolve, Gate, Audit — sits above your existing Zapier / Make / Workato / Boomi / MuleSoft / Microsoft Power Automate / n8n / Tray.io / Fivetran / Airbyte / Stitch / Hevo / Meltano integration surface and writes the per-location canonical master record with named regulatory citations preserved in the audit trail.
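To make the field-shape mismatch concrete, the sketch below normalizes a string-typed spreadsheet row and a typed CRM record into the same canonical shape. All column names, CRM field names, and canonical keys are invented for illustration; real mappings differ per tenant and per source.

```python
from typing import Any, Callable, Dict

# Hypothetical source-key -> canonical-field mappings.
SHEET_COLUMNS = {"Legal Name": "legal_name", "Phone": "telephone",
                 "Accepts Cards": "accepts_cards"}
CRM_FIELDS = {"Name": "legal_name", "Phone__c": "telephone",
              "Cards__c": "accepts_cards"}


def _to_bool(value: Any) -> bool:
    """Spreadsheets hand back strings; CRMs may hand back real booleans."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"true", "yes", "y", "1"}


# Per-canonical-field coercion so every source lands on the same type.
COERCE: Dict[str, Callable[[Any], Any]] = {
    "legal_name": lambda v: str(v).strip(),
    "telephone": lambda v: str(v).strip(),
    "accepts_cards": _to_bool,
}


def to_canonical(raw: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Rename source keys to canonical fields and coerce types.

    Unmapped keys and None values are dropped.
    """
    return {
        canon: COERCE[canon](raw[src])
        for src, canon in mapping.items()
        if raw.get(src) is not None
    }
```

With these mappings, a sheet row `{"Legal Name": " Acme LLC ", "Accepts Cards": "Yes"}` and a CRM record `{"Name": "Acme LLC", "Cards__c": True}` both normalize to `{"legal_name": "Acme LLC", "accepts_cards": True}`, which is the point at which field-level comparison becomes possible.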

Why do Zapier + Make + Workato + Boomi + MuleSoft + n8n + Fivetran + Airbyte break at 200-location-20-source-canonical-master-record scale?

Zapier, Make (formerly Integromat), Workato, Boomi, MuleSoft CloudHub, IBM App Connect, Microsoft Power Automate, n8n, and Tray.io ship per-tenant single-trigger single-destination automation primitives: a Zap pulls from one source and writes to one destination. Fivetran, Airbyte, Stitch, Hevo, and Meltano ship per-tenant ELT primitives that land raw data in a warehouse without conflict resolution or per-field canonical mapping. None composes a per-field source-priority rule across overlapping fields (legal name from Yext vs Salesforce vs HubSpot; PostalAddress from Google Business Profile vs the rest; OpeningHoursSpecification from Google Business Profile vs POS; AggregateRating from Google vs platform-of-origin). None enforces per-API rate-limit coordination across 20+ APIs polling the same record. None propagates a DSAR (data subject access request) erasure through every downstream source. The four-skill bundle Pull + Resolve + Gate + Audit sits above the integration surface; it does not replace it. Pull runs per-source schema normalization + per-API polling / webhook / streaming + per-API rate-limit + per-API retry. Resolve runs per-field source-priority + conflict-resolution policy + data-quality validation + completeness scoring. Gate enforces per-API license terms + GDPR Article 28 / 30 + EU-US DPF + SCCs + CCPA / CPRA / 17-state processor + DSAR + SOC 2 + ISO 27001 / 27701. Audit writes an immutable per-record event log with time-travel query.
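Rate-limit coordination can be illustrated with one shared token bucket per API, so that every adapter calling that API draws from the same budget. This is a sketch under assumptions: the class names are hypothetical, the rates are placeholders rather than any vendor's published limits, and the clock is injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimitCoordinator:
    """Holds one bucket per API, shared by every adapter that calls it."""

    def __init__(self, limits: Dict[str, TokenBucket]) -> None:
        self.limits = limits

    def allow(self, api: str) -> bool:
        return self.limits[api].try_acquire()
```

A caller that receives `False` defers the request rather than sending it, which keeps the combined polling load for an API under its limit regardless of how many locations are being refreshed.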

What does Pull do — per-source schema normalization + per-API polling / webhook / streaming + rate-limit coordination?

Pull runs per-source adapters that translate each vendor schema into the canonical master-record schema. Per-spreadsheet adapter (Google Sheets API column-to-field + Airtable table-field-to-field + Notion database-property-to-field + Smartsheet + Microsoft Excel Graph + Coda). Per-CRM adapter (Salesforce REST + Bulk + Streaming + HubSpot Custom Objects + Pipedrive + Zoho + Dynamics 365 + SugarCRM). Per-listing adapter (Yext Knowledge Graph + Google Business Profile + BrightLocal + Whitespark + Synup + Rallio + SOCi + Uberall + Brandify + Moz Local + ChatMeter). Per-POS + per-loyalty + per-review-platform adapter. Per-API polling spec (Google Sheets API 1-hour + Airtable 1-hour + Notion 30-min + BrightLocal 24-hour) and per-API webhook spec (Salesforce Streaming + HubSpot Webhooks + Yext Webhooks + Google Business Profile Pub/Sub + Shopify Webhooks). Per-API conditional GET (If-Modified-Since + If-None-Match against the stored ETag). Per-API rate-limit coordination + per-API pagination + per-API retry (exponential backoff + circuit breaker + dead-letter queue). Per-API authentication token rotation. Per-source confidence tier and explainability trace written into Audit at every pull.
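The conditional GET step can be sketched without tying it to any HTTP client: the validators from the last response are sent back as `If-None-Match` and `If-Modified-Since`, and a 304 means the cached body is still current. In this sketch `send` is a hypothetical callable standing in for the real request, and `CachedPull` is an illustrative name; whether a given vendor API honors these headers has to be checked per API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], Optional[dict]]  # (status, headers, body)


@dataclass
class CachedPull:
    etag: Optional[str] = None
    last_modified: Optional[str] = None
    body: Optional[dict] = None


def conditional_headers(cache: CachedPull) -> Dict[str, str]:
    """Build request validators from the previous response, if any."""
    headers: Dict[str, str] = {}
    if cache.etag:
        headers["If-None-Match"] = cache.etag
    if cache.last_modified:
        headers["If-Modified-Since"] = cache.last_modified
    return headers


def pull_if_changed(cache: CachedPull,
                    send: Callable[[Dict[str, str]], Response]
                    ) -> Tuple[bool, Optional[dict]]:
    """Return (changed, body); a 304 reuses the cached body."""
    status, headers, body = send(conditional_headers(cache))
    if status == 304:
        return False, cache.body
    if status != 200:
        raise RuntimeError(f"unexpected status {status}")
    cache.etag = headers.get("ETag")
    cache.last_modified = headers.get("Last-Modified")
    cache.body = body
    return True, body
```

An unchanged source then costs one small request per poll and produces no downstream Resolve work.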

What does Resolve do — per-field source-priority + conflict-resolution policy + data-quality validation?

Resolve runs three coordinated subsystems. Per-field source-priority spec: legal name (Yext priority over Salesforce over HubSpot); PostalAddress (Google Business Profile priority); Telephone E.164 (Google Business Profile priority); OpeningHoursSpecification (Google Business Profile priority + per-location override); PaymentAccepted (POS priority); AggregateRating (Google priority); Review (platform-of-origin priority). Per-field conflict-resolution policy: most-recent-wins, highest-trust-source-wins, manual-review, vote-majority-wins, and an LLM-augmented tie-breaker (OpenAI + Anthropic under per-vendor zero-retention) that flags the conflict for operator review rather than auto-resolving it where the operator policy requires. Data-quality validation: per-field format (Phone E.164, Address canonical, Hours OpeningHoursSpecification, Email RFC 5322, URL canonical, GeoCoordinates) + per-field required-field check + per-field enumeration check + per-record cross-field validation (Phone E.164 matches Address country, Hours matches Address time zone, AggregateRating consistent with Review count) + per-record completeness score + per-record quality score. Every Resolve decision writes a confidence tier + explainability trace + per-vendor source contribution + consent state into Audit.
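The data-quality checks described above can be sketched as a format check, a required-field check, one cross-field check, and a completeness score. The required-field list, field keys, and the three-entry calling-code table are illustrative assumptions; a real cross-field check would need a full calling-code dataset, and the completeness formula here is just one plausible definition.

```python
import re
from typing import Dict, List, Optional

# E.164: "+" followed by up to 15 digits, first digit non-zero.
E164 = re.compile(r"\+[1-9]\d{1,14}")
REQUIRED = ["legal_name", "postal_address", "telephone", "opening_hours"]
# Tiny illustrative table, not a complete mapping.
CALLING_CODE = {"US": "+1", "GB": "+44", "DE": "+49"}


def validate(record: Dict[str, Optional[str]]) -> List[str]:
    """Return a list of error codes; an empty list means the record passed."""
    errors = [f"missing:{name}" for name in REQUIRED if not record.get(name)]
    phone = record.get("telephone")
    if phone and not E164.fullmatch(phone):
        errors.append("format:telephone")
    prefix = CALLING_CODE.get(record.get("address_country") or "")
    if phone and prefix and not phone.startswith(prefix):
        errors.append("cross_field:telephone_country")
    return errors


def completeness(record: Dict[str, Optional[str]]) -> float:
    """Share of required fields that are populated, from 0.0 to 1.0."""
    return sum(1 for name in REQUIRED if record.get(name)) / len(REQUIRED)
```

A record with a well-formed UK number but a US address country passes the format check and fails the cross-field check, which is the kind of inconsistency a per-field validator alone would miss.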

What does Gate do — per-API license terms + GDPR Article 28 / 30 + EU-US DPF + SCCs + CCPA / CPRA / 17-state processor + DSAR + SOC 2 + ISO 27001 / 27701 + EU AI Act Article 10 + NIST AI RMF?

Gate evaluates five operationally distinctive anchors before any canonical master-record write commits.

Anchor 1 (the most operationally distinctive): per-API license terms across the 20+ source-system surface (Google Business Profile API Terms of Service; Yext API Terms; Salesforce Master Subscription Agreement; HubSpot Customer Terms; Microsoft Graph License; Notion API Terms; Airtable API Terms; per-CRM ToS; per-listing platform ToS) plus per-source Data Processing Addendum (DPA); per-API rate-limit honoring + per-API pagination contract; per-API authentication-token rotation per vendor contract; hiQ Labs v LinkedIn 9th Cir 2022 + Van Buren v United States 2021 (CFAA “exceeds authorized access” scope) + Meta v Bright Data ND Cal 2024 where third-party citation-tracker ingestion (Whitespark + SOCi LocalManager + BrightLocal) crosses scraping doctrine.

Anchor 2 (data-protection + cross-border transfer): GDPR Article 5 principles + Article 6 lawful basis + Article 28 data processor + Article 30 records of processing + Article 32 security + Article 35 DPIA + Article 44-49 international transfers; EU-US Data Privacy Framework (July 2023 adequacy decision); Standard Contractual Clauses; UK International Data Transfer Agreement (IDTA); Swiss-US Data Privacy Framework; CCPA service-provider definition + CPRA contractor definition; 17-state comprehensive privacy processor distinctions (Virginia VCDPA + Colorado CPA + Connecticut CTDPA + Utah UCPA + Texas TDPSA + Florida FDBR + Oregon OCPA + Montana CDPA + Iowa ICDPA + Indiana INCDPA + Tennessee TIPA + Delaware DPDPA + New Hampshire NHPA + New Jersey NJDPA + Maryland MODPA + Minnesota MCDPA + Rhode Island RIDPPA); Washington My Health My Data Act 2024; Texas SCOPE Act 2024; LGPD + DPDP + PIPEDA + CASL.

Anchor 3 (DSAR fulfillment chain): CCPA right to know + CPRA right to correct + GDPR Article 15 right of access + Article 16 right to rectification + Article 17 right to erasure + Article 20 right to data portability + 17-state DSAR variations. An erasure request must propagate through every downstream source; the versioned-history sibling skill provides the substrate.

Anchor 4 (security + control framework): FTC Section 5 substantiation when ingested records drive marketing claims (Pfizer 1972 reasonable-basis); SOC 2 Type II Common Criteria CC2 communication + CC3 risk assessment + CC6 logical access + CC7 system operations + CC8 change management; ISO 27001; ISO 27701 privacy information management; NIST SP 800-53 + NIST Cybersecurity Framework 2.0; per-source incident-response runbook.

Anchor 5: EU AI Act Article 10 data and data governance when ingested data feeds AI training (data-quality + relevance + representativeness + completeness) + Article 50 transparency for AI-generated content when LLM tie-breaker used in Resolve; NIST AI Risk Management Framework Govern + Map + Measure + Manage; ISO 42001 AI Management System; per-vendor LLM zero-retention verified per call.

Policy-as-code expression via OPA Rego + AWS Cedar + Casbin + Cerbos + Oso + Styra DAS + Permit.io.
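The erasure-propagation requirement in Anchor 3 can be sketched as a fan-out that treats the request as complete only when every downstream source has acknowledged. The `propagate_erasure` helper and the shape of the eraser callables are hypothetical; real connectors would call each vendor's deletion endpoint and record the acknowledgment in Audit.

```python
from typing import Callable, Dict, List


def propagate_erasure(subject_id: str,
                      erasers: Dict[str, Callable[[str], bool]]
                      ) -> Dict[str, List[str]]:
    """Call every downstream eraser and report which sources are still pending.

    Each eraser returns True once the source confirms the subject is erased.
    """
    erased: List[str] = []
    pending: List[str] = []
    for source, erase in erasers.items():
        try:
            ok = erase(subject_id)
        except Exception:
            ok = False  # treat errors as not-yet-erased so the source is retried
        (erased if ok else pending).append(source)
    return {"erased": erased, "pending": pending}
```

A non-empty `pending` list is the signal to retry or escalate; one failing source does not stop the fan-out to the others.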

What does Audit do — immutable per-record event log + time-travel query + DSAR export + end-to-end replay?

Audit writes a per-record immutable event log: per-event canonical record state + per-event timestamp + per-event source + per-event actor + per-event change diff. Per-record snapshot spec supports time-travel query (as-of-timestamp + as-of-version). Per-record rollback spec supports per-source rollback when downstream agents flag a regression. Per-record regulatory-defense export ships canonical evidence packages for FTC audit + state-AG audit + GDPR Article 15 Right of Access + CCPA Right to Know + 17-state DSAR + EU AI Act Article 12 record-keeping (when ingested data trains AI systems) + SOC 2 evidence + ISO 27001 / 27701 evidence. Storage on AWS S3 Object Lock + Azure Blob immutable + Google Cloud Storage Bucket Lock + Wasabi compliance WORM. Retention stacks (longest applicable wins): 7-year FTC substantiation + 7-year IRS tax + GDPR Article 30 records of processing (typically 6 years) + CCPA 12-month look-back + per-state DSAR fulfillment retention + EU AI Act Article 12 + SOC 2 CC7 / CC8 evidence retention. End-to-end replay rewinds Pull + Resolve + Gate + change-event emission with confidence tier and explainability at every stage. Change-event emission to the downstream event-bus surface (Kafka + AWS Kinesis + Google Pub/Sub + Azure Event Hubs + RabbitMQ + Redpanda + Confluent) writes per-create + per-update + per-delete + per-merge + per-split events classified by per-NAP-change + per-hours-change + per-phone-change + per-name-rebrand + per-URL-change + per-service-area-change + per-image-change, with sibling-handoff pointers into the schema-audit-remediation agent (continuous schema audit + 17-schema-class JSON-LD generation + per-vertical schema validation), the customer-graph agent (cross-touchpoint identity resolution + deterministic + probabilistic hybrid identity resolution), and the rollup-reporting agent (monthly executive summary drafting + cohort-framed reports + per-franchisee accountability views).
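Change-event classification can be illustrated by diffing the record before and after a write and mapping each changed field to a change class. The field keys and the field-to-class mapping below are assumptions made for the sketch (for instance, which fields count toward a NAP change is a policy decision), and merge and split events are left out.

```python
from typing import Dict, Optional

# Illustrative mapping from canonical field to change class.
CHANGE_CLASS = {
    "legal_name": "name-rebrand",
    "postal_address": "nap",
    "telephone": "phone",
    "opening_hours": "hours",
    "url": "url",
    "service_area": "service-area",
    "image": "image",
}


def classify_changes(before: Optional[Dict[str, str]],
                     after: Optional[Dict[str, str]]) -> dict:
    """Return one event with an operation, changed fields, and change classes."""
    if before is None and after is None:
        raise ValueError("nothing to compare")
    if before is None:
        op = "create"
    elif after is None:
        op = "delete"
    else:
        op = "update"
    old, new = before or {}, after or {}
    changed = [f for f in sorted(set(old) | set(new)) if old.get(f) != new.get(f)]
    classes = sorted({CHANGE_CLASS.get(f, "other") for f in changed})
    return {"op": op, "fields": changed, "classes": classes}
```

The resulting dictionary is the payload a producer would publish to the event bus; downstream agents can then subscribe by class instead of re-diffing records themselves.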

Engage Completions on the master-record-canonicalization bundle

The Pull + Resolve + Gate + Audit four-skill bundle ships as the orchestration layer above your existing integration + ELT + event-bus surface. Per-API license terms + GDPR Article 28 / 30 + EU-US DPF + SCCs + CCPA / CPRA / 17-state processor + DSAR fulfillment chain + SOC 2 + ISO 27001 / 27701 + EU AI Act Article 10 anchors are preserved in every per-record audit record. Tier 1 AI Readiness Assessment scopes the bundle in two to three weeks; Tier 3 Fractional CMO with AI Swarm operates the bundle end-to-end.