Keep-customer swarm · Subscription Management Agent · Cancellation-reason-cluster-analytics skill · Build pillar · Published June 3, 2026

How to build LLM cancellation-reason clustering across 50 to 500 subscription locations

This guide explains how to architect the cancellation-reason-cluster-analytics skill on the subscription-management agent, end to end, at multi-location subscription scale. For each portfolio the architecture covers ten stages: multi-source text ingestion, LLM embedding clustering, cluster naming, other-bucket elimination, per-location cluster prevalence, rolling cluster drift, cluster cause classification, cluster-to-save-flow routing, cluster-to-product-team feedback routing, and a portfolio-level audit trail.

What you will build

  • Multi-source text ingestion for each portfolio: multi-select or radio reasons on the cancel flow, free text, NPS-style ratings, post-cancel email surveys, voice-call transcripts (recorded via Twilio Voice or RingCentral; transcribed with Whisper, Deepgram, or AssemblyAI), and chat transcripts (Intercom, Zendesk Chat, Drift, Crisp, Tidio), followed by text normalization (Unicode NFC, stop-word removal, spelling correction, language detection).
  • An LLM embedding and clustering substrate: embeddings from OpenAI text-embedding-3-small or text-embedding-3-large, Anthropic Claude embeddings, Cohere embed-v3, Voyage AI voyage-3, or Google Vertex AI textembedding-gecko; cosine, Euclidean, or Manhattan distance; HDBSCAN, DBSCAN, K-means, LDA, agglomerative hierarchical, or Gaussian mixture model (GMM) clustering; UMAP dimensionality reduction; and silhouette-score selection of the cluster count.
  • Cluster naming via LLM cluster summarization (GPT-4, Claude), cluster keyword extraction (TF-IDF, KeyBERT), human-in-the-loop cluster review, and cluster name revision.
  • Other-bucket elimination: LLM text classification of legacy "Other" text under a confidence-threshold spec (at 90% confidence or higher, assign to an existing cluster; from 60% up to 90%, flag for human review; below 60%, create a new cluster), with a post-clustering target of under 5% of responses left in the other bucket.
  • Per-location cluster prevalence: cancellation count and percentage per location per cluster, comparison against the portfolio baseline, percentile ranking, rolling 30-day, 90-day, and 365-day windows, a Mann-Kendall trend test, and detection of cluster emergence and disappearance.
  • Cluster cause classification, with confidence scoring, across ten causes: pricing objection, product quality, CS experience, shipping and delivery, life event, competitive switch, temporary pause intent, account-management friction, content quality, and onboarding failure.
  • Cluster-to-save-flow routing: pricing routes to the discount save flow, shipping to the shipping-credit save flow, life event to the pause save flow, and content quality to the content-recommendation save flow. Clusters also route to product-team feedback.
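
The confidence-threshold spec in the other-bucket bullet above can be sketched as a small triage function. This is a minimal illustration, not the skill's actual implementation: the `Classification` record, the `triage` and `meets_other_bucket_target` names, and the assumption that the classifier reports confidence as a score between 0.0 and 1.0 are all hypothetical. Only the 90%, 60%, and 5% figures come from the guide.

```python
from dataclasses import dataclass

ASSIGN_THRESHOLD = 0.90     # from the guide: assign at 90% confidence or higher
REVIEW_THRESHOLD = 0.60     # from the guide: flag for human review from 60%
OTHER_BUCKET_TARGET = 0.05  # from the guide: under 5% left in the other bucket


@dataclass(frozen=True)
class Classification:
    """Hypothetical classifier output for one piece of other-bucket text."""
    text: str
    best_cluster: str
    confidence: float  # assumed to be a score in [0.0, 1.0]


def triage(result: Classification) -> str:
    """Map a classifier confidence to one of the three actions in the spec."""
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if result.confidence >= ASSIGN_THRESHOLD:
        return "assign"        # attach to the existing best-matching cluster
    if result.confidence >= REVIEW_THRESHOLD:
        return "human_review"  # queue for a reviewer to confirm or reassign
    return "new_cluster"       # treat as a candidate for a new cluster


def meets_other_bucket_target(unassigned: int, total: int) -> bool:
    """Check the post-clustering target: strictly under 5% still unassigned."""
    if total <= 0:
        raise ValueError("total must be positive")
    return unassigned / total < OTHER_BUCKET_TARGET
```

Treating the boundaries as inclusive (exactly 0.90 assigns, exactly 0.60 goes to review) is one reading of the spec; a team could equally choose exclusive boundaries.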

Why a single-account SurveyMonkey exit survey breaks at multi-location subscription scale

A SurveyMonkey exit survey is built around one primitive: a presented reason list, per survey, per account. Typeform, Hotjar, Qualtrics, Medallia, Forsta, Alchemer, Wootric, Delighted, Trustpilot, and AskNicely each ship their own native exit-survey primitive on the same single-account model.

For one location running one subscription, a presented reason list per survey per account is enough. At 200 locations running 200 subscriptions, the operator needs more: cancellation-reason text collected from multiple sources at every location, LLM embedding clustering in place of a presented reason list, other-bucket elimination, per-location cluster prevalence measured against the portfolio baseline, rolling cluster drift, cluster cause classification, and cluster-to-save-flow routing.

Without that layer, the operator is left with exit surveys fragmented across vendors, analysis limited to the presented reason list, an "Other" bucket that stays opaque even when it holds 60% of responses, and no visibility into per-location cluster prevalence, cluster drift, cluster causes, or cluster-to-save-flow routing.

The operator-side architecture that sits above the vendor exit-survey primitive consists of multi-source text ingestion, LLM embedding clustering, cluster naming, other-bucket elimination, per-location cluster prevalence, rolling cluster drift, cluster cause classification, cluster-to-save-flow routing, cluster-to-product-team feedback routing, and a portfolio-level audit trail.

What is in market today

Exit-survey vendors

SurveyMonkey, Typeform, Hotjar, Qualtrics, Medallia, Forsta (formerly FocusVision), Alchemer (formerly SurveyGizmo), Wootric, Delighted, Trustpilot, AskNicely, GetFeedback, Userpilot, Pendo, Sprig. Their primitive is a presented reason list per survey per account. LLM embedding clustering, other-bucket elimination, and per-location cluster prevalence are not part of it.

LLM embedding vendors

OpenAI text-embedding-3-small and text-embedding-3-large, Anthropic Claude embeddings, Cohere embed-v3, Voyage AI voyage-3, Google Vertex AI textembedding-gecko, Hugging Face Inference, Jina AI, Mistral embeddings. Their primitive is an embedding call per API key. Portfolio-level multi-vendor embedding, cluster naming, and other-bucket elimination are not part of it.

Subscription-management software

Chargebee, Stripe Billing, Recurly, Zuora, Maxio (formerly Chargify + SaaSOptics), ChartMogul, Baremetrics, ProfitWell (Paddle), Recharge (DTC subscriptions), Bold Subscriptions, Loop Subscriptions, Skio, OrderGroove, Smartrr. Their primitive is a cancellation event per subscription per account. Per-location cluster prevalence, a portfolio baseline, and cause classification are not part of it.

Clustering platforms

scikit-learn (Python), HDBSCAN (Python), DBSCAN (scikit-learn), Weaviate (vector DB clustering), Pinecone (vector DB), Qdrant (vector DB), Chroma (vector DB), Milvus (vector DB), LightGBM, XGBoost, RapidMiner, KNIME. Their primitive is a library or tool operated by an individual developer. Portfolio-level cluster naming, human-in-the-loop review, cluster drift, and cluster cause classification are not part of it.

How the architecture is built

  1. Multi-source text ingestion substrate. For each portfolio, ingest multi-select or radio reasons and free text from the cancel flow, NPS-style ratings, post-cancel email surveys, voice-call transcripts (Twilio, RingCentral, Whisper, Deepgram, AssemblyAI), and chat transcripts (Intercom, Zendesk, Drift, Crisp, Tidio).
  2. Text normalization. Apply Unicode NFC, lowercasing, stop-word removal, spelling correction, and language detection.
  3. LLM embedding vendor. Choose among OpenAI text-embedding-3-small, OpenAI text-embedding-3-large, Anthropic Claude embeddings, Cohere embed-v3, Voyage AI voyage-3, and Google Vertex AI.
  4. Similarity metric. Choose cosine, Euclidean, or Manhattan distance.
  5. UMAP dimensionality reduction. Run UMAP before clustering.
  6. Clustering algorithm. Choose among HDBSCAN, DBSCAN, K-means, LDA, agglomerative hierarchical clustering, and Gaussian mixture models.
  7. Optimal cluster count. Select the cluster count using the silhouette score, supported by the Davies-Bouldin index and the Calinski-Harabasz index.
  8. Cluster naming. Combine LLM cluster summarization (GPT-4, Claude), keyword extraction (TF-IDF, KeyBERT), and human-in-the-loop cluster review.
  9. Other-bucket elimination. Take the pre-clustering other-bucket text and run LLM text classification: at 90% confidence or higher, assign to an existing cluster; from 60% up to 90%, flag for human review; below 60%, create a new cluster. The target is under 5% of responses left in the other bucket.
  10. Per-location cluster prevalence. Compute cancellation count and percentage per location per cluster, the difference from the portfolio baseline, and a percentile ranking.
  11. Rolling cluster drift. Track cluster prevalence over rolling 30-day, 90-day, and 365-day windows, apply the Mann-Kendall trend test, and detect cluster emergence and disappearance.
  12. Cluster cause classification. Classify each cluster as pricing objection, product quality, CS experience, shipping and delivery, life event, competitive switch, temporary pause, account management, content quality, or onboarding failure.
  13. Cluster-to-save-flow routing, cluster-to-product-team feedback routing, and audit trail. Pricing routes to the discount save flow, shipping to the shipping credit, life event to the pause, and content quality to the content recommendation. Product-team feedback goes to Slack, Jira, or Linear, and results surface on a CMO dashboard.
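
As an illustration of the Mann-Kendall trend test named in step 11, the sketch below computes the test on a series of prevalence values for one cluster (for example, one value per week). It is a standard-library sketch under stated assumptions, not the skill's implementation: the function names and the 0.05 significance level are hypothetical choices, and the p-value relies on the usual normal approximation, which is less reliable for very short series.

```python
import math
from collections import Counter


def mann_kendall(series):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # S counts later-minus-earlier pairs: +1 for a rise, -1 for a fall.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S, with the standard correction for tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    # Continuity-corrected Z statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value under the normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p_value


def trend_label(series, alpha=0.05):
    """Label a series as increasing, decreasing, or no trend (alpha is assumed)."""
    s, _, p_value = mann_kendall(series)
    if p_value >= alpha:
        return "no trend"
    return "increasing" if s > 0 else "decreasing"
```

A cluster whose prevalence is labeled "increasing" over the 90-day window, for instance, would be a natural candidate for the emergence detection described in the same step.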

Frequently asked questions

What is LLM cancellation-reason clustering across 50 to 500 subscription locations?

LLM cancellation-reason clustering collects cancellation-reason text for every subscription at every location in a portfolio, embeds and clusters that text, names the clusters, eliminates the "Other" bucket, measures each location's cluster prevalence against the portfolio baseline, tracks cluster drift over rolling 30-day, 90-day, and 365-day windows, classifies each cluster's cause, routes clusters to save flows, and keeps a portfolio-level audit trail.

Text collection draws on the multi-select or radio question in the cancel flow (the presented reason list plus an "Other" text input), free text in the cancel flow, NPS-style ratings, post-cancel email survey text, voice-call transcripts at cancellation (transcribed with Whisper or Deepgram), and chat transcripts at cancellation (Intercom, Zendesk Chat, Drift).

Embedding and clustering use OpenAI text-embedding-3-small or text-embedding-3-large, Anthropic Claude embeddings, Cohere embed-v3, or Voyage AI voyage-3, with cosine distance, HDBSCAN, DBSCAN, K-means, or Latent Dirichlet Allocation, and UMAP dimensionality reduction.

The exit-survey vendor category includes SurveyMonkey, Typeform, Hotjar, Qualtrics, Medallia, Forsta (formerly FocusVision), Alchemer (formerly SurveyGizmo), Wootric, Delighted, Trustpilot, AskNicely, GetFeedback, Userpilot, Pendo, and Sprig.

Why does a single-account SurveyMonkey exit survey break down at multi-location subscription scale?

A SurveyMonkey exit survey ships one primitive: a presented reason list, per survey, per account. Typeform, Hotjar, Qualtrics, Medallia, Forsta, Alchemer, Wootric, Delighted, Trustpilot, and AskNicely ship their own native exit-survey primitives on the same single-account model. For one location running one subscription, that primitive is enough. At 200 locations running 200 subscriptions, the operator also needs multi-source cancellation-reason text collection at every location, LLM embedding clustering in place of the presented reason list, other-bucket elimination, per-location cluster prevalence against the portfolio baseline, rolling cluster drift, cluster cause classification, and cluster-to-save-flow routing.

How does multi-source text ingestion work?

For each portfolio, multi-source text ingestion pulls from six sources and then normalizes the text:

  • Multi-select or radio question in the cancel flow: the presented reason list, an "Other" text input, and checkbox multi-select.
  • Free text in the cancel flow: a textarea with a 1,000-character limit.
  • NPS-style rating at cancellation: a 0-to-10 scale or a 5-star rating.
  • Post-cancel email survey: sent on day 3, day 7, and day 30.
  • Voice-call transcript at cancellation: Twilio Voice or RingCentral recordings, transcribed with Whisper, Deepgram, or AssemblyAI.
  • Chat transcript at cancellation: Intercom, Zendesk Chat, Drift, Crisp, or Tidio.
  • Text normalization: Unicode NFC, lowercasing, stop-word removal, spelling correction, and language detection.
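
The normalization step can be illustrated with the Python standard library. The sketch below covers Unicode NFC, lowercasing, and stop-word removal only; spelling correction and language detection need external tools and are deliberately left out. The `ReasonText` record, the `ingest` and `normalize` names, the source labels, and the tiny stop-word list are all hypothetical placeholders, not part of the skill.

```python
import re
import unicodedata
from dataclasses import dataclass

# Illustrative stop-word list only; a real deployment would use a fuller,
# language-specific list.
STOP_WORDS = frozenset(
    {"a", "an", "and", "the", "to", "of", "i", "it", "is", "was", "my", "for"}
)


@dataclass(frozen=True)
class ReasonText:
    """One cancellation-reason text, tagged with where it came from."""
    location_id: str
    source: str      # hypothetical labels, e.g. "cancel_flow_free_text"
    raw: str
    normalized: str


def normalize(text: str) -> str:
    """Apply Unicode NFC, lowercase, tokenize, and drop stop words."""
    text = unicodedata.normalize("NFC", text).lower()
    tokens = re.findall(r"[\w']+", text)  # words, keeping apostrophes
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def ingest(location_id: str, source: str, raw: str) -> ReasonText:
    """Wrap raw text from any source in a common record with normalized text."""
    return ReasonText(location_id, source, raw, normalize(raw))
```

Keeping the raw text alongside the normalized form means the original wording remains available for the audit trail and for human reviewers.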

What do LLM embedding clustering and cluster naming do?

For each portfolio, LLM embedding clustering combines the following choices:

  • Embedding vendor: OpenAI text-embedding-3-small, OpenAI text-embedding-3-large, Anthropic Claude embeddings, Cohere embed-v3, Voyage AI voyage-3, or Google Vertex AI textembedding-gecko.
  • Embedding dimensionality: 512, 1536, or 3072 dimensions.
  • Similarity metric: cosine, Euclidean, or Manhattan distance.
  • Clustering algorithm: HDBSCAN (density-based), DBSCAN (density-based), K-means, Latent Dirichlet Allocation (topic model), agglomerative hierarchical clustering, or a Gaussian mixture model.
  • UMAP dimensionality reduction before clustering.
  • Optimal cluster count chosen by silhouette score.

Cluster naming then combines LLM cluster summarization (GPT-4, Claude), keyword extraction (TF-IDF, KeyBERT), human-in-the-loop cluster review, and cluster name revision.
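
To make the similarity metric and the silhouette score concrete, here is a dependency-free sketch of both. It assumes embeddings have already been obtained as plain lists of floats and that some clustering algorithm has produced one label per vector; the function names are illustrative. In practice a library such as scikit-learn would normally be used instead, and this version is quadratic in the number of points.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def silhouette_score(vectors, labels, distance=cosine_distance):
    """Mean silhouette over all points; higher means tighter, better-separated clusters."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_distance(i, members):
        return sum(distance(vectors[i], vectors[j]) for j in members) / len(members)

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # a singleton cluster contributes a silhouette of 0
        a = mean_distance(i, own)  # cohesion: mean distance within own cluster
        b = min(                   # separation: nearest other cluster
            mean_distance(i, members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(labels)
```

To choose a cluster count, one would run the clustering algorithm for several candidate counts and keep the labeling with the highest score.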

What do other-bucket elimination, per-location cluster prevalence, and cluster drift do?

Other-bucket elimination takes the pre-clustering other-bucket text for each portfolio and runs LLM classification on it under a confidence-threshold spec: at 90% confidence or higher, assign the text to an existing cluster; from 60% up to 90%, flag it for human review; below 60%, create a new cluster. The post-clustering target is under 5% of responses left in the other bucket.

Per-location cluster prevalence computes, for each location and cluster, the cancellation count, the cancellation percentage, the difference from the portfolio baseline, and a percentile ranking.

Rolling cluster drift tracks cluster prevalence over rolling 30-day, 90-day, and 365-day windows, applies the Mann-Kendall trend test, and detects cluster emergence and disappearance.
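
The prevalence calculation can be sketched as follows, assuming each cancellation has already been assigned to a cluster and is represented as a `(location_id, cluster)` pair. The function names and the percentile convention (the share of locations at or below this location's value) are assumptions made for illustration; other percentile definitions are equally valid.

```python
from collections import Counter, defaultdict


def prevalence_by_location(events):
    """Return {location: {cluster: share of that location's cancellations}}."""
    counts = defaultdict(Counter)
    for location_id, cluster in events:
        counts[location_id][cluster] += 1
    return {
        loc: {c: n / sum(ctr.values()) for c, n in ctr.items()}
        for loc, ctr in counts.items()
    }


def portfolio_baseline(events):
    """Return {cluster: share of all cancellations across the portfolio}."""
    totals = Counter(cluster for _, cluster in events)
    grand_total = sum(totals.values())
    return {c: n / grand_total for c, n in totals.items()}


def compare_to_baseline(events, cluster):
    """Per location: share, difference from baseline, and percentile rank."""
    events = list(events)
    shares = {
        loc: by_cluster.get(cluster, 0.0)
        for loc, by_cluster in prevalence_by_location(events).items()
    }
    baseline = portfolio_baseline(events).get(cluster, 0.0)
    report = {}
    for loc, share in shares.items():
        at_or_below = sum(1 for other in shares.values() if other <= share)
        report[loc] = {
            "share": share,
            "delta_vs_baseline": share - baseline,
            "percentile": 100.0 * at_or_below / len(shares),
        }
    return report
```

Running the same comparison over events filtered to the last 30, 90, or 365 days gives the rolling windows described above.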

What do cluster cause classification, cluster-to-save-flow routing, and the subscription-management agent bundle do?

Cluster cause classification assigns each cluster one of ten causes, with a confidence score: pricing objection, product quality issue, CS experience issue, shipping or delivery issue, life-event cancellation, competitive switch, temporary pause intent, account-management friction, content quality issue, or onboarding failure.

Cause-to-save-flow routing then maps causes to save flows: pricing objection routes to the discount save flow, shipping issue to the shipping-credit save flow, life event to the pause save flow, and content quality to the content-recommendation save flow. Causes also route to product-team feedback.

The subscription-management agent bundle integrates the cancellation-reason-cluster-analytics skill with four sibling skills on the same agent:

  • Churn prediction: provides the upstream churn-prediction substrate for proactive intervention.
  • Save-flow propensity scoring: a downstream consumer of cluster cause for save-flow routing.
  • Subscriber lifecycle cadence: a consumer of cluster cause for content-cadence adjustment.
  • Subscription analytics: a consumer of cluster prevalence for portfolio-level reporting.
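
The cause-to-save-flow mapping described here amounts to a lookup table. The sketch below encodes the ten causes and the four routes the guide names; the identifier strings and the choice to return `None` for causes without a named save flow are assumptions, since the guide does not say how unrouted causes are handled.

```python
from enum import Enum
from typing import Optional


class Cause(str, Enum):
    """The ten cluster causes listed in the guide (identifiers are illustrative)."""
    PRICING_OBJECTION = "pricing_objection"
    PRODUCT_QUALITY = "product_quality"
    CS_EXPERIENCE = "cs_experience"
    SHIPPING_DELIVERY = "shipping_delivery"
    LIFE_EVENT = "life_event"
    COMPETITIVE_SWITCH = "competitive_switch"
    TEMPORARY_PAUSE_INTENT = "temporary_pause_intent"
    ACCOUNT_MANAGEMENT_FRICTION = "account_management_friction"
    CONTENT_QUALITY = "content_quality"
    ONBOARDING_FAILURE = "onboarding_failure"


# Only the four routes named in the guide; save-flow names are placeholders.
SAVE_FLOW_BY_CAUSE = {
    Cause.PRICING_OBJECTION: "discount_save_flow",
    Cause.SHIPPING_DELIVERY: "shipping_credit_save_flow",
    Cause.LIFE_EVENT: "pause_save_flow",
    Cause.CONTENT_QUALITY: "content_recommendation_save_flow",
}


def route_to_save_flow(cause) -> Optional[str]:
    """Return the save flow for a cause, or None when the guide names none."""
    return SAVE_FLOW_BY_CAUSE.get(Cause(cause))
```

For example, `route_to_save_flow("life_event")` returns `"pause_save_flow"`, while a cause such as onboarding failure returns `None` and could be handled by product-team feedback routing alone.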

Engage the subscription-management agent

The subscription-management agent ships multi-source text ingestion, LLM embedding clustering, cluster naming, other-bucket elimination, per-location cluster prevalence, rolling cluster drift, cluster cause classification, cluster-to-save-flow routing, cluster-to-product-team feedback routing, and a portfolio-level audit trail as the orchestration layer above your existing exit-survey vendor, LLM embedding provider, subscription-management software, and clustering platform.