Abstract
Sentiment Balance Score (SBS) is a topic-level, polarity-balanced metric derived from open-ended feedback. SBS converts unstructured comments into standardized aspect/topic mentions, assigns aspect-level polarity, applies confidence and intensity weighting, stabilizes estimates for low-volume topics using Bayesian smoothing, and enables peer-cohort benchmarking and trend/volatility analysis. This document defines the end-to-end pipeline and provides implementable formulas and data structures.
1. Data Model and Notation
Let each written comment be a document with metadata:
- builder (or entity)
- timestamp
The pipeline transforms each comment into a set of mentions (aka aspect annotations). A mention $i$ is a record:

$$m_i = (b_i,\ t_i,\ k_i,\ p_i,\ s_i,\ c_i)$$

Where:
- $b_i$: builder/entity id
- $t_i$: time bucket (e.g., day/week/month)
- $k_i$: canonical topic label (e.g., paint_quality)
- $p_i \in \{-1, 0, +1\}$: polarity (negative, neutral, positive)
- $s_i \in [0, 1]$: intensity/strength (how strong the sentiment is)
- $c_i \in [0, 1]$: model confidence (probability-like; calibration discussed later)

A mention has a derived weight:

$$w_i = c_i \cdot s_i$$
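As a minimal Python sketch (field names are illustrative, and the product form $w_i = c_i \cdot s_i$ is one reasonable choice of confidence-and-intensity weighting):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    builder_id: str    # b_i
    time_bucket: str   # t_i, e.g. "2024-06"
    topic: str         # k_i, canonical label such as "paint_quality"
    polarity: int      # p_i in {-1, 0, +1}
    intensity: float   # s_i in [0, 1]
    confidence: float  # c_i in [0, 1]

    @property
    def weight(self) -> float:
        # Assumed weighting: simple product w_i = c_i * s_i.
        return self.confidence * self.intensity
```

The record is frozen so mentions stay immutable once persisted.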
2. End-to-End Pipeline (Per Comment)
Each comment goes through deterministic stages. You can implement this as an event-driven pipeline.
Step 0 – Ingest
Store raw comment text and metadata:
comment_id, builder_id, created_at, text, source, etc.
Step 1 – Preprocessing
Recommended (not strictly required):
- language detection
- sentence segmentation
- PII masking (optional, governance-driven)
Step 2 – Aspect Extraction (Topic Discovery)
Goal: identify spans that correspond to “what the person is talking about.”
Output: a set of extracted aspect spans:

$$\{a_1, a_2, \dots, a_m\}$$
Each aspect span should include:
- raw span text
- candidate topic label(s)
- evidence / anchor phrase boundaries (if your model provides them)
Implementation notes
- You can do this with LLM structured extraction, ABSA models, or hybrid (LLM + taxonomy mapping).
- A taxonomy constraint improves standardization (critical for benchmarking).
Step 3 – Aspect-Level Sentiment (No Inheritance)
For each extracted aspect span, predict:
- polarity $p_i \in \{-1, 0, +1\}$
- intensity $s_i \in [0, 1]$
- confidence $c_i \in [0, 1]$
This avoids the inheritance error where a comment-level label is applied to all topics.
Mixed sentiment
If a span expresses mixed sentiment (rare but real), options:
- split into two mentions (preferred)
- or assign $p_i = 0$ and track a mixed flag
Step 4 – Canonical Topic Mapping
Map each aspect’s free-text label to a canonical topic $k_i$.
Two-stage mapping recommended:
- alias dictionary / rules (deterministic)
- embedding similarity to canonical topic embeddings with thresholding
Unmatched topics go to OTHER for later curation.
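A sketch of the two-stage mapping. The alias entries and 2-D topic vectors below are toy stand-ins; in practice the vectors come from your embedding model and are re-indexed periodically (see Topic Governance):

```python
import math

# Stage 1: deterministic alias dictionary (entries are illustrative).
ALIASES = {"paintwork": "paint_quality", "paint job": "paint_quality"}

# Stage 2: canonical-topic embeddings (toy 2-D stand-ins).
TOPIC_VECS = {"paint_quality": [0.9, 0.1], "schedule_delays": [0.1, 0.9]}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def map_topic(label: str, label_vec, threshold: float = 0.75) -> str:
    """Alias rules first (deterministic); embedding similarity with
    a similarity floor second; everything else falls to OTHER."""
    key = label.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    best = max(TOPIC_VECS, key=lambda t: cosine(label_vec, TOPIC_VECS[t]))
    if cosine(label_vec, TOPIC_VECS[best]) >= threshold:
        return best
    return "OTHER"  # routed to the curation queue
```

Running rules before embeddings keeps high-traffic aliases deterministic and cheap; the threshold is what pushes genuinely novel topics into the OTHER queue rather than forcing a bad match.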
Step 5 – Persist Mentions
Insert mention rows:
- polarity
- intensity
- confidence
- weight
- canonical topic
- builder/time metadata
3. Topic-Level Aggregation
For a given builder $b$, topic $k$, and time window $T$, let $M$ be the set of mentions with $b_i = b$, $k_i = k$, $t_i \in T$, and aggregate weighted “votes”:

$$W^{+} = \sum_{i \in M:\ p_i = +1} w_i, \qquad W^{-} = \sum_{i \in M:\ p_i = -1} w_i, \qquad W^{0} = \sum_{i \in M:\ p_i = 0} w_i$$

Define weighted voting mass:

$$V = W^{+} + W^{-}$$
Neutral is tracked but non-voting by default (analogous to NPS passives). You can optionally include it for other diagnostics (e.g., “clarity/decisiveness”).
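The aggregation step can be sketched as follows, with mentions reduced to (polarity, weight) pairs for one (builder, topic, window) slice:

```python
def aggregate(mentions):
    """Weighted vote sums for one (builder, topic, window) slice.
    mentions: iterable of (polarity, weight), polarity in {-1, 0, +1}.
    Returns (W+, W-, W0, V); neutral is tracked but excluded from V."""
    rows = list(mentions)  # materialize so we can scan three times
    w_pos = sum(w for p, w in rows if p == +1)
    w_neg = sum(w for p, w in rows if p == -1)
    w_neu = sum(w for p, w in rows if p == 0)
    return w_pos, w_neg, w_neu, w_pos + w_neg
```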
4. Core SBS Metric
4.1 Raw (unsmoothed) SBS
If $V > 0$:

$$\mathrm{SBS}_{\mathrm{raw}} = 100 \cdot \frac{W^{+} - W^{-}}{W^{+} + W^{-}}$$

Range: $[-100, +100]$.
If $V = 0$, SBS is undefined; return null and show “insufficient voting signal.”
This is the closest analogue to “%positive − %negative” in the two-class voting universe.
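A direct implementation, returning None for the undefined zero-mass case:

```python
def sbs_raw(w_pos: float, w_neg: float):
    """Raw SBS = 100 * (W+ - W-) / (W+ + W-).
    Returns None when V = 0 (report "insufficient voting signal")."""
    v = w_pos + w_neg
    if v == 0:
        return None
    return 100.0 * (w_pos - w_neg) / v
```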
4.2 Bayesian-Smoothed SBS (recommended)
Low-volume topics produce unstable rates. Stabilize using Beta priors.
Interpret $W^{+}$ and $W^{-}$ as weighted pseudo-counts under a $\mathrm{Beta}(\alpha, \beta)$ prior. Use a symmetric prior by default: $\alpha = \beta = \alpha_0$.

Smoothed positive and negative rates:

$$\hat{r}^{+} = \frac{W^{+} + \alpha}{V + \alpha + \beta}, \qquad \hat{r}^{-} = \frac{W^{-} + \beta}{V + \alpha + \beta}$$

Then:

$$\mathrm{SBS} = 100 \cdot (\hat{r}^{+} - \hat{r}^{-})$$

Equivalent closed form:

$$\mathrm{SBS} = 100 \cdot \frac{W^{+} - W^{-} + \alpha - \beta}{V + \alpha + \beta}$$

With symmetric priors $\alpha = \beta = \alpha_0$, this becomes:

$$\mathrm{SBS} = 100 \cdot \frac{W^{+} - W^{-}}{V + 2\alpha_0}$$
Choosing priors
- $\alpha_0 = 1$: Laplace smoothing (simple, common)
- $\alpha_0 = k/2$: “k pseudo-votes” stabilizer. Pick $k$ based on desired damping.
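The symmetric-prior closed form in code:

```python
def sbs_smoothed(w_pos: float, w_neg: float, alpha0: float = 1.0) -> float:
    """Smoothed SBS under a symmetric Beta(alpha0, alpha0) prior:
    100 * (W+ - W-) / (V + 2*alpha0). Unlike the raw score, this is
    defined even when V = 0: it shrinks to 0, the prior's balance point."""
    return 100.0 * (w_pos - w_neg) / (w_pos + w_neg + 2.0 * alpha0)
```

Larger `alpha0` damps low-volume topics harder toward 0 at the cost of slower response to genuine signal.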
5. Confidence and Uncertainty
You should report SBS alongside a confidence measure.
5.1 Vote-Mass Confidence (simple, executive-friendly)
Map $V$ to $[0, 1)$:

$$\mathrm{conf} = \frac{V}{V + V_0}$$

$V_0$ sets the scale of stability. Example: $V_0 = 10$.
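A one-liner, with the example default $V_0 = 10$:

```python
def vote_mass_confidence(v: float, v0: float = 10.0) -> float:
    """Saturating map of voting mass V into [0, 1): V / (V + V0).
    V0 is the mass at which confidence reaches 0.5."""
    return v / (v + v0)
```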
5.2 Credible Interval (statistician-friendly)
Let the posterior for the positive rate be:

$$\theta \sim \mathrm{Beta}(W^{+} + \alpha,\ W^{-} + \beta)$$

A credible interval for $\theta$ yields an interval for SBS via the transformation $\mathrm{SBS} = 100 \cdot (2\theta - 1)$.
Compute:

$$\theta_{\mathrm{lo}} = Q_{0.025}(\theta), \qquad \theta_{\mathrm{hi}} = Q_{0.975}(\theta)$$

Then:

$$\mathrm{SBS}_{\mathrm{lo}} = 100\,(2\theta_{\mathrm{lo}} - 1), \qquad \mathrm{SBS}_{\mathrm{hi}} = 100\,(2\theta_{\mathrm{hi}} - 1)$$
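If SciPy is available, `scipy.stats.beta.ppf` gives the exact quantiles; the sketch below stays stdlib-only by Monte Carlo sampling with `random.betavariate`:

```python
import random

def sbs_credible_interval(w_pos, w_neg, alpha=1.0, beta=1.0,
                          level=0.95, n_draws=20000, seed=0):
    """Credible interval for SBS = 100 * (2*theta - 1), where
    theta ~ Beta(W+ + alpha, W- + beta). Empirical quantiles of
    sorted posterior draws approximate Q_lo and Q_hi."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(w_pos + alpha, w_neg + beta)
                   for _ in range(n_draws))
    tail = (1.0 - level) / 2.0
    theta_lo = draws[int(tail * n_draws)]
    theta_hi = draws[int((1.0 - tail) * n_draws) - 1]
    return 100.0 * (2.0 * theta_lo - 1.0), 100.0 * (2.0 * theta_hi - 1.0)
```

The fixed seed makes dashboards reproducible across refreshes; swap in the exact Beta quantile function if the dependency is acceptable.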
6. Benchmarking Methodology
Compute SBS for peer cohorts.
Let cohort $C$ be defined by filters (region, price tier, product line, delivery model, etc.). For a fixed topic $k$ and window $T$, aggregate across all builders $b \in C$:

$$W^{+}_{C} = \sum_{b \in C} W^{+}_{b}, \qquad W^{-}_{C} = \sum_{b \in C} W^{-}_{b}$$

Compute cohort SBS the same way:

$$\mathrm{SBS}_{C} = 100 \cdot \frac{W^{+}_{C} - W^{-}_{C}}{V_{C} + 2\alpha_0}$$

Then define benchmark delta:

$$\Delta_{b} = \mathrm{SBS}_{b} - \mathrm{SBS}_{C}$$
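A sketch of the cohort roll-up and delta, reusing the symmetric-prior smoothed score (the `alpha0` default is an assumption carried over from the smoothing section):

```python
def sbs_smoothed(w_pos: float, w_neg: float, alpha0: float = 1.0) -> float:
    # Symmetric-prior smoothed SBS (closed form).
    return 100.0 * (w_pos - w_neg) / (w_pos + w_neg + 2.0 * alpha0)

def benchmark_delta(builder_wp, builder_wn, cohort_slices, alpha0=1.0):
    """Delta = SBS_builder - SBS_cohort for one topic and window.
    cohort_slices: per-builder (W+, W-) pairs; weights are summed
    across the cohort *before* computing SBS, not averaged after."""
    c_wp = sum(wp for wp, _ in cohort_slices)
    c_wn = sum(wn for _, wn in cohort_slices)
    return (sbs_smoothed(builder_wp, builder_wn, alpha0)
            - sbs_smoothed(c_wp, c_wn, alpha0))
```

Summing weights first makes the cohort score a pooled estimate, so high-volume builders contribute proportionally more voting mass.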
Percentiles
For each topic $k$ in cohort $C$, compute the empirical distribution of $\{\mathrm{SBS}_{b,k} : b \in C\}$ and report:
- percentile rank
- quartiles
- z-score (optional, though SBS isn’t guaranteed normal)
7. Trend and Volatility
Compute SBS over rolling windows (e.g., monthly). Let $y_t = \mathrm{SBS}_{b,k,t}$ be the resulting time series.
Trend slope
Fit OLS on the last $n$ periods:

$$y_t = \beta_0 + \beta_1 t + \varepsilon_t$$

Report:
- slope $\beta_1$ (points/month)
- $R^2$ (signal strength)
Volatility
Compute the standard deviation of SBS (or of the posterior mean) over the last $n$ periods:

$$\sigma = \mathrm{stdev}(y_{t-n+1}, \dots, y_t)$$
High volatility + low confidence often indicates insufficient volume or operational instability.
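Both diagnostics for one rolling window can be computed in plain Python (series is oldest-first; slope units are points per period):

```python
import statistics

def trend_and_volatility(series):
    """OLS slope and R^2 of SBS against the time index, plus the
    sample standard deviation, over one rolling window."""
    n = len(series)
    xs = list(range(n))
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(series)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series))
    slope = sxy / sxx
    sst = sum((y - y_bar) ** 2 for y in series)
    r2 = (sxy * sxy) / (sxx * sst) if sst > 0 else 0.0
    vol = statistics.stdev(series)
    return slope, r2, vol
```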
8. Programmatic Implementation Notes
8.1 Storage
Maintain:
- mentions table (atomic)
- rollup table keyed by (builder, topic, time bucket):
- pos_weight_sum
- neg_weight_sum
- neutral_weight_sum
- vote_weight_sum
- counts
- model/prompt versions for reproducibility
8.2 Idempotency
Store comment_processing keyed by:
- comment_id
- model_version
- prompt_version
and only process once per version.
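An in-memory sketch of the dedup check (a production system would back this with a unique key on the comment_processing table instead of a Python set):

```python
class ProcessingLedger:
    """Stand-in for a comment_processing table: each comment is
    processed at most once per (model_version, prompt_version) pair."""

    def __init__(self):
        self._seen = set()

    def should_process(self, comment_id: str,
                       model_version: str, prompt_version: str) -> bool:
        key = (comment_id, model_version, prompt_version)
        if key in self._seen:
            return False  # already processed under these versions
        self._seen.add(key)
        return True
```

A version bump (new model or prompt) naturally reopens every comment for reprocessing, which is what the reproducibility requirement in 8.1 needs.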
8.3 Topic Governance
Maintain:
- topic taxonomy
- alias mappings
- “OTHER queue” review process
- periodic embedding re-indexing
8.4 Calibration
Confidence should ideally be calibrated (temperature scaling or isotonic regression) against a labeled validation set. If not, treat it as relative and keep a QA program that measures drift.
9. Summary of the SBS Definition
Atomic unit: aspect mention with $(b_i, t_i, k_i, p_i, s_i, c_i)$
Weight: $w_i = c_i \cdot s_i$
Aggregates: $W^{+}, W^{-}, W^{0},\ V = W^{+} + W^{-}$
Smoothed SBS: $\mathrm{SBS} = 100 \cdot (W^{+} - W^{-}) / (V + 2\alpha_0)$
Benchmark delta: $\Delta_{b} = \mathrm{SBS}_{b} - \mathrm{SBS}_{C}$
Confidence: $\mathrm{conf} = V / (V + V_0)$, plus Beta credible interval
