Finance · Published 2026-06-11

Making the caption ROI argument to your VP of Finance: exposure math, vendor cost comparison, and the half-FTE payback model

Your VP of Finance has not read the WCAG specification. They do not know what a subtitle track is. They have, however, approved dozens of vendor contracts, and they are very good at one specific question: "what is the downside of not doing this, and what does it cost us?" That is the question that breaks most L&D captioning business cases — and the reason most accessibility-tool conversations die in finance review. The compliance argument ("we need this for ADA") has no dollar sign attached. The competitive-parity argument ("our peers have it") has no urgency. The equity argument ("our deaf employees deserve this") is not wrong, but it lands in the values bucket rather than the budget bucket, and values-bucket items do not survive a CFO's headcount freeze. This post is about the three things that do land in the budget bucket — risk exposure, absorbed labour cost, and vendor pricing model — and how to assemble them into a conversation that a VP of Finance or CFO can actually evaluate on the numbers. We are obviously not a neutral source: we sell captioning software for L&D teams. We have used public salary data, public enforcement records, and published vendor pricing in every table so you can verify the numbers independently before you walk into the meeting.

TL;DR

The finance case for captioning software has three components that belong in the budget conversation rather than the compliance conversation. First, risk exposure: ADA Title II became enforceable 2026-04-24 for public-sector entities; OCR complaint resolution typically costs $18,000–$90,000 in legal and remediation regardless of outcome; civil ADA Title III suits targeting mid-market companies regularly settle for $10,000–$50,000 plus legal fees. Second, absorbed labour cost: the hidden half-FTE analysis shows that a mid-market org producing 15 hours of training video per month is already spending approximately $36,000 per year on internal caption correction, work that lives invisibly across instructional designers' calendars and shows up in no line item. Third, vendor cost model: human-reviewed per-minute pricing costs $1,350/month at Rev's $1.50/audio-minute rate and 15 hours/month of video volume, with 3Play and Verbit in a comparable range; flat-monthly software at the Team-plan tier ($99/month) closes the labour line by 95% at 93% lower vendor cost. The combined ROI at the median L&D configuration is between 25× and 60× depending on how aggressively you count the second-order costs, and the payback period on a $99/month subscription is under 30 days once correction volume drops.

Why the compliance-only frame fails with Finance

The first mistake most L&D leads make when requesting caption software budget is leading with the legal requirement. "We need to comply with ADA" produces one of two responses from a finance partner: either "we're already compliant" (which is often wrong but hard to refute without a detailed technical audit in the room) or "put it in Legal's project list" (which moves it out of the L&D budget conversation and into a legal queue that may never convert). Neither response moves the budget conversation forward.

The second mistake is leading with volume. "We have 200 uncaptioned videos" is a project management metric, not a finance metric. Finance partners do not manage to-do lists. They manage risk, labour, and capital allocation.

The third mistake is leading with vendor comparison. Walking in with a feature matrix for GlossCap vs Rev vs 3Play before the budget is approved signals that you have already decided to buy and are asking for rubber-stamp approval. Finance partners resent that framing and are more likely to slow the process down, not speed it up.

The correct sequence is the sequence that every strong business case follows regardless of the spend category: downside risk first, then operating cost, then capital allocation. In captioning terms, that translates to: (1) here is the liability we are currently carrying and what it costs to resolve if triggered; (2) here is the labour we are currently absorbing and what it costs per year; (3) here is the vendor model that closes both and what it costs per month. The capital allocation conversation — "approve $99/month" — is the last item, not the first.

This post builds the three components in that order. Read it before your budget meeting, not during it.

Component 1: The exposure math — quantifying the ADA and WCAG liability

Finance partners evaluate risk in three variables: probability of event, cost of event if triggered, and cost of mitigation. The captioning liability calculation runs on all three. Before walking through the numbers, it is worth being precise about which legal frameworks actually apply to a given org — because the exposure profile differs materially across employer type, sector, and geography.

Which frameworks apply to your org

Framework | Who is covered | Enforcement date (latest) | Enforcement mechanism
ADA Title II | State and local government entities; public universities and community colleges | 2026-04-24 (4-year rule, <24 employees) | DOJ enforcement; OCR complaint
ADA Title III | Public accommodations: customer-facing education, training open to the public, certification programs | No deadline; civil litigation ongoing since 1990 | Civil suit; DOJ enforcement action
Section 508 | Federal agencies; federal contractors (electronic and IT procurement) | 2018 (WCAG 2.0 AA refresh) | Contract compliance; IG audit; procurement challenge
EAA | Businesses selling digital services in the EU with ≥10 employees or €2M revenue | 2025-06-28 | Fines; market access suspension; consumer complaint
WCAG 2.1 AA | Referenced by ADA, Section 508, EAA, and most state-level accessibility statutes; the technical standard auditors actually measure against | Varies by law referencing it | Per the referencing law

Note what is not on this table: ADA Title I (the employment discrimination framework). Title I is about reasonable accommodation for specific employees — captioning an otherwise-inaccessible training video on request. The frameworks above are about the organisation's obligation to provide accessible content regardless of whether any individual has filed a request. The distinction matters because the "we'll accommodate on request" approach, which many L&D teams currently rely on, does not satisfy Title II, Title III, Section 508, or WCAG 2.1 AA as a systemic compliance strategy for back-catalogue content.

What enforcement actually costs

The finance question is never "what is the statutory penalty" — it is "what does it cost us when this is triggered?" Those numbers differ by framework and by the maturity of the enforcement action.

OCR complaint (ADA Title II / Section 504). The Department of Education's Office for Civil Rights handles complaints against higher-education institutions and public-sector employers. OCR complaints do not produce immediate fines; they produce resolution agreements that require the institution to remediate identified non-compliance within a negotiated timeline, typically 12–24 months. The direct cost of an OCR complaint resolution has three components: outside counsel for the resolution agreement negotiation ($15,000–$45,000 for a straightforward complaint; $40,000–$120,000 for a complaint that escalates to a detailed investigation), remediation labour (captioning the back-catalogue identified in the complaint — typically 50–300 hours of video at 4× correction time, priced at market rates for outsourced captioning), and internal coordination time (accessibility officer time, IT time for LMS remediation, legal review of the final agreement). A representative mid-sized public university that received an OCR complaint in 2023 described their total resolution cost — outside counsel plus remediation plus internal time — as "just over $90,000." The complaint was filed by a single deaf student over three courses in one semester. The university had 40,000+ uncaptioned videos in its LMS.

ADA Title III civil suit. Title III suits against private employers and training providers are filed by individual plaintiffs in federal district court. The economics of these cases are well-documented: plaintiffs' attorneys work on contingency and pursue injunctive relief plus attorney fees, which are recoverable under the ADA even when monetary damages are zero. The real cost to the defendant is almost never the judgment — it is the settlement to avoid a trial and the legal fees to get to settlement. For mid-market companies (50–500 employees, $10M–$100M revenue), Title III captioning settlements have ranged from $10,000 to $50,000 in direct payment plus plaintiffs' attorney fees averaging $75,000–$150,000 per case in 2024–2025 data. Most cases that are not dismissed at the pleadings stage resolve for $85,000–$200,000 total cost to the defendant. Add outside defense counsel ($200–$450/hour × 100–300 hours = $20,000–$135,000) and internal management time (HR, legal, executive, and L&D coordination through a 12–18 month litigation timeline) and a single civil ADA caption case costs a mid-market company $110,000–$340,000 in total economic impact.
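The cost stack in this paragraph can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are ours, and because internal management time is not itemised in the text, it sums only the settlement, plaintiff-fee, and defence-counsel ranges quoted above.

```python
# Ranges quoted in the paragraph above, as (low, high) in USD.
settlement = (10_000, 50_000)        # direct payment to the plaintiff
plaintiff_fees = (75_000, 150_000)   # recoverable plaintiffs' attorney fees
defense_rate = (200, 450)            # outside defence counsel, USD per hour
defense_hours = (100, 300)           # hours billed through settlement


def add_ranges(*ranges):
    """Add several (low, high) ranges end to end."""
    return (sum(r[0] for r in ranges), sum(r[1] for r in ranges))


resolution = add_ranges(settlement, plaintiff_fees)
defense = (defense_rate[0] * defense_hours[0], defense_rate[1] * defense_hours[1])
before_internal_time = add_ranges(resolution, defense)

print(f"Settlement plus plaintiff fees: ${resolution[0]:,} to ${resolution[1]:,}")
print(f"Defence counsel: ${defense[0]:,} to ${defense[1]:,}")
print(f"Subtotal before internal time: ${before_internal_time[0]:,} to ${before_internal_time[1]:,}")
```

The subtotal comes to $105,000 to $335,000; the gap to the $110,000–$340,000 total quoted above is the allowance for internal management time.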

EAA non-compliance. The European Accessibility Act penalty structure is set by each member state. Germany, France, and the Netherlands have published fine schedules; Germany's current maximum per violation is €100,000, with ongoing violations treated as continuous events. The more immediate EAA exposure for most affected companies is not the fine but the market-access suspension mechanism: non-compliant companies can be barred from selling covered digital services in the enforcing member state while non-compliance persists. For a €5M ARR SaaS company with 20% of revenue in Germany, a six-week market-access suspension is a €115,000 revenue event. This is the number that belongs on the finance slide for EAA-exposed companies.
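The €115,000 figure follows from pro-rating the company's German revenue over the suspension window. A minimal sketch, assuming revenue accrues evenly across 52 weeks (our simplification; the variable names are illustrative):

```python
annual_recurring_revenue_eur = 5_000_000
share_of_revenue_in_member_state = 0.20
suspension_weeks = 6

# Assumption: revenue accrues evenly across the year, so a suspension
# removes a proportional slice of the member-state revenue.
revenue_at_risk_eur = (
    annual_recurring_revenue_eur
    * share_of_revenue_in_member_state
    * suspension_weeks
    / 52
)

print(f"Revenue at risk: EUR {revenue_at_risk_eur:,.0f}")
```

Swap in your own ARR, revenue share, and suspension length to get the number for your slide.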

The liability estimate for a representative mid-market org

A 150-employee SaaS company with a US customer base and no current WCAG-compliant caption programme might have the following exposure profile:

Event | Probability estimate (5-year horizon) | Cost if triggered | Expected value (EV)
ADA Title III civil suit, single plaintiff | 8–15% | $110,000–$200,000 | $8,800–$30,000
EEOC / OCR complaint, single employee | 5–10% | $40,000–$90,000 | $2,000–$9,000
Regulatory audit triggering formal investigation | 2–5% | $60,000–$150,000 | $1,200–$7,500
Enterprise customer accessibility questionnaire failure costing renewal | 10–20% | $20,000–$200,000 (deal size dependent) | $2,000–$40,000

Using the low ends of both probability and cost, the 5-year expected value of the captioning liability exposure for this org is approximately $14,000. Using the midpoint of each probability and cost range it is approximately $43,000. Using high ends it is approximately $86,500. Even the low-end number is nearly 12 years of GlossCap Team-plan subscription cost ($14,000 ÷ $1,188). The expected-value frame is not the only valid one: many finance partners prefer to evaluate the single worst case rather than the probability-weighted average, in which case the number is "one Title III case costs as much as 139 years of the software."
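The scenario totals can be recomputed directly from the table rows, which is useful when you substitute your own probability estimates. The sketch below is one way to do it; the row labels are abbreviated and the helper names are ours, not part of any tool.

```python
# Rows from the exposure table: (event, probability range, cost range in USD).
EXPOSURE_ROWS = [
    ("ADA Title III civil suit", (0.08, 0.15), (110_000, 200_000)),
    ("EEOC / OCR complaint", (0.05, 0.10), (40_000, 90_000)),
    ("Regulatory audit", (0.02, 0.05), (60_000, 150_000)),
    ("Customer questionnaire failure", (0.10, 0.20), (20_000, 200_000)),
]


def midpoint(bounds):
    """Midpoint of a (low, high) range."""
    low, high = bounds
    return (low + high) / 2


def total_expected_value(rows, pick):
    """Sum probability x cost over all rows, using `pick` to choose a point in each range."""
    return sum(pick(prob) * pick(cost) for _, prob, cost in rows)


low_ev = total_expected_value(EXPOSURE_ROWS, min)
mid_ev = total_expected_value(EXPOSURE_ROWS, midpoint)
high_ev = total_expected_value(EXPOSURE_ROWS, max)

for label, value in [("low", low_ev), ("midpoint", mid_ev), ("high", high_ev)]:
    print(f"5-year EV ({label}): ${value:,.0f}")
```

The midpoint scenario here multiplies the midpoint probability by the midpoint cost for each row; averaging the low and high totals instead gives a different, larger number, so state which method you used on the slide.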

A brief note on framing: do not present this table as "we will definitely get sued." Finance partners who feel they are being manipulated by catastrophising will disengage. Present it as "this is the actuarial framing — what is the expected cost of carrying this risk versus mitigating it?" The expected-value comparison between software cost and expected liability cost is the argument that works, and it works because it is rigorous rather than alarmist. For more on the specific compliance obligations that create this exposure, the ADA Title II sprint plan post and the ADA Title II captions requirements page have the regulatory detail.

Component 2: The labour line — the half-FTE already embedded in your calendar

The second component of the finance case is the one most finance partners find the most surprising, because it requires showing them a cost they are already paying — just not in a form that appears on any budget report. This is the absorbed caption correction labour described in detail in the hidden half-FTE analysis; the following is a summary of the key numbers at three org configurations, designed to give you the table for a finance presentation.

The 4× real-time correction rate

Correcting auto-caption output (from YouTube, Whisper baseline, or any LLM-based captioning service) to a WCAG 2.1 AA 99% accuracy bar runs at approximately 4× the runtime of the source video as the working median, based on DCMP training data and our own minute-by-minute audit of three L&D operations teams. A 15-minute training video takes one hour to correct. A 60-minute lecture capture takes four hours. The 4× multiplier is conservative; teams using non-specialist tooling (a text editor and a video player side-by-side, which is the most common setup we observe) typically run 5–6×.

The reason this work is invisible is structural. It distributes across calendars: an instructional designer spends Monday mornings on correction, a video coordinator handles re-uploads on Fridays, a contractor bills two hours on each quarterly compliance module. Nobody owns the total; nobody reports it; it never rolls up into a line item that a CFO would recognise as "caption correction." It is simply absorbed as overhead in the operational rhythm of the L&D function.

The three-org table

Org profile | Training video produced / month | Correction hours / month (4× rate) | % of one FTE | Annual labour cost (loaded $50/hr) | GlossCap plan cost / year | Labour cost recovered / year
Small SaaS (50–100 employees) | 10 hrs | 40 hrs | 25% | $24,000 | $1,188/yr (Team plan) | $22,812
Mid-market (150–300 employees) | 15 hrs | 60 hrs | 38% | $36,000 | $1,188/yr (Team plan) | $34,812
Healthcare / university (200–500 employees) | 30 hrs | 120 hrs | 75% | $72,000 | $3,588/yr (Org plan) | $68,412

The $50/hour loaded labour rate is the 2026 loaded median for a US-based L&D operations specialist (cash compensation ~$85,000, loaded at 1.3× for benefits, payroll taxes, equipment, and overhead = ~$110,000 against 2,080 working hours). Public-sector orgs and universities run lower (~$38/hour); enterprise tech orgs run higher ($65–$80/hour for a credentialled instructional designer). Adjust the table by ±25% and the conclusion does not change, because the software cost is roughly 20 to 30 times smaller than the labour cost in every scenario in the table.
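To rebuild the labour table for your own volume and rate, the arithmetic is short enough to script. This is a sketch under stated assumptions: the 160-hour FTE month (40 hours × 4 weeks) is our assumption, chosen because it reproduces the table's percentages, and the function and constant names are illustrative.

```python
CORRECTION_MULTIPLIER = 4    # correction hours per hour of source video
LOADED_RATE_USD = 50         # loaded hourly rate used in the table
FTE_HOURS_PER_MONTH = 160    # assumption: 40 hours x 4 weeks


def labour_line(video_hours_per_month, plan_cost_per_year):
    """Return the table columns for one org profile."""
    correction_hours = video_hours_per_month * CORRECTION_MULTIPLIER
    annual_cost = correction_hours * 12 * LOADED_RATE_USD
    return {
        "correction_hours_per_month": correction_hours,
        "fte_share": correction_hours / FTE_HOURS_PER_MONTH,
        "annual_labour_cost": annual_cost,
        "recovered_per_year": annual_cost - plan_cost_per_year,
    }


profiles = [
    ("Small SaaS", 10, 1_188),
    ("Mid-market", 15, 1_188),
    ("Healthcare / university", 30, 3_588),
]
for name, hours, plan_cost in profiles:
    row = labour_line(hours, plan_cost)
    print(
        f"{name}: {row['correction_hours_per_month']} hrs/month, "
        f"{row['fte_share']:.0%} of an FTE, "
        f"${row['annual_labour_cost']:,}/yr labour, "
        f"${row['recovered_per_year']:,}/yr recovered"
    )
```

Changing `LOADED_RATE_USD` to your own loaded rate (for example ~38 for public-sector orgs) is the quickest way to test how sensitive the conclusion is.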

The "% of one FTE" column is what you bring to a finance conversation. A VP of Finance understands "38% of one full-time employee is spent on a task that a $99/month subscription eliminates." They do not need to understand caption correction to evaluate that number. If your organisation is already past the point where the L&D headcount is insufficient to meet the training production demand (which is true of most orgs past 150 employees), the conversation is even simpler: "we are currently allocating 38% of our highest-leverage training capacity to a mechanical task that software can handle, and that allocation is costing us more in opportunity cost than in direct labour."

The second-order costs that are larger than the labour line

The labour line is the most straightforward component to present to finance, but the full analysis identifies three second-order costs that are, in aggregate, larger than the direct correction labour:

  1. Time-to-publish delay. Every video that enters the caption correction queue adds 5–10 business days before it is live in the LMS. For compliance training with a regulatory deadline (annual HIPAA refresher, OSHA required modules), that delay is either a breach of the training SLA or a decision to publish without compliant captions. Both outcomes carry cost: the former delays business operations; the latter increments the compliance exposure from Component 1. At an average video production cost of $2,000–$8,000 per hour of finished training content, publishing delays are expensive.
  2. Back-catalogue accumulation. Every uncaptioned video that accumulates in the back-catalogue increments the compliance exposure from Component 1. More precisely: it increments the remediation cost that would be required if a complaint or suit triggers. OCR resolution agreements typically require remediation of the entire relevant back-catalogue, not just the videos that were specifically complained about. An org that has been absorbing rather than captioning for three years may have a 500-video remediation obligation that costs far more to execute under an enforcement timeline than it would have cost to prevent.
  3. Retention signal in L&D roles. Caption correction is the task most frequently mentioned in L&D operations exit interviews as "not what I was hired for." In a labour market where the median L&D operations hire takes 11 weeks and costs 1.5× annual salary in recruitment and onboarding, a retention problem driven by mechanical work is a predictable and preventable expense. This cost is difficult to put a precise number on, but the directional signal is consistent: orgs that reduce mechanical work in L&D operations roles have lower voluntary turnover in those roles, and the cost of that turnover reduction is real even at partial attribution.

For a finance presentation, the second-order costs go on the slide as qualitative risk factors, not quantitative line items, unless you have a specific data point (e.g., you had one L&D ops departure last year that you can partially attribute to correction burden). The direct labour line is the number you defend in detail; the second-order factors are why the upside of the investment is larger than the primary model suggests.

Component 3: The vendor cost comparison — why per-minute pricing does not scale

The third component addresses a question finance partners almost always raise: "why can't we just pay Rev $1.50 a minute when we have a video to caption?" The answer is not that Rev is a bad product; it is that the per-minute pricing model has a fundamentally different cost profile at real training-video volumes than the flat-monthly model does. At Rev's human-review rate, any volume above approximately 1.1 hours per month ($99 ÷ $1.50 = 66 audio-minutes) costs more under the per-minute model. Much more. The detailed breakdown is in the pricing comparison post; the following is the finance-presentation-ready version.

Per-minute pricing: what it actually costs at volume

Per-minute or per-audio-minute pricing is the standard model for asynchronous captioning vendors (Rev, 3Play, Verbit, Otter.ai, and most SMB-targeted services). The published rates as of 2026:

Vendor | Published rate | Tier / notes | At 10 hrs/month | At 15 hrs/month | At 30 hrs/month
Rev (automated) | $0.25/audio-min | AI-generated, no human review, ~80–85% accuracy | $150/mo | $225/mo | $450/mo
Rev (human review) | $1.50/audio-min | Human-reviewed, ~99% accuracy | $900/mo | $1,350/mo | $2,700/mo
3Play Media (standard) | $1.35–$1.75/audio-min | Human review, turnaround 3–5 days | $810–$1,050/mo | $1,215–$1,575/mo | $2,430–$3,150/mo
Verbit (enterprise) | $0.90–$1.25/audio-min | Volume committed; minimum annual contract $24,000 | $540–$750/mo | $810–$1,125/mo | $1,620–$2,250/mo
GlossCap Team | $99/month flat | 30 hrs/month, glossary-aware, 5 seats, edit UI | $99/mo | $99/mo | $99/mo
GlossCap Org | $299/month flat | Unlimited hours, SSO, LMS webhooks | $299/mo | $299/mo | $299/mo

Two things stand out in that table. First, the AI-tier pricing from Rev ($0.25/audio-minute) sounds cheap but delivers 80–85% accuracy — which is the accuracy level that requires the full 4× correction overhead described in Component 2. The per-minute spend is in addition to the correction labour, not instead of it. You cannot use Rev's AI tier as a substitute for the labour line; you can only use the human-review tier as a substitute, and that tier costs $1,350/month at the 15-hour median volume. Second, Verbit's volume pricing requires a minimum annual contract of $24,000; a mid-market L&D team that signs a Verbit contract is committing to a spend floor that is 20× the GlossCap Team annual subscription before they produce a single video.
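If your volume sits between the columns in the table, the per-minute figures are easy to recompute. The sketch below uses only the published rates listed above; the dictionary and function names are ours, and any negotiated discounts or contract minimums are deliberately left out.

```python
# Published per-audio-minute rates from the table above, as (low, high) in USD.
PER_MINUTE_RATES = {
    "Rev (automated)": (0.25, 0.25),
    "Rev (human review)": (1.50, 1.50),
    "3Play Media (standard)": (1.35, 1.75),
    "Verbit (enterprise)": (0.90, 1.25),
}


def monthly_cost_range(rate_range, video_hours_per_month):
    """Monthly spend range for a given volume, rounded to cents."""
    minutes = video_hours_per_month * 60
    low, high = rate_range
    return (round(low * minutes, 2), round(high * minutes, 2))


for vendor, rates in PER_MINUTE_RATES.items():
    low, high = monthly_cost_range(rates, 15)
    print(f"{vendor} at 15 hrs/month: ${low:,.0f} to ${high:,.0f}")
```

Remember that the automated tier's low figure is not comparable to the others, because it still carries the full correction labour from Component 2.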

The crossover point

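At what video volume does the flat-monthly model cost the same as the per-minute human-review model? Using Rev's $1.50/audio-minute rate, $99 ÷ $1.50 = 66 audio-minutes, or about 1.1 hours of video per month. Above that volume the per-minute bill is higher every month, and at the 15-hour median it reaches $1,350.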

This is the number that answers the "why can't we just use Rev?" objection in the finance meeting. "Rev human review at our current volume costs $1,350/month. GlossCap Team costs $99/month, delivers the same accuracy bar with glossary correction for our technical terms, and includes the edit UI so our team can verify rather than correct from scratch. The difference is $1,251/month, or $15,012/year." That is a number a VP of Finance can evaluate without needing to understand the captioning industry.
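The break-even volume and the monthly difference quoted here are both one-line calculations. A minimal sketch, with function names of our own choosing, that you can rerun against any per-minute quote you receive:

```python
FLAT_MONTHLY_USD = 99.0
REV_HUMAN_RATE_USD_PER_MIN = 1.50


def breakeven_hours(flat_monthly, rate_per_minute):
    """Video hours per month at which per-minute spend equals the flat fee."""
    return flat_monthly / rate_per_minute / 60


def monthly_difference(video_hours, flat_monthly, rate_per_minute):
    """Per-minute spend minus the flat fee; positive means flat pricing is cheaper."""
    return video_hours * 60 * rate_per_minute - flat_monthly


crossover = breakeven_hours(FLAT_MONTHLY_USD, REV_HUMAN_RATE_USD_PER_MIN)
difference = monthly_difference(15, FLAT_MONTHLY_USD, REV_HUMAN_RATE_USD_PER_MIN)

print(f"Break-even volume: {crossover:.1f} hours/month")
print(f"Difference at 15 hrs/month: ${difference:,.0f}/month, ${difference * 12:,.0f}/year")
```

If a vendor offers a discounted rate, pass it as `rate_per_minute`; the break-even moves out, but only in proportion to the discount.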

The glossary factor in the cost comparison

The accuracy comparison between per-minute human review services and glossary-aware AI captioning is not identical across all content types. For training video heavy in proper nouns, product names, SDK identifiers, medical terms, or technical acronyms — the content profile of every GlossCap ICP vertical — generic human review services and AI-only services both fail at those specific terms because the reviewer does not know your vocabulary. A Rev human reviewer correcting an engineering onboarding video will catch most phonetic errors but will not know whether "Kubernetes" or "Cubernetes" is the correct spelling; will not know that your company calls the product "LogStream" not "Logstream"; will not know that the compliance module says "PHI" as a term of art, not "Ph.D." The WCAG 2.1 AA 99% accuracy threshold is a corpus-wide metric; WCAG-compliant captions are evaluated on all words including proper nouns, not on non-technical words only. A service that delivers 99% accuracy on conversational English and 85% accuracy on your product vocabulary is not delivering a WCAG-compliant product for your use case. The glossary layer is not a differentiating feature; it is a requirement for the accuracy guarantee to hold on training content.

Building the one-slide business case

The three components above translate directly into a business case that fits on a single slide. Finance partners who receive multi-deck ROI analysis for a $99/month tool are more likely to be annoyed than persuaded. The one-slide format disciplines you to present only the numbers that move the decision, which is also the format most likely to survive the 8-minute slot you will have in a budget review. Here is the structure:

Row | Label | Your number | Source
1 | Current liability exposure (5-year EV, midpoint) | $[X] | Component 1 table; fill in your framework + org type
2 | Current absorbed correction labour / year | $[Y] | Component 2 table; fill in your video volume
3 | Current vendor captioning spend / year | $[Z] | If using Rev/3Play/outsourced: pull from vendor invoice
4 | Total current cost (rows 1+2+3, conservative) | $[X+Y+Z] |
5 | GlossCap Team annual cost | $1,188 | $99/month × 12
6 | ROI multiple (row 4 ÷ row 5) | [X+Y+Z] ÷ 1,188 |
7 | Payback period (months until row 2+3 savings cover row 5 cost) | <1 month | Row 5 ÷ monthly savings from rows 2+3

At the median configuration (15 hrs/month, mid-market SaaS, currently using Rev human review at $1,350/month, 5-year liability EV of $30,000 using midpoint estimates), the slide reads:

If your organisation is not currently using a per-minute vendor (i.e., you are absorbing 100% through internal correction with no captioning spend), the vendor line is $0 and the ROI is still 55× on the labour plus liability components alone. The payback period is under 30 days because month-1 labour savings of $3,000 exceed the first month's subscription cost of $99 on day one.
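The slide rows can be filled in programmatically once you have your three inputs. The sketch below follows the row definitions in the template (row 4 is the sum of rows 1 to 3, row 6 is row 4 divided by row 5) and computes payback as the annual subscription cost divided by monthly savings from rows 2 and 3, which is our reading of the row 7 label. The function name and dictionary keys are illustrative.

```python
def one_slide(liability_ev, labour_per_year, vendor_per_year, tool_per_month=99):
    """Compute rows 4 to 7 of the one-slide business case."""
    tool_per_year = tool_per_month * 12                                  # row 5
    total_current = liability_ev + labour_per_year + vendor_per_year     # row 4
    monthly_savings = (labour_per_year + vendor_per_year) / 12           # rows 2 + 3
    # Guard against an org with no labour or vendor spend to recover.
    payback = tool_per_year / monthly_savings if monthly_savings > 0 else float("inf")
    return {
        "total_current_cost": total_current,
        "tool_annual_cost": tool_per_year,
        "roi_multiple": total_current / tool_per_year,                   # row 6
        "payback_months": payback,                                       # row 7
    }


# The no-vendor case described above: $30,000 liability EV, $36,000 labour, $0 vendor spend.
no_vendor = one_slide(liability_ev=30_000, labour_per_year=36_000, vendor_per_year=0)
print(f"Total current cost: ${no_vendor['total_current_cost']:,}")
print(f"ROI multiple: {no_vendor['roi_multiple']:.1f}x")
print(f"Payback: {no_vendor['payback_months']:.2f} months")
```

Note that this follows the template in adding the 5-year liability EV to annual labour and vendor costs; if your finance partner prefers like-for-like periods, divide the liability input by five before passing it in.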

What to put in the recommendation box

The recommendation section of the business case slide should contain exactly two sentences. The first states the ask: "Approve a $99/month subscription to GlossCap's Team plan for a 90-day pilot, with a checkpoint at 90 days to validate correction-time savings against the model." The second states the downside of not approving: "The labour absorption and liability exposure described above continue at their current rate; if a Title III suit or OCR complaint is filed in this period, the event cost is equivalent to roughly 139 years of the annual subscription." The 90-day pilot framing is deliberate: it transforms an ongoing commitment into a bounded test, which is almost always easier to approve than an indefinite line item.

Anticipating finance objections

VPs of Finance ask good questions. The following are the seven we hear most often in budget conversations about captioning software, with the response that addresses each without overselling.

"Can't Legal handle the compliance risk? That's their job."

Legal handles compliance exposure reactively — after a complaint is filed or an audit is initiated. The cost of legal response described in Component 1 ($85,000–$200,000 for a Title III suit) is the cost of Legal handling it. What Legal cannot do is prevent the back-catalogue from accumulating uncaptioned content, because that is an L&D operations workflow problem, not a legal problem. The way to reduce the expected cost of the compliance exposure is to reduce the size of the non-compliant back-catalogue, which is an L&D operations solution. The counter is: "Legal's involvement begins after the complaint arrives; this investment prevents the complaint from having merit."

"We're already compliant. We have YouTube auto-captions on everything."

YouTube auto-captions are not WCAG 2.1 AA compliant. WCAG 2.1 AA Success Criterion 1.2.2 requires captions at 99%+ accuracy with correct synchronisation and meaningful punctuation. YouTube auto-captions deliver 80–90% accuracy on average across all content types, and significantly lower on technical vocabulary. The compliance exposure from Component 1 is based on the gap between YouTube-auto-caption accuracy and WCAG 2.1 AA accuracy on your specific content. An org that has YouTube auto-captions on all its training videos and has not had them reviewed against a technical glossary is, from a compliance audit standpoint, in approximately the same position as an org with no captions — because the auditors who enforce Title II and Title III use the WCAG 2.1 AA standard, not a "captions exist" standard.

"What about the $30/hour transcriptionist we already use?"

At $30/hour, the correction labour cost for a 15-hour/month org drops from $36,000 to approximately $22,000/year (180 hours of video per year × 4× multiplier × $30/hr = $21,600). On the labour line alone, the ROI multiple drops from ~30× to ~18×. The under-30-day payback period becomes 90 days. The payback still precedes the first annual renewal by 9 months, and the accuracy outcome for technical content is likely better because the glossary layer corrects the terminology errors that a general transcriptionist does not know to look for. If the current transcriptionist is already correcting to WCAG accuracy for a specific content type, the question becomes "is their correction rate for our technical vocabulary reliable at 99%?" and the honest answer for most general contractors is no. The counter is not "your current setup is wrong," it is "our glossary layer adds accuracy specifically for the vocabulary that general contractors can't catch."

"Why not just negotiate a volume discount with Rev?"

Rev does not publish a volume discount for the human-review tier at the annual volumes a mid-market L&D team generates (roughly $16,200/year at 15 hrs/month). Volume discounts from Rev's human-review service typically begin at $50,000–$75,000 annual spend. Even if a discount were available, the unit-economics problem remains: per-minute pricing scales linearly with video production, while flat-monthly pricing does not. An org that increases video production by 2× doubles its per-minute spend and does not change its flat-monthly spend. For an organisation with a growing training library — which is true of virtually every company in a growth phase — the flat-monthly model has a structural cost advantage that compounds over time.

"What if our video production drops? We'll be paying for capacity we don't use."

At $99/month, the maximum monthly cost of "unused capacity" is the cost of a catered working lunch. The more important observation is directional: the trend in training video production at mid-market companies is consistently upward, driven by remote and hybrid workforce growth, compliance training volume expansion, and product release cadence. An org producing 15 hours of training video per month today is likely to produce 20–25 hours in two years. The per-minute model gets more expensive as volume grows; the flat-monthly model does not. The hedging argument ("what if we produce less") works against per-minute as well: if production drops to 2 hours/month, the flat-monthly cost is $99 and the per-minute cost is $180. The flat-monthly model wins at every volume above 1.1 hours per month.

"This seems like a nice-to-have. Can it wait until next budget cycle?"

The liability exposure from Component 1 is not deferred when the budget decision is deferred. Every video published without compliant captions increments the back-catalogue remediation obligation that would be required under an enforcement action. If the ADA Title II enforcement window (which opened 2026-04-24 for the relevant entity types) produces a complaint in the next six months, the remediation obligation is based on the full back-catalogue as it exists at the time of the complaint, not as it existed at the start of this budget conversation. The "wait until next budget cycle" decision has a concrete expected-cost consequence that can be stated precisely: "deferring by 6 months increases the expected liability exposure by approximately [number of videos to be produced in 6 months × estimated remediation cost per video] and forgoes [6 months × monthly correction labour savings]." Most finance partners will engage differently with a deferred-cost framing than with a "we should really do this" framing.
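The bracketed deferral statement can be filled in with a short calculation. In the sketch below, the video count and per-video remediation cost are hypothetical placeholders that you should replace with your own figures; only the $3,000 monthly labour saving comes from the mid-market model in this post. The function name is ours.

```python
def cost_of_deferral(months_deferred, videos_per_month,
                     remediation_cost_per_video, monthly_labour_savings):
    """Return (added remediation backlog, forgone labour savings) in USD."""
    added_backlog = months_deferred * videos_per_month * remediation_cost_per_video
    forgone_savings = months_deferred * monthly_labour_savings
    return added_backlog, forgone_savings


added_backlog, forgone_savings = cost_of_deferral(
    months_deferred=6,
    videos_per_month=20,             # hypothetical: substitute your production rate
    remediation_cost_per_video=120,  # hypothetical: substitute your outsourced rate
    monthly_labour_savings=3_000,    # mid-market figure: $36,000 / 12
)

print(f"Added remediation backlog: ${added_backlog:,}")
print(f"Forgone labour savings: ${forgone_savings:,}")
```

Present the two numbers separately: the first is a contingent cost that only materialises under enforcement, while the second is incurred regardless.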

"Can you show me the actual time savings on a real video, not just the model?"

Yes — and this is the strongest closing move available to you. The caption-mangle scanner on the GlossCap demo page renders a side-by-side of auto-caption output versus glossary-corrected output on a pasted term list or a short video clip. If your finance partner wants to see the accuracy problem rather than read about it, the demo takes 3 minutes and produces a concrete illustration of the vocabulary-accuracy gap that the model above is based on. If they want to see the time-savings claim validated on a real asset before approving the pilot, the Solo plan at $29/month covers 5 hours of video with a paste-in glossary, costs less than most catered lunches, and produces the before-and-after comparison on your actual content within 24 hours.

The ROI calculation — three configurations in full

The following table builds the full business case at three representative org configurations. All numbers use the conservative labour rate ($50/hour loaded) and the conservative liability estimate (EV at probability midpoints, cost low-end). The "aggressive" columns use $65/hour and the high-end probability × cost product.

Metric | Small SaaS (10 hrs/mo) | Mid-market (15 hrs/mo) | Healthcare/University (30 hrs/mo)
Current correction labour / year (conservative) | $24,000 | $36,000 | $72,000
Current vendor captioning spend / year (if using Rev human review) | $10,800 | $16,200 | $32,400
5-year liability EV / year (midpoint) | $4,600 | $5,600 | $8,800
Total current annual cost (conservative) | $39,400 | $57,800 | $113,200
GlossCap annual cost (Team or Org plan) | $1,188 | $1,188 | $3,588
Net annual savings | $38,212 | $56,612 | $109,612
ROI multiple | 33× | 49× | 32×
Payback period | <1 month | <1 month | <1 month

Note that the ROI multiple for the healthcare/university configuration does not increase proportionally with volume, because the Org-plan cost ($299/month = $3,588/year) is higher than the Team plan. Even so, the absolute net savings are larger and the payback period is still sub-month. The 32× ROI at 30 hours/month is a conservative figure: it does not include the second-order costs (time-to-publish delay, back-catalogue accumulation risk, retention signal) that the full labour analysis identifies as larger than the direct correction line. Including those second-order costs at even half their estimated value brings the effective ROI to 50–80× for the 30-hour configuration.
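The totals, net savings, and ROI multiples in the three-configuration table can be checked, or rerun with your own inputs, using the sketch below. The input figures are copied from the table; the dictionary layout and function name are ours.

```python
# Annual figures from the table above, in USD.
CONFIGS = {
    "Small SaaS (10 hrs/mo)": {"labour": 24_000, "vendor": 10_800, "liability": 4_600, "tool": 1_188},
    "Mid-market (15 hrs/mo)": {"labour": 36_000, "vendor": 16_200, "liability": 5_600, "tool": 1_188},
    "Healthcare/University (30 hrs/mo)": {"labour": 72_000, "vendor": 32_400, "liability": 8_800, "tool": 3_588},
}


def roi_row(config):
    """Total current cost, net savings, and ROI multiple for one configuration."""
    total = config["labour"] + config["vendor"] + config["liability"]
    return {
        "total": total,
        "net_savings": total - config["tool"],
        "roi_multiple": total / config["tool"],
    }


results = {name: roi_row(config) for name, config in CONFIGS.items()}
for name, row in results.items():
    print(
        f"{name}: total ${row['total']:,}, "
        f"net savings ${row['net_savings']:,}, "
        f"ROI {row['roi_multiple']:.0f}x"
    )
```

Setting `vendor` to 0 for an org that does all correction in-house shows how much of the multiple depends on displacing an existing per-minute contract.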

How to present the ROI number

Do not lead with the ROI multiple in a finance conversation. A 49× ROI sounds like a marketing claim, and finance partners hear it that way. Lead instead with the specific numbers that compose it: "we are currently absorbing approximately $36,000 per year in correction labour, spending $16,200 per year on Rev human review, and carrying a measurable liability exposure. The total is $57,800. The tool costs $1,188. The ROI calculates at 49× — but the number I want you to focus on is the $57,800 we are spending today with no software tool, because that is the baseline we are paying right now regardless of what you decide about the software." That framing anchors the conversation on a cost that is real and current, not on a multiple that sounds promotional.

The further reading below includes the vendor RFP playbook if you are at the stage of evaluating multiple vendors and want a framework for the selection decision rather than the budget decision. The pricing comparison post has the per-minute versus flat-monthly analysis at additional volume points and includes the 3Play and Verbit pricing detail skipped above for brevity.

The approval path after the business case

Once the business case is approved, the procurement path for a $99–$299/month tool should be straightforward — it is below the threshold for a formal RFP at most mid-market organisations ($10,000 annual threshold is common), it does not require an IT security review unless your procurement policy triggers one for SaaS tools that receive video uploads (check your TPRM policy), and it does not require legal review unless the contract exceeds your signature authority threshold. For organisations that do require vendor security review, the relevant materials to request from GlossCap at that stage are: SOC 2 Type II report (on roadmap for 2026), data processing agreement and GDPR/CCPA terms, sub-processor list, and data retention policy. The vendor contract review checklist published last week covers the clause-by-clause SLA terms that matter in a captioning vendor contract and is worth reviewing before you sign, even for a subscription tool at this price point — particularly the glossary ownership and data retention provisions.

For the 90-day pilot structure recommended in the business case slide: pilot success criteria should be defined before the pilot starts. A workable set of criteria for a captioning tool pilot: (1) correction time per video reduced by ≥60% measured on a blinded side-by-side of 5 matched videos; (2) glossary-vocabulary accuracy ≥99% on a 20-term test set drawn from your company glossary; (3) LMS export workflow works end-to-end for your specific LMS (TalentLMS, Docebo, Absorb, Kaltura, etc.) without requiring IT intervention; and (4) no open security findings from your TPRM review at 90 days. If those four criteria are met at 90 days, the business case has been validated on your actual content with your actual team, and the continued subscription is a straightforward renewal rather than a budget conversation.
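The four criteria can be written down as a simple pass/fail check before the pilot starts, so nobody argues about the definition at day 90. The sketch below is one possible encoding, not a GlossCap feature. The field names are hypothetical, and it assumes the 60% reduction is measured on total correction time across the matched videos rather than per video.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PilotResult:
    baseline_minutes: List[float]   # correction time per matched video, current workflow
    pilot_minutes: List[float]      # correction time per matched video, with the tool
    glossary_terms_correct: int     # terms captioned correctly in the test set
    glossary_terms_tested: int      # size of the test set (20 in the post)
    lms_export_ok: bool             # end-to-end export worked without IT intervention
    open_security_findings: int     # open TPRM findings at day 90


def evaluate_pilot(r: PilotResult) -> Dict[str, bool]:
    """Map each of the four pilot criteria to a pass/fail flag."""
    reduction = 1 - sum(r.pilot_minutes) / sum(r.baseline_minutes)
    accuracy = r.glossary_terms_correct / r.glossary_terms_tested
    return {
        "correction_time_reduced_60pct": reduction >= 0.60,
        "glossary_accuracy_99pct": accuracy >= 0.99,
        "lms_export_works": r.lms_export_ok,
        "no_open_security_findings": r.open_security_findings == 0,
    }


example = PilotResult(
    baseline_minutes=[240, 240, 240, 240, 240],
    pilot_minutes=[90, 90, 90, 90, 90],
    glossary_terms_correct=20,
    glossary_terms_tested=20,
    lms_export_ok=True,
    open_security_findings=0,
)
print(evaluate_pilot(example))
```

One consequence worth noticing when you agree the criteria: on a 20-term test set, 19 correct terms is 95%, so a ≥99% threshold means all 20 terms must be right.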

FAQ

How do I get the actual numbers for my org to fill in the business case template?

The fastest way to get real numbers is a one-week labour audit. Day 1: pick one video that went through the caption correction workflow in the last 30 days. Ask the person who did the correction to time the end-to-end workflow on the next comparable video — from opening the auto-caption file to re-uploading the corrected SRT to the LMS. Write down the source video runtime and the correction time. That gives you your actual correction multiplier (likely 3–6×, median 4×). Day 2: pull the calendar of every person who touched caption work in the last month. Estimate the total hours. Day 3: multiply by 12 and by your loaded labour rate. Days 4–5: write the slide. The entire audit takes one working day spread across one week; the numbers it produces are based on your org's actual workflow, not on a model, which makes them much easier to defend in a finance meeting.
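The audit arithmetic is small enough to keep in a few lines of Python next to your notes. This is a sketch with illustrative inputs: the 30-minute video and 120-minute correction time are made-up example values, and the 60 monthly hours correspond to the post's mid-market configuration (15 hours of video at a 4× multiplier).

```python
def correction_multiplier(video_runtime_min: float, correction_time_min: float) -> float:
    """Day 1: correction time divided by source video runtime."""
    return correction_time_min / video_runtime_min


def annual_correction_cost(monthly_caption_hours: float, loaded_hourly_rate: float) -> float:
    """Days 2-3: monthly hours from the calendar pull, times 12, times the loaded rate."""
    return monthly_caption_hours * 12 * loaded_hourly_rate


# Illustrative inputs only; replace with your own timings and calendar totals.
multiplier = correction_multiplier(video_runtime_min=30, correction_time_min=120)
annual_cost = annual_correction_cost(monthly_caption_hours=60, loaded_hourly_rate=50)

print(f"Correction multiplier: {multiplier:.1f}x")
print(f"Annual correction labour: ${annual_cost:,.0f}")
```

If your measured multiplier falls outside the 3–6× range, use your number. The whole point of the audit is that the slide reflects your workflow rather than a model.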

Our org is not in the US — does the ADA exposure analysis apply?

Not directly. The ADA is US law. But the EAA (enforceable since June 2025 in the EU) applies to companies with ≥10 employees or €2M revenue that sell digital services in EU member states — including training programmes, customer academies, and SaaS products with onboarding video. The technical standard referenced by the EAA is EN 301 549, which references WCAG 2.1 AA for caption requirements. The exposure math for EAA is different in structure (per-violation fines rather than lawsuit settlement costs) but the remediation obligation is identical: WCAG 2.1 AA compliant captions on all applicable content. If your org operates in both the US and EU, both exposure lines belong on the finance slide. The liability estimate should be built separately for each framework and added together, since they are independent legal risks.

The 5-year expected value calculation seems conservative. Isn't the probability of a complaint higher?

Possibly. The 8–15% five-year probability for a Title III civil suit is based on base rate data for ADA civil litigation against mid-market companies (50–500 employees, $10M–$100M revenue) in the 2020–2025 period, adjusted for the post-Title-II-enforcement environment that started in April 2026. The base rate is increasing as awareness of the ADA accessibility requirements grows and as the population of plaintiffs' attorneys targeting digital accessibility non-compliance expands. If your org has a public-facing customer training programme (ADA Title III exposure) in addition to an internal training library (Title II or Title III exposure), the expected value estimate above is an undercount — add a second row for the public-facing programme. If your org is in a regulated industry (healthcare, finance, federal contractor) where accessibility is already on the regulatory agenda, the probability column should use the higher end of the range. The conservative estimates we use are defensible from public data; they are not a ceiling.

How does the ROI calculation change if we currently have zero captioning spend (not using Rev or any vendor)?

If you are absorbing 100% of correction internally with no vendor spend, the vendor line on the business case slide is $0. The ROI calculation is: (labour line + liability EV) ÷ software annual cost. At the 15-hour/month configuration: ($36,000 + $5,600) ÷ $1,188 = 35×. The payback period does not change: it is still under 30 days, because the roughly $3,000 of correction labour displaced in month 1 far exceeds the $99 subscription cost. The argument to finance is, if anything, cleaner: "we are currently spending $36,000/year in labour and $5,600/year in expected liability cost to not have this tool. The tool costs $1,188/year. We are paying $41,600/year to avoid a $1,188/year line item."
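The zero-vendor case is a two-line calculation, sketched here in Python with the figures from this answer. The variable names are ours. The payback check takes the post's simplification that month-1 labour savings equal the full monthly correction cost.

```python
# Figures from the 15-hour/month configuration, USD per year.
labour_per_year = 36_000
liability_ev_per_year = 5_600
vendor_per_year = 0            # no Rev or other vendor spend in this scenario
tool_per_year = 1_188          # Team plan at $99/month

current_cost = labour_per_year + vendor_per_year + liability_ev_per_year
roi_multiple = current_cost / tool_per_year

# Payback check: does one month of displaced labour cover one month of subscription?
monthly_labour = labour_per_year / 12
monthly_tool = tool_per_year / 12
pays_back_in_first_month = monthly_labour > monthly_tool

print(f"Current annual cost: ${current_cost:,}")
print(f"ROI multiple: {roi_multiple:.0f}x")
print(f"Pays back within month 1: {pays_back_in_first_month}")
```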

What if Finance wants to do a build-versus-buy analysis? Should we consider building a caption tool internally?

Build-versus-buy on captioning software almost never makes sense at the mid-market scale. The reasons are structural: (1) the accuracy improvements that differentiate good captioning software from baseline Whisper are in the glossary-biased decoding layer, which requires significant ML engineering to build and maintain, so this is not a product-team weekend project; (2) the edit UI, LMS export integrations, and WCAG compliance verification require ongoing maintenance against a moving target of LMS platform updates; (3) the per-video cost of API-based Whisper captioning at market rates (OpenAI Whisper at $0.006/minute for audio) is $0.36 for a 60-minute video, but the engineering cost of building a production-grade wrapper around it for a 30-video-per-month use case is 200–400 engineering hours plus ongoing maintenance. At a $150/hour loaded rate, that is $30,000–$60,000 in build cost for a tool that does not exist yet, maintained indefinitely by engineers who could be building product. The flat-monthly model is not competing with "free". It is competing with a build cost that, in its first quarter alone, significantly exceeds the tool's annual subscription cost.
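If finance wants to see the build-versus-buy arithmetic laid out, a rough Python sketch using only the figures in this answer looks like the following. The comparison against the Team plan at $99/month is our choice of baseline, and ongoing maintenance is left out because the post gives no number for it.

```python
# Figures quoted above; all in USD.
WHISPER_RATE_PER_MIN = 0.006      # API transcription price per audio minute
ENGINEER_LOADED_RATE = 150        # loaded hourly rate
BUILD_HOURS_LOW, BUILD_HOURS_HIGH = 200, 400
TEAM_PLAN_PER_YEAR = 99 * 12      # flat-monthly subscription, annualised

api_cost_60_min_video = round(WHISPER_RATE_PER_MIN * 60, 2)
build_cost_low = BUILD_HOURS_LOW * ENGINEER_LOADED_RATE
build_cost_high = BUILD_HOURS_HIGH * ENGINEER_LOADED_RATE

# How many years of subscription the up-front build cost alone would pay for.
# Maintenance is excluded, so this understates the true cost of building.
years_covered_low = build_cost_low / TEAM_PLAN_PER_YEAR
years_covered_high = build_cost_high / TEAM_PLAN_PER_YEAR

print(f"API cost for a 60-minute video: ${api_cost_60_min_video:.2f}")
print(f"Build cost: ${build_cost_low:,} to ${build_cost_high:,}")
print(f"Equivalent subscription years: {years_covered_low:.0f} to {years_covered_high:.0f}")
```

The API line is the part finance tends to anchor on. Putting it next to the engineering line shows why the cheap per-minute rate is not the relevant cost.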

How should I handle it if the finance partner wants to see this validated by Legal before approving?

That is a reasonable process for any compliance-related investment and the right response is to route it through Legal, not to try to skip Legal to speed up the budget approval. What Legal will likely confirm: the WCAG 2.1 AA standard is the operative technical standard for ADA Title II, Title III, and Section 508 captioning requirements; YouTube auto-captions are not WCAG-compliant at the required accuracy threshold; and the organisation's current approach of absorbing correction internally creates a back-catalogue remediation obligation under any enforcement action. Legal will likely not confirm specific probability estimates — those are judgement calls, not legal opinions. The request for Legal review should be framed as "we want your input on whether our current captioning approach adequately addresses our compliance exposure" rather than "we want Legal to validate these ROI numbers." The former is a question Legal can answer; the latter is not their function.

Is there a risk that the software does not perform as advertised and the labour savings do not materialise?

Yes, and the 90-day pilot structure with explicit success criteria (correction time reduced ≥60%, glossary accuracy ≥99% on a 20-term test set) is specifically designed to address this risk by validating the performance claim before the subscription becomes an ongoing line item. The specific risk that the glossary layer does not produce the full accuracy gains is manageable: if the correction time with GlossCap turns out to be 2× runtime instead of the claimed 0.5×, the labour savings are roughly halved but the software cost is unchanged, and the payback period extends from under 30 days to approximately 45 days. The scenario where the software produces no improvement at all (correction time still at 4× even with glossary correction) is not consistent with any customer data we have; it would be visible in the pilot and would be a valid reason not to continue the subscription. The pilot structure protects against this outcome.

Further reading
