Budget Planning · Published 2026-06-14

Three-year caption programme budget model: per-minute vs subscription cost curves, internal labour allocation, the seven cost levers L&D directors control, and how to build the year-1 budget request

There are two distinct financial conversations in captioning programme management, and most L&D directors conflate them to the detriment of both. The first is the ROI business case — the conversation with a Finance partner or VP that justifies why the organisation should invest in caption compliance at all. That conversation, covered in the caption ROI framing guide, requires three components: liability exposure from ADA non-compliance, the absorbed correction labour already embedded in your team's calendar, and vendor pricing that shows flat-monthly beats per-minute at scale. That conversation succeeds or fails on whether you can put a dollar sign on the downside of inaction. The second conversation is the budget planning conversation — the one you have with yourself, your coordinator, and your VP of L&D after the Finance pitch has succeeded. That conversation requires a completely different model: not "why should we spend money" but "exactly how much do we need to put in the budget, what does it cover, and how does it change over three years as video volume grows, glossary accuracy compounds, and the programme matures?" This post is about the second conversation.

The reason these conversations require separate models is that the ROI business case is a one-time justification exercise built to answer a binary question (approve or reject), while the operational budget model is a recurring planning tool built to answer a continuous question (how much, and for what, over the next three years). The ROI model is deliberately simplified — it shows the median scenario, the best-case payback, and the three-line income statement that Finance can evaluate in thirty seconds. The operational budget model is deliberately detailed — it separates vendor costs from internal labour costs from LMS infrastructure costs, models volume growth explicitly, accounts for glossary accuracy compounding effects on correction labour, and produces line-item projections that L&D directors can use to submit budget requests, manage programme spend, and report to leadership on programme economics. The ROI model wins budget approval. The operational budget model is what you use the budget approval for.

Three characteristics of caption programme economics make the multi-year model more important than a simple year-1 cost estimate. First, volume growth: most L&D programmes publishing training video for the first time underestimate how quickly video production scales once the captioning workflow is established. A team producing 60 hours of new content in year one typically produces 90-120 hours by year three as tooling adoption spreads, more business units begin using the platform, and the compliance requirement drives previously informal video production into the LMS workflow. The external vendor costs that look reasonable at year-one volume may look alarming at year-three volume under a per-minute pricing model. Second, glossary accuracy compounding: the caption feedback loop shows that correction labour per video decreases materially over the first 18-24 months of programme operation as the glossary grows and the vendor model adapts to organisational vocabulary. An L&D director who models correction labour as a flat per-video cost across all three years significantly overestimates year-two and year-three labour cost. Third, the one-time vs recurring structure: LMS integration setup costs, coordinator onboarding, and initial glossary construction are year-one costs that do not recur; modelling them as annualised costs produces a budget that overcounts in years two and three.

This post covers the complete three-year budget model for a caption compliance programme: the three budget buckets that every complete budget request must include (external vendor, internal labour, and LMS infrastructure), detailed cost modelling for each bucket with volume-dependent formulas, the per-minute vs subscription cost curve comparison across three years of volume growth, the seven cost levers that L&D directors can adjust when the first budget draft comes back over the target, year-one through year-three budget templates in three org-size tiers, the budget-by-org-size reference guide, eight common failure modes in caption budget planning, and a seven-question FAQ on the decisions that matter most. The hidden half-FTE analysis covers the correction labour cost in depth; the accessibility coordinator playbook covers the coordinator role and scope. This post assembles all three budget buckets into a single operational model.

TL;DR — the six numbers that define your three-year caption budget

  1. External vendor cost follows volume, not time. At 60 hours of new content per year, per-minute pricing (Rev human review at $1.50/min) costs $5,400/year; at 150 hours per year (a high-growth year-three volume), the same vendor costs $13,500/year. A flat-monthly subscription at $99/month costs $1,188/year regardless of volume growth within the plan's 50 hrs/month ceiling, so the crossover point matters little to the multi-year trajectory: subscription wins at any volume above approximately 13 hours per year.
  2. Internal labour is the largest line item in the first three years. A mid-market team (90 videos/month at a 10-minute average, or 15 hours of content per month) spending 40 minutes correcting each caption file at $60/hour loaded cost spends $43,200/year on correction labour in year one before glossary compounding reduces that rate. This number appears in no budget request and on no invoice. It is the absorbed cost that justifies programme investment and that glossary-based captioning reduces materially over 18-24 months.
  3. Glossary accuracy compounding reduces correction labour by 30-40% by year three. A programme with a well-maintained organisational glossary sees correction time per video fall from 40 minutes at programme launch to 24-28 minutes by month 18, as the vendor model learns organisational vocabulary through compounding feedback. Budget year-two and year-three correction labour at the compounded rate, not the year-one rate.
  4. LMS infrastructure costs are real, recurring, and universally omitted from first-draft budget requests. Native LMS integration setup runs $1,200-$3,600 (one-time); ongoing IT maintenance averages $1,800-$3,600/year. Platforms without native integration (Cornerstone OnDemand manual upload, some Docebo configurations) add 15-30 minutes of coordinator time per video for file handling — which accrues to the internal labour bucket, not the IT bucket.
  5. The seven cost levers provide 30-60% budget flexibility without reducing compliance outcomes. Volume management, accuracy tier selection, QA sampling rate, glossary investment, LMS configuration, contract structure, and in-house vs full-service split can be adjusted independently. An L&D director facing a budget cut can reduce QA sampling rate on low-risk content and increase glossary investment to maintain accuracy without increasing correction labour.
  6. Year-three total programme cost at mid-market scale is 1.4-1.8× year-one cost in a well-managed programme. In a poorly-planned programme where volume growth is unmanaged and correction labour is modelled as flat, year-three can be 2.5-3.2× year-one cost. The budget model exists to stay on the 1.4-1.8× curve.

Why the operational budget model is different from the ROI business case

The caption ROI framing guide is built around three numbers that a VP of Finance can evaluate without domain knowledge: the liability exposure from non-compliance (a risk metric), the absorbed correction labour already in the budget (an operating cost that exists whether or not caption software is purchased), and the vendor cost differential between per-minute human review and flat-monthly software (a capital allocation decision). Those three numbers, assembled into a seven-row income statement that shows 25-60× ROI and a sub-30-day payback, answer the Finance binary: approve or reject. That is what the ROI business case does.

The operational budget model serves a completely different purpose. It answers the questions that come after budget approval: How much should I request in the next fiscal year's operating budget? How do I allocate that budget across vendor contracts, internal headcount, and IT infrastructure? How will the costs change in year two when video volume grows by 30% and the glossary has 400 more terms? What happens to the budget if we switch LMS platforms mid-year? How do I explain a 20% year-over-year cost increase to my VP when the programme is working correctly? The operational model produces answers to those questions. The ROI model cannot.

Three structural differences between the models

The ROI model uses a median scenario with one volume assumption, one pricing tier, and one labour rate. Precision matters less than persuasiveness — Finance partners respond to the shape of the argument more than the third decimal place of the correction rate. The operational model requires scenario-specific numbers: your actual video publication rate, your actual average video length, your actual loaded compensation cost for the staff who do caption correction, your actual LMS platform's integration overhead. Generic numbers in the operational model produce a budget that misses by 30-40% and generates credibility problems when the actual invoices arrive.

The ROI model collapses multi-year costs into annualised averages. The operational model separates year-one, year-two, and year-three costs explicitly because the mix between buckets shifts materially. Year-one has one-time setup costs (LMS integration, glossary construction, vendor onboarding) that inflate the apparent per-video cost. Year-two sees the glossary compounding effect begin reducing correction labour. Year-three sees volume growth driving external costs up while compounded accuracy holds internal labour growth below volume growth. Annualising these produces a budget that is wrong in all three years: too low in year one, too high in year two, and inadequately planned for year three.

The ROI model assumes the programme is mature and running correctly. The operational model must account for the ramp period — the first 60-90 days when the glossary is thin, correction labour is at its peak, the LMS integration is being debugged, and the QA process is being established. The compliance programme build guide covers the 90-day ramp timeline in detail. Budgeting for a mature programme from day one means running out of budget in month two and having to go back for supplemental approval, which is a worse outcome than building the ramp cost into the year-one request.

Who reads each model

The ROI model has one reader: the Finance partner who holds the budget approval. The operational model has three readers. The first is the L&D director, who uses it to submit the budget request and to manage programme spend month by month. The second is the programme coordinator or accessibility coordinator, who uses it to understand what they are allocated for QA labour, glossary maintenance, and vendor management time. The third is the L&D VP, who uses it to understand why caption programme costs change year over year and to evaluate the programme director's budget management. Each reader needs different detail, but all three need the three-bucket structure with real numbers. A one-line budget item ("captioning: $45,000") satisfies none of them.

The three budget buckets

Every complete caption programme budget contains three distinct cost categories that behave differently over time, respond differently to the seven cost levers, and require different approval paths in most organisations. Conflating them into a single line item produces budgets that are simultaneously approved and unmanageable — approved because the total looks reasonable, unmanageable because nobody knows whether an overrun came from vendor invoices, staff time, or IT integration scope creep.

Bucket 1: External vendor and software costs

External vendor costs are the only costs that appear on an invoice. They include the caption software subscription or per-minute processing fees, any additional seats or volume overages, and API access costs for LMS integration at some pricing tiers. External costs are the most visible budget line and the most frequently used as a proxy for total programme cost — which systematically understates actual programme cost by 40-60% because it excludes Buckets 2 and 3.

Bucket 2: Internal labour costs

Internal labour costs are the most consequential budget line and the one that appears on no invoice. They include caption correction labour (the time instructional designers, L&D coordinators, or dedicated caption reviewers spend correcting machine-generated output), QA spot-check labour (running DCMP-protocol accuracy checks on published caption files), glossary maintenance labour (reviewing correction logs, adding new terms, updating changed terms, and managing term conflicts), and programme coordination overhead (vendor management, LMS integration troubleshooting, accessibility reporting). The hidden half-FTE analysis documents how this labour accumulates invisibly across team calendars and why it is the number that makes the strongest ROI argument when surfaced explicitly.

Bucket 3: LMS infrastructure and IT costs

LMS infrastructure costs include the one-time cost of setting up the API integration between the caption vendor and the LMS, the recurring IT maintenance cost of managing that integration across LMS updates and caption format changes, and any platform-specific costs for caption storage, custom player configuration, or workflow automation. Infrastructure costs are typically the smallest bucket by dollar amount but are universally omitted from first-draft budget requests, which creates budget overruns in the first year and awkward conversations about IT capacity that could have been planned.

Why all three buckets must appear in the budget request

The three-bucket structure matters for a reason beyond accounting completeness. Each bucket has a different approval path and a different cost lever. External vendor costs typically require procurement approval and appear as an operating expense or software subscription. Internal labour costs require headcount planning or allocation from existing FTE and appear as a percentage of existing staff time. Infrastructure costs require IT project prioritisation and may appear as either capex (integration build) or opex (ongoing maintenance). A budget request that presents all three costs as a single number cannot be routed through the correct approval channels — which means some portion of the cost will be discovered mid-year when it arrives without a home, producing either unauthorised work, budget overrun, or programme stall.

Bucket 1: External vendor and software costs

External vendor costs in captioning have two fundamentally different pricing models, and the choice between them determines whether your three-year cost curve grows linearly with volume or remains roughly flat. Understanding both models and when each makes sense is the first analytical step in Bucket 1 planning.

Per-minute pricing model

Per-minute pricing charges per minute of audio processed, regardless of output format, delivery method, or number of videos. The pricing varies significantly by accuracy tier:

Vendor | Tier | Price/minute | Turnaround | Notes
Rev | Human review | $1.50/min | 12-24 hours | Human transcriptionists; most accurate for general content
Rev | AI captions | $0.25/min | ~5 minutes | Machine-only; no glossary customisation
3Play Media | Standard | $2.00–$2.50/min | 24-48 hours | Higher accuracy tier with interactive transcript
Verbit | Enterprise | $1.60–$2.20/min | 4-24 hours | AI + human hybrid; enterprise SLA
Verbit | AI-first | $0.30–$0.50/min | Near-real-time | Machine-only; glossary support varies by plan

The per-minute model has predictable per-video cost and zero subscription commitment. It scales well for irregular video production: a team that publishes 20 videos one month and 5 the next pays proportionally. The problem is the linear growth curve: every 10% increase in video volume produces exactly a 10% increase in external vendor cost. Over a three-year period in which a programme grows from 60 to 150 hours of annual content, per-minute cost at Rev human review grows from $5,400 to $13,500 per year, an $8,100/year increase that was nowhere in the year-one budget discussion. See the Rev vs GlossCap comparison for a detailed per-minute vs subscription analysis.

Subscription pricing model

Subscription pricing charges a flat monthly fee for a defined capacity tier, with automatic glossary integration, API access for LMS delivery, and machine-learning compounding included in the base price. The pricing model is fundamentally different: you pay for programme capability, not processing volume.

Plan | Monthly | Annual (×12) | Volume threshold | Glossary
GlossCap Starter | $29/month | $348/year | Up to 10 hrs/month | Up to 500 terms
GlossCap Team | $99/month | $1,188/year | Up to 50 hrs/month | Up to 5,000 terms
GlossCap Enterprise | Custom | Custom | Unlimited | Unlimited + dedicated glossary support

The subscription model has a step-function cost curve rather than a linear one: cost is flat within a tier and jumps at tier thresholds. A team that grows from just above the Starter ceiling of 10 hours per month to 40 hours per month pays the same $99/month across that entire range, then moves to Enterprise pricing when volume exceeds 50 hours per month. This structure heavily favours high-volume programmes and penalises very-low-volume programmes. Any meaningful comparison between per-minute and subscription pricing should therefore account for the full three-year volume curve, not just the year-one starting point.
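The step-function shape is easy to see in code. The sketch below picks the cheapest listed plan for a given monthly volume. Tier prices and hour ceilings are copied from the GlossCap plan table above, the ceilings are assumed to be inclusive, the function name is hypothetical, and Enterprise is returned without a price because the post lists it only as custom.

```python
from typing import Optional, Tuple

# (plan name, monthly price in USD, maximum content hours per month)
# Values copied from the plan table; ceilings treated as inclusive (an assumption).
TIERS = [
    ("Starter", 29, 10),
    ("Team", 99, 50),
]


def subscription_monthly_cost(hours_per_month: float) -> Tuple[str, Optional[int]]:
    """Return the cheapest listed plan whose volume ceiling covers the given hours."""
    for name, price, ceiling in TIERS:
        if hours_per_month <= ceiling:
            return name, price
    # Above 50 hrs/month the table lists only custom Enterprise pricing.
    return "Enterprise", None


if __name__ == "__main__":
    for hours in (5, 10, 15, 40, 50, 78):
        plan, price = subscription_monthly_cost(hours)
        label = f"${price}/month" if price is not None else "custom pricing"
        print(f"{hours:>3} hrs/month -> {plan}: {label}")
```

Every volume from just over 10 to 50 hrs/month maps to the same $99 Team price, which is the flat segment of the curve; the jumps happen only at the 10 and 50 hrs/month boundaries.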

The annual commitment discount

Both pricing models offer annual commitment discounts that affect budget planning. Per-minute vendors typically offer volume commitments with pre-paid blocks at a 10-20% discount — a team that commits to 500 hours per year upfront pays less per minute than one that submits videos on demand. Subscription vendors typically offer 15-25% off the monthly rate for annual prepayment. The catch is that annual commitments introduce volume risk: a team that commits to 500 hours per year and publishes 350 hours has overpaid. Budget requests should model the commitment level conservatively (80-85% of expected volume for pre-paid commitments) and treat the commitment discount as a contingent saving rather than a budget line item.

The vendor contract review checklist covers the specific contract terms that govern volume commitments, overage pricing, and commitment wind-down provisions. Those terms belong in the budget model because overage pricing (typically 150-200% of the base rate at per-minute vendors) can create budget surprises in high-production months.
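To see why a conservative commitment level matters, here is a minimal sketch of a pre-paid per-minute commitment with overage. It assumes an annual pool of minutes paid up front at a discount, forfeiture of unused minutes, and overage billed at a multiple of the undiscounted base rate; real contract terms vary. The function names are hypothetical, and the 15% discount and 1.5× overage multiplier are illustrative values inside the ranges quoted above.

```python
def committed_annual_cost(actual_min: float, committed_min: float,
                          base_rate: float, discount: float,
                          overage_multiplier: float) -> float:
    """Annual spend under a pre-paid commitment plus overage (assumed contract shape)."""
    # The committed pool is paid in full whether or not it is used.
    prepaid = committed_min * base_rate * (1 - discount)
    # Minutes beyond the pool are billed at a multiple of the base rate.
    overage_min = max(0.0, actual_min - committed_min)
    return prepaid + overage_min * base_rate * overage_multiplier


def on_demand_annual_cost(actual_min: float, base_rate: float) -> float:
    """Annual spend with no commitment: every minute at the base rate."""
    return actual_min * base_rate


if __name__ == "__main__":
    RATE = 1.50        # Rev human review, $/min
    DISCOUNT = 0.15    # illustrative, within the 10-20% range
    OVERAGE = 1.5      # illustrative, low end of the 150-200% range
    expected_hours = 500

    for commit_share in (1.0, 0.8):   # full commitment vs 80% of expected volume
        committed = expected_hours * commit_share * 60
        for actual_hours in (350, 500):
            actual = actual_hours * 60
            cost = committed_annual_cost(actual, committed, RATE, DISCOUNT, OVERAGE)
            print(f"commit {commit_share:.0%}, publish {actual_hours} h: "
                  f"committed ${cost:,.0f} vs on-demand "
                  f"${on_demand_annual_cost(actual, RATE):,.0f}")
```

With these inputs, committing to the full 500 hours and publishing 350 costs $38,250 against $31,500 on demand, while an 80% commitment costs $30,600 at 350 hours and $44,100 at 500 hours. Raising the overage multiplier to 2.0 pushes the 500-hour case to $48,600, above the $45,000 on-demand figure, which is one reason to treat the discount as a contingent saving rather than a budget line.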

Backlog remediation as a separate vendor cost

Most caption programme budgets focus on forward content — the new videos published after programme launch. Backlog remediation — captioning the existing uncaptioned library — is a separate external vendor cost that belongs on its own budget line. Backlog costs are typically front-loaded in year one and may dwarf the ongoing forward-content cost. A library of 800 hours of uncaptioned content at Rev human review pricing costs $72,000 to remediate — a cost that should be in the year-one budget line as a one-time item, not averaged into the per-video ongoing cost.

The LMS caption audit methodology covers how to scope the backlog before committing to a remediation budget. Most L&D directors significantly over- or underestimate backlog size before conducting the audit. A $72,000 backlog number that arrives as a surprise mid-year is a programme-credibility problem; a $72,000 backlog number that was in the year-one budget request is evidence of planning competence.

Per-minute vs subscription cost curves over three years

The cost curve comparison between per-minute and subscription pricing changes direction as volume grows. At low volume, per-minute pricing can be cheaper. At moderate and high volume, subscription is almost always cheaper, often by a factor of 4 or more; at the mid-market volume modelled below (15 hrs/month), Rev human review costs more than 13 times the Team subscription. The crossover point and the trajectory beyond it are what the three-year budget model must capture.

The crossover analysis

At Rev human review pricing of $1.50/minute and GlossCap Team at $99/month, the cost crossover occurs at 66 minutes of processed audio per month. Below 66 minutes/month (approximately 6-7 average-length training videos), per-minute pricing is cheaper. Above 66 minutes/month, subscription is cheaper. Most L&D teams producing formal training content for compliance purposes are well above 66 minutes/month from the first quarter of programme operation.

The crossover matters less than the divergence rate beyond it. At 10 hours/month (600 minutes), Rev human review costs $900/month vs GlossCap Team at $99/month — a $801/month or $9,612/year differential. At 30 hours/month (1,800 minutes), Rev human review costs $2,700/month vs $99/month — a $2,601/month or $31,212/year differential. The differential grows linearly with volume. In a three-year scenario where volume grows from 10 to 30 hours/month, the cumulative per-minute premium over subscription exceeds $60,000.
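The crossover and divergence figures above follow from two prices. This sketch recomputes them using the post's Rev human review rate and GlossCap Team price; the function names are hypothetical, and the 10, 20, then 30 hrs/month path is an assumed linear growth path used only to check the three-year premium.

```python
PER_MINUTE_RATE = 1.50   # Rev human review, $/min
TEAM_MONTHLY = 99.0      # GlossCap Team, $/month (flat up to 50 hrs/month)


def crossover_minutes(monthly_fee: float, per_minute_rate: float) -> float:
    """Monthly audio minutes at which per-minute spend equals the flat fee."""
    return monthly_fee / per_minute_rate


def annual_premium(hours_per_month: float) -> float:
    """Extra annual spend on per-minute pricing versus the flat subscription."""
    per_minute_monthly = hours_per_month * 60 * PER_MINUTE_RATE
    return (per_minute_monthly - TEAM_MONTHLY) * 12


if __name__ == "__main__":
    print(f"crossover: {crossover_minutes(TEAM_MONTHLY, PER_MINUTE_RATE):.0f} min/month")
    for hours in (10, 30):
        print(f"{hours} hrs/month: ${annual_premium(hours):,.0f}/year premium")
    # Assumed linear growth path: 10, 20, then 30 hrs/month.
    three_year = sum(annual_premium(h) for h in (10, 20, 30))
    print(f"3-year premium on the linear path: ${three_year:,.0f}")
```

It gives a 66 min/month crossover, annual premiums of $9,612 and $31,212, and a three-year premium of $61,236 on the linear path. A path with slower growth in year two can come in below $60,000, so that threshold depends on the shape of the growth curve, not just its endpoints.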

Three-year cost curve by scenario

The following table models external vendor cost across three years for three representative L&D programme profiles. Volume growth is modelled at 25% per year, which reflects typical adoption curves when a new captioning programme drives previously informal video production into the LMS workflow.

Scenario | Year-1 volume | Year-1 Rev human-review cost | Year-1 subscription cost | Year-3 volume | Year-3 Rev human-review cost | Year-3 subscription cost
Small (Starter-tier) | 5 hrs/month | $450/month ($5,400/yr) | $29/month ($348/yr) | 7.8 hrs/month | $702/month ($8,424/yr) | $29–99/month ($348–$1,188/yr)
Mid-market (Team-tier) | 15 hrs/month | $1,350/month ($16,200/yr) | $99/month ($1,188/yr) | 23.4 hrs/month | $2,106/month ($25,272/yr) | $99/month ($1,188/yr)
Enterprise | 50 hrs/month | $4,500/month ($54,000/yr) | Custom (~$400–$800/month) | 78 hrs/month | $7,020/month ($84,240/yr) | Custom (~$400–$800/month)

The three-year cumulative differential at mid-market scale: per-minute at Rev human review totals approximately $61,800 over three years as volume grows from 15 to 23.4 hrs/month. Subscription at Team tier totals $3,564 over the same three years. The cumulative differential is approximately $58,200. The comparison is not close at any volume above the crossover point, and the differential compounds as volume grows.
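The scenario rows can be regenerated from three inputs. This sketch assumes exact 25% annual compounding (the table rounds year-three volume to one decimal place), Rev human review at $1.50/min, and GlossCap Team at $99/month for the mid-market case; the function names are hypothetical.

```python
from typing import List

RATE_PER_MIN = 1.50        # Rev human review, $/min
TEAM_ANNUAL = 99 * 12      # GlossCap Team, $1,188/year
GROWTH = 0.25              # assumed annual volume growth


def yearly_volumes(start_hours_per_month: float, years: int = 3) -> List[float]:
    """Monthly content hours for each programme year, compounding at GROWTH."""
    return [start_hours_per_month * (1 + GROWTH) ** y for y in range(years)]


def per_minute_annual(hours_per_month: float) -> float:
    """Annual per-minute vendor spend for a given monthly volume."""
    return hours_per_month * 60 * RATE_PER_MIN * 12


if __name__ == "__main__":
    volumes = yearly_volumes(15)                  # mid-market scenario
    per_minute = [per_minute_annual(v) for v in volumes]
    subscription_total = TEAM_ANNUAL * len(volumes)
    print("hrs/month by year:", [round(v, 2) for v in volumes])
    print("per-minute $/year:", [round(c) for c in per_minute])
    print(f"3-year per-minute ${sum(per_minute):,.0f} "
          f"vs subscription ${subscription_total:,.0f}")
    print(f"differential ${sum(per_minute) - subscription_total:,.0f}")
```

For the 15 hrs/month start it gives $16,200, $20,250, and about $25,313 per year at per-minute rates, roughly $61,760 in total against $3,564 for three years of subscription.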

The hidden per-minute cost: machine tier vs human review

Many L&D directors evaluate per-minute pricing on the machine tier (Rev AI at $0.25/min) rather than human review, which produces a dramatically different cost comparison. Machine-tier per-minute at $0.25/min and 15 hrs/month costs $225/month — closer to the subscription price. The problem is that machine-tier captions at 83-89% accuracy require the same correction labour as un-captioned video in many content categories, which means the "cheaper" per-minute option shifts cost from the vendor invoice into Bucket 2 (internal labour). The true cost comparison must include both the external vendor cost and the internal correction labour cost that each pricing tier produces. This is the analysis that makes subscription-plus-glossary the most cost-effective model at virtually every volume: the glossary compounding effect reduces correction labour over time in a way that machine-tier-only per-minute pricing cannot replicate.

See the Rev vs GlossCap comparison and the 3Play vs GlossCap comparison for the complete total-cost-of-ownership analysis including correction labour.

Bucket 2: Internal labour allocation

Internal labour is the budget bucket that does not appear on an invoice, does not require a procurement signature, and is almost never tracked as a programme cost. It is also typically the largest budget line in years one through three. The hidden half-FTE analysis quantifies what this labour costs at industry-standard compensation rates; this section shows how to allocate it in a budget model across four distinct labour categories.

Labour category 1: Caption correction

Caption correction is the largest internal labour category. At machine-first captioning with a thin or non-existent organisational glossary, correction labour runs at approximately 4× real-time: a 10-minute training video requires 40 minutes of correction work to reach 99% DCMP accuracy. That rate is the year-one baseline for most new programmes. The inputs to the annual correction labour calculation are content volume (hours), average video length (minutes), and correction rate (minutes of correction per minute of video).

Formula for annual correction labour hours:
Annual hours = (Annual content hours × 60 / avg_video_length_min) × (correction_rate × avg_video_length_min / 60)
The first term is the number of videos per year and the second is correction hours per video. Video length cancels, so the expression reduces to Annual hours = Annual content hours × correction_rate.

At mid-market scale (15 hrs/month, 10-minute average video length, 4× correction rate):
Annual hours = (180 × 60) / 10 × (40 / 60) = 1,080 videos × 0.667 = 720 hours/year

At a loaded hourly rate of $60/hour (reflecting a mid-market instructional designer or L&D coordinator with salary, benefits, and overhead), 720 annual hours of correction labour costs $43,200/year. This is the year-one baseline before glossary compounding reduces the correction rate.

The correction labour cost is sensitive to two variables that the L&D director controls: the accuracy tier of the caption source (machine-only vs glossary-augmented machine vs human review) and the average video length. Because correction time scales with minutes of video, length matters whenever the programme plans by video count rather than content hours: a programme that shifts from a 10-minute to a 15-minute average while publishing the same number of videos produces 50% more content minutes and therefore 50% more correction labour. Video length is rarely considered a cost lever, but it is one, and the budget model should track it.
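The correction formula fits in a small helper. This sketch follows the formula above with correction_rate expressed in minutes of correction per minute of video; the function name is hypothetical and the $60/hour loaded rate is the post's mid-market assumption, not a measured value.

```python
def annual_correction_hours(content_hours_per_year: float,
                            avg_video_length_min: float,
                            correction_rate: float) -> float:
    """Correction hours per year.

    correction_rate is minutes of correction per minute of video (4.0 = 4x real-time).
    """
    videos_per_year = content_hours_per_year * 60 / avg_video_length_min
    minutes_per_video = correction_rate * avg_video_length_min
    return videos_per_year * minutes_per_video / 60


if __name__ == "__main__":
    LOADED_RATE = 60.0  # $/hour, mid-market loaded cost assumption
    hours = annual_correction_hours(180, 10, 4.0)   # 15 hrs/month, 10-min videos
    print(f"{hours:.0f} hours/year -> ${hours * LOADED_RATE:,.0f}/year")
    # Same content hours, 15-minute videos: total correction hours unchanged.
    print(annual_correction_hours(180, 15, 4.0))
    # Same video count (1,080/year) at 15 minutes: 270 content hours, +50% correction.
    print(annual_correction_hours(1080 * 15 / 60, 15, 4.0))
```

The first call returns 720 hours ($43,200). The other two isolate the video-length effect: with content hours fixed, longer videos leave correction hours at 720, while holding the count at 1,080 videos per year and moving to 15-minute videos raises correction to 1,080 hours.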

Labour category 2: QA spot-checks

QA spot-checks are the recurring accuracy measurement process described in the caption QA methodology guide. A DCMP-protocol spot-check on a 10-minute video takes approximately 30-45 minutes to run: 10 minutes to read the reference transcript against the caption file, 20-30 minutes to score substitutions, insertions, deletions, and formatting errors using the DCMP counting protocol. QA labour is calibrated to the sampling rate (the percentage of published videos that receive a spot-check) and the compliance risk profile of the content category.

At a 10% sampling rate (appropriate for most programmes where correction rates are tracked and glossary quality is actively managed), QA labour at mid-market scale (1,080 videos/year) runs approximately 108 spot-checks per year × 40 minutes average = 4,320 minutes = 72 hours. At $60/hour loaded cost, that is $4,320/year: far smaller than correction labour, but worth budgeting explicitly because it requires specific staff time, a specific process, and a specific tool setup that the coordinator must plan for.

At a 20% sampling rate (appropriate for programmes with new vendors, new content categories, or recent accuracy problems), QA labour doubles to 144 hours/year ($8,640) at the same scale. Compliance-critical programmes (healthcare, financial services, public sector with active OCR exposure) should budget for 20% sampling rates; mature programmes with stable vendor accuracy and well-maintained glossaries can reduce to 5% spot-check rates on low-risk content categories.

Labour category 3: Glossary maintenance

Glossary maintenance is the recurring programme investment that produces the accuracy compounding effect. It includes four activities: reviewing correction logs to identify systematic substitution errors that represent missing or incorrect glossary terms, adding new terms when new products, procedures, or regulatory references enter the training catalogue, updating changed terms when SKU names, drug names, or regulatory citations change, and resolving term conflicts when glossary entries produce inconsistent output across content types.

Glossary maintenance labour at programme maturity (after the initial glossary build, covered in the glossary architecture guide) typically runs 4-8 hours/month for mid-market programmes. At $60/hour and 6 hours/month average, that is $4,320/year. The investment is not linear with glossary size: a glossary with 2,000 terms does not require twice the maintenance of a glossary with 1,000 terms, because most maintenance activity clusters on recently added terms and high-frequency errors, which represent a small subset of the total glossary. Budget for maintenance as a flat monthly allocation rather than scaling it with glossary size.

The initial glossary build is a one-time year-one cost: a 500-term organisational glossary requires approximately 20-40 hours to build from existing source documents (LMS metadata, style guides, product catalogues, HR job descriptions, regulatory reference lists). Budget this as a one-time project in the year-one budget, not as part of the ongoing monthly allocation.

Labour category 4: Programme coordination overhead

Programme coordination covers the activities that do not fit neatly into correction, QA, or glossary work: vendor relationship management, LMS integration troubleshooting, internal stakeholder communication, compliance reporting, and workflow documentation. The accessibility coordinator playbook covers the full scope of the coordinator role; this section focuses on the budget allocation.

For a mid-market programme in steady state, coordination overhead runs 4-8 hours/month. In year one, that doubles to 8-16 hours/month as the coordinator builds the workflow, establishes vendor relationships, and trains content owners on the caption submission process. At $60/hour and 10 hours/month average across the year (blending the higher ramp period with the lower steady-state), coordination overhead costs $7,200/year in year one and $4,320/year in years two and three.

Mid-market labour budget summary (15 hrs/month content volume, 10-min avg video length)
Labour category | Year-1 hours | Year-1 cost | Year-3 hours (with compounding) | Year-3 cost
Caption correction | 720 | $43,200 | 700 (vol +56%, rate −38%) | $42,000
QA spot-checks (10%) | 72 | $4,320 | 113 | $6,780
Glossary maintenance | 122 (incl. initial build) | $7,320 | 72 | $4,320
Programme coordination | 120 | $7,200 | 72 | $4,320
Total internal labour | 1,034 | $62,040 | 957 | $57,420

Two observations from this table. First, year-three internal labour cost is lower than year-one despite a 56% increase in video volume: the glossary compounding effect is doing real work. Second, the internal labour cost dwarfs the external vendor cost in both years ($62,040 vs $1,188 in year one). This is why the ROI argument made in the Finance framing guide focuses on the absorbed labour cost: it is the number that makes the case for glossary-based software investment self-evident.
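The four labour categories can be computed together from the inputs stated in this section. This sketch assumes $60/hour, 10-minute videos, 40 minutes per DCMP spot-check, 10% QA sampling, and glossary and coordination hours passed in directly (year-one glossary hours include the initial build); year three applies the 25%-per-year volume growth and a 38% lower correction rate. The class and field names are hypothetical.

```python
from dataclasses import dataclass

LOADED_RATE = 60.0       # $/hour loaded cost
AVG_VIDEO_MIN = 10.0     # average video length, minutes
SPOT_CHECK_MIN = 40.0    # minutes per DCMP spot-check


@dataclass
class LabourYear:
    content_hours: float        # published content hours in the year
    correction_rate: float      # minutes of correction per minute of video
    qa_sampling: float          # share of videos spot-checked (0.10 = 10%)
    glossary_hours: float       # maintenance, plus the initial build in year one
    coordination_hours: float   # programme coordination overhead

    def hours_by_category(self) -> dict:
        videos = self.content_hours * 60 / AVG_VIDEO_MIN
        return {
            "correction": self.content_hours * self.correction_rate,
            "qa": videos * self.qa_sampling * SPOT_CHECK_MIN / 60,
            "glossary": self.glossary_hours,
            "coordination": self.coordination_hours,
        }

    def total_cost(self) -> float:
        return sum(self.hours_by_category().values()) * LOADED_RATE


if __name__ == "__main__":
    year1 = LabourYear(180, 4.0, 0.10, 122, 120)
    # Year three: volume x1.5625 (25%/yr compounded), correction rate -38%.
    year3 = LabourYear(180 * 1.25 ** 2, 4.0 * 0.62, 0.10, 72, 72)
    for label, year in (("Year 1", year1), ("Year 3", year3)):
        hours = year.hours_by_category()
        print(label, {k: round(v, 1) for k, v in hours.items()},
              f"total ${year.total_cost():,.0f}")
```

With these inputs the model gives 1,034 hours ($62,040) in year one, with QA at 72 hours for 1,080 videos. Year three comes out at about 954 hours ($57,240) because the model keeps unrounded correction (697.5) and QA (112.5) hours.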

The glossary accuracy compounding effect on labour cost

The glossary accuracy compounding effect is the single most important variable in multi-year budget modelling for caption programmes, and it is the one most frequently missing from budget plans. The compounding effect works as follows: as the organisational glossary grows and correction logs are reviewed to identify systematic errors, the vendor model's accuracy on your specific vocabulary improves. The improvement is not instant — it accumulates over 12-24 months of correction feedback. But the budget consequence is real: the correction time per video decreases, and because correction labour is the largest cost in Bucket 2, the decrease in correction rate holds total labour cost roughly flat even as video volume grows.

The correction rate trajectory

The correction rate (minutes of correction per minute of audio) follows a characteristic pattern for programmes that actively maintain their glossary:

| Programme month | Correction rate | Notes |
|---|---|---|
| Month 1-3 (ramp) | 4.0–4.5× real-time | Thin glossary; most technical vocabulary missing; correction covers both word errors and formatting |
| Month 4-6 | 3.2–3.8× real-time | Core vocabulary terms added; high-frequency proper nouns captured; systematic errors declining |
| Month 7-12 | 2.8–3.2× real-time | Second-tier vocabulary added; compounding effect beginning; new content types still at higher rate |
| Month 13-18 | 2.2–2.8× real-time | Mature glossary; systematic errors are rare; correction limited to new terms and edge cases |
| Month 19-24 | 1.8–2.4× real-time | Plateau range; further improvement requires new content categories or manual review investment |
| Month 25+ | 1.5–2.0× real-time | Steady state with active glossary maintenance; new content types reset to Month 1-3 rate |

The trajectory has two inflection points that matter for budget planning. The first occurs between months 6 and 12, when the correction rate drops from the initial 4× range to below 3×. This corresponds to the glossary reaching critical mass — the point at which enough organisational vocabulary terms have been added that the vendor model is making systematic errors less often. If the glossary is not actively maintained through this period (correction logs reviewed monthly, new terms added weekly), the rate does not drop and the correction labour savings do not materialise. The second inflection occurs around month 18, when the rate enters the plateau range. Further improvement from this point requires either a new content category (which resets to the Month 1-3 rate), a glossary architecture investment (adding pronunciation guides, context variants, term hierarchies), or a new vendor evaluation.
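
As a rough cross-check on the per-year rates used in the budget model, the sketch below takes the midpoint of each range in the trajectory table and averages the twelve monthly rates in each budget year. Using midpoints and assuming equal content volume in every month are both simplifying assumptions, and `TRAJECTORY`, `rate_for_month` and `blended_rate` are hypothetical names, not part of any tool.

```python
# Correction-rate ranges from the trajectory table (minutes of correction per
# minute of audio). Taking the midpoint of each range is an assumption.
TRAJECTORY = [
    (1, 3, (4.0 + 4.5) / 2),    # ramp: thin glossary
    (4, 6, (3.2 + 3.8) / 2),    # core vocabulary added
    (7, 12, (2.8 + 3.2) / 2),   # compounding begins
    (13, 18, (2.2 + 2.8) / 2),  # mature glossary
    (19, 24, (1.8 + 2.4) / 2),  # plateau range
    (25, 36, (1.5 + 2.0) / 2),  # steady state (modelled to month 36 only)
]


def rate_for_month(month: int) -> float:
    """Midpoint correction rate for a 1-based programme month."""
    for first, last, rate in TRAJECTORY:
        if first <= month <= last:
            return rate
    raise ValueError(f"month {month} is outside the modelled range")


def blended_rate(year: int) -> float:
    """Simple average of the 12 monthly rates in a budget year.

    Assumes equal content volume in every month of the year.
    """
    months = range(12 * (year - 1) + 1, 12 * year + 1)
    return sum(rate_for_month(m) for m in months) / 12


for year in (1, 2, 3):
    print(f"Year {year}: {blended_rate(year):.2f}x")
```

Under those assumptions the blends come out at about 3.44×, 2.30× and 1.75× for years one to three, so budgeting at 3.5×, 2.4× and 2.0× leaves a modest margin above the midpoint trajectory.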

Modelling the compounding effect in the budget

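The budget model should not use a flat correction rate across all three years. The correct approach is to model the rate trajectory explicitly and apply it to the volume projections for each period. A simplified version that captures most of the effect applies one blended rate per budget year, read off the trajectory table: 3.5× for year one (averaging the ramp months), 2.4× for year two, and 2.0× for year three.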

Applying this to the mid-market scenario (volume growing from 15 hrs/month in year one to 23.4 hrs/month in year three):

| Year | Monthly content (hrs) | Annual videos (10 min avg) | Correction rate | Annual correction hours | Correction cost ($60/hr) |
|---|---|---|---|---|---|
| Year 1 | 15 | 1,080 | 3.5× | 630 | $37,800 |
| Year 2 | 18.75 | 1,350 | 2.4× | 540 | $32,400 |
| Year 3 | 23.4 | 1,685 | 2.0× | 562 | $33,720 |

Correction cost falls from year one to year two (despite 25% volume growth) and remains roughly flat in year three (despite an additional 25% volume growth). The glossary compounding effect has absorbed approximately 56% volume growth between year one and year three while holding year-three correction cost about 11% below the year-one level. Without the compounding effect (i.e., at a flat 3.5× rate), year-three correction cost would be about $58,980 (1,685 videos × 35 minutes ≈ 983 hours at $60/hour), 75% higher than the actual year-three budget of $33,720. The $25,260/year difference is the economic value of active glossary maintenance.

This is the number that justifies the 72 hours/year of glossary maintenance labour in Bucket 2: $4,320 in maintenance labour produces $25,260 in annual correction savings, a return of roughly 5.8× on the maintenance investment, every year at steady state. The feedback loop guide covers the mechanics of achieving and sustaining this compounding rate.
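
The year-three comparison can be reproduced in a few lines. This is a minimal sketch assuming the table's conventions (10-minute videos, a $60/hour loaded rate, correction hours rounded to whole hours); the function names are illustrative.

```python
AVG_VIDEO_MIN = 10   # average video length, minutes
LOADED_RATE = 60     # internal loaded labour cost, $/hour


def annual_videos(monthly_content_hours: float) -> int:
    """Videos per year at the average length, rounded to whole videos."""
    return round(monthly_content_hours * 60 / AVG_VIDEO_MIN * 12)


def correction_cost(videos: int, correction_rate: float) -> int:
    """Correction minutes = audio minutes x rate, rounded to whole hours as in the table."""
    hours = round(videos * AVG_VIDEO_MIN * correction_rate / 60)
    return hours * LOADED_RATE


# Year-three inputs from the table: 23.4 hrs/month at a 2.0x blended rate.
videos_y3 = annual_videos(23.4)
actual = correction_cost(videos_y3, 2.0)
flat = correction_cost(videos_y3, 3.5)   # counterfactual: no compounding
savings = flat - actual
maintenance = 72 * LOADED_RATE            # 72 hrs/year of glossary maintenance

print(f"videos={videos_y3} actual=${actual:,} flat=${flat:,} savings=${savings:,}")
print(f"return on maintenance: {savings / maintenance:.2f}x")
```

It gives 1,685 videos, $33,720 of correction at 2.0× and $58,980 at a flat 3.5×: a $25,260 difference, or about 5.85× the $4,320 maintenance spend.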

Bucket 3: LMS infrastructure and IT integration costs

LMS infrastructure costs are the budget bucket that consistently arrives as a surprise. Not because they are large — they are typically the smallest of the three buckets — but because they require IT project prioritisation that most L&D directors do not plan for when submitting the initial caption programme budget. A captioning tool that is approved in the Finance meeting but cannot be integrated into the LMS because IT is fully allocated for the next two quarters is not a caption programme; it is a subscription that is not being used.

Integration setup cost (one-time)

The integration setup cost depends on whether your LMS has a native integration with the caption vendor and how complex your LMS environment is. Six common configurations:

| LMS | Integration type | Setup complexity | IT cost estimate | Notes |
|---|---|---|---|---|
| Kaltura | Native API | Low (configuration) | $600–$1,200 (4-8 hrs IT) | Caption ordering panel built into Kaltura MediaSpace; minimal custom development |
| Panopto | Native + custom provider API | Medium (provider config) | $1,200–$2,400 (8-16 hrs IT) | Custom caption provider requires API credentials setup and workflow testing |
| Cornerstone OnDemand | Manual SRT upload | None (no API) | $0 IT / workflow overhead | No native caption ordering; SRT files uploaded manually per video; coordinator time per upload |
| Docebo | API or manual | Low to medium | $600–$2,400 (4-16 hrs IT) | API available; complexity depends on Docebo plan tier and SSO configuration |
| TalentLMS | Manual or API | Low | $600–$1,200 (4-8 hrs IT) | Direct upload or webhook integration; straightforward configuration |
| Workday Learning | API (complex) | High | $3,600–$7,200 (24-48 hrs IT) | Workday API integration requires a Workday-certified developer; integration complexity is the highest in the LMS market |

The integration setup cost should appear in the year-one budget as a one-time project cost, clearly separated from the ongoing vendor subscription cost. Mixing it into the per-video cost or annualising it inflates the apparent ongoing cost and creates confusion when the integration is complete and the setup cost disappears from year-two invoices.

Ongoing IT maintenance cost (recurring)

LMS platforms update their APIs, change authentication requirements, and modify caption format specifications on irregular schedules. Each change has the potential to break the caption workflow, requiring IT intervention to update the integration configuration, test the fix, and redeploy. Most caption programme coordinators learn about these breakages when content owners report that newly uploaded videos have no captions, which means every video published between the API change and the complaint has gone out uncaptioned.

Budgeting 2-4 hours/month of IT maintenance for the caption integration covers routine API updates, format-compatibility testing on LMS version upgrades, and ad-hoc troubleshooting. At an IT rate of $150/hour (typical for a mid-market IT organisation), 3 hours/month of maintenance costs $5,400/year. This is smaller than the Bucket 2 labour costs but nonzero, and it belongs in the IT budget rather than the L&D budget in most organisations — which means the L&D director needs to make the request to IT at programme launch and ensure it is in the IT project plan, not just the L&D budget.
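
Keeping the one-time and recurring IT costs apart is easy to enforce in a model. A minimal sketch, assuming the Panopto upper estimate (16 hours of setup) and 3 hours/month of maintenance at the $150/hour rate above; `ItBudget` is a hypothetical helper, not part of any tool, and the rate is a parameter because IT rates vary by organisation.

```python
from dataclasses import dataclass


@dataclass
class ItBudget:
    """Bucket 3 costs for one caption integration (hypothetical helper)."""
    setup_hours: float                  # one-time integration effort
    maintenance_hours_per_month: float  # recurring upkeep (API changes, upgrades)
    it_rate: float                      # $/hour

    def year_cost(self, year: int) -> float:
        """Setup is charged in year one only; maintenance recurs every year."""
        setup = self.setup_hours * self.it_rate if year == 1 else 0.0
        maintenance = self.maintenance_hours_per_month * 12 * self.it_rate
        return setup + maintenance


panopto = ItBudget(setup_hours=16, maintenance_hours_per_month=3, it_rate=150)
for year in (1, 2, 3):
    print(f"Year {year}: ${panopto.year_cost(year):,.0f}")
```

Year one comes to $7,800 ($2,400 setup plus $5,400 maintenance) and later years to $5,400, so the setup cost drops out of the year-two figure automatically instead of being annualised.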

The Cornerstone exception: manual workflow overhead

Cornerstone OnDemand is the one major LMS platform that does not have a native API for third-party caption ordering. The Cornerstone caption workflow requires the coordinator to download the caption file from the captioning vendor, upload it to the Cornerstone video record, and verify the caption track is attached and playing correctly — a manual process that takes 15-25 minutes per video. This manual overhead belongs in Bucket 2 (internal labour) rather than Bucket 3 (IT), but it is a direct consequence of the LMS platform choice and should be surfaced in the budget discussion. At 15-25 minutes per video and 1,080 videos per year (mid-market scale), Cornerstone manual upload overhead runs 270-450 hours per year — $16,200–$27,000 at $60/hour loaded cost. That is a Bucket 2 cost driven by a Bucket 3 architectural decision and should be weighed when evaluating whether to remain on Cornerstone vs migrating to a platform with native integration.

See the LMS migration caption checklist for the full cost model for LMS platform migrations, including caption data portability, glossary migration, and the compliance risk during the migration window.

The seven cost levers L&D directors control

When the first draft of the caption programme budget exceeds the approved allocation, there are seven independent levers that can reduce total cost without reducing compliance outcomes. Each lever operates on a different budget bucket and has a different risk profile. Understanding which levers to pull — and in what combination — is the analytical skill that separates budget-competent programme directors from those who simply submit a reduced budget request without understanding what they are cutting.

Lever 1: Video volume (the primary driver)

Total annual hours of content published is the primary driver of all three budget buckets: more volume means more external vendor processing cost, more internal correction and QA labour, and more LMS integration throughput. The volume lever is the strongest but least palatable: reducing video production to reduce caption budget means reducing the L&D programme output that the caption budget is there to support.

The more useful volume strategy is prioritisation rather than reduction. If the caption programme budget is constrained, the compliant response is to caption all new forward content at the required accuracy threshold and defer the backlog to a separate remediation budget in a future fiscal year — not to caption half the new content at the required threshold and the other half at a lower threshold. ADA compliance applies to published content; a decision to caption new content first and defer backlog second is a legally defensible prioritisation. A decision to caption all content at a lower accuracy threshold is not.

Lever 2: Accuracy tier selection

The choice between machine-first captioning (with glossary support) and human review captioning affects external vendor cost by 4-8× and internal correction labour by 30-50%. Glossary-augmented machine captioning at a mature programme (correction rate 2.0×) produces fewer required corrections than un-glossed machine captioning (correction rate 4.5×) at similar external cost — which means the accuracy tier choice should always be evaluated on total cost of ownership (external + correction labour), not on the vendor invoice alone.

The practical tier decision for most L&D programmes: use glossary-augmented machine captioning (subscription model) as the default tier for content where the organisational glossary provides good vocabulary coverage (technical training, product certification, compliance training with known regulatory vocabulary), and reserve human review for content categories where the glossary has low coverage (new content categories, executive communications with names and references the glossary doesn't contain, content with complex audio quality problems).
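
Evaluating tiers on total cost of ownership means adding correction labour to the vendor invoice. A sketch with illustrative inputs only: mid-market volume (10,800 audio minutes a year), a $99/month subscription, a $1.50/min human review price, a mature-glossary correction rate of 2.0×, and a 40% reduction in correction labour for human review (the midpoint of the 30-50% range above). None of these are vendor quotes, and the function name is hypothetical.

```python
def total_cost_of_ownership(audio_minutes: float, external_cost: float,
                            correction_rate: float, loaded_rate: float = 60) -> float:
    """Vendor invoice plus internal correction labour for one year."""
    correction_hours = audio_minutes * correction_rate / 60
    return external_cost + correction_hours * loaded_rate


AUDIO_MIN = 15 * 60 * 12  # 15 hrs/month of new content = 10,800 audio minutes/year

# Illustrative assumptions, not vendor quotes.
machine_first = total_cost_of_ownership(
    AUDIO_MIN, external_cost=99 * 12, correction_rate=2.0)   # mature glossary
human_review = total_cost_of_ownership(
    AUDIO_MIN, external_cost=AUDIO_MIN * 1.50,               # $1.50/min
    correction_rate=2.0 * (1 - 0.40))                        # 40% less correction
print(f"machine-first ${machine_first:,.0f} vs human review ${human_review:,.0f}")
```

With these assumptions machine-first comes to about $22,788 a year and human review to about $29,160; rerun it with your own measured correction rates before drawing conclusions for a specific content category.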

Lever 3: QA sampling rate

The percentage of published videos that receive DCMP spot-checks directly determines QA labour in Bucket 2. The sampling rate can be calibrated by content risk category: compliance-critical content (OSHA training, HIPAA modules, financial services disclosures) warrants a 20% sampling rate; general management development content with well-established vocabulary can operate at 5%. Tiered sampling rates by content category can reduce total QA labour by 40-60% compared with a flat 20% rate applied to all content, while keeping the higher rate on the content where accuracy failure carries the highest compliance risk.
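
The tiered-sampling saving depends on how much of the library is compliance-critical. A sketch, assuming 20 minutes per spot-check and a hypothetical library of 1,080 videos of which 20% are compliance-critical; `qa_hours` is an illustrative name.

```python
def qa_hours(videos_by_category: dict[str, int], sample_rates: dict[str, float],
             minutes_per_check: float = 20) -> float:
    """Annual spot-check hours given a sampling rate per content category."""
    checks = sum(count * sample_rates[category]
                 for category, count in videos_by_category.items())
    return checks * minutes_per_check / 60


# Illustrative mix (an assumption): 1,080 videos/year, 20% compliance-critical.
mix = {"compliance-critical": 216, "general": 864}
flat = qa_hours(mix, {"compliance-critical": 0.20, "general": 0.20})
tiered = qa_hours(mix, {"compliance-critical": 0.20, "general": 0.05})
print(f"flat: {flat:.1f} h  tiered: {tiered:.1f} h  saving: {1 - tiered / flat:.0%}")
```

In this mix the tiered plan needs 28.8 hours instead of 72, a 60% saving; if half the library were compliance-critical, the saving would fall to 37.5%, which is why the category mix is worth measuring rather than assuming.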

Lever 4: Glossary investment

Glossary investment is the counter-intuitive lever: increasing the glossary maintenance allocation (which costs money now) reduces correction labour (which saves more money later). The return on glossary maintenance is roughly 6× per year at steady state, as calculated in the compounding section above. If the budget needs to be cut in year one, cutting glossary maintenance is the worst lever to pull: it preserves the short-term budget at the cost of eliminating the compounding savings in years two and three.

The lever worth pulling in this direction is the timing of the glossary build. The initial glossary build (20-40 hours) can be front-loaded into the first 30 days of the programme to accelerate the compounding effect. A programme that builds a 500-term glossary in month one reaches the month 4-6 correction rate by month two, capturing approximately 2-3 months of additional compounding savings that a gradual glossary build would have delayed.

Lever 5: LMS delivery configuration

The choice between native API integration and manual upload workflow is a Bucket 3 cost lever with Bucket 2 consequences. Native integration requires upfront IT investment but eliminates the per-video manual upload overhead. Manual upload requires no IT investment but creates per-video coordinator overhead that scales linearly with volume. The crossover point depends on video volume. At low volume (under 100 videos/year), manual upload overhead ($1,000-$2,000/year) is cheaper than native integration, because integration carries both the setup cost ($600-$2,400 one-time) and 2-4 hours/month of ongoing IT maintenance. At mid-market volume (1,000+ videos/year), where manual upload overhead reaches $16,200–$27,000/year, native integration pays back within the first year.
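
The crossover can be made explicit with a break-even calculation. A sketch under stated assumptions: 20 minutes per manual upload (the midpoint of the 15-25 minute Cornerstone estimate), a $60/hour coordinator rate, $1,200 of setup (inside the $600-$2,400 range), and 2 hours/month of IT maintenance at $75/hour. All function names are hypothetical.

```python
def manual_cost_per_video(minutes_per_upload: float = 20,
                          coordinator_rate: float = 60) -> float:
    """Coordinator cost of one manual caption upload and playback check."""
    return minutes_per_upload * coordinator_rate / 60


def native_annual_cost(first_year: bool, setup_cost: float = 1_200,
                       maintenance_hours_per_month: float = 2,
                       it_rate: float = 75) -> float:
    """Native integration: one-time setup in year one plus recurring IT maintenance."""
    maintenance = maintenance_hours_per_month * 12 * it_rate
    return (setup_cost if first_year else 0) + maintenance


def breakeven_videos(first_year: bool = True) -> float:
    """Annual video count at which manual upload costs as much as native integration."""
    return native_annual_cost(first_year) / manual_cost_per_video()


for videos in (80, 1_080):
    manual = videos * manual_cost_per_video()
    print(f"{videos} videos: manual ${manual:,.0f} vs native ${native_annual_cost(True):,.0f}")
print(f"break-even: {breakeven_videos(True):.0f} videos (year one), "
      f"{breakeven_videos(False):.0f} videos (later years)")
```

Under these assumptions the break-even is 150 videos in year one (when setup is paid) and 90 videos in later years, broadly in line with the under-100 threshold; at 1,080 videos, manual upload costs about $21,600 a year against $3,000 for native integration.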

Lever 6: Vendor contract structure

Within a chosen vendor, contract structure affects cost in two ways. First, the payment term: annual prepayment typically reduces monthly subscription cost by 15-25%, producing $180-$300/year savings on a $99/month Team plan. Second, the commitment structure: per-minute vendors offer volume commitments at discounted rates that can reduce per-minute cost by 10-20% for high-volume programmes. The risk of commitments is volume variance: if you commit to 500 hours and produce 350 hours, you have paid for 150 hours that you did not use. Budget for commitments at 80-85% of expected volume and treat the discount as a contingent saving rather than a base case.
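
The 80-85% commitment rule can be checked with a simple cost function. A sketch, assuming a hypothetical list price of $60 per audio hour ($1.00/min), a 15% commitment discount, 500 expected hours, and overage billed at list price (contract terms vary, so check the actual overage clause); `annual_cost` is an illustrative name.

```python
def annual_cost(actual_hours: float, committed_hours: float,
                list_rate: float, discount: float) -> float:
    """Vendor cost for one year under a volume commitment.

    Committed hours are paid at the discounted rate whether used or not;
    hours above the commitment are assumed to bill at the list rate.
    """
    discounted_rate = list_rate * (1 - discount)
    overage_hours = max(0.0, actual_hours - committed_hours)
    return committed_hours * discounted_rate + overage_hours * list_rate


LIST_RATE = 60.0   # hypothetical $/audio hour ($1.00/min)
DISCOUNT = 0.15    # inside the 10-20% range quoted above
EXPECTED = 500     # expected audio hours for the year

for actual in (350, 500):
    full = annual_cost(actual, EXPECTED, LIST_RATE, DISCOUNT)
    partial = annual_cost(actual, 0.8 * EXPECTED, LIST_RATE, DISCOUNT)
    uncommitted = annual_cost(actual, 0, LIST_RATE, DISCOUNT)
    print(f"actual {actual} h: commit 100% ${full:,.0f}, "
          f"commit 80% ${partial:,.0f}, no commitment ${uncommitted:,.0f}")
```

At 350 actual hours, a full 500-hour commitment costs $25,500 against $21,000 with no commitment, while an 80% (400-hour) commitment costs $20,400. At 500 actual hours the 80% commitment costs $26,400, only $900 more than committing to the full volume. That asymmetry is the reason to treat the discount as contingent.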

The captioning RFP playbook covers the procurement strategy for selecting the right vendor and contract structure. The contract review checklist covers the specific terms that govern volume commitments, overage pricing, and annual adjustment clauses.

Lever 7: In-house vs full-service split

The in-house vs full-service split is the labour sourcing decision: whether caption correction is done by internal staff (cost: internal loaded rate, typically $50-$80/hour) or outsourced to a third-party correction service (cost: external contract rate, typically $35-$60/hour for specialised caption editing services). For most mid-market L&D programmes, keeping correction in-house is cheaper than outsourcing it — the internal staff member does the correction in the context of understanding the content and the organisation's vocabulary, which reduces correction time. For large enterprise programmes with high volumes and multiple content categories, a hybrid model (in-house for compliance-critical content, outsourced for high-volume general content) can reduce total correction cost by 15-25%.

Year-1 budget request template

The year-one budget request should present all three buckets in a format that allows Finance to route costs to the correct approval channels and that gives the L&D VP enough detail to understand what the programme costs and why. The following template uses three org-size tiers to show the range; adapt the numbers to your actual video volume, compensation rates, and LMS platform.

Small organisation (<500 employees, ≤5 hrs/month new content)

Year-1 budget: Small tier (5 hrs/month, 10-min avg, 30 videos/month)

| Line item | Bucket | Annual cost | Notes |
|---|---|---|---|
| Caption software subscription | External vendor | $348 | Starter plan ($29/month); up to 10 hrs/month |
| Backlog remediation (if applicable) | External vendor | $0–$9,000 | Dependent on backlog size; omit if backlog is zero |
| Caption correction labour | Internal labour | $12,600 | 360 videos × 35 min correction × $60/hr; 3.5× rate |
| QA spot-checks (10%) | Internal labour | $720 | 36 spot-checks × 20 min × $60/hr |
| Glossary build (one-time) | Internal labour | $1,800 | 30 hours initial build × $60/hr |
| Glossary maintenance | Internal labour | $2,160 | 3 hrs/month × 12 × $60/hr |
| Programme coordination | Internal labour | $3,600 | 5 hrs/month × 12 × $60/hr; higher in H1 |
| LMS integration setup | IT infrastructure | $600–$2,400 | One-time; dependent on LMS platform |
| IT ongoing maintenance | IT infrastructure | $1,800 | 2 hrs/month × 12 × $75/hr (IT rate) |
| Total year-1 (excluding backlog) | | $23,628–$25,428 | |

Mid-market organisation (500–5,000 employees, 10–40 hrs/month new content)

Year-1 budget: Mid-market tier (15 hrs/month, 10-min avg, 90 videos/month)

| Line item | Bucket | Annual cost | Notes |
|---|---|---|---|
| Caption software subscription | External vendor | $1,188 | Team plan ($99/month); up to 50 hrs/month |
| Backlog remediation (if applicable) | External vendor | $0–$45,000 | Dependent on backlog; often larger than forward content cost in Y1 |
| Caption correction labour | Internal labour | $37,800 | 1,080 videos × 35 min correction × $60/hr; 3.5× rate |
| QA spot-checks (10%) | Internal labour | $2,160 | 108 spot-checks × 20 min × $60/hr |
| Glossary build (one-time) | Internal labour | $2,400 | 40 hours initial build × $60/hr |
| Glossary maintenance | Internal labour | $4,320 | 6 hrs/month × 12 × $60/hr |
| Programme coordination | Internal labour | $7,200 | 10 hrs/month × 12 × $60/hr; higher in H1 |
| LMS integration setup | IT infrastructure | $1,200–$3,600 | One-time; dependent on LMS platform |
| IT ongoing maintenance | IT infrastructure | $2,700 | 3 hrs/month × 12 × $75/hr (IT rate) |
| Total year-1 (excluding backlog) | | $58,968–$61,368 | |

Enterprise organisation (>5,000 employees, >40 hrs/month new content)

Year-1 budget: Enterprise tier (50 hrs/month, 10-min avg, 300 videos/month)

| Line item | Bucket | Annual cost | Notes |
|---|---|---|---|
| Caption software subscription | External vendor | $4,800–$9,600 | Enterprise custom pricing; $400–$800/month typical range |
| Backlog remediation (if applicable) | External vendor | $0–$180,000+ | Large enterprises may have 1,000+ hours of uncaptioned legacy content |
| Caption correction labour | Internal labour | $126,000 | 3,600 videos × 35 min × $60/hr; 3.5× rate |
| QA spot-checks (10%) | Internal labour | $7,200 | 360 spot-checks × 20 min × $60/hr |
| Glossary build (one-time) | Internal labour | $6,000 | 100 hours initial build (larger, multi-vertical org) × $60/hr |
| Glossary maintenance | Internal labour | $8,640 | 12 hrs/month × 12 × $60/hr |
| Programme coordination (0.5 FTE) | Internal labour | $36,000–$48,000 | 0.5 FTE coordinator at $72K–$96K full-time salary; see coordinator playbook |
| LMS integration setup | IT infrastructure | $3,600–$12,000 | Complex LMS environments (Workday, multi-platform) at higher end |
| IT ongoing maintenance | IT infrastructure | $5,400 | 6 hrs/month × 12 × $75/hr (IT rate) |
| Total year-1 (excluding backlog) | | $197,640–$222,840 | |

Three observations about these templates. First, the external vendor line item is a small share of every tier: roughly 1-2% of total year-one cost for the small and mid-market tiers and 2-5% for the enterprise tier. Submitting the caption programme budget as a one-line vendor-cost item therefore captures only a few percent of the actual programme cost and systematically underrepresents the resource requirements. Second, correction labour is the largest line item in all three tiers, representing roughly 40-65% of total programme cost. Third, the enterprise tier introduces the coordinator FTE allocation explicitly as a 0.5 FTE budget line: at enterprise scale, programme coordination cannot be absorbed into existing job descriptions without clear resource allocation. The accessibility coordinator playbook covers how to scope and justify this role.
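
The bucket shares behind these observations can be computed directly from a template. A sketch using the mid-market line items, with LMS setup taken at the low end of its range; `bucket_shares` is an illustrative name.

```python
# Mid-market year-one template (excluding backlog), LMS setup at the low end.
MID_MARKET_Y1 = [
    ("Caption software subscription", "External vendor", 1_188),
    ("Caption correction labour", "Internal labour", 37_800),
    ("QA spot-checks (10%)", "Internal labour", 2_160),
    ("Glossary build (one-time)", "Internal labour", 2_400),
    ("Glossary maintenance", "Internal labour", 4_320),
    ("Programme coordination", "Internal labour", 7_200),
    ("LMS integration setup", "IT infrastructure", 1_200),
    ("IT ongoing maintenance", "IT infrastructure", 2_700),
]


def bucket_shares(items):
    """Return the total cost and each bucket's share of it."""
    total = sum(cost for _, _, cost in items)
    shares: dict[str, float] = {}
    for _, bucket, cost in items:
        shares[bucket] = shares.get(bucket, 0.0) + cost / total
    return total, shares


total, shares = bucket_shares(MID_MARKET_Y1)
print(f"total ${total:,}")
for bucket, share in shares.items():
    print(f"{bucket}: {share:.1%}")
```

It reports a $58,968 total, with the subscription at about 2.0%, internal labour at about 91.4% and IT infrastructure at about 6.6%.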

Year-2 and year-3 projections

Year-two and year-three projections should be built as a direct extension of the year-one model with explicit adjustments for volume growth, correction rate improvement from glossary compounding, the elimination of one-time year-one costs, and the step-down in coordination overhead after the ramp year. The simplest approach is to build a rolling three-year model in a spreadsheet with those variables as adjustable inputs, then present the model output as the year-two and year-three budget projections.

Adjustments from year one to year two

Four changes to apply when moving from the year-one model to year-two:

  1. Remove one-time costs: Glossary build cost (year-one only), LMS integration setup cost (year-one only). These do not recur and should disappear from the year-two budget line items.
  2. Apply volume growth: Increase monthly content volume by 25% (or your programme-specific growth rate). A team at 15 hrs/month in year one moves to 18.75 hrs/month in year two.
  3. Apply correction rate improvement: Move from the year-one blended correction rate (3.5×) to the year-two blended rate (2.4×). This reduces correction hours despite volume growth.
  4. Reduce coordination overhead: Year-two coordination typically runs at 60-70% of year-one level as the workflow is established and the coordinator is not building from scratch. Reduce the coordination line item accordingly.
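
The four adjustments can be expressed as inputs to a small model. This sketch reproduces the mid-market three-year table under its assumptions: 25% annual growth, blended correction rates of 3.5×, 2.4× and 2.0×, 20 minutes per QA spot-check, year-two coordination at 65% of year one, and $2,400 each for the glossary build and integration setup. Names such as `Year` and `year_total` are illustrative.

```python
from dataclasses import dataclass

LOADED_RATE = 60          # internal $/hour
AVG_VIDEO_MIN = 10        # average video length, minutes
SUBSCRIPTION = 1_188      # Team plan, flat across the three years
GLOSSARY_MAINTENANCE = 4_320
IT_MAINTENANCE = 2_700
QA_SAMPLE = 0.10          # 10% spot-check rate
QA_MIN_PER_CHECK = 20     # assumed minutes per spot-check


@dataclass
class Year:
    monthly_hours: float    # new content per month
    correction_rate: float  # blended correction minutes per audio minute
    coordination: float     # programme coordination cost for the year
    one_time: float = 0.0   # glossary build + integration setup (year one only)


def year_total(y: Year) -> float:
    videos = y.monthly_hours * 60 / AVG_VIDEO_MIN * 12
    correction = videos * AVG_VIDEO_MIN * y.correction_rate / 60 * LOADED_RATE
    qa = videos * QA_SAMPLE * QA_MIN_PER_CHECK / 60 * LOADED_RATE
    return (SUBSCRIPTION + correction + qa + GLOSSARY_MAINTENANCE
            + y.coordination + IT_MAINTENANCE + y.one_time)


GROWTH = 1.25
plan = [
    Year(15.0, 3.5, 7_200, one_time=2_400 + 2_400),  # step 1: one-time costs in year one only
    Year(15.0 * GROWTH, 2.4, 7_200 * 0.65),          # steps 2-4: growth, rate, coordination
    Year(15.0 * GROWTH ** 2, 2.0, 4_320),
]
totals = [year_total(y) for y in plan]
for n, t in enumerate(totals, start=1):
    print(f"Year {n}: ${t:,.0f}")
print(f"Three-year total: ${sum(totals):,.0f}")
```

Years one and two match the table ($60,168 and $47,988); year three comes out at $49,653, about $30 above the table's $49,623, because the sketch does not round the year-three video count and correction hours to whole numbers.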

Mid-market three-year summary

Three-year cost model: Mid-market tier (15 hrs/month Y1, 25% annual volume growth)

| Line item | Year 1 | Year 2 | Year 3 |
|---|---|---|---|
| External vendor (subscription) | $1,188 | $1,188 | $1,188 |
| Caption correction labour | $37,800 | $32,400 | $33,720 |
| QA labour (10%) | $2,160 | $2,700 | $3,375 |
| Glossary build (one-time) | $2,400 | – | – |
| Glossary maintenance | $4,320 | $4,320 | $4,320 |
| Programme coordination | $7,200 | $4,680 | $4,320 |
| LMS integration setup | $2,400 | – | – |
| IT ongoing maintenance | $2,700 | $2,700 | $2,700 |
| Total | $60,168 | $47,988 | $49,623 |

Year-two total cost is 20% lower than year one despite 25% volume growth. The one-time costs that inflate year one ($2,400 glossary build + $2,400 integration setup = $4,800), the correction rate improvement ($37,800 → $32,400 = $5,400 savings) and the step-down in coordination overhead ($7,200 → $4,680 = $2,520 savings) combine for a $12,720 reduction, only slightly offset by $540 of additional QA labour from the higher volume: a net drop of $12,180. Year three returns to slightly higher cost as volume growth begins to outpace the diminishing correction rate improvement. The three-year trajectory ($60K → $48K → $50K) puts year-three cost at roughly 0.82× year-one cost, dramatically better than the 2.5-3.2× trajectory of an unmanaged programme.

Presenting multi-year projections to Finance

The year-two cost reduction provides a useful talking point in the Finance budget meeting: "We are requesting $60,000 in year one, which includes one-time setup costs of $4,800 and a first-year correction labour premium while the glossary is building. Year-two costs drop to approximately $48,000 and stay close to that level as the glossary compounding effect holds labour costs flat through volume growth. We expect the three-year total cost of the programme to be approximately $158,000, compared to $180,000+ if we had stayed with per-minute pricing at current volume." That is a credible three-year financial narrative, not a one-year cost estimate.

Budget by organisation size

The following reference table provides year-one and year-three budget ranges for four common L&D programme configurations. The ranges reflect variation in correction rates (depending on content complexity), video length distribution, and LMS platform integration cost. Use these as sanity-check bounds for your specific model, not as substitutes for it.

| Org profile | Employees | Content volume | Year-1 budget range | Year-3 budget range | Primary cost driver |
|---|---|---|---|---|---|
| Very small / startup | <100 | ≤2 hrs/month | $4,000–$9,000 | $5,000–$12,000 | Correction labour (proportion of staff time) |
| Small | 100–500 | 2–8 hrs/month | $15,000–$28,000 | $18,000–$35,000 | Correction labour; glossary build in Y1 |
| Mid-market | 500–5,000 | 8–40 hrs/month | $45,000–$90,000 | $40,000–$75,000 | Correction labour (dominant); compounding reduces Y3 below Y1 |
| Enterprise | >5,000 | >40 hrs/month | $150,000–$280,000 | $130,000–$240,000 | Coordinator FTE; correction labour at scale |

Two patterns visible in this table. First, year-three cost is lower than year-one for mid-market and enterprise profiles because the correction rate improvement and the elimination of one-time setup costs more than offset volume growth. For very small and small profiles, year-three is higher than year-one because volume growth from very low starting points does not give glossary compounding enough time to work before the budget comparison cutoff.

Second, the external vendor cost is not visible in these ranges because it is too small to be legible at this scale. The subscription cost ($348–$9,600/year) represents 0.3–8% of total programme cost depending on the tier — it is genuinely a minor budget line compared to internal labour. This is the quantitative reason why the make-vs-buy decision on caption quality investment should be made on total cost of ownership rather than vendor invoice comparison.

Backlog remediation budget: a separate planning exercise

If your organisation has a significant uncaptioned back-catalogue, backlog remediation should be modelled as a separate budget item with its own timeline, not folded into the ongoing forward-content budget. The backlog size determines the remediation budget; the compliance risk profile of the backlog (what content categories, what learner populations, what OCR complaint exposure) determines the remediation priority.

A backlog of 200 hours of uncaptioned content at Rev human review pricing ($1.50/min) costs $18,000 to remediate. Spread over two years, that adds $9,000/year to the budget. Spread over three years, $6,000/year. The remediation timeline is a cost-smoothing decision that should account for compliance risk: content used by employees with ADA accommodation needs, content in active use by large learner populations, and content cited in prior OCR complaints should be remediated first, regardless of the timeline smoothing. The LMS audit methodology covers the risk-weighted prioritisation framework for backlog remediation.
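
Both the cost and the ordering can be scripted once the audit has produced a backlog inventory. A sketch with a hypothetical three-item backlog totalling 200 hours, priced at $1.50/min. The ranking rule (prior complaints, then accommodation needs, then audience size) is one assumed way to order the three risk factors, which the text does not rank against each other; all names and items are illustrative.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    title: str
    audio_hours: float
    prior_complaint: bool      # cited in a prior OCR complaint
    accommodation_need: bool   # used by employees with ADA accommodation needs
    active_learners: int       # current learner population


def remediation_cost(items, rate_per_min: float = 1.50) -> float:
    """Total cost at a per-minute human review rate."""
    return sum(item.audio_hours * 60 * rate_per_min for item in items)


def remediation_order(items):
    """Hypothetical tie-break: complaints, then accommodation needs, then audience size."""
    return sorted(items, key=lambda i: (not i.prior_complaint,
                                        not i.accommodation_need,
                                        -i.active_learners))


backlog = [
    BacklogItem("Sales onboarding archive", 120, False, False, 300),
    BacklogItem("Safety induction", 40, False, True, 1_200),
    BacklogItem("Benefits overview", 40, True, False, 50),
]
total = remediation_cost(backlog)
for years in (2, 3):
    print(f"{years}-year plan: ${total / years:,.0f}/year")
print([item.title for item in remediation_order(backlog)])
```

The 200 hours cost $18,000 in total, or $9,000/year over two years and $6,000/year over three; the ordering puts the complaint-cited item first regardless of how the spend is smoothed.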

Eight failure modes in caption budget planning

Caption programme budget failures tend to cluster in predictable patterns. The following eight failure modes are the ones that most consistently produce mid-year budget overruns, programme stalls, or cost structure surprises in year two.

Failure mode 1: Modelling year-one volume as steady-state

The most common error in caption programme budget planning is treating the year-one video publication rate as the permanent annual rate and projecting costs forward as a flat line. L&D programmes that introduce formal captioning workflows typically see significant volume growth in years two and three as the workflow becomes more accessible to content creators, more business units adopt the platform, and previously informal video production moves into the LMS. A flat-line projection at year-one volume produces a year-two budget shortfall that arrives without warning when invoices and correction labour exceed the projected line items.

Fix: Model volume growth explicitly at 20-30% per year for years two and three. If the actual growth rate comes in lower, the budget has a useful buffer. If it comes in higher, the three-year model provides a framework for a supplemental budget request that is credible rather than surprised.

Failure mode 2: Counting only the external vendor invoice

Submitting a caption programme budget as a single line item equal to the annual subscription or per-minute estimate is the most common form of budget undercounting. It misses Bucket 2 (internal labour, typically 85-95% of year-one programme cost excluding backlog remediation) and Bucket 3 (IT infrastructure, typically 5-10%) entirely. The budget looks small and is easy to approve. The programme then consumes staff time and IT capacity that were not budgeted, creating invisible cost overruns that accumulate in staff calendars and IT project backlogs until a VP asks why the L&D coordinator is only at 60% on their primary deliverables.

Fix: Submit budgets using the three-bucket structure. Allocate Bucket 2 labour explicitly as a percentage of existing FTE or as a headcount request. Confirm Bucket 3 IT capacity before the programme budget is submitted.

Failure mode 3: Not modelling the glossary compounding savings

Projecting year-two and year-three correction labour at the year-one correction rate systematically overestimates future costs for programmes that actively maintain their glossary. This is a conservative error (the budget will have surplus in years two and three) but it produces a different problem: a VP of L&D who sees year-two actuals significantly below year-two budget asks whether the year-one budget was inflated, which creates credibility problems for future budget requests.

Fix: Use the correction rate trajectory table from the compounding section. Model year-two and year-three correction labour at the appropriate compounded rate and present the rate improvement explicitly in the budget narrative as evidence of programme ROI.

Failure mode 4: Ignoring the backlog remediation cost

Discovering the backlog after programme launch — particularly in the context of an accessibility audit or OCR investigation — produces a remediation budget request that arrives without preparation, under pressure, and at exactly the wrong moment. Organisations that do not audit their existing caption library before programme launch consistently underestimate backlog remediation cost and submit inadequate budget requests.

Fix: Conduct the LMS caption audit before submitting the year-one budget request. Include the backlog remediation cost as a separate budget line with a three-year remediation timeline and a risk-weighted prioritisation rationale.

Failure mode 5: Not planning IT capacity before programme launch

A caption software subscription that is approved in the Finance meeting but cannot be integrated into the LMS because IT is fully allocated for two quarters produces a programme that is paying for a subscription it cannot use. Integration setup is not self-service at most LMS platforms. IT capacity must be confirmed before the programme launch date is set.

Fix: Include the IT integration setup in the budget request and confirm IT project prioritisation at the same time as Finance approval. Do not set a programme launch date until IT has confirmed the integration in their project plan.

Failure mode 6: Using the wrong pricing model in the budget calculation

An L&D director evaluating caption software options who models the per-minute rate in a subscription budget (or vice versa) produces a budget estimate that is wrong by 4-10× depending on volume. This failure is more common during the vendor evaluation phase, when the team is comparing pricing models and may inadvertently use per-minute rates to estimate subscription costs or subscription-equivalent rates to underestimate per-minute costs at scale.

Fix: Build the three-year cost model before finalising vendor selection, not after. The cost comparison between per-minute and subscription at your specific volume trajectory often produces a clear winner that the year-one cost comparison obscures.

Failure mode 7: Not allocating the coordinator role explicitly

At mid-market and enterprise scale, caption programme coordination is a substantial ongoing time commitment that cannot be absorbed into existing job descriptions without explicit allocation. Programmes that rely on the assumption that "someone will handle it" routinely discover that glossary maintenance falls behind, QA spot-checks are not completed, and the accuracy compounding effect never materialises — because the labour that drives the compounding was never allocated, budgeted, or protected from competing priorities.

Fix: Allocate coordinator time explicitly in the budget using the four-category labour model. For mid-market programmes, 15-20% of one FTE is the appropriate allocation range. For enterprise programmes, consider a dedicated 0.5 FTE or a full dedicated role as described in the coordinator playbook.

Failure mode 8: Treating year-one one-time costs as ongoing costs

Annualising year-one costs — treating the glossary build, integration setup, and ramp-period coordination overhead as if they recur every year — produces a budget model that projects unnecessarily high year-two and year-three costs. This creates two downstream problems: the VP of L&D may reject the multi-year cost model as implausibly expensive, and the L&D director who wins approval on the inflated model has a year-two budget surplus that is politically awkward.

Fix: Flag one-time year-one costs explicitly in the budget request with "one-time" labels. Present the year-two and year-three projections as the steady-state operating cost once the setup is complete. The contrast between year-one (higher, with setup) and year-two (lower, steady-state) is a positive narrative that demonstrates planning competence.

Seven-question FAQ

What is the minimum viable budget for a small L&D team that is just starting a caption programme?

For a small team producing 2-5 hours of new content per month, the minimum viable budget covers three items: a Starter-tier subscription ($348/year), the initial glossary build (20-30 hours at your internal loaded rate, typically $1,200-$1,800), and first-year correction labour. At 2-5 hours of content per month (24-60 hours per year), a 10-minute average video length, and the year-one 3.5× correction rate, correction labour runs approximately 84-210 hours, or $5,040-$12,600 at $60/hour. Total minimum viable budget: roughly $6,600-$14,700 in year one. Do not omit the glossary build even though it is a one-time cost. A programme that launches without a glossary achieves no compounding effect and carries the highest correction labour burden for its entire duration; the glossary build investment is what makes the three-year economics work.
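The arithmetic in this answer can be reproduced with a short Python sketch. It assumes twelve months of captioning in year one (the `months` parameter shortens that if production starts late), a $60/hour loaded rate, and that the 3.5× correction rate means 3.5 hours of correction per hour of finished content. The function and key names are illustrative.

```python
HOURLY_RATE = 60.0              # internal loaded rate, $/hour
STARTER_SUBSCRIPTION = 348.0    # Starter-tier subscription, $/year
YEAR_ONE_CORRECTION_RATE = 3.5  # assumed: correction hours per hour of content


def minimum_viable_budget(content_hours_per_month: float, glossary_hours: float,
                          months: int = 12) -> dict[str, float]:
    """Year-one minimum viable budget for a small programme (sketch).

    months lets you shorten the production year if captioning starts late.
    """
    glossary_build = glossary_hours * HOURLY_RATE
    correction_hours = content_hours_per_month * months * YEAR_ONE_CORRECTION_RATE
    correction_labour = correction_hours * HOURLY_RATE
    return {
        "subscription": STARTER_SUBSCRIPTION,
        "glossary_build": glossary_build,
        "correction_hours": correction_hours,
        "correction_labour": correction_labour,
        "total": STARTER_SUBSCRIPTION + glossary_build + correction_labour,
    }


if __name__ == "__main__":
    for hours, glossary in [(2, 20), (5, 30)]:
        budget = minimum_viable_budget(hours, glossary)
        print(f"{hours} h/month: {budget['correction_hours']:.0f} correction hours, "
              f"total ${budget['total']:,.0f}")
```

Calling it with (2, 20) and (5, 30) brackets the low and high ends of the small-team range; rerun it with your own loaded rate and glossary estimate.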

Should I budget for human review or machine-first (glossary-augmented) captioning?

The choice moves the external vendor cost and the internal correction labour cost in opposite directions. Human review costs 6-10× more per minute externally but reduces correction labour because its output is more accurate. Glossary-augmented machine captioning costs 80-90% less externally but requires correction labour to reach 99% DCMP accuracy. For most L&D programmes, glossary-augmented machine captioning (subscription model) is the more cost-effective choice because, at realistic volumes, the correction labour that human review saves does not offset its external cost premium. The exception is content with unusually complex audio (heavy accents, multiple speakers, technical vocabulary not in the glossary), where machine accuracy is consistently below 85% and the resulting machine correction labour approaches or exceeds the human-review premium; in that case, the external premium buys a meaningfully better starting point. Use the total cost of ownership model (external cost plus correction labour) to make the tier decision for your specific content profile rather than choosing on external cost alone.
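The total cost of ownership comparison reduces to one function: external cost plus correction hours at the internal rate. All rates below are placeholders chosen only to illustrate the arithmetic (the human-review rate is 8× the machine rate, inside the 6-10× range above), and the correction rate is assumed to mean hours of correction per hour of content.

```python
def total_cost_of_ownership(content_minutes: float, external_rate_per_minute: float,
                            correction_rate: float, hourly_rate: float) -> float:
    """One year of external vendor cost plus internal correction labour.

    correction_rate is assumed to mean hours of correction per hour of content.
    """
    external = content_minutes * external_rate_per_minute
    correction_hours = (content_minutes / 60) * correction_rate
    return external + correction_hours * hourly_rate


if __name__ == "__main__":
    minutes = 1_200  # 20 hours of content per year (placeholder)
    options = {
        # name: (placeholder $/minute, placeholder correction rate)
        "human review": (4.00, 0.5),
        "machine + glossary": (0.50, 3.5),
        "machine, complex audio": (0.50, 5.0),
    }
    for name, (rate, correction) in options.items():
        tco = total_cost_of_ownership(minutes, rate, correction, hourly_rate=60)
        print(f"{name:>24}: ${tco:,.0f}")
```

With these placeholders the machine option wins on ordinary audio ($4,800 against $5,400) and loses once complex audio pushes its correction rate to 5.0× ($6,600), which is the exception described above.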

How do I account for backlog remediation in the year-one budget request without making the total look alarming?

Present backlog remediation as a separate named line item in the budget, clearly labelled "one-time backlog remediation" with a specific scope (hours of content, estimated cost) and a completion timeline. Do not blend it into the per-video ongoing cost or hide it in the IT line. A VP of Finance who sees a $60,000 forward-content budget presented alongside a $30,000 backlog remediation budget can evaluate both on their merits. The same $90,000 total with no explanation for the one-time component triggers scrutiny of the whole budget rather than approval of the recurring component. The compliance argument for backlog remediation is risk-based: the OCR complaint exposure for uncaptioned legacy content that is still in active use does not diminish simply because the content is old. Size the argument to the specific OCR complaint exposure, not to the total uncaptioned library volume.

What is the right way to model volume growth if I do not know how many videos I will publish in year two?

Use a three-scenario model (conservative, base, optimistic) with different growth rate assumptions and present all three to your VP as a sensitivity analysis. Conservative: 10% growth per year (content volume grows slowly; no new business unit adoption). Base: 25% growth per year (typical adoption curve for programmes that successfully establish the workflow). Optimistic: 50% growth per year (aggressive adoption curve when a strong compliance driver — an OCR investigation, a major product launch, a new regulatory requirement — accelerates production). Show the year-three total cost under all three scenarios. The range is informative: if even the conservative scenario exceeds the available budget at year three, you have a conversation to have with Finance now rather than in two years. If all three scenarios are within budget tolerance, you have the flexibility to plan conservatively and use budget surplus for backlog remediation or QA investment.
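A three-scenario sensitivity check can be sketched in Python as below. The growth rates are the ones given above; the year-one volume (240 content hours), the fixed cost, the volume-driven cost per content hour and the budget ceiling are hypothetical, and splitting cost into a fixed part and a per-hour part is a simplification of the full three-bucket model.

```python
SCENARIOS = {"conservative": 0.10, "base": 0.25, "optimistic": 0.50}


def content_hours(year_one_hours: float, growth: float, years: int = 3) -> list[float]:
    """Hours of new content per year under a constant annual growth rate."""
    return [year_one_hours * (1 + growth) ** year for year in range(years)]


def year_three_cost(year_one_hours: float, growth: float,
                    fixed_annual: float, cost_per_content_hour: float) -> float:
    """Simplified year-three cost: fixed costs plus a volume-driven component."""
    return fixed_annual + content_hours(year_one_hours, growth)[2] * cost_per_content_hour


def scenarios_over_budget(year_one_hours: float, fixed_annual: float,
                          cost_per_content_hour: float, budget: float) -> list[str]:
    """Names of the scenarios whose year-three cost exceeds the available budget."""
    return [name for name, growth in SCENARIOS.items()
            if year_three_cost(year_one_hours, growth, fixed_annual,
                               cost_per_content_hour) > budget]


if __name__ == "__main__":
    for name, growth in SCENARIOS.items():
        cost = year_three_cost(240, growth, fixed_annual=20_000, cost_per_content_hour=150)
        print(f"{name:>12}: year-three cost ${cost:,.0f}")
    print("Over budget:", scenarios_over_budget(240, 20_000, 150, budget=80_000))
```

Under these placeholders the year-three cost runs from about $63,600 (conservative) to $101,000 (optimistic), and only the optimistic scenario breaches an $80,000 ceiling.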

Can I reduce correction labour by improving glossary accuracy, and is that reduction predictable enough to put in the budget?

Yes, and yes — with the caveat that the glossary compounding effect requires active maintenance investment to materialise. The correction rate trajectory table in this post gives you the month-by-month rate expectations; use the annual blended rates (Year 1: 3.5×, Year 2: 2.4×, Year 3: 2.0×) as the basis for your budget projections. Those rates reflect a programme that reviews correction logs monthly, adds new glossary terms weekly during the active ramp period, and conducts a quarterly glossary audit. A programme that does those things reliably will see correction rate improvement consistent with the table. A programme that treats glossary maintenance as discretionary (it gets done when there is time) will see slower improvement — budget 20-25% higher than the table rates as a buffer if you are uncertain about glossary maintenance discipline.

The compounding reduction is large enough to be budget-significant: at mid-market scale, the difference between a well-maintained glossary and a neglected one is approximately $15,000-$20,000/year in year three. That is a real cost difference, not a rounding error, and it justifies treating glossary maintenance as a protected line item rather than a discretionary task.
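The annual blended rates and the 20-25% maintenance buffer translate directly into a correction labour line. This sketch assumes the rates mean correction hours per hour of content and uses placeholder volumes growing 25% a year at $60/hour.

```python
# Blended annual correction rates from the trajectory table
# (assumed here to mean correction hours per hour of content).
ANNUAL_CORRECTION_RATES = {1: 3.5, 2: 2.4, 3: 2.0}


def correction_labour_budget(hours_by_year: list[float], hourly_rate: float,
                             buffer: float = 0.0) -> list[float]:
    """Correction labour cost per year.

    buffer: e.g. 0.20-0.25 when glossary maintenance discipline is uncertain.
    Only years 1-3 are defined; a fourth year raises KeyError.
    """
    return [hours * ANNUAL_CORRECTION_RATES[year] * (1 + buffer) * hourly_rate
            for year, hours in enumerate(hours_by_year, start=1)]


if __name__ == "__main__":
    hours = [240, 300, 375]  # placeholder volumes growing 25% a year
    disciplined = correction_labour_budget(hours, hourly_rate=60)
    buffered = correction_labour_budget(hours, hourly_rate=60, buffer=0.25)
    for year, (a, b) in enumerate(zip(disciplined, buffered), start=1):
        print(f"Year {year}: ${a:,.0f} (table rates) / ${b:,.0f} (+25% buffer)")
```

The table-rate projection falls from $50,400 in year one to $43,200 in year two despite the volume growth; the buffered line shows what to request if glossary maintenance discipline is uncertain.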

How do I present the three-year cost model to Finance in a single slide?

Use a stacked bar chart with three bars (year one, two, three) and three colour-coded segments per bar representing the three budget buckets (external vendor in one colour, internal labour in a second, IT infrastructure in a third). Add a total line across all three years. The chart communicates three things that a table cannot communicate as efficiently: the three-bucket structure (labour dominates), the year-over-year trajectory (year two drops because setup costs disappear and compounding begins), and the total three-year commitment (the number Finance really wants to see). One slide is enough — Finance does not need the methodology behind the correction rate trajectory or the glossary compounding formula in the approval meeting. They need the number and the structure. The detail belongs in the appendix or the follow-up conversation.

The single most important narrative element for the Finance slide is the year-two cost reduction. "Year one: $60,000 (includes $4,800 in one-time setup). Year two: $48,000 (20% lower despite 25% volume growth, due to glossary accuracy improvement). Year three: $50,000 (stable at approximately the year-two level as the programme matures)." That trajectory tells Finance that you understand the programme economics and that you are not asking for an open-ended budget commitment: the programme cost stabilises, and you can demonstrate why.
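The stacked bar chart can be drawn in any spreadsheet; the sketch below only computes the bar heights and the year-over-year percentages, so the labels on the slide agree with the figures. The split of each year's total across the three buckets is a placeholder chosen to sum to the totals quoted above, with labour as the largest bucket.

```python
BUCKETS = ("external_vendor", "internal_labour", "it_infrastructure")


def yearly_totals(years: list[dict[str, float]]) -> list[float]:
    """Sum the three budget buckets for each year (the height of each stacked bar)."""
    return [sum(year[bucket] for bucket in BUCKETS) for year in years]


def year_over_year(totals: list[float]) -> list[float]:
    """Percentage change from each year to the next."""
    return [(cur - prev) * 100 / prev for prev, cur in zip(totals, totals[1:])]


if __name__ == "__main__":
    # Placeholder bucket splits chosen to sum to the quoted yearly totals.
    years = [
        {"external_vendor": 9_000, "internal_labour": 45_000, "it_infrastructure": 6_000},
        {"external_vendor": 9_000, "internal_labour": 35_000, "it_infrastructure": 4_000},
        {"external_vendor": 9_000, "internal_labour": 37_000, "it_infrastructure": 4_000},
    ]
    totals = yearly_totals(years)
    changes = year_over_year(totals)
    print("Totals:", totals, "three-year:", sum(totals))
    print("Year-over-year change (%):", [round(c, 1) for c in changes])
```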

What is the right coordinator FTE allocation and which team should own it?

The right allocation depends on programme scale. For programmes producing up to 40 hours of new content per month, 15-20% of one existing FTE is typically sufficient: 6-8 hours/week across correction oversight, QA scheduling, glossary maintenance, and vendor coordination. For programmes above 40 hours/month, a dedicated 0.5 FTE is appropriate. For large enterprise programmes above 100 hours/month with multiple content categories and multiple LMS platforms, a full 1.0 FTE accessibility coordinator is justified — the coordinator playbook makes the full-role case in detail.
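The scale bands above can be encoded as a small lookup. This is a sketch under two assumptions: a 40-hour week (which is what makes 15-20% equal 6-8 hours/week), and a fallback to 0.5 FTE for programmes above 100 hours/month that lack the multiple content categories and LMS platforms that justify a full role. The function names are hypothetical.

```python
HOURS_PER_WEEK = 40  # assumed full-time week; 15-20% of it is 6-8 hours


def coordinator_fte(content_hours_per_month: float,
                    multiple_categories_and_lms: bool = False) -> tuple[float, float]:
    """Return a (low, high) FTE range for the coordinator role."""
    if content_hours_per_month <= 40:
        return (0.15, 0.20)  # slice of one existing FTE
    if content_hours_per_month > 100 and multiple_categories_and_lms:
        return (1.0, 1.0)    # full-time accessibility coordinator
    return (0.5, 0.5)        # dedicated half-time role (also the assumed fallback above 100 h/month)


def weekly_hours(fte_range: tuple[float, float]) -> tuple[float, float]:
    """Convert an FTE range into hours per week."""
    low, high = fte_range
    return (low * HOURS_PER_WEEK, high * HOURS_PER_WEEK)


if __name__ == "__main__":
    for volume, complex_estate in [(25, False), (60, False), (120, True)]:
        fte = coordinator_fte(volume, complex_estate)
        low, high = weekly_hours(fte)
        print(f"{volume} h/month: {fte[0]}-{fte[1]} FTE, {low:.0f}-{high:.0f} h/week")
```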

The team ownership question is organisational, but the correct answer is almost always L&D rather than IT, Legal, or HR. Caption compliance is an L&D delivery quality issue, not a legal documentation issue or an IT configuration issue. The coordinator's primary relationships are with content producers, LMS administrators, and the captioning vendor, all of which are L&D programme relationships. Legal and HR play important roles in defining the compliance standard and the accommodation process, but they should not own the operational programme coordination. The team that owns caption quality is the team responsible for delivering accessible training content, which is L&D. Budget the coordinator role in the L&D operating budget with a dotted-line reporting relationship to Legal or HR as appropriate for your governance structure.

Ready to build your caption programme budget?

GlossCap provides the glossary-augmented captioning infrastructure that makes the three-year cost model work: subscription pricing that holds flat as volume grows, automatic glossary integration that drives the correction rate improvement, and LMS delivery APIs for Kaltura, Panopto, Docebo, TalentLMS, and other major platforms. The captioning RFP template covers the vendor selection process; the Rev vs GlossCap comparison provides the detailed per-minute vs subscription cost analysis.
