Procurement · Published 2026-06-10

Caption vendor contract review checklist: SLA terms, accuracy guarantees, BAA provisions, and the clauses that matter

The captioning RFP is where you pick the right vendor. The contract is where you lock in what "right" actually means, legally and operationally, for the duration of the engagement. Most L&D teams that have gone through the trouble of running a structured RFP — like the one we walked through in the captioning vendor RFP playbook — arrive at the contract phase with good vendor selection data and no contract negotiation strategy. The vendor's standard contract is handed over and signed with minimal red-lines because the team is tired of the procurement cycle and wants to get captions flowing. The result is a two-year commitment with a vendor whose contract does not specify how accuracy is measured, does not define what triggers a remediation obligation, does not clarify who owns the per-customer glossary on exit, and does not require the vendor to execute a BAA even though the training video catalogue contains PHI. Each of these is a material exposure. This post is a practitioner's guide to the contract phase: what to require in each clause category, how to recognise when a vendor draft is deliberately vague on terms that matter, and the 42-point review checklist to run before signature. The post assumes you have already completed a structured vendor selection process. If you are still in the evaluation phase, start with the RFP playbook and the vendor pricing breakdown; the contract review process only applies once you have a shortlisted vendor and a draft contract in hand.

TL;DR

Eight clause categories matter in a captioning vendor contract: (1) accuracy guarantees — specify the measurement protocol, the sample methodology, and the remediation trigger; (2) SLA terms — turnaround time, remediation window, breach consequences, and back-catalogue retrofit timeline; (3) BAA provisions — required if any PHI or customer-account data surfaces in the training content, must specify the scope of permitted uses and the breach notification window; (4) data retention and deletion — retention window for audio and transcripts, data residency, deletion rights on termination; (5) glossary ownership and portability — your glossary is your IP, must be exportable in a documented format on request and returned on termination; (6) format and integration SLAs — output formats guaranteed, LMS integration delivery method, API uptime; (7) pricing and volume terms — overage rates, unused-minutes policy, rate lock duration, minimum volume exit rights; (8) termination and transition — data return window, transition assistance SLA, exit penalties. The seven most common contract failures are: vague accuracy language with no measurement protocol, missing BAA despite PHI exposure, glossary IP owned by vendor, no deletion obligation on termination, SLA measured on response time instead of completion, overage pricing that eliminates the unit-economics case, and single-year auto-renewal with long notice windows. The 42-point checklist is at the end of this post. Run it before signature on any captioning vendor contract above $10K annual value.

Why the contract phase is distinct from the RFP phase

The RFP phase answers the question "which vendor produces the best output on our content at a price we can afford?" The contract phase answers a different set of questions: "what happens when the output is not what we expected?", "who owns the data and the accuracy investment we make?", and "can we exit if the vendor's quality degrades over the contract term?" These are adversarial questions in the sense that the vendor's interests are partially opposed to yours, and the vendor's standard contract reflects the vendor's preferred answers.

The captioning-vendor market has a structural dynamic that produces contracts unfavourable to buyers: most buyers sign captioning contracts once every two to four years (the average LMS cycle, roughly), while vendors negotiate captioning contracts every week. A mid-market captioning vendor whose legal team drafts the standard contract has 400 contracts of precedent on what clauses buyers push back on and which ones they miss. The buyers who push back hardest on the accuracy guarantee language are usually the ones who have been burned before. The buyers who do not push back are the ones signing the contract for the first time. The vendor's standard contract is calibrated for the typical first-time buyer, not for the buyer who has done this before.

This asymmetry is correctable. The clause categories below represent the terms that consistently matter in practice — the ones where the vendor's standard language benefits the vendor at the buyer's expense, and where a few hours of negotiation produces a contract that gives you the operational and legal protection the purchase price should include. The 42-point checklist at the end of the post is the fast path through the clause review: run it on the vendor's draft, note which items are addressed, which are silent, and which are addressed in ways that expose you, then negotiate specifically on the gaps. The checklist is not a scorecard — a vendor who passes 38 of 42 is not necessarily a better contract partner than one who passes 35, because the weight of individual clauses varies with your situation. It is a completeness check that ensures you are not missing a category.

For context on the vendor landscape whose contracts you are likely reviewing: the pricing and structural differences between the major captioning vendors are covered in the pricing breakdown post. The comparison pages for Rev vs GlossCap, 3Play vs GlossCap, and Verbit vs GlossCap cover the feature and workflow differences that typically surface in the RFP phase. This post assumes you are in the contract phase with a specific vendor; the comparison pages are for the selection phase.

Section 1: Accuracy guarantees

The accuracy guarantee clause is the load-bearing clause for most L&D buyers, and it is the clause most often drafted in language that sounds substantive but is legally unenforceable. The goal in this section is a contract clause that specifies (a) what accuracy level is guaranteed, (b) how accuracy is measured, (c) on what corpus the measurement is taken, (d) what triggers a remediation obligation, and (e) what remediation looks like. A clause that specifies only (a) — a stated WER floor with no measurement or remediation terms — is operationally useless.

The accuracy floor

The number most buyers want in the contract is 99% word-level accuracy, because 99% is the standard that corresponds to WCAG 2.1 AA compliance as specified in the DCMP Captioning Key. Some vendors will resist this floor if their standard product does not reliably hit it. The resistance usually comes in three forms: (1) they propose a lower floor (95% or 97%) and characterise 99% as "enterprise" tier; (2) they propose 99% with a measurement methodology that is easier to hit (corpus-wide WER on broadcast-quality audio rather than per-video spot-check on your content); or (3) they agree to 99% but strip out the measurement protocol, leaving the floor notional. All three should be declined.

If a vendor tells you that 99% word-level accuracy is achievable only on a higher-priced tier, ask them specifically whether their standard product is designed to produce WCAG 2.1 AA compliant captions. If the answer is yes, the 99% floor should be achievable on the standard product. If the answer is "it depends," that is an answer about vocabulary complexity, not about the vendor's ceiling. Vocabulary complexity is precisely what the per-customer glossary exists to solve — the 99% accuracy post covers this in detail. A vendor whose standard product cannot reach 99% on your content with your glossary applied is a vendor whose product does not do what WCAG requires. Contracting with them anyway at a lower accuracy floor is a compliance exposure, not a negotiating compromise.

The measurement protocol

The accuracy guarantee clause must specify the measurement protocol. The standard that courts and DOJ enforcement have applied to WCAG 2.1 AA caption compliance is the DCMP Captioning Key protocol: word-level accuracy measured on a sampled passage, including substitution errors (wrong word), insertion errors (word added), and deletion errors (word dropped), with synchronisation and speaker identification also checked. The WER number is the total error count divided by the total word count in the sampled passage. This is the protocol your compliance programme should be using for the post-delivery quality spot-check in the caption QA methodology post, and it should be the protocol specified in the vendor contract.
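The word-level calculation described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation of the DCMP protocol: it covers only the substitution, insertion, and deletion count, and it does not check synchronisation or speaker identification. The function names and the tokenisation rule (lower-cased, whitespace-split, edge punctuation stripped) are assumptions made for the example.

```python
from typing import List

_EDGE_PUNCTUATION = ".,;:!?\"'()"


def _tokenize(text: str) -> List[str]:
    # Assumption: case-insensitive comparison, punctuation at word edges ignored.
    words = (w.strip(_EDGE_PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + insertions + deletions) / words in the reference passage."""
    ref = _tokenize(reference)
    hyp = _tokenize(hypothesis)
    if not ref:
        raise ValueError("reference passage is empty")
    # Word-level edit distance, computed one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word dropped
                curr[j - 1] + 1,     # insertion: extra word in the captions
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    return 1.0 - word_error_rate(reference, hypothesis)
```

A six-word reference such as "The patient takes metoprolol twice daily" captioned as "The patient takes metro pro lol twice daily" scores one substitution and two insertions, so a single mangled drug name produces a WER of 0.5 on that passage.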

Vendors sometimes propose alternative measurement protocols that produce higher apparent accuracy on the same output. Common alternatives: measuring over a large corpus (which dilutes errors concentrated in vocabulary-dense passages against the total word count), measuring only on audio that has been pre-screened for quality (excluding the hard segments), or measuring over a subset of the vocabulary (which excludes proper nouns from the error count, where most of the hard errors live). Each of these alternatives produces a number that looks like 99% but does not correspond to WCAG compliance on your actual training content. The contract clause should specify: DCMP Captioning Key word-level accuracy, measured per-video on a random sample of at least 10% of submitted content per quarter, on a passage selected by the buyer (not by the vendor), without pre-screening for audio quality.
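To see why corpus-wide measurement flatters the result, the sketch below compares a pooled accuracy figure with per-video figures. The two videos and their word and error counts are invented for illustration, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VideoSample:
    title: str
    words: int   # words in the buyer-selected sample passage
    errors: int  # substitutions + insertions + deletions found in that passage


def accuracy(words: int, errors: int) -> float:
    return 1.0 - errors / words


def corpus_accuracy(samples: List[VideoSample]) -> float:
    # Pools every word and every error before dividing (the vendor-friendly view).
    return accuracy(sum(s.words for s in samples), sum(s.errors for s in samples))


def failing_videos(samples: List[VideoSample], floor: float = 0.99) -> List[str]:
    # Per-video view: each video has to clear the floor on its own.
    return [s.title for s in samples if accuracy(s.words, s.errors) < floor]


# Illustrative numbers only: one long, easy video and one short, jargon-heavy one.
samples = [
    VideoSample("Company overview", words=9000, errors=20),
    VideoSample("SDK deep dive", words=1000, errors=60),
]
```

With these invented numbers the pooled figure is 99.2%, which clears a 99% floor, while the SDK video sits at 94% and fails on its own. That gap is the reason to contract for per-video measurement.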

The remediation trigger and SLA

The accuracy floor and the measurement protocol establish what accuracy should be. The remediation clause establishes what happens when it is not. The remediation clause should specify: (1) the trigger — what measured accuracy level creates a remediation obligation (the threshold should be at or slightly above the accuracy floor, to give you a right to act before the content is actively non-compliant); (2) the remediation window — how long the vendor has to deliver a corrected caption track on the failing content; (3) what remediation looks like — re-captioning by a human reviewer on the failing content, not regeneration by the same ASR system that produced the failing output; and (4) the escalation consequence — what right the buyer has if the vendor fails the remediation SLA (typically a service credit, a right to re-submit with no charge, or a termination right for cause).

Vendor standard contracts sometimes define remediation as the vendor "using commercially reasonable efforts to improve accuracy." This language is operationally meaningless — it creates no specific obligation, no deadline, and no consequence for failure. Replace it with specific language: "Vendor shall deliver a corrected caption track for any failing submission within [N] business days of written notification of the accuracy failure. Failure to deliver within the remediation window entitles Buyer to (a) a service credit equal to the submission fee for the failing content and (b) a right to submit the same content at no charge." The N should be 5–10 business days for standard content, with a 2-business-day rush-remediation option specified for content designated as compliance-critical at submission.
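A remediation window stated in business days is easy to track mechanically. The sketch below is one way to compute the deadline, assuming weekends are the only non-working days (public holidays are ignored) and using the upper end of the 5 to 10 day range as the standard window. All function and parameter names are hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start: date, business_days: int) -> date:
    """Return the date that falls `business_days` working days after `start`."""
    if business_days < 0:
        raise ValueError("business_days must be non-negative")
    current = start
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4; holidays not modelled
            remaining -= 1
    return current


def remediation_deadline(notified_on: date, compliance_critical: bool,
                         standard_days: int = 10, rush_days: int = 2) -> date:
    # Assumption: the window starts on the date of written notification.
    days = rush_days if compliance_critical else standard_days
    return add_business_days(notified_on, days)


def remedies_triggered(notified_on: date, delivered_on: date,
                       compliance_critical: bool) -> bool:
    # True when the corrected track arrived after the window closed, which is
    # when the service credit and free re-submission rights would apply.
    return delivered_on > remediation_deadline(notified_on, compliance_critical)
```

For a written notice sent on Friday 12 June 2026, this gives a deadline of 26 June for standard content and 16 June for content flagged as compliance-critical.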

Scope of the accuracy obligation

One clause gap that consistently surfaces in captioning vendor disputes is the scope of the accuracy obligation: specifically, whether the 99% floor applies to all vocabulary in the caption output or only to "general vocabulary." Some vendor contracts contain language that excludes "technical terms," "proper nouns," or "specialised vocabulary" from the accuracy measurement. This exclusion, if accepted, eliminates the accuracy obligation on precisely the vocabulary category where caption accuracy is most important for L&D training content. Product names, drug names, SDK terms, and LMS-specific workflow vocabulary are the categories of words that break auto-captions and that matter most to learners using the captions to understand the training. Excluding them from the accuracy measurement makes the accuracy guarantee meaningless. The contract should specify that the accuracy obligation covers all vocabulary in the submitted content, including proper nouns, technical terms, and product-specific vocabulary supplied in the customer glossary.

Section 2: SLA terms

The SLA section covers turnaround time (how long to deliver captions after submission), back-catalogue retrofit timeline (how long to complete the initial bulk captioning of your existing library), rush-order handling, and the consequences of SLA breach. SLA terms are the terms most often treated as administrative boilerplate during contract review and most often the source of operational friction during the contract term.

Turnaround SLA

Standard turnaround SLA for a pre-recorded training video submission is 24–48 hours from submission to delivery. Some vendors offer a tiered turnaround: standard (48 hours), expedited (24 hours at a premium), and rush (same-day at a higher premium). The contract should specify which tier applies by default and what the pricing structure is for upgrades. The turnaround SLA should be measured from submission (the moment the buyer uploads the file and the submission is confirmed by the vendor's system) to delivery (the moment the vendor delivers the caption file to the buyer in the agreed format), not from when the vendor begins processing — "processing start" is a variable that the vendor controls and can use to extend the effective turnaround time.

The SLA should also specify what happens to turnaround time if the vendor's system has a processing backlog. Some vendors include language that suspends the turnaround SLA during "periods of high demand", which is essentially a provision that the SLA does not apply when you most need it (large batch submissions, back-catalogue retrofit periods, pre-deadline pushes). Push back on this language: the SLA should apply uniformly or not at all. If the vendor needs to price higher for guaranteed turnaround on large batches, that is a fair negotiating position; an SLA that suspends itself under load is not.

Back-catalogue retrofit SLA

If you are running a back-catalogue retrofit — captioning a large existing library of training video — the turnaround SLA for individual submissions is less important than the total retrofit completion timeline. The contract should specify a retrofit completion SLA: a committed date by which the vendor will deliver captions for a batch of a stated size (for example, "800 hours of content delivered within 90 days of contract start, with incremental delivery of at least 100 hours per week"). This is particularly important for buyers with a compliance deadline — an ADA Title II deadline or an EAA enforcement date — where the retrofit must complete before a specific date or you are non-compliant. The retrofit SLA should be a separate, named obligation in the contract, with a separate service credit structure if the retrofit completion date is missed.

The retrofit SLA should also specify the processing pipeline: does the vendor process all submissions in parallel, or is there a queue that limits throughput? Some vendors have daily processing capacity limits that mean a 1,000-hour back catalogue submitted on day one will not be completed for six months regardless of the stated turnaround SLA. Ask the vendor to specify their maximum daily processing throughput, and contract the retrofit completion date based on that throughput number and your total hours, not on the per-submission turnaround SLA applied naively to the batch size.
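The throughput arithmetic is worth doing before the negotiation. The sketch below assumes the vendor processes at a constant rate every calendar day; the 5.5 hours-per-day figure is invented for illustration, and the function names are hypothetical.

```python
import math


def retrofit_days(total_hours: float, daily_throughput_hours: float) -> int:
    """Calendar days needed to clear a backlog at a fixed daily throughput."""
    if daily_throughput_hours <= 0:
        raise ValueError("daily throughput must be positive")
    return math.ceil(total_hours / daily_throughput_hours)


def required_daily_throughput(total_hours: float, deadline_days: int) -> float:
    """Throughput the vendor must sustain to finish inside the deadline."""
    if deadline_days <= 0:
        raise ValueError("deadline must be at least one day")
    return total_hours / deadline_days


# A 1,000-hour catalogue at an assumed 5.5 hours/day takes about six months.
days_needed = retrofit_days(1000, 5.5)

# An 800-hour retrofit due in 90 days needs roughly 8.9 hours/day sustained.
needed_rate = required_daily_throughput(800, 90)
```

If the vendor's stated maximum throughput is below the required rate, the completion date in the contract is not achievable regardless of the per-submission turnaround SLA.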

SLA breach consequences

The SLA breach consequence is the clause most often left vague in vendor standard contracts. A vendor standard contract often specifies that SLA breaches entitle the buyer to a "service credit" without specifying the credit amount, how credits are calculated, how they are applied, or whether they expire. The practical effect is a contract where the SLA exists on paper but produces no real consequence when breached. The contract should specify: the service credit amount as a percentage of the per-submission fee for each business day the SLA is missed (a common structure is 10–20% of the submission fee per day, capped at the full submission fee for that item), the credit application method (applied to the next invoice, not held in a credit pool that expires), and a termination-for-cause right if the vendor misses the SLA on more than 10% of submissions in any rolling 90-day period. The termination-for-cause right is the teeth of the SLA: without it, the vendor can consistently miss turnaround by one day and pay small service credits without any structural consequence.
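The credit structure and the termination trigger described above reduce to two small calculations. This sketch uses the 10% per-day rate from the low end of the range quoted and the 10% breach-rate threshold; the function names are hypothetical, and a real contract would define "late" and the rolling window in its own terms.

```python
def service_credit(submission_fee: float, business_days_late: int,
                   daily_rate: float = 0.10) -> float:
    """Credit owed on one late submission, capped at that submission's fee."""
    if business_days_late <= 0:
        return 0.0
    return min(submission_fee, submission_fee * daily_rate * business_days_late)


def termination_right_triggered(missed: int, total: int,
                                threshold: float = 0.10) -> bool:
    """True when more than `threshold` of submissions in the window missed the SLA.

    `missed` and `total` are counts for one rolling 90-day period; selecting
    the submissions that fall inside that period is left to the caller.
    """
    if total == 0:
        return False
    return missed / total > threshold
```

A $200 submission delivered three business days late earns a $60 credit at the 10% rate, and the cap means a submission that is fifteen days late earns $200, not $300. Eleven missed submissions out of 100 in the window trigger the termination right; exactly ten do not, because the clause says "more than 10%".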

Rush-order provisions

Most captioning contracts include a rush-order option at a premium price, but the contract terms for rush orders are often underspecified: the premium is stated (typically 50–100% of the standard per-minute rate), but the definition of what constitutes a rush order, the guaranteed turnaround for rush submissions, and the breach consequence for missed rush-order SLAs are not. Clarify all three in the contract. A rush-order commitment that has no defined turnaround and no breach consequence is a marketing feature, not a contractual obligation. The rush-order turnaround should be same-day for submissions below a stated size (typically 60 minutes of content) and four hours for submissions below a stated shorter size (typically 15 minutes), with a breach consequence of no charge for the submission if the rush turnaround is missed.

Section 3: BAA provisions

The Business Associate Agreement (BAA) is a HIPAA-required contract between a covered entity (a healthcare provider, health plan, or healthcare clearinghouse) and a vendor who creates, receives, maintains, or transmits Protected Health Information (PHI) on the covered entity's behalf. For captioning vendors, the BAA question surfaces when training video content contains PHI — which is a broader category than most L&D leads initially recognise. PHI in training video includes: patient names mentioned in clinical scenarios, medical record numbers cited in workflow demonstrations, account identifiers that can be combined with medical information to identify an individual, and audio or screen-capture content of actual patient encounters used in clinical training modules.

The BAA requirement applies to HIPAA-covered entities and their business associates (vendors who receive PHI in the course of providing services). It also extends to business associates of business associates: if your captioning vendor sub-contracts audio processing to a third-party ASR provider, that sub-contractor may also need to execute a sub-BAA. The BAA clause in your captioning vendor contract should address both the primary BAA and any required sub-BAAs with processing sub-contractors.

When a BAA is required

The BAA is required if the captioning vendor receives or processes audio or video content that contains PHI. The trigger is not whether your organisation is a healthcare company — it is whether the specific content being submitted for captioning contains PHI. A healthcare-adjacent SaaS company whose training library includes clinical workflow demonstrations, patient-case scenarios, or recorded clinical simulations is submitting PHI to the captioning vendor for each of those submissions. The BAA is required for those submissions even if the majority of the training library is non-clinical content. The practical recommendation is: if any training content in your library could contain PHI, execute a BAA before any content is submitted. The cost of a BAA negotiation is one-time; the cost of a HIPAA breach discovered during an audit because the captioning vendor processed PHI without a BAA is significant and potentially includes OCR civil monetary penalties.

Some captioning vendors will tell you that a BAA is not required because "we delete the audio immediately after processing." This is not a correct reading of HIPAA. The BAA obligation is triggered by the act of receiving or processing PHI, not by the duration of the retention. If the vendor's ASR system processes the audio and generates a transcript, the vendor has created, received, and transmitted PHI on your behalf — regardless of what happens to the source file afterward. The BAA is required from the moment of processing.

What a HIPAA-sound BAA must contain

A HIPAA-compliant BAA must contain the following provisions, per 45 CFR 164.504(e): (1) a description of the permitted and required uses and disclosures of PHI by the business associate; (2) a provision requiring the business associate to not use or disclose PHI other than as permitted in the agreement or required by law; (3) a requirement that the business associate use appropriate safeguards and report any breaches or security incidents; (4) a requirement that the business associate ensure any sub-contractors or agents agree to the same restrictions; (5) an obligation to make PHI available to the covered entity on request; (6) an obligation to make the business associate's records and practices available to HHS for audit; and (7) a termination clause specifying that the business associate will return or destroy PHI upon termination of the agreement.

Most captioning vendor BAA templates contain the required provisions, but with carve-outs and limitations that weaken them. Treat any carve-out or limitation on the seven provisions above as a red flag in a vendor BAA draft.

BAA scope for non-healthcare organisations

Non-healthcare organisations — SaaS companies, universities, financial services firms — may also have BAA obligations if they handle PHI as part of their product or service (for example, a practice-management SaaS company, a health benefits platform, or a university health system). For these organisations, the BAA requirement in the captioning context follows the same rule: if the training content contains PHI, the captioning vendor is a business associate and a BAA is required. The analysis is the same whether the organisation is a covered entity itself or a business associate of a covered entity. For universities with a student health service that produces training video covering FERPA-covered educational records in addition to HIPAA-covered health records, the BAA obligation applies to the health-related content and a separate FERPA data-handling clause applies to the educational-records content.

Section 4: Data retention and deletion

The data retention and deletion clause governs what the vendor retains, for how long, and what happens to it on contract termination. This clause is underspecified in most vendor standard contracts — the vendor's default position is to retain everything indefinitely, which is convenient for the vendor (the transcripts and accuracy data are valuable for model training) and a liability for the buyer (retained audio and transcripts containing PHI, confidential training content, or employee personal data represent an ongoing privacy and security exposure). Negotiate the retention clause as carefully as the accuracy clause.

Audio retention window

The vendor's standard retention clause for submitted audio typically takes one of three positions: indefinite retention ("data is retained as needed for service delivery and improvement"), a stated fixed window ("audio files are deleted within 90 days of processing completion"), or a configurable window ("customer data retention can be configured to 30/60/90 days in the account settings"). The first position is unacceptable without a negotiated amendment; the second depends on the stated window; the third is the most buyer-friendly default. The recommended position is a 30-day audio retention window from processing completion, with the option to extend to 90 days for reprocessing purposes if needed, and a hard deletion obligation on contract termination (no grace period).

The audio retention window matters most when the submitted content contains sensitive categories of information: PHI, financial data cited in compliance training, employee personal data in onboarding video, and trade secrets in product training. For a 1,400-hour training library submitted for retrofit captioning, the audio files are retained on the vendor's infrastructure for the duration of the retention window — a period during which a vendor-side breach would expose your entire training library. A 30-day window limits the maximum exposure to the current processing batch; an indefinite retention window means the entire retrofit is at risk until you exit the contract.

Transcript and metadata retention

The transcript (the machine-generated text output of the ASR process, before the caption file is formatted) and the caption file itself (the SRT, VTT, or other formatted output) are distinct from the source audio and have different retention interests. The vendor typically retains transcripts and caption files longer than source audio, because transcripts are the data input to accuracy measurement, quality-improvement workflows, and (in some vendor architectures) model training. The contract should specify: (1) whether transcripts are retained, and for how long; (2) whether transcripts are used for vendor model training, and whether they can be excluded from model training on request; and (3) whether the transcript is accessible to the buyer via the vendor's platform after the retention window (useful for audit trail purposes) or is deleted with no buyer access after the window.

The model-training use clause is a specific negotiating point for buyers whose training content contains confidential information. Many captioning vendor contracts include language permitting the vendor to use de-identified transcripts for "service improvement" or "model training." The de-identification claim is typically that the customer account identifiers are stripped before the transcript enters the training pipeline. That claim is often technically true but contextually insufficient: the transcript text itself may be enough to identify the customer (training content that describes company-specific products, processes, or people is contextually identifiable even without the account ID). For buyers with highly sensitive training content, the contract should explicitly exclude their content from vendor model training, with the exclusion enforceable via contractual audit rights.

Data residency

Data residency (where the audio and transcript data is physically stored during processing and retention) is a clause category that frequently matters for EU-domiciled organisations operating under GDPR, Canadian organisations under PIPEDA, and US healthcare organisations with specific state data-residency requirements. The vendor's standard contract typically does not specify data residency, which means the data may be processed on infrastructure in any jurisdiction the vendor uses. Under GDPR's international-transfer rules (Chapter V), the vendor must either process the data in the EU/EEA or in a country with an adequacy decision (Article 45), or rely on an appropriate safeguard under Article 46, typically Standard Contractual Clauses (SCCs) attached to the Data Processing Agreement (DPA), to govern the transfer. For US healthcare organisations with state data-residency requirements (common in states with enhanced medical privacy statutes), the data residency clause must specify that audio and transcripts are stored on US-only infrastructure.

If your organisation has data residency requirements, do not proceed on the assumption that the vendor's standard infrastructure meets them. Ask the vendor to provide a written statement of data residency (where is the data processed and stored, on whose infrastructure, under what data-processing agreement), and have your privacy counsel review the response before contract signature.

Deletion rights on termination

The deletion clause on contract termination should be non-negotiable: all customer audio, transcripts, and associated metadata should be deleted within 30 days of contract termination, with a written certificate of deletion delivered to the buyer. The vendor's standard contract often specifies a longer deletion window (60–90 days) or conditions deletion on a written request from the buyer. Neither is acceptable. The deletion obligation should be automatic and self-executing — the vendor should not require a trigger from the buyer to execute the deletion obligation. The 30-day window is the standard used by most enterprise SaaS vendors in DPAs; captioning vendors should be able to match it. The certificate of deletion is important for HIPAA BAA compliance (the BAA specifically requires the business associate to return or destroy PHI upon termination) and for your internal audit trail.

Section 5: Glossary ownership and portability

The glossary ownership clause is the clause most specific to captioning vendors — it has no direct parallel in most software vendor contracts and is correspondingly overlooked during contract review. The per-customer glossary is the mechanism that separates a captioning vendor that produces 99% accuracy on your training content from one that produces 87%. As described in the customer glossary architecture post, the glossary compounds over time: every accuracy correction made during the human-review step feeds the customer glossary model, making each subsequent submission more accurate than the last. The accuracy improvement over a 12-month engagement — from 87% on the initial back catalogue to 99%+ on content submitted in month twelve — represents a significant investment of time, editorial labour, and institutional knowledge.

This investment lives in the per-customer glossary and the accuracy model built on it. The contract should make explicit that this investment is your asset, not the vendor's. Four specific clause provisions are required:

Glossary IP ownership

The contract should specify that all glossary terms, term-frequency data, and accuracy-improvement data derived from the buyer's content are the exclusive property of the buyer. Some vendor contracts contain language characterising the per-customer glossary as vendor IP (on the grounds that the glossary is stored in the vendor's system and was produced by the vendor's model). This characterisation is incorrect and should be rejected: the glossary terms are the buyer's own vocabulary (your product names, your internal jargon, your proprietary identifiers), and the accuracy data is derived from the buyer's content. The vendor's contribution is the infrastructure to store and apply the glossary, not the glossary content itself. The contract should state this explicitly: "All glossary terms, term definitions, custom vocabulary data, and accuracy-improvement data derived from Buyer's submitted content are the exclusive intellectual property of Buyer. Vendor's rights with respect to Buyer's glossary are limited to the right to use the glossary to improve caption accuracy for Buyer's submissions during the term of this Agreement."

Glossary export rights

The contract should guarantee the buyer's right to export the full per-customer glossary at any time, in a documented, machine-readable format (JSON, CSV, or TSV), without requiring the buyer to submit a support ticket or wait more than five business days for the export to be delivered. The export right has two practical applications: (1) as a negotiating tool — a buyer who can export their glossary on demand has a credible exit option that limits vendor leverage during contract renewal, because the glossary value does not lock the buyer in; and (2) as a data-continuity mechanism — in the event the vendor experiences a service outage, goes out of business, or is acquired, the buyer can take the exported glossary to a replacement vendor and restart from a substantive accuracy baseline rather than from zero. Specify both the format (machine-readable, not a formatted PDF) and the delivery window (five business days, with the export delivered as a downloadable file in the vendor's portal rather than via email attachment).
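It helps to attach a concrete example of "documented, machine-readable format" to the contract as a schedule. The sketch below shows one possible shape for a glossary export in JSON and CSV. The field names (`term`, `category`, `common_misrecognitions`) are a hypothetical schema for illustration, not any vendor's actual export format.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class GlossaryEntry:
    term: str                      # the correct spelling, as it should appear
    category: str                  # e.g. "product_name", "drug_name"
    common_misrecognitions: List[str] = field(default_factory=list)


def to_json(entries: List[GlossaryEntry]) -> str:
    return json.dumps([asdict(e) for e in entries], indent=2, ensure_ascii=False)


def to_csv(entries: List[GlossaryEntry]) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["term", "category", "common_misrecognitions"])
    for e in entries:
        # Assumption: variants are pipe-separated inside a single CSV cell.
        writer.writerow([e.term, e.category, "|".join(e.common_misrecognitions)])
    return buf.getvalue()


entries = [
    GlossaryEntry("GlossCap", "product_name", ["gloss cap", "glass cap"]),
    GlossaryEntry("metoprolol", "drug_name", ["metro pro lol"]),
]
```

Whatever schema is agreed, the point of writing it down is that a replacement vendor can ingest the file without manual re-keying, which a formatted PDF does not allow.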

Glossary return on termination

The contract should specify that on termination, the vendor will deliver a complete export of the per-customer glossary in the agreed format within the same 30-day window as the audio and transcript deletion. The glossary return obligation should be specified as a prerequisite to the vendor's deletion rights: the vendor cannot delete the per-customer glossary data until they have confirmed delivery of the glossary export to the buyer. This sequencing prevents the failure mode where the vendor deletes the glossary as part of the termination data-deletion process before delivering the export, leaving the buyer with no accuracy history to take to a replacement vendor. The caption feedback loop post covers the accuracy compounding model; the glossary export on termination is the mechanism that prevents the compounding investment from being lost on vendor transition.

Model training exclusion

Related to the data retention clause: the contract should explicitly exclude the buyer's per-customer glossary from the vendor's general model training. The vendor's interest is to use per-customer vocabulary data to improve the general model (which benefits all customers). The buyer's interest is to prevent their proprietary vocabulary from being incorporated into a general model that their competitors also use. This tension is most acute for SaaS companies whose product vocabulary (feature names, API terms, proprietary workflow identifiers) represents competitive IP. Specify: "Vendor may not use Buyer's glossary terms, term definitions, or vocabulary data to train, fine-tune, or otherwise improve any model used for customers other than Buyer. Vendor may use Buyer's glossary data solely for improving caption accuracy on Buyer's own submissions." This clause is widely accepted by enterprise captioning vendors but is rarely in their standard contracts; it requires a specific request during negotiation.

Section 6: Integration and format SLAs

The integration and format SLA clause covers the vendor's obligations with respect to the output of their captioning service: the formats they will deliver, the LMS integrations they will support, the API uptime they guarantee, and what happens when integration behaviour changes. This clause category is often collapsed into a brief "supported formats" section in vendor contracts with no SLA terms attached to it.

Output format guarantees

The contract should specify the exact output formats guaranteed for the duration of the engagement. Standard formats for L&D use cases are SRT, VTT, TTML, and STL — the format guide post covers the compatibility and conversion considerations for each. Most captioning vendors support SRT and VTT as standard; TTML support for broadcast and enterprise LMS workflows varies; STL support is narrow. If your LMS import workflow requires a specific format, confirm that format is in the contract's guaranteed output list. The contract should also specify what happens if the vendor deprecates a supported format during the contract term — the default vendor position is that formats can be changed with 30 days notice, which is insufficient if a format change requires updating your LMS ingestion workflow.

For format deprecations or changes to the SRT/VTT schema (for example, changes to the timing precision, the character encoding, or the line-length formatting), the contract should require 90 days advance written notice and a parallel-delivery period where both the old and new format are delivered simultaneously, allowing the buyer to update their ingestion workflow before the old format is discontinued.

LMS integration SLAs

If the vendor provides a direct integration with your LMS — an API or native connector that allows caption files to be delivered directly to your LMS without manual upload — the integration should be covered by a separate SLA. LMS integrations fail for reasons that are often outside the captioning vendor's direct control (LMS API changes, authentication token expiry, LMS rate limits), but the captioning vendor is the party responsible for maintaining the integration, and the SLA should assign that responsibility clearly. The integration SLA should cover: uptime (what percentage of API requests succeed, measured over a rolling 30-day window), failure notification (how quickly the vendor notifies the buyer when an integration is degraded), and restoration window (how quickly the vendor restores integration functionality after a failure). A reasonable integration SLA is 99% uptime with a 4-hour notification window and a 24-hour restoration window for P1 integration failures.
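The uptime measure described above is simple enough to verify yourself from delivery logs. The sketch below assumes you can count successful and total API requests for the rolling 30-day window; the function names and the request counts are illustrative, not taken from any vendor's reporting.

```python
def uptime_pct(successful_requests: int, total_requests: int) -> float:
    """Share of API requests that succeeded in the window, as a percentage."""
    if total_requests == 0:
        return 100.0  # assumption: a window with no traffic counts as no downtime
    return 100.0 * successful_requests / total_requests


def meets_uptime_sla(successful_requests: int, total_requests: int,
                     target_pct: float = 99.0) -> bool:
    """True if the window meets the contracted uptime target (99% in the text)."""
    return uptime_pct(successful_requests, total_requests) >= target_pct


# Hypothetical 30-day window: 12,000 requests, 11,850 succeeded.
print(uptime_pct(11_850, 12_000))        # 98.75
print(meets_uptime_sla(11_850, 12_000))  # False
```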

API deprecation notice

If you are using the vendor's API for a custom integration — direct submission via API, webhook delivery of completed caption files, or programmatic glossary management — the contract should specify the API deprecation notice period. Vendor standard contracts often permit API changes or deprecations with 30 days notice, which is insufficient if your engineering team needs to update a production integration. The recommended notice period for any API change that breaks a documented integration is 90 days, with a 180-day notice period for major API version deprecations. The notice period should be accompanied by an obligation to maintain the old API version in parallel with the new version for the duration of the notice period.

Section 7: Pricing, volume, and overage terms

The pricing section of a captioning vendor contract is the one most buyers review carefully, but even in the pricing section there are terms that are routinely missed: overage pricing, unused-minutes policy, rate lock duration, and minimum volume commitments. These terms determine the unit economics of the engagement over the contract term, and they are where the vendor's standard contract most consistently favours the vendor over the buyer.

Overage pricing

Most captioning contracts are priced on a volume basis: a stated number of minutes or hours per month at a stated per-minute rate, with an overage rate for volume above the committed tier. The per-minute rate for overage is almost always higher than the contracted rate — often significantly higher, because the overage rate is the rate at which uncommitted volume is billed and the vendor has no competitive pressure at the point of overage billing. The gap between the committed rate and the overage rate determines how punishing it is to exceed your committed volume in a given month.

The contract should specify the overage rate explicitly as a multiple of the committed rate (for example, "overage volume is billed at 1.3× the committed per-minute rate"). If the vendor's standard contract specifies the overage rate as a separate dollar figure without relating it to the committed rate, calculate the multiple before signing. An overage rate of $2.50 per minute against a committed rate of $0.80 per minute is a 3.1× multiple — a punishing overage structure that makes exceeding the committed volume extremely expensive. This structure is common in vendor contracts designed to create pressure to commit to a higher annual volume tier rather than to operate near the boundary of the current tier. The vendor pricing breakdown post covers the volume tier structures of the major vendors; the contract review should confirm that the overage rate you have negotiated matches or improves on the published rate.
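The multiple calculation is a single division, sketched here with the $2.50 and $0.80 figures from the example. The 1.5× ceiling is the threshold used in the checklist later in this post; treat it as a negotiating benchmark rather than a rule.

```python
def overage_multiple(overage_rate: float, committed_rate: float) -> float:
    """Overage rate expressed as a multiple of the committed per-minute rate."""
    return overage_rate / committed_rate


def overage_is_acceptable(overage_rate: float, committed_rate: float,
                          max_multiple: float = 1.5) -> bool:
    """True if the overage multiple is at or below the negotiated ceiling."""
    return overage_multiple(overage_rate, committed_rate) <= max_multiple


multiple = overage_multiple(2.50, 0.80)
print(f"{multiple:.1f}x")                 # 3.1x
print(overage_is_acceptable(2.50, 0.80))  # False
```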

Unused minutes rollover

The unused-minutes policy determines what happens to committed volume you do not use in a given period. Vendor standard positions range from "minutes expire at the end of each billing period" (use-it-or-lose-it, most vendor-friendly) to "minutes roll over indefinitely" (most buyer-friendly) to "minutes roll over for 90 days" (typical compromise). For buyers with a back-catalogue retrofit followed by steady-state monthly volume, the rollover policy matters during the retrofit phase: if the back catalogue is submitted in months two and three, and the committed monthly volume is calibrated for steady-state submission, the overage in months two and three is expensive unless unused minutes from month one roll over.

The practical negotiating position is to request a 90-day rollover policy as a contractual commitment, with an explicit provision that rolled-over minutes are consumed before the current-period allotment (so the rollover minutes are used first, not left to expire after the current-period minutes). Some vendors will offer this in exchange for a slightly longer minimum commitment term; the trade is usually worth it if your submission volume is seasonal or retrofit-heavy in the early months of the engagement.

Rate lock duration

The rate lock clause specifies how long the committed per-minute rate is locked and what mechanism governs rate changes after the lock period. Vendor standard contracts typically lock the rate for the initial contract term (one or two years) and allow the vendor to change rates on renewal with 60–90 days notice. This is a reasonable baseline, but the language around the rate-change mechanism at renewal is important: the contract should specify that the rate at renewal cannot increase by more than a stated percentage (CPI plus 5%, or a flat 10% cap, are common benchmarks) without the buyer's consent. Without a cap, the vendor can reset the rate to market on renewal, eliminating the volume discount the buyer earned by committing to the initial term.
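A renewal cap is easy to model. The sketch below applies a percentage ceiling to a proposed renewal rate; for a "CPI plus 5%" clause, pass the published CPI figure plus five as `cap_pct`. The rates shown and the CPI value are hypothetical.

```python
def capped_renewal_rate(current_rate: float, proposed_rate: float,
                        cap_pct: float) -> float:
    """Highest rate the vendor can charge on renewal without buyer consent."""
    ceiling = current_rate * (1 + cap_pct / 100)
    return min(proposed_rate, ceiling)


# Flat 10% cap: a proposed jump from $0.80 to $1.00 is held to $0.88.
print(round(capped_renewal_rate(0.80, 1.00, cap_pct=10), 2))  # 0.88

# CPI plus 5%, assuming a hypothetical CPI of 3%: the cap is 8%.
assumed_cpi_pct = 3.0
print(round(capped_renewal_rate(0.80, 1.00, cap_pct=assumed_cpi_pct + 5), 3))  # 0.864
```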

The rate lock should also specify what happens to the rate if the buyer's volume increases during the term. Volume growth should not be penalised: the contract should specify that if the buyer's actual volume consistently exceeds the committed tier, the rate resets to the rate applicable to the higher tier (which should be a lower per-minute rate) rather than the overage rate. Automatic tier-ratcheting, where consistently exceeding the committed volume moves the buyer to the next tier's rate, is the buyer-friendly structure; static-tier pricing with overage at a premium rate is the vendor-friendly structure.

Minimum volume commitments and exit rights

Many captioning vendor contracts include a minimum annual volume commitment: a stated number of minutes or hours that the buyer commits to pay for regardless of actual usage. The minimum commitment is the vendor's mechanism for revenue certainty; it is the buyer's mechanism for securing a preferred rate. The contract should specify the consequences of falling below the minimum: typically, the buyer pays the difference between actual usage and the committed minimum at the committed rate. The exit right from a minimum commitment should be addressed explicitly: if the buyer's captioning programme is discontinued, or is reduced below the minimum for reasons outside the buyer's control (a budget cut, an acquisition, a legal hold), the buyer should have a right to exit the minimum commitment with a stated notice period (typically 90 days) rather than paying out the remaining committed term at the full minimum volume.
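The typical shortfall charge described above can be sketched as follows. The commitment size, usage, and rate are hypothetical figures chosen only to show the arithmetic.

```python
def shortfall_charge(actual_minutes: float, committed_minutes: float,
                     committed_rate: float) -> float:
    """Amount owed for unused committed volume, billed at the committed rate."""
    shortfall = max(0, committed_minutes - actual_minutes)
    return shortfall * committed_rate


# Hypothetical: 6,000 committed annual minutes at $0.80, only 4,500 used.
print(round(shortfall_charge(4_500, 6_000, 0.80), 2))  # 1200.0
```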

Section 8: Termination and transition rights

The termination clause governs both termination for convenience (the buyer decides to exit) and termination for cause (the vendor has materially breached the contract). Both require specific provision in the contract; the standard vendor contract is heavily biased toward the vendor on both.

Termination for cause

The contract should specify the buyer's right to terminate for cause if the vendor materially breaches the contract and fails to cure the breach within a stated notice period. Material breach events for a captioning vendor contract include: (1) failure to deliver compliant captions at the guaranteed accuracy level on more than a stated percentage of submissions in any rolling 90-day period; (2) failure to execute the BAA or a violation of BAA terms; (3) failure to delete data within the required retention window on a written request; (4) SLA breaches that exceed the stated threshold; and (5) a change of control where the successor entity is a direct competitor of the buyer. The cure period for material breach should be 30 days for operational breaches (accuracy, SLA) and 10 days for compliance breaches (BAA violations, data deletion failures). If the vendor fails to cure within the cure period, the buyer should have a right to exit the contract with no penalty and with a prorated refund of any prepaid fees for the undelivered contract term.

Termination for convenience

Termination for convenience — the buyer's right to exit without cause — is usually available in vendor contracts but with a notice period and, sometimes, an early termination fee. The notice period for convenience terminations is typically 30–90 days; an early termination fee (ETF) is common for contracts with minimum volume commitments and may be structured as a percentage of the remaining committed value. The contract should specify: (1) the notice period for convenience termination; (2) whether an ETF applies and how it is calculated; (3) the buyer's right to exit without ETF if the vendor initiates a material change to the service (price increase above the rate-lock cap, format deprecation without adequate notice, reduction in service level); and (4) the buyer's right to exit without notice in the event of a vendor BAA violation or a data breach affecting the buyer's content.

Data return and transition assistance

The transition provisions should specify three obligations: (1) data return — all buyer data (audio, transcripts, glossary, metadata) returned or deleted within 30 days of termination, as described in the data retention section; (2) transition assistance — the vendor will provide reasonable assistance to the buyer's new vendor during the transition period, including delivering the glossary export in a format compatible with common captioning platforms; and (3) post-termination read-only access — the buyer retains read-only access to previously delivered caption files via the vendor's platform for 60 days after termination, to allow time to archive completed caption files before the account is deprovisioned.

Transition assistance is the provision most often omitted from vendor contracts and most valuable during actual vendor transitions. A captioning vendor who has processed 1,200 hours of your content over a two-year engagement has knowledge of your content's audio characteristics, your common error patterns, and your glossary structure that is not fully captured in the glossary export. A transition assistance SLA — specifying what the vendor will provide (format conversion for the glossary export, documentation of the per-customer model configuration, a technical call with the new vendor's onboarding team) — captures the transition value that would otherwise be lost. It also creates an incentive for the vendor to support a clean exit rather than to obstruct the transition.

Seven contract failure modes

These are the contract failures we have seen play out in mid-market captioning engagements. Each one is preventable with the clause negotiation described above; each one is expensive to remediate after the contract is signed.

Failure mode 1: Vague accuracy language with no measurement protocol

The contract states "Vendor will use best efforts to achieve 99% word-level accuracy." The 99% number is there; the measurement protocol is not. The vendor uses a corpus-wide WER measurement that produces 99% accuracy on the overall library while specific video categories (clinical content, engineering onboarding, product training) are at 83–89% on the vocabulary categories that matter most. The buyer discovers the accuracy gap during an accessibility audit or a learner complaint and has no contractual basis for a remediation demand because the accuracy guarantee is expressed as a "best efforts" commitment, not a specific measurable obligation. The outcome is a six-month negotiation about whether the vendor has met the accuracy standard, during which the non-compliant content remains in service. The fix is specifying the measurement protocol in the contract before signature — the DCMP per-video spot-check methodology, a random sample drawn by the buyer, with a specific remediation trigger and timeline.
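The gap between corpus-wide and per-category accuracy is an averaging effect, and a small sketch makes it visible. The word counts below are invented, and accuracy is simplified here to correct words divided by total words, which is not a full WER calculation.

```python
def corpus_vs_category_accuracy(samples):
    """samples maps category -> (correct_words, total_words).

    Returns (corpus_accuracy_pct, {category: accuracy_pct}).
    """
    total_correct = sum(correct for correct, _ in samples.values())
    total_words = sum(total for _, total in samples.values())
    corpus = 100 * total_correct / total_words
    per_category = {name: 100 * correct / total
                    for name, (correct, total) in samples.items()}
    return corpus, per_category


# Hypothetical library: a large general corpus and a small clinical one.
samples = {
    "general": (94_900, 95_000),
    "clinical": (850, 1_000),
}
corpus, per_category = corpus_vs_category_accuracy(samples)
print(round(corpus, 2))          # 99.74
print(per_category["clinical"])  # 85.0
```

The library clears a 99% corpus-wide figure while the clinical category sits at 85%, which is why the measurement protocol needs to be per-video and buyer-sampled.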

Failure mode 2: Missing BAA despite PHI exposure

The L&D team submits a 200-hour clinical-skills training library for captioning without recognising that the content contains PHI (patient names used in illustrative scenarios, procedure codes cited in workflow demonstrations, medical record identifiers used as examples in documentation-accuracy training). The vendor processes the audio, produces the captions, and the content enters the LMS. Eighteen months later, during a HIPAA compliance review triggered by an acquisition due-diligence process, the compliance officer discovers that the captioning vendor processed PHI without a BAA for the duration of the engagement. The vendor has been acquired and their data retention practices during the period of the engagement are undocumented. The acquirer's legal team flags the BAA gap as a material compliance issue. The remediation path involves retroactively documenting the PHI inventory, notifying OCR of a potential HIPAA violation as part of the good-faith self-disclosure programme, and executing a remediation plan. The entire exposure is preventable by executing the BAA before the first clinical-content submission. The rule is: if any content in the submission queue could contain PHI, execute the BAA before the first submission, not after.

Failure mode 3: Glossary IP owned by the vendor

The buyer's per-customer glossary grows over two years from 200 terms to 1,800 terms, incorporating the full product vocabulary of a rapidly growing SaaS company with quarterly release cycles. The glossary represents two years of iterative accuracy corrections, post-delivery edits, and editorial decisions about how specific terms should be rendered in captions. The vendor's contract contains standard language characterising per-customer glossary data as vendor IP "generated as part of the service delivery process." On contract renewal, the vendor increases rates by 40%. The buyer attempts to exit to a lower-cost competitor. The vendor refuses to export the glossary on the grounds that it is vendor IP and is not deliverable to a competing vendor. The buyer faces the choice of accepting the 40% rate increase or re-starting from a zero-term glossary with the new vendor, losing two years of compounding accuracy improvement. The glossary IP clause — explicitly establishing the per-customer glossary as buyer property, exportable on request, returned on termination — eliminates this failure mode entirely.

Failure mode 4: No deletion obligation on termination

A healthcare-adjacent SaaS company terminates a captioning vendor relationship at the end of a two-year contract. The vendor's contract specifies that customer data is "purged on a scheduled basis in accordance with vendor data management policies," with no specific timeline or buyer notification. Three months after termination, the vendor experiences a security breach. The breach notification discloses that customer data from the prior 12 months is among the exposed data — including audio files from the healthcare company's clinical-content training library. The healthcare company investigates and discovers that the vendor retained audio files for the terminated account for over 90 days post-termination, despite the assumption that deletion was handled on termination. The absence of a specific deletion obligation and a certificate of deletion left the healthcare company unable to confirm its data was not exposed. A specific 30-day deletion obligation, automatic on termination, with a written certificate of deletion, would have established a clear evidence baseline for the breach investigation.

Failure mode 5: SLA measured on response time instead of completion

The vendor's 48-hour turnaround SLA specifies "48 hours from submission to processing completion." The vendor interprets "processing completion" as the moment the ASR system completes the machine-generated transcript, not the moment the caption file is delivered to the buyer in the agreed format. The vendor's workflow includes a human-review step for healthcare-adjacent content that adds 12–36 hours to the processing pipeline between ASR completion and delivery. The buyer consistently receives caption files 60–72 hours after submission and is told the SLA is not being breached because "processing completed within 48 hours." An SLA measured to an internal processing milestone rather than to delivery creates a gap between what the buyer needs (caption files delivered within 48 hours) and what the vendor has committed to (internal processing completed within 48 hours). The SLA should be specified end-to-end: from the moment of confirmed submission to the moment the caption file is accessible to the buyer in the vendor's delivery channel. Any internal processing step between submission and delivery is within the 48-hour window, not additive to it.
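The difference between the two readings of the SLA comes down to which timestamp stops the clock. A minimal sketch, with hypothetical timestamps consistent with the scenario above:

```python
from datetime import datetime, timedelta


def sla_met(submitted_at: datetime, clock_stops_at: datetime,
            sla_hours: int = 48) -> bool:
    """True if the elapsed time from submission is within the SLA window."""
    return clock_stops_at - submitted_at <= timedelta(hours=sla_hours)


submitted = datetime(2026, 6, 1, 9, 0)         # confirmed submission
asr_done = submitted + timedelta(hours=40)     # internal ASR completion
delivered = submitted + timedelta(hours=66)    # file available to the buyer

print(sla_met(submitted, asr_done))   # True: the vendor's reading
print(sla_met(submitted, delivered))  # False: the end-to-end reading
```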

Failure mode 6: Overage pricing that eliminates the unit-economics case

A 200-employee SaaS company commits to a captioning plan at 500 minutes per month at $0.75 per minute ($375/month). The overage rate is $3.50 per minute, which the sales team frames as "applicable only if you significantly exceed your plan." In month three, a new onboarding programme launches with 12 product-demonstration videos totalling 340 minutes. Combined with the normal monthly training content of 280 minutes, the month's submissions total 620 minutes — 120 minutes over the plan. The overage bill for that month is $420 (120 × $3.50), bringing the total for the month to $795. The monthly cost is 2.1× the committed plan cost. The buyer upgrades to the 1,000-minute plan at $1.20 per minute ($1,200/month) and discovers that the per-minute rate at the 1,000-minute tier is more expensive than the 500-minute tier because the plan pricing is structured non-linearly. The unit-economics case for the captioning programme — presented to the VP of L&D based on the 500-minute-plan rate — is no longer correct at the volume the programme actually runs at. The overage pricing structure should be reviewed at the time of contract signature, not when the first overage invoice arrives.
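The invoice arithmetic in this scenario can be reproduced in a few lines, which is a useful exercise to run against your own projected volumes before signature. The function name is illustrative; the figures are the ones from the example above.

```python
def monthly_bill(minutes_used: float, plan_minutes: float,
                 plan_rate: float, overage_rate: float) -> float:
    """Plan fee plus overage charges for one billing month."""
    base = plan_minutes * plan_rate
    overage = max(0, minutes_used - plan_minutes) * overage_rate
    return base + overage


minutes = 340 + 280  # onboarding videos plus normal monthly content
bill = monthly_bill(minutes, plan_minutes=500, plan_rate=0.75, overage_rate=3.50)
print(bill)                   # 795.0
print(round(bill / 375, 2))   # 2.12, relative to the $375 plan cost
```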

Failure mode 7: Auto-renewal with a long notice window

The captioning vendor contract has a one-year term with automatic renewal and a 90-day non-renewal notice window. The contract's anniversary date falls in month nine of the calendar year, so the non-renewal notice is due by month six. The L&D team completes the vendor evaluation process and selects a new captioning vendor in month 10. The procurement team sends a non-renewal notice in month 11, believing the 90-day notice requirement refers to 90 days before the end of the calendar year. The notice arrives roughly two months after the contract's anniversary date, which is too late: the contract has already auto-renewed for another year. The buyer is committed to a 12-month contract with a vendor they have already decided to exit. The outcomes are: negotiate an early exit (usually possible but not free), operate in parallel with the new vendor for up to 12 months while paying the incumbent (expensive), or continue with the incumbent for another year (defeats the purpose of the evaluation process). The auto-renewal notice window should be calendared on contract signature, with a 30-day reminder before the notice window opens. The notice window itself should be negotiated to 30 days for contracts under $50K annual value.
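Calendaring the deadline is a date subtraction from the contract anniversary, not from the calendar year end. In the sketch below the anniversary date is hypothetical, and the reminder is interpreted as falling 30 days before the last date on which notice can be given.

```python
from datetime import date, timedelta


def non_renewal_dates(anniversary: date, notice_days: int = 90,
                      reminder_days: int = 30):
    """Return (notice_deadline, reminder_date) for an auto-renewing contract."""
    deadline = anniversary - timedelta(days=notice_days)
    reminder = deadline - timedelta(days=reminder_days)
    return deadline, reminder


# Hypothetical anniversary in month nine.
deadline, reminder = non_renewal_dates(date(2027, 9, 15))
print(deadline)  # 2027-06-17
print(reminder)  # 2027-05-18
```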

The 42-point contract review checklist

Run this checklist on any captioning vendor contract above $10K annual value before signature. Items marked (M) are mandatory — missing or unacceptable language on a mandatory item should block signature pending negotiation. Items marked (R) are recommended but negotiable depending on your organisation's risk profile.

Accuracy (8 items)

Item | Priority | What to look for
1. Accuracy floor stated | M | 99% word-level accuracy specified, not "commercially reasonable efforts"
2. Measurement protocol specified | M | DCMP Captioning Key or equivalent per-video spot-check methodology named
3. Sample selection by buyer | M | Buyer (not vendor) selects the sample passage for accuracy measurement
4. All vocabulary in scope | M | No carve-out for "technical terms" or "proper nouns" from the accuracy obligation
5. Remediation trigger defined | M | Specific accuracy threshold (e.g., below 97%) that triggers remediation obligation
6. Remediation window specified | M | 5–10 business days for standard; 2 days for compliance-critical content
7. Remediation method defined | R | Human-reviewer re-captioning required, not ASR regeneration
8. Breach consequence for failed remediation | M | Service credit plus termination-for-cause right if remediation SLA is missed repeatedly

SLA terms (6 items)

Item | Priority | What to look for
9. Turnaround SLA end-to-end | M | Measured from submission confirmation to file delivery, not to internal processing completion
10. No load-suspension clause | M | SLA applies uniformly; no suspension during "high demand" periods
11. Back-catalogue retrofit SLA | M | Separate total-completion SLA for retrofit batch, not just per-submission SLA
12. Rush-order terms specified | R | Rush turnaround defined (same-day / 4-hour), premium rate stated, breach consequence included
13. SLA breach credit formula | M | Specific credit amount (% of submission fee per day), applied to next invoice without expiry
14. Termination right for repeated SLA breach | M | Buyer can terminate for cause if SLA breach rate exceeds threshold in any 90-day period

BAA provisions (6 items)

Item | Priority | What to look for
15. BAA executed before first PHI submission | M | BAA must be in place before any clinical, health-related, or PHI-containing content is submitted
16. BAA scope covers all PHI in submitted content | M | No "de-identified audio" carve-out if audio contains spoken PHI
17. Breach notification window ≤ 72 hours | M | Vendor must notify buyer within 72 hours of breach discovery (not 30 days)
18. Sub-processor sub-BAA required | M | All sub-processors who receive PHI must execute sub-BAAs; buyer to be notified of sub-processor changes
19. Destruction clause is automatic, not request-triggered | M | PHI destroyed within 30 days of termination without requiring a buyer request
20. Certificate of destruction delivered | R | Written certificate of PHI destruction delivered within 30 days of termination

Data retention and deletion (6 items)

Item | Priority | What to look for
21. Audio retention window ≤ 30 days | M | Source audio deleted within 30 days of processing completion
22. Transcript retention and use specified | R | Transcript retention window stated; use for model training addressed
23. Model training exclusion available | R | Buyer can opt out of transcript/glossary data use for general model training
24. Data residency stated | M | Data residency jurisdiction stated; GDPR or state data-residency requirements addressed if applicable
25. Deletion on termination automatic | M | All data deleted within 30 days of termination without requiring buyer request
26. Certificate of deletion delivered | M | Written certificate of deletion for all buyer data within 30 days of termination

Glossary ownership and portability (5 items)

Item | Priority | What to look for
27. Glossary IP owned by buyer | M | Per-customer glossary explicitly buyer property; vendor rights limited to use during term
28. Glossary export right, any time | M | Buyer can export glossary on request, machine-readable format, delivered within 5 business days
29. Glossary returned on termination | M | Complete glossary export delivered within 30 days of termination, before deletion
30. Model training exclusion for glossary | R | Glossary terms not used for training general models available to other customers
31. Export format documented | R | Export format (JSON, CSV, TSV) specified and documented in contract or schedule

Integration and format SLAs (5 items)

Item | Priority | What to look for
32. Output formats guaranteed | M | Specific formats (SRT, VTT, TTML, STL) named in contract; required formats confirmed in list
33. Format deprecation notice ≥ 90 days | M | 90-day advance notice required for format deprecation; parallel delivery during notice period
34. LMS integration uptime SLA | R | API uptime stated (99%), failure notification window (4 hours), restoration window (24 hours)
35. API deprecation notice ≥ 90 days | R | 90-day notice for breaking API changes; 180-day notice for major version deprecations
36. Integration change notification | R | Buyer notified of any integration configuration change within 5 business days

Pricing, volume, and overage (4 items)

Item | Priority | What to look for
37. Overage rate as multiple of committed rate | M | Overage rate stated as multiple of committed rate (≤ 1.5×); absolute dollar figure alone is insufficient
38. Rollover policy ≥ 90 days | R | Unused minutes roll over for at least 90 days; rolled-over minutes consumed before current-period allotment
39. Rate lock with renewal cap | M | Rate lock for contract term; renewal increase capped at CPI + 5% or 10% flat
40. Minimum volume exit right | R | Buyer can exit minimum volume commitment with 90 days notice for budget or programme reasons

Termination and transition (2 items)

Item | Priority | What to look for
41. Termination for cause rights specified | M | Buyer can terminate for cause with 30-day cure period for operational breach; 10-day for compliance breach
42. Transition assistance SLA included | R | Vendor provides glossary export, format documentation, and technical transition call for new vendor
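If you track the review in a spreadsheet or script, the decision rule for the checklist is straightforward: any unmet mandatory (M) item blocks signature, and unmet recommended (R) items go on the negotiation list. A minimal sketch of that rule follows; the `ChecklistItem` structure and the sample review results are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChecklistItem:
    number: int
    name: str
    priority: str    # "M" (mandatory) or "R" (recommended)
    satisfied: bool  # reviewer's judgement of the contract language


def review(items):
    """Split unmet items into signature blockers and negotiable gaps."""
    blockers = [i for i in items if i.priority == "M" and not i.satisfied]
    negotiable = [i for i in items if i.priority == "R" and not i.satisfied]
    return {"ready_to_sign": not blockers,
            "blockers": blockers,
            "negotiable": negotiable}


# Three items from the checklist with made-up review outcomes.
items = [
    ChecklistItem(2, "Measurement protocol specified", "M", False),
    ChecklistItem(27, "Glossary IP owned by buyer", "M", True),
    ChecklistItem(42, "Transition assistance SLA included", "R", False),
]
result = review(items)
print(result["ready_to_sign"])                # False
print([i.number for i in result["blockers"]])  # [2]
```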

FAQ

Do all captioning vendors accept contract modifications, or do most require signing the standard agreement?

Most mid-market captioning vendors accept contract modifications in the clause areas described above, particularly for accounts above $10K annual value. The resistance points vary by vendor: accuracy measurement protocol and remediation terms are commonly negotiated; BAA provisions are mandatory for any vendor serving healthcare-adjacent buyers and most will have a standard BAA template; glossary IP and model training exclusion require specific negotiation and are not in most vendors' standard contracts but are accepted when requested; data retention windows are often configurable at the account level as a self-service setting rather than requiring contract modification. The practical approach is to run the 42-point checklist on the vendor's standard contract, identify the items that are silent or unacceptable, and present them as a single redline document rather than opening individual negotiations on each item. Most vendors will work through a single redline faster than they will respond to a series of individual modification requests.

For contracts below $10K annual value, vendor flexibility on modifications is more limited. At this price point, the most important contract provisions to insist on are the BAA (if PHI is in scope), the accuracy floor with a specific measurement reference (even if the full DCMP protocol is not contractually specified, a reference to WCAG 2.1 Success Criterion 1.2.2, Captions (Prerecorded), a Level A criterion that AA conformance includes, is sufficient for most compliance purposes), and the glossary export right. These three items are achievable even on vendor self-serve plan agreements if the buyer is willing to represent their compliance requirements clearly and escalate to the vendor's enterprise or legal team if the online-only agreement does not address them.

At what point in the procurement process should we start the contract review — after the vendor is selected, or during the RFP phase?

The contract review is most efficient when it starts during the RFP phase, as a parallel track, with a specific contract-review step built into the vendor evaluation process. The RFP playbook describes Section 4 of the RFP scoring matrix as covering SLA, pricing, and contract terms. Including contract terms in the RFP scoring (specifically, whether the vendor accepts the key modifications described above) gives you contract-quality data before you select a preferred vendor. This prevents the common scenario where a vendor wins the RFP on accuracy and pricing merits but has unacceptable contract terms, leading to either protracted post-selection negotiation or a decision to sign on unfavourable terms because the evaluation investment has already been made. The most efficient procurement flow is: RFP Section 4 includes a yes/no checklist of required contract provisions; vendor responses to Section 4 are scored alongside accuracy and pricing; the preferred vendor is selected with contract compliance as a scored criterion, not an afterthought.

Our organisation does not process any PHI. Do we still need to address data retention and deletion clauses?

Yes. PHI is the most acute category of sensitive data in training video, but it is not the only one. Training video at most organisations contains confidential business information (proprietary product architecture, unreleased feature demonstrations, customer-account data used as training examples), employee personal data (names, job titles, and identifying information in onboarding video), and legally privileged information (compliance training that references specific regulatory decisions or enforcement matters). Each of these categories creates retention exposure if retained indefinitely on the vendor's infrastructure. The 30-day audio retention window and the automatic deletion on termination are appropriate for all training video content, not just PHI-containing content. The deletion certificate is particularly valuable for companies that undergo regular privacy compliance audits (SOC 2 Type II, ISO 27001, privacy impact assessments) because it provides documentary evidence that vendor data-handling commitments were executed. For organisations in verticals with strong data governance requirements — financial services, legal, pharmaceutical — the data residency clause is equally important even without direct PHI involvement.
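A retention clause is easier to audit when the deletion deadline is computed rather than remembered. The sketch below assumes the 30-day audio window discussed in this post and a hypothetical inventory of retained files (file ID mapped to processing date), such as a report you might request from the vendor; the function names are illustrative.

```python
from datetime import date, timedelta

# Assumption: the 30-day audio window discussed above; use your contract's figure.
AUDIO_RETENTION_DAYS = 30


def audio_deletion_due(processed_on: date,
                       retention_days: int = AUDIO_RETENTION_DAYS) -> date:
    """Date by which the vendor should have deleted the audio file."""
    return processed_on + timedelta(days=retention_days)


def overdue_items(inventory: dict[str, date], as_of: date,
                  retention_days: int = AUDIO_RETENTION_DAYS) -> list[str]:
    """Return IDs of files still held after their deletion deadline has passed.

    `inventory` is a hypothetical vendor-supplied report of files still retained,
    mapping file ID -> date processing completed.
    """
    return sorted(
        file_id
        for file_id, processed_on in inventory.items()
        if audio_deletion_due(processed_on, retention_days) < as_of
    )
```

Any ID returned by `overdue_items` is a file the vendor still holds past the contractual window, which is the kind of evidence a privacy audit will ask for.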

We are currently in a contract with a captioning vendor. Can we negotiate modifications during the contract term?

Mid-contract modifications are possible but require more negotiating leverage than modifications negotiated before signature. The most practical entry points for mid-contract modification are: the annual account review (most enterprise captioning vendors conduct an annual review meeting; this is the appropriate forum to raise contract modifications alongside a discussion of renewal terms); a contract renewal negotiation (if the contract is within 90 days of auto-renewal, this is the highest-leverage point for modification); or a response to a service failure (an accuracy breach, a data retention failure, or an SLA miss creates a legitimate opening to renegotiate the affected clause, and often provides leverage to improve related clauses simultaneously). The modifications most achievable mid-contract are the glossary export right (often achievable as a platform feature request if the vendor has not yet built self-service export), the BAA execution (achievable at any time if the vendor has a standard BAA template and the buyer has PHI-containing content), and data retention configuration (often available as a self-service account setting). Accuracy measurement protocol changes and rate cap provisions for renewal are more difficult mid-contract and are best targeted at the renewal negotiation.

What is the correct approach to the glossary ownership question if we use a captioning vendor that also provides an LMS integration — is the glossary associated with the LMS or with the captioning vendor?

The glossary ownership question applies to the per-customer glossary maintained by the captioning vendor, regardless of whether the captioning vendor is integrated with your LMS. The LMS is the delivery system for the caption file; the glossary that influenced the accuracy of the caption file is maintained by the captioning vendor. The two are separate. If your captioning vendor delivers caption files directly to your LMS via a native integration — as described in the LMS migration checklist post — the caption files in your LMS are the output of the captioning process. The glossary that produced those outputs remains with the captioning vendor. On exit from the captioning vendor, you take your glossary export to the new captioning vendor; the caption files already in your LMS stay there regardless of which captioning vendor you use going forward. The only complication arises if the captioning vendor provides a custom LMS integration that embeds vendor-specific metadata in the caption file (for example, vendor-specific format extensions or accuracy-confidence scores embedded in the VTT file that only the vendor's LMS integration can interpret). In this case, the migration complexity is higher — transitioning to a new captioning vendor may require re-processing content already in the LMS if the new vendor's caption files use a different metadata schema. Avoid proprietary metadata extensions by requiring standard-compliant SRT/VTT output without vendor-specific extensions in the contract output format clause.
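A quick review of delivered files can show whether vendor metadata is creeping into the output. The following is a rough heuristic sketch, not a WebVTT validator: it assumes vendor-specific data would most likely appear in NOTE blocks or in extra header lines, and it only flags those for human review. The `review_vtt` name is illustrative.

```python
def review_vtt(text: str) -> list[str]:
    """Return findings for a WebVTT file; an empty list means nothing was flagged.

    Heuristic only: flags NOTE blocks and any block that is neither a cue,
    a STYLE block, nor a REGION block, so a person can inspect them.
    """
    lines = text.lstrip("\ufeff").splitlines()
    if not lines or not lines[0].startswith("WEBVTT"):
        return ["file does not start with the WEBVTT signature"]

    # Group everything after the signature line into blank-line-separated blocks.
    blocks, current = [], []
    for line in lines[1:]:
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)

    findings = []
    for block in blocks:
        first = block[0]
        if first.startswith("NOTE"):
            findings.append(f"NOTE block (check for vendor metadata): {first[:60]}")
        elif first.startswith(("STYLE", "REGION")):
            continue
        # A cue has its timing line first, or second after an optional identifier.
        elif not any("-->" in candidate for candidate in block[:2]):
            findings.append(f"unrecognised block: {first[:60]}")
    return findings
```

A NOTE block is permitted by the WebVTT format, so a finding here is a prompt to look, not proof of a contract breach.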

We are evaluating a captioning vendor who refuses to sign a BAA because they claim their system does not retain audio after processing. Is this a legally valid basis for declining the BAA?

No. The BAA obligation under HIPAA is not conditioned on the duration of data retention; it is triggered by the act of creating, receiving, maintaining, or transmitting PHI. A captioning vendor who receives audio containing PHI, processes it through an ASR system, and produces a transcript has received, processed, and transmitted PHI — regardless of the retention duration. The "we delete immediately" argument is sometimes advanced by vendors who have not had their position reviewed by HIPAA counsel. The appropriate response is to ask the vendor to obtain written confirmation from their legal team that their data handling is outside BAA scope, specifically addressing the audio-processing step. In most cases, a vendor legal review of the question will conclude that a BAA is required. If the vendor continues to decline the BAA after a legal review, that vendor is not an appropriate partner for any content that contains PHI, and the RFP selection should be re-evaluated. The compliance exposure of processing PHI without a BAA is material; the inconvenience of executing a BAA is minimal. Any captioning vendor serving healthcare-adjacent buyers at scale will have a standard BAA template — if a vendor claims not to have one, that is a signal about their customer base and legal maturity, not about the technical correctness of their BAA-decline position.

How do the contract terms described here apply to GlossCap specifically?

GlossCap's standard subscription agreement addresses the core contract provisions as follows. Accuracy: the product is designed to the 99% accuracy target commonly used for WCAG 2.1 AA conformance, with glossary-biased decoding as the primary accuracy mechanism; the accuracy measurement reference in the subscription agreement is the WCAG 2.1 AA SC 1.2.2 standard applied per video. BAA: GlossCap executes BAAs for accounts with healthcare, life-sciences, or clinical-education content; the BAA template covers all sub-processors with access to customer audio. Glossary ownership: per-customer glossary terms are explicitly buyer property; self-service export is available at any time from the account dashboard in CSV and JSON format. Data retention: audio files are deleted 30 days after processing completion by default, configurable to 7 or 14 days in account settings; transcripts are deleted on account termination. Termination: month-to-month plans can be terminated with 30 days' notice and no early termination fee; annual plans include a 90-day notice window, with no early termination fee if the product's core accuracy standard is not met. If you are comparing GlossCap's contract terms against the checklist above, the pricing page summarises which items the standard subscription agreement covers without modification and which require a custom enterprise agreement. Enterprise agreements (Org plan) include a redline process for all 42 checklist items above; Solo and Team plan agreements cover the mandatory items.

Captioning vendor contracts that work as hard as the product does

The post above covers the contract-phase work that protects your organisation after the vendor selection is made. But the contract-phase negotiation only matters if the vendor can actually deliver the accuracy standard you are negotiating into the agreement. The accuracy guarantee clause in a captioning vendor contract is only enforceable if you have a way to measure whether it is being met — and the measurement is only meaningful if the vendor's product can reach the 99% threshold on your actual training content with your actual vocabulary.

GlossCap's per-customer glossary model is built to pass the contract-level accuracy test: DCMP Captioning Key word-level accuracy, measured on a random per-video spot-check, on a passage selected by you rather than by us, including all proper nouns, product names, API identifiers, and domain-specific vocabulary in the accuracy count. The accuracy improvement from the initial submission to month six of a compounding glossary engagement — from the 80–88% auto-caption baseline to 96–99% on domain-specific content — is documented in the caption feedback loop post. The glossary that drives this improvement is your property, exportable from your account dashboard at any time in CSV and JSON format, returned to you in full on termination.
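To show what a word-level spot-check involves, here is a minimal sketch. It assumes accuracy is computed as one minus the word-level edit distance divided by the number of words in a human-verified reference passage, which is a common word-error-rate convention; the formula is not quoted from the DCMP Captioning Key or from any vendor's agreement, and the function names are illustrative.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens, keeping identifiers like api_v2 whole.

    Assumption: capitalisation and punctuation errors are not counted here.
    """
    return re.findall(r"[a-z0-9_']+", text.lower())


def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    """Word-level Levenshtein distance: substitutions + insertions + deletions."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from the caption
                            curr[j - 1] + 1,      # extra word in the caption
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference_text: str, caption_text: str) -> float:
    """Accuracy of the caption against the reference, in the range 0.0 to 1.0."""
    reference = tokenize(reference_text)
    if not reference:
        raise ValueError("reference passage is empty")
    errors = word_errors(reference, tokenize(caption_text))
    return max(0.0, 1.0 - errors / len(reference))
```

Because every token counts equally, a mis-transcribed product name or API identifier costs exactly as much as any other word, which matches the requirement that proper nouns and domain vocabulary stay in the accuracy count.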

The contract checklist above was written with GlossCap's own subscription agreement in mind: every mandatory item on the checklist is addressed in our standard Team and Org plan agreements. Review the checklist, compare it against your current captioning vendor's contract, and compare the gap against what Rev, 3Play, or Verbit offer in their enterprise contracts. If the contract quality matters as much as the product quality — and for a two-year captioning engagement with a back catalogue of training content, it should — the full checklist review is worth the time before signature.

Review GlossCap's subscription terms
