Compliance Operations · Published 2026-07-04
Caption vendor audit rights and examination evidence: what regulators look for in a FINRA examination or OCR complaint review, and what documentation your vendor must produce
An HR director at a 200-person broker-dealer receives a FINRA examination request letter on a Tuesday morning. Among the 14 item categories: “Training records for all registered representatives for the period January 2024 to December 2025, including documentation that training materials meet applicable accessibility requirements under the Americans with Disabilities Act.” She has 10 business days.
She opens the LMS. Training completion records: present. Training content: present (34 compliance training videos). Caption files: 22 of 34 have auto-generated captions the LMS produced. 8 have SRT files from a vendor the firm stopped using 18 months ago, after that vendor was acquired. 4 have no captions.
When she calls the LMS vendor about the auto-caption processing records, they say auto-caption artefacts are retained for 12 months. The oldest training videos were auto-captioned 22 months ago. When she contacts the acquiring company about the SRT files, they have the account records but not the per-file processing logs, accuracy scores, or reference transcripts.
She has 10 business days to respond to an examiner who asked for evidence that training was accessible. She can prove that training happened. She cannot prove that training was accessible.
This guide covers the documentation framework that prevents that situation: what examiners request, what your vendor must contractually agree to retain, how chain of custody works for caption files, and the annual compliance package that allows you to respond to any examination request in 48 hours.
TL;DR
Five things every organisation that uses caption vendors for training content needs to know before an examination request arrives:
- Caption compliance examinations are triggered by four mechanisms. OCR (Department of Education or HHS) processes disability accommodation complaints from employees and programme participants. FINRA examines broker-dealer training records in its routine examination cycle. EEOC processes ADA Title I employment discrimination charges. State civil rights agencies and Attorneys General enforce state-level accessibility requirements. Each trigger produces a different examination, but all of them converge on the same documentation gap: organisations can show that training happened but cannot show that training was accessible.
- The reference transcript is a distinct document from the caption file, and examiners use it to verify accuracy. The reference transcript is the text of what the speaker actually said — the ground truth against which caption accuracy is measured. An organisation that relies on auto-captions without retaining reference transcripts cannot respond to an accuracy verification request. A vendor that destroys reference transcripts after 12 months is a liability in any examination where training predates that window.
- Vendor documentation obligations must be negotiated before the relationship begins. A vendor that has no contractual obligation to retain processing logs, accuracy records, reference transcripts, and glossary version history for a defined minimum period is a liability the moment an examination request arrives. The audit-rights clause belongs in the vendor contract at signature, not in the amendment the organisation tries to negotiate after receiving an examination letter.
- Chain of custody for caption files means six documented facts. Who produced the file. When. What tool or service version was used. What glossary was active at the time of processing. Where the file is stored now. Who has modified it since original production and what was changed. Most LMS-stored caption files have none of these six facts documented anywhere. The auto-caption file that lives in the LMS with no metadata beyond filename and upload date has no chain of custody.
- Pre-assembling an annual caption compliance package is the structural solution to the 48-hour problem. An organisation that maintains a current compliance package — governance policy, vendor contract with audit rights, accuracy sample records, accommodation request log, chain-of-custody record for the 10 highest-risk caption files — can respond to any examination request with confidence. An organisation that assembles this package in response to an examination letter does so in crisis mode, against a 10-business-day deadline, with a vendor whose cooperation is not contractually guaranteed.
Who examines caption compliance and what triggers an examination
Caption compliance is examined by seven distinct regulatory bodies or groups of bodies, each with a different trigger, a different documentation focus, and a different consequence spectrum. Understanding which body is most likely to examine your organisation, and what it looks for when it does, determines which documentation you need to maintain at the highest priority.
Office for Civil Rights (OCR) — Department of Education
OCR at the Department of Education enforces Title II of the ADA (state and local government entities, including public universities, public school districts, and state workforce agencies) and Section 504 of the Rehabilitation Act (any entity receiving federal financial assistance from the Department of Education, including private colleges and universities with federal student aid, Head Start grantees, and vocational rehabilitation programmes). OCR investigations are complaint-driven — they begin when a specific individual files a complaint with the OCR regional office.
For training video captions, the typical OCR complaint scenario: a deaf or hard-of-hearing employee or programme participant is assigned required training on a platform with inadequate auto-captions, requests accommodation, and either does not receive it or receives it after a delay that caused the person to miss a deadline. The OCR complaint is filed against the educational institution or federally funded programme, not against the caption vendor.
OCR complaint volume has grown substantially since 2020 as more training content moved online. OCR uses a rapid resolution process for straightforward accommodation failures (30-60 days), but organisations that have a systematic documentation problem — no accommodation request log, no caption accuracy evidence, no policy document — are referred to the full investigation track, which can take 12-36 months and conclude with a resolution agreement that specifies corrective actions and monitoring periods.
Office for Civil Rights (OCR) — HHS
OCR at HHS enforces Section 504 for entities receiving federal financial assistance from HHS: hospitals that receive Medicare or Medicaid, community health centres, social service organisations with HHS grants, substance use disorder treatment programmes, state vocational rehabilitation agencies funded by HHS, and many nonprofit service organisations. The training video caption context: workforce training for healthcare workers, social service staff, and programme participants in federally funded programmes. A hospital system where clinical staff complete HIPAA security training, infection control procedure training, or mandatory annual Joint Commission education using auto-captioned videos with unacceptable accuracy has Section 504 exposure if any employee with a disability relies on those captions.
FINRA — Financial Industry Regulatory Authority
FINRA examines broker-dealers in its routine examination cycle — annual for large firms, less frequent for smaller firms. Unlike OCR, FINRA examinations are not complaint-triggered; they happen on schedule regardless of whether anyone has complained. Training records are a standard examination category. FINRA Rule 3110 requires that broker-dealers maintain a supervisory system reasonably designed to achieve compliance with applicable securities laws and FINRA rules, which includes compliance training. ADA Title I requires that all training content be accessible to employees with disabilities. The intersection: a FINRA examiner reviewing training records may ask whether training content is accessible as part of a broader inquiry into whether the firm's training programme is legally compliant. See our detailed guide to financial services captioning under FINRA, SEC Rule 17a-4, and broker-dealer regulatory training requirements for the full regulatory framework.
EEOC — Equal Employment Opportunity Commission
EEOC processes ADA Title I employment discrimination charges filed by employees and applicants. A charge alleging that an employer failed to provide accessible training materials, or failed to accommodate a disability-related request for accessible training, is an ADA Title I charge that EEOC investigates. EEOC may subpoena training records, caption files, and accommodation request documentation as part of the investigation. Unlike FINRA, and like the complaint-driven OCR process, EEOC does not examine organisations on a schedule: an EEOC investigation begins only when a charge is filed by an individual.
OFCCP — Office of Federal Contract Compliance Programs
OFCCP enforces Section 503 of the Rehabilitation Act for federal contractors and subcontractors with contracts above the threshold ($10,000 generally; $15,000 for specific contract types). Federal contractors must take affirmative action to employ and advance in employment qualified individuals with disabilities. Accessible training is part of affirmative action in employment. OFCCP conducts compliance evaluations (audits) of federal contractors and can request training records as part of the evaluation. Organisations with significant federal contract revenue — defence contractors, healthcare providers with CMS contracts, IT services firms — are at elevated risk.
State civil rights agencies and Attorneys General
State laws in many jurisdictions impose accessibility requirements that are coextensive with or exceed federal requirements. California's FEHA, New York's NYSHRL, Illinois's Human Rights Act, Massachusetts's Chapter 151B, and Washington State's WLAD all prohibit disability discrimination in employment. Several states have also enacted specific web accessibility statutes that apply to employers and public accommodations. State Attorneys General have enforcement authority over public accommodations under state civil rights laws. State civil rights agencies process complaints through investigation procedures similar to EEOC. The enforcement consequence is typically a state court action or state administrative hearing rather than a federal resolution agreement.
State bar associations — CLE provider audits
State bar associations that approve continuing legal education providers increasingly include accessibility requirements in their provider standards. A CLE provider that delivers training content via video without adequate captions may fail a provider audit or receive a complaint from a bar member with a disability. The documentation requirements parallel the OCR investigation: accommodation request log, caption accuracy evidence, accessibility policy. The enforcement consequence is typically loss of CLE provider accreditation — a significant commercial consequence for CLE providers. This scenario is covered tangentially in our guide to captioning for legal department and law firm training.
The four documentary evidence pillars every examination requests
Despite the differences among FINRA examinations, OCR investigations, EEOC charge investigations, and OFCCP audits, all four converge on the same four categories of documentary evidence. Understanding these four categories — and why each matters independently — is the foundation for building a documentation framework that can respond to any examination.
Pillar 1: The caption file itself, with production metadata
The caption file — the SRT, VTT, or platform-native file that displays text on screen during video playback — is the first thing any examiner looks at. But the file alone is not sufficient evidence. An examiner reviewing a caption file is asking: was this file produced by a process that reliably achieves adequate accuracy? Is this the file that was actually available to employees when they completed the training? Has this file been modified since it was produced?
To answer these questions, the examiner needs production metadata: when was the file produced, by whom (vendor or internal), using what tool or service version, with what accuracy-enhancing configurations (glossary, language model). None of this information is stored in the SRT or VTT file format itself. It must be stored separately — either in a processing log maintained by the vendor or in an internal documentation system. The caption file without production metadata is evidence that captions exist. It is not evidence that captions are accurate, that the production process is reliable, or that the file has not been modified.
Pillar 2: The accuracy measurement record
Caption accuracy is the substantive question in any accessibility examination: did this caption file accurately convey the spoken content? A file that exists but is 78% accurate does not satisfy WCAG 2.1 AA Success Criterion 1.2.2's requirement for captions that are accurate, synchronised, complete, and accessible. An examiner who reviews a caption file may be able to assess accuracy qualitatively — obvious vocabulary errors are visible in the text — but the formal accuracy record is a measurement using a defined methodology (typically the DCMP Captioning Key formula: word error rate measured against a reference transcript) taken at a defined date by a defined person or process.
The accuracy measurement record answers: what score did this file receive, measured by what method, against what reference, on what date, by whom? An organisation that has never measured caption accuracy has no accuracy record to produce. An organisation that relies on a vendor's SLA claim (“99% accuracy guaranteed”) without per-file accuracy records has a guarantee but no evidence. The distinction matters: a guarantee says the vendor commits to a standard; a record says the standard was verified for this specific file. See our guide to caption vendor accuracy evaluation methodology for the full measurement framework.
Pillar 3: The accommodation request and resolution log
An accommodation request log is the operational evidence that an organisation's accessibility system functions as documented. It records: who requested an accommodation (identified by a case number for privacy), when, what accommodation was requested, how the organisation responded, how long the response took, and how the accommodation was resolved. The log is not just a legal requirement under ADA's reasonable accommodation process — it is the evidence that examiners use to assess whether the organisation's accessibility programme is responsive to actual need.
A clean accommodation log — complete, contemporaneous, with prompt response times and resolved outcomes — tells an examiner that the organisation takes accessibility requests seriously and has a functioning process. A missing, sparse, or incomplete log raises the opposite inference: that accommodation requests were not tracked, not responded to promptly, or not resolved. An organisation that received an accommodation request for accessible training captions and cannot document what happened to that request is in a significantly worse position than one that has documented the request, response, and resolution even if the resolution took longer than ideal.
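The log fields described above can be sketched as a small record type. This is a minimal illustration, not a prescribed schema: the class name, field names, case-number format, and sample entries are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class AccommodationRequest:
    case_number: str                 # case number instead of a name, for privacy
    received: date                   # when the request arrived
    requested: str                   # what accommodation was requested
    first_response: Optional[date] = None
    resolved: Optional[date] = None
    resolution: str = ""             # how the request was resolved

    def days_to_response(self) -> Optional[int]:
        """Calendar days between receipt and first response, if any."""
        if self.first_response is None:
            return None
        return (self.first_response - self.received).days

def open_cases(log):
    """Case numbers of requests with no recorded resolution date."""
    return [r.case_number for r in log if r.resolved is None]

# Hypothetical sample entries
log = [
    AccommodationRequest(
        "AR-0001", date(2025, 3, 3), "Corrected captions for module 12",
        first_response=date(2025, 3, 4), resolved=date(2025, 3, 10),
        resolution="Human-reviewed caption file uploaded",
    ),
    AccommodationRequest("AR-0002", date(2025, 6, 2), "Transcript for module 7"),
]
```

Running `open_cases(log)` periodically is one way to catch requests that were received but never documented through to resolution, which is the gap the paragraph above warns about.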
Pillar 4: The caption programme governance policy
The governance policy — the formal document that specifies how the organisation approaches caption compliance for training content — is the framework evidence that contextualises the other three pillars. Examiners use the governance policy to assess whether the organisation has thought systematically about accessibility or treats it ad hoc. A governance policy specifies: which content requires captions (scope definition), what accuracy standard is required, who is responsible for ensuring captions are produced and reviewed, how accommodation requests are received and resolved, how vendors are selected and monitored, and how compliance is reported. See our governance policy template for a complete framework and our guide to building a caption compliance programme for the organisational context.
An organisation without a governance policy — one that produces captions reactively, without documented standards, without assigned responsibility, without a monitoring process — cannot demonstrate systematic compliance. An OCR resolution agreement for an organisation without a governance policy will typically require the organisation to write and implement one as a corrective action. Writing it in response to an examination is harder than writing it in advance: under examination conditions, the policy is being written to satisfy an examiner rather than to actually govern the programme.
The chain-of-custody problem for caption files
Chain of custody is a concept from evidence law: a record of who has handled a piece of evidence, from its origin to its presentation in court, that allows a court to conclude that the evidence has not been tampered with or substituted. Applied to caption files, chain of custody means a documented record of a caption file’s complete history from original production to current state. Without a chain of custody, an examiner cannot assess whether the caption file available today is the caption file that was available to employees when they completed the training — or whether it has been corrected, altered, or substituted since.
The six elements of caption file chain of custody
A complete chain of custody record for a caption file documents six facts:
- Producer: Who produced the original caption file — a named vendor (with vendor name, account ID, service name), an internal tool (with tool name and version), or an LMS auto-caption engine (with platform name, caption engine version if available).
- Production timestamp: When the caption file was produced by the original production process — not the date the file was uploaded to the LMS, which may be days or weeks later, but the date the captioning process completed.
- Tool and configuration: What tool or service version was used, what model version (if using an ASR engine), and what configuration was active — particularly whether a custom glossary was applied and which version of the glossary was active at the time of production.
- Storage location: Where the file currently resides (LMS platform, content server, archive system) and the file identifier (LMS asset ID, S3 key, or equivalent) that allows the file to be retrieved on demand.
- Modification history: A record of every modification to the file after original production — who made the change, when, what was changed, and why. If the file has never been modified, the modification history is a single entry noting that the file is unchanged since original production.
- Version designation: A version number for each revision where multiple revisions exist (v1.0 original, v1.1 corrected vocabulary errors on 2025-03-14 by [name], v2.0 re-processed after video content update on 2025-09-01). For the production and version lifecycle framework, see our guide to caption glossary maintenance and version control.
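The six elements above could be kept as a sidecar record stored alongside each caption file. The sketch below is one possible shape under stated assumptions: the class names, field names, and sample values are hypothetical and not taken from any LMS or vendor API.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class Modification:
    who: str
    when: str      # ISO 8601 date
    what: str
    why: str

@dataclass
class CustodyRecord:
    producer: str            # vendor, internal tool, or LMS engine
    produced_at: str         # when captioning completed, not the upload date
    tool_and_config: str     # tool/model version and glossary version
    storage_location: str    # where the file lives and its identifier
    version: str             # e.g. "v1.0"
    modifications: list = field(default_factory=list)  # empty = unchanged

    def missing_fields(self):
        """Names of required text fields that are blank."""
        required = ["producer", "produced_at", "tool_and_config",
                    "storage_location", "version"]
        return [name for name in required if not getattr(self, name).strip()]

# Hypothetical complete record
record = CustodyRecord(
    producer="Example Captions Ltd, account 1234, human-reviewed service",
    produced_at="2025-03-10",
    tool_and_config="ASR model v3.0, glossary v1.1",
    storage_location="LMS asset 98765",
    version="v1.1",
    modifications=[Modification("J. Doe", "2025-03-14",
                                "KAPA -> CAPA in cue 3", "vocabulary error")],
)

# What a typical LMS-only file looks like: location known, nothing else
lms_only = CustodyRecord("", "", "", "LMS asset 11111", "")

sidecar_json = json.dumps(asdict(record), indent=2)
```

`lms_only.missing_fields()` makes the gap described in the next section concrete: only the storage location is known.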
Why most LMS-stored caption files have no chain of custody
The standard LMS architecture for caption files is: file stored in the LMS content library, associated with the video asset, displayed during playback. The LMS records when the file was uploaded and (in some platforms) who uploaded it. It does not record who produced the file, what processing parameters were used, whether a glossary was applied, what the accuracy measurement is, or whether the file has been modified since upload.
For vendor-produced SRT files that are uploaded to the LMS, the production metadata lives with the vendor — in the vendor’s processing records, not in the LMS. When the vendor relationship ends, that metadata becomes inaccessible unless it was retained elsewhere. For LMS auto-generated captions, the production metadata may not be retained by the LMS platform at all after the initial processing period. For in-house captioning workflows, the production metadata is whatever the internal team documented — which is often nothing beyond the SRT file and the email exchange with the instructional designer who requested the captions.
The acquired-vendor gap
Vendor acquisition creates a specific chain-of-custody gap: the records the original vendor retained were typically transferred to the acquiring entity, but the acquiring entity may not have the same records management practices, may not have completed the records transfer for all accounts, and may not have a contractual obligation to the customer to produce records that were the original vendor’s rather than the acquirer’s. A customer who signed a contract with Vendor A, which was later acquired by Vendor B, may discover that Vendor B has the account credentials and the current caption files but not the per-file processing logs and reference transcripts from Vendor A’s system.
Protection against the acquired-vendor gap: negotiate records-delivery-at-termination as a contract term. If the vendor relationship ends — for any reason, including vendor acquisition — the vendor must deliver a complete export of all processing records, reference transcripts, accuracy scoring records, and glossary version history for all files produced under the contract. This delivery should happen before or simultaneously with the termination of the vendor’s data retention obligations. Without this term, records may be inaccessible after acquisition regardless of what the contract otherwise says. See our caption vendor SLA and contract review checklist for the complete contract review framework.
The in-house modification problem
A common scenario: an instructional designer or L&D coordinator reviews an auto-caption file or vendor-produced SRT file, notices vocabulary errors, and edits the SRT file directly: correcting “KAPA” back to “CAPA”, fixing a misspelled proper noun, updating terminology that changed after the video was recorded. The LMS stores the corrected file. The edit history exists nowhere except (possibly) in the editor’s memory and (possibly) in version control if the organisation uses it. The examiner sees a caption file that is better than the original but has no modification record.
For chain-of-custody purposes, every modification to a caption file must be recorded: who made the change, when, what was changed (specific text), and why. This is the same principle that applies to controlled documents in a quality management system: every revision must be traceable. The audit trail does not need to be elaborate: a spreadsheet with one row per modification, a NOTE comment block in a WebVTT file (the SRT format has no comment syntax, so SRT edits need an external record), or a note in the LMS content management record. What it cannot be is nothing. See our QA methodology guide for how to build caption review and correction processes that produce an audit trail.
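As a rough illustration of the "one row per modification" approach, the sketch below uses Python's standard `difflib` and `csv` modules to capture exactly which caption lines changed. The function names, column order, file identifier, and sample cue are hypothetical.

```python
import csv
import difflib
import io
from datetime import date

def describe_change(before: str, after: str) -> str:
    """Return only the removed (-) and added (+) lines between two versions."""
    diff = list(difflib.unified_diff(
        before.splitlines(), after.splitlines(), lineterm="", n=0))
    body = diff[2:]  # skip the two file-header lines of the unified diff
    return "\n".join(line for line in body if line.startswith(("+", "-")))

def log_modification(writer, file_id, who, when, before, after, why):
    """Append one audit row: who, when, what changed, and why."""
    writer.writerow([file_id, who, when.isoformat(),
                     describe_change(before, after), why])

# Hypothetical example: a single vocabulary correction in one SRT cue
before = "1\n00:00:01,000 --> 00:00:04,000\nOpen a KAPA for each deviation.\n"
after = "1\n00:00:01,000 --> 00:00:04,000\nOpen a CAPA for each deviation.\n"

buffer = io.StringIO()  # stands in for the spreadsheet or CSV file
writer = csv.writer(buffer)
writer.writerow(["file_id", "who", "when", "what", "why"])
log_modification(writer, "module-12.srt", "J. Doe", date(2025, 3, 14),
                 before, after, "ASR vocabulary error")
```

Recording the diff rather than a free-text summary means the "what was changed" column can be checked against the file itself.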
The reference transcript: why it is different from the caption file and why examiners need it
The reference transcript is the text of what the speaker actually said — the ground truth against which caption accuracy is measured. It is a separate document from the caption file and serves a different evidentiary function. Understanding this distinction is essential for understanding what an examiner will request and what your vendor must retain.
Reference transcript vs. caption file: the functional distinction
A caption file is a timed text document: it contains transcribed text synchronised to time codes so that the correct text appears on screen during the corresponding audio. The caption file is what viewers see. It may be more or less accurate depending on the transcription process, glossary configuration, and review. The reference transcript is the verbatim text of what the speaker said, without time codes, produced by a human reviewer who listened to the audio and typed what was actually spoken. It is the ground truth document — the text to which the caption file is compared when accuracy is measured.
Accuracy measurement using the DCMP Captioning Key formula works by comparing the caption file text to the reference transcript text using word error rate: words substituted, deleted, or inserted in the caption file relative to the reference transcript, divided by the total words in the reference transcript. Without a reference transcript, you cannot calculate accuracy. An examiner who wants to independently verify caption accuracy needs the reference transcript. A vendor who does not retain reference transcripts cannot provide the basis for independent accuracy verification.
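The word error rate calculation described above can be sketched in a few lines. This is a generic word-level edit distance, not a reproduction of any vendor's scoring tool; the tokenisation rule (lowercase, punctuation ignored) is an assumption, and a real methodology document should state its own normalisation rules. Time codes must be stripped from the caption file before comparison.

```python
import re

def tokenize(text):
    """Lowercase and split into words; punctuation is ignored (assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_error_rate(reference, caption):
    """(substitutions + deletions + insertions) / words in the reference."""
    ref, hyp = tokenize(reference), tokenize(caption)
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from caption
                            curr[j - 1] + 1,      # extra word in caption
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

# Hypothetical sample: one substitution (CAPA -> KAPA), one deletion (five)
reference = "Open a CAPA for each deviation within five days."
caption = "Open a KAPA for each deviation within days."
wer = word_error_rate(reference, caption)
accuracy = 1 - wer
```

Here the reference has nine words and the caption differs by two edits, so the word error rate is 2/9 and the accuracy is 7/9, roughly 78%.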
Who produces the reference transcript
In a professional captioning workflow, the reference transcript is typically produced by the caption vendor as part of the transcription process: a human transcriptionist (or a human reviewing an ASR-generated transcript) produces the reference text, then the timed caption file is derived from it. The reference transcript should be retained as a separate deliverable from the caption file, because it serves a different purpose: the caption file goes to the LMS; the reference transcript goes to the documentation archive.
For auto-generated captions (LMS-native or uploaded from an ASR-only workflow without human review), there may be no human-produced reference transcript at all — only the ASR output that became the caption file. In this case, producing a reference transcript requires hiring a human transcriptionist to listen to the audio and produce the ground truth text retroactively. This can be done, but it is expensive and time-consuming when applied to a large content library. For high-risk training content — required compliance training, training that has been the subject of an accommodation request, training in a regulated industry where captions are part of the training record — the reference transcript should be produced at the time of initial captioning, not retroactively after an examination request.
Vendor retention of reference transcripts: what to require
The vendor contract must specify that the vendor will retain the reference transcript (separately from the caption file) for a defined minimum period and produce it within a defined window on the customer’s written request. Specific terms:
- Retention period: Minimum 3 years from the date of production, matching the conservative default for civil rights records retention. For organisations subject to FINRA Rule 4511 (general records retention), 6 years is the appropriate window for the highest-risk training records.
- Production window: The vendor must produce the reference transcript within 10 business days of a written request from the customer. This window aligns with typical examination response deadlines.
- Format: Plain text or formatted PDF, labelled with the video title, production date, and vendor job number. The format must allow a third-party reviewer to perform an accuracy calculation against it.
- Completeness: The reference transcript covers the complete audio, including speaker identification where multiple speakers appear on screen (common in training video that features instructor and student interaction, panel discussions, or interview formats).
- Delivery at termination: On termination of the vendor relationship, all reference transcripts produced under the contract are delivered to the customer in bulk before the vendor’s data retention obligations expire.
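To see how the retention term plays out against a library of dated files, the sketch below computes when a retention window ends and whether a file is still covered on a given date. The function names are hypothetical, and the 3-year default simply mirrors the minimum suggested above.

```python
from datetime import date

def add_years(d, years):
    """Same calendar day N years later; 29 Feb falls back to 28 Feb."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def retention_end(produced, years=3):
    """Last day on which the vendor is obliged to hold the record."""
    return add_years(produced, years)

def still_retained(produced, as_of, years=3):
    """True if the record is inside the retention window on as_of."""
    return as_of <= retention_end(produced, years)

# A file captioned 22 months before an examination request arrives
produced = date(2024, 1, 15)
request_date = date(2025, 11, 15)
covered_12_months = still_retained(produced, request_date, years=1)
covered_3_years = still_retained(produced, request_date, years=3)
```

With a 12-month window the record is already gone when the request arrives; with the 3-year minimum it is still available.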
When the reference transcript is gone
If the vendor has already destroyed reference transcripts for older content — because the contract had no retention requirement, or because the retention window has elapsed, or because the vendor was acquired and the acquiring entity did not transfer records — the reference transcripts must be produced retroactively. The retroactive production process: hire a human transcriptionist to listen to the original audio and produce a verbatim transcript. Then measure accuracy of the existing caption file against that retroactive reference transcript. Document the retroactive production date, who produced it, and the fact that it was produced retroactively (so there is no claim that it was the original basis for the caption file).
Retroactive reference transcript production for a large content library is expensive — professional human transcription typically costs $1.00-$2.00 per audio minute, so a 500-video library with an average runtime of 8 minutes costs $4,000-$8,000 in transcription alone, before accuracy measurement, documentation, and storage. This cost is the retrospective penalty for not requiring reference transcript retention in the original vendor contract. See our caption programme budget planning guide for how to budget for documentation requirements alongside production costs.
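The arithmetic behind that estimate is simple enough to rerun for your own library. The helper below is a hypothetical sketch; the default rates are the per-minute figures quoted above and cover transcription only.

```python
def retro_transcription_cost(videos, avg_minutes, rate_low=1.00, rate_high=2.00):
    """Return the (low, high) transcription cost in dollars for a library."""
    total_minutes = videos * avg_minutes
    return total_minutes * rate_low, total_minutes * rate_high

# 500 videos x 8 minutes = 4,000 audio minutes
low, high = retro_transcription_cost(500, 8)
```

For the 500-video example, `low` and `high` come out at 4,000 and 8,000 dollars, matching the range in the text.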
Vendor documentation obligations: what to negotiate before the relationship begins
The vendor documentation obligations that an examination requires cannot be created retroactively. If the contract was signed without an audit-rights clause, there is no enforceable basis for requiring the vendor to produce documentation. The audit-rights clause must be negotiated at contract signature. This section covers the eight specific obligations to negotiate, the contract placement, and the response-time terms that make the clause usable in an actual examination.
The eight vendor documentation obligations
1. Original submission records
The vendor must retain a record of the original audio/video file submitted for captioning — at minimum, a cryptographic hash (SHA-256) of the original file that allows identity verification. If the vendor also retains the original file, the contract should specify the storage format and access mechanism. This allows the customer to verify that the caption file produced corresponds to the audio submitted and that the audio has not been altered since submission. The submission timestamp — the date and time the file was received by the vendor for processing — is part of this record.
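A customer can compute the same SHA-256 hash on its own copy of the source file and compare it with the value in the vendor's submission record. The sketch below uses Python's standard `hashlib`; the function names and the throwaway sample file are hypothetical.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large video files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_submission(path, recorded_hash):
    """True if the file on disk is byte-identical to the one originally hashed."""
    return sha256_of_file(path) == recorded_hash.lower()

# Demonstration with a throwaway file standing in for the submitted video
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example audio bytes")
    sample_path = tmp.name

recorded = sha256_of_file(sample_path)            # stored at submission time
unchanged = matches_submission(sample_path, recorded)
os.remove(sample_path)
```

Any change to the file, however small, produces a different hash, which is what makes the recorded value usable as identity evidence.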
2. Processing parameters and tool version
The vendor must document the processing parameters used for each caption file: the ASR engine or model used (with version number), whether a custom glossary was applied (and which glossary version), the language configuration, any post-processing steps applied (speaker diarisation, punctuation restoration, confidence filtering), and any human review steps performed before the file was delivered. For vendors that update their ASR models periodically, the model version at the time of processing is material: a file produced under model v2.1 and a file produced under model v3.0 may have different accuracy profiles for the same vocabulary. The model version record allows the customer to understand why accuracy may differ across files produced at different times.
3. Reference transcript
As discussed above, the reference transcript — the verbatim text of what the speaker said — must be retained for the full retention period and produced on request within the examination response window. If the vendor does not produce human-reviewed reference transcripts as part of its standard workflow (i.e., the vendor delivers ASR-generated captions without a human reference transcript step), this must be disclosed and the contract must specify that the customer may request a retroactive reference transcript at any time at the vendor’s cost.
4. Per-file accuracy scoring records
The vendor must retain a per-file accuracy score measured against the reference transcript using a documented methodology. The DCMP Captioning Key formula (word error rate against a reference transcript, with specific categories for substitution, deletion, and insertion errors) is the methodology most commonly referenced in resolution agreements and industry standards. The accuracy scoring record must specify: the score, the methodology, the date measured, and who performed the measurement (automated scoring or human reviewer). For vendor accuracy evaluation methodology, see our detailed accuracy evaluation guide.
5. Custom glossary version history
If the organisation uses a custom glossary with the vendor — a list of industry-specific terms, proper nouns, product names, and acronyms that the ASR engine should render correctly — the vendor must retain a version history of the glossary: each version with its effective date (the date it was deployed to the processing system), a list of terms added and removed at each version, and a record of which glossary version was active when each caption file was produced. This version history allows the organisation to answer the examiner’s question “was this file produced with the current glossary or an earlier version?” and to identify files that may need re-processing after a glossary update. See our glossary architecture guide and our glossary maintenance workflow guide for the full framework.
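The version lookup this paragraph describes can be sketched in a few lines of Python. The sketch assumes the vendor can export each glossary version with its effective date and each caption file with its production date; the names `GlossaryVersion`, `active_version`, and `files_needing_reprocessing` are hypothetical and do not correspond to any vendor API.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class GlossaryVersion:
    version: str
    effective: date  # date deployed to the processing system


def active_version(history: list, produced: date) -> Optional[str]:
    """Return the glossary version in effect on the production date."""
    ordered = sorted(history, key=lambda v: v.effective)
    idx = bisect_right([v.effective for v in ordered], produced)
    return ordered[idx - 1].version if idx else None


def files_needing_reprocessing(history: list, files: dict) -> list:
    """List files produced under anything other than the latest glossary."""
    latest = max(history, key=lambda v: v.effective).version
    return sorted(
        file_id
        for file_id, produced in files.items()
        if active_version(history, produced) != latest
    )
```

A file produced on the same day a version became effective is treated here as produced under that version; if the vendor deploys glossaries mid-day, timestamps rather than dates would be the safer key.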
6. QA review records
If the vendor includes a human QA review step, in which a human reviewer reads the ASR-generated caption file against the reference audio and corrects errors before delivery, the vendor must retain records of who performed the review, when, and what corrections were made. This record is the chain-of-custody documentation for the vendor’s own internal process: it allows the customer to demonstrate that the delivered file was human-reviewed and to determine whether errors in the final file were introduced by the reviewer or were original ASR errors that the reviewer missed. For the internal QA methodology that complements vendor review, see our QA methodology guide.
7. Corrected version documentation
When the vendor corrects or re-processes a caption file after initial delivery — because the customer reported errors, because the accuracy score fell below the contracted threshold, or because the underlying video was updated — the vendor must retain documentation of the correction: the original file (before correction), the corrected file, the date of correction, the specific changes made, and the reason for correction. This is the modification history component of the chain-of-custody record for vendor-produced corrections. It is distinct from the customer-side modification history (edits made by the customer after delivery).
8. Records delivery at termination
When the vendor relationship ends — for any reason, including expiration, non-renewal, termination for convenience, or vendor acquisition — the vendor must deliver a complete export of all records (reference transcripts, processing logs, accuracy scoring records, glossary version history, QA review records, corrected version documentation) for all files produced under the contract. This delivery must occur before or simultaneously with the termination of the vendor’s data retention obligations. Without this term, records become inaccessible at termination even if they were properly retained during the relationship.
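One way to verify a termination export before the vendor's retention obligations lapse is to compare it against the list of files delivered under the contract. The sketch below assumes the export arrives with a manifest mapping each file identifier to the record types included for it; the record-type labels and the function name `export_gaps` are hypothetical, and a real check would also need a way to mark record types that legitimately do not apply (for example, no corrections were ever made to a file).

```python
# Record types named in the records-delivery clause. The labels are
# hypothetical names for this sketch, not a vendor export schema.
REQUIRED_RECORDS = (
    "reference_transcript",
    "processing_log",
    "accuracy_score",
    "glossary_version",
    "qa_review",
    "correction_history",
)


def export_gaps(delivered_files: list, export_manifest: dict) -> dict:
    """Map each delivered file to the record types missing from the export."""
    gaps = {}
    for file_id in delivered_files:
        present = export_manifest.get(file_id, set())
        missing = [r for r in REQUIRED_RECORDS if r not in present]
        if missing:
            gaps[file_id] = missing
    return gaps
```

An empty result means every delivered file has every record type; anything else is a list to send back to the vendor while the contract is still in force.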
Retention period and response time: the contract terms
The retention period should match the longest applicable regulatory requirement for your organisation:
- For most employers (ADA Title I, state civil rights laws): 3 years from date of production, or 1 year after the most recent relevant employment action, whichever is later. The practical default is 3 years from production.
- For FINRA-regulated broker-dealers (FINRA Rule 4511, SEC Rule 17a-4): 6 years for general business records, 3 years for certain categories. For training records used to demonstrate compliance with supervisory obligations, 6 years is the conservative default.
- For federal contractors subject to OFCCP (Section 503): 2 years from date of record creation or personnel action, whichever is later. 3 years from production satisfies this requirement.
- For organisations subject to multiple frameworks: use the longest applicable period. A broker-dealer that is also a federal contractor should use 6 years.
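The "longest applicable period" rule can be expressed as a small calculation. This sketch uses only the default year counts from the list above and measures from the production date; it deliberately ignores the "whichever is later" clauses tied to employment or personnel actions, which need case-by-case input. The framework keys and function names are hypothetical labels, and the output is a planning aid, not legal advice.

```python
from datetime import date

# Default retention periods in years, taken from the list above.
# Framework keys are hypothetical labels for this sketch.
RETENTION_YEARS = {
    "ada_title_i": 3,
    "finra_broker_dealer": 6,
    "ofccp_section_503": 2,
}


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return d.replace(year=d.year + years, day=28)


def retain_until(produced: date, frameworks: list) -> date:
    """Earliest date records may be released, using the longest period."""
    if not frameworks:
        raise ValueError("at least one framework is required")
    years = max(RETENTION_YEARS[f] for f in frameworks)
    return add_years(produced, years)
```

For a broker-dealer that is also a federal contractor, passing both framework keys yields the six-year date, matching the last bullet above.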
The response time for producing records on request should be specified in the contract as a binding commitment: 10 business days from receipt of a written request. This aligns with typical examination response windows and creates an enforceable SLA for documentation production. A contract that says “we will produce records upon request” without a time commitment is not useful when the examination letter gives 10 business days.
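To turn a "10 business days" commitment into a calendar date that can be tracked, a helper along the following lines is enough. It assumes the day of receipt is not counted and that business days are Monday to Friday minus any holidays supplied by the caller; both are assumptions that the contract or the examination letter may define differently. The function name `add_business_days` is illustrative.

```python
from datetime import date, timedelta
from typing import Optional


def add_business_days(start: date, days: int, holidays: Optional[set] = None) -> date:
    """Return the date that falls `days` business days after `start`."""
    holidays = holidays or set()
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday() is 0-4 for Monday-Friday, 5-6 for the weekend.
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current
```

Running the same calculation for the examiner's deadline and for the vendor's contractual deadline shows how much time, if any, remains to review the records before they are submitted.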
Where audit rights live in the vendor contract
Audit rights and records retention should appear in a dedicated section of the vendor contract, separate from the general SLA terms that cover turnaround time, accuracy guarantees, and re-captioning commitments. The reason for a separate section: the SLA terms govern ongoing vendor performance during the relationship; the audit rights terms govern what happens when a third party asks for documentation of the relationship. Many vendor contracts have robust SLA terms and no audit rights terms, because the SLA terms are what vendors compete on during procurement. The audit rights terms are negotiated separately, after the vendor has been selected. This is the correct sequence: the RFP process selects the vendor; the contract negotiation determines the documentation obligations that the selected vendor must accept. For the complete contract review framework, including the specific language for each clause, see our caption vendor SLA and contract review checklist.
How FINRA training examinations look at caption compliance
FINRA examinations of broker-dealer training programmes are conducted by FINRA examination staff during the routine examination cycle. Most broker-dealers are examined annually for the first three years of operation and then on a cycle based on risk assessment. Training records are a standard examination category because FINRA Rule 3110 (Supervision) requires that broker-dealers establish, maintain, and enforce a supervisory system that achieves compliance with applicable laws and rules — and compliance training is part of that supervisory system.
The FINRA Rule 3110 supervisory system requirement
FINRA Rule 3110 requires that a broker-dealer’s supervisory system be “reasonably designed to achieve compliance with applicable securities laws and regulations, and with applicable FINRA rules.” Applicable laws include ADA Title I, which requires that training content be accessible to employees with disabilities. A supervisory system that includes a compliance training programme is not “reasonably designed” to achieve ADA compliance if it delivers training content that is inaccessible to employees who are deaf, hard of hearing, or have other disabilities that affect their ability to receive audio-only content.
The FINRA examiner reviewing training records is not specifically an accessibility specialist — the examiner is reviewing whether the broker-dealer has a supervisory system for training that is reasonably designed and functioning as documented. The accessibility question arises when the examiner asks: “How does the firm ensure that training content is accessible to employees with disabilities?” and the answer is “We use auto-captions from the LMS, and we have not measured their accuracy.”
What FINRA examiners look at in training records
The FINRA training examination request typically covers:
- The list of all training content delivered to registered representatives in the examination period, with dates, content titles, and completion records per registered representative
- Documentation that training content is current (i.e., that the content reflects current regulatory requirements, not superseded rules)
- Documentation of the supervisory review of training content (someone reviewed and approved the training content before delivery)
- The firm’s written supervisory procedures (WSPs) as they relate to training obligations
- Documentation of accommodation requests related to training and how they were resolved
- Evidence that training content meets applicable legal requirements — which, for content delivered via video, includes accessibility
The last two items are where caption compliance becomes relevant. A firm that has received an accommodation request related to training accessibility — and has not documented how it was resolved — has a supervisory failure. A firm that cannot produce evidence that its training content is accessible has a potential compliance gap.
The auto-caption documentation problem for broker-dealers
The most common pattern at mid-market broker-dealers: the firm uses a cloud-based compliance training platform (Cornerstone, Docebo, TalentLMS) with native auto-captioning enabled. All registered representatives complete their annual compliance training (suitability, Reg BI, anti-money laundering, cybersecurity) on this platform. The auto-captions are present on all training videos. Nobody has measured caption accuracy. Nobody has a documentation record of how the captions were produced. Nobody has reviewed the captions for the FINRA-specific vocabulary that matters: “Reg BI”, “suitability”, “FINRA Rule 2111”, “best interest”, “conflict of interest”, “disclosure obligations.”
For the vocabulary failure profile in financial services training content — including how FINRA rule references are systematically misrendered by general-purpose ASR — see our guide to financial services captioning and regulatory training. The key takeaway for FINRA examination purposes: auto-captions on FINRA compliance training content are not inherently compliant. They are potentially compliant if accuracy has been measured and found adequate. They are demonstrably compliant if there is a measurement record, a process for review, and a remediation path for content below the threshold.
What clean FINRA documentation looks like
A broker-dealer with clean caption compliance documentation for a FINRA examination can produce:
- A written accessibility policy for training content (part of the overall WSPs or a standalone caption programme governance policy)
- The vendor contract for captioning services, with audit rights and records retention terms
- A quarterly accuracy sample record: 5-10 training videos measured per quarter, accuracy score per file, measurement methodology
- The accommodation request log for the examination period: zero entries (no requests) or a log of requests received with response and resolution
- For any training video the examiner selects at random: the reference transcript (obtainable from the vendor within 10 business days) and the accuracy score for that file
This documentation package demonstrates a supervisory system that is reasonably designed for accessible training content: it has a policy, a vendor with documented obligations, a monitoring process, and a responsive accommodation system. This is substantially different from “we use auto-captions and we have not measured accuracy.”
How OCR complaint investigations work and what they request
An OCR complaint investigation begins with a specific individual filing a complaint. The individual provides: their name and contact information, the name of the institution against which the complaint is filed, a description of the alleged discriminatory action, and their basis for belief that the action violated ADA Title II or Section 504. OCR reviews the complaint for sufficiency and jurisdiction, and if the complaint meets the threshold, opens an investigation. The institution is notified of the investigation and the specific allegations.
The information request letter
The information request letter is the mechanism by which OCR collects documentary evidence. It is typically issued within 60 days of the institution receiving notice of the investigation. The letter lists numbered items and gives a response deadline (commonly 30-60 days, though OCR may grant extensions on request). For caption-related investigations, the information request letter typically includes:
- Item 1: Policy documents. A copy of the institution’s written policies regarding accessibility of training content, including caption requirements, accommodation procedures, and grievance procedures.
- Item 2: Training content inventory. A list of all training content available during the period covered by the complaint, with the format of each (video, text, audio) and whether captions are available.
- Item 3: The specific training content at issue. The caption file, the video, and any documentation of the accuracy of the captions (how accuracy was measured, by whom, and on what date).
- Item 4: Accommodation request records. All records related to the complainant’s accommodation request (and any prior requests): who received the request, when, what response was given, the timeline, and the resolution. This must cover not just the specific accommodation that gave rise to the complaint but also any prior requests by the same individual, and any prior requests by other individuals for similar accommodations.
- Item 5: Grievance procedure records. Evidence that the complainant had access to an accessible grievance procedure and documentation of whether the grievance procedure was used.
- Item 6: Vendor contracts. Copies of contracts with caption vendors, including accuracy specifications, SLA terms, and (if present) audit rights and records retention terms.
- Item 7: Title II/Section 504 coordinator designation. The name and contact information of the institution’s ADA/Section 504 coordinator, and evidence that this designation has been communicated to employees and programme participants.
The population problem in OCR investigations
OCR investigations focus on the specific complaint but routinely expand to examine whether the institution has a systematic accessibility problem. An institution that has one caption complaint about one training video is likely to have the same problem with other training videos. OCR investigators know this, which is why the information request letter asks for the full training content inventory and the full accommodation request log, not just the records related to the specific complaint. An institution that produced inadequate captions for the training video at issue in the complaint, and has no governance policy, no measurement process, and no accommodation log, will be asked to provide a corrective action plan for the full training programme, not just for the one video.
Resolution agreements
When OCR finds that an institution has violated ADA Title II or Section 504, it typically resolves the case through a resolution agreement rather than a referral to the Department of Justice for enforcement. A resolution agreement is a binding commitment by the institution to take specific corrective actions within specified timeframes. For caption compliance, a resolution agreement typically requires:
- Adoption of a written caption governance policy within 60-90 days
- Review and correction of all training video captions within 180 days, with accuracy measured against a defined standard (often the DCMP Captioning Key)
- Implementation of an accommodation request log and procedure within 30 days
- Training of all relevant staff (L&D team, IT administrators, accessibility coordinator) within 90 days
- Annual reporting to OCR for 1-2 years documenting compliance with the resolution agreement
- Designation of a specific contact for disability accommodation matters
The monitoring period — typically 1-2 years of annual reporting to OCR — is the consequence that most organisations find most onerous. It means that the organisation must maintain documentation of caption compliance for the full monitoring period and submit that documentation to OCR on schedule. An organisation that enters a resolution agreement without having built the documentation systems that the agreement requires will find the monitoring period very difficult. See our accessibility coordinator playbook for the role that is responsible for managing resolution agreement compliance and our guide to caption compliance reporting to leadership for the reporting framework.
Assembling the caption compliance package: the 48-hour standard
The 48-hour standard is the operational target for caption compliance documentation: an organisation should be able to produce a complete, examination-ready documentation package within 48 hours of receiving a request. This is not the examination response deadline (which is typically 10-30 business days), but the internal availability standard — the time it takes to locate, compile, and verify the documentation before sending it to counsel or directly to the examiner. Meeting the 48-hour standard requires that the documentation be pre-assembled and maintained, not assembled from scratch in response to the examination request.
The five-layer compliance package
A complete caption compliance package has five layers, each addressing a different examination question:
Layer 1: Policy documents
The governance policy for training content captioning, including the scope definition (which content requires captions), the accuracy standard (captions as required by WCAG 2.1 Success Criterion 1.2.2, a Level A criterion that AA conformance includes, with accuracy measured at 99% using the DCMP formula), the responsible party (the accessibility coordinator or designated L&D staff member), the accommodation request procedure, the grievance procedure, and the annual review schedule. The policy should be dated, signed by a senior official, and version-controlled. The most recent version and at least one prior version should be in the package. See our governance policy template for a complete framework.
Layer 2: Vendor documentation
The current vendor contract with the audit rights section highlighted, the SLA terms (accuracy specification, turnaround), and the records retention commitment. If the organisation has used multiple vendors in the examination period, all relevant contracts. If the vendor was changed during the period, the transition documentation showing how records from the previous vendor were transferred or are available. For the vendor selection process that should precede contract negotiation, see our guide to evaluating vendor RFP responses and our vendor pilot programme design guide.
Layer 3: Accuracy sample records
A sample of caption accuracy measurements for the examination period: at minimum, 10 caption files measured per year, with the file identifier, the accuracy score, the methodology, the date of measurement, and the measurement method (automated or human). The 10 files should include the highest-risk content: required compliance training, training on regulated procedures, training that has been the subject of an accommodation request. Accuracy sample records should be maintained in a dedicated log (a spreadsheet or document management system record is sufficient) and reviewed at the annual caption programme review. See our annual caption programme review guide for how to structure this review.
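The sample log described above lends itself to a simple automated check at the annual review. The sketch below assumes each log row carries the fields listed in this layer and applies two tests per year: whether the minimum sample count was reached and which files scored below the threshold. The 0.99 threshold and the minimum of 10 come from this guide; the names `Sample` and `review_samples` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sample:
    file_id: str
    measured: date
    accuracy: float  # 0.0 to 1.0
    methodology: str
    method: str  # "automated" or "human"


def review_samples(samples: list, threshold: float = 0.99, min_per_year: int = 10) -> dict:
    """Summarise the accuracy sample log by calendar year of measurement."""
    by_year = defaultdict(list)
    for sample in samples:
        by_year[sample.measured.year].append(sample)
    return {
        year: {
            "count": len(rows),
            "meets_minimum": len(rows) >= min_per_year,
            "below_threshold": sorted(
                s.file_id for s in rows if s.accuracy < threshold
            ),
        }
        for year, rows in sorted(by_year.items())
    }
```

Files listed under `below_threshold` are the ones that need a documented remediation path before the next examination.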
Layer 4: Accommodation request log
A complete log of all accommodation requests related to training content accessibility for the examination period (3-6 years depending on regulatory framework). Each entry includes a case number (not the individual’s name, to protect privacy), the date the request was received, the content at issue, the accommodation requested, the date the organisation responded, the nature of the response, and the date and nature of the resolution. The log should also record zero-entry periods in which no requests were received: a dated note that the log was actively maintained with no entries during Q1 2025, for example, is more credible than an empty log, which might suggest that the log was created retroactively.
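A short Python sketch can confirm that every quarter of the examination period is accounted for in the log, either by at least one request or by a note recording that none were received. The quarterly granularity and the function names are assumptions for illustration; an organisation could equally attest monthly.

```python
from datetime import date


def quarter_of(d: date) -> tuple:
    return d.year, (d.month - 1) // 3 + 1


def quarters_between(start: date, end: date) -> list:
    """All (year, quarter) pairs from the quarter of start to that of end."""
    year, quarter = quarter_of(start)
    last = quarter_of(end)
    result = []
    while (year, quarter) <= last:
        result.append((year, quarter))
        year, quarter = (year, quarter + 1) if quarter < 4 else (year + 1, 1)
    return result


def undocumented_quarters(start: date, end: date, request_dates: list, attested_empty: set) -> list:
    """Quarters with neither a logged request nor a no-requests note."""
    covered = {quarter_of(d) for d in request_dates} | set(attested_empty)
    return [q for q in quarters_between(start, end) if q not in covered]
```

Any quarter the function returns is a period for which the organisation cannot show that the log was being maintained.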
Layer 5: Chain-of-custody records for high-risk content
For the 10-20 training videos that carry the highest regulatory risk — required annual compliance training, training on regulated procedures, training that has been the subject of an accommodation request — a complete chain-of-custody record: producer, production timestamp, tool and configuration, storage location, modification history, and version designation. This does not need to cover the entire content library, but it must cover the content that an examiner is most likely to focus on. The accessibility coordinator or designated L&D staff should maintain this record and review it annually as part of the annual programme review.
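The chain-of-custody fields listed above map naturally onto an append-only record. The following sketch adds one element that the text does not prescribe, a SHA-256 hash of each file version, as an assumed way to show that the file on hand is the file the record describes. Class and method names are hypothetical. The `version_at` method answers which version was live at a given moment, for example when an employee completed the training.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


def sha256_hex(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


@dataclass(frozen=True)
class Revision:
    version: str
    timestamp: datetime
    editor: str
    reason: str
    sha256: str


@dataclass
class CustodyRecord:
    file_id: str
    producer: str
    tool_and_config: str
    storage_location: str
    revisions: list = field(default_factory=list)

    def record(self, version: str, timestamp: datetime, editor: str,
               reason: str, content: bytes) -> Revision:
        """Append a revision; earlier entries are never rewritten."""
        if self.revisions and timestamp < self.revisions[-1].timestamp:
            raise ValueError("revisions must be appended in time order")
        revision = Revision(version, timestamp, editor, reason, sha256_hex(content))
        self.revisions.append(revision)
        return revision

    def version_at(self, moment: datetime) -> Optional[Revision]:
        """Return the revision that was live at the given moment."""
        live = [r for r in self.revisions if r.timestamp <= moment]
        return live[-1] if live else None

    def matches_current(self, content: bytes) -> bool:
        """Check that a file's bytes match the latest recorded revision."""
        return bool(self.revisions) and self.revisions[-1].sha256 == sha256_hex(content)
```

In practice the record would be persisted somewhere that editors of the caption files cannot modify, since an audit trail stored alongside the files it describes is easy to overwrite.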
The annual compliance package review
The compliance package should be reviewed annually, at the same time as the annual caption programme review. The annual review of the compliance package covers: updating the policy document if the programme has changed, verifying that the vendor contract audit rights are still in place (important after a vendor contract renewal, which may reset terms), adding the year’s accuracy sample records, updating the accommodation request log, and refreshing the chain-of-custody records for high-risk content. An annual review of 2-3 hours produces a package that is always within one year of current — which is close enough to current for most examination purposes. A package that has never been formally reviewed may be multiple years out of date at the time an examination arrives.
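The annual review can start from a freshness check across the five layers. The sketch below assumes the organisation records a last-reviewed date for each layer; the layer keys, the 365-day limit, and the function name `package_gaps` are illustrative choices, not a prescribed format.

```python
from datetime import date

# The five layers of the compliance package; keys are hypothetical labels.
LAYERS = (
    "policy",
    "vendor_documentation",
    "accuracy_samples",
    "accommodation_log",
    "chain_of_custody",
)


def package_gaps(last_reviewed: dict, today: date, max_age_days: int = 365) -> dict:
    """Flag layers that are absent or were last reviewed too long ago."""
    gaps = {}
    for layer in LAYERS:
        reviewed = last_reviewed.get(layer)
        if reviewed is None:
            gaps[layer] = "missing"
        elif (today - reviewed).days > max_age_days:
            gaps[layer] = "stale"
    return gaps
```

An empty result indicates that every layer has been reviewed within the past year, which is the condition the 48-hour standard depends on.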
The role of the accessibility coordinator
The accessibility coordinator — whether a dedicated role or a responsibility assigned to an L&D team member — is the person responsible for maintaining the compliance package. The accessibility coordinator playbook covers the full scope of this role, but the compliance package responsibility is the most important examination-specific function: making sure that the four documentary pillars are current, the vendor documentation is available, and the chain-of-custody records are maintained. Without a named responsible party, compliance packages tend not to be maintained between annual reviews, and the next examination finds a package that is 18 months out of date.
Eight failure modes that expose organisations to examination findings
The following eight failure modes are the specific patterns that produce negative examination outcomes: vendor documentation gaps, process failures, and policy gaps that turn a routine examination inquiry into a corrective action requirement.
Failure Mode 1: Auto-caption reliance without any documentation
The organisation’s LMS has been auto-generating captions for 3+ years. No processing records exist. No accuracy records exist. No governance policy addresses auto-captions. No accommodation request log has been maintained. The only documentation is the SRT files currently in the LMS and the fact that captions are “on.” When the examination request arrives, the organisation can confirm that captions are present but cannot confirm that captions are accurate, that the production process is reliable, or that the caption files have not been altered. This is the most common failure mode and the hardest to remediate retroactively, because the processing records and reference transcripts that would allow retroactive accuracy verification may no longer exist.
Failure Mode 2: Vendor acquisition gap with no records transfer
The caption vendor was acquired. The acquiring company has the account credentials and the current billing relationship but not the per-file processing logs, reference transcripts, or accuracy scoring records that were generated under the original vendor’s system. The original vendor’s contract had no records-delivery-at-termination clause. The records are not available from the acquiring company; they may not have been transferred, or they may have been transferred in a format the acquiring company cannot access. The customer has the SRT files that were delivered and nothing else. For training content from the pre-acquisition period, the documentation package is incomplete by definition.
Failure Mode 3: SLA response time that exceeds the examination window
The vendor contract specifies that the vendor will “make reasonable efforts” to produce records “within a reasonable timeframe.” The examination letter gives 10 business days. The organisation contacts the vendor; the vendor says 4-6 weeks because the records are in an archive system that requires a manual retrieval process. The examination response deadline arrives before the vendor can produce the records. The organisation must either request an extension (which may not be granted) or respond to the examination without the vendor documentation. The absence of an enforceable turnaround commitment in the contract — 10 business days, not “reasonable timeframe” — is the specific failure.
Failure Mode 4: Reference transcript destruction after 12-month window
The vendor’s terms of service include a 12-month retention window for “processing artefacts” including reference transcripts. The organisation did not negotiate a longer retention period. The training videos at issue in the examination were captioned 15-22 months ago. The reference transcripts are gone. The organisation cannot produce the document that would allow an independent accuracy verification of the caption files. The vendor will not reconstruct reference transcripts retroactively because it has no obligation to do so. The organisation must commission retroactive human transcription, at its own cost, to produce a reference for accuracy measurement — a process that takes weeks, costs money per file, and produces a retroactive document rather than a contemporaneous one.
Failure Mode 5: Glossary version not recorded
The organisation uses a custom glossary with the caption vendor. The glossary was updated six months ago to add 30 new product names and correct several incorrectly rendered regulatory terms. There is no record of which glossary version was active when each caption file was produced. The examination asks whether the caption files use the current glossary or the prior version that had incorrect terms for [specific regulatory term]. The organisation cannot answer the question. The vendor cannot answer the question. Some files were produced before the glossary update; some were produced after. The examiner may infer that some files have accuracy problems related to the pre-update glossary — problems that the organisation cannot verify or deny because the version tracking was not maintained.
Failure Mode 6: Accommodation log maintained as email threads
The organisation has never maintained a formal accommodation request log. Accommodation requests were received by email and handled individually by the HR coordinator or L&D manager who received them. The examiner asks for the accommodation request log for the past 3 years. The organisation has email threads with the individuals who requested accommodation, but the threads are in the personal mailboxes of staff members, some of whom have left. The oldest threads are in an inbox that belongs to a former HR coordinator who left 2.5 years ago. The email archive is inaccessible without IT intervention. The organisation produces a partial log reconstructed from the email threads it can access, with known gaps. The incomplete log does not demonstrate a systematic, documented accommodation process — it demonstrates an ad hoc process with records gaps.
Failure Mode 7: Caption file modification without audit trail
An instructional designer edited several SRT files directly in the LMS content management system, correcting vocabulary errors that were reported by a staff member. The edits were made over a period of several months. The LMS stores only the current version of each file — there is no edit history. The caption files available to the examiner are the corrected versions, but there is no record of what was in the original version, what was changed, when, or by whom. The examiner asks whether the caption files reflect the captions that were available to employees when they completed the training. The organisation cannot confirm this because it has no record of when the edits were made or whether employees who completed the training before the edits saw the uncorrected version.
Failure Mode 8: Missing governance policy
The organisation has a captioning workflow — a vendor is used, captions are produced, files are uploaded to the LMS — but no formal written governance policy. The workflow exists as institutional knowledge and informal practice, not as a documented, approved, reviewable policy. The examiner asks for the captioning policy. The organisation produces a one-paragraph excerpt from the employee handbook that says the organisation provides accommodations upon request. The examiner asks for the specific policy on training content accessibility standards, accuracy requirements, vendor oversight, and accommodation response procedures. No such document exists. OCR’s resolution agreement requires the organisation to write and implement this policy within 60 days — something that could have been done proactively in an afternoon using a governance policy template.
FAQ
We use a well-known caption vendor that guarantees 99% accuracy. Does that mean we are covered for an examination?
An accuracy guarantee in the SLA covers ongoing vendor performance during the relationship — the vendor commits to re-caption any file that falls below 99%. It is not a records retention commitment, a reference transcript retention commitment, or a commitment to produce documentation in a form that satisfies an examiner. An SLA that says “99% accuracy or we re-caption” does not answer the examination question “can you produce the accuracy measurement records for the training content from January 2024 to December 2025 and show us the reference transcripts on which those measurements were based?” You need both the SLA guarantee (for the ongoing performance commitment) and the audit rights clause (for the documentation obligation). They address different questions.
How long do we need to retain caption compliance documentation?
The retention window depends on which regulatory framework applies. For most employers subject to ADA Title I (no federal contracts, no FINRA regulation): 3 years from date of production, or 1 year after the most recent employment action involving the employee who used the training content, whichever is later. For FINRA-regulated broker-dealers: 6 years for training records that document supervisory system compliance (FINRA Rule 4511; SEC Rule 17a-4 applies a 6-year general records retention for most books and records). For OFCCP federal contractors: 2 years from date of record creation or personnel action, whichever is later. For organisations subject to multiple frameworks, use the longest applicable period. A firm that is both FINRA-regulated and a federal contractor should retain for 6 years. This retention requirement applies to processing logs, reference transcripts, accuracy records, and accommodation request logs — not just the caption files themselves.
Our LMS auto-generates captions. What documentation do we need?
Auto-generated captions create the largest documentation gap because the LMS typically retains no processing records beyond the caption file itself. At minimum, document: (1) the LMS captioning engine name and version in use (this changes with platform updates; document the version in use at quarterly intervals), (2) quarterly accuracy checks on a sample of 5-10 training videos, measuring accuracy using the DCMP formula against a reference transcript you commission, (3) your review process before captions are published (auto-captions reviewed by a human within X days of upload, or published immediately but reviewed within Y days and corrected if below threshold), and (4) the accommodation request log. The LMS auto-caption that has never been reviewed, never measured for accuracy, and has no governance policy documentation is the highest-risk caption file in an examination. See our WCAG 2.1 AA caption compliance guide for the accuracy standard and our analysis of auto-caption compliance status by platform for the platform-specific context.
What is the difference between an OCR investigation and a FINRA examination from a documentation perspective?
OCR investigations are complaint-driven and focus on a specific individual’s experience: what happened when this person requested accommodation? Did this person receive accessible training? FINRA examinations are systematic and focus on the firm’s process: does the firm have a system for ensuring accessible training? Is that system reasonably designed and functioning as documented? The documentation burden is different: OCR wants to see how the organisation handled a specific situation, plus evidence that the situation was not symptomatic of a systemic failure. FINRA wants to see that the firm has documented policies, a vendor with enforceable obligations, and a measurement process. OCR investigations may lead to retroactive remediation of the specific individual’s situation. FINRA examinations may lead to a deficiency letter and a requirement to implement a systematic programme. Both are better resolved with a pre-existing compliance package than with an ad hoc response.
Our caption vendor went out of business. What are our options?
If the vendor was acquired: contact the acquiring company and formally request the records in writing. Reference the specific contract term (if you have one) requiring records delivery. If there is no contract term, request the records as a courtesy and document both the request and the response. Acquiring companies typically assume the contracts and records of acquired entities, so the records may still exist somewhere in the acquirer’s systems. If the vendor shut down with no acquirer: you likely cannot recover reference transcripts or processing logs. Document the gap: note in your compliance records that the vendor is defunct, that the records are unavailable, and what remediation steps you took. To close the current gap: commission retroactive reference transcripts from a human transcription service for the highest-risk content, measure the accuracy of the existing caption files against those transcripts, and record the date the retroactive work was produced. For future protection: negotiate records-delivery-at-termination in every vendor contract, specifying that termination (for any reason, including vendor insolvency) triggers an immediate records export to the customer before any data retention window begins. See our vendor transition playbook for the full records transition framework.
Does using in-house captioning tools change the documentation requirements?
In-house tools shift the documentation obligation from vendor to your internal IT and quality team, but they do not reduce the documentation requirement. With in-house tools, you must document: the tool name and version used to produce each file (important if you update the tool), the model version if using a hosted ASR service (Whisper, Google Speech-to-Text, Azure Cognitive Services — model updates affect accuracy and must be tracked), the glossary terms active at time of processing and which glossary version was deployed, the accuracy measurement for each file, and who has reviewed and modified each file since production. The advantage of in-house tools: you have direct access to processing logs rather than depending on a vendor to retain them. The disadvantage: the documentation burden is entirely internal, which means it only gets done if someone is assigned to do it. The chain-of-custody and reference transcript requirements are identical regardless of whether captions are produced in-house or by a vendor.
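The per-file fields listed above can be captured in a small structured record. The following is a minimal Python sketch under stated assumptions: the class names, field names, and the use of a SHA-256 digest to tie the record to the exact caption file are all illustrative choices, not a prescribed schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ReviewEvent:
    """One human review or modification of a caption file (hypothetical shape)."""
    reviewer: str
    reviewed_at: str  # ISO 8601 timestamp
    note: str = ""


@dataclass
class CaptionProcessingRecord:
    """Per-file processing record for in-house captioning (hypothetical shape)."""
    caption_file: str
    sha256: str            # digest of the caption file as produced
    tool_name: str
    tool_version: str
    model_version: str     # ASR model version, if a hosted service is used
    glossary_version: str
    produced_at: str       # ISO 8601 timestamp
    word_accuracy: Optional[float] = None
    reviews: List[ReviewEvent] = field(default_factory=list)


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of the caption file contents."""
    return hashlib.sha256(data).hexdigest()


def to_json(record: CaptionProcessingRecord) -> str:
    """Serialise the record, including nested review events, to JSON."""
    return json.dumps(asdict(record), sort_keys=True, indent=2)
```

A record like this would be written when the file is produced and appended to on each review. Storing the digest means a later copy of the caption file can be checked against the record to show it has not changed since production.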
What does “adequate accuracy” mean in a FINRA or OCR context?
Neither FINRA nor OCR has published a specific numerical accuracy standard for training video captions. WCAG 2.1 Success Criterion 1.2.2 (a Level A criterion, and therefore part of AA conformance) requires that captions be provided for pre-recorded audio content but does not specify a numerical accuracy threshold. The closest published standard is the DCMP Captioning Key, which OCR has referenced in resolution agreements as the benchmark for caption quality. The DCMP Captioning Key evaluates captions on accuracy, completeness, synchronisation, and accessibility, with a scoring system that translates to approximately 99% word accuracy (measured by word error rate against a reference transcript) as the quality level that satisfies a deaf or hard-of-hearing viewer. In practice, OCR resolution agreements typically do not specify a numerical threshold; they require that auto-generated captions be reviewed before publishing and that the organisation have a documented accuracy measurement process. FINRA has not (as of mid-2026) published guidance on a specific numerical caption accuracy threshold. The practical standard is: document your process, measure accuracy against the DCMP-informed target, and demonstrate that your process produces captions adequate for a deaf or hard-of-hearing employee to complete the training with the same information content as a hearing employee. See our QA methodology guide for how to implement the measurement process and the GlossCap approach to per-customer glossary accuracy for how custom glossaries achieve consistent accuracy on regulated vocabulary.
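Word accuracy measured by word error rate can be computed with a word-level edit distance. The following is a minimal Python sketch of plain WER; it is not a reproduction of the DCMP scoring system, and the normalisation it applies (lowercasing and whitespace splitting, with punctuation left in place) is an assumption you would need to settle in your own measurement policy. The function names are illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    # Assumed normalisation: lowercase, split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from captions
                curr[j - 1] + 1,     # insertion: extra word in captions
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy as 1 - WER, floored at zero."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


if __name__ == "__main__":
    reference = "the firm must retain records for six years"
    captions = "the firm must retain record for six years"
    print(word_accuracy(reference, captions))  # one substitution in eight words: 0.875
```

Against a 99% target, a single wrong word in an eight-word sentence is far below threshold, which is why the measurement is normally run over the full transcript of a video rather than over short excerpts.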
The documentation your examiner will request is determined before you receive the letter
Caption compliance examinations (whether a FINRA routine examination, an OCR complaint investigation, or an EEOC charge) ask for documentation that must be built into your vendor contract and your internal processes before the examination request arrives. The audit-rights clause, the reference transcript retention commitment, the processing log records, the accuracy measurement records, and the accommodation request log are all products of decisions made at vendor contract signature and maintained over the years of the relationship. They cannot be assembled after the examination letter arrives: not in 10 business days, and often not at all, because the records that would allow retroactive assembly no longer exist.
GlossCap builds caption compliance documentation into the service from the first caption file. Every file includes a processing record with the model version, glossary version, and production timestamp. Reference transcripts are retained for the full contract period and produced within 5 business days on request. Accuracy is measured per file using the DCMP formula and the records are available in the customer dashboard. When an examination request arrives, the documentation is already assembled.