Higher Education · Published 2026-04-29
How to pick a captioning vendor if you're a public university after ADA Title II
If you are an accessibility-services director, instructional-technology lead, or digital-accessibility coordinator at a public university, the captioning-vendor procurement that has been on the back-burner for two years became a pressing line item on 2026-04-24. That is the day the U.S. Department of Justice's web-content rule under ADA Title II — 28 CFR Part 35, §§ 35.200–35.205, finalized at 89 FR 31320 — became enforceable for state and local government entities serving populations of 50,000 or more. Almost every public flagship and most public regional universities sit inside that scope, because state actors include public colleges and universities under longstanding ADA Title II case law (U.S. v. Univ. of Calif. (Maricopa), NAD v. Harvard / MIT, the UC Berkeley free-courseware takedown). Smaller public entities — special districts, towns, the long tail of community colleges in less-populous service areas — are on a 2027-04-26 deadline. This post is the buyer-journey for the procurement that follows: the five regulatory frameworks the captioning contract has to satisfy at once, the field of vendors already entrenched in higher education, the seven questions the RFP has to answer before purchasing services will sign it, and where GlossCap fits in the layer the incumbents do not actually serve well.
TL;DR
Public universities are squarely in ADA Title II scope, so the standard the contract must commit to is WCAG 2.1 Level AA for web content and mobile apps, including video produced or hosted by the institution. Four other frameworks already overlap on the same artifact: Section 504 of the Rehabilitation Act applies because the institution receives federal financial assistance, Section 508 applies indirectly through federally-funded materials and procurement language, IDEA Part D sits behind DCMP-funded captioning grants, and FERPA applies whenever a student is identifiable in a recording. The vendor field has four incumbents — Verbit, 3Play Media, Kaltura, Panopto — plus AI-Media as a frequent fifth, with Otter.ai often present as a shadow-IT option that fails FERPA review. The seven RFP questions are: WCAG conformance and a current VPAT, FERPA data-handling, accuracy SLA on the human-reviewed tier, glossary or vocabulary adaptation per department, LMS integration with Canvas/Blackboard/Brightspace plus the lecture-capture stack, language coverage, and pricing shape that survives the back-catalogue retrofit. GlossCap is not a replacement for the lecture-capture incumbent on live new-content captioning; it is the layer that handles the back-catalogue retrofit at sustainable per-hour cost and the departmental glossary the enterprise vendors do not own — research-lab terminology, course-specific acronyms, and program-level vocabulary that change every semester.
The five frameworks the captioning contract has to satisfy at once
One reason the higher-ed captioning RFP is harder than a private-sector one is that the same recorded lecture is governed by overlapping federal and state statutes whose requirements point in slightly different directions. A vendor that satisfies WCAG 2.1 AA but mishandles FERPA voids the procurement. A vendor that handles FERPA but does not produce a current VPAT cannot be added to the master agreement. Walk the stack from the outside in:
- ADA Title II — 42 U.S.C. § 12132 and 28 CFR Part 35, §§ 35.200–35.205. The DOJ rule finalized April 2024 (89 FR 31320) requires public entities to make web content and mobile apps conform to WCAG 2.1 Level AA, with phased deadlines: 2026-04-24 for entities serving populations of 50,000 or more, 2027-04-26 for smaller entities and special-district governments. Pre-recorded video on a public university website or LMS is "web content" under the rule, and Success Criterion 1.2.2 (Captions, Prerecorded) is the governing requirement. There is no de minimis exemption for video produced before the deadline; the rule covers the corpus, not the publication date.
- Section 504 of the Rehabilitation Act — 29 U.S.C. § 794. Applies to any program or activity receiving federal financial assistance, which includes virtually every U.S. college and university (Title IV financial aid, NIH/NSF/DOEd grants, Pell, GI Bill). 504 predates ADA Title II by 17 years and is not displaced by it; the Office for Civil Rights (OCR) has been enforcing 504 against universities for inaccessible video since the 2010s. Compliance under 504 is "effective communication" and "equivalent ease of use" — fuzzier than WCAG, but higher in spirit. The settlements with Harvard and MIT (NAD v. Harvard, NAD v. MIT, both consent decrees) were brought under both ADA and Section 504.
- Section 508 of the Rehabilitation Act — 29 U.S.C. § 794d. Section 508 binds federal agencies directly and reaches universities through two pathways: federally-funded research deliverables (NIH and NSF require 508-conformant materials in some grant categories), and procurement language that flows down through federal contracts to subrecipients. Most universities have written 508 conformance into their IT procurement standard regardless of whether a specific vendor relationship is federally funded, because it is the simplest cross-walk to ICT accessibility — the 508 Refresh adopted WCAG 2.0 AA, and the practical rewrite of campus standards now points at WCAG 2.1 AA.
- IDEA — 20 U.S.C. §§ 1400 et seq., specifically Part D. IDEA Part D funds the Described and Captioned Media Program (DCMP), which sets the de facto industry standard for caption quality (the DCMP Captioning Key — the document that originated the often-cited 99% accuracy figure). IDEA mostly governs K-12, but DCMP funding flows to post-secondary materials used by deaf and hard-of-hearing students at any level, and DCMP's Captioning Key is the reference U.S. higher-ed accessibility offices reach for when an OCR complaint forces them to define "accurate."
- FERPA — 20 U.S.C. § 1232g and 34 CFR Part 99. A captured lecture in which a student speaks, asks a question, or is visible on camera is an "education record" under FERPA the moment it is associated with a course roster. Sending that recording to a third-party transcription vendor is a disclosure that requires either a school-official designation under 34 CFR § 99.31(a)(1) or specific consent. The vendor contract must contain school-official language, a strict-need-to-know provision, a re-disclosure prohibition, and data-destruction terms. A vendor that processes recordings outside the U.S., uses customer audio for model training, or retains audio indefinitely will fail the FERPA review even if the WCAG box is ticked.
State-level frameworks add a sixth layer at most institutions. California's Government Code § 7405 ties state agencies (which include the CSU and UC systems) to Section 508. Texas Government Code § 2054.451 imposes a parallel ICT accessibility requirement on state higher ed. New York, Illinois, Massachusetts, and Virginia have substantially similar statutes. Practical effect: the contract must hit the federal stack and a state cross-walk that almost always points at WCAG 2.1 AA anyway.
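The phased ADA Title II deadlines listed above reduce to a small lookup. The sketch below is illustrative only: the function name and parameters are hypothetical, it follows the wording of the ADA Title II item (smaller entities and special-district governments get the later date), and counsel should confirm which bucket a given entity falls into.

```python
from datetime import date

# Dates and the 50,000 threshold are the ones quoted in the ADA Title II item above.
LARGE_ENTITY_DEADLINE = date(2026, 4, 24)
SMALL_ENTITY_DEADLINE = date(2027, 4, 26)
POPULATION_THRESHOLD = 50_000


def title_ii_deadline(entity_population: int, is_special_district: bool = False) -> date:
    """Return the compliance date for a public entity (hypothetical helper).

    entity_population is the population of the public entity that operates
    the institution, not the institution's enrollment.
    """
    if is_special_district or entity_population < POPULATION_THRESHOLD:
        return SMALL_ENTITY_DEADLINE
    return LARGE_ENTITY_DEADLINE


# A state-operated university versus a college run by a small special district.
print(title_ii_deadline(7_000_000))
print(title_ii_deadline(32_000, is_special_district=True))
```

The input that matters is the population of the operating entity, which is why a small campus run by a state government still lands on the earlier date.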
Why the higher-ed captioning vendor stack already exists — and what is in it
Higher education has been buying captioning since well before the 2026 deadline, because Section 504 has been live since 1973 and OCR complaints about inaccessible online courses have been the most active enforcement vector for a decade. The result is a vendor field that is mature, well-integrated with the lecture-capture stack, and almost entirely priced for the prime-time, full-suite, multi-year-MSA campus use case. Five vendors come up in nearly every higher-ed RFP:
- Verbit — enterprise transcription with a proprietary adaptive ASR engine (Captivate™), heavy higher-ed installed base, integrations with Canvas, Blackboard, Brightspace, Kaltura, Panopto, Zoom, and Echo360. Pricing is dominated by annual contracts in the $33K–$75K/year band, with the entry $29/mo tier acting as a wedge into the enterprise sales motion (vocabulary adaptation lives behind the enterprise contract). See Verbit vs GlossCap for the head-to-head and the four-vendor pricing breakdown for the cross-cut.
- 3Play Media — full media-accessibility platform: captioning, subtitling, audio description, translation, transcripts, the works. Long-standing higher-ed contractor with public rate cards on multiple university IT pages (Cornell IT being the canonical reference). AI-only around $0.20/min, standard human-reviewed in the $1.75–$2.50/min band, with volume-discounted annual agreements typical for whole-system commits. See 3Play vs GlossCap.
- Kaltura — primarily a video platform rather than a captioning vendor, but it operates as both because Kaltura's REACH service offers integrated machine and professional captioning workflows that drop the SRT/VTT directly back into the asset metadata. Many campuses run Kaltura as the lecture-capture backbone in addition to or instead of Panopto, and REACH is the path of least resistance for shops that already have the Kaltura contract.
- Panopto — lecture-capture incumbent at a large slice of U.S. and Canadian universities. Panopto ships its own integrated AI captioning, has a marketplace integration with both Verbit and 3Play for human-reviewed tiers, and exposes a webhook API for third-party caption upload. The Panopto-default AI captions are usable for compliance triage but typically do not meet the DCMP 99% accuracy floor without human review or strong glossary biasing.
- AI-Media — Australian provider, long-standing presence in higher ed and broadcast. Smile/Falcon/Lexi product line covers live captioning, recorded captioning, and broadcast workflows. Often the fifth name on the RFP shortlist after the four above.
Two notable absences and one shadow-IT problem are worth flagging. Rev is rarely on the higher-ed shortlist despite its scale and per-minute pricing; the marketplace model and historic FERPA / data-handling gaps have kept it on the wrong side of campus procurement at most institutions, though individual departments often use it for one-off transcripts. YouTube auto-captions are free and the structural temptation is real; they fail WCAG 2.1 AA at the 99% accuracy threshold on virtually every technical lecture, and a campus that depends on them is exposed on every word the auto-captioner mangles. Otter.ai is the shadow-IT option faculty reach for — it is good, it is cheap individually, and it is almost always the wrong answer at the institutional level because the consumer terms of service do not satisfy FERPA and the standard plan retains and trains on customer audio. If Otter is in your environment unauthorised, the captioning RFP is also a chance to surface and remediate it.
The procurement reality: what the RFP actually has to clear
Public university procurement is not the SaaS sign-up flow. The contract has to clear at least three checkpoints — purchasing services (master-services agreement, vendor onboarding, indemnification), accessibility services (the substantive WCAG conformance review), and information security (FERPA + the campus IT security review, often referencing NIST 800-171 controls). At larger systems, a fourth — the office of general counsel — reviews the data-protection addendum. The realistic timeline from "draft RFP" to "executed MSA" is 60 to 180 days. That timeline matters because the back-catalogue retrofit — the corpus of pre-deadline lectures that ADA Title II now reaches — is not waiting. The practical strategy on most campuses is to execute a bridge purchase order against an existing vendor or a smaller-dollar agreement that fits inside the dean-of-students or accessibility-office signing authority, in parallel with the longer MSA.
Three procurement realities shape vendor choice in ways the brochure does not always reveal:
- The VPAT is not optional and not negotiable. A current Voluntary Product Accessibility Template (VPAT 2.5 INT or US-Federal, in the WCAG 2.1 Level AA column) has to be submitted with the bid. The VPAT is the document the campus accessibility office reads first; a vendor that says "we will provide one upon request" gets ranked behind one that has it on the website. All four incumbents publish VPATs; some are more recent than others. Ask for the date.
- Single-source justification is required above the campus threshold. Most public universities require a competitive procurement above a dollar threshold (commonly $5K, $10K, or $25K per fiscal year) unless single-source justification is filed. A first-year captioning spend of $50K–$300K at a mid-size public flagship is normal, which means the captioning purchase has to clear the competitive bid; a sole-source narrative based on existing LMS integration is defensible but has to be written down. The path of least resistance is to add the captioning vendor to an existing master agreement on the lecture-capture vendor where that vendor is a reseller or marketplace partner.
- FERPA-compliant data handling is binary. The campus IT security review will reject any vendor that cannot sign a school-official DPA, retains audio outside the U.S. or for non-FERPA-compliant durations, or trains foundation models on customer audio. Two of the five vendors above default to U.S.-region processing with a clean DPA; the others can negotiate to it but require explicit contract language. This is the most common reason a vendor-shortlisted-on-paper drops off the list at security review.
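Because the FERPA review is pass/fail, it can be written down as a screen before any vendor demo is scheduled. The following is a minimal sketch, not a campus security standard: the class, field names, and the 30-day retention limit are assumptions chosen for illustration, and a real review would work from the campus's own questionnaire.

```python
from dataclasses import dataclass


@dataclass
class VendorDataHandling:
    """Answers from a vendor's security questionnaire (hypothetical fields)."""
    name: str
    signs_school_official_dpa: bool
    us_region_processing: bool
    trains_on_customer_audio: bool
    retention_days_after_term: int


# Assumed campus limit on audio retention after the contract ends.
MAX_RETENTION_DAYS_AFTER_TERM = 30


def ferpa_screen(vendor: VendorDataHandling) -> list[str]:
    """Return the reasons a vendor fails the screen; an empty list means pass."""
    failures = []
    if not vendor.signs_school_official_dpa:
        failures.append("no school-official DPA")
    if not vendor.us_region_processing:
        failures.append("processing outside the U.S.")
    if vendor.trains_on_customer_audio:
        failures.append("trains models on customer audio")
    if vendor.retention_days_after_term > MAX_RETENTION_DAYS_AFTER_TERM:
        failures.append("audio retention exceeds limit")
    return failures


example = VendorDataHandling("Example Vendor", True, True, False, 90)
print(ferpa_screen(example))
```

Returning the list of reasons rather than a bare boolean gives purchasing services something concrete to send back to the vendor as contract redlines.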
The seven questions the RFP has to answer
Before sending the RFP, write down concrete, quantitative answers to seven questions. Vendors that cannot answer them in writing — not "we can do that, let's discuss" — are not in scope. The seven:
- WCAG 2.1 Level AA conformance and a VPAT dated within the last 12 months. Specifically, conformance with SC 1.2.2 (Captions, Prerecorded) at the DCMP 99% accuracy floor, and the platform's own WCAG conformance for the editor / review UI / customer portal. Two artifacts have to be supplied: the VPAT and a sample VTT/SRT pair from a representative academic lecture.
- FERPA data-handling and a school-official DPA on file. U.S.-region processing, no model training on customer audio without separate explicit consent, audio retention not exceeding contract term plus 30 days, sub-processor list, breach-notification SLA. The DPA template should be the campus's preferred template, with redlines back to the vendor — not the vendor's template imposed on the campus.
- Accuracy SLA on the human-reviewed tier, in writing. 99% on substantive content, 98% on incidental, with definitions of "substantive" and "incidental" the campus actually agrees with. Both Verbit and 3Play will supply this; the AI-only tier is best-effort and should not be the SLA tier for student-impacting content.
- Vocabulary adaptation per department, course, or program. Glossary upload (custom dictionaries, name lists, abbreviations), per-course context, retention of glossaries across captioning runs, and how new terms get added between drops. This is where the gap between the enterprise vendor's "we have an admin panel" and the actual workflow at department level shows up. Glossary-biased decoding is now table stakes for technical departments — biology, computer science, engineering, music — where the proper-noun density is the determinant of whether the transcript is usable.
- LMS and lecture-capture integration. Canvas, Blackboard, Brightspace, Moodle, plus Kaltura and Panopto — the realistic combinations are usually one LMS plus one lecture-capture system plus Zoom for synchronous. Native integration (drop SRT/VTT back into the asset, not "you download a file and we upload it") matters at scale. Webhook-based auto-sync is the table-stakes ask.
- Language coverage. Most public universities now have non-English-speaking students who use captions as a comprehension aid, and ESL-support requirements are increasingly tied to Title VI national-origin compliance. Spanish at minimum, often Mandarin, Korean, Arabic, Vietnamese, French. Confirm that the AI-only and human-reviewed tiers cover the top two non-English languages on campus, and that the pricing model does not tax non-English captions disproportionately.
- Pricing shape that survives the back-catalogue retrofit. The two-year corpus the deadline reaches can be 2,000–10,000 hours at a mid-size public flagship. At per-minute pricing, that is the difference between a six-figure first-year line item and a low-six-figure one, and the difference between a flat-monthly subscription that scales with new content and a per-minute spend that is exactly proportional to corpus size. Triage of the back-catalog (which lectures actually need to be retrofitted, which can be archived or unpublished) is part of the answer.
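To make the pricing-shape question concrete, the corpus sizes in the last item can be multiplied through the per-minute rates quoted earlier in this post. This is a rough sketch under those assumptions; the function names are hypothetical, the rates are the illustrative figures from the vendor list, and real quotes will include volume discounts.

```python
# Per-minute rates quoted earlier in this post; treat them as illustrative.
AI_ONLY_PER_MIN = 0.20
HUMAN_REVIEWED_PER_MIN = (1.75, 2.50)


def per_minute_cost(hours: float, rate_per_min: float) -> float:
    """Cost of captioning `hours` of video at a per-minute rate."""
    return hours * 60 * rate_per_min


def blended_cost(hours: float, human_share: float, human_rate: float,
                 ai_rate: float = AI_ONLY_PER_MIN) -> float:
    """Cost when only a fraction of the corpus goes to the human-reviewed tier."""
    human_hours = hours * human_share
    return (per_minute_cost(human_hours, human_rate)
            + per_minute_cost(hours - human_hours, ai_rate))


for hours in (2_000, 10_000):
    ai = per_minute_cost(hours, AI_ONLY_PER_MIN)
    low, high = (per_minute_cost(hours, r) for r in HUMAN_REVIEWED_PER_MIN)
    print(f"{hours:>6} h  AI-only ${ai:,.0f}  human-reviewed ${low:,.0f} to ${high:,.0f}")
```

The `human_share` parameter is where triage shows up in the budget: every lecture that is unpublished or routed to the AI-only tier comes off the more expensive line.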
Vendor-by-vendor: where each fits in a higher-ed stack
The four-or-five-vendor field is not a single ranking; it is a role-and-fit problem. The honest read is that the campus rarely picks one vendor for everything — most large public universities run a primary captioning contractor (Verbit or 3Play) plus the in-platform option that ships with the lecture-capture system (Kaltura REACH or Panopto integrated AI), and accept that some long-tail content is served by Otter or YouTube auto-captions even though the institutional accessibility office wishes it were not.
| Vendor | Best fit | Where it underperforms |
|---|---|---|
| Verbit | The institution-wide MSA at a large public system. Captivate vocabulary adaptation, integrations with the entire LMS + lecture-capture stack, ASR + human-reviewed tiers under one roof. Best for the centralised accessibility office that wants one signature line. | Procurement-heavy onboarding, annual-commit lock-in. The departmental layer (one professor, one course glossary, one semester) is administratively heavy. |
| 3Play Media | Public-university procurement that already references Cornell IT or a peer's rate card. Mature DPA, mature VPAT, strong audio-description and translation lines that increasingly matter under WCAG 2.2. | No public pricing — every quote is a sales-call. Pro tier caps usage at 10 hours per year (not month), which catches campus shops off-guard at the SKU mismatch. |
| Kaltura REACH | Campuses that already have the Kaltura platform contract. Lowest friction — captions land back in the asset without an extra integration. Marketplace allows machine, professional, and accuracy-tier selection per video. | Cost can be opaque inside the bundled Kaltura contract; the per-minute rate is not always visible to the accessibility office. Tied to the platform; harder to extract if the platform contract changes. |
| Panopto integrated | The everyday lecture-capture flow at campuses already on Panopto — the AI tier is right there, no separate upload. Decent floor, predictable cost. | AI-only and substandard for technical content (engineering, biology, music) without glossary biasing, which the platform does not natively support per course. Push to human-reviewed tier through Panopto's marketplace and the price stops being a Panopto price. |
| AI-Media | Live captioning use cases (commencement, public-affairs broadcasting, hybrid synchronous teaching) where the broadcast-grade stack matters. Solid recorded line as well, especially in international institutions. | Smaller U.S. higher-ed install base than Verbit or 3Play, so peer-references are harder to get; procurement appetite for "second-tier brand" varies by institution. |
| GlossCap | Departmental and back-catalogue layer. $29/$99/$299 flat-monthly, glossary-first decoding, sized for a single department, lab, or program — not a centralised RFP. Best as the layer beneath whatever incumbent owns the institution-wide contract. | Not a replacement for the lecture-capture stack. No live captioning in v1. No human-reviewed tier — the 99% accuracy comes from glossary biasing on the AI tier, which is sufficient for most technical lecture content but not for the OCR-complaint-pending edge cases that need human signoff. |
Where GlossCap fits — the layer the incumbents do not actually serve well
The honest pitch is not "switch to GlossCap." A public university with a 2,000-hour back-catalogue and a 500-hour-per-month live new-content stream cannot run that on GlossCap and should not try. The honest pitch is: the institution-wide contract goes to one of the four incumbents, and GlossCap fills two well-defined gaps the incumbents do not optimize for.
- Back-catalogue retrofit at sustainable per-hour cost. The corpus the 2026-04-24 deadline reaches is large, mostly low-traffic, and structurally not worth the human-reviewed tier. At $99/mo for 30 hours on the GlossCap Team tier, the per-hour cost lands at roughly $3.30/hour — an order of magnitude cheaper than the per-minute economics of the human-reviewed tier on Rev or 3Play, and roughly half the AI-only per-minute cost at the same tier. The Org tier ($299/mo, unlimited hours) becomes the right shape once the retrofit volume crosses 90 hours/month. The accuracy ceiling without human review is the constraint — see the 99% accuracy post for the audit method — but for archived lecture content where the alternative is "leave the video up with auto-captions or take it down," 99.2% glossary-biased AI captions are the correct deliverable.
- Departmental glossary the enterprise vendors do not own. Verbit has Captivate, 3Play has glossary upload, Panopto has post-edit. None of them are the lab's terminology — the way a biology department's glossary updates when the new term's primer drops, the way a CS department's glossary updates when a new framework lands in the curriculum, the way a music department's glossary contains five spellings of "fortepiano" that all need to round-trip the same way. Glossary-biased decoding is GlossCap's load-bearing feature; the implementation is described in that post and the moat is that the glossary is owned by the department, lives in their Notion or Confluence or Google Doc, and is read by the captioning pipeline directly. The enterprise vendors are at the institutional layer; GlossCap operates at the departmental layer where the vocabulary actually lives.
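The per-hour figure and the 90-hour crossover quoted for the Team and Org tiers can be checked with a few lines of arithmetic. This sketch assumes that volume above 30 hours is covered by stacking additional Team subscriptions; that is an illustrative assumption, not a documented billing rule, and the function names are hypothetical.

```python
import math

# Tier prices as quoted above.
TEAM_PRICE, TEAM_HOURS = 99.0, 30   # Team tier: $99/mo for 30 hours
ORG_PRICE = 299.0                   # Org tier: $299/mo, unlimited hours


def team_cost(hours_per_month: float) -> float:
    """Monthly cost if volume is covered by stacked Team subscriptions (assumption)."""
    subscriptions = math.ceil(hours_per_month / TEAM_HOURS)
    return subscriptions * TEAM_PRICE


def cheapest_plan(hours_per_month: float) -> tuple[str, float]:
    """Pick the cheaper of stacked Team subscriptions and a single Org subscription."""
    team = team_cost(hours_per_month)
    return ("Team", team) if team <= ORG_PRICE else ("Org", ORG_PRICE)


print(f"Team tier per hour: ${TEAM_PRICE / TEAM_HOURS:.2f}")
for hours in (30, 90, 91, 200):
    print(hours, cheapest_plan(hours))
```

Under that assumption, three Team subscriptions cover exactly 90 hours for slightly less than the Org price, and any volume beyond that makes Org the cheaper shape.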
Two cases where GlossCap is not the right answer, said clearly:
- Live captioning during synchronous instruction. Hybrid teaching, ASL-interpreted classes, real-time accommodations under Section 504 require live captioning that GlossCap does not offer in v1. The right vendors here are Verbit, AI-Media, or Panopto's live integration.
- OCR-complaint-pending content where signoff is required. If your office has a current Letter of Findings or a settlement agreement that names specific accessibility work, the contract probably names a human-reviewed tier. GlossCap's glossary-biased AI is sufficient for compliance triage but not for the human-signoff stage; route those assets to the incumbent on the institution-wide MSA.
Budget shape and the ADA-settlement risk profile
The unspoken budget question on every public-university captioning procurement is: what does the institutional risk look like if we underspend? The settlement record is the answer. NAD v. Harvard and NAD v. MIT (consent decrees, 2019–2020) committed both institutions to multi-year captioning programs with specific accuracy and turnaround SLAs. The UC Berkeley free-courseware takedown (2017) was the inverse — the institution chose to remove 20,000 hours of content rather than caption it, a path the 2026 rule effectively forecloses by reaching the corpus, not just new publications. Maricopa Community Colleges (DOJ, 2010s) and the Penn State NFB settlement (2010, predating much of the modern captioning vendor stack) are the older anchor cases. The pattern in every settlement: the cost of remediating after a complaint is materially higher than the cost of running an adequate captioning program prospectively, because the consent decree imposes timing, scope, and external monitoring on top of the underlying labour.
Practical budget framing for the accessibility-services director making the ask:
- Year 1 (back-catalogue retrofit + new-content captioning): $80K–$300K at a mid-size public flagship, depending on corpus size and human-review percentage. The bulk is the retrofit; new-content captioning is the steady-state line.
- Year 2 onward (steady-state new content): $30K–$120K at the same institution, again a function of new lecture-capture volume and human-review fraction.
- Risk-priced reference: a single OCR Letter of Findings with a five-year monitoring period typically commits the institution to $200K–$800K of remediation plus monitoring fees, depending on scope. The Year 1 number above is materially below the floor of the post-complaint number, which is the case the ask should make to the CFO.
Action plan: this week, this month, this term
If you are the accessibility-services director, instructional-technology lead, or compliance counsel reading this in the days after the deadline went live, the realistic plan is staged.
- This week. Triage the public-facing video corpus: which assets are on the public site (admissions, course catalog, public lectures, commencement), which are LMS-only, which are unpublished archives. The public-facing assets are the highest-risk surface and the smallest in count. Run the Caption Mangle Scanner at /embed-preview on a representative slice to get a quick read on auto-caption error density.
- This month. Start the RFP. Use the seven-question checklist above as the appendix. Run a parallel bridge purchase order on a small-dollar engagement (under the campus single-source threshold) to start retrofitting the highest-traffic public-facing assets while the longer MSA executes. The bridge engagement is where GlossCap fits cleanly — flat-monthly, sign up with a card, no procurement cycle, glossary owned by the department doing the retrofit.
- This term. Stand up departmental glossary practice — every academic unit owns its terminology list, refreshes per semester, and points the captioning pipeline at it. The glossary is the moat under the long-run accuracy floor; without it, the OCR complaint two years from now is going to be on the same proper nouns the auto-captioner mangled this term.
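For the triage step, a quick read on error density can also be taken by hand: correct a few minutes of one lecture's captions, then compare the corrected text with the auto-generated text. The sketch below treats accuracy as one minus the word error rate, which is an assumption; the DCMP Captioning Key and a vendor SLA may count punctuation, speaker labels, and timing differently. All function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    """Word-level edit distance (substitutions + insertions + deletions)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from the captions
                            curr[j - 1] + 1,      # extra word in the captions
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference_text: str, caption_text: str) -> float:
    """Accuracy as 1 - WER, measured against a human-corrected reference."""
    reference = tokenize(reference_text)
    if not reference:
        raise ValueError("reference transcript is empty")
    return 1 - word_errors(reference, tokenize(caption_text)) / len(reference)


corrected = "The fortepiano predates the modern piano."
auto_caption = "the forte piano predates the modern piano"
print(f"{word_accuracy(corrected, auto_caption):.1%}")
```

A single mangled proper noun costs two word errors in this short example, which is why glossary terms tend to dominate the error count in technical lectures.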
FAQ
- Is a private university covered by the same ADA Title II rule?
- No — private universities are covered by ADA Title III, which has not yet had a parallel web-content rule finalized. Private universities are still covered by Section 504 if they take federal financial assistance, which most do, and 504 has been the basis for OCR-driven captioning enforcement at private universities for years. The vendor-selection logic in this post applies; the specific 2026-04-24 deadline does not.
- What about community colleges?
- Most community colleges are public entities under ADA Title II. Whether they hit the 2026-04-24 deadline or the 2027-04-26 deadline depends on the population the entity serves — the test in the rule is the population of the public entity, not the enrollment of the college. A community college operated by a state government is on the 2026 deadline. One operated by a special district that serves under 50,000 is on the 2027 deadline. Either way, prepare now; the vendor procurement timeline is longer than the calendar gap.
- Does the rule require captioning of live synchronous classes?
- The DOJ rule references WCAG 2.1 Level AA, which includes SC 1.2.4 (Captions, Live) at AA. Live classes are real-time content under that SC. In practice, most universities meet the live-captioning requirement through an interpreter accommodation under Section 504 (an individualized service triggered by a registered student), not through universal live captioning. The rule does not change that case-by-case framework, but it does push toward more universal availability over time.
- How does the rule handle archived video that is no longer being used in active courses?
- Web content under the rule is "available to members of the public or used by the public entity in the conduct of its activities." Archived video that sits on a public-facing platform is in scope. Archived video stored internally and not made available is generally not. The remediation strategy that aligns with this distinction is: triage the public-facing archive, retrofit what stays public, unpublish or archive what does not warrant the captioning spend.
- Can we use student-employees to caption-edit instead of buying a vendor?
- Some campuses do, and it can be a defensible part of the program — typically for the human-review pass on top of an AI-generated draft, supervised by accessibility-services staff. The hard limits are FERPA (student employees handling other students' education records need school-official designation), labor law (hourly limits, training cost), and the DCMP Captioning Key quality floor that human review is supposed to clear. Student-edit programs are a complement, not a substitute. Most institutions that run them also have a vendor on contract for the bulk volume.
Where to go next
- University lecture capture captions — ADA Title II scope, Kaltura/Panopto flow, back-catalog retrofit
- ADA Title II captions — the 2026-04-24 deadline reference
- ADA Title II just became enforceable — the seven-day sprint plan
- Rev vs 3Play vs Verbit vs GlossCap — pricing breakdown for mid-market teams
- Verbit vs GlossCap — full head-to-head
- 3Play vs GlossCap — full head-to-head
- Rev vs GlossCap — full head-to-head
- WCAG 2.1 Level AA captions — the underlying spec
- SC 1.2.2 (Captions, Prerecorded) explained
- Kaltura captions — REST Caption API workflow
- Glossary-biased captioning — the Whisper implementation
- Why 99% caption accuracy matters — the WCAG threshold audit
- Live demo: caption-mangle scanner