Higher Education Operations · Published 2026-06-12

Captioning university lecture capture: Panopto, Echo360, Canvas Studio, Blackboard Ultra, and the academic-calendar compliance problem

University lecture capture sits at the intersection of the highest-stakes caption-compliance environment in U.S. higher education and some of the most technically difficult audio for automatic speech recognition. The existing post on picking a captioning vendor for public universities covers the procurement decision: which of the five higher-ed incumbents (Verbit, 3Play, Kaltura, Panopto, AI-Media) the contract has to go through, which of the five regulatory frameworks (ADA Title II, Section 504, Section 508, IDEA/DCMP, FERPA) the vendor must satisfy, and how GlossCap fits as the departmental-glossary layer the enterprise vendors do not own.

This post is about the operational problem that the procurement decision does not solve. How do you actually get caption compliance across a Panopto or Echo360 deployment at a university where the lecture-capture back-catalogue has been growing for five to ten years, the academic calendar creates a new cohort of uncaptioned content every sixteen weeks, guest lecturers arrive without vocabulary files, course instructors change between semesters, and every recorded session that includes a student asking a question is also an education record under FERPA? This is the workflow problem, the one that sits between the contract award and the Office for Civil Rights (OCR) investigation, and it has a different shape from the equivalent problem at a corporate L&D team.

The academic-calendar constraint is the structural difference: a semester ends, a course closes, and the cohort of students who needed those captions has already graduated, transferred, or failed and re-enrolled. The compliance obligation does not expire with the semester, but the operational context for remediation is completely different in month twenty-two of a captioning programme than it was in month two.

This post covers the platform-by-platform caption workflows for Panopto, Echo360, Canvas Studio, Blackboard Ultra, and Kaltura; the institutional glossary architecture that makes the proper-noun problem manageable at scale; the academic-calendar compliance problem in detail; the semester-based remediation approach that makes a multi-year back-catalogue tractable; the proper-noun failure modes specific to academic content; eight failure modes in university lecture-capture caption programmes; and a seven-question FAQ covering the operational questions that digital-accessibility coordinators and instructional-technology directors encounter most often.

TL;DR

University lecture capture is different from corporate training video in five structural ways that compound the compliance problem: (1) the academic-calendar constraint — a new cohort of uncaptioned content arrives every sixteen weeks at a rate that outpaces back-catalogue remediation unless a new-content workflow is in place from day one; (2) the back-catalogue depth — a university that has been using Panopto or Echo360 since 2014 typically has seven to ten years of uncaptioned recordings representing thousands to tens of thousands of hours; (3) the proper-noun density — lecture audio has a dramatically higher density of faculty names, course codes, research terminology, lab names, and institutional acronyms than any other training-video category, and ASR fails on exactly these terms; (4) the multi-platform complexity — most universities run at least two of {Panopto, Echo360, Canvas Studio, Blackboard Ultra, Kaltura} simultaneously, each with a different caption upload workflow, different format requirements, and different accuracy-track behaviour; and (5) the FERPA compliance layer — every session in which a student is identifiable by name, image, or voice is an education record, and sending it to a captioning vendor requires specific contract provisions. The operational approach that works: build a three-tier institutional glossary (institution-wide, department, course) before processing anything; segment the back-catalogue by semester cohort and remediation priority rather than alphabetically; run new-content through a pre-publication caption gate; use a semester cadence for glossary maintenance and submission-rate review; and document the measurement methodology before the programme starts, because the twelve-month documentation trail is the difference between a compliant programme and a paper one under OCR audit.

Why university lecture capture is different from corporate training video

Corporate L&D teams produce training video in a relatively controlled environment: a defined content library, a known set of subject-matter experts, a production workflow that includes scripting and review, and a stable vocabulary set that changes slowly as the product evolves. The captioning workflow can be built around this structure: a submission gate at the content-production stage, a glossary that grows incrementally with each product release, and a QA process that samples from a library of known-structure content. The proper-noun failure modes are real — product names, SDK symbols, medical terms — but they are amenable to a glossary that is maintained by the same team that maintains the product documentation.

University lecture capture breaks every one of those assumptions. The content is produced in real time by faculty who are not following a script, in an audio environment ranging from a purpose-built lecture-capture classroom to a professor with a laptop in a conference room with street noise. The vocabulary is not just product-specific — it is discipline-specific, instructor-specific, and course-specific, changing every time a new instructor takes over a course section, every time a guest lecturer is added to the syllabus, and every time the curriculum is updated to reflect new research. The back-catalogue is not a library of reviewed, structured content — it is an archive of recordings going back years, captured at varying audio quality levels, with no consistent metadata about content type, instructor vocabulary complexity, or compliance status.

The academic calendar creates a compliance rhythm that has no corporate equivalent. At a corporate L&D team, the compliance obligation is continuous: every video in the library must be captioned, and the programme makes steady progress against that requirement over time. At a university, the compliance obligation is both continuous and episodic: the back-catalogue obligation is continuous, but each new term adds a new cohort of course recordings, every sixteen weeks in the regular academic year. A university that runs four terms per year (fall, spring, summer I, summer II) and captures 800 lecture hours per term is adding 3,200 uncaptioned hours per year even while the remediation programme is working through the historical backlog. A programme that does not have a new-content gate in place at the start of semester one is not just remediating a back-catalogue; it is remediating a back-catalogue that keeps growing, and whenever intake exceeds remediation throughput, one that grows faster than it can be cleared.
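
The arithmetic behind that claim can be sketched as a small projection. The intake figures (four terms of 800 hours) come from the example above; the starting backlog, the remediation throughput, and the function name are hypothetical placeholders, not measurements from any real programme.

```python
def project_backlog(start_hours, intake_per_term, remediated_per_term,
                    terms, gate_coverage=0.0):
    """Return the uncaptioned backlog (in hours) at the end of each term.

    gate_coverage is the fraction of new recordings captioned before
    publication by a new-content gate (0.0 = no gate, 1.0 = full gate).
    """
    backlog = float(start_hours)
    history = []
    for _ in range(terms):
        # New recordings that slip past the gate join the backlog.
        backlog += intake_per_term * (1.0 - gate_coverage)
        # Remediation work removes hours, never below zero.
        backlog = max(0.0, backlog - remediated_per_term)
        history.append(backlog)
    return history


# Hypothetical programme: 10,000-hour backlog, 500 hours remediated per term.
no_gate = project_backlog(10_000, 800, 500, terms=4)
with_gate = project_backlog(10_000, 800, 500, terms=4, gate_coverage=1.0)
print("after one year, no gate:  ", no_gate[-1])
print("after one year, full gate:", with_gate[-1])
```

With these placeholder numbers the backlog rises every term without a gate and falls every term with one, which is the whole argument for putting the gate in place before remediation starts.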

The FERPA layer adds a compliance dimension that corporate training video does not have at all. A recorded lecture in which a student asks a question, contributes to a discussion, or is identifiable by name or image is an education record under FERPA (20 U.S.C. § 1232g, 34 CFR Part 99). The university's obligation to caption that recording for accessibility and its obligation to protect the student's educational records under FERPA operate simultaneously and do not always point in the same direction. Sending the recording to a captioning vendor is a disclosure of an education record, and the vendor contract must designate the vendor as a "school official" under 34 CFR § 99.31(a)(1), include a legitimate-educational-interest provision, prohibit re-disclosure, and require data destruction after processing. A captioning vendor that processes recordings outside the U.S., retains audio for model training, or does not have a formal FERPA-compliant DPA will fail the institutional legal review before the first session is processed. This is the reason the five-vendor field in higher-ed captioning exists — those vendors have invested in the FERPA compliance infrastructure that the mass-market transcription services have not.

The multi-platform complexity

Most universities do not run a single lecture-capture platform. The typical large-university technology stack involves at least two video surfaces for lecture content: a primary lecture-capture system (Panopto or Echo360 in the majority of deployments) and an LMS-integrated video layer (Canvas Studio, Blackboard Ultra Video, or Kaltura, depending on the LMS). Some universities run four or more of these platforms. Each platform has a different caption workflow, a different set of format requirements, different auto-caption track behaviour, and a different API for programmatic caption upload. The coordination problem (ensuring that the correct, glossary-biased caption file is on the right version of a recording across multiple systems) is not solved by any of the platforms automatically. It is solved by institutional workflow design, which is what this post addresses.

The vocabulary dimension is where the proper-noun problem concentrates. In a corporate training video, the proper nouns are product names, SDK symbols, and occasionally medical or legal terms. In a university lecture, the proper nouns include: faculty names (often from non-English language families with phoneme sequences that are not well-represented in ASR training data), course codes (alphanumeric identifiers like CS 224N, BIO 101A, HIST 3310-02, CHEM 221L), research terminology that may be cutting-edge and therefore absent from any ASR training corpus, lab names, building names, institutional acronym registers, name-of-act legislative citations, regulatory body abbreviations, grant numbers, conference names, journal names, and cross-referenced course codes that students need to be able to cite in their own assignments. The failure on any of these categories is more consequential than it sounds: "Dr. Bhattacharya" rendered as "Dr. Bach a charya" is not just an accuracy failure — it is the finding that an OCR investigator's screen-reader test will surface in the first five minutes of a sampling audit.

Panopto caption workflow for university deployments

Panopto is the dominant lecture-capture platform at U.S. and Canadian universities, with market share concentrated at research-intensive institutions. Its caption workflow has three relevant surfaces: the built-in ASR auto-caption track, the manual caption editor, and the SRT/VTT file import endpoint. For compliance purposes, the path to WCAG 2.1 AA accuracy runs through the third surface — importing a correctly-generated caption file — because the built-in ASR track does not reliably meet the 99% threshold on lecture content with institution-specific vocabulary.

Panopto ASR: what it gets right and what it does not

Panopto's integrated ASR runs on a general-purpose model with real-time output during the recording. It is fast, it is free (included in the licensing), and it produces a plausible first-pass transcript for most lecture content. For general-English lecture audio with common vocabulary, the word-error rate runs roughly 7–10%, which translates to 90–93% accuracy at the word level. That is not enough for WCAG 2.1 Success Criterion 1.2.2 (Captions, Prerecorded), a Level A criterion that AA conformance also requires. WCAG itself does not state a numeric accuracy threshold; the 99% floor used in this post is the benchmark commonly applied to that criterion in practice, and it is interpreted per-segment (a 60-second clip with a single mangled faculty name fails even if the surrounding text is perfect).
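
A per-segment accuracy check of the kind described above can be sketched with a word-level edit distance. This is a minimal illustration, assuming a human-verified reference transcript exists for the sampled segment; the function names are hypothetical, and the 0.99 floor is simply the figure used in this post.

```python
def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance between two transcripts.

    Returns (errors, reference_word_count).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def segment_accuracy(reference, hypothesis):
    """Word accuracy for one segment, defined as 1 minus word-error rate."""
    errors, n_ref = word_errors(reference, hypothesis)
    return 1.0 - errors / n_ref if n_ref else 1.0


def passes_floor(reference, hypothesis, floor=0.99):
    return segment_accuracy(reference, hypothesis) >= floor


reference = "dr ng covers gradient descent in depth for cs 224n"
asr_output = "dr ing covers gradient descent in depth for cs 224n"
print(segment_accuracy(reference, asr_output), passes_floor(reference, asr_output))
```

A single substituted name in a ten-word segment is already a 10% word-error rate, which is why short segments with one mangled proper noun fail a per-segment floor so easily.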

The failure pattern concentrates predictably on proper nouns. "Dr. Bhattacharya" renders as "Dr. Bach a charya" (phoneme-sequence failure on a Bengali-origin surname). "Dr. Ng" renders as "Dr. ing" (a word-initial velar nasal, which does not occur in standard English phonology). "CS 224N" renders as "C S two two four n" (alphanumeric course code with a letter suffix). "BIO 101A" renders as "B I O one oh one A" (department abbreviation plus section identifier). "Schrödinger's equation" renders as "Shrodingers equation" (umlaut-bearing proper name). "IL-6" renders as "I L six" or "il six" (cytokine nomenclature common in biomedical lecture content). "CRISPR-Cas9" renders correctly about 60% of the time and as some variant of "Krispy cast nine" the rest. Faculty names at the low-risk end of the proper-noun failure taxonomy (names like Smith, Johnson, Chen, or Lee) render correctly because they appear in ASR training data at scale. Names from South Asian, East Asian, Arabic, and sub-Saharan African language families fail at dramatically higher rates because they are underrepresented in the training corpora of general-purpose ASR models.
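
Because these misrenderings are systematic, a first-pass cleanup can be sketched as a substitution table applied to the ASR text. Note that this is a post-hoc find-and-replace pass, a simpler technique than biasing the recogniser with a glossary, and it only catches variants someone has already observed. The table entries are the examples from the paragraph above; the function names are hypothetical.

```python
import re

# Observed misrenderings (from the examples above) mapped to the correct term.
CORRECTIONS = {
    "Dr. Bach a charya": "Dr. Bhattacharya",
    "C S two two four n": "CS 224N",
    "B I O one oh one A": "BIO 101A",
    "Shrodingers equation": "Schrödinger's equation",
    "Krispy cast nine": "CRISPR-Cas9",
}


def build_pattern(corrections):
    """Compile one case-insensitive pattern covering every known misrendering."""
    # Longest phrases first so a long phrase wins over any shorter prefix.
    keys = sorted(corrections, key=len, reverse=True)
    alternation = "|".join(re.escape(key) for key in keys)
    # Lookarounds keep matches from starting or ending inside a longer word.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def apply_corrections(text, corrections=CORRECTIONS):
    lookup = {key.lower(): value for key, value in corrections.items()}
    pattern = build_pattern(corrections)
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)


print(apply_corrections("Today Dr. Bach a charya teaches C S two two four n."))
```

Ambiguous renderings such as "I L six" are deliberately left out of the table: a blind replacement could corrupt legitimate text, so those cases belong in human review.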

The compliance implication is that Panopto's auto-caption track is usable as a first-pass triage tool (it identifies which sessions have grossly inaccurate captions) but is not sufficient as the caption of record for any session that will be part of an ADA Title II or Section 504 compliance claim. The compliance caption requires either (a) a human-reviewed transcript meeting the DCMP Captioning Key standard or (b) a glossary-biased ASR output with documented accuracy verification.

Importing a caption file to Panopto

Panopto accepts SRT and VTT caption files via the session editor. The import path is: open the session in Panopto Editor → Captions tab → "Import Captions" → select the SRT or VTT file. A few format requirements matter for clean import: UTF-8 encoding without a BOM, timing in the HH:MM:SS,mmm format (SRT) or HH:MM:SS.mmm (VTT), cue start times that do not overlap (Panopto's parser rejects overlapping cues without a useful error message), and a maximum line length of approximately 68 characters per cue line (longer lines wrap in the player and break the visual alignment). Speaker identification formatting in VTT (the <v Speaker> tag) is preserved in Panopto's display if the tenant has the multi-speaker track feature enabled; if not, the speaker tag is stripped silently.
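
The format requirements above lend themselves to a pre-upload check. The following is a minimal sketch for SRT input only, assuming that cues which merely touch (one ends exactly when the next begins) are acceptable; the function name is hypothetical, and the 68-character limit is the approximate figure quoted above.

```python
import re

TIMING = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)
MAX_LINE = 68  # approximate per-line limit cited above


def _to_ms(hours, minutes, seconds, millis):
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def check_srt(raw: bytes):
    """Return a list of human-readable problems found in an SRT payload."""
    problems = []
    if raw.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 BOM")
        raw = raw[3:]
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]

    prev_end = 0
    for block in re.split(r"\r?\n\s*\r?\n", text.strip()):
        lines = block.splitlines()
        if not lines:
            continue
        timing_idx = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing_idx is None:
            problems.append(f"cue without timing line: {lines[0]!r}")
            continue
        match = TIMING.match(lines[timing_idx].strip())
        if not match:
            problems.append(f"bad timing format: {lines[timing_idx]!r}")
            continue
        parts = match.groups()
        start, end = _to_ms(*parts[:4]), _to_ms(*parts[4:])
        if start < prev_end:
            problems.append(f"cue at {lines[timing_idx]!r} overlaps previous cue")
        prev_end = max(prev_end, end)
        for line in lines[timing_idx + 1:]:
            if len(line) > MAX_LINE:
                problems.append(f"line longer than {MAX_LINE} characters: {line[:20]!r}")
    return problems
```

Running a check like this before import matters most for the overlap case, since that is the failure the parser reports least helpfully.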

Panopto allows multiple caption tracks per session — the ASR auto-track and one or more uploaded tracks can coexist. The visibility default depends on tenant configuration: the admin console can set which track displays by default. Best practice for a university retrofit is to leave the ASR track in place (it serves as a fallback if the primary track has a delivery problem) and upload the glossary-biased SRT/VTT as the primary track, then configure the default to show the primary track. This preserves Smart Chapter / topic-detection behaviour, which runs on the ASR transcript and is not affected by which caption track is set as the display default.

Panopto REST API for bulk caption upload

For a back-catalogue remediation project involving hundreds or thousands of sessions, the per-session editor workflow is not tractable. Panopto's REST API supports caption upload programmatically. The relevant endpoints are:

GET /api/v1/sessions — list sessions by folder with optional filters (created after date, folder ID). Returns session IDs, names, creation dates, and duration. The folder hierarchy maps to the department/course structure in most university tenants.
POST /api/v1/sessions/{id}/captions — upload a caption file to a specific session. The request body is multipart/form-data with the caption file and a language code (ISO 639-1 two-letter). The API returns the caption track ID on success.
GET /api/v1/sessions/{id}/captions — list existing caption tracks on a session, including the ASR auto-track. Returns track IDs, languages, and source types.
PUT /api/v1/sessions/{id}/captions/{trackId}/default — set the default display track for a session. Requires the session ID and the track ID returned by the POST upload.

Authentication uses OAuth 2.0 client credentials (client_id and client_secret from the Panopto admin console, scoped to api). The access token is short-lived (30 minutes); bulk upload scripts should refresh proactively. Rate limiting applies — the Panopto API documentation does not publish a specific rate limit, but in practice a sustained upload rate above 10 requests per second triggers 429 responses; 5–6 requests per second is a safe sustained rate for bulk operations.
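
The two operational constraints above (a 30-minute token and a safe rate of about five requests per second) can be handled with two small helpers. This is a sketch that deliberately makes no HTTP calls: `fetch_token` is a hypothetical callable that you would implement against your own tenant's OAuth endpoint, and the class names are illustrative, not part of any Panopto SDK.

```python
import time


class TokenCache:
    """Caches an access token and refreshes it shortly before expiry."""

    def __init__(self, fetch_token, lifetime_s=30 * 60, margin_s=120,
                 clock=time.monotonic):
        self._fetch = fetch_token    # hypothetical: performs the OAuth request
        self._lifetime = lifetime_s  # 30 minutes, per the text above
        self._margin = margin_s      # refresh this many seconds early
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch()
            self._expires_at = now + self._lifetime
        return self._token


class Throttle:
    """Spaces calls so they never exceed max_per_second."""

    def __init__(self, max_per_second=5.0, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_at = None

    def wait(self):
        now = self._clock()
        if self._next_at is not None and now < self._next_at:
            self._sleep(self._next_at - now)
            now = self._next_at
        self._next_at = now + self._interval
```

In a bulk script, each upload would call `throttle.wait()` and then `tokens.get()` before issuing its request. The clock and sleep functions are injectable so the timing logic can be tested without waiting in real time.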

A practical bulk-upload pipeline for a university back-catalogue:

Export the full session list by folder (department) from the Panopto admin console — this gives session IDs, durations, creation dates, and ASR-caption status.
Cross-reference against the compliance inventory: which sessions already have a verified-accurate caption file? Which are auto-ASR-only? Which have no caption track?
For the priority pool (auto-ASR-only, high viewer count, mandatory curriculum courses), generate the glossary-biased SRT/VTT in batch via GlossCap or your vendor of choice.
Upload via the API, writing session ID → caption track ID → upload timestamp to the compliance log. This log entry is the evidence that the session was remediated, when it was remediated, and what methodology was used.
Set the new track as default via the PUT endpoint. Verify the display behaviour in the Panopto player by spot-checking a sample of ten sessions.
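
The cross-referencing and logging steps of this pipeline can be sketched without touching the API at all. Everything here is illustrative: the session dictionary keys, the 50-view threshold, and the reading of "priority pool" as ASR-only sessions that are either mandatory or heavily viewed are assumptions, not Panopto export fields or a rule from any compliance framework.

```python
import csv
import io
from datetime import datetime, timezone


def priority_pool(sessions, min_views=50):
    """Select ASR-only sessions to remediate first.

    Each session is a dict with hypothetical keys: id, caption_status
    ('verified', 'asr_only' or 'none'), views, mandatory.
    """
    pool = [s for s in sessions
            if s["caption_status"] == "asr_only"
            and (s["mandatory"] or s["views"] >= min_views)]
    # Mandatory-curriculum sessions first, then by viewer count, highest first.
    return sorted(pool, key=lambda s: (not s["mandatory"], -s["views"]))


def log_row(session_id, track_id, methodology, when=None):
    """One compliance-log entry: what was remediated, when, and how."""
    when = when or datetime.now(timezone.utc)
    return [session_id, track_id, when.isoformat(timespec="seconds"), methodology]


def write_log(rows, stream):
    writer = csv.writer(stream)
    writer.writerow(["session_id", "caption_track_id", "uploaded_at", "methodology"])
    writer.writerows(rows)


# Usage with placeholder identifiers:
buffer = io.StringIO()
write_log([log_row("session-123", "track-456", "glossary-biased ASR")], buffer)
print(buffer.getvalue())
```

Writing the log entry at upload time, rather than reconstructing it later, is what makes the file usable as evidence of when each session was remediated.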

Panopto and FERPA

Panopto's FERPA position is that the platform itself acts as a school official under 34 CFR § 99.31(a)(1) when deployed on the university contract. The caption vendor processing Panopto sessions needs a separate FERPA designation. If you are routing Panopto sessions through GlossCap or any third-party captioning service, the vendor contract must include school-official language, a legitimate-educational-interest provision, a re-disclosure prohibition, a data-minimisation clause (audio not retained beyond the processing window), and a data-destruction certification. GlossCap's standard university contract includes all five. If you are using a vendor whose contract template does not include these provisions, the institutional legal office will flag it before the first session is processed — and if it is flagged after the first session has already been processed, the FERPA exposure is real.

Echo360 caption workflow for university deployments

Echo360 (now part of Echo360 by Epson in the lecture-capture hardware division, but operating as a standalone video platform at most tenants) is the second-largest lecture-capture platform in U.S. higher education and the dominant platform in Australian higher education. Its caption workflow differs from Panopto in two relevant ways: the auto-caption default is more aggressive (Echo360 enables ASR captions on all recordings by default in most configurations), and the platform's integration with Canvas is tighter than Panopto's (Echo360 was built as a Canvas-first platform, while Panopto started as a standalone system and added Canvas integration later).

Echo360 caption surfaces

Echo360 has three caption surfaces: the built-in EchoCapture auto-transcript, the in-platform caption editor, and the SRT/VTT file upload. The EchoCapture auto-transcript is enabled by default in most tenant configurations and generates a word-level transcript with timestamps. Like Panopto's ASR, it performs reasonably on general lecture audio (90–93% word accuracy) and fails systematically on proper nouns. The failure modes are the same in character: faculty names from non-English language families, discipline-specific terminology, institution-specific acronyms, course codes. The accuracy floor is not WCAG 2.1 AA-compliant on technical lecture content.

The Echo360 in-platform caption editor is more capable than Panopto's equivalent: it allows word-by-word timing correction, bulk find-and-replace for systematically misrecognised terms, and a side-by-side view of the audio waveform and the transcript text. For small-scale correction of a high-value lecture (a keynote, a thesis defence, a distinguished-lecture recording), the in-platform editor is usable for human review. For bulk back-catalogue remediation, it is not tractable — at the correction rates that qualify as careful human review (8–12× real-time for dense technical content), processing 1,000 hours of lecture video through the in-platform editor would require 8,000–12,000 person-hours.

Importing SRT/VTT to Echo360

Echo360 accepts SRT and WebVTT files via the media editor. The import path is: open the media item → Edit → Captions → Upload Captions → select SRT or VTT. Format requirements for Echo360: UTF-8 without BOM; numeric cue IDs in SRT (Echo360's parser throws an error on SRT files without them); SRT timing lines in the standard form, with a comma before the milliseconds and a space on each side of the arrow (00:00:10,000 --> 00:00:13,500); and no cue overlap. Echo360 does not keep the original EchoCapture transcript active when you upload a replacement: the uploaded file becomes the primary caption track, and the auto-transcript is not accessible to the viewer unless the administrator re-enables it.
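
Since missing cue IDs are a hard failure on import, it is cheap to renumber every SRT file before upload. A minimal sketch, assuming each cue block contains a timing line; the function name is hypothetical.

```python
import re


def renumber_srt(text):
    """Return SRT text with sequential numeric cue IDs on every cue."""
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip())
    numbered = []
    for number, block in enumerate(blocks, start=1):
        lines = block.split("\n")
        timing_idx = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing_idx is None:
            raise ValueError(f"cue block {number} has no timing line")
        # Discard whatever preceded the timing line (an old ID, or nothing)
        # and put the fresh sequential ID in its place.
        numbered.append("\n".join([str(number)] + lines[timing_idx:]))
    return "\n\n".join(numbered) + "\n"


source = (
    "00:00:10,000 --> 00:00:13,500\nWelcome\n\n"
    "00:00:13,500 --> 00:00:15,000\nto CS 224N\n"
)
print(renumber_srt(source))
```

The function is idempotent, so it is safe to run on files that already carry IDs.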

The critical difference from Panopto: Echo360 does not expose a per-session caption upload API endpoint in the same way Panopto does. The Echo360 REST API (version 1.0, documented in the Echo360 developer portal) provides session metadata, playback data, and basic content management, but caption upload is only available through the undocumented internal endpoint that the web UI calls. The API-based bulk upload approach that works cleanly for Panopto therefore requires either (a) reverse-engineering the web UI's request structure and maintaining that integration as Echo360 updates the frontend, or (b) using Echo360's integration with a supported third-party captioning vendor as the bulk-upload pathway (Verbit and 3Play both have direct Echo360 integrations through the Echo360 marketplace). For institutions that are using GlossCap for the glossary-biasing layer and a supported vendor for the Echo360 API integration, the workflow is: generate the SRT/VTT from GlossCap → hand off to the integration vendor → use the vendor's Echo360 API integration for upload → log the caption track ID returned by the vendor integration to the compliance log.

Echo360 and Canvas integration

Echo360's Canvas LTI integration means that lecture recordings often appear in Canvas course modules as embedded Echo360 players rather than as standalone Echo360 URLs. This creates a caption-management complexity: the caption file lives in Echo360, but the viewing context is Canvas. If a student accesses a captioned lecture through the Canvas LTI embed and the captions are not rendering correctly in the embedded player, the issue may be a Canvas LTI configuration problem rather than an Echo360 caption problem — the LTI player inherits the caption settings from Echo360, but the embedded context can suppress the caption toggle UI in some Canvas theme configurations. The accessibility-coordinator check for Canvas/Echo360 LTI embeds should include a cross-browser player test in the Canvas course context, not just verification that the caption file uploaded successfully to Echo360.

Echo360 also surfaces recordings in Canvas Studio (if the institution has both) through a content library integration. When a recording exists in both Echo360 and Canvas Studio, there may be two caption files — the one on the Echo360 asset and the one on the Canvas Studio asset — that need to be managed separately. Institutions that have migrated from Echo360 to Canvas Studio for some content categories and retained Echo360 for others should have a clear inventory of which recordings live in which system, because the compliance obligation applies to the version the student actually accesses, not the version the coordinator captioned.

Canvas Studio caption workflow

Canvas Studio (formerly Arc) is Instructure's integrated video tool for Canvas LMS. It is not a lecture-capture system in the same sense as Panopto or Echo360 — it does not have classroom hardware integration — but it functions as the video layer for instructor-created course content, student video assignments, and media embedded in Canvas courses. At universities where Canvas is the LMS, Canvas Studio is often present alongside the primary lecture-capture platform, handling a different content category: short instructor-recorded video content, video quizzes, student video submissions, and media converted from YouTube or Vimeo for course use.

Canvas Studio caption surfaces

Canvas Studio has two caption surfaces: the built-in auto-caption feature (which triggers automatically on upload and generates a VTT file using a third-party ASR service) and the manual caption editor. The auto-caption accuracy on Canvas Studio tends to be slightly better than Panopto's ASR on general English lecture content because the backend ASR service (currently Amazon Transcribe in most configurations) uses a more recent model, but the failure mode on domain-specific proper nouns is qualitatively identical. "Prof. Kourakos" renders as "Prof. Cora cos". "NF-κB" (a transcription factor common in molecular biology courses) renders as "N F Kappa B" approximately 40% of the time and as "N F Kappa beta" or "N F Kappa be" the rest. "Pfizer-BioNTech" renders correctly because it is in the training corpus from COVID-19 media coverage; "Moderna" renders correctly; "Novavax" renders as "Nova vax" or "nova backs" at approximately equal frequency.

The Canvas Studio manual caption editor is word-level with a search/replace function. For a short instructor-recorded video (five to fifteen minutes), the manual editor is a reasonable human-review surface. For longer content or bulk back-catalogue work, it is not tractable for the same reason as Echo360's in-platform editor.

Importing SRT/VTT to Canvas Studio

Canvas Studio accepts SRT and VTT file upload via the media detail view: open the media item → Captions → Upload → select the file. Format requirements: UTF-8 without BOM, VTT requires a "WEBVTT" header on line 1, cue overlap is rejected silently (the cues before the overlap render; cues at or after the overlap disappear from the player without an error). Canvas Studio replaces the existing auto-caption track with the uploaded file rather than adding a second track — there is no multi-track support analogous to Panopto's. The uploaded file is the caption of record from that point forward.
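
If the caption pipeline produces SRT, converting to VTT for Canvas Studio is mostly a matter of adding the header and changing the millisecond separator. A minimal sketch, assuming well-formed SRT input and no styling or positioning cues; the function name is hypothetical.

```python
import re

_SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text):
    """Convert SRT text to WebVTT.

    Adds the required WEBVTT header on line 1 and switches the millisecond
    separator from ',' to '.' on timing lines only, so commas in the caption
    text are left alone. Numeric SRT cue IDs are kept as cue identifiers.
    """
    converted = []
    for line in srt_text.replace("\r\n", "\n").strip().split("\n"):
        if "-->" in line:
            line = _SRT_TIMESTAMP.sub(r"\1.\2", line)
        converted.append(line)
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"


print(srt_to_vtt("1\n00:00:01,000 --> 00:00:02,500\nHello, class\n"))
```

Because overlapping cues fail silently on this platform, it is worth running an overlap check on the source file before converting and uploading it.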

Canvas Studio has an API for media management (the Canvas Data API), but caption upload is not currently exposed in the public API documentation. The Canvas REST API allows retrieving media comment captions from Canvas courses but does not support updating them programmatically. For bulk operations, the practical approach at the time of writing is: generate the VTT files → upload each via the web interface → document the media item ID and upload timestamp in the compliance log. For institutions with large Canvas Studio libraries, this is a material operational cost, and several accessibility-services teams have built browser-automation scripts to handle the upload loop — which works, but introduces a maintenance burden as Canvas Studio UI updates.

Canvas Studio video quizzes and student submissions

Two content categories in Canvas Studio require special treatment. Video quizzes — Canvas Studio's embedded-question format where the video pauses and presents a question at a specified timestamp — need caption files that include the quiz question text alongside the lecture audio captions, because the question text is not automatically included in the auto-caption track. Institutions that produce ADA Title II-compliant video quizzes need either (a) an SRT/VTT file that covers both the lecture audio and the quiz question text, which requires a custom export from the Canvas Studio quiz builder, or (b) separate accessibility documentation for the quiz components that argues equivalent access through the quiz interface itself. Neither approach has been tested in OCR adjudication at the time of writing, but the consensus among higher-ed accessibility legal counsel is that the safer position is option (a).

Student video submissions in Canvas Studio raise a distinct FERPA question. A student's video submission is an education record and cannot be sent to a third-party captioning vendor without either the student's consent or a school-official designation. For disability-accommodation scenarios where a deaf or hard-of-hearing student needs captions on a peer's video submission, the university must caption it internally using faculty-generated text or a designated school-official vendor under the FERPA contract. The institutional captioning programme's commercial vendor contract does not automatically cover student-submission captioning — this is a separate workflow that should be defined in the disability-services office's accommodation procedures.

Blackboard Ultra Video caption workflow

Blackboard Ultra (the modern redesign of Blackboard Learn) integrates video through two pathways: the built-in video recording tool (Blackboard Collaborate recordings can be embedded directly) and external video embeds via the Rich Content Editor. The caption workflow depends on the pathway by which the content entered the LMS.

Blackboard Collaborate recordings

Blackboard Collaborate is the synchronous conferencing tool integrated with Blackboard Ultra. Collaborate recordings are stored in Blackboard's cloud infrastructure and embedded in course materials via the Recordings panel. Collaborate generates auto-captions on recordings using an ASR service — accuracy is comparable to Canvas Studio at roughly 90–93% on general English, failing systematically on the same proper-noun categories. Caption upload to Collaborate recordings is not supported natively in the Collaborate web interface as of the current version; the workflow for institutions that need compliant captions on Collaborate recordings is to download the MP4 recording, generate the caption file externally, and re-embed the video with the caption file attached via the Rich Content Editor rather than using the native Collaborate recording embed. This is not a clean workflow — it duplicates the recording storage, loses the Collaborate playback analytics, and requires a manual step every time a Collaborate recording needs to be captioned. It is, however, the current state of the platform.

Blackboard Ultra's video content block (the non-Collaborate path) allows SRT file upload directly alongside a video file. The format requirements for Blackboard Ultra video: UTF-8, SRT format (Blackboard Ultra does not accept VTT in the video content block as of the current version in most deployments — confirm with the institutional Blackboard instance administrator), cue IDs required, timing in HH:MM:SS,mmm format. The caption file is stored alongside the video content block and renders in the built-in Blackboard Ultra video player.

Blackboard Ultra and the migration from Blackboard Original

Many universities are mid-migration from Blackboard Original (the legacy interface) to Blackboard Ultra. The caption workflows of the two interfaces are not compatible: caption files attached to video content in Blackboard Original courses do not carry forward automatically to Blackboard Ultra when the course is converted. The LMS migration caption checklist covers the general migration-caption problem; the Blackboard-specific version is: audit Original courses for caption files before conversion, export those files, and re-upload them to the Ultra-converted course versions. If the Original courses have been converted using Blackboard's auto-migration tool without a pre-migration caption audit, the captioned content may have become uncaptioned content in the Ultra version, and there may be no record of which Original courses had captions, because Blackboard's migration logs do not include caption-file transfer status.

Institutions that are planning or mid-migration should run a caption-inventory audit of Blackboard Original content before completing the Ultra migration, or accept that the migration will require a full caption re-audit of the converted courses. Either approach is workable; the failure mode to avoid is completing the migration without addressing captions and discovering the caption-transfer gap during an OCR audit two years later.
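
Once a pre-migration inventory exists, finding the caption-transfer gap is a simple comparison. The sketch below assumes both inventories have been gathered by hand or by script into dictionaries; the (course ID, video name) key is a hypothetical identifier, and Blackboard does not provide these structures itself.

```python
def caption_gaps(original_inventory, ultra_inventory):
    """Return items that had captions in Original but lack them in Ultra.

    Both inventories map (course_id, video_name) to True or False, meaning
    'has a caption file'. Use whatever stable identifier survives the
    conversion at your institution.
    """
    return sorted(
        key for key, had_caption in original_inventory.items()
        if had_caption and not ultra_inventory.get(key, False)
    )


original = {
    ("BIO101", "week1.mp4"): True,
    ("BIO101", "week2.mp4"): True,
    ("HIST3310", "intro.mp4"): False,
}
ultra = {
    ("BIO101", "week1.mp4"): True,
    ("BIO101", "week2.mp4"): False,
}
print(caption_gaps(original, ultra))
```

Items missing from the Ultra inventory entirely are treated the same as items present without captions, since either way the student no longer has a captioned version.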

Kaltura in university deployments

Kaltura operates as both a video platform and a captioning vendor at many universities through its REACH service (Recapped Enhanced Audio Captioning). The Kaltura REACH workflow integrates captioning directly into the Kaltura MediaSpace platform: faculty submit content to REACH through a service-level request within MediaSpace, Kaltura routes it to machine or human review depending on the order type, and the completed caption file is deposited back into the Kaltura media asset automatically. For institutions on Kaltura with a REACH subscription, this is the lowest-friction captioning workflow available — the caption file lands in the right place with no manual upload step.

The limitation of REACH is vocabulary adaptation. Kaltura REACH does not support customer-provided glossaries in the machine-captioning tier in most contract configurations — vocabulary biasing is available at the human-reviewed tier at the cost premium that implies ($1.75–$2.50/min versus $0.15–$0.25/min for machine captioning). For institutions that need glossary-biased machine captioning at scale — the scenario where 40% of the catalogue has high-density proper nouns and the per-minute cost of human review is prohibitive — the hybrid workflow is: generate the glossary-biased caption file via GlossCap, upload it to the Kaltura media asset via the Kaltura caption asset API (below), and set it as the default track. REACH then functions as the human-review workflow for content that fails the DCMP spot-check after initial machine captioning, not as the primary captioning workflow for all content.

Kaltura's caption asset API for programmatic upload:

captionAsset.add — creates a caption asset attached to an entry, returning a captionAssetId. Required fields: entryId, language (ISO 639-1 two-letter), format (1 = SRT, 2 = DFXP, 3 = WebVTT, 4 = CAP). Optional: label (display name shown in player), isDefault (boolean — set to true to make this the default track).
uploadToken.add + uploadToken.upload — create an upload token and upload the caption file content to it. Returns an uploadTokenId.
captionAsset.setContent — associate the upload token with the caption asset. After this call, the caption file is available in the player.
captionAsset.setAsDefault — explicitly set the caption asset as the default display track for an entry. Equivalent to setting isDefault=true at creation but callable separately after the fact.

Authentication uses Kaltura's session-based API with an app token or a Kaltura session (KS) generated from the admin partner ID and admin secret. The KS has a configurable expiry (default 24 hours); bulk upload scripts should generate a new KS at the start of each run rather than reusing a potentially-expired one. Kaltura's API rate limit is more generous than Panopto's — a sustained rate of 20–30 requests per second is typically supported without 429 responses on a standard university contract.
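The ordering of the calls listed above matters more than any single call, so it is worth seeing as one sequence. The sketch below is not Kaltura client code: `call` is a hypothetical transport function that the institution would implement with its own Kaltura client and a freshly generated KS, and the parameter dictionaries are simplified placeholders rather than the exact request schema. Only the action names and the format codes are taken from the list above.

```python
from typing import Any, Callable

# Hypothetical transport: call("service.action", params) -> response dict with an "id".
Transport = Callable[[str, dict[str, Any]], dict[str, Any]]

# Caption format codes as listed above.
SRT, DFXP, WEBVTT, CAP = 1, 2, 3, 4


def upload_caption(call: Transport, entry_id: str, language: str,
                   caption_bytes: bytes, label: str = "", fmt: int = SRT) -> str:
    """Run the add -> token -> upload -> setContent -> setAsDefault sequence.

    Returns the ID of the new caption asset. Parameter names are illustrative.
    """
    asset = call("captionAsset.add", {
        "entryId": entry_id,
        "captionAsset": {"language": language, "format": fmt, "label": label},
    })
    token = call("uploadToken.add", {})
    call("uploadToken.upload", {"uploadTokenId": token["id"], "fileData": caption_bytes})
    call("captionAsset.setContent", {"id": asset["id"], "uploadTokenId": token["id"]})
    call("captionAsset.setAsDefault", {"captionAssetId": asset["id"]})
    return asset["id"]


if __name__ == "__main__":
    def dry_run(action: str, params: dict[str, Any]) -> dict[str, Any]:
        # Stand-in transport that only prints the sequence.
        print(action, sorted(params))
        return {"id": "token-1" if action.startswith("uploadToken") else "caption-1"}

    upload_caption(dry_run, "0_example", "en", b"1\n00:00:01,000 --> 00:00:02,000\nHi\n")
```

Injecting the transport keeps the sequence testable without network access, and it gives a bulk script one place to generate a new KS at the start of each run and to pace requests.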

The institutional glossary architecture for university lecture capture

The proper-noun failure modes in university lecture capture are not random — they concentrate in a predictable taxonomy that maps to an institutional glossary architecture. Building the glossary before processing the back-catalogue rather than after is the difference between a retrofit that runs at $3–4/hour and one that runs at $1.75–$2.50/minute (the human-reviewed tier). The three-tier glossary structure that works for university deployments:

Tier 1: Institution-wide glossary

Institution-wide vocabulary is the foundation. These are terms that appear in lecture audio across all schools, colleges, and departments: faculty names (the full faculty directory, with pronunciation guides where available), building names, institutional acronyms (the registrar, controller, dean of students, and IT abbreviations that appear in administrative communications and therefore in the academic vocabulary students use), named programs and centres (the Institute for Advanced Study, the Neuroscience Training Program, the Center for AI Safety), the institution's accreditation bodies (HLC, SACSCOC, WSCUC, NECHE), the institution's research administration vocabulary (IRB, IACUC, ORI, sponsored programs, sub-awards), and the regulatory bodies that appear in compliance training content (OCR, DOJ, NSF, NIH, USDA, EPA, OSHA).

The institution-wide glossary is the easiest tier to build and the highest-leverage tier to have: a single list of, say, 800–1,200 terms covers the proper-noun failure modes that appear across every school and department. Most universities already maintain a faculty directory (for the website), a building list (for campus maps), and an institutional acronym reference (for the new-employee orientation). Connecting or converting these sources into a GlossCap glossary takes one to two days and produces a term list that immediately improves accuracy on any lecture audio that references institutional vocabulary — which is almost all lecture audio, because faculty routinely reference colleagues, buildings, grant numbers, and institutional programs in the first and last five minutes of any lecture.

Tier 2: School and department glossary

Department-level vocabulary is discipline-specific: the technical terminology of the field, the named theorems and laws that students are expected to know, the founding figures of the discipline (whose names appear in attribution and citation), the methodological vocabulary that is dense in certain disciplines (e.g., "heteroskedasticity," "immunoprecipitation," "decolonisation," "phenomenology"), the journal and conference names that faculty cite in context ("as we published in Nature Methods," "this came up at NeurIPS last year"), and the regulatory or professional standards vocabulary that is specific to the field (HIPAA for health sciences, OSHA for engineering safety courses, GAAP and IFRS for accounting, IEEE 802.11 standards for networking courses).

The department-level glossary is typically 300–600 terms per department. The sources are: the department's graduate reading lists (which name the canonical texts and their authors), the syllabi for the highest-enrollment courses (which name the methods, readings, and assignments students are responsible for), the department's research group pages (which list the specific technical terms the faculty use in their published work), and any department-maintained style guide or vocabulary reference. Instructional designers who work with specific departments often already have this vocabulary documented informally in course-production notes — a conversation with the instructional designer for the School of Engineering yields a faster and more accurate starting vocabulary list than trying to extract it from syllabi alone.

Tier 3: Course and instructor glossary

Course-level vocabulary is the most granular tier and the most dynamic. It changes every semester as the curriculum is updated, as different instructors take over sections, and as guest lecturers add their own naming conventions, research vocabulary, and proper-noun register. The course-level glossary typically has 50–150 terms and is sourced from: the course syllabus (readings, assignments, named topics), the course textbook index (a dense term list that the publisher has already curated for the disciplinary vocabulary), any course-specific material the instructor has prepared (lab protocols, formula sheets, term-definition handouts), and the instructor's CV or recent publications (for the instructor's own research vocabulary, which will appear in lectures that discuss their current work).

The course-level glossary is the least tractable to maintain at scale but also the tier where the accuracy gain per term is highest. A course-specific term like "the Blackman-Harris window function" (a signal-processing term that appears in three lectures of a digital signal processing course) will not appear in either the institution-wide or department glossary, but it will be mangled in every lecture where it appears — "Blackman Harris" renders correctly approximately 30% of the time and as "Blackmon Harris," "black man Harris," or "black man and Harris" the rest. Adding it to the course-level glossary costs thirty seconds; the alternative is a human reviewer spending three minutes on every occurrence.
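One way to picture the three tiers is as dictionaries merged from broadest to narrowest before a lecture is processed. The sketch below assumes each tier maps a canonical term to a recognition hint, and that a narrower tier overrides a broader one when both define the same term. That precedence rule, the function name `merge_tiers`, and the example hints are illustrative assumptions, not a description of any vendor's glossary format.

```python
def merge_tiers(*tiers: dict[str, str]) -> dict[str, str]:
    """Merge glossaries in the order given; later (narrower) tiers win.

    Terms are matched case-insensitively, and the spelling from the
    winning tier is the one that is kept.
    """
    merged: dict[str, tuple[str, str]] = {}
    for tier in tiers:
        for term, hint in tier.items():
            merged[term.casefold()] = (term, hint)
    return {term: hint for term, hint in merged.values()}


if __name__ == "__main__":
    institution = {"SACSCOC": "SACS-coc", "IRB": "I R B"}
    department = {"IEEE 802.11": "I triple E eight oh two dot eleven"}
    course = {"Blackman-Harris": "Blackman Harris"}
    for term, hint in merge_tiers(institution, department, course).items():
        print(f"{term} -> {hint}")
```

Passing the tiers in the order institution, department, course means a thirty-second course-level addition takes effect without touching the shared lists.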

Glossary maintenance cadence

The institution-wide glossary updates at a low frequency: new faculty hire cycle (twice yearly for research universities, once for smaller institutions), building and program naming changes (annually), and administrative vocabulary changes (as they occur). The department-level glossary updates at medium frequency: annually for stable disciplines, before each semester for departments with rapidly evolving research areas. The course-level glossary is the most dynamic: it should be updated at the start of each semester for every course in the captioning programme, and updated mid-semester if the course switches instructors (e.g., a faculty illness substitution) or adds a guest lecturer with substantially different vocabulary.

The semester cadence also applies to quality feedback: the accessibility coordinator should review the accuracy logs at the end of each semester and identify systematic failures that should be added to the glossary. A term that failed in five lectures across three courses in a semester belongs in the department-level glossary; a term that failed in twenty lectures across seven courses belongs in the institution-wide glossary. The feedback loop that compounds accuracy over time is built from this semester-level review cycle, and it is the mechanism by which the per-hour cost of caption production falls over successive semesters as the glossary accumulates.
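The promotion rule in the review cycle can be written down so that it is applied the same way every semester. The sketch below treats the two examples above (five lectures across three courses, twenty lectures across seven courses) as thresholds. Reading them as hard cut-offs is an assumption, and the function name `promotion_tier` is hypothetical.

```python
def promotion_tier(lectures_failed: int, courses_affected: int) -> str:
    """Suggest the glossary tier for a term from one semester's failure counts."""
    if lectures_failed >= 20 and courses_affected >= 7:
        return "institution-wide"
    if lectures_failed >= 5 and courses_affected >= 3:
        return "department"
    return "course"


if __name__ == "__main__":
    # (term, lectures in which it failed, distinct courses affected)
    failures = [
        ("Blackman-Harris", 3, 1),
        ("heteroskedasticity", 6, 3),
        ("SACSCOC", 22, 8),
    ]
    for term, lectures, courses in failures:
        print(term, "->", promotion_tier(lectures, courses))
```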

The academic-calendar compliance problem in detail

The academic-calendar compliance problem has three dimensions: back-catalogue depth, new-content rate, and course-closure timing. Each dimension requires a different operational response.

Back-catalogue depth

A university that began deploying Panopto or Echo360 in 2015 and did not have a captioning workflow in place until 2026 has an uncaptioned back-catalogue representing ten academic years of lecture recordings. At a typical large research university, that is roughly 30,000–60,000 recorded sessions. At sixty to ninety minutes per session, the total uncaptioned back-catalogue is in the range of 30,000–90,000 hours of lecture audio. At a machine-captioning throughput of 100 hours of processed audio per hour of wall-clock time (the approximate speed of batch Whisper-large processing), a team that has access to the full institutional glossary can process the entire back-catalogue in 300–900 wall-clock hours, or roughly two to five and a half weeks of continuous batch processing. The bottleneck is not processing speed; it is prioritisation and reviewer capacity.
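The sizing arithmetic is simple enough to rerun with an institution's own numbers. The sketch below uses the session counts, session lengths, and the 100x throughput figure from the paragraph above; the function name `backlog` is illustrative, and the week conversion assumes uninterrupted 24-hour processing.

```python
HOURS_PER_WEEK = 24 * 7  # continuous batch processing, no downtime


def backlog(sessions: int, minutes_per_session: float, speedup: float = 100.0):
    """Return (audio hours, wall-clock hours, wall-clock weeks) for a backlog."""
    audio_hours = sessions * minutes_per_session / 60
    wall_hours = audio_hours / speedup
    return audio_hours, wall_hours, wall_hours / HOURS_PER_WEEK


if __name__ == "__main__":
    for label, sessions, minutes in [("low", 30_000, 60), ("high", 60_000, 90)]:
        audio, wall, weeks = backlog(sessions, minutes)
        print(f"{label}: {audio:,.0f} audio h, {wall:,.0f} wall-clock h, {weeks:.1f} weeks")
```

Changing `speedup` shows how sensitive the schedule is to real pipeline throughput, which is usually lower than the raw model speed once upload and queueing are included.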

Prioritisation for back-catalogue remediation should follow a risk-weighted approach rather than an alphabetical or chronological approach. The highest-priority tier is: current-semester courses (students currently enrolled need captions now), courses in mandatory-curriculum categories with high enrollment, courses with documented accommodation requests on file with the disability-services office, and courses in departments where OCR complaint patterns suggest elevated scrutiny (courses related to health sciences, law, and engineering have historically appeared in OCR complaint samples more often than average because the technical vocabulary failure mode is more visible in those disciplines). The lowest-priority tier is: archived courses with no current enrollment, elective courses with historically low enrollment, and non-credit or continuing-education courses that are not in the primary ADA Title II scope.
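The risk-weighted ordering described above can be expressed as a sort key over the recording inventory. The sketch below is one possible reading: the field names, the enrollment cut-offs, and the choice to put larger courses first within a tier are all hypothetical, since the text gives only the categories.

```python
from dataclasses import dataclass

# Hypothetical cut-offs; the text says only "high" and "historically low" enrollment.
HIGH_ENROLLMENT = 150
LOW_ENROLLMENT = 20


@dataclass
class Course:
    code: str
    enrollment: int
    current_semester: bool = False
    mandatory: bool = False
    elective: bool = False
    accommodation_on_file: bool = False
    elevated_scrutiny_dept: bool = False  # e.g. health sciences, law, engineering
    archived: bool = False                # archived with no current enrollment
    non_credit: bool = False


def priority_tier(c: Course) -> int:
    """0 = remediate first, 1 = middle of the queue, 2 = remediate last."""
    if (c.current_semester or c.accommodation_on_file or c.elevated_scrutiny_dept
            or (c.mandatory and c.enrollment >= HIGH_ENROLLMENT)):
        return 0
    if c.archived or c.non_credit or (c.elective and c.enrollment <= LOW_ENROLLMENT):
        return 2
    return 1


def remediation_queue(courses: list[Course]) -> list[Course]:
    # Within a tier, larger courses go first (an assumption, not stated in the text).
    return sorted(courses, key=lambda c: (priority_tier(c), -c.enrollment))


if __name__ == "__main__":
    inventory = [
        Course("ART 310", 12, elective=True),
        Course("BIO 211", 240, mandatory=True),
        Course("HIST 205", 60),
    ]
    print([c.code for c in remediation_queue(inventory)])
```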

The semester-cohort approach to back-catalogue remediation is more tractable than a flat priority list. Rather than attempting to remediate the entire back-catalogue at once, a semester-cohort approach remediates one or two academic years' worth of recordings at a time, starting from the current year and working backwards. The institution makes a compliance commitment: all recordings from 2024–2025 are remediated by end of Q1 2027; all recordings from 2023–2024 by end of Q2 2027; all recordings from 2022–2023 by end of Q3 2027. The OCR has accepted phased back-catalogue remediation plans in a number of resolution agreements, provided the plan is documented, the implementation is tracked, and the institution demonstrates good-faith progress at each evaluation point.

New-content rate and the pre-publication gate

The new-content problem at universities is more severe than at corporate L&D teams because the new-content rate is outside the institution's control. A faculty member who records a lecture at 08:00 and makes it available to students by 10:00 has created a compliance obligation that the captioning workflow has to satisfy before the next class session — typically 48 to 72 hours, not 30 days. The corporate L&D model of "publish to the LMS after captions are complete" does not map onto the lecture-capture model of "make the recording available to enrolled students the same day or the next morning."

The university captioning workflow therefore has to accommodate three publication scenarios: (1) same-day publication with a same-day caption (with the full three-tier glossary in place, machine captioning of a one-hour lecture takes approximately 5–10 minutes, so the caption file can be uploaded before the recording is made available in the LMS); (2) next-day publication with a same-day caption submission (the standard workflow for non-urgent lecture recordings: the recording is submitted to the captioning service on the afternoon of the lecture, and the caption is already attached when the recording is made available in the LMS the following morning); and (3) accommodation-priority publication (a student has a disability accommodation requiring captions; the disability-services office notifies the captioning coordinator; the recording is expedited through the captioning workflow and captions are available within 24 hours of the recording being made). All three scenarios require a submission log entry so the new-content submission rate can be measured accurately at the end of each semester.

The practical implementation of the pre-publication gate at the LMS level varies by platform. At Panopto, the administrator can configure Panopto folders to require an approved caption track before content is visible to students — this is a tenant-level configuration option that effectively creates a pre-publication gate without requiring a manual check on each session. At Echo360, the equivalent configuration is available through the Echo360 Section Management settings (content visibility can be set to "instructor controlled" with an accessibility review step). At Canvas Studio, there is no native pre-publication caption gate — the gate has to be enforced at the Canvas course level (publishing the course module only after confirming that caption files are attached to all video content items). At Blackboard Ultra, the equivalent is managing the content item visibility settings to keep video content hidden until captions are confirmed present.

The guest lecture problem

Guest lecturers are the most difficult edge case in the university lecture-capture captioning workflow. A guest lecturer who appears in one or two sessions of a course has no vocabulary profile in the institutional glossary. Their name may not be in the faculty directory. Their research vocabulary may not be in any of the three tiers of the institutional glossary if their discipline is sufficiently distant from the host department. Their audio quality may be lower than the department norm if they are appearing via Zoom rather than in person. And the course coordinator may not notify the captioning service of the guest lecture in advance, meaning the first indication that a session has unusual vocabulary is when the ASR output contains systematic failures on a new proper-noun register.

The guest lecture workflow requires a specific protocol: the course coordinator notifies the captioning service at least 48 hours before the guest lecture (to allow vocabulary sourcing from the guest's publicly available materials — faculty page, recent publications, conference talks); the captioning service builds a temporary course-level vocabulary extension from those sources and adds the guest's name in phonetic override format; the session is processed with the extended glossary; the temporary extension is reviewed and either retained (if the guest will appear again) or archived (if the appearance is one-off). The permanent archive of guest-lecture vocabulary extensions is a secondary benefit of this protocol — it accelerates the captioning of future guest appearances by the same speakers, and it serves as documentation that the captioning process accommodated the unusual vocabulary requirement, which is relevant if the session is sampled in an OCR audit.

The course-closure timing problem

The most counterintuitive aspect of the academic-calendar compliance problem is that the students who needed the captions most are often no longer enrolled by the time the back-catalogue remediation reaches their course cohort. A student who took BIO 211 in fall 2022 and had a disability accommodation requiring captions may have graduated in 2024. The compliance obligation to caption BIO 211 fall 2022 recordings remains — the ADA Title II obligation does not expire with the cohort — but the remediation urgency is lower than for current-semester content.

This timing asymmetry creates a moral-hazard problem in compliance programme design: if the urgency signal comes from accommodation requests and the accommodation request disappears when the student graduates, the programme has no urgency signal for historical content. The Harvard and MIT cases, which were lawsuits brought by the National Association of the Deaf and settled by consent decree rather than OCR resolution agreements, addressed this explicitly: the obligation is prospective (caption all new content before or at the time of publication) and retrospective (remediate the back-catalogue on a documented schedule). The schedule does not require prioritising courses based on whether a documented accommodation request existed. It requires prioritising based on the risk framework described above, with good-faith progress demonstrated at each evaluation point.

Proper-noun failure modes specific to university lecture capture

The full taxonomy of proper-noun failure modes covers fifteen categories across all training-video contexts. University lecture capture concentrates in a specific subset with higher failure rates than most other training-video contexts because of the density and diversity of the vocabulary. The categories that matter most in university lecture audio:

Faculty and researcher names

Faculty names are the most salient failure because they are the terms that students will search, cite, and encounter on exams. The failure rate is highest for names from South Asian, East Asian, Arabic, and sub-Saharan African language families, for reasons rooted in ASR training data composition: the Whisper model family, including Whisper-large, was trained on audio data that overrepresents English, Spanish, French, German, and other European-language content relative to names from other language origins. Names like Bhattacharya, Subramanian, Krishnamurthy, Chatterjee, Raghunathan (South Asian); Xu, Zhao, Ng, Wong, Huang (East Asian); Al-Farabi, Ibn Khaldun, Abdel-Rahman, El-Sisi (Arabic-origin); Okafor, Osei, Mensah, Nkrumah, Acheampong (sub-Saharan African) fail at rates between 40% and 90% depending on the phonemic complexity of the name and the audio quality of the recording.

The glossary fix for faculty names uses phonetic override: rather than spelling the name in its standard form, the glossary entry specifies how the name sounds in the context of the speaker's accent. "Prof. Bhattacharya" entered as "Prof. Battacharya" (dropping the aspiration from the initial consonant) captures more correct outputs than the standard spelling because ASR decoding is probabilistic: the model looks for the most likely word sequence, and "Battacharya" is close enough to the ASR output distribution that the decoder selects it more reliably than "Bhattacharya," which has no representation in the training prior. The phonetic-override approach requires the captioning service to have actually listened to the audio or to have queried a native speaker of the name's language family; it cannot be generated from standard transliteration alone. GlossCap's glossary review process includes native-speaker phonetic verification for names outside the standard English-language training prior.

Course codes and section identifiers

Course codes are particularly difficult for ASR because they combine two categories that ASR handles poorly: department abbreviations (short sequences of capital letters that are pronounced as individual letters or as acronyms, neither of which is well-represented in conversational audio training data) and alphanumeric identifiers (number sequences with letter suffixes that do not follow natural English number-word phonology). "CS 224N" is the Stanford NLP course, pronounced "C-S two-twenty-four-N"; the trailing N is the most common failure point, because a letter suffix spoken directly after a number is often decoded as "and" or dropped entirely. "EECS 6.832" (MIT Underactuated Robotics) is pronounced "double-E-C-S six point eight three two"; the spelled-out department abbreviation is reliably transcribed, but the decimal point in the course number often disappears ("six eight three two" rather than "six point eight three two"), which produces a different course number entirely. "BIO 101A" fails on the section letter "A" approximately 50% of the time, because the "A" is soft and is often merged with the preceding number by the ASR decoder.

Course codes belong in the course-level tier of the institutional glossary, not the institution-wide tier, because the code has no meaning outside the context of the specific institution and course. For the department-level tier, a prefix list is useful: all codes starting with "CS," "EECS," "BIO," "CHEM," "HIST," "ECON" with their pronunciation guides, so that when a course code appears in audio, the prefix is decoded correctly even if the course number itself is not yet in the glossary.

Research terminology and cutting-edge vocabulary

Lecture content at research universities is often drawn from the instructor's recent research, which means the vocabulary includes terms that may have entered the literature within the last year or two and are therefore absent from any ASR training corpus. The failure rate for cutting-edge research vocabulary varies widely by discipline — in molecular biology, a term like "CAR-T cell therapy" was reliably transcribed by 2022 (it had been in the medical news cycle for years), but "CAR-NK therapy" (natural killer cell variant) was still failing at high rates in 2024 because it had not entered the high-frequency media coverage that feeds ASR training data. In AI/ML, terms like "attention mechanism" and "transformer architecture" are now reliable, but "LoRA fine-tuning" (Low-Rank Adaptation), "RLHF" (Reinforcement Learning from Human Feedback), and "constitutional AI" are still failing inconsistently. In economics, "monopsony" is reliable; "heterogeneous agent model" is not. In history, all place names are reliable except the most obscure; personal names from non-European language origins follow the same failure pattern as faculty names.

The operational fix for cutting-edge vocabulary is to source the course-level glossary from the instructor's recent publications — the abstract and methods section of the instructor's most recent three papers will contain the precise terminology they use in lectures, in the exact form in which it should appear in the transcript. A course-level vocabulary build from three recent abstracts takes fifteen to twenty minutes and produces a list of fifty to one hundred high-confidence course-specific terms that would otherwise fail in every lecture.

Institutional acronyms and named programs

Every university has a dense register of institutional acronyms — the abbreviations used in internal communications that faculty use conversationally in lectures because they assume students know them. "ORI" (Office of Research Integrity) renders as "ory" or "oree." "IRB" (Institutional Review Board) renders correctly about 80% of the time and as "IR B" or "I R B" the rest. "SACSCOC" (Southern Association of Colleges and Schools Commission on Colleges) is a phonological disaster — it renders as "sax cok," "sachs cock," "SACS coke," and various other approximations. "HEOA" (Higher Education Opportunity Act) renders correctly because "HEOA" is a pronounceable acronym; "IPEDS" (Integrated Postsecondary Education Data System) renders as "I-PEDS" (correct) about 60% of the time and as "I-pets" the rest. The institution-wide glossary should include every institutional acronym with a pronunciation guide and the full-form expansion, formatted for the ASR glossary as: "SACSCOC" → override hint "SACS-coc" (the nearest phonemic approximation that the ASR model will decode correctly).

The university lecture-capture compliance workflow

A university caption compliance workflow structured around the academic calendar has four phases: pre-semester setup, in-semester production, end-of-semester review, and back-catalogue remediation. Each phase has distinct activities and documentation outputs.

Pre-semester setup (two weeks before semester start)

The pre-semester window is the highest-leverage moment in the academic year for the captioning programme. Activities:

Update the course-level glossary register. For every course in the captioning programme, verify that the course-level glossary is current for this semester's instructor. If the instructor has changed (substitute, sabbatical replacement, visiting professor), update the glossary from the new instructor's recent publications. If the course has been updated with new readings or a new textbook, add vocabulary from the new materials.
Add guest lecturer vocabulary. For any scheduled guest lectures (from the course syllabus), build the temporary vocabulary extensions from the guest's publicly available materials and pre-register them in the course-level glossary with an expiry flag (so they can be reviewed after the guest appearance and either retained or archived).
Confirm platform configuration. Verify that the pre-publication caption gate is correctly configured in Panopto/Echo360/Canvas Studio for this semester's course folders. A platform update over the summer may have reset configuration settings. Spot-check that the default caption track display is set correctly in Panopto for existing courses.
Brief new instructors on the submission workflow. Any faculty member who is new to the institution, new to a course, or new to using the lecture-capture system should receive a two-paragraph email from the captioning coordinator explaining the submission workflow before the first lecture. The failure mode is a faculty member who has been at the institution for ten years but has never used the lecture-capture system, records their first lecture on day one of the semester, and makes it available to students before the captioning coordinator knows it exists.
Open the semester's compliance log. Create a new row in the compliance documentation for this semester with the target submission-rate metric (95%+ for new content), the target accuracy pass rate (95% on DCMP spot-check), the courses in scope, and the relevant glossary versions. Date and version the log entry. This is the header of the documentation that will exist for this semester in the twelve-month compliance record.

In-semester production

The in-semester workflow is primarily about maintaining the submission rate and catching failures early. Activities by week:

Weekly: Review the new recordings that have appeared in Panopto/Echo360/Canvas Studio since the last review. For each new recording: confirm that a caption file has been submitted (if not, add to the urgent remediation queue); confirm that the caption file is attached to the recording in the LMS (if not, upload from the captioning service delivery); spot-check one three-minute segment of two to three recordings per week at random against the DCMP standard. Log any failures immediately. Week one failures in particular often indicate a glossary gap — if the first lecture of a course is producing systematic failures on a specific term, that term should be added to the course-level glossary before the second lecture is processed.
After each guest lecture: Review the guest lecture recording within 24 hours. Guest lecture vocabulary failures need to be caught before the recording is viewed widely, because once students have seen a failure (a mangled faculty name, a critical technical term rendered as nonsense), the reputational damage to the programme is already done. Correct the caption file, re-upload to the LMS, and add any unresolved terms to the glossary.
Monthly: Pull the submission log and calculate the new-content submission rate for the month. If it is below 95%, identify which courses or instructors have not been submitting recordings through the workflow. This is a management conversation, not an IT problem — the most common cause of submission-rate failure mid-semester is a faculty member who decided that the submission workflow was too burdensome and started making recordings available without going through the captioning service.

End-of-semester review

The end-of-semester review is the audit equivalent for the semester's caption production. Activities:

Compile the semester compliance summary. Total recordings in the captioning programme for this semester; total captioned (compliant caption file attached); submission rate (actual vs target); accuracy pass rate on the DCMP spot-check sample (actual vs target); remediation queue items (count and median days to close). This is the dashboard row for the semester — the equivalent of the monthly measurement in the compliance reporting framework.
Review the glossary for systematic failures. Identify terms that failed in three or more recordings this semester. Terms failing in one course belong in the course-level tier; terms failing across several courses in one department belong in the department-level tier; terms failing across departments belong in the institution-wide tier. Add them to the appropriate glossary level and version the glossary.
Archive completed course recordings. For courses that will not be taught again next semester, flag the recordings as archived in the compliance inventory. Archived recordings still need to be captioned — the ADA Title II obligation doesn't expire — but they can be de-prioritised relative to current-semester content. If the course is being taught again next semester, verify that the course-level glossary for this semester is ready for re-use in the next instance.
Update the back-catalogue remediation plan. With the current semester's content fully captioned, the back-catalogue remediation schedule advances by one semester. Update the plan: the next semester cohort in the remediation queue, the expected completion date, and the current machine-captioning throughput rate at the institutional glossary's current accuracy level.
Close the semester compliance log entry. Sign off the semester row in the compliance documentation with the final metrics. This is the record that will be produced if an OCR investigator asks for documentation of the institution's captioning programme for the fall 2026 semester. Twelve months of signed-off semester rows form the compliance trail.
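The semester summary row described in the list above reduces to a handful of ratios. The sketch below assumes the coordinator already has the raw counts from the submission log and the spot-check log. The function name `semester_summary`, the field names, and the single 95% target applied to both metrics are illustrative choices based on the targets named earlier.

```python
from statistics import median


def semester_summary(total, submitted, captioned, checks_run, checks_passed,
                     days_to_close, target=0.95):
    """Build one dashboard row from a semester's raw counts.

    days_to_close holds one number per closed remediation-queue item.
    """
    submission_rate = submitted / total if total else 0.0
    pass_rate = checks_passed / checks_run if checks_run else 0.0
    return {
        "total_recordings": total,
        "captioned": captioned,
        "submission_rate": submission_rate,
        "submission_target_met": submission_rate >= target,
        "accuracy_pass_rate": pass_rate,
        "accuracy_target_met": pass_rate >= target,
        "remediation_items": len(days_to_close),
        "median_days_to_close": median(days_to_close) if days_to_close else None,
    }


if __name__ == "__main__":
    row = semester_summary(total=400, submitted=388, captioned=385,
                           checks_run=40, checks_passed=39,
                           days_to_close=[2, 5, 9])
    for key, value in row.items():
        print(f"{key}: {value}")
```

Keeping the row as plain data makes it easy to append to whatever spreadsheet or document holds the signed-off compliance log.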

Back-catalogue remediation approach

The back-catalogue is typically the largest compliance obligation for a university that has been using lecture capture for more than two or three years. The in-semester workflow described above addresses the ongoing question (what to do with new content each semester) but does not address the historical back-catalogue. For the back-catalogue, the approach that is both operationally tractable and legally defensible is:

Inventory. Export the full recording inventory from Panopto/Echo360, including session IDs, durations, creation dates, folder paths (which typically map to departments and courses), and current caption status (ASR-only, manually uploaded, or no captions). This is a one-time export that establishes the scope of the obligation.
Prioritise. Apply the risk-weighted prioritisation: current-semester content first (handled by the in-semester workflow), then recently-archived content from the most recent two to three academic years, then content associated with documented accommodation requests, then high-enrollment mandatory-curriculum courses, then all remaining content by semester cohort in reverse chronological order.
Build the institutional glossary. The institution-wide and department-level tiers of the glossary should be complete before back-catalogue batch processing begins. Processing 5,000 sessions without the institutional glossary and then re-processing after the glossary is built doubles the processing cost. Building the glossary first and running batch processing once is the standard approach.
Process in cohorts. Process the back-catalogue in semester cohorts through the batch machine-captioning pipeline. For each cohort, generate SRT/VTT files with the institutional glossary applied, upload to the relevant platform via the API, and log the session ID → caption track ID → processing date to the compliance spreadsheet.
Sample-verify each cohort. After processing each semester cohort, run a DCMP spot-check on a random 5% sample of its sessions, and make sure the sample covers at least the ten highest-enrollment courses in the cohort. Log the pass rate. If the pass rate is below the 95% target, identify the failure mode (glossary gap, audio quality issue, speaker accent not covered by phonetic overrides) and remediate before processing the next cohort.
Document the progress. The back-catalogue remediation plan and the per-cohort completion log are the documentation that demonstrates good-faith progress under a phased-remediation model. This documentation should exist as a separate document from the in-semester compliance log, with explicit start date, milestone dates, and completion dates for each cohort.
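
The prioritise and sample-verify steps above lend themselves to a short sketch. The Python below is illustrative only: `Session`, `priority_tier`, `remediation_order`, `spot_check_sample`, and `cohort_passes` are hypothetical names, the academic year stands in for the semester cohort, and "recent" is assumed to mean the last three academic years. The sketch draws a plain random 5% sample; the highest-enrollment courses would need to be added to it separately.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Session:
    session_id: str
    academic_year: int                    # stand-in for the semester cohort
    current_semester: bool = False
    accommodation_request: bool = False
    mandatory_high_enrollment: bool = False


def priority_tier(s, current_year, recent_years=3):
    """Lower tier = processed earlier, following the order in the text."""
    if s.current_semester:
        return 0
    if current_year - s.academic_year < recent_years:
        return 1
    if s.accommodation_request:
        return 2
    if s.mandatory_high_enrollment:
        return 3
    return 4


def remediation_order(sessions, current_year):
    # Within a tier, newer cohorts come first (reverse chronological).
    return sorted(
        sessions,
        key=lambda s: (priority_tier(s, current_year), -s.academic_year),
    )


def spot_check_sample(session_ids, fraction=0.05, seed=None):
    """Draw a random sample of at least one session for the spot-check."""
    ids = list(session_ids)
    k = max(1, math.ceil(len(ids) * fraction))
    return random.Random(seed).sample(ids, k)


def cohort_passes(results, target=0.95):
    """results: one boolean per spot-checked session."""
    return sum(results) / len(results) >= target
```

Passing a fixed `seed` makes the sample reproducible, which helps if the spot-check selection itself needs to be shown in the per-cohort completion log.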

Eight failure modes in university lecture-capture caption programmes

Failure mode 1: Treating Panopto/Echo360 ASR as the caption of record

The most common institutional failure is treating the platform's built-in ASR track as compliance-sufficient. It is not: on technical lecture content, ASR at 90–93% word accuracy falls well short of the 99% accuracy benchmark (the DCMP standard) that captions are expected to meet for WCAG 2.1 AA purposes. Institutions that discover this during an OCR audit face the double problem of explaining why they believed the ASR track was sufficient and remediating the entire back-catalogue under investigation pressure. The safe position is to document explicitly that the ASR track is not the compliance track and that a glossary-biased or human-reviewed track is, and to maintain both tracks on all sessions so the distinction is visible in the platform data.

Failure mode 2: No pre-publication gate for new content

The most consequential operational failure is allowing new lecture recordings to become student-accessible without a caption file. At a university generating 800 lecture hours per semester, a programme without a pre-publication gate is accumulating 800 hours of uncaptioned content per semester — which is 800 hours of ADA Title II exposure per semester. The pre-publication gate is a platform configuration, not a process request to faculty — it is the accessibility coordinator configuring Panopto or Echo360 to require a caption track before a recording is student-visible, not asking faculty to remember to submit captions before publishing.
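
The gate logic itself is simple, as the following Python sketch suggests. It is not Panopto or Echo360 configuration; the `CaptionTrack` and `Recording` classes and the track source labels are hypothetical, chosen only to show the rule that an ASR-only recording stays hidden from students.

```python
from dataclasses import dataclass, field


@dataclass
class CaptionTrack:
    source: str  # assumed labels: "asr", "glossary_biased", "human_reviewed"


@dataclass
class Recording:
    session_id: str
    tracks: list = field(default_factory=list)


# Assumption: only these track sources count as the compliance track.
COMPLIANCE_SOURCES = {"glossary_biased", "human_reviewed"}


def may_publish(recording):
    """True only if at least one compliance-grade caption track is attached."""
    return any(t.source in COMPLIANCE_SOURCES for t in recording.tracks)


def held_back(recordings):
    """Session IDs that the gate would keep hidden from students."""
    return [r.session_id for r in recordings if not may_publish(r)]
```

Keeping the ASR track alongside the compliance track, rather than replacing it, matches the two-track position described under failure mode 1.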

Failure mode 3: Single-tier glossary

Institutions that build an institution-wide glossary and nothing else are solving 40–50% of the proper-noun problem. The department-level and course-level tiers are where the highest-density failure modes concentrate, particularly in research-intensive departments. A glossary that covers faculty names and building names but not cutting-edge research vocabulary and course codes will produce machine-captioning output that looks acceptable at the title level and fails systematically inside the lecture content.
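
One way to keep all three tiers populated is to route each failing term to a tier at semester close, using the rule from the close checklist: terms that failed in three or more recordings are added, course-local terms go to the course tier, and terms failing across departments go to the institution-wide tier. The Python below is a sketch under those rules; the function name, the input tuple shape, and the choice to send terms that span several courses in one department to the department tier are assumptions.

```python
from collections import defaultdict


def assign_tiers(failures, min_recordings=3):
    """failures: iterable of (term, session_id, course, department) tuples."""
    seen = defaultdict(
        lambda: {"sessions": set(), "courses": set(), "departments": set()}
    )
    for term, session_id, course, department in failures:
        seen[term]["sessions"].add(session_id)
        seen[term]["courses"].add(course)
        seen[term]["departments"].add(department)

    tiers = {}
    for term, s in seen.items():
        if len(s["sessions"]) < min_recordings:
            continue  # not yet a systematic failure
        if len(s["departments"]) > 1:
            tiers[term] = "institution"
        elif len(s["courses"]) > 1:
            tiers[term] = "department"  # assumption: several courses, one dept
        else:
            tiers[term] = "course"
    return tiers
```

The output is a proposal for the glossary maintainer to review before the glossary is versioned, not an automatic update.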

Failure mode 4: Ignoring guest lecturer vocabulary

Guest lecturers are the fastest way for a semester's accuracy pass rate to fall below the 95% target. A programme that has built a solid institutional glossary for its permanent faculty but has no guest-lecture vocabulary protocol will have consistent accuracy failures in the one to two sessions per course where guest lecturers appear. Over a semester with 50 courses and an average of two guest lectures per course, that is 100 sessions with elevated failure risk — enough to drop the accuracy pass rate from 97% to 89% if the guest lecturers are all from high-failure-risk vocabulary domains (medicine, AI, international affairs).

Failure mode 5: No FERPA contract for the caption vendor

Institutions that select a caption vendor without verifying the FERPA contract provisions are creating a FERPA exposure that is legally distinct from the ADA Title II exposure but potentially larger in consequence — FERPA violations can result in the loss of federal financial assistance for the institution. The five provisions that must be present in any caption vendor contract processing university recordings are: school-official designation, legitimate-educational-interest scope, re-disclosure prohibition, data-minimisation and processing-only restriction, and data-destruction certification. The vendor's default contract template often does not include all five — the institutional legal office should review and redline before the first session is processed.

Failure mode 6: Caption files not reaching the LMS

The caption file exists in the captioning vendor's delivery system but has not been uploaded to the platform where students access the content. This is a workflow-gap failure rather than a captioning-quality failure, but it is indistinguishable from no captions at all from the student's perspective (and the OCR investigator's perspective). Programmes that use a vendor with no direct platform integration — delivering SRT/VTT files via email or file transfer — have a higher incidence of this failure mode because the file delivery and the upload are two separate steps with a manual handoff between them. Automating the upload step via the Panopto or Kaltura API eliminates this failure mode for platforms that support API caption upload.
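
Where the upload step is still manual, a periodic reconciliation can catch files that were delivered but never attached. The Python sketch below assumes three lists of session IDs are available from exports (in-scope sessions, sessions with a vendor-delivered file, and sessions with a caption track on the platform); the function and key names are hypothetical and no platform API is called.

```python
def reconcile(in_scope, delivered, on_platform):
    """Compare vendor deliveries with what students can actually see."""
    in_scope = set(in_scope)
    delivered = set(delivered)
    on_platform = set(on_platform)
    return {
        # The failure mode in the text: file exists, students never get it.
        "delivered_not_uploaded": sorted((in_scope & delivered) - on_platform),
        # No file from the vendor and nothing attached on the platform.
        "never_delivered": sorted(in_scope - delivered - on_platform),
    }
```

Both lists feed the remediation queue; the first is usually quick to clear because the caption file already exists.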

Failure mode 7: No per-semester compliance documentation

A programme that runs a good in-semester captioning workflow but does not document the per-semester compliance metrics has no evidence of its own performance. If an OCR investigation opens two years after the programme was implemented, the institution needs to demonstrate what its captioning programme did in each of the intervening semesters, not just assert that it had a programme. The documentation is not complex: a spreadsheet with one row per semester, the seven metric columns (in-scope recordings, captioned, submission rate, accuracy pass rate, remediation queue items, median remediation days, exception count), and a sign-off date. That row exists, or it does not.
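
Checking that the row exists can itself be automated. The following Python sketch audits a compliance log for missing, incomplete, or unsigned semester rows; the column keys are hypothetical snake_case versions of the seven metrics named above, and `audit_log` is an illustrative name.

```python
REQUIRED_COLUMNS = [
    "in_scope_recordings",
    "captioned",
    "submission_rate",
    "accuracy_pass_rate",
    "remediation_queue_items",
    "median_remediation_days",
    "exception_count",
]


def audit_log(rows, expected_semesters):
    """Return {semester: problem} for every semester lacking a clean row."""
    by_semester = {row["semester"]: row for row in rows}
    problems = {}
    for semester in expected_semesters:
        row = by_semester.get(semester)
        if row is None:
            problems[semester] = "missing row"
        elif any(row.get(col) in (None, "") for col in REQUIRED_COLUMNS):
            problems[semester] = "incomplete metrics"  # a zero value is fine
        elif not row.get("sign_off_date"):
            problems[semester] = "not signed off"
    return problems
```

An empty result means every expected semester has a complete, signed-off row, which is the state the log should be in before anyone asks for it.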

Failure mode 8: Conflating lecture-capture captions with LMS captions

A lecture recorded in Panopto and embedded in Canvas as a Panopto LTI link has a caption file managed in Panopto. The same content, if it is also downloaded and re-uploaded to Canvas Studio for a flipped-classroom workflow, has a different caption file managed in Canvas Studio. If the Canvas Studio version is updated (a clip is trimmed, a section is replaced) and the caption file is not re-generated, the Canvas Studio version is uncaptioned or has captions that do not match the audio. This is the multi-platform complexity problem: the compliance obligation attaches to the content the student accesses, and the student may access the same underlying lecture through two different surfaces that have independent caption management. The solution is a canonical source-of-truth: the lecture-capture platform (Panopto/Echo360) is the caption-of-record; any re-embedded version in Canvas Studio or Blackboard Ultra inherits from the same caption file rather than having an independent caption file. This requires explicit institutional workflow design — it does not happen automatically when the platforms are integrated.
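
A simple staleness check makes the drift between surfaces visible. The Python sketch below assumes the institution keeps an inventory of derived copies with two timestamps, when the media was last edited and when its captions were last synced from the caption of record; the `DerivedCopy` class and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DerivedCopy:
    asset_id: str                         # e.g. the Canvas Studio item
    source_session_id: str                # caption-of-record session
    media_edited_at: datetime
    captions_synced_at: Optional[datetime]


def stale_copies(copies):
    """Copies never captioned, or edited after their captions were synced."""
    return [
        c.asset_id
        for c in copies
        if c.captions_synced_at is None
        or c.captions_synced_at < c.media_edited_at
    ]
```

Any asset this returns needs its captions regenerated or re-synced from the lecture-capture platform before it counts as captioned.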

Frequently asked questions

Our university has Panopto and Canvas Studio running simultaneously. Do we need separate caption programmes for each?

No, but you need a clear institutional policy about which platform holds the caption of record and how captions on the other platform are managed. The recommended approach: Panopto is the caption-of-record for lecture-capture content (recordings captured by the classroom infrastructure). Canvas Studio is the caption-of-record for instructor-created course content (short videos recorded directly in Canvas Studio, video quizzes, student submissions). The two programmes share the institutional glossary — the same three-tier glossary (institution-wide, department, course) is used to generate captions for both platforms — but the compliance documentation is maintained per-platform, because the submission rate for Panopto content (instructor records, Panopto captures, captioning service uploads) is structurally different from the submission rate for Canvas Studio content (instructor records directly, caption auto-generated, coordinator verifies). If the two programmes share a single compliance log, it is impossible to diagnose which platform is generating submission failures when the aggregate rate drops below target.
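
The diagnostic argument for separate logs can be shown with a short calculation. In this Python sketch the input is a list of (platform, captioned) pairs; the function name and the sample figures in the usage note are illustrative, not measurements.

```python
from collections import Counter


def submission_rates(records):
    """records: iterable of (platform, captioned) pairs."""
    totals, done = Counter(), Counter()
    for platform, captioned in records:
        totals[platform] += 1
        done[platform] += int(captioned)
    if not totals:
        return {}
    rates = {platform: done[platform] / totals[platform] for platform in totals}
    # The aggregate hides which platform is responsible for a shortfall.
    rates["aggregate"] = sum(done.values()) / sum(totals.values())
    return rates
```

With ten recordings on each platform, nine captioned in Panopto and three in Canvas Studio, the aggregate comes out at 60% while the per-platform rates are 90% and 30%, which is the distinction a shared log would obscure.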

Our institution has 12,000 uncaptioned Panopto sessions from 2015–2023. Where do we start?

Start with the institutional glossary, not with the back-catalogue. The time investment in building the three-tier glossary (institution-wide in one to two days, five to ten department-level tiers in parallel over one to two weeks) is recovered within the first semester of batch processing, because glossary-biased machine captioning needs only one pass, whereas captioning without the glossary needs two: a machine pass followed by human correction of the glossary failures. With the glossary complete, run a pilot batch on the 100 highest-enrollment, highest-priority sessions in the back-catalogue: the courses with documented accommodation requests, the mandatory-curriculum courses in the current semester's active cohorts, and the courses in departments where OCR complaint patterns have historically been highest. The pilot batch gives you an accuracy pass rate against the DCMP standard (from the 5% random sample) and a per-hour processing cost that you can scale to the full back-catalogue plan. The pilot typically takes two to three days to process and one day to spot-check; by day four you know your actual accuracy rate on institutional content and can project the time and cost to complete the back-catalogue. Document the pilot results: they are evidence of good-faith programme initiation, which matters if an OCR complaint arrives before the back-catalogue remediation is complete.
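
Scaling the pilot to the full back-catalogue is a linear projection, sketched below in Python. The function name and parameters are hypothetical, and the sketch assumes throughput and per-hour cost stay constant between the pilot and the full run, which should be rechecked after each cohort.

```python
def project_backlog(pilot_hours, pilot_cost, pilot_days, backlog_hours):
    """Linear projection from pilot figures; assumes constant throughput."""
    if pilot_hours <= 0 or pilot_days <= 0:
        raise ValueError("pilot_hours and pilot_days must be positive")
    cost_per_hour = pilot_cost / pilot_hours
    hours_per_day = pilot_hours / pilot_days
    return {
        "cost_per_hour": cost_per_hour,
        "projected_cost": cost_per_hour * backlog_hours,
        "projected_days": backlog_hours / hours_per_day,
    }
```

As a purely illustrative input, a pilot of 100 recorded hours that cost 250 and took 2 days, applied to a 12,000-hour backlog, projects to 30,000 in cost and 240 processing days at that rate.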

How does FERPA interact with the OCR disability investigation? Can OCR request the caption files that include student audio?

OCR (the Department of Education's Office for Civil Rights), when investigating under ADA Title II or Section 504, can request documentation of the institution's captioning programme: policies, accuracy logs, compliance timelines, sample caption files. OCR does not need recordings that include identifiable student audio to conduct the investigation; the investigator can test accuracy on sessions that are lecture-only (no student audio) or on anonymised transcripts. If OCR requests a specific recording that includes identifiable student audio, the institution should provide the transcript with student identifiers redacted rather than the audio file itself. The reasoning is that the FERPA exception for legal proceedings at 34 CFR § 99.31(a)(9) covers disclosure to comply with a judicial order or lawfully issued subpoena, and an OCR investigation is an administrative proceeding, not a judicial one, so that exception does not apply to it. The captioning service should never provide audio files directly to OCR investigators; the institution's general counsel manages any document requests from OCR.

A faculty member recorded a lecture before we had a captioning programme and shared it as a YouTube link in the Canvas course materials. Who is responsible for captioning it?

The institution is responsible for ensuring that all content in its LMS that is subject to ADA Title II is accessible — regardless of whether that content is hosted on the institution's infrastructure or on a third-party platform. A YouTube link in a Canvas course module is part of the institution's LMS content, and the institution's ADA Title II obligation applies to it. YouTube auto-captions at 80–90% accuracy do not meet WCAG 2.1 AA. The practical resolution: the faculty member provides the source video file; the institution runs it through the captioning workflow; the captioned version is uploaded either to Panopto (as a non-lecture-capture asset) or to Canvas Studio (as a video content item) and the Canvas course link is updated to point to the captioned version. If the source file is not available (the faculty member recorded it on a personal device and no longer has it), the audio can be extracted from the YouTube video and run through the captioning workflow — the SRT/VTT output can then be used to replace the YouTube auto-caption track (via YouTube Studio) or uploaded alongside a Canvas Studio version of the video. The key documentation step: log the resolution method and the date the captioned version replaced the YouTube link in the Canvas course materials.

We use Echo360 and our contract with a captioning vendor was signed three years ago. Do we need to re-sign for FERPA compliance?

Likely yes. FERPA contract requirements for caption vendors were not consistently included in vendor agreements three years ago — the regulatory pressure on vendors to include explicit school-official designation, data-minimisation, and data-destruction terms accelerated significantly after 2023 as OCR increased enforcement activity. A contract signed in 2022 or 2023 should be reviewed by the institutional legal office against the five required provisions: (1) school-official designation under 34 CFR § 99.31(a)(1); (2) legitimate-educational-interest scope definition; (3) re-disclosure prohibition; (4) restriction to processing-only use (the vendor cannot use student audio for model training or any purpose beyond the specific captioning service); and (5) data-destruction certification specifying the destruction timeline after processing is complete. If any of the five is absent or ambiguous, an addendum or contract amendment is appropriate before the next semester's recordings are processed. The vendor's legal team will typically accept a short addendum addressing the FERPA provisions rather than requiring a full contract re-execution.

How do we handle the transition period if we are switching from Echo360 to Panopto mid-academic-year?

The LMS migration caption checklist covers the general back-catalogue migration problem; the Echo360-to-Panopto specific version has two additional steps. First: before migrating content, export the Echo360 caption files (SRT) for all recordings that have a compliant caption track. Echo360's bulk export functionality allows downloading caption files by folder; use this to create a local archive of all caption files before the migration begins. Second: after migrating the video content to Panopto, re-upload the caption files to the corresponding Panopto sessions via the API, preserving the session ID cross-reference (Echo360 session ID → Panopto session ID → compliance log entry) so the compliance trail is intact. Do not rely on the platform migration tool to migrate caption files — the Echo360-to-Panopto migration tooling (if available at your institution through a Panopto professional services engagement) may migrate video content successfully but may not migrate caption files, because caption file migration requires a separate API call to a separate endpoint on the destination system. Verify caption file transfer explicitly on a pilot batch of 20–30 sessions before running the full migration, and document the verification in the compliance log.
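
The pilot-batch verification reduces to checking the cross-reference against caption status on both sides. The Python sketch below assumes the ID mapping and the two sets of captioned session IDs come from exports; it calls no Echo360 or Panopto API, and `verify_migration` is a hypothetical name.

```python
def verify_migration(id_map, echo_captioned, panopto_captioned):
    """Return (echo_id, panopto_id) pairs whose captions did not arrive.

    id_map: Echo360 session ID -> Panopto session ID.
    echo_captioned: Echo360 sessions that had a compliant caption track.
    panopto_captioned: Panopto sessions that now have a caption track.
    """
    panopto_captioned = set(panopto_captioned)
    missing = []
    for echo_id in sorted(echo_captioned):
        panopto_id = id_map.get(echo_id)  # None if the video never migrated
        if panopto_id is None or panopto_id not in panopto_captioned:
            missing.append((echo_id, panopto_id))
    return missing
```

An empty list on the 20–30 session pilot is the result to record in the compliance log before the full migration runs.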

Is there a penalty structure for a public university that is not yet fully compliant with ADA Title II for its lecture-capture back-catalogue?

ADA Title II enforcement by OCR typically proceeds through a complaint-investigation-resolution agreement cycle rather than through automatic penalties. A public university that receives an OCR complaint about inaccessible lecture-capture content is asked to provide documentation of its captioning programme and compliance status. If the programme is real — documented, measurable, demonstrating good-faith progress — OCR typically resolves the investigation through a voluntary resolution agreement (VRA) that specifies a remediation timeline, monitoring requirements, and reporting obligations. The institution is not assessed financial penalties in the first instance. Penalties under ADA Title II become available to the Department of Justice if the institution refuses to enter a VRA or fails to meet VRA commitments — a scenario that is extremely rare for institutions acting in good faith. The practical lesson: the documentation of the programme (the semester-by-semester compliance log, the institutional glossary version history, the back-catalogue remediation plan and progress) is what determines whether an OCR investigation resolves quickly in the institution's favour or becomes a protracted enforcement action. An institution with clean documentation that demonstrates an active, improving programme is in a fundamentally different position than one with no documentation. See the 90-day programme build guide and the compliance reporting framework for the documentation architecture that creates this record.

Build the institutional glossary that makes your back-catalogue tractable

GlossCap's three-tier institutional glossary architecture — institution-wide, department, course — is designed specifically for the university lecture-capture compliance problem. Upload your faculty directory and department vocabulary lists, connect to Confluence or Google Docs for ongoing maintenance, and process back-catalogue batches with glossary-biased Whisper-large output that meets the DCMP 99% accuracy standard on your institution's specific vocabulary. The Team plan ($99/month) covers 30 hours of video per month plus Notion/Confluence/Docs glossary sync and a reviewable edit UI. The Org plan ($299/month) covers unlimited hours, SSO, custom glossary model, and LMS webhooks — including the Panopto and Kaltura API integrations for automated bulk upload.
