Platform reference · Canvas LMS · Instructure · LTI 1.3

Canvas LMS captions: caption upload, Studio, and the back-catalogue retrofit pattern

Canvas LMS by Instructure is the dominant learning-management system in US higher education — and a growing fraction of corporate L&D — which means a Canvas captioning workflow has to think about three video surfaces at once: course-level SRT/VTT files in the Files area, Canvas Studio's auto-caption-and-publish pipeline, and the larger video catalogues delivered through external tools (Panopto, Kaltura, YouTube, Vimeo) embedded into Canvas via LTI. After ADA Title II's 2026-04-24 deadline, the urgent operational task at every public-college and public-university Canvas tenant is the back-catalogue retrofit — and the failure mode auditors find is the proper-noun mangling generic auto-captioning produces with predictable regularity.

TL;DR

A Canvas captioning workflow has three surfaces. (1) Files-area video — instructors upload an MP4 and a sidecar SRT or VTT caption file; the Canvas video player reads the sidecar via the track element. (2) Canvas Studio — Instructure's video host inside Canvas, with auto-caption + caption editor + accessibility-checker integration; auto-captions are off by default in older tenants and have to be enabled per-account. (3) External video via LTI — the bulk of higher-ed video lives in Panopto, Kaltura, or YouTube and is embedded through an LTI tool; captioning runs through that platform, not Canvas. The ADA Title II deadline at 2026-04-24 (now live) makes this catalogue-wide. The retrofit pattern is: inventory the catalogue → identify which surface owns each asset → re-caption the high-stakes content first (compliance training, regulated-program courses, public-facing welcome videos) → publish glossary-biased captions back to each surface → log the asset register for OCR-sampling readiness.

Why Canvas captioning is now urgent: ADA Title II Section 36.504

The Department of Justice's final rule under ADA Title II (28 CFR Part 35, with the digital content provisions at the new Section 36.504) bound state and local-government public entities to WCAG 2.1 Level AA on web content and mobile apps. The compliance date for large public entities — including public universities and large community-college systems — was 2026-04-24, and that date has now passed. SC 1.2.2 (Captions, Prerecorded) is the operative success criterion; the substantive bar is captions that accurately convey the audio.

Public colleges and universities are the densest concentration of Canvas tenants in the United States. The 2026-04-24 deadline created an immediate audit-evidence task: every video in every active course had to either have substantively accurate captions on it, or had to be removed, or had to have a documented accommodation pathway. Auto-captions in the 80–90% accuracy band do not clear the SC 1.2.2 substantive-accuracy bar when the words being mangled are the words the student is being tested on.

Surface 1 — Files-area video with sidecar captions

The most basic Canvas captioning surface is video uploaded to a course's Files area. The instructor uploads an MP4, MOV, or other supported container; Canvas serves it through the built-in video player. To attach captions:

Upload the caption file (SRT or WebVTT) to the same Files area.
Edit the page or assignment that embeds the video.
In the rich content editor (RCE), open the video's properties and select the caption file under "Add caption track."
Save. The video player now exposes the CC button and reads the captions through the HTML5 track element.

The Files-area surface is what every Canvas instructor knows how to use. It's also where the back-catalogue lives at most institutions — the videos uploaded over the last six to ten years that pre-date Canvas Studio and never had captions attached. Two operational realities apply:

Caption files are first-class file objects. They have their own permissions, can be moved between courses, and can be downloaded by anyone with read access. Don't put PII or unredacted student feedback into caption text — it's now a downloadable file.
Caption files don't auto-attach across course imports. When an instructor copies a course from one term to another, the videos and the caption files come over but the caption-track association in the RCE typically does not survive. Re-attach captions after every course copy, or use Canvas Studio for content that travels across terms.

For format choice, see the SRT and VTT reference pages — Canvas's HTML5 player handles both, with VTT preferred when the captions need styling cues.

Surface 2 — Canvas Studio

Canvas Studio (formerly Arc) is Instructure's in-LMS video host. Studio provides:

Auto-captioning. Speech-to-text on uploaded video, producing an editable caption track aligned to a per-word timeline.
A caption editor. Browser UI for fixing the auto-caption text and timing per word or per cue.
Comments and questions. Time-anchored discussion threads on the video.
Insights. Per-student and per-cohort viewing analytics.
Accessibility-checker integration. The Canvas accessibility checker flags Studio videos that are published without an enabled caption track.

The relevant Studio behaviour for captioning operators:

Auto-captions are off by default in many older accounts. Per-account settings control whether Studio runs auto-captioning on uploads. After the ADA Title II deadline, most tenants flipped this on globally, but check the account-level Studio settings — uploads from before the toggle moved still need to be re-processed.
Re-process and replace. The Studio caption editor supports replacing the caption track wholesale. The supported workflow for vendor-supplied captions is to delete the auto-caption track and upload a clean SRT or VTT in its place; this is what re-captioning the back-catalogue with glossary-biased output looks like.
Captions travel with embeds. When a Studio video is embedded in a course page, assignment, or discussion via the Studio LTI button, the captions follow the video. This is what makes Studio the right surface for any video that recurs across multiple courses or terms.
Multi-language tracks. Studio supports multiple caption tracks per video. The student-facing player shows a language selector. For institutions that serve a multilingual student body — common in California state systems, City University of New York, the Texas systems, and federal-grant-funded programmes that require Spanish-language access — Studio is the sanest place to host the multi-track delivery.

The Studio caption editor is functional for line-by-line corrections but doesn't scale to a back-catalogue retrofit. For volume work, the operational pattern is: bulk-export the auto-captions out of Studio, run them through a glossary-biased re-captioning pass, and bulk-replace.

Surface 3 — External video through LTI

The largest video catalogues in higher education are not in the Canvas Files area or in Studio. They are in external lecture-capture and video-hosting platforms — Panopto, Kaltura, YouTube, and Vimeo — embedded into Canvas through LTI 1.3 tools. The captioning workflow follows the external platform, not Canvas:

Lecture-capture content typically lives in Panopto or Kaltura, with the Canvas LTI tool surfacing a course folder of videos. Captions are uploaded inside the platform's UI or via API and follow the video into Canvas.
Marketing, public-facing welcome content, and short instructional segments are often on YouTube — embedded through the Canvas RCE's media insertion. YouTube captions are the responsibility of the channel owner; YouTube auto-captions are explicitly insufficient for Title II.
SMB and continuing-education content frequently lives on Vimeo, with embeds through the same RCE pathway. Vimeo supports up to five caption formats per video; the SRT or VTT track travels with the embed.
Modern async-video tools like Loom appear in Canvas through institutional Loom Education licences and the Loom LTI app; captioning is Loom's responsibility, but the auto-transcript failure mode is the same as YouTube's.

The catalogue-inventory step that opens any retrofit must look across all three surfaces. The 2026-04-24 deadline applies to every video the student encounters through Canvas — regardless of which platform actually hosts it.

The OCR sampling pattern, applied to a Canvas tenant

The Office for Civil Rights (US Department of Education) is the primary federal enforcement body for higher-education ADA and Section 504 complaints; for ADA Title II compliance, public-entity actions also flow through the DOJ. The OCR's sampling pattern, when an investigation lands on a Canvas tenant, is consistent across the cases that have been published:

Identify a course. Often the complainant names a specific course, and the institution provides the course URL.
Open a recent module. The investigator looks for video — instructor-created lecture, a guest-speaker recording, a procedural demonstration, a regulated-content module.
Watch a slice with captions on. Two to three minutes is enough to assess whether the captions track the speaker, including the named technical terms.
Read the caption track against the audio. Mangled proper nouns (drug names, regulatory citations, technical product terms, institution-specific programme names) are the failure pattern that gets flagged in writing.
Sample the back-catalogue. If the named course fails, the investigator typically samples a half-dozen other active-term courses to check for a pattern. A pattern triggers a programme-wide finding.

The proper-noun failure mode is what generic auto-captioning is structurally bad at. The words that distinguish a competent caption from a mangled one — the regulatory citations a healthcare student is being tested on, the SDK symbols a software-engineering student must read off the screen, the procedure names in a nursing module, the institution-specific course numbers and faculty names that anchor the conversation — are exactly the words generic STT has the least training data for.

The back-catalogue retrofit pattern

For an institution sitting on years of un-captioned or auto-captioned Canvas video, the retrofit runs in five phases:

Inventory. Generate a flat list of every video asset across Files, Studio, and the LTI-embedded platforms. Canvas's API exposes the Files objects; Studio has its own enumeration endpoint; Panopto / Kaltura / YouTube / Vimeo each have list APIs. Most institutions discover that 60–80% of the catalogue lives outside Files.
Triage. Rank by exposure: required courses first, regulated-content modules first within those (compliance training, healthcare procedure videos, anything that's audit-bait under Section 504 or HIPAA), public-facing welcome video first within marketing.
Re-caption. Replace mangled or absent captions with glossary-biased output. The institutional glossary is built once — programme names, course names, faculty names, regulatory citations, drug and procedure names if you have a healthcare programme, SDK symbols if you have a CS programme, the institution's acronym handbook — and applies to every retrofit asset. Per-customer compounding accuracy is what makes this scale.
Publish. Push captions back to the originating surface. Sidecar SRT/VTT for Files; replace caption track in Studio; upload through the platform API for Panopto/Kaltura/Vimeo; channel-owner action for YouTube.
Log. Maintain an asset register: video URL, surface, caption file, caption source, reviewer, review date, glossary version. This is the documentation an OCR investigator asks for, and it's how institutional risk management proves work-in-progress on the long tail.

See pricing

Where glossary-biased captioning changes the math

The standard institutional retrofit cost calculus pits hand-corrected auto-captioning against vendor-supplied human captioning. Hand-correction at one to two hours per video, multiplied by a five-thousand-asset back-catalogue, multiplied by a $40-per-hour staff or student-worker rate, produces a six-figure project. Human captioning at $1.25-$3.00 per minute of video, multiplied by an average 30-minute lecture across that catalogue, produces a similar six-figure project — sometimes worse.

Glossary-biased captioning is a different cost shape. The institution builds the glossary once. Each minute of video costs a fraction of human-vendor pricing. The accuracy is high enough on the proper-noun surface that the human-review pass collapses from the full-correction hour to a quick scrub of the amber-highlighted glossary surface. For a 5,000-asset catalogue at an average 30-minute length — 2,500 hours — the GlossCap math (Org plan, 2,500 hours over a four-month retrofit window) lands well under the in-house and vendor-only paths. See the vendor pricing breakdown for the per-hour comparison.

The other cost most retrofit calculations miss is the cost of getting the proper nouns wrong twice — once in the captions and again in the OCR finding letter. Glossary-biased captioning is what stops that recurrence in the second-cycle audit, when the institution attests to a clean catalogue and the investigator spot-checks at random.

Canvas accessibility-checker behaviour

The Canvas accessibility checker (built into the rich content editor, also available as the institution-wide UDOIT plug-in) flags video accessibility issues at content-edit time. The checks relevant to captioning:

Video without captions. An embedded Studio video without an enabled caption track fails the check. A Files-area video without an attached caption track fails the check.
Auto-caption-only Studio videos. Some accessibility-checker configurations flag auto-caption-only Studio videos (no human review) as a partial fail; behaviour varies by Canvas version and UDOIT plug-in version.
External video without a verified caption track. The checker can't reach into the LTI-embedded platform to verify captions exist; the institution-side compliance posture has to assume LTI-embedded video may need separate audit.

The accessibility checker is a useful gating signal at content-creation time but is not a substitute for the catalogue-wide audit. Many of the most legally exposed videos in a Canvas tenant are months or years old and were published before the current accessibility-checker behaviour landed.

FAQ — Canvas captioning

Does Canvas Studio's auto-caption clear ADA Title II SC 1.2.2?

Studio auto-captions land in the same 80–90% substantive-accuracy band as YouTube auto-captions on training-style content with technical proper nouns. The substantive-accuracy bar SC 1.2.2 enforces is "captions that accurately convey the audio," not "captions that exist." For a no-proper-noun, conversational video, auto-captions can be substantively accurate. For lecture, regulated-content, or technical-procedure video, auto-captions virtually always require correction. The defensible posture is to treat auto-captions as a draft and run a glossary-biased correction pass before the video is exposed to a student.

What format do I upload to Canvas Files — SRT or VTT?

Both are accepted by the Canvas video player. SRT is the universal default and works in every consumption surface. WebVTT (VTT) is preferred when you need positioning cues, styling, or speaker-identification metadata; the file format is also what the HTML5 track element natively reads. Most institutional retrofits standardise on SRT for Files and let Studio output VTT internally.

How do captions in Canvas Studio compare to Panopto for accessibility evidence?

Both produce per-video caption files that are uploadable to an OCR or accommodation-services request. Panopto's accessibility report is more mature and exports a per-folder caption-status spreadsheet; Studio's reporting is per-video and lighter. For institutions running both — the common pattern, where Studio handles instructor-created short-form and Panopto handles full-classroom lecture capture — the asset register has to merge both surfaces.

If we copy a course across terms, do the captions travel?

Studio videos travel with their caption tracks. Files-area videos travel with their caption files but the caption-track association in the rich content editor often has to be re-attached. Course-copy is the most common point at which captions detach in practice; checking the new term's first-week course pages for the CC button is the fastest pre-flight test.

What about Canvas-embedded YouTube video — whose caption is it?

The channel owner's. Canvas only embeds the YouTube player; the caption track is whatever YouTube serves. For institutional content posted on the institution's official YouTube channel, the captioning workflow is the same as any other YouTube content — upload SRT/VTT through YouTube Studio. For third-party YouTube content embedded into a Canvas course (which raises a separate copyright question), the institution can't control the captions; it has to either find an alternative source or provide an equivalent captioned alternative.

What does the OCR investigation packet typically request?

For a video-accessibility complaint, OCR typically requests: the course URL, the videos in the course (or the sample courses if the complaint is programme-wide), the caption files attached to each, the institutional accessibility policy, the staff and faculty training records around accessibility, the accommodation-services request log relevant to the complainant. The asset register described above is exactly the artefact that answers the documentation half of that request quickly.

How does this relate to the AODA captions rule for Canadian universities on Canvas?

AODA's Integrated Accessibility Standards Regulation binds large Ontario organisations (50+ employees) — including all Ontario public universities — to WCAG 2.0 AA on web content, with a three-year compliance reporting cycle. The substantive captioning bar is identical to ADA Title II's. See the AODA captions reference for the reporting-cycle detail; institutions in scope of both regimes can ship one caption track to clear both.