
Google Meet captions: Workspace tenant captioning, recordings, Drive video, and training compliance

Google Meet is the default meeting surface for the majority of Google Workspace tenants: Business Starter, Business Standard, Business Plus, Enterprise Standard, Enterprise Plus, Education Fundamentals, Education Standard, Teaching and Learning Upgrade, Education Plus, and Workspace for Government. For organisations on Google Workspace, Meet is the meeting tool, Google Drive is where recordings land, and Google Classroom or a third-party LMS is where training content gets distributed. Every link in that chain has a caption surface, and at most institutional tenants every caption surface that touches training video carries an obligation under ADA Title II, Section 508, Section 504, the EAA, or AODA. Google's built-in auto-captions are the starting point, not the finishing point, for training-video compliance.

TL;DR

A Google Meet captioning workflow spans five surfaces. (1) Live auto-captions: Google's real-time speech-to-text, displayed as a CC overlay during the meeting and not stored in the recording by default. (2) Meeting transcript: the Workspace recording + transcript feature, which saves a per-speaker text document to Drive alongside the video recording. (3) Google Drive recording with captions: the video recording in Drive, with captions added as a separate caption track (SRT/VTT) or burned in at post-production. (4) YouTube upload with auto-captions: recordings uploaded to an institutional YouTube channel receive YouTube's auto-caption track, which inherits all of Google's STT limitations on technical content. (5) Google Classroom embed: recordings embedded in a Classroom assignment carry whatever caption track was added to the Drive or YouTube source. The failure mode is the same across all five surfaces: generic STT mangles technical proper nouns, and the mangled transcript is what gets stored, distributed, and discovered in a compliance audit.

Google Workspace tenant types and their caption implications

The Workspace edition determines which caption features are available, but it does not by itself settle the compliance posture. The compliance regime attached to any specific Meet recording depends on the tenant type, the content of the recording, and how the recording is distributed, not just on which Workspace edition the organiser uses.

Surface 1 — Live auto-captions during the meeting

Google Meet generates live closed captions during the active meeting using Google's speech-to-text engine. The captions are displayed as a CC overlay in each participant's UI and can be turned on or off per participant. What they are not, by default, is stored. Live auto-captions are ephemeral — they disappear when the meeting ends, unless a meeting transcript has been enabled (Surface 2 below).

Substantive accuracy on conversational audio is 80–90%, consistent with all major generic STT systems. Substantive accuracy on technical content (engineering SDK and command-line terms, medical drug and procedure names, legal citations, financial-services regulatory acronyms, government agency and program names) drops materially. The proper-noun failure mode is the same every time: a common word or plausible phonetic replacement stands in for the term, and the rendering is wrong. PyTorch becomes "pie torch." Tirzepatide becomes "tear-za-pa-tide." FERPA becomes "ferpa" or "fur-pa." TRADOC becomes "trade doc." The live-caption surface is an accommodation aid, not a compliance artefact, but the failure modes it exposes in real time are the same failure modes that will appear in the transcript and recording captions.
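
To make "drops materially" measurable on your own content, one option is to score only the glossary surface: of the glossary-term occurrences in a human reference transcript, how many does the STT output reproduce exactly? The sketch below is a hypothetical term-recall measure, not a standard WER metric; the glossary and sample sentences are illustrative. Matching is case-sensitive on purpose, so "ferpa" for FERPA counts as an error.

```python
import re
from collections import Counter

def term_counts(text: str, terms: list[str]) -> Counter:
    """Count whole-word, case-sensitive occurrences of each glossary term."""
    counts = Counter()
    for term in terms:
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        counts[term] = len(re.findall(pattern, text))
    return counts

def glossary_recall(reference: str, hypothesis: str, terms: list[str]) -> float:
    """Share of glossary-term occurrences in the reference that the STT output reproduces."""
    ref = term_counts(reference, terms)
    hyp = term_counts(hypothesis, terms)
    expected = sum(ref.values())
    if expected == 0:
        return 1.0  # nothing on the glossary surface to get wrong
    matched = sum(min(ref[t], hyp[t]) for t in terms)
    return matched / expected

reference = "Install PyTorch before the FERPA module. TRADOC approved it."
hypothesis = "Install pie torch before the ferpa module. TRADOC approved it."
print(glossary_recall(reference, hypothesis, ["PyTorch", "FERPA", "TRADOC"]))
```

A transcript can score well on overall word accuracy and still fail this measure, which is the gap the surrounding text describes.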

For live accommodation obligations — an attendee with a documented hearing-related accommodation need — the Meet CART pathway is the defensible route. The organiser designates a caption role; a professional CART captioner types live captions that replace the auto-caption track in real time for that meeting. This is the standard pathway for institutionally required CART services under Section 504 and ADA Title II.

Google Meet also supports third-party live-caption integration through an API endpoint, which large institutions use to run dedicated CART vendors at scale for regular all-hands or recurring instructional sessions.

Surface 2 — Meeting transcript (Workspace recording + transcript)

On Workspace Business Standard and above, and on Education Standard and above, Workspace can automatically generate a meeting transcript alongside the video recording when a meeting is recorded. The transcript is a per-speaker text document (Google Doc) deposited in the organiser's Drive alongside the video file.

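Characteristics relevant to training-video captioning: the transcript is a plain-text Google Doc rather than a time-coded caption file, it is attributed per speaker, and it carries the same proper-noun errors as the live auto-captions.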

The transcript is the raw material for producing a caption file. The workflow: Google Transcript Doc → convert to WebVTT (either manually or via a conversion tool) → upload to the Drive video recording as a caption track. The conversion step is where glossary-biased correction happens — the mangled proper nouns are corrected in the conversion pass before the caption file is produced.
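
The conversion pass can be sketched in Python. This assumes the transcript has already been split into timed segments (start and end seconds, speaker, text). The Google Transcript Doc does not carry cue-level timecodes, so those timestamps would have to come from a separate alignment step, which is out of scope here. The `Segment` type, the function names, and the glossary entries are illustrative, not part of any Google tool.

```python
import re
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds from the start of the recording
    end: float
    speaker: str
    text: str

# Illustrative glossary: mangled STT rendering (lowercase) -> correct term.
GLOSSARY = {
    "pie torch": "PyTorch",
    "tear-za-pa-tide": "tirzepatide",
    "fur-pa": "FERPA",
    "ferpa": "FERPA",
    "trade doc": "TRADOC",
}

def build_corrector(glossary: dict[str, str]):
    """Return a function that rewrites mangled variants to glossary terms."""
    if not glossary:
        return lambda text: text
    # One alternation, longest variants first, so corrected output is never re-scanned.
    variants = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, variants)) + r")\b", re.IGNORECASE)
    return lambda text: pattern.sub(lambda m: glossary[m.group(0).lower()], text)

def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"

def escape_cue_text(text: str) -> str:
    # WebVTT cue text must escape &, < and >.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

def segments_to_webvtt(segments: list[Segment], glossary: dict[str, str]) -> str:
    correct = build_corrector(glossary)
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{vtt_timestamp(seg.start)} --> {vtt_timestamp(seg.end)}")
        # <v Speaker> is the WebVTT voice span; it keeps the transcript's per-speaker attribution.
        lines.append(f"<v {escape_cue_text(seg.speaker)}>{escape_cue_text(correct(seg.text))}")
        lines.append("")
    return "\n".join(lines)

print(segments_to_webvtt([Segment(1.5, 4.0, "Instructor", "Open pie torch & check ferpa")], GLOSSARY))
```

Correction runs before escaping, so glossary terms are matched against the raw text. In practice the glossary grows with each reviewed recording rather than being fixed up front.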

Surface 3 — Google Drive recording with caption track

When a Meet is recorded, the video recording lands in the organiser's Google Drive as an MP4. Google Drive video playback supports caption tracks, uploaded as a separate SRT or VTT file associated with the Drive video file. The workflow:

  1. Open the video file in Google Drive.
  2. Click the three-dot menu → "Add captions."
  3. Upload an SRT or WebVTT caption file. Drive accepts both.
  4. The caption track is associated with the video and displays in Drive playback.

Unlike YouTube, Google Drive video playback does not run auto-captioning on the video file's audio. A recording that sits in Drive without a manually uploaded caption file therefore has no captions, regardless of whether the live meeting had auto-captions turned on. This is an important distinction.

The Drive caption-upload workflow is the cleanest captioning path for Meet recordings that stay inside the Workspace ecosystem. The caption file is associated with the specific Drive video file; shared-drive access controls determine who can see it. For shared drives used as institutional training repositories, the caption file needs to be uploaded and verified before the video is shared with learners.

For a large Meet recording catalogue in Drive, the API pattern is the production-grade automation: enumerate Drive video files by folder or label → detect those without a caption track → send to glossary-biased captioning → upload the returned VTT via the Drive API's caption-attachment method → log the asset register entry.
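
The enumeration and "without a caption track" steps might look like the sketch below. It assumes `files` is a Drive API v3 files resource, for example `googleapiclient.discovery.build("drive", "v3", credentials=creds).files()`, and uses the `files.list` query syntax with `nextPageToken` pagination. It deliberately does not call a caption endpoint: whether a file already has a verified track is read from the institution's own asset register (the hypothetical `captioned_ids` set), and captioning and upload are left to the next stage.

```python
from typing import Iterator

def list_folder_videos(files, folder_id: str) -> Iterator[dict]:
    """Yield MP4 files directly inside one Drive folder, following pagination."""
    query = f"'{folder_id}' in parents and mimeType = 'video/mp4' and trashed = false"
    page_token = None
    while True:
        response = files.list(
            q=query,
            fields="nextPageToken, files(id, name, createdTime)",
            pageToken=page_token,
            supportsAllDrives=True,          # include shared-drive training repositories
            includeItemsFromAllDrives=True,
        ).execute()
        yield from response.get("files", [])
        page_token = response.get("nextPageToken")
        if page_token is None:
            break

def needs_captions(files, folder_id: str, captioned_ids: set[str]) -> list[dict]:
    """Videos in the folder that the asset register does not record as captioned."""
    return [f for f in list_folder_videos(files, folder_id) if f["id"] not in captioned_ids]
```

Keying the "already captioned" check on the register rather than on Drive metadata also keeps the register authoritative, which is what the log step relies on.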

Surface 4 — YouTube upload with auto-captions

Many Workspace Education tenants upload Meet recordings to an institutional YouTube channel — either the district's or university's own channel, or a Google Workspace for Education YouTube organisation account — for distribution to students, parents, or the public. YouTube is also a common distribution path for corporate L&D teams on Workspace who use YouTube as their video host.

YouTube auto-captions run on every uploaded video. YouTube's auto-caption quality is in the same 80–90% band for conversational audio, with the same proper-noun failure modes. The auto-caption track is published by default with the video on public and unlisted YouTube uploads; viewers see the mangled output unless a corrected caption file is uploaded to replace it.

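The YouTube caption replacement workflow uploads a corrected caption file to the video, in YouTube Studio or through the YouTube Data API, so that viewers are served the corrected track instead of the auto-generated one.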
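
For scripted replacement, the YouTube Data API v3 exposes a `captions.insert` method. The sketch below assumes `youtube` is an authorised client from `googleapiclient.discovery.build("youtube", "v3", credentials=creds)` with a caption-management scope (for example `youtube.force-ssl`), and that the corrected VTT is on local disk. The track name and language defaults are placeholders. It relies on the discovery-based client accepting a local file path for `media_body` together with an explicit `media_mime_type`.

```python
def caption_snippet(video_id: str, language: str = "en", name: str = "English (corrected)") -> dict:
    """Request body for captions.insert; isDraft=False publishes the track."""
    return {"snippet": {"videoId": video_id, "language": language, "name": name, "isDraft": False}}

def upload_corrected_track(youtube, video_id: str, vtt_path: str, language: str = "en") -> str:
    """Upload a corrected caption file to a YouTube video and return the new caption track ID."""
    response = youtube.captions().insert(
        part="snippet",
        body=caption_snippet(video_id, language),
        media_body=vtt_path,          # local path to the glossary-corrected WebVTT file
        media_mime_type="text/vtt",   # set explicitly rather than relying on extension guessing
    ).execute()
    return response["id"]
```

The returned track ID is worth storing in the asset register next to the YouTube video ID, so a later audit can tie the published track to a specific caption file version.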

For university lecture-capture workflows that use YouTube as the distribution layer, the YouTube caption replacement step is where the compliance artefact is produced. The corrected, glossary-biased caption file uploaded to YouTube is what an Office for Civil Rights (OCR) investigator pulling the video link will see.

Google Workspace for Education institutional YouTube channels can also be set to require manual captions before a video goes public — an admin-level policy setting that prevents uncaptioned videos from being published to external audiences.

Surface 5 — Google Classroom embed and LMS distribution

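Meet recordings distributed through Google Classroom, whether posted as assignment materials, announcements, or stream items, are embedded from their Drive or YouTube source. The caption track that displays in Classroom playback is the caption track attached to the source: the caption file uploaded to the Drive video for Drive-hosted recordings, or the YouTube caption track (auto-generated unless a corrected file has been uploaded) for YouTube-hosted recordings.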

For K-12 tenants, the Classroom distribution path is where Section 504 individual accommodation obligations and ADA Title II web-content obligations converge. An IEP or 504 plan that documents hearing-related captioning needs for a specific student triggers an immediate obligation to ensure every Classroom video that student accesses has substantively accurate captions — not YouTube auto-captions, not a mangled Meet transcript overlay.

For non-Classroom LMS distribution — Meet recordings distributed through Canvas, Brightspace, Moodle, or Schoology — the caption track must be added to the Drive or YouTube source before the LMS link goes live. Canvas, Brightspace, and Moodle each have their own caption-track upload workflows for externally hosted video; the Workspace recording workflow feeds into those.


Proper-noun failure modes in Google Meet recording content

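The proper-noun failure modes in Meet content vary by Workspace tenant type: SDK and command-line terms in engineering teams, drug and procedure names in healthcare training, legal citations and financial-services regulatory acronyms in compliance training, and agency and program names in Workspace for Government tenants.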

In each case, the compounding-glossary property of GlossCap captioning means the institution builds the vocabulary once. Every Meet recording that goes through the glossary-biased pipeline — current and future — benefits from the accumulated corrections.

The Google Meet recordings retrofit pattern

For a Workspace tenant sitting on a Drive folder of Meet recordings without substantive caption tracks, the retrofit runs in five phases:

  1. Inventory. Use the Google Drive API to enumerate video files in the relevant folders or shared drives. Filter for MP4 files by MIME type (video/mp4); the MIME type alone does not identify Meet recordings, so narrow the set by location (Meet saves recordings to the organiser's Meet Recordings folder in My Drive) and, optionally, by the meeting title pattern in the file name. Most institutional tenants find that 20–40% of their Meet recordings have been shared into a training or instructional context: an LMS link, a Classroom assignment, a shared-drive training repository, or a YouTube channel. Those are the "promoted to training" set and the retrofit priority.
  2. Triage. Rank by instructional exposure and regulatory trigger: recordings with documented accommodation obligations are urgent, recordings embedded in an active Classroom assignment or LMS course come next, and recordings distributed to a student-accessible shared drive are high priority. Recordings no one has accessed in six months can be archived rather than re-captioned if there is no regulatory trigger. The triage cut typically removes 30–50% of the catalogue from retrofit scope.
  3. Caption production. For each triage-selected recording, produce a glossary-biased WebVTT caption file. The institutional glossary is built once — SDK names, drug formulary, standards vocabulary, government acronyms — and applies to every recording in the catalogue. New recordings entering the system inherit the glossary state.
  4. Upload. Upload the caption file to the Drive video via the Drive caption-attachment method, or to the corresponding YouTube video if the recording has been distributed through YouTube. For Classroom-embedded videos, verify that the caption track displays in Classroom playback after upload.
  5. Log. Maintain an asset register: Drive file ID, video title, caption file version, caption source (GlossCap, human vendor, manual edit), upload date, downstream distribution paths (Classroom assignment IDs, LMS course IDs, YouTube video IDs), reviewer name, review date. The asset register is the artefact that answers OCR / ADA coordinator / compliance-officer document requests.
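
Phases 2 and 5 can share one data structure. A minimal sketch, assuming a Python dataclass as the register row: the fields mirror the asset-register list in phase 5, and the tier numbers encode the triage order from phase 2. The field names, the 182-day approximation of "six months", and the choice to rank YouTube-distributed recordings alongside student-accessible shared drives are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional

ARCHIVE_AFTER = timedelta(days=182)  # "six months" from the triage step, approximated

@dataclass
class AssetRecord:
    # Register fields from phase 5 (illustrative names).
    drive_file_id: str
    title: str
    last_accessed: date
    caption_version: Optional[str] = None
    caption_source: Optional[str] = None   # e.g. "GlossCap", "human vendor", "manual edit"
    upload_date: Optional[date] = None
    classroom_assignment_ids: list[str] = field(default_factory=list)
    lms_course_ids: list[str] = field(default_factory=list)
    youtube_video_ids: list[str] = field(default_factory=list)
    reviewer: Optional[str] = None
    review_date: Optional[date] = None
    # Triage inputs, assumed to be tracked alongside the register.
    accommodation_obligation: bool = False
    student_shared_drive: bool = False

def triage_tier(rec: AssetRecord, today: date) -> Optional[int]:
    """Lower tier = caption sooner; None = archive candidate."""
    if rec.accommodation_obligation:
        return 0                      # documented accommodation: urgent
    if rec.classroom_assignment_ids or rec.lms_course_ids:
        return 1                      # active Classroom assignment or LMS course
    if rec.student_shared_drive or rec.youtube_video_ids:
        return 2                      # assumption: YouTube distribution ranks with shared drives
    if today - rec.last_accessed > ARCHIVE_AFTER:
        return None                   # untouched for six months, no regulatory trigger
    return 3

def retrofit_queue(records: list[AssetRecord], today: date) -> list[AssetRecord]:
    """Records in captioning order; archive candidates are dropped; ties keep input order."""
    tiered = [(triage_tier(r, today), r) for r in records]
    kept = [(tier, r) for tier, r in tiered if tier is not None]
    return [r for _, r in sorted(kept, key=lambda pair: pair[0])]
```

Because the same row later records the caption version, reviewer, and review date, the triage decision and the compliance evidence live in one place.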


Where glossary-biased captioning changes the math for Workspace tenants

The standard Workspace tenant retrofit cost calculus pits hand-corrected auto-transcripts against human captioning vendors. Hand-correction of a Meet transcript (re-typing the Google Transcript Doc into a properly timed caption file and correcting proper-noun errors throughout) takes two to three hours per recorded hour. Across a district or university catalogue of 500–3,000 recorded sessions at a $40-per-hour staff or contractor rate, that comes to roughly $40,000–$360,000 for one-hour sessions, and reaches the low seven figures only when a large catalogue averages multi-hour sessions. Human captioning at $1.25–$3.00 per minute of video produces similar or higher costs: roughly $37,500–$540,000 for the same one-hour-session catalogue.
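
The arithmetic behind these ranges, as a quick check. Sessions are assumed to average one recorded hour, which the text does not state; longer sessions scale every figure proportionally, and a 3,000-session catalogue needs roughly three-hour sessions before hand correction passes $1 million.

```python
def hand_correction_cost(sessions: int, hours_per_recorded_hour: float,
                         rate_per_hour: float = 40.0, session_hours: float = 1.0) -> float:
    """Staff time to re-type and correct transcripts, priced at an hourly rate."""
    return sessions * session_hours * hours_per_recorded_hour * rate_per_hour

def human_vendor_cost(sessions: int, rate_per_minute: float, session_hours: float = 1.0) -> float:
    """Per-minute human captioning price applied to the whole catalogue."""
    return sessions * session_hours * 60 * rate_per_minute

low = hand_correction_cost(500, 2)             # 40,000
high = hand_correction_cost(3_000, 3)          # 360,000
vendor_low = human_vendor_cost(500, 1.25)      # 37,500
vendor_high = human_vendor_cost(3_000, 3.00)   # 540,000
print(f"hand correction: ${low:,.0f} to ${high:,.0f}")
print(f"human vendor:    ${vendor_low:,.0f} to ${vendor_high:,.0f}")
```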

Glossary-biased captioning collapses both. The institution builds the glossary once. Each minute of video costs a fraction of human-vendor pricing. The accuracy on the proper-noun surface — the one that the generic auto-transcript and the human corrector both struggle with — is high enough that the human-review pass collapses from full correction to a quick scrub of the amber-highlighted glossary surface. For a 1,000-hour Meet recording catalogue retrofitted over a semester, the GlossCap math lands well under both alternatives. See the vendor pricing breakdown and the hidden caption-correction FTE cost analysis.

The high-leverage steady-state pattern: webhook on Meet-recording-completed → Drive API notification → glossary-biased caption production → Drive caption upload → Classroom / LMS caption-display verification → asset-register entry. New Meet recordings that become training content are captioned within hours of recording completion, not weeks.
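
The notification leg of that pattern could use the Drive API v3 push-notification and changes feed: `changes.watch` registers a webhook, and each notification is a cue to call `changes.list` from a saved page token. A sketch, assuming `changes` is a Drive v3 changes resource (`build("drive", "v3", credentials=creds).changes()`) and that the page token is persisted between notifications. Filtering on `video/mp4` is only an approximation of "new Meet recording", and the captioning, upload, and register steps that follow are left out.

```python
def new_video_ids(changes, page_token: str) -> tuple[list[str], str]:
    """Drain the Drive changes feed starting at page_token.

    Returns IDs of added or modified MP4 files, plus the token to save for the next notification.
    """
    video_ids: list[str] = []
    token = page_token
    while True:
        response = changes.list(
            pageToken=token,
            fields="nextPageToken, newStartPageToken, changes(fileId, removed, file(mimeType, trashed))",
            supportsAllDrives=True,
            includeItemsFromAllDrives=True,
        ).execute()
        for change in response.get("changes", []):
            file = change.get("file") or {}
            if change.get("removed") or file.get("trashed"):
                continue  # deleted or trashed files need no captions
            if file.get("mimeType") == "video/mp4":
                video_ids.append(change["fileId"])
        if "newStartPageToken" in response:
            # Last page: this token marks where the next notification should resume.
            return video_ids, response["newStartPageToken"]
        token = response["nextPageToken"]
```

Modified files (a rename, for example) also appear in the feed, so the downstream stage should check the asset register before re-captioning a file ID it has already processed.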

FAQ — Google Meet captions

Does Google Meet's live auto-caption satisfy ADA Title II (WCAG SC 1.2.2), Section 504, or the EAA?

Live auto-captions are an accommodation aid during the meeting, not a stored compliance artefact for the recording. For training-video compliance (SC 1.2.2 on prerecorded video), what matters is the caption track on the recording — the Drive video caption file or YouTube caption track — not the ephemeral live-caption overlay. The Drive video recording has no captions by default unless a caption file is uploaded separately. The YouTube upload has auto-generated captions, but those inherit the same 80–90% accuracy limitation with proper-noun mangling. SC 1.2.2's "accurately convey the audio" standard is not met by a track that mangles product names, drug names, government agency names, or SDK terms in ways that change the substantive meaning.

Does the meeting transcript Google Doc count as a caption file?

No. The Google Transcript Doc is a plain-text document, not a time-coded caption file (WebVTT or SRT). It cannot be directly attached to the Drive video as a caption track. A conversion step is required: the transcript text (with timing information, if available from the raw format) must be formatted into a valid VTT or SRT with accurate per-cue timecodes. The conversion step is the opportunity to apply glossary-biased correction to the proper nouns before the caption file is published.

My district uses Google Classroom. If I upload a caption file to the Drive video, does it show up in Classroom?

Yes, if the Classroom assignment or material links directly to the Drive video file. When the student opens the Drive video in Classroom, the Drive caption track is available via the CC button in the video player. The caption track is not automatically enabled for the student — they must click CC. For students with documented accommodation needs (IEP or 504 plan), you may additionally need to document that the caption track is present and accessible, not just that it was uploaded.

What about YouTube auto-captions on our institutional YouTube channel?

YouTube auto-generates a caption track on every uploaded video. That track has the same 80–90% accuracy band as Google's other STT surfaces, with the same proper-noun mangling. For public-facing institutional YouTube videos — posted to a district or university channel where students, parents, or the public can watch — the auto-caption track is what's displayed if no corrected track has been uploaded. For ADA Title II compliance (public university or public K-12 district), the auto-caption track on public-facing instructional video does not meet the substantive-accuracy bar. The corrected, glossary-biased caption file must be uploaded to replace the auto-generated track.

Can a CART captioner be used in Google Meet?

Yes. The organiser can designate one participant as a caption typist during the meeting. That participant's typed captions replace the auto-caption track in the live-caption overlay for all participants who have captions enabled. This is the supported CART pathway in Meet. The CART captioner's typed transcript is not automatically stored as a caption file on the recording — the recording still captures only the video, and the post-recording caption track must be uploaded separately.

Does FERPA affect how we share Meet recordings with a captioning vendor?

Potentially yes. If the Meet recording contains identifiable student speech, as many instructional recordings do, the recording may be an education record under FERPA. Before sharing it with an external captioning vendor, the institution must ensure that the vendor qualifies under FERPA's school-official exception: an outside party performing an institutional service, under the school's direct control with respect to the records, and bound by FERPA's limits on use and redisclosure, including a commitment not to use student information for secondary purposes. Most enterprise captioning vendors have FERPA-aware agreements; ask for the specific FERPA addendum.

How does Google Meet compare to Zoom and Webex for captioning?

The captioning surfaces are similar across all three platforms — live auto-captions during the meeting, a post-meeting transcript or recording-level caption workflow, and a post-upload caption replacement step. The differences are in the distribution ecosystem: Meet recordings land in Google Drive and can distribute through Google Classroom and YouTube, which is the natural path for Google Workspace Education tenants. Zoom recordings land in Zoom Cloud Recordings or local disk, with a transcript that exports as VTT. Webex adds the FedRAMP / HITRUST / ITAR tenant-compliance layer that Meet and Zoom commercial tenants don't have. For education tenants specifically, the Classroom distribution layer and the FERPA-compliance posture are the distinctive dimensions of the Meet workflow.
