Google Meet captions: Workspace tenant captioning, recordings, Drive video, and training compliance

Google Meet is the default meeting surface for the majority of Google Workspace tenants — Business Starter, Business Standard, Business Plus, Enterprise Standard, Enterprise Plus, Education Fundamentals, Education Standard, Teaching and Learning Upgrade, Education Plus, and Workspace for Government. For organisations on Google Workspace, Meet is the meeting tool, Google Drive is where recordings land, and Google Classroom or a third-party LMS is where training content gets distributed. Every link in that chain has a caption surface, and every caption surface that touches training video carries an ADA Title II, Section 508, Section 504, EAA, or AODA obligation attached to it at most institutional tenants. Google's built-in auto-captions are the starting point, not the finishing point, for training-video compliance.

TL;DR

A Google Meet captioning workflow spans five surfaces. (1) Live auto-captions: Google's real-time speech-to-text, displayed as a CC overlay during the meeting and not stored in the recording by default. (2) Meeting transcript: the Workspace recording-and-transcript feature, which saves a per-speaker text document to Drive alongside the video recording. (3) Google Drive recording with captions: the video recording in Drive, where captions are added as a separate caption track (SRT/VTT) or burned in at post-production. (4) YouTube upload with auto-captions: recordings uploaded to an institutional YouTube channel receive YouTube's auto-caption track, which inherits all of Google's STT limitations on technical content. (5) Google Classroom embed: recordings embedded in a Classroom assignment carry whatever caption track was added to the Drive or YouTube source. The failure mode is the same across all five surfaces: generic STT mangles technical proper nouns, and the mangled transcript is what gets stored, distributed, and discovered in a compliance audit.

Google Workspace tenant types and their caption implications

The Workspace edition defines which caption features are available and which compliance posture applies:

Workspace Business (Starter / Standard / Plus / Enterprise). Commercial tenants — the majority of 50–5,000-employee organisations running Google Workspace. Recordings land in the organiser's Google Drive. Transcript is available on Standard and above. Captioning vendor can typically access recordings via a standard data-processor agreement (DPA) under GDPR Article 28 and SOC 2 posture.
Workspace for Education (Fundamentals / Standard / Teaching and Learning Upgrade / Education Plus). K-12 districts and higher-ed institutions. FERPA-regulated tenant; the captioning vendor's data-handling must respect FERPA. Section 504, ADA Title II (for public K-12 and public universities), and Section 508 obligations attach to instructional video. Recordings that include identifiable student information are FERPA-covered; the captioning vendor must either qualify as a school official under the legitimate educational interest standard or be identified as a contractor with FERPA-appropriate controls.
Workspace for Government. FedRAMP-authorised tenant for US federal civilian and certain state/local government use. Captioning vendor must hold a DPA and, for certain data categories, match the FedRAMP posture. Fewer external captioning vendors are eligible than for commercial tenants.
Workspace Individual / Legacy G Suite. Older editions or individual plans; transcript and advanced captioning features may not be available.

The compliance regime attached to any specific Meet recording depends on the tenant type, the content of the recording, and how the recording is distributed — not just on which Workspace edition the organiser uses.

Surface 1 — Live auto-captions during the meeting

Google Meet generates live closed captions during the active meeting using Google's speech-to-text engine. The captions are displayed as a CC overlay in each participant's UI and can be turned on or off per participant. What they are not, by default, is stored. Live auto-captions are ephemeral — they disappear when the meeting ends, unless a meeting transcript has been enabled (Surface 2 below).

Substantive accuracy on conversational audio: 80–90%, consistent with all major generic STT systems. Substantive accuracy on technical content (engineering SDK and command-line terms, medical drug and procedure names, legal citations, financial-services regulatory acronyms, government agency and program names) drops materially. The proper-noun failure mode is the same in every domain: the engine replaces an unfamiliar term with phonetically plausible common words, and the rendering is wrong. PyTorch becomes "pie torch." tirzepatide becomes "tear-za-pa-tide." FERPA becomes "ferpa" or "fur-pa." TRADOC becomes "trade doc." The live-caption surface is an accommodation aid, not a compliance artefact, but the failure modes it exposes in real time are the same failure modes that will appear in the transcript and recording captions.

For live accommodation obligations — an attendee with a documented hearing-related accommodation need — the Meet CART pathway is the defensible route. The organiser designates a caption role; a professional CART captioner types live captions that replace the auto-caption track in real time for that meeting. This is the standard pathway for institutionally required CART services under Section 504 and ADA Title II.

Google Meet also supports third-party live-caption integration through an API endpoint. Large institutions use this to run dedicated CART vendors at scale for regular all-hands or recurring instructional sessions.

Surface 2 — Meeting transcript (Workspace recording + transcript)

On Workspace Business Standard and above, and on Education Standard and above, Workspace can automatically generate a meeting transcript alongside the video recording when a meeting is recorded. The transcript is a per-speaker text document (Google Doc) deposited in the organiser's Drive alongside the video file.

Characteristics relevant to training-video captioning:

Per-speaker labelled. The transcript identifies speakers by their Workspace display name when speaker detection is clean. In a multi-speaker meeting with close microphone levels, speaker attribution degrades.
Not a caption file. The transcript Google Doc is a plain-text document, not a WebVTT or SRT file. It cannot be imported directly as a caption track on the Drive video recording. A conversion step is required to produce a caption file from the transcript.
Same STT accuracy as live captions. The transcript inherits the same speech-to-text accuracy band. Proper-noun mangling in the live caption appears verbatim in the transcript.
Admin-enabled in Education tenants. Workspace for Education admins control whether transcript and recording features are available to staff. Many districts disable recording for privacy reasons; enabling it requires an explicit admin-policy decision.
FERPA implication. A transcript that includes student speech is an education record in many interpretations. The transcript's downstream handling must respect FERPA.

The transcript is the raw material for producing a caption file. The workflow: Google Transcript Doc → convert to WebVTT (either manually or via a conversion tool) → upload to the Drive video recording as a caption track. The conversion step is where glossary-biased correction happens — the mangled proper nouns are corrected in the conversion pass before the caption file is produced.
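The conversion pass can be sketched in a few lines of Python. This is a minimal illustration, not Google's or any vendor's implementation: it assumes the transcript has already been parsed into timed segments, and the `Segment` type, the sample glossary, and every function name are hypothetical. It shows the two things the conversion has to do, which are to emit WebVTT timecodes and to apply glossary corrections before the caption file is written.

```python
import re
from typing import Dict, List, NamedTuple


class Segment(NamedTuple):
    """One timed utterance (hypothetical shape; the Transcript Doc must be parsed into this first)."""
    start: float  # seconds from the start of the recording
    end: float
    speaker: str
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def apply_glossary(text: str, glossary: Dict[str, str]) -> str:
    """Replace each known mangled rendering with its canonical term (case-insensitive)."""
    for wrong, right in glossary.items():
        # Match the mangled phrase only when it is not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(wrong) + r"(?!\w)"
        text = re.sub(pattern, lambda m, r=right: r, text, flags=re.IGNORECASE)
    return text


def to_webvtt(segments: List[Segment], glossary: Dict[str, str]) -> str:
    """Build a WebVTT document with one cue per segment and a voice tag per speaker."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}")
        lines.append(f"<v {seg.speaker}>{apply_glossary(seg.text, glossary)}")
        lines.append("")
    return "\n".join(lines)


# Sample data for illustration only.
GLOSSARY = {"pie torch": "PyTorch", "trade doc": "TRADOC"}
SEGMENTS = [
    Segment(0.0, 3.5, "Instructor", "Today we install pie torch."),
    Segment(3.5, 7.25, "Instructor", "The trade doc guidance applies."),
]

vtt = to_webvtt(SEGMENTS, GLOSSARY)
print(vtt)
```

A lookup table like this only corrects renderings that have already been seen and recorded, so it is the simplest possible form of glossary correction; the timing data also has to come from somewhere, since a plain-text transcript on its own does not carry per-cue timecodes.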

Surface 3 — Google Drive recording with caption track

When a Meet is recorded, the video recording lands in the organiser's Google Drive as an MP4. Google Drive video playback supports caption tracks, uploaded as a separate SRT or VTT file associated with the Drive video file. The workflow:

Open the video file in Google Drive.
Click the three-dot menu → "Add captions."
Upload an SRT or WebVTT caption file. Drive accepts both.
The caption track is associated with the video and displays in Drive playback.

Google Drive video playback does not automatically generate a caption track from the video file's audio. Unlike YouTube, Drive does not run auto-captioning on the video file itself. If no caption file is manually uploaded, Drive video playback has no captions. This is an important distinction: a recording that sits in Drive without a manually uploaded caption file has no captions, regardless of whether the live meeting had auto-captions.

The Drive caption-upload workflow is the cleanest captioning path for Meet recordings that stay inside the Workspace ecosystem. The caption file is associated with the specific Drive video file; shared-drive access controls determine who can see it. For shared drives used as institutional training repositories, the caption file needs to be uploaded and verified before the video is shared with learners.

For a large Meet recording catalogue in Drive, the API pattern is the production-grade automation: enumerate Drive video files by folder or label → detect those without a caption track → send to glossary-biased captioning → upload the returned VTT via the Drive API's caption-attachment method → log the asset register entry.
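The first two steps of that pattern, enumeration and gap detection, might look like the sketch below. The query string follows the Drive API v3 `files.list` search syntax; everything else is an assumption made for illustration: the function names, the sample file list, and in particular the choice to track captioned file IDs in a local register rather than reading caption state back from Drive.

```python
from typing import Dict, Iterable, List, Set


def build_video_query(folder_id: str) -> str:
    """Build a `q` string for Drive API v3 files.list: non-trashed MP4s in one folder."""
    # Escape backslashes and single quotes inside the quoted value.
    safe_id = folder_id.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{safe_id}' in parents and mimeType = 'video/mp4' and trashed = false"


def files_needing_captions(
    files: Iterable[Dict[str, str]], captioned_ids: Set[str]
) -> List[Dict[str, str]]:
    """Return the files whose IDs are not yet in the (hypothetical) local caption register."""
    return [f for f in files if f["id"] not in captioned_ids]


# Sample data shaped like files.list results (id and name only); illustrative values.
listed = [
    {"id": "f1", "name": "Onboarding week 1 (2024-09-02)"},
    {"id": "f2", "name": "Onboarding week 2 (2024-09-09)"},
    {"id": "f3", "name": "Safety briefing (2024-09-12)"},
]
already_captioned = {"f2"}

query = build_video_query("abc123")
todo = files_needing_captions(listed, already_captioned)
print(query)
print([f["name"] for f in todo])
```

In a real run the query would be passed as the `q` parameter of `files.list`, with paging handled by the client library; the sketch stops short of any network call so that it stays runnable on its own.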

Surface 4 — YouTube upload with auto-captions

Many Workspace Education tenants upload Meet recordings to an institutional YouTube channel — either the district's or university's own channel, or a Google Workspace for Education YouTube organisation account — for distribution to students, parents, or the public. YouTube is also a common distribution path for corporate L&D teams on Workspace who use YouTube as their video host.

YouTube auto-captions run on every uploaded video. YouTube's auto-caption quality is in the same 80–90% band for conversational audio, with the same proper-noun failure modes. The auto-caption track is published by default with the video on public and unlisted YouTube uploads; viewers see the mangled output unless a corrected caption file is uploaded to replace it.

YouTube caption replacement workflow:

Open YouTube Studio for the uploaded video.
Navigate to "Subtitles" in the left rail.
The auto-generated English track appears. Click the three-dot menu → "Edit as file" to download the raw auto-caption file, or click "Upload file" to replace the auto-generated track entirely with a corrected SRT or VTT file.
The uploaded caption track replaces the auto-generated track immediately.

For university lecture capture workflows that use YouTube as the distribution layer, the YouTube caption replacement step is where the compliance artefact is produced. The corrected, glossary-biased caption file uploaded to YouTube is what an OCR investigator pulling the video link will see.

Google Workspace for Education institutional YouTube channels can also be set to require manual captions before a video goes public — an admin-level policy setting that prevents uncaptioned videos from being published to external audiences.

Surface 5 — Google Classroom embed and LMS distribution

Meet recordings distributed through Google Classroom — posted as assignment materials, announcements, or stream items — are embedded from their Drive or YouTube source. The caption track that displays in Classroom playback is the caption track attached to the source:

If the source is a Drive video file with a caption track uploaded, the Drive caption track displays in Classroom playback.
If the source is a YouTube video with a corrected caption track uploaded, the YouTube caption track displays in Classroom playback.
If neither source has a caption track, the embedded video plays without captions in Classroom. YouTube auto-captions may display in Classroom for YouTube-sourced videos, but the auto-caption quality is not substantively compliant.
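The three cases above reduce to a small decision function, which can be handy when auditing a list of Classroom materials. This is a sketch under stated assumptions: the `CaptionStatus` names and the two inputs (the source type and whether a corrected track was uploaded) are hypothetical bookkeeping, not values that Classroom itself reports.

```python
from enum import Enum


class CaptionStatus(Enum):
    CORRECTED_TRACK = "uploaded caption track displays"
    AUTO_ONLY = "YouTube auto-captions may display; not substantively compliant"
    NONE = "no captions"


def classroom_caption_status(source: str, has_uploaded_track: bool) -> CaptionStatus:
    """Map an embedded video's source and upload state to what a learner will see."""
    if source not in ("drive", "youtube"):
        raise ValueError(f"unknown source: {source!r}")
    if has_uploaded_track:
        return CaptionStatus.CORRECTED_TRACK
    # Without an uploaded track, only YouTube has anything to fall back on.
    if source == "youtube":
        return CaptionStatus.AUTO_ONLY
    return CaptionStatus.NONE


for src, uploaded in [("drive", True), ("youtube", False), ("drive", False)]:
    print(src, uploaded, "->", classroom_caption_status(src, uploaded).name)
```

Only the first status is a safe answer for a student with a documented captioning accommodation; the other two mark videos that need a caption file before they are assigned.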

For K-12 tenants, the Classroom distribution path is where Section 504 individual accommodation obligations and ADA Title II web-content obligations converge. An IEP or 504 plan that documents hearing-related captioning needs for a specific student triggers an immediate obligation to ensure every Classroom video that student accesses has substantively accurate captions — not YouTube auto-captions, not a mangled Meet transcript overlay.

For non-Classroom LMS distribution — Meet recordings distributed through Canvas, Brightspace, Moodle, or Schoology — the caption track must be added to the Drive or YouTube source before the LMS link goes live. Canvas, Brightspace, and Moodle each have their own caption-track upload workflows for externally hosted video; the Workspace recording workflow feeds into those.

Compliance regimes — which apply to Google Meet recording tenants

ADA Title II. State and local government Workspace tenants — state agencies, public universities, public K-12 districts — bound to WCAG 2.1 AA on web content and mobile apps post-2026-04-24. Meet recordings distributed on public-facing institutional pages or through student-accessible LMS must have substantively accurate captions. OCR investigations sample training video and lecture recordings posted on institutional learning platforms.
Section 508. Federal agency tenants (Workspace for Government and some commercial-Workspace federal contractors) bound to 36 CFR Part 1194 / WCAG 2.0 AA captioning on federal-program-related video.
Section 504. Any institution receiving federal financial assistance — public and private universities, K-12 districts, hospitals, non-profits — bound to programmatic accessibility. Captioning on instructional, program-relevant, and training video. When a student's IEP or 504 plan documents a captioning accommodation, the obligation is individual and immediate.
EAA. EU Workspace tenants in scope under the European Accessibility Act (since 2025-06-28). Training video distributed to employees or customers via Workspace and YouTube is subject to EN 301 549 clause 7.1.1–7.1.5 captioning requirements when the service falls within EAA product/service scope.
AODA. Ontario tenants bound to the Integrated Accessibility Standards Regulation (IASR § 14). Three-year compliance reporting cycle; next major large-organisation filing window 2026. Meet recordings used in employee training or public services must have substantively accurate captions.
FERPA. Education-tenant recordings that include student speech or identifiable student information. Captioning vendor is a school official or a contractor with FERPA-appropriate controls.

Proper-noun failure modes in Google Meet recording content

The proper-noun failure modes in Meet content vary by Workspace tenant type:

Engineering and SaaS tenants (Business / Enterprise). SDK and framework names (PyTorch, Helm, kubectl, Terraform, Pulumi, Bazel, gRPC, Protobuf, GraphQL, Istio, Argo, Flux, Spanner, BigQuery, Pub/Sub, Firestore, GKE), cloud-provider service names, internal product and service names that don't appear in training data, competitor names. The engineering-onboarding content pattern detailed in the engineering onboarding captions reference applies in full.
Healthcare tenants. Drug INNs (tirzepatide, semaglutide, apixaban, rivaroxaban, metformin, lisinopril, atorvastatin), procedure names (TAVR, PCI, CRRT, ECMO, laparoscopic cholecystectomy), diagnostic code prefixes (ICD-10 E-, F-, G-, H-, I-series), anatomy (myocardium, duodenum, sciatic nerve, renal artery), provider specialties (endocrinologist, rheumatologist, otolaryngologist). Detailed in medical training captions and HIPAA training captions.
Education tenants (higher-ed). Discipline-specific vocabulary varies by department. The Canvas LMS captions reference catalogues higher-ed proper-noun failure modes by discipline.
Education tenants (K-12). State and national standards vocabulary (Common Core State Standards, Next Generation Science Standards, TEKS, NGSS, CCSS), curriculum-series and textbook titles (Everyday Math, Saxon Math, Wonders ELA, Amplify Science), state assessment names (STAAR, SBAC, PARCC, NAEP, ACCESS), intervention program names, student information system names (PowerSchool, Infinite Campus, Skyward, Aeries, Illuminate Education, SchoolMint).
Government tenants. Federal-program acronyms (CMS, OPM, OCR, GSA, DLA, DCSA, DISA, NIH, NOAA, FERC, FCC, EPA, HUD, OMB), regulatory citations (CFR title/part/section, USC, FAR/DFARS), sub-agency and office names.

In each case, the compounding-glossary property of GlossCap captioning means the institution builds the vocabulary once. Every Meet recording that goes through the glossary-biased pipeline — current and future — benefits from the accumulated corrections.

The Google Meet recordings retrofit pattern

For a Workspace tenant sitting on a Drive folder of Meet recordings without substantive caption tracks, the retrofit runs in five phases:

Inventory. Use the Google Drive API to enumerate video files in the relevant folders or shared drives. Filter for MP4 files created via Meet by checking the MIME type (video/mp4) and, optionally, the meeting title pattern. Most institutional tenants find that 20–40% of their Meet recordings have been shared into a training or instructional context — LMS link, Classroom assignment, shared-drive training repository, or a YouTube channel. Those are the "promoted to training" set and the retrofit priority.
Triage. Rank by instructional exposure: recordings with documented accommodation obligations are urgent and go first, recordings embedded in an active Classroom assignment or LMS course come next, and recordings distributed to a student-accessible shared drive follow. Recordings no one has accessed in six months can be archived rather than re-captioned if there is no regulatory trigger. The triage cut typically removes 30–50% of the catalogue from retrofit scope.
Caption production. For each triage-selected recording, produce a glossary-biased WebVTT caption file. The institutional glossary is built once — SDK names, drug formulary, standards vocabulary, government acronyms — and applies to every recording in the catalogue. New recordings entering the system inherit the glossary state.
Upload. Upload the caption file to the Drive video via the Drive caption-attachment method, or to the corresponding YouTube video if the recording has been distributed through YouTube. For Classroom-embedded videos, verify that the caption track displays in Classroom playback after upload.
Log. Maintain an asset register: Drive file ID, video title, caption file version, caption source (GlossCap, human vendor, manual edit), upload date, downstream distribution paths (Classroom assignment IDs, LMS course IDs, YouTube video IDs), reviewer name, review date. The asset register is the artefact that answers OCR / ADA coordinator / compliance-officer document requests.
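The triage and logging phases lend themselves to a small data model. The sketch below is one possible shape, not a prescribed schema: the class and field names are hypothetical, the ranking treats a documented accommodation obligation as the top priority, and "no regulatory trigger" is approximated as "none of the three exposure flags is set", which is an assumption a real institution would want to refine.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class Recording:
    file_id: str
    title: str
    last_accessed: date
    has_accommodation_obligation: bool = False
    in_active_course: bool = False          # active Classroom assignment or LMS course
    on_student_shared_drive: bool = False


@dataclass
class AssetRegisterEntry:
    drive_file_id: str
    video_title: str
    caption_file_version: int
    caption_source: str                     # e.g. "GlossCap", "human vendor", "manual edit"
    upload_date: date
    distribution_paths: List[str] = field(default_factory=list)
    reviewer_name: Optional[str] = None
    review_date: Optional[date] = None


def triage_rank(rec: Recording) -> int:
    """Lower rank means caption sooner; 3 means no exposure flag is set."""
    if rec.has_accommodation_obligation:
        return 0
    if rec.in_active_course:
        return 1
    if rec.on_student_shared_drive:
        return 2
    return 3


def triage(
    recordings: List[Recording], today: date, stale_after_days: int = 183
) -> Tuple[List[Recording], List[Recording]]:
    """Split recordings into (retrofit queue in priority order, archive candidates)."""
    cutoff = today - timedelta(days=stale_after_days)
    queue: List[Recording] = []
    archive: List[Recording] = []
    for rec in recordings:
        # Archive only stale recordings with no exposure flag (assumed: no regulatory trigger).
        if rec.last_accessed < cutoff and triage_rank(rec) == 3:
            archive.append(rec)
        else:
            queue.append(rec)
    queue.sort(key=triage_rank)
    return queue, archive


today = date(2025, 6, 1)
catalogue = [
    Recording("a", "Chem 101 lecture 4", date(2025, 5, 20), in_active_course=True),
    Recording("b", "Old staff meeting", date(2024, 1, 10)),
    Recording("c", "Biology lab intro", date(2024, 2, 1), has_accommodation_obligation=True),
    Recording("d", "Library orientation", date(2025, 5, 1), on_student_shared_drive=True),
]
queue, archive = triage(catalogue, today)
entry = AssetRegisterEntry(queue[0].file_id, queue[0].title, 1, "manual edit", today)
print([r.file_id for r in queue], [r.file_id for r in archive])
```

Note that recording "c" stays in the queue even though nobody has opened it for more than six months: the accommodation flag outranks staleness, which matches the rule that archiving is only an option when there is no regulatory trigger.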

Where glossary-biased captioning changes the math for Workspace tenants

The standard Workspace tenant retrofit cost calculus pits hand-corrected auto-transcripts against human captioning vendors. Hand-correction of a Meet transcript (re-typing the Google Transcript Doc into a properly timed caption file, correcting proper-noun errors throughout) takes two to three hours per recorded hour. Multiplied by a district or university catalogue of 500–3,000 recorded sessions and a $40-per-hour staff or contractor rate, that comes to $40,000–$360,000 at one recorded hour per session, and reaches low seven figures when sessions run to several hours. Human captioning at $1.25–$3.00 per minute of video produces similar or higher costs.
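To make the arithmetic concrete, here is a short worked calculation. The session length is an assumption: the text gives session counts rather than durations, so the sketch uses one recorded hour per session, and longer sessions scale every total proportionally. The function names are illustrative.

```python
def hand_correction_cost(
    sessions: int,
    hours_per_session: float,
    correction_hours_per_recorded_hour: float,
    hourly_rate: float,
) -> float:
    """Staff or contractor cost of hand-correcting every session's transcript."""
    recorded_hours = sessions * hours_per_session
    return recorded_hours * correction_hours_per_recorded_hour * hourly_rate


def vendor_cost(sessions: int, hours_per_session: float, rate_per_minute: float) -> float:
    """Human captioning vendor cost, billed per minute of video."""
    return sessions * hours_per_session * 60 * rate_per_minute


HOURS_PER_SESSION = 1.0  # assumption, not a figure from the text

low_hand = hand_correction_cost(500, HOURS_PER_SESSION, 2, 40)
high_hand = hand_correction_cost(3000, HOURS_PER_SESSION, 3, 40)
low_vendor = vendor_cost(500, HOURS_PER_SESSION, 1.25)
high_vendor = vendor_cost(3000, HOURS_PER_SESSION, 3.00)

print(f"Hand correction: ${low_hand:,.0f} to ${high_hand:,.0f}")
print(f"Human vendor:    ${low_vendor:,.0f} to ${high_vendor:,.0f}")
```

Under the one-hour assumption, hand correction runs from $40,000 to $360,000 and vendor captioning from $37,500 to $540,000; at three recorded hours per session the top of the hand-correction range passes $1 million.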

Glossary-biased captioning collapses both. The institution builds the glossary once. Each minute of video costs a fraction of human-vendor pricing. The accuracy on the proper-noun surface — the one that the generic auto-transcript and the human corrector both struggle with — is high enough that the human-review pass collapses from full correction to a quick scrub of the amber-highlighted glossary surface. For a 1,000-hour Meet recording catalogue retrofitted over a semester, the GlossCap math lands well under both alternatives. See the vendor pricing breakdown and the hidden caption-correction FTE cost analysis.

The high-leverage steady-state pattern: webhook on Meet-recording-completed → Drive API notification → glossary-biased caption production → Drive caption upload → Classroom / LMS caption-display verification → asset-register entry. New Meet recordings that become training content are captioned within hours of recording completion, not weeks.
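One way to structure that steady-state handler is to pass each stage in as a function, so the orchestration can be tested without touching Drive, Classroom, or a captioning service. The sketch below is illustrative only: `handle_recording_completed` and all four stage callables are hypothetical names, and the stubs stand in for real integrations.

```python
from typing import Callable, Dict, List


def handle_recording_completed(
    file_id: str,
    produce_captions: Callable[[str], str],
    upload_captions: Callable[[str, str], None],
    verify_display: Callable[[str], bool],
    register: List[Dict[str, object]],
) -> Dict[str, object]:
    """Run one new recording through the caption pipeline and log the result."""
    vtt = produce_captions(file_id)              # glossary-biased caption production
    if not vtt.startswith("WEBVTT"):
        raise ValueError(f"caption output for {file_id} is not WebVTT")
    upload_captions(file_id, vtt)                # caption upload to the video source
    entry: Dict[str, object] = {
        "drive_file_id": file_id,
        "caption_chars": len(vtt),
        "display_verified": verify_display(file_id),  # Classroom / LMS display check
    }
    register.append(entry)                       # asset-register entry
    return entry


# Stub stages so the sketch runs without any network access.
uploaded: Dict[str, str] = {}
asset_register: List[Dict[str, object]] = []

result = handle_recording_completed(
    "file-123",
    produce_captions=lambda fid: "WEBVTT\n\n00:00:00.000 --> 00:00:02.000\nWelcome.\n",
    upload_captions=lambda fid, vtt: uploaded.__setitem__(fid, vtt),
    verify_display=lambda fid: fid in uploaded,
    register=asset_register,
)
print(result)
```

Keeping the stages injectable also makes it straightforward to swap the caption source (automated, human vendor, manual edit) per recording while the logging step stays the same.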

FAQ — Google Meet captions

Does Google Meet's live auto-caption clear WCAG SC 1.2.2 for ADA Title II / Section 504 / EAA purposes?

Live auto-captions are an accommodation aid during the meeting, not a stored compliance artefact for the recording. For training-video compliance (SC 1.2.2 on prerecorded video), what matters is the caption track on the recording — the Drive video caption file or YouTube caption track — not the ephemeral live-caption overlay. The Drive video recording has no captions by default unless a caption file is uploaded separately. The YouTube upload has auto-generated captions, but those inherit the same 80–90% accuracy limitation with proper-noun mangling. SC 1.2.2's "accurately convey the audio" standard is not met by a track that mangles product names, drug names, government agency names, or SDK terms in ways that change the substantive meaning.

Does the meeting transcript Google Doc count as a caption file?

No. The Google Transcript Doc is a plain-text document, not a time-coded caption file (WebVTT or SRT). It cannot be directly attached to the Drive video as a caption track. A conversion step is required: the transcript text (with timing information, if available from the raw format) must be formatted into a valid VTT or SRT with accurate per-cue timecodes. The conversion step is the opportunity to apply glossary-biased correction to the proper nouns before the caption file is published.

My district uses Google Classroom. If I upload a caption file to the Drive video, does it show up in Classroom?

Yes, if the Classroom assignment or material links directly to the Drive video file. When the student opens the Drive video in Classroom, the Drive caption track is available via the CC button in the video player. The caption track is not automatically enabled for the student — they must click CC. For students with documented accommodation needs (IEP or 504 plan), you may additionally need to document that the caption track is present and accessible, not just that it was uploaded.

What about YouTube auto-captions on our institutional YouTube channel?

YouTube auto-generates a caption track on every uploaded video. That track has the same 80–90% accuracy band as Google's other STT surfaces, with the same proper-noun mangling. For public-facing institutional YouTube videos — posted to a district or university channel where students, parents, or the public can watch — the auto-caption track is what's displayed if no corrected track has been uploaded. For ADA Title II compliance (public university or public K-12 district), the auto-caption track on public-facing instructional video does not meet the substantive-accuracy bar. The corrected, glossary-biased caption file must be uploaded to replace the auto-generated track.

Can a CART captioner be used in Google Meet?

Yes. The organiser can designate one participant as a caption typist during the meeting. That participant's typed captions replace the auto-caption track in the live-caption overlay for all participants who have captions enabled. This is the supported CART pathway in Meet. The CART captioner's typed transcript is not automatically stored as a caption file on the recording — the recording still captures only the video, and the post-recording caption track must be uploaded separately.

Does FERPA affect how we share Meet recordings with a captioning vendor?

Potentially yes. If the Meet recording contains identifiable student speech — which many instructional recordings do — the recording is an education record under FERPA. Before sharing with an external captioning vendor, the institution must ensure that the vendor is either designated as a school official (with a direct-services agreement) or qualifies as a contractor providing services the school directs, with FERPA-appropriate controls on data handling and a commitment not to use student information for secondary purposes. Most enterprise captioning vendors have FERPA-aware agreements; ask for the specific FERPA addendum.

How does Google Meet compare to Zoom and Webex for captioning?

The captioning surfaces are similar across all three platforms — live auto-captions during the meeting, a post-meeting transcript or recording-level caption workflow, and a post-upload caption replacement step. The differences are in the distribution ecosystem: Meet recordings land in Google Drive and can distribute through Google Classroom and YouTube, which is the natural path for Google Workspace Education tenants. Zoom recordings land in Zoom Cloud Recordings or local disk, with a transcript that exports as VTT. Webex adds the FedRAMP / HITRUST / ITAR tenant-compliance layer that Meet and Zoom commercial tenants don't have. For education tenants specifically, the Classroom distribution layer and the FERPA-compliance posture are the distinctive dimensions of the Meet workflow.