Employee Communications Operations · Published 2026-06-13
Captioning employee communications video: all-hands meetings, town halls, exec recordings, and the ADA Title I gap most L&D teams don't audit
Employee communications video occupies a compliance blind spot that L&D caption programmes consistently miss. Training videos get captioned: compliance modules, onboarding video, product certification content. The videos that don't get captioned — and that collectively represent a substantial portion of the video library that employees at any given organization are expected to engage with — are the communications category: the quarterly all-hands recording, the town-hall session posted to the intranet the morning after the event, the CEO video message about the new strategic direction, the CFO recording walking through the annual results, the VP of People video explaining the new benefits package, the policy announcement video distributed before the open enrollment window. The production chain for this content is corporate communications or People/HR, not L&D. Corporate communications does not have a captioning workflow. People/HR does not have a captioning vendor. IT manages the platforms (Zoom, Microsoft Teams, Vimeo) that record and host this content, but IT is not responsible for the accessibility compliance status of what those platforms store. The result: a large, growing, and almost entirely uncaptioned archive of employee-facing video content that sits outside the L&D caption programme's scope — and outside anyone's direct compliance ownership. The compliance obligation for these videos does not sit outside ADA Title I.
ADA Title I requires employers with fifteen or more employees to provide effective communication to qualified individuals with disabilities across all employment contexts, not just formal training. The statutory language and the EEOC guidance that interprets it do not carve out "informal" or "non-training" employer video communications from the effective communication obligation. A quarterly all-hands recording that all employees are expected to watch describes the organization's direction, its priorities, and its personnel changes — it is employment communication. A CEO video message announcing a reorganization that will affect reporting structures, job scopes, and office locations is employment communication. A benefits open-enrollment video that employees must watch to make informed decisions about their healthcare coverage is employment communication. A policy announcement video that describes changes to the code of conduct, expense reporting procedures, or remote-work eligibility criteria is employment communication. Employees with hearing disabilities are entitled to effective access to all of these, not just to the content that the L&D team formally designated as "training." ADA Title II enforcement has focused significant attention on training video in public-sector organizations; Title I enforcement — which covers private employers — has been expanding its scope to employer video communications as video becomes the dominant medium for internal company announcements at hybrid and remote organizations. The trend of organizational communication moving from email and PDF to video makes the employee-communications caption gap larger every quarter.
The structural reason this gap persists is ownership fragmentation. In most organizations, three teams have partial ownership of some part of the employee communications video stack, and none of them owns the caption compliance problem. L&D owns the caption compliance programme but defines its scope as training content; corporate communications or People/HR owns the content production but has no captioning infrastructure; IT owns the platforms but treats caption accuracy as a content question, not a platform question. Each team has a reasonable internal logic for why the problem belongs to a different team. The L&D team says: "We don't own this content, we didn't produce it, it's not in our LMS." Corporate communications says: "We use Zoom and Teams auto-captions — isn't that what the platforms are for?" IT says: "Zoom and Teams have built-in auto-captions — the platform handles accessibility." None of these positions is irrational given each team's internal scope definition. But the combined result is that employees with hearing disabilities cannot reliably access the organization's primary vehicle for communicating strategy, culture, and operational changes. The gap is not a technology problem — Zoom, Teams, Vimeo, and Wistia all support caption file upload. It is an ownership and workflow problem.
This post covers the entire operational scope of employee communications video captioning: why the content category is structurally different from training video, what the ADA Title I compliance obligation actually covers and where the enforcement risk concentrates, the live versus recorded distinction that determines which platform capability you need and when, platform-by-platform caption workflows for Zoom, Microsoft Teams, Vimeo, Wistia, and Loom, the glossary architecture challenge for content whose vocabulary changes with every quarterly cycle, how to build a caption programme that spans L&D, corporate communications, and HR without requiring each team to become caption experts, the eight failure modes that generate compliance exposure in employee communications video programmes, and a seven-question FAQ on the operational decisions that come up most often when organizations try to close this gap. The remote and hybrid async video captioning post covers the home-office audio problem and distributed production model for training content; this post focuses specifically on what is different when the content is organizational communication rather than training and the producer is corporate communications or HR rather than L&D. The live captions versus recorded video accuracy post covers the accuracy implications of live session captioning in detail; this post applies that framework to the specific context of all-hands meetings and town halls where a live event produces a recording that needs to be captioned separately.
TL;DR — three things that matter about employee communications video caption compliance
- ADA Title I covers all employer communications to employees, not just formal training video. All-hands recordings, town hall sessions, executive video messages, benefits explanation videos, and policy announcement videos are employer communications under ADA Title I. Employees with hearing disabilities are entitled to effective access to these communications regardless of whether the L&D team produced them or owns the caption compliance programme. The production origin (corporate comms, HR, executive team) does not change the compliance obligation. The scope of employee communications video that requires captions is substantially larger than the scope most L&D caption programmes currently cover.
- Live platform auto-captions are not a substitute for a validated caption track on the recording. Zoom, Teams, and similar platforms generate real-time captions during live all-hands events and town halls. These live captions achieve 80–88% accuracy on general executive speech and substantially lower accuracy on product names, financial terminology, and organizational vocabulary. They do not carry over to the recording as a validated caption track: the live caption display ends with the session, and any auto-generated transcript the platform saves has the same accuracy limits until it is corrected. The recording requires a separate, validated SRT or VTT file that meets the 99% accuracy benchmark commonly applied to prerecorded captions under WCAG SC 1.2.2. Many organizations mistakenly believe that because they enabled live captions for the event, the recording is captioned. It is not.
- The ownership gap between L&D, corporate communications, and HR is the structural problem — and it requires an explicit programme design, not a technology solution. No platform update will fix the fact that corporate comms produces video and L&D owns caption compliance. A workable programme requires naming an owner for employee communications video captions, defining the content scope, establishing a review workflow that the content producers can actually execute, and building a glossary that corporate comms or HR can update without requiring L&D involvement. The technology — Zoom, Teams, Vimeo, GlossCap — is implementation detail. The programme design is the hard part.
Why employee communications video is different from training video
The L&D caption programme is designed around a specific production model: a professional or near-professional producer creates structured content with a defined audience and a known compliance obligation, uploads it to an LMS, and the caption workflow is triggered at the upload step. This model works well for the content it was designed for. Employee communications video breaks every structural assumption of this model.
Production origin and ownership
Training video is typically produced by L&D professionals or subject-matter experts working with L&D. The L&D team commissions the content, reviews the script, manages the recording, uploads to the LMS, and owns the content from production through learner completion. Caption compliance is part of the L&D workflow because the L&D team owns the entire pipeline. Employee communications video is produced by whoever owns the message: the CEO's executive assistant schedules and records the quarterly all-hands; corporate communications manages the town hall Zoom setup and posts the recording to the company intranet; the VP of People records the benefits explanation video directly on Loom and emails the link to all staff. L&D has no role in any of these production chains and no visibility into the content until after it is distributed. The caption obligation attaches to the communication itself, whoever produced it, but the caption programme is structured around the production workflow. The gap is structural.
Vocabulary and vocabulary change rate
Training video vocabulary is relatively stable. A product training module from last quarter is built on product terminology that changes only when the product changes: quarterly at most for most products, and more slowly for compliance and operational content. The L&D team's glossary can be updated on a quarterly cadence and remain accurate for the training library. Employee communications vocabulary changes at the speed of the business: each quarterly earnings call introduces new financial metrics, new segment definitions, and new analyst-relations terminology that corporate communications uses for the next three months and then replaces in the following quarter's messaging. Each product launch introduces new product names, feature names, and positioning language that appear in the all-hands before they appear in any training content. Organizational restructurings introduce new leadership names, new team names, and new titles. Benefits open enrollment introduces insurance plan names, provider network names, and benefits platform names that are entirely different from the L&D training glossary. A glossary that is calibrated for the training library and refreshed once a quarter will systematically lag this content, because each communications cycle introduces new terms before the next scheduled glossary update.
Content format and structure
Training video has predictable structure: a defined topic, a clear learning objective, a scripted or near-scripted delivery, a consistent speaker in a consistent recording environment. The structure makes training video relatively amenable to ASR accuracy optimization: the vocabulary is bounded, the speaker is consistent, and the recording environment is controlled. Employee communications video is structurally the opposite: an all-hands is a live event with multiple speakers, Q&A, interruptions, cross-talk, and the acoustic environment of a conference room or auditorium rather than a recording studio. A town hall often includes employees asking questions in a moderately noisy environment, sometimes with accents on which the ASR model's baseline accuracy is lower. An executive recording from a home office has home-office audio quality, with the same audio degradation as any other remote recording, compounded by the executive's tendency to speak at a faster pace and with more domain-specific vocabulary than a trained L&D presenter. The content format of employee communications video makes it harder to caption accurately than training video, not easier.
Distribution and consumption model
Training video is consumed through an LMS with a defined enrollment, completion tracking, and access control. L&D knows who has access to each training video, can control caption settings per video, and has an audit trail. Employee communications video is distributed through channels that L&D typically doesn't control: the company intranet, a Slack channel message, an email link, a Vimeo or Wistia embed on a People/HR page, a Teams channel post. Distribution is broad, uncontrolled from a caption perspective, and often immediate — the all-hands recording is posted to the intranet two hours after the event ends, before any caption validation has occurred. The audience is the entire company, not a defined learner group. The consumption context is informal — employees watch at their desks, on mobile, in the background — which increases the functional value of captions for all employees, not just those with hearing disabilities. The caption programme that works for a structured LMS delivery model needs to be adapted substantially for the informal distribution model of employee communications content.
The scale of uncaptioned content in most organizations
Most organizations that have been running a caption programme for training content for twelve to twenty-four months have a well-captioned training library. The same organizations typically have two to four years of uncaptioned all-hands recordings, town halls, and executive video messages sitting in Zoom cloud recording storage, the company Vimeo account, a SharePoint site, or the corporate intranet. The aggregate volume is often larger than the training library: a company that holds monthly all-hands (twelve events per year), quarterly town halls with the full executive team (four per year), and periodic video messages from the CEO and leadership team (eight to twelve per year) produces thirty to thirty-five hours of uncaptioned employee communications video annually. Over three years, that is 90 to 105 hours of material, comparable to a mid-sized training library. The LMS audit methodology post covers how to triage and remediate a compliance backlog; those same prioritization principles apply to an employee communications video archive.
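To size your own backlog, the arithmetic above is easy to rerun. The sketch below is a minimal illustration that takes the annual range quoted in this section (30 to 35 hours) as given and scales it across the two-to-four-year backlog window; the function name `backlog_hours` is hypothetical, and you would substitute your own annual figure.

```python
from __future__ import annotations

# Annual volume of uncaptioned employee communications video, in hours.
# These two figures are the range quoted in this section; replace them
# with your own estimate.
ANNUAL_HOURS_LOW = 30
ANNUAL_HOURS_HIGH = 35


def backlog_hours(years: int) -> tuple[int, int]:
    """Return the (low, high) backlog in hours after `years` of recordings."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return ANNUAL_HOURS_LOW * years, ANNUAL_HOURS_HIGH * years


if __name__ == "__main__":
    for years in (2, 3, 4):
        low, high = backlog_hours(years)
        print(f"{years} years of recordings: {low} to {high} hours")
```

Running the range for two, three, and four years gives a quick bracket for the remediation estimate before any detailed audit.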
The ADA Title I compliance gap
ADA Title I prohibits covered employers (fifteen or more employees) from discriminating against qualified individuals with disabilities in all terms, conditions, and privileges of employment. The effective communication obligation — which is the specific obligation at issue for employee communications video — requires that employers ensure employees with disabilities can access the same information and participate in the same employment activities as employees without disabilities, through any reasonable accommodation or modification that does not create an undue hardship. For video content that all employees are expected to watch, synchronized captions are the standard effective communication accommodation — they are technically feasible, do not require the hearing-disabled employee to identify themselves or request a separate accommodation, and provide equivalent access at the moment the content is distributed.
What "employer communications" means under ADA Title I
The scope of employer communications under Title I extends beyond content that would be classified as "training" under a narrow definition. EEOC guidance and Title I case law have addressed effective communication obligations across a broad range of employment contexts: staff meetings, policy announcements, disciplinary proceedings, benefits explanations, and general workplace communications. The shift from in-person to video-mediated delivery of these communications over the past five years has not changed the legal framework — the obligation to provide effective communication transfers to the medium the organization uses to communicate, which is now frequently video. The specific content categories that generate the clearest Title I effective communication obligation for video are:
- All-hands meetings and town halls: These communicate organizational direction, performance, and strategy to all employees. They are the primary vehicle for senior leadership to explain decisions that affect employment. An employee with hearing loss who cannot access the recorded version of an all-hands announcing a restructuring that includes their own team has not received effective employment communication.
- Executive video messages: CEO updates on financial results, new strategic initiatives, leadership changes, and organizational changes are employer communications under Title I even when delivered as a video message rather than a live meeting. The format (pre-recorded video versus email) does not change the communication's character as employer communication about terms and conditions of employment.
- Benefits and HR communications: Open enrollment explanation videos, 401(k) plan overview recordings, healthcare benefits walkthroughs, and policy change explanation videos are employer communications about employee benefits — a specific and significant term of employment. The DOL, EEOC, and healthcare regulators have all issued guidance that benefits communications must be accessible to employees with disabilities.
- Policy announcement videos: Changes to the code of conduct, expense reimbursement procedures, remote-work eligibility, performance management processes, and other operational policies distributed via video to all employees are employer communications under Title I. An employee who cannot access a policy change announcement video because it lacks captions is in a documentably inequitable information position relative to their colleagues.
- Emergency and crisis communications: Safety announcements, emergency procedure videos, and crisis communications that are distributed to employees have both an ADA Title I accessibility obligation and a separate safety and emergency-communications duty-of-care obligation. These are the highest-priority content for caption compliance given the combination of urgency and consequence.
Where enforcement risk concentrates
Title I ADA enforcement for employee communications video is not yet at the level of enforcement for Section 508 (federal contractors) or ADA Title II (public entities), but the trajectory is upward for three reasons. First, remote and hybrid work has made video the primary medium for organizational communication at many employers, dramatically increasing the volume of video content that employees are expected to engage with and creating a documented record of which employees could and couldn't effectively access that content. Second, remote work has increased the employment of deaf and hard-of-hearing employees by removing location barriers to employment — many jobs that previously required office presence can now be done remotely, expanding the pool of employees who have a documented need for captioned video content in their day-to-day work. Third, EEOC charge data and DOJ investigation patterns have increasingly included video accessibility as a component of effective communication investigations, particularly in cases where the organization has no systematic caption programme for employee communications video.
The enforcement pattern that generates the most organizational risk: an employee with hearing loss files an EEOC charge or internal complaint noting that they cannot access the organization's all-hands recordings, town-hall sessions, or benefits explanation videos. The employer produces the LMS caption programme as evidence of its accessibility commitment. The EEOC investigator notes that the employee communications video library — which includes the quarterly all-hands, the annual benefits enrollment video, and two leadership communications about the restructuring that affected the employee's team — has no captions. The employer's defence that "we have a training video caption programme" does not extend to the specific communications the employee needed to access. The investigation findings document systematic absence of captions in a category of employer communication that the employee demonstrably needed to access. The outcome: remediation order plus reputational exposure, potentially including a consent agreement requiring systematic captioning of employee communications going forward. The remediation cost is substantially higher after an investigation than before — it involves not just captioning new content but auditing and retroactively captioning the backlog that the investigation has now made directly relevant.
The "we have live captions so we covered it" exposure point
A specific and increasingly common compliance risk: organizations that enabled Zoom or Teams live captions for their all-hands events believe they have addressed the ADA Title I obligation. They have not, for two reasons. First, live auto-captions from Zoom and Teams achieve 80–88% accuracy on general executive speech and 65–78% accuracy on content with significant organizational vocabulary density (product names, financial terminology, leader names and titles). Both ranges fall below the 99% accuracy benchmark used for effective communication in a video context. An employee with hearing loss who watches the live event with 80% caption accuracy is receiving approximately one word in five incorrectly, which is a degraded, not equivalent, communication experience. Second, and more practically, the live captions exist only during the live event. The recording, which is what employees who missed the event, employees in different time zones, and employees who want to review specific content will actually watch, does not automatically carry the live captions. Zoom live captions are session-state data that is discarded at the end of the session; the cloud recording does not inherit them. Teams meeting transcription (via the "Transcribe" feature) generates a separate VTT file that can be added to the recording, but this is not automatic, has the same 80–88% accuracy as the live caption, and is frequently not taken through the correction-and-upload workflow needed to produce a 99% accuracy track. The recording is what creates the persistent compliance exposure, and the recording is typically not captioned when the live-caption-only approach is used.
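To make the accuracy gap concrete, it helps to convert an accuracy percentage into misrecognized words per minute. The sketch below does that under two assumptions that are not from this post: a speaking rate of 150 words per minute, and accuracy treated as a simple word-level rate.

```python
# Assumed speaking rate for illustration only; measure your own speakers.
ASSUMED_WORDS_PER_MINUTE = 150


def errors_per_minute(accuracy: float,
                      words_per_minute: int = ASSUMED_WORDS_PER_MINUTE) -> float:
    """Expected misrecognized words per minute at a given word-level accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return round((1.0 - accuracy) * words_per_minute, 1)


if __name__ == "__main__":
    for label, accuracy in [("platform auto-captions, low end", 0.80),
                            ("platform auto-captions, high end", 0.88),
                            ("validated caption track", 0.99)]:
        print(f"{label}: {errors_per_minute(accuracy)} wrong words per minute")
```

At the assumed rate, the low end of the auto-caption range works out to tens of wrong words every minute, while a 99% track stays under two.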
Section 508 and federal contractors
Organizations subject to Section 508 — federal contractors and subcontractors, federal agencies, federally-assisted programmes — have an explicit requirement that electronic and information technology be accessible to employees and members of the public with disabilities. Section 508 applies to all EIT, including video content distributed to employees through internal systems, and it applies to contractors across a wide scope of internal operations even when the video is not directly connected to the contract performance. Federal contractors who have invested in Section 508 compliance for public-facing content often have uncaptioned all-hands and employee communications video archives that would not survive a 508 internal audit. The Section 508 captions overview covers the technical standard; the application to employee communications video is that all internal video content accessible to employees is covered.
All-hands meetings and town halls: live versus recorded
The all-hands and town-hall format creates a two-phase caption challenge that is unlike any other content category: a live event with a real-time caption requirement, followed immediately by a recording that requires a validated caption track. Most organizations address one of these phases and not both. The live event gets Zoom or Teams auto-captions; the recording gets nothing. Alternatively, the organization invests in a professional live captioner (CART — Communication Access Realtime Translation) for the event but does not carry the CART output through to the recorded version. Understanding both phases and the different caption approaches they require is the starting point for an effective all-hands caption programme.
Phase 1: the live event
A live all-hands creates an immediate accessibility obligation for employees who are attending live, including employees with hearing disabilities who need real-time caption access to participate effectively. There are three approaches to live captioning, with different accuracy and cost tradeoffs:
Platform auto-captions (Zoom, Teams): Zoom Webinar and Zoom Meeting both support automatic live captions powered by a general-model ASR service. Teams Live Events and Teams Meetings both support live transcription via Azure Cognitive Services. These platform auto-captions are available at no incremental cost above the platform subscription. They achieve 80–88% accuracy on clear, standard-accent English speech and 65–78% accuracy on content with significant organizational vocabulary density. They are a reasonable starting point for smaller all-hands events where the primary goal is general comprehension rather than precise understanding of specific product names, financial figures, or strategic terminology. They do not meet the accuracy bar for effective communication: the applicable live criterion, WCAG 2.1 SC 1.2.4 ("Captions (Live)"), requires captions for live events but does not specify an accuracy threshold, while industry and DOJ guidance references 99% accuracy for effective communication. Platform auto-captions do not meet 99% accuracy on organizational content.
CART (Communication Access Realtime Translation): CART is a professional stenographic service where a trained captioner listens to the audio in real time and produces captions at 97–99%+ accuracy. CART captioners can be integrated into Zoom and Teams live events through a caption integration API that pipes the CART output into the platform's caption stream — attendees see the CART captions in the Zoom or Teams caption overlay, with 97–99% accuracy rather than the 80–88% of platform auto-captions. CART is the gold standard for live event captioning when a hearing-disabled employee needs equivalent-quality live caption access. Cost is typically $150–$350 per hour of live captioning. For an organization that holds monthly 60-minute all-hands events, the annual CART cost is $1,800–$4,200 — meaningful but not prohibitive relative to the live event's organizational importance. The CART output can also be captured as a transcript, which provides a 97–99% accuracy text base for the post-event caption correction workflow.
ASR with domain glossary (hybrid approach): For organizations that cannot use CART for every event but want better-than-platform-auto accuracy, an intermediate approach uses a real-time ASR service with domain vocabulary injection. Services like AssemblyAI, Deepgram, or GlossCap's real-time API accept audio stream input and return caption output with domain-vocabulary bias. This is similar to the glossary-biased decoding approach in post-processing, but it runs in real time with slightly lower accuracy (~91–95% on organizational content with a tuned glossary). This approach is significantly less expensive than CART ($0.01–$0.05 per minute of audio versus roughly $2.50–$5.83 per minute for CART at $150–$350 per hour) and produces better accuracy than platform auto-captions on domain-specific content. The implementation requires a more complex setup than enabling Zoom's built-in captions, but the caption-quality improvement is substantial for organizations that use significant organizational vocabulary in their all-hands events.
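The cost comparison between the live-caption options can be checked with a few lines of arithmetic. This is a minimal sketch using only the rates quoted in this section, for a monthly 60-minute all-hands; the function name `annual_cost` is hypothetical, and the ASR figure covers per-minute usage only, not integration or setup effort.

```python
def annual_cost(rate_per_minute: float,
                minutes_per_event: int = 60,
                events_per_year: int = 12) -> float:
    """Annual live-captioning spend in dollars for a recurring event."""
    return round(rate_per_minute * minutes_per_event * events_per_year, 2)


# Rates quoted in this section.
CART_PER_HOUR = (150.0, 350.0)   # dollars per hour of live captioning
ASR_PER_MINUTE = (0.01, 0.05)    # dollars per minute of audio

cart_low, cart_high = (annual_cost(rate / 60) for rate in CART_PER_HOUR)
asr_low, asr_high = (annual_cost(rate) for rate in ASR_PER_MINUTE)

if __name__ == "__main__":
    print(f"CART: ${cart_low:,.2f} to ${cart_high:,.2f} per year")
    print(f"Real-time ASR usage: ${asr_low:,.2f} to ${asr_high:,.2f} per year")
```

Changing `minutes_per_event` or `events_per_year` adapts the comparison to a different event cadence.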
Phase 2: the recording
The recording is where the persistent compliance exposure lives. Employees who missed the live event (at large organizations with global distribution, often the majority of the target audience) consume the content from the recording. The recording is what an employee with hearing loss in a different time zone watches. The recording is what the EEOC investigator will review when assessing whether the employer's all-hands content was accessible. The recording has a different caption requirement than the live event: instead of real-time live captions, it needs a validated, synchronized SRT or VTT file that meets the 99% accuracy benchmark commonly applied to prerecorded captions under WCAG SC 1.2.2.
The recommended post-event caption workflow for all-hands recordings:
- Download the recording in full-quality MP4 format from Zoom cloud recordings, Teams meeting history, or the live-streaming platform used for the event.
- If CART was used for the live event, obtain the CART transcript and use it as the starting base for the caption file. It is already at 97–99% accuracy, so the main editing task is converting it from transcript format to timed SRT/VTT with one or two lines per caption event.
- If platform auto-captions were used, download the auto-generated VTT from the platform (Zoom provides a .vtt download in the cloud recording portal; Teams provides a .vtt from Stream) and submit it to a correction workflow: either human review against the audio, or glossary-corrected ASR re-processing of the recording.
- Run a DCMP spot-check on the corrected caption file: select five non-adjacent 2-minute segments, count errors using the DCMP error taxonomy, calculate accuracy. The target is ≥99% on a segment-by-segment basis, with special attention to organizational proper nouns, financial figures, and speaker names.
- Upload the validated SRT or VTT to the hosting platform (Zoom, Vimeo, SharePoint/Stream) as the permanent caption track for the recording.
- Publish the recording to the intranet or distribution channel with the caption track confirmed active and the default-on setting configured per the organization's accessibility policy.
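The spot-check step in the workflow above can be sketched as follows. This is a minimal illustration, not an implementation of the DCMP error taxonomy: it assumes a reviewer has already counted words and errors for each segment, and the names `pick_segments`, `SegmentReview`, and `file_passes` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


def pick_segments(duration_s: float, n: int = 5,
                  length_s: float = 120.0) -> List[float]:
    """Return start times (seconds) for n evenly spread, non-adjacent segments."""
    slot = duration_s / n
    if slot <= length_s:
        raise ValueError("recording is too short for non-adjacent segments")
    offset = (slot - length_s) / 2  # centre each segment inside its slot
    return [i * slot + offset for i in range(n)]


@dataclass
class SegmentReview:
    start_s: float
    words: int   # words in the segment, from the reviewer's reference text
    errors: int  # caption errors the reviewer counted in the segment

    def __post_init__(self) -> None:
        if self.words <= 0 or self.errors < 0:
            raise ValueError("words must be positive and errors non-negative")

    @property
    def accuracy(self) -> float:
        return 1.0 - self.errors / self.words

    @property
    def meets_target(self) -> bool:
        # Accuracy of at least 99%, checked with integers to avoid rounding.
        return self.errors * 100 <= self.words


def file_passes(reviews: List[SegmentReview]) -> bool:
    """The file passes only if every sampled segment meets the target."""
    return all(review.meets_target for review in reviews)


if __name__ == "__main__":
    starts = pick_segments(duration_s=3600)  # a 60-minute recording
    counts = [(310, 2), (295, 1), (305, 3), (290, 5), (300, 2)]  # made-up data
    reviews = [SegmentReview(s, w, e) for s, (w, e) in zip(starts, counts)]
    for review in reviews:
        print(f"{review.start_s:>7.0f}s  {review.accuracy:.2%}  "
              f"{'ok' if review.meets_target else 'below target'}")
    print("file passes:", file_passes(reviews))
```

Checking each segment separately, not the pooled average, matches the segment-by-segment target: one weak Q&A segment fails the file even when the presenter segments are clean.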
The key constraint is timing: all-hands recordings are typically posted to the intranet within two to twenty-four hours of the event. Most caption correction workflows require at least a few hours even with modern ASR tools. The workflow design must account for this constraint: either the recording is held back from distribution until the caption validation is complete (creates a publication delay that the communications team needs to plan for) or the recording is published without captions initially and updated with captions within four to eight hours (creates a window of uncaptioned distribution that is not ideal but is an explicit, documented workflow choice rather than a silent gap). Many organizations establish a maximum four-hour captioning window for all-hands recordings — the recording is posted with a "captions coming within 4 hours" notice for employees who need them immediately, and the caption track is added and the notice removed as soon as the validation workflow is complete.
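If you adopt a publish-then-caption window, it is worth tracking it explicitly so the gap stays documented and does not go silent. A minimal sketch, assuming the four-hour window described above; the function names and status strings are hypothetical, and timestamps are expected to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CAPTION_WINDOW = timedelta(hours=4)  # the maximum window described above


def caption_deadline(posted_at: datetime) -> datetime:
    """Latest time the validated caption track should be live."""
    return posted_at + CAPTION_WINDOW


def caption_state(posted_at: datetime,
                  captions_added_at: Optional[datetime],
                  now: datetime) -> str:
    """Classify a recording against its captioning window."""
    deadline = caption_deadline(posted_at)
    if captions_added_at is not None:
        return "captioned on time" if captions_added_at <= deadline else "captioned late"
    return "pending" if now <= deadline else "overdue"


if __name__ == "__main__":
    posted = datetime(2026, 6, 13, 15, 0, tzinfo=timezone.utc)
    print("deadline:", caption_deadline(posted).isoformat())
    print(caption_state(posted, None, posted + timedelta(hours=2)))
    print(caption_state(posted, None, posted + timedelta(hours=5)))
```

An "overdue" result is the signal to escalate or pull the recording, and a log of these states gives the communications team evidence that the window is being met.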
Multi-speaker accuracy in all-hands recordings
All-hands events typically have multiple speakers: the CEO, executive team members presenting by function, a guest external speaker, and employees asking questions during Q&A. Multi-speaker recordings are consistently harder to caption accurately than single-speaker recordings for several reasons. Speaker transitions without visual identification create speaker-attribution gaps in caption files — the caption appears but the viewer may not know who is speaking. Q&A segments where employees are in a room with a moderately noisy ambient environment and speaking toward a shared microphone or handheld mic produce lower audio quality than the main presenter's dedicated microphone. Executive team members who are not native English speakers or who have regional accents may score significantly lower on ASR baseline accuracy than the CEO's clear American-standard-accent delivery, creating inconsistent quality across the recording.
The practical mitigations: speaker-labelled caption output (available from CART and from some ASR services with speaker diarization) helps viewers follow multi-speaker segments. Separate microphone feeds for Q&A questions — a handheld microphone passed to questioners rather than a room microphone picking up ambient sound — substantially improves Q&A segment accuracy. For events with non-native English speakers presenting significant portions of the content, pre-event phoneme-variant additions to the organizational glossary for each speaker's most distinctive mispronunciation patterns improve ASR accuracy on their segments. The live versus recorded caption accuracy post covers the technical accuracy implications of multi-speaker live-event audio in more detail.
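Speaker labelling can be carried in the caption file itself. WebVTT supports voice spans written as `<v Speaker>` at the start of a cue. The following is a minimal sketch, assuming the diarization step has already produced a list of segments with start time, end time, speaker, and text; the segment structure and function names are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"

def to_speaker_vtt(segments) -> str:
    """Build a WebVTT document with one voice-tagged cue per segment."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}")
        # Note: cue text containing "<" or "&" would need escaping first.
        lines.append(f"<v {seg['speaker']}>{seg['text']}")
        lines.append("")
    return "\n".join(lines)

# Hypothetical diarized segments from an all-hands recording.
segments = [
    {"start": 0.0, "end": 4.2, "speaker": "CEO", "text": "Welcome to the Q2 all-hands."},
    {"start": 4.2, "end": 9.0, "speaker": "Moderator", "text": "First question, from the Austin office."},
]
print(to_speaker_vtt(segments))
```

Whether the player renders the voice label visibly varies by platform, so some teams also prefix the cue text with the speaker's name.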
Executive recordings and video messages
Executive video messages (the CEO's quarterly update, the CFO's financial results walkthrough, the CHRO's benefits explanation, the VP of Engineering's quarterly engineering all-hands) form a distinct subcategory of employee communications video with specific caption challenges. They are typically shorter than a full all-hands (five to fifteen minutes rather than sixty to ninety minutes), produced more informally than a live event (often recorded on a laptop or phone rather than in a professional recording setup), distributed more broadly and through more informal channels (email, Slack, intranet, without a formal "event" framing), and produced more frequently than quarterly all-hands events. Informal production and broad distribution make them, minute for minute, the employee communications category with the highest combination of caption compliance exposure and likelihood of going uncaptioned.
The production and distribution chain for exec video messages
The typical production chain for an executive video message: the CEO or their assistant records a five-to-ten-minute message using Zoom's record function, Loom, or the built-in recorder on a MacBook or iPhone; the recording is shared directly from the platform (Zoom sends a cloud recording link, Loom generates a share link, an iPhone video is uploaded to Vimeo or emailed as a file); corporate communications posts the link to the company intranet and sends an all-staff email. The caption chain in this process: none. The Zoom recording has an auto-transcript only if the account admin has enabled it; the Loom recording has auto-captions only if the workspace settings enable them; the iPhone video uploaded to Vimeo has no captions at all. In each case, the default path produces a recording that is either uncaptioned or auto-captioned at an accuracy too low to meet WCAG's captioning requirement, and that recording is distributed to the entire organization. The caption gap is structural: no step in the production chain includes caption validation, because corporate communications is not running a caption workflow.
Executive vocabulary as a distinct caption challenge
Executive video messages have a vocabulary profile that is substantially different from both training content and general all-hands recordings. The vocabulary categories that generate the most ASR errors in executive communications:
- Financial and investor-relations terminology: EBITDA, CAGR, ARR, net revenue retention (NRR), gross margin, operating leverage, free cash flow, covenant compliance, working capital. These terms appear at high density in executive financial communications. They are standard investor-relations vocabulary but are not high-frequency vocabulary for general ASR models, which leads to systematic substitution errors. EBITDA is transcribed as "a bit duh" or "ability." ARR is transcribed as "are" or "our." Gross margin is transcribed correctly when the model recognizes the term, and as "gross Martin" or "go smart" when it does not.
- Organizational structure terms: Team names, leadership titles, function names, and business unit names change over time and are not in any general ASR vocabulary. A "Go-to-Market team" is transcribed correctly or as "go to Mark team" depending on whether the ASR model has seen the compound noun. A new leadership title like "Chief Customer Experience Officer" will be transcribed differently depending on whether each word is in the vocabulary separately or whether the compound noun is recognized.
- Product and company names: Proprietary product names, feature names, brand names for competitor products, and technology names that appear in executive strategy communications are exactly the vocabulary type that general ASR models fail on most consistently. An executive describing "the GlossEngine API" will have "GlossEngine" transcribed as "gloss engine," "glos engine," "glass engine," or several phonetically similar alternatives. These are the same proper-noun failure modes that affect training content, compounded by the higher vocabulary density and faster speaking pace of executive presentations.
- Strategy and consulting vocabulary: Terms like "go-to-market motion," "land-and-expand strategy," "product-led growth flywheel," "organizational transformation," and "north star metric" appear frequently in executive communications and are semi-specific enough that they are not reliably in general ASR vocabulary models. The combination of compound noun structure and domain-specific meaning creates predictable transcription errors.
- M&A and corporate development terminology: During acquisition or merger processes, executive communications use terminology that is both highly specific and highly consequential for employees: "definitive agreement," "day one integration plan," "standalone operations," "earn-out provisions," "post-close integration timeline." Employees who cannot accurately access this content because captions are missing or inaccurate are in a documentably inequitable information position at a moment when accurate information directly affects their employment decisions.
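Many of the substitution errors listed above are regular enough to correct mechanically before human review. The sketch below is a minimal, hypothetical glossary pass in Python. The glossary contents and the `apply_glossary` name are illustrative assumptions, and this is not a description of how any particular captioning product applies its glossary.

```python
import re

# Hypothetical glossary: canonical term -> known ASR mis-hearings of it.
GLOSSARY = {
    "EBITDA": ["a bit duh"],
    "GlossEngine": ["gloss engine", "glos engine", "glass engine"],
    "gross margin": ["gross martin"],
}

def apply_glossary(text: str, glossary: dict) -> tuple:
    """Replace known mis-transcriptions with canonical terms.

    Returns the corrected text and the number of replacements made.
    """
    changes = 0
    for canonical, variants in glossary.items():
        for variant in variants:
            pattern = re.compile(r"\b" + re.escape(variant) + r"\b", re.IGNORECASE)
            # A function replacement avoids backslash handling in the canonical term.
            text, n = pattern.subn(lambda m, c=canonical: c, text)
            changes += n
    return text, changes

raw = "Our a bit duh grew, and the gloss engine API shipped."
print(apply_glossary(raw, GLOSSARY))
```

Variants that are ordinary words in their own right ("ability" for EBITDA, "are" or "our" for ARR) are deliberately left out of the mapping: replacing them blindly would corrupt correct text, so they need context-aware review by a person.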
Recording quality issues in executive recordings
Executive video messages are frequently recorded in suboptimal audio conditions for three reasons. First, executives are busy and don't want to invest in a professional recording setup for a five-minute message — they record on a laptop microphone at a desk. Second, executive assistants or communications staff who support the recording process are generally not audio engineers and don't have established recording protocols. Third, urgency often drives recording decisions — an executive recording a quick message about a breaking organizational development will record it immediately in whatever environment they are in, not after setting up a professional microphone. The result: executive recordings routinely have the home-office audio quality issues (ambient noise, laptop microphone capture, room reverb) described in the remote and hybrid video captioning guide, compounded by the higher vocabulary density and faster speaking pace of executive communication. Building a light-touch recording guideline into the executive communications production process — even just "use an external USB microphone and close the door" — produces measurable accuracy improvements for the captioning workflow that processes the recordings.
Benefits videos from People/HR
Benefits explanation videos from the People/HR team occupy a specific compliance position among employee communications: they communicate information that has direct financial consequence for employee decisions (health plan selection, 401(k) contribution elections, FSA limits, leave accrual policies) and they are addressed to all employees, including employees with hearing disabilities who have the same interest in making informed benefits decisions as any other employee. Benefits videos are typically produced annually (at open enrollment) or ad-hoc (for benefits changes mid-year) by the People team or their benefits broker, hosted on a page the HR team manages, and distributed via all-staff email. They are virtually never captioned by default. The vocabulary — insurance plan names, insurance company names (Anthem, UnitedHealthcare, Kaiser Permanente, Aetna), 401(k) platform names (Fidelity, Vanguard, Empower, Principal), HSA and FSA terminology — is sufficiently domain-specific that general ASR auto-captions will produce consistent errors on the terms that matter most. An employee choosing between two health plan options based on audio they partially misunderstood because the caption said "anthem plan" when the speaker said "Anthem Premier plan" has received inadequate communication about a consequential employment benefit.
Platform-by-platform caption workflows
The platforms used for employee communications video are largely the same as those used for remote training video — Zoom, Teams, Vimeo, Wistia, Loom — but the specific workflows differ because employee communications video has different production contexts (live events versus async recordings), different distribution channels (company intranet, all-staff email, SharePoint) and different timing constraints (all-hands recordings need captions within hours, not days). This section covers the caption upload and management workflows for each major platform, with specific notes on the employee communications context.
Zoom: all-hands recordings and executive video messages
Zoom is the most common platform for live all-hands events and town halls and a frequent choice for executive video messages. The Zoom caption workflow for employee communications recordings:
- After the Zoom recording is available in the cloud recording portal (Zoom > Admin > Account Management > Recording Management, or My Recordings), download the MP4 and the audio-only M4A file. The M4A is typically higher bitrate than the audio track in the MP4 and produces slightly better ASR results if you are submitting audio separately.
- Download the auto-generated VTT from the recording if Zoom's auto-transcription was enabled for the account (Recording Management → select the recording → "Audio Transcript" tab → Download). This is the 82–88% accuracy starting point.
- Submit the audio to your captioning workflow (GlossCap, external vendor, or internal correction process) with the organizational glossary active. For all-hands recordings, the glossary should include the current quarter's key terminology: product names announced in the period, leadership changes, M&A-related terms, and financial metrics being discussed in the current earnings cycle.
- Upload the validated caption file to the Zoom cloud recording: Recording Management → select the recording → "Captions" tab → Upload. Zoom accepts VTT format for caption upload to cloud recordings, so if the captioning workflow returned an SRT, convert it to VTT first. Set the language to "English" (or the appropriate language).
- If the recording is being distributed via a Zoom share link (rather than downloaded and re-hosted), confirm that the "Show captions" setting is enabled for the shared recording page so that attendees see the caption toggle in the Zoom recording player.
- If the recording is being downloaded and uploaded to a SharePoint site, company intranet, or video hosting platform, carry the SRT file as a separate asset alongside the MP4 and upload both to the destination platform.
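Because the Zoom upload step takes VTT while the re-hosting path carries an SRT, a small converter is useful for keeping one validated caption file in both formats. The following is a minimal sketch, assuming well-formed SRT input with standard `HH:MM:SS,mmm` timestamps; `srt_to_vtt` is an illustrative helper name, not a Zoom feature.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
_TIMING = re.compile(
    r"^(\d{2,}:\d{2}:\d{2}),(\d{3}) --> (\d{2,}:\d{2}:\d{2}),(\d{3})",
    re.MULTILINE,
)

def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT caption text to WebVTT.

    Adds the required WEBVTT header and switches the millisecond separator
    from a comma to a period on timing lines only. Numeric cue counters are
    kept, since WebVTT allows them as cue identifiers.
    """
    cleaned = srt_text.lstrip("\ufeff").replace("\r\n", "\n")
    body = _TIMING.sub(r"\1.\2 --> \3.\4", cleaned)
    return "WEBVTT\n\n" + body.strip() + "\n"

srt = (
    "1\n00:00:01,000 --> 00:00:03,500\nWelcome to the all-hands.\n\n"
    "2\n00:00:03,600 --> 00:00:06,000\nLet's begin.\n"
)
print(srt_to_vtt(srt))
```

The conversion leaves cue text untouched, so commas inside the captions themselves are preserved.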
A Zoom-specific consideration for live events: Zoom Webinar (used for larger all-hands where attendees join as view-only participants) and Zoom Meeting (used for smaller all-hands with interactive Q&A) have different recording architectures. Zoom Webinar recordings capture the host and panelist audio clearly but may have inconsistent capture of attendee questions if attendees are unmuted individually rather than using a dedicated Q&A microphone setup. Zoom Meeting recordings with cloud recording enabled capture all participants' audio tracks when unmuted. For high-quality caption output on the Q&A segments, the recommended setup is a Zoom Webinar with a separate Q&A moderator microphone (the moderator repeats or reads aloud each question before the presenter answers it) rather than relying on individual attendee unmute audio quality for the recording.
Microsoft Teams: town halls, all-hands, and Stream hosting
Microsoft Teams is increasingly used for all-hands events through the Teams Live Events and Teams Town Hall features (the latter introduced in 2023 as the designated large-meeting format, replacing Teams Live Events in some tenant configurations). Teams recordings are stored in OneDrive or SharePoint and are accessible via Microsoft Stream, which serves as the video hosting layer for SharePoint-embedded content. The Teams caption workflow for employee communications recordings:
- After the Teams meeting or town hall recording is available in the chat thread or meeting recording location (SharePoint/OneDrive), access the recording via the Stream link that Teams generates automatically.
- If the meeting had transcription enabled (requires the tenant admin to have enabled Meeting Transcription in the Teams admin center), download the VTT transcript from Stream: open the recording in Stream → "..." menu → "Download transcript" → .vtt file. This is the 82–88% accuracy starting point for correction.
- Submit to the captioning workflow with the current organizational glossary active. Upload the corrected VTT to Stream: open the recording in Stream → Edit → Captions → "Add captions manually" → Upload VTT → set language code (en-US) → Save. The uploaded corrected VTT replaces the auto-generated transcript as the caption track for the recording in Stream playback.
- If the recording is embedded in a SharePoint page (the standard path for intranet distribution), confirm that the Stream web part on the SharePoint page renders the caption toggle correctly. In some tenant configurations, the caption track must be explicitly set to "default on" in the Stream video settings for it to appear by default in the SharePoint embed — check by accessing the SharePoint page in a private browser as a different account and verifying the caption UI appears.
- For Viva Connections or Viva Engage (Yammer) distribution: the Stream caption track propagates to Viva Connections video embeds automatically when the underlying Stream recording has a validated caption track. For Viva Engage video posts (which use a different storage path than Stream-backed SharePoint video), caption files must be uploaded separately using the Viva Engage video caption upload interface.
A Teams-specific consideration: Teams Town Hall recordings (the format intended for large all-company events) have a different recording architecture than standard Teams Meetings. Town Hall attendees are view-only by default; Q&A is conducted through a moderated Q&A panel rather than participant microphone use. The recording captures presenter and panelist audio from their individual microphone inputs, which is higher quality than an open-microphone town-hall setup. This typically produces better ASR accuracy on the main presentations but means that the Q&A segment shows the moderator reading questions aloud rather than the original asker's voice, which is the correct pattern for caption accuracy.
Vimeo: exec video hosting and intranet embedding
Vimeo is frequently chosen as the hosting platform for executive video messages and company-wide video communications because of its clean embed player, privacy controls (password-protected video, domain-restricted embedding), and production quality. The Vimeo caption workflow for employee communications video:
- Upload the MP4 recording to Vimeo (Video Manager → "New video" → upload). If the recording has been through a glossary-corrected captioning workflow, upload both the MP4 and the validated SRT simultaneously by dragging both files to the upload dialog — Vimeo will automatically associate the SRT with the video.
- If the caption file is being added after initial upload: Video Manager → select video → "Captions" tab → "Add captions" → Upload caption file → select language → confirm. Vimeo accepts SRT, VTT, and DFXP (TTML) formats. SRT is the most universally compatible format for Vimeo delivery.
- Set caption language and label: the language code (en-US) and label ("English" or "English [CC]") are set at upload time. For multilingual organizations, upload separate caption files for each language and label them appropriately — Vimeo's player allows viewers to select from multiple caption tracks.
- Configure the caption display default in Vimeo video settings: "Player" settings → "Default caption display" → "On" (recommended for internal accessibility policy) or "Off" (viewer choice). For employee communications video at organizations with an explicit accessibility commitment, defaulting captions on is the correct configuration.
- Confirm that the Vimeo privacy settings allow the intended distribution channel to display the video: if the video is embedded on a company intranet, add the intranet domain to the "Domain" allow-list in Vimeo privacy settings. If the video is distributed via a Vimeo private link (unlisted), confirm the link includes the caption track in the player.
A Vimeo consideration for executive communications: Vimeo's auto-captions are generated by a third-party ASR service and are available on the Business plan and higher. These auto-captions are useful as a starting point for the correction workflow, but on organizational vocabulary they land in the same 82–88% accuracy range as other platform auto-captions. Do not use Vimeo auto-captions as the final caption track for employee communications video without running a DCMP accuracy spot-check and correcting errors on organizational proper nouns.
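A spot-check can be as simple as hand-correcting a short sample of the recording and comparing it with the auto-caption text for the same span. The sketch below computes a generic word-level accuracy from edit distance. It is an assumption about how a team might score a sample, not DCMP's own scoring method, and it ignores punctuation and timing.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - (word-level edit distance / reference word count)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference sample is empty")
    # Standard Levenshtein distance over words, computed one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return max(0.0, 1 - prev[-1] / len(ref))

reference = "our ebitda grew ten percent this quarter"
auto_caption = "our a bit duh grew ten percent this quarter"
print(f"{word_accuracy(reference, auto_caption):.1%}")
```

A single mis-heard financial term costs three word edits in a seven-word sample here, which is why short executive messages dense with such terms score so poorly before correction.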
Wistia: exec video on People/HR pages and external-facing channels
Wistia is used for employee communications video in organizations that prioritize video analytics (knowing exactly which employees watched the all-hands recording and how long they watched), for customer-facing communications that may also be distributed internally (investor day recordings, press release companion videos), and for People/HR content delivery (benefits walkthroughs, HRIS tutorial videos). The Wistia caption workflow:
- Video Details → "Advanced" → "Captions" → "Upload caption file" → select SRT, VTT, or DFXP file → set language code → Upload.
- Set caption default in Wistia: "Customize Player" → "Controls" → "Captions" → toggle "Default captions enabled" to on. This ensures captions display by default for all viewers without requiring each viewer to enable them manually.
- For Wistia embeds on People/HR intranet pages: the Wistia player renders the caption track and the default-on setting in the embed context, regardless of the page the embed appears on. Confirm caption display in an embedded context by loading the specific intranet page and verifying the caption overlay appears on video playback.
- For investor day or press release companion videos that are distributed both externally (on the investor relations page) and internally (to all employees via all-staff email): caption the video with the Wistia upload workflow for the public distribution, and confirm the same Wistia embed (same Wistia video ID) is being used for both the external and internal distribution rather than two separate uploads. Caption corrections to the Wistia video apply to all embed contexts automatically.
Wistia's engagement analytics — heat maps showing which segments each viewer watched — have additional compliance documentation value for employee communications video: if an employee with hearing loss files an accommodation complaint about a specific all-hands recording, Wistia analytics can document whether and when that employee watched the recording, which segments they engaged with, and whether the caption track was active during their viewing session. This creates a detailed compliance record that is useful both for investigating the complaint and for demonstrating that the organization has a systematic caption programme for its employee communications content.
Loom: async executive messages and team communications
Loom is used for shorter, more informal executive communications — a CEO sharing a quick update on a partnership deal, a VP recording a walkthrough of a new tool, a department head recording a personal message to their team. The Loom caption workflow for employee communications video:
- Loom auto-generates captions for Business plan accounts using its ASR pipeline. These are available for editing within the Loom UI (open the video → "Transcript" tab → edit text inline) but the editing interface is designed for light corrections, not for systematic glossary-corrected review of organizational proper nouns.
- For organizational communications that require validated caption accuracy: export the Loom auto-transcript as SRT (Business plan → "Download" → "SRT"), submit to the caption correction workflow with organizational glossary active, receive corrected SRT, and upload back to Loom (Video Settings → "Captions" → "Upload SRT"). Loom accepts SRT upload for videos where the Business plan SRT export is available.
- Alternatively, download the MP4 from Loom (Business plan → "Download" → "Original quality"), upload the MP4 to Vimeo or the company intranet video hosting platform with the corrected SRT as the caption track, and distribute the intranet or Vimeo link rather than the Loom share link. This gives you full caption control in a platform with better caption management than Loom's edit-in-UI approach.
- For Loom videos distributed via Slack (the most common distribution channel for informal executive Looms): Slack displays the Loom embed with the Loom player, which shows whatever caption state Loom has for the video. If the Loom video has a validated uploaded SRT, the Slack embed will display captions via the Loom player. This is the path of least friction for Loom-to-Slack caption distribution.
A Loom-specific consideration: Loom videos shared via individual Loom share links ("My Library" → "Copy link") are not automatically routed to any central caption monitoring workflow. In organizations where executives and managers frequently use Loom for informal communications, the caption programme needs a systematic way to identify and process Loom videos that have been shared to teams or all-company audiences — otherwise, the Loom library grows as an uncaptioned parallel archive alongside the formal video hosting platforms. Options: a policy requiring any Loom shared to more than a named individual to be submitted to the caption workflow; a Loom workspace admin policy enabling auto-captions as the default and a workflow for correcting the auto-captions before distribution; or a Zapier/Make automation that monitors the Loom workspace for new videos shared with the company email domain and triggers a caption workflow notification.
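The first policy option (caption anything shared beyond a named individual) reduces to a simple filter once share data is available. The sketch below assumes that data has already been exported into plain records by whatever means the workspace allows. The `SharedVideo` fields are hypothetical and do not correspond to a Loom API.

```python
from dataclasses import dataclass

@dataclass
class SharedVideo:
    title: str
    audience_size: int             # number of people the link was shared with
    has_validated_captions: bool   # True once a corrected caption file is attached

def needs_caption_review(videos, max_uncaptioned_audience: int = 1):
    """Return videos shared beyond the allowed audience that lack validated captions."""
    return [
        v for v in videos
        if v.audience_size > max_uncaptioned_audience and not v.has_validated_captions
    ]

# Hypothetical export of recently shared videos.
library = [
    SharedVideo("1:1 feedback for Sam", 1, False),
    SharedVideo("Partnership update from the CEO", 850, False),
    SharedVideo("New expense tool walkthrough", 40, True),
]
for video in needs_caption_review(library):
    print(video.title)
```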
Webex: enterprise all-hands in Cisco-platform organizations
Organizations using Cisco Webex Events or Webex Meetings for all-hands events have an equivalent caption workflow to Zoom: Webex generates auto-transcripts for cloud recordings (available in Meeting Hub for Webex-authenticated accounts), the auto-transcript can be downloaded as a VTT, corrected through the captioning workflow, and uploaded back to the Webex recording. Webex recording caption upload: Meeting Hub → Recordings → select recording → "Transcripts" → "Upload transcript" → upload corrected VTT. Webex also integrates with the Webex Vidcast platform for post-event video hosting and distribution; caption files uploaded to the Webex recording propagate to Vidcast hosting if the organization uses the native Webex-to-Vidcast distribution path. For organizations not using Vidcast, the download-and-rehost workflow (MP4 + corrected SRT to Vimeo or SharePoint) applies.
Glossary architecture for employee communications
The glossary architecture for employee communications video is structurally different from the glossary architecture for training content in ways that require explicit programme design decisions. Training content glossaries are built around stable product and domain vocabulary that changes slowly and can be maintained on a quarterly or semi-annual cadence. Employee communications glossaries must track organizational vocabulary that changes at the speed of the business — quarterly financial cycle, product launch cadence, personnel changes, and strategic initiative vocabulary — while also maintaining the stable foundation of company names, product names, and organizational structure terms. Building the wrong glossary architecture for employee communications video produces a caption programme that is accurate on the training library and systematically inaccurate on the content category where organizational vocabulary change rate is highest.
Four vocabulary categories for employee communications
The employee communications glossary should be structured around four distinct vocabulary categories with different update cadences and ownership:
1. Stable organizational vocabulary (annual update or as-needed): Company name and spelling variants, product names and brand names, key technology platform names (the LMS, the CRM, the HRIS), office location names and city spellings, long-standing organizational unit names (the names of permanent departments that don't restructure frequently). This is the foundation layer that any employee communications caption will rely on and that changes slowly. It overlaps substantially with the L&D training glossary and can be shared between the two programmes. Size: 150–400 terms per organization. Ownership: L&D or caption programme manager, updated at annual review.
2. Quarterly-cycle vocabulary (quarterly update, synchronized with earnings/business cycle): Financial metrics in use for the current quarter (new KPIs being introduced, renamed segments, redefined metrics), quarterly business review terminology, product launch names for the current quarter's releases, new feature names from the current product cycle, analyst-relations framing terms. This is the highest-change-rate vocabulary category for executive and all-hands content and the one most likely to generate ASR errors if not updated in advance of each quarterly all-hands. Size: 40–100 terms per quarter. Ownership: corporate communications and/or the executive communications team, updated in the two-week window before each quarterly all-hands event.
3. People and organizational vocabulary (event-driven update): Leadership names (including phonetic pronunciation guidance for names with non-standard English pronunciation), new-hire names announced in all-hands, organizational team names from recent restructurings, new leadership titles, acquisition target or integration entity names. This vocabulary appears in all-hands and executive communications at a high density at the moments of organizational change (restructuring announcements, leadership transitions, acquisitions) and is zero-frequency at other times. Event-driven update means: when there is a leadership change, add the new leader's name to the glossary before their first all-hands appearance. When a restructuring creates new team names, add them to the glossary before the restructuring is announced. Size: varies; typically 20–60 terms per organizational event. Ownership: HR or People team, with a flag-and-update process aligned with the HR communications calendar.
4. Benefits and HR administration vocabulary (annual update at open enrollment): Insurance carrier names and plan names (Anthem Preferred Gold, Kaiser Permanente HMO, UnitedHealthcare Choice Plus), 401(k) platform names (Fidelity NetBenefits, Vanguard Institutional, Empower Retirement), HSA and FSA custodian names, leave management platform names (Workday Absence, Absence Management, Rippling Time Off), payroll platform names. This vocabulary appears predominantly in People/HR benefits communications and annual HR update videos. It is highly specific, changes when the organization changes carriers or platforms, and is entirely outside the L&D training glossary. Size: 50–150 terms, updated annually at the benefits renewal cycle. Ownership: HR Benefits team or People Operations, with updates provided to the caption programme at least two weeks before the open enrollment video distribution date.
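The four categories can be represented as glossary layers that each carry their own owner and update cadence. The following is a minimal sketch of that structure; the class, the layer names, the day counts, and the dates are illustrative assumptions rather than a prescribed schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class GlossaryLayer:
    name: str
    owner: str
    cadence_days: Optional[int]  # None means event-driven, with no fixed schedule
    last_updated: date

    def is_due(self, today: date) -> bool:
        """A scheduled layer is due once its cadence has elapsed since the last update."""
        if self.cadence_days is None:
            return False  # event-driven layers are updated by a trigger, not the calendar
        return today - self.last_updated >= timedelta(days=self.cadence_days)

def layers_due(layers, today: date):
    """Names of the layers whose scheduled update is due."""
    return [layer.name for layer in layers if layer.is_due(today)]

# Hypothetical state of the four layers described above.
LAYERS = [
    GlossaryLayer("stable", "caption programme manager", 365, date(2026, 1, 15)),
    GlossaryLayer("quarterly-cycle", "corporate communications", 90, date(2026, 3, 1)),
    GlossaryLayer("people-and-org", "HR / People team", None, date(2025, 11, 3)),
    GlossaryLayer("benefits", "HR Benefits", 365, date(2025, 10, 1)),
]
print(layers_due(LAYERS, date(2026, 6, 13)))
```

Keeping the event-driven layer out of the calendar check mirrors the ownership split: it is refreshed when HR flags a change, not when a date passes.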
Glossary maintenance and ownership structure
The practical challenge in building this glossary architecture is that the vocabulary owners (corporate communications for quarterly-cycle vocabulary, HR for benefits vocabulary, People/HR for organizational vocabulary) are different from the caption programme owner (typically L&D or an accessibility coordinator). Building a maintenance workflow that extracts vocabulary updates from these teams without requiring them to understand the captioning pipeline is the implementation challenge. The two approaches that work:
Pull approach: The caption programme manager runs a pre-event vocabulary sweep before each major employee communications event. One to two weeks before the quarterly all-hands, they schedule a fifteen-minute sync with the corporate communications lead to identify new terminology: "What product names, financial metrics, or strategic terms are we introducing or changing in this quarter's messaging?" The vocabulary list from that sync goes into the glossary update for the quarter. Before the benefits open enrollment video, they review the updated summary plan description (SPD) from HR Benefits and extract insurance carrier names, plan names, and platform names. The pull approach requires proactive calendar management by the caption programme manager but does not require corporate communications or HR to understand the caption workflow.
Push approach: Corporate communications and HR maintain a shared "new terms" document (a simple shared spreadsheet or a Notion page) where they log new vocabulary as it appears in communications drafts. The caption programme manager reviews this document weekly or before each major event and imports new terms into the captioning glossary. The push approach distributes the vocabulary identification work but requires training corporate communications and HR staff to log new terms, which is a cultural adoption challenge similar to the change management challenge in L&D caption programme rollouts.
In practice, a hybrid works best: pull for major scheduled events (quarterly all-hands, annual benefits enrollment) and push for ad-hoc updates that can't be predicted in advance (M&A announcements, unexpected restructuring, unplanned leadership changes). The caption programme manager runs the scheduled pulls; corporate communications or the executive assistant maintains a running "FYI for captions" note in whatever communication tool they use most naturally (Slack thread, shared doc, email thread).
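For the push side, the import step can stay small if the shared "new terms" document is exported as CSV. The sketch below assumes a two-column export with `term` and `category` headers and a glossary held as a dictionary of sets. Both are hypothetical conventions, and "Project Aurora" is an invented example term.

```python
import csv
import io

def import_new_terms(csv_text: str, glossary: dict) -> list:
    """Add logged terms to the glossary in place and return the terms that were new.

    Duplicates are detected case-insensitively within each category.
    """
    added = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        term = row["term"].strip()
        category = row["category"].strip()
        if not term:
            continue  # skip blank rows left in the shared document
        existing = {t.casefold() for t in glossary.get(category, set())}
        if term.casefold() not in existing:
            glossary.setdefault(category, set()).add(term)
            added.append(term)
    return added

# Hypothetical export of the shared "new terms" document.
export = (
    "term,category\n"
    "Project Aurora,quarterly\n"
    "EBITDA,stable\n"
    "project aurora,quarterly\n"
)
glossary = {"stable": {"EBITDA"}}
print(import_new_terms(export, glossary))
```

Returning only the newly added terms gives the caption programme manager a short list to review before each event instead of the whole document.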
Glossary for multilingual executive communications
Global organizations may distribute all-hands recordings or executive messages to employees in multiple languages, either by having executives present in multiple languages or by distributing translated caption tracks alongside the English original. The multilingual caption workflow post covers the translation pipeline in detail. For employee communications video specifically, the practical consideration is whether the same organizational vocabulary that requires a specialized glossary in English also requires glossary treatment in the translated caption tracks. For proper nouns — company name, product names, location names, brand names — the answer is typically yes: the translation workflow should preserve the proper noun rather than translating it, and the translated caption file's quality check should verify proper noun preservation. For financial and strategic terminology, translation conventions vary by language: some markets use English financial terms in an otherwise native-language context (EBITDA stays as EBITDA in German and French business communications) while others use translated equivalents (free cash flow → freier Cashflow or cash-flow libre). The glossary for translated employee communications should document the proper noun and financial term conventions for each target language rather than relying on the translation service's defaults.
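The proper-noun preservation check can be automated as a simple comparison between the source and translated caption text. The following is a minimal sketch, assuming exact-spelling matches are sufficient; the function name and the sample sentences are illustrative.

```python
def missing_proper_nouns(source_text: str, translated_text: str, proper_nouns) -> list:
    """Return proper nouns present in the source captions but absent from the translation."""
    return [
        noun for noun in proper_nouns
        if noun in source_text and noun not in translated_text
    ]

PROPER_NOUNS = ["GlossEngine", "EBITDA"]
source = "GlossEngine improved EBITDA this quarter."
good_de = "GlossEngine hat das EBITDA in diesem Quartal verbessert."
bad_de = "Die Glanzmaschine hat das EBITDA in diesem Quartal verbessert."

print(missing_proper_nouns(source, good_de, PROPER_NOUNS))
print(missing_proper_nouns(source, bad_de, PROPER_NOUNS))
```

An exact-match check will flag legitimate inflected or transliterated forms in some languages, so the per-language conventions documented in the glossary should decide which hits count as real errors.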
Building the employee communications caption programme
The employee communications caption programme is not an extension of the L&D training caption programme — it is a parallel programme that shares some infrastructure (captioning vendor, glossary foundation layer, DCMP quality standard) but has a different production source, different content scope, different distribution channels, and different timing constraints. Building it as a sub-section of the L&D programme typically fails because the ownership structure doesn't map: L&D cannot be accountable for content it doesn't commission. Building it as an entirely separate programme typically fails because corporate communications and HR don't have captioning expertise or infrastructure. The architecture that works: a shared captioning infrastructure owned by L&D or an accessibility coordinator, with defined handoff protocols for corporate communications and HR content.
Defining the content scope
Before the programme can be designed, the content scope needs to be defined: what employee communications video requires captions? The answer depends on how rigorously the organization wants to apply the ADA Title I effective communication obligation. A risk-tiered approach:
- Tier 1 — Always caption: All company-wide communications (all-hands recordings, CEO/exec video messages, town halls, any video distributed to all employees via all-staff email or the company intranet). These are the highest-visibility content categories and the ones that generate the clearest Title I effective communication obligation. An employee with hearing loss who cannot access a company-wide communication is in a documentably inequitable information position relative to all other employees.
- Tier 2 — Caption before distribution: All-department or all-team communications where the audience includes known or likely employees with hearing disabilities (large department meetings, function-wide announcements, HR-administered benefits communications). The key distinction from Tier 1 is audience size rather than content type.
- Tier 3 — Caption within 24 hours of distribution: Team-level communications and smaller audience video content (department head recordings to teams of 10–20, sub-functional announcements). The case for captioning this content is the same as for Tier 1 and 2 but the timing constraint is relaxed because the audience size reduces the likelihood that a hearing-disabled employee needs access on day one versus within a day or two.
- Tier 4 — Caption on accommodation request: Truly informal one-to-one or very-small-audience video communications (manager recording a quick update for a team of three) where caption-on-request is a reasonable accommodation. This is a defensible scope for the informal tail of video communication at most organizations, but the scope must be explicitly defined and documented — "we caption on request for videos below X-person audience" rather than "we caption what we get around to."
Most organizations starting a structured employee communications caption programme should focus on Tier 1 and build from there. Attempting to caption all four tiers simultaneously is typically not operationally feasible in the first twelve months and generates quality problems when the workflow is stretched beyond its capacity. Document the tiers in the internal captioning governance policy — the scope definition is the clause that determines which content the caption obligation applies to in any future investigation or accommodation request.
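One way to make the tier definitions unambiguous in the governance policy is to express them as a rule that anyone can apply to a given video. The sketch below is an assumption-laden illustration: the function name and the numeric cutoffs (21 and 10) are hypothetical placeholders standing in for the "X-person audience" value each organization has to choose for itself.

```python
def caption_tier(audience_size, company_wide, tier2_min=21, tier3_min=10):
    """Map a video to a caption tier.

    tier2_min and tier3_min are illustrative thresholds, not values
    taken from any standard; set them in the governance policy.
    """
    if company_wide:
        return 1  # always caption
    if audience_size >= tier2_min:
        return 2  # caption before distribution
    if audience_size >= tier3_min:
        return 3  # caption within 24 hours of distribution
    return 4      # caption on accommodation request


print(caption_tier(audience_size=1200, company_wide=True))
print(caption_tier(audience_size=80, company_wide=False))
print(caption_tier(audience_size=15, company_wide=False))
print(caption_tier(audience_size=3, company_wide=False))
```

Writing the rule down this explicitly is the point: the thresholds can be debated, but once chosen they remove the case-by-case judgment call.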
RACI for the employee communications caption programme
The ownership structure for the employee communications caption programme spans three to four teams. A workable RACI:
| Activity | L&D / Caption Programme | Corporate Comms | HR / People | IT |
|---|---|---|---|---|
| Define content scope and tiers | Accountable | Consulted | Consulted | Informed |
| Maintain captioning vendor relationship | Accountable | Informed | Informed | Informed |
| Maintain organizational glossary foundation layer | Accountable | Consulted (quarterly vocabulary update) | Consulted (benefits/people vocabulary) | Informed |
| Submit Tier 1 all-hands recordings to caption workflow | Responsible | Responsible (notifies L&D of new recording) | Informed | Informed |
| Submit exec video messages to caption workflow | Responsible | Responsible (notifies L&D of new recording) | Informed | Informed |
| Submit benefits and HR communications videos | Responsible | Informed | Responsible (notifies L&D of new recording) | Informed |
| Upload validated caption tracks to distribution platforms | Responsible | Informed | Informed | Consulted (platform access) |
| Run DCMP quality spot-check | Accountable | Informed | Informed | Informed |
| Maintain compliance documentation and audit trail | Accountable | Informed | Informed | Informed |
| Update quarterly vocabulary for all-hands | Informed | Responsible | Informed | Informed |
| Update benefits vocabulary at annual enrollment | Informed | Informed | Responsible | Informed |
The notification and handoff protocol
The most critical workflow element — and the one most frequently absent in organizations that have a caption infrastructure but don't use it for employee communications video — is the notification and handoff protocol. When corporate communications produces an executive video message, how does L&D (or the caption programme manager) know about it in time to caption it before distribution? The answer must be a defined protocol, not an ad-hoc email. Options by organizational maturity:
Simple (low volume, small team): A shared Slack channel or Teams channel where corporate communications posts "New video for captioning" with a link to the recording and the intended distribution date. L&D picks up the job and responds in the channel when the caption track is uploaded. This works for organizations producing two to five employee communications videos per month.
Structured (medium volume): A shared intake form or project management workflow (Asana, Monday.com, Notion database) where corporate communications and HR submit new video jobs with metadata (title, recording link, intended distribution date, audience tier, vocabulary notes). L&D or the caption programme manager processes jobs from the queue and updates status. This works for organizations producing six to twenty employee communications videos per month and provides the audit trail documentation that the compliance reporting framework needs.
Automated (high volume or distributed production): A Zapier, Make, or Power Automate flow that monitors the Vimeo account or SharePoint library for new uploads from designated corporate communications accounts and automatically creates a caption job in the workflow system. This is appropriate for large organizations with high employee communications video volume and is the same trigger-based architecture described in the accessibility coordinator playbook for publication-gate enforcement.
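Whichever option is chosen, the handoff carries the same metadata. The sketch below shows one possible shape for an intake record and a due-time rule derived from the tier framework. The class name, field names, and example URL are hypothetical and are not tied to any specific intake tool or automation platform.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class CaptionJob:
    """Hypothetical intake record mirroring the intake-form metadata."""
    title: str
    recording_url: str
    distribution_at: datetime
    tier: int
    vocabulary_notes: List[str] = field(default_factory=list)
    status: str = "queued"


def caption_due(job: CaptionJob) -> Optional[datetime]:
    """Return when the validated caption track is due, or None for Tier 4."""
    if job.tier in (1, 2):
        return job.distribution_at  # captions must exist at distribution
    if job.tier == 3:
        return job.distribution_at + timedelta(hours=24)
    return None  # Tier 4: captioned on accommodation request


job = CaptionJob(
    title="Q2 all-hands",
    recording_url="https://example.com/recordings/q2-all-hands",  # placeholder
    distribution_at=datetime(2026, 6, 12, 17, 0),
    tier=1,
    vocabulary_notes=["new product name announced in the CEO segment"],
)
print(caption_due(job))
```

Storing the due time alongside the status gives the audit trail a timestamp to compare against the actual upload time.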
Remediating the existing archive
Organizations building a new employee communications caption programme will almost certainly have an existing archive of uncaptioned all-hands recordings, executive video messages, and HR communications. The remediation approach mirrors the training library remediation framework in the LMS audit methodology: inventory the archive, triage by compliance priority, caption in priority order, document the remediation plan and completion dates. Specific to employee communications archives:
- Prioritize by recency and audience size: a company-wide all-hands recording from six months ago that managers still refer employees to ("watch the January all-hands if you missed it") is higher priority than a three-year-old town hall that has had zero recent access.
- Prioritize by content sensitivity: recordings that discussed restructurings, layoffs, significant benefit changes, or organizational decisions that affected specific employees' employment conditions have higher effective-communication priority than general strategy updates.
- For recordings more than two years old with no recent access, document them as "backlog — not yet captioned" in the compliance log and prioritize fresh new-content captioning. Attempting to retroactively caption every historical employee communications recording before building a workflow for new content inverts the priority — the forward-going compliance gap is the more immediate risk.
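The triage criteria above can be sketched as a small scoring function. The weights, field names, and the 365-day recency window are illustrative assumptions chosen for this example. Only the two-year, no-recent-access backlog rule comes directly from the list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchivedRecording:
    title: str
    recorded_on: date
    company_wide: bool
    sensitive: bool      # restructuring, layoffs, significant benefit changes
    recent_views: int    # views in whatever "recent" window the platform reports


def triage(rec, today):
    """Return ("backlog", 0) or ("remediate", score); higher score = sooner."""
    age_days = (today - rec.recorded_on).days
    if age_days > 2 * 365 and rec.recent_views == 0:
        return ("backlog", 0)
    score = 0
    if rec.company_wide:
        score += 2  # assumed weight
    if rec.sensitive:
        score += 2  # assumed weight
    if rec.recent_views > 0:
        score += 1
    if age_days <= 365:
        score += 1
    return ("remediate", score)


today = date(2026, 6, 13)
all_hands = ArchivedRecording("January all-hands", date(2026, 1, 15), True, False, 12)
old_town_hall = ArchivedRecording("2023 town hall", date(2023, 3, 1), True, False, 0)

print(triage(all_hands, today))
print(triage(old_town_hall, today))
```

Sorting the "remediate" group by score in descending order yields a captioning queue, and the "backlog" group becomes the list recorded in the compliance log.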
Eight failure modes in employee communications video caption programmes
1. Live platform auto-captions treated as compliant for the recording
The most common and most consequential failure mode. An organization enables Zoom live captions or Teams live transcription for its all-hands events, considers the accessibility obligation discharged, and posts the recording to the intranet without adding a validated caption track. The employees who attended live had imperfect (80–88% accuracy) caption access during the event; the employees who watch the recording after the event have no caption access at all. This failure is invisible in routine operations — the all-hands is run with captions (live), the recording is posted to the intranet (where many employees access it), and no one in the production chain notices that the recording has no captions. It surfaces when an employee with hearing loss files an accommodation request or an EEOC charge and the investigation reveals that every all-hands recording in the archive is uncaptioned. The fix: establish an explicit policy that live captions do not satisfy the effective communication obligation for the recording, and implement the post-event caption workflow described in the all-hands section above.
2. No content scope definition for employee communications
The L&D caption programme has a well-defined content scope: content published to the LMS. The employee communications caption programme at most organizations has no defined scope: there is no policy document that says "company-wide all-hands recordings require captions" and "manager recordings to teams of fewer than ten do not." Without a scope definition, every individual judgement call about whether a video needs captions defaults to "no" (because captioning requires effort and there is no defined obligation that says yes). The result over time: some videos get captioned (typically when an accessibility-conscious communications manager happens to be producing a specific video) and most don't, with no systematic coverage. The fix: define and document the content scope in the governance policy before building the workflow, not after. The tier framework described in the programme-building section is a workable starting structure.
3. Corporate communications receives no caption feedback loop
Even when a caption programme is established for employee communications video, corporate communications producers often have no visibility into caption accuracy issues. If an executive's name is consistently being transcribed incorrectly in all-hands captions, the caption programme manager may notice and correct it — but if there is no feedback channel from the caption programme to corporate communications, the communications team continues producing content with the same issue and the same correction has to be made on every recording. The fix: establish a quarterly feedback meeting between the caption programme manager and the corporate communications lead, covering the top-five vocabulary errors from the previous quarter's employee communications captions and the glossary updates needed before the next all-hands. This is the organizational equivalent of the accuracy feedback loop that training content teams use to compound glossary accuracy over time.
4. Benefits video captioned at lower quality than training video
Organizations that have a systematic caption programme for L&D training content often have an ad-hoc or absent caption programme for HR benefits videos. The benefits video is produced by a different team (HR/People), reviewed by a different reviewer (benefits specialist, not L&D instructional designer), and distributed through a different channel (benefits portal, HR intranet page, open enrollment email), none of which is connected to the L&D caption workflow. The result: the organization's employee-facing video library has WCAG-compliant captions for compliance training and no captions (or only auto-captions) for benefits explanations. The compliance risk is symmetric: an employee with hearing loss who cannot access the benefits open enrollment video has the same ADA Title I effective communication claim as an employee who cannot access a compliance training module. The fix: explicitly include HR/People communications in the employee communications caption programme scope, with People/HR as the notification owner for benefits video submissions.
5. Organizational vocabulary not updated before the quarterly all-hands
The glossary-corrected caption workflow produces high accuracy on the vocabulary that was in the glossary when the captioning ran. It produces general-model accuracy (80–88%) on vocabulary that was introduced in the current quarter and not yet added to the glossary. If the quarterly all-hands introduces a new product name, a renamed financial metric, or a new organizational entity name — which is typical for quarterly all-hands events — and the glossary has not been updated with that vocabulary in advance, the caption for the all-hands will have the new terms transcribed incorrectly throughout. The fix: establish the pre-event vocabulary sweep as a mandatory step in the all-hands production calendar, not an optional enhancement. Two weeks before the quarterly all-hands, the caption programme manager reviews the draft all-hands agenda and presentation deck with the corporate communications lead, identifies new terminology, and updates the captioning glossary before the event recording is processed.
6. Distributed executive Loom library growing as an uncaptioned archive
Executives and managers at many organizations have adopted Loom for quick async updates, team communications, and informal messages. These Loom recordings accumulate in workspace libraries without routing through any caption programme — Loom's auto-captions are present but at general-model accuracy (not WCAG-compliant), and the recordings are typically shared via Slack or email without any caption review step. Over eighteen to twenty-four months, a large organization may accumulate hundreds of uncaptioned or auto-captioned-only Loom recordings representing employee communications that were never accessibly delivered. The failure is invisible until an accommodation request or compliance audit surfaces it. The fix: implement a workspace-level Loom policy requiring that any Loom video shared with more than a named individual (i.e., shared to a team channel, posted to an all-staff Slack channel, or emailed all-staff) is submitted to the caption workflow before distribution, or within twenty-four hours of distribution for time-sensitive informal messages.
7. No caption programme for crisis communications
Emergency communications (a CEO message about a data breach, an announcement of an unexpected leadership change, an HR notification about a workplace incident, an all-staff communication about a site closure) have the shortest production-to-distribution timelines of any employee communications video category and the highest combined urgency and compliance sensitivity. An organization that has a working caption programme for routine all-hands recordings frequently has no plan for captioning a crisis communication that needs to go out within hours of a trigger event. The failure: the crisis communication is distributed uncaptioned because the normal caption workflow takes longer than the available time, and the employees most affected by the communication (who may include hearing-disabled employees whose employment is directly affected by the crisis) cannot access it. The fix: define a crisis communications caption path in advance, typically a CART service on retainer that can be activated for rapid live captioning of an urgent recording, or an on-call caption review process that can turn around a validated SRT within two to three hours for a fifteen-minute crisis communication video.
8. Caption programme scope not reviewed as employee communications video volume grows
An employee communications caption programme that was adequate when an organization produced ten videos per month may be inadequate when the same organization produces thirty videos per month after moving to video-first internal communications. The workflow that handled a small volume of Tier 1 company-wide communications starts missing Tier 2 department-level communications as production volume grows. The glossary that covered the vocabulary of a stable product line starts missing terms as the product portfolio expands. The quarterly vocabulary sweep that was sufficient for a single quarterly all-hands becomes inadequate when the organization moves to monthly all-hands plus weekly leadership video updates. The fix: include the employee communications caption programme in the annual compliance programme review described in the compliance programme build post — specifically, review programme capacity against actual production volume, and flag any categories of employee communications video that have grown outside the current scope definition. Update the scope definition and workflow capacity before the compliance gap widens.
FAQ: captioning employee communications video
Does ADA Title I actually require us to caption all-hands recordings and executive video messages, or only training video?
ADA Title I's effective communication obligation extends to employer communications broadly, not just to formal training content. The statute requires employers to provide effective communication to qualified individuals with disabilities across all terms, conditions, and privileges of employment — and "conditions of employment" encompasses the information an employer communicates to employees about organizational direction, personnel decisions, compensation and benefits, and operational policies. An all-hands recording that describes a reorganization affecting reporting structures is communicating information that is directly relevant to the conditions of employment for the employees involved. A benefits open enrollment video is communicating information about a significant term of employment. The EEOC's guidance on reasonable accommodation for hearing disabilities identifies synchronized captions as an appropriate form of effective communication for video content that employees are expected to engage with. The practical scope question is where to draw the informal-communication line: company-wide communications (all-hands, town halls, all-staff emails with embedded video) are clearly in scope; one-on-one informal manager messages are arguably not. The governance policy should define that line explicitly rather than leaving it to case-by-case judgment. The organizations with the lowest compliance exposure are those that have made an explicit scoping decision, documented it, and built a programme around it — not necessarily the ones that have captioned every video.
Our all-hands uses Zoom Live Captions during the event. Doesn't that mean we've already addressed the accessibility requirement?
Live captions during the event address the accessibility obligation for employees who attend live, with the caveat that Zoom Live Captions achieve approximately 80–88% word-level accuracy on general executive speech and substantially lower accuracy on organizational vocabulary. That is below the 99% accuracy benchmark commonly applied to captions provided under WCAG SC 1.2.2 (the success criterion itself requires captions for prerecorded content but does not state a numeric threshold). More importantly, the live captions do not carry over to the recording. The Zoom cloud recording is generated from the video and audio tracks captured during the session; the live caption display is session-state data that exists only during the live event and is not written to the cloud recording. Employees who watch the recording after the event (a common situation for employees in different time zones, employees who were in conflicting meetings, and employees who want to review specific segments) have no captions unless a separate validated caption track is uploaded to the recording. The EEOC's effective communication standard applies to all employees' access to the communication, not just those who happened to attend live. The recording needs a validated SRT/VTT uploaded before distribution to meet the effective communication standard for the employees who access it after the event.
Who should own the caption programme for employee communications video — L&D, corporate communications, or HR?
In most organizations, the operationally correct owner is L&D or the accessibility coordinator role that already owns the training content caption programme — because they have the captioning vendor relationship, the glossary infrastructure, the DCMP quality standard knowledge, and the compliance documentation practice. However, the content producers — corporate communications and HR — must have defined responsibilities for submission notification and vocabulary updates, because the caption programme owner cannot know about a new executive video message unless the communications team tells them about it. The RACI in the programme-building section describes the practical ownership structure: L&D accountable for the captioning infrastructure and quality standard; corporate communications and HR responsible for submitting content and vocabulary updates on schedule; nobody able to opt out of the workflow because the content scope is defined in the governance policy rather than being subject to individual judgement. In organizations where L&D has explicitly scoped out employee communications content, a second option is for corporate communications to own the caption programme for its content category, with L&D in a consulting role providing vendor recommendations and quality standards. This works if corporate communications has the programme management capacity to run a caption workflow; it often doesn't in smaller or leaner organizations.
What's the right caption turnaround time for a quarterly all-hands recording?
The target that balances compliance obligation against practical production timelines for most organizations: validated captions uploaded within four to eight hours of the recording becoming available, with same-day distribution of the captioned recording. This means: the all-hands ends at noon, the recording is available in Zoom cloud storage at 12:30 pm, the recording is submitted to the caption workflow at 12:45 pm, a validated corrected SRT is returned by 4 pm (three hours for a 90-minute all-hands at a typical vendor turnaround), the caption track is uploaded and verified by 4:30 pm, the captioned recording link is distributed to all-staff by 5 pm the same day. Organizations using GlossCap with a pre-loaded glossary for quarterly vocabulary can typically run the audio through glossary-corrected ASR in under thirty minutes and use that as the basis for a light human review step (thirty to sixty additional minutes) rather than a full correction run — bringing the total caption workflow time for a 90-minute all-hands to approximately ninety minutes to two hours, enabling same-afternoon distribution. The alternative framing that some organizations use: publish the recording to the intranet immediately with a "captions coming by [time]" banner for employees who need them, then update the posting with the validated caption track when it is ready. This acknowledges the timing tension explicitly rather than hiding it, and ensures employees who need captions are not left without any notification of when accessible access will be available.
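The same-day schedule in the example above is simple clock arithmetic, and it is worth computing explicitly when planning around a specific event end time. The sketch below reproduces the noon-to-5 pm example. The step durations are taken from the clock times in the text, and the function name and calendar date are illustrative.

```python
from datetime import datetime, timedelta


def caption_timeline(event_end, steps):
    """Accumulate step durations (in minutes) into clock times."""
    current = event_end
    timeline = []
    for name, minutes in steps:
        current = current + timedelta(minutes=minutes)
        timeline.append((name, current))
    return timeline


steps = [
    ("recording available in cloud storage", 30),
    ("submitted to caption workflow", 15),
    ("validated SRT returned", 195),  # 12:45 pm to 4:00 pm in the example
    ("caption track uploaded and verified", 30),
    ("captioned link distributed to all staff", 30),
]

event_end = datetime(2026, 6, 12, 12, 0)  # illustrative date; all-hands ends at noon
for name, when in caption_timeline(event_end, steps):
    print(when.strftime("%H:%M"), name)
```

Swapping in a later event end time or a longer vendor turnaround shows immediately whether same-day distribution is still achievable or whether the "captions coming by [time]" banner is the more honest option.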
We use a professional CART captioner for our all-hands live event. Do we still need to run the recording through a separate caption workflow?
Yes, but the workflow is much lighter. The CART transcript, captured at 97–99% accuracy during the live event, is an excellent starting point for the recording caption workflow and significantly reduces the correction work needed. The CART output is typically provided as a text transcript rather than a timed SRT/VTT file; the additional step required is timing calibration: the CART text needs to be segmented into caption events (one to two lines per event, typically fourteen to seventeen words per segment) and timed to match the audio in the recording. This is a substantially lighter workflow than correcting an 80–88% accuracy auto-caption from scratch: the vocabulary is already correct at 97–99%, so the work is formatting and timing. Tools like Subtitle Edit, Aegisub, or captioning platform timing editors allow the CART transcript to be loaded as a text file and timed against the recording audio in a semi-automated process (auto-timing based on audio waveform plus manual verification of timing precision). For a 90-minute all-hands, timing calibration of a CART transcript takes 45–75 minutes compared to two to three hours for full correction of an 80–88% accuracy auto-caption. The CART approach saves time on the recording workflow and produces a higher-quality result: the investment in CART for the live event also accelerates the post-event recording caption workflow.
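The segmentation and formatting half of that step can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, the seventeen-word cap is the upper end of the range mentioned above, and the start and end times are supplied by hand here because aligning text to audio is what the timing tools do.

```python
def segment_transcript(text, max_words=17):
    """Split transcript text into caption-sized chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(events):
    """Render (start_seconds, end_seconds, text) tuples as SRT cue blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(events, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)


segments = segment_transcript("word " * 40)
print([len(s.split()) for s in segments])
print(to_srt([(0.0, 2.5, "Welcome to the all-hands.")]))
```

A production segmenter would break on sentence and clause boundaries rather than a fixed word count. The fixed cap is used here only to keep the sketch short.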
Do we need to caption video content that employees can only access by calling in to an earnings call or investor day, if we also share that recording internally?
Yes, once an earnings call or investor day recording is shared internally as an employee communication, it carries the same ADA Title I effective communication obligation as any other employee communications video. The external investor relations obligation and the internal employee communications obligation are separate: the investor-facing streaming of the earnings call has its own accessibility considerations under the CVAA and SEC-related guidance; the internal distribution of that same recording to employees activates the ADA Title I effective communication obligation for the employee audience. The practical workflow: most investor relations departments already caption the investor-facing earnings call (either with CART for high-quality live captioning or with the earnings transcript as a validated caption base — the earnings call transcript is already produced for investor relations purposes and typically covers 90%+ of the relevant content). The captioned investor-facing recording can be used as the starting point for the internal employee distribution, with a light review pass to confirm that the caption file is in SRT/VTT format suitable for the internal hosting platform and that the language code is correctly set. If the external earnings call caption was produced at 99%+ accuracy (CART or earnings-transcript-based), the same caption file can typically be distributed internally without additional correction. The organizational vocabulary in earnings calls (financial metrics, product names, segment definitions) is exactly the vocabulary category that benefits most from the glossary-corrected approach — earnings-call vocabulary is the highest-density financial and organizational proper-noun content category in employee communications.
We're a startup with twelve employees. Do we need a formal employee communications caption programme?
At twelve employees, you are just below the ADA Title I coverage threshold (fifteen employees is the statutory cutoff). Headcount can cross that line quickly, and many state and local disability laws apply to smaller employers, so it is sensible to plan as if you were covered. More practically: if you currently have no employees with hearing disabilities and no accommodation requests, a formal employee communications caption programme is not urgent. If you have any employee with a documented hearing disability or any accommodation request for captioned communications, treat the need as immediate. The pragmatic approach for a small organization: establish the practice of enabling Zoom or Teams auto-captions for all internal events (low friction, improves accessibility immediately even at 80–88% accuracy), build the organizational glossary now (while the company vocabulary is small and manageable), and establish a simple notification-and-caption process for any formal company-wide video communications (the CEO all-hands, the quarterly update). The operational overhead is low at twelve employees: the quarterly all-hands video is probably thirty minutes and can be corrected with a validated glossary pass in under an hour. Now, before captioning becomes a compliance requirement or a hearing-disabled employee joins the team, is the lowest-friction time to establish the practice. The compliance programme build post covers the lightweight programme architecture that works at small organizational scale.
Close the employee communications caption gap with GlossCap
GlossCap's glossary-biased captioning workflow handles the full scope of employee communications video — all-hands recordings from Zoom or Teams, executive video messages from Loom or Vimeo, benefits explanation videos from HR, and policy announcement recordings — with the same organizational glossary that your L&D training caption programme uses. Upload the recording, run glossary-biased Whisper-large processing against your organization's current vocabulary (including quarterly-cycle updates for financial and strategic terminology), and receive a validated SRT/VTT file ready to upload to Zoom, Stream, Vimeo, or Wistia within hours of the recording being available. The Team plan ($99/month) includes 30 hours of video per month and Notion/Confluence/Google Docs glossary sync — workable for most organizations' combined L&D and employee communications video volume. The Org plan ($299/month) includes unlimited hours, SSO, custom glossary model, and LMS webhooks for organizations that need a fully automated submission-to-distribution pipeline for both training and employee communications video.