Frontline Training Operations · Published 2026-06-05
Microlearning captions for frontline workers: TalentCards, EdApp, Axonify, and OSHA-vocabulary accuracy
Frontline worker training has a caption problem that is structurally different from every other L&D context, and the difference begins with the delivery environment. In a corporate learning scenario, a compliance professional watches a 20-minute onboarding module at a desktop workstation with headphones in a quiet office. Audio is the primary delivery channel. Captions are an accommodation for workers who are deaf or hard of hearing — important, legally required, but serving a minority of the learner population. In a manufacturing plant, a warehouse, or a construction site, ambient noise routinely reaches 80–90 decibels, approaching or exceeding OSHA's 90 dB(A) permissible exposure limit for an 8-hour time-weighted average and well above the threshold where speech intelligibility degrades for a listener without hearing protection. Workers on the shop floor during a break, in the cab of a forklift, or at a loading dock during a brief lull in vehicle traffic cannot reliably hear training audio even with their device at maximum volume. NIOSH estimates that 17% of manufacturing workers have occupational noise-induced hearing loss. In frontline environments, captions are not an accommodation for a minority of workers — they are the primary content delivery channel for a substantial fraction of every training audience. The caption operation that treats frontline microlearning as "just shorter courses with the usual caption workflow" will build a system optimized for the wrong assumption.
The second structural difference is the delivery architecture. Microlearning for frontline workers does not arrive via a desktop LMS portal that workers log into at a scheduled time. TalentCards delivers via push notification to a mobile app — a worker's phone buzzes at 7:30 AM with a 3-minute LOTO refresher before their shift starts. EdApp sends a daily learning prompt 20 minutes before clock-in. Axonify's adaptive algorithm queues 3–5 minutes of knowledge reinforcement for workers to complete as they badge in for the day. In all three architectures, the training moment is brief, mobile, and often offline — content is downloaded to the device ahead of time and played from local storage because the factory floor has no reliable Wi-Fi. If a caption file is not bundled into the downloaded content package at the moment the content is published, it does not exist for the worker who already downloaded it. There is no second-chance delivery path. The caption operation that manages files at the wrong level — deck rather than card in TalentCards, after publish rather than before in EdApp's offline workflow — will produce content that reaches workers without captions regardless of whether the caption file was eventually uploaded.
The third structural difference is vocabulary density. Manufacturing and safety training captions carry the highest density of specialized regulatory terminology per minute of any training content type. A 3-minute lockout/tagout module contains 29 CFR § 1910.147 citation text, terms like "energy-isolation point," "zero-energy state," "authorized employee versus affected employee," "hasp," and "lockout device" — terms that Whisper-large encountered essentially zero times in the context of a safety training voice recording. Our benchmark research documents 85.8% baseline accuracy for manufacturing/EHS content on 15-minute modules. For 3-minute microlearning clips, the context window for disambiguation is compressed: when a regulatory term appears once in the first 15 seconds of a 90-second segment, the model has minimal surrounding speech to resolve an ambiguous phoneme sequence. The result is a baseline word accuracy of 83–87% (a word error rate of 13–17%) on OSHA-regulatory microlearning content — the worst accuracy category across every L&D vertical — and, without a targeted glossary, a pattern where the compliance-critical terms are consistently the ones transcribed incorrectly. "Lockout/tagout" becomes "look out tech cut." "PAPR" becomes "paper." "29 CFR § 1910.147" becomes "twenty-nine CFR section nineteen ten one forty-seven" when it needs to read as the citation format, not the spoken cardinal version.
This post is the operational guide to building a caption workflow for frontline microlearning that handles all three structural differences. It covers the platform-specific upload architectures for TalentCards, EdApp (now SC Training), and Axonify; OSHA-vocabulary accuracy benchmarks by content type; the compliance framework under OSHA, ADA Title I, Section 508, MSHA, and DOT FMCSA; the two-tier production model for regulatory and general safety content; volume math for back-catalogue retrofits; and the eight failure modes that cause frontline caption operations to break silently. The companion posts — captioning HazCom SDS content, the 15 categories of proper nouns that break auto-captions, caption QA methodology, and Whisper accuracy benchmarks by vertical — cover the underlying accuracy and QA infrastructure. This post focuses on what changes when you apply that infrastructure specifically to frontline microlearning platforms and OSHA-regulated content.
TL;DR — three things that matter about frontline microlearning captions
- In frontline environments, captions are the primary delivery channel, not an accessibility accommodation. Factory floor, warehouse, and construction site ambient noise routinely exceeds 80 dB, degrading speech intelligibility for workers without hearing protection — which is nearly everyone during active production. The caption operation that treats captions as an edge-case accommodation will consistently under-invest in the infrastructure (glossary coverage, offline bundling, per-card upload) that makes captions actually available in the environment where training happens. Build for the noise floor. Captions are load-bearing.
- OSHA vocabulary in short clips creates the hardest ASR problem in L&D. Manufacturing and EHS content sits at the bottom of the baseline accuracy table across all verticals, and the short-clip format of microlearning removes the contextual buffer that helps Whisper resolve ambiguous phoneme sequences. A 60-term targeted glossary — covering regulatory citation formats, equipment-specific LOTO terms, chemical names from the plant's SDS library, and PPE abbreviation expansions — closes the accuracy gap from 83–87% baseline to 99%+ on the terms that matter most for OSHA compliance documentation. Without the glossary, the modules with the highest regulatory consequence are the ones with the worst caption accuracy.
- TalentCards, EdApp, and Axonify have different caption architectures that require different operational procedures. TalentCards requires per-card SRT upload — a deck does not have a single caption track. EdApp requires Creator role (not Author) and captions must be uploaded before the lesson is published for offline delivery. Axonify's spaced-repetition model means the same video is delivered 10–12 times over months; a caption file update after initial publication does not retroactively update the cached version on workers' devices. Understanding these platform-specific behaviors is the difference between a caption operation that works and one that looks complete in the admin panel while delivering uncaptioned content on the shop floor.
What makes frontline microlearning different from standard L&D training
Microlearning for frontline workers is not "shorter courses." It is a fundamentally different delivery architecture, built around different assumptions about where, when, how, and under what conditions workers engage with training content. Understanding the structural differences is the prerequisite for building a caption operation that works in practice.
The noise floor: captions as the primary channel
OSHA's permissible noise exposure limit under 29 CFR § 1910.95 is 90 dB(A) as an 8-hour time-weighted average. Environments with ambient noise consistently above 80 dB — which includes most production floors, many warehouse receiving areas, construction sites, and vehicle dispatch yards — degrade speech intelligibility enough that audio-only training delivery is unreliable. Workers wearing ear protection on the production floor cannot hear training audio at all without removing PPE. Workers on a 10-minute break in a noisy common area face ambient noise that competes directly with device audio output.
This is not primarily an accessibility question, though it intersects with accessibility compliance. It is an instructional effectiveness question. An organization that deploys a 3-minute LOTO refresher module on a quarterly cadence and sets captions off-by-default has built a training program that is structurally inaccessible to 30–50% of its intended audience on any given delivery day, depending on where and when workers engage with the content. The training records will show completion. The workers will have clicked through the module. The actual content transfer will not have occurred for the workers who could not hear the audio. When OSHA's compliance officer asks whether training was "effective" under 29 CFR § 1910.147(c)(7)(i), completion records are not a sufficient answer.
The appropriate default for frontline microlearning is captions on. Not optional. Not toggle-accessible. On by default, with an opt-out path for workers who prefer audio-only in a quiet environment. The platform default should reflect the deployment environment, not the platform vendor's assumption that learners are at a desktop with headphones.
Push-notification delivery and the one-shot window
TalentCards, EdApp, and Axonify all use push-notification-triggered training delivery. Content is queued to arrive at a specific time — typically 15–30 minutes before a worker's shift starts — and the training window is the brief period between notification and clock-in. In practice, workers open the notification, complete the 3-minute module in the break room or the locker area, and badge in. The training happens once. There is no "I'll watch that with the captions on when I'm at my desk later" path because the training has already been recorded as completed.
This delivery pattern means the caption file must be present and correct at the moment the content is first consumed. There is no second delivery. If a worker with hearing loss — or a worker on a noisy floor — receives the first delivery of a new LOTO refresher module without captions, the training event has happened without effective content transfer, and the completion record will reflect a completed non-event. Fixing the caption file the day after the first batch delivery corrects the file but does not correct the training gap for workers who already completed the module.
Module length and ASR context compression
Whisper's decoder architecture uses contextual information from surrounding audio to resolve ambiguous phoneme sequences. In a 30-minute training module, a regulatory citation that appears in minute 2 is surrounded by 28 minutes of vocabulary context that helps the model maintain accuracy for related terms throughout the module. In a 3-minute microlearning clip, a term like "authorized employee" appearing in the first 30 seconds has approximately 20 words of prior context — not enough to establish the regulatory domain and bias the decoder toward OSHA-specific vocabulary.
The practical consequence: baseline word error rate on 3-minute OSHA-regulatory clips runs about 1–2 percentage points higher than the same vocabulary in a 30-minute module. A glossary closes this gap more completely than model size — as the benchmark research documents, a 60-term targeted glossary outperforms 3 Whisper model tiers on domain-specific vocabulary. But the glossary must be sized for the specific content type, not borrowed from a general L&D installation.
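The post does not specify how the glossary is injected. One common approach with the open-source `whisper` Python package is to pass the terms through the `initial_prompt` argument of `transcribe`. The sketch below only builds that prompt string; the function name `build_glossary_prompt`, the prefix text, and the 150-word budget are illustrative assumptions, chosen because the prompt window is limited and terms listed late may be dropped.

```python
def build_glossary_prompt(terms, max_words=150):
    """Join glossary terms into one prompt string, highest priority first.

    Stops adding terms once the (assumed) word budget would be exceeded,
    so the most important terms should come first in `terms`.
    """
    kept, used = [], 0
    for term in terms:
        n_words = len(term.split())
        if used + n_words > max_words:
            break
        kept.append(term)
        used += n_words
    return "Safety training glossary: " + ", ".join(kept) + "."


prompt = build_glossary_prompt([
    "lockout/tagout",
    "authorized employee",
    "affected employee",
    "zero-energy state",
    "hasp",
    "29 CFR § 1910.147",
])
# With the open-source package, the string could then be used as:
#   whisper.load_model("large-v3").transcribe("clip.mp3", initial_prompt=prompt)
```

Ordering the list by compliance weight matters more than length here: a term that falls outside the budget gets no bias at all.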
Offline-first architecture
Factory floors, warehouses, and construction sites have inconsistent wireless connectivity. TalentCards and EdApp are built for offline-first delivery: workers download content to their devices when on Wi-Fi (typically at home or in a break room), and the platform serves content from local storage when cellular data is unavailable or unreliable on-site. Axonify's delivery model is primarily online but supports offline queuing in some deployment configurations.
The offline-first architecture creates a timing constraint that does not exist in desktop LMS delivery: caption files must be bundled into the offline download package at the time the content is downloaded, not when it is played. If a caption file is uploaded after a worker has already downloaded the content, the worker will not see the caption on their device until they explicitly re-sync the content over a Wi-Fi connection. In practice, re-syncing a downloaded content library does not happen automatically during a worker's normal device-usage pattern, particularly in environments where the only available Wi-Fi is in the break room during a 15-minute window. Captions uploaded after initial publication reach newly downloading workers but not workers who already have the content cached locally.
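The timing constraint can be made concrete by comparing each worker's last download time against the caption upload time. This is a minimal sketch over hypothetical data: the worker IDs and timestamps are invented, and it assumes download times can be exported from the platform, which the post does not confirm.

```python
from datetime import datetime


def workers_with_stale_cache(downloads, caption_uploaded_at):
    """Return worker IDs whose last download predates the caption upload."""
    return sorted(
        worker
        for worker, downloaded_at in downloads.items()
        if downloaded_at < caption_uploaded_at
    )


# Hypothetical export: worker ID -> last time the content was downloaded.
downloads = {
    "w-101": datetime(2026, 6, 1, 6, 45),
    "w-102": datetime(2026, 6, 3, 14, 10),
    "w-103": datetime(2026, 6, 1, 7, 5),
}
caption_uploaded_at = datetime(2026, 6, 2, 9, 0)

stale = workers_with_stale_cache(downloads, caption_uploaded_at)
# stale lists the workers who still hold the uncaptioned copy.
```

The resulting list is the set of workers who need a forced re-sync before their next scheduled delivery.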
Short-module OSHA vocabulary density
A 3-minute module at typical safety-training narration pace (130–150 words/minute) contains approximately 390–450 words. In a LOTO refresher module, 15–25 of those words may be regulatory terms with zero or near-zero presence in Whisper's training data — IUPAC chemical names from the SDS library, specific OSHA citation formats, equipment model numbers, and PPE abbreviations specific to the facility. That is a proper-noun density of roughly 3–6% in a very short sample. Without glossary injection, the model resolves most of those terms incorrectly. The module sounds fluent and complete in audio form; the caption track quietly contains the wrong regulatory vocabulary for the terms that carry the compliance weight.
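The word-count and density figures follow from simple arithmetic on the narration pace and term counts quoted above. A small sketch, with illustrative function names, that works the bounds through:

```python
def word_count_range(minutes, wpm_low, wpm_high):
    """Lower and upper word counts for a clip at the given narration pace."""
    return minutes * wpm_low, minutes * wpm_high


def density_range(terms_low, terms_high, words_low, words_high):
    """Lowest density pairs the fewest terms with the longest script,
    highest density pairs the most terms with the shortest script."""
    return terms_low / words_high, terms_high / words_low


words_low, words_high = word_count_range(3, 130, 150)
d_low, d_high = density_range(15, 25, words_low, words_high)
print(f"{words_low}-{words_high} words, density {d_low:.1%} to {d_high:.1%}")
```

Even at the low end, a few percent of the transcript is vocabulary the model has rarely seen, and those words are the ones that carry the regulatory meaning.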
The HazCom captioning guide covers IUPAC systematic names and GHS hazard codes in detail. The sections below extend that analysis to LOTO, respiratory protection, fall protection, and transportation vocabulary — four OSHA content areas with distinct proper-noun profiles that a single "general safety glossary" will not cover adequately.
The OSHA vocabulary problem in microlearning content
Each OSHA-required training area has a specific vocabulary profile. A HazCom glossary built around IUPAC chemical names will not help a LOTO module. A LOTO glossary built around energy-isolation terminology will not help a respiratory protection module. Understanding the per-content-type vocabulary problem is the prerequisite for building a glossary that covers the specific modules in your training library.
Lockout/tagout vocabulary (29 CFR § 1910.147)
LOTO is one of the most frequently cited OSHA standards; it appears on OSHA's annual top-ten list of most frequently cited standards year after year. Training is required for both "authorized employees" (those who perform LOTO procedures) and "affected employees" (those who work in areas where LOTO procedures are performed). The distinction between authorized and affected employee is a compliance-critical term that appears in every LOTO training module and that Whisper consistently transcribes as a phonetically similar but semantically different phrase.
Key LOTO vocabulary that Whisper struggles with at default settings:
- Energy-isolation point — transcribed as "energy isolation point" (correct) or "energy aisle station point" (incorrect) depending on speaker accent and pacing
- Zero-energy state — transcribed as "zero energy state" (correct) or "zero energy straight" (incorrect)
- Hasp — a physical lockout device; transcribed as "hasp" (correct) or "has" or "asp" (incorrect) because the word is extremely rare in general English audio corpora
- Authorized employee versus affected employee — the distinction matters legally; Whisper often drops the "authorized vs. affected" distinction and transcribes both as "employee" after the first instance
- De-energize — transcribed correctly when the speaker enunciates clearly; transcribed as "re-energize" in about 15% of cases when the "de-" prefix is unstressed
- LOTO (the abbreviation itself) — transcribed as "lotto" approximately 60% of the time; if the acronym appears without expansion, the caption reads as if the training is about the lottery
- 29 CFR § 1910.147 — spoken as "twenty-nine CFR section nineteen ten point one forty-seven"; must be normalized to the citation format in the caption, not transcribed as spoken cardinals
A targeted LOTO glossary typically requires 18–25 terms covering: the official regulation number and common spoken variants, energy-source type names (electrical, hydraulic, pneumatic, thermal, chemical, gravitational), equipment-specific isolation procedure names from the facility's LOTO procedures, and the authorized/affected employee distinction.
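Some of the LOTO failures listed above are deterministic enough to correct with a text post-processing pass. The sketch below is one possible way to do that with regular expressions; the rule list `LOTO_FIXES` and the function `apply_fixes` are illustrative, and the rules cover only the spoken citation, the "look out tech cut" error, and the "lotto" error. A confusion such as "re-energize" for "de-energize" cannot be fixed this way because both are valid words, so it still needs human review.

```python
import re

# Illustrative rules only: (pattern, replacement), applied in order.
LOTO_FIXES = [
    # Spoken citation, with or without "point", to citation format.
    (
        re.compile(
            r"\btwenty[- ]nine CFR section nineteen ten "
            r"(?:point )?one forty[- ]seven\b",
            re.IGNORECASE,
        ),
        "29 CFR § 1910.147",
    ),
    (re.compile(r"\blook out tech cut\b", re.IGNORECASE), "lockout/tagout"),
    # Safe only inside LOTO modules, where "lotto" is never intended.
    (re.compile(r"\blotto\b", re.IGNORECASE), "LOTO"),
]


def apply_fixes(text, fixes=LOTO_FIXES):
    """Apply each correction rule to a caption cue or full transcript."""
    for pattern, replacement in fixes:
        text = pattern.sub(replacement, text)
    return text


fixed = apply_fixes(
    "Lotto is required under twenty-nine CFR section "
    "nineteen ten point one forty-seven."
)
```

Keeping the rules per content type mirrors the per-content-type glossary: a "lotto" rule that is safe in a LOTO deck would be wrong in general content.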
Respiratory protection vocabulary (29 CFR § 1910.134)
Respiratory protection training is required for any worker assigned to use a respirator. The training must cover respirator selection, use, and maintenance — all of which involve equipment-model terminology and NIOSH approval designations that are dense with abbreviations Whisper has never encountered in a training narration context.
High-failure vocabulary in respiratory protection training:
- PAPR (powered air-purifying respirator) — transcribed as "paper" in the majority of cases; the acronym sounds identical to a common English word
- N95, P100, R99 — filter designation codes; transcribed correctly about 70% of the time when spoken in isolation, but "N95" becomes "in 95" or "in nine five" when spoken quickly
- NIOSH-approved — transcribed as "NIOSH approved" (correct) or "nice approved" or "NIO-CH approved" depending on how the acronym is pronounced by the speaker
- OEL (occupational exposure limit) — transcribed as "OEL" (correct), "oil" (incorrect), or "O-E-L" (partially correct, wrong format)
- PEL-TWA (permissible exposure limit time-weighted average) — typically split incorrectly by Whisper because the compound abbreviation is unfamiliar; often becomes "PEL TWA" with incorrect spacing or "pow two way" in low-quality audio
- SCBA (self-contained breathing apparatus) — transcribed as "SCBA" (correct), "skyba" (incorrect), or "S-C-B-A" depending on the speaker's pronunciation habit
- Supplied-air respirator — generally transcribed correctly but often confused with "airline respirator" (which is a synonym) when both terms appear in the same module
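Homophone errors such as "paper" for "PAPR" cannot be corrected blindly, because the wrong word is also a legitimate word. One hedged approach is to replace it only when the same cue contains respirator context. The context word list and the function name `fix_papr` below are assumptions for illustration, and any cue the rule touches should still go to a reviewer, since a phrase like "paper filter" would be a false positive.

```python
import re

# Assumed context vocabulary; tune it to the facility's own modules.
RESPIRATOR_CONTEXT = re.compile(
    r"\b(respirator|filter|hood|battery|blower|cartridge|air[- ]purifying)\b",
    re.IGNORECASE,
)
PAPER = re.compile(r"\bpaper\b", re.IGNORECASE)


def fix_papr(cue_text):
    """Replace 'paper' with 'PAPR' only when respirator context is present."""
    if RESPIRATOR_CONTEXT.search(cue_text):
        return PAPER.sub("PAPR", cue_text)
    return cue_text


with_context = fix_papr("Check the paper battery before each shift.")
without_context = fix_papr("Sign the paper form at the front desk.")
```

The same pattern applies to other homophones in this list, such as "oil" for "OEL", with a different context vocabulary for each.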
Construction fall protection vocabulary (29 CFR § 1926.502)
Fall protection is the most-cited OSHA standard in construction (the construction safety training captions post covers the broader compliance context). Training must cover the three fall-protection systems — guardrail, safety net, and personal fall arrest — and the specific components of personal fall arrest systems carry dense abbreviations and compound nouns that Whisper handles poorly.
High-failure vocabulary:
- PFAS (personal fall arrest system) — transcribed as "pfas," "p-fas," "P-FOSS," or confused with the unrelated environmental acronym PFAS (per- and polyfluoroalkyl substances); context helps but doesn't reliably resolve it
- SRL (self-retracting lifeline) — transcribed as "SRL," "sell," "S-R-L," or "scroll" depending on pronunciation
- D-ring — the dorsal D-ring attachment point; transcribed as "D-ring" (correct), "the ring" (incorrect), or "drying" in low-quality audio
- Anchorage point — generally transcribed correctly when clearly spoken; confused with "anchorage" (the city) in low-context audio
- Shock-absorbing lanyard — "lanyard" transcribed as "lanard" or "land yard" in approximately 25% of cases
- Swing-fall hazard — transcribed as "swing fall hazard" (correct) or "swinging fall hazard" (incorrect; changes the technical meaning)
Transportation and FMCSA vocabulary (49 CFR Parts 380–392)
Transportation and logistics frontline training — for commercial drivers, warehouse dock workers, and fleet maintenance personnel — involves DOT FMCSA regulatory vocabulary that occupies a distinct phonetic territory from general English. The DOT/FMCSA caption requirements post covers the compliance context; the specific vocabulary problems in training content are:
- FMCSA (Federal Motor Carrier Safety Administration) — transcribed as "FMCSA" (correct), "fem ca" (incorrect), or "FM CSA" depending on speaker emphasis
- ELD (electronic logging device) — transcribed as "ELD" (correct) or "eld" (incorrect, reads as a noun fragment)
- HOS (hours of service) — transcribed as "HOS" (correct), "hose" (incorrect), or "H-O-S" depending on context; "HOS violations" often becomes "hose violations"
- CMV (commercial motor vehicle) — transcribed as "CMV" (correct) or "see em vee" (incorrect, spelled-out letters instead of abbreviation)
- DVIR (driver vehicle inspection report) — transcribed as "DVIR" (correct) or "D-VIR" or "D-veer" depending on how the trainer pronounces it
- Pre-trip and post-trip inspection — "pre-trip" transcribed as "pre trip" (formatting issue) or "pre chip" in noisy audio; the hyphenation difference affects searchability in LMS transcript search
Accuracy benchmarks by OSHA content type
The following benchmarks are from Whisper large-v3 on 3-minute microlearning clips with professional narration, reported as word accuracy (100% minus word error rate) on the full transcript. The manufacturing/EHS vertical benchmark in the benchmarks post documents 85.8% baseline accuracy on 15-minute modules. For 3-minute clips, reduced acoustic context produces slightly higher baseline error rates on the first occurrence of regulatory terms. The 60-term glossary column represents a targeted per-content-type glossary, not a single general-safety glossary.
| Content type | Key regulatory ref. | Baseline accuracy (3-min clip) | Accuracy, 30-term glossary | Accuracy, 60-term glossary |
|---|---|---|---|---|
| LOTO (lockout/tagout) | 29 CFR § 1910.147 | 84.6% | 94.1% | 98.9% |
| HazCom / SDS (IUPAC-heavy) | 29 CFR § 1910.1200 | 82.3% | 91.8% | 99.2% |
| Respiratory protection | 29 CFR § 1910.134 | 86.1% | 95.4% | 99.0% |
| Fall protection (construction) | 29 CFR § 1926.502 | 87.2% | 96.2% | 99.1% |
| Transportation / CDL | 49 CFR §§ 380–392 | 86.8% | 95.7% | 98.8% |
| General safety awareness | N/A | 90.1% | 96.8% | 99.0% |
HazCom content at 82.3% baseline is the lowest-accuracy category — driven by IUPAC systematic chemical names that appear rarely in general English audio corpora and carry embedded locant numerals that disrupt the model's sequence. See the HazCom captioning guide for the specific glossary methodology for SDS-intensive content.
The 30-term column reflects the accuracy gain from a basic safety glossary covering common PPE terms, OSHA regulation acronyms, and equipment-category names. The 60-term column reflects a content-type-specific glossary covering the full vocabulary profile of the module: all regulatory citation formats, equipment-specific LOTO terms (or chemical names for HazCom, or PPE abbreviation expansions for respiratory protection), and facility-specific proper nouns. At 60 targeted terms, four of the six content types reach the 99% accuracy level commonly used as the benchmark for WCAG 2.1 AA captions, and the other two, LOTO (98.9%) and transportation (98.8%), fall just short of it.
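For readers who want to reproduce this kind of measurement on their own clips, word accuracy can be computed as one minus the word error rate, where WER is the word-level edit distance divided by the reference length. The sketch below is a generic implementation of that standard definition, not the benchmark harness used for the table; it lowercases and splits on whitespace, and a real evaluation would need an agreed text-normalization step first.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the ref prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference, hypothesis):
    """Accuracy as reported in the table: 1 minus WER."""
    return 1.0 - word_error_rate(reference, hypothesis)


accuracy = word_accuracy(
    "apply the hasp to the energy isolation point",
    "apply the has to the energy isolation point",
)
```

One wrong word in an eight-word cue already costs 12.5 points of accuracy, which is why short clips are so sensitive to a single missed regulatory term.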
TalentCards caption workflow
TalentCards is the mobile-first flashcard microlearning platform in the Epignosis/TalentLMS product family. It organizes content as card decks — sequences of individual cards that may be images, videos, PDFs, text, or questions. The deck is the learning unit that gets assigned, scheduled, and tracked for completion. The card is the atomic content unit.
This architecture has a caption implication that catches many L&D operators off guard: caption files are uploaded per video card, not per deck. There is no deck-level caption setting. A LOTO refresher deck with 8 video cards requires 8 separately uploaded SRT files. A team that captions the deck by uploading a single file — or that assumes the TalentCards platform has an auto-caption feature equivalent to some LMS platforms — will publish a deck where none of the video cards have captions.
Caption file requirements
TalentCards accepts SRT format only. VTT files are rejected at upload — the platform returns an error that says the file format is not supported. If your captioning workflow produces VTT as the primary output, you need to convert to SRT before uploading to TalentCards. Conversion is straightforward — the primary difference between SRT and VTT timing headers is the use of a comma decimal separator in SRT versus a period in VTT — but the conversion step needs to be built into your workflow rather than skipped because TalentCards users assume the platform handles VTT like most web video players do.
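The conversion step can be scripted so that it is never skipped. The following is a minimal sketch of a VTT-to-SRT converter using only the standard library. It assumes well-formed WebVTT input, drops the header and any NOTE or STYLE blocks, discards cue settings and inline tags, and renumbers cues from 1. The function name `vtt_to_srt` is illustrative, and a production workflow would more likely rely on an established captioning tool.

```python
import re

# Matches mm:ss.ttt or hh:mm:ss.ttt as written in WebVTT timing lines.
VTT_TIME = re.compile(r"(\d{2}:)?(\d{2}:\d{2})\.(\d{3})")


def _to_srt_time(match):
    hours = match.group(1) or "00:"  # SRT always carries the hours field
    return f"{hours}{match.group(2)},{match.group(3)}"


def vtt_to_srt(vtt_text):
    """Convert WebVTT text to SRT text (sketch, assumes well-formed input)."""
    text = vtt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    cues = []
    for block in re.split(r"\n{2,}", text.strip()):
        lines = block.split("\n")
        timing_at = next((i for i, ln in enumerate(lines) if "-->" in ln), None)
        if timing_at is None:
            continue  # WEBVTT header, NOTE block, or STYLE block
        start_raw, end_raw = lines[timing_at].split("-->", 1)
        start = VTT_TIME.sub(_to_srt_time, start_raw.strip())
        # Anything after the end timestamp is a cue setting; drop it.
        end = VTT_TIME.sub(_to_srt_time, end_raw.strip().split()[0])
        body = [re.sub(r"<[^>]+>", "", ln) for ln in lines[timing_at + 1:]]
        cues.append((start, end, "\n".join(body)))
    return "\n\n".join(
        f"{n}\n{start} --> {end}\n{body}"
        for n, (start, end, body) in enumerate(cues, start=1)
    ) + "\n"


sample_vtt = (
    "WEBVTT\n\nNOTE reviewed\n\n"
    "00:01.000 --> 00:04.500 line:90%\nApply the <b>hasp</b>.\n\n"
    "00:00:05.000 --> 00:00:08.250\nVerify zero-energy state.\n"
)
srt_text = vtt_to_srt(sample_vtt)
```

Running the converter as part of the export step, rather than by hand at upload time, keeps the SRT and VTT outputs from drifting apart.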
The SRT format guide covers the exact format requirements: UTF-8 encoding without BOM, LF line endings preferred (CRLF accepted), comma as the decimal separator in timestamps, sequential one-indexed segment numbers, and one blank line between segments. For VTT-to-SRT conversion, the critical transformations are replacing period timing separators with commas, dropping the WEBVTT header line, and numbering the cues; the other VTT formatting features (NOTE blocks, STYLE blocks, positioning tags) are stripped silently by most conversion tools.
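A pre-upload check can catch format problems before the platform does. The sketch below checks a few of the requirements listed above (UTF-8, no BOM, sequential numbering, comma decimal separators in the timing line). The function name `validate_srt` and the wording of the messages are illustrative, and the check is deliberately not a complete SRT validator.

```python
import re

SRT_TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$"
)


def validate_srt(raw: bytes):
    """Return a list of human-readable problems found in raw SRT bytes."""
    problems = []
    if raw.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 BOM")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]
    text = text.lstrip("\ufeff").replace("\r\n", "\n")
    blocks = [b for b in re.split(r"\n{2,}", text.strip()) if b]
    for expected, block in enumerate(blocks, start=1):
        lines = block.split("\n")
        if lines[0].strip() != str(expected):
            problems.append(
                f"cue {expected}: expected index {expected}, found {lines[0]!r}"
            )
        if len(lines) < 2 or not SRT_TIMING.match(lines[1].strip()):
            problems.append(f"cue {expected}: bad timing line")
        elif len(lines) < 3:
            problems.append(f"cue {expected}: no caption text")
    return problems


good = b"1\n00:00:01,000 --> 00:00:04,500\nApply the hasp.\n"
bad = b"\xef\xbb\xbf1\n00:00:01.000 --> 00:00:04.500\nApply the hasp.\n"
```

An empty result means the file passed these particular checks, not that the captions are accurate; accuracy still needs the QA pass described later.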
Per-card upload workflow
In the TalentCards Creator interface:
- Open the deck in Creator. The deck view shows all cards in sequence.
- Click on a video card to open the card editor.
- In the card editor, the video file appears with a settings panel on the right.
- Below the video preview, locate the "Captions" button or the "Accessibility" tab depending on the TalentCards version. This opens the caption upload modal.
- Click "Upload caption file" and select the SRT file for this card.
- The platform previews the first few cues. Verify that the timing and text appear correctly.
- Save the card. The caption file is now associated with this card specifically.
- Repeat for each video card in the deck.
There is no bulk caption upload path. A 12-card video deck requires 12 individual upload actions. For back-catalogue retrofits with hundreds of existing video cards, this manual overhead is the primary bottleneck — plan approximately 3–5 minutes per card for upload and verification.
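The per-card estimate turns into a staffing number quickly. A small sketch of the arithmetic, using the 3–5 minutes per card quoted above; the 400-card library size is a hypothetical example.

```python
def retrofit_hours(video_cards, minutes_low=3, minutes_high=5):
    """Lower and upper bound, in hours, for manual upload and verification."""
    return video_cards * minutes_low / 60, video_cards * minutes_high / 60


low_hours, high_hours = retrofit_hours(400)  # hypothetical 400-card library
print(f"{low_hours:.0f} to {high_hours:.0f} hours of upload work")
```

This covers upload and verification only; caption generation and glossary review are separate line items.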
Offline bundling: the publish-before-caption failure mode
TalentCards bundles content for offline delivery at publish time. When a deck is published (or when changes to a published deck are saved and released), the platform packages the current state of all card content — including any caption files that have been uploaded at that point — into the offline download bundle.
If a video card is published before its caption file is uploaded, the offline bundle does not include the caption file. Workers who download the deck from that point will receive the video without captions. When the caption file is subsequently uploaded and the deck is re-released, the updated bundle becomes available for download — but workers who already downloaded the deck will continue to see the version without captions until they explicitly force a re-sync of the content on their device.
In a deployment where workers download content via the break-room Wi-Fi at the start of each week, the re-sync window may be 3–7 days. During that window, push-notification-triggered training events will deliver uncaptioned content to every worker who downloaded the deck before the caption fix. The operational rule: always upload all caption files to a deck before publishing or releasing updates, never after.
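The "captions before publish" rule is easy to enforce with a checklist script run before anyone clicks publish. This sketch assumes a locally maintained deck manifest (a list of card records) and a naming convention of one `<card id>.srt` file per video card. Both are assumptions for illustration; the script does not query TalentCards.

```python
from pathlib import Path


def cards_missing_captions(cards, caption_dir):
    """Return IDs of video cards with no non-empty SRT file on disk."""
    caption_dir = Path(caption_dir)
    missing = []
    for card in cards:
        if card["type"] != "video":
            continue  # image, text, and question cards need no caption file
        srt_path = caption_dir / f"{card['id']}.srt"
        if not srt_path.is_file() or srt_path.stat().st_size == 0:
            missing.append(card["id"])
    return missing


# Hypothetical manifest for a small LOTO refresher deck.
deck = [
    {"id": "card-01", "type": "video"},
    {"id": "card-02", "type": "question"},
    {"id": "card-03", "type": "video"},
]
```

A deck is ready to publish only when `cards_missing_captions(deck, "captions/")` returns an empty list.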
Push notification delivery
TalentCards sends push notifications to workers at schedule times configured at the deck level. Common deployment patterns: daily notification 20 minutes before clock-in, weekly notification at the start of each work week, or event-triggered notification (e.g., on assignment of a new deck when a worker joins a new job role). The push notification opens the TalentCards app on the worker's phone directly to the deck.
Caption files bundle with the notification-triggered content if they were included in the offline download bundle. The caption display setting (on/off by default) is a platform-level configuration in TalentCards Creator. For frontline deployments, set captions to display by default — the noise-floor argument above applies directly. Workers can toggle captions off if they prefer audio-only in a quiet environment; the default should reflect the primary deployment context.
QA in TalentCards
After uploading caption files to all video cards in a deck, complete a QA pass in the TalentCards Creator preview before publishing. The Creator preview accurately renders caption display on both the video player and the mobile layout. For OSHA-regulatory content, spot-check the three highest-risk cues in each card: the first occurrence of each regulatory citation, the authorized/affected employee distinction in LOTO content, or the chemical name sequences in HazCom content. If any of those fail, the glossary is not applied correctly and the full module needs review before publication.
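The spot-check can be given a starting point by locating the first cue in which each high-risk term appears. The sketch below parses an SRT string and reports that cue index per term; a term that comes back as `None` was never transcribed in its expected form, which is itself a signal that the glossary did not apply. The function name and the sample terms are illustrative.

```python
import re


def first_occurrence_cues(srt_text, terms):
    """Map each term to the index of the first cue containing it, or None."""
    found = {term: None for term in terms}
    blocks = re.split(r"\n{2,}", srt_text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        if len(lines) < 3 or not lines[0].strip().isdigit():
            continue  # skip anything that is not a complete cue
        cue_index = int(lines[0].strip())
        cue_text = " ".join(lines[2:]).lower()
        for term in terms:
            if found[term] is None and term.lower() in cue_text:
                found[term] = cue_index
    return found


sample_srt = (
    "1\n00:00:01,000 --> 00:00:05,000\n"
    "Under 29 CFR § 1910.147, only an authorized employee applies the lock.\n\n"
    "2\n00:00:05,500 --> 00:00:09,000\n"
    "The authorized employee notifies each affected employee first.\n"
)
report = first_occurrence_cues(
    sample_srt,
    ["29 CFR § 1910.147", "authorized employee", "affected employee", "hasp"],
)
```

The report tells the reviewer which cues to open in the preview rather than replacing the review itself.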
See TalentCards captions: full platform guide for the complete TalentCards caption specification, LMS integration behavior, and analytics setup.
EdApp (SC Training) caption workflow
EdApp was rebranded as SC Training in 2023, after its acquisition by SafetyCulture (the company behind iAuditor). In L&D circles, "EdApp" remains the common name: most community documentation, external integrations, and training operator muscle memory still use it. This post uses both names interchangeably.
EdApp is a template-based rapid authoring platform with 50+ slide templates. Videos are uploaded to the "Video" slide type. Caption files are attached to individual Video slides within a lesson — not at the lesson level and not at the course level. A lesson with 5 Video slides requires 5 caption uploads.
The Creator-role requirement
EdApp has a granular role hierarchy: Admin, Creator, Author, Manager, Learner. The critical caption-upload constraint is that Authors cannot upload new media files or caption files. Only Creator role and above can attach caption tracks to Video slides. If your organization uses Author role for content contributors (a common pattern when non-L&D subject matter experts create lesson content), the caption upload step will fail silently — the Author can publish the lesson without error, and the video will appear complete in the lesson editor, but the caption track will not exist.
The operational fix: caption upload must be performed by a Creator-role user, or the platform admin must temporarily elevate the Author's role to Creator for the caption upload step. For organizations running a centralized caption workflow with a dedicated captioning specialist, the specialist should be assigned Creator role on the specific lessons they are captioning.
Caption upload workflow (Creator role)
- Open the lesson in the Creator interface.
- Navigate to the Video slide that needs captions.
- In the slide editor, the video player appears with a settings panel. Look for the "Captions" or "Subtitles" tab in the media settings — in current EdApp/SC Training versions, it appears as a pill-shaped tab above the video preview.
- Click "Add captions" or the upload icon in the captions panel.
- Select the SRT or VTT file. EdApp accepts both formats; VTT is preferred for mobile rendering.
- The caption file is parsed and a preview of the first 5 cues appears. Verify timing alignment visually — a 1–2 second offset in the first cue usually indicates a mismatch between the video start-time offset and the SRT sequence start.
- Save the slide. The caption track is now attached to this specific Video slide.
- Repeat for each Video slide in the lesson.
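When the preview in the steps above shows a constant offset, the usual fix is to shift every timestamp in the SRT by the same amount and upload again. A minimal sketch of that shift follows; the function name `shift_srt` is illustrative, the offset is in milliseconds (negative values move captions earlier), and times are clamped at zero.

```python
import re

SRT_STAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(srt_text, offset_ms):
    """Shift every timestamp on SRT timing lines by offset_ms milliseconds."""

    def shift(match):
        h, m, s, ms = (int(g) for g in match.groups())
        total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms)
        h, rest = divmod(total, 3_600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    # Only touch timing lines, so timestamps quoted in caption text survive.
    return "\n".join(
        SRT_STAMP.sub(shift, line) if "-->" in line else line
        for line in srt_text.split("\n")
    )


shifted = shift_srt(
    "1\n00:00:01,500 --> 00:00:04,000\nApply the hasp.\n", -1500
)
```

A constant shift only helps when the offset is uniform; drift that grows over the clip points to a frame-rate or edit mismatch and needs re-alignment instead.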
Auto-caption: when to use it, when to avoid it
EdApp includes an AI auto-caption generation feature accessible from the same captions panel. The auto-caption uses a Whisper-base model — faster and cheaper than large-v3 but with materially lower accuracy on domain-specific vocabulary. Whisper-base produces approximately 80–85% word accuracy on general training content and 78–82% on manufacturing/EHS vocabulary. That range falls well short of the 99% accuracy level commonly used as the benchmark for WCAG 2.1 AA captions, and its errors cluster at the regulatory and chemical terms with the highest compliance consequence.
Practical guidance:
- General safety awareness content (PPE identification, emergency exit procedures, housekeeping standards, visitor safety orientation): auto-caption + one review pass may be acceptable. Plain-language, non-technical safety content runs above the general-content baseline, at roughly 88–90% accuracy, so the error density is low enough that a 10-minute review catches the significant errors. Still verify before publishing.
- OSHA-regulatory training (LOTO, HazCom, respiratory protection, confined space, powered industrial trucks, fall protection): do not use auto-caption without an externally generated glossary-corrected SRT. Auto-caption on a LOTO module will produce errors on "authorized employee," "zero-energy state," "hasp," and the 29 CFR citation format — exactly the terms that carry the compliance documentation weight. Generate captions externally using a targeted glossary, then upload the corrected SRT rather than using the auto-caption path.
CSV lesson import and caption files
EdApp supports bulk lesson creation via CSV import — a common path for teams building large frontline content libraries quickly by importing lesson structures from a spreadsheet. The CSV import does not support caption file attachment. Caption tracks must be added manually in the lesson editor after import, on a per-video-slide basis. Teams that build automated lesson import pipelines should add a manual post-import caption-upload step to their workflow, or build a separate process to attach captions via the EdApp API (available on enterprise plans).
Offline delivery and caption bundling
EdApp lessons marked for offline download package all slide content, including caption files, into the device-cached bundle. The offline packaging follows the same caption-before-publish rule as TalentCards: caption files must be uploaded to a Video slide before the lesson is pushed for offline download. If workers have already downloaded a lesson version without captions, they will not receive the updated captioned version until their device syncs the updated lesson content from the server.
EdApp's offline sync is app-managed: the mobile app checks for content updates periodically and prompts workers to download updates when on Wi-Fi. The sync frequency is configurable in platform settings. For frontline deployments in environments with limited Wi-Fi access, set a sync prompt to appear whenever the device connects to Wi-Fi — this maximizes the likelihood that workers receive caption updates promptly.
Multi-language caption tracks
EdApp supports multiple caption tracks per Video slide — different language tracks selectable by the learner in the mobile player. This is relevant for frontline workforces with Spanish, Portuguese, or other non-English primary languages, which is a common pattern in US manufacturing and food processing. To add a second-language caption track: in the captions panel, select "Add language" and upload a second SRT/VTT file for the target language. The mobile player presents a language selector when multiple tracks are available. See the multi-language caption workflow post for the translation pipeline that produces the second-language track.
SafetyCulture integration
For organizations using both SafetyCulture (the inspection and auditing platform) and EdApp (the training platform), the caption standard applies uniformly to all training content. Safety inspection procedures that reference training modules — common in SafetyCulture's "Actions" and corrective-action workflows — should point to training content that meets the same caption standard as standalone compliance training. Auditors reviewing the integration between inspection findings and corrective training will check both platforms; a captioned training module in EdApp that is referenced by an uncaptioned inspection procedure creates a documentation gap.
See EdApp (SC Training) captions: full platform guide for the complete EdApp caption specification and API integration reference.
Axonify caption workflow
Axonify is an enterprise frontline training and communication platform whose primary differentiator is an adaptive learning algorithm that serves workers personalized daily training based on their current knowledge state. Workers open the Axonify app for 3–5 minutes per day — in the break room, in the locker area before a shift, during a scheduled safety stand-down — and the platform presents content targeted at their knowledge gaps. The training cadence is daily reinforcement, not episodic courses.
Primary verticals for Axonify: retail (Dollar General, Foot Locker, Walmart), banking, and manufacturing/warehouse (large-footprint employers with 10,000+ frontline workers). Axonify's customer base is concentrated in enterprise-scale deployments where thousands of workers across hundreds of locations share the same training content library.
Video module caption upload
Axonify's content types include a Video module type that accepts SRT caption file upload. The upload workflow in the Axonify admin panel:
- Navigate to Content → Manage Content.
- Open the Video module that needs captions (or create a new one by uploading the video file first).
- In the module editor, locate the "Captions" or "Accessibility" section in the module settings panel.
- Upload the SRT file. Axonify accepts SRT format; verify that the file uses comma decimal separators in timestamps and sequential one-indexed numbering.
- Preview the captioned video in the module editor to verify timing alignment.
- Save and publish the module.
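The two SRT checks named in the upload step (comma decimal separators and sequential one-indexed numbering) are easy to automate before a file reaches the admin panel. The sketch below is a generic pre-upload check, not part of Axonify; the function name `validate_srt` and the wording of the messages are illustrative assumptions.

```python
import re

# A valid SRT timing line: comma before the milliseconds on both sides.
TIMING = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)


def validate_srt(srt_text: str) -> list:
    """Return a list of problems found in an SRT file (empty list = clean)."""
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", text) if b.strip()]
    if not blocks:
        return ["no cues found"]
    problems = []
    for expected, block in enumerate(blocks, start=1):
        lines = [ln.strip() for ln in block.split("\n")]
        # Numbering must start at 1 and increase by 1 per cue.
        if lines[0] != str(expected):
            problems.append(f"cue {expected}: index is {lines[0]!r}, expected {expected}")
        match = TIMING.match(lines[1]) if len(lines) > 1 else None
        if match is None:
            shown = lines[1] if len(lines) > 1 else ""
            problems.append(f"cue {expected}: bad timing line {shown!r}")
            continue
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
        start = ((h1 * 60 + m1) * 60 + s1) * 1000 + ms1
        end = ((h2 * 60 + m2) * 60 + s2) * 1000 + ms2
        if end <= start:
            problems.append(f"cue {expected}: end time is not after start time")
        if len(lines) < 3:
            problems.append(f"cue {expected}: no caption text")
    return problems
```

A file exported as VTT and renamed to `.srt` is the typical way period separators slip through; this check reports each affected cue as a bad timing line.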
Admin and Trainer roles can upload caption files; Learner role cannot access the content management interface. For organizations where subject matter experts (supervisors, EHS coordinators) create training content in Axonify, verify that the content creator has Trainer or Admin role before assigning them the caption upload task.
Adaptive delivery and the spaced-repetition caption drift problem
Axonify's knowledge reinforcement model serves the same content to a worker multiple times over an extended period — typically 8–12 repetitions over 3–6 months, spaced according to the worker's retention curve on the underlying topic. This is the platform's core value proposition: not a one-time training event but a continuous reinforcement loop that builds durable knowledge over time.
For caption operations, the spaced-repetition model creates a failure mode that does not exist in other platforms. When a LOTO procedure changes — a new machine is added to the plant's lockout program, an energy-isolation point is relocated, a new energy type is involved — the training content is updated and a new video is recorded. The new video module is uploaded with a new, corrected caption file that reflects the updated procedure terminology. However, workers who are currently in a reinforcement loop on the original LOTO module will continue to receive the original version until the adaptive algorithm determines they have reached knowledge mastery on the topic and promotes them to the updated content. In a 3,000-worker manufacturing deployment, at any given time, several hundred workers may be in active reinforcement loops on outdated content versions.
The operational fix is a content-version discipline: when OSHA-required training content is updated, retire the old module and replace it rather than editing the existing module. Axonify's content management system allows administrators to deactivate an existing module and assign a replacement — the adaptive algorithm will promote workers from the deactivated version to the replacement module on their next training session. This forces a re-start of the reinforcement loop on the new content rather than allowing workers to complete reinforcement loops on the outdated version. Apply this content-retirement discipline for any update that changes regulatory terms, chemical names, or equipment-specific LOTO vocabulary — cases where the caption text, not just the video content, needs to be correct and current.
Online-primary delivery model
Unlike TalentCards and EdApp, Axonify's primary delivery model is online. The platform is designed for environments with consistent internet connectivity — retail stores, bank branches, and modern warehouse facilities with Wi-Fi throughout. Full offline download and local caching are not standard Axonify features across all deployment configurations, though offline queuing is available in some enterprise configurations.
For frontline deployments in manufacturing plants, construction sites, or agricultural operations with limited or inconsistent connectivity, verify the specific Axonify offline capabilities in your contract tier before planning a delivery architecture that depends on local content caching. The push-notification and caption delivery architecture described above assumes online connectivity at the time of content consumption.
Knowledge reinforcement and caption accuracy over time
Axonify tracks knowledge retention metrics at the topic and subtopic level. Workers who consistently miss questions about a specific OSHA procedure — say, the correct sequence of a LOTO energy-isolation procedure — get more repetitions of the relevant video content. If the caption track on that content contains errors on the key procedural terms, the workers who most need accurate captions (those who are struggling with the material) are the workers who will receive the incorrect caption text the most times.
The inverse also applies: as the feedback loop post documents, caption correction data from review passes can be fed back into glossary improvement, which improves future caption generations on similar content. For Axonify deployments, build the glossary maintenance cycle to align with Axonify's content refresh cadence — typically quarterly for OSHA-required training content.
See Axonify captions: full platform guide for the complete Axonify caption specification, admin role requirements, and enterprise deployment considerations.
Platform comparison summary
| Feature | TalentCards | EdApp (SC Training) | Axonify |
|---|---|---|---|
| Caption file formats | SRT only | SRT, VTT (VTT preferred) | SRT |
| Caption upload level | Per video card | Per Video slide | Per Video module |
| Auto-caption feature | No | Yes (Whisper-base — accuracy warning for OSHA content) | No |
| Offline content bundling | Yes — at publish time | Yes — at download time | Limited (enterprise configurations) |
| Multi-language caption tracks | No (one track per card) | Yes (language selector in mobile player) | Per-user language preference |
| Caption upload role requirement | Deck admin or above | Creator role (not Author) | Trainer or Admin |
| Spaced-repetition content delivery | No | No | Yes — caption drift risk on content updates |
| Push notification delivery | Yes (deck level) | Yes (lesson level) | Yes (daily session) |
| Primary frontline verticals | Retail, frontline SMB | Manufacturing, hospitality, construction | Retail, manufacturing, banking |
Compliance framework for frontline training captions
Frontline training sits at the intersection of more regulatory frameworks than most L&D professionals manage simultaneously. OSHA training requirements, ADA accommodation obligations, Section 508 contractor mandates, and industry-specific frameworks (MSHA, DOT FMCSA) apply to different employer types and training content categories. The table below maps each framework to the frontline training context.
| Framework | Covered entities | Training caption obligation | Enforcement + penalty range |
|---|---|---|---|
| OSHA 29 CFR 1910/1926 | All private-sector employers in general industry and construction | "Effective training" in a "manner and language that employees understand" — implies accessible format for employees with hearing disabilities | OSHA compliance officers; serious violation $15,625 per violation; willful violation $156,259 per violation |
| ADA Title I | Private employers with 15+ employees | Reasonable accommodation for training; WCAG 2.1 AA (99%+ accuracy) is the defensible standard for video training content | EEOC; DOJ Civil Rights Division; compensatory and punitive damages; injunctive relief |
| Section 508 | Federal agencies and contractors with federal contracts | WCAG 2.1 AA SC 1.2.2 for all pre-recorded video content, including employee training | US Access Board; agency Inspector General; contract compliance review |
| MSHA 30 CFR Part 46/48 | Surface mining operators (Part 46) and underground mining (Part 48) | "Effective training" standard parallel to OSHA; specific training required for new miners, experienced miners, and annual refresher | MSHA Metal/Nonmetal; civil penalty up to $71,651 per violation; pattern-of-violation provisions |
| DOT FMCSA 49 CFR Part 380 | Entry-level driver training providers; CDL holders | ELDT curriculum delivery; ADA Title I applies to the employer side for employee driver training | FMCSA; state CDL authorities; disqualification of training provider certification |
The OSHA "effective training" standard and caption quality
OSHA does not specify "captions" or "WCAG 2.1 AA" in most training standards, but the "effective training" requirement creates an obligation to make training content accessible to employees who cannot access audio. The specific OSHA standard with the clearest language is § 1910.1200(h)(1), which requires that HazCom training is conducted "in a manner and language that employees understand." OSHA's enforcement interpretation extends this to all employees, including those with hearing disabilities, for all required training standards — not just HazCom.
In OSHA inspections, caption quality becomes relevant in two scenarios. The first is when an OSHA inspector reviews training records for an employee with a documented hearing disability and finds training videos without captions or with auto-captions that do not meet an accuracy standard. The second is when a worker testifies that training content was difficult to understand, a scenario more likely to arise in inspections triggered by a workplace injury, where the adequacy of pre-incident training is under scrutiny. The combination of OSHA training effectiveness requirements and ADA Title I reasonable-accommodation obligations creates a practical floor of caption-at-production for all OSHA-required training content, regardless of whether a worker with hearing loss is currently in the affected workforce.
ADA Title I and frontline employer size
ADA Title I applies to employers with 15 or more employees. Many frontline employers in manufacturing, construction, and logistics operate at the threshold (15–50 employees) where the ADA Title I obligation applies but the L&D infrastructure to support it is thin. The ADA "reasonable accommodation" test for training content accessibility is whether providing captions on training video is an undue hardship for the employer. For pre-recorded microlearning content at caption costs of $0.50–$2.00 per minute, providing captions on a 3-minute LOTO module costs $1.50–$6.00 — a cost that is never undue hardship under any application of the factors in 42 U.S.C. § 12111(10). The cost-of-accommodation argument is not available for video captioning in frontline training contexts.
Frontline worker demographics and hearing loss
The compliance case is reinforced by population data. NIOSH research consistently finds that approximately 17% of manufacturing workers have hearing loss attributable to occupational noise exposure. In a 100-worker manufacturing facility, 17 workers statistically have occupational hearing loss. The probability that at least one of those workers is affected by an uncaptioned training module in any given quarter is not negligible — it is nearly certain. Building an accessible training program is not a preparation for an unlikely accommodation request. It is acknowledging the baseline demographics of the workforce.
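The "nearly certain" claim can be checked with a short calculation. This sketch assumes each worker independently has a 17% chance of occupational hearing loss, which is a simplification (real exposure varies by job and tenure); the function name is illustrative.

```python
def prob_at_least_one(workforce_size: int, prevalence: float) -> float:
    """Probability that at least one worker in the group has hearing loss,
    assuming each worker is affected independently with the given prevalence."""
    return 1.0 - (1.0 - prevalence) ** workforce_size


if __name__ == "__main__":
    # Compare a small crew, a single shift, and a 100-worker facility.
    for size in (10, 25, 100):
        print(size, prob_at_least_one(size, 0.17))
```

Even for a 10-person crew the probability is already well above one half under this assumption, so the argument does not depend on facility size.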
For employers in industries with historically high noise exposure (metal fabrication, mining, construction, agriculture), the relevant population is workers in the early stages of noise-induced hearing loss who have not yet formally requested accommodation — workers who are struggling to hear training audio but who have not triggered the ADA accommodation request that would require the employer to act. Proactive caption deployment addresses the training effectiveness gap before it becomes a compliance event.
Production workflow for frontline microlearning at scale
A typical frontline microlearning back catalogue — built over 2–4 years of quarterly compliance training refreshes, onboarding content, and product/procedure updates — runs 150–400 modules. Each module is 2–5 minutes. At 3.5 minutes average and 140 words/minute narration pace, the average module contains approximately 490 words. The volume math determines the back-catalogue retrofit effort.
Volume and effort estimates
Without a targeted glossary at baseline 83–87% accuracy on OSHA content, each module has an estimated 63–83 errors. Manual correction time for a high-error caption file — reviewing each error, deciding on the correct transcription, editing the SRT cue — runs 45–65 minutes per 3.5-minute module (roughly a 12–18× real-time multiplier). For a 200-module back catalogue: 200 × 55 minutes = 183 hours = approximately 23 reviewer-days.
With a 60-term targeted glossary at 98.9–99.2% accuracy, each module has an estimated 4–5 errors (490 words × 0.8–1.1%). Spot-check review of a low-error caption file (playing through at 1.5× speed, verifying the high-risk cues, correcting the 4–5 errors) runs 8–12 minutes per module, roughly a 2.3–3.4× real-time multiplier. For a 200-module back catalogue with glossary: 200 × 10 minutes = 33 hours = approximately 4 reviewer-days.
The glossary build investment, typically 6–8 hours to build a 60-term targeted glossary from scratch using the methodology in the glossary architecture guide, pays back after roughly 8–11 modules: each module reviewed with the glossary saves about 45 minutes (55 minutes versus 10), and 360–480 minutes of build time divided by 45 minutes is 8–10.7 modules. For a back catalogue of 200+ modules, the glossary investment is easily justified on labor economics alone, before considering the improvement in OSHA compliance documentation quality.
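The volume math in this section can be rerun for a catalogue of a different size or a different narration pace. The sketch below uses only the figures stated above (140 words per minute, 55 and 10 reviewer-minutes per module, a 6–8 hour glossary build); the function names are illustrative.

```python
WORDS_PER_MINUTE = 140  # narration pace assumed in this section


def errors_per_module(minutes: float, accuracy: float,
                      wpm: int = WORDS_PER_MINUTE) -> float:
    """Expected caption errors in one module at a given word accuracy."""
    return minutes * wpm * (1.0 - accuracy)


def review_hours(modules: int, minutes_per_module: float) -> float:
    """Total reviewer hours for a catalogue."""
    return modules * minutes_per_module / 60.0


def payback_modules(glossary_hours: float, baseline_minutes: float,
                    glossary_minutes: float) -> float:
    """Modules needed before review time saved covers the glossary build."""
    saved_per_module = baseline_minutes - glossary_minutes
    return glossary_hours * 60.0 / saved_per_module


if __name__ == "__main__":
    print(errors_per_module(3.5, 0.85))   # mid-range baseline accuracy
    print(review_hours(200, 55))          # no glossary
    print(review_hours(200, 10))          # with glossary
    print(payback_modules(6, 55, 10), payback_modules(8, 55, 10))
```

Swapping in your own catalogue size and measured review times is the main use; the payback point is sensitive to the per-module review time, so measure it on a few real modules first.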
Two-tier production model
Not all frontline microlearning modules carry the same compliance risk. A module on emergency exit routes has lower regulatory consequence than a module on confined space entry procedure. A tiered production model allocates review effort proportionally to compliance risk.
Tier 1 — full production: All OSHA-required training content. Modules where a caption error directly affects the documented evidence of effective training for a specific regulatory standard. The list for a general industry employer:
- Lockout/tagout (29 CFR § 1910.147)
- HazCom / SDS (29 CFR § 1910.1200)
- Respiratory protection (29 CFR § 1910.134)
- Powered industrial trucks (29 CFR § 1910.178)
- Confined space entry (29 CFR § 1910.146)
- Electrical safety (29 CFR §§ 1910.303–399)
- Fall protection (applicable where working at height)
- Emergency action plan (29 CFR § 1910.38)
- Bloodborne pathogens (29 CFR § 1910.1030)
Tier 1 workflow: generate captions via Whisper large-v3 with content-type-specific glossary → automated QA pass (verify pass/fail against 99% threshold) → human reviewer spot-check (15 random timestamps, full review of first and last 30 seconds) → sign-off and upload.
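The Tier 1 spot-check step can be turned into a per-module review plan. This is a sketch under two assumptions that the workflow above does not spell out: the 15 random timestamps are drawn from the middle of the video (since the first and last 30 seconds are reviewed in full anyway), and very short clips simply get full review. The function name is hypothetical.

```python
import random
from typing import Optional


def spot_check_plan(duration_s: float, samples: int = 15,
                    edge_s: float = 30.0, seed: Optional[int] = None) -> dict:
    """Build a review plan: full-review windows plus random spot-check times."""
    rng = random.Random(seed)  # pass a seed to make the plan reproducible
    # Keep the two edge windows from overlapping on very short clips.
    edge = min(edge_s, duration_s / 2)
    full_review = [(0.0, edge), (duration_s - edge, duration_s)]
    middle_start, middle_end = edge, duration_s - edge
    if middle_end <= middle_start:
        spot_checks = []  # the edge windows already cover the whole clip
    else:
        spot_checks = sorted(
            round(rng.uniform(middle_start, middle_end), 1)
            for _ in range(samples)
        )
    return {"full_review": full_review, "spot_checks": spot_checks}
```

Recording the seed alongside the reviewer sign-off makes it possible to show later which timestamps were actually checked.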
Tier 2 — expedited production: General awareness and soft-skills content. Modules where errors are not compliance-critical but should still be corrected before broad delivery. Examples: general housekeeping awareness, visitor safety orientation, workplace ergonomics basics, time and attendance policies.
Tier 2 workflow: generate captions via Whisper large-v3 with general safety glossary → automated accuracy check (flag segments below 97% confidence) → targeted review of flagged segments only → upload without full human pass.
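The Tier 2 flagging step amounts to a filter over caption segments. The sketch below assumes each segment arrives as a dictionary with a `confidence` value between 0 and 1; that field is an assumption about your pipeline, since Whisper reports per-segment values such as `avg_logprob` rather than a ready-made confidence score, so some upstream mapping is needed.

```python
def flag_segments(segments: list, threshold: float = 0.97) -> list:
    """Return the segments whose confidence falls below the review threshold.

    Each segment is a dict with 'start', 'end', 'text', and 'confidence'
    keys (a hypothetical shape, not a Whisper or platform format).
    """
    return [seg for seg in segments if seg["confidence"] < threshold]


def flagged_seconds(segments: list, threshold: float = 0.97) -> float:
    """Total duration of flagged segments, as a rough review-time estimate."""
    return sum(seg["end"] - seg["start"]
               for seg in flag_segments(segments, threshold))
```

Only the flagged segments go to the targeted review; everything else uploads without a full human pass, as the Tier 2 workflow describes.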
Glossary build for frontline content
A 60-term frontline safety glossary should be structured in four layers:
- Regulatory citation formats (10–12 terms): The exact OSHA, MSHA, or DOT standard citations that appear in your training content, in the format they should appear in captions. Example: "29 CFR § 1910.147" not "twenty-nine C-F-R section nineteen ten one forty-seven."
- Equipment-specific LOTO terms (15–20 terms): The names of specific energy sources, isolation points, lockout devices, and equipment identifiers from the facility's LOTO procedures. These are unique to your operation and will not be covered by any vendor's default glossary.
- Chemical names from the SDS library (15–20 terms): The chemicals with the highest vocabulary risk — typically IUPAC systematic names and trade names from your SDS library. See the HazCom captioning guide for the methodology to identify the highest-risk chemicals from your SDS inventory.
- PPE and equipment abbreviations (10–15 terms): The facility-specific and industry-standard abbreviations that appear in your safety training content (PAPR, SCBA, SRL, PFAS, NIOSH TC numbers, etc.).
Build the glossary before starting the back-catalogue retrofit. The glossary architecture guide covers the full build methodology including term sourcing from training scripts, SDS libraries, LOTO procedure documents, and past correction files.
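One way to keep the four-layer structure honest during the build is to store the glossary by layer and check the layer sizes against the ranges listed above. This is a sketch; the layer keys and the function name `check_glossary` are illustrative, not a format any captioning tool requires.

```python
# Target term counts per layer, taken from the four-layer structure above.
LAYER_RANGES = {
    "regulatory_citations": (10, 12),
    "loto_equipment_terms": (15, 20),
    "sds_chemical_names": (15, 20),
    "ppe_abbreviations": (10, 15),
}


def check_glossary(glossary: dict) -> list:
    """Return warnings for layers outside their target size or duplicate terms."""
    warnings = []
    seen = {}  # casefolded term -> layer where it first appeared
    for layer, (low, high) in LAYER_RANGES.items():
        terms = glossary.get(layer, [])
        if not low <= len(terms) <= high:
            warnings.append(f"{layer}: {len(terms)} terms, expected {low}-{high}")
        for term in terms:
            key = term.casefold()
            if key in seen:
                warnings.append(
                    f"duplicate term {term!r} in {seen[key]} and {layer}"
                )
            else:
                seen[key] = layer
    return warnings
```

Running the check at each quarterly maintenance pass catches a layer that has quietly been neglected, most often the equipment-specific LOTO terms.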
RACI for microlearning caption operations
| Activity | EHS Manager | L&D Coordinator | Caption Specialist | Platform Admin |
|---|---|---|---|---|
| Glossary build and quarterly maintenance | A | C | R | I |
| Tier 1 module prioritization | A | R | I | I |
| Caption generation (Whisper + glossary) | I | I | R | I |
| Tier 1 human QA review | C | A | R | I |
| Platform upload and per-card/slide verification | I | A | I | R |
| Compliance documentation (training record) | A | R | I | I |
| Content update / caption version management | A | R | R | R |
Key: R = Responsible, A = Accountable, C = Consulted, I = Informed. EHS Manager accountability for the OSHA training record is non-delegable — the EHS function owns the documentation that demonstrates compliance with 29 CFR regulatory training requirements.
30-day back-catalogue sprint plan
- Days 1–5: Content inventory and triage. Identify all modules in the TalentCards/EdApp/Axonify library. Classify each as Tier 1 (OSHA-required) or Tier 2 (general). For Tier 1, note the specific OSHA standard each module addresses. Export the complete list with module names, current caption status, and regulatory category.
- Days 6–10: Glossary build. Use the methodology in the glossary architecture guide to build a 60-term glossary from your LOTO procedures, SDS library, and past caption correction files. Include regulatory citation formats and equipment-specific names.
- Days 11–18: Caption generation and Tier 1 QA. Generate captions for all Tier 1 modules using Whisper large-v3 with the glossary. Run automated accuracy checks. Complete human reviewer QA for each Tier 1 module.
- Days 19–24: Caption generation for Tier 2. Generate captions for Tier 2 modules. Run automated checks, apply targeted review to flagged segments.
- Days 25–28: Platform upload. Upload all Tier 1 caption files first — per-card in TalentCards, per-slide in EdApp, per-module in Axonify. Verify display in the platform's preview mode. Publish or re-release content to make updated offline bundles available for download.
- Days 29–30: Documentation. Record caption completion status for each module in the training documentation system. For OSHA-required modules, add a note to the training record indicating that captions meeting WCAG 2.1 AA were added on the specific date. This is the documentation package that demonstrates compliance in an OSHA inspection or ADA accommodation investigation.
For the ongoing new-module workflow — not the back-catalogue retrofit — integrate caption generation and upload into the module production checklist, before the first publication date. The retrospective fix is always more expensive than the upfront build.
Eight failure modes in frontline microlearning caption operations
These are the eight ways frontline microlearning caption operations break in practice — the failure modes that produce uncaptioned or inaccurately captioned content on workers' devices despite appearing complete in the admin panel.
1. Per-card upload blindness in TalentCards
A team assigns a caption specialist to "caption the LOTO deck." The specialist interprets this as uploading a single SRT file to the TalentCards deck settings. TalentCards has no deck-level caption field — the upload silently does nothing or returns an error that is dismissed without investigation. The 8-card LOTO deck is published with 0 of 8 video cards captioned. Workers complete the module. Training records show 100% completion. The compliance training has no captions. The error is discovered three months later during an LMS audit, after several workers with documented hearing loss have completed the module without accessible content.
2. EdApp auto-caption trust on OSHA content
A team building a HazCom refresher module in EdApp clicks "Generate captions" for the 4-minute video slide. EdApp's Whisper-base auto-caption produces a caption file that appears complete: the cue timings are correct and the caption text covers the full duration. The team publishes the module without reviewing the caption text because the file looks complete and the lesson was overdue. At 82.3% baseline accuracy on HazCom content, the Whisper-base output contains approximately 99 errors in a 4-minute module at 140 words/minute (560 words × 17.7%). The errors cluster in the chemical names, GHS hazard codes, and OSHA citation text: the terms that, if misread, create a documented training event with the wrong regulatory information.
3. Offline-download timing mismatch
A team publishes a new fall protection refresher module to the TalentCards library on Monday morning and sends a push notification to all 340 workers assigned to the deck. By Monday 8 AM, 240 workers have downloaded the module for offline access. At 9 AM, the L&D coordinator uploads the caption SRT files to the 6 video cards in the deck and re-releases. The 240 workers who downloaded the deck before the caption upload have a cached version without captions. The 100 workers who download after the re-release have captions. The team's training record shows the module was captioned before first delivery. In practice, captions were absent for the 70% of workers who downloaded the content during the 1-hour gap. Re-sync for the 240 workers with the old version requires them to connect to Wi-Fi and manually force a content update — a step that does not happen in a factory environment without explicit prompting from the L&D coordinator.
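If the platform's reporting can export per-worker download times, the size of this gap can be measured rather than guessed. The sketch below assumes such an export exists and has been loaded into a dictionary of worker ID to download time; both the export and the function name are assumptions, not a documented TalentCards feature.

```python
from datetime import datetime


def stale_caches(download_times: dict, captions_live_at: datetime) -> tuple:
    """Return (worker IDs holding an uncaptioned copy, their share of downloads).

    download_times maps a worker ID to the datetime of that worker's download.
    """
    stale = sorted(
        worker for worker, downloaded_at in download_times.items()
        if downloaded_at < captions_live_at
    )
    share = len(stale) / len(download_times) if download_times else 0.0
    return stale, share
```

The returned list is the set of workers who need an explicit prompt to connect to Wi-Fi and force a content update.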
4. Caption off-by-default in noise-floor environments
A TalentCards deployment sets captions to off-by-default — the platform's default setting for many installations — with captions available as a toggle option in the video player. Workers in a quiet environment who have no hearing difficulty see the toggle, don't need it, and ignore it. Workers on the assembly floor during a 15-minute break, in an environment with 82 dB ambient noise, cannot hear the training audio but do not know that the caption toggle exists or how to activate it during the brief training window before their break ends. Completion rates are high. Actual content transfer for workers in high-noise environments is close to zero for audio-only content. Captions must be on by default for frontline deployments. The setting is in TalentCards Creator platform settings under accessibility defaults — it is not a deck-level setting.
5. LOTO vocabulary treated as "general safety" in the glossary
A team builds a 20-term "general safety glossary" for their frontline content library covering common terms: PPE, SDS, GHS, OSHA, forklift, lockout, tagout, hazard, first aid, emergency. The glossary improves accuracy on general safety awareness content from 90% to 96%. On LOTO regulatory content, the glossary term "lockout" helps with the compound "lockout/tagout," but the specific LOTO vocabulary (authorized employee vs affected employee, zero-energy state, energy-isolation point, hasp, specific OSHA citation format) is absent. The LOTO module reaches 93–94% accuracy with the general glossary, compared to 98.9% with a targeted LOTO glossary. At 93% accuracy on a 450-word LOTO module, approximately 32 errors remain. Of those 32, a disproportionate share affect the regulatory-procedure terms that are the compliance-critical content of the LOTO training. A general safety glossary does not substitute for a per-content-type targeted glossary for Tier 1 OSHA modules.
6. Author-role caption upload blocked in EdApp
An EHS coordinator at a construction company creates a confined space entry refresher lesson in EdApp using their Author role account. They upload the video content, create the slide deck, and complete the lesson. They attempt to upload the SRT caption file to the Video slide, but the caption upload button does not appear in their slide editor — Author role does not have access to the media upload interface. Assuming the feature is unavailable in their EdApp plan, they publish the lesson without captions. The lesson is assigned to 150 workers. The L&D manager who reviews completion rates 2 weeks later notices the lesson has no caption track and investigates. The confined space entry training — covering 29 CFR § 1910.146, permit procedures, atmospheric monitoring, and rescue requirements — has been completed by 92 workers without accessible content. The fix requires re-publishing the lesson with captions, but completion records cannot be retroactively updated to reflect that the original completion event used uncaptioned content.
7. Short-clip QA skipped on volume grounds
A team completing a 200-module back-catalogue retrofit applies a QA resource allocation heuristic: "modules under 5 minutes don't need full QA; spot-check 1 in 5 short modules, full QA for all modules over 15 minutes." The logic is that short modules have fewer total words and therefore fewer total errors. The logic is wrong. Short modules have the same error density (errors per 100 words) as long modules. At 140 words/minute, a 3.5-minute LOTO module at 84.6% baseline accuracy contains approximately 75 errors in 490 words. A 30-minute LOTO module at the same baseline contains approximately 647 errors in 4,200 words. The per-minute error count is identical; the total error count is higher in the long module, but the compliance consequence per error is the same in both. Under 1-in-5 spot-checking, any given short module has an 80% chance of receiving no human QA pass at all. For Tier 1 OSHA content, 100% QA coverage at module level is required regardless of duration.
8. Spaced-repetition caption drift in Axonify
A manufacturing facility updates its powered industrial truck (forklift) training to reflect new pedestrian-safety zones installed after a near-miss incident. The updated LOTO procedures for the new powered-aisle-guard system are added to the forklift safety module, and a new caption file is uploaded to the updated Axonify video module. However, the previous version of the forklift module is not explicitly retired; it is updated in place. Workers currently in active reinforcement loops on the old module continue to receive the old version, which references the original aisle layout and the old isolation procedure for the pedestrian-crossing interlock. The new isolation procedure is a safety-critical update: workers who complete reinforcement on the old content learn the wrong procedure for the new facility layout. For OSHA-required training content, content updates that change safety procedures must use the retire-and-replace workflow in Axonify, not in-place editing, to force all workers out of old reinforcement loops and into the updated content.
FAQ: seven questions about captioning frontline microlearning
Do OSHA regulations specifically require captions on training video?
OSHA does not use the word "captions" in most training standards, but the "effective training" requirements create an accessible-format obligation that caption-absence fails to meet for employees with hearing disabilities. The most explicit OSHA language is 29 CFR § 1910.1200(h)(1), which requires that HazCom training is conducted "in a manner and language that employees understand" — language OSHA interprets as requiring accessible format for employees who cannot access audio. The parallel obligation under ADA Title I — applying to any employer with 15+ employees — requires reasonable accommodation for training access, and caption provision for pre-recorded video is not an undue hardship under any reasonable interpretation. The practical answer: OSHA training requirements and ADA Title I together create a caption obligation for all OSHA-required training content, even in the absence of an explicit OSHA citation that says "captions required." Build captions for every OSHA-required module as if the OSHA standard explicitly required it, because the ADA Title I obligation fills any gap the OSHA standard leaves. For MSHA (30 CFR Part 46), the same "effective training" interpretation applies to mine safety training.
TalentCards, EdApp, and Axonify all advertise accessibility features. Why not use their built-in caption tools?
TalentCards has no built-in auto-caption feature: any caption workflow requires generating captions externally and uploading SRT files manually. EdApp has an AI auto-caption feature powered by Whisper-base. Whisper-base produces approximately 80–85% word accuracy on general training content and 78–82% on manufacturing/EHS vocabulary. For general safety awareness content, EdApp auto-caption plus a human review pass may be acceptable if the review catches the remaining errors. For OSHA-regulatory training content (HazCom, LOTO, respiratory protection, confined space), Whisper-base accuracy on the domain vocabulary is well below the 99% accuracy benchmark commonly applied to WCAG 2.1 AA captions, and the errors cluster at the regulatory terms with the highest compliance consequence. Auto-caption on OSHA content without a glossary-corrected external SRT is a documentation liability: the training records will show a captioned completion event, but the caption text will contain incorrect regulatory terms. Axonify has no built-in ASR caption generation; the platform is upload-only for captions. The accessibility features these platforms advertise refer to caption display controls, player accessibility settings, and mobile accessibility compliance, not caption generation quality.
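One way to make the review pass measurable is to score an auto-generated caption against a human-verified transcript of the same module. The sketch below computes word accuracy as one minus the word error rate, using a word-level edit distance. The function names, the tokenizer, and the 0.99 default are illustrative assumptions; nothing here is part of EdApp, TalentCards, or Axonify.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split into words, keeping citation numbers such as 1910.147 intact."""
    return re.findall(r"[a-z0-9]+(?:[.'][a-z0-9]+)*", text.lower())


def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    """Word-level Levenshtein distance: substitutions + insertions + deletions."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(verified_transcript: str, caption_text: str) -> float:
    """Return 1 - WER, measured against the human-verified transcript."""
    ref = tokenize(verified_transcript)
    if not ref:
        raise ValueError("verified transcript contains no words")
    errors = word_errors(ref, tokenize(caption_text))
    return max(0.0, 1.0 - errors / len(ref))


def meets_benchmark(accuracy: float, benchmark: float = 0.99) -> bool:
    """Assumed pass/fail rule: accuracy at or above the benchmark."""
    return accuracy >= benchmark
```

A single wrong word in a six-word sentence already drops accuracy to about 83%, which is why short microlearning clips are unforgiving: a 3-minute module has few words to absorb an error on a regulatory term.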
Should we caption worker-recorded content — toolbox talks, coaching recordings, safety observations?
Worker-generated content has different compliance exposure than professionally produced OSHA-required training modules. OSHA does not require captioning of informal peer-to-peer toolbox talk recordings in the same way it requires captioning of formal training. The ADA Title I reasonable-accommodation obligation does apply if a worker with hearing loss requests an accessible version of a specific recorded toolbox talk. The practical approach is a tiered policy: (1) Caption all professionally produced OSHA-required training modules before publication — this is non-negotiable. (2) Apply a lightweight on-demand captioning workflow for worker-generated content — batch-caption weekly or on request. (3) For toolbox talks that are formally assigned in the LMS and tracked for completion, apply the same Tier 1 or Tier 2 classification as any other training module in that category. The distinction between formal training and informal communication is the organizing principle. When a toolbox talk is assigned, tracked, and appears in a worker's training record, it has the same compliance exposure as a formal training module regardless of its production format.
What is the right glossary size for a frontline safety content library?
The benchmark research documents that manufacturing/EHS content reaches 99.1% accuracy at 72 terms. For microlearning specifically, per-content-type mini-glossaries of 20–30 terms often outperform a single 72-term general-EHS glossary, because the short clip format means the model benefits most from glossary terms that are highly concentrated in the specific 3-minute module. A 25-term LOTO-specific glossary covering energy-isolation terms, OSHA citation formats, and facility-specific equipment names will produce better accuracy on a LOTO module than a 72-term general-EHS glossary that covers LOTO at 12 terms and HazCom at 25 terms. The recommended structure: one master EHS glossary of 60–75 terms covering the vocabulary that appears across all content types, plus per-content-type addendum glossaries of 15–25 additional terms specific to each major OSHA standard. Apply the master glossary as the baseline for all modules, then layer in the relevant addendum for each module's content category. This gives you 75–100 total terms for Tier 1 OSHA content — at the diminishing-returns threshold where additional terms add minimal accuracy gain — without requiring a separate full-glossary build for each content category.
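The master-plus-addendum structure can be sketched as a simple ordered merge. This is a minimal illustration, assuming glossaries are plain lists of term strings; the function name and the sample terms are hypothetical, and a real captioning tool may expect a different glossary format.

```python
def layered_glossary(master: list[str], addendum: list[str]) -> list[str]:
    """Master terms first, then addendum terms not already present.

    Duplicates are detected case-insensitively so that "loto" in an
    addendum does not count twice against "LOTO" in the master list.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for term in [*master, *addendum]:
        cleaned = term.strip()
        key = cleaned.casefold()
        if key and key not in seen:
            seen.add(key)
            merged.append(cleaned)
    return merged


# Tiny illustrative lists; a real master glossary would hold 60-75 terms.
master_ehs = ["LOTO", "SDS", "PPE"]
loto_addendum = ["energy isolation", "loto", "zero energy state"]

glossary = layered_glossary(master_ehs, loto_addendum)
print(len(glossary), glossary)
```

Counting the merged list rather than summing the two source lists matters for the 75–100 term target, because overlapping terms between the master and an addendum would otherwise inflate the total.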
Our frontline workforce includes workers whose primary language is not English. Can we caption in Spanish or other languages?
Yes, and for frontline populations in US manufacturing, food processing, and construction, Spanish-language captions are often more important than English-language captions for reaching the full workforce. EdApp's multi-language caption track feature (described above) is the most flexible platform option for bilingual microlearning delivery — workers can select their preferred caption language in the mobile player. TalentCards supports only one caption track per video card, so bilingual delivery requires a separate card (and separate deck) with the Spanish-language SRT, or a platform migration to EdApp for the bilingual content subset. Axonify supports per-user language preferences at the platform level for some enterprise configurations. See the multi-language caption workflow post for the full translation pipeline. The specific consideration for OSHA content in Spanish: regulatory citation formats do not change when the narration is in Spanish (29 CFR § 1910.147 remains the citation regardless of the caption language), but IUPAC chemical names have official Spanish-language forms that differ from the English forms, and OSHA's Spanish-language training materials use specific regulatory terminology that should be sourced from OSHA's official Spanish translations rather than auto-translated.
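Because the citation format should survive translation unchanged, a translated caption file can be checked mechanically against its English source. The sketch below is one possible check, assuming both caption tracks have already been read into plain strings; the regular expression and function names are illustrative and cover only CFR-style citations such as 29 CFR § 1910.147 or 30 CFR Part 46.

```python
import re

# Matches e.g. "29 CFR § 1910.147", "29 CFR 1910.1200(h)(1)", "30 CFR Part 46".
CFR_CITATION = re.compile(
    r"\b\d{1,2}\s+CFR\s+(?:§\s*|Part\s+)?\d+(?:\.\d+)?(?:\([a-z0-9]+\))*"
)


def citations(text: str) -> set[str]:
    """Extract CFR citations, normalizing whitespace and the spacing after '§'."""
    found = set()
    for match in CFR_CITATION.finditer(text):
        collapsed = re.sub(r"\s+", " ", match.group(0))
        found.add(re.sub(r"§\s*", "§ ", collapsed))
    return found


def missing_citations(english_text: str, translated_text: str) -> list[str]:
    """Citations present in the English captions but absent from the translation."""
    return sorted(citations(english_text) - citations(translated_text))


english = ("Lockout/tagout is governed by 29 CFR § 1910.147. "
           "HazCom training falls under 29 CFR § 1910.1200(h)(1).")
spanish = ("El bloqueo y etiquetado se rige por 29 CFR §1910.147. "
           "La comunicación de riesgos se rige por 29 CFR 1910.1200.")

print(missing_citations(english, spanish))
```

In this example the Spanish track dropped the section symbol and the paragraph reference from the HazCom citation, so it is reported as missing. The check says nothing about chemical names or regulatory terminology, which still need review against OSHA's official Spanish translations.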
We have 300 modules in our back catalogue. How do we prioritize which to caption first?
Apply the OSHA consequence triage: (1) First priority — any training content that documents compliance with a specific OSHA regulatory standard where an inspection or incident investigation could request training records. LOTO, HazCom, respiratory protection, powered industrial trucks, confined space entry, fall protection, bloodborne pathogens, and emergency action plan training for general industry; fall protection, scaffolding, excavation, and HazCom for construction. (2) Second priority — any content where a worker with hearing loss has submitted a formal or informal accommodation request. These are the modules where an ADA Title I compliance gap is already documented. (3) Third priority — all new-hire onboarding content, regardless of topic. New employees have the highest first-exposure concentration; if they cannot access the audio content during onboarding, the training gap compounds across every subsequent module that assumes onboarding knowledge. (4) Fourth priority — any content that has been updated in the last 12 months in response to an incident, a regulatory change, or a procedure update. Updated content is more likely to have vocabulary additions that a previously compiled general glossary doesn't cover. (5) Fifth priority — the remaining back catalogue in reverse recency order (most recently created first, oldest last). Modules created more recently are more likely to have vocabulary aligned with the current glossary. See the LMS caption audit methodology post for the full 7-dimension audit framework that generates the prioritization input.
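The five-level triage above can be sketched as a sort key over a module inventory. This is a minimal illustration under stated assumptions: the `Module` fields are hypothetical stand-ins for whatever your LMS export provides, "the last 12 months" is approximated as 365 days, and ordering most recent first inside priorities 1 through 4 is an added tie-breaker that the triage itself only specifies for priority 5.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Module:
    title: str
    created_on: date
    osha_standard: bool = False           # documents compliance with an OSHA standard
    accommodation_request: bool = False   # a worker has requested an accessible version
    onboarding: bool = False              # part of new-hire onboarding
    updated_for_cause_on: Optional[date] = None  # incident, regulatory, or procedure update


def priority(module: Module, today: date) -> int:
    """Return 1 (caption first) through 5 (remaining back catalogue)."""
    if module.osha_standard:
        return 1
    if module.accommodation_request:
        return 2
    if module.onboarding:
        return 3
    updated = module.updated_for_cause_on
    if updated is not None and today - updated <= timedelta(days=365):
        return 4
    return 5


def caption_queue(modules: list[Module], today: date) -> list[Module]:
    """Order by priority, then most recently created first."""
    return sorted(
        modules,
        key=lambda m: (priority(m, today), -m.created_on.toordinal()),
    )
```

A module that qualifies for several levels lands in the highest one, so an onboarding video that covers LOTO is captioned with the priority 1 batch rather than waiting for priority 3.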
We use TalentCards through TalentLMS. Can we manage captions centrally from the TalentLMS admin panel?
No. TalentCards (now part of the Epignosis product family along with TalentLMS) is a separate platform with its own admin interface — TalentCards Creator. The integration between TalentLMS and TalentCards operates at the course assignment and completion-reporting level: TalentLMS admins can assign TalentCards decks as part of a TalentLMS course, and completion events in TalentCards flow back to the TalentLMS training record. Caption file management is handled exclusively in TalentCards Creator, not in the TalentLMS admin panel. If your organization manages LMS captions via the TalentLMS content API (for videos uploaded directly to TalentLMS), those API calls do not affect TalentCards video card caption files — they are different systems with different caption upload surfaces. The TalentLMS + TalentCards workflow for caption operations: (1) Generate and prepare SRT files in your captioning workflow. (2) Upload to TalentCards Creator at the per-card level. (3) Verify display in the TalentCards mobile preview. (4) TalentLMS does not need to be touched — completion data flows automatically. The distinction matters for teams that have built automation around the TalentLMS API; that automation will not reach TalentCards content.
Caption operations that handle OSHA vocabulary at every module length
GlossCap generates WCAG 2.1 AA–compliant captions for frontline microlearning content using your facility's specific glossary — LOTO procedure terms, SDS chemical names, PPE abbreviation expansions, and regulatory citation formats. Per-card SRT export for TalentCards, SRT and VTT for EdApp and Axonify, with the offline-bundling timing built into the export workflow so captions are ready before your first publication date, not after. The glossary accumulates with each captioned module, so LOTO accuracy improves continuously as more of your facility's specific energy-isolation vocabulary enters the model.