Cybersecurity training captions: NIST CSF, MITRE ATT&CK technique IDs, SIEM vocabulary, and CMMC 2.0 compliance
Cybersecurity and information security training occupies a singular position in the enterprise training landscape: it is mandatory across virtually every regulated industry (HIPAA Security Rule for healthcare, FISMA for federal government, CMMC 2.0 for DoD contractors, PCI DSS 4.0 for payment card processing), it is mandated at hire and annually thereafter for all personnel — not just security staff — and its vocabulary is among the most technically specific of any training content category. The vocabulary challenge is two-layered and compounding. The first layer is framework and standard vocabulary: NIST CSF 2.0 function and subcategory codes (GV.OC-01, ID.AM-01, PR.AA-05), MITRE ATT&CK technique identifiers (T1566.001 for spear-phishing with malicious attachment, T1059.001 for PowerShell command execution), CVE numbering (CVE-2024-12345), CWE identifiers (CWE-89 for SQL injection), OWASP Top 10 identifiers (A01:2021 — Broken Access Control), ISO/IEC 27001:2022 Annex A control references (A.5.1 through A.8.34), CMMC 2.0 practice identifiers (AT.L2-3.2.1, AC.L1-3.1.1), and PCI DSS requirement numbering (Requirement 12.6.1). This framework vocabulary requires exact alphanumeric formatting — the technique ID "T1566.001" is not the same as "T 1566.001" or "T-fifteen sixty-six zero-zero-one" in the context of a SIEM alert, a threat-intel briefing, or a training exercise where the learner may need to look up the referenced technique in the ATT&CK navigator. 
The second layer is product and threat-actor vocabulary that does not appear in general STT training corpora: SIEM vendor names (Splunk, CrowdStrike Falcon, SentinelOne, Microsoft Sentinel, IBM QRadar, Elastic SIEM), SOAR platform names (Palo Alto XSOAR, Splunk SOAR, ServiceNow SecOps), EDR/XDR/MDR/CDR solution names, threat-actor group names (APT28/Fancy Bear, APT41/Winnti, Lazarus Group, ALPHV/BlackCat, LockBit 3.0, Scattered Spider, Lapsus$), and malware family names (Cobalt Strike, Emotet, Qakbot, BlackMatter, SUNBURST, BumbleBee) — a vocabulary that is both highly specific and rapidly evolving as new threat actors and malware variants emerge. Generic speech-to-text systems cannot handle either layer consistently, producing caption tracks that systematically misrepresent the technical content of cybersecurity training video. For hearing-impaired security analysts, SOC engineers, DevSecOps practitioners, and the general employee population receiving mandatory security awareness training, the result is that captions fail most severely on precisely the vocabulary items — attack technique identifiers, framework control references, compliance requirement numbers — that are the operative content of the training.
TL;DR
Cybersecurity training video has two vocabulary layers that defeat generic STT: (1) framework and standard alphanumeric identifiers, including NIST CSF 2.0 subcategory codes (GV.OC-01), MITRE ATT&CK technique IDs (T1566.001), CVE numbering (CVE-2024-12345), CWE identifiers, OWASP Top 10 identifiers (A01:2021), ISO 27001 Annex A control references (A.5.1), CMMC 2.0 practice identifiers (AT.L2-3.2.1), and PCI DSS requirement numbering (12.6.1), each requiring exact alphanumeric formatting that generic STT cannot produce consistently; and (2) product and threat-actor vocabulary, such as SIEM/SOAR/EDR/XDR vendor names, APT group names (APT28, Lazarus Group, ALPHV/BlackCat), and malware family names (Emotet, Qakbot, Cobalt Strike, SUNBURST), that rarely appears in general STT training corpora. Four major compliance frameworks mandate cybersecurity training: HIPAA Security Rule 45 CFR § 164.308(a)(5) for healthcare covered entities and business associates; FISMA-mandated annual security awareness training (NIST SP 800-50) for all federal employees; the CMMC 2.0 AT domain practices (AT.L2-3.2.1, AT.L2-3.2.2), which apply from Level 2 upward, for DoD contractors; and PCI DSS 4.0 Requirement 12.6 for all personnel in payment card processing environments. These mandates create a vast captioning obligation across healthcare, federal government, the defense supply chain, and financial services, all sectors where the vocabulary failure mode is systematic. Security awareness training further divides into two distinct tracks: the general-population awareness track (all employees: phishing recognition, password hygiene, social engineering defence, data handling, incident reporting) and the security-privileged technical track (SOC analysts, CSIRT, DevSecOps: threat hunting, SIEM/SOAR workflow, malware analysis, penetration testing vocabulary). Each track has its own vocabulary profile, and the technical track carries the most extreme vocabulary density of any enterprise training content category.
KnowBe4 KMSAT and Proofpoint Security Awareness Training (SAT) are the two dominant security awareness training platforms; DoD JKO administers the Cyber Awareness Challenge for DoD; and enterprise LMS platforms (Cornerstone, Workday, TalentLMS) deliver CMMC, ISO 27001, SOC 2, and PCI DSS compliance training. None of these platforms automatically generates accurate captions for cybersecurity vocabulary.
Security awareness training content types: general-population track
Phishing simulation and awareness training
Phishing awareness is the single most widely deployed security training topic in the enterprise — every major security awareness training vendor (KnowBe4, Proofpoint, Cofense, Mimecast, Infosec IQ) makes phishing awareness the anchor module of their training programmes. Phishing awareness training narrates a vocabulary of attack-type names that are specific to information security and appear rarely in general STT training data: phishing (usually handled correctly), spear-phishing (targeted phishing using personal or organisational information — "spear" + "phishing" compound, sometimes rendered as "spearfishing" by STT), whaling (CEO-targeted phishing — "whale" + "-ing" portmanteau that STT renders as "whaling" but occasionally as "wailing"), vishing (voice phishing over telephone — "V-ishing" rendered as "fishing," "v-ishing," or "vishing" inconsistently), smishing (SMS-based phishing — "S-M-ishing" rendered as "smishing" or "smithing" or "S-M-ishing"), BEC (Business Email Compromise — "B-E-C" or "beck" or "B.E.C." — three rendering strategies within a single training video), and pretexting (social engineering through fabricated scenarios — usually transcribed correctly but the specific BEC/pretexting distinction is not contextually reinforced by STT). The training also narrates phishing indicator vocabulary: lookalike domain (domain spoofing), homograph attack (Unicode character substitution in URLs), typosquatting, reply-chain hijacking, QR code phishing (quishing — another portmanteau that STT renders as "quishing" or "squishing").
Password and credential hygiene training
Password hygiene and multi-factor authentication (MFA) training narrates a fast-evolving vocabulary of authentication technologies. The failure modes are specific: MFA → "M-F-A" or "multi-factor" or "em-FA" — three rendering strategies; TOTP (Time-based One-Time Password, the algorithm behind Google Authenticator and Authy) → "TOTE-P" or "T-O-T-P" or "tote-pee" — all appear in STT output for the same acronym spoken in the same sentence; FIDO2 → "FEE-doh-two" (the pronunciation used by most security professionals) → STT: "Fido 2," "Fido2," "FEE-doe 2," "F-I-D-O-2" — four formatting variants for a proper noun with one canonical form (FIDO2); WebAuthn (Web Authentication API) → "web-AUTH-en" or "web-auth-n" — the W3C specification name that most narrators pronounce "WEB-aw-then" produces "WebAuthn," "web-auth-N," "web authen," "WEB-aw-then" in STT output; passkey (FIDO2 device-bound credential without password) → usually handled correctly but "passkey" vs. "PASSKEY" vs. "Passkey" capitalisation varies; hardware security key / YubiKey → "YubiKey" (Yubico's product name) → "Yubi Key," "Yoobi key," "ruby key" — STT mis-recognition of an uncommon proper noun; push notification fatigue (MFA bombing) → usually handled but "MFA fatigue" produces "em-FA fatigue" or "MFA fatigue" or "multifactor fatigue" inconsistently; authenticator app (generic term for TOTP apps) → usually handled.
Social engineering defence training
Social engineering training covers the human-manipulation attack surface: OSINT (Open Source Intelligence, narrated as "OS-int" or "O-S-I-N-T," which STT renders as "OSINT," "OS int," "oz-int," or "O.S.I.N.T.": four formatting variants for a five-letter acronym), pretext scenarios (usually handled), shoulder surfing (usually handled), tailgating and piggybacking (physical access social engineering, both usually handled correctly), vishing scripts (voice phishing scripts, see above), and baiting (USB drop attacks; the word "baiting" in a security context is usually handled). The social engineering vocabulary that most consistently fails in STT is the acronym layer. OSINT is the primary failure, followed by HUMINT (Human Intelligence, which appears in some advanced social engineering training), narrated as "HYOO-mint" or "H-U-M-I-N-T," and SIGINT (Signals Intelligence, which appears in training that covers nation-state threats), narrated as "SIG-int" or "S-I-G-I-N-T." Both are intelligence community acronyms with multiple STT rendering strategies.
Data handling and classification training
Data handling and classification training covers the organisation's data classification scheme and the DLP (Data Loss Prevention) controls that enforce it. The vocabulary includes: DLP → "D-L-P" or "deep" in rare fast-narration contexts; classification tiers — Public / Internal / Confidential / Restricted (organisation-specific names that vary; STT handles the generic names but not organisation-specific tier naming like "SENSITIVE//NF" or "COMPANY CONFIDENTIAL//INTERNAL USE ONLY" in organisations that use government-style classification-marking formats); DRM (Digital Rights Management) → "D-R-M" or "drm" — always letter-spelled; PII (Personally Identifiable Information) — narrated as "P-I-I" or "PII" (as a word) inconsistently; PHI (Protected Health Information) → "P-H-I" or "fee" — three-letter acronym with silent H creates "fee" mis-rendering in some STT contexts; PCI data (payment card data in scope for PCI DSS) → "P-C-I data" or "payment card data" or "PCI scope" — expansion vs. abbreviation inconsistency; GDPR (General Data Protection Regulation) → "G-D-P-R" or "GDPR" (usually handled in European-training-data-rich models but less consistently in US-focused STT); CCPA (California Consumer Privacy Act) → "C-C-P-A" or "CCPA" — less frequently handled correctly than GDPR.
Incident reporting and escalation training
Incident reporting training covers the employee-facing escalation path: how to recognise a potential security incident, who to notify, and what information to capture. The vocabulary includes: SOC (Security Operations Center) → "SOCK" or "S-O-C" — both renderings appear in the same training video; IR (Incident Response) → "I-R" or "incident response" — expansion vs. abbreviation inconsistency; CIRT (Computer Incident Response Team, sometimes CSIRT with an S for Computer Security) → "SIRT" or "C-I-R-T" or "siirt" — inconsistent; ticket and case management vocabulary (JIRA, ServiceNow, Remedy — vendor names that STT handles with varying capitalisation accuracy); escalation path vocabulary (L1/L2/L3 support tier labels — "L-1," "L1," "Level 1," "level-one" — four formatting variants); runbook and playbook (security operations procedure documents — usually handled); and SOAR (Security Orchestration, Automation, and Response) → "SOAR" as a word (usually handled correctly when narrated as a single word, but "S-O-A-R" when spelled produces "S-O-A-R," "soar," "SOAR" inconsistently).
Cybersecurity compliance training content types: technical and framework track
NIST CSF 2.0 training
NIST Cybersecurity Framework 2.0 (published February 2024) organises cybersecurity activities into six functions: Govern (GV), Identify (ID), Protect (PR), Detect (DE), Respond (RS), and Recover (RC). Each function contains categories and subcategories with alphanumeric codes: GV.OC-01 (Organizational context is understood to inform the cybersecurity risk management strategy), ID.AM-01 (Inventories of hardware managed by the organization are maintained), PR.AA-05 (Access permissions and authorizations are managed). NIST CSF training for security professionals and IT risk staff narrates these subcategory codes in the context of control assessment and risk management. The STT challenge: "GV.OC-01" narrated as "G-V-dot-O-C-zero-one" → produces "GV.OC-01," "GV OC 01," "GV.OC01," "G.V.O.C.01" — four formatting variants for a code that has one canonical form in the NIST CSF document. Across a full-framework NIST CSF training covering all 106 subcategories, the formatting inconsistency accumulates to hundreds of mis-formatted control codes. The six function names themselves — Govern, Identify, Protect, Detect, Respond, Recover — are common English words handled correctly by STT, but their function codes (GV, ID, PR, DE, RS, RC) are rendered inconsistently as letter strings when narrated in isolation.
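The formatting variants described above can be collapsed back to the canonical form in a caption post-processing pass. The following is a minimal sketch, assuming the transcript already contains letters and digits (not spelled-out number words); the function name `normalize_csf_codes` and the loose pattern are illustrative assumptions, not part of any NIST tooling.

```python
import re

# The six CSF 2.0 function codes named in the paragraph above.
CSF_FUNCTIONS = {"GV", "ID", "PR", "DE", "RS", "RC"}

# Hypothetical loose pattern: two function letters, two category letters and a
# one- or two-digit number, separated by any mix of dots, spaces or hyphens.
_CSF_LOOSE = re.compile(
    r"\b([A-Z])[.\s]?([A-Z])[.\s-]*([A-Z])[.\s]?([A-Z])[.\s-]*(\d{1,2})\b"
)


def normalize_csf_codes(text: str) -> str:
    """Rewrite loosely formatted CSF subcategory codes as e.g. 'GV.OC-01'."""

    def fix(match: re.Match) -> str:
        function = match.group(1) + match.group(2)
        if function not in CSF_FUNCTIONS:
            return match.group(0)  # not a CSF function code: leave untouched
        category = match.group(3) + match.group(4)
        return f"{function}.{category}-{int(match.group(5)):02d}"

    return _CSF_LOOSE.sub(fix, text)


print(normalize_csf_codes("Assess GV OC 01, then G.V.O.C.01 and GV.OC01."))
```

The sketch only checks the function code; a production pass would likely also validate the category and number against the published list of subcategories.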
SOC 2 Type II training
SOC 2 Type II (System and Organisation Controls 2, Type II) attestation training covers the AICPA's Trust Services Criteria (TSC) framework used by cloud service providers and technology companies to demonstrate security controls to enterprise customers. SOC 2 training narrates: the Trust Services Criteria categories, namely Security (the common criteria, CC), Availability (A), Confidentiality (C), Processing Integrity (PI), and Privacy (P), with their criterion codes (the common criteria run from CC1.1 through CC9.2); TSP (Trust Services Principles, the underlying standard document); the SOC 2 Type I vs. Type II distinction (Type I covers design suitability at a point in time; Type II covers operating effectiveness over a period); QSA (Qualified Security Assessor, strictly a PCI DSS role, which comes up in SOC 2 training when the two audits are compared, since SOC 2 examinations are performed by CPA firms rather than QSAs); management assertion letter vocabulary; complementary user entity controls (CUECs); complementary subservice organisation controls (CSOCs); and the sampling methodology vocabulary (population, sample, deviation, exception rate) used in Type II examination reports. STT failures: "CUECs" → "KWEKS," "C-U-E-C-s," or "koo-eks"; "CSOCs" → "C-SOCKS" or "C-S-O-Cs"; "TSP Section 100" → "TSP section one hundred" (the number is usually handled, but "TSP" itself is rendered inconsistently as "T-S-P," "the TSP," or "tee-ess-pee").
ISO/IEC 27001:2022 training
ISO/IEC 27001:2022 (the international standard for Information Security Management Systems, ISMS) was significantly restructured in its 2022 revision, with Annex A controls reorganised from 114 controls in 14 domains (ISO 27001:2013) to 93 controls in 4 themes (A.5 Organisational, A.6 People, A.7 Physical, A.8 Technological). ISO 27001 training narrates the control references as "A-dot-five-dot-one" (A.5.1 — Policies for information security) through "A-dot-eight-dot-thirty-four" (A.8.34 — Protection of information systems during audit testing). STT rendering of ISO 27001 control references: "A.5.1" → "A5.1," "A.five.one," "A-5-1," "A dot 5 dot 1" — four formatting variants. Additional ISO 27001 vocabulary with STT failure profiles: SoA (Statement of Applicability — the mandatory document listing all Annex A controls with applicability decisions and justifications) → "S-O-A" or "the SoA" or "soa" (all three rendering strategies appear); RTP (Risk Treatment Plan) → "R-T-P" or "the RTP"; ISMS (Information Security Management System) → "I-S-M-S" or "the ISMS" or "isms" — three strategies; ISMS scope (mandatory ISMS boundary definition) → usually handled but "scope" meaning in ISO 27001 context vs. general meaning is not contextualised by STT. The ISO/IEC 27001 certification process vocabulary (stage 1 / stage 2 audit, surveillance audit, recertification audit, nonconformity (NC), minor NC, major NC, opportunity for improvement (OFI)) is also narrated in training and handled inconsistently by STT.
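Because the 2022 revision has a fixed control count per theme, a caption pass can both normalise a reference and check that it falls inside the valid range. The sketch below assumes the per-theme counts 37, 8, 14, and 34 (which sum to the 93 controls mentioned above); the function name and the set of separators it accepts are illustrative assumptions, and spelled-out number words ("A.five.one") are deliberately out of scope.

```python
import re

# Controls per theme in the 2022 Annex A (37 + 8 + 14 + 34 = 93).
ANNEX_A_2022 = {5: 37, 6: 8, 7: 14, 8: 34}

# Separators seen in STT output: ".", "-", the spoken word "dot", or a space.
_SEP = r"(?:\.|-|\s+dot\s+|\s)"
_LOOSE_REF = re.compile(r"\bA" + _SEP + r"?(\d{1,2})" + _SEP + r"(\d{1,2})\b")


def normalize_annex_a(text: str) -> str:
    """Rewrite in-range Annex A references as 'A.<theme>.<control>'."""

    def fix(match: re.Match) -> str:
        theme, control = int(match.group(1)), int(match.group(2))
        if 1 <= control <= ANNEX_A_2022.get(theme, 0):
            return f"A.{theme}.{control}"
        return match.group(0)  # out of range (e.g. 2013 numbering): keep as-is

    return _LOOSE_REF.sub(fix, text)


print(normalize_annex_a("See A5.1, A-5-1 and A dot 8 dot 34."))
```

Leaving out-of-range references untouched, rather than guessing, keeps older ISO 27001:2013 numbering such as A.9.1 visible for a human reviewer.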
CMMC 2.0 training (DoD contractors)
The Cybersecurity Maturity Model Certification 2.0 (CMMC 2.0) is the DoD framework for cybersecurity requirements for the Defense Industrial Base (DIB) supply chain. CMMC 2.0 organises requirements into three levels (Level 1, Foundational; Level 2, Advanced; Level 3, Expert) based on the sensitivity of the information handled by the contractor: Federal Contract Information (FCI) at Level 1 and Controlled Unclassified Information (CUI) at Levels 2 and 3. CMMC 2.0 training for DoD contractors narrates 14 practice domains with two-letter codes (the same 14 families as NIST SP 800-171 Rev. 2): AC (Access Control), AT (Awareness and Training), AU (Audit and Accountability), CA (Security Assessment), CM (Configuration Management), IA (Identification and Authentication), IR (Incident Response), MA (Maintenance), MP (Media Protection), PE (Physical Protection), PS (Personnel Security), RA (Risk Assessment), SC (System and Communications Protection), and SI (System and Information Integrity). Practice identifiers follow the format Domain.Level-NIST-number: AT.L2-3.2.1 (Awareness and Training, Level 2, NIST SP 800-171 requirement 3.2.1), narrated as "A-T-dot-L-two-dash-three-dot-two-dot-one," which STT renders as "AT.L2-3.2.1," "AT L2 3.2.1," "AT-L2-3.2.1," or "A.T.L2-3.2.1." The CMMC acronym itself is narrated as "CIM-see," "C-M-M-C," or "CMMC" as a word: three rendering strategies. CUI (Controlled Unclassified Information) is rendered as "Q-I," "CUI," or "C-U-I"; it is the DoD-specific information category that is the primary subject of CMMC training and appears in every CMMC module.
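The Domain.Level-NIST-number format lends itself to parsing into structured fields, which makes the canonical form easy to regenerate regardless of how STT punctuated it. This is a minimal sketch under stated assumptions: `CmmcPractice` and `parse_practice` are hypothetical names, and the pattern only accepts the separator variants discussed above.

```python
import re
from typing import NamedTuple, Optional


class CmmcPractice(NamedTuple):
    domain: str       # two-letter domain code, e.g. "AT"
    level: int        # CMMC level, 1 to 3
    requirement: str  # NIST SP 800-171 requirement number, e.g. "3.2.1"

    def canonical(self) -> str:
        return f"{self.domain}.L{self.level}-{self.requirement}"


# Accepts dots, spaces or hyphens between the parts, and an optional dot
# between the two domain letters ("A.T.").
_PRACTICE = re.compile(
    r"^([A-Z])\.?([A-Z])[.\s-]*L\s?([123])[.\s-]*(\d+)\.(\d+)\.(\d+)$"
)


def parse_practice(raw: str) -> Optional[CmmcPractice]:
    """Parse one loosely formatted practice identifier, or return None."""
    match = _PRACTICE.match(raw.strip())
    if match is None:
        return None
    domain = match.group(1) + match.group(2)
    requirement = ".".join(match.group(4, 5, 6))
    return CmmcPractice(domain, int(match.group(3)), requirement)


for variant in ["AT.L2-3.2.1", "AT L2 3.2.1", "AT-L2-3.2.1", "A.T.L2-3.2.1"]:
    print(variant, "->", parse_practice(variant).canonical())
```

Keeping the domain, level, and requirement as separate fields also allows a later check against the list of practices that actually exist at each level.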
PCI DSS 4.0 training
PCI DSS 4.0 (Payment Card Industry Data Security Standard, version 4.0 released March 2022, mandatory since March 2024) requires security awareness training under Requirement 12.6 for all personnel and targeted training for roles with specific security responsibilities. PCI DSS 4.0 training narrates: Requirement numbering (Requirement 12.6.1, Requirement 12.6.2, Requirement 10.7.1) — the dotted requirement format that STT renders as "12.6.1," "12 6 1," "Requirement twelve-six-one" without consistent formatting; QSA (Qualified Security Assessor — the PCI DSS third-party auditor who conducts the on-site assessment for Level 1 merchants); SAQ (Self-Assessment Questionnaire — the PCI DSS self-attestation pathway for Level 2-4 merchants) → "S-A-Q" or "sack" — one of the more consistent STT failures in compliance training, where a three-letter acronym with a plausible phonetic rendering ("sack") is systematically mis-transcribed; CDE (Cardholder Data Environment — the in-scope system boundary for PCI DSS) → "C-D-E" or "the CDE" or "C.D.E." — three rendering strategies; CHD (Cardholder Data) → "C-H-D" or "the CHD"; PAN (Primary Account Number — the 16-digit card number) → "PAN" as a word (usually handled correctly but capitalisation — PAN vs. pan — inconsistent); ASV (Approved Scanning Vendor — the PCI DSS entity that conducts quarterly external vulnerability scans) → "A-S-V" or "the ASV"; and 3DS (3D Secure — the card authentication protocol) → "three-D-S" or "three-D secure" or "three DS" — three formatting variants.
The cybersecurity vocabulary failure mode in detail
MITRE ATT&CK technique identifiers
MITRE ATT&CK (Adversarial Tactics, Techniques, and Common Knowledge) is the most widely referenced cybersecurity threat-behaviour framework in use. ATT&CK organises adversary behaviour into 14 tactics and hundreds of techniques and sub-techniques, each with a unique identifier: T[number].[sub-number] for enterprise techniques. The STT failure mode for ATT&CK vocabulary is multi-part:
- "MITRE ATT&CK" as a proper noun: "MIGHT-ree attack" — the common pronunciation — produces "MITRE attack," "Miter ATT&CK," "might-ree at-n-ck," "MITER ATTACK." The all-caps "ATT&CK" format — with an ampersand embedded in an acronym — is not preserved by STT, which consistently renders it as "attack" in lowercase or mixed case. For a SIEM analyst whose job involves mapping detections to ATT&CK techniques, the training caption that renders "MITRE ATT&CK" as "MITRE attack" changes the meaning: "attack" is a generic word; "ATT&CK" is a proper noun referring to a specific framework. The distinction matters for compliance evidence, where training records may reference ATT&CK-based detection training.
- Technique IDs (T1566.001): narrated as "T-fifteen-sixty-six-point-zero-zero-one," which STT renders as "T1566.001," "T 1566.001," "T-fifteen sixty-six zero-zero-one," or "T1566.1" (dropping the leading zeros in the sub-technique number). For a training module on phishing that covers T1566.001 (Spearphishing Attachment), T1566.002 (Spearphishing Link), and T1566.003 (Spearphishing via Service), each technique ID must be formatted exactly: the difference between T1566.001 and T1566.003 is meaningful.
- Tactic names and abbreviations: the 14 Enterprise tactic names (Reconnaissance, Resource Development, Initial Access, Execution, Persistence, Privilege Escalation, Defense Evasion, Credential Access, Discovery, Lateral Movement, Collection, Command and Control, Exfiltration, Impact) are narrated in security training. Each tactic has a TA[number] code, for example TA0001 (Initial Access) and TA0043 (Reconnaissance); the numbering is not contiguous. STT handles the English tactic names acceptably, but "TA0001" narrated as "T-A-zero-zero-zero-one" produces "TA0001," "TA 0001," or "T.A.0001," formatting variants that affect searchability.
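The technique-ID variants in the list above, including the dropped leading zeros, can be repaired with a small normaliser. This is a sketch, assuming sub-technique numbers are always three digits and that a shortened suffix such as ".1" results only from dropped leading zeros; `normalize_technique_ids` is a hypothetical helper name, and fully spelled-out IDs would need a separate number-word pass.

```python
import re

# "T", an optional stray space or hyphen, four digits, then an optional
# sub-technique suffix of one to three digits.
_TECHNIQUE = re.compile(r"\bT[\s-]?(\d{4})(?:\.(\d{1,3}))?\b")


def normalize_technique_ids(text: str) -> str:
    """Rewrite ATT&CK technique IDs as T#### or T####.###."""

    def fix(match: re.Match) -> str:
        base = f"T{match.group(1)}"
        if match.group(2) is None:
            return base
        # Restore leading zeros in the sub-technique number.
        return f"{base}.{int(match.group(2)):03d}"

    return _TECHNIQUE.sub(fix, text)


print(normalize_technique_ids("Covers T 1566.001, T1566.2 and T-1059.001."))
```

Once every ID is in canonical form, captions can be searched or cross-referenced against ATT&CK data by exact string match.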
CVE numbering and vulnerability vocabulary
Common Vulnerabilities and Exposures (CVE) numbering is a central vocabulary element of cybersecurity training: vulnerability management training, patch management training, and threat-intel briefings all reference specific CVE identifiers. The CVE numbering format is CVE-[year]-[number]: "CVE-2024-12345" narrated as "C-V-E-twenty-twenty-four-dash-twelve-three-four-five" → STT produces: "CVE-2024-12345," "CVE 2024 12345," "C-V-E-2024-12345," "cve twenty-twenty-four dash twelve three four five." Four formatting variants for a vulnerability identifier that has one canonical form (CVE-2024-12345 with hyphens and no spaces). In a patch management training video that covers 12 recent CVEs with different CVSS (Common Vulnerability Scoring System) scores, each CVE may produce a different formatting variant, making the training record's CVE references non-searchable and non-matchable against vulnerability management system data. CVSS scoring vocabulary adds another layer: "CVSS v3.1 Base Score 9.8" narrated as "C-V-S-S-version-three-point-one-base-score-nine-point-eight" → "CVSS v3.1 Base Score 9.8," "CVSS version 3.1 base score 9.8," "C.V.S.S. v3.1 base score nine-point-eight."
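To make CVE references in captions matchable against vulnerability management data, the digit-based variants can be normalised and then extracted as a set. The sketch below is illustrative only: the function names are assumptions, CVE-2024-12345 is the placeholder identifier used in the text, and the fully spelled-out rendering is not handled.

```python
import re

# Tolerates dots, spaces or hyphens between the letters and before each
# number group; the sequence number is four or more digits.
_CVE_LOOSE = re.compile(
    r"\bC[.\s-]?V[.\s-]?E[.\s-]*(\d{4})[\s-]*(\d{4,})\b", re.IGNORECASE
)
_CVE_CANONICAL = re.compile(r"CVE-\d{4}-\d{4,}")


def normalize_cve_ids(text: str) -> str:
    """Rewrite loosely formatted CVE identifiers as CVE-YYYY-NNNN."""
    return _CVE_LOOSE.sub(lambda m: f"CVE-{m.group(1)}-{m.group(2)}", text)


def extract_cve_ids(text: str) -> list:
    """Return the sorted, de-duplicated canonical CVE IDs found in a transcript."""
    return sorted(set(_CVE_CANONICAL.findall(normalize_cve_ids(text))))


transcript = "Patch CVE 2024 12345 first. C-V-E-2024-12345 scores 9.8."
print(extract_cve_ids(transcript))
```

The extracted list can then be compared with the CVE IDs the training was supposed to cover, which exposes any identifier that STT rendered in an unrecoverable form.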
SIEM, SOAR, EDR, XDR, CASB — security tool category acronyms
Cybersecurity tool category acronyms are the primary vocabulary layer of technical security operations training and appear throughout general security awareness training when tools are referenced. The STT failure profile for each is distinct:
- SIEM (Security Information and Event Management): pronounced "seem" by most security professionals. STT rendering: "seem" (loses the acronym meaning entirely, producing "the SEEM" or "the seem" where the training says "the SIEM"), "S-I-E-M" (letter-spelled when narrated more formally), or "SIEM" in models with security training data exposure. In a training video that uses "SIEM" 15 times, three rendering strategies may be applied to the same acronym in different sentence positions.
- SOAR (Security Orchestration, Automation, and Response): pronounced "soar" (rhymes with "more"). STT handles "SOAR" as a word correctly, but when narrated as "S-O-A-R" (as some trainers do for emphasis) produces "S-O-A-R," "soar," "SOAR" — capitalisation inconsistency for the same acronym.
- EDR/XDR/MDR/CDR (Endpoint Detection and Response / Extended Detection and Response / Managed Detection and Response / Cloud Detection and Response): all four are letter-spelled in narration and produce the same three-or-four-letter inconsistency pattern. The distinctions between EDR, XDR, and MDR are conceptually important in training that covers security architecture — if "EDR" and "XDR" are both transcribed as letter strings but with inconsistent capitalisation, the training record does not clearly distinguish the tools.
- CASB (Cloud Access Security Broker): pronounced "KAZ-bee" by most security professionals → STT: "Kazby," "CAS-B," "CASB," "kas-bee" — four rendering variants for a tool category that appears in every cloud security training module.
- PAM (Privileged Access Management): "PAM" as a word → STT: "pam" (a first name — STT may capitalise it as "Pam" in sentence-initial position), "PAM," "P-A-M." In a sentence like "the PAM solution enforces just-in-time access," STT produces "the Pam solution" with a name capitalisation rather than an acronym capitalisation.
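Acronyms that are pronounced as ordinary words ("seem," "Pam") cannot be fixed by blind find-and-replace, because the same sounds occur in normal English. One option is to correct them only in contexts where the acronym reading is likely. The rules below are a hypothetical, deliberately small correction table written for illustration; a real deployment would need a larger, reviewed rule set.

```python
import re

# Hypothetical context-sensitive rules: (pattern, replacement).
ACRONYM_FIXES = [
    # "seem" right after a determiner is almost never the verb.
    (re.compile(r"\b(the|a|our|your)\s+seem\b", re.IGNORECASE), r"\1 SIEM"),
    (re.compile(r"\bS-I-E-M\b", re.IGNORECASE), "SIEM"),
    # Phonetic spellings of CASB have no competing English meaning.
    (re.compile(r"\b(?:kazby|kas-bee|cas-b)\b", re.IGNORECASE), "CASB"),
    # "Pam" followed by a product noun is treated as the acronym, not a name.
    (re.compile(r"\bpam(?=\s+(?:solution|tool|platform|vault)\b)", re.IGNORECASE), "PAM"),
]


def fix_tool_acronyms(text: str) -> str:
    """Apply each context rule in turn and return the corrected caption text."""
    for pattern, replacement in ACRONYM_FIXES:
        text = pattern.sub(replacement, text)
    return text


print(fix_tool_acronyms("the Pam solution forwards logs to the seem"))
print(fix_tool_acronyms("Things seem fine"))
```

The second call is left unchanged because "seem" is not preceded by a determiner, which is the design choice that keeps ordinary prose intact.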
Threat-actor group names and malware families
Advanced threat-actor vocabulary is the most rapidly evolving and most completely out-of-distribution vocabulary category in cybersecurity training. Threat actor names appear in security awareness training (advanced threat briefings, threat-intel integration training, SOC playbook training) and in technical security operations training. The STT failure mode is near-total for novel or complex threat-actor names:
- APT28 / Fancy Bear: "APT28" → "A-P-T-twenty-eight" → "APT 28," "APT28," "A.P.T. 28," "advanced persistent threat twenty-eight" — four rendering variants. "Fancy Bear" → usually handled correctly. When a training module uses both the APT number designation and the colloquial name ("APT28, also known as Fancy Bear"), STT must handle both naming conventions and their co-reference — a task that generic STT handles correctly for the English words but not for the alphanumeric designation.
- ALPHV / BlackCat: "ALPHV" → "alpha-V" or "ALF-V" or "ALPHV" — three rendering strategies for a five-letter threat-actor name with no standard pronunciation. "BlackCat" → usually handled correctly as two common English words combined.
- Scattered Spider: usually handled correctly as two common English words.
- Lapsus$ (also written LAPSUS$): "lap-suss-dollar" or "LAP-sus" — the dollar-sign character embedded in a group name is a formatting convention that STT cannot preserve; narrators pronounce the dollar sign as "dollar" (which STT produces correctly as "dollar") but the canonical written form "Lapsus$" is never produced by STT.
- SUNBURST: the SolarWinds supply-chain attack malware — usually handled as an all-caps word.
- Cobalt Strike: a commercial penetration testing tool (Fortra/formerly HelpSystems) that is also widely abused by threat actors — usually handled correctly as two common English words, but "Cobalt Strike Beacon" (the malicious payload component) produces "Cobalt Strike Beacon" vs. "cobalt strike beacon" capitalisation inconsistency.
- Emotet / Qakbot / BumbleBee / Raccoon: malware family names that are either invented words (Emotet, Qakbot) or common English words used in a specific technical context (BumbleBee, Raccoon). Invented malware names are handled inconsistently by STT — "Emotet" → "Emo-tet," "E-motet," "emoTet"; "Qakbot" → "Quack-bot," "Q-A-K-bot," "kwak-bot."
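For invented names such as those in the list above, one possible approach is fuzzy matching of each suspicious token against a maintained glossary, combined with an explicit alias table for spoken forms that cannot be matched by similarity (the dollar sign in "Lapsus$," for instance). The sketch uses the standard-library `difflib`; the glossary contents, the alias table, the 0.8 cutoff, and the function name are all assumptions chosen for illustration.

```python
import difflib
import re

# Hypothetical glossary of canonical written forms.
CANONICAL_NAMES = ["Emotet", "Qakbot", "BumbleBee", "Cobalt Strike",
                   "Lapsus$", "ALPHV", "APT28"]

# Spoken forms that similarity alone would not map to the written form.
SPOKEN_ALIASES = {"lapsus dollar": "Lapsus$", "alpha v": "ALPHV"}


def _key(name: str) -> str:
    """Lower-case and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


_KEYS = {_key(name): name for name in CANONICAL_NAMES}
_KEYS.update({_key(spoken): written for spoken, written in SPOKEN_ALIASES.items()})


def closest_canonical(heard: str, cutoff: float = 0.8):
    """Return the canonical name closest to an STT rendering, or None."""
    matches = difflib.get_close_matches(_key(heard), list(_KEYS), n=1, cutoff=cutoff)
    return _KEYS[matches[0]] if matches else None


for heard in ["Emo-tet", "Quack-bot", "lapsus dollar", "spreadsheet"]:
    print(heard, "->", closest_canonical(heard))
```

Because new threat-actor and malware names appear constantly, the glossary is the part that needs ongoing maintenance; the matching logic itself stays the same.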
Security framework abbreviation rendering: OWASP, FIDO2, WebAuthn, CIS Controls
Security framework and standard abbreviations outside the NIST/ATT&CK vocabulary also produce systematic STT failures:
- OWASP (Open Worldwide Application Security Project, formerly the Open Web Application Security Project): narrated as "OH-wasp," which STT renders as "OWASP," "oh-wasp," "O-W-A-S-P," or "Oz-wasp": four rendering variants. OWASP Top 10 identifiers (A01:2021, A02:2021, etc.) narrated as "A-zero-one-colon-twenty-twenty-one" produce "A01:2021," "A01 2021," "A-01-2021," or "A zero-one twenty-twenty-one."
- FIDO2 and WebAuthn: as described above — FIDO2 → four rendering variants; WebAuthn → four rendering variants. These are the two most-cited MFA standard proper nouns in current security training and the most inconsistently handled by STT.
- CIS Controls (Center for Internet Security Controls, v8): training content may use either the full name ("Center for Internet Security Controls") or the abbreviation ("CIS Controls" or "the CIS Controls"). STT handles the full English name correctly but abbreviation rendering — "CIS" → "sis" or "C-I-S" or "CIS" — is inconsistent, and the expansion vs. abbreviation strategy varies between narrators and between sentences in the same video.
- BEC (Business Email Compromise): "B-E-C" → "beck" or "B-E-C" or "BEC" — the phonetic rendering "beck" is the most common STT output for this three-letter acronym, completely losing the abbreviation format. In a training video on social engineering that uses "BEC" 12 times, some occurrences produce "BEC" and others produce "beck" — the inconsistency prevents a learner from recognising both as the same term.
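Inconsistency within a single video, as in the BEC example above, can be detected before any correction is attempted by counting how many distinct renderings of each term occur in the transcript. The following sketch assumes a hand-maintained variant table seeded from the renderings listed in this section; the table and the function name `audit_variants` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical variant table: canonical term -> renderings seen in STT output.
VARIANTS = {
    "BEC": ["BEC", "B-E-C", "B.E.C.", "beck"],
    "OWASP": ["OWASP", "oh-wasp", "O-W-A-S-P", "Oz-wasp"],
    "CIS": ["CIS", "C-I-S", "sis"],
}


def audit_variants(transcript: str) -> dict:
    """Report terms that appear in more than one rendering, with counts."""
    report = {}
    for canonical, forms in VARIANTS.items():
        counts = Counter()
        for form in forms:
            # Case-sensitive, whole-token match for this exact rendering.
            pattern = re.compile(r"(?<!\w)" + re.escape(form) + r"(?!\w)")
            hits = len(pattern.findall(transcript))
            if hits:
                counts[form] = hits
        if len(counts) > 1:  # more than one rendering means inconsistency
            report[canonical] = dict(counts)
    return report


sample = ("A BEC attack begins with pretexting. "
          "The beck email asks for a wire transfer. "
          "OWASP publishes the Top 10.")
print(audit_variants(sample))
```

A report like this gives a caption reviewer a short list of terms to fix, rather than requiring a full read-through of every transcript.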
Compliance obligations for cybersecurity training video
HIPAA Security Rule — 45 CFR § 164.308(a)(5)
The HIPAA Security Rule (45 CFR Part 164) requires covered entities (hospitals, health systems, physician practices, health insurers) and business associates (cloud providers, billing companies, IT service providers handling PHI) to implement a security awareness and training program for all members of the workforce. The specific standard is 45 CFR § 164.308(a)(5)(i): "Implement a security awareness and training program for all members of its workforce (including management)." The implementation specification at § 164.308(a)(5)(ii) includes periodic security updates, protection from malicious software, log-in monitoring, and password management as addressable specifications. For a health system with 15,000 employees, all 15,000 receive mandatory HIPAA Security Rule training annually — creating a captioning obligation for the training video under ADA Title I for hearing-impaired employees. The training vocabulary is dual-layer: HIPAA-specific vocabulary (PHI, ePHI, covered entity, business associate agreement — BAA, Minimum Necessary standard, workforce member) plus cybersecurity technical vocabulary (phishing, malware, ransomware, MFA, DLP). The HIPAA-specific vocabulary adds its own STT failure layer on top of the security vocabulary failures: "ePHI" → "e-PHI" or "E-P-H-I" or "ephi"; "BAA" → "B-A-A" or "baa" (like a sheep sound — the STT rendering of a three-letter acronym with a phonetic rendering); OCR (Office for Civil Rights, the HIPAA enforcement agency) → "O-C-R" or "OCR" (shared with "Optical Character Recognition" — a frequent STT context-confusion).
FISMA-mandated annual security awareness training — NIST SP 800-50
FISMA (44 U.S.C. §§ 3551-3558) requires all federal agencies to implement an information security awareness and training programme. NIST SP 800-50 (Building an Information Technology Security Awareness and Training Program) is the implementation guide. Annual security awareness training is mandatory for all federal employees and contractors with access to federal information systems. The vocabulary overlaps with that of general federal mandatory training: NIST SP 800-53 control identifiers, FISMA vocabulary, FedRAMP vocabulary, and OMB M-memo references appear in FISMA security training just as they do there. The DoD Cyber Awareness Challenge (administered through DoD JKO) adds DoD-specific vocabulary: CAC (Common Access Card), PIV (Personal Identity Verification), the NIPRNET/SIPRNET/JWICS network tiers, STIG (Security Technical Implementation Guide, the DoD hardening configuration standards; narrated as a word and usually handled, but rendered as "stick" or "stifg" in some STT contexts), and RMF (Risk Management Framework, the federal process defined in NIST SP 800-37), rendered as "R-M-F," "the RMF," or "rimp" in fast narration.
CMMC 2.0 — AT domain — DoD contractor obligation
CMMC 2.0 (32 CFR Part 170, final rule effective December 2024) requires DoD prime contractors and subcontractors that handle Federal Contract Information (FCI) or Controlled Unclassified Information (CUI) to meet cybersecurity practice requirements across 14 domains. The Awareness and Training (AT) domain contains the specific training requirements, which apply at Level 2 (Level 1, which mirrors the basic safeguarding requirements of FAR 52.204-21, has no AT practice): AT.L2-3.2.1 (ensure that organizational personnel are made aware of the security risks associated with their activities and of the applicable policies, standards, and procedures related to the security of organizational systems; typically delivered as annual awareness training) and AT.L2-3.2.2 (ensure that personnel are trained to carry out their assigned information security responsibilities; role-based training for privileged users). DoD contractors must document their training program as part of their System Security Plan (SSP) and, for Level 2 certification, undergo an assessment by a CMMC Third Party Assessment Organization (C3PAO). The training obligation applies to the entire organisation at Level 2, not just security staff, which creates the same broad-population captioning obligation as HIPAA Security Rule training. The CMMC vocabulary failure mode compounds the general cybersecurity vocabulary failures with DoD-specific acronyms: C3PAO → "C-three-PAO" or "C3PAO" or "see-three-pao"; SSP → "S-S-P" or "the SSP"; OSC (Organisation Seeking Certification, the CMMC assessment term for the DoD contractor being assessed) → "O-S-C" or "the OSC" (shared with Office of Special Counsel, a frequent abbreviation collision); POA&M → "PWAM" or "P-O-A-and-M", the same failure mode as in federal FISMA training.
PCI DSS 4.0 Requirement 12.6 — security awareness training
PCI DSS 4.0 Requirement 12.6 mandates a formal security awareness program covering all personnel in the cardholder data environment (CDE). Requirement 12.6.1 requires a formal security awareness program that makes all personnel aware of the entity's information security policy and procedures and of their role in protecting cardholder data. Requirement 12.6.2 requires the program to be reviewed at least once every 12 months and updated to address new threats. Requirement 12.6.3 requires security awareness training upon hire and at least once every 12 months, and requires personnel to acknowledge at least once every 12 months that they have read and understood the information security policy and procedures. PCI DSS training vocabulary adds to the general security vocabulary layer: the distinction between QSA (Qualified Security Assessor) and ISA (Internal Security Assessor) → "I-S-A" or "isa" or "the ISA"; SAQ types (SAQ A, SAQ A-EP, SAQ B, SAQ B-IP, SAQ C, SAQ C-VT, SAQ D) → "SAQ-A," "SAQ A," "sack A" (the SAQ-type suffix is rendered inconsistently or lost in STT output); PAN masking (the requirement to display no more than the first 6 and last 4 digits of the primary account number) → "PAN masking" (usually handled, but with inconsistent capitalisation of "PAN"); network segmentation (the architectural control used to reduce PCI DSS scope) → usually handled; and scope reduction (the process of removing systems from CDE scope) → usually handled.
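The SAQ-suffix problem described above lends itself to a small normalisation pass. The sketch below is illustrative only: the function name `normalise_saq` and the decision to treat "sack" as a misheard "SAQ" are assumptions, and the "sack" rule will produce false positives on ordinary sentences such as "sack a quarterback", so its output would need human review.

```python
import re

# Suffixes listed longest first so "A-EP" is tried before "A".
SAQ_TYPES = ("A-EP", "B-IP", "C-VT", "A", "B", "C", "D")

# Accept "SAQ A", "SAQ-A", "SAQA" and the misheard "sack A" (assumption).
_SAQ = re.compile(
    r"\b(?:SAQ|sack)[ -]?(" + "|".join(SAQ_TYPES) + r")\b",
    re.IGNORECASE,
)


def normalise_saq(caption: str) -> str:
    """Rewrite SAQ references to the 'SAQ <TYPE>' form with an upper-case suffix."""
    return _SAQ.sub(lambda m: "SAQ " + m.group(1).upper(), caption)


print(normalise_saq("Merchants file SAQ-A or sack c-vt each year."))
```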
ADA Title I and state law obligations for cybersecurity training
Cybersecurity and security awareness training is mandatory for all employees at large technology companies, financial institutions, healthcare systems, and any organisation subject to HIPAA, CMMC, or PCI DSS, and most of these organisations are employers with ADA Title I obligations. ADA Title I (42 U.S.C. § 12112) requires employers with 15+ employees to provide reasonable accommodation to qualified employees with disabilities, including accessible formats for mandatory training. For a 10,000-employee technology company whose mandatory annual security awareness training video is deployed without accurate captions, every hearing-impaired employee who takes the training represents a potential ADA Title I reasonable-accommodation failure. For organisations in states or cities with stricter disability accommodation standards (California FEHA at 5+ employees, New York City HRL at 4+ employees, New Jersey LAD at 1+ employees), the threshold is lower still. California privacy law (CCPA/CPRA, which followed the SB 1386 breach-notification statute) also drives employee privacy training that overlaps with security training content, adding a CCPA/CPRA vocabulary layer (data subject rights, opt-out of sale, sensitive personal information categories) to the security training vocabulary challenge.
Security awareness training LMS and delivery platforms
Cybersecurity training is delivered through a distinctive ecosystem that spans purpose-built security awareness platforms and general enterprise LMS deployments. Each platform has a distinct captioning workflow.
KnowBe4 KMSAT
KnowBe4's Kevin Mitnick Security Awareness Training (KMSAT) platform is the world's largest integrated security awareness training and phishing simulation platform by customer count. KMSAT provides a training content library of security awareness modules covering phishing, social engineering, ransomware, and compliance topics, combined with a phishing simulation engine that sends simulated phishing emails to employees and captures click-through and credential-submission rates. KnowBe4's training content library includes video-based modules narrated in English (with translations available). Caption file support (SRT/VTT) is available for uploaded custom content; KnowBe4's own library content includes caption tracks of varying quality — the platform's auto-captioning uses a generic STT engine that produces the characteristic cybersecurity vocabulary failures documented in this reference. For organisations that need accurate captions on KnowBe4 library content or custom content uploaded to KMSAT, caption files must be produced externally and uploaded to the module configuration.
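Because externally produced caption files may be needed in either SRT or VTT form, a conversion step is often part of the workflow. The sketch below converts SRT text to WebVTT by adding the `WEBVTT` header and changing the millisecond separator in timing lines from a comma to a period. It is a minimal illustration: the function name `srt_to_vtt` is hypothetical, it does not call any KnowBe4 API, and it ignores SRT styling tags.

```python
import re

# SRT timestamps use "HH:MM:SS,mmm"; WebVTT uses "HH:MM:SS.mmm".
_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT caption text to WebVTT, keeping cue numbers as cue identifiers."""
    lines = ["WEBVTT", ""]
    for line in srt_text.strip().splitlines():
        if "-->" in line:  # only timing lines are rewritten
            line = _TIMESTAMP.sub(r"\1.\2", line)
        lines.append(line)
    return "\n".join(lines) + "\n"


sample = "1\n00:00:01,000 --> 00:00:04,000\nReport phishing to the SOC.\n"
print(srt_to_vtt(sample))
```

Only lines containing `-->` are touched, so a caption line that happens to contain a comma-separated number is left alone.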
Proofpoint Security Awareness Training (SAT)
Proofpoint Security Awareness Training (formerly Wombat Security) is the second-largest security awareness training platform by revenue. Proofpoint SAT provides a training content library, phishing simulation, and learning management features. Like KnowBe4, Proofpoint SAT's video content includes caption tracks with generic STT rendering — the MITRE ATT&CK technique IDs, NIST CSF codes, and threat-actor names in the training content are mis-transcribed in the auto-generated captions. Proofpoint SAT supports custom content upload with SRT caption file attachment. The captioning gap in Proofpoint's library content is most severe in the modules covering advanced threats (nation-state actors, supply-chain attacks, advanced phishing) where the ATT&CK and threat-actor vocabulary density is highest.
Cofense — phishing simulation with healthcare focus
Cofense (formerly PhishMe) provides phishing simulation and awareness training with a strong healthcare sector customer base. Cofense's training content includes healthcare-specific phishing scenario modules (patient record phishing, healthcare vendor impersonation, medical device vendor pretexting) that combine HIPAA security vocabulary with cybersecurity attack vocabulary. For healthcare SOC teams and clinical staff receiving mandatory HIPAA security training through Cofense, the vocabulary double-layer (HIPAA terms + cybersecurity terms) creates the most complex captioning environment in security awareness training.
SANS Security Awareness (formerly SANS Securing The Human)
SANS Security Awareness provides monthly security awareness content (the Ouch! newsletter and accompanying training modules) targeted at both general employee populations and security professionals. SANS content is often used for executive and senior management security briefings — content where the vocabulary profile includes both C-suite accessible language and the technical vocabulary of advanced threats. SANS also delivers the SANS Institute technical security training courses (GIAC certification training) which have the highest technical vocabulary density of any commercial security training content: exploitation frameworks (Metasploit, Cobalt Strike, Burp Suite), reverse engineering vocabulary, malware analysis vocabulary, and network forensics vocabulary.
DoD JKO — Cyber Awareness Challenge
The DoD Joint Knowledge Online (JKO) platform administers the DoD Cyber Awareness Challenge, the mandatory annual cybersecurity training for all DoD military and civilian personnel. JKO is FedRAMP-authorized and FISMA-compliant. JKO's captioning workflow for SCORM-packaged content requires externally produced caption files embedded in the SCORM package. The DoD Cyber Awareness Challenge content is produced by a DoD contractor and carries Section 508-compliant caption tracks, but the auto-captioning approach used in prior iterations of the Challenge produced the DoD/FISMA vocabulary failures documented above (inconsistent rendering of NIPRNET, SIPRNET, JWICS, STIG, RMF, CAC, and PIV).
Enterprise LMS for compliance training — Cornerstone, Workday, TalentLMS
Cornerstone OnDemand, Workday Learning, and TalentLMS are used to deliver CMMC 2.0, ISO 27001, SOC 2, and PCI DSS compliance training in large enterprise environments. These are the platforms where organisations build their own compliance training catalogue using internally produced video — video narrated by the organisation's own CISO, security architects, or compliance team. Internally produced cybersecurity training video has the highest organisation-specific vocabulary density: the SIEM vendor the organisation uses (Splunk, CrowdStrike Falcon, Microsoft Sentinel), the organisation's specific incident response playbook vocabulary, the classification tier names specific to the organisation, and the specific threat actors the organisation's threat-intel team tracks. All of these organisation-specific vocabulary items are beyond any generic STT glossary and require an organisation-specific glossary overlay.
The GlossCap approach for cybersecurity training video
Cybersecurity training vocabulary divides into a large framework base layer and an organisation-specific overlay — a structure that mirrors the two-tier nature of cybersecurity training delivery (industry-standard framework content + organisation-specific tool and threat vocabulary).
The framework base layer covers vocabulary that is consistent across all cybersecurity training regardless of organisation or industry: all NIST CSF 2.0 function codes (GV, ID, PR, DE, RS, RC) and subcategory codes (GV.OC-01 through RC.CO-04); all NIST SP 800-53 Rev 5 control family codes and control identifiers (AC-1 through SR-12); all MITRE ATT&CK technique ID formats (T[NNNN] and T[NNNN].[NNN]) and tactic codes (TA0001 through TA0043); all CVE numbering format patterns (CVE-[YYYY]-[NNNN], with four or more sequence digits) and CVSS v3.x scoring vocabulary; all CWE identifier formats (CWE-[N], one to four digits); all OWASP Top 10 identifier formats (A01:2021 through A10:2021 in the 2021 edition); all ISO/IEC 27001:2022 Annex A control references (A.5.1 through A.8.34) and ISMS vocabulary (SoA, RTP, ISMS, nonconformity, surveillance audit); all CMMC 2.0 practice identifiers (AC.L1 through SI.L2) and the 14 domain codes (AC, AT, AU, CA, CM, IA, IR, MA, MP, PE, PS, RA, SC, SI); all PCI DSS 4.0 requirement numbering formats (Requirement 1.1.1 through 12.10.7); HIPAA Security Rule vocabulary (45 CFR § 164.308 through 164.318, ePHI, BAA, OCR, covered entity, business associate); FISMA/FedRAMP vocabulary (FISMA, FedRAMP, ATO, POA&M, ConMon, ISSO, ISSM, AO); security tool category acronyms (SIEM, SOAR, EDR, XDR, MDR, CDR, CASB, PAM, UEBA, DLP, DRM, WAF, NGFW, IDS/IPS); threat-actor naming conventions (APT[NN] format and common group aliases such as Fancy Bear, Lazarus Group, ALPHV, Scattered Spider, Lapsus$); and major malware family names (Emotet, Qakbot, Cobalt Strike, BumbleBee, Raccoon, SUNBURST, BlackMatter, LockBit).
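Several of the identifier formats in the base layer are regular enough to check mechanically. The sketch below shows one way to classify a caption token against a handful of those formats. The pattern table, its key names, and `classify_identifier` are illustrative assumptions rather than an official grammar for any framework, and the CMMC pattern covers only the dotted-number form used in this reference.

```python
import re
from typing import Optional

# Illustrative patterns only; not an authoritative grammar for any framework.
ID_PATTERNS = {
    "attack_technique": re.compile(r"T\d{4}(?:\.\d{3})?"),
    "attack_tactic": re.compile(r"TA\d{4}"),
    "cve": re.compile(r"CVE-\d{4}-\d{4,}"),
    "cwe": re.compile(r"CWE-\d{1,4}"),
    "owasp_top10": re.compile(r"A(?:0[1-9]|10):\d{4}"),
    "nist_csf_subcategory": re.compile(r"(?:GV|ID|PR|DE|RS|RC)\.[A-Z]{2}-\d{2}"),
    "cmmc_practice": re.compile(r"[A-Z]{2}\.L[1-3]-\d+\.\d+\.\d+"),
}


def classify_identifier(token: str) -> Optional[str]:
    """Return the name of the first pattern the whole token matches, else None."""
    for name, pattern in ID_PATTERNS.items():
        if pattern.fullmatch(token):
            return name
    return None


for token in ("T1566.001", "GV.OC-01", "AT.L2-3.2.1", "T 1566.001"):
    print(token, "->", classify_identifier(token))
```

A token that returns `None`, such as "T 1566.001", is a candidate for correction or review rather than proof of an error.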
The organisation-specific overlay covers vocabulary unique to the organisation's security environment: the specific SIEM vendor deployed (Splunk query language terms, CrowdStrike Falcon UI vocabulary, Microsoft Sentinel KQL vocabulary, IBM QRadar AQL vocabulary, SentinelOne console vocabulary); the organisation's incident response playbook vocabulary (internal escalation tier names, internal team names — "Purple Team," "Red Cell," "Threat Intel Function"); the specific threat-actor groups the organisation's threat-intel team tracks by name; internal data classification tier names (if they differ from the generic Public/Internal/Confidential/Restricted scheme — e.g., "TLP:RED," "COMPANY CONFIDENTIAL," "NTK" — Need to Know); and internal tool and system names referenced in security training (internal vulnerability management platform, internal ticketing system, internal SOAR playbook names). The organisation-specific overlay is uploaded as a GlossCap custom glossary alongside the framework base layer, covering both the shared industry vocabulary and the organisation-specific security vocabulary in a single captioning run.
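The base-plus-overlay structure can be sketched as two dictionaries merged so that organisation-specific entries win on conflict. This is a minimal illustration under stated assumptions: the dictionary format and the functions `merge_glossaries` and `apply_glossary` are hypothetical and do not describe GlossCap's actual glossary format or API.

```python
import re


def merge_glossaries(base: dict, overlay: dict) -> dict:
    """Merge two {spoken form: canonical form} maps; overlay entries take precedence."""
    merged = {key.lower(): value for key, value in base.items()}
    merged.update({key.lower(): value for key, value in overlay.items()})
    return merged


def apply_glossary(caption: str, glossary: dict) -> str:
    """Replace whole-word glossary keys in the caption with their canonical forms."""
    if not glossary:
        return caption
    # Longest keys first so multi-word terms beat their single-word prefixes.
    keys = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: glossary[m.group(1).lower()], caption)


base_layer = {"siem": "SIEM", "edr": "EDR"}         # shared industry terms
org_overlay = {"red cell": "Red Cell", "ntk": "NTK"}  # hypothetical internal terms
glossary = merge_glossaries(base_layer, org_overlay)
print(apply_glossary("escalate to red cell via the siem", glossary))
```

Letting the overlay take precedence means an organisation can override a shared entry, for example to prefer its own capitalisation of a tool name, without editing the base layer.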
FAQ — cybersecurity training captions
Does HIPAA require captions on security awareness training for hospital employees?
HIPAA does not directly mandate caption formats for training video: the HIPAA Security Rule at 45 CFR § 164.308(a)(5) requires a security awareness and training program but does not specify the accessibility format of training delivery. The captioning obligation for hospital employees' security awareness training comes from ADA Title I (42 U.S.C. § 12112): hospitals with 15+ employees, which means essentially every hospital in the United States, must provide reasonable accommodation to hearing-impaired employees for mandatory training. Security awareness training is mandatory (required for all workforce members under the HIPAA Security Rule and typically delivered annually), so a hearing-impaired hospital employee who cannot access it through accurate captions may have a Title I accommodation claim. In California, FEHA applies at 5+ employees, which brings effectively all California hospitals within scope. The practical result: US hospitals should treat accurately captioned security awareness training video as a baseline requirement. The training content's HIPAA-specific vocabulary (ePHI, BAA, covered entity, OCR, minimum necessary, 45 CFR § 164.308) plus the cybersecurity vocabulary (phishing, MFA, SIEM, incident response) creates a dual-vocabulary captioning challenge that generic STT handles poorly.
How does CMMC 2.0 training affect DoD contractor captioning obligations?
CMMC 2.0 (32 CFR Part 170) imposes cybersecurity training obligations on DoD contractors handling CUI through the Level 2 AT domain practices: AT.L2-3.2.1 requires security awareness training for all personnel, typically delivered annually, and AT.L2-3.2.2 requires role-based security training for personnel with privileged system responsibilities. The captioning obligation for CMMC training comes from ADA Title I: DoD prime contractors and subcontractors in the Defense Industrial Base range from Fortune 500 aerospace companies to 50-person engineering firms, and the ADA Title I threshold (15 employees) covers nearly all of them. The training must be accessible to hearing-impaired employees. The CMMC vocabulary challenge is particularly acute because CMMC 2.0 training narrates CUI handling procedures, system security plan vocabulary, C3PAO assessment vocabulary, and NIST SP 800-171/800-172 control references, all of which are highly specific to the DoD compliance framework and essentially absent from general STT training data. For Level 2 CMMC assessments, the C3PAO (CMMC Third Party Assessment Organization) will review the organisation's training program documentation, and an organisation that has deployed mandatory security awareness training in an inaccessible format (inaccurate captions) may have a gap in both its CMMC AT domain practice evidence and its ADA Title I accommodation obligation.
How does MITRE ATT&CK technique ID vocabulary create specific STT failures?
MITRE ATT&CK technique IDs fail in STT at three distinct levels. The first is the framework name itself: "MITRE ATT&CK" contains an ampersand embedded in an all-caps acronym. The canonical written form is "ATT&CK", but narrators pronounce it as "attack" (an ordinary two-syllable word), so STT output reads "MITRE attack" rather than "MITRE ATT&CK": speech carries no acoustic cue for the "&" character, and generic STT models do not restore it. The second level is the technique ID format: "T1566.001" narrated as "T-fifteen-sixty-six-point-zero-zero-one" produces formatting variants ("T 1566.001," "T-fifteen sixty-six zero-zero-one," "T1566.1") that obscure the sub-technique designation (the ".001" distinguishes spear-phishing-attachment from spear-phishing-link at ".002"). The third level is the tactic/technique co-reference: a training module that describes "Initial Access techniques including T1566 Phishing and its sub-techniques T1566.001 Spearphishing Attachment and T1566.002 Spearphishing Link" contains three ATT&CK identifiers that must be formatted consistently, yet STT may produce a different format for each occurrence. For a SOC analyst training module that covers 20 ATT&CK techniques with their tactic context, this inconsistency can yield a caption track in which the 20 technique IDs are each formatted differently. The training record becomes unsearchable against ATT&CK navigator data, and the learner cannot reliably locate the referenced techniques in the framework documentation.
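The first two failure levels are regular enough to repair with pattern matching once the digits have survived transcription. The sketch below restores the framework name and collapses spacing and hyphen variants of technique IDs. It rests on two labelled assumptions: the function name `normalise_attack_refs` is hypothetical, and a short sub-technique suffix such as ".1" is assumed to be a truncated three-digit suffix (".001"). Fully spelled-out forms such as "T-fifteen sixty-six" are not handled.

```python
import re

_FRAMEWORK = re.compile(r"\bMITRE\s+attack\b", re.IGNORECASE)
# "T", an optional space or hyphen, four digits, then an optional ".N" to ".NNN".
_TECHNIQUE = re.compile(r"\bT[\s-]?(\d{4})(?:\.(\d{1,3}))?\b")


def _format_technique(match: re.Match) -> str:
    technique = "T" + match.group(1)
    sub = match.group(2)
    if sub is None:
        return technique
    # Assumption: a short suffix lost its leading zeros in transcription.
    return f"{technique}.{sub.zfill(3)}"


def normalise_attack_refs(caption: str) -> str:
    """Restore 'MITRE ATT&CK' and normalise technique IDs to T1234 or T1234.001."""
    caption = _FRAMEWORK.sub("MITRE ATT&CK", caption)
    return _TECHNIQUE.sub(_format_technique, caption)


print(normalise_attack_refs("MITRE attack technique T 1566.001 and T-1566.2"))
```

Applying one normaliser to every occurrence also addresses the third level, since all references in a caption track end up in the same format.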
What LMS do large enterprises use for KnowBe4 KMSAT or Proofpoint SAT training delivery?
Large enterprises (10,000+ employees) commonly integrate KnowBe4 KMSAT or Proofpoint SAT with their primary enterprise LMS for compliance record-keeping — so that security awareness training completions appear in the same system as HIPAA, Code of Conduct, and other mandatory training records. The most common integration patterns are: (1) SCORM/xAPI export from KnowBe4 or Proofpoint SAT imported into Cornerstone OnDemand, Workday Learning, or SAP SuccessFactors — the security platform provides the training content, the enterprise LMS provides the completion record and reporting; (2) Single sign-on (SSO) integration so employees access KnowBe4 or Proofpoint SAT from the enterprise LMS without a separate login, with completion data synced back to the LMS via LTI or API; or (3) Direct completion reporting via LRS (Learning Record Store) using xAPI statements from the security platform to the enterprise LRS (Watershed, Learning Locker) integrated with the LMS reporting layer. In all three integration patterns, the caption file must be managed within the security awareness platform's content layer (KnowBe4 module configuration or Proofpoint SAT course upload) — caption files do not transfer through SCORM/LTI integrations into the enterprise LMS. The practical workflow: accurate captions must be produced for the source content in the security awareness platform, not in the enterprise LMS.
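Integration pattern (3) can be pictured through the kind of xAPI statement a security awareness platform might send to the LRS on completion. The sketch follows the actor/verb/object layout of an xAPI statement and uses the ADL "completed" verb; the email address, activity URL, and the function name `completion_statement` are hypothetical. The statement carries completion data only and has no field for a caption track, which is consistent with the point that captions must be corrected in the source platform.

```python
import json


def completion_statement(email: str, activity_id: str, activity_name: str) -> dict:
    """Build a minimal xAPI 'completed' statement as a plain dictionary."""
    return {
        "actor": {"objectType": "Agent", "mbox": f"mailto:{email}"},
        "verb": {
            "id": "http://adlnet.gov/expapi/verbs/completed",
            "display": {"en-US": "completed"},
        },
        "object": {
            "objectType": "Activity",
            "id": activity_id,  # hypothetical activity IRI
            "definition": {"name": {"en-US": activity_name}},
        },
    }


statement = completion_statement(
    "analyst@example.com",
    "https://lms.example.com/activities/security-awareness-2025",
    "Annual Security Awareness Training",
)
print(json.dumps(statement, indent=2))
```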
How does security training vocabulary change between security-aware (all employees) and security-privileged (SOC/CSIRT/DevSecOps) training tracks?
Security awareness training has two distinct vocabulary profiles that correspond to its two target audiences. The general-population awareness track (all employees) uses English-first vocabulary with minimal technical jargon: phishing, malware, ransomware, MFA, social engineering, data handling, incident reporting. The vocabulary challenge in this track is the security-domain acronyms (BEC, MFA/TOTP/FIDO2, DLP, SOAR, SIEM) that appear in general-awareness language but are described in ways that non-technical employees can understand — the STT failures are the acronyms and a small set of technical proper nouns (phishing variant names, malware family names). The security-privileged technical track (SOC analysts, CSIRT engineers, penetration testers, DevSecOps engineers, threat-intel analysts) carries the full technical vocabulary: ATT&CK technique IDs and tactic codes, SIEM query language vocabulary (Splunk SPL, KQL, AQL), IOC (Indicator of Compromise) types (IP address, domain, hash value, YARA rule), threat-hunting vocabulary (hypothesis-driven hunting, TTPs, IOA vs. IOC), malware analysis vocabulary (static analysis, dynamic analysis, sandbox detonation, C2 beaconing, lateral movement indicators), penetration testing vocabulary (payload, shellcode, privilege escalation, lateral movement, C2 framework, post-exploitation), and SOAR playbook vocabulary (trigger, action, enrichment, containment, eradication). The technical track has 5-10× the STT vocabulary failure density of the general-population awareness track because every sentence contains multiple domain-specific terms — SIEM query names, ATT&CK technique references, tool names, threat-actor designations — that are each individually out-of-distribution for generic STT. 
For hearing-impaired SOC analysts or security engineers who rely on accurate captions for their role-specific training, the general-population STT failure mode is a nuisance; the technical-track STT failure mode is a substantive barrier to training comprehension.