Engineering · Published 2026-06-21
Caption API automation for L&D teams: webhook-driven workflows, batch processing, and eliminating manual caption uploads
There is a specific workflow tax buried inside most caption compliance programmes that nobody budgets for. An L&D operations engineer or instructional designer uploads a video to the LMS. Then they open a browser tab, navigate to the captioning vendor's portal, upload the same video file again, wait for the transcript, download the SRT file, navigate back to the LMS, find the video asset, navigate to the captions panel, and upload the SRT. Then they verify the timing. If the glossary has been updated since the last run, they re-request the file. If the LMS expects VTT rather than SRT, they run a conversion tool first. When the content library grows past 200 active videos per quarter, this workflow consumes a fraction of a full-time employee. The hidden half-FTE cost analysis established this at 4× real-time for correction labour; the upload-coordination overhead sits on top of that and is rarely tracked because it is distributed across multiple team members' time.
The architectural solution to this problem is not faster humans — it is an event-driven pipeline that triggers caption generation automatically when a video asset is published, applies the correct glossary without manual selection, polls the caption API until the job completes, retrieves the output file, and delivers it to the LMS caption ingestion endpoint without a human in the loop. The components of that pipeline — a webhook consumer, a job queue, an API client, a retry handler, and an LMS delivery function — are individually straightforward. The engineering complexity is in the integration points: each video host fires webhooks in a different schema, each caption API has a different job lifecycle model, and each LMS has a different caption ingestion surface with different format and authentication requirements.
This post covers the full engineering workflow for caption automation, from the video-upload event that starts the pipeline to the verified SRT file attached to the LMS asset. It is the companion post to the LMS caption ingestion workflow engineering guide, which covers bulk retrofit operations; this post covers the ongoing production pipeline for net-new content. It assumes familiarity with basic REST API patterns, webhook handling, and asynchronous job queues. The LMS platforms covered in depth are Kaltura, Cornerstone OnDemand, and Workday Learning — the three platforms with the deepest programmatic caption delivery surfaces in the enterprise L&D market. It also covers the video hosts most commonly found alongside enterprise LMS platforms: Panopto, Vimeo, and Wistia.
A word on scope: this post covers the plumbing, not the captioning itself. For vocabulary accuracy design — how to build glossaries that reduce ASR failure rates on technical and proper-noun content — see the customer glossary architecture post. For what happens after the caption file is delivered and you need to verify it meets DCMP accuracy standards — see the QA methodology post. This post is about getting the right caption file to the right LMS asset automatically, every time, without anyone touching a browser.
TL;DR — five components of an end-to-end caption automation pipeline
- Webhook consumer. Your video host (Kaltura, Panopto, Vimeo, Wistia) fires a webhook when a video is published or reaches a ready state. Your webhook consumer validates the event signature, extracts the video asset ID and metadata, and enqueues a caption job request. The consumer must respond with HTTP 200 within 5 seconds; all heavy processing belongs in the queue worker, not the consumer handler.
- Caption API job creation. The queue worker calls the caption API at POST /v1/captions with the video source URL, the glossary_id matching the content category (engineering, healthcare, sales), and any priority or turnaround parameters. The API returns a job_id that the worker persists to a status store for polling.
- Polling with exponential backoff. The worker polls GET /v1/jobs/{job_id} on an exponential backoff schedule (typical intervals are 30s, 60s, 120s, 240s) until the job reaches a terminal state (completed, failed, or cancelled). Jobs that remain in a non-terminal state after 4 hours should be dead-lettered and alerted. Average job completion for a 10-minute video is 8–14 minutes depending on queue depth and turnaround tier.
- SRT/VTT retrieval and normalisation. On job completion, the worker retrieves the caption file at GET /v1/jobs/{job_id}/output?format=srt (or vtt). Before delivery, validate: UTF-8 without BOM, correct timing separator, non-empty cue count, and that the file's total duration covers the video duration ±2 seconds. Files failing validation should be flagged for human review, not silently delivered.
- LMS delivery. The worker calls the LMS caption ingestion endpoint with the retrieved file and the LMS asset ID resolved from the original webhook payload. Kaltura uses the caption_captionasset service. Cornerstone OnDemand uses its Learning Object Media API. Workday uses its Learning Catalog API. Each requires different authentication (session token, API key, OAuth) and has different idempotency guarantees, so duplicate delivery handling must be explicit in your worker logic.
Why manual caption upload fails at scale
The manual upload workflow is often introduced as a temporary measure — "we'll automate it later" — and then calcifies into the permanent approach because the immediate pain is distributed across multiple people and no single stakeholder owns the total cost. An instructional designer who spends 12 minutes per video on caption coordination does not experience this as a significant burden on any individual video. Across 40 videos per month, it is 480 minutes — 8 hours — of work that does not appear in any project plan as "caption administration." It appears as friction in every project, as latency between video delivery and LMS publication, and as the primary reason caption uploads are deferred when a project is under deadline pressure. When caption uploads are deferred, the video publishes without captions and the caption compliance programme's coverage metric quietly degrades.
Scale amplifies every problem in the manual workflow. Glossary selection errors — submitting a video to the captioning vendor without specifying the correct glossary — are common when the submission is done manually and the person doing it was not involved in building the glossary. The caption feedback loop post documents how six months of glossary compounding can move accuracy from 91% to 99% on technical content. A glossary selection error resets that compounding to zero for that video: the vendor receives the file, runs ASR without the benefit of the engineering or healthcare vocabulary list, and delivers a transcript at the baseline accuracy level. The error is often not caught until a learner reports it or a QA audit surfaces it, by which time the miscaptioned video has been live for weeks.
The LMS delivery step introduces a second category of manual error: format mismatch. TalentLMS's subtitle API accepts SRT with comma decimal separators but fails silently on VTT-style period separators even when the file extension is .srt. Docebo's subtitle track API requires BCP-47 language tags (en-US, not en) and rejects bare ISO 639-1 codes at the track creation endpoint. Kaltura's player renders UTF-8-with-BOM SRT files with a corrupted first cue because the BOM prefix is parsed as caption text. These format normalisation failures are well-documented in the LMS caption ingestion engineering guide, but they recur in the manual workflow because the person uploading the file has no automated pre-flight check that validates format before delivery. In an automated pipeline, the normalisation and validation step runs every time as an invariant, not as a step that gets skipped when someone is in a hurry.
The third failure mode of the manual workflow is audit gap. A caption compliance programme that produces captions manually has a sparse, inconsistent audit trail. The accessibility coordinator can check whether a caption track exists on a given video asset, but often cannot answer: which glossary was used, what accuracy level was self-reported by the vendor, whether the delivered file has been validated against the video duration, or when exactly the caption track was attached relative to the video's first publication date. An automated pipeline logs all of these data points as structured events at the moment each step completes. That log becomes the evidentiary record for any accessibility audit, OCR investigation, or ADA Title II compliance review — and it requires no additional work from the accessibility coordinator to produce.
See also: the annual review process, which covers how to use the pipeline's audit log as one of the six annual review inputs; and the compliance KPI reporting post, which covers how to derive the coverage rate, accuracy rate, and delivery latency metrics from the same event log.
Architecture choices: event-driven vs batch-scheduled
Before building a caption automation pipeline, you need to choose between two architectural patterns: event-driven and batch-scheduled. The right choice depends on the volume of net-new content, the availability of webhook support from your video host, and the acceptable latency between video publication and caption delivery.
Event-driven (webhook-triggered)
In the event-driven pattern, the video host fires a webhook when a video asset transitions to a ready or published state. Your webhook consumer receives the event, validates it, and enqueues a caption job. The caption job runs asynchronously, and the LMS delivery step fires when the job completes. End-to-end latency from video publication to caption delivery is typically 10–20 minutes for a 10-minute video: 1–2 minutes of queue overhead, 8–14 minutes of ASR processing, and 1–2 minutes of LMS delivery. This is fast enough that captions can be attached before most learners access a newly published video.
Event-driven is the preferred architecture for organisations publishing more than 10 videos per week. It eliminates the "polling window" problem — the batch-scheduled approach checks for new videos at fixed intervals, which means a video published at 10:01 AM may not get captions until the next batch at 11:00 AM. The event-driven approach processes the video within seconds of publication. The tradeoff is infrastructure complexity: you need a reliable webhook endpoint, message queue, worker pool, and dead-letter handling. For organisations running on AWS, this maps cleanly to API Gateway → SQS → Lambda → DynamoDB for status. On GCP, it maps to Cloud Run → Pub/Sub → Cloud Functions. On Azure, it maps to API Management → Service Bus → Azure Functions.
Batch-scheduled
In the batch-scheduled pattern, a cron job runs at fixed intervals (hourly, every 4 hours, or nightly) and queries the video host's API for assets published since the last run. Each new asset without an existing caption track is submitted to the caption API as a job. Completed jobs are polled in a subsequent run (or in the same run if the batch window is long enough). This pattern has lower infrastructure complexity — a scheduled script or cron Lambda replaces the webhook consumer and message queue — but introduces latency and requires idempotent state tracking to avoid submitting the same video twice across batch runs.
Batch-scheduled is appropriate for organisations publishing fewer than 10 videos per week or for backlog remediation work where real-time delivery is not required. For backlog remediation specifically — submitting a catalogue of 2,000 existing videos to the caption API — the batch pattern is preferred because it allows rate limiting and throughput control that a pure event-driven system would not provide out of the box. For backlog remediation architecture, see the backlog remediation playbook.
Hybrid: event-driven for new content, batch for backlog
Most production deployments use a hybrid approach. The event-driven pipeline handles all net-new content published from the day the automation goes live. A separate batch remediation pipeline runs on a scheduled basis to work through the existing back catalogue. The two pipelines share the same caption API client, glossary resolution logic, LMS delivery functions, and status store — they differ only in the ingestion mechanism (webhook vs cron query) and the throughput controls (new-content pipeline runs at full speed; backlog pipeline is rate-limited to avoid overloading vendor capacity). Both pipelines write to the same structured event log, so the accessibility coordinator has a single view of caption status across the entire content library regardless of which pipeline produced the caption.
Webhook integration: video host side
Each video host has a different webhook implementation. The common thread is that all of them fire an HTTP POST to your webhook endpoint when a video reaches a specified lifecycle state, and all of them include some form of signature header that you should validate before processing the payload. The specific event names, payload schemas, and signature mechanisms differ enough that a production implementation typically has a thin adapter layer per video host that normalises the incoming payload to a common internal event schema before enqueuing.
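To make the adapter layer concrete, here is a minimal sketch of a common internal event plus two per-host adapters. The VideoReadyEvent class, its field names, and the adapter function names are hypothetical. The Panopto and Wistia payload fields follow the descriptions later in this post, and the shape of each Wistia asset entry (a "type" and "url" key, with the original upload tagged "OriginalFile") is an assumption to check against a real payload.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class VideoReadyEvent:
    """Hypothetical internal event schema shared by all host adapters."""
    host: str
    asset_id: str
    title: str
    duration_seconds: Optional[float]
    source_url: Optional[str]  # None when a follow-up API call is needed


def _to_float(value: Any) -> Optional[float]:
    return float(value) if value is not None else None


def from_panopto(payload: Mapping[str, Any]) -> VideoReadyEvent:
    # The source URL is not in the SessionReady payload, so it stays None
    # here and is resolved later via the sessions API.
    return VideoReadyEvent(
        host="panopto",
        asset_id=str(payload["Id"]),
        title=str(payload.get("Name", "")),
        duration_seconds=_to_float(payload.get("Duration")),
        source_url=None,
    )


def from_wistia(payload: Mapping[str, Any]) -> VideoReadyEvent:
    media = payload["media"]
    # Assumption: each asset is a dict with "type" and "url" keys.
    assets = media.get("assets", [])
    original = next((a for a in assets if a.get("type") == "OriginalFile"), None)
    return VideoReadyEvent(
        host="wistia",
        asset_id=str(media["hashed_id"]),
        title=str(media.get("name", "")),
        duration_seconds=_to_float(media.get("duration")),
        source_url=original["url"] if original else None,
    )
```

Everything downstream of the consumer (queue messages, glossary resolution, delivery) can then depend on VideoReadyEvent alone, so adding a new host means adding one adapter function.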
Kaltura
Kaltura's event notification system (EventNotificationTemplate) supports both HTTP push notifications and email notifications. For automation, you want HTTP push. Configure a notification template in the Kaltura Management Console (KMC) under Settings → Event Notifications. Select "HTTP Notification" as the type, set your endpoint URL, and select the entry event(s) you want to trigger on. The relevant events for caption automation are ENTRY_ADDED (fires when an entry is created), ENTRY_READY (fires when media conversion completes and the entry is playable), and ENTRY_UPDATE (fires on metadata changes — useful if you want to trigger re-captioning when content is updated).
Kaltura's HTTP notification payload is form-encoded (not JSON) by default. The ENTRY_READY payload includes the entry:id (the Kaltura entry ID), entry:name, entry:mediaType, and entry:duration. You need the entry ID to look up the media source URL and to deliver the completed caption asset via the caption_captionasset service. The notification does not include a signature header in the standard configuration; Kaltura recommends IP allowlisting or adding a shared secret as a custom POST parameter that your consumer validates. For tighter security, place the webhook endpoint behind an API gateway that validates a JWT, and configure the Kaltura notification URL to carry that JWT as a URL parameter.
For a detailed reference on the Kaltura caption API surfaces used in the delivery step, see the Kaltura captions reference.
Panopto
Panopto's webhook support is available in Panopto Cloud instances running version 7.0 or later and in on-premises instances running version 6.1 or later. Configure webhooks under System → Webhooks in the Panopto admin console. Panopto supports the SessionReady event, which fires when a recording is fully processed and available for viewing — this is the correct trigger for caption automation. The payload is JSON and includes the Id (session GUID), Name, Duration, and Urls.EmbedUrl fields.
Panopto signs webhook requests with an HMAC-SHA256 signature in the X-Panopto-Webhook-Signature header, computed from the raw request body and a shared secret that you configure in the admin console. Validate this signature on every incoming request — reject with HTTP 400 any request where the computed signature does not match the header value. Panopto retries failed webhook deliveries (non-200 responses) up to 5 times with exponential backoff over 24 hours.
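A minimal sketch of that signature check, using only the Python standard library. The function name is hypothetical, and the assumption that the header carries a lowercase hex digest should be verified against the host's documentation, since some services send base64 or a prefixed value instead.

```python
import hashlib
import hmac


def signature_is_valid(raw_body: bytes, header_value: str, shared_secret: str) -> bool:
    """Return True when header_value matches HMAC-SHA256(secret, raw_body).

    Assumption: the header value is a hex digest of the raw request body.
    """
    expected = hmac.new(
        shared_secret.encode("utf-8"), raw_body, hashlib.sha256
    ).hexdigest()
    supplied = header_value.strip().lower()
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature were correct.
    return hmac.compare_digest(expected.encode("ascii"), supplied.encode("utf-8"))
```

The check must run against the raw bytes of the request body as received. Parsing the JSON and re-serialising it before hashing will usually change whitespace or key order and make valid requests fail.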
A specific consideration for Panopto: the session's video source URL for submission to the caption API is not directly in the webhook payload. You need to call GET /Panopto/api/v1/sessions/{id} to retrieve the Urls.VideoUrl or extract the media URL from the session's delivery manifest. This adds one API call between webhook receipt and caption job creation. Build this into your consumer handler (or the queue worker's pre-processing step) rather than trying to derive the URL from the webhook payload alone.
Vimeo
Vimeo's webhooks are configured through the Vimeo Developer API. Create a webhook subscription by sending a POST to https://api.vimeo.com/webhooks with a JSON body specifying the url (your endpoint) and the list of events to subscribe to, authenticated with your OAuth access token in the Authorization header. For caption automation, subscribe to the video.transcode.complete event, which fires when Vimeo has finished transcoding the uploaded video and made it available for playback. The payload includes the resource_key (Vimeo video ID), the video uri, and metadata fields. Vimeo signs events with an HMAC-SHA256 signature in the X-Vimeo-Signature header.
Vimeo has a specific consideration for text track delivery: its text track API (POST /videos/{video_id}/texttracks) accepts VTT only — not SRT. If your caption API delivers SRT as its primary output format, you need a format conversion step between retrieval and Vimeo delivery. The conversion is straightforward (replace the timing separator and add the WEBVTT header), but it must be in your pipeline — submitting an SRT file to Vimeo's text track API returns a format error. See the VTT format reference for specification details relevant to Vimeo delivery.
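The conversion described above can be sketched in a few lines. This is a minimal version that assumes well-formed SRT input; it rewrites only the timing lines, so commas inside caption text are left alone, and any styling tags pass through untouched. The function name is hypothetical.

```python
import re

# Matches an SRT timing line such as "00:00:03,180 --> 00:00:07,240".
_SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})(.*)$"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT: add the header, use '.' in timing lines."""
    # Drop a leading BOM and normalise line endings before processing.
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    out = ["WEBVTT", ""]
    for line in text.strip("\n").split("\n"):
        match = _SRT_TIMING.match(line)
        if match:
            start, start_ms, end, end_ms, rest = match.groups()
            line = f"{start}.{start_ms} --> {end}.{end_ms}{rest}"
        out.append(line)
    return "\n".join(out) + "\n"
```

The numeric SRT cue counters are kept, since WebVTT allows an optional identifier line before each cue's timing line.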
Wistia
Wistia's webhooks are configured in Account Settings → Integrations → Webhooks. Select the events you want and provide your endpoint URL. For caption automation, subscribe to the media.ready event, which fires when a video is processed and accessible via the Wistia API. The JSON payload includes the media.hashed_id (Wistia's media identifier), the media.name, media.duration, and the media.assets array containing CDN URLs for the video file. The original source URL in media.assets is the appropriate source for caption API submission — it is a direct MP4 download link that the caption API can fetch without authentication.
Wistia does not sign webhook requests as of its current API version. Use a shared-secret URL parameter approach: configure your Wistia webhook endpoint with a secret token in the query string (https://your-pipeline.example.com/webhooks/wistia?secret=your_secret_here) and validate that parameter in your consumer. Because Wistia's media assets include a direct MP4 URL, the Wistia integration can be fully resolved at webhook receipt time — no additional API call is needed to get the source URL for caption job creation, unlike Panopto.
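A minimal sketch of the shared-secret check for unsigned webhooks. The function name is hypothetical and the parameter name "secret" matches the example URL above. Keep in mind that query strings tend to appear in access logs, so treat this as weaker than a signed request and rotate the secret periodically.

```python
import hmac
from urllib.parse import parse_qs, urlsplit


def has_valid_secret(request_url: str, expected_secret: str) -> bool:
    """Check the 'secret' query parameter against the configured value."""
    if not expected_secret:
        # Refuse to validate against an empty or unset secret.
        return False
    query = parse_qs(urlsplit(request_url).query)
    supplied = query.get("secret", [""])[0]
    # Constant-time comparison, as with signature checks.
    return hmac.compare_digest(
        supplied.encode("utf-8"), expected_secret.encode("utf-8")
    )
```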
LMS-native upload events
Some LMS platforms expose their own event hooks for video upload events, independent of an external video host. Docebo's Event Manager allows you to create automations triggered by course content events. Kaltura's MediaSpace application can be configured to fire notifications on video upload through the MediaSpace configuration panel, separate from the Kaltura platform notifications. For organisations where video assets are uploaded directly to the LMS rather than to a standalone video host, these LMS-native hooks may be the appropriate trigger point.
However, LMS-native hooks typically fire earlier in the asset lifecycle than video-host-native webhooks — they fire when the asset is uploaded, not when it is fully transcoded and playable. If you submit the video source URL to the caption API before transcoding is complete, the caption API may receive a non-playable asset and fail the job. For LMS-native trigger points, add a readiness check: after receiving the upload event, poll the LMS asset status endpoint until the asset is in a playable state before submitting the caption job. This adds latency (typically 2–8 minutes for LMS transcoding) but prevents the more expensive path of a failed caption job and manual resubmission.
Caption API job creation
Once your webhook consumer has received and validated a video-ready event, the next step is creating a caption job via the API. The GlossCap caption API follows REST conventions and uses JWT authentication. All requests include an Authorization: Bearer {token} header. Tokens are obtained via POST /v1/auth/token with your API key and expire after 24 hours. Implement token caching in your pipeline — requesting a new token for each caption job submission is wasteful and will hit rate limits on high-volume pipelines.
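Token caching can be as small as the sketch below. The TokenCache class is hypothetical, and fetch_token stands in for whatever function performs POST /v1/auth/token in your client; it is assumed to return the token and its lifetime in seconds. The five-minute refresh margin is an arbitrary default.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches a bearer token and refreshes it shortly before expiry."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],
        refresh_margin_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token  # hypothetical: returns (token, lifetime_s)
        self._margin = refresh_margin_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Refresh when there is no token yet or it is inside the margin.
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch_token()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

This version is not thread-safe. In a multi-threaded worker, wrap get() in a lock; in a serverless setup, keep the cache at module scope so warm invocations reuse it.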
POST /v1/captions — request structure
The job creation endpoint accepts a JSON body with the following fields:
POST /v1/captions HTTP/1.1
Host: api.glosscap.com
Authorization: Bearer {token}
Content-Type: application/json

{
  "source_url": "https://cdn.example.com/videos/onboarding-module-12.mp4",
  "glossary_id": "gloss_eng_2026",
  "language": "en-US",
  "output_formats": ["srt", "vtt"],
  "priority": "standard",
  "metadata": {
    "lms_asset_id": "ka_entry_abc123",
    "content_category": "engineering_onboarding",
    "department": "engineering",
    "upload_event_id": "evt_wh_789xyz"
  }
}
The source_url must be a publicly accessible URL that the caption API's media fetcher can download. Authenticated URLs (requiring cookies or a short-lived signed token) require either passing the authentication headers separately or pre-downloading the file and uploading it to the API as multipart/form-data at POST /v1/captions/upload. For most video hosts — Vimeo, Wistia, Kaltura with public-share links — the URL in the webhook payload is directly accessible. For Panopto with access controls or private Vimeo videos, use the upload endpoint instead.
The glossary_id parameter is the most important accuracy lever in the automation pipeline. Without it, the ASR engine runs with its default vocabulary, which performs at 91–94% accuracy on general-English content but falls to 73–84% on technical, healthcare, and compliance vocabulary. With the correct glossary applied, accuracy on those same technical content types rises to 97–99%. The glossary architecture post covers how to design per-department and per-content-category glossaries; the automation pipeline's job is to resolve the correct glossary_id for each video at submission time.
Glossary resolution logic
Glossary resolution — mapping a given video to its correct glossary_id — is typically done by reading metadata from the webhook payload or from the LMS asset record. Common resolution strategies:
- By folder or channel: Kaltura media is organised into categories; Panopto content is organised into folders. If your video host has a folder/category structure that maps to content type (e.g., the "Engineering Training" Kaltura category always uses the engineering glossary), resolve the glossary by category. This requires maintaining a mapping table from category ID to glossary ID, which typically lives in a configuration file or database table alongside your pipeline code.
- By LMS course membership: If the video host webhook payload includes the LMS course ID or curriculum tag, look up the course's department or content classification in the LMS and map that to a glossary. This provides the most accurate resolution because it uses the same content taxonomy that the L&D team already maintains — but it requires an additional API call to the LMS to look up the course metadata.
- By uploader or team: If your video host tracks the uploading user's team membership (Panopto can be configured to show the creator's AD group; Kaltura user metadata is accessible via the KS), resolve the glossary by the uploader's department. This is the fastest resolution strategy and requires no additional API calls, but it fails when a central L&D team uploads content for multiple departments.
- Default fallback: If none of the above signals are available, fall back to a general English glossary that covers the most common proper-noun failures across all departments — product names, executive names, acronyms. This is worse than a domain-specific glossary but better than no glossary at all.
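The four strategies above can be chained into a single resolver. This is a sketch under assumptions: the mapping tables, their keys, and every glossary ID other than gloss_eng_2026 are hypothetical, and the precedence order (category, then course department, then uploader team) is one reasonable choice rather than a rule.

```python
from typing import Optional

# Hypothetical mapping tables; in practice these would live in a
# configuration file or database table alongside the pipeline code.
CATEGORY_TO_GLOSSARY = {"cat_engineering_training": "gloss_eng_2026"}
DEPARTMENT_TO_GLOSSARY = {
    "engineering": "gloss_eng_2026",
    "healthcare": "gloss_health_2026",  # hypothetical ID
}
TEAM_TO_GLOSSARY = {"eng-enablement": "gloss_eng_2026"}
DEFAULT_GLOSSARY = "gloss_general_en"  # hypothetical general-English glossary


def resolve_glossary_id(
    category_id: Optional[str] = None,
    course_department: Optional[str] = None,
    uploader_team: Optional[str] = None,
) -> str:
    """Try each signal in order and fall back to the general glossary."""
    candidates = (
        (CATEGORY_TO_GLOSSARY, category_id),
        (DEPARTMENT_TO_GLOSSARY, course_department),
        (TEAM_TO_GLOSSARY, uploader_team),
    )
    for table, key in candidates:
        if key is not None and key in table:
            return table[key]
    return DEFAULT_GLOSSARY
```

Logging which branch produced the result is worth the extra line in production, because a rising share of default-fallback resolutions is an early signal that the mapping tables have drifted from the content taxonomy.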
Priority and turnaround options
The priority field accepts standard, priority, or urgent. Standard turnaround for a 10-minute video is 8–14 minutes on the GlossCap platform. Priority reduces this to 4–8 minutes. Urgent is a best-effort expedited path that targets under 4 minutes. Priority and urgent tiers carry higher per-minute costs, so the decision to use them should be based on the content category's urgency, not applied uniformly across all content. For most ongoing production pipelines, standard is the correct default. For time-sensitive content — benefits enrollment video with a 48-hour publication window, or compliance training tied to a regulatory deadline — use priority.
The API response to a successful job creation request is HTTP 202 Accepted with a JSON body containing the job_id and the estimated completion time:
HTTP/1.1 202 Accepted
Content-Type: application/json

{
  "job_id": "job_a1b2c3d4e5f6",
  "status": "queued",
  "estimated_completion_at": "2026-06-21T14:32:00Z",
  "glossary_id": "gloss_eng_2026",
  "source_url": "https://cdn.example.com/videos/onboarding-module-12.mp4",
  "created_at": "2026-06-21T14:18:47Z"
}
Persist the job_id to your status store immediately after receipt. Your polling worker needs this ID to check job status. If the queue worker that created the job crashes between job creation and ID persistence, you will lose the ID and the job will complete without triggering the LMS delivery step, which means a video that appeared to be "in progress" silently fails to get captions. Use a transactional write or at-least-once delivery guarantee for the status store update.
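One way to narrow that crash window is a write-ahead pattern: record the intent to submit before calling the caption API, then fill in the job_id afterwards. The sketch below uses SQLite purely for illustration; the table name, column names, and status values are hypothetical, and a production pipeline would more likely use DynamoDB or a shared relational database.

```python
import sqlite3


def open_status_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS caption_jobs (
               upload_event_id TEXT PRIMARY KEY,
               lms_asset_id    TEXT NOT NULL,
               job_id          TEXT,
               status          TEXT NOT NULL
           )"""
    )
    return conn


def claim_event(conn: sqlite3.Connection, upload_event_id: str, lms_asset_id: str) -> bool:
    """Record intent before calling the caption API.

    Returns False when the event was already claimed (duplicate webhook).
    """
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT OR IGNORE INTO caption_jobs VALUES (?, ?, NULL, 'submitting')",
            (upload_event_id, lms_asset_id),
        )
    return cur.rowcount == 1


def record_job_id(conn: sqlite3.Connection, upload_event_id: str, job_id: str) -> None:
    with conn:
        conn.execute(
            "UPDATE caption_jobs SET job_id = ?, status = 'queued' "
            "WHERE upload_event_id = ?",
            (job_id, upload_event_id),
        )
```

A row left in 'submitting' with no job_id for more than a few minutes marks a worker that died mid-submission. A periodic sweep over such rows can alert or resubmit, and the primary key on the event ID doubles as duplicate-webhook protection.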
Polling for job status
Caption jobs are asynchronous — the initial job creation returns immediately with a queued status, and the actual ASR processing happens on the vendor's infrastructure. Your pipeline must poll the job status endpoint until the job reaches a terminal state. Poorly implemented polling is one of the most common sources of production incidents in caption automation pipelines: polling too frequently burns API rate limits; polling too infrequently means captions are delivered hours late; not handling the timeout case means jobs that fail on the vendor side silently remain in a "pending" state in your status store forever.
Polling endpoint and response structure
GET /v1/jobs/{job_id} HTTP/1.1
Host: api.glosscap.com
Authorization: Bearer {token}

HTTP/1.1 200 OK
Content-Type: application/json

{
  "job_id": "job_a1b2c3d4e5f6",
  "status": "processing",
  "progress_percent": 45,
  "source_url": "https://cdn.example.com/videos/onboarding-module-12.mp4",
  "glossary_id": "gloss_eng_2026",
  "language": "en-US",
  "created_at": "2026-06-21T14:18:47Z",
  "updated_at": "2026-06-21T14:23:12Z",
  "estimated_completion_at": "2026-06-21T14:32:00Z"
}
The status field takes one of the following values:
- queued: job is waiting for a processing slot
- processing: ASR and glossary application are in progress
- completed: caption file is ready for retrieval
- failed: job failed; check the error field for the reason
- cancelled: job was cancelled via DELETE /v1/jobs/{job_id}
Only completed, failed, and cancelled are terminal states. Poll until you reach one of these. Do not poll indefinitely on queued or processing — implement a wall-clock timeout (4 hours is a reasonable upper bound for any video under 3 hours of duration).
Exponential backoff schedule
The polling interval should start short and expand as time passes. An ASR job for a 10-minute video will not complete in the first 3 minutes — polling every 10 seconds for the first few minutes burns rate-limit budget on requests that will never return a terminal status. A reasonable default schedule:
Poll 1: after 2 minutes (most jobs under 5 min will complete by here)
Poll 2: after 4 minutes
Poll 3: after 8 minutes (cumulative: 14 minutes — most 10-min videos done)
Poll 4: after 12 minutes
Poll 5: after 20 minutes (cumulative: 46 minutes — catches slow-queue jobs)
Poll 6: after 30 minutes
Poll 7+: every 30 minutes until timeout at 4 hours
Add a ±15% jitter to each interval to prevent a fleet of workers that all started at the same time from polling simultaneously. In AWS SQS with Lambda, implement the polling loop using SQS delivery delays (message timers): the initial poll message is queued with a 2-minute delivery delay; if the job is not terminal, the worker enqueues a new poll message with the next interval's delay. This avoids the long-running Lambda anti-pattern where a single Lambda waits in a loop for 14 minutes. Note that SQS caps the delay on a single message at 900 seconds (15 minutes), so the 20- and 30-minute intervals need either two chained delayed messages or a scheduler with a longer horizon.
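The schedule and jitter can be expressed as a small pure function that the worker calls each time it needs to enqueue the next poll. This is a sketch: the function and constant names are hypothetical, and the intervals are the ones listed in the schedule above.

```python
import random
from typing import Optional

# Minutes to wait before each poll, measured from the previous poll.
POLL_INTERVALS_MINUTES = [2, 4, 8, 12, 20, 30]
STEADY_STATE_MINUTES = 30          # poll 7 onwards
TIMEOUT_SECONDS = 4 * 60 * 60      # wall-clock limit since job creation
JITTER_FRACTION = 0.15

_DEFAULT_RNG = random.Random()


def next_poll_delay(
    poll_number: int,
    elapsed_seconds: float,
    rng: Optional[random.Random] = None,
) -> Optional[float]:
    """Seconds to wait before poll `poll_number` (1-based), or None on timeout."""
    if elapsed_seconds >= TIMEOUT_SECONDS:
        return None  # caller should dead-letter the job and alert
    if poll_number <= len(POLL_INTERVALS_MINUTES):
        minutes = POLL_INTERVALS_MINUTES[poll_number - 1]
    else:
        minutes = STEADY_STATE_MINUTES
    jitter = (rng or _DEFAULT_RNG).uniform(-JITTER_FRACTION, JITTER_FRACTION)
    return minutes * 60 * (1 + jitter)
```

Keeping the function free of I/O makes the schedule easy to unit test, and passing a seeded random.Random makes the jitter reproducible in those tests.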
Timeout and dead-letter handling
A job that has not reached a terminal state after 4 hours should be moved to a dead-letter queue and an alert should fire. Do not silently drop it. The dead-letter record should include the job_id, the source video asset, the LMS asset ID, the original webhook event ID, and the timestamp at which it was dead-lettered. A human needs to investigate whether the job is still in the vendor's queue (occasionally true during vendor-side incidents), needs to be re-submitted, or has completed but your polling worker missed the status update.
Similarly, a job that returns a failed status should trigger an alert and a retry decision. The error field in the failure response indicates the failure category: source_unreachable (the video URL returned 4xx or 5xx), source_format_unsupported (the video codec or container is not supported), source_duration_exceeded (the video is longer than the account's per-job duration limit), or glossary_invalid (the specified glossary_id does not exist or is not licensed to your account). Source URL failures are retryable; format and glossary failures require fixing the underlying issue before resubmission.
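The retry decision can be captured as a lookup from error code to action. The sketch below uses the four error codes listed above; the FailureAction enum, the retry cap of three attempts, and the choice to treat source_duration_exceeded as fix-then-resubmit are assumptions.

```python
from enum import Enum


class FailureAction(Enum):
    RETRY = "retry"
    FIX_THEN_RESUBMIT = "fix_then_resubmit"
    INVESTIGATE = "investigate"


# The action for source_duration_exceeded is an assumption: the video or
# the account limit has to change before a resubmission can succeed.
_ACTION_BY_ERROR = {
    "source_unreachable": FailureAction.RETRY,
    "source_format_unsupported": FailureAction.FIX_THEN_RESUBMIT,
    "source_duration_exceeded": FailureAction.FIX_THEN_RESUBMIT,
    "glossary_invalid": FailureAction.FIX_THEN_RESUBMIT,
}


def action_for_failure(
    error_code: str, attempts_so_far: int, max_retries: int = 3
) -> FailureAction:
    """Map a failed job's error code to the next step for the pipeline."""
    # Unknown codes go to a human rather than being retried blindly.
    action = _ACTION_BY_ERROR.get(error_code, FailureAction.INVESTIGATE)
    if action is FailureAction.RETRY and attempts_so_far >= max_retries:
        return FailureAction.INVESTIGATE
    return action
```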
SRT and VTT retrieval
When the job status reaches completed, retrieve the caption file via the output endpoint. Request the format that matches your target LMS's ingestion requirement:
GET /v1/jobs/{job_id}/output?format=srt HTTP/1.1
Host: api.glosscap.com
Authorization: Bearer {token}

HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Disposition: attachment; filename="job_a1b2c3d4e5f6.srt"
X-Glosscap-Word-Error-Rate: 0.009
X-Glosscap-Confidence: 0.994
X-Glosscap-Duration-Seconds: 618.4
X-Glosscap-Cue-Count: 124

1
00:00:00,000 --> 00:00:03,180
Welcome to the GlossCap API integration module.

2
00:00:03,480 --> 00:00:07,240
In this section, we'll cover the OAuth 2.0 authentication flow
and the glossary parameter structure.

...
The response headers include quality metadata: X-Glosscap-Word-Error-Rate (the estimated WER on the technical vocabulary in the file), X-Glosscap-Confidence (average per-word confidence score), X-Glosscap-Duration-Seconds (the total duration of the transcribed content), and X-Glosscap-Cue-Count (the number of cues in the file). Your pipeline should log all four values alongside the job ID and asset ID — they are the raw data for the accuracy tracking component of your compliance KPI reporting.
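A small parser keeps those four values typed and ready for structured logging. The CaptionQuality class and function name are hypothetical; the header names are the ones shown in the response above.

```python
from dataclasses import asdict, dataclass
from typing import Mapping


@dataclass(frozen=True)
class CaptionQuality:
    word_error_rate: float
    confidence: float
    duration_seconds: float
    cue_count: int


def parse_quality_headers(headers: Mapping[str, str]) -> CaptionQuality:
    """Extract the quality metadata from the output response headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return CaptionQuality(
        word_error_rate=float(lowered["x-glosscap-word-error-rate"]),
        confidence=float(lowered["x-glosscap-confidence"]),
        duration_seconds=float(lowered["x-glosscap-duration-seconds"]),
        cue_count=int(lowered["x-glosscap-cue-count"]),
    )
```

Calling asdict() on the result gives a plain dictionary that can be merged into the log event next to the job ID and asset ID. A missing header raises KeyError, which is deliberate in this sketch: a response without quality metadata is worth surfacing rather than logging as zeros.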
Pre-delivery validation
Before delivering the retrieved file to the LMS, run a pre-delivery validation pass. This catches formatting issues that would cause silent ingestion failures at the LMS layer without producing any error that traces back to the caption file:
- BOM check: Verify that the file does not begin with the UTF-8 BOM sequence (bytes EF BB BF). If it does, strip the BOM before delivery. GlossCap's API never returns BOM-prefixed files, but if your pipeline includes a format conversion step that produces an intermediate file using a Windows-native tool or a Node.js fs.writeFileSync call with BOM encoding, the BOM can be introduced at that step.
- Timing separator check (SRT): Verify that all timing lines use comma as the decimal separator (00:00:03,180), not period (00:00:03.180). A regex over the first 10 cues catches this in under 1ms.
- Duration coverage check: Verify that the last cue's end timestamp is within ±30 seconds of the video's known duration (from the webhook payload). If the caption file ends at 03:24 but the video is 12:47, the ASR job likely failed to process the full file and completed with a partial transcript; this is a completed status with silently truncated output.
- Non-empty cue check: Verify that X-Glosscap-Cue-Count is greater than zero. An empty caption file is a valid SRT document but not a valid caption: it will appear as a caption track in the LMS player with no visible captions during playback.
- WEBVTT header check (VTT): Verify that VTT files begin with the string WEBVTT on the first line. Some platforms reject VTT files that are missing the header even if the rest of the format is correct.
Files that fail any validation check should be moved to a human review queue rather than delivered to the LMS. Silently delivering a malformed caption file is worse than not delivering any caption, because it creates a false positive in your coverage metric — the LMS reports a caption track is present, your coverage report shows 100%, but learners receive no captions or garbled captions. The QA methodology post covers how to structure the human review process for files that fail automated validation.
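The SRT checks above can be sketched as a single function that returns the names of the failed checks. This is an illustration under stated assumptions, not a complete validator: `validate_srt` is a hypothetical name, it scans every timing line rather than only the first 10 cues, and it does not cover the VTT header check.

```python
import re
from typing import List

UTF8_BOM = b"\xef\xbb\xbf"
# One timing line with either separator; groups 1-4 are the end time fields.
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3} --> (\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*$"
)


def validate_srt(data: bytes, video_duration_s: float,
                 tolerance_s: float = 30.0) -> List[str]:
    """Return the names of failed checks; an empty list means the file passed."""
    failures: List[str] = []
    if data.startswith(UTF8_BOM):
        failures.append("bom")
        data = data[len(UTF8_BOM):]
    end_times: List[float] = []
    period_seen = False
    for line in data.decode("utf-8", errors="replace").splitlines():
        match = TIMING.match(line)
        if not match:
            continue
        # A matched timing line contains a "." only as a decimal separator.
        if "." in line:
            period_seen = True
        hours, minutes, seconds, millis = (int(g) for g in match.groups())
        end_times.append(hours * 3600 + minutes * 60 + seconds + millis / 1000)
    if period_seen:
        failures.append("timing_separator")
    if not end_times:
        failures.append("empty")
    elif abs(end_times[-1] - video_duration_s) > tolerance_s:
        failures.append("duration_coverage")
    return failures
```

A caller might strip the BOM and re-validate when `"bom"` is the only failure, and send everything else to the human review queue.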
LMS delivery: Kaltura
Kaltura has the deepest caption API of any enterprise LMS or video platform in the L&D market. The caption_captionasset service provides full lifecycle management for caption tracks attached to Kaltura media entries. Use the caption_captionasset.add action to create a new caption asset, then caption_captionasset.setContent to upload the caption file content. This two-step approach is required because Kaltura separates the caption asset record (metadata) from the caption asset content (the file itself).
Step 1: Create the caption asset record
POST https://www.kaltura.com/api_v3/service/caption_captionasset/action/add
Content-Type: application/x-www-form-urlencoded
ks={kaltura_session}
&entryId=0_abc12def
&captionAsset[objectType]=KalturaCaptionAsset
&captionAsset[language]=English
&captionAsset[languageCode]=en
&captionAsset[label]=English
&captionAsset[format]=1
&captionAsset[accuracy]=99
&captionAsset[isDefault]=1
The format field uses Kaltura's internal format enum: 1 = SRT, 2 = DFXP (TTML), 3 = WebVTT. Use 1 for SRT delivery. The isDefault flag controls whether this caption track is the default track displayed to learners who have enabled captions in the player. If the entry already has a default caption track from a prior run (e.g., an auto-generated track that you are replacing), set isDefault=1 and also call caption_captionasset.setAsDefault to ensure the new track takes precedence.
Step 2: Upload caption file content
POST https://www.kaltura.com/api_v3/service/caption_captionasset/action/setContent
Content-Type: application/x-www-form-urlencoded
ks={kaltura_session}
&id={caption_asset_id_from_step1}
&contentResource[objectType]=KalturaUploadedFileTokenResource
&contentResource[token]={upload_token}
The content upload uses Kaltura's upload token mechanism: first call uploadtoken.add to create an upload token, then uploadtoken.upload to upload the caption file bytes to the token endpoint, then pass the token ID in the setContent call above. The upload token approach supports resumable uploads for large files — while caption SRT files are small (typically under 100KB), the token pattern is Kaltura's standard approach for all file content and should be followed for consistency with the rest of your Kaltura integration.
Kaltura's caption_captionasset.add action is not idempotent: every call creates a new caption asset, so a single entry can accumulate duplicate tracks. If your pipeline might deliver captions to the same Kaltura entry more than once (e.g., after a content revision or a failed first delivery), query the existing caption assets with caption_captionasset.list before calling add, and update the existing asset (via setContent) rather than creating a duplicate. The caption_captionasset.list call with filter[entryIdEqual]={entryId} returns all caption assets for the entry.
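The list-before-add decision can be sketched as a pure function over the parsed list response. This assumes each returned asset has been converted to a dictionary carrying the `id`, `languageCode`, and `label` fields used in the add request; `plan_caption_delivery` is a hypothetical helper, not a Kaltura client call.

```python
from typing import Iterable, Mapping, Optional, Tuple


def plan_caption_delivery(
    existing_assets: Iterable[Mapping[str, object]],
    language_code: str,
    label: str,
) -> Tuple[str, Optional[str]]:
    """Decide whether to update an existing caption asset or create a new one.

    Returns ("update", asset_id) when an asset with the same language code
    and label already exists on the entry, otherwise ("create", None).
    """
    for asset in existing_assets:
        same_language = asset.get("languageCode") == language_code
        same_label = asset.get("label") == label
        if same_language and same_label:
            return ("update", str(asset["id"]))
    return ("create", None)
```

On `"update"` the pipeline would skip the add call and go straight to setContent with the returned ID. Matching on language code plus label is one reasonable rule; an organisation that labels tracks differently would need its own matching key.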
Kaltura also exposes a REACH caption-ordering service that provides an alternative caption generation path — if your Kaltura account has a REACH subscription, you can order captions directly through Kaltura rather than using an external caption API. For organisations where Kaltura is the primary video infrastructure and REACH is already licensed, this may be simpler than a separate caption API integration. For organisations that use multiple video hosts (Kaltura + Panopto + Vimeo), a unified external caption API that handles all video sources through a single integration surface is typically preferred over platform-native ordering services.
LMS delivery: Cornerstone OnDemand
Cornerstone OnDemand's caption delivery surface is structured differently from Kaltura's. In Cornerstone, video content is typically managed through the Training Module (transcripts and course completions) and the Media Library (video asset management). Caption tracks for video content in Cornerstone are associated with the Video Learning Object, not directly with the media asset. The API surface for caption delivery depends on how your organisation's Cornerstone instance is configured and what media hosting integration is in use.
Cornerstone instances that use integrated video hosting (Cornerstone Video) expose caption management through the Cornerstone API v1 and v2 endpoints. The relevant endpoint for adding a caption track to a video learning object is part of the Learning Object Management API:
POST /api/v1/loProxy/transcriptFile
Authorization: Basic {base64(api_key:api_secret)}
Content-Type: multipart/form-data
{
"loId": "12345",
"loType": "online",
"transcriptFile": (binary SRT file content),
"language": "en-US",
"isDefault": true
}
This endpoint is available in Cornerstone instances with the Caption Management feature flag enabled. Contact your Cornerstone customer success representative to verify whether the flag is active on your instance — it is not enabled by default for all account tiers. Without the feature flag, caption management must be done through the Cornerstone administrative UI, which does not have a corresponding API surface for automated delivery.
For Cornerstone instances where the video content is hosted externally (in Kaltura, Panopto, or a CDN-hosted MP4), the caption track is typically associated with the video host rather than with Cornerstone directly. In this configuration, your pipeline delivers the caption file to the video host's API (Kaltura's caption_captionasset service, or Panopto's subtitle endpoint), and Cornerstone inherits the caption track through the embed integration. The LMS asset ID passed in your webhook payload must be mapped to the correct host-side asset ID for this pattern to work.
A specific Cornerstone consideration for the LMS migration context: when organisations migrate from one LMS to Cornerstone (a common migration destination in the enterprise L&D market), the caption files from the source LMS are often lost in the migration because the migration tools move course content files but not associated media metadata. If your pipeline is deployed during or after a Cornerstone migration, run an inventory check on caption coverage in the new Cornerstone instance before relying on the pipeline for net-new content only — a significant portion of the migrated content may need captions delivered as part of the migration remediation, not just for new uploads.
LMS delivery: Workday Learning
Workday Learning uses the Workday Web Services API (WWS) and the newer Workday REST API for Learning Catalog operations. Caption delivery in Workday Learning is associated with the Course Section or Learning Content object. The REST API path for updating a learning content object with caption metadata is:
PUT /api/learning/v1/learningContent/{learningContentWid}
Authorization: Bearer {oauth_token}
Content-Type: application/json
{
"captionFile": {
"fileName": "module-12-captions-en-US.srt",
"fileContent": "{base64-encoded SRT content}",
"languageCode": "en-US",
"isDefault": true
}
}
The learningContentWid (Workday ID) for the video asset must be resolved from the video's identifier in your system. If your Workday Learning instance ingests video from an external source (Panopto, Kaltura, or a CDN), the Workday content object has an externalId field that typically matches the video host's asset ID. Use that external ID to look up the Workday learning content WID via GET /api/learning/v1/learningContent?externalId={asset_id} before making the PUT call.
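A minimal sketch of assembling the request body shown above, assuming the caption file is already in memory as bytes. `build_caption_payload` is an illustrative helper name; the field names are taken from the example request.

```python
import base64
import json


def build_caption_payload(
    srt_bytes: bytes,
    file_name: str,
    language_code: str = "en-US",
    is_default: bool = True,
) -> str:
    """Serialise the captionFile body with base64-encoded SRT content."""
    payload = {
        "captionFile": {
            "fileName": file_name,
            # JSON cannot carry raw bytes, so the file is sent as base64 text.
            "fileContent": base64.b64encode(srt_bytes).decode("ascii"),
            "languageCode": language_code,
            "isDefault": is_default,
        }
    }
    return json.dumps(payload)
```

The returned string is what would be sent as the PUT body once the WID lookup has resolved the target content object.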
Workday Learning's API requires OAuth 2.0 authentication with a service account that has the Learning Administrator security group membership. The OAuth client credentials flow (grant_type=client_credentials) is the appropriate pattern for a machine-to-machine pipeline. Workday issues access tokens with a 3600-second TTL — implement token refresh in your pipeline using the same caching pattern as the caption API token. Both tokens should be stored in a secure credential store (AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager) and rotated on the credential store's rotation schedule, not hardcoded in pipeline configuration.
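One way to sketch the token caching pattern is a small cache that refreshes shortly before expiry instead of waiting for a 401. The `TokenCache` class and its `fetch_token` callback are hypothetical; the callback is assumed to perform the client-credentials request and return the access token with its TTL in seconds.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches one access token and refreshes it before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],
        refresh_margin_s: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_token
        self._margin = refresh_margin_s
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Refresh when no token is cached or the refresh margin has been reached.
        if self._token is None or now >= self._expires_at - self._margin:
            token, ttl_seconds = self._fetch()
            self._token = token
            self._expires_at = now + ttl_seconds
        return self._token
```

With a 3600-second TTL and the default 300-second margin, the cache requests a new token 55 minutes after the previous one was issued. The injectable clock exists so the expiry path can be exercised in a unit test without waiting for a real token to expire.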
A key operational difference between Workday Learning and the other LMS platforms: Workday's API rate limits are more restrictive and more strictly enforced. The standard Learning API rate limit is 60 requests per minute per integration client. For a pipeline delivering captions to a high-volume Workday instance (50+ new videos per week), batch your LMS delivery calls rather than making individual PUT calls as each caption job completes. Accumulate completed jobs over a 15-minute window, then deliver all of them in a single batch pass that respects the rate limit. This is the one place in the pipeline where a batch accumulation pattern is preferable even in an otherwise event-driven system.
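The batch pass can be sketched as a loop that spaces deliveries evenly so the per-minute limit is never exceeded. This sketch assumes one API request per item; if each delivery also performs a WID lookup, the `max_per_minute` argument should be halved. `deliver_paced` and its parameters are illustrative names.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def deliver_paced(
    items: Iterable[T],
    deliver: Callable[[T], None],
    max_per_minute: int = 60,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call deliver(item) for each item, spacing calls to respect the limit."""
    min_gap = 60.0 / max_per_minute
    next_allowed = clock()
    delivered = 0
    for item in items:
        wait = next_allowed - clock()
        if wait > 0:
            sleep(wait)
        # Schedule the next call one gap after this call starts.
        next_allowed = clock() + min_gap
        deliver(item)
        delivered += 1
    return delivered
```

Even spacing is the simplest approach that stays under the limit. A token bucket would allow short bursts, but that adds little for a pass that runs once every 15 minutes.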
Batch processing for backlog remediation
Deploying an automated caption pipeline for net-new content does not resolve the back catalogue. An organisation that spent three years captioning manually (or not captioning at all) before deploying automation will have a library of uncaptioned or inconsistently captioned content. The pipeline described above handles net-new content only; the back catalogue needs a separate batch remediation pipeline.
Inventory extraction
The first step in backlog remediation is extracting the full inventory of video assets from the LMS and video host, tagged with caption status. For Kaltura, use media_entry.list with a filter on mediaType = KalturaMediaType.VIDEO and paginate with pager[pageSize]=500 until you have the full entry list. For each entry, call caption_captionasset.list to check whether a human-verified caption track exists. Flag entries with no caption assets, or with caption assets where the accuracy metadata field is below 95%, as remediation candidates. The LMS caption audit methodology covers the full inventory and triage process in depth.
Queue architecture for batch remediation
Do not submit the entire back catalogue to the caption API simultaneously. A batch of 500 videos submitted at once will exceed your vendor's per-account concurrency limit (typically 20–50 concurrent jobs) and result in most jobs queuing for hours. The correct pattern is a controlled submission queue: pull assets from the inventory in priority order (see the backlog remediation playbook for priority tiers), submit batches of 10–20 jobs, and wait until at least 80% of the submitted batch has reached a terminal state before submitting the next batch. This keeps the vendor queue depth manageable and prevents a sudden influx from degrading turnaround time for your new-content pipeline.
A simple batch controller in Python:
import time
from typing import Callable, Iterator, List, Optional

from glosscap_client import CaptionClient, JobStatus

BATCH_SIZE = 15
BATCH_COMPLETION_THRESHOLD = 0.8


def process_backlog(
    client: CaptionClient,
    assets: Iterator[dict],
    glossary_resolver: Callable[[dict], Optional[str]],
) -> None:
    """Submit backlog assets in batches, pausing until each batch mostly finishes."""
    batch: List[str] = []
    for asset in assets:
        glossary_id = glossary_resolver(asset)
        job = client.create_job(
            source_url=asset["url"],
            glossary_id=glossary_id,
            priority="standard",
            metadata={"lms_asset_id": asset["id"]},
        )
        batch.append(job["job_id"])
        if len(batch) >= BATCH_SIZE:
            wait_for_batch_completion(client, batch, BATCH_COMPLETION_THRESHOLD)
            batch = []
    # Wait for the final partial batch to finish completely.
    if batch:
        wait_for_batch_completion(client, batch, 1.0)


def wait_for_batch_completion(
    client: CaptionClient,
    job_ids: List[str],
    threshold: float,
) -> None:
    """Poll until the fraction of jobs in a terminal state reaches the threshold."""
    terminal_statuses = {JobStatus.COMPLETED, JobStatus.FAILED, JobStatus.CANCELLED}
    interval = 120.0  # seconds between polls
    max_wait = 14400  # 4 hours
    elapsed = 0.0
    while elapsed < max_wait:
        statuses = [client.get_job(jid)["status"] for jid in job_ids]
        terminal_count = sum(1 for s in statuses if s in terminal_statuses)
        if terminal_count / len(job_ids) >= threshold:
            return
        time.sleep(interval)
        elapsed += interval
        interval = min(interval * 1.5, 600)  # back off, capped at 10-minute intervals
    # Timed out: return anyway so the caller can move on to the next batch.
For large backlogs (over 1,000 assets), run the batch controller as a scheduled job that processes 50–100 assets per run, rather than attempting to process the full catalogue in a single long-running process. This makes progress visible in your audit log, allows the remediation to be paused and resumed, and limits the blast radius if a bug in the glossary resolver submits a batch with incorrect glossary assignments.
Automated QA gate
An automation pipeline that delivers caption files without quality validation is worse than a manual workflow in one important respect: it delivers errors at scale. A misconfigured glossary parameter or a vendor-side accuracy regression that would have been caught manually by the person reviewing the file before upload will instead be delivered to every video asset in the content category before anyone notices. Automated QA gates are the mechanism that restores the quality check that the manual workflow provided implicitly.
Confidence-score gate
The X-Glosscap-Confidence header on the output retrieval response provides an average per-word confidence score for the delivered transcript. A confidence score below 0.92 on a video that uses a domain-specific glossary typically indicates that the glossary did not engage correctly — either the glossary_id was wrong, the vocabulary in the video diverged significantly from the glossary terms, or the audio quality was below the threshold for accurate ASR. Flag jobs where confidence < 0.92 for human spot-check before LMS delivery, rather than delivering them directly.
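The gate reduces to a small routing function. The sketch below uses the 0.92 spot-check threshold from this section and, as a second tier, the 0.85 human-captioning threshold mentioned in the FAQ; the function name and the route labels are illustrative.

```python
def route_by_confidence(
    confidence: float,
    spot_check_below: float = 0.92,
    human_caption_below: float = 0.85,
) -> str:
    """Map an average confidence score to a delivery route."""
    if confidence < human_caption_below:
        return "human_caption"
    if confidence < spot_check_below:
        return "spot_check"
    return "deliver"
```

Keeping the thresholds as parameters makes it straightforward to tune them per content category once the confidence distribution for that category is known.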
Random sample audit
Even for jobs that pass the confidence-score gate, implement a random sample audit: select 5% of completed jobs for DCMP-protocol accuracy validation by a human reviewer. This provides ongoing quality signal that the automated system is performing correctly, catches systematic regressions that would be invisible in the confidence score alone, and produces the audit trail documentation required for the annual programme review. Log the DCMP accuracy score for each audited job alongside the job ID and asset ID in your event log.
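One way to select the sample is to hash the job ID rather than call a random number generator. Note that this swaps in deterministic hash-based sampling for a truly random draw: the same job always gets the same decision, which makes the selection reproducible from the event log. `selected_for_audit` is a hypothetical helper.

```python
import hashlib


def selected_for_audit(job_id: str, sample_rate: float = 0.05) -> bool:
    """Deterministically select roughly sample_rate of job IDs for audit."""
    digest = hashlib.sha256(job_id.encode("utf-8")).digest()
    # Compare the first 8 bytes, read as an integer, against the cut-off.
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < int(sample_rate * 2**64)
```

Because the decision depends only on the job ID, a reviewer can later confirm that a given job was or was not in the audit sample without a separate sampling record.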
Post-delivery verification
After delivering a caption file to the LMS, verify that the delivery was successful by querying the LMS API to confirm that the caption track is present on the asset. For Kaltura, call caption_captionasset.list and verify that a caption asset with status READY exists for the entry. For Cornerstone, call the transcript file endpoint and verify the response includes the delivered file. For Workday, call GET /api/learning/v1/learningContent/{wid} and verify that the captionFile field is populated. A caption file that was submitted without error but did not actually attach to the LMS asset is a silent failure — it appears successful in your pipeline's event log but produces no captions for learners.
See also: the caption QA methodology post for the DCMP spot-check protocol and error taxonomy used in the human review step of the QA gate.
Glossary management via API
A caption automation pipeline is only as accurate as its glossaries. An automated pipeline that does not also automate glossary updates will deliver accurate captions at launch and then degrade over time as the product vocabulary, personnel names, and compliance terminology evolve. The glossary architecture post covers the term-sourcing and maintenance cadence in depth; this section covers the API operations for keeping glossaries current as part of the automation infrastructure.
Creating and updating glossary terms via API
The GlossCap glossary API follows the same REST conventions as the caption API. Glossary terms can be created and updated programmatically:
POST /v1/glossaries/{glossary_id}/terms
Authorization: Bearer {token}
Content-Type: application/json
{
"terms": [
{
"term": "OpenTelemetry",
"phonetic": "open-tel-ih-met-ree",
"boost": 1.8,
"context": "observability platform, often abbreviated OTel"
},
{
"term": "GlossCAP",
"phonetic": "gloss-cap",
"boost": 2.0,
"canonical_form": "GlossCap"
}
]
}
The boost parameter (range 1.0–3.0) controls how strongly the ASR decoder is weighted toward the glossary term when it encounters acoustically similar alternatives. A boost of 1.0 is a gentle preference; 2.0 is a strong preference for terms that are clearly defined in the glossary vocabulary; 3.0 is the maximum, appropriate for terms that are so unusual that any transcription other than the glossary term is almost certainly wrong (e.g., an internal product name with no common-English alternative pronunciation). Setting high boost values on common English words can degrade overall accuracy, so restrict high-boost values to proper nouns and coined terms.
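A client-side check before the POST can catch out-of-range boosts early. This sketch enforces only the documented 1.0–3.0 range and a non-empty term; treating a missing boost as an error is an assumption, and `validate_term` is an illustrative name.

```python
from typing import List, Mapping


def validate_term(term: Mapping[str, object]) -> List[str]:
    """Return a list of problems with one glossary term; empty means valid."""
    errors: List[str] = []
    if not str(term.get("term", "")).strip():
        errors.append("term is required")
    boost = term.get("boost")
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(boost, (int, float)) and not isinstance(boost, bool)
    if not is_number or not 1.0 <= boost <= 3.0:
        errors.append("boost must be between 1.0 and 3.0")
    return errors
```

Running this over every entry in the `terms` array before submission keeps a single malformed term from failing a whole glossary update.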
Triggering glossary updates from external systems
The most effective glossary maintenance pattern is integrating your glossary update pipeline with the systems that own the canonical term list. For engineering vocabulary, this is typically the developer docs or the SDK changelog. For healthcare vocabulary, it is the formulary update schedule or the compliance content calendar. For sales content, it is the product release notes or the pricing sheet. A lightweight integration that monitors these sources and fires glossary update API calls when new terms are detected keeps the glossary current without requiring manual maintenance by the accessibility coordinator.
A practical pattern: subscribe to push events on your company's GitHub repository. When a commit to the SDK documentation or product changelog contains a new product name or SDK method name (detectable via a simple regex against the diff), fire a webhook to your glossary management service, which extracts the new terms and calls POST /v1/glossaries/{id}/terms. The extraction is speculative, since not every new term in a changelog belongs in the captioning glossary, but it surfaces candidates for human review rather than requiring the accessibility coordinator to manually track every product update.
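As a rough sketch of the regex step, the function below scans the added lines of a unified diff for CamelCase tokens that are not already in the glossary. The pattern is one plausible heuristic for coined product names, not a complete term detector, and `candidate_terms` is a hypothetical helper.

```python
import re
from typing import Iterable, List

# CamelCase tokens such as "OpenTelemetry": a capitalised word followed by
# at least one more capitalised segment.
CANDIDATE = re.compile(r"\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]*)+\b")


def candidate_terms(diff_text: str, known_terms: Iterable[str]) -> List[str]:
    """Return new CamelCase tokens found on added lines of a unified diff."""
    known = {term.lower() for term in known_terms}
    found = set()
    for line in diff_text.splitlines():
        # Keep added lines only; "+++" marks the file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for token in CANDIDATE.findall(line[1:]):
            if token.lower() not in known:
                found.add(token)
    return sorted(found)
```

The output is a candidate list for a reviewer to accept or reject. Acronyms and snake_case method names would need additional patterns.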
Monitoring and observability
A caption automation pipeline that works correctly is invisible — captions appear on videos, learners receive accessible content, and no one needs to think about how it happened. A caption automation pipeline that fails silently is worse than no automation, because the compliance programme believes it is working while captions are not being delivered. Observability is the difference between the two.
Metrics to track
The following metrics should be emitted from your pipeline to a time-series metric store (DataDog, CloudWatch, Prometheus) with per-content-category dimensions:
- Jobs submitted per hour: Use for spike detection. A sudden increase indicates a batch remediation run or a bulk upload event. A sustained zero indicates the webhook integration is broken.
- Job completion rate: completed / (completed + failed + cancelled) over a rolling 24-hour window. Should be above 98% in steady state. A drop below 95% signals a vendor-side issue or a systematic source-URL problem in your pipeline.
- Median job completion time: Track P50 and P95 separately. A rising P95 without a rising P50 indicates that some content category or video host is producing anomalously slow jobs.
- LMS delivery success rate: successfully_delivered / attempted_deliveries. Should be 100% in steady state. Any value below 100% means caption files are completing but not reaching the LMS.
- Confidence score distribution: Track the P10, P50, and P90 of the X-Glosscap-Confidence values across all completed jobs. A downward shift in P10 indicates a vocabulary coverage gap: new content is using terms that are not in the glossary.
- Human review queue depth: The number of jobs flagged for human review that have not yet been reviewed. A growing queue indicates either that your quality gate threshold is too aggressive or that the human review capacity is not keeping pace with volume.
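Two of the metrics above can be sketched with the standard library alone: the completion rate with its 98% and 95% thresholds, and the confidence percentiles. The function names and the lowercase status strings are assumptions; in practice a metrics agent would usually compute percentiles server-side.

```python
import statistics
from collections import Counter
from typing import Dict, Iterable, Optional, Sequence


def completion_rate(statuses: Iterable[str]) -> Optional[float]:
    """completed / (completed + failed + cancelled); None if nothing is terminal."""
    counts = Counter(statuses)
    terminal = counts["completed"] + counts["failed"] + counts["cancelled"]
    if terminal == 0:
        return None
    return counts["completed"] / terminal


def classify_completion_rate(rate: float) -> str:
    """Apply the steady-state (98%) and alert (95%) thresholds."""
    if rate >= 0.98:
        return "ok"
    if rate >= 0.95:
        return "warn"
    return "alert"


def confidence_percentiles(values: Sequence[float]) -> Dict[str, float]:
    """P10, P50, and P90 of confidence scores (needs at least two values)."""
    deciles = statistics.quantiles(values, n=10)  # nine cut points
    return {"p10": deciles[0], "p50": deciles[4], "p90": deciles[8]}
```

Jobs that are still in progress are ignored by `completion_rate`, so a large in-flight batch does not drag the metric down.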
Audit trail for compliance
Every event in the pipeline — webhook receipt, job creation, status poll, caption retrieval, LMS delivery, QA gate result — should be written to an append-only structured event log with a timestamp, the event type, the job ID, the source asset ID, and the LMS asset ID. This log is the compliance record for your caption programme. When an accessibility audit asks "when was a caption track first delivered to this video asset, and what was the glossary used?", the answer should be a single query against the event log, not a manual reconstruction from multiple systems' logs.
Use an event schema that supports the queries you will need for the compliance KPI reporting: coverage rate (what percentage of published videos have a verified caption track, by week), delivery latency (median time from video publication to caption delivery), and accuracy distribution (confidence score percentiles across the content library). Log schema consistency matters — if the schema changes between pipeline versions, historical queries will fail. Version your event schema explicitly and maintain backward compatibility when adding new fields.
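A minimal sketch of a versioned event record, assuming JSON Lines storage. The field set follows the list above; `glossary_id` stands in for a field added in a later schema version, and all class and function names are illustrative.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class PipelineEvent:
    event_type: str
    job_id: str
    source_asset_id: str
    lms_asset_id: str
    timestamp: str  # ISO 8601, UTC
    schema_version: int
    # Later additions carry defaults so records written earlier still parse.
    glossary_id: Optional[str] = None


def to_log_line(event: PipelineEvent) -> str:
    """Serialise one event as a single JSON line."""
    return json.dumps(asdict(event), sort_keys=True)


def from_log_line(line: str) -> PipelineEvent:
    """Parse a log line, ignoring fields this version does not know about."""
    record = json.loads(line)
    known = {f.name for f in fields(PipelineEvent)}
    return PipelineEvent(**{k: v for k, v in record.items() if k in known})
```

Giving new fields a default and ignoring unknown ones on read keeps old records readable by new code and new records readable by old code, which is the backward-compatibility property the paragraph above asks for.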
Connecting the pipeline to caption programme governance
An automated caption pipeline is a technical artefact, but it sits inside a governance programme that defines the rules it must enforce. The pipeline must implement the specific policies documented in your governance policy — it is not sufficient for the policy to say "all new videos shall have captions within 48 hours of publication" if the pipeline's median delivery time is 2 weeks because no one has configured the correct priority tier for time-sensitive content categories.
The key governance integration points:
- Pre-publication gate: If your governance policy requires that captions be delivered before a video is visible to learners, the pipeline must be able to delay LMS publication until the caption job completes. This requires an integration between the pipeline and the LMS's content visibility controls — typically a "draft" or "pending" status on the LMS asset that is only promoted to "published" once the pipeline's post-delivery verification step confirms the caption track is present. This is the most technically demanding governance integration and is worth the implementation effort for organisations where any uncaptioned-video window is a compliance violation.
- Exception handling: Your governance policy likely includes an exception procedure for content that cannot be captioned within the standard timeline. The pipeline should support exception flagging: when a job times out, fails after retries, or is flagged for human review, the pipeline should notify the accessibility coordinator with enough detail (source asset, LMS course, content category, failure reason) to initiate the exception procedure rather than leaving the flag in a queue that no one is monitoring.
- Change-of-content trigger: When a video asset is updated with new audio (a revised narration recording, a re-recorded module, a version update), the pipeline should detect the change and submit a new caption job rather than leaving the old caption track attached to the updated video. Kaltura's ENTRY_UPDATE event fires on content updates and can trigger re-captioning; other video hosts may require a change-detection mechanism (comparing file size or MD5 hash between versions) to detect when the underlying video content has changed versus when only metadata has been updated.
The intersection of automation and governance is also where the annual review gets its data. The pipeline's structured event log, quality metrics, and exception records are the inputs to Components 1, 2, and 4 of the annual review (content library audit, vendor performance review, and LMS delivery audit). An organisation with a well-instrumented automation pipeline can complete the annual review data collection phase in a day rather than a week, because the data is already structured and queryable rather than scattered across manual spreadsheets and vendor portals.
Eight failure modes in caption automation pipelines
- Webhook delivery failures treated as events-never-occurred
  Webhook delivery is not guaranteed. Your video host may fail to deliver a webhook due to network errors, your endpoint being temporarily unavailable, or rate limits on the notification service. If you treat each webhook as a guaranteed single delivery, you will have a gap between videos published during downtime and the pipeline's awareness of them. Add a reconciliation process that periodically queries the video host's API for assets published in the last 24 hours without a corresponding pipeline event, and submits any missing assets as catch-up jobs. For Kaltura, this is a scheduled media_entry.list call filtered by createdAtGreaterThanOrEqual. For Panopto, it is the Sessions API filtered by After (creation timestamp). For Vimeo and Wistia, both have video list endpoints with created_time.before and created_time.after filters.
- Glossary resolver returns null for unrecognised content categories
  When a video is uploaded to a folder, channel, or course that is not in your glossary resolver's mapping table, the resolver returns null and the job is submitted without a glossary. This produces a caption file at baseline accuracy. The failure mode is invisible in the pipeline's event log if you log the null glossary_id as "general" rather than as "unresolved." Log unresolved glossary lookups explicitly, emit a metric for them, and alert when the unresolved rate exceeds 5% of submitted jobs. This catches the case where an L&D team creates a new content category or folder structure that the pipeline doesn't know about.
- Silent LMS delivery success with no caption track produced
  Several LMS caption delivery endpoints return HTTP 200 on a request that has succeeded from the API's perspective but has not actually attached the caption track to the asset in a player-visible way. Kaltura's setContent call may return success even if the uploaded file content is malformed in a way that Kaltura's parser rejects at the transcoding stage. Always follow delivery with a verification read: query the asset's caption track list and verify that a track with status=READY exists. If the verification fails, treat the delivery as a failure and retry.
- Token expiry causing burst failures
  If your pipeline caches authentication tokens without monitoring expiry, the token will eventually expire and all subsequent API calls will return HTTP 401 until the cache is cleared or the token is refreshed. This typically manifests as a burst of failures at a predictable interval: 24 hours after the token was last issued. Implement a proactive refresh that requests a new token 5 minutes before the current token expires, rather than reacting to 401 responses. Token expiry bugs are harder to reproduce in testing because the 24-hour window means they are not caught by unit tests that run in seconds.
- Duplicate caption delivery creating conflicting tracks
  If the webhook consumer is not idempotent (that is, if it processes the same webhook event twice due to a retry from the video host), your pipeline will submit two caption jobs for the same video, retrieve two caption files, and attempt to deliver both to the LMS. On Kaltura, this creates two caption assets for the same entry, which may both appear as selectable tracks in the player. Add deduplication at the queue entry point: before enqueuing a caption job, check the status store for an existing job for the same source asset ID. If one exists and is not in a failed state, skip the new submission.
- Backlog remediation rate limiting causing content-type queue starvation
  When running a backlog remediation batch concurrently with the new-content pipeline, the batch can saturate the vendor's concurrency limit and cause new-content jobs to queue behind hundreds of backlog jobs. New videos published during active backlog remediation may wait hours for captions instead of minutes. Implement separate priority queues for new-content and backlog jobs, and coordinate with your captioning vendor to understand the account-level concurrency model. Some vendors support priority tiers that allow new-content jobs to skip ahead of the backlog queue; others allocate concurrency slots equally across all jobs. Design your batch submission rate accordingly.
- Content revision re-captioning misses partially updated videos
  When a video is revised (the narration is re-recorded but the video ID in the LMS remains the same), your change-detection mechanism may not fire if it relies on a field that did not change (e.g., the video's Kaltura entry ID, which persists across content revisions that replace the media file). Implement change detection on the media file's checksum (media_entry.list returns an MD5 checksum for Kaltura entries) or on the entry's updatedAt timestamp rather than on the entry ID alone. Compare the stored checksum to the current checksum on a scheduled basis for all assets with existing caption tracks.
- Compliance event log schema drift
  Pipeline code changes over time. New event types are added, field names are changed, and log entries written under an old schema become incompatible with queries written for the new schema. The compliance event log that is the foundation of your accessibility coordinator's reporting will become unreliable if its schema drifts without versioning. Explicitly version every event type in your log schema (e.g., caption_job_created_v2), maintain backward compatibility by adding fields rather than removing them, and run a weekly schema validation job that verifies historical log entries parse correctly under the current schema. A compliance log that is correct for the last 3 months but missing or unparseable for months 4–12 is not a valid compliance record.
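Two of the mitigations in this list reduce to small pure functions: the deduplication check at the queue entry point and the reconciliation diff between published assets and pipeline events. The sketch below assumes the status store can return the latest job status for an asset as a lowercase string (or None); both function names are hypothetical.

```python
from typing import Iterable, List, Optional


def should_enqueue(latest_job_status: Optional[str]) -> bool:
    """Enqueue only if the asset has no job yet or its latest job failed."""
    return latest_job_status is None or latest_job_status == "failed"


def missing_assets(
    published_ids: Iterable[str],
    seen_ids: Iterable[str],
) -> List[str]:
    """Assets the video host reports as published but the pipeline never saw."""
    seen = set(seen_ids)
    return [asset_id for asset_id in published_ids if asset_id not in seen]
```

The deduplication rule deliberately skips assets whose latest job completed, so a content revision has to enter the pipeline through the change-detection path rather than through a repeated publish webhook.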
Frequently asked questions
Do we need a separate infrastructure deployment for the automation pipeline, or can it run inside the LMS?
The pipeline requires components that typically run outside the LMS: a webhook-receiving endpoint, a message queue, and worker processes that run continuously. Most LMS platforms do not support custom webhook consumers or background workers as part of their hosting environment. The deployment pattern that minimises operational burden is serverless functions on a cloud provider (AWS Lambda, GCP Cloud Functions, or Azure Functions) with a managed message queue (SQS, Pub/Sub, or Service Bus). For organisations that cannot deploy to a public cloud, a containerised deployment on an internal Kubernetes cluster is the next best option. The pipeline's compute requirements are low — a single Lambda function with 256MB memory can handle 50+ concurrent caption jobs — so cost is not a significant factor in the deployment choice.
How do we handle videos where the audio quality is too poor for automated ASR?
Poor audio quality — significant background noise, microphone problems, room reverb, or non-native-accented speech far from the ASR training distribution — causes the ASR confidence score to drop below the automated QA gate threshold. The correct handling is to route these jobs to human captioners rather than delivering low-confidence output. Configure your QA gate to trigger a human captioning request when confidence falls below 0.85 (at 0.92 you are already applying human spot-check; 0.85 indicates a more severe accuracy problem). Most captioning vendors, including GlossCap, offer a hybrid service tier where the ASR output is used as a starting point for a human editor rather than delivered directly — this reduces the cost relative to a full-manual transcription while recovering accuracy on difficult audio.
In remote and hybrid workforces, home-office audio is the most common source of quality problems in organisational training video. The audio remediation approaches described in the remote and hybrid workforce post (noise floor removal, normalisation, reverb reduction) can be applied as a pre-processing step before caption job submission and can improve ASR accuracy without requiring human captioner intervention. If you add this step to your pipeline, budget 30–60 seconds of compute time per 10-minute video for audio pre-processing.
Our LMS is not Kaltura, Cornerstone, or Workday. How do we add a new LMS delivery target?
The caption API and polling components are LMS-agnostic — they are the same regardless of target platform. Adding a new LMS delivery target requires writing a delivery function that takes a caption file and an LMS asset ID as inputs and handles the platform-specific authentication, format requirements, and API surface. The LMS caption ingestion engineering guide documents the delivery surfaces for TalentLMS, Docebo, Absorb, and additional video hosts (Panopto, Vimeo, Wistia) that are not covered in depth in this post. Structure your delivery functions as a strategy pattern — each LMS has its own delivery class that implements a common interface — so the pipeline core does not need to change when a new LMS is added.
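A minimal sketch of that strategy pattern follows. All class and function names here are hypothetical, and `InMemoryDelivery` is only a stand-in target; a real delivery class would handle the platform's authentication and call its caption ingestion API inside `deliver`.

```python
from abc import ABC, abstractmethod


class CaptionDelivery(ABC):
    """Common interface that every LMS delivery class implements."""

    @abstractmethod
    def deliver(self, caption_file: bytes, asset_id: str) -> str:
        """Attach the caption file to the asset and return a delivery reference."""


class InMemoryDelivery(CaptionDelivery):
    """Stand-in target for illustration; stores files instead of calling an LMS."""

    def __init__(self):
        self.delivered = {}

    def deliver(self, caption_file: bytes, asset_id: str) -> str:
        self.delivered[asset_id] = caption_file
        return f"memory:{asset_id}"


# Registry mapping a target name to its delivery strategy instance.
DELIVERY_TARGETS = {}


def register_target(name, strategy):
    DELIVERY_TARGETS[name] = strategy


def deliver_caption(target, caption_file, asset_id):
    """Pipeline core: look up the strategy by name and delegate to it."""
    try:
        strategy = DELIVERY_TARGETS[target]
    except KeyError:
        raise ValueError(f"no delivery strategy registered for {target!r}") from None
    return strategy.deliver(caption_file, asset_id)
```

With this shape, supporting a new LMS means writing one new class and one `register_target` call; `deliver_caption` stays unchanged.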
How do we handle multilingual content where a video needs captions in multiple languages?
For multilingual content, submit one caption job per target language. The initial ASR job produces the source-language transcript; subsequent translation jobs take the source transcript as input and produce target-language caption files. The translation step uses a different API endpoint from the initial ASR job. Consult the multilingual caption workflow post for the translation pipeline design, source-lock prerequisites, and per-language LMS delivery considerations. Not all LMS platforms support multiple caption tracks at the same quality level — Kaltura supports multi-track caption delivery natively; TalentLMS supports one caption track per video in its current API surface.
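The one-job-per-language rule can be sketched as a planning function that emits the source-language ASR job first and one translation job per remaining language. The job dictionary shape and the `kind` and `depends_on` fields are hypothetical, not part of any specific caption API.

```python
def plan_caption_jobs(video_id, source_language, target_languages):
    """Return the ordered list of jobs needed to caption one video."""
    jobs = [{
        "kind": "asr",
        "video_id": video_id,
        "language": source_language,
        "depends_on": None,
    }]
    # dict.fromkeys removes duplicate languages while preserving order.
    for language in dict.fromkeys(target_languages):
        if language == source_language:
            continue  # the ASR job already covers the source language
        jobs.append({
            "kind": "translation",
            "video_id": video_id,
            "language": language,
            "depends_on": "asr",  # translation takes the source transcript as input
        })
    return jobs
```

Making the dependency explicit lets the queue worker hold translation jobs until the source transcript exists.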
What is the correct approach when a batch remediation job overruns and competes with the new-content pipeline for vendor capacity?
Implement a circuit breaker: when the new-content pipeline's P50 job completion time exceeds a threshold (e.g., 25 minutes for a 10-minute video — roughly 2× the normal completion time), pause the backlog batch submission until the new-content queue clears. Monitor the P50 completion time as a metric and trigger the circuit breaker automatically rather than relying on a human to notice the slowdown. When the circuit breaker opens, log the pause event with the backlog's current position so the batch controller can resume from the same point without re-submitting already-completed jobs.
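One way to sketch this breaker is shown below. It is an illustration under stated assumptions: the class name, the rolling window size, and the event field names are hypothetical, and the 25-minute default only makes sense if completion times are comparable across videos (for example, normalised to a 10-minute video). Resuming the backlog once P50 drops back to the threshold is also an assumption standing in for "the new-content queue clears".

```python
from collections import deque
from statistics import median


class BacklogCircuitBreaker:
    """Pauses backlog submission while new-content P50 completion time is high."""

    def __init__(self, threshold_minutes=25.0, window=20):
        self.threshold_minutes = threshold_minutes
        self.samples = deque(maxlen=window)  # rolling window of recent completions
        self.is_open = False                 # open means the backlog is paused
        self.events = []                     # stand-in for the compliance event log

    def record_completion(self, minutes, backlog_position):
        """Record one new-content job completion; return True if backlog is paused."""
        self.samples.append(minutes)
        p50 = median(self.samples)
        if not self.is_open and p50 > self.threshold_minutes:
            self.is_open = True
            self.events.append({
                "event": "backlog_paused",
                "p50_minutes": p50,
                "backlog_position": backlog_position,
            })
        elif self.is_open and p50 <= self.threshold_minutes:
            self.is_open = False
            self.events.append({
                "event": "backlog_resumed",
                "p50_minutes": p50,
                "backlog_position": backlog_position,
            })
        return self.is_open
```

The batch controller would check the return value before each backlog submission and, on resume, restart from the `backlog_position` recorded in the pause event.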
Does the automation pipeline eliminate the need for the accessibility coordinator role?
No. It changes the role's focus rather than eliminating it. Before automation, the accessibility coordinator's time is dominated by manual coordination: submitting videos, downloading files, uploading to the LMS, tracking status in a spreadsheet. After automation, those tasks run without coordinator involvement. The coordinator's time shifts to higher-value activities: reviewing human-review-flagged jobs, maintaining the glossary mapping table, conducting the random sample audit, interpreting the compliance KPI dashboard, and managing exception requests. See the accessibility coordinator playbook for a detailed RACI matrix that maps the pre-automation and post-automation versions of the role.
How should we version the caption files we deliver to the LMS, so we can track which glossary version produced each file?
Include the glossary version in the caption asset's metadata fields wherever the LMS supports it. For Kaltura caption assets, the label field (displayed in the player's caption track selector) can include the glossary version slug: "English (eng-v24, 2026-06)" rather than just "English." The compliance event log should always record the glossary_id and the glossary version at the time of job creation. When glossary terms are updated, the version increments — subsequent caption jobs use the new glossary version, and the event log records the transition. This allows the accessibility coordinator to identify which videos in the library were captioned under an older glossary version and may benefit from re-captioning, especially if the updated glossary corrects terms that were known to have high failure rates under the prior version.
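The labelling and lookup described above might look like the following sketch. The function names and the event dictionary shape are hypothetical; it assumes glossary versions are stored as integers and that the event list is in chronological order, so the last event for a video reflects its current caption file.

```python
def caption_label(language_name, glossary_slug, period):
    """Build a caption track label that carries the glossary version slug."""
    return f"{language_name} ({glossary_slug}, {period})"


def videos_needing_recaption(events, glossary_id, current_version):
    """Return video IDs whose latest caption job used an older glossary version."""
    latest = {}
    for event in events:  # assumed chronological: later events overwrite earlier
        if event["glossary_id"] == glossary_id:
            latest[event["video_id"]] = event["glossary_version"]
    return sorted(
        video_id
        for video_id, version in latest.items()
        if version < current_version
    )
```

For example, `caption_label("English", "eng-v24", "2026-06")` produces the label format shown above, and the second function gives the coordinator a re-captioning candidate list straight from the event log.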
Automate caption delivery with GlossCap
GlossCap provides the caption API, glossary parameter, and structured job response that this pipeline requires. The REST API supports webhook-driven job submission, exponential-backoff polling, format selection (SRT and VTT), confidence score metadata on output, and per-glossary accuracy compounding. Connect GlossCap to your Kaltura, Panopto, Vimeo, or Wistia webhook and eliminate the manual upload step from your caption workflow.