Captioning

The Americans with Disabilities Act (ADA), as amended, offers guidance on providing effective communication for individuals with hearing, vision, and speech disabilities. The selected communication tool(s) should give the individual the same information that is available to an individual who does not require accommodation.

HUIT and UDR offer guidance to schools and departments looking to add captioning to audio media resources. Vendors that may be useful in providing captioning are identified on our Vendors & Service Providers page.

Captioning is the process of converting the audio content of a video, webinar, live event, or other production into text and displaying the text on a screen or monitor. Quality captions not only display words as the textual equivalent of spoken dialogue or narration, but also include speaker identification, sound effects, and music description. Captioning is critical for individuals who are deaf or hard of hearing and can also enhance the experience and understanding of many other viewers. (National Association of the Deaf, 2020)

Transcription vs. Captioning

Transcription and captioning are separate processes and products. While transcription forms the basis for captioning, they each have different uses.

Transcription is the process of converting audio into a written text document with no timing information attached, whereas captioning divides transcript text into time-coded chunks, known as "caption frames," and synchronizes each frame with the audio. Captions typically appear at the bottom of the screen and let viewers follow the audio and video, the captions, or both.

Transcription has many benefits; however, because a transcript is not time-coded, it does not allow individuals who are deaf or hard of hearing to follow the audio content in real time. Captions are the best practice for providing equal access.
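To make the difference between a transcript and caption frames concrete, the sketch below serializes a few time-coded frames into WebVTT, one common caption file format used by web video players. It is an illustration only, assuming Python 3.9+; the `CaptionFrame` class and the sample text are hypothetical, and real captioning vendors and platforms produce these files with their own tools. The `<v Narrator>` voice tag is WebVTT's way of marking a speaker, and the bracketed sound description is a common convention for non-speech information.

```python
from dataclasses import dataclass


@dataclass
class CaptionFrame:
    """One time-coded chunk of transcript text (a hypothetical helper type)."""
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(frames: list[CaptionFrame]) -> str:
    """Serialize caption frames as a WebVTT document.

    A plain transcript would be only the joined text; the timing line on
    each cue is what lets a player synchronize the frame with the audio.
    """
    lines = ["WEBVTT", ""]
    for frame in frames:
        if frame.end <= frame.start:
            raise ValueError(f"frame ends before it starts: {frame}")
        lines.append(f"{format_timestamp(frame.start)} --> {format_timestamp(frame.end)}")
        lines.append(frame.text)
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)


if __name__ == "__main__":
    frames = [
        CaptionFrame(0.0, 2.5, "[upbeat music]"),
        CaptionFrame(2.5, 6.0, "<v Narrator>Welcome to the accessibility workshop."),
    ]
    print(to_webvtt(frames))
```

Dropping the timing lines from this output would leave only a transcript: the same words, but nothing a player could use to show them in step with the audio.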

Types of Captions

Closed-Captioning: Displays text, typically at the bottom of the screen, that is visible only when the viewer turns it on.

Open-Captioning (aka "burned-in", "baked on", "hard-coded", or "hard" captions): Displays text, typically at the bottom of the screen, that is always visible to all viewers and cannot be turned off.

Subtitles: Subtitles do not include the non-speech elements of the audio (such as sound effects or speaker identification). Subtitles are not considered a form of effective communication for individuals who are deaf or hard of hearing.

How Captions are Generated

Automatic Speech Recognition (ASR): ASR uses computer-based techniques to identify the words a person speaks into a microphone, analyze them with an algorithm, and produce a text output.

Communication Access Realtime Translation (CART): CART is the instant translation of the spoken word into text by a trained human captioner using a stenotype machine, a computer, and realtime software. The text appears on a computer monitor or other display. CART services can be provided on-site or remotely via a web conferencing tool.

ASR-CART Hybrid: A process in which audio is first run through ASR technology to produce a rough-draft caption, which a human editor then corrects.

Live Captioning vs. Post-Production Captioning

Live captioning is provided in real time during a live broadcast or event. Post-production captioning is added to a finished video so that captions are available during playback.

Quality Captions

Best practice dictates that quality captions for effective communication be delivered at an accuracy rate of 99% or higher with regard to synchronicity, completeness, and placement. Captions should be (1) synchronized, appearing at approximately the same time as the corresponding audio; (2) equivalent in content to the audio, including speaker identification and sound effects; and (3) accessible and readily available to those who need or want them.

The Described and Captioned Media Program (DCMP), funded by the United States Department of Education, provides guidelines for quality captions that are consistent with the caption quality standards the Federal Communications Commission (FCC) adopted in 2014 for closed captioning of video programming. These guidelines list the following elements of quality captions:

Accurate: Errorless captions are the goal for each production.

Consistent: Uniformity in style and presentation of all captioning features is crucial for viewer understanding.

Clear: A complete textual representation of the audio, including speaker identification and non-speech information, provides clarity.

Readable: Captions are displayed with enough time to be read completely, are in synchronization with the audio, and are not obscured by (nor do they obscure) the visual content.

Equal: Equal access requires that the meaning and intention of the material are completely preserved.
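The "Readable" element can be partly checked by machine: a frame that holds many words but stays on screen only briefly may not give viewers enough time to read it. The sketch below flags such frames by computing the words-per-minute rate each frame demands. It is a simplified illustration, not a DCMP or FCC procedure; the `CaptionFrame` class is hypothetical, and `MAX_WORDS_PER_MINUTE` is a placeholder assumption to be replaced with the presentation rate from whatever guideline your program follows.

```python
from dataclasses import dataclass


@dataclass
class CaptionFrame:
    """One time-coded chunk of caption text (a hypothetical helper type)."""
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str


# Placeholder threshold for illustration only (an assumption, not a
# published standard); substitute the rate your caption guideline specifies.
MAX_WORDS_PER_MINUTE = 160


def words_per_minute(frame: CaptionFrame) -> float:
    """Reading rate a viewer needs to finish this frame while it is displayed."""
    duration = frame.end - frame.start
    if duration <= 0:
        return float("inf")  # a zero-length frame cannot be read at all
    return len(frame.text.split()) / duration * 60


def too_fast(frames: list[CaptionFrame],
             limit: float = MAX_WORDS_PER_MINUTE) -> list[int]:
    """Return the indexes of frames whose required reading rate exceeds the limit."""
    return [i for i, f in enumerate(frames) if words_per_minute(f) > limit]


if __name__ == "__main__":
    frames = [
        CaptionFrame(0.0, 2.0, "Welcome, everyone."),
        CaptionFrame(2.0, 3.0, "Today we will cover six separate captioning topics quickly."),
    ]
    for i in too_fast(frames):
        print(f"frame {i} needs {words_per_minute(frames[i]):.0f} wpm")
```

A check like this covers only display time; accuracy, speaker identification, and placement still require human review.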

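The Web Content Accessibility Guidelines (WCAG) 2.2 also address captioning; conformance at Level AA includes Success Criteria 1.2.2 Captions (Prerecorded) and 1.2.4 Captions (Live).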

Note that, while auto-generated captions that use Automatic Speech Recognition (ASR) are improving, they remain less accurate than captions produced by live human captioners and are not an effective option in many situations. For that reason, a best practice for important meetings and events is to provide live captioning such as CART.

Using a Captioning Vendor

Numerous captioning vendors are available. More information on exploring and hiring service providers can be found on the Digital Accessibility website.

For additional information, please visit "Providing Live Captions for Events" on the HUIT Digital Accessibility Services website.