This essay is part of a larger research effort into the
original analog Line-21/608 captioning systems.
Text and original artwork: Copyright © 2019 Art & Technic LLC
Closed Captioning is the original user-facing metadata of broadcast television.
It was designed – specifically and deliberately – to open the audio-visual medium of television up to the deaf. With what would eponymously become known as “the decoder,” television slowly became accessible to the deaf, the hard-of-hearing, and in time (after being mandated by law), the general public. The end product of decades of advocacy was simple, and yet profound: everyone could understand the meaning of television’s images in almost any environment.
In the 21st century, the wide availability of captioning is taken for granted, as though it had always been so. In this, as in most things, complacency is the enemy of progress. We should remember that the path to the convenient, (seemingly) accessible present we live in was, in fact, a hard-fought battle… one that literally took decades.
In that light, it’s an easily made point: those charged with preserving and making available the audio-visual materials of the past should also make the effort to preserve – and make available – the program-related captioning that originally accompanied them.
In well-equipped post-production environments, there are a variety of means – inclusive of specialized hardware – which allow one to extract, encapsulate, and otherwise integrate the original caption data with the content. For those institutions and individuals who are preserving our audio-visual history on their own, outside the scope of those post-production facilities, budgets – and options – tend to be far more restrictive.
In that milieu, captioning all too frequently seems to be an overlooked stepchild. This is despite the fact that, in the captioning scheme of things, a small amount of effort can go a long way toward preserving the accessibility of one’s collections. So, if budgets are truly an obstacle to maintaining that accessibility, could the root difficulty simply be a matter of tooling? After all, while the extraction of analog Line-21/608 caption data from digitized video is conceptually simple, in practice it can often be considerably more involved.
This is particularly true with respect to open-source solutions. In our research, we found few options that could be described as either “fit for purpose” or “ready to run.” In fact, both of the open-source packages that we’re aware of can be readily described as “problematic,” each with its own unique set of complications.
The first of the two solutions that we discovered is, in fact, quite noteworthy: it’s the only open-source program fully dedicated to performing Line-21/608 caption data extraction from analog materials. While this unnamed (for now) program does “work,” it is hardly fit for use without significant end-user effort. A small list of out-of-the-box issues:
- The program’s source code isn’t wholly portable, which in practical terms means that it won’t run on the Macintosh platform without modifications to the source code. This means that, for all intents and purposes, you have to have some type of programming background to even hope to make use of the software.
- Several basic aspects / functions of the program are either implemented incorrectly or are implemented in non-functional form.
- It is not designed for piped output in the normal sense (requiring code modifications to do so).
- As a result of design decisions, the program has difficulty reading captioned material broadcast by the ABC television network between the years 1980 and 1992. (This is a grievous failure, given that ABC was an early champion of closed captioning.)
- In default operation, it doesn’t separate field one and field two data, which is highly problematic.
FFmpeg, the well-known hydra of the open-source world, is the second open-source program that performs Line-21/608 caption data extraction. In FFmpeg’s case, its support for Line-21/608 extraction comes entirely by way of the readeia608 filter. While this solution is more likely to run after downloading, as compared to the previous program, it also has a set of provisos:
- The author[s] appear[s] to be unfamiliar with the CEA/EIA 608 specification, and how decoding is supposed to be carried out.
- The overall design of the decoding solution used in readeia608 is simultaneously novel and... “interesting.” It appears to be based on a series of unstated assumptions about the structure of the signal that they expect as input, instead of actually parsing the signal, per se. That the code is based on these assumptions is semi-obfuscated, but clearly seen once one examines the role of the explicitly defined constants, as well as the defaults for several variables... which are effectively constants.
- As a result of those design decisions, the filter may:
- Fail to decode or partially decode (distressed) Line-21 data
- Fail to decode Line-21 materials with a non-standard “clock”
- Readily produce incorrectly decoded 608 data.
- The output format of the data – a dyad of hexadecimal values – is significantly less than ideal.
From these two programs, we can only conclude – sadly – that with respect to analog Line-21/608, the available open-source software packages fall into the same trap: both are firmly in the category of “not-quite-working,” and therefore not quite fit for either the reliable extraction or preservation of Line-21/608 data. The underlying cause in both cases is seemingly less a matter of technical acumen than one of conceptual understanding. This knowledge gap, as we’ve noted, is readily apparent from the code of both projects.
Does this knowledge gap arise from open-source’s traditional reluctance to pay for the widely available standards documents that would provide the knowledge on which all future correct implementations could be based? (A reluctance reflected in the open-source encyclopedia’s dumpster fire of an entry on CEA/EIA-608.) Or, in this particular case, does it arise from the murky technical history of closed captioning, in which widely cited, fundamental historical documents concerning the system seem to have vanished without a trace?
The true answer is likely ‘a little from Column A, and a little from Column B.’ Line-21/608 caption decoding is a matter of ‘art,’ in the ancient sense of the word. When the craftsperson is unfamiliar with the art, the work product reflects that.
As such, our purpose here is to lay the framework for correct future implementations… by way of education. Having implemented several Line-21/608 data extraction routines in software, we not only know it can be done, but that it can be done better. We hope that this article will aid others in the correct implementation of their own extraction routines, so that they may become the standard… and not the exception.
- The Line-21 Waveform: Background and Theory
- Data in the Line-21 Closed Captioning System
- Working with the Digitized Line-21 Waveform in Practice
- Data Extraction
The Line-21 Waveform: Background and Theory
The Line-21 waveform did not spring forth, fully formed, from the foreheads of either PBS or NCI in 1980. The Line-21 waveform, as we know it, was in fact the third iteration of a signal originally developed by the National Bureau of Standards in the late 1960s... for the dissemination of highly accurate reference time and frequency signals on commercial television broadcasts. It’s important to understand this, because the basic principles of decoding the Line-21 closed captioning signal rest with both the original design and purpose of that signal.
The original version of the Line-21 signal, as presented to the FCC in 1973, looked like this:
As illustrated above, this signal was composed of two major components:
- An analog 1 MHz frequency reference
- A 26-bit digital data transmission
The 1 MHz frequency reference portion of the signal was, literally, that: a reference 1 MHz signal generated by an NBS atomic clock, located at the point of origination, on the premises of the broadcaster. That 1 MHz frequency reference, however, did more than provide the entire nation with a federally traceable frequency standard: it served as the “clock” for the digital data portion of the transmission.
In a “decoder” of that era, the 1 MHz frequency burst was used to excite a tank circuit. Once excited, the tank circuit’s output would then be used as a clock generator to decode the digital data portion of the signal, which was encoded using Non-Return-to-Zero (NRZ) modulation.
Important points to remember about the 1 MHz frequency reference:
- The 1 MHz burst was effectively the “clock” for the digital data portion of the signal.
- Without the “clock,” it would be 'impossible' to decode the digital data.
- As a result, the signal is, in a sense, self-describing: everything you need to correctly decode the signal is carried within the signal itself.
- Since the 1 MHz frequency reference was generated by an atomic clock, its phase was independent of – and therefore not coherent with – the video signal.
For a variety of reasons, the National Bureau of Standards’ Line-21 system never gained approval from the FCC. However, PBS – who picked up the technology from NBS – did receive a temporary authorization from the FCC to use a modified version of the NBS system. That “interim” system was then used to develop the version of the closed captioning system which debuted in 1980.
That final version of the Line-21 signal, as presented to the FCC in 1976, looked like this:
Schematically, this incarnation is extremely similar to the NBS version of the Line-21 signal. The primary differences that we can readily see are that:
- The 1 MHz frequency reference has been replaced by a 503 kHz “clock run-in” burst
- The length of the “clock” has decreased
- The length of the digital data portion has increased
- The number of bits in the digital data transmission has been reduced to 16
Although it may be difficult to discern at first glance, these changes are fundamentally interrelated.
How so? Well, let’s think back to the original version of the Line-21 signal: it was comprised of two discrete components. The first – a 1 MHz frequency reference, which was taken from an external atomic clock – was required to decode the second portion, the 26-bits of digital data. Since NBS designed the signal to convey both highly accurate, standards-grade time and frequency reference signals, it made sense to tie both of these elements together in that way.
PBS, however, did not have the dissemination of reference-grade time and frequency signals in their mandate. What was in PBS’ mandate was public service – and PBS felt that making television accessible to the Deaf community was an important part of their nascent network’s mission. As such, their version of the Line-21 signal needed to be centered around the television signal it would be ‘piggybacking’ on.
That change in “clock” frequency from 1 MHz to 503 kHz, therefore, reflected the change in the sponsoring agency’s mission: the new frequency is equivalent to analog NTSC’s horizontal line frequency, multiplied by 32. By basing the “clock” on the horizontal line frequency, the “clock” – and the remainder of the Line-21 signal, whose timing is based upon that clock – are consistently phase coherent with the analog video signal on which they are transmitted. In other words, PBS’ Line-21 signal was designed to be a well-tempered video signal that wouldn’t wobble horizontally, unlike the NBS signal.
Another consequence of reducing the frequency of the clock is that the clock’s ‘interval’ increases. Put differently: in PBS’ version of the Line-21 signal, a ‘bit’ is nearly twice as wide as a ‘bit’ in the original NBS system. This, in turn, explains why there’s less real estate available for the clock, and more dedicated to the data: the 16 bits of digital data transmitted in the PBS system require more room... because the individual bits are ‘wider.’
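The arithmetic behind these relationships is easy to verify. The following sketch (purely illustrative; the variable names are ours) derives the run-in frequency from the NTSC line rate and compares the resulting bit width against the NBS system’s:

```python
# Illustration of the clock/bit-width arithmetic described above.

# NTSC horizontal line frequency: 4.5 MHz / 286, about 15,734.27 Hz
f_h = 4_500_000 / 286

# PBS clock run-in: 32 times the line frequency, about 503.5 kHz
f_clock = 32 * f_h

# Duration of one data bit (one clock period), in microseconds
pbs_bit_width_us = 1e6 / f_clock

# NBS bit width, at the original 1 MHz clock, for comparison
nbs_bit_width_us = 1e6 / 1_000_000

print(f"clock run-in:  {f_clock:,.0f} Hz")          # ~503,497 Hz
print(f"PBS bit width: {pbs_bit_width_us:.3f} us")  # ~1.986 us
print(f"NBS bit width: {nbs_bit_width_us:.3f} us")  # 1.000 us
```

As the numbers show, a PBS-era ‘bit’ occupies roughly twice the time of an NBS-era ‘bit’ – which is exactly the trade described above.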
Data in the Line-21 Closed Captioning System
The doubling in size of each individual ‘bit,’ and the reduction in the number of bits transmitted per frame, were both purposeful design choices: PBS made them to ensure that captioning data could be successfully decoded even when reception of the channel carrying it was less than ideal. Robust recovery from marginal signals was, in other words, a design goal of the closed captioning system from the start.
(This is a critical point, which we cannot emphasize enough: this system was designed from the bottom-up to work in a particular way to reach those design goals. From the structure of the PBS Line-21 signal, to the code structure of the encapsulated data, to the methodology of data extraction, and to how a ‘decoder’ displays that extracted data on a screen, the design decisions are all part of a coherent system that takes into account the medium on which it was conveyed. Correctly handling Line-21/608 data is, therefore, a matter of correctly emulating the operation of a physical decoder.)
The fundamental way in which PBS’ Line-21 signal differs from the NBS system is that the data ‘bit’ – whose size is derived from the clock – structures the entire layout of the Line-21 signal. While the NBS Line-21 system specified a gap between the clock and the data, in the PBS system, the data portion of the signal begins at the end of the ‘clock.’ The physical position of every subsequent bit, therefore, is explicitly defined in terms of how many ‘bits’ they are from that starting point. The following diagram – from an Evertz equipment manual – illustrates this:
As we can see, starting immediately at the conclusion of the ‘clock’ are three start bits, which are then followed by the payload of 16 data bits. The payload is structured as a serial data transmission: a 7-bit ASCII character, and corresponding parity bit, followed by another 7-bit ASCII character, and its corresponding parity bit. The byte order of the transmission is in keeping with the ASCII standard for serial transmission: Least-Significant-Bit first.
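The serial framing described above can be sketched in code. In this illustration (the function name is ours, not from any specification), one transmitted character is assembled from eight bits received in transmission order – seven ASCII data bits, least-significant-bit first, followed by the parity bit:

```python
def assemble_character(bits):
    """Assemble one transmitted character from 8 bits received in
    transmission order: seven ASCII data bits (LSB first), then the
    parity bit. Returns (ascii_value, parity_bit)."""
    if len(bits) != 8:
        raise ValueError("expected 8 bits per character")
    value = 0
    for position, bit in enumerate(bits):
        value |= (bit & 1) << position  # LSB-first: earliest bit is bit 0
    return value & 0x7F, (value >> 7) & 1

# 'H' (0x48) in transmission order, followed by its odd-parity bit.
# 0x48 has two set bits, so the parity bit must be 1.
print(assemble_character([0, 0, 0, 1, 0, 0, 1, 1]))  # (72, 1)
```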
How exactly does a Line-21 decoder determine the point at which the clock signal ends and the data portion begins? It’s not entirely obvious from the documents these diagrams were taken from. In fact, it takes three pieces of information to discern:
- As denoted in both NBS and PBS Line-21 signal diagrams, the “center” of the waveform is at 25 IRE (50% maximum amplitude of the signal).
- From note 5 on PBS’ Line-21 signal diagram, “Negative going zero crossings of clock are coherent with data transitions.”
- And finally, from an early revision of EIA/CEA-608: “All interval measurements are made from the midpoints (half amplitude) on all edges.”
Taken together, we can conclude that a decoder “knows” that it has reached the transition point between the ‘clock’ and data portions of the signal when:
- It has passed the seven peaks of the 503 kHz “clock run-in.”
- The amplitude of the last downward (negative) going portion of the clock signal is equivalent to 25 IRE (50% maximum amplitude of the signal).
If we were to look for that point on the above diagram, it would be at the exact intersection of the 25 IRE level, and the rightmost, downward edge of the clock signal.
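In software, the same rule can be approximated by scanning the digitized line for negative-going crossings of the half-amplitude level. The sketch below is ours, not a reference implementation: it assumes the sampled window begins at a reasonably clean run-in, represented as a list of luma samples, and that the half-amplitude level is already known. A production decoder would also verify the run-in’s frequency and peak count more rigorously.

```python
import math

def find_clock_data_boundary(samples, half_amplitude):
    """Locate the clock/data transition in a digitized Line-21 line.

    Scans for negative-going crossings of the half-amplitude
    (25 IRE) level. Since the run-in precedes the data, the seventh
    such crossing -- the last edge of the seven-peak 503 kHz
    run-in -- marks the start of the data portion.
    """
    crossings = [
        i
        for i in range(1, len(samples))
        if samples[i - 1] >= half_amplitude > samples[i]
    ]
    if len(crossings) < 7:
        raise ValueError("clock run-in not found")
    return crossings[6]

# Synthetic run-in: seven sine cycles (100 samples each, centered on
# the half-amplitude level), followed by a flat region.
line = [50 + 50 * math.sin(2 * math.pi * i / 100) for i in range(700)]
line += [0.0] * 100
print(find_clock_data_boundary(line, 50.0))  # index of the 7th crossing
```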
Working with the Digitized Line-21 Waveform in Practice
To extract closed caption data as part of a post-capture workflow, the source analog video needs to be digitized at a resolution of 720×486. Digitizing at the lower vertical resolution of 480 lines will likely crop out the Line-21 captioning data during capture. Which, obviously, is a non-starter.
While the 720x486 resolution is sufficient for reproducing standard definition imagery, it’s somewhat less than ideal when processing Line-21 closed captioning signals. Practically speaking, this doesn’t mean that digitization precludes post-capture data extraction. Rather, it requires that one be mindful of the limitations of the ‘sampling rate’ when designing an extraction algorithm.
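To put numbers on that limitation: 720-sample active lines correspond to Rec. 601’s 13.5 MHz luma sampling rate, which leaves only about 27 samples per data bit – and, notably, not an integral number of them:

```python
# 720 active samples per line corresponds to Rec. 601's 13.5 MHz
# luma sampling rate.
sample_rate_hz = 13_500_000

# Clock run-in frequency: 32 x the NTSC line rate (4.5 MHz / 286)
clock_hz = 32 * 4_500_000 / 286

# Samples available per data bit -- note that it is not an integer
samples_per_bit = sample_rate_hz / clock_hz
print(f"{samples_per_bit:.4f} samples per bit")  # 26.8125
```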
A Line-21 data extraction algorithm should, in normal operation, require only one input parameter: the video line in the source file from which it should attempt to extract data. In this sense, the algorithm should (essentially) emulate the data extraction function of a physical closed-captioning ‘decoder.’ If a stand-alone piece of hardware can perform this function without a series of parameters from the user, then your algorithm shouldn’t require them either.
Only in more involved data recovery operations – such as that from marginal, distorted, or otherwise damaged sources – should a data extraction algorithm require operational parameters from the user. For those circumstances, the basic set of optional parameters should be:
Other items that designers / programmers of decoding algorithms should keep in mind when creating their recovery solutions for use with digitized materials –
- The Clock Must Be “Solved” for Each and Every Line: Although we’re going to delve into the why behind this in the following section, it bears mentioning here, since it is singularly important. While solutions that approach the problem in alternative ways may “work,” they often do not work consistently, or consistently well.
- The Clock ‘Interval’ is NOT an Integer: As noted elsewhere, the value of the clock ‘interval’ is a floating-point number. Your pixel samples, however, are an array of discrete (rectangular) samples... which may or may not be wholly coincident with the sampled Line-21 signal. You need to account for this.
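A minimal sketch of that bookkeeping (the function name and parameters are ours): keep every position as a float, and round only at the last moment, when converting to concrete sample indices:

```python
def bit_zone_bounds(data_start, samples_per_bit, bit_index):
    """Return (start, end) sample indices of one "bit zone".

    data_start      -- fractional sample position where the data
                       portion begins (the end of the clock run-in)
    samples_per_bit -- the clock 'interval' in samples (a float!)
    bit_index       -- zero-based position of the bit in the signal

    Positions are kept as floats throughout, and rounded only when
    converting to concrete sample indices, so that rounding error
    does not accumulate across the later bits.
    """
    start = data_start + bit_index * samples_per_bit
    return round(start), round(start + samples_per_bit)

# With ~26.8 samples per bit, zone boundaries drift against the
# pixel grid; integer-interval math would misplace the later bits.
print(bit_zone_bounds(100.0, 26.8125, 0))   # (100, 127)
print(bit_zone_bounds(100.0, 26.8125, 15))  # (502, 529)
```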
As a process, data extraction is fairly straightforward. On a per line basis, one must:
- Solve for the clock
- Determine the location of the “bit zones”
- Perform NRZ decoding of each “bit zone”
- Return the parity checked value of each transmitted character
Since the clock is the critical element in discerning the location of each subsequent bit in the signal, you must determine a solution for the clock on every line. With the analog Line-21 signal, one must not take for granted that all elements of the signal will consistently appear at the same position on the line, every time. Frequently, as a result of normal variation, time-base errors, reception issues, and even variations between elements in the original post-production process, consistent timing (read: horizontal positioning) of the Line-21 signal is not guaranteed.
Take for instance the educational television series, The Voyage of the Mimi. Each episode in the series was comprised of two parts: part “A,” which was a dramatic, fictionalized adventure, and part “B,” an educational ‘adventure’ that delved into some aspect of the science seen in the first half of the show. These two parts were each edited and captioned independently, and once finished, they were edited together onto a single master videotape.
The relevance of the show’s post-production process to caption data extraction is that at the join between part A and B, there is an observable change in the positioning (timing) of the Line-21 signal. We can see this readily in the waterfall / ‘overhead view’ of the Line-21 signal from episode seven:
While this timing change wouldn’t trip up a hardware decoder, for a software decoder that doesn’t appropriately solve for the clock… it certainly would.
Regardless, once you’ve solved for the ‘clock,’ you can then proceed to determine the location of the sixteen “bit zones.” A “bit zone,” strictly speaking, is the portion of the Line-21 real-estate that corresponds to a specific bit of data. The following diagram illustrates the location of the sixteen “bit zones” for a sample Line-21 signal:
To discern the value of a single bit, one must sum the values of all pixels in that “bit zone,” and then divide that sum by the number of samples, to determine the average pixel value. If the calculated average is nominally greater than or equal to 50% of the maximum amplitude of the clock, the value of the bit is 1. Otherwise, the value of the bit is 0.
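As a sketch (the function name and argument handling are ours, not from the specification), the averaging decision looks like this:

```python
def decode_bit(samples, start, end, threshold):
    """Decide the value of one bit from its "bit zone".

    Averages every sample in the zone and compares the result to the
    threshold (nominally 50% of the clock's maximum amplitude).
    Averaging the whole zone means an isolated noise hit is unlikely
    to flip the decoded bit.
    """
    zone = samples[start:end]
    average = sum(zone) / len(zone)
    return 1 if average >= threshold else 0

# A mostly-high zone with one noise dropout still decodes as a 1.
noisy_zone = [90, 88, 12, 91, 89, 90, 87, 92]
print(decode_bit(noisy_zone, 0, len(noisy_zone), 50.0))  # 1
```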
Discerning the value of a bit in a “bit zone” in this way is an intentional part of the design of the Line-21 / 608 system: it aids data recovery in difficult reception environments (noise, multipath, etc.), making the system more robust.
To decode each character, we must first discern the value of all of its bits – inclusive of the parity bit. We must then check the extracted character’s value against odd parity. If the character passes the parity check, we can return its value as-is. If it does not, the decoded value is invalid, and we must instead return the value which signals this condition to a Line-21/608 decoder.
While it may be “easy” to ignore the parity bit, doing so is not inconsequential: it produces a non-compliant data stream. A non-compliant data stream, in turn, will likely result in unexpected behavior from compliant Line-21/608 decoders. So, to be compliant, one MUST check parity, and one MUST return appropriate decoded values. Checking parity is NOT optional – it is mandatory by default.
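A minimal odd-parity check is straightforward. (Note that the specific substitute value a compliant decoder must emit when the check fails is defined by the specification, and is not reproduced in this sketch.)

```python
def check_odd_parity(byte):
    """Return True if the 8-bit value (seven data bits plus the
    parity bit) contains an odd number of set bits, as Line-21/608's
    odd parity requires."""
    return bin(byte & 0xFF).count("1") % 2 == 1

# 'H' = 0x48 has two set bits; with its parity bit set (0xC8), the
# total count of ones is odd, so the transmitted byte is valid.
print(check_odd_parity(0xC8))  # True
print(check_odd_parity(0x48))  # False
```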
In this essay, we delved into the history and theory of the analog Line-21/608 signal to illustrate the concepts underlying the operation of the system, and how data is to be correctly decoded for that system. This should allow any programmer conversant in the ‘art’ to generate compliant decoding solutions for caption data originating from analog sources.
We sincerely hope that this will be of help to those organizations, institutions, and individuals charged with preservation of the audio-visual materials of the past, and will help make these recordings – inclusive of the captioning data locked within them – more easily accessible to future generations.
(Slight revisions: December 2019)
 We were unable to ascertain if the Line-21/608 extraction aspect works at all, as the filter wasn’t able to decode data from a test sample (an aircheck from the ABC television network in the mid-1980s), using the provided sample command in the documentation. Therefore, all conclusions in this section are based upon a close reading of the source code.