Thursday, November 7, 2019

Decoding Analog Line-21/608: A Primer

This essay is part of a larger research effort into the
original analog, Line-21/608 captioning systems.
Text and original artwork: Copyright © 2019 Art & Technic LLC

Introduction

Closed Captioning is the original user-facing metadata of broadcast television.

It was designed – specifically and deliberately – to open the audio-visual medium of television up to the deaf. With what would eponymously become known as “the decoder,” television slowly began to become accessible to the deaf, the hard-of-hearing, and in time (after being mandated by law), the general public. The end product of decades of advocacy was simple, and yet profound: everyone could understand the meaning of television’s images in almost any environment.

In the 21st century, the wide availability of captioning is simply taken for granted. In this, as in most things, complacency is the enemy of progress. We should remember that the path to the convenient, (seemingly) accessible present that we live in was, in fact, a hard-fought battle… one that literally took decades.

In that light, it’s an easily made point that those who are charged with both preserving and making available the audio-visual materials of the past should also make the effort to preserve – and make available – the program-related captioning that originally accompanied them.

In well-equipped post-production environments, there are a variety of means – inclusive of specialized hardware – which allow one to extract, encapsulate, and otherwise integrate the original caption data with the content. For those institutions and individuals who are preserving our audio-visual history on their own, outside the scope of those post-production facilities, budgets – and options – tend to be far more restrictive.

In that milieu, captioning all too frequently seems to be an overlooked stepchild. This is despite the fact that, in the captioning scheme of things, a small amount of effort can go a long way toward preserving the accessibility of one’s collections. So, if budgets are truly an obstacle to maintaining that accessibility, could the root difficulty simply be a matter of tooling? After all, while the extraction of analog Line-21/608 caption data from digitized video is conceptually simple, in practice it can often be considerably more involved.

This is particularly true with respect to open-source solutions. In our research, we discovered that there isn’t a bevy of options that could be described as either “fit for purpose” or “ready to run.” In fact, both of the open-source packages that we’re aware of can be readily described as “problematic,” each with its own unique set of complications.

The first of the two solutions that we discovered is, in fact, quite noteworthy: it’s the only open-source program fully dedicated to performing Line-21/608 caption data extraction from analog materials. While this unnamed (for now) program does “work,” it is hardly fit for use without significant end-user effort. A small list of out-of-the-box issues:

  1. The program’s source code isn’t wholly portable, which in practical terms means that it won’t run on the Macintosh platform without modifications to the source code. This means that, for all intents and purposes, you have to have some type of programming background to even hope to make use of the software.
  2. Several basic aspects / functions of the program are either implemented incorrectly or are implemented in non-functional form.
  3. It is not designed for piped output in the normal sense (requiring code modifications to do so).
  4. As a result of design decisions, the program has difficulty reading captioned material broadcast by the ABC television network between the years 1980 and 1992. (This is a grievous failure, given that ABC was an early champion of closed captioning.)
  5. In default operation, it doesn’t separate field one and field two data, which is highly problematic.

FFmpeg, the well-known hydra of the open-source world, is the second open-source program that performs Line-21/608 caption data extraction. In FFmpeg’s case, its support for Line-21/608 extraction comes entirely by way of the readeia608 filter. While this solution is more likely to run after downloading, as compared to the previous program, it also has a set of provisos[1]:

  1. The author[s] appear[s] to be unfamiliar with the CEA/EIA 608 specification, and how decoding is supposed to be carried out.
  2. The overall design of the decoding solution used in readeia608 is simultaneously novel and... “interesting.” It appears to be based on a series of unstated assumptions about the structure of the signal that they expect as input, instead of actually parsing the signal, per se. That the code is based on these assumptions is semi-obfuscated, but clearly seen once one examines the role of the explicitly defined constants, as well as the defaults for several variables... which are effectively constants.[2]
  3. As a result of those design decisions, the filter may:
    1. Fail to decode or partially decode (distressed) Line-21 data
    2. Fail to decode Line-21 materials with a non-standard “clock”[3]
    3. Readily produce incorrectly decoded 608 data.
  4. The output format of the data – a dyad of hexadecimal values – is significantly less than ideal.[4]

From these two programs, we can only conclude – sadly – that with respect to analog Line-21/608, the available open-source software packages fall into the same trap: both are firmly in the category of “not-quite-working,” and therefore not quite fit for either the reliable extraction or preservation of Line-21/608 data. The underlying cause in both cases is seemingly less a matter of technical acumen than one of conceptual understanding. This knowledge gap, as we’ve noted, is self-evident from the code of both projects.

Does this knowledge gap arise from open-source’s traditional reluctance to pay for widely available standards documents – the very documents that would serve as the basis for all future correct implementations?[5] A reluctance reflected in the open-source encyclopedia’s dumpster fire of an entry on CEA/EIA-608?[6] Or, in this particular case, does it arise from the murky technical history of closed captioning, in which widely cited, fundamental historical documents concerning the system seem to have vanished without a trace?[7]

The true answer is likely ‘a little from Column A, and a little from Column B.’ Line-21/608 caption decoding is a matter of ‘art,’ in the ancient sense of the word. When the craftsperson is unfamiliar with the art, the work product reflects that.[8]

As such, our purpose here is to lay the groundwork for correct future implementations… by way of education. Having implemented several Line-21/608 data extraction routines in software, we know not only that it can be done, but that it can be done better. We hope that this article will aid others in the correct implementation of their own extraction routines, so that they may become the standard… and not the exception.

Contents

  1. The Line-21 Waveform: Background and Theory
  2. Data in the Line-21 Closed Captioning System
  3. Working with the Digitized Line-21 Waveform in Practice
  4. Data Extraction

The Line-21 Waveform: Background and Theory

The Line-21 waveform did not spring forth, fully formed, from the foreheads of either PBS or NCI in 1980. The Line-21 waveform[9], as we know it, was in fact the third iteration of a signal originally developed by the National Bureau of Standards[10] in the late 1960s... for the dissemination of highly accurate reference time and frequency signals on commercial television broadcasts.[11] It’s important to understand this, because the basic principles of decoding the Line-21 closed captioning signal rest with both the original design and purpose of that signal.

The original version of the Line-21 signal, as presented to the FCC in 1973, looked like this:

As illustrated above, this signal was comprised of two major components:

  1. An analog 1 MHz frequency reference
  2. A 26-bit digital data transmission

The 1 MHz frequency reference portion of the signal was, literally, that: a reference 1 MHz signal generated by an NBS atomic clock, located at the point of origination, on the premises of the broadcaster. That 1 MHz frequency reference, however, did more than provide the entire nation with a federally traceable frequency standard: it served as the “clock” for the digital data portion of the transmission.

In a “decoder” of that era, the 1 MHz frequency burst was used to excite a tank circuit. Once excited, the tank circuit’s output would then be used as a clock generator to decode the digital data portion of the signal, which was encoded using Non-Return-to-Zero (NRZ) modulation.[12]

Important points to remember about the 1 MHz frequency reference:

  • The 1 MHz burst was effectively the “clock” for the digital data portion of the signal.
  • Without the “clock,” it would be 'impossible' to decode the digital data.
  • As a result, the signal is, in a sense, self-describing: everything you need to correctly decode the signal is carried within the signal itself.
  • Since the 1 MHz frequency reference was generated by an atomic clock, whose phase was independent of the video signal, it was not phase coherent with the video signal.[13]

For a variety of reasons, the National Bureau of Standards’ Line-21 system never gained approval from the FCC. However, PBS – who picked up the technology from NBS – did receive a temporary authorization from the FCC to use a modified version of the NBS system. That “interim” system was then used to develop the version of the closed captioning system which debuted in 1980.[14]

That final version of the Line-21 signal, as presented to the FCC in 1976, looked like this:

Schematically, this incarnation is extremely similar to the NBS version of the Line-21 signal. The primary differences that we can readily see are that:

  1. The 1 MHz frequency reference has been replaced by a 503 kHz “clock run-in” burst
  2. The length of the “clock” has decreased
  3. The length of the digital data portion has increased
  4. The number of bits in the digital data transmission has been decreased to 16-bits

Although it may be difficult to discern on the face of things, all of these changes are fundamentally interrelated.

How so? Well, let’s think back to the original version of the Line-21 signal: it was comprised of two discrete components. The first – a 1 MHz frequency reference, which was taken from an external atomic clock – was required to decode the second portion, the 26-bits of digital data. Since NBS designed the signal to convey both highly accurate, standards-grade time and frequency reference signals, it made sense to tie both of these elements together in that way.

PBS, however, did not have the dissemination of reference-grade time and frequency signals in their mandate. What was in PBS’ mandate was public service – and PBS felt that making television accessible to the Deaf community was an important part of their nascent[15] network’s mission. As such, their version of the Line-21 signal needed to be centered around the television signal it would be ‘piggybacking’ on.

That change in “clock” frequency from 1 MHz to 503 kHz, therefore, reflected the change in the sponsoring agency’s mission: it is equivalent to analog NTSC’s horizontal line frequency, multiplied by 32. So, by basing the “clock” on the horizontal line frequency, the “clock” – and the remainder of the Line-21 signal, whose timing is based upon that clock – are consistently phase coherent with the analog video signal on which they are transmitted. In other words, PBS’s Line-21 signal was designed to be a well-tempered video signal that wouldn’t wobble horizontally, unlike the NBS signal.[16]

Another consequence of reducing the frequency of the clock is that the clock’s ‘interval’ increases.[17] Put differently: this means that in PBS’ version of the Line-21 signal, a ‘bit’ is nearly twice as wide as a ‘bit’ in the original NBS system. Which, in turn, explains why there’s less real-estate available for the clock, and more dedicated to the data: the 16-bits of digital data that are transmitted in the PBS system require more real-estate... because the individual bits are ‘wider.’
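
To make those figures concrete, here is a quick back-of-the-envelope check in Python (fH per the NTSC standard, the interval definition per EIA/CEA-608; see also note 17):

    # fH: the NTSC horizontal line frequency, defined as 4.5 MHz / 286.
    NTSC_LINE_FREQ = 4_500_000 / 286            # ~15,734.27 Hz

    clock_freq = NTSC_LINE_FREQ * 32            # ~503,496 Hz: the 503 kHz "clock run-in"
    pbs_bit_interval = 1 / clock_freq           # ~1.986 microseconds per bit
    nbs_bit_interval = 1 / 1_000_000            # 1.0 microsecond per bit at 1 MHz

    print(f"PBS bit interval: {pbs_bit_interval * 1e6:.3f} us")  # 1.986 us
    print(f"NBS bit interval: {nbs_bit_interval * 1e6:.3f} us")  # 1.000 us
    print(f"PBS/NBS width ratio: {pbs_bit_interval / nbs_bit_interval:.2f}")  # ~1.99x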

Data in the Line-21 Closed Captioning System

The doubling in size of each individual ‘bit,’ and the reduction in the number of transmitted bits per frame were both purposeful design choices. PBS did this to ensure that the captioning data could be successfully decoded even if the reception of the channel it was transmitted on was less than ideal.[18] Therefore, the ability to recover and decode captioning data from less than ideal signals was a design decision for their closed captioning system.

(This is a critical point, which we cannot emphasize enough: this system was designed from the bottom-up to work in a particular way to reach those design goals. From the structure of the PBS Line-21 signal, to the code structure of the encapsulated data, to the methodology of data extraction, and to how a ‘decoder’ displays that extracted data on a screen, the design decisions are all part of a coherent system that takes into account the medium on which it was conveyed. Correctly handling Line-21/608 data is, therefore, a matter of correctly emulating the operation of a physical decoder.)

The fundamental way in which PBS’ Line-21 signal differs from the NBS system is that the data ‘bit’ – whose size is derived from the clock – structures the entire layout of the Line-21 signal. While the NBS Line-21 system specified a gap between the clock and the data, in the PBS system, the data portion of the signal begins at the end of the ‘clock.’ The physical position of every subsequent bit, therefore, is explicitly defined in terms of how many ‘bits’ they are from that starting point. The following diagram – from an Evertz equipment manual – illustrates this:

As we can see, starting immediately at the conclusion of the ‘clock’ are three start bits, which are then followed by the payload of 16 data bits. The payload is structured as a serial data transmission: a 7-bit ASCII character and its corresponding parity bit, followed by another 7-bit ASCII character and its corresponding parity bit. The bit order of the transmission is in keeping with the ASCII standard for serial transmission: Least-Significant-Bit first.[19]
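
To illustrate that layout concretely, here is a minimal Python sketch (the function name is ours, for illustration) that serializes a single character the way it appears on the line – seven data bits, Least-Significant-Bit first, followed by its odd-parity bit:

    def character_to_line21_bits(ch: str) -> list[int]:
        # Seven data bits, Least-Significant-Bit first...
        code = ord(ch) & 0x7F
        data_bits = [(code >> i) & 1 for i in range(7)]
        # ...followed by an odd-parity bit: the total count of 1s,
        # parity bit included, must be odd.
        parity = 1 - (sum(data_bits) % 2)
        return data_bits + [parity]

    # 'C' is 0x43 (binary 1000011); transmitted LSB-first as 1,1,0,0,0,0,1.
    # The character already carries an odd number of 1s, so its parity bit is 0.
    print(character_to_line21_bits('C'))    # [1, 1, 0, 0, 0, 0, 1, 0]

The full payload is simply two such 8-bit groups – 16 bits in all.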

If you were wondering how exactly a Line-21 decoder determines the point at which the clock signal ends and the data portion begins, that’s because it’s not entirely obvious from the documents these diagrams were taken from. In fact, it takes three pieces of information to discern:

  1. As denoted in both NBS and PBS Line-21 signal diagrams, the “center” of the waveform is at 25 IRE (50% maximum amplitude of the signal).
  2. From note 5 on PBS’ Line-21 signal diagram, “Negative going zero crossings of clock are coherent with data transitions.”
  3. And finally, from an early revision of EIA/CEA-608: “All interval measurements are made from the midpoints (half amplitude) on all edges.”[20]

Taken together, we can conclude that a decoder “knows” that it has reached the transition point between the ‘clock’ and data portions of the signal when:

  1. It has passed the seven peaks of the 503 kHz “clock run-in.”
  2. The amplitude of the last downward (negative) going portion of the clock signal is equivalent to 25 IRE (50% maximum amplitude of the signal).

If we were to look for that point on the above diagram, it would be at the exact intersection of the 25 IRE level, and the rightmost, downward edge of the clock signal.
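
In code, that determination reduces to locating the last negative-going half-amplitude crossing of the run-in. A minimal Python/NumPy sketch, assuming a reasonably clean signal (level calibration and noise handling for distressed material are deliberately omitted):

    import numpy as np

    def find_clock_data_transition(line: np.ndarray) -> float:
        # A sketch: find the clock/data transition as the last negative-going
        # half-amplitude crossing of the seven-cycle clock run-in.
        # `line` holds the luma samples of one digitized video line.
        threshold = (float(line.min()) + float(line.max())) / 2  # half amplitude (25 IRE)
        above = line >= threshold
        # Negative-going crossings: a sample at/above threshold followed by one below.
        falling = np.flatnonzero(above[:-1] & ~above[1:])
        if len(falling) < 7:
            raise ValueError("clock run-in not found")
        # The seventh negative-going crossing ends the clock run-in; linear
        # interpolation yields a sub-sample estimate of the crossing point.
        i = falling[6]
        frac = (float(line[i]) - threshold) / (float(line[i]) - float(line[i + 1]))
        return i + frac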

Working with the Digitized Line-21 Waveform in Practice

To extract closed caption data as part of a post-capture workflow, the source analog video needs to be digitized at 720x486 resolution. Digitizing at the lower vertical resolution of 480 lines will likely result in the Line-21 captioning data being cropped out during capture. Which, obviously, is a non-starter.

While the 720x486 resolution is sufficient for reproducing standard definition imagery, it’s somewhat less than ideal when processing Line-21 closed captioning signals. Practically speaking, this doesn’t mean that digitization precludes post-capture data extraction. Rather, it requires that one be mindful of the limitations of the ‘sampling rate’ when designing an extraction algorithm.
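
For illustration, a minimal sketch of pulling one candidate line out of a raw 720x486, 8-bit luma frame dump (the function name is ours). Which row actually carries Line 21 depends on the capture chain and field order – which is precisely why the line number is the one user-facing parameter discussed below:

    import numpy as np

    def get_line(frame_bytes: bytes, row: int, width: int = 720) -> np.ndarray:
        # Interpret the raw dump as rows of 8-bit luma samples and return
        # the requested row as floats, ready for waveform analysis.
        frame = np.frombuffer(frame_bytes, dtype=np.uint8).reshape(-1, width)
        return frame[row].astype(np.float64)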

A Line-21 data extraction algorithm should, in normal operation, only require one input parameter: the video line in the source video file from which it should attempt to extract data. In this sense, the algorithm should (essentially) emulate the data extraction function of a physical closed-captioning ‘decoder.’ If a stand-alone piece of hardware can perform the same function without a series of parameters from the user, then neither should your algorithm require them.

Only in more involved data recovery operations – such as that from marginal, distorted, or otherwise damaged sources – should a data extraction algorithm require operational parameters from the user. For those circumstances, the basic set of optional parameters should be:

  • Source video line for data extraction
  • Clock-Width value[21]
  • Threshold value for NRZ decoding[22]

Other items that designers / programmers of decoding algorithms should keep in mind when creating their recovery solutions for use with digitized materials –

  • The Clock Must Be “Solved” for Each and Every Line
    Although we’re going to delve into the why behind this in the following section, it bears mentioning here since it is singularly important. While solutions that approach the problem in alternative ways may “work,” they often do not work consistently, or consistently well.
  • The Clock ‘Interval’ is NOT an Integer
    As noted elsewhere, the value of the clock ‘interval’ is a floating-point number. Your pixel samples, however, are an array of discrete (rectangular) samples... which may or may not be wholly coincident with the sampled Line-21 signal. You need to account for this – see the sketch that follows.
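
A short sketch of the arithmetic involved: at the common 13.5 MHz sampling rate (720 samples per line), a single Line-21 bit spans a fractional number of pixels, so bit-zone boundaries must be carried as floating-point values:

    SAMPLE_RATE = 13_500_000                          # Rec. 601 luma sampling rate
    bit_width = SAMPLE_RATE / (4_500_000 / 286 * 32)  # ~26.81 samples per bit

    def bit_zone(start: float, n: int) -> tuple[float, float]:
        # Boundaries (in fractional samples) of bit n, measured from the
        # clock/data transition point `start` found earlier.
        return (start + n * bit_width, start + (n + 1) * bit_width)

    # e.g., if the transition landed at sample 103.4, bit 5 occupies:
    print(bit_zone(103.4, 5))    # ~(237.46, 264.28)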

Data Extraction

As a process, data extraction is fairly straightforward. On a per line basis, one must:

  1. Solve for the clock
  2. Determine the location of the “bit zones”
  3. Perform NRZ decoding of each “bit zone”
  4. Return the parity checked value of each transmitted character

Since the clock is the critical element in discerning the location of each subsequent bit in the signal, you must determine a solution for the clock on every line. With the analog Line-21 signal, one must not take for granted that all elements of the signal will consistently appear at the same position on the line, every time. Frequently, as a result of normal variation, time-base errors, reception issues, and even variations between elements in the original post-production process, consistent timing (read: horizontal positioning) of the Line-21 signal is not guaranteed.

Take for instance the educational television series, The Voyage of the Mimi. Each episode in the series was comprised of two parts: part “A,” which was a dramatic, fictionalized adventure, and part “B,” an educational ‘adventure’ that delved into some aspect of the science seen in the first half of the show. These two parts were each edited and captioned independently, and once finished, they were edited together onto a single master videotape.

The relevance of the show’s post-production process to caption data extraction is that at the join between part A and B, there is an observable change in the positioning (timing) of the Line-21 signal. We can see this readily in the waterfall / ‘overhead view’ of the Line-21 signal from episode seven:

While this timing change wouldn’t trip up a hardware decoder, for a software decoder that doesn’t appropriately solve for the clock… it certainly would.

Regardless, once you’ve solved for the ‘clock,’ you can then proceed to determine the location of the sixteen “bit zones.” A “bit zone,” strictly speaking, is the portion of the Line-21 real-estate that corresponds to a specific bit of data. The following diagram illustrates the location of the sixteen “bit zones” for a sample Line-21 signal:

To discern the value of a single bit, one must sum the value of all pixels in that “bit zone,” and then divide that sum by the number of samples, to determine the average pixel value. If the calculated average pixel value is nominally greater-than-or-equal-to 50% of the maximum amplitude of the clock, the value of the bit is 1.[23] Otherwise, the value of the bit is 0.

Discerning the value of a bit in a “bit zone” in this way is an intentional part of the design of the Line-21 / 608 system: it aids data recovery in difficult reception environments (noise, multipath, etc.), making the system more robust.
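
Put into code, the per-bit decode is a short averaging operation. A sketch, reusing the transition point and bit-zone arithmetic from the earlier sketches (fractional zone edges are simply rounded here; weighting the edge samples fractionally is more faithful to the signal):

    import numpy as np

    def decode_bit(line: np.ndarray, zone: tuple[float, float], threshold: float) -> int:
        # Average every sample in the bit zone and compare against the
        # half-amplitude threshold (nominally 25 IRE: 50% of the clock's
        # maximum amplitude).  At or above threshold decodes as 1.
        start, end = zone
        samples = line[int(round(start)):int(round(end))]
        return 1 if samples.mean() >= threshold else 0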

To decode each character, we must first discern the value of all of its bits – inclusive of the parity bit. We must then check the value of the extracted character with respect to odd parity. If the character passes the parity check, we can then return the value of the extracted character as-is. If the character does not pass the parity check, then the decoded value is invalid, and we must instead return the value which signals this condition to a Line-21/608 decoder.[24]

While it may be “easy” to ignore the parity bit, doing so is not inconsequential: it produces a non-compliant data stream. A non-compliant data stream, in turn, will likely result in unexpected behavior from compliant Line-21/608 decoders. So, to be compliant, one MUST check parity, and one MUST return appropriate decoded values. Checking parity is NOT optional – it is mandatory by default.[25]
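
Tying the last two steps together, a minimal sketch of character assembly with the mandatory odd-parity check (0x7F as the failure value, per note 24):

    def decode_character(bits: list[int]) -> int:
        # `bits` holds one transmitted character: seven data bits, LSB first,
        # followed by its parity bit.  The total count of 1s must be odd.
        if sum(bits) % 2 != 1:
            return 0x7F    # parity failure: signal an invalid character
        # Reassemble the 7-bit ASCII value, Least-Significant-Bit first.
        return sum(bit << i for i, bit in enumerate(bits[:7]))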

Coda

In this essay, we delved into the history and theory of the analog Line-21/608 signal to illustrate the concepts underlying the operation of the system, and how data is to be correctly decoded for that system. This should allow any programmer conversant in the ‘art’ to generate compliant decoding solutions for caption data originating from analog sources.

We sincerely hope that this will be of help to those organizations, institutions, and individuals charged with preservation of the audio-visual materials of the past, and will help make these recordings – inclusive of the captioning data locked within them – more easily accessible to future generations.

-DW
November 2019
(Slight revisions: December 2019)



[1] We were unable to ascertain if the Line-21/608 extraction aspect works at all, as the filter wasn’t able to decode data from a test sample (an aircheck from the ABC television network in the mid-1980s), using the provided sample command in the documentation. Therefore, all conclusions in this section are based upon a close reading of the source code.

[2] Had the author[s] documented their code, this would have been clearer. If the author[s] believe that I am mischaracterizing their code, I would suggest adding both COMMENTS and DOCUMENTATION to clarify both the operation of the code as well as their design intentions.

[3] In all fairness, this is not an issue unique to this decoder, but worth noting nonetheless.

[4] In a follow-up essay, we propose a superior format for Line-21/608 encapsulation: tabraw.

[5] In one discussion, someone involved in the open-source movement made an incredible argument against paying for a copy of a standard: they’d much rather spend their money with the Dollar Shave Club. While we can all agree that personal hygiene is important, so is professional competence and credibility. Ultimately, arguments against professional competence, credibility, and concomitant standards are symptoms of institutional and/or intellectual poverty.

[6] At the time of writing, it was not only awfully written but, in many respects, dead wrong. Those seeking to learn about the standard are better served… by literally any other reputable source.

[7] Art & Technic LLC is working to ameliorate this problem, both in this document, and in other forthcoming projects.

[8] Literally, “ars sine scientia nihil est.”

[9] So named because the waveform occupies the entirety of Line 21 in analog NTSC video.

[10] The National Bureau of Standards (NBS) is better known today as the National Institute of Standards and Technology (NIST).

[11] As summarized from ongoing research; subject to revision.

[12] The specific method used, as described by the Wikipedia, was ‘Unipolar non-return-to-zero level’ modulation. It’s also individually discussed on its own Wikipedia page: Unipolar encoding.

[13] While phase coherence is important to keep in mind for reasons that will soon become clear, the original system’s lack of phase coherence, and the impact of that design decision in the original system, is not relevant to the scope of this article. This summary of the original system is derived from ongoing research; subject to revision.

[14] As summarized from ongoing research; subject to revision.

[15] Per the Wikipedia, PBS began operations on October 5, 1970.

[16] The NBS system, whose frequency reference (“clock”) was taken from an atomic clock that had no phase relationship to the video signal on which it was piggybacking, exhibited continuous timing ‘jitter’… because the instantaneous phase of the clock differed every 1/30th of a second. As a consequence, the horizontal placement of the digital code transmission would vary from field to field as the phase of the “clock” varied with respect to the video signal.

Visually, PBS’ system is a stark contrast with its predecessor. Since the clock in the PBS system is phase coherent with the video signal, that means that the position of each element in the signal – all of which are based on the timing of the clock (which is in turn based on the timing of the video signal itself) – consistently appear at consistent points. So, while PBS’ Line-21 system appears to be a largely static signal on a waveform monitor, it actually is more involved behind-the-scenes.

(The detail on the original system is derived from ongoing research; subject to revision.)

[17] When we say ‘interval’ here, we are referring to the “data bit interval” – the width of a single bit at that clock frequency. Per EIA/CEA-608, the width of that interval is defined as 1/(fH x 32), or in terms of time, 1.986 µs.

[18] In the modern ATSC digital television system, reception is either good enough to reproduce the transport stream, or not. Poor signal strength and multipath can easily preclude successful reception of ATSC. In contrast, an analog NTSC receiver could often provide a “watchable” signal in conditions that ranged from poor to marginal. (Development detail derived from ongoing research; subject to revision.)

[19] See Mackenzie, Charles, “Coded Character Sets, History and Development,” Addison-Wesley, 1980, page 253. This chapter of the book – “Which Bit First?” – is concerned with how the LSB / Little Endian standard for serial transmission of ASCII data came about... which also happens to be a brisk, informative, and worthwhile read.

[20] See EIA/CEA-608-B, October 2000, page 11, note 1.

[21] This is largely superfluous, because in the PBS Line-21 system, the clock-width is literally a constant. (Remember: the clock ‘interval’ is based on the horizontal line frequency of the television signal itself.)

[22] The basic principles of NRZ decoding with respect to Line-21 signals will be discussed in the next section.

[23] Recall from earlier that 1) the center of the waveform is at 50% maximum amplitude of the clock, and that 2) all interval measurements are also made at half-amplitude. While there are a variety of ways to determine the threshold value when decoding Line-21 signals, this is the recommended starting point.

[24] Nominally 7F hexadecimal. One should note that the process of data extraction is heavily tied into the operation of the decoder / display. See U.S. Federal Register, Volume 56, Number 114, June 13, 1991, pages 27204-27205.

[25] While we can imagine some cases in which one may want an extraction routine to return invalid values, this is a specialized use-case in data recovery, and therefore should require an explicit option to turn parity off for that explicit purpose.
