Packets in CTA-708
Caption streams are transmitted inside several layers of packet wrappers. From the outside in, these are the picture user data, which contains the caption data, which contains the cc_data, which contains the Caption Channel Packets, which contain the Service Blocks, which contain the caption streams. The packets are described in detail in this section; the streams themselves are described in the following sections. This layering is based on the OSI Protocol Reference Model. This section describes the various packets; the Coding and Presentation Layers are described in the remainder of this document.
Picture User Data
Picture user data is inserted before a SMPTE 259M active video frame or video packet. Common video packets are a picture header, a picture parameter set and a Material Exchange Format essence. In the field descriptions, bslbf means bit string, left bit first, and uimsbf means unsigned integer, most significant bit first. NOTE: Depending on the encoder, the SEI can contain more payloads than just the captions, so a parser needs to navigate all of the payloadTypes it contains. When the ''itu_t_t35_country_code'' is set to 181, the ''itu_t_t35_provider_code'' identifies U.S. maintained manufacturers. The ''itu_t_t35_provider_code'' for U.S. maintained manufacturers, when set to 47, defines DirecTV ''user_data'' and, when set to 49, defines
Closed Caption Data Packet (cc_data_pkt)
Each cc_data_pkt is 3 bytes in total. If cc_valid is not set, the cc_data_pkt should be considered padding and discarded. If it is set, cc_type will be one of four values: NTSC_CC_FIELD_1 = 0, NTSC_CC_FIELD_2 = 1, DTVCC_PACKET_DATA = 2, DTVCC_PACKET_START = 3. If it is either 0 or 1, the cc_data fields should be interpreted as EIA-608 captions (allowing for 4 total captions, as EIA-608 does). If cc_type is 3, a decoder should begin assembling a Caption Channel Packet with the cc_data as described below, and if cc_type is 2 it should append the cc_data to any Caption Channel Packet being assembled. If a DTVCC packet is already being assembled and either cc_valid is set and cc_type is 3, or cc_valid is clear and cc_type is 2 or 3, then the packet should be considered complete.
NOTE: In a caption decoder, cc_data packets must be reassembled in the correct order to create the DTVCC packets. The standard is not clear on this, but it appears the order should be frame display order, not encoded frame order. This means an encoder should probably also break DTVCC packets up and insert them into the picture user data as cc_data packets in display order.
NOTE: To avoid this problem in the CTA-708 standard, some encoders encode captions on only one frame type, such as only P frames or only I frames. If only one frame type is used, the frame display order and the frame encoded order are the same.
DTVCC packet (cc_data_1/cc_data_2)
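A DTVCC packet is rebuilt from the cc_data_1 and cc_data_2 bytes of consecutive cc_data_pkt triples, following the cc_valid and cc_type rules given for the cc_data_pkt. The following is a minimal sketch of that reassembly, not a reference implementation. The class name `DtvccAssembler` is hypothetical, and the bit layout of the first byte (five marker bits, then cc_valid, then a 2-bit cc_type) is an assumption, since the syntax table is not reproduced here. The triples are assumed to arrive already sorted into display order.

```python
from typing import List, Optional

NTSC_CC_FIELD_1 = 0
NTSC_CC_FIELD_2 = 1
DTVCC_PACKET_DATA = 2
DTVCC_PACKET_START = 3


class DtvccAssembler:
    """Rebuilds DTVCC packets from 3-byte cc_data_pkt triples (hypothetical helper)."""

    def __init__(self) -> None:
        self.current: Optional[bytearray] = None  # packet being assembled
        self.completed: List[bytes] = []          # finished DTVCC packets

    def feed(self, pkt: bytes) -> None:
        if len(pkt) != 3:
            raise ValueError("a cc_data_pkt is exactly 3 bytes")
        # Assumed layout of byte 0: 5 marker bits, cc_valid, 2-bit cc_type.
        cc_valid = bool(pkt[0] & 0x04)
        cc_type = pkt[0] & 0x03
        is_dtvcc = cc_type in (DTVCC_PACKET_DATA, DTVCC_PACKET_START)

        # A packet in progress is complete on a valid START, or on an
        # invalid triple whose cc_type is DATA or START.
        if self.current is not None and is_dtvcc and (
            cc_type == DTVCC_PACKET_START or not cc_valid
        ):
            self.completed.append(bytes(self.current))
            self.current = None

        if not cc_valid:
            return  # padding: nothing to append
        if cc_type == DTVCC_PACKET_START:
            self.current = bytearray(pkt[1:3])
        elif cc_type == DTVCC_PACKET_DATA and self.current is not None:
            self.current += pkt[1:3]
        # cc_type 0 or 1 carries EIA-608 data, which this sketch ignores.
```

Feeding a valid START triple, a valid DATA triple and then an invalid DATA triple leaves one four-byte packet in `completed`. EIA-608 triples interleaved between them do not disturb the packet in progress.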
Within the packet_data there is only one type of packet, known as the Service Block. It further subdivides the DTVCC Transport Stream into 63 substreams, each of which carries a discrete captioning service. Service 1 is designated the Primary Caption Service, and Service 2 the Secondary Language Service. The Caption Descriptor describes any other services offered. packet_size defines the number of two-byte blocks that follow, with odd blocks padded with a NULL byte.
Service Block Packet (packet_data)
If service_number is 7, an extended_service_number field is added and used instead of the service_number. If block_size is 0, the service_number must be zero as well, and no block_data is present. This is known as a Null Service Block Header and is used to pad the packet when no captions are sent. Note: Service Blocks may not cross Caption Channel Packet boundaries. This means each Caption Channel Packet can be parsed without keeping any state for the Service Blocks themselves.
Caption Stream Encoding (block_data)
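Before any block_data can be interpreted, it has to be pulled out of the Service Blocks of each Caption Channel Packet. The sketch below illustrates the header rules described for the Service Block. The function name `parse_service_blocks` is hypothetical, and the field widths are assumptions because the syntax table is not reproduced here: a 3-bit service_number and 5-bit block_size in the header byte, and a 6-bit extended_service_number in the low bits of the following byte.

```python
from typing import List, Tuple


def parse_service_blocks(packet_data: bytes) -> List[Tuple[int, bytes]]:
    """Split the packet_data of one Caption Channel Packet into
    (service_number, block_data) pairs. Hypothetical helper."""
    blocks: List[Tuple[int, bytes]] = []
    pos = 0
    while pos < len(packet_data):
        header = packet_data[pos]
        pos += 1
        service_number = header >> 5   # assumed: upper 3 bits
        block_size = header & 0x1F     # assumed: lower 5 bits
        if block_size == 0:
            break  # Null Service Block Header: the rest is padding
        if service_number == 7:
            if pos >= len(packet_data):
                break  # truncated packet
            # assumed: 2 null bits, then 6-bit extended_service_number
            service_number = packet_data[pos] & 0x3F
            pos += 1
        block_data = packet_data[pos:pos + block_size]
        if len(block_data) < block_size:
            break  # truncated packet; blocks never span packets
        pos += block_size
        blocks.append((service_number, block_data))
    return blocks
```

Because Service Blocks may not cross Caption Channel Packet boundaries, the function needs no state between calls. A caller would typically append each returned block_data to the input buffer of the matching service.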
The 63 caption service sub-streams each contain a mixed command and text stream, much like Telnet. There are four logical code sub-groups: CL, GL, CR, and GR. Each of these has single-character and multi-character code sets. Whenever a command character is seen, any text accumulated in the parser should be flushed. Since text might need to be flushed when there is no command pending, the C0 command set includes a null command, known as the ETX command. There are also two special commands, Reset and DelayCancel, which must be parsed with lookahead. A Delay command issued previously can be canceled at any time with a DelayCancel command, so once a Delay is seen a decoder must look ahead for a DelayCancel, and only for a DelayCancel. A Reset command, on the other hand, is sent to break out of an unknown decoder state, and all data before it must be ignored.
Character Groups
C0 Table
NUL, BS, FF, and CR are interpreted as they are in ASCII control codes. HCR moves the pen location to the beginning of the current line and deletes its contents. FF clears the screen and moves the pen location to (0,0). ETX is the null command mentioned earlier, which is used to flush text to the current window when no other command is pending. EXT1 is used to escape to the 'C2', 'C3', 'G2', and 'G3' tables for the following byte. Finally, P16 can be used to escape the next two bytes for Chinese and other large character maps. All characters in the range 0x10-0x17, which currently includes EXT1, are followed by one byte that needs to be interpreted differently. All characters in the range 0x18-0x1F, which currently includes P16, are followed by two bytes that need to be interpreted differently. If a decoder encounters one of these and does not know what to do with it, it should still skip the next one or two bytes, as appropriate, before continuing.
C1 Table
The C1 Table contains all the currently defined caption commands. These will be described in detail in the next section.
C2 Table
The C2 Table contains no commands as of CTA-708 revision A. However, if a command is seen in this code set, a decoder must skip an appropriate number of the following bytes.
C3 Table
The C3 Table contains no commands as of CTA-708 revision A. However, if a command is seen in this code set, a decoder must skip an appropriate number of the following bytes.
G0 Table
The G0 Table consists of ASCII characters for the most part. SP here is shorthand for Space. MN is a musical note, which replaces the Delete command code in ASCII, and can be any of "♩", "♪", "♫" or "♬", depending on the receiver manufacturer.
G1 Table
The G1 Table is basically the ISO 8859-1 Latin-1 character set. Note that character 0xA0 is the non-breaking space.
G2 Table
TSP and NBTSP are the Transparent Space and the Non-Breaking Transparent Space, respectively. The G2 Table contains miscellaneous characters that may not be displayed in all browsers. BLK indicates a solid block, which fills the entire character cell with a solid foreground color.
G3 Table
The G3 Table contains only a single character, the closed caption icon with square corners. This character is at 0xA0.
Caption commands
EndOfText (0x03)
The EndOfText command is a null command which can be used to flush any buffered text to the current window. All commands force a flush of any buffered text to the current window, so this command is only needed when no other command is pending.
SetCurrentWindow0-7 (0x80-0x87)
SetCurrentWindow tells the caption decoder which window the following commands describe: SetWindowAttributes, SetPenAttributes, SetPenColor, SetPenLocation. If the window specified has not already been created with a DefineWindow command, then SetCurrentWindow and the window property commands can be safely ignored.
ClearWindows (0x88 + 1 byte)
ClearWindows clears all the windows specified in the 8-bit window bitmap.
DisplayWindows (0x89 + 1 byte)
DisplayWindows displays all the windows specified in the 8-bit window bitmap.
HideWindows (0x8A + 1 byte)
HideWindows hides all the windows specified in the 8-bit window bitmap.
ToggleWindows (0x8B + 1 byte)
ToggleWindows hides all displayed windows, and displays all hidden windows, specified in the 8-bit window bitmap.
DeleteWindows (0x8C + 1 byte)
DeleteWindows deletes all the windows specified in the 8-bit window bitmap. If the current window, as specified by the last SetCurrentWindow command, is deleted, then the current window becomes undefined and the window attribute commands should have no effect until after the next SetCurrentWindow or DefineWindow command.
Delay (0x8D + 1 byte)
Delay suspends all processing of the current service, except for DelayCancel and Reset scanning. The period of suspension is set by the one-byte parameter, which specifies the delay in tenths of a second, so the minimum delay is 0.1 seconds and the maximum delay is 25.5 seconds. A zero-second delay can safely be ignored by a decoder, but should not be emitted by an encoder. A delay should be cancelled if the caption decoder's input buffer becomes full, if a DelayCancel or Reset is received, or when the specified delay time elapses.
DelayCancel (0x8E)
DelayCancel terminates any active delay and resumes normal command processing. DelayCancel should be scanned for during a Delay.
Reset (0x8F)
Reset deletes all windows, cancels any active delay, and clears the buffer before the Reset command. Reset should be scanned for during a Delay.
SetPenAttributes (0x90 + 2 bytes)
The SetPenAttributes command specifies how certain attributes of subsequent characters are to be rendered in the current window, until the next SetPenAttributes command. This command has the following parameters:
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|TXT_TAG|OFS|PSZ| |I|U|EDTYP|FNTAG|
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
15              8 7               0
OFS = offset ; PSZ = pen size
I = italic toggle ; U = underline toggle
EDTYP = edge type ; FNTAG = font tag
* pen size, 2 bits
* offset, 2 bits
* text tag, 4 bits
* font tag, 3 bits
* edge type, 3 bits
* underline, 1 bit
* italic, 1 bit
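The two SetPenAttributes parameter bytes can be unpacked with shifts and masks. The sketch below follows the bit layout given above for this command. The function name `decode_set_pen_attributes` and the dictionary keys are hypothetical, and the fields are returned as raw integers because the value tables are not reproduced here.

```python
from typing import Dict, Union


def decode_set_pen_attributes(params: bytes) -> Dict[str, Union[int, bool]]:
    """Unpack the two parameter bytes that follow opcode 0x90."""
    if len(params) != 2:
        raise ValueError("SetPenAttributes takes exactly 2 parameter bytes")
    first, second = params
    return {
        "text_tag": first >> 4,            # bits 15-12
        "offset": (first >> 2) & 0x03,     # bits 11-10
        "pen_size": first & 0x03,          # bits 9-8
        "italic": bool(second & 0x80),     # bit 7
        "underline": bool(second & 0x40),  # bit 6
        "edge_type": (second >> 3) & 0x07, # bits 5-3
        "font_tag": second & 0x07,         # bits 2-0
    }
```

For example, the parameter bytes 0x05 0x8B decode to pen size 1, offset 1, text tag 0, italic on, underline off, edge type 1 and font tag 3.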
SetPenColor (0x91 + 3 bytes)
SetPenColor sets the foreground, background, and edge color for the subsequent characters. Color is specified with 6 bits, 2 for each of blue, green and red. The lowest order bits are for blue, the next two for green and the highest order bits represent red. Opacity is represented by two bits: SOLID=0, FLASH=1, TRANSLUCENT=2, and TRANSPARENT=3. The edge color is the color of the outlined edges of the text, but the outline shares its opacity with the foreground, so the two highest order bits of the third parameter byte should both be cleared. The parameters are as follows:
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|FOP|F_R|F_G|F_B| |BOP|B_R|B_G|B_B| |0|0|E_R|E_G|E_B|
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
23             16 15              8 7               0
FOP = foreground opacity ; BOP = background opacity
F_? = foreground color component ; B_? = background color component
E_? = edge color component
* foreground color, 6 bits
* foreground opacity, 2 bits
* background color, 6 bits
* background opacity, 2 bits
* edge color, 6 bits
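As an illustration of the color packing described for SetPenColor, the sketch below splits each parameter byte into a 2-bit opacity and three 2-bit color components. The names `PenColor`, `unpack_color` and `decode_set_pen_color` are hypothetical. The edge color is given the foreground opacity, as the text describes.

```python
from typing import Dict, NamedTuple, Tuple

OPACITY_NAMES = ("SOLID", "FLASH", "TRANSLUCENT", "TRANSPARENT")


class PenColor(NamedTuple):
    opacity: str
    rgb: Tuple[int, int, int]  # each component is 0-3


def unpack_color(byte: int) -> PenColor:
    """Split one byte into opacity (top 2 bits) and 2-bit R, G, B."""
    return PenColor(
        OPACITY_NAMES[(byte >> 6) & 0x03],
        ((byte >> 4) & 0x03, (byte >> 2) & 0x03, byte & 0x03),
    )


def decode_set_pen_color(params: bytes) -> Dict[str, PenColor]:
    """Unpack the three parameter bytes that follow opcode 0x91."""
    if len(params) != 3:
        raise ValueError("SetPenColor takes exactly 3 parameter bytes")
    foreground = unpack_color(params[0])
    background = unpack_color(params[1])
    # The top two bits of the edge byte should be zero; mask them anyway.
    edge_rgb = unpack_color(params[2] & 0x3F).rgb
    return {
        "foreground": foreground,
        "background": background,
        "edge": PenColor(foreground.opacity, edge_rgb),
    }
```

A renderer would still need to map each 2-bit component onto its own color depth. That mapping is left out here because the text does not define it.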
SetPenLocation (0x92 + 2 bytes)
SetPenLocation sets the location for the next text appended to the current window. It has two parameters, row and column. If a window is not locked (see DefineWindow) and the SMALL font is in effect, the location can be outside the otherwise valid addresses.
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|0|0|0|0|  ROW  | |0|0|  COLUMN   |
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
15              8 7               0
* row, 4 bits, normally 0-14
* null padding, 4 bits
* column, 6 bits, normally 0-31 for 4:3 formats, and 0-41 for 16:9 formats
* null padding, 2 bits
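From the encoder side, a SetPenLocation command is the opcode 0x92 followed by the row and column bytes, with the padding bits left at zero. The sketch below builds and reads back such a command. The function names are hypothetical. Only the field widths are checked, because the usual 0-14 and 0-31 or 0-41 limits can be exceeded for unlocked windows.

```python
from typing import Tuple

SET_PEN_LOCATION = 0x92


def encode_set_pen_location(row: int, column: int) -> bytes:
    """Build the 3-byte SetPenLocation command (opcode + 2 parameters)."""
    if not 0 <= row <= 0x0F:
        raise ValueError("row must fit in 4 bits")
    if not 0 <= column <= 0x3F:
        raise ValueError("column must fit in 6 bits")
    # Upper 4 bits of the row byte and upper 2 bits of the column byte
    # are null padding and stay zero.
    return bytes([SET_PEN_LOCATION, row, column])


def decode_set_pen_location(params: bytes) -> Tuple[int, int]:
    """Return (row, column) from the two parameter bytes."""
    if len(params) != 2:
        raise ValueError("SetPenLocation takes exactly 2 parameter bytes")
    return params[0] & 0x0F, params[1] & 0x3F
```

For instance, `encode_set_pen_location(14, 41)` addresses the last normally valid row and the last column of a 16:9 window.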
SetWindowAttributes (0x97 + 4 bytes)
SetWindowAttributes sets the window attributes of the current window. Fill Color is specified with 6 bits, 2 for each of blue, green and red. The lowest order bits are for blue, the next two for green and the highest order bits represent red. Fill Opacity is represented by two bits: SOLID=0, FLASH=1, TRANSLUCENT=2, and TRANSPARENT=3. The window's Border Color is specified the same way. The Border Type, however, is split into two fields. They should be combined, with ''border type 01'' providing the low order bits and ''border type 2'' the high order bit. Once combined, the Border Type has 6 valid values: NONE=0, RAISED=1, DEPRESSED=2, UNIFORM=3, SHADOW_LEFT=4, and SHADOW_RIGHT=5.
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|FOP|F_R|F_G|F_B| |BTP|B_R|B_G|B_B| |W|B|PRD|SCD|JST| |EFT_SPD|EFD|DEF|
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
31             24 23             16 15              8 7               0
FOP = fill opacity ; BTP = border type lower bits ; B = border type upper bit
F_? = fill color component ; B_? = border color component
W = word wrap toggle ; PRD = print direction ; SCD = scroll direction
JST = justification ; EFT_SPD = effect speed ; EFD = effect direction ; DEF = display effect
* fill color, 6 bits. Window interior color.
* fill opacity, 2 bits.
* border color, 6 bits. Window border color.
* border type 01, 2 bits. See discussion above.
* justify, 2 bits. For Left-to-Right and Right-to-Left print directions the values are: LEFT=0, RIGHT=1, CENTER=2, FULL=3. For Top-to-Bottom and Bottom-to-Top print directions the values are: TOP=0, BOTTOM=1, CENTER=2, FULL=3. For ''left'' justification, decoders should display any portion of a received row of text when it is received. For ''center'', ''right'', and ''full'' justification, decoders may display any portion of a received row of text when it is received, or may delay display of a received row of text until reception of a row completion indicator. A row completion indicator is defined as receipt of a CR, ETX or any other command, except SetPenColor, SetPenAttributes, or SetPenLocation where the pen relocation is within the same row. Receipt of a character for a displayed row which already contains text with ''center'', ''right'' or ''full'' justification will cause the row to be cleared prior to the display of the newly received character and any subsequent characters. Receipt of a justification command which changes the last received justification for a given window will cause the window to be cleared.
* scroll direction, 2 bits. This specifies which direction text will scroll when the end of a caption "line" is reached. It has one of four values: LEFT_TO_RIGHT=0, RIGHT_TO_LEFT=1, TOP_TO_BOTTOM=2, and BOTTOM_TO_TOP=3.
* print direction, 2 bits. This specifies the order in which text is added to a window. It has one of four values: LEFT_TO_RIGHT=0, RIGHT_TO_LEFT=1, TOP_TO_BOTTOM=2, and BOTTOM_TO_TOP=3.
* word wrap, 1 bit. If set, word wrapping is enabled; otherwise word wrap should not be employed.
* border type 2, 1 bit. See discussion above.
* display effect, 2 bits. This specifies an effect to be used to display or hide a window. It has one of three valid values: SNAP=0, FADE=1, and WIPE=2. SNAP means the window should assume full opacity immediately. FADE means the window should fade in or out at ''effect speed''. Finally, WIPE means the window should fly onto or off the screen from the screen border specified in ''effect direction'' at the rate specified in ''effect speed''.
* effect direction, 2 bits. This specifies where a wipe effect comes from on window display. It has one of four values: LEFT_TO_RIGHT=0, RIGHT_TO_LEFT=1, TOP_TO_BOTTOM=2, and BOTTOM_TO_TOP=3. When the window is wiped off the screen, it should be wiped off in the opposite direction from how it was wiped onto the screen.
* effect speed, 4 bits. This specifies in half-seconds how long a caption display or hide effect, such as FADE or WIPE, should take. The maximum time is 7.5 seconds, and the minimum non-zero value is 0.5 seconds.
Colors, text painting, effects, and border types can be customized with the SetWindowAttributes and SetPenAttributes commands. However, the caption provider may wish to use predefined standard window styles. A set of predefined styles is stored permanently in receivers. This set anticipates the most widely used types of caption windows, and conserves caption channel bandwidth by eliminating the need to transmit superfluous SetWindowAttributes and SetPenAttributes commands. Predefined window and pen styles can be specified by the window style and pen style ID parameters in the DefineWindow command.
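The four SetWindowAttributes parameter bytes can be unpacked as in the sketch below, which follows the bit layout given for the command and shows how the split Border Type field is recombined. The function name and dictionary keys are hypothetical. Justification is returned as a raw number because its meaning depends on the print direction, and reserved values are returned as None.

```python
from typing import Any, Dict

OPACITY = ("SOLID", "FLASH", "TRANSLUCENT", "TRANSPARENT")
BORDER_TYPES = ("NONE", "RAISED", "DEPRESSED", "UNIFORM",
                "SHADOW_LEFT", "SHADOW_RIGHT")
DIRECTIONS = ("LEFT_TO_RIGHT", "RIGHT_TO_LEFT",
              "TOP_TO_BOTTOM", "BOTTOM_TO_TOP")
EFFECTS = ("SNAP", "FADE", "WIPE")


def _rgb(byte: int):
    """Lower 6 bits as 2-bit (red, green, blue) components."""
    return ((byte >> 4) & 3, (byte >> 2) & 3, byte & 3)


def decode_set_window_attributes(params: bytes) -> Dict[str, Any]:
    """Unpack the four parameter bytes that follow opcode 0x97."""
    if len(params) != 4:
        raise ValueError("SetWindowAttributes takes exactly 4 parameter bytes")
    b1, b2, b3, b4 = params
    # 'border type 2' (bit 6 of byte 3) is the high bit;
    # 'border type 01' (top two bits of byte 2) are the low bits.
    border = (((b3 >> 6) & 1) << 2) | (b2 >> 6)
    effect = b4 & 3
    return {
        "fill_opacity": OPACITY[b1 >> 6],
        "fill_rgb": _rgb(b1),
        "border_type": BORDER_TYPES[border] if border < 6 else None,
        "border_rgb": _rgb(b2),
        "word_wrap": bool(b3 & 0x80),
        "print_direction": DIRECTIONS[(b3 >> 4) & 3],
        "scroll_direction": DIRECTIONS[(b3 >> 2) & 3],
        "justify": b3 & 3,
        "effect_speed_seconds": (b4 >> 4) * 0.5,  # units of half-seconds
        "effect_direction": DIRECTIONS[(b4 >> 2) & 3],
        "display_effect": EFFECTS[effect] if effect < 3 else None,
    }
```

For the bytes 0x3F 0x40 0xCE 0x21, the border type bits combine to 5 (SHADOW_RIGHT) and the effect is a one-second FADE.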
DefineWindow0-7 (0x98-0x9F + 6 bytes)
DefineWindow0-7 creates one of the eight windows used by a caption decoder. A caption encoder should send this command periodically, even for pre-existing windows, so that a newly tuned-in caption decoder can begin displaying captions. When the command is issued for a pre-existing window, the pen style and window style can be left null. This tells the decoder not to change the current styles if they exist, and to initialize both to style 1 if the window does not exist in its context.
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|0|0|V|R|C|PRIOR| |P| VERT_ANCHOR | |  HOR_ANCHOR   |
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
47             40 39             32 31             24
V = visible ; R = row lock toggle ; C = column lock toggle
PRIOR = priority ; P = relative toggle
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|ANC_ID |ROW_CNT| |0|0| COL_COUNT | |0|0|WNSTY|PNSTY|
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
23             16 15              8 7               0
WNSTY = window style ; PNSTY = pen style
The parameters are as follows:
* priority, 3 bits, 0-7. A decoder is only required to display up to four windows. If more than four displayed windows are requested, the decoder should display the four highest priority windows.
* column lock, 1 bit. If set, column lock fixes the absolute number of columns to be displayed. If not set, a caption decoder may display more columns of text when the font size permits it, and a SetPenLocation command may go to a location outside the defined window size.
* row lock, 1 bit. If set, row lock fixes the absolute number of rows to be displayed. If not set, a caption decoder may display more rows of text when the font size permits it, and a SetPenLocation command may go to a location outside the defined window size.
* visible, 1 bit. If set, the window is displayed upon creation; if not set, the window is initially hidden.
* null, 2 bits. Null padding.
* anchor vertical, 7 bits. Vertical position of the window's anchor point. The range is normally 0-74. When the ''relative positioning'' bit is set, however, the range is 0-99.
* relative positioning, 1 bit. If set, ''anchor horizontal'' and ''anchor vertical'' represent relative coordinates (percentages) instead of regular coordinates.
* anchor horizontal, 8 bits. Horizontal position of the window's anchor point. The range is normally 0-209 when the stream's aspect ratio is 16:9, and 0-159 when the stream's aspect ratio is 4:3. When the ''relative positioning'' bit is set, however, the range is 0-99.
* row count, 4 bits. This is the number of rows of text the window will hold, assuming the STANDARD font size. The range is 0-15. NOTE: In practice a decoder must add one to the number to get the intended effect, i.e. 0 -> 1, 1 -> 2, etc.
* anchor ID, 4 bits. See the Anchor ID section for the valid values.
* column count, 6 bits. This is the number of columns of text the window will hold, assuming the STANDARD font size. The range is 0-31 for 4:3 streams, and 0-41 for 16:9 streams. NOTE: In practice a decoder must add one to the number to get the intended effect, i.e. 0 -> 1, 1 -> 2, etc.
* null, 2 bits. Null padding.
* pen style, 3 bits. If the value is zero and this is a new window, pen style one should be used for future characters. If the value is zero and this is an existing window, the previous pen style should continue to be used. For non-zero values the pen style should be set as if SetPenAttributes and SetPenColor were called with the parameters in the ''pen style'' table below.
* window style, 3 bits. If the value is zero and this is a new window, window style one should be used. If the value is zero and this is an existing window, the previous window style should continue to be used. For non-zero values the window style should be set as if SetWindowAttributes were called with the parameters in the ''window style'' table below.
* null, 2 bits. Null padding.
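The six DefineWindow parameter bytes can be unpacked as in the sketch below, which follows the bit layout given for the command. The function name and dictionary keys are hypothetical. Taking the window ID from the opcode (0x98 for window 0 through 0x9F for window 7) is an assumption drawn from the command's opcode range, and the row and column counts have one added, as the notes on those fields suggest.

```python
from typing import Any, Dict


def decode_define_window(opcode: int, params: bytes) -> Dict[str, Any]:
    """Unpack a DefineWindow0-7 command and its six parameter bytes."""
    if not 0x98 <= opcode <= 0x9F:
        raise ValueError("not a DefineWindow opcode")
    if len(params) != 6:
        raise ValueError("DefineWindow takes exactly 6 parameter bytes")
    p = params
    return {
        "window_id": opcode - 0x98,          # assumed mapping, see lead-in
        "visible": bool(p[0] & 0x20),
        "row_lock": bool(p[0] & 0x10),
        "column_lock": bool(p[0] & 0x08),
        "priority": p[0] & 0x07,
        "relative_positioning": bool(p[1] & 0x80),
        "anchor_vertical": p[1] & 0x7F,
        "anchor_horizontal": p[2],
        "anchor_id": p[3] >> 4,
        "row_count": (p[3] & 0x0F) + 1,      # transmitted value + 1
        "column_count": (p[4] & 0x3F) + 1,   # transmitted value + 1
        "window_style": (p[5] >> 3) & 0x07,  # 0 = keep or default to 1
        "pen_style": p[5] & 0x07,            # 0 = keep or default to 1
    }
```

A decoder would then apply the style rules described above: a style value of zero keeps the existing style for a known window and selects style 1 for a new one.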
Predefined Pen style
Unless stated otherwise, the predefined font size is standard, offset is normal, italics and underline are not set, edge type is none, foreground color is white, foreground opacity is solid, background color is black, background opacity is solid, and edge color is black.
1. Default
2. Monospaced Serif
3. Proportional Serif
4. Monospaced Sans Serif
5. Proportional Sans Serif
6. Monospaced Sans Serif - background opacity is transparent
7. Proportional Sans Serif - background opacity is transparent
Predefined Window style
Unless stated otherwise, the predefined justification is left, print direction is left-to-right, scroll direction is bottom-to-top, word wrap is off, display effect is snap, effect direction and speed are not set, fill color is black, fill opacity is solid, and border type is none.
1. CEA-608 Style PopUp
2. PopUp w/Transparent Background - fill opacity is transparent
3. CEA-608 Style PopUp Centered - justification is center
4. CEA-608 Style RollUp - word wrap is on
5. RollUp w/Transparent Background - word wrap is on; fill opacity is transparent
6. CEA-608 Style Centered RollUp - word wrap is on; justification is center
7. Ticker Tape - print direction is top-to-bottom; scroll direction is right-to-left
How to interpret the caption stream
Text/commands
Word wrap
It may sometimes be desirable for a caption decoder to perform word wrap. This can happen because the end user of the caption decoder specifies a different font than the encoder requests, or because the end user wishes to see more of the caption text than is normally possible. Note that SetWindowAttributes sets a word wrap flag. When set, it indicates that the subtitles are written with word wrap in mind, and it may be used as a hint to the decoder that word wrapping is safe. Word wrap can be performed on carriage return, space, and hyphen characters. However, neither the non-breaking space (0xA0 in the G1 Table) nor the non-breaking transparent space (0x21 in the G2 Table) should be considered a safe place to break a line.
Anchor ID
There are nine valid anchor IDs.
Fonts
CTA-708 supports eight font tags: undefined, monospaced serif, proportional serif, monospaced sans serif, proportional sans serif, casual, cursive, and small capitals. The first is not defined and should probably be avoided. However these fonts are implemented, it should be possible to underline and italicize them. Bold versions are not needed, but it should be possible to draw the outline of each letter in a different color and opacity than the fill. Finally, these fonts must allow superscripts and subscripts, and must support Latin-1 plus the additional symbols in CTA-708, such as the closed caption symbol and the dozen or so Unicode characters in this standard. For font examples, see the Wikipedia Fonts article.
Windows
The window addressable area should always be within the Safe-Title area, so that all addressable locations are within the display window if the monitor overscans the image onto a non-rectangular screen. If the video stream has a 16:9 aspect ratio, the addresses should be in the range 0..74 for the vertical addresses and 0..209 for the horizontal addresses. If the video stream has a 4:3 aspect ratio, the addresses should be in the range 0..74 for the vertical addresses and 0..159 for the horizontal addresses. For other aspect ratios, relative addressing should be used and both vertical and horizontal addresses should be in the range 0..99%. The window size should be scaled based on the font size. With this in mind, rows longer than 32 characters are discouraged even on 16:9 screens, so that the user may select fonts larger than specified.
Row and column locking
Row and column locking features are supported in the CTA-708-B standard, but the later CTA-708-C version assumes that both rows and columns are locked. The basic functionality is described below. In total, four combinations are provided:
1) Row locked and Column locked
2) Row unlocked and Column locked
3) Row locked and Column unlocked
4) Row unlocked and Column unlocked
1. Row locked and Column locked: If both rows and columns are locked, the window size cannot be extended in either rows or columns. For a window defined with, say, 3 rows and 10 columns, the text "ROWS AND COLUMNS ARE NOT LOCKED FOR EVER AND EVER AND EVER" arriving in row 0 looks like this (assuming that word wrapping is disabled):
    1. ROWS AND C
    2.
    3.
Since both are locked, the text cannot be extended beyond 10 columns, and it cannot be extended beyond row 0 either.
2. Row unlocked and Column locked: In this case the window can be extended up to the maximum row given in the DefineWindow command. The same text looks like this:
    1. ROWS AND C
    2. OLUMNS ARE
    3. NOT LOCKED
The row is unlocked, so the text can be extended up to the maximum number of rows given in the DefineWindow command.
3. Row locked and Column unlocked: In this case the window can be extended up to the maximum number of columns. As per the CTA-708 standard, the maximum number of columns for any window is 32. The same text then looks like this:
    1. ROWS AND COLUMNS ARE NOT LOCKED
    2.
    3.
The column is unlocked, so the text can be extended up to the maximum number of columns.
4. Row unlocked and Column unlocked: In this case the window can be extended in both rows and columns. The same text then looks like this:
    1. ROWS AND COLUMNS ARE NOT LOCKED
    2. FOR EVER AND EVER AND EVER
Since both are unlocked, the text can be extended up to 32 columns as well as across the total number of rows.
Implementation notes
* The minimum buffer size for each of the 63 possible services (Service Input Buffers) is 128 bytes.
* In a caption decoder the DelayCancel and Reset commands should be interpreted outside the buffering mechanism. It should be safe to scan just for the 0x8E and 0x8F codes.
* In a caption encoder the 0x8E and 0x8F values might need to be encoded in a parameter to another command. Commands can be split into several subcommands to avoid this problem.
* The closed caption icon in the G3 code set must not be rendered with rounded corners in a WTO country, due to trademark licensing problems.