CBDF Specification: Text Section

Version 1.0 (Phase II)

Document: 04-Text-Section • Date: 2026-03-25

1. Overview

The Text section contains the document's readable content: UTF-8 text interspersed with control codes that apply styles, insert elements, and mark semantic boundaries.

It sits between the second FS (end of Styles) and the third FS (start of Resources). It begins with a 4-byte little-endian (LE) length prefix, followed by STX (0x02), the content, and a closing ETX (0x03).

Structure:

[Length: 4 bytes LE] [STX] [content...] [ETX]
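
The framing above can be read with a few lines of Python. This is a minimal sketch, not a reference implementation: the function name `read_text_section` is hypothetical, and it assumes the length prefix counts every byte after the prefix (STX, content, and ETX), which the text above does not state explicitly.

```python
import struct

STX, ETX = 0x02, 0x03


def read_text_section(buf: bytes, offset: int = 0) -> bytes:
    """Return the content bytes found between STX and ETX.

    Assumption: the 4-byte LE length counts STX + content + ETX.
    """
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    body = buf[start:start + length]
    if length < 2 or len(body) != length:
        raise ValueError("truncated Text section")
    if body[0] != STX or body[-1] != ETX:
        raise ValueError("Text section must start with STX and end with ETX")
    return body[1:-1]
```

If the prefix turns out to count only the content bytes, only the slice bounds need to change.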

2. Plain-Text Extraction

A plain-text reader extracts readable text by:

  1. Decompressing the section (if compression is enabled in meta).
  2. Scanning all bytes between TEXT_START (0x02) and TEXT_END (0x03).
  3. Keeping only:
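    • Bytes >= 0x20 (printable ASCII and all bytes of multi-byte UTF-8 sequences)
    • 0x09 TAB = Horizontal tab
    • 0x0A LINE_BREAK = Line feed
    • 0x0D HORIZ_RULE = Horizontal rule, optionally replaced with "---" (its 1-byte style index is skipped)
  4. Stripping all other bytes 0x00-0x1F and their payloads. Payload bytes can themselves be >= 0x20, so each payload must be skipped before the filter in step 3 is applied.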

To strip payloads correctly, the plain-text extractor must understand which control codes have payloads and how long they are:

Code        Payload to skip
0x0D        1 byte (style index)
0x0E        1 + 1 + N bytes (type + length + target)
0x10        2 + N bytes (length LE + raw data)
0x11        1 byte (style index)
0x12        1 byte (style index)
0x13        1 byte (style index)
0x15        1 byte (element ID), or 3 if ID=0xFF
0x16        1 byte (image definition index)
0x19        2 bytes (type + style index)
0x1A        1 + 2 + N bytes (type + length LE + prompt)
0x1B        1 + variable (Phase III; skip code byte)
All others  0 bytes (no payload)
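
The extraction rules and payload lengths above can be sketched as a single pass over the content bytes. The names `extract_plain_text`, `FIXED_PAYLOAD`, and `rule_text` are hypothetical. The sketch assumes decompression has already happened and that the input is the content between STX and ETX. Because the 0x1B payload is only defined in Phase III, the sketch stops on that code rather than guessing its length. Applied literally, the rules insert no whitespace where a control code is removed, so adjacent text runs are concatenated.

```python
# Fixed payload sizes, in bytes, taken from the table above.
FIXED_PAYLOAD = {0x0D: 1, 0x11: 1, 0x12: 1, 0x13: 1, 0x16: 1, 0x19: 2}


def extract_plain_text(content: bytes, rule_text: str = "") -> str:
    """Return the readable text from the bytes between STX and ETX.

    `rule_text` is what a HORIZ_RULE (0x0D) becomes, for example "---".
    """
    out = bytearray()
    i = 0
    while i < len(content):
        code = content[i]
        i += 1
        if code >= 0x20 or code in (0x09, 0x0A):
            out.append(code)                  # kept as-is
        elif code == 0x0E:                    # type + length + target
            i += 2 + content[i + 1]
        elif code == 0x10:                    # 2-byte LE length + raw data
            i += 2 + int.from_bytes(content[i:i + 2], "little")
        elif code == 0x15:                    # element ID, 3 bytes if 0xFF
            i += 3 if content[i] == 0xFF else 1
        elif code == 0x1A:                    # type + 2-byte LE length + prompt
            i += 3 + int.from_bytes(content[i + 1:i + 3], "little")
        elif code == 0x1B:
            # Assumption: payload layout is not defined in this document.
            raise NotImplementedError("0x1B payload is defined in Phase III")
        else:
            if code == 0x0D:
                out += rule_text.encode("utf-8")
            i += FIXED_PAYLOAD.get(code, 0)   # 0 for codes without payload
    return out.decode("utf-8")
```

A production reader would also need bounds checks for truncated payloads; the sketch lets Python raise `IndexError` in that case.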

3. Text Section Structure

[TEXT_START]
  [Optional: SUBJECT_START block (styled subject)]
  [Body content: text + control codes]
[TEXT_END]

SUBJECT_START block (if present, for emails):

[SUBJECT_START] [STYLE_TEXT style_id] Styled subject text [STYLE_END]

The SUBJECT_START block appears inside the TEXT_START/TEXT_END region so the parser has a single mode for reading styled content.

4. Control Code Usage Patterns

4A. Applying Text Styles

Switch to a text style:

[STYLE_TEXT index]  -- applies text style #index from the Text Styles sub-table

Revert to previous style:

[STYLE_END]         -- pops the style stack

Example:

[STYLE_TEXT 0x00] This is default text. [STYLE_TEXT 0x01] This is bold.
[STYLE_END] This is default again.
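
The push/pop behaviour can be illustrated with a small interpreter that splits a body into styled runs. This is a sketch under narrow assumptions: `styled_runs` is a hypothetical helper, it handles only STYLE_TEXT (0x11) and STYLE_END (0x14), and it falls back to style 0 when the stack is empty, which the text above does not specify.

```python
STYLE_TEXT, STYLE_END = 0x11, 0x14


def styled_runs(content: bytes, default_style: int = 0):
    """Split content into (style_index, text) runs using a style stack."""
    stack = []
    runs = []
    buf = bytearray()

    def flush():
        # Emit the pending text with whichever style is on top of the stack.
        if buf:
            style = stack[-1] if stack else default_style
            runs.append((style, buf.decode("utf-8")))
            buf.clear()

    i = 0
    while i < len(content):
        code = content[i]
        if code == STYLE_TEXT:
            flush()
            stack.append(content[i + 1])   # push the new style index
            i += 2
        elif code == STYLE_END:
            flush()
            if stack:
                stack.pop()                # revert to the previous style
            i += 1
        else:
            buf.append(code)
            i += 1
    flush()
    return runs
```

Run on the example above, the third run comes back with style 0 because STYLE_END pops style 1 and exposes the style pushed before it.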

4B. Containers

Open a container:

[STYLE_CONTAINER index]  -- starts container with composite style #index

Close a container:

[BLOCK_END]              -- closes the current block and pops its style

Nested containers:

[STYLE_CONTAINER 0x00]
  Outer content
  [STYLE_CONTAINER 0x01]
    Inner content
  [BLOCK_END]
  More outer content
[BLOCK_END]
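
Nesting can be tracked with a stack of open containers. The sketch below builds a small tree from a body that contains only text, STYLE_CONTAINER (0x12), and BLOCK_END (0x17). The function name `parse_containers` and the dictionary layout are illustrative choices, not part of the format.

```python
STYLE_CONTAINER, BLOCK_END = 0x12, 0x17


def parse_containers(content: bytes):
    """Return a list of text strings and {"style", "children"} nodes."""
    root = {"style": None, "children": []}
    stack = [root]
    text = bytearray()

    def flush():
        # Attach pending text to the innermost open container.
        if text:
            stack[-1]["children"].append(text.decode("utf-8"))
            text.clear()

    i = 0
    while i < len(content):
        code = content[i]
        if code == STYLE_CONTAINER:
            flush()
            node = {"style": content[i + 1], "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
            i += 2
        elif code == BLOCK_END:
            flush()
            if len(stack) == 1:
                raise ValueError("BLOCK_END without an open container")
            stack.pop()
            i += 1
        else:
            text.append(code)
            i += 1
    flush()
    if len(stack) != 1:
        raise ValueError("unclosed container")
    return root["children"]
```

A full parser would push tables and item blocks onto the same stack, since BLOCK_END closes those as well.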

4C. Tables

A table opens with [STYLE_TABLE index]. Cells are separated by UNIT_SEP (0x1F) and rows by RECORD_SEP (0x1E). BLOCK_END (0x17) ends the table.

Example (2 rows, 3 columns):

[STYLE_TABLE 0x00] Name [UNIT_SEP] Age [UNIT_SEP] City [RECORD_SEP]
Alice [UNIT_SEP] 30 [UNIT_SEP] NYC [BLOCK_END]

Text styles can be applied within cells:

[STYLE_TABLE 0x00] [STYLE_TEXT 0x02] Header1 [STYLE_END] [UNIT_SEP]
[STYLE_TEXT 0x02] Header2 [STYLE_END] [RECORD_SEP] Data1 [UNIT_SEP]
Data2 [BLOCK_END]
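
Writing a table is mostly a matter of joining cells and rows with the two separators. The following sketch uses a hypothetical `encode_table` helper and assumes plain-text cells with no embedded control codes.

```python
STYLE_TABLE, BLOCK_END = 0x13, 0x17
RECORD_SEP, UNIT_SEP = 0x1E, 0x1F


def encode_table(rows, style_index: int = 0) -> bytes:
    """Encode rows of plain-text cells as a STYLE_TABLE ... BLOCK_END block."""
    encoded_rows = [
        bytes([UNIT_SEP]).join(cell.encode("utf-8") for cell in row)
        for row in rows
    ]
    body = bytes([RECORD_SEP]).join(encoded_rows)
    return bytes([STYLE_TABLE, style_index]) + body + bytes([BLOCK_END])
```

Note that no separator follows the last cell of a row or the last row of the table; BLOCK_END terminates both.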

4D. Lists and Nav Bars

Start an item block: [ITEM_BLOCK type style_index]

Items separated by UNIT_SEP (0x1F). Block ends with BLOCK_END (0x17).

Unordered list (type=0):

[ITEM_BLOCK 0x00 0x00] First item [UNIT_SEP] Second item [UNIT_SEP]
Third item [BLOCK_END]

Ordered list (type=1):

[ITEM_BLOCK 0x01 0x00] Step one [UNIT_SEP] Step two [UNIT_SEP]
Step three [BLOCK_END]

Nav bar (type=2):

[ITEM_BLOCK 0x02 0x00]
  [LINK_START 0x00 0x05 /home] Home [LINK_END]
[UNIT_SEP]
  [LINK_START 0x00 0x06 /about] About [LINK_END]
[UNIT_SEP]
  [LINK_START 0x00 0x08 /contact] Contact [LINK_END]
[BLOCK_END]
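
All three item block types share one layout, so a single encoder covers them. In this sketch `encode_item_block` and the type constants are hypothetical names. Items may be strings or already-encoded bytes, which is how a nav bar item carrying a link would be passed in.

```python
ITEM_BLOCK, BLOCK_END, UNIT_SEP = 0x19, 0x17, 0x1F
UNORDERED, ORDERED, NAV_BAR = 0, 1, 2


def encode_item_block(items, block_type: int = UNORDERED,
                      style_index: int = 0) -> bytes:
    """Encode items as [ITEM_BLOCK type style] item US item ... BLOCK_END."""
    encoded = [
        item if isinstance(item, bytes) else item.encode("utf-8")
        for item in items
    ]
    header = bytes([ITEM_BLOCK, block_type, style_index])
    return header + bytes([UNIT_SEP]).join(encoded) + bytes([BLOCK_END])
```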

4E. Links

[LINK_START type length target_bytes] visible link text [LINK_END]

Types:

Type              Format
0 = URL           [LINK_START 0x00 len url_bytes] text [LINK_END]
1 = QWeb Page ID  [LINK_START 0x01 len page_id_bytes] text [LINK_END]
2 = Mailbox       [LINK_START 0x02 0x07 mailbox_7_bytes] text [LINK_END]
3 = Action        [LINK_START 0x03 0x01 action_id] text [LINK_END]

Example:

Visit [LINK_START 0x00 0x13 https://example.com] our website [LINK_END] today.
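
Because the length byte must match the target exactly, it is safer to compute it than to write it by hand. The sketch below uses a hypothetical `encode_link` helper. The fixed target sizes for Mailbox (7 bytes) and Action (1 byte) come from the table above.

```python
LINK_START, LINK_END = 0x0E, 0x0F
LINK_URL, LINK_PAGE_ID, LINK_MAILBOX, LINK_ACTION = 0, 1, 2, 3

# Target sizes that the type table fixes; other types are free-length.
FIXED_TARGET_LEN = {LINK_MAILBOX: 7, LINK_ACTION: 1}


def encode_link(link_type: int, target: bytes, text: str) -> bytes:
    """Encode [LINK_START type len target] text [LINK_END]."""
    if len(target) > 0xFF:
        raise ValueError("target does not fit in a 1-byte length")
    expected = FIXED_TARGET_LEN.get(link_type)
    if expected is not None and len(target) != expected:
        raise ValueError(f"link type {link_type} needs a {expected}-byte target")
    header = bytes([LINK_START, link_type, len(target)])
    return header + target + text.encode("utf-8") + bytes([LINK_END])
```

For instance, `encode_link(LINK_URL, b"/home", "Home")` produces a length byte of 0x05 for the 5-byte target.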

4F. Images

Insert image at current position: [IMAGE image_def_index]

The index references an Image Definition in the Styles section's Image Definitions sub-table. The Image Definition contains the Resource ID, width, height, fit mode, alignment, and border style.

Example:

Here is our logo: [IMAGE 0x01] And here is more text.
(Inserts image defined by Image Definition #1 in the Styles section)

4G. AI Prompts

[AI_PROMPT type len_lo len_hi prompt_bytes]

Types: 0=style, 1=image, 2=layout

Example (style prompt, 46 bytes):

[AI_PROMPT 0x00 0x2E 0x00 Create a cloudy sky style with soft blue tones]
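
The prompt length is a 2-byte little-endian value, so the low byte comes first. A minimal encoder sketch, with `encode_ai_prompt` as a hypothetical name, computes the length from the UTF-8 bytes of the prompt:

```python
import struct

AI_PROMPT = 0x1A
PROMPT_STYLE, PROMPT_IMAGE, PROMPT_LAYOUT = 0, 1, 2


def encode_ai_prompt(prompt_type: int, prompt: str) -> bytes:
    """Encode [AI_PROMPT type len_lo len_hi prompt_bytes]."""
    data = prompt.encode("utf-8")
    if len(data) > 0xFFFF:
        raise ValueError("prompt does not fit in a 2-byte length")
    # "<H" packs an unsigned 16-bit integer, low byte first.
    return bytes([AI_PROMPT, prompt_type]) + struct.pack("<H", len(data)) + data
```

A 300-byte prompt, for example, is written with length bytes 0x2C 0x01.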

4H. Element IDs

Assign a stable ID to the next element: [ELEMENT_ID id] [element]

IDs must be unique and stable across re-serialization.

[ELEMENT_ID 0x01] [STYLE_CONTAINER 0x05] This container has ID=1 [BLOCK_END]
[ELEMENT_ID 0x02] [IMAGE 0x03]  (image definition #3, element ID=2)

Extended ID (for documents with >254 elements): [ELEMENT_ID 0xFF] [id_lo] [id_hi] (2-byte LE extended ID)
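
The short and extended forms can be chosen automatically from the ID value. This sketch uses a hypothetical `encode_element_id` helper and assumes that IDs 0x00 through 0xFE use the 1-byte form; whether 0x00 is a valid ID is not stated above.

```python
ELEMENT_ID = 0x15
EXTENDED_MARKER = 0xFF


def encode_element_id(element_id: int) -> bytes:
    """Encode [ELEMENT_ID id], or the extended 2-byte LE form for id >= 0xFF."""
    if element_id < 0 or element_id > 0xFFFF:
        raise ValueError("element ID must fit in 16 bits")
    if element_id < EXTENDED_MARKER:
        return bytes([ELEMENT_ID, element_id])
    # 0xFF is reserved as the marker, so 255 itself needs the extended form.
    return bytes([ELEMENT_ID, EXTENDED_MARKER]) + element_id.to_bytes(2, "little")
```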

4I. Paragraph and Page Breaks

Command                   Description
[PARA_BREAK]              Inserts paragraph spacing (larger than a line break)
[PAGE_BREAK]              Starts a new rendering page
[LINE_BREAK]              Standard line break
[HORIZ_RULE style_index]  Draws a styled horizontal line

5. Worked Examples

5A. Simple Styled Email: "Hello World" in bold red

Styles section: Layout 0x00 (main only), no page background.
Text styles sub-table: 2 records, base tier.
  Style 0: Arial 12pt, normal, black on transparent.
  Style 1: Arial 14pt, bold, red (#F800) on transparent.

Text section hex dump:

02                      STX
01                      SOH (styled subject)
  11 01                 DC1, apply text style #1 (bold red)
  47 72 65 65 74 69     "Greeti"
  6E 67                 "ng"
  14                    DC4 (pop style)
11 00                   DC1, apply text style #0 (default)
48 65 6C 6C 6F 20       "Hello "
11 01                   DC1, apply text style #1 (bold red)
57 6F 72 6C 64 21       "World!"
14                      DC4 (pop)
03                      ETX

Plain text extraction result (runs shown space-separated for readability): Greeting Hello World!

5B. Two-Column Layout with Nav Bar

Layout byte: 0x11 (header + main, 2 columns)

02                          STX
12 00                       DC2, container style #0 (header panel)
  19 02 00                  EM type=2(nav), nav style #0
    0E 00 05 2F686F6D65     SO URL "/home"
    48 6F 6D 65             "Home"
    0F                      SI
  1F                        US (next nav item)
    0E 00 06 2F61626F7574   SO URL "/about"
    41 62 6F 75 74          "About"
    0F                      SI
  17                        ETB (end nav bar)
17                          ETB (end header container)
12 01                       DC2, container style #1 (left column)
  11 00                     DC1, text style #0
  4C 65 66 74 20            "Left "
  63 6F 6C 75 6D 6E        "column"
17                          ETB
12 02                       DC2, container style #2 (right column)
  11 00                     DC1, text style #0
  52 69 67 68 74 20         "Right "
  63 6F 6C 75 6D 6E        "column"
17                          ETB
03                          ETX

Plain text extraction (runs shown space-separated for readability): Home About Left column Right column

5C. Meta-Only Message (EOF flag)

03 00                   Pair count: 3
1E 01 01                Key 30 (version), length 1, value 1
21 01 01                Key 33 (EOF flag), length 1, value 1
02 05 48 65 6C 6C 6F    Key 2 (subject), length 5, "Hello"

Total: 15 bytes. No FS markers, no styles, no text, no resources.
Client displays "Hello" as the message.
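
The dump above can be decoded with a short loop. The layout used here (2-byte LE pair count, then a 1-byte key, a 1-byte length, and the value) is inferred from this one example only; the Meta section's own specification is authoritative, and `parse_meta_pairs` is a hypothetical name.

```python
def parse_meta_pairs(data: bytes) -> dict:
    """Parse key/value pairs as laid out in the meta-only example.

    Assumption: 2-byte LE count, then (key: 1 byte, length: 1 byte, value).
    """
    count = int.from_bytes(data[0:2], "little")
    pairs = {}
    i = 2
    for _ in range(count):
        key, length = data[i], data[i + 1]
        pairs[key] = data[i + 2:i + 2 + length]
        i += 2 + length
    return pairs
```

With the bytes from the dump, key 2 maps to `b"Hello"` and keys 30 and 33 each map to a single byte with value 1.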

5D. Table Example

02                      STX
13 00                   DC3, table style #0
  11 02                 DC1, text style #2 (header style)
  4E 61 6D 65           "Name"
  14                    DC4
1F                      US (next cell)
  11 02                 DC1, text style #2
  41 67 65              "Age"
  14                    DC4
1E                      RS (next row)
  41 6C 69 63 65        "Alice"
1F                      US
  33 30                 "30"
1E                      RS
  42 6F 62              "Bob"
1F                      US
  32 35                 "25"
17                      ETB (end table)
03                      ETX

Plain text extraction (cells shown space-separated for readability): Name Age Alice 30 Bob 25
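
To recover the table structure rather than flat text, a reader can split on the two separators while skipping the style codes. The sketch below uses a hypothetical `decode_table` helper that takes the bytes between STX and ETX and understands only the codes used in this example: STYLE_TABLE, STYLE_TEXT, STYLE_END, UNIT_SEP, RECORD_SEP, and BLOCK_END.

```python
STYLE_TABLE, STYLE_TEXT, STYLE_END = 0x13, 0x11, 0x14
BLOCK_END, RECORD_SEP, UNIT_SEP = 0x17, 0x1E, 0x1F


def decode_table(content: bytes):
    """Return (table_style, rows), where rows is a list of lists of cell text."""
    if content[0] != STYLE_TABLE:
        raise ValueError("content does not start with STYLE_TABLE")
    table_style = content[1]
    rows = [[]]
    cell = bytearray()
    i = 2
    while i < len(content):
        code = content[i]
        i += 1
        if code == STYLE_TEXT:
            i += 1                           # skip the style index payload
        elif code == STYLE_END:
            pass                             # no payload
        elif code in (UNIT_SEP, RECORD_SEP, BLOCK_END):
            rows[-1].append(cell.decode("utf-8"))
            cell.clear()
            if code == RECORD_SEP:
                rows.append([])              # start the next row
            elif code == BLOCK_END:
                return table_style, rows
        else:
            cell.append(code)
    raise ValueError("table is missing BLOCK_END")
```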