CBDF Specification: Document Structure

Version 1.0 (Phase II)

Document: 01-Document-Structure • Date: 2026-03-25

1. Overview

A CBDF document is a binary file divided into up to 5 major sections, separated by FS (0x1C) markers. Each section has a specific purpose and a defined internal structure.

Section order:

[Meta] FS [Styles] FS [Text] FS [Resources] FS [Logic]

The order places Text before Resources to enable "abortable downloads": clients can close the network stream after receiving text, skipping image data entirely.


2. Section Overview

#	Section	Length Prefix	Compressed?	Required?	Description
1	Meta	No (KV pairs)	Never	Yes	Document envelope/metadata
2	Styles	4 bytes LE	Yes*	Yes**	Visual formatting tables
3	Text	4 bytes LE	Yes*	Yes**	Marked-up body content
4	Resources	4 bytes LE	Never	No	Binary blobs (images, etc.)
5	Logic	4 bytes LE	TBD	No	Executable code (Phase III)

Notes

* Styles and Text are compressed together if compression is enabled. The compressed blob sits between the first FS and the third FS; those two markers themselves are not compressed.

** May be empty (zero-length), but the FS separators must still be present unless the meta section contains an EOF flag (see Section 4B).

3. Phase I Structure (backward compatible)

[Pair Count: 2 bytes LE] [Meta KV pairs]
0x1C
0x1C
0x02 [Plain text body] EOF

Phase I has no styles (empty section between the two FS markers), no resources, and no logic. The body starts with STX (0x02) and continues until EOF. Version byte in meta = 0.
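
A minimal Python sketch of locating the Phase I body is shown here. It assumes the caller already knows where the meta section ends, and it treats "EOF" as the end of the buffer rather than a sentinel byte; both are assumptions, and the function name `read_phase1_body` is hypothetical.

```python
FS = 0x1C   # section separator
STX = 0x02  # start of the plain-text body

def read_phase1_body(buf: bytes, meta_end: int) -> bytes:
    """Return the Phase I body bytes that follow the meta section.

    meta_end is the offset just past the last meta KV pair.
    Assumption: "EOF" means the end of the buffer, not a sentinel byte.
    """
    marker = bytes([FS, FS, STX])  # empty styles section, then STX
    if buf[meta_end:meta_end + 3] != marker:
        raise ValueError("expected FS FS STX after the meta section")
    return buf[meta_end + 3:]
```

The body is returned as raw bytes because this page does not state the Phase I text encoding.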

4. Phase II Structure

4A. Standard Document (version >= 1, NO compression)

[Pair Count: 2 bytes LE] [Meta KV pairs]
0x1C [Styles Length: 4 LE] [Styles section bytes]
0x1C [Text Length: 4 LE] [0x02 ... markup ... 0x03]
0x1C [Resources Length: 4 LE] [Resource blobs]
0x1C [Logic -- empty for Phase II]

Length prefix convention

The 4-byte LE length appears AFTER the FS marker and is NOT included in the length value itself. The length counts only the section content bytes that follow it.

FS [Length=N: 4 bytes] [N bytes of section data]

FS markers are ALWAYS present, even for empty sections. An empty section is: [FS] [0x00 0x00 0x00 0x00] (FS + 4-byte zero length).
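
The length prefix convention can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the length is treated as an unsigned 32-bit integer, and the helper names `read_section` and `write_section` are hypothetical.

```python
import struct

FS = 0x1C  # section separator byte

def write_section(content: bytes) -> bytes:
    """Emit FS, then a 4-byte LE length, then the content bytes."""
    return bytes([FS]) + struct.pack("<I", len(content)) + content

def read_section(buf: bytes, pos: int) -> tuple[bytes, int]:
    """Read one section that starts with FS at offset pos.

    Returns (content, offset just past the content).
    Assumption: the length field is unsigned.
    """
    if pos >= len(buf) or buf[pos] != FS:
        raise ValueError(f"expected FS (0x1C) at offset {pos}")
    if pos + 5 > len(buf):
        raise ValueError("truncated length prefix")
    # The length counts only the content, not the FS or the prefix itself.
    (length,) = struct.unpack_from("<I", buf, pos + 1)
    start = pos + 5
    end = start + length
    if end > len(buf):
        raise ValueError("section length runs past the end of the buffer")
    return buf[start:end], end
```

With these helpers, `write_section(b"")` yields the five-byte empty section described above, and `read_section` can be called repeatedly, feeding each returned offset into the next call.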

4A2. Standard Document (version >= 1, WITH compression)

When meta key 31 (Compression Type) is non-zero, the Styles and Text sections are compressed together into a single blob (semantic encoding, type 5, uses the separate layout in Section 6B2). The structure between the first and third FS markers changes:

[Pair Count: 2 bytes LE] [Meta KV pairs]
0x1C [Compressed Length: 4 LE] [Decompressed Length: 4 LE] [compressed data]
0x1C [Resources Length: 4 LE] [Resource blobs]
0x1C [Logic -- empty for Phase II]

Compressed blob header:

[Compressed Length: 4 bytes LE]   -- bytes of compressed data that follow
[Decompressed Length: 4 bytes LE] -- size after decompression

The compressed data, once decompressed, yields:

[Styles Length: 4 LE] [Styles section bytes]
0x1C
[Text Length: 4 LE] [0x02 ... markup ... 0x03]

Note

The middle FS (between Styles and Text) is INSIDE the compressed blob. After decompression, the parser finds individual section length prefixes and the FS separator as normal.

Parser flow:

Read meta (uncompressed).
Read first FS.
Read Compressed Length (4 bytes) and Decompressed Length (4 bytes).
Read Compressed Length bytes of data.
Decompress to Decompressed Length bytes.
Parse decompressed data: [Styles Len][Styles][FS][Text Len][Text].
Read third FS (after compressed blob).
Read Resources section (uncompressed).
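
The parser flow above can be sketched in Python for Compression Type 1. The sketch assumes the compressed data is a standard zlib stream (the spec only says "DEFLATE/zlib") and that length fields are unsigned; `build_compressed_blob` and `parse_compressed_blob` are hypothetical names.

```python
import struct
import zlib

FS = 0x1C

def build_compressed_blob(styles: bytes, text: bytes) -> bytes:
    """Build: FS, Compressed Length, Decompressed Length, compressed data."""
    inner = (struct.pack("<I", len(styles)) + styles
             + bytes([FS])  # the middle FS lives inside the blob
             + struct.pack("<I", len(text)) + text)
    packed = zlib.compress(inner)  # assumption: zlib stream framing
    return bytes([FS]) + struct.pack("<II", len(packed), len(inner)) + packed

def parse_compressed_blob(buf: bytes, pos: int) -> tuple[bytes, bytes, int]:
    """Return (styles, text, offset of the FS that follows the blob)."""
    if buf[pos] != FS:
        raise ValueError("expected first FS")
    comp_len, decomp_len = struct.unpack_from("<II", buf, pos + 1)
    start = pos + 9
    end = start + comp_len
    if end > len(buf):
        raise ValueError("compressed data is truncated")
    inner = zlib.decompress(buf[start:end])
    if len(inner) != decomp_len:
        raise ValueError("decompressed size does not match the header")
    (styles_len,) = struct.unpack_from("<I", inner, 0)
    mid = 4 + styles_len
    if mid >= len(inner) or inner[mid] != FS:
        raise ValueError("expected FS between Styles and Text")
    (text_len,) = struct.unpack_from("<I", inner, mid + 1)
    styles = inner[4:mid]
    text = inner[mid + 5:mid + 5 + text_len]
    return styles, text, end
```

The returned offset points at the third FS, so the Resources section can then be read with the ordinary length-prefixed logic.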

4B. Meta-Only Document (EOF flag)

[Pair Count: 2 bytes LE] [Meta KV pairs including EOF flag]

When the meta section contains an EOF flag key (value = 1), the document consists of ONLY the meta section. No FS separators follow. No styles, text, resources, or logic sections exist.

This enables ultra-compact messages:

Plain text SMS-style messages (subject = the message)
Simple email notifications
System status messages

Example: "Hello" message

[0x03 0x00]           Pair count: 3
[key] [0x01] [0x01]   Version: 1
[key] [0x01] [0x01]   EOF flag: 1
[key] [0x05] [Hello]  Subject: "Hello"
Total: 15 bytes (2 + 3 + 3 + 7)
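
A small Python sketch of building this message follows. The key IDs are placeholders chosen only for illustration, since the real values are defined in the Meta Key Table and not on this page; `encode_pair` and `build_meta_only` are likewise hypothetical names.

```python
import struct

# Hypothetical key IDs, for illustration only. The real IDs are defined
# in the Meta Key Table.
KEY_VERSION = 0x01
KEY_EOF_FLAG = 0x02
KEY_SUBJECT = 0x03

def encode_pair(key_id: int, value: bytes) -> bytes:
    """Encode one KV pair: 1-byte key, 1-byte value length, value."""
    if len(value) > 255:
        raise ValueError("value does not fit a 1-byte length field")
    return bytes([key_id, len(value)]) + value

def build_meta_only(subject: str) -> bytes:
    """Build a meta-only document: pair count followed by three pairs."""
    pairs = [
        encode_pair(KEY_VERSION, b"\x01"),
        encode_pair(KEY_EOF_FLAG, b"\x01"),
        encode_pair(KEY_SUBJECT, subject.encode("utf-8")),
    ]
    return struct.pack("<H", len(pairs)) + b"".join(pairs)

doc = build_meta_only("Hello")
print(len(doc))  # 15: 2 (count) + 3 (version) + 3 (EOF flag) + 7 (subject)
```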

5. Meta Section ("The Envelope")

The meta section is a sequence of key/value pairs. It is NEVER compressed, allowing clients to read it without decompression. In QMail beta this is the private file_type=0 object, not the public Tell envelope; beacon-visible Tells must not carry subject, preview text, filenames, or labels.

5A. Meta Format

[Pair Count: 2 bytes LE] (total number of KV pairs)
[Key ID: 1 byte] [Value Length: 1 byte] [Value: N bytes]
[Key ID: 1 byte] [Value Length: 1 byte] [Value: N bytes]
...

Arrays (e.g., multiple To recipients) use repeated keys. Each element is its own KV pair with the same Key ID.
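
The framing above, including repeated keys for arrays, could be parsed along these lines. This is a sketch, not a normative parser; the function name `parse_meta` and the key IDs used in the usage note are hypothetical.

```python
import struct
from collections import defaultdict

def parse_meta(buf: bytes) -> tuple[dict[int, list[bytes]], int]:
    """Parse the meta section at the start of buf.

    Returns (pairs, offset just past the meta section). Every key maps to
    a list so that repeated keys (arrays) keep all of their elements.
    """
    if len(buf) < 2:
        raise ValueError("missing pair count")
    (count,) = struct.unpack_from("<H", buf, 0)
    pos = 2
    pairs: dict[int, list[bytes]] = defaultdict(list)
    for _ in range(count):
        if pos + 2 > len(buf):
            raise ValueError("truncated pair header")
        key_id, length = buf[pos], buf[pos + 1]
        pos += 2
        if pos + length > len(buf):
            raise ValueError("truncated pair value")
        pairs[key_id].append(buf[pos:pos + length])
        pos += length
    return dict(pairs), pos
```

For example, a meta section holding two pairs with the same placeholder key `0x10` yields a two-element list under that key, in the order the pairs appeared.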

5B. Meta Key Table

The complete meta key table (all keys, value sizes, formats, phases, and the mailbox address layout) is maintained on a single authoritative page to avoid duplication. See CBDF Meta Section — Meta Key Table.

This Document Structure page only describes WHERE the meta section sits in the document and its key/value framing (Section 5A above). The specific keys, their meanings, and the 7-byte mailbox address format are documented on the Meta Section page.

6. Compression

When meta key 31 (Compression Type) is present and non-zero:

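Scope: The Styles and Text sections are compressed TOGETHER into a single blob with a two-field header. See Section 4A2 for the exact layout. Semantic encoding (type 5) is the exception: it replaces only the Text section and uses the layout in Section 6B2.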

What is compressed:

Styles section (including its length prefix)
The FS marker between Styles and Text
Text section (including its length prefix)

What is NOT compressed:

Meta section (must be readable for envelope display)
Resources section (images are already compressed formats)
The first FS marker (before the compressed blob)
The third FS marker (after the compressed blob, before Resources)

Compressed blob header (after the first FS):

[Compressed Length: 4 bytes LE]    -- how many compressed bytes follow
[Decompressed Length: 4 bytes LE]  -- expected size after decompression

This allows the parser to:

Allocate a decompression buffer of the right size.
Skip the compressed blob (using Compressed Length) to reach Resources without decompressing, if only images are needed.
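
Skipping the blob without decompressing it might look like the following Python sketch. It assumes unsigned length fields, and `resources_offset` is a hypothetical helper name.

```python
import struct

FS = 0x1C

def resources_offset(buf: bytes, first_fs: int) -> int:
    """Return the offset of the FS that precedes Resources.

    Only the Compressed Length field is used; nothing is decompressed.
    """
    if buf[first_fs] != FS:
        raise ValueError("expected first FS")
    (comp_len,) = struct.unpack_from("<I", buf, first_fs + 1)
    # Skip FS (1 byte), the two header fields (8 bytes), and the data.
    nxt = first_fs + 1 + 8 + comp_len
    if nxt >= len(buf) or buf[nxt] != FS:
        raise ValueError("expected FS after the compressed blob")
    return nxt
```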

Compression algorithms:

Value	Algorithm
`0`	None (no compression)
`1`	DEFLATE/zlib
`2`	LZ4
`3`	Zstandard (Zstd)
`4`	Brotli
`5`	Semantic encoding (see Section 6B)
`6-255`	Reserved
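
One way a client might dispatch on this table is sketched below in Python. Only types 0 and 1 are handled, because the Python standard library covers zlib but not LZ4, Zstandard, or Brotli on all versions; the enum and function names are hypothetical, and treating type 1 as a zlib stream is an assumption.

```python
import zlib
from enum import IntEnum

class CompressionType(IntEnum):
    NONE = 0
    DEFLATE = 1
    LZ4 = 2
    ZSTD = 3
    BROTLI = 4
    SEMANTIC = 5

def decompress(kind: int, data: bytes) -> bytes:
    """Decompress data according to the Compression Type value."""
    if not 0 <= kind <= 255:
        raise ValueError("compression type must fit in one byte")
    if kind >= 6:
        raise ValueError(f"compression type {kind} is reserved")
    ctype = CompressionType(kind)
    if ctype is CompressionType.NONE:
        return data
    if ctype is CompressionType.DEFLATE:
        return zlib.decompress(data)  # assumption: zlib stream framing
    # Types 2-5 need a third-party codec or a semantic model.
    raise NotImplementedError(f"{ctype.name} is not handled in this sketch")
```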

6B. Semantic Encoding (Compression Type 5)

Semantic encoding replaces the Text section's UTF-8 content with a compact latent-space representation that an AI model can reconstruct into equivalent text. Instead of transmitting the actual words, the sender encodes the MEANING into a small token sequence or embedding vector. The receiver's identical model decodes this back into natural language text.

This can achieve dramatic size reductions for long documents (e.g., a 10KB email body could become a few hundred bytes) but is inherently LOSSY -- the reconstructed text conveys the same meaning but may use different words.

6B1. Requirements

The sender and receiver MUST have the exact same model (same ID and version). Even minor model updates change the latent space, which can produce incorrect or garbled output.
The meta section MUST include:
- Key 31 (Compression Type) = 5 (semantic encoding)
- Key 38 (Semantic Model): Identifies the model. Format: [Model ID: 4 bytes] [Version Hash: 16 bytes]
- Key 36 (Preview Text): REQUIRED when semantic encoding is used. Provides a plain UTF-8 fallback.
The meta section SHOULD include:
- Key 35 (AI Summary): A human-readable summary as additional fallback.
- Key 39 (Semantic Flags): Bit 0: Lossy OK, Bit 1: Exact words matter.
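
The requirements above could be checked with a sketch like the following. It assumes meta has already been parsed into a mapping from key ID to a list of values, that Compression Type and Semantic Flags are single-byte values, that the Model ID is an unsigned LE integer, and that bit 0 is the least significant bit. All function names are hypothetical.

```python
import struct

KEY_COMPRESSION_TYPE = 31
KEY_PREVIEW_TEXT = 36
KEY_SEMANTIC_MODEL = 38

def parse_semantic_model(value: bytes) -> tuple[int, bytes]:
    """Split key 38 into (Model ID, 16-byte Version Hash)."""
    if len(value) != 20:
        raise ValueError("key 38 must be 4 + 16 = 20 bytes")
    (model_id,) = struct.unpack_from("<I", value, 0)
    return model_id, value[4:20]

def decode_flags(value: bytes) -> dict[str, bool]:
    """Decode key 39. Assumption: bit 0 is the least significant bit."""
    bits = value[0]
    return {"lossy_ok": bool(bits & 0x01),
            "exact_words_matter": bool(bits & 0x02)}

def check_semantic_meta(meta: dict[int, list[bytes]]) -> list[str]:
    """Return a list of violated MUST requirements (empty if none)."""
    problems = []
    if meta.get(KEY_COMPRESSION_TYPE, [b""])[0] != b"\x05":
        problems.append("key 31 (Compression Type) must be 5")
    if KEY_SEMANTIC_MODEL not in meta:
        problems.append("key 38 (Semantic Model) is missing")
    elif len(meta[KEY_SEMANTIC_MODEL][0]) != 20:
        problems.append("key 38 (Semantic Model) must be 20 bytes")
    if not meta.get(KEY_PREVIEW_TEXT):
        problems.append("key 36 (Preview Text) is missing")
    return problems
```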

6B2. Document Structure with Semantic Encoding

[Pair Count: 2 LE] [Meta KV pairs (including model ID and preview text)]
0x1C [Styles Length: 4 LE] [Styles section -- normal, not semantically encoded]
0x1C [Semantic Payload Length: 4 LE] [Semantic tokens/embedding]
0x1C [Resources Length: 4 LE] [Resources -- normal]
0x1C [Logic]

The Styles section is NOT semantically encoded (it's already compact binary). Only the Text section is replaced with semantic tokens.

6B3. Receiver Behavior

If the receiver has the matching model:

Read meta (including model ID and version hash).
Verify the local model matches (same ID and version hash).
Parse styles section normally.
Read semantic payload.
Decode semantic payload using the matching model → produces UTF-8 text.
Render the decoded text with styles applied.

If the receiver does NOT have the matching model:

Read meta.
Display Preview Text (key 36) and/or AI Summary (key 35) as the message body.
Optionally display a notice: "This message uses semantic encoding. Install [model name] for full rendering."
The Styles section and Resources section are still fully usable -- the client can render a styled version of the Preview Text.
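
The two receiver paths can be sketched as a single decision function. The sketch assumes parsed meta as a mapping from key ID to a list of values, compares the raw 20-byte key 38 value against the locally installed model identifier, and prefers Preview Text over AI Summary when falling back; that ordering is one possible choice, since the spec allows either or both. `choose_body` and the `decode` callable are hypothetical.

```python
from typing import Callable, Optional

KEY_AI_SUMMARY = 35
KEY_PREVIEW_TEXT = 36
KEY_SEMANTIC_MODEL = 38

def choose_body(
    meta: dict[int, list[bytes]],
    payload: bytes,
    local_model: Optional[bytes],
    decode: Callable[[bytes], str],
) -> str:
    """Return the text to render for a semantically encoded message.

    local_model is the 20-byte identifier (Model ID + Version Hash) of
    the installed model, or None if no model is installed.
    """
    wanted = meta.get(KEY_SEMANTIC_MODEL, [None])[0]
    if wanted is not None and wanted == local_model:
        return decode(payload)  # matching model: full reconstruction
    # No matching model: fall back to the plain-text keys in meta.
    for key in (KEY_PREVIEW_TEXT, KEY_AI_SUMMARY):
        values = meta.get(key)
        if values:
            return values[0].decode("utf-8")
    raise ValueError("no matching model and no fallback text in meta")
```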

6B4. When to Use Semantic Encoding

Good Candidates	Bad Candidates
Long conversational emails (high redundancy)	Legal documents, contracts (exact wording matters)
Newsletter content (summaries, descriptions)	Technical specifications (precision required)
Automated notifications (system-generated text)	Poetry, literature (artistic word choice matters)
Bulk messages to groups (bandwidth savings multiply)	Short messages (overhead exceeds savings)

6B5. Future Considerations

A registry of standardized Semantic Model IDs will be needed as this feature matures.
Hybrid encoding (some paragraphs semantic, some literal) may be added in Phase III.
Semantic encoding of the Styles section itself overlaps with the AI prompt feature (SUB 0x1A) and is not duplicated here.

7. Default Style Sets

When meta key 32 (Default Style Set) is >= 1:

Value 1: Use the client's built-in default style set. The styles section may be empty or contain delta overrides.
Value 2-255: Named style set IDs. Predefined style collections (e.g., "corporate," "newsletter," "casual") that clients recognize.

Delta encoding: When a default style set is active, any records present in the styles section REPLACE the corresponding default record at the same index. Records not present use the default.
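
Delta encoding can be illustrated with a short Python sketch. It models the default set as a list of opaque records and the document's overrides as a mapping from record index to record; how a record carries its index on the wire is not described on this page, so that representation is an assumption, as is rejecting indices outside the default set. `apply_style_deltas` is a hypothetical name.

```python
def apply_style_deltas(
    defaults: list[bytes], overrides: dict[int, bytes]
) -> list[bytes]:
    """Return the effective style records for a document.

    Records present in overrides replace the default at the same index;
    every other index keeps its default record.
    """
    merged = list(defaults)  # copy so the default set is not modified
    for index, record in overrides.items():
        if not 0 <= index < len(merged):
            # Assumption: the page does not define this case.
            raise ValueError(f"override index {index} has no default record")
        merged[index] = record
    return merged
```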

Client themes: Users may apply visual themes (dark mode, high contrast, custom fonts) that override document styles. Themes are a client-side rendering concern and are not part of the CBDF format specification.

8. Rendering Model

Phase A: Immediate rendering.

Read meta section (always uncompressed).
If EOF flag: display subject as message. Done.
Decompress styles + text (if compressed).
Parse styles section (build style tables).
Parse text section (render content with image placeholders).
Display rendered content to user.

Phase B: Progressive resource loading (optional).

Stream resources section.
Replace image placeholders with actual images as they arrive.
Client may skip this phase entirely (abortable download).

This is analogous to how web browsers progressively render pages -- text and layout appear first, images fill in afterward.

9. Byte Order

All multi-byte integers in CBDF are LITTLE-ENDIAN (LE). This includes: pair counts, section lengths, timestamps, font IDs, color values, image dimensions, and all other multi-byte fields.
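
As a quick illustration in Python, little-endian encoding puts the least significant byte first. The field values here are arbitrary examples.

```python
import struct

pair_count = struct.pack("<H", 3)        # 2-byte field, e.g. a pair count
section_len = struct.pack("<I", 0x0102)  # 4-byte field, e.g. a section length

print(pair_count.hex())   # 0300
print(section_len.hex())  # 02010000

# Decoding reverses the process.
value = int.from_bytes(b"\x02\x01\x00\x00", "little")
print(value)  # 258
```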

10. File Extensions

Extension	Description
`.qmail`	QMail email document
`.qweb`	QWeb page document
`.cbdf`	Generic CBDF document