This comprehensive explanation has been generated from 75 GitHub source documents.
Last updated: October 7, 2025
This content is meant to be consumed by AI agents via MCP.
Note: In rare cases it may contain LLM hallucinations.
For authoritative documentation, please consult the official GLEIF vLEI trainings and the ToIP Glossary.
Self-framing is an encoding property where each primitive contains type, size, and value information in a single atomic unit, enabling parsers to extract elements from a stream without external delimiters or schemas by reading only the beginning of each element.
Self-framing is a fundamental encoding property in CESR where each primitive embeds its own metadata—specifically type and size information—directly within its encoding. This design enables parsers to determine exactly how many characters (in text domain) or bytes (in binary domain) to extract for a given element without parsing the element's content or relying on external schemas, delimiters, or encapsulation structures.
The core principle is that a stream of concatenated self-framing primitives can be parsed sequentially, with each primitive being extracted atomically based solely on information contained in its leading characters or bytes. This property eliminates the need for:
Self-framing operates through prepended codes (derivation codes, framing codes, or count codes) that encode both the type of primitive and sufficient information to calculate its total length. A parser reads these leading codes, determines the primitive's boundaries, extracts exactly that many characters/bytes, and continues to the next primitive—all without examining the primitive's actual content.
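The read-code, compute-length, extract, continue loop described above can be sketched roughly as follows. This is a minimal illustration, not a conforming CESR parser: the `SIZES` table is a small illustrative subset whose entries should be treated as assumptions, and `code_length` and `parse_stream` are hypothetical helper names.

```python
# Illustrative subset of a code table: code -> total length in Base64
# characters (code plus value). Entries are assumptions for this sketch.
SIZES = {
    "A": 44,   # e.g. a 32-byte seed
    "D": 44,   # e.g. a 32-byte public key
    "E": 44,   # e.g. a 32-byte digest
    "0B": 88,  # e.g. a 64-byte signature
}


def code_length(first_char: str) -> int:
    """Hypothetical selector rule: a letter starts a 1-character code,
    the digit '0' starts a 2-character code."""
    if first_char.isascii() and first_char.isalpha():
        return 1
    if first_char == "0":
        return 2
    raise ValueError(f"unsupported selector character: {first_char!r}")


def parse_stream(stream: str) -> list[tuple[str, str]]:
    """Split a concatenated text-domain stream into (code, primitive) pairs
    by reading only the leading code of each primitive."""
    primitives = []
    pos = 0
    while pos < len(stream):
        code = stream[pos:pos + code_length(stream[pos])]
        if code not in SIZES:
            raise ValueError(f"unknown code {code!r} at offset {pos}")
        total = SIZES[code]
        if pos + total > len(stream):
            raise ValueError(f"truncated primitive at offset {pos}")
        # The value characters are never inspected, only sliced out.
        primitives.append((code, stream[pos:pos + total]))
        pos += total
    return primitives


# Three concatenated primitives with placeholder values and no delimiters.
stream = "D" + "x" * 43 + "0B" + "y" * 86 + "E" + "z" * 43
for code, text in parse_stream(stream):
    print(code, len(text))
```

Note that the loop never looks past the code characters to decide where a primitive ends, which is the property the text calls self-framing.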
Traditional encoding schemes face a fundamental trade-off between human readability and parsing efficiency:
Text-based protocols and formats (HTTP, SMTP, JSON) use delimiters and structural markers that are human-readable, but a parser must scan the content itself to find element boundaries. JSON, for example, requires parsing nested structures to determine where objects and arrays end, making it inherently non-self-framing.
Binary protocols (TCP, UDP, DNS) achieve compact representation but sacrifice readability and often require complex framing mechanisms or length-prefixed fields that are protocol-specific.
Hybrid formats (XML, JSON/CBOR, MessagePack) attempt to bridge this gap by pairing text and binary serializations of the same data model, but they still rely on structural parsing rather than self-framing at the primitive level.
Implementations must maintain code tables mapping derivation codes to primitive types and lengths. These tables should be:
Parsers should implement:
When encoding primitives:
ps = (3 - (N mod 3)) mod 3, where N is the raw binary length
For count codes:
When converting between text and binary domains:
For high-throughput applications:
The concept of self-describing data formats has existed in various forms—ASN.1's tag-length-value encoding, Protocol Buffers' varint encoding, and CBOR's major type system all incorporate aspects of self-description. However, these approaches typically focus on either text or binary domains, not both simultaneously, and don't provide the composability property that CESR achieves.
CESR implements self-framing as a core architectural principle that enables its unique text-binary concatenation composability. KERI's approach differs from traditional implementations in several critical ways:
CESR primitives are self-framing in both text and binary domains simultaneously. A primitive encoded in Base64 text includes a derivation code that specifies both its type and the number of characters it occupies. The same primitive in binary domain includes equivalent information encoded in bytes. Critically, the transformation between domains preserves the self-framing property—converting a concatenated stream from text to binary and back maintains each primitive's boundaries.
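As a rough illustration of boundaries surviving a domain conversion, the sketch below concatenates two placeholder primitives in the text domain, converts the whole stream to binary with Python's standard `base64` module, and checks that each primitive can still be cut out at a predictable byte offset. The codes `D` and `0B`, their lengths, and the all-`A` values are illustrative assumptions; the helper names are hypothetical.

```python
import base64


def text_to_binary(text: str) -> bytes:
    """Convert an aligned text-domain stream to the binary domain."""
    if len(text) % 4:
        raise ValueError("aligned text must be a multiple of 4 characters")
    return base64.urlsafe_b64decode(text)


def binary_to_text(binary: bytes) -> str:
    """Convert an aligned binary-domain stream back to the text domain."""
    if len(binary) % 3:
        raise ValueError("aligned binary must be a multiple of 3 bytes")
    return base64.urlsafe_b64encode(binary).decode("ascii")


# Placeholder primitives: 44 and 88 characters, both multiples of 4.
key = "D" + "A" * 43
sig = "0B" + "A" * 86

text_stream = key + sig
binary_stream = text_to_binary(text_stream)

# Every 4 text characters map to exactly 3 bytes, so a primitive that is
# 44 characters long occupies exactly 33 bytes in the binary stream.
key_bytes = len(key) * 3 // 4

assert binary_stream[:key_bytes] == text_to_binary(key)
assert binary_to_text(binary_stream[key_bytes:]) == sig
assert binary_to_text(binary_stream) == text_stream
```

The conversion works on the concatenated stream as a whole because every primitive is a multiple of 4 characters (3 bytes), so no primitive's bits leak into its neighbor.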
CESR achieves self-framing through strict 24-bit boundary alignment—the least common multiple of 6 bits (Base64 character) and 8 bits (byte). This constraint ensures that:
This alignment is achieved by pre-padding the raw value with zero-valued lead bytes before Base64 conversion, rather than appending the trailing = pad characters used in naive Base64 encoding. The leading Base64 characters produced by those lead bytes are then replaced with the derivation code that encodes the primitive's type and size.
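A minimal sketch of this pre-padding step, under simplifying assumptions: the pad size is computed as ps = (3 - (N mod 3)) mod 3 for a raw value of N bytes, and the code length is assumed to satisfy len(code) mod 4 = ps so the result stays aligned. The function names and the codes used in the usage lines are illustrative, not taken from a specific library.

```python
import base64


def pad_size(raw_len: int) -> int:
    """Number of zero lead bytes needed to reach a 24-bit boundary."""
    return (3 - (raw_len % 3)) % 3


def encode_primitive(code: str, raw: bytes) -> str:
    """Pre-pad raw with zero bytes, Base64-encode, then put the code where
    the pad-generated 'A' characters were (simplified sketch)."""
    ps = pad_size(len(raw))
    if len(code) % 4 != ps:
        raise ValueError("code length does not match the pad size")
    full = base64.urlsafe_b64encode(bytes(ps) + raw).decode("ascii")
    # The ps zero lead bytes guarantee that the first ps characters are 'A'.
    return code + full[ps:]


def decode_primitive(text: str, code_len: int) -> bytes:
    """Inverse of encode_primitive: restore the 'A' pad characters,
    decode, and strip the zero lead bytes."""
    ps = code_len % 4
    padded = base64.urlsafe_b64decode("A" * ps + text[code_len:])
    return padded[ps:]


raw_key = bytes(range(32))          # 32 bytes -> ps = 1
text_key = encode_primitive("D", raw_key)
raw_sig = bytes(range(64))          # 64 bytes -> ps = 2
text_sig = encode_primitive("0B", raw_sig)

print(len(text_key), len(text_sig))
assert decode_primitive(text_key, 1) == raw_key
assert decode_primitive(text_sig, 2) == raw_sig
```

Because the pad lives at the front, the encoded primitive never contains = characters and its length is always a multiple of four, which is what allows primitives to be concatenated freely.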
CESR employs multiple code tables optimized for different primitive characteristics:
Each code table is designed so that the first character(s) uniquely identify the primitive type and implicitly specify its total length, enabling atomic extraction.
Beyond individual primitives, CESR extends self-framing to groups of primitives through count codes (also called group framing codes). These special codes specify the number of primitives in a group, enabling parsers to extract entire groups without parsing individual elements. This hierarchical self-framing supports:
CESR's self-framing design includes mechanisms for cold start stream parsing—recovering from parsing errors or system reboots without flushing buffers. Special count codes at boundaries between CESR and other serialization formats (JSON, CBOR, MGPK) serve as synchronization points. If a parser becomes confused by malformed data, it can skip forward to the next well-defined boundary and resume parsing, rather than requiring a complete stream restart.
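To illustrate how a count code frames a whole group and gives a parser a well-defined point at which to continue, here is a hedged sketch. It assumes a hypothetical 4-character count code layout ('-', one group-type character, then a 2-character Base64 count) and fixed-length members; a real parser would size each member from its own leading code.

```python
B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_"
)


def b64_to_int(chars: str) -> int:
    """Interpret Base64 URL-safe characters as a big-endian integer."""
    value = 0
    for ch in chars:
        value = value * 64 + B64_ALPHABET.index(ch)
    return value


def extract_group(stream: str, pos: int, member_len: int):
    """Read a count code at pos and slice out the group it frames.

    Returns (group_type, members, next_pos). Assumed layout for this
    sketch: '-' + type character + 2-character count.
    """
    if stream[pos] != "-":
        raise ValueError(f"no count code at offset {pos}")
    group_type = stream[pos + 1]
    count = b64_to_int(stream[pos + 2:pos + 4])
    start = pos + 4
    end = start + count * member_len
    if end > len(stream):
        raise ValueError("truncated group")
    members = [stream[i:i + member_len] for i in range(start, end, member_len)]
    return group_type, members, end


# A group of two placeholder 88-character signatures, then a digest.
sigs = ["0B" + "a" * 86, "0B" + "b" * 86]
stream = "-A" + "AC" + "".join(sigs) + "E" + "z" * 43

group_type, members, next_pos = extract_group(stream, 0, member_len=88)
print(group_type, len(members), next_pos)
```

The caller learns where the group ends (`next_pos`) from the count code alone, so the whole group can be handed to another processor without inspecting its members.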
Self-framing enables CESR to serve as the foundation for streaming text protocols similar to STOMP (Simple, or Streaming, Text Oriented Messaging Protocol) and RAET (Reliable Asynchronous Event Transport). These protocols benefit from delimiter-free parsing where primitives can be extracted and processed as they arrive, without buffering entire messages.
Self-framing is essential for several KERI ecosystem applications:
Key Event Logs (KEL): Key events and receipts are encoded as CESR streams where each cryptographic primitive (keys, signatures, digests) is self-framing. Validators can parse these logs efficiently without external schemas, and the logs remain human-readable in text domain while being compact in binary domain.
ACDC Credentials: Authentic Chained Data Containers use CESR encoding for cryptographic commitments. Self-framing enables efficient verification where parsers can extract and verify signatures without parsing entire credential structures.
Witness Networks: Witnesses exchange CESR-encoded messages containing key event receipts. Self-framing enables efficient message routing and processing in high-throughput witness pools.
OOBI Discovery: Out-of-band introductions use CESR encoding for endpoint discovery information. Self-framing enables compact representation while maintaining parseability.
Delimiter-Free Parsing: No special characters needed to separate primitives, eliminating escaping complexity and enabling efficient stream processing.
Schema-Free Interpretation: Parsers don't require external schemas or configuration to interpret primitive boundaries—all necessary information is embedded in the encoding.
Efficient Stream Processing: Parsers can determine primitive boundaries by reading only the leading codes, enabling single-pass parsing without lookahead or backtracking.
Human Readability: Text domain encoding remains readable for debugging, logging, and audit trails while maintaining the efficiency benefits of self-framing.
Composability: Self-framing primitives can be freely concatenated and converted between text and binary domains while maintaining separability—a unique property not found in traditional encodings.
Pipeline Processing: Group framing codes enable efficient routing of primitive groups to different processors without parsing individual elements, supporting high-bandwidth applications.
Error Recovery: Cold start mechanisms enable graceful recovery from parsing errors without losing buffered data.
Alignment Overhead: The 24-bit alignment constraint requires pre-padding, which adds a small amount of overhead compared to naive encodings. This overhead is bounded at 0 to 2 pad bytes per primitive and is offset by the elimination of delimiter characters and the efficiency gains from self-framing.
Code Table Complexity: Supporting multiple code tables for different primitive types adds implementation complexity compared to simpler encoding schemes. However, this complexity is encapsulated in the CESR specification and implementations, not exposed to protocol designers.
Learning Curve: Developers must understand CESR's code table architecture and alignment constraints, which differ from familiar encodings like JSON or Protocol Buffers. However, once understood, CESR provides significant advantages for cryptographic protocols.
Limited Extensibility: Adding new primitive types requires updating code tables and potentially coordinating across implementations. However, CESR's design includes extensibility mechanisms through version codes and reserved code spaces.
Self-framing represents a fundamental architectural choice in CESR that enables its unique combination of human readability, compact binary representation, and efficient stream processing—properties essential for KERI's decentralized key management infrastructure.