This comprehensive explanation has been generated from 38 GitHub source documents.
Last updated: October 7, 2025
This content is meant to be consumed by AI agents via MCP.
Note: In rare cases it may contain LLM hallucinations.
For authoritative documentation, please consult the official GLEIF vLEI trainings and the ToIP Glossary.
A dual text-binary encoding format is an encoding scheme that supports both human-readable text representation and compact binary representation of the same data, with full bidirectional composability—meaning concatenated primitives can be converted en masse between text and binary domains without loss while maintaining individual primitive separability.
The dual text-binary encoding format is a fundamental architectural property of CESR (Composable Event Streaming Representation) that enables cryptographic primitives and data structures to be represented in two fully interchangeable domains: a human-readable text domain using Base64 URL-safe encoding, and a compact binary domain optimized for efficient transmission and storage. This duality is achieved through careful alignment of encoding boundaries and the use of prepended derivation codes that preserve composability across domain transformations.
Formally, an encoding exhibits dual text-binary format capability when transformations T(B) (binary-to-text) and B(T) (text-to-binary) satisfy the composability property:
T(cat(b[k])) = cat(T(b[k])) for all k
B(cat(t[k])) = cat(B(t[k])) for all k
Where cat() represents concatenation, b[k] are primitives in the binary domain, and t[k] are primitives in the text domain. This mathematical property ensures that converting a stream of concatenated primitives as a group produces the same result as concatenating individually converted primitives.
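The composability property can be checked directly with plain Base64 URL-safe encoding. The following is a minimal sketch, not CESR itself: the function names T and B mirror the notation above, and the three byte strings are stand-in primitives chosen only because each is a multiple of 3 bytes.

```python
import base64

def T(b: bytes) -> str:
    """Binary-to-text: Base64 URL-safe encoding."""
    return base64.urlsafe_b64encode(b).decode("ascii")

def B(t: str) -> bytes:
    """Text-to-binary: Base64 URL-safe decoding."""
    return base64.urlsafe_b64decode(t)

# Stand-in binary primitives, each a multiple of 3 bytes (24 bits).
b = [bytes(range(3)), bytes(range(33)), bytes(range(6))]
t = [T(x) for x in b]

# Converting the concatenation equals concatenating the conversions.
assert T(b"".join(b)) == "".join(t)
assert B("".join(t)) == b"".join(b)
print("composability holds for primitives of", [len(x) for x in b], "bytes")
```

Because every stand-in primitive ends on a 24-bit boundary, no Base64 character ever carries bits from two different primitives, which is why the group conversion and the per-primitive conversion agree.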
The dual format serves multiple critical purposes in KERI:
Implementations should use a unified parser architecture that handles both text and binary domains through a common abstraction layer. The parser should:
Zero-Copy Optimization: When possible, avoid copying primitive data during format conversion. Instead, maintain pointers to the original data and perform conversion only when the data is accessed. This is particularly important for large streams containing many primitives.
Buffer Pooling: Reuse conversion buffers to reduce allocation overhead. Maintain separate buffer pools for text-to-binary and binary-to-text conversions, sized appropriately for the largest expected primitive.
The derivation code tables are central to dual format implementation:
Incremental Parsing: Implement streaming parsers that can process primitives as they arrive without requiring the entire stream in memory. This is critical for handling large KELs or ACDC chains.
Format Boundaries: When transitioning between text and binary formats in a stream, use explicit group codes to mark the transition point. This enables parsers to switch modes without ambiguity.
Error Recovery: Design parsers to recover from format errors by skipping to the next valid primitive boundary rather than failing the entire stream. This improves robustness in the presence of transmission errors.
Comprehensive Round-Trip Testing: Every primitive type must be tested for lossless round-trip conversion in both directions (text→binary→text and binary→text→binary).
The dual text-binary encoding format maintains cryptographic integrity through several mechanisms:
Derivation Code Binding: Each primitive includes a prepended derivation code that specifies the cryptographic algorithm used. This code is preserved across domain transformations, ensuring that a digest computed in one domain can be verified in the other domain. For example, a Blake3-256 digest with derivation code E in text domain (the 44-character placeholder EABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopq) converts to binary domain while maintaining the same cryptographic commitment.
24-Bit Boundary Alignment: The encoding achieves composability through strict alignment to 24-bit boundaries—the least common multiple of 6 bits (Base64 character width) and 8 bits (byte width). Every primitive in the text domain must be an integer multiple of 4 Base64 characters (24 bits), and every primitive in the binary domain must be an integer multiple of 3 bytes (24 bits). This alignment ensures that concatenated primitives can be converted en masse without crossing primitive boundaries.
Self-Framing Structure: Each primitive is self-framing, embedding type, size, and value information. This structure enables parsers to extract primitives from streams without external delimiters, and the self-framing property is preserved across domain transformations. A parser can determine primitive boundaries in either domain without needing to parse the entire primitive content.
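The 24-bit alignment rule reduces to simple arithmetic. The sketch below assumes a simplified model in which the derivation code occupies exactly the space created by lead padding; the helper names pad_size and domain_sizes are illustrative and do not come from any CESR library.

```python
def pad_size(raw_len: int) -> int:
    """Bytes needed to bring raw_len up to a multiple of 3 bytes (24 bits)."""
    return (3 - raw_len % 3) % 3

def domain_sizes(raw_len: int) -> tuple[int, int]:
    """Return (text_chars, binary_bytes) for a padded primitive.

    Assumes the code fills exactly the padded space (a simplification).
    """
    total = raw_len + pad_size(raw_len)
    return total * 4 // 3, total

for raw_len in (32, 64, 16):
    chars, octets = domain_sizes(raw_len)
    # Both domains land on the shared 24-bit boundary.
    assert chars % 4 == 0 and octets % 3 == 0
    print(f"raw={raw_len} pad={pad_size(raw_len)} text={chars} chars binary={octets} bytes")
```

Under this model a 32-byte digest needs one pad byte and becomes 44 characters or 33 bytes. When the raw length is already a multiple of 3, the pad size is zero, so the alignment rule implies that any code for such a value must itself be a multiple of 4 characters.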
The dual format provides several security guarantees:
Non-Repudiation: Digital signatures created in one domain remain verifiable in the other domain. A signature computed over text-domain data can be verified against binary-domain data after proper transformation, maintaining non-repudiation properties.
Tamper Evidence: Any modification to a digest or signed primitive in either domain will be detected upon verification. Because the derivation code travels with the value, a verifier always knows which algorithm to apply, so tampering is evident regardless of which domain is used for storage or transmission.
Algorithm Agility: The derivation code system supports multiple cryptographic algorithms within the same encoding framework. This enables post-quantum algorithm migration without breaking the dual format property—new algorithms can be added with new derivation codes while maintaining backward compatibility.
The text domain uses Base64 URL-safe encoding with the character set A-Z, a-z, 0-9, -, _. This 64-character alphabet avoids problematic characters for URLs and file systems. Each Base64 character represents 6 bits of information.
A typical text-domain primitive structure:
[Derivation Code][Base64 Encoded Value]
For example, a Blake3-256 digest:
EABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopq
E: Derivation code indicating Blake3-256
The binary domain represents the same information in compact byte form. The derivation code is encoded in the first byte(s), followed by the raw cryptographic material.
For the same Blake3-256 digest:
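[0x10][32 bytes of digest]
0x10: Lead byte holding the 6-bit code for E (Base64 index 4) followed by two zero pad bits, for 33 bytes in total
CESR uses prepended derivation codes rather than traditional pad characters. In standard Base64, the = character pads strings to 4-character boundaries. CESR eliminates this by sizing each prepended code so that code plus value always ends on a 24-bit boundary.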
This approach ensures that no = pad characters appear in CESR streams.
Text-to-Binary Transformation (T2B):
Binary-to-Text Transformation (B2T):
These transformations are lossless and reversible, satisfying the composability property for concatenated primitives.
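A single primitive can be walked through both transformations as follows. This is a sketch under stated assumptions: to_text, t2b, and b2t are hypothetical helper names, the raw value is a stand-in rather than a real hash, and the construction (lead-pad the raw bytes, Base64-encode, overwrite the leading pad characters with the code) only covers codes whose length equals the pad size.

```python
import base64

def to_text(code: str, raw: bytes) -> str:
    """Build a text-domain primitive; assumes len(code) equals the pad size."""
    ps = (3 - len(raw) % 3) % 3
    if len(code) != ps:
        raise ValueError("this sketch only handles codes that fill the pad")
    b64 = base64.urlsafe_b64encode(bytes(ps) + raw).decode("ascii")
    return code + b64[ps:]  # the code replaces the leading pad characters

def t2b(text: str) -> bytes:
    """Text-to-binary: plain Base64 URL-safe decoding."""
    return base64.urlsafe_b64decode(text)

def b2t(binary: bytes) -> str:
    """Binary-to-text: plain Base64 URL-safe encoding."""
    return base64.urlsafe_b64encode(binary).decode("ascii")

raw = bytes(range(32))       # stand-in for a 32-byte digest, not a real hash
text = to_text("E", raw)     # 44 characters
binary = t2b(text)           # 33 bytes; the lead byte carries the code bits

assert b2t(binary) == text   # lossless round trip
assert binary[1:] == raw     # raw value follows the lead byte unchanged
print(len(text), len(binary))
```

Since both directions are ordinary Base64 operations on aligned data, no per-primitive bookkeeping is needed once the text form has been built.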
Key Event Logs leverage dual format encoding extensively:
Inception Events: The inception event contains the initial public key(s) and configuration. In text domain, this enables human-readable audit logs and debugging. In binary domain, it provides compact storage and efficient transmission to witnesses.
Rotation Events: Key rotation events include current keys, next key digests, and signatures. The dual format allows developers to inspect rotation logic in text form while production systems use binary for performance.
Interaction Events: Interaction events anchor external data through seals. These seals are CESR-encoded digests that benefit from dual format—text for human verification of anchored data, binary for efficient event log storage.
ACDCs (Authentic Chained Data Containers) use dual format for:
SAIDs: Self-Addressing Identifiers are CESR-encoded digests that identify credential blocks. Text format enables URL-based credential references, while binary format supports compact credential chains.
Attribute Commitments: Selectively disclosable attributes use CESR-encoded digests as commitments. The dual format allows:
Edge References: ACDC edges that link credentials use CESR-encoded AIDs and SAIDs. Dual format enables both human-readable credential graphs and efficient graph traversal algorithms.
Witness receipts are CESR-encoded signatures attached to key events. The dual format provides:
Development: Text-domain receipts in log files for debugging witness consensus issues
Production: Binary-domain receipts for efficient network transmission and storage
Verification: Format-agnostic verification, meaning receipts can be verified regardless of storage format
Out-of-Band Introductions use dual format for:
URL Encoding: Text-domain AIDs in OOBI URLs for web-based identifier discovery
Response Payloads: Binary-domain KEL transmission for efficient bootstrap
Caching: Format conversion based on cache requirements (text for debugging, binary for production)
The dual format is fundamentally enabled by composability—the mathematical property that concatenated primitive transformations preserve primitive boundaries. Without composability, dual format would require primitive-by-primitive conversion, losing the efficiency advantages.
Self-framing primitives embed type, size, and value information, enabling parsers to extract primitives without external delimiters. This property is essential for dual format because it ensures primitives remain parseable in both domains without requiring format-specific delimiters.
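Self-framing in the binary domain can be illustrated by splitting primitives off a byte stream using only the leading code bits. The sketch assumes a tiny illustrative table (SIZES) that maps a code to its full text-domain length; the table contents and the function name next_primitive are assumptions for this example, not a library API.

```python
import base64

# Assumed, illustrative table: derivation code -> full text-domain length.
SIZES = {"E": 44, "D": 44, "0B": 88}

def next_primitive(stream: bytes) -> tuple[str, bytes, bytes]:
    """Split the first primitive off a binary-domain stream."""
    # Re-encode only the first 3 bytes to recover the leading code characters.
    head = base64.urlsafe_b64encode(stream[:3]).decode("ascii")
    for code, chars in SIZES.items():
        if head.startswith(code):
            size = chars * 3 // 4  # 4 text characters map to 3 bytes
            if len(stream) < size:
                raise ValueError("stream ends inside a primitive")
            return code, stream[:size], stream[size:]
    raise ValueError("unrecognized derivation code")

# Two placeholder primitives, converted to binary and concatenated.
digest = base64.urlsafe_b64decode("E" + "A" * 43)
signature = base64.urlsafe_b64decode("0B" + "A" * 86)
code, first, rest = next_primitive(digest + signature)
print(code, len(first), len(rest))
```

The parser never inspects the value bytes: the code alone fixes the length, so the boundary of each primitive is known before its content is read.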
CESR count codes enable grouping of primitives for efficient stream processing. Count codes themselves are dual-format primitives that specify how many subsequent primitives belong to a group, enabling pipelined parsing in both text and binary domains.
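A count code can be sketched as a four-character text primitive that converts to three bytes like any other aligned primitive. The layout used here ('-' as a leading selector, one type character, then two Base64 digits for the count) is an assumption for illustration, and the type character "A" is hypothetical; consult the CESR code tables for the actual assignments.

```python
import base64

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

def count_code(kind: str, count: int) -> str:
    """Hypothetical 4-character counter: '-' + kind + two Base64 digits."""
    if len(kind) != 1 or kind not in B64:
        raise ValueError("kind must be a single Base64 character")
    if not 0 <= count < 64 * 64:
        raise ValueError("count does not fit in two Base64 digits")
    return "-" + kind + B64[count // 64] + B64[count % 64]

def read_count(code: str) -> tuple[str, int]:
    """Recover (kind, count) from a counter built by count_code."""
    return code[1], B64.index(code[2]) * 64 + B64.index(code[3])

text = count_code("A", 3)                 # announces a group of 3 primitives
binary = base64.urlsafe_b64decode(text)   # same counter in the binary domain

assert len(text) == 4 and len(binary) == 3
assert base64.urlsafe_b64encode(binary).decode("ascii") == text
print(text, read_count(text))
```

Because the counter is itself 24-bit aligned, it can sit in front of a group of primitives and be converted together with them in one pass.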
Qualified primitives include prepended derivation codes that specify cryptographic algorithms. This qualification is preserved across domain transformations, ensuring that algorithm information travels with the primitive regardless of format.
Lazy Conversion: Implementations should avoid unnecessary format conversions. Store primitives in the format most appropriate for the use case and convert only when required by protocol or API boundaries.
Batch Processing: When converting multiple primitives, leverage composability to perform en-masse conversion rather than primitive-by-primitive transformation. This reduces overhead and improves throughput.
Code Table Caching: Derivation code tables should be cached in memory to avoid repeated lookups during parsing. The code table maps derivation codes to primitive types, lengths, and padding requirements.
Invalid Derivation Codes: Parsers must reject primitives with unrecognized derivation codes rather than attempting to guess the format. This prevents security vulnerabilities from malformed primitives.
Boundary Violations: Verify that all primitives align to 24-bit boundaries. Primitives that violate this constraint indicate either corruption or malicious tampering.
Incomplete Streams: When parsing concatenated primitives, detect incomplete primitives at stream boundaries and buffer them for completion rather than treating them as errors.
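The three error-handling rules above can be combined in one incremental text-domain parser. This is a minimal sketch, assuming a small illustrative code table (SIZES) and a hypothetical class name TextStreamParser; it rejects unrecognized codes, checks that table sizes respect 24-bit alignment, and buffers incomplete primitives until more data arrives.

```python
B64_CHARS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_")

# Assumed, illustrative table: derivation code -> full text-domain length.
SIZES = {"E": 44, "D": 44, "0B": 88}

class TextStreamParser:
    """Yields complete text-domain primitives and buffers any partial tail."""

    def __init__(self) -> None:
        self._buf = ""

    def feed(self, chunk: str) -> list[str]:
        self._buf += chunk
        done = []
        while self._buf:
            code = next((c for c in SIZES if self._buf.startswith(c)), None)
            if code is None:
                if any(c.startswith(self._buf) for c in SIZES):
                    break  # the code itself is split across chunks
                raise ValueError(f"unrecognized derivation code: {self._buf[:2]!r}")
            size = SIZES[code]
            if size % 4:
                raise ValueError(f"table entry for {code!r} breaks 24-bit alignment")
            if len(self._buf) < size:
                break  # incomplete primitive: keep it buffered
            primitive, self._buf = self._buf[:size], self._buf[size:]
            if not set(primitive) <= B64_CHARS:
                raise ValueError("primitive contains non-Base64 characters")
            done.append(primitive)
        return done

parser = TextStreamParser()
digest = "E" + "A" * 43
print(parser.feed(digest[:10]))        # nothing complete yet
print(parser.feed(digest[10:] + "0"))  # digest completes; "0" waits for more
```

Returning an empty list for a partial primitive, rather than raising, keeps the distinction between "not enough data yet" and "malformed data" explicit for the caller.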
Round-Trip Testing: Verify that B(T(b)) = b and T(B(t)) = t for all primitive types. This ensures lossless conversion.
Concatenation Testing: Test that T(cat(b[k])) = cat(T(b[k])) for various primitive combinations to verify composability.
Boundary Testing: Test primitives at 24-bit boundaries and verify that padding is correctly handled in both domains.
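A boundary test is most convincing when it also shows the failure case. The sketch below uses plain Base64 (the helper names naive_T and composable are illustrative) to show that unaligned raw values break the concatenation property, while lead-padding them to a 24-bit boundary restores it.

```python
import base64

def naive_T(b: bytes) -> str:
    """Plain Base64 URL-safe encoding, including any '=' padding."""
    return base64.urlsafe_b64encode(b).decode("ascii")

def composable(chunks: list[bytes]) -> bool:
    """True when group conversion equals per-chunk conversion."""
    return naive_T(b"".join(chunks)) == "".join(naive_T(c) for c in chunks)

def lead_pad(raw: bytes) -> bytes:
    """Prepend zero bytes so the length is a multiple of 3."""
    return bytes((3 - len(raw) % 3) % 3) + raw

# Raw lengths 32 and 64 are not multiples of 3, so '=' padding appears
# in the middle of the joined text and composability fails.
unaligned = [bytes(range(32)), bytes(range(64))]
assert not composable(unaligned)

# After lead-padding, every chunk ends on a 24-bit boundary.
assert composable([lead_pad(c) for c in unaligned])

# Sweep all three pad sizes (0, 1, 2) across a range of lengths.
for n in range(1, 10):
    assert composable([lead_pad(bytes(n)), lead_pad(bytes(n + 1))])
print("boundary tests passed")
```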
Protocol Negotiation: Systems should negotiate format preference during connection establishment. Text format suits debugging and development, while binary suits production.
Format Detection: Implement sniffable stream detection that can identify whether a CESR stream is in text or binary format by examining the first few bytes/characters.
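Format detection can be sketched by inspecting the top three bits of the first byte. The bit patterns below are assumptions to verify against the CESR specification: the sketch presumes that a text stream begins with a code starting with '-' or '_', that a binary stream begins with the binary form of a '-' code, and that the remaining patterns belong to JSON, CBOR, or MessagePack messages. The function name sniff is illustrative.

```python
def sniff(first_byte: int) -> str:
    """Classify a stream by the top three bits of its first byte."""
    tritet = first_byte >> 5
    if tritet in (0b001, 0b010):
        return "text"      # ASCII '-' (0x2D) or '_' (0x5F) leads a text code
    if tritet == 0b111:
        return "binary"    # Base64 index 62 ('-') leads a binary code
    if tritet in (0b011, 0b100, 0b101, 0b110):
        return "message"   # JSON '{', MessagePack map, or CBOR map
    return "unknown"

print(sniff(ord("-")))   # first character of a text-domain count code
print(sniff(0xF8))       # first byte of the same code after Base64 decoding
print(sniff(ord("{")))   # first character of a JSON message
```

Note that this only works when the stream starts with a count code or a serialized message; a bare primitive at the start of a stream does not necessarily carry a distinguishing bit pattern.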
Hybrid Streams: Support streams that contain both text and binary primitives by using format-specific group codes that indicate domain transitions.
The dual text-binary encoding format represents a significant innovation in protocol design, solving the traditional trade-off between developer experience and production performance. By providing mathematically guaranteed composability across domains, CESR enables KERI to achieve:
This architectural approach has influenced the design of other cryptographic protocols and demonstrates that careful attention to encoding fundamentals can eliminate false dichotomies in protocol design.
Concatenation Composability: Test that concatenated primitives convert correctly as a group, verifying that primitive boundaries are preserved.
Boundary Alignment: Verify that all primitives align to 24-bit boundaries in both domains, and that padding is correctly handled.
Performance Benchmarking: Measure conversion overhead for different primitive types and stream sizes to identify optimization opportunities.