This comprehensive explanation has been generated from 16 GitHub source documents. All source documents are searchable here.
Last updated: October 7, 2025
This content is meant to be consumed by AI agents via MCP. Click here to get the MCP configuration.
Note: In rare cases it may contain LLM hallucinations.
For authoritative documentation, please consult the official GLEIF vLEI trainings and the ToIP Glossary.
Multicodec is a self-describing format specification that uses a compact prefix (a format code encoded as an unsigned variable-length integer) to unambiguously identify different data encodings, particularly for binary representations of cryptographic keys and content identifiers.
Multicodec is a self-describing format specification developed as part of the multiformats project to solve the fundamental problem of format ambiguity in binary data streams. The protocol wraps existing data formats with a minimal metadata layer that enables automatic format detection without requiring external context, negotiation, or metadata exchange.
Multicodec addresses the challenge that binary data, when transmitted or stored, carries no inherent information about its encoding format. A sequence of bytes could represent a public key in Ed25519 format, an RSA key, a SHA-256 digest, or any number of other formats. Without external metadata, parsers cannot reliably interpret the data.
The protocol's objectives are:
The multicodec specification is maintained in the GitHub multicodec repository. Unlike IETF RFCs, multicodec follows an open-source specification model where the canonical reference is the GitHub repository containing:
Implementing varint encoding/decoding is the core technical challenge. Key considerations:
Implementations must decide how to manage the multicodec table:
Multicodec only identifies formats—validation is the application's responsibility:
Robust error handling is critical:
Multicodec has minimal overhead, but high-performance applications should:
When bridging between multicodec and CESR systems:
The specification does not have traditional version numbers; instead, the multicodec table evolves through additions of new format codes while maintaining backward compatibility.
While multicodec is not a KERI-specific protocol, it shares conceptual similarities with CESR (Composable Event Streaming Representation), KERI's native encoding system. Both protocols address the need for self-describing cryptographic primitives, but they differ in scope and design:
KERI implementations may encounter multicodec-encoded data when interoperating with systems using IPFS (InterPlanetary File System) or other multiformats-based protocols, particularly when dealing with Content Identifiers (CIDs).
Multicodec operates as a prefix layer that wraps existing encoding formats. The architecture follows a simple two-component model:
[Multicodec Prefix][Encoded Data]
The prefix provides metadata about the encoded data that follows. This creates a clear separation of concerns:
The protocol consists of three logical components:
The multicodec table is the authoritative registry of format codes. Each entry contains:
Example entries from the table:
| Code | Name | Tag | Status |
|---|---|---|---|
| 0xed | ed25519-pub | key | permanent |
| 0x12 | sha2-256 | multihash | permanent |
| 0x13 | sha2-512 | multihash | permanent |
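As a rough illustration of how an implementation might hold registry entries in memory, the sketch below indexes a small hand-picked subset of the table by code and by name. The `CodecEntry` type and the `name_for` helper are hypothetical names chosen for this example; a real implementation would more likely load the full table from the repository.

```python
from typing import NamedTuple


class CodecEntry(NamedTuple):
    """One row of the multicodec table (hypothetical in-memory shape)."""
    code: int
    name: str
    tag: str


# A hand-picked subset for illustration only, not the full registry.
_ENTRIES = [
    CodecEntry(0x12, "sha2-256", "multihash"),
    CodecEntry(0x55, "raw", "ipld"),
    CodecEntry(0x70, "dag-pb", "ipld"),
    CodecEntry(0xED, "ed25519-pub", "key"),
]

# Two indexes so lookups work in both directions.
BY_CODE = {entry.code: entry for entry in _ENTRIES}
BY_NAME = {entry.name: entry for entry in _ENTRIES}


def name_for(code: int) -> str:
    """Return the registered name for a code, or raise if it is unknown."""
    try:
        return BY_CODE[code].name
    except KeyError:
        raise ValueError(f"unknown multicodec code 0x{code:x}") from None


print(name_for(0xED))            # ed25519-pub
print(hex(BY_NAME["raw"].code))  # 0x55
```

Keeping both indexes costs little memory and avoids a linear scan on either the encoding path (name to code) or the decoding path (code to name).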
Multicodec is a stateless protocol. Each encoded value is self-contained and can be parsed independently without maintaining session state or context from previous operations. This statelessness provides:
The multicodec prefix consists of a single component: a variable-length integer (varint) encoding the format code.
Multicodec uses unsigned varint encoding, a variable-length encoding scheme where:
Varint encoding rules:
The most significant bit of each byte is a continuation flag:
1: More bytes follow
0: This is the last byte
Example: Encoding the value 300
300 in binary: 100101100
Split into 7-bit groups (right to left):
0000010 0101100
Emit the least-significant group first and add continuation bits (1 on the first byte, 0 on the last):
10101100 00000010
Hexadecimal: 0xAC 0x02
So the multicodec prefix for code 300 would be the two bytes [0xAC, 0x02].
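The worked example for 300 can be reproduced in a few lines. This is a minimal sketch; `varint_groups` and `encode_uvarint` are names invented here to mirror the two manual steps (splitting into 7-bit groups, then adding continuation bits).

```python
def varint_groups(n: int) -> list[int]:
    """Split a non-negative integer into 7-bit groups, least significant first."""
    if n < 0:
        raise ValueError("varint values must be non-negative")
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(n & 0x7F)
        n >>= 7
    return groups


def encode_uvarint(n: int) -> bytes:
    """Set the continuation bit on every group except the last one."""
    groups = varint_groups(n)
    return bytes(g | 0x80 for g in groups[:-1]) + bytes([groups[-1]])


print([f"{g:07b}" for g in varint_groups(300)])  # ['0101100', '0000010']
print(encode_uvarint(300).hex(" "))              # ac 02
```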
A complete multicodec-encoded value has the structure:
[varint(code)][raw_data]
Where:
varint(code): Variable-length integer encoding of the format code
raw_data: The actual data in its native format
Example: Ed25519 Public Key
An Ed25519 public key (32 bytes) with multicodec prefix:
Code for ed25519-pub: 0xed (237 decimal)
Varint encoding of 237: 0xED 0x01
Complete encoding:
[0xED, 0x01][32 bytes of public key]
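The same layout can be produced programmatically. The sketch below is illustrative only: `wrap_ed25519_pub` is a hypothetical helper, and the key is a run of placeholder bytes rather than a real public key.

```python
ED25519_PUB = 0xED  # multicodec code for ed25519-pub


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    if n < 0:
        raise ValueError("varint values must be non-negative")
    out = bytearray()
    while n > 0x7F:
        out.append((n & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        n >>= 7
    out.append(n)                      # final byte, continuation bit clear
    return bytes(out)


def wrap_ed25519_pub(key: bytes) -> bytes:
    """Prepend the ed25519-pub multicodec prefix to a 32-byte public key."""
    if len(key) != 32:
        raise ValueError(f"Ed25519 key must be 32 bytes, got {len(key)}")
    return encode_uvarint(ED25519_PUB) + key


placeholder_key = bytes(range(32))     # placeholder bytes, not a real key
wrapped = wrap_ed25519_pub(placeholder_key)
print(wrapped[:2].hex(" "), len(wrapped))  # ed 01 34
```

Because 0xed is larger than 0x7f, the prefix takes two bytes, so the wrapped value is 34 bytes long.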
Multicodec is format-agnostic and can prefix any binary data. Common use cases include:
Public and private keys in various formats:
Cryptographic hash outputs (often used in multihash format):
Multicodec is a component of CID (Content Identifier) used in IPFS:
CID = [multibase][cid-version][multicodec][multihash]
The multicodec component identifies the content type (e.g., "dag-pb" for Protocol Buffers DAG, "raw" for raw binary).
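To make the CID layout concrete, here is a small sketch that assembles a CIDv1 for raw content. It assumes the commonly used values `raw` = 0x55, `sha2-256` = 0x12, CID version 1, and the base32 multibase prefix `b` (lowercase, unpadded); `cid_v1_raw` is a hypothetical helper name, and the function is meant for illustration rather than as a replacement for a CID library.

```python
import base64
import hashlib

CID_V1 = 0x01    # CID version
RAW = 0x55       # multicodec code for raw binary content
SHA2_256 = 0x12  # multihash code for SHA-256


def cid_v1_raw(content: bytes) -> str:
    """Build a base32 CIDv1 string for raw content hashed with SHA-256."""
    digest = hashlib.sha256(content).digest()
    # Multihash = hash code + digest length + digest.
    multihash = bytes([SHA2_256, len(digest)]) + digest
    # All codes used here are below 0x80, so each varint is a single byte.
    binary = bytes([CID_V1, RAW]) + multihash
    text = base64.b32encode(binary).decode("ascii").lower().rstrip("=")
    return "b" + text  # 'b' is the multibase prefix for base32


cid = cid_v1_raw(b"hello")
print(cid[:7], len(cid))  # bafkrei 59
```

The multibase prefix exists only in the string form; the binary CID starts directly with the version varint.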
The encoding process is straightforward:
Step 1: Determine Format
The encoder must know the format of the data being encoded. This is typically determined by:
Step 2: Lookup Code
The encoder looks up the appropriate multicodec code in the table. For example:
Step 3: Encode Varint
The code is encoded as an unsigned varint:
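    def encode_varint(n):
        """Encode a non-negative integer as an unsigned varint."""
        if n < 0:
            raise ValueError("varint values must be non-negative")
        out = bytearray()
        while n > 0x7f:
            # Emit the low 7 bits with the continuation bit set.
            out.append((n & 0x7f) | 0x80)
            n >>= 7
        out.append(n)  # final byte: continuation bit clear
        return bytes(out)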
Step 4: Concatenate
The varint prefix is prepended to the raw data:
encoded = varint_bytes + raw_data_bytes
Decoding reverses the encoding process:
Step 1: Extract Varint
The decoder reads bytes from the stream until it encounters a byte with MSB = 0:
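    def decode_varint(stream):
        """Read one unsigned varint from a binary stream."""
        n = 0
        shift = 0
        while True:
            chunk = stream.read(1)
            if not chunk:
                raise ValueError("stream ended inside a varint")
            byte = chunk[0]
            n |= (byte & 0x7f) << shift
            if (byte & 0x80) == 0:
                break
            shift += 7
            if shift > 56:
                raise ValueError("varint longer than 9 bytes")
        return n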
Step 2: Lookup Format
The decoder looks up the code in the multicodec table to determine the format.
Step 3: Parse Data
The decoder parses the remaining bytes according to the identified format. This may involve:
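The three decoding steps can be sketched together as a single function that returns the format name and the remaining payload. The `KNOWN` mapping is a tiny illustrative subset, and `read_uvarint` and `identify` are hypothetical names; format-specific parsing of the payload is left to the caller.

```python
def read_uvarint(data: bytes) -> tuple[int, int]:
    """Return (value, number of bytes consumed) for a leading unsigned varint."""
    value = 0
    for i, byte in enumerate(data[:9]):  # look at no more than 9 bytes
        value |= (byte & 0x7F) << (7 * i)
        if (byte & 0x80) == 0:
            return value, i + 1
    raise ValueError("truncated or over-long varint")


# Illustrative subset of the multicodec table.
KNOWN = {0x12: "sha2-256", 0xED: "ed25519-pub"}


def identify(data: bytes) -> tuple[str, bytes]:
    """Decode the prefix, look up the format, and return (name, payload)."""
    code, used = read_uvarint(data)
    name = KNOWN.get(code)
    if name is None:
        raise ValueError(f"unknown multicodec code 0x{code:x}")
    return name, data[used:]


name, payload = identify(bytes([0xED, 0x01]) + bytes(32))
print(name, len(payload))  # ed25519-pub 32
```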
Multicodec does not define message exchange patterns—it is a data encoding protocol, not a communication protocol. However, it is commonly used in contexts where self-describing data is transmitted:
Multicodec has no timing or ordering requirements. Each encoded value is independent and can be processed in any order.
Multicodec's threat model is limited because it is a format identification protocol, not a security protocol. However, certain security considerations apply:
Multicodec provides format identification, not security guarantees. Specifically:
Applications requiring these properties must layer additional protocols on top of multicodec, such as:
Multicodec provides weak resistance to format confusion attacks. An attacker can prepend an incorrect multicodec prefix to data, causing the decoder to misinterpret the format. However:
Multicodec implementations must guard against varint overflow attacks where an attacker provides a varint that:
Mitigations:
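One way to apply such mitigations is a strict decoder that bounds the number of bytes it will read and rejects truncated or padded encodings. The sketch below assumes a 9-byte limit and a minimal-encoding requirement, which matches my understanding of the unsigned-varint specification; `VarintError` and `decode_uvarint_strict` are names invented for this example.

```python
MAX_VARINT_BYTES = 9  # assumed upper bound on varint length


class VarintError(ValueError):
    """Raised when a varint is truncated, over-long, or non-minimal."""


def decode_uvarint_strict(data: bytes) -> tuple[int, int]:
    """Return (value, bytes consumed), rejecting malformed varints."""
    value = 0
    for i in range(MAX_VARINT_BYTES):
        if i >= len(data):
            raise VarintError("truncated varint")
        byte = data[i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            # A zero final byte after earlier bytes adds nothing to the
            # value, so the encoding is longer than it needs to be.
            if byte == 0 and i > 0:
                raise VarintError("non-minimal varint encoding")
            return value, i + 1
    raise VarintError("varint longer than 9 bytes")


print(decode_uvarint_strict(b"\xac\x02"))  # (300, 2)
```

Bounding the loop up front means a stream of bytes with the continuation bit set cannot force unbounded work or an arbitrarily large integer.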
The multicodec table is maintained through a centralized governance process on GitHub. New codes are assigned through pull requests reviewed by maintainers. This provides:
However, the centralized governance model creates a single point of failure. If the GitHub repository or maintainer accounts are compromised, attackers could inject malicious codes.
Multicodec is a foundational protocol with minimal dependencies:
Multicodec does not depend on KERI, ACDC, or other identity protocols.
While multicodec is not part of the core KERI specification, it may be encountered in KERI implementations when:
Some KERI implementations may store KELs (Key Event Logs) or ACDCs in IPFS, which uses multicodec-encoded CIDs for content addressing. In this context:
KERI uses CESR for encoding cryptographic primitives, which provides similar self-describing properties to multicodec but with additional features (text-binary composability, count codes, etc.). When bridging to systems that use multicodec:
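As a rough sketch of what such a bridge could look like for one primitive, the code below converts an Ed25519 public key between a multicodec-prefixed byte string and a 44-character CESR text form. It rests on assumptions that should be checked against the CESR specification before use: that the one-character code `D` denotes an Ed25519 verification key, and that the text form is the Base64URL encoding of one zero pad byte plus the 32 raw bytes with the first character replaced by the code. The function names are hypothetical.

```python
import base64

ED25519_PUB_PREFIX = bytes([0xED, 0x01])  # varint(0xed)
CESR_ED25519 = "D"  # assumed CESR code for an Ed25519 verification key


def multicodec_to_cesr(data: bytes) -> str:
    """Convert [0xED, 0x01][32-byte key] to an assumed CESR text form."""
    if len(data) != 34 or data[:2] != ED25519_PUB_PREFIX:
        raise ValueError("not a multicodec ed25519-pub value")
    # One zero pad byte + 32 raw bytes = 33 bytes = 44 Base64 characters.
    text = base64.urlsafe_b64encode(b"\x00" + data[2:]).decode("ascii")
    return CESR_ED25519 + text[1:]  # the code replaces the leading 'A'


def cesr_to_multicodec(qb64: str) -> bytes:
    """Convert the assumed CESR text form back to a multicodec value."""
    if len(qb64) != 44 or not qb64.startswith(CESR_ED25519):
        raise ValueError("not a 44-character 'D' primitive")
    # Put back 'A' (six zero bits) so the text decodes to pad byte + key.
    padded = base64.urlsafe_b64decode("A" + qb64[1:])
    if padded[0] != 0:
        raise ValueError("unexpected non-zero pad bits")
    return ED25519_PUB_PREFIX + padded[1:]


placeholder = ED25519_PUB_PREFIX + bytes(range(32))  # not a real key
assert cesr_to_multicodec(multicodec_to_cesr(placeholder)) == placeholder
```

Only the framing changes in either direction; the 32 raw key bytes are carried through untouched, so signatures made with the key remain verifiable on both sides.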
Both multicodec and CESR solve the problem of self-describing cryptographic primitives, but with different design priorities:
| Feature | Multicodec | CESR |
|---|---|---|
| Primary goal | Format identification | Text-binary composability |
| Encoding | Binary-only (varint prefix) | Dual text/binary with Base64 |
| Composability | Not composable | Fully composable |
| Overhead | Minimal (1-9 bytes) | Moderate (4+ characters) |
| Human readability | No (binary) | Yes (text domain) |
| Streaming | Limited | Optimized for streaming |
| Ecosystem | IPFS, libp2p, multiformats | KERI, ACDC, vLEI |
When to use multicodec:
When to use CESR:
Multicodec can integrate with KERI/ACDC systems at several points:
Implementing varint encoding/decoding correctly is critical:
Common pitfalls:
Best practices:
Reuse a well-tested varint implementation where one is available (for example, protobuf libraries include varint implementations)
Implementations must decide how to manage the multicodec table:
Options:
Trade-offs:
Multicodec only identifies formats—it does not validate that data conforms to the format. Implementations should:
For example, when decoding an Ed25519 public key:
    import io

    def decode_ed25519_pubkey(data):
        # Step 1: Decode multicodec prefix
        stream = io.BytesIO(data)
        code = decode_varint(stream)
        if code != 0xed:
            raise ValueError(f"Expected Ed25519 (0xed), got {code:#x}")
        # Step 2: Extract key bytes (everything after the varint prefix)
        key_bytes = stream.read()
        # Step 3: Validate length
        if len(key_bytes) != 32:
            raise ValueError(f"Ed25519 key must be 32 bytes, got {len(key_bytes)}")
        return key_bytes
Multicodec has minimal performance impact:
For high-performance applications:
Implementations should handle errors gracefully:
Error conditions:
Error handling strategies:
Thorough testing is essential:
Test cases:
Test vectors:
The multicodec repository includes test vectors for common formats. Implementations should pass all test vectors to ensure correctness.
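Independently of any published vectors, a small round-trip check over boundary values catches most varint mistakes. The values below were computed by hand for this example and are not copied from the repository; `check_vectors` and the helper names are hypothetical.

```python
# Hand-computed (value, expected hex) pairs around the 7-bit boundaries.
VECTORS = [
    (1, "01"),
    (127, "7f"),
    (128, "8001"),
    (255, "ff01"),
    (300, "ac02"),
    (16384, "808001"),
]


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    out = bytearray()
    while n > 0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def decode_uvarint(data: bytes) -> int:
    """Decode a byte string that must contain exactly one varint."""
    value = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            if i != len(data) - 1:
                raise ValueError("trailing bytes after varint")
            return value
    raise ValueError("truncated varint")


def check_vectors() -> int:
    """Round-trip every vector and return how many were checked."""
    for value, hex_text in VECTORS:
        expected = bytes.fromhex(hex_text)
        assert encode_uvarint(value) == expected, value
        assert decode_uvarint(expected) == value, value
    return len(VECTORS)


print(check_vectors(), "vectors passed")  # 6 vectors passed
```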
When integrating multicodec with other systems:
Multicodec provides a simple, effective solution to the problem of format identification in binary data. By prepending a compact, self-describing prefix to data, multicodec enables automatic format detection without external metadata or negotiation.
While multicodec is not part of the core KERI specification, it shares conceptual similarities with CESR and may be encountered when KERI systems interoperate with IPFS-based storage or other multiformats-based protocols. Understanding multicodec helps KERI developers navigate the broader ecosystem of self-describing data formats and make informed decisions about when to use multicodec versus CESR.
The protocol's simplicity is both a strength and a limitation. Multicodec excels at format identification but provides no security guarantees. Applications requiring integrity, authenticity, or confidentiality must layer additional protocols on top of multicodec, such as digital signatures, MACs, or encryption. In the KERI ecosystem, these security properties are provided by KERI's key event logs, CESR's cryptographic primitives, and ACDC's verifiable credentials.