This comprehensive explanation has been generated from 180 GitHub source documents. All source documents are searchable here.
Last updated: October 7, 2025
This content is meant to be consumed by AI agents via MCP. Click here to get the MCP configuration.
Note: In rare cases it may contain LLM hallucinations.
For authoritative documentation, please consult the official GLEIF vLEI trainings and the ToIP Glossary.
A Self-Addressing Identifier (SAID) is a cryptographic identifier that is deterministically generated from the content it identifies and then embedded within that content, creating a self-referential, content-addressable identifier with tamper-evident properties.
A Self-Addressing Identifier (SAID) is a specialized cryptographic primitive that serves as both an identifier and an integrity proof for data structures. According to the IETF SAID specification, a SAID is "an identifier which is deterministically generated out of the content, digest of the content" and is simultaneously content-addressable (cryptographically bound to the data) and self-referential (embedded within the data it identifies).
The fundamental innovation of SAIDs is resolving the inherent tension between traditional content-addressable identifiers (which cannot be self-referential because including the identifier changes the content) and self-referential identifiers (which traditionally lack cryptographic binding). SAIDs achieve both properties through a special derivation protocol that makes the identifier both embedded in and derived from the serialized data structure.
SAIDs serve multiple critical functions in the KERI ecosystem:
SAIDs are classified as qualified cryptographic primitives in CESR encoding, meaning they include a prepended derivation code that indicates the cryptographic algorithm used for digest computation. This qualification makes the identifier self-describing: a parser can determine the digest algorithm and length from the code itself, without external schema information.
Field Ordering: KERI/ACDC mandates insertion-ordered field maps. The serialization must preserve the order in which fields were added to the data structure. Lexicographic (alphabetical) ordering is explicitly NOT used.
Compact JSON: SAID computation requires compact JSON serialization with:
Example canonical form:
{"d":"############################################","i":"EpDA1n-WiBA0A8YOqnKrB-wWQYYC49i5zY_qrIZIicQg","name":"Alice"}
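The canonical form above can be reproduced with a standard JSON library. The following is a minimal sketch, assuming Python's json module and a helper name (canonical_json) chosen here for illustration; it is not taken from a KERI library.

```python
import json

def canonical_json(fields: dict) -> bytes:
    """Serialize without insignificant whitespace, keeping insertion order."""
    return json.dumps(fields, separators=(",", ":"), ensure_ascii=False).encode("utf-8")

record = {
    "d": "#" * 44,  # dummy string for a 44-character SAID
    "i": "EpDA1n-WiBA0A8YOqnKrB-wWQYYC49i5zY_qrIZIicQg",
    "name": "Alice",
}
# Same fields, different insertion order
reordered = {"name": "Alice", "d": record["d"], "i": record["i"]}

print(canonical_json(record).decode())
# Different insertion order gives different bytes, hence a different digest
print(canonical_json(record) == canonical_json(reordered))  # False
```

Python dictionaries preserve insertion order, so no sorting step is applied; sorting keys would contradict the insertion-order rule stated above.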
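The dummy string must:
Consist of the # character (ASCII 35)
Have exactly the length of the final SAID
Replace the value of the d field before digest computation
For ACDCs with multiple sections, each section's SAID is computed before the SAID of the enclosing structure.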
Recommended: Blake3-256 (CESR code E)
Supported Alternatives: Blake2b-256 (F), Blake2s-256 (G), SHA3-256 (H)
Deprecated: SHA2-256 (I) - not recommended for new implementations
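# Sketch of SAID verification for a JSON field map (standard library only)
import base64
import hashlib
import json

# One-character CESR codes for the 32-byte digests available in hashlib.
# Blake3-256 (code E) is not in hashlib; it would need the third-party blake3 package.
HASHERS = {
    "F": lambda data: hashlib.blake2b(data, digest_size=32).digest(),
    "G": lambda data: hashlib.blake2s(data, digest_size=32).digest(),
    "H": lambda data: hashlib.sha3_256(data).digest(),
    "I": lambda data: hashlib.sha256(data).digest(),
}

def verify_said(data_structure, label="d"):
    # Extract SAID
    said = data_structure[label]
    # Parse derivation code to identify algorithm
    code = said[0]
    hasher = HASHERS[code]
    # Create copy with dummy string
    verification_copy = dict(data_structure)
    verification_copy[label] = "#" * len(said)
    # Canonicalize (compact JSON, insertion order)
    canonical = json.dumps(verification_copy, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    # Compute digest
    computed_digest = hasher(canonical)
    # CESR encode: prepend one zero pad byte, Base64URL-encode, swap the leading "A" for the code
    computed_said = code + base64.urlsafe_b64encode(b"\x00" + computed_digest).decode("ascii")[1:]
    # Compare
    return computed_said == said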
The SAID specification mandates cryptographic hash functions with approximately 128 bits of cryptographic strength to ensure collision resistance. The primary algorithm used in KERI/ACDC implementations is Blake3-256 (CESR code E).
SAIDs provide several critical security guarantees:
Collision Resistance: Finding two different data structures that produce the same SAID is computationally infeasible (on the order of 2^128 hash evaluations for a 256-bit digest such as Blake3-256). This ensures that any copy of data that verifies against a SAID can be assumed identical to any other copy verifying to the same SAID.
Pre-image Resistance: Given a SAID, it is computationally infeasible to construct data that hashes to that SAID. This prevents attackers from creating fraudulent data structures that match legitimate SAIDs.
Tamper Evidence: Any modification to the data structure—no matter how small—produces a completely different SAID. This makes tampering immediately detectable upon verification.
Cryptographic Binding: The SAID cryptographically binds the identifier to the content, so integrity can be checked from the data alone, without consulting an external registry or resolution mechanism.
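The tamper-evidence property can be observed directly on the underlying digest. This is a small sketch that uses Blake2b-256 from Python's hashlib as a stand-in, since Blake3 is not in the standard library; the two byte strings are made-up inputs that differ in a single character.

```python
import hashlib

def digest(data: bytes) -> bytes:
    # Blake2b with a 32-byte output, used here only as a stand-in for Blake3-256
    return hashlib.blake2b(data, digest_size=32).digest()

original = b'{"name":"Alice","age":30}'
tampered = b'{"name":"Alice","age":31}'

a, b = digest(original), digest(tampered)
# Count the bit positions in which the two 256-bit digests differ
differing_bits = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
print(a != b, differing_bits)
```

For a well-behaved hash function the count is typically close to half of the 256 output bits, which is why a modified structure cannot keep its old SAID.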
For Blake3-256 (the standard algorithm):
SAIDs are encoded using CESR (Composable Event Streaming Representation), which provides dual text-binary encoding with composability properties. The CESR encoding includes:
Text Domain (Base64 URL-safe):
EAco5dU5WjDrxDBK4b4HrF82_rYb6MX6xsegjq4n0Y7M
Breakdown:
E: Derivation code for Blake3-256 digest
Aco5dU5WjDrxDBK4b4HrF82_rYb6MX6xsegjq4n0Y7M: the remaining 43 Base64 characters, which carry the 32-byte digest
Binary Domain:
33 bytes obtained by Base64-decoding the text form; the first 6 bits hold the derivation code (000100 for Blake3-256), followed by 2 zero pad bits and the 32-byte digest
Common CESR derivation codes for SAIDs:
E: Blake3-256 digest (standard for KERI/ACDC)
F: Blake2b-256 digest (alternative)
G: Blake2s-256 digest (alternative)
H: SHA3-256 digest (alternative)
I: SHA2-256 digest (legacy, not recommended)
The SAID derivation process follows a precise algorithm:
Allocate a fixed-length field in the serialization for the SAID. The length is determined by the chosen digest algorithm and CESR encoding (44 characters for Blake3-256 in text domain).
Fill the SAID field with placeholder characters (#, ASCII 35) of exact target length. For a 44-character SAID field:
{
"d": "############################################",
"name": "Alice",
"age": 30
}
Generate the cryptographic digest of the serialization containing the dummy string using the specified algorithm (Blake3-256). The serialization must be in canonical form with deterministic field ordering.
Encode the digest using CESR, which prepends the derivation code indicating the cryptographic algorithm.
Substitute the dummy string with the computed SAID to create the final serialization:
{
"d": "EAco5dU5WjDrxDBK4b4HrF82_rYb6MX6xsegjq4n0Y7M",
"name": "Alice",
"age": 30
}
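The derivation steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not a reference implementation: it uses Blake2b-256 (CESR code F) because hashlib does not ship Blake3, it serializes as compact JSON, and the names saidify and serialize are chosen here for illustration. As a result the SAID it prints differs from the example value shown above.

```python
import base64
import hashlib
import json

DUMMY = "#"     # placeholder character, ASCII 35
SAID_LEN = 44   # text-domain length of a 32-byte digest with a one-character code
CODE = "F"      # Blake2b-256; code "E" (Blake3-256) would need the blake3 package

def serialize(sad: dict) -> bytes:
    # Compact JSON, fields in insertion order
    return json.dumps(sad, separators=(",", ":"), ensure_ascii=False).encode("utf-8")

def saidify(sad: dict, label: str = "d") -> dict:
    # Allocate the field and fill it with the dummy string
    draft = dict(sad)
    draft[label] = DUMMY * SAID_LEN
    # Digest the serialization that contains the dummy string
    raw = hashlib.blake2b(serialize(draft), digest_size=32).digest()
    # CESR text encoding: one zero pad byte makes 33 bytes, i.e. 44 Base64URL
    # characters whose first character ("A") is replaced by the derivation code
    said = CODE + base64.urlsafe_b64encode(b"\x00" + raw).decode("ascii")[1:]
    # Substitute the dummy string with the computed SAID
    draft[label] = said
    return draft

sad = saidify({"d": "", "name": "Alice", "age": 30})
print(len(sad["d"]), sad["d"][0])  # 44 F
```

Because the digest is always taken over the dummy-filled serialization, running saidify on an already self-addressed structure yields the same SAID again.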
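To verify a SAID, extract the value of the d field, replace it with a dummy string of the same length, recompute the digest over the canonical serialization using the algorithm named by the derivation code, CESR-encode the result, and compare it with the extracted value.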
SAIDs appear in multiple contexts within KEL events:
Event Digest Field (d): Every KERI event includes a SAID in its d field, making the event self-addressing. This enables backward chaining of events (the p field references the prior event's SAID).
Schema References (s): ACDCs reference their schemas by SAID in the s field, ensuring schema immutability and version-specific validation. (In KERI key events, s carries the sequence number rather than a schema reference.)
Seal References: When events anchor external data through seals, they reference that data by SAID.
ACDCs make extensive use of SAIDs for graduated disclosure:
Top-Level SAID (d): Every ACDC has a SAID identifying the entire credential structure.
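Section SAIDs: Major sections (attributes a, edges e, rules r, schema s) can be represented either in compact form (only the section's SAID) or in expanded form (the full section content, from which that SAID is derived).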
Example compact ACDC:
{
"v": "ACDC10JSON00011c_",
"d": "ELvaU6Z-i0d8JJR2nmwyYAZAoTNZH3UfSVPzhzS6b5CM",
"i": "EpDA1n-WiBA0A8YOqnKrB-wWQYYC49i5zY_qrIZIicQg",
"s": "E46jrVPTzlSkUPqGGeIZ8a8FWS7a6s4reAXRZOkogZ2A",
"a": "EFgnk_c08WmZGgv9_mpldibRuqFMTQN-rAgtD-TCOwbs"
}
Here, the a field contains only the SAID of the attributes section. This supports graduated disclosure: the holder can later provide the full attributes, and the verifier can check that they match the committed SAID.
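The check a verifier performs when an attributes section is disclosed later can be sketched as follows. The helper names (compute_said, matches_commitment) and the attribute fields are hypothetical, and Blake2b-256 (code F) stands in for Blake3-256, so the SAIDs here do not correspond to the example credential above.

```python
import base64
import hashlib
import json

def compute_said(section: dict, label: str = "d") -> str:
    """Recompute a section's SAID (Blake2b-256, CESR code F, as a stand-in)."""
    draft = dict(section)
    draft[label] = "#" * 44
    data = json.dumps(draft, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    raw = hashlib.blake2b(data, digest_size=32).digest()
    return "F" + base64.urlsafe_b64encode(b"\x00" + raw).decode("ascii")[1:]

def matches_commitment(committed: str, disclosed: dict) -> bool:
    """True if the disclosed section carries and reproduces the committed SAID."""
    return disclosed.get("d") == committed and compute_said(disclosed) == committed

# Issuer side: build the attributes section, then commit to it in compact form
attributes = {"d": "", "name": "Alice", "role": "Engineer"}  # hypothetical fields
attributes["d"] = compute_said(attributes)
compact_credential = {"a": attributes["d"]}  # other credential fields omitted

# Verifier side: the holder later discloses the full section
print(matches_commitment(compact_credential["a"], attributes))  # True
forged = {**attributes, "role": "Director"}
print(matches_commitment(compact_credential["a"], forged))      # False
```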
TELs use SAIDs to reference:
Compact Disclosure: Present only SAIDs initially, revealing full content progressively as trust develops.
Chained Credentials: ACDCs reference other ACDCs by SAID in their edge sections, creating verifiable credential graphs.
Schema Versioning: Each schema version has a unique SAID, preventing schema mutation attacks.
Content Addressing: SAIDs enable universal retrieval of specific data versions without centralized registries.
Verifying SAID-based data structures starts from the SAID carried in the d field.
A SAD (Self-Addressing Data) is the data structure from which a SAID is derived. The relationship is:
Every SAID has a corresponding SAD, but not all data structures are SADs (only those containing SAIDs).
SAIDs are a specialized application of cryptographic hash functions with the unique property of self-referentiality. While standard digests are computed over external data, SAIDs are computed over data that includes the digest itself (via the dummy string mechanism).
SAIDs are one type of CESR primitive. Other related primitives include:
Nested SAIDs: ACDCs contain multiple SAIDs at different levels:
This creates a hash tree structure analogous to Merkle trees, where:
SAID Chains: In credential graphs, SAIDs create cryptographic links:
Credential A (SAID: E123...)
  → references Credential B (SAID: E456...)
    → references Credential C (SAID: E789...)
This enables verifiable provenance chains without expanding all credentials.
SAID computation requires deterministic serialization.
Failure to maintain canonical form produces different SAIDs for logically identical data.
While CESR supports multiple hash algorithms, Blake3-256 is strongly recommended for KERI/ACDC because:
The dummy string must exactly match the final SAID length (44 characters for Blake3-256 in the text domain).
Incorrect dummy length produces invalid SAIDs.
For nested structures (like ACDCs with multiple sections), SAIDs must be computed recursively from innermost to outermost.
This ensures that the top-level SAID cryptographically commits to all nested content.
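The innermost-to-outermost order can be illustrated with a short recursive sketch. It assumes that a nested section is recognized by the presence of a d field, that the parent digest is taken over the compact form (sections replaced by their SAIDs), and that Blake2b-256 (code F) stands in for Blake3-256. The function name and the sample fields are hypothetical.

```python
import base64
import hashlib
import json

def saidify_nested(sad: dict, label: str = "d"):
    """Return (compact_form, said); child sections are processed before the parent."""
    compact = {}
    for key, value in sad.items():
        if isinstance(value, dict) and label in value:
            # Innermost first: the child's SAID replaces the child in the compact form
            _, child_said = saidify_nested(value, label)
            compact[key] = child_said
        else:
            compact[key] = value
    compact[label] = "#" * 44
    data = json.dumps(compact, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    raw = hashlib.blake2b(data, digest_size=32).digest()
    compact[label] = "F" + base64.urlsafe_b64encode(b"\x00" + raw).decode("ascii")[1:]
    return compact, compact[label]

expanded = {
    "d": "",
    "i": "EpDA1n-WiBA0A8YOqnKrB-wWQYYC49i5zY_qrIZIicQg",
    "a": {"d": "", "name": "Alice"},       # hypothetical attributes section
    "r": {"d": "", "usage": "demo only"},  # hypothetical rules section
}
compact, top_said = saidify_nested(expanded)
print(compact["a"] != compact["r"], len(top_said))  # True 44
```

Changing any value inside a nested section changes that section's SAID, which in turn changes the bytes the parent digest is computed over.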
SAID computation can be optimized by:
Algorithm Agility: While Blake3-256 is the current standard, CESR's derivation codes enable algorithm migration if vulnerabilities are discovered.
Collision Attacks: A 256-bit digest offers roughly 128 bits of collision resistance. By the birthday bound, finding a collision takes approximately 2^128 operations, which is computationally infeasible with current technology.
Pre-image Attacks: Finding data that produces a specific SAID requires 2^256 operations for Blake3-256, which is astronomically infeasible.
Quantum Resistance: Hash functions like Blake3 are generally considered quantum-resistant at this digest size, as Grover's algorithm provides only a quadratic speedup for pre-image search (reducing 256-bit pre-image security to roughly 128-bit, which remains secure).
Caching: Store computed SAIDs to avoid recomputation when the same data is processed multiple times.
Streaming: For large data structures, use streaming hash APIs that process data incrementally rather than loading entire structures into memory.
Parallel Processing: When computing SAIDs for multiple independent sections (e.g., ACDC attributes and rules), compute them concurrently.
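The streaming approach relies on the fact that incremental hashing yields the same digest as hashing the whole serialization at once. A minimal sketch, assuming hashlib's Blake2b as a stand-in for Blake3 and an arbitrary 16-byte chunk size:

```python
import hashlib

def digest_stream(chunks) -> bytes:
    """Feed serialized chunks to the hash incrementally instead of joining them."""
    hasher = hashlib.blake2b(digest_size=32)
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.digest()

# A serialization that already contains the dummy string
payload = b'{"d":"' + b"#" * 44 + b'","name":"Alice"}'
chunks = [payload[i:i + 16] for i in range(0, len(payload), 16)]

one_shot = hashlib.blake2b(payload, digest_size=32).digest()
print(digest_stream(chunks) == one_shot)  # True
```

The chunks must still be produced by the same canonical serialization; streaming changes memory use, not the bytes being hashed.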
Incorrect Dummy Length: Using a dummy string that doesn't match the final SAID length produces invalid results.
Non-Canonical Serialization: Including whitespace, using wrong field ordering, or inconsistent encoding produces different SAIDs for logically identical data.
Wrong Recursion Order: Computing parent SAIDs before child SAIDs in nested structures produces incorrect results.
Algorithm Mismatch: Using a different hash algorithm than indicated by the CESR derivation code during verification.
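Round-Trip Testing: Verify that every SAID produced by generation passes verification against the same data.
Canonicalization Testing: Ensure that inputs differing only in insignificant whitespace produce the same SAID after canonicalization. Field order is part of the serialization (insertion order is preserved), so a different field order is expected to produce a different SAID.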
Tamper Detection Testing: Verify that any modification to the data structure (even single-bit changes) produces a completely different SAID.
Cross-Implementation Testing: Verify that SAIDs computed by different implementations (Python, TypeScript, Rust) match for identical data.
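Some of these checks can be written as plain assertion functions. The sketch below covers round-trip, whitespace handling, and tamper detection; cross-implementation testing needs shared test vectors and is not shown. The helpers (compute_said, saidify, verify) are hypothetical, and Blake2b-256 (code F) stands in for Blake3-256.

```python
import base64
import hashlib
import json

def compute_said(sad: dict, label: str = "d") -> str:
    draft = {**sad, label: "#" * 44}  # dummy string keeps the field's position
    data = json.dumps(draft, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    raw = hashlib.blake2b(data, digest_size=32).digest()
    return "F" + base64.urlsafe_b64encode(b"\x00" + raw).decode("ascii")[1:]

def saidify(sad: dict, label: str = "d") -> dict:
    return {**sad, label: compute_said(sad, label)}

def verify(sad: dict, label: str = "d") -> bool:
    return sad.get(label) == compute_said(sad, label)

def test_round_trip():
    assert verify(saidify({"d": "", "name": "Alice", "age": 30}))

def test_whitespace_is_insignificant():
    pretty = '{\n  "d": "",\n  "name": "Alice"\n}'
    compact = '{"d":"","name":"Alice"}'
    assert saidify(json.loads(pretty)) == saidify(json.loads(compact))

def test_tamper_detection():
    sad = saidify({"d": "", "name": "Alice", "age": 30})
    assert not verify({**sad, "age": 31})

for test in (test_round_trip, test_whitespace_is_insignificant, test_tamper_detection):
    test()
print("all checks passed")
```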