This comprehensive explanation has been generated from 178 GitHub source documents.
Last updated: October 7, 2025
This content is meant to be consumed by AI agents via MCP.
Note: In rare cases it may contain LLM hallucinations.
For authoritative documentation, please consult the official GLEIF vLEI trainings and the ToIP Glossary.
Self-addressing data (SAD) is a data structure where a Self-Addressing Identifier (SAID) is cryptographically derived from and embedded within the data content itself, creating a mutually tamper-evident relationship where the identifier both addresses and verifies the integrity of its containing data.
Self-addressing data (SAD) represents a fundamental cryptographic primitive in the KERI ecosystem where data content and its identifier are cryptographically bound through a self-referential mechanism. A SAD is formally defined as a representation of data content from which a SAID (Self-Addressing Identifier) is derived, where the SAID is both content-addressable and encapsulated by (self-referential to) its SAD.
The core innovation lies in the circular relationship. The SAID is computed over a serialization of the data in which the SAID field holds a fixed-length placeholder, and the resulting SAID is then embedded in that field. This creates an immutable cryptographic binding: any modification to either the data content or the embedded SAID breaks the verifiable relationship, making tampering immediately evident.
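SADs serve multiple critical functions in the KERI protocol suite. They identify key events, ACDC credentials, and schemas, and they enable compact disclosure and credential chaining.
Implementation guidelines: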
Field Ordering: Implementations MUST preserve insertion order in JSON objects. Use Python 3.7+ dict, JavaScript ES2015+ Object, or equivalent ordered map structures. Do NOT sort fields lexicographically.
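Compact Serialization: Remove all unnecessary whitespace before SAID computation. Use json.dumps(sad, separators=(',', ':'), ensure_ascii=False) in Python, or JSON.stringify() without a space argument in JavaScript. Passing ensure_ascii=False keeps Python's output consistent with JSON.stringify(), which emits non-ASCII characters as-is rather than as \u escapes.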
Consistent Encoding: Use UTF-8 encoding consistently. Normalize Unicode strings to NFC form before serialization.
Length Matching: The placeholder MUST be exactly the same length as the final SAID. For Blake3-256 text encoding, use 44 # characters.
Field Position: The SAID field (typically d) must be in the correct position according to the schema. For ACDCs, d appears after v (version) in top-level objects.
Default Algorithm: Use Blake3-256 (derivation code E) for new implementations unless specific requirements dictate otherwise.
Algorithm Support: Implement support for multiple algorithms to enable verification of existing SADs and future algorithm transitions. Minimum support: Blake3-256, Blake2b-256, SHA3-256.
Derivation Code Validation: Always validate the derivation code before attempting verification. Reject SAIDs with unsupported or invalid codes.
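Step-by-Step Verification: Work through the following sequence.
- Extract the SAID from the d field.
- Validate its derivation code and length.
- Replace it with a placeholder of the same length and re-serialize the data canonically.
- Recompute the digest with the algorithm that the derivation code indicates.
- Compare the CESR-encoded result with the original value of the d field.
Nested SAD Verification: Verify nested SADs recursively from innermost to outermost. Each nested SAD must verify independently before the parent SAD is verified.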
Caching: Cache computed SAIDs for frequently accessed data structures (schemas, credential templates). Invalidate cache only when structure changes.
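The serialization guidelines above can be sketched in Python as a single canonicalization function. This is a minimal sketch, not keripy's implementation. It assumes the SAD is already a plain dict in the correct field order, and the helper names nfc and canonical_bytes are hypothetical.

```python
import json
import unicodedata


def nfc(value):
    """Recursively NFC-normalize every string (labels and values) in a JSON-like value."""
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, dict):
        # Rebuilding the dict in iteration order keeps the original field order.
        return {nfc(k): nfc(v) for k, v in value.items()}
    if isinstance(value, list):
        return [nfc(v) for v in value]
    return value  # numbers, booleans, None


def canonical_bytes(sad: dict) -> bytes:
    """Compact, insertion-ordered, NFC-normalized UTF-8 JSON for SAID computation."""
    text = json.dumps(
        nfc(sad),
        separators=(",", ":"),  # no whitespace after ',' or ':'
        ensure_ascii=False,     # emit non-ASCII as UTF-8 rather than \u escapes
        sort_keys=False,        # never sort: field order is significant
    )
    return text.encode("utf-8")


# "e" followed by a combining acute accent is composed into the single character U+00E9.
sad = {"v": "ACDC10JSON00011c_", "d": "#" * 44, "name": "Cafe\u0301"}
data = canonical_bytes(sad)
```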
SADs represent a specific class of self-referential cryptographic data structures that extend traditional content-addressable storage with self-containment properties. Unlike simple content-addressed data (where identifiers are external), SADs embed their identifiers within the data itself, creating a self-proving structure.
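The SAID derivation follows a precise four-step protocol:
1. Fill the SAID field (typically d) with a placeholder of # characters matching the target SAID length.
2. Serialize the data canonically.
3. Hash the serialization with the chosen algorithm and encode the digest in CESR, prefixed with its derivation code.
4. Replace the placeholder with the resulting SAID.
Verification reverses this process: extract the SAID, replace it with the placeholder, recompute the hash, and compare.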
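The derivation steps and their reversal can be sketched in Python as follows. Blake3 is not in Python's standard library, so this sketch substitutes SHA3-256 (CESR code H) for Blake3-256 (E). The 44-character length and the placeholder logic are the same. The sketch also ignores the size field of the version string, which real KERI and ACDC serializations must fill in. The function names saidify and verify are hypothetical.

```python
import base64
import hashlib
import json

CODE = "H"              # CESR text code for SHA3-256 (stand-in for Blake3-256 "E")
SAID_LEN = 44           # text length of any 256-bit CESR digest
PLACEHOLDER = "#" * SAID_LEN


def serialize(sad: dict) -> bytes:
    """Compact, insertion-ordered UTF-8 JSON."""
    return json.dumps(sad, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def encode_digest(raw: bytes, code: str = CODE) -> str:
    """CESR-encode a 32-byte digest: one zero pad byte makes 33 bytes -> 44 chars,
    and the derivation code overwrites the leading 'A'."""
    b64 = base64.urlsafe_b64encode(b"\x00" + raw).decode("ascii")
    return code + b64[1:]


def saidify(sad: dict, label: str = "d") -> dict:
    """Return a copy of `sad` with its SAID field computed and embedded."""
    if label not in sad:
        raise KeyError(f"SAID field {label!r} must already exist in its schema position")
    out = dict(sad)                                  # copy keeps field order
    out[label] = PLACEHOLDER                         # placeholder of the final SAID length
    raw = hashlib.sha3_256(serialize(out)).digest()  # hash the canonical serialization
    out[label] = encode_digest(raw)                  # CESR-encode and replace the placeholder
    return out


def verify(sad: dict, label: str = "d") -> bool:
    """Recompute the SAID over the placeholder form and compare it with the embedded one."""
    said = sad.get(label)
    if not isinstance(said, str) or len(said) != SAID_LEN or not said.startswith(CODE):
        return False
    return saidify(sad, label)[label] == said
```

For any dict with a d field, verify(saidify(x)) returns True. Changing any field afterwards, including d itself, makes it return False.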
KERI's SAID implementation primarily uses Blake3-256 as the default hash algorithm, though the architecture supports cryptographic agility through CESR derivation codes. Blake3-256 produces a 256-bit digest, which gives approximately 128 bits of collision resistance.
Alternative algorithms supported include Blake2b-256, Blake2s-256, SHA3-256, and SHA2-256, each identified by specific CESR derivation codes.
SADs provide several critical security guarantees:
Collision Resistance: The cryptographic hash function ensures that finding two different data structures with the same SAID is computationally infeasible (approximately 2^128 operations for 256-bit hashes).
Pre-image Resistance: Given a SAID, it is computationally infeasible to find data that produces that SAID, other than by guessing candidate contents and hashing them.
Tamper Evidence: Any modification to the data content invalidates the SAID, making tampering immediately detectable through verification.
Binding Strength: The cryptographic binding between SAID and content provides approximately 128 bits of security strength, meeting KERI's security requirements.
For Blake3-256 (the default), the derivation code is E. The SAID is 44 characters in the text domain and 33 bytes in the binary domain. The CESR encoding makes SAIDs self-describing: the derivation code indicates which hash algorithm was used, which enables cryptographic agility and future algorithm transitions.
SAIDs are encoded using Composable Event Streaming Representation (CESR), which provides dual text/binary encoding with composability properties. The encoding structure consists of a derivation code followed by the Base64 URL-safe encoding of the digest.
Example SAID in text domain:
ELvaU6Z-i0d8JJR2nmwyYAZAoTNZH3UfSVPzhzS6b5CM
Breakdown:
E: Derivation code for Blake3-256
LvaU6Z-i0d8JJR2nmwyYAZAoTNZH3UfSVPzhzS6b5CM: Base64 URL-safe encoded hash
Text Domain (T): Uses the Base64 URL-safe character set [A-Z, a-z, 0-9, -, _] for human readability and debugging. Text SAIDs are 44 characters for 256-bit hashes.
Binary Domain (B): Uses raw bytes for compact transmission and storage. Binary SAIDs are 33 bytes for 256-bit hashes.
Composability: CESR ensures that concatenated SAIDs can be converted between text and binary domains without loss, maintaining separability of individual primitives.
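The text/binary relationship is plain Base64 URL-safe conversion on 4-character/3-byte boundaries, which is why a 44-character SAID becomes 33 bytes. Here is a minimal sketch using only Python's base64 module. The function names are hypothetical.

```python
import base64


def text_to_binary(qb64: str) -> bytes:
    """Text-domain primitive -> binary domain (every 4 chars become 3 bytes)."""
    if len(qb64) % 4:
        raise ValueError("text-domain primitives are a multiple of 4 characters")
    return base64.urlsafe_b64decode(qb64.encode("ascii"))


def binary_to_text(qb2: bytes) -> str:
    """Binary-domain primitive -> text domain (every 3 bytes become 4 chars)."""
    if len(qb2) % 3:
        raise ValueError("binary-domain primitives are a multiple of 3 bytes")
    return base64.urlsafe_b64encode(qb2).decode("ascii")


said = "ELvaU6Z-i0d8JJR2nmwyYAZAoTNZH3UfSVPzhzS6b5CM"
qb2 = text_to_binary(said)
print(len(said), len(qb2))           # 44 33
print(binary_to_text(qb2) == said)   # True
print(hex(qb2[0]))                   # 0x10: the 6 bits of "E" plus 2 zero pad bits
```

Both conversions operate on whole 24-bit groups. As a result, a stream of concatenated primitives can be converted in either direction without re-framing individual items.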
CESR derivation codes for common SAID algorithms are listed below in text-domain form. In the binary domain, the same code occupies the leading bits of the primitive rather than a separate byte value.
E: Blake3-256 (44 characters / 33 bytes)
F: Blake2b-256
G: Blake2s-256
H: SHA3-256
I: SHA2-256
For 512-bit variants:
0D: Blake3-512 (88 characters / 66 bytes)
0E: Blake2b-512
The derivation code enables cryptographic agility, allowing systems to transition to new hash algorithms while maintaining backward compatibility.
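Cryptographic agility amounts to dispatching on the derivation code. The sketch below covers only those codes from the list whose hash functions ship with Python's hashlib. Blake3 (E, 0D) needs a third-party package and is left out. The DIGESTERS table and the function cesr_digest are hypothetical. The padding rule follows CESR's alignment scheme: one zero pad byte for 32-byte digests and two for 64-byte ones, overwritten by a one- or two-character code.

```python
import base64
import hashlib

# Derivation code -> (hash function, raw digest size in bytes).
DIGESTERS = {
    "F": (lambda b: hashlib.blake2b(b, digest_size=32).digest(), 32),   # Blake2b-256
    "G": (lambda b: hashlib.blake2s(b, digest_size=32).digest(), 32),   # Blake2s-256
    "H": (lambda b: hashlib.sha3_256(b).digest(), 32),                  # SHA3-256
    "I": (lambda b: hashlib.sha256(b).digest(), 32),                    # SHA2-256
    "0E": (lambda b: hashlib.blake2b(b, digest_size=64).digest(), 64),  # Blake2b-512
}


def cesr_digest(data: bytes, code: str) -> str:
    """Hash `data` with the algorithm named by `code` and return the CESR text form."""
    if code not in DIGESTERS:
        raise ValueError(f"unsupported derivation code: {code!r}")
    digest, size = DIGESTERS[code]
    raw = digest(data)
    if len(raw) != size:
        raise ValueError("digest has unexpected size")
    pad = (3 - size % 3) % 3  # 1 pad byte for 32-byte digests, 2 for 64-byte ones
    b64 = base64.urlsafe_b64encode(b"\x00" * pad + raw).decode("ascii")
    return code + b64[len(code):]  # the code overwrites one leading 'A' per pad byte
```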
SADs and SAIDs appear throughout KERI and ACDC structures:
KERI Key Events:
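The inception event's d field is a SAID of the inception event data.
In subsequent events, the event SAID (d field) and the prior event digest (p field) are SAIDs.
Other events likewise carry their own SAID in the d field.
ACDC Credentials:
The d field is the SAID of the entire credential.
The s field references the schema by its SAID.
The a field can be a SAID in compact form.
The e field can be a SAID referencing linked credentials.
The r field can be a SAID for Ricardian contracts.
Transaction Event Logs (TELs):
TEL events likewise carry their own SAID in the d field.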
Compact Disclosure: ACDCs use SAIDs to represent undisclosed sections, enabling graduated disclosure:
{
"v": "ACDC10JSON00011c_",
"d": "ELvaU6Z-i0d8JJR2nmwyYAZAoTNZH3UfSVPzhzS6b5CM",
"i": "EaU6JR2nmwyZ-i0d8JZAoTNZH3ULvYAfSVPzhzS6b5CM",
"s": "E46jrVPTzlSkUPqGGeIZ8a8FWS7a6s4reAXRZOkogZ2A",
"a": "EEveY4-9XgOcLxUderzwLIr9Bf7V_NHwY1lkFrn9y2PY"
}
Here, the a field contains only the SAID of the attributes block, not the full attributes. The holder can later reveal the full attributes while proving they match the committed SAID.
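The commit-then-reveal pattern works in three stages:
1. The attribute block is saidified on its own.
2. The compact ACDC carries only that SAID in a.
3. A verifier later checks the revealed attributes against it.

The sketch below illustrates this hypothetically, with SHA3-256 (code H) standing in for Blake3-256. Field values such as issuer-aid are placeholders, and the version-string size is not computed.

```python
import base64
import hashlib
import json


def said_of(sad: dict, label: str = "d") -> str:
    """SAID over the placeholder form (SHA3-256 stand-in, CESR code "H")."""
    tmp = dict(sad)
    tmp[label] = "#" * 44
    ser = json.dumps(tmp, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    raw = hashlib.sha3_256(ser).digest()
    return "H" + base64.urlsafe_b64encode(b"\x00" + raw).decode("ascii")[1:]


# Full attribute block: a SAD of its own, with its own "d" field.
attrs = {"d": "", "i": "holder-aid", "LEI": "EXAMPLE-LEI"}
attrs["d"] = said_of(attrs)

# Compact ACDC: "a" holds only the attribute block's SAID.
compact = {"v": "ACDC10JSON00011c_", "d": "", "i": "issuer-aid",
           "s": "schema-said", "a": attrs["d"]}
compact["d"] = said_of(compact)


def revealed_attrs_match(compact_acdc: dict, revealed: dict) -> bool:
    """True if revealed attributes self-verify and match the SAID committed in "a"."""
    return said_of(revealed) == revealed.get("d") == compact_acdc["a"]
```

compact["d"] is computed over a form that already contains only the attribute SAID in a. Revealing the attributes later therefore leaves the credential's own SAID unchanged.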
Schema References: Credentials reference schemas by SAID rather than URL, ensuring schema immutability:
{
"s": "E46jrVPTzlSkUPqGGeIZ8a8FWS7a6s4reAXRZOkogZ2A"
}
Any party can verify that a retrieved schema matches the committed SAID, preventing schema tampering.
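A verifier's check on a retrieved schema can be sketched as shown below. In ACDC schemas the SAID lives in the $id field rather than d. The sketch again uses SHA3-256 (code H) in place of Blake3-256, and schema_matches is a hypothetical name.

```python
import base64
import hashlib
import json


def said_of(sad: dict, label: str) -> str:
    """SAID over the placeholder form (SHA3-256 stand-in, CESR code "H")."""
    tmp = dict(sad)
    tmp[label] = "#" * 44
    ser = json.dumps(tmp, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    raw = hashlib.sha3_256(ser).digest()
    return "H" + base64.urlsafe_b64encode(b"\x00" + raw).decode("ascii")[1:]


def schema_matches(schema: dict, committed_said: str) -> bool:
    """Accept a retrieved schema only if it self-verifies and matches the credential's "s"."""
    return schema.get("$id") == committed_said and said_of(schema, "$id") == committed_said


schema = {"$id": "", "$schema": "http://json-schema.org/draft-07/schema#",
          "title": "Example credential", "type": "object"}
schema["$id"] = said_of(schema, "$id")
credential_s = schema["$id"]  # what a credential's "s" field would carry
```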
Credential Chaining: ACDCs use SAIDs in edge sections to create verifiable credential graphs:
{
"e": {
"d": "EFH3dCdoFOLe71iheqcywJcnjtJtQIYPvAu6DZIl3MOA",
"qvi": {
"n": "ELvaU6Z-i0d8JJR2nmwyYAZAoTNZH3UfSVPzhzS6b5CM",
"s": "EBfdlu8R27Fbx-ehrqwImnK-8Cm79sqbAQ4MmvEAYqao"
}
}
}
The n field contains the SAID of a prerequisite credential, creating a cryptographically verifiable dependency chain.
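Following an edge can be sketched as a lookup plus two checks. First, the referenced credential must self-verify under the SAID in n. Second, its schema must be the one named in s. The credential store, verify_edge, and all field values here are hypothetical, and SHA3-256 (code H) stands in for Blake3-256.

```python
import base64
import hashlib
import json


def said_of(sad: dict, label: str = "d") -> str:
    """SAID over the placeholder form (SHA3-256 stand-in, CESR code "H")."""
    tmp = dict(sad)
    tmp[label] = "#" * 44
    ser = json.dumps(tmp, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    raw = hashlib.sha3_256(ser).digest()
    return "H" + base64.urlsafe_b64encode(b"\x00" + raw).decode("ascii")[1:]


def verify_edge(edge: dict, store: dict) -> bool:
    """Resolve edge["n"] in `store` and check both the SAID and the schema binding."""
    target = store.get(edge["n"])
    if target is None:
        return False  # prerequisite credential not available
    return (said_of(target) == target.get("d") == edge["n"]
            and target.get("s") == edge.get("s"))


qvi = {"v": "ACDC10JSON00011c_", "d": "", "i": "issuer-aid", "s": "qvi-schema-said",
       "a": "attributes-said"}
qvi["d"] = said_of(qvi)
store = {qvi["d"]: qvi}
edge = {"n": qvi["d"], "s": "qvi-schema-said"}
```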
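Verifying a SAD involves four operations:
1. Extract the embedded SAID from its field (typically d).
2. Replace it with a same-length placeholder.
3. Recompute the digest with the algorithm indicated by the derivation code.
4. Compare the re-encoded digest with the extracted SAID.

For nested SADs (such as ACDC sections), verification proceeds recursively from the innermost to the outermost SAIDs.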
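Recursive verification can be sketched as a depth-first walk that checks every map carrying a d field after its children. Maps without a SAID field, such as an edge like qvi, are only traversed. verify_nested is a hypothetical helper, and SHA3-256 (code H) stands in for Blake3-256.

```python
import base64
import hashlib
import json


def said_of(sad: dict, label: str = "d") -> str:
    """SAID over the placeholder form (SHA3-256 stand-in, CESR code "H")."""
    tmp = dict(sad)
    tmp[label] = "#" * 44
    ser = json.dumps(tmp, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    raw = hashlib.sha3_256(ser).digest()
    return "H" + base64.urlsafe_b64encode(b"\x00" + raw).decode("ascii")[1:]


def verify_nested(node, label: str = "d") -> bool:
    """Depth-first: verify nested SADs before the map that contains them."""
    if isinstance(node, list):
        return all(verify_nested(item, label) for item in node)
    if not isinstance(node, dict):
        return True  # scalars (including compact SAID strings) have nothing to check
    if not all(verify_nested(value, label) for value in node.values()):
        return False  # an inner SAD failed, so the parent is rejected too
    if label in node:
        return said_of(node, label) == node[label]
    return True  # a plain map that is not itself a SAD


attrs = {"d": "", "LEI": "EXAMPLE-LEI"}
attrs["d"] = said_of(attrs)
acdc = {"v": "ACDC10JSON00011c_", "d": "", "a": attrs}
acdc["d"] = said_of(acdc)
```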
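Verification Failure Modes: Verification fails in three cases:
- The recomputed SAID differs from the embedded one, because the content or the SAID was altered or because the data was not serialized canonically.
- The derivation code is unsupported or invalid.
- The SAID length does not match the length expected for its derivation code.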
The SAID is the identifier component of a SAD. While SADs are the data structures, SAIDs are the identifiers derived from and embedded within those structures. Every SAD contains exactly one SAID at its top level, though it may contain nested SADs with their own SAIDs.
Digers are CESR primitives representing cryptographic digests. SAIDs are a specialized form of Diger with the additional property of self-referentiality. While a Diger can represent any hash value, a SAID specifically represents a hash that is embedded within the data it hashes.
ACDCs are the primary application of SADs in the KERI ecosystem. Every ACDC is a SAD, with its top-level d field containing the credential's SAID. ACDCs leverage these properties for compact (graduated) disclosure, immutable schema references, and verifiable credential chaining.
Nested SADs: ACDCs contain multiple nested SADs, such as the attribute (a), edge (e), and rule (r) sections.
Each nested SAD has its own SAID, and the parent SAD's SAID is computed over the serialization that includes the child SAIDs. This creates a Merkle tree-like structure where the root SAID commits to all nested content.
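The Merkle-like commitment can be made concrete by computing SAIDs bottom-up, so that every parent is hashed over its children's final SAIDs. In this hypothetical sketch (SHA3-256, code H, in place of Blake3-256), changing a leaf value changes the SAID of its own section and of the root. A sibling section's SAID is unaffected.

```python
import base64
import hashlib
import json


def said_of(sad: dict, label: str = "d") -> str:
    """SAID over the placeholder form (SHA3-256 stand-in, CESR code "H")."""
    tmp = dict(sad)
    tmp[label] = "#" * 44
    ser = json.dumps(tmp, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    raw = hashlib.sha3_256(ser).digest()
    return "H" + base64.urlsafe_b64encode(b"\x00" + raw).decode("ascii")[1:]


def saidify_tree(node, label: str = "d"):
    """Return a copy of `node` with every SAID field filled in, innermost first."""
    if isinstance(node, list):
        return [saidify_tree(item, label) for item in node]
    if not isinstance(node, dict):
        return node
    out = {key: saidify_tree(value, label) for key, value in node.items()}
    if label in out:
        out[label] = said_of(out, label)  # children already carry their final SAIDs
    return out


def make(name: str) -> dict:
    return {"v": "ACDC10JSON00011c_", "d": "",
            "a": {"d": "", "name": name},
            "e": {"d": "", "qvi": {"n": "prerequisite-said", "s": "schema-said"}}}


alice, bob = saidify_tree(make("Alice")), saidify_tree(make("Bob"))
print(alice["a"]["d"] == bob["a"]["d"])   # False: the attribute section changed
print(alice["e"]["d"] == bob["e"]["d"])   # True: the edge section did not
print(alice["d"] == bob["d"])             # False: the root commits to every section
```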
SAID References: SADs can reference other SADs by including their SAIDs as field values. This creates directed acyclic graphs (DAGs) of verifiable data, where each node is a SAD and edges are SAID references.
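Compact/Full Variants: The same logical SAD can be represented in multiple forms. These range from a compact form, in which nested sections appear only as their SAIDs, to a full form, in which those sections are expanded.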
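All variants have the same top-level SAID, because that SAID is computed over the most compact form, where each nested section is already reduced to its own SAID. This enables cryptographic equivalence across representations.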
SAD implementations must ensure deterministic serialization:
Field Ordering: JSON objects must maintain insertion order, not lexicographic order. Modern JSON libraries (Python 3.7+, JavaScript ES2015+) preserve insertion order by default.
Whitespace: Canonical form uses compact JSON with no unnecessary whitespace. Pretty-printed JSON for human readability must be compacted before SAID computation.
Unicode Normalization: String values should use consistent Unicode normalization (typically NFC) to prevent equivalent strings from producing different hashes.
Numeric Representation: Numbers should use consistent precision and formatting (no trailing zeros, consistent exponential notation).
The placeholder string must be exactly the same length as the final SAID and consist of pound signs (#).
Example placeholder for Blake3-256 (44 characters):
{
"d": "############################################"
}
Incremental Hashing: For large data structures, use streaming hash APIs to avoid loading entire structures into memory.
Caching: Cache computed SAIDs to avoid redundant computation, especially for frequently accessed credentials or schemas.
Parallel Verification: When verifying multiple SADs (e.g., a credential chain), parallelize verification operations since each SAD can be verified independently.
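Incremental hashing of a JSON SAD can be sketched with the standard library. json.JSONEncoder.iterencode yields the compact serialization in fragments, and each fragment is fed to hashlib's update(). This avoids building the full serialized string, though the object itself is still held in memory. SHA3-256 stands in for Blake3-256, and streaming_digest is a hypothetical name.

```python
import hashlib
import json


def streaming_digest(sad: dict) -> bytes:
    """SHA3-256 of the compact JSON serialization, hashed fragment by fragment."""
    encoder = json.JSONEncoder(separators=(",", ":"), ensure_ascii=False)
    hasher = hashlib.sha3_256()
    for fragment in encoder.iterencode(sad):  # str pieces, never the whole document
        hasher.update(fragment.encode("utf-8"))
    return hasher.digest()


big = {"d": "#" * 44, "items": [{"n": i, "label": f"item-{i}"} for i in range(10_000)]}
one_shot = hashlib.sha3_256(
    json.dumps(big, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
).digest()
print(streaming_digest(big) == one_shot)  # True
```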
Hash Algorithm Selection: Use Blake3-256 as default for new implementations. Support multiple algorithms for backward compatibility and future transitions.
SAID Length Validation: Always validate that SAID length matches the expected length for the indicated algorithm to prevent truncation attacks.
Derivation Code Verification: Verify the derivation code is supported before attempting verification to prevent processing with incorrect algorithms.
Replay Protection: SAIDs alone don't prevent replay attacks. Combine with timestamps, nonces, or sequence numbers in higher-level protocols.
CESR Compliance: Ensure SAID encoding follows CESR specifications exactly, including correct derivation codes and Base64 URL-safe encoding.
Schema Validation: When implementing ACDC SADs, validate against JSON Schema to ensure structural correctness before SAID computation.
Cross-Implementation Testing: Test SAID computation and verification across different implementations (Python, Rust, TypeScript) to ensure consistent canonicalization.
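The length and derivation-code checks can be combined into one validation step that runs before any hashing. The hypothetical sketch below makes two assumptions. First, only the digest codes listed earlier in this article are accepted. Second, two-character codes are exactly those beginning with 0.

```python
import re

# Derivation code -> expected text-domain length in characters.
SAID_SIZES = {"E": 44, "F": 44, "G": 44, "H": 44, "I": 44, "0D": 88, "0E": 88}
B64URL = re.compile(r"[A-Za-z0-9_-]+")


def validate_said(said: str) -> str:
    """Return the derivation code of a well-formed SAID; raise ValueError otherwise."""
    code = said[:2] if said.startswith("0") else said[:1]
    if code not in SAID_SIZES:
        raise ValueError(f"unsupported or invalid derivation code: {code!r}")
    expected = SAID_SIZES[code]
    if len(said) != expected:  # rejects truncated (and over-long) SAIDs
        raise ValueError(f"{code} SAIDs are {expected} chars, got {len(said)}")
    if not B64URL.fullmatch(said):
        raise ValueError("SAID contains characters outside the Base64 URL-safe alphabet")
    return code


print(validate_said("ELvaU6Z-i0d8JJR2nmwyYAZAoTNZH3UfSVPzhzS6b5CM"))  # E
```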
KERI defines a SAD Path Language for referencing nested SADs within larger structures. Paths use - as the separator and support both field labels and integer indices, with a lone - referring to the root of the SAD.
This enables precise references to nested SADs for selective disclosure and verification.
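Path resolution can be sketched as a walk over the split path. This is a hypothetical helper, not the specification's reference algorithm. It makes three assumptions:
- A lone - is the root.
- A segment is first looked up as a field label.
- A numeric segment otherwise indexes an array or the ordered fields of a map.

Labels that themselves contain - are not handled.

```python
def resolve_sad_path(sad, path: str):
    """Resolve a SAD path such as "-a-LEI" or "-e-1-s" against a nested dict."""
    if not path.startswith("-"):
        raise ValueError("SAD paths start with '-'")
    node = sad
    for part in path.split("-")[1:]:
        if part == "":
            continue  # "-" by itself refers to the root
        if isinstance(node, dict):
            if part in node:
                node = node[part]                       # field label
            elif part.isdigit():
                node = list(node.values())[int(part)]   # nth field of an ordered map
            else:
                raise KeyError(part)
        elif isinstance(node, list) and part.isdigit():
            node = node[int(part)]                      # nth array element
        else:
            raise KeyError(part)
    return node


acdc = {"v": "ACDC10JSON00011c_", "d": "root-said",
        "a": {"d": "attrs-said", "LEI": "EXAMPLE-LEI"},
        "e": {"d": "edges-said", "qvi": {"n": "qvi-said", "s": "qvi-schema-said"}}}
print(resolve_sad_path(acdc, "-a-LEI"))  # EXAMPLE-LEI
```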
CESR Proof Signatures can be attached to SADs and transposed across envelope boundaries. The signature remains valid because it's computed over the SAD's canonical form, which is invariant regardless of how the SAD is embedded or transmitted.
Private ACDCs use UUIDs (high-entropy nonces) as "salty nonces" to prevent rainbow table attacks on compact SAIDs. The UUID is included in SAID computation, making it computationally infeasible to determine the content of a compact ACDC from its SAID alone.
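The importance of the salt can be shown in a few lines. Without it, anyone holding a compact SAID can hash every plausible value of a low-entropy attribute and find the match. In this hypothetical sketch the salt is a random URL-safe string in a u field; the exact salt encoding used by real ACDCs is not modeled. SHA3-256 (code H) stands in for Blake3-256.

```python
import base64
import hashlib
import json
import secrets


def said_of(sad: dict, label: str = "d") -> str:
    """SAID over the placeholder form (SHA3-256 stand-in, CESR code "H")."""
    tmp = dict(sad)
    tmp[label] = "#" * 44
    ser = json.dumps(tmp, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    raw = hashlib.sha3_256(ser).digest()
    return "H" + base64.urlsafe_b64encode(b"\x00" + raw).decode("ascii")[1:]


def salted_block(fields: dict) -> dict:
    """Attribute block with a fresh ~128-bit salt in "u", followed by the fields."""
    block = {"d": "", "u": secrets.token_urlsafe(16)}
    block.update(fields)
    block["d"] = said_of(block)
    return block


# Unsalted: a two-valued attribute is recovered from its SAID by simple guessing.
unsalted = {"d": "", "over18": True}
unsalted["d"] = said_of(unsalted)
guess = next(v for v in (True, False) if said_of({"d": "", "over18": v}) == unsalted["d"])
print(guess)  # True

# Salted: identical content yields unrelated SAIDs, so guessing the value alone fails.
a, b = salted_block({"over18": True}), salted_block({"over18": True})
print(a["d"] == b["d"])  # False
```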
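ACDCs form Merkle tree-like structures. Each nested section is represented by its own SAID, and through those SAIDs the top-level SAID commits to all nested content.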
This enables efficient proof-of-inclusion for specific attributes without revealing the entire credential structure.
Streaming Hashing: For large data structures, use streaming hash APIs (e.g., hashlib incremental hashing in Python) to avoid memory overhead.
Parallel Verification: When verifying credential chains or multiple credentials, parallelize verification since each SAD is independent.
Length Validation: Always validate SAID length matches expected length for the algorithm (44 chars for 256-bit, 88 chars for 512-bit in text domain).
Truncation Prevention: Reject SAIDs that are shorter than expected length to prevent truncation attacks.
Algorithm Deprecation: Plan for algorithm transitions. Support multiple algorithms but mark deprecated algorithms in documentation.
Lexicographic Sorting: Do NOT sort JSON object keys alphabetically. This breaks SAID verification.
Pretty Printing: Do NOT compute SAIDs over pretty-printed JSON. Always compact before hashing.
String Encoding: Ensure consistent UTF-8 encoding. Avoid platform-specific encodings (e.g., UTF-16 on Windows).
Floating Point: Use consistent numeric precision. Avoid floating-point representation issues by using integer or string representations for precise values.
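Each of these pitfalls changes the bytes being hashed, and therefore the SAID. The following quick demonstration hashes several serializations of the same data with SHA3-256. The example values are hypothetical.

```python
import hashlib
import json

sad = {"v": "ACDC10JSON00011c_", "d": "#" * 44, "a": {"name": "Alice", "age": 30}}


def digest(text: str) -> str:
    return hashlib.sha3_256(text.encode("utf-8")).hexdigest()


canonical = json.dumps(sad, separators=(",", ":"), ensure_ascii=False)
variants = {
    "sorted keys": json.dumps(sad, separators=(",", ":"), ensure_ascii=False, sort_keys=True),
    "pretty-printed": json.dumps(sad, indent=2, ensure_ascii=False),
    "default spacing": json.dumps(sad, ensure_ascii=False),  # ", " and ": " separators
}
for name, text in variants.items():
    print(f"{name}: same hash input? {digest(text) == digest(canonical)}")  # False each time

# Numbers: 1.0 and 1 compare equal in Python but serialize differently.
print(json.dumps(1.0), json.dumps(1))  # 1.0 1
```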
Cross-Implementation Testing: Test SAID computation and verification across different language implementations to ensure consistent canonicalization.
Round-Trip Testing: Verify that SADs can be serialized, deserialized, and re-verified without SAID changes.
Edge Cases: Test with empty objects, deeply nested structures, Unicode characters, large numbers, and special characters.
Python: Use the keripy library, which provides the Saider class for SAID operations. Alternatively, use the blake3 package directly together with the json module.
Rust: Use the cesrox library, which provides SAID primitives with CESR encoding support.
TypeScript/JavaScript: Use the signify-ts library, which provides SAID generation and verification utilities.
CESR Compliance: Ensure SAID encoding strictly follows CESR specifications. Use official CESR libraries when possible.
Schema Validation: Validate data structures against JSON Schema before SAID computation to catch structural errors early.
Version Compatibility: Support multiple CESR versions and SAID algorithms to maintain backward compatibility with existing credentials and events.