Canonicalization is the process of converting data that has multiple possible representations into a single, deterministic "standard" or "canonical" form, enabling consistent cryptographic operations, equivalence comparison, and verifiable data structures across KERI/ACDC systems.
Comprehensive Explanation
Canonicalization in KERI/ACDC Systems
Process Definition
Canonicalization (also called standardization or normalization) is a fundamental data transformation process in KERI and ACDC implementations that converts data structures with potentially multiple valid representations into a single, deterministic, reproducible form. This process is critical for cryptographic integrity because cryptographic hash functions and digital signatures require byte-exact input to produce consistent, verifiable outputs.
In the KERI ecosystem, canonicalization accomplishes several essential objectives:
Enables cryptographic verifiability: By ensuring data serializes identically across different systems, canonicalization allows SAID (Self-Addressing Identifier) computation to produce consistent digests
Supports equivalence comparison: Different representations of logically identical data can be compared by canonicalizing both and checking for byte-exact equality
Prevents malleability attacks: Deterministic serialization prevents attackers from creating alternative representations of signed data that would produce different signatures
Facilitates interoperability: Systems using different serialization libraries or programming languages can exchange verifiable data structures
Canonicalization is used throughout KERI operations, including SAID computation, key event signing and receipting, and ACDC issuance, presentation, and verification.
Key participants in canonicalization processes include issuers, verifiers, witnesses, and parsers, each of which must apply identical canonicalization rules to interoperate.
Implementation Notes
Critical Implementation Requirements
Field Ordering Strategy
ACDC Canonical Order: The canonical field order for ACDCs is schema-defined, not lexicographic. The schema's properties object defines the order in which fields must appear in canonical serializations. This is explicitly stated in Document 2: "The canonical ordering is defined by the JSON schema document, not lexicographical (alphabetical) order."
Implementation Approach:
Parse the JSON Schema to extract field order from properties object
Maintain insertion order when constructing data structures
Never use alphabetical sorting for canonicalization
Test with schemas that have non-alphabetical field orders to catch ordering bugs
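As a sketch of this approach, the helper below (hypothetical, not part of any KERI library) reorders a field map to match the order of the schema's properties object, recursing into nested objects:

```python
import json

def order_by_schema(data: dict, schema: dict) -> dict:
    """Reorder a field map to follow the schema's properties order.

    Illustrative helper: assumes the schema is a JSON Schema whose
    'properties' keys appear in the required canonical order.
    """
    props = schema.get("properties", {})
    ordered = {}
    for name in props:                      # dict preserves insertion order (3.7+)
        if name in data:
            value = data[name]
            sub = props[name]
            # Recurse into nested field maps that carry their own sub-schema
            if isinstance(value, dict) and sub.get("type") == "object":
                value = order_by_schema(value, sub)
            ordered[name] = value
    # Keep any fields the schema does not mention, in their insertion order
    for name in data:
        if name not in ordered:
            ordered[name] = data[name]
    return ordered

schema = {"properties": {"v": {}, "d": {}, "i": {},
                         "a": {"type": "object",
                               "properties": {"name": {}, "dob": {}}}}}
data = {"i": "EAbc", "a": {"dob": "2001-01-01", "name": "Ann"}, "d": "", "v": "ACDC10JSON"}
canon = order_by_schema(data, schema)
print(json.dumps(canon, separators=(",", ":")))
```

Note that the function never sorts: field order comes entirely from the schema, with insertion order as the fallback for unlisted fields.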
SAID Placeholder Handling
Placeholder Requirements:
Must be exactly 44 characters for Blake3-256 with CESR encoding
Typically uses # (ASCII 35) characters: ############################################
Must be replaced byte-for-byte with computed SAID
Placeholder length varies with the digest size, not the algorithm family: 44 characters for 256-bit digests (Blake3-256, SHA-256) and 88 characters for 512-bit digests (Blake3-512, SHA-512)
Common Mistake: Using arbitrary placeholder lengths or forgetting to account for CESR derivation code in length calculation.
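The length arithmetic can be checked directly: in CESR the derivation code exactly fills the Base64 pad characters, so a code-plus-digest primitive occupies 4 * ceil(raw_bytes / 3) text characters. A minimal sketch (function names are illustrative):

```python
import math

def said_text_len(digest_bytes: int) -> int:
    # CESR text-domain length: the derivation code replaces the Base64
    # pad characters, so the total is 4 * ceil(digest_bytes / 3)
    return 4 * math.ceil(digest_bytes / 3)

def said_placeholder(digest_bytes: int = 32) -> str:
    # A run of '#' exactly as long as the final SAID will be
    return "#" * said_text_len(digest_bytes)

print(said_text_len(32))   # 44 -- Blake3-256 or SHA-256 (32-byte digest)
print(said_text_len(64))   # 88 -- Blake3-512 or SHA-512 (64-byte digest)
```

Deriving the placeholder from the digest size this way avoids both common mistakes: arbitrary lengths and forgetting the derivation code.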
Recursive Canonicalization
Nested Structure Handling: ACDCs often contain nested field maps (attributes, edges, rules sections). Canonicalization must be applied recursively:
Start with innermost nested structures
Canonicalize and compute SAIDs for leaf nodes
Embed computed SAIDs into parent structures
Continue recursively until top-level SAID is computed
Document 26 provides detailed guidance: "The SAIDification process must proceed from the innermost blocks outward. This recursive approach ensures that: 1. Leaf-level SAIDs are calculated first, 2. These SAIDs are embedded into their parent structures, 3. Parent-level SAIDs are then calculated."
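A minimal sketch of this innermost-outward recursion, using SHA-256 with the assumed CESR code 'I' so it runs on the standard library (production implementations typically use Blake3-256, code 'E', via a real CESR library such as keripy):

```python
import base64
import hashlib
import json

def compute_said(field_map, label="d"):
    """Toy SAID: SHA-256 over the compact serialization with a 44-char
    placeholder in `label`, CESR-style encoded with assumed code 'I'."""
    basis = dict(field_map, **{label: "#" * 44})
    raw = hashlib.sha256(json.dumps(basis, separators=(",", ":")).encode()).digest()
    return "I" + base64.urlsafe_b64encode(b"\x00" + raw).decode()[1:]

def saidify(node):
    """SAIDify recursively from the innermost blocks outward."""
    if not isinstance(node, dict):
        return node
    out = {k: saidify(v) for k, v in node.items()}  # 1. leaf SAIDs first
    if "d" in out:
        out["d"] = compute_said(out)                # 2-3. embed, then parent SAID
    return out

acdc = {"d": "", "a": {"d": "", "name": "Ann"}}
result = saidify(acdc)
```

Because the nested block's SAID is embedded before the parent digest is computed, the top-level SAID covers every nested SAID beneath it.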
Serialization Format Handling
All formats must produce equivalent canonical forms
Conversion between formats must preserve field ordering
The v (version) field indicates serialization format
Key Participants
Issuers: Must canonicalize ACDC structures before computing SAIDs and signing
Verifiers: Must canonicalize received data structures to recompute digests and verify signatures
Witnesses: Canonicalize key events before creating receipts
Parsers: Must apply canonical ordering rules when deserializing data structures
Process Flow
Canonicalization for SAID Generation
The most critical canonicalization workflow in KERI is the SAID generation process, which requires careful ordering to handle the self-referential nature of SAIDs:
Step 1: Data Structure Population
Populate all fields in the data structure (e.g., ACDC credential)
Insert a placeholder value (typically # characters) where the SAID will eventually reside
The placeholder must be exactly the length of the final SAID (e.g., 44 characters for Blake3-256 with CESR encoding)
Step 2: Canonical Serialization
Apply insertion-order preservation for field maps (JSON objects)
The order of fields must match the schema-defined order, not lexicographic (alphabetical) order
For ACDCs, the canonical field order is defined by the JSON Schema document
Nested structures must recursively apply canonical ordering
Array elements maintain their positional order
Step 3: Identifiable Basis Creation
The canonically serialized data with placeholder becomes the identifiable basis
This representation is the input to the cryptographic hash function
The identifiable basis must be byte-exact reproducible across all implementations
Step 4: Cryptographic Digest Computation
Apply the specified hash algorithm (e.g., Blake3-256, SHA-256) to the identifiable basis
The hash algorithm is indicated by the CESR derivation code
The raw hash bytes are then encoded using CESR encoding
Step 5: SAID Embedding
Replace the placeholder in the identifiable basis with the computed SAID
This creates the saidified data - the final canonical form
The SAID is now cryptographically bound to the content
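The five steps above can be walked through end to end. This sketch uses SHA-256 with the assumed CESR code 'I' so it runs on the standard library alone (production ACDCs typically use Blake3-256, code 'E'); the issuer AID shown is a made-up example value:

```python
import base64
import hashlib
import json

# Step 1: populate fields, with a 44-char placeholder where the SAID goes
credential = {"v": "ACDC10JSON", "d": "#" * 44,
              "i": "EIssuerAIDexample", "a": {"name": "Ann"}}

# Steps 2-3: insertion-ordered, compact serialization = the identifiable basis
basis = json.dumps(credential, separators=(",", ":")).encode()

# Step 4: digest the basis, then encode CESR-style: prepend one lead byte,
# Base64url-encode the 33 bytes to 44 chars, and replace the leading 'A'
# (the encoding of the zero lead byte) with the derivation code
raw = hashlib.sha256(basis).digest()
said = "I" + base64.urlsafe_b64encode(b"\x00" + raw).decode()[1:]

# Step 5: replace the placeholder, producing the saidified canonical form
credential["d"] = said
```

Because the placeholder and the final SAID are both 44 bytes, the saidified form is byte-for-byte the same length as the identifiable basis.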
Canonicalization for Verification
When verifying a SAID or signature, the process reverses:
Extract the embedded SAID from the received data structure
Replace the embedded SAID with a placeholder of identical length
Canonically serialize to recreate the identifiable basis
Recompute the digest and compare it byte-for-byte with the extracted SAID
Error Conditions
Verification and canonicalization can fail for several reasons:
Malformed structures: Data that doesn't conform to the expected schema
SAID mismatches: The recomputed digest differs from the embedded SAID
Recovery Strategies:
Reject invalid data: Most errors should result in rejection
Graceful degradation: Unknown optional fields can be ignored
Detailed error reporting: Provide specific information about canonicalization failures for debugging
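The reverse verification workflow can be sketched as follows; this toy version uses SHA-256 with the assumed CESR code 'I' rather than Blake3, matching a toy issuance done the same way:

```python
import base64
import hashlib
import json

def verify_said(field_map, label="d"):
    """Reverse workflow: swap the embedded SAID for a placeholder,
    re-canonicalize, recompute the digest, and compare byte-for-byte."""
    embedded = field_map.get(label, "")
    if not isinstance(embedded, str) or len(embedded) != 44:
        return False                       # malformed: wrong SAID length
    basis = dict(field_map, **{label: "#" * 44})
    raw = hashlib.sha256(json.dumps(basis, separators=(",", ":")).encode()).digest()
    return "I" + base64.urlsafe_b64encode(b"\x00" + raw).decode()[1:] == embedded

# Build a valid structure the same way issuance does, then check it
cred = {"d": "#" * 44, "name": "Ann"}
raw = hashlib.sha256(json.dumps(cred, separators=(",", ":")).encode()).digest()
cred["d"] = "I" + base64.urlsafe_b64encode(b"\x00" + raw).decode()[1:]

print(verify_said(cred))                     # True
print(verify_said(dict(cred, name="Bob")))   # False: content was tampered
```

Note that the function rejects malformed input (wrong SAID length) before any digest is computed, and a tampered field changes every byte of the recomputed SAID.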
Usage Patterns
ACDC Schema Canonicalization
One of the most critical canonicalization use cases is schema SAIDification. ACDC schemas must be canonicalized to compute their SAIDs, which are then embedded in credentials to cryptographically bind credentials to their schemas.
Typical Workflow:
Author JSON Schema with empty $id fields ("") at all levels
Recursively canonicalize from innermost blocks outward
All nested structures must maintain canonical ordering
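This schema workflow can be sketched in the same toy style (SHA-256, assumed CESR code 'I'; real schema SAIDification uses a CESR library): each block that carries a $id field gets a placeholder, is digested over its canonical serialization, and receives its SAID, innermost blocks first.

```python
import base64
import hashlib
import json

def saidify_schema(schema):
    """SAIDify a JSON Schema from the innermost blocks outward, writing
    each block's SAID into its $id field."""
    out = {}
    for key, value in schema.items():        # preserve the authored order
        out[key] = saidify_schema(value) if isinstance(value, dict) else value
    if "$id" in out:                          # this block is identifiable
        out["$id"] = "#" * 44                 # placeholder first...
        raw = hashlib.sha256(json.dumps(out, separators=(",", ":")).encode()).digest()
        out["$id"] = "I" + base64.urlsafe_b64encode(b"\x00" + raw).decode()[1:]  # ...then SAID
    return out

schema = {"$id": "", "title": "Example",
          "properties": {"a": {"$id": "", "type": "object"}}}
saidified = saidify_schema(schema)
```

The top-level $id then serves as the schema SAID that credentials embed to bind themselves to this exact schema.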
Integration Considerations
Programming Language Differences:
Python: Use dict with insertion order (Python 3.7+)
JavaScript/TypeScript: Use Object or Map with insertion order
Rust: Use IndexMap or LinkedHashMap for ordered maps
Go: Use ordered map libraries (standard map is unordered)
Serialization Library Selection:
Choose libraries that support insertion-order preservation
Avoid libraries that automatically sort keys alphabetically
Test round-trip serialization/deserialization to verify ordering
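A round-trip check of this kind is a one-liner in Python, where the built-in json module preserves field order on both serialization and parsing:

```python
import json

def round_trips(data):
    """Serialize -> deserialize -> serialize must be byte-identical, which
    holds only if the parser preserves field order (Python's json does)."""
    first = json.dumps(data, separators=(",", ":"))
    second = json.dumps(json.loads(first), separators=(",", ":"))
    return first == second

print(round_trips({"v": "1", "d": "", "i": "EAbc", "a": {"name": "Ann"}}))  # True
```

A library that reorders keys during parsing or dumping would fail this check immediately, which makes it a cheap smoke test when evaluating serialization libraries.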
Schema Validation Integration:
Validate structure before canonicalization
Use JSON Schema validators that preserve field order
Ensure validators don't reorder fields during validation
Caching Strategies:
Cache canonical forms of frequently used schemas
Invalidate cache when schemas are updated
Be cautious with caching credentials (they may contain time-sensitive data)
Error Propagation:
Canonicalization errors should propagate to calling code
Provide detailed error messages for debugging
Log canonicalization failures for security monitoring
Common Pitfalls
Lexicographic Ordering Mistake:
Wrong: Sorting fields alphabetically
Correct: Using schema-defined or insertion order
This is the most common canonicalization error in KERI implementations
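The consequence of this mistake is easy to demonstrate: sorting keys changes the serialized bytes, so every digest computed downstream differs.

```python
import hashlib
import json

event = {"v": "KERI10JSON", "t": "icp", "d": "", "i": "EAbc"}

good = json.dumps(event, separators=(",", ":"))                  # insertion order
bad = json.dumps(event, separators=(",", ":"), sort_keys=True)   # alphabetical

print(good != bad)   # True: the bytes differ even though the data is identical
print(hashlib.sha256(good.encode()).hexdigest() ==
      hashlib.sha256(bad.encode()).hexdigest())   # False
```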
Placeholder Length Mismatch:
Wrong: Using arbitrary placeholder length
Correct: Placeholder must exactly match final SAID length (44 chars for Blake3-256)
Nested Structure Ordering:
Wrong: Only ordering top-level fields
Correct: Recursively applying canonical ordering to all nested structures
Serialization Format Inconsistency:
Wrong: Mixing JSON and CBOR without proper conversion
Correct: Using consistent serialization format throughout a workflow
Whitespace Handling:
Wrong: Including formatting whitespace in canonical form
Correct: Using compact serialization without unnecessary whitespace
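The same point in code: compact and pretty-printed serializations carry identical data but different bytes, so only the compact form is suitable as a canonical form.

```python
import json

data = {"d": "", "a": {"name": "Ann"}}

compact = json.dumps(data, separators=(",", ":"))   # canonical: no whitespace
pretty = json.dumps(data, indent=2)                 # formatting whitespace added

print(compact)                                      # {"d":"","a":{"name":"Ann"}}
print(json.loads(compact) == json.loads(pretty))    # True: same data, different bytes
```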
Performance Optimization
Lazy Canonicalization:
Only canonicalize when needed (e.g., before signing or verification)
Don't canonicalize data that won't be cryptographically processed
Incremental Canonicalization:
For large structures, canonicalize sections independently
Combine canonical sections to form complete structure
Parallel Processing:
Canonicalize independent data structures in parallel
Use worker threads for large batch operations
Memory Management:
Stream large data structures rather than loading entirely into memory
Use generators/iterators for processing large credential sets
Relationship to KERI Security Model
Canonicalization is foundational to KERI's security architecture:
Self-Certification: SAIDs computed over canonical forms enable self-certifying identifiers that don't require external registries
Duplicity Detection: Canonical forms ensure that different representations of the same event can be detected, enabling duplicity detection in KELs
End-Verifiability: Canonical serialization enables end-verifiable data structures where any party can independently verify integrity
Composability: CESR's composability property depends on canonical encoding of primitives
Authentic Chaining: ACDC chains rely on canonical SAIDs to cryptographically link credentials
Without deterministic canonicalization, none of KERI's core security properties would be achievable. It is the invisible foundation that makes cryptographic verifiability practical across heterogeneous systems.
Serialization Notes
Test round-trip conversions to verify equivalence
JSON Serialization: Use compact format without whitespace:
json.dumps(data, separators=(',', ':'))  # No spaces
Programming Language Considerations
Python (3.7+):
dict maintains insertion order by default
Use json.dumps() with separators=(',', ':') for compact output
OrderedDict is no longer necessary but can be used for clarity
JavaScript/TypeScript:
Object property order is insertion order (ES2015+)
Use JSON.stringify() for serialization
Be aware of numeric key sorting behavior
Rust:
Use IndexMap or LinkedHashMap for ordered maps
Standard HashMap is unordered and unsuitable
serde_json preserves order with appropriate map types
Go:
Standard map is unordered
Use third-party ordered map libraries
Consider using structs with explicit field ordering
Performance Optimization
Canonicalization is stateless and thread-safe, so independent structures can be canonicalized in parallel using worker threads for batch operations; the caching and lazy-evaluation strategies described earlier apply here as well.
Error Handling
Validation Before Canonicalization:
Validate structure against schema before canonicalization
Check for required fields, type correctness
Fail fast on structural errors
Detailed Error Messages:
Report which field caused canonicalization failure
Include expected vs. actual types
Provide context for debugging (e.g., path to problematic field)
Security Considerations:
Log canonicalization failures for security monitoring
Reject data with SAID mismatches immediately
Don't expose internal implementation details in error messages
Testing Requirements
Test Cases:
Round-trip testing: Serialize → Deserialize → Serialize should produce identical output
Cross-implementation testing: Same data canonicalized by different implementations should match
Edge cases: Empty objects, deeply nested structures, large arrays
Negative tests: Invalid placeholders, missing fields, type mismatches
Performance tests: Large credentials, batch processing
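The negative tests above can be driven by a canonicalizer that validates before serializing. The validation rules in this sketch (a string SAID field of exactly 44 characters) are illustrative, not normative:

```python
import json

def canonicalize(data, said_label="d"):
    """Canonical serializer that fails fast on structural errors."""
    said = data.get(said_label)
    if not isinstance(said, str) or len(said) != 44:
        raise ValueError(f"field '{said_label}' must be a 44-char SAID or placeholder")
    return json.dumps(data, separators=(",", ":")).encode()

# Positive case: valid placeholder passes
ok = canonicalize({"d": "#" * 44, "i": "EAbc"})

# Negative cases: bad placeholder length, missing SAID field
for bad in ({"d": "###"}, {"i": "EAbc"}):
    try:
        canonicalize(bad)
        raise AssertionError("invalid input was not rejected")
    except ValueError:
        pass
```

Failing fast here keeps malformed data from ever reaching the digest computation, which matches the validate-before-canonicalize guidance above.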
Test Vectors: Use official KERI test vectors to verify implementation correctness. Document 14 references test implementations in the qvi-software repository.
Integration with KERI Components
KEL Integration: Key events must be canonically serialized before signing. The KEL verification process recomputes digests over canonical forms.
ACDC Integration: Credentials must maintain canonical ordering throughout their lifecycle: issuance, presentation, verification.
CESR Integration: CESR primitives are canonically encoded. The composability property depends on canonical representation.
IPEX Integration: Presentation exchanges require canonicalization for both compact and full disclosure variants.
Additional Pitfalls
Beyond the pitfalls listed earlier (lexicographic ordering, whitespace handling, placeholder length, nested ordering, format mixing), two more deserve attention:
Caching Stale Data: Invalidate cached canonical forms when underlying data changes.
Type Coercion: Ensure numeric types are preserved (don't convert integers to strings during serialization).
Debugging Techniques
Diff Canonical Forms: When SAID verification fails, diff the expected and actual canonical forms to identify discrepancies.
Hex Dump Comparison: For binary formats, hex dump both forms to identify byte-level differences.
Field Order Inspection: Print field order before and after canonicalization to verify ordering logic.
Hash Intermediate Steps: Log intermediate values (identifiable basis, hash bytes, encoded SAID) to isolate where the process diverges.
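The diff technique can be sketched with the standard library: pretty-print both structures so each field lands on its own line, then diff line by line (the helper name is illustrative):

```python
import difflib
import json

def diff_canonical(expected, actual):
    """Pretty-print both structures and diff them line by line so the
    field that breaks SAID verification can be spotted immediately."""
    a = json.dumps(expected, indent=1).splitlines()
    b = json.dumps(actual, indent=1).splitlines()
    return [line for line in difflib.unified_diff(a, b, lineterm="")
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]

print(diff_canonical({"d": "", "name": "Ann"}, {"d": "", "name": "Bob"}))
```

The returned lines show only the mismatched fields, which is usually enough to localize an ordering bug or a tampered value without a byte-level hex dump.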
Security Implications
Malleability Prevention: Canonical serialization prevents attackers from creating alternative representations of signed data that would produce different signatures.
Duplicity Detection: Canonical forms enable detection of conflicting representations of the same event in KELs.
Replay Attack Prevention: Combined with timestamps and nonces, canonical forms help prevent replay attacks.
Side-Channel Resistance: Deterministic canonicalization reduces timing side-channels in cryptographic operations.