This comprehensive explanation has been generated from 14 GitHub source documents.
Last updated: October 7, 2025
This content is meant to be consumed by AI agents via MCP.
Note: In rare cases it may contain LLM hallucinations.
For authoritative documentation, please consult the official GLEIF vLEI trainings and the ToIP Glossary.
The sniffer is a format detection component within KERI's Parside parser that automatically identifies serialization formats (CESR binary, CESR text, JSON, CBOR, MessagePack) in streaming data by examining initial codes and markers, enabling proper parsing dispatch without prior configuration.
The sniffer is a specialized detection component implemented within KERI's Parside parser infrastructure. It serves as the first-stage analyzer in the CESR (Composable Event Streaming Representation) stream processing pipeline, responsible for identifying serialization formats before parsing begins.
The sniffer addresses the fundamental cold start problem in stream parsing: determining how to begin processing structured data in an incoming stream without prior knowledge of its format. In KERI's multi-format ecosystem, streams may contain:
The sniffer enables automatic format detection by examining recognizable markers at the beginning of streams, allowing the parser to dispatch processing to the appropriate handler without manual configuration or buffering.
The sniffer implements CESR's core design principle of self-describing streams. CESR streams are designed to be sniffable - they contain sufficient information at their beginning to enable automatic format detection. This capability is essential for:
Implementations must examine the initial bytes/characters of the stream to identify format markers:
{ and locate version string field via regex
For non-CESR formats (JSON, CBOR, MessagePack):
This approach is necessary because these formats are not self-framing like CESR primitives.
When detecting CESR format:
Implementations should handle:
The sniffer should be optimized for:
The sniffer performs comprehensive format detection across five distinct serialization types. Each format has unique identifying characteristics:
CESR Formats: Both CESR binary and CESR text formats contain group codes or object codes at the stream's beginning. The first three bits of such a code form a combination that is unique to its format, so the sniffer can recognize it immediately and without ambiguity.
Structured Formats: JSON, CBOR, and MessagePack each have distinct structural markers that the sniffer can recognize through pattern matching.
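A minimal sketch of three-bit detection follows. It assumes the sniffer classifies a stream by the top three bits of its first byte. The mapping is derived from the well-known leading bytes of each format (`-`, `_`, `{`, the MessagePack and CBOR map headers) and should be checked against the CESR specification before use. The names `Format`, `TRITETS`, and `sniff` are hypothetical, not Parside's API.

```python
from enum import Enum


class Format(Enum):
    CESR_TEXT = "cesr-text"
    CESR_BINARY = "cesr-binary"
    JSON = "json"
    CBOR = "cbor"
    MGPK = "mgpk"


# Top three bits of the first byte -> format (assumed mapping, see lead-in).
TRITETS = {
    0b001: Format.CESR_TEXT,    # '-' (0x2d): count code in the text domain
    0b010: Format.CESR_TEXT,    # '_' (0x5f): op code in the text domain
    0b011: Format.JSON,         # '{' (0x7b): start of a JSON object
    0b100: Format.MGPK,         # MessagePack fixmap, 0x80-0x8f
    0b101: Format.CBOR,         # CBOR map (major type 5), 0xa0-0xbf
    0b110: Format.MGPK,         # MessagePack map16 (0xde) / map32 (0xdf)
    0b111: Format.CESR_BINARY,  # count or op code in the binary domain
}


def sniff(stream: bytes) -> Format:
    """Classify a stream by the top three bits of its first byte."""
    if not stream:
        raise ValueError("empty stream: at least one byte is needed to sniff")
    tritet = stream[0] >> 5
    try:
        return TRITETS[tritet]
    except KeyError:
        raise ValueError(f"non-sniffable stream, leading bits {tritet:#05b}") from None
```

The sketch checks only the leading bits. It does not verify that the bytes that follow are a well-formed code or map, so a real sniffer would still need to validate what it dispatches.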
The sniffer implements different processing strategies depending on the detected format:
When the sniffer detects JSON, CBOR, or MGPK formats, it initiates a specialized extraction process:
Regex Matching - The parser applies regular expressions to locate the version string embedded within the serialized data. The version string is a critical field in KERI field maps that appears first in any top-level structure.
Length Extraction - From the version string, the parser extracts the character count (for text formats) or byte count (for binary formats) that defines the total length of the serialized structure. This length information is embedded within the version string itself.
Boundary Detection - Using the extracted length, the parser determines exactly where the current data structure ends. This enables precise segmentation of the stream without parsing the entire structure.
Resume Sniffing - After processing the bounded structure, the sniffer resumes operation on the remaining stream. This allows for heterogeneous streams containing multiple serialization formats in sequence.
This approach is necessary because JSON, CBOR, and MessagePack are not self-framing in the same way CESR primitives are. They require explicit length information to determine boundaries.
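The four steps above can be sketched as follows. The regular expression assumes the KERI version 1 version-string layout (four-letter protocol, two hex version digits, four-letter serialization kind, six hex size digits, and a `_` terminator, as in `KERI10JSON000023_`). The search window `SEARCH_WINDOW` and the function name `split_first_message` are hypothetical choices for illustration.

```python
import re

# Assumed KERI v1 version string layout, e.g. b"KERI10JSON000023_".
VERSION_RE = re.compile(
    rb"(?P<proto>[A-Z]{4})(?P<major>[0-9a-f])(?P<minor>[0-9a-f])"
    rb"(?P<kind>[A-Z]{4})(?P<size>[0-9a-f]{6})_"
)

# Hypothetical limit: the version string is the first field, so only a
# short prefix of the stream needs to be searched.
SEARCH_WINDOW = 64


def split_first_message(stream: bytes) -> tuple[bytes, bytes, str]:
    """Return (message, remainder, kind) for the first message in stream."""
    # Regex matching: locate the version string near the start.
    match = VERSION_RE.search(stream[:SEARCH_WINDOW])
    if match is None:
        raise ValueError("no version string near the start of the stream")
    # Length extraction: the size field is the total serialized length.
    size = int(match.group("size"), 16)
    if size > len(stream):
        raise ValueError(f"need {size} bytes, only {len(stream)} available")
    # Boundary detection: cut exactly at the declared length.
    kind = match.group("kind").decode("ascii")
    return stream[:size], stream[size:], kind
```

The returned remainder is what the sniffer would examine next, which is how a stream can switch formats between messages. The sketch raises when the stream is shorter than the declared size. A streaming parser would instead wait for more bytes.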
When the sniff result indicates CESR format at the top level, the processing differs significantly:
The parser searches for the CESR version count code which identifies the specific CESR encoding tables in use. This version code determines which code table the parser should load for interpreting subsequent primitives.
It also looks for other count codes that define the number and types of primitives in groups. Count codes are special framing codes that enable self-framing at the group level.
These codes enable the parser to properly decode the self-framing primitives in the stream without requiring external length information.
The fundamental difference is that CESR formats are self-framing: each primitive carries enough information to determine its own boundaries. This property, described as "relatively new" in the design discussions, enables modular parser construction in which parsing responsibilities are dispatched according to the strip parameter.
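To illustrate how a count code frames a group without external length information, here is a minimal sketch for the text domain. It assumes the small four-character form: `-`, a single-letter selector, then a two-character Base64 count. Larger code forms and the version count code are rejected rather than decoded, and what the count measures depends on the active code table. The helper names are hypothetical.

```python
import string

# URL-safe Base64 alphabet used by CESR text-domain codes.
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"
B64_INDEX = {ch: i for i, ch in enumerate(B64_ALPHABET)}


def b64_to_int(text: str) -> int:
    """Interpret Base64 characters as a big-endian base-64 number."""
    value = 0
    for ch in text:
        if ch not in B64_INDEX:
            raise ValueError(f"invalid Base64 character {ch!r}")
        value = value * 64 + B64_INDEX[ch]
    return value


def parse_small_count_code(stream: str) -> tuple[str, int, str]:
    """Return (code, count, remainder) for a leading 4-character count code."""
    if len(stream) < 4:
        raise ValueError("need at least 4 characters for a small count code")
    if stream[0] != "-":
        raise ValueError("count codes in the text domain start with '-'")
    if stream[1] not in string.ascii_letters:
        # Assumption: other selectors mark larger or special code forms.
        raise ValueError("not a small count code; other forms are not handled")
    code = stream[:2]
    count = b64_to_int(stream[2:4])
    return code, count, stream[4:]
```

Because the code has a fixed size and carries its own count, the parser knows how much of the stream belongs to the group before it decodes a single primitive inside it.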
A stream achieves sniffability when it begins with specific recognizable markers:
Each supported datablock type has either an Object code or Group code (available in both binary and text variants) that provides a unique three-bit signature for immediate recognition. This design enables:
Streams lacking these markers are classified as non-sniffable and require alternative parsing strategies.
The sniffer works in conjunction with CESR's version management system:
This version management is critical because count code interpretation depends on the active version context. The parser must maintain a default version to handle scenarios where:
The sniffer operates as the first stage in CESR stream processing:
This pipeline architecture ensures that format detection happens before any parsing attempts, preventing errors from misinterpreted data.
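A sniff-then-dispatch loop of this kind might look like the sketch below. It assumes each handler reports how many bytes it consumed, so that sniffing can resume on the remainder. The `classify` rule (top three bits of the first byte) and the handler registry are illustrative assumptions, not Parside's actual interfaces.

```python
from typing import Callable

# A handler parses one item from the front of the stream and returns the
# number of bytes it consumed (hypothetical contract for this sketch).
Handler = Callable[[bytes], int]


def classify(first_byte: int) -> str:
    """Coarse classification from the top three bits of the first byte."""
    tritet = first_byte >> 5
    if tritet in (0b001, 0b010):
        return "cesr-text"
    if tritet == 0b111:
        return "cesr-binary"
    if tritet in (0b011, 0b100, 0b101, 0b110):
        return "message"  # JSON, CBOR, or MessagePack field map
    raise ValueError(f"non-sniffable stream, leading bits {tritet:#05b}")


def process(stream: bytes, handlers: dict[str, Handler]) -> list[str]:
    """Sniff, dispatch, and repeat until the stream is exhausted."""
    seen = []
    while stream:
        kind = classify(stream[0])         # detection always comes first
        consumed = handlers[kind](stream)  # dispatch to the matching parser
        if consumed <= 0 or consumed > len(stream):
            raise ValueError(f"{kind} handler consumed {consumed} bytes")
        seen.append(kind)
        stream = stream[consumed:]         # resume sniffing on the rest
    return seen
```

Keeping detection separate from the handlers means a new format needs only a new registry entry, and a handler never sees bytes that were classified as another format.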
The sniffer is explicitly described as "part of Parside" in the source documentation. Parside is the component responsible for parsing group codes and orchestrating higher-level stream structure, while Cesride handles individual CESR primitives.
This division of responsibilities reflects CESR's modular design:
The strip parameter determines which portion of the CESR stream will be parsed by which code component, enabling dynamic dispatch based on detected format.
When the sniffer identifies CESR format, it enables hierarchical parsing:
The sniffer is fundamental to CESR's composability property - the ability to convert any set of self-framing concatenated primitives between text and binary domains without loss. By enabling automatic format detection, the sniffer allows:
The sniffer directly addresses the cold start problem identified in KERI documentation. Without sniffability:
With the sniffer's detection capabilities:
By supporting multiple serialization formats (JSON, CBOR, MessagePack) alongside native CESR formats, the sniffer enables KERI to integrate with existing data interchange standards while maintaining CESR's composability properties. This is critical for:
The sniffer's design prioritizes efficiency:
These characteristics enable high-throughput stream processing essential for KERI's scalability goals.
While not extensively documented in the source materials, the sniffer's role in error handling is implicit:
Proper error handling at the sniffer stage prevents downstream parsing failures.
The source documentation indicates the sniffer concept emerged from design discussions in February 2023 (Cesride Slack thread). The design reflects CESR's "relatively new" self-framing property, suggesting the sniffer architecture evolved alongside CESR's maturation.
Key design principles established in these discussions:
These principles reflect a mature understanding of stream processing requirements in cryptographic protocols.
The sniffer enables several critical KERI capabilities:
KERI key event logs (KELs) and transaction event logs (TELs) are transmitted as CESR streams. The sniffer enables:
Out-of-Band Introductions (OOBIs) may contain various serialization formats. The sniffer enables OOBI endpoints to accept:
Authentic Chained Data Containers (ACDCs) may be serialized in multiple formats. The sniffer enables:
The sniffer's multi-format support is thus essential for KERI's practical deployment across diverse use cases and integration scenarios.
The sniffer operates as the first stage in Parside's parsing pipeline:
The strip parameter mechanism determines which code handles which stream portion based on sniffer results.
When CESR format is detected:
Implementations should be tested with: