How to kickstart Secure Message Transfer with Short Authentication Strings & Out-of-Band Channels
TL;DR. This post explains how to bootstrap an authenticated secure-messaging channel using Short Authentication Strings (SAS) and a lightweight out-of-band (OOB) verification step. The goal: defeat active MitM at setup time without heavy PKI ceremony, with UX that’s actually doable for humans. (Paper: ePrint 2025/1598)
Problem: first-contact authentication without PKI
Diffie–Hellman is a pretty nice key exchange. It’s an ingenious invention from the early days of modern cryptography that helped define what security and cryptography look like today. One of the first questions, however, that information-security students are asked in their first semesters is, paradoxically, “Why is it insecure?”
The fun part is watching students try to solve the discrete logarithm problem—but because we’re not tormentors, we explain that the issue does not lie within the mathematics of the key exchange. It’s the key-distribution problem. More specifically, it’s an authenticated key-distribution problem.
End-to-end encryption (E2EE) protects messages after two endpoints share authentic public keys. The hard part is the first contact: how do Alice and Bob ensure they’re not talking to Mallory on day zero? In the early days of the internet, there were great proposals around a web of trust, where people certify other people’s public keys. Remember the crypto parties? It’s a nice, decentralized idea, but it lives in an idealized world where most people care deeply about security. The truth is: they don’t. Most users expect the things they use to be secure without them doing anything. As a result, the web of trust never saw broad adoption.
Instead, Public Key Infrastructures (PKIs) became dominant: trusted, centralized certificate authorities (CAs) certify that a public key belongs to a specific organization, service, or person.
At some point, researchers realized that a PKI can be overkill in some contexts, e.g., Internet-of-Things deployments, and designed a special class of key-exchange protocols: Password-Authenticated Key Exchange (PAKE). PAKEs date back to EKE and SPEKE in the 1990s. A PAKE is an ingenious idea: it provides end-to-end encryption bootstrapped from a small shared password, while preventing an online adversary from impersonating either party (except with the small probability of a correct first-try guess). Even more, an offline adversary who later learns or brute-forces the password still cannot extract the session key that protected the exchange. That’s nice. But it’s not the only solution available.
Classics like ZRTP tackled the authentication problem for VoIP: perform a Diffie–Hellman (DH) key exchange, then display a Short Authentication String (SAS) on both phones so users verbally compare a few words or digits. If the SAS matches, you’ve authenticated that DH exchange (and often get key continuity for future calls). Earlier, PGPfone adopted a similar approach.
Modern messengers have adapted this idea. Signal’s safety number is a long fingerprint you scan as a QR code, because most people won’t compare long strings (and let’s be honest, they won’t even compare short strings, so the Signal team’s decision makes sense, but never mind). Matrix still uses SAS/QR verification between devices. This comparison is an out-of-band check that raises the cost of a man-in-the-middle (MitM) attack from “transparent” to “detectable.”
Next, I’ll describe how a SAS-based approach can be used to send E2EE files from one peer to another.
But why do this, you may ask, if there are already approaches like croc, Magic Wormhole, and WebWormhole? In brief: those tools use a single human-readable code (e.g., 7-crossover-clockwork) that doubles as both the rendezvous identifier and the PAKE password (or at least, different parts of the code serve these different purposes).
The receiver enters the code, both sides attach to the same rendezvous, and then run a PAKE, either to derive application-layer streaming keys directly (e.g., SPAKE2), or to authenticate the signaling that forms the data channel (e.g., CPace protecting SDP/DTLS fingerprints). This “one code, typed once” UX is excellent, but it requires the out-of-band code to carry sufficient entropy and leaves security hinging on online-guess resistance and careful client behavior. A malicious server sees the rendezvous identifier and sees the PAKE transcript (but not the password).
A SAS-based design decouples meeting from binding: use a short, non-secret rendezvous code purely to meet, then display and compare a short SAS to bind the session keys. No long secret needs to be shared out of band, and the residual misbinding risk is explicit (≈ 2^-t for a t-bit SAS), assuming a synchronous SAS channel. Which paradigm offers the best usability/security trade-off remains an open question.
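To make the ≈ 2^-t figure concrete, here is a minimal sketch that converts a SAS format into bits and into a rough per-attempt risk. The function names and the three example formats are illustrative assumptions, not taken from the paper, and the calculation assumes the SAS is uniformly distributed.

```python
import math

def sas_bits(num_symbols: int, alphabet_size: int) -> float:
    """Bits t carried by a uniformly random SAS of the given shape (assumed model)."""
    return num_symbols * math.log2(alphabet_size)

def false_accept_probability(bits: float, attempts: int = 1) -> float:
    """Union bound: each attacked handshake slips through with probability about 2^-t."""
    return min(1.0, attempts * 2.0 ** -bits)

# Hypothetical formats, chosen only to show the order of magnitude.
examples = [
    ("6 decimal digits", 6, 10),
    ("4 base32 characters", 4, 32),
    ("3 words from a 256-word list", 3, 256),
]
for label, count, size in examples:
    t = sas_bits(count, size)
    risk = false_accept_probability(t)
    print(f"{label}: t = {t:.1f} bits, risk per attempt about {risk:.1e}")
```

The point of the exercise: the risk is fixed by the SAS length alone and does not depend on how much entropy the rendezvous code carries.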
Idea: SAS + OOB to “kickstart” Secure Message Transfer (SMT)
My ePrint (2025/1598) focuses on how to kickstart an authenticated messaging channel using three simple ingredients:
- In-band key agreement (e.g., DH, X3DH, HPKE/KEM) that yields a session key.
- A tiny out-of-band (OOB) check of a Short Authentication String (SAS) derived from the transcript, so users can detect man-in-the-middle (MitM) attacks with minimal friction.
- A digital signature algorithm (e.g., RSA, ECDSA, ML-DSA).
Conceptually:
- Run a modified handshake over the network (e.g., MANA-IV-style).
- Both apps compute an SAS from the handshake transcript (e.g., a “hash” of both parties’ public keys and nonces). Why is “hash” in quotes, you may ask? We will come back to this in the section on the random-oracle heuristic.
- Display a short, human-verifiable rendering of the t-bit SAS (numeric, word list, or emoji).
- Users confirm the SAS over a separate channel (face-to-face, phone call, or by scanning a QR code).
- If it matches, a MitM is ruled out except with small probability (≈ 2^-t); the channel is authenticated and key continuity (pinning) is established for future sessions.
This pattern aligns with well-understood constructions used in ZRTP and modern messengers, but refines them into a lightweight, repeatable recipe for information exchange, not just calls.
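The SAS-derivation step in the recipe above could look roughly like the following. This is a sketch under stated assumptions: SHA-256 stands in for the “hash”, the label `sas-demo-v1` and the function names are made up for illustration, and the random byte strings are placeholders for real public keys. It covers only the derivation, not the commit-then-reveal rounds that protocols of this family use to stop an attacker from searching for inputs that produce a matching SAS.

```python
import hashlib
import secrets

def encode_transcript(*fields: bytes) -> bytes:
    """Length-prefix every field so that field boundaries are unambiguous."""
    return b"".join(len(f).to_bytes(4, "big") + f for f in fields)

def derive_sas(pk_a: bytes, pk_b: bytes, nonce_a: bytes, nonce_b: bytes,
               digits: int = 6) -> str:
    """Map the handshake transcript to a short decimal string (illustrative only)."""
    transcript = encode_transcript(b"sas-demo-v1", pk_a, pk_b, nonce_a, nonce_b)
    digest = hashlib.sha256(transcript).digest()
    # Reducing 64 bits modulo 10^digits leaves only a negligible bias.
    value = int.from_bytes(digest[:8], "big") % (10 ** digits)
    return f"{value:0{digits}d}"

# Placeholder inputs; a real client would use the keys and nonces from the handshake.
pk_a, pk_b = secrets.token_bytes(32), secrets.token_bytes(32)
nonce_a, nonce_b = secrets.token_bytes(16), secrets.token_bytes(16)
print("Compare this code with your peer:", derive_sas(pk_a, pk_b, nonce_a, nonce_b))
```

Both sides run the same function on their own view of the transcript. If Mallory swapped a key in transit, the two views differ and the displayed strings disagree, except with probability on the order of 10^-6 for six digits.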
Why this makes sense (threat model & guarantees)
- Threat model: active network attacker capable of MitM at first contact.
- Goal: guarantee that if users perform one short OOB check, a MitM is detected (with only a small, explicit false-accept probability), without relying on a global PKI, TOFU, or confidential password distribution.
SAS security comes from binding the displayed string to the exact handshake transcript. A MitM who changes any bit of the handshake changes the SAS, except with small probability. If users compare over an independent channel (voice, physical co-presence, or QR), the attacker must compromise both channels. Importantly, this comparison need not be confidential. In the security proof of 2025/1598, I model the OOB channel as authenticated but public: the adversary can observe the actual SAS during the comparison. This is a feature that PAKE-based approaches cannot offer, since their password must remain secret.
Design knobs that matter in practice
- SAS length & format. Words or emojis reduce read-off errors vs raw digits. Longer SAS ⇒ exponentially lower undetected MitM probability, but worse UX. ZRTP uses a short SAS as a sweet spot. For high-risk users, offer a longer check or QR fallback, but keep in mind that if the SAS gets too long, you are simply comparing long fingerprints. That makes the whole SAS-based handshake obsolete, and you can default to the approach in Signal.
- Transcript binding. Include both parties’ static or pre-keys (as applicable) and ephemeral randomness in the SAS derivation. That prevents downgrade or reflection quirks.
- Key continuity. Cache a binding (like ZRTP’s retained secrets). Even if users skip SAS later, continuity hardens against later MitM.
- Out-of-band (OOB) channel quality. Verbal comparison is the easiest option but is error-prone. QR scanning is faster and more reliable when users are co-present.
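As one way to picture the key-continuity knob from the list above, the sketch below pins a fingerprint of the peer’s public key once the SAS has been confirmed and checks it on later sessions. Note the substitution: this is plain public-key pinning, which is simpler than ZRTP’s retained shared secrets. The class name `PinStore`, the JSON file layout, and the peer identifiers are all hypothetical, and the file is not protected against a compromised endpoint.

```python
import hashlib
import json
from pathlib import Path

class PinStore:
    """Remembers, per peer, the key fingerprint that was confirmed via SAS."""

    def __init__(self, path: Path):
        self.path = path
        # Load existing pins if the file is already there.
        self.pins = json.loads(path.read_text()) if path.exists() else {}

    @staticmethod
    def fingerprint(public_key: bytes) -> str:
        return hashlib.sha256(public_key).hexdigest()

    def pin(self, peer_id: str, public_key: bytes) -> None:
        """Call only after the user has confirmed a matching SAS."""
        self.pins[peer_id] = self.fingerprint(public_key)
        self.path.write_text(json.dumps(self.pins))

    def check(self, peer_id: str, public_key: bytes) -> str:
        """Return 'unknown', 'match', or 'mismatch' for a later session."""
        pinned = self.pins.get(peer_id)
        if pinned is None:
            return "unknown"  # first contact: run the SAS comparison
        if pinned == self.fingerprint(public_key):
            return "match"    # same key as the verified session
        return "mismatch"     # key changed: warn and require a new SAS check
```

A client would call `check` at the start of every session and treat `mismatch` as a prompt for a fresh SAS comparison rather than silently accepting the new key.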
Limitations & honest caveats
- Human step required. SAS/OOB only helps if users actually perform the check. Research shows many users skip manual comparisons unless prompted thoughtfully.
- Compromised endpoints. If Mallory controls a device, OOB checks may not help. You need a secure UI and tamper-resistant storage. This is a complex topic, and I may write a dedicated post on post-compromise security for end-to-end encryption. Keep in mind that even Signal is not the holy grail of post-compromise security (though it still offers meaningful protections in such settings), see these slides.
- Side channels. Be careful with how the SAS is rendered and spoken (e.g., locales, font issues, confusion between similar characters).
Related reading
- ZRTP (RFC 6189) — classic SAS for VoIP; great mental model for SAS + continuity.
- Signal safety numbers — long fingerprint + QR for practical verification in messengers.
- Matrix verification — SAS/QR between devices in practice.
- OOB background — what “out-of-band” means and why it helps
Bottom line
If your threat model includes an active attacker at first contact, SAS + OOB is a pragmatic way to kickstart secure message transfer: small cognitive load for the user, big reduction in MitM risk, and no dependency on global PKI.