whitenoise.systems

How to kickstart Secure Message Transfer with Short Authentication Strings & Out-of-Band Channels

TL;DR. This post explains how to bootstrap an authenticated secure-messaging channel using Short Authentication Strings (SAS) and a lightweight out-of-band (OOB) verification step. The goal: defeat active MitM at setup time without heavy PKI ceremony, with UX that’s actually doable for humans. (Paper: ePrint 2025/1598)

Problem: first-contact authentication without PKI

Diffie–Hellman is a pretty nice key exchange. It’s an ingenious invention from the early days of modern cryptography that helped define what security and cryptography look like today. Paradoxically, though, one of the first questions information-security students are asked in their first semesters is “Why is it insecure?”

The fun part is watching students try to solve the discrete logarithm problem—but because we’re not tormentors, we explain that the issue does not lie within the mathematics of the key exchange. It’s the key-distribution problem. More specifically, it’s an authenticated key-distribution problem.
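
To make the missing authentication concrete, here is a minimal sketch of textbook Diffie–Hellman with Mallory in the middle. The group parameters (the Mersenne prime 2^127 - 1 and base 3) and all function names are assumptions chosen for illustration; they are far too weak for real use.

```python
import secrets

# Toy group, for illustration only: P is the Mersenne prime 2^127 - 1 and
# G = 3 is an arbitrary base. Real deployments use vetted groups or curves.
P = 2**127 - 1
G = 3

def keypair():
    """Return (private exponent x, public value G^x mod P)."""
    x = secrets.randbelow(P - 3) + 2   # x in [2, P - 2]
    return x, pow(G, x, P)

def shared_secret(own_private, peer_public):
    """Textbook DH: raise the peer's public value to our private exponent."""
    return pow(peer_public, own_private, P)

def mitm_demo():
    a, A = keypair()   # Alice
    b, B = keypair()   # Bob
    m, M = keypair()   # Mallory

    # Mallory replaces A and B in transit with her own public value M.
    alice_key = shared_secret(a, M)   # Alice believes this is shared with Bob
    bob_key = shared_secret(b, M)     # Bob believes this is shared with Alice

    # Mallory can compute both keys, so she can decrypt and re-encrypt everything.
    return {
        "mallory_reads_alice": shared_secret(m, A) == alice_key,
        "mallory_reads_bob": shared_secret(m, B) == bob_key,
        "alice_and_bob_agree": alice_key == bob_key,
    }

print(mitm_demo())
```

Mallory's two checks always succeed, and nothing in the exchange itself tells Alice or Bob that the public value they received came from the wrong party.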

End-to-end encryption (E2EE) protects messages after two endpoints share authentic public keys. The hard part is the first contact: how do Alice and Bob ensure they’re not talking to Mallory on day zero? In the early days of the internet, there were great proposals around a web of trust, where people certify other people’s public keys. Remember the crypto parties? It’s a nice, decentralized idea, but it lives in an idealized world where most people care deeply about security. The truth is: they don’t. Most users expect the things they use to be secure without them doing anything. As a result, the web of trust never saw broad adoption.

Instead, Public Key Infrastructures (PKIs) became dominant: trusted, centralized certificate authorities (CAs) certify that a public key belongs to a specific organization, service, or person.

At some point, researchers realized that a PKI can be overkill in some contexts, e.g., Internet-of-Things deployments, and designed a special class of key-exchange protocols: Password-Authenticated Key Exchange (PAKE). PAKEs date back to EKE and SPEKE in the 1990s. A PAKE is an ingenious idea: it bootstraps end-to-end encryption from a small shared password, while limiting an online adversary to one password guess per protocol run, so impersonation succeeds only with the small probability of a correct guess. Even more, an adversary who later learns or brute-forces the password still cannot recover the session keys that protected past exchanges. That’s nice. But it’s not the only solution available.

Classics like ZRTP tackled the authentication problem for VoIP: perform a Diffie–Hellman (DH) key exchange, then display a Short Authentication String (SAS) on both phones so users verbally compare a few words or digits. If the SAS matches, you’ve authenticated that DH exchange (and often get key continuity for future calls). Earlier, PGPfone adopted a similar approach.

Modern messengers have adapted this idea. Signal’s safety number is a long fingerprint you scan as a QR code, because most people won’t compare long strings (and let’s be honest, they won’t even compare short strings, so the Signal team’s decision makes sense). Matrix still uses SAS/QR verification between devices. This comparison is an out-of-band check that raises the cost of a man-in-the-middle (MitM) attack from “transparent” to “detectable.”

Next, I’ll describe how a SAS-based approach can be used to send E2EE files from one peer to another.

But why do this, you may ask, if there are already approaches like croc, Magic Wormhole, and WebWormhole? In brief: those tools use a single human-readable code (e.g., 7-crossover-clockwork) that doubles as both the rendezvous identifier and the PAKE password (or at least, different parts of the code serve these two purposes). The receiver enters the code, both sides attach to the same rendezvous, and then run a PAKE, either to derive application-layer streaming keys directly (e.g., SPAKE2) or to authenticate the signaling that forms the data channel (e.g., CPace protecting SDP/DTLS fingerprints). This “one code, typed once” UX is excellent, but it requires the out-of-band code to carry sufficient entropy and leaves security hinging on online-guess resistance and careful client behavior. A malicious server sees the rendezvous identifier and the PAKE transcript (but not the password).

A SAS-based design decouples meeting from binding: use a short, non-secret rendezvous code purely to meet, then display and compare a short SAS to bind the session keys. No long secret needs to be shared out of band, and the residual misbinding risk is explicit (≈ 2^-t for a t-bit SAS), assuming a synchronous SAS channel. Which paradigm offers the best usability/security trade-off remains an open question.
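
To put numbers on the 2^-t bound, here is a small sketch. It assumes that t counts the bits of entropy in the SAS and that the attacker gets a single attempt per comparison; the function names are hypothetical.

```python
def misbinding_bound(t_bits: int) -> float:
    """Approximate chance that a MitM's SAS matches by luck in one attempt."""
    return 2.0 ** -t_bits

def symbols_needed(t_bits: int, alphabet_size: int) -> int:
    """Smallest number of symbols whose alphabet covers 2^t_bits values."""
    n = 0
    while alphabet_size ** n < 2 ** t_bits:
        n += 1
    return n

for t in (16, 20, 30):
    print(
        f"t={t}: risk about {misbinding_bound(t):.1e}, "
        f"{symbols_needed(t, 10)} digits, "
        f"{symbols_needed(t, 256)} words from a 256-word list"
    )
```

For t = 20, for instance, the bound is roughly one in a million, and the SAS fits in 7 decimal digits or 3 words from a 256-word list.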

Idea: SAS + OOB to “kickstart” Secure Message Transfer (SMT)

My ePrint (2025/1598) focuses on how to kickstart an authenticated messaging channel using three simple ingredients:

  1. In-band key agreement (e.g., DH, X3DH, HPKE/KEM) that yields a session key.
  2. A tiny out-of-band (OOB) check of a Short Authentication String (SAS) derived from the transcript, so users can detect man-in-the-middle (MitM) attacks with minimal friction.
  3. A digital signature algorithm (e.g., RSA, ECDSA, ML-DSA).

Conceptually:

  1. Run a modified handshake over the network (e.g., MANA-IV-style).
  2. Both apps compute an SAS from the handshake transcript (e.g., a “hash” of both parties’ public keys and nonces). Why the quotation marks around “hash”? We will come back to this in the section on the Random Oracle heuristic.
  3. Display a short, human-verifiable SAS carrying t bits (numeric, word list, or emoji).
  4. Users confirm the SAS over a separate channel (face-to-face, phone call, or by scanning a QR code).
  5. If the strings match, a MitM is ruled out except with probability ≈ 2^-t; the channel is authenticated and key continuity (pinning) is established for future sessions.
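
The steps above can be sketched as follows. This is a minimal commit-then-reveal illustration under stated assumptions, not the protocol from the paper: SHA-256 stands in for the “hash”, the signature ingredient is omitted, the public keys are opaque byte strings, and every function name is hypothetical.

```python
import hashlib
import hmac
import secrets

def encode(*fields: bytes) -> bytes:
    """Length-prefix each field so the transcript encoding is unambiguous."""
    return b"".join(len(f).to_bytes(4, "big") + f for f in fields)

def commit(nonce: bytes, public_key: bytes) -> bytes:
    """Hash commitment to (nonce, public_key); revealing the nonce opens it."""
    return hashlib.sha256(b"commit" + encode(nonce, public_key)).digest()

def derive_sas(pk_a: bytes, pk_b: bytes, nonce_a: bytes, nonce_b: bytes,
               t_bits: int = 20) -> int:
    """Hash the transcript and keep only the top t_bits bits."""
    digest = hashlib.sha256(b"sas" + encode(pk_a, pk_b, nonce_a, nonce_b)).digest()
    return int.from_bytes(digest, "big") >> (256 - t_bits)

def format_sas(sas: int, t_bits: int = 20) -> str:
    """Render the SAS as zero-padded decimal digits for display."""
    width = len(str(2 ** t_bits - 1))
    return f"{sas:0{width}d}"

def handshake(pk_a: bytes, pk_b: bytes, t_bits: int = 20):
    # Message 1, Alice -> Bob: pk_a and a commitment to her nonce.
    nonce_a = secrets.token_bytes(16)
    commitment = commit(nonce_a, pk_a)
    # Message 2, Bob -> Alice: pk_b and his nonce in the clear.
    nonce_b = secrets.token_bytes(16)
    # Message 3, Alice -> Bob: nonce_a, which Bob checks against the commitment.
    if not hmac.compare_digest(commitment, commit(nonce_a, pk_a)):
        raise ValueError("commitment does not open; abort")
    # Each side derives the SAS from its own view of the transcript.
    sas_alice = format_sas(derive_sas(pk_a, pk_b, nonce_a, nonce_b, t_bits), t_bits)
    sas_bob = format_sas(derive_sas(pk_a, pk_b, nonce_a, nonce_b, t_bits), t_bits)
    return sas_alice, sas_bob

sas_alice, sas_bob = handshake(b"alice-public-key", b"bob-public-key")
print("Alice sees:", sas_alice, "| Bob sees:", sas_bob)
```

The commitment matters because the SAS is short: Alice fixes her nonce before she sees Bob’s, so nobody can search for a nonce that steers the SAS toward a chosen value after the other side’s contribution is known.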

This pattern aligns with well-understood constructions used in ZRTP and modern messengers, but refines them into a lightweight, repeatable recipe for information exchange, not just calls.

Why this makes sense (threat model & guarantees)

SAS security comes from binding the displayed string to the exact handshake transcript. A MitM who changes any bit of the handshake changes the SAS, except with probability ≈ 2^-t. If users compare over an independent channel (voice, physical co-presence, or QR), the attacker must compromise both channels. Importantly, this comparison need not be confidential. In the security proof of 2025/1598, I model the OOB channel as authenticated but public: the adversary can observe the actual SAS during the comparison. PAKE-based approaches cannot offer this, because their password must stay secret.
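
As a small illustration of the binding argument, the sketch below compares each side’s view of the transcript when Mallory swaps in her own key. It assumes SHA-256 as a stand-in for the SAS function and uses hypothetical names and placeholder byte strings for keys and nonces; it is not the construction analysed in the paper.

```python
import hashlib

def sas(fields, t_bits=20):
    """Hash a length-prefixed transcript and keep the top t_bits bits."""
    data = b"".join(len(f).to_bytes(4, "big") + f for f in fields)
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - t_bits)

pk_alice, pk_bob, pk_mallory = b"alice-pk", b"bob-pk", b"mallory-pk"
nonces = [b"nonce-a", b"nonce-b"]

# Honest run: both endpoints saw the same keys, so the SAS values agree.
honest_alice = sas([pk_alice, pk_bob] + nonces)
honest_bob = sas([pk_alice, pk_bob] + nonces)

# MitM run: Alice saw Mallory's key in place of Bob's, and vice versa.
mitm_alice = sas([pk_alice, pk_mallory] + nonces)
mitm_bob = sas([pk_mallory, pk_bob] + nonces)

print("honest match:", honest_alice == honest_bob)
print("MitM match:  ", mitm_alice == mitm_bob)
```

The honest comparison always matches. In the MitM run the two values are truncations of different digests, so they coincide only with probability about 2^-t. Because the check is a plain equality test on values both users can see, it does not matter whether Mallory overhears them, as long as she cannot alter what is said.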

Design knobs that matter in practice

Limitations & honest caveats


Bottom line

If your threat model includes an active attacker at first contact, SAS + OOB is a pragmatic way to kickstart secure message transfer: small cognitive load for the user, big reduction in MitM risk, and no dependency on global PKI.