ARX RUNA

Encrypted here · Stored anywhere

Arx Runa encrypts files locally before cloud storage. Data is chunked and encrypted client-side using XChaCha20-Poly1305 AEAD; keys never leave the user's device. Cloud providers receive only opaque ciphertext.

Local-first. Encryption happens entirely on your machine. The cloud never sees plaintext.

Cloud-agnostic. Rclone syncs sealed shards to any provider you choose — S3, Backblaze, Dropbox, your own server.

Zero trust. No accounts, no servers, no third-party key management. Your keys live with you.

Arx Runa is a personal file encryption tool built around one principle: your files should be unreadable to anyone but you — including the cloud service storing them.

When you upload a file, Arx Runa encrypts it on your device before it ever leaves. The cloud receives meaningless scrambled data. When you download, Arx Runa decrypts it locally. At no point does the cloud hold your encryption keys, your filenames, your folder structure, or any other file metadata. This is called zero-knowledge storage.

Research Problem

Mainstream cloud storage services (OneDrive, Google Drive, Dropbox) require users to trust the provider with their plaintext files, filenames, and metadata. A compromised or legally compelled provider can expose everything. Arx Runa explores whether it is possible to build a practical alternative where the provider is structurally incapable of reading your data.

Main question: How can a software solution for secure cloud storage be designed and implemented such that client-side encryption eliminates the need for trust in third-party providers, and how can the use of physical hardware factors (MFA) and "Zero-Trace" principles minimise the local attack surface on the user's machine?

This breaks down into five sub-questions:

Encryption standards and key management: Which modern encryption standards and key management principles are best suited for ensuring data confidentiality and integrity when data must be stored in an environment outside the user's control?
USB key file factor and offline recovery: How can a physical USB key file be integrated into the authentication flow as a mandatory second factor — ensuring that password knowledge alone is insufficient to access vault data — and how can an offline BIP-39 recovery mechanism enable user-controlled credential recovery without delegating trust to a third party or introducing a server-side backdoor?
Chunking, synchronisation, and provider-agnostic storage: How can effective chunking and synchronisation logic be implemented to upload changes to the cloud without revealing filenames, directory structures, or metadata to the cloud provider — and how can the synchronisation protocol maintain consistency across multiple devices while remaining provider-agnostic, enabling redundant backup to multiple destinations without re-encryption?
Zero-Trace operation through a RAM-based UI: How can a RAM-based in-application UI achieve Zero-Trace operation — ensuring that decrypted file content is never written to disk during a session — and what forensic residue, if any, persists on the host machine after the vault is locked?
File sharing in a zero-trust system: What cryptographic and protocol-level challenges arise when enabling file-granularity sharing between independent users in a zero-trust client-side encrypted system, and how does the proposed sharing architecture compare to existing approaches such as OneDrive sharing links and Cryptomator shared vaults?

Explore the documentation

What Arx Runa Does

Use Cases — Real-world scenarios Arx Runa is designed for

How It Works

How It Works — Plain-language walkthroughs: the vault, unlocking, encryption, cloud sync, sharing, and recovery

Going Deeper

Deep Dives — Research-level detail on cryptographic choices, file sharing, key recovery, and padding

Reference

Glossary — Term definitions
Security Model — Trust boundaries and threat model

Core Pillars

Feature	What it means
Client-side encryption	Your files are encrypted on your device before upload — the cloud only ever sees opaque ciphertext blobs
Tiered authentication	Tier 1 (password only) or Tier 2 (password + 32-byte USB key file); in Tier 2 the two factors are combined before key derivation, so neither factor alone is sufficient
Zero-Trace	Sensitive data is zeroed from memory as soon as it is no longer needed; session keys are mlock'd so the OS cannot page them to disk; no temporary plaintext files are written
Fixed-size chunks with BLAKE3 integrity	Files are split into equal-sized, padded chunks so the cloud cannot infer exact file sizes; every chunk is BLAKE3-hashed and verified before decryption to catch bit rot or tampering
Secure file sharing	Share individual files using HPKE (RFC 9180) with X25519 identities — only the recipient's private key opens the share; the cloud sees only encrypted blobs
Bring Your Own Cloud	Works with any provider Rclone supports (S3, Backblaze B2, Dropbox, Google Drive, and 70+ more) — no lock-in, multiple destinations supported

Technology Stack

Component	Technology	Purpose
Language	Rust (edition 2024)	Memory-safe systems programming
Application framework	Tauri	Native desktop shell and Rust backend
UI framework	Leptos (Rust/WASM, CSR)	Reactive frontend compiled to WebAssembly
Encryption	XChaCha20-Poly1305	Authenticated encryption for every chunk; 192-bit random nonce per chunk
Key derivation	Argon2id → HKDF-SHA256	Memory-hard password hardening; then key expansion into independent vault keys
File sharing	HPKE (RFC 9180) with X25519	End-to-end encrypted share packages; only the recipient's private key can open them
Integrity	BLAKE3	Per-chunk checksums recorded in the manifest; verified on download before decryption
Local database	SQLite + SQLCipher	Encrypted manifest: file paths, chunk records, wrapped file keys
Cloud transport	Rclone	Provider-agnostic transfer to 70+ storage backends
Memory safety	`zeroize`, `secrecy`, `mlock`/`VirtualLock`	Keys zeroed after use; locked memory never paged to disk
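
To illustrate the "keys zeroed after use" row, here is a minimal standard-library sketch of wipe-on-drop. It is not Arx Runa's code: the project relies on the `zeroize` and `secrecy` crates listed above, and the names `wipe` and `SessionKey` are hypothetical. Memory locking (`mlock`/`VirtualLock`) is not shown.

```rust
use std::sync::atomic::{compiler_fence, Ordering};

/// Overwrite a buffer with zeros using volatile writes, so the compiler
/// cannot remove the wipe as a dead store.
pub fn wipe(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        // SAFETY: `byte` is a valid, aligned, exclusive reference into `buf`.
        unsafe { std::ptr::write_volatile(byte, 0) };
    }
    // Keep the compiler from reordering later operations before the wipe.
    compiler_fence(Ordering::SeqCst);
}

/// Hypothetical wrapper for a 32-byte key that wipes itself when dropped.
pub struct SessionKey([u8; 32]);

impl SessionKey {
    pub fn new(bytes: [u8; 32]) -> Self {
        SessionKey(bytes)
    }

    /// Borrow the key material for a cryptographic operation.
    pub fn expose(&self) -> &[u8; 32] {
        &self.0
    }
}

impl Drop for SessionKey {
    fn drop(&mut self) {
        wipe(&mut self.0);
    }
}
```

A wrapper like this only clears its own storage. Any copy the caller made before constructing it is not covered, which is one reason dedicated crates are preferable in practice.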

Download

⚠️ Early demo: expect bugs and data loss. This is pre-release software. Encrypted vaults, keys, and file metadata may be lost or corrupted between versions. Do not rely on this as your only copy of important files.

All releases are available on GitHub.

→ View all releases on GitHub

Windows

Download the NSIS installer (.exe) from the latest release and run it.

SmartScreen warning: Windows may show "Windows protected your PC" because the app is not code-signed. Click More info, then Run anyway. This is common for unsigned open-source software.

macOS

Download the disk image (.dmg) from the latest release, open it, and drag Arx Runa to your Applications folder.

Gatekeeper warning: On first launch, macOS will block the app because it is not notarized. To open it:

Right-click (or Control-click) the app icon → Open

Click Open in the dialog that appears

Alternatively: System Settings → Privacy & Security → scroll down → Open Anyway

Linux

Two formats are available from the latest release:

AppImage (.AppImage) — portable, runs on any distribution without installation:
```
chmod +x arx-runa_*.AppImage
./arx-runa_*.AppImage
```
Debian package (.deb) — for Ubuntu, Debian, and derivatives:
```
sudo dpkg -i arx-runa_*.deb
```

Source

Build from source by cloning the repository and following the instructions in the README.

Use Cases

This section describes the real-world scenarios Arx Runa is built to handle — what a user is trying to do, how Arx Runa helps, and what security guarantees apply.

Security Tiers

Arx Runa lets you choose how strongly each vault is protected when you create it:

Tier	What you need to unlock	Best for
Tier 1	Password only	Everyday use — accessible from any device
Tier 2	Password + USB key file	High-value data that warrants two factors; opt-in recovery phrase available

Regardless of tier, the cloud never holds your encryption keys or unencrypted files.

Use Cases

Use case	Scenario
Personal File Backup	Back up sensitive files to any cloud provider. Only you can read them, even if the provider is breached.
Cross-Device Access	Access the same vault from multiple devices. Changes sync through explicit push and pull, without leaking filenames or structure.
Hardware Key & Recovery	Use a physical USB key file as a second factor. Covers what happens if the USB key or password is lost, plus setup and use of the opt-in BIP-39 recovery phrase.
File Sharing	Share individual files with another person securely — without sharing your password or compromising the vault.
Multi-Destination Backup	Back up to multiple cloud providers simultaneously. Covers mirror and accumulating modes, cloud provider migration, and backup failure recovery.

What Each Use Case Covers

Each use case document describes:

Who is involved and what they are trying to do
The step-by-step flow of a successful scenario
What can go wrong and how Arx Runa handles it
The security properties that must hold throughout

Design traceability (for developers and reviewers)

Sub-Question Traceability

Sub-question	Description	Covered by
SQ1	Encryption standards and key management	Use case 1
SQ2	USB hardware factor in authentication	Use case 3
SQ3	Chunking and sync without metadata leakage	Use cases 1, 2
SQ4	RAM-based UI / Zero-Trace	Use case 1
SQ5	File sharing in a zero-trust system	Use case 4

Use Case 1: Zero-Knowledge Personal Backup

Overview

An individual user wants to back up sensitive personal files (documents, photos, videos) to cloud storage without exposing plaintext, filenames, or metadata to the cloud provider. Arx Runa uses a drop zone as the primary interface. When creating a vault the user chooses an authentication tier: Tier 1 (password only) or Tier 2 (password + key file). Tier 2 is selected by default for stronger out-of-the-box security. The tier applies to the entire vault; users who need different security levels create separate vaults.

Actors

Primary Actor: Individual user with sensitive personal files
Secondary Actors: Cloud storage provider (untrusted), Arx Runa system

Preconditions

User has installed Arx Runa on their local machine
User has configured an Rclone backend (e.g., Google Drive, Dropbox)

Main Flow

User launches Arx Runa and selects "Create Vault"
Arx Runa prompts: "Choose authentication tier — Tier 1 (password only) or Tier 2 (password + key file)". Tier 2 is selected by default.
User selects a tier and completes setup (password for Tier 1; password + key file generation for Tier 2)
Arx Runa derives encryption keys from the provided credentials
Arx Runa unlocks vault and displays drop zone UI with vault file browser
User drags files or folders onto the drop zone
Arx Runa generates a unique encryption key for each file
Arx Runa splits and encrypts the file into fixed-size chunks
Arx Runa stores the encrypted chunks in the local vault database
User clicks sync
Arx Runa uploads the vault header, the encrypted chunks, and an encrypted backup of the vault database (the manifest) to the cloud
User browses vault and views files in-app (Zero-Trace)
User locks vault (and removes key file if Tier 2)

Alternate Flows

Media Files (EXIF and In-Memory Viewing)

Trigger: User drops photos or videos onto the drop zone

Steps:

Arx Runa detects media file types
Arx Runa optionally strips EXIF metadata (GPS, camera model, timestamps) before encryption
Arx Runa encrypts and uploads as in Main Flow
When user opens a photo: Arx Runa decrypts chunks into RAM and renders in-app (no temp file written to disk)
For large videos: Arx Runa decrypts and streams progressively from cloud chunks

Step 5 (video streaming) is not yet implemented.

Export Decrypted File to Disk

Trigger: User wants to save a decrypted copy of a file outside Arx Runa (e.g., to edit in an external application)

Steps:

User selects a file in the vault browser and chooses "Download"
Arx Runa warns: "Exported file will be written to disk in plaintext, outside vault protection"
User confirms "Export Anyway" or cancels
Arx Runa prompts user to choose a save location
Arx Runa downloads encrypted chunks and decrypts in RAM
Arx Runa writes the plaintext file to the chosen location
User is responsible for the exported copy

Cloud Provider Unavailable

Trigger: Rclone backend is unreachable

Steps:

Arx Runa completes local encryption and manifest update
When connectivity restores, user triggers sync and Arx Runa uploads pending chunks

Cloud Provider Migration

Trigger: User wants to switch to a different cloud provider (e.g., from Google Drive to Backblaze B2)

Steps:

User adds the new provider as a destination with backup mode "Mirror" (Destinations page)
Arx Runa syncs all encrypted blobs to the new destination (UUID names and content unchanged)
User switches the new destination to "primary" on the Destinations page
User verifies sync and removes the old destination
No re-encryption required — data remains opaque to both providers

See use-case-5 for full multi-destination flows including mirror mode, accumulating mode, and backup failure recovery.

File Already Exists

This flow is not yet implemented. Arx Runa currently overwrites silently; a conflict prompt is planned.

Success Criteria

All files are encrypted in RAM before any data leaves the client
Cloud provider receives only opaque blobs with random UUID names (no filenames, original file sizes, or metadata)
Fixed-size chunks (default 4 MiB, configurable) hide exact file size from cloud provider
EXIF metadata is stripped or encrypted before upload (media files)
Decrypted content is displayed in-memory — no plaintext written to disk (Zero-Trace)
Drop zone is the primary upload interface; a file picker button is also available as a supplementary upload method. Both files and folders can be dragged onto the drop zone.
User selects authentication tier (Tier 1 or Tier 2) when creating the vault
Tier 1 vault requires password only; Tier 2 vault additionally requires a key file
Vault cannot be opened without the correct authentication factors for the chosen tier
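
What fixed-size chunking lets the provider observe can be sketched as follows. This is a hedged illustration, not the project's code: the function names are hypothetical, zero-fill padding is an assumption, and the true file length is assumed to be recorded in the encrypted manifest so padding can be removed after decryption.

```rust
/// Default chunk size named in the success criteria above (4 MiB).
pub const DEFAULT_CHUNK_SIZE: u64 = 4 * 1024 * 1024;

/// Number of fixed-size chunks needed for a file of `file_len` bytes.
/// Assumption: an empty file still occupies one fully padded chunk.
pub fn chunk_count(file_len: u64, chunk_size: u64) -> u64 {
    assert!(chunk_size > 0, "chunk size must be non-zero");
    if file_len == 0 {
        1
    } else {
        file_len.div_ceil(chunk_size)
    }
}

/// Total plaintext bytes after padding, which is all the provider could
/// infer if it managed to group a file's chunks together.
pub fn padded_len(file_len: u64, chunk_size: u64) -> u64 {
    chunk_count(file_len, chunk_size) * chunk_size
}

/// Pad one chunk's plaintext to the full chunk size with zero bytes
/// (assumed scheme; padding happens before encryption).
pub fn pad_chunk(data: &[u8], chunk_size: usize) -> Vec<u8> {
    assert!(data.len() <= chunk_size, "chunk exceeds chunk size");
    let mut out = Vec::with_capacity(chunk_size);
    out.extend_from_slice(data);
    out.resize(chunk_size, 0);
    out
}
```

With the 4 MiB default, a 5 MiB file and a 7 MiB file both produce two chunks. The provider therefore learns at most a size bucket rather than the exact length, and only if it can tell which blobs belong together.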

See also use-case-5 for multi-destination and redundant backup scenarios.

Security Considerations

Threats Addressed

Untrusted cloud provider: Cloud never receives plaintext or file metadata
Traffic analysis: Fixed-size, padded chunks prevent inference of exact file sizes
EXIF metadata leakage: GPS, camera model, timestamps stripped or encrypted
Temp file artifacts: In-memory rendering prevents plaintext disk writes (Zero-Trace)
Chunk swap attacks: AAD (file_id || chunk_index) binds each chunk to its file and position
AEAD tampering: Authentication tag detects any modification to ciphertext
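
The chunk-binding AAD listed above can be sketched like this. The byte layout is an assumption for illustration (a 16-byte UUID followed by a big-endian 64-bit index), and `chunk_aad` is a hypothetical name; the source only states that the AAD is file_id || chunk_index.

```rust
/// Build the associated data for one chunk: file_id || chunk_index.
/// Assumed layout: 16-byte file UUID, then the index as a big-endian u64.
pub fn chunk_aad(file_id: &[u8; 16], chunk_index: u64) -> [u8; 24] {
    let mut aad = [0u8; 24];
    aad[..16].copy_from_slice(file_id);
    aad[16..].copy_from_slice(&chunk_index.to_be_bytes());
    aad
}
```

Because an AEAD authenticates the associated data together with the ciphertext, a chunk moved to another position or another file is decrypted with a different AAD and fails the tag check. Fixed-width fields keep the encoding unambiguous.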

Assumptions

User's local machine is trusted and not compromised during a session
Password has sufficient entropy (≥12 characters recommended)
Rclone backend provides reliable storage (Arx Runa adds no redundancy within a single destination; see use-case-5 for multi-destination backup)

Out of Scope

Physical theft of device during an unlocked session
Malware capturing keys or screen during session
Cloud provider deleting or corrupting blobs
Quantum computing attacks (symmetric XChaCha20-Poly1305 remains secure; see design doc)

Notes

This is the canonical use case for Arx Runa. Tier 2 is the default for stronger security; users who prefer password-only may select Tier 1 during vault creation.

Password loss warning: For a Tier 1 vault, the password is the sole authentication factor. Forgetting it without a recovery phrase configured means permanent, unrecoverable data loss: there is no admin override or cloud-based reset. Users who opted down from the default Tier 2 should take particular care to store their password securely. All users should either store their password in a password manager or configure the opt-in BIP-39 recovery phrase immediately after vault creation.

See use-case-3 for all credential-loss and recovery flows, including Tier 1 password loss (with and without a recovery phrase) and the full Tier 2 (key file) setup and key-loss scenarios.

Use Case 2: Cross-Device Synchronisation

Overview

An individual user wants to access and edit their encrypted files from multiple devices (home PC, work laptop, tablet) using the same vault. The cloud manifest acts as the synchronisation source of truth; conflicts are detected and resolved manually.

Actors

Primary Actor: Individual user with multiple devices
Secondary Actors: Cloud storage provider (untrusted), Arx Runa system, USB key file (Tier 2 vaults only)

Preconditions

User has Arx Runa installed on all devices
The secondary device has cloud-config.json already present (either copied from the primary device or produced by the new-device bootstrap — see Alternate Flow below)
User has previously created a vault and pushed an encrypted manifest to cloud (see use-case-1)
For Tier 2 vaults: the USB key file is available on the secondary device

Main Flow

This describes ongoing use on a device that already has a local vault state (manifest present). For first-time use on a new device, see the "First Time on This Device" alternate flow below.

User launches Arx Runa on secondary device
User authenticates (password for Tier 1 vaults; password + USB key for Tier 2 vaults)
Arx Runa derives encryption keys and opens the local manifest, displaying the file browser
User selects a file to download
Arx Runa downloads encrypted chunks from cloud and decrypts them, verifying integrity
User views files in-app (Zero-Trace)
To update a file, user uploads the modified version via the drop zone
Arx Runa encrypts and stages the updated file locally
User triggers sync; Arx Runa increments the snapshot counter, uploads the updated chunks and manifest backup to cloud
User locks vault and removes USB key (if Tier 2)

Alternate Flows

First Time on This Device

Trigger: Secondary device has Arx Runa installed but has never accessed this vault (no local manifest, no cloud-config.json)

Steps:

User clicks "Recover vault from cloud" on the vault picker screen
User enters the cloud endpoint details (Rclone remote name, bucket, region), vault password, and (Tier 2) path to the USB key file on the recovery page; Arx Runa writes cloud-config.json to the local app data directory
Arx Runa downloads vault-header.json (plaintext) from the cloud root
Arx Runa derives encryption keys and downloads manifest/manifest-backup.blob from cloud
Arx Runa decrypts the manifest backup and writes the local SQLCipher database
Device is now fully set up; continue from Main Flow step 3

Recover with Recovery Phrase

Trigger: User has lost their vault password but retains their 24-word recovery phrase

Steps:

User clicks "Recover vault from cloud" on the vault picker screen, or selects "Forgot password?" on the login page
User selects the "Recovery phrase" mode and enters the cloud endpoint details, their 24-word recovery phrase, and (Tier 2) path to the USB key file
Arx Runa downloads vault-header.json from the cloud root
Arx Runa derives encryption keys from the recovery phrase and downloads manifest/manifest-backup.blob from cloud
Arx Runa decrypts the manifest backup and writes the local SQLCipher database
Device is now fully set up; continue from Main Flow step 3

Manifest Out of Sync

Trigger: User syncs (pushes) and Arx Runa detects the cloud snapshot_counter is ahead of the local copy

Steps:

Arx Runa detects cloud snapshot_counter > local snapshot_counter during sync
Arx Runa shows dialog: "Another device has synced. Pull changes and continue?"
If accepted: Arx Runa runs pull_and_reconcile, which downloads the latest manifest from the cloud and replaces the local copy, then retries the sync
If declined: Arx Runa shows a persistent banner "Working with stale manifest — conflicts possible"; user can pull at any time via the banner
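
The counter comparison in the steps above can be sketched as a small decision function. This is an illustration under assumptions: `SyncDecision` and `decide_sync` are hypothetical names, and the counter is modelled as a plain `u64`.

```rust
/// Outcome of comparing the local and cloud snapshot counters before a push.
#[derive(Debug, PartialEq, Eq)]
pub enum SyncDecision {
    /// Local state is based on the latest cloud snapshot: safe to push.
    Push { next_counter: u64 },
    /// Another device has pushed since the last pull: pull before pushing.
    PullFirst,
}

/// Decide whether a push may proceed.
pub fn decide_sync(local_counter: u64, cloud_counter: u64) -> SyncDecision {
    if cloud_counter > local_counter {
        SyncDecision::PullFirst
    } else {
        // The push increments the counter so other devices can detect it.
        SyncDecision::Push { next_counter: local_counter + 1 }
    }
}
```

For example, if two devices both hold counter 5 and one pushes, the cloud moves to 6; the other device then sees 6 > 5 and is asked to pull first.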

Concurrent Edit Conflict

Trigger: Same file was edited on two devices before either pushed

Steps:

User pushes from Device A (snapshot_counter increments)
User attempts to push from Device B with stale manifest
Arx Runa detects conflict during sync (snapshot_counter mismatch) and prompts: "Another device has synced. Pull changes and continue?"
User accepts pull: Arx Runa downloads cloud manifest and replaces local copy
Locally-pending files whose names collide with cloud entries are automatically renamed with a (conflicted copy) suffix (e.g. report.pdf → report (conflicted copy).pdf)
Arx Runa retries sync; both the cloud version and the renamed local version are uploaded
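
The renaming rule from the example above (report.pdf becomes report (conflicted copy).pdf) could look like the sketch below. The function name is hypothetical, the input is assumed to be a bare file name without directory components, and how files with several extensions or a leading dot are treated is a guess.

```rust
/// Insert " (conflicted copy)" before the last extension of a file name.
/// Assumes `name` is a bare file name, not a path.
pub fn conflicted_copy_name(name: &str) -> String {
    const SUFFIX: &str = " (conflicted copy)";
    match name.rfind('.') {
        // A dot at index 0 marks a hidden file such as ".bashrc", not an extension.
        Some(idx) if idx > 0 => format!("{}{}{}", &name[..idx], SUFFIX, &name[idx..]),
        _ => format!("{}{}", name, SUFFIX),
    }
}
```

A real implementation would also need to handle the case where the renamed file already exists, which the source does not describe.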

USB Key Not Available (Tier 2 Vault)

Trigger: User at secondary device without their USB key

Steps:

User attempts to access a Tier 2 vault
Arx Runa displays: "No key file selected"
User cannot access Tier 2 vault until USB key is available
Tier 1 vaults remain accessible with password only

Download-Only Mode

Trigger: User wants read-only access on a shared or public device

Steps:

User follows Main Flow steps 1–6 (launch, authenticate, open the manifest, select a file, download and decrypt, view)
User views files but does not edit
User locks vault without pushing any changes

Edit File Externally

Trigger: User wants to edit a file in an external application

Steps:

User exports a decrypted copy to disk (see use-case-1 Export alternate flow)
User edits the file in an external application
User uploads the modified file back via the drop zone
Arx Runa encrypts the updated file and replaces the previous version
The exported copy remains on disk — the user is responsible for deleting it

Success Criteria

User can access vault from any device with the correct authentication factors
Cloud manifest stays synchronised; snapshot_counter detects divergence
Conflicts are detected when syncing; pending local files are preserved as conflict copies when they collide with cloud state
Tier 1 vaults are accessible with password only; Tier 2 vaults require USB key on each device
No device stores plaintext persistently unless the user explicitly exports a file

Security Considerations

Threats Addressed

Cloud provider correlation: Cloud sees only random UUID uploads from different devices
Device compromise: A locked device holds no plaintext at rest, so its loss or theft does not expose vault contents or affect other devices
Shared computer risk: User can access vault temporarily without leaving plaintext artifacts

Assumptions

All devices running Arx Runa are trusted (no malware capturing keys during session)
User remembers to lock vault when leaving a device unattended
Network between devices and cloud is not trusted (Arx Runa does not rely on transport security)

Out of Scope

Automatic conflict resolution (user must resolve manually)
Real-time sync across devices (push/pull model, not live collaboration)
Multi-user access control (single-user vault only in current design)

Notes

Cross-device sync requires explicit pull/push operations — Arx Runa does not run a background sync daemon. For Tier 2 vaults, carrying the USB key between devices is a deliberate security trade-off.

Use Case 3: Hardware MFA, Recovery, and Key Loss

Overview

This use case covers all credential-loss and recovery scenarios across both authentication tiers. The main flow demonstrates Tier 2 vault creation (password + USB key file). Alternate flows cover: password loss for Tier 1 and Tier 2 vaults (with and without a recovery phrase), USB key loss for Tier 2 vaults, backup USB key restoration, recovery phrase setup, password change with an active recovery slot, and USB key compromise. The opt-in BIP-39 recovery phrase is the single recovery mechanism available to users of either tier.

Actors

Primary Actor: Individual user requiring hardware-based authentication
Secondary Actors: Arx Runa system, USB key file (hardware factor)

Preconditions

User has Arx Runa installed on their local machine
User has configured an Rclone backend
User has a dedicated USB drive for key file generation

Main Flow

User launches Arx Runa and selects "Create Vault"
Arx Runa prompts: "Choose authentication tier — Tier 1 (password only) or Tier 2 (password + USB key)"
User selects Tier 2
User sets vault password
Arx Runa prompts: "Insert USB drive for key file generation"
User inserts USB drive
Arx Runa generates a random key file and writes it to the USB drive
Arx Runa displays: "Store this USB key securely — losing it means permanent data loss for this vault"
Arx Runa derives encryption keys from the password and key file, then creates the vault
User removes USB drive and stores it securely
Later, user accesses the vault:
Arx Runa prompts: "Insert USB key and enter password"
User inserts USB drive; Arx Runa reads key_file_bytes and derives keys
User accesses files; locks vault and removes USB key when done
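
The source does not specify how the password and key_file_bytes are combined before key derivation. One plausible approach, shown here purely as an assumption, is a length-prefixed concatenation that is then fed to Argon2id (the KDF call itself is not shown). `kdf_input` and `CredentialError` are hypothetical names.

```rust
/// Key file length stated in the documentation (32 bytes).
pub const KEY_FILE_LEN: usize = 32;

#[derive(Debug, PartialEq, Eq)]
pub enum CredentialError {
    EmptyPassword,
    BadKeyFileLength(usize),
}

/// Build the byte string handed to the password-hardening KDF.
/// Tier 1 passes `None`; Tier 2 passes the bytes read from the USB key file.
pub fn kdf_input(password: &str, key_file: Option<&[u8]>) -> Result<Vec<u8>, CredentialError> {
    if password.is_empty() {
        return Err(CredentialError::EmptyPassword);
    }
    let pw = password.as_bytes();
    let mut out = Vec::with_capacity(8 + pw.len() + KEY_FILE_LEN);
    // Length prefix keeps the boundary between the two factors unambiguous.
    out.extend_from_slice(&(pw.len() as u64).to_be_bytes());
    out.extend_from_slice(pw);
    if let Some(kf) = key_file {
        if kf.len() != KEY_FILE_LEN {
            return Err(CredentialError::BadKeyFileLength(kf.len()));
        }
        out.extend_from_slice(kf);
    }
    Ok(out)
}
```

Because the output depends on both inputs, changing either factor changes the derived keys, and identical inputs always give identical output. In real code the returned buffer would itself be treated as secret and wiped after use.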

Alternate Flows

Password Loss — Without Recovery Phrase

Trigger: User forgets vault password and has no recovery phrase configured

Steps:

(Tier 2 only) User inserts USB key
User attempts vault unlock with incorrect password
Arx Runa derives wrong master_key; SQLCipher decryption fails
Arx Runa displays: "Authentication failed"
No recovery slot is configured — vault data is permanently inaccessible

Outcome: Data lost. Mitigations: store password in a password manager or physical safe; configure a recovery phrase at vault creation.

Password Loss — With Recovery Phrase

Trigger: User forgets vault password but has a recovery phrase configured

Steps:

User selects "Recover with phrase" on the login screen
Arx Runa fetches vault header; confirms a bip39 recovery slot is present
User enters 24-word recovery phrase
Arx Runa validates BIP-39 checksum — words not in the BIP-39 wordlist or an invalid checksum are caught immediately; if all words are valid but the phrase is incorrect, recovery fails with an authentication error
Arx Runa derives recovery_key via Argon2id and decrypts wrapped_master_key
HKDF derives vault-level session keys; session begins
Arx Runa prompts: "Set a new password to complete recovery"
User sets new password; vault is re-keyed; the new master_key is re-wrapped under the recovery_key so the phrase stays valid

Outcome: Vault recovered. (Tier 2) User should verify backup USB key is still functional after recovery.

USB Key Loss (Tier 2 Vault) — Without Recovery Phrase

Trigger: User loses the USB drive and has no recovery phrase configured

Steps:

User knows password but cannot locate USB key file
Arx Runa scans removable drives for a file matching the key file fingerprint stored in the vault
No matching key file found; Arx Runa displays: "Key file not found"
No recovery slot is configured — Tier 2 vault data is permanently inaccessible

Outcome: Data lost. Mitigations: create backup USB key copies immediately after vault creation; configure a recovery phrase.

USB Key Loss (Tier 2 Vault) — With Recovery Phrase

Trigger: User loses the USB drive but has a recovery phrase configured

Steps:

User selects "Recover with phrase" on the login screen
User enters 24-word recovery phrase; Arx Runa decrypts wrapped_master_key as above
Session begins; Arx Runa prompts: "Set a new password and insert a new USB key to complete recovery"
User sets new password and inserts a new USB drive; Arx Runa generates a new key file
Vault is re-keyed to the new password + new USB key; recovery slot re-wrapped

Outcome: Vault recovered. The old USB key file is irrevocably lost; the new USB key replaces it. The user should create backup copies of the new USB key immediately.

Backup USB Key Restoration

Trigger: User loses primary USB key but has a backup copy

Steps:

User retrieves backup USB drive from secure storage (e.g., fireproof safe, safety deposit box) and inserts it
Arx Runa finds the 32-byte file with matching BLAKE3 fingerprint
User enters password; Arx Runa derives same master_key (identical key_file_bytes)
Vault unlocks successfully

Outcome: Data recovered. Create backup copies immediately after generating the key file.

Recovery Phrase Setup

Trigger: User wants to configure a recovery phrase for their vault

Steps:

User opens Security settings and selects "Set up recovery phrase"
Arx Runa prompts: "Enter your current password" (and "Insert USB key" for Tier 2)
Arx Runa re-derives master_key from current credentials
Arx Runa generates 256 bits of entropy; displays 24 words to the user
User writes down all 24 words; Arx Runa prompts: "I have written down my recovery phrase"
After acknowledgement, phrase is zeroed from memory; recovery slot added to vault header
Arx Runa displays: "Recovery phrase configured. Keep it in a secure, separate location from your USB key."

Outcome: Recovery slot active. The phrase is the only copy — Arx Runa does not store it.
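
The relationship between 256 bits of entropy and 24 words follows from the BIP-39 rules: the checksum is one bit per 32 bits of entropy, and each word encodes 11 bits. A small sketch of that arithmetic (the function name is hypothetical):

```rust
/// For a given BIP-39 entropy size, return (checksum_bits, word_count).
/// Returns None for sizes BIP-39 does not define.
pub fn bip39_layout(entropy_bits: u32) -> Option<(u32, u32)> {
    // BIP-39 allows 128 to 256 bits of entropy in multiples of 32.
    if !(128..=256).contains(&entropy_bits) || entropy_bits % 32 != 0 {
        return None;
    }
    let checksum_bits = entropy_bits / 32;
    // Each mnemonic word indexes a 2048-entry list, i.e. 11 bits.
    let words = (entropy_bits + checksum_bits) / 11;
    Some((checksum_bits, words))
}
```

For 256 bits this gives an 8-bit checksum and (256 + 8) / 11 = 24 words. An 8-bit checksum means a randomly chosen sequence of 24 valid words passes validation about once in 256 tries, so the checksum catches most transcription errors but is not an authentication step.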

Password Change with Recovery Phrase Active

Trigger: User changes their vault password while a recovery slot is configured

Steps:

User opens Security settings and selects "Change password"
Arx Runa authenticates with current credentials
Arx Runa prompts: "Enter your recovery phrase to keep it valid after the password change"
User enters 24-word phrase; Arx Runa verifies it decrypts the current master_key correctly
User enters new password; Arx Runa derives new master_key and re-wraps all keys
Recovery slot is updated: master_key re-encrypted under the same recovery_key (phrase unchanged)
Vault header uploaded; session continues with new keys

Outcome: Password changed; existing recovery phrase remains valid. If the user cannot provide the phrase at step 4 (entering the recovery phrase), they can skip it, and the recovery slot is removed with a warning.

USB Key Compromised

Trigger: Attacker obtains a copy of the USB key file but not the password

Steps:

Attacker attempts brute-force against vault with copied key file
The key derivation function makes each attempt computationally expensive
Vault remains secure as long as password has sufficient entropy
User should rotate the USB key file (Arx Runa re-wraps internal keys without re-encrypting cloud data)
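
To make "sufficient entropy" concrete, the expected brute-force effort can be estimated with simple arithmetic. The sketch below is illustrative only: the time per guess depends on the Argon2id parameters and the attacker's hardware, neither of which is specified here, and the function name is hypothetical.

```rust
/// Expected time, in years, for one sequential attacker to find a password
/// with `entropy_bits` of entropy at `seconds_per_guess` per KDF evaluation.
pub fn expected_years(entropy_bits: u32, seconds_per_guess: f64) -> f64 {
    const SECONDS_PER_YEAR: f64 = 365.25 * 24.0 * 3600.0;
    // On average, half of the keyspace is searched before a hit.
    let expected_guesses = 2f64.powi(entropy_bits as i32 - 1);
    expected_guesses * seconds_per_guess / SECONDS_PER_YEAR
}
```

Under the assumed figure of 0.5 seconds per guess, a 40-bit password takes roughly 8,700 years for a single sequential worker. An attacker with many parallel machines divides that figure accordingly, which is why each additional bit of password entropy matters.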

Success Criteria

Tier 1 vault cannot be unlocked without the correct password — unless recovery phrase is used
Tier 2 vault cannot be unlocked with password alone (USB key mandatory) — unless recovery phrase is used
Tier 2 vault cannot be unlocked with USB key alone (password mandatory) — unless recovery phrase is used
Key derivation is deterministic: the same password and identical key file bytes always produce the same master_key
No cloud-based factors, no third-party recovery, no admin override
Authentication is fully offline — no internet required (vault header is cached locally after first download)
A separate Tier 1 vault (if the user has one) remains accessible with password only
Recovery phrase alone unlocks vault regardless of tier — when configured
After recovery, user must set new primary credentials before vault is fully operational
Recovery slot survives password change and key rotation when phrase is provided during the ceremony
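
The unlock rules in the criteria above can be summarised as a small truth function. This is a sketch of the stated policy only, with hypothetical names; each boolean means "the correct credential was supplied", and no cryptography is involved.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Tier {
    One,
    Two,
}

/// Which correct credentials the user has supplied.
#[derive(Clone, Copy, Debug, Default)]
pub struct Presented {
    pub password: bool,
    pub key_file: bool,
    pub recovery_phrase: bool,
}

/// Whether the vault unlocks under the success criteria listed above.
pub fn can_unlock(tier: Tier, recovery_slot_configured: bool, p: Presented) -> bool {
    // The recovery phrase alone unlocks either tier, but only when configured.
    if recovery_slot_configured && p.recovery_phrase {
        return true;
    }
    match tier {
        Tier::One => p.password,
        Tier::Two => p.password && p.key_file,
    }
}
```

In the actual system these rules are not enforced by a check like this; they follow from key derivation, since wrong or missing inputs simply produce keys that fail to decrypt.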

Password and Key Recovery — Full feasibility analysis and decision rationale

Security Considerations

Threats Addressed

Password-only attack: Attacker with password but no USB key cannot unlock Tier 2 vault (without recovery phrase)
USB-only attack: Attacker with USB key but no password faces expensive Argon2id brute-force (without recovery phrase)
Cloud provider subpoena: Provider has only encrypted blobs with no key material
Coerced account recovery: No backdoor exists for law enforcement or Arx Runa developers
Insider threats: No admin mechanism that could be abused to bypass authentication
Recovery phrase attack: An attacker who obtains the 24-word phrase can unlock the vault regardless of tier. Guessing the phrase is computationally infeasible (256 bits of entropy), so the realistic risk is theft of the written copy; its physical security is the user's responsibility.

Assumptions

User physically secures USB key (locked drawer, safe, or safety deposit box)
User creates at least one backup USB key and stores it in a separate physical location
User chooses a strong password (≥12 characters, mixed case, symbols, numbers)
User accepts that Tier 2 key loss means permanent data loss for that vault — unless a recovery phrase is configured
If a recovery phrase is configured, user stores it in a secure location physically separate from the USB key (compromising both voids the two-factor protection)

Out of Scope

Social engineering or coercion to provide both factors
Malware capturing key file bytes during session
Tier 1 vault key loss (password-only; recover via password manager)

Notes

Zero-knowledge architecture is compatible with client-side recovery mechanisms where recovery material is generated and stored entirely by the user — the server never sees keys or plaintext in any recovery flow. Server-side account recovery remains incompatible: any mechanism requiring a server to hold or re-derive key material violates the zero-knowledge guarantee.

Users who require data recoverability should configure the opt-in BIP-39 recovery phrase and store it in a secure, offline location separate from the USB key. Users who apply Tier 2 to their highest-value vaults and do not configure recovery must maintain backup USB key copies as their sole fallback.

Overview

A user wants to share specific files from their vault with a friend or family member — holiday photos, a shared document, a home video — without exposing the content to the cloud provider. Both parties use Arx Runa; the sender encrypts the file's key so only the intended recipient can access it.

Actors

Primary Actor: Individual user sharing files (sender)
Secondary Actors: Friend or family member receiving files (recipient), cloud storage provider (untrusted), Arx Runa system

Preconditions

Sender has Arx Runa installed with a vault containing the files to share
Recipient has Arx Runa installed and has shared their public identity key with the sender
Both parties have their vault backed by Backblaze B2 or Google Drive (MVP scope)

Main Flow

Sender unlocks vault and selects a file to share (e.g., a holiday photo, a shared document)
Sender selects "Share" and enters the recipient's identifier (name or public key)
Arx Runa retrieves the file's encryption key
Arx Runa encrypts the file key so only the recipient can decrypt it
Arx Runa creates an encrypted share package containing the file's encryption key and cloud location
Arx Runa copies the encrypted file chunks to a shared area in the cloud
Sender delivers the share package to the recipient out-of-band (email, messaging, USB)
Recipient opens Arx Runa and imports the share package
Arx Runa displays the shared file in "Shared with Me"
Recipient decrypts the file key using their private key
Recipient downloads and decrypts the shared file
Recipient views the file in-app or exports a decrypted copy to disk
Arx Runa writes a download receipt (file_id, timestamp) to the cloud under shared/<file_share_id>/receipts/

Alternate Flows

Share Expiration

Trigger: Sender wants the share to expire after a set time

Steps:

Sender configures expiration (e.g., "expire after 30 days") when sharing
After expiration, recipient attempts to access the file
Arx Runa checks timestamp; displays "Share expired — contact sender for renewed access"
Shared file chunks are deleted from cloud; recipient can no longer access the file
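
The timestamp check in the expiration flow can be sketched as below. This is a minimal illustration, not Arx Runa's actual code: the function name `share_expired` and its parameters are hypothetical, and it assumes the share record stores a creation time and an optional lifetime in days.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def share_expired(created_at: datetime,
                  expire_after_days: Optional[int],
                  now: datetime) -> bool:
    """Return True once the configured lifetime has elapsed.

    expire_after_days=None models a share with no expiration configured.
    """
    if expire_after_days is None:
        return False
    return now >= created_at + timedelta(days=expire_after_days)


created = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(share_expired(created, 30, datetime(2025, 1, 15, tzinfo=timezone.utc)))  # False
print(share_expired(created, 30, datetime(2025, 2, 1, tzinfo=timezone.utc)))   # True
```

Timezone-aware UTC timestamps are used so that sender and recipient clocks in different zones compare consistently.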

Revoke Access

Trigger: Sender changes their mind and wants to remove access

Steps:

Sender selects the shared file → "Revoke share"
Arx Runa deletes the shared file chunks from the cloud and removes the share record locally
Recipient pulls updated manifest; file no longer appears in "Shared with Me"

Owner Notified of Download

Trigger: Sender opens Arx Runa after recipient has downloaded a shared file

Steps:

Sender unlocks vault and pulls latest manifest from cloud
Arx Runa reads the download receipt written by the recipient (file_id, timestamp)
Arx Runa displays: "Your shared file was downloaded by [recipient] on [date]"
No server required — receipt is a small encrypted blob written under shared/<file_share_id>/receipts/ in the cloud, picked up on next pull

Recipient Does Not Have Arx Runa

Trigger: Recipient is a non-technical user without Arx Runa installed

Steps:

Current design: recipient must have Arx Runa to decrypt the share package and access the file
Workaround: sender downloads and decrypts the file locally, then shares the plaintext via another channel (email, messaging app)
Sender accepts that the plaintext copy is outside Arx Runa's protection once exported

Success Criteria

File content is never exposed to the cloud provider during sharing
Only the intended recipient (holder of the matching private key) can decrypt the shared file
Sender can set an expiration when sharing and can revoke access at any time
Cloud provider can see that shared data exists but cannot read its content or identify the recipient
Recipient does not need access to the sender's vault or authentication factors
Sender is notified (on next pull) when recipient downloads a shared file
Recipient can export a decrypted copy to disk; sender is warned this is outside vault protection

Security Considerations

Threats Addressed

Untrusted cloud provider: Share metadata encrypted; cloud cannot see who shared what
Wrong recipient: The file key is encrypted for the intended recipient only — others cannot decrypt it
Persistent access after revocation: Shared chunks deleted from cloud on revoke; recipient who has already downloaded the file retains their local copy

Assumptions

Recipient's public key is obtained through a trusted channel (not from the cloud provider)
Recipient pulls the latest manifest before accessing the shared file
Revocation is not instant — recipient who cached the manifest retains access until they pull

Out of Scope

Sharing with recipients who do not have Arx Runa installed
Cryptographic enforcement of read-only access (recipient holds the file key and can re-encrypt)
Group sharing with multiple recipients simultaneously (future enhancement)
Single-click folder sharing (planned; currently requires sharing each file individually)

Notes

See use-case-3 for authentication factor considerations when the sender uses a Tier 2 vault.

Use Case 5: Multi-Destination & Redundant Backup

Overview

A user wants their encrypted vault backed up to more than one cloud provider simultaneously — for redundancy, cost diversification, or to migrate from one provider to another without downtime. Arx Runa lets the user add multiple destinations, each with its own backup mode (mirror or accumulating), and designate one as the primary. Sync pushes to all active destinations; the primary is used as the read source when pulling the vault.

Actors

Primary Actor: Individual user with an existing vault
Secondary Actors: Two or more cloud storage providers (all untrusted), Arx Runa system

Preconditions

User has Arx Runa installed and has at least one vault configured with a primary destination
User has credentials for a second cloud provider (e.g., Backblaze B2 account, Google Drive Service Account)

Main Flow — Add a Mirror Destination

User opens the Destinations page
User clicks "Add Destination", enters a label, and selects backup mode "Mirror"
User selects the provider type (Backblaze B2, Google Drive, local path, S3-compatible, etc.) and fills in the provider-specific fields
User clicks "Add Destination" — Arx Runa registers the new destination
User triggers sync
Arx Runa encrypts and uploads all pending chunks to every active destination
Both destinations now hold an identical set of encrypted blobs
If either destination is unreachable, Arx Runa records the backup failure and completes the upload to the reachable destination

Alternate Flows

Accumulating Mode (Retain Deleted Files)

Trigger: User wants a secondary destination that keeps a historical copy of deleted files

Steps:

User adds a new destination and selects backup mode "Accumulating"
When the user deletes a file from the vault and syncs, Arx Runa removes the chunks from mirror destinations but retains them on accumulating destinations
The primary destination always reflects the current vault state; the accumulating destination acts as a long-term archive
To recover a deleted file the user must restore from the accumulating destination manually (not yet supported in-app)
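
The difference between the two backup modes comes down to what happens to a deleted file's chunks on each destination. A minimal sketch of that decision follows; the `Destination` type and `plan_deletions` function are hypothetical names introduced for illustration and do not come from the Arx Runa codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Destination:
    label: str
    mode: str  # "mirror" or "accumulating"


def plan_deletions(destinations, deleted_blobs):
    """Map each destination label to the blob names to remove from it."""
    plan = {}
    for dest in destinations:
        if dest.mode == "mirror":
            # Mirrors track the current vault state, so deleted chunks go.
            plan[dest.label] = sorted(deleted_blobs)
        elif dest.mode == "accumulating":
            # Accumulating destinations keep every blob ever uploaded.
            plan[dest.label] = []
        else:
            raise ValueError(f"unknown backup mode: {dest.mode}")
    return plan


destinations = [Destination("b2-main", "mirror"), Destination("archive", "accumulating")]
print(plan_deletions(destinations, {"b.blob", "a.blob"}))
# {'b2-main': ['a.blob', 'b.blob'], 'archive': []}
```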

Cloud Provider Migration

Trigger: User wants to switch from one cloud provider to another with no data loss and minimal downtime

Steps:

User adds the new provider as a destination with backup mode "Mirror"
User syncs — Arx Runa uploads all encrypted blobs to the new destination (UUID names and content unchanged; no re-encryption needed)
User verifies the new destination is healthy (no backup failures shown)
User clicks "Set as Primary" on the new destination — Arx Runa promotes it and demotes the old primary
User deletes the old destination once confident the migration is complete
No re-encryption required at any step — the same opaque blobs work across providers

Backup Failure Recovery

Trigger: A destination becomes unreachable during or after a sync

Steps:

After a failed sync to a destination, Arx Runa displays a "N backup failures" badge on that destination in the list
User resolves the connectivity or credential issue (e.g., re-configures the Rclone backend)
User triggers sync — Arx Runa retries the failed destination alongside the others
On success, the failure badge clears

Delete a Destination

Trigger: User wants to remove a destination (e.g., after migrating to a new provider)

Steps:

User clicks "Delete" on a non-primary destination
Arx Runa asks for confirmation
User confirms — Arx Runa removes the destination from the list; existing blobs on the remote are not deleted by Arx Runa
Primary destinations cannot be deleted; user must first promote another destination to primary
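
The rule that exactly one destination is primary, and that the primary cannot be deleted, can be modelled as a small invariant. The sketch below is illustrative only: `DestinationList` and its methods are hypothetical names, and real destinations carry provider configuration that is omitted here.

```python
class DestinationList:
    """Tracks destination labels and enforces a single, undeletable primary."""

    def __init__(self, primary_label: str):
        self.labels = [primary_label]
        self.primary = primary_label

    def add(self, label: str) -> None:
        if label in self.labels:
            raise ValueError(f"destination already exists: {label}")
        self.labels.append(label)

    def set_primary(self, label: str) -> None:
        if label not in self.labels:
            raise KeyError(label)
        # Promoting one destination implicitly demotes the previous primary.
        self.primary = label

    def delete(self, label: str) -> None:
        if label not in self.labels:
            raise KeyError(label)
        if label == self.primary:
            raise ValueError("promote another destination to primary first")
        # Only the local record is removed; remote blobs are left untouched.
        self.labels.remove(label)


# Provider migration: add new, promote it, then delete the old one.
dests = DestinationList("old-provider")
dests.add("new-provider")
dests.set_primary("new-provider")
dests.delete("old-provider")
print(dests.labels, dests.primary)  # ['new-provider'] new-provider
```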

Success Criteria

Sync pushes identical encrypted blobs to every active destination in a single operation
Mirror destinations always reflect the current vault state (deleted files are removed)
Accumulating destinations retain blobs even when the corresponding file is deleted from the vault
Exactly one destination is marked "primary" at all times; it cannot be deleted without first promoting another
Backup failures are surfaced per-destination and cleared automatically on the next successful sync
No re-encryption is required when migrating between providers

Security Considerations

Threats Addressed

Single provider failure: Data survives the loss of any one provider when a mirror destination is configured
Provider lock-in: Migration between providers never requires decrypting and re-encrypting; blobs are provider-agnostic
Accidental deletion: Accumulating mode retains deleted chunks on at least one destination

Assumptions

All providers are untrusted; none receives encryption keys or plaintext at any point
The user is responsible for keeping provider credentials up to date
Arx Runa does not verify that a deleted destination's remote storage has been cleaned up — that is the user's responsibility

Out of Scope

Automatic conflict resolution between destinations that have diverged
In-app restoration of files from an accumulating destination
Bandwidth or cost management across destinations

How It Works

These six pages explain what Arx Runa does and why it is trustworthy — not how to build it, and not how to use it. The audience is anyone who wants to understand the security model before deciding whether to trust the system.

The one guarantee that runs through every page: everything is encrypted before it leaves your device. The cloud receives opaque blobs. No file names, no directory structure, no metadata, no keys. Arx Runa is designed so that even a fully compromised cloud provider learns nothing about your files beyond how many encrypted blobs you store.

The Vault — What a vault is, where it lives, and why the master key never touches the cloud.
Unlocking: Password and USB Key — What happens when you unlock, what Argon2id does, and what the USB key adds.
Recovery: If You Lose Your Key — Recovery phrases, new-device bootstrap, and the honest limits of recovery.
How Files Are Encrypted and Decrypted — The full round-trip: EXIF stripping, chunking, padding, per-file keys, authenticated encryption, integrity verification, and reassembly.
What the Cloud Sees — The cloud layout, what an attacker with storage access can and cannot learn, and how multi-destination backup works.
Sharing Files Privately — Public-key sharing with HPKE, snapshot semantics, revocation, and the role of key fingerprints.

The Vault

Everything in Arx Runa starts with the vault. Think of it as a strongbox that spans two places at once: a local encrypted database on your device, and a collection of encrypted blobs in your cloud storage. Neither half is useful without the key, and the key is stored in neither: it is re-derived from your password (and your USB key file, if you use one) each time you unlock.

What a vault contains

On your device, the vault is a SQLCipher database — an encrypted SQLite file — that tracks every file you've added: its name, directory structure, where its encrypted chunks live in the cloud, and the keys needed to decrypt them. This database is your manifest. Without it, the cloud blobs are unreadable ciphertext with no index.

In the cloud, the vault is a flat directory of opaque, fixed-size blobs plus a vault header file and an encrypted manifest backup. The blobs are your encrypted file data. The header holds the parameters needed to re-derive your keys on any device. The manifest backup means that even if you lose your local device entirely, the full index of your files can be restored — encrypted, so the cloud provider sees only ciphertext.

The master key

When you create a vault, Arx Runa runs your password through Argon2id, a deliberately slow, memory-hard function designed to make brute-force attacks expensive. The output is your master key: a 256-bit value that exists in plaintext only in locked memory. It is never written to disk in plaintext, never logged, and zeroed the moment it has served its purpose.

If you've enabled the USB key option, the key file on your drive is combined with your password before Argon2id runs. Both inputs are then required: with either one missing, the master key cannot be reconstructed.

The key tree

The master key doesn't directly encrypt your files. Instead, Arx Runa uses HKDF (RFC 5869) to expand it into three purpose-specific keys, each derived with a distinct label so they are cryptographically independent:

key_encryption_key — wraps the individual encryption key for each of your files
sqlcipher_key — encrypts the manifest database itself
manifest_key — encrypts the manifest backup stored in the cloud

Knowing one of these keys reveals nothing about the others. As soon as all three are derived, the master key is zeroed — it never persists beyond that instant.
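
The expansion step can be sketched with a direct implementation of HKDF-SHA256 from RFC 5869 using only the standard library. The three `info` labels are taken from the key-derivation diagram on this page; the function names, the empty HKDF salt, and the 32-byte output length are assumptions made for illustration, not confirmed details of Arx Runa.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) extract-then-expand with HMAC-SHA256."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long for HKDF")
    # Extract: an empty salt is replaced by HashLen zero bytes, per the RFC.
    prk = hmac.new(salt or b"\x00" * HASH_LEN, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i)
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_vault_keys(master_key: bytes) -> dict:
    """Derive the three vault-level keys under distinct info labels."""
    labels = {
        "key_encryption_key": b"arx-runa-key-encryption",
        "sqlcipher_key": b"arx-runa-sqlcipher",
        "manifest_key": b"arx-runa-manifest-backup",
    }
    return {name: hkdf_sha256(master_key, info) for name, info in labels.items()}


keys = derive_vault_keys(bytes(range(32)))  # placeholder master key, not a real one
print(sorted(keys))                         # the three key names
print(len(set(keys.values())))              # 3: each label yields a different key
```

Python `bytes` objects cannot be reliably zeroed, so this sketch does not model the zeroize step; a real implementation needs a language or library that can wipe and lock key memory.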

flowchart TD
    PW["Password"]:::user
    KF["USB Key File<br/>(32 bytes random)"]:::user
    SALT["Argon2 Salt<br/>(from vault header)"]:::storage

    subgraph KDF ["Key Derivation — Argon2id"]
        ARGON["Argon2id<br/>m=65536, t=3, p=4"]:::crypto
    end

    MK_NODE(["master_key<br/>(mlocked memory)"]):::secret

    subgraph HKDF_LAYER ["Key Expansion — HKDF-SHA256 (RFC 5869)"]
        HKDF1["HKDF<br/>info: arx-runa-key-encryption"]:::crypto
        HKDF2["HKDF<br/>info: arx-runa-sqlcipher"]:::crypto
        HKDF3["HKDF<br/>info: arx-runa-manifest-backup"]:::crypto
    end

    subgraph VAULT_KEYS ["Vault-Level Keys"]
        KEK["key_encryption_key<br/>Wraps per-file file_keys"]:::secret
        SK["sqlcipher_key<br/>SQLCipher DB"]:::secret
        MK["manifest_key<br/>Cloud backup blob"]:::secret
    end

    subgraph PER_FILE ["Per-File Keys (generated at file creation)"]
        FK["file_key<br/>(random 256-bit via CSPRNG)<br/>XChaCha20-Poly1305 chunk encryption"]:::secret
        FKW["file_key_wrapped<br/>(file_key encrypted with key_encryption_key)<br/>stored in SQLCipher nodes table"]:::storage
    end

    PW -->|combined input| ARGON
    KF -->|combined input| ARGON
    SALT -->|salt| ARGON

    ARGON -->|outputs| MK_NODE

    MK_NODE -->|input| HKDF1
    MK_NODE -->|input| HKDF2
    MK_NODE -->|input| HKDF3

    HKDF1 -->|derives| KEK
    HKDF2 -->|derives| SK
    HKDF3 -->|derives| MK

    HKDF3 --> ZEROIZE_MK["zeroize(master_key)<br/>Immediately after HKDF"]:::zeroize

    KEK -->|wraps/unwraps| FK
    FK -->|encrypted with KEK| FKW

    FK --> USE_FK["Use for<br/>chunk encrypt/decrypt"]:::proc
    USE_FK --> ZEROIZE_FK["zeroize(file_key)<br/>After each operation"]:::zeroize

    classDef secret fill:#dc2626,stroke:#991b1b,color:#fff
    classDef crypto fill:#2563eb,stroke:#1e40af,color:#fff
    classDef storage fill:#16a34a,stroke:#166534,color:#fff
    classDef user fill:#9333ea,stroke:#6b21a8,color:#fff
    classDef zeroize fill:#ef4444,stroke:#991b1b,color:#fff,stroke-width:3px,stroke-dasharray:5 5
    classDef proc fill:#6b7280,stroke:#374151,color:#fff

Per-file keys

Every file you add gets its own unique random key, generated fresh at the time of encryption. That key encrypts the file's data chunks, then is immediately wrapped — encrypted — with your key_encryption_key and stored in the manifest. When you open a file, the wrapped key is unwrapped in memory, used for decryption, and zeroed again.

This means each file's security is independent. Re-encrypting or rekeying one file has no effect on any other, and there is no single "file encryption key" whose exposure would compromise your entire vault.

The vault header

The vault header is a small file stored in your cloud alongside your encrypted blobs. It holds your Argon2id parameters and the random salt used during key derivation — the inputs needed to repeat the derivation on a new device. If you've set up a recovery phrase, the header also contains an encrypted copy of the master key wrapped under the recovery key.

None of this is useful without your password or recovery phrase to drive the derivation. The header is what lets you unlock your vault on a new device without separately transferring the manifest database.

Unlocking: Password and USB Key

Unlocking a vault looks simple from the outside — type your password, press enter, done. Under the hood, the process is deliberately expensive. This page explains what happens step by step, and why each piece matters.

Two factors, one combined secret

Arx Runa supports two security tiers. In the basic configuration you unlock with your password alone. With USB MFA enabled, you also need a key file: a small file of exactly 32 bytes of random data that lives on a removable drive. No special hardware is required — any USB stick will do. The key file is pure entropy; it has no internal structure, no device identifier, no serial number. It is raw randomness written to a file you can name and place wherever you like on the drive.

The two factors are not stacked — they are combined into a single input before any cryptography runs. Arx Runa appends the raw key file bytes to your password bytes and feeds the combined value into the key derivation function. An attacker with only your password cannot unlock the vault, because the derivation produces a different master key without the key file. An attacker with only your drive cannot unlock it either. Each factor is useless without the other.

Finding your key file automatically

You never need to navigate to the key file or remember where you put it. When you plug in your USB drive, Arx Runa detects the mount event and scans the drive for any file that is exactly 32 bytes. For each candidate, it computes a BLAKE3 hash and compares it against a fingerprint stored in your vault header. When the fingerprints match, the key file field in the login screen fills automatically. You only need to enter your password and confirm.

The fingerprint in the vault header is public information — it is a verifier, not a secret. Knowing the hash does not help an attacker reconstruct the 32-byte file it was derived from. The file content is what matters; the fingerprint only identifies which file to use.
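
A rough sketch of the scan follows. Python's standard library has no BLAKE3, so BLAKE2b-256 stands in for it here; that substitution, along with the names `fingerprint` and `find_key_file`, is an assumption of this example. The 32-byte size filter and the compare-against-header step follow the description above.

```python
import hashlib
import hmac
from pathlib import Path
from typing import Optional

KEY_FILE_SIZE = 32  # key files are exactly 32 bytes


def fingerprint(data: bytes) -> bytes:
    # Stand-in hash: BLAKE2b-256 from the stdlib, NOT the BLAKE3 used by Arx Runa.
    return hashlib.blake2b(data, digest_size=32).digest()


def find_key_file(mount_point: Path, expected: bytes) -> Optional[Path]:
    """Return the first 32-byte file whose fingerprint matches, else None."""
    for path in sorted(mount_point.rglob("*")):
        try:
            if not path.is_file() or path.stat().st_size != KEY_FILE_SIZE:
                continue
            candidate = path.read_bytes()
        except OSError:
            continue  # unreadable entries are skipped, not fatal
        # Constant-time comparison avoids leaking how many bytes matched.
        if hmac.compare_digest(fingerprint(candidate), expected):
            return path
    return None
```

Filtering on size before hashing keeps the scan cheap: only files of exactly 32 bytes are ever read.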

Argon2id: the slow step

Once the input is assembled — password alone, or password combined with key file — it goes into Argon2id. Argon2id is a memory-hard key derivation function: it consumes 64 MiB of RAM across multiple passes so that each guess an attacker makes costs significant time and memory. Where conventional hash functions can be parallelised onto thousands of GPU cores for fast brute-force, Argon2id's memory requirement caps how many guesses can run in parallel. The cost per guess stays high regardless of the attacker's hardware.

This is why unlocking takes a moment — typically about a second. That pause is the security working. The Argon2id parameters (memory, iterations, parallelism) are stored in your vault header so they can be updated in future versions without invalidating existing vaults.
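
The sketch below shows how the two factors become one input to a memory-hard KDF. Argon2id is not in Python's standard library, so `hashlib.scrypt` is used purely as a stand-in with different, smaller parameters; only the `password || key_file` concatenation and the 32-byte output reflect the description on this page. The function name `derive_master_key` is hypothetical.

```python
import hashlib
import os


def derive_master_key(password: str, salt: bytes, key_file: bytes = b"") -> bytes:
    """Derive a 32-byte key from password alone or password || key_file."""
    if key_file and len(key_file) != 32:
        raise ValueError("key file must be exactly 32 bytes")
    combined = password.encode("utf-8") + key_file  # password || key_file
    # Stand-in KDF: scrypt, NOT Argon2id. n=2**14, r=8 uses about 16 MiB,
    # well below the 64 MiB (m=65536 KiB) the vault header specifies.
    return hashlib.scrypt(combined, salt=salt, n=2**14, r=8, p=1, dklen=32)


salt = os.urandom(16)
key_file = os.urandom(32)
password_only = derive_master_key("correct horse", salt)
with_key_file = derive_master_key("correct horse", salt, key_file)
print(password_only != with_key_file)  # True: the key file changes the result
```

Because the key file enters the derivation itself, there is no separate "second check" for an attacker to bypass: the correct master key simply does not exist without both inputs.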

The session

After Argon2id finishes, HKDF expands the output into your session keys (as described in The Vault). Those keys are immediately locked into physical RAM — the operating system is instructed not to page them to disk under any circumstances. If memory locking fails for any reason, Arx Runa refuses to open the session rather than silently operating with weaker protection.

Your session stays open while you're active. After 15 minutes of inactivity, Arx Runa zeroes every session key and closes your vault. You can also lock manually at any time. For Tier 2 vaults, removing the USB drive triggers an immediate lock — the physical key must be present to keep the vault open.

sequenceDiagram
    participant You
    participant App as Arx Runa
    participant USB as USB Drive
    participant KDF as Argon2id + HKDF
    participant Mem as Locked Memory (mlock)

    You->>App: Open Arx Runa
    App->>USB: Watch for drive mount event
    USB-->>App: Drive connected
    App->>App: Scan for 32-byte files<br/>verify BLAKE3 fingerprint against vault header
    App-->>You: Key file detected — enter password
    You->>App: Type password, confirm
    App->>KDF: Argon2id(password #124;#124; key_file, salt) 64 MiB, 3 iterations, 4 threads
    KDF-->>App: master_key (~1 second)
    App->>KDF: HKDF(master_key) x 3
    KDF-->>App: session keys
    App->>App: zeroize(master_key)
    App->>Mem: mlock(session keys)
    App-->>You: Vault open

    note over App,Mem: 15 min inactivity or USB removal
    App->>Mem: zeroize(session keys)
    App-->>You: Vault locked — re-enter password to continue

What an attacker faces offline

If someone extracts your vault's storage — the encrypted blobs and vault header — they can try to brute-force your password. What they find is that each guess requires a full Argon2id computation: 64 MiB of RAM, three passes, with no shortcut available regardless of their hardware. For a Tier 2 vault, they also need your USB drive in hand. The combination makes offline attacks impractical against any reasonable password.

Recovery: If You Lose Your Key

Every security system has to answer the same uncomfortable question: what happens when something goes wrong? Arx Runa has thought carefully about failure modes, and this page walks through your options for each one. Knowing the recovery paths — and their limits — is part of evaluating whether the system deserves your trust.

The Recovery Phrase

When you create a vault, Arx Runa can generate a recovery slot: an independent second way to open your vault. Setup is opt-in and works like this. Arx Runa generates 256 bits of cryptographically random entropy and encodes it as a 24-word phrase in the BIP-39 wordlist format — the same format used by hardware cryptocurrency wallets. You see this phrase exactly once. Write it down and store it somewhere safe, separate from your devices.

Internally, your phrase becomes a key through the same Argon2id derivation used for your password. That key then wraps your master key, and the resulting encrypted copy is stored in the vault header in the cloud. The phrase itself is never stored anywhere; Arx Runa holds only the encrypted copy.

flowchart TD
    PHRASE["BIP-39 Phrase<br/>(24 words, 256-bit entropy)"]:::user
    REC_SALT["Recovery Salt<br/>(from vault header)"]:::storage

    subgraph REC_KDF ["Recovery Key Derivation — Argon2id"]
        REC_ARGON["Argon2id<br/>same params as primary slot"]:::crypto
    end

    REC_KEY(["recovery_key"]):::secret

    MK_INPUT(["master_key<br/>(from primary derivation;<br/>held in mlocked memory)"]):::secret

    subgraph WRAP_BLOCK ["Key Wrapping — XChaCha20-Poly1305"]
        WRAP["XChaCha20-Poly1305 encrypt<br/>AAD: #34;arx-runa recovery v1#34; #124;#124; vault_id_bytes<br/>Nonce: 24B CSPRNG"]:::crypto
    end

    WMK["wrapped_master_key<br/>(72 bytes: 24B nonce #124; 32B ciphertext #124; 16B tag)<br/>stored in vault header recovery_slot"]:::storage

    PHRASE -->|phrase input| REC_ARGON
    REC_SALT -->|salt| REC_ARGON
    REC_ARGON -->|derives| REC_KEY

    MK_INPUT -->|plaintext input| WRAP
    REC_KEY -->|encryption key| WRAP
    WRAP -->|ciphertext blob| WMK

    classDef secret fill:#dc2626,stroke:#991b1b,color:#fff
    classDef crypto fill:#2563eb,stroke:#1e40af,color:#fff
    classDef storage fill:#16a34a,stroke:#166534,color:#fff
    classDef user fill:#9333ea,stroke:#6b21a8,color:#fff

Using the Phrase to Recover

If you forget your password or lose your USB key file, you enter the recovery phrase instead. Arx Runa runs Argon2id over the phrase, derives the recovery key, and uses it to unwrap your master key from the cloud vault header. From that point on, the session proceeds exactly as a normal unlock.

Recovery is a single atomic ceremony: you supply the phrase and your new credentials in one step. Arx Runa re-wraps everything under the new credentials and uploads an updated vault header. Afterwards your vault is fully operational under a new password (and optionally a new USB key), and your recovery phrase continues to work against the updated vault — you do not need to generate a new one.

The BIP-39 checksum embedded in the final word of the phrase catches transcription errors before Argon2id even runs, giving you immediate feedback on a mistyped word.
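
For a 24-word phrase, BIP-39 appends the first 8 bits of SHA-256(entropy) to the 256 entropy bits and splits the resulting 264 bits into 24 groups of 11 bits, each indexing one word. The sketch below works on word indices rather than words so that it does not need the 2048-word list; the function names are illustrative.

```python
import hashlib


def entropy_to_indices(entropy: bytes) -> list:
    """Encode 256 bits of entropy as 24 eleven-bit word indices."""
    if len(entropy) != 32:
        raise ValueError("expected 32 bytes of entropy")
    checksum = hashlib.sha256(entropy).digest()[0]            # first 8 bits
    bits = (int.from_bytes(entropy, "big") << 8) | checksum   # 264 bits total
    return [(bits >> (11 * (23 - i))) & 0x7FF for i in range(24)]


def indices_valid(indices) -> bool:
    """Check the embedded checksum without running any key derivation."""
    if len(indices) != 24 or any(not 0 <= i < 2048 for i in indices):
        return False
    bits = 0
    for index in indices:
        bits = (bits << 11) | index
    entropy = (bits >> 8).to_bytes(32, "big")
    return (bits & 0xFF) == hashlib.sha256(entropy).digest()[0]


indices = entropy_to_indices(bytes(32))  # all-zero entropy, for illustration only
print(indices_valid(indices))            # True
indices[-1] ^= 1                         # corrupt one checksum bit in the final word
print(indices_valid(indices))            # False
```

An 8-bit checksum catches most, but not all, transcription errors: a random wrong word still passes with probability about 1 in 256, in which case the failure surfaces later when the unwrap step rejects the derived key.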

New Device

Moving to a new machine requires no special ceremony if you still know your credentials. Configure your cloud backend, and Arx Runa fetches the vault header — which contains everything needed to re-derive your keys. Enter your password (and insert your USB key file if your vault uses one), and Arx Runa downloads the encrypted manifest backup from the cloud, decrypts it, and bootstraps a fully operational local vault. Nothing was stored locally on the old machine that needs to be transferred.

If the old machine is gone and you have also forgotten your password, this is where the recovery phrase is essential: fetch the vault header, enter the phrase, set new credentials, and you are back.

Replacing a Lost USB Key

If you use USB two-factor authentication and lose a drive, you can rotate the key file without the recovery phrase, provided you still remember your password and still hold a copy of the key file (for example on a backup drive). The rotation ceremony requires the old key file to be present, so act before you lose access to every copy; once all copies are gone, the recovery phrase is the only way back in. Arx Runa generates a new key file on a replacement drive, re-derives the master key under the new combination, and re-wraps everything. Your sharing relationships survive: the underlying identity keypair does not change during rotation, only the wrapping around it.

The Hard Limit

If you did not configure a recovery slot, or if you lose both your primary credentials (your password, or your USB key file on a Tier 2 vault) and your recovery phrase, your vault cannot be opened: not by you, not by Arx Runa. The same encryption that makes your files safe from an attacker makes them equally inaccessible without the keys. This is deliberate, not a gap: it means no support process, no account recovery form, and no legal demand can produce your data.

If that prospect concerns you, the answer is to configure a recovery slot now and store the phrase somewhere physically safe.

How Files Are Encrypted and Decrypted

When you add a file to Arx Runa, it never reaches the cloud in recognisable form. By the time the first byte leaves your device, the file has been stripped of hidden metadata, split into uniform chunks, padded to an identical size, and encrypted under a key that exists nowhere outside your vault. Here is what happens at each step — and why.

Stripping hidden metadata

Before encryption begins, Arx Runa strips EXIF, XMP, and IPTC metadata from media files. A photo from your phone carries GPS coordinates, camera model, lens settings, and timestamps alongside the image itself — information you may not intend to archive. Arx Runa removes all of it in memory before any further processing. Your original file on disk is never modified; the stripped copy is what enters the encryption pipeline. When you later export a file, the exported copy will also be free of that embedded metadata.

Splitting into chunks

The clean file is split into fixed-size chunks — 4 MiB by default, though you choose the size once at vault creation and it applies to every file thereafter. Chunking serves three purposes: it enables streaming so Arx Runa never holds an entire file in memory, it allows partial download when you only need part of a large file, and it keeps memory use bounded regardless of how big the file is.

Padding every chunk to the same size

The final chunk of most files is shorter than the full chunk size. Rather than uploading a shorter blob — which would let an observer infer the file's true size — Arx Runa zero-pads the last chunk to the full chunk size before encryption. Every blob the cloud receives is identically sized, whether it contains 1 byte of real data or 4 MiB. The actual file size is stored inside your encrypted manifest database; the cloud learns only how many blobs exist, nothing more.
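
Chunking and padding can be sketched as follows. For brevity this version takes the whole file as a `bytes` value, whereas the text describes a streaming pipeline; the function names and the choice to emit zero chunks for an empty file are assumptions of this example.

```python
def split_and_pad(data: bytes, chunk_size: int) -> list:
    """Split data into chunk_size pieces, zero-padding the final piece."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        chunks.append(chunk.ljust(chunk_size, b"\x00"))  # no-op for full chunks
    return chunks


def reassemble(chunks, file_size: int) -> bytes:
    """Join chunks and trim padding using the size stored in the manifest."""
    return b"".join(chunks)[:file_size]


data = b"hello world"                       # 11 bytes
chunks = split_and_pad(data, chunk_size=8)  # tiny chunk size for illustration
print([len(c) for c in chunks])             # [8, 8]
print(reassemble(chunks, len(data)) == data)  # True
```

Zero padding cannot be stripped by inspection, because a file may legitimately end in zero bytes. That is why the true file size has to be recorded in the encrypted manifest.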

A unique key for every file

When you add a file, Arx Runa generates a fresh random 256-bit key just for that file. This key is immediately wrapped — encrypted using the key_encryption_key derived from your master key — and stored in the manifest. The raw file key never touches disk and is zeroed from memory as soon as encryption is complete.

Because each file has its own independent key, the exposure or rotation of one file's key has no effect on any other file. Rekeying is surgical, not vault-wide.

Encrypting each chunk

Each chunk is encrypted with XChaCha20-Poly1305, an authenticated encryption scheme. Arx Runa draws a fresh 192-bit (24-byte) random nonce from a CSPRNG for every chunk. With a nonce space this large, the probability of two chunks ever sharing a nonce is negligible across any realistic number of files.

Alongside the nonce, each encryption operation takes in additional authenticated data (AAD) that binds the ciphertext to both the file's unique identity and the chunk's position in the sequence. This means a chunk cannot be silently reordered or transplanted from another file. If the file or position doesn't match, the authentication tag fails to verify and decryption is rejected before any data is returned.

The wire format stored for each chunk is:

[ 24-byte nonce | ciphertext | 16-byte Poly1305 tag ]

The 16-byte tag is what makes tampering detectable. Flip a single bit in transit or storage and decryption fails cleanly.
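
The framing around the cipher can be sketched without the cipher itself, which is not available in Python's standard library. The 24-byte nonce and 16-byte tag lengths come from the wire format above; the AAD byte encoding (16-byte UUID followed by an 8-byte big-endian index) and all function names are assumptions, since the text specifies only that the AAD is `file_id || chunk_index`.

```python
import struct
import uuid

NONCE_LEN = 24  # XChaCha20 nonce
TAG_LEN = 16    # Poly1305 tag


def build_aad(file_id: uuid.UUID, chunk_index: int) -> bytes:
    """Assumed encoding: 16-byte UUID || 8-byte big-endian chunk index."""
    return file_id.bytes + struct.pack(">Q", chunk_index)


def pack_blob(nonce: bytes, ciphertext: bytes, tag: bytes) -> bytes:
    if len(nonce) != NONCE_LEN or len(tag) != TAG_LEN:
        raise ValueError("bad nonce or tag length")
    return nonce + ciphertext + tag


def unpack_blob(blob: bytes):
    """Split a wire blob back into (nonce, ciphertext, tag)."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("blob too short to contain nonce and tag")
    return blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]


file_id = uuid.UUID(int=1)
print(build_aad(file_id, 0) != build_aad(file_id, 1))  # True: position changes the AAD
```

A fixed-width encoding for both fields keeps the AAD unambiguous: no two distinct (file, index) pairs can serialise to the same bytes.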

sequenceDiagram
    participant Caller
    participant encrypt_chunk
    participant CSPRNG
    participant XChaCha20Poly1305

    Caller->>encrypt_chunk: plaintext, file_key, file_id, chunk_index
    encrypt_chunk->>CSPRNG: generate_nonce()
    CSPRNG-->>encrypt_chunk: nonce (24 bytes)
    encrypt_chunk->>encrypt_chunk: construct AAD = file_id #124;#124; chunk_index
    encrypt_chunk->>XChaCha20Poly1305: encrypt_in_place_detached(nonce, aad, plaintext)
    XChaCha20Poly1305-->>encrypt_chunk: tag (16 bytes)
    encrypt_chunk->>encrypt_chunk: assemble [nonce #124; ciphertext #124; tag]
    encrypt_chunk-->>Caller: blob

Integrity check on the encrypted blob

After encryption, Arx Runa computes a BLAKE3 checksum over the encrypted blob. This checksum is recorded in the manifest alongside the chunk record. When a chunk is downloaded, the checksum is verified before decryption begins — so bit rot or storage corruption is caught immediately, without any decryption key being exercised against corrupt data.

The manifest

The manifest is a SQLCipher database encrypted with its own key, the sqlcipher_key, which is derived from the master key under a distinct HKDF label. It holds the mapping from your file paths and directory structure to chunk records, including chunk positions, blob identifiers, BLAKE3 checksums, and the wrapped file keys. Without the manifest, the cloud blobs are an anonymous, unordered collection of identically sized ciphertext. The manifest is also backed up to the cloud in encrypted form so you can restore it on a new device.

The full pipeline

flowchart TD
    subgraph ENCRYPT ["Encrypt Path"]
        E1["Source file<br/>(streaming)"]:::io
        E2["Strip EXIF metadata<br/>(in memory only)"]:::proc
        E3["Read chunk_size bytes<br/>(zero-pad if last chunk)"]:::proc
        E4["encrypt_chunk<br/>(file_key, AAD = file_id #124;#124; chunk_index)"]:::crypto
        E5["[24B nonce #124; ciphertext #124; 16B tag]<br/>wire_blob"]:::data
        E6["blake3::hash(wire_blob)<br/>→ blake3_checksum"]:::proc
        E7["Write to staging/{uuid}.blob"]:::io
        E8["ChunkRecord<br/>(chunk_index, blob_name, blake3_checksum)"]:::data
        E9["Insert node + chunks<br/>(SQLCipher transaction)"]:::db
    end

    subgraph KEYS ["Key Lifecycle"]
        K1["Generate file_key<br/>(256-bit CSPRNG)"]:::crypto
        K2["Wrap: encrypt(file_key, key_encryption_key)<br/>#45;#62; file_key_wrapped"]:::crypto
        K3["Store file_key_wrapped<br/>in manifest"]:::db
        K4["Zeroize file_key<br/>after use"]:::crypto
    end

    K1 --> K2 --> K3
    E1 --> E2 --> E3 --> E4
    K1 -.->|file_key| E4
    E4 --> E5 --> E6 --> E7 --> E8 --> E9 --> K4

    classDef io fill:#16a34a,stroke:#166534,color:#fff
    classDef proc fill:#2563eb,stroke:#1e40af,color:#fff
    classDef crypto fill:#dc2626,stroke:#991b1b,color:#fff
    classDef data fill:#9333ea,stroke:#6b21a8,color:#fff
    classDef db fill:#d97706,stroke:#92400e,color:#fff

How Files Are Decrypted

Decryption is the exact inverse of encryption. Every guarantee made on the way in — authentication, ordering, padding removal — is enforced again on the way out, before a single byte of plaintext is written.

Unwrapping the file key

The manifest stores the file key in wrapped form. To begin decryption, Arx Runa unwraps it using the key_encryption_key derived from the master key. The raw file key exists in memory only for the duration of the operation and is zeroed immediately after.

Pre-flight validation

Before touching any blobs, Arx Runa validates the chunk list from the manifest: the number of chunks must match what is expected for the file size, and chunk indices must be contiguous starting at zero with no gaps or duplicates. Any anomaly here is a hard stop — it means the manifest is inconsistent and the file cannot be safely reconstructed.
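
The pre-flight check can be sketched as follows. This assumes the indices arrive sorted by chunk_index (as the manifest query in the pipeline diagram suggests) and that an empty file maps to zero chunks; the error type and function name are hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub enum PreflightError {
    ZeroChunkSize,
    WrongChunkCount { expected: u64, found: u64 },
    BadIndex { position: u64, found: u64 },
}

/// Validates a chunk list read from the manifest before any blob is touched.
/// `indices` must already be sorted by chunk_index.
pub fn validate_chunk_list(
    file_size: u64,
    chunk_size: u64,
    indices: &[u64],
) -> Result<(), PreflightError> {
    if chunk_size == 0 {
        return Err(PreflightError::ZeroChunkSize);
    }
    // ceil(file_size / chunk_size), written so that it cannot overflow.
    let expected = file_size / chunk_size + u64::from(file_size % chunk_size != 0);
    let found = indices.len() as u64;
    if found != expected {
        return Err(PreflightError::WrongChunkCount { expected, found });
    }
    // In a sorted list, index == position rules out both gaps and duplicates.
    for (position, &index) in indices.iter().enumerate() {
        if index != position as u64 {
            return Err(PreflightError::BadIndex { position: position as u64, found: index });
        }
    }
    Ok(())
}
```

For a 10 MiB file with 4 MiB chunks, the only accepted list is 0, 1, 2.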

Locating each blob

Chunks may live in different locations depending on sync state. For each blob, Arx Runa checks in order: the pending upload directory, the local cache, and the staging directory. Whichever location holds the file wins. If none do, the blob must be downloaded from the cloud before decryption can proceed.
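
A minimal sketch of this lookup order, assuming each location is a plain directory and the caller passes them in priority order (pending, cache, staging). The function name is hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Returns the first directory entry that holds the blob, honouring the order
/// of `search_dirs`. None means the blob must be downloaded from the cloud.
pub fn resolve_blob_path(blob_name: &str, search_dirs: &[&Path]) -> Option<PathBuf> {
    search_dirs
        .iter()
        .map(|dir| dir.join(blob_name))
        .find(|candidate| candidate.is_file())
}
```

Passing the directories as an ordered slice keeps the priority rule in one place: changing the order of the slice changes which copy wins.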

Verifying integrity before decryption

The BLAKE3 checksum stored in the manifest is verified against the blob before the file key is used. This is enforced at the type level — the VerifiedBlob type that decrypt_chunk accepts can only be constructed by verify_checksum, so it is impossible to decrypt a blob without first checking it. A mismatch means the blob was corrupted in storage or transit; the error is reported and decryption stops immediately.

Arx Runa also checks the blob's file size against the expected wire format size (chunk_size + 40 bytes for the 24-byte nonce and 16-byte tag) before reading it. A size mismatch fails without reading the blob content.
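
The type-level guarantee can be illustrated with a private field inside a module. This is a sketch, not the project's code: the checksum here is a 64-bit FNV-1a placeholder standing in for BLAKE3 (it is not cryptographic), and everything except the names VerifiedBlob and verify_checksum is hypothetical.

```rust
mod verified {
    pub const NONCE_LEN: u64 = 24;
    pub const TAG_LEN: u64 = 16;

    /// Expected on-disk size of a blob: chunk_size + 40 bytes.
    pub fn expected_wire_size(chunk_size: u64) -> u64 {
        chunk_size + NONCE_LEN + TAG_LEN
    }

    /// A blob whose checksum has been checked. The field is private, so code
    /// outside this module cannot construct a VerifiedBlob directly.
    pub struct VerifiedBlob {
        bytes: Vec<u8>,
    }

    impl VerifiedBlob {
        pub fn as_bytes(&self) -> &[u8] {
            &self.bytes
        }
    }

    #[derive(Debug, PartialEq)]
    pub struct ChecksumMismatch;

    /// Placeholder hash (FNV-1a, 64-bit). Assumption: stands in for BLAKE3 only
    /// so that the example needs no external crate.
    pub fn placeholder_hash(data: &[u8]) -> u64 {
        let mut hash: u64 = 0xcbf29ce484222325;
        for &byte in data {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(0x100000001b3);
        }
        hash
    }

    /// The only way to obtain a VerifiedBlob.
    pub fn verify_checksum(bytes: Vec<u8>, expected: u64) -> Result<VerifiedBlob, ChecksumMismatch> {
        if placeholder_hash(&bytes) == expected {
            Ok(VerifiedBlob { bytes })
        } else {
            Err(ChecksumMismatch)
        }
    }
}
```

A decrypt function that takes `&verified::VerifiedBlob` as its argument then cannot be called with unchecked bytes; the compiler rejects the call before any key is used.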

Decrypting each chunk

decrypt_chunk takes the verified blob, the file key, and the same AAD used during encryption (file_id || chunk_index). XChaCha20-Poly1305 checks the tag against the ciphertext and the AAD together: if the ciphertext or tag has been tampered with, or if the wrong file identity or chunk position is supplied, authentication fails and no plaintext is returned. There is no partial output on failure.

The result is a buffer of exactly chunk_size bytes — the padded plaintext.

Stripping padding from the last chunk

Every chunk except the last is written in full. For the last chunk, Arx Runa reads the true file size from the manifest and writes only the bytes that belong to the file:

bytes_to_write = file_size − (chunk_index × chunk_size)

The zero-padding added at encryption time is silently discarded. The output file will be byte-for-byte identical to the original, minus any EXIF metadata that was stripped on the way in.
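
The formula above can be sketched as a small helper. The function names are hypothetical; the arithmetic follows the formula directly, with a guard for indices past the end of the file.

```rust
/// Number of plaintext bytes that chunk `chunk_index` contributes to the file.
/// Full chunks return `chunk_size`; the last chunk returns
/// file_size - chunk_index * chunk_size. None means the index is out of range.
pub fn bytes_to_write(file_size: u64, chunk_size: u64, chunk_index: u64) -> Option<u64> {
    if chunk_size == 0 {
        return None;
    }
    let offset = chunk_index.checked_mul(chunk_size)?;
    let remaining = file_size.checked_sub(offset)?;
    if remaining == 0 {
        return None;
    }
    Some(remaining.min(chunk_size))
}

/// Drops the zero padding from a decrypted chunk, keeping the first `keep` bytes.
pub fn strip_padding(padded: &[u8], keep: u64) -> &[u8] {
    let keep = (keep as usize).min(padded.len());
    &padded[..keep]
}
```

For a 10 MiB file with 4 MiB chunks, chunks 0 and 1 are written in full and chunk 2 contributes 10 − 2 × 4 = 2 MiB.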

Atomic output

Arx Runa writes each chunk to a temporary file named <destination>.arx-runa-decrypt-<uuid>.tmp. Only after all chunks have been written and verified does it atomically rename the temporary file to the final destination. A crash at any point before the rename leaves no partial output at the destination path — the next attempt starts from the beginning.
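
A minimal sketch of the write-then-rename pattern using only the standard library. The caller supplies the unique suffix (a UUID in Arx Runa; any unique string here), and the function name is hypothetical. Rename atomicity depends on the temporary file and the destination living on the same filesystem.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes all chunks to `<destination>.arx-runa-decrypt-<suffix>.tmp`, then
/// renames the temporary file onto the destination in a single step.
pub fn write_atomically(destination: &Path, unique_suffix: &str, chunks: &[&[u8]]) -> io::Result<()> {
    let mut tmp_name = destination.as_os_str().to_owned();
    tmp_name.push(format!(".arx-runa-decrypt-{}.tmp", unique_suffix));
    let tmp_path = PathBuf::from(tmp_name);

    let result = (|| {
        let mut tmp = File::create(&tmp_path)?;
        for chunk in chunks {
            tmp.write_all(chunk)?;
        }
        // Flush file contents to disk before the rename makes them visible.
        tmp.sync_all()?;
        drop(tmp);
        fs::rename(&tmp_path, destination)
    })();

    // On any failure, remove the partial temporary file; the destination is untouched.
    if result.is_err() {
        let _ = fs::remove_file(&tmp_path);
    }
    result
}
```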

The full pipeline

flowchart TD
    subgraph DECRYPT ["Decrypt Path"]
        D1["Read chunks from manifest<br/>(ordered by chunk_index)"]:::db
        D2["Resolve blob path<br/>(pending → cache → staging)"]:::io
        D3["Check file size<br/>(must equal chunk_size + 40)"]:::proc
        D4["Read wire_blob<br/>(BufReader read_exact)"]:::io
        D5["verify_checksum(wire_blob, blake3_checksum)<br/>→ VerifiedBlob"]:::proc
        D6["decrypt_chunk<br/>(file_key, AAD = file_id || chunk_index)"]:::crypto
        D7["padded_plaintext<br/>(chunk_size bytes)"]:::data
        D8["Write chunk to .tmp<br/>(full, or truncate last chunk)"]:::io
        D9["Atomic rename .tmp → destination"]:::io
        D10["Zeroize file_key"]:::crypto
    end

    subgraph KEYS ["Key Lifecycle"]
        K1["Read file_key_wrapped<br/>from manifest"]:::db
        K2["Unwrap: decrypt(file_key_wrapped,<br/>key_encryption_key) → file_key"]:::crypto
    end

    K1 --> K2
    D1 --> D2 --> D3 --> D4 --> D5 --> D6 --> D7 --> D8 --> D9 --> D10
    K2 -.->|file_key| D6

    classDef io fill:#16a34a,stroke:#166534,color:#fff
    classDef proc fill:#2563eb,stroke:#1e40af,color:#fff
    classDef crypto fill:#dc2626,stroke:#991b1b,color:#fff
    classDef data fill:#9333ea,stroke:#6b21a8,color:#fff
    classDef db fill:#d97706,stroke:#92400e,color:#fff

What the Cloud Sees

The cloud is the most obvious place to ask: what if someone gets in? A compromised S3 bucket, a subpoena to your provider, a rogue employee with storage access — any of these might give an attacker read access to everything you've uploaded. This page explains what they would find.

What is actually stored in the cloud

Your cloud storage holds a flat directory of encrypted blobs, a vault header file, and an encrypted manifest backup. Nothing else.

<remote>:<cloud_root>/
  vault-header.json               -- public parameters only, no key material
  manifest/
    manifest-backup.blob          -- encrypted SQLCipher export
  vault/
    <uuid>.blob                   -- your encrypted file chunks

Every encrypted chunk is named with a random UUID — 128 bits with no relation to the file it came from, the chunk's position in that file, or any other identifying information. There is no folder structure. There are no file names. There are no timestamps that reveal when a file was last modified. From the outside, the vault directory is an undifferentiated pile of identically sized blobs.

What the cloud provider can observe

An observer with full read access to your cloud storage can see:

What they see	What it reveals
Number of blobs in `vault/`	Approximate vault size in 4 MiB increments — not file count or individual file sizes
Each blob's size	Nothing — all blobs are padded to exactly the same size before encryption
Blob names	Nothing — each is a random UUID with no connection to file identity
Upload and download timing	When you are active; upload order is randomised, so which blobs belong to the same file cannot be inferred from timing alone
`vault-header.json` contents	Only public parameters needed to re-derive keys: the Argon2id salt, memory settings, and a BLAKE3 fingerprint of your USB key file — no key material, no decryptable content
`manifest/manifest-backup.blob`	That a manifest backup exists; the content is AEAD-encrypted and unreadable without your master key
File names, folder structure, file content	Nothing — all of this lives inside encrypted ciphertext

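The one thing the cloud does learn is an upper bound on how much data you have: blob count multiplied by 4 MiB. Because the last chunk of every file is padded to full size, the true total can be considerably smaller. This leak is inherent to any cloud backup system and cannot be hidden without far more complex techniques.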
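
A short sketch of the arithmetic, assuming the 4 MiB chunk size from the table above and that every non-empty file occupies at least one padded blob. The names are hypothetical, and how empty files are stored is not specified here.

```rust
/// Chunk size used throughout this page (4 MiB).
pub const CHUNK_SIZE: u64 = 4 * 1024 * 1024;

/// Number of fixed-size blobs a file of `file_size` bytes occupies
/// (the last chunk is zero-padded up to CHUNK_SIZE).
pub fn blobs_for_file(file_size: u64) -> u64 {
    file_size / CHUNK_SIZE + u64::from(file_size % CHUNK_SIZE != 0)
}

/// The most plaintext that `blob_count` blobs could possibly hold.
pub fn max_plaintext_bytes(blob_count: u64) -> u64 {
    blob_count * CHUNK_SIZE
}
```

Three one-byte files produce three blobs, so an observer can only conclude that the vault holds at most 12 MiB, while the real total is 3 bytes.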

Staging is local and temporary

Before any chunk reaches the cloud, it passes through a local staging directory on your device. Chunks are encrypted and written to staging first, then uploaded by the sync layer, then deleted from staging once the upload is confirmed. The staging directory is never exposed to the cloud and is cleared of orphaned blobs on startup. The cloud receives only finished, encrypted blobs.

Rclone as the transport layer

Arx Runa does not implement its own cloud protocol. Instead it uses rclone — a mature, open-source tool that speaks to over 70 storage backends — as a sidecar process that handles the actual bytes-over-the-wire work. Arx Runa manages what gets uploaded and in what order; rclone handles authentication and transfer.

Out of the box, the setup wizard covers the most common providers: AWS S3, Backblaze B2, Wasabi, Cloudflare R2, Google Drive, OneDrive, and local or external drives. If your provider isn't on that list, you can supply a raw rclone configuration directly, which gives access to all backends rclone supports.

Cloud credentials are never written to disk in plaintext. They are stored as encrypted rows inside your vault's SQLCipher database — the same Argon2id-hardened key chain that protects everything else — and are passed to rclone via a temporary file when a session opens. That file is overwritten and deleted when the session closes.

Multiple destinations

You can configure one primary destination and any number of backup destinations. Every push goes to the primary; backups are mirrored from the primary on demand or on schedule using rclone sync. Because the blobs are already encrypted before they leave your device, copying them to a second cloud provider requires no re-encryption — the same XChaCha20-Poly1305 ciphertext lands verbatim on the backup. Losing your primary provider does not mean losing your data.

The vault header in the cloud

One file in the cloud is intentionally readable before you authenticate: vault-header.json. It contains the Argon2id salt and parameters your device needs to re-derive your keys on a new machine, and the BLAKE3 fingerprint that identifies your USB key file. It contains no key material and no decryptable content. Anyone who downloads it learns only that Arx Runa is being used and which Argon2id parameters were chosen — the same information that would be visible on the login screen of any app.

Storing the header in the cloud is what makes new-device recovery possible without any server on Arx Runa's side. See Recovery: If You Lose Your Key for the full flow.

Sync sequence

The diagram below shows a complete push and pull cycle, including conflict detection. All blob uploads are randomised in order and parallelised; the manifest backup is uploaded last, after all chunks are confirmed.

sequenceDiagram
    participant User
    participant Sync as Sync Module
    participant Meta as MetadataStore (SQLCipher)
    participant Stage as Staging Directory
    participant RT as RcloneTransport (sidecar)
    participant Cloud as Cloud Remote

    note over User,Cloud: Push Flow (upload local changes)
    User->>Sync: push()
    Sync->>Meta: get_meta("snapshot_counter") #45;#62; local_counter
    Sync->>RT: download_blob("manifest/manifest-backup.blob", temp)
    RT->>Cloud: rclone copyto manifest/manifest-backup.blob
    Cloud-->>RT: manifest-backup.blob
    RT-->>Sync: temp file
    Sync->>Sync: decrypt manifest backup #45;#62; cloud_counter
    break cloud_counter #62; local_counter
        Sync-->>User: CONFLICT - pull first
    end
    break cloud_counter #60; local_counter
        Sync-->>User: CONFLICT - cloud manifest older than local
    end

    Sync->>Meta: get all staged blob_names
    Sync->>Sync: Fisher-Yates shuffle(blob_list)
    
    note over Sync,Cloud: Concurrent upload (4 Rclone processes via JoinSet)
    
    par Upload blob 1
        Sync->>RT: upload_blob(staging/uuid1.blob)
        RT->>Cloud: rclone copyto vault/uuid1.blob
        Cloud-->>RT: ok
        RT-->>Sync: ok
        Sync->>Stage: delete staging/uuid1.blob
    and Upload blob 2
        Sync->>RT: upload_blob(staging/uuid2.blob)
        RT->>Cloud: rclone copyto vault/uuid2.blob
        Cloud-->>RT: ok
        RT-->>Sync: ok
        Sync->>Stage: delete staging/uuid2.blob
    and Upload blob 3
        Sync->>RT: upload_blob(staging/uuid3.blob)
        RT->>Cloud: rclone copyto vault/uuid3.blob
        Cloud-->>RT: ok
        RT-->>Sync: ok
        Sync->>Stage: delete staging/uuid3.blob
    and Upload blob 4
        Sync->>RT: upload_blob(staging/uuid4.blob)
        RT->>Cloud: rclone copyto vault/uuid4.blob
        Cloud-->>RT: ok
        RT-->>Sync: ok
        Sync->>Stage: delete staging/uuid4.blob
    end
    
    note over Sync: Repeat for next batch until all blobs uploaded
    Sync->>Meta: increment_snapshot_counter() #45;#62; new_counter
    Sync->>Meta: set_meta("last_synced_at", now)
    Sync->>Sync: VACUUM INTO temp#59; encrypt with manifest_key
    Sync->>RT: upload_blob(temp, manifest/manifest-backup.blob)
    RT->>Cloud: rclone copyto
    Cloud-->>RT: ok
    Sync->>RT: upload_blob(vault-header.json, vault-header.json)
    RT->>Cloud: rclone copyto
    Cloud-->>RT: ok
    Sync-->>User: push complete (new_counter blobs synced)

    note over User,Cloud: Pull Flow (new-device recovery)
    User->>Sync: pull()
    Sync->>RT: download_blob("vault-header.json", temp)
    RT->>Cloud: rclone copyto vault-header.json
    Cloud-->>RT: vault-header.json
    RT-->>Sync: temp file
    Sync->>Sync: parse VaultHeader #45;#62; salt, params, key_file_blake3
    Sync-->>User: prompt: password + USB key file
    User->>Sync: password + key_file_path
    Sync->>Sync: Argon2id(password || key_file, salt) #45;#62; master_key
    Sync->>Sync: HKDF #45;#62; key_encryption_key, sqlcipher_key, manifest_key
    Sync->>Sync: zeroize(master_key)
    Sync->>RT: download_blob("manifest/manifest-backup.blob", temp)
    RT->>Cloud: rclone copyto manifest/manifest-backup.blob
    Cloud-->>RT: manifest-backup.blob
    RT-->>Sync: temp file
    Sync->>Sync: decrypt manifest backup with manifest_key
    Sync->>Meta: import SQLCipher DB (keyed with sqlcipher_key)
    Sync->>Meta: get all chunk rows #45;#62; (blob_name, blake3_checksum)
    
    note over Sync,Cloud: Concurrent download (4 Rclone processes via JoinSet)
    
    par Download blob 1
        Sync->>RT: download_blob(vault/uuid1.blob)
        RT->>Cloud: rclone copyto vault/uuid1.blob
        Cloud-->>RT: uuid1.blob
        RT-->>Sync: staging/uuid1.blob
        Sync->>Sync: Verify BLAKE3 (delete + record failure on mismatch)
    and Download blob 2
        Sync->>RT: download_blob(vault/uuid2.blob)
        RT->>Cloud: rclone copyto vault/uuid2.blob
        Cloud-->>RT: uuid2.blob
        RT-->>Sync: staging/uuid2.blob
        Sync->>Sync: Verify BLAKE3
    and Download blob 3
        Sync->>RT: download_blob(vault/uuid3.blob)
        RT->>Cloud: rclone copyto vault/uuid3.blob
        Cloud-->>RT: uuid3.blob
        RT-->>Sync: staging/uuid3.blob
        Sync->>Sync: Verify BLAKE3
    and Download blob 4
        Sync->>RT: download_blob(vault/uuid4.blob)
        RT->>Cloud: rclone copyto vault/uuid4.blob
        Cloud-->>RT: uuid4.blob
        RT-->>Sync: staging/uuid4.blob
        Sync->>Sync: Verify BLAKE3
    end
    
    note over Sync: Repeat for next batch until all blobs downloaded
    Sync-->>User: pull complete (any failures reported)
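
The conflict check and the batching of uploads in the push flow can be sketched as follows. The enum and function names are hypothetical, and the Fisher-Yates shuffle is omitted because the standard library ships no random number generator; the sketch assumes the list has already been shuffled.

```rust
use std::cmp::Ordering;

#[derive(Debug, PartialEq)]
pub enum PushCheck {
    /// Counters match: the push may proceed.
    Proceed,
    /// The cloud manifest is newer: pull first.
    PullFirst,
    /// The cloud manifest is older than the local one.
    CloudOlderThanLocal,
}

/// Compares the local snapshot counter with the one in the cloud manifest backup.
pub fn check_snapshot_counters(local_counter: u64, cloud_counter: u64) -> PushCheck {
    match cloud_counter.cmp(&local_counter) {
        Ordering::Greater => PushCheck::PullFirst,
        Ordering::Less => PushCheck::CloudOlderThanLocal,
        Ordering::Equal => PushCheck::Proceed,
    }
}

/// Splits an already-shuffled blob list into batches of at most `width`
/// entries, one batch per round of concurrent transfers.
pub fn upload_batches(blob_names: &[String], width: usize) -> Vec<&[String]> {
    blob_names.chunks(width.max(1)).collect()
}
```

With ten staged blobs and a width of 4, the uploads run in three rounds of 4, 4 and 2 blobs.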

Most file sharing works by trusting something: a shared password, a server that brokers access, or a platform that holds the keys on both ends. Arx Runa takes a different approach — it lets you share a file with someone so that only they can read it, and the cloud hosting the file cannot.

Your identity: a key pair, not an account

When you first run Arx Runa, it generates an X25519 key pair. This is your sharing identity. The private key lives in your encrypted vault, protected by the same password and USB key that guards everything else. Your public key is something you can hand to anyone — it contains no secret information.

There is no central server that stores or verifies identities. Arx Runa doesn't have accounts. Email addresses appear in the contacts list as human-readable labels, not as delivery addresses — Arx Runa never touches email infrastructure.

Exchanging public keys out-of-band

Before you can share a file with someone, you each need the other's public key. Arx Runa exports your public key as a small file or QR code. You send it to your contact via whatever channel you already trust — a message, an email, a USB stick. They import it, and do the same in reverse. This is a one-time setup per contact pair.

The security of this step depends on the channel you use. If an attacker controls that channel, they could substitute their own public key and intercept the share. To guard against this, Arx Runa displays a short fingerprint alongside each contact — the first 16 hex characters of the SHA-256 hash of their public key. A quick phone call to compare fingerprints is enough to confirm you have the real key.
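
The fingerprint formatting can be sketched as below. The function takes an already-computed SHA-256 digest of the public key, because the standard library has no SHA-256; computing the digest (for example with a hashing crate) is left to the caller. The function name is hypothetical.

```rust
/// Formats the first 16 hex characters (the first 8 bytes) of a SHA-256 digest.
/// `sha256_digest` is assumed to be SHA-256(public_key), computed elsewhere.
pub fn fingerprint_from_digest(sha256_digest: &[u8; 32]) -> String {
    sha256_digest[..8]
        .iter()
        .map(|byte| format!("{:02x}", byte))
        .collect()
}
```

Sixteen hex characters carry 64 bits, which is short enough to read aloud on a call while still making an accidental match between two different keys very unlikely.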

Every file in Arx Runa has its own random 256-bit encryption key — the file_key. This key is what encrypts the file's chunks in the cloud. It is wrapped with your vault's key_encryption_key and stored in the encrypted manifest, so normally only you can use it.

When you share a file, Arx Runa does something precise: it takes that file's file_key and encrypts it for the recipient's public key using HPKE (RFC 9180). The ciphersuite is DHKEM(X25519, HKDF-SHA256) + HKDF-SHA256 + ChaCha20-Poly1305. Only the recipient's private key can open this envelope. Not the cloud. Not a server, because Arx Runa has none. Not you, once it's sent.

The result is a share package — a small file (.vgshare) that contains:

The HPKE-encrypted envelope (which holds the file_key, the file name, chunk identifiers, and the cloud location)
Nothing else — no unencrypted key material, no file content

You deliver the share package the same way you exchanged public keys: out-of-band, through a channel of your choosing.

sequenceDiagram
    participant Owner as Owner (Arx Runa)
    participant Cloud as Cloud Storage
    participant Channel as Out-of-Band Channel
    participant Recipient as Recipient (Arx Runa)

    note over Owner,Recipient: Phase 0 #45;#45; Key Exchange (one-time setup)
    Owner->>Channel: Export X25519 public key (file or QR code)
    Channel->>Recipient: Deliver public key
    Recipient->>Channel: Export X25519 public key (file or QR code)
    Channel->>Owner: Deliver public key
    note over Owner,Recipient: Optional#58; compare key fingerprints to verify (MITM mitigation)

    note over Owner,Cloud: Phase 1 #45;#45; Share a File
    Owner->>Owner: Unwrap file_key from vault manifest
    Owner->>Owner: HPKE.Seal(recipient_pub_key, plaintext=file_key + metadata)
    Owner->>Cloud: Copy encrypted blobs to shared/[share_id]/
    Owner->>Channel: Export share package (.vgshare)
    Channel->>Recipient: Deliver share package

    note over Recipient,Cloud: Phase 2 #45;#45; Recipient Imports and Fetches
    Recipient->>Recipient: HPKE.Open(recipient_priv_key) #45;#62; file_key + metadata
    Recipient->>Cloud: Fetch encrypted blobs
    Cloud->>Recipient: Return encrypted blobs
    Recipient->>Recipient: Decrypt chunks with file_key #45;#62; reassemble file

    note over Owner,Cloud: Phase 3 #45;#45; Revocation (owner-initiated)
    Owner->>Cloud: Delete shared/[share_id]/

What the cloud hosts

To let the recipient download the file, Arx Runa copies the encrypted blobs into a separate folder in your cloud storage, under shared/<share_id>/. The cloud can see those blobs — the same opaque, fixed-size encrypted chunks that make up your normal vault. It cannot read the share package, which you deliver separately and out-of-band. It cannot read the file_key inside the package, because that is sealed to the recipient's public key.

The cloud sees ciphertext. The share package is the only thing that unlocks it, and the share package is only readable by the recipient.

Snapshot semantics

A share is a point-in-time snapshot. When you share a file, the share package contains the chunk identifiers for the file as it exists at that moment. If you edit the file later, the recipient's share still points to the original version. To give them the updated file, you create a new share.

This is a deliberate choice. A "live" share — where the recipient always sees your latest version — would require a different, more complex model. The snapshot approach keeps the cryptography simple and the boundaries clear.

Revocation and expiration

If the recipient has not yet fetched the blobs, you can revoke the share by deleting the shared/<share_id>/ folder from the cloud. The share package they hold becomes a pointer to nothing — access is cut without any re-encryption.

If they have already downloaded and decrypted the blobs, the data is on their machine. Cryptographic revocation of content that has left your control is not possible. This is an honest limitation rather than a flaw, and it applies to any file you share by any method. For a stronger guarantee after the fact, Arx Runa supports re-encrypting the file under a new key, which invalidates any future fetches from the old blobs.

Shares can also have an expiry date. When a share expires, Arx Runa automatically deletes the blobs from cloud on its next sync — no manual action required.

Deep Dives

In-depth technical explorations of specific design decisions in Arx Runa — covering cryptographic primitive choices, key recovery trade-offs, and padding strategies. Each document surveys alternatives, evaluates them against the zero-knowledge threat model, and records the rationale for what was chosen.

Cryptographic Primitive Rationale — Justification and alternative analysis for every cryptographic primitive in the design: XChaCha20-Poly1305, HKDF-SHA256, Argon2id, per-file key wrapping, BLAKE3 checksums, and ZeroizeOnDrop + Secret<T> memory protection.
File Sharing Cryptography — Cryptographic decisions for Phase 5 file sharing: HPKE (RFC 9180) over ad-hoc ECIES, X25519 curve confirmation, CTX-ChaCha20-Poly1305 as the committing AEAD, and simplification of the share package envelope.
Password and Key Recovery — Feasibility survey of every known vault recovery mechanism (recovery phrases, Shamir's SSS, SLIP-39 shares, trusted-contact key wrapping, platform biometrics, cloud escrow) evaluated against the zero-knowledge threat model.
Reducing Padding Overhead — Survey of all known techniques for reducing per-file padding waste: Padmé padding, tiered chunk sizes, smaller uniform chunk size, content-defined chunking (rejected — fingerprinting attacks), and epoch-based deferred batching.

Arx Runa: Cryptographic Primitive Rationale

Document type: Exploration / feasibility research
Status: Draft
Last updated: 2026-04-12

Justification and alternative analysis for every cryptographic primitive selected in the Arx Runa cryptographic-primitives design: XChaCha20-Poly1305 AEAD, HKDF-SHA256 key derivation, Argon2id password hashing, per-file random key generation and wrapping, BLAKE3 checksums, and the ZeroizeOnDrop + Secret<T> memory-protection stack.

For password and key recovery, see Password and Key Recovery.

The Problem

Arx Runa is a zero-knowledge, bring-your-own-cloud file encryption tool. Every cryptographic primitive must be justified against a threat model where the cloud provider is a potential adversary — they hold opaque blobs and nothing else. The design must be:

Correct: proven-secure constructions only, no custom cryptography
Implementation-safe: resistant to common implementation errors (nonce reuse, timing attacks, memory disclosure)
Auditable: well-documented in published standards (NIST, IETF RFC, IACR) with real-world prior art
Future-proof: upgrade paths documented when better alternatives exist or are maturing

This document justifies each primitive selected, presents alternatives that were considered and rejected, and provides authoritative sources for each claim.

AEAD Cipher: XChaCha20-Poly1305

Selected: XChaCha20-Poly1305 (192-bit nonce)

XChaCha20-Poly1305 is the extended-nonce variant of the ChaCha20-Poly1305 AEAD construction standardized in RFC 8439; the XChaCha nonce-extension construction itself is specified in draft-irtf-cfrg-xchacha-03. The "X" prefix extends the nonce from 96 bits to 192 bits by running an additional HChaCha20 subkey derivation step, making random nonce generation safe at any practical volume.

Alternatives Considered

Alternative	Why rejected
AES-256-GCM	Nonce reuse is catastrophic (auth key leaks); constant-time requires AES-NI hardware; shorter GCM nonce (96-bit) requires counter discipline
ChaCha20-Poly1305 (RFC 8439)	96-bit nonce is too short for random generation (collision probability already reaches ~2^-33 after 2^32 messages); requires counter/state
AES-256-GCM-SIV (RFC 8452)	Nonce-misuse resistant, but maximum message length is 4 GiB and there is a multi-key safety limit; less library support in the Rust ecosystem
AEGIS-256	~2× higher throughput on AES-NI hardware, 256-bit nonce, ephemeral key erasure — but still in IETF CFRG draft (draft-irtf-cfrg-aegis-aead); no completed RFC or independent Rust crate audit available yet

Nonce Safety

draft-irtf-cfrg-xchacha-03 Section 3.1 states verbatim:

"Assuming a secure random number generator, random 192-bit nonces should experience a single collision (with probability 50%) after roughly 2^96 messages. A more conservative threshold (2^-32 chance of collision) still allows for 2^80 messages."

Applying the birthday bound formula (collision probability ≈ q² / 2¹⁹³):

After 2⁶⁴ encryptions: ~2⁻⁶⁵ collision probability (negligible)
After 2⁸⁰ encryptions: ~2⁻³³ collision probability (conservative safe threshold per the draft)
With 96-bit nonces (q² / 2⁹⁷): the same ~2⁻³³ collision probability is already reached after only 2³² encryptions

Bernstein's "Extending the Salsa20 nonce" (SKEW 2011) provides the underlying security proof for the XSalsa20 construction. The same argument applies to the HChaCha20 subkey derivation step, so the extended-nonce construction is secure under the same assumptions as the base cipher.

For a personal vault encrypting thousands of chunks per file, the 192-bit nonce space is effectively unbounded.
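
The figures above can be reproduced with a few lines of integer arithmetic. The sketch works in log2 space, using the approximation p ≈ q² / 2^(n+1), which is only meaningful while the result stays below zero. The function name is hypothetical.

```rust
/// Returns log2 of the approximate collision probability after 2^log2_messages
/// random nonces of `nonce_bits` bits: log2(q^2 / 2^(n+1)) = 2*log2(q) - (n + 1).
pub fn log2_collision_probability(log2_messages: u32, nonce_bits: u32) -> i64 {
    2 * i64::from(log2_messages) - (i64::from(nonce_bits) + 1)
}
```

For example, `log2_collision_probability(64, 192)` gives -65 and `log2_collision_probability(80, 192)` gives -33, matching the list above. With 2^96 messages the result is -1, which is the 50% point quoted from the draft.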

AES-GCM Nonce Reuse Catastrophe

When AES-GCM reuses a nonce, it reuses the same CTR keystream, so an attacker who sees two ciphertexts under the same key and nonce can derive the XOR of the corresponding plaintexts. Reuse also breaks authentication: GCM uses GHASH, not Poly1305, with hash subkey H = AES_K(0^128), and repeated nonces give the attacker enough algebraic structure over GF(2^128) to recover H or otherwise forge valid tags after observing a small number of reused-nonce messages. This is why AES-GCM nonce reuse is considered catastrophic for both confidentiality and integrity. The 2016 "Nonce-Disrespecting Adversaries" paper (Böck et al., USENIX WOOT 2016) demonstrated this attack class against real TLS implementations.

ChaCha20-Poly1305 nonce reuse is also serious: it repeats the ChaCha20 keystream, leaking the XOR of plaintexts, and it reuses the Poly1305 one-time key for that nonce, which can enable message forgeries. XChaCha20-Poly1305 reduces the practical risk of accidental reuse by expanding the nonce space to 192 bits, making random nonce collisions negligible at Arx Runa's scale.

Key Non-Commitment

XChaCha20-Poly1305 is not a committing AEAD. It does not provide binding security (also called "key commitment" or "CMT-1 security"). In theory, it is possible to find two different keys that both authenticate the same ciphertext (a "multi-key" or "partition oracle" attack). For symmetric file encryption with a single key per file, this is not a practical threat. Phase 5 file sharing addresses this by using HPKE with CTX-ChaCha20-Poly1305 — a CMT-4 committing AEAD — for share package encryption. See docs/research/file-sharing-cryptography.md.

Upgrade Path

AEGIS-256 (IETF CFRG draft-irtf-cfrg-aegis-aead) is the leading candidate for a future upgrade:

~0.7 cycles/byte vs ~1.5 cycles/byte for XChaCha20-Poly1305 on AES-NI hardware
256-bit nonce (safe for random generation)
Ephemeral key erasure before data processing (forward secrecy property)
Committing AEAD variant (AEGIS-256-MAC) under development

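The wire layout (nonce, ciphertext, tag) and the API surface would keep the same shape. The cipher primitive changes, and the nonce field would grow from 24 to 32 bytes to hold AEGIS-256's 256-bit nonce.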

Key Derivation: HKDF-SHA256

Selected: HKDF-SHA256 (RFC 5869)

HKDF (HMAC-based Key Derivation Function) is a two-step construction: an Extract step that produces a uniform pseudorandom key (PRK) from the IKM, and an Expand step that stretches PRK into multiple derived keys using domain-separated info strings.

In Arx Runa, Argon2id already produces a high-entropy 32-byte master_key, so HKDF's Extract step is not needed for entropy extraction. Its salt acts purely as a domain separator, and the Expand step does the real work of deriving the separate keys.

Key Derivation Tree

master_key (Argon2id output, 32 bytes)
    │
    ├── HKDF-SHA256(info = "arx-runa-key-encryption")  → key_encryption_key (32 bytes)
    ├── HKDF-SHA256(info = "arx-runa-sqlcipher")       → sqlcipher_key (32 bytes)
    └── HKDF-SHA256(info = "arx-runa-manifest-backup") → manifest_key (32 bytes)

Salt: Fixed domain separator b"arx-runa-v1" — provides application-identity binding even though Argon2id output already has full entropy.

Alternatives Considered

Alternative	Why rejected
BLAKE3-derive_key	Fast, elegant, but less widely audited for key derivation specifically; HKDF-SHA256 has broader standards recognition (NIST SP 800-56C, RFC 5869)
SP 800-108 KBKDF (Counter Mode)	Counter-based KDF; more complex implementation; primarily used in FIPS contexts where HKDF is not acceptable
Direct SHA-256 truncation	Not a KDF — lacks domain separation; XOR of outputs is trivially related to the master key
Repeat Argon2id calls	Prohibitively expensive for vault-open latency; Argon2id is designed for password stretching, not fast key expansion

Why SHA-256, not SHA-3 or BLAKE2?

HKDF (RFC 5869) can be instantiated with HMAC over any hash function. SHA-256 is the baseline choice:

Standardized: NIST FIPS 180-4
Proven security reduction: HMAC security relies on PRF assumption of SHA-256, which is well-studied
Widely used: TLS 1.3 (RFC 8446) and the Signal Protocol both use HKDF-SHA256
SHA-3 (Keccak) would also be fine but offers no practical advantage here and has less Rust ecosystem history in this role

Info String Domain Separation

Each derived key uses a distinct info string. HKDF's Expand step is a PRF keyed by PRK, so outputs for distinct info values are computationally independent. Knowing key_encryption_key gives no information about sqlcipher_key under standard HMAC assumptions.

Extensibility

New keys are added by expanding with a new info string. Existing keys are unaffected — HKDF outputs are independent by construction. This is used in Phase 5 (file sharing) which adds a separate derivation tree using ECDH shared secrets as IKM, documented separately.

Password Hashing: Argon2id

Selected: Argon2id (RFC 9106)

Argon2 won the Password Hashing Competition (PHC, 2015), and its Argon2id variant is the current OWASP, NIST SP 800-63B, and RFC 9106 recommendation for password-based key derivation.

The "id" variant combines:

Argon2i: data-independent memory access pattern (side-channel resistant)
Argon2d: data-dependent memory access (GPU/ASIC resistant)

Argon2id uses data-independent access for the first half of the first pass over memory (protecting against side-channel attacks from co-located processes) and data-dependent access for the rest (making GPU/ASIC optimization expensive).

Recommended Parameters

RFC 9106 Section 4 gives two recommended parameter sets. The second, intended for environments where much less memory is available than the first option needs, is:

"t=3 iterations, p=4 lanes, m=2^16 (64 MiB of RAM), 128-bit salt, 256-bit tag size."

Arx Runa uses this set: 64 MiB memory, 3 iterations, parallelism 4.
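The parameter set can be written down as a small constant, which also makes the unit conversion explicit: RFC 9106 expresses m in KiB, so 2^16 KiB is 64 MiB. The struct and constant names below are illustrative and are not taken from the Arx Runa codebase.

```rust
/// Argon2id cost parameters in RFC 9106 notation.
/// Names are hypothetical; only the numeric values come from the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Argon2Params {
    /// Memory cost m, in KiB.
    pub m_cost_kib: u32,
    /// Number of passes t.
    pub t_cost: u32,
    /// Degree of parallelism p (lanes).
    pub p_cost: u32,
    /// Output (tag) length in bytes.
    pub output_len: usize,
}

/// 64 MiB, 3 iterations, parallelism 4, 256-bit output.
pub const VAULT_PARAMS: Argon2Params = Argon2Params {
    m_cost_kib: 1 << 16, // 2^16 KiB
    t_cost: 3,
    p_cost: 4,
    output_len: 32,
};

impl Argon2Params {
    /// Memory cost converted from KiB to MiB.
    pub const fn memory_mib(&self) -> u32 {
        self.m_cost_kib / 1024
    }
}
```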

This is not an arbitrary choice: it is the second recommended option in RFC 9106, intended for environments where the 2 GiB first option is impractical, such as an interactive desktop unlock. It also sits well above the OWASP Password Storage Cheat Sheet minimum (19 MiB, 2 iterations, parallelism 1).

Alternatives Considered

Alternative	Why rejected
bcrypt	Maximum 72-byte password limit; no memory hardness; output entropy limited to 184 bits; not suitable for key derivation
scrypt	Predecessor to Argon2; time-memory trade-off is less favorable; RFC 7914 but not recommended by OWASP for new designs
PBKDF2-SHA256	No memory hardness; GPU-parallelizable; NIST still recommends for FIPS contexts but Argon2id is strictly superior for key derivation
Balloon Hashing	NIST 800-63B mentions it as a future option; not yet in RFC; less library support

Output

Argon2id produces a 32-byte (256-bit) master_key. This is the only point where a user-controlled secret (password + optional recovery phrase) is converted to key material.

Salt

The Argon2id salt is a random 32-byte value stored in the vault header (plaintext). This is the standard design: the salt prevents pre-computation attacks (rainbow tables) but has no secrecy requirement. Without the password, the salt provides no advantage to an attacker. 32 bytes exceeds the NIST SP 800-132 minimum of 128 bits and is consistent with the 256-bit security level used throughout Arx Runa.

Per-File Key Generation and Wrapping

Selected: Per-file random 256-bit key, wrapped with key_encryption_key

Each file is encrypted with a unique file_key generated by a CSPRNG. The file_key is never stored in plaintext: it is wrapped (encrypted) under key_encryption_key with XChaCha20-Poly1305 and stored in the manifest. To decrypt a file, the vault must be unlocked (deriving key_encryption_key), then the file_key is unwrapped just-in-time.

Key Hierarchy

master_key
    └── key_encryption_key (HKDF)
            └── file_key_1 (random, wrapped)
            └── file_key_2 (random, wrapped)
            └── ...

Alternatives Considered

Alternative	Why rejected
Single vault-wide chunk key	Compromise of one file's key would compromise all files; no key rotation granularity
Derive file_key from master_key + file_id (deterministic)	Deterministic derivation means key rotation requires re-encrypting all files; also, if HKDF info is guessable, the file_key is computable without the wrapped blob
Per-user key + per-file nonce/tweak	Less standard; AEAD already provides per-operation randomness via nonce; this mixes tweak and key concepts

Key Isolation Properties

NIST SP 800-57 Part 1 Rev. 5 Section 6.2 defines the purpose of a Key Encryption Key (KEK) hierarchy: "keys used to encrypt other keys" with the explicit goal of limited exposure — a data-encrypting key (DEK) compromise affects only the data it protects, not other DEKs. In Arx Runa: key_encryption_key is the KEK; each file_key is a DEK.

Per-file random keys therefore provide:

Limited exposure (NIST SP 800-57 §6.2): compromising one file_key exposes only that file's data — other files remain protected under independent keys
Key rotation: individual files can be re-encrypted by generating a new file_key without touching other files or the key_encryption_key
Sharing: Phase 5 file sharing includes a single file_key inside an HPKE share package envelope — the vault's key_encryption_key is never shared

LUKS (Linux Unified Key Setup) and Linux fscrypt are production precedents for the same pattern: per-volume or per-file data keys sit beneath a passphrase-protected key rather than being the passphrase-derived key itself, so a passphrase can be changed or a keyslot revoked without re-encrypting the data.

Wire Format

Wrapped file key: [24-byte nonce | 32-byte encrypted file_key | 16-byte Poly1305 tag] = 72 bytes

The wrapping uses XChaCha20-Poly1305 with empty AAD (the wrapped key is self-contained — its identity comes from where it is stored in the manifest, which is itself authenticated by SQLCipher). Recovery slot wrapping uses non-empty AAD (b"arx-runa recovery v1" || vault_id_bytes) to bind the blob to a specific vault.
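The 72-byte layout and the recovery-slot AAD can be sketched with plain slice operations. This only splits and concatenates bytes; the XChaCha20-Poly1305 call is left out, and all function and type names are hypothetical.

```rust
pub const NONCE_LEN: usize = 24;
pub const KEY_CT_LEN: usize = 32;
pub const TAG_LEN: usize = 16;
pub const WRAPPED_KEY_LEN: usize = NONCE_LEN + KEY_CT_LEN + TAG_LEN; // 72

/// Borrowed view of the three fields of a wrapped file key.
#[derive(Debug, PartialEq, Eq)]
pub struct WrappedKeyParts<'a> {
    pub nonce: &'a [u8],
    pub encrypted_key: &'a [u8],
    pub tag: &'a [u8],
}

/// Splits a wrapped key blob into nonce, encrypted key, and tag.
/// Returns None unless the blob is exactly 72 bytes long.
pub fn split_wrapped_key(blob: &[u8]) -> Option<WrappedKeyParts<'_>> {
    if blob.len() != WRAPPED_KEY_LEN {
        return None;
    }
    let (nonce, rest) = blob.split_at(NONCE_LEN);
    let (encrypted_key, tag) = rest.split_at(KEY_CT_LEN);
    Some(WrappedKeyParts { nonce, encrypted_key, tag })
}

/// Builds the recovery-slot AAD: the fixed label followed by the vault id.
pub fn recovery_aad(vault_id_bytes: &[u8]) -> Vec<u8> {
    let mut aad = b"arx-runa recovery v1".to_vec();
    aad.extend_from_slice(vault_id_bytes);
    aad
}
```

Rejecting any length other than 72 before touching the AEAD keeps malformed manifest entries from reaching the decryption path.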

BLAKE3 Checksums

Selected: BLAKE3 (unkeyed, over encrypted blob)

BLAKE3 checksums are computed over the encrypted blob (not plaintext) and stored in the SQLCipher manifest. Before decryption, the checksum is verified via the VerifiedBlob newtype — a type-system enforced check that makes skipping verification a compile error.

What the Checksum Provides

BLAKE3 here is a corruption detection check, not an authentication check. The AEAD tag already provides INT-CTXT security (ciphertext integrity) — an adversary cannot produce a new valid ciphertext without the key (Bellare & Namprempre 2000). BLAKE3 provides:

Fast pre-decryption corruption detection (hardware errors, network corruption, partial downloads)
Actionable error messages: "blob is corrupt" vs "authentication failed" are different failure modes
Early failure before the more expensive AEAD operation

Krawczyk (CRYPTO 2001) proves that checking integrity over the ciphertext before decryption — Encrypt-then-MAC — is the only generically secure ordering. BLAKE3 follows this ordering.

Why Unkeyed?

The literature on Encrypt-then-MAC (Krawczyk 2001, Bellare-Namprempre 2000) requires a keyed MAC to provide INT-CTXT. BLAKE3 here is unkeyed — this is a deliberate scope reduction, justified by the specific trust model:

The AEAD tag already provides INT-CTXT. An adversary cannot forge a new valid ciphertext without the file_key. BLAKE3 does not need to provide authentication — it only needs to detect accidental corruption.
The hash is stored inside SQLCipher. SQLCipher encrypts and authenticates the entire manifest database. An adversary who can modify a cloud blob cannot also silently modify the corresponding BLAKE3 hash in the manifest — the SQLCipher authentication tag would fail. The hash is therefore protected from adversarial manipulation by an independent authenticated channel.
Corruption detection only requires collision resistance, not a keyed MAC. BLAKE3 is collision-resistant — accidentally corrupted data will produce a different hash with overwhelming probability. This is sufficient for the stated purpose.

This is a design argument, not a directly citable theorem. The claim is: when a hash is stored in an independently authenticated channel (SQLCipher) and is only used for corruption detection rather than authentication, an unkeyed collision-resistant hash is operationally sufficient.

BLAKE3 vs Alternatives

Alternative	Trade-offs
SHA-256	~3-4× slower on modern hardware; well-standardized (FIPS 180-4) but no advantage here
SHA-3 (Keccak-256)	Slower than BLAKE3; no advantage; primarily useful for FIPS 202 compliance
BLAKE2b	Predecessor to BLAKE3; BLAKE3 is strictly faster and has a simpler streaming API
CRC32/Adler32	Not cryptographic; trivially forgeable; only useful for hardware error detection
xxHash	Non-cryptographic; fast; not collision-resistant against adversarial input

VerifiedBlob Type Safety

The VerifiedBlob newtype is a zero-cost mechanism:

pub struct VerifiedBlob(Vec<u8>);  // opaque: only constructible by verify_checksum
pub fn decrypt_chunk(blob: VerifiedBlob, ...) -> Result<Vec<u8>, CryptoError>;

This enforces the check-before-decrypt order at compile time, making it impossible to call decrypt_chunk on an unverified blob.
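A self-contained sketch of the pattern follows. The private tuple field is what stops outside code from building a VerifiedBlob directly. To stay dependency-free, the checksum here is a toy FNV-1a hash standing in for BLAKE3, and decrypt_chunk is a placeholder that returns the bytes unchanged; both are assumptions of this example rather than the project's code.

```rust
mod verified {
    /// Stand-in checksum (64-bit FNV-1a). NOT BLAKE3 and not cryptographic;
    /// used only so this sketch compiles without external crates.
    pub fn toy_checksum(data: &[u8]) -> u64 {
        let mut h: u64 = 0xcbf2_9ce4_8422_2325;
        for &b in data {
            h ^= b as u64;
            h = h.wrapping_mul(0x0000_0100_0000_01b3);
        }
        h
    }

    /// The inner field is private, so code outside this module cannot
    /// construct a VerifiedBlob except through verify_checksum.
    pub struct VerifiedBlob(Vec<u8>);

    impl VerifiedBlob {
        pub fn as_bytes(&self) -> &[u8] {
            &self.0
        }
    }

    #[derive(Debug, PartialEq, Eq)]
    pub struct ChecksumMismatch;

    /// The only way to obtain a VerifiedBlob: the checksum must match.
    pub fn verify_checksum(
        blob: Vec<u8>,
        expected: u64,
    ) -> Result<VerifiedBlob, ChecksumMismatch> {
        if toy_checksum(&blob) == expected {
            Ok(VerifiedBlob(blob))
        } else {
            Err(ChecksumMismatch)
        }
    }
}

/// Placeholder for AEAD decryption: accepts only verified blobs.
/// A real implementation would run the AEAD here.
pub fn decrypt_chunk(blob: verified::VerifiedBlob) -> Vec<u8> {
    blob.as_bytes().to_vec()
}
```

Outside the `verified` module, an expression such as `verified::VerifiedBlob(vec![])` is rejected by the compiler because the field is private, which is the property the newtype relies on.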

Memory Protection: ZeroizeOnDrop + Secret<T>

Selected: `zeroize` crate (ZeroizeOnDrop) + `secrecy` crate (Secret<T>)

All key types in Arx Runa implement ZeroizeOnDrop, which overwrites the backing memory with zeros when the value is dropped. The Secret<T> wrapper (from the secrecy crate) prevents accidental logging or debug-printing of sensitive data.

Why Explicit Zeroization is Necessary

In Rust, the compiler is free to elide "dead stores": writes to memory that are never subsequently read before the memory is freed. A naive let mut key = [0u8; 32]; key.copy_from_slice(key_material); followed by key.fill(0) may have the zeroing optimized away. The zeroize crate performs the writes with core::ptr::write_volatile followed by a compiler memory fence, in pure Rust with no FFI, so the zeroing is not elided.
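The underlying technique can be illustrated with the standard library alone. This is a simplified sketch of volatile zeroing, not the zeroize crate's implementation; WipingKey and volatile_zero are hypothetical names.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Overwrites `buf` with zeros using volatile stores, which the
/// optimizer is not allowed to remove as dead stores.
pub fn volatile_zero(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        // SAFETY: `byte` is a valid, aligned, exclusive reference to a u8.
        unsafe { ptr::write_volatile(byte, 0) };
    }
    // Prevent the compiler from reordering later memory accesses
    // ahead of the zeroing stores.
    compiler_fence(Ordering::SeqCst);
}

/// A 32-byte key that wipes its contents when dropped.
pub struct WipingKey(pub [u8; 32]);

impl Drop for WipingKey {
    fn drop(&mut self) {
        volatile_zero(&mut self.0);
    }
}
```

In production code the zeroize crate is preferable to hand-rolled loops like this one, since it also covers heap-backed types and derives.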

`Secret<T>` Benefits

Implements Debug as Secret([REDACTED]), preventing key material from appearing in logs
Does not implement Display, Serialize, or Clone by default
Forces the programmer to explicitly call .expose_secret() when the value is needed, making accidental exposure visible in code review
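The redaction behaviour listed above can be sketched with a small wrapper type. This is a stand-in written for illustration (the type name Redacted is hypothetical), not the secrecy crate's source; it only shows how a manual Debug impl and an explicit accessor combine.

```rust
use std::fmt;

/// Minimal secret wrapper: no Display, no Clone, redacted Debug.
pub struct Redacted<T>(T);

impl<T> Redacted<T> {
    pub fn new(value: T) -> Self {
        Redacted(value)
    }

    /// The only way to read the inner value; easy to grep for in review.
    pub fn expose_secret(&self) -> &T {
        &self.0
    }
}

impl<T> fmt::Debug for Redacted<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Never forwards to T's Debug impl, so the value cannot leak here.
        f.write_str("Redacted([REDACTED])")
    }
}
```

Because the wrapper does not derive Clone or implement Display, a stray `println!("{}", key)` or `key.clone()` fails to compile rather than leaking at runtime.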

Alternatives Considered

Alternative	Why rejected
Manual `ptr::write_volatile`	Correct but verbose; easy to forget; `zeroize` is the standard Rust approach
OS-level `SecureZeroMemory` / `explicit_bzero`	Platform-specific and requires FFI; `zeroize` achieves the same effect portably with volatile writes
Garbage-collected languages	GC languages cannot guarantee when (or if) memory is zeroed — sensitive data may linger in the heap
`memsec` crate	Provides `mlock` + zeroize; considered but `zeroize` + `secrecy` is more composable

mlock / VirtualLock

The design notes that session keys are held in mlocked memory (preventing swap-out to disk). This is a separate concern from zeroization:

Zeroization (zeroize): ensures key bytes are overwritten when Rust drops the value
mlock: prevents the OS from paging the memory to disk while the session is active

Cold boot attacks (reading DRAM after power-off) and compromised OS kernels are out of scope.

Recommendation

All six primitives in the Arx Runa cryptographic design are well-justified:

XChaCha20-Poly1305 is the correct choice for a random-nonce AEAD in a system that cannot manage counter state. The 192-bit nonce eliminates birthday-bound concerns at any practical volume. The primary future consideration is AEGIS-256 once it reaches RFC status.
HKDF-SHA256 is the standard and correct choice for key expansion from high-entropy material. The use of distinct info strings provides cryptographic domain separation between all derived keys.
Argon2id is the current gold standard for password-based key derivation, with RFC 9106, OWASP, and NIST SP 800-63B all recommending it. The selected parameters (64 MiB, 3 iterations, parallelism 4), well above the OWASP minimum, are appropriate for interactive vault unlock.
Per-file random keys with key_encryption_key wrapping is the correct architecture for limiting blast radius and enabling future key rotation and file sharing. It follows the same pattern used by established encryption tools (e.g., LUKS, VeraCrypt, age).
BLAKE3 checksums over encrypted blobs are correct for fast pre-decryption corruption detection. Unkeyed is operationally sufficient because the hash is stored inside a SQLCipher-encrypted manifest. The VerifiedBlob newtype provides a compile-time safety guarantee.
ZeroizeOnDrop + Secret<T> is the Rust-ecosystem standard for sensitive key material. It addresses both the compiler-elision problem (volatile writes) and the accidental-logging problem (Debug redaction).

No changes to the design are recommended based on this research. The one open consideration is monitoring AEGIS-256 for RFC completion as a future upgrade to XChaCha20-Poly1305.

Decisions

Choices made during this research session. Updated as the session progresses.

Decision	Alternatives considered	Rationale
Argon2id parameters: 64 MiB, 3 iterations, parallelism 4	OWASP minimum (19 MiB / 2 / 1); 1Password-tier (650 MiB / 3 / 4)	OWASP recommended tier; matches Bitwarden and KeePassXC; ~300–500 ms on modern desktop; significantly stronger against GPU attackers than the minimum
Phase 5 file sharing uses CTX-ChaCha20-Poly1305 as committing AEAD	AES-GCM-SIV (not committing), AEGIS-256-MAC (draft only), UtC prefix	XChaCha20-Poly1305 is non-committing — partition oracle attacks are theoretically possible in multi-key public-key envelope contexts (original ad-hoc ECIES draft, now HPKE envelope); CTX construction (Chan & Rogaway, IACR 2022) replaces Poly1305 tag with BLAKE3 commitment, achieving CMT-4 security; decided in file-sharing-cryptography research

Open Questions

AEGIS-256 readiness: When does draft-irtf-cfrg-aegis-aead reach RFC status? When will an independent Rust crate audit be available? These are the two gates before it can replace XChaCha20-Poly1305.
Key commitment cipher: Resolved — CTX-ChaCha20-Poly1305 selected for Phase 5 HPKE share packages. See docs/research/file-sharing-cryptography.md.

Sources

Source	Topic	URL
RFC 8439 — ChaCha20 and Poly1305 for IETF Protocols (2018)	ChaCha20-Poly1305 specification	https://www.rfc-editor.org/rfc/rfc8439
draft-irtf-cfrg-xchacha-03 — XChaCha: eXtended-nonce ChaCha and AEAD_XChaCha20_Poly1305 (Arciszewski, 2020)	XChaCha20 specification; Section 3.1 contains verbatim birthday bound figures	https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-xchacha-03
Bernstein — "Extending the Salsa20 nonce" (SKEW 2011)	Security proof that extended-nonce construction is secure under base cipher assumptions	http://cr.yp.to/snuffle/xsalsa-20110204.pdf
NIST SP 800-38D — Recommendation for Block Cipher Modes: GCM (2007)	AES-GCM specification and nonce requirements	https://csrc.nist.gov/pubs/sp/800/38/d/final
RFC 8452 — AES-GCM-SIV (2019)	Nonce-misuse resistant AEAD	https://www.rfc-editor.org/rfc/rfc8452
draft-irtf-cfrg-aegis-aead — The AEGIS Family of Authenticated Encryption Algorithms	AEGIS-256 candidate for future upgrade	https://datatracker.ietf.org/doc/draft-irtf-cfrg-aegis-aead/
Böck, Zauner, Devlin, Somorovsky, Jovanovic — "Nonce-Disrespecting Adversaries: Practical Forgery Attacks on GCM in TLS" (USENIX WOOT 2016)	AES-GCM nonce reuse catastrophe demonstrated against real TLS implementations	https://eprint.iacr.org/2016/475
Chan & Rogaway — "On Committing Authenticated Encryption" (IACR 2022)	AEAD key non-commitment; committing AE (cAE) framework and CTX construction	https://eprint.iacr.org/2022/1260
RFC 5869 — HMAC-based Key Derivation Function (HKDF) (2010)	HKDF specification	https://www.rfc-editor.org/rfc/rfc5869
NIST SP 800-56C Rev 2 — Two-Step Key Derivation (2020)	HKDF as NIST-approved KDF	https://csrc.nist.gov/publications/detail/sp/800-56c/rev-2/final
RFC 8446 — TLS 1.3 (2018)	HKDF-SHA256 use in production protocol	https://www.rfc-editor.org/rfc/rfc8446
NIST FIPS 180-4 — Secure Hash Standard (SHA-2) (2015)	SHA-256 specification	https://csrc.nist.gov/pubs/fips/180-4/upd1/final
RFC 9106 — Argon2 Memory-Hard Function (Biryukov, Dinu, Khovratovich, Josefsson; 2021)	Argon2id specification; Section 4 verbatim recommends 64 MiB / t=3 / p=4	https://www.rfc-editor.org/rfc/rfc9106
Biryukov, Dinu, Khovratovich — "Argon2: New Generation of Memory-Hard Functions" (IEEE EuroS&P 2016)	Theoretical AT product analysis; tradeoff-attack reduction factor ≤ 1.33× for Argon2id	https://ieeexplore.ieee.org/document/7467361
OWASP Password Storage Cheat Sheet (2024)	Argon2id recommended parameters for interactive applications	https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html
NIST SP 800-63B — Digital Identity Guidelines (2017, updated 2024)	Password-based authentication and KDF recommendations	https://pages.nist.gov/800-63-4/sp800-63b.html
RFC 7914 — scrypt (2016)	scrypt specification (rejected alternative to Argon2id)	https://www.rfc-editor.org/rfc/rfc7914
NIST SP 800-57 Part 1 Rev. 5 — Recommendation for Key Management (2020)	Section 6.2 defines DEK/KEK hierarchy and "limited exposure" rationale for key wrapping	https://csrc.nist.gov/publications/detail/sp/800-57-part-1/rev-5/final
LUKS On-Disk Format Specification (Fruhwirth et al.; v1.2.3 / LUKS2 v1.1.4)	Production precedent: per-keyslot key wrapping so passphrase compromise does not compromise volume key	https://gitlab.com/cryptsetup/LUKS2-docs
Linux fscrypt documentation (kernel.org)	Production precedent: per-file key derivation for cryptographic isolation in filesystems	https://docs.kernel.org/filesystems/fscrypt.html
Krawczyk — "The Order of Encryption and Authentication for Protecting Communications" (CRYPTO 2001)	Proves Encrypt-then-MAC is the only generically secure ordering; INT-CTXT via check-before-decrypt	https://eprint.iacr.org/2001/045
Bellare & Namprempre — "Authenticated Encryption: Relations among Notions and Analysis of the Generic Composition Paradigm" (ASIACRYPT 2000 / JoC 2008)	Defines INT-CTXT; proves Encrypt-then-MAC achieves it; foundational for AEAD ordering	https://eprint.iacr.org/2000/025
O'Brien & Paterson — "Security of Symmetric Encryption against Mass Surveillance" (IACR 2013)	Key isolation properties under mass surveillance threat model	https://eprint.iacr.org/2013/130
BLAKE3 — "BLAKE3: One Function, Fast Everywhere" (O'Connor, Aumasson, Neves, Wilcox-O'Hearn; 2020)	BLAKE3 design, performance, and security properties	https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf
Aumasson et al. — BLAKE2 specification (2012)	BLAKE2 predecessor to BLAKE3	https://www.blake2.net/blake2.pdf
zeroize crate documentation	Compiler-resistant memory zeroing via volatile writes in Rust	https://docs.rs/zeroize
secrecy crate documentation	Secret<T> / SecretBox wrapper with ExposeSecret trait; Debug redaction	https://docs.rs/secrecy

Document type: Exploration / feasibility research Status: Concluded Last updated: 2026-04-12

Justification and alternative analysis for the cryptographic decisions in the Arx Runa Phase 5 file-sharing design: ECIES variant selection (historical draft), elliptic curve choice (X25519 vs P-256), KDF construction inside ECIES (historical draft), and committing AEAD selection (mandated by Phase 1 primitive research).

For background on the cryptographic primitives used throughout Arx Runa, see Cryptographic Primitive Rationale.

The Problem

The original Phase 5 draft encrypted share packages using ad-hoc ECIES: the sender performed an ephemeral ECDH with the recipient's long-term X25519 public key, derived a symmetric key via HKDF, and encrypted the share package payload with XChaCha20-Poly1305. This research evaluated that draft and selected HPKE (RFC 9180) + CTX-ChaCha20-Poly1305 for the canonical design.

The design raises four cryptographic questions that require principled justification:

ECIES variant: Which ECIES construction (ISO 18033-2, SEC1, HPKE RFC 9180, or ad-hoc ECIES-KEM+DEM) should be used? Each has different security properties and library support.
Curve selection: X25519 (Curve25519 in ECDH mode) or P-256 (NIST curve)? The design specifies X25519 but the rationale needs to be documented.
KDF inside ECIES: HKDF-SHA256 is used with info="arx-runa-share". What are the alternatives and is this the right choice?
Committing AEAD: The cryptographic-primitive-rationale.md explicitly mandates a committing AEAD for Phase 5 (to defend against partition oracle attacks in the multi-key context of ECIES). XChaCha20-Poly1305 is non-committing. What cipher should replace or supplement it?

ECIES Variant Selection

The original Arx Runa draft used ad-hoc ECIES: a custom composition of X25519 ECDH + HKDF + XChaCha20-Poly1305. The primary alternative evaluated here is HPKE (RFC 9180).

Ad-Hoc ECIES (original draft)

The current construction (ephemeral X25519 → ECDH → HKDF-SHA256 → XChaCha20-Poly1305) closely follows the one used by the age encryption tool (Filippo Valsorda, 2019), which wraps the file key with ChaCha20-Poly1305 rather than the extended-nonce variant. The age tool is widely adopted and well-regarded; its X25519 recipient type uses:

shared_secret = ECDH(ephemeral_private, recipient_public)
wrap_key = HKDF-SHA256(IKM=shared_secret, salt=ephemeral_pk||recipient_pk, info="age-encryption.org/v1/X25519")
ciphertext = ChaCha20-Poly1305.Seal(wrap_key, file_key)

Notable difference from the current Arx Runa spec: age includes both the ephemeral and recipient public keys in the HKDF salt, providing explicit key binding. The Arx Runa design uses only the ephemeral public key as the salt.

HPKE (RFC 9180) — the modern alternative

HPKE (Hybrid Public Key Encryption, RFC 9180, published 2022) is the IRTF CFRG standardization of exactly this pattern. Key improvements over ad-hoc ECIES:

Property	Ad-hoc ECIES	HPKE (RFC 9180)
Formal IND-CCA2 proof	No (security relies on informal analysis)	Yes
Includes both public keys in key schedule	Only if explicitly added (age does this, current Arx Runa spec does not)	Yes — all public keys always included
Prevents KEM malleability	Not guaranteed	Formally proven
AEAD agility	Manual	Built-in: KEM/KDF/AEAD are modular
Test vectors	No	Yes (RFC 9180 Appendix A)
Adoption	age, Noise Protocol variants	TLS Encrypted Client Hello (ECH), ODoH, DAP/PPM
Rust ecosystem	`x25519-dalek` + `hkdf` + `chacha20poly1305`	`hpke` crate v0.13.0 (~4M downloads)
Formal analysis	None specific to this composition	Alwen et al., "Analysing the HPKE Standard" (EUROCRYPT 2021)

HPKE does not natively address key commitment — it defers to the AEAD layer. If a committing AEAD is needed, it must be plugged in as the AEAD component.

Elliptic Curve: X25519 vs P-256

Both curves provide ~128-bit security (equivalent to AES-128, RSA-3072). The design specifies X25519; the rationale is documented here.

X25519 (Curve25519)

SafeCurves compliant (Bernstein & Lange, 2013 / updated 2024): meets conservative security criteria that P-256 fails on several axes
Constant-time by construction: the Montgomery ladder scalar multiplication is naturally constant-time, so an implementation has to go out of its way to introduce a timing side channel
Cofactor clamping: cofactor h=8 is handled by "clamping" the scalar (RFC 7748 §5), eliminating small-subgroup attacks without requiring the implementation to check point order
Twist-secure: the curve's quadratic twist also has a large prime-order subgroup, so the x-only ladder stays safe on inputs that are not on the curve, and clamping neutralises low-order point injection
No patents: Bernstein explicitly dedicated the curve to the public domain
Widely deployed: TLS 1.3 (RFC 8446), WireGuard, Signal Protocol, SSH, age, Noise Protocol
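The clamping step mentioned in the list above is three bit operations on the 32-byte little-endian scalar, as specified in RFC 7748 Section 5. The function name is illustrative; libraries such as x25519-dalek perform this internally.

```rust
/// Clamps a 32-byte X25519 scalar per RFC 7748 Section 5.
pub fn clamp_scalar(mut k: [u8; 32]) -> [u8; 32] {
    k[0] &= 248; // clear the low 3 bits: the scalar becomes a multiple of 8
    k[31] &= 127; // clear bit 255
    k[31] |= 64; // set bit 254
    k
}
```

Making the scalar a multiple of the cofactor 8 is what removes any small-subgroup component from the result of the scalar multiplication.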

P-256 (prime256v1 / secp256r1)

NIST FIPS 186-5 standard. Key issues relative to X25519:

Implementation pitfalls: historically vulnerable to timing side-channels due to incomplete addition formulas in Weierstrass form; constant-time P-256 requires explicit engineering
SafeCurves: P-256 fails several SafeCurves criteria (rigidity, Montgomery ladder support, completeness, indistinguishability)
Cofactor h=1: advantage (no small subgroup issue) but the implementation complexity is higher
FIPS compliance: the only advantage — required in some government/regulated contexts

For Arx Runa (zero-knowledge personal vault, no FIPS requirement), X25519 is the correct choice.

Comparison table

Property	X25519	P-256
Security level	~128-bit	~128-bit
SafeCurves	Yes	No
Constant-time by construction	Yes	No — requires explicit effort
Cofactor handling	Automatic via clamping (RFC 7748)	h=1, no issue
Patent status	Public domain	NIST (no known patents, but NIST origins)
FIPS compliance	No (RFC 7748 only)	Yes (FIPS 186-5)
Rust ecosystem	`x25519-dalek` (audited by NCC Group)	`p256` (RustCrypto)
Used by	TLS 1.3, WireGuard, Signal, age, SSH	TLS, ECDSA certificates, FIDO2

KDF Inside ECIES

The current design: HKDF-SHA256(shared_secret, salt=ephemeral_public_key, info="arx-runa-share")

Issue: Missing recipient public key in salt

The age tool and HPKE both include both public keys (ephemeral and recipient long-term) in the HKDF salt:

age:  salt = ephemeral_pk || recipient_pk
HPKE: labeled KDF includes both keys in the "kem_context" via the key schedule

Including the recipient's long-term public key provides explicit key binding: the derived symmetric key is cryptographically bound to the specific recipient. Without it, an attacker who can construct a different X25519 keypair where the ECDH output is the same (practically infeasible, but theoretically unsound) could substitute keys without detection.

This is a soundness improvement, not a practical vulnerability in the current design (the ECDH shared secret already implicitly depends on the recipient's public key). But explicit binding matches the construction in age and HPKE, and is the correct practice.

Recommended fix: change the HKDF salt from ephemeral_public_key to ephemeral_public_key || recipient_public_key, matching the age construction.
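The recommended salt is a fixed-size concatenation of the two 32-byte public keys. A minimal sketch, assuming raw X25519 public keys as byte arrays (the function name is hypothetical and the HKDF call is omitted):

```rust
/// Builds the HKDF salt as ephemeral_public_key || recipient_public_key.
pub fn share_kdf_salt(ephemeral_pk: &[u8; 32], recipient_pk: &[u8; 32]) -> [u8; 64] {
    let mut salt = [0u8; 64];
    salt[..32].copy_from_slice(ephemeral_pk);
    salt[32..].copy_from_slice(recipient_pk);
    salt
}
```

Because both inputs have a fixed length, the concatenation is unambiguous and needs no length prefix.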

HKDF-SHA256: correct choice

HKDF-SHA256 (RFC 5869) is the right KDF for this context:

Standardized and well-analyzed
Used by age, HPKE, TLS 1.3, and Signal, all of which can take an X25519 ECDH output as IKM
The info="arx-runa-share" string provides application-identity domain separation
Output is independent of any other HKDF derivation in the key tree (different IKM, different info)

No alternative KDF is justified here.

Committing AEAD Selection

The mandate

cryptographic-primitive-rationale.md explicitly mandates a committing AEAD for Phase 5. XChaCha20-Poly1305 is not committing (CMT-1 insecure).

Why it matters: partition oracle attacks

Len, Grubbs, Ristenpart — "Partitioning Oracle Attacks" (USENIX Security 2021) demonstrated that non-committing AEADs enable partitioning oracle attacks: an adversary can construct a single ciphertext that decrypts successfully under multiple keys, then use a decryption oracle to determine which key a target holds. Confirmed vulnerable: AES-GCM, ChaCha20-Poly1305, XSalsa20-Poly1305.

In the context of ECIES for file sharing: each share package uses a fresh ECDH-derived key per recipient, so the multi-key scenario arises when the same package is sent to multiple recipients (or more relevantly: when an attacker can query an oracle that tries many keys against a single ciphertext). The threat is lower than in systems with a single static key shared across many users, but the mandate stands because the file_key inside the package is a real secret that could be targeted.

Key commitment constructions

Construction	Source	How it works	Cost	Ciphertext size change
CTX	Chan & Rogaway, IACR 2022	Replace AEAD tag with `H(key || nonce || ciphertext)`	One hash over short input	+16 bytes (tag grows from 16 to 32 bytes)
UtC (prepend commitment)	Bellare & Hoang, EUROCRYPT 2022	Prepend `H(key)` to ciphertext, then AEAD	One short hash	+32 bytes commitment
HtE	Bellare & Hoang 2022	Hash key+message, use result to re-key before encrypt	One hash over plaintext	None
AES-GCM-SIV	RFC 8452	Nonce-misuse resistant, but NOT committing	—	Requires AES, not native committing
AEGIS-256-MAC	IETF draft	Purpose-built committing AEAD	—	Not yet RFC; no Rust audit

Key finding: AES-GCM-SIV is nonce-misuse resistant but is not a committing AEAD (GCM's GHASH authentication is not collision-resistant under multi-key). Len, Grubbs, and Ristenpart (USENIX Security 2021) confirm that both GCM and GCM-SIV are CMT-1 insecure.

Practical recommendation path

The simplest production-ready approach for Arx Runa:

Option A: CTX construction on top of XChaCha20-Poly1305

Replace the 16-byte Poly1305 authentication tag with a 32-byte commitment tag:

commitment = BLAKE3(b"arx-runa-commitment-v1" || key || nonce || ciphertext)
wire: [ephemeral_pk | nonce | ciphertext | commitment(32B)]

This achieves CMT-1 (key-committing) and CMT-4 (full commitment) security per the CTX paper. Cost: one BLAKE3 call over a short string. Wire format changes: tag size increases from 16 to 32 bytes.

Option B: Prepend key commitment prefix (UtC-style)

Prepend BLAKE3(b"arx-runa-key-commit-v1" || key) (32 bytes) before the AEAD ciphertext:

wire: [ephemeral_pk | nonce | key_commitment(32B) | ciphertext | Poly1305 tag(16B)]

Slightly larger (extra 32 bytes), but allows separate verification of key commitment and ciphertext integrity without re-implementing AEAD internals.

Option C: Migrate to HPKE + committing AEAD

Use HPKE (RFC 9180) with a future committing AEAD (e.g., AEGIS-256-MAC when it reaches RFC). This is the most future-proof path but depends on AEGIS-256 standardization.

Key Commitment and Partition Oracle Attacks

Why ECIES + non-committing AEAD is specifically risky

In ECIES, the AEAD key is derived from an ephemeral ECDH. If the AEAD is not committing:

Attacker can construct a ciphertext C that decrypts under key K₁ to malicious content, and under key K₂ to benign content
In a file-sharing context: attacker delivers such a C to recipient; depending on which key the recipient uses, they get different content
The partition oracle risk is real if the system re-uses or exposes the ECDH-derived key for multiple operations (in Arx Runa: the outer envelope key is also used to decrypt file_key_wrapped inside the package — this is a two-application use of the same key)

Arx Runa-specific concern: double use of the derived key

The current design uses the ECDH-derived symmetric key for two purposes:

Encrypting the outer envelope (JSON payload)
Separately encrypting file_key_wrapped inside the envelope

Using the same key for two different ciphertexts creates a key commitment dependency: if the outer AEAD decrypts successfully, you cannot assume the file_key_wrapped tag is also valid under a different key. A committing AEAD on the outer envelope resolves this — once the outer envelope verifies, the key is bound.

Alternative recommendation: include file_key directly in the ECIES-encrypted JSON payload rather than separately re-encrypting it. This eliminates the redundant encryption and simplifies the construction. The outer AEAD already provides confidentiality and integrity for everything inside the envelope.

Recommendation

1. Migrate from ad-hoc ECIES to HPKE (RFC 9180)

Use DHKEM(X25519, HKDF-SHA256) + HKDF-SHA256 + CTX-ChaCha20-Poly1305 as the HPKE ciphersuite.

The current ad-hoc ECIES construction is functional but lacks a formal security proof and requires manual discipline to keep correct. HPKE (RFC 9180) standardizes exactly this pattern with an IND-CCA2 proof, automatic inclusion of both public keys in the key schedule, test vectors, and a modular design that makes algorithm agility straightforward. The hpke crate (v0.13.0, ~4M downloads) provides a production-quality Rust implementation.

Impact on existing cryptographic-primitives design: none. HPKE is additive — a new Phase 5 module only. The vault encryption stack (XChaCha20-Poly1305, HKDF from master_key, Argon2id, BLAKE3, ZeroizeOnDrop) is untouched. Note that the AEAD inside HPKE is ChaCha20-Poly1305 (96-bit nonce managed by HPKE), not XChaCha20-Poly1305 — this is correct and intentional.

2. CTX construction for key commitment

Wrap ChaCha20-Poly1305 inside a CTX layer as the AEAD component of HPKE:

tag = BLAKE3(b"arx-runa-ctx-v1" || key || nonce || ciphertext)
wire: [ciphertext | tag(32B)]

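The Poly1305 tag is replaced with a 32-byte BLAKE3 commitment. This achieves CMT-4 security (the strongest committing AEAD notion) per Chan & Rogaway (IACR 2022). Cost: one BLAKE3 call. In Chan & Rogaway's definition the hash input is (key, nonce, associated data, original tag), which keeps the cost constant in message length; hashing the full ciphertext as written above costs one additional pass over the message. Wire format: tag grows from 16 to 32 bytes, negligible for share packages.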

This is implemented as a thin wrapper type (CtxChaCha20Poly1305) in the sharing crypto module, not a change to the existing src-tauri/src/crypto/ module.

3. Wire format

With HPKE one-shot mode, the new share package wire format is:

[enc(32B) | ciphertext | ctx_tag(32B)]

Where enc is the ephemeral public key output by HPKE's KEM. The 24-byte explicit nonce from the current design disappears — HPKE manages the nonce internally. Net wire format change: −24 bytes (nonce removed) + 16 bytes (tag grows from 16 → 32) = −8 bytes.
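The byte accounting and the new framing can be checked with a short sketch. It only splits the package into its three fields and compares fixed overheads; names are hypothetical, and no HPKE or CTX operation is performed.

```rust
pub const ENC_LEN: usize = 32; // HPKE KEM output (ephemeral public key)
pub const CTX_TAG_LEN: usize = 32; // CTX commitment tag

/// Fixed overhead of the old format: epk(32) + nonce(24) + Poly1305 tag(16).
pub const OLD_OVERHEAD: usize = 32 + 24 + 16;
/// Fixed overhead of the new format: enc(32) + ctx_tag(32).
pub const NEW_OVERHEAD: usize = ENC_LEN + CTX_TAG_LEN;

/// Borrowed view of [enc(32B) | ciphertext | ctx_tag(32B)].
#[derive(Debug, PartialEq, Eq)]
pub struct SharePackageParts<'a> {
    pub enc: &'a [u8],
    pub ciphertext: &'a [u8],
    pub ctx_tag: &'a [u8],
}

/// Splits a share package; returns None if it is too short to hold
/// both fixed-size fields.
pub fn split_share_package(pkg: &[u8]) -> Option<SharePackageParts<'_>> {
    if pkg.len() < NEW_OVERHEAD {
        return None;
    }
    let (enc, rest) = pkg.split_at(ENC_LEN);
    let (ciphertext, ctx_tag) = rest.split_at(rest.len() - CTX_TAG_LEN);
    Some(SharePackageParts { enc, ciphertext, ctx_tag })
}

/// Net change in fixed overhead, in bytes (negative means smaller).
pub fn overhead_delta() -> i64 {
    NEW_OVERHEAD as i64 - OLD_OVERHEAD as i64
}
```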

4. `file_key` directly inside the envelope

Replace file_key_wrapped with file_key as raw bytes in the JSON payload:

{
  "share_id": "...",
  "file_id": "...",
  "file_name": "report.pdf",
  "file_key": "<32 bytes, base64>",
  "chunk_count": 12,
  "chunk_size": 4194304,
  "chunk_uuids": ["..."],
  "cloud_endpoint": { ... },
  "expires_at": null
}

The HPKE outer envelope (with CTX-ChaCha20-Poly1305) provides all confidentiality and integrity for file_key. The previous file_key_wrapped was doubly encrypted with the same derived key, which is redundant and required a second nonce with no clear construction. The "wrapped" terminology is reserved for KEK-based wrapping in the vault.

5. Curve: X25519 confirmed

No change. X25519 is SafeCurves-compliant, constant-time by construction, patent-free, and the natural pairing for HPKE's DHKEM(X25519, HKDF-SHA256) ciphersuite.

Summary

Aspect	Current design	Recommended
Outer construction	Ad-hoc ECIES	HPKE RFC 9180
KEM	Manual X25519 + HKDF	DHKEM(X25519, HKDF-SHA256)
KDF	HKDF-SHA256 (salt = ephemeral_pk only)	HPKE key schedule (includes both public keys automatically)
AEAD	XChaCha20-Poly1305 (non-committing)	CTX-ChaCha20-Poly1305 (CMT-4 committing)
Key in envelope	`file_key_wrapped` (double-encrypted)	`file_key` (raw, inside HPKE-protected JSON)
Wire format	`[epk(32) \| nonce(24) \| ct \| tag(16)]`	`[enc(32) \| ct \| ctx_tag(32)]`

Decisions

Choices made during this research session. Updated as the session progresses.

Decision	Alternatives considered	Rationale
ECIES construction: HPKE (RFC 9180)	Ad-hoc ECIES (same as `age` tool)	Formal IND-CCA2 proof; both public keys always in key schedule by construction; test vectors; modular AEAD agility; widely deployed in TLS ECH and ODoH
Curve: X25519 (confirmed)	P-256	SafeCurves compliant; constant-time by construction; no patents; HPKE natively supports DHKEM(X25519, HKDF-SHA256)
KDF inside ECIES: absorbed into HPKE key schedule	Manual HKDF with ephemeral_pk only as salt	HPKE's key schedule automatically includes both public keys and provides domain separation via labeled ops
Committing AEAD: CTX construction over ChaCha20-Poly1305	UtC prefix, AES-GCM-SIV (not committing), AEGIS-256-MAC (draft only)	CTX achieves CMT-4 (full commitment); one BLAKE3 call over a short string; tag grows from 16 → 32 bytes (negligible at share package size); no plaintext pass required
`file_key` included as raw bytes inside HPKE envelope	`file_key_wrapped` (double-encrypted with same key)	HPKE outer envelope already provides confidentiality and integrity; inner wrapping is redundant and requires a second nonce; "wrapped" terminology is reserved for KEK-based wrapping in the vault

Open Questions

AEGIS-256 + HPKE: When draft-irtf-cfrg-aegis-aead reaches RFC status and a Rust audit is available, AEGIS-256-MAC (a purpose-built committing AEAD) could replace the CTX wrapper as the HPKE AEAD component. The wire format and HPKE API call sites would remain identical — only the AEAD type parameter changes.
HPKE sender authentication: The Base mode (used here) provides no sender authentication — any holder of the recipient's public key can create a valid share package. The Auth mode (SetupAuthS / SetupAuthR) adds sender authentication using the sender's long-term private key. This is not needed for Phase 5 (out-of-band key exchange already implies trust) but is noted as a future option for stronger provenance guarantees.
Post-quantum migration: HPKE's modular KEM design means a PQ-KEM (e.g., ML-KEM / Kyber) can replace DHKEM(X25519) when needed. PQ-HPKE (Anastasova et al., IACR 2022) documents this path.

Sources

Source	Topic	URL
RFC 9180 — Hybrid Public Key Encryption (Barnes, Bhargavan, Lipp, Wood; 2022)	HPKE: formal specification, IND-CCA2 proof, KEM/KDF/AEAD composition	https://www.rfc-editor.org/rfc/rfc9180
RFC 7748 — Elliptic Curves for Security (Langley, Hamburg, Turner; 2016)	X25519 and X448 specification; cofactor clamping	https://www.rfc-editor.org/rfc/rfc7748
Bernstein & Lange — "Safe Curves for Elliptic-Curve Cryptography" (2013, updated 2024)	SafeCurves criteria; X25519 vs P-256 security properties	https://cr.yp.to/papers/safecurves-20240809.pdf
Len, Grubbs, Ristenpart — "Partitioning Oracle Attacks" (USENIX Security 2021)	Partition oracle attacks on AES-GCM, ChaCha20-Poly1305, XSalsa20-Poly1305	https://www.usenix.org/conference/usenixsecurity21/presentation/len
Chan & Rogaway — "On Committing Authenticated Encryption" (IACR 2022)	CTX construction; CMT-1/CMT-4 security notions; key commitment for ECIES	https://eprint.iacr.org/2022/1260
Bellare & Hoang — "Efficient Schemes for Committing Authenticated Encryption" (EUROCRYPT 2022)	UtC, RtC, HtE transforms for adding key commitment	https://eprint.iacr.org/2022/268.pdf
Cloudflare Blog — "HPKE: Standardizing public-key encryption (finally!)"	HPKE vs ECIES comparison; problems HPKE fixes; adoption in TLS ECH	https://blog.cloudflare.com/hybrid-public-key-encryption/
`hpke` crate — rust-hpke (rozbb)	RFC 9180 Rust implementation; v0.13.0; ~4M downloads	https://docs.rs/hpke/latest/hpke/
`age` X25519 recipient (Filippo Valsorda) — x25519.go	age ECIES construction: HKDF salt = ephemeral_pk \|\| recipient_pk; ChaCha20-Poly1305	https://github.com/FiloSottile/age/blob/main/x25519.go
`aes-gcm-siv` crate — RustCrypto (Tony Arcieri)	AES-GCM-SIV Rust implementation; audit status: no direct audit	https://docs.rs/aes-gcm-siv/latest/aes_gcm_siv/
NIST FIPS 186-5 — Digital Signature Standard (2023)	P-256 / secp256r1 curve specification	https://csrc.nist.gov/pubs/fips/186-5/final

Arx Runa: Password and Key Recovery

Document type: Exploration / feasibility research · Status: Concluded · Last updated: 2026-04-10

Investigates every known mechanism for recovering vault access after a password or key file is lost, evaluated against Arx Runa's zero-knowledge threat model.

The Problem

Arx Runa derives the vault master key entirely from user credentials:

master_key = Argon2id(password || key_file_bytes, salt)

This means:

No recovery by design — the server never holds the key, so there is nobody to call
Loss of password = loss of vault (unless the user kept the key file)
Loss of key file = loss of vault (even if password is known)

The question is: can we offer any recovery path without compromising the zero-knowledge property? And which paths are worth offering as opt-in features?

Prior Art

BitLocker (Microsoft)

Generates a 48-digit numeric recovery key at setup. The user stores this key externally (print, USB, Microsoft account). The recovery key is a separate AES key that wraps the Volume Master Key — losing the password does not lose the data if the recovery key is retained.

ZK relevance: The recovery key is generated on-device and stored by the user — Microsoft never sees it (unless the user uploads it to their Microsoft account). The pattern is sound.

1Password (Emergency Kit)

At account creation, prints a PDF "Emergency Kit" containing the account password, Secret Key, and a QR code. The Secret Key is a device-generated high-entropy value that supplements the master password in the key derivation. If both are lost, data is gone.

ZK relevance: 1Password holds encrypted vault data but not keys. Their "Account Recovery" for Teams/Business uses an admin-encrypted copy of the user's key — a deliberate escrow for enterprise. For individual plans, there is no recovery — they are explicit about this.

Bitwarden

Offers an optional "Emergency Access" feature: a trusted contact can request access, and after a configurable waiting period the user can approve/deny. The contact receives an asymmetric key share. Bitwarden holds encrypted data; the key exchange happens on-device.

ZK relevance: Emergency access uses RSA key wrapping — the contact's public key encrypts a wrapped vault key. The server sees only ciphertext. This is ZK-compatible in principle.

Age (age-encryption.org)

No recovery mechanism. If the passphrase is lost, so is the data. Age explicitly documents this. The design philosophy: recovery is the user's responsibility.

LUKS (Linux Unified Key Setup)

LUKS2 supports up to 32 key slots (LUKS1 supports 8), and any slot can unlock the volume master key. A recovery passphrase is simply an additional key slot. Keyfile-based slots and passphrase slots can coexist.

ZK relevance: Multi-slot design is highly relevant — each slot independently wraps the same master key. Adding a "recovery slot" does not weaken the primary password slot.

Social Recovery (Ethereum wallets)

Wallets like Argent allow N-of-M "guardians" (trusted addresses) to approve a recovery operation. The wallet key is not split; guardians vote to replace the signing key. This is account-level, not key-level recovery.

ZK relevance: This pattern translates to Shamir's Secret Sharing at the key level.

Shamir's Secret Sharing (SSS)

Splits a secret S into N shares such that any K shares reconstruct S, but K−1 shares reveal nothing (information-theoretic security). Defined by Shamir (1979). Libraries: vsss-rs, sharks (Rust).

ZK relevance: No server involvement required. Shares can be distributed to trusted contacts, stored in separate locations, or printed and sealed. Fully ZK-compatible.
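To make the K-of-N idea concrete, here is a minimal sketch of Shamir's scheme over a small prime field. It is a toy under stated assumptions, not what `sharks` or `vsss-rs` do internally: the field is the integers modulo 2^31 − 1, the secret must be smaller than that modulus, and the polynomial coefficients are passed in by the caller so the example stays deterministic. In real use every coefficient after the first must come from a CSPRNG, and a 32-byte master key would need a larger field or byte-wise sharing.

```rust
/// Prime modulus of the toy field: 2^31 - 1 (a Mersenne prime).
const P: u64 = 2_147_483_647;

/// Modular multiplication; both inputs are below 2^31, so the product fits in u64.
fn mul(a: u64, b: u64) -> u64 {
    (a * b) % P
}

/// Modular exponentiation by squaring.
fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Modular inverse via Fermat's little theorem (valid because P is prime).
fn inv(a: u64) -> u64 {
    pow(a, P - 2)
}

/// Evaluates the polynomial coeffs[0] + coeffs[1]*x + ... at x = 1..=n.
/// coeffs[0] is the secret; coeffs.len() is the threshold K.
pub fn split(coeffs: &[u64], n: u64) -> Vec<(u64, u64)> {
    (1..=n)
        .map(|x| {
            // Horner's rule, highest-degree coefficient first.
            let y = coeffs.iter().rev().fold(0, |acc, &c| (mul(acc, x) + c) % P);
            (x, y)
        })
        .collect()
}

/// Recovers the secret by Lagrange interpolation at x = 0.
pub fn reconstruct(shares: &[(u64, u64)]) -> u64 {
    let mut secret = 0;
    for (i, &(xi, yi)) in shares.iter().enumerate() {
        let mut num = 1;
        let mut den = 1;
        for (j, &(xj, _)) in shares.iter().enumerate() {
            if i != j {
                num = mul(num, (P - xj) % P); // (0 - xj) mod P
                den = mul(den, (xi + P - xj) % P); // (xi - xj) mod P
            }
        }
        secret = (secret + mul(yi, mul(num, inv(den)))) % P;
    }
    secret
}
```

With `split(&[secret, c1], 3)` any two of the three shares passed to `reconstruct` return the secret, while a single share on its own does not.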

SLIP-39 (Satoshi Labs)

An SSS scheme designed for cryptocurrency key recovery. Uses a 10-bit word list (1024 words), Reed-Solomon error correction, and optional passphrase. Standardized by Trezor. More robust than raw SSS.

ZK relevance: Directly applicable — designed exactly for high-value key backup.

BIP-39 (Bitcoin mnemonic phrases)

Encodes 128–256 bits of entropy as 12–24 human-readable words. Not SSS — it is a direct encoding of the key. If all words are captured, the key is captured.

ZK relevance: Could be used to encode and display the vault master key (or a separately generated recovery key) as a mnemonic. Very user-friendly for "write this down."

OPAQUE (RFC 9807, IRTF CFRG)

An asymmetric PAKE protocol where the server never sees the password, even during registration. A server-side "secret oprf key" is mixed into derivation — changing the password requires server involvement. Does NOT help with password recovery, but eliminates password-at-rest exposure.

ZK relevance: Interesting for passwordless future, but does not solve recovery.

Recovery Mechanisms

1. Recovery Phrase (BIP-39 Mnemonic)

At vault creation, generate 32 bytes of additional entropy and encode as a 24-word BIP-39 phrase. This phrase is an alternative vault key that independently wraps the master key (a second LUKS-style slot).

User writes down / prints 24 words
Losing the password is recoverable if the phrase is retained
Phrase itself must be kept secret

2. Recovery Code (Numeric/Alphanumeric)

Simpler than BIP-39 — generate a random 40-character alphanumeric code (like BitLocker's 48-digit code). Display once, user saves externally. Wraps the master key in a second slot.

Easier to type for less technical users
Less memorable but shorter to store
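The strength of such a code depends on the alphabet as well as the length. The sketch below computes the entropy of a uniformly random code; the function name `code_entropy_bits` is hypothetical, and the alphabet sizes used in the example (32 and 62 symbols) are assumptions, since the text above does not fix an alphabet.

```rust
/// Entropy in bits of a code of `len` symbols, each drawn uniformly and
/// independently from an alphabet of `alphabet_size` symbols.
pub fn code_entropy_bits(len: u32, alphabet_size: u32) -> f64 {
    f64::from(len) * f64::from(alphabet_size).log2()
}
```

Under these assumptions a 40-character code over a 32-symbol alphabet carries 200 bits, and the same length over 62 symbols (digits plus both letter cases) carries roughly 238 bits.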

3. Shamir's Secret Sharing (SSS)

At setup, split the master key into N shares. User distributes to trusted contacts or prints and stores in separate locations. Any K shares reconstruct.

sharks crate (Rust): pure SSS, no external deps
vsss-rs crate: Verifiable SSS (VSS) — shares can be verified without reconstruction
2-of-3 is the most common practical choice (two locations + one trusted person)

UX complexity: High. Users must manage multiple physical shares. Error-prone without tooling.

4. SLIP-39 Mnemonic Shares

Like Shamir's but each share is human-readable words with error correction. Better UX than raw byte shares. Supported by hardware wallets (Trezor).

Libraries: slip39 (Rust, community crate)
Reed-Solomon error correction within each share reduces transcription errors

5. Key File as Recovery Mechanism

The existing key file feature already functions as a second factor. If used correctly (key file stored separately from password), losing the password is recoverable via the key file.

No new code needed — just documentation and UX guidance
Users should be told explicitly: "your key file IS your recovery key, store it safely"

6. Trusted Contact (Asymmetric Key Wrapping)

User imports a trusted contact's public key (X25519 or RSA). The master key is wrapped under that public key and stored in the vault metadata. If the user loses their password, the contact can unwrap and share the master key.

Requires contact to have Arx Runa installed (or a standalone unwrapping tool)
The wrapped blob is stored in the vault — server sees only ciphertext
ZK-compatible: server never sees plaintext

7. Time-Delayed Recovery (Dead Man's Switch variant)

User registers a recovery email or contact. After N days of no login, an encrypted recovery blob is released to the contact. Not self-recovery — useful for inheritance.

Requires a server component with liveness tracking — breaks pure local operation
Out of scope for current architecture

8. Cloud-Backed Encrypted Recovery Blob (Opt-In Escrow)

User chooses to upload an Argon2id-encrypted copy of their master key to a separate cloud service (not their vault cloud). Recovery requires knowing the escrow password (a simpler, separately stored password).

Completely voluntary — user chooses the escrow destination
ZK for the primary vault: the vault cloud still never sees keys
The escrow service sees a ciphertext blob — ZK if escrow password is strong

9. Hardware Security Key (FIDO2 / YubiKey) — Multiple Keys

If vault auth uses a hardware key, registering two hardware keys is already recovery: keep one as backup. This is an existing pattern for FIDO2/WebAuthn.

Arx Runa does not currently have FIDO2 integration — future phase

10. Platform Biometric / OS Keychain Escrow

On supported platforms (Windows Hello, macOS Secure Enclave, Android Keystore), the OS can bind a key to the device and biometric. A sealed copy of the master key is stored in the OS keychain.

Recovery requires same device + biometric (or platform recovery)
Does not help if device is lost — only helps with forgotten password on same device
Windows Hello recovery: Microsoft account backup of keychain (optional)

ZK Threat Model Evaluation

Mechanism	Server sees plaintext?	Server required?	ZK-compatible?	Notes
Recovery phrase (BIP-39)	No	No	✅ Yes	Purely local, strong
Recovery code (numeric)	No	No	✅ Yes	Purely local, simple
Shamir's SSS	No	No	✅ Yes	Complex UX
SLIP-39 shares	No	No	✅ Yes	Better UX than raw SSS
Key file (existing)	No	No	✅ Yes	Already implemented
Trusted contact (X25519 wrap)	No	No	✅ Yes	Requires contact tooling
Time-delayed / dead man's	No	Yes	⚠️ Partial	Needs liveness server
Cloud-backed escrow (opt-in)	No (ciphertext)	Yes (escrow)	⚠️ Partial	Escrow cloud is a third party
FIDO2 backup key	No	No	✅ Yes	Future phase
Platform biometric	No	No	✅ Yes	Device-bound, limited

Comparison Table

Mechanism	Complexity (dev)	Complexity (user)	Entropy	Offline?	Prior art
Recovery phrase (BIP-39)	Low	Low	256-bit	✅	1Password, hardware wallets
Recovery code	Very low	Very low	~200-bit	✅	BitLocker
Shamir's SSS (2-of-3)	Medium	High	Same as key	✅	Ethereum social recovery
SLIP-39	Medium	Medium	Same as key	✅	Trezor
Key file (existing)	None	Medium	512-bit	✅	Arx Runa already
Trusted contact wrap	High	Medium	Same as key	✅	Bitwarden Emergency Access
Cloud escrow (opt-in)	High	Medium	Escrow pw	⚠️	—
Platform biometric	Very high	Very low	Device-bound	✅ (same device)	macOS Keychain

Recommendation

Recovery is opt-in. Users who do not set it up lose their vault if they lose their credentials — this is explicitly documented and expected.

Phase: ship BIP-39 recovery phrase

The primary recovery mechanism is a BIP-39 24-word mnemonic that functions as a second key slot (LUKS-style). The master key is never stored directly — the phrase wraps it:

recovery_salt         = CSPRNG(32 bytes)
recovery_key          = Argon2id(phrase_words_joined, recovery_salt)   // same params as primary slot
recovery_slot         = XChaCha20-Poly1305.encrypt(master_key, recovery_key, aad=b"arx-runa recovery v1" || vault_id_bytes)

recovery_slot and recovery_salt are stored in vault metadata alongside the primary password slot. Either slot unlocks the vault independently.

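Phrase generation: 256 bits of entropy via rand::rng().fill(), encoded via the bip39 crate. The final word carries an 8-bit checksum, so a mistyped word is rejected before any key is derived in all but about 1 in 256 cases; a word that is not on the BIP-39 word list is always rejected.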
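The relationship between entropy size, checksum size, and word count follows directly from the BIP-39 specification and can be sketched as below. The function name `bip39_layout` is hypothetical; it only does the arithmetic and does not generate or validate a phrase.

```rust
/// Returns (checksum_bits, word_count) for a BIP-39 mnemonic, or None if
/// the entropy size is not one BIP-39 allows (128 to 256 bits, multiple of 32).
pub fn bip39_layout(entropy_bits: u32) -> Option<(u32, u32)> {
    if !(128..=256).contains(&entropy_bits) || entropy_bits % 32 != 0 {
        return None;
    }
    // One checksum bit per 32 bits of entropy.
    let checksum_bits = entropy_bits / 32;
    // Each word indexes a 2048-entry list, so it encodes 11 bits.
    let words = (entropy_bits + checksum_bits) / 11;
    Some((checksum_bits, words))
}
```

For the 256-bit case used here this gives an 8-bit checksum and 24 words.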

Display policy: shown exactly once during the recovery-setup ceremony (a separate post-creation flow via Security settings). Never stored. User must acknowledge before proceeding.

Argon2id parameters: identical to the primary password slot — the recovery phrase is not a weaker path.

AAD: b"arx-runa recovery v1" || vault_id_bytes binds the ciphertext to its vault context and version, preventing both cross-vault recovery slot transplant attacks and cross-slot confusion attacks.
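A possible shape for the slot metadata and the AAD is sketched below. All type and field names (`KeySlot`, `RecoverySlot`, `VaultHeader`, `recovery_aad`) are hypothetical, as are the 16-byte vault ID and the stored 24-byte nonce; the actual vault header layout is not specified in this document.

```rust
/// Domain-separation prefix quoted in the text above.
const RECOVERY_DOMAIN: &[u8] = b"arx-runa recovery v1";

/// Recovery methods; only the BIP-39 phrase exists in the first phase.
#[derive(Debug, Clone, PartialEq)]
pub enum RecoveryMethod {
    Bip39Phrase,
}

/// One independently wrapped copy of the master key.
#[derive(Debug, Clone, PartialEq)]
pub struct KeySlot {
    pub salt: [u8; 32],              // Argon2id salt for this slot
    pub nonce: [u8; 24],             // XChaCha20-Poly1305 nonce (assumed stored)
    pub wrapped_master_key: Vec<u8>, // ciphertext plus 16-byte tag
}

#[derive(Debug, Clone, PartialEq)]
pub struct RecoverySlot {
    pub method: RecoveryMethod,
    pub slot: KeySlot,
}

/// Header with one primary slot and zero or more recovery slots.
#[derive(Debug, Clone, PartialEq)]
pub struct VaultHeader {
    pub vault_id: [u8; 16],
    pub primary_slot: KeySlot,
    pub recovery_slots: Vec<RecoverySlot>,
}

/// Builds the AAD for a recovery slot: domain prefix followed by the vault ID.
pub fn recovery_aad(vault_id: &[u8; 16]) -> Vec<u8> {
    let mut aad = Vec::with_capacity(RECOVERY_DOMAIN.len() + vault_id.len());
    aad.extend_from_slice(RECOVERY_DOMAIN);
    aad.extend_from_slice(vault_id);
    aad
}
```

Because the vault ID is part of the AAD, a recovery slot copied into another vault's header fails authentication there even if the same phrase is supplied.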

Future phases (not shipped now)

SLIP-39 shares: 2-of-3 by default, user-overridable. Depends on a production-ready slip39 Rust crate — needs further evaluation.
Trusted contact wrap: age-encryption.org/v1 X25519 format, so the contact can unwrap with the age CLI without Arx Runa installed. Higher implementation complexity; deferred.

Decisions

Choices made during this research session. Updated as the session progresses.

Decision	Alternatives considered	Rationale
Use BIP-39 24-word mnemonic for the primary recovery slot	Recovery code (alphanumeric string), SLIP-39 shares	BIP-39 checksum catches transcription errors immediately; well-understood mental model from crypto wallets; `bip39` crate is battle-tested; 256-bit entropy
Use SLIP-39 word shares (not raw SSS bytes)	Raw byte shares, skip SSS entirely	SLIP-39 Reed-Solomon error correction catches transcription errors; human-readable words; Trezor-compatible format
Default SLIP-39 split: 2-of-3, user-overridable	3-of-5, fixed threshold	2-of-3 is the standard practical choice; tolerates one lost share; lower setup friction than 3-of-5
Trusted contact wrap uses age encryption format (age-encryption.org/v1)	Arx Runa-native X25519 format	Contact can unwrap with `age -d` CLI — no dependency on Arx Runa being available; format is audited and standardised
Ship BIP-39 phrase only; defer SLIP-39 and trusted contact wrap	Ship all four mechanisms, ship BIP-39 + SLIP-39	Start with the simplest ZK-compatible mechanism; SLIP-39 crate maturity unverified; trusted contact requires age format integration
Recovery is opt-in (no forced recovery setup)	Forced setup, no recovery at all	Follows the principle of least surprise; power users may not want the attack surface; honest about data loss risk
Recovery slots re-wrapped during password/key change (phrase required)	Invalidate slots on change, wrap a stable intermediate key	Re-wrapping preserves the user's existing recovery setup; avoids forcing re-setup after every password change; integrity check confirms phrase correctness before re-wrap
Recovery slot AAD includes vault_id: `b"arx-runa recovery v1" \|\| vault_id_bytes`	AAD without vault_id	Prevents cross-vault recovery slot transplant attacks; binds the ciphertext to its vault context
Recovery setup is post-creation (Security settings + one-time prompt)	Inline during vault creation	Vault creation is already a 21-step ceremony for Tier 2; recovery setup requires password re-entry to re-derive `master_key`, which is a natural post-creation ceremony
Same Argon2id parameters for recovery phrase as primary password slot	Weaker/faster parameters (phrase has 256-bit entropy)	Identical parameters provide slot indistinguishability in the vault header — an attacker cannot determine which salt/params belong to the recovery slot vs. primary slot
Vault header uses `recovery_slots` array (supports multiple methods)	Single recovery slot field	Designed for future extensibility (SLIP-39, trusted contact) without vault header schema migration; in the BIP-39-only phase, at most one element

Open Questions

All questions resolved. See Decisions table for rationale.

Q1 — slip39 Rust crate maturity: Resolved. No production-ready slip39 Rust crate exists with a security audit. The slip39 crate on crates.io has limited adoption and no published audit. The vsss-rs crate provides Shamir field arithmetic but not the SLIP-39 word encoding with Reed-Solomon error correction. SLIP-39 is deferred per Decision 5. When pursued, implement the word layer on top of vsss-rs or wait for a crate to reach production maturity. See Sources: slip39 Rust crate, vsss-rs Rust crate.
Q2 — Simultaneous BIP-39 and SLIP-39 slots: Resolved. Yes — the vault header uses a recovery_slots array, supporting multiple independent recovery methods. In the BIP-39-only phase, at most one element is present. Future phases may add SLIP-39 or trusted-contact slots alongside without a vault header schema migration.
Q3 — Argon2id parameters for recovery phrase: Resolved. Use the same Argon2id parameters as the primary password slot. The recovery phrase has 256-bit entropy, so Argon2id's brute-force resistance is redundant in practice. However, identical parameters provide slot indistinguishability in the vault header — an attacker cannot determine which salt belongs to the recovery slot vs. the primary slot. See Decisions table.
Q4 — UX flow (inline vs. post-creation): Resolved. Post-creation, via a Security settings page, with a one-time dismissible prompt after vault creation. Vault creation is already a 21-step ceremony for Tier 2. Recovery setup is a separate ceremony that requires password re-entry to re-derive master_key — the critical invariant (no long-lived master_key) is preserved in both the creation and recovery-setup paths.

Sources

Source	Topic	URL
Shamir, A. (1979). "How to share a secret." Communications of the ACM	Shamir's Secret Sharing foundational paper	https://dl.acm.org/doi/10.1145/359168.359176
BIP-39 specification	Mnemonic encoding of entropy	https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki
SLIP-39 specification (Satoshi Labs)	SSS-based mnemonic shares with error correction	https://github.com/satoshilabs/slips/blob/master/slip-0039.md
LUKS on-disk format v2	Multi-keyslot design for volume encryption	https://gitlab.com/cryptsetup/cryptsetup/-/wikis/LUKS-standard/on-disk-format.pdf
1Password White Paper	Emergency Kit and Secret Key design	https://1passwordstatic.com/files/security/1password-white-paper.pdf
Bitwarden Help Center — Emergency Access	Emergency access workflow and trusted-contact model	https://bitwarden.com/help/emergency-access/
RFC 9807 — OPAQUE (IRTF CFRG)	OPAQUE asymmetric PAKE	https://www.rfc-editor.org/rfc/rfc9807
`sharks` Rust crate	Shamir's Secret Sharing in Rust	https://docs.rs/sharks/latest/sharks/
`vsss-rs` Rust crate	Verifiable Secret Sharing in Rust	https://docs.rs/vsss-rs/latest/vsss_rs/
`slip39` Rust crate	SLIP-39 mnemonic shares in Rust (community crate)	https://docs.rs/slip39/latest/slip39/
Microsoft BitLocker documentation	BitLocker recovery key design	https://learn.microsoft.com/en-us/windows/security/operating-system-security/data-protection/bitlocker/recovery-overview
`bip39` Rust crate	BIP-39 mnemonic generation and validation	https://docs.rs/bip39/latest/bip39/
age encryption format specification v1	X25519 recipient stanza and file key wrapping	https://age-encryption.org/v1

Arx Runa: Reducing Padding Overhead

Document type: Exploration / feasibility research · Status: Living document · Last updated: 2026-04-10

This document surveys all known techniques for reducing the per-file padding overhead caused by Arx Runa's fixed-size 4 MiB chunk design, and evaluates each against Arx Runa's privacy model and implementation constraints.

The Encryption and Upload Flow

Before discussing security properties, it is useful to be precise about the order of operations — because it is not always intuitive.

Correct order: chunk → pad → encrypt → upload

File (plaintext)
  │
  ├─→ Chunk 1 [████████████████] 4 MiB real data               → encrypt → upload as blob UUID-1
  ├─→ Chunk 2 [████████████████] 4 MiB real data               → encrypt → upload as blob UUID-2
  └─→ Chunk 3 [████████░░░░░░░░] 2 MiB real data + 2 MiB zeros → encrypt → upload as blob UUID-3

Only the last chunk is padded. All preceding chunks are naturally full at exactly 4 MiB and require no padding.

For a small file (e.g. a 500 KB document), there is only one chunk and it is almost entirely padding:

File (plaintext)
  │
  └─→ Chunk 1 [█░░░░░░░░░░░░░░░] 500 KB real data + 3.5 MiB zeros → encrypt → upload as blob UUID-1

This single blob is indistinguishable from a blob containing 4 MiB of real data. The cloud cannot tell the difference — which is the privacy benefit. The cost is the 3.5 MiB of wasted cloud storage.
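The chunk-and-pad step in the diagrams above reduces to simple arithmetic. The sketch below computes how many chunks a file produces and how many zero bytes are appended to the last one. The function name `chunk_plan` is hypothetical, and treating an empty file as one fully padded chunk is an assumption this document does not confirm.

```rust
/// Fixed chunk size used throughout this document: 4 MiB.
pub const CHUNK_SIZE: u64 = 4 * 1024 * 1024;

/// Returns (number of chunks, zero-padding bytes in the last chunk).
pub fn chunk_plan(file_size: u64) -> (u64, u64) {
    // Ceiling division; an empty file is assumed to occupy one padded chunk.
    let chunks = ((file_size + CHUNK_SIZE - 1) / CHUNK_SIZE).max(1);
    let padding = chunks * CHUNK_SIZE - file_size;
    (chunks, padding)
}
```

A 10 MiB file yields three chunks with 2 MiB of padding, and a 500,000-byte document yields one chunk with 3,694,304 bytes (about 3.5 MiB) of padding.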

Encryption happens last, after chunking and padding. Each chunk is encrypted independently with its own random 24-byte nonce and produces its own 16-byte authentication tag. The AAD (Additional Authenticated Data) bound to each chunk is file_id || chunk_index, which cryptographically ties each blob to its position in the file — preventing an attacker from swapping or reordering chunks.

This design follows the standard approach for streaming authenticated encryption described by Adam Langley ("Encrypting Streams", imperialviolet.org, 2014) and implemented in libraries such as libsodium SecretStream and Google Tink's Streaming AEAD. The core requirement, as Langley notes, is that a chunked encryption scheme must prevent: chunk reordering, chunk dropping from the start or end, and cross-stream chunk injection. Binding file_id || chunk_index into the AAD of each chunk prevents reordering and cross-stream injection; a dropped chunk is detected because the encrypted manifest records every chunk UUID of a file, in order.
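One way to serialise the per-chunk AAD is sketched here. The function name `chunk_aad`, the 16-byte `file_id`, and the 8-byte big-endian encoding of `chunk_index` are assumptions for illustration; the document only states that the AAD is file_id || chunk_index.

```rust
/// Builds the AAD for one chunk: file_id followed by the chunk index.
/// Both fields are fixed-width, so the concatenation is unambiguous.
pub fn chunk_aad(file_id: &[u8; 16], chunk_index: u64) -> Vec<u8> {
    let mut aad = Vec::with_capacity(16 + 8);
    aad.extend_from_slice(file_id);
    aad.extend_from_slice(&chunk_index.to_be_bytes());
    aad
}
```

A blob presented at the wrong position, or under a different file, is decrypted with a different AAD than it was sealed with, so the Poly1305 tag check fails.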

Encrypting the whole file first and then chunking would not work: you would have one large ciphertext with no way to partially download or verify individual segments, and no way to bind each segment to its position.

What a "blob" is

A blob (Binary Large Object) is the atomic unit of storage in object storage systems such as Amazon S3, Backblaze B2, and Cloudflare R2. Each blob is a named byte sequence stored and retrieved as a whole — the storage system does not interpret its contents. In Arx Runa's case, each encrypted chunk is stored as one blob: an object on the cloud backend with a random UUID v4 name, containing nonce (24 bytes) || ciphertext (4 MiB) || Poly1305 tag (16 bytes). The cloud treats it as opaque binary data — it can store, retrieve, and delete it, but cannot read its contents.

Does the cloud know which blobs belong to the same file?

No — and this is a key privacy property.

When Arx Runa uploads a 10 MiB file, it produces 3 blobs with random UUID names:

3f8a2b1c-4d5e-6f7a-8b9c-0d1e2f3a4b5c   (4 MiB + 40 bytes)
a9f3c2e1-7b8c-9d0e-1f2a-3b4c5d6e7f8a   (4 MiB + 40 bytes)
7b2d4f8a-1c2d-3e4f-5a6b-7c8d9e0f1a2b   (4 MiB + 40 bytes)

The cloud cannot distinguish "one 10 MiB file split into 3 chunks" from "three separate 4 MiB files" or "one 4 MiB file and one 8 MiB file" or any other combination. All blobs are identical in size and randomly named. The manifest — stored locally in an encrypted SQLCipher database — is the only record of which UUID belongs to which file and in which order. The cloud never sees the manifest.

In short, the cloud sees a bucket of N uniform, anonymous blobs. It cannot count files, identify file sizes precisely, or link blobs together unless it can observe upload timing.

The timing side-channel

The one exception is upload timing. If all 3 chunks of a file are uploaded in rapid succession as a burst, an adversary watching the upload log might infer those 3 blobs are related. This is a weak side-channel — it reveals approximate file size (from burst size), not content. Research on encrypted traffic analysis confirms that timing and volume patterns remain exploitable even when payload content is fully encrypted ("The Inevitability of Side-Channel Leakage in Encrypted Traffic", arxiv 2602.14055). Epoch-based batching mitigates this by interleaving blobs from multiple files in a single upload batch, making grouping inference harder.

A natural response to timing leakage is to add random upload delays — jitter of a few seconds between blob uploads to blur burst boundaries. This approach is evaluated and rejected in the Upload Jitter section below.

Why Chunking and Padding Exist

Before addressing the cost, it is worth being precise about what security property chunking and padding actually provide — because every trade-off in this document is a trade-off against this property.

What an adversary sees without padding

Suppose Arx Runa encrypted files and uploaded them as variable-size blobs — no padding, no fixed chunk size. The ciphertext content is unreadable. But the cloud provider, or anyone who can observe the storage bucket, would see:

A blob of 2,497,152 bytes
A blob of 5,242,880 bytes
A blob of 52,428,800 bytes

File sizes are metadata. They reveal information independently of content:

What the adversary observes	What they can infer
Blob is ~2.5 MB	Almost certainly a smartphone photo (HEIC/JPEG size range)
Blob is ~50 KB	Small document, config file, or thumbnail
Blob is ~4 GB	Large video file or disk image
Blob matches a known file exactly	Can confirm whether a specific file is present — even without decrypting it

The last point is the most serious: an adversary who has a copy of a target file (e.g., a known document or photo) can compute its size and compare it against observed blob sizes to confirm or deny its presence in the vault. This is a membership inference attack — no decryption required. The 2019 PURBs paper (Nikitin et al., EPFL) formally characterises this class of leakage, and the 2024 Broken Cloud Storage research demonstrated it as a practical attack against five major E2EE cloud storage providers.

What fixed-size chunking and padding provides

Arx Runa splits files into 4 MiB chunks and zero-pads the last chunk to 4 MiB before encryption. Every blob uploaded to the cloud is exactly 4 MiB + 40 bytes.

A key question: if a file is split into multiple chunks, does the number of chunks reveal the file size?

The cloud cannot directly count blobs per file. All blobs have random UUID names, are identical in size, and have no structural links between them. The manifest — the only record of which blobs belong to which file — is encrypted locally in SQLCipher and never sent to the cloud. An adversary watching the storage bucket sees a pool of anonymous, uniform-size objects. They cannot determine how many blobs any given file produced by inspecting storage alone.

The only mechanism by which the adversary can learn N for a specific file is upload timing: if a file's chunks are uploaded in a rapid burst, the adversary watching the upload log can group those blobs together and infer N. If uploads from multiple files are interleaved — as in epoch batching — the adversary cannot decompose the total blob count into per-file chunk counts.

If timing correlation succeeds, the inferred N gives a size range:

(N − 1) × chunk_size  <  file_size  ≤  N × chunk_size

For a 4 MiB chunk, this gives a size range of width 4 MiB per file. A 2.5 MB photo and a 3.9 MB photo are indistinguishable — both produce 1 blob. A 5 MB photo and a 7.9 MB photo are indistinguishable — both produce 2 blobs. The exact size within that range is hidden because the manifest (which stores size_bytes) is encrypted in SQLCipher and never visible to the cloud.

The precise security property is therefore: even an adversary who successfully correlates upload bursts to individual files learns each file's size only to within a window of width chunk_size. This is not zero leakage; it is bounded, timing-conditional leakage.
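The bound can be checked numerically. The sketch below maps a file size to its blob count and a blob count back to the size window an adversary could infer; the function names `blob_count` and `size_range` are hypothetical.

```rust
/// Fixed chunk size used throughout this document: 4 MiB.
pub const CHUNK_SIZE: u64 = 4 * 1024 * 1024;

/// Number of blobs a file of `file_size` bytes produces (at least one).
pub fn blob_count(file_size: u64) -> u64 {
    ((file_size + CHUNK_SIZE - 1) / CHUNK_SIZE).max(1)
}

/// Size window implied by observing `n` blobs for one file:
/// (exclusive lower bound, inclusive upper bound), in bytes.
pub fn size_range(n: u64) -> (u64, u64) {
    (n.saturating_sub(1) * CHUNK_SIZE, n * CHUNK_SIZE)
}
```

With decimal megabytes, 2.5 MB and 3.9 MB both map to one blob, and 5 MB and 7.9 MB both map to two, so each pair falls inside the same window.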

What the design provides:

Exact file size is not inferrable: the cloud cannot determine whether a file is 2.5 MB or 3.9 MB — both produce one identical blob. The encrypted manifest is the only record of the exact size.
No file type inference from blob size: all blobs are identical in size, so the adversary cannot use blob dimensions to distinguish photos from documents, videos from source code.
Membership inference is substantially blocked: an adversary who possesses a target file of known exact size S cannot confirm its presence from blob sizes (all identical) or from per-file blob count (which requires timing correlation to observe). Without the manifest, they cannot link any set of blobs to any specific file.
Blob-to-file mapping is hidden: the manifest, which records which blobs belong to which file, is encrypted. The adversary cannot determine which N blobs form a single file, or how many files a given set of blobs represents. Timing is the only exception: blobs uploaded in close succession may allow grouping inference (see "What padding does NOT protect" below).
Combined with UUID blob names: blobs are named with random UUID v4 identifiers. The cloud sees N identically-sized, randomly-named opaque objects with no structural relationships between them.

What padding does NOT protect

Padding addresses size-based inference. It does not protect:

Access patterns: which blobs are downloaded, when, and how often. If the adversary can observe downloads, they may infer which files are accessed even if they cannot read them. Arx Runa does not currently address access pattern leakage. Research on encrypted traffic analysis shows that size and timing patterns persist as side-channels even when content is fully encrypted ("The Inevitability of Side-Channel Leakage in Encrypted Traffic", arxiv 2602.14055).
Blob count over time: the total number of blobs in the vault grows as files are added. An adversary watching the vault over time can observe when files are added and removed, even if not which files.
Upload timing: the timing of uploads may correlate with user activity (e.g., a burst of uploads after a trip may suggest photo backup).

These are out of scope for the padding design and are acknowledged in the threat model.

Why this justifies the overhead cost

The padding overhead (68% for a typical 2.5 MB photo, and more for smaller files) is the price of the bounded-leakage guarantee above. Most approaches in this document reduce that price by accepting more leakage: either a narrower size range, or exact size in the worst case. The packing-based approaches instead keep the leakage unchanged and pay in complexity or upload delay. Understanding precisely what the padding buys (file size hidden to within a range of width chunk_size, blob-to-file mapping hidden by the encrypted manifest) is necessary to evaluate whether any given trade-off is worth making.

The Problem, Restated

Arx Runa pads every file's last chunk to exactly 4 MiB before encryption. All blobs are identical in size, so the cloud cannot determine exact file sizes, only a size range of width chunk_size from the blob count. The exact size within that range is protected by the encrypted manifest. The privacy property is strong, but the storage cost is high for small files:

File	Actual size	Stored	Overhead
iPhone HEIC photo	2.5 MB	4 MiB	68%
Android JPEG photo	5 MB	8 MiB	68%
Small document	50 KB	4 MiB	> 8,000% (99% of the blob is padding)
10-min 4K video	1.5 GB	1,432 MiB	~0%

The overhead only matters for small files. Large files (videos, archives) waste at most 4 MiB per file regardless of total size — negligible at scale. The problem is concentrated in photo libraries and small document collections.
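Overhead percentages like these can be computed in two ways: padding relative to the original file size, or padding as a share of the bytes actually stored. A minimal sketch of both follows, with hypothetical helper names and decimal units assumed (1 MB = 10^6 bytes). The 40-byte per-blob envelope is ignored.

```rust
const CHUNK_SIZE: u64 = 4 * 1024 * 1024;

/// Plaintext bytes stored for one file after padding the last chunk.
/// Assumption: an empty file still occupies one chunk.
fn stored_bytes(file_size: u64, chunk_size: u64) -> u64 {
    let chunks = ((file_size + chunk_size - 1) / chunk_size).max(1);
    chunks * chunk_size
}

/// Padding as a percentage of the original file size.
/// Not meaningful for an empty file (division by zero yields NaN or infinity).
fn overhead_vs_file_pct(file_size: u64, chunk_size: u64) -> f64 {
    let padding = stored_bytes(file_size, chunk_size) - file_size;
    100.0 * padding as f64 / file_size as f64
}

/// Padding as a percentage of the bytes actually stored.
fn padding_share_pct(file_size: u64, chunk_size: u64) -> f64 {
    let stored = stored_bytes(file_size, chunk_size);
    100.0 * (stored - file_size) as f64 / stored as f64
}
```

For a 2.5 MB photo the first measure gives about 68%; for a 50 KB document the second gives about 99%. The two measures diverge sharply for very small files, so it helps to state which one a figure refers to.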

The Privacy Constraint

The current design achieves bounded size leakage: all blobs are 4 MiB + 40 bytes, so the adversary can infer a file's size only to within a 4 MiB range from the blob count. The exact size within that range is hidden by the encrypted manifest. An adversary cannot infer file types from blob dimensions, and cannot determine exact file sizes.

Most approaches that reduce padding overhead do so by narrowing that range: smaller chunks or finer size tiers leak a tighter size bucket. (Larger chunks would widen the range and leak less, but at even higher overhead.) The packing-based approaches are the exception, since they keep the blob size fixed. The key question for each approach is: how much additional leakage does it introduce, and is it acceptable given Arx Runa's threat model?

Arx Runa's threat model treats the cloud provider as untrusted and adversarial. The relevant question is not "does this leak anything?" but "does this leak enough to enable a meaningful attack?"

Approach 1 — Bin-Packing

Pack multiple small files into a single 4 MiB chunk before encryption.

Chunk: [file_A: 1.2 MiB | file_B: 0.8 MiB | file_C: 1.7 MiB | padding: 0.3 MiB]

Storage savings: high — approaches zero padding for large enough batches of small files.

Privacy: no additional leakage — all blobs remain the same fixed size. The existing bounded leakage (±chunk_size from blob count) is unchanged.

Core problem: write amplification. Deleting or updating one file in a packed chunk requires decrypting, repacking, and re-encrypting the entire chunk.

Best fit: write-once archival vaults (photo archives, document backups). Modelled on Facebook Haystack.

Covered in detail: bin-packing.md.
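To make the packing step concrete, here is a minimal first-fit sketch. It is not taken from bin-packing.md; the `Pack` type and `first_fit` function are hypothetical, and it only handles files that each fit in a single chunk.

```rust
const CHUNK_SIZE: u64 = 4 * 1024 * 1024;

/// One packed chunk: indices of the files it holds and the bytes used so far.
#[derive(Debug)]
struct Pack {
    files: Vec<usize>,
    used: u64,
}

/// First-fit packing: each file goes into the first pack with enough room,
/// or opens a new pack. Returns None if a file is larger than one chunk.
fn first_fit(sizes: &[u64], chunk_size: u64) -> Option<Vec<Pack>> {
    let mut packs: Vec<Pack> = Vec::new();
    for (idx, &size) in sizes.iter().enumerate() {
        if size > chunk_size {
            return None;
        }
        match packs.iter().position(|p| p.used + size <= chunk_size) {
            Some(i) => {
                packs[i].files.push(idx);
                packs[i].used += size;
            }
            None => packs.push(Pack { files: vec![idx], used: size }),
        }
    }
    Some(packs)
}
```

With the three example files above (1.2 MiB, 0.8 MiB, 1.7 MiB) this yields a single pack with roughly 0.3 MiB of padding left over.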

Approach 2 — Padmé Padding

Padmé is a padding scheme developed at EPFL as part of the PURB (Padded Uniform Random Blobs) research. Rather than padding all files to one fixed size, Padmé rounds each file's length up to the next value in a mathematically defined set of sizes, chosen to limit both information leakage and storage overhead.

How Padmé works

Padmé treats the file length like a floating-point number and rounds it up so that the mantissa has no more bits than are needed to write the exponent, producing a padded length that clusters files into size tiers. The tier boundaries are closer together at small sizes and further apart at large sizes, so the absolute padding grows with the file while the relative overhead shrinks.

The result:

An adversary learns at most O(log log M) bits about the file's size (where M is the maximum possible size)
This is the same asymptotic leakage as padding to the next power of two
But the maximum overhead is only 12% instead of up to 100% for power-of-two
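The rounding rule can be sketched in a few lines. This follows the Padmé definition as published in the PURB paper to the best of my understanding; it is not the API of any particular crate. The function name is illustrative, and lengths 0 and 1 are left unpadded here as a simplifying assumption.

```rust
/// Padmé padded length for a plaintext of `len` bytes (sketch).
/// Note: `len + mask` can overflow for lengths close to u64::MAX.
fn padme(len: u64) -> u64 {
    if len < 2 {
        return len; // lengths 0 and 1 are left unchanged in this sketch
    }
    // e = floor(log2(len)): the exponent of the length.
    let e = 63 - u64::from(len.leading_zeros());
    // s = floor(log2(e)) + 1: the number of bits needed to write the exponent.
    let s = 64 - u64::from(e.leading_zeros());
    // Clear the low e - s bits, so at most s + 1 significant bits remain.
    let low_bits = e - s;
    let mask = (1u64 << low_bits) - 1;
    // Round up to the next multiple of 2^low_bits.
    (len + mask) & !mask
}
```

For example, a 500,000-byte input is padded to 507,904 bytes (about 1.6%), and a 9-byte input to 10 bytes (about 11%), so padded sizes for typical files are often well below the 12% worst case.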

Visualisation — what the cloud sees

With Arx Runa's current fixed-size chunking, every blob is exactly 4 MiB + 40 bytes regardless of the actual file size. A 500 KB photo and a 3.9 MB photo both produce a single identical blob — the cloud learns only that the file is somewhere in the 0–4 MiB range:

Current design (fixed 4 MiB chunks)

  500 KB file:
  └─→ Blob [█░░░░░░░░░░░░░░░] 500 KB data + 3.5 MiB zeros  → 4 MiB + 40 B

  2.5 MB file:
  └─→ Blob [██████████░░░░░░] 2.5 MB data + 1.5 MiB zeros  → 4 MiB + 40 B

  3.9 MB file:
  └─→ Blob [███████████████░] 3.9 MB data + 0.1 MiB zeros  → 4 MiB + 40 B

  All three blobs are identical in size. The adversary learns: "file is 0–4 MiB".
  Overhead: 68% for the 2.5 MB file; for the 500 KB file, 88% of the blob is padding.

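With Padmé, each file is padded up to the next Padmé tier (a much smaller size gap) and then encrypted. Blobs are no longer all the same size, but they cluster into a mathematically defined set of sizes. The figures below are rounded for illustration and sit near the 12% worst case; actual padded sizes are often closer to the original: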

Padmé design (variable blobs, bounded leakage)

  500 KB file:  padded to ~560 KB
  └─→ Blob [████████████░░░] 500 KB data + ~60 KB zeros  → ~560 KB + 40 B

  2.5 MB file:  padded to ~2.8 MB
  └─→ Blob [█████████████░░] 2.5 MB data + ~300 KB zeros → ~2.8 MB + 40 B

  3.9 MB file:  padded to ~4.0 MB
  └─→ Blob [███████████████] 3.9 MB data + ~100 KB zeros → ~4.0 MB + 40 B

  Blobs vary in size, but only between Padmé tier boundaries.
  The adversary learns: "file is 470–560 KB", "2.3–2.8 MB", "3.7–4.0 MB".
  Overhead: ≤ 12% per file.

The key difference is the width of the leakage window. Under fixed chunking, that window is 4 MiB wide for every small file. Under Padmé, the window shrinks proportionally with the file — a 500 KB file leaks only a ~90 KB range rather than a 4 MiB range. The adversary gains slightly more precise size information per file, but Arx Runa wastes far less cloud storage.

Overhead profile

File size	Padmé padded to	Overhead
50 KB	~56 KB	≤ 12%
500 KB	~560 KB	≤ 12%
2.5 MB	~2.8 MB	≤ 12%
5 MB	~5.6 MB	≤ 12%
300 MB	~324 MB	≤ 8%
1.5 GB	~1.55 GB	≤ 3%

In practice the average overhead across a realistic file corpus is approximately 3% — measured against 848,000 real hard-drive user files.

Real-world impact from EPFL research

Applied to real datasets, Padmé reduces the fraction of files uniquely identifiable by size from:

Dataset	Without Padmé	With Padmé
56k Ubuntu packages	83% uniquely identifiable	3%
191k YouTube videos	87% uniquely identifiable	3%
848k user files	45% uniquely identifiable	8%

Privacy trade-off for Arx Runa

The current Arx Runa design already has bounded size leakage: the blob count reveals file size to within one 4 MiB chunk. Padmé replaces that coarse-grained chunk-count leakage with a finer set of size tiers. For a 2.5 MB photo, the cloud currently learns "this file is 0–4 MiB" (1 blob). With Padmé it learns "this file is somewhere in the 2.3–2.8 MB range": more precise, but with dramatically less wasted storage. How much this extra precision matters depends on the vault's content. For a vault of photos that are mostly 2–4 MB, Padmé reveals slightly more within that range, but the range itself is already implied by the blob count.

Whether this is acceptable depends on the threat model. For most Arx Runa use cases:

Knowing a file is a "2.3–2.8 MB image" is not actionable without the content
The gain is substantial: from 68% overhead per HEIC photo to ≤ 12%

Rust implementation

A Rust crate implementing Padmé exists: jedisct1/rust-padme-padding. Arx Runa could adopt this directly with minimal integration effort. The padding function takes a plaintext length and returns the padded length; zero-filling the remainder is unchanged from the current implementation.

Applied to Arx Runa

Instead of one fixed blob size, Arx Runa would produce blobs in a defined set of Padmé-determined sizes. The manifest stores the original size_bytes for truncation on decrypt (unchanged from current design). The cloud sees blobs of varying but clustered sizes — not exact file sizes.

For files larger than one chunk (4 MiB), the last chunk is Padmé-padded; full chunks remain at 4 MiB. This preserves the fixed-size property for all complete chunks and applies Padmé only to the last (partial) chunk of each file.
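A sketch of that layout rule follows. The padding function is passed in as a parameter so that the example does not assume any particular Padmé implementation; `blob_plaintext_sizes` and `pad_last` are hypothetical names.

```rust
/// Plaintext sizes of the blobs produced for one file: full chunks stay at
/// `chunk_size`, and only the trailing partial chunk is padded by `pad_last`.
fn blob_plaintext_sizes(
    file_size: u64,
    chunk_size: u64,
    pad_last: impl Fn(u64) -> u64,
) -> Vec<u64> {
    let full_chunks = file_size / chunk_size;
    let tail = file_size % chunk_size;
    let mut sizes = vec![chunk_size; full_chunks as usize];
    if tail > 0 || full_chunks == 0 {
        // Never let the padded tail grow beyond one full chunk.
        sizes.push(pad_last(tail).min(chunk_size));
    }
    sizes
}
```

A caller would pass the Padmé length function as `pad_last`. The original size_bytes from the manifest is still what the client uses to truncate on decrypt.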

Approach 3 — Tiered Fixed Chunk Sizes

Instead of one fixed chunk size (4 MiB), define multiple fixed sizes, for example 256 KB, 1 MiB, and 4 MiB. Each file is assigned to the smallest tier that holds it; files larger than the largest tier are split into chunks of that size.

Example tier assignment

File size	Tier chosen	Storage used	Overhead (share of stored bytes)
50 KB	256 KB	256 KB	80%
200 KB	256 KB	256 KB	22%
500 KB	1 MiB	1 MiB	50%
900 KB	1 MiB	1 MiB	11%
2.5 MB	4 MiB	4 MiB	38%
5 MB	4 MiB × 2	8 MiB	38%
300 MB	4 MiB × 75	300 MiB	~1%

Privacy leakage

The cloud sees blobs of three different fixed sizes. It learns which tier a file belongs to: at most log2(3) ≈ 1.6 bits for a 3-tier system. For a 256 KB blob: "this file is between 0 and 256 KB." For a 1 MiB blob: "this file is between 256 KB and 1 MiB."

This is a coarser anonymity set than Padmé (which creates many tightly-spaced tiers) but the leakage is bounded and predictable. Within a tier, all blobs are identical — no finer-grained size information is revealed.
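The assignment rule and the leakage bound can be sketched as follows. The tier table mirrors the example sizes in the text, read as binary units (an assumption). The function names are hypothetical, and the bit count assumes an observer who only distinguishes between tiers.

```rust
/// Example tiers from the text, in bytes, sorted ascending.
const TIERS: [u64; 3] = [256 * 1024, 1024 * 1024, 4 * 1024 * 1024];

/// Returns (blob size, blob count) for a file: the smallest tier that holds
/// the whole file, or multiple blobs of the largest tier if none does.
fn assign_tier(file_size: u64, tiers: &[u64]) -> (u64, u64) {
    let largest = *tiers.last().expect("at least one tier");
    for &tier in tiers {
        if file_size <= tier {
            return (tier, 1);
        }
    }
    (largest, (file_size + largest - 1) / largest)
}

/// Upper bound on what the tier choice alone reveals: log2(number of tiers).
fn tier_leak_bits(tier_count: usize) -> f64 {
    (tier_count as f64).log2()
}
```

For three tiers the bound is about 1.58 bits per file, on top of whatever the blob count reveals for multi-blob files.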

Key properties

No write amplification: tiers apply at write time; mutation of a file does not change the tier assignment for other files
Simple to implement: the manifest already stores size_padded; the vault configuration adds a tier table
No new dependencies: purely a configuration change and padding arithmetic
Cloud cost per API call: blobs in the 256 KB tier produce ~16× more cloud objects for large files than the 4 MiB tier — but large files use the 4 MiB tier anyway

Variant: power-of-two tiers

Use 64 KB, 128 KB, 256 KB, 512 KB, 1 MiB, 2 MiB, 4 MiB as tiers. Maximum overhead is always < 100% (worst case: 1 byte above a tier boundary). Privacy leakage: O(log log M) bits — same asymptotic as Padmé, but with up to 100% overhead vs Padmé's 12% maximum. Power-of-two is simpler to reason about but less efficient than Padmé.
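A minimal sketch of the power-of-two variant, assuming the tier bounds above are binary units (64 KiB to 4 MiB) and that files above the top tier are chunked at 4 MiB. The function name is illustrative.

```rust
const MIN_TIER: u64 = 64 * 1024;
const MAX_TIER: u64 = 4 * 1024 * 1024;

/// Blob size for a file under power-of-two tiers: the next power of two at or
/// above the file size, clamped to the supported tier range.
fn pow2_tier(file_size: u64) -> u64 {
    file_size.next_power_of_two().clamp(MIN_TIER, MAX_TIER)
}
```

A file one byte over a boundary (for example 524,289 bytes) lands in the 1 MiB tier, which is the near-100% worst case mentioned above.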

Approach 4 — Smaller Uniform Chunk Size

The simplest possible change: reduce the chunk size from 4 MiB to a smaller value. Average padding waste per file is about chunk_size / 2, assuming file sizes are spread evenly relative to the chunk size.

Overhead comparison

Chunk size	Avg waste/file	iPhone HEIC overhead	1 GiB file chunks
4 MiB (current)	~2 MiB	68%	256
1 MiB	~512 KB	26%	1,024
512 KB	~256 KB	5%	2,048
256 KB	~128 KB	5%	4,096

Privacy

No new kind of leakage: all blobs are still a uniform fixed size within the vault, and the adversary can still infer file size only to within one chunk_size-wide range from the blob count. The mechanism is the same bounded leakage as in the current design, but the bound is tighter: smaller chunk sizes narrow that range and therefore leak more precise size information (see the chunk-size section above).

Trade-offs

Smaller chunks produce more cloud objects per large file, which has real costs:

Upload: More cloud API calls. Most providers charge per-operation (AWS S3: $0.005 per 1,000 PUTs; Backblaze B2: $0.004 per 1,000 uploads; Cloudflare R2: $0.0036 per million). For a 1 GiB video at 1 MiB chunks: 1,024 uploads vs 256 at 4 MiB — 4× more operations. In absolute cost this remains small, but it compounds across large libraries.

Restore (download): Each blob requires a separate HTTP GET. Downloading a 1 GiB video means 1,024 individual requests at 1 MiB chunks vs 256 at 4 MiB. Even with parallelism, the round-trip overhead and connection setup cost accumulate. Restore latency for large files increases meaningfully at small chunk sizes.

Manifest size: More rows in the chunks table and more entries per file_extents record. For a vault of large video files, this can grow significantly.

Crypto overhead: More AEAD decrypt operations per file retrieval. Each blob requires its own nonce read and Poly1305 tag verification. Negligible per operation, but scales with blob count.

The impact is asymmetric: small files (one blob at any chunk size) see no difference in API cost from smaller chunks. Large files pay proportionally more. A 1 GiB video that costs 256 API calls at 4 MiB costs 4× more at 1 MiB — and this matters most on restore, where the user is waiting.
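The request arithmetic can be checked with a short sketch. The helper names are hypothetical, and the per-1,000 price is a caller-supplied parameter because provider pricing changes over time.

```rust
/// Number of blob uploads (or downloads) needed for one file.
fn request_count(file_size: u64, chunk_size: u64) -> u64 {
    (file_size + chunk_size - 1) / chunk_size
}

/// Cost in USD for `requests` operations at a given price per 1,000 requests.
fn request_cost_usd(requests: u64, usd_per_1000: f64) -> f64 {
    requests as f64 * usd_per_1000 / 1000.0
}
```

At the S3 PUT price quoted above ($0.005 per 1,000), uploading a 1 GiB file costs about $0.0051 with 1 MiB chunks versus about $0.0013 with 4 MiB chunks: a 4× ratio, but a small absolute amount per file.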

Chunk size	HEIC overhead	1 GiB video blobs	1 GiB download requests
4 MiB (current)	68%	256	256
1 MiB	26%	1,024	1,024
512 KB	5%	2,048	2,048

Chunk size selection rationale

The current 4 MiB was chosen to balance padding waste against blob count for the anticipated workload. Reducing to 1 MiB meaningfully improves photo overhead (from 68% to roughly a quarter) while keeping large file blob counts at a reasonable 1,024 per GiB. Below 1 MiB, the restore penalty for large files grows substantially in exchange for smaller additional storage savings. The privacy cost of going smaller is modest by comparison: the bounded leakage range narrows, but the adversary still cannot observe per-file blob count without timing correlation.

Approach 5 — Content-Defined Chunking (CDC)

CDC splits files at content-dependent boundaries using a rolling hash (Rabin fingerprinting, Gear hashing, FastCDC). Chunks are variable-size but cluster around a target average. Used by restic, Borg, Kopia, Bupstash, Duplicacy, and Tarsnap — essentially all encrypted backup tools that support deduplication.

Why backup tools use CDC

CDC enables cross-file deduplication: if the same data block appears in two different files (or two versions of the same file), the same chunk boundary will be found and the chunk stored once. This is the primary motivation — deduplication ratios of 60–80% are common for backup workloads.
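To show why chunk boundaries depend on content, here is a deliberately simplified gear-hash chunker. It is not FastCDC and not the algorithm of any tool named above. The table generation (a splitmix64-style mixer), the parameters, and the function names are all assumptions made for illustration.

```rust
/// Deterministic pseudo-random table for the gear hash (splitmix64-style mixer).
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0;
    for entry in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *entry = z ^ (z >> 31);
    }
    table
}

/// Splits `data` at content-defined boundaries and returns the chunk lengths.
/// A boundary is declared when the rolling hash has its masked bits all zero
/// (after at least `min_len` bytes), or when a chunk reaches `max_len`.
fn cdc_chunk_lengths(data: &[u8], mask: u64, min_len: usize, max_len: usize) -> Vec<usize> {
    let table = gear_table();
    let mut lengths = Vec::new();
    let mut hash: u64 = 0;
    let mut start = 0;
    for (i, &byte) in data.iter().enumerate() {
        hash = (hash << 1).wrapping_add(table[byte as usize]);
        let len = i + 1 - start;
        if (len >= min_len && (hash & mask) == 0) || len >= max_len {
            lengths.push(len);
            start = i + 1;
            hash = 0;
        }
    }
    if start < data.len() {
        lengths.push(data.len() - start);
    }
    lengths
}
```

The returned vector of lengths is a deterministic function of the plaintext. Identical data always yields identical boundaries, which is what makes deduplication work, and the same property means the sequence of blob sizes reflects the content.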

The privacy problem

CDC variable chunk sizes leak information. A 2025 paper ("Breaking and Fixing Content-Defined Chunking", Kien Truong) demonstrated:

An adversary observing encrypted chunk sizes can fingerprint specific files without decrypting them
The vector of chunk lengths for a file can uniquely identify it among a known set of candidate files
This enables a membership inference attack: "is this specific file present in this backup?"

This attack was demonstrated concretely against Tarsnap, Borg, and Restic. All three are vulnerable to an adversary who can observe encrypted blob sizes.

For Arx Runa's threat model — where the cloud provider is explicitly adversarial — CDC is unacceptable. Arx Runa's fixed-size chunk design was chosen precisely to prevent this class of attack. CDC would revert that protection.

Deduplication without CDC

Deduplication with fixed-size chunks is largely ineffective across file versions: inserting or deleting a single byte shifts every later chunk boundary, so chunks after the first edit no longer match. Arx Runa does not target deduplication as a design goal, which makes CDC's primary benefit irrelevant in addition to its privacy cost.

Approach 6 — Epoch-Based Deferred Batching

Instead of encrypting and uploading each file immediately, buffer all writes within a time window (an "epoch") and flush the accumulated files as a batch of packed chunks at the end of the epoch.

How it works

Epoch window (e.g., 30 minutes or user-triggered):
  file_A added → held in local staging buffer
  file_B added → held in local staging buffer
  file_C added → held in local staging buffer
  ...
  Epoch flush:
    Pack files into 4 MiB chunks, encrypt, upload batch

Difference from bin-packing

Standard bin-packing packs files into chunks and uploads immediately, then must re-pack on mutation. Epoch batching is append-only within an epoch: files written during an epoch are packed together and sealed. Subsequent mutations either:

Create a new version of the file in the next epoch (append-only, old version soft-deleted)
Or trigger an immediate flush of the current epoch

No in-place mutation of sealed epochs occurs. This eliminates write amplification.
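The append-only layout can be sketched as a staging buffer that assigns each file a blob index and byte range at flush time. The `EpochBuffer` and `Extent` types are hypothetical, and the sketch only stages files smaller than one chunk (an assumption; larger files would need to be split first).

```rust
/// Where one staged file ends up after a flush.
#[derive(Debug, PartialEq, Eq)]
struct Extent {
    file: usize,      // index in staging order
    blob: usize,      // index of the sealed blob within this epoch
    byte_offset: u64, // start of the file's data inside the blob
    byte_length: u64, // number of bytes of file data
}

/// Local staging buffer for one epoch; only file sizes are tracked here.
struct EpochBuffer {
    chunk_size: u64,
    staged: Vec<u64>,
}

impl EpochBuffer {
    fn new(chunk_size: u64) -> Self {
        Self { chunk_size, staged: Vec::new() }
    }

    /// Stages a file; returns false if it is empty or does not fit this sketch.
    fn add(&mut self, size: u64) -> bool {
        if size == 0 || size >= self.chunk_size {
            return false;
        }
        self.staged.push(size);
        true
    }

    /// Seals the epoch: lays files out back to back, opening a new blob when
    /// the next file would not fit. Returns (blob count, extents).
    fn flush(&mut self) -> (usize, Vec<Extent>) {
        let mut extents = Vec::new();
        let mut blob = 0usize;
        let mut used = 0u64;
        for (file, size) in std::mem::take(&mut self.staged).into_iter().enumerate() {
            if used + size > self.chunk_size {
                blob += 1;
                used = 0;
            }
            extents.push(Extent { file, blob, byte_offset: used, byte_length: size });
            used += size;
        }
        let blobs = if extents.is_empty() { 0 } else { blob + 1 };
        (blobs, extents)
    }
}
```

Once `flush` returns, the extents are final: a later change to a file produces a new extent in a later epoch rather than rewriting a sealed blob.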

Privacy

Blobs remain fixed-size (4 MiB + 40 bytes). Epoch batching does not increase size leakage — all blobs are still identical.

Epoch batching removes the per-file timing side-channel for batched files. Without it, uploading a 500 KB file produces one blob immediately, and the adversary can correlate that single-blob burst to a single small-file addition. With epoch batching, that file's data is mixed into a chunk with other files and the entire epoch flushes as one burst. The adversary cannot determine how many files were added, what their individual sizes are, or which blobs correspond to which files; only the timing and total blob count of the flush remain visible.

Trade-offs

Files are not immediately available in cloud until the epoch flushes. For Arx Runa's use case (sync, not real-time streaming), this is generally acceptable.
Partially-filled staging chunks at epoch flush incur last-chunk padding — but this applies once per epoch, not once per file.
Soft-deleted files accumulate until a compaction pass.
The local staging buffer must be encrypted at rest and cleared on vault lock.

Suitability

Best fit for bulk imports (e.g., importing a full photo library) where files are added in large batches. Less useful for individual file additions where the epoch window closes with a single file, yielding no packing benefit. Approach 7 refines epoch batching by exempting large files from the delay; the single-file case is covered under "Epoch Buffer Flush Triggers".

Approach 7 — Hybrid Auto-Routing (Small-File Epoch Buffering)

A refinement of epoch batching that routes files automatically based on size: small files go to the epoch buffer, large files upload immediately. This eliminates the main weakness of pure epoch batching (large file delay) while preserving its full benefit for small files.

How it works

The natural threshold is the chunk size itself. A file smaller than one chunk cannot fill any complete chunk; stored alone, it always incurs padding waste. Such files benefit maximally from packing and have no urgent upload requirement. Files of one chunk or more upload all their chunks immediately, including any trailing partial, which is zero-padded to a full fixed-size blob as in the current design.

file size < chunk_size
  → queue entire file in local epoch buffer
  → packed with other small files at epoch flush
  → uploaded as full fixed-size blobs

file size ≥ chunk_size
  → ALL chunks encrypted and uploaded immediately
  → trailing partial padded to chunk_size and uploaded as a standalone blob
  → no epoch involvement — file is fully backed up immediately

Trailing partials of large files are not queued in the epoch buffer. Doing so would create a backup-completeness problem: if no small files follow, the epoch may never fill, leaving the large file partially backed up with its last chunk stuck in the local buffer indefinitely. Uploading the trailing partial immediately as a standalone blob avoids this entirely — large file backup is always complete as soon as the upload finishes.
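The routing rule above reduces to a single comparison. A minimal sketch, with a hypothetical `Route` enum:

```rust
/// Where a newly added file goes under hybrid auto-routing.
#[derive(Debug, PartialEq, Eq)]
enum Route {
    /// Smaller than one chunk: staged and packed at the next epoch flush.
    EpochBuffer,
    /// One chunk or more: every chunk, including the padded trailing
    /// partial, is uploaded immediately as a standalone blob.
    ImmediateUpload { blobs: u64 },
}

fn route(file_size: u64, chunk_size: u64) -> Route {
    if file_size < chunk_size {
        Route::EpochBuffer
    } else {
        Route::ImmediateUpload {
            blobs: (file_size + chunk_size - 1) / chunk_size,
        }
    }
}
```

A 500 KB file is routed to the epoch buffer, while a 10 MiB file produces three immediate uploads, matching the visualisation that follows.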

Visualisation

Small file (500 KB):
  └─→ epoch buffer → [file_A: 500 KB | file_B: 800 KB | file_C: 1.2 MiB | pad: 1.5 MiB]
                                                                         → encrypt → blob UUID-X

Large file (10 MiB):
  ├─→ Chunk 1 [████████████████] 4 MiB real data               → encrypt → upload as blob UUID-1
  ├─→ Chunk 2 [████████████████] 4 MiB real data               → encrypt → upload as blob UUID-2
  └─→ Chunk 3 [██░░░░░░░░░░░░░░] 2 MiB real + 2 MiB padding   → encrypt → upload as blob UUID-3
  (all three blobs uploaded immediately — same as current design)

Privacy

All blobs are fixed-size (4 MiB + 40 bytes) — the invariant is preserved.

Small files gain a stronger privacy property than the current design. The adversary watching uploads cannot determine how many small files were added in a given epoch, what any individual small file's size is, or which blobs correspond to which files.

Large files behave identically to the current design — bounded timing-conditional leakage from blob count. No regression, no new leakage.

Benefits over pure epoch batching

	Pure epoch batching	Hybrid auto-routing
Small files: padding waste	Near zero	Near zero
Small files: timing leakage	Eliminated	Eliminated
Large files: upload delay	Full delay until epoch flush	None — all chunks upload immediately
Large files: restore latency	Full delay	None — all chunks in cloud immediately
Large files: backup completeness risk	Yes — last chunk stuck in buffer	None
Epoch buffer size	All files	Small files only
Manifest complexity	Single-mode	Dual-mode (small files only)
Implementation complexity	Medium	Medium + size threshold check

Restore mechanics

The manifest must support two kinds of chunk location: standalone (current design) and packed extent (epoch or bin-packed blob). On restore, the client resolves each chunk differently depending on type.

Because large files upload all chunks immediately (including trailing partials), they never appear in epoch blobs. The manifest schema change and dual-mode lookup only apply to small files.

Manifest schema change:

-- existing columns
blob_id      TEXT     -- UUID of the cloud blob (NULL if packed in epoch blob)
chunk_index  INTEGER

-- added for small-file packed extents only
epoch_blob_id  TEXT     -- NULL for all large-file chunks; non-NULL for packed small files
byte_offset    INTEGER  -- byte start within the epoch blob
byte_length    INTEGER  -- byte count of this file's data

Large file chunks always have epoch_blob_id = NULL — they are standalone blobs. The dual-mode logic only triggers for small files.

Restore flow for a large file — unchanged from current design:

Restore large_file.mp4 (10 MiB, chunk_size = 4 MiB):

  Manifest:
    chunk 0 → standalone blob UUID-1
    chunk 1 → standalone blob UUID-2
    chunk 2 → standalone blob UUID-3  (trailing partial, padded)

  1. Fetch UUID-1 → decrypt → chunk 0
  2. Fetch UUID-2 → decrypt → chunk 1
  3. Fetch UUID-3 → decrypt → chunk 2 (truncate 2 MiB padding)
  4. Concatenate → truncate to 10_485_760 bytes → done

No epoch involvement. No byte offset arithmetic. Identical to current restore logic.

Restore flow for a small file packed in an epoch blob:

Restore small_doc.pdf (800 KB):

  Manifest:
    chunk 0 → epoch blob UUID-E, offset=0, len=819_200

  1. Fetch UUID-E (4 MiB + 40 B) → decrypt → 4 MiB plaintext
     → slice [0 : 819_200] → 800 KB → chunk 0
  2. Truncate to 819_200 bytes → file restored

If several files packed in the same epoch blob are restored in one session, UUID-E only needs to be downloaded and decrypted once. The client can cache decrypted epoch blobs in memory across the restore of multiple files.
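The two-path lookup and the session cache can be sketched as follows. `ChunkLocation` and `Restorer` are hypothetical types that mirror the manifest columns above, and `fetch_and_decrypt` stands in for the real download plus AEAD decryption step.

```rust
use std::collections::HashMap;

/// Where a chunk's plaintext lives (hypothetical mirror of the manifest columns).
enum ChunkLocation {
    /// The chunk is its own blob.
    Standalone { blob_id: String },
    /// The chunk is a byte range inside a packed epoch blob.
    Packed { epoch_blob_id: String, byte_offset: usize, byte_length: usize },
}

/// Resolves chunk locations, caching decrypted epoch blobs for the session.
struct Restorer<F: FnMut(&str) -> Vec<u8>> {
    fetch_and_decrypt: F,
    epoch_cache: HashMap<String, Vec<u8>>,
}

impl<F: FnMut(&str) -> Vec<u8>> Restorer<F> {
    fn new(fetch_and_decrypt: F) -> Self {
        Self { fetch_and_decrypt, epoch_cache: HashMap::new() }
    }

    /// Returns the chunk plaintext, or None if the extent is out of bounds.
    fn read_chunk(&mut self, location: &ChunkLocation) -> Option<Vec<u8>> {
        match location {
            ChunkLocation::Standalone { blob_id } => {
                Some((self.fetch_and_decrypt)(blob_id.as_str()))
            }
            ChunkLocation::Packed { epoch_blob_id, byte_offset, byte_length } => {
                if !self.epoch_cache.contains_key(epoch_blob_id) {
                    let plaintext = (self.fetch_and_decrypt)(epoch_blob_id.as_str());
                    self.epoch_cache.insert(epoch_blob_id.clone(), plaintext);
                }
                let plaintext = self.epoch_cache.get(epoch_blob_id)?;
                let end = byte_offset.checked_add(*byte_length)?;
                plaintext.get(*byte_offset..end).map(|bytes| bytes.to_vec())
            }
        }
    }
}
```

Because the cache holds decrypted data, a real implementation would presumably drop it at the end of the restore session or on vault lock.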

AAD for epoch blobs:

Arx Runa binds each standalone chunk to file_id || chunk_index in the AEAD AAD. An epoch blob contains data from multiple files, so there is no single file-specific binding. Epoch blobs use their own epoch_blob_id as the AAD. Individual file data integrity within the blob is guaranteed by the manifest's byte offsets — the manifest is protected by SQLCipher and authenticated at the database level.
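As an illustration of the two AAD forms, the sketch below builds the byte strings that would be passed to the AEAD. The 16-byte identifier width and the big-endian encoding of chunk_index are assumptions; the text only specifies the concatenation.

```rust
/// AAD for a standalone chunk: file_id || chunk_index.
/// Assumptions: 16-byte identifiers, chunk_index encoded as big-endian u64.
fn standalone_aad(file_id: &[u8; 16], chunk_index: u64) -> Vec<u8> {
    let mut aad = Vec::with_capacity(24);
    aad.extend_from_slice(file_id);
    aad.extend_from_slice(&chunk_index.to_be_bytes());
    aad
}

/// AAD for an epoch blob: its own epoch_blob_id.
fn epoch_blob_aad(epoch_blob_id: &[u8; 16]) -> Vec<u8> {
    epoch_blob_id.to_vec()
}
```

Under these assumptions the two forms differ in length (24 versus 16 bytes), so a standalone chunk's AAD can never equal an epoch blob's AAD.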

Comparison with bin-packing

Hybrid auto-routing and bin-packing solve the same problem with the same manifest schema. The differences are in when chunks are uploaded and what happens on mutation.

Where they converge:

Both pack multiple files into one fixed-size blob. Both require the byte_offset / byte_length schema extension. Both restore via the same two-path lookup. The implementation complexity of the manifest layer is identical.

Where hybrid auto-routing wins:

	Bin-packing (immediate)	Hybrid auto-routing (epoch)
Write amplification on update/delete	Yes — must re-encrypt entire chunk	No — soft-delete, new version in next epoch
Timing leakage for small files	Yes — pack uploads immediately	No — epoch flush hides individual additions
Large file upload delay	None	None (full chunks immediate)
Small file upload delay	None	Until epoch flush
Small file cloud availability	Immediate	Delayed

The decisive advantage of hybrid auto-routing is no write amplification. Bin-packing's write amplification is not a minor implementation detail; it compounds: deleting one file from a 4 MiB pack containing 8 small files forces a full decrypt-repack-re-encrypt cycle for the whole pack. In a vault with frequent deletions or updates, this becomes expensive and difficult to reason about.

Can bin-packing be modified to match or beat hybrid auto-routing?

Yes — by adopting the same techniques:

Soft-delete + compaction: Instead of repacking on delete, mark the extent as deleted in the manifest and repack lazily during a periodic compaction pass. This eliminates write amplification. At this point bin-packing's mutation behaviour is identical to hybrid auto-routing's.
Deferred flush: Instead of uploading each completed pack immediately, accumulate packs into an epoch and flush as a batch. This eliminates timing leakage. At this point bin-packing's upload behaviour is identical to hybrid auto-routing's epoch flush.

Once both modifications are applied, the two approaches are functionally identical — hybrid auto-routing is simply bin-packing that adopts soft-delete and epoch flush from the start rather than as retrofits.

Bin-packing's one remaining advantage:

A bin-packing implementation that uploads completed packs immediately (without epoch delay) gives small files immediate cloud availability. This is the only meaningful trade-off: timing leakage in exchange for no upload delay. For a backup tool where small files (documents, configs) are rarely time-sensitive, this trade-off is not worth making. For a use case where the user expects files to appear in the cloud within seconds of being added, it matters.

Conclusion: Hybrid auto-routing with epoch-based flush is the better design for Arx Runa's use case. It matches bin-packing on storage efficiency, beats it on write amplification, and beats it on timing privacy, at the cost of delayed cloud availability for small files. Bin-packing can be retrofitted to match, but doing so requires adopting the same two mechanisms that define hybrid auto-routing, at which point the distinction is architectural framing, not substance.

Trade-offs

Small files are not immediately available in cloud until the epoch flushes. Acceptable for backup/sync use cases.
The epoch buffer must be encrypted at rest and cleared on vault lock.
Soft-deleted small files accumulate in epoch chunks until compaction.
Restore requires dual-mode chunk lookup for small files (packed extent vs standalone); large file restore is unchanged.
Epoch flush trigger must handle the case where the buffer has data but no more small files arrive — vault lock should always force a flush to guarantee all data is in the cloud.

Epoch Buffer Flush Triggers

The hybrid auto-routing approach queues small files in a local epoch buffer until the buffer fills or another condition triggers upload. This raises a critical question: what if the user adds a single small file and then does nothing else?

The Single-File Problem

User: *uploads one 2 MB photo*
Arx Runa: "File added successfully!" 
         *puts file in staging buffer*
         *waits for more files to fill the 4 MiB chunk*
User: *locks vault and goes to bed*

If the only flush trigger is "buffer full", the file sits locally and is never backed up to the cloud. If the device fails before the next batch upload, the file is lost. This breaks the user's mental model of cloud backup.

The flush trigger policy must balance three constraints:

Backup completeness: Files in the buffer must eventually reach the cloud
Timing privacy: Frequent flush events create observable patterns
User expectation: Users expect "added" files to be "backed up" within a reasonable time

Option 1 — Time-Based Flush

Buffer flush triggers:
1. Buffer ≥ chunk_size (4 MiB) → flush immediately
2. T seconds elapsed since first file added → flush partial buffer
3. Vault lock → flush everything
4. User clicks "Sync Now" → flush immediately

Variants:

T = 60 seconds: near-immediate backup, weak timing privacy (uploads every minute during active use)
T = 300 seconds (5 minutes): reasonable backup window, moderate timing privacy
T = 900 seconds (15 minutes): strong timing privacy, users may perceive backup as "slow"

Privacy analysis:

If T = 5 minutes, the adversary observing cloud uploads sees:

10:00 - user adds 1 photo            → starts 5-min timer
10:05 - timer expires, buffer flushes → cloud sees 1 blob upload
10:07 - user adds 3 small documents  → starts new 5-min timer
10:10 - user adds 2 more             → same timer still running
10:12 - timer expires, buffer flushes → cloud sees 1 blob (5 documents packed, total under 4 MiB)

The adversary sees two upload events (at 10:05 and 10:12) but cannot determine:

Whether 10:05 was 1 file or multiple files added before the timer expired
How many files went into the 10:12 batch
The exact times individual files were added within each epoch

Compared to per-file upload: much better (no 1:1 file-to-blob mapping)
Compared to pure epoch batching: weaker (timer creates periodic observable events)

Trade-off verdict: Time-based flush is a reasonable middle ground. It prevents indefinite local-only storage while still providing meaningful timing obfuscation for multi-file batches.

Option 2 — Lock-Only Flush (Pure Haystack Model)

Buffer flush triggers:
1. Buffer ≥ chunk_size → flush
2. Vault lock → flush everything

Privacy: Maximum timing obfuscation. No periodic events. The adversary only sees uploads when the vault locks — which may be once per day, or once per week.

Risk: If the user never locks the vault (always-on desktop scenario), files accumulate locally for days. If the device crashes before the next lock, all staged files are lost.

UX problem: Users adding files to an unlocked vault see "File added" but the cloud backup counter does not increment. The file is not backed up yet, but the UI suggests it is.

Mitigations:

UI indicator: "N files staged, will sync when vault locks"
Auto-lock after 1 hour idle → forced flush
Persistent staging: buffer survives restarts → files eventually flush on next lock

Trade-off verdict: Pure lock-only flush is too risky for general use. The "always-on vault" scenario is realistic (desktop vaults used for active work), and crash-before-lock data loss is unacceptable. This model is correct for write-once archival vaults (Haystack's design) but not for mutable active-use vaults.

Option 3 — Adaptive Multi-Condition Flush

pub struct EpochFlushPolicy {
    /// Flush after this duration since first file added
    /// Default: 300 seconds (5 minutes)
    pub time_threshold_seconds: u64,

    /// Flush when the buffer reaches this size
    /// Default: 50 MB (~20 typical 2.5 MB photos)
    pub size_threshold_bytes: u64,
}

Buffer flush triggers:
1. Buffer size ≥ size_threshold_bytes → flush
2. time_threshold_seconds elapsed since first file added → flush
3. Vault lock → flush
4. User clicks "Sync Now" → flush

Behavior examples:

Scenario	What happens
User adds 1 small file, nothing else	After 5 min: uploads as 1 padded blob
User adds 20 photos in 30 seconds	After 30 sec: buffer hits 50 MB → flushes 12 blobs immediately
User adds 5 photos over 10 minutes	After 5 min from first: flushes partial batch; 5 min later: flushes remaining
User adds files, then locks vault	Immediate flush regardless of time/size

Privacy: Same as Option 1 (time-based) but with an additional size threshold to avoid holding large batches unnecessarily. The time threshold dominates the privacy trade-off.
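The four triggers can be evaluated by one decision function. In this sketch the thresholds are passed in as plain parameters rather than through the policy struct. `BufferState`, `FlushReason`, and the precedence order (lock, then manual sync, then size, then time) are assumptions made for illustration.

```rust
/// Why a flush should happen now.
#[derive(Debug, PartialEq, Eq)]
enum FlushReason {
    VaultLock,
    SyncNow,
    SizeThreshold,
    TimeThreshold,
}

/// Snapshot of the staging buffer at the moment of the check.
#[derive(Clone, Copy)]
struct BufferState {
    staged_bytes: u64,
    seconds_since_first_file: Option<u64>, // None if nothing is staged
    vault_locking: bool,
    sync_requested: bool,
}

/// Returns the first trigger that applies, or None if the buffer should wait.
fn flush_reason(
    time_threshold_seconds: u64,
    size_threshold_bytes: u64,
    state: &BufferState,
) -> Option<FlushReason> {
    if state.staged_bytes == 0 {
        return None; // nothing to upload
    }
    if state.vault_locking {
        return Some(FlushReason::VaultLock);
    }
    if state.sync_requested {
        return Some(FlushReason::SyncNow);
    }
    if state.staged_bytes >= size_threshold_bytes {
        return Some(FlushReason::SizeThreshold);
    }
    match state.seconds_since_first_file {
        Some(elapsed) if elapsed >= time_threshold_seconds => Some(FlushReason::TimeThreshold),
        _ => None,
    }
}
```

With thresholds of 300 seconds and 50 MB, a single 2.5 MB photo staged two minutes ago does not flush yet. It flushes once 300 seconds have passed, or immediately if the vault is being locked.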

Crash safety: Vault lock always flushes → no sensitive data left in staging. On crash before flush, files in staging are re-queued on restart.

UI indicators:

┌─────────────────────────────────────┐
│ Vault: my-photos                    │
│ Status: Unlocked                    │
│                                     │
│ ⏳ 3 files staged for sync          │
│    (auto-sync in 2m 15s)            │
│                                     │
│ [Sync Now]         [Lock Vault]     │
└─────────────────────────────────────┘

Users see:

How many files are pending
How long until auto-flush
Option to force immediate sync
Locking vault = guaranteed flush

Trade-off verdict: Adaptive multi-condition flush is the recommended approach. It balances backup completeness (5-min max wait), timing privacy (batching during active use), and crash safety (lock always flushes). The time threshold is tunable for different threat models.

Option 4 — Vault-Mode-Specific Policies

pub enum VaultMode {
    /// Active mutable vault: 5-minute time threshold
    GeneralPurpose,

    /// Archival write-once vault: lock-only flush
    Archive,
}

General-purpose vaults (default) use Option 3 (multi-condition flush). Archival vaults use Option 2 (lock-only). The user selects the mode at vault creation.
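
One way to express this mapping is a method on the mode that returns the automatic thresholds, if any. This is a sketch under assumptions: `auto_flush_policy` is a hypothetical name, and representing lock-only flushing as `None` is one possible encoding rather than a settled design.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VaultMode {
    GeneralPurpose,
    Archive,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EpochFlushPolicy {
    pub time_threshold_seconds: u64,
    pub size_threshold_bytes: u64,
}

impl VaultMode {
    /// Automatic (time and size) thresholds for this mode.
    /// `None` means lock-only flushing: no automatic trigger exists.
    pub fn auto_flush_policy(self) -> Option<EpochFlushPolicy> {
        match self {
            // Option 3 defaults: 5 minutes, 50 MB.
            VaultMode::GeneralPurpose => Some(EpochFlushPolicy {
                time_threshold_seconds: 300,
                size_threshold_bytes: 50_000_000,
            }),
            // Option 2: flush only when the vault locks.
            VaultMode::Archive => None,
        }
    }
}
```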

Rationale: Archival vaults (photo library import, document backup) align with the Haystack model — write-once, no updates, flush on lock. The user understands "I'm loading 10,000 photos, they'll upload when I click Done." General-purpose vaults (active work) need predictable backup without manual intervention.

Trade-off verdict: This is a future refinement. For the bachelor project, a single policy (Option 3) is sufficient. Document the vault-mode approach as a future enhancement.

Recommendation

Implement Option 3 — Adaptive Multi-Condition Flush with these defaults:

impl Default for EpochFlushPolicy {
    fn default() -> Self {
        Self {
            time_threshold_seconds: 300,      // 5 minutes
            size_threshold_bytes: 50_000_000, // 50 MB
        }
    }
}

Rationale:

5-minute time threshold is short enough to meet backup expectations without creating per-file timing leakage. Users adding a single document know it will reach the cloud within 5 minutes. Users batch-importing photos still get timing obfuscation if they add multiple files within the same 5-minute window.
50 MB size threshold (~12 typical HEIC photos, ~10 JPEG photos) triggers flush for large batch imports without waiting 5 minutes. This improves perceived responsiveness during bulk operations.
Vault lock always flushes ensures no sensitive plaintext is left in the staging directory after the session ends. This is a security requirement, not a performance optimization.
Manual "Sync Now" gives users control for time-sensitive uploads (adding a file right before catching a flight, etc.).

Document the trade-off explicitly:

The 5-minute auto-flush creates a weak timing side-channel: an adversary monitoring cloud uploads can observe that activity occurred within a given 5-minute window. This is strictly better than per-file upload timing (which reveals file-level granularity) but weaker than lock-only flushing (which reveals only session boundaries). The time threshold is a tunable parameter — users requiring maximum timing privacy can set it higher (or use an archival vault mode where it is disabled entirely). The default balances backup reliability against metadata leakage.

UI Requirements (Phase 6):

Status indicator showing staged file count and time until auto-flush
"Sync Now" button to trigger immediate flush
Visual confirmation when flush completes ("3 files backed up")
Settings screen allowing users to adjust time_threshold_seconds (advanced users only)

Integration with Cloud Sync Design

The flush policy affects the cloud synchronization design (Phase 4). When flush triggers:

Epoch buffer contents are packed into one or more fixed-size chunks
Each chunk is encrypted and moved to the staging directory as a standalone .blob file
The standard cloud push flow (from Phase 4 design) uploads staged blobs to vault/
After successful upload, staging .blob files are deleted
Manifest chunks table is updated with epoch_blob_id and byte offsets

The push flow does not need to know whether blobs are standalone (large file chunks) or packed (epoch blobs). All blobs are 4 MiB + 40 bytes, all have UUID names, and all upload identically. The flush policy is entirely internal to the storage layer.
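
The blob-size arithmetic can be made explicit. The sketch below assumes the default 4 MiB chunk size and the 24-byte nonce and 16-byte tag from the per-chunk wire format; the constant and function names are illustrative, not taken from the codebase.

```rust
/// Plaintext bytes per chunk (default vault chunk size).
pub const CHUNK_SIZE: u64 = 4 * 1024 * 1024;
/// XChaCha20 nonce stored in front of each ciphertext.
pub const NONCE_LEN: u64 = 24;
/// Poly1305 tag appended to each ciphertext.
pub const TAG_LEN: u64 = 16;
/// On-disk and in-cloud size of every blob: 4 MiB + 40 bytes.
pub const BLOB_SIZE: u64 = CHUNK_SIZE + NONCE_LEN + TAG_LEN;

/// Number of fixed-size blobs needed to hold `buffered_bytes` of epoch data.
/// The last blob is padded up to `CHUNK_SIZE` before encryption.
pub fn blobs_for_flush(buffered_bytes: u64) -> u64 {
    (buffered_bytes + CHUNK_SIZE - 1) / CHUNK_SIZE
}

/// Total bytes the push flow uploads for one flush.
pub fn upload_bytes_for_flush(buffered_bytes: u64) -> u64 {
    blobs_for_flush(buffered_bytes) * BLOB_SIZE
}
```

With these definitions a buffer that has just reached the 50 MB size threshold packs into 12 blobs.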

Upload Jitter — Why It Does Not Work

A natural response to timing leakage is to add random delays between blob uploads — e.g., sleeping 1–5 seconds between each upload to blur burst boundaries. This is simple to implement and intuitively appealing. It does not solve the problem.

The adversary controls the clock

Arx Runa's threat model designates the cloud provider as adversarial. The cloud provider records millisecond-precision timestamps on every blob creation in their own storage logs — they have server-side visibility that the client cannot influence. Adding a 3-second delay between uploads does not remove those timestamps from the cloud's log. The adversary simply observes "3 blobs appeared at T+2s, T+5s, T+8s, then silence for 10 minutes" and still groups them as one file's upload.

Jitter is effective against a network-level observer (an ISP or passive eavesdropper watching connection traffic) who has coarser timing and cannot see inside the cloud's logs. For that threat, a few seconds of randomisation meaningfully blurs burst boundaries. But network-level observation is not Arx Runa's primary threat — the cloud provider is.

Statistical de-correlation

Even against a weaker adversary, jitter alone is consistently broken by traffic analysis. Timing obfuscation has been studied extensively in the context of Tor traffic, website fingerprinting, and VoIP analysis. The conclusion is consistent: random delays reduce correlation confidence but do not eliminate it, especially when the adversary can observe many uploads over time and build a statistical model of the upload pattern.

The right tool for the right adversary

Technique	Effective against cloud provider	Effective against network observer	Cost
Jitter (random delay)	No	Partially	Adds latency to every upload
Epoch batching	Yes	Yes	Delays small files until epoch flush
Constant-rate upload (dummy traffic)	Yes	Yes	Continuous cloud storage and API cost

Constant-rate uploading — sending dummy blobs at a fixed rate regardless of real activity — is the only timing defence that fully defeats the cloud provider. It is impractical for a consumer backup tool because it means paying for continuous uploads and cloud storage even when no files are being added.

Epoch batching addresses the timing problem at the right level: it eliminates per-file blob grouping by interleaving data before upload, rather than trying to obscure when individual blobs arrive.

Chunk Size as a Security Parameter

All previous approaches treat chunk size as a fixed implementation detail and focus on reducing the padding waste it causes. This section examines chunk size itself as a tunable security parameter — because it directly controls how much information the blob count leaks about file sizes.

What the chunk count leaks

The cloud cannot directly observe which blobs belong to which file — blobs are randomly named and the manifest is encrypted locally. The adversary can only infer N for a specific file by correlating upload timing: a burst of N blobs likely corresponds to one file's upload. If uploads are interleaved (epoch batching), this inference fails.

Assuming timing correlation succeeds, the adversary who observes N blobs for a single file can infer:

(N − 1) × chunk_size  <  file_size  ≤  N × chunk_size

A larger chunk size means a wider range — more uncertainty for the adversary. This is independent of cryptographic strength: XChaCha20-Poly1305 is equally strong at any chunk size. The only security dimension that chunk size affects is file size inference from blob count.
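
The inequality above can be checked numerically. This is a small sketch with hypothetical helper names; it assumes a non-empty file and that sizes are measured in bytes.

```rust
/// Number of fixed-size blobs a file of `file_size` bytes produces.
pub fn blob_count(file_size: u64, chunk_size: u64) -> u64 {
    assert!(chunk_size > 0, "chunk size must be non-zero");
    (file_size + chunk_size - 1) / chunk_size
}

/// What an adversary who has attributed `n` blobs to one file can infer:
/// returns `(lower, upper)` such that lower < file_size <= upper.
pub fn inferred_size_range(n: u64, chunk_size: u64) -> (u64, u64) {
    (n.saturating_sub(1) * chunk_size, n * chunk_size)
}
```

For a 2,500,000-byte photo with 256 KiB chunks, `blob_count` gives 10 and the inferred range is 2,359,296 to 2,621,440 bytes. With 4 MiB chunks the same photo yields one blob and a range of 0 to 4,194,304 bytes.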

Concrete comparison across chunk sizes

For a 2.5 MB iPhone HEIC photo:

Chunk size	Blobs produced	What the adversary learns	Storage overhead
256 KB	10	File is 2.25–2.5 MB — very precise	7%
512 KB	5	File is 2.0–2.5 MB	14%
1 MiB	3	File is 2.0–3.0 MB	20%
4 MiB (current)	1	File is 0–4 MB	68%
8 MiB	1	File is 0–8 MB	84%

The counterintuitive result: reducing chunk size to save storage gives the adversary more precise file size information. Smaller chunks = better storage efficiency, weaker privacy. Larger chunks = worse storage efficiency, stronger privacy. These are directly opposed.

The security ceiling

Increasing chunk size beyond a certain point stops improving privacy. Once the chunk size comfortably exceeds the typical file size in a vault, almost every file produces exactly one blob — the adversary only learns "file is smaller than chunk_size." Further increasing the chunk size adds storage overhead without narrowing the adversary's uncertainty any further.

For a vault of iPhone photos (avg 2.5 MB), going from 4 MiB to 8 MiB chunks only improves privacy for the minority of photos between 4–8 MB. The majority already produce one 4 MiB blob, so the adversary's inference is unchanged. Going from 4 MiB to 16 MiB provides essentially no additional benefit while raising the padding on a typical 2.5 MB photo from roughly 1.6 MiB to roughly 13.6 MiB.
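
The ceiling effect can be illustrated on a small sample. The sketch below uses hypothetical helper names and a made-up set of four photo sizes; it counts how many files already fit in one blob and how much padding the vault carries at each chunk size.

```rust
pub const MIB: u64 = 1024 * 1024;

fn blob_count(file_size: u64, chunk_size: u64) -> u64 {
    assert!(chunk_size > 0, "chunk size must be non-zero");
    (file_size + chunk_size - 1) / chunk_size
}

/// How many files in `sizes` produce exactly one blob at this chunk size.
pub fn single_blob_files(sizes: &[u64], chunk_size: u64) -> usize {
    sizes
        .iter()
        .filter(|&&size| blob_count(size, chunk_size) == 1)
        .count()
}

/// Total padding bytes stored for `sizes` at this chunk size
/// (ignoring the fixed 40-byte nonce and tag per blob).
pub fn total_padding(sizes: &[u64], chunk_size: u64) -> u64 {
    sizes
        .iter()
        .map(|&size| blob_count(size, chunk_size) * chunk_size - size)
        .sum()
}
```

For sample sizes of 1.8 MB, 2.5 MB, 3.1 MB, and 6.0 MB, three of four files are single-blob at 4 MiB and all four at 8 MiB. Moving to 16 MiB changes no blob counts but raises total padding from about 20.2 MB to about 53.7 MB.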

For video files (hundreds of MB to several GB), chunk size barely affects privacy in either direction — the blob count is in the hundreds regardless, and the file size is already inferable to within one chunk.

Chunk size is not cryptographic block size

It is important not to confuse storage chunk size with the block size of a cipher. XChaCha20-Poly1305 is built on a stream cipher, so it imposes no block-alignment constraint on the plaintext length. The 4 MiB chunk size is a storage and metadata design decision, not a cryptographic one. Any chunk size below the cipher's per-message length limit is equally valid from a cryptographic standpoint; the 24-byte nonce and 16-byte tag are simply added to each chunk.

Vault-Specific Chunk Size

Since chunk size is a privacy vs. storage efficiency dial, it is a natural candidate for a vault-level configuration option set at vault creation time.

Why vault-level, not file-level

Chunk size is a property of the vault's storage format, not of individual files. Making it per-file would:

Require the manifest to store chunk_size per file (schema complexity)
Produce a mix of differently-sized blobs in the same vault (reducing the anonymity set — the adversary could distinguish file types by blob size)
Complicate encryption and retrieval logic

Per-vault chunk size keeps all blobs in a vault uniform, preserving the equal-size blob guarantee within that vault.

Vault types and ideal chunk sizes

Vault use case	Ideal chunk size	Reasoning
Sensitive documents, legal/medical	8 MiB	Maximum size ambiguity; storage overhead acceptable for small document collections
Photo archive	4 MiB	Photos are 2–5 MB; most produce 1 chunk; good privacy with moderate overhead
Video archive	4 MiB	Videos produce hundreds of chunks regardless; chunk size has little effect
Developer secrets / config files	512 KB	Files are tiny; user explicitly accepts size visibility for lower cloud cost
General purpose (default)	4 MiB	Reasonable balance for mixed workloads

Changing chunk size after vault creation

Chunk size cannot be changed without re-encrypting every blob in the vault. The existing blobs are padded and encrypted to the old chunk size — there is no way to resize them in-place. A chunk size change would require:

Downloading and decrypting all existing blobs
Re-chunking and re-encrypting with the new chunk size
Uploading all new blobs
Deleting all old blobs
Rebuilding the manifest

This is equivalent to recreating the vault from scratch. The chunk size should therefore be treated as an immutable vault property chosen at creation time, stored in manifest_meta, and validated on every open.
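
A validation step on open might look like the following. This is a sketch under assumptions: the 128 KiB to 64 MiB range is taken from the Decisions table, the comparison between the vault header value and the manifest_meta value is one plausible reading of "validated on every open", and all names are hypothetical.

```rust
pub const MIN_CHUNK_SIZE: u64 = 128 * 1024;
pub const MAX_CHUNK_SIZE: u64 = 64 * 1024 * 1024;

#[derive(Debug, PartialEq, Eq)]
pub enum ChunkSizeError {
    /// The recorded value lies outside the permitted range.
    OutOfRange(u64),
    /// The vault header and manifest_meta disagree.
    Mismatch { header: u64, manifest: u64 },
}

/// Checks the immutable chunk size when a vault is opened.
/// `header` is the value from the vault header, `manifest` the value
/// stored in manifest_meta; on success the agreed value is returned.
pub fn validate_chunk_size(header: u64, manifest: u64) -> Result<u64, ChunkSizeError> {
    if !(MIN_CHUNK_SIZE..=MAX_CHUNK_SIZE).contains(&manifest) {
        return Err(ChunkSizeError::OutOfRange(manifest));
    }
    if header != manifest {
        return Err(ChunkSizeError::Mismatch { header, manifest });
    }
    Ok(manifest)
}
```

Failing closed on a mismatch means a tampered or corrupted value cannot silently change how blobs are split and reassembled.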

Precedent in existing systems

CryFS allows users to set the block size at vault creation (default 32 KiB). It is a documented configuration option, though framed as a performance parameter rather than a privacy one.
Azure Storage client-side encryption v2.1 made the chunk size configurable from 16 bytes to 1 GiB, having previously used a fixed 4 MiB.
Borg backup exposes chunker parameters (target chunk size, min/max size) as command-line options at repository creation.

None of these systems frame chunk size as a privacy dial — they expose it as a technical parameter. Arx Runa could be the first to present it explicitly as a privacy vs. storage trade-off, with human-readable preset names rather than raw byte counts.

Proposed vault creation UX

Vault storage mode:

  ○ Standard   (4 MiB chunks)
      Recommended for most use cases. Balanced privacy and storage cost.

  ○ Paranoid   (8 MiB chunks)
      Maximum file size ambiguity. Higher cloud storage usage.
      Best for: sensitive documents, legal or medical records.

  ○ Efficient  (512 KB chunks)
      Lower storage overhead for small files. Cloud can infer approximate
      file sizes within a 512 KB range.
      Best for: developer vaults, config files, small documents.

The vault header records the chosen chunk size. All subsequent operations use it without exposing the raw value to the user again.
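
The presets map naturally onto an enum that hides the raw byte count. The sketch below is illustrative: `StoragePreset` and its methods are hypothetical names, and "512 KB" is interpreted as 512 KiB.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StoragePreset {
    Standard,
    Paranoid,
    Efficient,
}

impl StoragePreset {
    /// Chunk size in bytes recorded in the vault header for this preset.
    pub const fn chunk_size_bytes(self) -> u64 {
        match self {
            StoragePreset::Standard => 4 * 1024 * 1024,
            StoragePreset::Paranoid => 8 * 1024 * 1024,
            // Assumption: "512 KB" in the UI text means 512 KiB.
            StoragePreset::Efficient => 512 * 1024,
        }
    }

    /// Recovers the preset name from a stored chunk size, if it matches one.
    pub fn from_chunk_size(bytes: u64) -> Option<Self> {
        [Self::Standard, Self::Paranoid, Self::Efficient]
            .into_iter()
            .find(|preset| preset.chunk_size_bytes() == bytes)
    }
}
```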

Cross-vault considerations

If a user shares a file between vaults with different chunk sizes, the share package (Phase 5) must include the chunk size used for the shared file so the recipient can reconstruct it correctly. This is already implied by the share package design, which records chunk_size alongside chunk_uuids.

Comparative Summary

Approach	Storage savings (small files)	Timing leakage eliminated	Size leakage to cloud	Mutation cost	Complexity	Rust dependency
Current (4 MiB fixed)	None	No	±4 MiB range (timing-conditional)	None	—	—
Larger chunk size (e.g. 8 MiB)	Negative	No	±8 MiB range (coarser, better)	None	Minimal	None
Smaller chunk size (e.g. 1 MiB)	Medium	No	±1 MiB range (tighter, worse)	None	Minimal	None
Vault-specific chunk size	Depends	No	Depends on chosen size	None	Low	None
Bin-packing	High (near zero)	No	±chunk_size (unchanged)	High (write amplification)	High	None
Padmé padding	High (max 12%)	No	O(log log M) bits per file	None	Low	`rust-padme-padding`
Tiered chunk sizes	Medium	No	Size bucket per tier	None	Low	None
CDC	High (+ dedup)	No	High (fingerprinting)	None	Medium	Multiple crates
Epoch batching	High (amortised)	Yes (for batched files)	None	Low (soft delete)	Medium	None
Hybrid auto-routing (Approach 7)	High (small files)	Yes (small files)	None (small files); ±chunk_size (large files, timing-conditional)	Low (soft delete)	Medium	None
Upload jitter	None	No (against cloud provider)	No change	None	Minimal	None

Recommendation

There is no single best answer — the right approach depends on the workload and how much privacy trade-off is acceptable. Three paths stand out:

Path A — Smaller uniform chunk size (safest, lowest effort)

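Reduce the chunk size from 4 MiB to 1 MiB. This requires changing one configuration constant and no architecture changes. Blob uniformity is preserved (all blobs are still identical in size), but when timing correlation succeeds the blob-count inference range narrows from 4 MiB to 1 MiB, as discussed under Chunk Size as a Security Parameter. iPhone HEIC photo overhead drops from 68% to 20%. Video overhead increases from 0.6% to ~2% (acceptable).

This is the lowest-effort first step: it addresses most of the practical overhead with minimal implementation risk, at the cost of that narrower size-inference range.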

Path B — Padmé padding on the last chunk (best storage efficiency with bounded leakage)

Apply Padmé padding to the last (partial) chunk of each file. All complete chunks remain at the configured chunk size. The Rust crate jedisct1/rust-padme-padding is available. Maximum overhead drops to 12%. The cloud learns size buckets, not exact sizes.

This is the theoretically optimal approach for storage efficiency vs. leakage. It is a good candidate as an opt-in vault setting, clearly documented as trading the current bounded ±chunk_size leakage for near-zero padding overhead with Padmé-bounded leakage.

Path C — Hybrid auto-routing with epoch batching (Approach 7)

Implement the size-threshold routing from Approach 7: files smaller than chunk_size are automatically queued for epoch batching; larger files upload full chunks immediately with only trailing partials queued. This gives small files zero padding waste and eliminates timing correlation for them, while large files remain immediately available in cloud with no upload delay.

This is the most complete solution for photo-heavy vaults without any privacy trade-off. It requires a manifest schema extension (epoch_blob_id, byte_offset, byte_length on chunks) and a dual-mode restore path, but these are bounded in scope and do not affect the crypto layer.
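
The routing rule reduces to integer division. This is a minimal sketch with hypothetical names (`RoutePlan`, `route_file`); it only decides how many bytes go where and leaves encryption and manifest updates out.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct RoutePlan {
    /// Full chunks that are encrypted and uploaded immediately.
    pub immediate_chunks: u64,
    /// Trailing bytes (or the whole small file) queued for epoch batching.
    pub epoch_bytes: u64,
}

/// Splits a file between the immediate-upload path and the epoch buffer.
/// Files smaller than `chunk_size` produce zero immediate chunks, so they
/// are queued in full; larger files queue only their trailing partial chunk.
pub fn route_file(file_size: u64, chunk_size: u64) -> RoutePlan {
    assert!(chunk_size > 0, "chunk size must be non-zero");
    RoutePlan {
        immediate_chunks: file_size / chunk_size,
        epoch_bytes: file_size % chunk_size,
    }
}
```

A 2.5 MB photo is queued in full, while a 10 MB file uploads two full 4 MiB chunks at once and queues the remaining 1,611,392 bytes.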

Bin-packing (Approach 1) can be retrofitted to match by adopting soft-delete and epoch-based flush — at which point it is functionally identical to Approach 7. Starting with Approach 7 is preferable because it makes the right design choices from the beginning rather than as corrections to a mutable bin-packing design.

Approach 7 addresses the most common high-overhead scenario (photo library import) without touching the general-purpose vault pipeline.

Path C is the chosen implementation approach. Paths A and B remain available as incremental steps or opt-in settings, but the target architecture is hybrid auto-routing.

What to avoid

CDC is incompatible with Arx Runa's threat model and should not be adopted regardless of the storage benefits. The fingerprinting attacks are concrete and published.

Full bin-packing on mutable vaults introduces write amplification that is disproportionate to the storage benefit for general-purpose use. Restrict it to explicit archival vault mode if implemented at all.

Decisions

Choices made during this research session. Updated as the session progresses.

Decision	Alternatives considered	Rationale
CDC (content-defined chunking) rejected	FastCDC, Rabin fingerprinting, Gear hashing	Published fingerprinting attacks (Truong 2025) enable file membership inference from encrypted chunk sizes; incompatible with Arx Runa's adversarial cloud threat model
Upload jitter rejected as a timing defence	Random delays between uploads, constant-rate dummy traffic	Cloud provider records server-side timestamps the client cannot influence; jitter does not remove these; epoch batching is the correct approach
Chunk size is user-configurable at vault creation (128 KiB–64 MiB, immutable after creation)	Fixed 4 MiB for all vaults, preset tiers only	Chunk size is a privacy vs. efficiency dial: larger = wider blob-count inference range = stronger privacy; smaller = lower overhead. Users choose the right point for their workload. Exposing it as a privacy parameter rather than a performance one is a differentiator over CryFS, Borg, and Azure. Immutable after creation because changing it requires re-encrypting every blob. Default remains 4 MiB.
Epoch buffer is user-configurable opt-in at vault creation (off by default)	Always-on epoch buffer, no epoch buffer option	In everyday single-file use the epoch buffer adds upload delay for no packing or privacy benefit — one file still produces one blob. Mandatory buffering harms usability without improving privacy for the common case. Users who want maximum timing privacy for bulk imports can enable it explicitly at vault creation.
Auto-Sync UI (Drop Zone) chosen as primary file ingestion interface	Explicit upload button only, system file picker only	Tauri WebView supports native drag-and-drop; a Drop Zone is the most natural interface for adding files to a vault and complements both immediate upload and epoch buffer modes without requiring menu navigation. Upload button retained as accessibility fallback.
Hybrid auto-routing (Approach 7) retained as the epoch buffer implementation	Standard bin-packing, pure epoch batching	When epoch buffer is enabled: files smaller than chunk_size are staged locally and packed; files larger than chunk_size upload full chunks immediately. This gives zero blob-size and blob-count leakage for small files while large files remain immediately available in cloud.
Epoch buffer flush trigger: adaptive multi-condition policy (Option 3)	Lock-only flush, time-only flush, single-file immediate upload	5-minute time threshold balances backup reliability against timing privacy; 50 MB size threshold handles bulk imports; vault lock always flushes ensuring no plaintext left in staging.

Open Questions

Chunk size for bachelor project: should the chunk size be changed from 4 MiB before implementation begins, or locked in and revisited post-launch? Changing it later is a breaking format change.
Padmé as opt-in: if implemented, Padmé changes the blob size contract. Vault metadata must record whether Padmé is enabled. Blobs from Padmé vaults and fixed-size vaults cannot be mixed in the same cloud path.
Combination approaches: Padmé + epoch batching could be combined. Padmé handles the last chunk of each epoch; epoch batching handles full chunks. This would achieve near-optimal storage efficiency with only bounded leakage and no write amplification.
User communication: how should storage overhead be communicated? A "vault storage efficiency" indicator (showing actual content size vs. cloud usage) would help users understand the trade-off they are accepting.
Vault-mode-specific flush policies: should archival vaults (write-once photo library import) use lock-only flushing for maximum timing privacy, while general-purpose vaults use the 5-minute auto-flush? This would require a vault mode selection at creation time.

Sources

Source	Topic	URL
ImperialViolet — Encrypting Streams (Adam Langley, 2014)	Canonical reference on per-chunk AEAD streaming encryption — position binding via AAD, chunk reordering/truncation attacks	imperialviolet.org/2014/06/27/streamingencryption.html
Libsodium — SecretStream	Practical streaming file encryption API — independent nonce per chunk, authentication tag per chunk, last-chunk marking	libsodium.gitbook.io/doc/secret-key_cryptography/secretstream
Google Tink — Streaming AEAD	Streaming AEAD standard — per-segment encryption enabling partial decrypt and verification	developers.google.com/tink/streaming-aead
RFC 5116 — An Interface and Algorithms for Authenticated Encryption	Normative AEAD interface and associated-data binding model	https://www.rfc-editor.org/rfc/rfc5116
Google Cloud — What is Blob Storage	Definition of Binary Large Object (blob) in object storage	cloud.google.com/discover/what-is-binary-large-object-storage
Microsoft Azure — Introduction to Blob Storage	Blob storage architecture — objects stored and retrieved as whole units, opaque to storage system	learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction
brokencloudstorage.info	End-to-End Encrypted Cloud Storage in the Wild: A Broken Ecosystem — real attacks on E2EE providers (Sync, pCloud, Icedrive, Seafile, Tresorit) exploiting metadata leakage	brokencloudstorage.info
ACM ToS — Encrypted Deduplication Leakage	Information Leakage in Encrypted Deduplication via Frequency Analysis — file volume patterns exploitable as a side-channel	dl.acm.org/doi/fullHtml/10.1145/3365840
arxiv — Side-Channel Leakage in Encrypted Traffic	The Inevitability of Side-Channel Leakage — size and access pattern leakage persists even with content encryption	arxiv.org/html/2602.14055v1
PETs 2019 — Nikitin et al. (EPFL)	Reducing Metadata Leakage from Encrypted Files and Communication with PURBs — Padmé definition, O(log log M) leakage bound, real-world evaluation on 848k files. DOI: 10.2478/popets-2019-0056	bford.info/pub/sec/purb.pdf
GitHub — jedisct1/rust-padme-padding	Rust implementation of Padmé — directly usable in Arx Runa	github.com/jedisct1/rust-padme-padding
GitHub — jedisct1/go-padme-padding	Go implementation of Padmé — reference for algorithm details	github.com/jedisct1/go-padme-padding
Padmé — age issue #83	Discussion of applying Padmé to the `age` encryption tool — implementation considerations	github.com/FiloSottile/age/issues/83
ktruong.dev	Breaking and Fixing Content-Defined Chunking — fingerprinting attacks on CDC backup systems	blog.ktruong.dev/breaking-cdc
restic.net	Introducing Content Defined Chunking — how CDC enables deduplication in restic	restic.net/blog/2015-09-12/restic-foundation1-cdc
FastCDC — USENIX ATC 2016	FastCDC: A Fast and Efficient Content-Defined Chunking Approach — algorithm details	usenix.org/system/files/conference/atc16/atc16-paper-xia.pdf
PETs 2019 — PURBs (petsymposium mirror)	Open-access mirror of the canonical PURBs paper	petsymposium.org/popets/2019/popets-2019-0056.pdf
Springer — Obfuscation Padding Schemes	Minimising Rényi Min-Entropy leakage via padding — theoretical foundations	link.springer.com/chapter/10.1007/978-981-99-7032-2_5
CryFS official site	Configurable block/chunk behavior precedent and product documentation	https://www.cryfs.org/
Cryptomator — Vault Cryptography	Fixed 32 KiB chunk size design — comparison point for chunk size choices	docs.cryptomator.org/security/vault
Azure Storage — Client-Side Encryption v2	Configurable chunk size (16 bytes to 1 GiB) in v2.1 — precedent for flexible chunk sizing	learn.microsoft.com — client-side encryption
Borg documentation — repository initialization	Borg repository initialization and chunking-related configuration surface as prior art	https://borgbackup.readthedocs.io/en/stable/usage/init.html

This is a living document. Add implementation findings and empirical overhead measurements as they emerge.

Glossary

Terms used consistently across Arx Runa documentation, use cases, and source code.

Vault

The entire encrypted storage namespace for a single user. A vault is not a folder - it is the top-level container that groups all of a user's encrypted files under one set of authentication credentials.

In cloud storage (Google Drive, Backblaze B2, S3, etc.) a vault appears as a configured root directory containing:

<cloud-root>/
  vault-header.json        <- plaintext JSON (public parameters)
  manifest/
    manifest-backup.blob   <- encrypted manifest backup
  vault/
    <uuid>.blob            <- encrypted file chunks (flat, no structure)
  shared/                  <- reserved for file sharing (Phase 5)

The cloud provider sees blob count, uniform blob sizes, and access timing - never filenames, folder structure, or file contents.

Vault Header

A plaintext JSON file stored at the cloud root (vault-header.json). It contains only public parameters needed to bootstrap key derivation on a new device:

vault_id - UUID v4 identifying the vault
tier - authentication tier selected at vault creation: 1 (password only) or 2 (password + USB key file)
argon2_salt - 32-byte Argon2id salt (CSPRNG-generated at vault creation)
argon2_params - Argon2id cost parameters (memory, iterations, parallelism)
key_file_blake3 - BLAKE3 fingerprint of the USB key file (Tier 2 only; null for Tier 1; preimage-resistant, does not reveal key material)
recovery_slot - optional wrapped_master_key for recovery-phrase access

The vault header is intentionally unencrypted: it must be downloadable before any keys exist, so a new device can derive the correct keys without prior authentication.

local-vault-params.json

A trusted local cache of vault-header KDF parameters stored in app data. It contains vault_id, argon2_salt, and argon2_params.

On existing devices, downloaded vault-header.json values must match this cache exactly. This blocks parameter downgrade and salt-swap attacks. The file is written at vault creation and updated after successful password or key-file rotation.
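
The exact-match rule can be sketched as a field-by-field comparison. The struct layout, the field types, and the `HeaderCheck` enum are assumptions made for illustration; only the three compared fields come from the description above.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Argon2Params {
    pub memory_kib: u32,
    pub iterations: u32,
    pub parallelism: u32,
}

/// KDF parameters as held in local-vault-params.json or read from
/// a downloaded vault-header.json.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct KdfParams {
    pub vault_id: String,
    pub argon2_salt: [u8; 32],
    pub argon2_params: Argon2Params,
}

#[derive(Debug, PartialEq, Eq)]
pub enum HeaderCheck {
    Match,
    VaultIdMismatch,
    SaltMismatch,
    ParamsMismatch,
}

/// Compares the downloaded header against the trusted local cache.
/// Anything other than `Match` should abort key derivation.
pub fn check_header(cache: &KdfParams, downloaded: &KdfParams) -> HeaderCheck {
    if cache.vault_id != downloaded.vault_id {
        HeaderCheck::VaultIdMismatch
    } else if cache.argon2_salt != downloaded.argon2_salt {
        HeaderCheck::SaltMismatch
    } else if cache.argon2_params != downloaded.argon2_params {
        HeaderCheck::ParamsMismatch
    } else {
        HeaderCheck::Match
    }
}
```

Reporting which field differed helps distinguish a salt swap from a parameter downgrade in logs, while the caller treats every mismatch as fatal.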

Manifest

The encrypted index of all files in a vault. The manifest is stored in a SQLCipher database locally (one database per device) and backed up to the cloud as manifest/manifest-backup.blob.

The manifest records, for each file:

Filenames and directory structure
Per-file file_key_wrapped stored in the nodes table
Chunk map: ordered list of UUID blob names and their sizes
snapshot_counter for sync conflict detection

The local SQLCipher database is encrypted with sqlcipher_key (derived from master_key via HKDF). The cloud backup (manifest-backup.blob) is separately encrypted with manifest_key (also derived from master_key via HKDF). The cloud never sees manifest contents in plaintext.

SQLCipher

An open-source extension to SQLite that provides transparent 256-bit AES encryption of the entire database file. Arx Runa uses SQLCipher as its local manifest database, keyed with sqlcipher_key. The database stores file paths, directory structure, chunk records, BLAKE3 checksums, and wrapped file keys. Without the correct sqlcipher_key, the database file is indistinguishable from random bytes.

Blob

A single encrypted file chunk stored in the cloud. Blob names are UUID v4 strings with a .blob extension (e.g., 3f7a1b2c-dead-beef-cafe-112233445566.blob). Every chunk is padded to exactly 4 MiB before encryption, so every blob has the same size (4 MiB plus the 40 bytes of nonce and tag) and blobs are indistinguishable by size.

Chunk

A 4 MiB fixed-size block produced when a file is split for encryption. The final chunk of a file is zero-padded to 4 MiB. Each chunk is encrypted independently with XChaCha20-Poly1305 and becomes one blob in the cloud.

The chunk size is fixed (not content-defined) to prevent file size inference from blob sizes.

XChaCha20-Poly1305

The authenticated encryption scheme used to encrypt every file chunk in Arx Runa. XChaCha20 is a stream cipher with a 192-bit nonce; Poly1305 is the one-time message authentication code that produces the 16-byte authentication tag. Each encryption call generates a fresh 192-bit nonce from the CSPRNG, making nonce collisions negligible across any realistic file count.

The wire format per chunk is [24-byte nonce | ciphertext | 16-byte Poly1305 tag]. The tag detects any single-bit tampering before decryption returns data.

XChaCha20-Poly1305 is also used to wrap master_key in the recovery_slot.
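
The per-chunk wire format described above can be parsed without any cryptographic library. The following sketch only splits a blob into its three fields; `BlobParts` and `split_blob` are hypothetical names, and no decryption or tag verification is performed here.

```rust
pub const NONCE_LEN: usize = 24;
pub const TAG_LEN: usize = 16;

/// Borrowed views into one blob: [24-byte nonce | ciphertext | 16-byte tag].
pub struct BlobParts<'a> {
    pub nonce: &'a [u8],
    pub ciphertext: &'a [u8],
    pub tag: &'a [u8],
}

/// Splits a raw blob into nonce, ciphertext, and tag.
/// Returns `None` if the blob is too short to contain both nonce and tag.
pub fn split_blob(blob: &[u8]) -> Option<BlobParts<'_>> {
    if blob.len() < NONCE_LEN + TAG_LEN {
        return None;
    }
    let (nonce, rest) = blob.split_at(NONCE_LEN);
    let (ciphertext, tag) = rest.split_at(rest.len() - TAG_LEN);
    Some(BlobParts { nonce, ciphertext, tag })
}
```

Many AEAD APIs expect the tag to stay appended to the ciphertext, in which case only the nonce needs to be split off; the three-way split is shown for clarity.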

BLAKE3

A cryptographic hash function used in Arx Runa for two purposes:

Blob integrity — a BLAKE3 checksum is computed over each encrypted blob after encryption and stored in the manifest. On download the checksum is verified before decryption begins, catching bit rot or storage corruption immediately.
USB key file fingerprint — the vault header stores the BLAKE3 hash of the 32-byte key file. The hash identifies which file to use; it is preimage-resistant and reveals nothing about the key file bytes.

master_key

The root key material for a vault session, derived by Argon2id from the user's password (and USB key file bytes for Tier 2). All other vault keys are derived from master_key via HKDF-SHA256.

Tier 1: master_key = Argon2id(password, salt)
Tier 2: master_key = Argon2id(password || key_file_bytes, salt)

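master_key exists only in RAM. It is zeroized immediately after the three vault-level keys have been derived from it (see HKDF), so it does not outlive the unlock step, let alone the vault lock.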

Argon2id

A memory-hard key derivation function and winner of the Password Hashing Competition. Arx Runa uses Argon2id to derive master_key from the user's password (and USB key file bytes for Tier 2).

Its memory requirement — 64 MiB per invocation at default parameters — ensures that offline brute-force attacks remain expensive regardless of the attacker's hardware: where conventional hash functions can be parallelised onto thousands of GPU cores, Argon2id's memory bandwidth demand caps the number of guesses that can run simultaneously. Parameters (memory, iterations, parallelism) are stored in the vault header so they can be tuned in future versions without invalidating existing vaults.

HKDF

HMAC-based Extract-and-Expand Key Derivation Function (RFC 5869). Arx Runa uses HKDF-SHA256 to expand master_key into three purpose-specific vault-level keys, each derived with a distinct info label:

b"arx-runa-key-encryption" → key_encryption_key
b"arx-runa-sqlcipher" → sqlcipher_key
b"arx-runa-manifest-backup" → manifest_key

Separate labels ensure the three derived keys are cryptographically independent: compromise of one reveals nothing about the others. master_key is zeroized immediately after all three derivations complete.

CSPRNG

Cryptographically Secure Pseudo-Random Number Generator. The source of all random material in Arx Runa: encryption nonces, file_key values, Argon2id salts, and USB key file bytes. Arx Runa delegates CSPRNG calls to the operating system (getrandom on Linux, getentropy on macOS, BCryptGenRandom on Windows).

file_key

A 32-byte random key generated per file at encryption time. It is the XChaCha20-Poly1305 encryption key for that file's chunks. At rest, it is stored only as file_key_wrapped in the nodes table.

key_encryption_key

A vault-level key derived from master_key via HKDF with info b"arx-runa-key-encryption". It wraps and unwraps at-rest file keys (nodes.file_key_wrapped and received_shares.file_key_wrapped). It is not used for chunk encryption.

manifest_key

A vault-level key derived from master_key via HKDF with info b"arx-runa-manifest-backup". It encrypts and decrypts the singleton cloud manifest backup manifest/manifest-backup.blob.

sqlcipher_key

A vault-level key derived from master_key via HKDF with info b"arx-runa-sqlcipher". It encrypts the local SQLCipher manifest database.

mlock

A POSIX system call (and its Windows equivalent VirtualLock) that pins memory pages to RAM and instructs the OS not to swap them to disk. Arx Runa calls mlock on all session key buffers (key_encryption_key, sqlcipher_key, manifest_key) immediately after derivation. If mlock fails, Arx Runa refuses to open the session rather than silently operating with keys that could be written to a swap file or hibernation image.

Staging

A temporary local directory used during sync. Encrypted blobs are written to staging immediately after encryption, before cloud upload. Once a blob is confirmed uploaded it is deleted from staging. On startup, any orphaned blobs left by a previously interrupted session are cleaned up. The cloud never receives data from staging directly; it receives only finished encrypted blobs.

USB Key File

A 32-byte file of cryptographically random entropy stored on a physical USB drive. Used as the hardware second factor in Tier 2 authentication. The key file is identified by its BLAKE3 fingerprint stored in the vault header; the filename is irrelevant.

Losing the USB key file without a backup means permanent loss of access to Tier 2 vaults. See Use Case 3.

Tier 1 / Tier 2

Authentication tiers selected when creating a vault:

Tier	Factors	Key derivation
Tier 1	Password only	`Argon2id(password, salt)`
Tier 2	Password + USB key file	`Argon2id(password || key_file_bytes, salt)`

Both tiers are zero-knowledge: the cloud provider never holds key material. Tier 2 additionally requires physical possession of the USB key file on every access.

BYOC (Bring Your Own Cloud)

Arx Runa's cloud-agnostic storage model. Users configure any Rclone-supported backend (Google Drive, Backblaze B2, Amazon S3, Azure Blob, self-hosted MinIO, etc.) as their vault's cloud storage. Arx Runa does not operate its own storage infrastructure.

Rclone

An open-source command-line tool that manages file synchronization and transfer to cloud storage backends. Arx Runa uses Rclone as its cloud transport layer to achieve Bring Your Own Cloud (BYOC) compatibility. Rclone supports 70+ storage providers including Google Drive, S3, Backblaze B2, Dropbox, Azure Blob, and self-hosted solutions like MinIO.

Arx Runa invokes Rclone programmatically to upload and download encrypted blobs, treating it as a storage-agnostic abstraction. Users configure their chosen backend via Rclone's standard configuration file (rclone.conf), and Arx Runa never handles cloud provider credentials directly.

See Rclone official documentation for configuration guides and supported backends.

Zero-Trace

The principle that Arx Runa leaves no plaintext artifacts on the host machine during a session. Decrypted file content is held in RAM only, never written to disk as temp files, thumbnails, or OS caches. When the vault is locked, no recoverable plaintext remains on the device.

snapshot_counter

An integer stored in the manifest that increments on every push. Used to detect whether the local manifest is out of date relative to the cloud backup. If the local snapshot_counter is less than the cloud's, a pull is required before pushing changes.
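The staleness check can be sketched in a few lines. The function names are hypothetical, and how Arx Runa handles a local counter that is ahead of the cloud's is not specified here, so the sketch only covers the rule stated above.

```python
def pull_required(local_counter: int, cloud_counter: int) -> bool:
    """True when the local manifest is behind the cloud backup."""
    return local_counter < cloud_counter


def counter_after_push(local_counter: int, cloud_counter: int) -> int:
    """Return the snapshot_counter to publish, refusing to push a stale manifest."""
    if pull_required(local_counter, cloud_counter):
        raise RuntimeError("local manifest is stale; pull before pushing")
    return local_counter + 1  # the counter increments on every push
```

For example, a device at counter 3 facing a cloud backup at counter 5 must pull first; once both are at 5, the next push publishes counter 6.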

AAD (Additional Authenticated Data)

The binding value included in every XChaCha20-Poly1305 encryption call: file_id || chunk_index. AAD prevents chunk-swap attacks: a chunk from one file cannot be spliced into another file's chunk sequence, or moved to a different position within its own file, without causing an authentication failure.
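A minimal sketch of how such an AAD value might be assembled. The byte encoding is an assumption: this glossary does not say how file_id and chunk_index are serialised, so the sketch uses the 16 raw bytes of a UUID followed by an 8-byte big-endian index. Both fields are fixed-width, which keeps the concatenation unambiguous.

```python
import struct
import uuid


def chunk_aad(file_id: uuid.UUID, chunk_index: int) -> bytes:
    """Build file_id || chunk_index as 16 + 8 bytes (assumed encoding)."""
    if chunk_index < 0:
        raise ValueError("chunk_index must be non-negative")
    return file_id.bytes + struct.pack(">Q", chunk_index)


# Two chunks of one file, and the same position in another file, all get distinct AAD.
file_a = uuid.UUID("00000000-0000-4000-8000-000000000001")
file_b = uuid.UUID("00000000-0000-4000-8000-000000000002")
aad_a0 = chunk_aad(file_a, 0)
aad_a1 = chunk_aad(file_a, 1)
aad_b0 = chunk_aad(file_b, 0)
```

An AEAD decryption call that is handed aad_b0 for a chunk sealed under aad_a0 fails tag verification, which is what turns a swapped chunk into a detectable error.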

EXIF

Exchangeable Image File format metadata embedded in photos and other media files. A photo typically carries GPS coordinates, camera model, lens settings, capture timestamp, and device serial number alongside the image itself. Arx Runa strips EXIF (and the related XMP and IPTC formats) from media files in memory before encryption begins. The original file on disk is never modified; the stripped copy is what enters the encryption pipeline.

BIP-39

A standard wordlist encoding scheme for binary entropy, originally designed for cryptocurrency wallet seed phrases. Arx Runa uses it to encode the 256-bit recovery phrase as 24 human-readable words. The final word embeds an 8-bit checksum, so most transcription errors (all but roughly 1 in 256) are caught immediately, before Argon2id even runs.
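The encoding arithmetic can be sketched as follows: 256 bits of entropy plus the first 8 bits of its SHA-256 hash give 264 bits, which split into 24 indices of 11 bits each. The sketch stops at the indices because the 2048-word list is not reproduced here; the function names are hypothetical.

```python
import hashlib


def bip39_word_indices(entropy: bytes) -> list[int]:
    """Map 32 bytes of entropy to 24 wordlist indices (0..2047)."""
    if len(entropy) != 32:
        raise ValueError("expected 256 bits of entropy")
    checksum = hashlib.sha256(entropy).digest()[0]  # first 8 bits of the hash
    combined = (int.from_bytes(entropy, "big") << 8) | checksum  # 264 bits
    return [(combined >> (11 * (23 - i))) & 0x7FF for i in range(24)]


def checksum_ok(indices: list[int]) -> bool:
    """Recompute the checksum from 24 indices and compare it with the embedded one."""
    if len(indices) != 24 or any(not 0 <= i < 2048 for i in indices):
        return False
    combined = 0
    for index in indices:
        combined = (combined << 11) | index
    entropy = (combined >> 8).to_bytes(32, "big")
    return hashlib.sha256(entropy).digest()[0] == (combined & 0xFF)
```

Because checksum_ok needs only one SHA-256 call, a mistyped phrase can be rejected before any expensive key derivation starts.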

recovery_key

A 256-bit key derived by running the 24-word BIP-39 recovery phrase through Argon2id (using the same parameters and a dedicated recovery salt from the vault header). It exists only in RAM during a recovery ceremony and is used to unwrap wrapped_master_key from the vault header's recovery_slot. It is never stored anywhere.

recovery_slot

An optional field in the vault header that stores wrapped_master_key. When present, it allows the vault to be opened with the BIP-39 recovery phrase instead of the primary password. The slot is populated at vault creation (opt-in) and re-wrapped under updated credentials after every successful password rotation or recovery ceremony.

wrapped_master_key

The master_key encrypted with recovery_key under XChaCha20-Poly1305, stored in the vault header recovery_slot. 72 bytes total: [24-byte nonce | 32-byte ciphertext | 16-byte Poly1305 tag]. The AAD binding is vault_id_bytes, preventing the wrapped key from being transplanted to a different vault.
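The 72-byte layout can be illustrated with a small parser. The type and function names are hypothetical; only the field sizes and their order come from the description above.

```python
from typing import NamedTuple

NONCE_LEN = 24       # XChaCha20-Poly1305 nonce
CIPHERTEXT_LEN = 32  # encrypted master_key
TAG_LEN = 16         # Poly1305 authentication tag
WRAPPED_LEN = NONCE_LEN + CIPHERTEXT_LEN + TAG_LEN  # 72 bytes


class WrappedMasterKey(NamedTuple):
    nonce: bytes
    ciphertext: bytes
    tag: bytes


def parse_wrapped_master_key(blob: bytes) -> WrappedMasterKey:
    """Split a recovery_slot value into nonce, ciphertext and tag."""
    if len(blob) != WRAPPED_LEN:
        raise ValueError(f"expected {WRAPPED_LEN} bytes, got {len(blob)}")
    return WrappedMasterKey(
        nonce=blob[:NONCE_LEN],
        ciphertext=blob[NONCE_LEN:NONCE_LEN + CIPHERTEXT_LEN],
        tag=blob[NONCE_LEN + CIPHERTEXT_LEN:],
    )
```

Rejecting any other length up front means a truncated or padded slot is never passed on to the AEAD step.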

HPKE

Hybrid Public Key Encryption (RFC 9180). The asymmetric encryption scheme used in Arx Runa's file sharing. The ciphersuite is DHKEM(X25519, HKDF-SHA256) + HKDF-SHA256 + ChaCha20-Poly1305. HPKE seals the file_key and file metadata for a recipient's X25519 public key so that only the holder of the corresponding private key can open the envelope. Neither the cloud provider nor Arx Runa's infrastructure can read the contents.

X25519

The Diffie-Hellman key exchange function over Curve25519. In Arx Runa, each user generates an X25519 key pair as their sharing identity. The private key is stored in the encrypted vault; the public key is exported as a small file or QR code for out-of-band exchange with contacts. X25519 public keys are also the recipient key in HPKE share packages.

file_share_id

A UUID v4 that identifies the shared blob set at shared/<file_share_id>/. All recipients of the same shared file snapshot reference the same file_share_id. This is distinct from share_id, which is per recipient-file pair.

share_id

A UUID v4 that identifies one recipient-file share relationship. It appears in each share package and is the primary key in shares and received_shares. Multiple share_id values can reference one file_share_id.

sender_public_key

The owner's 32-byte X25519 public key in the share package, stored as received_shares.sender_public_key. Recipients use it to encrypt download receipts even when no contact row exists.

received_shares.file_key_wrapped

The recipient-side at-rest file key in SQLCipher. During import, raw file_key from the HPKE package is wrapped with local key_encryption_key, and only the wrapped value is persisted. Raw key bytes are zeroized after wrapping.

There is no separate share_key in Arx Runa. Share packages carry the existing per-file file_key inside the HPKE envelope. See Use Case 4.

Arx Runa Security Model

This document describes the security model of Arx Runa for a technically literate audience. It covers authentication tiers, brute force resistance, attack chains, endpoint threats, the cloud access model, and the identity system used for file sharing.

Zero-Trace Interpretation (Transient vs Persisted Plaintext)

Arx Runa's zero-trace policy separates transient runtime exposure from prohibited persistence:

Expected (transient in-memory use): decrypted plaintext may exist briefly in process memory while the user is actively decrypting, viewing, or processing data.
Prohibited (persisted/logged leakage): decrypted plaintext must not be written to durable or externally emitted outputs such as disk files, logs, telemetry, or developer-tooling output (for example debug traces or diagnostic command output).

Zero-trace means no persisted plaintext artifacts under application control. It does not claim plaintext is "never in memory."

Authentication Tiers

Arx Runa supports two authentication tiers. Both use Argon2id as the password-based key derivation function (KDF), but they differ in what material is fed into it.

Tier 1 — password only

master_key = Argon2id(password_utf8_bytes, argon2_salt, params)

The 32-byte argon2_salt is generated via CSPRNG at vault creation and stored in the vault header. New vaults use the Arx Runa Argon2id defaults (m=65536 KiB, t=3, p=4), which are also stored in the header so that other devices can bootstrap. On existing devices, the vault_id and Argon2 values downloaded from the cloud are treated as untrusted input and must exactly match the locally cached local-vault-params.json. Only the first bootstrap on a device (no cache yet) accepts downloaded parameters, and only if they meet the OWASP floors (m=19456 KiB, t=2, p=1); values below the Arx Runa defaults produce a warning.
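The pinning rule can be sketched as a single check. All names here (VaultParams, check_header, the field names) are hypothetical and the real header schema may differ; the numeric defaults and floors are the ones quoted above.

```python
from dataclasses import dataclass
from typing import Optional

ARX_DEFAULTS = (65536, 3, 4)  # memory KiB, iterations, parallelism
OWASP_FLOORS = (19456, 2, 1)


@dataclass(frozen=True)
class VaultParams:
    vault_id: str
    salt_hex: str
    memory_kib: int
    iterations: int
    parallelism: int


def check_header(downloaded: VaultParams, cached: Optional[VaultParams]) -> list[str]:
    """Return warnings for an acceptable header; raise ValueError to reject it."""
    if cached is not None:
        # Existing device: the cloud copy must match the pinned values exactly.
        if downloaded != cached:
            raise ValueError("vault header does not match pinned local parameters")
        return []
    # First bootstrap: no trust anchor yet, so only enforce the minimum cost.
    costs = (downloaded.memory_kib, downloaded.iterations, downloaded.parallelism)
    if any(value < floor for value, floor in zip(costs, OWASP_FLOORS)):
        raise ValueError("Argon2 parameters are below the OWASP floors")
    if any(value < default for value, default in zip(costs, ARX_DEFAULTS)):
        return ["Argon2 parameters are below the Arx Runa defaults"]
    return []
```

The exact-match rule on existing devices is what stops an attacker with cloud write access from silently downgrading the Argon2id cost or swapping the salt.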

Tier 2 — password + USB key file

master_key = Argon2id(password_utf8_bytes || key_file_bytes, argon2_salt, params)

The key file is 32 bytes (256 bits) of CSPRNG entropy with no internal structure. It is concatenated with the password bytes before being passed to Argon2id. Because the key file is always exactly 32 bytes, the split point in the combined input is unambiguous at total_length - 32.
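A small sketch of the concatenation and its split point. The function names are hypothetical, and the Argon2id call itself is omitted because it is not part of the Python standard library.

```python
KEY_FILE_LEN = 32  # the key file is always exactly 32 bytes


def tier2_argon2_input(password: str, key_file_bytes: bytes) -> bytes:
    """Build password_utf8_bytes || key_file_bytes for Tier 2 derivation."""
    if len(key_file_bytes) != KEY_FILE_LEN:
        raise ValueError("key file must be exactly 32 bytes")
    return password.encode("utf-8") + key_file_bytes


def split_tier2_input(combined: bytes) -> tuple[bytes, bytes]:
    """Recover (password_bytes, key_file_bytes) using the fixed split point."""
    if len(combined) < KEY_FILE_LEN:
        raise ValueError("input is shorter than a key file")
    split = len(combined) - KEY_FILE_LEN
    return combined[:split], combined[split:]
```

The fixed key file length is what makes this safe without a separator: two different (password, key file) pairs can never produce the same combined byte string.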

In both tiers, master_key is used only as input to HKDF-SHA256 (RFC 5869) to derive three purpose-specific keys, after which it is zeroed from memory and never stored.

Key Derivation from master_key

After Argon2id produces master_key, HKDF-SHA256 expands it into three vault-level keys using distinct info strings to guarantee cryptographic separation:

Derived Key	HKDF `info` string	Purpose
`key_encryption_key`	`arx-runa-key-encryption`	Wraps and unwraps per-file `file_key` values
`sqlcipher_key`	`arx-runa-sqlcipher`	Keys the local SQLCipher database
`manifest_key`	`arx-runa-manifest-backup`	Encrypts the cloud manifest backup

master_key is zeroed immediately after all three derivations complete. It is never assigned to a struct field, returned from a function, or written to any storage.

Brute Force Resistance

Why offline attacks differ from online attacks

An attacker who has to submit authentication attempts to a server can be rate-limited and eventually locked out after a small number of failures. With cloud-stored vaults, this defence does not apply to the password derivation itself: the attacker can fetch the vault header (which contains argon2_salt and argon2_params) and run derivation attempts locally, without rate limiting.

The defence against this offline attack is the computational cost of Argon2id.

Tier 1 offline resistance

Argon2id is widely recommended for password hashing (OWASP lists it as the first choice) because its memory requirement limits GPU-parallel attacks.

At Arx Runa defaults (m=65536 KiB, t=3), each derivation requires approximately 64 MiB of RAM for its whole duration. A GPU can run thousands of PBKDF2 guesses at once because each guess needs only a small, fixed amount of state; with Argon2id, the number of concurrent guesses is bounded by device memory and memory bandwidth. A card with 24 GiB of memory, for example, can hold at most 384 concurrent derivations (24 GiB / 64 MiB).

The practical result is that offline brute force is substantially more expensive for a given hardware budget than it is against algorithms such as PBKDF2 or bcrypt.

Tier 1 brute force resistance depends entirely on password quality. A password with insufficient entropy can be found within a practical time budget regardless of Argon2id cost. A minimum of 12 randomly chosen characters from a broad character set is recommended; passphrase-style passwords of equivalent or higher entropy are equally acceptable.
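To make "equivalent entropy" concrete, the arithmetic can be sketched as below. The alphabet of 94 printable ASCII characters and the 7776-word Diceware-style list are illustrative assumptions, not Arx Runa requirements, and the formula only holds when every character or word is chosen uniformly at random.

```python
import math


def entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of `length` symbols drawn uniformly from `alphabet_size` choices."""
    return length * math.log2(alphabet_size)


def words_needed(target_bits: float, wordlist_size: int) -> int:
    """Smallest number of random words that reaches `target_bits` of entropy."""
    return math.ceil(target_bits / math.log2(wordlist_size))


random_password_bits = entropy_bits(12, 94)                      # between 78 and 79 bits
passphrase_words = words_needed(random_password_bits, 7776)      # 7 words
```

Under these assumptions a 12-character random password carries a little under 79 bits, and a 7-word random passphrase (about 90 bits) is the shortest one that matches or exceeds it.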

Tier 2 offline resistance

When a key file is present, the Argon2id input is password_utf8_bytes || key_file_bytes. The key file contributes 256 bits of CSPRNG entropy. An attacker who does not possess the physical key file must search a 2^256-element space in addition to the password space. This is computationally infeasible regardless of hardware, and the Argon2id memory-hardness cost compounds the infeasibility.

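Tier 2 brute-force resistance therefore does not depend on password quality as long as the key file stays secret. Password strength matters again only if an attacker also obtains the key file, at which point the vault is reduced to Tier 1 protection.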

Attack Chain for a Compromised Password (Tier 1)

If an attacker obtains the correct password, the following steps are required to access vault contents. Each step is a prerequisite for the next.

1. Obtain the vault header from cloud storage (plaintext bootstrap metadata that must be treated as untrusted input)
2. Extract argon2_salt and argon2_params from the vault header
3. Derive master_key = Argon2id(password, argon2_salt, params)
4. Derive key_encryption_key, sqlcipher_key, and manifest_key via HKDF-SHA256
5. Authenticate to the cloud provider and download the encrypted SQLCipher manifest backup
6. Decrypt the backup with manifest_key and open the manifest with sqlcipher_key to obtain the list of files and their wrapped file_key values
7. Unwrap per-file file_key values using key_encryption_key
8. Authenticate to the cloud provider and download individual encrypted blobs
9. Decrypt blobs using the per-file file_key and the AEAD construction

The password alone is not sufficient. For owner-private data, steps 5 and 8 still require independent cloud provider credentials. Existing devices also pin vault_id + Argon2 salt/params in local-vault-params.json, so cloud-side header tampering is rejected before derivation.

Effect of Tier 2: step 3 requires password || key_file_bytes as Argon2id input. Without the physical USB key file, master_key cannot be derived from the correct password alone. The entire chain from step 3 onward is blocked.

Endpoint Threats

The cryptographic model provides strong guarantees against offline attacks and network-level adversaries. It does not protect against a compromised endpoint. The following attacks bypass Argon2id and the AEAD layer entirely:

Keystroke logging: malware capturing the password at entry time
Memory scraping: OS-level inspection of process memory after master_key or SessionKeys have been derived and placed in mlocked RAM
Screen capture or shoulder surfing: observing the password during entry
Phishing or credential reuse: the user is deceived into entering the password in a false context, or the same password is used on a compromised service

These are not cryptographic weaknesses. They are endpoint security concerns outside the scope of what any client-side encryption scheme can address.

Tier 2 materially reduces the impact of endpoint threats. A stolen password is useless without the physical USB drive; a stolen USB drive is useless without the password. An attacker must simultaneously compromise both factors. This does not prevent all endpoint threats — malware with full runtime access to the process at authentication time could capture both factors simultaneously — but it eliminates the large class of attacks that obtain only one factor (phishing, credential database leaks, single-keylogger sessions before the USB drive is inserted).

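Arx Runa narrows the window for memory scraping rather than eliminating it. mlock/VirtualLock keeps all session key buffers out of swap, and ZeroizeOnDrop on all key types overwrites key material before memory is released to the allocator. Neither measure stops malware that can read live process memory while a session is open.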

Cloud Access Model

A common point of confusion is which vault resources require cloud authentication and which are publicly accessible. The table below defines the access requirement for each resource category.

Resource	Access requirement	Rationale
Vault header (`vault-header.json`)	Cloud provider credentials (owner remote), no vault auth	Plaintext bootstrap metadata fetched before vault authentication; existing devices validate `vault_id` + Argon2 salt/params against local `local-vault-params.json` trust anchor
Private vault blobs (`vault/<uuid>.blob`)	Cloud provider credentials	Owner's encrypted chunks; accessible only to the authenticated cloud account
SQLCipher manifest backup (`manifest/manifest-backup.blob`)	Cloud provider credentials	Encrypted, but also gated by cloud provider authentication
Shared blobs (`shared/<file_share_id>/<uuid>.blob`)	None (publicly readable)	Recipients hold no cloud credentials; AEAD with per-file `file_key` protects content
Shared blob content	`file_key` from share package	Delivered inside an HPKE-encrypted share package; without `file_key` the ciphertext is permanently inaccessible

Rationale for the public shared/ path

Recipients of a shared file are not expected to hold the owner's cloud credentials. Requiring recipients to authenticate would introduce credential management complexity and tightly couple sharing to a specific provider's permission model. The shared/ path is designed to be publicly readable because the blobs are opaque AEAD ciphertext — blob names are UUID v4 (122 bits of entropy, not guessable), and without the file_key the contents cannot be decrypted.

This is an accepted architectural property stated in the threat model: ciphertext exposure to a party who discovers a shared/ folder UUID is not a security failure, because AEAD ciphertext without the key reveals nothing about the plaintext beyond its length.

X25519 Identity and Key Exchange

Arx Runa generates a local X25519 keypair at vault creation. This keypair is the user's cryptographic identity for file sharing. There is no central key server or registration requirement.

The X25519 private key is stored in SQLCipher, wrapped with key_encryption_key. It is protected by the same authentication flow as all other vault content. If the vault password and key file are rotated, the private key is re-wrapped under the new key_encryption_key; the keypair itself does not change, and existing sharing relationships are preserved.

The X25519 public key is shared out-of-band — exported as a file or QR code and delivered via the user's own channel (email, messaging application, physical media). Arx Runa does not publish public keys to any directory and does not require an email address or any network-accessible identity.

Only parties to whom the user has explicitly delivered their public key can construct share packages addressed to that user. The security of key exchange is as strong as the out-of-band delivery channel. Arx Runa displays a short fingerprint (first 16 hex characters of the SHA-256 hash of the public key) to allow out-of-band verification; this verification is opt-in.
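The fingerprint described above can be computed as in the following sketch. The function names are hypothetical; the rule itself (first 16 hex characters of the SHA-256 hash of the 32-byte public key) is taken from the text.

```python
import hashlib
import hmac


def public_key_fingerprint(public_key: bytes) -> str:
    """First 16 hex characters of SHA-256(public_key)."""
    if len(public_key) != 32:
        raise ValueError("X25519 public keys are 32 bytes")
    return hashlib.sha256(public_key).hexdigest()[:16]


def fingerprint_matches(public_key: bytes, expected: str) -> bool:
    """Compare a received key against a fingerprint read out over another channel."""
    normalised = expected.replace(" ", "").lower()
    return hmac.compare_digest(public_key_fingerprint(public_key), normalised)
```

Sixteen hex characters cover 64 bits of the hash, which is short enough to read aloud while still making an accidental match between two different keys very unlikely.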

Share packages are encrypted using HPKE (RFC 9180) with DHKEM(X25519, HKDF-SHA256) + HKDF-SHA256 + ChaCha20-Poly1305. The owner seals a JSON payload (including the per-file file_key) to the recipient's long-term X25519 public key, and the ephemeral private key used for encapsulation is discarded after use.

Arx Runa Documentation