Keystore API

Created	Author(s)	Status
2023-01-03	@neekolas	Draft

Background & Motivation

Working on our MetaMask Snap has brought a longstanding issue in our SDK to the forefront: we don't have a strong separation of concerns between the cryptographic functions in our SDK and the core business logic. Cryptographic operations and crypto classes are sprinkled throughout the SDK, with a baked-in assumption that sensitive private key data is accessible from anywhere. This makes it difficult to develop more secure methods for encrypting, decrypting, and signing data, and to move to a model where key material is compartmentalized.

Snaps are not the only place where this is becoming an issue. As we develop libxmtp to handle cryptographic operations, there are significant performance implications to passing data between the application context and WebAssembly. Having key material and cryptographic operations live in the same context will make the code much simpler to reason about and lead to better performance.

This separation of concerns is also an essential first step towards unlocking the "Wallet SDK", as well as enabling the Chrome Extension or 1P mobile app to manage keys on behalf of web apps.

Goals/Non Goals

Goals

Refactor the xmtp-js codebase to establish strict modularization between code that handles business logic/API calls and code that handles private key material/cryptographic operations.
Establish a well-defined interface for interacting with the Keystore.
Do this in a way that minimizes impact on application developers. This should be a non-breaking change for typical apps.
Simplify the Client code by handling all key management and cryptography in one place.

Non goals

Actually move any of the core cryptographic operations out of the SDK. This is purely a refactor to allow for future releases that outsource key storage/encryption/decryption elsewhere (Snap, Chrome Extension, Wallet, 1P app, libxmtp).
Change any of our encryption primitives.
Specify how this would work in languages other than JS (that may be desirable, but the immediate use-cases are all around the web SDK).

Proposed Solution

I propose refactoring the codebase to move any code that interacts with private keys, topic keys, and encryption/decryption into a separate module with a strict API boundary and well-defined interface. In the initial version, all calls to the Keystore service will be in the same process as the rest of the SDK. The interface should be designed so that requests are easily JSON-serializable, to allow for future providers that are remote (Snap, Chrome Extension, Wallet, 1P mobile app).

The keystore will need to maintain some amount of state (either persisted between sessions or ephemeral) to access the PrivateKeyBundle and any TopicKeys that have been found from invitations.

Components

In this new model, there are two distinct components of our SDK:

Client

The Client is responsible for all API calls to XMTP nodes, high level business logic (conversations abstraction), and handling of message encoding/decoding.

Keystore

The Keystore is responsible for holding the PrivateKeyBundle of the user (and any future delegated keys), encrypting/decrypting V1 messages, storing TopicKey material from invitations, and encrypting/decrypting V2 messages.

Separation of concerns

classDiagram
    class Keystore
    class Client
    Client: Conversations abstraction
    Client: All calls to XMTP APIs
    Client: ContentType management and decoding of content
    Keystore: Encrypt messages prior to publishing
    Keystore: Decrypt messages
    Keystore: Encrypt invitations
    Keystore: Decrypt invitations

Types Of Keystore

There are a number of potential keystore types I can see us developing over time. These are listed in rough order of priority and timing.

Base Keystore

The default Keystore, and the first one we will need to build, will simply be a module that implements the Keystore API locally. It will hold the user's PrivateKeyBundle and TopicKeys and execute API requests using those keys.

The Base Keystore can run in the same process as the Client, or it can be used to implement some of the remote Keystores listed below.

Effectively, this is just a refactor of our codebase.

Snaps Keystore

The Snaps Keystore will be a light wrapper around the MetaMask Snaps API, where all requests to the Keystore are proxied as JSON-RPC calls to an installed Snap. The Snap will handle RPC requests using something similar to the Base Keystore, but with the additional capability of persisting keys in the MetaMask encrypted storage.
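
As a rough illustration of the proxying described above, the sketch below wraps Keystore calls in JSON-RPC 2.0 request objects and hands them to an injected transport function. The `RpcTransport` type, the `keystore_` method prefix, and the `RpcKeystoreProxy` class are hypothetical names, not part of the SDK. The actual call into a Snap would be supplied by whoever constructs the proxy and is not shown here.

```typescript
// Hypothetical sketch: proxy Keystore calls as JSON-RPC 2.0 requests.
// None of these names come from the XMTP SDK or the MetaMask Snaps API.

type JsonRpcRequest = {
  jsonrpc: '2.0'
  id: number
  method: string
  params: unknown
}

// The transport is injected, so the same proxy could sit in front of a Snap,
// a browser extension port, or an in-process test double.
type RpcTransport = (req: JsonRpcRequest) => Promise<unknown>

let nextId = 1

// Build the JSON-RPC envelope for a Keystore method. Params must already be
// JSON-serializable (no Uint8Array or Date instances).
function toRpcRequest(method: string, params: unknown): JsonRpcRequest {
  return { jsonrpc: '2.0', id: nextId++, method: `keystore_${method}`, params }
}

class RpcKeystoreProxy {
  constructor(private transport: RpcTransport) {}

  // Only one method is shown; the others would follow the same pattern.
  async getWalletAddress(): Promise<string> {
    const result = await this.transport(toRpcRequest('getWalletAddress', []))
    if (typeof result !== 'string') {
      throw new Error('Unexpected response for getWalletAddress')
    }
    return result
  }
}
```

Injecting the transport keeps the proxy free of any provider-specific code, which is one way the same wrapper could be reused for the other remote Keystores listed in this section.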

Browser Extension Keystore

Users who have the XMTP Browser Extension installed would be able to proxy Keystore calls to the Browser Extension. The Browser Extension would implement some version of the Base Keystore. We would likely use a runtime.Port to communicate back and forth between the extension and the browser session. This would work similarly to window.ethereum (window.xmtp?) where a small bit of code would be injected into all webpages to handle communication.

Users would have to approve a dialog in the Chrome Extension for each domain they wish to use.

`libxmtp` Keystore

As we develop libxmtp, we can include a TypeScript wrapper class that sends Keystore API calls across the WASM bridge to libxmtp to fulfill them. This keystore can be used inside other Keystores (for example, the Snap could proxy calls to libxmtp while using the Snap encrypted storage for persistence). It would effectively replace the Base Keystore.

Mobile Wallet Browser

Mobile wallets implementing XMTP can implement the Keystore API in the language of their choice. They can then inject window.xmtp into their mobile browser sessions, functioning in a similar way to the Browser Extension Keystore. That will allow any web page that uses XMTP to access the wallet's keystore (after the user grants permission to the domain).

Wallet SDK

See Wallet SDK for more details about how a communication channel may be established between browser sessions and mobile wallets.

API Design

One requirement of this API is to support the lowest common denominator of transport providers (JSON-RPC) with minimal translation overhead.

One way to do that is to design everything using primitive types that can be easily serialized to JSON.

Alternatively, we could use Protobuf serializable classes as API parameters, and serialize/base64 encode the protobufs and embed them in JSON payloads.

The design I propose leans closer to "all primitive types", but does make use of some custom classes. A Keystore implementation that needs to send requests over the wire as JSON would be responsible for serializing requests and deserializing responses. As long as all of the classes are stateless, this should be straightforward.
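
To make that serialization responsibility concrete, here is a minimal sketch of how an implementation might turn a request containing `Uint8Array` fields into a JSON-safe shape and back. It assumes a simplified request type in which `payload` is raw bytes rather than a `Ciphertext`, uses hex encoding purely for brevity, and all helper names are hypothetical.

```typescript
// Simplified stand-in for DecryptV2Request; `payload` is raw bytes here,
// which is an assumption made to keep the sketch self-contained.
type SimpleDecryptV2Request = {
  payload: Uint8Array
  headerBytes: Uint8Array
  contentTopic: string
}

// JSON-safe wire form: every byte array becomes a hex string.
type WireDecryptV2Request = {
  payload: string
  headerBytes: string
  contentTopic: string
}

function bytesToHex(bytes: Uint8Array): string {
  let hex = ''
  bytes.forEach((b) => {
    // Left-pad single-digit values so every byte is two characters.
    hex += (b < 16 ? '0' : '') + b.toString(16)
  })
  return hex
}

function hexToBytes(hex: string): Uint8Array {
  if (hex.length % 2 !== 0 || /[^0-9a-f]/i.test(hex)) {
    throw new Error('Invalid hex string')
  }
  const bytes = new Uint8Array(hex.length / 2)
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16)
  }
  return bytes
}

function serializeRequest(req: SimpleDecryptV2Request): string {
  const wire: WireDecryptV2Request = {
    payload: bytesToHex(req.payload),
    headerBytes: bytesToHex(req.headerBytes),
    contentTopic: req.contentTopic,
  }
  return JSON.stringify(wire)
}

function deserializeRequest(json: string): SimpleDecryptV2Request {
  const wire = JSON.parse(json) as WireDecryptV2Request
  return {
    payload: hexToBytes(wire.payload),
    headerBytes: hexToBytes(wire.headerBytes),
    contentTopic: wire.contentTopic,
  }
}
```

Because the request carries no behavior, the round trip is lossless. The same approach would not work for a class that holds hidden state, which is why statelessness matters here.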

There are many ways we could represent this service boundary/API spec, which you can think of as a spectrum from "More work in the client" to "More work in the Keystore". The definitions below err on the side of "More work in the client" and try to keep the logic executed in the Keystore to the bare minimum. For example, the Keystore is not responsible for decoding message headers (except where required to validate identities) or handling content encoding.

I would treat the exact definitions below as very provisional, other than the rough scope of what we would like to do in the Keystore.

import { messageApi } from '@xmtp/proto'
import Ciphertext from '../crypto/Ciphertext'
import {
  SignedPublicKeyBundle,
  PublicKeyBundle,
} from '../crypto/PublicKeyBundle'
import { InvitationContext } from '../Invitation'
import { ErrorCode } from './errors'

type KeystoreError = {
  error: string
  code: ErrorCode
}

export type ResultOrError<T> = T | KeystoreError

export type ConversationReference = {
  topic: string
  createdAt: Date
  context?: InvitationContext
}

export type DecryptV1Request = {
  payload: Ciphertext
  peerKeys: PublicKeyBundle
  headerBytes: Uint8Array
  isSender: boolean
}

export type DecryptV1Response = ResultOrError<{
  decrypted: Uint8Array
}>

export type DecryptV2Request = {
  payload: Ciphertext
  headerBytes: Uint8Array
  // Need to include contentTopic for the Keystore to know what topic key to use
  contentTopic: string
}

export type DecryptV2Response = ResultOrError<{
  decrypted: Uint8Array
}>

export type EncryptV1Request = {
  recipient: PublicKeyBundle
  payload: Uint8Array
  headerBytes: Uint8Array
}

export type EncryptResponse = ResultOrError<{
  ciphertext: Ciphertext
}>

export type EncryptV2Request = {
  contentTopic: string
  message: Uint8Array
  headerBytes: Uint8Array
}

export type CreateInviteRequest = {
  recipient: SignedPublicKeyBundle
  createdAt: Date
  context: InvitationContext
}

export type CreateInviteResponse = {
  conversation: ConversationReference
  // The full bytes of the sealed invitation, which can then be published to the API
  payload: Uint8Array
}

export interface Keystore {
  // Decrypt a batch of V1 messages
  decryptV1(req: DecryptV1Request[]): Promise<DecryptV1Response[]>
  // Decrypt a batch of V2 messages
  decryptV2(req: DecryptV2Request[]): Promise<DecryptV2Response[]>
  // Encrypt a batch of V1 messages
  encryptV1(req: EncryptV1Request[]): Promise<EncryptResponse[]>
  // Encrypt a batch of V2 messages
  encryptV2(req: EncryptV2Request[]): Promise<EncryptResponse[]>
  // Decrypt and save a batch of invites for later use in decrypting messages on the invite topic
  saveInvites(
    req: messageApi.Envelope[]
  ): Promise<ResultOrError<ConversationReference>[]>
  // Create the sealed invite and store the Topic keys in the Keystore for later use
  createInvite(req: CreateInviteRequest): Promise<CreateInviteResponse>
  // Get V2 conversations
  getV2Conversations(): Promise<ConversationReference[]>
  // Used for publishing the contact
  getPublicKeyBundle(): Promise<SignedPublicKeyBundle>
  // Technically duplicative of `getPublicKeyBundle`, but nice for ergonomics
  getWalletAddress(): Promise<string>
}
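
Because each batch method returns one `ResultOrError` per request, callers need a way to separate successes from failures. The following is a minimal sketch of one way to do that with a type guard. `isKeystoreError` and `partitionResults` are hypothetical helpers, and `ErrorCode` is replaced with a plain number because the real enum is not defined in this document.

```typescript
// Local copies of the types above, with ErrorCode simplified to a number
// (an assumption; the real enum lives in './errors').
type KeystoreError = { error: string; code: number }
type ResultOrError<T> = T | KeystoreError

// Narrow a batch entry to the error branch. This relies on successful
// results never carrying both an `error` and a `code` field.
function isKeystoreError<T extends object>(
  res: ResultOrError<T>
): res is KeystoreError {
  return 'error' in res && 'code' in res
}

// Split a batch response into successful results and errors.
function partitionResults<T extends object>(
  results: ResultOrError<T>[]
): { ok: T[]; failed: KeystoreError[] } {
  const ok: T[] = []
  const failed: KeystoreError[] = []
  for (const res of results) {
    if (isKeystoreError(res)) {
      failed.push(res)
    } else {
      ok.push(res)
    }
  }
  return { ok, failed }
}
```

Returning errors as values rather than rejecting the whole promise means one undecryptable message does not fail the entire batch, which seems to be the intent of the `ResultOrError` type.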

Common workflows

Loading V2 conversations

sequenceDiagram
    Participant C as Client
    Participant N as Node
    Participant K as Keystore
    C-)N: Query for invites
    N-->>C: Receive encrypted invites
    C-)K: saveInvites with encrypted payloads
    C-)K: getV2Conversations
    K-->>C: Receive conversation list

Listing messages

sequenceDiagram
    Participant C as Client
    Participant N as Node
    Participant K as Keystore
    C-)N: Query for envelopes
    N-->>C: Receive envelopes
    C-)K: decryptV2 with batch of payloads
    K-->>K: decrypt all payloads
    K-->>C: Receive decrypted payloads
    C-->>C: Decode content
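
The flow in the diagram above could be sketched in code roughly as follows. The types are simplified stand-ins (`payload` is raw bytes rather than a `Ciphertext`, and `ErrorCode` is a number), and `ParsedEnvelope`, `toDecryptRequests`, `collectDecrypted`, and `listMessages` are hypothetical names. Querying the node and decoding content are left out, since both stay in the Client.

```typescript
// Simplified stand-ins for the API types; both simplifications are assumptions.
type KeystoreError = { error: string; code: number }
type DecryptV2Request = {
  payload: Uint8Array
  headerBytes: Uint8Array
  contentTopic: string
}
type DecryptV2Response = { decrypted: Uint8Array } | KeystoreError

// Hypothetical shape of an envelope after the Client has split the message
// header from the ciphertext; the real messageApi.Envelope differs.
type ParsedEnvelope = {
  contentTopic: string
  headerBytes: Uint8Array
  ciphertext: Uint8Array
}

// Only the method this workflow needs.
type DecryptOnlyKeystore = {
  decryptV2(req: DecryptV2Request[]): Promise<DecryptV2Response[]>
}

// Map envelopes received from the node into a single batch request.
function toDecryptRequests(envelopes: ParsedEnvelope[]): DecryptV2Request[] {
  return envelopes.map((env) => ({
    payload: env.ciphertext,
    headerBytes: env.headerBytes,
    contentTopic: env.contentTopic,
  }))
}

// Keep only the successful entries. Failed entries are dropped here; a real
// client would more likely log or surface them.
function collectDecrypted(responses: DecryptV2Response[]): Uint8Array[] {
  const out: Uint8Array[] = []
  for (const res of responses) {
    if ('decrypted' in res) {
      out.push(res.decrypted)
    }
  }
  return out
}

// One Keystore round trip per page of envelopes; content decoding happens
// in the Client afterwards.
async function listMessages(
  keystore: DecryptOnlyKeystore,
  envelopes: ParsedEnvelope[]
): Promise<Uint8Array[]> {
  const responses = await keystore.decryptV2(toDecryptRequests(envelopes))
  return collectDecrypted(responses)
}
```

Batching matters most for remote Keystores, where each call crosses a process or WASM boundary.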

Sending a message

sequenceDiagram
    Participant C as Client
    Participant N as Node
    Participant K as Keystore
    C-)K: getV2Conversations
    K-->>C: Receive conversations
    C-)K: encryptV2 { content, topic }
    K-->>C: Encrypted payload
    C-)N: Publish

Breaking changes

The Keystore is the beginning of the end for Client.getKeys(). While we may offer support for this feature for cases where the keys are in the browser context, I do not want to support any mechanism for extracting keys from secure contexts like Snaps or the Chrome Extension.
Caching conversations should happen inside the Keystore. This should remove the need for user-facing APIs like conversations.export(). I would suggest we create a stateful Keystore that uses LocalStorage to support the current use-cases of exporting Conversations.
Some fields may disappear from user-facing classes like Conversation or DecodedMessage.
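
The LocalStorage-backed caching suggested above could look roughly like the sketch below. It persists `ConversationReference` values through a minimal `getItem`/`setItem` interface that the browser's `localStorage` satisfies. The `KeyValueStore`, `ConversationCache`, and `MemoryStore` names, the storage key, and the omission of the optional `context` field are all simplifications made for illustration. Topic keys are deliberately not stored here, since where key material is persisted is a separate decision.

```typescript
// Subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
}

// Simplified ConversationReference without the optional context field.
type ConversationReference = { topic: string; createdAt: Date }

// Hypothetical storage key, not part of the SDK.
const STORAGE_KEY = 'xmtp/conversations'

class ConversationCache {
  constructor(private store: KeyValueStore) {}

  load(): ConversationReference[] {
    const raw = this.store.getItem(STORAGE_KEY)
    if (raw === null) return []
    const parsed = JSON.parse(raw) as { topic: string; createdAt: string }[]
    // Dates are stored as ISO strings and revived on load.
    return parsed.map((c) => ({
      topic: c.topic,
      createdAt: new Date(c.createdAt),
    }))
  }

  // Add a conversation unless its topic is already cached.
  save(conversation: ConversationReference): void {
    const existing = this.load()
    if (existing.some((c) => c.topic === conversation.topic)) return
    existing.push(conversation)
    const wire = existing.map((c) => ({
      topic: c.topic,
      createdAt: c.createdAt.toISOString(),
    }))
    this.store.setItem(STORAGE_KEY, JSON.stringify(wire))
  }
}

// In-memory store for tests or non-browser environments.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>()

  getItem(key: string): string | null {
    const value = this.data.get(key)
    return value === undefined ? null : value
  }

  setItem(key: string, value: string): void {
    this.data.set(key, value)
  }
}
```

In a browser, `new ConversationCache(window.localStorage)` would be the equivalent of the in-memory setup.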

Next steps

Finalize API design
Implement Base Keystore and refactor codebase to use it
Design and build Snap Keystore and Browser Extension Keystore
