XMTP Wallet SDK

Created	Author(s)	Status
2022-07-29	@neekolas	Draft

Background & Motivation

We know that key storage in the browser has major security risks. The way we gather signatures trains users to blindly approve a potentially dangerous signature request, as the prompt appears on every page load. A malicious application can permanently compromise a user's XMTP identity for both sending and receiving.

If we need a secure keystore, the obvious place to put this is a user's wallet. Our users already have wallets connected to XMTP apps and wallets already have very high standards for security.

Given the slow development cycles of wallet developers, we will likely want an interim solution that can act as a secure keystore and connect to third-party wallets. This could be implemented as a browser extension or a standalone mobile app.

Goals/Non-Goals

Goals

Move IdentityPrivateKeys to a location that a compromised application cannot access
Minimal user friction in connecting wallets
Minimal wallet developer friction in implementing XMTP
Minimal friction for dApp developers
Get rid of user-visible wallet signatures

Non-goals

Specify how we would implement key ratcheting and device-keys. The solution to key management should be flexible enough to evolve with our encryption schemes.

Proposed Solution

We should provide a version of the SDK for wallet developers that would handle encrypting and decrypting messages on behalf of client applications. This Wallet SDK should keep any non-revokable encryption keys within the scope of the wallet at all times.

There are two ways we could structure this API:

1. Do not transmit any keys at all to the dApp. The dApp would send full message payloads and the PublicKeyBundle of counterparties to the wallet, and the wallet would encrypt/decrypt the message and return it to the client whole.
2. The wallet would maintain ownership of the IdentityKey, but would send the dApp a set of revokable and limited keys for messaging in a given conversation. The dApp could then use those keys until the session had ended and the keys were ratcheted by the wallet.

While option 2 may be the long-term goal, it would require a key ratcheting system and/or device-specific keys. Those more advanced encryption features would need to be supported by both parties in a conversation, which may be challenging while we are slowly bringing new wallets on board and our existing implementation is already online and in use. Option 1 feels like a more practical short-term goal because it works with our existing message/key structure.

For the purposes of this document, we will describe things in terms of option 1. Options 1 and 2 are similar enough that a workable solution for one should be adaptable to the other.
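To make option 1 more concrete, here is a minimal TypeScript sketch of what the dApp-facing surface might look like. The names `WalletKeystore`, `PublicKeyBundleRef`, `ToyWallet`, and `sendViaWallet` are hypothetical and not part of xmtp-js. The XOR transform is only a stand-in so the example runs; it is not encryption. The point is the boundary: the dApp passes payloads and the counterparty's bundle, and never holds key material.

```typescript
// Hypothetical wallet-facing API for option 1; none of these names exist in xmtp-js.
type PublicKeyBundleRef = { walletAddress: string; identityKey: number[] };

interface WalletKeystore {
  // A real bridge to a wallet would be asynchronous; kept synchronous for brevity.
  encrypt(payload: number[], counterparty: PublicKeyBundleRef): number[];
  decrypt(ciphertext: number[], counterparty: PublicKeyBundleRef): number[];
}

// Placeholder transform: XOR with bytes derived from both keys.
// This stands in for real encryption and offers no security.
function toyTransform(data: number[], privateKey: number[], peerKey: number[]): number[] {
  const out: number[] = [];
  for (let i = 0; i < data.length; i++) {
    const k = privateKey[i % privateKey.length] ^ peerKey[i % peerKey.length];
    out.push(data[i] ^ k);
  }
  return out;
}

class ToyWallet implements WalletKeystore {
  // The identity key lives only inside the wallet object.
  constructor(private readonly identityPrivateKey: number[]) {}

  encrypt(payload: number[], counterparty: PublicKeyBundleRef): number[] {
    return toyTransform(payload, this.identityPrivateKey, counterparty.identityKey);
  }

  decrypt(ciphertext: number[], counterparty: PublicKeyBundleRef): number[] {
    return toyTransform(ciphertext, this.identityPrivateKey, counterparty.identityKey);
  }
}

// dApp side: only sees the WalletKeystore interface, never the key.
function sendViaWallet(wallet: WalletKeystore, text: string, to: PublicKeyBundleRef): number[] {
  const payload = text.split("").map((c) => c.charCodeAt(0));
  return wallet.encrypt(payload, to);
}
```

Under option 2 the same interface could plausibly grow a method that hands out revokable session keys, which is one reason the two options look adaptable to each other.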

Wallet Integration

Scope Of Responsibilities

Conceptually, we should think of this project as breaking the xmtp-js SDK into two pieces: the Client SDK and the Wallet SDK. In practice, we may choose to offer both as a single NPM package for JavaScript apps, but with clearly delineated modules that have distinct responsibilities. Using both the Client SDK and the Wallet SDK in the same application would replicate today's scope of xmtp-js.

classDiagram
    class WalletSDK
    class ClientSDK
    WalletSDK: Communicate over session channel (for mobile wallets)
    WalletSDK: Encrypt/Decrypt messages
    WalletSDK: Key storage
    WalletSDK: Manage Authn
    ClientSDK: Communicate over session channel (when used with mobile wallets)
    ClientSDK: Send/Receive user-to-user messages
    ClientSDK: Conversations abstraction
    ClientSDK: ContentType management and decoding

Key generation

When a user accesses XMTP for the first time we will still need to gather a wallet signature. The wallet developer should use a custom prompt for gathering this signature that is distinct from regular signature prompts and that explains what XMTP is. Once approved, the wallet would sign the message, generate XMTP keys, and store those keys permanently.

For wallets that already have built-in backup facilities (a common pattern is storing keys in an encrypted iCloud file), storing the keys on the network would be optional. For wallets that do not have those capabilities, storing the encrypted key bundle on the network would be required.

Browser Extension Wallets

For browser extension wallets - which include both MetaMask-type wallets and the built-in dApp browsers of mobile wallets - things are fairly straightforward. We would give wallet developers an SDK that allows them to expose private APIs to the browser (similar to window.ethereum) which can access the encryption/decryption functions of the wallet. Approvals would be required per-domain before the wallet is allowed to perform encryption/decryption. Those approvals would be long-lived: users would not need to re-approve, or provide signatures, when they return to a site within some period of time (let's say 30 days).
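The per-domain approval rule could be sketched roughly as follows. `DomainApprovals` and `guard` are hypothetical names, and the 30-day lifetime simply mirrors the example figure above rather than a decided value. Timestamps are passed in explicitly to keep the logic easy to test.

```typescript
// Hypothetical per-domain approval store for an extension wallet.
// The 30-day lifetime is the example figure from the text, not a decided value.
const APPROVAL_TTL_MS = 30 * 24 * 60 * 60 * 1000;

class DomainApprovals {
  // origin -> timestamp (ms) at which the approval expires
  private expiry = new Map<string, number>();

  approve(origin: string, nowMs: number): void {
    this.expiry.set(origin, nowMs + APPROVAL_TTL_MS);
  }

  isApproved(origin: string, nowMs: number): boolean {
    const expiresAt = this.expiry.get(origin);
    return expiresAt !== undefined && nowMs < expiresAt;
  }

  revoke(origin: string): void {
    this.expiry.delete(origin);
  }
}

// Gate every encrypt/decrypt call on the caller's origin.
function guard<T>(approvals: DomainApprovals, origin: string, nowMs: number, op: () => T): T {
  if (!approvals.isApproved(origin, nowMs)) {
    throw new Error("XMTP access not approved for " + origin);
  }
  return op();
}
```

This sketch assumes the wallet can reliably learn the calling origin; how to do that in a way that cannot be spoofed is one of the open questions listed at the end of this document.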

Mobile wallets and desktop dApps

The problem here is very similar to the problem that WalletConnect 1.0 was designed to solve. We need a secure channel between the browser session and the mobile wallet, where messages can be passed back and forth. It's probably helpful to look at how WalletConnect currently solves this problem.

When a desktop user wants to pair with a WalletConnect wallet, they pass a specifically formatted URI to the wallet out of band (either through a QR code or a deep link). The URI is of the EIP-1328 format and looks something like this:

wc:$TOPIC@$WC_VERSION?relay-protocol=irn&symKey=$SYMMETRIC_ENCRYPTION_KEY

Because the URI is shared out of band it can contain secrets shared between the dApp and the Wallet. The two most relevant pieces of the URI are the $TOPIC and $SYMMETRIC_ENCRYPTION_KEY. The topic tells the wallet where to send communication to perform a handshake, and the symmetric encryption key is used for encrypting that handshake. The client generates both the topic and symmetric key randomly, and both are only used one time. During the handshake a new topic and encryption key are negotiated. These would be scoped to the length of a session.

While we could devise our own URI following a similar scheme in order to connect dApps and XMTP-compatible mobile wallets, I don't see any reason why we couldn't just hijack the WalletConnect URI and add a few XMTP-specific parameters.
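As a rough illustration, the following TypeScript sketch parses a URI of the shape shown above and appends XMTP-specific parameters. The parameter names `xmtpTopic` and `xmtpSymKey` are invented for this example, and whether existing WalletConnect clients tolerate unknown query parameters is an assumption that would need to be verified.

```typescript
// Parsed form of a WalletConnect pairing URI of the shape shown above.
type PairingUri = { topic: string; version: string; params: Record<string, string> };

function parsePairingUri(uri: string): PairingUri {
  const match = /^wc:([^@]+)@([^?]+)(?:\?(.*))?$/.exec(uri);
  if (!match) throw new Error("not a WalletConnect URI");
  const params: Record<string, string> = {};
  for (const pair of (match[3] || "").split("&")) {
    if (!pair) continue;
    const idx = pair.indexOf("=");
    const key = idx === -1 ? pair : pair.slice(0, idx);
    const value = idx === -1 ? "" : pair.slice(idx + 1);
    params[decodeURIComponent(key)] = decodeURIComponent(value);
  }
  return { topic: match[1], version: match[2], params };
}

// Append hypothetical XMTP parameters to an existing pairing URI.
// "xmtpTopic" and "xmtpSymKey" are illustrative names only.
function withXmtpParams(uri: string, xmtpTopic: string, xmtpSymKey: string): string {
  const sep = uri.indexOf("?") === -1 ? "?" : "&";
  return (
    uri + sep +
    "xmtpTopic=" + encodeURIComponent(xmtpTopic) +
    "&xmtpSymKey=" + encodeURIComponent(xmtpSymKey)
  );
}
```

A wallet that does not understand the extra parameters would ideally ignore them and pair for transaction signing only, while an XMTP-aware wallet would read them and start the messaging handshake as well.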

This would mean a single QR code could enable both transaction signing (WalletConnect) and messaging (XMTP), and our SDK could be initialized at the same time as WalletConnect. It would be up to the wallet developer to build a UI that makes this distinction clear and offers the user a choice to connect the dApp to XMTP. Once access is granted in the mobile wallet, the user would be able to use XMTP fully without being prompted for signatures.

To make this work, we would need our own secure channel to communicate between the dApp and the wallet. I suggest we use the XMTP network for this, and have both sides read/write from a negotiated topic. Because this is always real-time communication with both parties online we could have special rules on the nodes that would not store messages in these topics, as they are ephemeral by nature.
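The "deliver but do not store" rule for session topics might look something like the sketch below. `ToyNode` is a toy in-memory stand-in for a node, and the `ephemeral-` topic prefix is a made-up convention for marking session topics; neither reflects how XMTP nodes are actually implemented.

```typescript
type Handler = (message: string) => void;

// Hypothetical naming rule: topics with this prefix are treated as ephemeral.
const EPHEMERAL_PREFIX = "ephemeral-";

class ToyNode {
  private subscribers = new Map<string, Handler[]>();
  private stored = new Map<string, string[]>();

  subscribe(topic: string, handler: Handler): void {
    const list = this.subscribers.get(topic) || [];
    list.push(handler);
    this.subscribers.set(topic, list);
  }

  publish(topic: string, message: string): void {
    // Deliver to whoever is subscribed right now.
    for (const handler of this.subscribers.get(topic) || []) handler(message);
    // Persist only non-ephemeral topics.
    if (topic.indexOf(EPHEMERAL_PREFIX) !== 0) {
      const history = this.stored.get(topic) || [];
      history.push(message);
      this.stored.set(topic, history);
    }
  }

  // Return a copy of the stored history for a topic.
  query(topic: string): string[] {
    return (this.stored.get(topic) || []).slice();
  }
}
```

With a rule like this, a dApp and wallet that are both online can exchange session traffic in real time, while a later query against the same topic returns nothing.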

The first time user experience would look like this:

sequenceDiagram
    Participant C as Client
    Participant W as Wallet
    Participant U as User
    Participant N as Node
    C-->>W: Initiate handshake
    W-->>C: Send session encryption key and session topic
    Note over C,W: Communication over one-time topic using one-time encryption key
    W-->>U: Prompt for approval to generate XMTP identity
    U-->>W: Approval granted
    W-->>W: Create and locally store private key bundle using wallet signature
    W-->>N: Publish public key to contact topic
    W-->>N: (optional) Publish encrypted private bundle to PrivateTopicStore

For receiving messages for a wallet that has already been initialized, the flow would look like this:

sequenceDiagram
    Participant C as Client
    Participant W as Wallet
    Participant N as Node
    C-->>W: Initiate handshake
    W-->>C: Send session encryption key and session topic
    Note over C,W: Communication over one-time topic using one-time encryption key
    N-->>C: Download messages and associated PublicKeyBundles
    C-->>W: Send encrypted messages and PublicKeyBundles
    Note over C,W: Communication over session topic using session key
    W-->>C: Return decrypted payloads

We will also have to figure out how to handle cases where the mobile wallet has been backgrounded. This will likely rely on us running a push server. I've considered it out of scope for now, but it's going to be important to figure out.

Non-supported wallets

We need to maintain some way of using XMTP for users whose wallet of choice does not yet integrate the Wallet SDK. My proposal would be to run the Wallet SDK inside of a WebWorker and simulate the interface of Browser Extension Wallets. To both the end-user and developer, XMTP would continue to work the same way it does today.
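One way to picture this fallback is a small selection step at startup, sketched below. `XmtpKeystore` and `selectKeystore` are hypothetical names. In a browser, the fallback factory would presumably wrap a WebWorker running the Wallet SDK; here it is an ordinary function so the example stays self-contained.

```typescript
// Hypothetical common interface shared by wallet-provided and fallback keystores.
interface XmtpKeystore {
  readonly kind: string;
  decrypt(ciphertext: string): string;
}

// Prefer a keystore injected by a supporting wallet; otherwise build the fallback.
function selectKeystore(
  injected: XmtpKeystore | undefined,
  createFallback: () => XmtpKeystore
): XmtpKeystore {
  return injected !== undefined ? injected : createFallback();
}
```

Because both paths return the same interface, the rest of the Client SDK would not need to know which one it is talking to.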

We could also create an XMTP browser extension that would expose the same API as the browser extension wallet proposal, but would get wallet signatures from an external wallet (MetaMask, WalletConnect, etc.).

Risks?

We put substantial work into the SDK and wallet developers never actually implement it
Integration with wallets will be slow and updates to wallet code will be infrequent. Any feature that relies on changes to Wallet SDK code may be delayed as we wait for users/wallets to upgrade to the latest version.
Without the addition of identity key revokability, a compromised wallet or phishing attack will still permanently compromise a user on XMTP
Improved wallet UX will only help users with compatible wallets. The same problems we have today will persist for users with unsupported wallets.

Questions

How do we motivate/incentivize wallet developers to integrate the Wallet SDK?
After JavaScript, do we just move forward and build native Swift and Java implementations of the Wallet SDK?
How do we get wallet developers involved in the design of the SDK to increase the likelihood of adoption?
How do we manage app-level authorizations in a way that cannot be spoofed (malicious-app-b claims to be connecting from already-approved-app-A and generates a new session)?
