XMTP Wallet SDK
| Created | Author(s) | Status |
|---|---|---|
| 2022-07-29 | @neekolas | Draft |
Background & Motivation
We know that key storage in the browser has major security risks. The way we gather signatures trains users to blindly approve a potentially dangerous signature request, as the prompt appears on every page load. A malicious application can permanently compromise a user's XMTP identity for both sending and receiving.
If we need a secure keystore, the obvious place to put this is a user's wallet. Our users already have wallets connected to XMTP apps and wallets already have very high standards for security.
Given the slow development cycles of wallet developers, we will likely want an interim solution that can act as a secure keystore and connects to third party wallets. This could be implemented as a browser extension or a standalone mobile app.
Goals/Non-Goals
Goals
- Move `IdentityPrivateKeys` to a location that a compromised application cannot access
- Minimal user friction in connecting wallets
- Minimal wallet developer friction in implementing XMTP
- Minimal friction for dApp developers
- Get rid of user-visible wallet signatures
Non-goals
- Specify how we would implement key ratcheting and device-keys. The solution to key management should be flexible enough to evolve with our encryption schemes.
Proposed Solution
We should provide a version of the SDK for wallet developers that would handle encrypting and decrypting messages on behalf of client applications. This Wallet SDK should keep any non-revokable encryption keys within the scope of the wallet at all times.
There are two ways we could structure this API:
1. Do not transmit any keys at all to the dApp. The dApp would send full message payloads and the `PublicKeyBundle` of counterparties to the wallet, and the wallet would encrypt/decrypt the message and return it to the client whole.
2. The wallet would maintain ownership of the `IdentityKey`, but would send the dApp a set of revokable and limited keys for messaging in a given conversation. The dApp could then use those keys until the session had ended and the keys were ratcheted by the wallet.
Option 2 may be the long-term goal, but it would require a key ratcheting system and/or device-specific keys. Both parties in a conversation would need to support those more advanced encryption features, which may be challenging while we gradually bring new wallets on board and our existing implementation remains online and in use. Option 1 feels like a more practical short-term goal because it works with our existing message/key structure.
For the purposes of this document, we will describe things in terms of option 1. Options 1 and 2 are similar enough that any workable solution for one should be adaptable to the other.
Wallet Integration
Scope Of Responsibilities
Conceptually, we should think of this project as breaking the xmtp-js SDK into two pieces: the Client SDK and the Wallet SDK. In practice, we may choose to offer both as a single NPM package for JavaScript apps, but with clearly delineated modules that have distinct responsibilities. Using both the Client SDK and the Wallet SDK in the same application would replicate today's scope of xmtp-js.
```mermaid
classDiagram
class WalletSDK
class ClientSDK
WalletSDK: Communicate over session channel (for mobile wallets)
WalletSDK: Encrypt/Decrypt messages
WalletSDK: Key storage
WalletSDK: Manage Authn
ClientSDK: Communicate over session channel (when used with mobile wallets)
ClientSDK: Send/Receive user-to-user messages
ClientSDK: Conversations abstraction
ClientSDK: ContentType management and decoding
```
Key generation
When a user accesses XMTP for the first time we will still need to gather a wallet signature. The wallet developer should use a custom prompt for gathering this signature that is distinct from regular signature prompts and that explains what XMTP is. Once approved, the wallet would sign the message, generate XMTP keys, and store those keys permanently.
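A minimal sketch of the wallet-side first-run step, assuming the XMTP identity key is generated randomly and then bound to the user's account by the wallet's signature (the text only says the wallet signs, generates keys, and stores them). `WalletAccount`, `SecureStore`, `createIdentity`, the storage names, and the prompt text are all hypothetical, and Node's ECDSA over secp256k1 stands in for the wallet's real Ethereum signing.

```typescript
import { createECDH, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical view of the user's existing wallet account. A real wallet would sign with
// its Ethereum key (e.g. EIP-191 personal_sign); Node's ECDSA stands in for that here.
interface WalletAccount {
  publicKey: KeyObject;
  signMessage(text: string): Buffer;
}

// Hypothetical persistent key storage (backed by the OS keychain in a real wallet).
interface SecureStore {
  put(name: string, value: Buffer): void;
  get(name: string): Buffer | undefined;
}

interface IdentityKeys {
  identityPrivateKey: Buffer; // never leaves the wallet
  identityPublicKey: Buffer; // 65-byte uncompressed secp256k1 point
  walletSignature: Buffer; // binds the identity key to the wallet account
}

// Hypothetical prompt text; the real wording would be fixed by the Wallet SDK.
function identityRequestText(identityPublicKey: Buffer): string {
  return `XMTP: Create Identity\n${identityPublicKey.toString("hex")}`;
}

// Run only after the user has approved the dedicated XMTP prompt.
function createIdentity(account: WalletAccount, store: SecureStore): IdentityKeys {
  const ecdh = createECDH("secp256k1");
  const identityPublicKey = ecdh.generateKeys(); // uncompressed point by default
  const identityPrivateKey = ecdh.getPrivateKey();
  const walletSignature = account.signMessage(identityRequestText(identityPublicKey));
  store.put("xmtp/identity/private", identityPrivateKey);
  store.put("xmtp/identity/public", identityPublicKey);
  store.put("xmtp/identity/signature", walletSignature);
  return { identityPrivateKey, identityPublicKey, walletSignature };
}

// Demo: a throwaway account key and an in-memory store.
const { privateKey, publicKey } = generateKeyPairSync("ec", { namedCurve: "secp256k1" });
const account: WalletAccount = {
  publicKey,
  signMessage: (text) => sign("sha256", Buffer.from(text), privateKey),
};
const memory = new Map<string, Buffer>();
const store: SecureStore = {
  put: (name, value) => {
    memory.set(name, value);
  },
  get: (name) => memory.get(name),
};
const keys = createIdentity(account, store);
const signatureValid = verify(
  "sha256",
  Buffer.from(identityRequestText(keys.identityPublicKey)),
  account.publicKey,
  keys.walletSignature,
);
```

Anyone holding the public identity key and the wallet signature can check the binding, while the private identity key only ever exists inside the wallet's store.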
For wallets that already have built-in backup facilities (a common pattern is storing keys in an encrypted iCloud file), storing the keys on the network would be optional. For wallets without those capabilities, storing the encrypted key bundle on the network would be required.
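A minimal sketch of the encrypted network backup, assuming the backup key is derived from a wallet signature with HKDF-SHA256 and the serialized bundle is sealed with AES-256-GCM. This only works if the wallet can reproduce the same signature later; deterministic ECDSA (RFC 6979, which Ethereum wallets typically use) gives that property. `EncryptedBundle`, the helper names, and the HKDF `info` label are hypothetical and are not claimed to match xmtp-js's actual storage format.

```typescript
import { createCipheriv, createDecipheriv, hkdfSync, randomBytes } from "node:crypto";

// Hypothetical envelope for a key bundle published to the network.
interface EncryptedBundle {
  salt: Buffer; // HKDF salt, 32 random bytes
  nonce: Buffer; // AES-GCM nonce, 12 random bytes
  ciphertext: Buffer;
  authTag: Buffer; // 16-byte GCM authentication tag
}

function deriveStorageKey(walletSignature: Buffer, salt: Buffer): Buffer {
  // hkdfSync returns an ArrayBuffer; wrap it so it can be used as a cipher key.
  return Buffer.from(hkdfSync("sha256", walletSignature, salt, "xmtp-key-backup", 32));
}

function encryptBundle(plaintext: Buffer, walletSignature: Buffer): EncryptedBundle {
  const salt = randomBytes(32);
  const nonce = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", deriveStorageKey(walletSignature, salt), nonce);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { salt, nonce, ciphertext, authTag: cipher.getAuthTag() };
}

function decryptBundle(box: EncryptedBundle, walletSignature: Buffer): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", deriveStorageKey(walletSignature, box.salt), box.nonce);
  decipher.setAuthTag(box.authTag);
  // final() throws if the tag does not verify (wrong signature or tampered data).
  return Buffer.concat([decipher.update(box.ciphertext), decipher.final()]);
}

// Demo: the backup opens with the same signature and fails with any other.
const walletSignature = randomBytes(65); // placeholder for a real, reproducible wallet signature
const backup = encryptBundle(Buffer.from("serialized private key bundle"), walletSignature);
const restored = decryptBundle(backup, walletSignature).toString("utf8");
```

Using GCM means a corrupted or tampered backup fails loudly on restore instead of yielding garbage keys.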
Browser Extension Wallets
For browser extension wallets (which include both MetaMask-type wallets and the built-in dApp browsers of mobile wallets), things are fairly straightforward. We would give wallet developers an SDK that allows them to expose private APIs to the browser (similar to `window.ethereum`) which can access the wallet's encryption/decryption functions. The user would have to approve each domain before the wallet performs encryption/decryption for it. Those approvals would be long-lived: users would not need to re-approve, or provide signatures, when they return to a site within some period of time (let's say 30 days).
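A minimal sketch of the per-domain approval check, assuming the wallet keeps approvals in a map keyed by origin and uses the 30-day window suggested above. `ApprovalRegistry` and `assertApproved` are hypothetical names, not part of any existing SDK; the injected API object itself (something like a `window.xmtp` alongside `window.ethereum`) is likewise an assumption and is not shown.

```typescript
// Approval lifetime suggested in the text ("let's say 30 days").
const APPROVAL_TTL_MS = 30 * 24 * 60 * 60 * 1000;

// Wallet-side record of which origins may call the injected XMTP API, and until when.
class ApprovalRegistry {
  private readonly expiries = new Map<string, number>(); // origin -> expiry (ms since epoch)

  approve(origin: string, now: number = Date.now()): void {
    this.expiries.set(origin, now + APPROVAL_TTL_MS);
  }

  isApproved(origin: string, now: number = Date.now()): boolean {
    const expiry = this.expiries.get(origin);
    if (expiry === undefined) return false;
    if (now >= expiry) {
      this.expiries.delete(origin); // lapsed: the user has to approve this origin again
      return false;
    }
    return true;
  }

  revoke(origin: string): void {
    this.expiries.delete(origin);
  }
}

// Guard placed in front of every encrypt/decrypt call. `origin` must be the one the browser
// reports for the calling page, never a value the page supplies itself.
function assertApproved(registry: ApprovalRegistry, origin: string, now?: number): void {
  if (!registry.isApproved(origin, now)) {
    throw new Error(`XMTP access not approved for ${origin}`);
  }
}
```

Taking `now` as an optional parameter keeps the expiry logic testable without waiting 30 days.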
Mobile wallets and desktop dApps
The problem here is very similar to the problem that WalletConnect 1.0 was designed to solve. We need a secure channel between the browser session and the mobile wallet, where messages can be passed back and forth. It's probably helpful to look at how WalletConnect currently solves this problem.
When a desktop user wants to pair with a WalletConnect wallet, they pass a specifically formatted URI to the wallet out of band (either through a QR code or a deep link). The URI is of the EIP-1328 format and looks something like this:
```
wc:$TOPIC@$WC_VERSION?relay-protocol=irn&symKey=$SYMMETRIC_ENCRYPTION_KEY
```
Because the URI is shared out of band it can contain secrets shared between the dApp and the Wallet. The two most relevant pieces of the URI are the $TOPIC and $SYMMETRIC_ENCRYPTION_KEY. The topic tells the wallet where to send communication to perform a handshake, and the symmetric encryption key is used for encrypting that handshake. The client generates both the topic and symmetric key randomly, and both are only used one time. During the handshake a new topic and encryption key are negotiated. These would be scoped to the length of a session.
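The URI fields described above can be pulled apart with a few lines of TypeScript. This is a minimal sketch assuming the `wc:<topic>@<version>?<params>` layout shown in the example; `parsePairingUri` is a hypothetical helper, not WalletConnect's own parser, and it does not validate parameter values.

```typescript
// Parsed form of a pairing URI like wc:<topic>@<version>?relay-protocol=irn&symKey=<hex>
interface PairingUri {
  topic: string;
  version: number;
  params: URLSearchParams;
}

function parsePairingUri(uri: string): PairingUri {
  if (!uri.startsWith("wc:")) throw new Error("not a wc: URI");
  const rest = uri.slice("wc:".length);
  const queryStart = rest.indexOf("?");
  const path = queryStart === -1 ? rest : rest.slice(0, queryStart);
  const query = queryStart === -1 ? "" : rest.slice(queryStart + 1);
  const at = path.lastIndexOf("@");
  if (at <= 0) throw new Error("expected <topic>@<version>");
  const versionText = path.slice(at + 1);
  // This sketch requires a numeric version, as in the example URI.
  if (!/^\d+$/.test(versionText)) throw new Error("invalid version");
  return { topic: path.slice(0, at), version: Number(versionText), params: new URLSearchParams(query) };
}
```

A wallet would read `symKey` from `params` to decrypt the handshake it then receives on `topic`.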
While we could devise our own URI following a similar scheme in order to connect dApps and XMTP-compatible mobile wallets, I don't see any reason why we couldn't just hijack the WalletConnect URI and add a few XMTP-specific parameters.
This would mean a single QR code could enable both transaction signing (WalletConnect) and messaging (XMTP), which negates the entire UX advantage that WalletConnect 2.0 offers. Our SDK could be initialized at the same time as WalletConnect. It would be up to the wallet developer to build a UI that makes this distinction clear and offers the user a choice to connect the dApp to XMTP. Once access is granted in the mobile wallet, the user would be able to use XMTP fully without being prompted for signatures.
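To make the single-QR-code idea concrete, here is a sketch of how a dApp might append XMTP pairing secrets to a WalletConnect URI and how a wallet might read them back. The parameter names `xmtp-topic` and `xmtp-symKey`, and all helper names, are hypothetical; the sketch also assumes wallets that don't support XMTP simply ignore query parameters they don't recognize, which would need to be confirmed per wallet.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical parameter names; nothing in WalletConnect or XMTP defines these today.
const XMTP_TOPIC_PARAM = "xmtp-topic";
const XMTP_KEY_PARAM = "xmtp-symKey";

interface XmtpPairing {
  topic: string;
  symKey: string;
}

// One-time values for the XMTP handshake, generated fresh by the dApp for each pairing.
function newXmtpPairing(): XmtpPairing {
  return { topic: randomBytes(32).toString("hex"), symKey: randomBytes(32).toString("hex") };
}

// Append XMTP pairing secrets to an existing wc: URI so one QR code carries both.
function withXmtpParams(wcUri: string, pairing: XmtpPairing): string {
  const sep = wcUri.includes("?") ? "&" : "?";
  const extra = new URLSearchParams({ [XMTP_TOPIC_PARAM]: pairing.topic, [XMTP_KEY_PARAM]: pairing.symKey });
  return `${wcUri}${sep}${extra.toString()}`;
}

// Wallet side: read the XMTP params back out, or null if the dApp did not request XMTP.
function readXmtpParams(uri: string): XmtpPairing | null {
  const q = uri.indexOf("?");
  if (q === -1) return null;
  const params = new URLSearchParams(uri.slice(q + 1));
  const topic = params.get(XMTP_TOPIC_PARAM);
  const symKey = params.get(XMTP_KEY_PARAM);
  return topic && symKey ? { topic, symKey } : null;
}

// Demo: decorate a WalletConnect URI with fresh XMTP secrets.
const baseUri = "wc:c9e6d30f@2?relay-protocol=irn&symKey=7ff3e362";
const pairing = newXmtpPairing();
const combined = withXmtpParams(baseUri, pairing);
```

Generating the XMTP topic and key separately, rather than reusing WalletConnect's `symKey`, keeps the secrets of the two channels independent.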
To make this work, we would need our own secure channel to communicate between the dApp and the wallet. I suggest we use the XMTP network for this, and have both sides read/write from a negotiated topic. Because this is always real-time communication with both parties online we could have special rules on the nodes that would not store messages in these topics, as they are ephemeral by nature.
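A minimal sketch of the node-side "don't store ephemeral topics" rule. It assumes session topics follow the `/xmtp/0/<name>/proto` content-topic pattern, which we believe matches the topics xmtp-js used at the time, with a made-up `walletsession-` prefix marking them as ephemeral; `Envelope`, `handlePublish`, and the prefix are hypothetical.

```typescript
import { randomBytes } from "node:crypto";

const EPHEMERAL_PREFIX = "walletsession-"; // hypothetical marker for dApp-wallet session topics

function buildTopic(name: string): string {
  return `/xmtp/0/${name}/proto`;
}

function newSessionTopic(): string {
  return buildTopic(EPHEMERAL_PREFIX + randomBytes(16).toString("hex"));
}

function isEphemeralTopic(topic: string): boolean {
  const m = /^\/xmtp\/0\/([^/]+)\/proto$/.exec(topic);
  return m !== null && m[1].startsWith(EPHEMERAL_PREFIX);
}

interface Envelope {
  contentTopic: string;
  message: Uint8Array;
}

// Node-side rule: relay every envelope to live subscribers, but persist only durable topics.
function handlePublish(env: Envelope, store: Envelope[], relay: (e: Envelope) => void): void {
  relay(env); // both parties are online, so deliver immediately
  if (!isEphemeralTopic(env.contentTopic)) store.push(env);
}
```

Encoding "ephemeral" in the topic name lets nodes apply the rule without inspecting (encrypted) payloads.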
The first time user experience would look like this:
```mermaid
sequenceDiagram
participant C as Client
participant W as Wallet
participant U as User
participant N as Node
C-->>W: Initiate handshake
W-->>C: Send session encryption key and session topic
Note over C,W: Communication over one-time topic using one-time encryption key
W-->>U: Prompt for approval to generate XMTP identity
U-->>W: Approval granted
W-->>W: Create and locally store private key bundle using wallet signature
W-->>N: Publish public key to contact topic
W-->>N: (optional) Publish encrypted private bundle to PrivateTopicStore
```
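The handshake at the top of this flow can be sketched as follows, assuming AES-256-GCM for the one-time encryption and JSON for the wallet's reply. `SessionOffer`, `walletRespond`, `clientAccept`, and the `nonce || tag || ciphertext` layout are hypothetical, and publishing to the one-time topic is left out.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical message the wallet returns during the handshake (values hex-encoded).
interface SessionOffer {
  sessionTopic: string;
  sessionKey: string;
}

// AES-256-GCM with a random 12-byte nonce; output layout is nonce || tag || ciphertext.
function seal(plaintext: Buffer, key: Buffer): Buffer {
  const nonce = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, nonce);
  const ct = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return Buffer.concat([nonce, cipher.getAuthTag(), ct]);
}

function openSealed(sealed: Buffer, key: Buffer): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.subarray(0, 12));
  decipher.setAuthTag(sealed.subarray(12, 28));
  return Buffer.concat([decipher.update(sealed.subarray(28)), decipher.final()]);
}

// Wallet: answer the handshake with fresh session secrets, encrypted under the one-time key.
function walletRespond(oneTimeKey: Buffer): { offer: SessionOffer; sealed: Buffer } {
  const offer: SessionOffer = {
    sessionTopic: randomBytes(16).toString("hex"),
    sessionKey: randomBytes(32).toString("hex"),
  };
  return { offer, sealed: seal(Buffer.from(JSON.stringify(offer)), oneTimeKey) };
}

// Client: decrypt the offer; after this the one-time key should be discarded.
function clientAccept(sealed: Buffer, oneTimeKey: Buffer): SessionOffer {
  return JSON.parse(openSealed(sealed, oneTimeKey).toString("utf8")) as SessionOffer;
}

// Demo: the one-time key would come from the pairing URI's symKey parameter.
const oneTimeKey = randomBytes(32);
const { offer, sealed } = walletRespond(oneTimeKey);
const accepted = clientAccept(sealed, oneTimeKey);
```

Because the wallet, not the dApp, chooses the session secrets, a leaked QR code only exposes the one-time handshake and not the session that follows.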
For receiving messages for a wallet that has already been initialized, the flow would look like this:
```mermaid
sequenceDiagram
participant C as Client
participant W as Wallet
participant N as Node
C-->>W: Initiate handshake
W-->>C: Send session encryption key and session topic
Note over C,W: Communication over one-time topic using one-time encryption key
N-->>C: Download messages and associated PublicKeyBundles
C-->>W: Send encrypted messages and PublicKeyBundles
Note over C,W: Communication over session topic using session key
W-->>C: Return decrypted payloads
```
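For an already-initialized wallet, the core of this flow is a request carrying ciphertexts plus sender keys and a response carrying plaintexts. The sketch below assumes a single ECDH plus HKDF-SHA256 and AES-256-GCM per message, which is a deliberate simplification and not XMTP's actual message encryption; the message and request shapes and all function names are hypothetical, and the session-key encryption of the request itself is omitted.

```typescript
import { createCipheriv, createDecipheriv, createECDH, hkdfSync, randomBytes, ECDH } from "node:crypto";

// Simplified stand-in for an XMTP message (all fields hex-encoded).
interface EncryptedMessage {
  senderPublicKey: string;
  nonce: string;
  tag: string;
  ciphertext: string;
}

// Hypothetical request/response carried over the session topic.
interface DecryptRequest {
  id: number;
  messages: EncryptedMessage[];
}
interface DecryptResponse {
  id: number;
  payloads: (string | null)[]; // null: could not decrypt
}

function messageKey(own: ECDH, peerPublicKeyHex: string): Buffer {
  const secret = own.computeSecret(Buffer.from(peerPublicKeyHex, "hex"));
  return Buffer.from(hkdfSync("sha256", secret, Buffer.alloc(0), "xmtp-wallet-sdk-sketch", 32));
}

// Counterparty side, shown only so the demo has something to decrypt.
function encryptFor(sender: ECDH, recipientPublicKeyHex: string, text: string): EncryptedMessage {
  const nonce = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", messageKey(sender, recipientPublicKeyHex), nonce);
  const ciphertext = Buffer.concat([cipher.update(text, "utf8"), cipher.final()]);
  return {
    senderPublicKey: sender.getPublicKey("hex"),
    nonce: nonce.toString("hex"),
    tag: cipher.getAuthTag().toString("hex"),
    ciphertext: ciphertext.toString("hex"),
  };
}

// Wallet side: the identity key stays inside this closure; only plaintext payloads leave.
function makeDecryptHandler(identity: ECDH): (req: DecryptRequest) => DecryptResponse {
  return (req) => ({
    id: req.id,
    payloads: req.messages.map((m) => {
      try {
        const key = messageKey(identity, m.senderPublicKey);
        const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(m.nonce, "hex"));
        decipher.setAuthTag(Buffer.from(m.tag, "hex"));
        const plain = Buffer.concat([decipher.update(Buffer.from(m.ciphertext, "hex")), decipher.final()]);
        return plain.toString("utf8");
      } catch {
        return null; // bad tag, wrong key, or malformed input
      }
    }),
  });
}

// Demo: Alice messages the wallet's user; the client forwards the ciphertext to the wallet.
const walletIdentity = createECDH("secp256k1");
walletIdentity.generateKeys();
const alice = createECDH("secp256k1");
alice.generateKeys();
const fromAlice = encryptFor(alice, walletIdentity.getPublicKey("hex"), "gm");
const handleDecrypt = makeDecryptHandler(walletIdentity);
const response = handleDecrypt({ id: 1, messages: [fromAlice] });
```

Returning `null` per message, rather than failing the whole batch, lets the client render everything it can when one message is corrupt.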
We will also have to figure out how to handle cases where the mobile wallet has been backgrounded. This will likely rely on us running a push server. I've considered it out of scope for now, but it's going to be important to figure out.
Non-supported wallets
We need to maintain some way of using XMTP for users whose wallet of choice does not yet integrate the Wallet SDK. My proposal would be to run the Wallet SDK inside of a WebWorker and simulate the interface of Browser Extension Wallets. To both the end-user and developer, XMTP would continue to work the same way it does today.
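A minimal sketch of how client code could stay unaware of where the keys live, assuming both an injected extension provider and a WebWorker-hosted Wallet SDK expose the same small interface. `Keystore`, `WorkerKeystore`, `resolveKeystore`, and the global name `xmtp` are hypothetical; in a browser the worker calls would go through `postMessage`, whereas here the "worker" is just an in-process function.

```typescript
// Hypothetical common interface: whatever holds the keys exposes only these operations.
interface Keystore {
  readonly kind: "extension" | "worker";
  decrypt(ciphertext: Uint8Array): Promise<Uint8Array>;
}

// Stand-in for a Wallet SDK instance running in a WebWorker.
class WorkerKeystore implements Keystore {
  readonly kind = "worker" as const;
  private readonly impl: (ciphertext: Uint8Array) => Uint8Array;

  constructor(impl: (ciphertext: Uint8Array) => Uint8Array) {
    this.impl = impl;
  }

  async decrypt(ciphertext: Uint8Array): Promise<Uint8Array> {
    return this.impl(ciphertext);
  }
}

// Prefer an injected provider if the user's wallet offers one; otherwise fall back to the worker.
function resolveKeystore(globalScope: { xmtp?: Keystore }, fallback: () => Keystore): Keystore {
  return globalScope.xmtp ?? fallback();
}

// Demo: no injected provider, so the worker-backed keystore is used.
const fallbackKeystore = (): Keystore => new WorkerKeystore((c) => c.slice().reverse()); // toy "decryption"
const inPage = resolveKeystore({}, fallbackKeystore);
```

Taking the fallback as a factory means the worker is only started when no wallet provider is present.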
We could also create an XMTP browser extension that would expose the same API as the browser extension wallet proposal, but would get wallet signatures from an external wallet (MetaMask, WalletConnect, etc.).
Risks?
- We put substantial work into the SDK and wallet developers never actually implement it
- Integration with wallets will be slow and updates to wallet code will be infrequent. Any feature that relies on changes to Wallet SDK code may be delayed as we wait for users/wallets to upgrade to the latest version.
- Without the addition of identity key revokability, a compromised wallet or phishing attack will still permanently compromise a user on XMTP
- Improved wallet UX will only help users with compatible wallets. The same problems we have today will persist for users with unsupported wallets.
Questions
- How do we motivate/incentivize wallet developers to integrate the Wallet SDK?
- After JavaScript, do we move forward with native Swift and Java implementations of the Wallet SDK?
- How do we get wallet developers involved in the design of the SDK to increase the likelihood of adoption?
- How do we manage app-level authorizations in a way that cannot be spoofed (`malicious-app-b` claims to be connecting from `already-approved-app-A` and generates a new session)?