Key Passing
| Created | Author(s) | Status |
|---|---|---|
| 2022-08-19 | @neekolas | Draft |
Background & Motivation
A prerequisite for building a number of upcoming features is the ability to send keys to a recipient and for that recipient to be able to use those keys in encrypting/decrypting messages. Potential features that this would enable include: sender key escrow, delegate accounts, and channels.
In order to accomplish these goals we need a topic and message format that allows for parties to securely transmit keys over the network.
Goals/Non-Goals
Goals
- Establish message format for sending keys to a recipient
- Establish topic structure that allows for keys to be read and stored before they are needed for encryption/decryption
- Enable the SDK to choose the appropriate key to decrypt a given message
- Maintain our existing guarantees around message authenticity
- Solution should have a line-of-sight to compatibility with channels, sender key escrow, and delegate accounts
Non-Goals
- Fully define channels, sender key escrow, or delegate accounts
- Establish a system for channel-hopping
- Be prescriptive about encryption schemes. This framework should be agnostic to how we do handshakes and derive encryption keys.
Proposed Solution
There are three problems we need to solve to get to an acceptable key passing solution:
- Where to put the keys (topic structure)
- What data is needed on each key message (key format)
- How to store/retrieve the keys in the client to encrypt/decrypt a given message (key manager)
Topic structure
Before a message can be read, the corresponding decryption key must be available in the client. Before a message can be sent, the appropriate encryption key must be available to the client. With today's solution, this problem is trivial as we only have a single PrivateKeyBundle for the client and all encryption keys are derived based on the PublicKeyBundle of the recipient for each message. Those PublicKeyBundles are easily accessible in the contact topic for the recipient.
With key passing, this becomes more complex. The client will need to store multiple PrivateKeyBundles and conversation-specific symmetric encryption keys that cannot be simply derived based on the public/private key bundles of the sender and recipient.
Where we store these keys is critical for both the performance and ease-of-use of the SDK, as keys must be loaded as a prerequisite to reading messages from other users.
Possible key storage locations
- As meta-messages in the conversation DM topic
- In the introduction topic of both the recipient and the sender
- In a per-conversation topic. For DMs, this would be derived from the sender's and recipient's wallet addresses
Meta Messages in the DM topic
This option can be quickly disqualified from consideration. We already allow returning partial conversations (using the limit option or filtering by date range), we return conversations sorted in both ascending and descending order, and we support pagination of queries to the server. This means it is both possible and likely that messages could be returned from a query without the corresponding key.
The only real solution to this problem would be to scan the entirety of a DM topic to extract all keys before querying for messages, which would be extremely inefficient for large conversation histories. Another, extremely hacky, solution would be to force all keys to the top or bottom of the results by modifying the date of all keys (for example, to the year 1970) and querying for them first. Neither of these solutions seems viable.
Introduction Topics
Introduction topics are currently used as a destination for the first message in a conversation between two parties, which can then be used to list all ongoing conversations based on the header values of the message. The message payload is unnecessary in these messages as it is duplicated in the DM topic.
Instead of sending whole messages to the intro topic we can instead send Key Messages, which include both the key and the DM topic.
Pros:
- Easy for clients to load all keys on startup and to listen for new keys as they come in. All the keys are in one place that is constant.
- Similar to existing workflow, and can interoperate with legacy conversations
- Same channel could be used for channels, sender key escrow, and delegate accounts.
Cons:
- Vulnerable to DoS attacks, where a malicious sender can fill a targeted user's intro topic with spurious messages.
- In order to load a specific key, the intro topic would have to be read from the top until a matching key was found
Note: we don't have to use the existing intro topic. We could have a new prefix (invite-?) that maintains the same 1:1 relationship with wallet addresses.
Per conversation topic
Each conversation or channel could have a dedicated topic for storing keys related to that conversation. For DMs, the topic structure might look something like keys-${wallet_address_1}-${wallet_address_2}.
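For both parties to arrive at the same topic name, the two addresses would need a canonical order. A minimal sketch, assuming addresses are lowercased and sorted lexicographically (the `buildKeyTopic` name and the ordering rule are illustrative assumptions, not part of this proposal):

```typescript
// Hypothetical helper: builds a per-conversation key topic name.
// Assumption: addresses are lowercased and sorted so that both
// participants derive the same topic regardless of who initiates.
function buildKeyTopic(addressA: string, addressB: string): string {
  const [first, second] = [addressA.toLowerCase(), addressB.toLowerCase()].sort()
  return `keys-${first}-${second}`
}
```

Without some agreed ordering, the sender and recipient could each derive a different topic for the same conversation.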
Pros:
- Fast for clients to gather the requisite keys for a single conversation
Cons:
- Hard for clients to gather all keys needed for all conversations
- Would require different topic naming conventions for different types of key sharing (channels, delegates, sender key escrow).
- Would still require introduction topics so that the client could be aware of which conversations it was a part of and know where to look for keys
Decision
We should stick with the intro topic format, but create a new prefix specifically for these messages (invite-).
In order to limit the size of this topic, we should ensure that each key is only written to the invite topic at most once. This will be addressed in the key manager section.
Key Format
Depending on the use-case we may want to transmit either symmetric encryption keys used for a single topic (sender key escrow, negotiated DM topics, or channels), or PrivateKeyBundles that can be used across many topics (delegate accounts). This key passing framework should be flexible enough to handle a variety of use-cases/future features. I am imagining two key types to start: TopicKey, which would be a symmetric key bound to a particular content topic (useful for sender key escrow and negotiated DM topics), and PrivateKeyBundle (useful for delegate accounts). Additional key types with their own constraints and considerations can be added if needed.
Only one TopicKey may be used for a content topic. To rotate the key, callers must generate a new topic. Attempting to add a second key to an existing topic will result in an error.
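The one-key-per-topic rule can be sketched as a small registry that rejects a second key for the same topic. This is an illustration under assumptions: the `TopicKeyRegistry` class and its method names are hypothetical and not taken from the POC.

```typescript
// Hypothetical in-memory registry enforcing one key per content topic.
class TopicKeyRegistry {
  private keys = new Map<string, Uint8Array>()

  // Throws if the topic already has a key; rotating a key requires a new topic.
  add(contentTopic: string, keyMaterial: Uint8Array): void {
    if (this.keys.has(contentTopic)) {
      throw new Error(`Topic ${contentTopic} already has a key`)
    }
    this.keys.set(contentTopic, keyMaterial)
  }

  // Returns the key for the topic, or undefined if none has been registered.
  get(contentTopic: string): Uint8Array | undefined {
    return this.keys.get(contentTopic)
  }
}
```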
The encryption scheme is specified in the TopicKey and is "take it or leave it". If the recipient does not like the suggested encryption scheme, their two options are to either ignore the invite or send an invite of their own to the sender with a different encryption scheme. This will become more relevant as additional encryption schemes are added. Given our expectation that clients regularly upgrade to the latest version of xmtp-js, this isn't a major issue today. As we get closer to an LTS release we may consider adding support for negotiation of these values.
All messages on topics with the invite- prefix can be assumed to be of the EncryptionKey proto format and decoded directly. New versions of the EncryptionKey should be introduced into the version oneof field.
message EncryptionKeyV1 {
  message TopicKey {
    string topic = 1; // only allowing a key for a single topic
    message Aes256gcmHkdfsha256 {
      bytes key_material = 1;
    }
    oneof key_material {
      // Specify the encryption method in the union, so the client knows what to do with the key material
      Aes256gcmHkdfsha256 aes256_gcm_hkdf_sha256 = 2;
    }
  }
  oneof key {
    // Commenting out delegate but leaving as a reference until we figure out the exact format
    // PrivateKeyBundle delegate = 1; // PrivateKeyBundle used for delegate accounts
    TopicKey dm = 1; // TopicKey used for defining a key for DMs between two users
    // Commenting out channel, but leaving as a reference until we figure out the exact format
    // TopicKey channel = 3;
  }
}
message EncryptionKey {
  oneof version {
    EncryptionKeyV1 v1 = 1;
  }
}
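On the client, a decoded EncryptionKey could be resolved into a topic, an algorithm, and key material, with unknown versions or key types ignored rather than treated as errors. The sketch below is illustrative only: the types are hand-written stand-ins (not generated protobuf code), and `resolveTopicKey` is a hypothetical helper.

```typescript
// Hand-written stand-ins for the decoded proto. Unset oneof members are
// modeled as absent optional fields; real generated types may differ.
type Aes256GcmHkdfSha256 = { keyMaterial: Uint8Array }
type TopicKey = { topic: string; aes256GcmHkdfSha256?: Aes256GcmHkdfSha256 }
type EncryptionKeyV1 = { dm?: TopicKey }
type EncryptionKey = { v1?: EncryptionKeyV1 }

type ResolvedKey = {
  topic: string
  algorithm: 'AES_256_GCM_HKDF_SHA_256'
  keyMaterial: Uint8Array
}

// Returns undefined when the version, key type, or encryption scheme is not
// one this client understands, so the invite can simply be ignored.
function resolveTopicKey(key: EncryptionKey): ResolvedKey | undefined {
  const dm = key.v1?.dm
  const material = dm?.aes256GcmHkdfSha256
  if (!dm || !material) return undefined
  return {
    topic: dm.topic,
    algorithm: 'AES_256_GCM_HKDF_SHA_256',
    keyMaterial: material.keyMaterial,
  }
}
```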
Right now, I am not doing anything special to differentiate between sender key escrow and DM topic negotiation. Based on my current understanding, the message format should be identical.
Authentication
Today, we include the signed public keys of both parties in a conversation in the headers on each message. These headers are used as additionalData in the encryption/decryption of each message. As the IdentityKey is signed by the wallet, we have a chain of authentication that stretches from individual messages to the wallet address.
I suggest that KeyMessages retain the V1Message format we use for all messages today, where the headers indicate which PublicKey/PrivateKey should be used to derive the shared secret to decrypt the message and authenticate the sender. Messages sent using the TopicKey could rely on a V2Message format with no headers. The key manager will be able to select the appropriate key for a given message using the topic.
For shared symmetric keys (TopicKeys) we still need to be able to authenticate that any decrypted message was sent by someone in control of the wallet address the recipient is expecting. The key manager should maintain a record of the counterparty of all invite messages and associate them with the topic they are inviting the user to. When decrypting a DM message, validation should be performed that ensures the stated user (as claimed in the MessageContents) is either the sender or recipient of the invite message for the topic. A different authentication protocol should be devised as part of the group chat design.
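The DM validation step might look roughly like the following. The `InviteRecord` shape and `isAuthorizedSender` function are hypothetical names introduced for illustration, and the sketch assumes wallet addresses are compared case-insensitively. It only checks the claimed address against the invite parties; verifying that the claim itself is genuine is out of scope here.

```typescript
// Hypothetical record the key manager keeps for each invite it has processed.
type InviteRecord = {
  contentTopic: string
  inviteSender: string // wallet address that sent the invite
  inviteRecipient: string // wallet address that received the invite
}

// Returns true only if the address claimed in the decrypted message belongs
// to one of the two parties of the invite that established the topic.
function isAuthorizedSender(record: InviteRecord, claimedAddress: string): boolean {
  const claimed = claimedAddress.toLowerCase()
  return (
    claimed === record.inviteSender.toLowerCase() ||
    claimed === record.inviteRecipient.toLowerCase()
  )
}
```

A message that decrypts correctly but fails this check would be rejected, since possession of the shared key alone does not identify the author.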
Key Manager
The client SDK will need a Key Manager that will ingest these key messages and convert them into a state that can be queried.
The Key Manager would need an up-to-date view of the world to decrypt all valid messages that may come in, and to choose the appropriate key to encrypt a new message.
In order to achieve this, the Client SDK would list all messages on the invite topic on startup and would continuously listen for new messages on the topic. This allows the SDK to be aware of any key rotations/topic changes in near real-time.
I propose a key manager that would look something like this:
enum EncryptionAlgorithm {
  AES_256_GCM_HKDF_SHA_256,
}
type TopicKeyMaterial = {
  keyMaterial: Uint8Array
  encryptionAlgorithm: EncryptionAlgorithm
}
type PrivateKeyRecord = {
  bundle: PrivateKeyBundle
  keySentBy: PublicKeyBundle
}
type TopicResult = {
  // This would never include the client. We can assume you are in every topic available
  participants: PublicKeyBundle[]
  topicKey: TopicKeyMaterial
  contentTopic: string
}
type WalletTopicRecord = {
  contentTopic: string
  createdAt: Date
}
type KeyManager = {
  addDirectMessageTopic(
    contentTopic: string,
    key: TopicKeyMaterial,
    counterparty: PublicKeyBundle,
    createdAt: Date
  ): void
  // Would be used to get all information required to decrypt/validate a given message
  getTopicResult(contentTopic: string): TopicResult | undefined
  // Would be used to know which topic/key to use to send to a given wallet address
  // The last created topic for a given wallet address would be used
  getDirectMessageTopic(walletAddress: string): TopicResult | undefined
  addDelegateKey(record: PrivateKeyRecord): void
}
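To make the lookup behavior concrete, here is a minimal in-memory sketch of the DM portion of this interface. It is not the POC: the `InMemoryKeyManager` class is hypothetical, `Counterparty` is a simplified stand-in for PublicKeyBundle that carries only a wallet address, and addDelegateKey is omitted.

```typescript
// Simplified stand-ins so the sketch is self-contained.
type Counterparty = { walletAddress: string }
type TopicKeyMaterial = {
  keyMaterial: Uint8Array
  encryptionAlgorithm: 'AES_256_GCM_HKDF_SHA_256'
}
type TopicResult = {
  participants: Counterparty[]
  topicKey: TopicKeyMaterial
  contentTopic: string
}
type WalletTopicRecord = { contentTopic: string; createdAt: Date }

class InMemoryKeyManager {
  private topics = new Map<string, TopicResult>()
  private byWallet = new Map<string, WalletTopicRecord[]>()

  // Registers a DM topic and its key. A topic may only ever have one key.
  addDirectMessageTopic(
    contentTopic: string,
    key: TopicKeyMaterial,
    counterparty: Counterparty,
    createdAt: Date
  ): void {
    if (this.topics.has(contentTopic)) {
      throw new Error(`Topic ${contentTopic} already has a key`)
    }
    this.topics.set(contentTopic, {
      participants: [counterparty],
      topicKey: key,
      contentTopic,
    })
    const records = this.byWallet.get(counterparty.walletAddress) ?? []
    records.push({ contentTopic, createdAt })
    this.byWallet.set(counterparty.walletAddress, records)
  }

  // Everything needed to decrypt/validate a message on the given topic.
  getTopicResult(contentTopic: string): TopicResult | undefined {
    return this.topics.get(contentTopic)
  }

  // Picks the most recently created topic for the wallet address.
  getDirectMessageTopic(walletAddress: string): TopicResult | undefined {
    const records = this.byWallet.get(walletAddress)
    if (!records || records.length === 0) return undefined
    const latest = records.reduce((a, b) =>
      b.createdAt.getTime() > a.createdAt.getTime() ? b : a
    )
    return this.topics.get(latest.contentTopic)
  }
}
```

Selecting by createdAt rather than insertion order means the result does not depend on the order in which invites happen to be read from the network.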
You can see the full POC of the key manager here
Open Questions
- I've opted to keep the relationship between topics and keys 1:1 for simplicity. Are there situations where this model breaks?
- Are there more efficient ways to read messages on the invite- topic other than gathering everything up-front?
- I have removed any time restrictions on keys, and assume that they will be valid for the lifespan of the topic. Are there other situations where we may want time limited keys?