Protocol Versioning
Status: Draft
Date: March 1, 2022
Background
The XMTP protocol is under active development with major breaking changes on the horizon. Meanwhile, developers are being onboarded to the platform. We need to devise a system for reliable operation of the network against a changing codebase. We also need to set up an extensible foundation for us to be able to make changes to the protocol, client, and XMTP nodes once we are live on mainnet and have a commitment to persist messages for periods of time that span version upgrades.
Once on mainnet, breaking changes that require a hard fork of the network should be extremely rare. Hard forks on mainnet will require months of planning and testing, and will be disruptive to node operators and client developers. Any hard fork is expected to partition the network, leaving un-updated nodes out of some or all communications. Given these constraints, we should ensure that as much as possible can be achieved through soft forks of the network, where messages and API calls remain backwards compatible.
Problem
There are three types of changes to the network that have the potential to cause breaking issues:
- Changes to the XMTP Envelope and Waku Envelope schemas
- Changes to the XMTP Node software
- Changes to the Client software
Requirements
- Nodes can make minor updates to their software without partitioning the network or breaking compatibility with previously compatible clients
- Major updates that partition the network are possible and there is a system in place to ensure that they do not break nodes which have not yet updated.
- Nodes may be using versions of the XMTP Protocol that are ahead of connected clients
- Messages sent and stored in a previous version of the protocol are compatible with nodes operating in a later version of the protocol
Non-Goals
I don't think we need to support cases where the client is using a later version of the protocol than the nodes. We should just stagger releases so that most nodes are upgraded before new client versions are released, and the client should be smart enough to not connect to incompatible nodes. More on this later.
Proposed solution
Versioning
All communication over the network happens over LibP2P. LibP2P has a fairly robust system for handling versioning of peer contracts in its protocol specification.
The LibP2P protocol handlers also allow nodes to accept connections for many different protocols, including different versions of the same protocol. This makes it easy to support backwards compatibility. Nodes can declare handlers for both a 1.x and a 2.x version of a protocol for some period of time, allowing for smooth upgrades.
I propose that we lean into the LibP2P system and use protocol versioning to coordinate upgrades, rather than having a global "XMTP version" that enforces which nodes can connect to one another.
This has three main advantages:
- Individual parts of the application can be upgraded separately. LibP2P is already smart enough to connect to peers per-protocol, so a peer can be used for some protocols where there is a compatible version but not for others where there is no matching version. This would mean that upgrading a node to a newer Store Protocol would not necessarily prevent it from receiving Relay messages if the Relay protocols were compatible with its peers.
- There is already built-in support for version matching, allowing us to use SemVer (or an alternative function) to negotiate whether or not two nodes can communicate for a given protocol.
- It already exists and is built in to Waku.
Waku currently uses the following LibP2P Protocols:
/vac/waku/relay/2.0.0
/vac/waku/lightpush/2.0.0-beta1
/vac/waku/store/2.0.0-beta3
/vac/waku/filter/2.0.0-beta1
We will likely need to expand the list of protocols to service future product features, and upgrade some of the existing protocols with new specifications.
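To make the matching rule concrete, here is a minimal sketch of per-protocol SemVer matching on protocol IDs like the ones listed above. It is not LibP2P's or Waku's API: `ProtocolId`, `parse_protocol_id`, `is_compatible`, and `select_handler` are hypothetical names, and the rule used (same protocol name and same major version, ignoring pre-release tags) is an assumption about how we might configure the match function.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class ProtocolId(NamedTuple):
    name: str                      # e.g. "/vac/waku/store"
    version: Tuple[int, int, int]  # (major, minor, patch)
    prerelease: str                # e.g. "beta3", or "" when absent


_PROTOCOL_ID = re.compile(
    r"^(?P<name>/.+)/(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"(?:-(?P<pre>[0-9A-Za-z.-]+))?$"
)


def parse_protocol_id(protocol_id: str) -> ProtocolId:
    """Split a protocol ID into its name, SemVer triple, and pre-release tag."""
    match = _PROTOCOL_ID.match(protocol_id)
    if match is None:
        raise ValueError(f"not a versioned protocol ID: {protocol_id!r}")
    version = (int(match["major"]), int(match["minor"]), int(match["patch"]))
    return ProtocolId(match["name"], version, match["pre"] or "")


def is_compatible(handler: str, requested: str) -> bool:
    """Assumed rule: same protocol name and same major version.

    Pre-release tags are ignored here, which is a simplification.
    """
    h, r = parse_protocol_id(handler), parse_protocol_id(requested)
    return h.name == r.name and h.version[0] == r.version[0]


def select_handler(handlers: List[str], requested: str) -> Optional[str]:
    """Return the first registered handler that can serve the request."""
    for handler in handlers:
        if is_compatible(handler, requested):
            return handler
    return None
```

With this rule, a node that registers handlers for two major versions of the Store protocol can keep serving peers on either one, while a peer with no matching major version for Store can still be used for Relay.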
Message Schema
Messages are expected to be stored on the network for periods of time that will span network updates. It is expected that we will update and improve the XMTP envelope format to support new features. As such, I propose the following change to the envelope schema in order to make the major version of a message self-describing. We should implement this now to ensure compatibility, even though there is currently only a single major version of the envelope schema.
Current Schema
message Message {
  bytes headerBytes = 1; // encapsulates the encoded MessageHeader
  Ciphertext ciphertext = 2;
}
Proposed Schema
message MessageV1 {
  bytes headerBytes = 1; // encapsulates the encoded MessageHeader
  Ciphertext ciphertext = 2;
}
message Message {
  // exactly one version is set; new major versions add a new field here
  oneof version {
    MessageV1 v1 = 1;
  }
}
This change will allow us to create future iterations that will still be able to decode older, incompatible, message formats. It also does not require us to store the protocol version that the message was received on as additional data in the store layer (which would otherwise be required to know how to decode the message).
NOTE: With the change to store the header as raw bytes, it will be up to the client to decode the header using the correct protocol buffer. The version of the Message only offers a hint to the major version. Using the wrong Protobuf decoder on the headerBytes will lead to a malformed message that may not be able to be processed correctly.
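As an illustration of why the wrapper makes the major version self-describing, here is a rough sketch that reads the version from the Protobuf wire format directly. A wrapper with a single nested message field serializes as a tag, a length, and the payload, so the field number in the tag identifies which version was set. The helper names (`wrap_versioned`, `unwrap_versioned`, `decode_message`) are hypothetical, the mapping of field number N to version vN is an assumption based on `v1 = 1`, and the code only inspects the first field rather than acting as a full Protobuf parser.

```python
from typing import Callable, Dict, Tuple

LENGTH_DELIMITED = 2  # Protobuf wire type used for nested messages and bytes


def write_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def wrap_versioned(field_number: int, payload: bytes) -> bytes:
    """Serialize a wrapper message with one nested field set."""
    tag = (field_number << 3) | LENGTH_DELIMITED
    return write_varint(tag) + write_varint(len(payload)) + payload


def unwrap_versioned(envelope: bytes) -> Tuple[int, bytes]:
    """Return (field number, payload) for the first field of the wrapper."""
    tag, pos = read_varint(envelope, 0)
    field_number, wire_type = tag >> 3, tag & 0x7
    if wire_type != LENGTH_DELIMITED:
        raise ValueError(f"unexpected wire type {wire_type}")
    length, pos = read_varint(envelope, pos)
    payload = envelope[pos:pos + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return field_number, payload


def decode_message(envelope: bytes,
                   decoders: Dict[int, Callable[[bytes], object]]) -> object:
    """Dispatch to the decoder registered for the message's major version."""
    version, payload = unwrap_versioned(envelope)
    decoder = decoders.get(version)
    if decoder is None:
        raise ValueError(f"unsupported message version: v{version}")
    return decoder(payload)
```

In practice the generated Protobuf code would do this dispatch for us. The point of the sketch is that no out-of-band version needs to be stored next to the message, since the serialized bytes already say which version they contain.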
Types Of Network Updates
Any change, or package of changes, to the protocol will have to be classified as one of the following types of upgrades.
Patch
Patch updates do not change any of the contracts/protocols used by nodes to communicate with one another. Patch updates MUST be fully compatible with the current minor version of the protocol. Patch updates SHOULD NOT increment any protocol version numbers.
Minor
Minor updates introduce new features to the XMTP Protocol. For each LibP2P protocol affected, minor updates should increment the minor version of the LibP2P protocol. Minor updates MUST be compatible with all nodes using the same major version of the LibP2P protocol.
In some cases, a minor update may introduce new functionality to the node without affecting any of the LibP2P Protocol schemas (for example, changing the behaviour of an input already supported by the LibP2P Protocol). In these cases, it is still recommended to increment the SemVer minor version of the LibP2P protocol.
Creating an entirely new LibP2P protocol would be considered a minor update, since nodes that do not support this protocol would simply not communicate using it.
Adding new fields to an existing protocol buffer schema is likely the most common example of a minor update. In this case, older clients will simply not see the field but will still be able to process messages successfully.
Major
Major updates would include breaking changes to an existing protocol. These would require a new major version of the protocol identifier. A new handler should be created to service this new protocol version. The previous handler MAY remain in service for some period of time after the new protocol version has been introduced.
Major updates will partition the network for this protocol, and communication using this protocol would be limited to nodes that support it.
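The classification above could be sketched roughly as follows. All names here are hypothetical. The sketch assumes that a package of changes takes the most severe classification among the protocols it touches, and it does not handle protocols that are removed outright.

```python
from enum import IntEnum
from typing import Dict, Tuple

Version = Tuple[int, int, int]  # (major, minor, patch)


class UpdateType(IntEnum):
    PATCH = 0
    MINOR = 1
    MAJOR = 2


def classify_bump(old: Version, new: Version) -> UpdateType:
    """Classify a version change for a single LibP2P protocol."""
    if new < old:
        raise ValueError(f"version went backwards: {old} -> {new}")
    if new[0] != old[0]:
        return UpdateType.MAJOR
    if new[1] != old[1]:
        return UpdateType.MINOR
    # Unchanged, or patch-level only: no protocol contract changed.
    return UpdateType.PATCH


def classify_release(before: Dict[str, Version],
                     after: Dict[str, Version]) -> UpdateType:
    """Classify a package of changes as its most severe protocol bump."""
    release = UpdateType.PATCH
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            # An entirely new protocol counts as a minor update.
            bump = UpdateType.MINOR
        else:
            bump = classify_bump(old, new)
        release = max(release, bump)
    return release
```

For example, a release that moves one protocol from 2.0.0 to 2.1.0 and leaves the others untouched would be classified as MINOR, while any release that moves a protocol to a new major version would be classified as MAJOR.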
Update Playbooks
Here are a few playbooks for different types of network updates.
Hard Network Update
This would only be used on the Playnet, as it is maximally disruptive. The main advantage is that the network would only be running a single node codebase at any one time.
There will be no expectation of compatibility with applications that do not follow all steps required in this flow.
- Push a new Docker image of the node software which may or may not include any type of network update (PATCH, MINOR, MAJOR)
- Update all nodes to the new software
- (optional) Delete storage volumes
- (optional) Update js-sdk if required
- (optional) Update all applications to the latest version of js-sdk if the js-sdk update in the previous step was required
Soft Fork
This would be used on either Testnet or Mainnet to release new features in a non-breaking way. It would be used for PATCH or MINOR updates only.
- Push new Docker image of the node software
- Update all XMTP nodes to the newest software
- Publish advisory for all external node operators to update to the latest software
- Push new version of js-sdk
- Encourage all application developers to use the latest version of js-sdk (although not required for existing functionality to be maintained)
Hard Fork
TBD. But the gist would be that we would have two versions of some protocols running inside the node for some period of time.
Rejected solutions
Ethereum 2.0 has some useful prior art here. In addition to using protocol identifier versioning like this proposal, they also include a version number in some LibP2P pubsub topics. For consensus topics, the LibP2P topic name includes a ForkDigestValue, which is the hash of the hard fork version (major version) and the merkle root of the chain (to differentiate testnets and avoid replays). The full topic format is:
/eth2/ForkDigestValue/Name/Encoding
We could do something similar with our Pubsub Topic names. The main reason I have decided not to include pubsub topics in the proposal is the complexity of splitting up messages between multiple pubsub topics based on major version, which would be required to provide backwards compatibility. I expect it would make other features, like partitioning the network, more complicated. It also overlaps with LibP2P protocol versioning and is largely redundant.
We may decide to revisit this later if needed.
Open Questions
- I think incrementing the protocol version on minor updates that do not change schema is going to be controversial. It muddies the waters of what the protocol version is actually for. But it also shouldn't really matter if we are using SemVer properly and gives us transparency on what versions are out on the network. We can gather metrics from our nodes on what exact protocol versions they are seeing, for example.
- There is going to be some mental overhead of figuring out which versions need updating for a given change. It's also likely difficult to lint for. I am open to solutions that would lint against changes that require a version upgrade but do not include one.
- How soft and hard forks will work with node incentives is very TBD. Need a way to ensure that nodes can continue receiving incentives even if slow to adopt soft forks. Failing to follow a hard fork is expected to disrupt node incentives. Bitcoin and Ethereum both have "fallback" modes of message handling to process message formats that nodes do not fully understand that can be used for soft forks.