Abuse Prevention and Management

Created	Author(s)	Status
2022-03-24	@jazzz	Draft

Overview

Outlines the high-level approach to abuse mitigation through the use of authenticated libp2p transports, rate limiting, and identity banning.

Background & Motivation

Starting from launch day, XMTP will need a system to limit abuse of the network and nodes. ‘Abuse’ in this case refers to ‘attempting to use the network for a purpose which was not intended and causes harm.’ To ensure a positive experience for users, XMTP requires a method to:

Stop these behaviors from occurring
Recover from an incident should it occur anyway

The long-term solution to abuse will likely involve a complex de-incentivization scheme and zero-knowledge proofs. In the interim, a partial solution is required to protect the network until that longer-term solution is ready.

Goals / Non-goals

Goals
- Minimize effects of bad actors
Non-goals
- Solutions to low quality messages
- Crypto-system hardening

Guiding Principles

Despite all care and consideration, XMTP will contain vulnerabilities when it launches.
Waku2 is not ready to handle malicious nodes, and this will need to be addressed.

Proposed Solution

Control identity creation to limit abusers
- Utilizing Allow/Deny lists
- Require a cost/work to 'register'
Control how messages are sent
- Allow messages to be sent if and only if the identity exists and is not present on the Deny-list
- Use rate limits to control excessive messages
Monitor usage to ban accounts as necessary.

Note: As it is non-trivial to create an identity, getting banned causes abusers to incur another cost to access the system.

Identity Creation Control

In this approach it is important to control the ability to create identities. If an abuser can create new identities then rate-limiting and banning identities will have no meaningful effect.

The rules for Identity creation are:

Allow-listed users are always allowed to register.
Deny-listed users are forbidden from registering.
Wallets must have an established history and have incurred a cost at some point.

Allow / Deny Lists

The Allow-list enables XMTP Labs to correct/fix any issues with the abuse system. There are some outlier use-cases which are difficult to capture with a generic approach to abuse management. These cases can be addressed by explicitly allowing certain wallets/identities to bypass the rules. It is anticipated that the Allow-list will seldom be used, and that the abuse ruleset ought to capture the vast majority of cases.

The deny-list serves as the primary form of consequence in the network. The deny-list specifies wallets/identities which are not allowed to send on the network because of past abusive behavior.

The allow-list will always take precedence over the deny-list. In cases where a valid user is banned from the system, the allow-list can be manually edited to provide an override to the nominal behavior.
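The precedence between the two lists and the cost rule can be sketched as a small decision function. This is only an illustration under stated assumptions: the name `may_register`, the in-memory sets, and the `has_cost_history` flag are hypothetical and not part of any existing XMTP node code.

```python
from typing import AbstractSet


def may_register(wallet: str,
                 allow_list: AbstractSet[str],
                 deny_list: AbstractSet[str],
                 has_cost_history: bool) -> bool:
    """Apply the identity-creation rules in precedence order.

    All names here are hypothetical; the lists are plain in-memory sets.
    """
    # The allow-list always wins, even if the wallet is also deny-listed.
    if wallet in allow_list:
        return True
    # Deny-listed wallets are refused regardless of their history.
    if wallet in deny_list:
        return False
    # Everyone else needs an established history with an incurred cost.
    return has_cost_history
```

With this ordering, a wrongly banned user can be restored by adding the wallet to the allow-list without first editing the deny-list.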

Registration Costs

The cost aspect of registration is important to protect against users/bots creating multiple accounts easily. If accounts were free, someone could create an unlimited number of accounts, which would undermine our ability to restrict them on the network. The cost component is a balance between two competing goals:

Minimizing barriers to entry for valid users
Making it cost prohibitive for abusers to generate 'throw-away' accounts.

To balance these two goals, explicit payment is not required; however, a history of previous financial costs is. The ruleset is as follows:

Allow if address has ever appeared as the 'from' address on an Ethereum token transaction (paid a gas fee)
- Includes contract approvals
Allow if address has ever appeared as the 'to' address on a non-zero Ethereum token transaction.

Transaction information is available via an Alchemy API, which includes ERC-20 tokens as well. Alternatively, there is an option to run an Ethereum node; however, that would require building a database of all transactions. Details on the specific implementation are to follow.
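As a rough sketch of the ruleset, assuming the transaction history has already been fetched into a simple list: the `Transfer` record and the `has_incurred_cost` function below are hypothetical and do not reflect the Alchemy response format or any real client library.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Transfer:
    """Hypothetical, simplified transaction record (not an Alchemy type)."""
    from_address: str
    to_address: str
    value: int  # amount transferred, in the token's smallest unit


def has_incurred_cost(address: str, transfers: Iterable[Transfer]) -> bool:
    """Return True if the address satisfies either registration rule."""
    address = address.lower()  # assumption: addresses compare case-insensitively
    for t in transfers:
        # Rule 1: the address sent a transaction, so it paid a gas fee.
        # Zero-value transactions such as contract approvals count.
        if t.from_address.lower() == address:
            return True
        # Rule 2: the address received a non-zero amount.
        if t.to_address.lower() == address and t.value > 0:
            return True
    return False
```

Note that a wallet which has only ever received zero-value transfers does not qualify under either rule.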

Rate-limiting

Performing rate limiting at the XmtpMessage level is problematic if message deniability is to be maintained. Without verifiable sender information, it is difficult to ensure we are throttling the identities actually sending messages rather than the accounts they spoof.

Rate limiting at a libp2p transport/stream layer is an alternative option. Suggested flow:

On startup, the client sends proof of Identity to the node.
- Node maps PeerId -> XmtpIdentity. The XmtpIdentity is still publicly deniable, given that the nodes are controlled by XMTP and the channels are secured via Noise.
The node will not accept messages from a PeerId which has not provided a proof of identity.
Rate limits are applied to XmtpIdentities via the PeerId mapping.
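The suggested flow could look roughly like the sketch below. The proof format is not defined in this document, so verification is delegated to a caller-supplied `verify_proof` callable; that callable, the `PeerRegistry` class, and its method names are all hypothetical.

```python
from typing import Callable, Dict, Optional


class PeerRegistry:
    """Hypothetical node-side map of libp2p PeerId -> XmtpIdentity."""

    def __init__(self, verify_proof: Callable[[str, bytes], bool]) -> None:
        # verify_proof(identity, proof) stands in for the real, unspecified check.
        self._verify_proof = verify_proof
        self._identities: Dict[str, str] = {}

    def register(self, peer_id: str, identity: str, proof: bytes) -> bool:
        """Record the mapping only if the proof of identity checks out."""
        if not self._verify_proof(identity, proof):
            return False
        self._identities[peer_id] = identity
        return True

    def accepts_messages_from(self, peer_id: str) -> bool:
        """Peers that never proved an identity are refused."""
        return peer_id in self._identities

    def identity_for(self, peer_id: str) -> Optional[str]:
        """Look up the identity that rate limits should be applied to."""
        return self._identities.get(peer_id)
```

Keying the rate limiter on the value returned by `identity_for` rather than on the PeerId means that reconnecting with a fresh PeerId does not reset an identity's limits.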

Two algorithms were considered:

Fixed window ratelimits

Limits are set for different time windows, and the counters are cleared once each window expires.

Example (maximum messages per window):

Group	hourly	daily	weekly	monthly
Allow-List	1000	10000	20000	50000
Public	100	1000	5000	15000
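A minimal sketch of a fixed-window limiter using the Public row above. The class and constant names are hypothetical, a month is assumed to be 30 days, and windows are assumed to be aligned to multiples of their length so that every identity's counters reset together.

```python
from typing import Dict, Tuple

HOUR = 3600
# Window length in seconds -> maximum messages ("Public" row of the table).
# Assumption: a "month" is treated as 30 days.
PUBLIC_LIMITS: Dict[int, int] = {
    HOUR: 100,
    24 * HOUR: 1000,
    7 * 24 * HOUR: 5000,
    30 * 24 * HOUR: 15000,
}


class FixedWindowLimiter:
    """Hypothetical fixed-window limiter with one counter per window."""

    def __init__(self, limits: Dict[int, int]) -> None:
        self._limits = limits
        # (identity, window length) -> (window index, messages counted)
        self._counters: Dict[Tuple[str, int], Tuple[int, int]] = {}

    def allow(self, identity: str, now: float) -> bool:
        updated = {}
        for length, limit in self._limits.items():
            index = int(now // length)  # which window "now" falls into
            seen, count = self._counters.get((identity, length), (index, 0))
            if seen != index:
                count = 0  # the previous window expired, so clear the counter
            if count >= limit:
                return False  # rejected messages are not counted
            updated[(identity, length)] = (index, count + 1)
        # Commit only when every window has room for the message.
        self._counters.update(updated)
        return True
```

Each identity carries one counter per window, which is where the larger memory footprint of this approach comes from.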

Advantages:

Can provide fine-grained control, particularly over longer time periods.

Disadvantages

Increased memory footprint.
More effort to implement.
Can cause bursty usage, as all users' windows reset at the same time**.
Inconsistent user experience when a time window expires**.

** There is a sliding window variant which addresses these concerns, but adds extra complications.

Token Bucket Algorithm

A user is required to have a msgToken in order to send a message. Users are given X msgTokens in a given timeframe, and are allowed to accrue a stockpile of at most B tokens. E.g. 1 token/minute with a bucket size of 100 tokens. In this example a user could send 100 messages immediately, but then must wait 1 minute to send the 101st message. Alternatively, the user could send messages at a rate of 2 messages per minute for 1h40m, as new tokens are continually allotted every minute.
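The example above can be sketched as follows. This is an illustration only: the `TokenBucket` class and its field names are hypothetical, time is assumed to be whole seconds, and new identities are assumed to start with a full bucket.

```python
class TokenBucket:
    """Hypothetical integer token bucket: one token per refill_interval seconds."""

    def __init__(self, refill_interval: int, capacity: int, now: int = 0) -> None:
        self.refill_interval = refill_interval  # seconds needed to earn one token
        self.capacity = capacity                # bucket size B
        self.tokens = capacity                  # assumption: buckets start full
        self.last_refill = now                  # time the last token was credited

    def try_send(self, now: int) -> bool:
        # Credit whole tokens earned since the last refill, capped at B.
        earned = (now - self.last_refill) // self.refill_interval
        if earned > 0:
            self.tokens = min(self.capacity, self.tokens + earned)
            self.last_refill += earned * self.refill_interval
        if self.tokens == 0:
            return False  # out of msgTokens: the message is rate-limited
        self.tokens -= 1
        return True
```

With `TokenBucket(refill_interval=60, capacity=100)`, one hundred sends succeed immediately and the next one succeeds only after 60 seconds. Sending every 30 seconds drains the bucket by a net one token per minute, so the stockpile lasts roughly 100 minutes. The state per identity is just a token count and a timestamp.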

Advantages

Easier to implement, and easier to understand what is happening.
Memory efficient (as little as 8 bytes per identity).

Disadvantages

Lacks sensitivity to high usage over time.
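Two parameters make it difficult to tune (specifically at longer timescales). To enable bursty traffic the bucket size is increased. This, however, also increases the total number of messages a user can send over any given period (bucket size plus refill), even though the refill rate is unchanged.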

Monitoring and Banning

Logging

Nodes ought to create log entries to increase visibility into the abuse management system. These logs can help:

Auto-ban accounts which are sending too many messages (specific criteria are not yet defined).
Determine if the rate limits need to be adjusted (e.g. because of excessive false positives).

Suggested changes:

Add log entry when an identity is rate-limited
Add log entry when an identity is within 10% of its rate limit

Watching logs can be a manual process to start and can later be automated if needed.
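The two suggested log entries could be produced along these lines. The function names, the logger name, and the log fields are hypothetical, and "within 10% of its rate limit" is read here as having used at least 90% of the limit.

```python
import logging
from typing import Optional

logger = logging.getLogger("abuse")  # hypothetical logger name


def rate_limit_event(used: int, limit: int) -> Optional[str]:
    """Classify usage; returns None when nothing needs to be logged."""
    if used >= limit:
        return "rate_limited"
    # Assumption: "within 10%" means at least 90% of the limit is used.
    if used * 10 >= limit * 9:
        return "near_limit"
    return None


def log_usage(identity: str, used: int, limit: int) -> None:
    """Emit a log entry for rate-limited and near-limit identities."""
    event = rate_limit_event(used, limit)
    if event == "rate_limited":
        logger.warning("event=%s identity=%s used=%d limit=%d",
                       event, identity, used, limit)
    elif event == "near_limit":
        logger.info("event=%s identity=%s used=%d limit=%d",
                    event, identity, used, limit)
```

Keeping the entries in a consistent key=value form makes them easy to scan by hand at first and to parse automatically later.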

Bans

When abusive behavior is detected, the offending user ought to be placed on the deny-list, effectively banning them from the system. (Note: banning allow-listed users would require that they be removed from the allow-list first.)

When a ban occurs:

Users can still receive messages
Users can no longer send messages. Sending messages returns the same error as a rate-limit event.

There is an open question of "How long should an identity be banned when abuse is detected?"

Perma-ban: users are banned forever. The ban criteria are selected to ensure that bans only occur when egregious abuse has been found. Optionally a communication channel can be set up for users to appeal.
Exponential-Backoff-Ban: users are banned for increasingly longer times upon each subsequent ban, e.g. day, week, month, year.
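The escalating option could be sketched as a simple lookup. The names are hypothetical, a month is assumed to be 30 days and a year 365 days, and repeat offenders are assumed to stay at the longest duration once the schedule is exhausted, since that case is still an open question.

```python
from datetime import timedelta

# Escalating ban lengths: day, week, month, year.
# Assumptions: month = 30 days, year = 365 days.
BAN_SCHEDULE = [
    timedelta(days=1),
    timedelta(weeks=1),
    timedelta(days=30),
    timedelta(days=365),
]


def ban_duration(prior_bans: int) -> timedelta:
    """Length of the next ban given how many bans the identity already had."""
    # Assumption: after the last step the duration stays at the maximum.
    index = min(prior_bans, len(BAN_SCHEDULE) - 1)
    return BAN_SCHEDULE[index]
```

A perma-ban policy would replace the schedule with a single unbounded entry, so the choice between the two options does not affect how the deny-list itself is stored.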

Future Work

Honeypots

Honeypot wallets are XMTP-controlled wallets which provide visibility into what is happening on the network. In the early days these accounts can provide the data required to take corrective action.

Setting up wallets with specific parameters will allow detection of specific events:

A wallet which does not appear on any block explorer (enabled via allow-list):
- It should be impossible to discover this wallet address
- Any message should result in an auto-ban, and a security review of how the address got leaked
A wallet which holds some smaller altcoins.
- Spammers are likely scraping addresses from block explorers.
- Messages should be monitored and senders manually banned if appropriate. Low-quality messages are out-of-scope

All these wallets can be connected to Slack bots so they can be monitored efficiently. Honeypot wallets are not needed for launch and will be a low priority.

Field Reports

It is possible for clients to automatically report back to nodes when they detect abusive behavior. Specifically if a client observes a message with a spoofed sender address it can notify the network to remove the message.

The ability to clean up fraudulent messages minimizes the long-term damage they cause.

There are many open questions which need to be resolved before putting this mechanism into production:

How can nodes verify the field report is trustworthy?
If the sender address is spoofed, what actions can be taken to limit this behavior?

More thought is needed.

Summary

Identity Registration:

Allow if: wallet is on the allow-list
Deny if: wallet is on the deny-list
Allow if: wallet has previously incurred a cost

To send messages an associated identity must satisfy all constraints:

Not be on the deny-list
Have a public keyBundle registered on the network
Have enough rate-limit msgTokens
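The send constraints can be combined into a single check, sketched below. The `IdentityState` record and the error strings are hypothetical. Banned identities are given the same error as rate-limited ones, as described under Bans; the error for an unregistered identity is an assumption, since this document does not specify one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdentityState:
    """Hypothetical per-identity state held by a node."""
    on_deny_list: bool
    key_bundle_registered: bool
    msg_tokens: int


def send_error(state: IdentityState) -> Optional[str]:
    """Return None if the identity may send, else a hypothetical error code."""
    if not state.key_bundle_registered:
        return "identity_not_registered"  # assumption: not specified in this doc
    # A ban is reported exactly like running out of msgTokens.
    if state.on_deny_list or state.msg_tokens < 1:
        return "rate_limited"
    return None
```

Returning the same error for bans and rate limits avoids telling an abuser which of the two mechanisms stopped them.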

Path To Decentralization

This entire work will become obsolete once economic spam de-incentivization is implemented.

Rate-limiting
- Replaced by some form of postage
- Messages will not require artificial limiting because they would become cost prohibitive at a meaningful level of traffic.
Blockchain transaction requirement
- Replaced by some form of postage
- While the code for this will likely disappear, in spirit postage will likely have a similar requirement.
Bans
- Phased out as postage provides the main deterrent.
- Account level banning is only required to enforce message rate-limits.
- As the network becomes decentralized there is no need for xmtp-labs to police the network.

Plan & Timeline

Allow/Deny list: 3 Days
Blockchain data integration: 1 Week
Rate-limiting: 1 Week
Logging: 1 Day

Dependencies

Only waku:Relay needs to be disabled for clients. Clients should use lightpush to interact with the network.
Changes required to address https://github.com/xmtp-labs/hq/issues/465 and https://github.com/xmtp-labs/hq/issues/466 may have effects on this approach depending upon their

Alternatives Considered / Prior Art?

Zero Knowledge proof of membership

Proving membership in zero knowledge via Merkle trees is a method to ensure that users are allowed to send messages, while also preserving a sender's privacy. However, the most pressing issue in XMTP is de-incentivizing users from abusing the system. Proving membership is not particularly useful, as the identity is still needed to apply the rate limits and bans.

Simple Postage

Requiring every message to pay a transaction fee would lower (if not remove) excessive messages from the network. Unfortunately this would adversely impact network growth and be cost prohibitive to adoption.

One option that is being pursued is 'Collateralized Postage' where every message has a value attached to it as collateral. If that message is fraudulent or undesired by the recipient they can choose to claim the collateral. This provides many great properties:

Free messages between willing participants
Native Subscribe/Unsubscribe mechanics.

While there appears to be great promise, the timeline on such a project is difficult to estimate. Rate-limiting and bans are imperfect, but the effort to implement them is more quantifiable.

Do Nothing

Given the costs of implementing a temporary solution, one must ask "is it really needed or can we get by without it for the time being?". With an estimated throughput of 1000s of messages a second, the network could become saturated by a few malicious laptops. With significant risk to reputation at stake, some action is required to thwart intentional abusers.

Closed Network

The risk of abuse grows the more public the network is. If sending permission were limited to a small select group of individuals then abuse would be a non-issue. Given network growth is a primary goal, this closed approach is counter-productive.

Adding a trusted referral component could allow the network to expand, but would ultimately result in an effectively open network once it reaches a critical size.

Without a form of repercussion for inviting abusers to the network, this trust model is effectively equivalent to the network being wide open. This is further compounded by the fact that, once an abuser is on the network, identifying which user is the culprit is a difficult task due to the deniability of messages.

Abandon Message Deniability

One of the limiting factors to the solution space is that, by design, nodes cannot determine who the message sender is. With no authentication on the messages, it's easy for an abuser to spoof addresses, which makes taking corrective action difficult.

There are two potential solutions which come to mind:

1. Add significant infrastructure to support Zero-Knowledge proofs to work around indeterminable senders.
2. Temporarily abandon message deniability to serve short-term needs.

Neither of these feels great. For option 1, the level of effort is quite close to that of a postage-like solution. For option 2, moving away from privacy feels like a step in the wrong direction. Given that this solution is likely to be in place for significantly longer than intended, implementing solutions which are not in line with our beliefs is harder to get behind.

Risks?

The proposed plan is not sufficient and the network is dominated by abusers.
The rate-limiting parameters are excessively restrictive and innocent users are denied access.
The approach is deemed too centralized and the reputation is tarnished within the community.
The aggregate approach of rate limiting and bans is more complicated than a

Questions

What should the criteria be to institute a ban? (Parameter Tuning)
Is there a nominal use case where clients would be connecting to multiple nodes in a short period of time? Or can this behavior be labeled suspect?
There is currently no concept of registration on the xmtpNodes. Posting an identity key is the same as posting a message. Should the nodes remain message agnostic?
Ought there be rate-limits for allow-listed identities?
