
Abuse Prevention and Management

Created: 2022-03-24
Author(s): @jazzz
Status: Draft

Overview

Outlines the high-level approach to abuse mitigation through the use of authenticated libp2p transports, rate limiting, and identity banning.

Background & Motivation

Starting from launch day, XMTP will need a system to limit abuse of the network and nodes. 'Abuse' in this case refers to 'attempting to use the network for a purpose which was not intended and causes harm.' To ensure a positive experience for users, XMTP requires a method to:

  1. Stop these behaviors from occurring
  2. Recover from an incident should it occur anyway

The long-term solution to abuse will likely involve a complex de-incentivization scheme and zero-knowledge proofs. In the interim, a partial solution is required to protect the network until a longer-term solution is ready.

Goals / Non-goals

  • Goals
    • Minimize effects of bad actors
  • Non-goals
    • Solutions to low quality messages
    • Crypto-system hardening

Guiding Principles

  • Despite all care and consideration, XMTP will contain vulnerabilities when it launches.
  • Waku2 is not ready to handle malicious nodes; this will need to be addressed.

Proposed Solution

  1. Control identity creation to limit abusers

  2. Control how messages are sent

    • Allow messages to be sent if and only if the identity exists and is not present on the Deny-list
    • Use rate limits to control excessive messages
  3. Monitor usage to ban accounts as necessary.

Note: Because creating an identity is non-trivial, getting banned forces abusers to incur that cost again to regain access to the system.

Identity Creation Control

In this approach, it is important to control the ability to create identities. If an abuser can freely create new identities, then rate-limiting and banning identities will have no meaningful effect.

The rules for identity creation are:

  • Allow-listed users are always allowed to register.
  • Deny-listed users are forbidden from registering.
  • Wallets must have an established history and have incurred a cost at some point.

Allow / Deny Lists

The Allow-list enables XMTP Labs to correct any issues with the abuse system. Some outlier use cases are difficult to capture with a generic approach to abuse management. These cases can be addressed by explicitly allowing certain wallets/identities to bypass the rules. The Allow-list is expected to be used seldom, and the abuse ruleset ought to capture the vast majority of cases.

The deny-list serves as the primary form of consequence in the network. It specifies wallets/identities that are not allowed to send on the network because of past abusive behavior.

The allow-list always takes precedence over the deny-list. In cases where a valid user is banned from the system, the allow-list can be manually edited to override the default behavior.
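The registration rules and the precedence of the allow-list over the deny-list can be sketched as a single decision function. This is a minimal, hypothetical Python sketch: `may_register` and the `has_incurred_cost` callback are illustrative names rather than part of any XMTP API, and comparing addresses case-insensitively is an assumption.

```python
from typing import Callable, Iterable


def may_register(
    address: str,
    allow_list: Iterable[str],
    deny_list: Iterable[str],
    has_incurred_cost: Callable[[str], bool],
) -> bool:
    """Hypothetical registration check following the identity creation rules."""
    addr = address.lower()  # assumption: wallet addresses compared case-insensitively
    if addr in {a.lower() for a in allow_list}:
        return True  # the allow-list always takes precedence over the deny-list
    if addr in {d.lower() for d in deny_list}:
        return False  # deny-listed wallets are forbidden from registering
    return has_incurred_cost(addr)  # otherwise require a history of incurred cost
```

Checking the allow-list first is what lets a manual allow-list edit override a ban without touching the deny-list.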

Registration Costs

The cost aspect of registration is important to protect against users/bots easily creating multiple accounts. If accounts were free, someone could create an unlimited number of accounts, which would undermine our ability to restrict them on the network. The cost component is a balance between two competing goals:

  • Minimizing barriers to entry for valid users
  • Making it cost prohibitive for abusers to generate 'throw-away' accounts.

To balance these two goals, explicit payment is not required; however, a history of previously incurred financial costs is. The ruleset is as follows:

  • Allow if the address has ever appeared as the 'from' address on an Ethereum token transaction (i.e., it paid a gas fee)
    • Includes contract approvals
  • Allow if the address has ever appeared as the 'to' address on a non-zero Ethereum token transaction.

Transaction information is available via the Alchemy API, which also includes ERC-20 tokens. Alternatively, there is the option to run an Ethereum node; however, this would require building a database of all transactions. Details on the specific implementation are to follow.
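A minimal sketch of the cost ruleset, assuming the transaction history has already been fetched into a simplified, hypothetical `Transfer` record. This record is an illustration only and is not the shape of any Alchemy API response.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Transfer:
    """Hypothetical, simplified transaction record (not an Alchemy API type)."""
    from_address: str
    to_address: str
    value: int  # amount in the token's smallest unit; 0 for e.g. approvals


def has_incurred_cost(address: str, transfers: Iterable[Transfer]) -> bool:
    addr = address.lower()
    for tx in transfers:
        # Rule 1: the address sent a transaction and therefore paid gas
        # (this includes zero-value contract approvals).
        if tx.from_address.lower() == addr:
            return True
        # Rule 2: the address received a non-zero token transfer.
        if tx.to_address.lower() == addr and tx.value > 0:
            return True
    return False
```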

Rate-limiting

Performing rate limiting at the XmtpMessage level is problematic if message deniability is to be maintained. Without verifiable sender information, it is difficult to ensure that the actual sender is throttled, rather than the account whose address was spoofed.

Rate limiting at the libp2p transport/stream layer is an alternative. Suggested flow:

  1. On startup, the client sends proof of identity to the node.
    • The node stores a mapping (PeerId -> XmtpIdentity). The XmtpIdentity remains publicly deniable, given that the nodes are controlled by XMTP and the channels are secured via Noise.
  2. The node does not accept messages from a PeerId that has not provided a proof of identity.
  3. Rate limits are applied to XmtpIdentities via the PeerId mapping.
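The suggested flow can be sketched as a small gate that a node consults for each incoming message. This is a hypothetical illustration: `IdentityGate`, `verify_proof`, and the `rate_limiter` callback are assumed names, and the form of the identity proof is left abstract because the proposal does not specify it.

```python
from typing import Callable, Dict, Optional


class IdentityGate:
    """Hypothetical sketch of the PeerId -> XmtpIdentity mapping kept by a node."""

    def __init__(self, verify_proof: Callable[[str, object], Optional[str]]):
        # verify_proof(peer_id, proof) returns the proven XmtpIdentity, or None.
        self._verify_proof = verify_proof
        self._identities: Dict[str, str] = {}

    def on_connect(self, peer_id: str, proof: object) -> bool:
        # Step 1: record which identity this peer has proven.
        identity = self._verify_proof(peer_id, proof)
        if identity is None:
            return False
        self._identities[peer_id] = identity
        return True

    def accept_message(self, peer_id: str, rate_limiter: Callable[[str], bool]) -> bool:
        identity = self._identities.get(peer_id)
        if identity is None:
            return False  # Step 2: no proof of identity, no messages
        return rate_limiter(identity)  # Step 3: limits keyed by identity, not PeerId
```

Keying the limiter by identity rather than PeerId means that reconnecting under a fresh PeerId does not reset an identity's limits.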

Two algorithms were considered:

Fixed window rate limits

Limits are set for several time windows, and each counter is cleared once its window expires.

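Example (maximum messages per window):

Group      | Hourly | Daily  | Weekly | Monthly
-----------|--------|--------|--------|--------
Allow-List | 1000   | 10000  | 20000  | 50000
Public     | 100    | 1000   | 5000   | 15000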

Advantages:

  • Can provide fine-grained control, particularly over longer time periods.

Disadvantages:

  • Increased memory footprint.
  • More effort to implement.
  • Can cause bursty usage, as all users' windows reset at the same time.**
  • Inconsistent user experience when a time window expires.**

** A sliding-window variant addresses these concerns, but adds extra complications.
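A minimal fixed-window sketch using the example limits from the table, assuming windows aligned to the Unix epoch and a 30-day "month" (both assumptions). Because every identity shares the same window boundaries, it also exhibits the simultaneous reset listed under disadvantages. `FixedWindowLimiter` is a hypothetical name.

```python
import time
from typing import Callable, Dict, Tuple

# Window lengths in seconds; a "month" is approximated as 30 days.
WINDOWS = {"hourly": 3600, "daily": 86400, "weekly": 7 * 86400, "monthly": 30 * 86400}

# Example limits from the table.
LIMITS = {
    "allow-list": {"hourly": 1000, "daily": 10000, "weekly": 20000, "monthly": 50000},
    "public": {"hourly": 100, "daily": 1000, "weekly": 5000, "monthly": 15000},
}


class FixedWindowLimiter:
    def __init__(self, limits: Dict[str, Dict[str, int]],
                 clock: Callable[[], float] = time.time):
        self._limits = limits
        self._clock = clock
        # (identity, window name) -> (window index, messages counted in that window)
        self._counters: Dict[Tuple[str, str], Tuple[int, int]] = {}

    def allow(self, identity: str, group: str) -> bool:
        now = self._clock()
        pending = {}
        for window, length in WINDOWS.items():
            index = int(now // length)  # all identities share aligned windows
            saved_index, count = self._counters.get((identity, window), (index, 0))
            if saved_index != index:
                count = 0  # the window expired, so its counter is cleared
            if count >= self._limits[group][window]:
                return False
            pending[window] = (index, count)
        # Count the message only once every window has room for it.
        for window, (index, count) in pending.items():
            self._counters[(identity, window)] = (index, count + 1)
        return True
```

Each identity holds one counter per window, which is the extra memory footprint noted above.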

Token Bucket Algorithm

A user is required to have a msgToken in order to send a message. Users are granted X msgTokens per timeframe and may accrue a stockpile of at most B tokens. For example: 1 token/minute with a bucket size of 100 tokens. In this example, a user could send 100 messages immediately, but would then have to wait 1 minute to send the 101st message. Alternatively, the user could send messages at a rate of 2 per minute for 1h40m, since a new token is allotted every minute and the stockpile therefore drains by only 1 token per minute.
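A minimal token bucket sketch; the class name and the injectable clock are assumptions for illustration. With `rate=1` token per minute and `capacity=100`, it reproduces the example: 100 immediate messages succeed, and a sustained 2 messages per minute empties the bucket after roughly 100 minutes.

```python
import time
from typing import Callable


class TokenBucket:
    """Refills `rate` tokens per clock unit, holding at most `capacity` tokens."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start with a full bucket
        self._last = clock()

    def try_consume(self) -> bool:
        now = self._clock()
        # Add tokens for the elapsed time, never exceeding the bucket size B.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

Per identity, the state is just a token count and a timestamp, which is why the algorithm is so memory efficient.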

Advantages:

  • Easier to implement and to understand what is happening
  • Memory efficient (as little as 8 bytes per identity)

Disadvantages:

  • Lacks sensitivity to high usage over time.
  • Two parameters make it difficult to tune (specifically at longer timescales). To enable bursty traffic, the bucket size must be increased; however, this also raises the maximum number of messages that can be sent over any longer period.

Monitoring and Banning

Logging

Nodes ought to create log entries to increase visibility into the abuse management system. These logs can help:

  • Auto-ban accounts that send too many messages (specific criteria are undefined).
  • Determine whether the rate limits need to be adjusted (e.g., due to excessive false positives).

Suggested changes:

  • Add a log entry when an identity is rate-limited
  • Add a log entry when an identity is within 10% of its rate limit
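The two suggested log entries can be sketched with Python's standard `logging` module. The logger name `xmtp.abuse` and the `log_rate_limit_state` helper are hypothetical, and "within 10%" is interpreted here as having used at least 90% of the limit.

```python
import logging

logger = logging.getLogger("xmtp.abuse")  # hypothetical logger name


def log_rate_limit_state(identity: str, used: int, limit: int) -> str:
    """Emit the suggested log entries; returns which one fired, or ""."""
    if used >= limit:
        logger.warning("identity %s rate-limited (%d/%d)", identity, used, limit)
        return "rate_limited"
    # Integer comparison avoids float rounding: used >= 90% of limit.
    if 10 * used >= 9 * limit:
        logger.info("identity %s within 10%% of rate limit (%d/%d)", identity, used, limit)
        return "near_limit"
    return ""
```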

Log monitoring can be a manual process at first and can later be automated if needed.

Bans

When abusive behavior is detected, the offending user ought to be placed on the deny-list, effectively banning them from the system. (Note: because the allow-list takes precedence, banning an allow-listed user requires first removing them from the allow-list.)

When a ban occurs:

  • Users can still receive messages
  • Users can no longer send messages. Sending messages returns the same error as a rate-limit event.

There is an open question of "How long should an identity be banned when abuse is detected?"

  • Perma-ban: users are banned forever. The ban criteria are selected to ensure that bans only occur when egregious abuse has been found. Optionally, a communication channel can be set up for users to appeal.
  • Exponential-Backoff-Ban: users are banned for increasingly longer periods upon each subsequent ban, e.g., a day, a week, a month, a year.
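A minimal sketch of the Exponential-Backoff-Ban option. It assumes a month is approximated as 30 days, a year as 365 days, and that repeat offenders beyond the fourth ban stay at one year, since the proposal does not say what follows "Year". Matching the ban rules, the ban only gates sending. `DenyList` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Dict

# Example durations: a day, a week, a month (~30 days), a year (~365 days).
BAN_STEPS = [timedelta(days=1), timedelta(weeks=1), timedelta(days=30), timedelta(days=365)]


class DenyList:
    def __init__(self) -> None:
        self._ban_count: Dict[str, int] = {}
        self._banned_until: Dict[str, datetime] = {}

    def ban(self, identity: str, now: datetime) -> datetime:
        n = self._ban_count.get(identity, 0)
        # Assumption: once the last step is reached, further bans stay at that length.
        duration = BAN_STEPS[min(n, len(BAN_STEPS) - 1)]
        self._ban_count[identity] = n + 1
        until = now + duration
        self._banned_until[identity] = until
        return until

    def can_send(self, identity: str, now: datetime) -> bool:
        # Receiving is never blocked; only sending is checked here.
        until = self._banned_until.get(identity)
        return until is None or now >= until
```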

Future Work

Honeypots

Honeypot wallets are XMTP-controlled wallets that provide visibility into what is happening on the network. In the early days, these accounts can provide the data required to take corrective action.

Setting up wallets with specific parameters will allow detection of specific events:

  • A wallet that does not appear on any block explorer (registration enabled via the allow-list):
    • It should be impossible to discover this wallet's address.
    • Any message sent to it should result in an auto-ban of the sender and a security review of how the address was leaked.
  • A wallet that holds some smaller altcoins:
    • Messages sent to it likely come from spammers scraping addresses from block explorers.
    • Messages should be monitored and senders manually banned if appropriate. Low-quality messages are out of scope.

All of these wallets can be connected to Slack bots so they can be monitored efficiently. Honeypot wallets are not needed for launch and will be a low priority.

Field Reports

Clients could automatically report back to nodes when they detect abusive behavior. Specifically, if a client observes a message with a spoofed sender address, it can notify the network to remove the message.

The ability to clean up fraudulent messages minimizes their long-term damage.

There are many open questions which need to be resolved before putting this mechanism into production:

  • How can nodes verify that a field report is trustworthy?
  • If the sender address is spoofed, what actions can be taken to limit this behavior?

More thought is needed.

Summary

Identity Registration:

  • Allow if: wallet is on the allow-list
  • Deny if: wallet is on the deny-list
  • Allow if: wallet has previously incurred a cost

To send messages an associated identity must satisfy all constraints:

  • Not be on the deny-list
  • Have a public keyBundle registered on the network
  • Have enough rate-limit msgTokens
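The sending constraints can be combined into one check. In this hypothetical sketch, a rate-limit token is only consumed once the other two constraints pass, and a single `bool` is returned so that a banned sender sees the same result as a rate-limited one, matching the ban behavior described earlier. The function name and its arguments are illustrative assumptions.

```python
from typing import Callable, Set


def can_send(
    identity: str,
    deny_list: Set[str],
    key_bundles: Set[str],
    take_token: Callable[[str], bool],
) -> bool:
    if identity in deny_list:
        return False  # banned: indistinguishable from a rate-limit rejection
    if identity not in key_bundles:
        return False  # no public keyBundle registered on the network
    return take_token(identity)  # consume a msgToken only if the checks above pass
```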

Path To Decentralization

This entire work will become obsolete once economic spam de-incentivization is implemented.

  • Rate-limiting

    • Replaced by some form of postage
    • Messages will not require artificial limiting because they would become cost prohibitive at a meaningful level of traffic.
  • Blockchain transaction requirement

    • Replaced by some form of postage
    • While the code for this will likely disappear, postage will likely impose a similar requirement in spirit.
  • Bans

    • Phased out as postage provides the main deterrent.
    • Account level banning is only required to enforce message rate-limits.
    • As the network becomes decentralized, there is no need for XMTP Labs to police the network.

Plan & Timeline

  • Allow/Deny list: 3 days
  • Blockchain data integration: 1 week
  • Rate-limiting: 1 week
  • Logging: 1 day

Dependencies

Alternatives Considered / Prior Art?

Zero Knowledge proof of membership

Proving membership in zero knowledge via Merkle trees is a method to ensure that users are allowed to send messages while also preserving a sender's privacy. However, the most pressing issue in XMTP is de-incentivizing users from abusing the system. Proving membership is not particularly useful here, as the identity is still needed to apply rate limits and bans.

Simple Postage

Requiring every message to pay a transaction fee would reduce (if not eliminate) excessive messages on the network. Unfortunately, this would adversely impact network growth and make adoption cost prohibitive.

One option that is being pursued is 'Collateralized Postage', where every message has a value attached to it as collateral. If that message is fraudulent or undesired, the recipient can choose to claim the collateral. This provides many great properties:

  • Free messages between willing participants
  • Native Subscribe/Unsubscribe mechanics

While this approach shows great promise, the timeline for such a project is difficult to estimate. Rate limiting and bans are imperfect, but the effort to implement them is more quantifiable.

Do Nothing

Given the costs of implementing a temporary solution, one must ask: "Is it really needed, or can we get by without it for the time being?" With an estimated throughput of thousands of messages per second, the network could become saturated by a few malicious laptops. With significant reputational risk at stake, some action is required to thwart intentional abusers.

Closed Network

The risk of abuse grows the more public the network is. If sending permission were limited to a small, select group of individuals, abuse would be a non-issue. Given that network growth is a primary goal, this closed approach is counter-productive.

Adding a trusted referral component could allow the network to expand, but would ultimately result in an effectively open network once it reaches a critical size.

Without some form of repercussion for inviting abusers to the network, this trust model is effectively equivalent to a wide-open network. This is further compounded by the fact that, once abusers are on the network, identifying the culprit is difficult due to message deniability.

Abandon Message Deniability

One of the limiting factors on the solution space is that, by design, nodes cannot determine who the message sender is. With no authentication on messages, it's easy for an abuser to spoof addresses, which makes taking corrective action difficult.

There are two potential solutions which come to mind:

  1. Add significant infrastructure to support Zero-Knowledge proofs to work around indeterminable senders
  2. Temporarily abandon message deniability to serve short term needs.

Neither option is appealing. The level of effort for option 1 is quite close to that of a postage-like solution, while option 2, moving away from privacy, is a step in the wrong direction. Given that this solution is likely to remain in place significantly longer than intended, it is hard to get behind solutions that are not in line with our beliefs.

Risks?

  • The proposed plan is insufficient and the network is dominated by abusers
  • The rate limiting parameters are excessively restrictive and innocent users are denied access.
  • The approach is deemed too centralized and the reputation is tarnished within the community
  • The aggregate approach of rate limiting and bans is more complicated than a

Questions

  • What should the criteria be to institute a ban? (Parameter tuning)

  • Is there a nominal use case where clients would connect to multiple nodes in a short period of time, or can this behavior be labeled suspect?

  • There is currently no concept of registration on the xmtpNodes. Posting an identity key is the same as posting a message. Should the nodes remain message-agnostic?

  • Should there be rate limits for allow-listed identities?

Appendix