Software-defined Application Firewall for ActivityPub inboxes.
## Objective
The goal of this project is to write a dynamic trust evaluation framework for ActivityPub inboxes
that does not rely on the message content itself by default.
For now, I want to stay as unopinionated as possible and focus on infrastructure and features. Once I have tested the system, I plan to roll out more opinionated, easy-to-use configurations. I also plan to support zero-code or even zero-configuration deployments.
## What's wrong with current solutions?
### Content-based Filtering
While content-based filtering is effective and well known, particularly in the email world, it may not be as effective in the ActivityPub world. Reasons include:
- Emails are usually formal and clearly structured, while ActivityPub messages are more free-form and can contain arbitrary data, some of which even humans have trouble understanding.
- Short, confusing messages are usually spam in email, but usually not in ActivityPub.
- Content-based filtering is not effective against spam that is not in the content, such as follow requests, photos, links, etc.
- It is hard to set a hard threshold for things like "too many mentions" or "too many links" because they are not always spam. A heated debate can easily trigger these thresholds while being legitimate.
### Instance-level Blocking
Instance-level blocking is not effective against spam and abuse where the attacker has control over multiple instances (either by owning the domains themselves or by hijacking open registration instances).
Sometimes large instances are also used to send spam, and it is not always desirable to block them, especially if they are not malicious by themselves.
### Our Approach
Instead, we focus on machine-readable data that can be used to evaluate the trustworthiness of incoming requests, taking into account context at the global, instance, user, and message levels.
## Infrastructure
### Data Sources
In addition to the decoded inbox message itself,
we provide keyed LRU caches with TTL to retrieve additional supporting
information about the incoming requests,
such as domain history, user history, instance metadata, etc.
Multiple requests for the same key are deduplicated, so only one upstream fetch is made.
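
To make the caching behaviour concrete, here is a minimal sketch using only the standard library. `TtlLruCache` and `get_or_fetch` are hypothetical names chosen for illustration, not this project's API. The sketch is synchronous and uses `OnceLock` as a stand-in for the async deduplication described above: callers asking for a key that is already being fetched wait for the first fetch instead of starting their own.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Mutex, OnceLock};
use std::time::{Duration, Instant};

/// One cache slot. The `OnceLock` deduplicates fetches: callers arriving
/// while the value is being fetched block until the first fetch finishes.
struct Slot<V> {
    value: Arc<OnceLock<V>>,
    created: Instant,
    last_used: u64,
}

struct Inner<K, V> {
    slots: HashMap<K, Slot<V>>,
    tick: u64, // logical clock used to order accesses for LRU eviction
}

/// Hypothetical keyed cache with a TTL and least-recently-used eviction.
pub struct TtlLruCache<K, V> {
    inner: Mutex<Inner<K, V>>,
    capacity: usize,
    ttl: Duration,
}

impl<K: Eq + Hash + Clone, V: Clone> TtlLruCache<K, V> {
    pub fn new(capacity: usize, ttl: Duration) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        let inner = Inner { slots: HashMap::new(), tick: 0 };
        Self { inner: Mutex::new(inner), capacity, ttl }
    }

    /// Returns the cached value for `key`, calling `fetch` only on a miss
    /// or after the entry has expired.
    pub fn get_or_fetch(&self, key: K, fetch: impl FnOnce() -> V) -> V {
        let cell = {
            let mut inner = self.inner.lock().unwrap();
            inner.tick += 1;
            let tick = inner.tick;
            // Drop the entry if its TTL (measured from slot creation) has passed.
            let expired = inner
                .slots
                .get(&key)
                .is_some_and(|s| s.created.elapsed() >= self.ttl);
            if expired {
                inner.slots.remove(&key);
            }
            // Evict the least recently used entry when a new key would overflow.
            if !inner.slots.contains_key(&key) && inner.slots.len() >= self.capacity {
                let victim = inner
                    .slots
                    .iter()
                    .min_by_key(|(_, s)| s.last_used)
                    .map(|(k, _)| k.clone());
                if let Some(k) = victim {
                    inner.slots.remove(&k);
                }
            }
            let slot = inner.slots.entry(key).or_insert_with(|| Slot {
                value: Arc::new(OnceLock::new()),
                created: Instant::now(),
                last_used: tick,
            });
            slot.last_used = tick;
            let cell = Arc::clone(&slot.value);
            cell
        }; // the map lock is released here, before the possibly slow fetch
        cell.get_or_init(fetch).clone()
    }
}
```

The linear scan for the eviction victim keeps the sketch short; a real cache would likely track recency with a linked structure or use an existing crate.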
An evaluator is an async function that takes an incoming request and either passes it through or returns a response early. When all evaluators pass the request, it is forwarded to the backend.
Evaluators can be written as a free async closure or as a struct implementing the `Evaluator` trait.
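
The two forms can be sketched as follows. Everything here is an assumption made for illustration: `InboxRequest`, `Response`, `Verdict`, `MaxMentions`, `run_chain` and the trait signature itself are hypothetical and may differ from the real `Evaluator` trait. Futures are boxed so that a struct and a closure can live in the same list, and `block_on` is a tiny stand-in for a real async runtime.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a decoded inbox request.
pub struct InboxRequest {
    pub actor: String,
    pub mentions: usize,
}

/// Hypothetical early response; only the status code is modelled.
#[derive(Debug, PartialEq)]
pub struct Response {
    pub status: u16,
}

/// Outcome of one evaluator: let the request continue, or answer it now.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Pass,
    Respond(Response),
}

/// Boxed future so evaluators of different types can share one list.
pub type BoxFuture<T> = Pin<Box<dyn Future<Output = T>>>;

pub trait Evaluator {
    fn evaluate(&self, req: &InboxRequest) -> BoxFuture<Verdict>;
}

/// Closure form: any closure with the right shape is an evaluator.
impl<F> Evaluator for F
where
    F: Fn(&InboxRequest) -> BoxFuture<Verdict>,
{
    fn evaluate(&self, req: &InboxRequest) -> BoxFuture<Verdict> {
        self(req)
    }
}

/// Struct form: rejects requests that mention more than `limit` users.
pub struct MaxMentions {
    pub limit: usize,
}

impl Evaluator for MaxMentions {
    fn evaluate(&self, req: &InboxRequest) -> BoxFuture<Verdict> {
        let over = req.mentions > self.limit;
        Box::pin(async move {
            if over { Verdict::Respond(Response { status: 429 }) } else { Verdict::Pass }
        })
    }
}

/// Builds a two-step chain: one struct evaluator and one closure evaluator.
pub fn example_chain() -> Vec<Box<dyn Evaluator>> {
    let deny_listed = |req: &InboxRequest| -> BoxFuture<Verdict> {
        let blocked = req.actor.ends_with("@spam.example");
        Box::pin(async move {
            if blocked { Verdict::Respond(Response { status: 403 }) } else { Verdict::Pass }
        })
    };
    let mut chain: Vec<Box<dyn Evaluator>> = Vec::new();
    chain.push(Box::new(MaxMentions { limit: 5 }));
    chain.push(Box::new(deny_listed));
    chain
}

/// Runs evaluators in order and stops at the first early response.
/// `Verdict::Pass` means the request would be forwarded to the backend.
pub async fn run_chain(evaluators: &[Box<dyn Evaluator>], req: &InboxRequest) -> Verdict {
    for evaluator in evaluators {
        if let Verdict::Respond(resp) = evaluator.evaluate(req).await {
            return Verdict::Respond(resp);
        }
    }
    Verdict::Pass
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, only suitable for demos and tests.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

In this sketch each evaluator copies what it needs out of the request before building its future, which keeps the closure form free of lifetime annotations at the cost of some flexibility.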
### Built-in Evaluators
- [ ] Dangling mentions: if most of the mentioned users do not appear in the parent message, the message is likely spam.
- [ ] High-frequency messages with highly similar content (possibly via a Bloom filter or a real fuzzy hash)
- [ ] A completely fresh instance sending private messages or a large number of mentions
- [ ] Open-registration instances with abnormal user growth
- [ ] WHOIS records pointing to known-bad registrars
- [ ] Instances already blocked by friend instances
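
As an illustration of the first item, the dangling-mentions check could be sketched like this. The function name and the exact rule are assumptions: "most" is read as "more than half", and "appears in the parent message" is read as membership in a set of parent participants (for example the parent's author and the users it mentions).

```rust
use std::collections::HashSet;

/// Returns true when more than half of `mentions` are absent from
/// `parent_participants`. A message without mentions is never flagged.
pub fn is_dangling(mentions: &[&str], parent_participants: &HashSet<&str>) -> bool {
    if mentions.is_empty() {
        return false;
    }
    let dangling = mentions
        .iter()
        .filter(|m| !parent_participants.contains(*m))
        .count();
    // Integer form of dangling / total > 1/2, avoiding floating point.
    dangling * 2 > mentions.len()
}
```

With this rule, a reply that mentions two strangers and one participant is flagged, while a reply where exactly half of the mentions are strangers is not.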