# Fedivet

WIP! Not usable yet.

Software-defined Application Firewall for ActivityPub inboxes.

## Objective

The goal of this project is to write a dynamic trust evaluation framework for ActivityPub inboxes
that does not rely on the message content itself by default.

Currently I want to stay as unopinionated as possible and focus on the infrastructure and features. After I have tested the system, I plan to roll out more opinionated and easy-to-use configurations. I also plan to allow zero-code or even zero-configuration deployments.

## What's wrong with current solutions?

### Content-based Filtering

Content-based filtering is an effective and well-known method, particularly in the email world, but it may not be as effective in the ActivityPub world. Reasons include:

- Emails are usually formal and have a clear structure, while ActivityPub messages are more free-form and can contain arbitrary data, some of which even humans have trouble understanding.
- Short, confusing messages are usually spam in email, but usually not in ActivityPub.
- Content-based filtering is not effective against spam that does not live in the text content, such as follow requests, photos, links, etc.
- It is difficult to set a hard threshold for things like "too many mentions" or "too many links" because such messages are not always spam. A heated debate can easily trigger these thresholds while being legitimate.

### Instance-level Blocking

Instance-level blocking is not effective against spam and abuse where the attacker controls multiple instances (either by owning the domains themselves or by hijacking open-registration instances).

Large instances are sometimes used to send spam as well, and it is not always desirable to block them, especially if they are not malicious themselves.

### Our Approach

Instead, we focus on machine-readable data that can be used to evaluate the trustworthiness of incoming requests, taking into account context at the global, instance, user, and message levels.

## Infrastructure

## Data Sources

In addition to the decoded inbox message itself,
we provide keyed LRU caches with TTL to retrieve additional supporting 
information about the incoming requests,
such as domain history, user history, instance metadata, etc.
Multiple requests for the same key are deduplicated, so only one upstream request is made.
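
To make the caching behaviour above concrete, here is a minimal synchronous sketch of a keyed LRU cache with a TTL. It is an illustration only: `TtlLru` and `get_or_fetch` are hypothetical names, not Fedivet's `LruData` API, and the sketch does not show deduplication of in-flight async requests.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a keyed data source cache (not Fedivet's `LruData`).
pub struct TtlLru<K, V> {
    capacity: usize,
    ttl: Duration,
    /// Cached values together with the time they were stored.
    entries: HashMap<K, (V, Instant)>,
    /// Keys ordered from least to most recently used.
    order: VecDeque<K>,
}

impl<K: Hash + Eq + Clone, V: Clone> TtlLru<K, V> {
    pub fn new(capacity: usize, ttl: Duration) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, ttl, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Returns the cached value for `key`, calling `fetch` only on a miss
    /// or when the cached entry is older than the TTL.
    pub fn get_or_fetch<E>(
        &mut self,
        key: K,
        fetch: impl FnOnce(&K) -> Result<V, E>,
    ) -> Result<V, E> {
        let now = Instant::now();
        let fresh = match self.entries.get(&key) {
            Some((value, stored_at)) if now.duration_since(*stored_at) < self.ttl => {
                Some(value.clone())
            }
            _ => None,
        };
        let value = match fresh {
            Some(v) => v,
            None => {
                // Errors are returned to the caller and never cached.
                let v = fetch(&key)?;
                self.entries.insert(key.clone(), (v.clone(), now));
                v
            }
        };
        // Mark the key as most recently used.
        self.order.retain(|k| k != &key);
        self.order.push_back(key);
        // Evict least recently used keys until we are within capacity.
        while self.entries.len() > self.capacity {
            match self.order.pop_front() {
                Some(oldest) => {
                    self.entries.remove(&oldest);
                }
                None => break,
            }
        }
        Ok(value)
    }
}
```

An evaluator could then call something like `cache.get_or_fetch(host, fetch_instance_metadata)` (with `fetch_instance_metadata` being whatever lookup the data source performs) and pay for the upstream request at most once per key per TTL window.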

### Built-in Data Sources

- [ ] [Fediverse Observer](https://fediverse.observer)
- [ ] Domain WHOIS
- [ ] Advertised NodeInfo
- [ ] Federation Reports from Friend Instances
  
## Evaluators

An evaluator is an async function that takes an incoming request and either passes it through or returns a response early. When all evaluators pass the request, it is forwarded to the backend.

Evaluators can be written as a free async closure or a struct implementing the `Evaluator` trait.
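
The pass-or-return-early behaviour can be sketched as a chain that stops at the first denial. The sketch below is synchronous for brevity and uses hypothetical types (`Request`, `Denied`, `BlockedHost`, `run_chain`); the real `Evaluator` trait is async and its exact signature may differ.

```rust
/// Hypothetical, simplified view of an incoming inbox request.
pub struct Request {
    pub actor_host: String,
}

/// Hypothetical early response: the reason the request was rejected.
#[derive(Debug, PartialEq)]
pub struct Denied(pub &'static str);

/// Simplified, synchronous stand-in for an evaluator.
pub trait Evaluator {
    fn evaluate(&self, req: &Request) -> Result<(), Denied>;
}

/// Any closure with the right signature can be used as an evaluator.
impl<F> Evaluator for F
where
    F: Fn(&Request) -> Result<(), Denied>,
{
    fn evaluate(&self, req: &Request) -> Result<(), Denied> {
        self(req)
    }
}

/// A struct-based evaluator that denies one specific host.
pub struct BlockedHost(pub &'static str);

impl Evaluator for BlockedHost {
    fn evaluate(&self, req: &Request) -> Result<(), Denied> {
        if req.actor_host == self.0 {
            Err(Denied("host is blocked"))
        } else {
            Ok(())
        }
    }
}

/// Runs evaluators in order and returns the first denial, if any.
/// `Ok(())` means the request may be forwarded to the backend.
pub fn run_chain(evaluators: &[Box<dyn Evaluator>], req: &Request) -> Result<(), Denied> {
    for evaluator in evaluators {
        evaluator.evaluate(req)?;
    }
    Ok(())
}
```

Because the chain short-circuits, cheap checks are best registered before evaluators that need a data source lookup.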

### Built-in Evaluators

- [ ] Dangling mentions: If most of the mentioned users do not appear in the parent message, the message is likely spam.
- [ ] High-frequency messages with highly similar content (maybe Bloom Filter or a real Fuzzy Hash)
- [ ] Completely fresh instance sending PMs or a large number of mentions
- [ ] Open Registration Instances with Abnormal User Growth
- [ ] WHOIS from known bad registrars
- [ ] Instances already blocked by Friend Instances

```rs
#[allow(clippy::unused_async)]
async fn build_state(args: &Args) -> AppState<MisskeyError> {
    let mut state = AppState::new(args.backend.parse().expect("Invalid backend URL"));

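    // Keyed LRU cache of per-instance data: 512 entries with a 600-second TTL.
    // The fetcher is still a placeholder that returns a constant.
    let instance_history = Arc::new(LruData::sized(
        &|host| async move { Ok::<_, ()>("Todo") },
        512.try_into().unwrap(),
        Some(Duration::from_secs(600)),
    ));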

    state.push_evaluator(Box::new(move |info: &APRequestInfo<'_>| {
        let act = info.activity.as_ref().map_err(|_| ERROR_DENIED).cloned();
        let instance_history = Arc::clone(&instance_history);

        async move {
            let act = act?;
            let host = act
                .actor
                .as_ref()
                .and_then(|s| Url::parse(s).ok())
                .ok_or(ERROR_DENIED)?;

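            // The actor URL is attacker-controlled and may have no host
            // (e.g. a `mailto:` URL), so deny instead of panicking.
            let instance = instance_history
                .query(host.host_str().ok_or(ERROR_DENIED)?.to_owned())
                .await;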

            match instance {
                Ok(i) => {
                    log::info!("Instance history: {:?}", i);
                    Ok(())
                }
                Err(_) => Err::<(), _>(ERROR_DENIED),
            }
        }
    }));

    // let user_history = Arc::new(LruData::sized( ... ));

    state.push_evaluator(Box::new(|info: &APRequestInfo<'_>| {
        let act = info.activity.as_ref().map_err(|_| ERROR_DENIED).cloned();
        async move {
            let act = act?;
            log::debug!("Activity: {:?}", act);
            Ok(())
        }
    }));

    state
}
```
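
As an illustration of the "dangling mentions" idea from the built-in evaluator list, the following sketch computes the fraction of mentioned users who do not take part in the parent message. It is not the planned implementation: the function names, the use of plain handle strings, and the reading of "most" as "more than half" are all assumptions.

```rust
use std::collections::HashSet;

/// Fraction of `mentions` that do not appear among the participants of the
/// parent message (assumed here to be its author plus the users it mentions).
/// Returns 0.0 when there are no mentions at all.
pub fn dangling_ratio(mentions: &[&str], parent_participants: &[&str]) -> f64 {
    if mentions.is_empty() {
        return 0.0;
    }
    let known: HashSet<&str> = parent_participants.iter().copied().collect();
    let dangling = mentions
        .iter()
        .copied()
        .filter(|mention| !known.contains(mention))
        .count();
    dangling as f64 / mentions.len() as f64
}

/// Assumption: "most" means strictly more than half of the mentions.
pub fn is_likely_dangling_spam(mentions: &[&str], parent_participants: &[&str]) -> bool {
    dangling_ratio(mentions, parent_participants) > 0.5
}
```

A check like this only makes sense for replies. A top-level post has no parent, so every mention would count as dangling and the caller should skip the check in that case.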