Software-defined Application Firewall for ActivityPub inboxes.

eternal-flame-AD 46456b0a61 update deps Signed-off-by: eternal-flame-AD <yume@yumechi.jp>		2024-11-22 08:54:39 -06:00
ci	Dependency audit	2024-10-17 13:32:33 -05:00
src	make context optional	2024-11-22 08:54:22 -06:00
test-data	Add some AP examples	2024-10-15 03:21:56 -05:00
.gitignore	fix reverse proxying	2024-10-16 20:18:00 -05:00
Cargo.lock	update deps	2024-11-22 08:54:39 -06:00
Cargo.toml	add filter module	2024-11-19 01:08:13 -06:00
deny.toml	Dependency audit	2024-10-17 13:32:33 -05:00
LICENSE	init	2024-10-15 00:01:08 -05:00
LICENSE-dependencies	Dependency audit	2024-10-17 13:32:33 -05:00
README.md	Add some AP examples	2024-10-15 03:21:56 -05:00

README.md

Fedivet

WIP! Not usable yet.

Software-defined Application Firewall for ActivityPub inboxes.

Objective

The goal of this project is to build a dynamic trust evaluation framework for ActivityPub inboxes that, by default, does not rely on the message content itself.

For now I want to stay as unopinionated as possible and focus on infrastructure and features. Once I have tested the system, I plan to roll out more opinionated, easy-to-use configurations. I also plan to support zero-code or even zero-configuration deployments.

What's wrong with current solutions?

Content-based Filtering

Content-based filtering is an effective and well-known method, particularly in the email world, but it may not be as effective in the ActivityPub world. Reasons include:

Emails are usually formal and have a clear structure, while ActivityPub messages are more free-form and can contain arbitrary data, some of which even humans have trouble understanding.
Short, confusing messages are usually spam in email, but usually not in ActivityPub.
Content-based filtering is not effective against spam that is not in the content, such as follow requests, photos, links, etc.
It is hard to set a hard threshold for things like "too many mentions" or "too many links" because they are not always spam. A heated debate can easily trigger these thresholds while being legitimate.

Instance-level Blocking

Instance-level blocking is not effective against spam and abuse where the attacker has control over multiple instances (either by owning the domains themselves or by hijacking open registration instances).

Sometimes large instances are also used to send spam, and it is not always desirable to block them, especially if they are not malicious by themselves.

Our Approach

Instead, we focus on machine-readable data that can be used to evaluate the trustworthiness of incoming requests, taking into account context at the global, instance, user, and message levels.

Infrastructure

Data Sources

In addition to the decoded inbox message itself, we provide keyed LRU caches with TTL to retrieve additional supporting information about the incoming requests, such as domain history, user history, instance metadata, etc. Multiple requests to the same key will be deduplicated and only one request will be made.
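
To make the caching behaviour concrete, here is a minimal synchronous sketch of a keyed cache with a TTL and least-recently-used eviction. It is not the project's `LruData`: the names `TtlLru` and `get_or_fetch` are hypothetical, the fetcher is a plain closure rather than an async function, and the deduplication of in-flight requests described above is left out.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Hypothetical TTL + LRU cache for illustration; not the project's `LruData`.
struct TtlLru<K, V> {
    capacity: usize,
    ttl: Duration,
    tick: u64, // logical clock used to order accesses
    entries: HashMap<K, Entry<V>>,
}

struct Entry<V> {
    value: V,
    stored_at: Instant,
    last_used: u64,
}

impl<K: Hash + Eq + Clone, V: Clone> TtlLru<K, V> {
    fn new(capacity: usize, ttl: Duration) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, ttl, tick: 0, entries: HashMap::new() }
    }

    /// Returns the cached value if it is still fresh; otherwise calls
    /// `fetch`, stores the result, and returns it.
    fn get_or_fetch(&mut self, key: K, fetch: impl FnOnce(&K) -> V) -> V {
        self.tick += 1;
        let tick = self.tick;
        if let Some(entry) = self.entries.get_mut(&key) {
            if entry.stored_at.elapsed() < self.ttl {
                entry.last_used = tick;
                return entry.value.clone();
            }
        }
        // Cache miss or expired entry: fetch a new value.
        let value = fetch(&key);
        if !self.entries.contains_key(&key) && self.entries.len() >= self.capacity {
            // Evict the least recently used entry (linear scan, fine for a sketch).
            let victim = self
                .entries
                .iter()
                .min_by_key(|(_, e)| e.last_used)
                .map(|(k, _)| k.clone());
            if let Some(k) = victim {
                self.entries.remove(&k);
            }
        }
        let entry = Entry { value: value.clone(), stored_at: Instant::now(), last_used: tick };
        self.entries.insert(key, entry);
        value
    }
}
```

A caller would create one cache per data source, for example `TtlLru::new(512, Duration::from_secs(600))`, and call `get_or_fetch(host, ...)` for each incoming request. A production version would likely replace the linear eviction scan with a proper LRU list.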

Built-in Data Sources

Fediverse Observer
Domain WHOIS
Advertised NodeInfo
Federation Reports from Friend Instances

Evaluators

An evaluator is an async function that takes an incoming request and either passes it through or returns a response early. Once all evaluators have passed the request, it is forwarded to the backend.

Evaluators can be written as a free async closure or a struct implementing the Evaluator trait.
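
The closure-or-struct idea can be sketched as follows. This is a deliberately simplified, synchronous version: the real evaluators are async, and the `Request`, `Response`, `MaxMentions`, `run_chain`, and `example_chain` names here are hypothetical stand-ins, not the crate's actual types or trait signature.

```rust
/// Hypothetical, simplified request type for illustration.
struct Request {
    actor_host: String,
    mentions: usize,
}

/// Hypothetical early response returned when an evaluator rejects a request.
#[derive(Debug, PartialEq)]
struct Response {
    status: u16,
}

/// Synchronous stand-in for the evaluator abstraction.
trait Evaluator {
    fn evaluate(&self, req: &Request) -> Result<(), Response>;
}

/// Blanket impl so that plain closures can be used as evaluators.
impl<F> Evaluator for F
where
    F: Fn(&Request) -> Result<(), Response>,
{
    fn evaluate(&self, req: &Request) -> Result<(), Response> {
        self(req)
    }
}

/// A struct-based evaluator that carries its own configuration.
struct MaxMentions {
    limit: usize,
}

impl Evaluator for MaxMentions {
    fn evaluate(&self, req: &Request) -> Result<(), Response> {
        if req.mentions > self.limit {
            Err(Response { status: 429 })
        } else {
            Ok(())
        }
    }
}

/// Runs evaluators in order; the first rejection short-circuits the chain.
/// `Ok(())` means the request may be forwarded to the backend.
fn run_chain(evaluators: &[Box<dyn Evaluator>], req: &Request) -> Result<(), Response> {
    for evaluator in evaluators {
        evaluator.evaluate(req)?;
    }
    Ok(())
}

/// Builds a chain with one closure evaluator and one struct evaluator.
fn example_chain() -> Vec<Box<dyn Evaluator>> {
    let mut chain: Vec<Box<dyn Evaluator>> = Vec::new();
    chain.push(Box::new(|req: &Request| {
        if req.actor_host == "blocked.example" {
            Err(Response { status: 403 })
        } else {
            Ok(())
        }
    }));
    chain.push(Box::new(MaxMentions { limit: 10 }));
    chain
}
```

A closure is convenient for one-off rules, while a struct is a better fit when the rule needs configuration or shared state such as a cache handle.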

Built-in Evaluators

Dangling mentions: If most of the mentioned users do not appear in the parent message, it is likely spam.
High-frequency messages with highly similar content (possibly detected with a Bloom filter or a proper fuzzy hash)
Completely fresh instances sending PMs or a large number of mentions
Open Registration Instances with Abnormal User Growth
WHOIS from known bad registrars
Instances already blocked by Friend Instances
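
As an illustration of the first heuristic in the list above, the dangling-mentions check could be sketched like this. The function name `mostly_dangling` is hypothetical, "most" is assumed to mean more than half, and "appearing in the parent message" is assumed to mean being the parent's author or one of its mentions; the project may define both differently.

```rust
use std::collections::HashSet;

/// Returns true when more than half of `mentions` do not appear among
/// `parent_participants` (assumed: the parent's author plus its mentions).
/// Both slices hold actor identifiers, for example actor URLs.
fn mostly_dangling(mentions: &[&str], parent_participants: &[&str]) -> bool {
    if mentions.is_empty() {
        return false; // nothing mentioned, nothing dangling
    }
    let known: HashSet<&str> = parent_participants.iter().copied().collect();
    let dangling = mentions.iter().filter(|m| !known.contains(*m)).count();
    // Strict majority: dangling / total > 1/2, written without floats.
    dangling * 2 > mentions.len()
}
```

A real evaluator would probably treat this as one signal among several rather than as a hard verdict, since a reply can legitimately pull new people into a thread.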

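// Example: build the application state and register two evaluators.
// `Url` is assumed to be `url::Url`; the remaining items come from this crate.
use std::sync::Arc;
use std::time::Duration;

#[allow(clippy::unused_async)]
async fn build_state(args: &Args) -> AppState<MisskeyError> {
    let mut state = AppState::new(args.backend.parse().expect("Invalid backend URL"));

    // Keyed LRU cache with a size of 512 and a 600-second duration.
    // The fetcher is still a placeholder that returns "Todo" for every host.
    let instance_history = Arc::new(LruData::sized(
        &|_host| async move { Ok::<_, ()>("Todo") },
        512.try_into().unwrap(),
        Some(Duration::from_secs(600)),
    ));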

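    // Evaluator 1: look up the sending instance and deny the request if the lookup fails.
    state.push_evaluator(Box::new(move |info: &APRequestInfo<'_>| {
        // Clone what the future needs so that it does not borrow from `info`.
        let act = info.activity.as_ref().map_err(|_| ERROR_DENIED).cloned();
        let instance_history = Arc::clone(&instance_history);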

        async move {
            let act = act?;
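            let actor_url = act
                .actor
                .as_ref()
                .and_then(|s| Url::parse(s).ok())
                .ok_or(ERROR_DENIED)?;

            // Deny actor URLs without a host instead of panicking on `unwrap()`.
            let host = actor_url.host_str().ok_or(ERROR_DENIED)?.to_owned();
            let instance = instance_history.query(host).await;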

            match instance {
                Ok(i) => {
                    log::info!("Instance history: {:?}", i);
                    Ok(())
                }
                Err(_) => Err::<(), _>(ERROR_DENIED),
            }
        }
    }));

    // let user_history = Arc::new(LruData::sized( ... ));

    // Evaluator 2: log the decoded activity and let the request pass.
    state.push_evaluator(Box::new(|info: &APRequestInfo<'_>| {
        let act = info.activity.as_ref().map_err(|_| ERROR_DENIED).cloned();
        async move {
            let act = act?;
            log::debug!("Activity: {:?}", act);
            Ok(())
        }
    }));

    state
}