2020-02-13 00:45:15 -06:00
# Avoiding the N+1 Problem With Dataloaders
A common issue with GraphQL servers is how the resolvers query their data source.
This issue results in a large number of unnecessary database queries or HTTP requests.
Say you wanted to list a group of people along with the cult each one belongs to:
```graphql
query {
  persons {
    id
    name
    cult {
      id
      name
    }
  }
}
```
The SQL database would then execute:
```sql
SELECT id, name, cult_id FROM persons;
SELECT id, name FROM cults WHERE id = 1;
SELECT id, name FROM cults WHERE id = 1;
SELECT id, name FROM cults WHERE id = 1;
SELECT id, name FROM cults WHERE id = 1;
SELECT id, name FROM cults WHERE id = 2;
SELECT id, name FROM cults WHERE id = 2;
SELECT id, name FROM cults WHERE id = 2;
-- ...
```
Once the list of persons has been returned, a separate query is run to find the cult of each person.
You can see how this could quickly become a problem.
A common solution to this is to introduce a **dataloader**.
This can be done with Juniper using the crate [cksac/dataloader-rs](https://github.com/cksac/dataloader-rs), which has two types of dataloaders: cached and non-cached.
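The idea behind batching can be sketched without any GraphQL machinery. The following is a minimal illustration using only the standard library, not the dataloader-rs API: `FakeDb`, `resolve_naive`, and `resolve_batched` are hypothetical names, and the in-memory map stands in for the `cults` table while counting how many "queries" are issued.

```rust
use std::cell::Cell;
use std::collections::{BTreeSet, HashMap};

#[derive(Debug, Clone, PartialEq)]
pub struct Cult {
    pub id: i32,
    pub name: String,
}

#[derive(Debug, Clone)]
pub struct Person {
    pub id: i32,
    pub name: String,
    pub cult_id: i32,
}

/// Hypothetical in-memory stand-in for the `cults` table that counts queries.
pub struct FakeDb {
    cults: HashMap<i32, Cult>,
    queries: Cell<usize>,
}

impl FakeDb {
    pub fn new(cults: Vec<Cult>) -> Self {
        let cults = cults.into_iter().map(|c| (c.id, c)).collect();
        FakeDb { cults, queries: Cell::new(0) }
    }

    /// Models `SELECT id, name FROM cults WHERE id = $1`.
    pub fn cult_by_id(&self, id: i32) -> Option<Cult> {
        self.queries.set(self.queries.get() + 1);
        self.cults.get(&id).cloned()
    }

    /// Models `SELECT id, name FROM cults WHERE id = ANY($1)`.
    pub fn cults_by_ids(&self, ids: &[i32]) -> HashMap<i32, Cult> {
        self.queries.set(self.queries.get() + 1);
        ids.iter()
            .filter_map(|id| self.cults.get(id).map(|c| (*id, c.clone())))
            .collect()
    }

    pub fn query_count(&self) -> usize {
        self.queries.get()
    }
}

/// Naive resolution: one cult query per person (the "N" in N+1).
pub fn resolve_naive(db: &FakeDb, persons: &[Person]) -> Vec<Option<Cult>> {
    persons.iter().map(|p| db.cult_by_id(p.cult_id)).collect()
}

/// Batched resolution: collect the distinct keys, issue one query,
/// then map the results back onto each person.
pub fn resolve_batched(db: &FakeDb, persons: &[Person]) -> Vec<Option<Cult>> {
    let keys: Vec<i32> = persons
        .iter()
        .map(|p| p.cult_id)
        .collect::<BTreeSet<_>>()
        .into_iter()
        .collect();
    let by_id = db.cults_by_ids(&keys);
    persons.iter().map(|p| by_id.get(&p.cult_id).cloned()).collect()
}
```

With seven persons spread over two cults, `resolve_naive` issues seven cult lookups while `resolve_batched` issues one, and both return the same cults in the same order. A real dataloader does the key collection for you across independent resolver calls.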
#### Cached Loader
DataLoader provides a memoization cache: after `.load()` is called once with a given key, the resulting value is cached to eliminate redundant loads.
DataLoader caching does not replace Redis, Memcache, or any other shared application-level cache. DataLoader is first and foremost a data loading mechanism, and its cache only serves the purpose of not repeatedly loading the same data in the context of a single request to your application. [(read more)](https://github.com/graphql/dataloader#caching)
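As a rough illustration of what memoization means here, the sketch below caches the result of the first load for each key. It is an assumption-laden toy, not how dataloader-rs implements its cached loader: `MemoLoader` and its `fetch` closure are hypothetical names, the loader is synchronous, and values are plain `String`s.

```rust
use std::collections::HashMap;

/// Hypothetical per-request memoizing loader (not the dataloader-rs API).
pub struct MemoLoader<F>
where
    F: FnMut(i32) -> String,
{
    fetch: F,
    cache: HashMap<i32, String>,
}

impl<F> MemoLoader<F>
where
    F: FnMut(i32) -> String,
{
    pub fn new(fetch: F) -> Self {
        MemoLoader { fetch, cache: HashMap::new() }
    }

    /// Returns the cached value for `key`, calling `fetch` only the
    /// first time that key is requested.
    pub fn load(&mut self, key: i32) -> String {
        if let Some(hit) = self.cache.get(&key) {
            return hit.clone();
        }
        let value = (self.fetch)(key);
        self.cache.insert(key, value.clone());
        value
    }
}
```

Because the cache lives inside the loader value, dropping the loader at the end of a request also drops everything it cached, which is the request-scoped behavior described above.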
### What does it look like?
!FILENAME Cargo.toml
```toml
[dependencies]
actix-identity = "0.4.0-beta.2"
actix-rt = "1.0"
actix-web = {version = "2.0", features = []}
juniper = { git = "https://github.com/graphql-rust/juniper" }
futures = "0.3"
postgres = "0.15.2"
dataloader = "0.12.0"
async-trait = "0.1.30"
```
```rust, ignore
use async_trait::async_trait;
// use dataloader::cached::Loader;
use dataloader::non_cached::Loader;
use dataloader::BatchFn;
use postgres::{Connection, TlsMode};
use std::collections::HashMap;
use std::env;

pub fn get_db_conn() -> Connection {
    let pg_connection_string = env::var("DATABASE_URI").expect("need a db uri");
    println!("Connecting to {}", pg_connection_string);
    let conn = Connection::connect(&pg_connection_string[..], TlsMode::None).unwrap();
    println!("Connection is fine");
    conn
}

#[derive(Debug, Clone)]
pub struct Cult {
    pub id: i32,
    pub name: String,
}

// Fetches every requested cult in a single query and stores it under its id.
pub fn get_cult_by_ids(hashmap: &mut HashMap<i32, Cult>, ids: Vec<i32>) {
    let conn = get_db_conn();
    for row in &conn
        .query("SELECT id, name FROM cults WHERE id = ANY($1)", &[&ids])
        .unwrap()
    {
        let cult = Cult {
            id: row.get(0),
            name: row.get(1),
        };
        hashmap.insert(cult.id, cult);
    }
}

pub struct CultBatcher;

#[async_trait]
impl BatchFn<i32, Cult> for CultBatcher {
    // A HashMap is returned so that each original key maps to its Cult.
    async fn load(&self, keys: &[i32]) -> HashMap<i32, Cult> {
        println!("load cult batch {:?}", keys);
        let mut cult_hashmap = HashMap::new();
        get_cult_by_ids(&mut cult_hashmap, keys.to_vec());
        cult_hashmap
    }
}

pub type CultLoader = Loader<i32, Cult, CultBatcher>;

// To create a new loader
pub fn get_loader() -> CultLoader {
    Loader::new(CultBatcher)
        // Usually a DataLoader will coalesce all individual loads which occur
        // within a single frame of execution before calling your batch function with all requested keys.
        // However, sometimes this behavior is not desirable or optimal.
        // Perhaps you expect requests to be spread out over a few subsequent ticks.
        // See: https://github.com/cksac/dataloader-rs/issues/12
        // More info: https://github.com/graphql/dataloader#batch-scheduling
        // A larger yield count allows more requests to be appended to the batch,
        // but waits longer before the actual load.
        .with_yield_count(100)
}

#[juniper::graphql_object(Context = Context)]
impl Cult {
    // your resolvers

    // To call the dataloader
    pub async fn cult_by_id(ctx: &Context, id: i32) -> Cult {
        ctx.cult_loader.load(id).await
    }
}
```
### How do I call them?
Once created, a dataloader has the async functions `.load()` and `.load_many()`.
In the above example, `cult_loader.load(id: i32).await` returns `Cult`. If we had used `cult_loader.load_many(Vec<i32>).await` it would have returned `Vec<Cult>`.
### Where do I create my dataloaders?
**Dataloaders** should be created per-request to avoid the risk of bugs where one user is able to load cached or batched data from another user, or from outside of their authenticated scope.
Creating dataloaders within individual resolvers will prevent batching from occurring and will nullify the benefits of the dataloader.
For example:
_When you declare your context_
```rust, ignore
use juniper;

#[derive(Clone)]
pub struct Context {
    pub cult_loader: CultLoader,
}

impl juniper::Context for Context {}

impl Context {
    pub fn new(cult_loader: CultLoader) -> Self {
        Self {
            cult_loader
        }
    }
}
```
_Your handler for GraphQL (Note: instantiating context here keeps it per-request)_
```rust, ignore
use std::sync::Arc;

use actix_web::{error, web, Error, HttpResponse};
use juniper::http::GraphQLRequest;

// `Schema` is assumed to be your application's root node type.
pub async fn graphql(
    st: web::Data<Arc<Schema>>,
    data: web::Json<GraphQLRequest>,
) -> Result<HttpResponse, Error> {
    // Context setup: a fresh loader for every request
    let cult_loader = get_loader();
    let ctx = Context::new(cult_loader);

    // Execute
    let res = data.execute(&st, &ctx).await;
    let json = serde_json::to_string(&res).map_err(error::ErrorInternalServerError)?;

    Ok(HttpResponse::Ok()
        .content_type("application/json")
        .body(json))
}
```
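To see why the loader belongs in the request context rather than inside each resolver, here is a small standard-library sketch. It is only a model of the batching behavior, with hypothetical names (`BatchQueue`, `shared_per_request`, `new_per_resolver`); a real dataloader schedules the dispatch for you.

```rust
use std::collections::BTreeSet;

/// Hypothetical batching queue: resolvers enqueue keys, and a single
/// dispatch sends every pending key as one batch.
#[derive(Default)]
pub struct BatchQueue {
    pending: BTreeSet<i32>,
    batches: Vec<Vec<i32>>, // record of every batch that was dispatched
}

impl BatchQueue {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn enqueue(&mut self, key: i32) {
        self.pending.insert(key);
    }

    /// Sends all pending keys as one batch; does nothing if none are pending.
    pub fn dispatch(&mut self) {
        if self.pending.is_empty() {
            return;
        }
        let batch: Vec<i32> = std::mem::take(&mut self.pending).into_iter().collect();
        self.batches.push(batch);
    }

    pub fn batches(&self) -> &[Vec<i32>] {
        &self.batches
    }
}

/// One queue shared by every resolver in the request:
/// all keys coalesce into a single deduplicated batch.
pub fn shared_per_request(keys: &[i32]) -> Vec<Vec<i32>> {
    let mut queue = BatchQueue::new();
    for &key in keys {
        queue.enqueue(key);
    }
    queue.dispatch();
    queue.batches().to_vec()
}

/// A fresh queue created inside each resolver call:
/// every key ends up in its own batch, so nothing is saved.
pub fn new_per_resolver(keys: &[i32]) -> Vec<Vec<i32>> {
    let mut all = Vec::new();
    for &key in keys {
        let mut queue = BatchQueue::new();
        queue.enqueue(key);
        queue.dispatch();
        all.extend_from_slice(queue.batches());
    }
    all
}
```

For the keys `[1, 1, 2, 2, 3]`, the shared queue produces one batch containing `1, 2, 3`, while the per-resolver variant produces five single-key batches, which is the N+1 pattern again.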
### Further Example:
For a full example using Dataloaders and Context, check out [jayy-lmao/rust-graphql-docker](https://github.com/jayy-lmao/rust-graphql-docker).