< h1 id = "dataloader" > < a class = "header" href = "#dataloader" > DataLoader< / a > < / h1 >
< p > The DataLoader pattern, named after the corresponding < a href = "https://github.com/graphql/dataloader" > < code > dataloader< / code > NPM package< / a > , is a mechanism for batching and caching data requests in a deferred manner in order to solve the < a href = "n_plus_1.html" > N+1 problem< / a > .< / p >
< blockquote >
< p > A port of the "Loader" API originally developed by < a href = "https://github.com/schrockn" > @schrockn< / a > at Facebook in 2010 as a simplifying force to coalesce the sundry key-value store back-end APIs which existed at the time. At Facebook, "Loader" became one of the implementation details of the "Ent" framework, a privacy-aware data entity loading and caching layer within web server product code. This ultimately became the underpinning for Facebook's GraphQL server implementation and type definitions.< / p >
< / blockquote >
< p > In the < a href = "https://www.rust-lang.org" > Rust< / a > ecosystem, the DataLoader pattern is implemented by the < a href = "https://docs.rs/crate/dataloader" > < code > dataloader< / code > crate< / a > , which is naturally usable with < a href = "https://docs.rs/juniper" > Juniper< / a > .< / p >
< p > Let's rework our < a href = "n_plus_1.html" > example of the N+1 problem< / a > so that it's solved by applying the DataLoader pattern:< / p >
< pre > < pre class = "playground" > < code class = "language-rust edition2021" > < span class = "boring" > extern crate anyhow;
< / span > < span class = "boring" > extern crate dataloader;
< / span > < span class = "boring" > extern crate juniper;
< / span > < span class = "boring" > use std::{collections::HashMap, sync::Arc};
< / span > < span class = "boring" > use anyhow::anyhow;
< / span > < span class = "boring" > use dataloader::non_cached::Loader;
< / span > < span class = "boring" > use juniper::{graphql_object, GraphQLObject};
< / span > < span class = "boring" >
< / span > < span class = "boring" > type CultId = i32;
< / span > < span class = "boring" > type UserId = i32;
< / span > < span class = "boring" >
< / span > < span class = "boring" > struct Repository;
< / span > < span class = "boring" >
< / span > < span class = "boring" > impl Repository {
< / span > < span class = "boring" > async fn load_cults_by_ids(& self, cult_ids: & [CultId]) -> anyhow::Result< HashMap< CultId, Cult> > { unimplemented!() }
< / span > < span class = "boring" > async fn load_all_persons(& self) -> anyhow::Result< Vec< Person> > { unimplemented!() }
< / span > < span class = "boring" > }
< / span > < span class = "boring" >
< / span > struct Context {
    repo: Repository,
    cult_loader: CultLoader,
}

impl juniper::Context for Context {}

#[derive(Clone, GraphQLObject)]
struct Cult {
    id: CultId,
    name: String,
}

struct CultBatcher {
    repo: Repository,
}

// Since `BatchFn` doesn't provide any notion of fallible loading, like
// `try_load()` returning `Result< HashMap< K, V> , E> `, we handle possible
// errors as loaded values and unpack them later in the resolver.
impl dataloader::BatchFn< CultId, Result< Cult, Arc< anyhow::Error> > > for CultBatcher {
    async fn load(
        & mut self,
        cult_ids: & [CultId],
    ) -> HashMap< CultId, Result< Cult, Arc< anyhow::Error> > > {
        // Effectively performs the following SQL query:
        // SELECT id, name FROM cults WHERE id IN (${cult_id1}, ${cult_id2}, ...)
        match self.repo.load_cults_by_ids(cult_ids).await {
            Ok(found_cults) => {
                found_cults.into_iter().map(|(id, cult)| (id, Ok(cult))).collect()
            }
            // One could choose a different strategy to deal with fallible loads,
            // like considering values that failed to load as absent, or just
            // panicking. See cksac/dataloader-rs#35 for details:
            // https://github.com/cksac/dataloader-rs/issues/35
            Err(e) => {
                // Since `anyhow::Error` doesn't implement `Clone`, we share
                // a single error between all the keys via `Arc`.
                let e = Arc::new(e);
                cult_ids.iter().map(|k| (k.clone(), Err(e.clone()))).collect()
            }
        }
    }
}

type CultLoader = Loader< CultId, Result< Cult, Arc< anyhow::Error> > , CultBatcher> ;

fn new_cult_loader(repo: Repository) -> CultLoader {
    CultLoader::new(CultBatcher { repo })
        // Usually a `Loader` will coalesce all individual loads which occur
        // within a single frame of execution before calling a `BatchFn::load()`
        // with all the collected keys. However, sometimes this behavior is not
        // desirable or optimal (perhaps, a request is expected to be spread out
        // over a few subsequent ticks).
        // A larger yield count will allow more keys to be appended to the batch,
        // but will wait longer before the actual load. For more details see:
        // https://github.com/cksac/dataloader-rs/issues/12
        // https://github.com/graphql/dataloader#batch-scheduling
        // NOTE: The value `100` is an illustrative choice, tune it as needed.
        .with_yield_count(100)
}

struct Person {
    id: UserId,
    name: String,
    cult_id: CultId,
}

#[graphql_object]
#[graphql(context = Context)]
impl Person {
    fn id(& self) -> UserId {
        self.id
    }

    fn name(& self) -> & str {
        self.name.as_str()
    }

    async fn cult(& self, ctx: & Context) -> anyhow::Result< Cult> {
        ctx.cult_loader
            // Here, we don't run the `CultBatcher::load()` eagerly, but rather
            // only register the `self.cult_id` value in the `cult_loader` and
            // wait for other concurrent resolvers to do the same.
            // The actual batch loading happens once all the resolvers register
            // their IDs and there is nothing more to execute.
            .try_load(self.cult_id)
            .await
            // The outer error is the `io::Error` returned by `try_load()` if
            // no value is present in the `HashMap` for the specified
            // `self.cult_id`, meaning that there is no `Cult` with such ID
            // in the `Repository`.
            .map_err(|_| anyhow!("No cult exists for ID `{}`", self.cult_id))?
            // The inner error is the one returned by the `CultBatcher::load()`
            // if the `Repository::load_cults_by_ids()` fails, meaning that
            // running the SQL query failed.
            .map_err(|arc_err| anyhow!("{arc_err}"))
    }
}

struct Query;

#[graphql_object]
#[graphql(context = Context)]
impl Query {
    async fn persons(ctx: & Context) -> anyhow::Result< Vec< Person> > {
        // Effectively performs the following SQL query:
        // SELECT id, name, cult_id FROM persons
        ctx.repo.load_all_persons().await
    }
}

fn main() {
}< / code > < / pre > < / pre >
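< p > The error-handling strategy used in the < code > Err< / code > branch of < code > CultBatcher::load()< / code > above can be tried in isolation. The following is a minimal sketch using only the standard library: < code > QueryError< / code > and < code > fan_out_error()< / code > are hypothetical names introduced for illustration (they are not part of < code > dataloader< / code > or Juniper), with < code > QueryError< / code > standing in for an error type that doesn't implement < code > Clone< / code > , such as < code > anyhow::Error< / code > .< / p >

```rust
use std::{collections::HashMap, fmt, sync::Arc};

/// Hypothetical error type that deliberately does NOT implement `Clone`.
#[derive(Debug)]
struct QueryError(String);

impl fmt::Display for QueryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for QueryError {}

/// Reports the same failure for every requested key by sharing one
/// allocation behind an `Arc` instead of cloning the error itself.
fn fan_out_error(
    keys: &[i32],
    err: QueryError,
) -> HashMap<i32, Result<String, Arc<QueryError>>> {
    let shared = Arc::new(err);
    keys.iter()
        .map(|k| (*k, Err(Arc::clone(&shared))))
        .collect()
}
```

< p > Every key ends up with a cheap pointer to the same error, so a resolver waiting on any of the keys can still report what went wrong.< / p >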
< p > And now, performing the < a href = "n_plus_1.html" > GraphQL query which led to the N+1 problem< / a > < / p >
< pre > < code class = "language-graphql" > query {
  persons {
    id
    name
    cult {
      id
      name
    }
  }
}
< / code > < / pre >
< p > will lead to efficient < a href = "https://en.wikipedia.org/wiki/SQL" > SQL< / a > queries, just as expected:< / p >
< pre > < code class = "language-sql" > SELECT id, name, cult_id FROM persons;
SELECT id, name FROM cults WHERE id IN (1, 2, 3, 4);
< / code > < / pre >
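< p > To see why only two queries are issued, the batching idea can be reduced to a synchronous sketch using only the standard library. This is not how the < code > dataloader< / code > crate is implemented (the real < code > Loader< / code > is asynchronous and cooperates with the executor); < code > FakeRepo< / code > , < code > BatchQueue< / code > and < code > resolve_cult_names()< / code > are hypothetical names, and the cult names are made-up placeholder data.< / p >

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for a database that counts the queries it receives.
struct FakeRepo {
    queries: usize,
}

impl FakeRepo {
    /// Pretends to run `SELECT id, name FROM cults WHERE id IN (...)`.
    fn load_cults_by_ids(&mut self, ids: &[i32]) -> HashMap<i32, String> {
        self.queries += 1;
        ids.iter().map(|id| (*id, format!("cult-{id}"))).collect()
    }
}

/// Collects keys first, then resolves all of them with one batched call.
struct BatchQueue {
    pending: BTreeSet<i32>,
}

impl BatchQueue {
    fn new() -> Self {
        Self { pending: BTreeSet::new() }
    }

    /// Registers a key without loading anything yet.
    fn enqueue(&mut self, id: i32) {
        self.pending.insert(id);
    }

    /// Issues a single batched load for every distinct pending key.
    fn flush(&mut self, repo: &mut FakeRepo) -> HashMap<i32, String> {
        let ids: Vec<i32> = std::mem::take(&mut self.pending).into_iter().collect();
        if ids.is_empty() {
            return HashMap::new();
        }
        repo.load_cults_by_ids(&ids)
    }
}

/// Resolves the cult name for each person's `cult_id`, returning the names
/// together with the number of queries that were needed.
fn resolve_cult_names(cult_ids: &[i32]) -> (Vec<String>, usize) {
    let mut repo = FakeRepo { queries: 0 };
    let mut queue = BatchQueue::new();
    for id in cult_ids {
        queue.enqueue(*id);
    }
    let loaded = queue.flush(&mut repo);
    let names = cult_ids.iter().map(|id| loaded[id].clone()).collect();
    (names, repo.queries)
}
```

< p > Calling < code > resolve_cult_names(& [1, 2, 1, 3])< / code > resolves four names with a single query for three distinct IDs, instead of one query per person.< / p >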
< h2 id = "caching" > < a class = "header" href = "#caching" > Caching< / a > < / h2 >
< p > < a href = "https://docs.rs/dataloader/latest/dataloader/cached/index.html" > < code > dataloader::cached< / code > < / a > provides a < a href = "https://en.wikipedia.org/wiki/Memoization" > memoization< / a > cache: after < code > BatchFn::load()< / code > is called once with given keys, the resulting values are cached to eliminate redundant loads.< / p >
< p > DataLoader caching does not replace < a href = "https://redis.io" > Redis< / a > , < a href = "https://memcached.org" > Memcached< / a > , or any other shared application-level cache. DataLoader is first and foremost a data loading mechanism, and its cache only serves the purpose of not repeatedly loading the same data < a href = "https://github.com/graphql/dataloader#caching" > in the context of a single request< / a > .< / p >
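< p > The memoization behaviour can be illustrated with a small synchronous sketch. It uses only the standard library and does not reproduce the API of < code > dataloader::cached< / code > : the < code > BatchLoad< / code > trait, < code > RecordingBatcher< / code > , < code > MemoLoader< / code > and < code > demo_memoization()< / code > are hypothetical names chosen for this example.< / p >

```rust
use std::collections::HashMap;

/// Hypothetical, synchronous stand-in for a batch function.
trait BatchLoad {
    fn load(&mut self, keys: &[i32]) -> HashMap<i32, String>;
}

/// Records every batch it receives, so redundant loads are easy to spot.
struct RecordingBatcher {
    batches: Vec<Vec<i32>>,
}

impl BatchLoad for RecordingBatcher {
    fn load(&mut self, keys: &[i32]) -> HashMap<i32, String> {
        self.batches.push(keys.to_vec());
        keys.iter().map(|k| (*k, format!("cult-{k}"))).collect()
    }
}

/// Remembers every loaded value and only asks the batcher for unseen keys.
struct MemoLoader<B: BatchLoad> {
    batcher: B,
    cache: HashMap<i32, String>,
}

impl<B: BatchLoad> MemoLoader<B> {
    fn new(batcher: B) -> Self {
        Self { batcher, cache: HashMap::new() }
    }

    fn load_many(&mut self, keys: &[i32]) -> Vec<Option<String>> {
        // Keep only the keys which are not cached yet, without duplicates.
        let mut missing: Vec<i32> = keys
            .iter()
            .copied()
            .filter(|k| !self.cache.contains_key(k))
            .collect();
        missing.sort_unstable();
        missing.dedup();
        if !missing.is_empty() {
            let loaded = self.batcher.load(&missing);
            self.cache.extend(loaded);
        }
        keys.iter().map(|k| self.cache.get(k).cloned()).collect()
    }
}

/// Loads overlapping key sets twice and returns the batches that actually
/// reached the batcher, along with the result of the second load.
fn demo_memoization() -> (Vec<Vec<i32>>, Vec<Option<String>>) {
    let mut loader = MemoLoader::new(RecordingBatcher { batches: Vec::new() });
    let _first = loader.load_many(&[1, 2]);
    let second = loader.load_many(&[2, 3]);
    (loader.batcher.batches, second)
}
```

< p > In this sketch the second call only sends key < code > 3< / code > to the batcher, because key < code > 2< / code > is already served from the cache.< / p >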
< blockquote >
< p > < strong > WARNING< / strong > : A DataLoader should be created per-request to avoid risk of bugs where one client is able to load cached/batched data from another client outside its authenticated scope. Creating a DataLoader within an individual resolver will prevent batching from occurring and will nullify any benefits of it.< / p >
< / blockquote >
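< p > A rough sketch of the per-request rule, using only the standard library: the cache lives inside a context that is built when a request starts and dropped when it ends. < code > Db< / code > , < code > RequestContext< / code > and < code > handle_request()< / code > are hypothetical names, and keying the fake database by user is merely an assumed way of modelling an authenticated scope.< / p >

```rust
use std::collections::HashMap;

/// Hypothetical database: rows are visible only to the user who owns them.
type Db = HashMap<(String, i32), String>;

/// Hypothetical per-request state. The cache must not outlive the request.
struct RequestContext {
    user: String,
    cult_cache: HashMap<i32, String>,
}

impl RequestContext {
    fn new(user: &str) -> Self {
        Self { user: user.to_owned(), cult_cache: HashMap::new() }
    }

    /// Returns the cult name visible to this request's user, consulting
    /// the request-local cache before touching the database.
    fn cult_name(&mut self, db: &Db, id: i32) -> Option<String> {
        if let Some(hit) = self.cult_cache.get(&id) {
            return Some(hit.clone());
        }
        let loaded = db.get(&(self.user.clone(), id))?.clone();
        self.cult_cache.insert(id, loaded.clone());
        Some(loaded)
    }
}

/// Builds a fresh context (and thus a fresh cache) for every request.
fn handle_request(db: &Db, user: &str, id: i32) -> Option<String> {
    let mut ctx = RequestContext::new(user);
    ctx.cult_name(db, id)
}
```

< p > Because each call to < code > handle_request()< / code > starts with an empty cache, a value loaded on behalf of one user can never be served to another. Sharing a single < code > RequestContext< / code > across requests would lose that guarantee.< / p >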
< h2 id = "full-example" > < a class = "header" href = "#full-example" > Full example< / a > < / h2 >
< p > For a full example of using DataLoaders in < a href = "https://docs.rs/juniper" > Juniper< / a > , check out the < a href = "https://github.com/jayy-lmao/rust-graphql-docker" > < code > jayy-lmao/rust-graphql-docker< / code > repository< / a > .< / p >
