Optimizing N+1 queries
Have you noticed some GraphQL queries end can make hundreds of database queries, often with mostly repeated data? Lets take a look why and how to fix it.
Query Resolution
Imagine if you have a simple query like this:
query {
todos {
users {
name
}
}
}
and User
resolver is like this:
struct User {
id: u64,
}
#[Object]
impl User {
async fn name(&self, ctx: &Context<'_>) -> Result<String> {
let pool = ctx.data_unchecked::<Pool<Postgres>>();
let (name,): (String,) = sqlx::query_as("SELECT name FROM user WHERE id = $1")
.bind(self.id)
.fetch_one(pool)
.await?;
Ok(name)
}
}
The query executor will call the Todos
resolver which does a select * from todo and return N todos
. Then for each
of the todos, concurrently, call the User
resolver, SELECT from USER where id = todo.user_id
.
eg:
SELECT id, todo, user_id FROM todo
SELECT name FROM user WHERE id = $1
SELECT name FROM user WHERE id = $1
SELECT name FROM user WHERE id = $1
SELECT name FROM user WHERE id = $1
SELECT name FROM user WHERE id = $1
SELECT name FROM user WHERE id = $1
SELECT name FROM user WHERE id = $1
SELECT name FROM user WHERE id = $1
SELECT name FROM user WHERE id = $1
SELECT name FROM user WHERE id = $1
SELECT name FROM user WHERE id = $1
SELECT name FROM user WHERE id = $1
SELECT name FROM user WHERE id = $1
SELECT name FROM user WHERE id = $1
SELECT name FROM user WHERE id = $1
SELECT name FROM user WHERE id = $1
After executing SELECT name FROM user WHERE id = $1
many times, and most Todo
objects belong to the same user, we
need to optimize these codes!
Dataloader
We need to group queries and exclude duplicate queries. Dataloader
can do this.
facebook gives a request-scope batch and caching solution.
The following is a simplified example of using DataLoader
to optimize queries, there is also a full code example available in GitHub.
use async_graphql::*;
use async_graphql::dataloader::*;
use std::sync::Arc;
struct UserNameLoader {
pool: sqlx::PgPool,
}
impl Loader<u64> for UserNameLoader {
type Value = String;
type Error = Arc<sqlx::Error>;
async fn load(&self, keys: &[u64]) -> Result<HashMap<u64, Self::Value>, Self::Error> {
Ok(sqlx::query_as("SELECT name FROM user WHERE id = ANY($1)")
.bind(keys)
.fetch(&self.pool)
.map_ok(|name: String| name)
.map_err(Arc::new)
.try_collect().await?)
}
}
#[derive(SimpleObject)]
#[graphql(complex)]
struct User {
id: u64,
}
#[ComplexObject]
impl User {
async fn name(&self, ctx: &Context<'_>) -> Result<String> {
let loader = ctx.data_unchecked::<DataLoader<UserNameLoader>>();
let name: Option<String> = loader.load_one(self.id).await?;
name.ok_or_else(|| "Not found".into())
}
}
To expose UserNameLoader
in the ctx
, you have to register it with the schema, along with a task spawner, e.g. async_std::task::spawn
:
let schema = Schema::build(QueryRoot, EmptyMutation, EmptySubscription)
.data(DataLoader::new(
UserNameLoader,
async_std::task::spawn, // or `tokio::spawn`
))
.finish();
In the end, only two SQLs are needed to query the results we want!
SELECT id, todo, user_id FROM todo
SELECT name FROM user WHERE id IN (1, 2, 3, 4)
Implement multiple data types
You can implement multiple data types for the same Loader
, like this:
extern crate async_graphql;
use async_graphql::*;
struct PostgresLoader {
pool: sqlx::PgPool,
}
impl Loader<UserId> for PostgresLoader {
type Value = User;
type Error = Arc<sqlx::Error>;
async fn load(&self, keys: &[UserId]) -> Result<HashMap<UserId, Self::Value>, Self::Error> {
// Load users from database
}
}
impl Loader<TodoId> for PostgresLoader {
type Value = Todo;
type Error = sqlx::Error;
async fn load(&self, keys: &[TodoId]) -> Result<HashMap<TodoId, Self::Value>, Self::Error> {
// Load todos from database
}
}