2 Getting Started
2.1 The tidyverse
We’ll be using the tidyverse ecosystem of R packages, a widely used set of data science tools that is well suited to these tasks.
We will mainly focus on learning how to use these two libraries:
dplyr: your best friend for transforming and manipulating data (the name reads as "data plyr").
ggplot2: a robust, modular tool for building visualizations. It provides a lot of flexibility and a rich ecosystem of extensions.
If we haven’t installed these packages before, we can install dplyr using install.packages():
install.packages('dplyr')
And do the same for ggplot2:
install.packages('ggplot2')
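To avoid reinstalling packages that are already present, one option is to check for them first. The following is a minimal sketch rather than a required step. The variable names `needed` and `missing_pkgs` are our own, and the CRAN mirror URL is an assumption you can swap for your preferred mirror.

```r
# Packages this chapter relies on
needed <- c("dplyr", "ggplot2")

# requireNamespace() returns TRUE when a package is installed, without attaching it
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# Keep only the packages that are not installed yet
missing_pkgs <- needed[!is_installed]

# Install whatever is missing (mirror URL is an assumption; any CRAN mirror works)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```

Running this a second time should install nothing, since `missing_pkgs` will be empty once both packages are present.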
We only need to install packages once on our computer, but every time we start a new R session we need to load the ones we want to use with library(). Let’s go ahead and do that so their functions are available for the rest of the scripts:
library(dplyr)
library(ggplot2)
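# Aggregate query volume to one row per subgraph deployment per day
query_volume = query_volume %>%
  mutate(date = as.Date(end_epoch)) %>%
  group_by(subgraph_deployment_ipfs_hash, date) %>%
  # Compute fees first: summarize() evaluates its arguments in order, so
  # query_count must still hold the per-row values when fees are summed
  summarize(total_query_fees = sum(query_count*avg_query_fee),
            query_count = sum(query_count))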
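To see what this kind of daily aggregation produces, here is a small sketch on a made-up table. The column names match those used above, but every value is invented for illustration, and we assume `end_epoch` is a date-time (POSIXct) column, which may not match the real data. Note that summarize() evaluates its arguments in order, so the sketch computes the fee total before `query_count` is overwritten by its sum.

```r
library(dplyr)

# Toy data: all values are invented purely for illustration
toy_volume <- data.frame(
  subgraph_deployment_ipfs_hash = c("QmA", "QmA", "QmB"),
  end_epoch = as.POSIXct(
    c("2025-04-24 10:00:00", "2025-04-24 10:10:00", "2025-04-25 09:00:00"),
    tz = "UTC"
  ),
  query_count = c(100, 300, 50),
  avg_query_fee = c(0.01, 0.02, 0.04)
)

toy_daily <- toy_volume %>%
  # Collapse each timestamp to its calendar date
  mutate(date = as.Date(end_epoch)) %>%
  group_by(subgraph_deployment_ipfs_hash, date) %>%
  summarize(
    # Fees use the per-row counts, so this must come before query_count is summed
    total_query_fees = sum(query_count * avg_query_fee),
    query_count = sum(query_count),
    .groups = "drop"
  )

print(toy_daily)
```

The two "QmA" rows fall on the same date, so they collapse into a single row, while "QmB" keeps its own row.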
2.2 Prepare Data
We have pre-loaded some real query volume data from the QoS subgraph, which reports query volume about 10 minutes behind real time. On-chain query volume, by comparison, can take 28 epochs to appear, or may never appear on-chain at all. The latest available data is from 2025-04-25: