14 Indexing Rewards Proportions
Now that we have a list of subgraphs we have synced, we want to figure out which ones offer the best indexing rewards. To do this, we will pull in the number of tokens allocated on each subgraph deployment, as well as its curation signal. Indexing rewards flow to subgraphs in proportion to their curation signal, and each indexer's cut is proportional to their share of the subgraph's allocations, so a subgraph with a large amount of signal relative to its total allocation size pays more rewards per allocated token. Finding those subgraphs is how we maximize our returns.
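As a quick illustration with made-up numbers, two subgraphs carrying the same signal but very different allocation sizes offer very different returns per allocated token:

library(dplyr)  # also provides tibble(); loaded earlier in the tutorial

# hypothetical numbers, purely for illustration
tibble(deployment = c("subgraph_a", "subgraph_b"),
       signalled_tokens = c(10000, 10000),
       total_allocated_tokens = c(2000000, 200000)) %>%
  mutate(rewards_proportion = signalled_tokens / total_allocated_tokens)
# subgraph_b carries the same signal on a tenth of the allocation,
# so each allocated token there earns roughly ten times the rewards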
This is the base query we will use:
query = '
query MyQuery($created_at_gt: Int!) {
  subgraphDeployments(
    first: 1000
    orderBy: createdAt
    orderDirection: asc
    where: {activeSubgraphCount_gt: 0, deniedAt: 0, createdAt_gt: $created_at_gt}
  ) {
    ipfsHash
    indexerAllocations(where: {poi: null}) {
      allocatedTokens
      indexer {
        id
      }
    }
    signalledTokens
    createdAt
    deprecatedSubgraphCount
  }
}'
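Before wiring this into a loop, it can help to send the query once and inspect the shape of what comes back. A minimal sketch, assuming url and query hold the GraphQL endpoint and query string defined above:

library(httr)
library(jsonlite)

# run the query once with the starting filter value
response = POST(url, body = list(query = query, variables = list(created_at_gt = 0)),
                encode = "json")
# parse the JSON body and peek at the top level of the result
json_data = fromJSON(content(response, "text", encoding = "UTF-8"), flatten = TRUE)
str(json_data$data$subgraphDeployments, max.level = 1)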
Next we initialize an empty dataset with tibble(), and initialize created_at_gt to 0. The GraphQL endpoints are limited to a maximum of 1,000 rows per request, so we will want to extract the data in batches and construct a complete dataset from them. There will never be more than 1,000 subgraphs created in the exact same second reported by createdAt, so we can use this timestamp as a filter to build a full dataset.
At the start, the filter is set to 0, which gives us the first 1,000 subgraphs ever created. After each batch, we take the maximum createdAt value and use it to fetch only subgraphs created after the latest timestamp from the previous batch; for example, if the newest subgraph in the first batch was created at timestamp 1650000000, the second request uses created_at_gt = 1650000000. This way we build a full dataset without pulling duplicate data.
The implementation itself isn’t overly important. The only important thing to understand is that in this step we pull the signal and allocation sizes for every subgraph that exists:
# initialize empty dataset
rewards_info = tibble()
# set created_at_gt variable to 0 to start before we overwrite it with the new maximum as data is pulled
variables = list(created_at_gt = 0)

# keep repeating the same query until we run out of new rows to pull
repeat {
  # Send POST request to the GraphQL API
  response = POST(url, body = list(query = query, variables = variables), encode = "json")
  # Parse the JSON response
  content = content(response, "text", encoding = 'UTF-8')
  json_data = fromJSON(content, flatten = TRUE)
  data = json_data$data$subgraphDeployments
  # if we are out of rows to collect, break the repetition and move on to the next step
  if (length(data) == 0) {
    break
  }
  # Process the data
  processed_data = as_tibble(data) %>%
    # calculate and clean up the columns needed. Start by counting unique indexers per subgraph
    mutate(unique_indexers = map_int(indexerAllocations, ~length(unique(.x$indexer.id))),
           # allocated tokens per subgraph (scaled from 18-decimal wei units to GRT)
           total_allocated_tokens = map_dbl(indexerAllocations, ~sum(as.numeric(.x$allocatedTokens))) / 10^18,
           # signalled tokens per subgraph
           signalled_tokens = as.numeric(signalledTokens) / 10^18,
           # convert created at timestamp to datetime
           created_at = as.POSIXct(as.numeric(createdAt), origin = "1970-01-01"),
           # track whether the subgraph is deprecated
           deprecated_subgraph_count = deprecatedSubgraphCount) %>%
    # rename ipfsHash to deployment and select only the columns needed
    select(deployment = ipfsHash, unique_indexers, total_allocated_tokens, signalled_tokens,
           created_at, deprecated_subgraph_count)
  # union the newly pulled data to the full dataset
  rewards_info = bind_rows(rewards_info, processed_data)
  # if fewer than 1,000 rows were collected, there is no more data to pull and we can move on
  if (nrow(processed_data) < 1000) {
    break
  }
  # set the new maximum subgraph creation time to use for the next data pull
  variables$created_at_gt = max(as.numeric(data$createdAt))
}
# Show and confirm full row count
paste("Fetched indexer counts for", nrow(rewards_info), "subgraphs")
## [1] "Fetched indexer counts for 11324 subgraphs"
# join rewards info to the list of synced subgraphs
synced_subgraphs = synced_subgraphs %>%
  left_join(rewards_info, by = 'deployment') %>%
  select(deployment, unique_indexers, signalled_tokens, total_allocated_tokens) %>%
  mutate(rewards_proportion = signalled_tokens / total_allocated_tokens)
## # A tibble: 258 × 5
## deployment unique_indexers signalled_tokens total_allocated_tokens
## <chr> <int> <dbl> <dbl>
## 1 QmTyyrcaTduZN1ySRwtV… 5 9902. 2369230
## 2 QmWXhLkz6fRJwLyFmgBK… 13 6931. 1938503.
## 3 QmVUcp8WqE47payYxMJU… 9 4554. 1048922
## 4 QmdKXcBUHR3UyURqVRQH… 25 33256. 8813353.
## 5 QmW26TG5s9myd1gzio9f… 15 4843. 1316928.
## 6 QmYSxAAWeXDJ8DbWTy3o… 11 2976. 707341
## 7 QmYV4MbMHuE98rWL5HRJ… 6 495. 117677
## 8 QmddvgpoNsfxXyQ972Kh… 8 1981. 470963
## 9 QmSK5hvEsM5mw6658SE1… 10 4952. 1167837
## 10 QmQuCY9bwTF13ZkPdpgy… NA NA NA
## # ℹ 248 more rows
## # ℹ 1 more variable: rewards_proportion <dbl>
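With rewards_proportion in place, one natural use of this table is to rank the synced subgraphs by signal per allocated token. A quick sketch of that usage (the ranking alone shouldn't drive allocations, since it ignores query volume):

# rank synced subgraphs by signal per allocated token, dropping rows
# that found no match in the join (the NA rows above)
synced_subgraphs %>%
  filter(!is.na(rewards_proportion)) %>%
  arrange(desc(rewards_proportion)) %>%
  head(10)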
Nice! Next, before we allocate towards these subgraphs, we also want to consider how many queries each one serves, which we will do in the next section.
# save a snapshot of the workspace for the next chapter
save.image('/root/github/indexer_analytics_tutorial/data/chapters_snapshots/06-indexing_rewards_proportions.RData')