Starter Guide to Solana Data Analysis (Part 1)
Analyzing the Orca Whirlpool AMM, using new Solana decoded tables.
This is the first of a four part series covering basic analysis of DEXs and NFTs on Solana. Basic SQL knowledge will be helpful.
part 1: basics of Solana tables and a simple whirlpool swap
part 2: types of programs, token accounts, and balances (TVL)
Make sure to subscribe so you don’t miss the following parts!
If you prefer to self study or work ahead, you can reference all the queries in my two dashboards on Dune:
By the end of the four guides you should be able to understand the core of those queries. My DMs on Twitter are always open for questions or ideas.
Big shoutout to Jon Wong for all his help answering my own questions - be sure to follow him for Solana content and updates. I also learned a lot from fellow Solana analysts Marqu and Anduril - both are must follows in the space.
Table of Contents:
Whirlpool DEX basics and initialization
Breaking down a single swap and working with inner instructions
The goal will be to enable you to calculate “Total Volume” of the SOL-USDC 0.05% fee pool (highlighted below) for direct swaps.
Solana Tables (Raw and Decoded)
Solana produces 100+ GB of a data each day, compared to 2.5 GB on Ethereum mainnet. So if you’re trying to do some historical or aggregated analysis over even a few weeks, your queries will time out on the raw tables (which is why we have decoded tables now to make it much faster).
You can find all the raw/decoded tables described in the docs - we’re going to focus on solana.transactions and solana decoded tables schemas.
A transaction in Solana can have multiple instructions. Instructions are how you pass data and invoke a function on a program. In the EVM you can only call one contract per transaction, which then cascades into traces of internal contract calls. So you can think of Solana as making multiple top level contract calls (placed into an array).
The “instructions” columns contains the following values per instruction, in an array:
“executing_account”: this is the program id that was invoked/called - note that addresses in Solana use base58, which can be converted to bytearray using
“account_arguments”: these are all of the accounts that are needed for interactions throughout the instructions. They can be a little weird to wrap your head around at first since they sit in an array without labels - but we’ve decoded these for you.
“data”: This defines the function being called and parameters to pass in. In the transactions table this is base58, but in decoded tables we’ll have transformed it to bytearray for you and decoded the parameters.
“inner_instructions”: these are all the instructions that are then invoked by the program. Program to program calls are known as Cross Program Invocations (CPIs). You can kind of think of these like traces in EVM.
Inner instructions follow the same schema, but they don’t have any inner instructions themselves - if you call a program than then calls another program, it just shows up sequentially in inner instructions.
You can reference these instruction columns in SQL by using the following syntax: instructions.inner_instructions.data.
You’ll also need to know about “log_messages” column - before you get excited, this is just an array of console log texts. There is no such thing as an event topic with parameters here, and usually no useful values are found there. However, it can be great for filtering for specific programs called or edge cases without having to dig into inner instructions!
The decoded tables build directly off of solana.transactions by unnesting all outer and inner instructions and then creating new columns for every function input in “data” and every account in “account_arguments” with the schema <namespace>_solana.<programName>_call_<instructionName>. Some things to note:
Base columns are inherited with the “call_” column prefix (stuff like tx_id, tx_index, block_time, block_slot).
account_arguments is exploded into columns, such that each known account gets a “account_” prefix.
Inner and Outer instructions are both included, you can use the “call_is_inner” column to filter if you would like to. Indexes and executing accounts are included for both, “data” and all other arguments apply to just the referenced level.
⚠️ You can request decoding for programs with public IDLs by submitting to this form.
Whirlpool is the newest AMM from Orca (launched in March 2022) that uses concentrated liquidity - it’s basically Uniswap v3 but with yield farming mechanics built in.
When a new pair is deployed, it gets its own program id (just like when a new contract is deployed on Ethereum from a factory). We’re going to use just the USDC-SOL whirlpool for all the guides, so let’s query for it’s initialization first:
SELECT * FROM whirlpool_solana.whirlpool_call_initializePool WHERE account_whirlpool = '7qbRF6YsyGuLUVs6Y1q64bdVrfe4ZcUUz1JRdoVNUJnm'
You’ll see I reference an “account_whirlpool” column - that’s actually decoded from account_argument index 5 in the initialization transaction.
You can see a Token Mint A and Token Mint B as the token addresses referenced in account arguments - these are things that you’ll be used to seeing in “data” from EVM land.
There are some nuances with fees being updatable that I’m going to skip, but you can find the query for the USDC-SOL whirlpool here. You’ll see I join it on tokens_solana.fungible to get token symbol and decimal, getting us the result that this pool is:
Token A is 'sol’, specifically wrapped sol with 9 decimals.
Token B is ‘usdc’ with 6 decimals.
The fee tier is 500, which converts to 0.05% after dividing by 1000000.
Subscribe to get the next parts of the guide, and other Web3 data tips and tricks!
Breaking down a swap
Now let’s query for swaps through this pool (we don’t filter for success because decoded tables only contain successful instructions).
SELECT * FROM whirlpool_solana.whirlpool_call_swap sp WHERE sp.account_whirlpool = '7qbRF6YsyGuLUVs6Y1q64bdVrfe4ZcUUz1JRdoVNUJnm' limit 100
Where we can see some key columns:
Notably, you’ll see there are “amount”, “otherAmountThreshold”, and an “aToB” column. We could construct swaps using these function inputs - but these are the estimated/intended swap values and not the actual amounts because it’s pre-execution.
To get the actual swap amounts - we need to join them from the spl_token transfer table or decode them from “inner_instructions”. Let’s look at this example swap:
If you look at enough swaps, you’ll notice that the token sold is always transferred in the first inner instruction (3.1 above), then token bought is second (3.2). We can then figure out the token symbol based on if “aToB” is true or false (if it is true, then a is being sold and b is being bought).
Joining spl_token transfers
The “Token Program” is really the “spl_token program”, where the “transfer” instruction is being called. This maps to the decoded table spl_token_solana.spl_token_call_transfer.
We can join twice (once for the transfer out and once for transfer in):
SELECT tr_2.amount as token_bought_amount , tr_1.amount as token_sold_amount , sp.aToB , sp.call_tx_id FROM whirlpool_solana.whirlpool_call_swap sp LEFT JOIN spl_token_solana.spl_token_call_transfer tr_1 ON tr_1.call_tx_id = sp.call_tx_id AND tr_1.call_outer_instruction_index = sp.call_outer_instruction_index AND (sp.call_is_inner = false AND tr_1.call_inner_instruction_index = 1) LEFT JOIN spl_token_solana.spl_token_call_transfer tr_2 ON tr_2.call_tx_id = sp.call_tx_id AND tr_2.call_outer_instruction_index = sp.call_outer_instruction_index AND (sp.call_is_inner = false AND tr_2.call_inner_instruction_index = 2) WHERE sp.call_tx_id = '44kmeC1edSfp21K5kKNVViJvLHG8XQqqu3KbHsrYcYZGmopWwBgP48c9u1DRBMGtQcbvyxd2TT8syY7ZvwpHqkhF'
The logic is that we want to get the first and second inner instructions when we are looking at an outer instruction call (like our transaction example). If “call_is_inner” was True, where something like a user calls Jupiter which then calls Whirlpool happens, then we would need to get the first and second inner instructions after the “call_inner_instruction_index” (we’ll go over this last).
You can find this query here. You can see that the values are correct, but just are missing the right decimals.
Decoding instructions data
Sometimes, you’ll need to decode instructions data manually for performance reasons. Let’s walk through that swap now. The “data” in inner instructions is still in base58 form - we can get it hex using fromBase58(). Let’s look at the actual transfer data using another explorer, Lumina.fyi that retains the bytearray instruction data in UI.
The first byte (0x03) is a “discriminator” which is basically like a function signature. For non-IDL programs, instructions are in an ordered enum - you can see that Transfer() is the fourth enum value (starting from 0x00) and takes in one argument “amount” that is u64 - or 8 bytes.
So, we’re going to take the substring of the bytearray “data” from the 2nd byte for the next 8 bytes, and then convert that to an integer. This gets slightly technical, but the numerical data is always stored in little endian (EVM uses big endian) which means we need to first do a
bytearray_reverse before using
In practice, that means we take
0x0094357700000000 and flip it to become
0x0000000077534900 and then change it to an integer value of
You can use this web tool to check numeric conversion manually.
We decode the first “inner_instruction” for “token_sold_amount”, and then the second for “token_bought_amount”:
SELECT bytearray_to_bigint( bytearray_reverse( bytearray_substring( frombase58(sp.call_inner_instructions.data) , 2, 8) )) as token_sold_amount , bytearray_to_bigint( bytearray_reverse( bytearray_substring( frombase58(sp.call_inner_instructions.data) , 2, 8) )) as token_bought_amount , aToB , call_tx_id FROM whirlpool_solana.whirlpool_call_swap sp WHERE sp.account_whirlpool = '7qbRF6YsyGuLUVs6Y1q64bdVrfe4ZcUUz1JRdoVNUJnm' AND call_tx_id = '44kmeC1edSfp21K5kKNVViJvLHG8XQqqu3KbHsrYcYZGmopWwBgP48c9u1DRBMGtQcbvyxd2TT8syY7ZvwpHqkhF'
Again, the values are correct but the decimals aren’t included yet. If we join the whirlpool initialization table to the swaps table to get symbols and decimals, and add some CASE WHEN logic for matching tokens to amounts, we can get this!
Congrats - you can now query and play around with direct Whirlpool swaps!
Try and calculate weekly volume and total volume of the USDC-SOL pool now. You should have everything you need.
Note that this is just decoded an example where the whirlpool program was called at the top level. If someone calls an aggregator like Jupiter that then CPI calls Whirlpool - you need some extra JOIN logic. In this example transaction, whirlpool is called at #3.1, and then #3.2 and #3.3 are the transfers.
When you do this, I highly recommend examining more single transaction examples of these different kinds of swaps, so you can get a sense of all the different accounts involved in a transaction. Solana.fm is another great explorer that has a flow visualizer I use a lot.
You can find the all Whirlpool swaps query here.
I hope you’ve found this beginners guide helpful, please let me know if you have any questions or something was unclear. Next time, you’ll learn how to:
How token (non-native), system (native), and anchor (IDL) programs work
Common associated account, ownership, and PDA patterns
Get latest balances of liquidity pools (TVL)
It will cover basic rust sleuthing and very common token and account patterns you’ll need to know for analyzing any program.
Make sure you subscribe so you don’t miss the next parts of the guide!