Starter Guide to Solana Data Analysis (Part 2)
Let's teach you how tokens, accounts, and programs really work, and how to calculate total value locked (TVL).
This is the second of a four part series covering basic analysis of DEXs and NFTs on Solana.
part 2: types of programs, token accounts, and balances (TVL)
For this guide you should already be familiar with the structure of Solana transactions, how to decode a token transfer, and have basic SQL knowledge.
Make sure to subscribe so you don’t miss the following parts!
If you prefer to self study or work ahead, you can reference all the queries in my two dashboards on Dune:
By the end of the four guides you should be able to understand the core of those queries. My DMs on Twitter are always open for questions or ideas.
Big shoutout to Jon Wang for all his help answering my own questions - be sure to follow him for Solana content and updates. I also learned a lot from fellow Solana analysts Marqu and Anduril - both are must follows in the space.
Table of Contents:
How token (non-native), system (native), and anchor (IDL) programs work
Common associated account, ownership, and PDA patterns
Get latest balances of liquidity pools (TVL)
The goal will be to enable you to calculate “TVL” of a pool, which is the USD value of its balance of both token A and token B (last four columns below).
What are programs, really?
Solana programs are written in Rust. It may look and sound daunting, but I’ll help you understand the main file/code patterns to get by as a data analyst.
One great benefit of Rust is auto documentation, protocols can generate documentation without fully open sourcing the code underneath. You can then easily search using a hierarchy like spl_token::instruction::transfer, and sometimes click “source” to see the actual underlying code.
First, let’s go over the three types of programs and how to identify which functions are being called. For Solana, you have a “discriminator” which is at the start of instruction data bytearray but comes in different lengths:
Non-Native Programs (1 byte discriminator): These are all deployed programs that just use the low level solana_program crate. The SPL token (Solana ERC20 factory), Metaplex Metadata program, and many others use this - instructions are packed in order into an “enumerator” which basically assigns 0x00, 0x01, 0x02, …to each one.
You’ll see in the Rust docs that it uses “pub use solana_program;”
Native Programs (4 byte discriminator): These are programs core to the function of the blockchain. Creating accounts, staking/voting for consensus, deploying programs are all examples of native programs. They also use an enum instruction set technically.
Anchor Programs (8 byte discriminator): The Anchor-lang crate abstracts away the solana_program and makes it a lot easier to write Solana core logic, and it also generates an IDL which is the EVM equivalent of an ABI. It has seen a lot of adoption for key components of protocols like Jupiter, Orca, Metaplex, and more.
Note that other than the discriminator and some slight data type decoding nuances, they are all functionally the same in terms of interaction/instructions flows.
If you’re getting confused looking at data explorers or “inner_instructions” on Dune and not seeing any bytearrays, remember that Solana represents everything in base58 - so you need to convert it to bytearray using
Let’s quickly talk about important files/folders for a given program:
the lib.rs file always contains the program id with “declare_id!” and also the main functions/instructions (this is like a solidity interface). Some programs like Jupiter only open source this file. You’ll have to go to “source” to find this file.
the ::instructions:: path will be your go to reference for function definitions. It contains their actual account arguments (like account_whirlpool) and input data types (like “aToB” is bool and “amount” is u64).
the struct for a function will contain the accounts
the fn for a function will contain the input data (and logic)
just look for the instruction enum and go from there to find each fn.
the ::state:: path will contain the main data types that you’ll encounter in function logic and arguments. We can see all the “variables” in a whirlpool pool. The data gets serialized into one long bytearray buffer (no storage slots), I’ll cover how to check state values later since there are no view/read functions.
You’ll see a bunch of other rs files like utils, errors, entrypoint, processor, etc - just ignore those as you won’t really need to reference them as a data analyst/engineer.
⚠️ WARNING: Our Solana decoded tables were just released in alpha, please let us know if you find any issues by DMing me. You can request decoding for programs with public IDLs by submitting to this form.
On Tokens and Accounts
Alright, so now let’s put what we just learned into practice by looking at basic account creation + token transfer. You should have three rust docs tabs open, the SPL token program, the system program, and the whirlpool program. Try to find each instruction I mention as we go along, and their accounts/inputs.
Here’s the mind bender around accounts:
Ownership of anything in Solana is handled using associated accounts that are derived from the base program you are interacting with.
So for example, your wallet address can hold SOL because it’s native token, but it can’t hold wrapped SOL from the SPL token program (Solana ERC20 factory). To hold wrapped SOL, you ask the system program to create you an account that is basically jointly “owned” by you and the token program called a Program Derived Account (PDA).
If you look at the Whirlpool new pool initialization transaction from the last guide, you’ll see that two “vaults” were created for token A and token B. I’ve highlighted the new pool account (account #5) and the new token A associated account (account #6). Remember that token A is wrapped SOL (So1111…).
You can see in 1.2 that create an account (0x00000000 discriminator) is invoked on the System Program to create the 9RfZ account, and then program ownership given to the SPL Token Program. Then in 1.3, initialize (0x01 discriminator) is called on the SPL Token Program to set the account owner of the same associated account to be the new Whirlpool (7qbR…) account (which was created in 1.1) and that the token mint address is set to wrapped SOL (So111…). Each token associated account can only hold one kind of token, specified by the token mint address. Creating a token mint account is more complicated, and we’ll cover that in the next guide with NFTs.
Very Important Point: This is a good time to note that accounts can either be wallet seed derived or program derived, but they are different from programs addresses that hold executable logic (deployed through the BPFLoader). The whirlpool account created for USDC-SOL cannot be called directly. This is very different from a pair deployed by the Uniswap factory where you can call swap() on the pair directly - any swap execution through the USDC-SOL pool must call the whirlpool program.
The pool is the funded by transferring into it’s token vaults using the increaseLiquidity function. Looking at a basic query, we can see the same 9RfZ token account for tokenVaultA being referenced on liquidity addition:
Remember that the raw data doesn’t convert decimals, and that function argument amounts are usually estimates that will differ from actual transfer amounts.
Looking at the fourth liquidity addition in that table, we can see the liquidity being added into the vaults (destination) from the signer’s token associated accounts (source):
Looking at instructions 2, 3, and 5 from the linked transaction, you can see that a token associated account is actually created on behalf of the depositor so that they can wrap their SOL into SPL token wSOL (SO1111…). Then this account is closed at the end of the transaction after the wSOL is transferred into the pool’s tokenVaultA associated account.
If that last sentence made sense to you then congrats - you’re over one of the biggest logical humps of being a Solana analyst! 🎉 And if this PDA structure isn’t clear yet, try looking at a few more transactions from the query while also referencing the rust docs. I promise it’ll quickly make sense after you examine 3-4 of them.
Wondering why this complex accounts structure is even necessary?
This is how Solana achieves such transaction speed, by parallelizing transactions using read/write access on accounts to closely optimize transactions that get mined based on expected state changes.
Subscribe to get the next parts of the guide, and other Web3 data tips and tricks!
Putting it together to get TVL (total liquidity in a pool)
Now you have all the background you need to understand how to calculate TVL - we just need the latest balances of tokenVaultA and tokenVaultB for the whirlpool!
There aren’t really “transfer” events in Solana, and summing all of inner instructions would be a pain (and crash). Let’s look at solana.account_activity (docs) to understand how we’re going to track balances.
You may have noticed some pre and post balance columns in the transactions table. These are more nicely aggregated in the account_activity table! “pre_balance” and “post_balance” refer to the lamports (9 decimals of SOL) that an address holds. “pre_token_balance” and “post_token_balance” refer to the SPL token (Solana’s ERC20) that is held by an address. The SPL token held by the address is defined by the “token_mint_address”, and the owner of the address is the “token_owner_address”.
Note that these address balances are updated per transaction, not per instruction.
This table is really too big to JOIN on, so I’ve created two spells (abstracted views) that calculate daily and latest balances of any address: solana_utils.daily_balances and solana_utils.latest_balances respectively. It won’t be to-the-second accurate, but serves as great trend analysis tables.
Let’s query for all initialized whirlpools, and then join solana_utils.latest_balances twice on the associated account addresses, once for “tokenVaultA” and once for “tokenVaultB”.
SELECT account_whirlpool as whirlpool_id , ip.account_tokenMintA as tokenA , ip.account_tokenMintB as tokenB , COALESCE(bal_a.token_balance,0) as tokenA_balance , COALESCE(bal_b.token_balance,0) as tokenB_balance , bal_a.updated_at FROM whirlpool_solana.whirlpool_call_initializePool ip LEFT JOIN solana_utils.latest_balances bal_a ON bal_a.address = ip.account_tokenVaultA LEFT JOIN solana_utils.latest_balances bal_b ON bal_b.address = ip.account_tokenVaultB WHERE account_whirlpool = '7qbRF6YsyGuLUVs6Y1q64bdVrfe4ZcUUz1JRdoVNUJnm'
Now, we can put the above in a CTE and join on prices.usd_latest on the Solana blockchain which pulls in price data from coinpaprika every minute. We use a COALESCE here to make sure the TVL value doesn’t null out if we don’t have price data for one of the tokens.
SELECT whirlpool_id , tokenA_balance , tokenB_balance , COALESCE(tokenA_balance*p_a.price, 0) + COALESCE(tokenB_balance*p_b.price,0) as tvl , updated_at FROM whirlpool_liq liq LEFT JOIN prices.usd_latest p_a ON p_a.blockchain = 'solana' and toBase58(p_a.contract_address) = liq.tokenA LEFT JOIN prices.usd_latest p_b ON p_b.blockchain = 'solana' and toBase58(p_b.contract_address) = liq.tokenB
You can see the latest balance as of 4/13/2023 is 34.9k token A (SOL) and 1.2m token B (USDC), at a valuation of $2.01m. This roughly matches the values you see on solscan:
Note that the latest_balances and daily_balances tables update only at the start of each day for now.
Voila - you’re done now! The query itself isn’t too difficult, but hopefully you have a full intuition to the Solana accounts and token logic behind the scenes now.
This challenge is more of a SQL challenge than it is Solana domain dependent. You need to calculate weekly TVL by using the solana_utils.daily_balances and prices.usd tables - this requires a cross join and a forward fill. It’ll be a pretty common pattern to learn for any kind of balance analysis, so give it a shot!
See if you can figure out how to replicate this chart:
I hope you’ve found this beginners guide helpful, please let me know if you have any questions or something was unclear. Next time, you’ll learn how to:
Understand the basic mechanics of Candy Guard/Machine v3
Categorize “random” transfer orders to get mint and payments data
We’ll be getting into advanced territory in the third part, and doing more SQL than conceptual stuff. Keep up the efforts and you’ll be a Solana data expert soon!
Make sure you subscribe so you don’t miss the next parts of the guide!