
Tutorial. Onchain Analysis basics

Level of difficulty: Medium
Reading time: 16 min
Alina Arsamakova

Software Engineer

Last week, I introduced you to the world of onchain analysis and explored some of the ways it can be used to gain insights into the cryptocurrency market.

Let me recap the most fascinating aspect of onchain analysis. We're talking crystal-clear information. Every transaction made on the blockchain is laid out before us, like a deck of cards spread out on a table. This is a level of transparency that simply cannot be matched by traditional assets, where so much of the transaction data is kept behind closed doors, controlled by a select few individuals or companies, and costs an arm and a leg to access. With traditional assets, no average person can gain the same level of insight and understanding that onchain analysis offers. We're talking about a whole new level of transparency and accessibility, and it's nothing short of revolutionary. 

Today, we're taking things up a notch with a tutorial that will guide you through running your own onchain analysis. By the end of this tutorial, you'll have the skills and knowledge you need to start analyzing blockchain data and making informed decisions about your cryptocurrency investments. So let's dive right in and see what insights we can uncover!

This tutorial is built around the Bitcoin blockchain, but many of the techniques apply to other blockchains as well, as long as they have wallets, balances, and transactions.

Where to get data from?

The starting point for onchain analysis is always a node. To investigate each and every block of the Bitcoin blockchain, we need access to all of the relevant data, and there are a few options at our disposal. One option is to set up our own Bitcoin node on a local machine or a remote server. Alternatively, we can take the easier route and use one of the available Node-as-a-Service APIs. There are plenty of providers to choose from, some of which offer free-tier access with certain limitations. Popular providers include Infura, Alchemy, QuickNode, and Getblock, among others. For learning purposes, this option will do just fine.

I’m gonna be using Getblock as my Node-as-a-Service provider. First up, you need to create a new project and select the Bitcoin protocol and the Mainnet network. Once your project is set up, you'll be provided with an API key that you can use to make requests.

How to extract data from a blockchain node?

Once you have your node up and running, you can send requests to it using the JSON-RPC protocol. The list of available methods can be found in the Bitcoin Developer Documentation. To send a request, make an HTTP POST and pass the method name and its parameters in the JSON request body.

Let's create a Node class with a template function for future requests.

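import requests
from typing import Dict, List, Optional


class Node:
    BASE_URL: str = 'https://btc.getblock.io/mainnet/'

    def __init__(self, api_key: str):
        # the API key is passed in the x-api-key header
        self.headers = {'x-api-key': api_key}

    def request(self, method: str, params: Optional[List] = None) -> Dict:
        # JSON-RPC request body: method name plus a list of positional params
        # (None instead of a mutable [] default, so calls never share one list)
        body_template = {"jsonrpc": "2.0",
                         "method": method,
                         "params": params if params is not None else [],
                         "id": "1"}

        r = requests.post(self.BASE_URL, json=body_template, headers=self.headers)
        if r.status_code == 200:
            # a successful response looks like {'result': ..., 'error': None, 'id': '1'}
            return r.json()
        # on HTTP errors, return the status code and the raw response text
        return {"code": r.status_code, "error": r.text}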
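Because Node.request() returns either a JSON-RPC response or an error dict, later calls that index ['result'] directly fail with an unhelpful KeyError when something goes wrong. A small helper can make such failures explicit. The names unwrap_result and RPCError are hypothetical additions for illustration, not part of the original code.

```python
from typing import Any, Dict


class RPCError(Exception):
    """Raised when a node response carries an error instead of a result."""


def unwrap_result(response: Dict) -> Any:
    # JSON-RPC errors arrive as a non-null 'error' field; Node.request() also
    # returns {'code': ..., 'error': ...} for non-200 HTTP responses
    if response.get('error') is not None or 'result' not in response:
        raise RPCError(response.get('error', response))
    return response['result']


# usage with a response shaped like the getbestblockhash output
ok = {'result': '00000000000000000005f8cd245e5427b5c29b1b0d15a498dab27b833436e29d',
      'error': None, 'id': '1'}
print(unwrap_result(ok))
```

With this in place, an expression like node.request('getblockhash', params=[0])['result'] could become unwrap_result(node.request('getblockhash', params=[0])), which surfaces the node's error message instead of a KeyError.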

Now we can easily add new functions for requesting any of the available RPC methods. For example, to get the information for a specific block, you can use the getblock method and pass in the block hash as a parameter; if you only know the block height, look up its hash with getblockhash first. Similarly, to get the information for a specific transaction, you can use the getrawtransaction method and pass in the transaction ID (gettransaction, by contrast, only works for transactions in the node's own wallet).
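Looking up a block by height therefore takes two calls: getblockhash turns the height into a hash, and getblock fetches the block. Here is a minimal sketch, assuming an object with the same request(method, params) interface as the Node class. The function name get_block_by_height is hypothetical, and FakeNode only stands in for a real connection so the example runs offline.

```python
from typing import Dict, List, Optional


def get_block_by_height(node, height: int, verbosity: int = 1) -> Dict:
    # getblock expects a hash, so resolve the height to a hash first
    block_hash = node.request('getblockhash', params=[height])['result']
    return node.request('getblock', params=[block_hash, verbosity])['result']


class FakeNode:
    """Offline stand-in for Node that answers from a tiny in-memory chain."""

    def __init__(self, blocks: List[Dict]):
        self.blocks = blocks

    def request(self, method: str, params: Optional[List] = None) -> Dict:
        params = params or []
        if method == 'getblockhash':
            return {'result': self.blocks[params[0]]['hash'], 'error': None, 'id': '1'}
        if method == 'getblock':
            block = next(b for b in self.blocks if b['hash'] == params[0])
            return {'result': block, 'error': None, 'id': '1'}
        return {'result': None, 'error': {'message': 'unknown method'}, 'id': '1'}


fake = FakeNode([{'hash': 'aa', 'height': 0}, {'hash': 'bb', 'height': 1}])
print(get_block_by_height(fake, 1))  # {'hash': 'bb', 'height': 1}
```

With a real connection you would pass the Node instance instead, e.g. get_block_by_height(node, 779475).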

Let's find out which block is the latest and see what's inside. Request the getbestblockhash method:

node = Node(API_KEY)
node.request('getbestblockhash')
{
  'result': '00000000000000000005f8cd245e5427b5c29b1b0d15a498dab27b833436e29d',
  'error': None,
  'id': '1'
}

Bitcoin RPC identifies blocks by their hashes, not by their heights. To get full information about a block, we request it by its block hash:

node.request('getblock', params=['00000000000000000005f8cd245e5427b5c29b1b0d15a498dab27b833436e29d'])
{'result': {
  'hash': '00000000000000000005f8cd245e5427b5c29b1b0d15a498dab27b833436e29d',
  'confirmations': 1654,
  'height': 779475,
  'version': 536870912,
  'versionHex': '20000000',
  'merkleroot': '927d33706214cc7654618e6e9e05889bc0de8830351b3b4b32757fe537e3691f',
  'time': 1678042199,
  'mediantime': 1678038302,
  'nonce': 2384488585,
  'bits': '170689a3',
  'difficulty': 43053844193928.45,
  'chainwork': '000000000000000000000000000000000000000041d535f8254fa87ffc768910',
  'nTx': 1858,
  'previousblockhash': '00000000000000000003e0b3af5c422f25fcdaa894edb1a1d55f2435dac3bc3b',
  'nextblockhash': '000000000000000000036072661e02bf991917af4d4c94a9f65c85ddd962e4e6',
  'strippedsize': 893094,
  'size': 1313978,
  'weight': 3993260,
  'tx': [
    '728416194c2d79c3db12b55cf492ee1c502c21cfd164176bc6537c832cf84dfa',
    '48832f686bfbc95820ef8e9bce1ca33881d3dd2b5aa562624345f2358ae684c9',
    '76230daef4a6874a4612898f72992188c1cf67b4d23d319c763d75c79a47ca30',
    'e58005b91f7a503be50cfa4063c0e64c3d5d9ef77c1f8bb4d6b30d031cb54a97',
    ...]
  },
 'error': None,
 'id': '1'}

Here you’ve got the entire block contents, including its creation time, nonce and difficulty, the number of transactions it contains, the parent block hash, and much more. The block's position in the chain is given by 'height': the number of blocks that precede it, counting from the genesis block at height 0.
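A couple of these fields are easy to sanity-check. 'time' is a Unix timestamp in seconds, and the block weight defined in BIP 141 equals three times the stripped size (the block without witness data) plus the full size. The sketch below checks both with the values copied from the getblock output above.

```python
from datetime import datetime, timezone

# values copied from the getblock output above
block = {'time': 1678042199, 'strippedsize': 893094, 'size': 1313978, 'weight': 3993260}

# 'time' is seconds since the Unix epoch, set by the miner
created = datetime.fromtimestamp(block['time'], tz=timezone.utc)
print(created.isoformat())  # 2023-03-05T18:49:59+00:00

# BIP 141: weight = 3 * base size (no witness data) + total size
weight = 3 * block['strippedsize'] + block['size']
print(weight == block['weight'])  # True

# the consensus limit is 4,000,000 weight units, so this block is nearly full
print(f"{weight / 4_000_000:.2%} of the weight limit")
```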

What interests us most is transactions. However, they’re presented here as a list of IDs with no extra information. If we requested each transaction individually with getrawtransaction, retrieving the data for hundreds or thousands of transactions would take a while. Luckily, we can call the same getblock method with verbosity = 2, which returns the block contents together with full transaction details.

node.request('getblock', params=['00000000000000000005f8cd245e5427b5c29b1b0d15a498dab27b833436e29d', 2])
{'result': {
  'hash': '000000000000000000052cc1b3119c1eed58210d157caea8364733fe61c5510b',
  'confirmations': 1,
  'height': 781128,
  'version': 893157376,
  'versionHex': '353c8000',
  'merkleroot': '50e579e1b50cea34c19e7517a8c39362387cf658e3bed0e3eb72184703c3fd0e',
  'time': 1679013738,
  'mediantime': 1679012271,
  'nonce': 1094549306,
  'bits': '17067681',
  'difficulty': 43551722213590.37,
  'chainwork': '000000000000000000000000000000000000000042d3b62d26b6602bb8708d9a',
  'nTx': 2135,
  'previousblockhash': '000000000000000000004684099611dab88e149f86ade81aecd0e2a8bacf3801',
  'strippedsize': 703320,
  'size': 1883053,
  'weight': 3993013,
  'tx': [
    {
      'txid': '23f57be68a3335b2f18e21c917cf59c5a10ad4b841df95d437d4d1db27e3c9d6',
      'hash': '1504b07b0abbc23a7f1a18d4f0dc0a015c9a01f118b2269cd4f808f8b53b725e',
      'version': 1,
      'size': 306,
      'vsize': 279,
      'weight': 1116,
      'locktime': 0,
      'vin': [
        {
          'coinbase': '0348eb0b1b4d696e656420627920416e74506f6f6c393030b200560295bfa286fabe6d6dc5a7f783fba479fa129648ce02e884e9a32fa84379dcbca2e32dc959bd28fb500400000000000000000038ae5b01000000000000',
          'txinwitness': ['0000000000000000000000000000000000000000000000000000000000000000'],
          'sequence': 4294967295
        }
      ],
    'vout': [
      {'value': 6.43852333,
       'n': 0,
       'scriptPubKey': {'asm': 'OP_HASH160 4b09d828dfc8baaba5d04ee77397e04b1050cc73 OP_EQUAL',
       'desc': 'addr(38XnPvu9PmonFU9WouPXUjYbW91wa5MerL)#ap48vquh',
       'hex': 'a9144b09d828dfc8baaba5d04ee77397e04b1050cc7387',
       'address': '38XnPvu9PmonFU9WouPXUjYbW91wa5MerL',
       'type': 'scripthash'}
      },
     {'value': 0.0,
      'n': 1,
      'scriptPubKey': {'asm': 'OP_RETURN aa21a9edd2786e2cd36d382222891ee9ec4a58b50fbf040b0013580a6c28b4e3f5455f39',
       'desc': 'raw(6a24aa21a9edd2786e2cd36d382222891ee9ec4a58b50fbf040b0013580a6c28b4e3f5455f39)#ksu2m7js',
       'hex': '6a24aa21a9edd2786e2cd36d382222891ee9ec4a58b50fbf040b0013580a6c28b4e3f5455f39',
       'type': 'nulldata'}},
     {'value': 0.0,
      'n': 2,
      'scriptPubKey': {'asm': 'OP_RETURN 52534b424c4f434b3a413ddfdea6cce170388b1859ac400899c835fa76039a64101457fb2c004e61cf',
       'desc': 'raw(6a2952534b424c4f434b3a413ddfdea6cce170388b1859ac400899c835fa76039a64101457fb2c004e61cf)#k62s7rse',
       'hex': '6a2952534b424c4f434b3a413ddfdea6cce170388b1859ac400899c835fa76039a64101457fb2c004e61cf',
       'type': 'nulldata'}}],
    'hex': '010000000001010000000000000000000000000000000000000000000000000000000000000000ffffffff580348eb0b1b4d696e656420627920416e74506f6f6c393030b200560295bfa286fabe6d6dc5a7f783fba479fa129648ce02e884e9a32fa84379dcbca2e32dc959bd28fb500400000000000000000038ae5b01000000000000ffffffff032d6860260000000017a9144b09d828dfc8baaba5d04ee77397e04b1050cc73870000000000000000266a24aa21a9edd2786e2cd36d382222891ee9ec4a58b50fbf040b0013580a6c28b4e3f5455f3900000000000000002b6a2952534b424c4f434b3a413ddfdea6cce170388b1859ac400899c835fa76039a64101457fb2c004e61cf0120000000000000000000000000000000000000000000000000000000000000000000000000'},
   {'txid': '8d907e4f72a41ed129a2fc522fa2cfc75c9e80e9d3d6117186bde91214957ae4',
     ...]
  },
 'error': None,
 'id': '1'}

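To request a specific transaction by its txid, use getrawtransaction. By default it returns the hex-encoded transaction, which decoderawtransaction converts into a readable JSON view. Bitcoin Core can also return the decoded form directly when getrawtransaction is called with its verbose flag, but not every provider may support this. Note that looking up arbitrary confirmed transactions requires the node to maintain a transaction index (-txindex).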

raw_tx = node.request('getrawtransaction', params=['8d907e4f72a41ed129a2fc522fa2cfc75c9e80e9d3d6117186bde91214957ae4'])['result']
decoded_tx = node.request('decoderawtransaction', params=[raw_tx])
{'result': {
  'txid': '8d907e4f72a41ed129a2fc522fa2cfc75c9e80e9d3d6117186bde91214957ae4',
  'hash': '9a7485b92f362b32f1a1ed0f7e89cb66e3a105f338a8ce83428c98864bfb922b',
  'version': 1,
  'size': 380,
  'vsize': 190,
  'weight': 758,
  'locktime': 0,
  'vin': [{'txid': 'a19fcac30cd451c54cb49f888222286dcdac662e0e65a7b01ed12862120c9b5e',
    'vout': 1,
    'scriptSig': {'asm': '', 'hex': ''},
    'txinwitness': ['',
     '30440220126575150e94bf012c9329510daf2ca1f8af34e66e84d5f3af888b4ee84c4b7c0220497e81afcda74840f81893605d9a857eacf16d0327e4071bcac6a48d7392381b01',
     '3044022050f45e0b3a86e8c1b912687bf9435bc4ee4564e885265b577eab7118a3bbc05e022018b5a8f24969d237cb829ff5a4e11e0d6ae90a72f866884e749f4a674e3277d801',
     '52210375e00eb72e29da82b89367947f29ef34afb75e8654f6ea368e0acdfd92976b7c2103a1b26313f430c4b15bb1fdce663207659d8cac749a0e53d70eff01874496feff2103c96d495bfdd5ba4145e3e046fee45e84a8a48ad05bd8dbb395c011a32cf9f88053ae'],
    'sequence': 4294967295}],
  'vout': [{'value': 0.01,
    'n': 0,
    'scriptPubKey': {'asm': 'OP_HASH160 a4543610bc3f9d0101f22567d06431a69bd67665 OP_EQUAL',
     'desc': 'addr(3Gfud6mKjFRvjpkcDsoZw3bHFDpKddgJJJ)#y0cvm758',
     'hex': 'a914a4543610bc3f9d0101f22567d06431a69bd6766587',
     'address': '3Gfud6mKjFRvjpkcDsoZw3bHFDpKddgJJJ',
     'type': 'scripthash'}},
   {'value': 0.03915408,
    'n': 1,
    'scriptPubKey': {'asm': '0 701a8d401c84fb13e6baf169d59684e17abd9fa216c8cc5b9fc63d622ff8c58d',
     'desc': 'addr(bc1qwqdg6squsna38e46795at95yu9atm8azzmyvckulcc7kytlcckxswvvzej)#ftpgzygj',
     'hex': '0020701a8d401c84fb13e6baf169d59684e17abd9fa216c8cc5b9fc63d622ff8c58d',
     'address': 'bc1qwqdg6squsna38e46795at95yu9atm8azzmyvckulcc7kytlcckxswvvzej',
     'type': 'witness_v0_scripthash'}}]},
 'error': None,
 'id': '1'}
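The size fields of the decoded transaction are related in the same way as the block's: weight = 3 × base size + total size, and vsize is the weight divided by 4, rounded up. A quick check against the transaction above (the base size is derived from the formula; the node does not return it here):

```python
import math

# values copied from the decoderawtransaction output above
tx = {'size': 380, 'vsize': 190, 'weight': 758}

# virtual size = weight / 4, rounded up
vsize = math.ceil(tx['weight'] / 4)
print(vsize == tx['vsize'])  # True

# base size (serialization without witness data) follows from the weight formula
base_size = (tx['weight'] - tx['size']) // 3
print(base_size)  # 126
```

The same rule holds for the coinbase transaction in the verbosity-2 block: a weight of 1116 gives a vsize of 279.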

These are essentially all the Bitcoin RPC methods we need for our analysis. The plan is to request block data with full transaction information block by block, starting from the genesis block, and calculate metrics over it. Before we get down to practice, though, we gotta sort out the inner structure of Bitcoin transactions, since they are the primary object of our analysis.

How are Bitcoin transactions structured? 

In Bitcoin, a transaction is a transfer of value from one Bitcoin address to another. Every transaction has one or more inputs and one or more outputs.

Inputs in a Bitcoin transaction refer to unspent transaction outputs (UTXOs) of previous transactions that are being used to fund the new transaction. In simpler terms, an input is a reference to an existing unspent output of a previous Bitcoin transaction. An input does not state the amount or the sender's address itself: it points to the previous output by its transaction ID (txid) and output index (vout), and that output carries the value and the address.

Outputs in a Bitcoin transaction specify where the Bitcoin is being sent. Each output specifies the amount of Bitcoin being sent and a locking script (scriptPubKey) that usually encodes the recipient's address; in early blocks it may contain the recipient's raw public key instead, which we will cover later.

It is important to note that a transaction can have multiple inputs and outputs, which means that a single transaction can have multiple senders and recipients. All inputs and outputs are like puzzle pieces that fit together: the sum of the input values must be greater than or equal to the sum of the output values, and the difference is the transaction fee paid to the miner who includes the transaction in a block.
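The fee rule is simple arithmetic, but the node returns BTC amounts as floats, so summing them directly can introduce rounding noise. A common approach, sketched here as an assumption rather than the method used later in this tutorial, is to convert amounts to integer satoshis (1 BTC = 100,000,000 satoshis) first. The function names are hypothetical and the amounts are made up for illustration.

```python
from decimal import Decimal
from typing import List

SATS_PER_BTC = 100_000_000


def to_sats(btc: float) -> int:
    # go through str() so e.g. 0.0499 becomes exactly Decimal('0.0499') before scaling
    return int(Decimal(str(btc)) * SATS_PER_BTC)


def tx_fee_sats(input_values: List[float], output_values: List[float]) -> int:
    # fee = sum(inputs) - sum(outputs); it can never be negative in a valid transaction
    fee = sum(map(to_sats, input_values)) - sum(map(to_sats, output_values))
    if fee < 0:
        raise ValueError("outputs exceed inputs: invalid transaction")
    return fee


# hypothetical transaction: two inputs, a payment output and a change output
print(tx_fee_sats([0.3, 0.2], [0.45, 0.0499]))  # 10000
```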

In summary, transaction inputs represent the sources of funds being used to fund a new transaction, while transaction outputs represent the destinations where the Bitcoin is being sent.

Let’s wrap this up:

  • An input indicates, through the output it references, which address is sending bitcoins and how many;

  • Outputs show which addresses are receiving Bitcoins; 

  • When an address receives outputs, it now owns those outputs and can spend them in another transaction as input; 

  • An address's balance is the sum of outputs that it received that it has not spent yet; 

  • A BTC that is being spent as an input is actually a reference to an output. 
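The points above describe the UTXO model. Here is a toy sketch of it: a dict of unspent outputs keyed by (txid, output index), where spending removes entries and a balance is the sum of what is left. The transaction IDs and addresses are made-up placeholders, not real chain data.

```python
from typing import Dict, List, Tuple

Outpoint = Tuple[str, int]  # (txid, vout index)

# unspent outputs: outpoint -> (address, value in satoshis)
utxos: Dict[Outpoint, Tuple[str, int]] = {}


def apply_tx(txid: str, spends: List[Outpoint], pays: List[Tuple[str, int]]) -> None:
    # inputs are just references: remove the outputs they spend
    for outpoint in spends:
        del utxos[outpoint]  # a KeyError means the output is unknown or already spent
    # each new output becomes spendable under (txid, index)
    for index, (address, value) in enumerate(pays):
        utxos[(txid, index)] = (address, value)


def balance(address: str) -> int:
    # an address's balance is the sum of its outputs that have not been spent yet
    return sum(value for addr, value in utxos.values() if addr == address)


apply_tx('tx_a', [], [('alice', 5_000_000_000)])  # coinbase-like: no inputs
apply_tx('tx_b', [('tx_a', 0)], [('bob', 1_000_000_000), ('alice', 3_999_900_000)])
print(balance('alice'), balance('bob'))  # 3999900000 1000000000
```

Alice's second output is her change; the 100,000 satoshis missing between tx_b's input and outputs are the fee.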

Naturally, we analyze the blockchain from its outermost level, the block, down to its lowest level, the inputs and outputs of each transaction. I see it as a three-level nested structure: each block contains multiple transactions, each transaction contains multiple inputs and outputs, and each input and output represents value moving from or to a Bitcoin address.
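The three levels map directly onto nested loops over a verbosity-2 getblock result. A minimal sketch that only counts the items on each level; sample_block is a hand-made stand-in shaped like the output shown earlier, with shortened placeholder values.

```python
from typing import Dict


def count_levels(block: Dict) -> Dict[str, int]:
    counts = {'transactions': 0, 'inputs': 0, 'outputs': 0}
    for tx in block['tx']:                    # level 2: transactions in the block
        counts['transactions'] += 1
        counts['inputs'] += len(tx['vin'])    # level 3: inputs
        counts['outputs'] += len(tx['vout'])  # level 3: outputs
    return counts


sample_block = {'tx': [
    {'vin': [{'coinbase': '00'}], 'vout': [{'value': 6.25}, {'value': 0.0}]},
    {'vin': [{'txid': 'ab', 'vout': 0}], 'vout': [{'value': 1.0}, {'value': 0.5}]},
]}
print(count_levels(sample_block))  # {'transactions': 2, 'inputs': 2, 'outputs': 4}
```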

We're gonna be implementing all the necessary functions in the reverse order, starting from the lowest level of the data structure (inputs and outputs), and then propagating this analysis up to each transaction and block.

A typical transaction, like the decoded one shown earlier, is a JSON object containing various fields. We want to focus on two of them: vin, the transaction inputs, and vout, the transaction outputs.

Extracting data from transaction outputs

Let’s start with outputs and create a function to extract the number of bitcoins and the address to which they belong:

from typing import Dict, Tuple

def analyze_output(output: Dict) -> Tuple[str, float]:
    # each output carries its value in BTC and a locking script (scriptPubKey)
    value = output['value']
    address = output['scriptPubKey']['address']
    return address, value

At times, the 'address' field is missing and only the 'asm' field, the assembly representation of the locking script, is provided. This happens with pay-to-public-key (P2PK) outputs, which are common in the early blocks: instead of an address, the script contains the recipient's public key followed by OP_CHECKSIG.

To convert such a public key into the standard (P2PKH) address format that corresponds to it, you can use this readily available code snippet:

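from hashlib import sha256, new
from base58 import b58encode

def sha256_digest(bstr):
    # double SHA-256, used for the Base58Check checksum
    return sha256(sha256(bstr).digest()).digest()

def convert_pkh_to_address(prefix, addr):
    # version byte + payload + first 4 bytes of the checksum, Base58-encoded
    data = prefix + addr
    return b58encode(data + sha256_digest(data)[:4])

def pubkey_to_address(pubkey_hex):
    pubkey = bytearray.fromhex(pubkey_hex)
    # HASH160 = RIPEMD-160(SHA-256(pubkey))
    round1 = sha256(pubkey).digest()
    # note: with some OpenSSL 3 builds, hashlib may not provide 'ripemd160'
    h = new('ripemd160')
    h.update(round1)
    pubkey_hash = h.digest()
    # version byte 0x00 = mainnet P2PKH address (starts with '1'); returns bytes
    return convert_pkh_to_address(b'\x00', pubkey_hash)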

pubkey = "04f5eeb2b10c944c6b9fbcfff94c35bdeecd93df977882babc7f3a2cf7f5c81d3b09a68db7f0e04f21de5d4230e75e6dbe7ad16eefe0d4325a62067dc6f369446a"
print("Address: ", pubkey_to_address(pubkey))
Address:  b'1BW18n7MfpU35q4MTBSk8pse3XzQF8XvzT'
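To double-check what pubkey_to_address() produces, you can decode the Base58Check string and verify its 4-byte checksum. This sketch uses only the standard library rather than the base58 package, and the function name base58check_decode is hypothetical. The test address is the well-known genesis block reward address.

```python
from hashlib import sha256

ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'


def base58check_decode(address: str) -> bytes:
    """Returns version byte + payload; raises ValueError on a bad checksum."""
    num = 0
    for char in address:
        num = num * 58 + ALPHABET.index(char)  # ValueError on invalid characters
    # each leading '1' encodes one leading zero byte
    pad = len(address) - len(address.lstrip('1'))
    raw = b'\x00' * pad + num.to_bytes((num.bit_length() + 7) // 8, 'big')
    data, checksum = raw[:-4], raw[-4:]
    if sha256(sha256(data).digest()).digest()[:4] != checksum:
        raise ValueError('checksum mismatch')
    return data


decoded = base58check_decode('1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa')
print(decoded[0], len(decoded))  # 0 21  (P2PKH version byte + 20-byte hash)
```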

We will improve the analyze_output() function by adding public key conversion. The approach is as follows: if a wallet address appears explicitly, extract it as is; if the output locks funds to a bare public key (type 'pubkey'), convert the key into an address first. Outputs that don't pay to any address, such as OP_RETURN data outputs, get None instead of an address.

from typing import Dict, Optional, Tuple

def analyze_output(output: Dict) -> Tuple[Optional[str], float]:
    """returns (None, value) for outputs without an address, e.g. OP_RETURN data"""

    value = output['value']
    script = output['scriptPubKey']
    address = None
    if script.get('address'):
        address = script['address']
    elif script.get('type') == 'pubkey':
        # P2PK: asm is '<pubkey> OP_CHECKSIG'
        pubkey, *_ = script['asm'].split(' ')
        # pubkey_to_address() returns bytes; decode so every address is a str
        address = pubkey_to_address(pubkey).decode()
    return address, value


Let’s test analyze_output() by giving it an output sample with an explicit address:

output_1 = {'value': 0.001135,
 'n': 0,
 'scriptPubKey': {'asm': 'OP_HASH160 0539d304163d382d1038ca67f94aa8c3f2652e01 OP_EQUAL',
  'desc': 'addr(32AeaCoq6D7igSYyjqktuznUAJnJjS2JEb)#u3nurdrd',
  'hex': 'a9140539d304163d382d1038ca67f94aa8c3f2652e0187',
  'address': '32AeaCoq6D7igSYyjqktuznUAJnJjS2JEb',
  'type': 'scripthash'}
  }

addr, value = analyze_output(output_1)
print(addr, value)
32AeaCoq6D7igSYyjqktuznUAJnJjS2JEb 0.001135

And of course we should test the case with a bare public key:

output_2 = {
    'value': 50.0,
    'n': 0,
    'scriptPubKey': {
        'asm': '04f5eeb2b10c944c6b9fbcfff94c35bdeecd93df977882babc7f3a2cf7f5c81d3b09a68db7f0e04f21de5d4230e75e6dbe7ad16eefe0d4325a62067dc6f369446a OP_CHECKSIG',
        'desc': 'pk(04f5eeb2b10c944c6b9fbcfff94c35bdeecd93df977882babc7f3a2cf7f5c81d3b09a68db7f0e04f21de5d4230e75e6dbe7ad16eefe0d4325a62067dc6f369446a)#vjmelvzd',
        'hex': '4104f5eeb2b10c944c6b9fbcfff94c35bdeecd93df977882babc7f3a2cf7f5c81d3b09a68db7f0e04f21de5d4230e75e6dbe7ad16eefe0d4325a62067dc6f369446aac',
        'type': 'pubkey'
        }
     }
addr, value = analyze_output(output_2)
print(addr, value)
1BW18n7MfpU35q4MTBSk8pse3XzQF8XvzT 50.0

Extracting data from transaction inputs

Lovely, we’ve learned how to extract the required information from outputs. This function will come in handy for inputs as well: likewise, we’ll be extracting the sender's address and the number of bitcoins from each input. The challenge is that an input shows neither the address nor the value. Remember that an input in a Bitcoin transaction refers to an output of a previous transaction, created earlier and left unspent until now.

An input is a reference to an existing unspent output of a previous Bitcoin transaction.

Therefore, to identify the sender and the transfer amount:

  1. Take the txid given in the vin entry, which leads us back to a transaction in one of the previous blocks;

  2. Request this txid from the node;

  3. In the returned transaction, take the output located at the index specified in the input's 'vout' field.

Following these steps for the sample input below, we find that it spends 2.24610796 BTC from the address bc1q8lrcj8dxk4kq92v5ujf694vmvgqwdrv78k3m4p.

from typing import Dict, Optional, Tuple

def analyze_input(tx_input: Dict) -> Tuple[Optional[str], float]:
    # each input spends the output at index tx_input['vout']
    # of the transaction with txid = tx_input['txid']
    origin_txid = tx_input['txid']
    output_idx = tx_input['vout']
    # fetch and decode the origin transaction (uses the `node` created earlier)
    origin_tx_raw = node.request('getrawtransaction', params=[origin_txid])['result']
    origin_tx = node.request('decoderawtransaction', params=[origin_tx_raw])['result']

    # the referenced output holds the sender's address and the value
    origin_output = origin_tx['vout'][output_idx]
    return analyze_output(origin_output)

input_1 = {
    "txid": "93907f440693316e9fa51ee952d483aac4da86464c5b5ad1a6c7cf0a8c9b1a40",
    "vout": 1,
    "scriptSig": {
        "asm": "",
        "hex": ""
      },
    "txinwitness": [
        "30450221008b6042fb2f73f7a6c178e95620884db5edfedac3e488534ea307fbf5edacc67a02204312349dfa126f834f0add948ab3cdc43e56c1b5e1e71e8118d82a819535dff101",
        "02604ec25f33232cdd026f693142b745b327dc60fd62278587da001b00caeed06e"
      ],
    "sequence": 4294967295
    }

addr, value = analyze_input(input_1)
print(addr, value)
bc1q8lrcj8dxk4kq92v5ujf694vmvgqwdrv78k3m4p 2.24610796
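Each analyze_input() call costs two RPC round trips, and a transaction with many outputs (a batch payout, say) may be referenced by many later inputs. One way to cut repeated requests, sketched here as an assumption rather than part of the original code, is to memoize the transaction lookup with functools.lru_cache. fetch_decoded_tx is a hypothetical stand-in for the getrawtransaction + decoderawtransaction pair, and rpc_calls only counts the simulated round trips.

```python
from functools import lru_cache
from typing import Dict

rpc_calls = 0  # counts simulated node round trips


def fetch_decoded_tx(txid: str) -> Dict:
    # stand-in for getrawtransaction + decoderawtransaction against a real node
    global rpc_calls
    rpc_calls += 2
    return {'txid': txid, 'vout': [{'value': 1.5, 'n': 0}, {'value': 0.5, 'n': 1}]}


@lru_cache(maxsize=100_000)
def cached_tx(txid: str) -> Dict:
    # returns the same dict object on every hit, so callers must not mutate it
    return fetch_decoded_tx(txid)


def input_value(tx_input: Dict) -> float:
    # same lookup as analyze_input(): the spent output sits at index tx_input['vout']
    return cached_tx(tx_input['txid'])['vout'][tx_input['vout']]['value']


inputs = [{'txid': 'aa', 'vout': 0}, {'txid': 'aa', 'vout': 1}, {'txid': 'bb', 'vout': 0}]
print([input_value(i) for i in inputs], rpc_calls)  # [1.5, 0.5, 1.5] 4
```

Without the cache, the three inputs would have cost six round trips instead of four.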

There is one major point to note about transactions: the first transaction in each block is a special one. It is known as the coinbase transaction, and it rewards the miner who successfully mined the block. Unlike regular transactions, it does not spend any previous outputs: its single input is a placeholder marked with a 'coinbase' field (as in the verbosity-2 block above), and its outputs pay the block reward, that is, the newly issued subsidy plus all transaction fees in the block, to the miner's address.

Since a coinbase input doesn't reference any previous output, we will process coinbase transactions in the same way as regular ones, only skipping the step of processing inputs.
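In the verbosity-2 output, the coinbase transaction is recognizable by the 'coinbase' key in its only input, while regular inputs carry 'txid' and 'vout' instead. A small check like this one (the helper name is hypothetical, and the hex and txid values are shortened placeholders) can tell the two apart:

```python
from typing import Dict


def is_coinbase(tx: Dict) -> bool:
    # a coinbase input has a 'coinbase' field instead of 'txid'/'vout'
    return len(tx['vin']) == 1 and 'coinbase' in tx['vin'][0]


coinbase_tx = {'vin': [{'coinbase': '0348eb0b', 'sequence': 4294967295}], 'vout': []}
regular_tx = {'vin': [{'txid': 'a19fcac3', 'vout': 1}], 'vout': []}
print(is_coinbase(coinbase_tx), is_coinbase(regular_tx))  # True False
```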

Analyzing the entire transaction

Now that we know how to extract data from inputs and outputs separately, we can write a function to process the transaction as a whole. 

Our objective is to gather the following metrics: the number of addresses, transaction volume, and the total amount held in UTXOs. To achieve this, we want to keep track of each address we encounter, the number of bitcoins it spent, and its accumulated balance.

Let's create a dictionary where the key is the address and the value is the accumulated balance. Each new transaction will mutate this dictionary by subtracting input values from the senders' balances and adding output values to the recipients' balances. The miner's fees don't need separate handling: a fee is the difference between a transaction's inputs and outputs, and the miner collects it through the coinbase output, which already includes the fees of all transactions in the block. Besides, we wanna track the transaction volume, which here we measure as the total value of the transaction outputs.

from collections import defaultdict
from typing import Dict, Tuple


def analyze_transaction(tx: Dict) -> Tuple[Dict, float]:
    # balance changes caused by this transaction, keyed by address
    tx_balances = defaultdict(float)
    tx_volume = 0

    # outputs credit the recipients
    for output in tx['vout']:
        address, value = analyze_output(output)
        tx_volume += value
        if address is not None:  # skip outputs without an address (e.g. OP_RETURN)
            tx_balances[address] += value

    # inputs debit the senders
    for tx_input in tx['vin']:
        address, value = analyze_input(tx_input)
        if address is not None:
            tx_balances[address] -= value
    # CHECK: sum(input values) == sum(output values) + tx['fee']
    return tx_balances, tx_volume

Create a separate function to handle coinbase transactions:

def analyze_coinbase_transaction(tx: Dict) -> Tuple[str, float]:
    # The first transaction in a block is the miner's reward for block validation,
    # hence it spends no previous outputs.
    # The reward (block subsidy + all fees in the block) goes to the miner's address,
    # which is usually the 1st output in the list.
    # Simplification: further outputs (e.g. split pool payouts) are ignored here.
    miner_address, reward = analyze_output(tx['vout'][0])
    return miner_address, reward
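Because the coinbase output includes the fees, subtracting the block subsidy shows how much the miner earned from fees. A sketch using the coinbase transaction from the verbosity-2 block above (height 781128, first output 6.43852333 BTC) and the halving rule of 50 BTC halved every 210,000 blocks. It assumes, like analyze_coinbase_transaction(), that the whole reward sits in the first output, and also that the miner claimed the full amount it was allowed to.

```python
from decimal import Decimal

SATS_PER_BTC = 100_000_000


def block_subsidy_sats(height: int) -> int:
    # 50 BTC, halved (integer right shift) every 210,000 blocks
    halvings = height // 210_000
    return 0 if halvings >= 64 else (50 * SATS_PER_BTC) >> halvings


height = 781128
coinbase_value_sats = int(Decimal('6.43852333') * SATS_PER_BTC)
fees_sats = coinbase_value_sats - block_subsidy_sats(height)
print(block_subsidy_sats(height), fees_sats)  # 625000000 18852333
```

So, of the 6.43852333 BTC paid out in that block, 6.25 BTC were newly issued and about 0.1885 BTC came from transaction fees.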

Having created functions to process individual transactions we're now ready to scale this up to the entire block. In the analyze_block() function, we'll define variables to track address balances and transaction volume.

def analyze_block(block: Dict) -> Tuple[Dict, float]:
    block_balances = defaultdict(lambda: 0)
    block_volume = 0

    # the first tx is the coinbase tx rewarding miner
    miner_addr, reward = analyze_coinbase_transaction(block['tx'][0])
    block_balances[miner_addr] += reward

    for tx in block['tx'][1:]:
        tx_balances, tx_volume = analyze_transaction(tx)
        block_volume += tx_volume

        # update block balances with tx balances
        for addr, balance in tx_balances.items():
            block_balances[addr] += balance
    return block_balances, block_volume

Now here comes the most interesting part. We can finally use the analyze_block() function to process all the blocks in the chain, starting from the genesis block at height 0. As we iterate through the blocks, we'll store the metrics we gather in the blockwise_metrics list and keep track of the balances in the blockchain_balances dict.

genesis_block = node.request('getblockhash', params=[0])

blockchain_balances = defaultdict(lambda: 0)
blockwise_metrics = []

block_hash = genesis_block['result']

while block_hash:
    block = node.request('getblock', params=[block_hash, 2])['result']
    print("block #", block['height'])

    block_balances, block_volume = analyze_block(block)
    
    for addr, balance in block_balances.items():
        blockchain_balances[addr] += balance

    metrics = {
        "number_of_addresses": len(blockchain_balances.keys()),
        "transactions_volume": block_volume,
        "utxos": sum(blockchain_balances.values())
    }
    blockwise_metrics.append(metrics)

    block_hash = block.get('nextblockhash')

Let’s run this code for a short while and see what we’ve managed to gather. In the blockwise_metrics we’ve essentially collected the necessary metrics. As simple as that.

for block_num, metrics in enumerate(blockwise_metrics):
    print(f"block #{block_num}:", "metrics:", metrics)

Let’s check up on blockchain_balances. You may notice that all the addresses hold exactly 50 bitcoins. At the beginning of the Bitcoin blockchain, most blocks contained only the coinbase transaction paying the 50 BTC reward to the miner, since there were hardly any other transactions happening on the network. As the network grew, and the volume of transactions with it, blocks started to include more and more transactions.

print(blockchain_balances)
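To get more out of blockchain_balances than a raw printout, you can, for instance, list the largest holders with heapq.nlargest. The helper name top_holders and the sample dict are made up for illustration; in practice you would pass the real blockchain_balances.

```python
import heapq
from typing import Dict, List, Tuple


def top_holders(balances: Dict[str, float], n: int = 10) -> List[Tuple[str, float]]:
    # nlargest avoids sorting the whole (potentially huge) dict
    return heapq.nlargest(n, balances.items(), key=lambda item: item[1])


sample = {'addr_a': 50.0, 'addr_b': 100.0, 'addr_c': 0.0, 'addr_d': 25.0}
print(top_holders(sample, n=2))  # [('addr_b', 100.0), ('addr_a', 50.0)]
```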

The reward for mining a block is halved every 210,000 blocks, which works out to roughly every 4 years. In the early days of Bitcoin, when the reward was much higher, miners were incentivized to mine blocks even if there were no transactions to include. As the reward has decreased over time, miners have become more dependent on transaction fees as a source of income, which gives them an incentive to fill blocks with fee-paying transactions.
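The halving schedule also explains Bitcoin's well-known supply cap. Summing the subsidy over all halving eras, with integer satoshi arithmetic (which is why the total lands slightly below 21 million), gives the maximum issuance the schedule allows. A short sketch:

```python
SATS_PER_BTC = 100_000_000
HALVING_INTERVAL = 210_000

subsidy = 50 * SATS_PER_BTC  # initial block subsidy in satoshis
total = 0
era = 0
while subsidy > 0:
    total += subsidy * HALVING_INTERVAL
    if era < 4:  # show the first few eras
        print(f"blocks {era * HALVING_INTERVAL:>9,}+: {subsidy / SATS_PER_BTC} BTC per block")
    subsidy >>= 1  # integer halving: fractions of a satoshi are dropped
    era += 1
print(f"total issuance: {total / SATS_PER_BTC:,.8f} BTC over {era} eras")
```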

Basically, that’s all you need to do to calculate the number of addresses, transaction volume, and UTXOs for the entire blockchain. However, I must warn you: this process will take a considerable amount of time due to the size of the blockchain and the vast amount of data accumulated since 2009. There are, of course, ways to optimize the code and speed up data extraction from the node, such as recording inputs, outputs, and wallet balances into a database for each block. Doing so will let you calculate various metrics without repeatedly requesting raw data from the node, since RPC requests are the most time-consuming part of block processing. In the next part, I will explain how to transfer blockchain data to a regular database correctly, which will streamline analytics significantly. We will also discuss building more complex metrics. Until then, stay tuned!
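Until the database approach is covered properly, here is a rough sketch of the idea using Python's built-in sqlite3 module. The table layout (an outputs table keyed by txid and output index, with a spent flag) is an assumption for illustration, not the author's schema, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # use a file path to persist between runs
conn.execute("""
    CREATE TABLE outputs (
        txid    TEXT    NOT NULL,
        n       INTEGER NOT NULL,
        address TEXT,
        sats    INTEGER NOT NULL,
        spent   INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (txid, n)
    )
""")

# record outputs as blocks are processed (hypothetical values)
conn.executemany(
    "INSERT INTO outputs (txid, n, address, sats) VALUES (?, ?, ?, ?)",
    [('tx_a', 0, 'alice', 5_000_000_000),
     ('tx_b', 0, 'bob', 1_000_000_000),
     ('tx_b', 1, 'alice', 3_999_900_000)],
)
# an input marks the output it references as spent
conn.execute("UPDATE outputs SET spent = 1 WHERE txid = ? AND n = ?", ('tx_a', 0))
conn.commit()

# balances and the UTXO total now come from SQL instead of new RPC calls
balances = dict(conn.execute(
    "SELECT address, SUM(sats) FROM outputs WHERE spent = 0 GROUP BY address"))
utxo_total = conn.execute("SELECT SUM(sats) FROM outputs WHERE spent = 0").fetchone()[0]
print(balances, utxo_total)
```

Once outputs are stored this way, rerunning a metric over the whole history becomes a query rather than another pass over the node.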

You can find the working code here.

If you have any questions or improvement suggestions, let me know in the comments :)
