Historical Crypto Prices in Python

Binance API for Historical Crypto Prices

Over the weekend, I wrote a script that tried using ccxt to pull price data for various crypto assets. According to their GitHub repo, the CCXT library is used to connect and trade with cryptocurrency exchanges and payment processing services worldwide. It provides quick access to market data for storage, analysis, visualization, indicator development, algorithmic trading, strategy backtesting, bot programming, and related software engineering.

While trying to use the library to pull price data from Binance, I kept getting errors claiming that price data couldn’t be provided due to my geographic location. This didn’t make much sense to me since I’m located in California. Instead of spinning my wheels, I put together some code that can be used to hit Binance’s API directly.

Specifically, the remainder of this article covers:

  • Binance API Overview
  • Setting up Required API Keys
  • Pulling prices using Binance API
    • Storing API keys 
    • Setting up helper functions
    • Pulling prices 
    • Running the code 

The full repository can be found here. Keep in mind this article covers one part of a bigger project. 

Binance API 


Binance is one of the largest and most popular cryptocurrency exchanges in the world, providing a platform for users to buy, sell, and trade a wide range of cryptocurrencies. In addition to its exchange services, Binance also provides an API (Application Programming Interface) that enables developers to programmatically access its trading and market data.

The Binance API offers a variety of endpoints for accessing market data, managing orders, and interacting with the exchange. These endpoints can be accessed using HTTP requests, and responses are returned in JSON format. Some of the most commonly used endpoints include getting the current market price of a particular asset, getting historical price data, and placing or canceling orders.

In this post, we focus on pulling prices from the klines endpoint. The Klines endpoint (/api/v3/klines) is a market data endpoint provided by the Binance API that allows developers to retrieve historical klines/candlestick chart data for a specific symbol (i.e., cryptocurrency) and interval.

A kline is a representation of a fixed time interval of an asset’s price data, commonly used to represent the opening price, closing price, highest price, and lowest price for a given time period. The Klines endpoint allows developers to retrieve klines data for a variety of time intervals, such as 1 minute, 3 minutes, 5 minutes, 15 minutes, 30 minutes, 1 hour, 2 hours, 4 hours, 6 hours, 8 hours, 12 hours, and 1 day.
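
To make the shape of a klines request concrete, here is a minimal sketch of how the query string could be assembled. The URL is the binance.us one used later in this article; the helper name build_klines_url is made up for illustration, and the start and end times are expressed in milliseconds since the UNIX epoch.

```python
from urllib.parse import urlencode

# binance.us host, matching the URL used later in this article.
KLINES_URL = "https://api.binance.us/api/v3/klines"


def build_klines_url(symbol, interval, start_ms, end_ms, limit=1000):
    """Return the full GET URL for a klines request (hypothetical helper)."""
    query = urlencode(
        {
            "symbol": symbol,
            "interval": interval,
            "startTime": start_ms,
            "endTime": end_ms,
            "limit": limit,
        }
    )
    return f"{KLINES_URL}?{query}"


# One day of hourly ETHUSDT candles starting 2023-01-01 00:00 UTC.
print(build_klines_url("ETHUSDT", "1h", 1672531200000, 1672617599999))
```

Pasting the printed URL into a browser or passing it to an HTTP client is a quick way to check whether the endpoint is reachable from your location before writing any more code.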

API Keys


Before we write any code, we need to set up keys for Binance and AWS. The AWS keys are optional if you choose to write the data somewhere else, but the code summarized in this article stores the output in S3. An API key is a code or token that is generated by an API (Application Programming Interface) provider and is used to authenticate and authorize access to its services or data.

Binance Key

To set up a Binance API key, take the following steps. 

  1. Create a Binance account if you don’t already have one. You can sign up for a Binance account at binance.com if you are not in the United States, or binance.us if you are in the United States.
  2. Log in to your Binance account and navigate to the “API Management” page.
  3. Click on “Create New API” to create a new API key.
  4. Enter a label for your API key and click on “Create”.
  5. Complete the verification process by following the instructions provided by Binance. This may involve providing additional information or completing additional steps to verify your identity.
  6. Once the verification process is complete, you will see your API key and secret key. 

AWS Key

Creating AWS keys is easy. Take the following steps to do so. 

  1. Log in to the AWS Management Console using your AWS account credentials.
  2. Navigate to the AWS Identity and Access Management (IAM) console.
  3. Click on “Users” from the navigation panel on the left-hand side of the screen.
  4. Select the user for whom you want to create a new key or create a new user if you need to.
  5. Click on the “Security credentials” tab for the selected user.
  6. Under “Access keys”, click on “Create access key”.
  7. Save the access key ID and secret access key in a secure location, as they will not be displayed again.
  8. Use the access key ID and secret access key to authenticate API requests to AWS services that support AWS access control.

Storing API Keys

Now that we have our API keys, we need to store them in a place where our code can access them. There are many different ways to do this.

  1. Environment Variables: Environment variables are a common way to store API keys securely. This method involves setting environment variables on your local machine and then retrieving them in your Python code using the os.environ.get() method.
  2. Config Files: Another way to store API keys is to use a configuration file that contains the API keys. This file can be encrypted or stored in a secure location to prevent unauthorized access. You can read the API key from this file using standard library modules such as configparser or json.
  3. Keyring Libraries: Keyring libraries are a secure and platform-independent way to store API keys. These libraries provide a simple API that allows you to store and retrieve API keys from the system keyring.
  4. Python Modules: You can store API keys in a separate Python module and import it into your main code. This module can be encrypted or stored in a secure location to prevent unauthorized access.
  5. External Services: Finally, you can use external services such as AWS Secrets Manager or HashiCorp Vault to securely store your API keys. These services provide encryption, access controls, and auditing capabilities to help you manage and secure your API keys.
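
As an illustration of the first option, here is a minimal sketch of reading a key from an environment variable. The function name load_key and the variable name BINANCE_API_KEY are assumptions made for this example; the rest of this article uses a keys.py module instead.

```python
import os


def load_key(name):
    """Read one secret from the environment, failing loudly if it is unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Hypothetical variable name; export it in your shell before running, e.g.
#   export BINANCE_API_KEY="..."
# API_KEY = load_key("BINANCE_API_KEY")
```

Failing early with a clear message tends to be easier to debug than an authentication error that surfaces later from the API.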

Pulling prices using Binance API


At this point, we have everything we need to start writing the code to pull prices from the Binance API. This section details a few different files, each of which is required to pull historical prices. 

  • keys.py 
  • helper.py 
  • getData.py 

We will take a quick look at each of these files below. 

keys.py 

We will store the required API keys in a separate Python module named keys.py.

  • Doing it this way was fast, and it makes it easy to import the keys into other modules. 
  • keys.py will be excluded from any GitHub commits (for example, by listing it in .gitignore), which keeps the keys out of public repositories.

A completed keys.py will look something like this. 

API_KEY = "<API_KEY>"
AWS_ACCESS_KEY_ID = "<AWS_ACCESS_KEY_ID>"
AWS_SECRET_ACCESS_KEY = "<AWS_SECRET_ACCESS_KEY>"

helper.py 

This step might not be necessary if our only goal was to pull prices from the Binance API; however, this post only talks about one functional component of a larger project. For that reason, I set up a helper file but these functions & variables could be stored in the same script as the one we will write to pull and store prices.

In addition to keys.py, we will also create helper.py. A helper file is typically used to store utility functions that are reused across different parts of an application or project. The completed helper can be found here.

The first thing in our helper file is a series of constants that will be used across the code base. Instead of typing them repeatedly in a bunch of scripts, we store them here so we can import them into other files. 

BUCKET = "cyrpto-trading-bot"

MAX_DATA_POINTS = 1000

URL = "https://api.binance.us/api/v3/klines"

Next, we have INTERVAL_MAPPING. This is a dictionary where the keys are the price intervals you are requesting and the values are the number of prices per day generated using that interval. For example, requesting prices at a 1d interval generates 1 data point per day, while requesting prices at a 12h interval generates 2 data points per day.

INTERVAL_MAPPING = {
    "1m": 1440,
    "3m": 480,
    "5m": 288,
    "15m": 96,
    "30m": 48,
    "1h": 24,
    "2h": 12,
    "4h": 6,
    "6h": 4,
    "8h": 3,
    "12h": 2,
    "1d": 1,
}

INTERVAL_MAPPING is an input into compute_number_data_points(..). This function calculates the number of data points that would be returned between two dates based on the specified interval. This is important because Binance limits the number of data points per request to 1,000. As you’ll see later, this function is used to figure out if we need to split a single request into multiple requests. 

from datetime import datetime


def compute_number_data_points(start_date, end_date, interval):
    # Dates arrive as "YYYY-MM-DD" strings
    start_date = datetime.strptime(start_date, "%Y-%m-%d").date()
    end_date = datetime.strptime(end_date, "%Y-%m-%d").date()

    date_difference = (end_date - start_date).days

    # +1 because both the start date and the end date are included
    number_of_data_points = (INTERVAL_MAPPING[interval] * date_difference) + 1

    return number_of_data_points
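
To see why the 1,000-point limit matters, here is a small self-contained sketch that applies the same calculation to the date range used at the end of this article. It repeats a subset of INTERVAL_MAPPING so it can run on its own; the request count uses a simple ceiling division, which is an illustration rather than the exact logic in getData.py.

```python
import math
from datetime import datetime

MAX_DATA_POINTS = 1000
INTERVAL_MAPPING = {"1h": 24, "12h": 2, "1d": 1}  # subset of the mapping above


def compute_number_data_points(start_date, end_date, interval):
    start = datetime.strptime(start_date, "%Y-%m-%d").date()
    end = datetime.strptime(end_date, "%Y-%m-%d").date()
    return INTERVAL_MAPPING[interval] * (end - start).days + 1


for interval in ("1d", "12h", "1h"):
    points = compute_number_data_points("2017-04-15", "2023-04-10", interval)
    requests_needed = math.ceil(points / MAX_DATA_POINTS)
    print(f"{interval}: {points} data points -> {requests_needed} request(s)")
```

Even at a daily interval, six years of history does not fit in one request, so some form of request splitting is unavoidable.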

We also have a helper function, make_api_request(..). This is a simple function that makes a GET request and, if the request fails, logs the error and returns None so that callers can check for a missing response. Since we’ll eventually hit the API for things other than prices, we store this as a helper function. 

import logging

import requests

logger = logging.getLogger(__name__)


def make_api_request(url, params, headers):
    try:
        response = requests.get(url, params=params, headers=headers)

        # Raise an exception for 4xx/5xx responses
        response.raise_for_status()

    except requests.exceptions.RequestException as e:
        logger.error(f"Error making API request: {e}")
        return None

    return response

The Binance API returns kline/candlestick data as a list of lists (an array of arrays). Each inner list represents a single candlestick and contains values such as the opening time, opening price, highest price, lowest price, closing price, and so on. However, I prefer to store the data as JSON objects with descriptive key-value pairs, so we convert the list of lists into a list of dictionaries before saving it to S3. To do that, we create a helper, convert_to_json(..).

def convert_to_json(kline_data):
    keys = [
        "open_time",
        "open",
        "high",
        "low",
        "close",
        "volume",
        "close_time",
        "quote_asset_volume",
        "number_of_trades",
        "taker_buy_base_asset_volume",
        "taker_buy_quote_asset_volume",
        "ignore",
    ]

    json_data = [dict(zip(keys, candlestick)) for candlestick in kline_data]

    return json_data
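
Here is a short, self-contained sketch of the conversion applied to a single row. The numbers in the sample row are made up for illustration; only the shape (twelve fields, with prices and volumes delivered as strings) mirrors what the klines endpoint returns.

```python
KEYS = [
    "open_time", "open", "high", "low", "close", "volume",
    "close_time", "quote_asset_volume", "number_of_trades",
    "taker_buy_base_asset_volume", "taker_buy_quote_asset_volume", "ignore",
]


def convert_to_json(kline_data):
    return [dict(zip(KEYS, candlestick)) for candlestick in kline_data]


# Made-up row in the shape Binance returns: timestamps are integers in
# milliseconds, prices and volumes are strings.
sample = [
    [1672531200000, "1196.10", "1204.50", "1190.00", "1200.30", "1523.4",
     1672617599999, "1825000.00", 4210, "760.2", "911000.00", "0"],
]

record = convert_to_json(sample)[0]
close_price = float(record["close"])  # cast before doing arithmetic
print(record["open_time"], close_price)
```

Because the prices arrive as strings, any downstream analysis needs an explicit cast like the one above.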

The last helper function is to_s3(..). This function creates an AWS client and uploads a JSON document to a bucket. There is a lot more you could do with this function, but for the purposes of this project it’s not necessary. It’s worth noting that you can set up your environment so the keys don’t have to be passed as arguments if you want. 

import json

import boto3


def to_s3(response, bucket_name, file_key, aws_access_key_id, aws_secret_access_key):
    # Serialize the response (a list of dictionaries) to a JSON string
    json_string = json.dumps(response)

    # Create an S3 client
    s3 = boto3.client(
        "s3",
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key,
    )

    # Save the JSON string to S3 and report whether the upload succeeded
    try:
        s3.put_object(Bucket=bucket_name, Key=file_key, Body=json_string)
        return True

    except Exception as e:
        logger.error(f"Error uploading {file_key} to S3: {e}")
        return False
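
If you would rather not pass keys around, one possible variation is to hand the function an already-built client. The sketch below is an assumption-laden alternative, not the version used in this project: upload_json is a made-up name, and it accepts any object with a put_object method, which also makes it easy to exercise without touching AWS.

```python
import json


def upload_json(s3_client, data, bucket_name, file_key):
    """Serialize data and upload it; return True on success, False on failure."""
    try:
        s3_client.put_object(
            Bucket=bucket_name,
            Key=file_key,
            Body=json.dumps(data),
            ContentType="application/json",
        )
        return True
    except Exception as exc:  # boto3 raises botocore exceptions here
        print(f"Upload failed: {exc}")
        return False


# With credentials configured in the environment (or ~/.aws/credentials),
# the client can be created without passing keys explicitly:
#   import boto3
#   upload_json(boto3.client("s3"), data, "my-bucket", "data/prices.json")
```

The bucket name and file key in the usage comment are placeholders.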

getData.py 

keys.py and helper.py are the foundation of the script we will write to pull prices. At this point, we have everything we need to start writing getData.py. The completed code for getData.py can be found here.

The first thing we will do is import the required packages. As you can see, we import a bunch of stuff from our helper.py and keys.py file. 

import argparse
import datetime as dt
import logging
import datetime

from typing import Optional, List

from crypto_trader.helper import (
    to_s3,
    compute_number_data_points,
    convert_to_json,
    make_api_request,
    MAX_DATA_POINTS,
    URL,
    INTERVAL_MAPPING,
    BUCKET,
)
from crypto_trader.keys import (
    API_KEY,
    AWS_ACCESS_KEY_ID,
    AWS_SECRET_ACCESS_KEY,
)

We will define our function get_historical_prices(..). This function will take the following arguments 

  • symbol: The crypto symbol of the asset (e.g. ETHUSDT)
  • interval: The interval at which we want the request to return prices (e.g. 1d, 12h)
  • start_date: The day at which you want the request to start returning prices
  • end_date: The last day for which you want the request to return prices

The function’s signature will look like this:

def get_historical_prices(
    symbol: str,
    interval: str,
    start_date: dt.datetime,
    end_date: dt.datetime,
) -> Optional[List[dict]]:

The next two bits of code take the arguments passed to the function and build the information the Binance API needs to make a request. Note that Binance requires dates to be expressed as UNIX timestamps in milliseconds, which is why the timestamps are multiplied by 1000. 

    params = {
        "symbol": symbol,
        "interval": interval,
        "startTime": int(dt.datetime.timestamp(start_date) * 1000),
        "endTime": int(dt.datetime.timestamp(end_date) * 1000),
        "limit": 1000,
    }

    headers = {"X-MBX-APIKEY": API_KEY}
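
One detail worth checking is the time zone. Calling timestamp() on a naive datetime interprets it in the machine's local time zone, so the same date can map to different milliseconds on different machines. The sketch below shows one way to pin the conversion to UTC; the helper name to_millis and the choice to treat naive datetimes as UTC are assumptions of this example, not part of the original script.

```python
from datetime import datetime, timezone


def to_millis(moment):
    """Convert a datetime to integer milliseconds since the UNIX epoch.

    Naive datetimes are treated as UTC here (an assumption of this sketch);
    calling .timestamp() on a naive datetime would use the machine's local
    time zone instead.
    """
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)
    return int(moment.timestamp() * 1000)


start = datetime.fromisoformat("2017-04-15")
print(to_millis(start))
```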

We also use the start and end date to calculate the number of price data points we are requesting. 

    number_data_points = compute_number_data_points(
        start_date.strftime("%Y-%m-%d"), end_date.strftime("%Y-%m-%d"), interval
    )

Finally, we make the request to the Binance API. If the request covers more than 1,000 data points, we make multiple smaller requests; otherwise, we make a single request. Ultimately, the function returns the price data as a list of dictionaries that can be serialized to JSON. 

def get_historical_prices(
    symbol: str,
    interval: str,
    start_date: dt.datetime,
    end_date: dt.datetime,
) -> Optional[List[dict]]:
    params = {
        "symbol": symbol,
        "interval": interval,
        "startTime": int(dt.datetime.timestamp(start_date) * 1000),
        "endTime": int(dt.datetime.timestamp(end_date) * 1000),
        "limit": MAX_DATA_POINTS,
    }

    headers = {"X-MBX-APIKEY": API_KEY}

    number_data_points = compute_number_data_points(
        start_date.strftime("%Y-%m-%d"), end_date.strftime("%Y-%m-%d"), interval
    )

    if number_data_points > MAX_DATA_POINTS:
        response_list = []

        # Length of one candle and of one full request window, in milliseconds
        candle_ms = 86_400_000 // INTERVAL_MAPPING[interval]
        window_ms = MAX_DATA_POINTS * candle_ms

        request_start_ms = params["startTime"]
        end_ms = params["endTime"]

        while request_start_ms <= end_ms:
            # End 1 ms before the next window starts so consecutive windows
            # neither overlap nor leave a gap, and never run past end_date.
            # This assumes startTime/endTime are inclusive bounds on the
            # candle open time.
            request_end_ms = min(request_start_ms + window_ms - 1, end_ms)

            window_params = {
                **params,
                "startTime": request_start_ms,
                "endTime": request_end_ms,
            }

            response = make_api_request(URL, window_params, headers)

            if response is not None:
                response_list = response_list + convert_to_json(response.json())

            request_start_ms = request_start_ms + window_ms

        return response_list

    else:
        response = make_api_request(URL, params, headers)

        if response is not None:
            return convert_to_json(response.json())

        # The single request failed, so there is nothing to return
        return None
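
The splitting step is the easiest part to get subtly wrong, so it can help to check it in isolation without calling the API. The following is a minimal sketch of one way to cut a time range into windows of at most 1,000 candles. The function name request_windows is made up for this example, and it assumes that startTime and endTime act as inclusive bounds on a candle's open time.

```python
MAX_DATA_POINTS = 1000
MS_PER_DAY = 86_400_000


def request_windows(start_ms, end_ms, candles_per_day):
    """Yield (start_ms, end_ms) pairs covering at most MAX_DATA_POINTS candles each."""
    candle_ms = MS_PER_DAY // candles_per_day
    window_ms = MAX_DATA_POINTS * candle_ms
    cursor = start_ms
    while cursor <= end_ms:
        # End 1 ms before the next window starts so windows never overlap.
        yield cursor, min(cursor + window_ms - 1, end_ms)
        cursor += window_ms


# 2,500 days of daily candles, expressed in milliseconds from an arbitrary origin.
windows = list(request_windows(0, 2500 * MS_PER_DAY, candles_per_day=1))
for window_start, window_end in windows:
    print(window_start, window_end)
```

Working in integer milliseconds sidesteps daylight-saving shifts that can creep in when adding timedeltas to naive local datetimes.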

We run the function get_historical_prices(..) in main. We use argparse to collect the required arguments through the command line. The function for main looks like this. 

def main():
    parser = argparse.ArgumentParser()

    parser.add_argument(
        "--symbol",
        default="ETHUSDT",
        help="The symbol of the cryptocurrency to retrieve price data for (e.g. 'ETHUSDT')",
    )

    parser.add_argument(
        "--interval",
        default="1d",
        help="The interval at which to retrieve price data (e.g. '1d' for daily data)",
    )
    parser.add_argument(
        "--start_date",
        type=dt.datetime.fromisoformat,
        required=True,
        help="The start date for the price data to retrieve, in YYYY-MM-DD format",
    )
    parser.add_argument(
        "--end_date",
        type=dt.datetime.fromisoformat,
        required=True,
        help="The end date for the price data to retrieve, in YYYY-MM-DD format",
    )

    args = parser.parse_args()

    response = get_historical_prices(
        args.symbol, args.interval, args.start_date, args.end_date
    )

    s3_path = f"data/{args.symbol}/{args.start_date.strftime('%Y_%m_%d')}_{args.end_date.strftime('%Y_%m_%d')}_{args.interval}.json"

    to_s3(
        response,
        BUCKET,
        s3_path,
        AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY,
    )
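
To check the argument handling and the resulting S3 key without making any network calls, the parser can be exercised on its own. This is a trimmed-down sketch: build_parser is a made-up name, the help strings are omitted, and marking the two dates as required is a choice made for this example.

```python
import argparse
import datetime as dt


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--symbol", default="ETHUSDT")
    parser.add_argument("--interval", default="1d")
    parser.add_argument("--start_date", type=dt.datetime.fromisoformat, required=True)
    parser.add_argument("--end_date", type=dt.datetime.fromisoformat, required=True)
    return parser


# Passing a list to parse_args() stands in for the real command line.
args = build_parser().parse_args(
    ["--start_date", "2017-04-15", "--end_date", "2023-04-10"]
)
s3_path = (
    f"data/{args.symbol}/{args.start_date.strftime('%Y_%m_%d')}"
    f"_{args.end_date.strftime('%Y_%m_%d')}_{args.interval}.json"
)
print(s3_path)
```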

In practice, you can run the script from the command line with something like the following. This uses the default values for symbol and interval, but those values could be passed as arguments as well. 

python getData.py --start_date 2017-04-15 --end_date 2023-04-10

Once this has been run, you can navigate to the S3 bucket you sent the files to. You should see something like this. The JSON will contain the requested price history. 

Taking a quick look at the data, you can see it is an array of JSON objects, as intended. 

Feel free to reach out with any questions using the contact page or hitting me up on any of my social links!