Complete Backtest Example
This notebook walks through a complete backtest, starting from raw data (external to Nautilus) and ending with a single backtest run.
Imports
We’ll start with all of our imports for the remainder of this guide:
import datetime
import os
import shutil
from decimal import Decimal
import fsspec
import pandas as pd
from nautilus_trader.core.datetime import dt_to_unix_nanos
from nautilus_trader.model.data import QuoteTick
from nautilus_trader.model.objects import Price, Quantity
from nautilus_trader.test_kit.providers import TestInstrumentProvider
from nautilus_trader.backtest.node import BacktestNode, BacktestVenueConfig, BacktestDataConfig, BacktestRunConfig, BacktestEngineConfig
from nautilus_trader.config import ImportableStrategyConfig
from nautilus_trader.persistence.catalog import ParquetDataCatalog
from nautilus_trader.persistence.external.core import process_files, write_objects
from nautilus_trader.persistence.external.readers import TextReader
Getting some raw data
As a one-off step before starting the notebook, we need to download some sample data for backtesting.
For this example we will use FX data from histdata.com. Go to https://www.histdata.com/download-free-forex-historical-data/?/ascii/tick-data-quotes/ and select an FX pair, then select one or more months of data to download. The code below builds an AUD/USD instrument, so either pick AUD/USD or adjust the instrument accordingly.
Once you have downloaded the data, set the variable DATA_DIR below to the directory containing it. By default, it uses the user's Downloads directory.
DATA_DIR = "~/Downloads/"
Run the cell below; you should see the files that you downloaded:
fs = fsspec.filesystem('file')
raw_files = fs.glob(f"{DATA_DIR}/HISTDATA*")
assert raw_files, f"Unable to find any histdata files in directory {DATA_DIR}"
raw_files
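If the assertion above fails, it can help to check the directory independently of fsspec. The following is a minimal sketch using only the standard library; `find_histdata_files` is a hypothetical helper introduced here for illustration and is not part of Nautilus.

```python
import glob
import os


def find_histdata_files(data_dir: str) -> list[str]:
    """Return the sorted paths in data_dir whose names start with HISTDATA."""
    # glob.glob does not expand "~" on its own, so expand it explicitly.
    root = os.path.expanduser(data_dir)
    return sorted(glob.glob(os.path.join(root, "HISTDATA*")))


print(find_histdata_files("~/Downloads/"))
```

An empty list here means the files are not where DATA_DIR points, rather than a problem with the glob pattern.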
The Data Catalog
Next we will load this raw data into the data catalog. The data catalog is a central store for Nautilus data, persisted in the Parquet file format.
We have chosen Parquet as the storage format for the following reasons:
- It performs much better than CSV/JSON/HDF5/etc. in terms of compression ratio (storage size) and read performance.
- It does not require any separately running components (for example, a database).
- It is quick and simple to get up and running with.
Loading data into the catalog
We can load data from various sources into the data catalog using helper methods in the nautilus_trader.persistence.external.readers module. The module contains methods for reading various data formats (CSV, JSON, text), minimising the amount of code required to get data loaded correctly into the data catalog.
The FX data from histdata is stored in CSV/text format, with the fields timestamp, bid_price, and ask_price, followed by a fourth trailing column that the parser below unpacks but does not use. To load the data into the catalog, we write a function that converts each row into a Nautilus object (in this case, a QuoteTick). For this example, we will use the TextReader helper, which reads a file and applies a parsing function line by line.
Then we instantiate a ParquetDataCatalog (passing in the directory where the data should be stored; here we use a catalog directory under the current working directory) and pass our parsing function, wrapped in the reader class, to process_files. We also need to know which instrument this data is for; in this example, we use one of the Nautilus test helpers to create an FX instrument.
It should only take a couple of minutes to load the data (depending on how many months).
def parser(line):
    # Each row has four comma-separated fields; the last one is not used.
    ts, bid, ask, idx = line.split(b",")
    dt = pd.Timestamp(datetime.datetime.strptime(ts.decode(), "%Y%m%d %H%M%S%f"), tz='UTC')
    # AUDUSD is defined in the next cell, before the parser is first called.
    yield QuoteTick(
        instrument_id=AUDUSD.id,
        bid_price=Price.from_str(bid.decode()),
        ask_price=Price.from_str(ask.decode()),
        bid_size=Quantity.from_int(100_000),
        ask_size=Quantity.from_int(100_000),
        ts_event=dt_to_unix_nanos(dt),
        ts_init=dt_to_unix_nanos(dt),
    )
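To check the timestamp format against a row before loading a whole month, the parsing logic can be tried without any Nautilus types. This is a sketch using only the standard library; the sample row is made up for illustration (it is not taken from a real download), and `parse_histdata_line` is a hypothetical helper.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def parse_histdata_line(line: bytes) -> tuple[int, str, str]:
    """Split one row into (timestamp in Unix nanoseconds, bid, ask)."""
    # Four comma-separated fields, as in the parser above; the last is unused.
    ts, bid, ask, _unused = line.strip().split(b",")
    # The timestamp is parsed as YYYYMMDD HHMMSS followed by fractional seconds.
    # It is treated as UTC here, mirroring the parser above.
    dt = datetime.datetime.strptime(ts.decode(), "%Y%m%d %H%M%S%f")
    delta = dt.replace(tzinfo=datetime.timezone.utc) - EPOCH
    # Integer arithmetic avoids float rounding at nanosecond scale.
    seconds = delta.days * 86_400 + delta.seconds
    ts_ns = seconds * 1_000_000_000 + delta.microseconds * 1_000
    return ts_ns, bid.decode(), ask.decode()


# Made-up sample row, for illustration only.
sample = b"20200101 170000123,0.70175,0.70199,0"
ts_ns, bid, ask = parse_histdata_line(sample)
print(ts_ns, bid, ask)
```

Prices are kept as strings because the parser above passes them to Price.from_str unchanged.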
We’ll set up a catalog in the current working directory.
CATALOG_PATH = os.getcwd() + "/catalog"
# Clear if it already exists, then create fresh
if os.path.exists(CATALOG_PATH):
    shutil.rmtree(CATALOG_PATH)
os.mkdir(CATALOG_PATH)
AUDUSD = TestInstrumentProvider.default_fx_ccy("AUD/USD")
catalog = ParquetDataCatalog(CATALOG_PATH)
process_files(
    glob_path=f"{DATA_DIR}/HISTDATA*.zip",
    reader=TextReader(line_parser=parser),
    catalog=catalog,
)
# Also manually write the AUD/USD instrument to the catalog
write_objects(catalog, [AUDUSD])
Using the Data Catalog
Once data has been loaded into the catalog, the catalog instance can be used to load data for backtests, or simply for research purposes. It contains various methods to pull data from the catalog, such as quote_ticks (shown below).
catalog.instruments()
import pandas as pd
from nautilus_trader.core.datetime import dt_to_unix_nanos
start = dt_to_unix_nanos(pd.Timestamp('2020-01-01', tz='UTC'))
end = dt_to_unix_nanos(pd.Timestamp('2020-01-02', tz='UTC'))
catalog.quote_ticks(start=start, end=end)
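The start and end bounds are plain integers counting nanoseconds since the Unix epoch, which are hard to read at a glance. Below is a small sketch, using only the standard library, for converting in both directions; `to_unix_nanos` and `from_unix_nanos` are hypothetical helpers for illustration, not Nautilus functions (the notebook itself uses dt_to_unix_nanos).

```python
import datetime

UTC = datetime.timezone.utc
EPOCH = datetime.datetime(1970, 1, 1, tzinfo=UTC)


def to_unix_nanos(dt: datetime.datetime) -> int:
    """Convert a timezone-aware datetime to integer Unix nanoseconds."""
    delta = dt - EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 1_000_000_000 + delta.microseconds * 1_000


def from_unix_nanos(ns: int) -> datetime.datetime:
    """Convert Unix nanoseconds to a UTC datetime (microsecond precision)."""
    # datetime only resolves microseconds, so sub-microsecond digits are dropped.
    return EPOCH + datetime.timedelta(microseconds=ns // 1_000)


print(to_unix_nanos(datetime.datetime(2020, 1, 1, tzinfo=UTC)))  # 1577836800000000000
print(from_unix_nanos(1580398089820000000))  # 2020-01-30 15:28:09.820000+00:00
```

Decoding a value this way is a quick check that a hard-coded time bound actually falls inside the months that were downloaded.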
Configuring backtests
Nautilus uses a BacktestRunConfig object, which allows a backtest to be configured in one place. It is a Partialable object, meaning it can be configured in stages; this reduces boilerplate when creating multiple backtest runs (for example, when running a grid search over parameters).
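The idea of configuring in stages can be illustrated without Nautilus. The sketch below uses functools.partial and a dataclass as a stand-in; `RunSettings` and its fields are hypothetical, and this is not how Partialable or BacktestRunConfig is implemented. It is only an analogy for fixing the shared settings once and varying the rest in a grid search.

```python
from dataclasses import dataclass
from functools import partial
from itertools import product


@dataclass(frozen=True)
class RunSettings:
    """Hypothetical stand-in for a backtest run configuration."""
    venue: str
    instrument_id: str
    fast_ema_period: int
    slow_ema_period: int


# Stage 1: fix the settings shared by every run.
base = partial(RunSettings, venue="SIM", instrument_id="AUD/USD.SIM")

# Stage 2: complete the configuration once per parameter combination.
grid = [
    base(fast_ema_period=fast, slow_ema_period=slow)
    for fast, slow in product([5, 10], [20, 40])
]

print(len(grid))  # 4
```

Each element of `grid` is a complete configuration, while the shared settings are written only once.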
Adding data and venues
instrument = catalog.instruments(as_nautilus=True)[0]
venues_config = [
    BacktestVenueConfig(
        name="SIM",
        oms_type="HEDGING",
        account_type="MARGIN",
        base_currency="USD",
        starting_balances=["1_000_000 USD"],
    )
]
data_config = [
    BacktestDataConfig(
        # Point at the catalog directory created earlier in this notebook
        catalog_path=CATALOG_PATH,
        data_cls=QuoteTick,
        instrument_id=instrument.id.value,
        # Unix nanoseconds; adjust to fall within the months you downloaded
        start_time=1580398089820000000,  # 2020-01-30 15:28:09.820 UTC
        end_time=1580504394501000000,  # 2020-01-31 20:59:54.501 UTC
    )
]
strategies = [
    ImportableStrategyConfig(
        strategy_path="nautilus_trader.examples.strategies.ema_cross:EMACross",
        config_path="nautilus_trader.examples.strategies.ema_cross:EMACrossConfig",
        config=dict(
            instrument_id=instrument.id.value,
            # The bar type must refer to the same instrument as the loaded data
            bar_type="AUD/USD.SIM-15-MINUTE-BID-INTERNAL",
            fast_ema_period=10,
            slow_ema_period=20,
            trade_size=Decimal(1_000_000),
        ),
    ),
]
config = BacktestRunConfig(
    engine=BacktestEngineConfig(strategies=strategies),
    data=data_config,
    venues=venues_config,
)
Run the backtest!
node = BacktestNode(configs=[config])
results = node.run()