Last Update: February 6, 2020
Pairs identification consists of searching for pairs of assets whose prices have moved together because of fundamental or technical factors. Candidate pairs are initially tested through the correlation coefficient of their returns.
This topic is part of the Pairs Trading Analysis with Python course. Feel free to take a look at the Course Curriculum.
This tutorial has an educational and informational purpose and doesn’t constitute any type of trading or investment advice. All content, including code and data, is presented for personal educational use exclusively and with no guarantee of exactness or completeness. Past performance doesn’t guarantee future results. Please read the full Disclaimer.
Initial pairs identification is done through the correlation coefficient, which measures the degree to which the price returns of the paired assets moved together, scaled by their standard deviations. It is used for evaluating short-term relationships.
1. Formula notation.
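$$\rho_{x,y} = \frac{\sum_{t=1}^{n} (r_{x,t} - \bar{r}_x)(r_{y,t} - \bar{r}_y)}{\sqrt{\sum_{t=1}^{n} (r_{x,t} - \bar{r}_x)^2}\,\sqrt{\sum_{t=1}^{n} (r_{y,t} - \bar{r}_y)^2}}$$

Where $\rho_{x,y}$ = correlation coefficient of the arithmetic price returns of paired assets $x$ and $y$, $r_{x,t}$ = current period arithmetic price return of asset $x$, $r_{y,t}$ = current period arithmetic price return of asset $y$, $\bar{r}_x$ = arithmetic mean of the arithmetic price returns of asset $x$, $\bar{r}_y$ = arithmetic mean of the arithmetic price returns of asset $y$, and $n$ = number of periods.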
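As a minimal sketch of the formula above, the correlation coefficient can be computed term by term with NumPy and checked against `np.corrcoef`. The function name `returns_correlation` and the return values below are hypothetical and only serve as an illustration; they are not taken from the tutorial data.

```python
import numpy as np

def returns_correlation(rx, ry):
    """Pearson correlation of two equal-length return series."""
    rx = np.asarray(rx, dtype=float)
    ry = np.asarray(ry, dtype=float)
    # Deviations of each return from its arithmetic mean
    dx = rx - rx.mean()
    dy = ry - ry.mean()
    # Sum of cross-products divided by the product of root sums of squares
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Hypothetical daily arithmetic returns (illustrative values only)
rx = [0.010, -0.020, 0.015, 0.003, -0.007]
ry = [0.012, -0.018, 0.011, 0.001, -0.009]

rho = returns_correlation(rx, ry)
print(round(rho, 4))
# The manual result should agree with NumPy's built-in estimate
print(np.isclose(rho, np.corrcoef(rx, ry)[0, 1]))
```

The result always lies between -1 and 1, because the numerator is scaled by the dispersion of both return series.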
2. Python code example.
2.1. Import Python packages.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
2.2. Paired prices returns’ correlation data reading, training and testing ranges delimiting.
- Data: MSCI® Germany and France indexes replicating ETFs (ticker symbols: EWG and EWQ) daily adjusted close prices (2007-2016).
- Training and testing range boundaries are not fixed and are included only for educational purposes.
data = pd.read_csv('Data//Paired-Prices-Returns-Correlation-Data.txt', index_col='Date', parse_dates=True)
tdata = data[:'2014-12-31']
tdata.columns = ['tger', 'tfra']
fdata = data['2015-01-02':]
fdata.columns = ['fger', 'ffra']
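The date-based split above relies on label slicing of a `DatetimeIndex`, which includes both endpoints. The sketch below shows that behaviour on synthetic data; the frame `demo`, its column names and the random prices are assumptions made for illustration and do not reproduce the EWG and EWQ file.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the price file: weekdays around the split date
dates = pd.date_range('2014-12-22', '2015-01-12', freq='B')
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    100 + rng.normal(0, 1, size=(len(dates), 2)).cumsum(axis=0),
    index=dates, columns=['ger', 'fra'])

# Label-based slices on a DatetimeIndex include both endpoints
train = demo.loc[:'2014-12-31']
test = demo.loc['2015-01-02':]

# Rows that fall in neither range (here the synthetic 2015-01-01 row)
unassigned = demo.index.difference(train.index.union(test.index))

print(train.index.max(), test.index.min())
print(list(unassigned))
```

With these boundaries a row dated 2015-01-01 would belong to neither range. In the synthetic frame that row exists because the `'B'` frequency ignores holidays, so it is worth checking for unassigned rows whenever the boundaries are changed.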
2.3. Paired prices returns’ calculation.
- Paired prices returns’ calculation within training range.
tger = tdata['tger']
tfra = tdata['tfra']
rtger = tger.pct_change(1).dropna()
rtfra = tfra.pct_change(1).dropna()
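For reference, `pct_change(1)` computes the arithmetic return, that is, the current price divided by the previous price, minus one. The sketch below checks this equivalence on a short hypothetical price series (the values are illustrative only).

```python
import pandas as pd

# Hypothetical prices (illustrative values only)
prices = pd.Series([100.0, 102.0, 99.96, 101.0],
                   index=pd.date_range('2020-01-01', periods=4, freq='D'))

# Arithmetic returns as computed in the tutorial
r_pct = prices.pct_change(1).dropna()

# The same returns written out explicitly: P_t / P_(t-1) - 1
r_manual = (prices / prices.shift(1) - 1).dropna()

print(r_pct.round(4).tolist())
print((r_pct - r_manual).abs().max() < 1e-12)
```

The first observation has no previous price, so its return is missing and `dropna()` removes it. The return series is therefore one observation shorter than the price series.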
2.4. Paired prices chart.
- Paired prices chart within training range.
fig1, ax1 = plt.subplots()
ax1.plot(tger, label='tger')
ax1.legend(loc='lower left')
ax2 = ax1.twinx()
ax2.plot(tfra, label='tfra', color='orange')
ax2.legend(loc='lower right')
plt.suptitle('Paired Prices Chart')
plt.show()
2.5. Paired prices returns’ correlation calculation.
- Paired prices returns’ correlation calculation within training range.
In:
print('== Paired Prices Returns Correlation ==')
print(np.round(pd.DataFrame(rtger).join(rtfra).corr(), 4))
Out:
== Paired Prices Returns Correlation ==
        tger    tfra
tger  1.0000  0.9514
tfra  0.9514  1.0000
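When only the single pairwise value is needed, `Series.corr` returns it directly as a scalar. The sketch below is one possible alternative to building the full matrix. It uses synthetic returns driven by a shared component; the series `ra` and `rb` and all parameters are assumptions for illustration, not the tutorial data.

```python
import numpy as np
import pandas as pd

# Synthetic returns: a shared component plus independent noise (illustrative only)
rng = np.random.default_rng(42)
common = rng.normal(0, 0.01, 250)
ra = pd.Series(common + rng.normal(0, 0.003, 250), name='a')
rb = pd.Series(common + rng.normal(0, 0.003, 250), name='b')

# Scalar Pearson correlation between the two return series
rho = ra.corr(rb)

# Same value read from the off-diagonal of the correlation matrix
matrix = pd.concat([ra, rb], axis=1).corr()

print(round(rho, 4))
print(abs(rho - matrix.loc['a', 'b']) < 1e-12)
```

Both approaches align the two series on their index before computing the coefficient, so dates present in only one series do not enter the calculation.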