Last Update: January 14, 2021
Descriptive statistics describes quantitative or qualitative data from a population or a sample through its frequency distribution, central tendency measures, dispersion measures, association measures and frequency distribution shape.
This topic is part of the Business Statistics with Python course. Feel free to take a look at the Course Curriculum.
This tutorial has an educational and informational purpose and doesn’t constitute any type of forecasting, business, trading or investment advice. All content, including code and data, is presented for personal educational use exclusively and with no guarantee of exactness or completeness. Past performance doesn’t guarantee future results. Please read the full Disclaimer.
1. Dispersion measures
Dispersion measures describe the variability of a data population or sample around its mean. The main dispersion measures are standard deviation, variance and mean absolute deviation (also called average deviation).
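- Standard deviation is the square root of the average squared deviation of the population or sample data from its mean.
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2} \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}$$
Where $\sigma$ = population standard deviation, $x_i$ = data, $\mu$ = population mean, $N$ = population number of observations, $s$ = corrected sample standard deviation, $\bar{x}$ = sample mean, $n$ = sample number of observations.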
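- Variance is the average squared deviation of the population or sample data from its mean.
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2 \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$
Where $\sigma^2$ = population variance, $x_i$ = data, $\mu$ = population mean, $N$ = population number of observations, $s^2$ = corrected sample variance, $\bar{x}$ = sample mean, $n$ = sample number of observations.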
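- Mean absolute deviation or average deviation is the average absolute deviation of the population or sample data from its mean.
$$\mathrm{MAD}_{\mathrm{pop}} = \frac{1}{N}\sum_{i=1}^{N}\lvert x_i-\mu\rvert \qquad \mathrm{MAD}_{\mathrm{sample}} = \frac{1}{n}\sum_{i=1}^{n}\lvert x_i-\bar{x}\rvert$$
Where $\mathrm{MAD}_{\mathrm{pop}}$ = population mean absolute deviation or average deviation, $x_i$ = data, $\mu$ = population mean, $N$ = population number of observations, $\mathrm{MAD}_{\mathrm{sample}}$ = sample mean absolute deviation or average deviation, $\bar{x}$ = sample mean, $n$ = sample number of observations.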
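The three definitions above can be checked by hand on a small array. The following is a minimal sketch, assuming a hypothetical toy data set (not taken from the tutorial data) chosen so that the arithmetic is easy to follow. It computes each measure from its definition and compares the result with NumPy's built-in functions, where `ddof=0` divides by the number of observations and `ddof=1` applies the sample correction.

```python
import numpy as np

# Hypothetical toy data, chosen only to keep the arithmetic simple.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size
mean = x.mean()                      # 5.0

deviations = x - mean                # signed distance of each point from the mean
sum_sq = np.sum(deviations ** 2)     # sum of squared deviations (32.0)

# Population formulas divide by n; corrected sample formulas divide by n - 1.
pop_var = sum_sq / n                 # 4.0
sample_var = sum_sq / (n - 1)
pop_std = np.sqrt(pop_var)           # 2.0
sample_std = np.sqrt(sample_var)

# Mean absolute deviation: average of the absolute deviations.
mad = np.sum(np.abs(deviations)) / n  # 1.5

# Cross-check against NumPy's built-in functions.
assert np.isclose(pop_var, np.var(x, ddof=0))
assert np.isclose(sample_var, np.var(x, ddof=1))
assert np.isclose(pop_std, np.std(x, ddof=0))
assert np.isclose(sample_std, np.std(x, ddof=1))

print(f"population variance: {pop_var:.6f}, sample variance: {sample_var:.6f}")
print(f"population std: {pop_std:.6f}, sample std: {sample_std:.6f}")
print(f"mean absolute deviation: {mad:.6f}")
```

The sample values are slightly larger than the population values because the divisor is n - 1 instead of n, and the difference shrinks as the number of observations grows.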
2. Python code example.
2.1. Import Python packages.
import numpy as np
import pandas as pd
2.2. Dispersion measures data reading and daily arithmetic returns calculation.
• Data: S&P 500® index replicating ETF (ticker symbol: SPY) daily adjusted close prices (2007-2015).
spy = pd.read_csv('Data//Dispersion-Measures-Data.txt', index_col='Date', parse_dates=True)
# Daily arithmetic (simple) returns: price(t) / price(t-1) - 1
rspy = spy.pct_change(1)
rspy.columns = ['rspy']
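To make the return calculation concrete, here is a minimal sketch, assuming a hypothetical four-day price series (not SPY data) and a hypothetical column name `Adj Close`. It shows that `pct_change(1)` matches the arithmetic return definition, price(t) / price(t-1) - 1, and that the first observation has no return.

```python
import numpy as np
import pandas as pd

# Hypothetical prices for illustration only; these are not SPY data.
prices = pd.DataFrame(
    {"Adj Close": [100.0, 102.0, 99.96, 104.958]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]),
)

# Arithmetic returns with pandas: (p_t - p_{t-1}) / p_{t-1}
returns = prices.pct_change(1)
returns.columns = ["ret"]

# The same calculation written out from the definition.
manual = prices["Adj Close"] / prices["Adj Close"].shift(1) - 1

# Both approaches agree; the first row is NaN because there is no prior price.
assert np.allclose(returns["ret"].dropna(), manual.dropna())
assert pd.isna(returns["ret"].iloc[0])

print(returns)
```

The leading NaN does not need to be dropped before computing dispersion measures, because pandas reductions such as `std` and `var` skip missing values by default.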
2.3. Dispersion measures calculation and output.
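In:
print('== Dispersion Measures ==')
print('')
print('Corrected Sample Standard Deviation:', np.round(rspy.std(ddof=1), 6))
print('Corrected Sample Variance:', np.round(rspy.var(ddof=1), 6))
# DataFrame.mad() was removed in pandas 2.0, so the mean absolute deviation is computed from its definition
print('Sample Mean Absolute Deviation:', np.round((rspy - rspy.mean()).abs().mean(), 6))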
Out:
== Dispersion Measures ==

Corrected Sample Standard Deviation: rspy    0.01357
dtype: float64
Corrected Sample Variance: rspy    0.000184
dtype: float64
Sample Mean Absolute Deviation: rspy    0.008667
dtype: float64
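The output above carries the column label and `dtype` because each measure is computed on a one-column DataFrame. As a minimal sketch, assuming synthetic normally distributed returns (generated here for illustration, not the SPY data), the same measures computed on a pandas Series come back as plain scalars. The mean absolute deviation is written out from its definition, since `mad()` is not available in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns, purely illustrative (not the tutorial's SPY data).
rng = np.random.default_rng(0)
rets = pd.Series(rng.normal(loc=0.0, scale=0.01, size=1000), name="ret")

# On a Series these reductions return scalars rather than a labelled Series.
std = rets.std(ddof=1)                   # corrected sample standard deviation
var = rets.var(ddof=1)                   # corrected sample variance
mad = (rets - rets.mean()).abs().mean()  # sample mean absolute deviation

print("== Dispersion Measures ==")
print(f"Corrected Sample Standard Deviation: {std:.6f}")
print(f"Corrected Sample Variance: {var:.6f}")
print(f"Sample Mean Absolute Deviation: {mad:.6f}")

# Sanity checks: variance is the squared standard deviation,
# and the mean absolute deviation never exceeds the standard deviation.
assert np.isclose(var, std ** 2)
assert mad <= std
```

The two checks at the end hold for any data set, so they are a quick way to catch a mislabelled or miscomputed measure.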