Rolling Regression

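Rolling OLS applies OLS across a fixed window of observations and then rolls (moves or slides) the window across the data set. The key parameter is window, which determines the number of observations used in each OLS regression. By default, RollingOLS drops missing values in the window, so each model is estimated using the data points that are available.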

Estimated values are aligned so that the model estimated using data points \(i+1, i+2, \ldots, i+window\) is stored in location \(i+window\).
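The alignment rule can be illustrated with a minimal NumPy sketch. The helper rolling_ols and the simulated data are hypothetical and are not part of statsmodels; the sketch only shows where each window's estimate is stored, using 0-based positions.

```python
import numpy as np


def rolling_ols(y, X, window):
    """Rolling OLS; row t holds the fit on observations t-window+1 .. t."""
    nobs, k = X.shape
    params = np.full((nobs, k), np.nan)  # positions without a full window stay NaN
    for end in range(window, nobs + 1):
        start = end - window
        beta, *_ = np.linalg.lstsq(X[start:end], y[start:end], rcond=None)
        params[end - 1] = beta  # stored at the last observation of the window
    return params


# Simulated data, for illustration only
rng = np.random.default_rng(0)
x = rng.standard_normal(20)
X = np.column_stack([np.ones(20), x])
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(20)

params = rolling_ols(y, X, window=5)
print(int(np.isnan(params[:, 0]).sum()))  # 4, i.e. window - 1
```

The first window - 1 rows have no complete window behind them, which is why they remain NaN.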

Start by importing the modules that are used in this notebook.

[1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pandas_datareader as pdr
import seaborn

import statsmodels.api as sm
from statsmodels.regression.rolling import RollingOLS

seaborn.set_style("darkgrid")
pd.plotting.register_matplotlib_converters()
%matplotlib inline

pandas-datareader is used to download data from Ken French's website. The two data sets downloaded are the 3 Fama-French factors and the 10 industry portfolios. Data is available from 1926.

The data are monthly returns for the factors or industry portfolios.

[2]:
factors = pdr.get_data_famafrench("F-F_Research_Data_Factors", start="1-1-1926")[0]
factors.head()
[2]:
         Mkt-RF    SMB    HML     RF
Date
1926-07    2.96  -2.56  -2.43   0.22
1926-08    2.64  -1.17   3.82   0.25
1926-09    0.36  -1.40   0.13   0.23
1926-10   -3.24  -0.09   0.70   0.32
1926-11    2.53  -0.10  -0.51   0.31
[3]:
industries = pdr.get_data_famafrench("10_Industry_Portfolios", start="1-1-1926")[0]
industries.head()
[3]:
          NoDur  Durbl  Manuf  Enrgy  HiTec  Telcm  Shops   Hlth  Utils  Other
Date
1926-07    1.45  15.55   4.69  -1.18   2.90   0.83   0.11   1.77   7.04   2.13
1926-08    3.97   3.68   2.81   3.47   2.66   2.17  -0.71   4.25  -1.69   4.35
1926-09    1.14   4.80   1.15  -3.39  -0.38   2.41   0.21   0.69   2.04   0.29
1926-10   -1.24  -8.23  -3.63  -0.78  -4.58  -0.11  -2.29  -0.57  -2.63  -2.84
1926-11    5.20  -0.19   4.10   0.01   4.71   1.63   6.43   5.42   3.71   2.11

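The first model estimated is a rolling version of the CAPM that regresses the excess return of Technology sector firms on the excess return of the market.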

The window is 60 months, so results are available after the first 60 (window) months. The first 59 (window - 1) estimates are all filled with nan.

[4]:
endog = industries.HiTec - factors.RF.values
exog = sm.add_constant(factors["Mkt-RF"])
rols = RollingOLS(endog, exog, window=60)
rres = rols.fit()
params = rres.params.copy()
params.index = np.arange(1, params.shape[0] + 1)
params.head()
[4]:
       const    Mkt-RF
1        NaN       NaN
2        NaN       NaN
3        NaN       NaN
4        NaN       NaN
5        NaN       NaN
[5]:
params.iloc[57:62]
[5]:
        const    Mkt-RF
58        NaN       NaN
59        NaN       NaN
60   0.876155  1.399240
61   0.879936  1.406578
62   0.953169  1.408826
[6]:
params.tail()
[6]:
          const    Mkt-RF
1174   0.480235  1.089338
1175   0.552565  1.086061
1176   0.610923  1.090145
1177   0.517339  1.089693
1178   0.503159  1.088327

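Next, we plot the market loading along with a 95% point-wise confidence interval. Passing variables=["Mkt-RF"] restricts the plot to the market coefficient, so the constant is not shown. The width of the interval is set by the alpha argument, which defaults to 0.05.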

[7]:
fig = rres.plot_recursive_coefficient(variables=["Mkt-RF"], figsize=(14, 6))
[Figure: rolling estimate of the Mkt-RF coefficient with its 95% point-wise confidence interval]

Next, the model is expanded to include all three factors: the excess market return, the size factor, and the value factor.

[8]:
exog_vars = ["Mkt-RF", "SMB", "HML"]
exog = sm.add_constant(factors[exog_vars])
rols = RollingOLS(endog, exog, window=60)
rres = rols.fit()
fig = rres.plot_recursive_coefficient(variables=exog_vars, figsize=(14, 18))
[Figure: rolling estimates of the Mkt-RF, SMB, and HML coefficients with 95% point-wise confidence intervals]

Formulas

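Both RollingOLS and RollingWLS support model specification using the formula interface. The example below uses the same three-factor specification as the model estimated previously, except that the dependent variable is the raw HiTec return rather than the excess return. Note that one variable is renamed so that it has a valid Python variable name.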

[9]:
joined = pd.concat([factors, industries], axis=1)
joined["Mkt_RF"] = joined["Mkt-RF"]
mod = RollingOLS.from_formula("HiTec ~ Mkt_RF + SMB + HML", data=joined, window=60)
rres = mod.fit()
rres.params.tail()
[9]:
         Intercept     Mkt_RF        SMB        HML
Date
2024-04   0.600109   1.121680  -0.095098  -0.344825
2024-05   0.682975   1.114558  -0.091375  -0.351516
2024-06   0.719744   1.121549  -0.106960  -0.357406
2024-07   0.674874   1.122952  -0.115250  -0.361814
2024-08   0.691445   1.116360  -0.108656  -0.368933
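
The renaming step can also be done for every column at once. The sketch below uses a small hypothetical frame that borrows a few column names and values from the tables above. Unlike the cell above, which adds a renamed copy of the column, it replaces the original names.

```python
import pandas as pd

# Hypothetical frame using column names from this notebook
df = pd.DataFrame(
    {"Mkt-RF": [2.96, 2.64], "SMB": [-2.56, -1.17], "HiTec": [2.90, 2.66]}
)

# Replace the hyphen, which is not allowed in a Python identifier
renamed = df.rename(columns=lambda name: name.replace("-", "_"))

print(list(renamed.columns))  # ['Mkt_RF', 'SMB', 'HiTec']
print(all(name.isidentifier() for name in renamed.columns))  # True
```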

RollingWLS: Rolling Weighted Least Squares

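The rolling module also provides RollingWLS, which takes an optional weights input to perform rolling weighted least squares. It produces results that match WLS applied to rolling windows of data.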

Fit Options

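fit accepts other optional keywords to set the covariance estimator. Only two estimators are supported: 'nonrobust' (the classic OLS estimator) and 'HC0', which is White's heteroskedasticity-robust estimator.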
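
For a single window, the two covariance estimators can be written out directly. This is a textbook NumPy sketch on simulated heteroskedastic data; it is not the statsmodels implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
# The error scale grows with |x|, so the errors are heteroskedastic by construction
y = 0.5 + 1.2 * x + (0.5 + np.abs(x)) * rng.standard_normal(n)

xtx_inv = np.linalg.inv(X.T @ X)
beta = xtx_inv @ X.T @ y
resid = y - X @ beta

# Classical covariance: s^2 (X'X)^{-1}, with s^2 = SSR / (n - k)
s2 = resid @ resid / (n - X.shape[1])
cov_nonrobust = s2 * xtx_inv

# White's HC0 covariance: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}
meat = (X * resid[:, None] ** 2).T @ X
cov_hc0 = xtx_inv @ meat @ xtx_inv

print(np.sqrt(np.diag(cov_nonrobust)))  # classical standard errors
print(np.sqrt(np.diag(cov_hc0)))        # heteroskedasticity-robust standard errors
```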

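You can set params_only=True to estimate only the model parameters. This is substantially faster than computing the full set of values required to perform inference.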

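Finally, the parameter reset can be set to a positive integer to control estimation error in very long samples. RollingOLS avoids the full matrix product when rolling by only adding the most recent observation and removing the dropped observation. Setting reset uses the full inner product every reset periods. In most applications this parameter can be omitted.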

[10]:
%timeit rols.fit()
%timeit rols.fit(params_only=True)
69.4 ms ± 16.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
22.3 ms ± 5.87 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
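
The incremental update that reset guards against can be sketched in NumPy. The function rolling_params_updating and its refresh schedule are assumptions made for illustration, and statsmodels' internal code may differ. The point is that sliding X'X and X'y by one observation is cheap, while an occasional full recomputation limits accumulated rounding error.

```python
import numpy as np


def rolling_params_updating(y, X, window, reset=None):
    """Rolling OLS that slides X'X and X'y instead of recomputing them."""
    nobs, k = X.shape
    params = np.full((nobs, k), np.nan)
    xtx = X[:window].T @ X[:window]
    xty = X[:window].T @ y[:window]
    params[window - 1] = np.linalg.solve(xtx, xty)
    for end in range(window + 1, nobs + 1):
        start = end - window
        if reset is not None and start % reset == 0:
            # Periodic full recomputation limits accumulated rounding error
            xtx = X[start:end].T @ X[start:end]
            xty = X[start:end].T @ y[start:end]
        else:
            new, old = end - 1, start - 1
            # Add the newest observation and remove the one leaving the window
            xtx = xtx + np.outer(X[new], X[new]) - np.outer(X[old], X[old])
            xty = xty + X[new] * y[new] - X[old] * y[old]
        params[end - 1] = np.linalg.solve(xtx, xty)
    return params


# Simulated data, for illustration only
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(200), rng.standard_normal((200, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.standard_normal(200)

fast = rolling_params_updating(y, X, window=40, reset=50)
full, *_ = np.linalg.lstsq(X[-40:], y[-40:], rcond=None)
print(np.allclose(fast[-1], full))
```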

Expanding Sample

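It is possible to expand the sample until sufficient observations are available for the full window length. In this example, we start once 12 observations are available and then grow the sample until it contains 60 observations. The first non-nan value is computed using 12 observations, the second using 13, and so on. All other estimates are computed using 60 observations.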

[11]:
res = RollingOLS(endog, exog, window=60, min_nobs=12, expanding=True).fit()
res.params.iloc[10:15]
[11]:
             const     Mkt-RF        SMB        HML
Date
1927-05        NaN        NaN        NaN        NaN
1927-06   1.560283   0.999383   1.351219  -0.471879
1927-07   1.235899   1.294857   0.742924  -0.540048
1927-08   1.249999   1.297546   0.752327  -0.548306
1927-09   1.375626   1.286724   1.177758  -0.609331
[12]:
res.nobs[10:15]
[12]:
Date
1927-05     0
1927-06    12
1927-07    13
1927-08    14
1927-09    15
Freq: M, dtype: int64
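
The observation counts shown above follow a simple rule, sketched here with a hypothetical helper window_nobs. It assumes there are no missing values: no estimate is produced until min_nobs observations are available, the sample then grows by one observation per period, and it is capped at window.

```python
import numpy as np


def window_nobs(total, window, min_nobs):
    """Observations used at each position of an expanding-then-rolling scheme."""
    available = np.arange(1, total + 1)  # observations seen so far
    used = np.minimum(available, window)  # capped at the full window length
    return np.where(available >= min_nobs, used, 0)  # 0 marks positions with no estimate


nobs = window_nobs(total=100, window=60, min_nobs=12)
print(nobs[10:15].tolist())  # [0, 12, 13, 14, 15]
print(nobs[58:61].tolist())  # [59, 60, 60]
```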

Last update: Oct 16, 2024