Event Study

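While Alphalens is a tool designed to evaluate a cross-sectional signal that can be used to rank many securities each day, we can still use the Alphalens returns analysis functions (a subset of Alphalens) to create a meaningful event study.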

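An event study is a statistical method for assessing the impact of a particular event on the value of a stock. In this example, we evaluate what happens to stocks whose price falls below $30.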

Imports & Settings

[1]:
import warnings
warnings.filterwarnings('ignore')
[2]:
import alphalens
import pandas as pd
[3]:
%matplotlib inline

Load Data

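Below is a simple list of large-cap stock tickers (247 symbols, a subset of a 500-stock universe). It is a plain list of symbols and does not carry a sector mapping.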

[4]:
tickers = ['ACN', 'ATVI', 'ADBE', 'AMD', 'AKAM', 'ADS', 'GOOGL', 'GOOG', 'APH', 'ADI', 'ANSS', 'AAPL',
           'AVGO', 'CA', 'CDNS', 'CSCO', 'CTXS', 'CTSH', 'GLW', 'CSRA', 'DXC', 'EBAY', 'EA', 'FFIV', 'FB',
           'FLIR', 'IT', 'GPN', 'HRS', 'HPE', 'HPQ', 'INTC', 'IBM', 'INTU', 'JNPR', 'KLAC', 'LRCX', 'MA', 'MCHP',
           'MSFT', 'MSI', 'NTAP', 'NFLX', 'NVDA', 'ORCL', 'PAYX', 'PYPL', 'QRVO', 'QCOM', 'RHT', 'CRM', 'STX',
           'AMG', 'AFL', 'ALL', 'AXP', 'AIG', 'AMP', 'AON', 'AJG', 'AIZ', 'BAC', 'BK', 'BBT', 'BRK.B', 'BLK', 'HRB',
           'BHF', 'COF', 'CBOE', 'SCHW', 'CB', 'CINF', 'C', 'CFG', 'CME', 'CMA', 'DFS', 'ETFC', 'RE', 'FITB', 'BEN',
           'GS', 'HIG', 'HBAN', 'ICE', 'IVZ', 'JPM', 'KEY', 'LUK', 'LNC', 'L', 'MTB', 'MMC', 'MET', 'MCO', 'MS',
           'NDAQ', 'NAVI', 'NTRS', 'PBCT', 'PNC', 'PFG', 'PGR', 'PRU', 'RJF', 'RF', 'SPGI', 'STT', 'STI', 'SYF', 'TROW',
           'ABT', 'ABBV', 'AET', 'A', 'ALXN', 'ALGN', 'AGN', 'ABC', 'AMGN', 'ANTM', 'BCR', 'BAX', 'BDX', 'BIIB', 'BSX',
           'BMY', 'CAH', 'CELG', 'CNC', 'CERN', 'CI', 'COO', 'DHR', 'DVA', 'XRAY', 'EW', 'EVHC', 'ESRX', 'GILD', 'HCA',
           'HSIC', 'HOLX', 'HUM', 'IDXX', 'ILMN', 'INCY', 'ISRG', 'IQV', 'JNJ', 'LH', 'LLY', 'MCK', 'MDT', 'MRK', 'MTD',
           'MYL', 'PDCO', 'PKI', 'PRGO', 'PFE', 'DGX', 'REGN', 'RMD', 'SYK', 'TMO', 'UNH', 'UHS', 'VAR', 'VRTX', 'WAT',
           'MMM', 'AYI', 'ALK', 'ALLE', 'AAL', 'AME', 'AOS', 'ARNC', 'BA', 'CHRW', 'CAT', 'CTAS', 'CSX', 'CMI', 'DE',
           'DAL', 'DOV', 'ETN', 'EMR', 'EFX', 'EXPD', 'FAST', 'FDX', 'FLS', 'FLR', 'FTV', 'FBHS', 'GD', 'GE', 'GWW',
           'HON', 'INFO', 'ITW', 'IR', 'JEC', 'JBHT', 'JCI', 'KSU', 'LLL', 'LMT', 'MAS', 'NLSN', 'NSC', 'NOC', 'PCAR',
           'PH', 'PNR', 'PWR', 'RTN', 'RSG', 'RHI', 'ROK', 'COL', 'ROP', 'LUV', 'SRCL', 'TXT', 'TDG', 'UNP', 'UAL',
           'AES', 'LNT', 'AEE', 'AEP', 'AWK', 'CNP', 'CMS', 'ED', 'D', 'DTE', 'DUK', 'EIX', 'ETR', 'ES', 'EXC']

YFinance Download

[5]:
import yfinance as yf
import pandas_datareader.data as web
yf.pdr_override()

df = web.get_data_yahoo(tickers, start='2015-06-01',  end='2017-01-01')
[*********************100%***********************]  247 of 247 completed

17 Failed downloads:
- BHF: Data doesn't exist for startDate = 1433134800, endDate = 1483250400
- ARNC: Data doesn't exist for startDate = 1433134800, endDate = 1483250400
- STI: No data found, symbol may be delisted
- RHT: No data found, symbol may be delisted
- HRS: No data found, symbol may be delisted
- JEC: No data found, symbol may be delisted
- BCR: No data found for this date range, symbol may be delisted
- BRK.B: No data found, symbol may be delisted
- IR: Data doesn't exist for startDate = 1433134800, endDate = 1483250400
- AGN: No data found, symbol may be delisted
- CELG: No data found, symbol may be delisted
- LLL: No data found, symbol may be delisted
- LUK: No data found for this date range, symbol may be delisted
- ETFC: No data found, symbol may be delisted
- MYL: No data found, symbol may be delisted
- RTN: No data found, symbol may be delisted
- BBT: No data found, symbol may be delisted

Data Formatting

[6]:
df = df.stack()
df.index.names = ['date', 'asset']
df = df.tz_localize('UTC', level='date')
[7]:
df.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 91935 entries, (Timestamp('2015-06-01 00:00:00+0000', tz='UTC'), 'A') to (Timestamp('2016-12-30 00:00:00+0000', tz='UTC'), 'XRAY')
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Adj Close  91935 non-null  float64
 1   Close      91935 non-null  float64
 2   High       91935 non-null  float64
 3   Low        91935 non-null  float64
 4   Open       91935 non-null  float64
 5   Volume     91935 non-null  float64
dtypes: float64(6)
memory usage: 4.6+ MB
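
The reshaping above can be tried on a small synthetic frame. The sketch below assumes a download result shaped like the one used here, with two column levels (price field, ticker); the tickers `AAA` and `BBB`, the values, and the name `tidy` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a wide download: columns are (field, ticker) pairs.
dates = pd.date_range("2020-01-01", periods=2, freq="D")
columns = pd.MultiIndex.from_product([["Open", "Close"], ["AAA", "BBB"]])
wide = pd.DataFrame(np.arange(8, dtype=float).reshape(2, 4),
                    index=dates, columns=columns)

# stack() moves the innermost column level (the ticker) into the row index,
# leaving one row per (date, ticker) and one column per price field.
tidy = wide.stack()
tidy.index.names = ["date", "asset"]

# Make the date level timezone-aware, as in the notebook cell above.
tidy = tidy.tz_localize("UTC", level="date")
print(tidy)
```

The result has one row per (date, asset) pair, which is the long layout the rest of the notebook slices with `df.loc[:, 'Open']`.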

Factor Computation

Now it's time to build the events DataFrame, the input we will pass to Alphalens.

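Alphalens calculates statistics only for the dates on which the input DataFrame has values (not NaN). So, to run the performance analysis on specific dates and securities (as in an event study), we must make sure the input DataFrame contains valid values only at the date/security combinations where the event happens. All other values in the DataFrame must be NaN or absent.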
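
As a minimal sketch of the "NaN or absent" requirement, the snippet below uses two hypothetical assets (`AAA`, `BBB`) with made-up prices. Boolean indexing leaves NaN wherever the condition is false, and stacking followed by an explicit `dropna()` keeps only the rows where an event occurred.

```python
import pandas as pd

# Synthetic open prices for two hypothetical assets over four days.
dates = pd.date_range("2020-01-01", periods=4, freq="D", tz="UTC")
prices = pd.DataFrame(
    {"AAA": [31.0, 29.5, 29.0, 30.5], "BBB": [40.0, 41.0, 42.0, 43.0]},
    index=dates,
)
prices.index.name = "date"
prices.columns.name = "asset"

# True only on the day a price moves from >= 30 to < 30.
crossed_below = (prices < 30.0) & (prices.shift(1) >= 30.0)

# Wide form: the price where the event happened, NaN everywhere else.
wide_events = prices[crossed_below]

# Long form: one row per (date, asset) event; non-event cells are dropped.
events = wide_events.stack().dropna()
print(events)
```

Only `AAA` on the second day qualifies: the third day is also below 30, but the previous day was already below the threshold, so it is not a new crossing.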

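Also, make sure the event values are positive if you intend to go long on the events (the magnitude does not matter, but the sign must be positive), and use negative values if you intend to go short. This affects the cumulative returns plots.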

Let's create the event DataFrame, in which we "mark" (with any value) each day on which a security's price falls below $30.

[8]:
today_price = df.loc[:, 'Open'].unstack('asset')
yesterday_price = today_price.shift(1)
events = today_price[(today_price < 30.0) & (yesterday_price >= 30)]
events = events.stack()
events = events.astype(float)
events
[8]:
date                       asset
2015-06-04 00:00:00+00:00  LNT      29.645000
                           PWR      29.850000
2015-06-12 00:00:00+00:00  CA       29.830000
2015-06-18 00:00:00+00:00  PWR      29.870001
2015-06-25 00:00:00+00:00  PWR      29.629999
                                      ...
2016-12-06 00:00:00+00:00  GE       29.990385
2016-12-07 00:00:00+00:00  PFE      29.724857
2016-12-09 00:00:00+00:00  CSCO     29.980000
2016-12-13 00:00:00+00:00  EW       29.476667
2016-12-14 00:00:00+00:00  EBAY     29.850000
Length: 161, dtype: float64

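The pricing data passed to Alphalens should contain the entry price for the assets, so it must reflect the next available price after an event was observed at a given timestamp. Those prices must not be used to calculate the events for that time. Always double-check that you are not introducing look-ahead bias into your study.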

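The pricing data must also contain the assets' exit prices: for period 1, the price at the next timestamp is used; for period 2, the price two timestamps later; and so on.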
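
The entry/exit convention described above can be written out in a few lines of pandas. This is only an illustrative sketch: `forward_returns` is a hypothetical helper, not an Alphalens function, and the prices are made up.

```python
import pandas as pd

dates = pd.date_range("2020-01-01", periods=5, freq="D", tz="UTC")
pricing = pd.DataFrame({"AAA": [10.0, 11.0, 12.1, 12.1, 13.31]}, index=dates)


def forward_returns(prices: pd.DataFrame, period: int) -> pd.DataFrame:
    """Return from the price at t (entry) to the price at t + period (exit)."""
    # shift(-period) aligns the price `period` rows later with row t.
    return prices.shift(-period) / prices - 1.0


fwd_1 = forward_returns(pricing, 1)  # exit at the next timestamp
fwd_2 = forward_returns(pricing, 2)  # exit two timestamps later
print(fwd_1.join(fwd_2, lsuffix="_1", rsuffix="_2"))
```

The last `period` rows have no exit price available, so their forward returns are NaN.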

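While Alphalens is time-frequency agnostic, in our example we build the 'pricing' DataFrame so that each event timestamp contains the asset's open price for the day after the event is detected; this price is used as the asset's entry price. We also do not add any additional prices, so the asset's exit price will be the open price of the following days (how many depends on the 'periods' argument).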

[9]:
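# Note: 'Open' is still a stacked (date, asset) Series at this point, so .iloc[1:]
# drops only its first row; it does not shift prices forward by one day.
pricing = df.loc[:, 'Open'].iloc[1:].unstack('asset')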
[10]:
pricing.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 402 entries, 2015-06-01 00:00:00+00:00 to 2016-12-30 00:00:00+00:00
Columns: 230 entries, AAL to FTV
dtypes: float64(230)
memory usage: 725.5 KB
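
If the intent is for the row stamped with the event date to hold the following day's open, one possible way to build such a frame is to pivot first and then shift by one row. The sketch below is an assumption about that intent, not the notebook's own method, and uses synthetic data with hypothetical tickers.

```python
import pandas as pd

dates = pd.date_range("2020-01-01", periods=3, freq="D", tz="UTC")
index = pd.MultiIndex.from_product([dates, ["AAA", "BBB"]],
                                   names=["date", "asset"])
opens = pd.Series([10.0, 20.0, 11.0, 21.0, 12.0, 22.0], index=index, name="Open")

# Pivot to a date x asset frame, then move every row up by one day so that
# the row stamped t holds the open of t + 1. The last row becomes NaN.
next_open = opens.unstack("asset").shift(-1)
print(next_open)
```

With a frame built this way, an event detected from the open at time t would be entered at the open of t + 1, which keeps the entry price out of the event calculation.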

Run Event Study

Configuration

Before running Alphalens, be aware of the following important options:

[11]:
# we don't want any filtering to be done

filter_zscore = None
[12]:
# We want only one bin/quantile, so we can use either quantiles=1 or bins=1

quantiles = None
bins = 1

# Beware that pandas versions below 0.20.0 had a few bugs in pandas.qcut and pandas.cut
# that caused a ValueError to be raised when identical values were present in the
# DataFrame and 1 quantile/bin was selected.
# As a workaround, use the custom bins range option with a range that includes all your values, e.g.:

quantiles = None
bins = [-1000000, 1000000]
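
To see why an explicit range works even when every event value is identical, the following sketch calls `pandas.cut` directly on made-up values. It only illustrates the binning idea and is not a copy of Alphalens internals.

```python
import pandas as pd

# Three identical event values, the situation described in the comment above.
values = pd.Series([29.5, 29.5, 29.5])

# A single explicit interval that covers every value puts all events in one bin.
# labels=False returns zero-based bin codes; adding 1 gives bin number 1.
bin_numbers = pd.cut(values, bins=[-1000000, 1000000], labels=False) + 1
print(bin_numbers.tolist())
```

Because the bin edges are given explicitly, pandas does not need to derive them from the data, so identical values cause no problem.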
[13]:
# You don't have to set the 'long_short' option directly when running
# alphalens.tears.create_event_study_tear_sheet.
# If you use other Alphalens functions, however, make sure to set 'long_short=False'.
# With 'long_short=True', Alphalens demeans the forward returns, which makes sense only
# for a dollar-neutral portfolio. With an event-style signal you usually cannot create a
# dollar-neutral long/short portfolio.

long_short = False
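
The effect of demeaning can be illustrated with plain pandas. The sketch below uses a hypothetical helper, `demean_by_date`, and made-up returns; it shows the general idea of subtracting each date's cross-sectional mean rather than reproducing Alphalens code.

```python
import pandas as pd

dates = pd.date_range("2020-01-01", periods=2, freq="D", tz="UTC")


def demean_by_date(forward_returns: pd.Series) -> pd.Series:
    """Subtract each date's cross-sectional mean from that date's returns."""
    date_means = forward_returns.groupby(level="date").transform("mean")
    return forward_returns - date_means


# Cross-sectional case: several assets per date, so demeaned values are informative.
cross_index = pd.MultiIndex.from_product([dates, ["AAA", "BBB"]],
                                         names=["date", "asset"])
cross_section = pd.Series([0.02, 0.04, -0.01, 0.03], index=cross_index)
cross_demeaned = demean_by_date(cross_section)

# Event case: a single event per date, so every demeaned value collapses to zero.
event_index = pd.MultiIndex.from_arrays([dates, ["AAA", "BBB"]],
                                        names=["date", "asset"])
event_returns = pd.Series([0.05, -0.02], index=event_index)
event_demeaned = demean_by_date(event_returns)

print(cross_demeaned)
print(event_demeaned)
```

In the event case the signal is sparse, so subtracting the per-date mean removes most or all of the return being studied, which is why `long_short=False` is the sensible choice here.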

Get Alphalens Input

[14]:
factor_data = alphalens.utils.get_clean_factor_and_forward_returns(
    events,
    pricing,
    quantiles=None,
    bins=1,
    periods=(1, 2, 3, 4, 5, 6, 10),
    filter_zscore=None)
Dropped 0.0% entries from factor data: 0.0% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!

Run Event Tearsheet

[15]:
alphalens.tears.create_event_study_tear_sheet(
    factor_data, pricing, avgretplot=(5, 10))
Quantiles Statistics
                       min        max       mean       std  count  count %
factor_quantile
1                26.549999  29.990385  29.577682  0.523523    161    100.0
<Figure size 432x288 with 0 Axes>
../_images/notebooks_event_study_29_3.png
<Figure size 432x288 with 0 Axes>
../_images/notebooks_event_study_29_5.png
../_images/notebooks_event_study_29_6.png

Short Signal Analysis

If we want to analyze the performance of a short signal, we only need to switch the event values from positive to negative.

[16]:
events = -events
[17]:
factor_data = alphalens.utils.get_clean_factor_and_forward_returns(
    events,
    pricing,
    quantiles=None,
    bins=1,
    periods=(1, 2, 3, 4, 5, 6, 10),
    filter_zscore=None)
Dropped 0.0% entries from factor data: 0.0% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
[18]:
alphalens.tears.create_event_study_tear_sheet(
    factor_data, pricing, avgretplot=(5, 10))
Quantiles Statistics
                       min        max       mean       std  count  count %
factor_quantile
1               -29.990385 -26.549999 -29.577682  0.523523    161    100.0
<Figure size 432x288 with 0 Axes>
../_images/notebooks_event_study_34_3.png
<Figure size 432x288 with 0 Axes>
../_images/notebooks_event_study_34_5.png
../_images/notebooks_event_study_34_6.png