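Intraday Factor
In this notebook we use Alphalens to analyze the performance of an intraday factor. The factor is computed once per day, stocks are bought at the market open and sold at the market close, and no positions are held overnight.
Imports & Settings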
[2]:
import warnings
warnings.filterwarnings('ignore')
[3]:
import alphalens
import pandas as pd
[4]:
%matplotlib inline
Load Data
Below is a simple mapping from ticker to sector for a small universe of large-cap stocks.
[5]:
sector_names = {
0 : "information_technology",
1 : "financials",
2 : "health_care",
3 : "industrials",
4 : "utilities",
5 : "real_estate",
6 : "materials",
7 : "telecommunication_services",
8 : "consumer_staples",
9 : "consumer_discretionary",
10 : "energy"
}
ticker_sector = {
"ACN" : 0, "ATVI" : 0, "ADBE" : 0, "AMD" : 0, "AKAM" : 0, "ADS" : 0, "GOOGL" : 0, "GOOG" : 0,
"APH" : 0, "ADI" : 0, "ANSS" : 0, "AAPL" : 0, "AMAT" : 0, "ADSK" : 0, "ADP" : 0, "AVGO" : 0,
"AMG" : 1, "AFL" : 1, "ALL" : 1, "AXP" : 1, "AIG" : 1, "AMP" : 1, "AON" : 1, "AJG" : 1, "AIZ" : 1, "BAC" : 1,
"BK" : 1, "BBT" : 1, "BRK.B" : 1, "BLK" : 1, "HRB" : 1, "BHF" : 1, "COF" : 1, "CBOE" : 1, "SCHW" : 1, "CB" : 1,
"ABT" : 2, "ABBV" : 2, "AET" : 2, "A" : 2, "ALXN" : 2, "ALGN" : 2, "AGN" : 2, "ABC" : 2, "AMGN" : 2, "ANTM" : 2,
"BCR" : 2, "BAX" : 2, "BDX" : 2, "BIIB" : 2, "BSX" : 2, "BMY" : 2, "CAH" : 2, "CELG" : 2, "CNC" : 2, "CERN" : 2,
"MMM" : 3, "AYI" : 3, "ALK" : 3, "ALLE" : 3, "AAL" : 3, "AME" : 3, "AOS" : 3, "ARNC" : 3, "BA" : 3, "CHRW" : 3,
"CAT" : 3, "CTAS" : 3, "CSX" : 3, "CMI" : 3, "DE" : 3, "DAL" : 3, "DOV" : 3, "ETN" : 3, "EMR" : 3, "EFX" : 3,
"AES" : 4, "LNT" : 4, "AEE" : 4, "AEP" : 4, "AWK" : 4, "CNP" : 4, "CMS" : 4, "ED" : 4, "D" : 4, "DTE" : 4,
"DUK" : 4, "EIX" : 4, "ETR" : 4, "ES" : 4, "EXC" : 4, "FE" : 4, "NEE" : 4, "NI" : 4, "NRG" : 4, "PCG" : 4,
"ARE" : 5, "AMT" : 5, "AIV" : 5, "AVB" : 5, "BXP" : 5, "CBG" : 5, "CCI" : 5, "DLR" : 5, "DRE" : 5,
"EQIX" : 5, "EQR" : 5, "ESS" : 5, "EXR" : 5, "FRT" : 5, "GGP" : 5, "HCP" : 5, "HST" : 5, "IRM" : 5, "KIM" : 5,
"APD" : 6, "ALB" : 6, "AVY" : 6, "BLL" : 6, "CF" : 6, "DWDP" : 6, "EMN" : 6, "ECL" : 6, "FMC" : 6, "FCX" : 6,
"IP" : 6, "IFF" : 6, "LYB" : 6, "MLM" : 6, "MON" : 6, "MOS" : 6, "NEM" : 6, "NUE" : 6, "PKG" : 6, "PPG" : 6,
"T" : 7, "CTL" : 7, "VZ" : 7,
"MO" : 8, "ADM" : 8, "BF.B" : 8, "CPB" : 8, "CHD" : 8, "CLX" : 8, "KO" : 8, "CL" : 8, "CAG" : 8,
"STZ" : 8, "COST" : 8, "COTY" : 8, "CVS" : 8, "DPS" : 8, "EL" : 8, "GIS" : 8, "HSY" : 8, "HRL" : 8,
"AAP" : 9, "AMZN" : 9, "APTV" : 9, "AZO" : 9, "BBY" : 9, "BWA" : 9, "KMX" : 9, "CCL" : 9,
"APC" : 10, "ANDV" : 10, "APA" : 10, "BHGE" : 10, "COG" : 10, "CHK" : 10, "CVX" : 10, "XEC" : 10, "CXO" : 10,
"COP" : 10, "DVN" : 10, "EOG" : 10, "EQT" : 10, "XOM" : 10, "HAL" : 10, "HP" : 10, "HES" : 10, "KMI" : 10
}
YFinance Download
[6]:
import yfinance as yf
import pandas_datareader.data as web
yf.pdr_override()
tickers = list(ticker_sector.keys())
df = web.get_data_yahoo(tickers, start='2017-01-01', end='2017-06-01')
df.index = pd.to_datetime(df.index, utc=True)
[*********************100%***********************] 182 of 182 completed
19 Failed downloads:
- CHK: Data doesn't exist for startDate = 1483250400, endDate = 1496293200
- BF.B: No data found for this date range, symbol may be delisted
- BRK.B: No data found, symbol may be delisted
- MON: Data doesn't exist for startDate = 1483250400, endDate = 1496293200
- CELG: No data found, symbol may be delisted
- APC: No data found, symbol may be delisted
- ARNC: Data doesn't exist for startDate = 1483250400, endDate = 1496293200
- CBG: No data found for this date range, symbol may be delisted
- CTL: No data found, symbol may be delisted
- BCR: No data found for this date range, symbol may be delisted
- DWDP: No data found, symbol may be delisted
- BHF: Data doesn't exist for startDate = 1483250400, endDate = 1496293200
- GGP: No data found for this date range, symbol may be delisted
- CXO: No data found, symbol may be delisted
- HCP: No data found, symbol may be delisted
- DPS: No data found for this date range, symbol may be delisted
- AGN: No data found, symbol may be delisted
- BBT: No data found, symbol may be delisted
- BHGE: No data found, symbol may be delisted
[7]:
df = df.stack()
df.index.names = ['date', 'asset']
df.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 16789 entries, (Timestamp('2017-01-03 00:00:00+0000', tz='UTC'), 'A') to (Timestamp('2017-05-31 00:00:00+0000', tz='UTC'), 'XOM')
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Adj Close 16789 non-null float64
1 Close 16789 non-null float64
2 High 16789 non-null float64
3 Low 16789 non-null float64
4 Open 16789 non-null float64
5 Volume 16789 non-null float64
dtypes: float64(6)
memory usage: 842.6+ KB
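Factor Computation
Our example factor ranks stocks by their overnight price gap (yesterday's close to today's open). We will see whether the factor carries any alpha or is pure noise.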
[8]:
available_tickers = df.index.unique('asset')
ticker_sector = {k: v for k, v in ticker_sector.items() if k in available_tickers}
[9]:
today_open = df.loc[:, 'Open'].unstack('asset')
today_close = df.loc[:, 'Close'].unstack('asset')
yesterday_close = today_close.shift(1)
[10]:
factor = (today_open - yesterday_close) / yesterday_close
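As a quick sanity check of the gap formula, here is a minimal sketch on made-up prices. The tickers `AAA` and `BBB` and all numbers are hypothetical and only illustrate the arithmetic.

```python
import pandas as pd

# Toy daily prices for two hypothetical tickers (made-up numbers).
dates = pd.to_datetime(["2017-01-03", "2017-01-04", "2017-01-05"], utc=True)
toy_open = pd.DataFrame({"AAA": [10.0, 10.5, 9.9], "BBB": [20.0, 19.8, 20.4]}, index=dates)
toy_close = pd.DataFrame({"AAA": [10.2, 10.0, 10.1], "BBB": [20.0, 20.0, 20.2]}, index=dates)

# Overnight gap: today's open relative to yesterday's close.
toy_prev_close = toy_close.shift(1)
toy_gap = (toy_open - toy_prev_close) / toy_prev_close

# The first day has no previous close, so its gap is NaN.
print(toy_gap)
```

The first row is all NaN because there is no prior close, which is consistent with the factor table further down starting on 2017-01-04 rather than 2017-01-03.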
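The pricing data passed to Alphalens should contain the entry price for each asset, so it must reflect the next available price after a factor value was observed at a given timestamp. Those prices must not be used to compute the factor value for that timestamp. Always double-check that your study does not introduce lookahead bias.
The pricing data must also contain the exit price for each asset: period 1 uses the price at the next timestamp, period 2 uses the price two timestamps later, and so on.
There are no restrictions or assumptions on the frequency at which a factor is computed, nor on the time of day at which it is traded (at the open, at the close, or intraday). The only requirement is that the factor and pricing DataFrames are aligned according to the rules above.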
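One way to check that alignment before calling Alphalens is a small helper like the one below. `check_alignment` is a hypothetical function written for this note, not part of Alphalens; it assumes the factor is a Series with a (`date`, `asset`) MultiIndex and that the pricing index is unique and sorted.

```python
import pandas as pd

def check_alignment(factor: pd.Series, pricing: pd.DataFrame, max_period: int = 1) -> pd.DatetimeIndex:
    """Return factor timestamps that cannot yield a forward return.

    Hypothetical helper (not part of Alphalens). A timestamp is flagged if it
    is missing from the pricing index (no entry price) or if fewer than
    `max_period` later pricing rows exist (no exit price).
    """
    factor_times = factor.index.get_level_values("date").unique()
    # Position of each factor timestamp in the pricing index; -1 means missing.
    pos = pricing.index.get_indexer(factor_times)
    missing = pos == -1
    too_late = pos + max_period >= len(pricing.index)
    return factor_times[missing | too_late]

# Toy data: one hypothetical asset, open/close/open snapshots.
times = pd.to_datetime(["2017-01-03 09:30", "2017-01-03 16:00", "2017-01-04 09:30"], utc=True)
toy_pricing = pd.DataFrame({"AAA": [10.0, 10.2, 10.5]}, index=times)
toy_factor = pd.Series(
    [0.01, 0.03],
    index=pd.MultiIndex.from_product([[times[0], times[2]], ["AAA"]], names=["date", "asset"]),
)

bad = check_alignment(toy_factor, toy_pricing, max_period=1)
print(bad)
```

In this toy case the last factor timestamp is flagged because no later price exists to exit the position.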
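In our example we want to buy the stocks at the market open, so we need open prices at exactly the factor timestamps. We also want to sell at the market close, so we add the close prices as well. The close prices are used to compute the period 1 forward returns because they are the first prices after each factor timestamp. The returns computed by Alphalens are therefore based on the change from the open price to the close price.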
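The sketch below illustrates the idea on a toy pricing frame that interleaves open and close prices. It mirrors the period semantics described above and is not Alphalens's internal implementation; `forward_returns` and the ticker `AAA` are hypothetical names.

```python
import pandas as pd

# Interleaved open (09:30) and close (16:00) prices for one hypothetical asset.
times = pd.to_datetime(
    ["2017-01-03 09:30", "2017-01-03 16:00", "2017-01-04 09:30", "2017-01-04 16:00"],
    utc=True,
)
toy_pricing = pd.DataFrame({"AAA": [10.0, 10.2, 10.5, 10.0]}, index=times)

def forward_returns(prices: pd.DataFrame, periods: int) -> pd.DataFrame:
    """Return from each timestamp to the price `periods` rows later."""
    return prices.shift(-periods) / prices - 1

# Factor values live on the open timestamps only.
open_times = times[[0, 2]]
period_1 = forward_returns(toy_pricing, 1).loc[open_times]  # open -> same-day close
period_2 = forward_returns(toy_pricing, 2).loc[open_times]  # open -> next-day open

print(period_1)
print(period_2)
```

The last open has no price two rows later, so its period 2 return is NaN; entries like this are the kind that get dropped in the forward returns computation.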
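If we had other prices we could compute returns over other horizons, e.g. one hour after the open, two hours after the open, and so on. We would add those prices right after the open prices and instruct Alphalens to compute periods 1, 2, 3, ... rather than only periods 1 and 2 as in this example.
Data Formatting
Time Adjustment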
[11]:
# Yahoo's daily bars carry no time of day, so stamp opens at 09:30 and closes at 16:00
today_open.index += pd.Timedelta('9h30m')
today_close.index += pd.Timedelta('16h')
# pricing will contain both open and close
pricing = pd.concat([today_open, today_close]).sort_index()
[12]:
pricing.head()
[12]:
| asset | A | AAL | AAP | AAPL | ABBV | ABC | ABT | ACN | ADBE | ADI | ... | NUE | PCG | PKG | PPG | SCHW | STZ | T | VZ | XEC | XOM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| date | | | | | | | | | | | | | | | | | | | | | |
| 2017-01-03 09:30:00+00:00 | 45.930000 | 47.279999 | 170.779999 | 28.950001 | 62.919998 | 78.510002 | 38.630001 | 117.379997 | 103.430000 | 72.599998 | ... | 59.740002 | 60.810001 | 85.160004 | 95.430000 | 40.049999 | 155.009995 | 42.689999 | 53.959999 | 137.529999 | 90.940002 |
| 2017-01-03 16:00:00+00:00 | 46.490002 | 46.299999 | 170.600006 | 29.037500 | 62.410000 | 82.610001 | 39.049999 | 116.459999 | 103.480003 | 72.510002 | ... | 59.610001 | 60.369999 | 85.000000 | 95.250000 | 40.200001 | 154.750000 | 43.020000 | 54.580002 | 138.789993 | 90.889999 |
| 2017-01-04 09:30:00+00:00 | 46.930000 | 46.630001 | 170.369995 | 28.962500 | 62.639999 | 82.599998 | 39.060001 | 116.910004 | 103.739998 | 72.769997 | ... | 59.759998 | 60.610001 | 85.440002 | 95.709999 | 40.400002 | 157.149994 | 42.939999 | 54.549999 | 138.479996 | 91.120003 |
| 2017-01-04 16:00:00+00:00 | 47.099998 | 46.700001 | 172.000000 | 29.004999 | 63.290001 | 84.660004 | 39.360001 | 116.739998 | 104.139999 | 72.360001 | ... | 61.250000 | 60.590000 | 86.370003 | 97.269997 | 41.220001 | 157.990005 | 42.770000 | 54.520000 | 138.500000 | 89.889999 |
| 2017-01-05 09:30:00+00:00 | 47.049999 | 46.520000 | 170.869995 | 28.980000 | 63.380001 | 84.379997 | 39.240002 | 116.980003 | 104.129997 | 72.410004 | ... | 61.119999 | 60.660000 | 86.370003 | 96.459999 | 40.970001 | 150.550003 | 42.849998 | 54.779999 | 138.500000 | 90.190002 |
5 rows × 163 columns
Align Factor with Price Data
[13]:
# Align factor to open price
factor.index += pd.Timedelta('9h30m')
factor = factor.stack()
factor.index = factor.index.set_names(['date', 'asset'])
[14]:
factor.unstack().head()
[14]:
| asset | A | AAL | AAP | AAPL | ABBV | ABC | ABT | ACN | ADBE | ADI | ... | NUE | PCG | PKG | PPG | SCHW | STZ | T | VZ | XEC | XOM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| date | | | | | | | | | | | | | | | | | | | | | |
| 2017-01-04 09:30:00+00:00 | 0.009464 | 0.007127 | -0.001348 | -0.002583 | 0.003685 | -0.000121 | 0.000256 | 0.003864 | 0.002513 | 0.003586 | ... | 0.002516 | 0.003976 | 0.005176 | 0.004829 | 0.004975 | 0.015509 | -0.001860 | -0.000550 | -0.002234 | 0.002531 |
| 2017-01-05 09:30:00+00:00 | -0.001062 | -0.003854 | -0.006570 | -0.000862 | 0.001422 | -0.003307 | -0.003049 | 0.002056 | -0.000096 | 0.000691 | ... | -0.002122 | 0.001155 | 0.000000 | -0.008327 | -0.006065 | -0.047092 | 0.001870 | 0.004769 | 0.000000 | 0.003337 |
| 2017-01-06 09:30:00+00:00 | 0.001934 | -0.000872 | -0.003258 | 0.001458 | 0.001725 | -0.001793 | 0.000000 | 0.000000 | 0.000661 | 0.003646 | ... | -0.000328 | -0.003304 | 0.000469 | 0.001569 | 0.008055 | 0.001635 | -0.015709 | -0.017753 | 0.001784 | 0.002710 |
| 2017-01-09 09:30:00+00:00 | 0.000417 | -0.004328 | 0.002535 | 0.000339 | 0.000157 | -0.002359 | 0.000245 | -0.001376 | -0.003139 | 0.000559 | ... | 0.009786 | -0.000327 | -0.003440 | -0.005649 | -0.005578 | 0.003747 | -0.000726 | -0.000751 | -0.011285 | -0.003164 |
| 2017-01-10 09:30:00+00:00 | 0.004155 | -0.001699 | -0.004896 | -0.001849 | -0.002492 | -0.004446 | 0.001718 | -0.000522 | 0.000000 | -0.000973 | ... | 0.012743 | 0.000498 | -0.000798 | -0.002906 | 0.001702 | -0.001797 | -0.003431 | 0.000190 | 0.004664 | 0.001494 |
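5 rows × 163 columns
Run Alphalens
Period 1 shows the return from the market open to the market close (the 6h30m column in the output), while period 2 shows the return from today's open to tomorrow's open (the 1D column).
Get Alphalens Input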
[15]:
non_predictive_factor_data = alphalens.utils.get_clean_factor_and_forward_returns(factor,
pricing,
periods=(1,2),
groupby=ticker_sector,
groupby_labels=sector_names)
Dropped 2.9% entries from factor data: 1.0% in forward returns computation and 2.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
Returns Tear Sheet
[16]:
alphalens.tears.create_returns_tear_sheet(non_predictive_factor_data)
Returns Analysis
| | 6h30m | 1D |
|---|---|---|
| Ann. alpha | 0.324 | -0.046 |
| beta | 0.177 | 0.179 |
| Mean Period Wise Return Top Quantile (bps) | -7.776 | -2.096 |
| Mean Period Wise Return Bottom Quantile (bps) | -0.445 | 0.697 |
| Mean Period Wise Spread (bps) | -7.331 | -2.795 |
<Figure size 432x288 with 0 Axes>
[17]:
alphalens.tears.create_event_returns_tear_sheet(non_predictive_factor_data, pricing);
<Figure size 432x288 with 0 Axes>