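Intraday Factor

In this notebook we use Alphalens to analyse the performance of an intraday factor. The factor is computed daily, but the stocks are bought at market open and sold at market close, with no overnight positions held.

Imports & Settings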

[2]:
import warnings
warnings.filterwarnings('ignore')
[3]:
import alphalens
import pandas as pd
[4]:
%matplotlib inline

Load Data

Below is a simple mapping from tickers to sectors for a small universe of large-cap stocks.

[5]:
sector_names = {
    0 : "information_technology",
    1 : "financials",
    2 : "health_care",
    3 : "industrials",
    4 : "utilities",
    5 : "real_estate",
    6 : "materials",
    7 : "telecommunication_services",
    8 : "consumer_staples",
    9 : "consumer_discretionary",
    10 : "energy"
}

ticker_sector = {
    "ACN" : 0, "ATVI" : 0, "ADBE" : 0, "AMD" : 0, "AKAM" : 0, "ADS" : 0, "GOOGL" : 0, "GOOG" : 0,
    "APH" : 0, "ADI" : 0, "ANSS" : 0, "AAPL" : 0, "AMAT" : 0, "ADSK" : 0, "ADP" : 0, "AVGO" : 0,
    "AMG" : 1, "AFL" : 1, "ALL" : 1, "AXP" : 1, "AIG" : 1, "AMP" : 1, "AON" : 1, "AJG" : 1, "AIZ" : 1, "BAC" : 1,
    "BK" : 1, "BBT" : 1, "BRK.B" : 1, "BLK" : 1, "HRB" : 1, "BHF" : 1, "COF" : 1, "CBOE" : 1, "SCHW" : 1, "CB" : 1,
    "ABT" : 2, "ABBV" : 2, "AET" : 2, "A" : 2, "ALXN" : 2, "ALGN" : 2, "AGN" : 2, "ABC" : 2, "AMGN" : 2, "ANTM" : 2,
    "BCR" : 2, "BAX" : 2, "BDX" : 2, "BIIB" : 2, "BSX" : 2, "BMY" : 2, "CAH" : 2, "CELG" : 2, "CNC" : 2, "CERN" : 2,
    "MMM" : 3, "AYI" : 3, "ALK" : 3, "ALLE" : 3, "AAL" : 3, "AME" : 3, "AOS" : 3, "ARNC" : 3, "BA" : 3, "CHRW" : 3,
    "CAT" : 3, "CTAS" : 3, "CSX" : 3, "CMI" : 3, "DE" : 3, "DAL" : 3, "DOV" : 3, "ETN" : 3, "EMR" : 3, "EFX" : 3,
    "AES" : 4, "LNT" : 4, "AEE" : 4, "AEP" : 4, "AWK" : 4, "CNP" : 4, "CMS" : 4, "ED" : 4, "D" : 4, "DTE" : 4,
    "DUK" : 4, "EIX" : 4, "ETR" : 4, "ES" : 4, "EXC" : 4, "FE" : 4, "NEE" : 4, "NI" : 4, "NRG" : 4, "PCG" : 4,
    "ARE" : 5, "AMT" : 5, "AIV" : 5, "AVB" : 5, "BXP" : 5, "CBG" : 5, "CCI" : 5, "DLR" : 5, "DRE" : 5,
    "EQIX" : 5, "EQR" : 5, "ESS" : 5, "EXR" : 5, "FRT" : 5, "GGP" : 5, "HCP" : 5, "HST" : 5, "IRM" : 5, "KIM" : 5,
    "APD" : 6, "ALB" : 6, "AVY" : 6, "BLL" : 6, "CF" : 6, "DWDP" : 6, "EMN" : 6, "ECL" : 6, "FMC" : 6, "FCX" : 6,
    "IP" : 6, "IFF" : 6, "LYB" : 6, "MLM" : 6, "MON" : 6, "MOS" : 6, "NEM" : 6, "NUE" : 6, "PKG" : 6, "PPG" : 6,
    "T" : 7, "CTL" : 7, "VZ" : 7,
    "MO" : 8, "ADM" : 8, "BF.B" : 8, "CPB" : 8, "CHD" : 8, "CLX" : 8, "KO" : 8, "CL" : 8, "CAG" : 8,
    "STZ" : 8, "COST" : 8, "COTY" : 8, "CVS" : 8, "DPS" : 8, "EL" : 8, "GIS" : 8, "HSY" : 8, "HRL" : 8,
    "AAP" : 9, "AMZN" : 9, "APTV" : 9, "AZO" : 9, "BBY" : 9, "BWA" : 9, "KMX" : 9, "CCL" : 9,
    "APC" : 10, "ANDV" : 10, "APA" : 10, "BHGE" : 10, "COG" : 10, "CHK" : 10, "CVX" : 10, "XEC" : 10, "CXO" : 10,
    "COP" : 10, "DVN" : 10, "EOG" : 10, "EQT" : 10, "XOM" : 10, "HAL" : 10, "HP" : 10, "HES" : 10, "KMI" : 10
}

YFinance Download

[6]:
import yfinance as yf
import pandas_datareader.data as web
yf.pdr_override()

tickers = list(ticker_sector.keys())
df = web.get_data_yahoo(tickers, start='2017-01-01',  end='2017-06-01')
df.index = pd.to_datetime(df.index, utc=True)
[*********************100%***********************]  182 of 182 completed

19 Failed downloads:
- CHK: Data doesn't exist for startDate = 1483250400, endDate = 1496293200
- BF.B: No data found for this date range, symbol may be delisted
- BRK.B: No data found, symbol may be delisted
- MON: Data doesn't exist for startDate = 1483250400, endDate = 1496293200
- CELG: No data found, symbol may be delisted
- APC: No data found, symbol may be delisted
- ARNC: Data doesn't exist for startDate = 1483250400, endDate = 1496293200
- CBG: No data found for this date range, symbol may be delisted
- CTL: No data found, symbol may be delisted
- BCR: No data found for this date range, symbol may be delisted
- DWDP: No data found, symbol may be delisted
- BHF: Data doesn't exist for startDate = 1483250400, endDate = 1496293200
- GGP: No data found for this date range, symbol may be delisted
- CXO: No data found, symbol may be delisted
- HCP: No data found, symbol may be delisted
- DPS: No data found for this date range, symbol may be delisted
- AGN: No data found, symbol may be delisted
- BBT: No data found, symbol may be delisted
- BHGE: No data found, symbol may be delisted
[7]:
df = df.stack()
df.index.names = ['date', 'asset']
df.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 16789 entries, (Timestamp('2017-01-03 00:00:00+0000', tz='UTC'), 'A') to (Timestamp('2017-05-31 00:00:00+0000', tz='UTC'), 'XOM')
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Adj Close  16789 non-null  float64
 1   Close      16789 non-null  float64
 2   High       16789 non-null  float64
 3   Low        16789 non-null  float64
 4   Open       16789 non-null  float64
 5   Volume     16789 non-null  float64
dtypes: float64(6)
memory usage: 842.6+ KB

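Factor Computation

Our example factor ranks stocks by their overnight price gap (yesterday's close to today's open). We will see whether the factor carries any alpha or is pure noise.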
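As a small worked illustration of the overnight gap, the sketch below computes it on made-up prices. The tickers `AAA` and `BBB` and all price values are hypothetical and chosen only to make the arithmetic easy to follow; the cross-sectional rank is shown for intuition, since Alphalens performs its own binning of factor values.

```python
import pandas as pd

# Hypothetical prices for two made-up tickers over three sessions.
dates = pd.to_datetime(["2017-01-03", "2017-01-04", "2017-01-05"])
open_px = pd.DataFrame({"AAA": [100.0, 102.0, 99.0],
                        "BBB": [50.0, 49.0, 50.5]}, index=dates)
close_px = pd.DataFrame({"AAA": [101.0, 100.0, 98.0],
                         "BBB": [50.0, 50.0, 51.0]}, index=dates)

# Overnight gap: today's open relative to yesterday's close.
prev_close = close_px.shift(1)
gap = (open_px - prev_close) / prev_close

# Rank the assets within each day (1 = smallest gap).
ranks = gap.rank(axis=1)

print(gap)
print(ranks)
```

The first row is NaN because there is no previous close for the first session, which is why the factor in this notebook starts one day after the pricing data.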

[8]:
available_tickers = df.index.unique('asset')
ticker_sector = {k: v for k, v in ticker_sector.items() if k in available_tickers}
[9]:
today_open = df.loc[:, 'Open'].unstack('asset')
today_close = df.loc[:, 'Close'].unstack('asset')
yesterday_close = today_close.shift(1)
[10]:
factor = (today_open - yesterday_close) / yesterday_close

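The pricing data passed to Alphalens must contain the entry price for the assets, so it has to reflect the next available price after a factor value was observed at a given timestamp. Those prices must not be used in the calculation of the factor values for that timestamp. Always double-check that your research does not introduce look-ahead bias.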

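The pricing data must also contain the exit price for the assets: period 1 uses the price at the next timestamp, period 2 uses the price two timestamps later, and so on.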

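There are no restrictions or assumptions on the time frequency at which a factor is computed, nor on when the factor is traded (at the open, at the close, or intraday). The only requirement is that the factor and price DataFrames are properly aligned according to the rules above.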

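In our example we want to buy the stocks at the open, so we need open prices at exactly the same timestamps as the factor values. We also plan to sell at the close, so we add the close prices too; these are used to compute the period 1 forward returns because they appear immediately after the factor timestamps. The returns computed by Alphalens are therefore based on the price change from open to close.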
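To make the open-to-close logic concrete, the following sketch interleaves open and close prices and computes the one-row-ahead return by hand. It is only an approximation of what a period 1 forward return means here, not Alphalens's own implementation; the ticker `AAA` and its prices are hypothetical.

```python
import pandas as pd

# Hypothetical open and close prices for one made-up ticker.
days = pd.to_datetime(["2017-01-03", "2017-01-04"], utc=True)
open_px = pd.DataFrame({"AAA": [100.0, 102.0]},
                       index=days + pd.Timedelta("9h30m"))
close_px = pd.DataFrame({"AAA": [101.0, 100.0]},
                        index=days + pd.Timedelta("16h"))

# Interleave the two frames: open, close, open, close, ...
pricing = pd.concat([open_px, close_px]).sort_index()

# Return from each timestamp to the next row in the pricing frame.
next_row_return = pricing.shift(-1) / pricing - 1

# Factor values sit on the open timestamps, so only those rows matter:
# each one is the return from that day's open to that day's close.
open_to_close = next_row_return.loc[open_px.index]
print(open_to_close)
```

Rows at the close timestamps would measure close to next open instead, but no factor value is stamped at the close, so those returns are never paired with the factor.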

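If we had other prices we could compute returns over other horizons, for example one hour or two hours after the open. We could add those prices right after the open prices and instruct Alphalens to compute returns for periods 1, 2, 3 and so on, rather than only period 1 as in this example.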
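A minimal sketch of that idea, assuming a hypothetical 10:30 snapshot were available (Yahoo daily data does not provide one; the ticker `AAA` and every price below are made up). With three prices per day, one row ahead of the open is the 10:30 price and two rows ahead is the close.

```python
import pandas as pd

days = pd.to_datetime(["2017-01-03", "2017-01-04"], utc=True)
open_px = pd.DataFrame({"AAA": [100.0, 102.0]},
                       index=days + pd.Timedelta("9h30m"))
# Hypothetical price one hour after the open.
hour_px = pd.DataFrame({"AAA": [100.5, 101.0]},
                       index=days + pd.Timedelta("10h30m"))
close_px = pd.DataFrame({"AAA": [101.0, 100.0]},
                        index=days + pd.Timedelta("16h"))

# Rows per day: open, open + 1h, close.
pricing = pd.concat([open_px, hour_px, close_px]).sort_index()

# Hand-computed n-row-ahead returns, kept only at the open timestamps.
forward = {
    n: (pricing.shift(-n) / pricing - 1).loc[open_px.index, "AAA"]
    for n in (1, 2)
}
print(forward[1])  # open -> one hour after the open
print(forward[2])  # open -> close
```

With a pricing frame laid out like this, passing `periods=(1, 2)` to `get_clean_factor_and_forward_returns` would be expected to yield one-hour and open-to-close returns respectively.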

Data Formatting

Time Adjustment

[11]:
# Fix time as Yahoo doesn't set it
today_open.index += pd.Timedelta('9h30m')
today_close.index += pd.Timedelta('16h')
# pricing will contain both open and close
pricing = pd.concat([today_open, today_close]).sort_index()
[12]:
pricing.head()
[12]:
asset A AAL AAP AAPL ABBV ABC ABT ACN ADBE ADI ... NUE PCG PKG PPG SCHW STZ T VZ XEC XOM
date
2017-01-03 09:30:00+00:00 45.930000 47.279999 170.779999 28.950001 62.919998 78.510002 38.630001 117.379997 103.430000 72.599998 ... 59.740002 60.810001 85.160004 95.430000 40.049999 155.009995 42.689999 53.959999 137.529999 90.940002
2017-01-03 16:00:00+00:00 46.490002 46.299999 170.600006 29.037500 62.410000 82.610001 39.049999 116.459999 103.480003 72.510002 ... 59.610001 60.369999 85.000000 95.250000 40.200001 154.750000 43.020000 54.580002 138.789993 90.889999
2017-01-04 09:30:00+00:00 46.930000 46.630001 170.369995 28.962500 62.639999 82.599998 39.060001 116.910004 103.739998 72.769997 ... 59.759998 60.610001 85.440002 95.709999 40.400002 157.149994 42.939999 54.549999 138.479996 91.120003
2017-01-04 16:00:00+00:00 47.099998 46.700001 172.000000 29.004999 63.290001 84.660004 39.360001 116.739998 104.139999 72.360001 ... 61.250000 60.590000 86.370003 97.269997 41.220001 157.990005 42.770000 54.520000 138.500000 89.889999
2017-01-05 09:30:00+00:00 47.049999 46.520000 170.869995 28.980000 63.380001 84.379997 39.240002 116.980003 104.129997 72.410004 ... 61.119999 60.660000 86.370003 96.459999 40.970001 150.550003 42.849998 54.779999 138.500000 90.190002

5 rows × 163 columns

Align the Factor with the Price Data

[13]:
# Align factor to open price
factor.index += pd.Timedelta('9h30m')
factor = factor.stack()
factor.index = factor.index.set_names(['date', 'asset'])
[14]:
factor.unstack().head()
[14]:
asset A AAL AAP AAPL ABBV ABC ABT ACN ADBE ADI ... NUE PCG PKG PPG SCHW STZ T VZ XEC XOM
date
2017-01-04 09:30:00+00:00 0.009464 0.007127 -0.001348 -0.002583 0.003685 -0.000121 0.000256 0.003864 0.002513 0.003586 ... 0.002516 0.003976 0.005176 0.004829 0.004975 0.015509 -0.001860 -0.000550 -0.002234 0.002531
2017-01-05 09:30:00+00:00 -0.001062 -0.003854 -0.006570 -0.000862 0.001422 -0.003307 -0.003049 0.002056 -0.000096 0.000691 ... -0.002122 0.001155 0.000000 -0.008327 -0.006065 -0.047092 0.001870 0.004769 0.000000 0.003337
2017-01-06 09:30:00+00:00 0.001934 -0.000872 -0.003258 0.001458 0.001725 -0.001793 0.000000 0.000000 0.000661 0.003646 ... -0.000328 -0.003304 0.000469 0.001569 0.008055 0.001635 -0.015709 -0.017753 0.001784 0.002710
2017-01-09 09:30:00+00:00 0.000417 -0.004328 0.002535 0.000339 0.000157 -0.002359 0.000245 -0.001376 -0.003139 0.000559 ... 0.009786 -0.000327 -0.003440 -0.005649 -0.005578 0.003747 -0.000726 -0.000751 -0.011285 -0.003164
2017-01-10 09:30:00+00:00 0.004155 -0.001699 -0.004896 -0.001849 -0.002492 -0.004446 0.001718 -0.000522 0.000000 -0.000973 ... 0.012743 0.000498 -0.000798 -0.002906 0.001702 -0.001797 -0.003431 0.000190 0.004664 0.001494

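5 rows × 163 columns

Run Alphalens

Period 1 shows the return from market open to market close, while period 2 shows the return from today's open to tomorrow's open.

Get the Alphalens Input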
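The elapsed time behind each period can be checked directly from the timestamps. The sketch below assumes the same 09:30 and 16:00 stamps used in this notebook and two consecutive trading days; Alphalens appears to label forward-return columns by this elapsed time rather than by the period count.

```python
import pandas as pd

# Two consecutive trading days, stamped at the open and at the close.
days = pd.to_datetime(["2017-01-03", "2017-01-04"], utc=True)
opens = days + pd.Timedelta("9h30m")
closes = days + pd.Timedelta("16h")
stamps = opens.append(closes).sort_values()

# Starting from the first open: one row ahead is the same day's close,
# two rows ahead is the next day's open.
period_1_span = stamps[1] - stamps[0]
period_2_span = stamps[2] - stamps[0]

print(period_1_span)
print(period_2_span)
```

Over a weekend or holiday the two-row span is longer than one day, so the one-day figure applies to consecutive sessions only.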

[15]:
non_predictive_factor_data = alphalens.utils.get_clean_factor_and_forward_returns(factor,
                                                                                  pricing,
                                                                                  periods=(1,2),
                                                                                  groupby=ticker_sector,
                                                                                  groupby_labels=sector_names)
Dropped 2.9% entries from factor data: 1.0% in forward returns computation and 2.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!

Returns Tear Sheet

[16]:
alphalens.tears.create_returns_tear_sheet(non_predictive_factor_data)
Returns Analysis
                                                6h30m      1D
Ann. alpha                                      0.324  -0.046
beta                                            0.177   0.179
Mean Period Wise Return Top Quantile (bps)     -7.776  -2.096
Mean Period Wise Return Bottom Quantile (bps)  -0.445   0.697
Mean Period Wise Spread (bps)                  -7.331  -2.795
<Figure size 432x288 with 0 Axes>
../_images/notebooks_intraday_factor_30_3.png
[17]:
alphalens.tears.create_event_returns_tear_sheet(non_predictive_factor_data, pricing);
<Figure size 432x288 with 0 Axes>
../_images/notebooks_intraday_factor_31_1.png