First, the data formats Alphalens expects:
factor: a Series with a MultiIndex (use the stack() method to convert)
prices: DataFrame
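To make the two shapes concrete, here is a minimal sketch with made-up numbers. The name `wide_factor` and its values are hypothetical; it stands in for a wide factor table such as `alpha_mom`, with dates as rows and assets as columns.

```python
import pandas as pd

# Hypothetical wide factor table: one row per date, one column per asset.
dates = pd.to_datetime(["2017-11-17 15:00:00", "2017-11-20 15:00:00"])
wide_factor = pd.DataFrame(
    {"600000.XSHG": [1.01, 0.99], "600016.XSHG": [0.97, 1.03]},
    index=dates,
)
wide_factor.index.name = "datetime"

# stack() moves the column labels into a second index level, giving a
# Series indexed by (datetime, asset): the long layout used for `factor`.
factor = wide_factor.stack()

print(factor.index.nlevels)  # number of index levels after stacking
print(factor)
```

The prices table stays in the wide layout, so only the factor needs this conversion.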
- # Convert to a MultiIndex
- factor = alpha_mom.stack()
- print(factor.tail())
- datetime
- 2017-11-20 15:00:00 601857.XSHG 1.022616
- 601881.XSHG 0.744411
- 601901.XSHG 0.893478
- 601985.XSHG 0.993412
- 601988.XSHG 0.971698
- dtype: float64
- # DataFrame of prices for the stock universe
- prices = PN.minor_xs('close')
- print(prices.tail())
- 600000.XSHG 600016.XSHG 600028.XSHG 600029.XSHG \
- datetime
- 2017-11-14 15:00:00 118.12 125.93 12.06 16.00
- 2017-11-15 15:00:00 118.12 124.74 11.82 16.04
- 2017-11-16 15:00:00 116.16 123.54 11.76 16.29
- 2017-11-17 15:00:00 119.81 127.42 11.92 16.97
- 2017-11-20 15:00:00 120.47 128.17 11.92 17.05
-
- 600030.XSHG 600036.XSHG 600048.XSHG 600050.XSHG \
- datetime
- 2017-11-14 15:00:00 69.27 111.81 199.75 9.49
- 2017-11-15 15:00:00 69.04 111.25 204.52 9.68
- 2017-11-16 15:00:00 68.05 112.13 218.27 9.61
- 2017-11-17 15:00:00 69.88 117.24 224.00 9.63
- 2017-11-20 15:00:00 67.71 121.82 224.19 9.80
-
- 600100.XSHG 600104.XSHG ... 601766.XSHG \
- datetime ...
- 2017-11-14 15:00:00 178.62 204.03 ... 12.10
- 2017-11-15 15:00:00 176.35 202.78 ... 12.07
- 2017-11-16 15:00:00 174.24 200.97 ... 11.77
- 2017-11-17 15:00:00 165.92 207.21 ... 12.11
- 2017-11-20 15:00:00 170.61 206.46 ... 12.14
-
- 601788.XSHG 601800.XSHG 601818.XSHG 601857.XSHG \
- datetime
- 2017-11-14 15:00:00 17.28 17.39 5.13 10.63
- 2017-11-15 15:00:00 17.25 17.34 5.12 10.37
- 2017-11-16 15:00:00 17.04 16.91 5.11 10.28
- 2017-11-17 15:00:00 17.30 17.04 5.21 10.33
- 2017-11-20 15:00:00 17.18 16.79 5.24 10.40
-
- 601881.XSHG 601901.XSHG 601985.XSHG 601988.XSHG \
- datetime
- 2017-11-14 15:00:00 13.15 8.63 7.80 6.08
- 2017-11-15 15:00:00 13.03 8.49 7.79 6.07
- 2017-11-16 15:00:00 12.76 8.28 7.54 6.02
- 2017-11-17 15:00:00 12.30 8.11 7.63 6.14
- 2017-11-20 15:00:00 12.32 8.22 7.54 6.18
-
- 601989.XSHG
- datetime
- 2017-11-14 15:00:00 10.64
- 2017-11-15 15:00:00 10.51
- 2017-11-16 15:00:00 10.49
- 2017-11-17 15:00:00 10.14
- 2017-11-20 15:00:00 10.25
-
- [5 rows x 49 columns]
- # Build the input data in the format Alphalens requires
- import alphalens
- factor_data = alphalens.utils.get_clean_factor_and_forward_returns(factor, prices, quantiles=5)
- print(factor_data.head())
- 1 5 10 factor \
- date asset
- 2017-03-07 15:00:00 600000.XSHG -0.001197 -0.010349 -0.024974 1.008018
- 600016.XSHG -0.005597 -0.015598 -0.034555 0.985728
- 600028.XSHG 0.003578 -0.016100 0.007156 1.021938
- 600029.XSHG -0.003912 0.010172 0.000782 1.097938
- 600030.XSHG -0.006045 -0.006045 -0.013999 1.016659
-
- factor_quantile
- date asset
- 2017-03-07 15:00:00 600000.XSHG 2
- 600016.XSHG 1
- 600028.XSHG 3
- 600029.XSHG 5
- 600030.XSHG 3
- mean_return_by_q, std_err_by_q = alphalens.performance.mean_return_by_quantile(factor_data, by_date=True)
- print(mean_return_by_q.head())
- print(std_err_by_q.head())
- 1 5 10
- factor_quantile date
- 1 2017-03-07 15:00:00 0.006782 0.003821 0.006060
- 2017-03-08 15:00:00 0.002207 0.000536 -0.005845
- 2017-03-09 15:00:00 0.000176 0.001881 0.012697
- 2017-03-10 15:00:00 0.001894 0.004035 0.006478
- 2017-03-13 15:00:00 0.000316 0.009381 0.011278
- 1 5 10
- factor_quantile date
- 1 2017-03-07 15:00:00 0.008181 0.005817 0.011047
- 2017-03-08 15:00:00 0.001643 0.005422 0.012947
- 2017-03-09 15:00:00 0.002841 0.004721 0.012215
- 2017-03-10 15:00:00 0.002748 0.003273 0.013972
- 2017-03-13 15:00:00 0.001233 0.006354 0.011653
- How can the different return curves be visualized?
- 1. Return curves for different holding periods
- 2. Cumulative return curves
- import matplotlib.pyplot as plt
- alphalens.plotting.plot_cumulative_returns_by_quantile(mean_return_by_q, 10)
- plt.show()
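A cumulative return curve compounds per-period returns. The sketch below uses the first five 1-day mean returns of quantile 1 from the output above. The helper `cumulative_returns` is a hypothetical name, not an Alphalens function, and it ignores the overlap handling that multi-day holding periods would need.

```python
def cumulative_returns(period_returns):
    """Compound a sequence of simple returns into a cumulative return curve."""
    curve = []
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r           # grow the portfolio by this period's return
        curve.append(growth - 1.0)  # cumulative return up to this period
    return curve


# 1-day mean returns of quantile 1, 2017-03-07 to 2017-03-13 (from the output above).
q1_daily = [0.006782, 0.002207, 0.000176, 0.001894, 0.000316]
curve = cumulative_returns(q1_daily)
for r, c in zip(q1_daily, curve):
    print(f"{r:+.6f} -> {c:+.6f}")
```

Plotting one such curve per quantile gives the kind of chart that the plotting call above produces.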
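The information coefficient (IC) is a correlation that measures the relationship between a variable's predicted and actual values. It is used to evaluate the forecasting skill of a financial analyst.
The coefficient lies between -1 and 1; the larger it is, the stronger the positive correlation. A common benchmark is mean(IC) > 0.02.
With rank (Spearman) correlation, $\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$, where d is the difference in ranks and n is the number of observations.
The IC therefore represents the correlation between the factor ranking and the return ranking.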
A = [1,3,5,7,9]
B = [3,2,4,5,1]
The ranks of A are 1, 2, 3, 4, 5
The ranks of B are 3, 2, 4, 5, 1
d is the difference between the ranks, so d = (-2, 0, -1, -1, 4)
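The example can be checked by hand. The sketch below assumes the IC is the Spearman rank correlation (which the rank difference d suggests) and uses the no-ties formula 1 - 6·Σd² / (n(n² - 1)). The helpers `ranks` and `spearman` are hypothetical names written for this illustration.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1)), no ties."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d_squared = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


A = [1, 3, 5, 7, 9]
B = [3, 2, 4, 5, 1]
print(ranks(A))        # [1, 2, 3, 4, 5]
print(ranks(B))        # [3, 2, 4, 5, 1]
print(spearman(A, B))  # sum(d^2) = 22, so 1 - 132/120, about -0.1
```

An IC of about -0.1 means the two rankings are weakly negatively related. For real data with ties, a library routine such as `scipy.stats.spearmanr` is the safer choice.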
- # IC example
- ic = alphalens.performance.factor_information_coefficient(factor_data)
- # print (ic)
- alphalens.plotting.plot_ic_hist(ic)
- mean_monthly_ic = alphalens.performance.mean_information_coefficient(factor_data, by_time='M')
- # print(mean_monthly_ic.mean())
- alphalens.plotting.plot_monthly_ic_heatmap(mean_monthly_ic)
- plt.show()
- factor_returns = alphalens.performance.factor_returns(factor_data)
- alphalens.plotting.plot_cumulative_returns(factor_returns[10])
- plt.show()
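As I understand it, `factor_returns` reports the return of a portfolio weighted by factor value, long the above-average assets and short the below-average ones. The sketch below illustrates that idea on the five assets shown for 2017-03-07. The helper `factor_weighted_return` is a hypothetical name, the weighting scheme is an assumption that may differ in detail from Alphalens, and with only five of the 49 assets the number is not the real factor return.

```python
def factor_weighted_return(factor_values, forward_returns):
    """Return of a dollar-neutral portfolio weighted by demeaned factor value."""
    mean = sum(factor_values) / len(factor_values)
    demeaned = [f - mean for f in factor_values]
    gross = sum(abs(x) for x in demeaned)
    weights = [x / gross for x in demeaned]  # sum to 0, absolute values sum to 1
    portfolio_return = sum(w * r for w, r in zip(weights, forward_returns))
    return weights, portfolio_return


# Factor values and 1-day forward returns of the five assets shown for 2017-03-07.
factors = [1.008018, 0.985728, 1.021938, 1.097938, 1.016659]
returns_1d = [-0.001197, -0.005597, 0.003578, -0.003912, -0.006045]
weights, port_ret = factor_weighted_return(factors, returns_1d)
print(weights)
print(port_ret)
```

Compounding such per-date returns over time gives a cumulative factor return curve.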
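Alphalens data preparation
The source data must be provided as two DataFrames:
1. Factor data
2. Price data (industry data can be added for industry neutralization)
Factor data:
An industry column can be added after factor_value.
The first two columns of the factor data, date and asset, form a MultiIndex: the first level is date and the second level is asset.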
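A minimal sketch of such a factor table follows, with made-up values. The column names `factor_value`, `industry` and `neutral_factor` and the industry labels are assumptions for illustration. The last step shows one simple form of industry neutralization: subtracting each industry's mean factor value on each date.

```python
import pandas as pd

# Hypothetical factor table: MultiIndex (date, asset), a factor_value column,
# and an added industry column. All names and numbers are made up.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2017-03-07", "2017-03-08"]),
     ["600000.XSHG", "600016.XSHG", "600028.XSHG", "600029.XSHG"]],
    names=["date", "asset"],
)
factor_df = pd.DataFrame(
    {
        "factor_value": [1.00, 0.98, 1.02, 1.10, 1.01, 0.97, 1.03, 1.09],
        "industry": ["bank", "bank", "energy", "energy"] * 2,
    },
    index=index,
)

# Industry neutralization: subtract the per-date industry mean from each value.
group_mean = factor_df.groupby(["date", "industry"])["factor_value"].transform("mean")
factor_df["neutral_factor"] = factor_df["factor_value"] - group_mean

print(factor_df)
```

The `industry` column, being indexed by (date, asset), is also the kind of grouping that can be passed through the `groupby` argument of `get_clean_factor_and_forward_returns`.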
Price data:
get_clean_factor_and_forward_returns()
- alphalens.utils.get_clean_factor_and_forward_returns(factors,
- prices,
- groupby=None,
- binning_by_group=False,
- quantiles=5,
- bins=None,
- periods=(1, 5, 10),
- filter_zscore=20,
- groupby_labels=None,
- max_loss=0.30,
- zero_aware=False,
- cumulative_returns=True)
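The `periods` and `quantiles` arguments can be pictured with a rough sketch. This is not Alphalens' actual implementation: `forward_return` is a hypothetical helper, the prices and factor values are made up, and the sketch skips the cleaning steps (such as `filter_zscore` and `max_loss`) that the real function applies.

```python
import pandas as pd

# Hypothetical close prices for one asset over six bars.
prices = pd.Series(
    [10.0, 10.5, 10.29, 10.8, 11.0, 10.9],
    index=pd.date_range("2017-03-07", periods=6, freq="D"),
    name="close",
)


def forward_return(prices, period):
    """Simple return from each bar to the bar `period` steps later."""
    return prices.shift(-period) / prices - 1


# One column per holding period, like the 1 / 5 / 10 columns in factor_data.
fwd = pd.DataFrame({p: forward_return(prices, p) for p in (1, 2)})
print(fwd)

# Quantile bucketing on a single date: split factor values into 5 equal-sized bins.
factor_one_day = pd.Series(
    [1.008, 0.986, 1.022, 1.098, 1.017],
    index=["a", "b", "c", "d", "e"],
)
quantile = pd.qcut(factor_one_day, 5, labels=False) + 1  # buckets numbered 1..5
print(quantile)
```

The last `period` rows of each forward-return column are NaN because no later price exists, which is one reason the cleaned data can be shorter than the input.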
Parameter details
Pass the cleaned data to create_full_tear_sheet to obtain all of the analysis charts:
alphalens.tears.create_full_tear_sheet(data)
See also the Alphalens documentation.