Estimating the effect of a Member Rewards program
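An example of how DoWhy can be used to estimate the effect of a subscription or rewards program on customers.
Suppose that a website has a membership rewards program in which customers receive additional benefits if they sign up. How do we know whether the program is effective? The relevant causal question is:
> What is the impact of offering the membership rewards program on total sales?
The equivalent counterfactual question is:
> If the current members had not signed up for the program, how much less would they have spent on the website?
In formal language, we are interested in the Average Treatment Effect on the Treated (ATT).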
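For reference, the ATT can be written in potential-outcomes notation. The symbols are notational assumptions introduced here, not taken from the notebook: $T$ indicates signup, $Y(1)$ is a customer's spend if they sign up, and $Y(0)$ is their spend if they do not.

$$
\mathrm{ATT} = \mathbb{E}\left[\, Y(1) - Y(0) \mid T = 1 \,\right]
$$

The conditioning on $T = 1$ is what restricts the average to current members, matching the counterfactual question above.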
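I. Formulating the causal model
Assume that the rewards program was introduced in January 2019. The outcome variable is the total spend at the end of the year. We have data on all monthly transactions of every user, and on the signup time of those who chose to sign up for the rewards program. Here is what the data looks like.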
[1]:
# Creating some simulated data for our example
import pandas as pd
import numpy as np
num_users = 10000
num_months = 12
signup_months = np.random.choice(np.arange(1, num_months), num_users) * np.random.randint(0,2, size=num_users) # signup_months == 0 means customer did not sign up
df = pd.DataFrame({
    'user_id': np.repeat(np.arange(num_users), num_months),
    'signup_month': np.repeat(signup_months, num_months),  # signup_month == 0 means customer did not sign up
    'month': np.tile(np.arange(1, num_months+1), num_users),  # months are from 1 to 12
    'spend': np.random.poisson(500, num_users*num_months)  # centered at 500
})
# A customer is in the treatment group if and only if they signed up
df["treatment"] = df["signup_month"]>0
# Simulating an effect of month (monotonically decreasing--customers buy less later in the year)
df["spend"] = df["spend"] - df["month"]*10
# Simulating a simple treatment effect of 100
after_signup = (df["signup_month"] < df["month"]) & (df["treatment"])
df.loc[after_signup,"spend"] = df[after_signup]["spend"] + 100
df
[1]:
| | user_id | signup_month | month | spend | treatment |
|---|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 449 | True |
| 1 | 0 | 1 | 2 | 583 | True |
| 2 | 0 | 1 | 3 | 519 | True |
| 3 | 0 | 1 | 4 | 581 | True |
| 4 | 0 | 1 | 5 | 549 | True |
| ... | ... | ... | ... | ... | ... |
| 119995 | 9999 | 0 | 8 | 410 | False |
| 119996 | 9999 | 0 | 9 | 409 | False |
| 119997 | 9999 | 0 | 10 | 398 | False |
| 119998 | 9999 | 0 | 11 | 401 | False |
| 119999 | 9999 | 0 | 12 | 394 | False |
120000 rows × 5 columns
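The importance of time
Time plays a crucial role in modeling this problem.
Rewards signup can affect future transactions, but not the transactions that happened before it. In fact, the transactions prior to the rewards signup can be assumed to cause the signup decision. Therefore, we split the variables for each user into:
- Activity prior to the treatment (assumed to be a cause of the treatment)
- Activity after the treatment (the outcome of applying the treatment)
Of course, many important variables that affect both signup and total spend are missing (e.g., the type of products bought, the length of a user's account, geography, etc.). Treating them as negligible is a critical assumption in the analysis, one that needs to be tested later using a refutation test.
Below is the causal graph for a user who signed up in month i=3. The analysis will be similar for any i.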
[2]:
import dowhy
# Setting the signup month (for ease of analysis)
i = 3
[3]:
causal_graph = """digraph {
treatment[label="Program Signup in month i"];
pre_spends;
post_spends;
Z->treatment;
pre_spends -> treatment;
treatment->post_spends;
signup_month->post_spends;
signup_month->treatment;
}"""
# Post-process the data based on the graph and the month of the treatment (signup)
# For each customer, determine their average monthly spend before and after month i
df_i_signupmonth = (
    df[df.signup_month.isin([0, i])]
    .groupby(["user_id", "signup_month", "treatment"])
    .apply(
        lambda x: pd.Series(
            {
                "pre_spends": x.loc[x.month < i, "spend"].mean(),
                "post_spends": x.loc[x.month > i, "spend"].mean(),
            }
        )
    )
    .reset_index()
)
print(df_i_signupmonth)
model = dowhy.CausalModel(data=df_i_signupmonth,
                          graph=causal_graph.replace("\n", " "),
                          treatment="treatment",
                          outcome="post_spends")
model.view_model()
from IPython.display import Image, display
display(Image(filename="causal_model.png"))
user_id signup_month treatment pre_spends post_spends
0 2 0 False 479.0 390.888889
1 4 3 True 487.0 522.444444
2 6 0 False 482.5 422.888889
3 8 0 False 473.0 418.444444
4 10 0 False 489.0 424.333333
... ... ... ... ... ...
5326 9987 0 False 480.0 414.000000
5327 9990 0 False 495.5 421.777778
5328 9992 0 False 473.0 405.666667
5329 9996 0 False 490.0 415.555556
5330 9999 0 False 482.0 420.111111
[5331 rows x 5 columns]
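More generally, we can include any activity data for the customer in the above graph. All prior-activity and post-activity data will occupy the same place as the corresponding amount-spent node (pre_spends and post_spends respectively) and will have the same edges.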
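As a rough illustration of this generalization, the graph string could be assembled from lists of pre- and post-treatment variables. The helper `build_graph` and the variables `pre_visits` and `post_visits` are hypothetical names introduced only for this sketch; they do not exist in the notebook's data.

```python
def build_graph(pre_vars, post_vars):
    """Build a DOT graph in which every pre-treatment variable points into
    the treatment, and every post-treatment variable receives edges from
    the treatment and from signup_month (the same edges as the spend nodes)."""
    edges = ["Z->treatment;", "signup_month->treatment;"]
    edges += [f"{v}->treatment;" for v in pre_vars]
    for v in post_vars:
        edges += [f"treatment->{v};", f"signup_month->{v};"]
    return "digraph { " + " ".join(edges) + " }"


# pre_visits / post_visits are hypothetical extra activity variables
graph = build_graph(["pre_spends", "pre_visits"], ["post_spends", "post_visits"])
print(graph)
```

A string built this way has the same shape as `causal_graph` in the notebook, so it could in principle be passed as the `graph` argument, with one of the post-treatment variables chosen as the outcome.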
II. Identifying the causal effect
For the sake of this example, let us assume that unobserved confounding does not play a big part.
[4]:
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
d
────────────(Expectation(post_spends|signup_month))
d[treatment]
Estimand assumption 1, Unconfoundedness: If U→{treatment} and U→post_spends then P(post_spends|treatment,signup_month,U) = P(post_spends|treatment,signup_month)
### Estimand : 2
Estimand name: iv
Estimand expression:
Expectation(Derivative(post_spends, [Z, pre_spends])*Derivative([treatment], [
Z, pre_spends])**(-1))
Estimand assumption 1, As-if-random: If U→→post_spends then ¬(U →→{Z,pre_spends})
Estimand assumption 2, Exclusion: If we remove {Z,pre_spends}→{treatment}, then ¬({Z,pre_spends}→post_spends)
### Estimand : 3
Estimand name: frontdoor
No such variable(s) found!
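Based on the graph, DoWhy determines that the backdoor estimand requires conditioning on signup_month. Z and pre_spends, which point only into the treatment in this graph, are reported as instrumental variables rather than as part of the backdoor adjustment set.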
III. Estimating the effect
We now estimate the effect based on the backdoor estimand, setting the target units to "att".
[5]:
estimate = model.estimate_effect(identified_estimand,
                                 method_name="backdoor.propensity_score_matching",
                                 target_units="att")
print(estimate)
/home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:993: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
y = column_or_1d(y, warn=True)
*** Causal Estimate ***
## Identified estimand
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
d
────────────(Expectation(post_spends|signup_month))
d[treatment]
Estimand assumption 1, Unconfoundedness: If U→{treatment} and U→post_spends then P(post_spends|treatment,signup_month,U) = P(post_spends|treatment,signup_month)
## Realized estimand
b: post_spends~treatment+signup_month
Target units: att
## Estimate
Mean value: 97.57746107152285
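To make the idea of matching for the ATT concrete, here is a minimal sketch on toy data. It is not DoWhy's implementation: it assumes a single hypothetical confounder `x` and matches each treated unit to the control unit with the closest `x`, rather than matching on an estimated propensity score. The data-generating process and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)                      # hypothetical confounder
propensity = 1 / (1 + np.exp(-x))           # higher x -> more likely treated
t = rng.random(n) < propensity              # boolean treatment indicator
y = 5.0 * x + 100.0 * t + rng.normal(size=n)  # true effect is 100


def att_nearest_neighbor(x, t, y):
    """For each treated unit, find the control unit with the closest x
    (with replacement) and average the outcome differences."""
    x_t, y_t = x[t], y[t]
    x_c, y_c = x[~t], y[~t]
    nearest = np.abs(x_t[:, None] - x_c[None, :]).argmin(axis=1)
    return float(np.mean(y_t - y_c[nearest]))


naive = float(y[t].mean() - y[~t].mean())   # biased: treated units have higher x
att = att_nearest_neighbor(x, t, y)
print(f"naive difference: {naive:.2f}, matched ATT: {att:.2f}")
```

Because treated units in this toy setup tend to have larger `x`, the naive difference overstates the effect, while the matched estimate should land close to the simulated effect of 100.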
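The analysis tells us the Average Treatment Effect on the Treated (ATT). That is, the average effect on spend (measured here as the mean monthly spend after month i) for the customers who signed up for the rewards program in month i=3, compared to the case where they had not signed up. We can similarly calculate the effects for customers who signed up in any other month by changing the value of i (set in cell [2] above) and then rerunning the analysis.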
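The idea of repeating the analysis for other signup months can be sketched without DoWhy. The block below regenerates data with the same process as cell [1] (with a fixed seed, which is an addition) and uses a plain difference in means as a stand-in for propensity score matching. That substitution is reasonable only because signup is assigned at random in the simulation; the helper `diff_in_means` is a hypothetical name.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
num_users, num_months = 10000, 12

# Same data-generating process as cell [1], with a fixed seed
signup_months = rng.choice(np.arange(1, num_months), num_users) * rng.integers(0, 2, size=num_users)
df = pd.DataFrame({
    "user_id": np.repeat(np.arange(num_users), num_months),
    "signup_month": np.repeat(signup_months, num_months),
    "month": np.tile(np.arange(1, num_months + 1), num_users),
    "spend": rng.poisson(500, num_users * num_months),
})
df["spend"] = df["spend"] - df["month"] * 10
after_signup = (df["signup_month"] < df["month"]) & (df["signup_month"] > 0)
df.loc[after_signup, "spend"] += 100


def diff_in_means(df, i):
    """Mean post-month-i spend of month-i signups minus that of never-signers."""
    sub = df[df["signup_month"].isin([0, i]) & (df["month"] > i)]
    per_user = sub.groupby(["user_id", "signup_month"])["spend"].mean().reset_index()
    treated = per_user.loc[per_user["signup_month"] == i, "spend"].mean()
    control = per_user.loc[per_user["signup_month"] == 0, "spend"].mean()
    return treated - control


# Months 2..11: month 1 has no pre-period and no one signs up in month 12
effects = {i: diff_in_means(df, i) for i in range(2, num_months)}
for i, effect in effects.items():
    print(f"signup month {i}: estimated effect {effect:.1f}")
```

With real data, where signup is not random, the difference in means would have to be replaced by an adjusted estimator such as the matching estimator used in cell [5].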
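Note that the estimation suffers from left-censoring and right-censoring.
1. Left-censoring: If a customer signs up in the first month, we do not have enough transaction history to match them to similar customers who did not sign up (and thus apply the backdoor identified estimand).
2. Right-censoring: If a customer signs up in the last month, we do not have enough future (post-treatment) transactions to estimate the outcome after signup.
Thus, even if the effect of signup were the same across all months, the estimated effects may differ by month of signup because of the lack of data (and thus high variance in the estimated pre-treatment or post-treatment transaction activity).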
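The censoring argument can be made concrete by counting how many months feed each average. The sketch below assumes the same split as the notebook code (`month < i` for the pre-period, `month > i` for the post-period) and the simulated Poisson spend with mean 500; the helper names are hypothetical.

```python
import math


def months_available(i, num_months=12):
    """Return (pre, post) month counts for signup month i,
    using month < i as pre-period and month > i as post-period."""
    return i - 1, num_months - i


def se_of_mean(n_months, lam=500):
    """Standard error of a per-user mean over n_months of Poisson(lam) spend.
    Returns infinity when there are no months to average."""
    return math.sqrt(lam / n_months) if n_months > 0 else math.inf


for i in (1, 3, 11, 12):
    pre, post = months_available(i)
    print(f"i={i}: pre months={pre} (se {se_of_mean(pre):.1f}), "
          f"post months={post} (se {se_of_mean(post):.1f})")
```

For i=3 this gives 2 pre-treatment months and 9 post-treatment months, which is consistent with the half-integer pre_spends values in the printed DataFrame. At i=1 the pre-period is empty, and at i=12 the post-period is empty.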
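IV. Refuting the estimate
We refute the estimate using the placebo treatment refuter. This refuter substitutes the treatment with an independent random variable and checks whether our estimate now goes to zero (it should!).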
[6]:
refutation = model.refute_estimate(identified_estimand, estimate,
                                   method_name="placebo_treatment_refuter",
                                   placebo_type="permute", num_simulations=20)
print(refutation)
/home/amit/py-envs/env3.8/lib/python3.8/site-packages/sklearn/utils/validation.py:993: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
y = column_or_1d(y, warn=True)
Refute: Use a Placebo Treatment
Estimated effect:97.57746107152285
New effect:1.251821060965955
p value:0.430226053357455
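The logic of a permutation placebo test can be sketched by hand. This is not DoWhy's implementation: it uses toy data and a simple difference-in-means estimator (both assumptions made for illustration), shuffles the treatment labels so they become independent of the outcome, and re-estimates the effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
treatment = rng.random(n) < 0.1                       # toy treatment indicator
outcome = 420 + 100 * treatment + rng.normal(0, 8, size=n)  # true effect is 100


def diff_in_means(t, y):
    """Difference between mean outcome of treated and untreated units."""
    return float(y[t].mean() - y[~t].mean())


original = diff_in_means(treatment, outcome)

# Placebo: permuting the labels breaks any link between treatment and outcome
placebo_effects = np.array([
    diff_in_means(rng.permutation(treatment), outcome) for _ in range(20)
])

print(f"original estimate: {original:.2f}")
print(f"mean placebo estimate: {placebo_effects.mean():.2f}")
```

If the original estimate reflects a real effect of the treatment, the placebo estimates should scatter around zero, which mirrors the small "New effect" in the refuter output above.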