# What is Featuretools?
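Featuretools is a framework for performing automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning.

## 5-Minute Quick Start

Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset of timestamped customer transactions.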
```python
import featuretools as ft
```
#### Load mock data
```python
data = ft.demo.load_mock_customer()
```
#### Prepare data
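In this toy dataset, there are 3 DataFrames:

- `customers`: unique customers who had sessions
- `sessions`: unique sessions and their associated attributes
- `transactions`: the list of events in each session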
```python
customers_df = data["customers"]
customers_df
```
| | customer_id | zip_code | join_date | birthday |
|---|---|---|---|---|
| 0 | 1 | 60091 | 2011-04-17 10:48:33 | 1994-07-18 |
| 1 | 2 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |
| 2 | 3 | 13244 | 2011-08-13 15:42:34 | 2003-11-21 |
| 3 | 4 | 60091 | 2011-04-08 20:08:14 | 2006-08-15 |
| 4 | 5 | 60091 | 2010-07-17 05:27:50 | 1984-07-28 |
```python
sessions_df = data["sessions"]
sessions_df.sample(5)
```
| | session_id | customer_id | device | session_start |
|---|---|---|---|---|
| 13 | 14 | 1 | tablet | 2014-01-01 03:28:00 |
| 6 | 7 | 3 | tablet | 2014-01-01 01:39:40 |
| 1 | 2 | 5 | mobile | 2014-01-01 00:17:20 |
| 28 | 29 | 1 | mobile | 2014-01-01 07:10:05 |
| 24 | 25 | 3 | desktop | 2014-01-01 05:59:40 |
```python
transactions_df = data["transactions"]
transactions_df.sample(5)
```
| | transaction_id | session_id | transaction_time | product_id | amount |
|---|---|---|---|---|---|
| 74 | 232 | 5 | 2014-01-01 01:20:10 | 1 | 139.20 |
| 231 | 27 | 17 | 2014-01-01 04:10:15 | 2 | 90.79 |
| 434 | 36 | 31 | 2014-01-01 07:50:10 | 3 | 62.35 |
| 420 | 56 | 30 | 2014-01-01 07:35:00 | 3 | 72.70 |
| 54 | 444 | 4 | 2014-01-01 00:58:30 | 4 | 43.59 |
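First, we specify a dictionary with all the DataFrames in our dataset. Each entry maps a DataFrame name to a tuple of the DataFrame and its index column; if the DataFrame has a time index column, it is passed as a third element.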
```python
# name -> (DataFrame, index column[, time index column])
dataframes = {
    "customers": (customers_df, "customer_id"),
    "sessions": (sessions_df, "session_id", "session_start"),
    "transactions": (transactions_df, "transaction_id", "transaction_time"),
}
```
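To make the tuple convention concrete, here is a minimal stdlib-only sketch. The helper `describe_spec` is hypothetical and not part of Featuretools, strings stand in for the real DataFrames, and it only inspects the two- and three-element forms used in this tutorial.

```python
from typing import Any, Optional, Tuple


def describe_spec(name: str, spec: Tuple[Any, ...]) -> str:
    """Summarize one entry of the `dataframes` dictionary (hypothetical helper)."""
    # Element 0 is the DataFrame and element 1 the index column; element 2,
    # when present, is the time index column. Longer tuples are not inspected.
    index = spec[1]
    time_index: Optional[str] = spec[2] if len(spec) > 2 else None
    return f"{name}: index={index}, time_index={time_index}"


# Strings stand in for the real DataFrames.
dataframes = {
    "customers": ("customers_df", "customer_id"),
    "sessions": ("sessions_df", "session_id", "session_start"),
    "transactions": ("transactions_df", "transaction_id", "transaction_time"),
}

for name, spec in dataframes.items():
    print(describe_spec(name, spec))
# customers: index=customer_id, time_index=None
# sessions: index=session_id, time_index=session_start
# transactions: index=transaction_id, time_index=transaction_time
```

Only `customers` lacks a time index, so its summary reports `None`.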
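Second, we specify how the DataFrames are related. When two DataFrames have a one-to-many relationship, we refer to the "one" DataFrame as the "parent" DataFrame. A relationship between a parent and a child is defined like this:

`(parent_dataframe, parent_column, child_dataframe, child_column)`

In this dataset we have two relationships.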
```python
relationships = [
    # (parent DataFrame, parent column, child DataFrame, child column)
    ("sessions", "session_id", "transactions", "session_id"),
    ("customers", "customer_id", "sessions", "customer_id"),
]
```
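The tuple order can be illustrated with a stdlib-only sketch that attaches each child row to the parent row it references. The helper `group_children` and the toy rows are hypothetical illustrations, not Featuretools code; Featuretools does its own bookkeeping on pandas DataFrames.

```python
from typing import Dict, List, Tuple

# Toy rows with hypothetical values (not taken from the mock dataset).
tables: Dict[str, List[dict]] = {
    "customers": [{"customer_id": 1}, {"customer_id": 2}],
    "sessions": [
        {"session_id": 1, "customer_id": 2},
        {"session_id": 2, "customer_id": 1},
        {"session_id": 3, "customer_id": 2},
    ],
}

Relationship = Tuple[str, str, str, str]  # (parent, parent_col, child, child_col)


def group_children(tables: Dict[str, List[dict]], rel: Relationship) -> Dict[object, List[dict]]:
    """Attach each child row to the parent key it references."""
    parent, parent_col, child, child_col = rel
    parent_keys = [row[parent_col] for row in tables[parent]]
    # The parent ("one") side needs unique keys for a one-to-many relationship.
    if len(parent_keys) != len(set(parent_keys)):
        raise ValueError(f"{parent}.{parent_col} is not unique")
    groups: Dict[object, List[dict]] = {key: [] for key in parent_keys}
    for row in tables[child]:
        # Child keys with no matching parent row get their own group here.
        groups.setdefault(row[child_col], []).append(row)
    return groups


groups = group_children(tables, ("customers", "customer_id", "sessions", "customer_id"))
print({key: len(rows) for key, rows in groups.items()})  # {1: 1, 2: 2}
```

The number of child rows per parent key is exactly what a feature such as `COUNT(sessions)` reports for each customer.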
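**Note:** To manage setting up DataFrames and relationships, we recommend using the `EntitySet` class, which offers convenient APIs for managing data like this. See Representing Data with EntitySets for more information.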
#### Run Deep Feature Synthesis
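The minimal input to DFS is a dictionary of DataFrames, a list of relationships, and the name of the target DataFrame whose features we want to calculate. The output of DFS is a feature matrix and the corresponding list of feature definitions. Let's first create a feature matrix with one row for each customer in the data.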
```python
feature_matrix_customers, features_defs = ft.dfs(
    dataframes=dataframes,
    relationships=relationships,
    target_dataframe_name="customers",
)
feature_matrix_customers
```
| customer_id | zip_code | COUNT(sessions) | MODE(sessions.device) | NUM_UNIQUE(sessions.device) | COUNT(transactions) | MAX(transactions.amount) | MEAN(transactions.amount) | MIN(transactions.amount) | MODE(transactions.product_id) | NUM_UNIQUE(transactions.product_id) | ... | STD(sessions.SKEW(transactions.amount)) | STD(sessions.SUM(transactions.amount)) | SUM(sessions.MAX(transactions.amount)) | SUM(sessions.MEAN(transactions.amount)) | SUM(sessions.MIN(transactions.amount)) | SUM(sessions.NUM_UNIQUE(transactions.product_id)) | SUM(sessions.SKEW(transactions.amount)) | SUM(sessions.STD(transactions.amount)) | MODE(transactions.sessions.device) | NUM_UNIQUE(transactions.sessions.device) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 60091 | 8 | mobile | 3 | 126 | 139.43 | 71.631905 | 5.81 | 4 | 5 | ... | 0.589386 | 279.510713 | 1057.97 | 582.193117 | 78.59 | 40.0 | -0.476122 | 312.745952 | mobile | 3 |
| 2 | 13244 | 7 | desktop | 3 | 93 | 146.81 | 77.422366 | 8.73 | 4 | 5 | ... | 0.509798 | 251.609234 | 931.63 | 548.905851 | 154.60 | 35.0 | -0.277640 | 258.700528 | desktop | 3 |
| 3 | 13244 | 6 | desktop | 3 | 93 | 149.15 | 67.060430 | 5.89 | 1 | 5 | ... | 0.429374 | 219.021420 | 847.63 | 405.237462 | 66.21 | 29.0 | 2.286086 | 257.299895 | desktop | 3 |
| 4 | 60091 | 8 | mobile | 3 | 109 | 149.95 | 80.070459 | 5.73 | 2 | 5 | ... | 0.387884 | 235.992478 | 1157.99 | 649.657515 | 131.51 | 37.0 | 0.002764 | 356.125829 | mobile | 3 |
| 5 | 60091 | 6 | mobile | 3 | 79 | 149.02 | 80.375443 | 7.55 | 5 | 5 | ... | 0.415426 | 402.775486 | 839.76 | 472.231119 | 86.49 | 30.0 | 0.014384 | 259.873954 | mobile | 3 |
5 rows × 75 columns
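Stacked column names such as `SUM(sessions.MEAN(transactions.amount))` are easier to read after computing a couple by hand. This stdlib-only sketch uses hypothetical toy rows rather than the mock dataset, and it mirrors what the names mean, not how Featuretools computes them internally.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical toy rows, not the mock dataset.
sessions = [
    {"session_id": 1, "customer_id": 1},
    {"session_id": 2, "customer_id": 1},
    {"session_id": 3, "customer_id": 2},
]
transactions = [
    {"transaction_id": 1, "session_id": 1, "amount": 10.0},
    {"transaction_id": 2, "session_id": 1, "amount": 30.0},
    {"transaction_id": 3, "session_id": 2, "amount": 50.0},
    {"transaction_id": 4, "session_id": 3, "amount": 5.0},
]


def group_values(rows: List[dict], key: str, value: str) -> Dict[int, List[float]]:
    """Collect `value` from each row under that row's `key`."""
    groups: Dict[int, List[float]] = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[value])
    return groups


customer_of_session = {s["session_id"]: s["customer_id"] for s in sessions}

# On sessions: MEAN(transactions.amount) averages each session's transactions.
session_mean = {
    sid: mean(vals) for sid, vals in group_values(transactions, "session_id", "amount").items()
}

# On customers: SUM(sessions.MEAN(transactions.amount)) stacks a SUM on top of
# the per-session means of each customer's sessions.
customer_sum_of_means: Dict[int, float] = {}
for sid, m in session_mean.items():
    cid = customer_of_session[sid]
    customer_sum_of_means[cid] = customer_sum_of_means.get(cid, 0.0) + m

# On customers: MEAN(transactions.amount) reaches through sessions to all of
# the customer's transactions at once.
amounts_by_customer: Dict[int, List[float]] = {}
for t in transactions:
    amounts_by_customer.setdefault(customer_of_session[t["session_id"]], []).append(t["amount"])
customer_mean = {cid: mean(vals) for cid, vals in amounts_by_customer.items()}

print(session_mean)           # {1: 20.0, 2: 50.0, 3: 5.0}
print(customer_sum_of_means)  # {1: 70.0, 2: 5.0}
print(customer_mean)          # {1: 30.0, 2: 5.0}
```

Customer 1 shows why the two customer-level features differ: the sum of per-session means (70.0) is not the mean over all of that customer's transactions (30.0).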
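We now have dozens of new features describing each customer's behavior.

#### Changing the target DataFrame

One of the reasons DFS is so powerful is that it can create a feature matrix for any DataFrame in the dataset. For example, to build features for sessions: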
```python
feature_matrix_sessions, features_defs = ft.dfs(
    dataframes=dataframes, relationships=relationships, target_dataframe_name="sessions"
)
feature_matrix_sessions.head(5)
```
| session_id | customer_id | device | COUNT(transactions) | MAX(transactions.amount) | MEAN(transactions.amount) | MIN(transactions.amount) | MODE(transactions.product_id) | NUM_UNIQUE(transactions.product_id) | SKEW(transactions.amount) | STD(transactions.amount) | ... | customers.STD(transactions.amount) | customers.SUM(transactions.amount) | customers.DAY(birthday) | customers.DAY(join_date) | customers.MONTH(birthday) | customers.MONTH(join_date) | customers.WEEKDAY(birthday) | customers.WEEKDAY(join_date) | customers.YEAR(birthday) | customers.YEAR(join_date) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | desktop | 16 | 141.66 | 76.813125 | 20.91 | 3 | 5 | 0.295458 | 41.600976 | ... | 37.705178 | 7200.28 | 18 | 15 | 8 | 4 | 0 | 6 | 1986 | 2012 |
| 2 | 5 | mobile | 10 | 135.25 | 74.696000 | 9.32 | 5 | 5 | -0.160550 | 45.893591 | ... | 44.095630 | 6349.66 | 28 | 17 | 7 | 7 | 5 | 5 | 1984 | 2010 |
| 3 | 4 | mobile | 15 | 147.73 | 88.600000 | 8.70 | 1 | 5 | -0.324012 | 46.240016 | ... | 45.068765 | 8727.68 | 15 | 8 | 8 | 4 | 1 | 4 | 2006 | 2011 |
| 4 | 1 | mobile | 25 | 129.00 | 64.557200 | 6.29 | 5 | 5 | 0.234349 | 40.187205 | ... | 40.442059 | 9025.62 | 18 | 17 | 7 | 4 | 0 | 6 | 1994 | 2011 |
| 5 | 4 | mobile | 11 | 139.20 | 70.638182 | 7.43 | 5 | 5 | 0.336381 | 48.918663 | ... | 45.068765 | 8727.68 | 15 | 8 | 8 | 4 | 1 | 4 | 2006 | 2011 |
5 rows × 44 columns
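Columns prefixed with `customers.` in the sessions matrix carry a value from each session's parent customer; for example, session 1 belongs to customer 2, and its `customers.YEAR(join_date)` is 2012, the year customer 2 joined. The stdlib-only sketch below reproduces that lookup for sessions 1 and 4, using join dates and session-to-customer pairs copied from the tables above. It illustrates the meaning of the column, not Featuretools' implementation.

```python
from datetime import datetime
from typing import List

# Join dates copied from the customers table; session -> customer pairs
# copied from the sessions feature matrix.
customers = {
    1: {"join_date": datetime(2011, 4, 17, 10, 48, 33)},
    2: {"join_date": datetime(2012, 4, 15, 23, 31, 4)},
}
sessions: List[dict] = [
    {"session_id": 1, "customer_id": 2},
    {"session_id": 4, "customer_id": 1},
]

# Transform on the parent: YEAR(join_date), computed once per customer.
year_of_join = {cid: row["join_date"].year for cid, row in customers.items()}

# On the child: each session looks up its customer's value, which is what the
# column customers.YEAR(join_date) holds.
for s in sessions:
    s["customers.YEAR(join_date)"] = year_of_join[s["customer_id"]]

print([(s["session_id"], s["customers.YEAR(join_date)"]) for s in sessions])
# [(1, 2012), (4, 2011)]
```

Both values match the `customers.YEAR(join_date)` column for sessions 1 and 4 in the table above.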
## Understanding feature output
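In general, Featuretools refers to a generated feature by its feature name. To make features easier to understand, Featuretools offers two additional tools, `featuretools.graph_feature()` and `featuretools.describe_feature()`, which help explain what a feature is and the steps Featuretools took to generate it. Let's look at an example feature. Because the previous `ft.dfs` call reassigned `features_defs`, it now holds the feature definitions for the `sessions` target: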
```python
feature = features_defs[18]
feature
```

```
<Feature: MODE(transactions.WEEKDAY(transaction_time))>
```
### Feature lineage graphs
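Feature lineage graphs visually walk through how a feature was generated. Starting from the base data, they show step by step which primitives were applied and which intermediate features were created to produce the final feature.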
```python
ft.graph_feature(feature)
```
![Feature lineage graph for MODE(transactions.WEEKDAY(transaction_time)). Step 1 (Transform): WEEKDAY is applied to transactions.transaction_time, producing WEEKDAY(transaction_time). The transactions are then grouped by session_id. Step 2 (Aggregation): MODE is applied to each group, producing MODE(transactions.WEEKDAY(transaction_time)) on the target DataFrame, sessions.](_images/graphviz-b2fee41493c322569ecfbe90d6c806dc97dd444a.png)
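The two steps in the lineage graph for `MODE(transactions.WEEKDAY(transaction_time))` can be reproduced by hand on toy rows. This stdlib-only sketch assumes WEEKDAY numbers days Monday=0 through Sunday=6, as Python's `datetime.weekday()` does; ties in MODE are resolved here by first occurrence, which may differ from Featuretools. The rows are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List

# Hypothetical toy transactions.
transactions = [
    {"session_id": 1, "transaction_time": datetime(2014, 1, 1, 0, 0)},   # Wednesday
    {"session_id": 1, "transaction_time": datetime(2014, 1, 1, 0, 5)},   # Wednesday
    {"session_id": 1, "transaction_time": datetime(2014, 1, 2, 0, 0)},   # Thursday
    {"session_id": 2, "transaction_time": datetime(2014, 1, 4, 12, 0)},  # Saturday
]

# Step 1 (transform): WEEKDAY(transaction_time) on every transaction row.
for t in transactions:
    t["WEEKDAY(transaction_time)"] = t["transaction_time"].weekday()

# Group the transformed values by session_id.
weekdays_by_session: Dict[int, List[int]] = {}
for t in transactions:
    weekdays_by_session.setdefault(t["session_id"], []).append(t["WEEKDAY(transaction_time)"])

# Step 2 (aggregation): MODE, the most frequent weekday within each session.
mode_by_session = {
    sid: Counter(days).most_common(1)[0][0] for sid, days in weekdays_by_session.items()
}
print(mode_by_session)  # {1: 2, 2: 5}
```

Session 1 has two Wednesday transactions and one Thursday transaction, so its mode is 2 (Wednesday).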
### Feature descriptions
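Featuretools can also automatically generate English-sentence descriptions of features. Feature descriptions help explain what a feature means, and they can be improved further by supplying manually written custom definitions. For more details on customizing the automatically generated descriptions, see the feature descriptions guide (`/guides/feature_descriptions`).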
```python
ft.describe_feature(feature)
```

```
'The most frequently occurring value of the day of the week of the "transaction_time" of all instances of "transactions" for each "session_id" in "sessions".'
```