What is Featuretools?#

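Featuretools is a framework for automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning.

## 5 Minute Quick Start

Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset consisting of timestamped customer transactions.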

[1]:
import featuretools as ft


Load Mock Data#

[2]:
data = ft.demo.load_mock_customer()

Prepare data#

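In this toy dataset, there are 3 DataFrames.

- customers: unique customers who had sessions
- sessions: unique sessions and associated attributes
- transactions: list of events in this session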

[3]:
customers_df = data["customers"]
customers_df

[3]:
   customer_id  zip_code            join_date    birthday
0            1     60091  2011-04-17 10:48:33  1994-07-18
1            2     13244  2012-04-15 23:31:04  1986-08-18
2            3     13244  2011-08-13 15:42:34  2003-11-21
3            4     60091  2011-04-08 20:08:14  2006-08-15
4            5     60091  2010-07-17 05:27:50  1984-07-28

[4]:
sessions_df = data["sessions"]
sessions_df.sample(5)

[4]:
    session_id  customer_id   device        session_start
13          14            1   tablet  2014-01-01 03:28:00
6            7            3   tablet  2014-01-01 01:39:40
1            2            5   mobile  2014-01-01 00:17:20
28          29            1   mobile  2014-01-01 07:10:05
24          25            3  desktop  2014-01-01 05:59:40

[5]:
transactions_df = data["transactions"]
transactions_df.sample(5)

[5]:
     transaction_id  session_id     transaction_time  product_id  amount
74              232           5  2014-01-01 01:20:10           1  139.20
231              27          17  2014-01-01 04:10:15           2   90.79
434              36          31  2014-01-01 07:50:10           3   62.35
420              56          30  2014-01-01 07:35:00           3   72.70
54              444           4  2014-01-01 00:58:30           4   43.59

First, we specify a dictionary with all the DataFrames in our dataset. Each DataFrame is passed in with its index column and, if one exists, its time index column.

[6]:
dataframes = {
    "customers": (customers_df, "customer_id"),
    "sessions": (sessions_df, "session_id", "session_start"),
    "transactions": (transactions_df, "transaction_id", "transaction_time"),
}
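Before handing such a dictionary to DFS, it can help to sanity-check each entry. The following is a minimal sketch using only pandas, not part of the Featuretools API: the helper `summarize_specs` and the miniature tables are hypothetical and exist only for illustration. It assumes each value is a tuple of the form (DataFrame, index column) or (DataFrame, index column, time index column), as described above.

```python
import pandas as pd

# Hypothetical miniature tables; only the column names mirror the mock data.
customers_df = pd.DataFrame({"customer_id": [1, 2]})
sessions_df = pd.DataFrame(
    {
        "session_id": [1, 2, 3],
        "session_start": pd.to_datetime(
            ["2014-01-01 00:00:00", "2014-01-01 00:17:20", "2014-01-01 00:28:10"]
        ),
    }
)

dataframes = {
    "customers": (customers_df, "customer_id"),
    "sessions": (sessions_df, "session_id", "session_start"),
}


def summarize_specs(dataframes):
    """Report, per table, whether the index is unique and the time index is a datetime."""
    summary = {}
    for name, spec in dataframes.items():
        # The first two tuple elements are the DataFrame and its index column;
        # a third element, if present, is taken to be the time index column.
        df, index, *rest = spec
        time_index = rest[0] if rest else None
        summary[name] = {
            "index_is_unique": bool(df[index].is_unique),
            "time_index": time_index,
            "time_index_is_datetime": (
                None
                if time_index is None
                else bool(pd.api.types.is_datetime64_any_dtype(df[time_index]))
            ),
        }
    return summary


summary = summarize_specs(dataframes)
print(summary)
```

A table with a repeated index value or a time index stored as plain strings would show up here as `False`, which is usually cheaper to notice at this point than after feature generation.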

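Second, we specify how the DataFrames are related. When two DataFrames have a one-to-many relationship, we call the "one" DataFrame the "parent" DataFrame. A relationship between a parent and a child is defined like this:

(parent_dataframe, parent_column, child_dataframe, child_column)

In this dataset, we have two relationships.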

[7]:
relationships = [
    ("sessions", "session_id", "transactions", "session_id"),
    ("customers", "customer_id", "sessions", "customer_id"),
]
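The one-to-many requirement can be checked directly on the data. The sketch below uses only pandas and is not a Featuretools function: `check_relationship` and the small tables are hypothetical names introduced for illustration. It treats a tuple as valid when the parent column has no repeated keys and every child value refers to an existing parent key.

```python
import pandas as pd

# Hypothetical miniature tables; the column names mirror the mock customer data.
customers = pd.DataFrame({"customer_id": [1, 2]})
sessions = pd.DataFrame({"session_id": [10, 11, 12], "customer_id": [1, 1, 2]})
transactions = pd.DataFrame(
    {"transaction_id": [100, 101, 102, 103], "session_id": [10, 10, 11, 12]}
)
tables = {"customers": customers, "sessions": sessions, "transactions": transactions}

relationships = [
    ("sessions", "session_id", "transactions", "session_id"),
    ("customers", "customer_id", "sessions", "customer_id"),
]


def check_relationship(tables, relationship):
    """Return True if the tuple describes a consistent one-to-many link."""
    parent_name, parent_col, child_name, child_col = relationship
    parent_keys = tables[parent_name][parent_col]
    child_keys = tables[child_name][child_col]
    # The "one" side must not repeat a key.
    parent_is_unique = parent_keys.is_unique
    # Every child row must point at an existing parent row.
    children_have_parent = child_keys.isin(parent_keys).all()
    return bool(parent_is_unique and children_have_parent)


results = {rel: check_relationship(tables, rel) for rel in relationships}
print(results)

# Swapping parent and child fails, because session_id repeats in transactions.
swapped = ("transactions", "session_id", "sessions", "session_id")
print(check_relationship(tables, swapped))
```

The swapped tuple at the end shows why the order of the four elements matters: the parent always comes first.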

Note

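To manage setting up DataFrames and relationships, we recommend using the EntitySet class, which offers convenient APIs for managing data like this. See Representing Data with EntitySets for more information.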

Run Deep Feature Synthesis#

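The minimum inputs to DFS are a dictionary of DataFrames, a list of relationships, and the name of the target DataFrame whose features we want to calculate. The output of DFS is a feature matrix and the corresponding list of feature definitions. Let's first create a feature matrix for each customer in the data.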

[8]:
feature_matrix_customers, features_defs = ft.dfs(
    dataframes=dataframes,
    relationships=relationships,
    target_dataframe_name="customers",
)
feature_matrix_customers

[8]:
zip_code COUNT(sessions) MODE(sessions.device) NUM_UNIQUE(sessions.device) COUNT(transactions) MAX(transactions.amount) MEAN(transactions.amount) MIN(transactions.amount) MODE(transactions.product_id) NUM_UNIQUE(transactions.product_id) ... STD(sessions.SKEW(transactions.amount)) STD(sessions.SUM(transactions.amount)) SUM(sessions.MAX(transactions.amount)) SUM(sessions.MEAN(transactions.amount)) SUM(sessions.MIN(transactions.amount)) SUM(sessions.NUM_UNIQUE(transactions.product_id)) SUM(sessions.SKEW(transactions.amount)) SUM(sessions.STD(transactions.amount)) MODE(transactions.sessions.device) NUM_UNIQUE(transactions.sessions.device)
customer_id
1 60091 8 mobile 3 126 139.43 71.631905 5.81 4 5 ... 0.589386 279.510713 1057.97 582.193117 78.59 40.0 -0.476122 312.745952 mobile 3
2 13244 7 desktop 3 93 146.81 77.422366 8.73 4 5 ... 0.509798 251.609234 931.63 548.905851 154.60 35.0 -0.277640 258.700528 desktop 3
3 13244 6 desktop 3 93 149.15 67.060430 5.89 1 5 ... 0.429374 219.021420 847.63 405.237462 66.21 29.0 2.286086 257.299895 desktop 3
4 60091 8 mobile 3 109 149.95 80.070459 5.73 2 5 ... 0.387884 235.992478 1157.99 649.657515 131.51 37.0 0.002764 356.125829 mobile 3
5 60091 6 mobile 3 79 149.02 80.375443 7.55 5 5 ... 0.415426 402.775486 839.76 472.231119 86.49 30.0 0.014384 259.873954 mobile 3

5 rows × 75 columns
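To make column names such as MEAN(transactions.amount) and SUM(sessions.MAX(transactions.amount)) concrete, here is a hand-written pandas sketch of what those names appear to describe. It is an illustration on hypothetical toy data, not how Featuretools computes features internally, and the interpretation of each name is an assumption read off the column labels above.

```python
import pandas as pd

# Hypothetical miniature data with the same key columns as the mock dataset.
sessions = pd.DataFrame({"session_id": [1, 2, 3], "customer_id": [1, 1, 2]})
transactions = pd.DataFrame(
    {
        "transaction_id": [1, 2, 3, 4, 5],
        "session_id": [1, 1, 2, 3, 3],
        "amount": [10.0, 30.0, 20.0, 5.0, 15.0],
    }
)

# Depth 1: aggregate transactions up to their parent session.
max_per_session = (
    transactions.groupby("session_id")["amount"]
    .max()
    .rename("MAX(transactions.amount)")
)

# Attach the session-level feature to sessions so it can be aggregated again.
sessions_with_feature = sessions.join(max_per_session, on="session_id")

# Depth 2: aggregate the session-level feature up to the customer.
sum_of_max = (
    sessions_with_feature.groupby("customer_id")["MAX(transactions.amount)"]
    .sum()
    .rename("SUM(sessions.MAX(transactions.amount))")
)

# Aggregating across both relationships at once: all of a customer's transactions.
transactions_with_customer = transactions.merge(sessions, on="session_id")
mean_amount = (
    transactions_with_customer.groupby("customer_id")["amount"]
    .mean()
    .rename("MEAN(transactions.amount)")
)

print(pd.concat([sum_of_max, mean_amount], axis=1))
```

The nesting in a feature name mirrors the order of the steps: the innermost primitive is applied first, and each outer primitive aggregates the result one relationship further up.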

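We now have dozens of new features to describe a customer's behavior.

#### Change target DataFrame

One of the reasons DFS is so powerful is that it can create a feature matrix for any DataFrame in our EntitySet. For example, if we wanted to build features for sessions.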

[10]:
feature_matrix_sessions, features_defs = ft.dfs(
    dataframes=dataframes, relationships=relationships, target_dataframe_name="sessions"
)
feature_matrix_sessions.head(5)

[10]:
customer_id device COUNT(transactions) MAX(transactions.amount) MEAN(transactions.amount) MIN(transactions.amount) MODE(transactions.product_id) NUM_UNIQUE(transactions.product_id) SKEW(transactions.amount) STD(transactions.amount) ... customers.STD(transactions.amount) customers.SUM(transactions.amount) customers.DAY(birthday) customers.DAY(join_date) customers.MONTH(birthday) customers.MONTH(join_date) customers.WEEKDAY(birthday) customers.WEEKDAY(join_date) customers.YEAR(birthday) customers.YEAR(join_date)
session_id
1 2 desktop 16 141.66 76.813125 20.91 3 5 0.295458 41.600976 ... 37.705178 7200.28 18 15 8 4 0 6 1986 2012
2 5 mobile 10 135.25 74.696000 9.32 5 5 -0.160550 45.893591 ... 44.095630 6349.66 28 17 7 7 5 5 1984 2010
3 4 mobile 15 147.73 88.600000 8.70 1 5 -0.324012 46.240016 ... 45.068765 8727.68 15 8 8 4 1 4 2006 2011
4 1 mobile 25 129.00 64.557200 6.29 5 5 0.234349 40.187205 ... 40.442059 9025.62 18 17 7 4 0 6 1994 2011
5 4 mobile 11 139.20 70.638182 7.43 5 5 0.336381 48.918663 ... 45.068765 8727.68 15 8 8 4 1 4 2006 2011

5 rows × 44 columns

Understanding Feature Output#

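In general, Featuretools references generated features through the feature name. To make features easier to understand, Featuretools offers two additional tools, featuretools.graph_feature() and featuretools.describe_feature(), to help explain what a feature is and the steps Featuretools took to generate it. Let's look at this example feature: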

[11]:
feature = features_defs[18]
feature

[11]:
<Feature: MODE(transactions.WEEKDAY(transaction_time))>

Feature lineage graphs#

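Feature lineage graphs visually walk through feature generation. Starting from the base data, they show step by step the primitives applied and the intermediate features generated to create the final feature.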

[12]:
ft.graph_feature(feature)

[12]:
[Figure: lineage graph for MODE(transactions.WEEKDAY(transaction_time)). Step 1, Transform: WEEKDAY is applied to transaction_time in transactions. The result is grouped by session_id. Step 2, Aggregation: MODE produces the feature on the target DataFrame sessions.]

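The two steps in the lineage graph can be retraced by hand. The following pandas sketch runs on hypothetical toy data and is meant only to illustrate the order of operations (transform, group by, aggregate); it is not the code Featuretools runs, and the helper `most_frequent` is a name introduced here.

```python
import pandas as pd

# Hypothetical transactions; 2014-01-01 was a Wednesday.
transactions = pd.DataFrame(
    {
        "session_id": [1, 1, 1, 2, 2],
        "transaction_time": pd.to_datetime(
            [
                "2014-01-01 00:00:00",
                "2014-01-01 00:05:00",
                "2014-01-02 00:10:00",
                "2014-01-03 09:00:00",
                "2014-01-03 09:30:00",
            ]
        ),
    }
)

# Step 1 (transform): day of the week for each transaction, with Monday as 0.
transactions["WEEKDAY(transaction_time)"] = transactions["transaction_time"].dt.weekday


def most_frequent(values):
    """Return the most frequent value, taking the smallest one on ties."""
    modes = values.mode()
    return modes.iloc[0] if len(modes) else None


# Step 2 (aggregation): group by session_id and take the most frequent weekday.
mode_weekday = (
    transactions.groupby("session_id")["WEEKDAY(transaction_time)"]
    .agg(most_frequent)
    .rename("MODE(transactions.WEEKDAY(transaction_time))")
)

print(mode_weekday)
```

Session 1 has two Wednesday transactions and one on Thursday, so its value is 2; both transactions in session 2 fall on a Friday, giving 4.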
Feature descriptions#

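Featuretools can also automatically generate English sentence descriptions of features. Feature descriptions help to explain what a feature is, and can be further improved by including manually defined custom definitions. For more details on how to customize automatically generated feature descriptions, see the guide at /guides/feature_descriptions.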

[13]:
ft.describe_feature(feature)

[13]:
'The most frequently occurring value of the day of the week of the "transaction_time" of all instances of "transactions" for each "session_id" in "sessions".'
What's next?#