GenericLearner

GenericLearner(learner_name: str, task: Task, label: Optional[str], weights: Optional[str], ranking_group: Optional[str], uplift_treatment: Optional[str], data_spec_args: DataSpecInferenceArgs, data_spec: Optional[DataSpecification], hyper_parameters: HyperParameters, explicit_learner_arguments: Optional[Set[str]], deployment_config: DeploymentConfig, tuner: Optional[AbstractTuner])

A generic YDF learner.

hyperparameters `property`

hyperparameters: HyperParameters

A (mutable) dictionary of this learner's hyperparameters.

This object can be used to inspect or modify hyperparameters after creating the learner. Modifying hyperparameters after constructing the learner is suitable for some advanced use cases. Since this approach bypasses some feasibility checks for the given set of hyperparameters, it is generally better to re-create the learner for each model. The current set of hyperparameters can be validated manually with validate_hyperparameters().

cross_validation

cross_validation(ds: InputDataset, folds: int = 10, bootstrapping: Union[bool, int] = False, parallel_evaluations: int = 1) -> Evaluation

Cross-validates the learner and returns the evaluation.

Usage example:

import pandas as pd
import ydf

dataset = pd.read_csv("my_dataset.csv")
learner = ydf.RandomForestLearner(label="label")
evaluation = learner.cross_validation(dataset)

# In a notebook, display an interactive evaluation
evaluation

# Print the evaluation
print(evaluation)

# Look at specific metrics
print(evaluation.accuracy)

Parameters:

Name	Type	Description	Default
`ds`	`InputDataset`	Dataset for the cross-validation.	required
`folds`	`int`	Number of cross-validation folds.	`10`
`bootstrapping`	`Union[bool, int]`	Controls whether bootstrapping is used to evaluate the confidence intervals and statistical tests (i.e., all the metrics ending with "[B]"). If set to false, bootstrapping is disabled. If set to true, bootstrapping is enabled and 2000 bootstrapping samples are used. If set to an integer, it specifies the number of bootstrapping samples to use. In this case, if the number is less than 100, an error is raised as bootstrapping will not yield useful results.	`False`
`parallel_evaluations`	`int`	Number of models to train and evaluate in parallel using multi-threading. Note that each model is potentially already trained with multi-threading (see the `num_threads` argument of the learner constructor).	`1`
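
The rule described for `bootstrapping` can be summarized in a few lines of plain Python. This is only a sketch of the documented behavior, not YDF's implementation: the helper `resolve_bootstrapping`, its constants, and the choice of `ValueError` are hypothetical names and assumptions introduced for illustration.

```python
from typing import Union

# Values taken from the parameter description above.
DEFAULT_BOOTSTRAP_SAMPLES = 2000
MIN_BOOTSTRAP_SAMPLES = 100


def resolve_bootstrapping(bootstrapping: Union[bool, int]) -> int:
    """Hypothetical helper: number of bootstrapping samples (0 means disabled)."""
    # bool is a subclass of int in Python, so it has to be tested first.
    if isinstance(bootstrapping, bool):
        return DEFAULT_BOOTSTRAP_SAMPLES if bootstrapping else 0
    if bootstrapping < MIN_BOOTSTRAP_SAMPLES:
        # The exception type is an assumption; the docs only say "an error is raised".
        raise ValueError(
            f"bootstrapping={bootstrapping} is below {MIN_BOOTSTRAP_SAMPLES} samples"
        )
    return bootstrapping
```

Under these assumptions, `False` disables bootstrapping, `True` selects 2000 samples, and an integer such as `500` is used as the sample count.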

Returns:

Type	Description
`Evaluation`	The cross-validation evaluation.
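
For intuition about the `folds` parameter, the sketch below shows a generic k-fold partition: every example is held out exactly once, and each fold is evaluated with a model trained on the remaining examples. The helper `k_fold_indices` is hypothetical, and the contiguous, unshuffled fold assignment is an assumption made for simplicity; how YDF actually assigns examples to folds may differ.

```python
from typing import List, Tuple


def k_fold_indices(num_examples: int, folds: int) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: returns one (train_indices, test_indices) pair per fold."""
    if folds < 2 or folds > num_examples:
        raise ValueError("folds must be between 2 and num_examples")
    splits = []
    for fold in range(folds):
        # Boundaries are spread so that fold sizes differ by at most one example.
        begin = fold * num_examples // folds
        end = (fold + 1) * num_examples // folds
        test = list(range(begin, end))
        train = list(range(0, begin)) + list(range(end, num_examples))
        splits.append((train, test))
    return splits


# Each fold trains on `train` and evaluates on `test`.
for train, test in k_fold_indices(num_examples=10, folds=3):
    print(len(train), len(test))
```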

train

train(ds: InputDataset, valid: Optional[InputDataset] = None, verbose: Optional[Union[int, bool]] = None) -> ModelType

Trains a model on the given dataset.

Options for dataset reading are given on the learner. Consult the documentation of the learner or ydf.create_vertical_dataset() for additional information on dataset reading in YDF.

Usage example:

import ydf
import pandas as pd

train_ds = pd.read_csv(...)
test_ds = pd.read_csv(...)

learner = ydf.GradientBoostedTreesLearner(label="label")
model = learner.train(train_ds)
evaluation = model.evaluate(test_ds)

Usage example with a validation dataset:

import ydf
import pandas as pd

train_ds = pd.read_csv(...)
valid_ds = pd.read_csv(...)
test_ds = pd.read_csv(...)

learner = ydf.GradientBoostedTreesLearner(label="label")
model = learner.train(train_ds, valid=valid_ds)
evaluation = model.evaluate(test_ds)

If training is interrupted (for example, by interrupting the cell execution in Colab), the model is returned in the state it was in at the moment of interruption.

Parameters:

Name	Type	Description	Default
`ds`	`InputDataset`	Training dataset.	required
`valid`	`Optional[InputDataset]`	Optional validation dataset. Some learners, such as Random Forest, do not need a validation dataset. Some learners, such as GradientBoostedTrees, automatically extract a validation dataset from the training dataset if the validation dataset is not provided.	`None`
`verbose`	`Optional[Union[int, bool]]`	Verbose level during training. If None, uses the global verbose level of `ydf.verbose`. Levels are: 0 or False: no logs. 1 or True: prints a few logs in a notebook; prints all the logs in a terminal. 2: prints all the logs on all surfaces.	`None`
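
The levels listed for `verbose` mix booleans, integers, and `None`. The sketch below shows one way to normalize them to a single integer level. It is an illustration of the documented mapping, not YDF's code: `resolve_verbose` and its `global_level` argument are hypothetical, and rejecting integers other than 0, 1, or 2 is an assumption.

```python
from typing import Optional, Union


def resolve_verbose(verbose: Optional[Union[int, bool]], global_level: int) -> int:
    """Hypothetical helper: maps `verbose` to an integer level 0, 1, or 2."""
    if verbose is None:
        # Fall back to the global level (the role played by `ydf.verbose`).
        return global_level
    # bool is a subclass of int in Python, so it has to be tested first.
    if isinstance(verbose, bool):
        return 1 if verbose else 0
    if verbose not in (0, 1, 2):
        # Assumption: only the three documented levels are accepted.
        raise ValueError(f"Unsupported verbose level: {verbose}")
    return verbose
```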

Returns:

Type	Description
`ModelType`	A trained model.

validate_hyperparameters

validate_hyperparameters()

Returns None if the hyperparameters are valid, raises otherwise.

This method is called automatically before training, but users may call it to fail early. It makes sense to call this method after manually changing the hyperparameters of the learner. This is a relatively advanced approach that is not recommended (it is better to re-create the learner in most cases).

Usage example:

import ydf
import pandas as pd

train_ds = pd.read_csv(...)
test_ds = pd.read_csv(...)

learner = ydf.GradientBoostedTreesLearner(label="label")
learner.hyperparameters["max_depth"] = 20
learner.validate_hyperparameters()
model = learner.train(train_ds)
evaluation = model.evaluate(test_ds)
