dirty_cat.DatetimeEncoder

Usage examples at the bottom of this page.

class dirty_cat.DatetimeEncoder(extract_until='hour', add_day_of_the_week=False)[source]

This encoder transforms each datetime column into several numeric columns corresponding to temporal features, e.g. year, month, day. Constant extracted features are dropped; for instance, if the year is always the same in a column, the extracted “year” column won’t be added. If the dates are timezone-aware, all extracted features correspond to the provided timezone.
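As a rough illustration of the timezone remark, the sketch below uses only the Python standard library (it is not dirty_cat's implementation) to show that one instant yields different calendar features depending on the timezone attached to it. The helper name `calendar_features` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def calendar_features(dt):
    """Return (year, month, day, hour) as read in dt's own timezone."""
    return (dt.year, dt.month, dt.day, dt.hour)

# One instant, expressed in UTC and in a fixed UTC+02:00 offset.
utc_dt = datetime(2021, 12, 31, 23, 30, tzinfo=timezone.utc)
plus2_dt = utc_dt.astimezone(timezone(timedelta(hours=2)))

utc_features = calendar_features(utc_dt)      # read in UTC
plus2_features = calendar_features(plus2_dt)  # read in UTC+02:00
print(utc_features, plus2_features)
```

Both values describe the same moment, yet the year, month, day and hour all differ, which is why the timezone of the input matters for the extracted columns.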

Parameters:
  • extract_until ({"year", "month", "day", "hour", "minute", "second", "millisecond", "microsecond", "nanosecond"}, default="hour") – Extract up to this granularity. If not all features have been extracted, a “total_time” feature is added, which contains the time since the epoch (in seconds). For instance, if you specify “day”, only the “year”, “month”, “day” and “total_time” features will be created.

  • add_day_of_the_week (bool, default=False) – Whether to add a day-of-the-week feature (only if the day is extracted). This is a numerical feature from 0 (Monday) to 6 (Sunday).
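The interaction between the two parameters can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's code: the function `extract` is hypothetical, it handles a single datetime, and sub-second granularities are omitted, so “second” is the finest level here.

```python
from datetime import datetime, timezone

# Granularities in decreasing order (sub-second levels omitted for brevity).
GRANULARITIES = ["year", "month", "day", "hour", "minute", "second"]

def extract(dt, extract_until="hour", add_day_of_the_week=False):
    """Illustrative feature extraction for a single timezone-aware datetime."""
    stop = GRANULARITIES.index(extract_until) + 1
    features = {name: getattr(dt, name) for name in GRANULARITIES[:stop]}
    if stop < len(GRANULARITIES):
        # Finer-grained parts were left out: summarise the full timestamp
        # as seconds relative to the Unix epoch.
        features["total_time"] = dt.timestamp()
    if add_day_of_the_week and "day" in features:
        features["dayofweek"] = dt.weekday()  # 0 = Monday ... 6 = Sunday
    return features

dt = datetime(2023, 3, 15, 14, 30, tzinfo=timezone.utc)  # a Wednesday
day_features = extract(dt, extract_until="day", add_day_of_the_week=True)
print(day_features)
```

With `extract_until="day"`, the hour and minute are not emitted as separate features, so the information they carry survives only through “total_time”.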

n_features_out_

Number of features of the transformed data.

Type:

int

features_per_column_

Dictionary mapping the index of the original columns to the list of features extracted for each column.

Type:

Dict[int, List[str]]

col_names_

List of the names of the features of the input data, if input data was a pandas DataFrame, otherwise None.

Type:

List[str]

fit(X, y=None)[source]

Fit the DatetimeEncoder to X. In practice, this only records which extracted features are not constant.

Parameters:

X (array-like, shape (n_samples, n_features)) – Data where each column is a datetime feature.

Returns:

Fitted DatetimeEncoder instance.

Return type:

self
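The division of labour between fit and transform can be pictured with a toy, single-column encoder. The class `TinyDatetimeEncoder` below is hypothetical and far simpler than the real estimator; it only shows that the set of non-constant features is decided once, at fit time, and then reused.

```python
from datetime import datetime

PARTS = ["year", "month", "day", "hour"]

class TinyDatetimeEncoder:
    """Hypothetical single-column encoder illustrating the fit/transform split."""

    def fit(self, column):
        # Keep only the parts that take more than one value in the fitted data.
        self.kept_ = [p for p in PARTS if len({getattr(d, p) for d in column}) > 1]
        return self

    def transform(self, column):
        # Reuse the parts selected during fit, whatever the new data looks like.
        return [[getattr(d, p) for p in self.kept_] for d in column]

train = [datetime(2020, 1, 5, 9), datetime(2020, 2, 7, 9), datetime(2020, 3, 9, 9)]
encoder = TinyDatetimeEncoder().fit(train)
encoded = encoder.transform([datetime(2021, 6, 1, 12)])
print(encoder.kept_, encoded)
```

In the training data the year and hour never change, so only the month and day are kept, and new data is encoded with those two features even though its year and hour differ.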

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray of shape (n_samples, n_features_new)

get_feature_names(input_features=None)[source]

Ensures compatibility with sklearn < 1.0, and returns the output of get_feature_names_out.

get_feature_names_out(input_features=None)[source]

Returns clean feature names in the format “<column_name>_<new_feature>” if the original data has column names, otherwise in the format “<column_index>_<new_feature>”. new_feature is one of [“year”, “month”, “day”, “hour”, “minute”, “second”, “millisecond”, “microsecond”, “nanosecond”, “dayofweek”].
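The naming scheme can be sketched as follows. The function `feature_names` is a hypothetical stand-in that takes a mapping shaped like features_per_column_ and an optional list shaped like col_names_; it is not the library's implementation.

```python
def feature_names(features_per_column, col_names=None):
    """Build "<column>_<feature>" names from a {column index: [features]} mapping."""
    names = []
    for idx, features in features_per_column.items():
        # Fall back to the column index when no column names are available.
        prefix = col_names[idx] if col_names is not None else idx
        names.extend(f"{prefix}_{feature}" for feature in features)
    return names

mapping = {0: ["year", "month"], 1: ["hour"]}
named = feature_names(mapping, col_names=["created", "updated"])
indexed = feature_names(mapping)
print(named)
print(indexed)
```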

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance
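To make the <component>__<parameter> convention concrete, here is a small standard-library sketch of how such a key separates into a component name and a parameter name. The helper `split_nested_key` and the step name "encoder" are hypothetical; scikit-learn performs this routing internally.

```python
def split_nested_key(key):
    """Split "component__parameter" into (component, parameter).

    Keys without a double underscore address the estimator itself,
    which is signalled here by a component of None.
    """
    component, sep, parameter = key.partition("__")
    return (component, parameter) if sep else (None, key)

nested = split_nested_key("encoder__extract_until")
direct = split_nested_key("extract_until")
print(nested, direct)
```

Under this convention, a key such as "encoder__extract_until" passed to a nested object would update the extract_until parameter of the step named "encoder".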

transform(X, y=None)[source]

Transform X by replacing each datetime column with corresponding numerical features.

Parameters:

X (array-like, shape (n_samples, n_features)) – The data to transform, where each column is a datetime feature.

Returns:

Transformed input.

Return type:

array, shape (n_samples, n_features_out_)

Examples using dirty_cat.DatetimeEncoder

Handling datetime features with the DatetimeEncoder
