dirty_cat.DatetimeEncoder
Usage examples at the bottom of this page.
- class dirty_cat.DatetimeEncoder(extract_until='hour', add_day_of_the_week=False)
This encoder transforms each datetime column into several numeric columns corresponding to temporal features, e.g. year, month, day. Constant extracted features are dropped; for instance, if the year is always the same in a column, the extracted “year” column is not added. If the dates are timezone-aware, all extracted features correspond to the provided timezone.
- Parameters:
extract_until ({"year", "month", "day", "hour", "minute", "second", "millisecond", "microsecond", "nanosecond"}, default="hour") – Extract up to this granularity. If not all features are extracted, a “total_time” feature is added, which contains the time to epoch (in seconds). For instance, if you specify “day”, only the “year”, “month”, “day” and “total_time” features are created.
add_day_of_the_week (bool, default=False) – Add day of the week feature (if day is extracted). This is a numerical feature from 0 (Monday) to 6 (Sunday).
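The effect of these two parameters can be illustrated with a minimal standard-library sketch. It is not the library's implementation: the helper `extract_features`, the `FIELDS` list, and the choice to stop at "second" are assumptions made for illustration, and it handles naive datetimes only.

```python
from datetime import datetime

# Granularities in the order listed for `extract_until`. This sketch stops at
# "second" (an assumption); the encoder itself goes down to "nanosecond".
FIELDS = ["year", "month", "day", "hour", "minute", "second"]
EPOCH = datetime(1970, 1, 1)


def extract_features(dt, extract_until="hour", add_day_of_the_week=False):
    """Hypothetical helper: split one naive datetime into numeric features."""
    stop = FIELDS.index(extract_until) + 1
    features = {name: getattr(dt, name) for name in FIELDS[:stop]}
    if stop < len(FIELDS):
        # Finer granularities were skipped, so summarise the full timestamp
        # as a number of seconds relative to the epoch.
        features["total_time"] = (dt - EPOCH).total_seconds()
    if add_day_of_the_week and "day" in features:
        # datetime.weekday() follows the same convention: 0 = Monday, 6 = Sunday.
        features["dayofweek"] = dt.weekday()
    return features


row = extract_features(datetime(2021, 3, 15, 14, 30), "day", add_day_of_the_week=True)
print(list(row))  # ['year', 'month', 'day', 'total_time', 'dayofweek']
```

With `extract_until="day"`, the hour and minute do not get their own columns; they survive only through `total_time`.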
- features_per_column_
Dictionary mapping the index of the original columns to the list of features extracted for each column.
- col_names_
List of the names of the features of the input data, if input data was a pandas DataFrame, otherwise None.
- Type:
List[str]
- fit(X, y=None)
Fit the DatetimeEncoder to X. In practice, just stores which extracted features are not constant.
- Parameters:
X (array-like, shape (n_samples, n_features)) – Data where each column is a datetime feature.
- Returns:
Fitted DatetimeEncoder instance.
- Return type:
self
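The idea that fitting only records which extracted features vary can be sketched as follows. This is an illustrative toy, not the encoder's code: the class `ConstantDropper` and its `kept_` attribute are hypothetical names, and rows are plain dictionaries of already-extracted features.

```python
class ConstantDropper:
    """Hypothetical toy: remember non-constant features at fit time."""

    def fit(self, rows):
        names = list(rows[0])
        # A feature is kept only if it takes more than one distinct value.
        self.kept_ = [n for n in names if len({r[n] for r in rows}) > 1]
        return self

    def transform(self, rows):
        # Reuse the selection made during fit, even if new data looks different.
        return [[r[n] for n in self.kept_] for r in rows]


train = [
    {"year": 2021, "month": 1, "day": 5},
    {"year": 2021, "month": 2, "day": 5},
    {"year": 2021, "month": 3, "day": 7},
]
dropper = ConstantDropper().fit(train)
print(dropper.kept_)             # ['month', 'day']
print(dropper.transform(train))  # [[1, 5], [2, 5], [3, 7]]
```

Because the selection is stored at fit time, the set of output columns stays the same for any data transformed later.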
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new – Transformed array.
- Return type:
ndarray of shape (n_samples, n_features_new)
- get_feature_names(input_features=None)
Ensures compatibility with sklearn < 1.0, and returns the output of get_feature_names_out.
- get_feature_names_out(input_features=None)
Returns clean feature names with format “<column_name>_<new_feature>” if the original data has column names, otherwise with format “<column_index>_<new_feature>”. new_feature is one of [“year”, “month”, “day”, “hour”, “minute”, “second”, “millisecond”, “microsecond”, “nanosecond”, “dayofweek”]
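The naming scheme can be sketched from the two fitted attributes documented above. The function `feature_names` below is a hypothetical stand-in, assuming `features_per_column` maps column indices to lists of extracted features and `col_names` is either a list of names or None.

```python
def feature_names(features_per_column, col_names=None):
    """Hypothetical helper: build "<prefix>_<new_feature>" names."""
    names = []
    for index, features in features_per_column.items():
        # Use the column name when available, otherwise the column index.
        prefix = col_names[index] if col_names is not None else index
        names.extend(f"{prefix}_{feature}" for feature in features)
    return names


mapping = {0: ["year", "month"], 1: ["day", "dayofweek"]}
print(feature_names(mapping, ["signup", "login"]))
# ['signup_year', 'signup_month', 'login_day', 'login_dayofweek']
print(feature_names(mapping))
# ['0_year', '0_month', '1_day', '1_dayofweek']
```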
- get_params(deep=True)
Get parameters for this estimator.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- transform(X, y=None)
Transform X by replacing each datetime column with corresponding numerical features.
- Parameters:
X (array-like, shape (n_samples, n_features)) – The data to transform, where each column is a datetime feature.
- Returns:
Transformed input.
- Return type:
array, shape (n_samples, n_features_out_)
Examples using dirty_cat.DatetimeEncoder

Handling datetime features with the DatetimeEncoder