Version 1.6#
Legend for changelogs
Major Feature something big that you couldn’t do before.
Feature something that you couldn’t do before.
Efficiency an existing feature now may not require as much computation or memory.
Enhancement a miscellaneous minor improvement.
Fix something that previously didn’t work as documented – or according to reasonable expectations – should now work.
API Change you will need to change your code to have the same effect in the future; or a feature will be removed in the future.
Version 1.6.0#
In Development
Support for Array API#
Additional estimators and functions have been updated to include support for all Array API compliant inputs.
See Array API support (experimental) for more details.
Functions:
sklearn.metrics.cluster.entropy #29141 by Yaroslav Korobko;
sklearn.metrics.mean_absolute_error #27736 by Edoardo Abati and #29143 by Tialo and Loïc Estève;
sklearn.metrics.mean_absolute_percentage_error #29300 by Emily Chen;
sklearn.metrics.mean_squared_error #29142 by Yaroslav Korobko;
sklearn.metrics.pairwise.additive_chi2_kernel #29144 by Yaroslav Korobko;
sklearn.metrics.pairwise.chi2_kernel #29267 by Yaroslav Korobko;
sklearn.metrics.pairwise.cosine_distances #29265 by Emily Chen;
sklearn.metrics.pairwise.cosine_similarity #29014 by Edoardo Abati;
sklearn.metrics.pairwise.euclidean_distances #29433 by Omar Salman;
sklearn.metrics.pairwise.linear_kernel #29475 by Omar Salman;
sklearn.metrics.pairwise.paired_cosine_distances #29112 by Edoardo Abati;
sklearn.metrics.pairwise.paired_euclidean_distances #29389 by Emily Chen;
sklearn.metrics.pairwise.polynomial_kernel #29475 by Omar Salman;
sklearn.metrics.pairwise.sigmoid_kernel #29475 by Omar Salman.
Classes:
preprocessing.LabelEncoder now supports Array API compatible inputs. #27381 by Omar Salman.
model_selection.GridSearchCV, model_selection.RandomizedSearchCV, model_selection.HalvingGridSearchCV and model_selection.HalvingRandomSearchCV now support Array API compatible inputs when their base estimators do. #27096 by Tim Head and Olivier Grisel.
Other
Support for the soon to be deprecated cupy.array_api module has been removed in favor of directly supporting the top level cupy module, possibly via the array_api_compat.cupy compatibility wrapper. #29639 by Olivier Grisel.
Metadata Routing#
The following models now support metadata routing in one or more of their methods. Refer to the Metadata Routing User Guide for more details.
Feature model_selection.learning_curve now supports metadata routing for the fit method of its estimator and for its underlying CV splitter and scorer. #28975 by Stefanie Senger.
Feature ensemble.StackingClassifier and ensemble.StackingRegressor now support metadata routing and pass **fit_params to the underlying estimators via their fit methods. #28701 by Stefanie Senger.
Feature compose.TransformedTargetRegressor now supports metadata routing in its fit and predict methods and routes the corresponding params to the underlying regressor. #29136 by Omar Salman.
Feature feature_selection.SequentialFeatureSelector now supports metadata routing in its fit method and passes the corresponding params to the model_selection.cross_val_score function. #29260 by Omar Salman.
Feature model_selection.validation_curve now supports metadata routing for the fit method of its estimator and for its underlying CV splitter and scorer. #29329 by Stefanie Senger.
Feature semi_supervised.SelfTrainingClassifier now supports metadata routing. The fit method now accepts **fit_params which are passed to the underlying estimators via their fit methods. In addition, the predict, predict_proba, predict_log_proba, score and decision_function methods also accept **params which are passed to the underlying estimators via their respective methods. #28494 by Adam Li.
Feature model_selection.permutation_test_score now supports metadata routing for the fit method of its estimator and for its underlying CV splitter and scorer. #29266 by Adam Li.
Feature feature_selection.RFE and feature_selection.RFECV now support metadata routing. #29312 by Omar Salman.
Dropping support for building with setuptools#
From scikit-learn 1.6 onwards, support for building with setuptools has been removed. Meson is the only supported way to build scikit-learn; see Building from source for more details.
Dropping official support for PyPy#
Due to limited maintainer resources and a small number of users, official PyPy support has been dropped. Some parts of scikit-learn may still work, but PyPy is no longer tested in the scikit-learn Continuous Integration. #29128 by Loïc Estève.
Changelog#
sklearn.base
#
Enhancement Added a function base.is_clusterer which determines whether a given estimator is a clusterer. #28936 by Christian Veenhuis.
sklearn.cluster
#
API Change The copy parameter of cluster.Birch was deprecated in 1.6 and will be removed in 1.8. It has no effect as the estimator does not perform in-place operations on the input data. #29124 by Yao Xiao.
sklearn.compose
#
Enhancement The verbose_feature_names_out parameter of sklearn.compose.ColumnTransformer now accepts a format string or a callable to generate feature names. #28934 by Marc Bresson.
sklearn.cross_decomposition
#
Fix cross_decomposition.PLSRegression properly raises an error when n_components is larger than n_samples. #29710 by Thomas Fan.
sklearn.datasets
#
Feature datasets.fetch_file allows downloading an arbitrary data file from the web. It handles local caching, integrity checks with SHA256 digests and automatic retries in case of HTTP errors. #29354 by Olivier Grisel.
sklearn.decomposition
#
Fix Increase the rank deficiency threshold in the whitening step of decomposition.FastICA with whiten_solver="eigh" to improve the platform-agnosticity of the estimator. #29612 by Olivier Grisel.
sklearn.discriminant_analysis
#
Fix discriminant_analysis.QuadraticDiscriminantAnalysis now emits a LinAlgWarning in case of collinear variables. These warnings can be silenced using the reg_param parameter. #19731 by Alihan Zihna.
sklearn.ensemble
#
Efficiency Small runtime improvement of fitting ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor by parallelizing the initial search for bin thresholds. #28064 by Christian Lorentzen.
Enhancement The verbosity of ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor is now more granular: verbose = 1 prints only summary messages, while verbose >= 2 prints the full information as before. #28179 by Christian Lorentzen.
Efficiency ensemble.IsolationForest now runs parallel jobs during predict using joblib, offering a speedup of up to 2-4x on sample sizes larger than 2000. #28622 by Adam Li and Sérgio Pereira.
Feature ensemble.ExtraTreesClassifier and ensemble.ExtraTreesRegressor now support missing values in the data matrix X. Missing values are handled by randomly moving all of the samples with missing values to the left or right child node as the tree is traversed. #28268 by Adam Li.
sklearn.impute
#
Fix impute.KNNImputer excludes samples with nan distances when computing the mean value for uniform weights. #29135 by Xuefeng Xu.
sklearn.linear_model
#
API Change Deprecates copy_X in linear_model.TheilSenRegressor as the parameter has no effect. copy_X will be removed in 1.8. #29105 by Adam Li.
sklearn.manifold
#
Efficiency manifold.locally_linear_embedding and manifold.LocallyLinearEmbedding now allocate memory for sparse matrices more efficiently in the Hessian, Modified and LTSA methods. #28096 by Giorgio Angelotti.
sklearn.metrics
#
Enhancement sklearn.metrics.check_scoring now accepts raise_exc to specify whether to raise an exception if a subset of the scorers in multimetric scoring fails or to return an error code. #28992 by Stefanie Senger.
Enhancement Adds zero_division to cohen_kappa_score. When there is a division by zero, the metric is undefined and this value is returned. #29210 by Marc Torrellas Socastro and Stefanie Senger.
API Change scoring="neg_max_error" should be used instead of scoring="max_error", which is now deprecated. #29462 by Farid “Freddie” Taba.
API Change The force_all_finite parameter of functions metrics.pairwise.check_pairwise_arrays and metrics.pairwise_distances is renamed into ensure_all_finite. force_all_finite will be removed in 1.8. #29404 by Jérémie du Boisberranger.
sklearn.model_selection
#
Enhancement Add the parameter prefit to model_selection.FixedThresholdClassifier allowing the use of a pre-fitted estimator without re-fitting it. #29067 by Guillaume Lemaitre.
Fix Improve error message when model_selection.RepeatedStratifiedKFold.split is called without a y argument. #29402 by Anurag Varma.
sklearn.neighbors
#
Fix neighbors.LocalOutlierFactor raises a warning in the fit method when duplicate values in the training data lead to inaccurate outlier detection. #28773 by Henrique Caroço.
sklearn.preprocessing
#
Enhancement The HTML representation of preprocessing.FunctionTransformer will show the function name in the label. #29158 by Yao Xiao.
Fix preprocessing.PowerTransformer now uses scipy.special.inv_boxcox to output nan if the input of BoxCox’s inverse is invalid. #27875 by Xuefeng Xu.
sklearn.semi_supervised
#
API Change semi_supervised.SelfTrainingClassifier deprecated the base_estimator parameter in favor of estimator. #28494 by Adam Li.
sklearn.tree
#
Feature tree.ExtraTreeClassifier and tree.ExtraTreeRegressor now support missing values in the data matrix X. Missing values are handled by randomly moving all of the samples with missing values to the left or right child node as the tree is traversed. #27966 by Adam Li.
sklearn.utils
#
Enhancement utils.validation.check_array now accepts ensure_non_negative to check for negative values in the passed array, which until now was only available by calling utils.validation.check_non_negative. #29540 by Tamara Atanasoska.
API Change The force_all_finite parameter of functions utils.check_array, utils.check_X_y and utils.as_float_array is renamed into ensure_all_finite. force_all_finite will be removed in 1.8. #29404 by Jérémie du Boisberranger.
Code and documentation contributors
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.5, including:
TODO: update at the time of the release.