.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/ensemble/plot_forest_importances_faces.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_ensemble_plot_forest_importances_faces.py>`
        to download the full example code or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_ensemble_plot_forest_importances_faces.py:


=================================================
Pixel importances with a parallel forest of trees
=================================================

This example shows the use of a forest of trees to evaluate the impurity-based
importance of the pixels in an image classification task on the faces
dataset. The hotter the pixel, the more important it is.

The code below also illustrates how the construction of the forest and the
computation of the predictions can be parallelized across multiple jobs.

.. GENERATED FROM PYTHON SOURCE LINES 14-18

.. code-block:: Python


    # Authors: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause

.. GENERATED FROM PYTHON SOURCE LINES 19-27

Loading the data and model fitting
----------------------------------
First, we load the Olivetti faces dataset and limit it to the first five
classes. Then we train a random forest on the dataset and evaluate the
impurity-based feature importance. One drawback of this method is that it
cannot be evaluated on a separate test set. For this example, we are
interested in representing the information learned from the full dataset.
We also set the number of cores to use for these tasks.

.. GENERATED FROM PYTHON SOURCE LINES 27-29

.. code-block:: Python

    from sklearn.datasets import fetch_olivetti_faces

.. GENERATED FROM PYTHON SOURCE LINES 30-32

We select the number of cores to use to perform parallel fitting of
the forest model. `-1` means use all available cores.

.. GENERATED FROM PYTHON SOURCE LINES 32-34

.. code-block:: Python

    n_jobs = -1

.. GENERATED FROM PYTHON SOURCE LINES 35-36

Load the faces dataset.

.. GENERATED FROM PYTHON SOURCE LINES 36-39

.. code-block:: Python

    data = fetch_olivetti_faces()
    X, y = data.data, data.target

.. GENERATED FROM PYTHON SOURCE LINES 40-41

Limit the dataset to 5 classes.

.. GENERATED FROM PYTHON SOURCE LINES 41-45

.. code-block:: Python

    mask = y < 5
    X = X[mask]
    y = y[mask]

.. GENERATED FROM PYTHON SOURCE LINES 46-47

A random forest classifier will be fitted to compute the feature importances.

.. GENERATED FROM PYTHON SOURCE LINES 47-53

.. code-block:: Python

    from sklearn.ensemble import RandomForestClassifier

    forest = RandomForestClassifier(n_estimators=750, n_jobs=n_jobs, random_state=42)

    forest.fit(X, y)

.. raw:: html

    RandomForestClassifier(n_estimators=750, n_jobs=-1, random_state=42)



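The effect of `n_jobs` described above can be checked with a minimal sketch.
It assumes small synthetic stand-in data rather than the faces dataset, and
the names `X_demo`, `y_demo` and `demo_forest` are hypothetical. It fits the
same forest with one core and with all cores, times both fits, and compares
the resulting importances. Timings depend on the machine, so none are shown
here.

.. code-block:: Python

    import time

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic stand-in data (an assumption, not the faces dataset).
    rng = np.random.RandomState(0)
    X_demo = rng.rand(200, 64)
    y_demo = rng.randint(0, 5, size=200)

    timings = {}
    importances_by_jobs = {}
    for jobs in (1, -1):
        demo_forest = RandomForestClassifier(
            n_estimators=100, n_jobs=jobs, random_state=42
        )
        tic = time.time()
        demo_forest.fit(X_demo, y_demo)
        timings[jobs] = time.time() - tic
        importances_by_jobs[jobs] = demo_forest.feature_importances_

    for jobs, elapsed in timings.items():
        print(f"n_jobs={jobs}: fitted in {elapsed:.3f} seconds")

    # With a fixed random_state, n_jobs should only change the run time,
    # not the fitted forest.
    same = np.allclose(importances_by_jobs[1], importances_by_jobs[-1])
    print(f"Importances agree across n_jobs settings: {same}")

On small inputs like this one, the overhead of dispatching jobs can outweigh
the gain, so the parallel fit is not necessarily faster.
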
.. GENERATED FROM PYTHON SOURCE LINES 54-64

Feature importance based on mean decrease in impurity (MDI)
-----------------------------------------------------------
Feature importances are provided by the fitted attribute
`feature_importances_`. They are computed by accumulating the impurity
decrease within each tree and averaging the result over the trees of the
forest.

.. warning::
    Impurity-based feature importances can be misleading for **high
    cardinality** features (many unique values). See
    :ref:`permutation_importance` as an alternative.

.. GENERATED FROM PYTHON SOURCE LINES 64-80

.. code-block:: Python

    import time

    import matplotlib.pyplot as plt

    start_time = time.time()
    img_shape = data.images[0].shape
    importances = forest.feature_importances_
    elapsed_time = time.time() - start_time

    print(f"Elapsed time to compute the importances: {elapsed_time:.3f} seconds")
    imp_reshaped = importances.reshape(img_shape)
    plt.matshow(imp_reshaped, cmap=plt.cm.hot)
    plt.title("Pixel importances using impurity values")
    plt.colorbar()
    plt.show()



.. image-sg:: /auto_examples/ensemble/images/sphx_glr_plot_forest_importances_faces_001.png
   :alt: Pixel importances using impurity values
   :srcset: /auto_examples/ensemble/images/sphx_glr_plot_forest_importances_faces_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Elapsed time to compute the importances: 0.144 seconds

.. GENERATED FROM PYTHON SOURCE LINES 81-82

Can you still recognize a face?

.. GENERATED FROM PYTHON SOURCE LINES 84-93

The limitations of MDI are not a problem for this dataset because:

1. All features are (ordered) numeric and will thus not suffer from the
   cardinality bias.
2. We are only interested in representing the knowledge that the forest
   acquired on the training set.

If these two conditions are not met, it is recommended to instead use
the :func:`~sklearn.inspection.permutation_importance`.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.153 seconds)

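
For cases where the two conditions above do not hold, the following is a
minimal sketch of :func:`~sklearn.inspection.permutation_importance` evaluated
on a held-out test set. It is not part of the original example. The toy data
and the names `X_toy`, `y_toy` and `toy_forest` are assumptions made for
illustration: only the first feature carries signal, so it should come out
as the most important one.

.. code-block:: Python

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    # Toy data (an assumption): the label depends on the first feature only.
    rng = np.random.RandomState(0)
    X_toy = rng.rand(300, 10)
    y_toy = (X_toy[:, 0] > 0.5).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        X_toy, y_toy, random_state=0
    )

    toy_forest = RandomForestClassifier(n_estimators=50, random_state=0)
    toy_forest.fit(X_train, y_train)

    # Shuffle each feature of the test set in turn and record the score drop.
    result = permutation_importance(
        toy_forest, X_test, y_test, n_repeats=5, random_state=0
    )
    best = int(np.argmax(result.importances_mean))
    print(f"Most important feature index: {best}")

Unlike MDI, this measure is computed from predictions on data the forest did
not see during fitting, at the cost of re-scoring the model once per feature
and per repeat.
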
.. _sphx_glr_download_auto_examples_ensemble_plot_forest_importances_faces.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/ensemble/plot_forest_importances_faces.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_forest_importances_faces.ipynb <plot_forest_importances_faces.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_forest_importances_faces.py <plot_forest_importances_faces.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_forest_importances_faces.zip <plot_forest_importances_faces.zip>`


.. include:: plot_forest_importances_faces.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_