1 file changed, +1 -25 lines changed

@@ -45,7 +45,7 @@ in the pipeline::

     # X_trans_df is a pandas DataFrame
     X_trans_df = num_preprocessor.fit_transform(X_df)
-
+
     # X_trans_df is again a pandas DataFrame
     X_trans_df = num_preprocessor[0].transform(X_df)

@@ -113,30 +113,6 @@ A list of issues discussing Pandas output are: `#14315
 <https://github.com/scikit-learn/scikit-learn/pull/20100>`__, and `#23001
 <https://github.com/scikit-learn/scikit-learn/issueas/23001>`__.

-Future Extensions
------------------
-For information only!
-Sparse Data
-...........
-
-The Pandas DataFrame is not suitable to provide column names for sparse data
-because it has performance issues as shown in `#16772
-<https://github.com/scikit-learn/scikit-learn/pull/16772#issuecomment-615423097>`__.
-A future extension to this SLEP is to have a ``"pandas_or_namedsparse"`` option.
-This option will use a scikit-learn specific sparse container that subclasses
-SciPy's sparse matrices. This sparse container includes the sparse data, feature
-names and index. This enables pipelines with Vectorizers without performance
-issues::
-
-    pipe = make_pipeline(
-        CountVectorizer(),
-        TfidfTransformer(),
-        LogisticRegression(solver="liblinear")
-    )
-    pipe.set_output(transform="pandas_or_namedsparse")
-
-    # feature names for logistic regression
-    pipe[-1].feature_names_in_

 References and Footnotes
 ------------------------