@@ -240,6 +240,7 @@ in a list::
       [ 0.        ],
       [ 1.22474487]])

+
Columns that don't need any transformation
******************************************
@@ -282,6 +283,59 @@ passing it as the ``default`` argument to the mapper:
Using ``default=False`` (the default) drops unselected columns. Using
``default=None`` passes the unselected columns through unchanged.
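The effect of the ``default`` argument can be pictured with a small pure-Python sketch. This is not the sklearn-pandas implementation: ``map_columns_sketch`` is a hypothetical helper, the data is a plain dict of lists rather than a dataframe, and transformers are ordinary functions applied to a whole column. The callable case assumes, as the text above suggests, that a transformer passed as ``default`` is applied to the unselected columns.

```python
def map_columns_sketch(data, features, default=False):
    """Illustrative stand-in for a column mapper (hypothetical, not library code).

    data     -- dict mapping column name to a list of values
    features -- list of (column, function) pairs; each function maps a list to a list
    default  -- False: drop unselected columns; None: keep them unchanged;
                callable: apply it to every unselected column
    """
    output = {}
    for column, transform in features:
        output[column] = transform(data[column])

    if default is False:
        return output

    selected = {column for column, _ in features}
    for column, values in data.items():
        if column in selected:
            continue
        output[column] = list(values) if default is None else default(values)
    return output


data = {"pet": ["cat", "dog"], "children": [4, 6], "salary": [90, 24]}
features = [("children", lambda values: [v * 2 for v in values])]

dropped = map_columns_sketch(data, features)               # default=False
passed = map_columns_sketch(data, features, default=None)  # pass-through
```

In this sketch ``dropped`` contains only the transformed ``children`` column, while ``passed`` also carries ``pet`` and ``salary`` through untouched.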
+
+ Same transformer for multiple columns
+ *************************************
+
+ Sometimes the same transformation must be applied to several dataframe columns.
+ To simplify this, the package provides the ``gen_features`` function, which accepts a list
+ of columns and a feature transformer class (or a list of classes) and generates a feature
+ definition accepted by ``DataFrameMapper``.
+
+ For example, consider a dataset with three categorical columns, 'col1', 'col2', and 'col3'.
+ To encode each of them, pass the column names and the ``LabelEncoder`` transformer class
+ to the generator, then use the returned definition as the ``features`` argument for ``DataFrameMapper``:
+
+ >>> from sklearn_pandas import gen_features
+ >>> feature_def = gen_features(
+ ...     columns=['col1', 'col2', 'col3'],
+ ...     classes=[sklearn.preprocessing.LabelEncoder]
+ ... )
+ >>> feature_def
+ [('col1', [LabelEncoder()]), ('col2', [LabelEncoder()]), ('col3', [LabelEncoder()])]
+ >>> mapper5 = DataFrameMapper(feature_def)
+ >>> data5 = pd.DataFrame({
+ ...     'col1': ['yes', 'no', 'yes'],
+ ...     'col2': [True, False, False],
+ ...     'col3': ['one', 'two', 'three']
+ ... })
+ >>> mapper5.fit_transform(data5)
+ array([[1, 1, 0],
+        [0, 0, 2],
+        [1, 0, 1]])
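The integers in the output above follow from how label encoding assigns codes: each distinct value gets its index in the sorted list of distinct values, so 'three' sorts before 'two' and receives the smaller code. A minimal pure-Python sketch of that rule, with ``label_encode_sketch`` as a hypothetical helper rather than the scikit-learn implementation:

```python
def label_encode_sketch(values):
    """Map each distinct value to its index in the sorted list of distinct values."""
    classes = sorted(set(values))
    index = {label: position for position, label in enumerate(classes)}
    return [index[value] for value in values]


# Encode each column independently, as one encoder per column would.
col1 = label_encode_sketch(["yes", "no", "yes"])
col2 = label_encode_sketch([True, False, False])
col3 = label_encode_sketch(["one", "two", "three"])

# Stack the encoded columns side by side to get one row per sample.
rows = [list(row) for row in zip(col1, col2, col3)]
```

Here ``rows`` holds the same values as the array returned by ``mapper5.fit_transform(data5)``.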
+
+ To override some of the transformer parameters, provide a dict with a 'class' key and
+ the transformer parameters. For example, consider a dataset with missing values.
+ The following code overrides the default imputing strategy:
+
+ >>> feature_def = gen_features(
+ ...     columns=[['col1'], ['col2'], ['col3']],
+ ...     classes=[{'class': sklearn.preprocessing.Imputer, 'strategy': 'most_frequent'}]
+ ... )
+ >>> mapper6 = DataFrameMapper(feature_def)
+ >>> data6 = pd.DataFrame({
+ ...     'col1': [None, 1, 1, 2, 3],
+ ...     'col2': [True, False, None, None, True],
+ ...     'col3': [0, 0, 0, None, None]
+ ... })
+ >>> mapper6.fit_transform(data6)
+ array([[ 1.,  1.,  0.],
+        [ 1.,  0.,  0.],
+        [ 1.,  1.,  0.],
+        [ 2.,  1.,  0.],
+        [ 3.,  1.,  0.]])
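To see where the filled-in values above come from, the 'most_frequent' strategy can be sketched in plain Python: each missing entry in a column is replaced by that column's most common observed value. ``impute_most_frequent_sketch`` is a hypothetical helper, not the scikit-learn code, and breaking ties by taking the smallest value is an assumption of this sketch.

```python
from collections import Counter


def impute_most_frequent_sketch(values):
    """Replace None entries with the most common non-missing value of the column."""
    observed = [float(v) for v in values if v is not None]
    counts = Counter(observed)
    highest = max(counts.values())
    # Assumption of this sketch: ties are broken by taking the smallest value.
    fill = min(value for value, count in counts.items() if count == highest)
    return [fill if v is None else float(v) for v in values]


col1 = impute_most_frequent_sketch([None, 1, 1, 2, 3])
col2 = impute_most_frequent_sketch([True, False, None, None, True])
col3 = impute_most_frequent_sketch([0, 0, 0, None, None])

# One row per sample, matching the layout of the mapper output.
rows = [list(row) for row in zip(col1, col2, col3)]
```

The resulting ``rows`` match the array printed for ``mapper6``. Note that ``sklearn.preprocessing.Imputer`` was removed in scikit-learn 0.22; newer code would use ``sklearn.impute.SimpleImputer``, which also supports ``strategy='most_frequent'``.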
+
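The expansion that a generator such as ``gen_features`` performs can be sketched as follows. This is a hypothetical re-creation for illustration, not the library source: ``gen_features_sketch`` and ``DummyImputer`` are invented names, and the sketch only covers the two forms described in this section (a bare class, or a dict with a 'class' key plus parameters).

```python
def gen_features_sketch(columns, classes):
    """Build a (column, [transformer, ...]) pair for every column (illustrative only)."""
    feature_defs = []
    for column in columns:
        transformers = []
        for definition in classes:
            if isinstance(definition, dict):
                params = dict(definition)  # copy so the caller's dict is untouched
                transformer_class = params.pop("class")
                transformers.append(transformer_class(**params))
            else:
                transformers.append(definition())
        feature_defs.append((column, transformers))
    return feature_defs


class DummyImputer:
    """Stand-in transformer used only to keep the example dependency-free."""

    def __init__(self, strategy="mean"):
        self.strategy = strategy


feature_def = gen_features_sketch(
    columns=[["col1"], ["col2"]],
    classes=[{"class": DummyImputer, "strategy": "most_frequent"}],
)
```

In this sketch every column receives its own freshly constructed transformer instances, so state fitted on one column is never shared with another.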
+
Feature selection and other supervised transformations
******************************************************