Commit 79a79f1

Add tests and documentation for column feature
1 parent 7df87bb commit 79a79f1

2 files changed: 43 additions, 0 deletions

README.rst

Lines changed: 21 additions & 0 deletions
@@ -103,6 +103,18 @@ Now that the transformation is trained, we confirm that it works on new data::
     array([[ 1.  ,  0.  ,  0.  ,  1.04]])
 
 
+Output feature names
+********************
+
+In certain cases, like when studying the feature importances for some model,
+we want to be able to associate the original features with the ones generated by
+the dataframe mapper. We can do so by inspecting the automatically generated
+``transformed_names_`` attribute of the mapper after transformation::
+
+    >>> mapper.transformed_names_
+    ['pet_cat', 'pet_dog', 'pet_fish', 'children']
+
+
 Outputting a dataframe
 **********************
 
@@ -123,6 +135,9 @@ By default the output of the dataframe mapper is a numpy array. This is so becau
     6 1.0 0.0 0.0 1.04
     7 0.0 0.0 1.0 0.21
 
+The column names are the same as the ones in the ``transformed_names_``
+attribute.
+
 Note this does not work together with the ``default=True`` or ``sparse=True`` arguments to the mapper.
 
 Transform Multiple Columns
@@ -252,6 +267,11 @@ Sklearn-pandas' ``cross_val_score`` function provides exactly the same interface
 Changelog
 ---------
 
+Development
+***********
+* Capture the generated output column names in the ``transformed_names_`` attribute (#78).
+
+
 1.3.0 (2017-01-21)
 ******************
 
@@ -308,5 +328,6 @@ Other contributors:
 * Jeremy Howard
 * Olivier Grisel
 * Paul Butler
+* Ritesh Agrawal
 * Vitaley Zaretskey
 * Zac Stewart
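
The README example in this diff suggests a naming pattern: a column mapped to a single output keeps its name, while a column expanded into several outputs gets one name per class (`pet_cat`, `pet_dog`, `pet_fish`). A minimal sketch of that pattern, together with the feature-importance use case the README mentions, might look like the following. The helper `derive_names` and the importance values are hypothetical and made up for illustration; this is not the library's implementation.

```python
def derive_names(column, classes=None):
    """Hypothetical helper: output names for one mapped column.

    Assumes one output per class when `classes` is given, otherwise a
    single output that keeps the original column name.
    """
    if classes is None:
        return [column]
    return ['{}_{}'.format(column, c) for c in classes]


# Names matching the README example output.
names = derive_names('pet', classes=['cat', 'dog', 'fish']) + derive_names('children')

# Made-up importances, one per output column, as a fitted model might report.
importances = [0.10, 0.25, 0.05, 0.60]

# Pair each generated name with its importance, most important first.
ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print('{:<10} {:.2f}'.format(name, score))
```

With a real mapper, `names` would presumably be read from `mapper.transformed_names_` after calling `fit_transform`, and `importances` from the fitted model.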

tests/test_dataframe_mapper.py

Lines changed: 22 additions & 0 deletions
@@ -96,6 +96,28 @@ def complex_dataframe():
                          'feat2': [1, 2, 3, 2, 3, 4]})
 
 
+def test_transformed_names_simple(simple_dataframe):
+    """
+    Get transformed names of features in `transformed_names_` attribute
+    for simple transformation
+    """
+    df = simple_dataframe
+    mapper = DataFrameMapper([('a', None)])
+    mapper.fit_transform(df)
+    assert mapper.transformed_names_ == ['a']
+
+
+def test_transformed_names_binarizer(complex_dataframe):
+    """
+    Get transformed names of features in `transformed_names_` attribute
+    for a transformation that multiplies the number of columns
+    """
+    df = complex_dataframe
+    mapper = DataFrameMapper([('target', LabelBinarizer())])
+    mapper.fit_transform(df)
+    assert mapper.transformed_names_ == ['target_a', 'target_b']
+
+
 def test_simple_df(simple_dataframe):
     """
     Get a dataframe from a simple mapped dataframe
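
Both new tests compare `transformed_names_` against an expected list. Such a comparison only checks anything when it is the operand of `assert`; written as a bare statement, Python evaluates it and discards the result. The sketch below illustrates the difference with two hypothetical functions and a deliberately wrong list of names; none of these names come from the test suite.

```python
def bare_comparison_check(names):
    # The comparison is evaluated and its result thrown away,
    # so this function can never fail, whatever `names` contains.
    names == ['target_a', 'target_b']


def asserted_check(names):
    # With `assert`, a mismatch raises AssertionError and the test fails.
    assert names == ['target_a', 'target_b']


wrong_names = ['target']  # deliberately does not match the expected list

bare_comparison_check(wrong_names)  # returns silently despite the mismatch

try:
    asserted_check(wrong_names)
    caught = False
except AssertionError:
    caught = True

print('mismatch detected by assert:', caught)
```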
