Skip to content

BUG: Fixed handling of non-list value_vars in melt #15351

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from

Conversation

jbradt
Copy link

@jbradt jbradt commented Feb 8, 2017

The value_vars argument of melt is now cast to list like the id_vars argument. This works for tuples, lists, ndarrays, and scalars.

@@ -56,6 +56,20 @@ def test_value_vars(self):
columns=['id1', 'id2', 'variable', 'value'])
tm.assert_frame_equal(result4, expected4)

def test_value_vars_types(self):
result_with_tuple = melt(self.df, id_vars=['id1', 'id2'], value_vars=('A', 'B'))
result_with_array = melt(self.df, id_vars=['id1', 'id2'], value_vars=np.array(['A', 'B']))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can do the result checking in a loop, use a list as well

@@ -771,6 +771,8 @@ def melt(frame, id_vars=None, value_vars=None, var_name=None,
if value_vars is not None:
if not isinstance(value_vars, (tuple, list, np.ndarray)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this checking (and in the clause above) should be not is_list_like(value_vars)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add a whatsnew note (0.20.0) bug fix, minor changes. ping on green.

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Feb 8, 2017
@codecov-io
Copy link

codecov-io commented Feb 9, 2017

Codecov Report

Merging #15351 into master will increase coverage by 3.04%.

@@            Coverage Diff             @@
##           master   #15351      +/-   ##
==========================================
+ Coverage    88.7%   91.75%   +3.04%     
==========================================
  Files         134      135       +1     
  Lines       49489    49439      -50     
==========================================
+ Hits        43901    45364    +1463     
+ Misses       5588     4075    -1513
Impacted Files Coverage Δ
pandas/core/reshape.py 99.25% <100%> (ø)
pandas/init.py 91.66% <ø> (-0.65%)
pandas/tseries/index.py 95.32% <ø> (-0.1%)
pandas/core/internals.py 94.15% <ø> (-0.04%)
pandas/util/nosetester.py
pandas/conftest.py 75% <ø> (ø)
pandas/util/_tester.py 35.29% <ø> (ø)
pandas/core/generic.py 96.39% <ø> (+0.05%)
pandas/formats/format.py 95.38% <ø> (+0.06%)
pandas/io/excel.py 79.54% <ø> (+0.15%)
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e884072...a2f2510. Read the comment docs.

@jbradt
Copy link
Author

jbradt commented Feb 9, 2017

I've addressed your requests.

One comment: in reshape.py, the type check is now

if not is_list_like(id_vars):
    id_vars = [id_vars]
else:
    id_vars = list(id_vars)

(and the same for value_vars). This seems a bit redundant to me since, unless I'm mistaken, list would work for both. But I thought I'd ask in case I'm missing something.

@jreback
Copy link
Contributor

jreback commented Feb 9, 2017

@jbradt no, this is quite common to do. We need a list result here even if a scalar is passed in, but you can't simply put list() around everything, e.g. things like datetimes or strings would be misinterpreted.

@jbradt
Copy link
Author

jbradt commented Feb 9, 2017

@jreback Ah, got it. I hadn't thought of strings, etc. Thanks for clearing that up!

@jreback
Copy link
Contributor

jreback commented Feb 9, 2017

@jbradt np. that why we have this special is_list_like to get this kind of consistency w/o having to repeat things all over.

@@ -56,6 +56,19 @@ def test_value_vars(self):
columns=['id1', 'id2', 'variable', 'value'])
tm.assert_frame_equal(result4, expected4)

def test_value_vars_types(self):
expected = DataFrame({'id1': self.df['id1'].tolist() * 2,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the issue number as a comment

@@ -517,3 +517,4 @@ Bug Fixes

- Bug in ``DataFrame.boxplot`` where ``fontsize`` was not applied to the tick labels on both axes (:issue:`15108`)
- Bug in ``Series.replace`` and ``DataFrame.replace`` which failed on empty replacement dicts (:issue:`15289`)
- Bug in ``pd.melt`` where passing a tuple value for ``value_vars`` caused a TypeError (:issue:`15348`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put TypeError in double-back quotes

@jreback jreback added this to the 0.20.0 milestone Feb 9, 2017
@jreback
Copy link
Contributor

jreback commented Feb 9, 2017

just a couple of trvial edits. ping when green (might be a while as travis is doing lots of builds)

@jorisvandenbossche
Copy link
Member

Didn't look into the code, but: shouldn't tuples be reserved for specifying multi-indexed columns?
Not sure if that's very useful in the case of melt, though, as you get tuples in the columns of the resulting frame.

@jbradt
Copy link
Author

jbradt commented Feb 9, 2017

@jreback Changes made. Everything's green.

@jreback
Copy link
Contributor

jreback commented Feb 9, 2017

So from the doc-string examples

In [6]: df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
   ...: ...                    'B': {0: 1, 1: 3, 2: 5},
   ...: ...                    'C': {0: 2, 1: 4, 2: 6}})
   ...: 

In [7]: df
Out[7]: 
   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6

In [9]: df.columns = [list('ABC'), list('DEF')]
   ...: 

In [10]: df
Out[10]: 
   A  B  C
   D  E  F
0  a  1  2
1  b  3  4
2  c  5  6

In [11]: pd.melt?

In [12]: pd.melt(df, id_vars=[('A', 'D')], value_vars=[('B', 'E')])
Out[12]: 
  (A, D) variable_0 variable_1  value
0      a          B          E      1
1      b          B          E      3
2      c          B          E      5

So I think this is fine, a tuple is a-list like and NOT a scalar that selects multiple columns.

Actually you get a terrible error message now. If you pass a tuple with a MI.

In [13]: pd.melt(df, id_vars=[('A', 'D')], value_vars=('B', 'E'))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-db2c189272b2> in <module>()
----> 1 pd.melt(df, id_vars=[('A', 'D')], value_vars=('B', 'E'))

/Users/jreback/pandas/pandas/core/reshape.py in melt(frame, id_vars, value_vars, var_name, value_name, col_level)
    772         if not isinstance(value_vars, (tuple, list, np.ndarray)):
    773             value_vars = [value_vars]
--> 774         frame = frame.loc[:, id_vars + value_vars]
    775     else:
    776         frame = frame.copy()

TypeError: can only concatenate list (not "tuple") to list

@jbradt can you add this example. I think we need a better message here.

@jbradt
Copy link
Author

jbradt commented Feb 9, 2017

@jreback Sure, but I'm not totally sure I understand what you mean. Do you want an example like you wrote above that says that you must pass a list of tuples if you have a MultiIndex? Or do you want me to check in melt if there's a MultiIndex and raise a better error message if so?

Incidentally, the exception I get after my changes is much more heinous than what you showed:

In [8]: pd.melt(df, id_vars=[('A', 'D')], value_vars=('B', 'E'))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-db2c189272b2> in <module>()
----> 1 pd.melt(df, id_vars=[('A', 'D')], value_vars=('B', 'E'))

/Users/josh/Documents/Code/pandas/pandas/core/reshape.py in melt(frame, id_vars, value_vars, var_name, value_name, col_level)
    774         else:
    775             value_vars = list(value_vars)
--> 776         frame = frame.loc[:, id_vars + value_vars]
    777     else:
    778         frame = frame.copy()

/Users/josh/Documents/Code/pandas/pandas/core/indexing.py in __getitem__(self, key)
   1335             except (KeyError, IndexError):
   1336                 pass
-> 1337             return self._getitem_tuple(key)
   1338         else:
   1339             key = com._apply_if_callable(key, self.obj)

/Users/josh/Documents/Code/pandas/pandas/core/indexing.py in _getitem_tuple(self, tup)
    803     def _getitem_tuple(self, tup):
    804         try:
--> 805             return self._getitem_lowerdim(tup)
    806         except IndexingError:
    807             pass

/Users/josh/Documents/Code/pandas/pandas/core/indexing.py in _getitem_lowerdim(self, tup)
    915         # we may have a nested tuples indexer here
    916         if self._is_nested_tuple_indexer(tup):
--> 917             return self._getitem_nested_tuple(tup)
    918
    919         # we maybe be using a tuple to represent multiple dimensions here

/Users/josh/Documents/Code/pandas/pandas/core/indexing.py in _getitem_nested_tuple(self, tup)
    990
    991             current_ndim = obj.ndim
--> 992             obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
    993             axis += 1
    994

/Users/josh/Documents/Code/pandas/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1538                     raise ValueError('Cannot index with multidimensional key')
   1539
-> 1540                 return self._getitem_iterable(key, axis=axis)
   1541
   1542             # nested tuple slicing

/Users/josh/Documents/Code/pandas/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1084                 try:
   1085                     result = self.obj.reindex_axis(keyarr, axis=axis,
-> 1086                                                    level=level)
   1087
   1088                     # this is an error as we are trying to find

/Users/josh/Documents/Code/pandas/pandas/core/frame.py in reindex_axis(self, labels, axis, method, level, copy, limit, fill_value)
   2796                      self).reindex_axis(labels=labels, axis=axis,
   2797                                         method=method, level=level, copy=copy,
-> 2798                                         limit=limit, fill_value=fill_value)
   2799
   2800     @Appender(_shared_docs['rename'] % _shared_doc_kwargs)

/Users/josh/Documents/Code/pandas/pandas/core/generic.py in reindex_axis(self, labels, axis, method, level, copy, limit, fill_value)
   2413         method = missing.clean_reindex_fill_method(method)
   2414         new_index, indexer = axis_values.reindex(labels, method, level,
-> 2415                                                  limit=limit)
   2416         return self._reindex_with_indexers({axis: [new_index, indexer]},
   2417                                            fill_value=fill_value, copy=copy)

/Users/josh/Documents/Code/pandas/pandas/indexes/multi.py in reindex(self, target, method, level, limit, tolerance)
   1563             else:
   1564                 # hopefully?
-> 1565                 target = MultiIndex.from_tuples(target)
   1566
   1567         if (preserve_names and target.nlevels == self.nlevels and

/Users/josh/Documents/Code/pandas/pandas/indexes/multi.py in from_tuples(cls, tuples, sortorder, names)
   1012                 tuples = tuples._values
   1013
-> 1014             arrays = list(lib.tuples_to_object_array(tuples).T)
   1015         elif isinstance(tuples, list):
   1016             arrays = list(lib.to_object_array_tuples(tuples).T)

/Users/josh/Documents/Code/pandas/pandas/src/inference.pyx in pandas.lib.tuples_to_object_array (pandas/lib.c:68717)()
   1457     result = np.empty((n, k), dtype=object)
   1458     for i in range(n):
-> 1459         tup = tuples[i]
   1460         for j in range(k):
   1461             result[i, j] = tup[j]

TypeError: Expected tuple, got str

That also happens if value_vars=(('A', 'B')) since list flattens the tuple.

@jreback
Copy link
Contributor

jreback commented Feb 9, 2017

@jbradt right, I think we should explicity check & raise an better message if a tuple is passed (when you have a MI). It should be a list of tuples, so this is an easy condition to catch (and give a better message).

Joshua Bradt and others added 2 commits February 9, 2017 17:14
Any non-list type for id_vars or value_vars now raises
an error message if the DataFrame has a MultiIndex for
its columns.
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple of minor changes. ping on green.

value_vars = [value_vars]
elif (isinstance(frame.columns, MultiIndex)
and not isinstance(value_vars, list)):
raise TypeError('value_vars must be a list of tuples when'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should be ValueError (both errors)

'value': self.df1[('B', 'b')],
}, columns=[('A', 'a'), 'CAP', 'low', 'value'])

for id_vars in ([('A', 'a')], ('A', 'a')):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separate this in to 2 parts, the success part (e.g. both lists), and the failing parts (the other 3 possibilites)

@jreback
Copy link
Contributor

jreback commented Feb 10, 2017

lgtm. ping on green.

@jreback
Copy link
Contributor

jreback commented Feb 10, 2017

@jbradt there is some noise here because of our switchover to pytest and numpy getting upgraded.

https://travis-ci.org/pandas-dev/pandas/jobs/200158415

but I think you have some linting errors. and use tm.assertRaisesRegex

@jbradt
Copy link
Author

jbradt commented Feb 10, 2017

@jreback I think that's an old build since I already fixed the linting error, but I just pushed a change to use tm.assertRaisesRegexp. Sometimes I forget about Python 2...

@jreback
Copy link
Contributor

jreback commented Feb 10, 2017

@jbradt sure no problem.

@jreback jreback closed this in 61deba5 Feb 10, 2017
@jreback
Copy link
Contributor

jreback commented Feb 10, 2017

thanks @jbradt

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
The value_vars argument of melt is now cast to list like the
id_vars argument.

closes pandas-dev#15348

Author: Joshua Bradt <[email protected]>
Author: Joshua Bradt <[email protected]>

Closes pandas-dev#15351 from jbradt/fix-melt and squashes the following commits:

a2f2510 [Joshua Bradt] Changed to tm.assertRaisesRegexp for Python 2 compat.
3038f64 [Joshua Bradt] Merge remote-tracking branch 'upstream/master' into fix-melt
e907135 [Joshua Bradt] Split test into two parts
20159c1 [Joshua Bradt] Changed exception classes to ValueError.
129d531 [Joshua Bradt] Moved binary operators to satisfy flake8
70d7256 [Joshua Bradt] Merge branch 'master' into fix-melt
455a310 [Joshua Bradt] Tested types when using MultiIndex to ensure they are lists.
7406222 [Joshua Bradt] Fixed formatting. Added comment with issue number to test.
d4c5da3 [Joshua Bradt] Improved type checking and tests. Added whatsnew note.
33728de [Joshua Bradt] BUG: Fixed handling of non-list value_vars in melt
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pd.melt does not allow a tuple for value_vars
4 participants