BUG: Fixed handling of non-list value_vars in melt #15351

jbradt · 2017-02-08T18:27:15Z

The value_vars argument of melt is now cast to list like the id_vars argument. This works for tuples, lists, ndarrays, and scalars.

closes pd.melt does not allow a tuple for value_vars #15348
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

Closes pandas-dev#15348.

jreback · 2017-02-08T20:24:54Z

pandas/tests/test_reshape.py

@@ -56,6 +56,20 @@ def test_value_vars(self):
                              columns=['id1', 'id2', 'variable', 'value'])
        tm.assert_frame_equal(result4, expected4)

+    def test_value_vars_types(self):
+        result_with_tuple = melt(self.df, id_vars=['id1', 'id2'], value_vars=('A', 'B'))
+        result_with_array = melt(self.df, id_vars=['id1', 'id2'], value_vars=np.array(['A', 'B']))


you can do the result checking in a loop, use a list as well

jreback · 2017-02-08T20:27:07Z

pandas/core/reshape.py

@@ -771,6 +771,8 @@ def melt(frame, id_vars=None, value_vars=None, var_name=None,
    if value_vars is not None:
        if not isinstance(value_vars, (tuple, list, np.ndarray)):


this checking (and in the clause above) should be not is_list_like(value_vars)

jreback

pls add a whatsnew note (0.20.0) bug fix, minor changes. ping on green.

codecov-io · 2017-02-09T00:33:31Z

Codecov Report

Merging #15351 into master will increase coverage by 3.04%.

@@            Coverage Diff             @@
##           master   #15351      +/-   ##
==========================================
+ Coverage    88.7%   91.75%   +3.04%     
==========================================
  Files         134      135       +1     
  Lines       49489    49439      -50     
==========================================
+ Hits        43901    45364    +1463     
+ Misses       5588     4075    -1513

Impacted Files	Coverage Δ
pandas/core/reshape.py	`99.25% <100%> (ø)`	✅
pandas/init.py	`91.66% <ø> (-0.65%)`	❌
pandas/tseries/index.py	`95.32% <ø> (-0.1%)`	❌
pandas/core/internals.py	`94.15% <ø> (-0.04%)`	❌
pandas/util/nosetester.py
pandas/conftest.py	`75% <ø> (ø)`
pandas/util/_tester.py	`35.29% <ø> (ø)`
pandas/core/generic.py	`96.39% <ø> (+0.05%)`	✅
pandas/formats/format.py	`95.38% <ø> (+0.06%)`	✅
pandas/io/excel.py	`79.54% <ø> (+0.15%)`	✅
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e884072...a2f2510. Read the comment docs.

jbradt · 2017-02-09T00:37:50Z

I've addressed your requests.

One comment: in reshape.py, the type check is now

if not is_list_like(id_vars):
    id_vars = [id_vars]
else:
    id_vars = list(id_vars)

(and the same for value_vars). This seems a bit redundant to me since, unless I'm mistaken, list would work for both. But I thought I'd ask in case I'm missing something.

jreback · 2017-02-09T00:49:28Z

@jbradt no, this is quite common to do. We need a list result here even if a scalar is passed in, but you can't simply put list() around everything, e.g. things like datetimes or strings would be misinterpreted.

jbradt · 2017-02-09T00:52:51Z

@jreback Ah, got it. I hadn't thought of strings, etc. Thanks for clearing that up!

jreback · 2017-02-09T01:13:48Z

@jbradt np. that why we have this special is_list_like to get this kind of consistency w/o having to repeat things all over.

jreback · 2017-02-09T01:14:50Z

pandas/tests/test_reshape.py

@@ -56,6 +56,19 @@ def test_value_vars(self):
                              columns=['id1', 'id2', 'variable', 'value'])
        tm.assert_frame_equal(result4, expected4)

+    def test_value_vars_types(self):
+        expected = DataFrame({'id1': self.df['id1'].tolist() * 2,


can you add the issue number as a comment

jreback · 2017-02-09T01:15:10Z

doc/source/whatsnew/v0.20.0.txt

@@ -517,3 +517,4 @@ Bug Fixes

 - Bug in ``DataFrame.boxplot`` where ``fontsize`` was not applied to the tick labels on both axes (:issue:`15108`)
 - Bug in ``Series.replace`` and ``DataFrame.replace`` which failed on empty replacement dicts (:issue:`15289`)
+- Bug in ``pd.melt`` where passing a tuple value for ``value_vars`` caused a TypeError (:issue:`15348`)


put TypeError in double-back quotes

jreback · 2017-02-09T01:15:48Z

just a couple of trvial edits. ping when green (might be a while as travis is doing lots of builds)

jorisvandenbossche · 2017-02-09T09:21:24Z

Didn't look into the code, but: shouldn't tuples be reserved for specifying multi-indexed columns?
Not sure if that's very useful in the case of melt, though, as you get tuples in the columns of the resulting frame.

jbradt · 2017-02-09T15:34:19Z

@jreback Changes made. Everything's green.

jreback · 2017-02-09T16:02:08Z

So from the doc-string examples

In [6]: df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
   ...: ...                    'B': {0: 1, 1: 3, 2: 5},
   ...: ...                    'C': {0: 2, 1: 4, 2: 6}})
   ...: 

In [7]: df
Out[7]: 
   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6

In [9]: df.columns = [list('ABC'), list('DEF')]
   ...: 

In [10]: df
Out[10]: 
   A  B  C
   D  E  F
0  a  1  2
1  b  3  4
2  c  5  6

In [11]: pd.melt?

In [12]: pd.melt(df, id_vars=[('A', 'D')], value_vars=[('B', 'E')])
Out[12]: 
  (A, D) variable_0 variable_1  value
0      a          B          E      1
1      b          B          E      3
2      c          B          E      5

So I think this is fine, a tuple is a-list like and NOT a scalar that selects multiple columns.

Actually you get a terrible error message now. If you pass a tuple with a MI.

In [13]: pd.melt(df, id_vars=[('A', 'D')], value_vars=('B', 'E'))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-db2c189272b2> in <module>()
----> 1 pd.melt(df, id_vars=[('A', 'D')], value_vars=('B', 'E'))

/Users/jreback/pandas/pandas/core/reshape.py in melt(frame, id_vars, value_vars, var_name, value_name, col_level)
    772         if not isinstance(value_vars, (tuple, list, np.ndarray)):
    773             value_vars = [value_vars]
--> 774         frame = frame.loc[:, id_vars + value_vars]
    775     else:
    776         frame = frame.copy()

TypeError: can only concatenate list (not "tuple") to list

@jbradt can you add this example. I think we need a better message here.

jbradt · 2017-02-09T19:01:55Z

@jreback Sure, but I'm not totally sure I understand what you mean. Do you want an example like you wrote above that says that you must pass a list of tuples if you have a MultiIndex? Or do you want me to check in melt if there's a MultiIndex and raise a better error message if so?

Incidentally, the exception I get after my changes is much more heinous than what you showed:

In [8]: pd.melt(df, id_vars=[('A', 'D')], value_vars=('B', 'E'))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-db2c189272b2> in <module>()
----> 1 pd.melt(df, id_vars=[('A', 'D')], value_vars=('B', 'E'))

/Users/josh/Documents/Code/pandas/pandas/core/reshape.py in melt(frame, id_vars, value_vars, var_name, value_name, col_level)
    774         else:
    775             value_vars = list(value_vars)
--> 776         frame = frame.loc[:, id_vars + value_vars]
    777     else:
    778         frame = frame.copy()

/Users/josh/Documents/Code/pandas/pandas/core/indexing.py in __getitem__(self, key)
   1335             except (KeyError, IndexError):
   1336                 pass
-> 1337             return self._getitem_tuple(key)
   1338         else:
   1339             key = com._apply_if_callable(key, self.obj)

/Users/josh/Documents/Code/pandas/pandas/core/indexing.py in _getitem_tuple(self, tup)
    803     def _getitem_tuple(self, tup):
    804         try:
--> 805             return self._getitem_lowerdim(tup)
    806         except IndexingError:
    807             pass

/Users/josh/Documents/Code/pandas/pandas/core/indexing.py in _getitem_lowerdim(self, tup)
    915         # we may have a nested tuples indexer here
    916         if self._is_nested_tuple_indexer(tup):
--> 917             return self._getitem_nested_tuple(tup)
    918
    919         # we maybe be using a tuple to represent multiple dimensions here

/Users/josh/Documents/Code/pandas/pandas/core/indexing.py in _getitem_nested_tuple(self, tup)
    990
    991             current_ndim = obj.ndim
--> 992             obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
    993             axis += 1
    994

/Users/josh/Documents/Code/pandas/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1538                     raise ValueError('Cannot index with multidimensional key')
   1539
-> 1540                 return self._getitem_iterable(key, axis=axis)
   1541
   1542             # nested tuple slicing

/Users/josh/Documents/Code/pandas/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1084                 try:
   1085                     result = self.obj.reindex_axis(keyarr, axis=axis,
-> 1086                                                    level=level)
   1087
   1088                     # this is an error as we are trying to find

/Users/josh/Documents/Code/pandas/pandas/core/frame.py in reindex_axis(self, labels, axis, method, level, copy, limit, fill_value)
   2796                      self).reindex_axis(labels=labels, axis=axis,
   2797                                         method=method, level=level, copy=copy,
-> 2798                                         limit=limit, fill_value=fill_value)
   2799
   2800     @Appender(_shared_docs['rename'] % _shared_doc_kwargs)

/Users/josh/Documents/Code/pandas/pandas/core/generic.py in reindex_axis(self, labels, axis, method, level, copy, limit, fill_value)
   2413         method = missing.clean_reindex_fill_method(method)
   2414         new_index, indexer = axis_values.reindex(labels, method, level,
-> 2415                                                  limit=limit)
   2416         return self._reindex_with_indexers({axis: [new_index, indexer]},
   2417                                            fill_value=fill_value, copy=copy)

/Users/josh/Documents/Code/pandas/pandas/indexes/multi.py in reindex(self, target, method, level, limit, tolerance)
   1563             else:
   1564                 # hopefully?
-> 1565                 target = MultiIndex.from_tuples(target)
   1566
   1567         if (preserve_names and target.nlevels == self.nlevels and

/Users/josh/Documents/Code/pandas/pandas/indexes/multi.py in from_tuples(cls, tuples, sortorder, names)
   1012                 tuples = tuples._values
   1013
-> 1014             arrays = list(lib.tuples_to_object_array(tuples).T)
   1015         elif isinstance(tuples, list):
   1016             arrays = list(lib.to_object_array_tuples(tuples).T)

/Users/josh/Documents/Code/pandas/pandas/src/inference.pyx in pandas.lib.tuples_to_object_array (pandas/lib.c:68717)()
   1457     result = np.empty((n, k), dtype=object)
   1458     for i in range(n):
-> 1459         tup = tuples[i]
   1460         for j in range(k):
   1461             result[i, j] = tup[j]

TypeError: Expected tuple, got str

That also happens if value_vars=(('A', 'B')) since list flattens the tuple.

jreback · 2017-02-09T19:31:23Z

@jbradt right, I think we should explicity check & raise an better message if a tuple is passed (when you have a MI). It should be a list of tuples, so this is an easy condition to catch (and give a better message).

Any non-list type for id_vars or value_vars now raises an error message if the DataFrame has a MultiIndex for its columns.

jreback

couple of minor changes. ping on green.

jreback · 2017-02-09T23:49:53Z

pandas/core/reshape.py

            value_vars = [value_vars]
+        elif (isinstance(frame.columns, MultiIndex)
+              and not isinstance(value_vars, list)):
+            raise TypeError('value_vars must be a list of tuples when'


these should be ValueError (both errors)

jreback · 2017-02-09T23:51:21Z

pandas/tests/test_reshape.py

+            'value': self.df1[('B', 'b')],
+        }, columns=[('A', 'a'), 'CAP', 'low', 'value'])
+
+        for id_vars in ([('A', 'a')], ('A', 'a')):


separate this in to 2 parts, the success part (e.g. both lists), and the failing parts (the other 3 possibilites)

jreback · 2017-02-10T15:35:06Z

lgtm. ping on green.

jreback · 2017-02-10T18:32:58Z

@jbradt there is some noise here because of our switchover to pytest and numpy getting upgraded.

https://travis-ci.org/pandas-dev/pandas/jobs/200158415

but I think you have some linting errors. and use tm.assertRaisesRegex

jbradt · 2017-02-10T18:42:02Z

@jreback I think that's an old build since I already fixed the linting error, but I just pushed a change to use tm.assertRaisesRegexp. Sometimes I forget about Python 2...

jreback · 2017-02-10T18:49:15Z

@jbradt sure no problem.

jreback · 2017-02-10T19:51:12Z

thanks @jbradt

The value_vars argument of melt is now cast to list like the id_vars argument. closes pandas-dev#15348 Author: Joshua Bradt <[email protected]> Author: Joshua Bradt <[email protected]> Closes pandas-dev#15351 from jbradt/fix-melt and squashes the following commits: a2f2510 [Joshua Bradt] Changed to tm.assertRaisesRegexp for Python 2 compat. 3038f64 [Joshua Bradt] Merge remote-tracking branch 'upstream/master' into fix-melt e907135 [Joshua Bradt] Split test into two parts 20159c1 [Joshua Bradt] Changed exception classes to ValueError. 129d531 [Joshua Bradt] Moved binary operators to satisfy flake8 70d7256 [Joshua Bradt] Merge branch 'master' into fix-melt 455a310 [Joshua Bradt] Tested types when using MultiIndex to ensure they are lists. 7406222 [Joshua Bradt] Fixed formatting. Added comment with issue number to test. d4c5da3 [Joshua Bradt] Improved type checking and tests. Added whatsnew note. 33728de [Joshua Bradt] BUG: Fixed handling of non-list value_vars in melt

BUG: Fixed handling of non-list value_vars in melt

33728de

Closes pandas-dev#15348.

jreback reviewed Feb 8, 2017

View reviewed changes

jreback requested changes Feb 8, 2017

View reviewed changes

jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Feb 8, 2017

Improved type checking and tests. Added whatsnew note.

d4c5da3

jreback reviewed Feb 9, 2017

View reviewed changes

jreback added this to the 0.20.0 milestone Feb 9, 2017

jreback approved these changes Feb 9, 2017

View reviewed changes

Fixed formatting. Added comment with issue number to test.

7406222

Joshua Bradt and others added 2 commits February 9, 2017 17:14

Tested types when using MultiIndex to ensure they are lists.

455a310

Any non-list type for id_vars or value_vars now raises an error message if the DataFrame has a MultiIndex for its columns.

Merge branch 'master' into fix-melt

70d7256

jreback requested changes Feb 9, 2017

View reviewed changes

Joshua Bradt added 4 commits February 10, 2017 09:26

Moved binary operators to satisfy flake8

129d531

Changed exception classes to ValueError.

20159c1

Split test into two parts

e907135

Merge remote-tracking branch 'upstream/master' into fix-melt

3038f64

Changed to tm.assertRaisesRegexp for Python 2 compat.

a2f2510

jreback closed this in 61deba5 Feb 10, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Fixed handling of non-list value_vars in melt #15351

BUG: Fixed handling of non-list value_vars in melt #15351

jbradt commented Feb 8, 2017 •

edited

Loading

jreback Feb 8, 2017

jreback Feb 8, 2017

jreback left a comment

codecov-io commented Feb 9, 2017 •

edited

Loading

jbradt commented Feb 9, 2017

jreback commented Feb 9, 2017

jbradt commented Feb 9, 2017

jreback commented Feb 9, 2017

jreback Feb 9, 2017

jreback Feb 9, 2017

jreback commented Feb 9, 2017

jorisvandenbossche commented Feb 9, 2017

jbradt commented Feb 9, 2017

jreback commented Feb 9, 2017 •

edited

Loading

jbradt commented Feb 9, 2017

jreback commented Feb 9, 2017

jreback left a comment

jreback Feb 9, 2017

jreback Feb 9, 2017

jreback commented Feb 10, 2017

jreback commented Feb 10, 2017

jbradt commented Feb 10, 2017 •

edited

Loading

jreback commented Feb 10, 2017

jreback commented Feb 10, 2017

		@@ -771,6 +771,8 @@ def melt(frame, id_vars=None, value_vars=None, var_name=None,
		if value_vars is not None:
		if not isinstance(value_vars, (tuple, list, np.ndarray)):

BUG: Fixed handling of non-list value_vars in melt #15351

BUG: Fixed handling of non-list value_vars in melt #15351

Conversation

jbradt commented Feb 8, 2017 • edited Loading

jreback Feb 8, 2017

Choose a reason for hiding this comment

jreback Feb 8, 2017

Choose a reason for hiding this comment

jreback left a comment

Choose a reason for hiding this comment

codecov-io commented Feb 9, 2017 • edited Loading

Codecov Report

jbradt commented Feb 9, 2017

jreback commented Feb 9, 2017

jbradt commented Feb 9, 2017

jreback commented Feb 9, 2017

jreback Feb 9, 2017

Choose a reason for hiding this comment

jreback Feb 9, 2017

Choose a reason for hiding this comment

jreback commented Feb 9, 2017

jorisvandenbossche commented Feb 9, 2017

jbradt commented Feb 9, 2017

jreback commented Feb 9, 2017 • edited Loading

jbradt commented Feb 9, 2017

jreback commented Feb 9, 2017

jreback left a comment

Choose a reason for hiding this comment

jreback Feb 9, 2017

Choose a reason for hiding this comment

jreback Feb 9, 2017

Choose a reason for hiding this comment

jreback commented Feb 10, 2017

jreback commented Feb 10, 2017

jbradt commented Feb 10, 2017 • edited Loading

jreback commented Feb 10, 2017

jreback commented Feb 10, 2017

jbradt commented Feb 8, 2017 •

edited

Loading

codecov-io commented Feb 9, 2017 •

edited

Loading

jreback commented Feb 9, 2017 •

edited

Loading

jbradt commented Feb 10, 2017 •

edited

Loading