-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Fixed handling of non-list value_vars in melt #15351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/tests/test_reshape.py
Outdated
@@ -56,6 +56,20 @@ def test_value_vars(self): | |||
columns=['id1', 'id2', 'variable', 'value']) | |||
tm.assert_frame_equal(result4, expected4) | |||
|
|||
def test_value_vars_types(self): | |||
result_with_tuple = melt(self.df, id_vars=['id1', 'id2'], value_vars=('A', 'B')) | |||
result_with_array = melt(self.df, id_vars=['id1', 'id2'], value_vars=np.array(['A', 'B'])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can do the result checking in a loop, use a list as well
pandas/core/reshape.py
Outdated
@@ -771,6 +771,8 @@ def melt(frame, id_vars=None, value_vars=None, var_name=None, | |||
if value_vars is not None: | |||
if not isinstance(value_vars, (tuple, list, np.ndarray)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this checking (and in the clause above) should be not is_list_like(value_vars)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add a whatsnew note (0.20.0) bug fix, minor changes. ping on green.
Codecov Report@@ Coverage Diff @@
## master #15351 +/- ##
==========================================
+ Coverage 88.7% 91.75% +3.04%
==========================================
Files 134 135 +1
Lines 49489 49439 -50
==========================================
+ Hits 43901 45364 +1463
+ Misses 5588 4075 -1513
Continue to review full report at Codecov.
|
I've addressed your requests. One comment: in if not is_list_like(id_vars):
id_vars = [id_vars]
else:
id_vars = list(id_vars) (and the same for |
@jbradt no, this is quite common to do. We need a list result here even if a scalar is passed in, but you can't simply put |
@jreback Ah, got it. I hadn't thought of strings, etc. Thanks for clearing that up! |
@jbradt np. that why we have this special |
pandas/tests/test_reshape.py
Outdated
@@ -56,6 +56,19 @@ def test_value_vars(self): | |||
columns=['id1', 'id2', 'variable', 'value']) | |||
tm.assert_frame_equal(result4, expected4) | |||
|
|||
def test_value_vars_types(self): | |||
expected = DataFrame({'id1': self.df['id1'].tolist() * 2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the issue number as a comment
doc/source/whatsnew/v0.20.0.txt
Outdated
@@ -517,3 +517,4 @@ Bug Fixes | |||
|
|||
- Bug in ``DataFrame.boxplot`` where ``fontsize`` was not applied to the tick labels on both axes (:issue:`15108`) | |||
- Bug in ``Series.replace`` and ``DataFrame.replace`` which failed on empty replacement dicts (:issue:`15289`) | |||
- Bug in ``pd.melt`` where passing a tuple value for ``value_vars`` caused a TypeError (:issue:`15348`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put TypeError
in double-back quotes
just a couple of trvial edits. ping when green (might be a while as travis is doing lots of builds) |
Didn't look into the code, but: shouldn't tuples be reserved for specifying multi-indexed columns? |
@jreback Changes made. Everything's green. |
So from the doc-string examples
So I think this is fine, a tuple is a-list like and NOT a scalar that selects multiple columns. Actually you get a terrible error message now. If you pass a tuple with a MI.
@jbradt can you add this example. I think we need a better message here. |
@jreback Sure, but I'm not totally sure I understand what you mean. Do you want an example like you wrote above that says that you must pass a list of tuples if you have a Incidentally, the exception I get after my changes is much more heinous than what you showed: In [8]: pd.melt(df, id_vars=[('A', 'D')], value_vars=('B', 'E'))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-db2c189272b2> in <module>()
----> 1 pd.melt(df, id_vars=[('A', 'D')], value_vars=('B', 'E'))
/Users/josh/Documents/Code/pandas/pandas/core/reshape.py in melt(frame, id_vars, value_vars, var_name, value_name, col_level)
774 else:
775 value_vars = list(value_vars)
--> 776 frame = frame.loc[:, id_vars + value_vars]
777 else:
778 frame = frame.copy()
/Users/josh/Documents/Code/pandas/pandas/core/indexing.py in __getitem__(self, key)
1335 except (KeyError, IndexError):
1336 pass
-> 1337 return self._getitem_tuple(key)
1338 else:
1339 key = com._apply_if_callable(key, self.obj)
/Users/josh/Documents/Code/pandas/pandas/core/indexing.py in _getitem_tuple(self, tup)
803 def _getitem_tuple(self, tup):
804 try:
--> 805 return self._getitem_lowerdim(tup)
806 except IndexingError:
807 pass
/Users/josh/Documents/Code/pandas/pandas/core/indexing.py in _getitem_lowerdim(self, tup)
915 # we may have a nested tuples indexer here
916 if self._is_nested_tuple_indexer(tup):
--> 917 return self._getitem_nested_tuple(tup)
918
919 # we maybe be using a tuple to represent multiple dimensions here
/Users/josh/Documents/Code/pandas/pandas/core/indexing.py in _getitem_nested_tuple(self, tup)
990
991 current_ndim = obj.ndim
--> 992 obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
993 axis += 1
994
/Users/josh/Documents/Code/pandas/pandas/core/indexing.py in _getitem_axis(self, key, axis)
1538 raise ValueError('Cannot index with multidimensional key')
1539
-> 1540 return self._getitem_iterable(key, axis=axis)
1541
1542 # nested tuple slicing
/Users/josh/Documents/Code/pandas/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
1084 try:
1085 result = self.obj.reindex_axis(keyarr, axis=axis,
-> 1086 level=level)
1087
1088 # this is an error as we are trying to find
/Users/josh/Documents/Code/pandas/pandas/core/frame.py in reindex_axis(self, labels, axis, method, level, copy, limit, fill_value)
2796 self).reindex_axis(labels=labels, axis=axis,
2797 method=method, level=level, copy=copy,
-> 2798 limit=limit, fill_value=fill_value)
2799
2800 @Appender(_shared_docs['rename'] % _shared_doc_kwargs)
/Users/josh/Documents/Code/pandas/pandas/core/generic.py in reindex_axis(self, labels, axis, method, level, copy, limit, fill_value)
2413 method = missing.clean_reindex_fill_method(method)
2414 new_index, indexer = axis_values.reindex(labels, method, level,
-> 2415 limit=limit)
2416 return self._reindex_with_indexers({axis: [new_index, indexer]},
2417 fill_value=fill_value, copy=copy)
/Users/josh/Documents/Code/pandas/pandas/indexes/multi.py in reindex(self, target, method, level, limit, tolerance)
1563 else:
1564 # hopefully?
-> 1565 target = MultiIndex.from_tuples(target)
1566
1567 if (preserve_names and target.nlevels == self.nlevels and
/Users/josh/Documents/Code/pandas/pandas/indexes/multi.py in from_tuples(cls, tuples, sortorder, names)
1012 tuples = tuples._values
1013
-> 1014 arrays = list(lib.tuples_to_object_array(tuples).T)
1015 elif isinstance(tuples, list):
1016 arrays = list(lib.to_object_array_tuples(tuples).T)
/Users/josh/Documents/Code/pandas/pandas/src/inference.pyx in pandas.lib.tuples_to_object_array (pandas/lib.c:68717)()
1457 result = np.empty((n, k), dtype=object)
1458 for i in range(n):
-> 1459 tup = tuples[i]
1460 for j in range(k):
1461 result[i, j] = tup[j]
TypeError: Expected tuple, got str That also happens if |
@jbradt right, I think we should explicity check & raise an better message if a tuple is passed (when you have a MI). It should be a list of tuples, so this is an easy condition to catch (and give a better message). |
Any non-list type for id_vars or value_vars now raises an error message if the DataFrame has a MultiIndex for its columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple of minor changes. ping on green.
pandas/core/reshape.py
Outdated
value_vars = [value_vars] | ||
elif (isinstance(frame.columns, MultiIndex) | ||
and not isinstance(value_vars, list)): | ||
raise TypeError('value_vars must be a list of tuples when' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these should be ValueError (both errors)
pandas/tests/test_reshape.py
Outdated
'value': self.df1[('B', 'b')], | ||
}, columns=[('A', 'a'), 'CAP', 'low', 'value']) | ||
|
||
for id_vars in ([('A', 'a')], ('A', 'a')): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separate this in to 2 parts, the success part (e.g. both lists), and the failing parts (the other 3 possibilites)
lgtm. ping on green. |
@jbradt there is some noise here because of our switchover to pytest and numpy getting upgraded. https://travis-ci.org/pandas-dev/pandas/jobs/200158415 but I think you have some linting errors. and use |
@jreback I think that's an old build since I already fixed the linting error, but I just pushed a change to use |
@jbradt sure no problem. |
thanks @jbradt |
The value_vars argument of melt is now cast to list like the id_vars argument. closes pandas-dev#15348 Author: Joshua Bradt <[email protected]> Author: Joshua Bradt <[email protected]> Closes pandas-dev#15351 from jbradt/fix-melt and squashes the following commits: a2f2510 [Joshua Bradt] Changed to tm.assertRaisesRegexp for Python 2 compat. 3038f64 [Joshua Bradt] Merge remote-tracking branch 'upstream/master' into fix-melt e907135 [Joshua Bradt] Split test into two parts 20159c1 [Joshua Bradt] Changed exception classes to ValueError. 129d531 [Joshua Bradt] Moved binary operators to satisfy flake8 70d7256 [Joshua Bradt] Merge branch 'master' into fix-melt 455a310 [Joshua Bradt] Tested types when using MultiIndex to ensure they are lists. 7406222 [Joshua Bradt] Fixed formatting. Added comment with issue number to test. d4c5da3 [Joshua Bradt] Improved type checking and tests. Added whatsnew note. 33728de [Joshua Bradt] BUG: Fixed handling of non-list value_vars in melt
The
value_vars
argument ofmelt
is now cast tolist
like theid_vars
argument. This works for tuples, lists, ndarrays, and scalars.git diff upstream/master | flake8 --diff