-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: upcasting on reshaping ops #13247 #15594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Only rebasing and fixing the merge conflicts Original work done by: jennolsen84 Original branch: https://github.com/jennolsen84/pandas/tree/concatnan
pandas/tests/tools/test_concat.py
Outdated
@@ -1899,3 +1899,38 @@ def test_concat_multiindex_dfs_with_deepcopy(self): | |||
tm.assert_frame_equal(result_copy, expected) | |||
result_no_copy = pd.concat(example_dict, names=['testname']) | |||
tm.assert_frame_equal(result_no_copy, expected) | |||
|
|||
|
|||
def test_concat_no_unnecessary_upcats(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u write this in pytest form using parametrize (needs to be a separate function)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, but I need to read up on documentation and it might take some time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure will be something like
@parametrize('dtype', np.sctypes('float'))
@parametrize('klass', [Series, DataFrame])
def test_concat_no_unecessary_upcasts(klass, dtype):
# dtype and klass are defined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI only need to do Series/DataFrame/Panel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need pytest
module for using @parametrize
.
Should I create another class for pytest?
Same testing, but with pytest @parametrize
@jreback Copied the original test and used everything in it, except for |
pandas/tests/tools/test_concat.py
Outdated
@parametrize('klass', [Series, DataFrame, Panel]) | ||
def test_concat_no_unnecessary_upcats_pytest(self, dtype, klass): | ||
# GH 13247 | ||
for pdt in klass: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be misreading this, but I don't think you need the for loop over klass
anymore. pytest will handle that for you, since you're doing the parametrize.
I think this will also fail, since this class eventually inherits from TestCase
, which doesn't play well with parametrized fixtures. I'd try it like this:
import numpy as np
from pandas import Series, DataFrame, Panel
import pandas as pd
import pytest
class TestConcatUpcast(object):
@pytest.mark.parametrize('pdt', [Series, DataFrame, Panel])
def test_concat_no_unnecessary_upcats_pytest(self, pdt):
# GH 13247
dims = pdt().ndim
for dt in np.sctypes['float']:
dfs = [pdt(np.array([1], dtype=dt, ndmin=dims)),
pdt(np.array([np.nan], dtype=dt, ndmin=dims)),
pdt(np.array([5], dtype=dt, ndmin=dims))]
x = pd.concat(dfs)
assert x.values.dtype == dt
objs = []
objs.append(pdt(np.array([1], dtype=np.float32, ndmin=dims)))
objs.append(pdt(np.array([1], dtype=np.float16, ndmin=dims)))
assert pd.concat(objs).values.dtype == np.float32
objs = []
objs.append(pdt(np.array([1], dtype=np.int32, ndmin=dims)))
objs.append(pdt(np.array([1], dtype=np.int64, ndmin=dims)))
assert pd.concat(objs).values.dtype == np.int64
objs = []
objs.append(pdt(np.array([1], dtype=np.int32, ndmin=dims)))
objs.append(pdt(np.array([1], dtype=np.float16, ndmin=dims)))
assert pd.concat(objs).values.dtype == np.float64
This doesn't parametrize all that well over the dtype
argument, so probably best to just leave it there.
If you want to leave comments on #15608, would appreciate your insight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm okay.
I'm trying to set up the test environment, but conda doesn't like me.
I'll try your code after I finish setting it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You were right.
I can't use @parametrize
without importing pytest
module.
I will try creating another class.
Fixed wrong location of is_float_dtypes and is_numeric_dtypes.
There are lots of things that needed to be fixed for this code, so I will keep working on it. After 9 minutes, I found some errors while I was copying things over to this branch and all the tests seem fine except for the one I created for pytest. Any guidance for pytest would be appreciated. |
Everything tests fine, except for pytest.
Everything tests fine except for the pytest. Waiting for suggestions.
you can run individual tests with Let us know if you have other questions. |
pandas/tests/tools/test_concat.py
Outdated
@@ -1899,3 +1900,64 @@ def test_concat_multiindex_dfs_with_deepcopy(self): | |||
tm.assert_frame_equal(result_copy, expected) | |||
result_no_copy = pd.concat(example_dict, names=['testname']) | |||
tm.assert_frame_equal(result_no_copy, expected) | |||
|
|||
def test_concat_no_unnecessary_upcast(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't need to be a class at all, just a function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function as function inside of the class or function outside of the class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outside
pandas/tests/tools/test_concat.py
Outdated
pdt(np.array([np.nan], dtype=dt, ndmin=dims)), | ||
pdt(np.array([5], dtype=dt, ndmin=dims))] | ||
x = pd.concat(dfs) | ||
self.assertTrue(x.values.dtype == dt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just do assert x.values.dtype == dt
, no need for any reference to self
Pytest completed. Flake8 —diff completed.
I got pytest working with the code. |
Hi, do I also need to fix additional errors in |
pandas/tests/tools/test_concat.py
Outdated
# not sure what is the best answer here | ||
objs = [] | ||
objs.append(pdt(np.array([1], dtype=np.int32, ndmin=dims))) | ||
objs.append(pdt(np.array([1], dtype=np.float16, ndmin=dims))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need these tests as you have the ones below
pandas/tests/tools/test_concat.py
Outdated
|
||
@pytest.mark.parametrize('pdt', [pd.Series, pd.DataFrame, pd.Panel]) | ||
@pytest.mark.parametrize('dt', np.sctypes['float']) | ||
def test_concat_no_unnecessary_upcast_pytest(dt, pdt): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use _pytest in the name, just remove the tests above and use these
yes everything needs to pass. fix up the tests as indicated and see if you can narrow down where the failures are. |
# these are coerced to float unavoidably (as its a list-like to begin)
df = DataFrame(columns=['A', 'B'])
df.loc[3] = [6, 7]
exp = DataFrame([[6, 7]], index=[3], columns=['A', 'B'],
dtype='float64')
> tm.assert_frame_equal(df, exp) With this error, I am testing few things and I have few questions. This is not expected behavior correct? |
For test_partial, instead of checking for `float`, now it checks for `object` For test_reshape, instead of checking for `pandas.core.internals.SparseBlock` now it checks for `pandas.core.internals.IntBlock`
Codecov Report
@@ Coverage Diff @@
## master #15594 +/- ##
==========================================
- Coverage 91.03% 91.01% -0.03%
==========================================
Files 143 143
Lines 49382 49392 +10
==========================================
- Hits 44957 44953 -4
- Misses 4425 4439 +14
Continue to review full report at Codecov.
|
@jreback Is this acceptable? |
BUG: default_fill_value for get_dummies will be 0
BUG: default_fill_value for get_dummies will be 0
thanks @jaehoonhwang (and @jennolsen84 ) |
@jreback Thank you for the help. If there is any other old pr, I am happy to help. |
Original work done by @jennolsen84, in pandas-dev#13337 closes pandas-dev#13247 Author: Jaehoon Hwang <[email protected]> Author: Jae <[email protected]> Closes pandas-dev#15594 from jaehoonhwang/Bug13247 and squashes the following commits: 3cd1734 [Jaehoon Hwang] Pass the non-related tests in test_partial and test_reshape 1fa578b [Jaehoon Hwang] Applying request changes removing unnecessary test and renameing 6744636 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247 5bb72c7 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247 a1d5d40 [Jaehoon Hwang] Completed pytest 8122359 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247 0e52b74 [Jaehoon Hwang] Working: Except for pytest 8fec07c [Jaehoon Hwang] Fix: test_concat.py and internals.py 4f6c03e [Jaehoon Hwang] Fix: is_float_dtypes and is_numeric_dtype wrong place d3476c0 [Jaehoon Hwang] Merge branch 'master' into Bug13247 b977615 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' 4b1e5c6 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247 45f7ae9 [Jaehoon Hwang] Added pytest function 468baee [Jae] BUG: upcasting on reshaping ops pandas-dev#13247
BUG: default_fill_value for get_dummies will be 0
Original work done by @jennolsen84, in pandas-dev#13337 closes pandas-dev#13247 Author: Jaehoon Hwang <[email protected]> Author: Jae <[email protected]> Closes pandas-dev#15594 from jaehoonhwang/Bug13247 and squashes the following commits: 3cd1734 [Jaehoon Hwang] Pass the non-related tests in test_partial and test_reshape 1fa578b [Jaehoon Hwang] Applying request changes removing unnecessary test and renameing 6744636 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247 5bb72c7 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247 a1d5d40 [Jaehoon Hwang] Completed pytest 8122359 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247 0e52b74 [Jaehoon Hwang] Working: Except for pytest 8fec07c [Jaehoon Hwang] Fix: test_concat.py and internals.py 4f6c03e [Jaehoon Hwang] Fix: is_float_dtypes and is_numeric_dtype wrong place d3476c0 [Jaehoon Hwang] Merge branch 'master' into Bug13247 b977615 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' 4b1e5c6 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247 45f7ae9 [Jaehoon Hwang] Added pytest function 468baee [Jae] BUG: upcasting on reshaping ops pandas-dev#13247
BUG: default_fill_value for get_dummies will be 0
Only rebasing and fixing the merge conflicts
Original work done by: @jennolsen84
Original branch: https://github.com/jennolsen84/pandas/tree/concatnan
Original PR: #13337
git diff upstream/master | flake8 --diff