Skip to content

BUG: upcasting on reshaping ops #13247 #15594

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 14 commits into from

Conversation

jaehoonhwang
Copy link
Contributor

@jaehoonhwang jaehoonhwang commented Mar 6, 2017

Only rebasing and fixing the merge conflicts
Original work done by: @jennolsen84
Original branch: https://github.com/jennolsen84/pandas/tree/concatnan
Original PR: #13337

Only rebasing and fixing the merge conflicts
Original work done by: jennolsen84
Original branch: https://github.com/jennolsen84/pandas/tree/concatnan
@@ -1899,3 +1899,38 @@ def test_concat_multiindex_dfs_with_deepcopy(self):
tm.assert_frame_equal(result_copy, expected)
result_no_copy = pd.concat(example_dict, names=['testname'])
tm.assert_frame_equal(result_no_copy, expected)


def test_concat_no_unnecessary_upcats(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u write this in pytest form using parametrize (needs to be a separate function)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, but I need to read up on documentation and it might take some time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure will be something like

@parametrize('dtype', np.sctypes('float'))
@parametrize('klass', [Series, DataFrame])
def test_concat_no_unecessary_upcasts(klass, dtype):
     # dtype and klass are defined

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI only need to do Series/DataFrame/Panel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need pytest module for using @parametrize.
Should I create another class for pytest?

@jaehoonhwang jaehoonhwang changed the title BUG: upcasting on reshaping ops #13247 [WIP] BUG: upcasting on reshaping ops #13247 Mar 6, 2017
@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Mar 6, 2017
Same testing, but with pytest @parametrize
@jaehoonhwang
Copy link
Contributor Author

@jreback Copied the original test and used everything in it, except for int testing.

@parametrize('klass', [Series, DataFrame, Panel])
def test_concat_no_unnecessary_upcats_pytest(self, dtype, klass):
# GH 13247
for pdt in klass:
Copy link
Contributor

@TomAugspurger TomAugspurger Mar 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be misreading this, but I don't think you need the for loop over klass anymore. pytest will handle that for you, since you're doing the parametrize.
I think this will also fail, since this class eventually inherits from TestCase, which doesn't play well with parametrized fixtures. I'd try it like this:

import numpy as np
from pandas import Series, DataFrame, Panel
import pandas as pd
import pytest


class TestConcatUpcast(object):

    @pytest.mark.parametrize('pdt', [Series, DataFrame, Panel])
    def test_concat_no_unnecessary_upcats_pytest(self, pdt):
        # GH 13247
        dims = pdt().ndim
        for dt in np.sctypes['float']:
            dfs = [pdt(np.array([1], dtype=dt, ndmin=dims)),
                   pdt(np.array([np.nan], dtype=dt, ndmin=dims)),
                   pdt(np.array([5], dtype=dt, ndmin=dims))]
            x = pd.concat(dfs)
            assert x.values.dtype == dt

            objs = []
            objs.append(pdt(np.array([1], dtype=np.float32, ndmin=dims)))
            objs.append(pdt(np.array([1], dtype=np.float16, ndmin=dims)))
            assert pd.concat(objs).values.dtype == np.float32

            objs = []
            objs.append(pdt(np.array([1], dtype=np.int32, ndmin=dims)))
            objs.append(pdt(np.array([1], dtype=np.int64, ndmin=dims)))
            assert pd.concat(objs).values.dtype == np.int64

            objs = []
            objs.append(pdt(np.array([1], dtype=np.int32, ndmin=dims)))
            objs.append(pdt(np.array([1], dtype=np.float16, ndmin=dims)))
            assert pd.concat(objs).values.dtype == np.float64

This doesn't parametrize all that well over the dtype argument, so probably best to just leave it there.

If you want to leave comments on #15608, would appreciate your insight.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm okay.
I'm trying to set up the test environment, but conda doesn't like me.
I'll try your code after I finish setting it up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You were right.
I can't use @parametrize without importing pytest module.
I will try creating another class.

@jaehoonhwang
Copy link
Contributor Author

jaehoonhwang commented Mar 8, 2017

There are lots of things that needed to be fixed for this code, so I will keep working on it.

After 9 minutes, I found some errors while I was copying things over to this branch and all the tests seem fine except for the one I created for pytest.

Any guidance for pytest would be appreciated.

Everything tests fine, except for pytest.
Everything tests fine except for the pytest. Waiting for suggestions.
@TomAugspurger
Copy link
Contributor

Any guidance for pytest would be appreciated.

you can run individual tests with pytest pandas -k <regex>, so pytest pandas -k test_concat_no_unnecessary_upcast_pytest.

Let us know if you have other questions.

@@ -1899,3 +1900,64 @@ def test_concat_multiindex_dfs_with_deepcopy(self):
tm.assert_frame_equal(result_copy, expected)
result_no_copy = pd.concat(example_dict, names=['testname'])
tm.assert_frame_equal(result_no_copy, expected)

def test_concat_no_unnecessary_upcast(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't need to be a class at all, just a function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function as function inside of the class or function outside of the class?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outside

pdt(np.array([np.nan], dtype=dt, ndmin=dims)),
pdt(np.array([5], dtype=dt, ndmin=dims))]
x = pd.concat(dfs)
self.assertTrue(x.values.dtype == dt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just do assert x.values.dtype == dt, no need for any reference to self

@jaehoonhwang
Copy link
Contributor Author

I got pytest working with the code.
@TomAugspurger @jreback Thank you guys for the help.

@jaehoonhwang
Copy link
Contributor Author

Hi, do I also need to fix additional errors in text_index and test_reshape?

# not sure what is the best answer here
objs = []
objs.append(pdt(np.array([1], dtype=np.int32, ndmin=dims)))
objs.append(pdt(np.array([1], dtype=np.float16, ndmin=dims)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need these tests as you have the ones below


@pytest.mark.parametrize('pdt', [pd.Series, pd.DataFrame, pd.Panel])
@pytest.mark.parametrize('dt', np.sctypes['float'])
def test_concat_no_unnecessary_upcast_pytest(dt, pdt):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use _pytest in the name, just remove the tests above and use these

@jreback
Copy link
Contributor

jreback commented Mar 11, 2017

yes everything needs to pass. fix up the tests as indicated and see if you can narrow down where the failures are.

@jaehoonhwang
Copy link
Contributor Author

jaehoonhwang commented Mar 12, 2017

# these are coerced to float unavoidably (as its a list-like to begin)
        df = DataFrame(columns=['A', 'B'])
        df.loc[3] = [6, 7]
    
        exp = DataFrame([[6, 7]], index=[3], columns=['A', 'B'],
                        dtype='float64')
>       tm.assert_frame_equal(df, exp)

With this error, I am testing few things and I have few questions.
When I initialize with df=DataFrame(columns['A','B']) and do df.loc[3] = [6,7], it is returning me dtype: object
If I do df.loc[3] = [6.0, 7.0], A and B are int64, but dtype of data frame is still object.

This is not expected behavior correct?

For test_partial, instead of checking for `float`, now it checks for
`object`

For test_reshape, instead of checking for
`pandas.core.internals.SparseBlock` now it checks for
`pandas.core.internals.IntBlock`
@codecov-io
Copy link

Codecov Report

Merging #15594 into master will decrease coverage by 0.02%.
The diff coverage is 91.66%.

@@            Coverage Diff             @@
##           master   #15594      +/-   ##
==========================================
- Coverage   91.03%   91.01%   -0.03%     
==========================================
  Files         143      143              
  Lines       49382    49392      +10     
==========================================
- Hits        44957    44953       -4     
- Misses       4425     4439      +14
Impacted Files Coverage Δ
pandas/core/internals.py 93.59% <91.66%> (-0.06%)
pandas/io/gbq.py 25% <0%> (-58.34%)
pandas/tools/merge.py 93.61% <0%> (-0.34%)
pandas/core/common.py 90.96% <0%> (-0.34%)
pandas/core/frame.py 97.87% <0%> (-0.1%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5eac08a...3cd1734. Read the comment docs.

@jaehoonhwang
Copy link
Contributor Author

@jreback Is this acceptable?

@jreback jreback added this to the 0.20.0 milestone Mar 14, 2017
jreback added a commit to jreback/pandas that referenced this pull request Mar 14, 2017
BUG: default_fill_value for get_dummies will be 0
@jreback jreback closed this in 7d34d4d Mar 14, 2017
jreback added a commit that referenced this pull request Mar 14, 2017
BUG: default_fill_value for get_dummies will be 0
@jreback
Copy link
Contributor

jreback commented Mar 14, 2017

thanks @jaehoonhwang (and @jennolsen84 )

@jaehoonhwang
Copy link
Contributor Author

jaehoonhwang commented Mar 15, 2017

@jreback Thank you for the help. If there is any other old pr, I am happy to help.

@jaehoonhwang jaehoonhwang changed the title [WIP] BUG: upcasting on reshaping ops #13247 BUG: upcasting on reshaping ops #13247 Mar 20, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
Original work done by @jennolsen84, in pandas-dev#13337

closes pandas-dev#13247

Author: Jaehoon Hwang <[email protected]>
Author: Jae <[email protected]>

Closes pandas-dev#15594 from jaehoonhwang/Bug13247 and squashes the following commits:

3cd1734 [Jaehoon Hwang] Pass the non-related tests in test_partial and test_reshape
1fa578b [Jaehoon Hwang] Applying request changes removing unnecessary test and renameing
6744636 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
5bb72c7 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
a1d5d40 [Jaehoon Hwang] Completed pytest
8122359 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
0e52b74 [Jaehoon Hwang] Working: Except for pytest
8fec07c [Jaehoon Hwang] Fix: test_concat.py and internals.py
4f6c03e [Jaehoon Hwang] Fix: is_float_dtypes and is_numeric_dtype wrong place
d3476c0 [Jaehoon Hwang] Merge branch 'master' into Bug13247
b977615 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master'
4b1e5c6 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
45f7ae9 [Jaehoon Hwang] Added pytest function
468baee [Jae] BUG: upcasting on reshaping ops pandas-dev#13247
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
BUG: default_fill_value for get_dummies will be 0
mattip pushed a commit to mattip/pandas that referenced this pull request Apr 3, 2017
Original work done by @jennolsen84, in pandas-dev#13337

closes pandas-dev#13247

Author: Jaehoon Hwang <[email protected]>
Author: Jae <[email protected]>

Closes pandas-dev#15594 from jaehoonhwang/Bug13247 and squashes the following commits:

3cd1734 [Jaehoon Hwang] Pass the non-related tests in test_partial and test_reshape
1fa578b [Jaehoon Hwang] Applying request changes removing unnecessary test and renameing
6744636 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
5bb72c7 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
a1d5d40 [Jaehoon Hwang] Completed pytest
8122359 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
0e52b74 [Jaehoon Hwang] Working: Except for pytest
8fec07c [Jaehoon Hwang] Fix: test_concat.py and internals.py
4f6c03e [Jaehoon Hwang] Fix: is_float_dtypes and is_numeric_dtype wrong place
d3476c0 [Jaehoon Hwang] Merge branch 'master' into Bug13247
b977615 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master'
4b1e5c6 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
45f7ae9 [Jaehoon Hwang] Added pytest function
468baee [Jae] BUG: upcasting on reshaping ops pandas-dev#13247
mattip pushed a commit to mattip/pandas that referenced this pull request Apr 3, 2017
BUG: default_fill_value for get_dummies will be 0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: upcasting on reshaping ops
4 participants