Skip to content

BUG: upcasting on reshaping ops #13247 #13337

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

jennolsen84
Copy link
Contributor

@jreback jreback changed the title fixes #13247 BUG: upcasting on reshaping ops #13247 May 31, 2016
@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Dtype Conversions Unexpected or buggy dtype conversions Compat pandas objects compatability with Numpy or Python functions labels May 31, 2016

for dt in (np.sctypes['int'] + np.sctypes['uint']):
dfs = [pdt(np.array([1], dtype=dt, ndmin=dims)),
pdt(np.array([5], dtype=dt, ndmin=dims))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some tests that have mixed dtypes in the input as well (e.g. mixed float32/64 and int types)

@jennolsen84
Copy link
Contributor Author

@jreback: nice catch, I did some manual testing of that before, and it passed... but extended testing yields different answers for these cases. What is the "right" thing to do here?

Current behavior:

Series/DataFrame:

int32 + float 16 (no nans) => float64

Panel/Panel4D:

(no nans)
int32 + float16 (no nans) => float16

I think the right choice would be float64, but open to other ideas. WDYT?

@codecov-io
Copy link

codecov-io commented May 31, 2016

Current coverage is 84.23%

Merging #13337 into master will decrease coverage by <.01%

@@             master     #13337   diff @@
==========================================
  Files           138        138          
  Lines         50712      50720     +8   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42714      42720     +6   
- Misses         7998       8000     +2   
  Partials          0          0          

Powered by Codecov. Last updated by 2e3c82e...4eb00e2

@jreback
Copy link
Contributor

jreback commented May 31, 2016

I think any int/float casting can simply go to float64. Yes you can get creative, e.g. I think
int16/int8 & float32 -> float32 but prob not worth it. (and float16 is barely used / supported)

@jennolsen84
Copy link
Contributor Author

I updated the PR... but see below:

OK, so for the panel case, it seems like we go into mixed types (x._is_mixed_type == True). So there are multiple block managers, each item of the panel is of different type.

So in order to really fix it, one has to look at as_matrix function. And in the end it comes down to this:

The function here: https://github.com/pydata/pandas/blob/4de83d25d751d8ca102867b2d46a5547c01d7248/pandas/core/internals.py#L4390-L4449 , takes in float and int blocks, and then falls through to the last line:

return _lcd_dtype(counts[FloatBlock] + counts[SparseBlock])

The above lead to this change in the PR:

elif have_int and have_float and not have_complex:
        return np.dtype('float64')

I was also able to get rid of multiple ifs (the ones you wanted to move to a function), so the code is a bit more cleaner. The side effect is that an empty DataFrame's dtype is object, instead of float64 (in some cases). I am guessing that we should be consistent and perhaps always have a type O DataFrame for empty DataFrames (instead of float64). Thoughts?

In [1]: import pandas as pd

In [2]: df = pd.DataFrame(columns=['A', 'B'])

In [3]: df.values.dtype
Out[3]: dtype('O')

In [4]: df2 = pd.DataFrame()

In [5]: df2.values.dtype
Out[5]: dtype('float64')

@jreback
Copy link
Contributor

jreback commented Jun 1, 2016

@jennolsen84 [5] is a bit undefined. It does currently do this on master as well. Not averse to changing this to object . This might break some tests though.

@@ -4443,6 +4444,8 @@ def _lcd_dtype(l):
return np.dtype('int%s' % (lcd.itemsize * 8 * 2))
return lcd

elif have_int and have_float and not have_complex:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side issue: may want to use np.common_type here to return lcd of the itemsize for float.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas.types.cast._find_common_type I think will work for this now

@jreback
Copy link
Contributor

jreback commented Sep 9, 2016

can you rebase / update?

@frol
Copy link

frol commented Sep 12, 2016

I have hit this bug through Dask DataFrame repartition, so I would love to see this issue fixed.

Updated: I have tried to install pandas from the fork and it worked for me! Thank you!

@@ -43,7 +43,7 @@ Backwards incompatible API changes
.. _whatsnew_0190.api:



Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to 0.20.0

@jreback
Copy link
Contributor

jreback commented Nov 16, 2016

@jennolsen84 thanks for this. too many open PR's and haven't gone thru in a while. if you can update we can merge soon.

@jreback
Copy link
Contributor

jreback commented Dec 26, 2016

@jennolsen84 can you rebase

1 similar comment
@jreback
Copy link
Contributor

jreback commented Jan 21, 2017

@jennolsen84 can you rebase

@jaehoonhwang
Copy link
Contributor

jaehoonhwang commented Mar 6, 2017

Hi, I rebased this code to latest commit on master, but there were merge conflicts. So, I rewrote them and I want to do a pr on this page.
How do I do this or do I need to create a new pr?

@jreback
Copy link
Contributor

jreback commented Mar 6, 2017

@jaehoonhwang what you do is branch off of current master. Then pull in these commits, fixing any conflicts. You then push to your branch and treat like any other fix. The only trick here is to cherry-pick the original commits off of this PR.

@frol
Copy link

frol commented Mar 6, 2017

@jaehoonhwang You cannot replace others PR with yours, so just create a new PR and reference this PR in it.

@jreback
Copy link
Contributor

jreback commented Mar 7, 2017

closing in favor of #15594 (rebase of this PR).

@jreback jreback closed this Mar 7, 2017
@jreback jreback added this to the 0.20.0 milestone Mar 14, 2017
jreback pushed a commit to jreback/pandas that referenced this pull request Mar 14, 2017
Original work done by @jennolsen84, in pandas-dev#13337

closes pandas-dev#13247

Author: Jaehoon Hwang <[email protected]>
Author: Jae <[email protected]>

Closes pandas-dev#15594 from jaehoonhwang/Bug13247 and squashes the following commits:

3cd1734 [Jaehoon Hwang] Pass the non-related tests in test_partial and test_reshape
1fa578b [Jaehoon Hwang] Applying request changes removing unnecessary test and renameing
6744636 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
5bb72c7 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
a1d5d40 [Jaehoon Hwang] Completed pytest
8122359 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
0e52b74 [Jaehoon Hwang] Working: Except for pytest
8fec07c [Jaehoon Hwang] Fix: test_concat.py and internals.py
4f6c03e [Jaehoon Hwang] Fix: is_float_dtypes and is_numeric_dtype wrong place
d3476c0 [Jaehoon Hwang] Merge branch 'master' into Bug13247
b977615 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master'
4b1e5c6 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
45f7ae9 [Jaehoon Hwang] Added pytest function
468baee [Jae] BUG: upcasting on reshaping ops pandas-dev#13247
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
Original work done by @jennolsen84, in pandas-dev#13337

closes pandas-dev#13247

Author: Jaehoon Hwang <[email protected]>
Author: Jae <[email protected]>

Closes pandas-dev#15594 from jaehoonhwang/Bug13247 and squashes the following commits:

3cd1734 [Jaehoon Hwang] Pass the non-related tests in test_partial and test_reshape
1fa578b [Jaehoon Hwang] Applying request changes removing unnecessary test and renameing
6744636 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
5bb72c7 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
a1d5d40 [Jaehoon Hwang] Completed pytest
8122359 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
0e52b74 [Jaehoon Hwang] Working: Except for pytest
8fec07c [Jaehoon Hwang] Fix: test_concat.py and internals.py
4f6c03e [Jaehoon Hwang] Fix: is_float_dtypes and is_numeric_dtype wrong place
d3476c0 [Jaehoon Hwang] Merge branch 'master' into Bug13247
b977615 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master'
4b1e5c6 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
45f7ae9 [Jaehoon Hwang] Added pytest function
468baee [Jae] BUG: upcasting on reshaping ops pandas-dev#13247
mattip pushed a commit to mattip/pandas that referenced this pull request Apr 3, 2017
Original work done by @jennolsen84, in pandas-dev#13337

closes pandas-dev#13247

Author: Jaehoon Hwang <[email protected]>
Author: Jae <[email protected]>

Closes pandas-dev#15594 from jaehoonhwang/Bug13247 and squashes the following commits:

3cd1734 [Jaehoon Hwang] Pass the non-related tests in test_partial and test_reshape
1fa578b [Jaehoon Hwang] Applying request changes removing unnecessary test and renameing
6744636 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
5bb72c7 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
a1d5d40 [Jaehoon Hwang] Completed pytest
8122359 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
0e52b74 [Jaehoon Hwang] Working: Except for pytest
8fec07c [Jaehoon Hwang] Fix: test_concat.py and internals.py
4f6c03e [Jaehoon Hwang] Fix: is_float_dtypes and is_numeric_dtype wrong place
d3476c0 [Jaehoon Hwang] Merge branch 'master' into Bug13247
b977615 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master'
4b1e5c6 [Jaehoon Hwang] Merge remote-tracking branch 'pandas-dev/master' into Bug13247
45f7ae9 [Jaehoon Hwang] Added pytest function
468baee [Jae] BUG: upcasting on reshaping ops pandas-dev#13247
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Compat pandas objects compatability with Numpy or Python functions Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: upcasting on reshaping ops
5 participants