BUG: #12815 Always use np.nan for missing values of object dtypes #18313

Merged
merged 1 commit into from
Nov 21, 2017

@mattayes (Contributor) commented Nov 15, 2017

Handles the issue of unstacked object columns filling missing values with None instead of np.nan by modifying pd.core.dtypes.cast.maybe_promote() to use a fill_value of np.nan (or pd.NaT) when the original fill_value is None.

Let me know if you'd like me to clarify anything!
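
A minimal sketch of the behavior described above, assuming a pandas version that includes this fix. The sample data and the variable names (`data`, `wide`, `filled`) are illustrative only and are not taken from the PR's test suite.

```python
import numpy as np
import pandas as pd

# Illustrative object-dtype Series with an incomplete two-level index:
# the combinations ("y", "a") and ("z", "b") are absent.
index = pd.MultiIndex.from_tuples(
    [("x", "a"), ("x", "b"), ("y", "b"), ("z", "a")]
)
data = pd.Series(["a", "b", "c", "a"], index=index, dtype="object")

# Unstacking moves the inner level into columns; absent combinations
# become missing cells. With the fix, those cells hold np.nan, not None.
wide = data.unstack()
missing = wide.loc["y", "a"]

# An explicit fill_value still takes precedence over the default.
filled = data.unstack(fill_value="d")
```

Checking `missing is None` distinguishes the old behavior from the new one, since `pd.isna` returns `True` for both `None` and `np.nan`.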

codecov bot commented Nov 15, 2017

Codecov Report

Merging #18313 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18313      +/-   ##
==========================================
- Coverage   91.39%   91.37%   -0.02%     
==========================================
  Files         164      164              
  Lines       49854    49855       +1     
==========================================
- Hits        45566    45557       -9     
- Misses       4288     4298      +10

Flag         Coverage Δ
#multiple    89.18% <100%> (-0.01%) ⬇️
#single      39.44% <0%> (-0.07%) ⬇️

Impacted Files                      Coverage Δ
pandas/core/dtypes/cast.py          88.19% <100%> (+0.02%) ⬆️
pandas/io/gbq.py                    25% <0%> (-58.34%) ⬇️
pandas/core/frame.py                97.8% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py    95.38% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.38% <0%> (-0.1%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c2590b3...8ec39ee.

codecov bot commented Nov 15, 2017

Codecov Report

Merging #18313 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18313      +/-   ##
==========================================
- Coverage   91.38%   91.36%   -0.02%     
==========================================
  Files         164      164              
  Lines       49796    49798       +2     
==========================================
- Hits        45507    45500       -7     
- Misses       4289     4298       +9

Flag         Coverage Δ
#multiple    89.17% <100%> (ø) ⬆️
#single      39.55% <0%> (-0.06%) ⬇️

Impacted Files                Coverage Δ
pandas/core/dtypes/cast.py    88.19% <100%> (+0.02%) ⬆️
pandas/io/gbq.py              25% <0%> (-58.34%) ⬇️
pandas/core/frame.py          97.8% <0%> (-0.1%) ⬇️
pandas/core/internals.py      94.54% <0%> (ø) ⬆️

Last update c868423...5f15554.

@jreback added the labels Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) Nov 15, 2017

@jreback (Contributor) left a comment

lgtm. pls add a note in 0.22 / other api changes.



def test_unstack_fill_frame_object():
    # Test unstacking with object

add the issue number as a comment

@mattayes (Contributor, Author) commented

@jreback Changes made.

@jreback added this to the 0.22.0 milestone Nov 19, 2017

@jreback (Contributor) commented Nov 19, 2017

trivial change. push & ping when green.

@@ -48,6 +48,7 @@ Other API Changes
- :class:`CacheableOffset` and :class:`WeekDay` are no longer available in the ``pandas.tseries.offsets`` module (:issue:`17830`)
- `tseries.frequencies.get_freq_group()` and `tseries.frequencies.DAYS` are removed from the public API (:issue:`18034`)
- :func:`Series.truncate` and :func:`DataFrame.truncate` will raise a ``ValueError`` if the index is not sorted instead of an unhelpful ``KeyError`` (:issue:`17935`)
- `Dataframe.unstack()` will now default to filling with `np.nan` for ``object`` columns. (:issue:`12815`)

use double-backticks around np.nan

@jreback (Contributor) commented Nov 19, 2017

note if you are interested in cleaning some things up :>

I think maybe_promote pretty much ignores its fill value for almost all dtypes. So the question is: in the cases where it is not ignored, can we safely remove it (or really just make the default fill_value=None, and then actually do something when it's not None)?

Not sure of the answer here.
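
To make the idea concrete, here is a rough sketch of what "default to None, and only act when a value is given" could look like. The helper `default_fill_value` is hypothetical and is not pandas' `maybe_promote`; it only illustrates the defaulting rule from the PR description (np.nan, or NaT for datetime-like dtypes, when the caller passes None).

```python
import numpy as np


def default_fill_value(dtype, fill_value=None):
    """Hypothetical helper (not pandas' maybe_promote).

    Returns the caller's fill_value unchanged when one is given;
    otherwise picks a missing-value marker based on the dtype.
    """
    dtype = np.dtype(dtype)
    if fill_value is not None:
        # An explicit fill value is respected as-is.
        return fill_value
    if dtype.kind == "M":
        # datetime64 dtypes use NaT as their missing marker.
        return np.datetime64("NaT")
    if dtype.kind == "m":
        # timedelta64 dtypes use NaT as well.
        return np.timedelta64("NaT")
    # Everything else, including object, falls back to np.nan.
    return np.nan
```

This sketch leaves out dtype promotion entirely (for example, widening an integer dtype so it can hold np.nan), which is the other half of what a real implementation would have to handle.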

@mattayes (Contributor, Author) commented Nov 20, 2017

@jreback Green. I'll open an issue for the maybe_promote changes.

@jreback merged commit 509e03c into pandas-dev:master Nov 21, 2017

@jreback (Contributor) commented Nov 21, 2017

thanks @mattayes

@mattayes deleted the 12815-unstacking-none branch November 21, 2017 03:10

Successfully merging this pull request may close these issues.

BUG: unstacking with object columns defaults to None as a fill value