Skip to content

BUG: Retain tz-aware dtypes with melt (#15785) #20292

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 13, 2018

Conversation

mroeschke
Copy link
Member

.values call was converting tz aware data to tz naive data (by casting to a numpy array). Added an additional test for Categorical data as well.

@pytest.mark.parametrize("col", [
pd.Series(pd.date_range('2010', periods=5, tz='US/Pacific')),
pd.Series(["a", "b", "c", "a", "d"], dtype="category")])
def test_pandas_dtypes_id_var(self, col):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you wanted to reduce duplication of code further could parametrize something like "as_val" with parameters of True and False and then just add a conditional at the top of the function to set attr2 either to either the col or [0, 1, 0, 0, 0]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh I see @WillAyd suggested. in any event this makes this much harder to read.

@codecov
Copy link

codecov bot commented Mar 12, 2018

Codecov Report

Merging #20292 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20292      +/-   ##
==========================================
+ Coverage   91.72%   91.73%   +<.01%     
==========================================
  Files         150      150              
  Lines       49165    49174       +9     
==========================================
+ Hits        45099    45108       +9     
  Misses       4066     4066
Flag Coverage Δ
#multiple 90.11% <100%> (ø) ⬆️
#single 41.86% <28.57%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/reshape/melt.py 97.34% <100%> (+0.14%) ⬆️
pandas/core/generic.py 95.84% <0%> (-0.01%) ⬇️
pandas/core/frame.py 97.18% <0%> (ø) ⬆️
pandas/core/indexing.py 93.02% <0%> (ø) ⬆️
pandas/core/indexes/base.py 96.66% <0%> (ø) ⬆️
pandas/core/strings.py 98.32% <0%> (ø) ⬆️
pandas/core/resample.py 96.43% <0%> (ø) ⬆️
pandas/core/groupby.py 92.14% <0%> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.64% <0%> (ø) ⬆️
pandas/plotting/_core.py 82.27% <0%> (ø) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 31afaf8...abe48c9. Read the comment docs.

@jreback jreback added Bug Timezones Timezone data dtype labels Mar 12, 2018
id_data = frame.pop(col)
if is_extension_type(id_data):
# Preserve pandas dtype by not converting to a numpy array
id_data = concat([id_data] * K, ignore_index=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger do we have this in ExtensionArray ATM? .tile()? or could emulate like this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No .tile. We do have _concat_same_type.

mdata[col] = np.tile(frame.pop(col).values, K)
id_data = frame.pop(col)
if is_extension_type(id_data):
# Preserve pandas dtype by not converting to a numpy array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment is not needed

df = DataFrame({'klass': range(5),
'col': col,
'attr1': [1, 0, 0, 0, 0]})
if pandas_dtype_value:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move what this does into the parameterize, maybe by using a fixture. this defeats the purpose of being able to look at the parameterize and see what cases are being tested

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Mar 12, 2018
@@ -896,6 +896,7 @@ Timezones
- Bug in :func:`Timestamp.tz_localize` where localizing a timestamp near the minimum or maximum valid values could overflow and return a timestamp with an incorrect nanosecond value (:issue:`12677`)
- Bug when iterating over :class:`DatetimeIndex` that was localized with fixed timezone offset that rounded nanosecond precision to microseconds (:issue:`19603`)
- Bug in :func:`DataFrame.diff` that raised an ``IndexError`` with tz-aware values (:issue:`18578`)
- Bug in :func:`melt` that coverted tz-aware dtypes to tz-naive (:issue:`15785`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coverted --> converted

@jreback jreback added this to the 0.23.0 milestone Mar 13, 2018
@jreback jreback merged commit 53bf291 into pandas-dev:master Mar 13, 2018
@jreback
Copy link
Contributor

jreback commented Mar 13, 2018

thanks!

@mroeschke mroeschke deleted the melt_tz branch March 13, 2018 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: melt changes type of tz-aware columns
5 participants