Date Type Corrupting Other Types in Group-by/Apply #15670

Closed
gwpdt opened this issue Mar 13, 2017 · 1 comment
Labels
Bug, Dtype Conversions, Duplicate Report, Groupby
Milestone

Comments

@gwpdt
Contributor

gwpdt commented Mar 13, 2017

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'Number' : [1, 2], 'Date' : ["2017-03-02"] * 2, 'Str' : ["foo", "inf"]})

In [3]: df
Out[3]:
         Date  Number  Str
0  2017-03-02       1  foo
1  2017-03-02       2  inf

In [4]: df.groupby(['Number']).apply(lambda x: x.iloc[0])
Out[4]:
              Date  Number  Str
Number
1       2017-03-02       1  foo
2       2017-03-02       2  inf

In [5]: df.Date = pd.to_datetime(df.Date)

In [6]: df
Out[6]:
        Date  Number  Str
0 2017-03-02       1  foo
1 2017-03-02       2  inf

In [7]: df.groupby(['Number']).apply(lambda x: x.iloc[0])
Out[7]:
             Date  Number  Str
Number
1      2017-03-02       1  NaN
2      2017-03-02       2  inf

Problem description

When I change the type of the Date column to a Pandas datetime, the types of other columns change in unexpected ways during a group-by/apply. Notice that the "Str" column is converted to a numeric type in the final group-by/apply (a contributing factor is probably that one of the elements is the string "inf"). The string "inf" has become the float inf, and the string "foo" has become NaN.
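To illustrate why the string "inf" matters here, the following is a minimal plain-Python sketch of coerce-style numeric conversion. It is not pandas' internal code, and the helper name `coerce_to_float` is hypothetical. It only shows that "inf" parses as a valid float while "foo" does not, which matches the inf/NaN pair seen in Out[7].

```python
import math


def coerce_to_float(value):
    """Hypothetical helper: convert to float, mapping unparseable input to NaN."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return float("nan")


# The two strings from the "Str" column in the report.
coerced = [coerce_to_float(s) for s in ["foo", "inf"]]
print(coerced)  # [nan, inf]
```

As far as I know, `pd.to_numeric(..., errors="coerce")` treats these two strings the same way, so a column that contains only unparseable strings and "inf" can look fully numeric after coercion.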

Expected Output

I expect the Str column to remain a string type, and contain the original strings. I.e.:

        Date  Number  Str
0 2017-03-02       1  foo
1 2017-03-02       2  inf
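The expectation above can be written as a small check script. This is a sketch rather than the test that was eventually added to pandas. It selects the non-key columns explicitly (my own choice, not part of the original report) so that the result does not depend on whether `apply` passes the grouping column to the function.

```python
import pandas as pd

df = pd.DataFrame({
    "Number": [1, 2],
    "Date": pd.to_datetime(["2017-03-02"] * 2),
    "Str": ["foo", "inf"],
})

# Take the first row of each group, as in the report.
result = df.groupby("Number")[["Date", "Str"]].apply(lambda g: g.iloc[0])

# On a pandas version that includes the fix, the strings should survive.
strings = result["Str"].tolist()
assert strings == ["foo", "inf"], strings
```

On an affected version such as the 0.18.1 reported below, the assertion would presumably fail, because the column comes back as NaN and inf.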

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-327.10.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: None
setuptools: 0.6
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.0
statsmodels: 0.6.1
xarray: 0.7.0
IPython: 5.0.0
sphinx: 1.3.5
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.1
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: 0.6.7.None
psycopg2: 2.5.4 (dt dec pq3 ext)
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

@jreback
Contributor

jreback commented Mar 13, 2017

This is a duplicate of #14423.

The solution is pretty easy if you'd like to do a PR.

jreback closed this as completed Mar 13, 2017
jreback added the Bug, Dtype Conversions, Duplicate Report, and Groupby labels Mar 13, 2017
jreback added this to the No action milestone Mar 13, 2017
jreback modified the milestones: 0.20.0, No action Mar 16, 2017
jreback pushed a commit that referenced this issue Mar 16, 2017
closes #14423
closes #15421
closes #15670

During a group-by/apply on a DataFrame, in the presence of one or more
DateTime-like columns, Pandas would incorrectly coerce the type of all
other columns to numeric. E.g. a String column would be coerced to
numeric, producing NaNs.

Author: Greg Williams <[email protected]>

Closes #15680 from gwpdt/bugfix14423 and squashes the following commits:

e1ed104 [Greg Williams] TST: Rename and expand test_numeric_coercion
0a15674 [Greg Williams] CLN: move import, add whatsnew entry
c8844e0 [Greg Williams] CLN: PEP8 (whitespace fixes)
46d12c2 [Greg Williams] BUG: Group-by numeric type-coericion with datetime
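
For readers on a pandas version that predates this commit, one possible way to get the first row per key is to avoid group-by/apply altogether. The sketch below uses `drop_duplicates` and `set_index`. It has not been verified against 0.18.1, and the third row is added purely for illustration so that one group has more than one member.

```python
import pandas as pd

df = pd.DataFrame({
    "Number": [1, 2, 2],  # extra row added for illustration only
    "Date": pd.to_datetime(["2017-03-02"] * 3),
    "Str": ["foo", "inf", "bar"],
})

# Keep the first row for each Number without calling groupby().apply(),
# then index by Number to mirror the shape of the group-by result.
first_rows = df.drop_duplicates(subset="Number").set_index("Number", drop=False)
print(first_rows)
```

Because no per-group results are reassembled, each column keeps the dtype it had in `df`.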
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
mattip pushed a commit to mattip/pandas that referenced this issue Apr 3, 2017