
BUG: Aggregating over an integer on an empty DataFrame causes datetime64 types to convert to float64 #7649

Closed
chrisaycock opened this issue Jul 2, 2014 · 4 comments

Comments

@chrisaycock
Contributor

Possibly related to issue #7574:

Suppose I have a DataFrame:

In [61]: df
Out[61]:
          ref                  localtime  size
0    45361866 2014-06-25 14:11:11.753597   100
1    45361866 2014-06-25 14:11:11.753769   100
2    45361866 2014-06-25 14:11:11.754350   200
3    45361866 2014-06-25 14:11:11.756413   200
4    45361866 2014-06-25 14:18:59.442972   200
..        ...                        ...   ...
29  204286294 2014-06-25 19:43:27.770083   100
30  204286294 2014-06-25 19:43:27.771266  7036
31  216308339 2014-06-25 19:57:32.468547   100
32  216308339 2014-06-25 19:57:37.534973   200
33  216308339 2014-06-25 19:57:37.535035   300

[34 rows x 3 columns]

If I group by the integer column ref and aggregate, the resulting dtypes are as expected:

In [62]: df.groupby('ref').first().reset_index().dtypes
Out[62]:
ref                   int64
localtime    datetime64[ns]
size                  int64
dtype: object

However, if I aggregate an empty DataFrame (here, one filtered so that no rows match), my timestamp column's type is converted from datetime64 to float64:

In [63]: df.query('size>10000').groupby('ref').first().reset_index().dtypes
Out[63]:
index          int64
localtime    float64
size         float64
dtype: object
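On an affected version, one possible workaround is to cast the aggregated result back to the source frame's dtypes. This is a minimal sketch, assuming the behavior shown in Out[63] (float64 columns, and the key column labelled index); first_preserving_dtypes is a hypothetical helper name, not a pandas API.

```python
import pandas as pd


def first_preserving_dtypes(df, key):
    """Hypothetical helper: df.groupby(key).first().reset_index() with source dtypes restored."""
    out = df.groupby(key).first().reset_index()
    # Assumption based on Out[63]: the affected version labelled the key column
    # "index" instead of the key name when the input frame was empty.
    if key not in out.columns and "index" in out.columns:
        out = out.rename(columns={"index": key})
    # Cast each column that also exists in the source frame back to its original dtype;
    # astype leaves columns whose dtype already matches unchanged.
    return out.astype({col: df[col].dtype for col in out.columns if col in df.columns})


df = pd.DataFrame({
    "ref": [1, 1, 2, 2],
    "localtime": pd.to_datetime(["2014-06-25 14:11:11.753597"] * 4),
    "size": [100, 200, 300, 400],
})
# No row has size > 10000, so the helper receives an empty frame.
empty_first = first_preserving_dtypes(df[df["size"] > 10000], "ref")
print(empty_first.dtypes)
```

On versions where the bug is fixed, the rename branch is skipped and the astype call is effectively a no-op, so the helper should be harmless to keep.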
@jreback
Contributor

jreback commented Jul 2, 2014

Can you put the frame creation at the top of your example as well? It's easiest to diagnose when we can simply copy-paste.

@chrisaycock
Contributor Author

I'm not sure of a clean way to generate a DataFrame with timestamps, but this reproduces the error:

import pandas as pd

df = pd.DataFrame({'ref':[1, 1, 2, 2], 'localtime':4*['2014-06-25 14:11:11.753597'], 'size':[100, 200, 300, 400]})
df.localtime = pd.to_datetime(df.localtime)
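As a side note on generating the timestamps: pd.to_datetime also accepts a list of strings, so the datetime64 column can be built inside the DataFrame constructor in one step. A minimal copy-paste sketch of the same four-row frame followed by the failing aggregation; it uses a boolean mask instead of query, which should select the same (zero) rows.

```python
import pandas as pd

# Build the datetime64 column directly by passing a list of strings to pd.to_datetime.
df = pd.DataFrame({
    "ref": [1, 1, 2, 2],
    "localtime": pd.to_datetime(["2014-06-25 14:11:11.753597"] * 4),
    "size": [100, 200, 300, 400],
})

# No row has size > 10000, so this groups an empty frame.
empty = df[df["size"] > 10000]
result = empty.groupby("ref").first().reset_index()

print(df.dtypes)
print(result.dtypes)
```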

@jreback
Contributor

jreback commented Jul 2, 2014

This works OK in master/0.14.1 (releasing soon); it was fixed in #7593:

In [1]: df = pd.DataFrame({'ref':[1, 1, 2, 2], 'localtime':4*['2014-06-25 14:11:11.753597'], 'size':[100, 200, 300, 400]})

In [2]: df.localtime = pd.to_datetime(df.localtime)

In [3]: df
Out[3]: 
                   localtime  ref  size
0 2014-06-25 14:11:11.753597    1   100
1 2014-06-25 14:11:11.753597    1   200
2 2014-06-25 14:11:11.753597    2   300
3 2014-06-25 14:11:11.753597    2   400

In [4]: df.dtypes
Out[4]: 
localtime    datetime64[ns]
ref                   int64
size                  int64
dtype: object

In [5]: df.query('size>10000').groupby('ref').first().reset_index().dtypes
Out[5]: 
ref                   int64
localtime    datetime64[ns]
size                  int64
dtype: object

In [6]: df.query('size>10000').groupby('ref').first().reset_index()
Out[6]: 
Empty DataFrame
Columns: [ref, localtime, size]
Index: []

jreback closed this as completed Jul 2, 2014
@chrisaycock
Contributor Author

Ah, and it also kept ref as the name of the key column (rather than index), which would fix issue #7574. If you can confirm that 0.14.1 passes my test over there, feel free to close that issue too. Thanks!
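A pytest-style regression check covering both points (datetime64 preserved as in Out[5], and the key column still named ref) might look like the sketch below. The test name is hypothetical, and this is not the test that was actually added in #7593.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def test_empty_groupby_first_keeps_dtypes_and_key_name():
    df = pd.DataFrame({
        "ref": [1, 1, 2, 2],
        "localtime": pd.to_datetime(["2014-06-25 14:11:11.753597"] * 4),
        "size": [100, 200, 300, 400],
    })
    result = df[df["size"] > 10000].groupby("ref").first().reset_index()

    assert result.empty
    # The key column keeps its name instead of becoming "index" (see #7574).
    assert list(result.columns) == ["ref", "localtime", "size"]
    # The timestamp column stays datetime64 instead of degrading to float64.
    assert is_datetime64_any_dtype(result["localtime"])


test_empty_groupby_first_keeps_dtypes_and_key_name()
```

Under pytest the final direct call can be dropped; it is there so the file also runs as a plain script.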
