API/BUG: type of scalar aggregations #15385

Open
chris-b1 opened this issue Feb 13, 2017 · 4 comments
Labels: Bug; Dtype Conversions (Unexpected or buggy dtype conversions); Reduction Operations (sum, mean, min, max, etc.)

Comments

@chris-b1

Currently the scalar type returned from a Series aggregation is inconsistent in two ways: whether it is a numpy or python type, and how an empty Series behaves (see the table below).

Normally this isn't a big deal, as the numpy types mostly behave like the python types, but it can be an issue with serialization, which is where I ran into this.
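To make the serialization problem concrete, here is a minimal sketch. It assumes the aggregation returned a `numpy.int32`, as `count` does in the table; the value is built by hand here so the snippet does not depend on a particular pandas version.

```python
import json

import numpy as np

# Stand-in for a numpy scalar returned by an aggregation such as count()
# (built by hand here; an assumption, not taken from a real Series).
value = np.int32(3)

# The standard-library encoder only knows Python's own int and float,
# and numpy integer scalars are not subclasses of int.
try:
    json.dumps({"count": value})
    raised = False
except TypeError:
    raised = True

# .item() converts a numpy scalar to the closest built-in Python type.
as_python = value.item()
encoded = json.dumps({"count": as_python})
print(raised, type(as_python), encoded)
```

`numpy.float64` happens to subclass Python's `float`, so it serializes without help; the integer types are the ones that typically fail.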

Is the desired behavior to make all of these python types?

| function | type | type_empty |
| --- | --- | --- |
| sum | `<class 'float'>` | `<class 'int'>` |
| mean | `<class 'float'>` | `<class 'float'>` |
| median | `<class 'float'>` | `<class 'float'>` |
| count | `<class 'numpy.int32'>` | `<class 'numpy.int32'>` |
| var | `<class 'float'>` | `<class 'float'>` |
| std | `<class 'float'>` | `<class 'float'>` |
| sem | `<class 'numpy.float64'>` | `<class 'numpy.float64'>` |
| nunique | `<class 'int'>` | `<class 'int'>` |
| prod | `<class 'numpy.float64'>` | `<class 'float'>` |
| min | `<class 'numpy.float64'>` | `<class 'float'>` |
| max | `<class 'numpy.float64'>` | `<class 'float'>` |

Code for table

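import pandas as pd

# Aggregations whose scalar return type is being compared.
fxs = ['sum', 'mean', 'median', 'count', 'var', 'std', 'sem', 'nunique', 'prod', 'min', 'max']
s = pd.Series([1., 2., 3.])
s_empty = pd.Series([], dtype='f8')
data = []

for f in fxs:
    row = dict(function=f)
    # Type returned for a non-empty float Series.
    res = getattr(s, f)()
    row['type'] = type(res)
    # Type returned for an empty float Series.
    res = getattr(s_empty, f)()
    row['type_empty'] = type(res)
    data.append(row)

# Render the collected types as an HTML table.
pd.DataFrame(data).to_html(index=False)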
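Until the return types are consistent, one possible workaround is to normalize each result on the caller's side. The sketch below uses a hypothetical helper, `to_python_scalar`, which is not part of pandas; it relies only on `numpy.generic` (the base class of all numpy scalars) and its `.item()` method.

```python
import numpy as np
import pandas as pd


def to_python_scalar(value):
    """Hypothetical helper: unwrap numpy scalars, pass everything else through."""
    if isinstance(value, np.generic):
        return value.item()
    return value


s = pd.Series([1.0, 2.0, 3.0])

# After normalization every result should be a built-in int or float,
# whichever type the installed pandas version returned originally.
normalized = {
    name: type(to_python_scalar(getattr(s, name)()))
    for name in ["sum", "count", "nunique", "min", "max", "sem"]
}
print(normalized)
```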
@chris-b1 chris-b1 added API Design Dtype Conversions Unexpected or buggy dtype conversions labels Feb 13, 2017
@jreback

jreback commented Feb 14, 2017

Yes, this would be good to make consistent. Scalars should always be python scalars; numpy scalars are a weird hybrid that can have odd effects.
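The comment does not say which odd effects are meant. As one possible illustration (an assumption about the intent, not a quote from the thread), the sketch below shows two places where numpy scalars diverge from the built-in types.

```python
import numpy as np

x = np.float64(1.5)
n = np.int64(3)

# np.float64 subclasses Python's float, but np.int64 does not subclass int,
# so isinstance checks treat the two numpy types differently.
float_is_float = isinstance(x, float)
int_is_int = isinstance(n, int)

# Dividing a Python float by zero raises an exception ...
try:
    1.0 / 0.0
    python_raised = False
except ZeroDivisionError:
    python_raised = True

# ... while a numpy float follows IEEE 754 and returns inf
# (normally with a RuntimeWarning, silenced here via errstate).
with np.errstate(divide="ignore"):
    numpy_result = np.float64(1.0) / 0.0

print(float_is_float, int_is_int, python_raised, numpy_result)
```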

@jreback jreback added this to the Next Major Release milestone Feb 14, 2017
@chris-b1

Extra test case from #19381

import pandas as pd
import json
import datetime

data = [
    datetime.date(1987, 2, 12),
    datetime.date(1987, 2, 12),
    datetime.date(1987, 2, 12),
    None,
    None,
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15)
]

df = pd.DataFrame(columns=['foo'])
df['foo'] = data
ds = df['foo'].describe()
d = ds.to_dict()
j = json.dumps(d)
print(j)
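One way to get a dictionary like this through `json.dumps` is a `default=` hook. This is a sketch, not the fix proposed in the issue: `json_default` is a hypothetical name, and the `summary` dict is built by hand to stand in for `ds.to_dict()`, on the assumption that the counts arrive as numpy integers (the actual types depend on the pandas version).

```python
import datetime
import json

import numpy as np


def json_default(obj):
    """Hypothetical hook for json.dumps: handle numpy scalars and dates."""
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, (datetime.date, datetime.datetime)):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Hand-built stand-in for describe().to_dict() on the column above:
# 13 non-null values, 2 unique dates, the most frequent one appearing 10 times.
summary = {
    "count": np.int64(13),
    "unique": 2,
    "top": datetime.date(1989, 6, 15),
    "freq": np.int64(10),
}

encoded = json.dumps(summary, default=json_default)
print(encoded)
```

The hook is only called for objects the encoder cannot handle itself, so plain Python values pass through unchanged.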

@mroeschke mroeschke added the Numeric Operations Arithmetic, Comparison, and Logical operations label Oct 27, 2019
@torlenor

This is still an issue. As the original poster described, it causes trouble in serialization when a package does not know how to handle numpy types.

@jreback

jreback commented Feb 12, 2021

@torlenor pull requests for tests and patches are always welcome, and they are how things get fixed.

@mroeschke mroeschke added Bug and removed API Design labels May 8, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@jbrockmendel jbrockmendel added Reduction Operations sum, mean, min, max, etc. and removed Numeric Operations Arithmetic, Comparison, and Logical operations labels Mar 28, 2023