
ERR: Series must have a singluar dtype otherwise should raise #13296


Closed
pganssle opened this issue May 26, 2016 · 14 comments · Fixed by #30494
Labels
API Design · Constructors · Dtype Conversions · Error Reporting · good first issue
Milestone

Comments

@pganssle
Contributor

pganssle commented May 26, 2016

When constructing a Series object from a numpy structured data array, trying to cast it to str (or print it) throws:

TypeError(ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'')

You can print a single value from the series, but not the whole series.

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

c_dtype = np.dtype([('a', 'i8'), ('b', 'f4')])
cdt_arr = np.array([(1, 0.4), (256,  -13)], dtype=c_dtype)

pds = pd.Series(cdt_arr, index=['A', 'B'])

print('pds.iloc[0]: {}'.format(str(pds.iloc[0])))   # (1, 0.4000000059604645)
print('pds.iloc[1]: {}'.format(str(pds.iloc[1])))   # (256, -13.0)
print('pds.loc["A"]: {}'.format(str(pds.loc['A']))) # Works
print('pds.loc["B"]: {}'.format(str(pds.loc['B']))) # Works

def print_error(x):
    try:
        o = str(x)      # repr(x) also causes the same errors
        print(o)
    except TypeError as e:
        print('TypeError({})'.format(e.args[0]))

a = pds.iloc[0:1]
b = pds.loc[['A', 'B']]

print('pds.iloc[0:1]:')
print_error(a)
print('pds.loc[["A", "B"]]:')
print_error(b)
print('pds:')
print_error(pds)

print('pd.DataFrame([pds]).T:')
print_error(pd.DataFrame([pds]).T)

print('pds2:')
cdt_arr_2 = np.array([(1, 0.4)], dtype=c_dtype)
pds2 = pd.Series(cdt_arr_2, index=['A'])
print_error(pds2)

Output (actual):

$ python demo_index_bug.py 
pds.iloc[0]: (1, 0.4000000059604645)
pds.iloc[1]: (256, -13.0)
pds.loc["A"]: (1, 0.4000000059604645)
pds.loc["B"]: (256, -13.0)
pds.iloc[0:1]:
TypeError(ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'')
pds.loc[["A", "B"]]:
TypeError(ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'')
pds:
TypeError(ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'')
pd.DataFrame([pds]).T:
                         0
A  (1, 0.4000000059604645)
B             (256, -13.0)
pds2:
TypeError(ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'')

output of pd.show_versions():

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.5.2-1-ARCH
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 21.0.0
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.1
statsmodels: None
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

Stack Trace

I swallowed the stack traces above to show where this was failing, so here's the full traceback for that last error:

Traceback (most recent call last):
  File "demo_dtype_bug.py", line 37, in <module>
    print(pds2)
  File "~/.local/lib/python3.5/site-packages/pandas/core/base.py", line 46, in __str__
    return self.__unicode__()
  File "~/.local/lib/python3.5/site-packages/pandas/core/series.py", line 984, in __unicode__
    max_rows=max_rows)
  File "~/.local/lib/python3.5/site-packages/pandas/core/series.py", line 1025, in to_string
    dtype=dtype, name=name, max_rows=max_rows)
  File "~/.local/lib/python3.5/site-packages/pandas/core/series.py", line 1053, in _get_repr
    result = formatter.to_string()
  File "~/.local/lib/python3.5/site-packages/pandas/formats/format.py", line 225, in to_string
    fmt_values = self._get_formatted_values()
  File "~/.local/lib/python3.5/site-packages/pandas/formats/format.py", line 215, in _get_formatted_values
    float_format=self.float_format, na_rep=self.na_rep)
  File "~/.local/lib/python3.5/site-packages/pandas/formats/format.py", line 2007, in format_array
    return fmt_obj.get_result()
  File "~/.local/lib/python3.5/site-packages/pandas/formats/format.py", line 2026, in get_result
    fmt_values = self._format_strings()
  File "~/.local/lib/python3.5/site-packages/pandas/formats/format.py", line 2059, in _format_strings
    is_float = lib.map_infer(vals, com.is_float) & notnull(vals)
  File "~/.local/lib/python3.5/site-packages/pandas/core/common.py", line 250, in notnull
    res = isnull(obj)
  File "~/.local/lib/python3.5/site-packages/pandas/core/common.py", line 91, in isnull
    return _isnull(obj)
  File "~/.local/lib/python3.5/site-packages/pandas/core/common.py", line 101, in _isnull_new
    return _isnull_ndarraylike(obj)
  File "~/.local/lib/python3.5/site-packages/pandas/core/common.py", line 192, in _isnull_ndarraylike
    result = np.isnan(values)
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
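The trace bottoms out in `isnull()` calling `np.isnan` on the structured array; `np.isnan` has no loop for structured dtypes, which is what surfaces as the `TypeError`. A minimal sketch of that root cause, independent of pandas:

```python
import numpy as np

# Same structured dtype as in the report above.
arr = np.array([(1, 0.4), (256, -13)], dtype=[('a', 'i8'), ('b', 'f4')])

# np.isnan cannot operate on a structured dtype, so it raises TypeError --
# exactly the error pandas' formatting path runs into.
try:
    np.isnan(arr)
except TypeError as e:
    print('TypeError:', e)
```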
@max-sixty
Contributor

ref pydata/xarray#861

@jreback
Contributor

jreback commented May 27, 2016

You want to construct a DataFrame; xarray is a bit extreme for this case. A Series is by definition a single-dtyped structure.

In [5]: DataFrame.from_records(cdt_arr)
Out[5]: 
     a     b
0    1   0.4
1  256 -13.0

In [6]: DataFrame.from_records(cdt_arr).dtypes
Out[6]: 
a      int64
b    float32
dtype: object
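Conversely, if the record view is needed again later, the flat DataFrame can go back to a structured array with `DataFrame.to_records` (a small sketch; `index=False` keeps the index out of the record dtype):

```python
import numpy as np
import pandas as pd

c_dtype = np.dtype([('a', 'i8'), ('b', 'f4')])
cdt_arr = np.array([(1, 0.4), (256, -13)], dtype=c_dtype)

# Flat DataFrame (one column per field), then back to a record array.
df = pd.DataFrame.from_records(cdt_arr)
rec = df.to_records(index=False)

print(rec.dtype.names)  # field names survive the round trip
print(rec['a'])         # field 'a' back as an int64 array
```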

@jreback jreback closed this as completed May 27, 2016
@jreback jreback added Dtype Conversions, Usage Question, API Design labels May 27, 2016
@pganssle
Contributor Author

@jreback Whether or not my purposes would be better served by a DataFrame in this case, I think this is still a valid bug, considering that you can construct and use a Series perfectly well with a compound datatype, but it crashes when printed.

As for why you might want to do something like this - occasionally there are uses where the semantics are much easier when you can treat a single value as a scalar rather than as multiple columns. One toy example would be operations on coordinate systems:

import pandas as pd
import numpy as np

three_vec = np.dtype([('x', 'f8'), ('y', 'f8'), ('z', 'f8')])

def rotate_coordinates(x, u, theta):
    I = np.identity(3)

    ux = np.array([
        [      0, -u['z'],  u['y']],
        [ u['z'],       0, -u['x']],
        [-u['y'],  u['x'],       0]
    ])

    uu = np.array([
        [    u['x'] ** 2, u['x'] * u['y'], u['x'] * u['z']],
        [u['x'] * u['y'],     u['y'] ** 2, u['y'] * u['z']],
        [u['x'] * u['z'], u['y'] * u['z'],     u['z'] ** 2]
    ])

    R = np.cos(theta) * I + np.sin(theta) * ux + (1 - np.cos(theta)) * uu

    xx = x.view(np.float64).reshape(x.shape + (-1,)).T

    out_array = (R @ xx).round(15)

    return np.core.records.fromarrays(out_array, dtype=three_vec)

# Rotate these arrays about z
z = np.array([(0, 0, 1)], dtype=three_vec)[0]
v1 = np.array([(0, 1, 0), (1, 0, 0)], dtype=three_vec)
vp = rotate_coordinates(v1, z, np.pi / 2)

print(v1)
print(vp)

Now imagine that I wanted a pd.DataFrame containing the start and end of some motion. I could represent it as a DataFrame with columns 'start_x', 'end_x', 'start_y', 'end_y', etc., in which case rotating all the coordinates into a new coordinate system means manually grouping the columns and then manually re-distributing them. Or I could use the compound datatype three_vec, have a DataFrame with columns 'start' and 'end', and just do df.apply(partial(rotate_coordinates, u=z, theta=np.pi/2), axis=1). This is a much cleaner way to both store the data and operate on it. It's similar in principle to the idea that, if a first-class datetime type didn't exist, you wouldn't suggest just using a DataFrame with columns 'year', 'month', 'day', etc.
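To make the manual-grouping cost concrete, here is a hedged sketch of the flat-column alternative (the column names and the `rotate_z` helper are hypothetical, for illustration only): every logical vector's components have to be gathered and reassigned column by column.

```python
import numpy as np
import pandas as pd

# Flat-column layout: each component of each logical vector is its own column.
df = pd.DataFrame({
    'start_x': [0.0, 1.0], 'start_y': [1.0, 0.0], 'start_z': [0.0, 0.0],
    'end_x':   [1.0, 0.0], 'end_y':   [0.0, 1.0], 'end_z':   [0.0, 0.0],
})

def rotate_z(df, prefix, theta):
    # Rotate the (x, y) components of one logical vector about the z axis.
    c, s = np.cos(theta), np.sin(theta)
    x, y = df[prefix + '_x'].copy(), df[prefix + '_y'].copy()
    df[prefix + '_x'] = (c * x - s * y).round(15)
    df[prefix + '_y'] = (s * x + c * y).round(15)

# With flat columns, every logical vector must be regrouped and updated by hand.
for prefix in ('start', 'end'):
    rotate_z(df, prefix, np.pi / 2)
```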

@jreback
Contributor

jreback commented May 27, 2016

@pganssle you are violating the guarantees of a Series: it is by definition a single dtype. The bug is that it accepts a non-singular one in the first place; I'll reopen for that purpose. There is NO support for a Series with the use case you describe. Either use a DataFrame or xarray.

@jreback jreback reopened this May 27, 2016
@jreback jreback changed the title TypeError converting Series to string with structured data array ERR: Series must have a singluar dtype otherwise should raise May 27, 2016
@jreback jreback added Error Reporting, Difficulty Novice labels May 27, 2016
@jreback jreback added this to the Next Major Release milestone May 27, 2016
@pganssle
Contributor Author

@jreback My suggestion is that compound types are a single type in the same way that a datetime is a single type. Complex numbers are also a single type because they have native numpy support, but what about quaternions and other hypercomplex numbers? I think it's reasonable to use records to define the base unit of a scalar, given that they're already supported by numpy.

@jreback
Contributor

jreback commented May 27, 2016

@pganssle a compound dtype is simply not supported, nor do I think it should be. Sure, an extension type that is innately a compound type is fine, because it is singular. But a structured dtype is NOT: it has sub-dtypes. This is just making an already complicated structure WAY more complex.

@jreback
Contributor

jreback commented May 27, 2016

As I said, for now this should simply raise NotImplementedError. If you want to investigate whether this could be supported w/o major restructuring, then great. If it's trivial, sure. But I suspect it's not.

@pganssle
Contributor Author

@jreback Does pandas support custom dtypes? I'm not sure that I've ever seen someone create one, other than pandas.

@jreback
Contributor

jreback commented May 27, 2016

@jreback
Contributor

jreback commented May 27, 2016

But these required a lot of support to integrate properly. These are fundamental types. I suppose a Coordinate could also be in that category. But as I said, it's a MAJOR effort to handle these things properly.

@jreback
Contributor

jreback commented May 27, 2016

Principally the issue is efficient storage. What you are suggesting is NOT stored efficiently and that's the problem.

@jreback
Contributor

jreback commented May 27, 2016

I have NEVER seen a good use of .apply, and that's what you are suggesting here. That is SO completely inefficient.

@pganssle
Contributor Author

It's just a toy example of why the semantics would be useful. You could achieve the same thing with applymap or even just df[:] = rotate_coordinates(df.values, z, theta). I don't have any particular understanding of the underlying efficiency of how these things are stored, I was just demonstrating the concept of compound data types that are "logical" scalars.

I think it's fine to consider my suggestion a "low reward / high effort" enhancement - it may be fundamentally difficult to deal with this sort of thing and not something that comes up a lot, I just think it's worth considering as a "nice to have", since, if possible, it would be better to have first-class support for complex datatypes than not.

When I have a bit of time I will be happy to look into the underlying details and see if I can get a better understanding of difficulty and/or propose an alternate approach. Probably it will be a while, though, since I have quite a backlog of other stuff to get to.

In the meantime, I would think this could be profitably handled by just converting compound datatypes to tuples on import, possibly with a warning about the inefficiency of this approach. At least that would allow people who are less performance-sensitive to write some wrapper functions to get the normal semantics.
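A hedged sketch of that stop-gap (`series_from_structured` is a hypothetical wrapper, not a pandas API): each record is degraded to a plain tuple, so the Series holds object-dtype scalars that format without ever reaching `np.isnan`.

```python
import numpy as np
import pandas as pd

def series_from_structured(arr, **kwargs):
    # Hypothetical wrapper: box each record as a plain tuple. This trades
    # the structured array's compact storage for per-element Python objects.
    return pd.Series([tuple(rec) for rec in arr], **kwargs)

c_dtype = np.dtype([('a', 'i8'), ('b', 'f4')])
cdt_arr = np.array([(1, 0.4), (256, -13)], dtype=c_dtype)

s = series_from_structured(cdt_arr, index=['A', 'B'])
print(s)  # object-dtype Series of tuples; repr no longer raises TypeError
```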

@jreback
Contributor

jreback commented May 27, 2016

@pganssle if you have time for this, great. But I don't have time for every enhancement (actually, for most of them). So if you'd like to propose something, great. However, the very simplest thing is to raise an error.

If you or someone else wants to implement a better solution, great.

@jreback jreback modified the milestones: Contributions Welcome, 1.0 Dec 26, 2019