
ERR: Series must have a singluar dtype otherwise should raise #13296


Closed
pganssle opened this issue May 26, 2016 · 14 comments · Fixed by #30494
Labels
API Design · Constructors · Dtype Conversions · Error Reporting · good first issue
Milestone

Comments

@pganssle
Contributor

pganssle commented May 26, 2016

When constructing a Series object from a numpy structured data array, trying to cast it to str (or print it) throws:

TypeError(ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'')

You can print a single value from the series, but not the whole series.

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

c_dtype = np.dtype([('a', 'i8'), ('b', 'f4')])
cdt_arr = np.array([(1, 0.4), (256,  -13)], dtype=c_dtype)

pds = pd.Series(cdt_arr, index=['A', 'B'])

print('pds.iloc[0]: {}'.format(str(pds.iloc[0])))   # (1, 0.4000000059604645)
print('pds.iloc[1]: {}'.format(str(pds.iloc[1])))   # (256, -13.0)
print('pds.loc["A"]: {}'.format(str(pds.loc['A']))) # Works
print('pds.loc["B"]: {}'.format(str(pds.loc['B']))) # Works

def print_error(x):
    try:
        o = str(x)      # repr(x) also causes the same errors
        print(o)
    except TypeError as e:
        print('TypeError({})'.format(e.args[0]))

a = pds.iloc[0:1]
b = pds.loc[['A', 'B']]

print('pds.iloc[0:1]:')
print_error(a)
print('pds.loc[["A", "B"]]:')
print_error(b)
print('pds:')
print_error(pds)

print('pd.DataFrame([pds]).T:')
print_error(pd.DataFrame([pds]).T)

print('pds2:')
cdt_arr_2 = np.array([(1, 0.4)], dtype=c_dtype)
pds2 = pd.Series(cdt_arr_2, index=['A'])
print_error(pds2)

Output (actual):

$ python demo_index_bug.py 
pds.iloc[0]: (1, 0.4000000059604645)
pds.iloc[1]: (256, -13.0)
pds.loc["A"]: (1, 0.4000000059604645)
pds.loc["B"]: (256, -13.0)
pds.iloc[0:1]:
TypeError(ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'')
pds.loc[["A", "B"]]:
TypeError(ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'')
pds:
TypeError(ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'')
pd.DataFrame([pds]).T:
                         0
A  (1, 0.4000000059604645)
B             (256, -13.0)
pds2:
TypeError(ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'')

output of pd.show_versions():

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.5.2-1-ARCH
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 21.0.0
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.1
statsmodels: None
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

Stack Trace

I swallowed the stack traces above to show where this was failing, so here's the full traceback for that last error:

Traceback (most recent call last):
  File "demo_dtype_bug.py", line 37, in <module>
    print(pds2)
  File "~/.local/lib/python3.5/site-packages/pandas/core/base.py", line 46, in __str__
    return self.__unicode__()
  File "~/.local/lib/python3.5/site-packages/pandas/core/series.py", line 984, in __unicode__
    max_rows=max_rows)
  File "~/.local/lib/python3.5/site-packages/pandas/core/series.py", line 1025, in to_string
    dtype=dtype, name=name, max_rows=max_rows)
  File "~/.local/lib/python3.5/site-packages/pandas/core/series.py", line 1053, in _get_repr
    result = formatter.to_string()
  File "~/.local/lib/python3.5/site-packages/pandas/formats/format.py", line 225, in to_string
    fmt_values = self._get_formatted_values()
  File "~/.local/lib/python3.5/site-packages/pandas/formats/format.py", line 215, in _get_formatted_values
    float_format=self.float_format, na_rep=self.na_rep)
  File "~/.local/lib/python3.5/site-packages/pandas/formats/format.py", line 2007, in format_array
    return fmt_obj.get_result()
  File "~/.local/lib/python3.5/site-packages/pandas/formats/format.py", line 2026, in get_result
    fmt_values = self._format_strings()
  File "~/.local/lib/python3.5/site-packages/pandas/formats/format.py", line 2059, in _format_strings
    is_float = lib.map_infer(vals, com.is_float) & notnull(vals)
  File "~/.local/lib/python3.5/site-packages/pandas/core/common.py", line 250, in notnull
    res = isnull(obj)
  File "~/.local/lib/python3.5/site-packages/pandas/core/common.py", line 91, in isnull
    return _isnull(obj)
  File "~/.local/lib/python3.5/site-packages/pandas/core/common.py", line 101, in _isnull_new
    return _isnull_ndarraylike(obj)
  File "~/.local/lib/python3.5/site-packages/pandas/core/common.py", line 192, in _isnull_ndarraylike
    result = np.isnan(values)
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
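The trace bottoms out in `isnull()` calling `np.isnan` on the structured array; `np.isnan` has no loop for structured dtypes, which is what surfaces as the `TypeError`. A minimal sketch of that root cause, independent of pandas:

```python
import numpy as np

# Same structured dtype as in the report above.
arr = np.array([(1, 0.4), (256, -13)], dtype=[('a', 'i8'), ('b', 'f4')])

# np.isnan cannot operate on a structured dtype, so it raises TypeError --
# exactly the error pandas' formatting path runs into.
try:
    np.isnan(arr)
except TypeError as e:
    print('TypeError:', e)
```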
@max-sixty
Contributor

ref pydata/xarray#861

@jreback
Contributor

jreback commented May 27, 2016

You want to construct a DataFrame; xarray is a bit extreme for this case. A Series is by definition a single-dtyped structure.

In [5]: DataFrame.from_records(cdt_arr)
Out[5]: 
     a     b
0    1   0.4
1  256 -13.0

In [6]: DataFrame.from_records(cdt_arr).dtypes
Out[6]: 
a      int64
b    float32
dtype: object
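Conversely, if the record view is needed again later, the flat DataFrame can go back to a structured array with `DataFrame.to_records` (a small sketch; `index=False` keeps the index out of the record dtype):

```python
import numpy as np
import pandas as pd

c_dtype = np.dtype([('a', 'i8'), ('b', 'f4')])
cdt_arr = np.array([(1, 0.4), (256, -13)], dtype=c_dtype)

# Flat DataFrame (one column per field), then back to a record array.
df = pd.DataFrame.from_records(cdt_arr)
rec = df.to_records(index=False)

print(rec.dtype.names)  # field names survive the round trip
print(rec['a'])         # field 'a' back as an int64 array
```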

@jreback jreback closed this as completed May 27, 2016
@jreback jreback added Dtype Conversions, Usage Question, API Design labels May 27, 2016
@pganssle
Contributor Author

@jreback Whether or not my purposes would be better served by a DataFrame in this case, I think this is still a valid bug, considering that you can construct and use a Series perfectly well with a compound datatype, but it crashes when printed.

As for why you might want to do something like this - occasionally there are uses where the semantics are much easier when you can treat a single value as a scalar rather than as multiple columns. One toy example would be operations on coordinate systems:

import pandas as pd
import numpy as np

three_vec = np.dtype([('x', 'f8'), ('y', 'f8'), ('z', 'f8')])

def rotate_coordinates(x, u, theta):
    I = np.identity(3)

    ux = np.array([
        [      0, -u['z'],  u['y']],
        [ u['z'],       0, -u['x']],
        [-u['y'],  u['x'],       0]
    ])

    uu = np.array([
        [    u['x'] ** 2, u['x'] * u['y'], u['x'] * u['z']],
        [u['x'] * u['y'],     u['y'] ** 2, u['y'] * u['z']],
        [u['x'] * u['z'], u['y'] * u['z'],     u['z'] ** 2]
    ])

    R = np.cos(theta) * I + np.sin(theta) * ux + (1 - np.cos(theta)) * uu

    xx = x.view(np.float64).reshape(x.shape + (-1,)).T

    out_array = (R @ xx).round(15)

    return np.core.records.fromarrays(out_array, dtype=three_vec)

# Rotate these arrays about z
z = np.array([(0, 0, 1)], dtype=three_vec)[0]
v1 = np.array([(0, 1, 0), (1, 0, 0)], dtype=three_vec)
vp = rotate_coordinates(v1, z, np.pi / 2)

print(v1)
print(vp)

Now imagine that I wanted a pd.DataFrame containing the start and end of some motion. I could represent it as a DataFrame with columns 'start_x', 'end_x', 'start_y', 'end_y', etc., in which case rotating all the coordinates into a new coordinate system means manually grouping the columns and then manually re-distributing them. Or I could use the compound datatype three_vec, have a DataFrame with columns 'start' and 'end', and just do df.apply(partial(rotate_coordinates, u=z, theta=np.pi/2), axis=1). This is a much cleaner way to both store the data and operate on it. It's similar in principle to the idea that, if a first-class datetime type didn't exist, you wouldn't suggest just using a DataFrame with columns 'year', 'month', 'day', etc.
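To make the manual-grouping cost concrete, here is a hedged sketch of the flat-column alternative (the column names and the `rotate_z` helper are hypothetical, for illustration only): every logical vector's components have to be gathered and reassigned column by column.

```python
import numpy as np
import pandas as pd

# Flat-column layout: each component of each logical vector is its own column.
df = pd.DataFrame({
    'start_x': [0.0, 1.0], 'start_y': [1.0, 0.0], 'start_z': [0.0, 0.0],
    'end_x':   [1.0, 0.0], 'end_y':   [0.0, 1.0], 'end_z':   [0.0, 0.0],
})

def rotate_z(df, prefix, theta):
    # Rotate the (x, y) components of one logical vector about the z axis.
    c, s = np.cos(theta), np.sin(theta)
    x, y = df[prefix + '_x'].copy(), df[prefix + '_y'].copy()
    df[prefix + '_x'] = (c * x - s * y).round(15)
    df[prefix + '_y'] = (s * x + c * y).round(15)

# With flat columns, every logical vector must be regrouped and updated by hand.
for prefix in ('start', 'end'):
    rotate_z(df, prefix, np.pi / 2)
```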

@jreback
Contributor

jreback commented May 27, 2016

@pganssle you are violating the guarantees of a Series: it is by definition a single dtype. The bug is that it accepts a non-singular one in the first place; I'll reopen for that purpose. There is NO support for a Series with the use case you describe. Either use a DataFrame or xarray.

@jreback jreback reopened this May 27, 2016
@jreback jreback changed the title TypeError converting Series to string with structured data array ERR: Series must have a singluar dtype otherwise should raise May 27, 2016
@jreback jreback added Error Reporting, Difficulty Novice labels May 27, 2016
@jreback jreback added this to the Next Major Release milestone May 27, 2016
@pganssle
Contributor Author

@jreback My suggestion is that compound types are a single type in the same way that a datetime is a single type. Complex numbers are also a single type because they have native numpy support, but what about quaternions and other hypercomplex numbers? I think it's reasonable to use records to define the base unit of a scalar, given that they're already supported by numpy.

@jreback
Contributor

jreback commented May 27, 2016

@pganssle a compound dtype is simply not supported, nor do I think it should be. Sure, an extension type that is innately a compound type is fine, because it is singular. But a structured dtype is NOT: it has sub-dtypes. This is just making an already complicated structure WAY more complex.

@jreback
Contributor

jreback commented May 27, 2016

As I said, for now this should simply raise NotImplementedError. If you want to investigate whether this could be supported w/o major restructuring, then great. If it's trivial, sure. But I suspect it's not.

@pganssle
Contributor Author

@jreback Does pandas support custom dtypes? I'm not sure that I've ever seen someone create one, other than pandas.

@jreback
Contributor

jreback commented May 27, 2016

@jreback
Contributor

jreback commented May 27, 2016

But these required a lot of support to integrate properly. These are fundamental types. I suppose a Coordinate could also be in that category. But as I said, it's a MAJOR effort to handle these things properly.

@jreback
Contributor

jreback commented May 27, 2016

Principally the issue is efficient storage. What you are suggesting is NOT stored efficiently and that's the problem.

@jreback
Contributor

jreback commented May 27, 2016

I have NEVER seen a good use of .apply, and that's what you are suggesting here. That is SO completely inefficient.

@pganssle
Contributor Author

It's just a toy example of why the semantics would be useful. You could achieve the same thing with applymap or even just df[:] = rotate_coordinates(df.values, z, theta). I don't have any particular understanding of the underlying efficiency of how these things are stored, I was just demonstrating the concept of compound data types that are "logical" scalars.

I think it's fine to consider my suggestion a "low reward / high effort" enhancement - it may be fundamentally difficult to deal with this sort of thing and not something that comes up a lot, I just think it's worth considering as a "nice to have", since, if possible, it would be better to have first-class support for complex datatypes than not.

When I have a bit of time I will be happy to look into the underlying details and see if I can get a better understanding of difficulty and/or propose an alternate approach. Probably it will be a while, though, since I have quite a backlog of other stuff to get to.

In the meantime, I would think this could be profitably handled by just converting compound datatypes to tuples on import, possibly with a warning about the inefficiency of this approach. At least that would allow people who are less performance-sensitive to write some wrapper functions to get the normal semantics.
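A hedged sketch of that stop-gap (`series_from_structured` is a hypothetical wrapper, not a pandas API): each record is degraded to a plain tuple, so the Series holds object-dtype scalars that format without ever reaching `np.isnan`.

```python
import numpy as np
import pandas as pd

def series_from_structured(arr, **kwargs):
    # Hypothetical wrapper: box each record as a plain tuple. This trades
    # the structured array's compact storage for per-element Python objects.
    return pd.Series([tuple(rec) for rec in arr], **kwargs)

c_dtype = np.dtype([('a', 'i8'), ('b', 'f4')])
cdt_arr = np.array([(1, 0.4), (256, -13)], dtype=c_dtype)

s = series_from_structured(cdt_arr, index=['A', 'B'])
print(s)  # object-dtype Series of tuples; repr no longer raises TypeError
```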

@jreback
Contributor

jreback commented May 27, 2016

@pganssle if you have time for this, great. But I don't have time for every enhancement (actually, for most of them). So if you'd like to propose something, great. However, the very simplest thing is to raise an error.

If you or someone else wants to implement a better solution, great.

@jreback jreback modified the milestones: Contributions Welcome, 1.0 Dec 26, 2019