BUG: need better inference for path in Series construction #9456

jreback · 2015-02-10T13:13:29Z

This hits a path in Series.__init__ which I think needs some better inference

https://github.com/pydata/pandas/blob/master/pandas/core/series.py#L178

In [1]: d = {numpy.datetime64('2015-01-07T02:00:00.000000000+0200'): 42544017.198965244,
   ...:      numpy.datetime64('2015-01-08T02:00:00.000000000+0200'): 40512335.181958228,
   ...:      numpy.datetime64('2015-01-09T02:00:00.000000000+0200'): 39712952.781494237,
   ...:      numpy.datetime64('2015-01-12T02:00:00.000000000+0200'): 39002721.453793451}

In [2]: Series(d)
Out[2]: 
2015-01-07   NaN
2015-01-08   NaN
2015-01-09   NaN
2015-01-12   NaN
dtype: float64

In [3]: Series(d.values(),d.keys())
Out[3]: 
2015-01-07    42544017.198965
2015-01-08    40512335.181958
2015-01-09    39712952.781494
2015-01-12    39002721.453793
dtype: float64

The problem is that the index is already converted at this point and it's not easy to get the keys/values out (except to do so explicitly, which is better IMHO).

Need a review of what currently hits this path (can simply put a halt in here and see what tests hit this). Then figure out a better method.
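
A minimal sketch of the explicit keys/values route mentioned above. The timezone offsets from the original keys are dropped here as an assumption (to keep the example portable across numpy versions), and the variable names are illustrative only. The index and the data are built separately, so no lookup into the dict is needed after the index has been converted:

```python
import numpy as np
import pandas as pd

# Same shape of input as in the report, with the timezone offsets dropped
# (an assumption made for this sketch, not part of the original report).
d = {
    np.datetime64("2015-01-07T00:00:00"): 42544017.198965244,
    np.datetime64("2015-01-08T00:00:00"): 40512335.181958228,
    np.datetime64("2015-01-09T00:00:00"): 39712952.781494237,
    np.datetime64("2015-01-12T00:00:00"): 39002721.453793451,
}

# Pull keys and values out of the dict before any index conversion happens.
keys = list(d.keys())
values = list(d.values())

# Build the index and the data side by side; values stay aligned with keys
# because dicts preserve insertion order (Python 3.7+).
explicit = pd.Series(values, index=pd.DatetimeIndex(keys))
print(explicit)
```

This mirrors the `Series(d.values(), d.keys())` call in `In [3]`, which is the construction that returned the expected values.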


patrickfournier · 2015-04-13T16:27:17Z

I am working on this issue (not at the sprint, unfortunately).

patrickfournier · 2015-04-14T04:59:21Z

I put some traces in the elif isinstance(data, dict): block and ran the tests in pandas.tests.test_series.

The if isinstance(index, DatetimeIndex) block catches these tests:
- test_from_csv
- test_name_printing
- test_to_dict
However, all three throw a TypeError exception because index.astype('O') is an Index, not an ndarray.
The elif isinstance(index, PeriodIndex) block catches the second Series() call in test_constructor_dict.
The else block catches everything else. Three tests throw a TypeError exception because data is not a dict but a dict subclass:
- test_constructor_subclass_dict
- test_orderedDict_ctor
- test_orderedDict_subclass_ctor

I suggest rewriting the try block like this:

    if isinstance(index, DatetimeIndex) and lib.infer_dtype(data) != 'datetime64':
        data = lib.fast_multiget(data, index.astype('O').values, default=np.nan)
    elif isinstance(index, PeriodIndex):
        data = [data.get(i, nan) for i in index]
    else:
        data = lib.fast_multiget(data, index.values, default=np.nan)

If this is not complete nonsense, I can add a test and create a pull request.
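
For illustration, here is a pure-Python sketch of the kind of lookup the PeriodIndex branch above performs with `data.get(i, nan)`. The helper name `multiget` is hypothetical and is only a stand-in; it is not pandas' `lib.fast_multiget` and makes no claim about how that function treats dict subclasses. It shows that a `.get`-based lookup works on a dict subclass and fills missing keys with a default:

```python
import math
from collections import OrderedDict


def multiget(mapping, keys, default=float("nan")):
    """Hypothetical helper: look up each key, falling back to `default`."""
    return [mapping.get(k, default) for k in keys]


# An OrderedDict is a dict subclass, like the inputs in the failing tests.
data = OrderedDict([("a", 1.0), ("b", 2.0)])

# "c" is not present, so its slot is filled with NaN.
looked_up = multiget(data, ["a", "b", "c"])
print(looked_up)
```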

BUG: #10160 DataFrame construction from nested dict with datetime64 index

jreback · 2015-06-10T20:01:43Z

closed by #10269

ruidc · 2015-08-18T16:15:21Z

isn't this still an issue? eg.

In [1]: import pandas;import numpy;import datetime;

In [2]: ix = pandas.MultiIndex.from_arrays([pandas.Index(numpy.array([datetime.date(2015,7,31)], dtype='datetime64[D]')), numpy.array([0.1], dtype='object')])

In [3]: v = {'a':0.1}

In [4]: pandas.DataFrame(v,columns=ix)
Out[4]:
Empty DataFrame
Columns: [(2015-07-31 00:00:00, 0.1)]
Index: []

In [7]: pandas.__version__
Out[7]: u'0.16.2+286.g993942e'

jreback · 2015-08-18T23:49:01Z

@ruidc what exactly do you think your above should do? If anything I would say it should raise as you have all scalar values.

ruidc · 2015-08-19T06:30:50Z

that's not what i was trying to show, I'd expect to see the values not an empty DataFrame or NaNs:

In [6]: pandas.DataFrame(v, index=v.keys(),columns=ix)
Out[6]:
  2015-07-31
           a
a        NaN
b        NaN

jorisvandenbossche · 2015-08-19T09:18:33Z

@ruidc If you provide a dict to DataFrame() the dict keys map to the columns, so you would in any case not get the above (for that you have to do pd.Series(v) and convert to frame / set the name afterwards).

The reason you get an empty dataframe is that you first give a column name 'a' (the dict key), but then also provide another column name (with columns=..), which reindexes the original data provided with the dict. Since this column is not available in there, you get an empty dataframe.
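
A small sketch of the two routes described above, assuming a current pandas; the column name "value" is an arbitrary choice for the example:

```python
import pandas as pd

v = {"a": 0.1}

# Dict keys become the index of a Series ...
s = pd.Series(v)

# ... and to_frame() turns that into a one-column frame.
frame = s.to_frame(name="value")
print(frame)

# With list-like values, dict keys become the columns of a DataFrame.
by_column = pd.DataFrame({"a": [0.1]})
print(by_column)
```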

jorisvandenbossche · 2015-08-19T09:20:19Z

But as @jreback said, you should actually get an error:

In [38]: pd.DataFrame({'a':0.1})
ValueError: If using all scalar values, you must pass an index

In [39]: pd.DataFrame({'a':0.1}, columns=['b'])
Out[39]:
Empty DataFrame
Columns: [b]
Index: []

But it seems this is not triggered when passing another column name.

ruidc · 2015-08-19T14:09:06Z

OK, I see: I made mistakes in reducing my original problem, and was further confused that the result differs between passing columns in the constructor and setting columns afterwards.
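
The difference between the two can be sketched as follows (a minimal example with made-up data, assuming a current pandas): `columns=` in the constructor selects among the dict keys, while assigning `.columns` afterwards only relabels what is already there.

```python
import pandas as pd

data = {"a": [0.1, 0.2]}

# columns= in the constructor selects among the dict keys; "b" is not a key,
# so nothing from "a" is carried over.
selected = pd.DataFrame(data, columns=["b"])
print(selected)

# Assigning .columns afterwards only relabels the existing column.
renamed = pd.DataFrame(data)
renamed.columns = ["b"]
print(renamed)
```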

ruidc · 2015-08-20T14:51:37Z

maybe this shows the problem better, although it's not specific to MultiIndex:

In [1]: import pandas;import numpy;import datetime;
In [2]: v = datetime.date.today()
In [3]: pandas.DataFrame({v : pandas.Series(range(3),index=range(3))}, columns=[v])
Out[3]:
   2015-08-20
0           0
1           1
2           2
In [4]: v = v, v
In [5]: pandas.DataFrame({v : pandas.Series(range(3),index=range(3))}, columns=[v])
Out[5]:
  (2015-08-20, 2015-08-20)
0                      NaN
1                      NaN
2                      NaN

jreback · 2015-08-20T14:56:11Z

yeh suppose the last is prob a bug, pls create a new issue.

ruidc · 2015-08-20T15:23:45Z

Thx for confirming, done: #10863 and sorry for the uninspired title, but i think I've been looking at this particular issue for too long to be creative.

jreback added the Bug, Good as first PR, and Dtype Conversions labels Feb 10, 2015

jreback added this to the 0.16.0 milestone Feb 10, 2015

jreback modified the milestones: 0.16.0, Next Major Release Mar 5, 2015

patrickfournier mentioned this issue Apr 17, 2015

BUG: need better inference for path in Series construction (GH9456) #9924

Closed

jreback mentioned this issue May 5, 2015

Problem constructing Series from dict with datetime.date in level of MultiIndex #10060

Closed

jreback added the Effort Low label May 5, 2015

jreback mentioned this issue Jun 4, 2015

BUG: GH10160 in DataFrame construction from dict with datetime64 index #10269

Closed

jreback pushed a commit that referenced this issue Jun 10, 2015

BUG: #9456 Series construction from dict with datetime64 keys

821542f

BUG: #10160 DataFrame construction from nested dict with datetime64 index

jreback closed this as completed Jun 10, 2015

jreback modified the milestones: 0.16.2, Next Major Release Jun 10, 2015

jreback mentioned this issue Aug 19, 2015

ERR: frame construction with all scalars doesn't raise when columns are provided #10856

Closed
