BUG: Bug in DataFrame construction with nulls and datetimes in a list like #15892

jreback · 2017-04-04T20:02:23Z

chrisaycock · 2017-04-04T20:12:14Z

pandas/_libs/src/inference.pyx

        object v

    for i in range(n):
        v = arr[i]
        if util.is_string_object(v):
-            continue
+            seen_string = 1


Is seen_string supposed to be used somewhere?

jreback · 2017-04-04T20:13:15Z

Actually, this is non-performant.

The issue is that it ends up looking at all of the strings, when we only need to test a couple to see whether they are convertible. On top of that, it fails a couple of tests that deal with preserving dates (which is annoying).
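To illustrate the performance point, here is a minimal pure-Python sketch of probing only the first few strings instead of scanning all of them. It is not the pandas implementation: `maybe_datetime_strings`, `_parses_as_datetime`, and the `max_probe` parameter are hypothetical names, and ISO-8601 parsing via `datetime.fromisoformat` stands in for whatever conversion check the real code uses.

```python
from datetime import datetime
from itertools import islice


def _parses_as_datetime(text):
    """Return True if `text` is an ISO-8601 date or datetime string."""
    try:
        datetime.fromisoformat(text)
        return True
    except ValueError:
        return False


def maybe_datetime_strings(values, max_probe=3):
    """Probe at most `max_probe` strings instead of testing every element.

    Hypothetical helper: it only answers "is conversion worth attempting?".
    """
    # Lazy generator: elements are pulled only as islice asks for them.
    strings = (v for v in values if isinstance(v, str))
    probed = list(islice(strings, max_probe))
    # No strings at all means there is nothing to convert.
    return bool(probed) and all(_parses_as_datetime(s) for s in probed)
```

With a long list of date strings, only the first `max_probe` strings are parsed here; the trade-off is that a non-convertible string later in the input goes unnoticed at this stage and has to be handled by the full conversion.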

chrisaycock · 2017-04-04T20:14:08Z

pandas/_libs/src/inference.pyx

-        elif is_timedelta(v):
-            seen_timedelta=1
+            # nan or None
+            seen_null = 1


Similarly, is seen_null supposed to be used somewhere?

No, these were just experiments.


jreback · 2017-04-04T20:49:34Z

I think this is fixed. Boy, user convenience is a PITA.

jreback · 2017-04-04T21:27:15Z

OK, I think it's fixed now.

chrisaycock · 2017-04-04T21:41:11Z

As long as the tests/benchmarks pass, then LGTM. This was a bigger effort than I thought it would be.

jreback · 2017-04-04T21:45:37Z

@chrisaycock yeah, a lot of this lower-level inference code is reasonable to work on, as we have quite a few integration tests, so refactoring stuff that I wrote years ago is pretty straightforward. But of course there are lots of cases, and it is easy to break stuff (which the tests catch).

Down the road we will be able to simplify things. For example, datetime.date is not a first-class type, so it's very hacky when we infer from it (to maintain some compat). In the future, it will be more straightforward to do this (as we will have a valid first-class type).
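As a generic illustration of why date handling tends to get awkward (not necessarily the specific hack referred to here): in the Python standard library `datetime.datetime` is a subclass of `datetime.date`, so any type-based inference has to check for `datetime` first. The `classify` helper below is hypothetical.

```python
from datetime import date, datetime


def classify(value):
    """Hypothetical helper distinguishing plain dates from datetimes."""
    # Check datetime first: datetime is a subclass of date, so testing
    # isinstance(value, date) first would also match datetimes.
    if isinstance(value, datetime):
        return "datetime"
    if isinstance(value, date):
        return "date"
    return "other"
```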

Further, I have to think about whether we still want things like:

In [5]: Series([pd.NaT, '1 day'])
Out[5]: 
0      NaT
1   1 days
dtype: timedelta64[ns]

to work. It is reasonable & not too hard.

This is actually a bigger issue:

In [6]: Series([pd.NaT])
Out[6]: 
0   NaT
dtype: datetime64[ns]

as this is somewhat arbitrary. But this is for another time.
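The arbitrariness can be seen in a toy, pure-Python version of flag-based inference. This is only a sketch under stated assumptions: `infer_kind` is a hypothetical function, `None` stands in for `pd.NaT`, and the `seen_*` flags merely echo the naming style in the diff above rather than reproduce the Cython code.

```python
from datetime import datetime, timedelta


def infer_kind(values):
    """Toy inference: classify values as datetime, timedelta, null-only, or mixed."""
    seen_datetime = seen_timedelta = seen_null = seen_other = False
    for v in values:
        if v is None:  # stand-in for a null such as NaT
            seen_null = True
        elif isinstance(v, datetime):
            seen_datetime = True
        elif isinstance(v, timedelta):
            seen_timedelta = True
        else:
            seen_other = True

    if seen_other or (seen_datetime and seen_timedelta):
        return "mixed"
    if seen_datetime:
        return "datetime"
    if seen_timedelta:
        return "timedelta"
    # Only nulls (or nothing) were seen: there is no evidence either way,
    # so any concrete answer here would be a convention, not an inference.
    return "null-only"
```

A null next to a timedelta or a datetime takes its kind from its neighbour, but a lone null gives the function nothing to go on. The `Series([pd.NaT])` output above shows pandas resolving that case to `datetime64[ns]`.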

codecov · 2017-04-04T22:12:54Z

Codecov Report

Merging #15892 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15892      +/-   ##
==========================================
- Coverage   90.97%   90.97%   -0.01%     
==========================================
  Files         145      145              
  Lines       49491    49495       +4     
==========================================
+ Hits        45023    45026       +3     
- Misses       4468     4469       +1

Flag	Coverage Δ
#multiple	`88.73% <100%> (-0.01%)`	⬇️
#single	`40.63% <70%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/types/cast.py	`85.71% <100%> (+0.11%)`	⬆️
pandas/core/common.py	`90.68% <0%> (-0.35%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e50d397...6bf2148.

jreback added the Bug, Missing-data, Reshaping, and Datetime labels Apr 4, 2017

jreback added this to the 0.20.0 milestone Apr 4, 2017

jreback mentioned this pull request Apr 4, 2017

"IndexError: tuple index out of range" error with numpy array contain datetimes #15869

Closed

chrisaycock reviewed Apr 4, 2017


BUG: Bug in DataFrame construction with nulls and datetimes in a list…

7fcd4e5

…-like closes pandas-dev#15869

jreback force-pushed the construct branch from 0567799 to 7fcd4e5 Compare April 4, 2017 20:49

jreback force-pushed the construct branch from ac347ed to c8d7111 Compare April 4, 2017 21:24

fix perf

6bf2148

jreback force-pushed the construct branch from c8d7111 to 6bf2148 Compare April 4, 2017 21:30

jreback mentioned this pull request Apr 4, 2017

A "NULL" / "NA" logical type wesm/pandas2#9

Open

jreback closed this in e0b60c0 Apr 4, 2017
