Skip to content

BUG: Bug in DataFrame construction with nulls and datetimes in a list like #15892

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

jreback
Copy link
Contributor

@jreback jreback commented Apr 4, 2017

closes #15869

@jreback jreback added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode Datetime Datetime data dtype labels Apr 4, 2017
@jreback jreback added this to the 0.20.0 milestone Apr 4, 2017
object v

for i in range(n):
v = arr[i]
if util.is_string_object(v):
continue
seen_string = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is seen_string supposed to be used somewhere?

@jreback
Copy link
Contributor Author

jreback commented Apr 4, 2017

actually this is non-performant.

The issue is that it ends up looking at all of the strings, when we need to just test a couple to see if they are convertible. Not to mention this fails a couple of tests that have to do with us preserving date (which is annoying).

elif is_timedelta(v):
seen_timedelta=1
# nan or None
seen_null = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, is seen_null supposed to be used somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no these were tries.......

@jreback
Copy link
Contributor Author

jreback commented Apr 4, 2017

I think this is fixed. boy user convenience is a PITA.

@jreback
Copy link
Contributor Author

jreback commented Apr 4, 2017

ok I think its fixed now.

@chrisaycock
Copy link
Contributor

As long as the test/benchmarks pass, then LGTM. This was a bigger effort than I thought it would be.

@jreback
Copy link
Contributor Author

jreback commented Apr 4, 2017

@chrisaycock yeah a lot of this lower-level inference code is reasonable as we have quite a few integration tests, so refactoring stuff that I wrote years ago is pretty straightforward. But of course there are lots of cases and easy to break stuff (which the tests catch).

Down the road we will will be able to simplify things. For example datetime.date is not a first class type, so its very hacky when we infer from it (to maintain some compat). In the future, it will be more straightforward to do this (as we will have a valid first class type).

Further I have to think about whether we still want things like:

In [5]: Series([pd.NaT, '1 day'])
Out[5]: 
0      NaT
1   1 days
dtype: timedelta64[ns]

to work. It is reasonable & not too hard.

this is actually a bigger issue:

In [6]: Series([pd.NaT])
Out[6]: 
0   NaT
dtype: datetime64[ns]

as this is somewhat arbitrary. but this is for another time.

@codecov
Copy link

codecov bot commented Apr 4, 2017

Codecov Report

Merging #15892 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15892      +/-   ##
==========================================
- Coverage   90.97%   90.97%   -0.01%     
==========================================
  Files         145      145              
  Lines       49491    49495       +4     
==========================================
+ Hits        45023    45026       +3     
- Misses       4468     4469       +1
Flag Coverage Δ
#multiple 88.73% <100%> (-0.01%) ⬇️
#single 40.63% <70%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/types/cast.py 85.71% <100%> (+0.11%) ⬆️
pandas/core/common.py 90.68% <0%> (-0.35%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e50d397...6bf2148. Read the comment docs.

@jreback jreback closed this in e0b60c0 Apr 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"IndexError: tuple index out of range" error with numpy array contain datetimes
2 participants