Skip to content

BUG: Constrain date parsing from strings a little bit more #4601 #4863

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 20, 2013

Conversation

danbirken
Copy link
Contributor

closes #4601

Currently dateutil will parse almost any string into a datetime. This
change adds a filter in front of dateutil that will prevent it from
parsing certain strings that don't look like datetimes:

  1. Strings that parse to float values that are less than 1000
  2. Certain special one character strings (this was already in there,
    this just moves that code)

Additionally, this filters out datetimes that are out of range for the
datetime64[ns] type. Currently any out-of-range datetimes will just
overflow and be mapped to some random time within the bounds of
datetime64[ns].

@danbirken
Copy link
Contributor Author

Yeah that looks like a much better place. Will do.

@jreback
Copy link
Contributor

jreback commented Sep 17, 2013

what does this do with an out-of-bounds date? raise? (or depends on coerce)?

@jreback
Copy link
Contributor

jreback commented Sep 17, 2013

i believe this should close #4065 as well?

@jreback
Copy link
Contributor

jreback commented Sep 17, 2013

@danbirken I realize there is a pandas.core.datetools (which is not that big)...should prob go there

@danbirken
Copy link
Contributor Author

It treats an out of bounds date as an invalid date string, so it will either coerce it or not convert the array to datetimes. It doesn't throw an error.

As for the location, both of those other places were causing some circular dependency issues, and on second thought I don't think either of them are any improvement over just putting them in tslib.pyx (which is a file full of timestamp related functions anyway). I also think the tests being in test_tslib.py is an improvement over the previous change.

#4065 is similar, but this change wont fix it. This only fixes the specific case of parsing timestamps from strings (like csv parsing) -- previously out-of-bounds timestamps would just be mapped randomly within the appropriate bounds, whereas now they will be seen as invalid. I think fixing that one should be pretty easy though, I can do that after this one.

@jreback
Copy link
Contributor

jreback commented Sep 18, 2013

gr8 on both points

@danbirken
Copy link
Contributor Author

Rebasing so it can be merged cleanly. Let me know if there is anything else I should add to get this merged.

@jreback
Copy link
Contributor

jreback commented Sep 19, 2013

can make tr function names a little shorter

maybe call the dunction parse_datetime_string?

and I think u can add kw argument, maybe validate to parse_datetime_string to call the verify bounds function (make this function name shorter too)

@danbirken
Copy link
Contributor Author

Shortened the function names and added the kw param.

@jtratner
Copy link
Contributor

Can you make sure all your test function names + open parens are under 80 characters in line length?

@jtratner
Copy link
Contributor

If you feel the test needs a longer explanation, you can put it in comments below the function name.

…#4601

Currently dateutil will parse almost any string into a datetime.  This
change adds a filter in front of dateutil that will prevent it from
parsing certain strings that don't look like datetimes:

1) Strings that parse to float values that are less than 1000
2) Certain special one character strings (this was already in there,
   this just moves that code)

Additionally, this filters out datetimes that are out of range for the
datetime64[ns] type.  Currently any out-of-range datetimes will just
overflow and be mapped to some random time within the bounds of
datetime64[ns].
@danbirken
Copy link
Contributor Author

Sorry I am used to 100 column style rules, my bad. Fixed everything to pep8 80 cols.

@jtratner
Copy link
Contributor

totally reasonable - 100 character makes sense too, we just use 80 everywhere. Thanks for making the change! I'll leave it to @jreback for deciding to merge this.

@jreback jreback merged commit f738712 into pandas-dev:master Sep 20, 2013
@jreback
Copy link
Contributor

jreback commented Sep 20, 2013

@danbirken thanks for the fixes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Type inference code coerces float column into datetime
3 participants