Skip to content

ENH: optional ':' separator in ISO8601 strings #12483

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

thejohnfreeman
Copy link
Contributor

Includes some refactoring in the ISO8601 parser. If more tests are desired, please let me know where to put them. Same for the whatsnew entry. This is effectively my first pull request for Pandas.

cc @jreback

'6-2014': datetime.datetime(2014, 6, 1),

'20010101 12': datetime.datetime(2001, 1, 1, 12),
'20010101 1234': datetime.datetime(2001, 1, 1, 12, 34),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I don't think 12 or 1234 should work (but 123456) should , IOW, you either have to fully specify HHMMSS or not at all. Any ambiuity in partially accepting it as you have here?

cc @chris-b1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the current version accepts 12 in addition to 1. I was thinking we should keep the two digit form.

In [23]: pd.Timestamp('20010101 1')
Out[23]: Timestamp('2001-01-01 01:00:00')

In [24]: pd.Timestamp('20010101 09')
Out[24]: Timestamp('2001-01-01 09:00:00')

In [25]: pd.Timestamp('20010101 12')
Out[25]: Timestamp('2001-01-01 12:00:00')

As for four digits, I was thinking it would make sense to allow "military" times like 0800 (which is also currently accepted).

In [26]: pd.Timestamp('20010101 0800')
Out[26]: Timestamp('2001-01-01 08:00:00')

@jreback jreback added Datetime Datetime data dtype Bug labels Feb 27, 2016
@thejohnfreeman thejohnfreeman force-pushed the GH-10041 branch 3 times, most recently from 9d7911c to dc7e372 Compare February 28, 2016 16:21
@jreback
Copy link
Contributor

jreback commented Mar 3, 2016

cc @chris-b1 can you have a look

@chris-b1
Copy link
Contributor

chris-b1 commented Mar 3, 2016

Looks good as far as I can tell. The partial time formats are a bit out there, but unambiguous, so I don't see any harm.

In [24]: pd.Timestamp('2015/01/01 1234')
Out[24]: Timestamp('2015-01-01 12:34:00')

@jreback jreback added this to the 0.18.0 milestone Mar 3, 2016
@@ -43,3 +43,10 @@ Performance Improvements

Bug Fixes
~~~~~~~~~

- Timestamps without ':' in HHMMSS keep microsecond resolution

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just do this in a 1-liner, pls add the issue number, and move to 0.18.0

@jreback
Copy link
Contributor

jreback commented Mar 5, 2016

can you rebase and push again

Allows Timestamps constructed from strings without the ':' separator in
HHMMSS to preserve microsecond resolution.
@jreback
Copy link
Contributor

jreback commented Mar 5, 2016

thanks @thejohnfreeman !

@thejohnfreeman
Copy link
Contributor Author

Thanks for the help!

@jreback
Copy link
Contributor

jreback commented Mar 5, 2016

I normally test on windows before merging....but alas

.................................................S.......S.........................................................................................
...................................................
======================================================================
ERROR: test_parsers_iso8601 (pandas.tseries.tests.test_tslib.TestDatetimeParsingWrappers)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\conda\Documents\pandas3.5\pandas\tseries\tests\test_tslib.py", line 736, in test_parsers_iso8601
    raise Exception(date_str)
Exception: 2005-0101

======================================================================
FAIL: test_nanosecond_string_parsing (pandas.tseries.tests.test_tslib.TestTimestampNsOperations)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\conda\Documents\pandas3.5\pandas\tseries\tests\test_tslib.py", line 902, in test_nanosecond_string_parsing
    self.assertEqual(ts.value, expected_value)
AssertionError: 1367392545123456000 != 1367392545123456789

----------------------------------------------------------------------
Ran 9204 tests in 304.769s

FAILED (SKIP=174, errors=1, failures=1)

@jreback jreback reopened this Mar 5, 2016
@jreback
Copy link
Contributor

jreback commented Mar 5, 2016

so if you have the ability to debug on windows, pls have a look :<

@jreback jreback modified the milestones: 0.18.1, 0.18.0 Mar 5, 2016
@thejohnfreeman
Copy link
Contributor Author

Just at a glance, the errors make it seem like my changes to the C code weren't built. I didn't have any platform-specific preprocessor guards in the C, so I would expect it to behave the same on each platform. What are the sources of differences between Windows and Linux behavior? Are the numpy and Python datetime types different on each platform? Are there platform-specific tests?

@jreback
Copy link
Contributor

jreback commented Mar 6, 2016

hmm, I just tested this again on windows and its ok. Why don't you rebase and push again.

@jreback jreback added this to the 0.18.0 milestone Mar 6, 2016
@jreback jreback removed this from the 0.18.1 milestone Mar 6, 2016
@jreback jreback closed this in 820e110 Mar 6, 2016
@jreback
Copy link
Contributor

jreback commented Mar 6, 2016

ok, maybe something funny happened in the build.

thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Loss of nanosecond resolution when constructing Timestamps from str
3 participants