-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Fix parsing of stata dates (#17797) #17990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix parsing of stata dates (#17797) #17990
Conversation
Codecov Report
@@ Coverage Diff @@
## master #17990 +/- ##
==========================================
- Coverage 91.23% 91.22% -0.02%
==========================================
Files 163 163
Lines 50113 50112 -1
==========================================
- Hits 45723 45713 -10
- Misses 4390 4399 +9
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #17990 +/- ##
=========================================
Coverage ? 91.25%
=========================================
Files ? 163
Lines ? 50099
Branches ? 0
=========================================
Hits ? 45716
Misses ? 4383
Partials ? 0
Continue to review full report at Codecov.
|
pandas/tests/io/test_stata.py
Outdated
STATA supports 9 date types which each have distinct units. We test 7 | ||
of the 9 types, ignoring %tC and %tb. %tC is a variant of %tc that | ||
accounts for leap seconds and %tb relies on STATAs business calendar. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great description! Can you convert that to a comment block? That's more aligned with how we comment our tests, and also, reference the issue number at the top (instead of at the bottom).
@miker985 : You're going to need to add a |
@gfyoung Is this the comment you're looking for? I'll wait to hear back on the |
pandas/tests/io/test_stata.py
Outdated
'column', ['ms', 'day', 'week', 'month', 'qtr', 'half', 'yr']) | ||
def test_date_parsing_ignores_format_details(self, column): | ||
# GH 17797 | ||
# Test that display formats are ignored when determining if a numeric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: add a newline under the issue reference.
@TomAugspurger should be merging the PR for the |
df = read_stata(self.stata_dates) | ||
unformatted = df.loc[0, column] | ||
formatted = df.loc[0, column + "_fmt"] | ||
assert unformatted == formatted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these supposed to be datetime64[ns]
dtype?
what happens for the ignored formats? should raise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these supposed to be
datetime64[ns]
dtype?
At this point in the code formatted
and unformatted
are pandas._libs.tslib.Timestamp
objects. Every column in df
has a dtype of datetime64[ns]
what happens for the ignored formats? should raise?
Ignored formats are not converted to dates (consistent with previous behavior) source
can you add a whatsnew for 0.21.1 (bug fix io section) |
2cd0e59
to
b3273c4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment. ping on green.
doc/source/whatsnew/v0.21.1.txt
Outdated
@@ -74,6 +74,9 @@ Indexing | |||
I/O | |||
^^^ | |||
|
|||
- Bug in `StataReader` not converting date/time columns with display formatting addressed (:issue:`17990`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do class:`~pandas.io.stata.StataReader`
. Can you make a bit more clear what is being fixed here.
thanks @miker985 |
(cherry picked from commit e886af5)
git diff upstream/master -u -- "*.py" | flake8 --diff
Expands behavior provided by the following to include most STATA format codes.
Add tests for above behavior (previously untested) + all additional format codes