BUG: Fix parsing of stata dates (#17797) #17990

Merged
jreback merged 5 commits into pandas-dev:master from improve-stata-date-loading on Oct 31, 2017

Conversation

miker985
Contributor

Expands the behavior of the following snippet, which strips display-format details from %td, so that it covers most Stata date format codes.

        # remove format details from %td
        self.fmtlist = ["%td" if x.startswith("%td") else x
                        for x in self.fmtlist]

Adds tests for the above behavior (previously untested) and for all of the additional format codes.
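To illustrate the idea, here is a minimal sketch of how the %td-only normalization could be generalized to other date types. It is not the code in pandas/io/stata.py; `BASE_DATE_FORMATS` and `strip_format_details` are hypothetical names used only for this example, and the sample format strings are made up.

```python
# Hypothetical helper for illustration; not the pandas implementation.
# Base codes for the seven Stata date types covered by the tests
# (%tC and %tb are deliberately left out).
BASE_DATE_FORMATS = ("%tc", "%td", "%tw", "%tm", "%tq", "%th", "%ty")


def strip_format_details(fmtlist):
    """Reduce each date format to its base code; leave other formats alone."""
    normalized = []
    for fmt in fmtlist:
        # startswith is case-sensitive, so "%tC..." does not match "%tc".
        base = next((b for b in BASE_DATE_FORMATS if fmt.startswith(b)), fmt)
        normalized.append(base)
    return normalized


formats = ["%tdD_m_Y", "%tcHH:MM:SS", "%tCHH:MM:SS", "%9.0g", "%tq"]
print(strip_format_details(formats))
# ['%td', '%tc', '%tCHH:MM:SS', '%9.0g', '%tq']
```

Formats that do not start with one of the listed base codes pass through unchanged, so non-date columns and the two ignored date types are not affected.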

@codecov

codecov bot commented Oct 26, 2017

Codecov Report

Merging #17990 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17990      +/-   ##
==========================================
- Coverage   91.23%   91.22%   -0.02%     
==========================================
  Files         163      163              
  Lines       50113    50112       -1     
==========================================
- Hits        45723    45713      -10     
- Misses       4390     4399       +9

Flag         Coverage Δ
#multiple    89.03% <100%> (-0.01%) ⬇️
#single      40.31% <0%> (-0.06%) ⬇️

Impacted Files          Coverage Δ
pandas/io/stata.py      93.7% <100%> (-0.01%) ⬇️
pandas/io/gbq.py        25% <0%> (-58.34%) ⬇️
pandas/core/frame.py    97.75% <0%> (-0.1%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6779ac0...9ab0681. Read the comment docs.

@codecov

codecov bot commented Oct 26, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@cc7abd9). Click here to learn what that means.
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #17990   +/-   ##
=========================================
  Coverage          ?   91.25%           
=========================================
  Files             ?      163           
  Lines             ?    50099           
  Branches          ?        0           
=========================================
  Hits              ?    45716           
  Misses            ?     4383           
  Partials          ?        0

Flag         Coverage Δ
#multiple    89.06% <100%> (?)
#single      40.24% <0%> (?)

Impacted Files        Coverage Δ
pandas/io/stata.py    93.7% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cc7abd9...04ec86f. Read the comment docs.

@gfyoung gfyoung added Bug IO Stata read_stata, to_stata labels Oct 26, 2017
STATA supports 9 date types which each have distinct units. We test 7
of the 9 types, ignoring %tC and %tb. %tC is a variant of %tc that
accounts for leap seconds and %tb relies on STATAs business calendar.
"""
Member

@gfyoung gfyoung Oct 26, 2017

Great description! Can you convert that to a comment block? That's more aligned with how we comment our tests. Also, please reference the issue number at the top (instead of at the bottom).

@gfyoung
Member

gfyoung commented Oct 26, 2017

@miker985 : You're going to need to add a whatsnew for this PR. However, as the doc for 0.21.1 is not available yet, just hold tight. Feel free to ping us later if we forget to let you know.

@miker985
Contributor Author

@gfyoung Is this the comment you're looking for?

I'll wait to hear back on the whatsnew file. If I don't hear back in e.g., a week should I ping you?

        'column', ['ms', 'day', 'week', 'month', 'qtr', 'half', 'yr'])
    def test_date_parsing_ignores_format_details(self, column):
        # GH 17797
        # Test that display formats are ignored when determining if a numeric
Member

Nit: add a newline under the issue reference.

@gfyoung
Member

gfyoung commented Oct 26, 2017

If I don't hear back in e.g., a week should I ping you?

@TomAugspurger should be merging the PR for the whatsnew soon, so that works.

df = read_stata(self.stata_dates)
unformatted = df.loc[0, column]
formatted = df.loc[0, column + "_fmt"]
assert unformatted == formatted
Contributor

are these supposed to be datetime64[ns] dtype?

what happens for the ignored formats? should raise?

Contributor Author

are these supposed to be datetime64[ns] dtype?

At this point in the code, formatted and unformatted are pandas._libs.tslib.Timestamp objects. Every column in df has a dtype of datetime64[ns].

what happens for the ignored formats? should raise?

Ignored formats are not converted to dates, which is consistent with the previous behavior (source).
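For readers unfamiliar with Stata's date types, the following is a rough standard-library sketch of the behavior described here: values are converted according to the base format code, display details after the code are ignored, and unsupported formats are returned unconverted. `elapsed_to_datetime` is a hypothetical function written for this illustration, not the pandas implementation, and it assumes Stata's documented conventions (an epoch of 1 January 1960 and a fixed 52 weeks per year for %tw).

```python
from datetime import datetime, timedelta

STATA_EPOCH = datetime(1960, 1, 1)


def elapsed_to_datetime(value, fmt):
    """Hypothetical converter: only the leading format code matters."""
    if fmt.startswith("%tc"):  # milliseconds since the epoch
        return STATA_EPOCH + timedelta(milliseconds=value)
    if fmt.startswith("%td"):  # days since the epoch
        return STATA_EPOCH + timedelta(days=value)
    if fmt.startswith("%tw"):  # weeks, 52 per year
        year, week = divmod(value, 52)
        return datetime(1960 + year, 1, 1) + timedelta(days=7 * week)
    if fmt.startswith("%tm"):  # months
        year, month = divmod(value, 12)
        return datetime(1960 + year, month + 1, 1)
    if fmt.startswith("%tq"):  # quarters
        year, quarter = divmod(value, 4)
        return datetime(1960 + year, 3 * quarter + 1, 1)
    if fmt.startswith("%th"):  # half-years
        year, half = divmod(value, 2)
        return datetime(1960 + year, 6 * half + 1, 1)
    if fmt.startswith("%ty"):  # the value is the calendar year itself
        return datetime(value, 1, 1)
    # Anything else (including %tC and %tb) is left as a plain number.
    return value


# A bare code and a code with display details convert identically.
assert elapsed_to_datetime(366, "%td") == elapsed_to_datetime(366, "%tdD_m_Y")
```

The final assertion mirrors what the test in this PR checks: a column with display formatting should parse to the same value as its unformatted counterpart.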

@jreback
Contributor

jreback commented Oct 27, 2017

can you add a whatsnew for 0.21.1 (bug fix io section)

@miker985 miker985 force-pushed the improve-stata-date-loading branch from 2cd0e59 to b3273c4 Compare October 27, 2017 15:14
Contributor

@jreback jreback left a comment

minor comment. ping on green.

@@ -74,6 +74,9 @@ Indexing
I/O
^^^

- Bug in `StataReader` not converting date/time columns with display formatting addressed (:issue:`17990`)
Contributor

can you do :class:`~pandas.io.stata.StataReader`? Can you also make it a bit clearer what is being fixed here?

@jreback jreback added this to the 0.21.1 milestone Oct 28, 2017
@jreback jreback merged commit e886af5 into pandas-dev:master Oct 31, 2017
@jreback
Contributor

jreback commented Oct 31, 2017

thanks @miker985

peterpanmj pushed a commit to peterpanmj/pandas that referenced this pull request Oct 31, 2017
@miker985 miker985 deleted the improve-stata-date-loading branch October 31, 2017 15:03
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Dec 8, 2017
TomAugspurger pushed a commit that referenced this pull request Dec 11, 2017
Labels
Bug IO Stata read_stata, to_stata
Projects
None yet
Development

Successfully merging this pull request may close these issues.

StataReader selectively ignores format details
4 participants