Skip to content

BUG: parse_dates may have columns not in dataframe #32320

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Mar 17, 2020

Conversation

sathyz
Copy link
Contributor

@sathyz sathyz commented Feb 28, 2020

read_csv will raise ValueError when columnes used for parse_dates are found in the dataframe.

read_csv will raise ValueError when columnes used for parse_dates are found in the dataframe.
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sathyz

Just making some minor comments ahead of core members' review(s)

@sathyz
Copy link
Contributor Author

sathyz commented Feb 28, 2020

Hi @jreback @gfyoung - Could you please review. I incorporated the changes given in #31815

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! A mix of minor things / comments

@WillAyd WillAyd added the IO CSV read_csv, to_csv label Feb 29, 2020
@pep8speaks
Copy link

pep8speaks commented Feb 29, 2020

Hello @sathyz! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-03-12 02:35:27 UTC

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment, other lgtm.

@jreback jreback requested a review from gfyoung March 3, 2020 03:19
@jreback jreback added this to the 1.1 milestone Mar 3, 2020
@jreback jreback added the Error Reporting Incorrect or improved errors from pandas label Mar 3, 2020
@jreback jreback mentioned this pull request Mar 3, 2020
5 tasks
@gfyoung
Copy link
Member

gfyoung commented Mar 3, 2020

One minor doc fix, but otherwise, these changes look good. Nice job with the testing!

@sathyz
Copy link
Contributor Author

sathyz commented Mar 4, 2020

@gfyoung done! Thanks for the corrections. I didn't notice these mistakes.

@gfyoung
Copy link
Member

gfyoung commented Mar 4, 2020

@pandas-dev/pandas-core : These mypy failures here look unrelated to this PR.

@simonjayhawkins
Copy link
Member

@pandas-dev/pandas-core : These mypy failures here look unrelated to this PR.

xref #32438

@sathyz
Copy link
Contributor Author

sathyz commented Mar 5, 2020

What do I have to do? Do I have to fix it to merge this PR?

@simonjayhawkins
Copy link
Member

What do I have to do? Do I have to fix it to merge this PR?

you need to merge master see https://pandas.io/docs/development/contributing.html#updating-your-pull-request

@sathyz
Copy link
Contributor Author

sathyz commented Mar 6, 2020

Last couple of times I did that #31815 #31550 and it was not smooth. Let me try this time.

@sathyz
Copy link
Contributor Author

sathyz commented Mar 7, 2020

I am not sure what is failing in docs, how do I debug.

@gfyoung
Copy link
Member

gfyoung commented Mar 7, 2020

@pandas-dev/pandas-core

@sathyz
Copy link
Contributor Author

sathyz commented Mar 10, 2020

Any updates? How do I fix the problem in docs?

@datapythonista
Copy link
Member

Any updates? How do I fix the problem in docs?

The docs fetch a url from some other projects, to know the absolute url of a page when we write something like :ref:numpy.array. Looks like an error in GitHub pages was making the StatsModels website fail, and that caused the fetching of their url to fail in the build in your PR.

I guess that should be fixed not. A simple git fetch upstream && git merge upstream/master && git push should restart all jobs, and hopefully get the CI green this time.

@sathyz
Copy link
Contributor Author

sathyz commented Mar 12, 2020

Done, please merge.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping on green.

@pytest.mark.parametrize(
"names, usecols, parse_dates, missing_cols",
[
(None, ["val"], ["date", "time"], "date, time"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a tuple or other list-like in this test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I use tuple, _validate_parse_dates_arg throws the following error.

TypeError: Only booleans, lists, and dictionaries are accepted for the 'parse_dates' parameter

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I guess this is only currently documented as supporting scalars, lists and dicts (not tuples)

@sathyz
Copy link
Contributor Author

sathyz commented Mar 16, 2020

@jreback done. Please review.

@WillAyd WillAyd merged commit 9c7494a into pandas-dev:master Mar 17, 2020
@WillAyd
Copy link
Member

WillAyd commented Mar 17, 2020

Thanks @sathyz - great first PR

@sathyz sathyz deleted the issue-31251-2 branch March 17, 2020 03:49
SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request Mar 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

read_csv: if parse_dates dont appear in use_cols, we get a trace
9 participants