Skip to content

BUG: Allow custom error values in parse_dates argument of read_sql like functions (GH35185) #37823

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Dec 14, 2020

Conversation

avinashpancham
Copy link
Contributor

@avinashpancham avinashpancham commented Nov 13, 2020

@avinashpancham avinashpancham changed the title BUG: Allow custom error values in parse_dates argument of read_sql (GH35185) BUG: Allow custom error values in parse_dates argument of read_sql like functions (GH35185) Nov 14, 2020
],
)
def test_custom_dateparsing_error(self, read_sql, text, mode):
if self.mode in mode:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this tests assert the results, otherwise its not actually testing anything

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

pandas/io/sql.py Outdated
@@ -79,7 +79,9 @@ def _process_parse_dates_argument(parse_dates):

def _handle_date_column(col, utc=None, format=None):
if isinstance(format, dict):
return to_datetime(col, errors="ignore", **format)
return to_datetime(
col, **{"errors": "ignore", **format} # type: ignore[call-overload]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you document what values format can actualy take on here? also don't type ignore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYM with "values format can take on here", should I write all the arguments that to_datetime accepts?

I added the type ignore since to_datetime is not overloaded for this argument combination and therefore results into a mypy error. I did not want to add an overload variation for this exception, so any tips on how to approach this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR, is this what you meant with documenting values that format can take? Also changed the python code so that we dont have to type ignore anymore.

@jreback jreback added the IO SQL to_sql, read_sql, read_sql_query label Nov 15, 2020
@avinashpancham
Copy link
Contributor Author

@jreback could you have another look?

@@ -745,6 +745,7 @@ I/O
- :meth:`read_fwf` was inferring compression with ``compression=None`` which was not consistent with the other :meth:``read_*`` functions (:issue:`37909`)
- :meth:`DataFrame.to_html` was ignoring ``formatters`` argument for ``ExtensionDtype`` columns (:issue:`36525`)
- Bumped minimum xarray version to 0.12.3 to avoid reference to the removed ``Panel`` class (:issue:`27101`)
- Allow custom error values for parse_dates argument of :func:`read_sql`, :func:`read_sql_query` and :func:`read_sql_table` (:issue:`GH35185`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this to whatsnew/v1.3.0.rst?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"DateCol": {"errors": error},
},
)
assert issubclass(df.DateCol.dtype.type, np.datetime64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you test the full expected dataframe with tm.assert_frame_equal?

Copy link
Contributor Author

@avinashpancham avinashpancham Dec 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Np to do this, however all the other tests that use 'types_test_data' do not use tm.assert_frame_equal, but the assert approach I used. So just doublechecking if we want to go ahead with this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert_frame_equal is preferable in general since it tests the asserts you have and more

Copy link
Contributor Author

@avinashpancham avinashpancham Dec 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the test, but sometimes the assert fails since the machine casts the integer that is read to int32. Giving the error:

E           Attribute "dtype" are different
E           [left]:  int64
E           [right]: int32

I've been trying some stuff, but without success. Is there a way to let pandas decide which int type to use, so that we don't have this mismatch? Else we can provide the argument check_dtype=False to assert_frame_equal, but IMO that seems a bit too much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, it looks like I got it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jreback jreback added this to the 1.3 milestone Dec 13, 2020
@jreback
Copy link
Contributor

jreback commented Dec 13, 2020

@avinashpancham ok this is good. Can you do a followon PR (or open an issue to do so), that adds examples to the read_sql doc-string. e.g. using a passed dict like this (but since there are NO examples would be good to have some general examples using the various options).

@avinashpancham
Copy link
Contributor Author

Sure, will create a follow on PR to add some examples in the docstring.

@mroeschke mroeschke merged commit 7ffbf1a into pandas-dev:master Dec 14, 2020
@mroeschke
Copy link
Member

Thanks @avinashpancham

luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021
…ke functions (GH35185) (pandas-dev#37823)

* BUG: Allow custom error values in parse_dates argument of read_sql (GH35185)

* Ignore call overload for one time exception

* Add tests for custom dateparsing error for read_sql

* Generalize test for all sql read functions

* Add conditional mode for tests

* Typo

* Updated test

* Update to_datetime call in _handle_date_column

* Move whatsnew message to v1.3.0

* Update test

* Explicit cast to int64

* Remove accidental check_dtype=False

* Fix wrong reference in whatsnew

* Add hyphen

Co-authored-by: Jeff Reback <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: read_sql parse_dates does not allow for "errors" arg
3 participants