BUG: Fix using dtype with parse_dates in read_csv #34330

mproszewska · 2020-05-23T01:45:15Z

closes BUG: ValueError in read_csv when dtype='string' and parse_dates is present #34066
tests added/passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

dsaxton · 2020-05-25T14:25:32Z

pandas/tests/io/parser/test_common.py

+    )
+    expected = expected.astype({"b": np.datetime64})
+    df = parser.read_csv(StringIO(data), dtype="string", parse_dates=["b"])
+    tm.assert_frame_equal(df, expected)


Can you call this result instead of df? Also instead of astyping expected you should be able to set the types in the constructor.

Only one dtype is allowed in the constructor, so I'm not sure how I can set it there.
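
One way to follow the suggestion above, as a minimal sketch: the DataFrame constructor takes a single dtype argument, but a dict of already-typed arrays lets each column carry its own dtype. The column names and values mirror the test in this thread; building the frame from a dict is an assumption made for illustration, not necessarily what the PR ended up doing.

```python
import pandas as pd

# Each value in the dict already has its own dtype, so no astype call is needed.
expected = pd.DataFrame(
    {
        "a": pd.array(["1"], dtype="string"),
        "b": pd.to_datetime(["2020-05-23 01:00:00"]),
    }
)
```

With this form the expected frame can be compared directly against the parser result using tm.assert_frame_equal.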

jreback

cc @gfyoung

jreback · 2020-05-25T16:35:44Z

pandas/tests/io/parser/test_common.py

+    expected = DataFrame(
+        [["1", "2020-05-23 01:00:00"]], columns=["a", "b"], dtype="string"
+    )
+    expected = expected.astype({"b": np.datetime64})


It is very strange to cast with astype this way; please use pd.to_datetime instead.

@dsaxton's suggestion above to declare types in the constructor would remove the need to do any type-casting.
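
A sketch of the pd.to_datetime alternative, assuming the frame is first built with a single string dtype as in the diff above. Only column "b" is converted; column "a" keeps its string dtype.

```python
import pandas as pd

expected = pd.DataFrame(
    [["1", "2020-05-23 01:00:00"]], columns=["a", "b"], dtype="string"
)
# Convert only the date column instead of calling astype with np.datetime64.
expected["b"] = pd.to_datetime(expected["b"])
```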

jreback · 2020-05-25T16:41:39Z

pandas/io/parsers.py

@@ -1708,7 +1708,9 @@ def _convert_to_ndarrays(
        result = {}
        for c, values in dct.items():
            conv_f = None if converters is None else converters.get(c, None)
-            if isinstance(dtypes, dict):
+            if values.dtype != object:


This is a very odd thing to do, since we already have a path for a single dtype. What are you trying to do here?

After loading values from the CSV, we have a dictionary mapping each column name to a NumPy array with dtype=object. We then convert the values that are supposed to be datetimes ('b' in the example). After that, we want to change the types of the remaining columns, that is, those that still have dtype=object. This line skips columns whose dtype has already been set.

Use is_object_dtype(values.dtype) to do the check
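
A rough sketch of the check being discussed, using is_object_dtype from pandas.api.types. The helper name cast_remaining_object_columns and the sample data are hypothetical and only illustrate the idea; this is not the code in pandas/io/parsers.py.

```python
import numpy as np
from pandas.api.types import is_object_dtype


def cast_remaining_object_columns(columns, dtype):
    """Cast only the columns that are still object dtype (hypothetical helper)."""
    result = {}
    for name, values in columns.items():
        if is_object_dtype(values.dtype):
            # Not converted yet (for example, not a parsed date column).
            result[name] = values.astype(dtype)
        else:
            # Already converted earlier, so leave it untouched.
            result[name] = values
    return result


columns = {
    "a": np.array(["1", "2"], dtype=object),
    "b": np.array(
        ["2020-05-23T01:00:00", "2020-05-24T01:00:00"], dtype="datetime64[ns]"
    ),
}
converted = cast_remaining_object_columns(columns, np.int64)
```

Here "b" stands in for a column that parse_dates has already converted, so only "a" is cast.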

jreback · 2020-05-25T16:42:14Z

pandas/io/parsers.py

@@ -3264,6 +3266,9 @@ def _make_date_converter(
 ):
    def converter(*date_cols):
        if date_parser is None:
+            date_cols = tuple(
+                x if isinstance(x, np.ndarray) else x.to_numpy() for x in date_cols
+            )


This would be better done inside concat_date_cols, but what is the incoming data here in the example?

It's a tuple containing a StringArray with the dates from 'b'.

Is it possible to move this to concat_date_cols, as per @jreback's comment?
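
To make the incoming data concrete, here is a small sketch of the normalization step in the diff above, written as a standalone function. The name to_numpy_columns is hypothetical, and passing dtype=object to to_numpy is an assumption (the diff calls to_numpy() with no arguments); where the conversion should live (the converter or concat_date_cols) is exactly the open question in this thread.

```python
import numpy as np
import pandas as pd


def to_numpy_columns(date_cols):
    """Return a tuple in which every column is a NumPy array (hypothetical helper)."""
    return tuple(
        col if isinstance(col, np.ndarray) else col.to_numpy(dtype=object)
        for col in date_cols
    )


# A StringArray, as produced when dtype="string" is applied before date parsing.
string_col = pd.array(["2020-05-23 01:00:00"], dtype="string")
# A plain object ndarray, the input the date converter originally expected.
object_col = np.array(["2020-05-24 02:00:00"], dtype=object)

normalized = to_numpy_columns((string_col, object_col))
```

Arrays that are already ndarrays pass through unchanged, so the existing object-dtype path is unaffected.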

gfyoung · 2020-05-25T19:52:13Z

Comments from @jreback and @dsaxton notwithstanding, the changes look good!

dsaxton · 2020-09-16T13:55:16Z

@mproszewska Are you still interested in working on this? If so can you merge master?

dsaxton · 2020-10-08T00:27:10Z

Closing as stale. @mproszewska, let us know if you'd like to reopen.

mproszewska · 2020-10-08T19:06:33Z

I can go back to working on this. Could you reopen?

gfyoung · 2020-10-08T20:01:32Z

@mproszewska : Done

pep8speaks · 2020-10-08T20:27:50Z

Hello @mproszewska! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-10-08 20:36:43 UTC

arw2019

Some small comments from me. Also @mproszewska can you merge master once more?

cc @dsaxton @gfyoung @jreback

arw2019 · 2020-11-06T19:58:35Z

pandas/io/parsers.py

@@ -1708,7 +1708,9 @@ def _convert_to_ndarrays(
        result = {}
        for c, values in dct.items():
            conv_f = None if converters is None else converters.get(c, None)
-            if isinstance(dtypes, dict):
+            if values.dtype != object:


Use is_object_dtype(values.dtype) to do the check

arw2019 · 2020-11-06T19:59:22Z

pandas/io/parsers.py

@@ -3264,6 +3266,9 @@ def _make_date_converter(
 ):
    def converter(*date_cols):
        if date_parser is None:
+            date_cols = tuple(
+                x if isinstance(x, np.ndarray) else x.to_numpy() for x in date_cols
+            )


Is it possible to move this to concat_date_cols, as per @jreback's comment?

jreback · 2020-12-29T20:42:50Z

Seems not unreasonable, but this PR is stale.

mproszewska added 9 commits May 15, 2020 17:38

PERF: Remove unnecessary copies in sorting functions

c94b45e

Run tests

0ab450b

Run tests

54c7304

Add asv

6d72a34

Run black

5ba54a6

Remove asv

2766270

BUG: Fix using dtype with parse_dates in read_csv

b800207

Fix lint

6b8f562

Merge branch 'perf'

91176ca

dsaxton reviewed May 25, 2020

jreback requested changes May 25, 2020

gfyoung added the Bug, IO CSV (read_csv, to_csv), and Datetime (Datetime data dtype) labels May 25, 2020

mproszewska added 6 commits May 28, 2020 17:20

Merge remote-tracking branch 'upstream/master'

f748b78

Modify test

3c13f59

Merge remote-tracking branch 'upstream/master'

c04c494

Add asv

d9aa319

Fix

0afb1b1

Run black

85dd0d6

dsaxton added the Stale label Sep 16, 2020

dsaxton mentioned this pull request Sep 16, 2020

CI: Add stale PR action #36336

Merged

dsaxton closed this Oct 8, 2020

Merge branch 'master' into parse-dates-dtype

4b53bf7

gfyoung reopened this Oct 8, 2020

Merge branch 'master' into parse-dates-dtype

b3d5ca5

mproszewska added 2 commits October 8, 2020 22:29

Run black

4175b80

Run isort

6a78220

arw2019 reviewed Nov 6, 2020

arw2019 added Needs Review and removed Stale labels Nov 6, 2020

jreback closed this Dec 29, 2020
