BUG: Parse two date columns broken in read_csv with multiple headers #15378

stephenrauch · 2017-02-12T17:19:19Z

Fix for GH15376

In io/parsers/_try_convert_dates(), when selecting columns based on a
column index from a set of columns with multi-level names, the column
name was converted to a string. This appears to be a bug, since the
name was a tuple before the conversion. It causes problems downstream
when this name is used to look up a column: the lookup fails because
the desired column is keyed by the tuple, not by its string
representation.
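
A minimal sketch of the failure mode described above, using a plain dict in place of the parser's internal column store. The names `columns`, `names`, and `select_by_index` are hypothetical and are not taken from pandas; the point is only that `str()` applied to a tuple key yields a different key.

```python
# Columns read with a two-row header are keyed by tuples, one entry per level.
columns = {
    ("date", "d"): ["2001-01-05", "2001-01-06"],
    ("time", "t"): ["00:00:00", "00:00:00"],
}
names = list(columns)


def select_by_index(index, stringify):
    """Resolve a positional column index to a key, optionally via str()."""
    name = names[index]
    return str(name) if stringify else name


buggy_key = select_by_index(0, stringify=True)   # "('date', 'd')"
fixed_key = select_by_index(0, stringify=False)  # ('date', 'd')

print(buggy_key in columns)  # False: the string form is not a key
print(fixed_key in columns)  # True: the tuple itself is the key
```

With single-level headers the names are already strings, so the conversion is harmless there, which is presumably why the problem only shows up with multiple header rows.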

closes BUG: Parse two date columns broken in read_csv with multiple headers #15376
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

jreback · 2017-02-12T17:34:52Z

pandas/tests/io/test_date_converters.py

+
+    def test_parse_date_time_multi_level_column_name(self):
+        # GH 15376
+        result = conv.parse_date_time(self.dates, self.times)


not sure what these 2 lines are doing, remove.

jreback · 2017-02-12T17:35:08Z

pandas/tests/io/test_date_converters.py

+2001-01-05, 00:00:00, 1., 11.
+"""
+        datecols = {'date_time': [0, 1]}
+        df = read_table(StringIO(data), sep=',', header=[0, 1],


use self.read_csv, this tests on all parsers (c/python)
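
As a rough illustration of why running a test under every parser matters, the sketch below reads the same two-row-header input with both engines through the public `engine` argument of `pd.read_csv` and checks that the frames agree. The sample data is made up, and the loop only approximates what the test suite's `self.read_csv` helper does; the helper's actual implementation is not shown in this thread.

```python
from io import StringIO

import pandas as pd

# Made-up sample with a two-row header (assumption, not from the PR).
data = """a,b,c
x,y,z
1,2,3
4,5,6
"""

# Read the same text once per parser engine.
frames = {
    engine: pd.read_csv(StringIO(data), header=[0, 1], engine=engine)
    for engine in ("c", "python")
}

# Both engines should produce the same frame, including the tuple column keys.
pd.testing.assert_frame_equal(frames["c"], frames["python"])
print(frames["c"].columns.tolist())
```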

jreback · 2017-02-12T17:35:20Z

pandas/tests/io/test_date_converters.py

+        datecols = {'date_time': [0, 1]}
+        df = read_table(StringIO(data), sep=',', header=[0, 1],
+                        parse_dates=datecols, date_parser=conv.parse_date_time)
+        self.assertIn('date_time', df)


construct an expected frame, and use assert_frame_equal
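
A sketch of the pattern being asked for, under some assumptions: the data is invented, the date column is left as plain strings so the example does not depend on any date-parsing options, and `pd.testing.assert_frame_equal` stands in for whichever testing helper the suite imports. Compared with `assertIn`, this checks the column keys, dtypes, and values in one comparison.

```python
from io import StringIO

import pandas as pd

# Invented two-row-header input (assumption, not the PR's test data).
data = """date,value
d,v
2001-01-05,1.0
2001-01-06,2.0
"""

result = pd.read_csv(StringIO(data), header=[0, 1])

# Tuple keys in the dict produce MultiIndex columns matching the two header rows.
expected = pd.DataFrame(
    {
        ("date", "d"): ["2001-01-05", "2001-01-06"],
        ("value", "v"): [1.0, 2.0],
    }
)

# Raises AssertionError with a diff if columns, dtypes, or values differ.
pd.testing.assert_frame_equal(result, expected)
```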

jreback · 2017-02-12T17:36:28Z

doc/source/whatsnew/v0.20.0.txt

@@ -580,3 +580,4 @@ Bug Fixes
 - Bug in ``Series.replace`` and ``DataFrame.replace`` which failed on empty replacement dicts (:issue:`15289`)
 - Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`)
 - Bug in ``.eval()`` which caused multiline evals to fail with local variables not on the first line (:issue:`15342`)
+- Bug in ``.read_csv()`` which caused ``parse_dates={'datetime': [0, 1]}`` to fail with multiline headers (:issue:`15376`)


don't put this as the last line, instead use an empty space, otherwise you will get conflicts.

Bug in .read_csv() where parse_dates with a list-of-integers specified would fail with multiline headers

codecov-io · 2017-02-12T19:24:13Z

Codecov Report

Merging #15378 into master will decrease coverage by -0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15378      +/-   ##
==========================================
- Coverage   90.37%   90.37%   -0.01%     
==========================================
  Files         135      135              
  Lines       49440    49454      +14     
==========================================
+ Hits        44681    44693      +12     
- Misses       4759     4761       +2

Impacted Files	Coverage Δ
pandas/io/parsers.py	`95.51% <100%> (ø)`	✅
pandas/core/common.py	`91.02% <ø> (-0.34%)`	❌
pandas/core/frame.py	`97.82% <ø> (-0.05%)`	❌
pandas/tools/concat.py	`97.62% <ø> (ø)`	✅
pandas/core/generic.py	`96.33% <ø> (ø)`	✅
pandas/io/excel.py	`79.64% <ø> (+0.24%)`	✅

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

jreback · 2017-02-16T17:43:48Z

pandas/tests/io/test_date_converters.py

+2001-01-06, 00:00:00, 1.0, 11.
+"""
+        datecols = {'date_time': [0, 1]}
+        result = read_csv(StringIO(data), sep=',', header=[0, 1],


should be self.read_csv, but I can fix on the merge

Thanks. A few more of these and hopefully I'll get it.

haha np. parser tests are a little tricky to understand because of this actually.

jreback · 2017-02-16T17:44:16Z

ok ping on green.

jreback · 2017-02-23T13:27:03Z

can you update

stephenrauch · 2017-02-27T05:59:14Z

@jreback, You asked for update 4 days back, but I thought this was OK. If you still need something, please let me know what.

jreback · 2017-02-27T14:47:27Z

closed via: fb7dc7d

thanks @stephenrauch

this test was in the wrong place (I had made a comment above, but not sure if you saw it).

In fact I think all of the pandas/tests/io/test_date_converters are in the wrong place and should simply be in pandas/tests/io/parsers/parse_dates.py (or equiv), so that they run under each parser. My guess is that this is an older file.

I'll create an issue about this.

In `io/parsers/_try_convert_dates()` when selecting columns based on a column index from a set of columns with multi-level names, the column `name` was converted to a string. This appears to be a bug since the `name` was a tuple before the conversion. This causes problems downstream when there is an attempt to use this name to lookup a column, and that lookup fails because the desired column is keyed from the tuple, not its string representation

closes pandas-dev#15376

Author: Stephen Rauch <[email protected]>

Closes pandas-dev#15378 from stephenrauch/fix_read_csv_merge_datetime and squashes the following commits:

030f5ec [Stephen Rauch] BUG: Parse two date columns broken in read_csv with multiple headers

jreback requested changes Feb 12, 2017

jreback added Bug IO CSV read_csv, to_csv labels Feb 12, 2017

stephenrauch force-pushed the fix_read_csv_merge_datetime branch from 3ed8551 to 030f5ec Compare February 16, 2017 16:16

jreback reviewed Feb 16, 2017

jreback approved these changes Feb 16, 2017

jreback added this to the 0.20.0 milestone Feb 16, 2017

jreback closed this in fb7dc7d Feb 27, 2017

jreback mentioned this pull request Feb 27, 2017

TST: move pandas/tests/io/test_date_converters.py to pandas/tests/io/parsers/parse_dates.py #15519

Closed
