BUG: Parse two date columns broken in read_csv with multiple headers

stephenrauch · jreback · commit fb7dc7dcbde1 · 2017-02-27T09:44:49.000-05:00
In `io/parsers/_try_convert_dates()`, when selecting columns by column index from a set of columns with multi-level names, the column `name` was converted to a string. This appears to be a bug, since the `name` was a tuple before the conversion. It causes problems downstream when the name is used to look up a column: the lookup fails because the desired column is keyed by the tuple, not by its string representation.

closes #15376

Author: Stephen Rauch <stephen.rauch+github@gmail.com>

Closes #15378 from stephenrauch/fix_read_csv_merge_datetime and squashes the following commits:

030f5ec [Stephen Rauch] BUG: Parse two date columns broken in read_csv with multiple headers
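The failure mode described in the commit message can be illustrated without pandas. The following is a minimal sketch, not the parser's actual code: `resolve_colnames` and its `stringify` flag are hypothetical names that mirror the loop touched by this change, and `data_dict` is a plain dict standing in for the parser's column storage, keyed by tuples as it would be with `header=[0, 1]`.

```python
# Hypothetical stand-ins for the parser state with a two-row header.
columns = [('D', 'date'), ('T', 'time'), ('A', 'a'), ('B', 'b')]
data_dict = {col: [] for col in columns}  # columns are keyed by tuple


def resolve_colnames(colspec, columns, stringify):
    """Map a parse_dates spec (names or integer positions) to column names.

    stringify=True imitates the old behaviour (str(columns[c]));
    stringify=False imitates the fixed behaviour (columns[c]).
    """
    colset = set(columns)
    colnames = []
    for c in colspec:
        if c in colset:
            colnames.append(c)
        elif isinstance(c, int) and c not in columns:
            name = columns[c]
            colnames.append(str(name) if stringify else name)
        else:
            colnames.append(c)
    return colnames


old_names = resolve_colnames([0, 1], columns, stringify=True)
new_names = resolve_colnames([0, 1], columns, stringify=False)

# Old behaviour: the names are strings such as "('D', 'date')",
# which are not keys of data_dict, so a later lookup fails.
print([name in data_dict for name in old_names])

# Fixed behaviour: the names stay tuples and match the dict keys.
print([name in data_dict for name in new_names])
```

With single-level headers the column names are usually already strings, so `str()` left them unchanged, which may explain why the problem only showed up with multiple header rows.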
diff --git a/doc/source/whatsnew/v0.20.0.txt b/doc/source/whatsnew/v0.20.0.txt
@@ -625,6 +625,7 @@ Bug Fixes
 
 
 
+- Bug in ``.read_csv()`` with ``parse_dates`` when multiline headers are specified (:issue:`15376`)
 
 
 - Bug in ``DataFrame.boxplot`` where ``fontsize`` was not applied to the tick labels on both axes (:issue:`15108`)
diff --git a/pandas/io/parsers.py b/pandas/io/parsers.py
@@ -2858,7 +2858,7 @@ def _try_convert_dates(parser, colspec, data_dict, columns):
         if c in colset:
             colnames.append(c)
         elif isinstance(c, int) and c not in columns:
-            colnames.append(str(columns[c]))
+            colnames.append(columns[c])
         else:
             colnames.append(c)
 
diff --git a/pandas/tests/io/parser/parse_dates.py b/pandas/tests/io/parser/parse_dates.py
@@ -18,6 +18,7 @@
 import pandas.tseries.tools as tools
 import pandas.util.testing as tm
 
+import pandas.io.date_converters as conv
 from pandas import DataFrame, Series, Index, DatetimeIndex
 from pandas import compat
 from pandas.compat import parse_date, StringIO, lrange
@@ -491,3 +492,21 @@ def test_parse_dates_noconvert_thousands(self):
         result = self.read_csv(StringIO(data), index_col=[0, 1],
                                parse_dates=True, thousands='.')
         tm.assert_frame_equal(result, expected)
+
+    def test_parse_date_time_multi_level_column_name(self):
+        data = """\
+D,T,A,B
+date, time,a,b
+2001-01-05, 09:00:00, 0.0, 10.
+2001-01-06, 00:00:00, 1.0, 11.
+"""
+        datecols = {'date_time': [0, 1]}
+        result = self.read_csv(StringIO(data), sep=',', header=[0, 1],
+                               parse_dates=datecols,
+                               date_parser=conv.parse_date_time)
+
+        expected_data = [[datetime(2001, 1, 5, 9, 0, 0), 0., 10.],
+                         [datetime(2001, 1, 6, 0, 0, 0), 1., 11.]]
+        expected = DataFrame(expected_data,
+                             columns=['date_time', ('A', 'a'), ('B', 'b')])
+        tm.assert_frame_equal(result, expected)
