BUG: Parse two date columns broken in read_csv with multiple headers #15378
def test_parse_date_time_multi_level_column_name(self):
    # GH 15376
    result = conv.parse_date_time(self.dates, self.times)
not sure what these 2 lines are doing, remove.
2001-01-05, 00:00:00, 1., 11.
"""
datecols = {'date_time': [0, 1]}
df = read_table(StringIO(data), sep=',', header=[0, 1],
use self.read_csv, this tests on all parsers (c/python)
datecols = {'date_time': [0, 1]}
df = read_table(StringIO(data), sep=',', header=[0, 1],
                parse_dates=datecols, date_parser=conv.parse_date_time)
self.assertIn('date_time', df)
construct an expected frame, and use assert_frame_equal
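To illustrate why a full frame comparison is stricter than a membership check, here is a minimal sketch. The frames and the `value` column are made up for illustration and are not the test data from this PR; the sketch assumes a pandas version that exposes `pandas.testing.assert_frame_equal`.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical expected frame: 'date_time' holds real datetimes.
expected = pd.DataFrame({
    "date_time": pd.to_datetime(["2001-01-05", "2001-01-06"]),
    "value": [1.0, 2.0],
})

# Hypothetical faulty result: the column exists, but was never parsed.
wrong = pd.DataFrame({
    "date_time": ["2001-01-05", "2001-01-06"],
    "value": [1.0, 2.0],
})

# A membership check only looks at the column label, so it passes.
membership_passes = "date_time" in wrong

# A frame comparison also checks dtypes and values, so it fails here.
try:
    assert_frame_equal(wrong, expected)
    strict_passes = True
except AssertionError:
    strict_passes = False
```

The membership check cannot tell the two frames apart, while the frame comparison rejects the unparsed column.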
doc/source/whatsnew/v0.20.0.txt
@@ -580,3 +580,4 @@ Bug Fixes
- Bug in ``Series.replace`` and ``DataFrame.replace`` which failed on empty replacement dicts (:issue:`15289`)
- Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`)
- Bug in ``.eval()`` which caused multiline evals to fail with local variables not on the first line (:issue:`15342`)
- Bug in ``.read_csv()`` which caused ``parse_dates={'datetime': [0, 1]}`` to fail with multiline headers (:issue:`15376`)
don't put this as the last line, instead use an empty space, otherwise you will get conflicts.
Bug in .read_csv() where parse_dates with a list-of-integers specified would fail with multiline headers
Codecov Report
@@            Coverage Diff             @@
##           master   #15378     +/-   ##
==========================================
- Coverage   90.37%   90.37%   -0.01%
==========================================
  Files         135      135
  Lines       49440    49454      +14
==========================================
+ Hits        44681    44693      +12
- Misses       4759     4761       +2
Fix for GH15376
In `io/parsers/_try_convert_dates()`, when selecting columns based on a column index from a set of columns with multi-level names, the column `name` was converted to a string. This appears to be a bug, since the `name` was a tuple before the conversion. This causes problems downstream when there is an attempt to use this name to look up a column: the lookup fails because the desired column is keyed by the tuple, not by its string representation.
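The failure mode described above can be sketched in plain Python. This is not the pandas implementation: `columns`, `data`, `resolve_buggy`, and `resolve_fixed` are hypothetical names that only mimic the described behavior of turning an integer column position into a lookup key.

```python
# With a two-level header, each column label is a tuple.
columns = [("a", "q"), ("b", "r"), ("c", "s")]

# Column data keyed by those tuple labels (made-up values).
data = {
    ("a", "q"): ["2001-01-05", "2001-01-06"],
    ("b", "r"): ["00:00:00", "00:00:00"],
    ("c", "s"): [1.0, 1.0],
}


def resolve_buggy(colspec):
    """Map an integer position to a label, then stringify it (the bug)."""
    if isinstance(colspec, int):
        return str(columns[colspec])
    return colspec


def resolve_fixed(colspec):
    """Map an integer position to a label and keep the tuple intact."""
    if isinstance(colspec, int):
        return columns[colspec]
    return colspec


buggy_name = resolve_buggy(0)  # the string "('a', 'q')"
fixed_name = resolve_fixed(0)  # the tuple ('a', 'q')

# Only the tuple matches a key in the column mapping.
buggy_found = buggy_name in data
fixed_found = fixed_name in data
```

The stringified label looks the same when printed, but it is a different key, so the later lookup misses the column.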
3ed8551 to 030f5ec
2001-01-06, 00:00:00, 1.0, 11.
"""
datecols = {'date_time': [0, 1]}
result = read_csv(StringIO(data), sep=',', header=[0, 1],
should be self.read_csv, but I can fix on the merge
Thanks. A few more of these and hopefully I'll get it.
haha np. parser tests are a little tricky to understand because of this actually.
ok ping on green.
can you update
@jreback, you asked for an update 4 days back, but I thought this was OK. If you still need something, please let me know what.
closed via: fb7dc7d
thanks @stephenrauch. This test was in the wrong place (I had made a comment above, but not sure if you saw it). In fact I think all of the pandas/tests/io/test_date_converters are in the wrong place and should simply be in I'll create an issue about this.
In `io/parsers/_try_convert_dates()` when selecting columns based on a column index from a set of columns with multi-level names, the column `name` was converted to a string. This appears to be a bug since the `name` was a tuple before the conversion. This causes problems downstream when there is an attempt to use this name to lookup a column, and that lookup fails because the desired column is keyed from the tuple, not its string representation

closes pandas-dev#15376

Author: Stephen Rauch <[email protected]>

Closes pandas-dev#15378 from stephenrauch/fix_read_csv_merge_datetime and squashes the following commits:

030f5ec [Stephen Rauch] BUG: Parse two date columns broken in read_csv with multiple headers
git diff upstream/master | flake8 --diff