-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: syntax error in hdf query with ts #15544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -188,10 +188,6 @@ def stringify(value): | |||
if v.tz is not None: | |||
v = v.tz_convert('UTC') | |||
return TermValue(v, v.value, kind) | |||
elif (isinstance(v, datetime) or hasattr(v, 'timetuple') or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback - any idea what this was supposed to be doing? This doesn't impact any tested behavior and not sure it ever would have worked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was some really old compat IIRC. (or maybe just copied it from somewhere :>)
improving test coverage by deletions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that isn't hit at all in the tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is another reference to timetuple
in that same file, you can prob take that out too.
lgtm. merge on green (travis is very slow atm though) |
pandas/tests/io/test_pytables.py
Outdated
with ensure_clean_store(self.path) as store: | ||
store.append('test', df, format='table', data_columns=True) | ||
|
||
result = store.select('test', where='date > ts') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, though on 2nd thought. shouldn't this raise? e.g. we are passing a string column and a datelike. (and we know the dtypes here). A datetimelike cannot be compared against anything but a datetime column by-definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. Right now we are really liberal in terms of comparison to string columns - (maybe due to pytables?) - basically anything that has a repr - none of the following raise either.
pd.read_hdf('store.h5', 'df', where='str_col > 2.2')
pd.read_hdf('store.h5', 'df', where='str_col > True')
class O:
pass
o = O()
pd.read_hdf('store.h5', 'df', where='str_col > o')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be fine with all of those raising.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I would be fine with being more strict. At the very least it would raise rather than return an empty result. Prob need a few test cases......
Codecov Report
@@ Coverage Diff @@
## master #15544 +/- ##
==========================================
- Coverage 91.06% 91.04% -0.03%
==========================================
Files 136 136
Lines 49102 49165 +63
==========================================
+ Hits 44717 44764 +47
- Misses 4385 4401 +16
Continue to review full report at Codecov.
|
doc/source/whatsnew/v0.20.0.txt
Outdated
df = pd.DataFrame({'unparsed_date': ['2014-01-01', '2014-01-01']}) | ||
df.to_hdf('store.h5', 'key', format='table', data_columns=True) | ||
ts = pd.Timestamp('2014-01-01') | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of putting the try/except. Just show the actual traceback (well just the exception part), e.g.
I would show like this
In [2]: 1/0
ZeroDivisionError: division by zero
for a code block I literally execute the code and copy-paste (as it is not executed), just formatted.
pandas/tests/io/test_pytables.py
Outdated
expected = df.loc[[1], :] | ||
tm.assert_frame_equal(expected, result) | ||
|
||
for v in [2.1, True, pd.Timestamp('2014-01-01')]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also test int/float vs strings column (might work though.....)
Timedelta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, convertible strings will work, added some more test cases
thanks! |
@chris-b1 so I narrowed down what this commit fixes (in the docs): 0fd4499 though I still can't repo locally, I think this might be an encoding of the input file (as ascii) and not utf8. A bare string is interpreted as ascii codes (e.g. 'foo' -> 102......) so quoting is important. Ideally we could give a better message (though as I said I can't repro locally, sphinx must be doing something). you can try using an encoding for the file maybe. of course only if you have a chance to look at this :> |
Yeah, I'll take a closer look later today. Just making a note for myself, the other difference from the cases tested in this PR is that the doc example contains missing data. |
closes pandas-dev#15492 Author: Chris <[email protected]> Closes pandas-dev#15544 from chris-b1/hdf-dt-error and squashes the following commits: 8288dca [Chris] lint 7c7100d [Chris] expand test cases 946a48e [Chris] ERR: more strict HDFStore string comparison 213585f [Chris] CLN: remove timetuple type check cc977f0 [Chris] BUG: syntax error in hdf query with ts
git diff upstream/master | flake8 --diff