Skip to content

BUG: syntax error in hdf query with ts #15544

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

chris-b1
Copy link
Contributor

@chris-b1 chris-b1 commented Mar 2, 2017

@@ -188,10 +188,6 @@ def stringify(value):
if v.tz is not None:
v = v.tz_convert('UTC')
return TermValue(v, v.value, kind)
elif (isinstance(v, datetime) or hasattr(v, 'timetuple') or
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback - any idea what this was supposed to be doing? This doesn't impact any tested behavior and not sure it ever would have worked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was some really old compat IIRC. (or maybe just copied it from somewhere :>)

improving test coverage by deletions!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that isn't hit at all in the tests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is another reference to timetuple in that same file, you can prob take that out too.

@jreback jreback added Error Reporting Incorrect or improved errors from pandas IO HDF5 read_hdf, HDFStore labels Mar 2, 2017
@jreback jreback added this to the 0.20.0 milestone Mar 2, 2017
@jreback
Copy link
Contributor

jreback commented Mar 2, 2017

lgtm. merge on green (travis is very slow atm though)

with ensure_clean_store(self.path) as store:
store.append('test', df, format='table', data_columns=True)

result = store.select('test', where='date > ts')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, though on 2nd thought. shouldn't this raise? e.g. we are passing a string column and a datelike. (and we know the dtypes here). A datetimelike cannot be compared against anything but a datetime column by-definition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. Right now we are really liberal in terms of comparison to string columns - (maybe due to pytables?) - basically anything that has a repr - none of the following raise either.

pd.read_hdf('store.h5', 'df', where='str_col > 2.2')
pd.read_hdf('store.h5', 'df', where='str_col > True')

class O:
    pass
o = O()
pd.read_hdf('store.h5', 'df', where='str_col > o')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be fine with all of those raising.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I would be fine with being more strict. At the very least it would raise rather than return an empty result. Prob need a few test cases......

@codecov-io
Copy link

codecov-io commented Mar 3, 2017

Codecov Report

Merging #15544 into master will decrease coverage by -0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15544      +/-   ##
==========================================
- Coverage   91.06%   91.04%   -0.03%     
==========================================
  Files         136      136              
  Lines       49102    49165      +63     
==========================================
+ Hits        44717    44764      +47     
- Misses       4385     4401      +16
Impacted Files Coverage Δ
pandas/computation/pytables.py 91.59% <100%> (+0.98%)
pandas/io/gbq.py 25% <0%> (-61.67%)
pandas/util/decorators.py 62.26% <0%> (-6.97%)
pandas/tools/merge.py 91.78% <0%> (-0.35%)
pandas/indexes/frozen.py 93.24% <0%> (-0.31%)
pandas/indexes/base.py 96.17% <0%> (-0.06%)
pandas/core/series.py 94.89% <0%> (-0.05%)
pandas/tseries/resample.py 94.46% <0%> (-0.01%)
pandas/core/groupby.py 94.99% <0%> (-0.01%)
pandas/core/frame.py 97.83% <0%> (-0.01%)
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1c106c8...8288dca. Read the comment docs.

df = pd.DataFrame({'unparsed_date': ['2014-01-01', '2014-01-01']})
df.to_hdf('store.h5', 'key', format='table', data_columns=True)
ts = pd.Timestamp('2014-01-01')
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of putting the try/except. Just show the actual traceback (well just the exception part), e.g.

I would show like this

In [2]: 1/0
ZeroDivisionError: division by zero

for a code block I literally execute the code and copy-paste (as it is not executed), just formatted.

expected = df.loc[[1], :]
tm.assert_frame_equal(expected, result)

for v in [2.1, True, pd.Timestamp('2014-01-01')]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also test int/float vs strings column (might work though.....)
Timedelta

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, convertible strings will work, added some more test cases

@jreback
Copy link
Contributor

jreback commented Mar 3, 2017

thanks!

@jreback jreback closed this in 04e1168 Mar 3, 2017
@jreback
Copy link
Contributor

jreback commented Mar 10, 2017

@chris-b1 so I narrowed down what this commit fixes (in the docs): 0fd4499

though I still can't repo locally, I think this might be an encoding of the input file (as ascii) and not utf8.

A bare string is interpreted as ascii codes (e.g. 'foo' -> 102......)

so quoting is important. Ideally we could give a better message (though as I said I can't repro locally, sphinx must be doing something).

you can try using an encoding for the file maybe.

of course only if you have a chance to look at this :>

@jreback jreback mentioned this pull request Mar 10, 2017
@chris-b1
Copy link
Contributor Author

Yeah, I'll take a closer look later today. Just making a note for myself, the other difference from the cases tested in this PR is that the doc example contains missing data.

https://github.com/jreback/pandas/blob/6afb3944017533c3dc6df70a144df48f85e1c1f8/doc/source/whatsnew/v0.10.1.txt#L54

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
closes pandas-dev#15492

Author: Chris <[email protected]>

Closes pandas-dev#15544 from chris-b1/hdf-dt-error and squashes the following commits:

8288dca [Chris] lint
7c7100d [Chris] expand test cases
946a48e [Chris] ERR: more strict HDFStore string comparison
213585f [Chris] CLN: remove timetuple type check
cc977f0 [Chris] BUG: syntax error in hdf query with ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas IO HDF5 read_hdf, HDFStore
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ERR: better message for HDFStore date query with non date columns
3 participants