-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Treat 'Inf' as infinity in text parser #4220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
was using strcasecmp slower? (prob not a big deal either way) |
No. I'm doing that instead. Just found a few other things that needed to be fixed (plus means test cases need to be larger). Is |
some of python parser functions are shared but not sure (your tests are run thru both parsers though so u will know if they work or not) |
Okay, I guess I thought |
And here's a perf test between various options: http://nbviewer.ipython.org/5980552 |
@jtratner you are very thorough! and you said u were busy this week :) |
haha ... didn't take long to do so I figured I'd crank it out. This means that any text that converts to 'inf' when lower cased is considered infinity [though only when attempting to convert column to float/double]. Is that what we want to do? (e.g., "-iNF", "inf", "inF" all would become versions of +/- infinity) |
yes I think any of those variations are fine it might be easier/faster to make a hash table of all of the combinations and just do a lookup |
hmm...I'll think about it. Could probably just reuse the function that creates the hashtable for na_values then. But given that right now it's a 25ish character change in the actual code ( |
yep looks good to go |
If you want doc update on this, I can add Monday. On Thu, Jul 11, 2013 at 10:13 PM, jreback [email protected] wrote:
|
hold off on release notes till after 0.12...have to add the new file |
Will do. |
Btw - when are you pulling the trigger on 0.12? :) |
just a couple minor issues with 0.12rc1...then should be out |
Instead of just 'inf' or '-inf', can test for 'Inf', '-Inf', 'INF', etc. Uses strcasecmp under the hood. (also, small fix to assert_almost_equal to make string comparisons clearer)
@jreback This should be ready to go (just waitin' on Travis). |
should 'infinity' be there too? |
I'd say no, because that's bordering more on things that could be string-like (not 100% sure: are inf checks are done before dtype decision?). If we did that, would need to switch to hashtable format, which I could do. your call on how you want to do it. |
right. so forget that can u just do a quick perf test just to make sure it doesn't slow it down much (I think it'll be fine though) |
@jreback is there a way to collect vbench results into a csv? I find the my results can vary a lot and being able to combine multiple would help smooth it out. |
Here's perf results - ran it twice (wasn't sure what regex to use): That said, reading-related items don't seem to be particularly affected (or at least are not negative). |
Make sure you're not doing literally anything else when you run test perf. I sometimes forget that and I get "bad" results. Ratio > 1.15 is usually something sketchy but only if you weren't running anything else. |
Well anything like building docs or compiling stuff. Surfing the web is OK. |
Here are just the "read_*" benches:
|
I just ran the bench against a bogus commit (literally adding whitespace to |
e.g. here's against a basically blank commit:
|
perf testing is actually quite tricky as many things can affect it from one run to another - just looking to make sure that the change which we think is a non event really is so looks good then |
ENH: Treat 'Inf' as infinity in text parser
thank you sir! |
Makes 'inf' checks case insensitive using strcasecmp
Fixes #4219.