Skip to content

ENH/CLN: give all AssertionErrors and nose.SkipTest raises an informative message #3730

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 28, 2013
Merged

ENH/CLN: give all AssertionErrors and nose.SkipTest raises an informative message #3730

merged 1 commit into from
Sep 28, 2013

Conversation

cpcloud
Copy link
Member

@cpcloud cpcloud commented May 31, 2013

continuation of #3519 because of a branch rename.

UPDATE: The exception parsing script has been moved to a Gist

This PR partially addresses #3024.

  • all AssertionError exceptions now have an informative error message in them
  • some AssertionErrors have been converted to different Exception subclasses, where it makes sense, and there's a corresponding test wherever these were changed.
  • all nose.SkipTest exceptions now have an informative message

@jreback
Copy link
Contributor

jreback commented May 31, 2013

@cpcloud only issue I have is that you may have to manually test that each assert works, e.g. the actual exception message doesn't have an error in it (say from the formatting)...not a big deal....but give a quick verify (e.g. maybe temporarily make the assertion test true and see if the message is ok)....don't need tests for this (just a quick verify)

@cpcloud
Copy link
Member Author

cpcloud commented May 31, 2013

@jreback no problem i will just eval the ones that have a constructor with string arguments.

@cpcloud
Copy link
Member Author

cpcloud commented May 31, 2013

should suffice just to construct the AssertionError object where possible (e.g., I can't do it when there's a variable for a message)

@jreback
Copy link
Contributor

jreback commented May 31, 2013

no.. that's exactly the issue; unless you explicity test the AssertionError string its in theory possibly to have a formatting error in the error message; which causes it to raise Exception rather than AssertionError

this is quite rare, and no need to test the assertions explicity, just a FYI really

@cpcloud
Copy link
Member Author

cpcloud commented May 31, 2013

that works fine, assuming df as shown above but with assertionerrors instead you can do

df = df[df.msg.map(lambda x: isinstance(x, basestring))]
(df.code + '(' + df.msg.map(repr) + ')').map(eval) # check for construction error

this doesn't cover incorrect format strings though...will test those manually i guess although i'm sure i could come up with a stupid way to parse formatting strings that would test this

@cpcloud
Copy link
Member Author

cpcloud commented May 31, 2013

oh by formatting i thought u meant code formatting. sorry. it's hard to do this without running test code since it is determined at runtime. e.g., '{0} {1}'.format(1) will not be caught on byte compilation only when the string is actually formatted

@jreback
Copy link
Contributor

jreback commented May 31, 2013

no....

if False:
    raise AssertionError("this is a bogus format %s")

since it 'never' triggers then we don't know that the error message actually is a formatting issue
(of course if its meant to trigger occasionally then a test will catch this, e.g. assertRaises(AssertionError...)

but the things you are checking are probably never triggered (and don't merit tests).....

maybe a ways to have your script actually check this somehow?

@cpcloud
Copy link
Member Author

cpcloud commented May 31, 2013

There is a slight issue with that

  • That string is perfectly valid (and wouldn't be caught in any case), so here's one way I can think of checking (this approach works with format strings as well sans the binop u just check the method call in the ast):
    1. parse the format string and get the number of specs + the number of rhs arg for the modulus binop
    2. assert that those two quantities are equal
  • make the simplifying assumptions
    1. that there are no nested format strings, e.g., '{a.{b}}'.format(a='d', b='c')
    2. no nested format + modulus op format strings, e.g., '%s' % ('%s' % 'x')
    3. no funky formatting attribute access
    4. no '%(dict_key)s' type strings

i think all of these are valid assumptions for error msgs, but i would need to check the corner cases of variables holding strings that will be messages.

@jreback
Copy link
Contributor

jreback commented May 31, 2013

that would work
actually u could test ALL exceptions like that

@cpcloud
Copy link
Member Author

cpcloud commented May 31, 2013

ok. this is turning into a static analysis tool for exceptions. i wonder if pylint can't do this...

@cpcloud
Copy link
Member Author

cpcloud commented Jun 1, 2013

nah pylint doesn't do this...

@ghost ghost assigned cpcloud Jun 6, 2013
@jtratner
Copy link
Contributor

This looks really nice!

I'm going to piggy-back on this to bring up another Exception-related item (though it's about Exception rather than AssertionError) ... it's not great to raise Exception everywhere, because it makes it harder to check for exceptions, consider the following:

try:
    df2 = pandas.ga() # some web scrape or other
    df.inde == df2.index
except Exception:
    #whelp, does this mean the scrape failed here?

Whereas, if we tried to default to more descriptive Exceptions where appropriate and/or used the PandasError in core, that would make it less ambiguous. E.g., in io/data, I'd assert that nearly every use of Exception could either be a PandasError or something like:

class DataError(IOError, PandasError): pass

That way you could be more specific about what you want to catch (and eventually rewrite some of the test cases to cover this as well) + it's backwards-compatible, because checks for Exception will still pick this up.

Is this worth pursuing? (I'm volunteering to do this.)

@jreback
Copy link
Contributor

jreback commented Jun 16, 2013

+1

what would be ideal would be ALL exceptions to be listed in say
pandas.utils.exceptions

a hierarchy (not deep really), maybe IOError, DataError near the top
and various branches for specifc errors

and best of all descriptions on when to use them
and when to use ValueError, TypeError

then gradually work thru the library and get rid of EXception

maybe add tr warning code in here too (or .warnings) module

@cpcloud
Copy link
Member Author

cpcloud commented Jun 16, 2013

darn it @jreback u beat me 2 to it!

@cpcloud
Copy link
Member Author

cpcloud commented Jun 16, 2013

i hate raising Exception and i also die a little inside when i see except:

@cpcloud
Copy link
Member Author

cpcloud commented Jun 16, 2013

@jtratner with this script u can see the line and col no of any exception (and its message if any, but doesn't resolve the variables as that would require eval and there aren't that many of those) matching a regex that u providel u can also provide a module if eg u just want to work on core or core.frame

@cpcloud
Copy link
Member Author

cpcloud commented Jun 16, 2013

i need to revisit the discussion @jreback and i had above, i've sort of let this go by the wayside, modulo the fact that i occasionally rebase and push to make sure there are no merge conflicts when i do get back to it

@jtratner
Copy link
Contributor

Well, I was debating running something like (my grep/sed-fu sucks, even though I use vim constantly:P):

grep -rl "^\s*raise Exception" ./ | xargs sed -i "s/^(\s*)raise Exception/\1raise PandasError/"

and then adding imports from core/common for the PandasError as a stopgap. The problem is that this commits us to using PandasError as a baseclass going forward. (instead of trying to use builtin exceptions as much as possible).

Sidenote - would this make more sense as an issue linked to a Gist [which can have multiple files] instead of a pull request? That way you wouldn't have to keep rebasing and fixing...

@jtratner
Copy link
Contributor

Also, is there a reason to use the if <cond>: raise AssertionError(<msg>) as opposed to assert <cond>, <msg>?

@cpcloud
Copy link
Member Author

cpcloud commented Jun 16, 2013

glad u asked. python -O removes assert only_expressions statements

@cpcloud
Copy link
Member Author

cpcloud commented Jun 16, 2013

kind of annoying since it makes it hard to differentiate "user" errors from, e.g., sanity checks when developing things, but alas python is not perfect

@jtratner
Copy link
Contributor

@cpcloud so what's the best way to do this? Submit a pull-request with a bunch of Exception changes?

@cpcloud
Copy link
Member Author

cpcloud commented Jun 16, 2013

@jtratner yep. btw u might trying using grin. it's like grep but for teh codez (similar to ack but i like grin better)! just pip install grin then do this :D

@jtratner
Copy link
Contributor

@cpcloud I'm a big fan of ack (especially because it has great vim integration)

@jtratner
Copy link
Contributor

@jreback on exceptions - I'm thinking we could use a Wiki page on pydata/pandas to describe which exceptions should do what, so it's possible to collaboratively edit it and then could move to the contributing docs.

@cpcloud
Copy link
Member Author

cpcloud commented Jun 16, 2013

what would be ideal would be ALL exceptions to be listed in say
pandas.utils.exceptions

@jreback 👍 on this and i think a pandas.util.warnings might be useful as well...

@cpcloud
Copy link
Member Author

cpcloud commented Jun 16, 2013

consolidate ALL THE THINGS!

@jreback
Copy link
Contributor

jreback commented Jun 16, 2013

there is a wiki here on GitHub
not really used but seems functional
why don't u start a page for this

@jtratner
Copy link
Contributor

@jreback that's what I'm doing right now. Plus, you can write it in ReST, so can be easily moved around.

@cpcloud
Copy link
Member Author

cpcloud commented Jun 16, 2013

yep that's probably what i would do. i like the idea of having to remember only one place from which to import exceptions

@jreback
Copy link
Contributor

jreback commented Jun 16, 2013

though these are very specific and essentially internal

the exceptions we are talking about are ones that a caller of the function might want to intercept

maybe leave these alone (but document) what they r for

@jtratner
Copy link
Contributor

That makes sense. (though I noticed that PandasError appears to be defined
in both core/common and core/generic, so that's definitely something we'd
want to avoid)

On Sun, Jun 16, 2013 at 6:36 AM, jreback [email protected] wrote:

though these are very specific and essentially internal

the exceptions we are talking about are ones that a caller of the function
might want to intercept

maybe leave these alone (but document) what they r for


Reply to this email directly or view it on GitHubhttps://github.com//pull/3730#issuecomment-19510623
.

@cpcloud
Copy link
Member Author

cpcloud commented Jun 27, 2013

this now implements the full format spec (including dict printf, regular printf, and format strings, not nested) and adds a column named valid of bools telling whether the format string is valid. it can be used like so

./parse_except.py --validate

i could make an API for this so that it could be added to the end of testing ... e.g., it would tell someone where their change introduces a bad (meaning invalid or incompatible with python2.6) format string in an exception

@cpcloud
Copy link
Member Author

cpcloud commented Sep 21, 2013

can we merge this sometime soon? i believe i've added a message to almost every nose skip and definitely to every assertionerror raise

@jtratner
Copy link
Contributor

well rebase it and looks fine to me. going to be some merge conflicts though

@cpcloud
Copy link
Member Author

cpcloud commented Sep 23, 2013

all nose.SkipTest covered as well as assertionerror msgs...would be nice to get this in to avoid the constant merge conflicts

if not ((len(args) == self._AXIS_LEN)):
raise AssertionError()
if len(args) != self._AXIS_LEN:
raise AssertionError('There must be an argument for each axis, '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not an AssertionError, this should be TypeError and could use a brief test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually this can just be removed and add 3 pos args ... simplest way

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe higher than 3dims uses this too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@cpcloud
Copy link
Member Author

cpcloud commented Sep 28, 2013

@jreback @jtratner any more comments?

@jreback
Copy link
Contributor

jreback commented Sep 28, 2013

I see...lots of fixups...ok...go ahead

cpcloud added a commit that referenced this pull request Sep 28, 2013
ENH/CLN: give all AssertionErrors and nose.SkipTest raises an informative message
@cpcloud cpcloud merged commit 63ec809 into pandas-dev:master Sep 28, 2013
@cpcloud cpcloud deleted the parse-exceptions branch September 28, 2013 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants