Skip to content

Smoke Testing #1926

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

Smoke Testing #1926

wants to merge 4 commits into from

Conversation

lodagro
Copy link
Contributor

@lodagro lodagro commented Sep 17, 2012

Was thinking on writing smoke tests for Series and DataFrame. Idea is to have "it works"-style of tests on all methods of Series (and later also For DataFrame) for all combinations of index type, value type, and possible input argument combinations.

Current code is certainly not intended to be merged in, just want to introduce the idea.
Smoke tests are easy/quick to write and i think can avoid issues like "method x works fine for y-dtype but unexpectedly raises exception for z-dtype".

Of course it is better to have verification of the output also, and not only check "it works" for all possible cases, but that is much more work and (maybe) for a second phase.

For now, one of the tests smoking __eq__, this came out:

In [17]: s1 = pandas.Series(['foo', 'bar', 'baz'])

In [18]: s2 = pandas.Series()

In [19]: s2 == s1
Out[19]: Series([], dtype=bool)

In [20]: s1 == s2
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
...
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'double'

@ghost
Copy link

ghost commented Nov 9, 2012

tseries.pyx:564:

def vec_compare(ndarray[object] left, ndarray[object] right, object op):

dropping the dtype spec doesn't seem to break anything:

def vec_compare(ndarray left, ndarray right, object op):

and results in a more reasonable error regarding disparate lengths.

@wesm , would it be reasonable to drop the dtype? perf-wise, object is bound to be
the least performant anyway, so less typing shouldn't hurt in this case, right?

In any case, It seems terribly strange to have a nonecommutative equality operator.
I'd blindly guess at some exciting corner-cases with python chained comparison if nothing else.

@wesm
Copy link
Member

wesm commented Nov 9, 2012

Dropping the dtype will cause element access to be drastically slower because it will results in Python __getitem__ calls rather than using the C-level buffer interface.

@ghost
Copy link

ghost commented Nov 9, 2012

hopefully, that's acceptable.

@cpcloud
Copy link
Member

cpcloud commented Nov 11, 2012

Why not use fused types?

You could make a mammoth type like

cimport cython

ctypedef fused everything_type:
    object
    cython.numeric
    cython.p_char
    bytes
    unicode

This covers objects, any numeric type, and any string type. Dispatch is figured out automatically at compile-time for cdef and cpdef functions.

@wesm
Copy link
Member

wesm commented Nov 11, 2012

That wasn't working 6 months ago (auto-dispatch from the Python call side). I'm told that it works now and the code could be refactored to use fused types

@ghost
Copy link

ghost commented Nov 11, 2012

#255

@ghost
Copy link

ghost commented Nov 12, 2012

I did an experiment with fused types in 21ef7ab and did not find a performance difference
over aa71c50. stop-gap measure until a big fused-types refactor?
which ever you prefer.

fused types are new to me, so I may have missed something.
should datetime64 be included as a case?

@wesm
Copy link
Member

wesm commented Nov 12, 2012

datetime64 not supported by Cython yet I don't think (grumble)

@wesm wesm closed this in 930a267 Nov 14, 2012
@lodagro
Copy link
Contributor Author

lodagro commented Nov 14, 2012

Reason for this pull request was to get feedback on the idea of smoke testing. Not really to report issue on series == operator, now this PR got closed due to commit message fixing it. Is there any interest on smoke testing?

@wesm
Copy link
Member

wesm commented Nov 14, 2012

that was a complete accident with the fixes. Reopening

@wesm wesm reopened this Nov 14, 2012
@ghost
Copy link

ghost commented Nov 14, 2012

More systematic testing would be a great idea, but would it be possible to split this PR into more
fine-grained patches? it doesn't merge cleanly with master right now, and it looks like
there are some useful new tests and fixes mixed in, which would be easier to grok if they
were decoupled into their own ticket.

Apologies if I mistakenly hijacked the PR, I just saw a legitimate bug laying dormant for a couple
of months and decided to do something about it.

edit: my mistake, was fooled by the divergence from trunk, I understand a25fbf2 is the only relavent bit.

@lodagro
Copy link
Contributor Author

lodagro commented Feb 4, 2013

It was my idea to finish this quickly when i opened this. Failed big time. Closing, and do a new PR if i ever get it finished (i need to rebase anyway, i noticed i created this branch from another feature branch iso master.)

@lodagro lodagro closed this Feb 4, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants