-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
adding a NotImplementedError for simultaneous use of nrows and chunksize... #7085
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
needs a test ( which should fail w/o your fix) and passes with in pandas/io/tests/test_parser.py |
THAT's where they are hidden, was looking for parsing tests. :) |
that class is just includes in other classes which inherit from TestCase which inherits from unit test intimately (and that's whee assertRaises is defined) you can look at klass.mro to see the hierarchy if u r curious |
Sorry, again confused. What's the idea of the separation of ParserTests and TestPythonParser? Because I see both classes performing asserts ? |
when tests are used in multiple test classes they are separated to a base class most of the tests in the parser are run 3x with different arguments (mostly the engine type eg c or python) use -v when u run tests to see how tests are executed |
Not really all having different arguments. Look at these two, isn't it just a copy-paste forgotten or lack of decision if to run these 2 tests separately or within one method? The first test in first method looks like to be exactly the same as the test in the second method definition. def test_integer_overflow_bug(self):
# #2601
data = "65248E10 11\n55555E55 22\n"
result = self.read_csv(StringIO(data), header=None, sep=' ')
self.assertTrue(result[0].dtype == np.float64)
result = self.read_csv(StringIO(data), header=None, sep='\s+')
self.assertTrue(result[0].dtype == np.float64)
def test_integer_overflow_bug(self):
# #2601
data = "65248E10 11\n55555E55 22\n"
result = self.read_csv(StringIO(data), header=None, sep=' ')
self.assertTrue(result[0].dtype == np.float64) |
it's calling self.read_cav that sets different engines on each run |
you mean there's something that keeps track how often a method signature was called and is using a different engine the next time it's there? |
that's why u define multiple classes to have the same methods called but with different arguments to test different cases it makes it simpler then writing a tests twice nose calls the tests in alphabetical order I think no state is kept it's all in each tests idea being is it should behave identically on both parsers their are specifc tests on one or the other parser for certain options that are only supported on one ( which are in effect uninplemented features ATM) |
Super trivial point: unless this will be supported someday, or could be supported, you should make this a TypeError instead (for bad function signature - eg what Python throws if you pass the wrong number of arguments). Your call on what it should be tho. |
@michaelaye |
Ok, I digged deeper into the parser tests and still don't get why above method is defined twice in the class ParserTests.
class TestCParserHighMemory(ParserTests, tm.TestCase):
def read_csv(self, *args, **kwds):
kwds = kwds.copy()
kwds['engine'] = 'c'
kwds['low_memory'] = False
return read_csv(*args, **kwds)
$ cat allout.txt|grep "test_integer_overflow_bug "
test_integer_overflow_bug (pandas.io.tests.test_parsers.TestCParserHighMemory) ... ok
test_integer_overflow_bug (pandas.io.tests.test_parsers.TestCParserLowMemory) ... ok
test_integer_overflow_bug (pandas.io.tests.test_parsers.TestPythonParser) ... ok |
so delete the first one (looks like an older version); it doesn't matter really, because the later definition overrides the first definition e.g.
The first 'func' never is simply overwritten |
Ok, will remove the first version. Don't know what the difference between override and overwrite is, and why the later definition would override the first one, but would NOT overwrite it, but I guess that's some internals I don't care now. |
looks good, pls rebase on master and add a whatsnew entry for 0.14.1. (we are now just putting in 1 place) |
@michaelaye can you add a whatsnew entry? otherwise good to go (you can put in API changes section) |
Nah, still have to do the tests and learn about rebasing. |
ahh..rght...ok...lmk |
update? |
i'm working on the test. saw now that I'm hampered by my main work machine being broken this week, so i'm working with this awfully slow Win7 netbook 🐌 |
Okay, as required, I verified that my test fails when my patch is not in. Now a quick read on rebase-ing, and deciding what commit to survive and I'm done. |
So, I only have 2 commits to replay, is it okay to just execute a 'git rebase master' on my patch branch and push the results up into the PR? |
yep |
perfect just need a short release note in v0.14.1 (ref the original issue) then squash (if u can't no worries) but almost the same as a rebase |
one last help to understand what is the preferred git procedure: |
git rebase -i origin/master will rebase u on top of pandas master |
yes, and then i have to a make a change to the what's new file, creating another commit, meaning even so I squash my previous 2 i now have to create another one. |
that's fine |
Are you suggesting a second rebase after I made the changes to v0.14.1.txt? |
or should I cherry-pick the file, make the changes and then rebase? That's sounds quite hackish for the git log, I think. |
u can rebase and much as u need all rebasing does is rewrites the commit history eg allows u to reorder and/or combine commits |
ok, thanks. |
you can make whatever changes u want I think rebasing is really much nicer than merging in as it allows u to make a consistent history - purist don't like this because it rewrites the history - but I don't think it's a big deal when I merge it will be a single merge commit that takes your commit (kind of like a cherry pick) |
…hunksize. For read_csv() the user intention most likely is to get a TextFileReader, when using the chunksize option, but simultaneous use of nrows is not implemented yet. This raises now a NotImplementedError. Test and entry to current whatsnew source (v0.14.1.txt) added. Fixes pandas-dev#6774
darn git has some empty diffs with github.. i will let u know when i'm done.. :( |
ok, this should be it. Thanks for your patience. |
adding a NotImplementedError for simultaneous use of nrows and chunksize...
thanks @michaelaye ! |
..., as the user intention most likely is to get a TextFileReader, when using the chunksize option.
Fixes #6774