Iterableiterator #12173
See #9496.
    """Subclass this and provide a "__next__()" method to obtain an iterator.
    Useful only when the object being iterated is non-reusable (e.g. OK for a
    parser, not for an in-memory table)."""
    def __iter__(self):
Move to `pandas/io/common.py`, inherit from `object`, call this `BaseIterable`, and raise `AbstractMethodError` for `__next__`.
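A minimal sketch of the class as this comment asks for it: it inherits from `object`, returns itself from `__iter__`, and raises an abstract-method error from `__next__` (using the `BaseIterator` name discussed further down). To keep the block self-contained, `AbstractMethodError` here is a hypothetical stand-in; recent pandas versions expose a real one as `pandas.errors.AbstractMethodError`. `Countdown` is a hypothetical subclass used only for illustration.

```python
class AbstractMethodError(NotImplementedError):
    """Hypothetical stand-in for pandas' AbstractMethodError."""

    def __init__(self, class_instance):
        self.class_instance = class_instance

    def __str__(self):
        # Name the concrete class that failed to implement the method.
        return "This method must be defined in the concrete class {}".format(
            type(self.class_instance).__name__)


class BaseIterator(object):
    """Subclass this and provide a "__next__()" method to obtain an iterator.
    Useful only when the object being iterated is non-reusable (e.g. OK for a
    parser, not for an in-memory table)."""

    def __iter__(self):
        # The object is its own iterator: a second loop resumes where the
        # first one stopped instead of starting over.
        return self

    def __next__(self):
        raise AbstractMethodError(self)


class Countdown(BaseIterator):
    """Hypothetical subclass: yields n, n - 1, ..., 1 exactly once."""

    def __init__(self, n):
        self.n = n

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1


it = Countdown(3)
first = next(it)   # 3
rest = list(it)    # [2, 1]; the iterator is now exhausted
```

Passing `self` to the error lets the message name the subclass that forgot to implement `__next__`, rather than the base class.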
Make this a parent class of `Boto` and `UTFReader` as well, in `io/common.py`.
You can also incorporate elements of `TableIterator` in `io/pytables.py` (and make this a subclass).
`BaseIterable` or `BaseIterator`? True, it makes an iterable out of an iterator, but most iterables will not be iterators.
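To illustrate the naming question: an iterable's `__iter__` returns a fresh iterator on every call, so it can be looped over repeatedly, whereas an iterator returns itself from `__iter__` and is exhausted after one pass. `Table` and `Parser` are hypothetical classes standing in for "an in-memory table" and "a parser" from the docstring.

```python
class Table(object):
    """Hypothetical in-memory table: an iterable, but not an iterator."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __iter__(self):
        # A new iterator on every call, so the table can be re-iterated.
        return iter(self.rows)


class Parser(object):
    """Hypothetical one-shot parser: an iterator (and therefore an iterable)."""

    def __init__(self, lines):
        self._lines = iter(lines)

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration from the underlying iterator ends the iteration.
        return next(self._lines).strip()


table = Table([1, 2, 3])
list(table)   # [1, 2, 3]
list(table)   # [1, 2, 3] again: each loop gets a fresh iterator

parser = Parser(["a\n", "b\n"])
list(parser)  # ['a', 'b']
list(parser)  # []: the parser is already exhausted
```

Every iterator is an iterable, but `Table` shows the common case of an iterable that is not an iterator, which is the argument for the `BaseIterator` name.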
@@ -343,3 +343,14 @@ def is_platform_mac():

def is_platform_32bit():
    return struct.calcsize("P") * 8 < 64


class IterableIterator:
Inherit from `object`?
Force-pushed from c14e7d0 to 5e094a4.
        it = read_csv(StringIO(self.data1), chunksize=1)
        first = next(it)
        tm.assert_frame_equal(first, expected.iloc[[0]])
        expected.index = [0 for i in range(len(expected))]
Each chunk has an index restarting from 0 (also in master). Isn't this a bug?
No, that shouldn't be the case
In [46]: s = DataFrame({'A' : range(10)}).to_csv(None)
In [47]: s
Out[47]: ',A\n0,0\n1,1\n2,2\n3,3\n4,4\n5,5\n6,6\n7,7\n8,8\n9,9\n'
In [48]: pd.read_csv(StringIO(s),index_col=0)
Out[48]:
   A
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9
In [49]: list(pd.read_csv(StringIO(s),index_col=0,chunksize=4))
Out[49]:
[   A
0  0
1  1
2  2
3  3,    A
4  4
5  5
6  6
7  7,    A
8  8
9  9]
I mean when the index is not loaded:
In [7]: list(pd.read_csv(StringIO(s), chunksize=4))
Out[7]:
[   Unnamed: 0  A
0           0  0
1           1  1
2           2  2
3           3  3,    Unnamed: 0  A
0           4  4
1           5  5
2           6  6
3           7  7,    Unnamed: 0  A
0           8  8
1           9  9]
Wouldn't we prefer a consistent index?
Hmm, yes, I agree. You'd have to keep state in the iterators, I think, though. I do this in HDFStore, IIRC, but it might be a bit non-trivial. If you think you can fix it easily, go ahead; otherwise, create an issue.
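One way to read "keep state in the iterators": the reader remembers how many rows it has already emitted and offsets each chunk's index by that count, so the index continues across chunks instead of restarting at 0. This is a plain-Python sketch with hypothetical names (`ChunkedReader`, `rows`, `chunksize`); it does not reproduce what `read_csv` or `HDFStore` actually do internally.

```python
class ChunkedReader(object):
    """Hypothetical chunked reader whose chunks carry a continuous index."""

    def __init__(self, rows, chunksize):
        self._rows = iter(rows)
        self.chunksize = chunksize
        self._offset = 0  # rows already emitted: the state kept between chunks

    def __iter__(self):
        return self

    def __next__(self):
        chunk = []
        # Pull up to `chunksize` rows; the shared iterator keeps its position.
        for row in self._rows:
            chunk.append(row)
            if len(chunk) == self.chunksize:
                break
        if not chunk:
            raise StopIteration
        # The index starts where the previous chunk ended.
        index = list(range(self._offset, self._offset + len(chunk)))
        self._offset += len(chunk)
        return index, chunk


chunks = list(ChunkedReader(range(10), chunksize=4))
# indexes: [0, 1, 2, 3], [4, 5, 6, 7], [8, 9]
```

The only extra state needed is the running offset, which is why this is cheap for a reader that already processes rows sequentially.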
No, I'd like to do it, but it probably won't be immediate: #12185.
ok on doing
        return self

    def __next__(self):
        raise AbstractMethodError
AbstractMethodError(self)
Force-pushed from 5e094a4 to ab8556d.
Force-pushed from b8f2582 to 36a7f62.
I removed edits to
Force-pushed from 36a7f62 to 5da8699.
@@ -64,3 +74,16 @@ def test_get_filepath_or_buffer_with_buffer(self):
        input_buffer = StringIO()
        filepath_or_buffer, _, _ = common.get_filepath_or_buffer(input_buffer)
        self.assertEqual(filepath_or_buffer, input_buffer)

    def test_iterator(self):
Can you add tests for `read_stata` and `read_sas`?
yep, done
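The requested tests share one pattern with the `read_csv` test above: a reader created with a chunk size should be its own iterator, support `next()` directly, and resume from where `next()` stopped when iterated afterwards. A generic sketch of that check, using a hypothetical helper `check_iterator_protocol` and a hypothetical `LineReader` in place of the real `read_stata`/`read_sas` readers and `tm.assert_frame_equal`:

```python
def check_iterator_protocol(make_reader, expected):
    """Hypothetical helper; make_reader() must return a fresh reader each call
    and `expected` must be non-empty."""
    reader = make_reader()
    # The reader must be its own iterator ...
    assert iter(reader) is reader
    # ... so next() works directly and later iteration resumes after it.
    first = next(reader)
    rest = list(reader)
    assert [first] + rest == expected
    # A fresh reader yields everything when simply iterated.
    assert list(make_reader()) == expected


class LineReader(object):
    """Hypothetical reader yielding one stripped line per step."""

    def __init__(self, text):
        self._lines = iter(text.splitlines())

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._lines).strip()


check_iterator_protocol(lambda: LineReader("a\nb\nc"), ["a", "b", "c"])
```

A plain list fails the first assertion, since `iter(list)` returns a new list iterator rather than the list itself.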
Force-pushed from 5da8699 to 04f7280.
Force-pushed from 04f7280 to 88b5b57.
If the approach is not overkill, it can also be used in `UTF8Recoder` and `UnicodeReader` from `pandas/io/common.py`. Closes #12153.
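How the base class could sit under a recoder: a sketch of a class that reads a byte stream in one encoding and yields UTF-8 encoded lines, written as a `BaseIterator` subclass so it only has to define `__next__`. `BaseIterator` is re-declared so the block runs on its own, and `Recoder` is a hypothetical name; this is not the actual `UTF8Recoder` code in pandas.

```python
import codecs
import io


class BaseIterator(object):
    """Iterator base class: subclasses only provide __next__()."""

    def __iter__(self):
        return self

    def __next__(self):
        raise NotImplementedError("subclasses must implement __next__")


class Recoder(BaseIterator):
    """Hypothetical recoder: reads bytes in `encoding`, yields UTF-8 bytes."""

    def __init__(self, f, encoding):
        # codecs.getreader wraps the byte stream in a decoding StreamReader.
        self.reader = codecs.getreader(encoding)(f)

    def __next__(self):
        # StreamReader is itself an iterator over decoded lines (newlines kept).
        return next(self.reader).encode("utf-8")


data = io.BytesIO("caf\u00e9\nna\u00efve\n".encode("latin-1"))
lines = list(Recoder(data, "latin-1"))
```

`__iter__` comes from the base class, which is the boilerplate the shared parent is meant to remove from each reader.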