Iterableiterator #12173

Closed
wants to merge 3 commits into from

toobaz
Member

@toobaz toobaz commented Jan 29, 2016

If the approach is not overkill, it can also be used in UTF8Recoder and UnicodeReader from pandas/io/common.py. Closes #12153.

@jreback
Contributor

jreback commented Jan 29, 2016

see #9496

"""Subclass this and provide a "__next__()" method to obtain an iterator.
Useful only when the object being iterated is non-reusable (e.g. OK for a
parser, not for an in-memory table)."""
def __iter__(self):
Contributor

move to pandas/io/common.py, inherit from object, call this BaseIterable, raise AbstractMethodError for __next__

Contributor

make a parent class of Boto and UTFReader as well in `io/common.py`

Contributor

you can also incorporate elements of TableIterator in io/pytables.py (and make this a sub-class)

Member Author

BaseIterable or BaseIterator? True, it makes an iterable out of an iterator, but most iterables will not be iterators.
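
The naming question above can be made concrete with a minimal sketch. It is not the code from this pull request: `BaseIterator` is only one of the two candidate names, `Countdown` is a hypothetical subclass, and `NotImplementedError` stands in for pandas' `AbstractMethodError` so that the snippet runs without pandas.

```python
from collections.abc import Iterable, Iterator


class BaseIterator(object):
    """Hypothetical base class: subclasses provide __next__()."""

    def __iter__(self):
        # An iterator returns itself, so it can be consumed only once.
        return self

    def __next__(self):
        # Stand-in for pandas' AbstractMethodError (an assumption here).
        raise NotImplementedError(
            "__next__ must be defined in " + type(self).__name__)


class Countdown(BaseIterator):
    """Toy subclass: yields n, n-1, ..., 1."""

    def __init__(self, n):
        self.n = n

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1


table = [3, 2, 1]          # reusable: an iterable, but not an iterator
countdown = Countdown(3)   # non-reusable: both an iterable and an iterator

first_pass = list(countdown)
second_pass = list(countdown)  # the iterator is already exhausted
```

Here `isinstance(table, Iterator)` is false while `isinstance(countdown, Iterator)` is true, which is the distinction the comment draws: the class turns an iterator into an iterable, but an in-memory table is an iterable without being an iterator.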

@jreback jreback added the IO Data, Internals, and Compat labels Jan 29, 2016
@@ -343,3 +343,14 @@ def is_platform_mac():

def is_platform_32bit():
    return struct.calcsize("P") * 8 < 64


class IterableIterator:
Contributor

inherit from object?

@toobaz
Member Author

toobaz commented Jan 30, 2016

TableIterator in io/pytables.py is still a bit too cryptic to me, but I can try to take a look at it next week.

it = read_csv(StringIO(self.data1), chunksize=1)
first = next(it)
tm.assert_frame_equal(first, expected.iloc[[0]])
expected.index = [0 for i in range(len(expected))]
Member Author

Each chunk has an index restarting from 0 (also in master). Isn't this a bug?

Contributor

No, that shouldn't be the case

In [46]: s = DataFrame({'A' : range(10)}).to_csv(None)

In [47]: s
Out[47]: ',A\n0,0\n1,1\n2,2\n3,3\n4,4\n5,5\n6,6\n7,7\n8,8\n9,9\n'

In [48]: pd.read_csv(StringIO(s),index_col=0)
Out[48]: 
   A
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9

In [49]: list(pd.read_csv(StringIO(s),index_col=0,chunksize=4))
Out[49]: 
[   A
 0  0
 1  1
 2  2
 3  3,    A
 4  4
 5  5
 6  6
 7  7,    A
 8  8
 9  9]

Member Author

I mean when the index is not loaded:

In [7]: list(pd.read_csv(StringIO(s), chunksize=4))
Out[7]: 
[   Unnamed: 0  A
 0           0  0
 1           1  1
 2           2  2
 3           3  3,    Unnamed: 0  A
 0           4  4
 1           5  5
 2           6  6
 3           7  7,    Unnamed: 0  A
 0           8  8
 1           9  9]

Wouldn't we prefer a consistent index?

Contributor

hmm yes I agree

you'd have to keep state in the iterators I think though -
I do this in HDFStore iirc

but might be a bit non trivial

if u think u can fix easily - go ahead
otherwise create an issue

Member Author

No, I'd like to do it but it probably won't be immediate: #12185 .
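
To illustrate what "keeping state in the iterators" could look like, here is a rough standard-library sketch. It is not how pandas' parsers or HDFStore work: `ChunkReader` and its `_rows_read` counter are hypothetical names, and each chunk is a plain list of `(row_label, row)` pairs instead of a DataFrame.

```python
import csv
from io import StringIO


class ChunkReader(object):
    """Hypothetical reader yielding chunks of (row_label, row) pairs."""

    def __init__(self, buf, chunksize):
        self._rows = csv.reader(buf)
        self._header = next(self._rows)  # consume the header line
        self._chunksize = chunksize
        self._rows_read = 0  # state carried from one chunk to the next

    def __iter__(self):
        return self

    def __next__(self):
        chunk = []
        for row in self._rows:
            # Label each row by its position in the whole file,
            # not by its position inside the current chunk.
            chunk.append((self._rows_read + len(chunk), row))
            if len(chunk) == self._chunksize:
                break
        if not chunk:
            raise StopIteration
        self._rows_read += len(chunk)
        return chunk


data = "A\n" + "".join("{}\n".format(i) for i in range(10))
chunks = list(ChunkReader(StringIO(data), chunksize=4))
labels = [[label for label, _ in chunk] for chunk in chunks]
```

With ten data rows and `chunksize=4`, the row labels continue across chunks instead of restarting from 0, which is the consistent index asked about above.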

@jreback
Contributor

jreback commented Jan 30, 2016

ok on doing HDFStore iterator in another issue.

        return self

    def __next__(self):
        raise AbstractMethodError
Contributor

AbstractMethodError(self)
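
A short sketch of why the instance is passed to the exception. The `AbstractMethodError` below is a simplified stand-in written for this example, not the pandas source, and `Incomplete` is a hypothetical subclass. The assumption is only that the exception takes the instance so that its message can name the concrete class.

```python
class AbstractMethodError(NotImplementedError):
    """Simplified stand-in for illustration, not the pandas source."""

    def __init__(self, class_instance):
        self.class_instance = class_instance

    def __str__(self):
        return ("This method must be defined in the concrete class of "
                + type(self.class_instance).__name__)


class BaseIterator(object):
    def __iter__(self):
        return self

    def __next__(self):
        # Passing self lets the error report which subclass is incomplete.
        raise AbstractMethodError(self)


class Incomplete(BaseIterator):
    """Forgets to define __next__()."""


try:
    next(Incomplete())
except NotImplementedError as err:
    message = str(err)
```

In this sketch the constructor requires the instance, so a bare `raise AbstractMethodError` would fail with a `TypeError` before the intended error is ever raised.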

@toobaz
Member Author

toobaz commented Feb 3, 2016

I removed edits to BotoFileLikeReader because as far as I can judge the failing tests were due to it (through its ancestor key.Key) already being an iterable, and knowing nothing about this format/service it's hard for me to debug.

@jreback jreback added this to the 0.18.0 milestone Feb 5, 2016
@jreback jreback mentioned this pull request Feb 5, 2016
4 tasks
@@ -64,3 +74,16 @@ def test_get_filepath_or_buffer_with_buffer(self):
        input_buffer = StringIO()
        filepath_or_buffer, _, _ = common.get_filepath_or_buffer(input_buffer)
        self.assertEqual(filepath_or_buffer, input_buffer)

    def test_iterator(self):
Contributor

can you add tests for read_stata & read_sas

Member Author

yep, done

@jreback jreback closed this in 45a83a0 Feb 6, 2016
@jreback
Contributor

jreback commented Feb 6, 2016

thanks @toobaz

I have marked off csv, stata, sas; remaining are hdf & msgpack, which are in #9496

@toobaz toobaz deleted the iterableiterator branch February 6, 2016 21:02
Labels: Compat, Internals, IO Data
4 participants