SAS chunksize / iteration issues #14743

Merged
merged 6 commits into from
Nov 28, 2016

@kshedden kshedden commented Nov 25, 2016

closes #14734
closes #13654

  • tests added / passed
  • passes git diff upstream/master | flake8 --diff
  • whatsnew entry

@jreback jreback added this to the Next Major Release milestone Nov 25, 2016
@jreback jreback added Bug IO SAS SAS: read_sas labels Nov 25, 2016
@jreback jreback changed the title from "Fix 14734" to "SAS chunksize / iteration issues" Nov 25, 2016
@@ -65,6 +65,32 @@ def test_from_iterator(self):
                df = rdr.read(3)
                tm.assert_frame_equal(df, df0.iloc[2:5, :])

    def test_iterator_loop(self):
        for j in 0, 1:
Contributor

can you add the issue references here

y = 0
for x in rdr:
    y += x.shape[0]
assert(y == rdr.row_count)
Contributor

self.assertTrue
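
For readers outside the thread: the invariant this test checks is that iterating over a chunked reader visits every row exactly once, including a short final chunk, and then stops cleanly. A minimal sketch of that check follows. It uses a hypothetical pure-Python stand-in (`ChunkedReader` and `total_rows` are illustrative names, not the pandas SAS reader) so that it runs without a SAS file, and the chunk sizes are chosen only for illustration:

```python
class ChunkedReader:
    """Hypothetical stand-in for a chunked reader; this is not pandas code."""

    def __init__(self, rows, chunksize):
        self._rows = list(rows)
        self.row_count = len(self._rows)
        self.chunksize = chunksize
        self._pos = 0

    def read(self, nrows):
        # Clamp the request so the last chunk may be shorter than nrows.
        end = min(self._pos + nrows, self.row_count)
        chunk = self._rows[self._pos:end]
        self._pos = end
        return chunk

    def __iter__(self):
        return self

    def __next__(self):
        chunk = self.read(self.chunksize)
        if not chunk:
            # Signal exhaustion to the for loop instead of raising another error.
            raise StopIteration
        return chunk


def total_rows(reader):
    """Sum the chunk lengths seen while iterating over the reader."""
    return sum(len(chunk) for chunk in reader)


# Chunk sizes that do not divide, divide, equal, and exceed the row count.
for chunksize in 3, 5, 10, 11:
    rdr = ChunkedReader(range(10), chunksize)
    assert total_rows(rdr) == rdr.row_count
```

The same loop shape applies to any reader that exposes a total row count: accumulate the chunk lengths and compare the sum with that total once iteration ends.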

jreback commented Nov 25, 2016

pls add to the whatsnew. ping when ready / green.

@jreback jreback modified the milestones: 0.19.2, Next Major Release Nov 25, 2016
@jorisvandenbossche
Member

Question: this is not a problem in the xport reader? If so, is there a test that confirms this?

@kshedden
Contributor Author

It works there but I added a few tests.

@jorisvandenbossche
Member

@kshedden Thanks. You can move the whatsnew notice to the 0.19.2 file

fname = os.path.join(self.dirpath, "test%d.sas7bdat" % k)
with open(fname, 'rb') as f:
    byts = f.read()
buf = io.BytesIO(byts)
Member

Is there a reason you read the file into a bytes object and not just pass the fname to read_sas ?

Contributor Author

It seems to work either way so I took all but one of the bytesio tests and simplified to a direct file read.
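
The two ways of supplying input discussed here, a file name versus an in-memory buffer, can be illustrated without pandas. The sketch below assumes a hypothetical helper `read_header` (not part of pandas) that accepts either form and returns the same bytes for both:

```python
import io
import os
import tempfile


def read_header(path_or_buf, nbytes=4):
    """Hypothetical helper: return the first nbytes from a path or a binary buffer."""
    if hasattr(path_or_buf, "read"):
        # File-like object: read from it directly; the caller owns it.
        return path_or_buf.read(nbytes)
    # Otherwise treat the argument as a file name and manage the handle here.
    with open(path_or_buf, "rb") as f:
        return f.read(nbytes)


# Write a small file, then read it both ways.
with tempfile.TemporaryDirectory() as tmpdir:
    fname = os.path.join(tmpdir, "example.bin")
    with open(fname, "wb") as f:
        f.write(b"example payload")
    from_path = read_header(fname)
    with open(fname, "rb") as f:
        from_buffer = read_header(io.BytesIO(f.read()))

assert from_path == from_buffer
```

Keeping at least one test on the buffer path, as the thread suggests, guards the branch that a file-name-only test suite would never exercise.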

codecov-io commented Nov 26, 2016

Current coverage is 85.22% (diff: 0.00%)

Merging #14743 into master will increase coverage by <.01%

@@             master     #14743   diff @@
==========================================
  Files           143        143          
  Lines         50807      50857    +50   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43297      43344    +47   
- Misses         7510       7513     +3   
  Partials          0          0          

Powered by Codecov. Last update d8e427b...28d4038

with open(fname, 'rb') as f:
    byts = f.read()
buf = io.BytesIO(byts)
df = pd.read_sas(buf, format="sas7bdat", encoding='utf-8')
Member

I think it is more logical to keep the buffer reading here, as this test is called "test_from_buffer" and I suppose it is testing exactly this (and then maybe use plain reading from file in "test_iterator_read_too_much")

@jorisvandenbossche jorisvandenbossche merged commit c5f219a into pandas-dev:master Nov 28, 2016
@jorisvandenbossche
Member

@kshedden Thanks for the quick fix!

jorisvandenbossche pushed a commit that referenced this pull request Dec 15, 2016
Labels
Bug IO SAS SAS: read_sas
Development

Successfully merging this pull request may close these issues.

  • read_sas with chunksize/iterator raises ValueError
  • Issue iterating with pd.read_sas
4 participants