#26545 Fix: same .tsv file, get different data-frame structure using engine 'python' and 'c' #26634

luckydenis · 2019-06-03T14:46:46Z

closes same .tsv file, get different data-frame structure using engine 'python' and 'c' #26545
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

BugFix:
When using engine='python', columns were parsed incorrectly if the first header contained a BOM (byte order mark).

Bug:

In [1]: import pandas as pd                                                             

In [2]: pd.read_csv('test.txt', engine='python', delimiter='\t')              
Out[2]: 
Empty DataFrame
Columns: [Project ID]
Index: []

In [3]: pd.read_csv('test.txt', engine='python', delimiter='\t').shape        
Out[3]: (0, 1)

In [4]: pd.read_csv('test.txt', delimiter='\t')                               
Out[4]: 
Empty DataFrame
Columns: [Project ID, Project Name, Product Name]
Index: []

In [5]: pd.read_csv('test.txt', delimiter='\t').shape
Out[5]: (0, 3)

Fix:

In [1]: import pandas as pd                                                             

In [2]: pd.read_csv('test.txt', engine='python', delimiter='\t')              
Out[2]: 
Empty DataFrame
Columns: [Project ID, Project Name, Product Name]
Index: []

In [3]: pd.read_csv('test.txt', engine='python', delimiter='\t').shape        
Out[3]: (0, 3)

In [4]: pd.read_csv('test.txt', delimiter='\t')                               
Out[4]: 
Empty DataFrame
Columns: [Project ID, Project Name, Product Name]
Index: []

In [5]: pd.read_csv('test.txt', delimiter='\t').shape
Out[5]: (0, 3)
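
The sessions above depend on a local test.txt that is not included in this conversation. As a rough, self-contained way to run the same comparison, the sketch below builds an in-memory stand-in. The file contents (a single tab-separated header row that starts with a BOM and quotes each name) are an assumption made for illustration, not the reporter's actual file.

```python
from io import StringIO

import pandas as pd

# Assumed stand-in for test.txt: one header row, tab-separated,
# with a BOM (U+FEFF) in front of the first quoted header name.
data = '\ufeff"Project ID"\t"Project Name"\t"Product Name"\n'

# Parse the same text with both engines so the results can be compared.
frames = {
    engine: pd.read_csv(StringIO(data), engine=engine, delimiter="\t")
    for engine in ("python", "c")
}

for engine, df in frames.items():
    print(engine, list(df.columns), df.shape)
```

On a pandas version that includes this fix, both engines should report the same three columns; on an affected version the 'python' engine would be expected to differ, as in the Bug session.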

pep8speaks · 2019-06-03T14:46:54Z

Hello @luckydenis! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-11 07:36:17 UTC

pandas/tests/io/parser/test_check_for_bom.py

pandas/tests/io/parser/data/bom_first_line.txt

WillAyd · 2019-06-03T15:43:42Z

This closes an issue, right? If so, can you update the OP to reflect that?

codecov · 2019-06-03T15:48:21Z

Codecov Report

Merging #26634 into master will decrease coverage by 50.09%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #26634      +/-   ##
==========================================
- Coverage   91.88%   41.78%   -50.1%     
==========================================
  Files         174      174              
  Lines       50692    50692              
==========================================
- Hits        46576    21182   -25394     
- Misses       4116    29510   +25394

Flag	Coverage Δ
#multiple	`?`
#single	`41.78% <ø> (-0.11%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/formats/latex.py	`0% <0%> (-100%)`	⬇️
pandas/io/sas/sas_constants.py	`0% <0%> (-100%)`	⬇️
pandas/core/groupby/categorical.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/plotting.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/converter.py	`0% <0%> (-100%)`	⬇️
pandas/io/formats/html.py	`0% <0%> (-99.37%)`	⬇️
pandas/io/sas/sas7bdat.py	`0% <0%> (-91.16%)`	⬇️
pandas/io/sas/sas_xport.py	`0% <0%> (-90.1%)`	⬇️
pandas/core/sparse/scipy_sparse.py	`10.14% <0%> (-89.86%)`	⬇️
pandas/core/tools/numeric.py	`10.14% <0%> (-89.86%)`	⬇️
... and 128 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d124ea...0816572. Read the comment docs.

codecov · 2019-06-03T15:48:23Z

Codecov Report

Merging #26634 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #26634      +/-   ##
==========================================
- Coverage   91.72%   91.71%   -0.01%     
==========================================
  Files         178      178              
  Lines       50779    50779              
==========================================
- Hits        46578    46574       -4     
- Misses       4201     4205       +4

Flag	Coverage Δ
#multiple	`90.31% <ø> (ø)`	⬆️
#single	`41.19% <ø> (-0.09%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`78.94% <0%> (-10.53%)`	⬇️
pandas/core/frame.py	`96.88% <0%> (-0.12%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 157a4e3...fb010d5. Read the comment docs.

luckydenis · 2019-06-03T18:05:07Z

@WillAyd, I made corrections based on your comments. Please take a look.

pandas/tests/io/parser/test_common.py

pandas/io/parsers.py

WillAyd

Thanks for the updates - can you also add a whatsnew note for 0.25?

cc @gfyoung if you care to take a look

WillAyd · 2019-06-04T11:20:51Z

@luckydenis could you also check what test_utf8_bom is doing in the test module? It looks to cover the same intention as test added so want to make sure we understand the difference and clarify accordingly

luckydenis · 2019-06-04T12:38:43Z

In this context, I'm not sure that they are different, but my PR fixes this error. Which tm method should be used to check it? The error was in the method that strips the BOM.

In [1]: import pandas as pd                                                             

In [2]: pd.read_csv('test.txt', engine='python', delimiter='\t')              
Out[2]: 
Empty DataFrame
Columns: [Project ID]
Index: []

In [3]: pd.read_csv('test.txt', engine='python', delimiter='\t').shape        
Out[3]: (0, 1)

In [4]: pd.read_csv('test.txt', delimiter='\t')                               
Out[4]: 
Empty DataFrame
Columns: [Project ID, Project Name, Product Name]
Index: []

In [5]: pd.read_csv('test.txt', delimiter='\t').shape
Out[5]: (0, 3)
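
The comment above places the error in the step that strips the BOM, and the session shows only the first column surviving. The following is a minimal sketch of what such a step has to preserve. It is not the code in pandas/io/parsers.py; `strip_bom_from_header` is a hypothetical helper, and the quoted-header input is an assumption.

```python
BOM = "\ufeff"


def strip_bom_from_header(first_row, quotechar='"'):
    """Hypothetical helper: remove a leading BOM from the first field only.

    `first_row` is a list of already-split header fields.
    """
    if not first_row or not first_row[0].startswith(BOM):
        # Nothing to strip; return a copy of the row unchanged.
        return list(first_row)

    first = first_row[0][len(BOM):]

    # A field that starts with a BOM is not recognised as quoted by a
    # typical CSV splitter, so its quotes may still be attached here.
    if len(first) >= 2 and first[0] == quotechar and first[-1] == quotechar:
        first = first[1:-1]

    # Keep every remaining field; returning only `first` would collapse
    # the header to a single column.
    return [first] + list(first_row[1:])


header = ['\ufeff"Project ID"', "Project Name", "Product Name"]
fixed = strip_bom_from_header(header)
print(fixed)
```

The point of the sketch is the last return statement: the cleaned first field is joined back with the rest of the row, so the column count is unchanged by BOM handling.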

pandas/tests/io/parser/test_common.py

WillAyd

I think this is good. If you can add a whatsnew for 0.25, then lgtm!

jreback · 2019-06-06T14:59:26Z

@gfyoung ok with this?

gfyoung · 2019-06-06T18:10:28Z

@jreback : Looks fine to me, just need the whatsnew as @WillAyd said

luckydenis · 2019-06-10T13:30:41Z

Yeah, I'll add a description in the whatsnew, just haven't had time.

luckydenis · 2019-06-10T16:01:54Z

@jreback, @WillAyd, @gfyoung I added it, please take a look.

doc/source/whatsnew/v0.25.0.rst

WillAyd

lgtm @gfyoung

gfyoung

@jreback if we have anything else

jreback · 2019-06-10T22:27:42Z

lgtm. @luckydenis if you can merge master and ping on green to resolve the conflict.

jreback · 2019-06-12T18:40:51Z

thanks @luckydenis

finik-dev-team added 2 commits June 3, 2019 16:12

fix: same .tsv file, get different data-frame structure using engine …

0c8a223

…'python' and 'c'

#26545 fix: same .tsv file, get different data-frame structure using …

0df0744

…engine 'python' and 'c'

#26545 fix: same .tsv file, get different data-frame structure using …

11214de

…engine 'python' and 'c'

WillAyd requested changes Jun 3, 2019

pandas/tests/io/parser/test_check_for_bom.py (outdated)

pandas/tests/io/parser/data/bom_first_line.txt (outdated)

WillAyd added the IO CSV read_csv, to_csv label Jun 3, 2019

test-fix

0816572

Correction of comments in the review

45cacb0

WillAyd requested changes Jun 3, 2019

pandas/tests/io/parser/test_common.py (outdated)

pandas/io/parsers.py

Corrected the tests, according to a comment in the review

d5c593c

WillAyd requested changes Jun 4, 2019

gfyoung reviewed Jun 4, 2019

pandas/tests/io/parser/test_common.py

WillAyd requested changes Jun 5, 2019

WillAyd added this to the 0.25.0 milestone Jun 5, 2019

finik-dev-team and others added 2 commits June 10, 2019 16:42

add whatnew

1fd50eb

Merge branch 'master' into dev-26545

84252c5

gfyoung reviewed Jun 10, 2019

doc/source/whatsnew/v0.25.0.rst (outdated)

luckydenis and others added 2 commits June 10, 2019 19:33

Update doc/source/whatsnew/v0.25.0.rst

946f9f0

Co-Authored-By: William Ayd <[email protected]>

Update v0.25.0.rst

4cf74f9

WillAyd approved these changes Jun 10, 2019

gfyoung approved these changes Jun 10, 2019

Merge branch 'master' into dev-26545

fb010d5

jreback merged commit a137a9c into pandas-dev:master Jun 12, 2019

luckydenis deleted the dev-26545 branch June 13, 2019 18:08
