Skip to content

API: Warn or raise for > 1 char encoded sep #14120

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

gfyoung
Copy link
Member

@gfyoung gfyoung commented Aug 30, 2016

The system file encoding can cause a separator to be encoded as more than one character even though it maybe provided as one character. Multi-char separators are not supported by the C engine, so we need to catch this case.

Closes #14065.

The system file encoding can cause a separator to be
encoded as more than one character even though it maybe
provided as one character.

Multi-char separators are not supported by the C engine,
so we need to catch this case.

Closes pandas-devgh-14065.
@codecov-io
Copy link

codecov-io commented Aug 30, 2016

Current coverage is 85.27% (diff: 100%)

Merging #14120 into master will increase coverage by <.01%

@@             master     #14120   diff @@
==========================================
  Files           139        139          
  Lines         50511      50517     +6   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43071      43078     +7   
+ Misses         7440       7439     -1   
  Partials          0          0          

Powered by Codecov. Last update 10bf721...152b685

@jreback jreback added IO CSV read_csv, to_csv Unicode Unicode strings labels Aug 31, 2016
@@ -457,6 +457,7 @@ API changes
- ``pd.Timedelta(None)`` is now accepted and will return ``NaT``, mirroring ``pd.Timestamp`` (:issue:`13687`)
- ``Timestamp``, ``Period``, ``DatetimeIndex``, ``PeriodIndex`` and ``.dt`` accessor have gained a ``.is_leap_year`` property to check whether the date belongs to a leap year. (:issue:`13727`)
- ``pd.read_hdf`` will now raise a ``ValueError`` instead of ``KeyError``, if a mode other than ``r``, ``r+`` and ``a`` is supplied. (:issue:`13623`)
- ``pd.read_csv()`` in the C engine will now issue a ``ParserWarning`` or raise a ``ValueError`` when ``sep`` encoded is more than one character long (:issue:`14065`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the idea of a warning when sep is invalid, can you just raise?. when does this happen?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback : This behaviour is consistent with whenever we passing a delimiter that has len > 1. If you specifically say engine='c', it will raise. If you don't specify, because engine defaults to c, it will warn instead. You can see that here for other options that are unsupported by the C engine.

@jreback
Copy link
Contributor

jreback commented Aug 31, 2016

do we need an additional test for this?

@gfyoung
Copy link
Member Author

gfyoung commented Aug 31, 2016

Test is in test_unsuppported.py, and that should be sufficient.

@jreback jreback added this to the 0.19.0 milestone Aug 31, 2016
@jreback
Copy link
Contributor

jreback commented Aug 31, 2016

thanks!

@jreback jreback closed this in 5db52f0 Aug 31, 2016
@gfyoung gfyoung deleted the multi-char-encoded branch August 31, 2016 16:16
jreback added a commit that referenced this pull request Sep 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO CSV read_csv, to_csv Unicode Unicode strings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants