DOC: Added note to io.rst regarding reading in mixed dtypes #13782

wcwagner · 2016-07-25T02:35:39Z

closes DOC: document the perils of reading mixed dtypes / how to handle #13746
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

This is my first time contributing to the docs, so I will gladly take any criticism.

Thanks

jreback · 2016-07-25T02:41:46Z

doc/source/io.rst

+   type; if something breaks during that process, the engine will go to the
+   next ``dtype`` and the data is left modified in place. For example,
+
+   .. code-block:: python


I would rather you use an ipython block and show the dtypes of the result
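
As an illustration of what showing the dtypes of the result looks like, here is a minimal sketch. It is not the example from the PR: the two-column CSV text and the column names `clean` and `mixed` are assumptions made up for this illustration, and the data is read from an in-memory buffer rather than a file.

```python
import io

import pandas as pd

# Hypothetical CSV: 'clean' holds only integers, 'mixed' has one stray string.
csv_text = "clean,mixed\n1,1\n2,a\n3,3\n"
df = pd.read_csv(io.StringIO(csv_text))

# 'clean' is inferred as an integer dtype. 'mixed' cannot be parsed as a
# number, so the whole column falls back to a non-numeric (string) dtype.
print(df.dtypes)
```

In a file this small the whole column is parsed at once, so every value in `mixed` comes back as a string. The mixed-type situation discussed in this PR concerns much larger files.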

wcwagner · 2016-07-25T03:58:51Z

@jreback Updated to ipython blocks. I accidentally amended my commit instead of rebasing, but it looks like it doesn't really matter.

codecov-io · 2016-07-25T06:55:43Z

Current coverage is 85.23% (diff: 100%)

Merging #13782 into master will not change coverage

@@             master     #13782   diff @@
==========================================
  Files           140        140          
  Lines         50415      50415          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits          42971      42971          
  Misses         7444       7444          
  Partials          0          0

Powered by Codecov. Last update a3cddfa...7400607

jorisvandenbossche · 2016-07-25T07:36:34Z

> I noticed that there are two Travis CI builds for your commits, so you might want to rebase next time if you can

I don't think that matters. If you rebase (or amend, which is in effect much the same: you change the commit) and force-push again, you will also get an additional Travis build.

> accidentally amended my commit instead of rebasing, but it looks like it doesn't really matter.

That indeed does not matter. Amending is perfectly fine (making a separate commit and pushing that is also fine); rebasing is only strictly necessary if there are merge conflicts to resolve (or to incorporate changes from master, but that is not essential here).

jorisvandenbossche · 2016-07-25T07:40:25Z

doc/source/io.rst

+       df = pd.DataFrame({'col_1':range(500000) + ['a', 'b'] + range(500000)})
+       df.to_csv('foo')
+       mixed_df = pd.read_csv('foo')
+       Counter(mixed_df['col_1'].apply(lambda x: type(x)))


Can you use value_counts? Counter is also a nice solution, but I would use the pandas equivalent
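
To make the comparison concrete, the sketch below counts element types both ways. The small object-dtype Series is a hypothetical stand-in for the column read back in the diff, not data from the PR.

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for a column that came back with mixed Python types.
col = pd.Series([1, 2, "a", "b", 3], dtype=object)

# Standard-library approach: count the Python type of each element.
type_counter = Counter(col.apply(type))

# pandas approach: the same information as a Series indexed by type.
type_counts = col.apply(type).value_counts()

print(type_counter)
print(type_counts)
```

Both report the same tallies. The `value_counts` result is itself a Series, so it renders like other pandas output in the docs.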

jorisvandenbossche · 2016-07-25T07:50:31Z

@wcwagner I added some comments, but it is a really nice addition to the docs!

wcwagner · 2016-07-25T14:11:19Z

@jorisvandenbossche Thanks for the help!

jreback · 2016-07-25T21:14:18Z

doc/source/io.rst

+       df = pd.DataFrame({'col_1':range(500000) + ['a', 'b'] + range(500000)})
+       df.to_csv('foo')
+       mixed_df = pd.read_csv('foo')
+       mixed_df['col_1'].apply(lambda x: type(x)).value_counts()


tiny nit: mixed_df['col_1'].apply(type).value_counts() is more idiomatic
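
The diff above concatenates `range(...)` with a list, which only works on Python 2. The following is a hedged Python 3 sketch of the same check using the `apply(type)` form. It makes several assumptions that are not in the PR: the row count is cut to a hypothetical `n = 1000`, the data goes through an in-memory buffer instead of a file named `foo`, and the read uses `low_memory=False` so that the result is a single consistent type. Whether the default chunked read yields mixed types depends on file size and pandas version, so that case is not shown here.

```python
import io

import pandas as pd

n = 1000  # assumption: reduced from the 500000 used in the diff

# list(range(n)) is needed on Python 3, where a range object cannot be
# concatenated with a list using +.
values = list(range(n)) + ["a", "b"] + list(range(n))
df = pd.DataFrame({"col_1": values})

# Round-trip through an in-memory buffer instead of writing a file.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)

# low_memory=False parses each column in one pass rather than in chunks,
# so the column ends up with one consistent element type.
whole_df = pd.read_csv(buf, low_memory=False)
type_counts = whole_df["col_1"].apply(type).value_counts()
print(type_counts)
```

Because two rows are non-numeric, the whole column is read back as strings.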

jreback · 2016-07-26T22:37:30Z

@wcwagner looks really good. Just add some more links and I think it will be good.

I would also add a note in whatsnew (maybe under enhancements) with a link to the new documentation (e.g. saying that it's updated).

jreback · 2016-07-27T10:40:12Z

thanks @wcwagner
great PR!

check out http://pandas-docs.github.io/pandas-docs-travis/ in a while to see the built docs. If things don't look exactly right, pls issue a followup PR.

jreback reviewed Jul 25, 2016
wcwagner force-pushed the doc/13746 branch from 4ec62ee to ff797af Compare July 25, 2016 03:51

jorisvandenbossche reviewed Jul 25, 2016
jorisvandenbossche added the Docs label Jul 25, 2016

jreback reviewed Jul 25, 2016
jreback added this to the 0.19.0 milestone Jul 26, 2016

wcwagner added 2 commits July 26, 2016 19:13

335d043 DOC: Added note to io.rst regarding reading in mixed dtypes
b6e2b64 DOC: Swtiched Counter to value_counts, added low_memory alternative example, clarified type inference process

wcwagner added 3 commits July 26, 2016 19:13

ba4c2ce DOC: Added short commentary on alternatives
8112ad5 DOC: Shortened note, moved alternatives to main text
7400607 DOC: Added refs to basics.dtypes and basics.object_conversion, added whatsnew entry

wcwagner force-pushed the doc/13746 branch from ec3787f to 7400607 Compare July 27, 2016 00:31

jreback closed this in 63285a4 Jul 27, 2016
