
DOC: Added note to io.rst regarding reading in mixed dtypes #13782


Closed
wants to merge 5 commits into from


wcwagner
Contributor

This is my first time contributing to the docs, so I will gladly take any criticism.

Thanks

type; if something breaks during that process, the engine will go to the
next ``dtype`` and the data is left modified in place. For example,

.. code-block:: python
Contributor


I would rather you use an ipython block and show the dtypes of the result.
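A minimal sketch of what "showing the dtypes of the result" could look like, assuming a recent pandas with the default C parser (which reads in chunks because `low_memory=True` by default). Two assumptions differ from the snippet under review: the CSV is round-tripped through an in-memory `io.StringIO` buffer instead of the `foo` file, and `list(range(...))` is used so the concatenation also works on Python 3. How many rows end up in each chunk is a parser internal, so the exact per-type counts may vary between pandas versions.

```python
import io
import warnings

import pandas as pd

# 500,000 ints, two strings, then 500,000 more ints in a single column.
values = list(range(500000)) + ['a', 'b'] + list(range(500000))
df = pd.DataFrame({'col_1': values})

# Round-trip through an in-memory CSV instead of a file on disk.
buf = io.StringIO(df.to_csv())

with warnings.catch_warnings():
    # The C parser emits a DtypeWarning when chunks infer different dtypes.
    warnings.simplefilter('ignore')
    mixed_df = pd.read_csv(buf, index_col=0)

# The column as a whole is reported as object ...
print(mixed_df['col_1'].dtype)
# ... but it holds Python ints and strs, depending on which chunk a row was in.
print(mixed_df['col_1'].apply(type).value_counts())
```

Expect `object` for the column dtype and both `int` and `str` in the type counts.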

@wcwagner
Contributor Author

@jreback Updated to ipython blocks. I accidentally amended my commit instead of rebasing, but it looks like it doesn't really matter.

@codecov-io

codecov-io commented Jul 25, 2016

Current coverage is 85.23% (diff: 100%)

Merging #13782 into master will not change coverage

@@             master     #13782   diff @@
==========================================
  Files           140        140          
  Lines         50415      50415          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits          42971      42971          
  Misses         7444       7444          
  Partials          0          0          

Powered by Codecov. Last update a3cddfa...7400607

@jorisvandenbossche
Member

> I noticed that there are two Travis CI builds for your commits, so you might want to rebase next time if you can

I don't think that matters: if you rebase (or amend, which is in effect the same, since either way you change the commit) and force push again, you will also get an additional Travis build.

> accidentally amended my commit instead of rebasing, but it looks like it doesn't really matter.

That indeed does not matter. Amending is perfectly fine (making a separate commit and pushing that is also fine); rebasing is only strictly necessary if there are merge conflicts to resolve (or to incorporate changes from master, but that is not essential here).

import pandas as pd
from collections import Counter

df = pd.DataFrame({'col_1': list(range(500000)) + ['a', 'b'] + list(range(500000))})
df.to_csv('foo')
mixed_df = pd.read_csv('foo')
Counter(mixed_df['col_1'].apply(lambda x: type(x)))
Member



Can you use value_counts? Counter is also a nice solution, but I would use the pandas equivalent.
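A small sketch on toy data (hypothetical, not taken from the PR) of the equivalence the reviewer is pointing at: `collections.Counter` and `Series.value_counts` produce the same per-type tallies, but `value_counts` returns a pandas Series sorted by descending count, which sits more naturally in the pandas docs.

```python
from collections import Counter

import pandas as pd

# A toy object column mixing ints and strs, standing in for the parsed CSV.
s = pd.Series([1, 2, 'a', 3, 'b'], dtype=object)

# One entry per row: the Python type of that row's value.
types = s.apply(type)

# Plain-Python tally: a dict-like Counter keyed by type.
via_counter = Counter(types)

# pandas equivalent: a Series indexed by type, sorted by descending count.
via_value_counts = types.value_counts()

print(via_counter)
print(via_value_counts)
```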

@jorisvandenbossche
Member

@wcwagner I added some comments, but it is a really nice addition to the docs!

@wcwagner
Contributor Author

@jorisvandenbossche Thanks for the help!

import pandas as pd

df = pd.DataFrame({'col_1': list(range(500000)) + ['a', 'b'] + list(range(500000))})
df.to_csv('foo')
mixed_df = pd.read_csv('foo')
mixed_df['col_1'].apply(lambda x: type(x)).value_counts()
Contributor



Tiny nit: mixed_df['col_1'].apply(type).value_counts() is more idiomatic.
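Once `apply(type).value_counts()` has revealed a mixed column, the linked issue also asks how to handle it. The sketch below shows two common options under stated assumptions: the toy CSV text is hypothetical, and these remedies are illustrative rather than part of the diff under review. Passing `dtype=` to `read_csv` fixes one type for the column up front, while `pd.to_numeric(..., errors='coerce')` converts after reading and turns unparseable entries into NaN.

```python
import io

import pandas as pd

# Hypothetical CSV with numbers and two stray strings in one column.
csv_text = "col_1\n1\n2\na\n3\nb\n"

# Option 1: force a single dtype for the whole column at read time.
as_str = pd.read_csv(io.StringIO(csv_text), dtype={'col_1': str})

# Option 2: read normally, then coerce to numbers; 'a' and 'b' become NaN.
raw = pd.read_csv(io.StringIO(csv_text))
as_num = pd.to_numeric(raw['col_1'], errors='coerce')

# Each result now holds a single kind of value per column.
print(as_str['col_1'].apply(type).value_counts())
print(as_num.dtype)  # a float dtype, since NaN needs a float column
```

Which option fits depends on whether the stray values are data to keep (read as strings) or anomalies to discard (coerce to NaN).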

jreback added this to the 0.19.0 milestone Jul 26, 2016
@jreback
Contributor

jreback commented Jul 26, 2016

@wcwagner Looks really good. Just add some more links and I think it will be good.

I would also add a note in whatsnew (maybe under enhancements) with a link to the new documentation (e.g. saying that it has been updated).

@jreback
Contributor

jreback commented Jul 27, 2016

Thanks @wcwagner, great PR!

Check out http://pandas-docs.github.io/pandas-docs-travis/ in a while to see the built docs. If things don't look exactly right, please open a follow-up PR.

Development

Successfully merging this pull request may close these issues.

DOC: document the perils of reading mixed dtypes / how to handle
5 participants