DOC: Added note to io.rst regarding reading in mixed dtypes #13782
type; if something breaks during that process, the engine will go to the
next ``dtype`` and the data is left modified in place. For example,

.. code-block:: python
I would rather you use an ipython block and show the dtypes of the result
@jreback Updated to ipython blocks. I accidentally amended my commit instead of rebasing, but it looks like it doesn't really matter.
Current coverage is 85.23% (diff: 100%)

@@             master   #13782   diff @@
==========================================
  Files           140      140
  Lines         50415    50415
  Methods           0        0
  Messages          0        0
  Branches          0        0
==========================================
  Hits          42971    42971
  Misses         7444     7444
  Partials          0        0
I don't think that matters. If you rebase (or amend, which is in effect much the same: you change the commit) and force push again, you will also get an additional travis build.
Indeed, that does not matter. Amending is perfectly fine (making a separate commit and pushing that is also fine). Rebasing is only strictly necessary if there are merge conflicts to solve (or to incorporate changes from master, but that is not essential here).
df = pd.DataFrame({'col_1': list(range(500000)) + ['a', 'b'] + list(range(500000))})
df.to_csv('foo')
mixed_df = pd.read_csv('foo')
Counter(mixed_df['col_1'].apply(lambda x: type(x)))
Can you use `value_counts`? `Counter` is also a nice solution, but I would use the pandas equivalent.
@wcwagner I added some comments, but it is a really nice addition to the docs!
@jorisvandenbossche Thanks for the help!
df = pd.DataFrame({'col_1': list(range(500000)) + ['a', 'b'] + list(range(500000))})
df.to_csv('foo')
mixed_df = pd.read_csv('foo')
mixed_df['col_1'].apply(lambda x: type(x)).value_counts()
tiny nit: `mixed_df['col_1'].apply(type).value_counts()` is more idiomatic
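To illustrate the two review suggestions above, here is a minimal sketch on a small, hypothetical object-dtype column (`col` is an assumption standing in for `mixed_df['col_1']`, not the data from the PR). It passes `type` directly to `apply` and compares the `value_counts` tallies with the earlier `Counter` approach.

```python
from collections import Counter

import pandas as pd

# Hypothetical small column holding both ints and strings (object dtype),
# standing in for the large mixed-type column read back from CSV.
col = pd.Series([1, 2, "a", "b", 3], dtype=object)

# Pass the builtin directly instead of wrapping it in a lambda.
type_counts = col.apply(type).value_counts()

# The Counter approach from the earlier revision yields the same tallies.
counter_counts = Counter(type(x) for x in col)

print(type_counts.to_dict())
print(dict(counter_counts))
```

Both forms count how many values of each Python type the column holds. The `value_counts` version keeps the result as a pandas Series, which is presumably why it was preferred for the docs.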
@wcwagner looks really good. Just some more links and I think it will be good. I would also add a note in whatsnew (maybe enhancements) with a link to the new documentation (e.g. saying that it's updated).
…xample, clarified type inference process
thanks @wcwagner. Check out http://pandas-docs.github.io/pandas-docs-travis/ in a while to see the built docs. If things don't look exactly right, please issue a follow-up PR.
git diff upstream/master | flake8 --diff
This is my first time contributing to the docs, so I will gladly take any criticism.
Thanks