DOC: Fix docs on merging categoricals. #28185

ivirshup · 2019-08-28T07:35:14Z

Updated docs about merging categorical to reflect current behavior.

WillAyd · 2019-08-28T15:16:53Z

doc/source/user_guide/categorical.rst

@@ -813,16 +813,16 @@ but the categories of these categoricals need to be the same:
    res
    res.dtypes

-In this case the categories are not the same, and therefore an error is raised:
+If the categories are not exactly the same, merging will coerce the


I find this wording a little hard to digest - we are just coercing the categorical to their values as part of this right?

I agree it could be clearer, but I'm struggling to keep the terminology consistent with the existing docs. How about this?

Otherwise the result's dtype will be determined by promotion of the mergands' categories' dtypes.

How about

If the categories are not identical, categorical columns will be cast to a regular array with
the dtype of the underlying categories, which will likely have higher memory usage.

That sounds good. Should there be an example which doesn't cast to object to make that clear? Something like this:

    In [101]: pd.concat([
         ...:     pd.Series([1, 2], dtype="category"),
         ...:     pd.Series([3., 4.], dtype="category")
         ...: ])
    Out[101]:
    0    1.0
    1    2.0
    0    3.0
    1    4.0
    dtype: float64
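For readers following the thread, the behaviour under discussion can be sketched as a small self-contained script. This is only an illustration, not the text that ended up in the docs: the variable names (`same`, `different`, `as_float`) are made up here, and the exact fallback dtype for string categories may depend on the pandas version, so the script only checks that the result is no longer categorical in that case.

```python
import pandas as pd

# Identical categories on both sides: the result should stay categorical.
same = pd.concat([
    pd.Series(["a", "b"], dtype="category"),
    pd.Series(["a", "b"], dtype="category"),
])

# Different categories: the result is cast to a regular (non-categorical)
# array whose dtype comes from the underlying categories.
different = pd.concat([
    pd.Series(["a", "b"], dtype="category"),
    pd.Series(["b", "c"], dtype="category"),
])

# Integer categories combined with float categories: the fallback dtype is
# numeric (float64), not object, as in the example quoted above.
as_float = pd.concat([
    pd.Series([1, 2], dtype="category"),
    pd.Series([3.0, 4.0], dtype="category"),
])

print(same.dtype)
print(different.dtype)
print(as_float.dtype)
```

The third case is the one that motivates an example that does not cast to object: the result dtype follows the categories, so it is not always `object`.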

ivirshup · 2019-09-02T00:54:43Z

I've just noticed there is another section on Concatenation, as opposed to Merging, in the categorical docs. These docs describe different behavior, but still don't quite fit with what actually happens. Having two sections on this seems redundant to me, could these be collapsed to a single section in this PR?

WillAyd · 2019-09-02T21:30:20Z

I would be OK with that. It's what we seem to have set up in the 10 minutes to pandas:

https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#merge

jreback · 2019-10-06T22:44:06Z

@ivirshup can you merge master and we will look again

Fixes pandas-dev#28166.

* Added an example where categoricals are concatenated, resulting in a numeric dtype.
* Removed a table of examples which seemed confusing (most entries were equivalent and gave misleading typing info).

ivirshup · 2019-10-21T01:39:00Z

@jreback, I've merged master (albeit a couple weeks ago now) and consolidated the sections on concatenation and merging.

WillAyd · 2019-10-22T01:39:54Z

doc/source/user_guide/categorical.rst

-   union_categoricals([s1.array, s3.array])
-
-
-Following table summarizes the results of ``Categoricals`` related concatenations.


Can you keep this table? I think it's a good summary of what happens

Though update to reflect current status

Seems logical. Not sure if we have better verbiage than saying category (object) to refer to the categories of the categorical; @TomAugspurger might have thoughts

I think I'd like to keep it close to the repr, since that should make it easier to relate to practice

    >>> pd.Categorical(list("abc"))
    [a, b, c]
    Categories (3, object): [a, b, c]

Sorry for the delay @WillAyd, I thought we were waiting on @TomAugspurger. Just pushed an update. Hopefully I got the table right.
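As background for the table discussion, the `union_categoricals` call visible in the diff context above is the route that keeps a categorical result even when the categories differ. A minimal sketch follows. The definitions of `s1` and `s3` are not shown in this thread, so the values below are assumptions chosen for illustration.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Assumed definitions: two categorical Series with overlapping but
# non-identical categories.
s1 = pd.Series(["a", "b"], dtype="category")
s3 = pd.Series(["b", "c"], dtype="category")

# union_categoricals combines the underlying Categorical arrays and
# returns a Categorical whose categories are the union of the inputs.
combined = union_categoricals([s1.array, s3.array])

print(list(combined))             # ['a', 'b', 'b', 'c']
print(list(combined.categories))  # ['a', 'b', 'c']
```

This differs from plain `pd.concat`, which falls back to the dtype of the categories when they are not identical.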

WillAyd · 2019-11-08T00:06:09Z

@ivirshup can you address comment on table deletion? I think this is close

WillAyd · 2019-11-08T16:10:13Z

lgtm @jreback or @TomAugspurger if you want to look

TomAugspurger · 2019-11-08T16:14:24Z

Won't have time nearterm. Go for it when you're ready.


WillAyd · 2019-11-08T16:47:44Z

Thanks @ivirshup

…ndexing-1row-df

* upstream/master: (109 commits)
  stronger typing in libreduction (pandas-dev#29502)
  API: rename labels to codes (pandas-dev#29509)
  CLN: remove unnecessary type checks (pandas-dev#29517)
  implement _BaseGrouper (pandas-dev#29520)
  CLN: F-string formatting in pandas/_libs/*.pyx (pandas-dev#29527)
  Fixed more SS03 errors (pandas-dev#29540)
  consolidate dim checks (pandas-dev#29536)
  REF: separate out _get_cython_func_and_vals (pandas-dev#29537)
  remove unnecessary exception (pandas-dev#29538)
  TST:Add test to check single category col returns series with single row slice (pandas-dev#29521)
  Make color validation more forgiving (pandas-dev#29122)
  DOC: update bottleneck repo and documentation urls (pandas-dev#29516)
  TST: add test for df construction from dict with tuples (pandas-dev#29497)
  add test for pd.melt dtypes preservation (pandas-dev#29510)
  updated DataFrame.equals docstring (pandas-dev#29496)
  Resolved merge conflicts (pandas-dev#29506)
  DOC: Improved pandas/compact/__init__.py (pandas-dev#29507)
  DOC: Update performance comparison section of io docs (pandas-dev#28890)
  TST: add test for df.where() with category dtype (pandas-dev#29454)
  DOC: Fix docs on merging categoricals. (pandas-dev#28185)
  ...

WillAyd reviewed Aug 28, 2019


WillAyd added the Docs label Aug 28, 2019

WillAyd added this to the 1.0 milestone Aug 28, 2019

ivirshup added 2 commits October 7, 2019 18:26

DOC: Fix docs on merging categoricals.

1b9bb12

Fixes pandas-dev#28166.

DOC: Combine concat/ merge sections for categoricals

9fb6d67

ivirshup force-pushed the docfix-concat-cats branch from 846eb70 to 9fb6d67 Compare October 7, 2019 09:51

DOC: Concat categoricals example with numeric result.

2519b2d

* Added an example where categoricals are concatenated, resulting in a numeric dtype.
* Removed a table of examples which seemed confusing (most entries were equivalent and gave misleading typing info).

WillAyd requested changes Oct 22, 2019


Add back table

1392e67

WillAyd approved these changes Nov 8, 2019


WillAyd merged commit 3b58f48 into pandas-dev:master Nov 8, 2019

Reksbril pushed a commit to Reksbril/pandas that referenced this pull request Nov 18, 2019

DOC: Fix docs on merging categoricals. (pandas-dev#28185)

301a682

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

DOC: Fix docs on merging categoricals. (pandas-dev#28185)

bcf9674

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

DOC: Fix docs on merging categoricals. (pandas-dev#28185)

fac909e
