@@ -90,6 +90,7 @@ By using some special functions:
     df['group'] = pd.cut(df.value, range(0, 105, 10), right=False, labels=labels)
     df.head(10)
 
+See :ref:`documentation <reshaping.tile.cut>` for :func:`~pandas.cut`.
 
 `Categoricals` have a specific ``category`` :ref:`dtype <basics.dtypes>`:
 
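
A note on the `pd.cut` call in the hunk above: with `right=False` each bin is closed on the left and open on the right, so `range(0, 105, 10)` describes the bins [0, 10), [10, 20), ..., [90, 100). The pure-Python sketch below mimics that rule with `bisect`, so it does not depend on a particular pandas version. The helper name `left_closed_bin` and the label format are assumptions made for illustration; neither is part of the commit.

```python
from bisect import bisect_right

# Bin edges as in the hunk above: 0, 10, ..., 100.
edges = list(range(0, 105, 10))
# Hypothetical labels, one per bin: "0 - 9", "10 - 19", ..., "90 - 99".
labels = ["{0} - {1}".format(lo, lo + 9) for lo in edges[:-1]]


def left_closed_bin(value):
    """Return the label of the bin [lo, hi) containing value, or None if outside."""
    idx = bisect_right(edges, value) - 1
    if idx < 0 or idx >= len(labels):
        return None  # below the first edge, or at/above the last edge
    return labels[idx]


binned = [left_closed_bin(v) for v in (0, 9, 10, 99, 100)]
print(binned)
```

Under this rule a value equal to the last edge (100 here) falls outside every bin, which is worth keeping in mind when choosing the edges.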
@@ -331,6 +332,57 @@ Operations
 
 The following operations are possible with categorical data:
 
+Comparing `Categoricals` with other objects is possible in two cases:
+
+* comparing a `Categorical` to another `Categorical`, when `levels` and `ordered` are the same, or
+* comparing a `Categorical` to a scalar.
+
+All other comparisons will raise a TypeError.
+
+.. ipython:: python
+
+    cat = pd.Series(pd.Categorical([1,2,3], levels=[3,2,1]))
+    cat_base = pd.Series(pd.Categorical([2,2,2], levels=[3,2,1]))
+    cat_base2 = pd.Series(pd.Categorical([2,2,2]))
+
+    cat
+    cat_base
+    cat_base2
+
+Comparing to a categorical with the same levels and ordering or to a scalar works:
+
+.. ipython:: python
+
+    cat > cat_base
+    cat > 2
+
+This doesn't work because the levels are not the same:
+
+.. ipython:: python
+
+    try:
+        cat > cat_base2
+    except TypeError as e:
+        print("TypeError: " + str(e))
+
+.. note::
+
+    Comparisons with `Series`, `np.array` or a `Categorical` with different levels or ordering
+    will raise a `TypeError` because custom level ordering would result in two valid results:
+    one taking the ordering into account and one without. If you want to compare a `Categorical`
+    with such a type, you need to be explicit and convert the `Categorical` to values:
+
+    .. ipython:: python
+
+        base = np.array([1,2,3])
+
+        try:
+            cat > base
+        except TypeError as e:
+            print("TypeError: " + str(e))
+
+        np.asarray(cat) > base
+
 Getting the minimum and maximum, if the categorical is ordered:
 
 .. ipython:: python
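
The note in the hunk above says that a custom level ordering makes comparisons against plain values ambiguous. A minimal pure-Python sketch of that ambiguity, using the data from the example, might look like this. The `codes` helper and the two comparison functions are hypothetical names for illustration and are not how pandas implements the comparison.

```python
# Values and custom level order from the example: levels=[3,2,1] means 3 < 2 < 1.
levels = [3, 2, 1]
cat_values = [1, 2, 3]
base_values = [2, 2, 2]


def codes(values, levels):
    """Position of each value within the level order (hypothetical helper)."""
    return [levels.index(v) for v in values]


def greater_by_level_order(a, b, levels):
    """Compare element-wise using each value's position in `levels`."""
    return [x > y for x, y in zip(codes(a, levels), codes(b, levels))]


def greater_by_raw_value(a, b):
    """Compare element-wise using the underlying values, ignoring `levels`."""
    return [x > y for x, y in zip(a, b)]


by_order = greater_by_level_order(cat_values, base_values, levels)
by_value = greater_by_raw_value(cat_values, base_values)
print(by_order)  # [True, False, False]
print(by_value)  # [False, False, True]
```

Both answers are defensible, which is why the documented behavior is to raise for the mixed case and ask for an explicit conversion such as `np.asarray(cat)`.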
@@ -489,34 +541,38 @@ but the levels of these `Categoricals` need to be the same:
 
 .. ipython:: python
 
-    cat = pd.Categorical(["a","b"], levels=["a","b"])
-    vals = [1,2]
-    df = pd.DataFrame({"cats":cat, "vals":vals})
-    res = pd.concat([df,df])
-    res
-    res.dtypes
+    cat = pd.Categorical(["a","b"], levels=["a","b"])
+    vals = [1,2]
+    df = pd.DataFrame({"cats":cat, "vals":vals})
+    res = pd.concat([df,df])
+    res
+    res.dtypes
 
-    df_different = df.copy()
-    df_different["cats"].cat.levels = ["a","b","c"]
+In this case the levels are not the same and so an error is raised:
 
-    try:
-        pd.concat([df,df])
-    except ValueError as e:
-        print("ValueError: " + str(e))
+.. ipython:: python
+
+    df_different = df.copy()
+    df_different["cats"].cat.levels = ["a","b","c"]
+    try:
+        pd.concat([df,df_different])
+    except ValueError as e:
+        print("ValueError: " + str(e))
 
 The same applies to ``df.append(df)``.
 
 Getting Data In/Out
 -------------------
 
-Writing data (`Series`, `Frames`) to a HDF store that contains a ``category`` dtype will currently raise ``NotImplementedError``.
+Writing data (`Series`, `Frames`) to a HDF store that contains a ``category`` dtype will currently
+raise ``NotImplementedError``.
 
 Writing to a CSV file will convert the data, effectively removing any information about the
 `Categorical` (levels and ordering). So if you read back the CSV file you have to convert the
 relevant columns back to `category` and assign the right levels and level ordering.
 
 .. ipython:: python
-   :suppress:
+   :suppress:
 
     from pandas.compat import StringIO
 
@@ -548,7 +604,7 @@ default not included in computations. See the :ref:`Missing Data section
 <missing_data>`
 
 There are two ways a `np.nan` can be represented in `Categorical`: either the value is not
-available or `np.nan` is a valid level.
+available ("missing value") or `np.nan` is a valid level.
 
 .. ipython:: python
 
@@ -560,9 +616,25 @@ available or `np.nan` is a valid level.
     s2.cat.levels = [1,2,np.nan]
     s2
     # three levels, np.nan included
-    # Note: as int arrays can't hold NaN the levels were converted to float
+    # Note: as int arrays can't hold NaN the levels were converted to object
     s2.cat.levels
 
+.. note::
+    Missing value methods like ``isnull`` and ``fillna`` will take both missing values and
+    `np.nan` levels into account:
+
+    .. ipython:: python
+
+        c = pd.Categorical(["a","b",np.nan])
+        c.levels = ["a","b",np.nan]
+        # will be inserted as a NA level:
+        c[0] = np.nan
+        s = pd.Series(c)
+        s
+        pd.isnull(s)
+        s.fillna("a")
+
+
 Gotchas
 -------
 
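
The two representations of `np.nan` described in the hunks above can be modeled in a few lines of plain Python. This is only a sketch under stated assumptions: a categorical is reduced to a `levels` list plus integer `codes`, with `-1` standing for "value not available". `is_null` and `fill_na` are hypothetical helpers for illustration, not pandas functions.

```python
import math

nan = float("nan")
levels = ["a", "b", nan]  # NaN is also a valid level here
codes = [2, 1, -1]        # NaN level, "b", missing value (-1)


def _is_nan(x):
    """True only for float NaN; strings and other objects are never NaN."""
    return isinstance(x, float) and math.isnan(x)


def is_null(levels, codes):
    """True where the code is missing (-1) or points at a NaN level."""
    return [c == -1 or _is_nan(levels[c]) for c in codes]


def fill_na(levels, codes, value):
    """Replace both kinds of NaN by `value`, returning plain values."""
    mask = is_null(levels, codes)
    return [value if null else levels[c] for c, null in zip(codes, mask)]


null_mask = is_null(levels, codes)
filled = fill_na(levels, codes, "a")
print(null_mask)  # [True, False, True]
print(filled)     # ['a', 'b', 'a']
```

The codes `[2, 1, -1]` mirror the example in the note: the first element was assigned `np.nan` after NaN became a level, while the last element was missing from the start.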
@@ -579,15 +651,18 @@ object and not as a low level `numpy` array dtype. This leads to some problems.
     try:
         np.dtype("category")
     except TypeError as e:
-        print("TypeError: " + str(e))
+        print("TypeError: " + str(e))
 
     dtype = pd.Categorical(["a"]).dtype
     try:
         np.dtype(dtype)
     except TypeError as e:
         print("TypeError: " + str(e))
 
-    # dtype comparisons work:
+Dtype comparisons work:
+
+.. ipython:: python
+
     dtype == np.str_
     np.str_ == dtype
 