|
10 | 10 | p = doctools.TablePlotter()
|
11 | 11 |
|
12 | 12 |
|
13 |
| -**************************** |
14 |
| -Merge, join, and concatenate |
15 |
| -**************************** |
| 13 | +************************************ |
| 14 | +Merge, join, concatenate and compare |
| 15 | +************************************ |
16 | 16 |
|
17 | 17 | pandas provides various facilities for easily combining together Series or
|
18 | 18 | DataFrame with various kinds of set logic for the indexes
|
19 | 19 | and relational algebra functionality in the case of join / merge-type
|
20 | 20 | operations.
|
21 | 21 |
|
| 22 | +pandas also provides utilities to compare two Series or DataFrame |
| 23 | +and summarize their differences. |
| 24 | + |
22 | 25 | .. _merging.concat:
|
23 | 26 |
|
24 | 27 | Concatenating objects
|
@@ -1477,3 +1480,61 @@ exclude exact matches on time. Note that though we exclude the exact matches
|
1477 | 1480 | by='ticker',
|
1478 | 1481 | tolerance=pd.Timedelta('10ms'),
|
1479 | 1482 | allow_exact_matches=False)
|
| 1483 | +
|
| 1484 | +.. _merging.compare: |
| 1485 | + |
| 1486 | +Comparing objects |
| 1487 | +----------------- |
| 1488 | + |
| 1489 | +The :meth:`~Series.compare` and :meth:`~DataFrame.compare` methods allow you to |
| 1490 | +compare two Series or DataFrame, respectively, and summarize their differences. |
| 1491 | + |
| 1492 | +This feature was added in :ref:`V1.1.0 <whatsnew_110.dataframe_or_series_comparing>`. |
| 1493 | + |
| 1494 | +For example, you might want to compare two ``DataFrame`` objects and stack their differences |
| 1495 | +side by side. |
| 1496 | + |
| 1497 | +.. ipython:: python |
| 1498 | +
|
| 1499 | + df = pd.DataFrame( |
| 1500 | + { |
| 1501 | + "col1": ["a", "a", "b", "b", "a"], |
| 1502 | + "col2": [1.0, 2.0, 3.0, np.nan, 5.0], |
| 1503 | + "col3": [1.0, 2.0, 3.0, 4.0, 5.0] |
| 1504 | + }, |
| 1505 | + columns=["col1", "col2", "col3"], |
| 1506 | + ) |
| 1507 | + df |
| 1508 | +
|
| 1509 | +.. ipython:: python |
| 1510 | +
|
| 1511 | + df2 = df.copy() |
| 1512 | + df2.loc[0, 'col1'] = 'c' |
| 1513 | + df2.loc[2, 'col3'] = 4.0 |
| 1514 | + df2 |
| 1515 | +
|
| 1516 | +.. ipython:: python |
| 1517 | +
|
| 1518 | + df.compare(df2) |
| 1519 | +
|
| 1520 | +By default, if two corresponding values are equal, they will be shown as ``NaN``. |
| 1521 | +Furthermore, if all values in an entire row / column are equal, the row / column will be |
| 1522 | +omitted from the result. The remaining differences will be aligned on columns. |
| 1523 | + |
| 1524 | +If you wish, you may choose to stack the differences on rows. |
| 1525 | + |
| 1526 | +.. ipython:: python |
| 1527 | +
|
| 1528 | + df.compare(df2, align_axis=0) |
| 1529 | +
|
| 1530 | +If you wish to keep all original rows and columns, set the ``keep_shape`` argument |
| 1531 | +to ``True``. |
| 1532 | + |
| 1533 | +.. ipython:: python |
| 1534 | +
|
| 1535 | + df.compare(df2, keep_shape=True) |
| 1536 | +
|
| 1537 | +You may also keep all the original values even if they are equal. |
| 1538 | + |
| 1539 | +.. ipython:: python |
| 1540 | + df.compare(df2, keep_shape=True, keep_equal=True) |
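
As a self-contained recap of the options described above, the following sketch rebuilds the two frames from this section and checks the shape of each result. It assumes pandas 1.1.0 or later and NumPy are installed; the variable names ``diff``, ``stacked``, ``full`` and ``full_equal`` are illustrative only and do not appear in the documentation being changed.

```python
import numpy as np
import pandas as pd

# Same sample data as in the section above.
df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b", "a"],
        "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
        "col3": [1.0, 2.0, 3.0, 4.0, 5.0],
    },
    columns=["col1", "col2", "col3"],
)

# Change one value in col1 and one in col3; col2 is left untouched.
df2 = df.copy()
df2.loc[0, "col1"] = "c"
df2.loc[2, "col3"] = 4.0

# Default: only rows and columns containing a difference are kept,
# and each kept column is split into "self" and "other".
diff = df.compare(df2)

# Stack "self" and "other" on the row axis instead of the column axis.
stacked = df.compare(df2, align_axis=0)

# Keep every original row and column; equal values are shown as NaN.
full = df.compare(df2, keep_shape=True)

# Keep every row and column and show the original values even where equal.
full_equal = df.compare(df2, keep_shape=True, keep_equal=True)

print(diff)
print(stacked)
print(full.shape, full_equal.shape)
```

Note that ``col2`` is dropped from the default result even though it contains ``NaN``: missing values in the same position on both sides are treated as equal.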