Commit 5da2e44

harisbal committed
Merge branch 'master' into multi-index-merge
# Conflicts:
# doc/source/merging.rst
# doc/source/whatsnew/v0.23.0.txt
# pandas/core/frame.py
# pandas/core/generic.py
# pandas/core/indexes/base.py
# pandas/core/ops.py
# pandas/core/reshape/merge.py
# pandas/plotting/_misc.py
# pandas/tests/reshape/merge/test_merge.py
2 parents 593d6cb + 4708db0 commit 5da2e44

7 files changed, +477 -124 lines changed

doc/source/merging.rst (+50 -50)

@@ -31,10 +31,10 @@ operations.
 Concatenating objects
 ---------------------

-The :func:`~pandas.concat` function (in the main pandas namespace) does all of
-the heavy lifting of performing concatenation operations along an axis while
-performing optional set logic (union or intersection) of the indexes (if any) on
-the other axes. Note that I say "if any" because there is only a single possible
+The :func:`~pandas.concat` function (in the main pandas namespace) does all of
+the heavy lifting of performing concatenation operations along an axis while
+performing optional set logic (union or intersection) of the indexes (if any) on
+the other axes. Note that I say "if any" because there is only a single possible
 axis of concatenation for Series.

 Before diving into all of the details of ``concat`` and what it can do, here is
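To make the description above concrete, here is a minimal sketch, assuming a current pandas release and made-up frames, of ``concat`` stacking rows along ``axis=0`` while aligning the column labels of the "other" axis, plus the single-axis ``Series`` case:

```python
import pandas as pd

# Illustrative inputs that share the column labels "A" and "B".
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]}, index=[2, 3])

# Stack the rows (axis=0); the columns, i.e. the other axis, are aligned by label.
stacked = pd.concat([df1, df2], axis=0)

# A Series has a single axis, so there is no other axis to apply set logic to.
s = pd.concat([pd.Series([1, 2]), pd.Series([3])], ignore_index=True)

print(stacked)
print(s.tolist())
```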
@@ -109,9 +109,9 @@ some configurable handling of "what to do with the other axes":
 to the actual data concatenation.
 - ``copy`` : boolean, default True. If False, do not copy data unnecessarily.

-Without a little bit of context many of these arguments don't make much sense.
-Let's revisit the above example. Suppose we wanted to associate specific keys
-with each of the pieces of the chopped up DataFrame. We can do this using the
+Without a little bit of context many of these arguments don't make much sense.
+Let's revisit the above example. Suppose we wanted to associate specific keys
+with each of the pieces of the chopped up DataFrame. We can do this using the
 ``keys`` argument:

 .. ipython:: python
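The ``keys`` argument can be sketched as follows, with a hypothetical six-row frame chopped into three pieces; each key becomes the outermost level of a ``MultiIndex``, so a piece can be selected back by its key:

```python
import pandas as pd

# A made-up frame, chopped into three pieces by row position.
df = pd.DataFrame({"A": range(6), "B": range(6, 12)})
pieces = [df.iloc[:2], df.iloc[2:4], df.iloc[4:]]

# Each key labels the piece it is paired with and becomes the outer index level.
result = pd.concat(pieces, keys=["x", "y", "z"])

# The key can then be used to pull one piece back out.
print(result.loc["y"])
```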
@@ -138,9 +138,9 @@ It's not a stretch to see how this can be very useful. More detail on this
 functionality below.

 .. note::
-It is worth noting that :func:`~pandas.concat` (and therefore
-:meth:`~DataFrame.append`) makes a full copy of the data, and that constantly
-reusing this function can create a significant performance hit. If you need
+It is worth noting that :func:`~pandas.concat` (and therefore
+:meth:`~DataFrame.append`) makes a full copy of the data, and that constantly
+reusing this function can create a significant performance hit. If you need
 to use the operation over several datasets, use a list comprehension.

 ::
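A sketch of the pattern the note recommends: build every piece first (here with a hypothetical ``load_piece`` helper standing in for reading one dataset) and call ``concat`` once, instead of growing a result inside a loop and paying for a full copy on every iteration:

```python
import pandas as pd


def load_piece(i):
    # Hypothetical stand-in for reading one dataset (e.g. one file per batch).
    return pd.DataFrame({"batch": [i, i], "value": [10 * i, 10 * i + 1]})


# Build every piece first with a list comprehension...
frames = [load_piece(i) for i in range(5)]
# ...then concatenate a single time; ignore_index renumbers the rows 0..n-1.
result = pd.concat(frames, ignore_index=True)
print(result.shape)
```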
@@ -153,7 +153,7 @@ Set logic on the other axes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

 When gluing together multiple DataFrames, you have a choice of how to handle
-the other axes (other than the one being concatenated). This can be done in
+the other axes (other than the one being concatenated). This can be done in
 the following three ways:

 - Take the (sorted) union of them all, ``join='outer'``. This is the default
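A small sketch of the two ``join`` options on made-up frames whose columns only partly overlap. ``sort=False`` is passed explicitly so the column order does not depend on the default, which has changed across pandas versions:

```python
import pandas as pd

# Illustrative frames whose columns only partly overlap (both contain "B").
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["a", "b"])
df2 = pd.DataFrame({"B": [5, 6], "C": [7, 8]}, index=["c", "d"])

# join="outer": union of the column labels; missing cells are filled with NaN.
outer = pd.concat([df1, df2], join="outer", sort=False)

# join="inner": only the column labels present in every input are kept.
inner = pd.concat([df1, df2], join="inner")

print(outer)
print(inner)
```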
@@ -216,8 +216,8 @@ DataFrame:
 Concatenating using ``append``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Useful shortcuts to :func:`~pandas.concat` are the :meth:`~DataFrame.append`
-instance methods on ``Series`` and ``DataFrame``. These methods actually predated
+Useful shortcuts to :func:`~pandas.concat` are the :meth:`~DataFrame.append`
+instance methods on ``Series`` and ``DataFrame``. These methods actually predated
 ``concat``. They concatenate along ``axis=0``, namely the index:

 .. ipython:: python
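Because ``DataFrame.append`` was deprecated in pandas 1.4 and removed in 2.0, the following sketch reproduces ``df1.append(df2)`` with ``concat`` on made-up data, and also checks that, unlike ``list.append``, the input is left unmodified:

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"]}, index=[0, 1])
df2 = pd.DataFrame({"A": ["A2", "A3"]}, index=[2, 3])

# Same rows that df1.append(df2) produced: stacked along axis=0 (the index).
# pd.concat is used because DataFrame.append no longer exists in pandas >= 2.0.
result = pd.concat([df1, df2], axis=0)

# Nothing is modified in place: df1 still has its original two rows.
print(len(df1), len(result))
```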
@@ -263,8 +263,8 @@ need to be:

 .. note::

-Unlike the :py:meth:`~list.append` method, which appends to the original list
-and returns ``None``, :meth:`~DataFrame.append` here **does not** modify
+Unlike the :py:meth:`~list.append` method, which appends to the original list
+and returns ``None``, :meth:`~DataFrame.append` here **does not** modify
 ``df1`` and returns its copy with ``df2`` appended.

 .. _merging.ignore_index:
@@ -362,9 +362,9 @@ Passing ``ignore_index=True`` will drop all name references.
 More concatenating with group keys
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-A fairly common use of the ``keys`` argument is to override the column names
+A fairly common use of the ``keys`` argument is to override the column names
 when creating a new ``DataFrame`` based on existing ``Series``.
-Notice how the default behaviour consists of letting the resulting ``DataFrame``
+Notice how the default behaviour consists of letting the resulting ``DataFrame``
 inherit the parent ``Series``' name, when these existed.

 .. ipython:: python
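A sketch of the ``keys`` override on three illustrative ``Series``, only one of which is named; without ``keys`` the named one keeps its name and the unnamed ones get positional labels:

```python
import pandas as pd

s3 = pd.Series([0, 1, 2, 3], name="foo")
s4 = pd.Series([0, 1, 2, 3])  # unnamed
s5 = pd.Series([0, 1, 4, 5])  # unnamed

# Default: named Series keep their name, unnamed ones are numbered 0, 1, ...
default = pd.concat([s3, s4, s5], axis=1)

# keys= overrides all of the resulting column names at once.
renamed = pd.concat([s3, s4, s5], axis=1, keys=["red", "blue", "yellow"])

print(default.columns.tolist(), renamed.columns.tolist())
```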
@@ -460,7 +460,7 @@ Appending rows to a DataFrame
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 While not especially efficient (since a new object must be created), you can
-append a single row to a ``DataFrame`` by passing a ``Series`` or dict to
+append a single row to a ``DataFrame`` by passing a ``Series`` or dict to
 ``append``, which returns a new ``DataFrame`` as above.

 .. ipython:: python
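On pandas 2.0 and later, where ``append`` no longer exists, one way to add a single row given as a dict is to wrap it in a one-row ``DataFrame`` and concatenate. A minimal sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
new_row = {"A": 5, "B": 6}  # illustrative row given as a dict

# Wrap the dict in a one-row DataFrame, then concatenate; ignore_index=True
# renumbers the result 0..n-1. A new object is still created, as with append.
result = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(result)
```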
@@ -505,15 +505,15 @@ pandas has full-featured, **high performance** in-memory join operations
 idiomatically very similar to relational databases like SQL. These methods
 perform significantly better (in some cases well over an order of magnitude
 better) than other open source implementations (like ``base::merge.data.frame``
-in R). The reason for this is careful algorithmic design and the internal layout
+in R). The reason for this is careful algorithmic design and the internal layout
 of the data in ``DataFrame``.

 See the :ref:`cookbook<cookbook.merge>` for some advanced strategies.

 Users who are familiar with SQL but new to pandas might be interested in a
 :ref:`comparison with SQL<compare_with_sql.join>`.

-pandas provides a single function, :func:`~pandas.merge`, as the entry point for
+pandas provides a single function, :func:`~pandas.merge`, as the entry point for
 all standard database join operations between ``DataFrame`` objects:

 ::
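A minimal sketch of ``merge`` as the single entry point for joins, using made-up ``left``/``right`` frames with one shared key column:

```python
import pandas as pd

left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
right = pd.DataFrame({"key": ["K0", "K1", "K3"], "B": ["B0", "B1", "B3"]})

# Inner join (the default how): only keys present in both frames survive.
inner = pd.merge(left, right, on="key")

# Outer join: all keys from both frames; unmatched cells become NaN.
outer = pd.merge(left, right, on="key", how="outer")

print(inner)
print(outer)
```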
@@ -582,7 +582,7 @@ and ``right`` is a subclass of DataFrame, the return type will still be
 ``DataFrame``.

 ``merge`` is a function in the pandas namespace, and it is also available as a
-``DataFrame`` instance method :meth:`~DataFrame.merge`, with the calling
+``DataFrame`` instance method :meth:`~DataFrame.merge`, with the calling
 ``DataFrame`` being implicitly considered the left object in the join.

 The related :meth:`~DataFrame.join` method uses ``merge`` internally for the
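The function and method forms can be compared directly; this sketch, on illustrative data, checks that they give the same result, with the caller of ``DataFrame.merge`` acting as the left object (so the inner join follows its key order):

```python
import pandas as pd

left = pd.DataFrame({"key": ["K0", "K1"], "A": [1, 2]})
right = pd.DataFrame({"key": ["K1", "K0"], "B": [3, 4]})

via_function = pd.merge(left, right, on="key")
via_method = left.merge(right, on="key")  # `left` is the implicit left object

print(via_method)
```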
@@ -594,7 +594,7 @@ Brief primer on merge methods (relational algebra)

 Experienced users of relational databases like SQL will be familiar with the
 terminology used to describe join operations between two SQL-table like
-structures (``DataFrame`` objects). There are several cases to consider which
+structures (``DataFrame`` objects). There are several cases to consider which
 are very important to understand:

 - **one-to-one** joins: for example when joining two ``DataFrame`` objects on
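The difference between these join cases shows up in the row counts. This sketch uses made-up ``orders``/``customers`` tables for a many-to-one join, and two tiny frames with a repeated key for many-to-many, where every matching pair of rows produces an output row:

```python
import pandas as pd

# Many-to-one: several orders per customer, exactly one row per customer.
orders = pd.DataFrame({"cust": ["c1", "c1", "c2"], "amount": [10, 20, 30]})
customers = pd.DataFrame({"cust": ["c1", "c2"], "name": ["Ann", "Bob"]})
many_to_one = pd.merge(orders, customers, on="cust")

# Many-to-many: the key "x" repeats on both sides, so every left row pairs
# with every right row sharing the key (2 * 3 = 6 rows).
left_dup = pd.DataFrame({"k": ["x", "x"], "L": [1, 2]})
right_dup = pd.DataFrame({"k": ["x", "x", "x"], "R": [3, 4, 5]})
many_to_many = pd.merge(left_dup, right_dup, on="k")

print(len(many_to_one), len(many_to_many))
```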
@@ -634,8 +634,8 @@ key combination:
 labels=['left', 'right'], vertical=False);
 plt.close('all');

-Here is a more complicated example with multiple join keys. Only the keys
-appearing in ``left`` and ``right`` are present (the intersection), since
+Here is a more complicated example with multiple join keys. Only the keys
+appearing in ``left`` and ``right`` are present (the intersection), since
 ``how='inner'`` by default.

 .. ipython:: python
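The multi-key inner join can be sketched as follows with illustrative data: only ``(key1, key2)`` pairs present on both sides survive, and a pair that appears twice on the right yields two output rows:

```python
import pandas as pd

left = pd.DataFrame({"key1": ["K0", "K0", "K1", "K2"],
                     "key2": ["K0", "K1", "K0", "K1"],
                     "A": ["A0", "A1", "A2", "A3"]})
right = pd.DataFrame({"key1": ["K0", "K1", "K1", "K2"],
                      "key2": ["K0", "K0", "K0", "K0"],
                      "B": ["B0", "B1", "B2", "B3"]})

# how="inner" is the default: keep only key pairs found in both frames.
result = pd.merge(left, right, on=["key1", "key2"])
print(result)
```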
@@ -751,13 +751,13 @@ Checking for duplicate keys

 .. versionadded:: 0.21.0

-Users can use the ``validate`` argument to automatically check whether there
-are unexpected duplicates in their merge keys. Key uniqueness is checked before
-merge operations and so should protect against memory overflows. Checking key
-uniqueness is also a good way to ensure user data structures are as expected.
+Users can use the ``validate`` argument to automatically check whether there
+are unexpected duplicates in their merge keys. Key uniqueness is checked before
+merge operations and so should protect against memory overflows. Checking key
+uniqueness is also a good way to ensure user data structures are as expected.

-In the following example, there are duplicate values of ``B`` in the right
-``DataFrame``. As this is not a one-to-one merge -- as specified in the
+In the following example, there are duplicate values of ``B`` in the right
+``DataFrame``. As this is not a one-to-one merge -- as specified in the
 ``validate`` argument -- an exception will be raised.


@@ -770,11 +770,11 @@ In the following example, there are duplicate values of ``B`` in the right

 In [53]: result = pd.merge(left, right, on='B', how='outer', validate="one_to_one")
 ...
-MergeError: Merge keys are not unique in right dataset; not a one-to-one merge
+MergeError: Merge keys are not unique in right dataset; not a one-to-one merge

-If the user is aware of the duplicates in the right ``DataFrame`` but wants to
-ensure there are no duplicates in the left DataFrame, one can use the
-``validate='one_to_many'`` argument instead, which will not raise an exception.
+If the user is aware of the duplicates in the right ``DataFrame`` but wants to
+ensure there are no duplicates in the left DataFrame, one can use the
+``validate='one_to_many'`` argument instead, which will not raise an exception.

 .. ipython:: python

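A sketch of both ``validate`` outcomes described above, on illustrative frames where only the right-hand ``B`` column has duplicates; ``MergeError`` is imported from ``pandas.errors``:

```python
import pandas as pd
from pandas.errors import MergeError

left = pd.DataFrame({"A": [1, 2], "B": [1, 2]})
right = pd.DataFrame({"A": [4, 5, 6], "B": [2, 2, 2]})  # "B" repeats on the right

# "one_to_one" requires unique keys on both sides, so this raises.
try:
    pd.merge(left, right, on="B", how="outer", validate="one_to_one")
    raised = False
except MergeError as exc:
    raised = True
    print(exc)

# "one_to_many" only requires unique keys on the left, so this succeeds.
ok = pd.merge(left, right, on="B", how="outer", validate="one_to_many")
print(ok)
```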
@@ -786,8 +786,8 @@ ensure there are no duplicates in the left DataFrame, one can use the
 The merge indicator
 ~~~~~~~~~~~~~~~~~~~

-:func:`~pandas.merge` accepts the argument ``indicator``. If ``True``, a
-Categorical-type column called ``_merge`` will be added to the output object
+:func:`~pandas.merge` accepts the argument ``indicator``. If ``True``, a
+Categorical-type column called ``_merge`` will be added to the output object
 that takes on values:

 =================================== ================
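A minimal sketch of ``indicator=True`` on made-up frames; each output row is labelled according to which input(s) its key was found in:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [0, 1], "col_left": ["a", "b"]})
df2 = pd.DataFrame({"col1": [1, 2, 2], "col_right": [2, 2, 2]})

# indicator=True adds a categorical "_merge" column describing each row's source.
result = pd.merge(df1, df2, on="col1", how="outer", indicator=True)
print(result[["col1", "_merge"]])
```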
@@ -895,7 +895,7 @@ Joining on index
 ~~~~~~~~~~~~~~~~

 :meth:`DataFrame.join` is a convenient method for combining the columns of two
-potentially differently-indexed ``DataFrames`` into a single result
+potentially differently-indexed ``DataFrames`` into a single result
 ``DataFrame``. Here is a very basic example:

 .. ipython:: python
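A sketch of an index-on-index ``join`` with illustrative frames. ``join`` defaults to ``how='left'``, so labels that exist only in ``right`` are dropped unless another ``how`` is passed:

```python
import pandas as pd

left = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=["K0", "K1", "K2"])
right = pd.DataFrame({"C": ["C0", "C2", "C3"]}, index=["K0", "K2", "K3"])

# Default how="left": keep every row of `left`, align `right` on the index.
joined = left.join(right)

# how="outer": union of both indexes.
joined_outer = left.join(right, how="outer")

print(joined)
```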
@@ -975,9 +975,9 @@ indexes:
 Joining key columns on an index
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-:meth:`~DataFrame.join` takes an optional ``on`` argument which may be a column
+:meth:`~DataFrame.join` takes an optional ``on`` argument which may be a column
 or multiple column names, which specifies that the passed ``DataFrame`` is to be
-aligned on that column in the ``DataFrame``. These two function calls are
+aligned on that column in the ``DataFrame``. These two function calls are
 completely equivalent:

 ::
@@ -987,7 +987,7 @@ completely equivalent:
 how='left', sort=False)

 Obviously you can choose whichever form you find more convenient. For
-many-to-one joins (where one of the ``DataFrame`` objects is already indexed by the
+many-to-one joins (where one of the ``DataFrame`` objects is already indexed by the
 join key), using ``join`` may be more convenient. Here is a simple example:

 .. ipython:: python
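The equivalence stated in these two hunks can be checked with a short sketch on made-up data: ``join(on=...)`` for a many-to-one key gives the same frame as the spelled-out ``merge`` call with ``left_on``, ``right_index=True`` and ``how='left'``, and keeps the caller's index:

```python
import pandas as pd

left = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"],
                     "key": ["K0", "K1", "K0", "K1"]})
right = pd.DataFrame({"C": ["C0", "C1"]}, index=["K0", "K1"])  # indexed by the key

# Join the "key" column of `left` against the index of `right`.
via_join = left.join(right, on="key")

# The equivalent merge call described in the text.
via_merge = pd.merge(left, right, left_on="key", right_index=True,
                     how="left", sort=False)

print(via_join)
```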
@@ -1266,7 +1266,7 @@ similarly.
 Joining multiple DataFrame or Panel objects
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-A list or tuple of ``DataFrames`` can also be passed to :meth:`~DataFrame.join`
+A list or tuple of ``DataFrames`` can also be passed to :meth:`~DataFrame.join`
 to join them together on their indexes.

 .. ipython:: python
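A sketch of joining a list of frames on their (identical, unique) indexes in one call, using illustrative data:

```python
import pandas as pd

left = pd.DataFrame({"A": [1, 2]}, index=["K0", "K1"])
right = pd.DataFrame({"B": [3, 4]}, index=["K0", "K1"])
right2 = pd.DataFrame({"C": [5, 6]}, index=["K0", "K1"])

# Passing a list joins all frames on their indexes in a single call.
result = left.join([right, right2])
print(result)
```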
@@ -1288,7 +1288,7 @@ Merging together values within Series or DataFrame columns
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Another fairly common situation is to have two like-indexed (or similarly
-indexed) ``Series`` or ``DataFrame`` objects and wanting to "patch" values in
+indexed) ``Series`` or ``DataFrame`` objects and wanting to "patch" values in
 one object from values for matching indices in the other. Here is an example:

 .. ipython:: python
@@ -1313,7 +1313,7 @@ For this, use the :meth:`~DataFrame.combine_first` method:
 plt.close('all');

 Note that this method only takes values from the right ``DataFrame`` if they are
-missing in the left ``DataFrame``. A related method, :meth:`~DataFrame.update`,
+missing in the left ``DataFrame``. A related method, :meth:`~DataFrame.update`,
 alters non-NA values in place:

 .. ipython:: python
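A sketch contrasting the two methods on made-up frames: ``combine_first`` returns a new object that only fills the caller's holes, while ``update`` overwrites the caller in place with every non-NA value from the other frame and returns ``None``:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[np.nan, 3.0, 5.0],
                    [-4.6, np.nan, np.nan],
                    [np.nan, 7.0, np.nan]])
df2 = pd.DataFrame([[-42.6, np.nan, -8.2],
                    [-5.0, 1.6, 4.0]], index=[1, 2])

# combine_first: new object; df1's values win, df2 only fills df1's NaNs.
patched = df1.combine_first(df2)

# update: modifies df1 in place with every non-NA value of df2.
df1.update(df2)

print(patched)
print(df1)
```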
@@ -1365,15 +1365,15 @@ Merging AsOf

 .. versionadded:: 0.19.0

-A :func:`merge_asof` is similar to an ordered left-join except that we match on
-nearest key rather than equal keys. For each row in the ``left`` ``DataFrame``,
-we select the last row in the ``right`` ``DataFrame`` whose ``on`` key is less
+A :func:`merge_asof` is similar to an ordered left-join except that we match on
+nearest key rather than equal keys. For each row in the ``left`` ``DataFrame``,
+we select the last row in the ``right`` ``DataFrame`` whose ``on`` key is less
 than or equal to the left's key. Both DataFrames must be sorted by the key.

-Optionally an asof merge can perform a group-wise merge. This matches the
+Optionally an asof merge can perform a group-wise merge. This matches the
 ``by`` key equally, in addition to the nearest match on the ``on`` key.

-For example, we might have ``trades`` and ``quotes`` and we want to ``asof``
+For example, we might have ``trades`` and ``quotes`` and we want to ``asof``
 merge them.

 .. ipython:: python
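A minimal ``merge_asof`` sketch with a few made-up trades and quotes, both sorted on ``time``. With the defaults, each trade takes the most recent quote for the same ``ticker`` at or before its timestamp, so an exact time match is allowed:

```python
import pandas as pd

trades = pd.DataFrame({
    "time": pd.to_datetime(["2016-05-25 13:30:00.023",
                            "2016-05-25 13:30:00.038",
                            "2016-05-25 13:30:00.048"]),
    "ticker": ["MSFT", "MSFT", "GOOG"],
    "price": [51.95, 51.95, 720.77],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2016-05-25 13:30:00.023",
                            "2016-05-25 13:30:00.030",
                            "2016-05-25 13:30:00.041",
                            "2016-05-25 13:30:00.048"]),
    "ticker": ["MSFT", "GOOG", "MSFT", "GOOG"],
    "bid": [51.95, 720.50, 51.97, 720.51],
})

# For each trade, take the latest quote with the same ticker whose time is
# at or before the trade time (direction="backward" is the default).
result = pd.merge_asof(trades, quotes, on="time", by="ticker")
print(result)
```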
@@ -1432,8 +1432,8 @@ We only asof within ``2ms`` between the quote time and the trade time.
 by='ticker',
 tolerance=pd.Timedelta('2ms'))

-We only asof within ``10ms`` between the quote time and the trade time and we
-exclude exact matches on time. Note that though we exclude the exact matches
+We only asof within ``10ms`` between the quote time and the trade time and we
+exclude exact matches on time. Note that though we exclude the exact matches
 (of the quotes), prior quotes **do** propagate to that point in time.

 .. ipython:: python
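The ``tolerance`` and ``allow_exact_matches`` options can be sketched on a smaller made-up data set: a 2ms window rejects a quote that is 8ms old, while excluding exact matches lets an earlier quote propagate to the trade instead:

```python
import pandas as pd

trades = pd.DataFrame({
    "time": pd.to_datetime(["2016-05-25 13:30:00.023",
                            "2016-05-25 13:30:00.038"]),
    "ticker": ["MSFT", "MSFT"],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2016-05-25 13:30:00.020",
                            "2016-05-25 13:30:00.023",
                            "2016-05-25 13:30:00.030"]),
    "ticker": ["MSFT", "MSFT", "MSFT"],
    "bid": [51.90, 51.95, 51.97],
})

# Only accept a quote if it is at most 2ms older than the trade.
within_2ms = pd.merge_asof(trades, quotes, on="time", by="ticker",
                           tolerance=pd.Timedelta("2ms"))

# Look back up to 10ms but skip quotes stamped exactly at the trade time;
# the earlier quote then propagates forward to that trade.
strict_10ms = pd.merge_asof(trades, quotes, on="time", by="ticker",
                            tolerance=pd.Timedelta("10ms"),
                            allow_exact_matches=False)

print(within_2ms)
print(strict_10ms)
```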

pandas/core/frame.py (+46 -7)

@@ -883,27 +883,66 @@ def dot(self, other):
     @classmethod
     def from_dict(cls, data, orient='columns', dtype=None, columns=None):
         """
-        Construct DataFrame from dict of array-like or dicts
+        Construct DataFrame from dict of array-like or dicts.
+
+        Creates DataFrame object from dictionary by columns or by index
+        allowing dtype specification.

         Parameters
         ----------
         data : dict
-            {field : array-like} or {field : dict}
+            Of the form {field : array-like} or {field : dict}.
         orient : {'columns', 'index'}, default 'columns'
             The "orientation" of the data. If the keys of the passed dict
             should be the columns of the resulting DataFrame, pass 'columns'
             (default). Otherwise if the keys should be rows, pass 'index'.
         dtype : dtype, default None
-            Data type to force, otherwise infer
-        columns: list, default None
-            Column labels to use when orient='index'. Raises a ValueError
-            if used with orient='columns'
+            Data type to force, otherwise infer.
+        columns : list, default None
+            Column labels to use when ``orient='index'``. Raises a ValueError
+            if used with ``orient='columns'``.

             .. versionadded:: 0.23.0

         Returns
         -------
-        DataFrame
+        pandas.DataFrame
+
+        See Also
+        --------
+        DataFrame.from_records : DataFrame from ndarray (structured
+            dtype), list of tuples, dict, or DataFrame
+        DataFrame : DataFrame object creation using constructor
+
+        Examples
+        --------
+        By default the keys of the dict become the DataFrame columns:
+
+        >>> data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
+        >>> pd.DataFrame.from_dict(data)
+           col_1 col_2
+        0      3     a
+        1      2     b
+        2      1     c
+        3      0     d
+
+        Specify ``orient='index'`` to create the DataFrame using dictionary
+        keys as rows:
+
+        >>> data = {'row_1': [3, 2, 1, 0], 'row_2': ['a', 'b', 'c', 'd']}
+        >>> pd.DataFrame.from_dict(data, orient='index')
+               0  1  2  3
+        row_1  3  2  1  0
+        row_2  a  b  c  d
+
+        When using the 'index' orientation, the column names can be
+        specified manually:
+
+        >>> pd.DataFrame.from_dict(data, orient='index',
+        ...                        columns=['A', 'B', 'C', 'D'])
+               A  B  C  D
+        row_1  3  2  1  0
+        row_2  a  b  c  d
         """
         index = None
         orient = orient.lower()
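A quick check of the ``columns`` behaviour documented in this docstring, assuming a current pandas release and reusing the docstring's own sample dict: labels are accepted together with ``orient='index'``, and combining ``columns`` with ``orient='columns'`` raises ``ValueError``:

```python
import pandas as pd

data = {"row_1": [3, 2, 1, 0], "row_2": ["a", "b", "c", "d"]}

# Works: column labels supplied together with orient='index'.
df = pd.DataFrame.from_dict(data, orient="index", columns=["A", "B", "C", "D"])

# Per the docstring, `columns` combined with orient='columns' raises ValueError.
try:
    pd.DataFrame.from_dict(data, orient="columns", columns=["x"])
    raised = False
except ValueError:
    raised = True

print(df)
```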
