DOC: minor tweaks to formatting on SQL comparison page #38941

Merged
merged 1 commit into from
Jan 4, 2021
53 changes: 28 additions & 25 deletions doc/source/getting_started/comparison/comparison_with_sql.rst
@@ -69,31 +69,31 @@ Filtering in SQL is done via a WHERE clause.

.. include:: includes/filtering.rst

Just like SQL's OR and AND, multiple conditions can be passed to a DataFrame using | (OR) and &
(AND).
Just like SQL's ``OR`` and ``AND``, multiple conditions can be passed to a DataFrame using ``|``
(``OR``) and ``&`` (``AND``).

Tips of more than $5 at Dinner meals:

.. code-block:: sql

    -- tips of more than $5.00 at Dinner meals
    SELECT *
    FROM tips
    WHERE time = 'Dinner' AND tip > 5.00;

.. ipython:: python

    # tips of more than $5.00 at Dinner meals
    tips[(tips["time"] == "Dinner") & (tips["tip"] > 5.00)]

Tips by parties of at least 5 diners OR bill total was more than $45:

.. code-block:: sql

    -- tips by parties of at least 5 diners OR bill total was more than $45
    SELECT *
    FROM tips
    WHERE size >= 5 OR total_bill > 45;

.. ipython:: python

    # tips by parties of at least 5 diners OR bill total was more than $45
    tips[(tips["size"] >= 5) | (tips["total_bill"] > 45)]

NULL checking is done using the :meth:`~pandas.Series.notna` and :meth:`~pandas.Series.isna`
@@ -134,7 +134,7 @@ Getting items where ``col1`` IS NOT NULL can be done with :meth:`~pandas.Series.

GROUP BY
--------
In pandas, SQL's GROUP BY operations are performed using the similarly named
In pandas, SQL's ``GROUP BY`` operations are performed using the similarly named
:meth:`~pandas.DataFrame.groupby` method. :meth:`~pandas.DataFrame.groupby` typically refers to a
process where we'd like to split a dataset into groups, apply some function (typically aggregation)
, and then combine the groups together.
@@ -162,7 +162,7 @@ The pandas equivalent would be:
Notice that in the pandas code we used :meth:`~pandas.core.groupby.DataFrameGroupBy.size` and not
:meth:`~pandas.core.groupby.DataFrameGroupBy.count`. This is because
:meth:`~pandas.core.groupby.DataFrameGroupBy.count` applies the function to each column, returning
the number of ``not null`` records within each.
the number of ``NOT NULL`` records within each.
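
The difference between ``size`` and ``count`` described here can be sketched with a tiny hypothetical frame (the column values, including the missing tips, are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing tips; not the real tips dataset.
tips = pd.DataFrame(
    {
        "sex": ["Female", "Female", "Male", "Male", "Male"],
        "tip": [1.0, np.nan, 2.0, 3.0, np.nan],
    }
)

# size() counts rows per group, whether or not values are missing.
group_sizes = tips.groupby("sex").size()

# count() is applied per column and skips missing values.
non_null_counts = tips.groupby("sex").count()

print(group_sizes)
print(non_null_counts)
```

In this sketch the ``Male`` group has three rows but only two non-null ``tip`` values, so the two methods disagree.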

.. ipython:: python

@@ -223,10 +223,10 @@ Grouping by more than one column is done by passing a list of columns to the

JOIN
----
JOINs can be performed with :meth:`~pandas.DataFrame.join` or :meth:`~pandas.merge`. By default,
:meth:`~pandas.DataFrame.join` will join the DataFrames on their indices. Each method has
parameters allowing you to specify the type of join to perform (LEFT, RIGHT, INNER, FULL) or the
columns to join on (column names or indices).
``JOIN``\s can be performed with :meth:`~pandas.DataFrame.join` or :meth:`~pandas.merge`. By
default, :meth:`~pandas.DataFrame.join` will join the DataFrames on their indices. Each method has
parameters allowing you to specify the type of join to perform (``LEFT``, ``RIGHT``, ``INNER``,
``FULL``) or the columns to join on (column names or indices).

.. ipython:: python

@@ -235,7 +235,7 @@ columns to join on (column names or indices).

Assume we have two database tables of the same name and structure as our DataFrames.

Now let's go over the various types of JOINs.
Now let's go over the various types of ``JOIN``\s.

INNER JOIN
~~~~~~~~~~
@@ -261,56 +261,59 @@ column with another DataFrame's index.

LEFT OUTER JOIN
~~~~~~~~~~~~~~~

Show all records from ``df1``.

.. code-block:: sql

    -- show all records from df1
    SELECT *
    FROM df1
    LEFT OUTER JOIN df2
    ON df1.key = df2.key;

.. ipython:: python

    # show all records from df1
    pd.merge(df1, df2, on="key", how="left")

RIGHT JOIN
~~~~~~~~~~

Show all records from ``df2``.

.. code-block:: sql

    -- show all records from df2
    SELECT *
    FROM df1
    RIGHT OUTER JOIN df2
    ON df1.key = df2.key;

.. ipython:: python

    # show all records from df2
    pd.merge(df1, df2, on="key", how="right")

FULL JOIN
~~~~~~~~~
pandas also allows for FULL JOINs, which display both sides of the dataset, whether or not the
joined columns find a match. As of writing, FULL JOINs are not supported in all RDBMS (MySQL).
pandas also allows for ``FULL JOIN``\s, which display both sides of the dataset, whether or not the
joined columns find a match. As of writing, ``FULL JOIN``\s are not supported in all RDBMS (MySQL).

Show all records from both tables.

.. code-block:: sql

    -- show all records from both tables
    SELECT *
    FROM df1
    FULL OUTER JOIN df2
    ON df1.key = df2.key;

.. ipython:: python

    # show all records from both frames
    pd.merge(df1, df2, on="key", how="outer")
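
To make the three ``how`` values in this hunk concrete, the following sketch runs them side by side. The contents of ``df1`` and ``df2`` are assumptions: the frames used in the documentation are defined in a collapsed part of the diff, so small hypothetical ones are built here.

```python
import pandas as pd

# Hypothetical frames sharing a "key" column; only B and C match.
df1 = pd.DataFrame({"key": ["A", "B", "C"], "value_x": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["B", "C", "D"], "value_y": [20, 30, 40]})

# LEFT OUTER JOIN: every row of df1, NaN where df2 has no match.
left = pd.merge(df1, df2, on="key", how="left")

# RIGHT OUTER JOIN: every row of df2, NaN where df1 has no match.
right = pd.merge(df1, df2, on="key", how="right")

# FULL OUTER JOIN: every row of both frames.
outer = pd.merge(df1, df2, on="key", how="outer")

print(left)
print(right)
print(outer)
```

Unmatched rows are kept with ``NaN`` in the columns that come from the other frame, which mirrors the ``NULL`` values a SQL outer join would produce.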


UNION
-----
UNION ALL can be performed using :meth:`~pandas.concat`.

``UNION ALL`` can be performed using :meth:`~pandas.concat`.

.. ipython:: python

@@ -342,7 +345,7 @@ UNION ALL can be performed using :meth:`~pandas.concat`.

    pd.concat([df1, df2])

SQL's UNION is similar to UNION ALL, however UNION will remove duplicate rows.
SQL's ``UNION`` is similar to ``UNION ALL``, however ``UNION`` will remove duplicate rows.
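
One way to see the difference between ``UNION ALL`` and ``UNION`` in pandas is to chain ``drop_duplicates()`` after ``pd.concat``. The sketch below assumes two small hypothetical frames that share one identical row; the data is invented for the example.

```python
import pandas as pd

# Hypothetical frames; the ("Chicago", 1) row appears in both.
df1 = pd.DataFrame(
    {"city": ["Chicago", "San Francisco", "New York City"], "rank": [1, 2, 3]}
)
df2 = pd.DataFrame({"city": ["Chicago", "Boston", "Los Angeles"], "rank": [1, 4, 5]})

# UNION ALL: keep every row, duplicates included.
union_all = pd.concat([df1, df2])

# UNION: the same concatenation with duplicate rows removed.
union = pd.concat([df1, df2]).drop_duplicates()

print(len(union_all), len(union))
```

``drop_duplicates()`` compares whole rows by default, so only rows that match in every column are collapsed.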

.. code-block:: sql

@@ -444,7 +447,7 @@ the same using ``rank(method='first')`` function
Let's find tips with (rank < 3) per gender group for (tips < 2).
Notice that when using ``rank(method='min')`` function
``rnk_min`` remains the same for the same ``tip``
(as Oracle's RANK() function)
(as Oracle's ``RANK()`` function)
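
The tie behaviour of ``rank(method='min')`` mentioned here can be sketched on a few hypothetical rows (the values are invented, and the ``rnk_min`` column name simply follows the text above):

```python
import pandas as pd

# Hypothetical rows with tied tips inside the Female group.
tips = pd.DataFrame(
    {
        "sex": ["Female", "Female", "Female", "Male", "Male"],
        "tip": [1.0, 1.0, 1.5, 1.0, 1.44],
    }
)

# Rank tips within each group; tied values share the lowest rank.
ranked = tips.assign(rnk_min=tips.groupby("sex")["tip"].rank(method="min"))

print(ranked)
```

The two tied ``Female`` tips both get rank 1 and the next value gets rank 3, so a rank is skipped after a tie.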

.. ipython:: python

@@ -477,7 +480,7 @@ DELETE
    DELETE FROM tips
    WHERE tip > 9;

In pandas we select the rows that should remain, instead of deleting them
In pandas we select the rows that should remain instead of deleting them:

.. ipython:: python

Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
DataFrames can be filtered in multiple ways; the most intuitive of which is using
:ref:`boolean indexing <indexing.boolean>`
:ref:`boolean indexing <indexing.boolean>`.

.. ipython:: python
