DOC: create shared includes for comparison docs, take III #38887

Merged
merged 1 commit into from
Jan 3, 2021
74 changes: 7 additions & 67 deletions doc/source/getting_started/comparison/comparison_with_sas.rst
@@ -8,7 +8,7 @@ For potential users coming from `SAS <https://en.wikipedia.org/wiki/SAS_(softwar
this page is meant to demonstrate how different SAS operations would be
performed in pandas.

.. include:: comparison_boilerplate.rst
.. include:: includes/introduction.rst

.. note::

@@ -93,16 +93,7 @@ specifying the column names.
;
run;

A pandas ``DataFrame`` can be constructed in many different ways,
but for a small number of values, it is often convenient to specify it as
a Python dictionary, where the keys are the column names
and the values are the data.

.. ipython:: python

df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
df

.. include:: includes/construct_dataframe.rst

Reading external data
~~~~~~~~~~~~~~~~~~~~~
@@ -217,12 +208,7 @@ or more columns.
DATA step begins and can also be used in PROC statements */
run;

DataFrames can be filtered in multiple ways; the most intuitive of which is using
:ref:`boolean indexing <indexing.boolean>`

.. ipython:: python

tips[tips["total_bill"] > 10].head()
.. include:: includes/filtering.rst

If/then logic
~~~~~~~~~~~~~
@@ -239,18 +225,7 @@ In SAS, if/then logic can be used to create new columns.
else bucket = 'high';
run;

The same operation in pandas can be accomplished using
the ``where`` method from ``numpy``.

.. ipython:: python

tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
tips.head()

.. ipython:: python
:suppress:

tips = tips.drop("bucket", axis=1)
.. include:: includes/if_then.rst

Date functionality
~~~~~~~~~~~~~~~~~~
@@ -278,28 +253,7 @@ functions pandas supports other Time Series features
not available in Base SAS (such as resampling and custom offsets) -
see the :ref:`timeseries documentation<timeseries>` for more details.

.. ipython:: python

tips["date1"] = pd.Timestamp("2013-01-15")
tips["date2"] = pd.Timestamp("2015-02-15")
tips["date1_year"] = tips["date1"].dt.year
tips["date2_month"] = tips["date2"].dt.month
tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()
tips["months_between"] = tips["date2"].dt.to_period("M") - tips[
"date1"
].dt.to_period("M")

tips[
["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]
].head()

.. ipython:: python
:suppress:

tips = tips.drop(
["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"],
axis=1,
)
.. include:: includes/time_date.rst

Selection of columns
~~~~~~~~~~~~~~~~~~~~
@@ -349,14 +303,7 @@ Sorting in SAS is accomplished via ``PROC SORT``
by sex total_bill;
run;

pandas objects have a :meth:`~DataFrame.sort_values` method, which
takes a list of columns to sort by.

.. ipython:: python

tips = tips.sort_values(["sex", "total_bill"])
tips.head()

.. include:: includes/sorting.rst

String processing
-----------------
@@ -377,14 +324,7 @@ functions. ``LENGTHN`` excludes trailing blanks and ``LENGTHC`` includes trailin
put(LENGTHC(time));
run;

Python determines the length of a character string with the ``len`` function.
``len`` includes trailing blanks. Use ``len`` and ``rstrip`` to exclude
trailing blanks.

.. ipython:: python

tips["time"].str.len().head()
tips["time"].str.rstrip().str.len().head()
.. include:: includes/length.rst


Find
Original file line number Diff line number Diff line change
@@ -14,7 +14,7 @@ terminology and link to documentation for Excel, but much will be the same/simil
`Apple Numbers <https://www.apple.com/mac/numbers/compatibility/functions.html>`_, and other
Excel-compatible spreadsheet software.

.. include:: comparison_boilerplate.rst
.. include:: includes/introduction.rst

Data structures
---------------
21 changes: 3 additions & 18 deletions doc/source/getting_started/comparison/comparison_with_sql.rst
@@ -8,7 +8,7 @@ Since many potential pandas users have some familiarity with
`SQL <https://en.wikipedia.org/wiki/SQL>`_, this page is meant to provide some examples of how
various SQL operations would be performed using pandas.

.. include:: comparison_boilerplate.rst
.. include:: includes/introduction.rst

Most of the examples will utilize the ``tips`` dataset found within pandas tests. We'll read
the data into a DataFrame called ``tips`` and assume we have a database table of the same name and
@@ -65,24 +65,9 @@ Filtering in SQL is done via a WHERE clause.

SELECT *
FROM tips
WHERE time = 'Dinner'
LIMIT 5;

DataFrames can be filtered in multiple ways; the most intuitive of which is using
:ref:`boolean indexing <indexing.boolean>`

.. ipython:: python

tips[tips["time"] == "Dinner"].head(5)

The above statement is simply passing a ``Series`` of True/False objects to the DataFrame,
returning all rows with True.

.. ipython:: python
WHERE time = 'Dinner';

is_dinner = tips["time"] == "Dinner"
is_dinner.value_counts()
tips[is_dinner].head(5)
.. include:: includes/filtering.rst

Just like SQL's OR and AND, multiple conditions can be passed to a DataFrame using | (OR) and &
(AND).
74 changes: 7 additions & 67 deletions doc/source/getting_started/comparison/comparison_with_stata.rst
@@ -8,7 +8,7 @@ For potential users coming from `Stata <https://en.wikipedia.org/wiki/Stata>`__
this page is meant to demonstrate how different Stata operations would be
performed in pandas.

.. include:: comparison_boilerplate.rst
.. include:: includes/introduction.rst

.. note::

@@ -89,16 +89,7 @@ specifying the column names.
5 6
end

A pandas ``DataFrame`` can be constructed in many different ways,
but for a small number of values, it is often convenient to specify it as
a Python dictionary, where the keys are the column names
and the values are the data.

.. ipython:: python

df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
df

.. include:: includes/construct_dataframe.rst

Reading external data
~~~~~~~~~~~~~~~~~~~~~
@@ -210,12 +201,7 @@ Filtering in Stata is done with an ``if`` clause on one or more columns.

list if total_bill > 10

DataFrames can be filtered in multiple ways; the most intuitive of which is using
:ref:`boolean indexing <indexing.boolean>`.

.. ipython:: python

tips[tips["total_bill"] > 10].head()
.. include:: includes/filtering.rst

If/then logic
~~~~~~~~~~~~~
@@ -227,18 +213,7 @@ In Stata, an ``if`` clause can also be used to create new columns.
generate bucket = "low" if total_bill < 10
replace bucket = "high" if total_bill >= 10

The same operation in pandas can be accomplished using
the ``where`` method from ``numpy``.

.. ipython:: python

tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
tips.head()

.. ipython:: python
:suppress:

tips = tips.drop("bucket", axis=1)
.. include:: includes/if_then.rst

Date functionality
~~~~~~~~~~~~~~~~~~
@@ -266,28 +241,7 @@ functions, pandas supports other Time Series features
not available in Stata (such as time zone handling and custom offsets) --
see the :ref:`timeseries documentation<timeseries>` for more details.

.. ipython:: python

tips["date1"] = pd.Timestamp("2013-01-15")
tips["date2"] = pd.Timestamp("2015-02-15")
tips["date1_year"] = tips["date1"].dt.year
tips["date2_month"] = tips["date2"].dt.month
tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()
tips["months_between"] = tips["date2"].dt.to_period("M") - tips[
"date1"
].dt.to_period("M")

tips[
["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]
].head()

.. ipython:: python
:suppress:

tips = tips.drop(
["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"],
axis=1,
)
.. include:: includes/time_date.rst

Selection of columns
~~~~~~~~~~~~~~~~~~~~
@@ -327,14 +281,7 @@ Sorting in Stata is accomplished via ``sort``

sort sex total_bill

pandas objects have a :meth:`DataFrame.sort_values` method, which
takes a list of columns to sort by.

.. ipython:: python

tips = tips.sort_values(["sex", "total_bill"])
tips.head()

.. include:: includes/sorting.rst

String processing
-----------------
@@ -350,14 +297,7 @@ Stata determines the length of a character string with the :func:`strlen` and
generate strlen_time = strlen(time)
generate ustrlen_time = ustrlen(time)

Python determines the length of a character string with the ``len`` function.
In Python 3, all strings are Unicode strings. ``len`` includes trailing blanks.
Use ``len`` and ``rstrip`` to exclude trailing blanks.

.. ipython:: python

tips["time"].str.len().head()
tips["time"].str.rstrip().str.len().head()
.. include:: includes/length.rst


Finding position of substring
9 changes: 9 additions & 0 deletions doc/source/getting_started/comparison/includes/construct_dataframe.rst
@@ -0,0 +1,9 @@
A pandas ``DataFrame`` can be constructed in many different ways,
but for a small number of values, it is often convenient to specify it as
a Python dictionary, where the keys are the column names
and the values are the data.

.. ipython:: python

df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
df
16 changes: 16 additions & 0 deletions doc/source/getting_started/comparison/includes/filtering.rst
@@ -0,0 +1,16 @@
DataFrames can be filtered in multiple ways; the most intuitive of which is using
:ref:`boolean indexing <indexing.boolean>`

.. ipython:: python

tips[tips["total_bill"] > 10]

The above statement is simply passing a ``Series`` of ``True``/``False`` objects to the DataFrame,
returning all rows with ``True``.

.. ipython:: python

is_dinner = tips["time"] == "Dinner"
is_dinner
is_dinner.value_counts()
tips[is_dinner]
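
As a minimal, self-contained sketch of the boolean-indexing pattern in this include, the snippet below uses a small hypothetical stand-in for the ``tips`` dataset. The values are made up for illustration and are not taken from the real data.

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset; values are illustrative only.
tips = pd.DataFrame(
    {
        "total_bill": [8.5, 12.0, 25.3, 9.9],
        "time": ["Lunch", "Dinner", "Dinner", "Lunch"],
    }
)

# Comparing a column to a scalar yields a boolean Series aligned on the index.
is_dinner = tips["time"] == "Dinner"

# Indexing the DataFrame with that Series keeps only the rows where it is True.
dinner_rows = tips[is_dinner]

# The mask can also be built inline, as in the first example of the include.
big_bills = tips[tips["total_bill"] > 10]
```

Naming the mask (``is_dinner``) makes it easy to inspect it, for example with ``value_counts()``, before using it to filter.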
12 changes: 12 additions & 0 deletions doc/source/getting_started/comparison/includes/if_then.rst
@@ -0,0 +1,12 @@
The same operation in pandas can be accomplished using
Contributor

Are any of these shared across comparisons? It's not a big deal, but you can simply re-use them.

Member Author

Sorry, what do you mean by "these"?

Contributor

Oh, it looks like you are doing what I thought (i.e. putting includes in a common location and then using them for comparison_excel/sas/r etc.), which is great.

so nvm

the ``where`` method from ``numpy``.

.. ipython:: python

tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
tips.head()

.. ipython:: python
:suppress:

tips = tips.drop("bucket", axis=1)
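
The if/then include can be tried in isolation with the sketch below. It assumes a tiny hypothetical ``tips`` frame with made-up values; ``np.where`` is the NumPy function the include relies on.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the tips dataset; values are illustrative only.
tips = pd.DataFrame({"total_bill": [8.5, 12.0, 25.3, 9.9]})

# np.where(condition, a, b) picks a where the condition holds and b elsewhere.
tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
buckets = tips["bucket"].tolist()

# The include then drops the helper column so later examples start clean.
tips = tips.drop("bucket", axis=1)
```

Dropping the column at the end mirrors the suppressed cleanup block in the include, which keeps the shared ``tips`` frame unchanged for the pages that embed it.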
8 changes: 8 additions & 0 deletions doc/source/getting_started/comparison/includes/length.rst
@@ -0,0 +1,8 @@
Python determines the length of a character string with the ``len`` function.
In Python 3, all strings are Unicode strings. ``len`` includes trailing blanks.
Use ``len`` and ``rstrip`` to exclude trailing blanks.

.. ipython:: python

tips["time"].str.len().head()
tips["time"].str.rstrip().str.len().head()
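
To see the difference between the two length calls, here is a small sketch on a hypothetical ``Series`` in which one value carries trailing blanks. The real ``time`` column may not contain any, so the data here is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical values; the first entry has two trailing blanks.
time = pd.Series(["Dinner  ", "Lunch", "Dinner"])

# .str.len() counts every character, including trailing blanks.
raw_len = time.str.len()

# Stripping on the right first gives the length without trailing blanks.
trimmed_len = time.str.rstrip().str.len()
```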
7 changes: 7 additions & 0 deletions doc/source/getting_started/comparison/includes/sorting.rst
@@ -0,0 +1,7 @@
pandas objects have a :meth:`DataFrame.sort_values` method, which
takes a list of columns to sort by.

.. ipython:: python

tips = tips.sort_values(["sex", "total_bill"])
tips.head()
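
A minimal sketch of the multi-column sort in this include, again on a hypothetical frame with made-up values:

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset; values are illustrative only.
tips = pd.DataFrame(
    {
        "sex": ["Male", "Female", "Male", "Female"],
        "total_bill": [20.0, 15.0, 10.0, 5.0],
    }
)

# Rows are ordered by "sex" first, then by "total_bill" within each group.
sorted_tips = tips.sort_values(["sex", "total_bill"])
```

``sort_values`` returns a new object rather than sorting in place, which is why the include reassigns the result to ``tips``.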
22 changes: 22 additions & 0 deletions doc/source/getting_started/comparison/includes/time_date.rst
@@ -0,0 +1,22 @@
.. ipython:: python

tips["date1"] = pd.Timestamp("2013-01-15")
tips["date2"] = pd.Timestamp("2015-02-15")
tips["date1_year"] = tips["date1"].dt.year
tips["date2_month"] = tips["date2"].dt.month
tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()
tips["months_between"] = tips["date2"].dt.to_period("M") - tips[
"date1"
].dt.to_period("M")

tips[
["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]
].head()

.. ipython:: python
:suppress:

tips = tips.drop(
["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"],
axis=1,
)
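
The date operations in this include can be sketched on a small hypothetical frame as follows. The ``months_between`` computation here uses plain integer arithmetic as a simpler alternative to the period subtraction in the include; it is an illustrative substitute, not the approach used in the docs.

```python
import pandas as pd

# Hypothetical two-row frame standing in for tips.
df = pd.DataFrame({"id": [1, 2]})

# Assigning a scalar Timestamp broadcasts it to every row.
df["date1"] = pd.Timestamp("2013-01-15")
df["date2"] = pd.Timestamp("2015-02-15")

# The .dt accessor exposes datetime components of a Series.
df["date1_year"] = df["date1"].dt.year
df["date2_month"] = df["date2"].dt.month

# MonthBegin() rolls each date forward to the first day of the next month.
df["date1_next"] = df["date1"] + pd.offsets.MonthBegin()

# Integer alternative to the Period subtraction used in the include.
df["months_between"] = (df["date2"].dt.year - df["date1"].dt.year) * 12 + (
    df["date2"].dt.month - df["date1"].dt.month
)
```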
5 changes: 4 additions & 1 deletion setup.cfg
@@ -49,7 +49,10 @@ ignore = E203, # space before : (needed for how black formats slicing)
E711, # comparison to none should be 'if cond is none:'

exclude =
doc/source/development/contributing_docstring.rst
doc/source/development/contributing_docstring.rst,
# work around issue of undefined variable warnings
# https://github.com/pandas-dev/pandas/pull/38837#issuecomment-752884156
doc/source/getting_started/comparison/includes/*.rst
Member Author

Per suggestion in #38837 (comment).


[tool:pytest]
# sync minversion with setup.cfg & install.rst