DOC: remove use of head() in the comparison docs #38935
Changes from all commits
ce26fa0
93fff50
ff640aa
8bd6171
199a04c
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,23 +4,13 @@ | |
|
||
Comparison with SAS | ||
******************** | ||
|
||
For potential users coming from `SAS <https://en.wikipedia.org/wiki/SAS_(software)>`__ | ||
this page is meant to demonstrate how different SAS operations would be | ||
performed in pandas. | ||
|
||
.. include:: includes/introduction.rst | ||
|
||
.. note:: | ||
|
||
Throughout this tutorial, the pandas ``DataFrame`` will be displayed by calling | ||
``df.head()``, which displays the first N (default 5) rows of the ``DataFrame``. | ||
This is often used in interactive work (e.g. `Jupyter notebook | ||
<https://jupyter.org/>`_ or terminal) - the equivalent in SAS would be: | ||
|
||
.. code-block:: sas | ||
|
||
proc print data=df(obs=5); | ||
run; | ||
|
||
Data structures | ||
--------------- | ||
|
@@ -120,7 +110,7 @@ The pandas method is :func:`read_csv`, which works similarly. | |
"pandas/master/pandas/tests/io/data/csv/tips.csv" | ||
) | ||
tips = pd.read_csv(url) | ||
tips.head() | ||
tips | ||
|
||
|
||
Like ``PROC IMPORT``, ``read_csv`` can take a number of parameters to specify | ||
|
@@ -138,6 +128,19 @@ In addition to text/csv, pandas supports a variety of other data formats | |
such as Excel, HDF5, and SQL databases. These are all read via a ``pd.read_*`` | ||
function. See the :ref:`IO documentation<io>` for more details. | ||
|
||
Limiting output | ||
~~~~~~~~~~~~~~~ | ||
Review comment: New sections in each page. |
||
|
||
.. include:: includes/limit.rst | ||
|
||
The equivalent in SAS would be: | ||
|
||
.. code-block:: sas | ||
|
||
proc print data=df(obs=5); | ||
run; | ||
|
||
|
||
Exporting data | ||
~~~~~~~~~~~~~~ | ||
|
||
|
@@ -173,20 +176,8 @@ be used on new or existing columns. | |
new_bill = total_bill / 2; | ||
run; | ||
|
||
pandas provides similar vectorized operations by | ||
specifying the individual ``Series`` in the ``DataFrame``. | ||
New columns can be assigned in the same way. | ||
.. include:: includes/column_operations.rst | ||
|
||
.. ipython:: python | ||
|
||
tips["total_bill"] = tips["total_bill"] - 2 | ||
tips["new_bill"] = tips["total_bill"] / 2.0 | ||
tips.head() | ||
|
||
.. ipython:: python | ||
:suppress: | ||
|
||
tips = tips.drop("new_bill", axis=1) | ||
|
||
Filtering | ||
~~~~~~~~~ | ||
|
@@ -278,18 +269,7 @@ drop, and rename columns. | |
rename total_bill=total_bill_2; | ||
run; | ||
|
||
The same operations are expressed in pandas below. | ||
|
||
.. ipython:: python | ||
|
||
# keep | ||
tips[["sex", "total_bill", "tip"]].head() | ||
|
||
# drop | ||
tips.drop("sex", axis=1).head() | ||
|
||
# rename | ||
tips.rename(columns={"total_bill": "total_bill_2"}).head() | ||
.. include:: includes/column_selection.rst | ||
|
||
|
||
Sorting by values | ||
|
@@ -442,6 +422,8 @@ input frames. | |
Missing data | ||
------------ | ||
|
||
Both pandas and SAS have a representation for missing data. | ||
|
||
.. include:: includes/missing_intro.rst | ||
|
||
One difference is that missing data cannot be compared to its sentinel value. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
pandas provides similar vectorized operations by specifying the individual ``Series`` in the | ||
``DataFrame``. New columns can be assigned in the same way. The :meth:`DataFrame.drop` method drops | ||
a column from the ``DataFrame``. | ||
|
||
.. ipython:: python | ||
|
||
tips["total_bill"] = tips["total_bill"] - 2 | ||
tips["new_bill"] = tips["total_bill"] / 2 | ||
tips | ||
|
||
tips = tips.drop("new_bill", axis=1) |
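
The column arithmetic in this include can be tried without downloading the tips dataset. The following is a minimal sketch that assumes a small, made-up stand-in for ``tips``; the values and the ``with_new_bill`` name are illustrative only and are not part of the PR.

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset (values are made up).
tips = pd.DataFrame(
    {"total_bill": [16.99, 10.34, 21.01], "tip": [1.01, 1.66, 3.50]}
)

# Vectorized arithmetic on an existing column.
tips["total_bill"] = tips["total_bill"] - 2

# Assigning to a new label creates a new column.
tips["new_bill"] = tips["total_bill"] / 2

# Keep a snapshot so the new column can still be inspected after the drop.
with_new_bill = tips.copy()

# drop() returns a new DataFrame, so the result is assigned back.
tips = tips.drop("new_bill", axis=1)

print(with_new_bill)
print(tips)
```

Because ``drop`` returns a new object rather than modifying ``tips``, the assignment back to ``tips`` is what makes the removal persist.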
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
The same operations are expressed in pandas below. Note that these operations do not happen in | ||
place. To make these changes persist, assign the operation back to a variable. | ||
|
||
Keep certain columns | ||
'''''''''''''''''''' | ||
Review comment: These headings are new. |
||
|
||
.. ipython:: python | ||
|
||
tips[["sex", "total_bill", "tip"]] | ||
|
||
Drop a column | ||
''''''''''''' | ||
|
||
.. ipython:: python | ||
|
||
tips.drop("sex", axis=1) | ||
|
||
Rename a column | ||
''''''''''''''' | ||
|
||
.. ipython:: python | ||
|
||
tips.rename(columns={"total_bill": "total_bill_2"}) |
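
To illustrate the note that these operations do not happen in place, here is a small sketch assuming a made-up stand-in for ``tips``; the names ``kept``, ``dropped``, and ``renamed`` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset (values are made up).
tips = pd.DataFrame(
    {
        "sex": ["Female", "Male"],
        "total_bill": [16.99, 10.34],
        "tip": [1.01, 1.66],
        "size": [2, 3],
    }
)

# Each operation returns a new DataFrame; tips itself is unchanged.
kept = tips[["sex", "total_bill", "tip"]]
dropped = tips.drop("sex", axis=1)
renamed = tips.rename(columns={"total_bill": "total_bill_2"})

print(list(tips.columns))     # original columns are still all present
print(list(kept.columns))
print(list(dropped.columns))
print(list(renamed.columns))
```

Assigning a result back to ``tips`` (for example ``tips = tips.drop("sex", axis=1)``) is what makes a change persist.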
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,4 +4,4 @@ indexes are zero-based. | |
|
||
.. ipython:: python | ||
|
||
tips["sex"].str[0:1].head() | ||
tips["sex"].str[0:1] |
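
As a quick check of the zero-based slicing mentioned in the hunk header, the sketch below applies the same ``.str[0:1]`` slice to a small made-up ``Series`` standing in for ``tips["sex"]``.

```python
import pandas as pd

# Hypothetical stand-in for tips["sex"].
sex = pd.Series(["Female", "Male", "Male"])

# Slice [0:1] takes the character at position 0 of each string.
first_letter = sex.str[0:1]
print(first_letter.tolist())  # ['F', 'M', 'M']
```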
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,4 +5,4 @@ zero-based. | |
|
||
.. ipython:: python | ||
|
||
tips["sex"].str.find("ale").head() | ||
tips["sex"].str.find("ale") |
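
The behaviour of ``str.find`` can be sketched on a small made-up ``Series``; the third value is added here only to show the result when the substring is absent.

```python
import pandas as pd

# Hypothetical stand-in for tips["sex"], plus one value without a match.
sex = pd.Series(["Female", "Male", "N/A"])

# find() returns the zero-based position of the first match, or -1 if none.
positions = sex.str.find("ale")
print(positions.tolist())  # [3, 1, -1]
```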
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
By default, pandas will truncate output of large ``DataFrame``\s to show the first and last rows. | ||
This can be overridden by :ref:`changing the pandas options <options>`, or using | ||
:meth:`DataFrame.head` or :meth:`DataFrame.tail`. | ||
|
||
.. ipython:: python | ||
|
||
tips.head(5) |
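
The three ways of limiting output described in this include can be compared side by side. This is a minimal sketch on a made-up 100-row frame; it assumes the ``display.max_rows`` option is the relevant setting, which is one of several display options pandas exposes.

```python
import pandas as pd

# Hypothetical frame large enough to trigger truncation.
df = pd.DataFrame({"n": range(100)})

# Explicitly select the first or last rows.
first = df.head(5)
last = df.tail(3)

# Temporarily change the display option; the frame itself is not modified.
with pd.option_context("display.max_rows", 10):
    truncated = repr(df)

print(first)
print(last)
print(truncated)
```

``head`` and ``tail`` return new, smaller ``DataFrame`` objects, whereas the option only affects how the full frame is printed.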
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,24 +1,31 @@ | ||
This doesn't work in pandas. Instead, the :func:`pd.isna` or :func:`pd.notna` functions | ||
should be used for comparisons. | ||
In pandas, :meth:`Series.isna` and :meth:`Series.notna` can be used to filter the rows. | ||
|
||
.. ipython:: python | ||
|
||
outer_join[pd.isna(outer_join["value_x"])] | ||
outer_join[pd.notna(outer_join["value_x"])] | ||
outer_join[outer_join["value_x"].isna()] | ||
outer_join[outer_join["value_x"].notna()] | ||
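
To show why a direct comparison against the sentinel does not work, the following sketch uses a small made-up stand-in for ``outer_join``; the ``key`` column and the values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the outer_join frame built earlier in the docs.
outer_join = pd.DataFrame(
    {"key": ["A", "B", "C"], "value_x": [0.5, np.nan, 1.5]}
)

# NaN never compares equal to anything, including itself,
# so this mask is False for every row.
eq_mask = outer_join["value_x"] == np.nan

# isna() / notna() are the way to build a missing-value mask.
missing = outer_join[outer_join["value_x"].isna()]
present = outer_join[outer_join["value_x"].notna()]

print(eq_mask.tolist())  # [False, False, False]
print(missing)
print(present)
```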
jreback marked this conversation as resolved.
|
||
|
||
pandas also provides a variety of methods to work with missing data -- some of | ||
which would be challenging to express in Stata. For example, there are methods to | ||
drop all rows with any missing values, replacing missing values with a specified | ||
value, like the mean, or forward filling from previous rows. See the | ||
:ref:`missing data documentation<missing_data>` for more. | ||
pandas provides :ref:`a variety of methods to work with missing data <missing_data>`. Here are some examples: | ||
Review comment: Split out the examples to sub-headings below. |
||
|
||
Drop rows with missing values | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
.. ipython:: python | ||
|
||
# Drop rows with any missing value | ||
outer_join.dropna() | ||
|
||
# Fill forwards | ||
Forward fill from previous rows | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
.. ipython:: python | ||
|
||
outer_join.fillna(method="ffill") | ||
|
||
# Impute missing values with the mean | ||
Replace missing values with a specified value | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Using the mean: | ||
|
||
.. ipython:: python | ||
|
||
outer_join["value_x"].fillna(outer_join["value_x"].mean()) |
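
The three strategies in this include can be sketched on a small made-up frame. Note one assumption: newer pandas releases deprecate the ``method`` argument of ``fillna``, so this sketch uses :meth:`DataFrame.ffill` for the forward fill instead of the ``fillna(method="ffill")`` call shown in the diff.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for outer_join (values are made up).
outer_join = pd.DataFrame(
    {"value_x": [1.0, np.nan, 3.0], "value_y": [np.nan, 5.0, 6.0]}
)

# Drop rows with any missing value: only the last row is complete.
complete = outer_join.dropna()

# Forward fill from previous rows; a leading NaN has nothing to copy from.
filled_forward = outer_join.ffill()

# Replace missing values with the column mean (NaN is skipped by mean()).
imputed = outer_join["value_x"].fillna(outer_join["value_x"].mean())

print(complete)
print(filled_forward)
print(imputed.tolist())  # [1.0, 2.0, 3.0]
```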
Review comment: No longer the case. Ditto for the Stata page.