DOC: Add Comparison with Excel documentation #23042

Closed
121 changes: 121 additions & 0 deletions doc/source/comparison_with_excel.rst
@@ -0,0 +1,121 @@
.. currentmodule:: pandas
.. _compare_with_excel:
Contributor

I think it might be OK to link to the Excel docs for these functions and/or include a screenshot showing this?

Author

I couldn't find a user manual or docs for Excel; I could only find this quick start guide. I'll try to come up with pictures and a GIF for the fill handle.

Should I use ``.. image:: abc.gif`` to attach the GIF?

@Remnan13 Remnan13 Oct 17, 2018

Here are a few links to docs which might help with the Excel user manual hunt!

University of Arizona's bare-bones (but functional) walk-through offers some examples that seem like they could help with documentation written for the Excel-to-pandas comparison.

Towson's counterpart does a more extensive job and includes examples of including an image background (albeit using Bing). Two birds, one stone?

Hope this helped!


.. ipython:: python
   :suppress:

   import pandas as pd
   import random
   pd.options.display.max_rows = 15

Comparison with Excel
*********************

Commonly used Excel functionalities
-----------------------------------

Fill Handle
~~~~~~~~~~~

The fill handle creates a series of numbers that follows a set pattern across a
range of cells. In Excel this is done by shift+drag after entering the first
number, or by entering the first two or three values and then dragging.

In pandas, build the sequence and assign it to the desired rows of a column.

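.. ipython:: python

   df = pd.DataFrame({'AAA': [1] * 8, 'BBB': list(range(0, 8))}); df

   series = list(range(1, 5)); series

   # .loc slices by label and includes the end point, so 2:5 covers four rows.
   # Chained assignment such as df.iloc[2:6].AAA = series may modify a copy.
   df.loc[2:5, 'AAA'] = series

   df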
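As a hedged sketch of the same idea, a stepped pattern (what Excel infers when you enter 1 and 3 and then drag) can be generated with ``range`` and written into a block of rows with ``.loc``. The frame name ``fill_df`` and the step of 2 are illustrative assumptions, not part of the example above.

```python
import pandas as pd

# Hypothetical frame used only for this sketch.
fill_df = pd.DataFrame({"AAA": [1] * 8, "BBB": list(range(8))})

# Emulate entering 1 and 3 and dragging: an arithmetic pattern with step 2.
pattern = list(range(1, 9, 2))

# .loc slices by label and includes both end points, so 2:5 covers four rows.
fill_df.loc[2:5, "AAA"] = pattern
print(fill_df)
```

The length of the assigned list has to match the number of selected rows; otherwise pandas raises a ``ValueError``.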

Filters
~~~~~~~

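Filters can be expressed with boolean indexing: a comparison on a column
produces a mask of ``True``/``False`` values, and indexing the ``DataFrame``
with that mask keeps only the matching rows.

The examples keep the rows where column AAA equals 1, and also show how to
filter by multiple values.

.. ipython:: python

   df[df.AAA == 1]

   df[(df.AAA == 1) | (df.AAA == 2)]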
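When many values need to match, ``Series.isin`` is usually shorter than chaining comparisons, and conditions on different columns can be combined with ``&``. The following is a minimal sketch on a made-up frame (``filter_df`` is a hypothetical name introduced for illustration).

```python
import pandas as pd

# Hypothetical data used only for this sketch.
filter_df = pd.DataFrame({"AAA": [1, 1, 1, 2, 3, 4, 1, 1], "BBB": list(range(8))})

# isin matches any value in the list, like ticking several boxes in an Excel filter.
picked = filter_df[filter_df["AAA"].isin([2, 4])]

# Conditions combine with & (and) or | (or); each one needs its own parentheses.
combined = filter_df[(filter_df["AAA"] == 1) & (filter_df["BBB"] > 2)]

print(picked)
print(combined)
```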


Drop Duplicates
~~~~~~~~~~~~~~~

Another commonly used function is removing duplicate rows. This is directly
supported in pandas by ``DataFrame.drop_duplicates``. Called without arguments
it compares all columns; passing a list of column names restricts the comparison
to those columns.

.. ipython:: python

   df = pd.DataFrame({"class": ['A', 'A', 'A', 'B', 'C', 'D'],
                      "student_count": [42, 35, 42, 50, 47, 45],
                      "all_pass": ["Yes", "Yes", "Yes", "No", "No", "Yes"]})

   df.drop_duplicates()

   df.drop_duplicates(["class", "student_count"])
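It can help to inspect which rows count as duplicates before dropping them. The sketch below uses ``DataFrame.duplicated`` for that, and the ``keep`` parameter to retain the last occurrence instead of the first. The frame name ``classes`` is a hypothetical stand-in for the data above.

```python
import pandas as pd

# Hypothetical copy of the class data, used only for this sketch.
classes = pd.DataFrame({
    "class": ["A", "A", "A", "B", "C", "D"],
    "student_count": [42, 35, 42, 50, 47, 45],
    "all_pass": ["Yes", "Yes", "Yes", "No", "No", "Yes"],
})

# duplicated() flags rows that repeat an earlier row; nothing is removed yet.
flags = classes.duplicated()

# subset limits the comparison to one column; keep="last" retains the final match.
last_per_class = classes.drop_duplicates(subset=["class"], keep="last")

print(flags)
print(last_per_class)
```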


Pivot Table
~~~~~~~~~~~

This can be achieved by using ``pandas.pivot_table``. For examples and reference,
please see `pandas.pivot_table <http://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot_table.html>`__.
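As a rough illustration, the sketch below summarizes a small made-up sales table the way an Excel pivot table with rows, columns and a summed value field would. The ``sales`` frame and its column names are assumptions introduced only for this example.

```python
import pandas as pd

# Hypothetical data used only for this sketch.
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "product": ["pen", "ink", "pen", "pen", "ink"],
    "units": [10, 4, 7, 3, 5],
})

# index -> Excel "Rows", columns -> "Columns", values/aggfunc -> "Values" (Sum).
summary = pd.pivot_table(
    sales,
    index="region",
    columns="product",
    values="units",
    aggfunc="sum",
    fill_value=0,
)
print(summary)
```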


Formulae
~~~~~~~~

Let's add a new column ``girls_count`` and use it to compute the number of boys in
each class.

.. ipython:: python

   df["girls_count"] = [21, 12, 21, 31, 23, 17]; df

   def get_count(row):
       return row["student_count"] - row["girls_count"]

   df["boys_count"] = df.apply(get_count, axis=1); df
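For simple arithmetic like this, operating on whole columns is typically shorter and faster than a row-wise ``apply``. A minimal sketch, using a hypothetical frame ``counts`` with the same numbers:

```python
import pandas as pd

# Hypothetical frame holding the same counts, used only for this sketch.
counts = pd.DataFrame({
    "student_count": [42, 35, 42, 50, 47, 45],
    "girls_count": [21, 12, 21, 31, 23, 17],
})

# Column arithmetic applies to every row at once, like filling a formula down a column.
counts["boys_count"] = counts["student_count"] - counts["girls_count"]
print(counts)
```

Row-wise ``apply`` remains useful when the formula needs logic that cannot be written as column operations.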


VLOOKUP
~~~~~~~

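Excel's ``VLOOKUP`` looks up a key in one table and returns a value from the
matching row of another. In pandas the equivalent is a left ``merge`` on the key
column: every row of the left frame is kept, and the matching columns from the
right frame are attached.

.. ipython:: python

   df1 = pd.DataFrame({"keys": [1, 2, 3, 4, 5, 6, 7],
                       "first_names": ["harry", "ron", "hermione", "rubeus",
                                       "albus", "severus", "luna"]}); df1

   random_names = pd.DataFrame({"surnames": ["hagrid", "malfoy", "lovegood",
                                             "dumbledore", "grindelwald",
                                             "granger", "weasley", "riddle",
                                             "longbottom", "snape"],
                                "keys": [random.randint(1, 7) for _ in range(10)]})

   random_names

   random_names.merge(df1, on="keys", how='left')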
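Because the example above draws random keys, its output changes from run to run. The sketch below uses fixed, made-up data (``lookup`` and ``people`` are hypothetical names) to show what happens when a key has no match: the looked-up column holds ``NaN``, roughly like ``#N/A`` in Excel. It also shows ``Series.map`` as an alternative when only one column is being looked up.

```python
import pandas as pd

# Hypothetical tables used only for this sketch.
lookup = pd.DataFrame({"keys": [1, 2, 3], "first_names": ["harry", "ron", "hermione"]})
people = pd.DataFrame({"surnames": ["potter", "granger", "malfoy"], "keys": [1, 3, 9]})

# how="left" keeps every row of people; indicator=True adds a _merge column
# saying whether each key was found in lookup.
matched = people.merge(lookup, on="keys", how="left", indicator=True)

# For a single looked-up column, map against a Series indexed by the key.
people["first_names"] = people["keys"].map(lookup.set_index("keys")["first_names"])

print(matched)
print(people)
```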

Adding a row
~~~~~~~~~~~~

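To append a row, we can assign values to a new index label using ``loc``.
``iloc`` cannot be used for this: it only addresses existing positions and
raises an ``IndexError`` for a position past the end.

NOTE: If the index label already exists, the values in that row will be overwritten.

.. ipython:: python

   df1.loc[7] = [8, "tonks"]; df1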
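When several rows need to be added at once, ``pd.concat`` is an alternative to assigning them one by one. The following is a minimal sketch on made-up data (``names`` and ``extra`` are hypothetical names introduced for illustration).

```python
import pandas as pd

# Hypothetical frame used only for this sketch.
names = pd.DataFrame({"keys": [1, 2], "first_names": ["harry", "ron"]})

# Assigning to a label that does not exist yet enlarges the frame by one row.
names.loc[len(names)] = [3, "hermione"]

# pd.concat stacks frames; ignore_index=True renumbers the rows from 0.
extra = pd.DataFrame({"keys": [4, 5], "first_names": ["luna", "neville"]})
names = pd.concat([names, extra], ignore_index=True)
print(names)
```

Using ``len(names)`` as the new label assumes the default integer index; with a custom index, choose a label that is not already present.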


Search and Replace
~~~~~~~~~~~~~~~~~~

The ``replace`` method of the ``DataFrame`` object performs search and replace.
Please see `pandas.DataFrame.replace <https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.replace.html>`__ for examples.
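A short sketch of two common cases follows: replacing a whole-cell value in one column, and replacing part of a string with ``regex=True``. The ``houses`` frame and its misspelled value are made up for illustration.

```python
import pandas as pd

# Hypothetical data used only for this sketch.
houses = pd.DataFrame({"name": ["harry", "luna"], "house": ["Gryfindor", "Ravenclaw"]})

# A nested dict {column: {old: new}} replaces whole-cell matches in that column only.
fixed = houses.replace({"house": {"Gryfindor": "Gryffindor"}})

# regex=True treats the first argument as a pattern and replaces matching substrings.
capitalized = houses.replace(r"^h", "H", regex=True)

print(fixed)
print(capitalized)
```

Both calls return a new ``DataFrame`` and leave ``houses`` unchanged.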