Commit 680318a

Resolved merge conflict by incorporating both suggestions.
Vipul Rai committed
2 parents 80f68b2 + 9929fca, commit 680318a

196 files changed, +3908 -2015 lines changed

.travis.yml
+11 -1
@@ -27,6 +27,11 @@ matrix:
   fast_finish: true

   include:
+    # In allowed failures
+    - dist: bionic
+      python: 3.9-dev
+      env:
+        - JOB="3.9-dev" PATTERN="(not slow and not network and not clipboard)"
     - env:
         - JOB="3.8" ENV_FILE="ci/deps/travis-38.yaml" PATTERN="(not slow and not network and not clipboard)"

@@ -53,6 +58,11 @@ matrix:
       services:
         - mysql
         - postgresql
+  allow_failures:
+    - dist: bionic
+      python: 3.9-dev
+      env:
+        - JOB="3.9-dev" PATTERN="(not slow and not network)"

 before_install:
   - echo "before_install"
@@ -83,7 +93,7 @@ install:
 script:
   - echo "script start"
   - echo "$JOB"
-  - source activate pandas-dev
+  - if [ "$JOB" != "3.9-dev" ]; then source activate pandas-dev; fi
   - ci/run_tests.sh

 after_script:

asv_bench/benchmarks/algorithms.py
+14 -3
@@ -34,7 +34,16 @@ class Factorize:
     params = [
         [True, False],
         [True, False],
-        ["int", "uint", "float", "string", "datetime64[ns]", "datetime64[ns, tz]"],
+        [
+            "int",
+            "uint",
+            "float",
+            "string",
+            "datetime64[ns]",
+            "datetime64[ns, tz]",
+            "Int64",
+            "boolean",
+        ],
     ]
     param_names = ["unique", "sort", "dtype"]

@@ -49,13 +58,15 @@ def setup(self, unique, sort, dtype):
             "datetime64[ns, tz]": pd.date_range(
                 "2011-01-01", freq="H", periods=N, tz="Asia/Tokyo"
             ),
+            "Int64": pd.array(np.arange(N), dtype="Int64"),
+            "boolean": pd.array(np.random.randint(0, 2, N), dtype="boolean"),
         }[dtype]
         if not unique:
             data = data.repeat(5)
-        self.idx = data
+        self.data = data

     def time_factorize(self, unique, sort, dtype):
-        self.idx.factorize(sort=sort)
+        pd.factorize(self.data, sort=sort)


 class Duplicated:
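
For readers following the benchmark change, here is a minimal sketch of what the new `Int64` and `boolean` parameters exercise. The size `N = 10` is an assumption for illustration (the benchmark's own `N` is not shown in this hunk), and the variable names are hypothetical. The switch to the top-level `pd.factorize` presumably lets one call cover both `Index` objects and extension arrays with the same `sort` argument.

```python
import numpy as np
import pandas as pd

N = 10  # illustrative size only; not the benchmark's value

# Nullable extension arrays, constructed the same way as in the new setup() entries
int_data = pd.array(np.arange(N), dtype="Int64").repeat(5)
bool_data = pd.array(np.random.randint(0, 2, N), dtype="boolean")

# The top-level function takes the array as its first argument
codes, uniques = pd.factorize(int_data, sort=True)
bool_codes, bool_uniques = pd.factorize(bool_data)

print(len(codes), len(uniques))
print(len(bool_codes), len(bool_uniques))
```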

ci/build39.sh
+21
@@ -0,0 +1,21 @@
+#!/bin/bash -e
+# Special build for python3.9 until numpy puts its own wheels up
+
+sudo apt-get install build-essential gcc xvfb
+pip install --no-deps -U pip wheel setuptools
+pip install python-dateutil pytz pytest pytest-xdist hypothesis
+pip install cython --pre # https://github.com/cython/cython/issues/3395
+
+git clone https://github.com/numpy/numpy
+cd numpy
+python setup.py build_ext --inplace
+python setup.py install
+cd ..
+rm -rf numpy
+
+python setup.py build_ext --inplace
+python -m pip install --no-build-isolation -e .
+
+python -c "import sys; print(sys.version_info)"
+python -c "import pandas as pd"
+python -c "import hypothesis"

ci/deps/azure-37-numpydev.yaml
+1 -1
@@ -16,7 +16,7 @@ dependencies:
   - pip:
     - cython==0.29.16 # GH#34014
     - "git+git://github.com/dateutil/dateutil.git"
-    - "-f https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com"
+    - "--extra-index-url https://pypi.anaconda.org/scipy-wheels-nightly/simple"
     - "--pre"
     - "numpy"
     - "scipy"

ci/setup_env.sh
+5
@@ -1,5 +1,10 @@
 #!/bin/bash -e

+if [ "$JOB" == "3.9-dev" ]; then
+    /bin/bash ci/build39.sh
+    exit 0
+fi
+
 # edit the locale file if needed
 if [[ "$(uname)" == "Linux" && -n "$LC_ALL" ]]; then
     echo "Adding locale to the first line of pandas/__init__.py"

doc/source/development/contributing.rst
+2 -2
@@ -110,7 +110,7 @@ version control to allow many people to work together on the project.
 Some great resources for learning Git:

 * the `GitHub help pages <https://help.github.com/>`_.
-* the `NumPy's documentation <https://docs.scipy.org/doc/numpy/dev/index.html>`_.
+* the `NumPy's documentation <https://numpy.org/doc/stable/dev/index.html>`_.
 * Matthew Brett's `Pydagogue <https://matthew-brett.github.com/pydagogue/>`_.

 Getting started with Git
@@ -974,7 +974,7 @@ it is worth getting in the habit of writing tests ahead of time so this is never
 Like many packages, pandas uses `pytest
 <https://docs.pytest.org/en/latest/>`_ and the convenient
 extensions in `numpy.testing
-<https://docs.scipy.org/doc/numpy/reference/routines.testing.html>`_.
+<https://numpy.org/doc/stable/reference/routines.testing.html>`_.

 .. note::

doc/source/development/extending.rst
+1 -1
@@ -219,7 +219,7 @@ and re-boxes it if necessary.

 If applicable, we highly recommend that you implement ``__array_ufunc__`` in your
 extension array to avoid coercion to an ndarray. See
-`the numpy documentation <https://docs.scipy.org/doc/numpy/reference/generated/numpy.lib.mixins.NDArrayOperatorsMixin.html>`__
+`the numpy documentation <https://numpy.org/doc/stable/reference/generated/numpy.lib.mixins.NDArrayOperatorsMixin.html>`__
 for an example.

 As part of your implementation, we require that you defer to pandas when a pandas

doc/source/getting_started/intro_tutorials/02_read_write.rst
+3 -3
@@ -23,7 +23,7 @@
 <div class="card-body">
 <p class="card-text">

-This tutorial uses the titanic data set, stored as CSV. The data
+This tutorial uses the Titanic data set, stored as CSV. The data
 consists of the following data columns:

 - PassengerId: Id of every passenger.
@@ -61,7 +61,7 @@ How do I read and write tabular data?
 <ul class="task-bullet">
 <li>

-I want to analyse the titanic passenger data, available as a CSV file.
+I want to analyze the Titanic passenger data, available as a CSV file.

 .. ipython:: python

@@ -134,7 +134,7 @@ strings (``object``).
 <ul class="task-bullet">
 <li>

-My colleague requested the titanic data as a spreadsheet.
+My colleague requested the Titanic data as a spreadsheet.

 .. ipython:: python

doc/source/getting_started/intro_tutorials/03_subset_data.rst
+1 -1
@@ -330,7 +330,7 @@ When using the column names, row labels or a condition expression, use
 the ``loc`` operator in front of the selection brackets ``[]``. For both
 the part before and after the comma, you can use a single label, a list
 of labels, a slice of labels, a conditional expression or a colon. Using
-a colon specificies you want to select all rows or columns.
+a colon specifies you want to select all rows or columns.

 .. raw:: html
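
As a quick illustration of the sentence corrected above, the following sketch shows the colon on either side of the comma in ``loc``. The frame and its values are hypothetical sample rows, not the tutorial's CSV.

```python
import pandas as pd

# Hypothetical sample rows; not taken from the tutorial data
df = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cy"], "Age": [22, 38, 41]},
    index=["a", "b", "c"],
)

# Colon before the comma: all rows, one column
all_rows_one_column = df.loc[:, "Name"]

# Colon after the comma: one row, all columns
one_row_all_columns = df.loc["b", :]

# Conditional expression for the rows, single label for the column
older_names = df.loc[df["Age"] > 35, "Name"]

print(older_names.tolist())
```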

doc/source/getting_started/intro_tutorials/06_calculate_statistics.rst
+4 -4
@@ -23,7 +23,7 @@
 <div class="card-body">
 <p class="card-text">

-This tutorial uses the titanic data set, stored as CSV. The data
+This tutorial uses the Titanic data set, stored as CSV. The data
 consists of the following data columns:

 - PassengerId: Id of every passenger.
@@ -72,7 +72,7 @@ Aggregating statistics
 <ul class="task-bullet">
 <li>

-What is the average age of the titanic passengers?
+What is the average age of the Titanic passengers?

 .. ipython:: python

@@ -95,7 +95,7 @@ across rows by default.
 <ul class="task-bullet">
 <li>

-What is the median age and ticket fare price of the titanic passengers?
+What is the median age and ticket fare price of the Titanic passengers?

 .. ipython:: python

@@ -148,7 +148,7 @@ Aggregating statistics grouped by category
 <ul class="task-bullet">
 <li>

-What is the average age for male versus female titanic passengers?
+What is the average age for male versus female Titanic passengers?

 .. ipython:: python
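
The three questions touched by this file's hunks map onto three short calls. The sketch below uses a four-row stand-in frame with made-up values (only the column names ``Sex``, ``Age`` and ``Fare`` follow the tutorial); it is an illustration, not the tutorial's own code.

```python
import pandas as pd

# Made-up stand-in for the Titanic CSV; values are illustrative only
titanic = pd.DataFrame(
    {
        "Sex": ["male", "female", "female", "male"],
        "Age": [22.0, 38.0, 26.0, 35.0],
        "Fare": [7.0, 71.0, 8.0, 9.0],
    }
)

# Average age of all passengers
mean_age = titanic["Age"].mean()

# Median of two columns at once
medians = titanic[["Age", "Fare"]].median()

# Average age per category (split-apply-combine)
age_by_sex = titanic.groupby("Sex")["Age"].mean()

print(mean_age)
print(medians)
print(age_by_sex)
```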

doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst
+8 -8
@@ -23,7 +23,7 @@
 <div class="card-body">
 <p class="card-text">

-This tutorial uses the titanic data set, stored as CSV. The data
+This tutorial uses the Titanic data set, stored as CSV. The data
 consists of the following data columns:

 - PassengerId: Id of every passenger.
@@ -122,7 +122,7 @@ Sort table rows
 <ul class="task-bullet">
 <li>

-I want to sort the titanic data according to the age of the passengers.
+I want to sort the Titanic data according to the age of the passengers.

 .. ipython:: python

@@ -138,7 +138,7 @@ I want to sort the titanic data according to the age of the passengers.
 <ul class="task-bullet">
 <li>

-I want to sort the titanic data according to the cabin class and age in descending order.
+I want to sort the Titanic data according to the cabin class and age in descending order.

 .. ipython:: python

@@ -282,7 +282,7 @@ For more information about :meth:`~DataFrame.pivot_table`, see the user guide se
 </div>

 .. note::
-If case you are wondering, :meth:`~DataFrame.pivot_table` is indeed directly linked
+In case you are wondering, :meth:`~DataFrame.pivot_table` is indeed directly linked
 to :meth:`~DataFrame.groupby`. The same result can be derived by grouping on both
 ``parameter`` and ``location``:

@@ -338,7 +338,7 @@ newly created column.

 The solution is the short version on how to apply :func:`pandas.melt`. The method
 will *melt* all columns NOT mentioned in ``id_vars`` together into two
-columns: A columns with the column header names and a column with the
+columns: A column with the column header names and a column with the
 values itself. The latter column gets by default the name ``value``.

 The :func:`pandas.melt` method can be defined in more detail:
@@ -357,8 +357,8 @@ The result in the same, but in more detail defined:

 - ``value_vars`` defines explicitly which columns to *melt* together
 - ``value_name`` provides a custom column name for the values column
-instead of the default columns name ``value``
-- ``var_name`` provides a custom column name for the columns collecting
+instead of the default column name ``value``
+- ``var_name`` provides a custom column name for the column collecting
 the column header names. Otherwise it takes the index name or a
 default ``variable``

@@ -383,7 +383,7 @@ Conversion from wide to long format with :func:`pandas.melt` is explained in the
 <h4>REMEMBER</h4>

 - Sorting by one or more columns is supported by ``sort_values``
-- The ``pivot`` function is purely restructering of the data,
+- The ``pivot`` function is purely restructuring of the data,
 ``pivot_table`` supports aggregations
 - The reverse of ``pivot`` (long to wide format) is ``melt`` (wide to
 long format)
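
To make the reworded ``melt`` and ``pivot`` sentences concrete, here is a small sketch. The two-row wide table, its station-style column names and its values are assumptions chosen for illustration; only the parameter names ``id_vars``, ``value_vars``, ``value_name`` and ``var_name`` come from the text above.

```python
import pandas as pd

# Illustrative wide table: one column per (assumed) station, values are made up
wide = pd.DataFrame(
    {
        "date": ["2019-05-01", "2019-05-02"],
        "BETR801": [26.5, 22.0],
        "FR04014": [25.0, 27.5],
    }
)

# Wide to long: every column not in id_vars is melted into two columns
long = wide.melt(
    id_vars="date",
    value_vars=["BETR801", "FR04014"],
    value_name="NO_2",          # replaces the default name "value"
    var_name="id_location",     # replaces the default name "variable"
)

# pivot only restructures: long back to wide, no aggregation
back = long.pivot(index="date", columns="id_location", values="NO_2")

# pivot_table aggregates; the same numbers come out of a groupby
table_means = long.pivot_table(values="NO_2", index="id_location", aggfunc="mean")
group_means = long.groupby("id_location")["NO_2"].mean()

print(long)
print(table_means)
```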

doc/source/getting_started/intro_tutorials/08_combine_dataframes.rst
+1 -1
@@ -305,7 +305,7 @@ More information on join/merge of tables is provided in the user guide section o
 <div class="shadow gs-callout gs-callout-remember">
 <h4>REMEMBER</h4>

-- Multiple tables can be concatenated both column as row wise using
+- Multiple tables can be concatenated both column-wise and row-wise using
 the ``concat`` function.
 - For database-like merging/joining of tables, use the ``merge``
 function.
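
A minimal sketch of the two points in the reworded summary, using hypothetical two-row frames (the names ``first``, ``second`` and ``labels`` and all values are made up for illustration):

```python
import pandas as pd

# Hypothetical frames with made-up values
first = pd.DataFrame({"id": [1, 2], "x": [10, 20]})
second = pd.DataFrame({"id": [3, 4], "x": [30, 40]})
labels = pd.DataFrame({"id": [1, 2], "label": ["p", "q"]})

# Row-wise concatenation stacks the tables on top of each other
row_wise = pd.concat([first, second], axis=0, ignore_index=True)

# Column-wise concatenation places them side by side, aligned on the index
column_wise = pd.concat([first, second], axis=1)

# Database-like join on a shared key column
merged = first.merge(labels, on="id", how="left")

print(row_wise.shape, column_wise.shape)
print(merged)
```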

doc/source/getting_started/intro_tutorials/09_timeseries.rst
+3 -3
@@ -78,7 +78,7 @@ provide any datetime operations (e.g. extract the year, day of the
 week,…). By applying the ``to_datetime`` function, pandas interprets the
 strings and convert these to datetime (i.e. ``datetime64[ns, UTC]``)
 objects. In pandas we call these datetime objects similar to
-``datetime.datetime`` from the standard library a :class:`pandas.Timestamp`.
+``datetime.datetime`` from the standard library as :class:`pandas.Timestamp`.

 .. raw:: html

@@ -99,7 +99,7 @@ objects. In pandas we call these datetime objects similar to
 Why are these :class:`pandas.Timestamp` objects useful? Let’s illustrate the added
 value with some example cases.

-What is the start and end date of the time series data set working
+What is the start and end date of the time series data set we are working
 with?

 .. ipython:: python
@@ -214,7 +214,7 @@ Plot the typical :math:`NO_2` pattern during the day of our time series of all s

 Similar to the previous case, we want to calculate a given statistic
 (e.g. mean :math:`NO_2`) **for each hour of the day** and we can use the
-split-apply-combine approach again. For this case, the datetime property ``hour``
+split-apply-combine approach again. For this case, we use the datetime property ``hour``
 of pandas ``Timestamp``, which is also accessible by the ``dt`` accessor.

 .. raw:: html
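
The passages edited in this file describe three steps: parsing strings with ``to_datetime``, asking for the start and end of the series, and grouping by the ``hour`` property through the ``dt`` accessor. A minimal sketch of those steps follows; the three timestamps and the ``value`` column are made-up sample data.

```python
import pandas as pd

# Made-up sample; the strings carry a UTC offset so the result is timezone-aware
df = pd.DataFrame(
    {
        "datetime": [
            "2019-05-07 01:00:00+00:00",
            "2019-05-07 02:00:00+00:00",
            "2019-05-08 01:00:00+00:00",
        ],
        "value": [10.0, 25.0, 30.0],
    }
)

# Strings become datetime objects; each element is a pandas.Timestamp
df["datetime"] = pd.to_datetime(df["datetime"])

# Start and end of the series
start, end = df["datetime"].min(), df["datetime"].max()

# Mean value for each hour of the day, via the dt accessor
hourly_mean = df.groupby(df["datetime"].dt.hour)["value"].mean()

print(start, end)
print(hourly_mean)
```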

doc/source/getting_started/intro_tutorials/10_text_data.rst
+10 -10
@@ -23,7 +23,7 @@
 <div class="card-body">
 <p class="card-text">

-This tutorial uses the titanic data set, stored as CSV. The data
+This tutorial uses the Titanic data set, stored as CSV. The data
 consists of the following data columns:

 - PassengerId: Id of every passenger.
@@ -102,7 +102,7 @@ Create a new column ``Surname`` that contains the surname of the Passengers by e

 Using the :meth:`Series.str.split` method, each of the values is returned as a list of
 2 elements. The first element is the part before the comma and the
-second element the part after the comma.
+second element is the part after the comma.

 .. ipython:: python

@@ -135,7 +135,7 @@ More information on extracting parts of strings is available in the user guide s
 <ul class="task-bullet">
 <li>

-Extract the passenger data about the Countess on board of the Titanic.
+Extract the passenger data about the Countesses on board of the Titanic.

 .. ipython:: python

@@ -145,24 +145,24 @@ Extract the passenger data about the Countess on board of the Titanic.

 titanic[titanic["Name"].str.contains("Countess")]

-(*Interested in her story? See*\ `Wikipedia <https://en.wikipedia.org/wiki/No%C3%ABl_Leslie,_Countess_of_Rothes>`__\ *!*)
+(*Interested in her story? See *\ `Wikipedia <https://en.wikipedia.org/wiki/No%C3%ABl_Leslie,_Countess_of_Rothes>`__\ *!*)

 The string method :meth:`Series.str.contains` checks for each of the values in the
 column ``Name`` if the string contains the word ``Countess`` and returns
 for each of the values ``True`` (``Countess`` is part of the name) of
-``False`` (``Countess`` is notpart of the name). This output can be used
+``False`` (``Countess`` is not part of the name). This output can be used
 to subselect the data using conditional (boolean) indexing introduced in
 the :ref:`subsetting of data tutorial <10min_tut_03_subset>`. As there was
-only 1 Countess on the Titanic, we get one row as a result.
+only one Countess on the Titanic, we get one row as a result.

 .. raw:: html

 </li>
 </ul>

 .. note::
-More powerful extractions on strings is supported, as the
-:meth:`Series.str.contains` and :meth:`Series.str.extract` methods accepts `regular
+More powerful extractions on strings are supported, as the
+:meth:`Series.str.contains` and :meth:`Series.str.extract` methods accept `regular
 expressions <https://docs.python.org/3/library/re.html>`__, but out of
 scope of this tutorial.

@@ -182,7 +182,7 @@ More information on extracting parts of strings is available in the user guide s
 <ul class="task-bullet">
 <li>

-Which passenger of the titanic has the longest name?
+Which passenger of the Titanic has the longest name?

 .. ipython:: python

@@ -220,7 +220,7 @@ we can do a selection using the ``loc`` operator, introduced in the
 <ul class="task-bullet">
 <li>

-In the Sex’ columns, replace values of male by ‘M’ and all ‘female’ values by ‘F’
+In the "Sex" column, replace values of "male" by "M" and values of "female" by "F"

 .. ipython:: python
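
The string operations named in these hunks (``str.split``, ``str.contains``, the longest-name lookup, and the ``Sex`` replacement) can be tried on a tiny frame. The sketch below is an illustration under stated assumptions: the three rows are a hand-made sample in the tutorial's ``Name`` and ``Sex`` layout, and the ``Sex_short`` column name is hypothetical.

```python
import pandas as pd

# Hand-made three-row sample in the tutorial's column layout
titanic = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)",
            "Heikkinen, Miss. Laina",
        ],
        "Sex": ["male", "female", "female"],
    }
)

# Split on the comma and keep the first element: the part before the comma
titanic["Surname"] = titanic["Name"].str.split(",").str.get(0)

# Boolean mask from str.contains, then conditional indexing
countess_rows = titanic[titanic["Name"].str.contains("Countess")]

# Longest name: label of the maximum string length, then select with loc
longest_name = titanic.loc[titanic["Name"].str.len().idxmax(), "Name"]

# Map "male" to "M" and "female" to "F" (hypothetical new column name)
titanic["Sex_short"] = titanic["Sex"].replace({"male": "M", "female": "F"})

print(titanic[["Surname", "Sex_short"]])
print(len(countess_rows), longest_name)
```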

doc/source/reference/extensions.rst
+1
@@ -45,6 +45,7 @@ objects.
 api.extensions.ExtensionArray.copy
 api.extensions.ExtensionArray.view
 api.extensions.ExtensionArray.dropna
+api.extensions.ExtensionArray.equals
 api.extensions.ExtensionArray.factorize
 api.extensions.ExtensionArray.fillna
 api.extensions.ExtensionArray.isna
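
For context on the newly listed ``ExtensionArray.equals`` entry, a brief sketch of how the method behaves on a built-in nullable array. It assumes a pandas version that already ships ``equals`` on extension arrays; the variable names are illustrative.

```python
import pandas as pd

left = pd.array([1, 2, None], dtype="Int64")
same = pd.array([1, 2, None], dtype="Int64")
different = pd.array([1, 2, 3], dtype="Int64")

# equals() returns a single bool; missing values in the same position count as equal
print(left.equals(same))
print(left.equals(different))

# Element-wise comparison, by contrast, propagates the missing value
elementwise = left == same
print(elementwise)
```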
