
Commit 593dda1

Merge remote-tracking branch 'upstream/master' into bug/categorical-indexing-1row-df
* upstream/master: (49 commits)
  repr() (pandas-dev#29959)
  DOC: Typo fix in userguide/Styling (pandas-dev#29956)
  CLN: small things in pytables (pandas-dev#29958)
  API/DEPR: Change default skipna behaviour + deprecate numeric_only in Categorical.min and max (pandas-dev#27929)
  DEPR: DTI/TDI/PI constructor arguments (pandas-dev#29930)
  CLN: fix pytables passing too many kwargs (pandas-dev#29951)
  Typing (pandas-dev#29947)
  repr() (pandas-dev#29948)
  repr() (pandas-dev#29950)
  Added space at the end of the sentence (pandas-dev#29949)
  ENH: add NA scalar for missing value indicator, use in StringArray. (pandas-dev#29597)
  CLN: BlockManager.apply (pandas-dev#29825)
  TST: add test for rolling max/min/mean with DatetimeIndex over different frequencies (pandas-dev#29932)
  CLN: explicit signature for to_hdf (pandas-dev#29939)
  CLN: make kwargs explicit for pytables read_ methods (pandas-dev#29935)
  Convert core/indexes/base.py to f-strings (pandas-dev#29903)
  DEPR: dropna multiple axes, fillna int for td64, from_codes with floats, Series.nonzero (pandas-dev#29875)
  CLN: make kwargs explicit in pytables constructors (pandas-dev#29936)
  DEPR: tz_convert in the Timestamp constructor raises (pandas-dev#29929)
  STY: F-strings and repr (pandas-dev#29938)
  ...
2 parents 4257fe8 + 0c2b1db commit 593dda1

File tree: 159 files changed, +1831 -2645 lines


.github/workflows/ci.yml (+9 -6)

@@ -80,15 +80,18 @@ jobs:
         git fetch upstream
         if git diff upstream/master --name-only | grep -q "^asv_bench/"; then
           asv machine --yes
-          ASV_OUTPUT="$(asv dev)"
-          if [[ $(echo "$ASV_OUTPUT" | grep "failed") ]]; then
-            echo "##vso[task.logissue type=error]Benchmarks run with errors"
-            echo "$ASV_OUTPUT"
+          asv dev | sed "/failed$/ s/^/##[error]/" | tee benchmarks.log
+          if grep "failed" benchmarks.log > /dev/null ; then
             exit 1
-          else
-            echo "Benchmarks run without errors"
           fi
         else
           echo "Benchmarks did not run, no changes detected"
         fi
       if: true
+
+    - name: Publish benchmarks artifact
+      uses: actions/upload-artifact@master
+      with:
+        name: Benchmarks log
+        path: asv_bench/benchmarks.log
+      if: failure()
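The rewritten step relies on `sed "/failed$/ s/^/##[error]/"` to prefix failing benchmark lines with the `##[error]` workflow command, while `tee` keeps the full output in `benchmarks.log` for the artifact upload. A rough Python model of that substitution (the helper name is illustrative, not part of the CI):

```python
import re

def annotate_failures(lines):
    # Mirror sed "/failed$/ s/^/##[error]/": prefix any line that ends
    # with "failed" so GitHub Actions surfaces it as an error annotation.
    return [f"##[error]{line}" if re.search(r"failed$", line) else line
            for line in lines]

log = annotate_failures(["bench_a ok", "bench_b failed"])
print(log)  # → ['bench_a ok', '##[error]bench_b failed']
```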

asv_bench/benchmarks/frame_methods.py (+1 -1)

@@ -565,7 +565,7 @@ def setup(self):
 
     def time_frame_get_dtype_counts(self):
         with warnings.catch_warnings(record=True):
-            self.df.get_dtype_counts()
+            self.df._data.get_dtype_counts()
 
     def time_info(self):
         self.df.info()
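The benchmark now reaches for the private `self.df._data` block manager because the public `DataFrame.get_dtype_counts` is on its way out (this commit also drops it from the API reference). In user code, the commonly suggested replacement is counting over `df.dtypes` — a sketch, assuming pandas is importable:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5], "c": ["x", "y"]})

# Equivalent of the deprecated df.get_dtype_counts(): one count per dtype.
counts = df.dtypes.value_counts()
print(counts.sum())  # → 3 (three columns, one of each dtype)
```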

ci/code_checks.sh (+12 -16)

@@ -34,17 +34,13 @@ function invgrep {
     #
     # This is useful for the CI, as we want to fail if one of the patterns
     # that we want to avoid is found by grep.
-    if [[ "$AZURE" == "true" ]]; then
-        set -o pipefail
-        grep -n "$@" | awk -F ":" '{print "##vso[task.logissue type=error;sourcepath=" $1 ";linenumber=" $2 ";] Found unwanted pattern: " $3}'
-    else
-        grep "$@"
-    fi
-    return $((! $?))
+    grep -n "$@" | sed "s/^/$INVGREP_PREPEND/" | sed "s/$/$INVGREP_APPEND/" ; EXIT_STATUS=${PIPESTATUS[0]}
+    return $((! $EXIT_STATUS))
 }
 
-if [[ "$AZURE" == "true" ]]; then
-    FLAKE8_FORMAT="##vso[task.logissue type=error;sourcepath=%(path)s;linenumber=%(row)s;columnnumber=%(col)s;code=%(code)s;]%(text)s"
+if [[ "$GITHUB_ACTIONS" == "true" ]]; then
+    FLAKE8_FORMAT="##[error]%(path)s:%(row)s:%(col)s:%(code)s:%(text)s"
+    INVGREP_PREPEND="##[error]"
 else
     FLAKE8_FORMAT="default"
 fi

@@ -198,15 +194,15 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then
     invgrep -R --include="*.py" -P '# type: (?!ignore)' pandas
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
+    MSG='Check for use of foo.__class__ instead of type(foo)' ; echo $MSG
+    invgrep -R --include=*.{py,pyx} '\.__class__' pandas
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
+
     MSG='Check that no file in the repo contains trailing whitespaces' ; echo $MSG
-    set -o pipefail
-    if [[ "$AZURE" == "true" ]]; then
-        # we exclude all c/cpp files as the c/cpp files of pandas code base are tested when Linting .c and .h files
-        ! grep -n '--exclude=*.'{svg,c,cpp,html,js} --exclude-dir=env -RI "\s$" * | awk -F ":" '{print "##vso[task.logissue type=error;sourcepath=" $1 ";linenumber=" $2 ";] Trailing whitespaces found: " $3}'
-    else
-        ! grep -n '--exclude=*.'{svg,c,cpp,html,js} --exclude-dir=env -RI "\s$" * | awk -F ":" '{print $1 ":" $2 ":Trailing whitespaces found: " $3}'
-    fi
+    INVGREP_APPEND=" <- trailing whitespaces found"
+    invgrep -RI --exclude=\*.{svg,c,cpp,html,js} --exclude-dir=env "\s$" *
     RET=$(($RET + $?)) ; echo $MSG "DONE"
+    unset INVGREP_APPEND
 fi
 
 ### CODE ###
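The new pattern check bans `foo.__class__` in favour of `type(foo)`. The two are equivalent for ordinary instances; the check standardises on the spelling that reads better in the f-string conversions landing in this same batch of commits. A minimal illustration (the class name is hypothetical):

```python
class Frame:
    """Stand-in class; any ordinary Python class behaves the same."""

obj = Frame()

# For normal instances, type(obj) and obj.__class__ are the same object;
# the lint check simply enforces the type(...) spelling.
assert type(obj) is obj.__class__

# The preferred idiom in repr/f-string code:
print(f"{type(obj).__name__}")  # → Frame
```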

doc/redirects.csv (-181)

Large diffs are not rendered by default.

doc/source/getting_started/basics.rst (+1 -1)

@@ -2006,7 +2006,7 @@ The number of columns of each type in a ``DataFrame`` can be found by calling
 
 Numeric dtypes will propagate and can coexist in DataFrames.
 If a dtype is passed (either directly via the ``dtype`` keyword, a passed ``ndarray``,
-or a passed ``Series``, then it will be preserved in DataFrame operations. Furthermore,
+or a passed ``Series``), then it will be preserved in DataFrame operations. Furthermore,
 different numeric dtypes will **NOT** be combined. The following example will give you a taste.
 
 .. ipython:: python

doc/source/reference/frame.rst (-2)

@@ -28,7 +28,6 @@ Attributes and underlying data
    :toctree: api/
 
    DataFrame.dtypes
-   DataFrame.get_dtype_counts
    DataFrame.select_dtypes
    DataFrame.values
    DataFrame.get_values

@@ -363,7 +362,6 @@ Serialization / IO / conversion
    DataFrame.to_msgpack
    DataFrame.to_gbq
    DataFrame.to_records
-   DataFrame.to_dense
    DataFrame.to_string
    DataFrame.to_clipboard
    DataFrame.style

doc/source/reference/indexing.rst (-4)

@@ -32,7 +32,6 @@ Properties
    Index.has_duplicates
    Index.hasnans
    Index.dtype
-   Index.dtype_str
    Index.inferred_type
    Index.is_all_dates
    Index.shape

@@ -42,9 +41,6 @@ Properties
    Index.ndim
    Index.size
    Index.empty
-   Index.strides
-   Index.itemsize
-   Index.base
    Index.T
    Index.memory_usage

doc/source/reference/series.rst (-6)

@@ -33,16 +33,11 @@ Attributes
    Series.nbytes
    Series.ndim
    Series.size
-   Series.strides
-   Series.itemsize
-   Series.base
    Series.T
    Series.memory_usage
    Series.hasnans
-   Series.flags
    Series.empty
    Series.dtypes
-   Series.data
    Series.name
    Series.put

@@ -584,7 +579,6 @@ Serialization / IO / conversion
    Series.to_sql
    Series.to_msgpack
    Series.to_json
-   Series.to_dense
    Series.to_string
    Series.to_clipboard
    Series.to_latex

doc/source/user_guide/indexing.rst (+1 -1)

@@ -374,7 +374,7 @@ For getting values with a boolean array:
    df1.loc['a'] > 0
    df1.loc[:, df1.loc['a'] > 0]
 
-For getting a value explicitly (equivalent to deprecated ``df.get_value('a','A')``):
+For getting a value explicitly:
 
 .. ipython:: python

doc/source/user_guide/missing_data.rst (+143 -6)

@@ -12,10 +12,10 @@ pandas.
 .. note::
 
    The choice of using ``NaN`` internally to denote missing data was largely
-   for simplicity and performance reasons. It differs from the MaskedArray
-   approach of, for example, :mod:`scikits.timeseries`. We are hopeful that
-   NumPy will soon be able to provide a native NA type solution (similar to R)
-   performant enough to be used in pandas.
+   for simplicity and performance reasons.
+   Starting from pandas 1.0, some optional data types start experimenting
+   with a native ``NA`` scalar using a mask-based approach. See
+   :ref:`here <missing_data.NA>` for more.
 
 See the :ref:`cookbook<cookbook.missing_data>` for some advanced strategies.

@@ -110,7 +110,7 @@ pandas objects provide compatibility between ``NaT`` and ``NaN``.
 .. _missing.inserting:
 
 Inserting missing data
-----------------------
+~~~~~~~~~~~~~~~~~~~~~~
 
 You can insert missing values by simply assigning to containers. The
 actual missing value used will be chosen based on the dtype.

@@ -135,9 +135,10 @@ For object containers, pandas will use the value given:
    s.loc[1] = np.nan
    s
 
+.. _missing_data.calculations:
 
 Calculations with missing data
-------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Missing values propagate naturally through arithmetic operations between pandas
 objects.

@@ -771,3 +772,139 @@ the ``dtype="Int64"``.
    s
 
 See :ref:`integer_na` for more.
+
+
+.. _missing_data.NA:
+
+Experimental ``NA`` scalar to denote missing values
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. warning::
+
+   Experimental: the behaviour of ``pd.NA`` can still change without warning.
+
+.. versionadded:: 1.0.0
+
+Starting from pandas 1.0, an experimental ``pd.NA`` value (singleton) is
+available to represent scalar missing values. At this moment, it is used in
+the nullable :doc:`integer <integer_na>`, boolean and
+:ref:`dedicated string <text.types>` data types as the missing value indicator.
+
+The goal of ``pd.NA`` is to provide a "missing" indicator that can be used
+consistently across data types (instead of ``np.nan``, ``None`` or ``pd.NaT``
+depending on the data type).
+
+For example, when having missing values in a Series with the nullable integer
+dtype, it will use ``pd.NA``:
+
+.. ipython:: python
+
+   s = pd.Series([1, 2, None], dtype="Int64")
+   s
+   s[2]
+   s[2] is pd.NA
+
+Currently, pandas does not yet use those data types by default (when creating
+a DataFrame or Series, or when reading in data), so you need to specify
+the dtype explicitly.
+
+Propagation in arithmetic and comparison operations
+---------------------------------------------------
+
+In general, missing values *propagate* in operations involving ``pd.NA``. When
+one of the operands is unknown, the outcome of the operation is also unknown.
+
+For example, ``pd.NA`` propagates in arithmetic operations, similarly to
+``np.nan``:
+
+.. ipython:: python
+
+   pd.NA + 1
+   "a" * pd.NA
+
+In equality and comparison operations, ``pd.NA`` also propagates. This deviates
+from the behaviour of ``np.nan``, where comparisons with ``np.nan`` always
+return ``False``.
+
+.. ipython:: python
+
+   pd.NA == 1
+   pd.NA == pd.NA
+   pd.NA < 2.5
+
+To check if a value is equal to ``pd.NA``, the :func:`isna` function can be
+used:
+
+.. ipython:: python
+
+   pd.isna(pd.NA)
+
+An exception to this basic propagation rule are *reductions* (such as the
+mean or the minimum), where pandas defaults to skipping missing values. See
+:ref:`above <missing_data.calculations>` for more.
+
+Logical operations
+------------------
+
+For logical operations, ``pd.NA`` follows the rules of
+`three-valued logic <https://en.wikipedia.org/wiki/Three-valued_logic>`__ (or
+*Kleene logic*, similarly to R, SQL and Julia). Under this logic, missing
+values only propagate when it is logically required.
+
+For example, for the logical "or" operation (``|``), if one of the operands
+is ``True``, we already know the result will be ``True``, regardless of the
+other value (so regardless of whether the missing value would be ``True`` or
+``False``). In this case, ``pd.NA`` does not propagate:
+
+.. ipython:: python
+
+   True | False
+   True | pd.NA
+   pd.NA | True
+
+On the other hand, if one of the operands is ``False``, the result depends
+on the value of the other operand. Therefore, in this case ``pd.NA``
+propagates:
+
+.. ipython:: python
+
+   False | True
+   False | False
+   False | pd.NA
+
+The behaviour of the logical "and" operation (``&``) can be derived using
+similar logic (where now ``pd.NA`` will not propagate if one of the operands
+is already ``False``):
+
+.. ipython:: python
+
+   False & True
+   False & False
+   False & pd.NA
+
+.. ipython:: python
+
+   True & True
+   True & False
+   True & pd.NA
+
+
+``NA`` in a boolean context
+---------------------------
+
+Since the actual value of an NA is unknown, it is ambiguous to convert NA
+to a boolean value. The following raises an error:
+
+.. ipython:: python
+   :okexcept:
+
+   bool(pd.NA)
+
+This also means that ``pd.NA`` cannot be used in a context where it is
+evaluated to a boolean, such as ``if condition: ...`` where ``condition`` can
+potentially be ``pd.NA``. In such cases, :func:`isna` can be used to check
+for ``pd.NA``, or ``condition`` being ``pd.NA`` can be avoided, for example by
+filling missing values beforehand.
+
+A similar situation occurs when using Series or DataFrame objects in ``if``
+statements, see :ref:`gotchas.truth`.
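The "or"/"and" truth tables above follow directly from treating ``NA`` as an unknown boolean. As a plain-Python sketch of Kleene logic (a toy ``NA`` stand-in for illustration only, not pandas' actual implementation):

```python
class _NA:
    """Toy three-valued-logic NA, illustrating the rules above."""
    def __repr__(self):
        return "NA"
    def __or__(self, other):
        # True | NA is True (result known regardless of the missing value);
        # False | NA stays unknown.
        return True if other is True else self
    __ror__ = __or__
    def __and__(self, other):
        # False & NA is False; True & NA stays unknown.
        return False if other is False else self
    __rand__ = __and__
    def __bool__(self):
        # Mirrors pd.NA: ambiguous in a boolean context.
        raise TypeError("boolean value of NA is ambiguous")

NA = _NA()
print(True | NA)    # → True
print(False | NA)   # → NA
print(False & NA)   # → False
print(True & NA)    # → NA
```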

doc/source/user_guide/style.ipynb (+1 -1)

@@ -677,7 +677,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Notice that you're able share the styles even though they're data aware. The styles are re-evaluated on the new DataFrame they've been `use`d upon."
+    "Notice that you're able to share the styles even though they're data aware. The styles are re-evaluated on the new DataFrame they've been `use`d upon."
    ]
   },
   {

doc/source/whatsnew/v0.15.0.rst (+2 -3)

@@ -312,14 +312,13 @@ Timezone handling improvements
    previously this resulted in ``Exception`` or ``TypeError`` (:issue:`7812`)
 
 .. ipython:: python
-   :okwarning:
 
    ts = pd.Timestamp('2014-08-01 09:00', tz='US/Eastern')
    ts
    ts.tz_localize(None)
 
-   didx = pd.DatetimeIndex(start='2014-08-01 09:00', freq='H',
-                           periods=10, tz='US/Eastern')
+   didx = pd.date_range(start='2014-08-01 09:00', freq='H',
+                        periods=10, tz='US/Eastern')
    didx
    didx.tz_localize(None)

doc/source/whatsnew/v0.25.1.rst (+1 -1)

@@ -9,7 +9,7 @@ including other versions of pandas.
 I/O and LZMA
 ~~~~~~~~~~~~
 
-Some users may unknowingly have an incomplete Python installation lacking the `lzma` module from the standard library. In this case, `import pandas` failed due to an `ImportError` (:issue: `27575`).
+Some users may unknowingly have an incomplete Python installation lacking the `lzma` module from the standard library. In this case, `import pandas` failed due to an `ImportError` (:issue:`27575`).
 Pandas will now warn, rather than raising an `ImportError` if the `lzma` module is not present. Any subsequent attempt to use `lzma` methods will raise a `RuntimeError`.
 A possible fix for the lack of the `lzma` module is to ensure you have the necessary libraries and then re-install Python.
 For example, on MacOS installing Python with `pyenv` may lead to an incomplete Python installation due to unmet system dependencies at compilation time (like `xz`). Compilation will succeed, but Python might fail at run time. The issue can be solved by installing the necessary dependencies and then re-installing Python.
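To see whether a given interpreter is affected, you can probe for the module directly using only the standard library (the diagnostic messages below are illustrative):

```python
import importlib.util

# On an incomplete build, the stdlib lzma module is absent entirely.
if importlib.util.find_spec("lzma") is None:
    print("lzma missing: install the xz development libraries, then re-install Python")
else:
    import lzma
    # Round-trip a small payload to confirm the module is functional.
    assert lzma.decompress(lzma.compress(b"pandas")) == b"pandas"
    print("lzma available")
```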
