
Commit cd8728f

Merge branch 'master' of https://github.com/pandas-dev/pandas into ref-dtypes
2 parents 11ed746 + 0d9b57f commit cd8728f

File tree: 21 files changed (+211 -41 lines)

.travis.yml (+2 -2)

@@ -69,9 +69,9 @@ matrix:
       env:
         - JOB="3.7, arm64" PYTEST_WORKERS=8 ENV_FILE="ci/deps/travis-37-arm64.yaml" PATTERN="(not slow and not network and not clipboard)"
     - dist: bionic
-      python: 3.9-dev
       env:
-        - JOB="3.9-dev" PATTERN="(not slow and not network)"
+        - JOB="3.9-dev" PATTERN="(not slow and not network and not clipboard)"
+      python: 3.9-dev
 
 before_install:
   - echo "before_install"

doc/source/user_guide/cookbook.rst (+19)

@@ -1166,6 +1166,25 @@ Storing Attributes to a group node
     store.close()
     os.remove('test.h5')
 
+You can create or load an HDFStore in-memory by passing the ``driver``
+parameter to PyTables. Changes are only written to disk when the HDFStore
+is closed.
+
+.. ipython:: python
+
+    store = pd.HDFStore('test.h5', 'w', driver='H5FD_CORE')
+
+    df = pd.DataFrame(np.random.randn(8, 3))
+    store['test'] = df
+
+    # only after closing the store, data is written to disk:
+    store.close()
+
+.. ipython:: python
+    :suppress:
+
+    os.remove('test.h5')
+
 .. _cookbook.binary:
 
 Binary files

doc/source/user_guide/enhancingperf.rst (+8)

@@ -13,6 +13,14 @@ when we use Cython and Numba on a test function operating row-wise on the
 ``DataFrame``. Using :func:`pandas.eval` we will speed up a sum by an order of
 ~2.
 
+.. note::
+
+    In addition to following the steps in this tutorial, users interested in enhancing
+    performance are highly encouraged to install the
+    :ref:`recommended dependencies<install.recommended_dependencies>` for pandas.
+    These dependencies are often not installed by default, but will offer speed
+    improvements if present.
+
 .. _enhancingperf.cython:
 
 Cython (writing C extensions for pandas)
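The note added in this hunk points readers at pandas' recommended dependencies. As a quick, hedged sketch (assuming the usual package names ``numexpr`` and ``bottleneck``, which the pandas install guide lists as its recommended dependencies), you can check whether they are importable before expecting the speed-ups:

```python
import importlib.util


def optional_deps_installed(names=("numexpr", "bottleneck")):
    """Report which optional speed-up packages are importable, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print(optional_deps_installed())
```

`find_spec` only probes the import machinery, so the check is cheap and has no side effects even when a package is missing.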

doc/source/user_guide/timeseries.rst (+8)

@@ -235,6 +235,8 @@ inferred frequency upon creation:
 
     pd.DatetimeIndex(['2018-01-01', '2018-01-03', '2018-01-05'], freq='infer')
 
+.. _timeseries.converting.format:
+
 Providing a format argument
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -319,6 +321,12 @@ which can be specified. These are computed from the starting point specified by
     pd.to_datetime([1349720105100, 1349720105200, 1349720105300,
                     1349720105400, 1349720105500], unit='ms')
 
+.. note::
+
+    The ``unit`` parameter does not use the same strings as the ``format`` parameter
+    that was discussed :ref:`above <timeseries.converting.format>`. The
+    available units are listed on the documentation for :func:`pandas.to_datetime`.
+
 Constructing a :class:`Timestamp` or :class:`DatetimeIndex` with an epoch timestamp
 with the ``tz`` argument specified will currently localize the epoch timestamps to UTC
 first then convert the result to the specified time zone. However, this behavior
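The note in this hunk distinguishes ``unit`` (fixed steps from the epoch: ``'D'``, ``'s'``, ``'ms'``, ``'us'``, ``'ns'``) from ``format`` (strftime-style codes). A minimal standalone illustration in plain pandas:

```python
import pandas as pd

# `unit` counts from the Unix epoch in fixed units...
from_epoch = pd.to_datetime(1349720105100, unit="ms")

# ...while `format` parses strings with strftime-style codes.
from_string = pd.to_datetime("2012-10-08 18:15:05.1", format="%Y-%m-%d %H:%M:%S.%f")

# Both spellings land on the same instant.
assert from_epoch == from_string
```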

doc/source/user_guide/visualization.rst (+8 -8)

@@ -443,9 +443,8 @@ Faceting, created by ``DataFrame.boxplot`` with the ``by``
 keyword, will affect the output type as well:
 
 ================ ======= ==========================
-``return_type=`` Faceted Output type
----------------- ------- --------------------------
-
+``return_type``  Faceted Output type
+================ ======= ==========================
 ``None``         No      axes
 ``None``         Yes     2-D ndarray of axes
 ``'axes'``       No      axes

@@ -1424,7 +1423,7 @@ Here is an example of one way to easily plot group means with standard deviation
     # Plot
     fig, ax = plt.subplots()
     @savefig errorbar_example.png
-    means.plot.bar(yerr=errors, ax=ax, capsize=4)
+    means.plot.bar(yerr=errors, ax=ax, capsize=4, rot=0)
 
 .. ipython:: python
     :suppress:

@@ -1445,9 +1444,9 @@ Plotting with matplotlib table is now supported in :meth:`DataFrame.plot` and :
 
 .. ipython:: python
 
-    fig, ax = plt.subplots(1, 1)
+    fig, ax = plt.subplots(1, 1, figsize=(7, 6.5))
     df = pd.DataFrame(np.random.rand(5, 3), columns=['a', 'b', 'c'])
-    ax.get_xaxis().set_visible(False)   # Hide Ticks
+    ax.xaxis.tick_top()  # Display x-axis ticks on top.
 
     @savefig line_plot_table_true.png
     df.plot(table=True, ax=ax)

@@ -1464,8 +1463,9 @@ as seen in the example below.
 
 .. ipython:: python
 
-    fig, ax = plt.subplots(1, 1)
-    ax.get_xaxis().set_visible(False)   # Hide Ticks
+    fig, ax = plt.subplots(1, 1, figsize=(7, 6.75))
+    ax.xaxis.tick_top()  # Display x-axis ticks on top.
+
     @savefig line_plot_table_data.png
     df.plot(table=np.round(df.T, 2), ax=ax)
 

doc/source/whatsnew/v1.1.0.rst (+2)

@@ -956,6 +956,7 @@ MultiIndex
     df.loc[(['b', 'a'], [2, 1]), :]
 
 - Bug in :meth:`MultiIndex.intersection` was not guaranteed to preserve order when ``sort=False``. (:issue:`31325`)
+- Bug in :meth:`DataFrame.truncate` was dropping :class:`MultiIndex` names. (:issue:`34564`)
 
 .. ipython:: python

@@ -1058,6 +1059,7 @@ Reshaping
 - Bug in :func:`Dataframe.aggregate` and :func:`Series.aggregate` was causing recursive loop in some cases (:issue:`34224`)
 - Fixed bug in :func:`melt` where melting MultiIndex columns with ``col_level`` > 0 would raise a ``KeyError`` on ``id_vars`` (:issue:`34129`)
 - Bug in :meth:`Series.where` with an empty Series and empty ``cond`` having non-bool dtype (:issue:`34592`)
+- Fixed regression where :meth:`DataFrame.apply` would raise ``ValueError`` for elements with ``S`` dtype (:issue:`34529`)
 
 Sparse
 ^^^^^^

pandas/core/dtypes/cast.py (+1 -1)

@@ -1608,7 +1608,7 @@ def construct_1d_ndarray_preserving_na(
     """
     subarr = np.array(values, dtype=dtype, copy=copy)
 
-    if dtype is not None and dtype.kind in ("U", "S"):
+    if dtype is not None and dtype.kind == "U":
         # GH-21083
         # We can't just return np.array(subarr, dtype='str') since
         # NumPy will convert the non-string objects into strings
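The one-line change above narrows pandas' special-casing from both NumPy string kinds to unicode only, so byte strings are no longer funneled through the string-conversion workaround. A small refresher on the ``dtype.kind`` codes involved (plain NumPy, not the pandas internal function):

```python
import numpy as np

# "U" marks fixed-width unicode strings, "S" fixed-width byte strings.
assert np.dtype("U4").kind == "U"
assert np.dtype("S4").kind == "S"

# An array built from Python bytes gets an "S" dtype, which the changed
# branch now leaves alone instead of treating it like unicode strings.
arr = np.array([b"abcd", b"efgh"])
assert arr.dtype.kind == "S"
```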

pandas/core/indexes/base.py (+6 -1)

@@ -518,7 +518,12 @@ def is_(self, other) -> bool:
 
         Returns
         -------
-        True if both have same underlying data, False otherwise : bool
+        bool
+            True if both have same underlying data, False otherwise.
+
+        See Also
+        --------
+        Index.identical : Works like ``Index.is_`` but also checks metadata.
         """
         # use something other than None to be clearer
         return self._id is getattr(other, "_id", Ellipsis) and self._id is not None
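The docstring change above contrasts ``Index.is_`` (object identity of the underlying data) with ``Index.identical`` (value and metadata equality). The difference in one runnable sketch:

```python
import pandas as pd

i1 = pd.Index([1, 2, 3], name="n")
i2 = i1                              # same underlying object
i3 = pd.Index([1, 2, 3], name="n")   # equal values and metadata, new object

assert i1.is_(i2)        # shares the underlying data
assert not i1.is_(i3)    # a fresh index is a different object...
assert i1.identical(i3)  # ...yet identical: equal values and metadata
```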

pandas/core/indexes/multi.py (+6 -1)

@@ -3193,7 +3193,12 @@ def truncate(self, before=None, after=None):
         new_codes = [level_codes[left:right] for level_codes in self.codes]
         new_codes[0] = new_codes[0] - i
 
-        return MultiIndex(levels=new_levels, codes=new_codes, verify_integrity=False)
+        return MultiIndex(
+            levels=new_levels,
+            codes=new_codes,
+            names=self._names,
+            verify_integrity=False,
+        )
 
     def equals(self, other) -> bool:
         """

pandas/io/pytables.py (+27 -12)

@@ -447,8 +447,8 @@ class HDFStore:
 
     Parameters
     ----------
-    path : string
-        File path to HDF5 file
+    path : str
+        File path to HDF5 file.
     mode : {'a', 'w', 'r', 'r+'}, default 'a'
 
         ``'r'``

@@ -462,18 +462,20 @@ class HDFStore:
         ``'r+'``
             It is similar to ``'a'``, but the file must already exist.
     complevel : int, 0-9, default None
-        Specifies a compression level for data.
-        A value of 0 or None disables compression.
+        Specifies a compression level for data.
+        A value of 0 or None disables compression.
     complib : {'zlib', 'lzo', 'bzip2', 'blosc'}, default 'zlib'
-        Specifies the compression library to be used.
-        As of v0.20.2 these additional compressors for Blosc are supported
-        (default if no compressor specified: 'blosc:blosclz'):
-        {'blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:snappy',
-        'blosc:zlib', 'blosc:zstd'}.
-        Specifying a compression library which is not available issues
-        a ValueError.
+        Specifies the compression library to be used.
+        As of v0.20.2 these additional compressors for Blosc are supported
+        (default if no compressor specified: 'blosc:blosclz'):
+        {'blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:snappy',
+        'blosc:zlib', 'blosc:zstd'}.
+        Specifying a compression library which is not available issues
+        a ValueError.
     fletcher32 : bool, default False
-        If applying compression use the fletcher32 checksum
+        If applying compression use the fletcher32 checksum.
+    **kwargs
+        These parameters will be passed to the PyTables open_file method.

@@ -482,6 +484,17 @@ class HDFStore:
     >>> store['foo'] = bar   # write to HDF5
     >>> bar = store['foo']   # retrieve
     >>> store.close()
+
+    **Create or load HDF5 file in-memory**
+
+    When passing the `driver` option to the PyTables open_file method through
+    **kwargs, the HDF5 file is loaded or created in-memory and will only be
+    written when closed:
+
+    >>> bar = pd.DataFrame(np.random.randn(10, 4))
+    >>> store = pd.HDFStore('test.h5', driver='H5FD_CORE')
+    >>> store['foo'] = bar
+    >>> store.close()   # only now, data is written to disk
     """
 
     _handle: Optional["File"]

@@ -634,6 +647,8 @@ def open(self, mode: str = "a", **kwargs):
         ----------
         mode : {'a', 'w', 'r', 'r+'}, default 'a'
             See HDFStore docstring or tables.open_file for info about modes
+        **kwargs
+            These parameters will be passed to the PyTables open_file method.
         """
         tables = _tables()
pandas/tests/arrays/sparse/test_array.py (+12)

@@ -1295,3 +1295,15 @@ def test_map_missing():
 
     result = arr.map({0: 10, 1: 11})
     tm.assert_sp_array_equal(result, expected)
+
+
+@pytest.mark.parametrize("fill_value", [np.nan, 1])
+def test_dropna(fill_value):
+    # GH-28287
+    arr = SparseArray([np.nan, 1], fill_value=fill_value)
+    exp = SparseArray([1.0], fill_value=fill_value)
+    tm.assert_sp_array_equal(arr.dropna(), exp)
+
+    df = pd.DataFrame({"a": [0, 1], "b": arr})
+    expected_df = pd.DataFrame({"a": [1], "b": exp}, index=pd.Int64Index([1]))
+    tm.assert_equal(df.dropna(), expected_df)

pandas/tests/extension/json/array.py (+2 -4)

@@ -179,13 +179,11 @@ def astype(self, dtype, copy=True):
     def unique(self):
         # Parent method doesn't work since np.array will try to infer
         # a 2-dim object.
-        return type(self)(
-            [dict(x) for x in list({tuple(d.items()) for d in self.data})]
-        )
+        return type(self)([dict(x) for x in {tuple(d.items()) for d in self.data}])
 
     @classmethod
     def _concat_same_type(cls, to_concat):
-        data = list(itertools.chain.from_iterable([x.data for x in to_concat]))
+        data = list(itertools.chain.from_iterable(x.data for x in to_concat))
         return cls(data)
 
     def _values_for_factorize(self):
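Both simplifications above drop a redundant intermediate list: a set can be iterated directly, and ``chain.from_iterable`` accepts a generator. The underlying idioms, outside the extension-array class:

```python
import itertools

# Dicts are unhashable, so uniqueness is computed over hashable
# snapshots of their items; no list() around the set is needed.
data = [{"a": 1}, {"b": 2}, {"a": 1}]
unique = [dict(x) for x in {tuple(d.items()) for d in data}]

# chain.from_iterable consumes a generator expression directly,
# avoiding the throwaway inner list.
concatenated = list(itertools.chain.from_iterable(chunk for chunk in ([1, 2], [3])))
```

Note that the set-based deduplication does not preserve the original order of the dicts, which is acceptable for the ``unique`` use case shown in the diff.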

pandas/tests/frame/methods/test_diff.py (+45)

@@ -169,3 +169,48 @@ def test_diff_sparse(self):
         )
 
         tm.assert_frame_equal(result, expected)
+
+    @pytest.mark.parametrize(
+        "axis,expected",
+        [
+            (
+                0,
+                pd.DataFrame(
+                    {
+                        "a": [np.nan, 0, 1, 0, np.nan, np.nan, np.nan, 0],
+                        "b": [np.nan, 1, np.nan, np.nan, -2, 1, np.nan, np.nan],
+                        "c": np.repeat(np.nan, 8),
+                        "d": [np.nan, 3, 5, 7, 9, 11, 13, 15],
+                    },
+                    dtype="Int64",
+                ),
+            ),
+            (
+                1,
+                pd.DataFrame(
+                    {
+                        "a": np.repeat(np.nan, 8),
+                        "b": [0, 1, np.nan, 1, np.nan, np.nan, np.nan, 0],
+                        "c": np.repeat(np.nan, 8),
+                        "d": np.repeat(np.nan, 8),
+                    },
+                    dtype="Int64",
+                ),
+            ),
+        ],
+    )
+    def test_diff_integer_na(self, axis, expected):
+        # GH#24171 IntegerNA Support for DataFrame.diff()
+        df = pd.DataFrame(
+            {
+                "a": np.repeat([0, 1, np.nan, 2], 2),
+                "b": np.tile([0, 1, np.nan, 2], 2),
+                "c": np.repeat(np.nan, 8),
+                "d": np.arange(1, 9) ** 2,
+            },
+            dtype="Int64",
+        )
+
+        # Test case for default behaviour of diff
+        result = df.diff(axis=axis)
+        tm.assert_frame_equal(result, expected)
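The parametrized test above exercises ``diff`` on the nullable ``Int64`` dtype. The essential behavior in a few standalone lines (assumes a pandas release with this support):

```python
import pandas as pd

s = pd.Series([0, 1, pd.NA, 2], dtype="Int64")
d = s.diff()

# diff propagates <NA> around missing values and keeps the nullable dtype.
assert str(d.dtype) == "Int64"
assert d.isna().tolist() == [True, False, True, True]
assert d.iloc[1] == 1
```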

pandas/tests/frame/methods/test_truncate.py (+13)

@@ -104,3 +104,16 @@ def test_truncate_decreasing_index(self, before, after, indices, klass):
         result = values.truncate(before=before, after=after)
         expected = values.loc[indices]
         tm.assert_frame_equal(result, expected)
+
+    def test_truncate_multiindex(self):
+        # GH 34564
+        mi = pd.MultiIndex.from_product([[1, 2, 3, 4], ["A", "B"]], names=["L1", "L2"])
+        s1 = pd.DataFrame(range(mi.shape[0]), index=mi, columns=["col"])
+        result = s1.truncate(before=2, after=3)
+
+        df = pd.DataFrame.from_dict(
+            {"L1": [2, 2, 3, 3], "L2": ["A", "B", "A", "B"], "col": [2, 3, 4, 5]}
+        )
+        expected = df.set_index(["L1", "L2"])
+
+        tm.assert_frame_equal(result, expected)

pandas/tests/frame/test_apply.py (+11)

@@ -785,6 +785,17 @@ def non_reducing_function(val):
         df.applymap(func)
         assert values == df.a.to_list()
 
+    def test_apply_with_byte_string(self):
+        # GH 34529
+        df = pd.DataFrame(np.array([b"abcd", b"efgh"]), columns=["col"])
+        expected = pd.DataFrame(
+            np.array([b"abcd", b"efgh"]), columns=["col"], dtype=object
+        )
+        # After the apply we expect a DataFrame just
+        # like the original, but with the object dtype
+        result = df.apply(lambda x: x.astype("object"))
+        tm.assert_frame_equal(result, expected)
+
 
 class TestInferOutputShape:
     # the user has supplied an opaque UDF where
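The regression test above can be condensed into a standalone check with plain pandas (no test harness assumed):

```python
import numpy as np
import pandas as pd

# A column built from byte strings survives a dtype-only apply unchanged.
df = pd.DataFrame(np.array([b"abcd", b"efgh"]), columns=["col"])
result = df.apply(lambda x: x.astype("object"))

assert result["col"].tolist() == [b"abcd", b"efgh"]
assert result["col"].dtype == object
```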

pandas/tests/indexes/multi/test_analytics.py (+8 -2)

@@ -30,27 +30,33 @@ def test_groupby(idx):
     tm.assert_dict_equal(groups, exp)
 
 
-def test_truncate():
+def test_truncate_multiindex():
+    # GH 34564 for MultiIndex level names check
     major_axis = Index(list(range(4)))
     minor_axis = Index(list(range(2)))
 
     major_codes = np.array([0, 0, 1, 2, 3, 3])
     minor_codes = np.array([0, 1, 0, 1, 0, 1])
 
     index = MultiIndex(
-        levels=[major_axis, minor_axis], codes=[major_codes, minor_codes]
+        levels=[major_axis, minor_axis],
+        codes=[major_codes, minor_codes],
+        names=["L1", "L2"],
     )
 
     result = index.truncate(before=1)
     assert "foo" not in result.levels[0]
     assert 1 in result.levels[0]
+    assert index.names == result.names
 
     result = index.truncate(after=1)
    assert 2 not in result.levels[0]
     assert 1 in result.levels[0]
+    assert index.names == result.names
 
     result = index.truncate(before=1, after=2)
     assert len(result.levels[0]) == 2
+    assert index.names == result.names
 
     msg = "after < before"
     with pytest.raises(ValueError, match=msg):
