Commit 4d70a71

Santhosh18 committed
Modified test cases and added detailed explanation in v1.1.0.rst
1 parent dc7055b commit 4d70a71

4 files changed, +96 -11 lines changed

doc/source/whatsnew/v1.1.0.rst

+70
@@ -715,6 +715,76 @@ apply and applymap on ``DataFrame`` evaluates first row/column only once
 
    df.apply(func, axis=1)
 
+
+.. _whatsnew_110.api_breaking.explode_infer_dtype:
+
+Infer dtypes in explode method for DataFrame and Series
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:meth:`DataFrame.explode` and :meth:`Series.explode` previously always returned ``object`` dtype for the exploded column. The dtype of that column is now inferred from the exploded values. (:issue:`34923`)
+
+.. ipython:: python
+
+   s = pd.Series([1, 2, 3])
+   df = pd.DataFrame({'A': [s, s, s, s], 'B': 1})
+
+*Previous behavior*:
+
+.. code-block:: ipython
+
+   In [3]: df.explode("A").dtypes
+   Out[3]:
+   A    object
+   B     int64
+   dtype: object
+
+*New behavior*:
+
+.. code-block:: ipython
+
+   In [3]: df.explode("A").dtypes
+   Out[3]:
+   A    int64
+   B    int64
+   dtype: object
+
+.. _whatsnew_110.api.other:
+
+Other API changes
+^^^^^^^^^^^^^^^^^
+
+- :meth:`Series.describe` will now show distribution percentiles for ``datetime`` dtypes, statistics ``first`` and ``last``
+  will now be ``min`` and ``max`` to match with numeric dtypes in :meth:`DataFrame.describe` (:issue:`30164`)
+- Added :meth:`DataFrame.value_counts` (:issue:`5377`)
+- :meth:`Groupby.groups` now returns an abbreviated representation when called on large dataframes (:issue:`1135`)
+- ``loc`` lookups with an object-dtype :class:`Index` and an integer key will now raise ``KeyError`` instead of ``TypeError`` when key is missing (:issue:`31905`)
+- Using a :func:`pandas.api.indexers.BaseIndexer` with ``count``, ``min``, ``max``, ``median``, ``skew``, ``cov``, ``corr`` will now return correct results for any monotonic :func:`pandas.api.indexers.BaseIndexer` descendant (:issue:`32865`)
+- Added a :func:`pandas.api.indexers.FixedForwardWindowIndexer` class to support forward-looking windows during ``rolling`` operations.
+- Added a :func:`pandas.api.indexers.NonFixedVariableWindowIndexer` class to support ``rolling`` operations with non-fixed offsets (:issue:`34994`)
+- Added :class:`pandas.errors.InvalidIndexError` (:issue:`34570`).
+- :meth:`DataFrame.swaplevel` now raises a ``TypeError`` if the axis is not a :class:`MultiIndex`.
+  Previously an ``AttributeError`` was raised (:issue:`31126`)
+- :meth:`DataFrame.xs` now raises a ``TypeError`` if a ``level`` keyword is supplied and the axis is not a :class:`MultiIndex`.
+  Previously an ``AttributeError`` was raised (:issue:`33610`)
+- :meth:`DataFrameGroupby.mean` and :meth:`SeriesGroupby.mean` (and similarly for :meth:`~DataFrameGroupby.median`, :meth:`~DataFrameGroupby.std` and :meth:`~DataFrameGroupby.var`)
+  now raise a ``TypeError`` if an unsupported keyword argument is passed.
+  Previously an ``UnsupportedFunctionCall`` was raised (``AssertionError`` if ``min_count`` passed into :meth:`~DataFrameGroupby.median`) (:issue:`31485`)
+- :meth:`DataFrame.at` and :meth:`Series.at` will raise a ``TypeError`` instead of a ``ValueError`` if an incompatible key is passed, and ``KeyError`` if a missing key is passed, matching the behavior of ``.loc[]`` (:issue:`31722`)
+- Passing an integer dtype other than ``int64`` to ``np.array(period_index, dtype=...)`` will now raise ``TypeError`` instead of incorrectly using ``int64`` (:issue:`32255`)
+- Passing an invalid ``fill_value`` to :meth:`Categorical.take` raises a ``ValueError`` instead of ``TypeError`` (:issue:`33660`)
+- Combining a ``Categorical`` that has integer categories and contains missing values
+  with a float dtype column in operations such as :func:`concat` or :meth:`~DataFrame.append`
+  will now result in a float column instead of an object dtyped column (:issue:`33607`)
+- :meth:`Series.to_timestamp` now raises a ``TypeError`` if the axis is not a :class:`PeriodIndex`. Previously an ``AttributeError`` was raised (:issue:`33327`)
+- :meth:`Series.to_period` now raises a ``TypeError`` if the axis is not a :class:`DatetimeIndex`. Previously an ``AttributeError`` was raised (:issue:`33327`)
+- :func:`pandas.api.dtypes.is_string_dtype` no longer incorrectly identifies categorical series as string.
+- :func:`read_excel` no longer takes ``**kwds`` arguments. This means that passing in keyword ``chunksize`` now raises a ``TypeError``
+  (previously raised a ``NotImplementedError``), while passing in keyword ``encoding`` now raises a ``TypeError`` (:issue:`34464`)
+- :func:`merge` now checks that the ``suffixes`` parameter is a ``tuple`` and raises a ``TypeError`` otherwise. Previously a ``list`` or ``set`` was accepted, and a ``set`` could produce unexpected results (:issue:`33740`)
+- :class:`Period` no longer accepts tuples for the ``freq`` argument (:issue:`34658`)
+- :meth:`Series.interpolate` and :meth:`DataFrame.interpolate` now raise a ``ValueError`` if ``limit_direction`` is 'forward' or 'both' and ``method`` is 'backfill' or 'bfill', or if ``limit_direction`` is 'backward' or 'both' and ``method`` is 'pad' or 'ffill' (:issue:`34746`)
+
+
 Increased minimum versions for dependencies
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 

pandas/core/series.py

+1 -1
@@ -3843,7 +3843,7 @@ def explode(self, ignore_index: bool = False) -> "Series":
         else:
             index = self.index.repeat(counts)
 
-        result = self._constructor(values, index=index, name=self.name)
+        result = self._constructor(values, index=index, name=self.name).infer_objects()
 
         return result
 
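
The one-line change above relies on `infer_objects()`. As a rough sketch of what that call does to an object-dtype result, the snippet below uses made-up data (not taken from the patch) to stand in for the intermediate Series that `explode` builds:

```python
import numpy as np
import pandas as pd

# Illustrative object-dtype Series, standing in for the values that
# Series.explode assembles before the dtype is inferred (assumed data).
ints = pd.Series([1, 2, 3, 4], index=[0, 0, 1, 1], dtype=object)
with_nan = pd.Series([0, 1, np.nan, 3], dtype=object)
mixed = pd.Series([1, "a", 2.5], dtype=object)

# infer_objects() soft-converts object data to a more specific dtype
# when every element allows it, and leaves the data as object otherwise.
ints_inferred = ints.infer_objects()
nan_inferred = with_nan.infer_objects()
mixed_inferred = mixed.infer_objects()

print(ints_inferred.dtype)   # int64
print(nan_inferred.dtype)    # float64
print(mixed_inferred.dtype)  # object
```

This matches the pattern in the updated tests: expectations become `np.float64` where the exploded values include `np.nan` and `np.int64` where they are all integers.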

pandas/tests/frame/methods/test_explode.py

+20 -5
@@ -25,7 +25,7 @@ def test_basic():
     expected = pd.DataFrame(
         {
             "A": pd.Series(
-                [0, 1, 2, np.nan, np.nan, 3, 4], index=list("aaabcdd"), dtype=object
+                [0, 1, 2, np.nan, np.nan, 3, 4], index=list("aaabcdd"), dtype=np.float64
             ),
             "B": 1,
         }
@@ -55,7 +55,7 @@ def test_multi_index_rows():
                         ("b", 2),
                     ]
                 ),
-                dtype=object,
+                dtype=np.float64,
             ),
             "B": 1,
         }
@@ -74,7 +74,7 @@ def test_multi_index_columns():
             ("A", 1): pd.Series(
                 [0, 1, 2, np.nan, np.nan, 3, 4],
                 index=pd.Index([0, 0, 0, 1, 2, 3, 3]),
-                dtype=object,
+                dtype=np.float64,
             ),
             ("A", 2): 1,
         }
@@ -93,7 +93,7 @@ def test_usecase():
     expected = pd.DataFrame(
         {
             "A": [11, 11, 11, 11, 11, 22, 22, 22],
-            "B": np.array([0, 1, 2, 3, 4, 0, 1, 2], dtype=object),
+            "B": np.array([0, 1, 2, 3, 4, 0, 1, 2], dtype=np.int64),
             "C": [10, 10, 10, 10, 10, 20, 20, 20],
         },
         columns=list("ABC"),
@@ -160,7 +160,22 @@ def test_duplicate_index(input_dict, input_index, expected_dict, expected_index)
     # GH 28005
     df = pd.DataFrame(input_dict, index=input_index)
     result = df.explode("col1")
-    expected = pd.DataFrame(expected_dict, index=expected_index, dtype=object)
+    expected = pd.DataFrame(expected_dict, index=expected_index, dtype=np.int64)
+    tm.assert_frame_equal(result, expected)
+
+
+def test_inferred_dtype():
+    # GH 34923
+    s = pd.Series([1, None, 3])
+    df = pd.DataFrame({'A': [s, s], "B": 1})
+    result = df.explode("A")
+    expected = pd.DataFrame(
+        {
+            "A": np.array([1, np.nan, 3, 1, np.nan, 3], dtype=np.float64),
+            "B": np.array([1, 1, 1, 1, 1, 1], dtype=np.int64)
+        },
+        index=[0, 0, 0, 1, 1, 1]
+    )
     tm.assert_frame_equal(result, expected)
 
 
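
The frame-level expectations in these tests can be approximated on a pandas build that does not include this commit by chaining `infer_objects()` after `explode`. The sketch below assumes that setup; `explode_inferred` is a hypothetical helper named only for illustration and is not part of pandas or of this patch, and the input frames are made up.

```python
import numpy as np
import pandas as pd


def explode_inferred(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: explode ``column``, then infer object dtypes."""
    return df.explode(column).infer_objects()


# All-integer lists: the exploded column can be inferred as int64.
df = pd.DataFrame({"A": [[1, 2, 3], [4, 5]], "B": 1})
out = explode_inferred(df, "A")

# A NaN entry and an empty list both explode to NaN, so the column
# can only be inferred as float64.
df_nan = pd.DataFrame({"A": [[0, 1], np.nan, []], "B": 1})
out_nan = explode_inferred(df_nan, "A")

print(out.dtypes)
print(out_nan.dtypes)
```

The row labels are repeated once per exploded element, as in the `index=[0, 0, 0, 1, 1, 1]` expectation of `test_inferred_dtype`.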

pandas/tests/series/methods/test_explode.py

+5 -5
@@ -7,9 +7,9 @@
 
 def test_basic():
     s = pd.Series([[0, 1, 2], np.nan, [], (3, 4)], index=list("abcd"), name="foo")
-    result = s. explode()
+    result = s.explode()
     expected = pd.Series(
-        [0, 1, 2, np.nan, np.nan, 3, 4], index=list("aaabcdd"), dtype=object, name="foo"
+        [0, 1, 2, np.nan, np.nan, 3, 4], index=list("aaabcdd"), dtype=np.float64, name="foo"
     )
     tm.assert_series_equal(result, expected)
 
@@ -54,7 +54,7 @@ def test_multi_index():
         names=["foo", "bar"],
     )
     expected = pd.Series(
-        [0, 1, 2, np.nan, np.nan, 3, 4], index=index, dtype=object, name="foo"
+        [0, 1, 2, np.nan, np.nan, 3, 4], index=index, dtype=np.float64, name="foo"
     )
     tm.assert_series_equal(result, expected)
 
@@ -116,14 +116,14 @@ def test_duplicate_index():
     # GH 28005
     s = pd.Series([[1, 2], [3, 4]], index=[0, 0])
     result = s.explode()
-    expected = pd.Series([1, 2, 3, 4], index=[0, 0, 0, 0], dtype=object)
+    expected = pd.Series([1, 2, 3, 4], index=[0, 0, 0, 0], dtype=np.int64)
     tm.assert_series_equal(result, expected)
 
 
 def test_ignore_index():
     # GH 34932
     s = pd.Series([[1, 2], [3, 4]])
     result = s.explode(ignore_index=True)
-    expected = pd.Series([1, 2, 3, 4], index=[0, 1, 2, 3], dtype=object)
+    expected = pd.Series([1, 2, 3, 4], index=[0, 1, 2, 3], dtype=np.int64)
     tm.assert_series_equal(result, expected)
 
