Commit 2828ccb

Merge branch 'main' into main
2 parents bf830f5 + d1ec1a4 commit 2828ccb

10 files changed: +149 -97 lines changed

doc/source/development/contributing_codebase.rst

Lines changed: 1 addition & 1 deletion
@@ -198,7 +198,7 @@ In some cases you may be tempted to use ``cast`` from the typing module when you
            obj = cast(str, obj)  # Mypy complains without this!
            return obj.upper()
 
-The limitation here is that while a human can reasonably understand that ``is_number`` would catch the ``int`` and ``float`` types mypy cannot make that same inference just yet (see `mypy #5206 <https://github.com/python/mypy/issues/5206>`_. While the above works, the use of ``cast`` is **strongly discouraged**. Where applicable a refactor of the code to appease static analysis is preferable
+The limitation here is that while a human can reasonably understand that ``is_number`` would catch the ``int`` and ``float`` types mypy cannot make that same inference just yet (see `mypy #5206 <https://github.com/python/mypy/issues/5206>`_). While the above works, the use of ``cast`` is **strongly discouraged**. Where applicable a refactor of the code to appease static analysis is preferable
 
 .. code-block:: python
 

doc/source/getting_started/intro_tutorials/03_subset_data.rst

Lines changed: 1 addition & 1 deletion
@@ -335,7 +335,7 @@ the name ``anonymous`` to the first 3 elements of the fourth column:
 .. ipython:: python
 
     titanic.iloc[0:3, 3] = "anonymous"
-    titanic.head()
+    titanic.iloc[:5, 3]
 
 .. raw:: html
 

doc/source/reference/arrays.rst

Lines changed: 3 additions & 3 deletions
@@ -61,7 +61,7 @@ is an :class:`ArrowDtype`.
 support as NumPy including first-class nullability support for all data types, immutability and more.
 
 The table below shows the equivalent pyarrow-backed (``pa``), pandas extension, and numpy (``np``) types that are recognized by pandas.
-Pyarrow-backed types below need to be passed into :class:`ArrowDtype` to be recognized by pandas e.g. ``pd.ArrowDtype(pa.bool_())``
+Pyarrow-backed types below need to be passed into :class:`ArrowDtype` to be recognized by pandas e.g. ``pd.ArrowDtype(pa.bool_())``.
 
 =============================================== ========================== ===================
 PyArrow type                                    pandas extension type      NumPy type
@@ -114,7 +114,7 @@ values.
 
    ArrowDtype
 
-For more information, please see the :ref:`PyArrow user guide <pyarrow>`
+For more information, please see the :ref:`PyArrow user guide <pyarrow>`.
 
 .. _api.arrays.datetime:
 
@@ -495,7 +495,7 @@ a :class:`CategoricalDtype`.
    CategoricalDtype.categories
    CategoricalDtype.ordered
 
-Categorical data can be stored in a :class:`pandas.Categorical`
+Categorical data can be stored in a :class:`pandas.Categorical`:
 
 .. autosummary::
    :toctree: api/

doc/source/user_guide/text.rst

Lines changed: 6 additions & 6 deletions
@@ -13,7 +13,7 @@ Text data types
 
 There are two ways to store text data in pandas:
 
-1. ``object`` -dtype NumPy array.
+1. ``object`` dtype NumPy array.
 2. :class:`StringDtype` extension type.
 
 We recommend using :class:`StringDtype` to store text data.
@@ -40,20 +40,20 @@ to significantly increase the performance and lower the memory overhead of
    and parts of the API may change without warning.
 
 For backwards-compatibility, ``object`` dtype remains the default type we
-infer a list of strings to
+infer a list of strings to:
 
 .. ipython:: python
 
    pd.Series(["a", "b", "c"])
 
-To explicitly request ``string`` dtype, specify the ``dtype``
+To explicitly request ``string`` dtype, specify the ``dtype``:
 
 .. ipython:: python
 
    pd.Series(["a", "b", "c"], dtype="string")
    pd.Series(["a", "b", "c"], dtype=pd.StringDtype())
 
-Or ``astype`` after the ``Series`` or ``DataFrame`` is created
+Or ``astype`` after the ``Series`` or ``DataFrame`` is created:
 
 .. ipython:: python
 
@@ -88,7 +88,7 @@ Behavior differences
 ^^^^^^^^^^^^^^^^^^^^
 
 These are places where the behavior of ``StringDtype`` objects differ from
-``object`` dtype
+``object`` dtype:
 
 l. For ``StringDtype``, :ref:`string accessor methods<api.series.str>`
    that return **numeric** output will always return a nullable integer dtype,
@@ -102,7 +102,7 @@ l. For ``StringDtype``, :ref:`string accessor methods<api.series.str>`
       s.str.count("a")
       s.dropna().str.count("a")
 
-   Both outputs are ``Int64`` dtype. Compare that with object-dtype
+   Both outputs are ``Int64`` dtype. Compare that with object-dtype:
 
    .. ipython:: python
 

doc/source/whatsnew/v3.0.0.rst

Lines changed: 1 addition & 0 deletions
@@ -791,6 +791,7 @@ ExtensionArray
 ^^^^^^^^^^^^^^
 - Bug in :class:`Categorical` when constructing with an :class:`Index` with :class:`ArrowDtype` (:issue:`60563`)
 - Bug in :meth:`.arrays.ArrowExtensionArray.__setitem__` which caused wrong behavior when using an integer array with repeated values as a key (:issue:`58530`)
+- Bug in :meth:`ArrowExtensionArray.factorize` where NA values were dropped when input was dictionary-encoded even when dropna was set to False(:issue:`60567`)
 - Bug in :meth:`api.types.is_datetime64_any_dtype` where a custom :class:`ExtensionDtype` would return ``False`` for array-likes (:issue:`57055`)
 - Bug in comparison between object with :class:`ArrowDtype` and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-``False`` (for ``==``) or all-``True`` (for ``!=``) (:issue:`59505`)
 - Bug in constructing pandas data structures when passing into ``dtype`` a string of the type followed by ``[pyarrow]`` while PyArrow is not installed would raise ``NameError`` rather than ``ImportError`` (:issue:`57928`)

pandas/core/arrays/arrow/array.py

Lines changed: 6 additions & 1 deletion
@@ -1208,7 +1208,12 @@ def factorize(
             data = data.cast(pa.int64())
 
         if pa.types.is_dictionary(data.type):
-            encoded = data
+            if null_encoding == "encode":
+                # dictionary encode does nothing if an already encoded array is given
+                data = data.cast(data.type.value_type)
+                encoded = data.dictionary_encode(null_encoding=null_encoding)
+            else:
+                encoded = data
         else:
             encoded = data.dictionary_encode(null_encoding=null_encoding)
         if encoded.length() == 0:
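
Note on the hunk above: the two null-handling modes that ``null_encoding`` selects can be illustrated with a minimal pure-Python sketch. It does not use pyarrow or pandas. The function ``factorize_sketch`` and its ``encode_nulls`` flag are hypothetical names chosen for illustration, ``None`` stands in for NA, and ``-1`` is assumed as the sentinel code.

```python
def factorize_sketch(values, encode_nulls):
    """Return (codes, uniques) for a list of hashable values.

    encode_nulls=False mimics "mask": None gets the sentinel code -1.
    encode_nulls=True mimics "encode": None becomes a unique of its own.
    """
    uniques = []    # distinct values in order of first appearance
    positions = {}  # value -> index into uniques
    codes = []
    for value in values:
        if value is None and not encode_nulls:
            codes.append(-1)
            continue
        if value not in positions:
            positions[value] = len(uniques)
            uniques.append(value)
        codes.append(positions[value])
    return codes, uniques


masked = factorize_sketch(["a1", None], encode_nulls=False)
encoded = factorize_sketch(["a1", None], encode_nulls=True)
```

With ``encode_nulls=True`` the missing value keeps its own code and appears among the uniques, which is the outcome the new test ``test_factorize_dictionary_with_na`` expects for dictionary-encoded input.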

pandas/core/generic.py

Lines changed: 18 additions & 2 deletions
@@ -2788,7 +2788,7 @@ def to_sql(
         con,
         *,
         schema: str | None = None,
-        if_exists: Literal["fail", "replace", "append"] = "fail",
+        if_exists: Literal["fail", "replace", "append", "delete_rows"] = "fail",
         index: bool = True,
         index_label: IndexLabel | None = None,
         chunksize: int | None = None,
@@ -2825,12 +2825,13 @@ def to_sql(
         schema : str, optional
             Specify the schema (if database flavor supports this). If None, use
             default schema.
-        if_exists : {'fail', 'replace', 'append'}, default 'fail'
+        if_exists : {'fail', 'replace', 'append', 'delete_rows'}, default 'fail'
             How to behave if the table already exists.
 
             * fail: Raise a ValueError.
             * replace: Drop the table before inserting new values.
             * append: Insert new values to the existing table.
+            * delete_rows: If a table exists, delete all records and insert data.
 
         index : bool, default True
             Write DataFrame index as a column. Uses `index_label` as the column
@@ -2947,6 +2948,16 @@ def to_sql(
         ...     conn.execute(text("SELECT * FROM users")).fetchall()
         [(0, 'User 6'), (1, 'User 7')]
 
+        Delete all rows before inserting new records with ``df3``
+
+        >>> df3 = pd.DataFrame({"name": ['User 8', 'User 9']})
+        >>> df3.to_sql(name='users', con=engine, if_exists='delete_rows',
+        ...            index_label='id')
+        2
+        >>> with engine.connect() as conn:
+        ...     conn.execute(text("SELECT * FROM users")).fetchall()
+        [(0, 'User 8'), (1, 'User 9')]
+
         Use ``method`` to define a callable insertion method to do nothing
         if there's a primary key conflict on a table in a PostgreSQL database.
 
@@ -6267,6 +6278,11 @@ def astype(
         """
         Cast a pandas object to a specified dtype ``dtype``.
 
+        This method allows the conversion of the data types of pandas objects,
+        including DataFrames and Series, to the specified dtype. It supports casting
+        entire objects to a single data type or applying different data types to
+        individual columns using a mapping.
+
         Parameters
         ----------
         dtype : str, data type, Series or Mapping of column name -> data type
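
Note on the ``to_sql`` hunks above: the difference between ``replace`` and the new ``delete_rows`` option can be sketched with the standard-library ``sqlite3`` module alone. This is not how pandas implements either option. The helper names ``table_sql``, ``delete_rows_then_insert`` and ``replace_table`` are hypothetical, and the sketch relies only on what the docstring states: ``replace`` drops the table, while ``delete_rows`` deletes all records and inserts into the existing table.

```python
import sqlite3


def table_sql(conn, name):
    """Return the CREATE TABLE statement SQLite stored for `name`."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


def delete_rows_then_insert(conn, rows):
    # Hypothetical helper: empty the table but keep its definition.
    conn.execute("DELETE FROM users")
    conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", rows)


def replace_table(conn, rows):
    # Hypothetical helper: drop and recreate with a plain schema,
    # so constraints declared on the original table are lost.
    conn.execute("DROP TABLE users")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users VALUES (0, 'User 6')")
schema_before = table_sql(conn, "users")

delete_rows_then_insert(conn, [(0, "User 8"), (1, "User 9")])
rows_after_delete = conn.execute("SELECT * FROM users ORDER BY id").fetchall()
schema_after_delete = table_sql(conn, "users")

replace_table(conn, [(0, "User 8"), (1, "User 9")])
schema_after_replace = table_sql(conn, "users")
```

In this sketch the table definition, including its primary key and ``NOT NULL`` constraint, survives the delete-then-insert path but not the drop-and-recreate path.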

pandas/tests/extension/test_arrow.py

Lines changed: 12 additions & 0 deletions
@@ -3329,6 +3329,18 @@ def test_factorize_chunked_dictionary():
     tm.assert_index_equal(res_uniques, exp_uniques)
 
 
+def test_factorize_dictionary_with_na():
+    # GH#60567
+    arr = pd.array(
+        ["a1", pd.NA], dtype=ArrowDtype(pa.dictionary(pa.int32(), pa.utf8()))
+    )
+    indices, uniques = arr.factorize(use_na_sentinel=False)
+    expected_indices = np.array([0, 1], dtype=np.intp)
+    expected_uniques = pd.array(["a1", None], dtype=ArrowDtype(pa.string()))
+    tm.assert_numpy_array_equal(indices, expected_indices)
+    tm.assert_extension_array_equal(uniques, expected_uniques)
+
+
 def test_dictionary_astype_categorical():
     # GH#56672
     arrs = [

pandas/tests/io/test_sql.py

Lines changed: 4 additions & 4 deletions
@@ -4282,11 +4282,11 @@ def test_xsqlite_execute_fail(sqlite_buildin):
     cur.execute(create_sql)
 
     with sql.pandasSQL_builder(sqlite_buildin) as pandas_sql:
-        pandas_sql.execute('INSERT INTO test VALUES("foo", "bar", 1.234)')
-        pandas_sql.execute('INSERT INTO test VALUES("foo", "baz", 2.567)')
+        pandas_sql.execute("INSERT INTO test VALUES('foo', 'bar', 1.234)")
+        pandas_sql.execute("INSERT INTO test VALUES('foo', 'baz', 2.567)")
 
         with pytest.raises(sql.DatabaseError, match="Execution failed on sql"):
-            pandas_sql.execute('INSERT INTO test VALUES("foo", "bar", 7)')
+            pandas_sql.execute("INSERT INTO test VALUES('foo', 'bar', 7)")
 
 
 def test_xsqlite_execute_closed_connection():
@@ -4304,7 +4304,7 @@ def test_xsqlite_execute_closed_connection():
         cur.execute(create_sql)
 
         with sql.pandasSQL_builder(conn) as pandas_sql:
-            pandas_sql.execute('INSERT INTO test VALUES("foo", "bar", 1.234)')
+            pandas_sql.execute("INSERT INTO test VALUES('foo', 'bar', 1.234)")
 
     msg = "Cannot operate on a closed database."
     with pytest.raises(sqlite3.ProgrammingError, match=msg):
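
Note on the quoting change above: it is consistent with standard SQL, where single quotes delimit string literals and double quotes delimit identifiers. SQLite accepts double-quoted strings only as a legacy compatibility behavior that can be disabled at build time. Below is a minimal sketch using the standard-library ``sqlite3`` module directly rather than ``pandasSQL_builder``. The diff does not show ``create_sql``, so the table schema here is an assumption, chosen so that the third insert conflicts on a composite primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: a composite primary key on (a, b).
conn.execute("CREATE TABLE test (a TEXT, b TEXT, c REAL, PRIMARY KEY (a, b))")

# Single-quoted values are unambiguous string literals.
conn.execute("INSERT INTO test VALUES('foo', 'bar', 1.234)")
conn.execute("INSERT INTO test VALUES('foo', 'baz', 2.567)")

try:
    # Same (a, b) pair as the first row, so the primary key rejects it.
    conn.execute("INSERT INTO test VALUES('foo', 'bar', 7)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute("SELECT a, b, c FROM test ORDER BY b").fetchall()
```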
