
Commit 577b329

Merge remote-tracking branch 'upstream/main' into regr-concat-empty-2
2 parents 5ed7dad + 89578fe commit 577b329

101 files changed: +1756 -484 lines

.github/actions/build_pandas/action.yml

+3 -1

@@ -17,4 +17,6 @@ runs:
 shell: bash -el {0}
 env:
 # Cannot use parallel compilation on Windows, see https://github.com/pandas-dev/pandas/issues/30873
-N_JOBS: ${{ runner.os == 'Windows' && 1 || 2 }}
+# GH 47305: Parallel build causes flaky ImportError: /home/runner/work/pandas/pandas/pandas/_libs/tslibs/timestamps.cpython-38-x86_64-linux-gnu.so: undefined symbol: pandas_datetime_to_datetimestruct
+N_JOBS: 1
+#N_JOBS: ${{ runner.os == 'Windows' && 1 || 2 }}

.github/actions/run-tests/action.yml

+27

@@ -0,0 +1,27 @@
+name: Run tests and report results
+runs:
+using: composite
+steps:
+- name: Test
+run: ci/run_tests.sh
+shell: bash -el {0}
+
+- name: Publish test results
+uses: actions/upload-artifact@v2
+with:
+name: Test results
+path: test-data.xml
+if: failure()
+
+- name: Report Coverage
+run: coverage report -m
+shell: bash -el {0}
+if: failure()
+
+- name: Upload coverage to Codecov
+uses: codecov/codecov-action@v2
+with:
+flags: unittests
+name: codecov-pandas
+fail_ci_if_error: false
+if: failure()

.github/workflows/macos-windows.yml

+1 -15

@@ -53,18 +53,4 @@ jobs:
 uses: ./.github/actions/build_pandas
 
 - name: Test
-run: ci/run_tests.sh
-
-- name: Publish test results
-uses: actions/upload-artifact@v3
-with:
-name: Test results
-path: test-data.xml
-if: failure()
-
-- name: Upload coverage to Codecov
-uses: codecov/codecov-action@v2
-with:
-flags: unittests
-name: codecov-pandas
-fail_ci_if_error: false
+uses: ./.github/actions/run-tests

.github/workflows/posix.yml

+1 -18

@@ -157,23 +157,6 @@ jobs:
 uses: ./.github/actions/build_pandas
 
 - name: Test
-run: ci/run_tests.sh
+uses: ./.github/actions/run-tests
 # TODO: Don't continue on error for PyPy
 continue-on-error: ${{ env.IS_PYPY == 'true' }}
-
-- name: Build Version
-run: conda list
-
-- name: Publish test results
-uses: actions/upload-artifact@v3
-with:
-name: Test results
-path: test-data.xml
-if: failure()
-
-- name: Upload coverage to Codecov
-uses: codecov/codecov-action@v2
-with:
-flags: unittests
-name: codecov-pandas
-fail_ci_if_error: false

.github/workflows/python-dev.yml

+10 -30

@@ -57,40 +57,20 @@ jobs:
 - name: Install dependencies
 shell: bash -el {0}
 run: |
-python -m pip install --upgrade pip setuptools wheel
-pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy
-pip install git+https://github.com/nedbat/coveragepy.git
-pip install cython python-dateutil pytz hypothesis pytest>=6.2.5 pytest-xdist pytest-cov
-pip list
+python3 -m pip install --upgrade pip setuptools wheel
+python3 -m pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy
+python3 -m pip install git+https://github.com/nedbat/coveragepy.git
+python3 -m pip install cython python-dateutil pytz hypothesis pytest>=6.2.5 pytest-xdist pytest-cov pytest-asyncio>=0.17
+python3 -m pip list
 
 - name: Build Pandas
 run: |
-python setup.py build_ext -q -j2
-python -m pip install -e . --no-build-isolation --no-use-pep517
+python3 setup.py build_ext -q -j2
+python3 -m pip install -e . --no-build-isolation --no-use-pep517
 
 - name: Build Version
 run: |
-python -c "import pandas; pandas.show_versions();"
+python3 -c "import pandas; pandas.show_versions();"
 
-- name: Test with pytest
-shell: bash -el {0}
-run: |
-ci/run_tests.sh
-
-- name: Publish test results
-uses: actions/upload-artifact@v3
-with:
-name: Test results
-path: test-data.xml
-if: failure()
-
-- name: Report Coverage
-run: |
-coverage report -m
-
-- name: Upload coverage to Codecov
-uses: codecov/codecov-action@v2
-with:
-flags: unittests
-name: codecov-pandas
-fail_ci_if_error: true
+- name: Test
+uses: ./.github/actions/run-tests

doc/source/reference/frame.rst

+1

@@ -373,6 +373,7 @@ Serialization / IO / conversion
 
 DataFrame.from_dict
 DataFrame.from_records
+DataFrame.to_orc
 DataFrame.to_parquet
 DataFrame.to_pickle
 DataFrame.to_csv

doc/source/reference/io.rst

+1

@@ -159,6 +159,7 @@ ORC
 :toctree: api/
 
 read_orc
+DataFrame.to_orc
 
 SAS
 ~~~

doc/source/reference/testing.rst

+2

@@ -30,6 +30,7 @@ Exceptions and warnings
 errors.DtypeWarning
 errors.DuplicateLabelError
 errors.EmptyDataError
+errors.IndexingError
 errors.InvalidIndexError
 errors.IntCastingNaNError
 errors.MergeError
@@ -45,6 +46,7 @@ Exceptions and warnings
 errors.SettingWithCopyError
 errors.SettingWithCopyWarning
 errors.SpecificationError
+errors.UndefinedVariableError
 errors.UnsortedIndexError
 errors.UnsupportedFunctionCall
 
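
The two entries added to the testing reference, errors.IndexingError and errors.UndefinedVariableError, are ordinary exception classes importable from pandas.errors (pandas 1.5 or later, per the v1.5.0 whatsnew entry for GH 27656). The following is a minimal sketch, assuming such a pandas version, of one way to trigger each. The helpers catch_undefined and catch_indexing are hypothetical names used only for illustration.

```python
from __future__ import annotations

import pandas as pd
from pandas.errors import IndexingError, UndefinedVariableError


def catch_undefined(frame: pd.DataFrame) -> Exception | None:
    """Hypothetical helper: query a column name that does not exist."""
    try:
        frame.query("missing_column > 1")
    except UndefinedVariableError as exc:  # raised while resolving the name
        return exc
    return None


def catch_indexing(series: pd.Series) -> Exception | None:
    """Hypothetical helper: pass two positional indexers to a 1-D Series."""
    try:
        series.iloc[0, 0]
    except IndexingError as exc:  # "Too many indexers"
        return exc
    return None


df = pd.DataFrame({"a": [1, 2, 3]})
undefined_exc = catch_undefined(df)
indexing_exc = catch_indexing(df["a"])
print(type(undefined_exc).__name__, type(indexing_exc).__name__)
```

Because UndefinedVariableError subclasses NameError, existing code that catches NameError around DataFrame.query keeps working.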

doc/source/user_guide/io.rst

+55 -4

@@ -30,7 +30,7 @@ The pandas I/O API is a set of top level ``reader`` functions accessed like
 binary;`HDF5 Format <https://support.hdfgroup.org/HDF5/whatishdf5.html>`__;:ref:`read_hdf<io.hdf5>`;:ref:`to_hdf<io.hdf5>`
 binary;`Feather Format <https://github.com/wesm/feather>`__;:ref:`read_feather<io.feather>`;:ref:`to_feather<io.feather>`
 binary;`Parquet Format <https://parquet.apache.org/>`__;:ref:`read_parquet<io.parquet>`;:ref:`to_parquet<io.parquet>`
-binary;`ORC Format <https://orc.apache.org/>`__;:ref:`read_orc<io.orc>`;
+binary;`ORC Format <https://orc.apache.org/>`__;:ref:`read_orc<io.orc>`;:ref:`to_orc<io.orc>`
 binary;`Stata <https://en.wikipedia.org/wiki/Stata>`__;:ref:`read_stata<io.stata_reader>`;:ref:`to_stata<io.stata_writer>`
 binary;`SAS <https://en.wikipedia.org/wiki/SAS_(software)>`__;:ref:`read_sas<io.sas_reader>`;
 binary;`SPSS <https://en.wikipedia.org/wiki/SPSS>`__;:ref:`read_spss<io.spss_reader>`;
@@ -5562,13 +5562,64 @@ ORC
 .. versionadded:: 1.0.0
 
 Similar to the :ref:`parquet <io.parquet>` format, the `ORC Format <https://orc.apache.org/>`__ is a binary columnar serialization
-for data frames. It is designed to make reading data frames efficient. pandas provides *only* a reader for the
-ORC format, :func:`~pandas.read_orc`. This requires the `pyarrow <https://arrow.apache.org/docs/python/>`__ library.
+for data frames. It is designed to make reading data frames efficient. pandas provides both the reader and the writer for the
+ORC format, :func:`~pandas.read_orc` and :func:`~pandas.DataFrame.to_orc`. This requires the `pyarrow <https://arrow.apache.org/docs/python/>`__ library.
 
 .. warning::
 
 * It is *highly recommended* to install pyarrow using conda due to some issues occurred by pyarrow.
-* :func:`~pandas.read_orc` is not supported on Windows yet, you can find valid environments on :ref:`install optional dependencies <install.warn_orc>`.
+* :func:`~pandas.DataFrame.to_orc` requires pyarrow>=7.0.0.
+* :func:`~pandas.read_orc` and :func:`~pandas.DataFrame.to_orc` are not supported on Windows yet, you can find valid environments on :ref:`install optional dependencies <install.warn_orc>`.
+* For supported dtypes please refer to `supported ORC features in Arrow <https://arrow.apache.org/docs/cpp/orc.html#data-types>`__.
+* Currently timezones in datetime columns are not preserved when a dataframe is converted into ORC files.
+
+.. ipython:: python
+
+df = pd.DataFrame(
+{
+"a": list("abc"),
+"b": list(range(1, 4)),
+"c": np.arange(4.0, 7.0, dtype="float64"),
+"d": [True, False, True],
+"e": pd.date_range("20130101", periods=3),
+}
+)
+
+df
+df.dtypes
+
+Write to an orc file.
+
+.. ipython:: python
+:okwarning:
+
+df.to_orc("example_pa.orc", engine="pyarrow")
+
+Read from an orc file.
+
+.. ipython:: python
+:okwarning:
+
+result = pd.read_orc("example_pa.orc")
+
+result.dtypes
+
+Read only certain columns of an orc file.
+
+.. ipython:: python
+
+result = pd.read_orc(
+"example_pa.orc",
+columns=["a", "b"],
+)
+result.dtypes
+
+
+.. ipython:: python
+:suppress:
+
+os.remove("example_pa.orc")
+
 
 .. _io.sql:
 
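
The ORC section added above round-trips a frame through a file in the docs' working directory. The following is a minimal sketch of the same round trip in a throwaway directory, assuming pandas 1.5 or later and a pyarrow build (7.0.0 or later, per the warning above) that ships the pyarrow.orc module. The temporary directory and the file name example.orc are illustrative choices; the try guard simply skips the round trip when ORC support cannot be imported, for example on platforms the warning lists as unsupported.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {"a": list("abc"), "b": list(range(1, 4)), "d": [True, False, True]}
)

try:
    import pyarrow.orc  # noqa: F401  (ORC support is optional in pyarrow builds)
except ImportError:
    result = None  # no ORC support available: skip the round trip
else:
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "example.orc")
        df.to_orc(path)  # default RangeIndex, so no index needs to be stored
        # Read back only a subset of the columns, as in the docs example.
        result = pd.read_orc(path, columns=["a", "b"])

print(result)
```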

doc/source/whatsnew/v1.4.3.rst

+4

@@ -15,15 +15,19 @@ including other versions of pandas.
 Fixed regressions
 ~~~~~~~~~~~~~~~~~
 - Fixed regression in :meth:`DataFrame.replace` when the replacement value was explicitly ``None`` when passed in a dictionary to ``to_replace`` also casting other columns to object dtype even when there were no values to replace (:issue:`46634`)
+- Fixed regression in :meth:`DataFrame.to_csv` raising error when :class:`DataFrame` contains extension dtype categorical column (:issue:`46297`, :issue:`46812`)
+- Fixed regression in representation of ``dtypes`` attribute of :class:`MultiIndex` (:issue:`46900`)
 - Fixed regression when setting values with :meth:`DataFrame.loc` updating :class:`RangeIndex` when index was set as new column and column was updated afterwards (:issue:`47128`)
 - Fixed regression in :meth:`DataFrame.nsmallest` led to wrong results when ``np.nan`` in the sorting column (:issue:`46589`)
 - Fixed regression in :func:`read_fwf` raising ``ValueError`` when ``widths`` was specified with ``usecols`` (:issue:`46580`)
 - Fixed regression in :func:`concat` not sorting columns for mixed column names (:issue:`47127`)
 - Fixed regression in :meth:`.Groupby.transform` and :meth:`.Groupby.agg` failing with ``engine="numba"`` when the index was a :class:`MultiIndex` (:issue:`46867`)
+- Fixed regression in ``NaN`` comparison for :class:`Index` operations where the same object was compared (:issue:`47105`)
 - Fixed regression is :meth:`.Styler.to_latex` and :meth:`.Styler.to_html` where ``buf`` failed in combination with ``encoding`` (:issue:`47053`)
 - Fixed regression in :func:`read_csv` with ``index_col=False`` identifying first row as index names when ``header=None`` (:issue:`46955`)
 - Fixed regression in :meth:`.DataFrameGroupBy.agg` when used with list-likes or dict-likes and ``axis=1`` that would give incorrect results; now raises ``NotImplementedError`` (:issue:`46995`)
 - Fixed regression in :meth:`DataFrame.resample` and :meth:`DataFrame.rolling` when used with list-likes or dict-likes and ``axis=1`` that would raise an unintuitive error message; now raises ``NotImplementedError`` (:issue:`46904`)
+- Fixed regression in :func:`assert_index_equal` when ``check_order=False`` and :class:`Index` has extension or object dtype (:issue:`47207`)
 - Fixed regression in :func:`read_excel` returning ints as floats on certain input sheets (:issue:`46988`)
 - Fixed regression in :meth:`DataFrame.shift` when ``axis`` is ``columns`` and ``fill_value`` is absent, ``freq`` is ignored (:issue:`47039`)
 
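
Several of the regressions listed in this file can be exercised with small inputs. The following is a minimal sketch, assuming pandas 1.4.3 or later. The frames and CSV text are invented for illustration, and the checks show the fixed behaviour described in the entries for GH 46589, GH 46955, and GH 47207, not the exact reproducers from those issues.

```python
from io import StringIO

import numpy as np
import pandas as pd

# nsmallest with NaN in the sorting column: the NaN row is excluded and
# the remaining rows come back in ascending order of "a".
df = pd.DataFrame({"a": [1.0, np.nan, 3.0, 0.0], "b": [10, 20, 30, 40]})
smallest = df.nsmallest(2, "a")

# read_csv with header=None and index_col=False: the first row is data,
# not index names, and the columns get default integer labels.
parsed = pd.read_csv(StringIO("x,y\n1,2\n"), header=None, index_col=False)

# assert_index_equal with check_order=False on object-dtype indexes:
# the same labels in a different order compare equal (no exception).
pd.testing.assert_index_equal(
    pd.Index(["b", "a"], dtype=object),
    pd.Index(["a", "b"], dtype=object),
    check_order=False,
)

print(smallest)
print(parsed)
```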

doc/source/whatsnew/v1.5.0.rst

+26 -1

@@ -100,6 +100,28 @@ as seen in the following example.
 1 2021-01-02 08:00:00 4
 2 2021-01-02 16:00:00 5
 
+.. _whatsnew_150.enhancements.orc:
+
+Writing to ORC files
+^^^^^^^^^^^^^^^^^^^^
+
+The new method :meth:`DataFrame.to_orc` allows writing to ORC files (:issue:`43864`).
+
+This functionality depends the `pyarrow <http://arrow.apache.org/docs/python/>`__ library. For more details, see :ref:`the IO docs on ORC <io.orc>`.
+
+.. warning::
+
+* It is *highly recommended* to install pyarrow using conda due to some issues occurred by pyarrow.
+* :func:`~pandas.DataFrame.to_orc` requires pyarrow>=7.0.0.
+* :func:`~pandas.DataFrame.to_orc` is not supported on Windows yet, you can find valid environments on :ref:`install optional dependencies <install.warn_orc>`.
+* For supported dtypes please refer to `supported ORC features in Arrow <https://arrow.apache.org/docs/cpp/orc.html#data-types>`__.
+* Currently timezones in datetime columns are not preserved when a dataframe is converted into ORC files.
+
+.. code-block:: python
+
+df = pd.DataFrame(data={"col1": [1, 2], "col2": [3, 4]})
+df.to_orc("./out.orc")
+
 .. _whatsnew_150.enhancements.tar:
 
 Reading directly from TAR archives
@@ -152,8 +174,9 @@ Other enhancements
 - A :class:`errors.PerformanceWarning` is now thrown when using ``string[pyarrow]`` dtype with methods that don't dispatch to ``pyarrow.compute`` methods (:issue:`42613`)
 - Added ``numeric_only`` argument to :meth:`Resampler.sum`, :meth:`Resampler.prod`, :meth:`Resampler.min`, :meth:`Resampler.max`, :meth:`Resampler.first`, and :meth:`Resampler.last` (:issue:`46442`)
 - ``times`` argument in :class:`.ExponentialMovingWindow` now accepts ``np.timedelta64`` (:issue:`47003`)
-- :class:`DataError`, :class:`SpecificationError`, :class:`SettingWithCopyError`, :class:`SettingWithCopyWarning`, and :class:`NumExprClobberingError` are now exposed in ``pandas.errors`` (:issue:`27656`)
+- :class:`DataError`, :class:`SpecificationError`, :class:`SettingWithCopyError`, :class:`SettingWithCopyWarning`, :class:`NumExprClobberingError`, :class:`UndefinedVariableError`, and :class:`IndexingError` are now exposed in ``pandas.errors`` (:issue:`27656`)
 - Added ``check_like`` argument to :func:`testing.assert_series_equal` (:issue:`47247`)
+- Allow reading compressed SAS files with :func:`read_sas` (e.g., ``.sas7bdat.gz`` files)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_150.notable_bug_fixes:
@@ -850,6 +873,7 @@ I/O
 - Bug in :func:`read_sas` returned ``None`` rather than an empty DataFrame for SAS7BDAT files with zero rows (:issue:`18198`)
 - Bug in :class:`StataWriter` where value labels were always written with default encoding (:issue:`46750`)
 - Bug in :class:`StataWriterUTF8` where some valid characters were removed from variable names (:issue:`47276`)
+- Bug in :meth:`DataFrame.to_excel` when writing an empty dataframe with :class:`MultiIndex` (:issue:`19543`)
 
 Period
 ^^^^^^
@@ -902,6 +926,7 @@ Reshaping
 - Bug in :func:`get_dummies` that selected object and categorical dtypes but not string (:issue:`44965`)
 - Bug in :meth:`DataFrame.align` when aligning a :class:`MultiIndex` to a :class:`Series` with another :class:`MultiIndex` (:issue:`46001`)
 - Bug in concanenation with ``IntegerDtype``, or ``FloatingDtype`` arrays where the resulting dtype did not mirror the behavior of the non-nullable dtypes (:issue:`46379`)
+- Bug in :func:`concat` not sorting the column names when ``None`` is included (:issue:`47331`)
 - Bug in :func:`concat` with identical key leads to error when indexing :class:`MultiIndex` (:issue:`46519`)
 - Bug in :meth:`DataFrame.join` with a list when using suffixes to join DataFrames with duplicate column names (:issue:`46396`)
 - Bug in :meth:`DataFrame.pivot_table` with ``sort=False`` results in sorted index (:issue:`17041`)
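
The check_like argument added to testing.assert_series_equal (GH 47247) mirrors the long-standing option of the same name on assert_frame_equal: when it is true, the index may be in a different order as long as both sides hold the same label/value pairs. The following is a minimal sketch, assuming pandas 1.5 or later; the two Series are invented for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = s1.reindex(["c", "a", "b"])  # same label/value pairs, different order

# Passes: with check_like=True the order of the index labels is ignored.
pd.testing.assert_series_equal(s1, s2, check_like=True)

# Without check_like the differing index order is reported as a failure.
try:
    pd.testing.assert_series_equal(s1, s2)
    strict_failed = False
except AssertionError:
    strict_failed = True

print("strict comparison failed:", strict_failed)
```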

pandas/_libs/tslib.pyi

+1

@@ -9,6 +9,7 @@ def format_array_from_datetime(
 tz: tzinfo | None = ...,
 format: str | None = ...,
 na_rep: object = ...,
+reso: int = ..., # NPY_DATETIMEUNIT
 ) -> npt.NDArray[np.object_]: ...
 def array_with_unit_to_datetime(
 values: np.ndarray,

pandas/_libs/tslib.pyx

+8 -5

@@ -28,11 +28,12 @@ import pytz
 
 from pandas._libs.tslibs.np_datetime cimport (
 NPY_DATETIMEUNIT,
+NPY_FR_ns,
 check_dts_bounds,
-dt64_to_dtstruct,
 dtstruct_to_dt64,
 get_datetime64_value,
 npy_datetimestruct,
+pandas_datetime_to_datetimestruct,
 pydate_to_dt64,
 pydatetime_to_dt64,
 string_to_dts,
@@ -107,7 +108,8 @@ def format_array_from_datetime(
 ndarray[int64_t] values,
 tzinfo tz=None,
 str format=None,
-object na_rep=None
+object na_rep=None,
+NPY_DATETIMEUNIT reso=NPY_FR_ns,
 ) -> np.ndarray:
 """
 return a np object array of the string formatted values
@@ -120,6 +122,7 @@ def format_array_from_datetime(
 a strftime capable string
 na_rep : optional, default is None
 a nat format
+reso : NPY_DATETIMEUNIT, default NPY_FR_ns
 
 Returns
 -------
@@ -141,7 +144,7 @@ def format_array_from_datetime(
 # a format based on precision
 basic_format = format is None and tz is None
 if basic_format:
-reso_obj = get_resolution(values)
+reso_obj = get_resolution(values, reso=reso)
 show_ns = reso_obj == Resolution.RESO_NS
 show_us = reso_obj == Resolution.RESO_US
 show_ms = reso_obj == Resolution.RESO_MS
@@ -153,7 +156,7 @@ def format_array_from_datetime(
 result[i] = na_rep
 elif basic_format:
 
-dt64_to_dtstruct(val, &dts)
+pandas_datetime_to_datetimestruct(val, reso, &dts)
 res = (f'{dts.year}-{dts.month:02d}-{dts.day:02d} '
 f'{dts.hour:02d}:{dts.min:02d}:{dts.sec:02d}')
 
@@ -169,7 +172,7 @@ def format_array_from_datetime(
 
 else:
 
-ts = Timestamp(val, tz=tz)
+ts = Timestamp._from_value_and_reso(val, reso=reso, tz=tz)
 if format is None:
 result[i] = str(ts)
 else:
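
In the basic_format branch above, get_resolution decides how many fractional-second digits to print (show_ns, show_us, show_ms), and the new reso argument tells both that call and the struct conversion which unit the raw int64 values are stored in, instead of assuming nanoseconds. The following is a minimal pure-NumPy analogue of that idea, not the Cython implementation. The function name format_basic is hypothetical, the datetime64 dtype of the input plays the role of reso, and time zones and custom format strings are ignored.

```python
from __future__ import annotations

import numpy as np

# Candidate output units, coarsest first (whole seconds are always printed).
_UNITS = ("s", "ms", "us", "ns")


def format_basic(values: np.ndarray, na_rep: str = "NaT") -> list[str]:
    """Hypothetical sketch: format datetime64 values as 'YYYY-MM-DD HH:MM:SS[.f]'.

    The stored unit is read from values.dtype rather than assumed to be ns.
    """
    values = np.asarray(values)
    mask = np.isnat(values)
    valid = values[~mask]

    # Pick the coarsest unit that round-trips every non-NaT value exactly.
    unit = "ns"
    for candidate in _UNITS:
        roundtrip = valid.astype(f"datetime64[{candidate}]").astype(valid.dtype)
        if np.array_equal(roundtrip, valid):
            unit = candidate
            break

    strings = np.datetime_as_string(values.astype(f"datetime64[{unit}]"))
    # ISO strings put 'T' between date and time; pandas prints a space instead.
    return [na_rep if isna else s.replace("T", " ", 1) for s, isna in zip(strings, mask)]


example = np.array(
    ["2013-01-01T00:00:00", "NaT", "2013-01-02T08:00:00.5"], dtype="datetime64[ns]"
)
print(format_basic(example))
# ['2013-01-01 00:00:00.000', 'NaT', '2013-01-02 08:00:00.500']
```

As in the Cython version, a single value that needs sub-second precision widens the format for the whole array, so all entries share the same number of digits.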

pandas/_libs/tslibs/ccalendar.pxd

-2

@@ -15,8 +15,6 @@ cpdef int32_t get_day_of_year(int year, int month, int day) nogil
 cpdef int get_lastbday(int year, int month) nogil
 cpdef int get_firstbday(int year, int month) nogil
 
-cdef int64_t DAY_NANOS
-cdef int64_t HOUR_NANOS
 cdef dict c_MONTH_NUMBERS
 
 cdef int32_t* month_offset
