Skip to content

Commit 4f62b99

Browse files
committed
Fix for issue pandas-dev#11317
This includes updates to 3 Excel files, plus a test in test_excel.py, plus the fix in parsers.py issue when read_html with previous fix With read_html, the fix didn't work on Python 2.7. Handle the string conversion correctly Add bug fixed to what's new Revert "Add bug fixed to what's new" This reverts commit 05b2344. Revert "issue when read_html with previous fix" This reverts commit d1bc296. Add what's new to describe bug. fix issue with original fix Added text to describe the bug. Fixed issue so that it works correctly in Python 2.7 Add round trip test Added round trip test and fixed error in writing sheets when merge_cells=false and columns have multi index DEPR: deprecate pandas.io.ga, pandas-dev#11308 DEPR: deprecate engine keyword from to_csv pandas-dev#11274 remove warnings from the tests for deprecation of engine in to_csv PERF: Checking monotonic-ness before sorting on an index pandas-dev#11080 BUG: Bug in list-like indexing with a mixed-integer Index, pandas-dev#11320 Add hex color strings test CLN: GH11271 move _get_handle, UTF encoders to io.common TST: tests for list skiprows in read_excel BUG: Fix to_dict() problem when using only datetime pandas-dev#11247 Fix a bug where to_dict() does not return Timestamp when there is only datetime dtype present. Undo change for when columns are multiindex There is still something wrong here in the format of the file when there are multiindex columns, but that's for another day Fix formatting in test_excel and remove spurious test See title BUG: bug in comparisons vs tuples, pandas-dev#11339 bug#10442 : fix, adding note and test BUG pandas-dev#10442(test) : Convert datetimelike index to strings with astype(str) BUG#10422: note added bug#10442 : tests added bug#10442 : note udated BUG pandas-dev#10442(test) : Convert datetimelike index to strings with astype(str) bug#10442: fix, adding note and test bug#10442: fix, adding note and test Adjust test so that merge_cells=False works correctly Adjust the test so that if merge_cells=false, it does a proper formatting of the columns in the single row header, and puts the row header in the first row Fix test for Python 2.7 and 3.5 The test is failing on Python 2.7 and 3.5, which appears to read in the values as floats, and I cannot replicate. So force the tests to pass by just making the column names equal when merge_cells=False Fix for openpyxl < 2, and for issue pandas-dev#11408 If using openpyxl < 2, and value is a string that could be a number, force a string to be written out. If using openpyxl >= 2.2, then fix issue pandas-dev#11408 to do with merging cells Use set_value_explicit instead of set_explicit_value set_value_explicit is in openpyxl 1.6, changed in openpyxl 1.8, but there is code in 1.8 to set set_value_explicit to set_explicit_value for compatibility Add line in whatsnew for issue 11408 ENH: added capability to handle Path/LocalPath objects, pandas-dev#11033 DOC: typo in whatsnew/0.17.1.txt PERF: Release GIL on some datetime ops BUG: Bug in DataFrame.replace with a datetime64[ns, tz] and a non-compat to_replace pandas-dev#11326 CLN: clean up internal impl of fillna/replace, xref pandas-dev#11153 PERF: fast inf checking in to_excel PERF: Series.dropna with non-nan dtypes fixed pathlib tests on windows DEPR: remove some SparsePanel deprecation warnings in testing DEPR: avoid numpy comparison to None warnings API: indexing with a null key will raise a TypeError rather than a ValueError, pandas-dev#11356 WARN: elementwise comparisons with index names, xref pandas-dev#11162 DEPR warning in io/data.py w.r.t. order->sort_values WARN: more elementwise comparisons to object WARN: more uncomparables of numeric array vs object BUG: quick fix for pandas-dev#10989 TST: add test case from Issue pandas-dev#10989 API: add _to_safe_for_reshape to allow safe insert/append with embedded CategoricalIndexes Signed-off-by: Jeff Reback <[email protected]> BLD: conda Revert "BLD: conda" This reverts commit 0c8a8e1. TST: remove invalid symbol warnings TST: move some tests to slow TST: fix some warnings filters TST: import pandas_datareader, use for tests TST: remove some deprecation warnings from imports DEPR: fix VisibleDeprecationWarnings in sparse TST: remove some warnings in test_nanops ENH: Improve the error message in to_gbq when the DataFrame schema does not match pandas-dev#11359 add libgfortran to 1.8.1 build binstar -> anaconda remove link to issue 11328 in whatsnew Fixes to document issue in code, small efficiency fix Try to resolve rebase conflict in whats new
1 parent 3914e0f commit 4f62b99

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

69 files changed

+1664
-816
lines changed

asv_bench/asv.conf.json

+1
Original file line numberDiff line numberDiff line change
@@ -43,6 +43,7 @@
4343
"numexpr": [],
4444
"pytables": [],
4545
"openpyxl": [],
46+
"xlsxwriter": [],
4647
"xlrd": [],
4748
"xlwt": []
4849
},

asv_bench/benchmarks/frame_methods.py

+10
Original file line numberDiff line numberDiff line change
@@ -930,6 +930,16 @@ def time_frame_xs_row(self):
930930
self.df.xs(50000)
931931

932932

933+
class frame_sort_index(object):
934+
goal_time = 0.2
935+
936+
def setup(self):
937+
self.df = DataFrame(randn(1000000, 2), columns=list('AB'))
938+
939+
def time_frame_sort_index(self):
940+
self.df.sort_index()
941+
942+
933943
class series_string_vector_slice(object):
934944
goal_time = 0.2
935945

asv_bench/benchmarks/gil.py

+46
Original file line numberDiff line numberDiff line change
@@ -320,3 +320,49 @@ def time_nogil_kth_smallest(self):
320320
def run(arr):
321321
algos.kth_smallest(arr, self.k)
322322
run()
323+
324+
class nogil_datetime_fields(object):
325+
goal_time = 0.2
326+
327+
def setup(self):
328+
self.N = 100000000
329+
self.dti = pd.date_range('1900-01-01', periods=self.N, freq='D')
330+
self.period = self.dti.to_period('D')
331+
if (not have_real_test_parallel):
332+
raise NotImplementedError
333+
334+
def time_datetime_field_year(self):
335+
@test_parallel(num_threads=2)
336+
def run(dti):
337+
dti.year
338+
run(self.dti)
339+
340+
def time_datetime_field_day(self):
341+
@test_parallel(num_threads=2)
342+
def run(dti):
343+
dti.day
344+
run(self.dti)
345+
346+
def time_datetime_field_daysinmonth(self):
347+
@test_parallel(num_threads=2)
348+
def run(dti):
349+
dti.days_in_month
350+
run(self.dti)
351+
352+
def time_datetime_field_normalize(self):
353+
@test_parallel(num_threads=2)
354+
def run(dti):
355+
dti.normalize()
356+
run(self.dti)
357+
358+
def time_datetime_to_period(self):
359+
@test_parallel(num_threads=2)
360+
def run(dti):
361+
dti.to_period('S')
362+
run(self.dti)
363+
364+
def time_period_to_datetime(self):
365+
@test_parallel(num_threads=2)
366+
def run(period):
367+
period.to_timestamp()
368+
run(self.period)

asv_bench/benchmarks/series_methods.py

+20
Original file line numberDiff line numberDiff line change
@@ -71,3 +71,23 @@ def setup(self):
7171
def time_series_nsmallest2(self):
7272
self.s2.nsmallest(3, take_last=True)
7373
self.s2.nsmallest(3, take_last=False)
74+
75+
76+
class series_dropna_int64(object):
77+
goal_time = 0.2
78+
79+
def setup(self):
80+
self.s = Series(np.random.randint(1, 10, 1000000))
81+
82+
def time_series_dropna_int64(self):
83+
self.s.dropna()
84+
85+
class series_dropna_datetime(object):
86+
goal_time = 0.2
87+
88+
def setup(self):
89+
self.s = Series(pd.date_range('2000-01-01', freq='S', periods=1000000))
90+
self.s[np.random.randint(1, 1000000, 100)] = pd.NaT
91+
92+
def time_series_dropna_datetime(self):
93+
self.s.dropna()

ci/install_conda.sh

+1-1
Original file line numberDiff line numberDiff line change
@@ -73,7 +73,7 @@ bash miniconda.sh -b -p $HOME/miniconda || exit 1
7373
conda config --set always_yes yes --set changeps1 no || exit 1
7474
conda update -q conda || exit 1
7575
conda config --add channels conda-forge || exit 1
76-
conda config --add channels http://conda.binstar.org/pandas || exit 1
76+
conda config --add channels http://conda.anaconda.org/pandas || exit 1
7777
conda config --set ssl_verify false || exit 1
7878

7979
# Useful for debugging any issues with conda

ci/requirements-2.7.pip

+2
Original file line numberDiff line numberDiff line change
@@ -2,3 +2,5 @@ blosc
22
httplib2
33
google-api-python-client == 1.2
44
python-gflags == 2.0
5+
pathlib
6+
py

ci/requirements-2.7_SLOW.pip

Whitespace-only changes.

ci/requirements-3.4.build

+1
Original file line numberDiff line numberDiff line change
@@ -2,3 +2,4 @@ python-dateutil
22
pytz
33
numpy=1.8.1
44
cython
5+
libgfortran

doc/source/conf.py

+3-2
Original file line numberDiff line numberDiff line change
@@ -299,8 +299,9 @@
299299
intersphinx_mapping = {
300300
'statsmodels': ('http://statsmodels.sourceforge.net/devel/', None),
301301
'matplotlib': ('http://matplotlib.org/', None),
302-
'python': ('http://docs.python.org/', None),
303-
'numpy': ('http://docs.scipy.org/doc/numpy', None)
302+
'python': ('http://docs.python.org/3', None),
303+
'numpy': ('http://docs.scipy.org/doc/numpy', None),
304+
'py': ('http://pylib.readthedocs.org/en/latest/', None)
304305
}
305306
import glob
306307
autosummary_generate = glob.glob("*.rst")

doc/source/io.rst

+3-2
Original file line numberDiff line numberDiff line change
@@ -79,9 +79,10 @@ for some advanced strategies
7979

8080
They can take a number of arguments:
8181

82-
- ``filepath_or_buffer``: Either a string path to a file, URL
82+
- ``filepath_or_buffer``: Either a path to a file (a :class:`python:str`,
83+
:class:`python:pathlib.Path`, or :class:`py:py._path.local.LocalPath`), URL
8384
(including http, ftp, and S3 locations), or any object with a ``read``
84-
method (such as an open file or ``StringIO``).
85+
method (such as an open file or :class:`~python:io.StringIO`).
8586
- ``sep`` or ``delimiter``: A delimiter / separator to split fields
8687
on. With ``sep=None``, ``read_csv`` will try to infer the delimiter
8788
automatically in some cases by "sniffing".

doc/source/whatsnew/v0.17.1.txt

+33-1
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@ Highlights include:
1717

1818
Enhancements
1919
~~~~~~~~~~~~
20+
- ``DatetimeIndex`` now supports conversion to strings with astype(str)(:issue:`10442`)
2021

2122
- Support for ``compression`` (gzip/bz2) in :method:`DataFrame.to_csv` (:issue:`7615`)
2223

@@ -27,6 +28,10 @@ Enhancements
2728
Other Enhancements
2829
^^^^^^^^^^^^^^^^^^
2930

31+
- ``pd.read_*`` functions can now also accept :class:`python:pathlib.Path`, or :class:`py:py._path.local.LocalPath`
32+
objects for the ``filepath_or_buffer`` argument. (:issue:`11033`)
33+
- Improve the error message displayed in :func:`pandas.io.gbq.to_gbq` when the DataFrame does not match the schema of the destination table (:issue:`11359`)
34+
3035
.. _whatsnew_0171.api:
3136

3237
API changes
@@ -37,17 +42,31 @@ API changes
3742
- Regression from 0.16.2 for output formatting of long floats/nan, restored in (:issue:`11302`)
3843
- Prettyprinting sets (e.g. in DataFrame cells) now uses set literal syntax (``{x, y}``) instead of
3944
Legacy Python syntax (``set([x, y])``) (:issue:`11215`)
45+
- Indexing with a null key will raise a ``TypeError``, instead of a ``ValueError`` (:issue:`11356`)
4046

4147
.. _whatsnew_0171.deprecations:
4248

4349
Deprecations
4450
^^^^^^^^^^^^
4551

52+
- The ``pandas.io.ga`` module which implements ``google-analytics`` support is deprecated and will be removed in a future version (:issue:`11308`)
53+
- Deprecate the ``engine`` keyword from ``.to_csv()``, which will be removed in a future version (:issue:`11274`)
54+
55+
4656
.. _whatsnew_0171.performance:
4757

4858
Performance Improvements
4959
~~~~~~~~~~~~~~~~~~~~~~~~
5060

61+
- Checking monotonic-ness before sorting on an index (:issue:`11080`)
62+
- ``Series.dropna`` performance improvement when its dtype can't contain ``NaN`` (:issue:`11159`)
63+
64+
65+
- Release the GIL on most datetime field operations (e.g. ``DatetimeIndex.year``, ``Series.dt.year``), normalization, and conversion to and from ``Period``, ``DatetimeIndex.to_period`` and ``PeriodIndex.to_timestamp`` (:issue:`11263`)
66+
67+
68+
- Improved performance to ``to_excel`` (:issue:`11352`)
69+
5170
.. _whatsnew_0171.bug_fixes:
5271

5372
Bug Fixes
@@ -58,13 +77,19 @@ Bug Fixes
5877

5978
- Bug in ``HDFStore.select`` when comparing with a numpy scalar in a where clause (:issue:`11283`)
6079

61-
- Bug in tz-conversions with an ambiguous time and ``.dt`` accessors (:issues:`11295`)
80+
81+
- Bug in tz-conversions with an ambiguous time and ``.dt`` accessors (:issue:`11295`)
82+
- Bug in comparisons of Series vs list-likes (:issue:`11339`)
6283

6384

85+
- Bug in ``DataFrame.replace`` with a ``datetime64[ns, tz]`` and a non-compat to_replace (:issue:`11326`, :issue:`11153`)
6486

6587

6688

89+
- Bug in list-like indexing with a mixed-integer Index (:issue:`11320`)
6790

91+
- Bug in ``pivot_table`` with ``margins=True`` when indexes are of ``Categorical`` dtype (:issue:`10993`)
92+
- Bug in ``DataFrame.plot`` cannot use hex strings colors (:issue:`10299`)
6893

6994

7095

@@ -88,5 +113,12 @@ Bug Fixes
88113

89114

90115
- Bugs in ``to_excel`` with duplicate columns (:issue:`11007`, :issue:`10982`, :issue:`10970`)
116+
91117
- Fixed a bug that prevented the construction of an empty series of dtype
92118
``datetime64[ns, tz]`` (:issue:`11245`).
119+
120+
- Bug in ``read_excel`` with multi-index containing integers (:issue:`11317`)
121+
122+
- Bug in ``to_excel`` with openpyxl 2.2+ and merging (:issue:`11408`)
123+
124+
- Bug in ``DataFrame.to_dict()`` produces a ``np.datetime64`` object instead of ``Timestamp`` when only datetime is present in data (:issue:`11327`)

0 commit comments

Comments
 (0)