Commit a5ca359

Merge branch 'master' into gh15077

2 parents: dc0803b + 97fd744

20 files changed: +328 −132 lines

ci/requirements-3.4_SLOW.build (+1 −1)

@@ -1,4 +1,4 @@
 python-dateutil
 pytz
-numpy=1.9.3
+numpy=1.10*
 cython

ci/requirements-3.4_SLOW.run (+1 −1)

@@ -1,6 +1,6 @@
 python-dateutil
 pytz
-numpy=1.9.3
+numpy=1.10*
 openpyxl
 xlsxwriter
 xlrd

ci/requirements-3.4_SLOW.sh (+1 −1)

@@ -4,4 +4,4 @@ source activate pandas
 
 echo "install 34_slow"
 
-conda install -n pandas -c conda-forge/label/rc -c conda-forge matplotlib
+conda install -n pandas -c conda-forge matplotlib

doc/source/groupby.rst (+1 −1)

@@ -810,7 +810,7 @@ next). This enables some operations to be carried out rather succinctly:
    tsdf = pd.DataFrame(np.random.randn(1000, 3),
                        index=pd.date_range('1/1/2000', periods=1000),
                        columns=['A', 'B', 'C'])
-   tsdf.ix[::2] = np.nan
+   tsdf.iloc[::2] = np.nan
    grouped = tsdf.groupby(lambda x: x.year)
    grouped.fillna(method='pad')
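The doc change above swaps the deprecated ``.ix`` for ``.iloc``. A minimal sketch of the positional slicing it relies on (using a smaller frame than the docs do):

```python
import numpy as np
import pandas as pd

# .iloc slices strictly by position, regardless of index type --
# unlike the deprecated .ix, which guessed between labels and positions.
tsdf = pd.DataFrame(np.random.randn(6, 3),
                    index=pd.date_range('1/1/2000', periods=6),
                    columns=['A', 'B', 'C'])
tsdf.iloc[::2] = np.nan  # blank out every other row by position
```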

doc/source/indexing.rst (+9 −4)

@@ -557,13 +557,18 @@ IX Indexer is Deprecated
 
 .. warning::
 
-   Starting in 0.20.0, the ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc``
-   and ``.loc`` indexers. ``.ix`` offers a lot of magic on the inference of what the user wants to
-   do. To wit, ``.ix`` can decide to index *positionally* OR via *labels*. This has caused
-   quite a bit of user confusion over the years.
+   Starting in 0.20.0, the ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc``
+   and ``.loc`` indexers.
+
+   ``.ix`` offers a lot of magic on the inference of what the user wants to do. To wit, ``.ix`` can decide
+   to index *positionally* OR via *labels* depending on the data type of the index. This has caused quite a
+   bit of user confusion over the years.
 
 The recommended methods of indexing are:
 
+- ``.loc`` if you want to *label* index
+- ``.iloc`` if you want to *positionally* index.
+
 .. ipython:: python
 
    dfd = pd.DataFrame({'A': [1, 2, 3],
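The distinction the new doc text draws can be sketched with an integer-labeled index, which is exactly where ``.ix`` was most ambiguous (frame contents here are illustrative, not taken from the docs):

```python
import pandas as pd

# With integer labels, "index by 20" could mean label 20 or position 20.
# .loc and .iloc make the intent explicit.
dfd = pd.DataFrame({'A': [1, 2, 3]}, index=[10, 20, 30])

by_label = dfd.loc[20, 'A']     # label-based: the row labeled 20
by_position = dfd.iloc[0]['A']  # position-based: the first row
```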

doc/source/io.rst (+57 −89)

@@ -357,94 +357,6 @@ warn_bad_lines : boolean, default ``True``
   If error_bad_lines is ``False``, and warn_bad_lines is ``True``, a warning for
   each "bad line" will be output (only valid with C parser).
 
-.. ipython:: python
-   :suppress:
-
-   f = open('foo.csv','w')
-   f.write('date,A,B,C\n20090101,a,1,2\n20090102,b,3,4\n20090103,c,4,5')
-   f.close()
-
-Consider a typical CSV file containing, in this case, some time series data:
-
-.. ipython:: python
-
-   print(open('foo.csv').read())
-
-The default for `read_csv` is to create a DataFrame with simple numbered rows:
-
-.. ipython:: python
-
-   pd.read_csv('foo.csv')
-
-In the case of indexed data, you can pass the column number or column name you
-wish to use as the index:
-
-.. ipython:: python
-
-   pd.read_csv('foo.csv', index_col=0)
-
-.. ipython:: python
-
-   pd.read_csv('foo.csv', index_col='date')
-
-You can also use a list of columns to create a hierarchical index:
-
-.. ipython:: python
-
-   pd.read_csv('foo.csv', index_col=[0, 'A'])
-
-.. _io.dialect:
-
-The ``dialect`` keyword gives greater flexibility in specifying the file format.
-By default it uses the Excel dialect but you can specify either the dialect name
-or a :class:`python:csv.Dialect` instance.
-
-.. ipython:: python
-   :suppress:
-
-   data = ('label1,label2,label3\n'
-           'index1,"a,c,e\n'
-           'index2,b,d,f')
-
-Suppose you had data with unenclosed quotes:
-
-.. ipython:: python
-
-   print(data)
-
-By default, ``read_csv`` uses the Excel dialect and treats the double quote as
-the quote character, which causes it to fail when it finds a newline before it
-finds the closing double quote.
-
-We can get around this using ``dialect``
-
-.. ipython:: python
-
-   dia = csv.excel()
-   dia.quoting = csv.QUOTE_NONE
-   pd.read_csv(StringIO(data), dialect=dia)
-
-All of the dialect options can be specified separately by keyword arguments:
-
-.. ipython:: python
-
-   data = 'a,b,c~1,2,3~4,5,6'
-   pd.read_csv(StringIO(data), lineterminator='~')
-
-Another common dialect option is ``skipinitialspace``, to skip any whitespace
-after a delimiter:
-
-.. ipython:: python
-
-   data = 'a, b, c\n1, 2, 3\n4, 5, 6'
-   print(data)
-   pd.read_csv(StringIO(data), skipinitialspace=True)
-
-The parsers make every attempt to "do the right thing" and not be very
-fragile. Type inference is a pretty big deal. So if a column can be coerced to
-integer dtype without altering the contents, it will do so. Any non-numeric
-columns will come through as object dtype as with the rest of pandas objects.
-
 .. _io.dtypes:
 
 Specifying column data types

@@ -1238,6 +1150,62 @@ data that appear in some lines but not others:
    1   4   5   6
    2   8   9  10
 
+.. _io.dialect:
+
+Dialect
+'''''''
+
+The ``dialect`` keyword gives greater flexibility in specifying the file format.
+By default it uses the Excel dialect but you can specify either the dialect name
+or a :class:`python:csv.Dialect` instance.
+
+.. ipython:: python
+   :suppress:
+
+   data = ('label1,label2,label3\n'
+           'index1,"a,c,e\n'
+           'index2,b,d,f')
+
+Suppose you had data with unenclosed quotes:
+
+.. ipython:: python
+
+   print(data)
+
+By default, ``read_csv`` uses the Excel dialect and treats the double quote as
+the quote character, which causes it to fail when it finds a newline before it
+finds the closing double quote.
+
+We can get around this using ``dialect``
+
+.. ipython:: python
+   :okwarning:
+
+   dia = csv.excel()
+   dia.quoting = csv.QUOTE_NONE
+   pd.read_csv(StringIO(data), dialect=dia)
+
+All of the dialect options can be specified separately by keyword arguments:
+
+.. ipython:: python
+
+   data = 'a,b,c~1,2,3~4,5,6'
+   pd.read_csv(StringIO(data), lineterminator='~')
+
+Another common dialect option is ``skipinitialspace``, to skip any whitespace
+after a delimiter:
+
+.. ipython:: python
+
+   data = 'a, b, c\n1, 2, 3\n4, 5, 6'
+   print(data)
+   pd.read_csv(StringIO(data), skipinitialspace=True)
+
+The parsers make every attempt to "do the right thing" and not be very
+fragile. Type inference is a pretty big deal. So if a column can be coerced to
+integer dtype without altering the contents, it will do so. Any non-numeric
+columns will come through as object dtype as with the rest of pandas objects.
+
 .. _io.quoting:
 
 Quoting and Escape Characters

@@ -1400,7 +1368,7 @@ returned object:
    df = pd.read_csv("data/mindex_ex.csv", index_col=[0,1])
    df
-   df.iloc[1978]
+   df.loc[1978]
 
 .. _io.multi_index_columns:
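The ``dialect`` snippet moved by this hunk can be run standalone; a self-contained version of the unclosed-quote example follows (the exact parsed cell values are an assumption, since ``QUOTE_NONE`` keeps the stray quote as literal text):

```python
import csv
from io import StringIO

import pandas as pd

# Data with an unclosed double quote, as in the doc example above.
data = ('label1,label2,label3\n'
        'index1,"a,c,e\n'
        'index2,b,d,f')

# Disable quote handling via a csv.Dialect so the stray quote is
# treated as ordinary text instead of opening a quoted field.
dia = csv.excel()
dia.quoting = csv.QUOTE_NONE
df = pd.read_csv(StringIO(data), dialect=dia)
```

With four fields per row against three header names, ``read_csv`` uses the first field as the index, giving a 2x3 frame.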

doc/source/timedeltas.rst (+14 −0)

@@ -310,6 +310,20 @@ similarly to the ``Series``. These are the *displayed* values of the ``Timedelta
    td.dt.components
    td.dt.components.seconds
 
+.. _timedeltas.isoformat:
+
+You can convert a ``Timedelta`` to an `ISO 8601 duration`_ string with the
+``.isoformat`` method.
+
+.. versionadded:: 0.20.0
+
+.. ipython:: python
+
+   pd.Timedelta(days=6, minutes=50, seconds=3,
+                milliseconds=10, microseconds=10,
+                nanoseconds=12).isoformat()
+
+.. _ISO 8601 duration: https://en.wikipedia.org/wiki/ISO_8601#Durations
+
 .. _timedeltas.index:
 
 TimedeltaIndex
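The new doc example runs as plain Python; the expected string below is the one the pandas 0.20.0 release notes show for this exact call:

```python
import pandas as pd

# ISO 8601 duration formatting for a Timedelta (added in pandas 0.20.0):
# 6 days, 0 hours, 50 minutes, 3.010010012 seconds
iso = pd.Timedelta(days=6, minutes=50, seconds=3,
                   milliseconds=10, microseconds=10,
                   nanoseconds=12).isoformat()
# 'P6DT0H50M3.010010012S'
```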

doc/source/whatsnew/v0.20.0.txt (+13 −3)

@@ -133,11 +133,14 @@ Other enhancements
 - The ``skiprows`` argument in ``pd.read_csv`` now accepts a callable function as a value (:issue:`10882`)
 - ``pd.DataFrame.plot`` now prints a title above each subplot if ``subplots=True`` and ``title`` is a list of strings (:issue:`14753`)
 - ``pd.Series.interpolate`` now supports timedelta as an index type with ``method='time'`` (:issue:`6424`)
+- ``Timedelta.isoformat`` method added for formatting Timedeltas as an `ISO 8601 duration`_. See the :ref:`Timedelta docs <timedeltas.isoformat>` (:issue:`15136`)
 - ``pandas.io.json.json_normalize()`` gained the option ``errors='ignore'|'raise'``; the default is ``errors='raise'`` which is backward compatible. (:issue:`14583`)
 
 - ``.select_dtypes()`` now allows the string 'datetimetz' to generically select datetimes with tz (:issue:`14910`)
 - ``pd.merge_asof()`` gained the option ``direction='backward'|'forward'|'nearest'`` (:issue:`14887`)
 
+.. _ISO 8601 duration: https://en.wikipedia.org/wiki/ISO_8601#Durations
+
 
 .. _whatsnew_0200.api_breaking:

@@ -149,7 +152,7 @@ Backwards incompatible API changes
 Deprecate .ix
 ^^^^^^^^^^^^^
 
-The ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc`` and ``.loc`` indexers. ``.ix`` offers a lot of magic on the inference of what the user wants to do. To wit, ``.ix`` can decide to index *positionally* OR via *labels*. This has caused quite a bit of user confusion over the years. The full indexing documentation are :ref:`here <indexing>`. (:issue:`14218`)
+The ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc`` and ``.loc`` indexers. ``.ix`` offers a lot of magic on the inference of what the user wants to do. To wit, ``.ix`` can decide to index *positionally* OR via *labels*, depending on the data type of the index. This has caused quite a bit of user confusion over the years. The full indexing documentation is :ref:`here <indexing>`. (:issue:`14218`)
 
 
 The recommended methods of indexing are:

@@ -388,10 +391,11 @@ Bug Fixes
 
 - Bug in compat for passing long integers to ``Timestamp.replace`` (:issue:`15030`)
 - Bug in ``.loc`` that would not return the correct dtype for scalar access for a DataFrame (:issue:`11617`)
+- Bug in ``GroupBy.get_group()`` failing with a categorical grouper (:issue:`15155`)
 
 
-
+- Bug in ``.groupby(...).rolling(...)`` when ``on`` is specified and using a ``DatetimeIndex`` (:issue:`15130`)
 
 

@@ -435,4 +439,10 @@ Bug Fixes
 - Bug in ``pd.read_csv()`` for the C engine where ``usecols`` were being indexed incorrectly with ``parse_dates`` (:issue:`14792`)
 - Incorrect dtyped ``Series`` was returned by comparison methods (e.g., ``lt``, ``gt``, ...) against a constant for an empty ``DataFrame`` (:issue:`15077`)
 - Bug in ``Series.dt.round`` inconsistent behaviour on NAT's with different arguments (:issue:`14940`)
-- Bug in ``.read_json()`` for Python 2 where ``lines=True`` and contents contain non-ascii unicode characters (:issue:`15132`)
+
+- Bug in ``.read_json()`` for Python 2 where ``lines=True`` and contents contain non-ascii unicode characters (:issue:`15132`)
+
+- Bug in ``pd.read_csv()`` with ``float_precision='round_trip'`` which caused a segfault when a text entry is parsed (:issue:`15140`)
+
+- Bug in ``DataFrame.boxplot`` where ``fontsize`` was not applied to the tick labels on both axes (:issue:`15108`)
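One of the fixes above concerns ``read_json`` with ``lines=True``. A quick sketch of that mode (wrapping the literal in ``StringIO``, which newer pandas expects for string input; the sample records are made up):

```python
from io import StringIO

import pandas as pd

# lines=True parses one JSON object per line (JSON Lines format)
df = pd.read_json(StringIO('{"a": 1, "b": "x"}\n{"a": 2, "b": "y"}'),
                  lines=True)
```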

pandas/core/window.py (+38 −13)

@@ -1025,19 +1025,8 @@ def validate(self):
         if (self.is_datetimelike and
                 isinstance(self.window, (compat.string_types, DateOffset))):
 
-            # must be monotonic for on
-            if not self._on.is_monotonic:
-                formatted = self.on or 'index'
-                raise ValueError("{0} must be "
-                                 "monotonic".format(formatted))
-
-            from pandas.tseries.frequencies import to_offset
-            try:
-                freq = to_offset(self.window)
-            except (TypeError, ValueError):
-                raise ValueError("passed window {0} in not "
-                                 "compat with a datetimelike "
-                                 "index".format(self.window))
+            self._validate_monotonic()
+            freq = self._validate_freq()
 
             # we don't allow center
             if self.center:

@@ -1058,6 +1047,23 @@ def validate(self):
         elif self.window < 0:
             raise ValueError("window must be non-negative")
 
+    def _validate_monotonic(self):
+        """ validate on is monotonic """
+        if not self._on.is_monotonic:
+            formatted = self.on or 'index'
+            raise ValueError("{0} must be "
+                             "monotonic".format(formatted))
+
+    def _validate_freq(self):
+        """ validate & return our freq """
+        from pandas.tseries.frequencies import to_offset
+        try:
+            return to_offset(self.window)
+        except (TypeError, ValueError):
+            raise ValueError("passed window {0} in not "
+                             "compat with a datetimelike "
+                             "index".format(self.window))
+
 @Substitution(name='rolling')
 @Appender(SelectionMixin._see_also_template)
 @Appender(SelectionMixin._agg_doc)

@@ -1175,6 +1181,25 @@ class RollingGroupby(_GroupByMixin, Rolling):
     def _constructor(self):
         return Rolling
 
+    def _gotitem(self, key, ndim, subset=None):
+
+        # we are setting the index on the actual object
+        # here so our index is carried thru to the selected obj
+        # when we do the splitting for the groupby
+        if self.on is not None:
+            self._groupby.obj = self._groupby.obj.set_index(self._on)
+            self.on = None
+        return super(RollingGroupby, self)._gotitem(key, ndim, subset=subset)
+
+    def _validate_monotonic(self):
+        """
+        validate that on is monotonic;
+        we don't care for groupby.rolling
+        because we have already validated at a higher
+        level
+        """
+        pass
+
 
 class Expanding(_Rolling_and_Expanding):
     """

pandas/indexes/category.py (+3 −0)

@@ -255,6 +255,9 @@ def categories(self):
     def ordered(self):
         return self._data.ordered
 
+    def _reverse_indexer(self):
+        return self._data._reverse_indexer()
+
     def __contains__(self, key):
         hash(key)
         return key in self.values
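``_reverse_indexer`` is a private method the index now delegates to its underlying ``Categorical``. What it computes, a mapping from each category to the integer positions where it occurs, can be sketched with public APIs only (this is an illustration, not the pandas implementation):

```python
import numpy as np
import pandas as pd

ci = pd.CategoricalIndex(list('aabca'))

# Map each category to the positions holding it, by comparing the
# integer codes against each category's code.
rev = {cat: np.flatnonzero(np.asarray(ci.codes) == code)
       for code, cat in enumerate(ci.categories)}
```

This kind of reverse lookup is what lets ``GroupBy.get_group()`` work with a categorical grouper (the bug fixed in this commit).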

pandas/io/tests/parser/c_parser_only.py (+7 −0)

@@ -388,3 +388,10 @@ def test_read_nrows_large(self):
         df = self.read_csv(StringIO(test_input), sep='\t', nrows=1010)
 
         self.assertTrue(df.size == 1010 * 10)
+
+    def test_float_precision_round_trip_with_text(self):
+        # gh-15140 - This should not segfault on Python 2.7+
+        df = self.read_csv(StringIO('a'),
+                           float_precision='round_trip',
+                           header=None)
+        tm.assert_frame_equal(df, DataFrame({0: ['a']}))
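The regression test above can be reproduced outside the pandas test harness with plain ``pd.read_csv``:

```python
from io import StringIO

import pandas as pd

# gh-15140: a lone text entry parsed with float_precision='round_trip'
# used to segfault the C engine; it should simply come back as a string.
df = pd.read_csv(StringIO('a'), float_precision='round_trip', header=None)
```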
