.. _whatsnew_0130:
v0.13.0 (January 3, 2014)
---------------------------
This is a major release from 0.12.0 and includes a number of API changes, several new features and
enhancements along with a large number of bug fixes.
Highlights include:
- support for a new index type ``Float64Index``, and other Indexing enhancements
- ``HDFStore`` has a new string based syntax for query specification
- support for new methods of interpolation
- updated ``timedelta`` operations
- a new string manipulation method ``extract``
- Nanosecond support for Offsets
- ``isin`` for DataFrames
Several experimental features are added, including:
- new ``eval/query`` methods for expression evaluation
- support for ``msgpack`` serialization
- an i/o interface to Google's ``BigQuery``
There are several new or updated docs sections, including:
- :ref:`Comparison with SQL<compare_with_sql>`, which should be useful for those familiar with SQL but still learning pandas.
- :ref:`Comparison with R<compare_with_r>`, idiom translations from R to pandas.
- :ref:`Enhancing Performance<enhancingperf>`, ways to enhance pandas performance with ``eval/query``.
.. warning::
In 0.13.0 ``Series`` has internally been refactored to no longer sub-class ``ndarray``
but instead subclass ``NDFrame``, similar to the rest of the pandas containers. This should be
a transparent change with only very limited API implications. See :ref:`Internal Refactoring<whatsnew_0130.refactoring>`
API changes
~~~~~~~~~~~
- ``read_excel`` now supports an integer in its ``sheetname`` argument giving
the index of the sheet to read in (:issue:`4301`).
- Text parser now treats anything that reads like inf ("inf", "Inf", "-Inf",
"iNf", etc.) as infinity. (:issue:`4220`, :issue:`4219`), affecting
``read_table``, ``read_csv``, etc.
- ``pandas`` is now Python 2/3 compatible without the need for 2to3, thanks to
@jtratner. As a result, pandas now uses iterators more extensively. This
also led to the introduction of substantive parts of Benjamin
Peterson's ``six`` library into compat. (:issue:`4384`, :issue:`4375`,
:issue:`4372`)
- ``pandas.util.compat`` and ``pandas.util.py3compat`` have been merged into
``pandas.compat``. ``pandas.compat`` now includes many functions allowing
2/3 compatibility. It contains both list and iterator versions of range,
filter, map and zip, plus other necessary elements for Python 3
compatibility. ``lmap``, ``lzip``, ``lrange`` and ``lfilter`` all produce
lists instead of iterators, for compatibility with ``numpy``, subscripting
and ``pandas`` constructors. (:issue:`4384`, :issue:`4375`, :issue:`4372`)
- ``Series.get`` with negative indexers now returns the same as ``[]`` (:issue:`4390`)
- Changes to how ``Index`` and ``MultiIndex`` handle metadata (``levels``,
``labels``, and ``names``) (:issue:`4039`):
.. code-block:: python
# previously, you would have set levels or labels directly
index.levels = [[1, 2, 3, 4], [1, 2, 4, 4]]
# now, you use the set_levels or set_labels methods
index = index.set_levels([[1, 2, 3, 4], [1, 2, 4, 4]])
# similarly, for names, you can rename the object
# but setting names is not deprecated
index = index.set_names(["bob", "cranberry"])
# and all methods take an inplace kwarg - but return None
index.set_names(["bob", "cranberry"], inplace=True)
- **All** division with ``NDFrame`` objects is now *truedivision*, regardless
of the future import. This means that operating on pandas objects will by default
use *floating point* division, and return a floating point dtype.
You can use ``//`` and ``floordiv`` to do integer division.
Integer division
.. code-block:: python
In [3]: arr = np.array([1, 2, 3, 4])
In [4]: arr2 = np.array([5, 3, 2, 1])
In [5]: arr / arr2
Out[5]: array([0, 0, 1, 4])
In [6]: Series(arr) // Series(arr2)
Out[6]:
0 0
1 0
2 1
3 4
dtype: int64
True Division
.. code-block:: python
In [7]: pd.Series(arr) / pd.Series(arr2) # no future import required
Out[7]:
0 0.200000
1 0.666667
2 1.500000
3 4.000000
dtype: float64
- Infer and downcast dtype if ``downcast='infer'`` is passed to ``fillna/ffill/bfill`` (:issue:`4604`)
- ``__nonzero__`` for all NDFrame objects will now raise a ``ValueError``; this reverts to the (:issue:`1073`, :issue:`4633`)
behavior. See :ref:`gotchas<gotchas.truth>` for a more detailed discussion.
This prevents boolean comparison of an *entire* pandas object, which is inherently ambiguous. All of the following will raise a ``ValueError``:
.. code-block:: python
if df:
....
df1 and df2
s1 and s2
Added the ``.bool()`` method to ``NDFrame`` objects to facilitate evaluation of single-element boolean objects:
.. ipython:: python
Series([True]).bool()
Series([False]).bool()
DataFrame([[True]]).bool()
DataFrame([[False]]).bool()
- All non-Index NDFrames (``Series``, ``DataFrame``, ``Panel``, ``Panel4D``,
``SparsePanel``, etc.), now support the entire set of arithmetic operators
and arithmetic flex methods (add, sub, mul, etc.). ``SparsePanel`` does not
support ``pow`` or ``mod`` with non-scalars. (:issue:`3765`)
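A quick sketch of what the flex methods add over the raw operators (written against current pandas, using plain ``Series`` since ``Panel``/``SparsePanel`` were removed in later versions): alignment plus a ``fill_value`` for missing labels.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20], index=['b', 'c'])

# the raw operator aligns on the index and leaves NaN where labels are missing
raw = s1 + s2

# the flex method accepts fill_value to substitute for missing labels
filled = s1.add(s2, fill_value=0)
print(filled.tolist())  # [1.0, 12.0, 23.0]
```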
- ``Series`` and ``DataFrame`` now have a ``mode()`` method to calculate the
statistical mode(s) by axis/Series. (:issue:`5367`)
- Chained assignment will now by default warn if the user is assigning to a copy. This can be changed
with the option ``mode.chained_assignment``, allowed options are ``raise/warn/None``. See :ref:`the docs<indexing.view_versus_copy>`.
.. ipython:: python
dfc = DataFrame({'A':['aaa','bbb','ccc'],'B':[1,2,3]})
pd.set_option('chained_assignment','warn')
The following warning / exception will show if this is attempted.
.. ipython:: python
:okwarning:
dfc.loc[0]['A'] = 1111
::
Traceback (most recent call last)
...
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
Here is the correct method of assignment.
.. ipython:: python
dfc.loc[0,'A'] = 11
dfc
- ``Panel.reindex`` has the following call signature ``Panel.reindex(items=None, major_axis=None, minor_axis=None, **kwargs)``
to conform with other ``NDFrame`` objects. See :ref:`Internal Refactoring<whatsnew_0130.refactoring>` for more information.
- ``Series.argmin`` and ``Series.argmax`` are now aliased to ``Series.idxmin`` and ``Series.idxmax``. These return the *index* of the
min or max element respectively. Prior to 0.13.0 these would return the position of the min / max element. (:issue:`6214`)
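A minimal sketch of the label-versus-position distinction, using current pandas:

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 1, 4], index=['a', 'b', 'c'])

# idxmin/idxmax return the index *label* of the extreme value
label = s.idxmin()
print(label)  # 'b'

# the underlying numpy array still gives the *position*
pos = int(np.argmin(s.values))
print(pos)  # 1
```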
Prior Version Deprecations/Changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
These were announced changes in 0.12 or prior that are taking effect as of 0.13.0
- Remove deprecated ``Factor`` (:issue:`3650`)
- Remove deprecated ``set_printoptions/reset_printoptions`` (:issue:`3046`)
- Remove deprecated ``_verbose_info`` (:issue:`3215`)
- Remove deprecated ``read_clipboard/to_clipboard/ExcelFile/ExcelWriter`` from ``pandas.io.parsers`` (:issue:`3717`)
These are available as functions in the main pandas namespace (e.g. ``pd.read_clipboard``)
- the default for ``tupleize_cols`` is now ``False`` for both ``to_csv`` and ``read_csv``. Fair warning was given in 0.12 (:issue:`3604`)
- the default for ``display.max_seq_len`` is now 100 rather than ``None``. This activates
truncated display ("...") of long sequences in various places. (:issue:`3391`)
Deprecations
~~~~~~~~~~~~
Deprecated in 0.13.0
- deprecated ``iterkv``, which will be removed in a future release (this was
an alias of iteritems used to bypass ``2to3``'s changes).
(:issue:`4384`, :issue:`4375`, :issue:`4372`)
- deprecated the string method ``match``, whose role is now performed more
idiomatically by ``extract``. In a future release, the default behavior
of ``match`` will change to become analogous to ``contains``, which returns
a boolean indexer. (The
distinction between them is strictness: ``match`` relies on ``re.match`` while
``contains`` relies on ``re.search``.) In this release, the deprecated
behavior is the default, but the new behavior is available through the
keyword argument ``as_indexer=True``.
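For illustration, the contrast can be sketched with current pandas, where ``match`` already returns the boolean indexer described here (``contains`` follows ``re.search`` semantics; ``match`` anchors at the start of the string like ``re.match``):

```python
import pandas as pd

s = pd.Series(['number 1', '2 items', 'three'])

# contains: re.search semantics -- a match anywhere in the string
has_digit = s.str.contains(r'\d')
print(has_digit.tolist())  # [True, True, False]

# match: re.match semantics -- anchored at the start of the string
starts_digit = s.str.match(r'\d')
print(starts_digit.tolist())  # [False, True, False]
```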
Indexing API Changes
~~~~~~~~~~~~~~~~~~~~
Prior to 0.13, it was impossible to use a label indexer (``.loc/.ix``) to set a value that
was not contained in the index of a particular axis. (:issue:`2578`). See :ref:`the docs<indexing.basics.partial_setting>`
In the ``Series`` case this is effectively an appending operation
.. ipython:: python
s = Series([1,2,3])
s
s[5] = 5.
s
.. ipython:: python
dfi = DataFrame(np.arange(6).reshape(3,2),
columns=['A','B'])
dfi
This would previously have raised a ``KeyError``
.. ipython:: python
dfi.loc[:,'C'] = dfi.loc[:,'A']
dfi
This is like an ``append`` operation.
.. ipython:: python
dfi.loc[3] = 5
dfi
A Panel setting operation on an arbitrary axis aligns the input to the Panel
.. ipython:: python
p = pd.Panel(np.arange(16).reshape(2,4,2),
items=['Item1','Item2'],
major_axis=pd.date_range('2001/1/12',periods=4),
minor_axis=['A','B'],dtype='float64')
p
p.loc[:,:,'C'] = Series([30,32],index=p.items)
p
p.loc[:,:,'C']
Float64Index API Change
~~~~~~~~~~~~~~~~~~~~~~~
- Added a new index type, ``Float64Index``. This will be automatically created when passing floating values in index creation.
This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the
same. See :ref:`the docs<indexing.float64index>`, (:issue:`263`)
Construction of a ``Float64Index`` is the default for floating-point values.
.. ipython:: python
index = Index([1.5, 2, 3, 4.5, 5])
index
s = Series(range(5),index=index)
s
Scalar selection for ``[],.ix,.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)
.. ipython:: python
s[3]
s.ix[3]
s.loc[3]
The only positional indexing is via ``iloc``
.. ipython:: python
s.iloc[3]
A scalar index that is not found will raise ``KeyError``
Slicing is ALWAYS on the values of the index, for ``[],ix,loc`` and ALWAYS positional with ``iloc``
.. ipython:: python
s[2:4]
s.ix[2:4]
s.loc[2:4]
s.iloc[2:4]
In a float index, slicing with floats is allowed
.. ipython:: python
s[2.1:4.6]
s.loc[2.1:4.6]
- Indexing on other index types is preserved (with positional fallback for ``[],ix``), with the exception that floating point slicing
on a non-``Float64Index`` index will now raise a ``TypeError``.
.. code-block:: python
In [1]: Series(range(5))[3.5]
TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index)
In [1]: Series(range(5))[3.5:4.5]
TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index)
Using a scalar float indexer will be deprecated in a future version, but is allowed for now.
.. code-block:: python
In [3]: Series(range(5))[3.0]
Out[3]: 3
HDFStore API Changes
~~~~~~~~~~~~~~~~~~~~
- Query Format Changes. A much more string-like query format is now supported. See :ref:`the docs<io.hdf5-query>`.
.. ipython:: python
path = 'test.h5'
dfq = DataFrame(randn(10,4),
columns=list('ABCD'),
index=date_range('20130101',periods=10))
dfq.to_hdf(path,'dfq',format='table',data_columns=True)
Use boolean expressions, with in-line function evaluation.
.. ipython:: python
read_hdf(path,'dfq',
where="index>Timestamp('20130104') & columns=['A', 'B']")
Use an inline column reference
.. ipython:: python
read_hdf(path,'dfq',
where="A>0 or C>0")
.. ipython:: python
:suppress:
import os
os.remove(path)
- the ``format`` keyword now replaces the ``table`` keyword; allowed values are ``fixed(f)`` or ``table(t)``,
with the same defaults as prior to 0.13.0, e.g. ``put`` implies ``fixed`` format and ``append`` implies
``table`` format. This default format can be set as an option by setting ``io.hdf.default_format``.
.. ipython:: python
path = 'test.h5'
df = DataFrame(randn(10,2))
df.to_hdf(path,'df_table',format='table')
df.to_hdf(path,'df_table2',append=True)
df.to_hdf(path,'df_fixed')
with get_store(path) as store:
print(store)
.. ipython:: python
:suppress:
import os
os.remove(path)
- Significant table writing performance improvements
- handle a passed ``Series`` in table format (:issue:`4330`)
- can now serialize a ``timedelta64[ns]`` dtype in a table (:issue:`3577`), See :ref:`the docs<io.hdf5-timedelta>`.
- added an ``is_open`` property to indicate if the underlying file handle is open;
a closed store will now report 'CLOSED' when viewing the store (rather than raising an error)
(:issue:`4409`)
- closing a ``HDFStore`` now will close that instance of the ``HDFStore``,
but will only close the actual file if the ref count (by ``PyTables``) w.r.t. all of the open handles
is 0. Essentially you have a local instance of ``HDFStore`` referenced by a variable. Once you
close it, it will report closed. Other references (to the same file) will continue to operate
until they themselves are closed. Performing an action on a closed file will raise a
``ClosedFileError``
.. ipython:: python
path = 'test.h5'
df = DataFrame(randn(10,2))
store1 = HDFStore(path)
store2 = HDFStore(path)
store1.append('df',df)
store2.append('df2',df)
store1
store2
store1.close()
store2
store2.close()
store2
.. ipython:: python
:suppress:
import os
os.remove(path)
- removed the ``_quiet`` attribute, replaced by a ``DuplicateWarning`` when retrieving
duplicate rows from a table (:issue:`4367`)
- removed the ``warn`` argument from ``open``. Instead a ``PossibleDataLossError`` exception will
be raised if you try to use ``mode='w'`` with an OPEN file handle (:issue:`4367`)
- allow a passed locations array or mask as a ``where`` condition (:issue:`4467`).
See :ref:`the docs<io.hdf5-where_mask>` for an example.
- added the keyword ``dropna=True`` to ``append`` to control whether all-nan rows are written
to the store (the default is ``True``: all-nan rows are NOT written); also settable
via the option ``io.hdf.dropna_table`` (:issue:`4625`)
- store creation arguments are now passed through; this can be used to support in-memory stores
DataFrame repr Changes
~~~~~~~~~~~~~~~~~~~~~~
The HTML and plain text representations of :class:`DataFrame` now show
a truncated view of the table once it exceeds a certain size, rather
than switching to the short info view (:issue:`4886`, :issue:`5550`).
This makes the representation more consistent as small DataFrames get
larger.
.. image:: _static/df_repr_truncated.png
:alt: Truncated HTML representation of a DataFrame
To get the info view, call :meth:`DataFrame.info`. If you prefer the
info view as the repr for large DataFrames, you can set this by running
``set_option('display.large_repr', 'info')``.
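A short sketch of toggling this option (and restoring the default afterwards):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(400).reshape(100, 4))

# switch the repr for large frames to the info view
pd.set_option('display.large_repr', 'info')
mode = pd.get_option('display.large_repr')
print(mode)  # 'info'

# restore the default ('truncate')
pd.reset_option('display.large_repr')
```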
Enhancements
~~~~~~~~~~~~
- ``df.to_clipboard()`` learned a new ``excel`` keyword that lets you
paste DataFrame data directly into Excel (enabled by default). (:issue:`5070`).
- ``read_html`` now raises a ``URLError`` instead of catching and raising a
``ValueError`` (:issue:`4303`, :issue:`4305`)
- Added a test for ``read_clipboard()`` and ``to_clipboard()`` (:issue:`4282`)
- Clipboard functionality now works with PySide (:issue:`4282`)
- Added a more informative error message when plot arguments contain
overlapping color and style arguments (:issue:`4402`)
- ``to_dict`` now takes ``records`` as a possible ``outtype``, returning a list
of column-keyed dictionaries. (:issue:`4936`)
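A brief sketch (note that in later pandas versions the ``outtype`` keyword was renamed to ``orient``, which is used below):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})

# one dictionary per row, keyed by column name
records = df.to_dict(orient='records')
print(records)  # [{'A': 1, 'B': 'x'}, {'A': 2, 'B': 'y'}]
```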
- ``NaN`` handling in ``get_dummies`` (:issue:`4446`) with ``dummy_na``
.. ipython:: python
# previously, nan was erroneously counted as 2 here
# now it is not counted at all
get_dummies([1, 2, np.nan])
# unless requested
get_dummies([1, 2, np.nan], dummy_na=True)
- ``timedelta64[ns]`` operations. See :ref:`the docs<timedeltas.timedeltas_convert>`.
.. warning::
Most of these operations require ``numpy >= 1.7``
Using the new top-level ``to_timedelta``, you can convert a scalar or array from the standard
timedelta format (produced by ``to_csv``) into a timedelta type (``np.timedelta64`` in ``nanoseconds``).
.. ipython:: python
to_timedelta('1 days 06:05:01.00003')
to_timedelta('15.5us')
to_timedelta(['1 days 06:05:01.00003','15.5us','nan'])
to_timedelta(np.arange(5),unit='s')
to_timedelta(np.arange(5),unit='d')
A Series of dtype ``timedelta64[ns]`` can now be divided by another
``timedelta64[ns]`` object, or astyped, to yield a ``float64`` dtyped Series. This
is frequency conversion. See :ref:`the docs<timedeltas.timedeltas_convert>`.
.. ipython:: python
from datetime import timedelta
td = Series(date_range('20130101',periods=4))-Series(date_range('20121201',periods=4))
td[2] += np.timedelta64(timedelta(minutes=5,seconds=3))
td[3] = np.nan
td
# to days
td / np.timedelta64(1,'D')
td.astype('timedelta64[D]')
# to seconds
td / np.timedelta64(1,'s')
td.astype('timedelta64[s]')
Dividing or multiplying a ``timedelta64[ns]`` Series by an integer or integer Series
.. ipython:: python
td * -1
td * Series([1,2,3,4])
Absolute ``DateOffset`` objects can act equivalently to ``timedeltas``
.. ipython:: python
from pandas import offsets
td + offsets.Minute(5) + offsets.Milli(5)
Fillna is now supported for timedeltas
.. ipython:: python
td.fillna(0)
td.fillna(timedelta(days=1,seconds=5))
You can do numeric reduction operations on timedeltas.
.. ipython:: python
td.mean()
td.quantile(.1)
- ``plot(kind='kde')`` now accepts the optional parameters ``bw_method`` and
``ind``, passed to scipy.stats.gaussian_kde() (for scipy >= 0.11.0) to set
the bandwidth, and to gkde.evaluate() to specify the indices at which it
is evaluated, respectively. See scipy docs. (:issue:`4298`)
- DataFrame constructor now accepts a numpy masked record array (:issue:`3478`)
- The new vectorized string method ``extract`` returns regular expression
matches more conveniently.
.. ipython:: python
Series(['a1', 'b2', 'c3']).str.extract('[ab](\d)')
Elements that do not match return ``NaN``. Extracting a regular expression
with more than one group returns a DataFrame with one column per group.
.. ipython:: python
Series(['a1', 'b2', 'c3']).str.extract('([ab])(\d)')
Elements that do not match return a row of ``NaN``.
Thus, a Series of messy strings can be *converted* into a
like-indexed Series or DataFrame of cleaned-up or more useful strings,
without necessitating ``get()`` to access tuples or ``re.match`` objects.
Named groups like
.. ipython:: python
Series(['a1', 'b2', 'c3']).str.extract(
'(?P<letter>[ab])(?P<digit>\d)')
and optional groups can also be used.
.. ipython:: python
Series(['a1', 'b2', '3']).str.extract(
'(?P<letter>[ab])?(?P<digit>\d)')
- ``read_stata`` now accepts Stata 13 format (:issue:`4291`)
- ``read_fwf`` now infers the column specifications from the first 100 rows of
the file if the data has correctly separated and properly aligned columns
using the delimiter provided to the function (:issue:`4488`).
- support for nanosecond times as an offset
.. warning::
These operations require ``numpy >= 1.7``
Period conversions in the range of seconds and below were reworked and extended
up to nanoseconds. Periods in the nanosecond range are now available.
.. ipython:: python
date_range('2013-01-01', periods=5, freq='5N')
or with frequency as offset
.. ipython:: python
date_range('2013-01-01', periods=5, freq=pd.offsets.Nano(5))
Timestamps can be modified in the nanosecond range
.. ipython:: python
t = Timestamp('20130101 09:01:02')
t + pd.datetools.Nano(123)
- A new method, ``isin`` for DataFrames, which plays nicely with boolean indexing. The argument to ``isin``, what we're comparing the DataFrame to, can be a DataFrame, Series, dict, or array of values. See :ref:`the docs<indexing.basics.indexing_isin>` for more.
To get the rows where any of the conditions are met:
.. ipython:: python
dfi = DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'f', 'n']})
dfi
other = DataFrame({'A': [1, 3, 3, 7], 'B': ['e', 'f', 'f', 'e']})
mask = dfi.isin(other)
mask
dfi[mask.any(1)]
- ``Series`` now supports a ``to_frame`` method to convert it to a single-column DataFrame (:issue:`5164`)
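A one-line illustration of ``to_frame``:

```python
import pandas as pd

s = pd.Series([1, 2, 3], name='vals')

# the Series name (or the optional `name` argument) becomes the column label
df = s.to_frame()
print(df.shape)             # (3, 1)
print(df.columns.tolist())  # ['vals']
```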
- All R datasets listed here http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html can now be loaded into pandas objects
.. code-block:: python
# note that pandas.rpy was deprecated in v0.16.0
import pandas.rpy.common as com
com.load_data('Titanic')
- ``tz_localize`` can infer a fall daylight savings transition based on the structure
of the unlocalized data (:issue:`4230`), see :ref:`the docs<timeseries.timezone>`
- ``DatetimeIndex`` is now in the API documentation, see :ref:`the docs<api.datetimeindex>`
- :meth:`~pandas.io.json.json_normalize` is a new method to allow you to create a flat table
from semi-structured JSON data. See :ref:`the docs<io.json_normalize>` (:issue:`1067`)
- Added PySide support for the qtpandas DataFrameModel and DataFrameWidget.
- Python csv parser now supports usecols (:issue:`4335`)
- Frequencies gained several new offsets:
* ``LastWeekOfMonth`` (:issue:`4637`)
* ``FY5253``, and ``FY5253Quarter`` (:issue:`4511`)
- DataFrame has a new ``interpolate`` method, similar to Series (:issue:`4434`, :issue:`1892`)
.. ipython:: python
df = DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
'B': [.25, np.nan, np.nan, 4, 12.2, 14.4]})
df.interpolate()
Additionally, the ``method`` argument to ``interpolate`` has been expanded
to include ``'nearest'``, ``'zero'``, ``'slinear'``, ``'quadratic'``, ``'cubic'``,
``'barycentric'``, ``'krogh'``, ``'piecewise_polynomial'``, ``'pchip'``, ``'polynomial'`` and ``'spline'``.
The new methods require scipy_. Consult the Scipy reference guide_ and documentation_ for more information
about when the various methods are appropriate. See :ref:`the docs<missing_data.interpolate>`.
Interpolate now also accepts a ``limit`` keyword argument.
This works similarly to ``fillna``'s limit:
.. ipython:: python
ser = Series([1, 3, np.nan, np.nan, np.nan, 11])
ser.interpolate(limit=2)
- Added ``wide_to_long`` panel data convenience function. See :ref:`the docs<reshaping.melt>`.
.. ipython:: python
np.random.seed(123)
df = pd.DataFrame({"A1970" : {0 : "a", 1 : "b", 2 : "c"},
"A1980" : {0 : "d", 1 : "e", 2 : "f"},
"B1970" : {0 : 2.5, 1 : 1.2, 2 : .7},
"B1980" : {0 : 3.2, 1 : 1.3, 2 : .1},
"X" : dict(zip(range(3), np.random.randn(3)))
})
df["id"] = df.index
df
wide_to_long(df, ["A", "B"], i="id", j="year")
.. _scipy: http://www.scipy.org
.. _documentation: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation
.. _guide: http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
- ``to_csv`` now takes a ``date_format`` keyword argument that specifies how
output datetime objects should be formatted. Datetimes encountered in the
index, columns, and values will all have this formatting applied. (:issue:`4313`)
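A minimal sketch, writing to an in-memory buffer:

```python
import io
import pandas as pd

df = pd.DataFrame({'x': [1, 2]},
                  index=pd.date_range('2013-01-01', periods=2))

buf = io.StringIO()
# format every datetime (here, the index) with the given strftime pattern
df.to_csv(buf, date_format='%Y%m%d')
csv_text = buf.getvalue()
print(csv_text.splitlines()[1])  # 20130101,1
```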
- ``DataFrame.plot`` will scatter plot x versus y by passing ``kind='scatter'`` (:issue:`2215`)
- Added support for Google Analytics v3 API segment IDs, which also supports v2 IDs. (:issue:`5271`)
.. _whatsnew_0130.experimental:
Experimental
~~~~~~~~~~~~
- The new :func:`~pandas.eval` function implements expression evaluation using
``numexpr`` behind the scenes. This results in large speedups for
complicated expressions involving large DataFrames/Series. For example,
.. ipython:: python
nrows, ncols = 20000, 100
df1, df2, df3, df4 = [DataFrame(randn(nrows, ncols))
for _ in range(4)]
.. ipython:: python
# eval with NumExpr backend
%timeit pd.eval('df1 + df2 + df3 + df4')
.. ipython:: python
# pure Python evaluation
%timeit df1 + df2 + df3 + df4
For more details, see :ref:`the docs<enhancingperf.eval>`
- Similar to ``pandas.eval``, :class:`~pandas.DataFrame` has a new
``DataFrame.eval`` method that evaluates an expression in the context of
the ``DataFrame``. For example,
.. ipython:: python
:suppress:
try:
del a
except NameError:
pass
try:
del b
except NameError:
pass
.. ipython:: python
df = DataFrame(randn(10, 2), columns=['a', 'b'])
df.eval('a + b')
- A :meth:`~pandas.DataFrame.query` method has been added that allows
you to select elements of a ``DataFrame`` using a natural query syntax
nearly identical to Python syntax. For example,
.. ipython:: python
:suppress:
try:
del a
except NameError:
pass
try:
del b
except NameError:
pass
try:
del c
except NameError:
pass
.. ipython:: python
n = 20
df = DataFrame(np.random.randint(n, size=(n, 3)), columns=['a', 'b', 'c'])
df.query('a < b < c')
selects all the rows of ``df`` where ``a < b < c`` evaluates to ``True``.
For more details, see :ref:`the docs<indexing.query>`.
- ``pd.read_msgpack()`` and ``pd.to_msgpack()`` are now a supported method of serialization
of arbitrary pandas (and Python) objects in a lightweight portable binary format. See :ref:`the docs<io.msgpack>`
.. warning::
Since this is an EXPERIMENTAL LIBRARY, the storage format may not be stable until a future release.
.. ipython:: python
df = DataFrame(np.random.rand(5,2),columns=list('AB'))
df.to_msgpack('foo.msg')
pd.read_msgpack('foo.msg')
s = Series(np.random.rand(5),index=date_range('20130101',periods=5))
pd.to_msgpack('foo.msg', df, s)
pd.read_msgpack('foo.msg')
You can pass ``iterator=True`` to iterate over the unpacked results
.. ipython:: python
for o in pd.read_msgpack('foo.msg',iterator=True):
print(o)
.. ipython:: python
:suppress:
:okexcept:
os.remove('foo.msg')
- ``pandas.io.gbq`` provides a simple way to extract from, and load data into,
Google's BigQuery Data Sets by way of pandas DataFrames. BigQuery is a high
performance SQL-like database service, useful for performing ad-hoc queries
against extremely large datasets. :ref:`See the docs <io.bigquery>`
.. code-block:: python
from pandas.io import gbq
# A query to select the average monthly temperatures
# in the year 2000 across the USA. The dataset,
# publicdata:samples.gsod, is available on all BigQuery accounts,
# and is based on NOAA gsod data.
query = """SELECT station_number as STATION,
month as MONTH, AVG(mean_temp) as MEAN_TEMP
FROM publicdata:samples.gsod
WHERE YEAR = 2000
GROUP BY STATION, MONTH
ORDER BY STATION, MONTH ASC"""
# Fetch the result set for this query
# Your Google BigQuery Project ID
# To find this, see your dashboard:
# https://code.google.com/apis/console/b/0/?noredirect
projectid = "xxxxxxxxx"
df = gbq.read_gbq(query, project_id=projectid)
# Use pandas to process and reshape the dataset
df2 = df.pivot(index='STATION', columns='MONTH', values='MEAN_TEMP')
df3 = pandas.concat([df2.min(), df2.mean(), df2.max()],
axis=1, keys=["Min Temp", "Mean Temp", "Max Temp"])
The resulting DataFrame is::
> df3
Min Temp Mean Temp Max Temp
MONTH
1 -53.336667 39.827892 89.770968
2 -49.837500 43.685219 93.437932
3 -77.926087 48.708355 96.099998
4 -82.892858 55.070087 97.317240
5 -92.378261 61.428117 102.042856
6 -77.703334 65.858888 102.900000
7 -87.821428 68.169663 106.510714
8 -89.431999 68.614215 105.500000
9 -86.611112 63.436935 107.142856
10 -78.209677 56.880838 92.103333
11 -50.125000 48.861228 94.996428
12 -50.332258 42.286879 94.396774
.. warning::
To use this module, you will need a BigQuery account. See
<https://cloud.google.com/products/big-query> for details.
As of 10/10/13, there is a bug in Google's API preventing result sets
from being larger than 100,000 rows. A patch is scheduled for the week of
10/14/13.
.. _whatsnew_0130.refactoring:
Internal Refactoring
~~~~~~~~~~~~~~~~~~~~
In 0.13.0 there is a major refactor primarily to subclass ``Series`` from
``NDFrame``, which is the base class currently for ``DataFrame`` and ``Panel``,
to unify methods and behaviors. Series formerly subclassed directly from
``ndarray``. (:issue:`4080`, :issue:`3862`, :issue:`816`)
.. warning::
There are two potential incompatibilities with versions prior to 0.13.0
- Using certain numpy functions would previously return a ``Series`` if passed a ``Series``
as an argument. This seems only to affect ``np.ones_like``, ``np.empty_like``,
``np.diff`` and ``np.where``. These now return ``ndarrays``.
.. ipython:: python
s = Series([1,2,3,4])
Numpy Usage
.. ipython:: python
np.ones_like(s)
np.diff(s)
np.where(s>1,s,np.nan)
Pandonic Usage
.. ipython:: python
Series(1,index=s.index)
s.diff()
s.where(s>1)
- Passing a ``Series`` directly to a cython function expecting an ``ndarray`` type will no
longer work directly; you must pass ``Series.values`` instead. See :ref:`Enhancing Performance<enhancingperf.ndarray>`
- ``Series(0.5)`` would previously return the scalar ``0.5``, instead this will return a 1-element ``Series``
- This change breaks ``rpy2<=2.3.8``. an Issue has been opened against rpy2 and a workaround
is detailed in :issue:`5698`. Thanks @JanSchulz.
- Pickle compatibility is preserved for pickles created prior to 0.13. These must be unpickled with ``pd.read_pickle``, see :ref:`Pickling<io.pickle>`.
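A minimal sketch of a few of these incompatibilities, runnable against 0.13+:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# numpy functions such as ones_like now return a plain ndarray, not a Series
assert isinstance(np.ones_like(s), np.ndarray)

# ndarray-expecting (e.g. cython) code should be handed the underlying array
assert isinstance(s.values, np.ndarray)

# a scalar argument now produces a 1-element Series rather than the scalar
scalar_series = pd.Series(0.5)
assert isinstance(scalar_series, pd.Series)
assert len(scalar_series) == 1
```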
- Refactor of series.py/frame.py/panel.py to move common code to generic.py
- added ``_setup_axes`` to create generic NDFrame structures
- moved methods
- ``from_axes,_wrap_array,axes,ix,loc,iloc,shape,empty,swapaxes,transpose,pop``
- ``__iter__,keys,__contains__,__len__,__neg__,__invert__``
- ``convert_objects,as_blocks,as_matrix,values``
- ``__getstate__,__setstate__`` (compat remains in frame/panel)
- ``__getattr__,__setattr__``
- ``_indexed_same,reindex_like,align,where,mask``
- ``fillna,replace`` (``Series`` replace is now consistent with ``DataFrame``)
- ``filter`` (also added axis argument to selectively filter on a different axis)
- ``reindex,reindex_axis,take``
- ``truncate`` (moved to become part of ``NDFrame``)
- These are API changes which make ``Panel`` more consistent with ``DataFrame``
- ``swapaxes`` on a ``Panel`` with the same axes specified now returns a copy
- support attribute access for setting
- filter supports the same API as the original ``DataFrame`` filter
- Reindex called with no arguments will now return a copy of the input object
- ``TimeSeries`` is now an alias for ``Series``. The property ``is_time_series``
can be used to distinguish (if desired)
- Refactor of Sparse objects to use BlockManager
- Created a new block type in internals, ``SparseBlock``, which can hold multi-dtypes
and is non-consolidatable. ``SparseSeries`` and ``SparseDataFrame`` now inherit
more methods from their hierarchy (``Series``/``DataFrame``), and no longer inherit
from ``SparseArray`` (which is instead the object held by the ``SparseBlock``)
- Sparse suite now supports integration with non-sparse data. Non-float sparse
data is supportable (partially implemented)
- Operations on sparse structures within DataFrames should preserve sparseness,
merging type operations will convert to dense (and back to sparse), so might
be somewhat inefficient
- enable setitem on ``SparseSeries`` for boolean/integer/slices
- ``SparsePanels`` implementation is unchanged (e.g. not using BlockManager, needs work)
- added ``ftypes`` method to Series/DataFrame, similar to ``dtypes``, but indicates
if the underlying is sparse/dense (as well as the dtype)
- All ``NDFrame`` objects can now use ``__finalize__()`` to specify various
values to propagate to new objects from an existing one (e.g. ``name`` in ``Series`` will
follow more automatically now)
- Internal type checking is now done via a suite of generated classes, allowing ``isinstance(value, klass)``
without having to directly import the klass, courtesy of @jtratner
- Bug in Series update where the parent frame is not updating its cache based on
changes (:issue:`4080`) or types (:issue:`3217`), fillna (:issue:`3386`)
- Indexing with dtype conversions fixed (:issue:`4463`, :issue:`4204`)
- Refactor ``Series.reindex`` to core/generic.py (:issue:`4604`, :issue:`4618`), allow ``method=`` in reindexing
on a Series to work
- ``Series.copy`` no longer accepts the ``order`` parameter and is now consistent with ``NDFrame`` copy
- Refactor ``rename`` methods to core/generic.py; fixes ``Series.rename`` for (:issue:`4605`), and adds ``rename``
with the same signature for ``Panel``
- Refactor ``clip`` methods to core/generic.py (:issue:`4798`)
- Refactor of ``_get_numeric_data/_get_bool_data`` to core/generic.py, allowing Series/Panel functionality
- ``Series`` (for index) / ``Panel`` (for items) now allow attribute access to its elements (:issue:`1903`)
.. ipython:: python

   s = Series([1,2,3],index=list('abc'))
   s.b
   s.a = 5
   s
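The ``__finalize__`` propagation mentioned above can be observed with ``Series.name``, which now survives operations that return a new object (a small sketch; the name ``temps`` is an arbitrary example):

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=list("abc"), name="temps")

# operations that return a new object call __finalize__, which copies
# metadata such as ``name`` from the source onto the result
sliced = s[1:]
doubled = s * 2

assert sliced.name == "temps"
assert doubled.name == "temps"
```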
Bug Fixes
~~~~~~~~~
See :ref:`V0.13.0 Bug Fixes<release.bug_fixes-0.13.0>` for an extensive list of bugs that have been fixed in 0.13.0.
See the :ref:`full release notes
<release>` or issue tracker
on GitHub for a complete list of all API changes, Enhancements and Bug Fixes.