.. _whatsnew_0101:

v0.10.1 (January ??, 2013)
---------------------------

This is a minor release from 0.10.0 and includes many new features and
enhancements along with a large number of bug fixes. There are also a number of
important API changes that long-time pandas users should pay close attention
to.

API changes
~~~~~~~~~~~

New features
~~~~~~~~~~~~

HDFStore
~~~~~~~~

You may need to upgrade your existing data files. Please visit the **compatibility** section in the main docs.

.. ipython:: python
   :suppress:
   :okexcept:

   import os
   os.remove('store.h5')

You can designate (and index) the columns of a table that you want to be able to query by passing a list to ``data_columns``.

.. ipython:: python

   store = HDFStore('store.h5')
   df = DataFrame(randn(8, 3), index=date_range('1/1/2000', periods=8),
                  columns=['A', 'B', 'C'])
   df['string'] = 'foo'
   df.ix[4:6,'string'] = np.nan
   df.ix[7:9,'string'] = 'bar'
   df['string2'] = 'cool'
   df

   # on-disk operations
   store.append('df', df, data_columns = ['B','C','string','string2'])
   store.select('df',[ 'B > 0', 'string == foo' ])

   # this is the in-memory version of this type of selection
   df[(df.B > 0) & (df.string == 'foo')]

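The snippet above relies on names such as ``HDFStore``, ``DataFrame`` and ``randn`` already being imported. As a self-contained sketch of the same idea, assuming a recent pandas with PyTables installed (the file name, the column values and the string form of the ``where`` clause are illustrative assumptions, not part of this release), the on-disk query and the in-memory selection can be compared directly:

```python
import os
import tempfile

import pandas as pd

# Small illustrative frame: one numeric and one string column.
df = pd.DataFrame(
    {"B": [1.0, -1.0, 2.0, -2.0], "string": ["foo", "foo", "bar", "foo"]},
    index=pd.date_range("2000-01-01", periods=4),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "store.h5")  # hypothetical file name
    with pd.HDFStore(path) as store:
        # data_columns makes B and string queryable on disk.
        store.append("df", df, data_columns=["B", "string"])
        on_disk = store.select("df", where="(B > 0) & (string == 'foo')")

# The equivalent selection on the in-memory frame.
in_memory = df[(df.B > 0) & (df.string == "foo")]
print(on_disk)
print(in_memory)
```

Only columns listed in ``data_columns`` (plus the index) can appear in the ``where`` clause, so columns that are never queried can be left out of the list.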
Retrieving unique values in an indexable or data column.

.. ipython:: python

   store.unique('df','index')
   store.unique('df','string')

You can now store ``datetime64`` in data columns.

.. ipython:: python

   df_mixed = df.copy()
   df_mixed['datetime64'] = Timestamp('20010102')
   df_mixed.ix[3:4,['A','B']] = np.nan

   store.append('df_mixed', df_mixed)
   df_mixed1 = store.select('df_mixed')
   df_mixed1
   df_mixed1.get_dtype_counts()

You can pass the ``columns`` keyword to ``select`` to restrict the columns that are returned. This is equivalent to passing ``Term('columns', list_of_columns_to_filter)``.

.. ipython:: python

   store.select('df',columns = ['A','B'])

``HDFStore`` now serializes multi-index dataframes when appending tables.

.. ipython:: python

   index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                              ['one', 'two', 'three']],
                      labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                              [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                      names=['foo', 'bar'])
   df = DataFrame(np.random.randn(10, 3), index=index,
                  columns=['A', 'B', 'C'])
   df

   store.append('mi',df)
   store.select('mi')

   # the levels are automatically included as data columns
   store.select('mi', Term('foo=bar'))

Multi-table creation via ``append_to_multiple`` and selection via ``select_as_multiple`` let you create and select from multiple tables and return a combined result, using ``where`` on a selector table.

.. ipython:: python

   df_mt = DataFrame(randn(8, 6), index=date_range('1/1/2000', periods=8),
                     columns=['A', 'B', 'C', 'D', 'E', 'F'])
   df_mt['foo'] = 'bar'

   # you can also create the tables individually
   store.append_to_multiple({ 'df1_mt' : ['A','B'], 'df2_mt' : None }, df_mt, selector = 'df1_mt')
   store

   # individual tables were created
   store.select('df1_mt')
   store.select('df2_mt')

   # as a multiple
   store.select_as_multiple(['df1_mt','df2_mt'], where = [ 'A>0','B>0' ], selector = 'df1_mt')

.. ipython:: python
   :suppress:

   store.close()
   import os
   os.remove('store.h5')

**Enhancements**

- ``HDFStore`` can now read native PyTables table format tables.
- You can pass ``nan_rep = 'my_nan_rep'`` to ``append`` to change the default NaN representation on disk (which converts to/from ``np.nan``); this defaults to ``nan``.
- You can pass ``index`` to ``append``. This defaults to ``True`` and automatically creates indices on the *indexables* and *data columns* of the table.
- You can pass ``chunksize=an integer`` to ``append`` to change the writing chunksize (default is 50000). This will significantly lower your memory usage on writing.
- You can pass ``expectedrows=an integer`` to the first ``append`` to set the TOTAL number of rows that ``PyTables`` will expect. This will optimize read/write performance.
- ``select`` now supports passing ``start`` and ``stop`` to limit the range of rows searched in a selection.

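The options listed above can be combined in a single call. The following is a minimal sketch, assuming a recent pandas with PyTables installed; the frame contents, the file name and the specific values chosen for ``chunksize`` and ``expectedrows`` are illustrative assumptions only:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Ten rows, with one missing value in the string column.
df = pd.DataFrame(
    {"A": np.arange(10, dtype="float64"), "s": ["x"] * 10},
    index=pd.date_range("2000-01-01", periods=10),
)
df.loc[df.index[2], "s"] = np.nan

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "options.h5")  # hypothetical file name
    with pd.HDFStore(path) as store:
        store.append(
            "df",
            df,
            nan_rep="my_nan_rep",  # on-disk marker for missing strings
            index=True,            # create indices (this is the default)
            chunksize=4,           # write four rows at a time
            expectedrows=10,       # size hint passed on the first append
        )
        # start/stop restrict the selection to rows 2, 3 and 4.
        window = store.select("df", start=2, stop=5)

print(window)
```

The missing value written with ``nan_rep`` comes back as ``NaN`` when the rows are read, and ``start``/``stop`` behave like a half-open row range.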
**Bug Fixes**

- ``HDFStore`` tables can now store ``float32`` types correctly (they cannot be mixed with ``float64``, however).

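As a quick check of the ``float32`` behaviour, the sketch below round-trips a single ``float32`` column through a table. It assumes a recent pandas with PyTables installed, and the frame and file name are hypothetical:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# A frame holding only float32 data.
df32 = pd.DataFrame({"x": np.array([1.5, 2.5, 3.5], dtype="float32")})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "float32.h5")  # hypothetical file name
    with pd.HDFStore(path) as store:
        store.append("df32", df32)
        roundtrip = store.select("df32")

# The dtype read back from disk should still be float32.
print(roundtrip.dtypes)
```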
See the `full release notes
<https://github.com/pydata/pandas/blob/master/RELEASE.rst>`__ or issue tracker
on GitHub for a complete list.