Commit 0e9b908

Merge commit 'v0.7.0rc1-183-gcc2a8a7' into debian
* commit 'v0.7.0rc1-183-gcc2a8a7': (86 commits)
  DOC: release notes
  ENH: can pass file handle or StringIO to Series.to_csv and DataFrame.to_csv, GH pandas-dev#765
  BUG: fix subtle bug in maybe_convert_objects causing indexes to be mutated, test coverage, fix pandas-dev#766
  TST: merge test coverage and trim floating point zeros even if there are NAs
  TST: skip excel tests if libraries not installed
  TST: more merge test coverage, refactoring
  ENH: more intelligent inference about index_col for Excel files, test coverage for PR pandas-dev#735
  BUG: fix docstring
  TST: test coverage
  Fix up docstrings
  Combine xlsx tests with xls to avoid code duplication
  Add some additional excel reading/writing tests
  Document sheet_name arg in docstring, give it a default value
  Add to_excel in Panel and corresponding test
  Test to_excel with MultiIndex
  Document writing multiple DataFrames to different sheets
  Add support for reading/writing .xlsx using openpyxl
  Special case np.int64 in ExcelWriter and add unittest
  Missed one variable rename
  Add unittest for to_excel
  ...
2 parents 8e43312 + cc2a8a7 commit 0e9b908

68 files changed: +4460 -1534 lines changed

RELEASE.rst

+14
Original file line number | Diff line number | Diff line change
@@ -85,6 +85,12 @@ pandas 0.7.0
8585
- Add new ``value_range`` function to return min/max of a dataframe (GH #288)
8686
- Add ``drop`` parameter to ``reset_index`` method of ``DataFrame`` and added
8787
method to ``Series`` as well (GH #699)
88+
- Add ``isin`` method to Index objects, works just like ``Series.isin`` (GH
89+
#657)
90+
- Implement array interface on Panel so that ufuncs work (re: #740)
91+
- Add ``sort`` option to ``DataFrame.join`` (GH #731)
92+
- Improved handling of NAs (propagation) in binary operations with
93+
dtype=object arrays (GH #737)
8894

8995
**API Changes**
9096

@@ -169,6 +175,9 @@ pandas 0.7.0
169175
- Add option to Series.to_csv to omit the index (PR #684)
170176
- Add ``delimiter`` as an alternative to ``sep`` in ``read_csv`` and other
171177
parsing functions
178+
- Substantially improved performance of groupby on DataFrames with many
179+
columns by aggregating blocks of columns all at once (GH #745)
180+
- Can pass a file handle or StringIO to Series/DataFrame.to_csv (GH #765)
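The last entry above (GH #765) describes a "path or open handle" argument. A minimal sketch of that pattern follows, using only the standard library. It is not the pandas implementation: `write_rows` and `path_or_buf` are hypothetical names chosen for illustration, and the check for a `write` attribute is an assumption about how such a function might tell the two cases apart.

```python
import csv
import io


def write_rows(rows, path_or_buf):
    """Write rows as CSV to a filesystem path or to an open file-like object.

    Hypothetical helper for illustration only; not part of pandas.
    """
    if hasattr(path_or_buf, "write"):
        # File-like object: write to it and leave it open for the caller.
        csv.writer(path_or_buf, lineterminator="\n").writerows(rows)
    else:
        # Anything else is treated as a path that this function opens and closes.
        with open(path_or_buf, "w", newline="") as handle:
            csv.writer(handle, lineterminator="\n").writerows(rows)


# Usage: write into an in-memory buffer instead of a file on disk.
buf = io.StringIO()
write_rows([["a", "b"], [1, 2]], buf)
text = buf.getvalue()
```

Leaving a caller-supplied handle open lets several writes share one stream, which is the usual reason for accepting a buffer in the first place.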
172181

173182
**Bug fixes**
174183

@@ -249,6 +258,11 @@ pandas 0.7.0
249258
- Raise Exception in DateRange when offset with n=0 is passed (GH #683)
250259
- Fix get/set inconsistency with .ix property and integer location but
251260
non-integer index (GH #707)
261+
- Use right dropna function for SparseSeries. Return dense Series for NA fill
262+
value (GH #730)
263+
- Fix Index.format bug causing incorrectly string-formatted Series with
264+
datetime indexes (GH #758)
265+
- Fix errors caused by object dtype arrays passed to ols (GH #759)
252266

253267
Thanks
254268
------

doc/make.py

+14-1
@@ -26,9 +26,20 @@
2626
SPHINX_BUILD = 'sphinxbuild'
2727

2828
def sf():
29-
'push a copy to the sf site'
29+
'push a copy to the sf'
3030
os.system('cd build/html; rsync -avz . wesmckinn,[email protected]'
3131
':/home/groups/p/pa/pandas/htdocs/ -essh --cvs-exclude')
32+
33+
def upload_dev():
34+
'push a copy to the pydata dev directory'
35+
os.system('cd build/html; rsync -avz . [email protected]'
36+
':/usr/share/nginx/pandas/pandas-docs/dev/ -essh')
37+
38+
def upload_stable():
39+
'push a copy to the pydata stable directory'
40+
os.system('cd build/html; rsync -avz . [email protected]'
41+
':/usr/share/nginx/pandas/pandas-docs/stable/ -essh')
42+
3243
def sfpdf():
3344
'push a copy to the sf site'
3445
os.system('cd build/latex; scp pandas.pdf wesmckinn,[email protected]'
@@ -83,6 +94,8 @@ def all():
8394

8495
funcd = {
8596
'html' : html,
97+
'upload_dev' : upload_dev,
98+
'upload_stable' : upload_stable,
8699
'latex' : latex,
87100
'clean' : clean,
88101
'sf' : sf,

doc/source/index.rst

+1-1
@@ -125,4 +125,4 @@ See the package overview for more detail about what's in the library.
125125
related
126126
comparison_with_r
127127
api
128-
vbench
128+

doc/source/indexing.rst

+10-8
@@ -654,7 +654,7 @@ instance:
654654

655655
.. ipython:: python
656656
657-
midx = MultiIndex(levels=[['one', 'two'], ['x','y']],
657+
midx = MultiIndex(levels=[['zero', 'one'], ['x','y']],
658658
labels=[[1,1,0,0],[1,0,1,0]])
659659
df = DataFrame(randn(4,2), index=midx)
660660
print df
@@ -670,13 +670,15 @@ The need for sortedness
670670
~~~~~~~~~~~~~~~~~~~~~~~
671671

672672
**Caveat emptor**: the present implementation of ``MultiIndex`` requires that
673-
the labels be lexicographically sorted into groups for some of the slicing /
674-
indexing routines to work correctly. You can think about this as meaning that
675-
the axis is broken up into a tree structure, where every leaf in a particular
676-
branch shares the same labels at that level of the hierarchy. However, the
677-
``MultiIndex`` does not enforce this: **you are responsible for ensuring that
678-
things are properly sorted**. There is an important new method ``sortlevel``
679-
which will lexicographically sort an axis with a ``MultiIndex``:
673+
the labels be sorted for some of the slicing / indexing routines to work
674+
correctly. You can think about breaking the axis into unique groups, where at
675+
the hierarchical level of interest, each distinct group shares a label, but no
676+
two have the same label. However, the ``MultiIndex`` does not enforce this:
677+
**you are responsible for ensuring that things are properly sorted**. There is
678+
an important new method ``sortlevel`` to sort an axis within a ``MultiIndex``
679+
so that its labels are grouped and sorted by the original ordering of the
680+
associated factor at that level. Note that this does not necessarily mean the
681+
labels will be sorted lexicographically!
680682

681683
.. ipython:: python
682684
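The revised paragraph in this hunk says that sorting follows the original ordering of the level's factor, which need not be lexicographic. A small pure-Python sketch of that distinction follows, reusing the `levels` and integer `labels` from the ``midx`` example earlier in this file. It does not call pandas and is not how ``sortlevel`` is implemented; it only contrasts the two orderings.

```python
# First level of the example index: the factor, in its original order.
levels = ["zero", "one"]
# Integer positions into `levels` (the `labels` argument in the example).
codes = [1, 1, 0, 0]

# Ordering by the integer codes follows the order of `levels` itself.
by_level_order = [levels[c] for c in sorted(codes)]

# Ordering the decoded strings compares them alphabetically instead.
lexicographic = sorted(levels[c] for c in codes)

print(by_level_order)  # ['zero', 'zero', 'one', 'one']
print(lexicographic)   # ['one', 'one', 'zero', 'zero']
```

Because "one" sorts before "zero" alphabetically but comes second in `levels`, the two results differ, which is the case the note about lexicographic order is warning about.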

pandas/__init__.py

+2-2
@@ -23,8 +23,8 @@
2323
from pandas.sparse.api import *
2424
from pandas.stats.api import *
2525

26-
from pandas.core.common import set_printoptions, reset_printoptions
27-
from pandas.core.common import set_eng_float_format
26+
from pandas.core.format import (set_printoptions, reset_printoptions,
27+
set_eng_float_format)
2828
from pandas.io.parsers import read_csv, read_table, read_clipboard, ExcelFile
2929
from pandas.io.pytables import HDFStore
3030
from pandas.util.testing import debug

pandas/core/algorithms.py

+111
@@ -0,0 +1,111 @@
1+
"""
2+
Generic data algorithms. This module is experimental at the moment and not
3+
intended for public consumption
4+
"""
5+
6+
import numpy as np
7+
8+
from pandas.core.series import Series
9+
import pandas.core.common as com
10+
import pandas._tseries as lib
11+
12+
def match(values, index):
13+
"""
14+
15+
16+
Parameters
17+
----------
18+
19+
Returns
20+
-------
21+
match : ndarray
22+
"""
23+
if com.is_float_dtype(index):
24+
return _match_generic(values, index, lib.Float64HashTable,
25+
com._ensure_float64)
26+
elif com.is_integer_dtype(index):
27+
return _match_generic(values, index, lib.Int64HashTable,
28+
com._ensure_int64)
29+
else:
30+
return _match_generic(values, index, lib.PyObjectHashTable,
31+
com._ensure_object)
32+
33+
def _get_hash_table_and_cast(values):
34+
if com.is_float_dtype(values):
35+
klass = lib.Float64HashTable
36+
values = com._ensure_float64(values)
37+
elif com.is_integer_dtype(values):
38+
klass = lib.Int64HashTable
39+
values = com._ensure_int64(values)
40+
else:
41+
klass = lib.PyObjectHashTable
42+
values = com._ensure_object(values)
43+
return klass, values
44+
45+
def count(values, uniques=None):
46+
if uniques is not None:
47+
raise NotImplementedError
48+
else:
49+
if com.is_float_dtype(values):
50+
return _count_generic(values, lib.Float64HashTable,
51+
com._ensure_float64)
52+
elif com.is_integer_dtype(values):
53+
return _count_generic(values, lib.Int64HashTable,
54+
com._ensure_int64)
55+
else:
56+
return _count_generic(values, lib.PyObjectHashTable,
57+
com._ensure_object)
58+
59+
def _count_generic(values, table_type, type_caster):
60+
values = type_caster(values)
61+
table = table_type(len(values))
62+
uniques, labels, counts = table.factorize(values)
63+
64+
return Series(counts, index=uniques)
65+
66+
def _match_generic(values, index, table_type, type_caster):
67+
values = type_caster(values)
68+
index = type_caster(index)
69+
table = table_type(len(index))
70+
table.map_locations(index)
71+
return table.lookup(values)
72+
73+
def factorize(values, sort=False, order=None, na_sentinel=-1):
74+
"""
75+
Encode input values as an enumerated type or categorical variable
76+
77+
Parameters
78+
----------
79+
values : sequence
80+
sort :
81+
order :
82+
83+
Returns
84+
-------
85+
"""
86+
hash_klass, values = _get_hash_table_and_cast(values)
87+
88+
uniques = []
89+
table = hash_klass(len(values))
90+
labels, counts = table.get_labels(values, uniques, 0, na_sentinel)
91+
92+
uniques = com._asarray_tuplesafe(uniques)
93+
if sort and len(counts) > 0:
94+
sorter = uniques.argsort()
95+
reverse_indexer = np.empty(len(sorter), dtype=np.int32)
96+
reverse_indexer.put(sorter, np.arange(len(sorter)))
97+
98+
mask = labels < 0
99+
labels = reverse_indexer.take(labels)
100+
np.putmask(labels, mask, -1)
101+
102+
uniques = uniques.take(sorter)
103+
counts = counts.take(sorter)
104+
105+
return labels, uniques, counts
106+
107+
def unique(values):
108+
"""
109+
110+
"""
111+
pass
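The `sort` branch of `factorize` above relabels the integer codes through a reverse indexer after sorting the uniques. The sketch below reproduces that relabeling step in pure Python so it can be followed without the Cython hash tables. It is an approximation under stated assumptions, not the code in this commit: `factorize_sketch` is a hypothetical name, a dict stands in for the hash table classes, `None` stands in for a missing value, and `na_sentinel` is assumed to be negative.

```python
def factorize_sketch(values, sort=False, na_sentinel=-1):
    """Encode values as integer labels plus uniques and counts.

    Hypothetical pure-Python analogue; assumes na_sentinel is negative.
    """
    positions = {}  # value -> code, assigned in order of first appearance
    uniques, counts, labels = [], [], []
    for value in values:
        if value is None:  # assumption: None marks a missing value
            labels.append(na_sentinel)
            continue
        code = positions.get(value)
        if code is None:
            code = positions[value] = len(uniques)
            uniques.append(value)
            counts.append(0)
        counts[code] += 1
        labels.append(code)

    if sort and uniques:
        # sorter[i] is the old code that lands at sorted position i.
        sorter = sorted(range(len(uniques)), key=uniques.__getitem__)
        # reverse_indexer[old_code] is the new code after sorting.
        reverse_indexer = [0] * len(sorter)
        for new_code, old_code in enumerate(sorter):
            reverse_indexer[old_code] = new_code
        # Relabel, keeping the sentinel for missing values untouched.
        labels = [c if c < 0 else reverse_indexer[c] for c in labels]
        uniques = [uniques[i] for i in sorter]
        counts = [counts[i] for i in sorter]

    return labels, uniques, counts


labels, uniques, counts = factorize_sketch(["b", "a", None, "b", "c"], sort=True)
print(labels)   # [1, 0, -1, 1, 2]
print(uniques)  # ['a', 'b', 'c']
print(counts)   # [1, 2, 1]
```

The reverse indexer is what keeps `uniques[labels[i]]` equal to the original value at position `i` after the uniques have been reordered.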

pandas/core/api.py

+2-1
@@ -5,7 +5,8 @@
55
from pandas.core.datetools import DateOffset
66
import pandas.core.datetools as datetools
77

8-
from pandas.core.common import isnull, notnull, set_printoptions, save, load
8+
from pandas.core.common import isnull, notnull, save, load
9+
from pandas.core.format import set_printoptions
910
from pandas.core.index import Index, Int64Index, Factor, MultiIndex
1011
from pandas.core.daterange import DateRange
1112
from pandas.core.series import Series, TimeSeries
