Commit cbe91d7

2 parents ca26e88 + a1cd25f commit cbe91d7

19 files changed

+474
-103
lines changed

RELEASE.rst

+4
Original file line number | Diff line number | Diff line change
@@ -36,6 +36,7 @@ pandas 0.9.0
3636
Finance (#1748, #1739)
3737
- Recognize and convert more boolean values in file parsing (Yes, No, TRUE,
3838
FALSE, variants thereof) (#1691, #1295)
39+
- Add Panel.update method, analogous to DataFrame.update (#1999, #1988)
3940

4041
**Improvements to existing features**
4142

@@ -63,6 +64,8 @@ pandas 0.9.0
6364

6465
**API Changes**
6566

67+
- Change default header names in read_* functions to more Pythonic X0, X1,
68+
etc. instead of X.1, X.2. (#2000)
6669
- Deprecated ``day_of_year`` API removed from PeriodIndex, use ``dayofyear``
6770
(#1723)
6871
- Don't modify NumPy suppress printoption at import time
@@ -243,6 +246,7 @@ pandas 0.9.0
243246
- Fix reset_index bug if both drop and level are specified (#1957)
244247
- Work around unsafe NumPy object->int casting with Cython function (#1987)
245248
- Fix datetime64 formatting bug in DataFrame.to_csv (#1993)
249+
- Default start date in pandas.io.data to 1/1/2000 as the docs say (#2011)
246250

247251

248252
pandas 0.8.1

doc/source/computation.rst

+2-2
@@ -397,8 +397,8 @@ available:
397397
:widths: 20, 80
398398

399399
``ewma``, EW moving average
400-
``ewvar``, EW moving variance
401-
``ewstd``, EW moving standard deviation
400+
``ewmvar``, EW moving variance
401+
``ewmstd``, EW moving standard deviation
402402
``ewmcorr``, EW moving correlation
403403
``ewmcov``, EW moving covariance
404404

doc/source/v0.9.0.txt

+58-27
@@ -1,7 +1,7 @@
11
.. _whatsnew_0900:
22

3-
v0.9.0 (September 25, 2012)
4-
---------------------------
3+
v0.9.0 (October 2, 2012)
4+
------------------------
55

66
This is a major release from 0.8.1 and includes several new features and
77
enhancements along with a large number of bug fixes. New features include
@@ -30,31 +30,62 @@ New features
3030
API changes
3131
~~~~~~~~~~~
3232

33-
- Creating a Series from another Series, passing an index, will cause
34-
reindexing to happen inside rather than treating the Series like an
35-
ndarray. Technically improper usages like Series(df[col1], index=df[col2])
36-
that worked before "by accident" (this was never intended) will lead to all
37-
NA Series in some cases.
38-
- Deprecated ``day_of_year`` API removed from PeriodIndex, use ``dayofyear``
39-
(GH1723_)
40-
- Don't modify NumPy suppress printoption to True at import time
41-
- The internal HDF5 data arrangement for DataFrames has been transposed.
42-
Legacy files will still be readable by HDFStore (GH1834_, GH1824_)
43-
- Legacy cruft removed: pandas.stats.misc.quantileTS
44-
- Use ISO8601 format for Period repr: monthly, daily, and on down (GH1776_)
45-
- Empty DataFrame columns are now created as object dtype. This will prevent
46-
a class of TypeErrors that was occurring in code where the dtype of a
47-
column would depend on the presence of data or not (e.g. a SQL query having
48-
results) (GH1783_)
49-
- Setting parts of DataFrame/Panel using ix now aligns input Series/DataFrame
50-
(GH1630_)
51-
- ``first`` and ``last`` methods in ``GroupBy`` no longer drop non-numeric
52-
columns (GH1809_)
53-
- Resolved inconsistencies in specifying custom NA values in text parser.
54-
``na_values`` of type dict no longer override default NAs unless
55-
``keep_default_na`` is set to false explicitly (GH1657_)
56-
- ``DataFrame.dot`` will not do data alignment, and also work with Series
57-
(GH1915_)
33+
- The default column names used when ``header=None`` and no column names are passed to
34+
functions like ``read_csv`` have changed to be more Pythonic and amenable to
35+
attribute access:
36+
37+
.. ipython:: python
38+
39+
from StringIO import StringIO
40+
41+
data = '0,0,1\n1,1,0\n0,1,0'
42+
df = read_csv(StringIO(data), header=None)
43+
df
44+
45+
46+
- Creating a Series from another Series, passing an index, will cause reindexing
47+
to happen inside rather than treating the Series like an ndarray. Technically
48+
improper usages like ``Series(df[col1], index=df[col2])`` that worked before
49+
"by accident" (this was never intended) will lead to all NA Series in some
50+
cases. To be perfectly clear:
51+
52+
.. ipython:: python
53+
54+
s1 = Series([1, 2, 3])
55+
s1
56+
57+
s2 = Series(s1, index=['foo', 'bar', 'baz'])
58+
s2
59+
60+
- Deprecated ``day_of_year`` API removed from PeriodIndex, use ``dayofyear``
61+
(GH1723_)
62+
63+
- Don't modify NumPy suppress printoption to True at import time
64+
65+
- The internal HDF5 data arrangement for DataFrames has been transposed. Legacy
66+
files will still be readable by HDFStore (GH1834_, GH1824_)
67+
68+
- Legacy cruft removed: pandas.stats.misc.quantileTS
69+
70+
- Use ISO8601 format for Period repr: monthly, daily, and on down (GH1776_)
71+
72+
- Empty DataFrame columns are now created as object dtype. This will prevent a
73+
class of TypeErrors that was occurring in code where the dtype of a column
74+
would depend on the presence of data or not (e.g. a SQL query having results)
75+
(GH1783_)
76+
77+
- Setting parts of DataFrame/Panel using ix now aligns input Series/DataFrame
78+
(GH1630_)
79+
80+
- ``first`` and ``last`` methods in ``GroupBy`` no longer drop non-numeric
81+
columns (GH1809_)
82+
83+
- Resolved inconsistencies in specifying custom NA values in text parser.
84+
``na_values`` of type dict no longer override default NAs unless
85+
``keep_default_na`` is set to false explicitly (GH1657_)
86+
87+
- ``DataFrame.dot`` will not do data alignment, and also work with Series
88+
(GH1915_)
5889

5990

6091
See the `full release notes

pandas/core/frame.py

+33-2
@@ -2461,7 +2461,8 @@ def set_index(self, keys, drop=True, append=False, inplace=False,
24612461
frame.index = index
24622462
return frame
24632463

2464-
def reset_index(self, level=None, drop=False, inplace=False):
2464+
def reset_index(self, level=None, drop=False, inplace=False, col_level=0,
2465+
col_fill=''):
24652466
"""
24662467
For DataFrame with multi-level index, return new DataFrame with
24672468
labeling information in the columns under the index names, defaulting
@@ -2479,6 +2480,13 @@ def reset_index(self, level=None, drop=False, inplace=False):
24792480
the index to the default integer index.
24802481
inplace : boolean, default False
24812482
Modify the DataFrame in place (do not create a new object)
2483+
col_level : int or str, default 0
2484+
If the columns have multiple levels, determines which level the
2485+
labels are inserted into. By default it is inserted into the first
2486+
level.
2487+
col_fill : object, default ''
2488+
If the columns have multiple levels, determines how the other levels
2489+
are named. If None then the index name is repeated.
24822490
24832491
Returns
24842492
-------
@@ -2507,11 +2515,22 @@ def _maybe_cast(values):
25072515
names = self.index.names
25082516
zipped = zip(self.index.levels, self.index.labels)
25092517

2518+
multi_col = isinstance(self.columns, MultiIndex)
25102519
for i, (lev, lab) in reversed(list(enumerate(zipped))):
25112520
col_name = names[i]
25122521
if col_name is None:
25132522
col_name = 'level_%d' % i
25142523

2524+
if multi_col:
2525+
if col_fill is None:
2526+
col_name = tuple([col_name] *
2527+
self.columns.nlevels)
2528+
else:
2529+
name_lst = [col_fill] * self.columns.nlevels
2530+
lev_num = self.columns._get_level_number(col_level)
2531+
name_lst[lev_num] = col_name
2532+
col_name = tuple(name_lst)
2533+
25152534
# to ndarray and maybe infer different dtype
25162535
level_values = _maybe_cast(lev.values)
25172536
if level is None or i in level:
@@ -2521,6 +2540,14 @@ def _maybe_cast(values):
25212540
name = self.index.name
25222541
if name is None or name == 'index':
25232542
name = 'index' if 'index' not in self else 'level_0'
2543+
if isinstance(self.columns, MultiIndex):
2544+
if col_fill is None:
2545+
name = tuple([name] * self.columns.nlevels)
2546+
else:
2547+
name_lst = [col_fill] * self.columns.nlevels
2548+
lev_num = self.columns._get_level_number(col_level)
2549+
name_lst[lev_num] = name
2550+
name = tuple(name_lst)
25242551
new_obj.insert(0, name, _maybe_cast(self.index.values))
25252552

25262553
new_obj.index = new_index
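
Reviewer note: the two added branches in ``reset_index`` build the same kind of tuple label when the columns are a MultiIndex. A minimal standalone sketch of that labeling rule follows. The helper name ``index_column_label`` and its ``nlevels`` argument are hypothetical stand-ins for ``self.columns.nlevels``, and ``col_level`` is assumed to already be an integer position (the diff resolves it through ``_get_level_number``).

```python
def index_column_label(name, nlevels, col_level=0, col_fill=''):
    """Build the column label for an index level moved into the columns.

    Hypothetical helper: `nlevels` stands in for `self.columns.nlevels`,
    and `col_level` is assumed to be an integer position already.
    """
    if nlevels == 1:
        # Single-level columns: the plain name is used as-is.
        return name
    if col_fill is None:
        # Repeat the index name on every column level.
        return tuple([name] * nlevels)
    # Fill every level with col_fill, then place the name at col_level.
    labels = [col_fill] * nlevels
    labels[col_level] = name
    return tuple(labels)


print(index_column_label('date', 2))                  # ('date', '')
print(index_column_label('date', 2, col_level=1))     # ('', 'date')
print(index_column_label('date', 3, col_fill=None))   # ('date', 'date', 'date')
```

The defaults (``col_level=0``, ``col_fill=''``) put the index name on the first column level and leave the remaining levels blank.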
@@ -3368,7 +3395,7 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
33683395
33693396
Parameters
33703397
----------
3371-
other : DataFrame
3398+
other : DataFrame, or object coercible into a DataFrame
33723399
join : {'left', 'right', 'outer', 'inner'}, default 'left'
33733400
overwrite : boolean, default True
33743401
If True then overwrite values for common keys in the calling frame
@@ -3382,7 +3409,11 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
33823409
if join != 'left':
33833410
raise NotImplementedError
33843411

3412+
if not isinstance(other, DataFrame):
3413+
other = DataFrame(other)
3414+
33853415
other = other.reindex_like(self)
3416+
33863417
for col in self.columns:
33873418
this = self[col].values
33883419
that = other[col].values

pandas/core/groupby.py

+1-1
@@ -303,7 +303,7 @@ def mean(self):
303303

304304
def median(self):
305305
"""
306-
Compute mean of groups, excluding missing values
306+
Compute median of groups, excluding missing values
307307
308308
For multiple groupings, the result index will be a MultiIndex
309309
"""

pandas/core/panel.py

+30
@@ -1318,6 +1318,36 @@ def join(self, other, how='left', lsuffix='', rsuffix=''):
13181318
return concat([self] + list(other), axis=0, join=how,
13191319
join_axes=join_axes, verify_integrity=True)
13201320

1321+
def update(self, other, join='left', overwrite=True, filter_func=None,
1322+
raise_conflict=False):
1323+
"""
1324+
Modify Panel in place using non-NA values from passed
1325+
Panel, or object coercible to Panel. Aligns on items
1326+
1327+
Parameters
1328+
----------
1329+
other : Panel, or object coercible to Panel
1330+
join : How to join individual DataFrames
1331+
{'left', 'right', 'outer', 'inner'}, default 'left'
1332+
overwrite : boolean, default True
1333+
If True then overwrite values for common keys in the calling panel
1334+
filter_func : callable(1d-array) -> 1d-array<boolean>, default None
1335+
Can choose to replace values other than NA. Return True for values
1336+
that should be updated
1337+
raise_conflict : bool
1338+
If True, will raise an error if a DataFrame and other both
1339+
contain data in the same place.
1340+
"""
1341+
1342+
if not isinstance(other, Panel):
1343+
other = Panel(other)
1344+
1345+
other = other.reindex(items=self.items)
1346+
1347+
for frame in self.items:
1348+
self[frame].update(other[frame], join, overwrite, filter_func,
1349+
raise_conflict)
1350+
13211351
def _get_join_index(self, other, how):
13221352
if how == 'left':
13231353
join_major, join_minor = self.major_axis, self.minor_axis
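
Reviewer note: ``Panel.update`` as added here delegates to ``DataFrame.update`` item by item after reindexing ``other`` on ``self.items``. The sketch below imitates only the default behavior described in the docstring (take the non-NA values from ``other``, modify in place), using plain dicts rather than pandas objects. Everything in it is an assumption made for illustration: ``update_frame`` and ``update_panel`` are hypothetical names, a "frame" is a dict of column lists, and ``None`` plays the role of NA. The ``join``, ``overwrite``, ``filter_func`` and ``raise_conflict`` options are not modeled.

```python
def update_frame(target, other):
    """Overwrite values in `target` with the non-missing values of `other`.

    Both arguments are hypothetical stand-ins for DataFrames: dicts mapping
    a column name to a list of values, with None playing the role of NA.
    """
    for col in list(target):
        values = target[col]
        # A column missing from `other` behaves like an all-NA column.
        incoming = other.get(col, [None] * len(values))
        target[col] = [new if new is not None else old
                       for old, new in zip(values, incoming)]


def update_panel(panel, other):
    """Update each item of `panel` in place from the matching item of `other`."""
    for item in panel:
        # Items missing from `other` behave like an all-NA frame.
        update_frame(panel[item], other.get(item, {}))


panel = {'a': {'x': [1, None, 3]}, 'b': {'x': [4, 5, 6]}}
other = {'a': {'x': [None, 20, 30]}}
update_panel(panel, other)
print(panel)  # {'a': {'x': [1, 20, 30]}, 'b': {'x': [4, 5, 6]}}
```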

pandas/io/data.py

+3-2
@@ -67,7 +67,7 @@ def _sanitize_dates(start, end):
6767
start = to_datetime(start)
6868
end = to_datetime(end)
6969
if start is None:
70-
start = dt.datetime.today() - dt.timedelta(365)
70+
start = dt.datetime(2010, 1, 1)
7171
if end is None:
7272
end = dt.datetime.today()
7373
return start, end
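
Reviewer note: the hunk above replaces the rolling "one year ago" default with a fixed start date. The value in the code is ``dt.datetime(2010, 1, 1)``, while the RELEASE.rst entry in this same commit says 1/1/2000, so one of the two may need a follow-up. A minimal sketch of the defaulting logic is below. The ``today`` parameter is a hypothetical addition that keeps the example deterministic, and the ``to_datetime`` conversion done by the original is omitted.

```python
import datetime as dt

DEFAULT_START = dt.datetime(2010, 1, 1)  # value taken from the hunk above


def sanitize_dates(start=None, end=None, today=None):
    """Fill in missing bounds: fixed default start, 'today' as default end.

    `today` is a hypothetical parameter added so the example is
    deterministic; the original calls dt.datetime.today() directly.
    """
    if today is None:
        today = dt.datetime.today()
    if start is None:
        start = DEFAULT_START
    if end is None:
        end = today
    return start, end


fixed_today = dt.datetime(2012, 10, 2)
print(sanitize_dates(today=fixed_today))
print(sanitize_dates(start=dt.datetime(2011, 6, 1), today=fixed_today))
```

With a fixed default, two calls made on different days request the same starting point, which the old ``today() - timedelta(365)`` default did not guarantee.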
@@ -178,7 +178,8 @@ def get_data_fred(name=None, start=dt.datetime(2010, 1, 1),
178178

179179
url = fred_URL + '%s' % name + \
180180
'/downloaddata/%s' % name + '.csv'
181-
data = read_csv(urllib.urlopen(url), index_col=0, parse_dates=True)
181+
data = read_csv(urllib.urlopen(url), index_col=0, parse_dates=True, header=None,
182+
skiprows=1, names=["DATE", name])
182183
return data.truncate(start, end)
183184

184185

pandas/io/parsers.py

+21-18
@@ -43,15 +43,17 @@ class DateConversionError(Exception):
4343
If None defaults to Excel dialect. Ignored if sep longer than 1 char
4444
See csv.Dialect documentation for more details
4545
header : int, default 0
46-
Row to use for the column labels of the parsed DataFrame
46+
Row to use for the column labels of the parsed DataFrame. Specify None if
47+
there is no header row.
4748
skiprows : list-like or integer
4849
Row numbers to skip (0-indexed) or number of rows to skip (int)
4950
at the start of the file
5051
index_col : int or sequence, default None
5152
Column to use as the row labels of the DataFrame. If a sequence is
5253
given, a MultiIndex is used.
5354
names : array-like
54-
List of column names
55+
List of column names to use. If passed, header will be implicitly set to
56+
None.
5557
na_values : list-like or dict, default None
5658
Additional strings to recognize as NA/NaN. If dict passed, specific
5759
per-column NA values
@@ -613,7 +615,7 @@ def _infer_columns(self):
613615

614616
ncols = len(line)
615617
if not names:
616-
columns = ['X.%d' % (i + 1) for i in range(ncols)]
618+
columns = ['X%d' % i for i in range(ncols)]
617619
else:
618620
columns = names
619621
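
Reviewer note: the one-line change in ``_infer_columns`` switches the generated names from 1-based ``X.1, X.2, ...`` to 0-based ``X0, X1, ...``. The sketch below reproduces both naming schemes with the standard library only, so it does not depend on any pandas version. ``default_column_names`` and its ``legacy`` flag are hypothetical names used for illustration.

```python
import csv
from io import StringIO


def default_column_names(ncols, legacy=False):
    """Generate header-less column names, old style or new style."""
    if legacy:
        # Pre-change scheme: 1-based with a dot, e.g. 'X.1'.
        return ['X.%d' % (i + 1) for i in range(ncols)]
    # Post-change scheme: 0-based, no dot, e.g. 'X0'.
    return ['X%d' % i for i in range(ncols)]


data = '0,0,1\n1,1,0\n0,1,0'
rows = list(csv.reader(StringIO(data)))
columns = default_column_names(len(rows[0]))

print(columns)                                        # ['X0', 'X1', 'X2']
print(default_column_names(3, legacy=True))           # ['X.1', 'X.2', 'X.3']
print(all(name.isidentifier() for name in columns))   # True
```

The last line shows why the new names suit attribute access: ``X0`` is a valid Python identifier, whereas ``X.1`` is not.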

@@ -747,7 +749,7 @@ def _explicit_index_names(self, columns):
747749
else:
748750
index_name = columns[self.index_col]
749751

750-
if index_name is not None and 'Unnamed' in index_name:
752+
if index_name is not None and 'Unnamed' in str(index_name):
751753
index_name = None
752754

753755
elif self.index_col is not None:
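
Reviewer note: wrapping ``index_name`` in ``str()`` presumably guards against column labels that are not strings, such as integers. A substring test against an integer raises ``TypeError``, as this small sketch shows. ``is_unnamed`` is a hypothetical helper that mirrors the patched condition.

```python
def is_unnamed(index_name):
    """Return True when a label looks like an auto-generated 'Unnamed' one."""
    return index_name is not None and 'Unnamed' in str(index_name)


print(is_unnamed('Unnamed: 0'))  # True
print(is_unnamed(0))             # False
print(is_unnamed(None))          # False

# The unpatched form of the test fails for a non-string label.
try:
    'Unnamed' in 0
except TypeError:
    print('substring test on an int raises TypeError')
```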
@@ -833,6 +835,9 @@ def get_chunk(self, rows=None):
833835
alldata = self._rows_to_cols(content)
834836
data = self._exclude_implicit_index(alldata)
835837

838+
if self.parse_dates is not None:
839+
data, columns = self._process_date_conversion(data)
840+
836841
# apply converters
837842
for col, f in self.converters.iteritems():
838843
if isinstance(col, int) and col not in self.orig_columns:
@@ -841,9 +846,6 @@ def get_chunk(self, rows=None):
841846

842847
data = _convert_to_ndarrays(data, self.na_values, self.verbose)
843848

844-
if self.parse_dates is not None:
845-
data, columns = self._process_date_conversion(data)
846-
847849
if self.index_col is None:
848850
numrows = len(content)
849851
index = Index(np.arange(numrows))
@@ -1160,19 +1162,9 @@ def _convert_types(values, na_values):
11601162

11611163
return result, na_count
11621164

1163-
def _get_col_names(colspec, columns):
1164-
colset = set(columns)
1165-
colnames = []
1166-
for c in colspec:
1167-
if c in colset:
1168-
colnames.append(str(c))
1169-
elif isinstance(c, int):
1170-
colnames.append(str(columns[c]))
1171-
return colnames
1172-
11731165
def _try_convert_dates(parser, colspec, data_dict, columns):
11741166
colspec = _get_col_names(colspec, columns)
1175-
new_name = '_'.join(colspec)
1167+
new_name = '_'.join([str(x) for x in colspec])
11761168

11771169
to_parse = [data_dict[c] for c in colspec if c in data_dict]
11781170
try:
@@ -1181,6 +1173,17 @@ def _try_convert_dates(parser, colspec, data_dict, columns):
11811173
new_col = parser(_concat_date_cols(to_parse))
11821174
return new_name, new_col, colspec
11831175

1176+
def _get_col_names(colspec, columns):
1177+
colset = set(columns)
1178+
colnames = []
1179+
for c in colspec:
1180+
if c in colset:
1181+
colnames.append(c)
1182+
elif isinstance(c, int):
1183+
colnames.append(columns[c])
1184+
return colnames
1185+
1186+
11841187
def _concat_date_cols(date_cols):
11851188
if len(date_cols) == 1:
11861189
return np.array([str(x) for x in date_cols[0]], dtype=object)
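
Reviewer note: the relocated ``_get_col_names`` now returns labels in their original type instead of casting them to ``str``, and the cast moves into the ``'_'.join(...)`` call in ``_try_convert_dates``. A plausible reading is that this keeps the ``data_dict[c]`` lookups working when labels are not strings. The sketch below copies the helper from the diff and adds a hypothetical ``combined_name`` wrapper to show the resulting name.

```python
def get_col_names(colspec, columns):
    """Resolve each entry of colspec to a column label, as in the diff."""
    colset = set(columns)
    colnames = []
    for c in colspec:
        if c in colset:
            colnames.append(c)           # already a label: keep it unchanged
        elif isinstance(c, int):
            colnames.append(columns[c])  # a position: look up the label
    return colnames


def combined_name(colspec, columns):
    """Hypothetical wrapper: name of the merged date column."""
    names = get_col_names(colspec, columns)
    # str() is applied only here, so non-string labels can still be joined.
    return '_'.join([str(x) for x in names])


print(combined_name([0, 1], ['date', 'time', 'value']))  # date_time
print(get_col_names([0, 1], [0, 1, 2]))                  # [0, 1]
print(combined_name([0, 1], [0, 1, 2]))                  # 0_1
```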
