ENH: Add JSON export option for DataFrame #631 #1226


Closed · wants to merge 114 commits
cb7c6ae
ENH: Add JSON export option for DataFrame #631
Komnomnomnom May 11, 2012
3af585e
REF: working toward #1150, broke apart Cython module into generated _…
wesm May 10, 2012
11f2c0d
REF: have got things mostly working for #1150
wesm May 10, 2012
e9dee69
BUG: more bug fixes, have to fix intraday frequencies still
wesm May 11, 2012
69d0baa
BUG: more intraday unit fixes
wesm May 11, 2012
5485c2d
BUG: test suite passes, though negative ordinals broken
wesm May 11, 2012
879779d
BUG: weekly and business daily unit support #1150
wesm May 12, 2012
85fcd69
REF: remove period multipliers, close #1199
wesm May 12, 2012
075f05e
ENH: move _ensure_{dtype} functions to Cython for speedup, close #1221
wesm May 12, 2012
ee73df1
DOC: doc fixes
wesm May 12, 2012
9e88e0c
ENH: handle dict return values and vbench, close #823
wesm May 12, 2012
a31ed38
ENH: add is_full method to PeriodIndex close #1114
wesm May 12, 2012
b457ff8
Remove dependencies on details of experimental numpy datetime64 ABI
mwiebe May 7, 2012
3d83387
Use datetime64 with a 'us' unit explicitly, for 1.6 and 1.7 compatibi…
mwiebe May 7, 2012
c53e093
Use an explicit unit for the 1.7 datetime64 scalar constructor
mwiebe May 7, 2012
89bd898
Use assert_equal instead of assert, to see the actual values
mwiebe May 7, 2012
4e6720f
Microseconds (us) not milliseconds (ms)
mwiebe May 8, 2012
a7bccd8
TST: use NaT value
wesm May 12, 2012
b98e4e0
ENH: #1020 implementation. needs tests and adding to API
adamklein Apr 10, 2012
1ecb5c4
ENH: add docs and add match function to API, close #502
wesm May 12, 2012
4ac9abb
ENH: add Cython nth/last functions, vbenchmarks. close #1043
wesm May 12, 2012
b246ae1
BUG: fix improper quarter parsing for frequencies other than Q-DEC, c…
wesm May 12, 2012
4d052f9
BUG: implement Series.repeat to get expected results, close #1229
wesm May 12, 2012
74a6be0
ENH: anchor resampling frequencies like 5minute that evenly subdivide…
wesm May 12, 2012
e043862
BUG: support resampling of period data to, e.g. 5minute though with t…
wesm May 12, 2012
996b964
BUG: remove restriction in lib.Reducer that index by object dtype. cl…
wesm May 12, 2012
0cf9e3d
ENH: Allow different number of rows & columns in a histogram plot
May 8, 2012
7baa84c
TST: vbenchmark for #561, push more work til 0.9
wesm May 12, 2012
8b972a1
BUG: don't print exception in reducer
wesm May 12, 2012
93b5221
BUG: rogue foo
wesm May 12, 2012
eb460c0
ENH: reimplement groupby_indices using better algorithmic tricks, asso…
wesm May 13, 2012
197a7f6
BLD: fix npy_* -> pandas_*, compiler warnings
wesm May 13, 2012
aca4c43
TST: remove one skip test
wesm May 14, 2012
c1260e3
ENH: store pytz time zones as zone strings in HDFStore, close #1232
wesm May 14, 2012
4c32ab8
Stop storing class reference in HDFStore #1235
May 14, 2012
e057ad5
removed extraneous IntIndex instance test
May 14, 2012
0cdfe75
BUG: fix rebase conflict from #1236
wesm May 14, 2012
8d27185
treat XLRD.XL_CELL_ERROR as NaN
ruidc May 11, 2012
1e6aea5
replace tabs with spaces
ruidc May 11, 2012
63952a8
RLS: release note
wesm May 14, 2012
349bccb
ENH: convert multiple text file columns to a single date column #1186
May 11, 2012
52492dd
Merged extra keyword with parse_dates
May 11, 2012
9c01e77
TST: VB for multiple date columns
May 11, 2012
1febe66
A few related bug fixes
May 11, 2012
3fdf18a
TST: test with headers
wesm May 14, 2012
a89e7b9
ENH: maybe upcast masked arrays passed to DataFrame constructor
May 11, 2012
c9af5c5
ENH: Add support for converting DataFrames to R data.frames and
lbeltrame May 8, 2012
d17f1d5
BUG: Properly handle the case of matrices
lbeltrame May 8, 2012
ea7f4e1
RLS: release notes
wesm May 14, 2012
4c1eb1b
ENH: optimize join/merge on integer keys, close #682
wesm May 14, 2012
8572d54
RLS: release notes for #1081
wesm May 14, 2012
8ecb31b
ENH: efficiently box datetime64 -> Timestamp inside Series.__getitem_…
wesm May 14, 2012
4b56332
BLD: add modified numpy Cython header
wesm May 14, 2012
d2b947b
BLD: fix datetime.pxd
wesm May 14, 2012
67a98ff
ENH: can pass multiple columns to GroupBy.__getitem__, close #383
wesm May 14, 2012
2e9de0e
ENH: accept list of tuples, preserving function order in SeriesGroupB…
wesm May 14, 2012
92d050b
ENH: more flexible multiple function application in DataFrameGroupBy,…
wesm May 14, 2012
b07f097
DOC: release notes
wesm May 14, 2012
48a073a
ENH: treat complex number in maybe_convert_objects
tkf Apr 20, 2012
a3e538f
ENH: treat complex number in maybe_convert_objects
tkf Apr 20, 2012
ca6558c
TST: Add complex number in test_constructor_scalar_inference
tkf Apr 20, 2012
3f3b900
ENH: treat complex number in internals.form_blocks
tkf Apr 20, 2012
dc43a1e
ENH: add internals.ComplexBlock
tkf Apr 21, 2012
c280d22
BUG: fix max recursion error in test_reindex_items
tkf Apr 21, 2012
a7698da
BLD: fix platform int issues
wesm May 15, 2012
0782990
TST: verify consistently set group name, close #184
wesm May 15, 2012
d66ac45
ENH: don't populate hash table in index engine if > 1e6 elements, to …
wesm May 15, 2012
be5b5a4
ENH: support different 'bases' when resampling regular intervals like…
wesm May 15, 2012
8d581c8
VB: more convenience auto-updates
May 15, 2012
6e09dda
VB: get from and to email addresses from config file
May 15, 2012
31fefba
VB: removing cruft; getting config from user folders
May 15, 2012
d5b6b93
BUG: floor division for Python 3
wesm May 15, 2012
e275d76
DOC: function for auto docs build
May 15, 2012
18d9a13
DOC: removed lingering sourceforge references
May 15, 2012
545e917
DOC: removed lingering timeRule keyword use
May 15, 2012
40d9a3b
ENH: very basic ordered_merge with forward filling, not with multiple…
wesm May 15, 2012
69229e7
ENH: add group-wise merge capability to ordered_merge, unit tests, cl…
wesm May 15, 2012
9e2142b
BUG: ensure_platform_int actually makes lots of copies
wesm May 15, 2012
5891ad5
RLS: release notes, close #1239
wesm May 15, 2012
42d1c90
BLD: 32-bit compat fixes per #1242
wesm May 15, 2012
f1c6c89
ENH: add keys() method to DataFrame, close #1240
wesm May 15, 2012
6e8bbed
DOC: release notes
wesm May 15, 2012
e50c7d8
TST: test cases for replace method. #929
May 8, 2012
b0e13c1
ENH: Series.replace #929
May 8, 2012
b7546b2
ENH: DataFrame.replace and cython replace. Only works for floats and …
May 9, 2012
45773c9
ENH: finishing up DataFrame.replace need to revisit
May 10, 2012
2f5319d
removed bottleneck calls from replace
May 15, 2012
245c126
moved mask_missing to common
May 15, 2012
35220b4
TST: extra test case for Series.replace
May 15, 2012
40a0cb1
removed remaining references to replace code generation
May 15, 2012
76355d0
DOC: release note re: #929
wesm May 15, 2012
927d370
Removed erroneous reference to iterating over a Series, which iterate…
invisibleroads May 17, 2012
b60c0d3
Fixed a few typos
invisibleroads May 17, 2012
49ad7e2
TST: rephrased .keys call for py3compat
May 17, 2012
421f5d3
DOC: put back doc regarding inplace in rename in anticipation of feature
May 17, 2012
181f945
DOC: reworded description for MultiIndex
May 17, 2012
fb1e662
DOC: started on timeseries.rst for 0.8
May 18, 2012
d4407a9
REF: microsecond -> nanosecond migration, most of the way there #1238
wesm May 15, 2012
4f15d54
BUG: more nano fixes
wesm May 15, 2012
9bc3814
REF: more nanosecond support fixes, test suite passes #1238
wesm May 19, 2012
b026566
ENH: more nanosecond support #1238
wesm May 19, 2012
c360391
Changes to plotting scatter matrix diagonals
orbitfold May 12, 2012
cf74512
Changed xtick, ytick labels
orbitfold May 14, 2012
d7d6a0f
Added simple test cases
orbitfold May 14, 2012
cd8222c
Updated plotting.py scatter_matrix docstring to describe all the para…
orbitfold May 16, 2012
8e2f3f9
Added scatter_matrix examples to visualization.rst
orbitfold May 16, 2012
da1b234
DOC: release notes
wesm May 19, 2012
a6e32b8
BUG: DataFrame.drop_duplicates with NA values
May 12, 2012
2a6fc11
use fast zip with a placeholder value just for np.nan
May 15, 2012
d95a254
TST: vbench for drop_duplicate with skipna set to False
May 15, 2012
7953ae8
optimized a little bit for speed
May 15, 2012
916be1d
ENH: inplace option to DataFrame.drop_duplicates #805 with vbench
May 16, 2012
ba6a9c8
BUG: replace complex64 with complex128
tkf May 16, 2012
1cacb6c
ENH: add KDE plot from #1059
wesm May 19, 2012
28 changes: 28 additions & 0 deletions RELEASE.rst
@@ -38,6 +38,22 @@ pandas 0.8.0

- Add support for indexes (dates or otherwise) with duplicates and common
  sense indexing/selection functionality
- Series/DataFrame.update methods, in-place variant of combine_first (#961)
- Add ``match`` function to API (#502)
- Add Cython-optimized first, last, min, max, prod functions to GroupBy (#994,
  #1043)
- Dates can be split across multiple columns (#1227, #1186)
- Add experimental support for converting pandas DataFrame to R data.frame
  via rpy2 (#350, #1212)
- Can pass list of (name, function) to GroupBy.aggregate to get aggregates in
  a particular order (#610)
- Can pass dicts with lists of functions or dicts to GroupBy aggregate to do
  much more flexible multiple function aggregation (#642)
- New ordered_merge functions for merging DataFrames with ordered
  data. Also supports group-wise merging for panel data (#813)
- Add keys() method to DataFrame
- Add flexible replace method for replacing values in Series and
  DataFrame (#929, #1241)
- Add 'kde' plot kind for Series/DataFrame.plot (#1059)
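As a quick illustration of two of the features above — this is a sketch written against a modern pandas API (the 0.8-era spelling differed in places), and the names ``total``/``largest`` are arbitrary:

```python
import pandas as pd

# flexible replace (#929): swap individual values in a Series
s = pd.Series([0.0, 1.0, 2.0, 1.0, 0.0])
replaced = s.replace(0.0, -1.0)

# list of (name, function) tuples (#610): aggregate columns come out
# in the order given
df = pd.DataFrame({'key': ['a', 'a', 'b'], 'val': [1, 2, 3]})
agg = df.groupby('key')['val'].agg([('total', 'sum'), ('largest', 'max')])
```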

**Improvements to existing features**

@@ -50,13 +66,21 @@ pandas 0.8.0

- Can pass arrays in addition to column names to DataFrame.set_index (#402)
- Improve the speed of "square" reindexing of homogeneous DataFrame objects
  by significant margin (#836)
- Handle more dtypes when passed MaskedArrays in DataFrame constructor (#406)
- Improved performance of join operations on integer keys (#682)
- Can pass multiple columns to GroupBy object, e.g. grouped[[col1, col2]] to
  only aggregate a subset of the value columns (#383)
- Add histogram / kde plot options for scatter_matrix diagonals (#1237)
- Add inplace option to DataFrame.drop_duplicates (#805)
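The multiple-column GroupBy selection (#383) looks like this in practice (a minimal sketch against a modern pandas; the frame and column names are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({'k': ['a', 'a', 'b'], 'x': [1, 2, 3], 'y': [4, 5, 6]})
# grouped[[col1, col2]] aggregates only the selected value columns
res = df.groupby('k')[['x', 'y']].sum()
```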

**API Changes**

- Raise ValueError in DataFrame.__nonzero__, so "if df" no longer works
  (#1073)
- Change BDay (business day) to not normalize dates by default
- Remove deprecated DataMatrix name
- Default merge suffixes for overlap now have underscores instead of periods
  to facilitate tab completion, etc. (#1239)
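The ``__nonzero__`` change means truth-testing a DataFrame now raises instead of guessing; a sketch of the resulting behavior (this still holds in current pandas):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2]})
ok = False
try:
    if df:          # ambiguous truth value now raises ValueError (#1073)
        pass
except ValueError:
    ok = True
```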

**Bug fixes**

@@ -76,6 +100,10 @@ pandas 0.8.0

  cases. Fix pivot table bug (#1181)
- Fix formatting of MultiIndex on Series/DataFrame when index name coincides
  with label (#1217)
- Handle Excel 2003 #N/A as NaN from xlrd (#1213, #1225)
- Fix timestamp locale-related deserialization issues with HDFStore by moving
  to datetime64 representation (#1081, #809)
- Fix DataFrame.duplicated/drop_duplicates NA value handling (#557)

pandas 0.7.3
============
Expand Down
110 changes: 89 additions & 21 deletions doc/make.py
@@ -25,35 +25,29 @@

SPHINX_BUILD = 'sphinxbuild'

-def sf():
-    'push a copy to the sf'
-    os.system('cd build/html; rsync -avz . wesmckinn,[email protected]'
-              ':/home/groups/p/pa/pandas/htdocs/ -essh --cvs-exclude')
-
 def upload_dev():
     'push a copy to the pydata dev directory'
-    os.system('cd build/html; rsync -avz . [email protected]'
-              ':/usr/share/nginx/pandas/pandas-docs/dev/ -essh')
+    if os.system('cd build/html; rsync -avz . [email protected]'
+                 ':/usr/share/nginx/pandas/pandas-docs/dev/ -essh'):
+        raise SystemExit('Upload to Pydata Dev failed')

 def upload_dev_pdf():
     'push a copy to the pydata dev directory'
-    os.system('cd build/latex; scp pandas.pdf [email protected]'
-              ':/usr/share/nginx/pandas/pandas-docs/dev/')
+    if os.system('cd build/latex; scp pandas.pdf [email protected]'
+                 ':/usr/share/nginx/pandas/pandas-docs/dev/'):
+        raise SystemExit('PDF upload to Pydata Dev failed')

 def upload_stable():
-    'push a copy to the pydata dev directory'
-    os.system('cd build/html; rsync -avz . [email protected]'
-              ':/usr/share/nginx/pandas/pandas-docs/stable/ -essh')
+    'push a copy to the pydata stable directory'
+    if os.system('cd build/html; rsync -avz . [email protected]'
+                 ':/usr/share/nginx/pandas/pandas-docs/stable/ -essh'):
+        raise SystemExit('Upload to stable failed')

 def upload_stable_pdf():
     'push a copy to the pydata dev directory'
-    os.system('cd build/latex; scp pandas.pdf [email protected]'
-              ':/usr/share/nginx/pandas/pandas-docs/stable/')
-
-def sfpdf():
-    'push a copy to the sf site'
-    os.system('cd build/latex; scp pandas.pdf wesmckinn,[email protected]'
-              ':/home/groups/p/pa/pandas/htdocs/')
+    if os.system('cd build/latex; scp pandas.pdf [email protected]'
+                 ':/usr/share/nginx/pandas/pandas-docs/stable/'):
+        raise SystemExit('PDF upload to stable failed')

def clean():
    if os.path.exists('build'):

@@ -102,6 +96,80 @@ def all():
    # clean()
    html()
def auto_dev_build(debug=False):
    msg = ''
    try:
        clean()
        html()
        latex()
        upload_dev()
        upload_dev_pdf()
        if not debug:
            sendmail()
    except (Exception, SystemExit), inst:
        msg += str(inst) + '\n'
        sendmail(msg)

def sendmail(err_msg=None):
    from_name, to_name = _get_config()

    if err_msg is None:
        msgstr = 'Daily docs build completed successfully'
        subject = "DOC: daily build successful"
    else:
        msgstr = err_msg
        subject = "DOC: daily build failed"

    import smtplib
    from email.MIMEText import MIMEText
    msg = MIMEText(msgstr)
    msg['Subject'] = subject
    msg['From'] = from_name
    msg['To'] = to_name

    server_str, port, login, pwd = _get_credentials()
    server = smtplib.SMTP(server_str, port)
    server.ehlo()
    server.starttls()
    server.ehlo()

    server.login(login, pwd)
    try:
        server.sendmail(from_name, to_name, msg.as_string())
    finally:
        server.close()

def _get_dir():
    import getpass
    USERNAME = getpass.getuser()
    if sys.platform == 'darwin':
        HOME = '/Users/%s' % USERNAME
    else:
        HOME = '/home/%s' % USERNAME

    tmp_dir = '%s/tmp' % HOME
    return tmp_dir

def _get_credentials():
    tmp_dir = _get_dir()
    cred = '%s/credentials' % tmp_dir
    with open(cred, 'r') as fh:
        server, port, un, domain = fh.read().split(',')
    port = int(port)
    login = un + '@' + domain + '.com'

    import base64
    with open('%s/cron_email_pwd' % tmp_dir, 'r') as fh:
        pwd = base64.b64decode(fh.read())

    return server, port, login, pwd

def _get_config():
    tmp_dir = _get_dir()
    with open('%s/config' % tmp_dir, 'r') as fh:
        from_name, to_name = fh.read().split(',')
    return from_name, to_name

 funcd = {
     'html' : html,
     'upload_dev' : upload_dev,
@@ -110,8 +178,8 @@
     'upload_stable_pdf' : upload_stable_pdf,
     'latex' : latex,
     'clean' : clean,
-    'sf' : sf,
-    'sfpdf' : sfpdf,
+    'auto_dev' : auto_dev_build,
+    'auto_debug' : lambda: auto_dev_build(True),
     'all' : all,
     }
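The funcd table above is a plain dispatch dict: the build script looks the command-line argument up and calls the matching function. A minimal standalone sketch of the same pattern (the command names here are illustrative, not from make.py):

```python
# dispatch-table pattern: map command names to zero-argument callables
funcd = {
    'greet': lambda: 'hi',
    'build': lambda: 'built',
}

def run(arg):
    # look the command up; unknown commands abort, mirroring make.py's style
    func = funcd.get(arg)
    if func is None:
        raise SystemExit('Do not know how to handle %s' % arg)
    return func()
```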

23 changes: 12 additions & 11 deletions doc/source/basics.rst
@@ -491,7 +491,7 @@
With a DataFrame, you can simultaneously reindex the index and columns:
df.reindex(index=['c', 'f', 'b'], columns=['three', 'two', 'one'])

For convenience, you may utilize the ``reindex_axis`` method, which takes the
-labels and a keyword ``axis`` paramater.
+labels and a keyword ``axis`` parameter.

Note that the ``Index`` objects containing the actual axis labels can be
**shared** between objects. So if we have a Series and a DataFrame, the
@@ -657,7 +657,7 @@
set of labels from an axis:
df.drop(['a', 'd'], axis=0)
df.drop(['one'], axis=1)

-Note that the following also works, but a bit less obvious / clean:
+Note that the following also works, but is a bit less obvious / clean:

.. ipython:: python

@@ -685,24 +685,25 @@
Series, it need only contain a subset of the labels as keys:
df.rename(columns={'one' : 'foo', 'two' : 'bar'},
index={'a' : 'apple', 'b' : 'banana', 'd' : 'durian'})

-The ``rename`` method also provides a ``copy`` named parameter that is by
-default ``True`` and copies the underlying data. Pass ``copy=False`` to rename
-the data in place.
+The ``rename`` method also provides an ``inplace`` named parameter that is by
+default ``False`` and copies the underlying data. Pass ``inplace=True`` to
+rename the data in place.

.. _basics.rename_axis:

The Panel class has a related ``rename_axis`` method which can rename any of
The Panel class has a related ``rename_axis`` class which can rename any of
its three axes.

Iteration
---------

-Considering the pandas as somewhat dict-like structure, basic iteration
-produces the "keys" of the objects, namely:
+Because Series is array-like, basic iteration produces the values. Other data
+structures follow the dict-like convention of iterating over the "keys" of the
+objects. In short:

-* **Series**: the index label
-* **DataFrame**: the column labels
-* **Panel**: the item labels
+* **Series**: values
+* **DataFrame**: column labels
+* **Panel**: item labels

Thus, for example:

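A minimal sketch of those iteration rules (written against a modern pandas; the Series/DataFrame contents are invented for illustration):

```python
import pandas as pd

s = pd.Series([10, 20], index=['a', 'b'])
vals = list(s)        # Series iteration yields the values

df = pd.DataFrame({'x': [1], 'y': [2]})
cols = list(df)       # DataFrame iteration yields the column labels
```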
8 changes: 4 additions & 4 deletions doc/source/computation.rst
@@ -171,10 +171,10 @@
accept the following arguments:
- ``window``: size of moving window
- ``min_periods``: threshold of non-null data points to require (otherwise
result is NA)
-- ``freq``: optionally specify a :ref:`frequency string <timeseries.freq>` or :ref:`DateOffset <timeseries.offsets>`
-  to pre-conform the data to. Note that prior to pandas v0.8.0, a keyword
-  argument ``time_rule`` was used instead of ``freq`` that referred to
-  the legacy time rule constants
+- ``freq``: optionally specify a :ref:`frequency string <timeseries.alias>`
+  or :ref:`DateOffset <timeseries.offsets>` to pre-conform the data to.
+  Note that prior to pandas v0.8.0, a keyword argument ``time_rule`` was used
+  instead of ``freq`` that referred to the legacy time rule constants

These functions can be applied to ndarrays or Series objects:

2 changes: 1 addition & 1 deletion doc/source/conf.py
@@ -209,7 +209,7 @@
 latex_documents = [
     ('index', 'pandas.tex',
      u'pandas: powerful Python data analysis toolkit',
-     u'Wes McKinney', 'manual'),
+     u'Wes McKinney\n& PyData Development Team', 'manual'),
 ]

# The name of an image file (relative to this directory) to place at the top of
13 changes: 7 additions & 6 deletions doc/source/indexing.rst
@@ -200,7 +200,7 @@
of the DataFrame):

Consider the ``isin`` method of Series, which returns a boolean vector that is
true wherever the Series elements exist in the passed list. This allows you to
-select out rows where one or more columns have values you want:
+select rows where one or more columns have values you want:

.. ipython:: python

@@ -215,7 +215,7 @@
more complex criteria:
.. ipython:: python

# only want 'two' or 'three'
-criterion = df2['a'].map(lambda x: x.startswith('t')
+criterion = df2['a'].map(lambda x: x.startswith('t'))

df2[criterion]

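For reference, a self-contained sketch of the boolean selection shown above, including ``isin`` (``df2`` here is a stand-in frame, not the one built in the doc's earlier examples):

```python
import pandas as pd

df2 = pd.DataFrame({'a': ['one', 'two', 'three'], 'b': [1, 2, 3]})
# only want 'two' or 'three'
criterion = df2['a'].map(lambda x: x.startswith('t'))
selected = df2[criterion]
# isin builds a boolean mask from membership in a list
by_membership = df2[df2['a'].isin(['one', 'three'])]
```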
@@ -319,7 +319,7 @@
Duplicate Data

.. _indexing.duplicate:

-If you want to indentify and remove duplicate rows in a DataFrame, there are
+If you want to identify and remove duplicate rows in a DataFrame, there are
two methods that will help: ``duplicated`` and ``drop_duplicates``. Each
takes as an argument the columns to use to identify duplicated rows.

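A short sketch of the two methods (in a modern pandas, where the column argument is called ``subset``):

```python
import pandas as pd

df = pd.DataFrame({'k': ['a', 'a', 'b'], 'v': [1, 1, 2]})
dup_mask = df.duplicated(['k', 'v'])      # True for repeated rows
deduped = df.drop_duplicates(['k', 'v'])  # keeps the first occurrence
```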
@@ -567,9 +567,9 @@
Hierarchical indexing (MultiIndex)
Hierarchical indexing (also referred to as "multi-level" indexing) is brand new
in the pandas 0.4 release. It is very exciting as it opens the door to some
quite sophisticated data analysis and manipulation, especially for working with
-higher dimensional data. In essence, it enables you to effectively store and
-manipulate arbitrarily high dimension data in a 2-dimensional tabular structure
-(DataFrame), for example. It is not limited to DataFrame
+higher dimensional data. In essence, it enables you to store and manipulate
+data with an arbitrary number of dimensions in lower dimensional data
+structures like Series (1d) and DataFrame (2d).

In this section, we will show what exactly we mean by "hierarchical" indexing
and how it integrates with all of the pandas indexing functionality
@@ -611,6 +611,7 @@
As a convenience, you can pass a list of arrays directly into Series or
DataFrame to construct a MultiIndex automatically:

.. ipython:: python

   arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
             np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
   s = Series(randn(8), index=arrays)
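Equivalently, in a self-contained form (a sketch against a modern pandas, shortened to four rows):

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz']),
          np.array(['one', 'two', 'one', 'two'])]
# a list of arrays passed as the index produces a MultiIndex automatically
s = pd.Series(np.random.randn(4), index=arrays)
```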
2 changes: 1 addition & 1 deletion doc/source/io.rst
@@ -59,7 +59,7 @@
The two workhorse functions for reading text files (a.k.a. flat files) are
They both use the same parsing code to intelligently convert tabular
data into a DataFrame object. They can take a number of arguments:

-- ``path_or_buffer``: Either a string path to a file, or any object with a
+- ``filepath_or_buffer``: Either a string path to a file, or any object with a
``read`` method (such as an open file or ``StringIO``).
- ``sep`` or ``delimiter``: A delimiter / separator to split fields
on. `read_csv` is capable of inferring the delimiter automatically in some
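A minimal sketch of the two arguments just described, using an in-memory buffer as the ``filepath_or_buffer`` (modern pandas; the sample data is invented):

```python
from io import StringIO

import pandas as pd

data = "a|b\n1|2\n3|4"
# any object with a read method works as filepath_or_buffer;
# sep picks the field delimiter
df = pd.read_csv(StringIO(data), sep='|')
```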
3 changes: 1 addition & 2 deletions doc/source/missing_data.rst
@@ -204,8 +204,7 @@
for interpolation methods outside of the filling methods described above.
   :suppress:

   np.random.seed(123456)
-  ts = Series(randn(100), index=date_range('1/1/2000', periods=100,
-                                           timeRule='EOM'))
+  ts = Series(randn(100), index=date_range('1/1/2000', periods=100, freq='BM'))
   ts[20:40] = np.nan
   ts[60:80] = np.nan
   ts = ts.cumsum()
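The filling methods referenced above can be sketched like so (modern pandas, using ``ffill`` rather than the long-deprecated ``fillna(method=...)``; the series here is invented):

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(10, dtype=float))
ts[3:6] = np.nan              # punch a hole in the series
filled = ts.ffill()           # propagate last valid observation forward
```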