Commit bfd2e19

Merge commit 'v0.12.0rc1-127-gec8920a' into debian
* commit 'v0.12.0rc1-127-gec8920a':
  DOC: docs for precise_float option in read_json
  BUG: explicity change nan -> NaT when assigning to datelike dtypes
  ENH: expose ujson precise_float argument on decode
  ENH: ujson better handling of very large and very small numbers, throw ValueError for bad double_precision arg pandas-dev#4042
  minor: some trailing spaces and a pylint "pragma" to stop complaining about Series._ix defined elsewhere
  ENH: test_perf.py - use psutil to set affinity (if absent functionality - then affinity module)
  TST: print out byteorder in ci/print_versions.py
  DOC: Fix typo. Update CONTRIBUTING.md with note on attribution in PRs
2 parents b13d540 + ec8920a commit bfd2e19

File tree

16 files changed: +187 -75 lines changed

CONTRIBUTING.md

+9

@@ -78,6 +78,15 @@ your contribution or address the issue you're having.
 - For extra brownie points, use "git rebase -i" to squash and reorder
   commits in your PR so that the history makes the most sense. Use your own
   judgment to decide what history needs to be preserved.
+- Pandas source code should not (with some exceptions, such as 3rd party licensed code),
+  generally speaking, include an "Authors:" list or attribution to individuals in source code.
+  The RELEASE.rst details changes and enhancements to the code over time,
+  a "thanks goes to @JohnSmith." as part of the appropriate entry is a suitable way to acknowledge
+  contributions, the rest is git blame/log.
+  Feel free to ask the commiter who merges your code to include such an entry
+  or include it directly yourself as part of the PR if you'd like to. We're always glad to have
+  new contributors join us from the ever-growing pandas community.
+  You may also be interested in the copyright policy as detailed in the pandas [LICENSE](https://github.com/pydata/pandas/blob/master/LICENSE).
 - On the subject of [PEP8](http://www.python.org/dev/peps/pep-0008/): yes.
 - On the subject of massive PEP8 fix PRs touching everything, please consider the following:
   - They create merge conflicts for people working in their own fork.

README.rst

+2 -2

@@ -99,8 +99,8 @@ Optional dependencies

 - `BeautifulSoup4`_ and `html5lib`_ (Any recent version of `html5lib`_ is
   okay.)
-  - `BeautifulSoup4`_ and `lxml`_
-  - `BeautifulSoup4`_ and `html5lib`_ and `lxml`_
+ - `BeautifulSoup4`_ and `lxml`_
+ - `BeautifulSoup4`_ and `html5lib`_ and `lxml`_
 - Only `lxml`_, although see :ref:`HTML reading gotchas <html-gotchas>`
   for reasons as to why you should probably **not** take this approach.

ci/print_versions.py

+2 -1

@@ -5,9 +5,10 @@
 print("------------------")
 print("Python: %d.%d.%d.%s.%s" % sys.version_info[:])
 try:
-    import os
+    import os, sys
     (sysname, nodename, release, version, machine) = os.uname()
     print("OS: %s %s %s %s" % (sysname, release, version,machine))
+    print("byteorder: %s" % sys.byteorder)
     print("LC_ALL: %s" % os.environ.get('LC_ALL',"None"))
     print("LANG: %s" % os.environ.get('LANG',"None"))
 except:
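The byteorder line added above is easy to check in isolation. A stand-alone sketch of the same diagnostic, using only the standard library (`os.uname()` is POSIX-only, which is presumably why the original script wraps everything in `try`):

```python
import os
import sys

# Stand-alone version of the ci/print_versions.py diagnostic above.
# os.uname() only exists on POSIX systems, hence the guard.
try:
    sysname, nodename, release, version, machine = os.uname()
    print("OS: %s %s %s %s" % (sysname, release, version, machine))
except AttributeError:
    print("OS: uname() not available on this platform")

# sys.byteorder is always either "little" or "big"
print("byteorder: %s" % sys.byteorder)
```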

doc/source/basics.rst

+1 -1

@@ -1228,7 +1228,7 @@ You can get/set options directly as attributes of the top-level ``options`` attr
 pd.options.display.max_rows

-There is also an API composed of 4 relavent functions, available directly from the ``pandas``
+There is also an API composed of 4 relevant functions, available directly from the ``pandas``
 namespace, and they are:

 - ``get_option`` / ``set_option`` - get/set the value of a single option.

doc/source/io.rst

+2

@@ -1060,6 +1060,8 @@ is ``None``. To explicity force ``Series`` parsing, pass ``typ=series``
 - ``keep_default_dates`` : boolean, default True. If parsing dates, then parse the default datelike columns
 - ``numpy`` : direct decoding to numpy arrays. default is False;
   Note that the JSON ordering **MUST** be the same for each term if ``numpy=True``
+- ``precise_float`` : boolean, default ``False``. Set to enable usage of higher precision (strtod) function
+  when decoding string to double values. Default (``False``) is to use fast but less precise builtin functionality

 The parser will raise one of ``ValueError/TypeError/AssertionError`` if the JSON is
 not parsable.
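The docs above say the fast path is "less precise" than strtod. The reason is that a fast decimal parser typically accumulates the digits as an integer and then scales by a power of ten, which rounds twice, while a strtod-style parse rounds exactly once. This is an illustrative stdlib-only sketch of that trade-off, not ujson's actual code (`fast_atof` is a hypothetical name):

```python
def fast_atof(s):
    """Parse a non-negative decimal the 'fast' way: accumulate digits as an
    integer, then divide by a power of ten. Both the int->float conversion
    and the division round, so long mantissas can end up one ulp off."""
    mantissa, frac_digits, seen_dot = 0, 0, False
    for ch in s:
        if ch == '.':
            seen_dot = True
        else:
            mantissa = mantissa * 10 + int(ch)
            if seen_dot:
                frac_digits += 1
    return mantissa / (10.0 ** frac_digits)

# For short inputs the two paths agree exactly; Python's float() is a
# correctly-rounded strtod-style parse, the behavior precise_float selects.
assert fast_atof("4.56") == float("4.56")
# For ~17-significant-digit inputs the double rounding can differ from
# float() in the last bit, which is the imprecision the option avoids.
```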

doc/source/release.rst

+3

@@ -35,6 +35,9 @@ pandas 0.12
   list of ``DataFrame`` s courtesy of @cpcloud. (:issue:`3477`,
   :issue:`3605`, :issue:`3606`)
 - Support for reading Amazon S3 files. (:issue:`3504`)
+- Added module for reading and writing JSON strings/files: pandas.io.json
+  includes ``to_json`` DataFrame/Series method, and a ``read_json`` top-level reader
+  various issues (:issue:`1226`, :issue:`3804`, :issue:`3876`, :issue:`3867`, :issue:`1305`)
 - Added module for reading and writing Stata files: pandas.io.stata (:issue:`1512`)
   includes ``to_stata`` DataFrame method, and a ``read_stata`` top-level reader
 - Added support for writing in ``to_csv`` and reading in ``read_csv``,

doc/source/v0.12.0.txt

+1

@@ -206,6 +206,7 @@ I/O Enhancements
 - Added module for reading and writing json format files: ``pandas.io.json``
   accessable via ``read_json`` top-level function for reading,
   and ``to_json`` DataFrame method for writing, :ref:`See the docs<io.json>`
+  various issues (:issue:`1226`, :issue:`3804`, :issue:`3876`, :issue:`3867`, :issue:`1305`)

 - ``MultiIndex`` column support for reading and writing csv format files

pandas/core/common.py

+9

@@ -42,6 +42,7 @@ class AmbiguousIndexError(PandasError, KeyError):
 _NS_DTYPE = np.dtype('M8[ns]')
 _TD_DTYPE = np.dtype('m8[ns]')
 _INT64_DTYPE = np.dtype(np.int64)
+_DATELIKE_DTYPES = set([ np.dtype(t) for t in ['M8[ns]','m8[ns]'] ])

 def isnull(obj):
     """Detect missing values (NaN in numeric arrays, None/NaN in object arrays)

@@ -718,6 +719,12 @@ def _infer_dtype_from_scalar(val):
     return dtype, val


+def _maybe_cast_scalar(dtype, value):
+    """ if we a scalar value and are casting to a dtype that needs nan -> NaT conversion """
+    if np.isscalar(value) and dtype in _DATELIKE_DTYPES and isnull(value):
+        return tslib.iNaT
+    return value
+
 def _maybe_promote(dtype, fill_value=np.nan):

     # if we passed an array here, determine the fill value by dtype

@@ -789,6 +796,7 @@ def _maybe_upcast_putmask(result, mask, other, dtype=None, change=None):

     if mask.any():

+        other = _maybe_cast_scalar(result.dtype, other)
         def changeit():

             # try to directly set by expanding our array to full

@@ -851,6 +859,7 @@ def _maybe_upcast_indexer(result, indexer, other, dtype=None):
     return the result and a changed flag
     """

+    other = _maybe_cast_scalar(result.dtype, other)
     original_dtype = result.dtype
     def changeit():
         # our type is wrong here, need to upcast
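The fix above converts a scalar NaN into `iNaT` before it is written into a datelike (datetime64/timedelta64) block, since those blocks store int64 nanoseconds and cannot hold a float NaN. A dependency-free sketch of the same rule — the dtype strings and the iNaT value mirror pandas internals, but `maybe_cast_scalar` here is an illustration, not the real helper:

```python
import math

# iNaT: pandas' "not a time" sentinel, the minimum int64 value.
INAT = -(2 ** 63)

# Mirrors _DATELIKE_DTYPES above, as plain strings instead of np.dtype objects.
DATELIKE_DTYPES = {"datetime64[ns]", "timedelta64[ns]"}

def maybe_cast_scalar(dtype, value):
    """Datelike blocks store int64 nanoseconds, so a float NaN cannot be
    assigned directly; it must become the iNaT sentinel first."""
    is_nan_scalar = isinstance(value, float) and math.isnan(value)
    if dtype in DATELIKE_DTYPES and is_nan_scalar:
        return INAT
    return value
```

Non-datelike dtypes keep their NaN unchanged, which is why the hooks in `_maybe_upcast_putmask`/`_maybe_upcast_indexer` can call the helper unconditionally.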

pandas/core/series.py

+1 -1

@@ -567,7 +567,7 @@ def axes(self):

     @property
     def ix(self):
-        if self._ix is None:
+        if self._ix is None: # defined in indexing.py; pylint: disable=E0203
             self._ix = _SeriesIndexer(self, 'ix')

         return self._ix
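The pragma above silences pylint's E0203 ("access to member before its definition"), which fires because `_ix` is assigned in a different module (indexing.py). The underlying pattern is ordinary lazy caching; in this sketch (`LazyHolder` and `cached` are hypothetical names) the default lives on the class itself, so no pragma is needed:

```python
class LazyHolder(object):
    _cached = None  # class-level default; instances overwrite it on first access

    @property
    def cached(self):
        # Build the expensive object once, then reuse it on every access;
        # in pandas the built object is _SeriesIndexer(self, 'ix').
        if self._cached is None:
            self._cached = object()
        return self._cached
```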

pandas/io/json.py

+48 -27

@@ -16,9 +16,9 @@
 ### interface to/from ###

 def to_json(path_or_buf, obj, orient=None, date_format='epoch', double_precision=10, force_ascii=True):
-
+
     if isinstance(obj, Series):
-        s = SeriesWriter(obj, orient=orient, date_format=date_format, double_precision=double_precision,
+        s = SeriesWriter(obj, orient=orient, date_format=date_format, double_precision=double_precision,
                          ensure_ascii=force_ascii).write()
     elif isinstance(obj, DataFrame):
         s = FrameWriter(obj, orient=orient, date_format=date_format, double_precision=double_precision,

@@ -41,7 +41,7 @@ def __init__(self, obj, orient, date_format, double_precision, ensure_ascii):

         if orient is None:
             orient = self._default_orient
-
+
         self.orient = orient
         self.date_format = date_format
         self.double_precision = double_precision

@@ -64,7 +64,7 @@ def _format_to_date(self, data):
         if self._needs_to_date(data):
             return data.apply(lambda x: x.isoformat())
         return data
-
+
     def copy_if_needed(self):
         """ copy myself if necessary """
         if not self.is_copy:

@@ -119,7 +119,8 @@ def _format_dates(self):
             self.obj[c] = self._format_to_date(self.obj[c])

 def read_json(path_or_buf=None, orient=None, typ='frame', dtype=True,
-              convert_axes=True, convert_dates=True, keep_default_dates=True, numpy=False):
+              convert_axes=True, convert_dates=True, keep_default_dates=True,
+              numpy=False, precise_float=False):
     """
     Convert JSON string to pandas object

@@ -154,8 +155,10 @@ def read_json(path_or_buf=None, orient=None, typ='frame', dtype=True,
         default is True
     keep_default_dates : boolean, default True. If parsing dates,
         then parse the default datelike columns
-    numpy: direct decoding to numpy arrays. default is False.Note that the JSON ordering MUST be the same
+    numpy : direct decoding to numpy arrays. default is False.Note that the JSON ordering MUST be the same
         for each term if numpy=True.
+    precise_float : boolean, default False. Set to enable usage of higher precision (strtod) function
+        when decoding string to double values. Default (False) is to use fast but less precise builtin functionality

     Returns
     -------

@@ -186,28 +189,31 @@ def read_json(path_or_buf=None, orient=None, typ='frame', dtype=True,
     return obj

 class Parser(object):
-
-    def __init__(self, json, orient, dtype=True, convert_axes=True, convert_dates=True, keep_default_dates=False, numpy=False):
+
+    def __init__(self, json, orient, dtype=True, convert_axes=True,
+                 convert_dates=True, keep_default_dates=False, numpy=False,
+                 precise_float=False):
         self.json = json

         if orient is None:
             orient = self._default_orient
-
+
         self.orient = orient
         self.dtype = dtype

         if orient == "split":
             numpy = False

         self.numpy = numpy
+        self.precise_float = precise_float
         self.convert_axes = convert_axes
         self.convert_dates = convert_dates
         self.keep_default_dates = keep_default_dates
         self.obj = None

     def parse(self):

-        # try numpy
+        # try numpy
         numpy = self.numpy
         if numpy:
             self._parse_numpy()

@@ -269,7 +275,7 @@ def _try_convert_data(self, name, data, use_dtypes=True, convert_dates=True):
             pass

         if data.dtype == 'float':
-
+
             # coerce floats to 64
             try:
                 data = data.astype('float64')

@@ -291,7 +297,7 @@ def _try_convert_data(self, name, data, use_dtypes=True, convert_dates=True):

         # coerce ints to 64
         if data.dtype == 'int':
-
+
             # coerce floats to 64
             try:
                 data = data.astype('int64')

@@ -322,7 +328,7 @@ def _try_convert_to_date(self, data):
         if issubclass(new_data.dtype.type,np.number):
             if not ((new_data == iNaT) | (new_data > 31536000000000000L)).all():
                 return data, False
-
+
         try:
             new_data = to_datetime(new_data)
         except:

@@ -342,29 +348,35 @@ class SeriesParser(Parser):
     _default_orient = 'index'

     def _parse_no_numpy(self):
-
+
         json = self.json
         orient = self.orient
         if orient == "split":
             decoded = dict((str(k), v)
-                           for k, v in loads(json).iteritems())
+                           for k, v in loads(
+                               json,
+                               precise_float=self.precise_float).iteritems())
             self.obj = Series(dtype=None, **decoded)
         else:
-            self.obj = Series(loads(json), dtype=None)
+            self.obj = Series(
+                loads(json, precise_float=self.precise_float), dtype=None)

     def _parse_numpy(self):

         json = self.json
         orient = self.orient
         if orient == "split":
-            decoded = loads(json, dtype=None, numpy=True)
+            decoded = loads(json, dtype=None, numpy=True,
+                            precise_float=self.precise_float)
             decoded = dict((str(k), v) for k, v in decoded.iteritems())
             self.obj = Series(**decoded)
         elif orient == "columns" or orient == "index":
             self.obj = Series(*loads(json, dtype=None, numpy=True,
-                                     labelled=True))
+                                     labelled=True,
+                                     precise_float=self.precise_float))
         else:
-            self.obj = Series(loads(json, dtype=None, numpy=True))
+            self.obj = Series(loads(json, dtype=None, numpy=True,
+                                    precise_float=self.precise_float))

     def _try_convert_types(self):
         if self.obj is None: return

@@ -381,34 +393,43 @@ def _parse_numpy(self):
         orient = self.orient

         if orient == "columns":
-            args = loads(json, dtype=None, numpy=True, labelled=True)
+            args = loads(json, dtype=None, numpy=True, labelled=True,
+                         precise_float=self.precise_float)
             if args:
                 args = (args[0].T, args[2], args[1])
             self.obj = DataFrame(*args)
         elif orient == "split":
-            decoded = loads(json, dtype=None, numpy=True)
+            decoded = loads(json, dtype=None, numpy=True,
+                            precise_float=self.precise_float)
             decoded = dict((str(k), v) for k, v in decoded.iteritems())
             self.obj = DataFrame(**decoded)
         elif orient == "values":
-            self.obj = DataFrame(loads(json, dtype=None, numpy=True))
+            self.obj = DataFrame(loads(json, dtype=None, numpy=True,
+                                       precise_float=self.precise_float))
         else:
-            self.obj = DataFrame(*loads(json, dtype=None, numpy=True, labelled=True))
+            self.obj = DataFrame(*loads(json, dtype=None, numpy=True, labelled=True,
+                                        precise_float=self.precise_float))

     def _parse_no_numpy(self):

         json = self.json
         orient = self.orient

         if orient == "columns":
-            self.obj = DataFrame(loads(json), dtype=None)
+            self.obj = DataFrame(
+                loads(json, precise_float=self.precise_float), dtype=None)
         elif orient == "split":
             decoded = dict((str(k), v)
-                           for k, v in loads(json).iteritems())
+                           for k, v in loads(
+                               json,
+                               precise_float=self.precise_float).iteritems())
             self.obj = DataFrame(dtype=None, **decoded)
         elif orient == "index":
-            self.obj = DataFrame(loads(json), dtype=None).T
+            self.obj = DataFrame(
+                loads(json, precise_float=self.precise_float), dtype=None).T
         else:
-            self.obj = DataFrame(loads(json), dtype=None)
+            self.obj = DataFrame(
+                loads(json, precise_float=self.precise_float), dtype=None)

     def _try_convert_types(self):
         if self.obj is None: return
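The "split" orient branches above coerce every decoded JSON key to `str` before splatting the dict into the `Series`/`DataFrame` constructor. A minimal stdlib reproduction of that decode step (the payload is a made-up example; in pandas the `loads` is ujson's, here it is the stdlib's):

```python
import json

# What the "split" orient decode does, minus pandas: JSON object keys may
# decode as unicode, and **kwargs requires str keys on Python 2, hence the
# str(k) coercion before splatting into Series(...)/DataFrame(...).
payload = '{"name": "x", "index": [0, 1, 2], "data": [4.56, 4.56, 4.56]}'
decoded = dict((str(k), v) for k, v in json.loads(payload).items())
# In pandas this would then become: Series(dtype=None, **decoded)
```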

pandas/io/tests/test_json/test_pandas.py

+10

@@ -289,6 +289,16 @@ def test_series_to_json_except(self):
         s = Series([1, 2, 3])
         self.assertRaises(ValueError, s.to_json, orient="garbage")

+    def test_series_from_json_precise_float(self):
+        s = Series([4.56, 4.56, 4.56])
+        result = read_json(s.to_json(), typ='series', precise_float=True)
+        assert_series_equal(result, s)
+
+    def test_frame_from_json_precise_float(self):
+        df = DataFrame([[4.56, 4.56, 4.56], [4.56, 4.56, 4.56]])
+        result = read_json(df.to_json(), precise_float=True)
+        assert_frame_equal(result, df)
+
     def test_typ(self):

         s = Series(range(6), index=['a','b','c','d','e','f'], dtype='int64')
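The new tests demand an exact float round trip through JSON. The stdlib `json` module already gives that guarantee — the encoder emits the shortest repr that round-trips and the decoder parses with correct rounding — which is the same behavior `precise_float=True` asks of ujson's decoder:

```python
import json

# Exact round trip: dumps() emits the shortest round-tripping repr and
# loads() parses with correct rounding, so equality holds bit-for-bit.
values = [4.56, 4.56, 4.56]
assert json.loads(json.dumps(values)) == values
```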
