csv_import: Thousands separator works in floating point numbers #4598

Merged — 3 commits, Aug 23, 2013
20 changes: 11 additions & 9 deletions doc/source/io.rst
@@ -118,9 +118,11 @@ They can take a number of arguments:
     date_converters.py
 - ``dayfirst``: if True then uses the DD/MM international/European date format
   (This is False by default)
-- ``thousands``: sepcifies the thousands separator. If not None, then parser
-  will try to look for it in the output and parse relevant data to integers.
-  Because it has to essentially scan through the data again, this causes a
+- ``thousands``: specifies the thousands separator. If not None, this character will
+  be stripped from numeric dtypes. However, if it is the first character in a field,
+  that column will be imported as a string. In the PythonParser, if not None,
+  then parser will try to look for it in the output and parse relevant data to numeric
+  dtypes. Because it has to essentially scan through the data again, this causes a
   significant performance hit so only use if necessary.
 - ``lineterminator`` : string (length 1), default ``None``, Character to break file into lines. Only valid with C parser
 - ``quotechar`` : string, The character to used to denote the start and end of a quoted item.
@@ -506,8 +508,8 @@ DD/MM/YYYY instead. For convenience, a ``dayfirst`` keyword is provided:
 
 Thousand Separators
 ~~~~~~~~~~~~~~~~~~~
-For large integers that have been written with a thousands separator, you can
-set the ``thousands`` keyword to ``True`` so that integers will be parsed
+For large numbers that have been written with a thousands separator, you can
+set the ``thousands`` keyword to a string of length 1 so that integers will be parsed
 correctly:
 
 .. ipython:: python
@@ -521,7 +523,7 @@ correctly:
 
    with open('tmp.csv', 'w') as fh:
       fh.write(data)
 
-By default, integers with a thousands separator will be parsed as strings
+By default, numbers with a thousands separator will be parsed as strings
 
 .. ipython:: python
 
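The fix in the hunks above can be exercised with a short sketch (a minimal example, assuming a pandas version with this fix installed; the sample data, file-free `StringIO` usage, and column names are invented for illustration):

```python
import io
import pandas as pd

# Floats and integers written with ',' as the thousands separator --
# the floating point case is what this PR fixes
data = (
    "ID|level|category\n"
    "Patient1|123,000.5|x\n"
    "Patient2|23,000|y\n"
    "Patient3|1,234,018.9|z\n"
)

# Without ``thousands`` the separator prevents numeric parsing,
# so the column comes back as strings (object dtype)
df_raw = pd.read_csv(io.StringIO(data), sep="|")
print(df_raw["level"].dtype)   # object

# With ``thousands=','`` both the integer and the float rows parse numerically
df = pd.read_csv(io.StringIO(data), sep="|", thousands=",")
print(df["level"].dtype)       # float64
```

Note that ``thousands`` takes the separator character itself (a length-1 string), not a boolean — the doc hunk above corrects exactly that point.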
@@ -1123,7 +1125,7 @@ is ``None``. To explicity force ``Series`` parsing, pass ``typ=series``
 - ``numpy`` : direct decoding to numpy arrays. default is False;
   Note that the JSON ordering **MUST** be the same for each term if ``numpy=True``
 - ``precise_float`` : boolean, default ``False``. Set to enable usage of higher precision (strtod) function when decoding string to double values. Default (``False``) is to use fast but less precise builtin functionality
-- ``date_unit`` : string, the timestamp unit to detect if converting dates. Default
+- ``date_unit`` : string, the timestamp unit to detect if converting dates. Default
 None. By default the timestamp precision will be detected, if this is not desired
 then pass one of 's', 'ms', 'us' or 'ns' to force timestamp precision to
 seconds, milliseconds, microseconds or nanoseconds respectively.
@@ -1201,11 +1203,11 @@ nanoseconds
    dfju
 
    # Let Pandas detect the correct precision
-   dfju = pd.read_json(json)
+   dfju = pd.read_json(json)
    dfju
 
    # Or specify that all timestamps are in nanoseconds
-   dfju = pd.read_json(json, date_unit='ns')
+   dfju = pd.read_json(json, date_unit='ns')
    dfju
 
 .. ipython:: python
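The ``date_unit`` hunk above can be sketched in current pandas as follows (an illustrative example: the JSON payload is invented, and the column name ``date`` is chosen because ``convert_dates`` recognizes it by default):

```python
import io
import pandas as pd

# Two timestamps written as nanoseconds since the epoch
# (1356998400 s == 2013-01-01 00:00:00 UTC)
json = '{"date": {"0": 1356998400000000000, "1": 1357084800000000000}}'

# Let pandas detect the precision from the magnitude of the values
detected = pd.read_json(io.StringIO(json))

# Or force nanosecond precision explicitly
forced = pd.read_json(io.StringIO(json), date_unit='ns')

print(forced['date'].iloc[0])  # 2013-01-01 00:00:00
```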
2 changes: 2 additions & 0 deletions doc/source/release.rst
@@ -272,6 +272,8 @@ See :ref:`Internal Refactoring<whatsnew_0130.refactoring>`
 - Fix selection with ``ix/loc`` and non_unique selectors (:issue:`4619`)
 - Fix assignment with iloc/loc involving a dtype change in an existing column (:issue:`4312`)
   have internal setitem_with_indexer in core/indexing to use Block.setitem
+- Fixed bug where thousands operator was not handled correctly for floating point numbers
+  in csv_import (:issue:`4322`)
 
 pandas 0.12
 ===========
3 changes: 3 additions & 0 deletions doc/source/v0.13.0.txt
@@ -278,6 +278,9 @@ Bug Fixes
 
 - Suppressed DeprecationWarning associated with internal calls issued by repr() (:issue:`4391`)
 
+- Fixed bug where thousands operator was not handled correctly for floating point numbers
+  in csv_import (:issue:`4322`)
+
 See the :ref:`full release notes
 <release>` or issue tracker
 on GitHub for a complete list.