Commit 1d1317d

DOC: string columns update in io.rst/HDFStrore
1 parent 3653994 commit 1d1317d

1 file changed

+28
-3
lines changed

doc/source/io.rst

@@ -2404,10 +2404,18 @@ conversion may not be necessary in future versions of pandas)
 String Columns
 ~~~~~~~~~~~~~~
 
-The underlying implementation of ``HDFStore`` uses a fixed column width (itemsize) for string columns. A string column itemsize is calculated as the maximum of the
-length of data (for that column) that is passed to the ``HDFStore``, **in the first append**. Subsequent appends, may introduce a string for a column **larger** than the column can hold, an Exception will be raised (otherwise you could have a silent truncation of these columns, leading to loss of information). In the future we may relax this and allow a user-specified truncation to occur.
+**min_itemsize**
 
-Pass ``min_itemsize`` on the first table creation to a-priori specifiy the minimum length of a particular string column. ``min_itemsize`` can be an integer, or a dict mapping a column name to an integer. You can pass ``values`` as a key to allow all *indexables* or *data_columns* to have this min_itemsize.
+The underlying implementation of ``HDFStore`` uses a fixed column width (itemsize) for string columns.
+A string column itemsize is calculated as the maximum of the
+length of data (for that column) that is passed to the ``HDFStore``, **in the first append**. Subsequent appends,
+may introduce a string for a column **larger** than the column can hold, an Exception will be raised (otherwise you
+could have a silent truncation of these columns, leading to loss of information). In the future we may relax this and
+allow a user-specified truncation to occur.
+
+Pass ``min_itemsize`` on the first table creation to a-priori specifiy the minimum length of a particular string column.
+``min_itemsize`` can be an integer, or a dict mapping a column name to an integer. You can pass ``values`` as a key to
+allow all *indexables* or *data_columns* to have this min_itemsize.
 
 Starting in 0.11, passing a ``min_itemsize`` dict will cause all passed columns to be created as *data_columns* automatically.
 
@@ -2429,6 +2437,23 @@ Starting in 0.11, passing a ``min_itemsize`` dict will cause all passed columns
    store.append('dfs2', dfs, min_itemsize = { 'A' : 30 })
    store.get_storer('dfs2').table
 
+**nan_rep**
+
+String columns will serialize a ``np.nan`` (a missing value) with the ``nan_rep`` string representation. This defaults to the string value ``nan``.
+You could inadvertently turn an actual ``nan`` value into a missing value.
+
+.. ipython:: python
+
+   dfss = DataFrame(dict(A = ['foo','bar','nan']))
+   dfss
+
+   store.append('dfss', dfss)
+   store.select('dfss')
+
+   # here you need to specify a different nan rep
+   store.append('dfss2', dfss, nan_rep='_nan_')
+   store.select('dfss2')
+
 External Compatibility
 ~~~~~~~~~~~~~~~~~~~~~~
 
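
The two rules documented in this diff (a width fixed by the first append, and missing values written as a sentinel string) can be modeled in a few lines of plain Python. This is a toy sketch, not pandas or PyTables code: the class `FixedWidthStringColumn` and its methods are hypothetical names invented for illustration, width is counted in characters rather than bytes, and a missing value is represented by `None` or a float NaN.

```python
import math


class FixedWidthStringColumn:
    """Toy model of a fixed-width string column (hypothetical, not the pandas API)."""

    def __init__(self, min_itemsize=0, nan_rep="nan"):
        self.min_itemsize = min_itemsize
        self.nan_rep = nan_rep
        self.itemsize = None  # decided by the first append
        self._cells = []

    def _encode(self, value):
        # Missing values are stored as the nan_rep string.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return self.nan_rep
        return value

    def append(self, values):
        encoded = [self._encode(v) for v in values]
        longest = max((len(s) for s in encoded), default=0)
        if self.itemsize is None:
            # First append: width is the longest string, or min_itemsize if larger.
            self.itemsize = max(longest, self.min_itemsize)
        elif longest > self.itemsize:
            # Later appends: raise instead of silently truncating.
            raise ValueError(
                f"string of length {longest} exceeds itemsize {self.itemsize}"
            )
        self._cells.extend(encoded)

    def read(self):
        # Any stored cell equal to nan_rep is read back as missing.
        return [None if s == self.nan_rep else s for s in self._cells]


# Width is fixed by the first append, so a longer string later is rejected.
narrow = FixedWidthStringColumn()
narrow.append(["foo", "bar"])
try:
    narrow.append(["longer"])
    overflowed = False
except ValueError:
    overflowed = True

# Reserving room up front lets the longer string fit.
roomy = FixedWidthStringColumn(min_itemsize=30)
roomy.append(["foo", "bar"])
roomy.append(["a much longer string"])

# With the default sentinel, a literal "nan" string is read back as missing.
clobbered = FixedWidthStringColumn()
clobbered.append(["foo", "bar", "nan"])

# A different sentinel keeps the literal "nan" intact.
safe = FixedWidthStringColumn(nan_rep="_nan_")
safe.append(["foo", "bar", "nan", None])
```

In this model `narrow` ends up with an `itemsize` of 3 and refuses the six-character string, while `roomy` accepts it because 30 characters were reserved on the first append. The last two objects mirror the `dfss` and `dfss2` cases in the diff: only the column written with `nan_rep="_nan_"` can tell the string `"nan"` apart from a missing value.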
