DOC: doc/source/whatsnew #36857

Merged
merged 8 commits into from Oct 5, 2020
Changes from 6 commits
29 changes: 16 additions & 13 deletions doc/source/whatsnew/v0.10.0.rst
@@ -49,8 +49,8 @@ talking about:
:okwarning:

import pandas as pd
-df = pd.DataFrame(np.random.randn(6, 4),
-                  index=pd.date_range('1/1/2000', periods=6))
+
+df = pd.DataFrame(np.random.randn(6, 4), index=pd.date_range("1/1/2000", periods=6))
df
# deprecated now
df - df[0]
@@ -184,12 +184,14 @@ labeled the aggregated group with the end of the interval: the next day).

import io

-data = ('a,b,c\n'
-        '1,Yes,2\n'
-        '3,No,4')
+data = """
+a,b,c
+1,Yes,2
+3,No,4
+"""
print(data)
pd.read_csv(io.StringIO(data), header=None)
-pd.read_csv(io.StringIO(data), header=None, prefix='X')
+pd.read_csv(io.StringIO(data), header=None, prefix="X")

- Values like ``'Yes'`` and ``'No'`` are not interpreted as boolean by default,
though this can be controlled by new ``true_values`` and ``false_values``
@@ -199,7 +201,7 @@ labeled the aggregated group with the end of the interval: the next day).

print(data)
pd.read_csv(io.StringIO(data))
-pd.read_csv(io.StringIO(data), true_values=['Yes'], false_values=['No'])
+pd.read_csv(io.StringIO(data), true_values=["Yes"], false_values=["No"])

- The file parsers will not recognize non-string values arising from a
converter function as NA if passed in the ``na_values`` argument. It's better
@@ -210,10 +212,10 @@ labeled the aggregated group with the end of the interval: the next day).

.. ipython:: python

-s = pd.Series([np.nan, 1., 2., np.nan, 4])
+s = pd.Series([np.nan, 1.0, 2.0, np.nan, 4])
s
s.fillna(0)
-s.fillna(method='pad')
+s.fillna(method="pad")

Convenience methods ``ffill`` and ``bfill`` have been added:

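The release note's own example for these methods is collapsed in this view; a minimal sketch of how the shorthands behave (illustrative only, not the code from the note):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, 2.0, np.nan, 4])

# ffill() propagates the last valid value forward,
# bfill() propagates the next valid value backward
forward = s.ffill()
backward = s.bfill()

print(forward.tolist())   # [nan, 1.0, 2.0, 2.0, 4.0]
print(backward.tolist())  # [1.0, 1.0, 2.0, 4.0, 4.0]
```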
@@ -229,7 +231,8 @@ Convenience methods ``ffill`` and ``bfill`` have been added:
.. ipython:: python

def f(x):
-    return pd.Series([x, x**2], index=['x', 'x^2'])
+    return pd.Series([x, x ** 2], index=["x", "x^2"])
+

s = pd.Series(np.random.rand(5))
s
@@ -272,20 +275,20 @@ The old behavior of printing out summary information can be achieved via the

.. ipython:: python

-pd.set_option('expand_frame_repr', False)
+pd.set_option("expand_frame_repr", False)

wide_frame

.. ipython:: python
:suppress:

-pd.reset_option('expand_frame_repr')
+pd.reset_option("expand_frame_repr")

The width of each line can be changed via 'line_width' (80 by default):

.. code-block:: python

-pd.set_option('line_width', 40)
+pd.set_option("line_width", 40)

wide_frame

64 changes: 34 additions & 30 deletions doc/source/whatsnew/v0.10.1.rst
@@ -45,49 +45,51 @@ You may need to upgrade your existing data files. Please visit the

import os

-os.remove('store.h5')
+os.remove("store.h5")

You can designate (and index) certain columns that you want to be able to
perform queries on a table, by passing a list to ``data_columns``

.. ipython:: python

-store = pd.HDFStore('store.h5')
-df = pd.DataFrame(np.random.randn(8, 3),
-                  index=pd.date_range('1/1/2000', periods=8),
-                  columns=['A', 'B', 'C'])
-df['string'] = 'foo'
-df.loc[df.index[4:6], 'string'] = np.nan
-df.loc[df.index[7:9], 'string'] = 'bar'
-df['string2'] = 'cool'
+store = pd.HDFStore("store.h5")
+df = pd.DataFrame(
+    np.random.randn(8, 3),
+    index=pd.date_range("1/1/2000", periods=8),
+    columns=["A", "B", "C"],
+)
+df["string"] = "foo"
+df.loc[df.index[4:6], "string"] = np.nan
+df.loc[df.index[7:9], "string"] = "bar"
+df["string2"] = "cool"
df

# on-disk operations
-store.append('df', df, data_columns=['B', 'C', 'string', 'string2'])
-store.select('df', "B>0 and string=='foo'")
+store.append("df", df, data_columns=["B", "C", "string", "string2"])
+store.select("df", "B>0 and string=='foo'")

# this is in-memory version of this type of selection
-df[(df.B > 0) & (df.string == 'foo')]
+df[(df.B > 0) & (df.string == "foo")]

Retrieving unique values in an indexable or data column.

.. code-block:: python

# note that this is deprecated as of 0.14.0
# can be replicated by: store.select_column('df','index').unique()
-store.unique('df', 'index')
-store.unique('df', 'string')
+store.unique("df", "index")
+store.unique("df", "string")

You can now store ``datetime64`` in data columns

.. ipython:: python

df_mixed = df.copy()
-df_mixed['datetime64'] = pd.Timestamp('20010102')
-df_mixed.loc[df_mixed.index[3:4], ['A', 'B']] = np.nan
+df_mixed["datetime64"] = pd.Timestamp("20010102")
+df_mixed.loc[df_mixed.index[3:4], ["A", "B"]] = np.nan

-store.append('df_mixed', df_mixed)
-df_mixed1 = store.select('df_mixed')
+store.append("df_mixed", df_mixed)
+df_mixed1 = store.select("df_mixed")
df_mixed1
df_mixed1.dtypes.value_counts()

@@ -97,7 +99,7 @@ columns, this is equivalent to passing a

.. ipython:: python

-store.select('df', columns=['A', 'B'])
+store.select("df", columns=["A", "B"])

``HDFStore`` now serializes MultiIndex dataframes when appending tables.

@@ -160,29 +162,31 @@ combined result, by using ``where`` on a selector table.

.. ipython:: python

-df_mt = pd.DataFrame(np.random.randn(8, 6),
-                     index=pd.date_range('1/1/2000', periods=8),
-                     columns=['A', 'B', 'C', 'D', 'E', 'F'])
-df_mt['foo'] = 'bar'
+df_mt = pd.DataFrame(
+    np.random.randn(8, 6),
+    index=pd.date_range("1/1/2000", periods=8),
+    columns=["A", "B", "C", "D", "E", "F"],
+)
+df_mt["foo"] = "bar"

# you can also create the tables individually
-store.append_to_multiple({'df1_mt': ['A', 'B'], 'df2_mt': None},
-                         df_mt, selector='df1_mt')
+store.append_to_multiple(
+    {"df1_mt": ["A", "B"], "df2_mt": None}, df_mt, selector="df1_mt"
+)
store

# individual tables were created
-store.select('df1_mt')
-store.select('df2_mt')
+store.select("df1_mt")
+store.select("df2_mt")

# as a multiple
-store.select_as_multiple(['df1_mt', 'df2_mt'], where=['A>0', 'B>0'],
-                         selector='df1_mt')
+store.select_as_multiple(["df1_mt", "df2_mt"], where=["A>0", "B>0"], selector="df1_mt")

.. ipython:: python
:suppress:

store.close()
-os.remove('store.h5')
+os.remove("store.h5")

**Enhancements**

50 changes: 28 additions & 22 deletions doc/source/whatsnew/v0.12.0.rst
@@ -47,7 +47,7 @@ API changes

.. ipython:: python

-p = pd.DataFrame({'first': [4, 5, 8], 'second': [0, 0, 3]})
+p = pd.DataFrame({"first": [4, 5, 8], "second": [0, 0, 3]})
p % 0
p % p
p / p
@@ -95,8 +95,8 @@ API changes

.. ipython:: python

-df = pd.DataFrame(range(5), index=list('ABCDE'), columns=['a'])
-mask = (df.a % 2 == 0)
+df = pd.DataFrame(range(5), index=list("ABCDE"), columns=["a"])
+mask = df.a % 2 == 0
mask

# this is what you should use
@@ -141,21 +141,24 @@ API changes
.. code-block:: python

from pandas.io.parsers import ExcelFile
-xls = ExcelFile('path_to_file.xls')
-xls.parse('Sheet1', index_col=None, na_values=['NA'])
+
+xls = ExcelFile("path_to_file.xls")
+xls.parse("Sheet1", index_col=None, na_values=["NA"])

With

.. code-block:: python

import pandas as pd
-pd.read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])
+
+pd.read_excel("path_to_file.xls", "Sheet1", index_col=None, na_values=["NA"])

- added top-level function ``read_sql`` that is equivalent to the following

.. code-block:: python

from pandas.io.sql import read_frame

read_frame(...)

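The arguments to ``read_frame`` are elided above; as a rough, self-contained sketch of the top-level ``read_sql`` (the in-memory SQLite table here is an assumption for illustration, not part of the original note):

```python
import sqlite3

import pandas as pd

# hypothetical in-memory database standing in for the elided example
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, b TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(1, "x"), (2, "y")])

# read_sql runs the query and returns the result as a DataFrame
df = pd.read_sql("SELECT * FROM t", con)
print(df.shape)  # (2, 2)
```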
- ``DataFrame.to_html`` and ``DataFrame.to_latex`` now accept a path for
@@ -200,7 +203,7 @@ IO enhancements
.. ipython:: python
:okwarning:

-df = pd.DataFrame({'a': range(3), 'b': list('abc')})
+df = pd.DataFrame({"a": range(3), "b": list("abc")})
print(df)
html = df.to_html()
alist = pd.read_html(html, index_col=0)
@@ -248,16 +251,18 @@ IO enhancements
.. ipython:: python

from pandas._testing import makeCustomDataframe as mkdf

df = mkdf(5, 3, r_idx_nlevels=2, c_idx_nlevels=4)
-df.to_csv('mi.csv')
-print(open('mi.csv').read())
-pd.read_csv('mi.csv', header=[0, 1, 2, 3], index_col=[0, 1])
+df.to_csv("mi.csv")
+print(open("mi.csv").read())
+pd.read_csv("mi.csv", header=[0, 1, 2, 3], index_col=[0, 1])

.. ipython:: python
:suppress:

import os
-os.remove('mi.csv')
+
+os.remove("mi.csv")

- Support for ``HDFStore`` (via ``PyTables 3.0.0``) on Python3

@@ -304,8 +309,8 @@ Other enhancements

.. ipython:: python

-df = pd.DataFrame({'a': list('ab..'), 'b': [1, 2, 3, 4]})
-df.replace(regex=r'\s*\.\s*', value=np.nan)
+df = pd.DataFrame({"a": list("ab.."), "b": [1, 2, 3, 4]})
+df.replace(regex=r"\s*\.\s*", value=np.nan)

to replace all occurrences of the string ``'.'`` with zero or more
instances of surrounding white space with ``NaN``.
@@ -314,7 +319,7 @@ Other enhancements

.. ipython:: python

-df.replace('.', np.nan)
+df.replace(".", np.nan)

to replace all occurrences of the string ``'.'`` with ``NaN``.
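A small sketch contrasting the literal and regex forms (the sample frame is illustrative, not from the release note):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["ab", ".", " . ", "d"]})

# plain replace only matches cells that equal "." exactly
literal = df.replace(".", np.nan)

# the regex form also swallows surrounding whitespace
regex = df.replace(regex=r"\s*\.\s*", value=np.nan)

print(literal["a"].isna().tolist())  # [False, True, False, False]
print(regex["a"].isna().tolist())    # [False, True, True, False]
```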

@@ -359,16 +364,16 @@ Other enhancements

.. ipython:: python

-dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc')})
-dff.groupby('B').filter(lambda x: len(x) > 2)
+dff = pd.DataFrame({"A": np.arange(8), "B": list("aabbbbcc")})
+dff.groupby("B").filter(lambda x: len(x) > 2)

Alternatively, instead of dropping the offending groups, we can return a
like-indexed objects where the groups that do not pass the filter are
filled with NaNs.

.. ipython:: python

-dff.groupby('B').filter(lambda x: len(x) > 2, dropna=False)
+dff.groupby("B").filter(lambda x: len(x) > 2, dropna=False)

- Series and DataFrame hist methods now take a ``figsize`` argument (:issue:`3834`)

@@ -397,17 +402,18 @@ Experimental features

from pandas.tseries.offsets import CustomBusinessDay
from datetime import datetime

# As an interesting example, let's look at Egypt where
# a Friday-Saturday weekend is observed.
-weekmask_egypt = 'Sun Mon Tue Wed Thu'
+weekmask_egypt = "Sun Mon Tue Wed Thu"
# They also observe International Workers' Day so let's
# add that for a couple of years
-holidays = ['2012-05-01', datetime(2013, 5, 1), np.datetime64('2014-05-01')]
+holidays = ["2012-05-01", datetime(2013, 5, 1), np.datetime64("2014-05-01")]
bday_egypt = CustomBusinessDay(holidays=holidays, weekmask=weekmask_egypt)
dt = datetime(2013, 4, 30)
print(dt + 2 * bday_egypt)
dts = pd.date_range(dt, periods=5, freq=bday_egypt)
-print(pd.Series(dts.weekday, dts).map(pd.Series('Mon Tue Wed Thu Fri Sat Sun'.split())))
+print(pd.Series(dts.weekday, dts).map(pd.Series("Mon Tue Wed Thu Fri Sat Sun".split())))

Bug fixes
~~~~~~~~~
@@ -430,14 +436,14 @@ Bug fixes
.. ipython:: python
:okwarning:

-strs = 'go', 'bow', 'joe', 'slow'
+strs = "go", "bow", "joe", "slow"
ds = pd.Series(strs)

for s in ds.str:
print(s)

s
-s.dropna().values.item() == 'w'
+s.dropna().values.item() == "w"

The last element yielded by the iterator will be a ``Series`` containing
the last element of the longest string in the ``Series`` with all other