Commit 0d5f9db

Merge pull request #159 from pandas-dev/master
Sync Fork from Upstream Repo
2 parents 6d46116 + 7d2f5ce commit 0d5f9db

92 files changed, +1125 -552 lines changed

codecov.yml

+1 -1
@@ -8,7 +8,7 @@ coverage:
   status:
     project:
       default:
-        target: '82'
+        target: '72'
     patch:
       default:
         target: '50'

doc/source/ecosystem.rst

+8
@@ -369,6 +369,14 @@ far exceeding the performance of the native ``df.to_sql`` method. Internally, it
 Microsoft's BCP utility, but the complexity is fully abstracted away from the end user.
 Rigorously tested, it is a complete replacement for ``df.to_sql``.
 
+`Deltalake <https://pypi.org/project/deltalake>`__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Deltalake python package lets you access tables stored in
+`Delta Lake <https://delta.io/>`__ natively in Python without the need to use Spark or
+JVM. It provides the ``delta_table.to_pyarrow_table().to_pandas()`` method to convert
+any Delta table into Pandas dataframe.
+
 
 .. _ecosystem.out-of-core:
 

doc/source/user_guide/style.ipynb

+23 -11
@@ -243,7 +243,7 @@
     "Table styles are flexible enough to control all individual parts of the table, including column headers and indexes. \n",
     "However, they can be unwieldy to type for individual data cells or for any kind of conditional formatting, so we recommend that table styles are used for broad styling, such as entire rows or columns at a time.\n",
     "\n",
-    "Table styles are also used to control features which can apply to the whole table at once such as greating a generic hover functionality. The `:hover` pseudo-selector, as well as other pseudo-selectors, can only be used this way.\n",
+    "Table styles are also used to control features which can apply to the whole table at once such as creating a generic hover functionality. The `:hover` pseudo-selector, as well as other pseudo-selectors, can only be used this way.\n",
     "\n",
     "To replicate the normal format of CSS selectors and properties (attribute value pairs), e.g. \n",
     "\n",
@@ -295,7 +295,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Next we just add a couple more styling artifacts targeting specific parts of the table, and we add some internally defined CSS classes that we need for the next section. Be careful here, since we are *chaining methods* we need to explicitly instruct the method **not to** ``overwrite`` the existing styles."
+    "Next we just add a couple more styling artifacts targeting specific parts of the table. Be careful here, since we are *chaining methods* we need to explicitly instruct the method **not to** ``overwrite`` the existing styles."
    ]
   },
   {
@@ -308,11 +308,6 @@
     " {'selector': 'th.col_heading', 'props': 'text-align: center;'},\n",
     " {'selector': 'th.col_heading.level0', 'props': 'font-size: 1.5em;'},\n",
     " {'selector': 'td', 'props': 'text-align: center; font-weight: bold;'},\n",
-    " # internal CSS classes\n",
-    " {'selector': '.true', 'props': 'background-color: #e6ffe6;'},\n",
-    " {'selector': '.false', 'props': 'background-color: #ffe6e6;'},\n",
-    " {'selector': '.border-red', 'props': 'border: 2px dashed red;'},\n",
-    " {'selector': '.border-green', 'props': 'border: 2px dashed green;'},\n",
     "], overwrite=False)"
    ]
   },
@@ -394,7 +389,7 @@
     "\n",
     "*New in version 1.2.0*\n",
     "\n",
-    "The [.set_td_classes()][tdclass] method accepts a DataFrame with matching indices and columns to the underlying [Styler][styler]'s DataFrame. That DataFrame will contain strings as css-classes to add to individual data cells: the `<td>` elements of the `<table>`. Here we add our `.true` and `.false` classes that we created previously. We will save adding the borders until the [section on tooltips](#Tooltips).\n",
+    "The [.set_td_classes()][tdclass] method accepts a DataFrame with matching indices and columns to the underlying [Styler][styler]'s DataFrame. That DataFrame will contain strings as css-classes to add to individual data cells: the `<td>` elements of the `<table>`. Rather than use external CSS we will create our classes internally and add them to table style. We will save adding the borders until the [section on tooltips](#Tooltips).\n",
     "\n",
     "[tdclass]: ../reference/api/pandas.io.formats.style.Styler.set_td_classes.rst\n",
     "[styler]: ../reference/api/pandas.io.formats.style.Styler.rst"
@@ -406,6 +401,10 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "s.set_table_styles([ # create internal CSS classes\n",
+    " {'selector': '.true', 'props': 'background-color: #e6ffe6;'},\n",
+    " {'selector': '.false', 'props': 'background-color: #ffe6e6;'},\n",
+    "], overwrite=False)\n",
     "cell_color = pd.DataFrame([['true ', 'false ', 'true ', 'false '], \n",
     " ['false ', 'true ', 'false ', 'true ']], \n",
     " index=df.index, \n",
@@ -622,7 +621,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The only thing left to do for our table is to add the highlighting borders to draw the audience attention to the tooltips. **Setting classes always overwrites** so we need to make sure we add the previous classes."
+    "The only thing left to do for our table is to add the highlighting borders to draw the audience attention to the tooltips. We will create internal CSS classes as before using table styles. **Setting classes always overwrites** so we need to make sure we add the previous classes."
    ]
   },
   {
@@ -631,6 +630,10 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "s.set_table_styles([ # create internal CSS classes\n",
+    " {'selector': '.border-red', 'props': 'border: 2px dashed red;'},\n",
+    " {'selector': '.border-green', 'props': 'border: 2px dashed green;'},\n",
+    "], overwrite=False)\n",
     "cell_border = pd.DataFrame([['border-green ', ' ', ' ', 'border-red '], \n",
     " [' ', ' ', ' ', ' ']], \n",
     " index=df.index, \n",
@@ -1381,7 +1384,7 @@
    "source": [
     "### HTML Escaping\n",
     "\n",
-    "Suppose you have to display HTML within HTML, that can be a bit of pain when the renderer can't distinguish. You can use the `escape` formatting option to handle this. Even use it within a formatter that contains HTML itself."
+    "Suppose you have to display HTML within HTML, that can be a bit of pain when the renderer can't distinguish. You can use the `escape` formatting option to handle this, and even use it within a formatter that contains HTML itself."
    ]
   },
   {
@@ -1400,7 +1403,16 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# df4.style.format(escape=True)"
+    "df4.style.format(escape=True)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df4.style.format('<a href=\"https://pandas.pydata.org\" target=\"_blank\">{}</a>', escape=True)"
    ]
   },
   {

doc/source/whatsnew/v1.3.0.rst

+26
@@ -110,6 +110,30 @@ both XPath 1.0 and XSLT 1.0 is available. (:issue:`27554`)
 
 For more, see :ref:`io.xml` in the user guide on IO tools.
 
+.. _whatsnew_130.dataframe_honors_copy_with_dict:
+
+DataFrame constructor honors ``copy=False`` with dict
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When passing a dictionary to :class:`DataFrame` with ``copy=False``,
+a copy will no longer be made (:issue:`32960`)
+
+.. ipython:: python
+
+    arr = np.array([1, 2, 3])
+    df = pd.DataFrame({"A": arr, "B": arr.copy()}, copy=False)
+    df
+
+``df["A"]`` remains a view on ``arr``:
+
+.. ipython:: python
+
+    arr[0] = 0
+    assert df.iloc[0, 0] == 0
+
+The default behavior when not passing ``copy`` will remain unchanged, i.e.
+a copy will be made.
+
 .. _whatsnew_130.enhancements.other:
 
 Other enhancements
@@ -546,6 +570,8 @@ Conversion
 - Bug in creating a :class:`DataFrame` from an empty ``np.recarray`` not retaining the original dtypes (:issue:`40121`)
 - Bug in :class:`DataFrame` failing to raise ``TypeError`` when constructing from a ``frozenset`` (:issue:`40163`)
 - Bug in :class:`Index` construction silently ignoring a passed ``dtype`` when the data cannot be cast to that dtype (:issue:`21311`)
+- Bug in :class:`DataFrame` construction with a dictionary containing an arraylike with ``ExtensionDtype`` and ``copy=True`` failing to make a copy (:issue:`38939`)
+-
 
 Strings
 ^^^^^^^
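
Reviewer note on the ``copy=False`` entry in the v1.3.0 whatsnew hunk above: a minimal sketch of how the described behavior could be checked, assuming NumPy and a pandas version that includes this change. The helper name `shares_buffer` is hypothetical and not part of the commit. Whether the ``copy=False`` frame still shares memory can depend on the installed pandas version, so the sketch prints that result without claiming a value.

```python
import numpy as np
import pandas as pd


def shares_buffer(df, column, source):
    """Return True if the column's data overlaps the memory of `source`."""
    return bool(np.shares_memory(df[column].to_numpy(), source))


arr = np.array([1, 2, 3])

# Default construction from a dict: per the whatsnew text, a copy is made.
df_default = pd.DataFrame({"A": arr})

# Explicit copy=False: per the whatsnew text, no copy should be made.
df_view = pd.DataFrame({"A": arr}, copy=False)

default_shares = shares_buffer(df_default, "A", arr)
view_shares = shares_buffer(df_view, "A", arr)

# Mutating the source array must not leak into the default (copied) frame.
arr[0] = 0

print("default shares memory:", default_shares)
print("copy=False shares memory:", view_shares)  # version dependent
print("default frame after mutation:", df_default["A"].tolist())
```

The default frame keeps its original values after `arr` is mutated, which is the "behavior remains unchanged" half of the whatsnew entry.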

pandas/_libs/algos.pyx

+4 -58
@@ -794,68 +794,14 @@ def backfill(ndarray[algos_t] old, ndarray[algos_t] new, limit=None) -> ndarray:
     return indexer
 
 
-@cython.boundscheck(False)
-@cython.wraparound(False)
 def backfill_inplace(algos_t[:] values, uint8_t[:] mask, limit=None):
-    cdef:
-        Py_ssize_t i, N
-        algos_t val
-        uint8_t prev_mask
-        int lim, fill_count = 0
-
-    N = len(values)
-
-    # GH#2778
-    if N == 0:
-        return
-
-    lim = validate_limit(N, limit)
-
-    val = values[N - 1]
-    prev_mask = mask[N - 1]
-    for i in range(N - 1, -1, -1):
-        if mask[i]:
-            if fill_count >= lim:
-                continue
-            fill_count += 1
-            values[i] = val
-            mask[i] = prev_mask
-        else:
-            fill_count = 0
-            val = values[i]
-            prev_mask = mask[i]
+    pad_inplace(values[::-1], mask[::-1], limit=limit)
 
 
-@cython.boundscheck(False)
-@cython.wraparound(False)
 def backfill_2d_inplace(algos_t[:, :] values,
                         const uint8_t[:, :] mask,
                         limit=None):
-    cdef:
-        Py_ssize_t i, j, N, K
-        algos_t val
-        int lim, fill_count = 0
-
-    K, N = (<object>values).shape
-
-    # GH#2778
-    if N == 0:
-        return
-
-    lim = validate_limit(N, limit)
-
-    for j in range(K):
-        fill_count = 0
-        val = values[j, N - 1]
-        for i in range(N - 1, -1, -1):
-            if mask[j, i]:
-                if fill_count >= lim:
-                    continue
-                fill_count += 1
-                values[j, i] = val
-            else:
-                fill_count = 0
-                val = values[j, i]
+    pad_2d_inplace(values[:, ::-1], mask[:, ::-1], limit)
 
 
 @cython.boundscheck(False)
@@ -987,10 +933,10 @@ def rank_1d(
         * max: highest rank in group
         * first: ranks assigned in order they appear in the array
         * dense: like 'min', but rank always increases by 1 between groups
-    ascending : boolean, default True
+    ascending : bool, default True
         False for ranks by high (1) to low (N)
     na_option : {'keep', 'top', 'bottom'}, default 'keep'
-    pct : boolean, default False
+    pct : bool, default False
         Compute percentage rank of data within each group
     na_option : {'keep', 'top', 'bottom'}, default 'keep'
         * keep: leave NA values where they are
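
Reviewer note on the `backfill_inplace` change above: the new body delegates to `pad_inplace` on reversed views, which works because a reversed slice of an array is a view, so a forward fill over it writes a backward fill into the caller's data. The following is a rough pure-Python/NumPy sketch of that idea, not the Cython code from the commit. The `pad_inplace` here is a stand-in that mirrors the removed loop in the forward direction, and the `limit=None` handling is a simplifying assumption in place of `validate_limit`.

```python
import numpy as np


def pad_inplace(values, mask, limit=None):
    """Forward-fill masked positions in place (illustrative stand-in)."""
    n = len(values)
    if n == 0:
        return
    # Assumption: no limit means "fill every gap"; validation is omitted.
    lim = n if limit is None else limit
    fill_count = 0
    val = values[0]
    prev_mask = mask[0]
    for i in range(n):
        if mask[i]:
            if fill_count >= lim:
                continue
            fill_count += 1
            values[i] = val
            mask[i] = prev_mask
        else:
            fill_count = 0
            val = values[i]
            prev_mask = mask[i]


def backfill_inplace(values, mask, limit=None):
    """Backward fill expressed as a forward fill over reversed views."""
    # values[::-1] and mask[::-1] are views, so writes reach the originals.
    pad_inplace(values[::-1], mask[::-1], limit=limit)


values = np.array([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])
mask = np.isnan(values)
backfill_inplace(values, mask, limit=1)
print(values)
print(mask)
```

With `limit=1`, only the missing entry nearest to the next valid value is filled in each gap, and a trailing missing entry stays missing because nothing follows it.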

pandas/_libs/groupby.pyx

+25 -43
@@ -402,9 +402,9 @@ def group_any_all(uint8_t[::1] out,
         ordering matching up to the corresponding record in `values`
     values : array containing the truth value of each element
     mask : array indicating whether a value is na or not
-    val_test : str {'any', 'all'}
+    val_test : {'any', 'all'}
         String object dictating whether to use any or all truth testing
-    skipna : boolean
+    skipna : bool
         Flag to ignore nan values during truth testing
 
     Notes
@@ -455,11 +455,11 @@ ctypedef fused complexfloating_t:
 
 @cython.wraparound(False)
 @cython.boundscheck(False)
-def _group_add(complexfloating_t[:, ::1] out,
-               int64_t[::1] counts,
-               ndarray[complexfloating_t, ndim=2] values,
-               const intp_t[:] labels,
-               Py_ssize_t min_count=0):
+def group_add(complexfloating_t[:, ::1] out,
+              int64_t[::1] counts,
+              ndarray[complexfloating_t, ndim=2] values,
+              const intp_t[:] labels,
+              Py_ssize_t min_count=0):
     """
     Only aggregates on axis=0 using Kahan summation
     """
@@ -506,19 +506,13 @@ def _group_add(complexfloating_t[:, ::1] out,
                     out[i, j] = sumx[i, j]
 
 
-group_add_float32 = _group_add['float32_t']
-group_add_float64 = _group_add['float64_t']
-group_add_complex64 = _group_add['float complex']
-group_add_complex128 = _group_add['double complex']
-
-
 @cython.wraparound(False)
 @cython.boundscheck(False)
-def _group_prod(floating[:, ::1] out,
-                int64_t[::1] counts,
-                ndarray[floating, ndim=2] values,
-                const intp_t[:] labels,
-                Py_ssize_t min_count=0):
+def group_prod(floating[:, ::1] out,
+               int64_t[::1] counts,
+               ndarray[floating, ndim=2] values,
+               const intp_t[:] labels,
+               Py_ssize_t min_count=0):
     """
     Only aggregates on axis=0
     """
@@ -560,19 +554,15 @@ def _group_prod(floating[:, ::1] out,
                     out[i, j] = prodx[i, j]
 
 
-group_prod_float32 = _group_prod['float']
-group_prod_float64 = _group_prod['double']
-
-
 @cython.wraparound(False)
 @cython.boundscheck(False)
 @cython.cdivision(True)
-def _group_var(floating[:, ::1] out,
-               int64_t[::1] counts,
-               ndarray[floating, ndim=2] values,
-               const intp_t[:] labels,
-               Py_ssize_t min_count=-1,
-               int64_t ddof=1):
+def group_var(floating[:, ::1] out,
+              int64_t[::1] counts,
+              ndarray[floating, ndim=2] values,
+              const intp_t[:] labels,
+              Py_ssize_t min_count=-1,
+              int64_t ddof=1):
     cdef:
         Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
         floating val, ct, oldmean
@@ -619,17 +609,13 @@ def _group_var(floating[:, ::1] out,
                     out[i, j] /= (ct - ddof)
 
 
-group_var_float32 = _group_var['float']
-group_var_float64 = _group_var['double']
-
-
 @cython.wraparound(False)
 @cython.boundscheck(False)
-def _group_mean(floating[:, ::1] out,
-                int64_t[::1] counts,
-                ndarray[floating, ndim=2] values,
-                const intp_t[::1] labels,
-                Py_ssize_t min_count=-1):
+def group_mean(floating[:, ::1] out,
+               int64_t[::1] counts,
+               ndarray[floating, ndim=2] values,
+               const intp_t[::1] labels,
+               Py_ssize_t min_count=-1):
     cdef:
         Py_ssize_t i, j, N, K, lab, ncounts = len(counts)
         floating val, count, y, t
@@ -675,10 +661,6 @@ def _group_mean(floating[:, ::1] out,
                     out[i, j] = sumx[i, j] / count
 
 
-group_mean_float32 = _group_mean['float']
-group_mean_float64 = _group_mean['double']
-
-
 @cython.wraparound(False)
 @cython.boundscheck(False)
 def group_ohlc(floating[:, ::1] out,
@@ -1083,10 +1065,10 @@ def group_rank(float64_t[:, ::1] out,
         * max: highest rank in group
         * first: ranks assigned in order they appear in the array
         * dense: like 'min', but rank always increases by 1 between groups
-    ascending : boolean, default True
+    ascending : bool, default True
         False for ranks by high (1) to low (N)
     na_option : {'keep', 'top', 'bottom'}, default 'keep'
-    pct : boolean, default False
+    pct : bool, default False
         Compute percentage rank of data within each group
     na_option : {'keep', 'top', 'bottom'}, default 'keep'
         * keep: leave NA values where they are
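
Reviewer note on the `group_add` docstring above ("Only aggregates on axis=0 using Kahan summation"): the sketch below illustrates grouped Kahan (compensated) summation in plain Python for a single column. It is not the Cython implementation; the function name `group_add_kahan` is hypothetical, negative labels are assumed to mean "no group", and `min_count` and missing-value handling are left out.

```python
import math


def group_add_kahan(values, labels, ngroups):
    """Sum `values` per group label, carrying a Kahan compensation term."""
    sums = [0.0] * ngroups
    comp = [0.0] * ngroups  # running low-order error for each group
    for val, lab in zip(values, labels):
        if lab < 0:  # assumption: negative label marks a row with no group
            continue
        y = val - comp[lab]             # re-apply the error lost so far
        t = sums[lab] + y               # low-order bits of y may be lost here
        comp[lab] = (t - sums[lab]) - y  # recover what was lost
        sums[lab] = t
    return sums


values = [0.1] * 10 + [5.0]
labels = [0] * 10 + [-1]
result = group_add_kahan(values, labels, ngroups=2)

print("naive sum of group 0:", sum(values[:10]))
print("Kahan sum of group 0:", result[0])
print("math.fsum reference: ", math.fsum(values[:10]))
```

Keeping one compensation term per group is what lets a single pass over the rows stay accurate even when the rows of a group are interleaved with other groups.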
