
Commit e7edee4

fix_enhancement.py
This commit adds `fix_enhancement.py`, which defines five pytest fixtures that supply ready-made DataFrames for data manipulation and analysis tests.

The `datetime_frame` fixture returns a DataFrame of floats with a DatetimeIndex and four columns named 'A', 'B', 'C', and 'D'. Its values come from `np.random.default_rng(2).standard_normal`, so the seeded generator produces the same numbers on every run. The index is `date_range("2000-01-01", periods=10, freq="B")`, i.e. ten business days, which makes the data more realistic for testing. Because 2000-01-01 fell on a Saturday, the first date in the index is 2000-01-03.

The `float_string_frame` fixture combines four float columns ('A' to 'D', 30 rows) with a column 'foo' holding the constant string "bar", so tests can exercise numeric and string data in one DataFrame. Its index consists of unique strings ("foo_0" to "foo_29"); the `mixed_float_frame` and `mixed_int_frame` fixtures use the same kind of index.

The `mixed_float_frame` fixture mixes float precisions: 'A' and 'B' are `float32`, 'C' is `float16`, and 'D' is `float64`. This supports testing operations that combine different float precisions and must handle the type promotion between them.

The `mixed_int_frame` fixture does the same for integers, with columns of dtype `int32`, `uint64`, `uint8`, and `int64` (every value is 1). This lets tests validate functions that deal with integers of different sizes and signedness.

Finally, the `timezone_frame` fixture holds three datetime columns: 'A' is timezone-naive, 'B' uses US/Eastern, and 'C' uses CET. The middle row of 'B' and 'C' is set to NaT, so tests can check how timezone-aware operations handle missing values.
Together, these fixtures give tests reproducible inputs covering datetime indexes, mixed numeric and string columns, mixed float and integer dtypes, and timezone-aware data with missing values.
1 parent 94a7c14 commit e7edee4


1 file changed: +90 -0 lines changed


fix_enhancement.py

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
```python
import numpy as np
import pytest
from pandas import DataFrame, Index, NaT, date_range


@pytest.fixture
def datetime_frame() -> DataFrame:
    """
    Fixture for DataFrame of floats with DatetimeIndex.

    Columns are ['A', 'B', 'C', 'D'].
    """
    rng = np.random.default_rng(2)
    return DataFrame(
        rng.standard_normal((10, 4)),
        columns=Index(list("ABCD"), dtype=object),
        index=date_range("2000-01-01", periods=10, freq="B"),
    )
```
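The properties promised for `datetime_frame` (fixed seed, ten business days) can be checked directly. pytest does not allow a fixture function to be called directly, so this sketch copies the fixture body into a plain helper; `make_datetime_frame` and `check_datetime_frame` are hypothetical names used only for illustration. In a real test module you would instead accept `datetime_frame` as a test-function parameter.

```python
import numpy as np
import pandas as pd
from pandas import DataFrame, Index, date_range


def make_datetime_frame() -> DataFrame:
    # Hypothetical plain-function copy of the datetime_frame fixture body.
    rng = np.random.default_rng(2)
    return DataFrame(
        rng.standard_normal((10, 4)),
        columns=Index(list("ABCD"), dtype=object),
        index=date_range("2000-01-01", periods=10, freq="B"),
    )


def check_datetime_frame(df: DataFrame) -> None:
    # Ten rows of four float columns.
    assert df.shape == (10, 4)
    assert list(df.columns) == ["A", "B", "C", "D"]
    # 2000-01-01 is a Saturday, so freq="B" rolls the start forward to Monday.
    assert df.index[0] == pd.Timestamp("2000-01-03")
    assert df.index[-1] == pd.Timestamp("2000-01-14")
    # Business-day frequency: no Saturdays (5) or Sundays (6).
    assert (df.index.dayofweek < 5).all()


if __name__ == "__main__":
    first, second = make_datetime_frame(), make_datetime_frame()
    check_datetime_frame(first)
    # Same seed, same values: the frame is identical on every call.
    pd.testing.assert_frame_equal(first, second)
    print(first.head(3))
```

Because the generator is re-created with seed 2 on every call, comparing two independently built frames with `assert_frame_equal` is a cheap way to confirm reproducibility.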
```python
@pytest.fixture
def float_string_frame() -> DataFrame:
    """
    Fixture for DataFrame of floats and strings with index of unique strings.

    Columns are ['A', 'B', 'C', 'D', 'foo'].
    """
    rng = np.random.default_rng(2)
    df = DataFrame(
        rng.standard_normal((30, 4)),
        index=Index([f"foo_{i}" for i in range(30)], dtype=object),
        columns=Index(list("ABCD"), dtype=object),
    )
    df["foo"] = "bar"
    return df
```
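A typical use of `float_string_frame` is checking that numeric code skips the string column. The sketch below is a minimal illustration, assuming the test restricts reductions with `numeric_only=True`; `make_float_string_frame` and `column_means` are hypothetical helpers, not part of the commit.

```python
import numpy as np
import pandas as pd
from pandas import DataFrame, Index


def make_float_string_frame() -> DataFrame:
    # Hypothetical plain-function copy of the float_string_frame fixture body.
    rng = np.random.default_rng(2)
    df = DataFrame(
        rng.standard_normal((30, 4)),
        index=Index([f"foo_{i}" for i in range(30)], dtype=object),
        columns=Index(list("ABCD"), dtype=object),
    )
    df["foo"] = "bar"
    return df


def column_means(df: DataFrame) -> pd.Series:
    # Restrict the reduction to numeric columns so the constant
    # string column 'foo' is skipped.
    return df.mean(numeric_only=True)


if __name__ == "__main__":
    df = make_float_string_frame()
    print(df.dtypes)
    print(column_means(df))
```

`select_dtypes("number")` is another way to pull out only the float columns when a test needs the numeric part as its own DataFrame.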
```python
@pytest.fixture
def mixed_float_frame() -> DataFrame:
    """
    Fixture for DataFrame of different float types with index of unique strings.

    Columns are ['A', 'B', 'C', 'D'].
    """
    rng = np.random.default_rng(2)
    df = DataFrame(
        {
            col: rng.random(30, dtype=dtype)
            for col, dtype in zip(list("ABCD"), ["float32", "float32", "float32", "float64"])
        },
        index=Index([f"foo_{i}" for i in range(30)], dtype=object),
    )
    # Generator.random only supports float32 and float64, so column C is
    # drawn as float32 and then downcast to float16.
    df["C"] = df["C"].astype("float16")
    return df
```
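The point of `mixed_float_frame` is type promotion between float precisions. This sketch, using a hypothetical helper `make_mixed_float_frame` that copies the fixture body, shows what NumPy's promotion rules give for column-wise arithmetic and for converting the whole frame to a single array.

```python
import numpy as np
import pandas as pd
from pandas import DataFrame, Index


def make_mixed_float_frame() -> DataFrame:
    # Hypothetical plain-function copy of the mixed_float_frame fixture body.
    rng = np.random.default_rng(2)
    df = DataFrame(
        {
            col: rng.random(30, dtype=dtype)
            for col, dtype in zip("ABCD", ["float32", "float32", "float32", "float64"])
        },
        index=Index([f"foo_{i}" for i in range(30)], dtype=object),
    )
    df["C"] = df["C"].astype("float16")
    return df


if __name__ == "__main__":
    df = make_mixed_float_frame()
    # Per-column dtypes: float32, float32, float16, float64.
    print(df.dtypes)
    # Arithmetic promotes to the wider float type of the two operands.
    print((df["A"] + df["C"]).dtype)  # float32 + float16
    print((df["A"] + df["D"]).dtype)  # float32 + float64
    # A single array for the whole frame needs one common dtype.
    print(df.to_numpy().dtype)
```

Assertions like these are the kind of cross-precision check the fixture is meant to support: a result that silently drops to `float16` would show up immediately.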
```python
@pytest.fixture
def mixed_int_frame() -> DataFrame:
    """
    Fixture for DataFrame of different int types with index of unique strings.

    Columns are ['A', 'B', 'C', 'D'].
    """
    return DataFrame(
        {
            col: np.ones(30, dtype=dtype)
            for col, dtype in zip(list("ABCD"), ["int32", "uint64", "uint8", "int64"])
        },
        index=Index([f"foo_{i}" for i in range(30)], dtype=object),
    )
```
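Mixing signed and unsigned integers is where `mixed_int_frame` earns its keep. In this sketch (again with a hypothetical helper, `make_mixed_int_frame`, copying the fixture body), adding `int32` to `uint8` stays integer, while `uint64` plus `int64` has no integer type that covers both ranges, so NumPy's promotion rules produce `float64`.

```python
import numpy as np
import pandas as pd
from pandas import DataFrame, Index


def make_mixed_int_frame() -> DataFrame:
    # Hypothetical plain-function copy of the mixed_int_frame fixture body.
    return DataFrame(
        {
            col: np.ones(30, dtype=dtype)
            for col, dtype in zip("ABCD", ["int32", "uint64", "uint8", "int64"])
        },
        index=Index([f"foo_{i}" for i in range(30)], dtype=object),
    )


if __name__ == "__main__":
    df = make_mixed_int_frame()
    print(df.dtypes)
    # int32 + uint8: int32 can represent every uint8 value, so the result is int32.
    print((df["A"] + df["C"]).dtype)
    # uint64 + int64: no integer dtype holds both ranges, so the result is float64.
    print((df["B"] + df["D"]).dtype)
```

A test that expects integer output from a mixed-integer operation can use this frame to catch the unexpected float result from the `uint64`/`int64` pair.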
```python
@pytest.fixture
def timezone_frame() -> DataFrame:
    """
    Fixture for DataFrame of date_range Series with different time zones.

    Columns are ['A', 'B', 'C']; some entries are missing.

               A                         B                         C
    0 2013-01-01 2013-01-01 00:00:00-05:00 2013-01-01 00:00:00+01:00
    1 2013-01-02                       NaT                       NaT
    2 2013-01-03 2013-01-03 00:00:00-05:00 2013-01-03 00:00:00+01:00
    """
    df = DataFrame(
        {
            "A": date_range("20130101", periods=3),
            "B": date_range("20130101", periods=3, tz="US/Eastern"),
            "C": date_range("20130101", periods=3, tz="CET"),
        }
    )
    df.iloc[1, 1] = NaT
    df.iloc[1, 2] = NaT
    return df
```
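A natural consumer of `timezone_frame` is a timezone conversion that must keep NaT in place. The sketch below uses two hypothetical helpers: `make_timezone_frame`, a copy of the fixture body, and `to_utc`, which converts every timezone-aware column to UTC while leaving the naive column 'A' untouched.

```python
import pandas as pd
from pandas import DataFrame, NaT, date_range


def make_timezone_frame() -> DataFrame:
    # Hypothetical plain-function copy of the timezone_frame fixture body.
    df = DataFrame(
        {
            "A": date_range("20130101", periods=3),
            "B": date_range("20130101", periods=3, tz="US/Eastern"),
            "C": date_range("20130101", periods=3, tz="CET"),
        }
    )
    df.iloc[1, 1] = NaT
    df.iloc[1, 2] = NaT
    return df


def to_utc(df: DataFrame) -> DataFrame:
    # Hypothetical helper: convert timezone-aware columns to UTC;
    # naive columns are left as they are.
    out = df.copy()
    for col in out.columns:
        if isinstance(out[col].dtype, pd.DatetimeTZDtype):
            out[col] = out[col].dt.tz_convert("UTC")
    return out


if __name__ == "__main__":
    df = make_timezone_frame()
    print(df.dtypes)
    print(df.isna().sum())
    print(to_utc(df))
```

Midnight in US/Eastern (UTC-5 in January) becomes 05:00 UTC, midnight in CET (UTC+1) becomes 23:00 UTC on the previous day, and the NaT entries stay missing after conversion.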
