fix_enhancement.py

AnandPolamarasetti · web-flow · commit e7edee47c870 · 2024-08-31T19:02:08.000+05:00
This commit makes several changes to the script to improve the pytest fixtures that the tests use for data manipulation and analysis.
 
First, the `datetime_frame` fixture was adjusted so that it generates a DataFrame with a DatetimeIndex and four columns named 'A', 'B', 'C', and 'D'. The values are random floats drawn with `np.random.default_rng(2).standard_normal`; because the generator is seeded, the data is the same on every run. The index uses business-day frequency (`freq="B"`) with a start date of "2000-01-01", so it contains weekdays only, which makes the data more realistic for testing.
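
As a rough illustration of the reproducibility and business-day points above, the sketch below builds the same frame twice outside pytest. `make_datetime_frame` is a hypothetical helper name that copies the fixture body from the diff; it is not part of the commit.

```python
import numpy as np
from pandas import DataFrame, Index, date_range


def make_datetime_frame() -> DataFrame:
    # Hypothetical helper: same body as the datetime_frame fixture in the diff.
    rng = np.random.default_rng(2)
    return DataFrame(
        rng.standard_normal((10, 4)),
        columns=Index(list("ABCD"), dtype=object),
        index=date_range("2000-01-01", periods=10, freq="B"),
    )


first = make_datetime_frame()
second = make_datetime_frame()

# A freshly seeded generator yields identical values on each call.
same_values = first.equals(second)

# dayofweek is 0 (Monday) through 6 (Sunday); business days are 0 to 4.
weekdays_only = bool((first.index.dayofweek < 5).all())

# 2000-01-01 is a Saturday, so the first label is the following Monday.
first_label = first.index[0]
```

Since the start date falls on a weekend, tests that look up rows by date should not assume a row labelled 2000-01-01 exists.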
 
The `float_string_frame` fixture was modified to include a new column, `foo`, that holds a constant string value alongside the four float columns. This makes it possible to test scenarios in which one DataFrame holds both numeric and string data. To make the DataFrame more similar to real-world data, its index consists of unique strings.
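
The following sketch shows one way a test might separate the numeric and string parts of such a frame. It rebuilds the frame inline, using the fixture body from the diff, so that it runs without pytest; the variable names are illustrative only.

```python
import numpy as np
from pandas import DataFrame, Index

# Same construction as the float_string_frame fixture in the diff.
rng = np.random.default_rng(2)
df = DataFrame(
    rng.standard_normal((30, 4)),
    index=Index([f"foo_{i}" for i in range(30)], dtype=object),
    columns=Index(list("ABCD"), dtype=object),
)
df["foo"] = "bar"

# select_dtypes keeps only the float columns and drops the string column.
numeric_columns = list(df.select_dtypes(include="number").columns)

# Every row of the string column holds the same constant.
constant_string = bool((df["foo"] == "bar").all())

# The string index labels do not repeat.
unique_index = df.index.is_unique
```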
 
The `mixed_float_frame` fixture was enhanced to contain columns of different float types: `float32`, `float16`, and `float64`. This is useful for testing operations that mix float precisions and depend on cross-type compatibility.
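
To make the per-column precisions concrete, here is a minimal sketch that rebuilds the frame inline, following the fixture body in the diff, and inspects the dtypes. The final line is only an example of the kind of cross-precision operation a test might exercise.

```python
import numpy as np
from pandas import DataFrame, Index

# Same construction as the mixed_float_frame fixture in the diff.
rng = np.random.default_rng(2)
df = DataFrame(
    {
        col: rng.random(30, dtype=dtype)
        for col, dtype in zip(list("ABCD"), ["float32", "float32", "float32", "float64"])
    },
    index=Index([f"foo_{i}" for i in range(30)], dtype=object),
)
# Generator.random only produces float32 or float64, so the fixture
# downcasts column C to float16 afterwards.
df["C"] = df["C"].astype("float16")

# Map each column name to the name of its dtype.
dtype_names = {col: str(dt) for col, dt in df.dtypes.items()}

# Combining a float16 column with a float64 column upcasts the result.
mixed_sum_dtype = str((df["C"] + df["D"]).dtype)
```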
 
For integer data, the `mixed_int_frame` fixture has been enriched with columns of different integer types: `int32`, `uint64`, `uint8`, and `int64`. This lets tests cover functions that handle integers of different sizes and signedness.
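
A short sketch of why mixing integer types matters in tests: it rebuilds the frame inline, following the fixture body in the diff, and checks what dtype results when two columns are added. The variable names are illustrative only.

```python
import numpy as np
from pandas import DataFrame, Index

# Same construction as the mixed_int_frame fixture in the diff.
df = DataFrame(
    {
        col: np.ones(30, dtype=dtype)
        for col, dtype in zip(list("ABCD"), ["int32", "uint64", "uint8", "int64"])
    },
    index=Index([f"foo_{i}" for i in range(30)], dtype=object),
)

# Map each column name to the name of its dtype.
dtype_names = {col: str(dt) for col, dt in df.dtypes.items()}

# uint8 + int64: the wider signed type can hold both operands.
small_plus_wide = str((df["C"] + df["D"]).dtype)

# int32 + uint64: no integer type covers both ranges, so NumPy's
# promotion rules fall back to a float result.
signed_plus_unsigned = str((df["A"] + df["B"]).dtype)
```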
 
Finally, the `timezone_frame` fixture was added to cover datetime columns in different time zones. It generates a DataFrame with one timezone-naive column and two timezone-aware columns (US/Eastern and CET), with some entries set to `NaT` to represent missing data, a common case when working with time zone data.
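
As a hedged sketch of what a test could check on this frame, the snippet below rebuilds it inline, following the fixture body in the diff, then counts the missing entries and reads each column's time zone.

```python
from pandas import DataFrame, NaT, date_range

# Same construction as the timezone_frame fixture in the diff.
df = DataFrame(
    {
        "A": date_range("20130101", periods=3),
        "B": date_range("20130101", periods=3, tz="US/Eastern"),
        "C": date_range("20130101", periods=3, tz="CET"),
    }
)
df.iloc[1, 1] = NaT
df.iloc[1, 2] = NaT

# Number of NaT entries in each column.
missing_per_column = {col: int(n) for col, n in df.isna().sum().items()}

# Column A is timezone-naive; B and C keep their zones after the NaT assignment.
tz_a = df["A"].dt.tz
tz_b = str(df["B"].dt.tz)
tz_c = str(df["C"].dt.tz)
```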
 
Overall, these changes broaden the range of data types and formats the fixtures cover (floats, strings, mixed float and integer types, and timezone-aware datetimes), which makes them useful across more data analysis and manipulation tests.
diff --git a/fix_enhancement.py b/fix_enhancement.py
@@ -0,0 +1,90 @@
+import numpy as np
+import pytest
+from pandas import DataFrame, Index, NaT, date_range
+
+@pytest.fixture
+def datetime_frame() -> DataFrame:
+    """
+    Fixture for DataFrame of floats with DatetimeIndex.
+
+    Columns are ['A', 'B', 'C', 'D'].
+    """
+    rng = np.random.default_rng(2)
+    return DataFrame(
+        rng.standard_normal((10, 4)),
+        columns=Index(list("ABCD"), dtype=object),
+        index=date_range("2000-01-01", periods=10, freq="B"),
+    )
+
+@pytest.fixture
+def float_string_frame() -> DataFrame:
+    """
+    Fixture for DataFrame of floats and strings with index of unique strings.
+
+    Columns are ['A', 'B', 'C', 'D', 'foo'].
+    """
+    rng = np.random.default_rng(2)
+    df = DataFrame(
+        rng.standard_normal((30, 4)),
+        index=Index([f"foo_{i}" for i in range(30)], dtype=object),
+        columns=Index(list("ABCD"), dtype=object),
+    )
+    df["foo"] = "bar"
+    return df
+
+@pytest.fixture
+def mixed_float_frame() -> DataFrame:
+    """
+    Fixture for DataFrame of different float types with index of unique strings.
+
+    Columns are ['A', 'B', 'C', 'D'].
+    """
+    rng = np.random.default_rng(2)
+    df = DataFrame(
+        {
+            col: rng.random(30, dtype=dtype)
+            for col, dtype in zip(list("ABCD"), ["float32", "float32", "float32", "float64"])
+        },
+        index=Index([f"foo_{i}" for i in range(30)], dtype=object),
+    )
+    # Convert column C to float16
+    df["C"] = df["C"].astype("float16")
+    return df
+
+@pytest.fixture
+def mixed_int_frame() -> DataFrame:
+    """
+    Fixture for DataFrame of different int types with index of unique strings.
+
+    Columns are ['A', 'B', 'C', 'D'].
+    """
+    return DataFrame(
+        {
+            col: np.ones(30, dtype=dtype)
+            for col, dtype in zip(list("ABCD"), ["int32", "uint64", "uint8", "int64"])
+        },
+        index=Index([f"foo_{i}" for i in range(30)], dtype=object),
+    )
+
+@pytest.fixture
+def timezone_frame() -> DataFrame:
+    """
+    Fixture for DataFrame of date_range Series with different time zones.
+
+    Columns are ['A', 'B', 'C']; some entries are missing.
+
+               A                         B                         C
+    0 2013-01-01 2013-01-01 00:00:00-05:00 2013-01-01 00:00:00+01:00
+    1 2013-01-02                       NaT                       NaT
+    2 2013-01-03 2013-01-03 00:00:00-05:00 2013-01-03 00:00:00+01:00
+    """
+    df = DataFrame(
+        {
+            "A": date_range("20130101", periods=3),
+            "B": date_range("20130101", periods=3, tz="US/Eastern"),
+            "C": date_range("20130101", periods=3, tz="CET"),
+        }
+    )
+    df.iloc[1, 1] = NaT
+    df.iloc[1, 2] = NaT
+    return df