
Commit c1b01c2

zhengruifeng authored and ragnarok56 committed
[SPARK-44891][PYTHON][CONNECT] Enable Doctests of rand, randn and log
### What changes were proposed in this pull request?
I roughly went through all the skipped doctests in `pyspark.sql.functions` and found that we can enable the doctests of `rand`, `randn` and `log` by making them deterministic:
- specify the `numPartitions` in `spark.range` for `rand` and `randn`;
- change the input values for `log`.

### Why are the changes needed?
To enable the doctests of `rand`, `randn` and `log` and improve test coverage.

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
Enabled doctests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#42584 from zhengruifeng/make_doctest_deterministic.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
1 parent 14784f8 commit c1b01c2
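
For context (a reviewer's note, not part of the commit): `rand` and `randn` draw from a stream that is, to my understanding, seeded per partition, so the same `seed` can produce different rows whenever `spark.range` picks a different default parallelism. Pinning `numPartitions` removes that variable. A minimal illustrative session, assuming a running `spark` session:

>>> from pyspark.sql import functions as F
>>> # Default partitioning follows the environment's parallelism, so the
>>> # value drawn for each row can differ from machine to machine.
>>> spark.range(2).withColumn('rand', F.rand(seed=42)).show()  # doctest: +SKIP
>>> # spark.range(start, end, step, numPartitions): pinning the fourth
>>> # argument to 1 fixes the row-to-partition layout, so the seeded
>>> # output is reproducible and a doctest can assert exact values.
>>> spark.range(0, 2, 1, 1).withColumn('rand', F.rand(seed=42)).show()  # doctest: +SKIP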

File tree

1 file changed, +30 -29 lines changed


python/pyspark/sql/functions.py

+30 -29
@@ -4616,13 +4616,13 @@ def rand(seed: Optional[int] = None) -> Column:
 
     Examples
     --------
-    >>> df = spark.range(2)
-    >>> df.withColumn('rand', rand(seed=42) * 3).show()  # doctest: +SKIP
+    >>> from pyspark.sql import functions as F
+    >>> spark.range(0, 2, 1, 1).withColumn('rand', F.rand(seed=42) * 3).show()
     +---+------------------+
     | id|              rand|
     +---+------------------+
-    |  0|1.4385751892400076|
-    |  1|1.7082186019706387|
+    |  0|1.8575681106759028|
+    |  1|1.5288056527339444|
     +---+------------------+
     """
     if seed is not None:
@@ -4657,14 +4657,14 @@ def randn(seed: Optional[int] = None) -> Column:
 
     Examples
     --------
-    >>> df = spark.range(2)
-    >>> df.withColumn('randn', randn(seed=42)).show()  # doctest: +SKIP
-    +---+--------------------+
-    | id|               randn|
-    +---+--------------------+
-    |  0|-0.04167221574820542|
-    |  1| 0.15241403986452778|
-    +---+--------------------+
+    >>> from pyspark.sql import functions as F
+    >>> spark.range(0, 2, 1, 1).withColumn('randn', F.randn(seed=42)).show()
+    +---+------------------+
+    | id|             randn|
+    +---+------------------+
+    |  0| 2.384479054241165|
+    |  1|0.1920934041293524|
+    +---+------------------+
     """
     if seed is not None:
         return _invoke_function("randn", seed)
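
For reference (an aside, not in the patch): `spark.range(0, 2, 1, 1)` reads as `range(start, end, step, numPartitions)`, i.e. the ids 0 and 1 laid out in exactly one partition, which is what makes the seeded output stable. A quick check, assuming a running `spark` session:

>>> spark.range(0, 2, 1, 1).rdd.getNumPartitions()
1
>>> [row.id for row in spark.range(0, 2, 1, 1).collect()]
[0, 1]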
@@ -5159,26 +5159,27 @@ def log(arg1: Union["ColumnOrName", float], arg2: Optional["ColumnOrName"] = Non
 
     Examples
     --------
-    >>> df = spark.createDataFrame([10, 100, 1000], "INT")
-    >>> df.select(log(10.0, df.value).alias('ten')).show()  # doctest: +SKIP
-    +---+
-    |ten|
-    +---+
-    |1.0|
-    |2.0|
-    |3.0|
-    +---+
+    >>> from pyspark.sql import functions as F
+    >>> df = spark.sql("SELECT * FROM VALUES (1), (2), (4) AS t(value)")
+    >>> df.select(F.log(2.0, df.value).alias('log2_value')).show()
+    +----------+
+    |log2_value|
+    +----------+
+    |       0.0|
+    |       1.0|
+    |       2.0|
+    +----------+
 
     And Natural logarithm
 
-    >>> df.select(log(df.value)).show()  # doctest: +SKIP
-    +-----------------+
-    |        ln(value)|
-    +-----------------+
-    |2.302585092994046|
-    |4.605170185988092|
-    |4.605170185988092|
-    +-----------------+
+    >>> df.select(F.log(df.value).alias('ln_value')).show()
+    +------------------+
+    |          ln_value|
+    +------------------+
+    |               0.0|
+    |0.6931471805599453|
+    |1.3862943611198906|
+    +------------------+
     """
     if arg2 is None:
         return _invoke_function_over_columns("log", cast("ColumnOrName", arg1))
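
The new `log` inputs appear chosen so that every expected cell is a bit-stable float: 1, 2 and 4 are exact powers of two, so the base-2 logarithms are exactly 0.0, 1.0 and 2.0, and the natural logarithms match CPython's default float reprs. A quick sanity check in plain Python (an illustrative aside, not part of the patch):

>>> import math
>>> [math.log(v, 2) for v in (1, 2, 4)]
[0.0, 1.0, 2.0]
>>> math.log(2), math.log(4)
(0.6931471805599453, 1.3862943611198906)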
