Commit 71703d6

jameslamb authored and zhengruifeng committed
[MINOR][DOCS] clarify array_position return value
### What changes were proposed in this pull request?

Proposes a slight modification to the docs for SQL function `array_position()`, to clarify the return value and the function's behavior when no match is found.

### Why are the changes needed?

In my opinion, the docs at https://spark.apache.org/docs/latest/api/sql/index.html#array_position leave too much room for confusion.

> *array_position(array, element) - Returns the (1-based) index of the first element of the array as long.*
>
> Examples:
> ```
> SELECT array_position(array(3, 2, 1), 1);
> 3
> ```

Because the return value also happens to be a value in the example array, and because the doc says "first element" instead of "first matching element", I think it'd be easy to misunderstand this function as doing something like "given `array_position(a, i)`, return `a[i]`".

The code comment on this function is very clear, but I think the user-facing docs would benefit from similar clarification: https://github.com/apache/spark/blob/05bc73a83921a9e606609a12750932f95bd5b3f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L2134-L2141

This PR proposes modifying the doc to remove such confusion, by:

* providing an example array and search values which are unlikely to be confused with indices
* modifying the example such that it shows the behavior when `array_position(a, v)` is called on an array `a` containing multiple instances of `v`
* adding an example showing the behavior of this function when no match is found

### Does this PR introduce _any_ user-facing change?

Yes, minor user-facing documentation change.

### How was this patch tested?

I did not test this patch at all; I just followed the patterns I saw elsewhere in the same file for documentation strings. I did check that this was the only place this text came from, like this:

```shell
git grep 'index of the first element of the array as long'
```

### Notes for Reviewers

Thanks very much for your time and consideration.

Closes #41892 from jameslamb/docs/array-position.

Authored-by: James Lamb <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
1 parent ea6bacf commit 71703d6

2 files changed: +6 -3 lines changed

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala

+5-2
@@ -2141,12 +2141,15 @@ case class ArrayMax(child: Expression)
  */
 @ExpressionDescription(
   usage = """
-    _FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long.
+    _FUNC_(array, element) - Returns the (1-based) index of the first matching element of
+      the array as long, or 0 if no match is found.
   """,
   examples = """
     Examples:
-      > SELECT _FUNC_(array(3, 2, 1), 1);
+      > SELECT _FUNC_(array(312, 773, 708, 708), 708);
        3
+      > SELECT _FUNC_(array(312, 773, 708, 708), 414);
+       0
   """,
   group = "array_funcs",
   since = "2.4.0")
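The revised usage text pins down three behaviors: the returned index is 1-based, it refers to the first match when the value occurs more than once, and 0 means no match was found. The plain-Scala function below is a minimal sketch that models only those documented rules on an in-memory `Seq`. `ArrayPositionSketch` and `arrayPosition` are hypothetical names, not Spark APIs, and the sketch deliberately ignores SQL-level details such as null arguments and type coercion.

```scala
// Hypothetical sketch: models the documented array_position rules on a plain Seq.
// This is NOT Spark's implementation; nulls and SQL type coercion are ignored.
object ArrayPositionSketch {

  /** 1-based index of the first element equal to `element`, or 0L if none matches. */
  def arrayPosition[T](array: Seq[T], element: T): Long = {
    val i = array.indexOf(element) // 0-based index of the first match, or -1 if absent
    if (i >= 0) i.toLong + 1L else 0L
  }

  def main(args: Array[String]): Unit = {
    val arr = Seq(312, 773, 708, 708)
    println(arrayPosition(arr, 708))          // prints 3 (first of the two 708s)
    println(arrayPosition(arr, 414))          // prints 0 (no match)
    println(arrayPosition(Seq(3, 2, 1), 1))   // prints 3 (the old, ambiguous example)
  }
}
```

The last call shows why the old example was easy to misread: for `array(3, 2, 1)` and search value `1`, the first-match position (3) is also the element stored at 1-based index 1. In the new examples, 708 and 414 cannot be mistaken for positions in a four-element array.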

sql/core/src/test/resources/sql-functions/sql-expression-schema.md

+1-1
@@ -25,7 +25,7 @@
 | org.apache.spark.sql.catalyst.expressions.ArrayJoin | array_join | SELECT array_join(array('hello', 'world'), ' ') | struct<array_join(array(hello, world), ):string> |
 | org.apache.spark.sql.catalyst.expressions.ArrayMax | array_max | SELECT array_max(array(1, 20, null, 3)) | struct<array_max(array(1, 20, NULL, 3)):int> |
 | org.apache.spark.sql.catalyst.expressions.ArrayMin | array_min | SELECT array_min(array(1, 20, null, 3)) | struct<array_min(array(1, 20, NULL, 3)):int> |
-| org.apache.spark.sql.catalyst.expressions.ArrayPosition | array_position | SELECT array_position(array(3, 2, 1), 1) | struct<array_position(array(3, 2, 1), 1):bigint> |
+| org.apache.spark.sql.catalyst.expressions.ArrayPosition | array_position | SELECT array_position(array(312, 773, 708, 708), 708) | struct<array_position(array(312, 773, 708, 708), 708):bigint> |
 | org.apache.spark.sql.catalyst.expressions.ArrayPrepend | array_prepend | SELECT array_prepend(array('b', 'd', 'c', 'a'), 'd') | struct<array_prepend(array(b, d, c, a), d):array<string>> |
 | org.apache.spark.sql.catalyst.expressions.ArrayRemove | array_remove | SELECT array_remove(array(1, 2, 3, null, 3), 3) | struct<array_remove(array(1, 2, 3, NULL, 3), 3):array<int>> |
 | org.apache.spark.sql.catalyst.expressions.ArrayRepeat | array_repeat | SELECT array_repeat('123', 2) | struct<array_repeat(123, 2):array<string>> |
