Add support for first, last aggregate function parsing #880

mustafasrepo · 2023-05-11T11:28:04Z

Closes #877

This PR adds parsing support for the FIRST and LAST aggregate functions.
For example, consider the queries below:

SELECT FIRST(x ORDER BY x) AS a FROM T

or

SELECT LAST(x ORDER BY x) AS a FROM T

With this PR, both queries above parse successfully.
Several common dialects support this syntax, for instance DuckDB (see link) and Databricks (see link).

coveralls · 2023-05-11T11:32:10Z

Pull Request Test Coverage Report for Build 4947357822

46 of 78 (58.97%) changed or added relevant lines in 3 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage decreased (-0.1%) to 86.045%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/ast/mod.rs	18	22	81.82%
src/parser.rs	24	52	46.15%

Totals
Change from base Build 4931911477:	-0.1%
Covered Lines:	14342
Relevant Lines:	16668

💛 - Coveralls

alamb

Thanks @mustafasrepo -- sorry for the delay in reviewing.

alamb · 2023-05-17T17:00:54Z

src/ast/mod.rs

@@ -3549,6 +3555,66 @@ impl fmt::Display for ArrayAgg {
    }
 }

+/// An `FIRST` invocation `FIRST( [ DISTINCT ] <expr> [ORDER BY <expr>] [LIMIT <n>] )`


I think these new structs follow the existing patterns in this file, but they seem very specific to me. Do we really need two new structs that are almost identical to each other as well as to Function?

Did you consider adding an order_by clause to Function and adjusting the parse_function function?

So something like

pub struct Function {
    pub name: ObjectName,
    pub args: Vec<FunctionArg>,
    pub over: Option<WindowSpec>,
    // aggregate functions may specify eg `COUNT(DISTINCT x)`
    pub distinct: bool,
    // Some functions must be called without trailing parentheses, for example Postgres
    // do it for current_catalog, current_schema, etc. This flags is used for formatting.
    pub special: bool,
    // optional ORDER BY expression <----------- Proposed Addition
    pub order_by: Option<Box<OrderByExpr>>,
}

I think this would reduce the code required for this PR, make the codebase easier to maintain, and extend naturally to other aggregate functions (e.g. Databricks offers the first_value function in addition to first).

Adding an order_by to Function would allow users to parse any function with an ORDER BY clause, which I think would be quite valuable.

mustafasrepo · 2023-05-18

I hadn't considered this, but it seems like a better approach to me. I will experiment with it and let you know the result. Thanks for the suggestion.
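To make the suggestion in this thread concrete, here is a minimal, self-contained sketch of a single function node that carries an optional ORDER BY clause and formats it back to SQL. The types below are simplified, hypothetical stand-ins rather than the actual sqlparser AST: arguments and expressions are plain strings, the `over` and `special` fields are omitted, and the `new` and `with_order_by` helpers are invented for this illustration.

```rust
use std::fmt;

// Hypothetical, simplified stand-in for an ORDER BY expression.
#[derive(Debug, Clone, PartialEq)]
struct OrderByExpr {
    expr: String,
    asc: Option<bool>, // None means no explicit ASC/DESC was written
}

// Hypothetical, simplified stand-in for a function call node.
#[derive(Debug, Clone, PartialEq)]
struct Function {
    name: String,
    args: Vec<String>,
    distinct: bool,
    // The proposed addition: an optional ORDER BY inside the parentheses.
    order_by: Option<Box<OrderByExpr>>,
}

impl Function {
    // Illustrative helper: a one-argument call with no ORDER BY.
    fn new(name: &str, arg: &str) -> Self {
        Function {
            name: name.to_string(),
            args: vec![arg.to_string()],
            distinct: false,
            order_by: None,
        }
    }

    // Illustrative helper: attach an ORDER BY expression.
    fn with_order_by(mut self, expr: &str) -> Self {
        self.order_by = Some(Box::new(OrderByExpr {
            expr: expr.to_string(),
            asc: None,
        }));
        self
    }
}

impl fmt::Display for OrderByExpr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.expr)?;
        match self.asc {
            Some(true) => write!(f, " ASC"),
            Some(false) => write!(f, " DESC"),
            None => Ok(()),
        }
    }
}

impl fmt::Display for Function {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}(", self.name)?;
        if self.distinct {
            write!(f, "DISTINCT ")?;
        }
        write!(f, "{}", self.args.join(", "))?;
        // Only emit the clause when it was present in the input.
        if let Some(order_by) = &self.order_by {
            write!(f, " ORDER BY {}", order_by)?;
        }
        write!(f, ")")
    }
}

fn main() {
    // The same node type covers FIRST, LAST, and ordinary calls.
    println!("{}", Function::new("FIRST", "x").with_order_by("x"));
    println!("{}", Function::new("LAST", "x").with_order_by("x"));
    println!("{}", Function::new("COUNT", "x"));
}
```

With one optional field, FIRST and LAST need no dedicated structs, and any other function written with an ORDER BY inside its parentheses can reuse the same node.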

alamb · 2023-05-17T17:01:40Z

src/parser.rs

+        self.expect_token(&Token::LParen)?;
+        let expr = Box::new(self.parse_expr()?);
+        // ANSI SQL and BigQuery define ORDER BY inside function.
+        if !self.dialect.supports_within_after_array_aggregation() {


I don't understand the call to supports_within_after_array_aggregation here as this is not an array aggregate.

alamb · 2023-05-17T17:02:50Z

tests/sqlparser_common.rs

+
+    for sql in [
+        "SELECT FIRST(x ORDER BY x) AS a FROM T",
+        "SELECT LAST(x ORDER BY x) AS a FROM T",


Given that there is code above to handle WITHIN GROUP during parsing, can you either 1) remove that code, or 2) add test coverage for it?

// Snowflake defines ORDERY BY in within group instead of inside the function like
// ANSI SQL.
self.expect_token(&Token::RParen)?;
let within_group = if self.parse_keywords(&[Keyword::WITHIN, Keyword::GROUP]) {
    self.expect_token(&Token::LParen)?;
    self.expect_keywords(&[Keyword::ORDER, Keyword::BY])?;
    let order_by_expr = self.parse_order_by_expr()?;
    self.expect_token(&Token::RParen)?;
    Some(Box::new(order_by_expr))
} else {
    None
};
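As a rough illustration of what coverage for both code paths could look like, the sketch below implements a toy parser that accepts an ORDER BY inside the parentheses as well as a trailing WITHIN GROUP (ORDER BY ...) clause. Everything here is hypothetical and independent of sqlparser: the `Tokens` type, `parse_agg_call`, and the `AggCall` struct are invented for this example, tokens are split on whitespace and parentheses only, and each expression is a single token.

```rust
// Hypothetical result type for this sketch, not a sqlparser AST node.
#[derive(Debug, PartialEq)]
struct AggCall {
    name: String,
    arg: String,
    order_by: Option<String>,     // ORDER BY inside the parentheses
    within_group: Option<String>, // WITHIN GROUP (ORDER BY ...) after them
}

// A tiny token cursor; parentheses become their own tokens.
struct Tokens {
    items: Vec<String>,
    pos: usize,
}

impl Tokens {
    fn new(sql: &str) -> Self {
        let spaced = sql.replace('(', " ( ").replace(')', " ) ");
        Tokens {
            items: spaced.split_whitespace().map(str::to_string).collect(),
            pos: 0,
        }
    }

    // Consume the next token, whatever it is.
    fn next_token(&mut self) -> Result<String, String> {
        let tok = self
            .items
            .get(self.pos)
            .cloned()
            .ok_or_else(|| "unexpected end of input".to_string())?;
        self.pos += 1;
        Ok(tok)
    }

    // Consume `expected` (case-insensitively) or fail.
    fn expect(&mut self, expected: &str) -> Result<(), String> {
        let tok = self.next_token()?;
        if tok.eq_ignore_ascii_case(expected) {
            Ok(())
        } else {
            Err(format!("expected {}, found {}", expected, tok))
        }
    }

    // Consume the whole keyword sequence if it comes next; otherwise consume nothing.
    fn parse_keywords(&mut self, keywords: &[&str]) -> bool {
        let matched = keywords.iter().enumerate().all(|(i, kw)| {
            self.items
                .get(self.pos + i)
                .map_or(false, |t| t.eq_ignore_ascii_case(kw))
        });
        if matched {
            self.pos += keywords.len();
        }
        matched
    }
}

fn parse_agg_call(sql: &str) -> Result<AggCall, String> {
    let mut t = Tokens::new(sql);
    let name = t.next_token()?.to_ascii_uppercase();
    t.expect("(")?;
    let arg = t.next_token()?;
    // Path 1: ORDER BY written inside the function call.
    let order_by = if t.parse_keywords(&["ORDER", "BY"]) {
        Some(t.next_token()?)
    } else {
        None
    };
    t.expect(")")?;
    // Path 2: ORDER BY written in a trailing WITHIN GROUP clause.
    let within_group = if t.parse_keywords(&["WITHIN", "GROUP"]) {
        t.expect("(")?;
        t.expect("ORDER")?;
        t.expect("BY")?;
        let expr = t.next_token()?;
        t.expect(")")?;
        Some(expr)
    } else {
        None
    };
    Ok(AggCall { name, arg, order_by, within_group })
}

fn main() {
    // Exercise both paths, plus one malformed input.
    println!("{:?}", parse_agg_call("FIRST(x ORDER BY x)"));
    println!("{:?}", parse_agg_call("LAST(x) WITHIN GROUP (ORDER BY y)"));
    println!("{:?}", parse_agg_call("FIRST(x ORDER x)"));
}
```

The point of the sketch is that each branch gets at least one input that reaches it, which is what the request above asks of the real test suite if the WITHIN GROUP code stays.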

mustafasrepo · 2023-05-18T12:46:56Z

@alamb, I have tried your suggestion. The new PR puts the order by field inside Function, and I think the new version is much clearer. Until we decide on the final approach, both PRs will stay open (I have converted this one to a draft). By the way, I have not addressed some of the review comments in this PR, since they are obsolete in the new PR. If we decide to continue with this approach, I will address them. You can find the new PR in the link.

alamb · 2023-05-18T18:59:44Z

I agree, #882 looks great, and I have merged that one in.

mustafasrepo added 4 commits May 9, 2023 16:15

Initial Commit

870b3de

Remove distinct and limit from first and last

1c4c3b8

Merge branch 'main' into feature/first_last_support

4cbadd7

fix buggy test

4d8076a

alamb reviewed May 17, 2023

mustafasrepo marked this pull request as draft May 18, 2023 12:41

mustafasrepo closed this May 18, 2023

mustafasrepo deleted the feature/first_last_support branch May 22, 2023 06:33
