Skip to content

Fail to parse - "select * from (table_name) alias" (Snowflake) #223

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
hwchen opened this issue Jul 9, 2020 · 15 comments · Fixed by #260
Closed

Fail to parse - "select * from (table_name) alias" (Snowflake) #223

hwchen opened this issue Jul 9, 2020 · 15 comments · Fixed by #260

Comments

@hwchen
Copy link

hwchen commented Jul 9, 2020

Related to #154

sqlparser now parses select * from (table_name). However, it does not parse select * from (table_name) alias.

Would it make sense to add this functionality? If so, would that mean adding an alias field to TableWithJoins?

@Dandandan Dandandan changed the title Fail to parse - "select * from (table_name)" Fail to parse - "select * from (table_name) alias" Jul 10, 2020
@hwchen
Copy link
Author

hwchen commented Jul 10, 2020

It looks like select * from (select * from table_name) alias works.

Edit: Ah, of course this is expected, as seen in #154.

@nickolay
Copy link
Contributor

As noted in #154, FROM (table_name) alias is not valid SQL. Which database supports that?

This becomes hard to reason about. The SQL standard allows FROM table_name alias and FROM (tab1 alias1 CROSS JOIN tab2 alias2). We relaxed our parser to allow omitting the JOIN inside the parens, making it accept FROM (tab1 alias1) too. But it doesn't make sense to allow FROM (table_name alias1) alias2. So I'm not sure what grammar the SQL dialect acceptingFROM (table_name) alias uses...

@hwchen
Copy link
Author

hwchen commented Jul 15, 2020

I think it's reasonable not to handle this in this library, I can think of some workarounds. This is snowflake :) and just double-checked that this syntax works.

@nickolay
Copy link
Contributor

Its docs don't explain how it handles this.. Out of curiosity, how does it react to FROM (table_name alias1) and FROM (table_name alias1) alias2?

@eyalleshem
Copy link
Contributor

eyalleshem commented Jul 27, 2020

FROM (table_name alias1) works as expected ,
while
FROM (table_name alias1) alias2 complain about that alias2 is duplicate.

Don't sure exactly what the roles of this syntax ..

@nickolay nickolay changed the title Fail to parse - "select * from (table_name) alias" Fail to parse - "select * from (table_name) alias" (Snowflake) Jul 27, 2020
@nickolay
Copy link
Contributor

Thanks @eyalleshem! I think forking parse_table_factor for Snowflake for now (and using a dialect-based switch in the parser - #241 to integrate it in the mainline) is the proper solution here.

@nickolay
Copy link
Contributor

nickolay commented Jul 28, 2020

In my previous comment I forgot that there's also the problem of representing this in AST. @eyalleshem, as you indicated you'd like to work on this, please start by documenting the grammar the Snowflake dialect will implement here, and how it can fit into our AST.

@eyalleshem
Copy link
Contributor

eyalleshem commented Jul 29, 2020

I opened a PR - should be enable that,
I took 2 assumptions (both of them correct in snowflake ):

  1. Table could not be re-alias (select * from (t as a) as a1 should fail )
  2. Alias not allowed in join of multiple table (select * from (t1 natural join t2) as a1 also should fail)

thats enable me to not change the AST , and in case of select * from ((t)) as a the alias will be add to the TableFactor::Table that represent t .

@nickolay
Copy link
Contributor

Ah, so the only difference on the parser level is that of the three TableFactor variants (Table, Derived, or Nested), we currently allow aliases on the first two only, where your version parses it after Nested(Join) as well, but uses the logic you describe to find the inner TableFactor (Table or Derived) to store it in.

This is a smart solution, but I don't feel we gain much by not changing the AST, and it would be simpler to change NestedJoin to a TableFactor::Nested { inner: TableWithJoins, alias: Option<TableAlias> } and leave it up to the consumer to choose:

  • either validate that nested.inner.join.is_empty() && nested.alias.is_none() as required by the standard,
  • or implement the algorithm you described to figure out which relations an alias applies to.

@eyalleshem
Copy link
Contributor

Well,
My view is from the snowflake perspective - and from this perspective i see 2 advantage to the current solution :

  1. Query that would not compile by snowflake - will not passed the parser
  2. The alias is actually for the inner table - so i think it be more easy for the consumer to use it

I agree that if we encounter other DB that allow single table in parens- but not follow the assumptions that describe above , maybe it's should be handled differently.

But if you think it's better , i don't care about changing the behaviour..

@nickolay
Copy link
Contributor

I realize that it is more convenient for consumers to have some semantic processing happen in the library, but simply parsing SQL is rather hard, so I'd rather not complicate matters by attempting to guess the semantics during parsing... Happy to hear what others think as well!

@eyalleshem
Copy link
Contributor

eyalleshem commented Jul 30, 2020

@nickolay - according to your suggestion - you would enable parsing joins with aliases even on the generic dialect ?

For example - do you think that this statement should be parsed in the generic dialect:
SELECT * from (t1 NATURAL JOIN t2 ) as a1.

I asked that because the only difference that i see between snowflake and the ANSI sql here that the question if they allow "redundant" parens around single table factor , but snowflake not allow to alias "real joins" ( like the other dialect). so i don't sure that i feel comfortable to parse that on snowflake, but throw an error in the other dialect.

@eyalleshem
Copy link
Contributor

I think maybe i have other suggestion -

In snowflake dialect : if the parse_table_and_joins return a NestedJoin , but this object has only relation without joins. we will replace the NestedJoin with the TableFactor that in the relation.

This solution will eliminate the problem of aliasing real joins (joins with multiple tables) , as currently no database support that.

And redundant parens around single table will not treat anymore as joins (I think that what confused my most- because real join is between tables ).

Btw , this solution will have impact only on snowflake dialect...

@nickolay
Copy link
Contributor

do you think that this statement should be parsed in the generic dialect: SELECT * from (t1 NATURAL JOIN t2 ) as a1.

That's how it could work if my yesterday's suggestion were implemented, yes. There's no requirement that the generic dialect must be able to parse that, it follows naturally from the proposed design of the AST and lack of validation in the generic dialect...

Your new idea sounds promising though. If losing the information about the redundant parens is fine, I think it could be an elegant solution.

@eyalleshem
Copy link
Contributor

I update the PD according to the suggestion ,
I can't think about use-case that the information about the redundant parens is important to something ..

nickolay added a commit to eyalleshem/sqlparser-rs that referenced this issue Oct 13, 2020
Snowflake diverges from the standard and from most of the other
implementations by allowing extra parentheses not only around a join,
but around lone table names (e.g. `FROM (mytable [AS alias])`) and
around derived tables (e.g. `FROM ((SELECT ...)  [AS alias])`) as well.

Initially this was implemented in apache#154
by (ab)using `TableFactor::NestedJoin` to represent anything nested in
extra set of parens.

Afterwards we learned in apache#223
that in cases of such extraneous nesting Snowflake allows specifying the
alias both inside and outside parens, but not both - consider:

    FROM (table_factor AS inner_alias) AS outer_alias

We've considered implementing this by changing `TableFactor::NestedJoin`
to a `TableFactor::Nested { inner: TableWithJoins, alias:
Option<TableAlias> }`, but that seemed too generic, as no known dialect
supports duplicate aliases, as shown above, nor naming nested joins
`(foo NATURAL JOIN bar) alias`. So we decided on making a smaller change
(with no modifications to the AST), that is also more appropriate to the
contributors to the Snowflake dialect:


1) Revert apache#154 by rejecting `FROM (table or derived table)` in most dialects.

2) For `dialect_of!(self is SnowflakeDialect | GenericDialect)` parse
and strip the extraneous parentheses, e.g.

   `(mytable) AS alias` -> `(mytable AS alias)`


Co-authored-by: Eyal Leshem <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants