Skip to content

Incorrect parsing literals that consisting of digits and letters, beginning with digits #804

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
DiamondMofeng opened this issue Feb 3, 2023 · 5 comments · Fixed by #856

Comments

@DiamondMofeng
Copy link

DiamondMofeng commented Feb 3, 2023

It seems when parsing a literal beginning with digits, the sql parser simply ends this token when meets a non-numeric character

see https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/tokenizer.rs#L547

For example, 123foobar would be parsed into a number token 123 and a word token foobar instead of one word token 123foobar.

And 123foobar456 would be parsed into 123 and foobar456

@DiamondMofeng
Copy link
Author

DiamondMofeng commented Feb 3, 2023

Althrough nodody would name their columns like 1col 00column, mysql does allow this.

CREATE DATABASE test;
use test;
CREATE TABLE my_table (
	id int,
	1col int,
    2col int
);

INSERT INTO my_table VALUES (1,2,3);
INSERT INTO my_table VALUES (2,3,4);
INSERT INTO my_table VALUES (3,4,5);
 
SELECT * FROM my_table;
SELECT 1col FROM my_table
MariaDB [(none)]> CREATE DATABASE test;
Query OK, 1 row affected (0.000 sec)

MariaDB [(none)]> use test;
INSERT INTO my_table VALUES (3,4,5);

SELECT * FROM my_table;
SELECT 1col FROM my_table
Database changed
MariaDB [test]> CREATE TABLE my_table (
    -> id int,
    -> 1col int,
    ->     2col int
    -> );
Query OK, 0 rows affected (0.029 sec)

MariaDB [test]>
MariaDB [test]> INSERT INTO my_table VALUES (1,2,3);
Query OK, 1 row affected (0.005 sec)

MariaDB [test]> INSERT INTO my_table VALUES (2,3,4);
Query OK, 1 row affected (0.003 sec)

MariaDB [test]> INSERT INTO my_table VALUES (3,4,5);
Query OK, 1 row affected (0.007 sec)

MariaDB [test]>
MariaDB [test]> SELECT * FROM my_table;
+------+------+------+
| id   | 1col | 2col |
+------+------+------+
|    1 |    2 |    3 |
|    2 |    3 |    4 |
|    3 |    4 |    5 |
+------+------+------+
3 rows in set (0.000 sec)

MariaDB [test]> SELECT 1col FROM my_table
    -> ;
+------+
| 1col |
+------+
|    2 |
|    3 |
|    4 |
+------+
3 rows in set (0.000 sec)

MariaDB [test]>

@DiamondMofeng
Copy link
Author

DiamondMofeng commented Feb 3, 2023

图片

As you can see, SELECT 1col FROM mytable was treated as SELECT 1 col FROM mytable

@DiamondMofeng
Copy link
Author

DiamondMofeng commented Feb 6, 2023

Okay, I think it has nothing to do with if some dialect supports column name beginning with numbers. This is simply a bug about parsing word tokens incorrectly
图片

@Jefffrey
Copy link
Contributor

Jefffrey commented Feb 6, 2023

I believe this is intended behaviour, at least for Generic/Postgres dialect. Relevant bit of discussion here:

#768 (comment)

Though some dialects can support having their identifiers begin with a number, such as Hive:

https://github.com/sqlparser-rs/sqlparser-rs/blob/4955863bdf0e9d90e0fff72bca7912b3d20daac2/src/dialect/hive.rs#L23-L28

Though this does cause some issues with parsing numbers, for example exponent parsing does not work (related to above discussion)

Since Hive supports it, same could be enabled for MySql now which has an existing comment about it:

https://github.com/sqlparser-rs/sqlparser-rs/blob/4955863bdf0e9d90e0fff72bca7912b3d20daac2/src/dialect/mysql.rs#L19-L29

@AviRaboah
Copy link
Contributor

Any update about this issue? do this issue should be fixed in the near future for mysql?

thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants