Skip to content

Add parse_multipart_identifier function to parser #860

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 17, 2023
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
128 changes: 128 additions & 0 deletions src/parser.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4687,6 +4687,54 @@ impl<'a> Parser<'a> {
Ok(idents)
}

/// Parse identifiers of form ident1[.identN]*
pub fn parse_multipart_identifier(&mut self) -> Result<Vec<Ident>, ParserError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this function more closely, I wonder how if we can't reuse parse_identifiers from above?

Or maybe we could add some more documentation (like docstrings / tests) for this function to make it clearer it is designed to parse strings into identifiers 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think parse_identifiers above is the odd one out, as it accepts any delimiter between words/idents (except equal sign) which means it would parse test + one + two as a valid sequence of identifiers. This is why I introduced the new parse_multipart_identifier to be more strict about what constitutes a compound/multipart identifier.

Though I did base parse_multipart_identifier primarily on Spark SQL/Postgres syntax, so maybe parse_identifiers is generic to accommodate some other type of syntax

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon some more reflection, I think we should accept this code and update the documentation strings on parse_multipart_identifier and parse_identifiers to explain in more detail what they do and how they are different.

I can try and find time to update the documentation maybe next week -- or @Jefffrey do you have time to do so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've update the doc a bit, let me know any further suggestions

let mut idents = vec![];

// expecting at least one word for identifier
match self.next_token().token {
Token::Word(w) => idents.push(w.to_ident()),
Token::EOF => {
return Err(ParserError::ParserError(
"Empty input when parsing identifier".to_string(),
))?
}
token => {
return Err(ParserError::ParserError(format!(
"Unexpected token in identifier: {token}"
)))?
}
};

// parse optional next parts if exist
loop {
match self.next_token().token {
// ensure that optional period is succeeded by another identifier
Token::Period => match self.next_token().token {
Token::Word(w) => idents.push(w.to_ident()),
Token::EOF => {
return Err(ParserError::ParserError(
"Trailing period in identifier".to_string(),
))?
}
token => {
return Err(ParserError::ParserError(format!(
"Unexpected token following period in identifier: {token}"
)))?
}
},
Token::EOF => break,
token => {
return Err(ParserError::ParserError(format!(
"Unexpected token in identifier: {token}"
)))?
}
}
}

Ok(idents)
}

/// Parse a simple one-word identifier (possibly quoted, possibly a keyword)
pub fn parse_identifier(&mut self) -> Result<Ident, ParserError> {
let next_token = self.next_token();
Expand Down Expand Up @@ -7429,4 +7477,84 @@ mod tests {
))
);
}

#[test]
fn test_parse_multipart_identifier_positive() {
let dialect = TestedDialects {
dialects: vec![Box::new(GenericDialect {})],
};

// parse multipart with quotes
let expected = vec![
Ident {
value: "CATALOG".to_string(),
quote_style: None,
},
Ident {
value: "F(o)o. \"bar".to_string(),
quote_style: Some('"'),
},
Ident {
value: "table".to_string(),
quote_style: None,
},
];
dialect.run_parser_method(r#"CATALOG."F(o)o. ""bar".table"#, |parser| {
let actual = parser.parse_multipart_identifier().unwrap();
assert_eq!(expected, actual);
});

// allow whitespace between ident parts
let expected = vec![
Ident {
value: "CATALOG".to_string(),
quote_style: None,
},
Ident {
value: "table".to_string(),
quote_style: None,
},
];
dialect.run_parser_method("CATALOG . table", |parser| {
let actual = parser.parse_multipart_identifier().unwrap();
assert_eq!(expected, actual);
});
}

#[test]
fn test_parse_multipart_identifier_negative() {
macro_rules! test_parse_multipart_identifier_error {
($input:expr, $expected_err:expr $(,)?) => {{
all_dialects().run_parser_method(&*$input, |parser| {
let actual_err = parser.parse_multipart_identifier().unwrap_err();
assert_eq!(actual_err.to_string(), $expected_err);
});
}};
}

test_parse_multipart_identifier_error!(
"",
"sql parser error: Empty input when parsing identifier",
);

test_parse_multipart_identifier_error!(
"*schema.table",
"sql parser error: Unexpected token in identifier: *",
);

test_parse_multipart_identifier_error!(
"schema.table*",
"sql parser error: Unexpected token in identifier: *",
);

test_parse_multipart_identifier_error!(
"schema.table.",
"sql parser error: Trailing period in identifier",
);

test_parse_multipart_identifier_error!(
"schema.*",
"sql parser error: Unexpected token following period in identifier: *",
);
}
}