fix: Two letter language code must be supported #3258

Merged
merged 1 commit into from
Aug 2, 2022
67 changes: 65 additions & 2 deletions src/sagemaker/clarify.py
@@ -512,68 +512,131 @@ class TextConfig:
_SUPPORTED_GRANULARITIES = ["token", "sentence", "paragraph"]
_SUPPORTED_LANGUAGES = [
"chinese",
"zh",
Member: How are we coming up with this list?

Member Author: These are the ISO two-letter language codes for the supported languages.

Member: What I meant is, is this the full list or only some of them?

Member Author: This is the full list of supported languages, not a full list of all languages.

Member: OK, so are we limited by what some underlying library supports, or is this a restriction on our side?
Also, what's the plan for keeping the list up to date?

Member Author: Yes, we are limited by an underlying library. The update has to be manual.

"danish",
"da",
"dutch",
"nl",
"english",
"en",
"french",
"fr",
"german",
"de",
"greek",
"el",
"italian",
"it",
"japanese",
"ja",
"lithuanian",
"lt",
"multi-language",
"xx",
"norwegian bokmål",
"nb",
"polish",
"pl",
"portuguese",
"pt",
"romanian",
"ro",
"russian",
"ru",
"spanish",
"es",
"afrikaans",
"af",
"albanian",
"sq",
"arabic",
"ar",
"armenian",
"hy",
"basque",
"eu",
"bengali",
"bn",
"bulgarian",
"bg",
"catalan",
"ca",
"croatian",
"hr",
"czech",
"cs",
"estonian",
"et",
"finnish",
"fi",
"gujarati",
"gu",
"hebrew",
"he",
"hindi",
"hi",
"hungarian",
"hu",
"icelandic",
"is",
"indonesian",
"id",
"irish",
"ga",
"kannada",
"kn",
"kyrgyz",
"ky",
"latvian",
"lv",
"ligurian",
"lij",
"luxembourgish",
"lb",
"macedonian",
"mk",
"malayalam",
"ml",
"marathi",
"mr",
"nepali",
"ne",
"persian",
"fa",
"sanskrit",
"sa",
"serbian",
"sr",
"setswana",
"tn",
"sinhala",
"si",
"slovak",
"sk",
"slovenian",
"sl",
"swedish",
"sv",
"tagalog",
"tl",
"tamil",
"ta",
"tatar",
"tt",
"telugu",
"te",
"thai",
"th",
"turkish",
"tr",
"ukrainian",
"uk",
"urdu",
"ur",
"vietnamese",
"vi",
"yoruba",
"yo",
]

def __init__(
@@ -602,8 +665,8 @@ def __init__(
``"persian"``, ``"sanskrit"``, ``"serbian"``, ``"setswana"``, ``"sinhala"``,
``"slovak"``, ``"slovenian"``, ``"swedish"``, ``"tagalog"``, ``"tamil"``,
``"tatar"``, ``"telugu"``, ``"thai"``, ``"turkish"``, ``"ukrainian"``, ``"urdu"``,
-``"vietnamese"``, ``"yoruba"``.
-Use ``"multi-language"`` for a mix of multiple languages.
+``"vietnamese"``, ``"yoruba"``. Use "multi-language" for a mix of multiple
+languages. The corresponding two-letter ISO codes are also accepted.

Raises:
ValueError: when ``granularity`` is not in list of supported values