"FastFile" for Processing Job Input #3962
Comments
I think this is actually low prio, since I was able to use |
As far as I can see, there is no validation in the code that would prevent using "FastFile" instead of "File" for The only thing that seems to be missing is a mention of "FastFile" in the docstring (e.g. https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/processing.py#L1245). I will open a PR for that. |
Closing as answered; the docstring was fixed. Feel free to reopen if you think the issue still stands. |
Thanks @martinRenou. |
It seems this doesn't actually work as the underlying API does not support it:
https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ProcessingS3Input.html I also get the following error when trying to use
|
To clarify, I did not try the suggested solution by Martin. |
@lorenzwalthert Thanks for the clarification. Could you reopen this issue? If not, I'll create a new one. |
I can't reopen it seems. |
Hi @lorenzwalthert, Thanks for using SageMaker and taking the time to suggest ways to improve SageMaker Python SDK. We have added your feature request to our backlog of feature requests and may consider putting it into future SDK versions. I will go ahead and close the issue now. Please let me know if you have any more feedback or other questions. Best, |
@ShwetaSingh801 I recently had the same error:
I can provide a MWE if needed (using @martinRenou Can you, please, comment on this? |
Also tagging @mohanasudhan and @akrishna1995 on this as they approved #4311 (without adding/updating tests?!). |
Describe the feature you'd like
"FastFile" to be an available option for s3_input_mode in sagemaker.Processing.ProcessingInput, in addition to "File" and "Pipe". This S3 input mode has been available for TrainingInput since 2021 and greatly improves speed (-82%) according to an AWS Blog post.

How would this feature be used? Please describe.
To speed up processing jobs compared to downloading all data, and to allow complex filtering of files before accessing them.

Describe alternatives you've considered
Other methods like sagemaker.s3.S3Downloader(). Problem: I can't shard by S3 key and have to build my own sharding logic.
s3_data_type in sagemaker.Processing.ProcessingInput to filter out by prefix. Problem: Some data can't be easily filtered by prefix and you need more complex pattern matching.

Additional context
I know it's not an SDK topic as long as the underlying APIs don't provide that functionality, but I don't know where else I can put the feature request.
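
For readers wondering what the "own sharding logic" plus "more complex pattern matching" from the alternatives above might look like, here is a minimal sketch in plain Python. It is not part of the SageMaker SDK: the function name select_keys_for_instance, the example keys, and the regular expression are all hypothetical, and it assumes you already have the list of S3 keys and know each instance's index and the total instance count (how a job obtains those is left out).

```python
import re
import zlib
from typing import Iterable, List


def select_keys_for_instance(
    keys: Iterable[str],
    pattern: str,
    instance_index: int,
    instance_count: int,
) -> List[str]:
    """Return the keys that match `pattern` and fall into this instance's shard.

    Hypothetical helper for illustration only; not a SageMaker API.
    """
    if not 0 <= instance_index < instance_count:
        raise ValueError("instance_index must be in [0, instance_count)")
    regex = re.compile(pattern)
    selected = []
    for key in sorted(keys):
        # Filter first: a regex can express rules a plain S3 prefix cannot.
        if not regex.search(key):
            continue
        # crc32 is stable across processes and machines (unlike built-in
        # hash() on str), so every instance computes the same assignment.
        if zlib.crc32(key.encode("utf-8")) % instance_count == instance_index:
            selected.append(key)
    return selected


# Made-up keys; in practice these would come from listing the bucket.
PATTERN = r"sensor-[ab]\.parquet$"
keys = [
    "raw/2023/01/sensor-a.parquet",
    "raw/2023/01/sensor-b.parquet",
    "raw/2023/01/sensor-c.parquet",
    "raw/2023/02/sensor-a.parquet",
    "raw/2023/02/notes.txt",
]
shards = [select_keys_for_instance(keys, PATTERN, i, 2) for i in range(2)]
```

Each instance would then download only its own shard. Hash-based assignment does not guarantee evenly sized shards, which is one of the reasons built-in support would be preferable to this workaround.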