Skip to content

Cannot load local file from Windows path #3173

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
JoseJuan98 opened this issue Jun 15, 2022 · 5 comments
Closed

Cannot load local file from Windows path #3173

JoseJuan98 opened this issue Jun 15, 2022 · 5 comments

Comments

@JoseJuan98
Copy link
Contributor

JoseJuan98 commented Jun 15, 2022

Describe the bug

When I am trying to create a workflow step for deployment with a script from a local path in a Windows OS the code will fail saying ValueError: is not a valid file

Step config used:

  deployment_step_config = {
                     'name': '...',
                     '...': '...',
                     'code': f'file://{os.path.join(path_to_script, 'deployment.py')}'
                    }

Example, 'code' value looks like:

'file://C:\\Users\\user1\\....\\deployment.py'

Because of being a Windows path I get the following errors:

Screens or logs

bug_workflow_step

If I remove the 'file://' then I get the following error:

bug_workflow_step_without_literal_file

But this file actually exists, if I put this path in a browser or the file explorer

To reproduce

Create a workflow step and try to upload a local script from a Windows system.

Or more simple call the method with a Windows path as parameter and the Hash class used by default:

def _hash_file(file: Union[str, Path], md5: Hash) -> Hash:

Expected behavior

That the following lines of code can parse a file path for Windows.

System information

  • SageMaker Python SDK version: 2.72.0
  • Requirements:
    • numpy>=1.16.4

    • pandas>=1.3.0

    • boto3>=1,.17.101

    • sagemaker>=2.72.0

    • protbuf==3.17.3

      Python version: 3.8
      CPU or GPU: CPU
      Custom Docker image (Y/N): N

Addiotional context

The method _hash_file expects a string to start by 'file://' but if a Windows path starts by that is wrongly recognize as a file because of the '//' and then having '\\' in the rest of the folders.

Even I tried ussing the preffix 'file:///...' but then it doesn't recognize it as the path is like '\\C:\\...'.

Solution

I would suggest to fix it by just checking in which OS this is running, for example:

def _hash_file(file: Union[str, Path], md5: Hash) -> Hash:

# new code
from platform import system

def _hash_file(file: Union[str, Path], md5: Hash) -> Hash:
    """Updates the inputted Hash with the contents of the current path.
    Args:
        file: path of the file
    Returns:
        str: The MD5 hash of the file
    """
    if isinstance(file, str) and file.lower().startswith("file://"):
         
         # changes
         file = urlparse(file)
         if 'windows' in system().lower():
             file = file.netloc
         else: # Linux, Mac
            file = file.path
       
        file = unquote(file)
        
    if not Path(file).is_file():
        raise ValueError(str(file) + " is not a valid file")
    with open(file, "rb") as f:
        while True:
            data = f.read(BUF_SIZE)
            if not data:
                break
            md5.update(data)
    return md5
@100318091
Copy link

I think it is an already known issue with Sagemaker package. Here is the fix #3051 (2.84.1)

@JoseJuan98
Copy link
Contributor Author

JoseJuan98 commented Jun 16, 2022

I think it is an already known issue with Sagemaker package. Here is the fix #3051 (2.84.1)

Actually that #3051 (2.84.1) doesn't solve the issue, that changes were the ones that introduce it.

@JoseJuan98
Copy link
Contributor Author

I can make the changes following the contribution doc and making a PR from a forked repo, but before that I would like that someone from AWS answer to me first to check if they will approve it, and if there are any guidelines for coding style to follow or constant variables to know the OS in which is running.

Thanks on advance

@staubhp
Copy link
Contributor

staubhp commented Feb 17, 2023

Sorry for the late response.

Officially, the SageMaker Python SDK only supports POSIX compliant operating systems. For that reason we're unlikely to accept a PR that provides work-arounds for Windows -- even if the change works, all of our integration testing happens on Linux distros so we can't reliably maintain this code. You'll find that there are dozens of places in the SDK where you'll run into an issue similar to this one.

I'm curious though, have you tried changing the path you're passing? This Wiki article explains that Windows does support file URIs, but in a particular format: file:///c:/path/to/the%20file.txt. Curious if converting your backslashes to forward-slashes (and possibly adding an additional forward-slash to the protocol prefix) would work.

@akrishna1995
Copy link
Contributor

Closing this issue for now. Feel free to reopen if there is any more feedback. Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants