gh-135034: Normalize link targets in tarfile, add os.path.realpath(strict='allow_missing') #135037

Merged
merged 20 commits into from
Jun 3, 2025
Changes from 8 commits
25 changes: 22 additions & 3 deletions Doc/library/os.path.rst
@@ -408,9 +408,26 @@ the :mod:`glob` module.)
system). On Windows, this function will also resolve MS-DOS (also called 8.3)
style names such as ``C:\\PROGRA~1`` to ``C:\\Program Files``.

If a path doesn't exist or a symlink loop is encountered, and *strict* is
``True``, :exc:`OSError` is raised. If *strict* is ``False`` these errors
are ignored, and so the result might be missing or otherwise inaccessible.
By default, the path is evaluated up to the first component that does not
exist, is a symlink loop, or whose evaluation raises :exc:`OSError`.
All such components are appended unchanged to the existing part of the path.

Some errors that are handled this way include "access denied", "not a
directory", or "bad argument to internal function". Thus, the
resulting path may be missing or inaccessible, may still contain
links or loops, and may traverse non-directories.

This behavior can be modified by keyword arguments:

If *strict* is ``True``, the first error encountered when evaluating the path is
re-raised.
In particular, :exc:`FileNotFoundError` is raised if *path* does not exist,
or another :exc:`OSError` if it is otherwise inaccessible.

If *strict* is the string ``'allow_missing'``, errors other than
:exc:`FileNotFoundError` are re-raised (as with ``strict=True``).
Thus, the returned path will not contain any symbolic links, but the named
file and some of its parent directories may be missing.

.. note::
This function emulates the operating system's procedure for making a path
@@ -429,6 +446,8 @@ the :mod:`glob` module.)
.. versionchanged:: 3.10
The *strict* parameter was added.

.. versionchanged:: next
The ``'allow_missing'`` value for the *strict* parameter was added.

.. function:: relpath(path, start=os.curdir)

20 changes: 20 additions & 0 deletions Doc/library/tarfile.rst
@@ -255,6 +255,15 @@ The :mod:`tarfile` module defines the following exceptions:
Raised to refuse extracting a symbolic link pointing outside the destination
directory.

.. exception:: LinkFallbackError

Raised to refuse emulating a link (hard or symbolic) by extracting another
archive member, when that member would be rejected by the filter location.
The exception that was raised to reject the replacement member is available
as :attr:`!BaseException.__context__`.

.. versionadded:: next


The following constants are available at the module level:

@@ -1068,6 +1077,12 @@ reused in custom filters:
Implements the ``'data'`` filter.
In addition to what ``tar_filter`` does:

- Normalize link targets (:attr:`TarInfo.linkname`) using
:func:`os.path.normpath`.
Note that this removes internal ``..`` components, which may change the
meaning of the link if the path in :attr:`!TarInfo.linkname` traverses
symbolic links.

- :ref:`Refuse <tarfile-extraction-refuse>` to extract links (hard or soft)
that link to absolute paths, or ones that link outside the destination.

@@ -1099,6 +1114,10 @@ reused in custom filters:
Note that this filter does not block *all* dangerous archive features.
See :ref:`tarfile-further-verification` for details.

.. versionchanged:: next

Link targets are now normalized.
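
The caveat about ``..`` components can be illustrated with a small sketch. It assumes a hypothetical layout in which `subdir` is itself a symbolic link to `elsewhere/nested`; all names are made up for illustration, and the filesystem part only runs on POSIX systems, where `os.path.realpath` walks the path one component at a time.

```python
import os
import posixpath
import tempfile

# A link target as it might appear in TarInfo.linkname.
linkname = "subdir/../target.txt"
normalized = posixpath.normpath(linkname)  # the ".." component is collapsed


def resolves_differently(base):
    """Compare both spellings when base/subdir is itself a symlink."""
    os.makedirs(os.path.join(base, "elsewhere", "nested"))
    os.symlink(os.path.join("elsewhere", "nested"),
               os.path.join(base, "subdir"))
    raw = os.path.realpath(os.path.join(base, linkname))
    norm = os.path.realpath(os.path.join(base, normalized))
    return raw != norm


differs = None  # stays None where the filesystem demo is skipped
if os.name == "posix":
    with tempfile.TemporaryDirectory() as tmp:
        differs = resolves_differently(tmp)

print(normalized, differs)
```

Followed through the symlink, the original spelling ends up under `elsewhere/`, while the normalized one stays directly under the base directory. That difference is the change of meaning the note refers to.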


.. _tarfile-extraction-refuse:

@@ -1127,6 +1146,7 @@ Here is an incomplete list of things to consider:
* Extract to a :func:`new temporary directory <tempfile.mkdtemp>`
to prevent e.g. exploiting pre-existing links, and to make it easier to
clean up after a failed extraction.
* Disallow symbolic links if you do not need the functionality.
* When working with untrusted data, use external (e.g. OS-level) limits on
disk, memory and CPU usage.
* Check filenames against an allow-list of characters
32 changes: 32 additions & 0 deletions Doc/whatsnew/3.15.rst
@@ -112,6 +112,16 @@ math
(Contributed by Sergey B Kirpichev in :gh:`132908`.)


os.path
-------

* The *strict* parameter to :func:`os.path.realpath` accepts a new value,
``'allow_missing'``.
If used, errors other than :exc:`FileNotFoundError` will be re-raised;
the resulting path can be missing but it will be free of symlinks.
(Contributed by Petr Viktorin for :cve:`2025-4517`.)


shelve
------

@@ -128,6 +138,28 @@ ssl
(Contributed by Will Childs-Klein in :gh:`133624`.)


tarfile
-------

* :func:`~tarfile.data_filter` now normalizes symbolic link targets in order to
avoid path traversal attacks.
(Contributed by Petr Viktorin in :gh:`127987` and :cve:`2025-4138`.)
* :func:`~tarfile.TarFile.extractall` now skips fixing up directory attributes
when a directory was removed or replaced by another kind of file.
(Contributed by Petr Viktorin in :gh:`127987` and :cve:`2024-12718`.)
* :func:`~tarfile.TarFile.extract` and :func:`~tarfile.TarFile.extractall`
now (re-)apply the extraction filter when substituting a link (hard or
symbolic) with a copy of another archive member, and when fixing up
directory attributes.
The former raises a new exception, :exc:`~tarfile.LinkFallbackError`.
(Contributed by Petr Viktorin for :cve:`2025-4330` and :cve:`2024-12718`.)
* :func:`~tarfile.TarFile.extract` and :func:`~tarfile.TarFile.extractall`
no longer extract rejected members when
:attr:`~tarfile.TarFile.errorlevel` is zero.
(Contributed by Matt Prodani and Petr Viktorin in :gh:`112887`
and :cve:`2025-4435`.)


zlib
----

36 changes: 23 additions & 13 deletions Lib/ntpath.py
@@ -601,9 +601,10 @@ def abspath(path):
from nt import _findfirstfile, _getfinalpathname, readlink as _nt_readlink
except ImportError:
# realpath is a no-op on systems without _getfinalpathname support.
realpath = abspath
def realpath(path, *, strict=False):
return abspath(path)
else:
def _readlink_deep(path):
def _readlink_deep(path, ignored_error=OSError):
# These error codes indicate that we should stop reading links and
# return the path we currently have.
# 1: ERROR_INVALID_FUNCTION
@@ -636,7 +637,7 @@ def _readlink_deep(path):
path = old_path
break
path = normpath(join(dirname(old_path), path))
except OSError as ex:
except ignored_error as ex:
if ex.winerror in allowed_winerror:
break
raise
@@ -645,7 +646,7 @@ def _readlink_deep(path):
break
return path

def _getfinalpathname_nonstrict(path):
def _getfinalpathname_nonstrict(path, ignored_error=OSError):
# These error codes indicate that we should stop resolving the path
# and return the value we currently have.
# 1: ERROR_INVALID_FUNCTION
@@ -673,25 +674,26 @@ def _getfinalpathname_nonstrict(path):
try:
path = _getfinalpathname(path)
return join(path, tail) if tail else path
except OSError as ex:
except ignored_error as ex:
if ex.winerror not in allowed_winerror:
raise
try:
# The OS could not resolve this path fully, so we attempt
# to follow the link ourselves. If we succeed, join the tail
# and return.
new_path = _readlink_deep(path)
new_path = _readlink_deep(path,
ignored_error=ignored_error)
if new_path != path:
return join(new_path, tail) if tail else new_path
except OSError:
except ignored_error:
# If we fail to readlink(), let's keep traversing
pass
# If we get these errors, try to get the real name of the file without accessing it.
if ex.winerror in (1, 5, 32, 50, 87, 1920, 1921):
try:
name = _findfirstfile(path)
path, _ = split(path)
except OSError:
except ignored_error:
path, name = split(path)
else:
path, name = split(path)
@@ -721,24 +723,32 @@ def realpath(path, *, strict=False):
if normcase(path) == devnull:
return '\\\\.\\NUL'
had_prefix = path.startswith(prefix)

if strict == 'allow_missing':
ignored_error = FileNotFoundError
strict = True
elif strict:
ignored_error = ()
else:
ignored_error = OSError

if not had_prefix and not isabs(path):
path = join(cwd, path)
try:
path = _getfinalpathname(path)
initial_winerror = 0
except ValueError as ex:
# gh-106242: Raised for embedded null characters
# In strict mode, we convert into an OSError.
# In strict modes, we convert into an OSError.
# Non-strict mode returns the path as-is, since we've already
# made it absolute.
if strict:
raise OSError(str(ex)) from None
path = normpath(path)
except OSError as ex:
if strict:
raise
except ignored_error as ex:
initial_winerror = ex.winerror
path = _getfinalpathname_nonstrict(path)
path = _getfinalpathname_nonstrict(path,
ignored_error=ignored_error)
# The path returned by _getfinalpathname will always start with \\?\ -
# strip off that prefix unless it was already provided on the original
# path.
55 changes: 32 additions & 23 deletions Lib/posixpath.py
@@ -402,10 +402,18 @@ def realpath(filename, *, strict=False):
curdir = '.'
pardir = '..'
getcwd = os.getcwd
return _realpath(filename, strict, sep, curdir, pardir, getcwd)
if strict == 'allow_missing':
ignored_error = FileNotFoundError
strict = True
elif strict:
ignored_error = ()
else:
ignored_error = OSError

lstat = os.lstat
readlink = os.readlink
maxlinks = None

def _realpath(filename, strict=False, sep=sep, curdir=curdir, pardir=pardir,
getcwd=os.getcwd, lstat=os.lstat, readlink=os.readlink, maxlinks=None):
# The stack of unresolved path parts. When popped, a special value of None
# indicates that a symlink target has been resolved, and that the original
# symlink path can be retrieved by popping again. The [::-1] slice is a
@@ -477,27 +485,28 @@ def _realpath(filename, strict=False, sep=sep, curdir=curdir, pardir=pardir,
path = newpath
continue
target = readlink(newpath)
except OSError:
if strict:
raise
path = newpath
except ignored_error:
pass
else:
# Resolve the symbolic link
if target.startswith(sep):
# Symlink target is absolute; reset resolved path.
path = sep
if maxlinks is None:
# Mark this symlink as seen but not fully resolved.
seen[newpath] = None
# Push the symlink path onto the stack, and signal its specialness
# by also pushing None. When these entries are popped, we'll
# record the fully-resolved symlink target in the 'seen' mapping.
rest.append(newpath)
rest.append(None)
# Push the unresolved symlink target parts onto the stack.
target_parts = target.split(sep)[::-1]
rest.extend(target_parts)
part_count += len(target_parts)
continue
# Resolve the symbolic link
if target.startswith(sep):
# Symlink target is absolute; reset resolved path.
path = sep
if maxlinks is None:
# Mark this symlink as seen but not fully resolved.
seen[newpath] = None
# Push the symlink path onto the stack, and signal its specialness
# by also pushing None. When these entries are popped, we'll
# record the fully-resolved symlink target in the 'seen' mapping.
rest.append(newpath)
rest.append(None)
# Push the unresolved symlink target parts onto the stack.
target_parts = target.split(sep)[::-1]
rest.extend(target_parts)
part_count += len(target_parts)
# An error occurred and was ignored.
path = newpath

return path
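
The `ignored_error` selection used in both `ntpath` and `posixpath` relies on the fact that an `except` clause accepts a tuple of exception types, and that an empty tuple matches nothing. The standalone sketch below mirrors that branch. The function names `select_ignored_error` and `outcome` are hypothetical helpers written for this illustration and are not part of the patch.

```python
def select_ignored_error(strict):
    """Mirror the branch in the diff: pick what the resolver swallows."""
    if strict == 'allow_missing':
        return FileNotFoundError
    elif strict:
        return ()          # an empty tuple matches no exception at all
    return OSError


def outcome(strict, error):
    """Report whether `error` would be ignored or propagated."""
    ignored_error = select_ignored_error(strict)
    try:
        try:
            raise error
        except ignored_error:
            return "ignored"
    except OSError:
        # Reached only when the inner clause did not match.
        return "propagated"


for mode in (False, True, 'allow_missing'):
    print(mode,
          outcome(mode, FileNotFoundError()),
          outcome(mode, PermissionError()))
```

Passing the exception class around keeps the resolver loop to a single `except ignored_error:` clause, instead of repeating an `if strict: raise` check at every call site.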
