Commit c358142

ambv, encukou, sethmlarson, AA-Turner, and serhiy-storchaka
authored and committed
[3.12] gh-135034: Normalize link targets in tarfile, add os.path.realpath(strict='allow_missing') (GH-135037)
Addresses CVEs 2024-12718, 2025-4138, 2025-4330, and 2025-4517.

(cherry picked from commit 3612d8f)

Co-authored-by: Łukasz Langa <[email protected]>
Signed-off-by: Łukasz Langa <[email protected]>
Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Seth Michael Larson <[email protected]>
Co-authored-by: Adam Turner <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
1 parent d4cf1fa commit c358142

11 files changed

+1173
-134
lines changed

Doc/library/os.path.rst

Lines changed: 29 additions & 4 deletions
@@ -377,10 +377,26 @@ the :mod:`glob` module.)
377377
links encountered in the path (if they are supported by the operating
378378
system).
379379

380-
If a path doesn't exist or a symlink loop is encountered, and *strict* is
381-
``True``, :exc:`OSError` is raised. If *strict* is ``False``, the path is
382-
resolved as far as possible and any remainder is appended without checking
383-
whether it exists.
380+
By default, the path is evaluated up to the first component that does not
381+
exist, is a symlink loop, or whose evaluation raises :exc:`OSError`.
382+
All such components are appended unchanged to the existing part of the path.
383+
384+
Some errors that are handled this way include "access denied", "not a
385+
directory", or "bad argument to internal function". Thus, the
386+
resulting path may be missing or inaccessible, may still contain
387+
links or loops, and may traverse non-directories.
388+
389+
This behavior can be modified by keyword arguments:
390+
391+
If *strict* is ``True``, the first error encountered when evaluating the path is
392+
re-raised.
393+
In particular, :exc:`FileNotFoundError` is raised if *path* does not exist,
394+
or another :exc:`OSError` if it is otherwise inaccessible.
395+
396+
If *strict* is :py:data:`os.path.ALLOW_MISSING`, errors other than
397+
:exc:`FileNotFoundError` are re-raised (as with ``strict=True``).
398+
Thus, the returned path will not contain any symbolic links, but the named
399+
file and some of its parent directories may be missing.
384400

385401
.. note::
386402
This function emulates the operating system's procedure for making a path
@@ -399,6 +415,15 @@ the :mod:`glob` module.)
399415
.. versionchanged:: 3.10
400416
The *strict* parameter was added.
401417

418+
.. versionchanged:: next
419+
The :py:data:`~os.path.ALLOW_MISSING` value for the *strict* parameter
420+
was added.
421+
422+
.. data:: ALLOW_MISSING
423+
424+
Special value used for the *strict* argument in :func:`realpath`.
425+
426+
.. versionadded:: next
402427

403428
.. function:: relpath(path, start=os.curdir)
404429

Doc/library/tarfile.rst

Lines changed: 20 additions & 0 deletions
@@ -249,6 +249,15 @@ The :mod:`tarfile` module defines the following exceptions:
249249
Raised to refuse extracting a symbolic link pointing outside the destination
250250
directory.
251251

252+
.. exception:: LinkFallbackError
253+
254+
Raised to refuse emulating a link (hard or symbolic) by extracting another
255+
archive member, when that member would be rejected by the filter location.
256+
The exception that was raised to reject the replacement member is available
257+
as :attr:`!BaseException.__context__`.
258+
259+
.. versionadded:: next
260+
252261

253262
The following constants are available at the module level:
254263

@@ -1039,6 +1048,12 @@ reused in custom filters:
10391048
Implements the ``'data'`` filter.
10401049
In addition to what ``tar_filter`` does:
10411050

1051+
- Normalize link targets (:attr:`TarInfo.linkname`) using
1052+
:func:`os.path.normpath`.
1053+
Note that this removes internal ``..`` components, which may change the
1054+
meaning of the link if the path in :attr:`!TarInfo.linkname` traverses
1055+
symbolic links.
1056+
10421057
- :ref:`Refuse <tarfile-extraction-refuse>` to extract links (hard or soft)
10431058
that link to absolute paths, or ones that link outside the destination.
10441059

@@ -1067,6 +1082,10 @@ reused in custom filters:
10671082

10681083
Return the modified ``TarInfo`` member.
10691084

1085+
.. versionchanged:: next
1086+
1087+
Link targets are now normalized.
1088+
10701089

10711090
.. _tarfile-extraction-refuse:
10721091

@@ -1093,6 +1112,7 @@ Here is an incomplete list of things to consider:
10931112
* Extract to a :func:`new temporary directory <tempfile.mkdtemp>`
10941113
to prevent e.g. exploiting pre-existing links, and to make it easier to
10951114
clean up after a failed extraction.
1115+
* Disallow symbolic links if you do not need the functionality.
10961116
* When working with untrusted data, use external (e.g. OS-level) limits on
10971117
disk, memory and CPU usage.
10981118
* Check filenames against an allow-list of characters

Doc/whatsnew/3.12.rst

Lines changed: 34 additions & 0 deletions
@@ -2320,3 +2320,37 @@ sys
23202320
* The previously undocumented special function :func:`sys.getobjects`,
23212321
which only exists in specialized builds of Python, may now return objects
23222322
from other interpreters than the one it's called in.
2323+
2324+
2325+
Notable changes in 3.12.10
2326+
==========================
2327+
2328+
os.path
2329+
-------
2330+
2331+
* The *strict* parameter to :func:`os.path.realpath` accepts a new value,
2332+
:data:`os.path.ALLOW_MISSING`.
2333+
If used, errors other than :exc:`FileNotFoundError` will be re-raised;
2334+
the resulting path can be missing but it will be free of symlinks.
2335+
(Contributed by Petr Viktorin for :cve:`2025-4517`.)
2336+
2337+
tarfile
2338+
-------
2339+
2340+
* :func:`~tarfile.data_filter` now normalizes symbolic link targets in order to
2341+
avoid path traversal attacks.
2342+
(Contributed by Petr Viktorin in :gh:`127987` and :cve:`2025-4138`.)
2343+
* :func:`~tarfile.TarFile.extractall` now skips fixing up directory attributes
2344+
when a directory was removed or replaced by another kind of file.
2345+
(Contributed by Petr Viktorin in :gh:`127987` and :cve:`2024-12718`.)
2346+
* :func:`~tarfile.TarFile.extract` and :func:`~tarfile.TarFile.extractall`
2347+
now (re-)apply the extraction filter when substituting a link (hard or
2348+
symbolic) with a copy of another archive member, and when fixing up
2349+
directory attributes.
2350+
The former raises a new exception, :exc:`~tarfile.LinkFallbackError`.
2351+
(Contributed by Petr Viktorin for :cve:`2025-4330` and :cve:`2024-12718`.)
2352+
* :func:`~tarfile.TarFile.extract` and :func:`~tarfile.TarFile.extractall`
2353+
no longer extract rejected members when
2354+
:func:`~tarfile.TarFile.errorlevel` is zero.
2355+
(Contributed by Matt Prodani and Petr Viktorin in :gh:`112887`
2356+
and :cve:`2025-4435`.)

Lib/genericpath.py

Lines changed: 10 additions & 1 deletion
@@ -8,7 +8,7 @@
88

99
__all__ = ['commonprefix', 'exists', 'getatime', 'getctime', 'getmtime',
1010
'getsize', 'isdir', 'isfile', 'islink', 'samefile', 'sameopenfile',
11-
'samestat']
11+
'samestat', 'ALLOW_MISSING']
1212

1313

1414
# Does a path exist?
@@ -165,3 +165,12 @@ def _check_arg_types(funcname, *args):
165165
f'os.PathLike object, not {s.__class__.__name__!r}') from None
166166
if hasstr and hasbytes:
167167
raise TypeError("Can't mix strings and bytes in path components") from None
168+
169+
# A singleton with a true boolean value.
170+
@object.__new__
171+
class ALLOW_MISSING:
172+
"""Special value for use in realpath()."""
173+
def __repr__(self):
174+
return 'os.path.ALLOW_MISSING'
175+
def __reduce__(self):
176+
return self.__class__.__name__
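The `@object.__new__` decorator used above turns a class statement into a single instance of that class. The sketch below reproduces the pattern with a hypothetical name, `KEEP_GOING`, to show how such a sentinel behaves. It is an illustration, not code from the commit.

```python
@object.__new__
class KEEP_GOING:
    """Hypothetical sentinel built the same way as ALLOW_MISSING."""

    def __repr__(self):
        return "KEEP_GOING"

    def __reduce__(self):
        # Returning a string asks pickle to look the object up by name in its module.
        return self.__class__.__name__


def describe(strict):
    """Distinguish the sentinel from plain booleans by identity, not by truthiness."""
    if strict is KEEP_GOING:
        return "sentinel"
    return "true" if strict else "false"


print(repr(KEEP_GOING), bool(KEEP_GOING), describe(KEEP_GOING), describe(True))
```

Because the name is bound to an instance rather than to the class, the sentinel is truthy like any ordinary object, so callers that only test `if strict:` still treat it as a strict mode. Code that needs to single it out compares with `is`.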

Lib/ntpath.py

Lines changed: 24 additions & 13 deletions
@@ -30,7 +30,8 @@
3030
"ismount", "expanduser","expandvars","normpath","abspath",
3131
"curdir","pardir","sep","pathsep","defpath","altsep",
3232
"extsep","devnull","realpath","supports_unicode_filenames","relpath",
33-
"samefile", "sameopenfile", "samestat", "commonpath", "isjunction"]
33+
"samefile", "sameopenfile", "samestat", "commonpath", "isjunction",
34+
"ALLOW_MISSING"]
3435

3536
def _get_bothseps(path):
3637
if isinstance(path, bytes):
@@ -609,9 +610,10 @@ def abspath(path):
609610
from nt import _getfinalpathname, readlink as _nt_readlink
610611
except ImportError:
611612
# realpath is a no-op on systems without _getfinalpathname support.
612-
realpath = abspath
613+
def realpath(path, *, strict=False):
614+
return abspath(path)
613615
else:
614-
def _readlink_deep(path):
616+
def _readlink_deep(path, ignored_error=OSError):
615617
# These error codes indicate that we should stop reading links and
616618
# return the path we currently have.
617619
# 1: ERROR_INVALID_FUNCTION
@@ -644,7 +646,7 @@ def _readlink_deep(path):
644646
path = old_path
645647
break
646648
path = normpath(join(dirname(old_path), path))
647-
except OSError as ex:
649+
except ignored_error as ex:
648650
if ex.winerror in allowed_winerror:
649651
break
650652
raise
@@ -653,7 +655,7 @@ def _readlink_deep(path):
653655
break
654656
return path
655657

656-
def _getfinalpathname_nonstrict(path):
658+
def _getfinalpathname_nonstrict(path, ignored_error=OSError):
657659
# These error codes indicate that we should stop resolving the path
658660
# and return the value we currently have.
659661
# 1: ERROR_INVALID_FUNCTION
@@ -680,17 +682,18 @@ def _getfinalpathname_nonstrict(path):
680682
try:
681683
path = _getfinalpathname(path)
682684
return join(path, tail) if tail else path
683-
except OSError as ex:
685+
except ignored_error as ex:
684686
if ex.winerror not in allowed_winerror:
685687
raise
686688
try:
687689
# The OS could not resolve this path fully, so we attempt
688690
# to follow the link ourselves. If we succeed, join the tail
689691
# and return.
690-
new_path = _readlink_deep(path)
692+
new_path = _readlink_deep(path,
693+
ignored_error=ignored_error)
691694
if new_path != path:
692695
return join(new_path, tail) if tail else new_path
693-
except OSError:
696+
except ignored_error:
694697
# If we fail to readlink(), let's keep traversing
695698
pass
696699
path, name = split(path)
@@ -721,24 +724,32 @@ def realpath(path, *, strict=False):
721724
if normcase(path) == normcase(devnull):
722725
return '\\\\.\\NUL'
723726
had_prefix = path.startswith(prefix)
727+
728+
if strict is ALLOW_MISSING:
729+
ignored_error = FileNotFoundError
730+
strict = True
731+
elif strict:
732+
ignored_error = ()
733+
else:
734+
ignored_error = OSError
735+
724736
if not had_prefix and not isabs(path):
725737
path = join(cwd, path)
726738
try:
727739
path = _getfinalpathname(path)
728740
initial_winerror = 0
729741
except ValueError as ex:
730742
# gh-106242: Raised for embedded null characters
731-
# In strict mode, we convert into an OSError.
743+
# In strict modes, we convert into an OSError.
732744
# Non-strict mode returns the path as-is, since we've already
733745
# made it absolute.
734746
if strict:
735747
raise OSError(str(ex)) from None
736748
path = normpath(path)
737-
except OSError as ex:
738-
if strict:
739-
raise
749+
except ignored_error as ex:
740750
initial_winerror = ex.winerror
741-
path = _getfinalpathname_nonstrict(path)
751+
path = _getfinalpathname_nonstrict(path,
752+
ignored_error=ignored_error)
742753
# The path returned by _getfinalpathname will always start with \\?\ -
743754
# strip off that prefix unless it was already provided on the original
744755
# path.

Lib/posixpath.py

Lines changed: 11 additions & 2 deletions
@@ -35,7 +35,7 @@
3535
"samefile","sameopenfile","samestat",
3636
"curdir","pardir","sep","pathsep","defpath","altsep","extsep",
3737
"devnull","realpath","supports_unicode_filenames","relpath",
38-
"commonpath", "isjunction"]
38+
"commonpath", "isjunction", "ALLOW_MISSING"]
3939

4040

4141
def _get_sep(path):
@@ -438,6 +438,15 @@ def _joinrealpath(path, rest, strict, seen):
438438
sep = '/'
439439
curdir = '.'
440440
pardir = '..'
441+
getcwd = os.getcwd
442+
if strict is ALLOW_MISSING:
443+
ignored_error = FileNotFoundError
444+
elif strict:
445+
ignored_error = ()
446+
else:
447+
ignored_error = OSError
448+
449+
maxlinks = None
441450

442451
if isabs(rest):
443452
rest = rest[1:]
@@ -460,7 +469,7 @@ def _joinrealpath(path, rest, strict, seen):
460469
newpath = join(path, name)
461470
try:
462471
st = os.lstat(newpath)
463-
except OSError:
472+
except ignored_error:
464473
if strict:
465474
raise
466475
is_link = False
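Both path modules choose which exception class to swallow from the value of *strict* and then use it in an `except` clause. The sketch below isolates that technique as the documentation in this commit describes it. The helper names are hypothetical, and a local stand-in sentinel is used where `os.path.ALLOW_MISSING` is unavailable.

```python
import os
import tempfile

# Assumption: fall back to a private stand-in on interpreters without ALLOW_MISSING.
ALLOW_MISSING = getattr(os.path, "ALLOW_MISSING", object())


def select_ignored_error(strict):
    """Map a strict mode to the exception class (or tuple) that may be swallowed."""
    if strict is ALLOW_MISSING:
        return FileNotFoundError
    if strict:
        return ()  # an empty tuple makes the except clause match nothing
    return OSError


def lstat_or_none(path, strict=False):
    """Return os.lstat(path), or None if the error is one the mode tolerates."""
    ignored_error = select_ignored_error(strict)
    try:
        return os.lstat(path)
    except ignored_error:
        return None


# A path that is certain not to exist: a fresh directory with no entries.
missing = os.path.join(tempfile.mkdtemp(), "missing")

print(lstat_or_none(missing))                        # tolerated by default
print(lstat_or_none(missing, strict=ALLOW_MISSING))  # missing files tolerated
try:
    lstat_or_none(missing, strict=True)
    strict_raised = False
except FileNotFoundError:
    strict_raised = True
print(strict_raised)
```

Using `()` for the fully strict case keeps a single `try`/`except` shape for all three modes: nothing is caught, so the original exception propagates without an explicit re-raise.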
