Skip to content

Validate untrusted user input #7807

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 25, 2022

Conversation

sergepetrenko
Copy link
Collaborator

@sergepetrenko sergepetrenko commented Oct 14, 2022

As seen in error reports from TNT-601, getopt_long, getopt and realpath are subject to internal buffer overflows (at least they used to be some time ago).
The suggestion is to either not use them, or provide them with correct input.

Speaking of realpath, we have 2 places that use it:

sergey@sergey-pc:~/Source/sptnt$ grep -rn realpath ./src --include *.c --include *.h --include *.cc --include *.cpp --include *.hpp --exclude *.lua.c --include *.lua
./src/box/module_cache.c:119:	if (realpath(lua_tostring(L, -1), resolved) == NULL) {
./src/box/module_cache.c:120:		diag_set(SystemError, "realpath");
./src/box/module_cache.c:126:	 * is guaranteed by realpath call.
./src/find_path.c:79:	if (realpath(buf, path) == NULL)

Both these places provide a valid input buffer of size PATH_MAX, which should be fine: https://stackoverflow.com/questions/4109638/what-is-the-safe-alternative-to-realpath

Speaking of getopt:

grep -rn getopt ./src --include *.c --include *.h --include *.cc --include *.cpp --include *.hpp --exclude *.lua.c --include *.lua
./src/lua/init.c:911:			unreachable(); /* checked by getopt() in main() */
./src/main.cc:39:#include <getopt.h>
./src/main.cc:672:	while ((ch = getopt_long(argc, argv, opts, longopts, NULL)) != -1) {
./src/main.cc:695:			/* "invalid option" is printed by getopt */

Some googling reveals that one has to limit argc and all of the argv[i] lengths to some "sane" value. What value is considered sane remains unknown. After some discussion in this PR (see below), it was decided to not try and patch getopt usage, since the only buggy getopt implementation known belongs to Solaris 2.5 from 1997.

The problem with getenv is of other nature: it might return string of arbitrary length and content, which can't be trusted. The relevant CWE (not CVE) is CWE-807 and CWE-20.
Here are our getenv usages:

sergey@sergey-pc:~/Source/sptnt$ grep -rn getenv ./src --include *.c --include *.h --include *.cc --include *.cpp --include *.hpp --exclude *.lua.c --include *.lua
./src/lib/core/coio_file.c:509:	const char *tmpdir = getenv("TMPDIR");
./src/lib/core/errinj.c:75:		const char *env_value = getenv(inj->name);
./src/proc_title.c:202: * that might try to hang onto a getenv() result.)
./src/proc_title.c:241:	 * is mandatory to flush internal libc caches on getenv/setenv
./src/systemd.c:54:	sd_unix_path = getenv("NOTIFY_SOCKET");
./src/box/module_cache.c:300:	const char *tmpdir = getenv("TMPDIR");
./src/box/sql/os_unix.c:1441:		azDirs[0] = getenv("SQL_TMPDIR");
./src/box/sql/os_unix.c:1446:		azDirs[1] = getenv("TMPDIR");
./src/box/lua/console.c:394:	const char *envvar = getenv("TT_CONSOLE_HIDE_SHOW_PROMPT");
./src/box/lua/console.lua:771:    local home_dir = os.getenv('HOME')
./src/box/lua/load_cfg.lua:1007:    local raw_value = os.getenv(env_var_name)
./src/lua/init.c:575:	const char *path = getenv(envname);
./src/lua/init.c:592:	const char *home = getenv("HOME");
./src/find_path.c:77:			snprintf(buf, sizeof(buf) - 1, "%s", getenv("_"));

Most of them are already treated correctly (that is, getenv output is either passed to snprintf or strcmp-ed to a static fixed size string). The ones, that weren't treated correctly are fixed in this patchset, including the ones in src/lua/* and src/box/lua*.

Please note, that TNT-601 also reports some findings in extra/* folder. There are extra/lemon.c, extra/bin2c.c and extra/txt2c.c. AFAICS these are all our building utilities and we are not concerned with their security. Please, correct me if I'm wrong.

Closes #7797

@sergepetrenko sergepetrenko force-pushed the validate-untrusted-user-input branch 2 times, most recently from 9c04e7c to 6afa9c0 Compare October 14, 2022 10:40
@sergepetrenko sergepetrenko added the do not merge Not ready to be merged label Oct 14, 2022
@sergepetrenko sergepetrenko force-pushed the validate-untrusted-user-input branch from 6afa9c0 to b54f37a Compare October 14, 2022 11:01
@locker locker removed their assignment Oct 14, 2022
@sergepetrenko sergepetrenko force-pushed the validate-untrusted-user-input branch 4 times, most recently from ced98df to f3fd668 Compare October 18, 2022 08:09
@sergepetrenko
Copy link
Collaborator Author

sergepetrenko commented Oct 18, 2022

I'm not sure we need the last commit.
Suppose there's a system with a buggy getopt_long. I suppose (no evidence here) that it would fail on 100kb arguments. Regardless of ARG_MAX value on that system. On the other hand, if there was a bug on that system, it must be triggered by arguments shorter than ARG_MAX, otherwise the OS wouldn't let the user to pass such long arguments.

I'm trying to say that neither fixed boundary (100kb) nor a boundary depending on ARG_MAX actually fix the problem which's being reported. So maybe it would be better to say that we don't run on systems with buggy getopt (which must be quite old ones), and ignore this "error".

@locker locker assigned sergepetrenko and unassigned locker Oct 18, 2022
@sergepetrenko sergepetrenko force-pushed the validate-untrusted-user-input branch from f3fd668 to 8a3e609 Compare October 18, 2022 13:58
@sergepetrenko sergepetrenko requested a review from locker October 18, 2022 14:05
@sergepetrenko sergepetrenko force-pushed the validate-untrusted-user-input branch from 93974a6 to 303b38c Compare October 24, 2022 11:16
@sergepetrenko sergepetrenko force-pushed the validate-untrusted-user-input branch from 303b38c to 91e2076 Compare October 24, 2022 11:36
@locker locker removed their assignment Oct 24, 2022
getenv() return values cannot be trusted, because an attacker might set
them. For instance, we shouldn't expect, that getenv() returns a value
of some sane size.

Another problem is that getenv() returns a pointer to one of
`char **environ` members, which might change upon next setenv().

Introduce a wrapper, getenv_safe(), which returns the value only when
it fits in a buffer of a specified size, and copies the value onto the
buffer. Use this wrapper everywhere in our code.

Below's a slightly decorated output of `grep -rwn getenv ./src --include
*.c --include *.h --include *.cc --include *.cpp --include *.hpp
--exclude *.lua.c` as of 2022-10-14.
`-` marks invalid occurences (comments, for example),
`*` marks the places that are already guarded before this patch,
`X` mars the places guarded in this patch, and
`^` marks places fixed in the next commit:

NO_WRAP
```
* ./src/lib/core/coio_file.c:509:	const char *tmpdir = getenv("TMPDIR");
X ./src/lib/core/errinj.c:75: const char *env_value = getenv(inj->name);
- ./src/proc_title.c:202: * that might try to hang onto a getenv() result.)
- ./src/proc_title.c:241:	* is mandatory to flush internal libc caches on getenv/setenv
X ./src/systemd.c:54: sd_unix_path = getenv("NOTIFY_SOCKET");
* ./src/box/module_cache.c:300: const char *tmpdir = getenv("TMPDIR");
X ./src/box/sql/os_unix.c:1441: azDirs[0] = getenv("SQL_TMPDIR");
X ./src/box/sql/os_unix.c:1446: azDirs[1] = getenv("TMPDIR");
* ./src/box/lua/console.c:394: const char *envvar = getenv("TT_CONSOLE_HIDE_SHOW_PROMPT");
^ ./src/box/lua/console.lua:771: local home_dir = os.getenv('HOME')
^ ./src/box/lua/load_cfg.lua:1007: local raw_value = os.getenv(env_var_name)
X ./src/lua/init.c:575: const char *path = getenv(envname);
X ./src/lua/init.c:592: const char *home = getenv("HOME");
* ./src/find_path.c:77: snprintf(buf, sizeof(buf) - 1, "%s", getenv("_"));
```
NO_WRAP

Part-of tarantool#7797

NO_DOC=security
Closes tarantool#7797

NO_DOC=security fix
NO_TEST=security fix
@sergepetrenko sergepetrenko force-pushed the validate-untrusted-user-input branch from 91e2076 to 904b754 Compare October 24, 2022 13:13
Copy link
Member

@Totktonada Totktonada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In reflection of several compromises we agreed upon, the patchset looks okay for me.

@Totktonada Totktonada removed their assignment Oct 24, 2022
@sergepetrenko sergepetrenko removed the request for review from sergos October 24, 2022 15:07
@sergepetrenko sergepetrenko added the full-ci Enables all tests for a pull request label Oct 24, 2022
@locker locker self-assigned this Oct 24, 2022
@kyukhin kyukhin merged commit dd7d46a into tarantool:master Oct 25, 2022
@sergepetrenko
Copy link
Collaborator Author

Please backport to 2.10 and 1.10 as well

@locker locker assigned kyukhin and unassigned locker Oct 25, 2022
@locker
Copy link
Member

locker commented Oct 25, 2022

Please backport to 2.10 and 1.10 as well

Why 1.10? We backport fixes needed for the security certification only to 2.10 AFAIK.

@sergepetrenko
Copy link
Collaborator Author

Why 1.10? We backport fixes needed for the security certification only to 2.10 AFAIK.

I thought we backport security fixes everywhere where bugfixes go. This is not only for certification, this is a real problem, which's getting fixed. But ok, up to you.

@locker
Copy link
Member

locker commented Oct 25, 2022

Why 1.10? We backport fixes needed for the security certification only to 2.10 AFAIK.

I thought we backport security fixes everywhere where bugfixes go. This is not only for certification, this is a real problem, which's getting fixed. But ok, up to you.

To the best of my knowledge, we don't backport all fixes to 1.10 - only those that were specifically requested by our customers.

@sergepetrenko
Copy link
Collaborator Author

To the best of my knowledge, we don't backport all fixes to 1.10 - only those that were specifically requested by our customers.

I see, thanks.

sergepetrenko added a commit to tarantool/checkpatch that referenced this pull request Oct 31, 2022
getenv() is bad: it returns a pointer to the environment, which might
be changed by a following call to setenv(), making the value pointed
to longer and leading to buffer overflows.

See tarantool/tarantool#7807
@sergepetrenko sergepetrenko deleted the validate-untrusted-user-input branch October 31, 2022 08:40
locker pushed a commit to tarantool/checkpatch that referenced this pull request Oct 31, 2022
getenv() is bad: it returns a pointer to the environment, which might
be changed by a following call to setenv(), making the value pointed
to longer and leading to buffer overflows.

See tarantool/tarantool#7807
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full-ci Enables all tests for a pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Validate untrusted user input
6 participants