Restrict repository indexing by glob match #7767
Merged
Changes from 26 commits
29 commits
a85393b
Restrict repository indexing by file extension
4353c71
Merge master into indexbyfileext
c48c37a
Use REPO_EXTENSIONS_LIST_INCLUDE instead of REPO_EXTENSIONS_LIST_EXCL…
c6d5a79
Corrected to pass lint gosimple
021014a
Merge master into indexbyfileext
72a650c
Add wildcard support to REPO_INDEXER_EXTENSIONS
1d7edb4
This reverts commit 72a650c8e42f4abf59d5df7cd5dc27b451494cc6.
7450aee
Add wildcard support to REPO_INDEXER_EXTENSIONS (no make vendor)
106faf3
Simplify isIndexable() for better clarity
guillep2k e48f041
Add gobwas/glob to vendors
guillep2k bf82bdb
Merge branch master into indexbyfileext
guillep2k d63a0fb
Merge branch 'master' of github.com:go-gitea/gitea into indexbyfileext
guillep2k 49e260c
Merge master into indexbyfileext and resolve conflicts
guillep2k 7c66a16
Merge branch 'master' into indexbyfileext
guillep2k f2342d6
Merge branch 'master' into indexbyfileext
guillep2k 55a93a3
manually set appengine new release
guillep2k 89773e9
Merge branch 'master' into indexbyfileext
guillep2k 0eca697
Merge branch 'master' of github.com:go-gitea/gitea into indexbyfileext
guillep2k 6bd0ac8
Implement better REPO_INDEXER_INCLUDE and REPO_INDEXER_EXCLUDE
guillep2k 435a222
Merge branch 'master' of github.com:go-gitea/gitea into indexbyfileext
guillep2k 4de7be9
Add unit and integration tests
guillep2k b79406a
Merge branch 'indexbyfileext' of github.com:guillep2k/gitea into inde…
guillep2k 9afc70e
Merge branch 'master' of github.com:go-gitea/gitea into indexbyfileext
guillep2k d92ee99
Update app.ini.sample and reword config-cheat-sheet
guillep2k 59fe641
Add doc page and correct app.ini.sample
guillep2k a1f675b
Some polish on the doc
guillep2k 480a406
Simplify code as suggested by @lafriks
guillep2k e0fbcb7
Merge branch 'master' into indexbyfileext
guillep2k 326f770
Merge branch 'master' into indexbyfileext
lafriks
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
--- | ||
date: "2019-09-06T01:35:00-03:00" | ||
title: "Repository indexer" | ||
slug: "repo-indexer" | ||
weight: 45 | ||
toc: true | ||
draft: false | ||
menu: | ||
sidebar: | ||
parent: "advanced" | ||
name: "Repository indexer" | ||
weight: 45 | ||
identifier: "repo-indexer" | ||
--- | ||
|
||
# Repository indexer | ||
|
||
## Setting up the repository indexer | ||
|
||
Gitea can search through the files of your repositories once you enable this function in your [`app.ini`](https://docs.gitea.io/en-us/config-cheat-sheet/): | ||
|
||
``` | ||
[indexer] | ||
; ... | ||
REPO_INDEXER_ENABLED = true | ||
REPO_INDEXER_PATH = indexers/repos.bleve | ||
UPDATE_BUFFER_LEN = 20 | ||
MAX_FILE_SIZE = 1048576 | ||
REPO_INDEXER_INCLUDE = | ||
REPO_INDEXER_EXCLUDE = resources/bin/** | ||
``` | ||
|
||
Please bear in mind that indexing the contents can consume a lot of system resources, especially when the index is created for the first time or globally updated (e.g. after upgrading Gitea). | ||
|
||
### Choosing the files for indexing by size | ||
|
||
The `MAX_FILE_SIZE` option makes the indexer skip all files larger than the specified value, given in bytes (the `1048576` in the example above is 1 MiB). | ||
|
||
### Choosing the files for indexing by path | ||
|
||
Gitea applies glob pattern matching from the [`gobwas/glob` library](https://github.com/gobwas/glob) to choose which files will be included in the index. | ||
|
||
Limiting the list of files prevents the indexes from becoming polluted with derived or irrelevant files (e.g. lss, sym, map), so the search results are more relevant. It can also help reduce the index size. | ||
|
||
`REPO_INDEXER_INCLUDE` (default: empty) is a comma-separated list of glob patterns to **include** in the index. An empty list means "_include all files_". | ||
`REPO_INDEXER_EXCLUDE` (default: empty) is a comma-separated list of glob patterns to **exclude** from the index. Files that match this list will not be indexed. `REPO_INDEXER_EXCLUDE` takes precedence over `REPO_INDEXER_INCLUDE`. | ||
|
||
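The precedence rule above can be sketched in Go as follows. This is only an illustration, not the code from this pull request: `matcher`, `isIndexable`, `hasSuffix` and `hasPrefix` are hypothetical names, and plain string checks stand in for the compiled glob patterns so that the example needs nothing beyond the standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// matcher reports whether a repository-relative path matches one pattern.
// Hypothetical stand-in for a compiled glob pattern.
type matcher func(path string) bool

// isIndexable applies the documented rules: exclude always wins, and an
// empty include list admits every file that was not excluded.
func isIndexable(path string, include, exclude []matcher) bool {
	path = strings.ToLower(path) // file names are normalized to lower case
	for _, m := range exclude {
		if m(path) {
			return false
		}
	}
	if len(include) == 0 {
		return true
	}
	for _, m := range include {
		if m(path) {
			return true
		}
	}
	return false
}

// hasSuffix and hasPrefix build trivial matchers for the demonstration.
func hasSuffix(s string) matcher {
	return func(p string) bool { return strings.HasSuffix(p, s) }
}

func hasPrefix(s string) matcher {
	return func(p string) bool { return strings.HasPrefix(p, s) }
}

func main() {
	include := []matcher{hasSuffix(".txt")}
	exclude := []matcher{hasPrefix("resources/bin/")}
	for _, p := range []string{"README.txt", "resources/bin/notes.txt", "main.go"} {
		fmt.Printf("%-26s indexable=%v\n", p, isIndexable(p, include, exclude))
	}
}
```

Checking the exclude list first is one simple way to guarantee that a file matching both lists is never indexed.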
Pattern matching works as follows: | ||
|
||
* To match all files with a `.txt` extension no matter what directory, use `**.txt`. | ||
* To match all files with a `.txt` extension _only at the root level of the repository_, use `*.txt`. | ||
* To match all files inside `resources/bin` and below, use `resources/bin/**`. | ||
* To match all files _immediately inside_ `resources/bin`, use `resources/bin/*`. | ||
* To match all files named `Makefile`, use `**Makefile`. | ||
* Matching a directory has no effect: the pattern `resources/bin` will not include or exclude the files inside that directory, but `resources/bin/**` will. | ||
* All files and patterns are normalized to lower case, so `**Makefile`, `**makefile` and `**MAKEFILE` are equivalent. | ||
|
||
|
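The pattern rules listed above can be tried out with a small approximation. The sketch below does not use `gobwas/glob`; it translates only the `*` and `**` wildcards into a standard-library regular expression, and it assumes `/` is the only separator that `*` refuses to cross (the separators Gitea actually passes to the library are not shown on this page). `globToRegexp` and `matches` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// globToRegexp translates the two wildcards used in the examples:
// "**" matches any run of characters, "*" any run that stays inside one
// path segment. Other glob syntax (?, [...], {...}) is treated literally.
func globToRegexp(pattern string) *regexp.Regexp {
	var b strings.Builder
	b.WriteString("^")
	for i := 0; i < len(pattern); i++ {
		switch {
		case strings.HasPrefix(pattern[i:], "**"):
			b.WriteString(".*")
			i++ // skip the second '*'
		case pattern[i] == '*':
			b.WriteString("[^/]*")
		default:
			b.WriteString(regexp.QuoteMeta(pattern[i : i+1]))
		}
	}
	b.WriteString("$")
	return regexp.MustCompile(b.String())
}

// matches lower-cases both sides first, mirroring the normalization rule.
func matches(pattern, path string) bool {
	return globToRegexp(strings.ToLower(pattern)).MatchString(strings.ToLower(path))
}

func main() {
	cases := []struct{ pattern, path string }{
		{"**.txt", "docs/notes.txt"},
		{"*.txt", "docs/notes.txt"},
		{"*.txt", "notes.txt"},
		{"resources/bin/**", "resources/bin/x/tool.o"},
		{"resources/bin/*", "resources/bin/x/tool.o"},
		{"resources/bin", "resources/bin/tool.o"},
		{"**Makefile", "src/makefile"},
	}
	for _, c := range cases {
		fmt.Printf("%-18s %-24s %v\n", c.pattern, c.path, matches(c.pattern, c.path))
	}
}
```

In this approximation `**Makefile` also matches any path that merely ends in `makefile`, such as `GNUmakefile`, because `**` places no constraint on what precedes the name.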
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
ref: refs/heads/master |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
[core] | ||
repositoryformatversion = 0 | ||
filemode = true | ||
bare = true |
1 change: 1 addition & 0 deletions
integrations/gitea-repositories-meta/user2/glob.git/description
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Unnamed repository; edit this file 'description' to name the repository. |
15 changes: 15 additions & 0 deletions
integrations/gitea-repositories-meta/user2/glob.git/hooks/applypatch-msg.sample
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
#!/bin/sh | ||
# | ||
# An example hook script to check the commit log message taken by | ||
# applypatch from an e-mail message. | ||
# | ||
# The hook should exit with non-zero status after issuing an | ||
# appropriate message if it wants to stop the commit. The hook is | ||
# allowed to edit the commit message file. | ||
# | ||
# To enable this hook, rename this file to "applypatch-msg". | ||
|
||
. git-sh-setup | ||
commitmsg="$(git rev-parse --git-path hooks/commit-msg)" | ||
test -x "$commitmsg" && exec "$commitmsg" ${1+"$@"} | ||
: |
24 changes: 24 additions & 0 deletions
integrations/gitea-repositories-meta/user2/glob.git/hooks/commit-msg.sample
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
#!/bin/sh | ||
# | ||
# An example hook script to check the commit log message. | ||
# Called by "git commit" with one argument, the name of the file | ||
# that has the commit message. The hook should exit with non-zero | ||
# status after issuing an appropriate message if it wants to stop the | ||
# commit. The hook is allowed to edit the commit message file. | ||
# | ||
# To enable this hook, rename this file to "commit-msg". | ||
|
||
# Uncomment the below to add a Signed-off-by line to the message. | ||
# Doing this in a hook is a bad idea in general, but the prepare-commit-msg | ||
# hook is more suited to it. | ||
# | ||
# SOB=$(git var GIT_AUTHOR_IDENT | sed -n 's/^\(.*>\).*$/Signed-off-by: \1/p') | ||
# grep -qs "^$SOB" "$1" || echo "$SOB" >> "$1" | ||
|
||
# This example catches duplicate Signed-off-by lines. | ||
|
||
test "" = "$(grep '^Signed-off-by: ' "$1" | | ||
sort | uniq -c | sed -e '/^[ ]*1[ ]/d')" || { | ||
echo >&2 Duplicate Signed-off-by lines. | ||
exit 1 | ||
} |
114 changes: 114 additions & 0 deletions
integrations/gitea-repositories-meta/user2/glob.git/hooks/fsmonitor-watchman.sample
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,114 @@ | ||
#!/usr/bin/perl | ||
|
||
use strict; | ||
use warnings; | ||
use IPC::Open2; | ||
|
||
# An example hook script to integrate Watchman | ||
# (https://facebook.github.io/watchman/) with git to speed up detecting | ||
# new and modified files. | ||
# | ||
# The hook is passed a version (currently 1) and a time in nanoseconds | ||
# formatted as a string and outputs to stdout all files that have been | ||
# modified since the given time. Paths must be relative to the root of | ||
# the working tree and separated by a single NUL. | ||
# | ||
# To enable this hook, rename this file to "query-watchman" and set | ||
# 'git config core.fsmonitor .git/hooks/query-watchman' | ||
# | ||
my ($version, $time) = @ARGV; | ||
|
||
# Check the hook interface version | ||
|
||
if ($version == 1) { | ||
# convert nanoseconds to seconds | ||
$time = int $time / 1000000000; | ||
} else { | ||
die "Unsupported query-fsmonitor hook version '$version'.\n" . | ||
"Falling back to scanning...\n"; | ||
} | ||
|
||
my $git_work_tree; | ||
if ($^O =~ 'msys' || $^O =~ 'cygwin') { | ||
$git_work_tree = Win32::GetCwd(); | ||
$git_work_tree =~ tr/\\/\//; | ||
} else { | ||
require Cwd; | ||
$git_work_tree = Cwd::cwd(); | ||
} | ||
|
||
my $retry = 1; | ||
|
||
launch_watchman(); | ||
|
||
sub launch_watchman { | ||
|
||
my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j --no-pretty') | ||
or die "open2() failed: $!\n" . | ||
"Falling back to scanning...\n"; | ||
|
||
# In the query expression below we're asking for names of files that | ||
# changed since $time but were not transient (ie created after | ||
# $time but no longer exist). | ||
# | ||
# To accomplish this, we're using the "since" generator to use the | ||
# recency index to select candidate nodes and "fields" to limit the | ||
# output to file names only. Then we're using the "expression" term to | ||
# further constrain the results. | ||
# | ||
# The category of transient files that we want to ignore will have a | ||
# creation clock (cclock) newer than $time_t value and will also not | ||
# currently exist. | ||
|
||
my $query = <<" END"; | ||
["query", "$git_work_tree", { | ||
"since": $time, | ||
"fields": ["name"], | ||
"expression": ["not", ["allof", ["since", $time, "cclock"], ["not", "exists"]]] | ||
}] | ||
END | ||
|
||
print CHLD_IN $query; | ||
close CHLD_IN; | ||
my $response = do {local $/; <CHLD_OUT>}; | ||
|
||
die "Watchman: command returned no output.\n" . | ||
"Falling back to scanning...\n" if $response eq ""; | ||
die "Watchman: command returned invalid output: $response\n" . | ||
"Falling back to scanning...\n" unless $response =~ /^\{/; | ||
|
||
my $json_pkg; | ||
eval { | ||
require JSON::XS; | ||
$json_pkg = "JSON::XS"; | ||
1; | ||
} or do { | ||
require JSON::PP; | ||
$json_pkg = "JSON::PP"; | ||
}; | ||
|
||
my $o = $json_pkg->new->utf8->decode($response); | ||
|
||
if ($retry > 0 and $o->{error} and $o->{error} =~ m/unable to resolve root .* directory (.*) is not watched/) { | ||
print STDERR "Adding '$git_work_tree' to watchman's watch list.\n"; | ||
$retry--; | ||
qx/watchman watch "$git_work_tree"/; | ||
die "Failed to make watchman watch '$git_work_tree'.\n" . | ||
"Falling back to scanning...\n" if $? != 0; | ||
|
||
# Watchman will always return all files on the first query so | ||
# return the fast "everything is dirty" flag to git and do the | ||
# Watchman query just to get it over with now so we won't pay | ||
# the cost in git to look up each individual file. | ||
print "/\0"; | ||
eval { launch_watchman() }; | ||
exit 0; | ||
} | ||
|
||
die "Watchman: $o->{error}.\n" . | ||
"Falling back to scanning...\n" if $o->{error}; | ||
|
||
binmode STDOUT, ":utf8"; | ||
local $, = "\0"; | ||
print @{$o->{files}}; | ||
} |
8 changes: 8 additions & 0 deletions
integrations/gitea-repositories-meta/user2/glob.git/hooks/post-update.sample
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
#!/bin/sh | ||
# | ||
# An example hook script to prepare a packed repository for use over | ||
# dumb transports. | ||
# | ||
# To enable this hook, rename this file to "post-update". | ||
|
||
exec git update-server-info |
14 changes: 14 additions & 0 deletions
integrations/gitea-repositories-meta/user2/glob.git/hooks/pre-applypatch.sample
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
#!/bin/sh | ||
# | ||
# An example hook script to verify what is about to be committed | ||
# by applypatch from an e-mail message. | ||
# | ||
# The hook should exit with non-zero status after issuing an | ||
# appropriate message if it wants to stop the commit. | ||
# | ||
# To enable this hook, rename this file to "pre-applypatch". | ||
|
||
. git-sh-setup | ||
precommit="$(git rev-parse --git-path hooks/pre-commit)" | ||
test -x "$precommit" && exec "$precommit" ${1+"$@"} | ||
: |
49 changes: 49 additions & 0 deletions
integrations/gitea-repositories-meta/user2/glob.git/hooks/pre-commit.sample
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
#!/bin/sh | ||
# | ||
# An example hook script to verify what is about to be committed. | ||
# Called by "git commit" with no arguments. The hook should | ||
# exit with non-zero status after issuing an appropriate message if | ||
# it wants to stop the commit. | ||
# | ||
# To enable this hook, rename this file to "pre-commit". | ||
|
||
if git rev-parse --verify HEAD >/dev/null 2>&1 | ||
then | ||
against=HEAD | ||
else | ||
# Initial commit: diff against an empty tree object | ||
against=$(git hash-object -t tree /dev/null) | ||
fi | ||
|
||
# If you want to allow non-ASCII filenames set this variable to true. | ||
allownonascii=$(git config --bool hooks.allownonascii) | ||
|
||
# Redirect output to stderr. | ||
exec 1>&2 | ||
|
||
# Cross platform projects tend to avoid non-ASCII filenames; prevent | ||
# them from being added to the repository. We exploit the fact that the | ||
# printable range starts at the space character and ends with tilde. | ||
if [ "$allownonascii" != "true" ] && | ||
# Note that the use of brackets around a tr range is ok here, (it's | ||
# even required, for portability to Solaris 10's /usr/bin/tr), since | ||
# the square bracket bytes happen to fall in the designated range. | ||
test $(git diff --cached --name-only --diff-filter=A -z $against | | ||
LC_ALL=C tr -d '[ -~]\0' | wc -c) != 0 | ||
then | ||
cat <<\EOF | ||
Error: Attempt to add a non-ASCII file name. | ||
|
||
This can cause problems if you want to work with people on other platforms. | ||
|
||
To be portable it is advisable to rename the file. | ||
|
||
If you know what you are doing you can disable this check using: | ||
|
||
git config hooks.allownonascii true | ||
EOF | ||
exit 1 | ||
fi | ||
|
||
# If there are whitespace errors, print the offending file names and fail. | ||
exec git diff-index --check --cached $against -- |