Commit 0eeca28

Robert Love authored and Linus Torvalds committed
[PATCH] inotify
inotify is intended to correct the deficiencies of dnotify, particularly its inability to scale and its terrible user interface:

* dnotify requires the opening of one fd per each directory that you intend to watch. This quickly results in too many open files and pins removable media, preventing unmount.
* dnotify is directory-based. You only learn about changes to directories. Sure, a change to a file in a directory affects the directory, but you are then forced to keep a cache of stat structures.
* dnotify's interface to user-space is awful. Signals?

inotify provides a more usable, simple, powerful solution to file change notification:

* inotify's interface is a system call that returns a fd, not SIGIO. You get a single fd, which is select()-able.
* inotify has an event that says "the filesystem that the item you were watching is on was unmounted."
* inotify can watch directories or files.

Inotify is currently used by Beagle (a desktop search infrastructure), Gamin (a FAM replacement), and other projects.

See Documentation/filesystems/inotify.txt.

Signed-off-by: Robert Love <[email protected]>
Cc: John McCutchan <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
1 parent bd4c625 commit 0eeca28

24 files changed: +1639 -67 lines changed

Documentation/filesystems/inotify.txt

Lines changed: 138 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,138 @@
1+
inotify
2+
a powerful yet simple file change notification system
3+
4+
5+
6+
Document started 15 Mar 2005 by Robert Love <[email protected]>
7+
8+
(i) User Interface
9+
10+
Inotify is controlled by a set of three system calls.
11+
12+
The first step in using inotify is to initialise an inotify instance:
13+
14+
int fd = inotify_init ();
15+
16+
Change events are managed by "watches". A watch is an (object,mask) pair where
17+
the object is a file or directory and the mask is a bit mask of one or more
18+
inotify events that the application wishes to receive. See <linux/inotify.h>
19+
for valid events. A watch is referenced by a watch descriptor, or wd.
20+
21+
Watches are added via a path to the file.
22+
23+
Watches on a directory will return events on any files inside of the directory.
24+
25+
Adding a watch is simple:
26+
27+
int wd = inotify_add_watch (fd, path, mask);
28+
29+
You can add a large number of files via something like
30+
31+
for each file to watch {
32+
int wd = inotify_add_watch (fd, file, mask);
33+
}
34+
35+
You can update an existing watch in the same manner, by passing in a new mask.
36+
37+
An existing watch is removed via the inotify_rm_watch system call, for example
38+
39+
inotify_rm_watch (fd, wd);
40+
41+
Events are provided in the form of an inotify_event structure that is read(2)
42+
from an inotify instance fd. The filename is of dynamic length and follows the
43+
struct. It is of size len. The filename is padded with null bytes to ensure
44+
proper alignment. This padding is reflected in len.
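As a rough illustration of the layout described above, the sketch below walks a buffer filled by read(2) and steps over each variable-length record. The helper name walk_events is hypothetical and not part of this patch. The structure and its len field come from <sys/inotify.h> as shipped by current C libraries, which is assumed here to match the structure this document describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>

/*
 * Hypothetical helper (not part of the patch): walk the n bytes that
 * read(2) returned from an inotify fd and print every complete event.
 * Returns the number of complete events found.
 */
static int walk_events(const char *buf, size_t n)
{
	size_t off = 0;
	int count = 0;

	assert(buf != NULL || n == 0);

	while (n - off >= sizeof(struct inotify_event)) {
		struct inotify_event ev;
		const char *name;

		/* Copy the fixed-size header out so buffer alignment does not matter. */
		memcpy(&ev, buf + off, sizeof(ev));

		/* The null-padded name follows the header and occupies ev.len bytes. */
		if (ev.len > n - off - sizeof(ev))
			break;	/* truncated record: stop rather than overrun */
		name = ev.len ? buf + off + sizeof(ev) : "";

		printf("wd=%d mask=0x%x name=%s\n",
		       ev.wd, (unsigned int)ev.mask, name);

		/* Advance by header plus padded name to reach the next event. */
		off += sizeof(ev) + ev.len;
		count++;
	}
	return count;
}
```

A caller would pass the buffer and the byte count returned by read(2). Because len already includes the padding, no extra rounding is needed when stepping to the next record.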
45+
46+
You can slurp multiple events by passing a large buffer, for example
47+
48+
ssize_t len = read (fd, buf, BUF_LEN);
49+
50+
This returns as many events as are available and fit in BUF_LEN.
51+
52+
Each inotify instance fd is also select()- and poll()-able.
53+
54+
You can find the size of the current event queue via the FIONREAD ioctl.
55+
56+
All watches are destroyed and cleaned up on close.
57+
58+
59+
(ii) Internal Kernel Implementation
60+
61+
Each open inotify instance is associated with an inotify_device structure.
62+
63+
Each watch is associated with an inotify_watch structure. Watches are chained
64+
off of each associated device and each associated inode.
65+
66+
See fs/inotify.c for the locking and lifetime rules.
67+
68+
69+
(iii) Rationale
70+
71+
Q: What is the design decision behind not tying the watch to the open fd of
72+
the watched object?
73+
74+
A: Watches are associated with an open inotify device, not an open file.
75+
This solves the primary problem with dnotify: keeping the file open pins
76+
the file and thus, worse, pins the mount. Dnotify is therefore infeasible
77+
for use on a desktop system with removable media as the media cannot be
78+
unmounted.
79+
80+
Q: What is the design decision behind using an fd-per-device as opposed to
81+
an fd-per-watch?
82+
83+
A: An fd-per-watch quickly consumes more file descriptors than are allowed,
84+
more fd's than are feasible to manage, and more fd's than are optimally
85+
select()-able. Yes, root can bump the per-process fd limit and yes, users
86+
can use epoll, but requiring both is a silly and extraneous requirement.
87+
A watch consumes less memory than an open file; separating the number
88+
spaces is thus sensible. The current design is what user-space developers
89+
want: Users initialize inotify, once, and add n watches, requiring but one fd
90+
and no twiddling with fd limits. Initializing an inotify instance two
91+
thousand times is silly. If we can implement user-space's preferences
92+
cleanly--and we can, the idr layer makes stuff like this trivial--then we
93+
should.
94+
95+
There are other good arguments. With a single fd, there is a single
96+
item to block on, which is mapped to a single queue of events. The single
97+
fd returns all watch events and also any potential out-of-band data. If
98+
every fd was a separate watch,
99+
100+
- There would be no way to get event ordering. Events on file foo and
101+
file bar would pop poll() on both fd's, but there would be no way to tell
102+
which happened first. A single queue trivially gives you ordering. Such
103+
ordering is crucial to existing applications such as Beagle. Imagine
104+
"mv a b ; mv b a" events without ordering.
105+
106+
- We'd have to maintain n fd's and n internal queues with state,
107+
versus just one. It is a lot messier in the kernel. A single, linear
108+
queue is the data structure that makes sense.
109+
110+
- User-space developers prefer the current API. The Beagle guys, for
111+
example, love it. Trust me, I asked. It is not a surprise: Who'd want
112+
to manage and block on 1000 fd's via select?
113+
114+
- You'd have to manage the fd's, as an example: Call close() when you
115+
received a delete event.
116+
117+
- No way to get out of band data.
118+
119+
- 1024 is still too low. ;-)
120+
121+
When you talk about designing a file change notification system that
122+
scales to 1000s of directories, juggling 1000s of fd's just does not seem
123+
the right interface. It is too heavy.
124+
125+
Q: Why the system call approach?
126+
127+
A: The poor user-space interface is the second biggest problem with dnotify.
128+
Signals are a terrible, terrible interface for file notification. Or for
129+
anything, for that matter. The ideal solution, from all perspectives, is a
130+
file descriptor-based one that allows basic file I/O and poll/select.
131+
Obtaining the fd and managing the watches could have been done either via a
132+
device file or a family of new system calls. We decided to implement a
133+
family of system calls because that is the preferred approach for new kernel
134+
features and it meets our user interface requirements.
135+
136+
Additionally, it _is_ possible to have more than one instance and
137+
juggle more than one queue and thus more than one associated fd.
138+

arch/i386/kernel/syscall_table.S

Lines changed: 3 additions & 0 deletions
@@ -291,3 +291,6 @@ ENTRY(sys_call_table)
291291
.long sys_keyctl
292292
.long sys_ioprio_set
293293
.long sys_ioprio_get /* 290 */
294+
.long sys_inotify_init
295+
.long sys_inotify_add_watch
296+
.long sys_inotify_rm_watch

fs/Kconfig

Lines changed: 13 additions & 0 deletions
@@ -359,6 +359,19 @@ config ROMFS_FS
359359
If you don't know whether you need it, then you don't need it:
360360
answer N.
361361

362+
config INOTIFY
363+
bool "Inotify file change notification support"
364+
default y
365+
---help---
366+
Say Y here to enable inotify support and the /dev/inotify character
367+
device. Inotify is a file change notification system and a
368+
replacement for dnotify. Inotify fixes numerous shortcomings in
369+
dnotify and introduces several new features. It allows monitoring
370+
of both files and directories via a single open fd. Multiple file
371+
events are supported.
372+
373+
If unsure, say Y.
374+
362375
config QUOTA
363376
bool "Quota support"
364377
help

fs/Makefile

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ obj-y := open.o read_write.o file_table.o buffer.o bio.o super.o \
1212
seq_file.o xattr.o libfs.o fs-writeback.o mpage.o direct-io.o \
1313
ioprio.o
1414

15+
obj-$(CONFIG_INOTIFY) += inotify.o
1516
obj-$(CONFIG_EPOLL) += eventpoll.o
1617
obj-$(CONFIG_COMPAT) += compat.o
1718

fs/attr.c

Lines changed: 4 additions & 29 deletions
@@ -10,7 +10,7 @@
1010
#include <linux/mm.h>
1111
#include <linux/string.h>
1212
#include <linux/smp_lock.h>
13-
#include <linux/dnotify.h>
13+
#include <linux/fsnotify.h>
1414
#include <linux/fcntl.h>
1515
#include <linux/quotaops.h>
1616
#include <linux/security.h>
@@ -107,31 +107,8 @@ int inode_setattr(struct inode * inode, struct iattr * attr)
107107
out:
108108
return error;
109109
}
110-
111110
EXPORT_SYMBOL(inode_setattr);
112111

113-
int setattr_mask(unsigned int ia_valid)
114-
{
115-
unsigned long dn_mask = 0;
116-
117-
if (ia_valid & ATTR_UID)
118-
dn_mask |= DN_ATTRIB;
119-
if (ia_valid & ATTR_GID)
120-
dn_mask |= DN_ATTRIB;
121-
if (ia_valid & ATTR_SIZE)
122-
dn_mask |= DN_MODIFY;
123-
/* both times implies a utime(s) call */
124-
if ((ia_valid & (ATTR_ATIME|ATTR_MTIME)) == (ATTR_ATIME|ATTR_MTIME))
125-
dn_mask |= DN_ATTRIB;
126-
else if (ia_valid & ATTR_ATIME)
127-
dn_mask |= DN_ACCESS;
128-
else if (ia_valid & ATTR_MTIME)
129-
dn_mask |= DN_MODIFY;
130-
if (ia_valid & ATTR_MODE)
131-
dn_mask |= DN_ATTRIB;
132-
return dn_mask;
133-
}
134-
135112
int notify_change(struct dentry * dentry, struct iattr * attr)
136113
{
137114
struct inode *inode = dentry->d_inode;
@@ -197,11 +174,9 @@ int notify_change(struct dentry * dentry, struct iattr * attr)
197174
if (ia_valid & ATTR_SIZE)
198175
up_write(&dentry->d_inode->i_alloc_sem);
199176

200-
if (!error) {
201-
unsigned long dn_mask = setattr_mask(ia_valid);
202-
if (dn_mask)
203-
dnotify_parent(dentry, dn_mask);
204-
}
177+
if (!error)
178+
fsnotify_change(dentry, ia_valid);
179+
205180
return error;
206181
}
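The hunks above remove setattr_mask() and pass ia_valid to fsnotify_change() instead; the body of fsnotify_change() is not shown in this excerpt. As a reading aid only, the sketch below restates the removed dnotify mapping as stand-alone user-space C. The SK_* constants are hypothetical stand-ins chosen for this sketch, not the kernel's ATTR_* and DN_* values, and the function name is likewise an assumption.

```c
/* Hypothetical stand-ins for the kernel's ATTR_* flags. */
enum {
	SK_ATTR_MODE  = 1 << 0,
	SK_ATTR_UID   = 1 << 1,
	SK_ATTR_GID   = 1 << 2,
	SK_ATTR_SIZE  = 1 << 3,
	SK_ATTR_ATIME = 1 << 4,
	SK_ATTR_MTIME = 1 << 5
};

/* Hypothetical stand-ins for the dnotify DN_* event bits. */
enum {
	SK_DN_ACCESS = 1 << 0,
	SK_DN_MODIFY = 1 << 1,
	SK_DN_ATTRIB = 1 << 2
};

/* Mirrors the branch structure of the removed setattr_mask(). */
static unsigned long sketch_setattr_mask(unsigned int ia_valid)
{
	const unsigned int both = SK_ATTR_ATIME | SK_ATTR_MTIME;
	unsigned long dn_mask = 0;

	/* Ownership and mode changes are attribute changes. */
	if (ia_valid & (SK_ATTR_UID | SK_ATTR_GID | SK_ATTR_MODE))
		dn_mask |= SK_DN_ATTRIB;
	/* A size change modifies the file contents. */
	if (ia_valid & SK_ATTR_SIZE)
		dn_mask |= SK_DN_MODIFY;
	/* Both times set together implies a utime(s) call. */
	if ((ia_valid & both) == both)
		dn_mask |= SK_DN_ATTRIB;
	else if (ia_valid & SK_ATTR_ATIME)
		dn_mask |= SK_DN_ACCESS;
	else if (ia_valid & SK_ATTR_MTIME)
		dn_mask |= SK_DN_MODIFY;

	return dn_mask;
}
```

Moving this translation behind a single fsnotify_change() call means notify_change() no longer needs to know which notification back ends exist.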
207182

fs/compat.c

Lines changed: 8 additions & 4 deletions
@@ -37,7 +37,7 @@
3737
#include <linux/ctype.h>
3838
#include <linux/module.h>
3939
#include <linux/dirent.h>
40-
#include <linux/dnotify.h>
40+
#include <linux/fsnotify.h>
4141
#include <linux/highuid.h>
4242
#include <linux/sunrpc/svc.h>
4343
#include <linux/nfsd/nfsd.h>
@@ -1307,9 +1307,13 @@ static ssize_t compat_do_readv_writev(int type, struct file *file,
13071307
out:
13081308
if (iov != iovstack)
13091309
kfree(iov);
1310-
if ((ret + (type == READ)) > 0)
1311-
dnotify_parent(file->f_dentry,
1312-
(type == READ) ? DN_ACCESS : DN_MODIFY);
1310+
if ((ret + (type == READ)) > 0) {
1311+
struct dentry *dentry = file->f_dentry;
1312+
if (type == READ)
1313+
fsnotify_access(dentry);
1314+
else
1315+
fsnotify_modify(dentry);
1316+
}
13131317
return ret;
13141318
}
13151319

fs/file_table.c

Lines changed: 3 additions & 0 deletions
@@ -16,6 +16,7 @@
1616
#include <linux/eventpoll.h>
1717
#include <linux/mount.h>
1818
#include <linux/cdev.h>
19+
#include <linux/fsnotify.h>
1920

2021
/* sysctl tunables... */
2122
struct files_stat_struct files_stat = {
@@ -126,6 +127,8 @@ void fastcall __fput(struct file *file)
126127
struct inode *inode = dentry->d_inode;
127128

128129
might_sleep();
130+
131+
fsnotify_close(file);
129132
/*
130133
* The function eventpoll_release() should be the first called
131134
* in the file cleanup chain.

fs/inode.c

Lines changed: 6 additions & 0 deletions
@@ -21,6 +21,7 @@
2121
#include <linux/pagemap.h>
2222
#include <linux/cdev.h>
2323
#include <linux/bootmem.h>
24+
#include <linux/inotify.h>
2425

2526
/*
2627
* This is needed for the following functions:
@@ -202,6 +203,10 @@ void inode_init_once(struct inode *inode)
202203
INIT_LIST_HEAD(&inode->i_data.i_mmap_nonlinear);
203204
spin_lock_init(&inode->i_lock);
204205
i_size_ordered_init(inode);
206+
#ifdef CONFIG_INOTIFY
207+
INIT_LIST_HEAD(&inode->inotify_watches);
208+
sema_init(&inode->inotify_sem, 1);
209+
#endif
205210
}
206211

207212
EXPORT_SYMBOL(inode_init_once);
@@ -351,6 +356,7 @@ int invalidate_inodes(struct super_block * sb)
351356

352357
down(&iprune_sem);
353358
spin_lock(&inode_lock);
359+
inotify_unmount_inodes(&sb->s_inodes);
354360
busy = invalidate_list(&sb->s_inodes, &throw_away);
355361
spin_unlock(&inode_lock);
356362
