List: freebsd-hackers
Subject: add closefrom() call revisited
From: Ighighi <ighighi () gmail ! com>
Date: 2007-09-19 9:28:05
Message-ID: 46F0EBA5.7020802 () gmail ! com
Given that NetBSD, OpenBSD and DragonFly (as well as Solaris and maybe
others) already provide closefrom(), it'd be nice and worthwhile to
implement it on FreeBSD too.
The attached shar archive contains 4 possible implementations of it.
One is a system call (the approach used by the other BSDs), available
here as a loadable kernel module for quick testing. The other three
are library versions. One of them doesn't currently work since
FreeBSD lacks a /proc/<pid>/fd/, which I tried to emulate with /dev/fd/,
both via devfs(5) and fdescfs(5): they seem to lack some types of
file descriptors... Another just does what a lot of programs do: try
close() on every possible file descriptor. The last one uses sysctl().
The implementation was inspired by the DragonFly code but the semantics
match Open/NetBSD's (EBADF vs EINVAL). Their code is available at:
http://www.dragonflybsd.org/cvsweb/~checkout~/src/sys/kern/kern_descrip.c
http://cvsweb.netbsd.org/bsdweb.cgi/~checkout~/src/sys/kern/kern_descrip.c
Also included in the archive is a timing test along with a regression
test borrowed from OpenSSH.
It was successfully built and tested on FreeBSD 6.2-STABLE.
There's code to make it work in -CURRENT.
A sample run on a 1.7 GHz Pentium 4:
$ make test
Trying closefrom_syscall(3) with 58976 open file descriptors
user 0.000000 sys 0.030874 total 0.030874
Trying closefrom_syscall(3) with 58976 closed file descriptors
user 0.000000 sys 0.000008 total 0.000008
Trying closefrom_sysctl(3) with 58976 open file descriptors
user 0.050941 sys 0.045333 total 0.096274
Trying closefrom_sysctl(3) with 58976 closed file descriptors
user 0.000877 sys 0.000939 total 0.001816
Trying closefrom_brute(3) with 58976 open file descriptors
user 0.037777 sys 0.043793 total 0.081570
Trying closefrom_brute(3) with 58976 closed file descriptors
user 0.026666 sys 0.046383 total 0.073049
closefrom_sysctl() has a worst-case scenario when a lot of files
are open that may make it slower than closefrom_brute().
Implementations using /proc/<pid>/fd/ are also vulnerable to this.
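To make that worst case concrete, here is a minimal sketch of the
filtering step an enumeration-based version has to perform. It is not
taken from the attached archive: struct fd_record is a hypothetical
stand-in for the two struct xfile fields that matter, and select_fds()
is an illustrative name. Every record returned by the enumeration is
inspected, so the cost follows the size of the listing rather than the
number of descriptors that actually get closed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the struct xfile fields used here. */
struct fd_record {
	pid_t	pid;	/* owning process */
	int	fd;	/* descriptor number within that process */
};

/*
 * Copy into out[] the descriptors owned by pid that are >= lowfd.
 * Every record is examined, so the work grows with n (the size of
 * the enumeration), not with the number of matches.
 * Returns the number of descriptors stored, at most outlen.
 */
size_t
select_fds(const struct fd_record *recs, size_t n, pid_t pid, int lowfd,
    int *out, size_t outlen)
{
	size_t i, found = 0;

	assert(recs != NULL && out != NULL);

	for (i = 0; i < n && found < outlen; i++)
		if (recs[i].pid == pid && recs[i].fd >= lowfd)
			out[found++] = recs[i].fd;

	return (found);
}
```

The brute-force version skips this listing step entirely, which is why
it can win once the listing becomes large.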
With no library version guaranteed to be faster, and because of the
various reasons discussed in
http://lists.freebsd.org/pipermail/freebsd-hackers/2007-July/thread.html
I believe it'd be best to implement it as a system call (which can be
done through fcntl() anyway).
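As a rough illustration of the fcntl() route, here is a sketch of a
userland wrapper. It is an assumption on my part, not code from the
archive: closefrom_fcntl() is a made-up name, and F_CLOSEM is the
NetBSD command mentioned in the README, which FreeBSD does not define.
Where F_CLOSEM is missing, the sketch falls back to the brute-force
loop, with sysconf(_SC_OPEN_MAX) as the upper bound.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: close every descriptor >= lowfd.
 * Follows the Open/NetBSD semantics (EBADF for a negative lowfd).
 */
int
closefrom_fcntl(int lowfd)
{
	if (lowfd < 0) {
		errno = EBADF;
		return (-1);
	}
#ifdef F_CLOSEM
	/* A single call closes every descriptor >= lowfd. */
	return (fcntl(lowfd, F_CLOSEM, 0));
#else
	/* No F_CLOSEM here: try every possible descriptor instead. */
	long maxfd = sysconf(_SC_OPEN_MAX);
	long fd;

	if (maxfd < 0)
		maxfd = 256;	/* arbitrary fallback, an assumption */
	for (fd = maxfd - 1; fd >= lowfd; fd--)
		if (close((int)fd) < 0 && errno == EINTR)
			return (-1);
	return (0);
#endif
}
```

A caller would typically invoke closefrom_fcntl(3) to keep only
stdin, stdout and stderr open.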
More info is included in the README.
Any ideas, suggestions?
Salutes,
Igh
["closefrom.shar" (text/plain)]
#!/bin/sh
# This is a shell archive
echo x closefrom
mkdir -p closefrom > /dev/null 2>&1
echo x closefrom/Makefile
sed 's/^X//' > closefrom/Makefile << 'SHAR_END'
XSUBDIR = module test
X
X.include <bsd.subdir.mk>
SHAR_END
echo x closefrom/README
sed 's/^X//' > closefrom/README << 'SHAR_END'
XOVERVIEW
X
XThis tarball contains 4 possible implementations of closefrom().
XThe first, a system call, is located in ./module/syscall.c and is
Xavailable as a kernel module for quick testing.
X
XBoth NetBSD >= 3.0 and DragonFly >= 1.4 implement it as a system call.
XIn NetBSD, it uses the F_CLOSEM fcntl(), available since version 2.0.
X
XThe second, implemented with the kern.file sysctl(), is available
Xon both FreeBSD >= 5.0 and DragonFly >= 1.2. Dynamic memory should be
Xallocated for an array of "struct xfile" structures that describe each
Xopen file descriptor _for every running process_ in
Xthe system...! (Note: the sysctl(3) manpage should be patched to reflect
Xthe current behaviour since FreeBSD 5.0: it should mention struct xfile).
XOn my system, the size of this structure is 52 bytes, so it could fail
Xon systems that set up a larger kern.maxfiles. This function would be
Xcleaner to implement in NetBSD which has an (undocumented) kern.file2
Xthat lets you work with a specific pid instead by passing KERN_FILE_BYPID.
X
XThe third is the usual brute force approach that uses getdtablesize(),
Xused for reference on the approach most applications take.
X
XThe fourth tries to do what some implementations (including Solaris') do
Xby browsing /proc/<pid>/fd/ but using /dev/fd/. Unfortunately, it doesn't
Xwork because neither devfs(5) nor fdescfs(5) seem to include duplicated
Xfile descriptors, sockets and maybe others.
X
X-o-
X
XIt was successfully built and tested on FreeBSD 6.2-STABLE (as of
XSept 18, 2007), though code that should work on -CURRENT is present
X(namely, the new FILEDESC_S[UN]LOCK macros).
X
XTo try the implementations, run these commands as follows:
X
Xcd module
Xmake
Xsudo make load
Xcd ..
Xcd test
Xmake
Xmake check
Xmake test
X
XFor repeated testing of any of the implementations you may run:
X./closefrom syscall
X./closefrom sysctl
X./closefrom brute
X
SHAR_END
echo x closefrom/module
mkdir -p closefrom/module > /dev/null 2>&1
echo x closefrom/test
mkdir -p closefrom/test > /dev/null 2>&1
echo x closefrom/test/closefrom.c
sed 's/^X//' > closefrom/test/closefrom.c << 'SHAR_END'
X/*
X * Copyright (c) 2007 by Ighighi
X * All rights reserved.
X *
X * Redistribution and use in source and binary forms, with or without
X * modification, are permitted provided that the following conditions
X * are met:
X *
X * 1. Redistributions of source code must retain the above copyright
X * notice, this list of conditions and the following disclaimer.
X * 2. Redistributions in binary form must reproduce the above copyright
X * notice, this list of conditions and the following disclaimer in the
X * documentation and/or other materials provided with the distribution.
X *
X * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
X * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
X * AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
X * THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
X * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
X * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
X * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
X * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
X * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
X * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
X */
X
X#include <dirent.h>
X#include <err.h>
X#include <errno.h>
X#include <fcntl.h>
X#include <limits.h>
X#include <stdio.h>
X#include <stdlib.h>
X#include <string.h>
X#include <unistd.h>
X#include <sys/types.h>
X#include <sys/param.h>
X#include <sys/file.h>
X#include <sys/resource.h>
X#include <sys/time.h>
X#include <sys/sysctl.h>
X
X#include <sys/syscall.h>
X#include <sys/module.h>
X
X#define DEBUG
X
Xstatic void
Xusage(const char *argv0)
X{
X fprintf(stderr, "Usage: %s syscall|sysctl|brute|devfd\n"
X "Usage: %s check\n", argv0, argv0);
X exit(1);
X}
X
Xstatic int (*closefrom)(int); /* pointer to closefrom_xxx() */
X
X/*
X * LKM version of closefrom()
X */
X
Xstatic int syscall_num;
X
Xstatic void
Xfind_module(void)
X{
X struct module_stat stat;
X int modid;
X
X modid = modfind("closefrom");
X if (modid == -1)
X err(1, "modfind(closefrom)");
X
X stat.version = sizeof(stat);
X if (modstat(modid, &stat) == -1)
X err(1, "modstat()");
X
X syscall_num = stat.data.intval;
X}
X
Xstatic int
Xclosefrom_syscall(int lowfd)
X{
X return (syscall(syscall_num, lowfd));
X}
X
X/*
X * This version uses the kern.file sysctl()
X */
Xstatic int
Xclosefrom_sysctl(int lowfd)
X{
X int mib[2] = { CTL_KERN, KERN_FILE };
X struct xfile *files = NULL;
X pid_t pid = getpid();
X size_t fsize;
X int i, nfiles;
X
X if (lowfd < 0) {
X errno = EBADF;
X return (-1);
X }
X
X for (;;) {
X if (sysctl(mib, 2, files, &fsize, NULL, 0) == -1) {
X if (errno != ENOMEM)
X goto bad;
X else if (files != NULL) {
X free(files);
X files = NULL;
X }
X } else if (files == NULL) {
X files = (struct xfile *) malloc(fsize);
X if (files == NULL)
X return (-1);
X } else
X break;
X }
X
X /* XXX This structure may change */
X if (files->xf_size != sizeof(struct xfile) ||
X fsize % sizeof(struct xfile))
X {
X errno = ENOSYS;
X goto bad;
X }
X
X nfiles = fsize / sizeof(struct xfile);
X
X for (i = 0; i < nfiles; i++)
X if (files[i].xf_pid == pid && files[i].xf_fd >= lowfd)
X if (close(files[i].xf_fd) < 0 && errno == EINTR)
X goto bad;
X
X free(files);
X return (0);
X
Xbad:
X if (files != NULL) {
X int save_errno = errno;
X free(files);
X errno = save_errno;
X }
X return (-1);
X}
X
X/*
X * This version iterates over all possible file descriptors >= lowfd
X */
Xstatic int
Xclosefrom_brute(int lowfd)
X{
X int fd;
X
X if (lowfd < 0) {
X errno = EBADF;
X return (-1);
X }
X
X for (fd = getdtablesize() - 1; fd >= lowfd; fd--)
X if (close(fd) < 0 && errno == EINTR)
X return (-1);
X
X return (0);
X}
X
X/*
X * An example implementation using /dev/fd (other systems use /proc/<pid>/fd)
X * Unfortunately, on FreeBSD, fdescfs(5) doesn't include duplicated file
X * descriptors and sockets.
X */
Xstatic int
Xclosefrom_devfd(int lowfd)
X{
X struct dirent *d;
X DIR *dir;
X int fd;
X
X if (lowfd < 0) {
X errno = EBADF;
X return (-1);
X }
X
X /*
X * Close lowfd so we have a spare fd to use with /dev/fd
X */
X close(lowfd++);
X
X if ((dir = opendir("/dev/fd")) == NULL)
X return (-1);
X
X while ((d = readdir(dir)) != NULL) {
X#ifdef DEBUG
X printf("%s\n", d->d_name);
X#endif
X if (d->d_name[0] == '.')
X continue;
X fd = atoi(d->d_name);
X if (fd >= lowfd && fd != dirfd(dir))
X if (close(fd) < 0 && errno == EINTR)
X goto bad;
X }
X
X (void)closedir(dir);
X return (0);
X
Xbad:
X {
X int save_errno = errno;
X (void)closedir(dir);
X errno = save_errno;
X return (-1);
X }
X}
X
Xstatic void
Xtime_closefrom(int lowfd)
X{
X struct rusage ru, rux;
X struct timeval tv;
X double usecs, ssecs;
X
X if (getrusage(RUSAGE_SELF, &ru) < 0)
X err(1, "getrusage()");
X if (closefrom(lowfd) < 0)
X err(1, "closefrom()");
X if (getrusage(RUSAGE_SELF, &rux) < 0)
X err(1, "getrusage()");
X
X timersub(&rux.ru_utime, &ru.ru_utime, &tv);
X usecs = ((double)tv.tv_sec + (double)tv.tv_usec / 1000000);
X printf("user\t%f\t", usecs);
X timersub(&rux.ru_stime, &ru.ru_stime, &tv);
X ssecs = ((double)tv.tv_sec + (double)tv.tv_usec / 1000000);
X printf("sys\t%f\t", ssecs);
X usecs += ssecs;
X printf("total\t%f\n", usecs);
X}
X
Xstatic void
Xtry(int (*xclosefrom)(int), const char *str)
X{
X int fd, lowfd, maxfd;
X
X lowfd = dup(STDIN_FILENO);
X maxfd = getdtablesize();
X for (fd = 1; fd < maxfd; fd++)
X if (dup(STDIN_FILENO) < 0)
X break;
X
X closefrom = xclosefrom;
X printf("Trying %s(%d) with %d open file descriptors\n", str, lowfd, fd);
X time_closefrom(lowfd);
X
X printf("Trying %s(%d) with %d closed file descriptors\n", str, lowfd, fd);
X time_closefrom(lowfd);
X printf("\n");
X}
X
Xint test(int (*)(int));
X
Xint
Xmain(int argc, char *argv[])
X{
X if (argv[1] == NULL)
X usage(argv[0]);
X
X if (!strcmp(argv[1], "check")) {
X find_module();
X printf("testing closefrom_syscall():\t%s\n",
X test(&closefrom_syscall) ? "failed" : "ok");
X printf("testing closefrom_sysctl():\t%s\n",
X test(&closefrom_sysctl) ? "failed" : "ok");
X printf("testing closefrom_brute():\t%s\n",
X test(&closefrom_brute) ? "failed" : "ok");
X }
X else if (!strcmp(argv[1], "syscall")) {
X find_module();
X try(&closefrom_syscall, "closefrom_syscall");
X }
X else if (!strcmp(argv[1], "sysctl"))
X try(&closefrom_sysctl, "closefrom_sysctl");
X else if (!strcmp(argv[1], "devfd"))
X try(&closefrom_devfd, "closefrom_devfd");
X else if (!strcmp(argv[1], "brute"))
X try(&closefrom_brute, "closefrom_brute");
X else
X usage(argv[0]);
X
X return (0);
X}
X
X/*
X * NOTE:
X * The following code was adapted from OpenSSH's
X * openbsd-compat/regress/closefromtest.c
X */
X
X/*
X * Copyright (c) 2006 Darren Tucker
X *
X * Permission to use, copy, modify, and distribute this software for any
X * purpose with or without fee is hereby granted, provided that the above
X * copyright notice and this permission notice appear in all copies.
X *
X * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
X * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
X * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
X * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
X * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
X * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
X * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
X */
X
X#define NUM_OPENS 10
X
X#define fail(str) \
X do { printf("%s\n", (str)); \
X return -1; } while(0)
X
Xint
Xtest(int (*xclosefrom)(int))
X{
X int i, max, fds[NUM_OPENS];
X char buf[512];
X
X for (i = 0; i < NUM_OPENS; i++)
X if ((fds[i] = open("/dev/null", O_RDONLY)) == -1)
X exit(0); /* can't test */
X max = i - 1;
X
X /* should close last fd only */
X xclosefrom(fds[max]);
X if (close(fds[max]) != -1)
X fail("failed to close highest fd");
X
X /* make sure we can still use remaining descriptors */
X for (i = 0; i < max; i++)
X if (read(fds[i], buf, sizeof(buf)) == -1)
X fail("closed descriptors it should not have");
X
X /* should close all fds */
X xclosefrom(fds[0]);
X for (i = 0; i < NUM_OPENS; i++)
X if (close(fds[i]) != -1)
X fail("failed to close from lowest fd");
X
X return 0;
X}
SHAR_END
echo x closefrom/test/Makefile
sed 's/^X//' > closefrom/test/Makefile << 'SHAR_END'
XPROG = closefrom
XNO_MAN =
X
XCFLAGS = -Wall -O2
X
Xcheck: ${PROG}
X	@./${PROG} check
X
Xtest: ${PROG}
X	@./${PROG} syscall
X	@./${PROG} sysctl
X	@./${PROG} brute
X
X.include <bsd.prog.mk>
SHAR_END
echo x closefrom/module/Makefile
mkdir -p closefrom/module > /dev/null 2>&1
sed 's/^X//' > closefrom/module/Makefile << 'SHAR_END'
XKMOD = syscall
XSRCS = syscall.c vnode_if.h
X
XCFLAGS += -Wall
X
Xreload:
X	@${MAKE} unload
X	@${MAKE} load
X
X.include <bsd.kmod.mk>
SHAR_END
echo x closefrom/module/syscall.c
sed 's/^X//' > closefrom/module/syscall.c << 'SHAR_END'
X/*
X * Copyright (c) 2007 by Ighighi
X * All rights reserved.
X *
X * Redistribution and use in source and binary forms, with or without
X * modification, are permitted provided that the following conditions
X * are met:
X *
X * 1. Redistributions of source code must retain the above copyright
X * notice, this list of conditions and the following disclaimer.
X * 2. Redistributions in binary form must reproduce the above copyright
X * notice, this list of conditions and the following disclaimer in the
X * documentation and/or other materials provided with the distribution.
X *
X * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
X * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
X * AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
X * THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
X * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
X * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
X * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
X * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
X * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
X * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
X */
X
X#include <sys/param.h>
X#include <sys/file.h>
X#include <sys/filedesc.h>
X#include <sys/kernel.h>
X#include <sys/proc.h>
X#include <sys/syscallsubr.h>
X#include <sys/sysent.h>
X#include <sys/systm.h>
X#include <sys/vnode.h>
X#include <sys/module.h>
X
X/*
X * Newer code in FreeBSD > 6.2 uses shared/exclusive locks
X */
X#ifndef FILEDESC_SLOCK
X#define FILEDESC_SLOCK FILEDESC_LOCK_FAST
X#define FILEDESC_SUNLOCK FILEDESC_UNLOCK_FAST
X#endif
X
X/*
X * kern_closefrom()
X */
Xstatic int
Xkern_closefrom(struct thread *td, int lowfd)
X{
X struct filedesc *fdp;
X int fd;
X
X /*
X * Note: NetBSD uses EBADF and DragonFly uses (undocumented) EINVAL
X */
X if (lowfd < 0)
X return (EBADF);
X
X fdp = td->td_proc->p_fd;
X
X FILEDESC_SLOCK(fdp);
X while ((fd = fdp->fd_lastfile) >= lowfd) {
X FILEDESC_SUNLOCK(fdp);
X if (kern_close(td, fd) == EINTR)
X return (EINTR);
X FILEDESC_SLOCK(fdp);
X }
X FILEDESC_SUNLOCK(fdp);
X
X return (0);
X}
X
X/* closefrom() arguments */
Xstruct closefrom_args {
X int fd;
X};
X
Xstatic int
Xclosefrom(struct thread *td, void *args)
X{
X struct closefrom_args *uap = (struct closefrom_args *)args;
X
X return (kern_closefrom(td, uap->fd));
X}
X
X/* closefrom() sysent[] */
Xstatic struct sysent closefrom_sysent = {
X 1, /* number of arguments */
X closefrom /* implementing function */
X};
X
X/*
X * LKM stuff
X */
X
X/* offset in sysent[] where the syscall will be allocated */
Xstatic int offset = NO_SYSCALL;
X
Xstatic int
Xload(struct module *module, int cmd, void *arg)
X{
X int error = 0;
X
X switch (cmd) {
X case MOD_LOAD:
X uprintf("closefrom loaded at offset %d\n", offset);
X break;
X
X case MOD_UNLOAD:
X uprintf("closefrom unloaded from offset %d\n", offset);
X break;
X
X default:
X error = EOPNOTSUPP;
X break;
X }
X
X return (error);
X}
X
XSYSCALL_MODULE(closefrom, &offset, &closefrom_sysent, load, NULL);
SHAR_END
exit