
List:       linux-xfs
Subject:    Shrink support - idea/plan/problems
From:       Iustin Pop <iusty () k1024 ! org>
Date:       2005-08-28 22:55:52
Message-ID: 20050828225552.GB17330 () saytrin ! hq ! k1024 ! org

Hello to all,

I have a proposal to implement a very simple shrink operation. "Simple"
means that the user is responsible for clearing the space to be removed,
and that it doesn't work with a real-time section.

The idea is based on my guesswork through the sources and on the
correctness of the following assumptions:
 - an AG contains only bnobt blocks, cntbt blocks, inobt blocks and user
   data blocks;
 - there is some way of computing the number of blocks used by the above
   btrees;
 - a way to block allocations to an AG can be implemented (I'm thinking
   about an in-core flag for AGs which xfs_alloc_vextent should check).

Kernel-side, the steps are (somewhat reverse of xfs_growfs_data_private):
 1. compute new number of AGs and blocks;
 2. check that we don't trip over the log (if it is internal);
 3. for each AG to be fully removed, block allocations in AG and check:
    a. check that agf->agf_length == XFS_PREALLOC_BLOCKS() +
       agf->agf_freeblks + sum(blocks in bnobt, cntbt, inobt); this means
       that no user data is held in this AG;
    b. check that agi->agi_count == agi->agi_freecount; this means that
       no inodes are used in this AG;
 4. if the (new) last AG will have number of blocks smaller than
    sb_agblocks, allocate an extent occupying exactly the space to be
    removed; if it is not possible, it means that some data is using that
    space; also this takes care of modifying the freespace lists, etc.
    for this AG;
 5. update the superblocks with new agcount, dblocks, fdblocks;
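
The arithmetic behind steps 1 and 3 can be sketched as a toy userspace
model in Python. This is not kernel code: shrink_geometry, AgCounters and
ag_is_empty are hypothetical names, the counters are assumed to have been
read from the AGF/AGI already, and the btree block count is taken as a
given input.

```python
from dataclasses import dataclass


def shrink_geometry(new_dblocks, agblocks):
    """Step 1: return (new_agcount, last_ag_blocks) for the new size."""
    if new_dblocks <= 0 or agblocks <= 0:
        raise ValueError("block counts must be positive")
    full_ags, remainder = divmod(new_dblocks, agblocks)
    if remainder == 0:
        # The new size is a multiple of the AG size: no partial last AG.
        return full_ags, agblocks
    # Otherwise the last AG is shorter than sb_agblocks (step 4 applies).
    return full_ags + 1, remainder


@dataclass
class AgCounters:
    """Hypothetical snapshot of the per-AG counters used in step 3."""
    length: int           # total blocks in the AG (AGF length)
    free_blocks: int      # free blocks recorded in the AGF
    prealloc_blocks: int  # AG header blocks, i.e. XFS_PREALLOC_BLOCKS()
    btree_blocks: int     # blocks used by bnobt + cntbt + inobt
    inode_count: int      # allocated inodes recorded in the AGI
    inode_free: int       # free inodes recorded in the AGI


def ag_is_empty(ag):
    """Steps 3a and 3b: no user data and no inodes in use in this AG."""
    no_user_data = ag.length == (
        ag.prealloc_blocks + ag.free_blocks + ag.btree_blocks
    )
    no_inodes = ag.inode_count == ag.inode_free
    return no_user_data and no_inodes
```

For example, with sb_agblocks = 256, shrinking to 1000 blocks leaves 4 AGs
with a last AG of 232 blocks, so step 4 would have to claim the remaining
24 blocks of that AG.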

I have a patch which does some parts of this (works ok on empty
filesystems), with the following issues:
 - 3a I don't know how to implement; judging from the logical structure
   of the btrees, a walk function starting at the root and following
   ptrs field until level 0 and counting each block could do it;
   however, the xfs_bmap_btree and xfs_ialloc_btree stuff is very
   complicated;
 - in 4, I use xfs_alloc_vextent with the args parameter filled by
   guesswork; it seems to work and the file system is ok (as reported by
   xfs_check and xfs_repair); however, more information is needed here
 - in xfs_growfs_data_private, if new AGs are added, mp->m_perag is
   resized; presumably this is also needed for shrink, although I don't
   know what issues there are with just shrinking that structure;
 - I don't understand from xfs_growfs_data_private if locking is needed
   or if there are problems when the AGs we are modifying are present in
   some caches; the grow operation implementation seems very simple :)
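
The block-counting walk mentioned for 3a could look roughly like the
following toy model in Python. BtBlock and count_btree_blocks are
hypothetical names; the model assumes each btree block is already in
memory with its level and child pointers, whereas a real implementation
would have to read each block through the buffer cache.

```python
from dataclasses import dataclass, field


@dataclass
class BtBlock:
    """Toy btree block: level 0 is a leaf, higher levels hold pointers."""
    level: int
    ptrs: list = field(default_factory=list)


def count_btree_blocks(root):
    """Count every block reachable from root, including the leaves."""
    total = 0
    stack = [root]
    while stack:
        block = stack.pop()
        total += 1
        # Only interior blocks (level > 0) have child pointers to follow.
        if block.level > 0:
            stack.extend(block.ptrs)
    return total
```

Running this once per btree (bnobt, cntbt, inobt) and summing the results
would give the sum(...) term needed by the check in 3a.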

All in all, I'm able to shrink and grow at will (no panics, df reports
ok, xfs_check and xfs_repair are not complaining) an empty filesystem
(down to the middle where the log is) and which no-one is accessing :)
The current version of the patch just adds a new function in xfs_fsops.c
and modifies xfs_growfs_data_private to call that function if the
requested number of data blocks is lower than the current number.

To help the user clear the space, a userspace program can compute
all items having blocks/attribute blocks in that region, and the inodes
present in the affected AGs; however, if any btree has blocks in the
portion to be removed from the last AG (if not shrinking down to a
multiple of sb_agblocks), I think that cannot be computed (from
userspace). For actually moving the data off those blocks, there would be:
 - a program using xfsctl(XFS_IOC_SWAPEXT, ...) for clearing data
   blocks or manual copy (after disabling allocations to those AGs);
 - either a version of xfs_reno for clearing inodes or manual copy of
   those items (but somehow after disabling allocations to those AGs);
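
A minimal sketch of how such a userspace helper might decide which
extents have to move, again as a toy Python model. It relies on a
filesystem block number carrying the AG number in its high bits and the
AG-relative block number in the low sb_agblklog bits. The function names
and the (inode, fsbno, length) tuples are hypothetical; how the tool
obtains the extent list is left open.

```python
def fsb_to_ag(fsbno, agblklog):
    """Split a filesystem block number into (agno, agbno)."""
    return fsbno >> agblklog, fsbno & ((1 << agblklog) - 1)


def extent_in_removed_space(fsbno, length, agblklog,
                            new_agcount, last_ag_blocks):
    """True if any block of the extent lies beyond the new end of data."""
    agno, agbno = fsb_to_ag(fsbno, agblklog)
    if agno >= new_agcount:
        # The whole AG is going away.
        return True
    if agno == new_agcount - 1:
        # The new last AG may be truncated to last_ag_blocks.
        return agbno + length > last_ag_blocks
    return False


def extents_to_move(extents, agblklog, new_agcount, last_ag_blocks):
    """Filter (inode, fsbno, length) tuples down to those needing a move."""
    return [
        ext for ext in extents
        if extent_in_removed_space(ext[1], ext[2], agblklog,
                                   new_agcount, last_ag_blocks)
    ]
```

The same AG test applies to inodes, since an inode number also encodes
its AG number in the high bits.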

What do you think? Is it a reasonable idea to try to finish this? And
does anyone have pointers toward the issues described above?

Thanks,
Iustin Pop

