History log of /drivers/md/raid0.c
Revision Date Author Comments
24b961f811a3e790a9b93604d2594bfb6cce4fa4 01-Apr-2012 Jes Sorensen <Jes.Sorensen@redhat.com> md: Avoid OOPS when reshaping raid1 to raid0

raid1 arrays do not have the notion of chunk size. Calculate the
largest chunk sector size we can use to avoid a divide by zero OOPS
when aligning the size of the new array to the chunk size.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
0366ef847581d692e197b88825867ca9ee00e358 02-Apr-2012 majianpeng <majianpeng@gmail.com> md/raid0: If md_integrity_register() fails, raid0_run() must free the mem.

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
ba13da47ffa202784355561f72160a41350e95cc 18-Mar-2012 NeilBrown <neilb@suse.de> md: add proper merge_bvec handling to RAID0 and Linear.

These personalities currently set a max request size of one page
when any member device has a merge_bvec_fn because they don't
bother to call that function.

This causes extra works in splitting and combining requests.

So make the extra effort to call the merge_bvec_fn when it exists
so that we end up with larger requests out the bottom.

Signed-off-by: NeilBrown <neilb@suse.de>
dafb20fa34320a472deb7442f25a0c086e0feb33 18-Mar-2012 NeilBrown <neilb@suse.de> md: tidy up rdev_for_each usage.

md.h has an 'rdev_for_each()' macro for iterating the rdevs in an
mddev. However it uses the 'safe' version of list_for_each_entry,
and so requires the extra variable, but doesn't include 'safe' in the
name, which is useful documentation.

Consequently some places use this safe version without needing it, and
many use an explicity list_for_each entry.

So:
- rename rdev_for_each to rdev_for_each_safe
- create a new rdev_for_each which uses the plain
list_for_each_entry,
- use the 'safe' version only where needed, and convert all other
list_for_each_entry calls to use rdev_for_each.

Signed-off-by: NeilBrown <neilb@suse.de>
056075c76417b112b4924e7b6386fdc6dfc9ac03 03-Jul-2011 Paul Gortmaker <paul.gortmaker@windriver.com> md: Add module.h to all files using it implicitly

A pending cleanup will mean that module.h won't be implicitly
everywhere anymore. Make sure the modular drivers in md dir
are actually calling out for <module.h> explicitly in advance.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
84fc4b56db85cb9e05326424049973a2036c9940 11-Oct-2011 NeilBrown <neilb@suse.de> md: rename "mdk_personality" to "md_personality"

"mdk" doesn't mean anything any more.

Signed-off-by: NeilBrown <neilb@suse.de>
e373ab109172abc2d821bd3b5c1b400acddef5a5 11-Oct-2011 NeilBrown <neilb@suse.de> md/raid0: typedef removal: raid0_conf_t -> struct r0conf

Signed-off-by: NeilBrown <neilb@suse.de>
fd01b88c75a718020ff77e7f560d33835e9b58de 11-Oct-2011 NeilBrown <neilb@suse.de> md: remove typedefs: mddev_t -> struct mddev

Having mddev_t and 'struct mddev_s' is ugly and not preferred

Signed-off-by: NeilBrown <neilb@suse.de>
3cb03002000f133f9f97269edefd73611eafc873 11-Oct-2011 NeilBrown <neilb@suse.de> md: removing typedefs: mdk_rdev_t -> struct md_rdev

The typedefs are just annoying. 'mdk' probably refers to 'md_k.h'
which used to be an include file that defined this thing.

Signed-off-by: NeilBrown <neilb@suse.de>
50de8df4abca1b27dbf7b2f81a56451bd8b5a7d8 07-Oct-2011 NeilBrown <neilb@suse.de> md/raid0: convert some printks to pr_debug.

When md assembles a RAID0 array it prints out lots of info which
is really just for debugging, so convert that to pr_debug.
It also prints out the resulting configuration which could be
interesting, so keep that as 'printk' but tidy it up a bit.

Signed-off-by: NeilBrown <neilb@suse.de>
bdc04e6b15f70a8f96d8cdfe21df159a6466b49a 07-Oct-2011 NeilBrown <neilb@suse.de> md: remove some old DEBUGging code.

This code is not really helpful and is hard to maintain, so just
discard it.

Signed-off-by: NeilBrown <neilb@suse.de>
5a7bbad27a410350e64a2d7f5ec18fc73836c14f 12-Sep-2011 Christoph Hellwig <hch@infradead.org> block: remove support for bio remapping from ->make_request

There is very little benefit in allowing to let a ->make_request
instance update the bios device and sector and loop around it in
__generic_make_request when we can archive the same through calling
generic_make_request from the driver and letting the loop in
generic_make_request handle it.

Note that various drivers got the return value from ->make_request and
returned non-zero values for errors.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
a91a2785b200864aef2270ed6a3babac7a253a20 17-Mar-2011 Martin K. Petersen <martin.petersen@oracle.com> block: Require subsystems to explicitly allocate bio_set integrity mempool

MD and DM create a new bio_set for every metadevice. Each bio_set has an
integrity mempool attached regardless of whether the metadevice is
capable of passing integrity metadata. This is a waste of memory.

Instead we defer the allocation decision to MD and DM since we know at
metadevice creation time whether integrity passthrough is needed or not.

Automatic integrity mempool allocation can then be removed from
bioset_create() and we make an explicit integrity allocation for the
fs_bio_set.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Acked-by: Mike Snitzer <snizer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
7eaceaccab5f40bbfda044629a6298616aeaed50 10-Mar-2011 Jens Axboe <jaxboe@fusionio.com> block: remove per-queue plugging

Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
da9cf5050a2e3dbc3cf26a8d908482eb4485ed49 21-Feb-2011 NeilBrown <neilb@suse.de> md: avoid spinlock problem in blk_throtl_exit

blk_throtl_exit assumes that ->queue_lock still exists,
so make sure that it does.
To do this, we stop redirecting ->queue_lock to conf->device_lock
and leave it pointing where it is initialised - __queue_lock.

As the blk_plug functions check the ->queue_lock is held, we now
take that spin_lock explicitly around the plug functions. We don't
need the locking, just the warning removal.

This is needed for any kernel with the blk_throtl code, which is
which is 2.6.37 and later.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
f7bee80945155ad0326916486dabc38428c6cdef 14-Feb-2011 Krzysztof Wojcik <krzysztof.wojcik@intel.com> md: Fix raid1->raid0 takeover

Takeover raid1->raid0 not succeded. Kernel message is shown:
"md/raid0:md126: too few disks (1 of 2) - aborting!"

Problem was that we weren't updating ->raid_disks for that
takeover, unlike all the others.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
fc3a08b85b7a4f6c1069e5f71f6ad40d925ff55b 31-Jan-2011 Krzysztof Wojcik <krzysztof.wojcik@intel.com> Add raid1->raid0 takeover support

This patch introduces raid 1 to raid0 takeover operation
in kernel space.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: Neil Brown <neilb@nbeee.brown>
e9c7469bb4f502dafc092166201bea1ad5fc0fbf 03-Sep-2010 Tejun Heo <tj@kernel.org> md: implment REQ_FLUSH/FUA support

This patch converts md to support REQ_FLUSH/FUA instead of now
deprecated REQ_HARDBARRIER. In the core part (md.c), the following
changes are notable.

* Unlike REQ_HARDBARRIER, REQ_FLUSH/FUA don't interfere with
processing of other requests and thus there is no reason to mark the
queue congested while FLUSH/FUA is in progress.

* REQ_FLUSH/FUA failures are final and its users don't need retry
logic. Retry logic is removed.

* Preflush needs to be issued to all member devices but FUA writes can
be handled the same way as other writes - their processing can be
deferred to request_queue of member devices. md_barrier_request()
is renamed to md_flush_request() and simplified accordingly.

For linear, raid0 and multipath, the core changes are enough. raid1,
5 and 10 need the following conversions.

* raid1: Handling of FLUSH/FUA bio's can simply be deferred to
request_queues of member devices. Barrier related logic removed.

* raid5: Queue draining logic dropped. FUA bit is propagated through
biodrain and stripe resconstruction such that all the updated parts
of the stripe are written out with FUA writes if any of the dirtying
writes was FUA. preread_active_stripes handling in make_request()
is updated as suggested by Neil Brown.

* raid10: FUA bit needs to be propagated to write clones.

linear, raid0, 1, 5 and 10 tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
7b6d91daee5cac6402186ff224c3af39d79f4a0e 07-Aug-2010 Christoph Hellwig <hch@lst.de> block: unify flags for struct bio and struct request

Remove the current bio flags and reuse the request flags for the bio, too.
This allows to more easily trace the type of I/O from the filesystem
down to the block driver. There were two flags in the bio that were
missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
renamed two request flags that had a superflous RW in them.

Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
049d6c1ef983c9ac43aa423dfd752071a5b0002d 16-Jun-2010 Maciej Trela <maciej.trela@intel.com> md: enable raid4->raid0 takeover

Only level 5 with layout=PARITY_N can be taken over to raid0 now.
Lets allow level 4 either.

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
001048a318d48e93cb6a1246f3b20335b2a7c855 16-Jun-2010 Maciej Trela <maciej.trela@intel.com> md: clear layout after ->raid0 takeover

After takeover from raid5/10 -> raid0 mddev->layout is not cleared.

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
e93f68a1fc6244c05ad8fae28e75835ec74ab34e 15-Jun-2010 NeilBrown <neilb@suse.de> md: fix handling of array level takeover that re-arranges devices.

Most array level changes leave the list of devices largely unchanged,
possibly causing one at the end to become redundant.
However conversions between RAID0 and RAID10 need to renumber
all devices (except 0).

This renumbering is currently being done in the ->run method when the
new personality takes over. However this is too late as the common
code in md.c might already have invalidated some of the devices if
they had a ->raid_disk number that appeared to high.

Moving it into the ->takeover method is too early as the array is
still active at that time and wrong ->raid_disk numbers could cause
confusion.

So add a ->new_raid_disk field to mdk_rdev_s and use it to communicate
the new raid_disk number.
Now the common code knows exactly which devices need to be renumbered,
and which can be invalidated, and can do it all at a convenient time
when the array is suspend.
It can also update some symlinks in sysfs which previously were not be
updated correctly.

Reported-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
b5a20961f3479dda48bdc340354ee5469997839d 03-May-2010 NeilBrown <neilb@suse.de> md/raid0: tidy up printk messages.

All messages now start
md/raid0:md-device-name:

Signed-off-by: NeilBrown <neilb@suse.de>
21a52c6d05c15f862797736393915bfa8cd40ee9 01-Apr-2010 NeilBrown <neilb@suse.de> md: pass mddev to make_request functions rather than request_queue

We used to pass the personality make_request function direct
to the block layer so the first argument had to be a queue.
But now we have the intermediary md_make_request so it makes
at lot more sense to pass a struct mddev_s.
It makes it possible to have an mddev without its own queue too.

Signed-off-by: NeilBrown <neilb@suse.de>
490773268cf64f68da2470e07b52c7944da6312d 25-Mar-2010 NeilBrown <neilb@suse.de> md: move io accounting out of personalities into md_make_request

While I generally prefer letting personalities do as much as possible,
given that we have a central md_make_request anyway we may as well use
it to simplify code.
Also this centralises knowledge of ->gendisk which will help later.

Signed-off-by: NeilBrown <neilb@suse.de>
9af204cf720cedf369cf823bbd806c350201f7ea 08-Mar-2010 Trela, Maciej <Maciej.Trela@intel.com> md: Add support for Raid5->Raid0 and Raid10->Raid0 takeover


Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
84707f38e767ac470fd82af6c45a8cafe2bd1b9a 16-Mar-2010 NeilBrown <neilb@suse.de> md: don't use mddev->raid_disks in raid0 or raid10 while array is active.

In a subsequent patch we will make it possible to change
mddev->raid_disks while a RAID0 or RAID10 array is active. This is
part of the process of reshaping such an array.

This means that we cannot use this value while processes requests
(it is OK to use it during initialisation as we are locked against
changes then).
Both RAID0 and RAID10 have the same value stored in the private data
structure, so use that value instead.

Signed-off-by: NeilBrown <neilb@suse.de>
5a0e3ad6af8660be21ca98a971cd00f331318c05 24-Mar-2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
627a2d3c29427637f4c5d31ccc7fcbd8d312cd71 08-Mar-2010 NeilBrown <neilb@suse.de> md: deal with merge_bvec_fn in component devices better.

If a component device has a merge_bvec_fn then as we never call it
we must ensure we never need to. Currently this is done by setting
max_sector to 1 PAGE, however this does not stop a bio being created
with several sub-page iovecs that would violate the merge_bvec_fn.

So instead set max_segments to 1 and set the segment boundary to the
same as a page boundary to ensure there is only ever one single-page
segment of IO requested at a time.

This can particularly be an issue when 'xen' is used as it is
known to submit multiple small buffers in a single bio.

Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
086fa5ff0854c676ec333760f4c0154b3b242616 26-Feb-2010 Martin K. Petersen <martin.petersen@oracle.com> block: Rename blk_queue_max_sectors to blk_queue_max_hw_sectors

The block layer calling convention is blk_queue_<limit name>.
blk_queue_max_sectors predates this practice, leading to some confusion.
Rename the function to appropriately reflect that its intended use is to
set max_hw_sectors.

Also introduce a temporary wrapper for backwards compability. This can
be removed after the merge window is closed.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
0efb9e6191e1d3d34c1db90b829b742bc36d532e 13-Dec-2009 NeilBrown <neilb@suse.de> md: add MODULE_DESCRIPTION for all md related modules.

Suggested by Oren Held <orenhe@il.ibm.com>

Signed-off-by: NeilBrown <neilb@suse.de>
a2826aa92e2e14db372eda01d333267258944033 13-Dec-2009 NeilBrown <neilb@suse.de> md: support barrier requests on all personalities.

Previously barriers were only supported on RAID1. This is because
other levels requires synchronisation across all devices and so needed
a different approach.
Here is that approach.

When a barrier arrives, we send a zero-length barrier to every active
device. When that completes - and if the original request was not
empty - we submit the barrier request itself (with the barrier flag
cleared) and then submit a fresh load of zero length barriers.

The barrier request itself is asynchronous, but any subsequent
request will block until the barrier completes.

The reason for clearing the barrier flag is that a barrier request is
allowed to fail. If we pass a non-empty barrier through a striping
raid level it is conceivable that part of it could succeed and part
could fail. That would be way too hard to deal with.
So if the first run of zero length barriers succeed, we assume all is
sufficiently well that we send the request and ignore errors in the
second run of barriers.

RAID5 needs extra care as write requests may not have been submitted
to the underlying devices yet. So we flush the stripe cache before
proceeding with the barrier.

Note that the second set of zero-length barriers are submitted
immediately after the original request is submitted. Thus when
a personality finds mddev->barrier to be set during make_request,
it should not return from make_request until the corresponding
per-device request(s) have been queued.

That will be done in later patches.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Andre Noll <maan@systemlinux.org>
3fa841d7e7266f6fcc1b3885b905f5153ba897d8 23-Sep-2009 NeilBrown <neilb@suse.de> md: report device as congested when suspended

This should writeback from coming when the device is temporarily
suspended.

Signed-off-by: NeilBrown <neilb@suse.de>
a9f326ebf22a0de776815240fb76dabe139397ea 23-Sep-2009 NeilBrown <neilb@suse.de> md: remove sparse waring "symbol xxx shadows an earlier one"

Rename some variable and remove some duplicate definitions
to avoid there warnings. None of them are actual errors.

Signed-off-by: NeilBrown <neilb@suse.de>
1f98a13f623e0ef666690a18c1250335fc6d7ef1 11-Sep-2009 Jens Axboe <jens.axboe@oracle.com> bio: first step in sanitizing the bio->bi_rw flag testing

Get rid of any functions that test for these bits and make callers
use bio_rw_flagged() directly. Then it is at least directly apparent
what variable and flag they check.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
ac5e7113e74872928844d00085bd47c988f12728 03-Aug-2009 Andre Noll <maan@systemlinux.org> md: Push down data integrity code to personalities.

This patch replaces md_integrity_check() by two new public functions:
md_integrity_register() and md_integrity_add_rdev() which are both
personality-independent.

md_integrity_register() is called from the ->run and ->hot_remove
methods of all personalities that support data integrity. The
function iterates over the component devices of the array and
determines if all active devices are integrity capable and if their
profiles match. If this is the case, the common profile is registered
for the mddev via blk_integrity_register().

The second new function, md_integrity_add_rdev() is called from the
->hot_add_disk methods, i.e. whenever a new device is being added
to a raid array. If the new device does not support data integrity,
or has a profile different from the one already registered, data
integrity for the mddev is disabled.

For raid0 and linear, only the call to md_integrity_register() from
the ->run method is necessary.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
8f6c2e4b325a8e9f8f47febb2fd0ed4fae7d45a9 01-Jul-2009 Martin K. Petersen <martin.petersen@oracle.com> md: Use new topology calls to indicate alignment and I/O sizes

Switch MD over to the new disk_stack_limits() function which checks for
aligment and adjusts preferred I/O sizes when stacking.

Also indicate preferred I/O sizes where applicable.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
0894cc3066aaa3e75a99383c0d25feebf9b688ac 18-Jun-2009 Andre Noll <maan@systemlinux.org> md: Move check for bitmap presence to personality code.

If the superblock of a component device indicates the presence of a
bitmap but the corresponding raid personality does not support bitmaps
(raid0, linear, multipath, faulty), then something is seriously wrong
and we'd better refuse to run such an array.

Currently, this check is performed while the superblocks are examined,
i.e. before entering personality code. Therefore the generic md layer
must know which raid levels support bitmaps and which do not.

This patch avoids this layer violation without adding identical code
to various personalities. This is accomplished by introducing a new
public function to md.c, md_check_no_bitmap(), which replaces the
hard-coded checks in the superblock loading functions.

A call to md_check_no_bitmap() is added to the ->run method of each
personality which does not support bitmaps and assembly is aborted
if at least one component device contains a bitmap.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
13f2682b7216ebebd72b3d5868fe7fccec91a92d 18-Jun-2009 NeilBrown <neilb@suse.de> md: raid0/linear: ensure device sizes are rounded to chunk size.

This is currently ensured by common code, but it is more reliable to
ensure it where it is needed in personality code.
All the other personalities that care already round the size to
the chunk_size. raid0 and linear are the only hold-outs.

Signed-off-by: NeilBrown <neilb@suse.de>
d6e412eaa52db82010f12ea7d2c9b9468e933c44 18-Jun-2009 NeilBrown <neilb@suse.de> md: raid0: chunk_sectors cleanups.

following the conversion to chunk_sectors, there is room
for cleaning up a little.

Signed-off-by: NeilBrown <neilb@suse.de>
9d8f0363623b3da12c43007cf77f5e1a4e8a5964 18-Jun-2009 Andre Noll <maan@systemlinux.org> md: Make mddev->chunk_size sector-based.

This patch renames the chunk_size field to chunk_sectors with the
implied change of semantics. Since

is_power_of_2(chunk_size) = is_power_of_2(chunk_sectors << 9)
= is_power_of_2(chunk_sectors)

these bits don't need an adjustment for the shift.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
fbb704efb784e2c8418e34dc3013af76bdd58101 16-Jun-2009 raz ben yehuda <raziebe@gmail.com> md: raid0 :Enables chunk size other than powers of 2.

Maintain two flows, one for pow2 chunk sizes (which uses masks and
shift), and a flow for the general case (which uses sector_div).
This is for the sake of performance.

- introduce map_sector and is_io_in_chunk_boundary to encapsulate
those two flows better for raid0_make_request
- fix blk_mergeable to support the two flows.

Signed-off-by: raziebe@gmail.com
Signed-off-by: NeilBrown <neilb@suse.de>
92e59b6ba21845fadd2cce725010a9351740b76e 16-Jun-2009 raz ben yehuda <raziebe@gmail.com> md: raid0: chunk size check in raid0_run

have raid0 check chunk size in run method instead of in md.
This is part of a series moving the checks from common code to
the personalities where they belong.

hardsect is short and chunksize is an int, so it is safe to use %.

Signed-off-by: raziebe@gmail.com
Signed-off-by: NeilBrown <neilb@suse.de>
46994191ae8fdf1cbcc1f29282576b269a638c69 16-Jun-2009 raz ben yehuda <raziebe@gmail.com> md: have raid0 report its formation

Report to the user what are the raid zones

Signed-off-by: raziebe@gmail.com
Signed-off-by: NeilBrown <neilb@suse.de>
1b9614291eb319fad96de45392eb4452ad39f0ee 16-Jun-2009 raz ben yehuda <raziebe@gmail.com> md: have raid0 compile with MD_DEBUG on

Because of the removal of the device list from
the strips raid0 did not compile with MD_DEBUG flag on

Signed-off-by: NeilBrown <neilb@suse.de>
070ec55d07157a3041f92654135c3c6e2eaaf901 16-Jun-2009 NeilBrown <neilb@suse.de> md: remove mddev_to_conf "helper" macro

Having a macro just to cast a void* isn't really helpful.
I would must rather see that we are simply de-referencing ->private,
than have to know what the macro does.

So open code the macro everywhere and remove the pointless cast.

Signed-off-by: NeilBrown <neilb@suse.de>
a6b3deafe0c50e3e873e8ed5cc8abfcb25c05eff 16-Jun-2009 NeilBrown <neilb@suse.de> md: raid0: remove setting of segment boundary.


This setting doesn't seem to make sense (half the chunk size??) and
shouldn't be needed.
The segment boundary exported by raid0 should simply be the minimum
of the segment boundary of all component devices. And we already
get that right.

Signed-off-by: NeilBrown <neilb@suse.de>
b414579f4573b6dc8583e31b01dcffd13f49fd62 16-Jun-2009 NeilBrown <neilb@suse.de> md: raid0: remove ->dev pointer from strip_zone structure

If we treat conf->devlist more like a 2 dimensional array,
we can get the devlist for a particular zone simply by indexing
that array, so we don't need to store the pointers to subarrays
in strip_zone. This makes strip_zone smaller and so (hopefully)
searches faster.

Signed-of-by: NeilBrown <neilb@suse.de>
49f357a22b3fa3eeac042dfa0a6cae920c174e48 16-Jun-2009 NeilBrown <neilb@suse.de> md: raid0: remove ->sectors from the strip_zone structure.

storing ->sectors is redundant as is can be computed from the
difference z->zone_end - (z-1)->zone_end

The one place where it is used, it is just as efficient to use
a zone_end value instead.

And removing it makes strip_zone smaller, so they array of these that
is searched on every request has a better chance to say in cache.

So discard the field and get the value from elsewhere.

Signed-off-by: NeilBrown <neilb@suse.de>
fb5ab4b5d6e16fd5006c9f800d0116f3547cb760 16-Jun-2009 Andre Noll <maan@systemlinux.org> md: raid0: Fix a memory leak when stopping a raid0 array.

raid0_stop() removes all references to the raid0 configuration but
misses to free the ->devlist buffer.

This patch closes this leak, removes a pointless initialization and
fixes a coding style issue in raid0_stop().

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
ed7b00380d957ec770b5e90380d012c6062c13cc 16-Jun-2009 Andre Noll <maan@systemlinux.org> md: raid0: Allocate all buffers for the raid0 configuration in one function.

Currently the raid0 configuration is allocated in raid0_run() while
the buffers for the strip_zone and the dev_list arrays are allocated
in create_strip_zones(). On errors, all three buffers are freed
in raid0_run().

It's easier and more readable to do the allocation and cleanup within
a single function. So move that code into create_strip_zones().

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
5568a6035d9fca2cd8f1ef7005e215eae4e65fab 16-Jun-2009 Andre Noll <maan@systemlinux.org> md: raid0: Make raid0_run() return a proper error code.

Currently raid0_run() always returns -ENOMEM on errors. This is
incorrect as running the array might fail for other reasons, for
example because not all component devices were available.

This patch changes create_strip_zones() so that it returns a proper
error code (either -ENOMEM or -EINVAL) rather than 1 on errors and
makes raid0_run(), its single caller, return that value instead
of -ENOMEM.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
8f79cfcdb65472f1504ade2f53e5f2bfdaeb95da 16-Jun-2009 Andre Noll <maan@systemlinux.org> md: raid0: Remove hash spacing and sector shift.

The "sector_shift" and "spacing" fields of struct raid0_private_data
were only used for the hash table lookups. So the removal of the
hash table allows get rid of these fields as well which simplifies
create_strip_zones() and raid0_run() quite a bit.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
09770e0b6ee649313611a2d6a9b44f456072dbd6 16-Jun-2009 Andre Noll <maan@systemlinux.org> md: raid0: Remove hash table.

The raid0 hash table has become unused due to the changes in the
previous patch. This patch removes the hash table allocation and
setup code and kills the hash_table field of struct raid0_private_data.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
d27a43abd7be0ab4b2337e4587feca8c7340e5f9 16-Jun-2009 NeilBrown <neilb@suse.de> md/raid0: two cleanups in create_stripe_zones.

1/ remove current_start. The same value is available in
zone->dev_start and storing it separately doesn't gain anything.
2/ rename curr_zone_start to curr_zone_end as we are now more
focused on the 'end' of each zone. We end up storing the
same number though - the old name was a little confusing
(and what does 'current' mean in this context anyway).

Signed-off-by: NeilBrown <neilb@suse.de>
dc58266385e51420298275c90a616c34f1473a73 16-Jun-2009 Andre Noll <maan@systemlinux.org> md: raid0: Replace hash table lookup by looping over all strip_zones.

The number of strip_zones of a raid0 array is bounded by the number of
drives in the array and is in fact much smaller for typical setups. For
example, any raid0 array containing identical disks will have only
a single strip_zone.

Therefore, the hash tables which are used for quickly finding the
strip_zone that holds a particular sector are of questionable value
and add quite a bit of unnecessary complexity.

This patch replaces the hash table lookup by equivalent code which
simply loops over all strip zones to find the zone that holds the
given sector.

In order to make this loop as fast as possible, the zone->start field
of struct strip_zone has been renamed to zone_end, and it now stores
the beginning of the next zone in sectors. This allows to save one
addition in the loop.

Subsequent cleanup patches will remove the hash table structure.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
ae03bf639a5027d27270123f5f6e3ee6a412781d 22-May-2009 Martin K. Petersen <martin.petersen@oracle.com> block: Use accessor functions for queue limits

Convert all external users of queue limits to using wrapper functions
instead of poking the request queue variables directly.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
b522adcde9c4d3fb7b579cfa9160d8bde7744be8 31-Mar-2009 Dan Williams <dan.j.williams@intel.com> md: 'array_size' sysfs attribute

Allow userspace to set the size of the array according to the following
semantics:

1/ size must be <= to the size returned by mddev->pers->size(mddev, 0, 0)
a) If size is set before the array is running, do_md_run will fail
if size is greater than the default size
b) A reshape attempt that reduces the default size to less than the set
array size should be blocked
2/ once userspace sets the size the kernel will not change it
3/ writing 'default' to this attribute returns control of the size to the
kernel and reverts to the size reported by the personality

Also, convert locations that need to know the default size from directly
reading ->array_sectors to <pers>_size. Resync/reshape operations
always follow the default size.

Finally, fixup other locations that read a number of 1k-blocks from
userspace to use strict_blocks_to_sectors() which checks for unsigned
long long to sector_t overflow and blocks to sectors overflow.

Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
1f403624bde3c678a166984b1e6a727a0ce06f2b 31-Mar-2009 Dan Williams <dan.j.williams@intel.com> md: centralize ->array_sectors modifications

Get personalities out of the business of directly modifying
->array_sectors. Lays groundwork to introduce policy on when
->array_sectors can be modified.

Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
80c3a6ce4ba4470379b9e6a4d9bcd9d2ee26ae03 18-Mar-2009 Dan Williams <dan.j.williams@intel.com> md: add 'size' as a personality method

In preparation for giving userspace control over ->array_sectors we need
to be able to retrieve the 'default' size, and the 'anticipated' size
when a reshape is requested. For personalities that do not reshape emit
a warning if anything but the default size is requested.

In the raid5 case we need to update ->previous_raid_disks to make the
new 'default' size available.

Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
dd8ac336c13fd8afdb082ebacb1cddd5cf727889 31-Mar-2009 Andre Noll <maan@systemlinux.org> md: Represent raid device size in sectors.

This patch renames the "size" field of struct mdk_rdev_s to
"sectors" and changes this field to store sectors instead of
blocks.

All users of this field, linear.c, raid0.c and md.c, are fixed up
accordingly which gets rid of many multiplications and divisions.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
43b2e5d86d8bdd77386226db0bc961529492c043 31-Mar-2009 NeilBrown <neilb@suse.de> md: move md_k.h from include/linux/raid/ to drivers/md/

It really is nicer to keep related code together..

Signed-off-by: NeilBrown <neilb@suse.de>
bff61975b3d6c18ee31457cc5b4d73042f44915f 31-Mar-2009 NeilBrown <neilb@suse.de> md: move lots of #include lines out of .h files and into .c

This makes the includes more explicit, and is preparation for moving
md_k.h to drivers/md/md.h

Remove include/raid/md.h as its only remaining use was to #include
other files.

Signed-off-by: NeilBrown <neilb@suse.de>
ef740c372dfd80e706dbf955d4e4aedda6c0c148 31-Mar-2009 Christoph Hellwig <hch@lst.de> md: move headers out of include/linux/raid/

Move the headers with the local structures for the disciplines and
bitmap.h into drivers/md/ so that they are more easily grepable for
hacking and not far away. md.h is left where it is for now as there
are some uses from the outside.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
159ec1fc060ab22b157a62364045f5e98749c4d3 08-Jan-2009 Cheng Renquan <crquan@gmail.com> md: use list_for_each_entry macro directly

The rdev_for_each macro defined in <linux/raid/md_k.h> is identical to
list_for_each_entry_safe, from <linux/list.h>, it should be defined to
use list_for_each_entry_safe, instead of reinventing the wheel.

But some calls to each_entry_safe don't really need a safe version,
just a direct list_for_each_entry is enough, this could save a temp
variable (tmp) in every function that used rdev_for_each.

In this patch, most rdev_for_each loops are replaced by list_for_each_entry,
totally save many tmp vars; and only in the other situations that will call
list_del to delete an entry, the safe version is used.

Signed-off-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
ccacc7d2cf03114a24ab903f710118e9e5d43273 08-Jan-2009 Andre Noll <maan@systemlinux.org> md: raid0: make hash_spacing and preshift sector-based.

This patch renames the hash_spacing and preshift members of struct
raid0_private_data to spacing and sector_shift respectively and
changes the semantics as follows:

We always have spacing = 2 * hash_spacing. In case
sizeof(sector_t) > sizeof(u32) we also have sector_shift = preshift + 1
while sector_shift = preshift = 0 otherwise.

Note that the values of nb_zone and zone are unaffected by these changes
because in the sector_div() preceeding the assignement of these two
variables both arguments double.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
83838ed87898e0a8ff8dbf001e54e6c017f0a011 08-Jan-2009 Andre Noll <maan@systemlinux.org> md: raid0: Represent the size of strip zones in sectors.

This completes the block -> sector conversion of struct strip_zone.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
0825b87a7dd9645c7e16489fec839a3cb5c40a08 08-Jan-2009 Andre Noll <maan@systemlinux.org> md: raid0 create_strip_zones(): Add KERN_INFO/KERN_ERR to printk's.

This patch consists only of these trivial changes.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
6b8796cc3decb43c7c67a13a0699b6a21b754c78 08-Jan-2009 Andre Noll <maan@systemlinux.org> md: raid0 create_strip_zones(): Make two local variables sector-based.

current_offset and curr_zone_offset stored the corresponding offsets
as 1K quantities. Rename them to current_start and curr_zone_start
to match the naming of struct strip_zone and store the offsets as
sector counts.

Also, add KERN_INFO to the printk() affected by this change to make
checkpatch happy.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
6199d3db0fc34f8ada895879d04a353a6ae632bc 08-Jan-2009 Andre Noll <maan@systemlinux.org> md: raid0: Represent zone->zone_offset in sectors.

For the same reason as in the previous patch, rename it from zone_offset
to zone_start.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
019c4e2f3e02aac4b44003913b54ca4b332e4371 08-Jan-2009 Andre Noll <maan@systemlinux.org> md: raid0: Represent device offset in sectors.

Rename zone->dev_offset to zone->dev_start to make sure all users
have been converted.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
e0f06868341700c5c1964a04f6c5b51d0a2d5bca 08-Jan-2009 Andre Noll <maan@systemlinux.org> md: raid0_make_request(): Replace local variable block by sector.

This change already simplifies the code a bit.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
a471200595b24fb1907ad12107a6a66db02c63f2 08-Jan-2009 Andre Noll <maan@systemlinux.org> md: raid0_make_request(): Remove local variable chunk_size.

We might as well use chunk_sects instead.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
1b7fdf8ff7c0e3fba9c679def4e98d5701d2949e 08-Jan-2009 Andre Noll <maan@systemlinux.org> md: raid0_make_request(): Replace chunksize_bits by chunksect_bits.

As ffz(~(2 * x)) = ffz(~x) + 1, we have

chunksect_bits = chunksize_bits + 1.

Fixup all users accordingly.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
fb4d8c76e56a887b9eee99fbc55fe82b18625d30 13-Oct-2008 NeilBrown <neilb@suse.de> md: Remove unnecessary #includes, #defines, and function declarations.

A lot of cruft has gathered over the years. Time to remove it.

Signed-off-by: NeilBrown <neilb@suse.de>
6feef531f55cf4a20fd9eb39f5352e5745203603 09-Oct-2008 Denis ChengRq <crquan@gmail.com> block: mark bio_split_pool static

Since all bio_split calls refer the same single bio_split_pool, the bio_split
function can use bio_split_pool directly instead of the mempool_t parameter;

then the mempool_t parameter can be removed from bio_split param list, and
bio_split_pool is only referred in fs/bio.c file, can be marked static.

Signed-off-by: Denis ChengRq <crquan@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
074a7aca7afa6f230104e8e65eba3420263714a5 25-Aug-2008 Tejun Heo <tj@kernel.org> block: move stats from disk to part0

Move stats related fields - stamp, in_flight, dkstats - from disk to
part0 and unify stat handling such that...

* part_stat_*() now updates part0 together if the specified partition
is not part0. ie. part_stat_*() are now essentially all_stat_*().

* {disk|all}_stat_*() are gone.

* part_round_stats() is updated similary. It handles part0 stats
automatically and disk_round_stats() is killed.

* part_{inc|dec}_in_fligh() is implemented which automatically updates
part0 stats for parts other than part0.

* disk_map_sector_rcu() is updated to return part0 if no part matches.
Combined with the above changes, this makes NULL special case
handling in callers unnecessary.

* Separate stats show code paths for disk are collapsed into part
stats show code paths.

* Rename disk_stat_lock/unlock() to part_stat_lock/unlock()

While at it, reposition stat handling macros a bit and add missing
parentheses around macro parameters.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
c9959059161ddd7bf4670cf47367033d6b2f79c4 25-Aug-2008 Tejun Heo <tj@kernel.org> block: fix diskstats access

There are two variants of stat functions - ones prefixed with double
underbars which don't care about preemption and ones without which
disable preemption before manipulating per-cpu counters. It's unclear
whether the underbarred ones assume that preemtion is disabled on
entry as some callers don't do that.

This patch unifies diskstats access by implementing disk_stat_lock()
and disk_stat_unlock() which take care of both RCU (for partition
access) and preemption (for per-cpu counter access). diskstats access
should always be enclosed between the two functions. As such, there's
no need for the versions which disables preemption. They're removed
and double underbars ones are renamed to drop the underbars. As an
extra argument is added, there's no danger of using the old version
unconverted.

disk_stat_lock() uses get_cpu() and returns the cpu index and all
diskstat functions which access per-cpu counters now has @cpu
argument to help RT.

This change adds RCU or preemption operations at some places but also
collapses several preemption ops into one at others. Overall, the
performance difference should be negligible as all involved ops are
very lightweight per-cpu ones.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
f233ea5c9e0d8b95e4283bf6a3436b88f6fd3586 21-Jul-2008 Andre Noll <maan@systemlinux.org> md: Make mddev->array_size sector-based.

This patch renames the array_size field of struct mddev_s to array_sectors
and converts all instances to use units of 512 byte sectors instead of 1k
blocks.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
cc371e66e340f35eed8dc4651c7c18e754c7fb26 03-Jul-2008 Alasdair G Kergon <agk@redhat.com> Add bvec_merge_data to handle stacked devices and ->merge_bvec()

When devices are stacked, one device's merge_bvec_fn may need to perform
the mapping and then call one or more functions for its underlying devices.

The following bio fields are used:
bio->bi_sector
bio->bi_bdev
bio->bi_size
bio->bi_rw using bio_data_dir()

This patch creates a new struct bvec_merge_data holding a copy of those
fields to avoid having to change them directly in the struct bio when
going down the stack only to have to change them back again on the way
back up. (And then when the bio gets mapped for real, the whole
exercise gets repeated, but that's a problem for another day...)

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
e7e72bf641b1fc7b9df6f40bd2c36dfccd8d647c 15-May-2008 Neil Brown <neilb@suse.de> Remove blkdev warning triggered by using md

As setting and clearing queue flags now requires that we hold a spinlock
on the queue, and as blk_queue_stack_limits is called without that lock,
get the lock inside blk_queue_stack_limits.

For blk_queue_stack_limits to be able to find the right lock, each md
personality needs to set q->queue_lock to point to the appropriate lock.
Those personalities which didn't previously use a spin_lock, us
q->__queue_lock. So always initialise that lock when allocated.

With this in place, setting/clearing of the QUEUE_FLAG_PLUGGED bit will no
longer cause warnings as it will be clear that the proper lock is held.

Thanks to Dan Williams for review and fixing the silly bugs.

Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Alistair John Strachan <alistair@devzero.co.uk>
Cc: Nick Piggin <npiggin@suse.de>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Jacek Luczak <difrost.kernel@gmail.com>
Cc: Prakash Punnoor <prakash@punnoor.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
d089c6af10c2be5988f03667d6d22fe6085fbe5e 06-Feb-2008 NeilBrown <neilb@suse.de> md: change ITERATE_RDEV to rdev_for_each

As this is more in line with common practice in the kernel. Also swap the
args around to be more like list_for_each.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2ad8b1ef11c98c5603580878aebf9f1bc74129e4 07-Nov-2007 Alan D. Brunelle <Alan.Brunelle@hp.com> Add UNPLUG traces to all appropriate places

Added blk_unplug interface, allowing all invocations of unplugs to result
in a generated blktrace UNPLUG.

Signed-off-by: Alan D. Brunelle <Alan.Brunelle@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
8299d7f7c067a30a67ad359d416128c4ff57dcd1 17-Oct-2007 NeilBrown <neilb@suse.de> md: fix a bug in some never-used code.

http://bugzilla.kernel.org/show_bug.cgi?id=3277

There is a seq_printf here that isn't being passed a 'seq'. Howeve as the
code is inside #ifdef MD_DEBUG, nobody noticed.

Also remove some extra spaces.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fd5d806266935179deda1502101624832eacd01f 16-Oct-2007 Jens Axboe <jens.axboe@oracle.com> block: convert blkdev_issue_flush() to use empty barriers

Then we can get rid of ->issue_flush_fn() and all the driver private
implementations of that.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
6712ecf8f648118c3363c142196418f89a510b90 27-Sep-2007 NeilBrown <neilb@suse.de> Drop 'size' argument from bio_endio and bi_end_io

As bi_end_io is only called once when the reqeust is complete,
the 'size' argument is now redundant. Remove it.

Now there is no need for bio_endio to subtract the size completed
from bi_size. So don't do that either.

While we are at it, change bi_end_io to return void.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
165125e1e480f9510a5ffcfbfee4e3ee38c05f23 24-Jul-2007 Jens Axboe <jens.axboe@oracle.com> [BLOCK] Get rid of request_queue_t typedef

Some of the code has been gradually transitioned to using the proper
struct request_queue, but there's lots left. So do a full sweet of
the kernel and get rid of this typedef and replace its uses with
the proper type.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
787f17feb204ed1c6331892fb8124b80dc9fe288 23-May-2007 NeilBrown <neilb@suse.de> md: avoid overflow in raid0 calculation with large components

If a raid0 has a component device larger than 4TB, and is accessed on a 32bit
machines, then as 'chunk' is unsigned long,

chunk << chunksize_bits

can overflow (this can be as high as the size of the device in KB). chunk
itself will not overflow (without triggering a BUG).

So change 'chunk' to be 'sector_t, and get rid of the 'BUG' as it becomes
impossible to hit.

Cc: "Jeff Zheng" <Jeff.Zheng@endace.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
26be34dc3a46be983352dd89683db374b0cb73fa 03-Oct-2006 NeilBrown <neilb@suse.de> [PATCH] md: define backing_dev_info.congested_fn for raid0 and linear

Each backing_dev needs to be able to report whether it is congested, either by
modulating BDI_*_congested in ->state, or by defining a ->congested_fn.
md/raid did neither of these. This patch add a congested_fn which simply
checks all component devices to see if they are congested.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
5c4c33318d26620fa552f15bbb6d0f9775a1b4df 23-May-2006 NeilBrown <neilb@suse.de> [PATCH] md: fix possible oops when starting a raid0 array

This loop that sets up the hash_table has problems.

Careful examination will show that the last time through, everything but
the first line is pointless. This is because all it does is change 'cur'
and 'size' and neither of these are used after the loop. This should ring
warning bells... That last time through the loop,

size += conf->strip_zone[cur].size

can index off the end of the strip_zone array. Depending on what it finds
there, it might exit the loop cleanly, or it might spin going further and
further beyond the array until it hits an unmapped address.

This patch rearranges the code so that the last, pointless, iteration of
the loop never happens. i.e. the one statement of the last loop that is
needed is moved the the end of the previous loop - or to before the loop
starts - and the loop counter starts from 1 instead of 0.

Cc: "Don Dupuis" <dondster@gmail.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
29fc7e3e70a05e9eea28afb6707a39c1a53e2f66 03-Feb-2006 NeilBrown <neilb@suse.de> [PATCH] md: Assorted little md fixes

- version-1 superblock
+ The default_bitmap_offset is in sectors, not bytes.
+ the 'size' field in the superblock is in sectors, not KB
- raid0_run should return a negative number on error, not '1'
- raid10_read_balance should not return a valid 'disk' number if
->rdev turned out to be NULL
- kmem_cache_destroy doesn't like being passed a NULL.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
a1365647022eb05a5993f270a78e9bef3bf554eb 08-Jan-2006 Andrew Morton <akpm@osdl.org> [PATCH] remove gcc-2 checks

Remove various things which were checking for gcc-1.x and gcc-2.x compilers.

From: Adrian Bunk <bunk@stusta.de>

Some documentation updates and removes some code paths for gcc < 3.2.

Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
d9d166c2a9d5d01af34396793950aa695883eed4 06-Jan-2006 NeilBrown <neilb@suse.de> [PATCH] md: allow array level to be set textually via sysfs

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2604b703b6b3db80e3c75ce472a54dfd0b7bf9f4 06-Jan-2006 NeilBrown <neilb@suse.de> [PATCH] md: remove personality numbering from md

md supports multiple different RAID level, each being implemented by a
'personality' (which is often in a separate module).

These personalities have fairly artificial 'numbers'. The numbers
are use to:
1- provide an index into an array where the various personalities
are recorded
2- identify the module (via an alias) which implements are particular
personality.

Neither of these uses really justify the existence of personality numbers.
The array can be replaced by a linked list which is searched (array lookup
only happens very rarely). Module identification can be done using an alias
based on level rather than 'personality' number.

The current 'raid5' modules support two level (4 and 5) but only one
personality. This slight awkwardness (which was handled in the mapping from
level to personality) can be better handled by allowing raid5 to register 2
personalities.

With this change in place, the core md module does not need to have an
exhaustive list of all possible personalities, so other personalities can be
added independently.

This patch also moves the check for chunksize being non-zero into the ->run
routines for the personalities that need it, rather than having it in core-md.
This has a side effect of allowing 'faulty' and 'linear' not to have a
chunk-size set.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
9ffae0cf3ea02f75d163922accfd3e592d87adde 06-Jan-2006 NeilBrown <neilb@suse.de> [PATCH] md: convert md to use kzalloc throughout

Replace multiple kmalloc/memset pairs with kzalloc calls.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2d1f3b5d1b2cd11a162eb29645df749ec0036413 06-Jan-2006 NeilBrown <neilb@suse.de> [PATCH] md: clean up 'page' related names in md

Substitute:

page_cache_get -> get_page
page_cache_release -> put_page
PAGE_CACHE_SHIFT -> PAGE_SHIFT
PAGE_CACHE_SIZE -> PAGE_SIZE
PAGE_CACHE_MASK -> PAGE_MASK
__free_page -> put_page

because we aren't using the page cache, we are just using pages.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
a362357b6cd62643d4dda3b152639303d78473da 01-Nov-2005 Jens Axboe <axboe@suse.de> [BLOCK] Unify the seperate read/write io stat fields into arrays

Instead of having ->read_sectors and ->write_sectors, combine the two
into ->sectors[2] and similar for the other fields. This saves a branch
several places in the io path, since we don't have to care for what the
actual io direction is. On my x86-64 box, that's 200 bytes less text in
just the core (not counting the various drivers).

Signed-off-by: Jens Axboe <axboe@suse.de>
e5dcdd80a60627371f40797426273048630dc8ca 10-Sep-2005 NeilBrown <neilb@cse.unsw.edu.au> [PATCH] md: fail IO request to md that require a barrier.

md does not yet support BIO_RW_BARRIER, so be honest about it and fail
(-EOPNOTSUPP) any such requests.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1eb29128c644581fa51f822545921394ad4f719f 15-Jul-2005 Neil Brown <neilb@cse.unsw.edu.au> [PATCH] Fix raid0's attempt to divide by 64bit numbers

Apparently sector_div is only guaranteed to work with a 32bit divisor, even
on 64bit architectures. So allow for this in raid0.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
990a8baf568ca1d0ae65e59783ff821794118d07 22-Jun-2005 Jesper Juhl <juhl-lkml@dif.dk> [PATCH] md: remove unneeded NULL checks before kfree

This patch removes some unneeded checks of pointers being NULL before
calling kfree() on them. kfree() handles NULL pointers just fine, checking
first is pointless.

Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 17-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org> Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!