History log of /fs/f2fs/segment.c
Revision Date Author Comments
88b88a66797159949cec32eaab12b4968f6fae2d 07-Oct-2014 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: support atomic writes

This patch introduces a very limited functionality for atomic write support.
In order to support atomic write, this patch adds two ioctls:
o F2FS_IOC_START_ATOMIC_WRITE
o F2FS_IOC_COMMIT_ATOMIC_WRITE

The database engine should be aware of the following sequence.
1. open
-> ioctl(F2FS_IOC_START_ATOMIC_WRITE);
2. writes
: all the written data will be treated as atomic pages.
3. commit
-> ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
: this flushes all the data blocks to the disk, which will be shown all or
nothing by f2fs recovery procedure.
4. repeat to #2.

The IO pattens should be:

,- START_ATOMIC_WRITE ,- COMMIT_ATOMIC_WRITE
CP | D D D D D D | FSYNC | D D D D | FSYNC ...
`- COMMIT_ATOMIC_WRITE

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7cd8558baa4e4588a80ecb31cb30784195763cdd 23-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: check the use of macros on block counts and addresses

This patch cleans up the existing and new macros for readability.

Rule is like this.

,-----------------------------------------> MAX_BLKADDR -,
| ,------------- TOTAL_BLKS ----------------------------,
| | |
| ,- seg0_blkaddr ,----- sit/nat/ssa/main blkaddress |
block | | (SEG0_BLKADDR) | | | | (e.g., MAIN_BLKADDR) |
address 0..x................ a b c d .............................
| |
global seg# 0...................... m .............................
| | |
| `------- MAIN_SEGS -----------'
`-------------- TOTAL_SEGS ---------------------------'
| |
seg# 0..........xx..................

= Note =
o GET_SEGNO_FROM_SEG0 : blk address -> global segno
o GET_SEGNO : blk address -> segno
o START_BLOCK : segno -> starting block address

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4b2fecc84655055a6a1fe9151786992ac04b56ce 21-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: introduce FITRIM in f2fs_ioctl

This patch introduces FITRIM in f2fs_ioctl.
In this case, f2fs will issue small discards and prefree discards as many as
possible for the given area.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
9b5f136fd41658f384a5b4ea49d8ef37036e15f5 17-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: change the ipu_policy option to enable combinations

This patch changes the ipu_policy setting to use any combination of orthogonal policies.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
55cf9cb63f0e5439f208d78ed944de9a8df65011 15-Sep-2014 Chao Yu <chao2.yu@samsung.com> f2fs: support large sector size

Block size in f2fs is 4096 bytes, so theoretically, f2fs can support 4096 bytes
sector device at maximum. But now f2fs only support 512 bytes size sector, so
block device such as zRAM which uses page cache as its block storage space will
not be mounted successfully as mismatch between sector size of zRAM and sector
size of f2fs supported.

In this patch we support large sector size in f2fs, so block device with sector
size of 512/1024/2048/4096 bytes can be supported in f2fs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
90a893c749f4582f21e97639f4e85e7f2362c2f0 23-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: use MAX_BIO_BLOCKS(sbi)

This patch cleans up a simple macro.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
c1ce1b02bb25640567dc484dc94d3a195d21e705 11-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: give an option to enable in-place-updates during fsync to users

If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file
only starts to try in-place-updates.
And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it
keeps out-of-order manner. Otherwise, it triggers in-place-updates.

This may be used by storage showing very high random write performance.

For example, it can be used when,

Seq. writes (Data) + wait + Seq. writes (Node)

is pretty much slower than,

Rand. writes (Data)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
721bd4d5c3f957f98157b6dcac9c4a4dd828e3ff 05-Sep-2014 Gu Zheng <guz.fnst@cn.fujitsu.com> f2fs: use lock-less list(llist) to simplify the flush cmd management

We use flush cmd control to collect many flush cmds, and flush them
together. In this case, we use two list to manage the flush cmds
(collect and dispatch), and one spin lock is used to protect this.
In fact, the lock-less list(llist) is very suitable to this case,
and we use simplify this routine.

-
v2:
-use llist_for_each_entry_safe to fix possible use-after-free issue.
-remove the unused field from struct flush_cmd.
Thanks for Yu's suggestion.
-

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
184a5cd2ce281f1207d72adb9ae18e416ca371db 04-Sep-2014 Chao Yu <chao2.yu@samsung.com> f2fs: refactor flush_sit_entries codes for reducing SIT writes

In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:

"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
journal is full, then flush the left dirty entries to disk without merge
journaled entries, so these journaled entries may be flushed to disk at next
checkpoint but lost chance to flushed last time."

Actually, we have the same problem in using SIT journal area.

In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.

In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.

In my testing environment, it shows this patch can help to reduce SIT block
update obviously.

virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
sit page num cp count sit pages/cp
based 2006.50 1349.75 1.486
patched 1566.25 1463.25 1.070

Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns) dirty sit count
36038 2151
49168 2123
37174 2232

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
d3a14afd5ed1970519a2d6ed59f4062ec3ba821f 04-Sep-2014 Chao Yu <chao2.yu@samsung.com> f2fs: remove unneeded sit_i in macro SIT_BLOCK_OFFSET/START_SEGNO

sit_i in macro SIT_BLOCK_OFFSET/START_SEGNO is not used, remove it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
ec325b5270cd3ba01bce299d1ede1616f31813ea 03-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: handle bug cases by letting fsck.f2fs initiate

This patch adds to handle corner buggy cases for fsck.f2fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
05796763b8d19b48bb4149bfb1aa1a91dd9faee6 03-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: add BUG cases to initiate fsck.f2fs

This patch replaces BUG cases with f2fs_bug_on to remain fsck.f2fs information.
And it implements some void functions to initiate fsck.f2fs too.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
9850cf4a8908886370b1f15aacf83d291f098c72 03-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: need fsck.f2fs when f2fs_bug_on is triggered

If any f2fs_bug_on is triggered, fsck.f2fs is needed.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4081363fbe84a7ebac6d3339dd2775df45d856d0 03-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SB

This patch adds three inline functions to clean up dirty casting codes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
202095a7a0ec075b924cb15dde330bf76e485f61 15-Aug-2014 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: remove rewrite_node_page

I think we need to let the dirty node pages remain in the page cache instead
of rewriting them in their places.
So, after done with successful recovery, write_checkpoint will flush all of them
through the normal write path.
Through this, we can avoid potential error cases in terms of block allocation.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
e1c42045203071c4634b89e696037357810d3083 06-Aug-2014 arter97 <qkrwngud825@gmail.com> f2fs: fix typo

Fix typo and some grammatical errors.

The words "filesystem" and "readahead" are being used without the space treewide.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
b65ee14818e67127aa242fe1dbd3711b9c095cc0 04-Aug-2014 Chao Yu <chao2.yu@samsung.com> f2fs: use for_each_set_bit to simplify the code

This patch uses for_each_set_bit to simplify some codes in f2fs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
33be828ada7274ebcade2001f16e5b4e33a4636e 30-Jul-2014 Dongho Sim <dh.sim@samsung.com> f2fs: remove redundant lines in allocate_data_block

There are redundant lines in allocate_data_block.

In this function, we call refresh_sit_entry with old seg and old curseg.
After that, we call locate_dirty_segment with old curseg.

But, the new address is always allocated from old curseg and
we call locate_dirty_segment with old curseg in refresh_sit_entry.
So, we do not need to call locate_dirty_segment with old curseg again.

We've discussed like below:

Jaegeuk said:
"When considering SSR, we need to take care of the following scenario.
- old segno : X
- new address : Z
- old curseg : Y
This means, a new block is supposed to be written to Z from X.
And Z is newly allocated in the same path from Y.

In that case, we should trigger locate_dirty_segment for Y, since
it was a current_segment and can be dirty owing to SSR.
But that was not included in the dirty list."

Changman said:
"We already choosed old curseg(Y) and then we allocate new address(Z) from old
curseg(Y). After that we call refresh_sit_entry(old address, new address).
In the funcation, we call locate_dirty_segment with old seg and old curseg.
So calling locate_dirty_segment after refresh_sit_entry again is redundant."

Jaegeuk said:
"Right. The new address is always allocated from old_curseg."

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Dongho Sim <dh.sim@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
24a9ee0fa3d40415765d2d9f6064930d72ad8b5a 26-Jul-2014 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: add tracepoint for f2fs_issue_flush

This patch adds a tracepoint for f2fs_issue_flush.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
cf2271e781cb16e1ca22be920010c2b64d90c338 26-Jul-2014 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: avoid retrying wrong recovery routine when error was occurred

This patch eliminates the propagation of recovery errors to the next mount.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
0f7b2abd188089a44f60e2bf8521d1363ada9e12 23-Jul-2014 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: add nobarrier mount option

This patch adds a mount option, nobarrier, in f2fs.
The assumption in here is that file system keeps the IO ordering, but
doesn't care about cache flushes inside the storages.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6b2920a513ec9972f7b80d219fcf6f59130a1f31 07-Jul-2014 Chao Yu <chao2.yu@samsung.com> f2fs: use inner macro and function to clean up codes

In this patch we use below inner macro and function to clean up codes.
1. ADDRS_PER_PAGE
2. SM_I
3. f2fs_readonly

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
b434babf854eab82397759e98425423cd9bc4786 23-Jun-2014 Fabian Frederick <fabf@skynet.be> f2fs: replace count*size kzalloc by kcalloc

kcalloc manages count*sizeof overflow.

Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
50e1f8d22199b557337b3d1ec8520e4c5aa5c76e 07-Jul-2014 Chao Yu <chao2.yu@samsung.com> f2fs: avoid to access NULL pointer in issue_flush_thread

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=75861

Denis 2014-05-10 11:28:59 UTC reported:
"F2FS-fs (mmcblk0p28): mounting..
Unable to handle kernel NULL pointer dereference at virtual address 00000018
...
[<c0a2f678>] (_raw_spin_lock+0x3c/0x70) from [<c03a0330>] (issue_flush_thread+0x50/0x17c)
[<c03a0330>] (issue_flush_thread+0x50/0x17c) from [<c01b4064>] (kthread+0x98/0xa4)
[<c01b4064>] (kthread+0x98/0xa4) from [<c0108060>] (kernel_thread_exit+0x0/0x8)"

This patch assign cmd_control_info in sm_info before issue_flush_thread is being
created, so this make sure that issue flush thread will have no chance to access
invalid info in fcc.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8bc6f60e3f7f31c4ce370b4b27b8f4b355b7f07e 11-Jun-2014 Chao Yu <chao2.yu@samsung.com> f2fs: remove unused variables in f2fs_sm_info

Remove unused variables in struct f2fs_sm_info.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
adf8d90b6a949dc80e827263fccb31f8eb08a55d 08-May-2014 Chao Yu <chao2.yu@samsung.com> f2fs: avoid to use slab memory in f2fs_issue_flush for efficiency

If we use slab memory in f2fs_issue_flush(), we will face memory pressure and
latency time caused by racing of kmem_cache_{alloc,free}.

Let's alloc memory in stack instead of slab.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2163d19815b3dfdb243cee2de2478ae7efce1942 27-Apr-2014 Gu Zheng <guz.fnst@cn.fujitsu.com> f2fs: introduce help function {create,destroy}_flush_cmd_control

Introduce help function {create,destroy}_flush_cmd_control to clean up
the create/destory flush merge operation.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
a688b9d9e5cbec76edab74e724297b5488c07829 27-Apr-2014 Gu Zheng <guz.fnst@cn.fujitsu.com> f2fs: introduce struct flush_cmd_control to wrap the flush_merge fields

Split the flush_merge fields from sm_i, and use the new struct flush_cmd_control
to wrap it, so that we can igonre these fileds if flush_merge is disable, and
it alse can the structs more neat.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
876dc59eb1f0131c092803d0d206d47dd0119dfe 11-Apr-2014 Gu Zheng <guz.fnst@cn.fujitsu.com> f2fs: add the flush_merge handle in the remount flow

Add the *remount* handle of flush_merge option, so that the users
can enable flush_merge in the runtime, such as the underlying device
handles the cache_flush command relatively slowly.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
1e87a78d95ecea7a989349860feb42db3e4b7db5 15-Apr-2014 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: avoid to conduct roll-forward due to the remained garbage blocks

The f2fs always scans the next chain of direct node blocks.
But some garbage blocks are able to be remained due to no discard support or
SSR triggers.
This occasionally wreaks recovering wrong inodes that were used or BUG_ONs
due to reallocating node ids as follows.

When mount this f2fs image:
http://linuxtesting.org/downloads/f2fs_fault_image.zip
BUG_ON is triggered in f2fs driver (messages below are generated on
kernel 3.13.2; for other kernels output is similar):

kernel BUG at fs/f2fs/node.c:215!
Call Trace:
[<ffffffffa032ebad>] recover_inode_page+0x1fd/0x3e0 [f2fs]
[<ffffffff811446e7>] ? __lock_page+0x67/0x70
[<ffffffff81089990>] ? autoremove_wake_function+0x50/0x50
[<ffffffffa0337788>] recover_fsync_data+0x1398/0x15d0 [f2fs]
[<ffffffff812b9e5c>] ? selinux_d_instantiate+0x1c/0x20
[<ffffffff811cb20b>] ? d_instantiate+0x5b/0x80
[<ffffffffa0321044>] f2fs_fill_super+0xb04/0xbf0 [f2fs]
[<ffffffff811b861e>] ? mount_bdev+0x7e/0x210
[<ffffffff811b8769>] mount_bdev+0x1c9/0x210
[<ffffffffa0320540>] ? validate_superblock+0x210/0x210 [f2fs]
[<ffffffffa031cf8d>] f2fs_mount+0x1d/0x30 [f2fs]
[<ffffffff811b9497>] mount_fs+0x47/0x1c0
[<ffffffff81166e00>] ? __alloc_percpu+0x10/0x20
[<ffffffff811d4032>] vfs_kern_mount+0x72/0x110
[<ffffffff811d6763>] do_mount+0x493/0x910
[<ffffffff811615cb>] ? strndup_user+0x5b/0x80
[<ffffffff811d6c70>] SyS_mount+0x90/0xe0
[<ffffffff8166f8d9>] system_call_fastpath+0x16/0x1b

Found by Linux File System Verification project (linuxtesting.org).

Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
b270ad6f0aedd27bdc689fc15f26bc650a59b12b 11-Apr-2014 Gu Zheng <guz.fnst@cn.fujitsu.com> f2fs: enable flush_merge only in f2fs is not read-only

Enable flush_merge only in f2fs is not read-only, so does the mount
option show.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
197d46476c3ba5f8595209025d20953a0c851748 11-Apr-2014 Gu Zheng <guz.fnst@cn.fujitsu.com> f2fs: use __GFP_ZERO to avoid appending set-NULL

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
a4ed23f2f1a875f4b50d5f7e3dd5b0abbedda1ff 11-Apr-2014 Gu Zheng <guz.fnst@cn.fujitsu.com> f2fs: put the bio when issue_flush completed

Put the bio when the flush cmd issued, it also can fix the following
kmemleak:
unreferenced object 0xffff8800270c73c0 (size 200):
comm "f2fs_flush-7:0", pid 27161, jiffies 4312127988 (age 988.503s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 40 07 81 19 01 88 ff ff ........@.......
01 00 00 00 00 00 00 f0 11 14 00 00 00 00 00 00 ................
backtrace:
[<ffffffff81559866>] kmemleak_alloc+0x72/0x96
[<ffffffff81156f7e>] slab_post_alloc_hook+0x28/0x2a
[<ffffffff811595b1>] kmem_cache_alloc+0xec/0x157
[<ffffffff8111924d>] mempool_alloc_slab+0x15/0x17
[<ffffffff81119513>] mempool_alloc+0x71/0x138
[<ffffffff81193548>] bio_alloc_bioset+0x93/0x18c
[<ffffffffa040f857>] issue_flush_thread+0x8d/0x145 [f2fs]
[<ffffffff8107ac16>] kthread+0xba/0xc2
[<ffffffff81571b2c>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
6b4afdd794783fe515b50838aa36591e3feea990 02-Apr-2014 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: introduce f2fs_issue_flush to avoid redundant flush issue

Some storage devices show relatively high latencies to complete cache_flush
commands, even though their normal IO speed is prettry much high. In such
the case, it needs to merge cache_flush commands as much as possible to avoid
issuing them redundantly.
So, this patch introduces a mount option, "-o flush_merge", to mitigate such
the overhead.

If this option is enabled by user, F2FS merges the cache_flush commands and then
issues just one cache_flush on behalf of them. Once the single command is
finished, F2FS sends a completion signal to all the pending threads.

Note that, this option can be used under a workload consisting of very intensive
concurrent fsync calls, while the storage handles cache_flush commands slowly.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
ce23447fe5764391025a67c20c97eaf5c6ac1ec3 02-Apr-2014 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: fix to cover io->bio with io_rwsem

In the f2fs_wait_on_page_writeback, io->bio should be covered by io_rwsem.
Otherwise, the bio pointer can become a dangling pointer due to data races.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2d7b822ad9daf0ea903accacaa89340ddd3f201f 29-Mar-2014 Chao Yu <chao2.yu@samsung.com> f2fs: use list_for_each_entry{_safe} for simplyfying code

This patch use list_for_each_entry{_safe} instead of list_for_each{_safe} for
simplfying code.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
df0f8dc0e154de13e3a54846f384b674dd557c85 22-Mar-2014 Chao Yu <chao2.yu@samsung.com> f2fs: avoid unnecessary bio submit when wait page writeback

This patch introduce is_merged_page() to check whether current page is merged
in f2fs bio cache. When page is not in cache, we can avoid submitting bio cache,
resulting in having more chance to merge pages.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
58c410351eba3d24f741c85a0eb9eaf15c94047d 19-Mar-2014 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: change reclaim rate in percentage

It is more reasonable to determine the reclaiming rate of prefree segments
according to the volume size, which is set to 5% by default.
For example, if the volume is 128GB, the prefree segments are reclaimed
when the number reaches to 6.4GB.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
e4fc5fbfc9e285356be7e5208bb1a2fa377b2656 17-Mar-2014 Chao Yu <chao2.yu@samsung.com> f2fs: avoid to return incorrect errno of read_normal_summaries

We should return error number of read_normal_summaries instead of -EINVAL when
read_normal_summaries failed.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
d653788a43475eb3cdfcfaa60fb53451878944cf 07-Mar-2014 Gu Zheng <guz.fnst@cn.fujitsu.com> f2fs: optimize restore_node_summary slightly

Previously, we ra_sum_pages to pre-read contiguous pages as more
as possible, and if we fail to alloc more pages, an ENOMEM error
will be reported upstream, even though we have alloced some pages
yet. In fact, we can use the available pages to do the job partly,
and continue the rest in the following circle. Only reporting ENOMEM
upstream if we really can not alloc any available page.

And another fix is ignoring dealing with the following pages if an
EIO occurs when reading page from page_list.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: modify the flow for better neat code]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
e8512d2e0c4eb38cd78b1499bb08d7d8eea6c723 07-Mar-2014 Gu Zheng <guz.fnst@cn.fujitsu.com> f2fs: remove the unused ctor argument of f2fs_kmem_cache_create()

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
662befda25fb16d7164633c39e9e20aeac5107d9 07-Feb-2014 Chao Yu <chao2.yu@samsung.com> f2fs: introduce ra_meta_pages to readahead CP/NAT/SIT pages

This patch help us to cleanup the readahead code by merging ra_{sit,nat}_pages
function into ra_meta_pages.
Additionally the new function is used to readahead cp block in
recover_orphan_inodes.

Change log from v1:
o fix a deadloop bug pointed by Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
491c0854b41380f48e422c00ae7e25ae4d02cecc 04-Feb-2014 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: clean up with a macro

This patch adds GET_BLKOFF_FROM_SEG0 to clean up some codes.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
5e443818fa0b2a2845561ee25bec181424fb2889 27-Jan-2014 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: handle dirty segments inside refresh_sit_entry

This patch cleans up the refresh_sit_entry to handle locate_dirty_segments.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
9df27d982d58b9372bc476fb6b9bab861d617029 20-Jan-2014 Gu Zheng <guz.fnst@cn.fujitsu.com> f2fs: add help function META_MAPPING

Introduce help function META_MAPPING() to get the cache meta blocks'
address space.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
6c311ec6c2d9e015d454b4e3fda8008b5bebf316 17-Jan-2014 Chris Fries <cfries@motorola.com> f2fs: clean checkpatch warnings

Fixed a variety of trivial checkpatch warnings. The only delta should
be some minor formatting on log strings that were split / too long.

Signed-off-by: Chris Fries <cfries@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
5514f0aadddcdfaaaea697b60203f5402552eb7b 10-Jan-2014 Yuan Zhong <yuan.mark.zhong@samsung.com> f2fs: remove the needless parameter of f2fs_wait_on_page_writeback

"boo sync" parameter is never referenced in f2fs_wait_on_page_writeback.
We should remove this parameter.

Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
fb5566da9181d33ecdd9892e44f90320e7d4cc9f 08-Jan-2014 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: improve write performance under frequent fsync calls

When considering a bunch of data writes with very frequent fsync calls, we
are able to think the following performance regression.

N: Node IO, D: Data IO, IO scheduler: cfq

Issue pending IOs
D1 D2 D3 D4
D1 D2 D3 D4 N1
D2 D3 D4 N1 N2
N1 D3 D4 N2 D1
--> N1 can be selected by cfq becase of the same priority of N and D.
Then D3 and D4 would be delayed, resuling in performance degradation.

So, when processing the fsync call, it'd better give higher priority to data IOs
than node IOs by assigning WRITE and WRITE_SYNC respectively.
This patch improves the random wirte performance with frequent fsync calls by up
to 10%.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
7e8f23081ab3a11de90d7389f2c6fd44676c8df9 20-Dec-2013 Gu Zheng <guz.fnst@cn.fujitsu.com> f2fs: remove the rw_flag domain from f2fs_io_info

When using the f2fs_io_info in the low level, we still need to merge the
rw and rw_flag, so use the rw to hold all the io flags directly,
and remove the rw_flag field.

ps.It is based on the previous patch:
f2fs: move all the bio initialization into __bio_alloc

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
bfad7c2d40332be6a1d7a89660bceb0f6ea1d73a 16-Dec-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: introduce a new direct_IO write path

Previously, f2fs doesn't support direct IOs with high performance, which throws
every write requests via the buffered write path, resulting in highly
performance degradation due to memory opeations like copy_from_user.

This patch introduces a new direct IO path in which every write requests are
processed by generic blockdev_direct_IO() with enhanced get_block function.

The get_data_block() in f2fs handles:
1. if original data blocks are allocates, then give them to blockdev.
2. otherwise,
a. preallocate requested block addresses
b. do not use extent cache for better performance
c. give the block addresses to blockdev

This policy induces that:
- new allocated data are sequentially written to the disk
- updated data are randomly written to the disk.
- f2fs gives consistency on its file meta, not file data.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
216fbd64437452d23db54ae845916facd7215caa 07-Nov-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: introduce sysfs entry to control in-place-update policy

This patch introduces new sysfs entries for users to control the policy of
in-place-updates, namely IPU, in f2fs.

Sometimes f2fs suffers from performance degradation due to its out-of-place
update policy that produces many additional node block writes.
If the storage performance is very dependant on the amount of data writes
instead of IO patterns, we'd better drop this out-of-place update policy.

This patch suggests 5 polcies and their triggering conditions as follows.

[sysfs entry name = ipu_policy]

0: F2FS_IPU_FORCE all the time,
1: F2FS_IPU_SSR if SSR mode is activated,
2: F2FS_IPU_UTIL if FS utilization is over threashold,
3: F2FS_IPU_SSR_UTIL if SSR mode is activated and FS utilization is over
threashold,
4: F2FS_IPU_DISABLE disable IPU. (=default option)

[sysfs entry name = min_ipu_util]

This parameter controls the threshold to trigger in-place-updates.
The number indicates percentage of the filesystem utilization, and used by
F2FS_IPU_UTIL and F2FS_IPU_SSR_UTIL policies.

For more details, see need_inplace_update() in segment.h.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
458e6197c37de53f7be0a837644daabb900c3036 11-Dec-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: refactor bio->rw handling

This patch introduces f2fs_io_info to mitigate the complex parameter list.

struct f2fs_io_info {
enum page_type type; /* contains DATA/NODE/META/META_FLUSH */
int rw; /* contains R/RS/W/WS */
int rw_flag; /* contains REQ_META/REQ_PRIO */
}

1. f2fs_write_data_pages
- DATA
- WRITE_SYNC is set when wbc->WB_SYNC_ALL.

2. sync_node_pages
- NODE
- WRITE_SYNC all the time

3. sync_meta_pages
- META
- WRITE_SYNC all the time
- REQ_META | REQ_PRIO all the time

** f2fs_submit_merged_bio() handles META_FLUSH.

4. ra_nat_pages, ra_sit_pages, ra_sum_pages
- META
- READ_SYNC

Cc: Fan Li <fanofcode.li@samsung.com>
Cc: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
63a0b7cb33d85aeb0df39b984c08e234db4925d1 09-Dec-2013 Fan Li <fanofcode.li@samsung.com> f2fs: merge pages with the same sync_mode flag

Previously f2fs submits most of write requests using WRITE_SYNC, but f2fs_write_data_pages
submits last write requests by sync_mode flags callers pass.

This causes a performance problem since continuous pages with different sync flags
can't be merged in cfq IO scheduler(thanks yu chao for pointing it out), and synchronous
requests often take more time.

This patch makes the following modifies to DATA writebacks:

1. every page will be written back using the sync mode caller pass.
2. only pages with the same sync mode can be merged in one bio request.

These changes are restricted to DATA pages.Other types of writebacks are modified
To remain synchronous.

In my test with tiotest, f2fs sequence write performance is improved by about 7%-10% ,
and this patch has no obvious impact on other performance tests.

Signed-off-by: Fan Li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
93dfe2ac516250755f7d5edd438b0ce67c0e3aa6 29-Nov-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: refactor bio-related operations

This patch integrates redundant bio operations on read and write IOs.

1. Move bio-related codes to the top of data.c.
2. Replace f2fs_submit_bio with f2fs_submit_merged_bio, which handles read
bios additionally.
3. Introduce __submit_merged_bio to submit the merged bio.
4. Change f2fs_readpage to f2fs_submit_page_bio.
5. Introduce f2fs_submit_page_mbio to integrate previous submit_read_page and
submit_write_page.

Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com >
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
187b5b8b3dfcfc73126f2743c89cc47df3bf07be 30-Nov-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: remove the own bi_private allocation

Previously f2fs allocates its own bi_private data structure all the time even
though we don't use it. But, can we remove this bi_private allocation?

This patch removes such the additional bi_private allocation.

1. Retrieve f2fs_sb_info from its page->mapping->host->i_sb.
- This removes the usecases of bi_private in end_io.

2. Use bi_private only when we really need it.
- The bi_private is used only when the checkpoint procedure is conducted.
- When conducting the checkpoint, f2fs submits a META_FLUSH bio to wait its bio
completion.
- Since we have no dependancies to remove bi_private now, let's just use
bi_private pointer as the completion pointer.

Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
f9a4e6df52edf8ce1040d1b8d340d31234a1bce3 27-Nov-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: bug fix on bit overflow from 32bits to 64bits

This patch fixes some bit overflows by the shift operations.

Dan Carpenter reported potential bugs on bit overflows as follows.

fs/f2fs/segment.c:910 submit_write_page()
warn: should 'blk_addr << ((sbi)->log_blocksize - 9)' be a 64 bit type?
fs/f2fs/checkpoint.c:429 get_valid_checkpoint()
warn: should '1 << ()' be a 64 bit type?
fs/f2fs/data.c:408 f2fs_readpage()
warn: should 'blk_addr << ((sbi)->log_blocksize - 9)' be a 64 bit type?
fs/f2fs/data.c:457 submit_read_page()
warn: should 'blk_addr << ((sbi)->log_blocksize - 9)' be a 64 bit type?
fs/f2fs/data.c:525 get_data_block_ro()
warn: should 'i << blkbits' be a 64 bit type?

Bug-Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
03232305ff3cf44761f7ea271f7c9af5105392b9 24-Nov-2013 Changman Lee <cm224.lee@samsung.com> f2fs: send REQ_META or REQ_PRIO when reading meta area

Let's send REQ_META or REQ_PRIO when reading meta area such as NAT/SIT
etc.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
a709f4a2f22c0ebaed1d99aee63ab44ffc2ba3d0 24-Nov-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: add detailed information of bio types in the tracepoints

This patch inserts information of bio types in more detail.
So, we can now see REQ_META and REQ_PRIO too.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
74de593af77b109f202c47e090c9e134c8882869 22-Nov-2013 Chao Yu <chao2.yu@samsung.com> f2fs: read contiguous sit entry pages by merging for mount performance

Previously we read sit entries page one by one, this method lost the chance
of reading contiguous page together. So we read pages as contiguous as
possible for better mount performance.

change log:
o merge judgements/use 'Continue' or 'Break' instead of 'Goto' as Gu Zheng
suggested.
o add mark_page_accessed() before release page to delay VM reclaiming.
o remove '*order' for simplification of function as Jaegeuk Kim suggested.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: fix a bug on the block address calculation]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
d4d288bc72c020d335868ce217695c4d5dfd74d0 23-Nov-2013 Chao Yu <chao2.yu@samsung.com> f2fs: adds a tracepoint for f2fs_submit_read_bio

This patch adds a tracepoint for f2fs_submit_read_bio.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: integrate tracepoints of f2fs_submit_read(_write)_bio]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
87b8872d5b4a8f9f61123ab913aff4f6047d8b53 20-Nov-2013 Chao Yu <chao2.yu@samsung.com> f2fs: adds a tracepoint for submit_read_page

This patch adds a tracepoint for submit_read_page.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: integrate tracepoints of f2fs_submit_read(_write)_page]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
1ff7bd3bb5f7f57bc7418ee6ed12f3ae217e4e9c 18-Nov-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: introduce a bio array for per-page write bios

The f2fs has three bio types, NODE, DATA, and META, and manages some data
structures per each bio types.

The codes are a little bit messy, thus, this patch introduces a bio array
which groups individual data structures as follows.

struct f2fs_bio_info {
struct bio *bio; /* bios to merge */
sector_t last_block_in_bio; /* last block number */
struct mutex io_mutex; /* mutex for bio */
};

struct f2fs_sb_info {
...
struct f2fs_bio_info write_io[NR_PAGE_TYPE]; /* for write bios */
...
};

The code changes from this new data structure are trivial.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
971767caf632190f77a40b4011c19948232eed75 18-Nov-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: use sbi->write_mutex for write bios

This patch removes an unnecessary semaphore (i.e., sbi->bio_sem).
There is no reason to use the semaphore when f2fs submits read and write IOs.
Instead, let's use a write mutex and cover the sbi->bio[] by the lock.

Change log from v1:
o split write_mutex suggested by Chao Yu

Chao described,
"All DATA/NODE/META bio buffers in superblock is protected by
'sbi->write_mutex', but each bio buffer area is independent, So we
should split write_mutex to three for DATA/NODE/META."

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
7d5e510944ce60ef0c6c2300a58547679df76db7 18-Nov-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: clean up the do_submit_bio flow

This patch introduces PAGE_TYPE_OF_BIO() and cleans up do_submit_bio() with it.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
1661d07c2d5e6486cab1c189cfb65ff60abf9b92 12-Nov-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: add a tracepoint for f2fs_issue_discard

This patch adds a tracepoint for f2fs_issue_discard.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
3720887910864467a61cd0d64bad3965009cdef8 12-Nov-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: introduce f2fs_issue_discard() to clean up

Change log from v1:
o fix 32bit drops reported by Dan Carpenter

This patch adds f2fs_issue_discard() to clean up blkdev_issue_discard() flows.

Dan carpenter reported:
"block_t is a 32 bit type and sector_t is a 64 bit type. The upper 32
bits of the sector_t are not used because the shift will wrap."

Bug-Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
b29555505d81e496fdbd125190c8043f3c09a83c 12-Nov-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: add key functions for small discards

This patch adds key functions to activate the small discard feature.

Note that this procedure is conducted during the checkpoint only.

In flush_sit_entries(), when a new dirty sit entry is flushed, f2fs calls
add_discard_addrs() which searches candidates to be discarded.
The candidates should be marked *invalidated* and also previous checkpoint
recognizes it as *valid*.

At the end of a checkpoint procedure, f2fs throws discards based on the
discard entry list.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
7fd9e544fbb10c6ae4b4953f6063560c8eeae6e8 15-Nov-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: add a slab cache entry for small discards

This patch adds a slab cache entry for small discards.

Each entry consists of:

struct discard_entry {
struct list_head list; /* list head */
block_t blkaddr; /* block address to be discarded */
int len; /* # of consecutive blocks of the discard */
};

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
e81c93cf8c7bd413898798cf8c67f18b1fef3360 15-Nov-2013 Changman Lee <cm224.lee@samsung.com> f2fs: improve searching speed of __next_free_blkoff

To find a zero bit using the result of OR operation between ckpt_valid_map
and cur_valid_map is more fast than find a zero bit in each bitmap.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: adjust changed function name]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
9a7f143ab529352ebef13d3f0f4a09f13efa9435 15-Nov-2013 Changman Lee <cm224.lee@samsung.com> f2fs: introduce __find_rev_next(_zero)_bit

When f2fs_set_bit is used, in a byte MSB and LSB is reversed,
in that case we can use __find_rev_next_bit or __find_rev_next_zero_bit.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: change the function names]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
4f024f3797c43cb4b73cd2c50cec728842d0e49e 12-Oct-2013 Kent Overstreet <kmo@daterainc.com> block: Abstract out bvec iterator

Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to this struct; for now, this patch effectively just renames
things.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@inktank.com>
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: Benny Halevy <bhalevy@tonian.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchand@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Peng Tao <tao.peng@emc.com>
Cc: Andy Adamson <andros@netapp.com>
Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: Jie Liu <jeff.liu@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Pankaj Kumar <pankaj.km@samsung.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>6
2c30c71bd653afcbed7f6754e8fe3d16e0e708a1 07-Nov-2013 Kent Overstreet <kmo@daterainc.com> block: Convert various code to bio_for_each_segment()

With immutable biovecs we don't want code accessing bi_io_vec directly -
the uses this patch changes weren't incorrect since they all own the
bio, but it makes the code harder to audit for no good reason - also,
this will help with multipage bvecs later.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
29e59c14ae5c21d25db1580d9651b5855d656a30 11-Nov-2013 Changman Lee <cm224.lee@samsung.com> f2fs: issue more large discard command

o Changes from v1
Use find_next(_zero)_bit suggested by jg.kim

When f2fs issues discard command, if segment is contiguous,
let's issue more large segment to gather adjacent segments.

** blktrace **
179,1 0 5859 42.619023770 971 C D 131072 + 2097152 [0]
179,1 0 33665 108.840475468 971 C D 2228224 + 2494464 [0]
179,1 0 33671 109.131616427 971 C D 14909440 + 344064 [0]
179,1 0 33677 109.137100677 971 C D 15261696 + 4096 [0]

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
fb51b5ef9c07844f80402702bd3d3002ceca5cd9 06-Nov-2013 Changman Lee <cm224.lee@samsung.com> f2fs: cleanup waiting routine for writeback pages in cp

use genernal method supported by kernel

o changes from v1
If any waiter exists at end io, wake up it.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
3b03f72445ba1437cfa29f9719bb3cfdb60558d9 06-Nov-2013 Chao Yu <chao2.yu@samsung.com> f2fs: avoid to use a NULL point in destroy_segment_manager

A NULL point should avoid to be used in destroy_segment_manager after allocating
memory fail for f2fs_sm_info.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
9a47938b226cc2b8e2afd72b0f1ca1a7e1367cf5 29-Oct-2013 Fan Li <fanofcode.li@samsung.com> f2fs: change the method of calculating the number summary blocks

npages_for_summary_flush uses (SUMMARY_SIZE + 1) as the size of a f2fs_summary
while its actual size is SUMMARY_SIZE. So the result sometimes is bigger than
actual number by one, which causes checkpoint can't be written into disk
contiguously, and sometimes summary blocks can't be compacted like they should.
Besides, when writing summary blocks into pages, if remain space in a page
isn't big enough for one f2fs_summary, it will be left unused, current code
seems not to take it into account.

Signed-off-by: Fan Li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
5d56b6718a0f4e5c58cdd3cb6b7a472d7c5671b9 29-Oct-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: add an option to avoid unnecessary BUG_ONs

If you want to remove unnecessary BUG_ONs, you can just turn off F2FS_CHECK_FS
in your kernel config.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
4625d6aac2d00a18f7bcc15bffe41e9de3a25332 25-Oct-2013 Changman Lee <cm224.lee@samsung.com> f2fs: remove unnecessary segment bitmap updates

Only one dirty type is set in __locate_dirty_segment and we can know
dirty type of segment. So we don't need to check other dirty types.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
e8d61a7488d06aba3e7226e3536a6a6e14391ce8 24-Oct-2013 Chao Yu <chao2.yu@samsung.com> f2fs: remove redundant set_page_dirty from write_compacted_summaries

Previously, set_page_dirty is called every time after writting one summary info
into compacted summary page,
To avoid redundant set_page_dirty, we only call set_page_dirty before release
page.

Signed-off-by: Yu Chao <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
4660f9c0fe484353b17a4b9d1cc2b036fa895f76 24-Oct-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: introduce f2fs_balance_fs_bg for some background jobs

This patch merges some background jobs into this new function.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
81eb8d6e2869b119d4a7b8c02091c3779733a3ac 24-Oct-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: reclaim prefree segments periodically

Previously, f2fs postpones reclaiming prefree segments into free segments
as much as possible.
However, if user writes and deletes a bunch of data without any sync or fsync
calls, some flash storages can suffer from garbage collections.

So, this patch adds the reclaiming codes to f2fs_write_node_pages and background
GC thread.

If there are a lot of prefree segments, let's do checkpoint so that f2fs
submits discard commands for the prefree regions to the flash storage.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
dcdfff65276fdc6dfe5eb1d0aff802dfa7a95e15 22-Oct-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: clean up several status-related operations

This patch cleans up improper definitions that update some status information.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
435f2a1b58ac8f50894f23549c97791085f7cba2 18-Oct-2013 Haicheng Li <haicheng.li@linux.intel.com> f2fs: no need to check other dirty_segmap when the seg has been found

Because one dirty seg can only be mapped to one dirty_type. Otherwise, it's a bug.

Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
[Jaegeuk Kim: modify a comment related to this patch]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
cffbfa66485e4940091d7e64024e802314d24c09 18-Oct-2013 Haicheng Li <haicheng.li@linux.intel.com> f2fs: use true and false for boolean value

Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
e234088758fca3a669ebb1a02d8bf7bf60f0e4ff 14-Oct-2013 Gu Zheng <guz.fnst@cn.fujitsu.com> f2fs: avoid wait if IO end up when do_checkpoint for better performance

Previously, do_checkpoint() will call congestion_wait() for waiting the pages
(previous submitted node/meta/data pages) to be written back.
Because congestion_wait() will set a regular period (e.g. HZ / 50 ) for waiting, and
no additional wake up mechanism was introduced if IO ends up before regular period costed.
Yuan Zhong found there is a situation that after the pages have been written back,
but the checkpoint thread still wait for congestion_wait to exit.

So here we store checkpoint task into f2fs_sb when doing checkpoint, it'll wait for IO completes
if there's IO going on, and in the end IO path, wake up checkpoint task when IO ends up.

Thanks to Yuan Zhong's pre work about this problem.

Reported-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
cc7b1bb173676621b092b61d22d8d12b05efb5e8 22-Sep-2013 Chao Yu <chao2.yu@samsung.com> f2fs: avoid allocating failure in bio_alloc

This patch add macro MAX_BIO_BLOCKS to limit value of npages in
f2fs_bio_alloc, it can avoid allocating failure in bio_alloc caused by
npages is larger than BIO_MAX_PAGES.

Signed-off-by: Yu Chao <chao2.yu@samsung.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
7b40527508670e56d817b837b2114bc340446539 19-Aug-2013 Gu Zheng <guz.fnst@cn.fujitsu.com> f2fs: fix a compound statement label error

An error "label at end of compound statement" will occur if CONFIG_F2FS_STAT_FS
disabled.
fs/f2fs/segment.c:556:1: error: label at end of compound statement
So clean up the 'out' label to fix it.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
41dfde135f9169948dd0c9bba948774f2e521210 09-Aug-2013 Gu Zheng <guz.fnst@cn.fujitsu.com> f2fs: clean up the needless end 'return' of void function

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
a569469e967022d9ceeaa4b73619f96614087d2d 05-Aug-2013 Jin Xu <jinuxstyle@gmail.com> f2fs: fix a deadlock in fsync

This patch fixes a deadlock bug that occurs quite often when there are
concurrent write and fsync on a same file.

Following is the simplified call trace when tasks get hung.

fsync thread:
- f2fs_sync_file
...
- f2fs_write_data_pages
...
- update_extent_cache
...
- update_inode
- wait_on_page_writeback

bdi writeback thread
- __writeback_single_inode
- f2fs_write_data_pages
- mutex_lock(sbi->writepages)

The deadlock happens when the fsync thread waits on a inode page that has
been added to the f2fs' cached bio sbi->bio[NODE], and unfortunately,
no one else could be able to submit the cached bio to block layer for
writeback. This is because the fsync thread already hold a sbi->fs_lock and
the sbi->writepages lock, causing the bdi thread being blocked when attempt
to write data pages for the same inode. At the same time, f2fs_gc thread
does not notice the situation and could not help. Even the sync syscall
gets blocked.

To fix it, we could submit the cached bio first before waiting on a inode page
that is being written back.

Signed-off-by: Jin Xu <jinuxstyle@gmail.com>
[Jaegeuk Kim: add more cases to use f2fs_wait_on_page_writeback]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
d8207f69589c74037128ff6c9e1a44223fad3b7c 25-Jul-2013 Gu Zheng <guz.fnst@cn.fujitsu.com> f2fs: move bio_private allocation out of f2fs_bio_alloc()

bio->bi_private is not always needed. As in the reading data path,
end_read_io does not need bio_private for further using, so moving
bio_private allocation out of f2fs_bio_alloc(). Alloc it in the
submit_write_page(), and ignore it in the f2fs_readpage().

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
763bfe1bc575dcce56dc5c570dc005d94911705f 27-Jun-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: remove reusing any prefree segments

This patch removes check_prefree_segments initially designed to enhance the
performance by narrowing the range of LBA usage across the whole block device.

When allocating a new segment, previous f2fs tries to find proper prefree
segments, and then, if finds a segment, it reuses the segment for further
data or node block allocation.

However, I found that this was totally wrong approach since the prefree segments
have several data or node blocks that will be used by the roll-forward mechanism
operated after sudden-power-off.

Let's assume the following scenario.

/* write 8MB with fsync */
for (i = 0; i < 2048; i++) {
offset = i * 4096;
write(fd, offset, 4KB);
fsync(fd);
}

In this case, naive segment allocation sequence will be like:
data segment: x, x+1, x+2, x+3
node segment: y, y+1, y+2, y+3.

But, if we can reuse prefree segments, the sequence can be like:
data segment: x, x+1, y, y+1
node segment: y, y+1, y+2, y+3.
Because, y, y+1, and y+2 became prefree segments one by one, and those are
reused by data allocation.

After conducting this workload, we should consider how to recover the latest
inode with its data.
If we reuse the prefree segments such as y or y+1, we lost the old node blocks
so that f2fs even cannot start roll-forward recovery.

Therefore, I suggest that we should remove reusing prefree segments.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
8736fbf00372dcc0bc7b04b86d737eb5db31fff6 16-Jun-2013 Namjae Jeon <namjae.jeon@samsung.com> f2fs: optimize the init_dirty_segmap function

Optimize the while loop condition

Since this condition will always be true and while loop will
be terminated by the following condition in code:

if (segno >= TOTAL_SEGS(sbi))
break;
Hence we can replace the while loop condition with while(1)
instead of always checking for segno to be less than Total segs.

Also we do not need to use TOTAL_SEGS() everytime. We can store
this value in a local variable since this value is constant.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
354a3399dc6f7e556d04e1c731cd50e08eeb44bd 14-Jun-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: recover wrong pino after checkpoint during fsync

If a file is linked, f2fs loose its parent inode number so that fsync calls
for the linked file should do checkpoint all the time.
But, if we can recover its parent inode number after the checkpoint, we can
adjust roll-forward mechanism for the further fsync calls, which is able to
improve the fsync performance significatly.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
8d8451af6875f8841dc20987d1363405020a9172 13-Jun-2013 Haicheng Li <haicheng.li@linux.intel.com> f2fs: make locate_dirty_segment() as static

It's used only locally and could be static.

Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
e79efe3b69d6454eb8ec734a24d49f0f4c7d26f5 13-Jun-2013 Haicheng Li <haicheng.li@linux.intel.com> f2fs: remove unnecessary parameter "offset" from __add_sum_entry()

We can get the value directly from pointer "curseg".

Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
35b09d82c3cf3fc0b8b6d923e7fd82ff7926aafc 23-May-2013 Namjae Jeon <namjae.jeon@samsung.com> f2fs: push some variables to debug part

Some, counters are needed only for the statistical information
while debugging.
So, those can be controlled using CONFIG_F2FS_STAT_FS,
pushing the usage for few variables under this flag.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
81fb5e874675517c57e9edd913065f1e17ebd362 14-May-2013 Haicheng Li <haicheng.li@linux.intel.com> f2fs: remove unecessary variable and code

Code cleanup without behavior changed.

Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
ac5d156c78a68b39955ee9b09498ba93831c77d7 29-Apr-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: modify the number of issued pages to merge IOs

When testing f2fs on an SSD, I found some 128 page IOs followed by 1 page IO
were issued by f2fs_write_node_pages.
This means that there were some mishandling flows which degrades performance.

Previous f2fs_write_node_pages determines the number of pages to be written,
nr_to_write, as follows.

1. The bio_get_nr_vecs returns 129 pages.
2. The bio_alloc makes a room for 128 pages.
3. The initial 128 pages go into one bio.
4. The existing bio is submitted, and a new bio is prepared for the last 1 page.
5. Finally, sync_node_pages submits the last 1 page bio.

The problem is from the use of bio_get_nr_vecs, so this patch replace it
with max_hw_blocks using queue_max_sectors.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
8680441caa5465a2cd87b6be857be4378df46700 25-Apr-2013 Namjae Jeon <namjae.jeon@samsung.com> f2fs: add REQ_META about metadata requests for submit

Adding REQ_META for all the metadata requests can help in improving the
FS performance, if the underlying device supports TAGGING.
So, when considering the submit_bio path for all the f2fs requests. We can
add REQ_META for all the META requests.
As a precursor to this change we considered the commit
4265900e0be653f5b78baf2816857ef57cf1332f 'mmc: MMC-4.5 Data Tag Support'

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
6ec178dac6768204a6edf70f4a53d40b691c12b4 23-Apr-2013 Namjae Jeon <namjae.jeon@samsung.com> f2fs: add tracepoints for write page operations

Add tracepoints to debug the various page write operation
like data pages, meta pages.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: remove unnecessary tracepoints]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
b2f2c390c5612df97f0403e1ef1e4e41c24b7d4f 01-Apr-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: fix the bitmap consistency of dirty segments

Like below, there are 8 segment bitmaps for SSR victim candidates.

enum dirty_type {
DIRTY_HOT_DATA, /* dirty segments assigned as hot data logs */
DIRTY_WARM_DATA, /* dirty segments assigned as warm data logs */
DIRTY_COLD_DATA, /* dirty segments assigned as cold data logs */
DIRTY_HOT_NODE, /* dirty segments assigned as hot node logs */
DIRTY_WARM_NODE, /* dirty segments assigned as warm node logs */
DIRTY_COLD_NODE, /* dirty segments assigned as cold node logs */
DIRTY, /* to count # of dirty segments */
PRE, /* to count # of entirely obsolete segments */
NR_DIRTY_TYPE
};

The upper 6 bitmaps indicates segments dirtied by active log areas respectively.
And, the DIRTY bitmap integrates all the 6 bitmaps.

For example,
o DIRTY_HOT_DATA : 1010000
o DIRTY_WARM_DATA: 0100000
o DIRTY_COLD_DATA: 0001000
o DIRTY_HOT_NODE : 0000010
o DIRTY_WARM_NODE: 0000001
o DIRTY_COLD_NODE: 0000000
In this case,
o DIRTY : 1111011,

which means that we should guarantee the consistency between DIRTY and other
bitmaps concreately.

However, the SSR mode selects victims freely from any log types, which can set
multiple bits across the various bitmap types.

So, this patch eliminates this inconsistency.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
60374688a1a1cc8ef173d3dab42574719b851ac4 31-Mar-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: allocate remained free segments in the LFS mode

This patch adds a new condition that allocates free segments in the current
active section even if SSR is needed.
Otherwise, f2fs cannot allocate remained free segments in the section since
SSR finds dirty segments only.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
5ec4e49f9bd753e2a6857a96e01f8ae5ff00b459 31-Mar-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: change GC bitmaps to apply the section granularity

This patch removes a bitmap for victim segments selected by foreground GC, and
modifies the other bitmap for victim segments selected by background GC.

1) foreground GC bitmap
: We don't need to manage this, since we just only one previous victim section
number instead of the whole victim history.
The f2fs uses the victim section number in order not to allocate currently
GC'ed section to current active logs.

2) background GC bitmap
: This bitmap is used to avoid selecting victims repeatedly by background GCs.
In addition, the victims are able to be selected by foreground GCs, since
there is no need to read victim blocks during foreground GCs.

By the fact that the foreground GC reclaims segments in a section unit, it'd
be better to manage this bitmap based on the section granularity.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
33afa7fde0defbb362328233e600e052d0a22cd5 30-Mar-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: allocate new segment aligned with sections

When allocating a new segment under the LFS mode, we should keep the section
boundary.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
53cf95222fad7a962cc03fb61a33e37bcf4f5c9d 30-Mar-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: introduce TOTAL_SECS macro

Let's use a macro to get the total number of sections.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
79b5793be44d97c0a0e905c221858af08e5ebd85 28-Mar-2013 Alexandru Gheorghiu <gheorghiuandru@gmail.com> f2fs: use kmemdup

Use kmemdup instead of kzalloc and memcpy.

Signed-off-by: Alexandru Gheorghiu <gheorghiuandru@gmail.com>
Acked-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
437275272f9e635673f065300e5d95226a25cb06 04-Feb-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: clarify and enhance the f2fs_gc flow

This patch makes clearer the ambiguous f2fs_gc flow as follows.

1. Remove intermediate checkpoint condition during f2fs_gc
(i.e., should_do_checkpoint() and GC_BLOCKED)

2. Remove unnecessary return values of f2fs_gc because of #1.
(i.e., GC_NODE, GC_OK, etc)

3. Simplify write_checkpoint() because of #2.

4. Clarify the main f2fs_gc flow.
o monitor how many freed sections during one iteration of do_garbage_collect().
o do GC more without checkpoints if we can't get enough free sections.
o do checkpoint once we've got enough free sections through forground GCs.

5. Adopt thread-logging (Slack-Space-Recycle) scheme more aggressively on data
log types. See. get_ssr_segement()

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
577e349514452fa3fcd99fd06e587b02d3d1cf28 24-Jan-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: prevent checkpoint once any IO failure is detected

This patch enhances the checkpoint routine to cope with IO errors.

Basically f2fs detects IO errors from end_io_write, and the errors are able to
be occurred during one of data, node, and meta page writes.

In the previous code, when an IO error is occurred during writes, f2fs sets a
flag, CP_ERROR_FLAG, in the raw ckeckpoint buffer which will be written to disk.
Afterwards, write_checkpoint() will check the flag and remount f2fs as a
read-only (ro) mode.

However, even once f2fs is remounted as a ro mode, dirty checkpoint pages are
freely able to be written to disk by flusher or kswapd in background.
In such a case, after cold reboot, f2fs would restore the checkpoint data having
CP_ERROR_FLAG, resulting in disabling write_checkpoint and remounting f2fs as
a ro mode again.

Therefore, let's prevent any checkpoint page (meta) writes once an IO error is
occurred, and remount f2fs as a ro mode right away at that moment.

Reported-by: Oliver Winker <oliver@oli1170.net>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
408e9375610cca6d54e9c654cbe05a647687e12e 03-Jan-2013 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: revisit the f2fs_gc flow

I'd like to revisit the f2fs_gc flow and rewrite as follows.

1. In practical, the nGC parameter of f2fs_gc is meaningless. So, let's
remove it.
2. Background GC marks victim blocks as dirty one at a time.
3. Foreground GC should do cleaning job until acquiring enough free
sections. Afterwards, it needs to do checkpoint.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
344324f10fad05e40b1047c5e09ebbc77e43c24f 21-Dec-2012 Namjae Jeon <namjae.jeon@samsung.com> f2fs: remove unneeded initialization of nr_dirty in dirty_seglist_info

Since, the memory for the object of dirty_seglist_info is allocated
using kzalloc - which returns zeroed out memory. So, there is no need
to initialize the nr_dirty values with zeroes.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
029cd28c1f739bbfc5105035696d5f1f4e45d161 21-Dec-2012 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: fix equation of has_not_enough_free_secs()

Practically, has_not_enough_free_secs() should calculate with the numbers of
current node and directory data blocks together.
Actually the equation was implemented in need_to_flush().

So, this patch removes need_flush() and moves the equation into
has_not_enough_free_secs().

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
12a67146e35ba1d04ac4a5430eaaa8790158d60e 21-Dec-2012 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: return a default value for non-void function

This patch resolves a build warning reported by kbuild test robot.

"
fs/f2fs/segment.c: In function '__get_segment_type':
fs/f2fs/segment.c:806:1: warning: control reaches end of non-void
function [-Wreturn-type]
"

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
690e4a3ead5f88fc95f7650816d1376aa2e79db5 19-Dec-2012 Geert Uytterhoeven <geert@linux-m68k.org> f2fs: add missing #include <linux/prefetch.h>

m68k allmodconfig:

fs/f2fs/data.c: In function ‘read_end_io’:
fs/f2fs/data.c:311: error: implicit declaration of function ‘prefetchw’

fs/f2fs/segment.c: In function ‘f2fs_end_io_write’:
fs/f2fs/segment.c:628: error: implicit declaration of function ‘prefetchw’

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
dfb7c0ceab57fee7618f4c9c31c5a89254e8530a 12-Dec-2012 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: remove set_page_dirty for atomic f2fs_end_io_write

We should guarantee not to do *scheduling while atomic*.
I found, in atomic f2fs_end_io_write(), there is a set_page_dirty() call
to deal with IO errors.

But, set_page_dirty() calls:
-> f2fs_set_data_page_dirty()
-> set_dirty_dir_page()
-> cond_resched() which results in scheduling.

In order to avoid this, I'd like to remove simply set_page_dirty(),
since the page is already marked as ERROR and f2fs will be operated
as the read-only mode as well.
So, there is no recovery issue with this.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
3cd8a23948b29301f8f67b8d70c5c18fabbc05e1 10-Dec-2012 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: cleanup the f2fs_bio_alloc routine

Do cleanup more for better code readability.

- Change the parameter set of f2fs_bio_alloc()
This function should allocate a bio only since it is not something like
f2fs_bio_init(). Instead, the caller should initialize the allocated bio.

- Introduce SECTOR_FROM_BLOCK
This macro translates a block address to its sector address.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
c212991a6bc3ba120d41205a294c5b89f05f1535 08-Dec-2012 Namjae Jeon <namjae.jeon@samsung.com> f2fs: rewrite f2fs_bio_alloc to make it simpler

Since, GFP_NOFS(__GFP_WAIT) is used for allocation requests of bio in f2fs.
So, there is no chance of returning NULL from the BIO allocation.

Making the bio allocation routine for f2fs simpler.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
1042d60f917d78ef1a6eaea297a1020484d4bf74 01-Dec-2012 Namjae Jeon <namjae.jeon@samsung.com> f2fs: remove unneeded initialization

No need to initialize "struct f2fs_gc_kthread *gc_th = NULL",
as gc_th = NULL, will be taken care by the return values of kmalloc().
And fix codes in other places.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
0a8165d7c2cf1395059db20ab07665baf3758fcd 29-Nov-2012 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: adjust kernel coding style

As pointed out by Randy Dunlap, this patch removes all usage of "/**" for comment
blocks. Instead, just use "/*".

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
25ca923b2a766b9c93b63777ead351137533a623 28-Nov-2012 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: fix endian conversion bugs reported by sparse

This patch should resolve the bugs reported by the sparse tool.
Initial reports were written by "kbuild test robot" managed by fengguang.wu.

In my local machines, I've tested also by running:
> make C=2 CF="-D__CHECK_ENDIAN__"

Accordingly, I've found lots of warnings and bugs related to the endian
conversion. And I've fixed all at this moment.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
351df4b201157351c7d26bf12c3eeb9dbce98854 02-Nov-2012 Jaegeuk Kim <jaegeuk.kim@samsung.com> f2fs: add segment operations

This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.

- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.

- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, a bitmap to identify there-in valid or invalid
blocks.

- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.

- To cache SIT entries, a simple array is used. The index for the array is the
segment number.

- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.

- F2FS manages information about six active logs and those summary entries in
memory. Whenever one of them is changed, its summary entries are flushed to
its SIT page maintained by the address space of meta_inode.

- This patch adds a default block allocation function which supports heap-based
allocation policy.

- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
with a single one as much as possible to reduce the IO scheduling overhead.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>