a4483e8a424d76bc1dfacdd94e739fba29d7f83f |
|
17-Sep-2014 |
Chao Yu <chao2.yu@samsung.com> |
ceph: remove redundant code for max file size verification Both ceph_update_writeable_page and ceph_setattr verify the file size against the maximum size ceph supports. There are two callers of ceph_update_writeable_page: ceph_write_begin and ceph_page_mkwrite. For ceph_write_begin, we have already verified the size in generic_write_checks of ceph_write_iter; for ceph_page_mkwrite, there is no chance to change the file size through mmap. Likewise, we have already verified the size in inode_change_ok when we call ceph_setattr. So let's remove the redundant code for max file size verification. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
|
508b32d8661b12da4c9ca41a9b2054e1dc92fa7e |
|
16-Sep-2014 |
Yan, Zheng <zyan@redhat.com> |
ceph: request xattrs if xattr_version is zero The following sequence of events can happen:
- Client releases an inode and queues a cap release message.
- A 'lookup' reply brings the same inode back, but the reply doesn't contain xattrs because the MDS didn't receive the cap release message and thought the client already had up-to-date xattrs.
The fix is to force sending a getattr request to the MDS if xattrs_version is 0. The getattr mask is set to CEPH_STAT_CAP_XATTR, so the MDS knows the client does not have xattrs. Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
4e217b5dc87042b0fa52b11f491c4ded1863823a |
|
07-Jun-2014 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: use truncate_pagecache() instead of truncate_inode_pages() Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
f3ae1b97be14ff10da8f02309ba04bed2ba035bc |
|
06-Jun-2014 |
Fabian Frederick <fabf@skynet.be> |
fs/ceph: replace pr_warning by pr_warn Update the last pr_warning callsites in fs branch Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Sage Weil <sage@inktank.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
8d08503c130e96e3794f66fe47053051460b1584 |
|
18-Apr-2014 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: remember subtree root dirfrag's auth MDS remember dirfrag's auth MDS when it's different from its parent inode's auth MDS. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
3e7fbe9cebfdaac380419507908e10c499ddd25b |
|
18-Apr-2014 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: introduce ceph_fill_fragtree() Move the code that updates the i_fragtree into a separate function. Also add a simple probabilistic test to decide whether the i_fragtree should be updated. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
d9df2783507943316b305e177e5b1c157200c76f |
|
18-Apr-2014 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: pre-allocate ceph_cap struct for ceph_add_cap() So that ceph_add_cap() can be used while i_ceph_lock is locked. This simplifies the code that handle cap import/export. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
f98a128a55ff85d0087de89f304f10bd75e792aa |
|
17-Apr-2014 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: update inode fields according to issued caps Cap messages and request replies from a non-auth MDS may carry stale information (the corresponding locks are in LOCK states) even if they have the newest inode version. So the client should update inode fields according to issued caps. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
0a8a70f96fe1bd3e07c15bb86fd247e76102398a |
|
14-Apr-2014 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: clear directory's completeness when creating file When creating a file, ceph_set_dentry_offset() puts the new dentry at the end of the directory's d_subdirs, then sets the dentry's offset based on the directory's max offset. The offset does not reflect the real position of the dentry in the directory. A later readdir reply from the MDS may change the dentry's position/offset. This inconsistency can cause missing/duplicate entries in the readdir result if readdir is partly satisfied by dcache_readdir(). The fix is to clear the directory's completeness after creating/renaming a file. It prevents later readdir from using dcache_readdir(). Fixes: http://tracker.ceph.com/issues/8025 Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
|
48193012873e341f08de48304e32d0499b96c60b |
|
01-Apr-2014 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: don't grabs open file reference for aborted request Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
5f75ce57818e4a48bdeac0b76daeb434eea26059 |
|
21-Mar-2014 |
Fabian Frederick <fabf@skynet.be> |
ceph: Remove get/set acl on symlinks Remove unsupported symlink operations. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
|
8c93cd610c6c5a4c0dddfc6fe906814331b3af87 |
|
08-Mar-2014 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: update i_max_size even if inode version does not change Handle the following sequence of events:
- The client releases an inode with i_max_size > 0. The release message is queued (it is not sent to the auth MDS).
- A 'lookup' request reply from a non-auth MDS returns the same inode.
- The client opens the inode in write mode. The version of the inode trace in the 'open' request reply is equal to the cached inode's version.
- The client requests a new max size. The MDS ignores the request because it does not affect the client's write range.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
|
19913b4eac4a230dccb548931358398f45dabe4c |
|
06-Mar-2014 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: add get_name() NFS export callback Use the newly introduced LOOKUPNAME MDS request to connect child inode to its parent directory. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
|
752c8bdcfe88f27a17c5c9264df928fd145a4b30 |
|
05-Feb-2013 |
Sage Weil <sage@inktank.com> |
ceph: do not chain inode updates to parent fsync The fsync(dirfd) only covers namespace operations, not inode updates. We do not need to cover setattr variants or O_TRUNC. Reported-by: Al Viro <viro@xeniv.linux.org.uk> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
|
72466d0b92e04a7e0e5abf74c86eb352225346e4 |
|
29-Jan-2014 |
Sage Weil <sage@inktank.com> |
ceph: fix posix ACL hooks The merge of commit 7221fe4c2ed7 ("ceph: add acl for cephfs") raced with upstream changes in the generic POSIX ACL code (eg commit 2aeccbe957d0 "fs: add generic xattr_acl handlers" and others). Some of the fallout was fixed in commit 4db658ea0ca ("ceph: Fix up after semantic merge conflict"), but it was incomplete: the set_acl inode_operation wasn't getting set, and the prototype needed to be adjusted a bit (it doesn't take a dentry anymore). Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
4db658ea0ca2312b5d168230476ec7729385aefe |
|
29-Jan-2014 |
Linus Torvalds <torvalds@linux-foundation.org> |
ceph: Fix up after semantic merge conflict The previous ceph-client merge resulted in ceph not even building, because there was a merge conflict that wasn't visible as an actual data conflict: commit 7221fe4c2ed7 ("ceph: add acl for cephfs") added support for POSIX ACL's into Ceph, but unluckily we also had the VFS tree change a lot of the POSIX ACL helper functions to be much more helpful to filesystems (see for example commits 2aeccbe957d0 "fs: add generic xattr_acl handlers", 5bf3258fd2ac "fs: make posix_acl_chmod more useful" and 37bc15392a23 "fs: make posix_acl_create more useful") The reason this conflict wasn't obvious was many-fold: because it was a semantic conflict rather than a data conflict, it wasn't visible in the git merge as a conflict. And because the VFS tree hadn't been in linux-next, people hadn't become aware of it that way. And because I was at jury duty this morning, I was using my laptop and as a result not doing constant "allmodconfig" builds. Anyway, this fixes the build and generally removes a fair chunk of the Ceph POSIX ACL support code, since the improved helpers seem to match really well for Ceph too. But I don't actually have any way to *test* the end result, and I was really hoping for some ACK's for this. Oh, well. Not compiling certainly doesn't make things easier to test, so I'm committing this without the acks after having waited for four hours... Plus it's what I would have done for the merge had I noticed the semantic conflict.. Reported-by: Dave Jones <davej@redhat.com> Cc: Sage Weil <sage@inktank.com> Cc: Guangliang Zhao <lucienchao@gmail.com> Cc: Li Wang <li.wang@ubuntykylin.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
11df2dfb610d68e8050c2183c344b1002351a99d |
|
24-Nov-2013 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: add imported caps when handling cap export message The version 3 cap export message includes information about the imported caps. It allows us to add the imported caps if the corresponding cap import message still hasn't been received. This allows us to handle the situation where the importer MDS crashes and the cap import message is missing. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
9563f88c1fa01341d125e396edc654a8dbcab2d2 |
|
22-Nov-2013 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: fix cache revoke race Handle the following sequence of events:
- The non-auth MDS revokes the Fc cap. Invalidate work is queued.
- The auth MDS issues the Fc cap through a request reply. i_rdcache_gen gets increased.
- The invalidate work runs. It finds i_rdcache_revoking != i_rdcache_gen, so it does nothing.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
7221fe4c2ed72804b28633c8e0217d65abb0023f |
|
11-Nov-2013 |
Guangliang Zhao <lucienchao@gmail.com> |
ceph: add acl for cephfs Signed-off-by: Guangliang Zhao <lucienchao@gmail.com> Reviewed-by: Li Wang <li.wang@ubuntykylin.com> Reviewed-by: Zheng Yan <zheng.z.yan@intel.com>
|
9f12bd119e408388233e7aeb1152f372a8b5dcad |
|
20-Sep-2013 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: drop unconnected inodes A positive dentry and its corresponding inode always come together in an MDS reply. So there is no need to keep the inode in the cache after dropping all its aliases. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
|
86b58d13134ef14f09f8c8f37797ccc37cf823a3 |
|
04-Dec-2013 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: initialize inode before instantiating dentry commit b18825a7c8 (Put a small type field into struct dentry::d_flags) put a type field into struct dentry::d_flags. __d_instantiate() set the field by checking inode->i_mode. So we should initialize inode before instantiating dentry when handling mds reply. Fixes: http://tracker.ceph.com/issues/6930 Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
|
81c6aea5275eae453719d7f3924da07e668265c5 |
|
18-Sep-2013 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: handle frag mismatch between readdir request and reply If the client has outdated directory fragment information, it may send a readdir request for a non-existent directory fragment. In this case, the MDS finds an approximate directory fragment and sends its contents back to the client. When receiving a reply with a fragment that is different from the requested one, the client needs to reset the 'readdir offset'. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
|
53e879a485f9def0e55c404dbc7187470a01602d |
|
18-Sep-2013 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: remove outdated frag information If directory fragments change, fill_inode() inserts new frags into the fragtree, but it does not remove outdated frags from the fragtree. This patch fixes it. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
|
ed284c49f61165c3ba1b4e6969d1cc30a769c31b |
|
02-Sep-2013 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: remove ceph_lookup_inode() commit 6f60f889 (ceph: fix freeing inode vs removing session caps race) introduced ceph_lookup_inode(). But there is already a ceph_find_inode() which provides similar function. So remove ceph_lookup_inode(), use ceph_find_inode() instead. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Alex Elder <alex.elder@linary.org> Reviewed-by: Sage Weil <sage@inktank.com>
|
99ccbd229cf7453206bc858e795ec1f0345ff258 |
|
21-Aug-2013 |
Milosz Tanski <milosz@adfin.com> |
ceph: use fscache as a local persistent cache Adding support for fscache to the Ceph filesystem. This would bring it on par with some of the other network filesystems in Linux (like NFS, AFS, etc...) In order to mount the filesystem with fscache the 'fsc' mount option must be passed. Signed-off-by: Milosz Tanski <milosz@adfin.com> Signed-off-by: Sage Weil <sage@inktank.com>
|
b0d7c2231015b331b942746610a05b6ea72977ab |
|
13-Aug-2013 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: introduce i_truncate_mutex I encountered the deadlock below when running fsstress:

vmtruncate work          truncate             MDS
---------------          ------------------   --------------------------
                         lock i_mutex
                                              <- truncate file
lock i_mutex (blocked)
                                              <- revoking Fcb (filelock to MIX)
                         send request ->
                                              handle request (xlock filelock)

Initially, there are some dirty pages in the page cache. When the kclient receives the truncate message, it reduces the inode size and creates some 'out of i_size' dirty pages. The vmtruncate work can't truncate these dirty pages because it's blocked by the i_mutex. Later, when the kclient receives the cap message that revokes Fcb caps, it can't flush all dirty pages because writepages() only flushes dirty pages within the inode size. When the MDS handles the 'truncate' request from the kclient, it waits for the filelock to become stable. But the filelock is stuck in an unstable state because it can't finish revoking the kclient's Fcb caps.

The truncate pagecache locking has already caused lots of trouble for us. I think it's time to simplify it by introducing a new mutex. We use the new mutex to prevent concurrent truncate_inode_pages(). There is no need to worry about a race between buffered write and truncate_inode_pages(), because our "get caps" mechanism prevents them from executing concurrently. Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
6f60f889470aecf747610279545c054a99aadca3 |
|
23-Jul-2013 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: fix freeing inode vs removing session caps race remove_session_caps() uses iterate_session_caps() to remove caps, but iterate_session_caps() skips inodes that are being deleted. So session->s_nr_caps can be non-zero after iterate_session_caps() return. We can fix the issue by waiting until deletions are complete. __wait_on_freeing_inode() is designed for the job, but it is not exported, so we use lookup inode function to access it. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
85ce127a9adf5ab9e9d57ddf64c858927d5e546d |
|
21-Jul-2013 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: wake up writer if vmtruncate work get blocked To write data, the writer first acquires the i_mutex, then try getting caps. The writer may sleep while holding the i_mutex. If the MDS revokes Fb cap in this case, vmtruncate work can't do its job because i_mutex is locked. We should wake up the writer and let it truncate the pages. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
|
84d08fa888e7c2d53b5bbc764db2ef02968b499c |
|
05-Jul-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
helper for reading ->d_count Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
b415bf4f9fe25f39934f5c464125e4a2dffb6d08 |
|
01-Jul-2013 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: fix pending vmtruncate race The locking order for pending vmtruncate is wrong; it can lead to the following race:

write                      vmtruncate work
------------------------   ----------------------
lock i_mutex
check i_truncate_pending   check i_truncate_pending
truncate_inode_pages()     lock i_mutex (blocked)
copy data to page cache
unlock i_mutex
                           truncate_inode_pages()

The fix is to take i_mutex before calling __ceph_do_pending_vmtruncate(). Fixes: http://tracker.ceph.com/issues/5453 Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
|
0b93267252ef5fe6c6d77e3013ed6a0d766352ad |
|
07-Apr-2013 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: fix symlink inode operations add getattr/setattr and xattrs related methods. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>
|
2f276c511137d97e56b19e29865e1e6569315ccb |
|
13-Mar-2013 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: use i_release_count to indicate dir's completeness Current ceph code tracks directory's completeness in two places. ceph_readdir() checks i_release_count to decide if it can set the I_COMPLETE flag in i_ceph_flags. All other places check the I_COMPLETE flag. This indirection introduces locking complexity. This patch adds a new variable i_complete_count to ceph_inode_info. Set i_release_count's value to it when marking a directory complete. By comparing the two variables, we know if a directory is complete Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
3f99969f42300e52779ae0656678c2534097f2ea |
|
01-Mar-2013 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: acquire i_mutex in __ceph_do_pending_vmtruncate make __ceph_do_pending_vmtruncate() acquire the i_mutex if the caller does not hold the i_mutex, so ceph_aio_read() can call safely. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>
|
a8673d61ad77ddf2118599507bd40cc345e95368 |
|
18-Feb-2013 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: use I_COMPLETE inode flag instead of D_COMPLETE flag Commit c6ffe10015 moved the flag that tracks whether the dcache contents for a directory are complete to the dentry. The problem is that there are lots of places that use ceph_dir_{set,clear,test}_complete() while holding i_ceph_lock, but ceph_dir_{set,clear,test}_complete() may sleep because they call dput(). This patch basically reverts that commit. ceph_d_prune() is called with both the dentry to prune and the parent dentry locked, so it's safe to access the parent dentry's d_inode and clear the I_COMPLETE flag. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
|
79f9f99ad1e3e19d4ac300573b51289e3ee8ba86 |
|
29-Jan-2013 |
Sage Weil <sage@inktank.com> |
ceph: prepopulate inodes only when request is aborted If r_aborted is true, we do not hold the dir i_mutex, and cannot touch the dcache. However, we still need to update the inodes with the state returned by the MDS. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
bd2bae6a66df9261a39e47291b0a6b00cd0831e0 |
|
31-Jan-2013 |
Eric W. Biederman <ebiederm@xmission.com> |
ceph: Convert kuids and kgids before printing them. Before printing kuid and kgids values convert them into the initial user namespace. Cc: Sage Weil <sage@inktank.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
ab871b903e9095772c219b512d9eae96c4663a5d |
|
31-Jan-2013 |
Eric W. Biederman <ebiederm@xmission.com> |
ceph: Translate inode uid and gid attributes to/from kuids and kgids.
- In fill_inode() translate uids and gids in the initial user namespace into kuids and kgids stored in inode->i_uid and inode->i_gid.
- In ceph_setattr() if they have changed convert inode->i_uid and inode->i_gid into initial user namespace uids and gids for transmission.
Cc: Sage Weil <sage@inktank.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
a85f50b6ef93fbbb2ae932ce9b2376509d172796 |
|
19-Nov-2012 |
Yan, Zheng <zheng.z.yan@intel.com> |
ceph: Fix __ceph_do_pending_vmtruncate we should set i_truncate_pending to 0 after page cache is truncated to i_truncate_size Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com>
|
2744c171dbaa0b1ec7639e7d0ff817fba9461a38 |
|
27-Sep-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
ceph: don't abuse d_delete() on failure exits Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
6c5e50fa614fea5325a2973be06f7ec6f1055316 |
|
22-Aug-2012 |
Sage Weil <sage@inktank.com> |
ceph: tolerate (and warn on) extraneous dentry from mds If the MDS gives us a dentry and we weren't prepared to handle it, WARN_ON_ONCE instead of crashing. Reported-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
|
810339ec2fae5cbd0164b8acde7fb65652755864 |
|
03-Feb-2012 |
Xi Wang <xi.wang@gmail.com> |
ceph: avoid panic with mismatched symlink sizes in fill_inode() Return -EINVAL rather than panic if iinfo->symlink_len and inode->i_size do not match. Also use kstrndup rather than kmalloc/memcpy. Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Alex Elder <elder@dreamhost.com>
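
A user-space sketch of the check this commit describes follows. The names `toy_fill_symlink` and `toy_strndup` are hypothetical stand-ins (the kernel code uses `kstrndup` inside `fill_inode()`); only the idea of returning `-EINVAL` on a length mismatch and making a bounded copy comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical portable stand-in for kstrndup(): copy at most max bytes
 * of s and always NUL-terminate the result. Returns NULL on allocation
 * failure. */
static char *toy_strndup(const char *s, size_t max)
{
    const char *end = memchr(s, '\0', max);
    size_t n = end ? (size_t)(end - s) : max;
    char *copy = malloc(n + 1);

    if (!copy)
        return NULL;
    memcpy(copy, s, n);
    copy[n] = '\0';
    return copy;
}

/* Reject a symlink whose reported length disagrees with the inode size
 * instead of aborting; on success store a bounded copy in *out. */
static int toy_fill_symlink(const char *target, size_t symlink_len,
                            long long i_size, char **out)
{
    char *copy;

    assert(target != NULL && out != NULL);
    if (i_size < 0 || (unsigned long long)i_size != symlink_len)
        return -EINVAL;

    copy = toy_strndup(target, symlink_len);
    if (!copy)
        return -ENOMEM;

    *out = copy;
    return 0;
}
```

Returning an error lets the caller fail a single operation on bad input from the server, where a BUG-style assertion would take down the whole client.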
|
b8cd952b51034ad9f20ca147507ee68dc641c98c |
|
13-Dec-2011 |
Yehuda Sadeh <yehuda@hq.newdream.net> |
ceph: dereference pointer after checking for NULL moved dereference after BUG_ON Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
|
6b520e0565422966cdf1c3759bd73df77b0f248c |
|
12-Dec-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
vfs: fix the stupidity with i_dentry in inode destructors Seeing that just about every destructor got that INIT_LIST_HEAD() copied into it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once(); the cost of taking it into inode_init_always() will be negligible for pipes and sockets and negative for everything else. Not to mention the removal of boilerplate code from ->destroy_inode() instances... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
be655596b3de5873f994ddbe205751a5ffb4de39 |
|
30-Nov-2011 |
Sage Weil <sage@newdream.net> |
ceph: use i_ceph_lock instead of i_lock We have been using i_lock to protect all kinds of data structures in the ceph_inode_info struct, including lists of inodes that we need to iterate over while avoiding races with inode destruction. That requires grabbing a reference to the inode with the list lock protected, but igrab() now takes i_lock to check the inode flags. Changing the list lock ordering would be a painful process. However, using a ceph-specific i_ceph_lock in the ceph inode instead of i_lock is a simple mechanical change and avoids the ordering constraints imposed by igrab(). Reported-by: Amon Ott <a.ott@m-privacy.de> Signed-off-by: Sage Weil <sage@newdream.net>
|
15a2015fbc692e1c97d7ce12d96e077f5ae7ea6d |
|
06-Nov-2011 |
Sage Weil <sage@newdream.net> |
ceph: fix iput race when queueing inode work If we queue a work item that calls iput(), make sure we ihold() before attempting to queue work. Otherwise our queued work might miraculously run before we notice the queue_work() succeeded and call ihold(), allowing the inode to be destroyed. That is, instead of

	if (queue_work(...))
		ihold();

we need to do

	ihold();
	if (!queue_work(...))
		iput();

Reported-by: Amon Ott <a.ott@m-privacy.de> Signed-off-by: Sage Weil <sage@newdream.net>
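
The ordering above can be modeled in user space. This is a sketch under stated assumptions: `struct toy_inode` and all `toy_*` functions are hypothetical, and `toy_queue_work()` deliberately models the worst case in which the work item runs to completion before the queueing call returns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode: a reference count plus a
 * "destroyed" marker set when the last reference is dropped. */
struct toy_inode {
    int refcount;
    bool destroyed;
};

static void toy_ihold(struct toy_inode *in)
{
    assert(!in->destroyed);   /* taking a ref on a dead inode is the bug */
    in->refcount++;
}

static void toy_iput(struct toy_inode *in)
{
    assert(in->refcount > 0);
    if (--in->refcount == 0)
        in->destroyed = true;
}

/* The work item drops the reference that was taken on its behalf. */
static void toy_work_fn(struct toy_inode *in)
{
    toy_iput(in);
}

/* Worst-case model of queue_work(): returns false if the work was
 * already queued, otherwise runs the work immediately and returns true. */
static bool toy_queue_work(struct toy_inode *in, bool already_queued)
{
    if (already_queued)
        return false;
    toy_work_fn(in);
    return true;
}

/* Safe ordering: take the reference first, give it back only if the
 * work was not queued. */
static void toy_queue_inode_work(struct toy_inode *in, bool already_queued)
{
    toy_ihold(in);
    if (!toy_queue_work(in, already_queued))
        toy_iput(in);
}
```

With the reference taken first, the count seen by the work item already includes its own reference, so an early run cannot drop the count to zero on the caller's behalf.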
|
c6ffe10015f4e6fba8a915318b319c43aed1836f |
|
03-Nov-2011 |
Sage Weil <sage@newdream.net> |
ceph: use new D_COMPLETE dentry flag We used to use a flag on the directory inode to track whether the dcache contents for a directory were a complete cached copy. Switch to a dentry flag CEPH_D_COMPLETE that is safely updated by ->d_prune(). Signed-off-by: Sage Weil <sage@newdream.net>
|
bfe8684869601dacfcb2cd69ef8cfd9045f62170 |
|
28-Oct-2011 |
Miklos Szeredi <mszeredi@suse.cz> |
filesystems: add set_nlink() Replace remaining direct i_nlink updates with a new set_nlink() updater function. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
83eaea22bdfc9e1cec88f81be5b64f30f6c37e8b |
|
24-Aug-2011 |
Sage Weil <sage@newdream.net> |
Revert "ceph: don't truncate dirty pages in invalidate work thread" This reverts commit c9af9fb68e01eb2c2165e1bc45cfeeed510c64e6. We need to block and truncate all pages in order to reliably invalidate them. Otherwise, we could:
- have some uptodate pages in the cache
- queue an invalidate
- write(2) locks some pages
- invalidate_work skips them
- write(2) only overwrites part of the page
- page now dirty and uptodate -> partial leakage of invalidated data
It's not entirely clear why we started skipping locked pages in the first place. I just ran this through fsx and didn't see any problems. Signed-off-by: Sage Weil <sage@newdream.net>
|
4f1772645296a230e04f5c53e79cfb6f841ce634 |
|
26-Jul-2011 |
Sage Weil <sage@newdream.net> |
ceph: document locking for ceph_set_dentry_offset Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
|
5f21c96dd5c615341963036ae8f5e4f5227a818d |
|
26-Jul-2011 |
Sage Weil <sage@newdream.net> |
ceph: protect access to d_parent d_parent is protected by d_lock: use it when looking up a dentry's parent directory inode. Also take a reference and drop it in the caller to avoid a use-after-free. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
|
dfabbed6fdd509dc2beb89c954bc36014a1bc7cb |
|
26-Jul-2011 |
Sage Weil <sage@newdream.net> |
ceph: set dir complete frag after adding capability Currently ceph_add_cap clears the complete bit if we are newly issued the FILE_SHARED cap, which is normally the case for a newly issued cap on a new directory. That means we clear the just-set bit. Move the check that sets the flag to after the cap is added/updated. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
|
2f90b852e3ae73889d7f6de6ecf429b9b6a6b103 |
|
26-Jul-2011 |
Sage Weil <sage@newdream.net> |
ceph: ignore lease mask The lease mask is no longer used (and it changed a while back). Instead, use a non-zero duration to indicate that there is a lease being issued. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
|
10556cb21a0d0b24d95f00ea6df16f599a3345b2 |
|
21-Jun-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
->permission() sanitizing: don't pass flags to ->permission() not used by the instances anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
2830ba7f34ebb27c4e5b8b6ef408cd6d74860890 |
|
21-Jun-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
->permission() sanitizing: don't pass flags to generic_permission() redundant; all callers get it duplicated in mask & MAY_NOT_BLOCK and none of them removes that bit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
178ea73521d64ba41d7aa5488fb9f549c6d4507d |
|
20-Jun-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
kill check_acl callback of generic_permission() its value depends only on inode and does not change; we might as well store it in ->i_op->check_acl and be done with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
70b666c3b4cb2b96098d80e6f515e4bc6d37db5a |
|
27-May-2011 |
Sage Weil <sage@newdream.net> |
ceph: use ihold when we already have an inode ref We should use ihold whenever we already have a stable inode ref, even when we aren't holding i_lock. This avoids adding new and unnecessary locking dependencies. Signed-off-by: Sage Weil <sage@newdream.net>
|
d3d0720d4a7a46e93e055e5b0f1a8bd612743ed6 |
|
11-May-2011 |
Henry C Chang <henry.cy.chang@gmail.com> |
ceph: do not use i_wrbuffer_ref as refcount for Fb cap We increment i_wrbuffer_ref when taking the Fb cap. This breaks the dirty page accounting and causes looping in __ceph_do_pending_vmtruncate, and the ceph client hangs. This bug can be reproduced occasionally by running blogbench. Add a new field i_wb_ref to the inode and dedicate it to Fb reference counting. Signed-off-by: Henry C Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
|
fca65b4ad72d28cbb43a029114d04b89f06faadb |
|
04-May-2011 |
Sage Weil <sage@newdream.net> |
ceph: do not call __mark_dirty_inode under i_lock The __mark_dirty_inode helper now takes i_lock as of 250df6ed. Fix the one ceph caller that held i_lock (__ceph_mark_dirty_caps) to return the flags value so that the callers can do it outside of i_lock. Signed-off-by: Sage Weil <sage@newdream.net>
|
ad1fee96cbaf873520064252c5dc3212c9844861 |
|
22-Jan-2011 |
Yehuda Sadeh <yehuda@hq.newdream.net> |
ceph: add ino32 mount option The ino32 mount option forces the ceph fs to report 32 bit ino values. This is useful for 64 bit kernels with 32 bit userspace. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
|
09adc80c611bb8902daa8ccfe34dbbc009d6befe |
|
05-Feb-2011 |
Sage Weil <sage@newdream.net> |
ceph: preserve I_COMPLETE across rename d_move puts the renamed dentry at the end of d_subdirs, screwing with our cached dentry directory offsets. We were just clearing I_COMPLETE to avoid any possibility of trouble. However, assigning the renamed dentry an offset at the end of the directory (to match its new d_subdirs position) is sufficient to maintain correct behavior and hold onto I_COMPLETE. This is especially important for workloads like rsync, which renames files into place. Before, we would lose I_COMPLETE and do MDS lookups for each file. With this patch we only talk to the MDS on create and rename. Signed-off-by: Sage Weil <sage@newdream.net>
|
b545cc1505eb49247071ce9f4092665de788ca00 |
|
28-Feb-2011 |
Sage Weil <sage@newdream.net> |
ceph: do not set I_COMPLETE Do not set the I_COMPLETE flag on directories until we resolve races with dcache pruning. Signed-off-by: Sage Weil <sage@newdream.net>
|
1c1266bb916e6a6b362d3be95f2cc7f3c41277a6 |
|
13-Jan-2011 |
Yehuda Sadeh <yehuda@hq.newdream.net> |
ceph: fix getattr on directory when using norbytes The norbytes mount option was broken: when doing getattr on a directory, it returned the rbytes instead of the number of entities. This commit fixes it. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
|
14303d20f3ae3e6ab626c77a4aac202b3bafd377 |
|
15-Dec-2010 |
Sage Weil <sage@newdream.net> |
ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS This implements the DIRLAYOUTHASH protocol feature, which passes the dir layout over the wire from the MDS. This gives the client knowledge of the correct hash function to use for mapping dentries among dir fragments. Note that if this feature is _not_ present on the client but is on the MDS, the client may misdirect requests. This will result in a forward and degrade performance. It may also result in inaccurate NFS filehandle generation, which will prevent fh resolution when the inode is not present in the client cache and the parent directories have been fragmented. Signed-off-by: Sage Weil <sage@newdream.net>
|
6c0f3af72cb1622a66962a1180c36ef8c41be8e2 |
|
16-Nov-2010 |
Sage Weil <sage@newdream.net> |
ceph: add dir_layout to inode Add a ceph_dir_layout to the inode, and calculate dentry hash values based on the parent directory's specified dir_hash function. This is needed because the old default Linux dcache hash function is extremely weak and leads to a poor distribution of files among dir fragments. Signed-off-by: Sage Weil <sage@newdream.net>
|
b74c79e99389cd79b31fcc08f82c24e492e63c7e |
|
07-Jan-2011 |
Nick Piggin <npiggin@kernel.dk> |
fs: provide rcu-walk aware permission i_ops Signed-off-by: Nick Piggin <npiggin@kernel.dk>
|
fa0d7e3de6d6fc5004ad9dea0dd6b286af8f03e9 |
|
07-Jan-2011 |
Nick Piggin <npiggin@kernel.dk> |
fs: icache RCU free inodes RCU free the struct inode. This will allow:
- Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping.
The downside of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (ie. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
|
b5c84bf6f6fa3a7dfdcb556023a62953574b60ee |
|
07-Jan-2011 |
Nick Piggin <npiggin@kernel.dk> |
fs: dcache remove dcache_lock dcache_lock no longer protects anything. remove it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
|
2fd6b7f50797f2e993eea59e0a0b8c6399c811dc |
|
07-Jan-2011 |
Nick Piggin <npiggin@kernel.dk> |
fs: dcache scale subdirs Protect d_subdirs and d_child with d_lock, except in filesystems that aren't using dcache_lock for these anyway (eg. using i_mutex). Note: if we change the locking rule in future so that ->d_child protection is provided only with ->d_parent->d_lock, it may allow us to reduce some locking. But it would be an exception to an otherwise regular locking scheme, so we'd have to see some good results. Probably not worthwhile. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
|
b7ab39f631f505edc2bbdb86620d5493f995c9da |
|
07-Jan-2011 |
Nick Piggin <npiggin@kernel.dk> |
fs: dcache scale dentry refcount Make d_count non-atomic and protect it with d_lock. This allows us to ensure a 0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when we start protecting many other dentry members with d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
|
451a3c24b0135bce54542009b5fde43846c7cf67 |
|
17-Nov-2010 |
Arnd Bergmann <arnd@arndb.de> |
BKL: remove extraneous #include <smp_lock.h> The big kernel lock has been removed from all these files at some point, leaving only the #include. Remove this too as a cleanup. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
b7495fc2ff941db6a118a93ab8d61149e3f4cef8 |
|
09-Nov-2010 |
Sage Weil <sage@newdream.net> |
ceph: make page alignment explicit in osd interface We used to infer alignment of IOs within a page based on the file offset, which assumed they matched. This broke with direct IO that was not aligned to pages (e.g., 512-byte aligned IO). We were also trusting the alignment specified in the OSD reply, which could have been adjusted by the server. Explicitly specify the page alignment when setting up OSD IO requests. Signed-off-by: Sage Weil <sage@newdream.net>
|
d8672d64b88cdb7aa8139fb6d218f40b8cbf60af |
|
08-Nov-2010 |
Sage Weil <sage@newdream.net> |
ceph: fix update of ctime from MDS The client can have a newer ctime than the MDS due to AUTH_EXCL and XATTR_EXCL caps as well; update the check in ceph_fill_file_time appropriately. This fixes cases where ctime/mtime goes backward under the right sequence of local updates (e.g. chmod) and mds replies (e.g. subsequent stat that goes to the MDS). Signed-off-by: Sage Weil <sage@newdream.net>
|
8bd59e0188c04f6540f00e13f633f22e4804ce06 |
|
08-Nov-2010 |
Sage Weil <sage@newdream.net> |
ceph: fix version check on racing inode updates We may get updates on the same inode from multiple MDSs; generally we only pay attention if the update is newer than what we already have. The exception is when an MDS sends unstable information, in which case we always update. The old > check got this wrong when our version was odd (e.g. 3) and the reply version was even (e.g. 2): the older stale (v2) info would be applied. Fixed and clarified the comment. Signed-off-by: Sage Weil <sage@newdream.net>
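The version rule described above can be sketched in plain C. This is an illustration only, not the code from the commit: the function names are hypothetical, and it assumes that odd versions mark unstable (projected) info while even versions mark stable info, as the examples in the message suggest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when an incoming update should be
 * skipped because we already hold info that is at least as new.
 * Assumption: odd version = unstable info, even version = stable info.
 */
bool skip_inode_update(uint64_t ours, uint64_t theirs)
{
	/* Round our version down to the last stable (even) value, then compare. */
	return theirs > 0 && (ours & ~UINT64_C(1)) >= theirs;
}

/* A few cases, including the odd/even case named in the commit message. */
void skip_inode_update_examples(void)
{
	assert(skip_inode_update(2, 2));   /* same stable version: skip */
	assert(skip_inode_update(3, 2));   /* ours odd, reply older and even: skip */
	assert(!skip_inode_update(3, 3));  /* unstable info at our version: update */
	assert(!skip_inode_update(2, 3));  /* reply is newer: update */
}
```

Masking off the low bit is what lets an unstable reply at our own version still be applied while a stale even-numbered reply is ignored.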
|
cd045cb42a266882ac24bc21a3a8d03683c72954 |
|
04-Nov-2010 |
Sage Weil <sage@newdream.net> |
ceph: fix rdcache_gen usage and invalidate We used to use rdcache_gen to indicate whether we "might" have cached pages. Now we just look at the mapping to determine that. However, some old behavior remains from that transition. First, rdcache_gen == 0 no longer means we have no pages. That can happen at any time (presumably when we carry FILE_CACHE). We should not reset it to zero, and we should not check that it is zero. That means that the only purpose for rdcache_revoking is to resolve races between new issues of FILE_CACHE and an async invalidate. If they are equal, we should invalidate. On success, we decrement rdcache_revoking, so that it is no longer equal to rdcache_gen. Similarly, if we succeed in doing a sync invalidate, set revoking = gen - 1. (This is a small optimization to avoid doing unnecessary invalidate work and does not affect correctness.) Signed-off-by: Sage Weil <sage@newdream.net>
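The gen/revoking handshake described above can be modeled with two counters. The sketch below is a toy userspace model under stated assumptions, not the kernel implementation: the struct and all function names are hypothetical, locking is omitted, and the step that actually drops cached pages is left as a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two per-inode counters named in the message. */
struct rdcache_state {
	unsigned int gen;       /* bumped on each new issue of FILE_CACHE */
	unsigned int revoking;  /* generation an async invalidate was queued for */
};

/* A new FILE_CACHE issue starts a new generation. */
void issue_file_cache(struct rdcache_state *s)
{
	s->gen++;
}

/* Queueing an async invalidate records the generation it targets. */
void queue_async_invalidate(struct rdcache_state *s)
{
	s->revoking = s->gen;
}

/* Returns true if the invalidate ran, false if a newer issue raced with it. */
bool run_async_invalidate(struct rdcache_state *s)
{
	if (s->revoking != s->gen)
		return false;   /* cache was re-issued since queueing: do nothing */
	/* real code would drop the cached pages here */
	s->revoking--;          /* no longer equal to gen, so we do not repeat */
	return true;
}

/* A successful sync invalidate leaves the counters unequal as well. */
void finish_sync_invalidate(struct rdcache_state *s)
{
	s->revoking = s->gen - 1;
}
```

In this model neither counter is ever reset to zero or compared against zero; only equality between the two decides whether invalidate work is still pending.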
|
912a9b0319a8eb9e0834b19a25e01013ab2d6a9f |
|
07-Nov-2010 |
Sage Weil <sage@newdream.net> |
ceph: only let auth caps update max_size Only the auth MDS has a meaningful max_size value for us, so only update it in fill_inode if we're being issued an auth cap. Otherwise, a random stat result from a non-auth MDS can clobber a meaningful max_size, get the client<->mds cap state out of sync, and make writes hang. Specifically, even if the client re-requests a larger max_size (which it will), the MDS won't respond because as far as it knows we already have a sufficiently large value. Signed-off-by: Sage Weil <sage@newdream.net>
|
d8b16b3d1c9d8d9124d647d05797383d35e2d645 |
|
06-Nov-2010 |
Sage Weil <sage@newdream.net> |
ceph: fix bad pointer dereference in ceph_fill_trace We dereference *in a few lines down, but only set it on rename. It is apparently pretty rare for this to trigger, but I have been hitting it with clustered MDSs. Signed-off-by: Sage Weil <sage@newdream.net>
|
3d14c5d2b6e15c21d8e5467dc62d33127c23a644 |
|
07-Apr-2010 |
Yehuda Sadeh <yehuda@hq.newdream.net> |
ceph: factor out libceph from Ceph file system This factors out protocol and low-level storage parts of ceph into a separate libceph module living in net/ceph and include/linux/ceph. This is mostly a matter of moving files around. However, a few key pieces of the interface change as well: - ceph_client becomes ceph_fs_client and ceph_client, where the latter captures the mon and osd clients, and the fs_client gets the mds client and file system specific pieces. - Mount option parsing and debugfs setup is correspondingly broken into two pieces. - The mon client gets a generic handler callback for otherwise unknown messages (mds map, in this case). - The basic supported/required feature bits can be expanded (and are by ceph_fs_client). No functional change, aside from some subtle error handling cases that got cleaned up in the refactoring process. Signed-off-by: Sage Weil <sage@newdream.net>
|
467c525109d5d542d7d416b0c11bdd54610fe2f4 |
|
13-Sep-2010 |
Sage Weil <sage@newdream.net> |
ceph: fix dn offset during readdir_prepopulate When adding the readdir results to the cache, ceph_set_dentry_offset was clobbering our just-set offset. This can cause the readdir result offsets to get out of sync with the server. Add an argument to the helper so that it does not. This bug was introduced by 1cd3935bedccf592d44343890251452a6dd74fc4. Signed-off-by: Sage Weil <sage@newdream.net>
|
ac1f12ef569d49b013c3db86e11be7e15d66b1c3 |
|
25-Aug-2010 |
Dan Carpenter <error27@gmail.com> |
ceph: ceph_get_inode() returns an ERR_PTR ceph_get_inode() returns an ERR_PTR and it doesn't return a NULL. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
|
124514918b030d74f1f3e15483b7bf3b85268082 |
|
23-Aug-2010 |
Sage Weil <sage@newdream.net> |
ceph: don't improperly set dir complete when holding EXCL cap If we hold the EXCL cap, we cannot trust the dir stats from the MDS (num files, subdirs) and must not incorrectly conclude that the directory is empty. If we do, we can get bad results from lookup (bad ENOENT) and bad readdir results. Signed-off-by: Sage Weil <sage@newdream.net>
|
2962507ca204f886967e1a089d9bec206d427c22 |
|
27-May-2010 |
Sage Weil <sage@newdream.net> |
ceph: perform lazy reads when file mode and caps permit If the file mode is marked as "lazy," perform cached/buffered reads when the caps permit it. Adjust the rdcache_gen and invalidation logic accordingly so that we manage our cache based on the FILE_CACHE -or- FILE_LAZYIO cap bits. Signed-off-by: Sage Weil <sage@newdream.net>
|
03066f23452ff088ad8e2c8acdf4443043f35b51 |
|
27-Jul-2010 |
Yehuda Sadeh <yehuda@hq.newdream.net> |
ceph: use complete_all and wake_up_all This fixes an issue triggered by running concurrent syncs. One of the syncs would go through while the other would just hang indefinitely. In any case, we never actually want to wake a single waiter, so the *_all functions should be used. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
|
8c696737aa61316a252c4514d09dd163f1464d33 |
|
22-Jul-2010 |
Sage Weil <sage@newdream.net> |
ceph: fix leak of dentry in ceph_init_dentry() error path If we fail to allocate a ceph_dentry_info, don't leak the dn reference. Signed-off-by: Sage Weil <sage@newdream.net>
|
d69ed05a80f23b25f06e73af9b7e701ce4900edc |
|
21-Jun-2010 |
Sage Weil <sage@newdream.net> |
ceph: handle splice_dentry/d_materialize_unique error in readdir_prepopulate Handle a splice_dentry failure (due to a d_materialize_unique error) without crashing. (Also, report the error code.) Signed-off-by: Sage Weil <sage@newdream.net>
|
13a4214cd9ec14d7b77e98bd3ee51f60f868a6e5 |
|
01-Jun-2010 |
Henry C Chang <henry_c_chang@tcloudcomputing.com> |
ceph: fix d_subdirs ordering problem We misused list_move_tail() to order the dentry in d_subdirs. This will screw up the d_subdirs order. This bug can be reliably reproduced by: 1. mount ceph fs. 2. on ceph fs, git clone git://ceph.newdream.net/git/ceph.git 3. Run autogen.sh in ceph directory. (Note: Errors only occur at the first time you run autogen.sh.) Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
|
7e34bc524ecae3a04d8cc427ee76ddad826a937b |
|
22-May-2010 |
Julia Lawall <julia@diku.dk> |
fs/ceph: Use ERR_CAST Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes more clear what is the purpose of the operation, which otherwise looks like a no-op. In the case of fs/ceph/inode.c, ERR_CAST is not needed, because the type of the returned value is the same as the type of the enclosing function. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ type T; T x; identifier f; @@ T f (...) { <+... - ERR_PTR(PTR_ERR(x)) + x ...+> } @@ expression x; @@ - ERR_PTR(PTR_ERR(x)) + ERR_CAST(x) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Sage Weil <sage@newdream.net>
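To show what the ERR_CAST substitution expresses, here is a small userspace sketch. The helpers are simplified stand-ins written for this example, modeled on the kernel's `<linux/err.h>` and assuming `long` and pointers have the same width; the struct types and the two `*_demo` functions are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Simplified userspace stand-ins for the kernel error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address space. */
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}
static inline void *ERR_CAST(const void *ptr) { return (void *)ptr; }

/* Hypothetical placeholder types for the example. */
struct inode { int unused; };
struct dentry { int unused; };

/* Hypothetical allocator that always fails with -ENOMEM. */
struct inode *get_inode_demo(void)
{
	return ERR_PTR(-ENOMEM);
}

/* Returns a different pointer type, so the error is forwarded with ERR_CAST. */
struct dentry *lookup_demo(void)
{
	struct inode *in = get_inode_demo();

	if (IS_ERR(in))
		return ERR_CAST(in);  /* instead of ERR_PTR(PTR_ERR(in)) */
	return NULL;
}
```

When the enclosing function already returns the same pointer type as `x`, the first rule of the semantic patch applies instead and `x` is returned directly.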
|
167c9e352deb7e25568c926c49c3eafad69cbe76 |
|
14-May-2010 |
Sage Weil <sage@newdream.net> |
ceph: use common helper for aborted dir request invalidation We invalidate I_COMPLETE and dentry leases in two places: on aborted mds request and on request replay. Use common helper to avoid duplicate code. Signed-off-by: Sage Weil <sage@newdream.net>
|
1cd3935bedccf592d44343890251452a6dd74fc4 |
|
04-May-2010 |
Sage Weil <sage@newdream.net> |
ceph: set dn offset when spliced We want to assign an offset when the dentry goes from null to linked, which is always done by splice_dentry(). Notably, we should NOT assign an offset when a dentry is first created and is still null. BUG if we try to splice a non-null dentry (we shouldn't). Signed-off-by: Sage Weil <sage@newdream.net>
|
1b7facc41b42c2ab904b2f88b64b1f8ca0ca6cb7 |
|
16-Apr-2010 |
Sage Weil <sage@newdream.net> |
ceph: don't clobber i_max_offset on already complete dir This can screw up offsets assigned to new dentries and break dcache readdir results. Signed-off-by: Sage Weil <sage@newdream.net>
|
e8a7498715181ece36130335536e13733a5c3187 |
|
15-Apr-2010 |
Sage Weil <sage@newdream.net> |
ceph: skip set_dentry_offset work if directory not I_COMPLETE Signed-off-by: Sage Weil <sage@newdream.net>
|
a6424e48c8d54a5795430b07c4487f1ed280df4e |
|
29-Apr-2010 |
Sage Weil <sage@newdream.net> |
ceph: fix xattr dangling pointer / double free If we use the xattr_blob, clear the pointer so we don't release the memory at the bottom of the function. Reported-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
|
640ef79d27c81b7a3265a344ec1d25644dd463ad |
|
26-Mar-2010 |
Cheng Renquan <crquan@gmail.com> |
ceph: use ceph_sb_to_client instead of ceph_client ceph_sb_to_client and ceph_client are identical, so we should drop one. The function name ceph_client is easily confused with "struct ceph_client", while ceph_sb_to_client is clearer, so switch all callers to ceph_sb_to_client. -static inline struct ceph_client *ceph_client(struct super_block *sb) -{ - return sb->s_fs_info; -} Signed-off-by: Cheng Renquan <crquan@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
|
81a6cf2d30eac5d790f53cdff110892f7b18c7fe |
|
14-May-2010 |
Sage Weil <sage@newdream.net> |
ceph: invalidate affected dentry leases on aborted requests If we abort a request, we return to caller, but the request may still complete. And if we hold the dir FILE_EXCL bit, we may not release a lease when sending a request. A simple un-tar, control-c, un-tar again will reproduce the bug (manifested as a 'Cannot open: File exists'). Ensure we invalidate affected dentry leases (as well as dir I_COMPLETE) so we don't have valid (but incorrect) leases. Do the same, consistently, at other sites where I_COMPLETE is similarly cleared. Signed-off-by: Sage Weil <sage@newdream.net>
|
04d000eb358919043da538f197d63f2a5924a525 |
|
07-May-2010 |
Sage Weil <sage@newdream.net> |
ceph: fix open file counting on snapped inodes when mds returns no caps It's possible the MDS will not issue caps on a snapped inode, in which case an open request may not __ceph_get_fmode(), botching the open file counting. (This is actually a server bug, but the client shouldn't BUG out in this case.) Signed-off-by: Sage Weil <sage@newdream.net>
|
c10f5e12bafde7f7a2f9b75d76f7a68d62154e91 |
|
16-Apr-2010 |
Sage Weil <sage@newdream.net> |
ceph: clear dir complete on d_move d_move() reorders the d_subdirs list, breaking the readdir result caching. Unless/until d_move preserves that ordering, clear CEPH_I_COMPLETE on rename. Signed-off-by: Sage Weil <sage@newdream.net>
|
9358c6d4c0264b1572554c49c4b92673ea9a5c72 |
|
30-Mar-2010 |
Sage Weil <sage@newdream.net> |
ceph: fix dentry rehashing on virtual .snap dir If a lookup fails on the magic .snap directory, we bind it to a magic snap directory inode in ceph_lookup_finish(). That code assumes the dentry is unhashed, but a recent server-side change started returning NULL leases on lookup failure, causing the .snap dentry to be hashed and NULL by ceph_fill_trace(). This causes dentry hash chain corruption, or a crash when d_rehash() includes BUG_ON(!d_unhashed(entry)); So, avoid processing the NULL dentry lease if the dentry matches the snapdir name in ceph_fill_trace(). That allows the lookup completion to properly bind it to the snapdir inode. BUG there if dentry is hashed to be sure. Signed-off-by: Sage Weil <sage@newdream.net>
|
8b218b8a4a65bf4e304ae8690cadb9100ef029c0 |
|
09-Mar-2010 |
Sage Weil <sage@newdream.net> |
ceph: fix inode removal from snap realm when racing with migration When an inode was dropped while being migrated between two MDSs, i_cap_exporting_issued was non-zero such that issued caps were non-zero and __ceph_is_any_caps(ci) was true. This prevented the inode from being removed from the snap realm, even as it was dropped from the cache. Fix this by dropping any residual i_snap_realm ref in destroy_inode. Signed-off-by: Sage Weil <sage@newdream.net>
|
c9af9fb68e01eb2c2165e1bc45cfeeed510c64e6 |
|
19-Feb-2010 |
Yehuda Sadeh <yehuda@hq.newdream.net> |
ceph: don't truncate dirty pages in invalidate work thread Instead of truncating the whole range of pages, we skip those pages that are dirty or in the middle of writeback. Those pages will be cleared later when the writeback completes. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
|
2c27c9a57c93a0757b9b4b0e7dc1abeaf1db1ce2 |
|
18-Feb-2010 |
Sage Weil <sage@newdream.net> |
ceph: fix typo in ceph_queue_writeback debug output Signed-off-by: Sage Weil <sage@newdream.net>
|
3c6f6b79a64db7f1c7abf09d693db3b0066784fb |
|
10-Feb-2010 |
Sage Weil <sage@newdream.net> |
ceph: cleanup async writeback, truncation, invalidate helpers Grab inode ref in helper. Make work functions static, with consistent naming. Signed-off-by: Sage Weil <sage@newdream.net>
|
3d497d858ae6e5f23a28783030aecc69074e102d |
|
09-Feb-2010 |
Yehuda Sadeh <yehuda@hq.newdream.net> |
ceph: fix truncation when not holding caps A truncation should occur when either we have the specified caps for the file, or (in cases where we are not the only ones referencing the file) when it is mapped or when it is opened. The latter two cases were not handled. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
|
0f26c4b21b684825a6dd41f2bc04d48ff62d72f8 |
|
29-Jan-2010 |
Yehuda Sadeh <yehuda@hq.newdream.net> |
ceph: remove unreachable code We never truncate to a smaller size without contacting the MDS. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
|
5b1daecd59f95eb24dc629407ed80369c9929520 |
|
25-Jan-2010 |
Sage Weil <sage@newdream.net> |
ceph: properly handle aborted mds requests Previously, if the MDS request was interrupted, we would unregister the request and ignore any reply. This could cause the caps or other cache state to become out of sync. (For instance, aborting dbench and doing rm -r on clients would complain about a non-empty directory because the client didn't realize its aborted file create request completed.) Even if we don't unregister, we still can't process the reply normally because we are no longer holding the caller's locks (like the dir i_mutex). So, mark aborted operations with r_aborted, and in the reply handler, be sure to process all the caps. Do not process the namespace changes, though, since we no longer will hold the dir i_mutex. The dentry lease state can also be ignored as it's more forgiving. Signed-off-by: Sage Weil <sage@newdream.net>
|
4baa75ef0ed29adae03fcbbaa9aca1511a5a8cc9 |
|
08-Jan-2010 |
Yehuda Sadeh <yehuda@hq.newdream.net> |
ceph: change dentry offset and position after splice_dentry This fixes a bug, where we had the parent list have dentries with offsets that are not monotonically increasing, which caused the ceph dcache_readdir to skip entries. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
|
c4a29f26d50bea65809ca670992108a33aa2efa6 |
|
21-Dec-2009 |
Sage Weil <sage@newdream.net> |
ceph: ensure rename target dentry fails revalidation This works around a bug in vfs_rename_dir() that rehashes the target dentry. Ensure such dentries always fail revalidation by timing out the dentry lease and kicking it out of the current directory lease gen. This can be reverted when the vfs bug is fixed. Signed-off-by: Sage Weil <sage@newdream.net>
|
b6c1d5b81ea0841ae9d3ce2cda319ab986b081cf |
|
07-Dec-2009 |
Sage Weil <sage@newdream.net> |
ceph: simplify ceph_buffer interface We never allocate the ceph_buffer and buffer separately, so use a single constructor. Disallow put on NULL buffer; make the caller check. Signed-off-by: Sage Weil <sage@newdream.net>
|
b377ff13b31778c19203f3089d14080beb40a692 |
|
12-Nov-2009 |
Sage Weil <sage@newdream.net> |
ceph: initialize i_size/i_rbytes on snapdir Signed-off-by: Sage Weil <sage@newdream.net>
|
232d4b01319767b3ffa5d08962a81c805962be49 |
|
21-Oct-2009 |
Sage Weil <sage@newdream.net> |
ceph: move directory size logic to ceph_getattr We can't fill i_size with rbytes at the fill_file_size stage without adding additional checks for directories. Notably, we want st_blocks to remain 0 on directories so that 'du' still works. Fill in i_blocks, i_size specially in ceph_getattr instead. Signed-off-by: Sage Weil <sage@newdream.net>
|
355da1eb7a1f91c276b991764e951bbcd8047599 |
|
06-Oct-2009 |
Sage Weil <sage@newdream.net> |
ceph: inode operations Inode cache and inode operations. We also include routines to incorporate metadata structures returned by the MDS into the client cache, and some helpers to deal with file capabilities and metadata leases. The bulk of that work is done by fill_inode() and fill_trace(). Signed-off-by: Sage Weil <sage@newdream.net>
|