Ext4 write stall issue

A write stall issue was reported by the MM folks, found during page reclaim testing over ext4. There is lock contention in JBD2 between journal commit and new transactions, which leaves blocking IOs waiting for locks. More precisely, it is caused by do_get_write_access() blocking at lock_buffer(). The problem is nothing new and should be visible in ext3 too, but it has become more visible with newer kernels.

Ted has proposed two fixes:

  1. avoid calling lock_buffer() during do_get_write_access()
  2. adjust jbd2 to manage buffer_head itself to reduce latency.

Fixing this in JBD2 would be a big effort. Proposal 1 sounds more reasonable to work with. The first action is to mark metadata updates with REQ_* flags to avoid the priority disorder, while also looking at the block IO layer to see if there is a way to move blocking IOs to a separate queue.
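
The separate-queue idea can be illustrated with a toy model. This is only a sketch of the scheduling intent, not how the block layer is implemented: the class name, the `blocking` argument and the request names are all hypothetical, and a real solution would work on flagged bio requests rather than strings.

```python
from collections import deque


class TwoQueueDispatcher:
    """Toy model: requests that a task is blocked on go to their own queue."""

    def __init__(self):
        self._blocking = deque()    # IOs some task is waiting on (hypothetical flag)
        self._background = deque()  # everything else, e.g. bulk writeback

    def submit(self, name, blocking=False):
        """Queue a request; 'blocking' stands in for a metadata/priority flag."""
        queue = self._blocking if blocking else self._background
        queue.append(name)

    def dispatch(self):
        """Return the next request, always draining the blocking queue first."""
        if self._blocking:
            return self._blocking.popleft()
        if self._background:
            return self._background.popleft()
        return None


dispatcher = TwoQueueDispatcher()
dispatcher.submit("data-1")
dispatcher.submit("data-2")
dispatcher.submit("journal-meta", blocking=True)  # submitted last

# Collect the dispatch order until both queues are empty.
order = []
while (request := dispatcher.dispatch()) is not None:
    order.append(request)
```

In this model the flagged metadata request is dispatched ahead of the bulk writes queued before it, which is the ordering the notes are after. Strictly preferring one queue can starve the other, so a real design would need some fairness bound.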

DIO lock contention issue

Another topic brought up is the Direct IO lock contention issue.

On the DIO read side, no lock is held already, but only for the pagesize == blocksize case. There is no fundamental reason why lock-free direct IO reads are not possible for blocksize < pagesize; it was agreed that this limit should be removed.

The dioread_nolock mount option introduced in newer kernels exists precisely to allow lock-free Direct IO reads, although the feature does not support data journaling.

On the Direct IO write side, there are two proposals for concurrent direct IO writes:

  1. One is based on an in-memory extent status tree, similar to what xfs does, which makes DIO writes to different ranges of a file possible;
  2. The other is a general VFS solution which locks the pages in the range during a direct IO write. This would benefit all filesystems, but has the challenge of sorting out the ordering of multiple locks.
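
Both proposals come down to serializing only the writers whose ranges overlap. A minimal sketch of that rule follows, assuming half-open byte ranges and a simple list of held ranges; `RangeLockTable` and its methods are hypothetical names for illustration, not the extent status tree or any VFS interface.

```python
class RangeLockTable:
    """Toy table of held half-open byte ranges [start, end) for one file."""

    def __init__(self):
        self._held = []  # (start, end) tuples currently locked

    def try_lock(self, start, end):
        """Lock [start, end) if no held range overlaps it; True on success."""
        if start >= end:
            raise ValueError("empty or inverted range")
        for held_start, held_end in self._held:
            # Two half-open ranges overlap when each starts before the other ends.
            if start < held_end and held_start < end:
                return False
        self._held.append((start, end))
        return True

    def unlock(self, start, end):
        """Release a range previously taken with try_lock."""
        self._held.remove((start, end))


locks = RangeLockTable()
first = locks.try_lock(0, 4096)       # first writer takes the first 4 KiB
second = locks.try_lock(4096, 8192)   # disjoint range, may proceed concurrently
third = locks.try_lock(2048, 6144)    # overlaps both held ranges, must wait

locks.unlock(0, 4096)
locks.unlock(4096, 8192)
retry = locks.try_lock(2048, 6144)    # succeeds once the overlapping ranges are released
```

The sketch ignores the lock-ordering problem noted for the VFS approach. A common way to avoid deadlock when several locks are needed is to always acquire them in ascending offset order, though whether that is enough here depends on how page locks interact with other locks.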

Jan Kara had an LSF session covering the VFS solution in more detail. This approach looks more promising.