Recently I had a conversation with Mark Fasheh about the DLM (Distributed Lock Manager) lock levels used in OCFS2 (Oracle Cluster File System v2). IMHO the talk is quite useful for anyone starting out with OCFS2 or the DLM, so I am posting the conversation here in the hope that it is informative. Thank you, Mark.
Mark gave a simplified explanation of the NL, PR and EX DLM lock levels used in OCFS2:
There are 3 lock levels Ocfs2 uses when protecting shared resources.
“NL” aka “No Lock”. This is used as a placeholder. Either we get it so that we
can convert the lock to something useful, or we already had some higher level
lock and dropped to NL so another node can continue. This lock level does not
block any other nodes from access to the resource.
“PR” aka “Protected Read”. This is used so that multiple nodes can read the
resource at the same time without any mutual exclusion. This level blocks only
those nodes which want to make changes to the resource (EX locks).
“EX” aka “Exclusive”. This is used to keep other nodes from reading or changing
a resource while it is being changed by the current node. This level blocks PR
locks and other EX locks.
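To make the three levels concrete, here is a minimal Python sketch of the compatibility rules described above. The names `COMPATIBLE` and `is_compatible` are hypothetical and chosen only for illustration; this is not code from OCFS2 or from any DLM.

```python
# The three lock levels named in the text.
NL, PR, EX = "NL", "PR", "EX"

# For each level held by one node, the levels another node may hold
# on the same resource at the same time (illustrative table only).
COMPATIBLE = {
    NL: {NL, PR, EX},  # NL blocks nobody
    PR: {NL, PR},      # PR blocks only EX
    EX: {NL},          # EX blocks PR and other EX locks
}

def is_compatible(held, requested):
    """Return True if `requested` can be granted while another node holds `held`."""
    return requested in COMPATIBLE[held]

# Print the full matrix, one row per held level.
for held in (NL, PR, EX):
    row = ", ".join(
        f"{req}={'ok' if is_compatible(held, req) else 'blocked'}"
        for req in (NL, PR, EX)
    )
    print(f"held {held}: {row}")
```

Reading a row of the output answers the question "if one node holds this level, which requests from other nodes have to wait?".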
When another node wants a level of access to a resource which the current node
is blocking due to its lock level, the current node “downconverts” the lock to
a compatible level. Sometimes we might have multiple nodes trying to gain
exclusive access to a resource at the same time (say two nodes want to go from
PR -> EX). When that happens, only one node can win and the others are sent
signals to ‘cancel’ their lock request and, if need be, ‘downconvert’ to a mode
which is compatible with what’s being requested. In the previous example, that
means one of the nodes would cancel its attempt to go from PR -> EX and
afterwards it would drop its PR to NL, since the PR lock blocks the other node
from an EX.
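The PR -> EX race in the previous paragraph can be sketched as follows. This is a toy model under stated assumptions: `highest_compatible_level` and `resolve_upgrade_race` are hypothetical helper names, the winner is passed in as a parameter (in a real cluster the DLM decides), and the cancel and downconvert steps are applied instantly instead of being asynchronous requests.

```python
# Numeric order so that a higher number means a stronger lock.
NL, PR, EX = 0, 1, 2
NAMES = {NL: "NL", PR: "PR", EX: "EX"}

def highest_compatible_level(blocking_level):
    """Strongest level a holder may keep while another node is granted `blocking_level`."""
    if blocking_level == EX:
        return NL   # nothing above NL can coexist with EX
    if blocking_level == PR:
        return PR   # readers can coexist with other readers
    return EX       # an NL request blocks on nothing

def resolve_upgrade_race(winner, held_levels):
    """Every node in `held_levels` wants EX; only `winner` gets it.

    Losers cancel their pending convert and then downconvert to a level
    compatible with the winner's EX.
    """
    result = {}
    for node, held in held_levels.items():
        if node == winner:
            result[node] = EX
        else:
            result[node] = min(held, highest_compatible_level(EX))
    return result

after = resolve_upgrade_race("node1", {"node1": PR, "node2": PR})
print({node: NAMES[level] for node, level in after.items()})
# prints {'node1': 'EX', 'node2': 'NL'}
```

The losing node ends at NL and not PR, which matches the text: a PR held by node2 would still block node1's EX.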
After reading the above text, I talked with Mark on IRC. Here is the conversation log, edited to remove the unnecessary parts:
coly: it’s excellent material for DLM lock levels of ocfs2!
mark: specially if that helps folks understand what’s happening in dlmglue.c
* mark knows that code can be…. hard to follow
mark: another thing you might want to take note of – this whole “cancel convert” business is there because the dlm allows a process to retain its current lock level while asking for an escalation
coly: one thing I am not clear is, what’s the functionality of dlmglue.c ? like the name, glue ?
mark: if you think about it – being forced to drop the lock and re-acquire would eliminate the possibility of deadlock, at the expense of performance
mark: think of dlmglue.c as the layer of code which abstracts away the dlm interface for the fs
mark: as part of that abstraction, file system lock management is wholly contained within dlmglue.c
coly: so dlmglue.c only acts as an abstraction layer ? and the real job is done by fsdlm or o2dlm ?
mark: dlmglue is never actually creating resources itself – it’s asking the dlm on behalf of the file system
mark: aside from code cleanliness, dlmglue provides a number of features the fs needs that the dlm (rightfully) does not provide
coly: which kind of ?
mark: lock caching for example – you’ll notice that we keep counts on the locks in dlmglue
mark: also, whatever fs specific actions might be needed as part of a lock transition are initiated from dlmglue. an example of that would be checkpointing inode changes before allowing other nodes access, etc
coly: yeah, that’s one more thing confusing me.
coly: the concepts of upconvert and downconvert are not clear to me yet
coly: especially when they are combined with ast and bast
mark: have you checked out the “dlmbook” pdf? it explains the dlm api (which once you understand, makes dlmglue a lot easier to figure out)
coly: yes, I read it. but because I didn’t know ast and bast before, I don’t have a clear idea of what happens in ast and bast
coly: is it something like the signal handler ?
mark: ast and bast though are just callbacks we pass to the dlm. one (ast) is used to tell fs that a request is complete, the other (bast) is used to tell fs that a lock is blocking progress from another node
coly: when an ast is triggered, what will happen ? the node received the ast can make sure the requested lock level is granted ?
mark: generally yes. the procedure is: dlmglue fires off a request… some time later, the ast callback is run and the status it passes to dlmglue indicates whether the operation succeeded
coly: if a node receives a bast, what will happen ? I mean, are there options (e.g. release its lock, or ignore the bast) ?
mark: release the lock once possible
mark: that’s the only action that doesn’t lockup the cluster
coly: I see, once a node receives a bast, it should try its best to downconvert the corresponding lock to NL.
coly: it’s a little bit clear to me
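The ast/bast callbacks and the holder counts mentioned in the log can be pictured with a small Python sketch. Everything here is an assumption made for illustration: `ToyLockRes` and its methods are hypothetical names, not the dlmglue structures or the DLM API, and the downconvert is applied immediately, whereas in a real cluster it is itself a request that completes later through an ast.

```python
NL, PR, EX = 0, 1, 2  # higher number means stronger lock

class ToyLockRes:
    """Hypothetical per-resource lock record kept on one node."""

    def __init__(self):
        self.level = NL        # level currently granted to this node
        self.requested = None  # level of an in-flight request, if any
        self.holders = 0       # local users currently relying on the lock
        self.blocking = None   # level another node is waiting for, if any

    def request(self, level):
        """Fire off a request; the result arrives later via ast()."""
        self.requested = level

    def ast(self, ok):
        """Completion callback: the request finished, successfully or not."""
        if ok:
            self.level = self.requested
        self.requested = None

    def bast(self, wanted_level):
        """Blocking callback: our lock is in the way of another node."""
        self.blocking = wanted_level
        self.try_downconvert()

    def acquire(self):
        self.holders += 1

    def release(self):
        self.holders -= 1
        self.try_downconvert()

    def try_downconvert(self):
        """Drop to a compatible level as soon as no local user holds the lock."""
        if self.blocking is None or self.holders > 0:
            return
        target = NL if self.blocking == EX else PR
        self.level = min(self.level, target)
        self.blocking = None

res = ToyLockRes()
res.request(EX)
res.ast(True)      # the EX request was granted
res.acquire()      # a local user starts working under the lock
res.bast(PR)       # another node wants to read
level_while_busy = res.level
res.release()      # last local user is done, so the downconvert can happen
level_after_release = res.level
```

The lock stays at EX while a local holder is active and drops only once the count reaches zero, which is the "release the lock once possible" behaviour Mark describes; keeping the lock cached between users is what the counts make possible.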
I have quoted the log here instead of writing up my own understanding, since it should help readers get the basic concepts of OCFS2’s DLM lock levels and of what ast and bast do.