An enjoyed kernel apprentice

April 26, 2016

libelf-devel is required when building kernel modules

Filed under: Basic Knowledge,kernel — colyli @ 10:19 pm

According to most documents about kernel module building, all you need is a Makefile like this,

HELLO = helloworld

obj-m += $(HELLO).o

$(HELLO)-objs := hello.o world.o

KERNEL_SOURCE := /lib/modules/`uname -r`/build/

default:
        $(MAKE) -C $(KERNEL_SOURCE) M=`pwd` modules

clean:
        $(MAKE) -C $(KERNEL_SOURCE) M=`pwd` clean
        $(RM) Module.markers modules.order

Then typing “make” will get everything built. (Of course, the source of the current kernel has to be ready at /lib/modules/`uname -r`/build.)
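
For reference, a minimal sketch of the two source files this Makefile expects could look like the following (illustrative code, not the exact files I used; world_say() is a made-up helper):

/* hello.c */
#include <linux/init.h>
#include <linux/module.h>

int world_say(void);    /* defined in world.c */

static int __init hello_init(void)
{
        pr_info("helloworld: module loaded\n");
        return world_say();
}

static void __exit hello_exit(void)
{
        pr_info("helloworld: module unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");

/* world.c */
#include <linux/kernel.h>

int world_say(void)
{
        pr_info("helloworld: hello, world\n");
        return 0;
}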

But yesterday, when I tried to build a kernel module for the Linux-4.6-rc5 (openSUSE vanilla) kernel, I hit an error I had never seen before.

helloworld> make

make -C /lib/modules/`uname -r`/build/ M=`pwd` modules

make[1]: Entering directory ‘/home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla’

make[2]: *** No rule to make target ‘/home/colyli/source/tmp/helloworld/hello.o’, needed by ‘/home/colyli/source/tmp/helloworld/helloworld.o’.  Stop.

Makefile:1428: recipe for target ‘_module_/home/colyli/source/tmp/helloworld’ failed

make[1]: *** [_module_/home/colyli/source/tmp/helloworld] Error 2

make[1]: Leaving directory ‘/home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla’

Makefile:10: recipe for target ‘default’ failed

make: *** [default] Error 2

It seemed nothing was missing, but the error message was there. Today a friend (Chang Liu from Memblaze) suggested that I check the output of “make modules_prepare” in the kernel source directory. Here is my result,

linux-4.6-rc5-vanilla> make modules_prepare

Makefile:1016: “Cannot use CONFIG_STACK_VALIDATION, please install libelf-dev or elfutils-libelf-devel”

This was an informative clue, so I installed the libelf-devel package and re-ran “make modules_prepare”,

linux-4.6-rc5-vanilla> make modules_prepare

  CHK     include/config/kernel.release

  CHK     include/generated/uapi/linux/version.h

  CHK     include/generated/utsrelease.h

  CHK     include/generated/bounds.h

  CHK     include/generated/timeconst.h

  CHK     include/generated/asm-offsets.h

  CALL    scripts/

  DESCEND  objtool

  CC       /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/fixdep.o

  LD       /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/fixdep-in.o

  LINK     /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/fixdep

  CC       /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/exec-cmd.o

  CC       /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/help.o

  CC       /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/pager.o

  CC       /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/parse-options.o

  CC       /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/run-command.o

  CC       /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/sigchain.o

  CC       /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/subcmd-config.o

  LD       /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/libsubcmd-in.o

  AR       /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/libsubcmd.a

  GEN      /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/arch/x86/insn/inat-tables.c

  CC       /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/arch/x86/decode.o

  LD       /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/arch/x86/objtool-in.o

  CC       /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/builtin-check.o

  CC       /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/elf.o

  CC       /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/special.o

  CC       /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/objtool.o

  CC       /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/libstring.o

  LD       /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/objtool-in.o

  LINK     /home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla/tools/objtool/objtool

No complaints anymore. Then I went back to the kernel module source directory and ran “make” again,

helloworld> make

make -C /lib/modules/`uname -r`/build/ M=`pwd` modules

make[1]: Entering directory ‘/home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla’

  CC [M]  /home/colyli/source/tmp/helloworld/hello.o

  CC [M]  /home/colyli/source/tmp/helloworld/world.o

  LD [M]  /home/colyli/source/tmp/helloworld/helloworld.o

  Building modules, stage 2.

  MODPOST 1 modules

  CC      /home/colyli/source/tmp/helloworld/helloworld.mod.o

  LD [M]  /home/colyli/source/tmp/helloworld/helloworld.ko

make[1]: Leaving directory ‘/home/colyli/source/suse-kernel/patched/linux-4.6-rc5-vanilla’

Everything is cool, the kernel module is built. So installing the libelf-devel package solved the problem.

But it is still not clear to me why a missing libelf-devel can make a kernel module build fail. If you know the reason, please give me a hint. Thanks in advance.

March 13, 2014

A six-lecture OS kernel development course

Filed under: Basic Knowledge,kernel — colyli @ 12:17 pm

In 2010, after I joined Taobao (a subsidiary of Alibaba Group), I helped my employer build a Linux kernel team, to maintain an in-house Linux kernel and continuously optimize system performance. The team grew from 1 person to 10 in the next 2 years; we had some success stories with internal projects, while getting 200+ patches merged into the upstream Linux kernel.

In these 2 years, I found most programmers had only a vague idea of how to write code that cooperates well with the Linux kernel. And I was not the only person who reached this conclusion. A colleague of mine, Zhitong Wang, a system software engineer from Ali Cloud (another subsidiary of Alibaba Group), asked me whether I was interested in designing and promoting a course on OS kernel development, to help junior developers write better code on Linux servers. We had more than 100K real hardware servers online; if we could help other developers improve the performance of their code by 1%, it would no doubt be extremely cool.

Very soon, we agreed on the outline of this course. It was a six-lecture course, each lecture taking 120~150 minutes,


  • First class: Loading Kernel

This class introduced how a runnable OS kernel is loaded by the boot loader and how the first instruction of the kernel gets executed.

  • Second class: Protected Mode Programming

This class introduced the very basic concepts of x86 protected mode programming, which are fundamental to the remaining four classes.

  • Third class: System Call

This class explained how to design and implement the system call interface, and how privilege transfer happens.

  • Fourth class: Process Scheduling

We expected people to understand how a simplest scheduler works and how context switching is done.

  • Fifth class: Physical Memory Management

In this class people could get a basic idea of how memory size is detected, how memory is managed before the buddy system is initialized, and how the buddy and slab systems work.

  • Sixth class: Virtual Memory Management

Finally there was enough background knowledge to introduce how memory mapping, virtual memory areas, and page faults are designed and implemented; there were also a few slides introducing TLBs and huge pages.


In the next 6 months, Zhitong and I finished the first version of all the slides. When the Alibaba training department learned we were preparing an OS kernel development training, they helped us arrange time slots both in Beijing and Hangzhou (Alibaba Group office locations). We ran the first wave of the training in 4 months; around 30 people attended each class. We received a lot of positive feedback, beyond our expectations. Many colleagues told me they were too busy to attend all six classes, and asked us to arrange the course again.

This was great encouragement to us. We knew the training material could be better, and that we had better methods to help the audience understand kernel development. With this motivation, and with many helpful suggestions from Zhitong, I spent half a year rewriting all the slides for all six classes, to make the materials more logical, consistent, and approachable.

Thanks to my employer, I could prepare the course material during working hours and finish the second wave of the training earlier. In the last two classes, the teaching room was full, and some people even had to stand for hours. Again, many colleagues complained they were too busy and had to miss some of the classes, and asked me to arrange another wave sometime in the future.

This is not an easy task: I gave 6 classes both in Beijing and Hangzhou, and including Q&A it was more than 30 hours. But I have decided to arrange another wave of the course, maybe starting in Oct 2014, to honor all the people who helped and encouraged me 🙂

Here you may find all the slide files for these six classes; they are written in simplified Chinese.
[There are more than enough documents in English, but in Chinese, the more the better.]

* Class 1: osdev1-loading_kernel
* Class 2: osdev2-protected_mode_programming
* Class 3: osdev3-system_call
* Class 4: osdev4-process_scheduling
* Class 5: osdev5-physical_memory_management
* Class 6: osdev6-virtual_memory_management

April 30, 2010

a conversation on DLM lock levels used in OCFS2

Filed under: Basic Knowledge,File System Magic — colyli @ 9:56 am

Recently, I had a conversation with Mark Fasheh; the topic was the DLM (Distributed Lock Manager) lock levels used in OCFS2 (Oracle Cluster File System v2). IMHO, the talk is quite useful for a beginner with OCFS2 or DLM, so I list the conversation here, hoping it can be informative. Thank you, Mark 😉

Mark gave a simplified explanation of the NL, PR and EX DLM lock levels used in OCFS2.

There are 3 lock levels Ocfs2 uses when protecting shared resources.

“NL” aka “No Lock”: this is used as a placeholder. Either we get it so that we can convert the lock to something useful, or we already had some higher-level lock and dropped to NL so another node can continue. This lock level does not block any other nodes from access to the resource.

“PR” aka “Protected Read”: this is used so that multiple nodes may read the resource at the same time without any mutual exclusion. This level blocks only those nodes which want to make changes to the resource (EX locks).

“EX” aka “Exclusive”: this is used to keep other nodes from reading or changing a resource while it is being changed by the current node. This level blocks PR locks and other EX locks.
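
From this description, the mutual compatibility of the three levels can be summarized in a small matrix. This is my own sketch of the semantics above, not code from OCFS2 or its DLM:

enum dlm_level { NL, PR, EX };

/* compat[held][wanted] == 1 means a node holding level 'held' does not
 * block another node requesting level 'wanted' on the same resource. */
static const int compat[3][3] = {
        /*         NL  PR  EX */
        /* NL */ {  1,  1,  1 },
        /* PR */ {  1,  1,  0 },
        /* EX */ {  1,  0,  0 },
};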

When another node wants a level of access to a resource which the current node is blocking due to its lock level, the current node “downconverts” the lock to a compatible level. Sometimes we might have multiple nodes trying to gain exclusive access to a resource at the same time (say two nodes want to go from PR -> EX). When that happens, only one node can win and the others are sent signals to ‘cancel’ their lock request and, if need be, ‘downconvert’ to a mode which is compatible with what’s being requested. In the previous example, that means one of the nodes would cancel its attempt to go from PR->EX, and afterwards it would drop its PR to NL, since the PR lock blocks the other node from an EX.

After reading the above text, I talked with Mark on IRC. Here is the edited (unnecessary parts removed) conversation log,

coly: it’s excellent material for the DLM lock levels of ocfs2!
mark: especially if that helps folks understand what’s happening in dlmglue.c
* mark knows that code can be…. hard to follow  😉
mark: another thing you might want to take note of – this whole “cancel convert” business is there because the dlm allows a process to retain its current lock level while asking for an escalation
coly: one thing I am not clear is, what’s the functionality of dlmglue.c ? like the name, glue ?
mark: if you think about it – being forced to drop the lock and re-acquire would eliminate the possibility of deadlock, at the expense of performance
mark: think of dlmglue.c as the layer of code which abstracts away the dlm interface for the fs
mark: as part of that abstraction, file system lock management is wholly contained within dlmglue.c
coly: only dlmglue.c acts as an abstraction layer ?  and the real job is done by fsdlm or o2dlm ?
mark: yes
mark: dlmglue is never actually creating resources itself – it’s asking the dlm on behalf of the file system
mark: aside from code cleanliness, dlmglue provides a number of features the fs needs that the dlm (rightfully) does not provide
coly: which kind of ?
mark: lock caching for example – you’ll notice that we keep counts on the locks in dlmglue
mark: also, whatever fs specific actions might be needed as part of a lock transition are initiated from dlmglue. an example of that would be checkpointing inode changes before allowing other nodes access, etc
coly: yeah, that’s one more thing confusing me.
coly:  the concepts of upconvert and downconvert are not clear to me yet
coly: when it combined with ast and bast
mark: have you checked out the “dlmbook” pdf? it explains the dlm api (which once you understand, makes dlmglue a lot easier to figure out)
coly: yes, I read it. but because I didn’t know ast and bast before, I don’t have conception on what happens in ast and bast
coly: is it something like the signal handler ?
mark: ast and bast though are just callbacks we pass to the dlm. one (ast) is used to tell fs that a request is complete, the other (bast) is used to tell fs that a lock is blocking progress from another node
coly: when an ast is triggered, what will happen ? the node received the ast can make sure the requested lock level is granted ?
mark: generally yes. the procedure is: dlmglue fires off a request… some time later, the ast callback is run and the status it passes to dlmglue indicates whether the operation succeeded
coly: if a node receives a bast, what will happen ? I mean, are there options (e.g. release its lock, or ignore the bast) ?
mark: release the lock once possible
mark: that’s the only action that doesn’t lockup the cluster  😉
coly: I see, once a node receives a bast, it should try its best to downconvert the corresponding lock to NL.
coly: it’s a little bit clear to me 🙂

I quote the log rather than giving my own interpretation; it can be helpful for getting the basic concepts of OCFS2’s DLM lock levels and what ast and bast do.
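
As a memory aid, the shape of the two callbacks can be sketched like this; the names and signatures are illustrative only, not the actual o2dlm or fsdlm API:

/* ast: the dlm tells the fs that its own lock request has completed;
 * the status says whether the requested level was granted. */
typedef void (*ast_cb)(void *fs_private, int status);

/* bast: the dlm tells the fs that its lock is blocking another node,
 * which is requesting the given level; downconvert once possible. */
typedef void (*bast_cb)(void *fs_private, int requested_level);

struct lock_request {
        int level;              /* NL, PR or EX */
        ast_cb on_complete;     /* fired when the dlm finishes our request */
        bast_cb on_blocking;    /* fired when we block another node */
        void *fs_private;
};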

October 16, 2009

My first publication

Filed under: Basic Knowledge,Great Days — colyli @ 11:53 am

After publishing the Chinese translation of “Linkers and Loaders” online, this week the Chinese version of this great book is formally published. This is my first publication, though it’s a translation 🙂


If anyone finds any mistake in the translation, please send the errata to the publisher or to me directly. I do appreciate your feedback 🙂


August 4, 2009

Report on the Dragon Star Programming Course

Filed under: Basic Knowledge,File System Magic,Great Days — colyli @ 12:42 pm

On July 13~17, I took a course in the ‘Dragon Star’ program. Here is my report on this great event.

At the very beginning, please permit me to thank my Labs colleagues Nikanth Karthikesan and Suresh Jayaraman for their kind help. They reviewed the draft version of this report during their busy working hours and provided many valuable comments, making the final version much better.

Dragon Star is a series of computer science academic communications, sponsored by the China Natural Science Fund, which invites outstanding overseas Chinese professors to give systematic training to postgraduate students. The Dragon Star program office is located at the Institute of Computing Technology of the Chinese Academy of Sciences.

In the past several years, most attendees of this program were university students and researchers from state-owned institutes or organizations. Due to the limited seats for each course, this program seldom accepted applications from multinational companies (like Novell). This year, quite surprisingly, my application was approved, so I had a chance to meet people from state-owned institutes/organizations and exchange ideas with the professor and students.

The course I applied for was “Infrastructure of data-binded applications”, taught by Professor Xiaodong Zhang from Ohio State University. Frankly speaking, it was a seminar rather than a course: open discussions happened freely, and the professor was willing to listen to and discuss with students. Though it was a course for postgraduate students, most of the audience were PhD students, researchers, and professors from universities. The venue was the Changsha Institute of Technology; Changsha is another city in China, famous for its crazy high temperature in summer. I was surprised to learn that many students of this university stayed at school during the summer holiday and also joined the training.

In the 4.5-day course, Prof. Zhang touched quite a lot of layers of the whole computer storage stack, from the CPU L1/L2/L3 caches, to main memory, swapping in virtual memory, the buffer cache for disk data, and finally SSDs. Though many fields were mentioned, the discussion always focused on one topic: caching. As a file systems engineer, caching is something I should know and must know. At the end of the course, I felt that the content went far beyond my expectations.

Here I share some of my experiences from this course.

– Most of my computer knowledge was self-taught, and for CPU cache organization I had a misunderstanding about fully associative caches and direct-mapped caches for a long time. I thought a fully associative cache was much faster than a direct-mapped cache, because it can compare the cached tags in parallel, but was too expensive, and that was why the direct-mapped cache was introduced. From this training, I realized a direct-mapped cache is much faster than a fully associative cache. The shortcoming of the direct-mapped cache is cache conflict handling, which is exactly the advantage of the fully associative cache. Even a 2-way set-associative cache can improve a lot on cache conflicts, because it can hold two conflicting values in its 2 ways. Then I learned a new (to me) idea to improve cache lookup speed in a set-associative cache [1].
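
As a sketch of why the direct-mapped lookup wins on speed (my own illustration, with made-up sizes): the set index is taken directly from address bits, so a hit needs exactly one tag comparison, while a fully associative cache must match the tag against every line:

#include <stdbool.h>
#include <stdint.h>

#define LINE_BITS 6                     /* 64-byte cache lines */
#define SET_BITS  10                    /* 1024 sets */
#define NSETS     (1u << SET_BITS)

struct line { uint64_t tag; bool valid; };
static struct line cache[NSETS];

static bool lookup(uint64_t addr)
{
        uint64_t set = (addr >> LINE_BITS) & (NSETS - 1);   /* pure bit select */
        uint64_t tag = addr >> (LINE_BITS + SET_BITS);
        return cache[set].valid && cache[set].tag == tag;   /* one compare */
}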

– In this training, I also re-learned the concept of the row buffer in the memory controller. The professor told us they had observed an interesting phenomenon: because the row buffer only caches one page-sized chunk of data, when there was a cache conflict in the L2/L3 cache, there would also be a conflict in the row buffer. The conflict means new content has to replace the old content in the same place. Then he introduced their work on how to solve this issue [2]. The solution was quite simple, but how they observed this issue and how they made the analysis was a perfect story.

– As a Linux kernel developer, 2 talks by the professor helped me understand Linux virtual memory management a little better. One was a method to improve the page replacement policy when cache memory is full, called token-ordered LRU, which can minimize the possibility of page thrashing [3]. The other was an algorithm to avoid caching too many accessed-once-only pages in cache memory, called Clock-Pro [7]. I had known of token-ordered LRU for some time, but this was the first time I met one of the algorithm's authors. For Clock-Pro, from Google I knew there were patches, not yet upstream. If you are interested in these topics, please check the references.

– Prof. Zhang’s team also did research on SSD (Solid State Disk) caching. They ran different I/O patterns and got some interesting performance numbers. One of the numbers that impressed me was that random reads and sequential reads had a noticeable performance difference on the Intel X25-E SSD. In the past 2 years, I had taken it for granted that all reads from an SSD run at the same speed. The explanation was that there is a read-ahead buffer inside the SSD, therefore sequential reads are faster than random reads. Caching is everywhere!

When I stayed in Changsha, the temperature was 40+ degrees Celsius. Every day, I walked 6 km between the dorm and the classroom (a big university) in the sunshine and high temperature; my T-shirt was wet all the time. The evenings were a good time to read related papers and review the topics Prof. Zhang had mentioned in the daytime. I had a feeling that if I had learned these topics by myself, I would have spent 3~6 more months.

The course was divided into several talks; here I list all the talks in the order they were given. For details of each talk, if you are interested, please check the references; you can find papers and slide files there. The only pity is that the slides were made with MS Office, and might not be perfectly compatible with OpenOffice.

Day 1
The first half-day was an overview; the topic was “Balancing System Resource Supply and Demand for Effective Computing”.
In the second half-day, we discussed processor cache design principles, including,
– Basic logical structure of CPU cache
– The trade-off between hit rate and access delay
– Cache design for high hit rate with low access delay
Today, the new term for me was the multi-column cache. Since most of the architecture books (that I have read) stop at the N-way set-associative cache, this was the first time I learned of the multi-column cache.

Day 2
The first half of the day continued the previous day’s topic, with cache management in multi-core systems. Prof. Zhang introduced the page coloring scheme: assign different colors to pages mapping to different cache pages, so that pages mapping to the same cache page have the same color. Pages with the same color are grouped into a bucket. If a task is cache-hungry at run time, the kernel can allocate more buckets for it. Though I didn’t understand how the buckets cope with DMA (I guess for a physically mapped cache, DMA transfers might invalidate some content in the L1/L2 cache), the benchmark numbers showed quite some improvement when some applications were assigned more page buckets. Details can be found in [4]. Continuing this idea, another work was introduced: a database prototype called MCC-DB, running on a machine with a shared LLC (last level cache). MCC-DB analyzes the cache consumption of each query task, and queues the tasks with knowledge of cache usage prediction. The benchmark numbers say this scheme can reduce query execution times by up to 33% [5].
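
The coloring arithmetic, as I understood it, can be sketched like this (sizes are illustrative, not from the talk):

#define PAGE_SHIFT  12                        /* 4 KiB pages */
#define CACHE_SIZE  (2u << 20)                /* 2 MiB shared last level cache */
#define CACHE_WAYS  8

/* pages whose frames map to the same group of cache sets share a color */
#define NCOLORS  (CACHE_SIZE / (CACHE_WAYS * (1u << PAGE_SHIFT)))   /* 64 colors */

static unsigned int page_color(unsigned long pfn)
{
        return pfn % NCOLORS;   /* equal color => contends for the same cache sets */
}
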
The second half of the day introduced the structure of the DRAM row buffer, and explained why, when an L2 cache conflict happens, the row buffer would always conflict too (without the improved algorithm). Then Prof. Zhang introduced an algorithm to keep the locality in DRAM and avoid row buffer conflicts. The algorithm is surprisingly simple: just XOR 2 parts of a physical address. The story of how they found the row buffer conflict, and how they analyzed the source of the issue, was quite interesting. Details can be found in [2].
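
The XOR trick, as I understood it, can be sketched like this (bit positions are illustrative; see [2] for the real scheme):

#include <stdint.h>

#define BANK_BITS   4     /* 16 banks */
#define BANK_SHIFT  13    /* illustrative position of the bank index bits */
#define TAG_SHIFT   24    /* illustrative slice of the L2 tag bits */

static unsigned int dram_bank(uint64_t paddr)
{
        unsigned int bank = (paddr >> BANK_SHIFT) & ((1u << BANK_BITS) - 1);
        unsigned int tag  = (paddr >> TAG_SHIFT)  & ((1u << BANK_BITS) - 1);

        /* addresses that conflict in the L2 cache now spread over
         * different banks, so row buffer locality is preserved */
        return bank ^ tag;
}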

Day 3
The first half of the day started with disk cache replacement algorithms. Prof. Zhang introduced LRU and analyzed its disadvantage. A major issue is that too much accessed-once-only MRU data will flush out accessed-many-times LRU data. Then Prof. Zhang introduced an algorithm called LIRS (Low Inter-reference Recency Set). The basic idea of LIRS is to make the LRU a multi-level queue. When a page is cached, just keep it in the low-level queue; only when it is accessed more than once, move it into the high-level queue. The cache pages in the low-level queue are the first choice to be replaced out. The whole algorithm is (IMHO) quite complex, and I cannot tell all the details without reading the paper [6] again. Though LIRS is cool, it cannot replace the current improved LRU implementation in the Linux kernel for page cache replacement. The key issue is that in LIRS, a quite complex queue insert/remove/move operation needs a big lock to protect the involved stacks, which would be a big performance penalty in kernel space. Therefore, currently LIRS is used in user space, for example in PostgreSQL (if I remember correctly).
The second half of the day continued with page cache replacement algorithms. Prof. Zhang introduced a simplified algorithm called Clock-Pro [7], an improvement based on the Clock page replacement algorithm. Right now, there is a patch that implements Clock-Pro, but it is not upstream yet. After Clock-Pro, Prof. Zhang introduced another algorithm to avoid cache thrashing, called token-ordered LRU [3]. The basic idea of token-ordered LRU is: when the page cache is full, newly read-in data has to replace existing pages. In order to avoid thrashing, a token is assigned to a selected process. When replacing cache pages, do not replace pages of the process that holds the token. This scheme makes sure the process holding the token can execute faster and finish early, releasing more pages. Token-ordered LRU is in the upstream Linux kernel; the implementation is around 50 lines of C code, perfectly small.
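
The core of the token idea can be sketched in a few lines (my own simplification, not the actual kernel code):

struct mm_struct;                      /* opaque address space handle */
static struct mm_struct *swap_token;   /* the process currently holding the token */

/* during page reclaim: never steal pages from the token holder, so at
 * least one process keeps making progress instead of everyone thrashing */
static int may_reclaim_from(struct mm_struct *mm)
{
        return mm != swap_token;
}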

Day 4
The first half of the day again started with disk cache replacement. For a hard disk, due to seeking and read-ahead buffering, reading continuous data is much faster than reading random data. Therefore, when a page fault happens, caching random data in memory has less performance penalty than caching continuous data, because continuous data can be re-read from disk much more cheaply. But 2 LBA-continuous blocks are not necessarily adjacent in the physical layout on the hard disk. Prof. Zhang introduced how they judge whether LBA-continuous blocks are physically adjacent on disk. The basic idea is to track the last 2 access times of each cached block: if two LBA-continuous blocks are read in within a small interval, these 2 blocks are physically continuous on the hard disk. Then, when the buffer cache gets full, the physically continuous blocks are replaced first [8]. IMHO, the drawback of this algorithm is that a quite big tree structure is needed inside kernel space to track the access times of each block. And the timestamps are taken by reading the TSC; I am not sure whether there is a performance penalty in reading the TSC on every block read.
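
The adjacency test can be sketched like this (my own reading of the idea; see [8] for the real design, and the threshold below is made up):

#include <stdint.h>

#define ADJACENT_WINDOW 50000ULL   /* illustrative threshold, in TSC ticks */

struct blk_info {
        uint64_t last_read[2];     /* timestamps of the last two reads */
};

/* 'a' and 'b' are LBA-continuous; if 'b' was read almost immediately
 * after 'a', assume they are also physically adjacent on the platter. */
static int physically_adjacent(const struct blk_info *a,
                               const struct blk_info *b)
{
        return b->last_read[0] - a->last_read[0] < ADJACENT_WINDOW;
}
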
The second half of the day was all about SSDs. Prof. Zhang spent much time introducing the basic structure of an SSD, and their work in cooperation with Intel. They did quite a lot of benchmarks on an Intel SSD (X25-E IIRC), and observed interesting performance numbers [9]. One of the numbers that impressed me was that continuous reads are faster than random reads. I asked Prof. Zhang: since a dirty block should be erased before the next write, wouldn't it be natural to implement a copy-on-write file system on top (though the erase block size is somewhat bigger than the file system block size)? There was no explicit answer to this question.

Day 5
Today we discussed data transfer and cache management on the internet, especially P2P data sharing. The professor introduced their research explaining why P2P is the most efficient method to transfer multimedia data on the internet. The basic idea is that multimedia traffic on the internet does not follow a Zipf-like distribution [10]; a P2P network optimized for the topology of users’ locations can cache the most-accessed multimedia data in the users’ local networks, which maximizes data transfer bandwidth and speed for multimedia data on the internet [11].
This was the last topic of the whole course. After this talk, there was a summary speech by the Dragon Star program organizers. Then the 5-day course concluded.

From the above description, though this course touched many fields, from CPU caches to internet P2P protocols, there is only one thread: improving data access performance by caching. IMHO, there are 3 methods to improve system I/O performance: caching, duplication, and prefetching. This course provides a quite systematic introduction to the first one, caching. To design a cache system (no matter where it is located in the storage stack), some key issues should be considered: cache lookup, cache conflicts, and cache replacement. This course shares the experience of a group of people (Prof. Zhang’s team) on how to improve overall system performance against these issues. As a file system developer, all the content of this course perfectly hits my professional area, bringing me many new concepts and providing a chance to learn how experienced people think and solve problems.

Prof. Zhang, please accept my sincere regards for your great job in these days. You were so kind to share your knowledge with us, and worked so hard even in the 40°C+ heat. I wish we can meet somewhere, sometime again……
Finally, I want to thank the Dragon Star program office for organizing such a successful course, giving me the chance to know different people and their great work. Also, I should thank my employer, a great company, for sending me to Changsha and giving me the opportunity to have such an exciting experience.

All slide files of this course can be found from

[1] C. Zhang, X. Zhang, and Y. Yan, “Two fast and high-associativity cache schemes”, IEEE Micro, Vol. 17, No. 5, September/October, 1997, pp. 40-49.
[2] Z. Zhang, Z. Zhu, and X. Zhang, “A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality”, Proceedings of the 33rd Annual International Symposium on Microarchitecture, (Micro-33), Monterey, California, December 10-13, 2000. pp. 32-41.
[3] Song Jiang and Xiaodong Zhang, “Token-ordered LRU: an effective page replacement policy and its implementation in Linux systems”, Performance Evaluation, Vol. 60, Issue 1-4, 2005, pp. 5-29.
[4] Jiang Lin, Qingda Lu, Xiaoning Ding, Zhao Zhang, Xiaodong Zhang, and P. Sadayappan, “Gaining insights into multicore cache partitioning: bridging the gap between simulation and real systems”, Proceedings of the 14th International Symposium on High Performance Computer Architecture (HPCA’08), Salt Lake City, Utah, February 16-20, 2008.
[5] Rubao Lee, Xiaoning Ding, Feng Chen, Qingda Lu, and Xiaodong Zhang, “MCC-DB: minimizing cache conflicts in multi-core processors for databases”, Proceedings of the 35th International Conference on Very Large Data Bases (VLDB 2009), Lyon, France, August 24-28, 2009.
[6] Song Jiang and Xiaodong Zhang, “LIRS: an efficient low inter-reference recency set replacement to improve buffer cache performance”, Proceedings of the 2002 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’02), Marina Del Rey, California, June 15-19, 2002.
[7] Song Jiang, Feng Chen, and Xiaodong Zhang, “CLOCK-Pro: an effective improvement of the CLOCK replacement”, Proceedings of 2005 USENIX Annual Technical Conference (USENIX’05), Anaheim, CA, April 10-15, 2005.
[8] Xiaoning Ding, Song Jiang, Feng Chen, Kei Davis, and Xiaodong Zhang, “DiskSeen: exploiting disk layout and access history to enhance I/O prefetch”, Proceedings of the 2007 USENIX Annual Technical Conference, (USENIX’07), Santa Clara, California, June 17-22, 2007.
[9] Feng Chen, David Koufaty, and Xiaodong Zhang, “Understanding intrinsic characteristics and system implications of flash memory based solid state drives”, Proceedings of the 2009 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (SIGMETRICS/Performance 2009), Seattle, WA, June 15-19, 2009.
[10] Lei Guo, Enhua Tan, Songqing Chen, Zhen Xiao, and Xiaodong Zhang, “Does Internet media traffic really follow Zipf-like distribution?”, Proceedings of ACM SIGMETRICS’07 Conference, (Extended Abstract), San Diego, California, June 12-16, 2007.
[11] Lei Guo, Songqing Chen, and Xiaodong Zhang, “Design and evaluation of a scalable and reliable P2P assisted proxy for on-demand streaming media delivery”, IEEE Transactions on Knowledge and Data Engineering, Vol. 18, No. 5, 2006, pp. 669-682.

July 3, 2009

gcc macro and preprocessor

Filed under: Basic Knowledge — colyli @ 3:21 am

These days, I am checking the sys_mount() implementation. During the code reading, I found I could not recall most of the details of macro stringification and concatenation.

After spending 1.5 days on Google and the gcc online manual, I re-read the macro and preprocessor related parts, especially the sections on stringification and concatenation. After the reading, I wrote a document covering most of the key points. The document can be found here [click the URL to download].
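
To keep the two rules straight, here is a minimal example of my own (not from the document above) showing stringification, expand-then-stringify, and token concatenation:

#include <stdio.h>

#define STR(x)        #x         /* stringify the argument literally */
#define XSTR(x)       STR(x)     /* expand the argument first, then stringify */
#define CONCAT(a, b)  a##b       /* paste two tokens into one */

#define VALUE 42

int main(void)
{
        int CONCAT(my, _var) = VALUE;   /* declares my_var */

        printf("%s\n", STR(VALUE));     /* prints "VALUE": # does not expand */
        printf("%s\n", XSTR(VALUE));    /* prints "42": expanded first */
        printf("%d\n", my_var);         /* prints 42 */
        return 0;
}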

Wish it can be helpful to others; of course I will re-read this doc again and again in the future 🙂
