kmap(): Mapping Arbitrary Pages Into Kernel VM

During 2.3 kernel development (I think), "HIGHMEM" support was added. Normally, the kernel can address only (4GB-PAGE_OFFSET)/PAGE_SIZE pages of RAM, since every physical page must be mapped to a kernel address between PAGE_OFFSET and 4GB. (So if PAGE_OFFSET is 3GB, only 1GB of physical RAM can be used; in practice not even that, because of fixed kernel mappings and so forth.) The HIGHMEM patches allow the kernel to use more than 1GB of memory by mapping the additional pages, as necessary, into the high part of the kernel address space just below 4GB. They also allow high-memory pages to be mapped into user process address space.
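The page-count formula above can be checked with a few lines of C. This is only a sketch: the constants are illustrative values for 32-bit x86 with 4KB pages, and the helper name max_direct_mapped_pages() is hypothetical, not a kernel function.

```c
#include <stdint.h>

/* Illustrative values for 32-bit x86; the real constants live in kernel headers. */
#define ADDRESS_SPACE_TOP 0x100000000ULL /* 4GB */
#define PAGE_OFFSET       0xC0000000ULL  /* 3GB, as in the example in the text */
#define PAGE_SIZE         4096ULL        /* assumed 4KB pages */

/*
 * Upper bound on the number of physical pages that fit in the direct
 * mapping between page_offset and 4GB (hypothetical helper).
 */
static uint64_t max_direct_mapped_pages(uint64_t page_offset)
{
    return (ADDRESS_SPACE_TOP - page_offset) / PAGE_SIZE;
}
```

With PAGE_OFFSET at 3GB this gives 262144 pages, i.e. 1GB of directly mapped RAM before any fixed mappings are subtracted.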

The interface to the kmapper is through the kmap() and kunmap() functions. The heart of these is implemented by kmap_high() and kunmap_high() in mm/highmem.c.

The kmapper uses a physically contiguous set of pagetables allocated at boot time to map pages into kernel space. Having the pagetables contiguous makes it easy to move around without constantly consulting the page directory. The kmap pagetables refer to kernel virtual addresses starting at PKMAP_BASE, which in my 2.4 source tree is 0xFC000000, or 64MB shy of 4GB. A separate array pkmap_count is used to keep track of the reference count for the kmap page-table entries.
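Because the kmap pagetables are contiguous, a slot index and its virtual address are related by simple arithmetic. The following user-space sketch shows that relationship. PKMAP_BASE is the value quoted above; the slot count and the lower-case helper names are assumptions made for illustration.

```c
#include <stdint.h>

#define PAGE_SHIFT 12            /* assumed 4KB pages */
#define PKMAP_BASE 0xFC000000U   /* value quoted in the text */
#define LAST_PKMAP 1024          /* assumed number of kmap slots */

/* One reference count per kmap page-table entry, as described in the text. */
static int pkmap_count[LAST_PKMAP];

/* Virtual address covered by kmap slot nr (hypothetical helper). */
static uint32_t pkmap_addr(unsigned int nr)
{
    return PKMAP_BASE + ((uint32_t)nr << PAGE_SHIFT);
}

/* Index of the kmap slot that covers vaddr (hypothetical helper). */
static unsigned int pkmap_nr(uint32_t vaddr)
{
    return (vaddr - PKMAP_BASE) >> PAGE_SHIFT;
}
```

Given a kmapped address, pkmap_count[pkmap_nr(vaddr)] is then the reference count for its slot, with no page directory walk needed.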

kmap() is called with a struct page * argument: the page frame of a page to map into kernel space. This can be a normal or HIGHMEM page; in the former case kmap() simply returns the direct-mapped address. For HIGHMEM pages, we search for an unused entry in the kmap pagetables that were allocated at boot time; this search is done by simply examining the pkmap_count entries in order, looking for a zero entry. If none is found, we sleep waiting for another process to unmap a page. When we find an unused entry, we insert the physical page address of the page we want to map, increment the pkmap_count reference count for the pagetable entry, and return the virtual address to the caller. We also update the page->virtual field of the page struct to record the mapped address.
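The search described above can be modelled in user space as follows. This is a sketch, not the code in mm/highmem.c. The struct, the arrays and the tiny slot count are hypothetical. Two assumptions go beyond the text: a count of 1 is taken to mean "mapped but unreferenced", so a fresh mapping is set to 1 and then incremented, and a page that is already mapped simply gains another reference. Where the real kmap() would sleep, the model returns 0.

```c
#include <stdint.h>

#define PAGE_SHIFT  12           /* assumed 4KB pages */
#define PAGE_OFFSET 0xC0000000U  /* assumed 3GB direct-map base */
#define PKMAP_BASE  0xFC000000U  /* value quoted in the text */
#define NR_SLOTS    4            /* tiny on purpose, for illustration */

/* Stand-in for struct page: only the fields this model needs. */
struct page_model {
    uint32_t phys;     /* physical address of the page frame */
    int      highmem;  /* nonzero for a HIGHMEM page */
    uint32_t virt;     /* kmapped address, 0 when not mapped */
};

static uint32_t slot_pte[NR_SLOTS];    /* models the kmap page-table entries */
static int      slot_count[NR_SLOTS];  /* models pkmap_count */

/* Returns the kernel virtual address, or 0 if no slot is free. */
static uint32_t kmap_model(struct page_model *page)
{
    unsigned int nr;

    /* Normal pages already have a direct-mapped address. */
    if (!page->highmem)
        return PAGE_OFFSET + page->phys;

    if (page->virt == 0) {
        /* Examine the counts in order, looking for a zero entry. */
        for (nr = 0; nr < NR_SLOTS; nr++)
            if (slot_count[nr] == 0)
                break;
        if (nr == NR_SLOTS)
            return 0;  /* the real kmap() would sleep here */

        slot_pte[nr] = page->phys;  /* install the mapping */
        slot_count[nr] = 1;         /* assumption: 1 = mapped, no users */
        page->virt = PKMAP_BASE + ((uint32_t)nr << PAGE_SHIFT);
    }

    /* Take a reference on the slot that maps this page. */
    slot_count[(page->virt - PKMAP_BASE) >> PAGE_SHIFT]++;
    return page->virt;
}
```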

kunmap() expects a struct page * representing the page to unmap. It finds the pkmap_count entry for the page's virtual address and decrements it. That's all; the page remains mapped until the kmap pte scavenger flush_all_zero_pkmaps() finds it mapped but unreferenced and kicks it out of the page tables. kmap() calls flush_all_zero_pkmaps() whenever the last pkmap pte is examined during the search for a free pte. The scavenger searches the entire pkmap_count array looking for pages that are mapped but not actually used, and for each such page it clears the pkmap_count entry and the pte.
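The lazy unmapping described above can be sketched the same way. As before this is a user-space model with hypothetical names, assuming that a count of 1 means "mapped but unreferenced" and a count of 2 or more means the slot has users. The slot_page back pointer and the install_model() helper exist only so the example can stand alone.

```c
#include <stdint.h>

#define PAGE_SHIFT 12            /* assumed 4KB pages */
#define PKMAP_BASE 0xFC000000U   /* value quoted in the text */
#define NR_SLOTS   4             /* tiny on purpose, for illustration */

struct page_model {
    uint32_t phys;  /* physical address of the page frame */
    uint32_t virt;  /* kmapped address, 0 when not mapped */
};

static uint32_t           slot_pte[NR_SLOTS];    /* models the kmap ptes */
static int                slot_count[NR_SLOTS];  /* models pkmap_count */
static struct page_model *slot_page[NR_SLOTS];   /* model-only back pointer */

/* Model-only setup: map page into slot nr with one user. */
static void install_model(struct page_model *page, unsigned int nr)
{
    slot_pte[nr] = page->phys;
    slot_count[nr] = 2;  /* assumption: mapped (1) plus one user */
    slot_page[nr] = page;
    page->virt = PKMAP_BASE + ((uint32_t)nr << PAGE_SHIFT);
}

/* Drop one reference; the mapping itself is left in place. */
static void kunmap_model(struct page_model *page)
{
    slot_count[(page->virt - PKMAP_BASE) >> PAGE_SHIFT]--;
}

/* Scavenger: tear down every slot that is mapped but has no users. */
static int flush_all_zero_pkmaps_model(void)
{
    int flushed = 0;
    unsigned int nr;

    for (nr = 0; nr < NR_SLOTS; nr++) {
        if (slot_count[nr] != 1)
            continue;  /* free, or still in use */
        slot_count[nr] = 0;
        slot_pte[nr] = 0;
        slot_page[nr]->virt = 0;
        slot_page[nr] = 0;
        flushed++;
    }
    /* A real implementation would also have to flush stale TLB entries. */
    return flushed;
}
```

After kunmap_model() the page still has its virtual address; only the scavenger pass actually removes the mapping, which is the behaviour the usage notes are concerned with.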

Usage Notes

Since there are a limited number of kmap slots available, it is not recommended to kmap pages and leave them mapped indefinitely. Most users of kmap use the mapped pages immediately and then unmap them again. This technique is used, for example, to copy data from HIGHMEM user pages.
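The map, use, unmap pattern looks roughly like this. The sketch runs in user space, so kmap_stub() and kunmap_stub() are hypothetical stand-ins for kmap() and kunmap(), and the page is modelled as a plain buffer with a reference counter. The point is only the shape of the caller: the mapping is held just long enough to do the copy.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096  /* assumed 4KB pages */

/* Stand-in for a HIGHMEM page: its contents plus a mapping counter. */
struct page_model {
    unsigned char data[PAGE_SIZE];
    int mapped;  /* outstanding references taken via kmap_stub() */
};

/* Hypothetical stand-ins for kmap() and kunmap(). */
static void *kmap_stub(struct page_model *page)
{
    page->mapped++;
    return page->data;
}

static void kunmap_stub(struct page_model *page)
{
    page->mapped--;
}

/* Copy len bytes at offset out of the page, holding the mapping briefly. */
static void copy_from_page(void *dst, struct page_model *page,
                           size_t offset, size_t len)
{
    unsigned char *vaddr = kmap_stub(page);  /* map */

    memcpy(dst, vaddr + offset, len);        /* use immediately */
    kunmap_stub(page);                       /* unmap again */
}
```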

The fact that kunmap() does not actually unmap the page seems to me to be somewhat dangerous. Specifically, the page could be freed by the caller (using free_page()), but remain mapped in the kmap pagetables. That would be fine, except that if __get_free_pages() is called with a gfp_mask of GFP_HIGHMEM, it could return that page's kmapped virtual address. A subsequent call to flush_all_zero_pkmaps() on behalf of some unrelated code might then unmap the page. Crash, boom.

It seems to be implicit in the HIGHMEM design that __get_free_pages() should never be called with GFP_HIGHMEM, but this is not explicitly documented anywhere that I can find. To use HIGHMEM pages within the kernel, one must always use __alloc_pages() (which returns a free page struct, not a virtual address) with GFP_HIGHMEM and then use kmap() to give the page a virtual address.

Questions and comments to Joe Knapka
