Single images and page sizes
“The year of Linux on the desktop” is an old running joke. This has resulted in many “The year of X on the Y” spin-off jokes. One of these that’s close to my heart is “The year of the arm64 server”. ARM has long dominated the embedded space, and the next market they intend to capture is the server space. As some people will be more than happy to tell you, moving from the embedded space to the enterprise-class server space has involved some growing pains (and the occasional meme). Most of the bickering^Wdiscussion comes from the fact that the embedded world has different requirements than the server world. Trying to support all requirements in a single tree often means choosing one over the other.
The goal with a distribution like Fedora is to support many devices with as
few images as possible. Producing a separate image means more code to
maintain, more QA, and generally more work. These days we take it for granted
that multiple ARM devices can be booted on the same kernel image. This was
not always the case. Prior to 2012 or so, the platform support that lived
under arch/arm/ was not designed to work in a unified fashion. Each vendor
had a mach-foo directory which contained code that (usually) assumed only
mach-foo devices would exist in the image. A good example of this is header
files. Many devices would have header files under
arch/arm/mach-foo/include/mach/blah.h. The way the include path was
structured, you could not also compile a device with
arch/arm/mach-bar/include/mach/blah.h, since there would be two headers with
the same name. Many of the important parts of the platform definition (e.g.
PHYS_OFFSET) were #defines, which meant that platforms with different needs
could not be compiled together.
Driven by a combination of the move towards devicetree and the realization
that none of this was sustainable, the ARM community decided to work towards
a single kernel image. Fast forward to today, and single-image booting is
standard thanks to a lot of hard work.
arm64 learned from the lessons of arm32 and has always mandated a single image.
You can see this reflected in the existence of a single defconfig file under
arch/arm64/configs/defconfig. This is designed to be a set of options that
are reasonable for most platforms. It is not designed to be a production-ready,
fully optimized configuration file. This gets brought up occasionally on the
mailing list when people try to submit changes to the defconfig file for
optimization purposes.
Fedora is a production system, and it does need to be optimized. There’s been fantastic work recently to support more single board computers, like the Raspberry Pi, in Fedora. Thanks to single-image efforts, the same kernel can boot on both a Raspberry Pi and an enterprise-class ARM server. Booting doesn’t mean working well, though. Single board computers can come with as little as 512MB of RAM; enterprise servers have significantly more.
Consider the choice of PAGE_SIZE for Fedora. A page size represents the
smallest amount of physical memory that can be mapped into a page table.
aarch64 has several options here, 4K being the most common and 64K giving
better TLB performance[1]. A larger page size also means more wasted space.
Many allocations need to be aligned to PAGE_SIZE for one reason or another,
even if they aren’t using close to that amount of space. This can quickly
add up to megabytes of wasted memory. A server with several gigabytes of
memory probably won’t show an impact, but a system with 512MB will start to
perform poorly due to lack of RAM. Choosing one page size over the other is
going to be detrimental to one type of machine.
For a more degenerate case of PAGE_SIZE problems, we have to look at CMA
(Contiguous Memory Allocator). CMA allows the kernel to make relatively
large (think 8MB or more) physically contiguous allocations. Systems that
use CMA will set up one or more designated CMA regions. The memory in a CMA
region can be used by the system as normal, with a few restrictions. When a
driver wants to allocate contiguous memory from a CMA region, the kernel
will use the underlying page migration/compaction[2] machinery to allocate
the block of memory. To help ensure the migration can succeed, CMA regions
have a minimum size and alignment. When PAGE_SIZE is larger, the minimum
goes up as well. The particular combination of options Fedora uses makes
the minimum size go up to 512MB when a larger PAGE_SIZE is used on arm64.
Given other requirements for CMA, this essentially means CMA can’t be used
on smaller memory systems if a larger page size is used, since the alignment
requirements are too strict. And thus we get people making choices about
what gets supported.
One way to avoid the need to make multi-platform trade-offs is to make more
options runtime selectable. This is popular with many debug features that
can be built in with the appropriate CONFIG_FOO option but are only actually
run when an argument is passed on the kernel command line. This doesn’t work
for anything that needs to be determined at compile time, though. PAGE_SIZE
almost certainly falls into this category, as do many other constants in the
kernel. The end result is that you will never be able to find one true build
configuration that’s optimal for all situations.
The best you can hope to do is foist the problem off on someone else and
let them make the trade-offs so you don’t have to. Or evaluate what your
requirements actually are and go from there. Either works.
[1] …talking to a hardware engineer and asking for some graphs. Better yet, ask them for some hardware optimized for a larger page size and then use a smaller page size.
[2] …for the interested. I should also write more about CMA some time.