Caching makes me cranky
Among the issues with Ion is its incorrect use of the DMA APIs. I’ve briefly mentioned this before. My educated opinion is that it’s a complete mess and that time travel would be a great solution to fix this problem.
What the DMA APIs do underneath varies greatly depending on what the device is
and what platform you are running on. DMA mapping can be anything from
setting up device page tables to just returning a physical address. With very
high probability, your cell
phone runs an ARM chip. With very high probability as
well, your laptop is running some kind of x86 chip. The difference I’m going
to highlight here is how the two architectures manage caches. Cache coherency
is a topic worthy of PhD dissertations and many conference talks.[1]
The key point for this post is that the ARM architecture does not have the
same cache guarantees as x86 so it needs explicit cache operations when
transferring buffers to and from devices. The DMA mapping code for arm
and arm64 includes explicit cache operations as part of the mapping and sync APIs.
Ideally, no driver should ever have to think about cache topology; if the
DMA APIs are used properly, everything happens transparently. The DMA APIs
accomplish this by establishing buffer ownership between the CPU and the
device. When a driver calls dma_map_sg, the buffer belongs to the device.
The CPU may not touch the buffer again until dma_unmap_sg or
dma_sync_sg_for_cpu / dma_sync_sg_for_device is called. This ensures that
the CPU and the device always see the appropriate data.
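To make that ownership hand-off concrete, here is a minimal sketch of a driver following the contract. It assumes a driver that already has a struct device and a populated sg_table; the function name and error handling are illustrative, not taken from any real driver.

```c
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* Illustrative only: "dev" and "sgt" belong to whatever driver owns them. */
static int dma_ownership_example(struct device *dev, struct sg_table *sgt)
{
	int nents;

	/* Map: ownership passes to the device. On arm/arm64 this is where
	 * the cache maintenance happens; the driver never sees it. */
	nents = dma_map_sg(dev, sgt->sgl, sgt->orig_nents, DMA_BIDIRECTIONAL);
	if (!nents)
		return -ENOMEM;

	/* ... start DMA and wait for the device to finish ... */

	/* Hand the buffer back to the CPU to look at the results. */
	dma_sync_sg_for_cpu(dev, sgt->sgl, sgt->orig_nents, DMA_BIDIRECTIONAL);
	/* ... CPU reads or writes the buffer ... */

	/* Give it back to the device for another pass. */
	dma_sync_sg_for_device(dev, sgt->sgl, sgt->orig_nents, DMA_BIDIRECTIONAL);
	/* ... more DMA ... */

	/* Unmap: ownership returns to the CPU for good. */
	dma_unmap_sg(dev, sgt->sgl, sgt->orig_nents, DMA_BIDIRECTIONAL);
	return 0;
}
```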
Enter Ion. Ion is not a driver for a particular hardware block; it is supposed to be an allocator for other drivers. Ion was written with Android and its stack in mind. Too many drivers written for Android do not conform to the traditional driver model and don’t use the DMA APIs, which means those drivers have to manage their caches some other way. Cache operations are easy to get wrong and can be dangerous to the system’s data. Public APIs are carefully reviewed and curated, and the near-universal rule in the kernel is that drivers should rely on the DMA API (used correctly) to do their cache maintenance. The drivers that don’t use the DMA APIs are typically written for one architecture and begrudgingly call the architecture implementations directly.
Ion, being the central location for all hopes and dreams, became a pseudo-DMA
layer. When a caller allocates memory from Ion, it is guaranteed to be clean
in the cache, as if dma_map had been called. Ion does this by calling the
dma_sync APIs without ever calling map. This is not allowed by the DMA APIs
and just ‘happens’ to work on the devices Ion is used on (i.e. cellphones).
Why not just call dma_map_sg and let that take care of the caches? Calling
map would guarantee the memory was synced appropriately with the cache. It
would also kill performance. Buffers in the Android graphics framework are
allocated and deallocated in response to almost any input. To save on the
overhead of allocation, Ion
keeps pages around in a pool that can be drained when under memory pressure.
These pages are guaranteed to be clean in the cache. Calling map each time
on every page would be unnecessary. Even if performance weren’t an issue,
what device would be used for mapping? Ion has a device exported to userspace
which could be used. That ends up feeling forced, especially when all that’s
needed is the cache operations. Calling map starts the contract of ownership
between device and CPU. The buffer isn’t actually being passed off anywhere
so the operations become meaningless. Ion is allocating the buffer for some
other device to eventually map and pass it off.
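To make the misuse concrete, here is a rough sketch of the kind of shortcut described above. This is not Ion’s literal code, and the function name is made up; the point is that DMA addresses are fabricated and the sync API is called on a scatterlist that was never mapped, which the DMA API does not permit.

```c
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* Sketch of the pattern only: clean pages "as if" dma_map_sg() had been
 * called, without ever calling it. "dev" may not even be the device that
 * will eventually use the buffer. */
static void clean_pages_behind_the_dma_api(struct device *dev,
					   struct sg_table *sgt)
{
	struct scatterlist *sg;
	int i;

	/* Pretend the buffer was mapped: dma_addr == physical address. */
	for_each_sg(sgt->sgl, sg, sgt->orig_nents, i)
		sg_dma_address(sg) = sg_phys(sg);

	/* Flush caches via the sync API, skipping the map step entirely.
	 * This only "works" where dma_sync_* boils down to plain cache
	 * maintenance on a physical address. */
	dma_sync_sg_for_device(dev, sgt->sgl, sgt->orig_nents,
			       DMA_BIDIRECTIONAL);
}
```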
My attempt to pull caching directly into Ion was met
with “No, don’t do that. Do it properly.” I’ve got a set of APIs that are worth
reviewing, but I keep going back and forth on cleaning them up and sending them out.
I’d still like to pull as much of the explicit caching out of Ion as possible
and make those APIs unnecessary. Some of my uncertainty is that I’m not working
on a whole framework. No vendor has a really complete Ion implementation easily
available for me to hack on. I’m making changes a bit blindly in hopes that
others will pick it up. Focusing on one target would give me a clear direction
of “Let the framework support these needs”. I could say “Yes, we can stop with
the explicit caching in most places and just require drivers to call
begin_cpu_access or the equivalent userspace calls” (sketched below).
Welcome to open source software, I guess.
Anyone can fix anything with the bits and pieces available if they try hard
enough. We’ll see where this goes.
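For reference, the “equivalent userspace calls” mentioned above are the dma-buf sync ioctl, which maps onto begin_cpu_access/end_cpu_access in the kernel. A minimal sketch, assuming you already have a dma-buf fd and an mmap()ed view of it from some exporter:

```c
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Bracket a CPU write to an mmap()ed dma-buf with the sync ioctl.
 * "dmabuf_fd" is assumed to come from some exporter; error checking is
 * omitted for brevity. */
static void cpu_write_to_dmabuf(int dmabuf_fd)
{
	struct dma_buf_sync sync = { 0 };

	/* Begin CPU access: the kernel performs whatever cache
	 * maintenance is needed before the CPU touches the buffer. */
	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE;
	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

	/* ... write into the mmap()ed view of the buffer ... */

	/* End CPU access: hand the buffer back so devices will see
	 * the CPU's writes. */
	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}
```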
[1] A big thank you to everyone from ARM for continuing to give these types of presentations at conferences.