# lowmemorykiller overview
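One of Android's design principles is that when an app exits, its process stays alive so that the app starts faster the next time it is launched. This design has a cost. Every process has its own address space, so as more apps are opened the system uses more and more memory and may eventually run short. Android therefore needs a component that oversees all processes and frees memory by killing processes according to a policy. That component is lmk, short for LowMemoryKiller, which uses lmkd to decide when to kill which process.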
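Android is built on Linux, and Linux has a similar memory-management policy: the OOM killer (Out Of Memory Killer). The OOM killer is mainly triggered when a memory allocation cannot be satisfied, and it kills the process with the highest score. LowMemoryKiller instead runs repeatedly while the kernel reclaims memory (through the shrinker mechanism described below). When available memory gets low, it starts killing processes, and it picks which priority levels to kill based on how much free memory remains. It does not wait for a real OOM, because by then the system may already be in an abnormal state. The system would rather act early: kill some low-priority processes while memory is merely low, so that later operations can proceed smoothly.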
The lowmemorykiller source file is msm-google\drivers\staging\android\lowmemorykiller.c.
# How lowmemorykiller works
```c
/* drivers/misc/lowmemorykiller.c
 *
 * The lowmemorykiller driver lets user-space specify a set of memory thresholds
 * where processes with a range of oom_score_adj values will get killed. Specify
 * the minimum oom_score_adj values in
 * /sys/module/lowmemorykiller/parameters/adj and the number of free pages in
 * /sys/module/lowmemorykiller/parameters/minfree. Both files take a comma
 * separated list of numbers in ascending order.
 *
 * For example, write "0,8" to /sys/module/lowmemorykiller/parameters/adj and
 * "1024,4096" to /sys/module/lowmemorykiller/parameters/minfree to kill
 * processes with a oom_score_adj value of 8 or higher when the free memory
 * drops below 4096 pages and kill processes with a oom_score_adj value of 0 or
 * higher when the free memory drops below 1024 pages.
 *
 * The driver considers memory used for caches to be free, but if a large
 * percentage of the cached memory is locked this can be very inaccurate
 * and processes may not get killed until the normal oom killer is triggered.
 */
```
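The threshold rule in that comment can be sketched as a small user-space C function. `min_adj_to_kill` is a hypothetical helper, not a kernel function. It follows the threshold loop in `lowmem_scan()` further down, which also requires the file cache to be below the same threshold. The tables hold the example values from the comment ("0,8" and "1024,4096").

```c
#include <stddef.h>

#define OOM_SCORE_ADJ_MAX 1000

/* Example tables from the driver comment: adj "0,8", minfree "1024,4096" (pages). */
static const short example_adj[] = {0, 8};
static const int example_minfree[] = {1024, 4096};

/*
 * Hypothetical helper: return the minimum oom_score_adj that may be killed,
 * or OOM_SCORE_ADJ_MAX + 1 if memory is not low enough to kill anything.
 * Thresholds are checked in ascending order. The first one that both the
 * free pages and the file-cache pages fall below decides the result.
 */
static short min_adj_to_kill(const short *adj, const int *minfree, size_t n,
			     int free_pages, int file_pages)
{
	for (size_t i = 0; i < n; i++) {
		if (free_pages < minfree[i] && file_pages < minfree[i])
			return adj[i];
	}
	return OOM_SCORE_ADJ_MAX + 1;
}
```

With these tables, 2000 free pages and 2000 cache pages give 8, so only processes with oom_score_adj >= 8 are candidates. With 500 of each, the result drops to 0. If the file cache is still large, nothing is killed even when free pages are low.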
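Every app process is forked from zygote and recorded in the mLruProcesses list in AMS, which manages all of them. AMS updates each process's adj value according to the process state and passes it to the kernel through a file (the kernel-side value is oom_score_adj). The kernel has a low-memory reclaim mechanism: when memory falls below a threshold, it kills processes with high adj values to free up memory. This is how lowmemorykiller works.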
This diagram illustrates the lmk mechanism well.
On my Pixel 3, which runs Linux kernel 4.9, there is no lowmemorykiller module. After digging further into the source, I found that lowmemorykiller is no longer built as a module.
(Unverified) These thresholds are also hard-coded in the source. That means writing to /sys/module/lowmemorykiller/parameters/adj and /sys/module/lowmemorykiller/parameters/minfree cannot change lowmemorykiller's kill policy.
Here is what these values mean. When free memory drops below 64 MB, the system kills processes with adj >= 12. When it drops below 16 MB, it kills processes with adj >= 6, and so on.
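The MB figures come from thresholds stored as page counts. The sketch below assumes the default tables of the upstream staging driver, since the hard-coded arrays themselves are not reproduced in this post. Those defaults are adj {0, 1, 6, 12} and minfree {3*512, 2*1024, 4*1024, 16*1024} pages, and they agree with the 64 MB / 16 MB figures above. The sketch also assumes 4 KB pages. `pages_to_mb` and `print_thresholds` are hypothetical names.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumption: upstream default tables, paired index by index. */
static const short default_adj[] = {0, 1, 6, 12};
static const int default_minfree_pages[] = {3 * 512, 2 * 1024, 4 * 1024, 16 * 1024};

#define ASSUMED_PAGE_SIZE_KB 4	/* assumption: 4 KB pages */

/* Convert a page count to whole megabytes. */
static int pages_to_mb(int pages)
{
	return pages * ASSUMED_PAGE_SIZE_KB / 1024;
}

/* Print each threshold and the lowest adj that becomes killable below it. */
static void print_thresholds(void)
{
	size_t n = sizeof(default_adj) / sizeof(default_adj[0]);

	for (size_t i = 0; i < n; i++)
		printf("free < %2d MB -> kill adj >= %d\n",
		       pages_to_mb(default_minfree_pages[i]), default_adj[i]);
}
```

Under these assumptions the four thresholds come out as 6, 8, 16 and 64 MB.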
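Each app process has its own adj value, which AMS keeps up to date. It is exposed through two files, oom_adj and oom_score_adj:
- /proc/pid/oom_adj: the process priority on the kernel's legacy scale (-17 to 15). A smaller value means a higher priority.
- /proc/pid/oom_score_adj: the priority on the scale used by AMS, matching the values in ProcessList (-1000 to 1000). A smaller value means a higher priority.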
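A foreground process has oom_adj 0 and oom_score_adj 0. That is the highest priority an ordinary app process gets, so it is among the last to be killed. Only OOM_SCORE_ADJ_MIN (-1000) disables killing entirely.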
```shell
blueline:/ # ps -A | grep ez
u0_a243 16722 994 14961560 221328 SyS_epoll_wait 0 S com.example.ezandroid
blueline:/ # cat /proc/16722/oom_adj
0
blueline:/ # cat /proc/16722/oom_score_adj
0
```
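To read these values from code instead of with `cat`, a program can parse the single integer each file holds. `read_proc_int` is a hypothetical helper written for illustration. It relies only on standard `fopen`/`fscanf`.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one integer from a proc-style file such as
 * /proc/<pid>/oom_score_adj. Returns 0 on success, -1 on failure.
 */
static int read_proc_int(const char *path, int *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;	/* missing file, e.g. the process has exited */
	ok = fscanf(f, "%d", out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

For example, `read_proc_int("/proc/16722/oom_score_adj", &adj)` reads the same value `cat` printed in the session above, provided the process still exists and its state has not changed.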
oom_adj and oom_score_adj are converted as follows:
```c
// constants: include/uapi/linux/oom.h
/*
 * /proc/<pid>/oom_score_adj set to OOM_SCORE_ADJ_MIN disables oom killing for
 * pid.
 */
#define OOM_SCORE_ADJ_MIN	(-1000)
#define OOM_SCORE_ADJ_MAX	1000

/*
 * /proc/<pid>/oom_adj set to -17 protects from the oom killer for legacy
 * purposes.
 */
#define OOM_DISABLE (-17)
/* inclusive */
#define OOM_ADJUST_MIN (-16)
#define OOM_ADJUST_MAX 15

// msm-google\drivers\staging\android\lowmemorykiller.c
static short lowmem_oom_adj_to_oom_score_adj(short oom_adj)
{
	if (oom_adj == OOM_ADJUST_MAX)
		return OOM_SCORE_ADJ_MAX;
	else
		return (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE;
}
```
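Because `-OOM_DISABLE` is 17, the formula is oom_score_adj = oom_adj × 1000 / 17, truncated toward zero, with a special case that maps oom_adj 15 to exactly 1000. The standalone sketch below copies the formula so the mapping can be checked. The names `oom_adj_to_score_adj` and `print_conversion_table` are hypothetical.

```c
#include <stdio.h>

#define OOM_SCORE_ADJ_MAX 1000
#define OOM_DISABLE (-17)
#define OOM_ADJUST_MAX 15

/* Same formula as lowmem_oom_adj_to_oom_score_adj(). */
static short oom_adj_to_score_adj(short oom_adj)
{
	if (oom_adj == OOM_ADJUST_MAX)
		return OOM_SCORE_ADJ_MAX;
	/* C integer division truncates toward zero, e.g. 6000 / 17 == 352 */
	return (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE;
}

/* Print the mapping for a few representative oom_adj values. */
static void print_conversion_table(void)
{
	static const short samples[] = {-17, -16, 0, 1, 6, 12, 15};

	for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
		printf("oom_adj %3d -> oom_score_adj %5d\n",
		       samples[i], oom_adj_to_score_adj(samples[i]));
}
```

oom_adj 6 maps to 352 and 12 maps to 705. oom_adj -17 (OOM_DISABLE) maps to exactly -1000, which is OOM_SCORE_ADJ_MIN, so both legacy and new interfaces exempt the process.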
# Dissecting the LowmemoryKiller mechanism
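In short, the Framework layer adjusts the adj values and the threshold arrays and passes them to lmk in the kernel, which uses them to decide which processes to kill. User space and kernel space are isolated from each other, so they communicate through file nodes. The Framework sends the adj values and threshold arrays over a socket to lmkd, and lmkd writes them into the kernel nodes. (Since Android 5.0, AMS no longer talks to lmk directly; the lmkd daemon was introduced for this.) lmk reads these nodes and kills processes accordingly. The lmk mechanism can therefore be roughly divided into three layers: the Framework (AMS), the lmkd daemon, and the lmk driver in the kernel.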
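The kernel nodes take comma-separated lists (see the driver comment above), so lmkd's write step comes down to formatting each array as a string. The helper below is a hypothetical sketch of that formatting only. This post does not confirm that lmkd builds the strings exactly this way.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: format vals[0..n) as "v0,v1,..." for nodes such as
 * /sys/module/lowmemorykiller/parameters/minfree. Returns the string length,
 * or -1 on an output error or if buf is too small.
 */
static int format_param_list(const int *vals, size_t n, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		/* no comma before the first element */
		int w = snprintf(buf + used, len - used, i ? ",%d" : "%d", vals[i]);

		if (w < 0 || (size_t)w >= len - used)
			return -1;	/* output error or truncated */
		used += (size_t)w;
	}
	return (int)used;
}
```

Formatting {1024, 4096} produces "1024,4096", the minfree example from the driver comment. The result would then be written to the sysfs node as an ordinary file write.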
# A brief introduction to the shrinker mechanism
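The LMK driver works by registering a shrinker. Shrinkers are the Linux kernel's standard mechanism for reclaiming memory pages, and the kernel thread kswapd is responsible for monitoring memory.
When memory is low, kswapd walks a linked list of shrinkers and calls each registered shrinker's callbacks to reclaim pages. kswapd is also woken periodically to do memory work. Each zone maintains an active_list and an inactive_list, and the kernel moves pages between the two according to how actively they are used. Pages are finally reclaimed through shrink_slab and shrink_zone.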
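The active/inactive idea can be shown with a heavily simplified user-space model. This is a toy second-chance scheme, not the kernel's reclaim code, and `page_sim` and `age_and_reclaim` are hypothetical names. Unreferenced active pages are demoted. Referenced inactive pages are promoted back. Unreferenced inactive pages are reclaimed.

```c
#include <stdbool.h>
#include <stddef.h>

enum sim_list { SIM_ACTIVE, SIM_INACTIVE, SIM_FREED };

/* Toy page: which list it is on and whether it was accessed since the last pass. */
struct page_sim {
	enum sim_list list;
	bool referenced;
};

/* One aging pass over all pages; returns how many pages were reclaimed. */
static size_t age_and_reclaim(struct page_sim *pages, size_t n)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		struct page_sim *p = &pages[i];

		switch (p->list) {
		case SIM_ACTIVE:
			if (!p->referenced)
				p->list = SIM_INACTIVE;	/* demote idle page */
			break;
		case SIM_INACTIVE:
			if (p->referenced) {
				p->list = SIM_ACTIVE;	/* second chance */
			} else {
				p->list = SIM_FREED;	/* reclaim */
				freed++;
			}
			break;
		default:
			break;
		}
		p->referenced = false;	/* clear the access bit for the next pass */
	}
	return freed;
}
```

In this model a page that goes idle needs two passes to be freed: one to fall from the active list to the inactive list, and another to be reclaimed.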
# lowmemorykiller initialization
Registering the shrinker:
```c
// msm-google\drivers\staging\android\lowmemorykiller.c
static struct shrinker lowmem_shrinker = {
	.scan_objects = lowmem_scan,
	.count_objects = lowmem_count,
	.seeks = DEFAULT_SEEKS * 16
};

static int __init lowmem_init(void)
{
	register_shrinker(&lowmem_shrinker);
	lmk_event_init();
	return 0;
}
```
The shrinker struct is defined as follows. scan_objects is called only when count_objects returns a non-zero value.
```c
// msm-google\include\linux\shrinker.h
/*
 * A callback you can register to apply pressure to ageable caches.
 *
 * @count_objects should return the number of freeable items in the cache. If
 * there are no objects to free or the number of freeable items cannot be
 * determined, it should return 0. No deadlock checks should be done during the
 * count callback - the shrinker relies on aggregating scan counts that couldn't
 * be executed due to potential deadlocks to be run at a later call when the
 * deadlock condition is no longer pending.
 *
 * @scan_objects will only be called if @count_objects returned a non-zero
 * value for the number of freeable objects. The callout should scan the cache
 * and attempt to free items from the cache. It should then return the number
 * of objects freed during the scan, or SHRINK_STOP if progress cannot be made
 * due to potential deadlocks. If SHRINK_STOP is returned, then no further
 * attempts to call the @scan_objects will be made from the current reclaim
 * context.
 *
 * @flags determine the shrinker abilities, like numa awareness
 */
struct shrinker {
	unsigned long (*count_objects)(struct shrinker *,
				       struct shrink_control *sc);
	unsigned long (*scan_objects)(struct shrinker *,
				      struct shrink_control *sc);

	int seeks;	/* seeks to recreate an obj */
	long batch;	/* reclaim batch size, 0 = default */
	unsigned long flags;

	/* These are for internal use */
	struct list_head list;
	/* objs pending delete, per node */
	atomic_long_t *nr_deferred;
};
```
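The count-then-scan contract can be modelled in user space with a toy registry. Everything prefixed `toy_`/`demo_` is hypothetical. The toy leaves out `shrink_control`, batching, `seeks`, and the per-node `nr_deferred` bookkeeping. `SHRINK_STOP` uses the same `~0UL` sentinel value as the kernel header.

```c
#include <stddef.h>

#define SHRINK_STOP (~0UL)

struct toy_shrinker {
	unsigned long (*count_objects)(struct toy_shrinker *s);
	unsigned long (*scan_objects)(struct toy_shrinker *s, unsigned long nr_to_scan);
	struct toy_shrinker *next;
	unsigned long cached;		/* demo state: freeable objects */
	unsigned long scan_calls;	/* demo state: times scan_objects ran */
};

static struct toy_shrinker *shrinker_list;

/* Like register_shrinker(): link the shrinker into the global list. */
static void toy_register_shrinker(struct toy_shrinker *s)
{
	s->next = shrinker_list;
	shrinker_list = s;
}

/* One reclaim pass over the list; scan_objects runs only if count is non-zero. */
static unsigned long toy_shrink_all(unsigned long nr_to_scan)
{
	unsigned long freed = 0;

	for (struct toy_shrinker *s = shrinker_list; s; s = s->next) {
		if (s->count_objects(s) == 0)
			continue;	/* nothing freeable: skip the scan callback */
		unsigned long ret = s->scan_objects(s, nr_to_scan);
		if (ret != SHRINK_STOP)	/* SHRINK_STOP: no progress possible this pass */
			freed += ret;
	}
	return freed;
}

/* Demo callbacks: a cache that frees up to nr_to_scan objects per scan. */
static unsigned long demo_count(struct toy_shrinker *s)
{
	return s->cached;
}

static unsigned long demo_scan(struct toy_shrinker *s, unsigned long nr)
{
	unsigned long n = nr < s->cached ? nr : s->cached;

	s->scan_calls++;
	s->cached -= n;
	return n;
}
```

LMK's `lowmem_count` always reports the LRU page total, which is normally non-zero. As a result, `lowmem_scan` is invoked on essentially every reclaim pass and makes the kill decision itself.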
# lowmem_count
Counting reclaimable pages
```c
// msm-google\drivers\staging\android\lowmemorykiller.c
static unsigned long lowmem_count(struct shrinker *s,
				  struct shrink_control *sc)
{
	return global_node_page_state(NR_ACTIVE_ANON) +
		global_node_page_state(NR_ACTIVE_FILE) +
		global_node_page_state(NR_INACTIVE_ANON) +
		global_node_page_state(NR_INACTIVE_FILE);
}
```
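- ANON: anonymous pages. "Anonymous" means the memory has no backing store, so reclaiming it requires swapping it out to the swap area.
- FILE: file-backed pages, which correspond to a file. To reclaim them, only the pages whose data has changed (dirty pages) need to be written back to the file on disk; clean pages can simply be dropped.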
The count = active anonymous pages + active file pages + inactive anonymous pages + inactive file pages.
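The same four LRU counters appear in /proc/meminfo as `Active(anon)`, `Inactive(anon)`, `Active(file)` and `Inactive(file)`, reported in kB rather than pages. The sketch below is a hypothetical user-space approximation of `lowmem_count` that sums those fields from meminfo-formatted text. `lru_total_kb` is an illustrative name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: sum Active(anon) + Inactive(anon) + Active(file) +
 * Inactive(file) from /proc/meminfo-formatted text. Returns the total in kB
 * (lowmem_count() returns pages instead).
 */
static unsigned long lru_total_kb(const char *meminfo)
{
	static const char *const keys[] = {
		"Active(anon)", "Inactive(anon)", "Active(file)", "Inactive(file)",
	};
	unsigned long total = 0;
	const char *line = meminfo;

	while (line && *line) {
		char key[64];
		unsigned long kb;

		/* "Name:   1234 kB" -> key = "Name", kb = 1234 (stays within one line) */
		if (sscanf(line, "%63[^:\n]: %lu", key, &kb) == 2) {
			for (size_t i = 0; i < sizeof keys / sizeof keys[0]; i++)
				if (strcmp(key, keys[i]) == 0)
					total += kb;
		}
		line = strchr(line, '\n');	/* advance to the next line */
		if (line)
			line++;
	}
	return total;
}
```

On a device, the input would be the contents of /proc/meminfo. Dividing the result by the page size in kB gives a number comparable to what `lowmem_count` returns.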
# lowmem_scan
Freeing memory
```c
// msm-google\drivers\staging\android\lowmemorykiller.c
static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
{
	struct task_struct *tsk;
	struct task_struct *selected = NULL;
	unsigned long rem = 0;
	int tasksize;
	int i;
	short min_score_adj = OOM_SCORE_ADJ_MAX + 1;
	int minfree = 0;
	int selected_tasksize = 0;
	short selected_oom_score_adj;
	int array_size = ARRAY_SIZE(lowmem_adj);
	// Free memory: total free pages minus the pages the kernel keeps in reserve
	int other_free = global_page_state(NR_FREE_PAGES) - totalreserve_pages;
	// Page cache size, excluding shared memory, unevictable pages and the swap cache
	int other_file = global_node_page_state(NR_FILE_PAGES) -
				global_node_page_state(NR_SHMEM) -
				global_node_page_state(NR_UNEVICTABLE) -
				total_swapcache_pages();

	if (lowmem_adj_size < array_size)
		array_size = lowmem_adj_size;
	if (lowmem_minfree_size < array_size)
		array_size = lowmem_minfree_size;
	// Walk the minfree thresholds. If free memory is below a threshold and the
	// page cache is also below it (when memory is plentiful, much of it serves
	// as page cache to speed up I/O), take the matching adj as min_score_adj
	// and leave the loop.
	for (i = 0; i < array_size; i++) {
		minfree = lowmem_minfree[i];
		if (other_free < minfree && other_file < minfree) {
			min_score_adj = lowmem_adj[i];
			break;
		}
	}

	lowmem_print(3, "lowmem_scan %lu, %x, ofree %d %d, ma %hd\n",
		     sc->nr_to_scan, sc->gfp_mask, other_free,
		     other_file, min_score_adj);

	// min_score_adj still holds its initial value: memory is sufficient, return
	if (min_score_adj == OOM_SCORE_ADJ_MAX + 1) {
		lowmem_print(5, "lowmem_scan %lu, %x, return 0\n",
			     sc->nr_to_scan, sc->gfp_mask);
		return 0;
	}

	selected_oom_score_adj = min_score_adj;

	// RCU (Read-Copy Update): readers proceed freely; an updater copies the
	// data, modifies the copy, and then replaces the old data in one step
	rcu_read_lock();
	// Walk every process to choose one to kill
	for_each_process(tsk) {
		struct task_struct *p;
		short oom_score_adj;

		// Skip kernel threads
		if (tsk->flags & PF_KTHREAD)
			continue;

		// For a user process, mm points to the user part of its virtual address
		// space; for a kernel thread, mm is NULL. A kernel thread has no address
		// space of its own: it runs only in kernel space and never switches to
		// user space, but it can be scheduled and preempted like other processes.
		// find_lock_task_mm() returns a locked thread that still has an mm, or NULL.
		p = find_lock_task_mm(tsk);
		if (!p)
			continue;

		// A previously killed process is still exiting and the timeout has not
		// expired: do not kill another one yet
		if (task_lmk_waiting(p) &&
		    time_before_eq(jiffies, lowmem_deathpending_timeout)) {
			task_unlock(p);
			rcu_read_unlock();
			return 0;
		}
		// oom_score_adj below min_score_adj means higher priority: do not kill
		oom_score_adj = p->signal->oom_score_adj;
		if (oom_score_adj < min_score_adj) {
			task_unlock(p);
			continue;
		}
		// Memory used by the process (RSS, in pages)
		// rss: resident set size, the non-swapped physical memory that a task has used
		tasksize = get_mm_rss(p->mm);
		task_unlock(p);
		if (tasksize <= 0)
			continue;
		// selected is NULL for the first candidate
		if (selected) {
			// Higher priority (lower adj) than the current pick: skip
			if (oom_score_adj < selected_oom_score_adj)
				continue;
			// Same priority but no larger than the current pick: skip
			if (oom_score_adj == selected_oom_score_adj &&
			    tasksize <= selected_tasksize)
				continue;
		}
		// New best candidate: record it with its tasksize and oom_score_adj
		selected = p;
		selected_tasksize = tasksize;
		selected_oom_score_adj = oom_score_adj;
		lowmem_print(2, "select '%s' (%d), adj %hd, size %d, to kill\n",
			     p->comm, p->pid, oom_score_adj, tasksize);
	}
	if (selected) {
		long cache_size = other_file * (long)(PAGE_SIZE / 1024);
		long cache_limit = minfree * (long)(PAGE_SIZE / 1024);
		long free = other_free * (long)(PAGE_SIZE / 1024);

		task_lock(selected);
		// Send SIGKILL to kill the selected process
		send_sig(SIGKILL, selected, 0);
		if (selected->mm)
			task_set_lmk_waiting(selected);
		task_unlock(selected);
		trace_lowmemory_kill(selected, cache_size, cache_limit, free);
		lowmem_print(1, "Killing '%s' (%d) (tgid %d), adj %hd,\n"
			" to free %ldkB on behalf of '%s' (%d) because\n"
			" cache %ldkB is below limit %ldkB for oom_score_adj %hd\n"
			" Free memory is %ldkB above reserved\n",
			     selected->comm, selected->pid, selected->tgid,
			     selected_oom_score_adj,
			     selected_tasksize * (long)(PAGE_SIZE / 1024),
			     current->comm, current->pid,
			     cache_size, cache_limit,
			     min_score_adj,
			     free);
		lowmem_deathpending_timeout = jiffies + HZ;
		rem += selected_tasksize;
		get_task_struct(selected);
	}

	lowmem_print(4, "lowmem_scan %lu, %x, return %lu\n",
		     sc->nr_to_scan, sc->gfp_mask, rem);
	rcu_read_unlock();
	if (selected) {
		handle_lmk_event(selected, selected_tasksize, min_score_adj);
		put_task_struct(selected);
	}
	return rem;
}
```
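The victim-selection rules in `lowmem_scan` can be isolated into a small user-space sketch over an array of made-up task records. `struct task_info` and `select_victim` are hypothetical. The sketch leaves out RCU, task locking, the death-pending timeout and the actual `SIGKILL`.

```c
#include <stddef.h>

/* Hypothetical, simplified task record for illustration only. */
struct task_info {
	int pid;
	short oom_score_adj;
	long rss_pages;
	int is_kthread;
};

/*
 * Return the index of the task the lowmem_scan() rules would pick, or -1.
 * Skip kernel threads, tasks below min_score_adj and tasks with no RSS;
 * prefer the highest oom_score_adj, and among equals the largest RSS.
 */
static int select_victim(const struct task_info *t, size_t n, short min_score_adj)
{
	int selected = -1;
	short sel_adj = min_score_adj;
	long sel_size = 0;

	for (size_t i = 0; i < n; i++) {
		if (t[i].is_kthread || t[i].oom_score_adj < min_score_adj)
			continue;
		if (t[i].rss_pages <= 0)
			continue;
		if (selected >= 0) {
			if (t[i].oom_score_adj < sel_adj)
				continue;	/* higher priority than current pick */
			if (t[i].oom_score_adj == sel_adj && t[i].rss_pages <= sel_size)
				continue;	/* same priority, not larger */
		}
		selected = (int)i;
		sel_adj = t[i].oom_score_adj;
		sel_size = t[i].rss_pages;
	}
	return selected;
}
```

Priority dominates size. A large process with a lower oom_score_adj is never chosen while a higher-adj candidate exists, and memory size only breaks ties within the same adj level.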