Lru-k在Rust中的实现及源码解析

LRU-K 是一种缓存淘汰算法，旨在改进传统的LRU（Least Recently Used，最近最少使用）算法的性能。将其中高频的数据达到K次访问移入到另一个队列进行保护。

算法思想

LRU-K中的“K”代表最近使用的次数。因此，LRU可以认为是LRU-1的特例。
LRU-K的主要目的是为了解决LRU算法“缓存污染”的问题。其核心思想是将“最近使用过1次”的判断标准扩展为“最近使用过K次”。

工作原理

LRU-K需要维护两个队列：历史队列和缓存队列。
1. 普通队列：保存着每次访问的页面。当页面访问次数达到K次时，该页面从历史队列中移除，并添加到K次队列中。
2. K次队列：保存已经访问K次的页面。当缓存队列满了之后，需要淘汰页面时，会淘汰最后一个页面，即“倒数第K次访问离现在最久”的那个页面。
详细说明：
1. 页面第一次被访问时，添加到普通队列中。
2. 当普通队列中的页面满了，根据一定的缓存策略（如FIFO、LRU、LFU）进行淘汰。
3. 当历史队列中的某个页面第K次访问时，该页面从历史队列中移除，并添加到K次队列中。
4. K次队列中的页面再次被访问时，会重新排序。

优缺点

优点：
1. LRU-K降低了“缓存污染”带来的问题，因为只有当页面被访问K次后才会被加入缓存队列。
2. LRU-K的命中率通常比LRU要高。
缺点：
1. LRU-K需要维护一个普通队列，因此内存消耗会比LRU多。
2. LRU-K需要基于次数进行排序（可以需要淘汰时再排序，也可以即时排序），因此CPU消耗比LRU要高。
3. 当K值较大时（如LRU-3或更大的K值），虽然命中率会更高，但适应性较差，需要大量的数据访问才能将历史访问记录清除掉。

实际应用

在实际应用中，LRU-2通常被认为是综合各种因素后最优的选择。

综上所述，LRU-K通过引入“最近使用过K次”的判断标准，有效地解决了LRU算法中的“缓存污染”问题，提高了缓存的命中率。然而，它也需要更多的内存和CPU资源来维护历史队列和进行排序操作。

结构设计

与Lru的结构类似，K与V均用指针方式保存，避免在使用过程中出现Copy或者Clone的可能，提高性能。
注:该方法用了指针会相应的出现许多unsafe的代码，因为在Rsut中，访问指针都被认为是unsafe。我们也可以使用数组坐标模拟指针的方式来模拟。

节点设计

相对普通的Lru节点，我们需要额外存储次数数据。

 /// LruK节点数据
struct LruKEntry {
    pub key: mem::MaybeUninit,
    pub val: mem::MaybeUninit,
    pub times: usize,
    pub prev: *mut LruKEntry,
    pub next: *mut LruKEntry,
}

类设计

由于有K次的列表，所以需要维护两个列表，在空间占用上会比Lru多一些，主要多一个字段访问次数的维护

 pub struct LruKCache {
    map: HashMap, NonNull>, S>,
    cap: usize,
    /// 触发K次数，默认为2
    times: usize,
    /// K次的队列
    head_times: *mut LruKEntry,
    tail_times: *mut LruKEntry,
    /// 普通队列
    head: *mut LruKEntry,
    tail: *mut LruKEntry,
    /// 普通队列的长度
    lru_count: usize,
}

首先初始化对象，初始化map及空的双向链表：

 impl LruCache {
    /// 提供hash函数
    pub fn with_hasher(cap: usize, times: usize, hash_builder: S) -> LruKCache {
        let cap = cap.max(1);
        let map = HashMap::with_capacity_and_hasher(cap, hash_builder);
        let head = Box::into_raw(Box::new(LruKEntry::new_empty()));
        let tail = Box::into_raw(Box::new(LruKEntry::new_empty()));
        unsafe {
            (*head).next = tail;
            (*tail).prev = head;
        }
        let head_times = Box::into_raw(Box::new(LruKEntry::new_empty()));
        let tail_times = Box::into_raw(Box::new(LruKEntry::new_empty()));
        unsafe {
            (*head_times).next = tail_times;
            (*tail_times).prev = head_times;
        }
        Self {
            map,
            cap,
            times,
            head_times,
            tail_times,
            head,
            tail,
            lru_count: 0,
        }
    }
}

元素插入及删除

插入对象，分已在缓存内和不在缓存内：

 pub fn capture_insert(&mut self, k: K, mut v: V) -> Option<(K, V, bool)> {
    let key = KeyRef::new(&k);
    match self.map.get_mut(&key) {
        Some(entry) => {
            let entry_ptr = entry.as_ptr();
            unsafe {
                mem::swap(&mut *(*entry_ptr).val.as_mut_ptr(), &mut v);
            }
            self.detach(entry_ptr);
            self.attach(entry_ptr);
 
            Some((k, v, true))
        }
        None => {
            let (val, entry) = self.replace_or_create_node(k, v);
            let entry_ptr = entry.as_ptr();
            self.attach(entry_ptr);
            unsafe {
                self.map
                    .insert(KeyRef::new((*entry_ptr).key.as_ptr()), entry);
            }
            val.map(|(k, v)| (k, v, false))
        }
    }
}
 
pub fn remove(&mut self, k: &Q) -> Option<(K, V)>
    where
        K: Borrow,
        Q: Hash + Eq + ?Sized,
    {
        match self.map.remove(KeyWrapper::from_ref(k)) {
            Some(l) => unsafe {
                self.detach(l.as_ptr());
                let node = *Box::from_raw(l.as_ptr());
                Some((node.key.assume_init(), node.val.assume_init()))
            },
            None => None,
        }
    }

与Lru的操作方式类似，但是主要集中在attach及detach因为有两个队列，需要正确的附着在正确的队列之上。

attach

 /// 加到队列中
fn attach(&mut self, entry: *mut LruKEntry) {
    unsafe {
        (*entry).times = (*entry).times.saturating_add(1);
        if (*entry).times < self.times {
            self.lru_count += 1;
            (*entry).next = (*self.head).next;
            (*(*entry).next).prev = entry;
            (*entry).prev = self.head;
            (*self.head).next = entry;
        } else {
            (*entry).next = (*self.head_times).next;
            (*(*entry).next).prev = entry;
            (*entry).prev = self.head_times;
            (*self.head_times).next = entry;
        }
    }
}

在加入到队列的时候，需将访问次数+1，然后判断是否达到K次的次数，如果达到将其加入到head_times队列中。
其中使用了saturating_add，这里说个Rust与其它语言的差别。
因为在Rust中不像c语言，如果在c语言中，定义一个uchar类型

 uchar times = 255;
times += 1; //此时times为0，不会有任何异常

但是在Rust中

 let mut times: u8 = 255;
times = times.overflowing_add(1); // 此时times为0，因为上溢出了
times = times.saturating_add(1); // 此时times为255，因为达到了最大值
times += 1; // 此时将会发生panic

此时这函数的效率基本上等同于Lru的，相对仅仅是多维护times字段及lru_count字段。

detach

 fn detach(&mut self, entry: *mut LruKEntry) {
    unsafe {
        (*(*entry).prev).next = (*entry).next;
        (*(*entry).next).prev = (*entry).prev;
 
        if (*entry).times < self.times {
            self.lru_count -= 1;
        }
    }
}

与Lru中的类似，仅仅如果次数在k次以下的时候维护lru_count，效率基本一致。

replace_or_create_node

 fn replace_or_create_node(&mut self, k: K, v: V) -> (Option<(K, V)>, NonNull>) {
    if self.len() == self.cap {
        let old_key = if self.lru_count > 0 {
            KeyRef {
                k: unsafe { &(*(*(*self.tail).prev).key.as_ptr()) },
            }
        } else {
            KeyRef {
                k: unsafe { &(*(*(*self.tail_times).prev).key.as_ptr()) },
            }
        };
        let old_node = self.map.remove(&old_key).unwrap();
        let node_ptr: *mut LruKEntry = old_node.as_ptr();
        unsafe  {
            (*node_ptr).times = 0;
        }
        let replaced = unsafe {
            (
                mem::replace(&mut (*node_ptr).key, mem::MaybeUninit::new(k)).assume_init(),
                mem::replace(&mut (*node_ptr).val, mem::MaybeUninit::new(v)).assume_init(),
            )
        };
 
        self.detach(node_ptr);
 
        (Some(replaced), old_node)
    } else {
        (None, unsafe {
            NonNull::new_unchecked(Box::into_raw(Box::new(LruKEntry::new(k, v))))
        })
    }

淘汰数据，优先淘汰普通队列的数据，如果普通队列为空，将进入淘汰K次队列。区别就是在于淘汰时多选择一次数据。效率上也基本上可以忽略不计。

其它操作

pop移除栈顶上的数据，最近使用的
pop_last移除栈尾上的数据，最久未被使用的
contains_key判断是否包含key值
raw_get直接获取key的值，不会触发双向链表的维护
get获取key的值，并维护双向链表
get_mut获取key的值，并可以根据需要改变val的值
retain 根据函数保留符合条件的元素
get_or_insert_default 获取或者插入默认参数
get_or_insert_mut 获取或者插入对象，可变数据

如何使用

在cargo.toml中添加

 [dependencies]
algorithm = "0.1"

示例

 use algorithm::LruKCache;
fn main() {
    let mut lru = LruKCache::with_times(3, 3);
    lru.insert("this", "lru");
    for _ in 0..3 {
        let _ = lru.get("this");
    }
    lru.insert("hello", "algorithm");
    lru.insert("auth", "tickbh");
    assert!(lru.len() == 3);
    lru.insert("auth1", "tickbh");
    assert_eq!(lru.get("this"), Some(&"lru"));
    assert_eq!(lru.get("hello"), None);
    assert!(lru.len() == 3);
}

完整项目地址

https://github.com/tickbh/algorithm-rs

结语

Lru-k与lru的区别在于多维护一个队列，及每个元素多维护一个次数选项，对于性能的影响不大，仅仅多耗一点cpu，但是可以相应的提高命中率，下一章将介绍LFU按频次的淘汰机制。

相关阅读:
Windows自带硬盘测试工具使用教程
 操作系统01_进程管理_---软考高级系统架构师006
TienChin 渠道管理-配置字典常量
 tiup cluster destroy
SPA项目开发之动态树+数据表格+分页
 Ubuntu下 FTP的搭建配置
 CPP 核心编程6-多态
 代码模版-实现form表单输入框和label统一对齐，vue+elementui
9月12日OpenCV学习笔记——基于 Dlib 库的人脸检测
 微信小程序上架，AI类目审核（AI问答、AI绘画、AI换脸）
原文地址：https://www.cnblogs.com/wmproxy/p/18260014

	/// LruK节点数据
	struct LruKEntry {
	pub key: mem::MaybeUninit,
	pub val: mem::MaybeUninit,
	pub times: usize,
	pub prev: *mut LruKEntry,
	pub next: *mut LruKEntry,
	}

	pub struct LruKCache {
	map: HashMap, NonNull>, S>,
	cap: usize,
	/// 触发K次数，默认为2
	times: usize,
	/// K次的队列
	head_times: *mut LruKEntry,
	tail_times: *mut LruKEntry,
	/// 普通队列
	head: *mut LruKEntry,
	tail: *mut LruKEntry,
	/// 普通队列的长度
	lru_count: usize,
	}

	impl LruCache {
	/// 提供hash函数
	pub fn with_hasher(cap: usize, times: usize, hash_builder: S) -> LruKCache {
	let cap = cap.max(1);
	let map = HashMap::with_capacity_and_hasher(cap, hash_builder);
	let head = Box::into_raw(Box::new(LruKEntry::new_empty()));
	let tail = Box::into_raw(Box::new(LruKEntry::new_empty()));
	unsafe {
	(*head).next = tail;
	(*tail).prev = head;
	}
	let head_times = Box::into_raw(Box::new(LruKEntry::new_empty()));
	let tail_times = Box::into_raw(Box::new(LruKEntry::new_empty()));
	unsafe {
	(*head_times).next = tail_times;
	(*tail_times).prev = head_times;
	}
	Self {
	map,
	cap,
	times,
	head_times,
	tail_times,
	head,
	tail,
	lru_count: 0,
	}
	}
	}

	pub fn capture_insert(&mut self, k: K, mut v: V) -> Option<(K, V, bool)> {
	let key = KeyRef::new(&k);
	match self.map.get_mut(&key) {
	Some(entry) => {
	let entry_ptr = entry.as_ptr();
	unsafe {
	mem::swap(&mut (entry_ptr).val.as_mut_ptr(), &mut v);
	}
	self.detach(entry_ptr);
	self.attach(entry_ptr);

	Some((k, v, true))
	}
	None => {
	let (val, entry) = self.replace_or_create_node(k, v);
	let entry_ptr = entry.as_ptr();
	self.attach(entry_ptr);
	unsafe {
	self.map
	.insert(KeyRef::new((*entry_ptr).key.as_ptr()), entry);
	}
	val.map(\|(k, v)\| (k, v, false))
	}
	}
	}

	pub fn remove(&mut self, k: &Q) -> Option<(K, V)>
	where
	K: Borrow,
	Q: Hash + Eq + ?Sized,
	{
	match self.map.remove(KeyWrapper::from_ref(k)) {
	Some(l) => unsafe {
	self.detach(l.as_ptr());
	let node = *Box::from_raw(l.as_ptr());
	Some((node.key.assume_init(), node.val.assume_init()))
	},
	None => None,
	}
	}

	/// 加到队列中
	fn attach(&mut self, entry: *mut LruKEntry) {
	unsafe {
	(entry).times = (entry).times.saturating_add(1);
	if (*entry).times < self.times {
	self.lru_count += 1;
	(entry).next = (self.head).next;
	((entry).next).prev = entry;
	(*entry).prev = self.head;
	(*self.head).next = entry;
	} else {
	(entry).next = (self.head_times).next;
	((entry).next).prev = entry;
	(*entry).prev = self.head_times;
	(*self.head_times).next = entry;
	}
	}
	}

	uchar times = 255;
	times += 1; //此时times为0，不会有任何异常

	let mut times: u8 = 255;
	times = times.overflowing_add(1); // 此时times为0，因为上溢出了
	times = times.saturating_add(1); // 此时times为255，因为达到了最大值
	times += 1; // 此时将会发生panic

	fn detach(&mut self, entry: *mut LruKEntry) {
	unsafe {
	((entry).prev).next = (*entry).next;
	((entry).next).prev = (*entry).prev;

	if (*entry).times < self.times {
	self.lru_count -= 1;
	}
	}
	}

	fn replace_or_create_node(&mut self, k: K, v: V) -> (Option<(K, V)>, NonNull>) {
	if self.len() == self.cap {
	let old_key = if self.lru_count > 0 {
	KeyRef {
	k: unsafe { &(((*self.tail).prev).key.as_ptr()) },
	}
	} else {
	KeyRef {
	k: unsafe { &(((*self.tail_times).prev).key.as_ptr()) },
	}
	};
	let old_node = self.map.remove(&old_key).unwrap();
	let node_ptr: *mut LruKEntry = old_node.as_ptr();
	unsafe {
	(*node_ptr).times = 0;
	}
	let replaced = unsafe {
	(
	mem::replace(&mut (*node_ptr).key, mem::MaybeUninit::new(k)).assume_init(),
	mem::replace(&mut (*node_ptr).val, mem::MaybeUninit::new(v)).assume_init(),
	)
	};

	self.detach(node_ptr);

	(Some(replaced), old_node)
	} else {
	(None, unsafe {
	NonNull::new_unchecked(Box::into_raw(Box::new(LruKEntry::new(k, v))))
	})
	}

	use algorithm::LruKCache;
	fn main() {
	let mut lru = LruKCache::with_times(3, 3);
	lru.insert("this", "lru");
	for _ in 0..3 {
	let _ = lru.get("this");
	}
	lru.insert("hello", "algorithm");
	lru.insert("auth", "tickbh");
	assert!(lru.len() == 3);
	lru.insert("auth1", "tickbh");
	assert_eq!(lru.get("this"), Some(&"lru"));
	assert_eq!(lru.get("hello"), None);
	assert!(lru.len() == 3);
	}