Learning Commonly Used Libraries

2022-05-19

Rust


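For real-world development, it pays to use similar crates in depth and compare them, so that a suitable solution is at hand when a practical problem comes up.

Rust is still evolving, so this document should be updated as the crates evolve.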

Standard library

Macros

Building a string from a template

format!()

Example:

#[derive(Debug)]
struct CustomError(String);

let ce = CustomError(format!("Error reading `{}`: {}", path, err));

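Notes on using `println!`

Note: every `println!` call flushes its output to the terminal, because it normally ends with a newline. If performance matters, use `println!` with care.

How can this be optimized?

Use BufWriter:

It wraps the stdout handle in a BufWriter,
whose default buffer size is 8 KB.
When the output must appear immediately, call .flush() to push the data buffered in the BufWriter out to stdout.
A second advantage: it is easy to take the stdout or stderr lock once and write through it with writeln!().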
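The points above can be sketched as follows. This is a minimal illustration, not a benchmark; the helper name `write_lines` is an assumption made for the example, and it is generic over the writer only so that it can be exercised without a terminal.

```rust
use std::io::{self, BufWriter, Write};

/// Writes `count` numbered lines to any writer (hypothetical helper).
fn write_lines<W: Write>(out: &mut W, count: usize) -> io::Result<()> {
    for i in 0..count {
        writeln!(out, "line {}", i)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    // Take the lock once and buffer the writes instead of flushing per line.
    let mut out = BufWriter::new(stdout.lock());
    write_lines(&mut out, 3)?;
    // Push whatever is still buffered out to stdout.
    out.flush()?;
    Ok(())
}
```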

std::env::args

Reads the command-line arguments.
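A small sketch of reading arguments, assuming the first user-supplied argument is the one of interest. The helper `first_arg` is hypothetical; it accepts any iterator of strings so it can be tried without a real command line.

```rust
use std::env;

/// Returns the first user-supplied argument, skipping the program name at index 0.
fn first_arg<I: Iterator<Item = String>>(mut args: I) -> Option<String> {
    args.nth(1)
}

fn main() {
    match first_arg(env::args()) {
        Some(arg) => println!("first argument: {}", arg),
        None => println!("no arguments given"),
    }
}
```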

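General programming concepts:

A destructor is a function that cleans up an instance;

a constructor is a function that creates an instance.

Constructors and destructors are paired concepts.

Rust's ownership system guarantees that references are always valid, and that drop is called automatically exactly once, when the value is no longer used.

To drop a value manually, use std::mem::drop.

The take method on Option<T> takes the Some value out of the Option and leaves None in its place.

See the API documentation of Option::take for more details and examples.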
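A short sketch of both ideas, `Option::take` and an explicit `drop`. The helper name `take_value` is an assumption made for illustration.

```rust
/// Moves the value out of `slot`, leaving `None` behind (hypothetical helper).
fn take_value(slot: &mut Option<String>) -> Option<String> {
    slot.take()
}

fn main() {
    let mut slot = Some(String::from("config"));
    let taken = take_value(&mut slot);
    println!("taken = {:?}, slot = {:?}", taken, slot);

    let buffer = vec![0u8; 1024];
    // `drop` in the prelude is std::mem::drop; it takes ownership, so the
    // value is cleaned up here instead of at the end of the scope.
    drop(buffer);
    // Using `buffer` after this point would be a compile error.
}
```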

About #[allow(dead_code)]

// #[allow(dead_code)] is an attribute that disables the dead_code lint

Purpose: put it on a function or method that is declared but never used, to silence the dead_code warning.

About cfg

the cfg attribute: #[cfg(...)] in attribute position
the cfg! macro: cfg!(...) in boolean expressions
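The two forms can be compared side by side. In this sketch the function names `platform` and `build_kind` are illustrative; the attribute removes the non-matching definition at compile time, while the macro only yields a boolean and both branches are still compiled.

```rust
// Attribute form: only one of these two definitions is compiled in.
#[cfg(target_os = "linux")]
fn platform() -> &'static str {
    "linux"
}

#[cfg(not(target_os = "linux"))]
fn platform() -> &'static str {
    "something other than linux"
}

// Macro form: evaluates to `true` or `false` at compile time.
fn build_kind() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

fn main() {
    println!("{} build on {}", build_kind(), platform());
}
```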

Using valgrind

we can double check for memory errors using valgrind

Option

Combinators

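Combinators manage control flow in a modular way.

Control flow covers conditional and looping constructs: if, match, for and so on are all control-flow expressions.

match is an effective way to handle an Option, but when only one kind of value matters, a full match is too heavy: the structure is verbose and harder to read and maintain.

There are generally two alternatives:

If you only care whether one pattern matches (the simple case), use if let;
If the checks are nested (the more complex case), the Combinators methods are more elegant.

Common methods: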

map()
and_then()

`map()`

.map() is a built-in method of Option that provides a simple mapping: Some -> Some, or None -> None.

The method signature is:

pub fn map<U, F: FnOnce(T) -> U>(self, f: F) -> Option<U>

Note the difference between the closure's return type and the method's return type:
f -> U, while map -> Option<U>.

The method can be chained, i.e. map(xx).map(xx).

`and_then()`

.and_then() avoids deep nesting; it is similar to flatMap() in other languages.

The method signature is:

pub fn and_then<U, F: FnOnce(T) -> Option<U>>(self, f: F) -> Option<U>

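Note the difference between the closure's return type and the method's return type:
f -> Option<U>, and_then -> Option<U>.

When the Option is None, and_then returns None.

Comparing the return types above:

Same: both methods return a value wrapped in Option.

Different: if the closure itself returns an Option, map adds one more layer and yields Option<Option<U>>, whereas and_then returns the closure's Option<U> directly.

Note: when map and and_then are nested or combined, keep an eye on the resulting type. and_then can be used for the outer step.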
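The difference in wrapping is easiest to see with a closure that itself returns an Option. The helper `parse_even` below is hypothetical and exists only for this sketch.

```rust
/// Parses a string and keeps the number only if it is even (hypothetical helper).
fn parse_even(text: &str) -> Option<i32> {
    text.parse::<i32>().ok().filter(|n| n % 2 == 0)
}

fn main() {
    let input = Some("42");

    // map with a plain function: Option<&str> -> Option<usize>.
    let length: Option<usize> = input.map(|s| s.len());

    // map with an Option-returning function adds a second layer.
    let nested: Option<Option<i32>> = input.map(parse_even);

    // and_then returns the inner Option directly.
    let flat: Option<i32> = input.and_then(parse_even);

    println!("{:?} {:?} {:?}", length, nested, flat);
}
```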

`ok()`

`ok_or()`

Converts Option<T> -> Result<T, E>.

The argument is the error value to use in the Err case.

pub fn ok_or<E>(self, err: E) -> Result<T, E>

struct EmptyVec;
let first = vec.first().ok_or(EmptyVec)?;

`ok_or_else()`

Transforms the Option<T> into a Result<T, E>, mapping [Some(v)] to [Ok(v)] and None to [Err(err())].

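Converts Option<T> -> Result<T, E>:

Some(v) -> Ok(v)
None -> Err(err())

The None -> Err conversion must be written by hand and passed in as the argument (a closure or function that produces the error).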
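A sketch contrasting the two methods. The function names and the `EmptyVec` error type are illustrative assumptions; the point is that ok_or receives a ready-made error value, while ok_or_else receives a closure that runs only for None.

```rust
#[derive(Debug, PartialEq)]
struct EmptyVec;

/// ok_or: the error value is built before the call.
fn first_item(items: &[i32]) -> Result<i32, EmptyVec> {
    items.first().copied().ok_or(EmptyVec)
}

/// ok_or_else: the closure runs, and the String is allocated, only on None.
fn first_item_with_message(items: &[i32]) -> Result<i32, String> {
    items
        .first()
        .copied()
        .ok_or_else(|| format!("expected at least one item, got {}", items.len()))
}

fn main() {
    println!("{:?}", first_item(&[3, 4]));
    println!("{:?}", first_item_with_message(&[]));
}
```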

Result

Ok(T)
Err(E)

.unwrap()

On success, returns the value T;
on failure, panics.

.unwrap_or_else()

On success, returns the value T;
on failure, calls the closure or function passed in.

fn echo()...

println!("`echo hello > a/b.txt`");
echo("hello", &Path::new("a/b.txt")).unwrap_or_else(|why| println!("! {:?}", why.kind()));

Using Result in main()

A typical main function looks like this:

fn main() {
    println!("Hello world");
}

When main needs to return a Result, it can be written like this:

use std::num::ParseIntError;

fn main() -> Result<(), ParseIntError> {
    let number_str = "10";
    let number = match number_str.parse::<i32>() {
        Ok(number) => number,
        Err(e) => return Err(e),
    };
    println!("{}", number);
    Ok(())
}
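The match with an early `return Err(e)` is the pattern the `?` operator abbreviates. A minimal sketch, with the hypothetical helper `parse_and_double`:

```rust
use std::num::ParseIntError;

/// Parses the text and doubles it; `?` returns early with the error on failure.
fn parse_and_double(text: &str) -> Result<i32, ParseIntError> {
    let number = text.parse::<i32>()?;
    Ok(number * 2)
}

fn main() -> Result<(), ParseIntError> {
    println!("{}", parse_and_double("10")?);
    Ok(())
}
```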

The combinators on Result are used in the same way as those on Option, and the same caveats apply.

map()

pub fn map<U, F: FnOnce(T) -> U>(self, op: F) -> Result<U, E>

Return types to note:
op -> U, map -> Result<U, E>

and_then()

pub fn and_then<U, F: FnOnce(T) -> Result<U, E>>(self, op: F) -> Result<U, E>

Return types to note:
op -> Result<U, E>, and_then -> Result<U, E>

A special case:

fn double_first_v3(vec: Vec<&str>) -> Result<Option<i32>, ParseIntError> {
    let opt = vec.first().map(|first| {
        first.parse::<i32>().map(|n| 2 * n)
    });

    // From the signature above, map() takes anything that implements FnOnce(T) -> U.
    // Some is passed directly as that function, a shorthand for |n| Some(n).
    opt.map_or(Ok(None), |r| r.map(Some))
}

`ok()`

Converts from Result<T, E> to Option<T>.

Converts self into an Option<T>, consuming self, and discarding the error, if any.

let x: Result<u32, &str> = Ok(2);
assert_eq!(x.ok(), Some(2));

let x: Result<u32, &str> = Err("Nothing here");
assert_eq!(x.ok(), None);

String

string.trim_matches(chars_to_trim)

Removes all repeated matches of the pattern from both the start and the end of the string.

Returns a string slice.

The [pattern] can be a [char], a slice of [char]s (&[char]), or a function or closure that determines if a character matches.
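The three pattern kinds can be tried as below. The helper names are illustrative only.

```rust
/// Trims with a slice of chars: strips any mix of single and double quotes.
fn strip_quotes(text: &str) -> &str {
    text.trim_matches(&['"', '\''][..])
}

/// Trims with a closure: strips leading and trailing ASCII digits.
fn strip_digits(text: &str) -> &str {
    text.trim_matches(|c: char| c.is_ascii_digit())
}

fn main() {
    // Trims with a single char; matches in the middle are left alone.
    println!("{}", "xxhixxhixx".trim_matches('x'));
    println!("{}", strip_quotes("\"'hello'\""));
    println!("{}", strip_digits("123abc456"));
}
```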

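Writing bytes by their hexadecimal value

A String wraps a Vec<u8> and a &str is a view into UTF-8 bytes, so when working with strings a byte can be written directly by its hexadecimal value.

Note: the hexadecimal value must be introduced with the \ escape character.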

// I'm writing Rust!;
let byte_escape = "I'm writing \x52\x75\x73\x74!";
println!("What are you doing\x3F (\\x3F means ?) {}", byte_escape);

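\x introduces a hexadecimal byte escape (in string literals only values up to \x7F are allowed);
\u{...} introduces a Unicode code point.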

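Raw strings

r"xxx" is a raw string: escapes are not processed and Rust outputs the content between the quotes as written.
If a raw string needs to contain a " double quote, delimit it with a pair of # symbols: r#"..."#.
If the content itself contains "#, add another pair of # symbols to the delimiters.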

Byte strings

A byte string literal is written b"..." and is a sequence of u8 bytes rather than a &str.
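A compact sketch of the literal forms described above. The function names are assumptions made so the values can be inspected.

```rust
/// Returns three raw strings: plain, with quotes, and with a `"#` inside.
fn raw_examples() -> (&'static str, &'static str, &'static str) {
    (
        r"C:\temp\new",       // backslashes are not escapes here
        r#"say "hi""#,        // one pair of # allows double quotes
        r##"a "# inside"##,   // two pairs allow the sequence "#
    )
}

/// A byte string literal is an array of u8, not a &str.
fn hello_bytes() -> &'static [u8] {
    b"hello"
}

fn main() {
    let (plain, quoted, hashed) = raw_examples();
    println!("{} | {} | {}", plain, quoted, hashed);
    println!("{:?}", hello_bytes());
}
```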

`String::from_utf8(vec: Vec<u8>)`

Typically used to convert a byte vector into a String.

pub fn from_utf8(vec: Vec<u8>) -> Result<String, FromUtf8Error>

Converts a vector of bytes to a String.

A string (String) is made of bytes (u8), and a vector of bytes ([Vec<u8>]) is made of bytes, so this function converts between the two. Not all byte slices are valid Strings, however: String requires that it is valid UTF-8. from_utf8() checks to ensure that the bytes are valid UTF-8, and then does the conversion.

If you need a [&str] instead of a String, consider std::str::from_utf8.

Note:

Not every [u8] (byte slice) is valid UTF-8, so this method checks the bytes and returns an error if they are not.
Rust requires a String to be valid UTF-8.
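A minimal sketch of both outcomes. The wrapper `decode` is a hypothetical name; the error's `into_bytes()` hands the original bytes back so nothing is lost on failure.

```rust
use std::string::FromUtf8Error;

/// Thin wrapper around String::from_utf8, used here only for illustration.
fn decode(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}

fn main() {
    // 0x52 0x75 0x73 0x74 are the ASCII bytes of "Rust".
    match decode(vec![0x52, 0x75, 0x73, 0x74]) {
        Ok(text) => println!("decoded: {}", text),
        Err(err) => println!("invalid UTF-8: {}", err),
    }

    // 0xFF can never appear in valid UTF-8, so this fails.
    if let Err(err) = decode(vec![0xff, 0xfe]) {
        println!("failed, original bytes: {:?}", err.into_bytes());
    }
}
```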

thread


use std::thread;
use std::time::Duration;
thread::sleep(Duration::from_secs(5));

The code above sleeps for 5 seconds.

It needs two items from the standard library: std::thread and std::time::Duration.

Iter

map()

When parsing fails, the error is returned as a value; no panic is triggered.

fn run_map() {
    let strings = vec!["tofu","93", "18"];
    // Each element is parsed on its own, so a failed parse shows up as an Err element in the Vec.
    let numbers: Vec<_>= strings.into_iter().map(|s| s.parse::<i32>()).collect();
    println!("Results: {:?}", numbers);
}

// Output:
Results: [Err(ParseIntError { kind: InvalidDigit }), Ok(93), Ok(18)]

filter_map()

filter_map() combined with ok() filters out the failures and keeps only the successful values.

fn main() {
    let strings = vec!["tofu", "93", "18"];
    let numbers: Vec<_> = strings
        .into_iter()
        .filter_map(|s| s.parse::<i32>().ok())
        .collect();
    println!("Results: {:?}", numbers);
}

Result implements FromIterator, so an iterator of Result<T, E> items can be collected into a Result<Vec<T>, E>.

As soon as a Result::Err is found, the iteration stops.
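A sketch of collecting into a Result, using the same kind of input as the examples above. The helper name `parse_all` is an assumption.

```rust
use std::num::ParseIntError;

/// Parses every item; the first failure becomes the overall Err.
fn parse_all(items: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    items.iter().map(|s| s.parse::<i32>()).collect()
}

fn main() {
    println!("{:?}", parse_all(&["93", "18"]));
    println!("{:?}", parse_all(&["tofu", "93", "18"]));
}
```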

The two forms do the same thing: iter.find(f) is equivalent to iter.filter(f).next().

Box

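A Box is a pointer, so on the stack it takes 64 bits (8 bytes) on a 64-bit target.

For a sized type wrapped with Box::new(), the space taken on the stack is always one pointer (8 bytes on a 64-bit target), however large the boxed value is.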
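The claim can be checked with std::mem::size_of. In this sketch `box_size` is a hypothetical helper; the last line shows that a Box of an unsized slice is a wider pointer, because it also carries the length.

```rust
use std::mem::size_of;

/// Size in bytes of a Box pointing at a sized value of type T.
fn box_size<T>() -> usize {
    size_of::<Box<T>>()
}

fn main() {
    println!("pointer width:   {} bytes", size_of::<usize>());
    println!("Box<u8>:         {} bytes", box_size::<u8>());
    println!("Box<[u8; 4096]>: {} bytes", box_size::<[u8; 4096]>());
    // Unsized contents need a pointer plus a length.
    println!("Box<[u8]>:       {} bytes", size_of::<Box<[u8]>>());
}
```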

Common methods on collections and iterators

collect()

Because collect() is so general, it can cause problems with type inference. As such, collect() is one of the few times you’ll see the syntax affectionately known as the ‘turbofish’: ::<>. This helps the inference algorithm understand specifically which collection you’re trying to collect into.

collect() uses the turbofish syntax, ::<>, to state the type of collection to produce; a type annotation on the binding works as well.
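Both ways of naming the target collection are shown in this sketch. The helper names are illustrative.

```rust
use std::collections::HashSet;

/// Turbofish on collect(): the target type is written at the call site.
fn squares(n: u32) -> Vec<u32> {
    (1..=n).map(|x| x * x).collect::<Vec<u32>>()
}

/// Type annotation on the binding: collect() infers the target from it.
fn unique_chars(text: &str) -> usize {
    let seen: HashSet<char> = text.chars().collect();
    seen.len()
}

fn main() {
    println!("{:?}", squares(3));
    println!("{}", unique_chars("hello"));
}
```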

Follow-up question: are all generic methods used this way?

Especially generic methods that work with traits.

Path

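The Path struct represents a file path of the underlying operating system.

Note: a Path is stored as an OsStr, a platform-dependent byte sequence, not as a UTF-8 string. Converting a Path to &str can therefore fail, because String and &str must be UTF-8 encoded.

Path::new()
path.display()
- returns a helper that implements the Display trait for the Path,
path.join()
- joins paths; there is no need to think about separators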
pathbuf.to_str()
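The methods listed above fit together as in this sketch. The directory and file names are placeholders chosen for the example.

```rust
use std::path::{Path, PathBuf};

/// Builds `<base>/app/config.toml` with the platform's separator.
fn config_path(base: &str) -> PathBuf {
    Path::new(base).join("app").join("config.toml")
}

fn main() {
    let path = config_path("etc");
    // display() is for printing; it may be lossy for non-UTF-8 paths.
    println!("{}", path.display());
    // to_str() returns None when the path is not valid UTF-8.
    match path.to_str() {
        Some(text) => println!("as &str: {}", text),
        None => println!("path is not valid UTF-8"),
    }
}
```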

File I/O

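Many things can go wrong during file I/O, so all File methods return io::Result<T>, which is an alias for Result<T, io::Error>.

`open`

Opens a file in read-only mode.

`create`

The create function opens a file in write-only mode.

If the file already exists, its old contents are destroyed.
Otherwise, a new file is created.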

lines()

Returns all the lines of a file as an iterator.

fn lines(self) -> Lines<Self>

Lines implements the Iterator trait.
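A sketch tying create, open and lines() together. The scratch file name `lines_demo.txt` and the helper `count_nonempty_lines` are assumptions made for this example; lines() comes from the BufRead trait, so the File is wrapped in a BufReader first.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Write};

/// Counts the non-empty lines of any buffered reader (hypothetical helper).
fn count_nonempty_lines<R: BufRead>(reader: R) -> io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        // Each item is an io::Result<String>; `?` propagates read errors.
        if !line?.trim().is_empty() {
            count += 1;
        }
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // Hypothetical scratch file in the system temp directory.
    let path = std::env::temp_dir().join("lines_demo.txt");

    // create: write-only; an existing file is truncated.
    let mut file = File::create(&path)?;
    file.write_all(b"first\n\nsecond\n")?;

    // open: read-only.
    let reader = BufReader::new(File::open(&path)?);
    println!("{} non-empty lines", count_nonempty_lines(reader)?);
    Ok(())
}
```

Because the reader is consumed line by line, the whole file never has to sit in memory at once.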

`std::io::Stdout`

Stdout expects bytes, i.e. u8,

not a String.

When passing a literal directly, use the b"xxx" form.

struct IncompleteUtf8 {
    bytes: [u8; 4],
    len: u8,
}

So to write data into it with write!() / writeln!(), use std::io::Write rather than std::fmt::Write.
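The two Write traits can be compared in one sketch: std::io::Write targets byte sinks such as Stdout or Vec<u8>, while std::fmt::Write targets text sinks such as String. The function names are illustrative, and the traits are imported under aliases only because both are needed in the same file.

```rust
use std::fmt::Write as FmtWrite;
use std::io::Write as IoWrite;

/// writeln! into a byte sink resolves to std::io::Write.
fn greeting_as_bytes(name: &str) -> Vec<u8> {
    let mut buffer: Vec<u8> = Vec::new();
    writeln!(buffer, "hello {}", name).expect("writing to a Vec<u8> does not fail");
    buffer
}

/// writeln! into a String resolves to std::fmt::Write.
fn greeting_as_string(name: &str) -> String {
    let mut text = String::new();
    writeln!(text, "hello {}", name).expect("writing to a String does not fail");
    text
}

fn main() -> std::io::Result<()> {
    let stdout = std::io::stdout();
    let mut out = stdout.lock();
    // Stdout takes bytes, so a literal is passed as b"...".
    out.write_all(b"raw bytes\n")?;
    out.write_all(&greeting_as_bytes("io"))?;
    out.write_all(greeting_as_string("fmt").as_bytes())?;
    Ok(())
}
```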

Child process

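Used to interact with operating-system commands.

The process::Output struct represents the output of a finished child process, and the process::Command struct is a process builder.

The std::process::Child struct represents a running child process and exposes its stdin, stdout and stderr handles, so that the underlying process can be driven through pipes.

child.wait() blocks the current thread until the child process has finished.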

let output: Result<Output, Error> = Command::new("rustc").output();

let process: Result<Child, Error> = Command::new("rustc").spawn();

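The output() method above is roughly spawn() + piped() + wait(): it spawns the child with piped stdout and stderr, waits for it to finish and collects its output.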

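* Common combination

In tests, std::process::Command is often combined with the assert_cmd crate.

Thanks to Rust's traits, a crate can define a new trait and implement it for an existing standard-library struct, thereby extending that struct with new functions and methods.

This is why, after a third-party crate is brought in, instances of the related standard-library structs have more methods to call.

For example:

The assert_cmd crate defines its own trait and implements it for the std::process::Command struct, thereby extending Command with new methods.

When coding in VS Code, clicking the "N implementations" hint shown above a struct lists the impl blocks found locally, both the struct's own impl blocks and the implementations of traits for that struct.

The assert_cmd crate runs a given binary from the command line and asserts on the result.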
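The extension-trait technique can be sketched with the standard library alone. The trait `VerboseExt` and its method are hypothetical and are not part of assert_cmd; they only show how a locally defined trait adds a method to std::process::Command.

```rust
use std::process::Command;

/// Hypothetical extension trait: adds a `verbose()` method to Command.
trait VerboseExt {
    fn verbose(&mut self) -> &mut Self;
}

impl VerboseExt for Command {
    fn verbose(&mut self) -> &mut Self {
        // Reuses the builder's own `arg` method.
        self.arg("--verbose")
    }
}

fn main() {
    let mut command = Command::new("rustc");
    // Available only while the VerboseExt trait is in scope.
    command.verbose().arg("--version");
    // The command is only built and printed here, not executed.
    println!("{:?}", command);
}
```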

`std::fs`

The std::fs module contains several functions for working with the file system.

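Common derives

Clone helps with Rust ownership problems.

Ownership is involved in:

move (ownership is transferred)
copy
- Copy (implemented by the primitive types; a type whose fields are all Copy can derive it too)
  - stack
- Clone
  - heap

Debug: for debugging output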
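A sketch of the three derives on two made-up types, `Point` and `Label`. Point holds only Copy fields, so it may derive Copy; Label owns a String on the heap, so it can only be cloned explicitly.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

#[derive(Debug, Clone, PartialEq)]
struct Label {
    text: String,
}

/// Takes a Point by value; the caller keeps its own copy.
fn shifted(point: Point) -> Point {
    Point { x: point.x + 1, ..point }
}

fn main() {
    let origin = Point { x: 0, y: 0 };
    let moved = shifted(origin); // copied implicitly, `origin` stays usable
    println!("{:?} -> {:?}", origin, moved);

    let first = Label { text: String::from("a") };
    let second = first.clone(); // explicit copy of the heap data
    let third = first;          // move: `first` can no longer be used
    println!("{:?} {:?}", second, third);
}
```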

Rust Cookbook

Argument Parsing - Rust Cookbook (rust-lang-nursery.github.io): how to use the command-line parsing crates.

Test

The Book chapter on testing
API Guidelines on doc-testing

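The predicates crate is often used in tests together with assert to write assertions. How exactly is it used?

When a test needs to read information from a file, there are two options:

Option 1: prepare a file in advance and put it in a known location;
Option 2: create a temporary file
- this option is more flexible;

The assert_fs crate can be used to create temporary files.

proptest is used for property-based tests;

a fuzzer is used to write fuzz tests that uncover edge-case bugs.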

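Handling errors

anyhow

For custom error messages, the third-party anyhow crate is recommended.

It lets you attach a custom message while keeping the original error.

For simple use, combine anyhow::Context with anyhow::Result. anyhow's error type can be built from errors that implement the standard library's error trait, so the compiler performs the conversion automatically.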

Tools for command-line programs

`indicatif`

Displays progress bars on the command line.

See the documentation and examples for more information.

clap

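Currently the most popular command-line library.

It parses command-line arguments and produces friendly output.

Drawback of std::fs::read_to_string(): it reads the entire file into memory, which can fail when the file is too large.

Optimization 1: use std::io::BufReader instead of read_to_string().

To support a --verbose flag that prints detailed information at runtime,

use the clap-verbosity-flag crate.

For usage details, see clap-verbosity-flag.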

Logging

`log`

A lightweight logging framework for Rust; it provides an API that abstracts over the actual logging implementation.

Log levels:

error, warn, info, debug, and trace

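error has the highest priority and trace the lowest.

Setting a lower-priority level also prints the messages of every higher priority.

In practice, log must be combined with a suitable logging implementation crate.

The log crate provides macros named after the log levels;
the logger adapter crates provide the way the log records are handled.

Adapters are flexible: with a suitable one, logs can be written to syslog, a log server and so on.

For example: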

[dependencies]
log = "0.4.0"
env_logger = "0.8.4"  # a logger adapter suited to CLI programs; makes it easy to set the log level from the command line

For more, see the description on crates.io:

log - crates.io: Rust Package Registry

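Serialization and deserialization

General serialization

serde

Currently the most popular library. Add it to Cargo.toml as shown below; the derive feature is usually enabled so that Serialize and Deserialize can be added directly with derive.

serde = { version = "1.0.136", features = ["derive"] }

Binary serialization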

bincode

Date and time libraries

chrono

Currently the most popular date and time library

chrono = "0.4.19"

Cryptography libraries

rust-crypto = "0.2.36"

