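Pydantic in Practice

1. Introduction

Official documentation
https://docs.pydantic.dev/latest/api/fields/

pydantic is a widely used library for defining and validating data-interface schemas.
It lets us define and consume data interfaces in a more disciplined way, which pays off in large projects.
Libraries such as valideer, marshmallow, trafaret, and cerberus offer similar functionality, but by comparison pydantic tends to perform better.
This article therefore covers only pydantic: how to define a standard schema and how to use it.
Most examples use the older pydantic v1 API (.json(), .dict(), validator, class Config); section 4 lists the v2 equivalents.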

Installation

pip install pydantic

2. Usage

2.1. Basic schema definition

pydantic models are defined by subclassing BaseModel; every pydantic data type is ultimately a BaseModel subclass. The most basic form is:

from pydantic import BaseModel

class Person(BaseModel):
    name: str

2.2. Basic schema instantiation

To use a model, simply instantiate it. There are several ways to do so:
Passing values directly

p = Person(name="Tom")
print(p.json()) # {"name": "Tom"}

Passing a dict

p = {"name": "Tom"}
p = Person(**p)
print(p.json()) # {"name": "Tom"}

Copying an existing instance

p2 = p.copy()
print(p2.json()) # {"name": "Tom"}

2.3. Error handling

When an invalid value is passed in, pydantic raises an error. For example:

Person(person="Tom")  # the field is named name, not person

pydantic raises a ValidationError:

ValidationError: 1 validation error for Person
name
  field required (type=value_error.missing)
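
In application code the error is usually caught rather than left to propagate. The sketch below is a minimal illustration, not the only way to do it: it catches ValidationError and reads the offending field paths from errors(). Only the "loc" key is inspected, because the error type strings differ between pydantic versions.

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    name: str


try:
    Person(person="Tom")  # wrong keyword: the model defines `name`
except ValidationError as exc:
    # errors() returns a list of dicts; "loc" is the path to the failing field
    missing = [err["loc"] for err in exc.errors()]
    print(missing)  # [('name',)]
```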

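2.4. Filtering extra parameters

Conversely, if more values are passed than the model defines, BaseModel silently drops the extras by default. For example: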

p = Person(name="Tom", gender="man", age=24)
print(p.json()) # {"name": "Tom"}

As shown, the extra parameters gender and age are filtered out automatically.
This makes data passing safer, but it also means the schema has to be defined as completely as possible up front.
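
If silently dropping unknown fields is not what you want, the extra setting can turn them into errors. The sketch below uses the class Config style found elsewhere in this article; StrictPerson is a hypothetical name, and pydantic v2 prefers model_config = ConfigDict(extra="forbid"), although it still accepts class Config with a deprecation warning.

```python
from pydantic import BaseModel, ValidationError


class StrictPerson(BaseModel):
    name: str

    class Config:
        # reject unknown fields instead of silently ignoring them
        extra = "forbid"


try:
    StrictPerson(name="Tom", gender="man", age=24)
    rejected = False
except ValidationError:
    rejected = True

print(rejected)  # True
```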

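2.5. Implicit type conversion

In addition, pydantic converts data types as values are passed in. If a value has the wrong type but can be converted to the right one, validation still succeeds. For example (this int-to-str coercion is pydantic v1 behaviour; pydantic v2 rejects an int for a str field by default):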

p = Person(name=123)
print(p.json()) # {"name": "123"}
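
One conversion that behaves the same in pydantic v1 and v2 is a numeric string passed to an int field. The sketch below (Counter is a hypothetical model) shows both a successful conversion and a value that cannot be converted.

```python
from pydantic import BaseModel, ValidationError


class Counter(BaseModel):
    count: int


ok = Counter(count="42")  # the string "42" is converted to the int 42
print(ok.count, type(ok.count).__name__)  # 42 int

try:
    Counter(count="abc")  # not convertible to int
    failed = False
except ValidationError:
    failed = True

print(failed)  # True
```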

3. pydantic data types

3.1. Basic data types

Here are some of the commonly used basic types in pydantic.

from pydantic import BaseModel
from typing import Dict, List, Sequence, Set, Tuple

class Demo(BaseModel):
    a: int # integer
    b: float # float
    c: str # string
    d: bool # boolean
    e: List[int] # list of integers
    f: Dict[str, int] # dict with str keys and int values
    g: Set[int] # set
    h: Tuple[str, int] # tuple

Example:

from pydantic import BaseModel
import typing as t

class MyModel(BaseModel):
    name: str = "John"
    age: int = 25
    is_student: bool = True
    grades: t.List[float] = [80.5, 91.3, 76.8]

model = MyModel()
print(model.__dict__)
print(model.dict())
#field_types = {k: type(v) for k, v in model.__dict__.items()}
field_types = {k: type(v) for k, v in model.dict().items()}
print(field_types)

Output:

root@056a3fddf212:/opt/config# python test_pydantic.py
{'name': 'John', 'age': 25, 'is_student': True, 'grades': [80.5, 91.3, 76.8]}
{'name': 'John', 'age': 25, 'is_student': True, 'grades': [80.5, 91.3, 76.8]}
{'name': <class 'str'>, 'age': <class 'int'>, 'is_student': <class 'bool'>, 'grades': <class 'list'>}

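3.2. Advanced data structures

This section shows how to implement some more complex data types.

3.2.1. Enum types

Enum types can be implemented with the standard enum library. For example: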

from enum import Enum

class Gender(str, Enum):
    man = "man"
    women = "women"
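
To show how such an enum behaves as a field type, the sketch below plugs Gender into a small model. The Person model here is added for illustration; the value "unknown" is a made-up invalid input.

```python
from enum import Enum

from pydantic import BaseModel, ValidationError


class Gender(str, Enum):
    man = "man"
    women = "women"


class Person(BaseModel):
    name: str
    gender: Gender


p = Person(name="Tom", gender="man")  # the raw string becomes an enum member
print(p.gender is Gender.man)  # True

try:
    Person(name="Tom", gender="unknown")  # not one of the enum values
    invalid_rejected = False
except ValidationError:
    invalid_rejected = True

print(invalid_rejected)  # True
```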

3.2.2. Optional types

If a field is not required and callers may omit it, annotate it with Optional from the typing library.

from typing import Optional
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: Optional[int] = None  # explicit default; pydantic v2 treats a bare Optional[int] as required

Note that an optional field still exists on the model with a default of None. When age is not passed in, Person still has an age attribute, and its value is None. For example:

p = Person(name="Tom")
print(p.json()) # {"name": "Tom", "age": null}

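3.2.3. Default values

The optional type above is really a special case of giving a field a default value, where the default happens to be None. Here is the more general way to assign a default.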

from pydantic import BaseModel

class Person(BaseModel):
    name: str
    gender: str = "man"

p = Person(name="Tom")
print(p.json()) # {"name": "Tom", "gender": "man"}

3.2.4. Allowing multiple types

If a field may hold more than one type, annotate it with Union from the typing library.

from typing import Union
from pydantic import BaseModel

class Time(BaseModel):
    time: Union[int, str]
        
t = Time(time=12345)
print(t.json()) # {"time": 12345}
t = Time(time = "2020-7-29")
print(t.json()) # {"time": "2020-7-29"}

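3.2.5. Passing data under a different name (Field)

Suppose an existing schema names a parameter A, but a later definition needs it to be called B. How do we pass data between the two names?
Field, through its alias parameter, handles this.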

from pydantic import BaseModel, Field

class Password(BaseModel):
    password: str = Field(alias = "key")

When constructing the model, the password field must then be passed using the keyword key.

p = Password(key="123456")
print(p.json()) # {"password": "123456"}
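
The alias can also be used on the way out. The sketch below dumps the model with by_alias=True so the output uses key again; the dump helper is an assumption added here so that the same code runs on pydantic v1 (dict) and v2 (model_dump).

```python
from pydantic import BaseModel, Field


class Password(BaseModel):
    password: str = Field(alias="key")


p = Password(key="123456")

# pick whichever dump method the installed pydantic version provides
dump = getattr(p, "model_dump", None) or p.dict

print(dump())               # {'password': '123456'}
print(dump(by_alias=True))  # {'key': '123456'}
```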

A problem encountered

from pydantic import BaseModel, Field

class Password(BaseModel):
    password: str = Field(alias = "key")
    
# Works: populated through the alias
tmp_dict = {"key":"67890"}
p1 = Password(**tmp_dict)
print(p1)
# Fails: by default the field can only be populated through its alias, not its field name
tmp_dict = {"password":"67890"}
p2 = Password(**tmp_dict)
print(p2)

Output

user@168cad4304b2:/opt/config# python3 test_pydantic.py
password='67890'
Traceback (most recent call last):
  File "test_pydantic.py", line 38, in <module>
    p2 = Password(**tmp_dict)
  File "pydantic/main.py", line 342, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for Password
key
  field required (type=value_error.missing)

The fix is to set allow_population_by_field_name in Config (pydantic v2 renames this option to populate_by_name):

from pydantic import BaseModel, Field

class Password(BaseModel):
    password: str = Field(alias = "key")

    class Config:
        allow_population_by_field_name = True
        
# Works: populated through the alias
tmp_dict = {"key":"67890"}
p1 = Password(**tmp_dict)
print(p1)
# Works: populated through the field name
tmp_dict = {"password":"67890"}
p2 = Password(**tmp_dict)
print(p2)

Output

user@168cad4304b2:/opt/config# python3 test_pydantic.py
password='67890'
password='67890'

Reference: https://www.cnpython.com/qa/1357031

More on Field

Reference: https://blog.csdn.net/qq_27371025/article/details/123305565

from pydantic import BaseModel, Field

class Item(BaseModel):
    name: str
    description: str = Field(None,
                             title="The description of the item",
                             max_length=10)
    price: float = Field(...,
                         gt=0,
                         description="The price must be greater than zero")
    tax: float = None

a = Item(name="yo yo", price=22.0, tax=0.9)
print(a.dict())  # {'name': 'yo yo', 'description': None, 'price': 22.0, 'tax': 0.9}

title and description show up in the schema_json output:

print(Item.schema_json(indent=2))
"""
{
  "title": "Item",
  "type": "object",
  "properties": {
    "name": {
      "title": "Name",
      "type": "string"
    },
    "description": {
      "title": "The description of the item",
      "maxLength": 10,
      "type": "string"
    },
    "price": {
      "title": "Price",
      "description": "The price must be greater than zero",
      "exclusiveMinimum": 0,
      "type": "number"
    },
    "tax": {
      "title": "Tax",
      "type": "number"
    }
  },
  "required": [
    "name",
    "price"
  ]
}
"""

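Field parameters

Field supplies extra information about a field and its validation.

default: (positional) the field's default value. Because Field replaces the plain default, its first argument sets the default. Use an ellipsis (...) to mark the field as required.
default_factory: a callable invoked whenever the field needs a default; useful, among other things, for dynamic defaults. Setting both default and default_factory is not allowed.
alias: the field's alias
description: documentation string
exclude: exclude this field when dumping the instance (.dict and .json)
include: include (only) this field when dumping the instance (.dict and .json)
const: if present, the value must equal the field's default
gt: for numeric values (int, float, Decimal), adds a "greater than" validation and an exclusiveMinimum annotation to the JSON Schema
ge: for numeric values, adds a "greater than or equal" validation and a minimum annotation to the JSON Schema
lt: for numeric values, adds a "less than" validation and an exclusiveMaximum annotation to the JSON Schema
le: for numeric values, adds a "less than or equal" validation and a maximum annotation to the JSON Schema
multiple_of: for numeric values, adds a "multiple of" validation and a multipleOf annotation to the JSON Schema
max_digits: for Decimal values, validates the maximum number of digits. Zeros before the decimal point and trailing decimal zeros are not counted.
decimal_places: for Decimal values, validates the maximum number of decimal places. Trailing decimal zeros are not counted.
min_items: for list values, adds the corresponding validation and a minItems annotation to the JSON Schema
max_items: for list values, adds the corresponding validation and a maxItems annotation to the JSON Schema
unique_items: for list values, adds the corresponding validation and a uniqueItems annotation to the JSON Schema
min_length: for string values, adds the corresponding validation and a minLength annotation to the JSON Schema
max_length: for string values, adds the corresponding validation and a maxLength annotation to the JSON Schema
allow_mutation: a boolean, default True. When False, assigning to the field on an instance raises a TypeError. The model config must set validate_assignment to True for this check to run.
regex: for string values, adds a regular-expression validation built from the given string and a pattern annotation to the JSON Schema
repr: a boolean, default True. When False, the field is hidden from the object's representation.
**: any other keyword arguments (e.g. examples) are added verbatim to the field's schema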
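
A few of these parameters in action, as a minimal sketch: max_length and gt add constraints, while default_factory gives every instance its own fresh list. The tags field is a hypothetical addition; only parameters that exist under the same name in pydantic v1 and v2 are used.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class Item(BaseModel):
    name: str = Field(..., max_length=10)
    price: float = Field(..., gt=0)
    tags: List[str] = Field(default_factory=list)  # a new list per instance


a = Item(name="pen", price=1.5)
b = Item(name="ink", price=2.0)
a.tags.append("office")
print(b.tags)  # [] because the default list is not shared between instances

try:
    Item(name="pen", price=0)  # violates gt=0
    bad = []
except ValidationError as exc:
    bad = [err["loc"] for err in exc.errors()]

print(bad)  # [('price',)]
```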

3.2.6. Multi-level schema definition

Here is a more complex example of a pydantic schema built from nested models.

from enum import Enum
from typing import List, Union
from datetime import date
from pydantic import BaseModel

class Gender(str, Enum):
    man = "man"
    women = "women"

class Person(BaseModel):
    name : str
    gender : Gender
        
class Department(BaseModel):
    name : str
    lead : Person
    cast : List[Person]
        
class Group(BaseModel):
    owner: Person
    member_list: List[Person] = []

class Company(BaseModel):
    name: str
    owner: Union[Person, Group]
    regtime: date
    department_list: List[Department] = []

Note that besides instantiating the models step by step, if we already have a complete dict describing a Company, we can instantiate everything in one go.

sales_department = {
    "name": "sales",
    "lead": {"name": "Sarah", "gender": "women"},
    "cast": [
        {"name": "Sarah", "gender": "women"},
        {"name": "Bob", "gender": "man"},
        {"name": "Mary", "gender": "women"}
    ]
}

research_department = {
    "name": "research",
    "lead": {"name": "Allen", "gender": "man"},
    "cast": [
        {"name": "Jane", "gender": "women"},
        {"name": "Tim", "gender": "man"}
    ]
}

company = {
    "name": "Fantasy",
    "owner": {"name": "Victor", "gender": "man"},
    "regtime": "2020-07-23",
    "department_list": [
        sales_department,
        research_department
    ]
}

company = Company(**company)
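
After one-shot instantiation, nested dicts have been converted into model instances, so they can be accessed by attribute. The sketch below demonstrates this on a cut-down version of the schema (a Person with only a name), chosen to keep the example short.

```python
from typing import List

from pydantic import BaseModel


class Person(BaseModel):
    name: str


class Department(BaseModel):
    name: str
    lead: Person
    cast: List[Person] = []


dept = Department(**{
    "name": "sales",
    "lead": {"name": "Sarah"},
    "cast": [{"name": "Bob"}, {"name": "Mary"}],
})

print(type(dept.lead).__name__)        # Person
print([m.name for m in dept.cast])     # ['Bob', 'Mary']
```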

3.3. Data validation

pydantic checks the basic types above out of the box. Beyond that, validator and Config allow more complex type definitions and checks.

3.3.1. Using validator

With validator we can run more elaborate checks on a field.

import re
from pydantic import BaseModel, validator

class Password(BaseModel):
    password: str

    @validator("password")
    def password_rule(cls, password):
        # 6 to 20 characters with at least one lowercase letter, one uppercase letter, and one digit
        def is_valid(password):
            if len(password) < 6 or len(password) > 20:
                return False
            if not re.search(r"[a-z]", password):
                return False
            if not re.search(r"[A-Z]", password):
                return False
            if not re.search(r"\d", password):
                return False
            return True
        if not is_valid(password):
            raise ValueError("password is invalid")
        # a validator must return the value, otherwise the field is set to None
        return password

In this way we can impose additional format rules on the password: its length and the kinds of characters it must contain.
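
The sketch below exercises such a validator with one strong and one weak password. The compat_validator import is an assumption added so that the same code runs on pydantic v2, where the decorator is called field_validator, as well as on v1; the rule itself is condensed into a single expression.

```python
import re

from pydantic import BaseModel, ValidationError

try:
    from pydantic import field_validator as compat_validator  # pydantic v2 name
except ImportError:
    from pydantic import validator as compat_validator  # pydantic v1 name


class Password(BaseModel):
    password: str

    @compat_validator("password")
    @classmethod
    def password_rule(cls, value):
        ok = (
            6 <= len(value) <= 20
            and re.search(r"[a-z]", value)
            and re.search(r"[A-Z]", value)
            and re.search(r"\d", value)
        )
        if not ok:
            raise ValueError("password is invalid")
        return value  # always return the (possibly modified) value


good = Password(password="Abc123")
print(good.password)  # Abc123

try:
    Password(password="abc")  # too short, no uppercase letter, no digit
    weak_rejected = False
except ValidationError:
    weak_rejected = True

print(weak_rejected)  # True
```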

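3.3.2. Using Config

To apply one uniform format rule to every field of a given basic type in a BaseModel, we can use Config instead. The two options below are the pydantic v1 names; v2 calls them str_min_length and str_max_length.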

from pydantic import BaseModel

class Password(BaseModel):
    password: str
        
    class Config:
        min_anystr_length = 6 # every string in Password must be at least 6 characters long
        max_anystr_length = 20 # every string in Password must be at most 20 characters long
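
When only one field needs the length limits, a per-field constraint is an alternative to a model-wide Config setting. The sketch below uses Field with min_length and max_length, which keep the same names in pydantic v1 and v2.

```python
from pydantic import BaseModel, Field, ValidationError


class Password(BaseModel):
    # the limits apply to this field only, not to every string in the model
    password: str = Field(..., min_length=6, max_length=20)


accepted = Password(password="secret1")
print(accepted.password)  # secret1

try:
    Password(password="abc")  # shorter than 6 characters
    too_short_rejected = False
except ValidationError:
    too_short_rejected = True

print(too_short_rejected)  # True
```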

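4. Model attributes and methods

Reference:
https://blog.csdn.net/footless_bird/article/details/134183693
Old API (pydantic v1):

dict(): returns a dict of the model's fields and values
__dict__: holds the same data as dict() for flat models (nested models are not converted to dicts)
json(): returns a JSON string representation of dict()
copy(): returns a copy of the model (shallow by default; pass deep=True for a deep copy)
parse_obj(): a utility for loading any object into a model, with error handling if the object is not a dict
parse_raw(): a utility for loading strings of numerous formats
parse_file(): like parse_raw(), but for files
from_orm(): loads data into a model from an arbitrary class
schema(): returns a dict representing the model as JSON Schema
schema_json(): returns a JSON string representation of schema()
construct(): a class method for creating models without running validation
__fields_set__: the set of field names that were set when the model instance was initialised
__fields__: a dict of the model's fields
__config__: the model's configuration class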

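New API (pydantic v2):

Class attributes:
model_fields: a dict holding the FieldInfo object for each field of the model. FieldInfo carries details such as the field's type and default value.
Class methods:
model_construct(): creates a model without running validation
model_validate(): creates a model instance from a model object or a dict
model_validate_json(): creates a model instance from a JSON string
Instance methods:
model_copy(): creates a copy of the model.
model_dump(): converts the model to a dict of field names and values.
model_dump_json(): converts the model to a JSON string.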
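
Code that has to run on both API generations can pick the method at runtime. The sketch below is one possible approach under that assumption: to_dict and from_dict are hypothetical helpers that prefer the v2 names and fall back to the v1 ones.

```python
from pydantic import BaseModel


class Person(BaseModel):
    name: str
    age: int = 0


def to_dict(model):
    """Dump a model with model_dump() (v2) or dict() (v1), whichever exists."""
    dump = getattr(model, "model_dump", None) or model.dict
    return dump()


def from_dict(model_cls, data):
    """Build a model with model_validate() (v2) or parse_obj() (v1)."""
    validate = getattr(model_cls, "model_validate", None) or model_cls.parse_obj
    return validate(data)


p = Person(name="Tom", age=24)
data = to_dict(p)
restored = from_dict(Person, data)

print(data)           # {'name': 'Tom', 'age': 24}
print(restored == p)  # True
```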

4.1. Converting a list to a string

import json

my_list = [{"name": "Tom", "age": 25}, {"name": "Bob", "age": 30}]
json_list = json.dumps([dict(item) for item in my_list])
print(json_list)
# Output: [{"name": "Tom", "age": 25}, {"name": "Bob", "age": 30}]

If the list holds nested objects such as model instances rather than plain dicts, convert each item with .dict() first:

json_list = json.dumps([dict(item.dict()) for item in my_list])

Reference: https://www.python100.com/html/89632.html
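
For a list of pydantic model instances, a self-contained version might look like the sketch below. The User model and the as_dict helper are assumptions for illustration; the helper picks model_dump() on pydantic v2 and dict() on v1.

```python
import json

from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


def as_dict(model):
    """Return the model's fields as a plain dict on either pydantic version."""
    dump = getattr(model, "model_dump", None) or model.dict
    return dump()


users = [User(name="Tom", age=25), User(name="Bob", age=30)]
json_list = json.dumps([as_dict(u) for u in users])
print(json_list)  # [{"name": "Tom", "age": 25}, {"name": "Bob", "age": 30}]
```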

Example 1: parsing a database result

Reference: https://www.coder.work/article/7909602

from pydantic import BaseModel

class Ohlc(BaseModel):
    close_time: float
    open_time: float
    high_price: float
    low_price: float
    close_price: float

# A row returned by the database, equivalent to a list
data = [
  1495324800,
  1495336800,
  242460,
  231962,
  242460
]

ohlc = Ohlc(**{key: data[i] for i, key in enumerate(Ohlc.model_fields.keys())})
_keys = Ohlc.model_fields.keys()
_tup = tuple(_keys)
print(type(_keys), _keys)
print(type(_tup), _tup)
print(type(ohlc), ohlc)

Output:

<class 'dict_keys'> dict_keys(['close_time', 'open_time', 'high_price', 'low_price', 'close_price'])
<class 'tuple'> ('close_time', 'open_time', 'high_price', 'low_price', 'close_price')
<class '__main__.Ohlc'> close_time=1495324800.0 open_time=1495336800.0 high_price=242460.0 low_price=231962.0 close_price=242460.0
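
Relying on the model's field order ties the mapping to the order in which fields are declared. A variant that spells out the column order and checks the row length is sketched below; COLUMNS is a hypothetical constant that should match the column order of the query, and the code does not depend on model_fields, so it also runs on pydantic v1.

```python
from pydantic import BaseModel


class Ohlc(BaseModel):
    close_time: float
    open_time: float
    high_price: float
    low_price: float
    close_price: float


# explicit column order of the (assumed) database query
COLUMNS = ("close_time", "open_time", "high_price", "low_price", "close_price")

row = [1495324800, 1495336800, 242460, 231962, 242460]

if len(row) != len(COLUMNS):
    raise ValueError("row length does not match the expected columns")

ohlc = Ohlc(**dict(zip(COLUMNS, row)))  # pair each column name with its value
print(ohlc.high_price)  # 242460.0
```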

Example 2: excluding (hiding) fields

To exclude a field, set exclude in Field.
Compare with Examples 3 and 4.

from typing import Optional, List, Dict, Union
from pydantic import BaseModel, Field

class BaseModelEx(BaseModel):
    def dictex(self, **kwargs):
        return super().dict(**kwargs)

class ItemBase(BaseModelEx):
    count: int = 0
    total: int = 0
    maximum: int = 0
    minimum: int = 0
    average: float = Field(exclude=True, title="val")

if __name__ == '__main__':
    item = ItemBase(average=1)
    print(item.dict())
    print(item.dictex())

Output:

{'count': 0, 'total': 0, 'maximum': 0, 'minimum': 0}
{'count': 0, 'total': 0, 'maximum': 0, 'minimum': 0}

Example 3: excluding (hiding) fields

Compare with Examples 2 and 4.

from typing import Optional, List, Dict, Union
from pydantic import BaseModel, Field

class BaseModelEx(BaseModel):
    # Hide fields according to Config
    def dict_plus(self, **kwargs):
        include = getattr(self.Config, "include", set())
        if len(include) == 0:
            include = None

        exclude = getattr(self.Config, "exclude", set())
        if len(exclude) == 0:
            exclude = None

        return super().dict(include=include, exclude=exclude, **kwargs)

    # Override dict
    def dict(self, **kwargs):
        return super().dict(exclude={"total"}, **kwargs)

class ItemBase(BaseModelEx):
    count: int = 0
    total: int = 0
    maximum: int = 0
    minimum: int = 0
    # Hide the field via Field.exclude
    average: float = Field(exclude=True, title="val")

    class Config:
        exclude = {"maximum", "minimum"}
        
if __name__ == '__main__':
    #item = ItemBase(count=1, total=1, maximum=1, minimum=1, average=1)
    item = ItemBase(average=1)
    print(item.dict())
    print(item.dict_plus())

Output:

{'count': 0, 'maximum': 0, 'minimum': 0}
{'count': 0, 'total': 0}

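Example 4: excluding (hiding) fields

The advantage of keeping the excluded names in a config-style class is that the list can be read back, e.g. print(User.ConfigEX.exclude).
Compare with Examples 2 and 3.
Reference: https://www.soinside.com/question/j3mMn2VMd7mpuZyq7NBuT3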

from pydantic import BaseModel
from typing import Optional

class CustomBase(BaseModel):
    def model_dump_in(self, **kwargs):
        include = getattr(self.ConfigEX, "include", set())
        print("include", include)
        if len(include) == 0:
            include = None
		
        # The set() default is optional; it only supplies a fallback when the attribute is missing
        exclude = getattr(self.ConfigEX, "exclude", set())
        #exclude = getattr(self.ConfigEX, "exclude")
        print("exclude", exclude)
        if len(exclude) == 0:
            exclude = None
        return super().model_dump(include=include, exclude=exclude, **kwargs)

class User(CustomBase):
    name :str = ...
    family :str = ...
    age : Optional[int] = 0

    class ConfigEX:
        exclude = {"family", "age"}  # exclude both family and age
        #exclude = {"family"}        # exclude family only

u = User(**{"name":"milad","family":"vayani"})

print(u.model_dump_in())
print(u.model_dump())
print(u.model_dump(exclude={"name", "age"}))  # exclude name and age

Output:

include set()
exclude {'family', 'age'}
{'name': 'milad'}	# family and age are excluded
{'name': 'milad', 'family': 'vayani', 'age': 0}
{'family': 'vayani'}

Class inheritance:

import sys
from pydantic import BaseModel
from typing import Optional

class CustomBase(BaseModel):
    def dict_exclude(self, **kwargs):
        include = getattr(self.ConfigEX, "include", set())
        if len(include) == 0:
            include = None

        exclude = getattr(self.ConfigEX, "exclude", set())
        if len(exclude) == 0:
            exclude = None

        if hasattr(BaseModel, "model_dump"):  # pydantic v2; test the library's API, not the Python version
            return super().model_dump(include=include, exclude=exclude, **kwargs)
        else:
            return super().dict(include=include, exclude=exclude, **kwargs)

class Man(CustomBase):
    name :str = ...
    age : Optional[int] = 0

    class ConfigEX:
        exclude = {"family", "age"}     # exclude both family and age
        #exclude = {"family"}           # exclude family only

class User(Man):
    family :str = ...

m = Man(**{"name":"wocao"})
print(m.dict_exclude())

u = User(**{"name":"milad","family":"vayani"})
print(u.dict_exclude())
print(u.dict())
#print(u.model_dump(exclude={"name", "age"}))   # exclude name and age

Output:

root@37c75797034c:/opt/config# python test_pydantic.py
{'name': 'wocao'}			# age is excluded
{'name': 'milad'}
{'name': 'milad', 'age': 0, 'family': 'vayani'}

References:
https://blog.csdn.net/codename_cys/article/details/107675748
https://www.cnblogs.com/dyl0/articles/16896330.html

Original article: https://blog.csdn.net/cliffordl/article/details/134070532