Initial commit
28
.env
Normal file
@@ -0,0 +1,28 @@
DJANGO_SECRET_KEY=GkfYWT9b6xmPNmthn4ficOrYkLnW3dmg8FGgICRR8aymCZC7MNVF0vvzhVGDzxJkSfploEzlPFN8EEP9d9K2RbdQllDYtdNTODOzcCbIFVzfjQ8CTfWVAmi6qp33qi90
DJANGO_DEBUG=True
DJANGO_ALLOWED_HOSTS=*

# Database config
DB_ENGINE=sqlite
DB_NAME=pygoedge
DB_USER=root
DB_PASSWORD=
DB_HOST=127.0.0.1
DB_PORT=3306

# GoEdge settings
# Admin API base URL (HTTP endpoint)
GOEDGE_ADMIN_API_BASE_URL=https://backend.dooki.cloud
# If AccessKeyId/AccessKey are not provided, a token can be set directly (not recommended for long-term use)
# GOEDGE_ACCESS_TOKEN=
# Recommended: provide the admin AccessKeyId/AccessKey; the system will fetch and cache a token automatically
GOEDGE_ACCESS_KEY_ID=pmP4MVmhYl8fgVpu
GOEDGE_ACCESS_KEY=6YWEYwNC6dVKXv009MniUu37H8XOwG2v

# Default node cluster ID required for creating sites (must be set to a valid cluster ID)
# See "Cluster Management" in the GoEdge admin console
GOEDGE_DEFAULT_NODE_CLUSTER_ID=

# System defaults
DEFAULT_FREE_TRAFFIC_GB_PER_DOMAIN=15
CNAME_TEMPLATE={sub}.cdn.example.com
23
.env.example
Normal file
@@ -0,0 +1,23 @@
DJANGO_ENV=dev

# Django
DJANGO_SECRET_KEY=change-me
DJANGO_ALLOWED_HOSTS=localhost,127.0.0.1
DJANGO_DEBUG=True
DJANGO_CSRF_TRUSTED_ORIGINS=

# Database (default sqlite). To use MySQL, uncomment and fill:
# DB_ENGINE=mysql
# DB_NAME=pygoedge
# DB_USER=root
# DB_PASSWORD=your-password
# DB_HOST=127.0.0.1
# DB_PORT=3306
# DB_CONN_MAX_AGE=60

# GoEdge Admin API
GOEDGE_ADMIN_API_BASE_URL=https://your-goedge-host:port
GOEDGE_ACCESS_KEY_ID=your-access-key-id
GOEDGE_ACCESS_KEY=your-access-key
GOEDGE_ACCESS_TOKEN=
GOEDGE_DEFAULT_NODE_CLUSTER_ID=1
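The `DB_*` block above is typically consumed in `settings.py`. A minimal sketch of that resolution, falling back to sqlite when `DB_ENGINE` is unset (the helper name is illustrative, not the project's actual code):

```python
import os

def database_config(env=os.environ):
    """Build a Django DATABASES['default'] dict from DB_* variables.

    Falls back to sqlite when DB_ENGINE is unset, mirroring the
    commented-out MySQL block in .env.example.
    """
    if env.get("DB_ENGINE", "sqlite") == "mysql":
        return {
            "ENGINE": "django.db.backends.mysql",
            "NAME": env.get("DB_NAME", "pygoedge"),
            "USER": env.get("DB_USER", "root"),
            "PASSWORD": env.get("DB_PASSWORD", ""),
            "HOST": env.get("DB_HOST", "127.0.0.1"),
            "PORT": env.get("DB_PORT", "3306"),
            # Persistent connections; 0 closes the connection per request.
            "CONN_MAX_AGE": int(env.get("DB_CONN_MAX_AGE", "0")),
        }
    return {"ENGINE": "django.db.backends.sqlite3", "NAME": "db.sqlite3"}
```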
80
.trae/documents/推进 PyGoEdge 面板开发(Django+GoEdge).md
Normal file
@@ -0,0 +1,80 @@
## Current Status Snapshot

- Accounts/auth: login, registration, profile, password change, and login history are in place (accounts/views.py:33, 81, 121).
- Domain onboarding: create domain → call GoEdge to create the service → generate CNAME → DNS check (domains/views.py:42, 224).
- Traffic stats: management commands pull daily traffic and bandwidth into `DomainTrafficDaily` (domains/management/commands/pull_traffic.py:12; pull_daily_stats.py:11).
- Plans and features: Plan model plus create/edit/toggle/visibility in the operations panel (plans/models.py:4; admin_panel/views.py:154).
- Billing: invoices generated per calendar month (plan fee + overage fee), with user-side and operator-side invoice list/detail/export/mark-paid (billing/management/commands/generate_invoices.py:40; billing/views.py:14; admin_panel/views.py:406).
- System settings: GoEdge API, default quotas, policy ID, cluster ID, captcha, and anomaly-detection configuration (core/models.py:4; admin_panel/forms.py:9; admin_panel/views.py:47).
- GoEdge wrapper: token management, site creation, daily stats/bandwidth queries, webId/sslPolicyId lookup, and syncing of access logs/WebSocket/WAF/HTTP3 (core/goedge_client.py:20).
- Templates: user and operator pages are basically done; the domain detail page has stats and GoEdge status placeholders (templates/domains/detail.html).

## Target Increments (two-week iteration)

1) Extend "domain feature settings" sync coverage to GoEdge: cache, path rules, rewrites, headers, HTTPS redirect, referer/hotlink protection, RemoteAddr, and other common capabilities.
2) Add an "access log browser" page (paginated queries by day/hour/IP/keyword).
3) Enforce overage/unpaid policies (suspend or rate-limit), wiring invoices to domain status.
4) Chart visualization and UX polish on the domain detail page (7/30-day traffic curves with peak annotations).
5) Robustness: turn swallowed exceptions into logged ones; clearer MySQL connection/config diagnostics; richer operation logs.
6) Runbook and scheduled tasks: commands and frequencies for Windows Task Scheduler or Linux cron.
7) Test cases: unit/integration tests for the key flows.

## Iteration Breakdown

### Iteration 1: Complete GoEdge sync (feature settings)

- Extend `GoEdgeClient`:
  - `updateHTTPWebCache` (HTTPWebService/updateHTTPWebCache) maps `cache_rules_json` to a minimal configuration.
  - `updateHTTPWebLocations` maps paths/actions from `page_rules_json` (simplest form: route by prefix/regex to cache/rewrite).
  - `updateHTTPWebRewriteRules` supports simple rewrites (from → to).
  - `updateHTTPWebRequestHeader` / `updateHTTPWebResponseHeader` support injecting common headers (e.g. HSTS, Cache-Control).
  - `updateHTTPWebRedirectToHTTPS` for one-click HTTPS redirect (driven by a plan feature flag or a custom field).
  - `updateHTTPWebReferers` for a minimal hotlink-protection allow/deny list.
  - `updateHTTPWebRemoteAddr` to resolve the real client IP from X-Forwarded-For (feeds WAF/rate limiting).
- Extend `domains/views.py:340 domain_settings`: after saving, call the above APIs to sync based on `webId/sslPolicyId`; on failure, use `messages.warning` and write an `OperationLog`.
- Agree on the JSON shape: the first version uses a simplified schema (key names match the GoEdge docs), with minimal validation and Base64 encoding on the backend.
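The "minimal validation + Base64 encoding" step above could be sketched as follows. The allowed key set and helper name are illustrative assumptions, not existing project code; only the plan's decision (validate a simplified JSON schema, then Base64-encode it before calling the GoEdge update endpoint) is taken from the text:

```python
import base64
import json

# Illustrative subset of keys the simplified cache-rule schema might allow.
ALLOWED_CACHE_KEYS = {"isOn", "key", "life", "status"}

def encode_cache_rules(cache_rules_json: str) -> str:
    """Validate a simplified cache-rules JSON string and return it
    Base64-encoded, as planned for the backend before calling
    updateHTTPWebCache."""
    rules = json.loads(cache_rules_json)
    if not isinstance(rules, list):
        raise ValueError("cache rules must be a JSON array")
    for rule in rules:
        unknown = set(rule) - ALLOWED_CACHE_KEYS
        if unknown:
            raise ValueError(f"unknown keys: {sorted(unknown)}")
    return base64.b64encode(json.dumps(rules).encode()).decode()
```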

### Iteration 2: Access log browser

- Add route `/domains/<id>/logs/` and a template:
  - Use `HTTPAccessLogService/listHTTPAccessLogs` with day/hourFrom/hourTo/size/reverse/ip/keyword filters.
  - Provide pagination (based on `requestId` and `hasMore`).
  - Display fields: time, domain, IP, method, path, status code, bytes, UA, WAF hit.
- Permissions: only the domain owner or staff may access.
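Cursor pagination over the access-log API could take this shape. The generator and the `list_http_access_logs` wrapper are illustrative sketches, not existing code; the response fields follow the `requestId`/`hasMore` convention named above:

```python
def iter_access_logs(client, day, size=100, **filters):
    """Yield access-log entries page by page.

    `client.list_http_access_logs` is assumed to wrap
    HTTPAccessLogService/listHTTPAccessLogs and return a dict with
    `accessLogs`, `requestId` (the cursor for the next page) and
    `hasMore`.
    """
    request_id = None
    while True:
        page = client.list_http_access_logs(
            day=day, size=size, request_id=request_id, **filters
        )
        yield from page.get("accessLogs", [])
        if not page.get("hasMore"):
            break
        # Carry the server-issued cursor into the next request.
        request_id = page["requestId"]
```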

### Iteration 3: Billing linkage and policy enforcement

- After invoice generation (unpaid):
  - The operations panel gets an "enforce overage policy" entry:
    - Suspend: enable `HTTPWebService/updateHTTPWebShutdown`; set the domain status to `suspended`.
    - Rate limit: `HTTPWebService/updateHTTPWebRequestLimit` with a minimum rate threshold (readable from `SystemSettings.default_overage_policy`).
  - After payment: automatic/manual recovery (disable Shutdown or remove the RequestLimit; the domain status returns to `active`).
- Management command or background task: periodically scan unpaid invoices and enforce policies in batch (safety guard: only applies to `active` domains).

### Iteration 4: Visualization and UX

- Domain detail page:
  - Use Chart.js to draw a dual-axis chart of the last 30 days' traffic (GB) and peak bandwidth (Mbps); 7-day summary cards.
  - Clear CNAME propagation hints with a retry entry; grouped GoEdge status cards (access logs/WebSocket/WAF/HTTP3).

### Iteration 5: Robustness and logging

- Replace `except Exception: pass` with logged exceptions:
  - e.g. `core/goedge_client.py:388,407`, `admin_panel/views.py:101,126,330,357,392,451,466,482`, `billing/views.py:51`, `accounts/views.py:68`, `settings/base.py:68`.
- MySQL check: if `DB_ENGINE=mysql` but `pymysql.install_as_MySQLdb()` fails, show clearer installation/environment guidance.
- Operation logs: record failure reasons and result summaries for easier auditing.

### Iteration 6: Runbook and scheduled tasks

- Documentation (an explanatory block on the operations panel "System Settings" page):
  - Daily at 02:00: `python manage.py pull_daily_stats --days 1`.
  - Monthly on the 1st at 01:00: `python manage.py generate_invoices --overwrite`.
  - Windows Task Scheduler / Linux cron configuration examples and common troubleshooting.

### Iteration 7: Tests

- Unit tests:
  - `GoEdgeClient` token renewal and error handling (mock requests).
  - Edge cases for `check_cname_map` and `bytes_to_gb`.
- Integration tests:
  - The closed loop of domain onboarding → DNS check → traffic ingestion → invoice generation → (policy enforcement).

## Acceptance and Delivery

- Domain settings sync: enabling a feature takes effect in GoEdge (access logs, WAF, WebSocket, HTTP/3, cache, rewrites, headers, HTTPS redirect, referer).
- Access log browser: paginated viewing for a given date range, with keyword/IP filtering.
- Billing linkage: unpaid invoices can suspend/rate-limit a domain with one click; payment restores it.
- Charts and UX: the domain detail page visualizes data clearly with explicit operation feedback.
- Robustness: key swallowed-exception sites log errors; configuration mistakes surface visible hints.
- Tests: core flows are covered and pass.

If this plan is confirmed, I will implement and commit in iteration order (keeping the existing Django+Bootstrap style and reusing the established models and template structure).
451
.trae/rules/project_rules.md
Normal file
@@ -0,0 +1,451 @@
# Project Development Requirements (Django + Bootstrap + MySQL)

> This document tells the AI Agent to build a **domain acceleration panel similar to Cloudflare / Baidu Cloud Acceleration**,
> using **Python + Django (server-rendered, no frontend/backend split) + Bootstrap + MySQL**, with the GoEdge admin API as the underlying CDN engine.

---

## 1. Project Goals

- Build a **per-domain-billed CDN acceleration control panel** where users onboard domains via CNAME.
- Each domain gets **15 GB of free traffic per month** by default (configurable in the admin backend, and adjustable per user/domain).
- Support multiple **plans**, where a plan controls:
  - the free/included traffic quota per domain
  - the per-GB overage price
  - available features (WAF, logs, HTTP/3, access control, advanced cache rules, etc.)
- The system provides:
  - user registration/login, domain management, traffic statistics
  - plan purchase and changes, invoices and payment (define the structures first; payment integration can be stubbed)
  - a capable admin backend (Django Admin + custom pages) for flexibly configuring plans, free quotas, feature flags, overage policies, etc.
- The backend renders the UI with Django templates + Bootstrap; **no frontend/backend split**.

---

## 2. Tech Stack and Baseline Requirements

- **Backend framework**: Python 3.x + Django 4.x (or the current LTS)
- **Frontend framework**: Django Template + Bootstrap 5 (or the current version)
- **Database**: MySQL 8.x
- **CDN engine integration**: call the **GoEdge admin API** over HTTP (using X-Edge-Access-Token)
- **Authentication**:
  - extend Django's built-in auth (or a custom User model)
  - support login by email / username
- **Language**:
  - the first version can be Chinese-only, but keep the i18n structure for future languages.

---

## 3. Core Business Concepts and Models

### 3.1 User

- The platform login account (the customer).
- A user can own multiple domains.

### 3.2 Domain (Site / Zone)

- A domain the user wants to accelerate, e.g. `example.com`.
- Users onboard by CNAMEing a subdomain (e.g. `www.example.com`) to the acceleration hostname the platform provides.
- Each domain:
  - is bound to a current plan
  - has a monthly free/included traffic quota
  - has its own feature configuration (WAF toggle, cache rules, etc.)

### 3.3 Plan

- A per-domain billing template.
- Defines:
  - base monthly fee per domain
  - included traffic: GB per domain per month
  - overage price per GB beyond the quota
  - feature flags: WAF, log download, HTTP/3, WebSocket, custom rules, etc.
  - visibility/purchasability: shown to regular users for purchase/upgrade, or admin-only

### 3.4 Domain Plan Instance (DomainPlan / Subscription)

- The plan instance a specific domain currently uses.
- Records:
  - `domain` + `plan`
  - the current billing cycle (start/end)
  - the effective quota (overridable per domain on top of the Plan)
  - custom price/traffic/features

### 3.5 Traffic Usage

- Traffic recorded at **domain + date** granularity.
- Used for:
  - traffic charts
  - checking whether the "free quota" and "plan-included traffic" were exceeded
  - computing overage fees

### 3.6 Invoice / Order

- The billed result of a billing cycle or a usage window.
- Contains:
  - base plan fee
  - overage traffic fee
  - discounts/adjustments
  - paid status

---

## 4. Feature Modules (User Side)

### 4.1 Registration / Login / Account Management

**Requirements:**

- Users can register with an email address.
- Login supports:
  - username + password or email + password
  - a captcha system (reserved; a simple image captcha is fine).
- User center page:
  - change password
  - view login history (recent login IPs and times)
  - manage profile info (name, contact details)

### 4.2 Domain List & Overview

- Page: `/domains/`
- Shows:
  - domain name
  - current plan name
  - traffic used this month / available this month
  - status (not onboarded / DNS pending / active / over quota / suspended)
- Supported actions:
  - add a new domain
  - open the domain detail page
  - delete/disable a domain (soft delete or status flag)

### 4.3 Add-Domain Flow (CNAME onboarding wizard)

Page flow (can be one page with multiple steps):

1. The user enters the domain (`example.com`)
2. Chooses the subdomains to onboard (multiple allowed, e.g. `www.example.com`, `static.example.com`)
3. Fills in origin info:
   - origin address (IP or origin hostname)
   - back-to-origin protocol (HTTP/HTTPS)
   - back-to-origin port
4. Chooses a plan (Free by default, with optional upgrades)
5. After creation:
   - the backend calls the GoEdge Admin API to create the corresponding Server and configuration;
   - a CNAME target is generated, e.g. `www.example.com.cdn.platform.com`
6. The page shows:
   - the list of DNS records to add
   - status "waiting for DNS configuration"
   - a "check DNS now" button (the backend verifies via DNS lookup)

### 4.4 Domain Detail Page

- Example path: `/domains/<domain_id>/`
- Modules:

1. **Overview**
   - current plan info
   - this month's traffic progress bar (used vs. quota)
   - overage notice (e.g. X GB over the plan quota)

2. **Statistics (traffic / requests)**
   - last 24 hours / 7 days / 30 days:
     - traffic chart (MB/GB)
     - request counts
     - HTTP status code distribution (200/404/5xx)
     - breakdown by region/ISP (if GoEdge supports it)

3. **Onboarding Info**
   - currently onboarded subdomains and their CNAME records
   - DNS status

4. **Feature Settings (visibility/editability depends on the plan)**
   - SSL/TLS settings
   - cache rules
   - WAF master switch
   - IP allow/deny lists
   - path rules (simple page rules)

> All mutating operations go through Django views that call the GoEdge Admin API to update configuration.

### 4.5 Plan List & Upgrades

- Page: `/plans/`
- Shows all **plans visible to users**, including:
  - plan name and monthly fee
  - included traffic per domain
  - overage price
  - feature comparison (as a table)
- From the domain detail page the user can:
  - click "Upgrade plan"
  - jump to the plan list and pick a Plan → the backend performs the upgrade (only the plan record changes; billing runs later)

> Payment can skip real third-party gateways for now; reserve order records and a "mark as paid" action.

### 4.6 Billing

- Page: `/billing/`
- Users can view:
  - invoice history: period, amount, status
  - invoice details:
    - traffic usage breakdown (per domain)
    - how the overage was calculated

> The styling can start simple; payment hooks are reserved in the backend logic.

---

## 5. Feature Modules (Admin Backend)

### 5.1 Basic Management via Django Admin

- Enable Django Admin and register the models:
  - users
  - domains
  - plans
  - domain plan instances (DomainPlan/Subscription)
  - traffic records
  - invoices/orders

### 5.2 Custom Admin Panel (an operations console separate from the admin site)

> Purpose: a console that fits the business better than Django Admin.

Suggested path: `/admin-panel/` (distinct from Django admin)

Features:

#### 5.2.1 Global System Settings

- e.g. a `SystemSettings` table with entries such as:
  - `default_free_traffic_gb_per_domain` (default free monthly traffic per domain, default 15)
  - default overage policy (allow access when over quota, rate-limit, or auto-suspend)
  - GoEdge API address + admin `X-Edge-Access-Token`
  - default CNAME target template (e.g. `<subdomain>.cdn.xxx.com`)

#### 5.2.2 Plan Management

- Create/edit/delete plans:
  - name, description
  - monthly fee (per domain)
  - included traffic per domain (GB)
  - overage price (per GB)
  - feature flags (boolean or JSON config):
    - WAF enabled (yes/no)
    - custom SSL certificates
    - real-time logs
    - HTTP/3
    - WebSocket
    - custom rules (page-rule count limit)
  - visibility:
    - shown to regular users
    - new purchases allowed
    - renewals allowed
    - admin-assignment only

#### 5.2.3 User and Domain Management

- User list:
  - view basic info, registration time, status
  - view all domains under a user
- Domain list:
  - search by domain, filter by user
  - view current plan, usage, status
  - actions:
    - manually switch plans
    - grant the domain **extra free traffic** (current cycle)
    - set a **custom overage price** for the domain
    - suspend/resume the domain
    - view the GoEdge mapping (serverId / clusterId, etc.)

#### 5.2.4 Free Traffic and Quota Adjustments

- Support manual adjustments at these levels:

  1. global default (all new domains default to 15 GB free)
  2. a plan's included traffic
  3. a user's default free quota (new domains added by that user use this value)
  4. a specific domain's **extra granted traffic** for the current cycle

- The admin panel should include a page showing, per domain:
  - base plan traffic
  - global default adjustment
  - user-level extra
  - domain-level extra
  - and the resulting "total free traffic available this cycle".
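One plausible reading of the layering above is a sum with a user-level override replacing the global default; the function below is an illustrative sketch of that interpretation (names and the exact precedence are assumptions, since the document does not pin them down):

```python
def effective_free_traffic_gb(plan_included_gb,
                              global_default_gb=15,
                              user_override_gb=None,
                              domain_extra_gb=0):
    """Compute the total free traffic for a domain's current cycle.

    Assumed precedence: a user-level override replaces the global
    default; the domain-level extra grant is then added on top of the
    plan-included amount.
    """
    base_free = user_override_gb if user_override_gb is not None else global_default_gb
    return plan_included_gb + base_free + domain_extra_gb
```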

#### 5.2.5 Billing and Invoice Management

- Invoice list:
  - invoice number, user, billing period, total amount, paid status
- Supports:
  - manually marking an invoice as paid
  - regenerating invoices for a period (for debugging or operations)
  - **manual discounts or surcharges** on the overage portion (adjustment line items on the invoice)

#### 5.2.6 Ops Monitoring & Risk Control (can be phased)

- Simple version:
  - platform-wide daily traffic curve
  - Top N domains by traffic
- Later:
  - configure an "anomalous traffic threshold" (e.g. daily traffic exceeding 3× the previous 7-day average) → flag anomalous domains
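The threshold rule above is a one-liner once the daily byte counts are available; a minimal sketch (function name and the new-domain guard are illustrative choices):

```python
def is_traffic_anomalous(today_bytes, last_7_days_bytes, multiplier=3.0):
    """Flag a domain when today's traffic exceeds `multiplier` times the
    average of the previous 7 days, per the example threshold above.

    Returns False when there is no history yet, so brand-new domains
    are not flagged on day one (an assumed policy choice).
    """
    if not last_7_days_bytes:
        return False
    avg = sum(last_7_days_bytes) / len(last_7_days_bytes)
    if avg == 0:
        # Any traffic after a silent week counts as anomalous.
        return today_bytes > 0
    return today_bytes > multiplier * avg
```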

---

## 6. Database Design (Outline)

> This is a **conceptual description** of tables and fields; the AI Agent should choose concrete field types and indexes following MySQL conventions.

### 6.1 Users

- `users` (extend Django User or custom)
  - id
  - username
  - email
  - password_hash
  - is_active
  - is_staff / is_superuser
  - created_at, updated_at

- `user_profile` (optional extension)
  - user_id (FK)
  - display_name
  - contact_phone
  - default_free_traffic_gb_per_domain_override (user-level free-traffic override)

### 6.2 Domains and Onboarding

- `domains`
  - id
  - user_id (FK)
  - name (`example.com`)
  - status (pending_dns / active / suspended / deleted)
  - current_plan_id (FK → plans)
  - current_cycle_start
  - current_cycle_end
  - cname_targets (JSON: subdomain → CNAME target map)
  - origin_config (JSON: origin address/port/protocol, etc.)
  - edge_server_id (the serverId in GoEdge)
  - created_at, updated_at

- `domain_overrides` (domain-level quota/price/feature overrides; may be merged into domains)
  - domain_id (FK)
  - extra_free_traffic_gb_current_cycle (extra granted free traffic this cycle)
  - custom_overage_price_per_gb
  - custom_features (JSON)
  - note

### 6.3 Plans & Features

- `plans`
  - id
  - name
  - description
  - billing_mode (per_domain_monthly)
  - base_price_per_domain
  - included_traffic_gb_per_domain
  - overage_price_per_gb
  - allow_overage (whether overage billing is allowed)
  - is_active
  - is_public
  - allow_new_purchase
  - allow_renew
  - created_at, updated_at

- `plan_features` (or store as JSON directly in plans)
  - plan_id
  - key (e.g. `waf_enabled`, `http3_enabled`)
  - value (bool/int/string)

### 6.4 Traffic Statistics and Billing

- `domain_traffic_daily`
  - id
  - domain_id (FK)
  - day (date)
  - bytes
  - peak_bandwidth_mbps
  - created_at

- `invoices`
  - id
  - user_id
  - period_start
  - period_end
  - amount_plan_total (sum of all domains' plan fees for the period)
  - amount_overage_total
  - amount_adjustment (manual adjustments)
  - amount_total
  - status (unpaid / paid / cancelled)
  - created_at, paid_at

- `invoice_items`
  - id
  - invoice_id (FK)
  - domain_id (FK)
  - description (e.g. "base plan fee", "overage traffic fee")
  - quantity (e.g. GB count)
  - unit_price
  - amount

---

## 7. Business Flows (for implementation reference)

### 7.1 Domain Onboarding Flow

1. User logs in → adds a domain → fills in the origin → picks a plan → a domain record is created + GoEdge creates the Server
2. The system generates the CNAME target → shown to the user
3. The user configures DNS → the frontend offers a "check" button → the backend verifies via DNS lookup
4. Once DNS takes effect, the domain status moves from `pending_dns` → `active`, and traffic metering officially begins.
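The DNS check in step 3 ultimately compares the resolved CNAME with the expected target. A sketch of that comparison (the resolver call itself, e.g. via dnspython, is out of scope here; the helper name is illustrative):

```python
def cname_matches(resolved: str, expected: str) -> bool:
    """Compare a resolved CNAME answer with the expected target.

    DNS names are case-insensitive and resolver answers often carry a
    trailing dot (e.g. 'www.example.com.cdn.platform.com.'), so both
    sides are normalized before comparing.
    """
    return resolved.rstrip(".").lower() == expected.rstrip(".").lower()
```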

### 7.2 Billing Cycles and Traffic Statistics (monthly)

1. A daily scheduled task pulls each domain's traffic/request data from the GoEdge statistics API into `domain_traffic_daily`.
2. Each domain has its own billing cycle (calendar month by default; a cycle may also start on the onboarding date).
3. When a cycle ends, a background task:
   - computes the cycle's total traffic per domain
   - compares it against the "free/plan-included traffic":
     - not exceeded → charge only the plan fee
     - exceeded → plan fee + overage × unit price
   - generates `invoice` and `invoice_items` records.

4. For unpaid invoices, by policy:
   - suspend all domains before the next cycle starts (status `suspended`)
   - or allow only the free quota and block traffic beyond it.
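The per-domain charge in step 3 can be sketched as below; `Decimal` is used so money never passes through binary floats (the function name and tuple shape are illustrative, not the project's actual code):

```python
from decimal import Decimal

def cycle_charge(total_gb, included_gb, plan_fee,
                 overage_price_per_gb, allow_overage=True):
    """Return (plan_fee, overage_fee, total) for one domain's cycle.

    Mirrors step 3 above: within the included quota only the plan fee
    applies; beyond it, overage GB × unit price is added. When the plan
    disallows overage billing, the fee stays zero (traffic would be
    blocked instead of billed).
    """
    over_gb = max(Decimal(0), Decimal(str(total_gb)) - Decimal(str(included_gb)))
    overage_fee = over_gb * Decimal(str(overage_price_per_gb)) if allow_overage else Decimal(0)
    plan = Decimal(str(plan_fee))
    return plan, overage_fee, plan + overage_fee
```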

---

## 8. UI and Frontend Requirements (Django + Bootstrap)

- The whole site uses Django templates + Bootstrap layout:
  - top nav: logged-in user info, domain list, plans, billing entries
  - the domain detail page uses a Cloudflare-like left menu (Overview / Analytics / SSL / Cache / Firewall / Settings)
- Tables and lists use the Bootstrap table style.
- Forms use CSRF protection.
- Action buttons (e.g. "Upgrade plan", "Suspend domain") require a confirmation dialog.

---

## 9. Security and Permissions

- Protect user pages with Django Auth + the `login_required` decorator.
- Two admin backends:
  - Django Admin (superusers only)
  - the custom operations panel `/admin-panel/` (staff and suitably permissioned users)
- All GoEdge API interaction happens on the backend; the frontend never exposes any GoEdge access token.
- Important operations (deleting domains, switching plans, adjustments, etc.) write to an operation-log table recording the operator, time, and changes.

---

## 10. Extension Points (reserved for later versions)

- Payment integration (Alipay / WeChat / Stripe, etc.) via an epay-style gateway:
  - for now only define order and invoice structures; hook up a payment gateway later.
- Ticket system

---

> **AI Agent implementation priorities:**
> - First complete the base models (users / domains / plans / traffic / invoices) and the admin backend structure.
> - Implement the full closed loop: add domain → generate CNAME → pull traffic → generate invoices.
> - The frontend uses Django templates + Bootstrap to deliver a simple but clear Cloudflare-style panel.
@@ -0,0 +1 @@
pip
247
.venv/Lib/site-packages/asgiref-3.10.0.dist-info/METADATA
Normal file
@@ -0,0 +1,247 @@
Metadata-Version: 2.4
Name: asgiref
Version: 3.10.0
Summary: ASGI specs, helper code, and adapters
Home-page: https://github.com/django/asgiref/
Author: Django Software Foundation
Author-email: foundation@djangoproject.com
License: BSD-3-Clause
Project-URL: Documentation, https://asgi.readthedocs.io/
Project-URL: Further Documentation, https://docs.djangoproject.com/en/stable/topics/async/#async-adapter-functions
Project-URL: Changelog, https://github.com/django/asgiref/blob/master/CHANGELOG.txt
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Web Environment
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP
Requires-Python: >=3.9
License-File: LICENSE
Requires-Dist: typing_extensions>=4; python_version < "3.11"
Provides-Extra: tests
Requires-Dist: pytest; extra == "tests"
Requires-Dist: pytest-asyncio; extra == "tests"
Requires-Dist: mypy>=1.14.0; extra == "tests"
Dynamic: license-file

asgiref
=======

.. image:: https://github.com/django/asgiref/actions/workflows/tests.yml/badge.svg
   :target: https://github.com/django/asgiref/actions/workflows/tests.yml

.. image:: https://img.shields.io/pypi/v/asgiref.svg
   :target: https://pypi.python.org/pypi/asgiref

ASGI is a standard for Python asynchronous web apps and servers to communicate
with each other, and positioned as an asynchronous successor to WSGI. You can
read more at https://asgi.readthedocs.io/en/latest/

This package includes ASGI base libraries, such as:

* Sync-to-async and async-to-sync function wrappers, ``asgiref.sync``
* Server base classes, ``asgiref.server``
* A WSGI-to-ASGI adapter, in ``asgiref.wsgi``


Function wrappers
-----------------

These allow you to wrap or decorate async or sync functions to call them from
the other style (so you can call async functions from a synchronous thread,
or vice-versa).

In particular:

* AsyncToSync lets a synchronous subthread stop and wait while the async
  function is called on the main thread's event loop, and then control is
  returned to the thread when the async function is finished.

* SyncToAsync lets async code call a synchronous function, which is run in
  a threadpool and control returned to the async coroutine when the synchronous
  function completes.

The idea is to make it easier to call synchronous APIs from async code and
asynchronous APIs from synchronous code so it's easier to transition code from
one style to the other. In the case of Channels, we wrap the (synchronous)
Django view system with SyncToAsync to allow it to run inside the (asynchronous)
ASGI server.

Note that exactly what threads things run in is very specific, and aimed to
keep maximum compatibility with old synchronous code. See
"Synchronous code & Threads" below for a full explanation. By default,
``sync_to_async`` will run all synchronous code in the program in the same
thread for safety reasons; you can disable this for more performance with
``@sync_to_async(thread_sensitive=False)``, but make sure that your code does
not rely on anything bound to threads (like database connections) when you do.


Threadlocal replacement
-----------------------

This is a drop-in replacement for ``threading.local`` that works with both
threads and asyncio Tasks. Even better, it will proxy values through from a
task-local context to a thread-local context when you use ``sync_to_async``
to run things in a threadpool, and vice-versa for ``async_to_sync``.

If you instead want true thread- and task-safety, you can set
``thread_critical`` on the Local object to ensure this instead.


Server base classes
-------------------

Includes a ``StatelessServer`` class which provides all the hard work of
writing a stateless server (as in, does not handle direct incoming sockets
but instead consumes external streams or sockets to work out what is happening).

An example of such a server would be a chatbot server that connects out to
a central chat server and provides a "connection scope" per user chatting to
it. There's only one actual connection, but the server has to separate things
into several scopes for easier writing of the code.

You can see an example of this being used in `frequensgi <https://github.com/andrewgodwin/frequensgi>`_.


WSGI-to-ASGI adapter
--------------------

Allows you to wrap a WSGI application so it appears as a valid ASGI application.

Simply wrap it around your WSGI application like so::

    asgi_application = WsgiToAsgi(wsgi_application)

The WSGI application will be run in a synchronous threadpool, and the wrapped
ASGI application will be one that accepts ``http`` class messages.

Please note that not all extended features of WSGI may be supported (such as
file handles for incoming POST bodies).


Dependencies
------------

``asgiref`` requires Python 3.9 or higher.


Contributing
------------

Please refer to the
`main Channels contributing docs <https://github.com/django/channels/blob/master/CONTRIBUTING.rst>`_.


Testing
'''''''

To run tests, make sure you have installed the ``tests`` extra with the package::

    cd asgiref/
    pip install -e .[tests]
    pytest


Building the documentation
''''''''''''''''''''''''''

The documentation uses `Sphinx <http://www.sphinx-doc.org>`_::

    cd asgiref/docs/
    pip install sphinx

To build the docs, you can use the default tools::

    sphinx-build -b html . _build/html  # or `make html`, if you've got make set up
    cd _build/html
    python -m http.server

...or you can use ``sphinx-autobuild`` to run a server and rebuild/reload
your documentation changes automatically::

    pip install sphinx-autobuild
    sphinx-autobuild . _build/html


Releasing
'''''''''

To release, first add details to CHANGELOG.txt and update the version number in ``asgiref/__init__.py``.

Then, build and push the packages::

    python -m build
    twine upload dist/*
    rm -r asgiref.egg-info dist


Implementation Details
----------------------

Synchronous code & threads
''''''''''''''''''''''''''

The ``asgiref.sync`` module provides two wrappers that let you go between
asynchronous and synchronous code at will, while taking care of the rough edges
for you.

Unfortunately, the rough edges are numerous, and the code has to work especially
hard to keep things in the same thread as much as possible. Notably, the
restrictions we are working with are:

* All synchronous code called through ``SyncToAsync`` and marked with
  ``thread_sensitive`` should run in the same thread as each other (and if the
  outer layer of the program is synchronous, the main thread)

* If a thread already has a running async loop, ``AsyncToSync`` can't run things
  on that loop if it's blocked on synchronous code that is above you in the
  call stack.

The first compromise you get to might be that ``thread_sensitive`` code should
just run in the same thread and not spawn in a sub-thread, fulfilling the first
restriction, but that immediately runs you into the second restriction.

The only real solution is to essentially have a variant of ThreadPoolExecutor
that executes any ``thread_sensitive`` code on the outermost synchronous
thread - either the main thread, or a single spawned subthread.

This means you now have two basic states:

* If the outermost layer of your program is synchronous, then all async code
  run through ``AsyncToSync`` will run in a per-call event loop in arbitrary
  sub-threads, while all ``thread_sensitive`` code will run in the main thread.

* If the outermost layer of your program is asynchronous, then all async code
  runs on the main thread's event loop, and all ``thread_sensitive`` synchronous
  code will run in a single shared sub-thread.

Crucially, this means that in both cases there is a thread which is a shared
resource that all ``thread_sensitive`` code must run on, and there is a chance
that this thread is currently blocked on its own ``AsyncToSync`` call. Thus,
``AsyncToSync`` needs to act as an executor for thread code while it's blocking.

The ``CurrentThreadExecutor`` class provides this functionality; rather than
simply waiting on a Future, you can call its ``run_until_future`` method and
it will run submitted code until that Future is done. This means that code
inside the call can then run code on your thread.


Maintenance and Security
------------------------

To report security issues, please contact security@djangoproject.com. For GPG
signatures and more security process information, see
https://docs.djangoproject.com/en/dev/internals/security/.

To report bugs or request new features, please open a new GitHub issue.

This repository is part of the Channels project. For the shepherd and maintenance team, please see the
`main Channels readme <https://github.com/django/channels/blob/master/README.rst>`_.
27
.venv/Lib/site-packages/asgiref-3.10.0.dist-info/RECORD
Normal file
@@ -0,0 +1,27 @@
asgiref-3.10.0.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4
asgiref-3.10.0.dist-info/METADATA,sha256=TlcKOCn3FwSCGD62jZkbckPRh-RjAhkCLLDnfmDZTyA,9287
asgiref-3.10.0.dist-info/RECORD,,
asgiref-3.10.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
asgiref-3.10.0.dist-info/licenses/LICENSE,sha256=uEZBXRtRTpwd_xSiLeuQbXlLxUbKYSn5UKGM0JHipmk,1552
asgiref-3.10.0.dist-info/top_level.txt,sha256=bokQjCzwwERhdBiPdvYEZa4cHxT4NCeAffQNUqJ8ssg,8
asgiref/__init__.py,sha256=iKJAvc5i0UTDDSSefTGL0Tq-kWQ4S3OJJgvyaQfQNF8,23
asgiref/__pycache__/__init__.cpython-312.pyc,,
asgiref/__pycache__/compatibility.cpython-312.pyc,,
asgiref/__pycache__/current_thread_executor.cpython-312.pyc,,
asgiref/__pycache__/local.cpython-312.pyc,,
asgiref/__pycache__/server.cpython-312.pyc,,
asgiref/__pycache__/sync.cpython-312.pyc,,
asgiref/__pycache__/testing.cpython-312.pyc,,
asgiref/__pycache__/timeout.cpython-312.pyc,,
asgiref/__pycache__/typing.cpython-312.pyc,,
asgiref/__pycache__/wsgi.cpython-312.pyc,,
asgiref/compatibility.py,sha256=DhY1SOpOvOw0Y1lSEjCqg-znRUQKecG3LTaV48MZi68,1606
asgiref/current_thread_executor.py,sha256=42CU1VODLTk-_PYise-cP1XgyAvI5Djc8f97owFzdrs,4157
asgiref/local.py,sha256=ZZeWWIXptVU4GbNApMMWQ-skuglvodcQA5WpzJDMxh4,4912
asgiref/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
asgiref/server.py,sha256=3A68169Nuh2sTY_2O5JzRd_opKObWvvrEFcrXssq3kA,6311
asgiref/sync.py,sha256=CEKxFyePiksUoA7MronOKaF6mmNQxUYZjXlfJZXEQCM,22551
asgiref/testing.py,sha256=U5wcs_-ZYTO5SIGfl80EqRAGv_T8BHrAhvAKRuuztT4,4421
asgiref/timeout.py,sha256=LtGL-xQpG8JHprdsEUCMErJ0kNWj4qwWZhEHJ3iKu4s,3627
asgiref/typing.py,sha256=Zi72AZlOyF1C7N14LLZnpAdfUH4ljoBqFdQo_bBKMq0,6290
asgiref/wsgi.py,sha256=J8OAgirfsYHZmxxqIGfFiZ43uq1qKKv2xGMkRISNIo4,6742
5
.venv/Lib/site-packages/asgiref-3.10.0.dist-info/WHEEL
Normal file
@@ -0,0 +1,5 @@
Wheel-Version: 1.0
Generator: setuptools (80.9.0)
Root-Is-Purelib: true
Tag: py3-none-any
27
.venv/Lib/site-packages/asgiref-3.10.0.dist-info/licenses/LICENSE
Normal file
@@ -0,0 +1,27 @@
Copyright (c) Django Software Foundation and individual contributors.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

    1. Redistributions of source code must retain the above copyright notice,
       this list of conditions and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in the
       documentation and/or other materials provided with the distribution.

    3. Neither the name of Django nor the names of its contributors may be used
       to endorse or promote products derived from this software without
       specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
1
.venv/Lib/site-packages/asgiref-3.10.0.dist-info/top_level.txt
Normal file
@@ -0,0 +1 @@
asgiref
1
.venv/Lib/site-packages/asgiref/__init__.py
Normal file
@@ -0,0 +1 @@
__version__ = "3.10.0"
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
BIN
.venv/Lib/site-packages/asgiref/__pycache__/sync.cpython-312.pyc
Normal file
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
BIN
.venv/Lib/site-packages/asgiref/__pycache__/wsgi.cpython-312.pyc
Normal file
Binary file not shown.
48
.venv/Lib/site-packages/asgiref/compatibility.py
Normal file
@@ -0,0 +1,48 @@
import inspect

from .sync import iscoroutinefunction


def is_double_callable(application):
    """
    Tests to see if an application is a legacy-style (double-callable) application.
    """
    # Look for a hint on the object first
    if getattr(application, "_asgi_single_callable", False):
        return False
    if getattr(application, "_asgi_double_callable", False):
        return True
    # Uninstantiated classes are double-callable
    if inspect.isclass(application):
        return True
    # Instantiated classes depend on their __call__
    if hasattr(application, "__call__"):
        # We only check to see if its __call__ is a coroutine function -
        # if it's not, it still might be a coroutine function itself.
        if iscoroutinefunction(application.__call__):
            return False
    # Non-classes we just check directly
    return not iscoroutinefunction(application)


def double_to_single_callable(application):
    """
    Transforms a double-callable ASGI application into a single-callable one.
    """

    async def new_application(scope, receive, send):
        instance = application(scope)
        return await instance(receive, send)

    return new_application


def guarantee_single_callable(application):
    """
    Takes either a single- or double-callable application and always returns it
    in single-callable style. Use this to add backwards compatibility for ASGI
    2.0 applications to your server/test harness/etc.
    """
    if is_double_callable(application):
        application = double_to_single_callable(application)
    return application
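The double-to-single adaptation above can be exercised end-to-end with a minimal standalone sketch. The helper below mirrors the file's names for readability but is a simplified re-implementation for the demo, not the installed library:

```python
import asyncio
import inspect


def guarantee_single_callable(application):
    """Illustrative shim: wrap legacy double-callable applications so they
    accept (scope, receive, send) directly; pass modern ones through."""
    if not inspect.isclass(application) and inspect.iscoroutinefunction(
        getattr(application, "__call__", application)
    ):
        return application  # already single-callable

    async def new_application(scope, receive, send):
        instance = application(scope)         # first call binds the scope
        return await instance(receive, send)  # second call runs the app

    return new_application


class LegacyApp:
    # ASGI 2.0 style: constructor takes the scope, __call__ handles events.
    def __init__(self, scope):
        self.scope = scope

    async def __call__(self, receive, send):
        await send({"type": "echo", "path": self.scope["path"]})


sent = []


async def main():
    app = guarantee_single_callable(LegacyApp)

    async def send(message):
        sent.append(message)

    # The wrapped legacy app is now called in single-callable style.
    await app({"path": "/x"}, None, send)


asyncio.run(main())
```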
123
.venv/Lib/site-packages/asgiref/current_thread_executor.py
Normal file
@@ -0,0 +1,123 @@
import sys
import threading
from collections import deque
from concurrent.futures import Executor, Future
from typing import Any, Callable, TypeVar

if sys.version_info >= (3, 10):
    from typing import ParamSpec
else:
    from typing_extensions import ParamSpec

_T = TypeVar("_T")
_P = ParamSpec("_P")
_R = TypeVar("_R")


class _WorkItem:
    """
    Represents an item needing to be run in the executor.
    Copied from ThreadPoolExecutor (but it's private, so we're not going to rely on importing it)
    """

    def __init__(
        self,
        future: "Future[_R]",
        fn: Callable[_P, _R],
        *args: _P.args,
        **kwargs: _P.kwargs,
    ):
        self.future = future
        self.fn = fn
        self.args = args
        self.kwargs = kwargs

    def run(self) -> None:
        __traceback_hide__ = True  # noqa: F841
        if not self.future.set_running_or_notify_cancel():
            return
        try:
            result = self.fn(*self.args, **self.kwargs)
        except BaseException as exc:
            self.future.set_exception(exc)
            # Break a reference cycle with the exception 'exc'
            self = None  # type: ignore[assignment]
        else:
            self.future.set_result(result)


class CurrentThreadExecutor(Executor):
    """
    An Executor that actually runs code in the thread it is instantiated in.
    Passed to other threads running async code, so they can run sync code in
    the thread they came from.
    """

    def __init__(self, old_executor: "CurrentThreadExecutor | None") -> None:
        self._work_thread = threading.current_thread()
        self._work_ready = threading.Condition(threading.Lock())
        self._work_items = deque[_WorkItem]()  # synchronized by _work_ready
        self._broken = False  # synchronized by _work_ready
        self._old_executor = old_executor

    def run_until_future(self, future: "Future[Any]") -> None:
        """
        Runs the code in the work queue until a result is available from the future.
        Should be run from the thread the executor is initialised in.
        """
        # Check we're in the right thread
        if threading.current_thread() != self._work_thread:
            raise RuntimeError(
                "You cannot run CurrentThreadExecutor from a different thread"
            )

        def done(future: "Future[Any]") -> None:
            with self._work_ready:
                self._broken = True
                self._work_ready.notify()

        future.add_done_callback(done)
        # Keep getting and running work items until the future we're waiting for
        # is done and the queue is empty.
        while True:
            with self._work_ready:
                while not self._work_items and not self._broken:
                    self._work_ready.wait()
                if not self._work_items:
                    break
                # Get a work item and run it
                work_item = self._work_items.popleft()
            work_item.run()
            del work_item

    def submit(
        self,
        fn: Callable[_P, _R],
        /,
        *args: _P.args,
        **kwargs: _P.kwargs,
    ) -> "Future[_R]":
        # Check they're not submitting from the same thread
        if threading.current_thread() == self._work_thread:
            raise RuntimeError(
                "You cannot submit onto CurrentThreadExecutor from its own thread"
            )
        f: "Future[_R]" = Future()
        work_item = _WorkItem(f, fn, *args, **kwargs)

        # Walk up the CurrentThreadExecutor stack to find the closest one still
        # running
        executor = self
        while True:
            with executor._work_ready:
                if not executor._broken:
                    # Add to work queue
                    executor._work_items.append(work_item)
                    executor._work_ready.notify()
                    break
            if executor._old_executor is None:
                raise RuntimeError("CurrentThreadExecutor already quit or is broken")
            executor = executor._old_executor

        # Return the future
        return f
131
.venv/Lib/site-packages/asgiref/local.py
Normal file
@@ -0,0 +1,131 @@
import asyncio
import contextlib
import contextvars
import threading
from typing import Any, Dict, Union


class _CVar:
    """Storage utility for Local."""

    def __init__(self) -> None:
        self._data: "contextvars.ContextVar[Dict[str, Any]]" = contextvars.ContextVar(
            "asgiref.local"
        )

    def __getattr__(self, key):
        storage_object = self._data.get({})
        try:
            return storage_object[key]
        except KeyError:
            raise AttributeError(f"{self!r} object has no attribute {key!r}")

    def __setattr__(self, key: str, value: Any) -> None:
        if key == "_data":
            return super().__setattr__(key, value)

        storage_object = self._data.get({}).copy()
        storage_object[key] = value
        self._data.set(storage_object)

    def __delattr__(self, key: str) -> None:
        storage_object = self._data.get({}).copy()
        if key in storage_object:
            del storage_object[key]
            self._data.set(storage_object)
        else:
            raise AttributeError(f"{self!r} object has no attribute {key!r}")


class Local:
    """Local storage for async tasks.

    This is a namespace object (similar to `threading.local`) where data is
    also local to the current async task (if there is one).

    In async threads, local means in the same sense as the `contextvars`
    module - i.e. a value set in an async frame will be visible:

    - to other async code `await`-ed from this frame.
    - to tasks spawned using `asyncio` utilities (`create_task`, `wait_for`,
      `gather` and probably others).
    - to code scheduled in a sync thread using `sync_to_async`

    In "sync" threads (a thread with no async event loop running), the
    data is thread-local, but additionally shared with async code executed
    via the `async_to_sync` utility, which schedules async code in a new thread
    and copies context across to that thread.

    If `thread_critical` is True, then the local will only be visible per-thread,
    behaving exactly like `threading.local` if the thread is sync, and as
    `contextvars` if the thread is async. This allows genuinely thread-sensitive
    code (such as DB handles) to be kept strictly to their initial thread and
    disables the sharing across `sync_to_async` and `async_to_sync` wrapped calls.

    Unlike plain `contextvars` objects, this utility is thread-safe.
    """

    def __init__(self, thread_critical: bool = False) -> None:
        self._thread_critical = thread_critical
        self._thread_lock = threading.RLock()

        self._storage: "Union[threading.local, _CVar]"

        if thread_critical:
            # Thread-local storage
            self._storage = threading.local()
        else:
            # Contextvar storage
            self._storage = _CVar()

    @contextlib.contextmanager
    def _lock_storage(self):
        # Thread safe access to storage
        if self._thread_critical:
            is_async = True
            try:
                # Test whether we are in an async or sync thread - this
                # raises RuntimeError if there is no running event loop
                asyncio.get_running_loop()
            except RuntimeError:
                is_async = False
            if not is_async:
                # We are in a sync thread, the storage is
                # just the plain thread local (i.e, "global within
                # this thread" - it doesn't matter where you are
                # in a call stack you see the same storage)
                yield self._storage
            else:
                # We are in an async thread - storage is still
                # local to this thread, but additionally should
                # behave like a context var (is only visible with
                # the same async call stack)

                # Ensure context exists in the current thread
                if not hasattr(self._storage, "cvar"):
                    self._storage.cvar = _CVar()

                # self._storage is a thread local, so the members
                # can't be accessed in another thread (we don't
                # need any locks)
                yield self._storage.cvar
        else:
            # Lock for thread_critical=False as other threads
            # can access the exact same storage object
            with self._thread_lock:
                yield self._storage

    def __getattr__(self, key):
        with self._lock_storage() as storage:
            return getattr(storage, key)

    def __setattr__(self, key, value):
        if key in ("_local", "_storage", "_thread_critical", "_thread_lock"):
            return super().__setattr__(key, value)
        with self._lock_storage() as storage:
            setattr(storage, key, value)

    def __delattr__(self, key):
        with self._lock_storage() as storage:
            delattr(storage, key)
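The copy-on-write `ContextVar` trick that `_CVar` uses above can be demonstrated in isolation. This is an illustrative sketch with demo-only names, not asgiref's `Local` itself:

```python
import asyncio
import contextvars

# The ContextVar holds a dict; every write replaces the whole dict, so a
# task's writes stay inside its own copied context (the same copy-on-write
# approach _CVar takes).
_data = contextvars.ContextVar("demo.local")


def set_value(key, value):
    storage = _data.get({}).copy()
    storage[key] = value
    _data.set(storage)


def get_value(key, default=None):
    return _data.get({}).get(key, default)


async def handler(name, seen):
    set_value("user", name)
    await asyncio.sleep(0)          # yield so the two handlers interleave
    seen[name] = get_value("user")  # still our own value, not the sibling's


async def main():
    seen = {}
    # gather() wraps each coroutine in a Task, and each Task runs in a
    # copy of the current context, isolating the writes from each other.
    await asyncio.gather(handler("a", seen), handler("b", seen))
    return seen


seen = asyncio.run(main())
```

Even though both handlers write the same key concurrently, each reads back its own value after yielding.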
0
.venv/Lib/site-packages/asgiref/py.typed
Normal file
173
.venv/Lib/site-packages/asgiref/server.py
Normal file
@@ -0,0 +1,173 @@
import asyncio
import logging
import time
import traceback

from .compatibility import guarantee_single_callable

logger = logging.getLogger(__name__)


class StatelessServer:
    """
    Base server class that handles basic concepts like application instance
    creation/pooling, exception handling, and similar, for stateless protocols
    (i.e. ones without actual incoming connections to the process)

    Your code should override the handle() method, doing whatever it needs to,
    and calling get_or_create_application_instance with a unique `scope_id`
    and `scope` for the scope it wants to get.

    If an application instance is found with the same `scope_id`, you are
    given its input queue, otherwise one is made for you with the scope provided
    and you are given that fresh new input queue. Either way, you should do
    something like:

    input_queue = self.get_or_create_application_instance(
        "user-123456",
        {"type": "testprotocol", "user_id": "123456", "username": "andrew"},
    )
    input_queue.put_nowait(message)

    If you try and create an application instance and there are already
    `max_application` instances, the oldest/least recently used one will be
    reclaimed and shut down to make space.

    Application coroutines that error will be found periodically (every 100ms
    by default) and have their exceptions printed to the console. Override
    application_exception() if you want to do more when this happens.

    If you override run(), make sure you handle things like launching the
    application checker.
    """

    application_checker_interval = 0.1

    def __init__(self, application, max_applications=1000):
        # Parameters
        self.application = application
        self.max_applications = max_applications
        # Initialisation
        self.application_instances = {}

    ### Mainloop and handling

    def run(self):
        """
        Runs the asyncio event loop with our handler loop.
        """
        event_loop = asyncio.get_event_loop()
        try:
            event_loop.run_until_complete(self.arun())
        except KeyboardInterrupt:
            logger.info("Exiting due to Ctrl-C/interrupt")

    async def arun(self):
        """
        Runs the asyncio event loop with our handler loop.
        """

        class Done(Exception):
            pass

        async def handle():
            await self.handle()
            raise Done

        try:
            await asyncio.gather(self.application_checker(), handle())
        except Done:
            pass

    async def handle(self):
        raise NotImplementedError("You must implement handle()")

    async def application_send(self, scope, message):
        """
        Receives outbound sends from applications and handles them.
        """
        raise NotImplementedError("You must implement application_send()")

    ### Application instance management

    def get_or_create_application_instance(self, scope_id, scope):
        """
        Creates an application instance and returns its queue.
        """
        if scope_id in self.application_instances:
            self.application_instances[scope_id]["last_used"] = time.time()
            return self.application_instances[scope_id]["input_queue"]
        # See if we need to delete an old one
        while len(self.application_instances) > self.max_applications:
            self.delete_oldest_application_instance()
        # Make an instance of the application
        input_queue = asyncio.Queue()
        application_instance = guarantee_single_callable(self.application)
        # Run it, and stash the future for later checking
        future = asyncio.ensure_future(
            application_instance(
                scope=scope,
                receive=input_queue.get,
                send=lambda message: self.application_send(scope, message),
            ),
        )
        self.application_instances[scope_id] = {
            "input_queue": input_queue,
            "future": future,
            "scope": scope,
            "last_used": time.time(),
        }
        return input_queue

    def delete_oldest_application_instance(self):
        """
        Finds and deletes the oldest application instance
        """
        oldest_time = min(
            details["last_used"] for details in self.application_instances.values()
        )
        for scope_id, details in self.application_instances.items():
            if details["last_used"] == oldest_time:
                self.delete_application_instance(scope_id)
                # Return to make sure we only delete one in case two have
                # the same oldest time
                return

    def delete_application_instance(self, scope_id):
        """
        Removes an application instance (makes sure its task is stopped,
        then removes it from the current set)
        """
        details = self.application_instances[scope_id]
        del self.application_instances[scope_id]
        if not details["future"].done():
            details["future"].cancel()

    async def application_checker(self):
        """
        Goes through the set of current application instance Futures and cleans up
        any that are done/prints exceptions for any that errored.
        """
        while True:
            await asyncio.sleep(self.application_checker_interval)
            for scope_id, details in list(self.application_instances.items()):
                if details["future"].done():
                    exception = details["future"].exception()
                    if exception:
                        await self.application_exception(exception, details)
                    try:
                        del self.application_instances[scope_id]
                    except KeyError:
                        # Exception handling might have already got here before us. That's fine.
                        pass

    async def application_exception(self, exception, application_details):
        """
        Called whenever an application coroutine has an exception.
        """
        logging.error(
            "Exception inside application: %s\n%s%s",
            exception,
            "".join(traceback.format_tb(exception.__traceback__)),
            f" {exception}",
        )
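The instance-pooling idea from the `StatelessServer` docstring above (one queue-fed application task per `scope_id`, created lazily and reused on later lookups) can be sketched minimally. All names here are hypothetical demo names; only the pooling pattern is taken from the file:

```python
import asyncio


async def application(scope, receive, send):
    # Echo every message back, tagged with the scope's user id.
    while True:
        message = await receive()
        await send({"user": scope["user_id"], "echo": message})


class MiniServer:
    def __init__(self, app):
        self.app = app
        self.instances = {}
        self.sent = []

    async def collect(self, message):
        # Stand-in for application_send(): just record outbound messages.
        self.sent.append(message)

    def get_or_create(self, scope_id, scope):
        if scope_id in self.instances:
            return self.instances[scope_id]["input_queue"]
        queue = asyncio.Queue()
        # The application instance reads from its private queue.
        future = asyncio.ensure_future(self.app(scope, queue.get, self.collect))
        self.instances[scope_id] = {"input_queue": queue, "future": future}
        return queue


async def main():
    server = MiniServer(application)
    q = server.get_or_create("user-1", {"user_id": "1"})
    q.put_nowait("hello")
    # A second lookup with the same scope_id reuses the instance.
    assert server.get_or_create("user-1", {"user_id": "1"}) is q
    await asyncio.sleep(0.01)  # let the application task drain the queue
    for details in server.instances.values():
        details["future"].cancel()
    return server.sent


sent = asyncio.run(main())
```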
647
.venv/Lib/site-packages/asgiref/sync.py
Normal file
@@ -0,0 +1,647 @@
import asyncio
import asyncio.coroutines
import contextvars
import functools
import inspect
import os
import sys
import threading
import warnings
import weakref
from concurrent.futures import Future, ThreadPoolExecutor
from typing import (
    TYPE_CHECKING,
    Any,
    Awaitable,
    Callable,
    Coroutine,
    Dict,
    Generic,
    List,
    Optional,
    TypeVar,
    Union,
    overload,
)

from .current_thread_executor import CurrentThreadExecutor
from .local import Local

if sys.version_info >= (3, 10):
    from typing import ParamSpec
else:
    from typing_extensions import ParamSpec

if TYPE_CHECKING:
    # This is not available to import at runtime
    from _typeshed import OptExcInfo

_F = TypeVar("_F", bound=Callable[..., Any])
_P = ParamSpec("_P")
_R = TypeVar("_R")


def _restore_context(context: contextvars.Context) -> None:
    # Check for changes in contextvars, and set them to the current
    # context for downstream consumers
    for cvar in context:
        cvalue = context.get(cvar)
        try:
            if cvar.get() != cvalue:
                cvar.set(cvalue)
        except LookupError:
            cvar.set(cvalue)


# Python 3.12 deprecates asyncio.iscoroutinefunction() as an alias for
# inspect.iscoroutinefunction(), whilst also removing the _is_coroutine marker.
# The latter is replaced with the inspect.markcoroutinefunction decorator.
# Until 3.12 is the minimum supported Python version, provide a shim.

if hasattr(inspect, "markcoroutinefunction"):
    iscoroutinefunction = inspect.iscoroutinefunction
    markcoroutinefunction: Callable[[_F], _F] = inspect.markcoroutinefunction
else:
    iscoroutinefunction = asyncio.iscoroutinefunction  # type: ignore[assignment]

    def markcoroutinefunction(func: _F) -> _F:
        func._is_coroutine = asyncio.coroutines._is_coroutine  # type: ignore
        return func


class AsyncSingleThreadContext:
    """Context manager to run async code inside the same thread.

    Normally, AsyncToSync functions run either inside a separate ThreadPoolExecutor or
    the main event loop if it exists. This context manager ensures that all AsyncToSync
    functions execute within the same thread.

    This context manager is re-entrant, so only the outer-most call to
    AsyncSingleThreadContext will set the context.

    Usage:

    >>> import asyncio
    >>> with AsyncSingleThreadContext():
    ...     async_to_sync(asyncio.sleep(1))()
    """

    def __init__(self):
        self.token = None

    def __enter__(self):
        try:
            AsyncToSync.async_single_thread_context.get()
        except LookupError:
            self.token = AsyncToSync.async_single_thread_context.set(self)

        return self

    def __exit__(self, exc, value, tb):
        if not self.token:
            return

        executor = AsyncToSync.context_to_thread_executor.pop(self, None)
        if executor:
            executor.shutdown()

        AsyncToSync.async_single_thread_context.reset(self.token)


class ThreadSensitiveContext:
    """Async context manager to manage context for thread sensitive mode

    This context manager controls which thread pool executor is used when in
    thread sensitive mode. By default, a single thread pool executor is shared
    within a process.

    The ThreadSensitiveContext() context manager may be used to specify a
    thread pool per context.

    This context manager is re-entrant, so only the outer-most call to
    ThreadSensitiveContext will set the context.

    Usage:

    >>> import time
    >>> async with ThreadSensitiveContext():
    ...     await sync_to_async(time.sleep, 1)()
    """

    def __init__(self):
        self.token = None

    async def __aenter__(self):
        try:
            SyncToAsync.thread_sensitive_context.get()
        except LookupError:
            self.token = SyncToAsync.thread_sensitive_context.set(self)

        return self

    async def __aexit__(self, exc, value, tb):
        if not self.token:
            return

        executor = SyncToAsync.context_to_thread_executor.pop(self, None)
        if executor:
            executor.shutdown()
        SyncToAsync.thread_sensitive_context.reset(self.token)


class AsyncToSync(Generic[_P, _R]):
    """
    Utility class which turns an awaitable that only works on the thread with
    the event loop into a synchronous callable that works in a subthread.

    If the call stack contains an async loop, the code runs there.
    Otherwise, the code runs in a new loop in a new thread.

    Either way, this thread then pauses and waits to run any thread_sensitive
    code called from further down the call stack using SyncToAsync, before
    finally exiting once the async task returns.
    """

    # Keeps a reference to the CurrentThreadExecutor in local context, so that
    # any sync_to_async inside the wrapped code can find it.
    executors: "Local" = Local()

    # When we can't find a CurrentThreadExecutor from the context, such as
    # inside create_task, we'll look it up here from the running event loop.
    loop_thread_executors: "Dict[asyncio.AbstractEventLoop, CurrentThreadExecutor]" = {}

    async_single_thread_context: "contextvars.ContextVar[AsyncSingleThreadContext]" = (
        contextvars.ContextVar("async_single_thread_context")
    )

    context_to_thread_executor: "weakref.WeakKeyDictionary[AsyncSingleThreadContext, ThreadPoolExecutor]" = (
        weakref.WeakKeyDictionary()
    )

    def __init__(
        self,
        awaitable: Union[
            Callable[_P, Coroutine[Any, Any, _R]],
            Callable[_P, Awaitable[_R]],
        ],
        force_new_loop: bool = False,
    ):
        if not callable(awaitable) or (
            not iscoroutinefunction(awaitable)
            and not iscoroutinefunction(getattr(awaitable, "__call__", awaitable))
        ):
            # Python does not have very reliable detection of async functions
            # (lots of false negatives) so this is just a warning.
            warnings.warn(
                "async_to_sync was passed a non-async-marked callable", stacklevel=2
            )
        self.awaitable = awaitable
        try:
            self.__self__ = self.awaitable.__self__  # type: ignore[union-attr]
        except AttributeError:
            pass
        self.force_new_loop = force_new_loop
        self.main_event_loop = None
        try:
            self.main_event_loop = asyncio.get_running_loop()
        except RuntimeError:
            # There's no event loop in this thread.
            pass

    def __call__(self, *args: _P.args, **kwargs: _P.kwargs) -> _R:
        __traceback_hide__ = True  # noqa: F841

        if not self.force_new_loop and not self.main_event_loop:
            # There's no event loop in this thread. Look for the threadlocal if
            # we're inside SyncToAsync
            main_event_loop_pid = getattr(
                SyncToAsync.threadlocal, "main_event_loop_pid", None
            )
            # We make sure the parent loop is from the same process - if
            # they've forked, this is not going to be valid any more (#194)
            if main_event_loop_pid and main_event_loop_pid == os.getpid():
                self.main_event_loop = getattr(
                    SyncToAsync.threadlocal, "main_event_loop", None
                )

        # You can't call AsyncToSync from a thread with a running event loop
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            pass
        else:
            raise RuntimeError(
                "You cannot use AsyncToSync in the same thread as an async event loop - "
                "just await the async function directly."
            )

        # Make a future for the return information
        call_result: "Future[_R]" = Future()

        # Make a CurrentThreadExecutor we'll use to idle in this thread - we
        # need one for every sync frame, even if there's one above us in the
        # same thread.
        old_executor = getattr(self.executors, "current", None)
        current_executor = CurrentThreadExecutor(old_executor)
        self.executors.current = current_executor

        # Wrapping context in list so it can be reassigned from within
        # `main_wrap`.
        context = [contextvars.copy_context()]

        # Get task context so that parent task knows which task to propagate
        # an asyncio.CancelledError to.
        task_context = getattr(SyncToAsync.threadlocal, "task_context", None)
|
||||||
|
|
||||||
|
# Use call_soon_threadsafe to schedule a synchronous callback on the
|
||||||
|
# main event loop's thread if it's there, otherwise make a new loop
|
||||||
|
# in this thread.
|
||||||
|
try:
|
||||||
|
awaitable = self.main_wrap(
|
||||||
|
call_result,
|
||||||
|
sys.exc_info(),
|
||||||
|
task_context,
|
||||||
|
context,
|
||||||
|
# prepare an awaitable which can be passed as is to self.main_wrap,
|
||||||
|
# so that `args` and `kwargs` don't need to be
|
||||||
|
# destructured when passed to self.main_wrap
|
||||||
|
# (which is required by `ParamSpec`)
|
||||||
|
# as that may cause overlapping arguments
|
||||||
|
self.awaitable(*args, **kwargs),
|
||||||
|
)
|
||||||
|
|
||||||
|
async def new_loop_wrap() -> None:
|
||||||
|
loop = asyncio.get_running_loop()
|
||||||
|
self.loop_thread_executors[loop] = current_executor
|
||||||
|
try:
|
||||||
|
await awaitable
|
||||||
|
finally:
|
||||||
|
del self.loop_thread_executors[loop]
|
||||||
|
|
||||||
|
if self.main_event_loop is not None:
|
||||||
|
try:
|
||||||
|
self.main_event_loop.call_soon_threadsafe(
|
||||||
|
self.main_event_loop.create_task, awaitable
|
||||||
|
)
|
||||||
|
except RuntimeError:
|
||||||
|
running_in_main_event_loop = False
|
||||||
|
else:
|
||||||
|
running_in_main_event_loop = True
|
||||||
|
# Run the CurrentThreadExecutor until the future is done.
|
||||||
|
current_executor.run_until_future(call_result)
|
||||||
|
else:
|
||||||
|
running_in_main_event_loop = False
|
||||||
|
|
||||||
|
if not running_in_main_event_loop:
|
||||||
|
loop_executor = None
|
||||||
|
|
||||||
|
if self.async_single_thread_context.get(None):
|
||||||
|
single_thread_context = self.async_single_thread_context.get()
|
||||||
|
|
||||||
|
if single_thread_context in self.context_to_thread_executor:
|
||||||
|
loop_executor = self.context_to_thread_executor[
|
||||||
|
single_thread_context
|
||||||
|
]
|
||||||
|
else:
|
||||||
|
loop_executor = ThreadPoolExecutor(max_workers=1)
|
||||||
|
self.context_to_thread_executor[
|
||||||
|
single_thread_context
|
||||||
|
] = loop_executor
|
||||||
|
else:
|
||||||
|
# Make our own event loop - in a new thread - and run inside that.
|
||||||
|
loop_executor = ThreadPoolExecutor(max_workers=1)
|
||||||
|
|
||||||
|
loop_future = loop_executor.submit(asyncio.run, new_loop_wrap())
|
||||||
|
# Run the CurrentThreadExecutor until the future is done.
|
||||||
|
current_executor.run_until_future(loop_future)
|
||||||
|
# Wait for future and/or allow for exception propagation
|
||||||
|
loop_future.result()
|
||||||
|
finally:
|
||||||
|
_restore_context(context[0])
|
||||||
|
# Restore old current thread executor state
|
||||||
|
self.executors.current = old_executor
|
||||||
|
|
||||||
|
# Wait for results from the future.
|
||||||
|
return call_result.result()
|
||||||
|
|
||||||
|
def __get__(self, parent: Any, objtype: Any) -> Callable[_P, _R]:
|
||||||
|
"""
|
||||||
|
Include self for methods
|
||||||
|
"""
|
||||||
|
func = functools.partial(self.__call__, parent)
|
||||||
|
return functools.update_wrapper(func, self.awaitable)
|
||||||
|
|
||||||
|
async def main_wrap(
|
||||||
|
self,
|
||||||
|
call_result: "Future[_R]",
|
||||||
|
exc_info: "OptExcInfo",
|
||||||
|
task_context: "Optional[List[asyncio.Task[Any]]]",
|
||||||
|
context: List[contextvars.Context],
|
||||||
|
awaitable: Union[Coroutine[Any, Any, _R], Awaitable[_R]],
|
||||||
|
) -> None:
|
||||||
|
"""
|
||||||
|
Wraps the awaitable with something that puts the result into the
|
||||||
|
result/exception future.
|
||||||
|
"""
|
||||||
|
|
||||||
|
__traceback_hide__ = True # noqa: F841
|
||||||
|
|
||||||
|
if context is not None:
|
||||||
|
_restore_context(context[0])
|
||||||
|
|
||||||
|
current_task = asyncio.current_task()
|
||||||
|
if current_task is not None and task_context is not None:
|
||||||
|
task_context.append(current_task)
|
||||||
|
|
||||||
|
try:
|
||||||
|
# If we have an exception, run the function inside the except block
|
||||||
|
# after raising it so exc_info is correctly populated.
|
||||||
|
if exc_info[1]:
|
||||||
|
try:
|
||||||
|
raise exc_info[1]
|
||||||
|
except BaseException:
|
||||||
|
result = await awaitable
|
||||||
|
else:
|
||||||
|
result = await awaitable
|
||||||
|
except BaseException as e:
|
||||||
|
call_result.set_exception(e)
|
||||||
|
else:
|
||||||
|
call_result.set_result(result)
|
||||||
|
finally:
|
||||||
|
if current_task is not None and task_context is not None:
|
||||||
|
task_context.remove(current_task)
|
||||||
|
context[0] = contextvars.copy_context()
|
||||||
|
|
||||||
|
|
||||||
|
class SyncToAsync(Generic[_P, _R]):
|
||||||
|
"""
|
||||||
|
Utility class which turns a synchronous callable into an awaitable that
|
||||||
|
runs in a threadpool. It also sets a threadlocal inside the thread so
|
||||||
|
calls to AsyncToSync can escape it.
|
||||||
|
|
||||||
|
If thread_sensitive is passed, the code will run in the same thread as any
|
||||||
|
outer code. This is needed for underlying Python code that is not
|
||||||
|
threadsafe (for example, code which handles SQLite database connections).
|
||||||
|
|
||||||
|
If the outermost program is async (i.e. SyncToAsync is outermost), then
|
||||||
|
this will be a dedicated single sub-thread that all sync code runs in,
|
||||||
|
one after the other. If the outermost program is sync (i.e. AsyncToSync is
|
||||||
|
outermost), this will just be the main thread. This is achieved by idling
|
||||||
|
with a CurrentThreadExecutor while AsyncToSync is blocking its sync parent,
|
||||||
|
rather than just blocking.
|
||||||
|
|
||||||
|
If executor is passed in, that will be used instead of the loop's default executor.
|
||||||
|
In order to pass in an executor, thread_sensitive must be set to False, otherwise
|
||||||
|
a TypeError will be raised.
|
||||||
|
"""
|
||||||
|
|
||||||
|
# Storage for main event loop references
|
||||||
|
threadlocal = threading.local()
|
||||||
|
|
||||||
|
# Single-thread executor for thread-sensitive code
|
||||||
|
single_thread_executor = ThreadPoolExecutor(max_workers=1)
|
||||||
|
|
||||||
|
# Maintain a contextvar for the current execution context. Optionally used
|
||||||
|
# for thread sensitive mode.
|
||||||
|
thread_sensitive_context: "contextvars.ContextVar[ThreadSensitiveContext]" = (
|
||||||
|
contextvars.ContextVar("thread_sensitive_context")
|
||||||
|
)
|
||||||
|
|
||||||
|
# Contextvar that is used to detect if the single thread executor
|
||||||
|
# would be awaited on while already being used in the same context
|
||||||
|
deadlock_context: "contextvars.ContextVar[bool]" = contextvars.ContextVar(
|
||||||
|
"deadlock_context"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Maintaining a weak reference to the context ensures that thread pools are
|
||||||
|
# erased once the context goes out of scope. This terminates the thread pool.
|
||||||
|
context_to_thread_executor: "weakref.WeakKeyDictionary[ThreadSensitiveContext, ThreadPoolExecutor]" = (
|
||||||
|
weakref.WeakKeyDictionary()
|
||||||
|
)
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
func: Callable[_P, _R],
|
||||||
|
thread_sensitive: bool = True,
|
||||||
|
executor: Optional["ThreadPoolExecutor"] = None,
|
||||||
|
) -> None:
|
||||||
|
if (
|
||||||
|
not callable(func)
|
||||||
|
or iscoroutinefunction(func)
|
||||||
|
or iscoroutinefunction(getattr(func, "__call__", func))
|
||||||
|
):
|
||||||
|
raise TypeError("sync_to_async can only be applied to sync functions.")
|
||||||
|
self.func = func
|
||||||
|
functools.update_wrapper(self, func)
|
||||||
|
self._thread_sensitive = thread_sensitive
|
||||||
|
markcoroutinefunction(self)
|
||||||
|
if thread_sensitive and executor is not None:
|
||||||
|
raise TypeError("executor must not be set when thread_sensitive is True")
|
||||||
|
self._executor = executor
|
||||||
|
try:
|
||||||
|
self.__self__ = func.__self__ # type: ignore
|
||||||
|
except AttributeError:
|
||||||
|
pass
|
||||||
|
|
||||||
|
async def __call__(self, *args: _P.args, **kwargs: _P.kwargs) -> _R:
|
||||||
|
__traceback_hide__ = True # noqa: F841
|
||||||
|
loop = asyncio.get_running_loop()
|
||||||
|
|
||||||
|
# Work out what thread to run the code in
|
||||||
|
if self._thread_sensitive:
|
||||||
|
current_thread_executor = getattr(AsyncToSync.executors, "current", None)
|
||||||
|
if current_thread_executor:
|
||||||
|
# If we have a parent sync thread above somewhere, use that
|
||||||
|
executor = current_thread_executor
|
||||||
|
elif self.thread_sensitive_context.get(None):
|
||||||
|
# If we have a way of retrieving the current context, attempt
|
||||||
|
# to use a per-context thread pool executor
|
||||||
|
thread_sensitive_context = self.thread_sensitive_context.get()
|
||||||
|
|
||||||
|
if thread_sensitive_context in self.context_to_thread_executor:
|
||||||
|
# Re-use thread executor in current context
|
||||||
|
executor = self.context_to_thread_executor[thread_sensitive_context]
|
||||||
|
else:
|
||||||
|
# Create new thread executor in current context
|
||||||
|
executor = ThreadPoolExecutor(max_workers=1)
|
||||||
|
self.context_to_thread_executor[thread_sensitive_context] = executor
|
||||||
|
elif loop in AsyncToSync.loop_thread_executors:
|
||||||
|
# Re-use thread executor for running loop
|
||||||
|
executor = AsyncToSync.loop_thread_executors[loop]
|
||||||
|
elif self.deadlock_context.get(False):
|
||||||
|
raise RuntimeError(
|
||||||
|
"Single thread executor already being used, would deadlock"
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
# Otherwise, we run it in a fixed single thread
|
||||||
|
executor = self.single_thread_executor
|
||||||
|
self.deadlock_context.set(True)
|
||||||
|
else:
|
||||||
|
# Use the passed in executor, or the loop's default if it is None
|
||||||
|
executor = self._executor
|
||||||
|
|
||||||
|
context = contextvars.copy_context()
|
||||||
|
child = functools.partial(self.func, *args, **kwargs)
|
||||||
|
func = context.run
|
||||||
|
task_context: List[asyncio.Task[Any]] = []
|
||||||
|
|
||||||
|
# Run the code in the right thread
|
||||||
|
exec_coro = loop.run_in_executor(
|
||||||
|
executor,
|
||||||
|
functools.partial(
|
||||||
|
self.thread_handler,
|
||||||
|
loop,
|
||||||
|
sys.exc_info(),
|
||||||
|
task_context,
|
||||||
|
func,
|
||||||
|
child,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
ret: _R
|
||||||
|
try:
|
||||||
|
ret = await asyncio.shield(exec_coro)
|
||||||
|
except asyncio.CancelledError:
|
||||||
|
cancel_parent = True
|
||||||
|
try:
|
||||||
|
task = task_context[0]
|
||||||
|
task.cancel()
|
||||||
|
try:
|
||||||
|
await task
|
||||||
|
cancel_parent = False
|
||||||
|
except asyncio.CancelledError:
|
||||||
|
pass
|
||||||
|
except IndexError:
|
||||||
|
pass
|
||||||
|
if exec_coro.done():
|
||||||
|
raise
|
||||||
|
if cancel_parent:
|
||||||
|
exec_coro.cancel()
|
||||||
|
ret = await exec_coro
|
||||||
|
finally:
|
||||||
|
_restore_context(context)
|
||||||
|
self.deadlock_context.set(False)
|
||||||
|
|
||||||
|
return ret
|
||||||
|
|
||||||
|
def __get__(
|
||||||
|
self, parent: Any, objtype: Any
|
||||||
|
) -> Callable[_P, Coroutine[Any, Any, _R]]:
|
||||||
|
"""
|
||||||
|
Include self for methods
|
||||||
|
"""
|
||||||
|
func = functools.partial(self.__call__, parent)
|
||||||
|
return functools.update_wrapper(func, self.func)
|
||||||
|
|
||||||
|
def thread_handler(self, loop, exc_info, task_context, func, *args, **kwargs):
|
||||||
|
"""
|
||||||
|
Wraps the sync application with exception handling.
|
||||||
|
"""
|
||||||
|
|
||||||
|
__traceback_hide__ = True # noqa: F841
|
||||||
|
|
||||||
|
# Set the threadlocal for AsyncToSync
|
||||||
|
self.threadlocal.main_event_loop = loop
|
||||||
|
self.threadlocal.main_event_loop_pid = os.getpid()
|
||||||
|
self.threadlocal.task_context = task_context
|
||||||
|
|
||||||
|
# Run the function
|
||||||
|
# If we have an exception, run the function inside the except block
|
||||||
|
# after raising it so exc_info is correctly populated.
|
||||||
|
if exc_info[1]:
|
||||||
|
try:
|
||||||
|
raise exc_info[1]
|
||||||
|
except BaseException:
|
||||||
|
return func(*args, **kwargs)
|
||||||
|
else:
|
||||||
|
return func(*args, **kwargs)
|
||||||
|
|
||||||
|
|
||||||
|
@overload
|
||||||
|
def async_to_sync(
|
||||||
|
*,
|
||||||
|
force_new_loop: bool = False,
|
||||||
|
) -> Callable[
|
||||||
|
[Union[Callable[_P, Coroutine[Any, Any, _R]], Callable[_P, Awaitable[_R]]]],
|
||||||
|
Callable[_P, _R],
|
||||||
|
]:
|
||||||
|
...
|
||||||
|
|
||||||
|
|
||||||
|
@overload
|
||||||
|
def async_to_sync(
|
||||||
|
awaitable: Union[
|
||||||
|
Callable[_P, Coroutine[Any, Any, _R]],
|
||||||
|
Callable[_P, Awaitable[_R]],
|
||||||
|
],
|
||||||
|
*,
|
||||||
|
force_new_loop: bool = False,
|
||||||
|
) -> Callable[_P, _R]:
|
||||||
|
...
|
||||||
|
|
||||||
|
|
||||||
|
def async_to_sync(
|
||||||
|
awaitable: Optional[
|
||||||
|
Union[
|
||||||
|
Callable[_P, Coroutine[Any, Any, _R]],
|
||||||
|
Callable[_P, Awaitable[_R]],
|
||||||
|
]
|
||||||
|
] = None,
|
||||||
|
*,
|
||||||
|
force_new_loop: bool = False,
|
||||||
|
) -> Union[
|
||||||
|
Callable[
|
||||||
|
[Union[Callable[_P, Coroutine[Any, Any, _R]], Callable[_P, Awaitable[_R]]]],
|
||||||
|
Callable[_P, _R],
|
||||||
|
],
|
||||||
|
Callable[_P, _R],
|
||||||
|
]:
|
||||||
|
if awaitable is None:
|
||||||
|
return lambda f: AsyncToSync(
|
||||||
|
f,
|
||||||
|
force_new_loop=force_new_loop,
|
||||||
|
)
|
||||||
|
return AsyncToSync(
|
||||||
|
awaitable,
|
||||||
|
force_new_loop=force_new_loop,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@overload
|
||||||
|
def sync_to_async(
|
||||||
|
*,
|
||||||
|
thread_sensitive: bool = True,
|
||||||
|
executor: Optional["ThreadPoolExecutor"] = None,
|
||||||
|
) -> Callable[[Callable[_P, _R]], Callable[_P, Coroutine[Any, Any, _R]]]:
|
||||||
|
...
|
||||||
|
|
||||||
|
|
||||||
|
@overload
|
||||||
|
def sync_to_async(
|
||||||
|
func: Callable[_P, _R],
|
||||||
|
*,
|
||||||
|
thread_sensitive: bool = True,
|
||||||
|
executor: Optional["ThreadPoolExecutor"] = None,
|
||||||
|
) -> Callable[_P, Coroutine[Any, Any, _R]]:
|
||||||
|
...
|
||||||
|
|
||||||
|
|
||||||
|
def sync_to_async(
|
||||||
|
func: Optional[Callable[_P, _R]] = None,
|
||||||
|
*,
|
||||||
|
thread_sensitive: bool = True,
|
||||||
|
executor: Optional["ThreadPoolExecutor"] = None,
|
||||||
|
) -> Union[
|
||||||
|
Callable[[Callable[_P, _R]], Callable[_P, Coroutine[Any, Any, _R]]],
|
||||||
|
Callable[_P, Coroutine[Any, Any, _R]],
|
||||||
|
]:
|
||||||
|
if func is None:
|
||||||
|
return lambda f: SyncToAsync(
|
||||||
|
f,
|
||||||
|
thread_sensitive=thread_sensitive,
|
||||||
|
executor=executor,
|
||||||
|
)
|
||||||
|
return SyncToAsync(
|
||||||
|
func,
|
||||||
|
thread_sensitive=thread_sensitive,
|
||||||
|
executor=executor,
|
||||||
|
)
|
||||||
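The fallback path of `AsyncToSync` above, when no parent event loop exists, boils down to running the coroutine to completion on a fresh loop inside a single worker thread and blocking on the result. A minimal stdlib-only sketch of that core trick (`fetch_answer` and `run_coroutine_sync` are illustrative names, not part of asgiref):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def fetch_answer() -> int:
    # Stand-in coroutine; any awaitable works here.
    await asyncio.sleep(0)
    return 42


def run_coroutine_sync(coro):
    # Same idea as AsyncToSync's "no main event loop" branch: submit
    # asyncio.run(coro) to a one-thread pool and block until it finishes.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


print(run_coroutine_sync(fetch_answer()))  # → 42
```

The real class additionally idles the calling thread in a `CurrentThreadExecutor` so nested `sync_to_async` calls can hop back onto it, which a plain `.result()` block cannot do.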
137
.venv/Lib/site-packages/asgiref/testing.py
Normal file
@@ -0,0 +1,137 @@
import asyncio
import contextvars
import time

from .compatibility import guarantee_single_callable
from .timeout import timeout as async_timeout


class ApplicationCommunicator:
    """
    Runs an ASGI application in a test mode, allowing sending of
    messages to it and retrieval of messages it sends.
    """

    def __init__(self, application, scope):
        self._future = None
        self.application = guarantee_single_callable(application)
        self.scope = scope
        self._input_queue = None
        self._output_queue = None

    # For Python 3.9 we need to lazily bind the queues, on 3.10+ they bind the
    # event loop lazily.
    @property
    def input_queue(self):
        if self._input_queue is None:
            self._input_queue = asyncio.Queue()
        return self._input_queue

    @property
    def output_queue(self):
        if self._output_queue is None:
            self._output_queue = asyncio.Queue()
        return self._output_queue

    @property
    def future(self):
        if self._future is None:
            # Clear context - this ensures that context vars set in the testing scope
            # are not "leaked" into the application which would normally begin with
            # an empty context. In Python >= 3.11 this could also be written as:
            # asyncio.create_task(..., context=contextvars.Context())
            self._future = contextvars.Context().run(
                asyncio.create_task,
                self.application(
                    self.scope, self.input_queue.get, self.output_queue.put
                ),
            )
        return self._future

    async def wait(self, timeout=1):
        """
        Waits for the application to stop itself and returns any exceptions.
        """
        try:
            async with async_timeout(timeout):
                try:
                    await self.future
                    self.future.result()
                except asyncio.CancelledError:
                    pass
        finally:
            if not self.future.done():
                self.future.cancel()
                try:
                    await self.future
                except asyncio.CancelledError:
                    pass

    def stop(self, exceptions=True):
        future = self._future
        if future is None:
            return

        if not future.done():
            future.cancel()
        elif exceptions:
            # Give a chance to raise any exceptions
            future.result()

    def __del__(self):
        # Clean up on deletion
        try:
            self.stop(exceptions=False)
        except RuntimeError:
            # Event loop already stopped
            pass

    async def send_input(self, message):
        """
        Sends a single message to the application
        """
        # Make sure there's not an exception to raise from the task
        if self.future.done():
            self.future.result()

        # Give it the message
        await self.input_queue.put(message)

    async def receive_output(self, timeout=1):
        """
        Receives a single message from the application, with optional timeout.
        """
        # Make sure there's not an exception to raise from the task
        if self.future.done():
            self.future.result()
        # Wait and receive the message
        try:
            async with async_timeout(timeout):
                return await self.output_queue.get()
        except asyncio.TimeoutError as e:
            # See if we have another error to raise inside
            if self.future.done():
                self.future.result()
            else:
                self.future.cancel()
                try:
                    await self.future
                except asyncio.CancelledError:
                    pass
            raise e

    async def receive_nothing(self, timeout=0.1, interval=0.01):
        """
        Checks that there is no message to receive in the given time.
        """
        # Make sure there's not an exception to raise from the task
        if self.future.done():
            self.future.result()

        # `interval` has precedence over `timeout`
        start = time.monotonic()
        while time.monotonic() - start < timeout:
            if not self.output_queue.empty():
                return False
            await asyncio.sleep(interval)
        return self.output_queue.empty()
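`ApplicationCommunicator` is essentially two `asyncio.Queue`s handed to the application as its `receive` and `send` callables, plus a task wrapping the application coroutine. A stdlib-only sketch of that wiring, using a toy echo application (`echo_app` and `drive` are illustrative names, not part of asgiref):

```python
import asyncio


async def echo_app(scope, receive, send):
    # Toy ASGI-style application: reads one event, echoes its body back.
    event = await receive()
    await send({"type": "echo", "body": event["body"]})


async def drive():
    # Mirror the communicator's wiring: receive/send are just the get/put
    # endpoints of two queues the test harness controls.
    input_queue: asyncio.Queue = asyncio.Queue()
    output_queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(echo_app({}, input_queue.get, output_queue.put))
    await input_queue.put({"type": "message", "body": b"hi"})  # send_input
    reply = await asyncio.wait_for(output_queue.get(), timeout=1)  # receive_output
    await task  # wait
    return reply


print(asyncio.run(drive()))  # → {'type': 'echo', 'body': b'hi'}
```

The real class adds timeouts around every wait and re-raises any exception the application task has already stored, so test failures surface where the test interacts with the app.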
118
.venv/Lib/site-packages/asgiref/timeout.py
Normal file
@@ -0,0 +1,118 @@
# This code is originally sourced from the aio-libs project "async_timeout",
# under the Apache 2.0 license. You may see the original project at
# https://github.com/aio-libs/async-timeout

# It is vendored here to reduce chain-dependencies on this library, and
# modified slightly to remove some features we don't use.


import asyncio
import warnings
from types import TracebackType
from typing import Any  # noqa
from typing import Optional, Type


class timeout:
    """timeout context manager.

    Useful in cases when you want to apply timeout logic around block
    of code or in cases when asyncio.wait_for is not suitable. For example:

    >>> with timeout(0.001):
    ...     async with aiohttp.get('https://github.com') as r:
    ...         await r.text()


    timeout - value in seconds or None to disable timeout logic
    loop - asyncio compatible event loop
    """

    def __init__(
        self,
        timeout: Optional[float],
        *,
        loop: Optional[asyncio.AbstractEventLoop] = None,
    ) -> None:
        self._timeout = timeout
        if loop is None:
            loop = asyncio.get_running_loop()
        else:
            warnings.warn(
                """The loop argument to timeout() is deprecated.""", DeprecationWarning
            )
        self._loop = loop
        self._task = None  # type: Optional[asyncio.Task[Any]]
        self._cancelled = False
        self._cancel_handler = None  # type: Optional[asyncio.Handle]
        self._cancel_at = None  # type: Optional[float]

    def __enter__(self) -> "timeout":
        return self._do_enter()

    def __exit__(
        self,
        exc_type: Type[BaseException],
        exc_val: BaseException,
        exc_tb: TracebackType,
    ) -> Optional[bool]:
        self._do_exit(exc_type)
        return None

    async def __aenter__(self) -> "timeout":
        return self._do_enter()

    async def __aexit__(
        self,
        exc_type: Type[BaseException],
        exc_val: BaseException,
        exc_tb: TracebackType,
    ) -> None:
        self._do_exit(exc_type)

    @property
    def expired(self) -> bool:
        return self._cancelled

    @property
    def remaining(self) -> Optional[float]:
        if self._cancel_at is not None:
            return max(self._cancel_at - self._loop.time(), 0.0)
        else:
            return None

    def _do_enter(self) -> "timeout":
        # Support Tornado 5- without timeout
        # Details: https://github.com/python/asyncio/issues/392
        if self._timeout is None:
            return self

        self._task = asyncio.current_task(self._loop)
        if self._task is None:
            raise RuntimeError(
                "Timeout context manager should be used " "inside a task"
            )

        if self._timeout <= 0:
            self._loop.call_soon(self._cancel_task)
            return self

        self._cancel_at = self._loop.time() + self._timeout
        self._cancel_handler = self._loop.call_at(self._cancel_at, self._cancel_task)
        return self

    def _do_exit(self, exc_type: Type[BaseException]) -> None:
        if exc_type is asyncio.CancelledError and self._cancelled:
            self._cancel_handler = None
            self._task = None
            raise asyncio.TimeoutError
        if self._timeout is not None and self._cancel_handler is not None:
            self._cancel_handler.cancel()
            self._cancel_handler = None
        self._task = None
        return None

    def _cancel_task(self) -> None:
        if self._task is not None:
            self._task.cancel()
            self._cancelled = True
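The mechanism above is: schedule `task.cancel` on the loop at the deadline via `call_at`, then on exit translate the resulting `CancelledError` into `TimeoutError` if the cancellation was ours. A hand-rolled stdlib sketch of the same pattern (`slow` and `main` are illustrative names; here the `CancelledError` is caught directly rather than converted):

```python
import asyncio


async def slow() -> None:
    await asyncio.sleep(10)


async def main() -> str:
    # Schedule cancellation of the task at a deadline, exactly as the
    # vendored class does with loop.call_at(cancel_at, self._cancel_task).
    loop = asyncio.get_running_loop()
    task = asyncio.ensure_future(slow())
    handle = loop.call_at(loop.time() + 0.01, task.cancel)
    try:
        await task
        return "finished"
    except asyncio.CancelledError:
        # The class re-raises this as asyncio.TimeoutError in _do_exit.
        return "timed out"
    finally:
        handle.cancel()  # no-op if the deadline already fired


print(asyncio.run(main()))  # → timed out
```

Cancelling the timer handle on the success path matters: without it, a finished block would still get a spurious cancellation at the deadline.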
279
.venv/Lib/site-packages/asgiref/typing.py
Normal file
@@ -0,0 +1,279 @@
import sys
from typing import (
    Any,
    Awaitable,
    Callable,
    Dict,
    Iterable,
    Literal,
    Optional,
    Protocol,
    Tuple,
    Type,
    TypedDict,
    Union,
)

if sys.version_info >= (3, 11):
    from typing import NotRequired
else:
    from typing_extensions import NotRequired

__all__ = (
    "ASGIVersions",
    "HTTPScope",
    "WebSocketScope",
    "LifespanScope",
    "WWWScope",
    "Scope",
    "HTTPRequestEvent",
    "HTTPResponseStartEvent",
    "HTTPResponseBodyEvent",
    "HTTPResponseTrailersEvent",
    "HTTPResponsePathsendEvent",
    "HTTPServerPushEvent",
    "HTTPDisconnectEvent",
    "WebSocketConnectEvent",
    "WebSocketAcceptEvent",
    "WebSocketReceiveEvent",
    "WebSocketSendEvent",
    "WebSocketResponseStartEvent",
    "WebSocketResponseBodyEvent",
    "WebSocketDisconnectEvent",
    "WebSocketCloseEvent",
    "LifespanStartupEvent",
    "LifespanShutdownEvent",
    "LifespanStartupCompleteEvent",
    "LifespanStartupFailedEvent",
    "LifespanShutdownCompleteEvent",
    "LifespanShutdownFailedEvent",
    "ASGIReceiveEvent",
    "ASGISendEvent",
    "ASGIReceiveCallable",
    "ASGISendCallable",
    "ASGI2Protocol",
    "ASGI2Application",
    "ASGI3Application",
    "ASGIApplication",
)


class ASGIVersions(TypedDict):
    spec_version: str
    version: Union[Literal["2.0"], Literal["3.0"]]


class HTTPScope(TypedDict):
    type: Literal["http"]
    asgi: ASGIVersions
    http_version: str
    method: str
    scheme: str
    path: str
    raw_path: bytes
    query_string: bytes
    root_path: str
    headers: Iterable[Tuple[bytes, bytes]]
    client: Optional[Tuple[str, int]]
    server: Optional[Tuple[str, Optional[int]]]
    state: NotRequired[Dict[str, Any]]
    extensions: Optional[Dict[str, Dict[object, object]]]


class WebSocketScope(TypedDict):
    type: Literal["websocket"]
    asgi: ASGIVersions
    http_version: str
    scheme: str
    path: str
    raw_path: bytes
    query_string: bytes
    root_path: str
    headers: Iterable[Tuple[bytes, bytes]]
    client: Optional[Tuple[str, int]]
    server: Optional[Tuple[str, Optional[int]]]
    subprotocols: Iterable[str]
    state: NotRequired[Dict[str, Any]]
    extensions: Optional[Dict[str, Dict[object, object]]]


class LifespanScope(TypedDict):
    type: Literal["lifespan"]
    asgi: ASGIVersions
    state: NotRequired[Dict[str, Any]]


WWWScope = Union[HTTPScope, WebSocketScope]
Scope = Union[HTTPScope, WebSocketScope, LifespanScope]


class HTTPRequestEvent(TypedDict):
    type: Literal["http.request"]
    body: bytes
    more_body: bool


class HTTPResponseDebugEvent(TypedDict):
    type: Literal["http.response.debug"]
    info: Dict[str, object]


class HTTPResponseStartEvent(TypedDict):
    type: Literal["http.response.start"]
    status: int
    headers: Iterable[Tuple[bytes, bytes]]
    trailers: bool


class HTTPResponseBodyEvent(TypedDict):
    type: Literal["http.response.body"]
    body: bytes
    more_body: bool


class HTTPResponseTrailersEvent(TypedDict):
    type: Literal["http.response.trailers"]
    headers: Iterable[Tuple[bytes, bytes]]
    more_trailers: bool


class HTTPResponsePathsendEvent(TypedDict):
    type: Literal["http.response.pathsend"]
    path: str


class HTTPServerPushEvent(TypedDict):
    type: Literal["http.response.push"]
    path: str
    headers: Iterable[Tuple[bytes, bytes]]


class HTTPDisconnectEvent(TypedDict):
    type: Literal["http.disconnect"]


class WebSocketConnectEvent(TypedDict):
    type: Literal["websocket.connect"]


class WebSocketAcceptEvent(TypedDict):
    type: Literal["websocket.accept"]
    subprotocol: Optional[str]
    headers: Iterable[Tuple[bytes, bytes]]


class WebSocketReceiveEvent(TypedDict):
    type: Literal["websocket.receive"]
    bytes: Optional[bytes]
    text: Optional[str]


class WebSocketSendEvent(TypedDict):
    type: Literal["websocket.send"]
    bytes: Optional[bytes]
    text: Optional[str]


class WebSocketResponseStartEvent(TypedDict):
    type: Literal["websocket.http.response.start"]
    status: int
    headers: Iterable[Tuple[bytes, bytes]]


class WebSocketResponseBodyEvent(TypedDict):
    type: Literal["websocket.http.response.body"]
    body: bytes
    more_body: bool


class WebSocketDisconnectEvent(TypedDict):
    type: Literal["websocket.disconnect"]
    code: int
    reason: Optional[str]


class WebSocketCloseEvent(TypedDict):
    type: Literal["websocket.close"]
    code: int
    reason: Optional[str]


class LifespanStartupEvent(TypedDict):
    type: Literal["lifespan.startup"]


class LifespanShutdownEvent(TypedDict):
    type: Literal["lifespan.shutdown"]


class LifespanStartupCompleteEvent(TypedDict):
    type: Literal["lifespan.startup.complete"]


class LifespanStartupFailedEvent(TypedDict):
    type: Literal["lifespan.startup.failed"]
    message: str


class LifespanShutdownCompleteEvent(TypedDict):
    type: Literal["lifespan.shutdown.complete"]


class LifespanShutdownFailedEvent(TypedDict):
    type: Literal["lifespan.shutdown.failed"]
    message: str


ASGIReceiveEvent = Union[
    HTTPRequestEvent,
    HTTPDisconnectEvent,
    WebSocketConnectEvent,
    WebSocketReceiveEvent,
    WebSocketDisconnectEvent,
    LifespanStartupEvent,
    LifespanShutdownEvent,
]


ASGISendEvent = Union[
    HTTPResponseStartEvent,
    HTTPResponseBodyEvent,
    HTTPResponseTrailersEvent,
    HTTPServerPushEvent,
    HTTPDisconnectEvent,
    WebSocketAcceptEvent,
    WebSocketSendEvent,
    WebSocketResponseStartEvent,
    WebSocketResponseBodyEvent,
|
WebSocketCloseEvent,
|
||||||
|
LifespanStartupCompleteEvent,
|
||||||
|
LifespanStartupFailedEvent,
|
||||||
|
LifespanShutdownCompleteEvent,
|
||||||
|
LifespanShutdownFailedEvent,
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
ASGIReceiveCallable = Callable[[], Awaitable[ASGIReceiveEvent]]
|
||||||
|
ASGISendCallable = Callable[[ASGISendEvent], Awaitable[None]]
|
||||||
|
|
||||||
|
|
||||||
|
class ASGI2Protocol(Protocol):
|
||||||
|
def __init__(self, scope: Scope) -> None:
|
||||||
|
...
|
||||||
|
|
||||||
|
async def __call__(
|
||||||
|
self, receive: ASGIReceiveCallable, send: ASGISendCallable
|
||||||
|
) -> None:
|
||||||
|
...
|
||||||
|
|
||||||
|
|
||||||
|
ASGI2Application = Type[ASGI2Protocol]
|
||||||
|
ASGI3Application = Callable[
|
||||||
|
[
|
||||||
|
Scope,
|
||||||
|
ASGIReceiveCallable,
|
||||||
|
ASGISendCallable,
|
||||||
|
],
|
||||||
|
Awaitable[None],
|
||||||
|
]
|
||||||
|
ASGIApplication = Union[ASGI2Application, ASGI3Application]
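Taken together, these aliases say that an ASGI 3 application is simply an async callable taking `(scope, receive, send)`. A minimal sketch of one, driven by stub `receive`/`send` callables (the stubs are test doubles invented here for illustration, not part of asgiref):

```python
import asyncio


async def app(scope, receive, send):
    # A minimal ASGI 3 application matching the ASGI3Application signature.
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"hello", "more_body": False})


async def main():
    sent = []

    async def receive():
        # Stub ASGIReceiveCallable: a single empty request body.
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(event):
        # Stub ASGISendCallable: record events instead of writing to a socket.
        sent.append(event)

    await app({"type": "http", "method": "GET", "path": "/"}, receive, send)
    return sent


events = asyncio.run(main())
```

Running it collects the two send events a server would transmit: a `http.response.start` with status 200, then a `http.response.body` carrying the payload.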
166
.venv/Lib/site-packages/asgiref/wsgi.py
Normal file
@@ -0,0 +1,166 @@
import sys
from tempfile import SpooledTemporaryFile

from asgiref.sync import AsyncToSync, sync_to_async


class WsgiToAsgi:
    """
    Wraps a WSGI application to make it into an ASGI application.
    """

    def __init__(self, wsgi_application):
        self.wsgi_application = wsgi_application

    async def __call__(self, scope, receive, send):
        """
        ASGI application instantiation point.
        We return a new WsgiToAsgiInstance here with the WSGI app
        and the scope, ready to respond when it is __call__ed.
        """
        await WsgiToAsgiInstance(self.wsgi_application)(scope, receive, send)


class WsgiToAsgiInstance:
    """
    Per-socket instance of a wrapped WSGI application
    """

    def __init__(self, wsgi_application):
        self.wsgi_application = wsgi_application
        self.response_started = False
        self.response_content_length = None

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            raise ValueError("WSGI wrapper received a non-HTTP scope")
        self.scope = scope
        with SpooledTemporaryFile(max_size=65536) as body:
            # Alright, wait for the http.request messages
            while True:
                message = await receive()
                if message["type"] != "http.request":
                    raise ValueError("WSGI wrapper received a non-HTTP-request message")
                body.write(message.get("body", b""))
                if not message.get("more_body"):
                    break
            body.seek(0)
            # Wrap send so it can be called from the subthread
            self.sync_send = AsyncToSync(send)
            # Call the WSGI app
            await self.run_wsgi_app(body)

    def build_environ(self, scope, body):
        """
        Builds a scope and request body into a WSGI environ object.
        """
        script_name = scope.get("root_path", "").encode("utf8").decode("latin1")
        path_info = scope["path"].encode("utf8").decode("latin1")
        if path_info.startswith(script_name):
            path_info = path_info[len(script_name) :]
        environ = {
            "REQUEST_METHOD": scope["method"],
            "SCRIPT_NAME": script_name,
            "PATH_INFO": path_info,
            "QUERY_STRING": scope["query_string"].decode("ascii"),
            "SERVER_PROTOCOL": "HTTP/%s" % scope["http_version"],
            "wsgi.version": (1, 0),
            "wsgi.url_scheme": scope.get("scheme", "http"),
            "wsgi.input": body,
            "wsgi.errors": sys.stderr,
            "wsgi.multithread": True,
            "wsgi.multiprocess": True,
            "wsgi.run_once": False,
        }
        # Get server name and port - required in WSGI, not in ASGI
        if "server" in scope:
            environ["SERVER_NAME"] = scope["server"][0]
            environ["SERVER_PORT"] = str(scope["server"][1])
        else:
            environ["SERVER_NAME"] = "localhost"
            environ["SERVER_PORT"] = "80"

        if scope.get("client") is not None:
            environ["REMOTE_ADDR"] = scope["client"][0]

        # Go through headers and make them into environ entries
        for name, value in self.scope.get("headers", []):
            name = name.decode("latin1")
            if name == "content-length":
                corrected_name = "CONTENT_LENGTH"
            elif name == "content-type":
                corrected_name = "CONTENT_TYPE"
            else:
                corrected_name = "HTTP_%s" % name.upper().replace("-", "_")
            # HTTPbis says only ASCII chars are allowed in headers, but we decode latin1 just in case
            value = value.decode("latin1")
            if corrected_name in environ:
                value = environ[corrected_name] + "," + value
            environ[corrected_name] = value
        return environ

    def start_response(self, status, response_headers, exc_info=None):
        """
        WSGI start_response callable.
        """
        # Don't allow re-calling once response has begun
        if self.response_started:
            raise exc_info[1].with_traceback(exc_info[2])
        # Don't allow re-calling without exc_info
        if hasattr(self, "response_start") and exc_info is None:
            raise ValueError(
                "You cannot call start_response a second time without exc_info"
            )
        # Extract status code
        status_code, _ = status.split(" ", 1)
        status_code = int(status_code)
        # Extract headers
        headers = [
            (name.lower().encode("ascii"), value.encode("ascii"))
            for name, value in response_headers
        ]
        # Extract content-length
        self.response_content_length = None
        for name, value in response_headers:
            if name.lower() == "content-length":
                self.response_content_length = int(value)
        # Build and send response start message.
        self.response_start = {
            "type": "http.response.start",
            "status": status_code,
            "headers": headers,
        }

    @sync_to_async
    def run_wsgi_app(self, body):
        """
        Called in a subthread to run the WSGI app. We encapsulate like
        this so that the start_response callable is called in the same thread.
        """
        # Translate the scope and incoming request body into a WSGI environ
        environ = self.build_environ(self.scope, body)
        # Run the WSGI app
        bytes_sent = 0
        for output in self.wsgi_application(environ, self.start_response):
            # If this is the first response, include the response headers
            if not self.response_started:
                self.response_started = True
                self.sync_send(self.response_start)
            # If the application supplies a Content-Length header
            if self.response_content_length is not None:
                # The server should not transmit more bytes to the client than the header allows
                bytes_allowed = self.response_content_length - bytes_sent
                if len(output) > bytes_allowed:
                    output = output[:bytes_allowed]
            self.sync_send(
                {"type": "http.response.body", "body": output, "more_body": True}
            )
            bytes_sent += len(output)
            # The server should stop iterating over the response when enough data has been sent
            if bytes_sent == self.response_content_length:
                break
        # Close connection
        if not self.response_started:
            self.response_started = True
            self.sync_send(self.response_start)
        self.sync_send({"type": "http.response.body"})
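The scope-to-environ translation performed by `build_environ` can be exercised without a server. A standalone sketch of the same header-mapping rules (illustrative only; `scope_to_environ` is a hypothetical helper written here, not asgiref's API):

```python
import io
import sys


def scope_to_environ(scope, body=b""):
    # Mirror the translation rules used above: content-length/content-type
    # are special-cased, every other header becomes an HTTP_* key.
    environ = {
        "REQUEST_METHOD": scope["method"],
        "SCRIPT_NAME": scope.get("root_path", ""),
        "PATH_INFO": scope["path"],
        "QUERY_STRING": scope.get("query_string", b"").decode("ascii"),
        "SERVER_PROTOCOL": "HTTP/%s" % scope.get("http_version", "1.1"),
        "wsgi.version": (1, 0),
        "wsgi.url_scheme": scope.get("scheme", "http"),
        "wsgi.input": io.BytesIO(body),
        "wsgi.errors": sys.stderr,
    }
    for name, value in scope.get("headers", []):
        name = name.decode("latin1")
        if name == "content-length":
            key = "CONTENT_LENGTH"
        elif name == "content-type":
            key = "CONTENT_TYPE"
        else:
            key = "HTTP_%s" % name.upper().replace("-", "_")
        environ[key] = value.decode("latin1")
    return environ


scope = {
    "type": "http",
    "method": "GET",
    "path": "/hello",
    "query_string": b"a=1",
    "headers": [(b"content-type", b"text/plain"), (b"x-trace-id", b"abc")],
}
env = scope_to_environ(scope)
```

Note how `x-trace-id` surfaces as `HTTP_X_TRACE_ID` while `content-type` keeps its dedicated `CONTENT_TYPE` key, exactly the distinction WSGI's CGI-derived environ requires.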
@@ -0,0 +1 @@
pip
78
.venv/Lib/site-packages/certifi-2025.10.5.dist-info/METADATA
Normal file
@@ -0,0 +1,78 @@
Metadata-Version: 2.4
Name: certifi
Version: 2025.10.5
Summary: Python package for providing Mozilla's CA Bundle.
Home-page: https://github.com/certifi/python-certifi
Author: Kenneth Reitz
Author-email: me@kennethreitz.com
License: MPL-2.0
Project-URL: Source, https://github.com/certifi/python-certifi
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)
Classifier: Natural Language :: English
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: >=3.7
License-File: LICENSE
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: project-url
Dynamic: requires-python
Dynamic: summary

Certifi: Python SSL Certificates
================================

Certifi provides Mozilla's carefully curated collection of Root Certificates for
validating the trustworthiness of SSL certificates while verifying the identity
of TLS hosts. It has been extracted from the `Requests`_ project.

Installation
------------

``certifi`` is available on PyPI. Simply install it with ``pip``::

    $ pip install certifi

Usage
-----

To reference the installed certificate authority (CA) bundle, you can use the
built-in function::

    >>> import certifi

    >>> certifi.where()
    '/usr/local/lib/python3.7/site-packages/certifi/cacert.pem'

Or from the command line::

    $ python -m certifi
    /usr/local/lib/python3.7/site-packages/certifi/cacert.pem

Enjoy!

.. _`Requests`: https://requests.readthedocs.io/en/master/

Addition/Removal of Certificates
--------------------------------

Certifi does not support any addition/removal or other modification of the
CA trust store content. This project is intended to provide a reliable and
highly portable root of trust to Python deployments. Look to upstream projects
for methods to use alternate trust.
14
.venv/Lib/site-packages/certifi-2025.10.5.dist-info/RECORD
Normal file
@@ -0,0 +1,14 @@
certifi-2025.10.5.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4
certifi-2025.10.5.dist-info/METADATA,sha256=RzyR4sT6xRN1pNNy24IHVOlZuDJh1BNfaMa04zEadtk,2474
certifi-2025.10.5.dist-info/RECORD,,
certifi-2025.10.5.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
certifi-2025.10.5.dist-info/licenses/LICENSE,sha256=6TcW2mucDVpKHfYP5pWzcPBpVgPSH2-D8FPkLPwQyvc,989
certifi-2025.10.5.dist-info/top_level.txt,sha256=KMu4vUCfsjLrkPbSNdgdekS-pVJzBAJFO__nI8NF6-U,8
certifi/__init__.py,sha256=jWkaYHMk4oIPSSBEK5bLMbO_qrkyNm_cRFx-D16-3Ks,94
certifi/__main__.py,sha256=xBBoj905TUWBLRGANOcf7oi6e-3dMP4cEoG9OyMs11g,243
certifi/__pycache__/__init__.cpython-312.pyc,,
certifi/__pycache__/__main__.cpython-312.pyc,,
certifi/__pycache__/core.cpython-312.pyc,,
certifi/cacert.pem,sha256=IIn8WiWDZAH67pn3IkYLAbOTmZdGoPuBeUNmbW7MBFg,291366
certifi/core.py,sha256=XFXycndG5pf37ayeF8N32HUuDafsyhkVMbO4BAPWHa0,3394
certifi/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
@@ -0,0 +1,5 @@
Wheel-Version: 1.0
Generator: setuptools (80.9.0)
Root-Is-Purelib: true
Tag: py3-none-any

@@ -0,0 +1,20 @@
This package contains a modified version of ca-bundle.crt:

ca-bundle.crt -- Bundle of CA Root Certificates

This is a bundle of X.509 certificates of public Certificate Authorities
(CA). These were automatically extracted from Mozilla's root certificates
file (certdata.txt). This file can be found in the mozilla source tree:
https://hg.mozilla.org/mozilla-central/file/tip/security/nss/lib/ckfw/builtins/certdata.txt
It contains the certificates in PEM format and therefore
can be directly used with curl / libcurl / php_curl, or with
an Apache+mod_ssl webserver for SSL client authentication.
Just configure this file as the SSLCACertificateFile.#

***** BEGIN LICENSE BLOCK *****
This Source Code Form is subject to the terms of the Mozilla Public License,
v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain
one at http://mozilla.org/MPL/2.0/.

***** END LICENSE BLOCK *****
@(#) $RCSfile: certdata.txt,v $ $Revision: 1.80 $ $Date: 2011/11/03 15:11:58 $
@@ -0,0 +1 @@
certifi
4
.venv/Lib/site-packages/certifi/__init__.py
Normal file
@@ -0,0 +1,4 @@
from .core import contents, where

__all__ = ["contents", "where"]
__version__ = "2025.10.05"
12
.venv/Lib/site-packages/certifi/__main__.py
Normal file
@@ -0,0 +1,12 @@
import argparse

from certifi import contents, where

parser = argparse.ArgumentParser()
parser.add_argument("-c", "--contents", action="store_true")
args = parser.parse_args()

if args.contents:
    print(contents())
else:
    print(where())
Binary file not shown.
Binary file not shown.
BIN
.venv/Lib/site-packages/certifi/__pycache__/core.cpython-312.pyc
Normal file
Binary file not shown.
4800
.venv/Lib/site-packages/certifi/cacert.pem
Normal file
File diff suppressed because it is too large
83
.venv/Lib/site-packages/certifi/core.py
Normal file
@@ -0,0 +1,83 @@
"""
|
||||||
|
certifi.py
|
||||||
|
~~~~~~~~~~
|
||||||
|
|
||||||
|
This module returns the installation location of cacert.pem or its contents.
|
||||||
|
"""
|
||||||
|
import sys
|
||||||
|
import atexit
|
||||||
|
|
||||||
|
def exit_cacert_ctx() -> None:
|
||||||
|
_CACERT_CTX.__exit__(None, None, None) # type: ignore[union-attr]
|
||||||
|
|
||||||
|
|
||||||
|
if sys.version_info >= (3, 11):
|
||||||
|
|
||||||
|
from importlib.resources import as_file, files
|
||||||
|
|
||||||
|
_CACERT_CTX = None
|
||||||
|
_CACERT_PATH = None
|
||||||
|
|
||||||
|
def where() -> str:
|
||||||
|
# This is slightly terrible, but we want to delay extracting the file
|
||||||
|
# in cases where we're inside of a zipimport situation until someone
|
||||||
|
# actually calls where(), but we don't want to re-extract the file
|
||||||
|
# on every call of where(), so we'll do it once then store it in a
|
||||||
|
# global variable.
|
||||||
|
global _CACERT_CTX
|
||||||
|
global _CACERT_PATH
|
||||||
|
if _CACERT_PATH is None:
|
||||||
|
# This is slightly janky, the importlib.resources API wants you to
|
||||||
|
# manage the cleanup of this file, so it doesn't actually return a
|
||||||
|
# path, it returns a context manager that will give you the path
|
||||||
|
# when you enter it and will do any cleanup when you leave it. In
|
||||||
|
# the common case of not needing a temporary file, it will just
|
||||||
|
# return the file system location and the __exit__() is a no-op.
|
||||||
|
#
|
||||||
|
# We also have to hold onto the actual context manager, because
|
||||||
|
# it will do the cleanup whenever it gets garbage collected, so
|
||||||
|
# we will also store that at the global level as well.
|
||||||
|
_CACERT_CTX = as_file(files("certifi").joinpath("cacert.pem"))
|
||||||
|
_CACERT_PATH = str(_CACERT_CTX.__enter__())
|
||||||
|
atexit.register(exit_cacert_ctx)
|
||||||
|
|
||||||
|
return _CACERT_PATH
|
||||||
|
|
||||||
|
def contents() -> str:
|
||||||
|
return files("certifi").joinpath("cacert.pem").read_text(encoding="ascii")
|
||||||
|
|
||||||
|
else:
|
||||||
|
|
||||||
|
from importlib.resources import path as get_path, read_text
|
||||||
|
|
||||||
|
_CACERT_CTX = None
|
||||||
|
_CACERT_PATH = None
|
||||||
|
|
||||||
|
def where() -> str:
|
||||||
|
# This is slightly terrible, but we want to delay extracting the
|
||||||
|
# file in cases where we're inside of a zipimport situation until
|
||||||
|
# someone actually calls where(), but we don't want to re-extract
|
||||||
|
# the file on every call of where(), so we'll do it once then store
|
||||||
|
# it in a global variable.
|
||||||
|
global _CACERT_CTX
|
||||||
|
global _CACERT_PATH
|
||||||
|
if _CACERT_PATH is None:
|
||||||
|
# This is slightly janky, the importlib.resources API wants you
|
||||||
|
# to manage the cleanup of this file, so it doesn't actually
|
||||||
|
# return a path, it returns a context manager that will give
|
||||||
|
# you the path when you enter it and will do any cleanup when
|
||||||
|
# you leave it. In the common case of not needing a temporary
|
||||||
|
# file, it will just return the file system location and the
|
||||||
|
# __exit__() is a no-op.
|
||||||
|
#
|
||||||
|
# We also have to hold onto the actual context manager, because
|
||||||
|
# it will do the cleanup whenever it gets garbage collected, so
|
||||||
|
# we will also store that at the global level as well.
|
||||||
|
_CACERT_CTX = get_path("certifi", "cacert.pem")
|
||||||
|
_CACERT_PATH = str(_CACERT_CTX.__enter__())
|
||||||
|
atexit.register(exit_cacert_ctx)
|
||||||
|
|
||||||
|
return _CACERT_PATH
|
||||||
|
|
||||||
|
def contents() -> str:
|
||||||
|
return read_text("certifi", "cacert.pem", encoding="ascii")
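Both branches implement the same lazy extract-once-and-cache pattern: enter the resource context on first use, remember the path, and defer cleanup to interpreter exit. A generic stdlib sketch of that pattern on Python 3.9+ (`cached_resource_path` is a hypothetical name invented here, not certifi's API, and the `email` package is used only as a convenient stand-in resource):

```python
import atexit
from importlib.resources import as_file, files

_ctx = None
_path = None


def cached_resource_path(package: str, resource: str) -> str:
    """Extract a package resource at most once, cache the filesystem path,
    and register cleanup of the extraction context for interpreter exit."""
    global _ctx, _path
    if _path is None:
        _ctx = as_file(files(package).joinpath(resource))
        _path = str(_ctx.__enter__())
        atexit.register(lambda: _ctx.__exit__(None, None, None))
    return _path


p = cached_resource_path("email", "__init__.py")
```

Repeated calls return the cached path without re-entering the context, which is exactly why certifi holds both the context manager and the path at module level.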
0
.venv/Lib/site-packages/certifi/py.typed
Normal file
@@ -0,0 +1 @@
pip
@@ -0,0 +1,764 @@
Metadata-Version: 2.4
|
||||||
|
Name: charset-normalizer
|
||||||
|
Version: 3.4.4
|
||||||
|
Summary: The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
|
||||||
|
Author-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
|
||||||
|
Maintainer-email: "Ahmed R. TAHRI" <tahri.ahmed@proton.me>
|
||||||
|
License: MIT
|
||||||
|
Project-URL: Changelog, https://github.com/jawah/charset_normalizer/blob/master/CHANGELOG.md
|
||||||
|
Project-URL: Documentation, https://charset-normalizer.readthedocs.io/
|
||||||
|
Project-URL: Code, https://github.com/jawah/charset_normalizer
|
||||||
|
Project-URL: Issue tracker, https://github.com/jawah/charset_normalizer/issues
|
||||||
|
Keywords: encoding,charset,charset-detector,detector,normalization,unicode,chardet,detect
|
||||||
|
Classifier: Development Status :: 5 - Production/Stable
|
||||||
|
Classifier: Intended Audience :: Developers
|
||||||
|
Classifier: Operating System :: OS Independent
|
||||||
|
Classifier: Programming Language :: Python
|
||||||
|
Classifier: Programming Language :: Python :: 3
|
||||||
|
Classifier: Programming Language :: Python :: 3.7
|
||||||
|
Classifier: Programming Language :: Python :: 3.8
|
||||||
|
Classifier: Programming Language :: Python :: 3.9
|
||||||
|
Classifier: Programming Language :: Python :: 3.10
|
||||||
|
Classifier: Programming Language :: Python :: 3.11
|
||||||
|
Classifier: Programming Language :: Python :: 3.12
|
||||||
|
Classifier: Programming Language :: Python :: 3.13
|
||||||
|
Classifier: Programming Language :: Python :: 3.14
|
||||||
|
Classifier: Programming Language :: Python :: 3 :: Only
|
||||||
|
Classifier: Programming Language :: Python :: Implementation :: CPython
|
||||||
|
Classifier: Programming Language :: Python :: Implementation :: PyPy
|
||||||
|
Classifier: Topic :: Text Processing :: Linguistic
|
||||||
|
Classifier: Topic :: Utilities
|
||||||
|
Classifier: Typing :: Typed
|
||||||
|
Requires-Python: >=3.7
|
||||||
|
Description-Content-Type: text/markdown
|
||||||
|
License-File: LICENSE
|
||||||
|
Provides-Extra: unicode-backport
|
||||||
|
Dynamic: license-file
|
||||||
|
|
||||||
|
<h1 align="center">Charset Detection, for Everyone 👋</h1>
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<sup>The Real First Universal Charset Detector</sup><br>
|
||||||
|
<a href="https://pypi.org/project/charset-normalizer">
|
||||||
|
<img src="https://img.shields.io/pypi/pyversions/charset_normalizer.svg?orange=blue" />
|
||||||
|
</a>
|
||||||
|
<a href="https://pepy.tech/project/charset-normalizer/">
|
||||||
|
<img alt="Download Count Total" src="https://static.pepy.tech/badge/charset-normalizer/month" />
|
||||||
|
</a>
|
||||||
|
<a href="https://bestpractices.coreinfrastructure.org/projects/7297">
|
||||||
|
<img src="https://bestpractices.coreinfrastructure.org/projects/7297/badge">
|
||||||
|
</a>
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<sup><i>Featured Packages</i></sup><br>
|
||||||
|
<a href="https://github.com/jawah/niquests">
|
||||||
|
<img alt="Static Badge" src="https://img.shields.io/badge/Niquests-Most_Advanced_HTTP_Client-cyan">
|
||||||
|
</a>
|
||||||
|
<a href="https://github.com/jawah/wassima">
|
||||||
|
<img alt="Static Badge" src="https://img.shields.io/badge/Wassima-Certifi_Replacement-cyan">
|
||||||
|
</a>
|
||||||
|
</p>
|
||||||
|
<p align="center">
|
||||||
|
<sup><i>In other language (unofficial port - by the community)</i></sup><br>
|
||||||
|
<a href="https://github.com/nickspring/charset-normalizer-rs">
|
||||||
|
<img alt="Static Badge" src="https://img.shields.io/badge/Rust-red">
|
||||||
|
</a>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
> A library that helps you read text from an unknown charset encoding.<br /> Motivated by `chardet`,
|
||||||
|
> I'm trying to resolve the issue by taking a new approach.
|
||||||
|
> All IANA character set names for which the Python core library provides codecs are supported.
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
>>>>> <a href="https://charsetnormalizerweb.ousret.now.sh" target="_blank">👉 Try Me Online Now, Then Adopt Me 👈 </a> <<<<<
|
||||||
|
</p>
|
||||||
|
|
||||||
|
This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.
|
||||||
|
|
||||||
|
| Feature | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |
|
||||||
|
|--------------------------------------------------|:---------------------------------------------:|:--------------------------------------------------------------------------------------------------:|:-----------------------------------------------:|
|
||||||
|
| `Fast` | ❌ | ✅ | ✅ |
|
||||||
|
| `Universal**` | ❌ | ✅ | ❌ |
|
||||||
|
| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
|
||||||
|
| `Reliable` **with** distinguishable standards | ✅ | ✅ | ✅ |
|
||||||
|
| `License` | LGPL-2.1<br>_restrictive_ | MIT | MPL-1.1<br>_restrictive_ |
|
||||||
|
| `Native Python` | ✅ | ✅ | ❌ |
|
||||||
|
| `Detect spoken language` | ❌ | ✅ | N/A |
|
||||||
|
| `UnicodeDecodeError Safety` | ❌ | ✅ | ❌ |
|
||||||
|
| `Whl Size (min)` | 193.6 kB | 42 kB | ~200 kB |
|
||||||
|
| `Supported Encoding` | 33 | 🎉 [99](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40 |
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<img src="https://i.imgflip.com/373iay.gif" alt="Reading Normalized Text" width="226"/><img src="https://media.tenor.com/images/c0180f70732a18b4965448d33adba3d0/tenor.gif" alt="Cat Reading Text" width="200"/>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
*\*\* : They are clearly using specific code for a specific encoding even if covering most of used one*<br>
|
||||||
|
|
||||||
|
## ⚡ Performance
|
||||||
|
|
||||||
|
This package offer better performance than its counterpart Chardet. Here are some numbers.
|
||||||
|
|
||||||
|
| Package | Accuracy | Mean per file (ms) | File per sec (est) |
|
||||||
|
|-----------------------------------------------|:--------:|:------------------:|:------------------:|
|
||||||
|
| [chardet](https://github.com/chardet/chardet) | 86 % | 63 ms | 16 file/sec |
|
||||||
|
| charset-normalizer | **98 %** | **10 ms** | 100 file/sec |
|
||||||
|
|
||||||
|
| Package | 99th percentile | 95th percentile | 50th percentile |
|
||||||
|
|-----------------------------------------------|:---------------:|:---------------:|:---------------:|
|
||||||
|
| [chardet](https://github.com/chardet/chardet) | 265 ms | 71 ms | 7 ms |
|
||||||
|
| charset-normalizer | 100 ms | 50 ms | 5 ms |
|
||||||
|
|
||||||
|
_updated as of december 2024 using CPython 3.12_
|
||||||
|
|
||||||
|
Chardet's performance on larger file (1MB+) are very poor. Expect huge difference on large payload.
|
||||||
|
|
||||||
|
> Stats are generated using 400+ files using default parameters. More details on used files, see GHA workflows.
|
||||||
|
> And yes, these results might change at any time. The dataset can be updated to include more files.
|
||||||
|
> The actual delays heavily depends on your CPU capabilities. The factors should remain the same.
|
||||||
|
> Keep in mind that the stats are generous and that Chardet accuracy vs our is measured using Chardet initial capability
|
||||||
|
> (e.g. Supported Encoding) Challenge-them if you want.
|
||||||
|
|
||||||
|
## ✨ Installation
|
||||||
|
|
||||||
|
Using pip:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
pip install charset-normalizer -U
|
||||||
|
```
|
||||||
|
|
||||||
|
## 🚀 Basic Usage
|
||||||
|
|
||||||
|
### CLI
|
||||||
|
This package comes with a CLI.
|
||||||
|
|
||||||
|
```
|
||||||
|
usage: normalizer [-h] [-v] [-a] [-n] [-m] [-r] [-f] [-t THRESHOLD]
|
||||||
|
file [file ...]
|
||||||
|
|
||||||
|
The Real First Universal Charset Detector. Discover originating encoding used
|
||||||
|
on text file. Normalize text to unicode.
|
||||||
|
|
||||||
|
positional arguments:
|
||||||
|
files File(s) to be analysed
|
||||||
|
|
||||||
|
optional arguments:
|
||||||
|
-h, --help show this help message and exit
|
||||||
|
-v, --verbose Display complementary information about file if any.
|
||||||
|
Stdout will contain logs about the detection process.
|
||||||
|
-a, --with-alternative
|
||||||
|
Output complementary possibilities if any. Top-level
|
||||||
|
JSON WILL be a list.
|
||||||
|
-n, --normalize Permit to normalize input file. If not set, program
|
||||||
|
does not write anything.
|
||||||
|
-m, --minimal Only output the charset detected to STDOUT. Disabling
|
||||||
|
JSON output.
|
||||||
|
-r, --replace Replace file when trying to normalize it instead of
|
||||||
|
creating a new one.
|
||||||
|
-f, --force Replace file without asking if you are sure, use this
|
||||||
|
flag with caution.
|
||||||
|
-t THRESHOLD, --threshold THRESHOLD
|
||||||
|
Define a custom maximum amount of chaos allowed in
|
||||||
|
decoded content. 0. <= chaos <= 1.
|
||||||
|
--version Show version information and exit.
|
||||||
|
```
|
||||||
|
|
||||||
|
```bash
|
||||||
|
normalizer ./data/sample.1.fr.srt
|
||||||
|
```
|
||||||
|
|
||||||
|
or
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python -m charset_normalizer ./data/sample.1.fr.srt
|
||||||
|
```
|
||||||
|
|
||||||
|
🎉 Since version 1.4.0 the CLI produce easily usable stdout result in JSON format.

```json
{
    "path": "/home/default/projects/charset_normalizer/data/sample.1.fr.srt",
    "encoding": "cp1252",
    "encoding_aliases": [
        "1252",
        "windows_1252"
    ],
    "alternative_encodings": [
        "cp1254",
        "cp1256",
        "cp1258",
        "iso8859_14",
        "iso8859_15",
        "iso8859_16",
        "iso8859_3",
        "iso8859_9",
        "latin_1",
        "mbcs"
    ],
    "language": "French",
    "alphabets": [
        "Basic Latin",
        "Latin-1 Supplement"
    ],
    "has_sig_or_bom": false,
    "chaos": 0.149,
    "coherence": 97.152,
    "unicode_path": null,
    "is_preferred": true
}
```
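Because the result is plain JSON on stdout, any tool can consume it. A minimal sketch using only the standard library, assuming you have captured the CLI output into a string (the payload below is an abridged copy of the report shown above):

```python
import json

# Hypothetical captured stdout from e.g. `normalizer ./data/sample.1.fr.srt`
raw_report = """
{
    "path": "/home/default/projects/charset_normalizer/data/sample.1.fr.srt",
    "encoding": "cp1252",
    "alternative_encodings": ["cp1254", "latin_1"],
    "language": "French",
    "chaos": 0.149,
    "coherence": 97.152,
    "is_preferred": true
}
"""

report = json.loads(raw_report)
print(report["encoding"])   # cp1252
print(report["language"])   # French
```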

### Python
*Just print out normalized text*

```python
from charset_normalizer import from_path

results = from_path('./my_subtitle.srt')

print(str(results.best()))
```

*Upgrade your code without effort*

```python
from charset_normalizer import detect
```

The above code will behave the same as **chardet**. We ensure that we offer the best (reasonable) BC result possible.
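As a sketch of the drop-in usage, `detect` takes bytes and returns a chardet-style dict with `encoding`, `language` and `confidence` keys (the payload below is a made-up example; this assumes charset-normalizer is installed):

```python
from charset_normalizer import detect

# Same call shape as chardet.detect: bytes in, dict out.
payload = "Bonjour, où êtes-vous ce soir ?".encode("utf_8")
result = detect(payload)

print(sorted(result))  # ['confidence', 'encoding', 'language']
```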

See the docs for advanced usage: [readthedocs.io](https://charset-normalizer.readthedocs.io/en/latest/)

## 😇 Why

When I started using Chardet, I noticed that it was not suited to my expectations, and I wanted to propose a reliable alternative using a completely different method. Also! I never back down on a good challenge!

I **don't care** about the **originating charset** encoding, because **two different tables** can produce **two identical rendered strings.**
What I want is to get readable text, the best I can.

In a way, **I'm brute forcing text decoding.** How cool is that? 😎

Don't confuse the **ftfy** package with charset-normalizer or chardet. ftfy's goal is to repair broken Unicode strings, whereas charset-normalizer's is to convert a raw file in an unknown encoding to Unicode.

## 🍰 How

- Discard all charset encoding tables that could not fit the binary content.
- Measure noise, or the mess, once opened (by chunks) with a corresponding charset encoding.
- Extract matches with the lowest mess detected.
- Additionally, we measure coherence / probe for a language.

**Wait a minute**, what are noise/mess and coherence according to **YOU?**

*Noise:* I opened hundreds of text files, **written by humans**, with the wrong encoding table. **I observed**, then **I established** some ground rules about **what is obvious** when **it seems like** a mess (aka. defining noise in rendered text). I know that my interpretation of noise is probably incomplete; feel free to contribute in order to improve or rewrite it.

*Coherence:* For each language there is on earth, we have computed ranked letter-appearance occurrences (the best we can). So I thought that intel is worth something here, and I use those records against decoded text to check if I can detect intelligent design.
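The coherence idea can be illustrated in a few lines: rank the letters of the decoded text by frequency and compare that ranking against a per-language reference. The reference ranks below are hypothetical and heavily truncated — the real frequency tables cover far more letters and languages:

```python
from collections import Counter

# Hypothetical, truncated frequency ranks -- for illustration only.
REFERENCE_RANKS = {
    "English": ["e", "t", "a", "o", "i", "n", "s", "h", "r", "d"],
    "French":  ["e", "a", "s", "i", "t", "n", "r", "u", "l", "o"],
}

def coherence(text: str, language: str) -> float:
    """Share of the text's most common letters found among the reference's top letters."""
    letters = [c for c in text.lower() if c.isalpha()]
    most_common = [letter for letter, _ in Counter(letters).most_common(10)]
    if not most_common:
        return 0.0
    expected = set(REFERENCE_RANKS[language])
    return sum(letter in expected for letter in most_common) / len(most_common)

sample = "the quick brown fox jumps over the lazy dog and then rests"
print(coherence(sample, "English") >= coherence(sample, "French"))  # True
```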

## ⚡ Known limitations

- Language detection is unreliable when the text contains two or more languages sharing identical letters. (e.g. HTML (English tags) + Turkish content (sharing Latin characters))
- Every charset detector heavily depends on sufficient content. In common cases, do not bother running detection on very tiny content.

## ⚠️ About Python EOLs

**If you are running:**

- Python >=2.7,<3.5: Unsupported
- Python 3.5: charset-normalizer < 2.1
- Python 3.6: charset-normalizer < 3.1
- Python 3.7: charset-normalizer < 4.0

Upgrade your Python interpreter as soon as possible.

## 👤 Contributing

Contributions, issues and feature requests are very much welcome.<br />
Feel free to check the [issues page](https://github.com/ousret/charset_normalizer/issues) if you want to contribute.

## 📝 License

Copyright © [Ahmed TAHRI @Ousret](https://github.com/Ousret).<br />
This project is [MIT](https://github.com/Ousret/charset_normalizer/blob/master/LICENSE) licensed.

Character frequencies used in this project © 2012 [Denny Vrandečić](http://simia.net/letters/)

## 💼 For Enterprise

Professional support for charset-normalizer is available as part of the [Tidelift Subscription][1]. Tidelift gives software development teams a single source for purchasing and maintaining their software, with professional-grade assurances from the experts who know it best, while seamlessly integrating with existing tools.

[1]: https://tidelift.com/subscription/pkg/pypi-charset-normalizer?utm_source=pypi-charset-normalizer&utm_medium=readme

[](https://www.bestpractices.dev/projects/7297)

# Changelog
All notable changes to charset-normalizer will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [3.4.4](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.4) (2025-10-13)

### Changed
- Bound `setuptools` to a specific constraint `setuptools>=68,<=81`.
- Raised upper bound of mypyc for the optional pre-built extension to v1.18.2

### Removed
- `setuptools-scm` as a build dependency.

### Misc
- Enforced hashes in `dev-requirements.txt` and created `ci-requirements.txt` for security purposes.
- Additional pre-built wheels for riscv64, s390x, and armv7l architectures.
- Restored `multiple.intoto.jsonl` in GitHub releases in addition to an individual attestation file per wheel.

## [3.4.3](https://github.com/Ousret/charset_normalizer/compare/3.4.2...3.4.3) (2025-08-09)

### Changed
- mypy(c) is no longer a required dependency at build time if `CHARSET_NORMALIZER_USE_MYPYC` isn't set to `1`. (#595) (#583)
- Automatically lower the confidence on small byte samples that are not Unicode in the legacy `detect` output. (#391)

### Added
- Custom build backend to overcome the inability to mark mypy as an optional dependency in the build phase.
- Support for Python 3.14

### Fixed
- sdist archive contained useless directories.
- Automatically fall back on valid UTF-16 or UTF-32 even if the md says it's noisy. (#633)

### Misc
- SBOMs are automatically published to the relevant GitHub release to comply with regulatory changes.
  Each published wheel comes with its SBOM. We chose CycloneDX as the format.
- Prebuilt optimized wheels are no longer distributed by default for CPython 3.7 due to a change in cibuildwheel.

## [3.4.2](https://github.com/Ousret/charset_normalizer/compare/3.4.1...3.4.2) (2025-05-02)

### Fixed
- Addressed the DeprecationWarning in our CLI regarding `argparse.FileType` by backporting the target class into the package. (#591)
- Improved the overall reliability of the detector with CJK Ideographs. (#605) (#587)

### Changed
- Optional mypyc compilation upgraded to version 1.15 for Python >= 3.8

## [3.4.1](https://github.com/Ousret/charset_normalizer/compare/3.4.0...3.4.1) (2024-12-24)

### Changed
- Project metadata is now stored in `pyproject.toml` instead of `setup.cfg`, using setuptools as the build backend.
- Enforced delayed annotation loading for simpler and more consistent types in the project.
- Optional mypyc compilation upgraded to version 1.14 for Python >= 3.8

### Added
- pre-commit configuration.
- noxfile.

### Removed
- `build-requirements.txt` as per using `pyproject.toml` native build configuration.
- `bin/integration.py` and `bin/serve.py` in favor of downstream integration tests (see noxfile).
- `setup.cfg` in favor of `pyproject.toml` metadata configuration.
- Unused `utils.range_scan` function.

### Fixed
- Converting content to Unicode bytes may insert `utf_8` instead of the preferred `utf-8`. (#572)
- Deprecation warning "'count' is passed as positional argument" when converting to Unicode bytes on Python 3.13+

## [3.4.0](https://github.com/Ousret/charset_normalizer/compare/3.3.2...3.4.0) (2024-10-08)

### Added
- Argument `--no-preemptive` in the CLI to prevent the detector from searching for hints.
- Support for Python 3.13 (#512)

### Fixed
- Relaxed the TypeError exception thrown when trying to compare a CharsetMatch with anything other than a CharsetMatch.
- Improved the general reliability of the detector based on user feedback. (#520) (#509) (#498) (#407) (#537)
- Declared charset in content (preemptive detection) not changed when converting to utf-8 bytes. (#381)

## [3.3.2](https://github.com/Ousret/charset_normalizer/compare/3.3.1...3.3.2) (2023-10-31)

### Fixed
- Unintentional memory usage regression when using a large payload that matches several encodings (#376)
- Regression on some detection cases showcased in the documentation (#371)

### Added
- Noise (md) probe that identifies malformed Arabic representation due to the presence of letters in isolated form (credit to my wife)

## [3.3.1](https://github.com/Ousret/charset_normalizer/compare/3.3.0...3.3.1) (2023-10-22)

### Changed
- Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8
- Improved the general detection reliability based on reports from the community

## [3.3.0](https://github.com/Ousret/charset_normalizer/compare/3.2.0...3.3.0) (2023-09-30)

### Added
- Allow executing the CLI (e.g. normalizer) through `python -m charset_normalizer.cli` or `python -m charset_normalizer`
- Support for 9 forgotten encodings that are supported by Python but unlisted in `encoding.aliases` as they have no alias (#323)

### Removed
- (internal) Redundant utils.is_ascii function and unused function is_private_use_only
- (internal) charset_normalizer.assets is moved inside charset_normalizer.constant

### Changed
- (internal) Unicode code blocks in constants are updated using the latest v15.0.0 definition to improve detection
- Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.8

### Fixed
- Unable to properly sort CharsetMatch when both chaos/noise and coherence were close, due to an unreachable condition in \_\_lt\_\_ (#350)

## [3.2.0](https://github.com/Ousret/charset_normalizer/compare/3.1.0...3.2.0) (2023-06-07)

### Changed
- Typehint for function `from_path` no longer enforces `PathLike` as its first argument
- Minor improvement over the global detection reliability

### Added
- Introduced function `is_binary` that relies on the main capabilities, optimized to detect binaries
- Propagated the `enable_fallback` argument throughout `from_bytes`, `from_path`, and `from_fp`, allowing deeper control over the detection (default True)
- Explicit support for Python 3.12

### Fixed
- Edge-case detection failure where a file would contain a 'very-long' camel-cased word (Issue #289)

## [3.1.0](https://github.com/Ousret/charset_normalizer/compare/3.0.1...3.1.0) (2023-03-06)

### Added
- Argument `should_rename_legacy` for the legacy function `detect`; any new arguments are disregarded without errors (PR #262)

### Removed
- Support for Python 3.6 (PR #260)

### Changed
- Optional speedup provided by mypy/c 1.0.1

## [3.0.1](https://github.com/Ousret/charset_normalizer/compare/3.0.0...3.0.1) (2022-11-18)

### Fixed
- Multi-byte cutter/chunk generator did not always cut correctly (PR #233)

### Changed
- Speedup provided by mypy/c 0.990 on Python >= 3.7

## [3.0.0](https://github.com/Ousret/charset_normalizer/compare/2.1.1...3.0.0) (2022-10-20)

### Added
- Extended the capability of explain=True: when cp_isolation contains at most two entries (min one), details of the mess-detector results will be logged
- Support for alternative language frequency sets in charset_normalizer.assets.FREQUENCIES
- Parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
- `normalizer --version` now specifies whether the current version provides the extra speedup (meaning a mypyc-compiled wheel)

### Changed
- Build with static metadata using the 'build' frontend
- Made the language detection stricter
- Optional: Module `md.py` can be compiled using mypyc to provide an extra speedup, up to 4x faster than v2.1

### Fixed
- CLI with opt --normalize failed when using a full path for files
- TooManyAccentuatedPlugin induced false positives on the mess detection when too few alpha characters were fed to it
- Sphinx warnings when generating the documentation

### Removed
- Coherence detector no longer returns 'Simple English', returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese', returns 'Chinese' instead
- Breaking: Methods `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable/conflicts with ASCII)
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function `normalize`
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`

## [3.0.0rc1](https://github.com/Ousret/charset_normalizer/compare/3.0.0b2...3.0.0rc1) (2022-10-18)

### Added
- Extended the capability of explain=True: when cp_isolation contains at most two entries (min one), details of the mess-detector results will be logged
- Support for alternative language frequency sets in charset_normalizer.assets.FREQUENCIES
- Parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio

### Changed
- Build with static metadata using the 'build' frontend
- Made the language detection stricter

### Fixed
- CLI with opt --normalize failed when using a full path for files
- TooManyAccentuatedPlugin induced false positives on the mess detection when too few alpha characters were fed to it

### Removed
- Coherence detector no longer returns 'Simple English', returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese', returns 'Chinese' instead

## [3.0.0b2](https://github.com/Ousret/charset_normalizer/compare/3.0.0b1...3.0.0b2) (2022-08-21)

### Added
- `normalizer --version` now specifies whether the current version provides the extra speedup (meaning a mypyc-compiled wheel)

### Removed
- Breaking: Methods `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable/conflicts with ASCII)

### Fixed
- Sphinx warnings when generating the documentation

## [3.0.0b1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...3.0.0b1) (2022-08-15)

### Changed
- Optional: Module `md.py` can be compiled using mypyc to provide an extra speedup, up to 4x faster than v2.1

### Removed
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function `normalize`
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`

## [2.1.1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...2.1.1) (2022-08-19)

### Deprecated
- Function `normalize` scheduled for removal in 3.0

### Changed
- Removed useless call to decode in fn is_unprintable (#206)

### Fixed
- Third-party library (i18n xgettext) crashing by not recognizing utf_8 (PEP 263) with underscore, from [@aleksandernovikov](https://github.com/aleksandernovikov) (#204)

## [2.1.0](https://github.com/Ousret/charset_normalizer/compare/2.0.12...2.1.0) (2022-06-19)

### Added
- Output the Unicode table version when running the CLI with `--version` (PR #194)

### Changed
- Re-use the decoded buffer for single-byte character sets, from [@nijel](https://github.com/nijel) (PR #175)
- Fixed some performance bottlenecks, from [@deedy5](https://github.com/deedy5) (PR #183)

### Fixed
- Workaround for a potential bug in cpython: Zero Width No-Break Space located in Arabic Presentation Forms-B, Unicode 1.1, not acknowledged as space (PR #175)
- CLI default threshold aligned with the API threshold, from [@oleksandr-kuzmenko](https://github.com/oleksandr-kuzmenko) (PR #181)

### Removed
- Support for Python 3.5 (PR #192)

### Deprecated
- Use of backport unicodedata from `unicodedata2`, as Python is quickly catching up; scheduled for removal in 3.0 (PR #194)

## [2.0.12](https://github.com/Ousret/charset_normalizer/compare/2.0.11...2.0.12) (2022-02-12)

### Fixed
- ASCII mis-detection in rare cases (PR #170)

## [2.0.11](https://github.com/Ousret/charset_normalizer/compare/2.0.10...2.0.11) (2022-01-30)

### Added
- Explicit support for Python 3.11 (PR #164)

### Changed
- The logging behavior has been completely reviewed, now using only TRACE and DEBUG levels (PR #163 #165)

## [2.0.10](https://github.com/Ousret/charset_normalizer/compare/2.0.9...2.0.10) (2022-01-04)

### Fixed
- Fallback match entries might lead to UnicodeDecodeError for large byte sequences (PR #154)

### Changed
- Skipping the language detection (CD) on ASCII (PR #155)

## [2.0.9](https://github.com/Ousret/charset_normalizer/compare/2.0.8...2.0.9) (2021-12-03)

### Changed
- Moderating the logging impact (since 2.0.8) for specific environments (PR #147)

### Fixed
- Wrong logging level applied when setting kwarg `explain` to True (PR #146)

## [2.0.8](https://github.com/Ousret/charset_normalizer/compare/2.0.7...2.0.8) (2021-11-24)

### Changed
- Improvement over Vietnamese detection (PR #126)
- MD improvement on trailing data and long foreign (non-pure-latin) data (PR #124)
- Efficiency improvements in cd/alphabet_languages, from [@adbar](https://github.com/adbar) (PR #122)
- Call sum() without an intermediary list, following PEP 289 recommendations, from [@adbar](https://github.com/adbar) (PR #129)
- Code style as refactored by Sourcery-AI (PR #131)
- Minor adjustment on the MD around European words (PR #133)
- Removed and replaced SRTs from assets / tests (PR #139)
- Initialize the library logger with a `NullHandler` by default, from [@nmaynes](https://github.com/nmaynes) (PR #135)
- Setting kwarg `explain` to True will provisionally add (bound to the function's lifespan) a specific stream handler (PR #135)

### Fixed
- Fix large (misleading) sequence giving UnicodeDecodeError (PR #137)
- Avoid using too-insignificant chunks (PR #137)

### Added
- Added and exposed function `set_logging_handler` to configure a specific StreamHandler, from [@nmaynes](https://github.com/nmaynes) (PR #135)
- Added `CHANGELOG.md` entries; the format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) (PR #141)

## [2.0.7](https://github.com/Ousret/charset_normalizer/compare/2.0.6...2.0.7) (2021-10-11)

### Added
- Added support for Kazakh (Cyrillic) language detection (PR #109)

### Changed
- Further improved inferring the language from a given single-byte code page (PR #112)
- Vainly trying to leverage PEP263 when PEP3120 is not supported (PR #116)
- Refactoring for potential performance improvements in loops, from [@adbar](https://github.com/adbar) (PR #113)
- Various detection improvements (MD+CD) (PR #117)

### Removed
- Removed a redundant logging entry about detected language(s) (PR #115)

### Fixed
- Fixed a minor inconsistency between Python 3.5 and other versions regarding language detection (PR #117 #102)

## [2.0.6](https://github.com/Ousret/charset_normalizer/compare/2.0.5...2.0.6) (2021-09-18)

### Fixed
- Unforeseen regression with the loss of backward compatibility with some older minor versions of Python 3.5.x (PR #100)
- Fixed a CLI crash when using --minimal output in certain cases (PR #103)

### Changed
- Minor improvement to the detection efficiency (less than 1%) (PR #106 #101)

## [2.0.5](https://github.com/Ousret/charset_normalizer/compare/2.0.4...2.0.5) (2021-09-14)

### Changed
- The project now complies with: flake8, mypy, isort and black to ensure a better overall quality (PR #81)
- The BC support with v1.x was improved; the old staticmethods are restored (PR #82)
- The Unicode detection is slightly improved (PR #93)
- Added syntax sugar \_\_bool\_\_ for the results CharsetMatches list-container (PR #91)

### Removed
- The project no longer raises a warning on tiny content given for detection; it is simply logged as a warning instead (PR #92)

### Fixed
- In some rare cases, the chunk extractor could cut in the middle of a multi-byte character and mislead the mess detection (PR #95)
- Some rare 'space' characters could trip up the UnprintablePlugin/mess detection (PR #96)
- The MANIFEST.in was not exhaustive (PR #78)

## [2.0.4](https://github.com/Ousret/charset_normalizer/compare/2.0.3...2.0.4) (2021-07-30)

### Fixed
- The CLI no longer raises an unexpected exception when no encoding has been found (PR #70)
- Fixed accessing the 'alphabets' property when the payload contains surrogate characters (PR #68)
- The logger could mislead (explain=True) on detected languages and the impact of one MBCS match (PR #72)
- Submatch factoring could be wrong in rare edge cases (PR #72)
- Multiple files given to the CLI were ignored when publishing results to STDOUT (after the first path) (PR #72)
- Fixed line endings from CRLF to LF for certain project files (PR #67)

### Changed
- Adjusted the MD to lower the sensitivity, thus improving the global detection reliability (PR #69 #76)
- Allow fallback on the specified encoding if any (PR #71)

## [2.0.3](https://github.com/Ousret/charset_normalizer/compare/2.0.2...2.0.3) (2021-07-16)

### Changed
- Part of the detection mechanism has been improved to be less sensitive, resulting in more accurate detection results. Especially ASCII. (PR #63)
- According to the community's wishes, the detection will fall back on ASCII or UTF-8 as a last resort. (PR #64)

## [2.0.2](https://github.com/Ousret/charset_normalizer/compare/2.0.1...2.0.2) (2021-07-15)

### Fixed
- Empty/too-small JSON payload mis-detection fixed. Report from [@tseaver](https://github.com/tseaver) (PR #59)

### Changed
- Don't inject unicodedata2 into sys.modules, from [@akx](https://github.com/akx) (PR #57)

## [2.0.1](https://github.com/Ousret/charset_normalizer/compare/2.0.0...2.0.1) (2021-07-13)

### Fixed
- Make it work where there isn't a filesystem available, dropping assets frequencies.json. Report from [@sethmlarson](https://github.com/sethmlarson). (PR #55)
- Using explain=False permanently disabled the verbose output in the current runtime (PR #47)
- One log entry (language target preemptive) was not shown in logs when using explain=True (PR #47)
- Fixed an undesired exception (ValueError) on getitem of a CharsetMatches instance (PR #52)

### Changed
- The public function normalize's default argument values were not aligned with from_bytes (PR #53)

### Added
- You may now use charset aliases in the cp_isolation and cp_exclusion arguments (PR #47)

## [2.0.0](https://github.com/Ousret/charset_normalizer/compare/1.4.1...2.0.0) (2021-07-02)

### Changed
- 4 to 5 times faster than the previous 1.4.0 release. At least 2x faster than Chardet.
- Emphasis has been put on UTF-8 detection; it should perform near-instantaneously.
- The backward compatibility with Chardet has been greatly improved. The legacy detect function returns an identical charset name whenever possible.
- The detection mechanism has been slightly improved; Turkish content is now detected correctly (most of the time)
- The program has been rewritten to ease readability and maintainability (+ using static typing)
- utf_7 detection has been reinstated.

### Removed
- This package no longer requires anything when used with Python 3.5 (dropped cached_property)
- Removed support for these languages: Catalan, Esperanto, Kazakh, Basque, Volapük, Azeri, Galician, Nynorsk, Macedonian, and Serbo-Croatian.
- The exception hook on UnicodeDecodeError has been removed.

### Deprecated
- Methods coherence_non_latin, w_counter, chaos_secondary_pass of the class CharsetMatch are now deprecated and scheduled for removal in v3.0

### Fixed
- The CLI output used the relative path of the file(s). Should be absolute.

## [1.4.1](https://github.com/Ousret/charset_normalizer/compare/1.4.0...1.4.1) (2021-05-28)

### Fixed
- Logger configuration/usage no longer conflicts with others (PR #44)

## [1.4.0](https://github.com/Ousret/charset_normalizer/compare/1.3.9...1.4.0) (2021-05-21)

### Removed
- Using standard logging instead of the package loguru.
- Dropped the nose test framework in favor of the maintained pytest.
- Chose not to use the dragonmapper package to help with gibberish Chinese/CJK text.
- Require cached_property only for Python 3.5 due to a constraint. Dropped for every other interpreter version.
- Stopped supporting UTF-7 that does not contain a SIG.
- Dropped PrettyTable; replaced with pure JSON output in the CLI.

### Fixed
- The BOM marker in a CharsetNormalizerMatch instance could be False in rare cases, even if obviously present, due to the sub-match factoring process.
- Not searching properly for the BOM when trying the utf32/16 parent codec.

### Changed
- Improved the package's final size by compressing frequencies.json.
- Huge improvement over the largest payloads.

### Added
- The CLI now produces JSON-consumable output.
- Return ASCII if the given sequences fit, given reasonable confidence.

## [1.3.9](https://github.com/Ousret/charset_normalizer/compare/1.3.8...1.3.9) (2021-05-13)

### Fixed
- In some very rare cases, you may end up getting encode/decode errors due to a bad bytes payload (PR #40)

## [1.3.8](https://github.com/Ousret/charset_normalizer/compare/1.3.7...1.3.8) (2021-05-12)

### Fixed
- An empty payload given for detection could cause an exception when trying to access the `alphabets` property. (PR #39)

## [1.3.7](https://github.com/Ousret/charset_normalizer/compare/1.3.6...1.3.7) (2021-05-12)

### Fixed
- The legacy detect function should return UTF-8-SIG if a sig is present in the payload. (PR #38)

## [1.3.6](https://github.com/Ousret/charset_normalizer/compare/1.3.5...1.3.6) (2021-02-09)

### Changed
- Amended the previous release to allow prettytable 2.0 (PR #35)

## [1.3.5](https://github.com/Ousret/charset_normalizer/compare/1.3.4...1.3.5) (2021-02-08)

### Fixed
- Fixed an error while using the package with a Python pre-release interpreter (PR #33)

### Changed
- Dependencies refactoring, constraints revised.

### Added
- Added Python 3.9 and 3.10 to the supported interpreters

MIT License

Copyright (c) 2025 TAHRI Ahmed R.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,35 @@
../../Scripts/normalizer.exe,sha256=swCbbsYGphhCYhRLu1NPZcgzP9vs9NMw8xpifu59AwU,108386
charset_normalizer-3.4.4.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4
charset_normalizer-3.4.4.dist-info/METADATA,sha256=Mg5oc0yfpVMtDcprHt_pPbbV0qUSHEeaEz4NG53pmyY,38067
charset_normalizer-3.4.4.dist-info/RECORD,,
charset_normalizer-3.4.4.dist-info/WHEEL,sha256=8UP9x9puWI0P1V_d7K2oMTBqfeLNm21CTzZ_Ptr0NXU,101
charset_normalizer-3.4.4.dist-info/entry_points.txt,sha256=ADSTKrkXZ3hhdOVFi6DcUEHQRS0xfxDIE_pEz4wLIXA,65
charset_normalizer-3.4.4.dist-info/licenses/LICENSE,sha256=GFd0hdNwTxpHne2OVzwJds_tMV_S_ReYP6mI2kwvcNE,1092
charset_normalizer-3.4.4.dist-info/top_level.txt,sha256=7ASyzePr8_xuZWJsnqJjIBtyV8vhEo0wBCv1MPRRi3Q,19
charset_normalizer/__init__.py,sha256=0NT8MHi7SKq3juMqYfOdrkzjisK0L73lneNHH4qaUAs,1638
charset_normalizer/__main__.py,sha256=2sj_BS6H0sU25C1bMqz9DVwa6kOK9lchSEbSU-_iu7M,115
charset_normalizer/__pycache__/__init__.cpython-312.pyc,,
charset_normalizer/__pycache__/__main__.cpython-312.pyc,,
charset_normalizer/__pycache__/api.cpython-312.pyc,,
charset_normalizer/__pycache__/cd.cpython-312.pyc,,
charset_normalizer/__pycache__/constant.cpython-312.pyc,,
charset_normalizer/__pycache__/legacy.cpython-312.pyc,,
charset_normalizer/__pycache__/md.cpython-312.pyc,,
charset_normalizer/__pycache__/models.cpython-312.pyc,,
charset_normalizer/__pycache__/utils.cpython-312.pyc,,
charset_normalizer/__pycache__/version.cpython-312.pyc,,
charset_normalizer/api.py,sha256=ODy4hX78b3ldTl5sViYPU1yzQ5qkclfgSIFE8BtNrTI,23337
charset_normalizer/cd.py,sha256=uq8nVxRpR6Guc16ACvOWtL8KO3w7vYaCh8hHisuOyTg,12917
charset_normalizer/cli/__init__.py,sha256=d9MUx-1V_qD3x9igIy4JT4oC5CU0yjulk7QyZWeRFhg,144
charset_normalizer/cli/__main__.py,sha256=-pdJCyPywouPyFsC8_eTSgTmvh1YEvgjsvy1WZ0XjaA,13027
charset_normalizer/cli/__pycache__/__init__.cpython-312.pyc,,
charset_normalizer/cli/__pycache__/__main__.cpython-312.pyc,,
charset_normalizer/constant.py,sha256=mCJmYzpBU27Ut9kiNWWoBbhhxQ-aRVw3K7LSwoFwBGI,44728
charset_normalizer/legacy.py,sha256=ui08NlKqAXU3Y7smK-NFJjEgRRQz9ruM7aNCbT0OOrE,2811
charset_normalizer/md.cp312-win_amd64.pyd,sha256=dqU14JU7SKI0i4dyNqV5nPHQHLIUIsfxeULzU2fLXI8,10752
charset_normalizer/md.py,sha256=LSuW2hNgXSgF7JGdRapLAHLuj6pABHiP85LTNAYmu7c,20780
charset_normalizer/md__mypyc.cp312-win_amd64.pyd,sha256=CDDD_25vg5Sn3xcPlfwQ3mWrnyKzD50jg_DMKZuN8QE,126976
charset_normalizer/models.py,sha256=ZR2PE-fqf6dASZfqdE5Uhkmr0o1MciSdXOjuNqwkmvg,12754
charset_normalizer/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
charset_normalizer/utils.py,sha256=XtWIQeOuz7cnGebMzyi4Vvi1JtA84QBSIeR9PDzF7pw,12584
charset_normalizer/version.py,sha256=MhW8dOLls4GbbxBUqeS1huc7Rth1ArKi4nS90qTFwz8,123
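Each RECORD entry above has the form `path,sha256=<hash>,<size>`, where the hash field is the URL-safe base64 encoding of the file's SHA-256 digest with padding stripped. A stdlib-only sketch of how such a hash field is produced (illustrative; the exact value depends on the installed file's bytes):

```python
import base64
import hashlib


def record_hash(data: bytes) -> str:
    # Wheel RECORD format: "sha256=" + URL-safe base64 of the raw digest,
    # with trailing "=" padding removed.
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# e.g. the top_level.txt entry above hashes a 19-byte file ("charset_normalizer\n").
print(record_hash(b"charset_normalizer\n"))
```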
@@ -0,0 +1,5 @@
Wheel-Version: 1.0
Generator: setuptools (80.9.0)
Root-Is-Purelib: false
Tag: cp312-cp312-win_amd64
@@ -0,0 +1,2 @@
[console_scripts]
normalizer = charset_normalizer.cli:cli_detect
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 TAHRI Ahmed R.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1 @@
charset_normalizer
48
.venv/Lib/site-packages/charset_normalizer/__init__.py
Normal file
@@ -0,0 +1,48 @@
"""
Charset-Normalizer
~~~~~~~~~~~~~~
The Real First Universal Charset Detector.
A library that helps you read text from an unknown charset encoding.
Motivated by chardet, This package is trying to resolve the issue by taking a new approach.
All IANA character set names for which the Python core library provides codecs are supported.

Basic usage:
>>> from charset_normalizer import from_bytes
>>> results = from_bytes('Bсеки човек има право на образование. Oбразованието!'.encode('utf_8'))
>>> best_guess = results.best()
>>> str(best_guess)
'Bсеки човек има право на образование. Oбразованието!'

Others methods and usages are available - see the full documentation
at <https://github.com/Ousret/charset_normalizer>.
:copyright: (c) 2021 by Ahmed TAHRI
:license: MIT, see LICENSE for more details.
"""

from __future__ import annotations

import logging

from .api import from_bytes, from_fp, from_path, is_binary
from .legacy import detect
from .models import CharsetMatch, CharsetMatches
from .utils import set_logging_handler
from .version import VERSION, __version__

__all__ = (
    "from_fp",
    "from_path",
    "from_bytes",
    "is_binary",
    "detect",
    "CharsetMatch",
    "CharsetMatches",
    "__version__",
    "VERSION",
    "set_logging_handler",
)

# Attach a NullHandler to the top level logger by default
# https://docs.python.org/3.3/howto/logging.html#configuring-logging-for-a-library

logging.getLogger("charset_normalizer").addHandler(logging.NullHandler())
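The final line of `__init__.py` attaches a NullHandler, the standard convention for library logging: importing the library never emits anything until the application configures a handler. A minimal stdlib sketch of that pattern and its effect (the `mylib` logger name is a placeholder):

```python
import logging

# Library side: attach a NullHandler so importing the library is silent by
# default and never triggers "no handlers found" warnings.
lib_logger = logging.getLogger("mylib")
lib_logger.addHandler(logging.NullHandler())

# Application side: records only surface once the application adds a real handler.
captured: list[str] = []


class ListHandler(logging.Handler):
    def emit(self, record: logging.LogRecord) -> None:
        captured.append(record.getMessage())


lib_logger.warning("silenced")     # swallowed: only the NullHandler sees it
lib_logger.addHandler(ListHandler())
lib_logger.warning("now visible")  # reaches the application's handler

print(captured)
```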
6
.venv/Lib/site-packages/charset_normalizer/__main__.py
Normal file
@@ -0,0 +1,6 @@
from __future__ import annotations

from .cli import cli_detect

if __name__ == "__main__":
    cli_detect()
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
669
.venv/Lib/site-packages/charset_normalizer/api.py
Normal file
@@ -0,0 +1,669 @@
from __future__ import annotations

import logging
from os import PathLike
from typing import BinaryIO

from .cd import (
    coherence_ratio,
    encoding_languages,
    mb_encoding_languages,
    merge_coherence_ratios,
)
from .constant import IANA_SUPPORTED, TOO_BIG_SEQUENCE, TOO_SMALL_SEQUENCE, TRACE
from .md import mess_ratio
from .models import CharsetMatch, CharsetMatches
from .utils import (
    any_specified_encoding,
    cut_sequence_chunks,
    iana_name,
    identify_sig_or_bom,
    is_cp_similar,
    is_multi_byte_encoding,
    should_strip_sig_or_bom,
)

logger = logging.getLogger("charset_normalizer")
explain_handler = logging.StreamHandler()
explain_handler.setFormatter(
    logging.Formatter("%(asctime)s | %(levelname)s | %(message)s")
)


def from_bytes(
    sequences: bytes | bytearray,
    steps: int = 5,
    chunk_size: int = 512,
    threshold: float = 0.2,
    cp_isolation: list[str] | None = None,
    cp_exclusion: list[str] | None = None,
    preemptive_behaviour: bool = True,
    explain: bool = False,
    language_threshold: float = 0.1,
    enable_fallback: bool = True,
) -> CharsetMatches:
    """
    Given a raw bytes sequence, return the best possibles charset usable to render str objects.
    If there is no results, it is a strong indicator that the source is binary/not text.
    By default, the process will extract 5 blocks of 512o each to assess the mess and coherence of a given sequence.
    And will give up a particular code page after 20% of measured mess. Those criteria are customizable at will.

    The preemptive behavior DOES NOT replace the traditional detection workflow, it prioritize a particular code page
    but never take it for granted. Can improve the performance.

    You may want to focus your attention to some code page or/and not others, use cp_isolation and cp_exclusion for that
    purpose.

    This function will strip the SIG in the payload/sequence every time except on UTF-16, UTF-32.
    By default the library does not setup any handler other than the NullHandler, if you choose to set the 'explain'
    toggle to True it will alter the logger configuration to add a StreamHandler that is suitable for debugging.
    Custom logging format and handler can be set manually.
    """

    if not isinstance(sequences, (bytearray, bytes)):
        raise TypeError(
            "Expected object of type bytes or bytearray, got: {}".format(
                type(sequences)
            )
        )

    if explain:
        previous_logger_level: int = logger.level
        logger.addHandler(explain_handler)
        logger.setLevel(TRACE)

    length: int = len(sequences)

    if length == 0:
        logger.debug("Encoding detection on empty bytes, assuming utf_8 intention.")
        if explain:  # Defensive: ensure exit path clean handler
            logger.removeHandler(explain_handler)
            logger.setLevel(previous_logger_level or logging.WARNING)
        return CharsetMatches([CharsetMatch(sequences, "utf_8", 0.0, False, [], "")])

    if cp_isolation is not None:
        logger.log(
            TRACE,
            "cp_isolation is set. use this flag for debugging purpose. "
            "limited list of encoding allowed : %s.",
            ", ".join(cp_isolation),
        )
        cp_isolation = [iana_name(cp, False) for cp in cp_isolation]
    else:
        cp_isolation = []

    if cp_exclusion is not None:
        logger.log(
            TRACE,
            "cp_exclusion is set. use this flag for debugging purpose. "
            "limited list of encoding excluded : %s.",
            ", ".join(cp_exclusion),
        )
        cp_exclusion = [iana_name(cp, False) for cp in cp_exclusion]
    else:
        cp_exclusion = []

    if length <= (chunk_size * steps):
        logger.log(
            TRACE,
            "override steps (%i) and chunk_size (%i) as content does not fit (%i byte(s) given) parameters.",
            steps,
            chunk_size,
            length,
        )
        steps = 1
        chunk_size = length

    if steps > 1 and length / steps < chunk_size:
        chunk_size = int(length / steps)

    is_too_small_sequence: bool = len(sequences) < TOO_SMALL_SEQUENCE
    is_too_large_sequence: bool = len(sequences) >= TOO_BIG_SEQUENCE

    if is_too_small_sequence:
        logger.log(
            TRACE,
            "Trying to detect encoding from a tiny portion of ({}) byte(s).".format(
                length
            ),
        )
    elif is_too_large_sequence:
        logger.log(
            TRACE,
            "Using lazy str decoding because the payload is quite large, ({}) byte(s).".format(
                length
            ),
        )

    prioritized_encodings: list[str] = []

    specified_encoding: str | None = (
        any_specified_encoding(sequences) if preemptive_behaviour else None
    )

    if specified_encoding is not None:
        prioritized_encodings.append(specified_encoding)
        logger.log(
            TRACE,
            "Detected declarative mark in sequence. Priority +1 given for %s.",
            specified_encoding,
        )

    tested: set[str] = set()
    tested_but_hard_failure: list[str] = []
    tested_but_soft_failure: list[str] = []

    fallback_ascii: CharsetMatch | None = None
    fallback_u8: CharsetMatch | None = None
    fallback_specified: CharsetMatch | None = None

    results: CharsetMatches = CharsetMatches()

    early_stop_results: CharsetMatches = CharsetMatches()

    sig_encoding, sig_payload = identify_sig_or_bom(sequences)

    if sig_encoding is not None:
        prioritized_encodings.append(sig_encoding)
        logger.log(
            TRACE,
            "Detected a SIG or BOM mark on first %i byte(s). Priority +1 given for %s.",
            len(sig_payload),
            sig_encoding,
        )

    prioritized_encodings.append("ascii")

    if "utf_8" not in prioritized_encodings:
        prioritized_encodings.append("utf_8")

    for encoding_iana in prioritized_encodings + IANA_SUPPORTED:
        if cp_isolation and encoding_iana not in cp_isolation:
            continue

        if cp_exclusion and encoding_iana in cp_exclusion:
            continue

        if encoding_iana in tested:
            continue

        tested.add(encoding_iana)

        decoded_payload: str | None = None
        bom_or_sig_available: bool = sig_encoding == encoding_iana
        strip_sig_or_bom: bool = bom_or_sig_available and should_strip_sig_or_bom(
            encoding_iana
        )

        if encoding_iana in {"utf_16", "utf_32"} and not bom_or_sig_available:
            logger.log(
                TRACE,
                "Encoding %s won't be tested as-is because it require a BOM. Will try some sub-encoder LE/BE.",
                encoding_iana,
            )
            continue
        if encoding_iana in {"utf_7"} and not bom_or_sig_available:
            logger.log(
                TRACE,
                "Encoding %s won't be tested as-is because detection is unreliable without BOM/SIG.",
                encoding_iana,
            )
            continue

        try:
            is_multi_byte_decoder: bool = is_multi_byte_encoding(encoding_iana)
        except (ModuleNotFoundError, ImportError):
            logger.log(
                TRACE,
                "Encoding %s does not provide an IncrementalDecoder",
                encoding_iana,
            )
            continue

        try:
            if is_too_large_sequence and is_multi_byte_decoder is False:
                str(
                    (
                        sequences[: int(50e4)]
                        if strip_sig_or_bom is False
                        else sequences[len(sig_payload) : int(50e4)]
                    ),
                    encoding=encoding_iana,
                )
            else:
                decoded_payload = str(
                    (
                        sequences
                        if strip_sig_or_bom is False
                        else sequences[len(sig_payload) :]
                    ),
                    encoding=encoding_iana,
                )
        except (UnicodeDecodeError, LookupError) as e:
            if not isinstance(e, LookupError):
                logger.log(
                    TRACE,
                    "Code page %s does not fit given bytes sequence at ALL. %s",
                    encoding_iana,
                    str(e),
                )
            tested_but_hard_failure.append(encoding_iana)
            continue

        similar_soft_failure_test: bool = False

        for encoding_soft_failed in tested_but_soft_failure:
            if is_cp_similar(encoding_iana, encoding_soft_failed):
                similar_soft_failure_test = True
                break

        if similar_soft_failure_test:
            logger.log(
                TRACE,
                "%s is deemed too similar to code page %s and was consider unsuited already. Continuing!",
                encoding_iana,
                encoding_soft_failed,
            )
            continue

        r_ = range(
            0 if not bom_or_sig_available else len(sig_payload),
            length,
            int(length / steps),
        )

        multi_byte_bonus: bool = (
            is_multi_byte_decoder
            and decoded_payload is not None
            and len(decoded_payload) < length
        )

        if multi_byte_bonus:
            logger.log(
                TRACE,
                "Code page %s is a multi byte encoding table and it appear that at least one character "
                "was encoded using n-bytes.",
                encoding_iana,
            )

        max_chunk_gave_up: int = int(len(r_) / 4)

        max_chunk_gave_up = max(max_chunk_gave_up, 2)
        early_stop_count: int = 0
        lazy_str_hard_failure = False

        md_chunks: list[str] = []
        md_ratios = []

        try:
            for chunk in cut_sequence_chunks(
                sequences,
                encoding_iana,
                r_,
                chunk_size,
                bom_or_sig_available,
                strip_sig_or_bom,
                sig_payload,
                is_multi_byte_decoder,
                decoded_payload,
            ):
                md_chunks.append(chunk)

                md_ratios.append(
                    mess_ratio(
                        chunk,
                        threshold,
                        explain is True and 1 <= len(cp_isolation) <= 2,
                    )
                )

                if md_ratios[-1] >= threshold:
                    early_stop_count += 1

                if (early_stop_count >= max_chunk_gave_up) or (
                    bom_or_sig_available and strip_sig_or_bom is False
                ):
                    break
        except (
            UnicodeDecodeError
        ) as e:  # Lazy str loading may have missed something there
            logger.log(
                TRACE,
                "LazyStr Loading: After MD chunk decode, code page %s does not fit given bytes sequence at ALL. %s",
                encoding_iana,
                str(e),
            )
            early_stop_count = max_chunk_gave_up
            lazy_str_hard_failure = True

        # We might want to check the sequence again with the whole content
        # Only if initial MD tests passes
        if (
            not lazy_str_hard_failure
            and is_too_large_sequence
            and not is_multi_byte_decoder
        ):
            try:
                sequences[int(50e3) :].decode(encoding_iana, errors="strict")
            except UnicodeDecodeError as e:
                logger.log(
                    TRACE,
                    "LazyStr Loading: After final lookup, code page %s does not fit given bytes sequence at ALL. %s",
                    encoding_iana,
                    str(e),
                )
                tested_but_hard_failure.append(encoding_iana)
                continue

        mean_mess_ratio: float = sum(md_ratios) / len(md_ratios) if md_ratios else 0.0
        if mean_mess_ratio >= threshold or early_stop_count >= max_chunk_gave_up:
            tested_but_soft_failure.append(encoding_iana)
            logger.log(
                TRACE,
                "%s was excluded because of initial chaos probing. Gave up %i time(s). "
                "Computed mean chaos is %f %%.",
                encoding_iana,
                early_stop_count,
                round(mean_mess_ratio * 100, ndigits=3),
            )
            # Preparing those fallbacks in case we got nothing.
            if (
                enable_fallback
                and encoding_iana
                in ["ascii", "utf_8", specified_encoding, "utf_16", "utf_32"]
                and not lazy_str_hard_failure
            ):
                fallback_entry = CharsetMatch(
                    sequences,
                    encoding_iana,
                    threshold,
                    bom_or_sig_available,
                    [],
                    decoded_payload,
                    preemptive_declaration=specified_encoding,
                )
                if encoding_iana == specified_encoding:
                    fallback_specified = fallback_entry
                elif encoding_iana == "ascii":
                    fallback_ascii = fallback_entry
                else:
                    fallback_u8 = fallback_entry
            continue

        logger.log(
            TRACE,
            "%s passed initial chaos probing. Mean measured chaos is %f %%",
            encoding_iana,
            round(mean_mess_ratio * 100, ndigits=3),
        )

        if not is_multi_byte_decoder:
            target_languages: list[str] = encoding_languages(encoding_iana)
        else:
            target_languages = mb_encoding_languages(encoding_iana)

        if target_languages:
            logger.log(
                TRACE,
                "{} should target any language(s) of {}".format(
                    encoding_iana, str(target_languages)
                ),
            )

        cd_ratios = []

        # We shall skip the CD when its about ASCII
        # Most of the time its not relevant to run "language-detection" on it.
        if encoding_iana != "ascii":
            for chunk in md_chunks:
                chunk_languages = coherence_ratio(
                    chunk,
                    language_threshold,
                    ",".join(target_languages) if target_languages else None,
                )

                cd_ratios.append(chunk_languages)

        cd_ratios_merged = merge_coherence_ratios(cd_ratios)

        if cd_ratios_merged:
            logger.log(
                TRACE,
                "We detected language {} using {}".format(
                    cd_ratios_merged, encoding_iana
                ),
            )

        current_match = CharsetMatch(
            sequences,
            encoding_iana,
            mean_mess_ratio,
            bom_or_sig_available,
            cd_ratios_merged,
            (
                decoded_payload
                if (
                    is_too_large_sequence is False
                    or encoding_iana in [specified_encoding, "ascii", "utf_8"]
                )
                else None
            ),
            preemptive_declaration=specified_encoding,
        )

        results.append(current_match)

        if (
            encoding_iana in [specified_encoding, "ascii", "utf_8"]
            and mean_mess_ratio < 0.1
        ):
            # If md says nothing to worry about, then... stop immediately!
            if mean_mess_ratio == 0.0:
                logger.debug(
                    "Encoding detection: %s is most likely the one.",
                    current_match.encoding,
                )
                if explain:  # Defensive: ensure exit path clean handler
                    logger.removeHandler(explain_handler)
                    logger.setLevel(previous_logger_level)
                return CharsetMatches([current_match])

            early_stop_results.append(current_match)

            if (
                len(early_stop_results)
                and (specified_encoding is None or specified_encoding in tested)
                and "ascii" in tested
                and "utf_8" in tested
            ):
                probable_result: CharsetMatch = early_stop_results.best()  # type: ignore[assignment]
                logger.debug(
                    "Encoding detection: %s is most likely the one.",
                    probable_result.encoding,
                )
                if explain:  # Defensive: ensure exit path clean handler
                    logger.removeHandler(explain_handler)
                    logger.setLevel(previous_logger_level)

                return CharsetMatches([probable_result])

        if encoding_iana == sig_encoding:
            logger.debug(
                "Encoding detection: %s is most likely the one as we detected a BOM or SIG within "
                "the beginning of the sequence.",
                encoding_iana,
            )
            if explain:  # Defensive: ensure exit path clean handler
                logger.removeHandler(explain_handler)
                logger.setLevel(previous_logger_level)
            return CharsetMatches([results[encoding_iana]])

    if len(results) == 0:
        if fallback_u8 or fallback_ascii or fallback_specified:
            logger.log(
                TRACE,
                "Nothing got out of the detection process. Using ASCII/UTF-8/Specified fallback.",
            )

        if fallback_specified:
            logger.debug(
                "Encoding detection: %s will be used as a fallback match",
                fallback_specified.encoding,
            )
            results.append(fallback_specified)
        elif (
            (fallback_u8 and fallback_ascii is None)
            or (
                fallback_u8
                and fallback_ascii
                and fallback_u8.fingerprint != fallback_ascii.fingerprint
            )
            or (fallback_u8 is not None)
        ):
            logger.debug("Encoding detection: utf_8 will be used as a fallback match")
            results.append(fallback_u8)
        elif fallback_ascii:
            logger.debug("Encoding detection: ascii will be used as a fallback match")
            results.append(fallback_ascii)

    if results:
        logger.debug(
            "Encoding detection: Found %s as plausible (best-candidate) for content. With %i alternatives.",
            results.best().encoding,  # type: ignore
            len(results) - 1,
        )
    else:
        logger.debug("Encoding detection: Unable to determine any suitable charset.")

    if explain:
        logger.removeHandler(explain_handler)
        logger.setLevel(previous_logger_level)

    return results


def from_fp(
    fp: BinaryIO,
    steps: int = 5,
    chunk_size: int = 512,
    threshold: float = 0.20,
    cp_isolation: list[str] | None = None,
    cp_exclusion: list[str] | None = None,
    preemptive_behaviour: bool = True,
    explain: bool = False,
    language_threshold: float = 0.1,
    enable_fallback: bool = True,
) -> CharsetMatches:
    """
    Same thing than the function from_bytes but using a file pointer that is already ready.
    Will not close the file pointer.
    """
    return from_bytes(
        fp.read(),
        steps,
        chunk_size,
        threshold,
        cp_isolation,
        cp_exclusion,
        preemptive_behaviour,
        explain,
        language_threshold,
        enable_fallback,
    )


def from_path(
    path: str | bytes | PathLike,  # type: ignore[type-arg]
    steps: int = 5,
    chunk_size: int = 512,
    threshold: float = 0.20,
    cp_isolation: list[str] | None = None,
    cp_exclusion: list[str] | None = None,
    preemptive_behaviour: bool = True,
    explain: bool = False,
    language_threshold: float = 0.1,
    enable_fallback: bool = True,
) -> CharsetMatches:
    """
    Same thing than the function from_bytes but with one extra step. Opening and reading given file path in binary mode.
    Can raise IOError.
    """
    with open(path, "rb") as fp:
        return from_fp(
            fp,
            steps,
            chunk_size,
            threshold,
            cp_isolation,
            cp_exclusion,
            preemptive_behaviour,
            explain,
            language_threshold,
            enable_fallback,
        )


def is_binary(
    fp_or_path_or_payload: PathLike | str | BinaryIO | bytes,  # type: ignore[type-arg]
    steps: int = 5,
    chunk_size: int = 512,
    threshold: float = 0.20,
    cp_isolation: list[str] | None = None,
    cp_exclusion: list[str] | None = None,
    preemptive_behaviour: bool = True,
    explain: bool = False,
    language_threshold: float = 0.1,
    enable_fallback: bool = False,
) -> bool:
    """
    Detect if the given input (file, bytes, or path) points to a binary file. aka. not a string.
    Based on the same main heuristic algorithms and default kwargs at the sole exception that fallbacks match
    are disabled to be stricter around ASCII-compatible but unlikely to be a string.
    """
    if isinstance(fp_or_path_or_payload, (str, PathLike)):
        guesses = from_path(
            fp_or_path_or_payload,
            steps=steps,
            chunk_size=chunk_size,
            threshold=threshold,
            cp_isolation=cp_isolation,
            cp_exclusion=cp_exclusion,
            preemptive_behaviour=preemptive_behaviour,
            explain=explain,
            language_threshold=language_threshold,
            enable_fallback=enable_fallback,
        )
    elif isinstance(
        fp_or_path_or_payload,
        (
            bytes,
            bytearray,
        ),
    ):
        guesses = from_bytes(
            fp_or_path_or_payload,
            steps=steps,
            chunk_size=chunk_size,
            threshold=threshold,
            cp_isolation=cp_isolation,
            cp_exclusion=cp_exclusion,
            preemptive_behaviour=preemptive_behaviour,
            explain=explain,
            language_threshold=language_threshold,
            enable_fallback=enable_fallback,
        )
    else:
        guesses = from_fp(
            fp_or_path_or_payload,
            steps=steps,
            chunk_size=chunk_size,
            threshold=threshold,
            cp_isolation=cp_isolation,
            cp_exclusion=cp_exclusion,
            preemptive_behaviour=preemptive_behaviour,
            explain=explain,
            language_threshold=language_threshold,
            enable_fallback=enable_fallback,
        )

    return not guesses
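from_bytes samples `steps` evenly spaced offsets of up to `chunk_size` bytes each (the `r_ = range(...)` computation above, after the steps/chunk_size overrides). A stdlib-only sketch of that offset arithmetic, assuming a non-empty payload (from_bytes returns early on empty input):

```python
from __future__ import annotations


def sample_offsets(length: int, steps: int = 5, chunk_size: int = 512) -> list[tuple[int, int]]:
    # Mirror the from_bytes parameter overrides: a payload smaller than
    # steps * chunk_size collapses to a single full-length chunk, and
    # chunk_size shrinks when the chunks would otherwise overlap too much.
    if length <= chunk_size * steps:
        steps, chunk_size = 1, length
    if steps > 1 and length / steps < chunk_size:
        chunk_size = int(length / steps)
    # Walk evenly spaced start offsets, clamping each chunk end to the payload.
    return [(i, min(i + chunk_size, length)) for i in range(0, length, int(length / steps))]


print(sample_offsets(4096))  # 512-byte windows spread across a 4 KiB payload
print(sample_offsets(100))   # small payload: one full-length chunk
```

Note that, like the `range()` in from_bytes, integer truncation of the stride can yield one extra trailing offset.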
395
.venv/Lib/site-packages/charset_normalizer/cd.py
Normal file
@@ -0,0 +1,395 @@
from __future__ import annotations

import importlib
from codecs import IncrementalDecoder
from collections import Counter
from functools import lru_cache
from typing import Counter as TypeCounter

from .constant import (
    FREQUENCIES,
    KO_NAMES,
    LANGUAGE_SUPPORTED_COUNT,
    TOO_SMALL_SEQUENCE,
    ZH_NAMES,
)
from .md import is_suspiciously_successive_range
from .models import CoherenceMatches
from .utils import (
    is_accentuated,
    is_latin,
    is_multi_byte_encoding,
    is_unicode_range_secondary,
    unicode_range,
)


def encoding_unicode_range(iana_name: str) -> list[str]:
    """
    Return associated unicode ranges in a single byte code page.
    """
    if is_multi_byte_encoding(iana_name):
        raise OSError("Function not supported on multi-byte code page")

    decoder = importlib.import_module(f"encodings.{iana_name}").IncrementalDecoder

    p: IncrementalDecoder = decoder(errors="ignore")
    seen_ranges: dict[str, int] = {}
    character_count: int = 0

    for i in range(0x40, 0xFF):
        chunk: str = p.decode(bytes([i]))

        if chunk:
            character_range: str | None = unicode_range(chunk)

            if character_range is None:
                continue

            if is_unicode_range_secondary(character_range) is False:
                if character_range not in seen_ranges:
                    seen_ranges[character_range] = 0
                seen_ranges[character_range] += 1
                character_count += 1

    return sorted(
        [
            character_range
            for character_range in seen_ranges
            if seen_ranges[character_range] / character_count >= 0.15
        ]
    )


def unicode_range_languages(primary_range: str) -> list[str]:
    """
    Return inferred languages used with a unicode range.
    """
    languages: list[str] = []

    for language, characters in FREQUENCIES.items():
        for character in characters:
            if unicode_range(character) == primary_range:
                languages.append(language)
                break

    return languages


@lru_cache()
def encoding_languages(iana_name: str) -> list[str]:
    """
    Single-byte encoding language association. Some code pages are heavily linked to particular language(s).
    This function does the correspondence.
    """
    unicode_ranges: list[str] = encoding_unicode_range(iana_name)
    primary_range: str | None = None

    for specified_range in unicode_ranges:
        if "Latin" not in specified_range:
            primary_range = specified_range
            break

    if primary_range is None:
        return ["Latin Based"]

    return unicode_range_languages(primary_range)


@lru_cache()
def mb_encoding_languages(iana_name: str) -> list[str]:
    """
    Multi-byte encoding language association. Some code pages are heavily linked to particular language(s).
    This function does the correspondence.
    """
    if (
        iana_name.startswith("shift_")
        or iana_name.startswith("iso2022_jp")
        or iana_name.startswith("euc_j")
        or iana_name == "cp932"
    ):
        return ["Japanese"]
    if iana_name.startswith("gb") or iana_name in ZH_NAMES:
        return ["Chinese"]
    if iana_name.startswith("iso2022_kr") or iana_name in KO_NAMES:
        return ["Korean"]

    return []


@lru_cache(maxsize=LANGUAGE_SUPPORTED_COUNT)
def get_target_features(language: str) -> tuple[bool, bool]:
    """
    Determine, for a supported language, whether it contains accents and whether it is pure Latin.
    """
    target_have_accents: bool = False
    target_pure_latin: bool = True

    for character in FREQUENCIES[language]:
        if not target_have_accents and is_accentuated(character):
            target_have_accents = True
        if target_pure_latin and is_latin(character) is False:
            target_pure_latin = False

    return target_have_accents, target_pure_latin


def alphabet_languages(
    characters: list[str], ignore_non_latin: bool = False
) -> list[str]:
    """
    Return the languages associated with the given characters.
    """
    languages: list[tuple[str, float]] = []

    source_have_accents = any(is_accentuated(character) for character in characters)

    for language, language_characters in FREQUENCIES.items():
        target_have_accents, target_pure_latin = get_target_features(language)

        if ignore_non_latin and target_pure_latin is False:
            continue

        if target_have_accents is False and source_have_accents:
            continue

        character_count: int = len(language_characters)

        character_match_count: int = len(
            [c for c in language_characters if c in characters]
        )

        ratio: float = character_match_count / character_count

        if ratio >= 0.2:
            languages.append((language, ratio))

    languages = sorted(languages, key=lambda x: x[1], reverse=True)

    return [compatible_language[0] for compatible_language in languages]


def characters_popularity_compare(
    language: str, ordered_characters: list[str]
) -> float:
    """
    Determine if an ordered character list (by occurrence, from most frequent to rarest) matches a particular language.
    The result is a ratio between 0. (absolutely no correspondence) and 1. (near perfect fit).
    Beware that this function is not strict on the match in order to ease the detection. (Meaning a close match is 1.)
    """
    if language not in FREQUENCIES:
        raise ValueError(f"{language} not available")

    character_approved_count: int = 0
    FREQUENCIES_language_set = set(FREQUENCIES[language])

    ordered_characters_count: int = len(ordered_characters)
    target_language_characters_count: int = len(FREQUENCIES[language])

    large_alphabet: bool = target_language_characters_count > 26

    for character, character_rank in zip(
        ordered_characters, range(0, ordered_characters_count)
    ):
        if character not in FREQUENCIES_language_set:
            continue

        character_rank_in_language: int = FREQUENCIES[language].index(character)
        expected_projection_ratio: float = (
            target_language_characters_count / ordered_characters_count
        )
        character_rank_projection: int = int(character_rank * expected_projection_ratio)

        if (
            large_alphabet is False
            and abs(character_rank_projection - character_rank_in_language) > 4
        ):
            continue

        if (
            large_alphabet is True
            and abs(character_rank_projection - character_rank_in_language)
            < target_language_characters_count / 3
        ):
            character_approved_count += 1
            continue

        characters_before_source: list[str] = FREQUENCIES[language][
            0:character_rank_in_language
        ]
        characters_after_source: list[str] = FREQUENCIES[language][
            character_rank_in_language:
        ]
        characters_before: list[str] = ordered_characters[0:character_rank]
        characters_after: list[str] = ordered_characters[character_rank:]

        before_match_count: int = len(
            set(characters_before) & set(characters_before_source)
        )

        after_match_count: int = len(
            set(characters_after) & set(characters_after_source)
        )

        if len(characters_before_source) == 0 and before_match_count <= 4:
            character_approved_count += 1
            continue

        if len(characters_after_source) == 0 and after_match_count <= 4:
            character_approved_count += 1
            continue

        if (
            before_match_count / len(characters_before_source) >= 0.4
            or after_match_count / len(characters_after_source) >= 0.4
        ):
            character_approved_count += 1
            continue

    return character_approved_count / len(ordered_characters)


def alpha_unicode_split(decoded_sequence: str) -> list[str]:
    """
    Given a decoded text sequence, return a list of str. Unicode range / alphabet separation.
    Ex. a text containing English/Latin with a bit of Hebrew will return two items in the resulting list;
    one containing the Latin letters and the other the Hebrew ones.
    """
    layers: dict[str, str] = {}

    for character in decoded_sequence:
        if character.isalpha() is False:
            continue

        character_range: str | None = unicode_range(character)

        if character_range is None:
            continue

        layer_target_range: str | None = None

        for discovered_range in layers:
            if (
                is_suspiciously_successive_range(discovered_range, character_range)
                is False
            ):
                layer_target_range = discovered_range
                break

        if layer_target_range is None:
            layer_target_range = character_range

        if layer_target_range not in layers:
            layers[layer_target_range] = character.lower()
            continue

        layers[layer_target_range] += character.lower()

    return list(layers.values())


def merge_coherence_ratios(results: list[CoherenceMatches]) -> CoherenceMatches:
    """
    This function merges results previously given by the function coherence_ratio.
    The return type is the same as coherence_ratio.
    """
    per_language_ratios: dict[str, list[float]] = {}
    for result in results:
        for sub_result in result:
            language, ratio = sub_result
            if language not in per_language_ratios:
                per_language_ratios[language] = [ratio]
                continue
            per_language_ratios[language].append(ratio)

    merge = [
        (
            language,
            round(
                sum(per_language_ratios[language]) / len(per_language_ratios[language]),
                4,
            ),
        )
        for language in per_language_ratios
    ]

    return sorted(merge, key=lambda x: x[1], reverse=True)


def filter_alt_coherence_matches(results: CoherenceMatches) -> CoherenceMatches:
    """
    We shall NOT return "English—" in CoherenceMatches because it is an alternative
    of "English". This function only keeps the best match and removes the em-dash in it.
    """
    index_results: dict[str, list[float]] = dict()

    for result in results:
        language, ratio = result
        no_em_name: str = language.replace("—", "")

        if no_em_name not in index_results:
            index_results[no_em_name] = []

        index_results[no_em_name].append(ratio)

    if any(len(index_results[e]) > 1 for e in index_results):
        filtered_results: CoherenceMatches = []

        for language in index_results:
            filtered_results.append((language, max(index_results[language])))

        return filtered_results

    return results


@lru_cache(maxsize=2048)
def coherence_ratio(
    decoded_sequence: str, threshold: float = 0.1, lg_inclusion: str | None = None
) -> CoherenceMatches:
    """
    Detect ANY language that can be identified in the given sequence. The sequence will be analysed by layers.
    A layer = character extraction by alphabets/ranges.
    """

    results: list[tuple[str, float]] = []
    ignore_non_latin: bool = False

    sufficient_match_count: int = 0

    lg_inclusion_list = lg_inclusion.split(",") if lg_inclusion is not None else []
    if "Latin Based" in lg_inclusion_list:
        ignore_non_latin = True
        lg_inclusion_list.remove("Latin Based")

    for layer in alpha_unicode_split(decoded_sequence):
        sequence_frequencies: TypeCounter[str] = Counter(layer)
        most_common = sequence_frequencies.most_common()

        character_count: int = sum(o for c, o in most_common)

        if character_count <= TOO_SMALL_SEQUENCE:
            continue

        popular_character_ordered: list[str] = [c for c, o in most_common]

        for language in lg_inclusion_list or alphabet_languages(
            popular_character_ordered, ignore_non_latin
        ):
            ratio: float = characters_popularity_compare(
                language, popular_character_ordered
            )

            if ratio < threshold:
                continue
            elif ratio >= 0.8:
                sufficient_match_count += 1

            results.append((language, round(ratio, 4)))

            if sufficient_match_count >= 3:
                break

    return sorted(
        filter_alt_coherence_matches(results), key=lambda x: x[1], reverse=True
    )
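The layer-splitting idea used by `alpha_unicode_split` above (group alphabetic characters by their Unicode range so each script can be scored separately) can be sketched with the standard library alone. This is a simplified stand-in, not the vendored implementation: it keys layers on the first word of the character's Unicode name rather than on charset_normalizer's range tables.

```python
import unicodedata


def script_layers(text: str) -> list[str]:
    # Group alphabetic characters by a crude "script" key, namely the first
    # word of their Unicode character name (e.g. "LATIN", "GREEK", "HEBREW").
    # Mirrors the layering concept of alpha_unicode_split in a simplified form.
    layers: dict[str, str] = {}
    for ch in text:
        if not ch.isalpha():
            continue
        try:
            script = unicodedata.name(ch).split()[0]
        except ValueError:
            # Character has no Unicode name; skip it, as the vendored code
            # skips characters with no resolvable unicode range.
            continue
        layers.setdefault(script, "")
        layers[script] += ch.lower()
    return list(layers.values())
```

A mixed Latin/Greek string such as `"Hello Κόσμε"` yields one lowercase layer per script, in first-seen order.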
@@ -0,0 +1,8 @@
from __future__ import annotations

from .__main__ import cli_detect, query_yes_no

__all__ = (
    "cli_detect",
    "query_yes_no",
)
381
.venv/Lib/site-packages/charset_normalizer/cli/__main__.py
Normal file
@@ -0,0 +1,381 @@
from __future__ import annotations

import argparse
import sys
import typing
from json import dumps
from os.path import abspath, basename, dirname, join, realpath
from platform import python_version
from unicodedata import unidata_version

import charset_normalizer.md as md_module
from charset_normalizer import from_fp
from charset_normalizer.models import CliDetectionResult
from charset_normalizer.version import __version__


def query_yes_no(question: str, default: str = "yes") -> bool:
    """Ask a yes/no question via input() and return their answer.

    "question" is a string that is presented to the user.
    "default" is the presumed answer if the user just hits <Enter>.
    It must be "yes" (the default), "no" or None (meaning
    an answer is required of the user).

    The "answer" return value is True for "yes" or False for "no".

    Credit goes to (c) https://stackoverflow.com/questions/3041986/apt-command-line-interface-like-yes-no-input
    """
    valid = {"yes": True, "y": True, "ye": True, "no": False, "n": False}
    if default is None:
        prompt = " [y/n] "
    elif default == "yes":
        prompt = " [Y/n] "
    elif default == "no":
        prompt = " [y/N] "
    else:
        raise ValueError("invalid default answer: '%s'" % default)

    while True:
        sys.stdout.write(question + prompt)
        choice = input().lower()
        if default is not None and choice == "":
            return valid[default]
        elif choice in valid:
            return valid[choice]
        else:
            sys.stdout.write("Please respond with 'yes' or 'no' (or 'y' or 'n').\n")


class FileType:
    """Factory for creating file object types

    Instances of FileType are typically passed as type= arguments to the
    ArgumentParser add_argument() method.

    Keyword Arguments:
        - mode -- A string indicating how the file is to be opened. Accepts the
            same values as the builtin open() function.
        - bufsize -- The file's desired buffer size. Accepts the same values as
            the builtin open() function.
        - encoding -- The file's encoding. Accepts the same values as the
            builtin open() function.
        - errors -- A string indicating how encoding and decoding errors are to
            be handled. Accepts the same value as the builtin open() function.

    Backported from CPython 3.12
    """

    def __init__(
        self,
        mode: str = "r",
        bufsize: int = -1,
        encoding: str | None = None,
        errors: str | None = None,
    ):
        self._mode = mode
        self._bufsize = bufsize
        self._encoding = encoding
        self._errors = errors

    def __call__(self, string: str) -> typing.IO:  # type: ignore[type-arg]
        # the special argument "-" means sys.std{in,out}
        if string == "-":
            if "r" in self._mode:
                return sys.stdin.buffer if "b" in self._mode else sys.stdin
            elif any(c in self._mode for c in "wax"):
                return sys.stdout.buffer if "b" in self._mode else sys.stdout
            else:
                msg = f'argument "-" with mode {self._mode}'
                raise ValueError(msg)

        # all other arguments are used as file names
        try:
            return open(string, self._mode, self._bufsize, self._encoding, self._errors)
        except OSError as e:
            message = f"can't open '{string}': {e}"
            raise argparse.ArgumentTypeError(message)

    def __repr__(self) -> str:
        args = self._mode, self._bufsize
        kwargs = [("encoding", self._encoding), ("errors", self._errors)]
        args_str = ", ".join(
            [repr(arg) for arg in args if arg != -1]
            + [f"{kw}={arg!r}" for kw, arg in kwargs if arg is not None]
        )
        return f"{type(self).__name__}({args_str})"


def cli_detect(argv: list[str] | None = None) -> int:
    """
    CLI assistant using ARGV and ArgumentParser
    :param argv:
    :return: 0 if everything is fine, anything else means trouble
    """
    parser = argparse.ArgumentParser(
        description="The Real First Universal Charset Detector. "
        "Discover originating encoding used on text file. "
        "Normalize text to unicode."
    )

    parser.add_argument(
        "files", type=FileType("rb"), nargs="+", help="File(s) to be analysed"
    )
    parser.add_argument(
        "-v",
        "--verbose",
        action="store_true",
        default=False,
        dest="verbose",
        help="Display complementary information about file if any. "
        "Stdout will contain logs about the detection process.",
    )
    parser.add_argument(
        "-a",
        "--with-alternative",
        action="store_true",
        default=False,
        dest="alternatives",
        help="Output complementary possibilities if any. Top-level JSON WILL be a list.",
    )
    parser.add_argument(
        "-n",
        "--normalize",
        action="store_true",
        default=False,
        dest="normalize",
        help="Permit to normalize input file. If not set, program does not write anything.",
    )
    parser.add_argument(
        "-m",
        "--minimal",
        action="store_true",
        default=False,
        dest="minimal",
        help="Only output the charset detected to STDOUT. Disabling JSON output.",
    )
    parser.add_argument(
        "-r",
        "--replace",
        action="store_true",
        default=False,
        dest="replace",
        help="Replace file when trying to normalize it instead of creating a new one.",
    )
    parser.add_argument(
        "-f",
        "--force",
        action="store_true",
        default=False,
        dest="force",
        help="Replace file without asking if you are sure, use this flag with caution.",
    )
    parser.add_argument(
        "-i",
        "--no-preemptive",
        action="store_true",
        default=False,
        dest="no_preemptive",
        help="Disable looking at a charset declaration to hint the detector.",
    )
    parser.add_argument(
        "-t",
        "--threshold",
        action="store",
        default=0.2,
        type=float,
        dest="threshold",
        help="Define a custom maximum amount of noise allowed in decoded content. 0. <= noise <= 1.",
    )
    parser.add_argument(
        "--version",
        action="version",
        version="Charset-Normalizer {} - Python {} - Unicode {} - SpeedUp {}".format(
            __version__,
            python_version(),
            unidata_version,
            "OFF" if md_module.__file__.lower().endswith(".py") else "ON",
        ),
        help="Show version information and exit.",
    )

    args = parser.parse_args(argv)

    if args.replace is True and args.normalize is False:
        if args.files:
            for my_file in args.files:
                my_file.close()
        print("Use --replace in addition of --normalize only.", file=sys.stderr)
        return 1

    if args.force is True and args.replace is False:
        if args.files:
            for my_file in args.files:
                my_file.close()
        print("Use --force in addition of --replace only.", file=sys.stderr)
        return 1

    if args.threshold < 0.0 or args.threshold > 1.0:
        if args.files:
            for my_file in args.files:
                my_file.close()
        print("--threshold VALUE should be between 0. AND 1.", file=sys.stderr)
        return 1

    x_ = []

    for my_file in args.files:
        matches = from_fp(
            my_file,
            threshold=args.threshold,
            explain=args.verbose,
            preemptive_behaviour=args.no_preemptive is False,
        )

        best_guess = matches.best()

        if best_guess is None:
            print(
                'Unable to identify originating encoding for "{}". {}'.format(
                    my_file.name,
                    (
                        "Maybe try increasing maximum amount of chaos."
                        if args.threshold < 1.0
                        else ""
                    ),
                ),
                file=sys.stderr,
            )
            x_.append(
                CliDetectionResult(
                    abspath(my_file.name),
                    None,
                    [],
                    [],
                    "Unknown",
                    [],
                    False,
                    1.0,
                    0.0,
                    None,
                    True,
                )
            )
        else:
            x_.append(
                CliDetectionResult(
                    abspath(my_file.name),
                    best_guess.encoding,
                    best_guess.encoding_aliases,
                    [
                        cp
                        for cp in best_guess.could_be_from_charset
                        if cp != best_guess.encoding
                    ],
                    best_guess.language,
                    best_guess.alphabets,
                    best_guess.bom,
                    best_guess.percent_chaos,
                    best_guess.percent_coherence,
                    None,
                    True,
                )
            )

            if len(matches) > 1 and args.alternatives:
                for el in matches:
                    if el != best_guess:
                        x_.append(
                            CliDetectionResult(
                                abspath(my_file.name),
                                el.encoding,
                                el.encoding_aliases,
                                [
                                    cp
                                    for cp in el.could_be_from_charset
                                    if cp != el.encoding
                                ],
                                el.language,
                                el.alphabets,
                                el.bom,
                                el.percent_chaos,
                                el.percent_coherence,
                                None,
                                False,
                            )
                        )

            if args.normalize is True:
                if best_guess.encoding.startswith("utf") is True:
                    print(
                        '"{}" file does not need to be normalized, as it already came from unicode.'.format(
                            my_file.name
                        ),
                        file=sys.stderr,
                    )
                    if my_file.closed is False:
                        my_file.close()
                    continue

                dir_path = dirname(realpath(my_file.name))
                file_name = basename(realpath(my_file.name))

                o_: list[str] = file_name.split(".")

                if args.replace is False:
                    o_.insert(-1, best_guess.encoding)
                    if my_file.closed is False:
                        my_file.close()
                elif (
                    args.force is False
                    and query_yes_no(
                        'Are you sure to normalize "{}" by replacing it ?'.format(
                            my_file.name
                        ),
                        "no",
                    )
                    is False
                ):
                    if my_file.closed is False:
                        my_file.close()
                    continue

                try:
                    x_[0].unicode_path = join(dir_path, ".".join(o_))

                    with open(x_[0].unicode_path, "wb") as fp:
                        fp.write(best_guess.output())
                except OSError as e:
                    print(str(e), file=sys.stderr)
                    if my_file.closed is False:
                        my_file.close()
                    return 2

        if my_file.closed is False:
            my_file.close()

    if args.minimal is False:
        print(
            dumps(
                [el.__dict__ for el in x_] if len(x_) > 1 else x_[0].__dict__,
                ensure_ascii=True,
                indent=4,
            )
        )
    else:
        for my_file in args.files:
            print(
                ", ".join(
                    [
                        el.encoding or "undefined"
                        for el in x_
                        if el.path == abspath(my_file.name)
                    ]
                )
            )

    return 0


if __name__ == "__main__":
    cli_detect()
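The `FileType` backport above exists mainly so the CLI can accept `"-"` as a filename and map it to stdin/stdout, choosing the binary buffer when the mode contains `"b"`. That convention can be sketched standalone with the stdlib; `open_arg` here is a hypothetical helper for illustration, not part of the vendored module:

```python
import sys
from typing import IO


def open_arg(string: str, mode: str = "r") -> IO:
    # "-" follows the Unix convention: stdin for read modes, stdout for
    # write/append/exclusive modes; the ".buffer" variant is used for
    # binary modes, mirroring FileType.__call__ above.
    if string == "-":
        if "r" in mode:
            return sys.stdin.buffer if "b" in mode else sys.stdin
        if any(c in mode for c in "wax"):
            return sys.stdout.buffer if "b" in mode else sys.stdout
        raise ValueError(f'argument "-" with mode {mode}')
    # Any other argument is treated as a real file name.
    return open(string, mode)
```

With this, `open_arg("-", "rb")` hands back `sys.stdin.buffer`, which is why the vendored CLI can read piped bytes without special-casing in `cli_detect`.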
Binary file not shown.
Binary file not shown.
2015
.venv/Lib/site-packages/charset_normalizer/constant.py
Normal file
File diff suppressed because it is too large
80
.venv/Lib/site-packages/charset_normalizer/legacy.py
Normal file
@@ -0,0 +1,80 @@
from __future__ import annotations

from typing import TYPE_CHECKING, Any
from warnings import warn

from .api import from_bytes
from .constant import CHARDET_CORRESPONDENCE, TOO_SMALL_SEQUENCE

# TODO: remove this check when dropping Python 3.7 support
if TYPE_CHECKING:
    from typing_extensions import TypedDict

    class ResultDict(TypedDict):
        encoding: str | None
        language: str
        confidence: float | None


def detect(
    byte_str: bytes, should_rename_legacy: bool = False, **kwargs: Any
) -> ResultDict:
    """
    chardet legacy method
    Detect the encoding of the given byte string. It should be mostly backward-compatible.
    Encoding names will match Chardet's own naming whenever possible. (Except for encoding names it does not support.)
    This function is deprecated and should only be used to migrate your project easily; consult the documentation for
    further information. Not planned for removal.

    :param byte_str: The byte sequence to examine.
    :param should_rename_legacy: Should we rename legacy encodings
                                 to their more modern equivalents?
    """
    if len(kwargs):
        warn(
            f"charset-normalizer disregard arguments '{','.join(list(kwargs.keys()))}' in legacy function detect()"
        )

    if not isinstance(byte_str, (bytearray, bytes)):
        raise TypeError(  # pragma: nocover
            f"Expected object of type bytes or bytearray, got: {type(byte_str)}"
        )

    if isinstance(byte_str, bytearray):
        byte_str = bytes(byte_str)

    r = from_bytes(byte_str).best()

    encoding = r.encoding if r is not None else None
    language = r.language if r is not None and r.language != "Unknown" else ""
    confidence = 1.0 - r.chaos if r is not None else None

    # automatically lower confidence
    # on small bytes samples.
    # https://github.com/jawah/charset_normalizer/issues/391
    if (
        confidence is not None
        and confidence >= 0.9
        and encoding
        not in {
            "utf_8",
            "ascii",
        }
        and r.bom is False  # type: ignore[union-attr]
        and len(byte_str) < TOO_SMALL_SEQUENCE
    ):
        confidence -= 0.2

    # Note: CharsetNormalizer does not return 'UTF-8-SIG' as the sig gets stripped in the detection/normalization process
    # but chardet does return 'utf-8-sig' and it is a valid codec name.
    if r is not None and encoding == "utf_8" and r.bom:
        encoding += "_sig"

    if should_rename_legacy is False and encoding in CHARDET_CORRESPONDENCE:
        encoding = CHARDET_CORRESPONDENCE[encoding]

    return {
        "encoding": encoding,
        "language": language,
        "confidence": confidence,
    }
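The confidence guard inside `detect()` above (issue #391) can be isolated into a small pure function to make the rule visible: a high confidence on a very small sample that is neither UTF-8 nor ASCII and carries no BOM is lowered by 0.2. The `too_small=32` default is an assumption about the vendored `TOO_SMALL_SEQUENCE` constant, and `adjust_confidence` is a hypothetical name used only for this sketch:

```python
from __future__ import annotations


def adjust_confidence(
    confidence: float | None,
    encoding: str | None,
    has_bom: bool,
    sample_len: int,
    too_small: int = 32,  # assumed value of TOO_SMALL_SEQUENCE
) -> float | None:
    # Mirrors the guard in detect(): overly confident guesses on tiny
    # non-UTF-8/ASCII samples without a BOM are penalized by 0.2.
    if (
        confidence is not None
        and confidence >= 0.9
        and encoding not in {"utf_8", "ascii"}
        and has_bom is False
        and sample_len < too_small
    ):
        return confidence - 0.2
    return confidence
```

So a 10-byte cp1251 guess at 0.95 drops to roughly 0.75, while the same confidence for `utf_8` is left untouched.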
Binary file not shown.
635
.venv/Lib/site-packages/charset_normalizer/md.py
Normal file
@@ -0,0 +1,635 @@
from __future__ import annotations

from functools import lru_cache
from logging import getLogger

from .constant import (
    COMMON_SAFE_ASCII_CHARACTERS,
    TRACE,
    UNICODE_SECONDARY_RANGE_KEYWORD,
)
from .utils import (
    is_accentuated,
    is_arabic,
    is_arabic_isolated_form,
    is_case_variable,
    is_cjk,
    is_emoticon,
    is_hangul,
    is_hiragana,
    is_katakana,
    is_latin,
    is_punctuation,
    is_separator,
    is_symbol,
    is_thai,
    is_unprintable,
    remove_accent,
    unicode_range,
    is_cjk_uncommon,
)


class MessDetectorPlugin:
    """
    Base abstract class used for mess detection plugins.
    All detectors MUST extend and implement the given methods.
    """

    def eligible(self, character: str) -> bool:
        """
        Determine if the given character should be fed in.
        """
        raise NotImplementedError  # pragma: nocover

    def feed(self, character: str) -> None:
        """
        The main routine to be executed upon character.
        Insert the logic in which the text would be considered chaotic.
        """
        raise NotImplementedError  # pragma: nocover

    def reset(self) -> None:  # pragma: no cover
        """
        Reset the plugin to its initial state.
        """
        raise NotImplementedError

    @property
    def ratio(self) -> float:
        """
        Compute the chaos ratio based on what your feed() has seen.
        Must NOT be lower than 0.0; there is no upper bound.
        """
        raise NotImplementedError  # pragma: nocover


class TooManySymbolOrPunctuationPlugin(MessDetectorPlugin):
    def __init__(self) -> None:
        self._punctuation_count: int = 0
        self._symbol_count: int = 0
        self._character_count: int = 0

        self._last_printable_char: str | None = None
        self._frenzy_symbol_in_word: bool = False

    def eligible(self, character: str) -> bool:
        return character.isprintable()

    def feed(self, character: str) -> None:
        self._character_count += 1

        if (
            character != self._last_printable_char
            and character not in COMMON_SAFE_ASCII_CHARACTERS
        ):
            if is_punctuation(character):
                self._punctuation_count += 1
            elif (
                character.isdigit() is False
                and is_symbol(character)
                and is_emoticon(character) is False
            ):
                self._symbol_count += 2

        self._last_printable_char = character

    def reset(self) -> None:  # Abstract
        self._punctuation_count = 0
        self._character_count = 0
        self._symbol_count = 0

    @property
    def ratio(self) -> float:
        if self._character_count == 0:
            return 0.0

        ratio_of_punctuation: float = (
            self._punctuation_count + self._symbol_count
        ) / self._character_count

        return ratio_of_punctuation if ratio_of_punctuation >= 0.3 else 0.0


class TooManyAccentuatedPlugin(MessDetectorPlugin):
    def __init__(self) -> None:
        self._character_count: int = 0
        self._accentuated_count: int = 0

    def eligible(self, character: str) -> bool:
        return character.isalpha()

    def feed(self, character: str) -> None:
        self._character_count += 1

        if is_accentuated(character):
            self._accentuated_count += 1

    def reset(self) -> None:  # Abstract
        self._character_count = 0
        self._accentuated_count = 0

    @property
    def ratio(self) -> float:
        if self._character_count < 8:
            return 0.0

        ratio_of_accentuation: float = self._accentuated_count / self._character_count
        return ratio_of_accentuation if ratio_of_accentuation >= 0.35 else 0.0


class UnprintablePlugin(MessDetectorPlugin):
    def __init__(self) -> None:
        self._unprintable_count: int = 0
        self._character_count: int = 0

    def eligible(self, character: str) -> bool:
        return True

    def feed(self, character: str) -> None:
        if is_unprintable(character):
            self._unprintable_count += 1
        self._character_count += 1

    def reset(self) -> None:  # Abstract
        self._unprintable_count = 0

    @property
    def ratio(self) -> float:
        if self._character_count == 0:
            return 0.0

        return (self._unprintable_count * 8) / self._character_count


class SuspiciousDuplicateAccentPlugin(MessDetectorPlugin):
    def __init__(self) -> None:
        self._successive_count: int = 0
        self._character_count: int = 0

        self._last_latin_character: str | None = None

    def eligible(self, character: str) -> bool:
        return character.isalpha() and is_latin(character)

    def feed(self, character: str) -> None:
        self._character_count += 1
        if (
            self._last_latin_character is not None
            and is_accentuated(character)
            and is_accentuated(self._last_latin_character)
        ):
            if character.isupper() and self._last_latin_character.isupper():
                self._successive_count += 1
            # Worse if it's the same char duplicated with a different accent.
            if remove_accent(character) == remove_accent(self._last_latin_character):
                self._successive_count += 1
        self._last_latin_character = character

    def reset(self) -> None:  # Abstract
        self._successive_count = 0
        self._character_count = 0
        self._last_latin_character = None

    @property
    def ratio(self) -> float:
        if self._character_count == 0:
            return 0.0

        return (self._successive_count * 2) / self._character_count


class SuspiciousRange(MessDetectorPlugin):
    def __init__(self) -> None:
        self._suspicious_successive_range_count: int = 0
        self._character_count: int = 0
        self._last_printable_seen: str | None = None

    def eligible(self, character: str) -> bool:
        return character.isprintable()

    def feed(self, character: str) -> None:
        self._character_count += 1

        if (
            character.isspace()
            or is_punctuation(character)
            or character in COMMON_SAFE_ASCII_CHARACTERS
        ):
            self._last_printable_seen = None
            return

        if self._last_printable_seen is None:
            self._last_printable_seen = character
            return

        unicode_range_a: str | None = unicode_range(self._last_printable_seen)
        unicode_range_b: str | None = unicode_range(character)

        if is_suspiciously_successive_range(unicode_range_a, unicode_range_b):
            self._suspicious_successive_range_count += 1

        self._last_printable_seen = character

    def reset(self) -> None:  # Abstract
        self._character_count = 0
        self._suspicious_successive_range_count = 0
        self._last_printable_seen = None

    @property
    def ratio(self) -> float:
        if self._character_count <= 13:
            return 0.0

        ratio_of_suspicious_range_usage: float = (
            self._suspicious_successive_range_count * 2
        ) / self._character_count

        return ratio_of_suspicious_range_usage


class SuperWeirdWordPlugin(MessDetectorPlugin):
    def __init__(self) -> None:
        self._word_count: int = 0
        self._bad_word_count: int = 0
        self._foreign_long_count: int = 0

        self._is_current_word_bad: bool = False
        self._foreign_long_watch: bool = False

        self._character_count: int = 0
        self._bad_character_count: int = 0

        self._buffer: str = ""
        self._buffer_accent_count: int = 0
        self._buffer_glyph_count: int = 0

    def eligible(self, character: str) -> bool:
        return True

    def feed(self, character: str) -> None:
        if character.isalpha():
            self._buffer += character
            if is_accentuated(character):
                self._buffer_accent_count += 1
            if (
                self._foreign_long_watch is False
                and (is_latin(character) is False or is_accentuated(character))
                and is_cjk(character) is False
                and is_hangul(character) is False
                and is_katakana(character) is False
                and is_hiragana(character) is False
                and is_thai(character) is False
            ):
                self._foreign_long_watch = True
            if (
                is_cjk(character)
                or is_hangul(character)
                or is_katakana(character)
                or is_hiragana(character)
                or is_thai(character)
            ):
                self._buffer_glyph_count += 1
            return
        if not self._buffer:
            return
        if (
            character.isspace() or is_punctuation(character) or is_separator(character)
        ) and self._buffer:
            self._word_count += 1
            buffer_length: int = len(self._buffer)

            self._character_count += buffer_length

            if buffer_length >= 4:
                if self._buffer_accent_count / buffer_length >= 0.5:
                    self._is_current_word_bad = True
                # A word/buffer ending with an upper case accentuated letter is so rare
                # that we will consider them all as suspicious. Same weight as foreign_long suspicious.
                elif (
                    is_accentuated(self._buffer[-1])
                    and self._buffer[-1].isupper()
                    and all(_.isupper() for _ in self._buffer) is False
                ):
                    self._foreign_long_count += 1
                    self._is_current_word_bad = True
                elif self._buffer_glyph_count == 1:
                    self._is_current_word_bad = True
                    self._foreign_long_count += 1
            if buffer_length >= 24 and self._foreign_long_watch:
                camel_case_dst = [
                    i
                    for c, i in zip(self._buffer, range(0, buffer_length))
                    if c.isupper()
                ]
                probable_camel_cased: bool = False

                if camel_case_dst and (len(camel_case_dst) / buffer_length <= 0.3):
                    probable_camel_cased = True

                if not probable_camel_cased:
                    self._foreign_long_count += 1
                    self._is_current_word_bad = True

            if self._is_current_word_bad:
                self._bad_word_count += 1
                self._bad_character_count += len(self._buffer)
                self._is_current_word_bad = False

            self._foreign_long_watch = False
            self._buffer = ""
            self._buffer_accent_count = 0
            self._buffer_glyph_count = 0
        elif (
            character not in {"<", ">", "-", "=", "~", "|", "_"}
            and character.isdigit() is False
            and is_symbol(character)
        ):
            self._is_current_word_bad = True
            self._buffer += character

    def reset(self) -> None:  # Abstract
        self._buffer = ""
        self._is_current_word_bad = False
        self._foreign_long_watch = False
        self._bad_word_count = 0
        self._word_count = 0
        self._character_count = 0
        self._bad_character_count = 0
        self._foreign_long_count = 0

    @property
    def ratio(self) -> float:
        if self._word_count <= 10 and self._foreign_long_count == 0:
            return 0.0

        return self._bad_character_count / self._character_count


class CjkUncommonPlugin(MessDetectorPlugin):
    """
    Detect messy CJK text that probably means nothing.
    """

    def __init__(self) -> None:
        self._character_count: int = 0
        self._uncommon_count: int = 0

    def eligible(self, character: str) -> bool:
        return is_cjk(character)

    def feed(self, character: str) -> None:
        self._character_count += 1

        if is_cjk_uncommon(character):
            self._uncommon_count += 1
            return

    def reset(self) -> None:  # Abstract
        self._character_count = 0
        self._uncommon_count = 0

    @property
    def ratio(self) -> float:
        if self._character_count < 8:
            return 0.0

        uncommon_form_usage: float = self._uncommon_count / self._character_count

        # we can be pretty sure it's garbage when uncommon characters are widely
        # used. otherwise it could just be traditional chinese for example.
        return uncommon_form_usage / 10 if uncommon_form_usage > 0.5 else 0.0


class ArchaicUpperLowerPlugin(MessDetectorPlugin):
    def __init__(self) -> None:
        self._buf: bool = False

        self._character_count_since_last_sep: int = 0

        self._successive_upper_lower_count: int = 0
        self._successive_upper_lower_count_final: int = 0

        self._character_count: int = 0

        self._last_alpha_seen: str | None = None
        self._current_ascii_only: bool = True

    def eligible(self, character: str) -> bool:
        return True

    def feed(self, character: str) -> None:
        is_concerned = character.isalpha() and is_case_variable(character)
        chunk_sep = is_concerned is False

        if chunk_sep and self._character_count_since_last_sep > 0:
            if (
                self._character_count_since_last_sep <= 64
                and character.isdigit() is False
                and self._current_ascii_only is False
            ):
                self._successive_upper_lower_count_final += (
                    self._successive_upper_lower_count
                )

            self._successive_upper_lower_count = 0
            self._character_count_since_last_sep = 0
            self._last_alpha_seen = None
            self._buf = False
            self._character_count += 1
            self._current_ascii_only = True

            return

        if self._current_ascii_only is True and character.isascii() is False:
            self._current_ascii_only = False

        if self._last_alpha_seen is not None:
            if (character.isupper() and self._last_alpha_seen.islower()) or (
                character.islower() and self._last_alpha_seen.isupper()
            ):
                if self._buf is True:
                    self._successive_upper_lower_count += 2
                    self._buf = False
                else:
                    self._buf = True
            else:
                self._buf = False

        self._character_count += 1
        self._character_count_since_last_sep += 1
        self._last_alpha_seen = character

    def reset(self) -> None:  # Abstract
        self._character_count = 0
        self._character_count_since_last_sep = 0
        self._successive_upper_lower_count = 0
        self._successive_upper_lower_count_final = 0
        self._last_alpha_seen = None
        self._buf = False
        self._current_ascii_only = True

    @property
    def ratio(self) -> float:
        if self._character_count == 0:
            return 0.0

        return self._successive_upper_lower_count_final / self._character_count


class ArabicIsolatedFormPlugin(MessDetectorPlugin):
    def __init__(self) -> None:
        self._character_count: int = 0
        self._isolated_form_count: int = 0

    def reset(self) -> None:  # Abstract
        self._character_count = 0
        self._isolated_form_count = 0

    def eligible(self, character: str) -> bool:
        return is_arabic(character)

    def feed(self, character: str) -> None:
        self._character_count += 1

        if is_arabic_isolated_form(character):
            self._isolated_form_count += 1

    @property
    def ratio(self) -> float:
        if self._character_count < 8:
            return 0.0

        isolated_form_usage: float = self._isolated_form_count / self._character_count

        return isolated_form_usage


@lru_cache(maxsize=1024)
def is_suspiciously_successive_range(
    unicode_range_a: str | None, unicode_range_b: str | None
) -> bool:
    """
    Determine if two Unicode ranges seen next to each other can be considered as suspicious.
    """
    if unicode_range_a is None or unicode_range_b is None:
        return True

    if unicode_range_a == unicode_range_b:
        return False

    if "Latin" in unicode_range_a and "Latin" in unicode_range_b:
        return False

    if "Emoticons" in unicode_range_a or "Emoticons" in unicode_range_b:
        return False

    # Latin characters can be accompanied with a combining diacritical mark
    # eg. Vietnamese.
    if ("Latin" in unicode_range_a or "Latin" in unicode_range_b) and (
        "Combining" in unicode_range_a or "Combining" in unicode_range_b
    ):
        return False

    keywords_range_a, keywords_range_b = (
        unicode_range_a.split(" "),
        unicode_range_b.split(" "),
    )

    for el in keywords_range_a:
        if el in UNICODE_SECONDARY_RANGE_KEYWORD:
            continue
        if el in keywords_range_b:
            return False

    # Japanese Exception
    range_a_jp_chars, range_b_jp_chars = (
        unicode_range_a
        in (
            "Hiragana",
            "Katakana",
        ),
        unicode_range_b in ("Hiragana", "Katakana"),
    )
    if (range_a_jp_chars or range_b_jp_chars) and (
        "CJK" in unicode_range_a or "CJK" in unicode_range_b
    ):
        return False
    if range_a_jp_chars and range_b_jp_chars:
        return False

    if "Hangul" in unicode_range_a or "Hangul" in unicode_range_b:
        if "CJK" in unicode_range_a or "CJK" in unicode_range_b:
            return False
        if unicode_range_a == "Basic Latin" or unicode_range_b == "Basic Latin":
            return False

    # Chinese/Japanese use dedicated ranges for punctuation and/or separators.
    if ("CJK" in unicode_range_a or "CJK" in unicode_range_b) or (
        unicode_range_a in ["Katakana", "Hiragana"]
        and unicode_range_b in ["Katakana", "Hiragana"]
    ):
        if "Punctuation" in unicode_range_a or "Punctuation" in unicode_range_b:
            return False
        if "Forms" in unicode_range_a or "Forms" in unicode_range_b:
            return False
        if unicode_range_a == "Basic Latin" or unicode_range_b == "Basic Latin":
            return False

    return True


@lru_cache(maxsize=2048)
def mess_ratio(
    decoded_sequence: str, maximum_threshold: float = 0.2, debug: bool = False
) -> float:
    """
    Compute a mess ratio given a decoded bytes sequence. The maximum threshold stops the computation earlier.
    """

    detectors: list[MessDetectorPlugin] = [
        md_class() for md_class in MessDetectorPlugin.__subclasses__()
    ]

    length: int = len(decoded_sequence) + 1

    mean_mess_ratio: float = 0.0

    if length < 512:
        intermediary_mean_mess_ratio_calc: int = 32
    elif length <= 1024:
        intermediary_mean_mess_ratio_calc = 64
    else:
        intermediary_mean_mess_ratio_calc = 128

    for character, index in zip(decoded_sequence + "\n", range(length)):
        for detector in detectors:
            if detector.eligible(character):
                detector.feed(character)

        if (
            index > 0 and index % intermediary_mean_mess_ratio_calc == 0
        ) or index == length - 1:
            mean_mess_ratio = sum(dt.ratio for dt in detectors)

            if mean_mess_ratio >= maximum_threshold:
                break

    if debug:
        logger = getLogger("charset_normalizer")

        logger.log(
            TRACE,
            "Mess-detector extended-analysis start. "
            f"intermediary_mean_mess_ratio_calc={intermediary_mean_mess_ratio_calc} mean_mess_ratio={mean_mess_ratio} "
            f"maximum_threshold={maximum_threshold}",
        )

        if len(decoded_sequence) > 16:
            logger.log(TRACE, f"Starting with: {decoded_sequence[:16]}")
            logger.log(TRACE, f"Ending with: {decoded_sequence[-16::]}")

        for dt in detectors:
            logger.log(TRACE, f"{dt.__class__}: {dt.ratio}")

    return round(mean_mess_ratio, 3)
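The plugin protocol used throughout `md.py` can be sketched standalone: each detector sees only the characters it declares itself eligible for and reports a chaos ratio, and the aggregation loop sums the ratios. This is a self-contained illustration with a toy plugin (`ToyUnprintablePlugin` and `toy_mess_ratio` are hypothetical names, not charset_normalizer APIs), omitting the chunked early-exit on `maximum_threshold`.

```python
class ToyUnprintablePlugin:
    """Toy detector mirroring the eligible/feed/ratio contract of MessDetectorPlugin."""

    def __init__(self) -> None:
        self._unprintable = 0
        self._total = 0

    def eligible(self, character: str) -> bool:
        # This toy plugin inspects every character.
        return True

    def feed(self, character: str) -> None:
        if not character.isprintable() and character not in "\n\r\t":
            self._unprintable += 1
        self._total += 1

    @property
    def ratio(self) -> float:
        return self._unprintable / self._total if self._total else 0.0


def toy_mess_ratio(text: str, detectors=None) -> float:
    """Feed each character to every eligible detector, then sum their ratios."""
    detectors = detectors or [ToyUnprintablePlugin()]
    for character in text:
        for detector in detectors:
            if detector.eligible(character):
                detector.feed(character)
    return sum(d.ratio for d in detectors)


print(toy_mess_ratio("hello world"))  # → 0.0
print(toy_mess_ratio("he\x00llo") > 0.0)  # → True
```

The real `mess_ratio` differs mainly in that it discovers detectors via `MessDetectorPlugin.__subclasses__()` and re-checks the summed ratio every 32/64/128 characters so it can stop early once `maximum_threshold` is exceeded.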
Binary file not shown.
360
.venv/Lib/site-packages/charset_normalizer/models.py
Normal file
@@ -0,0 +1,360 @@
from __future__ import annotations

from encodings.aliases import aliases
from hashlib import sha256
from json import dumps
from re import sub
from typing import Any, Iterator, List, Tuple

from .constant import RE_POSSIBLE_ENCODING_INDICATION, TOO_BIG_SEQUENCE
from .utils import iana_name, is_multi_byte_encoding, unicode_range


class CharsetMatch:
    def __init__(
        self,
        payload: bytes,
        guessed_encoding: str,
        mean_mess_ratio: float,
        has_sig_or_bom: bool,
        languages: CoherenceMatches,
        decoded_payload: str | None = None,
        preemptive_declaration: str | None = None,
    ):
        self._payload: bytes = payload

        self._encoding: str = guessed_encoding
        self._mean_mess_ratio: float = mean_mess_ratio
        self._languages: CoherenceMatches = languages
        self._has_sig_or_bom: bool = has_sig_or_bom
        self._unicode_ranges: list[str] | None = None

        self._leaves: list[CharsetMatch] = []
        self._mean_coherence_ratio: float = 0.0

        self._output_payload: bytes | None = None
        self._output_encoding: str | None = None

        self._string: str | None = decoded_payload

        self._preemptive_declaration: str | None = preemptive_declaration

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, CharsetMatch):
            if isinstance(other, str):
                return iana_name(other) == self.encoding
            return False
        return self.encoding == other.encoding and self.fingerprint == other.fingerprint

    def __lt__(self, other: object) -> bool:
        """
        Implemented to make sorted available upon CharsetMatches items.
        """
        if not isinstance(other, CharsetMatch):
            raise ValueError

        chaos_difference: float = abs(self.chaos - other.chaos)
        coherence_difference: float = abs(self.coherence - other.coherence)

        # Below 1% difference --> Use Coherence
        if chaos_difference < 0.01 and coherence_difference > 0.02:
            return self.coherence > other.coherence
        elif chaos_difference < 0.01 and coherence_difference <= 0.02:
            # When having a difficult decision, use the result that decoded as many multi-byte as possible.
            # preserve RAM usage!
            if len(self._payload) >= TOO_BIG_SEQUENCE:
                return self.chaos < other.chaos
            return self.multi_byte_usage > other.multi_byte_usage

        return self.chaos < other.chaos

    @property
    def multi_byte_usage(self) -> float:
        return 1.0 - (len(str(self)) / len(self.raw))

    def __str__(self) -> str:
        # Lazy Str Loading
        if self._string is None:
            self._string = str(self._payload, self._encoding, "strict")
        return self._string

    def __repr__(self) -> str:
        return f"<CharsetMatch '{self.encoding}' bytes({self.fingerprint})>"

    def add_submatch(self, other: CharsetMatch) -> None:
        if not isinstance(other, CharsetMatch) or other == self:
            raise ValueError(
                "Unable to add instance <{}> as a submatch of a CharsetMatch".format(
                    other.__class__
                )
            )

        other._string = None  # Unload RAM usage; dirty trick.
        self._leaves.append(other)

    @property
    def encoding(self) -> str:
        return self._encoding

    @property
    def encoding_aliases(self) -> list[str]:
        """
        An encoding can be known under many names; this helps e.g. when searching for IBM855 while it is listed as CP855.
        """
        also_known_as: list[str] = []
        for u, p in aliases.items():
            if self.encoding == u:
                also_known_as.append(p)
            elif self.encoding == p:
                also_known_as.append(u)
        return also_known_as

    @property
    def bom(self) -> bool:
        return self._has_sig_or_bom

    @property
    def byte_order_mark(self) -> bool:
        return self._has_sig_or_bom

    @property
    def languages(self) -> list[str]:
        """
        Return the complete list of possible languages found in decoded sequence.
        Usually not really useful. The returned list may be empty even if the 'language' property returns something != 'Unknown'.
        """
        return [e[0] for e in self._languages]

    @property
    def language(self) -> str:
        """
        Most probable language found in decoded sequence. If none were detected or inferred, the property will return
        "Unknown".
        """
        if not self._languages:
            # Trying to infer the language based on the given encoding
            # It's either English or we should not pronounce ourselves in certain cases.
            if "ascii" in self.could_be_from_charset:
                return "English"

            # doing it there to avoid circular import
            from charset_normalizer.cd import encoding_languages, mb_encoding_languages

            languages = (
                mb_encoding_languages(self.encoding)
                if is_multi_byte_encoding(self.encoding)
                else encoding_languages(self.encoding)
            )

            if len(languages) == 0 or "Latin Based" in languages:
                return "Unknown"

            return languages[0]

        return self._languages[0][0]

    @property
    def chaos(self) -> float:
        return self._mean_mess_ratio

    @property
    def coherence(self) -> float:
        if not self._languages:
            return 0.0
        return self._languages[0][1]

    @property
    def percent_chaos(self) -> float:
        return round(self.chaos * 100, ndigits=3)

    @property
    def percent_coherence(self) -> float:
        return round(self.coherence * 100, ndigits=3)

    @property
    def raw(self) -> bytes:
        """
        Original untouched bytes.
        """
        return self._payload

    @property
    def submatch(self) -> list[CharsetMatch]:
        return self._leaves

    @property
    def has_submatch(self) -> bool:
        return len(self._leaves) > 0

    @property
    def alphabets(self) -> list[str]:
        if self._unicode_ranges is not None:
            return self._unicode_ranges
        # list detected ranges
        detected_ranges: list[str | None] = [unicode_range(char) for char in str(self)]
        # filter and sort
        self._unicode_ranges = sorted(list({r for r in detected_ranges if r}))
        return self._unicode_ranges

    @property
    def could_be_from_charset(self) -> list[str]:
        """
        The complete list of encodings that produce the exact SAME str result and therefore could be the originating
        encoding.
        This list does include the encoding available in property 'encoding'.
        """
        return [self._encoding] + [m.encoding for m in self._leaves]

    def output(self, encoding: str = "utf_8") -> bytes:
        """
        Method to get re-encoded bytes payload using given target encoding. Defaults to UTF-8.
        Un-encodable characters are replaced by the encoder rather than raised.
        """
        if self._output_encoding is None or self._output_encoding != encoding:
            self._output_encoding = encoding
            decoded_string = str(self)
            if (
                self._preemptive_declaration is not None
                and self._preemptive_declaration.lower()
                not in ["utf-8", "utf8", "utf_8"]
            ):
                patched_header = sub(
                    RE_POSSIBLE_ENCODING_INDICATION,
                    lambda m: m.string[m.span()[0] : m.span()[1]].replace(
                        m.groups()[0],
                        iana_name(self._output_encoding).replace("_", "-"),  # type: ignore[arg-type]
                    ),
                    decoded_string[:8192],
                    count=1,
                )

                decoded_string = patched_header + decoded_string[8192:]

            self._output_payload = decoded_string.encode(encoding, "replace")

        return self._output_payload  # type: ignore

    @property
    def fingerprint(self) -> str:
        """
        Retrieve the unique SHA256 computed using the transformed (re-encoded) payload. Not the original one.
        """
        return sha256(self.output()).hexdigest()


class CharsetMatches:
    """
    Container of CharsetMatch items, ordered by default from the most probable to the least.
    Acts like a list (iterable) but does not implement all related methods.
    """

    def __init__(self, results: list[CharsetMatch] | None = None):
        self._results: list[CharsetMatch] = sorted(results) if results else []

    def __iter__(self) -> Iterator[CharsetMatch]:
        yield from self._results

    def __getitem__(self, item: int | str) -> CharsetMatch:
        """
        Retrieve a single item either by its position or encoding name (alias may be used here).
        Raise KeyError upon invalid index or encoding not present in results.
        """
        if isinstance(item, int):
            return self._results[item]
        if isinstance(item, str):
            item = iana_name(item, False)
            for result in self._results:
                if item in result.could_be_from_charset:
                    return result
        raise KeyError

    def __len__(self) -> int:
        return len(self._results)

    def __bool__(self) -> bool:
        return len(self._results) > 0

    def append(self, item: CharsetMatch) -> None:
        """
        Insert a single match. Will be inserted accordingly to preserve sort.
        Can be inserted as a submatch.
        """
        if not isinstance(item, CharsetMatch):
            raise ValueError(
                "Cannot append instance '{}' to CharsetMatches".format(
                    str(item.__class__)
                )
            )
        # We should disable the submatch factoring when the input file is too heavy (conserve RAM usage)
        if len(item.raw) < TOO_BIG_SEQUENCE:
            for match in self._results:
                if match.fingerprint == item.fingerprint and match.chaos == item.chaos:
                    match.add_submatch(item)
                    return
        self._results.append(item)
        self._results = sorted(self._results)

    def best(self) -> CharsetMatch | None:
        """
        Simply return the first match. Strict equivalent to matches[0].
        """
        if not self._results:
            return None
        return self._results[0]

    def first(self) -> CharsetMatch | None:
        """
        Redundant method, call the method best(). Kept for BC reasons.
|
||||||
|
"""
|
||||||
|
return self.best()
|
||||||
|
|
||||||
|
|
||||||
|
CoherenceMatch = Tuple[str, float]
|
||||||
|
CoherenceMatches = List[CoherenceMatch]
|
||||||
|
|
||||||
|
|
||||||
|
class CliDetectionResult:
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
path: str,
|
||||||
|
encoding: str | None,
|
||||||
|
encoding_aliases: list[str],
|
||||||
|
alternative_encodings: list[str],
|
||||||
|
language: str,
|
||||||
|
alphabets: list[str],
|
||||||
|
has_sig_or_bom: bool,
|
||||||
|
chaos: float,
|
||||||
|
coherence: float,
|
||||||
|
unicode_path: str | None,
|
||||||
|
is_preferred: bool,
|
||||||
|
):
|
||||||
|
self.path: str = path
|
||||||
|
self.unicode_path: str | None = unicode_path
|
||||||
|
self.encoding: str | None = encoding
|
||||||
|
self.encoding_aliases: list[str] = encoding_aliases
|
||||||
|
self.alternative_encodings: list[str] = alternative_encodings
|
||||||
|
self.language: str = language
|
||||||
|
self.alphabets: list[str] = alphabets
|
||||||
|
self.has_sig_or_bom: bool = has_sig_or_bom
|
||||||
|
self.chaos: float = chaos
|
||||||
|
self.coherence: float = coherence
|
||||||
|
self.is_preferred: bool = is_preferred
|
||||||
|
|
||||||
|
@property
|
||||||
|
def __dict__(self) -> dict[str, Any]: # type: ignore
|
||||||
|
return {
|
||||||
|
"path": self.path,
|
||||||
|
"encoding": self.encoding,
|
||||||
|
"encoding_aliases": self.encoding_aliases,
|
||||||
|
"alternative_encodings": self.alternative_encodings,
|
||||||
|
"language": self.language,
|
||||||
|
"alphabets": self.alphabets,
|
||||||
|
"has_sig_or_bom": self.has_sig_or_bom,
|
||||||
|
"chaos": self.chaos,
|
||||||
|
"coherence": self.coherence,
|
||||||
|
"unicode_path": self.unicode_path,
|
||||||
|
"is_preferred": self.is_preferred,
|
||||||
|
}
|
||||||
|
|
||||||
|
def to_json(self) -> str:
|
||||||
|
return dumps(self.__dict__, ensure_ascii=True, indent=4)
|
||||||
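The `fingerprint` property above is simply a SHA-256 over the re-encoded payload that `output()` produces, not over the original bytes. A minimal stdlib-only sketch of that idea (the sample text and the cp1252 target codec here are illustrative assumptions, not values from the library):

```python
from hashlib import sha256

# Re-encode a UTF-8 payload to cp1252, replacing unencodable characters,
# then hash the transformed bytes -- not the original ones.
payload = "héllo".encode("utf_8")
reencoded = payload.decode("utf_8").encode("cp1252", "replace")

fingerprint = sha256(reencoded).hexdigest()
print(fingerprint)
```

Two byte sequences that decode to the same text therefore share a fingerprint once re-encoded, which is what lets `CharsetMatches.append` fold equivalent results together as submatches.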
0
.venv/Lib/site-packages/charset_normalizer/py.typed
Normal file
414
.venv/Lib/site-packages/charset_normalizer/utils.py
Normal file
@@ -0,0 +1,414 @@
from __future__ import annotations

import importlib
import logging
import unicodedata
from codecs import IncrementalDecoder
from encodings.aliases import aliases
from functools import lru_cache
from re import findall
from typing import Generator

from _multibytecodec import (  # type: ignore[import-not-found,import]
    MultibyteIncrementalDecoder,
)

from .constant import (
    ENCODING_MARKS,
    IANA_SUPPORTED_SIMILAR,
    RE_POSSIBLE_ENCODING_INDICATION,
    UNICODE_RANGES_COMBINED,
    UNICODE_SECONDARY_RANGE_KEYWORD,
    UTF8_MAXIMAL_ALLOCATION,
    COMMON_CJK_CHARACTERS,
)


@lru_cache(maxsize=UTF8_MAXIMAL_ALLOCATION)
def is_accentuated(character: str) -> bool:
    try:
        description: str = unicodedata.name(character)
    except ValueError:  # Defensive: unicode database outdated?
        return False
    return (
        "WITH GRAVE" in description
        or "WITH ACUTE" in description
        or "WITH CEDILLA" in description
        or "WITH DIAERESIS" in description
        or "WITH CIRCUMFLEX" in description
        or "WITH TILDE" in description
        or "WITH MACRON" in description
        or "WITH RING ABOVE" in description
    )


@lru_cache(maxsize=UTF8_MAXIMAL_ALLOCATION)
def remove_accent(character: str) -> str:
    decomposed: str = unicodedata.decomposition(character)
    if not decomposed:
        return character

    codes: list[str] = decomposed.split(" ")

    return chr(int(codes[0], 16))


@lru_cache(maxsize=UTF8_MAXIMAL_ALLOCATION)
def unicode_range(character: str) -> str | None:
    """
    Retrieve the Unicode range official name from a single character.
    """
    character_ord: int = ord(character)

    for range_name, ord_range in UNICODE_RANGES_COMBINED.items():
        if character_ord in ord_range:
            return range_name

    return None


@lru_cache(maxsize=UTF8_MAXIMAL_ALLOCATION)
def is_latin(character: str) -> bool:
    try:
        description: str = unicodedata.name(character)
    except ValueError:  # Defensive: unicode database outdated?
        return False
    return "LATIN" in description


@lru_cache(maxsize=UTF8_MAXIMAL_ALLOCATION)
def is_punctuation(character: str) -> bool:
    character_category: str = unicodedata.category(character)

    if "P" in character_category:
        return True

    character_range: str | None = unicode_range(character)

    if character_range is None:
        return False

    return "Punctuation" in character_range


@lru_cache(maxsize=UTF8_MAXIMAL_ALLOCATION)
def is_symbol(character: str) -> bool:
    character_category: str = unicodedata.category(character)

    if "S" in character_category or "N" in character_category:
        return True

    character_range: str | None = unicode_range(character)

    if character_range is None:
        return False

    return "Forms" in character_range and character_category != "Lo"


@lru_cache(maxsize=UTF8_MAXIMAL_ALLOCATION)
def is_emoticon(character: str) -> bool:
    character_range: str | None = unicode_range(character)

    if character_range is None:
        return False

    return "Emoticons" in character_range or "Pictographs" in character_range


@lru_cache(maxsize=UTF8_MAXIMAL_ALLOCATION)
def is_separator(character: str) -> bool:
    if character.isspace() or character in {"|", "+", "<", ">"}:
        return True

    character_category: str = unicodedata.category(character)

    return "Z" in character_category or character_category in {"Po", "Pd", "Pc"}


@lru_cache(maxsize=UTF8_MAXIMAL_ALLOCATION)
def is_case_variable(character: str) -> bool:
    return character.islower() != character.isupper()


@lru_cache(maxsize=UTF8_MAXIMAL_ALLOCATION)
def is_cjk(character: str) -> bool:
    try:
        character_name = unicodedata.name(character)
    except ValueError:  # Defensive: unicode database outdated?
        return False

    return "CJK" in character_name


@lru_cache(maxsize=UTF8_MAXIMAL_ALLOCATION)
def is_hiragana(character: str) -> bool:
    try:
        character_name = unicodedata.name(character)
    except ValueError:  # Defensive: unicode database outdated?
        return False

    return "HIRAGANA" in character_name


@lru_cache(maxsize=UTF8_MAXIMAL_ALLOCATION)
def is_katakana(character: str) -> bool:
    try:
        character_name = unicodedata.name(character)
    except ValueError:  # Defensive: unicode database outdated?
        return False

    return "KATAKANA" in character_name


@lru_cache(maxsize=UTF8_MAXIMAL_ALLOCATION)
def is_hangul(character: str) -> bool:
    try:
        character_name = unicodedata.name(character)
    except ValueError:  # Defensive: unicode database outdated?
        return False

    return "HANGUL" in character_name


@lru_cache(maxsize=UTF8_MAXIMAL_ALLOCATION)
def is_thai(character: str) -> bool:
    try:
        character_name = unicodedata.name(character)
    except ValueError:  # Defensive: unicode database outdated?
        return False

    return "THAI" in character_name


@lru_cache(maxsize=UTF8_MAXIMAL_ALLOCATION)
def is_arabic(character: str) -> bool:
    try:
        character_name = unicodedata.name(character)
    except ValueError:  # Defensive: unicode database outdated?
        return False

    return "ARABIC" in character_name


@lru_cache(maxsize=UTF8_MAXIMAL_ALLOCATION)
def is_arabic_isolated_form(character: str) -> bool:
    try:
        character_name = unicodedata.name(character)
    except ValueError:  # Defensive: unicode database outdated?
        return False

    return "ARABIC" in character_name and "ISOLATED FORM" in character_name


@lru_cache(maxsize=UTF8_MAXIMAL_ALLOCATION)
def is_cjk_uncommon(character: str) -> bool:
    return character not in COMMON_CJK_CHARACTERS


@lru_cache(maxsize=len(UNICODE_RANGES_COMBINED))
def is_unicode_range_secondary(range_name: str) -> bool:
    return any(keyword in range_name for keyword in UNICODE_SECONDARY_RANGE_KEYWORD)


@lru_cache(maxsize=UTF8_MAXIMAL_ALLOCATION)
def is_unprintable(character: str) -> bool:
    return (
        character.isspace() is False  # includes \n \t \r \v
        and character.isprintable() is False
        and character != "\x1a"  # Why? It's the ASCII substitute character.
        and character != "\ufeff"  # bug discovered in Python,
        # Zero Width No-Break Space located in Arabic Presentation Forms-B, Unicode 1.1 not acknowledged as space.
    )


def any_specified_encoding(sequence: bytes, search_zone: int = 8192) -> str | None:
    """
    Extract, using an ASCII-only decoder, any specified encoding in the first n bytes.
    """
    if not isinstance(sequence, bytes):
        raise TypeError

    seq_len: int = len(sequence)

    results: list[str] = findall(
        RE_POSSIBLE_ENCODING_INDICATION,
        sequence[: min(seq_len, search_zone)].decode("ascii", errors="ignore"),
    )

    if len(results) == 0:
        return None

    for specified_encoding in results:
        specified_encoding = specified_encoding.lower().replace("-", "_")

        encoding_alias: str
        encoding_iana: str

        for encoding_alias, encoding_iana in aliases.items():
            if encoding_alias == specified_encoding:
                return encoding_iana
            if encoding_iana == specified_encoding:
                return encoding_iana

    return None


@lru_cache(maxsize=128)
def is_multi_byte_encoding(name: str) -> bool:
    """
    Verify whether a specific encoding is a multi-byte one based on its IANA name.
    """
    return name in {
        "utf_8",
        "utf_8_sig",
        "utf_16",
        "utf_16_be",
        "utf_16_le",
        "utf_32",
        "utf_32_le",
        "utf_32_be",
        "utf_7",
    } or issubclass(
        importlib.import_module(f"encodings.{name}").IncrementalDecoder,
        MultibyteIncrementalDecoder,
    )


def identify_sig_or_bom(sequence: bytes) -> tuple[str | None, bytes]:
    """
    Identify and extract SIG/BOM in given sequence.
    """

    for iana_encoding in ENCODING_MARKS:
        marks: bytes | list[bytes] = ENCODING_MARKS[iana_encoding]

        if isinstance(marks, bytes):
            marks = [marks]

        for mark in marks:
            if sequence.startswith(mark):
                return iana_encoding, mark

    return None, b""


def should_strip_sig_or_bom(iana_encoding: str) -> bool:
    return iana_encoding not in {"utf_16", "utf_32"}


def iana_name(cp_name: str, strict: bool = True) -> str:
    """Returns the Python normalized encoding name (not the IANA official name)."""
    cp_name = cp_name.lower().replace("-", "_")

    encoding_alias: str
    encoding_iana: str

    for encoding_alias, encoding_iana in aliases.items():
        if cp_name in [encoding_alias, encoding_iana]:
            return encoding_iana

    if strict:
        raise ValueError(f"Unable to retrieve IANA for '{cp_name}'")

    return cp_name


def cp_similarity(iana_name_a: str, iana_name_b: str) -> float:
    if is_multi_byte_encoding(iana_name_a) or is_multi_byte_encoding(iana_name_b):
        return 0.0

    decoder_a = importlib.import_module(f"encodings.{iana_name_a}").IncrementalDecoder
    decoder_b = importlib.import_module(f"encodings.{iana_name_b}").IncrementalDecoder

    id_a: IncrementalDecoder = decoder_a(errors="ignore")
    id_b: IncrementalDecoder = decoder_b(errors="ignore")

    character_match_count: int = 0

    for i in range(255):
        to_be_decoded: bytes = bytes([i])
        if id_a.decode(to_be_decoded) == id_b.decode(to_be_decoded):
            character_match_count += 1

    return character_match_count / 254


def is_cp_similar(iana_name_a: str, iana_name_b: str) -> bool:
    """
    Determine if two code pages are at least 80% similar. The IANA_SUPPORTED_SIMILAR dict was generated using
    the function cp_similarity.
    """
    return (
        iana_name_a in IANA_SUPPORTED_SIMILAR
        and iana_name_b in IANA_SUPPORTED_SIMILAR[iana_name_a]
    )


def set_logging_handler(
    name: str = "charset_normalizer",
    level: int = logging.INFO,
    format_string: str = "%(asctime)s | %(levelname)s | %(message)s",
) -> None:
    logger = logging.getLogger(name)
    logger.setLevel(level)

    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(format_string))
    logger.addHandler(handler)


def cut_sequence_chunks(
    sequences: bytes,
    encoding_iana: str,
    offsets: range,
    chunk_size: int,
    bom_or_sig_available: bool,
    strip_sig_or_bom: bool,
    sig_payload: bytes,
    is_multi_byte_decoder: bool,
    decoded_payload: str | None = None,
) -> Generator[str, None, None]:
    if decoded_payload and is_multi_byte_decoder is False:
        for i in offsets:
            chunk = decoded_payload[i : i + chunk_size]
            if not chunk:
                break
            yield chunk
    else:
        for i in offsets:
            chunk_end = i + chunk_size
            if chunk_end > len(sequences) + 8:
                continue

            cut_sequence = sequences[i : i + chunk_size]

            if bom_or_sig_available and strip_sig_or_bom is False:
                cut_sequence = sig_payload + cut_sequence

            chunk = cut_sequence.decode(
                encoding_iana,
                errors="ignore" if is_multi_byte_decoder else "strict",
            )

            # multi-byte bad cutting detector and adjustment
            # not the cleanest way to perform that fix but clever enough for now.
            if is_multi_byte_decoder and i > 0:
                chunk_partial_size_chk: int = min(chunk_size, 16)

                if (
                    decoded_payload
                    and chunk[:chunk_partial_size_chk] not in decoded_payload
                ):
                    for j in range(i, i - 4, -1):
                        cut_sequence = sequences[j:chunk_end]

                        if bom_or_sig_available and strip_sig_or_bom is False:
                            cut_sequence = sig_payload + cut_sequence

                        chunk = cut_sequence.decode(encoding_iana, errors="ignore")

                        if chunk[:chunk_partial_size_chk] in decoded_payload:
                            break

            yield chunk
8
.venv/Lib/site-packages/charset_normalizer/version.py
Normal file
@@ -0,0 +1,8 @@
"""
Expose version
"""

from __future__ import annotations

__version__ = "3.4.4"
VERSION = __version__.split(".")
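`VERSION` above is the dotted version string split on `"."`; the pieces stay strings, so numeric comparison needs a conversion step. A quick sketch:

```python
__version__ = "3.4.4"
VERSION = __version__.split(".")

# Components are strings ("3", "4", "4"); convert before comparing numerically,
# otherwise "10" < "9" under string comparison.
major, minor, patch = (int(part) for part in VERSION)
print((major, minor, patch))
```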
1
.venv/Lib/site-packages/django-5.2.8.dist-info/INSTALLER
Normal file
@@ -0,0 +1 @@
pip
98
.venv/Lib/site-packages/django-5.2.8.dist-info/METADATA
Normal file
@@ -0,0 +1,98 @@
Metadata-Version: 2.4
Name: Django
Version: 5.2.8
Summary: A high-level Python web framework that encourages rapid development and clean, pragmatic design.
Author-email: Django Software Foundation <foundation@djangoproject.com>
License-Expression: BSD-3-Clause
Project-URL: Homepage, https://www.djangoproject.com/
Project-URL: Documentation, https://docs.djangoproject.com/
Project-URL: Release notes, https://docs.djangoproject.com/en/stable/releases/
Project-URL: Funding, https://www.djangoproject.com/fundraising/
Project-URL: Source, https://github.com/django/django
Project-URL: Tracker, https://code.djangoproject.com/
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Web Environment
Classifier: Framework :: Django
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content
Classifier: Topic :: Internet :: WWW/HTTP :: WSGI
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/x-rst
License-File: LICENSE
License-File: LICENSE.python
Requires-Dist: asgiref>=3.8.1
Requires-Dist: sqlparse>=0.3.1
Requires-Dist: tzdata; sys_platform == "win32"
Provides-Extra: argon2
Requires-Dist: argon2-cffi>=19.1.0; extra == "argon2"
Provides-Extra: bcrypt
Requires-Dist: bcrypt; extra == "bcrypt"
Dynamic: license-file

======
Django
======

Django is a high-level Python web framework that encourages rapid development
and clean, pragmatic design. Thanks for checking it out.

All documentation is in the "``docs``" directory and online at
https://docs.djangoproject.com/en/stable/. If you're just getting started,
here's how we recommend you read the docs:

* First, read ``docs/intro/install.txt`` for instructions on installing Django.

* Next, work through the tutorials in order (``docs/intro/tutorial01.txt``,
  ``docs/intro/tutorial02.txt``, etc.).

* If you want to set up an actual deployment server, read
  ``docs/howto/deployment/index.txt`` for instructions.

* You'll probably want to read through the topical guides (in ``docs/topics``)
  next; from there you can jump to the HOWTOs (in ``docs/howto``) for specific
  problems, and check out the reference (``docs/ref``) for gory details.

* See ``docs/README`` for instructions on building an HTML version of the docs.

Docs are updated rigorously. If you find any problems in the docs, or think
they should be clarified in any way, please take 30 seconds to fill out a
ticket here: https://code.djangoproject.com/newticket

To get more help:

* Join the ``#django`` channel on ``irc.libera.chat``. Lots of helpful people
  hang out there. `Webchat is available <https://web.libera.chat/#django>`_.

* Join the `Django Discord community <https://chat.djangoproject.com>`_.

* Join the community on the `Django Forum <https://forum.djangoproject.com/>`_.

To contribute to Django:

* Check out https://docs.djangoproject.com/en/dev/internals/contributing/ for
  information about getting involved.

To run Django's test suite:

* Follow the instructions in the "Unit tests" section of
  ``docs/internals/contributing/writing-code/unit-tests.txt``, published online at
  https://docs.djangoproject.com/en/dev/internals/contributing/writing-code/unit-tests/#running-the-unit-tests

Supporting the Development of Django
====================================

Django's development depends on your contributions.

If you depend on Django, remember to support the Django Software Foundation: https://www.djangoproject.com/fundraising/
4553
.venv/Lib/site-packages/django-5.2.8.dist-info/RECORD
Normal file
File diff suppressed because it is too large
5
.venv/Lib/site-packages/django-5.2.8.dist-info/WHEEL
Normal file
@@ -0,0 +1,5 @@
Wheel-Version: 1.0
Generator: setuptools (80.9.0)
Root-Is-Purelib: true
Tag: py3-none-any

@@ -0,0 +1,2 @@
[console_scripts]
django-admin = django.core.management:execute_from_command_line
@@ -0,0 +1,27 @@
Copyright (c) Django Software Foundation and individual contributors.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

    1. Redistributions of source code must retain the above copyright notice,
       this list of conditions and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in the
       documentation and/or other materials provided with the distribution.

    3. Neither the name of Django nor the names of its contributors may be used
       to endorse or promote products derived from this software without
       specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,288 @@
Django is licensed under the three-clause BSD license; see the file
LICENSE for details.

Django includes code from the Python standard library, which is licensed under
the Python license, a permissive open source license. The copyright and license
is included below for compliance with Python's terms.

----------------------------------------------------------------------

Copyright (c) 2001-present Python Software Foundation; All Rights Reserved

A. HISTORY OF THE SOFTWARE
==========================

Python was created in the early 1990s by Guido van Rossum at Stichting
Mathematisch Centrum (CWI, see https://www.cwi.nl) in the Netherlands
as a successor of a language called ABC. Guido remains Python's
principal author, although it includes many contributions from others.

In 1995, Guido continued his work on Python at the Corporation for
National Research Initiatives (CNRI, see https://www.cnri.reston.va.us)
in Reston, Virginia where he released several versions of the
software.

In May 2000, Guido and the Python core development team moved to
BeOpen.com to form the BeOpen PythonLabs team. In October of the same
year, the PythonLabs team moved to Digital Creations, which became
Zope Corporation. In 2001, the Python Software Foundation (PSF, see
https://www.python.org/psf/) was formed, a non-profit organization
created specifically to own Python-related Intellectual Property.
Zope Corporation was a sponsoring member of the PSF.

All Python releases are Open Source (see https://opensource.org for
the Open Source Definition). Historically, most, but not all, Python
releases have also been GPL-compatible; the table below summarizes
the various releases.

    Release         Derived     Year        Owner       GPL-
                    from                                compatible? (1)

    0.9.0 thru 1.2              1991-1995   CWI         yes
    1.3 thru 1.5.2  1.2         1995-1999   CNRI        yes
    1.6             1.5.2       2000        CNRI        no
    2.0             1.6         2000        BeOpen.com  no
    1.6.1           1.6         2001        CNRI        yes (2)
    2.1             2.0+1.6.1   2001        PSF         no
    2.0.1           2.0+1.6.1   2001        PSF         yes
    2.1.1           2.1+2.0.1   2001        PSF         yes
    2.1.2           2.1.1       2002        PSF         yes
    2.1.3           2.1.2       2002        PSF         yes
    2.2 and above   2.1.1       2001-now    PSF         yes

Footnotes:

(1) GPL-compatible doesn't mean that we're distributing Python under
    the GPL. All Python licenses, unlike the GPL, let you distribute
    a modified version without making your changes open source. The
    GPL-compatible licenses make it possible to combine Python with
    other software that is released under the GPL; the others don't.

(2) According to Richard Stallman, 1.6.1 is not GPL-compatible,
    because its license has a choice of law clause. According to
    CNRI, however, Stallman's lawyer has told CNRI's lawyer that 1.6.1
    is "not incompatible" with the GPL.

Thanks to the many outside volunteers who have worked under Guido's
direction to make these releases possible.


B. TERMS AND CONDITIONS FOR ACCESSING OR OTHERWISE USING PYTHON
===============================================================

Python software and documentation are licensed under the
Python Software Foundation License Version 2.

Starting with Python 3.8.6, examples, recipes, and other code in
the documentation are dual licensed under the PSF License Version 2
and the Zero-Clause BSD license.

Some software incorporated into Python is under different licenses.
The licenses are listed with code falling under that license.


PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2
--------------------------------------------

1. This LICENSE AGREEMENT is between the Python Software Foundation
("PSF"), and the Individual or Organization ("Licensee") accessing and
otherwise using this software ("Python") in source or binary form and
its associated documentation.

2. Subject to the terms and conditions of this License Agreement, PSF hereby
grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce,
analyze, test, perform and/or display publicly, prepare derivative works,
distribute, and otherwise use Python alone or in any derivative version,
provided, however, that PSF's License Agreement and PSF's notice of copyright,
i.e., "Copyright (c) 2001 Python Software Foundation; All Rights Reserved"
are retained in Python alone or in any derivative version prepared by Licensee.

3. In the event Licensee prepares a derivative work that is based on
or incorporates Python or any part thereof, and wants to make
the derivative work available to others as provided herein, then
Licensee hereby agrees to include in any such work a brief summary of
the changes made to Python.

4. PSF is making Python available to Licensee on an "AS IS"
basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND
DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON WILL NOT
INFRINGE ANY THIRD PARTY RIGHTS.

5. PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON
FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS
A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON,
OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.

6. This License Agreement will automatically terminate upon a material
breach of its terms and conditions.

7. Nothing in this License Agreement shall be deemed to create any
|
||||||
|
relationship of agency, partnership, or joint venture between PSF and
|
||||||
|
Licensee. This License Agreement does not grant permission to use PSF
|
||||||
|
trademarks or trade name in a trademark sense to endorse or promote
|
||||||
|
products or services of Licensee, or any third party.
|
||||||
|
|
||||||
|
8. By copying, installing or otherwise using Python, Licensee
|
||||||
|
agrees to be bound by the terms and conditions of this License
|
||||||
|
Agreement.
|
||||||
|
|
||||||
|
|
||||||
|
BEOPEN.COM LICENSE AGREEMENT FOR PYTHON 2.0
|
||||||
|
-------------------------------------------
|
||||||
|
|
||||||
|
BEOPEN PYTHON OPEN SOURCE LICENSE AGREEMENT VERSION 1
|
||||||
|
|
||||||
|
1. This LICENSE AGREEMENT is between BeOpen.com ("BeOpen"), having an
|
||||||
|
office at 160 Saratoga Avenue, Santa Clara, CA 95051, and the
|
||||||
|
Individual or Organization ("Licensee") accessing and otherwise using
|
||||||
|
this software in source or binary form and its associated
|
||||||
|
documentation ("the Software").
|
||||||
|
|
||||||
|
2. Subject to the terms and conditions of this BeOpen Python License
|
||||||
|
Agreement, BeOpen hereby grants Licensee a non-exclusive,
|
||||||
|
royalty-free, world-wide license to reproduce, analyze, test, perform
|
||||||
|
and/or display publicly, prepare derivative works, distribute, and
|
||||||
|
otherwise use the Software alone or in any derivative version,
|
||||||
|
provided, however, that the BeOpen Python License is retained in the
|
||||||
|
Software, alone or in any derivative version prepared by Licensee.
|
||||||
|
|
||||||
|
3. BeOpen is making the Software available to Licensee on an "AS IS"
|
||||||
|
basis. BEOPEN MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
|
||||||
|
IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, BEOPEN MAKES NO AND
|
||||||
|
DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
|
||||||
|
FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE WILL NOT
|
||||||
|
INFRINGE ANY THIRD PARTY RIGHTS.
|
||||||
|
|
||||||
|
4. BEOPEN SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF THE
|
||||||
|
SOFTWARE FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS
|
||||||
|
AS A RESULT OF USING, MODIFYING OR DISTRIBUTING THE SOFTWARE, OR ANY
|
||||||
|
DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
|
||||||
|
|
||||||
|
5. This License Agreement will automatically terminate upon a material
|
||||||
|
breach of its terms and conditions.
|
||||||
|
|
||||||
|
6. This License Agreement shall be governed by and interpreted in all
|
||||||
|
respects by the law of the State of California, excluding conflict of
|
||||||
|
law provisions. Nothing in this License Agreement shall be deemed to
|
||||||
|
create any relationship of agency, partnership, or joint venture
|
||||||
|
between BeOpen and Licensee. This License Agreement does not grant
|
||||||
|
permission to use BeOpen trademarks or trade names in a trademark
|
||||||
|
sense to endorse or promote products or services of Licensee, or any
|
||||||
|
third party. As an exception, the "BeOpen Python" logos available at
|
||||||
|
http://www.pythonlabs.com/logos.html may be used according to the
|
||||||
|
permissions granted on that web page.
|
||||||
|
|
||||||
|
7. By copying, installing or otherwise using the software, Licensee
|
||||||
|
agrees to be bound by the terms and conditions of this License
|
||||||
|
Agreement.
|
||||||
|
|
||||||
|
|
||||||
|
CNRI LICENSE AGREEMENT FOR PYTHON 1.6.1
|
||||||
|
---------------------------------------
|
||||||
|
|
||||||
|
1. This LICENSE AGREEMENT is between the Corporation for National
|
||||||
|
Research Initiatives, having an office at 1895 Preston White Drive,
|
||||||
|
Reston, VA 20191 ("CNRI"), and the Individual or Organization
|
||||||
|
("Licensee") accessing and otherwise using Python 1.6.1 software in
|
||||||
|
source or binary form and its associated documentation.
|
||||||
|
|
||||||
|
2. Subject to the terms and conditions of this License Agreement, CNRI
|
||||||
|
hereby grants Licensee a nonexclusive, royalty-free, world-wide
|
||||||
|
license to reproduce, analyze, test, perform and/or display publicly,
|
||||||
|
prepare derivative works, distribute, and otherwise use Python 1.6.1
|
||||||
|
alone or in any derivative version, provided, however, that CNRI's
|
||||||
|
License Agreement and CNRI's notice of copyright, i.e., "Copyright (c)
|
||||||
|
1995-2001 Corporation for National Research Initiatives; All Rights
|
||||||
|
Reserved" are retained in Python 1.6.1 alone or in any derivative
|
||||||
|
version prepared by Licensee. Alternately, in lieu of CNRI's License
|
||||||
|
Agreement, Licensee may substitute the following text (omitting the
|
||||||
|
quotes): "Python 1.6.1 is made available subject to the terms and
|
||||||
|
conditions in CNRI's License Agreement. This Agreement together with
|
||||||
|
Python 1.6.1 may be located on the internet using the following
|
||||||
|
unique, persistent identifier (known as a handle): 1895.22/1013. This
|
||||||
|
Agreement may also be obtained from a proxy server on the internet
|
||||||
|
using the following URL: http://hdl.handle.net/1895.22/1013".
|
||||||
|
|
||||||
|
3. In the event Licensee prepares a derivative work that is based on
|
||||||
|
or incorporates Python 1.6.1 or any part thereof, and wants to make
|
||||||
|
the derivative work available to others as provided herein, then
|
||||||
|
Licensee hereby agrees to include in any such work a brief summary of
|
||||||
|
the changes made to Python 1.6.1.
|
||||||
|
|
||||||
|
4. CNRI is making Python 1.6.1 available to Licensee on an "AS IS"
|
||||||
|
basis. CNRI MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
|
||||||
|
IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, CNRI MAKES NO AND
|
||||||
|
DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
|
||||||
|
FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON 1.6.1 WILL NOT
|
||||||
|
INFRINGE ANY THIRD PARTY RIGHTS.
|
||||||
|
|
||||||
|
5. CNRI SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON
|
||||||
|
1.6.1 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS
|
||||||
|
A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 1.6.1,
|
||||||
|
OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
|
||||||
|
|
||||||
|
6. This License Agreement will automatically terminate upon a material
|
||||||
|
breach of its terms and conditions.
|
||||||
|
|
||||||
|
7. This License Agreement shall be governed by the federal
|
||||||
|
intellectual property law of the United States, including without
|
||||||
|
limitation the federal copyright law, and, to the extent such
|
||||||
|
U.S. federal law does not apply, by the law of the Commonwealth of
|
||||||
|
Virginia, excluding Virginia's conflict of law provisions.
|
||||||
|
Notwithstanding the foregoing, with regard to derivative works based
|
||||||
|
on Python 1.6.1 that incorporate non-separable material that was
|
||||||
|
previously distributed under the GNU General Public License (GPL), the
|
||||||
|
law of the Commonwealth of Virginia shall govern this License
|
||||||
|
Agreement only as to issues arising under or with respect to
|
||||||
|
Paragraphs 4, 5, and 7 of this License Agreement. Nothing in this
|
||||||
|
License Agreement shall be deemed to create any relationship of
|
||||||
|
agency, partnership, or joint venture between CNRI and Licensee. This
|
||||||
|
License Agreement does not grant permission to use CNRI trademarks or
|
||||||
|
trade name in a trademark sense to endorse or promote products or
|
||||||
|
services of Licensee, or any third party.
|
||||||
|
|
||||||
|
8. By clicking on the "ACCEPT" button where indicated, or by copying,
|
||||||
|
installing or otherwise using Python 1.6.1, Licensee agrees to be
|
||||||
|
bound by the terms and conditions of this License Agreement.
|
||||||
|
|
||||||
|
ACCEPT
|
||||||
|
|
||||||
|
|
||||||
|
CWI LICENSE AGREEMENT FOR PYTHON 0.9.0 THROUGH 1.2
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
Copyright (c) 1991 - 1995, Stichting Mathematisch Centrum Amsterdam,
|
||||||
|
The Netherlands. All rights reserved.
|
||||||
|
|
||||||
|
Permission to use, copy, modify, and distribute this software and its
|
||||||
|
documentation for any purpose and without fee is hereby granted,
|
||||||
|
provided that the above copyright notice appear in all copies and that
|
||||||
|
both that copyright notice and this permission notice appear in
|
||||||
|
supporting documentation, and that the name of Stichting Mathematisch
|
||||||
|
Centrum or CWI not be used in advertising or publicity pertaining to
|
||||||
|
distribution of the software without specific, written prior
|
||||||
|
permission.
|
||||||
|
|
||||||
|
STICHTING MATHEMATISCH CENTRUM DISCLAIMS ALL WARRANTIES WITH REGARD TO
|
||||||
|
THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
|
||||||
|
FITNESS, IN NO EVENT SHALL STICHTING MATHEMATISCH CENTRUM BE LIABLE
|
||||||
|
FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
|
||||||
|
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
|
||||||
|
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
|
||||||
|
OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
|
||||||
|
|
||||||
|
ZERO-CLAUSE BSD LICENSE FOR CODE IN THE PYTHON DOCUMENTATION
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
|
Permission to use, copy, modify, and/or distribute this software for any
|
||||||
|
purpose with or without fee is hereby granted.
|
||||||
|
|
||||||
|
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH
|
||||||
|
REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY
|
||||||
|
AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT,
|
||||||
|
INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
|
||||||
|
LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
|
||||||
|
OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
|
||||||
|
PERFORMANCE OF THIS SOFTWARE.
|
||||||
@@ -0,0 +1 @@
django
24
.venv/Lib/site-packages/django/__init__.py
Normal file
@@ -0,0 +1,24 @@
from django.utils.version import get_version

VERSION = (5, 2, 8, "final", 0)

__version__ = get_version(VERSION)


def setup(set_prefix=True):
    """
    Configure the settings (this happens as a side effect of accessing the
    first setting), configure logging and populate the app registry.
    Set the thread-local urlresolvers script prefix if `set_prefix` is True.
    """
    from django.apps import apps
    from django.conf import settings
    from django.urls import set_script_prefix
    from django.utils.log import configure_logging

    configure_logging(settings.LOGGING_CONFIG, settings.LOGGING)
    if set_prefix:
        set_script_prefix(
            "/" if settings.FORCE_SCRIPT_NAME is None else settings.FORCE_SCRIPT_NAME
        )
    apps.populate(settings.INSTALLED_APPS)
10
.venv/Lib/site-packages/django/__main__.py
Normal file
@@ -0,0 +1,10 @@
"""
Invokes django-admin when the django module is run as a script.

Example: python -m django check
"""

from django.core import management

if __name__ == "__main__":
    management.execute_from_command_line()
Binary file not shown.
Binary file not shown.
Binary file not shown.
4
.venv/Lib/site-packages/django/apps/__init__.py
Normal file
@@ -0,0 +1,4 @@
from .config import AppConfig
from .registry import apps

__all__ = ["AppConfig", "apps"]
Binary file not shown.
Binary file not shown.
Binary file not shown.
274
.venv/Lib/site-packages/django/apps/config.py
Normal file
@@ -0,0 +1,274 @@
import inspect
import os
from importlib import import_module

from django.core.exceptions import ImproperlyConfigured
from django.utils.functional import cached_property
from django.utils.module_loading import import_string, module_has_submodule

APPS_MODULE_NAME = "apps"
MODELS_MODULE_NAME = "models"


class AppConfig:
    """Class representing a Django application and its configuration."""

    def __init__(self, app_name, app_module):
        # Full Python path to the application e.g. 'django.contrib.admin'.
        self.name = app_name

        # Root module for the application e.g. <module 'django.contrib.admin'
        # from 'django/contrib/admin/__init__.py'>.
        self.module = app_module

        # Reference to the Apps registry that holds this AppConfig. Set by the
        # registry when it registers the AppConfig instance.
        self.apps = None

        # The following attributes could be defined at the class level in a
        # subclass, hence the test-and-set pattern.

        # Last component of the Python path to the application e.g. 'admin'.
        # This value must be unique across a Django project.
        if not hasattr(self, "label"):
            self.label = app_name.rpartition(".")[2]
        if not self.label.isidentifier():
            raise ImproperlyConfigured(
                "The app label '%s' is not a valid Python identifier." % self.label
            )

        # Human-readable name for the application e.g. "Admin".
        if not hasattr(self, "verbose_name"):
            self.verbose_name = self.label.title()

        # Filesystem path to the application directory e.g.
        # '/path/to/django/contrib/admin'.
        if not hasattr(self, "path"):
            self.path = self._path_from_module(app_module)

        # Module containing models e.g. <module 'django.contrib.admin.models'
        # from 'django/contrib/admin/models.py'>. Set by import_models().
        # None if the application doesn't have a models module.
        self.models_module = None

        # Mapping of lowercase model names to model classes. Initially set to
        # None to prevent accidental access before import_models() runs.
        self.models = None

    def __repr__(self):
        return "<%s: %s>" % (self.__class__.__name__, self.label)

    @cached_property
    def default_auto_field(self):
        from django.conf import settings

        return settings.DEFAULT_AUTO_FIELD

    @property
    def _is_default_auto_field_overridden(self):
        return self.__class__.default_auto_field is not AppConfig.default_auto_field

    def _path_from_module(self, module):
        """Attempt to determine app's filesystem path from its module."""
        # See #21874 for extended discussion of the behavior of this method in
        # various cases.
        # Convert to list because __path__ may not support indexing.
        paths = list(getattr(module, "__path__", []))
        if len(paths) != 1:
            filename = getattr(module, "__file__", None)
            if filename is not None:
                paths = [os.path.dirname(filename)]
            else:
                # For unknown reasons, sometimes the list returned by __path__
                # contains duplicates that must be removed (#25246).
                paths = list(set(paths))
        if len(paths) > 1:
            raise ImproperlyConfigured(
                "The app module %r has multiple filesystem locations (%r); "
                "you must configure this app with an AppConfig subclass "
                "with a 'path' class attribute." % (module, paths)
            )
        elif not paths:
            raise ImproperlyConfigured(
                "The app module %r has no filesystem location, "
                "you must configure this app with an AppConfig subclass "
                "with a 'path' class attribute." % module
            )
        return paths[0]

    @classmethod
    def create(cls, entry):
        """
        Factory that creates an app config from an entry in INSTALLED_APPS.
        """
        # create() eventually returns app_config_class(app_name, app_module).
        app_config_class = None
        app_name = None
        app_module = None

        # If import_module succeeds, entry points to the app module.
        try:
            app_module = import_module(entry)
        except Exception:
            pass
        else:
            # If app_module has an apps submodule that defines a single
            # AppConfig subclass, use it automatically.
            # To prevent this, an AppConfig subclass can declare a class
            # variable default = False.
            # If the apps module defines more than one AppConfig subclass,
            # the default one can declare default = True.
            if module_has_submodule(app_module, APPS_MODULE_NAME):
                mod_path = "%s.%s" % (entry, APPS_MODULE_NAME)
                mod = import_module(mod_path)
                # Check if there's exactly one AppConfig candidate,
                # excluding those that explicitly define default = False.
                app_configs = [
                    (name, candidate)
                    for name, candidate in inspect.getmembers(mod, inspect.isclass)
                    if (
                        issubclass(candidate, cls)
                        and candidate is not cls
                        and getattr(candidate, "default", True)
                    )
                ]
                if len(app_configs) == 1:
                    app_config_class = app_configs[0][1]
                else:
                    # Check if there's exactly one AppConfig subclass,
                    # among those that explicitly define default = True.
                    app_configs = [
                        (name, candidate)
                        for name, candidate in app_configs
                        if getattr(candidate, "default", False)
                    ]
                    if len(app_configs) > 1:
                        candidates = [repr(name) for name, _ in app_configs]
                        raise RuntimeError(
                            "%r declares more than one default AppConfig: "
                            "%s." % (mod_path, ", ".join(candidates))
                        )
                    elif len(app_configs) == 1:
                        app_config_class = app_configs[0][1]

            # Use the default app config class if we didn't find anything.
            if app_config_class is None:
                app_config_class = cls
                app_name = entry

        # If import_string succeeds, entry is an app config class.
        if app_config_class is None:
            try:
                app_config_class = import_string(entry)
            except Exception:
                pass
        # If both import_module and import_string failed, it means that entry
        # doesn't have a valid value.
        if app_module is None and app_config_class is None:
            # If the last component of entry starts with an uppercase letter,
            # then it was likely intended to be an app config class; if not,
            # an app module. Provide a nice error message in both cases.
            mod_path, _, cls_name = entry.rpartition(".")
            if mod_path and cls_name[0].isupper():
                # We could simply re-trigger the string import exception, but
                # we're going the extra mile and providing a better error
                # message for typos in INSTALLED_APPS.
                # This may raise ImportError, which is the best exception
                # possible if the module at mod_path cannot be imported.
                mod = import_module(mod_path)
                candidates = [
                    repr(name)
                    for name, candidate in inspect.getmembers(mod, inspect.isclass)
                    if issubclass(candidate, cls) and candidate is not cls
                ]
                msg = "Module '%s' does not contain a '%s' class." % (
                    mod_path,
                    cls_name,
                )
                if candidates:
                    msg += " Choices are: %s." % ", ".join(candidates)
                raise ImportError(msg)
            else:
                # Re-trigger the module import exception.
                import_module(entry)

        # Check for obvious errors. (This check prevents duck typing, but
        # it could be removed if it became a problem in practice.)
        if not issubclass(app_config_class, AppConfig):
            raise ImproperlyConfigured("'%s' isn't a subclass of AppConfig." % entry)

        # Obtain app name here rather than in AppClass.__init__ to keep
        # all error checking for entries in INSTALLED_APPS in one place.
        if app_name is None:
            try:
                app_name = app_config_class.name
            except AttributeError:
                raise ImproperlyConfigured("'%s' must supply a name attribute." % entry)

        # Ensure app_name points to a valid module.
        try:
            app_module = import_module(app_name)
        except ImportError:
            raise ImproperlyConfigured(
                "Cannot import '%s'. Check that '%s.%s.name' is correct."
                % (
                    app_name,
                    app_config_class.__module__,
                    app_config_class.__qualname__,
                )
            )

        # Entry is a path to an app config class.
        return app_config_class(app_name, app_module)

    def get_model(self, model_name, require_ready=True):
        """
        Return the model with the given case-insensitive model_name.

        Raise LookupError if no model exists with this name.
        """
        if require_ready:
            self.apps.check_models_ready()
        else:
            self.apps.check_apps_ready()
        try:
            return self.models[model_name.lower()]
        except KeyError:
            raise LookupError(
                "App '%s' doesn't have a '%s' model." % (self.label, model_name)
            )

    def get_models(self, include_auto_created=False, include_swapped=False):
        """
        Return an iterable of models.

        By default, the following models aren't included:

        - auto-created models for many-to-many relations without
          an explicit intermediate table,
        - models that have been swapped out.

        Set the corresponding keyword argument to True to include such models.
        Keyword arguments aren't documented; they're a private API.
        """
        self.apps.check_models_ready()
        for model in self.models.values():
            if model._meta.auto_created and not include_auto_created:
                continue
            if model._meta.swapped and not include_swapped:
                continue
            yield model

    def import_models(self):
        # Dictionary of models for this app, primarily maintained in the
        # 'all_models' attribute of the Apps this AppConfig is attached to.
        self.models = self.apps.all_models[self.label]

        if module_has_submodule(self.module, MODELS_MODULE_NAME):
            models_module_name = "%s.%s" % (self.name, MODELS_MODULE_NAME)
            self.models_module = import_module(models_module_name)

    def ready(self):
        """
        Override this method in subclasses to run code when Django starts.
        """
437
.venv/Lib/site-packages/django/apps/registry.py
Normal file
@@ -0,0 +1,437 @@
import functools
import sys
import threading
import warnings
from collections import Counter, defaultdict
from functools import partial

from django.core.exceptions import AppRegistryNotReady, ImproperlyConfigured

from .config import AppConfig


class Apps:
    """
    A registry that stores the configuration of installed applications.

    It also keeps track of models, e.g. to provide reverse relations.
    """

    def __init__(self, installed_apps=()):
        # installed_apps is set to None when creating the main registry
        # because it cannot be populated at that point. Other registries must
        # provide a list of installed apps and are populated immediately.
        if installed_apps is None and hasattr(sys.modules[__name__], "apps"):
            raise RuntimeError("You must supply an installed_apps argument.")

        # Mapping of app labels => model names => model classes. Every time a
        # model is imported, ModelBase.__new__ calls apps.register_model which
        # creates an entry in all_models. All imported models are registered,
        # regardless of whether they're defined in an installed application
        # and whether the registry has been populated. Since it isn't possible
        # to reimport a module safely (it could reexecute initialization code)
        # all_models is never overridden or reset.
        self.all_models = defaultdict(dict)

        # Mapping of labels to AppConfig instances for installed apps.
        self.app_configs = {}

        # Stack of app_configs. Used to store the current state in
        # set_available_apps and set_installed_apps.
        self.stored_app_configs = []

        # Whether the registry is populated.
        self.apps_ready = self.models_ready = self.ready = False
        # For the autoreloader.
        self.ready_event = threading.Event()

        # Lock for thread-safe population.
        self._lock = threading.RLock()
        self.loading = False

        # Maps ("app_label", "modelname") tuples to lists of functions to be
        # called when the corresponding model is ready. Used by this class's
        # `lazy_model_operation()` and `do_pending_operations()` methods.
        self._pending_operations = defaultdict(list)

        # Populate apps and models, unless it's the main registry.
        if installed_apps is not None:
            self.populate(installed_apps)

    def populate(self, installed_apps=None):
        """
        Load application configurations and models.

        Import each application module and then each model module.

        It is thread-safe and idempotent, but not reentrant.
        """
        if self.ready:
            return

        # populate() might be called by two threads in parallel on servers
        # that create threads before initializing the WSGI callable.
        with self._lock:
            if self.ready:
                return

            # An RLock prevents other threads from entering this section. The
            # compare and set operation below is atomic.
            if self.loading:
                # Prevent reentrant calls to avoid running AppConfig.ready()
                # methods twice.
                raise RuntimeError("populate() isn't reentrant")
            self.loading = True

            # Phase 1: initialize app configs and import app modules.
            for entry in installed_apps:
                if isinstance(entry, AppConfig):
                    app_config = entry
                else:
                    app_config = AppConfig.create(entry)
                if app_config.label in self.app_configs:
                    raise ImproperlyConfigured(
                        "Application labels aren't unique, "
                        "duplicates: %s" % app_config.label
                    )

                self.app_configs[app_config.label] = app_config
                app_config.apps = self

            # Check for duplicate app names.
            counts = Counter(
                app_config.name for app_config in self.app_configs.values()
            )
            duplicates = [name for name, count in counts.most_common() if count > 1]
            if duplicates:
                raise ImproperlyConfigured(
                    "Application names aren't unique, "
                    "duplicates: %s" % ", ".join(duplicates)
                )

            self.apps_ready = True

            # Phase 2: import models modules.
            for app_config in self.app_configs.values():
                app_config.import_models()

            self.clear_cache()

            self.models_ready = True

            # Phase 3: run ready() methods of app configs.
            for app_config in self.get_app_configs():
                app_config.ready()

            self.ready = True
            self.ready_event.set()

    def check_apps_ready(self):
        """Raise an exception if all apps haven't been imported yet."""
        if not self.apps_ready:
            from django.conf import settings

            # If "not ready" is due to unconfigured settings, accessing
            # INSTALLED_APPS raises a more helpful ImproperlyConfigured
            # exception.
            settings.INSTALLED_APPS
            raise AppRegistryNotReady("Apps aren't loaded yet.")

    def check_models_ready(self):
        """Raise an exception if all models haven't been imported yet."""
        if not self.models_ready:
            raise AppRegistryNotReady("Models aren't loaded yet.")

    def get_app_configs(self):
        """Import applications and return an iterable of app configs."""
        self.check_apps_ready()
        return self.app_configs.values()

    def get_app_config(self, app_label):
        """
        Import applications and returns an app config for the given label.

        Raise LookupError if no application exists with this label.
        """
        self.check_apps_ready()
        try:
            return self.app_configs[app_label]
        except KeyError:
            message = "No installed app with label '%s'." % app_label
            for app_config in self.get_app_configs():
                if app_config.name == app_label:
                    message += " Did you mean '%s'?" % app_config.label
                    break
            raise LookupError(message)

    # This method is performance-critical at least for Django's test suite.
    @functools.cache
    def get_models(self, include_auto_created=False, include_swapped=False):
        """
        Return a list of all installed models.

        By default, the following models aren't included:

        - auto-created models for many-to-many relations without
          an explicit intermediate table,
        - models that have been swapped out.

        Set the corresponding keyword argument to True to include such models.
        """
        self.check_models_ready()

        result = []
        for app_config in self.app_configs.values():
            result.extend(app_config.get_models(include_auto_created, include_swapped))
|
return result
|
||||||
|
|
||||||
|
def get_model(self, app_label, model_name=None, require_ready=True):
|
||||||
|
"""
|
||||||
|
Return the model matching the given app_label and model_name.
|
||||||
|
|
||||||
|
As a shortcut, app_label may be in the form <app_label>.<model_name>.
|
||||||
|
|
||||||
|
model_name is case-insensitive.
|
||||||
|
|
||||||
|
Raise LookupError if no application exists with this label, or no
|
||||||
|
model exists with this name in the application. Raise ValueError if
|
||||||
|
called with a single argument that doesn't contain exactly one dot.
|
||||||
|
"""
|
||||||
|
if require_ready:
|
||||||
|
self.check_models_ready()
|
||||||
|
else:
|
||||||
|
self.check_apps_ready()
|
||||||
|
|
||||||
|
if model_name is None:
|
||||||
|
app_label, model_name = app_label.split(".")
|
||||||
|
|
||||||
|
app_config = self.get_app_config(app_label)
|
||||||
|
|
||||||
|
if not require_ready and app_config.models is None:
|
||||||
|
app_config.import_models()
|
||||||
|
|
||||||
|
return app_config.get_model(model_name, require_ready=require_ready)
|
||||||
|
|
||||||
|
def register_model(self, app_label, model):
|
||||||
|
# Since this method is called when models are imported, it cannot
|
||||||
|
# perform imports because of the risk of import loops. It mustn't
|
||||||
|
# call get_app_config().
|
||||||
|
model_name = model._meta.model_name
|
||||||
|
app_models = self.all_models[app_label]
|
||||||
|
if model_name in app_models:
|
||||||
|
if (
|
||||||
|
model.__name__ == app_models[model_name].__name__
|
||||||
|
and model.__module__ == app_models[model_name].__module__
|
||||||
|
):
|
||||||
|
warnings.warn(
|
||||||
|
"Model '%s.%s' was already registered. Reloading models is not "
|
||||||
|
"advised as it can lead to inconsistencies, most notably with "
|
||||||
|
"related models." % (app_label, model_name),
|
||||||
|
RuntimeWarning,
|
||||||
|
stacklevel=2,
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
raise RuntimeError(
|
||||||
|
"Conflicting '%s' models in application '%s': %s and %s."
|
||||||
|
% (model_name, app_label, app_models[model_name], model)
|
||||||
|
)
|
||||||
|
app_models[model_name] = model
|
||||||
|
self.do_pending_operations(model)
|
||||||
|
self.clear_cache()
|
||||||
|
|
||||||
|
def is_installed(self, app_name):
|
||||||
|
"""
|
||||||
|
Check whether an application with this name exists in the registry.
|
||||||
|
|
||||||
|
app_name is the full name of the app e.g. 'django.contrib.admin'.
|
||||||
|
"""
|
||||||
|
self.check_apps_ready()
|
||||||
|
return any(ac.name == app_name for ac in self.app_configs.values())
|
||||||
|
|
||||||
|
def get_containing_app_config(self, object_name):
|
||||||
|
"""
|
||||||
|
Look for an app config containing a given object.
|
||||||
|
|
||||||
|
object_name is the dotted Python path to the object.
|
||||||
|
|
||||||
|
Return the app config for the inner application in case of nesting.
|
||||||
|
Return None if the object isn't in any registered app config.
|
||||||
|
"""
|
||||||
|
self.check_apps_ready()
|
||||||
|
candidates = []
|
||||||
|
for app_config in self.app_configs.values():
|
||||||
|
if object_name.startswith(app_config.name):
|
||||||
|
subpath = object_name.removeprefix(app_config.name)
|
||||||
|
if subpath == "" or subpath[0] == ".":
|
||||||
|
candidates.append(app_config)
|
||||||
|
if candidates:
|
||||||
|
return sorted(candidates, key=lambda ac: -len(ac.name))[0]
|
||||||
|
|
||||||
|
def get_registered_model(self, app_label, model_name):
|
||||||
|
"""
|
||||||
|
Similar to get_model(), but doesn't require that an app exists with
|
||||||
|
the given app_label.
|
||||||
|
|
||||||
|
It's safe to call this method at import time, even while the registry
|
||||||
|
is being populated.
|
||||||
|
"""
|
||||||
|
model = self.all_models[app_label].get(model_name.lower())
|
||||||
|
if model is None:
|
||||||
|
raise LookupError("Model '%s.%s' not registered." % (app_label, model_name))
|
||||||
|
return model
|
||||||
|
|
||||||
|
@functools.cache
|
||||||
|
def get_swappable_settings_name(self, to_string):
|
||||||
|
"""
|
||||||
|
For a given model string (e.g. "auth.User"), return the name of the
|
||||||
|
corresponding settings name if it refers to a swappable model. If the
|
||||||
|
referred model is not swappable, return None.
|
||||||
|
|
||||||
|
This method is decorated with @functools.cache because it's performance
|
||||||
|
critical when it comes to migrations. Since the swappable settings don't
|
||||||
|
change after Django has loaded the settings, there is no reason to get
|
||||||
|
the respective settings attribute over and over again.
|
||||||
|
"""
|
||||||
|
to_string = to_string.lower()
|
||||||
|
for model in self.get_models(include_swapped=True):
|
||||||
|
swapped = model._meta.swapped
|
||||||
|
# Is this model swapped out for the model given by to_string?
|
||||||
|
if swapped and swapped.lower() == to_string:
|
||||||
|
return model._meta.swappable
|
||||||
|
# Is this model swappable and the one given by to_string?
|
||||||
|
if model._meta.swappable and model._meta.label_lower == to_string:
|
||||||
|
return model._meta.swappable
|
||||||
|
return None
|
||||||
|
|
||||||
|
def set_available_apps(self, available):
|
||||||
|
"""
|
||||||
|
Restrict the set of installed apps used by get_app_config[s].
|
||||||
|
|
||||||
|
available must be an iterable of application names.
|
||||||
|
|
||||||
|
set_available_apps() must be balanced with unset_available_apps().
|
||||||
|
|
||||||
|
Primarily used for performance optimization in TransactionTestCase.
|
||||||
|
|
||||||
|
This method is safe in the sense that it doesn't trigger any imports.
|
||||||
|
"""
|
||||||
|
available = set(available)
|
||||||
|
installed = {app_config.name for app_config in self.get_app_configs()}
|
||||||
|
if not available.issubset(installed):
|
||||||
|
raise ValueError(
|
||||||
|
"Available apps isn't a subset of installed apps, extra apps: %s"
|
||||||
|
% ", ".join(available - installed)
|
||||||
|
)
|
||||||
|
|
||||||
|
self.stored_app_configs.append(self.app_configs)
|
||||||
|
self.app_configs = {
|
||||||
|
label: app_config
|
||||||
|
for label, app_config in self.app_configs.items()
|
||||||
|
if app_config.name in available
|
||||||
|
}
|
||||||
|
self.clear_cache()
|
||||||
|
|
||||||
|
def unset_available_apps(self):
|
||||||
|
"""Cancel a previous call to set_available_apps()."""
|
||||||
|
self.app_configs = self.stored_app_configs.pop()
|
||||||
|
self.clear_cache()
|
||||||
|
|
||||||
|
def set_installed_apps(self, installed):
|
||||||
|
"""
|
||||||
|
Enable a different set of installed apps for get_app_config[s].
|
||||||
|
|
||||||
|
installed must be an iterable in the same format as INSTALLED_APPS.
|
||||||
|
|
||||||
|
set_installed_apps() must be balanced with unset_installed_apps(),
|
||||||
|
even if it exits with an exception.
|
||||||
|
|
||||||
|
Primarily used as a receiver of the setting_changed signal in tests.
|
||||||
|
|
||||||
|
This method may trigger new imports, which may add new models to the
|
||||||
|
registry of all imported models. They will stay in the registry even
|
||||||
|
after unset_installed_apps(). Since it isn't possible to replay
|
||||||
|
imports safely (e.g. that could lead to registering listeners twice),
|
||||||
|
models are registered when they're imported and never removed.
|
||||||
|
"""
|
||||||
|
if not self.ready:
|
||||||
|
raise AppRegistryNotReady("App registry isn't ready yet.")
|
||||||
|
self.stored_app_configs.append(self.app_configs)
|
||||||
|
self.app_configs = {}
|
||||||
|
self.apps_ready = self.models_ready = self.loading = self.ready = False
|
||||||
|
self.clear_cache()
|
||||||
|
self.populate(installed)
|
||||||
|
|
||||||
|
def unset_installed_apps(self):
|
||||||
|
"""Cancel a previous call to set_installed_apps()."""
|
||||||
|
self.app_configs = self.stored_app_configs.pop()
|
||||||
|
self.apps_ready = self.models_ready = self.ready = True
|
||||||
|
self.clear_cache()
|
||||||
|
|
||||||
|
def clear_cache(self):
|
||||||
|
"""
|
||||||
|
Clear all internal caches, for methods that alter the app registry.
|
||||||
|
|
||||||
|
This is mostly used in tests.
|
||||||
|
"""
|
||||||
|
self.get_swappable_settings_name.cache_clear()
|
||||||
|
# Call expire cache on each model. This will purge
|
||||||
|
# the relation tree and the fields cache.
|
||||||
|
self.get_models.cache_clear()
|
||||||
|
if self.ready:
|
||||||
|
# Circumvent self.get_models() to prevent that the cache is refilled.
|
||||||
|
# This particularly prevents that an empty value is cached while cloning.
|
||||||
|
for app_config in self.app_configs.values():
|
||||||
|
for model in app_config.get_models(include_auto_created=True):
|
||||||
|
model._meta._expire_cache()
|
||||||
|
|
||||||
|
def lazy_model_operation(self, function, *model_keys):
|
||||||
|
"""
|
||||||
|
Take a function and a number of ("app_label", "modelname") tuples, and
|
||||||
|
when all the corresponding models have been imported and registered,
|
||||||
|
call the function with the model classes as its arguments.
|
||||||
|
|
||||||
|
The function passed to this method must accept exactly n models as
|
||||||
|
arguments, where n=len(model_keys).
|
||||||
|
"""
|
||||||
|
# Base case: no arguments, just execute the function.
|
||||||
|
if not model_keys:
|
||||||
|
function()
|
||||||
|
# Recursive case: take the head of model_keys, wait for the
|
||||||
|
# corresponding model class to be imported and registered, then apply
|
||||||
|
# that argument to the supplied function. Pass the resulting partial
|
||||||
|
# to lazy_model_operation() along with the remaining model args and
|
||||||
|
# repeat until all models are loaded and all arguments are applied.
|
||||||
|
else:
|
||||||
|
next_model, *more_models = model_keys
|
||||||
|
|
||||||
|
# This will be executed after the class corresponding to next_model
|
||||||
|
# has been imported and registered. The `func` attribute provides
|
||||||
|
# duck-type compatibility with partials.
|
||||||
|
def apply_next_model(model):
|
||||||
|
next_function = partial(apply_next_model.func, model)
|
||||||
|
self.lazy_model_operation(next_function, *more_models)
|
||||||
|
|
||||||
|
apply_next_model.func = function
|
||||||
|
|
||||||
|
# If the model has already been imported and registered, partially
|
||||||
|
# apply it to the function now. If not, add it to the list of
|
||||||
|
# pending operations for the model, where it will be executed with
|
||||||
|
# the model class as its sole argument once the model is ready.
|
||||||
|
try:
|
||||||
|
model_class = self.get_registered_model(*next_model)
|
||||||
|
except LookupError:
|
||||||
|
self._pending_operations[next_model].append(apply_next_model)
|
||||||
|
else:
|
||||||
|
apply_next_model(model_class)
|
||||||
|
|
||||||
|
def do_pending_operations(self, model):
|
||||||
|
"""
|
||||||
|
Take a newly-prepared model and pass it to each function waiting for
|
||||||
|
it. This is called at the very end of Apps.register_model().
|
||||||
|
"""
|
||||||
|
key = model._meta.app_label, model._meta.model_name
|
||||||
|
for function in self._pending_operations.pop(key, []):
|
||||||
|
function(model)
|
||||||
|
|
||||||
|
|
||||||
|
apps = Apps(installed_apps=None)
|
||||||
272
.venv/Lib/site-packages/django/conf/__init__.py
Normal file
@@ -0,0 +1,272 @@
"""
Settings and configuration for Django.

Read values from the module specified by the DJANGO_SETTINGS_MODULE environment
variable, and then from django.conf.global_settings; see the global_settings.py
for a list of all possible variables.
"""

import importlib
import os
import time
import traceback
import warnings
from pathlib import Path

import django
from django.conf import global_settings
from django.core.exceptions import ImproperlyConfigured
from django.utils.deprecation import RemovedInDjango60Warning
from django.utils.functional import LazyObject, empty

ENVIRONMENT_VARIABLE = "DJANGO_SETTINGS_MODULE"
DEFAULT_STORAGE_ALIAS = "default"
STATICFILES_STORAGE_ALIAS = "staticfiles"

# RemovedInDjango60Warning.
FORMS_URLFIELD_ASSUME_HTTPS_DEPRECATED_MSG = (
    "The FORMS_URLFIELD_ASSUME_HTTPS transitional setting is deprecated."
)


class SettingsReference(str):
    """
    String subclass which references a current settings value. It's treated as
    the value in memory but serializes to a settings.NAME attribute reference.
    """

    def __new__(self, value, setting_name):
        return str.__new__(self, value)

    def __init__(self, value, setting_name):
        self.setting_name = setting_name


class LazySettings(LazyObject):
    """
    A lazy proxy for either global Django settings or a custom settings object.
    The user can manually configure settings prior to using them. Otherwise,
    Django uses the settings module pointed to by DJANGO_SETTINGS_MODULE.
    """

    def _setup(self, name=None):
        """
        Load the settings module pointed to by the environment variable. This
        is used the first time settings are needed, if the user hasn't
        configured settings manually.
        """
        settings_module = os.environ.get(ENVIRONMENT_VARIABLE)
        if not settings_module:
            desc = ("setting %s" % name) if name else "settings"
            raise ImproperlyConfigured(
                "Requested %s, but settings are not configured. "
                "You must either define the environment variable %s "
                "or call settings.configure() before accessing settings."
                % (desc, ENVIRONMENT_VARIABLE)
            )

        self._wrapped = Settings(settings_module)

    def __repr__(self):
        # Hardcode the class name as otherwise it yields 'Settings'.
        if self._wrapped is empty:
            return "<LazySettings [Unevaluated]>"
        return '<LazySettings "%(settings_module)s">' % {
            "settings_module": self._wrapped.SETTINGS_MODULE,
        }

    def __getattr__(self, name):
        """Return the value of a setting and cache it in self.__dict__."""
        if (_wrapped := self._wrapped) is empty:
            self._setup(name)
            _wrapped = self._wrapped
        val = getattr(_wrapped, name)

        # Special case some settings which require further modification.
        # This is done here for performance reasons so the modified value is cached.
        if name in {"MEDIA_URL", "STATIC_URL"} and val is not None:
            val = self._add_script_prefix(val)
        elif name == "SECRET_KEY" and not val:
            raise ImproperlyConfigured("The SECRET_KEY setting must not be empty.")

        self.__dict__[name] = val
        return val

    def __setattr__(self, name, value):
        """
        Set the value of setting. Clear all cached values if _wrapped changes
        (@override_settings does this) or clear single values when set.
        """
        if name == "_wrapped":
            self.__dict__.clear()
        else:
            self.__dict__.pop(name, None)
        super().__setattr__(name, value)

    def __delattr__(self, name):
        """Delete a setting and clear it from cache if needed."""
        super().__delattr__(name)
        self.__dict__.pop(name, None)

    def configure(self, default_settings=global_settings, **options):
        """
        Called to manually configure the settings. The 'default_settings'
        parameter sets where to retrieve any unspecified values from (its
        argument must support attribute access (__getattr__)).
        """
        if self._wrapped is not empty:
            raise RuntimeError("Settings already configured.")
        holder = UserSettingsHolder(default_settings)
        for name, value in options.items():
            if not name.isupper():
                raise TypeError("Setting %r must be uppercase." % name)
            setattr(holder, name, value)
        self._wrapped = holder

    @staticmethod
    def _add_script_prefix(value):
        """
        Add SCRIPT_NAME prefix to relative paths.

        Useful when the app is being served at a subpath and manually prefixing
        subpath to STATIC_URL and MEDIA_URL in settings is inconvenient.
        """
        # Don't apply prefix to absolute paths and URLs.
        if value.startswith(("http://", "https://", "/")):
            return value
        from django.urls import get_script_prefix

        return "%s%s" % (get_script_prefix(), value)

    @property
    def configured(self):
        """Return True if the settings have already been configured."""
        return self._wrapped is not empty

    def _show_deprecation_warning(self, message, category):
        stack = traceback.extract_stack()
        # Show a warning if the setting is used outside of Django.
        # Stack index: -1 this line, -2 the property, -3 the
        # LazyObject __getattribute__(), -4 the caller.
        filename, _, _, _ = stack[-4]
        if not filename.startswith(os.path.dirname(django.__file__)):
            warnings.warn(message, category, stacklevel=2)


class Settings:
    def __init__(self, settings_module):
        # update this dict from global settings (but only for ALL_CAPS settings)
        for setting in dir(global_settings):
            if setting.isupper():
                setattr(self, setting, getattr(global_settings, setting))

        # store the settings module in case someone later cares
        self.SETTINGS_MODULE = settings_module

        mod = importlib.import_module(self.SETTINGS_MODULE)

        tuple_settings = (
            "ALLOWED_HOSTS",
            "INSTALLED_APPS",
            "TEMPLATE_DIRS",
            "LOCALE_PATHS",
            "SECRET_KEY_FALLBACKS",
        )
        self._explicit_settings = set()
        for setting in dir(mod):
            if setting.isupper():
                setting_value = getattr(mod, setting)

                if setting in tuple_settings and not isinstance(
                    setting_value, (list, tuple)
                ):
                    raise ImproperlyConfigured(
                        "The %s setting must be a list or a tuple." % setting
                    )
                setattr(self, setting, setting_value)
                self._explicit_settings.add(setting)

        if self.is_overridden("FORMS_URLFIELD_ASSUME_HTTPS"):
            warnings.warn(
                FORMS_URLFIELD_ASSUME_HTTPS_DEPRECATED_MSG,
                RemovedInDjango60Warning,
            )

        if hasattr(time, "tzset") and self.TIME_ZONE:
            # When we can, attempt to validate the timezone. If we can't find
            # this file, no check happens and it's harmless.
            zoneinfo_root = Path("/usr/share/zoneinfo")
            zone_info_file = zoneinfo_root.joinpath(*self.TIME_ZONE.split("/"))
            if zoneinfo_root.exists() and not zone_info_file.exists():
                raise ValueError("Incorrect timezone setting: %s" % self.TIME_ZONE)
            # Move the time zone info into os.environ. See ticket #2315 for why
            # we don't do this unconditionally (breaks Windows).
            os.environ["TZ"] = self.TIME_ZONE
            time.tzset()

    def is_overridden(self, setting):
        return setting in self._explicit_settings

    def __repr__(self):
        return '<%(cls)s "%(settings_module)s">' % {
            "cls": self.__class__.__name__,
            "settings_module": self.SETTINGS_MODULE,
        }


class UserSettingsHolder:
    """Holder for user configured settings."""

    # SETTINGS_MODULE doesn't make much sense in the manually configured
    # (standalone) case.
    SETTINGS_MODULE = None

    def __init__(self, default_settings):
        """
        Requests for configuration variables not in this class are satisfied
        from the module specified in default_settings (if possible).
        """
        self.__dict__["_deleted"] = set()
        self.default_settings = default_settings

    def __getattr__(self, name):
        if not name.isupper() or name in self._deleted:
            raise AttributeError
        return getattr(self.default_settings, name)

    def __setattr__(self, name, value):
        self._deleted.discard(name)
        if name == "FORMS_URLFIELD_ASSUME_HTTPS":
            warnings.warn(
                FORMS_URLFIELD_ASSUME_HTTPS_DEPRECATED_MSG,
                RemovedInDjango60Warning,
            )
        super().__setattr__(name, value)

    def __delattr__(self, name):
        self._deleted.add(name)
        if hasattr(self, name):
            super().__delattr__(name)

    def __dir__(self):
        return sorted(
            s
            for s in [*self.__dict__, *dir(self.default_settings)]
            if s not in self._deleted
        )

    def is_overridden(self, setting):
        deleted = setting in self._deleted
        set_locally = setting in self.__dict__
        set_on_default = getattr(
            self.default_settings, "is_overridden", lambda s: False
        )(setting)
        return deleted or set_locally or set_on_default

    def __repr__(self):
        return "<%(cls)s>" % {
            "cls": self.__class__.__name__,
        }


settings = LazySettings()
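`LazySettings.__getattr__()` above resolves a setting once and then caches it in `self.__dict__`, so Python's ordinary attribute lookup bypasses `__getattr__` on every later read. A minimal sketch of that lazy-proxy-with-caching trick (the `LazyConfig` class is illustrative, not part of Django):

```python
import types

class LazyConfig:
    """Resolve attributes from a lazily-built backing object, caching results."""

    def __init__(self, loader):
        # Write through __dict__ directly to avoid triggering __getattr__.
        self.__dict__["_loader"] = loader
        self.__dict__["_wrapped"] = None

    def __getattr__(self, name):
        # Only called when `name` is NOT already present in __dict__.
        if self.__dict__["_wrapped"] is None:
            self.__dict__["_wrapped"] = self.__dict__["_loader"]()  # one-time setup
        val = getattr(self.__dict__["_wrapped"], name)
        self.__dict__[name] = val  # cache: later reads skip __getattr__ entirely
        return val

def load():
    # Hypothetical settings source; Django would import a settings module here.
    return types.SimpleNamespace(DEBUG=True, SECRET_KEY="not-for-production")

config = LazyConfig(load)
assert "DEBUG" not in config.__dict__  # nothing resolved yet
print(config.DEBUG)                    # True -- triggers setup, then caches
assert "DEBUG" in config.__dict__      # cached; __getattr__ now bypassed
```

Django's real implementation adds invalidation on `__setattr__`/`__delattr__` (clearing the whole cache when `_wrapped` is swapped, as `@override_settings` does), which is why plain caching alone is not enough in test scenarios.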