55 Commits

SHA1 Message Date
9ec5797919 Applied max-keys to combined current + archived ListObjectVersions output and reports truncation 2026-04-22 00:12:22 +08:00
8935188c8f Update static website 404 page 2026-04-21 21:26:50 +08:00
c77c592832 Update static website to include proper error handling; add missing features 2026-04-21 20:54:00 +08:00
501d563df2 Add missing features - notifications, object lock, acl 2026-04-21 00:27:50 +08:00
ddcdb4026c Fix domains mapping missing 2026-04-20 22:02:05 +08:00
3e7c0af019 Fix dockerfile issue 2026-04-20 21:39:06 +08:00
476b9bd2e4 First porting of Python to Rust - update docs and bug fixes 2026-04-20 21:27:02 +08:00
c2ef37b84e Separate Python and Rust into python/ and rust/ with per-stack Dockerfiles 2026-04-19 14:01:05 +08:00
be8e030940 Migrate more Python functions to Rust 2026-04-19 13:53:55 +08:00
ad7b2a02cb Add missing endpoints for Rust S3 API 2026-04-05 15:22:24 +08:00
72ddd9822c Add docker support for rust integration 2026-04-03 12:31:11 +08:00
4c30efd802 Update myfsio rust engines - added more implementations 2026-04-02 21:57:16 +08:00
926a7e6366 Add Rust storage engine foundation 2026-04-02 17:00:58 +08:00
1eadc7b75c Fix more-actions dropdown positioning: use Popper fixed strategy instead of raw CSS position:fixed 2026-04-01 16:24:42 +08:00
4a224a127b Fix more-actions dropdown triggering row selection on object list 2026-04-01 16:17:29 +08:00
c498fe7aee Add self-heal missing ETags and harden ETag index persistence 2026-03-31 21:10:47 +08:00
3838aed954 Fix presigned URL security vulnerabilities: enforce key/user status in SigV4 paths, remove duplicate verification, remove X-Forwarded-Host trust 2026-03-31 20:27:18 +08:00
6a193dbb1c Add --version option for run.py 2026-03-31 17:21:33 +08:00
e94b341a5b Add robust myfsio_core staleness detection with Python fallback; document Rust extension build in README 2026-03-31 17:13:05 +08:00
2ad3736852 Add intra-bucket cursor tracking to integrity scanner for progressive full coverage; Optimize integrity scanner: early batch exit, lazy sorted walk, cursor-aware index reads 2026-03-31 17:04:28 +08:00
f05b2668c0 Reduce per-request overhead: pre-compile SigV4 regex, in-memory etag index cache, 1MB GET chunks, configurable meta cache, skip fsync for rebuildable caches 2026-03-25 13:44:34 +08:00
f7c1c1f809 Update requirements.txt 2026-03-25 13:26:42 +08:00
0e392e18b4 Hide ghost details in object panel when preview fails to load 2026-03-24 15:15:03 +08:00
8996f1ce06 Fix folder selection not showing delete button in bucket browser 2026-03-24 12:10:38 +08:00
f60dbaf9c9 Respect DISPLAY_TIMEZONE in GC and integrity scanner history tables 2026-03-23 18:36:13 +08:00
1a5a7aa9e1 Auto-refresh Recent Scans/Executions tables after GC and integrity scan completion 2026-03-23 18:31:13 +08:00
326367ae4c Fix integrity scanner batch limit and add cursor-based rotation 2026-03-23 17:46:27 +08:00
a7f9b0a22f Convert GC to async with polling to prevent proxy timeouts 2026-03-23 17:14:04 +08:00
0e525713b1 Fix missing CSRF token on presigned URL request 2026-03-23 16:48:25 +08:00
f43fad02fb Replace fetch with XHR for multipart upload progress and add retry logic 2026-03-23 16:27:28 +08:00
eff3e378f3 Fix mobile infinite scroll on object list and ghost preview on fast object swap 2026-03-23 11:55:46 +08:00
5e32cef792 Add I/O throttling to GC and integrity scanner to prevent HDD starvation 2026-03-23 11:36:38 +08:00
9898167f8d Make integrity scan async with progress indicator in UI 2026-03-22 14:17:43 +08:00
4a553555d3 Clean up debug code 2026-03-22 11:38:29 +08:00
7a3202c996 Possible fix for the issue 2026-03-22 11:27:52 +08:00
bd20ca86ab Further debugging on s3 api issues on Granian 2026-03-22 11:22:24 +08:00
532cf95d59 Debug s3 api issues on Granian 2026-03-22 11:14:32 +08:00
366f8ce60d the middleware now also triggers when Content-Length is '0' but X-Amz-Decoded-Content-Length or aws-chunked headers indicate a body should be present 2026-03-22 00:24:04 +08:00
7612cb054a further fixes 2026-03-22 00:16:30 +08:00
966d524dca Fix 0-byte uploads caused by Granian stripping Expect header and missing CONTENT_LENGTH for chunked transfers 2026-03-22 00:04:55 +08:00
e84f1f1851 Fix SigV4 SignatureDoesNotMatch when Expect header is stripped by WSGI server 2026-03-21 23:48:19 +08:00
a059f0502d Fix 0-byte uploads caused by Granian default buffer size; Add SERVER_MAX_BUFFER_SIZE config 2026-03-21 22:57:48 +08:00
afd7173ba0 Fix buttons all showing Running state when only one action is triggered 2026-03-21 14:51:43 +08:00
c807bb2388 Update install/uninstall scripts for encrypted IAM config 2026-03-20 17:51:00 +08:00
aa4f9f5566 Bypass boto3 proxy for object streaming, read directly from storage layer; Add streaming object iterator to eliminate O(n²) directory rescanning on large buckets; Add iter_objects_shallow delegation to EncryptedObjectStorage 2026-03-20 17:35:10 +08:00
14786151e5 Fix selected object losing highlight on scroll in virtual list 2026-03-20 12:10:26 +08:00
a496862902 Fix stale object count on dashboard after deleting all objects in bucket 2026-03-17 23:25:30 +08:00
df4f27ca2e Fix IAM policy editor injecting prefix on existing policies without one 2026-03-15 16:04:35 +08:00
d72e0a347e Overhaul IAM: granular actions, multi-key users, prefix-scoped policies 2026-03-14 23:50:44 +08:00
6ed4b7d8ea Add System page: server info, feature flags, GC and integrity scanner UI 2026-03-14 20:27:57 +08:00
31ebbea680 Fix Docker healthcheck failure: Granian cannot run inside daemon process 2026-03-14 18:31:12 +08:00
d878134ebf Switch from Waitress to Granian (Rust/hyper WSGI server) for improved concurrency 2026-03-14 18:17:39 +08:00
55568d6892 Fix video seekbar in static website hosting by adding HTTP Range request support 2026-03-10 22:21:55 +08:00
a4ae81c77c Add integrity scanner: background detection and healing of corrupted objects, orphaned files, phantom metadata, stale versions, etag cache inconsistencies, and legacy metadata drift 2026-03-10 22:14:39 +08:00
9da7104887 Redesign tags UI: split pills, grid editor with column headers, ghost delete buttons 2026-03-10 17:48:17 +08:00
206 changed files with 66129 additions and 3136 deletions

.gitignore (vendored, 7 changes)

@@ -27,8 +27,11 @@ dist/
.eggs/
# Rust / maturin build artifacts
myfsio_core/target/
myfsio_core/Cargo.lock
python/myfsio_core/target/
python/myfsio_core/Cargo.lock
# Rust engine build artifacts
rust/myfsio-engine/target/
# Local runtime artifacts
logs/

README.md (388 changes)

@@ -1,250 +1,212 @@
# MyFSIO
A lightweight, S3-compatible object storage system built with Flask. MyFSIO implements core AWS S3 REST API operations with filesystem-backed storage, making it ideal for local development, testing, and self-hosted storage scenarios.
MyFSIO is an S3-compatible object storage server with a Rust runtime and a filesystem-backed storage engine. The active server lives under `rust/myfsio-engine` and serves both the S3 API and the built-in web UI from a single process.
The `python/` implementation is deprecated as of 2026-04-21. It remains in the repository for migration reference and legacy tests, but new development and supported runtime usage should target the Rust server.
## Features
**Core Storage**
- S3-compatible REST API with AWS Signature Version 4 authentication
- Bucket and object CRUD operations
- Object versioning with version history
- Multipart uploads for large files
- Presigned URLs (1 second to 7 days validity)
- S3-compatible REST API with Signature Version 4 authentication
- Browser UI for buckets, objects, IAM users, policies, replication, metrics, and site administration
- Filesystem-backed storage rooted at `data/`
- Bucket versioning, multipart uploads, presigned URLs, CORS, object and bucket tagging
- Server-side encryption and built-in KMS support
- Optional background services for lifecycle, garbage collection, integrity scanning, operation metrics, and system metrics history
- Replication, site sync, and static website hosting support
**Security & Access Control**
- IAM users with access key management and rotation
- Bucket policies (AWS Policy Version 2012-10-17)
- Server-side encryption (SSE-S3 and SSE-KMS)
- Built-in Key Management Service (KMS)
- Rate limiting per endpoint
## Runtime Model
**Advanced Features**
- Cross-bucket replication to remote S3-compatible endpoints
- Hot-reload for bucket policies (no restart required)
- CORS configuration per bucket
MyFSIO now runs as one Rust process:
**Management UI**
- Web console for bucket and object management
- IAM dashboard for user administration
- Inline JSON policy editor with presets
- Object browser with folder navigation and bulk operations
- Dark mode support
- API listener on `HOST` + `PORT` (default `127.0.0.1:5000`)
- UI listener on `HOST` + `UI_PORT` (default `127.0.0.1:5100`)
- Shared state for storage, IAM, policies, sessions, metrics, and background workers
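For example, to bind both listeners on all interfaces with a non-default UI port (a sketch using the variables above; values are illustrative):
```bash
HOST=0.0.0.0 PORT=5000 UI_PORT=8080 cargo run -p myfsio-server --
```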
## Architecture
```
+------------------+ +------------------+
| API Server | | UI Server |
| (port 5000) | | (port 5100) |
| | | |
| - S3 REST API |<------->| - Web Console |
| - SigV4 Auth | | - IAM Dashboard |
| - Presign URLs | | - Bucket Editor |
+--------+---------+ +------------------+
|
v
+------------------+ +------------------+
| Object Storage | | System Metadata |
| (filesystem) | | (.myfsio.sys/) |
| | | |
| data/<bucket>/ | | - IAM config |
| <objects> | | - Bucket policies|
| | | - Encryption keys|
+------------------+ +------------------+
```
If you want API-only mode, set `UI_ENABLED=false`. There is no separate "UI-only" runtime anymore.
## Quick Start
From the repository root:
```bash
# Clone and setup
git clone https://gitea.jzwsite.com/kqjy/MyFSIO
cd s3
python -m venv .venv
# Activate virtual environment
# Windows PowerShell:
.\.venv\Scripts\Activate.ps1
# Windows CMD:
.venv\Scripts\activate.bat
# Linux/macOS:
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Start both servers
python run.py
# Or start individually
python run.py --mode api # API only (port 5000)
python run.py --mode ui # UI only (port 5100)
cd rust/myfsio-engine
cargo run -p myfsio-server --
```
**Credentials:** Generated automatically on first run and printed to the console. If you miss them, check the IAM config file at `<STORAGE_ROOT>/.myfsio.sys/config/iam.json`.
Useful URLs:
- **Web Console:** http://127.0.0.1:5100/ui
- **API Endpoint:** http://127.0.0.1:5000
- UI: `http://127.0.0.1:5100/ui`
- API: `http://127.0.0.1:5000/`
- Health: `http://127.0.0.1:5000/myfsio/health`
On first boot, MyFSIO creates `data/.myfsio.sys/config/iam.json` and prints the generated admin access key and secret key to the console.
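Once the server is running, any SigV4-capable S3 client can target the API listener. A minimal sketch with boto3; the key pair below is a placeholder for the credentials printed on first boot:
```python
import boto3

# Point a standard S3 client at the local API listener. Replace the
# placeholder keys with the pair printed on first boot (or read them
# from data/.myfsio.sys/config/iam.json).
s3 = boto3.client(
    "s3",
    endpoint_url="http://127.0.0.1:5000",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
    region_name="us-east-1",  # must match AWS_REGION used in the SigV4 scope
)

print(s3.list_buckets()["Buckets"])
```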
### Common CLI commands
```bash
# Show resolved configuration
cargo run -p myfsio-server -- --show-config
# Validate configuration and exit non-zero on critical issues
cargo run -p myfsio-server -- --check-config
# Reset admin credentials
cargo run -p myfsio-server -- --reset-cred
# API only
UI_ENABLED=false cargo run -p myfsio-server --
```
## Building a Binary
```bash
cd rust/myfsio-engine
cargo build --release -p myfsio-server
```
Binary locations:
- Linux/macOS: `rust/myfsio-engine/target/release/myfsio-server`
- Windows: `rust/myfsio-engine/target/release/myfsio-server.exe`
Run the built binary directly:
```bash
./target/release/myfsio-server
```
## Configuration
The server reads environment variables from the process environment and, when present, also loads:
- `/opt/myfsio/myfsio.env`
- `.env`
- `myfsio.env`
Core settings:
| Variable | Default | Description |
|----------|---------|-------------|
| `STORAGE_ROOT` | `./data` | Filesystem root for bucket storage |
| `IAM_CONFIG` | `.myfsio.sys/config/iam.json` | IAM user and policy store |
| `BUCKET_POLICY_PATH` | `.myfsio.sys/config/bucket_policies.json` | Bucket policy store |
| `API_BASE_URL` | `http://127.0.0.1:5000` | API endpoint for UI calls |
| `MAX_UPLOAD_SIZE` | `1073741824` | Maximum upload size in bytes (1 GB) |
| `MULTIPART_MIN_PART_SIZE` | `5242880` | Minimum multipart part size (5 MB) |
| `UI_PAGE_SIZE` | `100` | Default page size for listings |
| `SECRET_KEY` | `dev-secret-key` | Flask session secret |
| `AWS_REGION` | `us-east-1` | Region for SigV4 signing |
| `AWS_SERVICE` | `s3` | Service name for SigV4 signing |
| `ENCRYPTION_ENABLED` | `false` | Enable server-side encryption |
| `KMS_ENABLED` | `false` | Enable Key Management Service |
| `LOG_LEVEL` | `INFO` | Logging verbosity |
| `SIGV4_TIMESTAMP_TOLERANCE_SECONDS` | `900` | Max time skew for SigV4 requests |
| `PRESIGNED_URL_MAX_EXPIRY_SECONDS` | `604800` | Max presigned URL expiry (7 days) |
| `REPLICATION_CONNECT_TIMEOUT_SECONDS` | `5` | Replication connection timeout |
| `SITE_SYNC_ENABLED` | `false` | Enable bi-directional site sync |
| `OBJECT_TAG_LIMIT` | `50` | Maximum tags per object |
| --- | --- | --- |
| `HOST` | `127.0.0.1` | Bind address for API and UI listeners |
| `PORT` | `5000` | API port |
| `UI_PORT` | `5100` | UI port |
| `UI_ENABLED` | `true` | Disable to run API-only |
| `STORAGE_ROOT` | `./data` | Root directory for buckets and system metadata |
| `IAM_CONFIG` | `<STORAGE_ROOT>/.myfsio.sys/config/iam.json` | IAM config path |
| `API_BASE_URL` | unset | Public API base used by the UI and presigned URL generation |
| `AWS_REGION` | `us-east-1` | Region used in SigV4 scope |
| `SIGV4_TIMESTAMP_TOLERANCE_SECONDS` | `900` | Allowed request time skew |
| `PRESIGNED_URL_MIN_EXPIRY_SECONDS` | `1` | Minimum presigned URL expiry |
| `PRESIGNED_URL_MAX_EXPIRY_SECONDS` | `604800` | Maximum presigned URL expiry |
| `SECRET_KEY` | loaded from `.myfsio.sys/config/.secret` if present | Session signing key and IAM-at-rest encryption key |
| `ADMIN_ACCESS_KEY` | unset | Optional first-run or reset access key |
| `ADMIN_SECRET_KEY` | unset | Optional first-run or reset secret key |
Feature toggles:
| Variable | Default |
| --- | --- |
| `ENCRYPTION_ENABLED` | `false` |
| `KMS_ENABLED` | `false` |
| `GC_ENABLED` | `false` |
| `INTEGRITY_ENABLED` | `false` |
| `LIFECYCLE_ENABLED` | `false` |
| `METRICS_HISTORY_ENABLED` | `false` |
| `OPERATION_METRICS_ENABLED` | `false` |
| `WEBSITE_HOSTING_ENABLED` | `false` |
| `SITE_SYNC_ENABLED` | `false` |
Metrics and replication tuning:
| Variable | Default |
| --- | --- |
| `OPERATION_METRICS_INTERVAL_MINUTES` | `5` |
| `OPERATION_METRICS_RETENTION_HOURS` | `24` |
| `METRICS_HISTORY_INTERVAL_MINUTES` | `5` |
| `METRICS_HISTORY_RETENTION_HOURS` | `24` |
| `REPLICATION_CONNECT_TIMEOUT_SECONDS` | `5` |
| `REPLICATION_READ_TIMEOUT_SECONDS` | `30` |
| `REPLICATION_MAX_RETRIES` | `2` |
| `REPLICATION_STREAMING_THRESHOLD_BYTES` | `10485760` |
| `REPLICATION_MAX_FAILURES_PER_BUCKET` | `50` |
| `SITE_SYNC_INTERVAL_SECONDS` | `60` |
| `SITE_SYNC_BATCH_SIZE` | `100` |
| `SITE_SYNC_CONNECT_TIMEOUT_SECONDS` | `10` |
| `SITE_SYNC_READ_TIMEOUT_SECONDS` | `120` |
| `SITE_SYNC_MAX_RETRIES` | `2` |
| `SITE_SYNC_CLOCK_SKEW_TOLERANCE_SECONDS` | `1.0` |
UI asset overrides:
| Variable | Default |
| --- | --- |
| `TEMPLATES_DIR` | built-in crate templates directory |
| `STATIC_DIR` | built-in crate static directory |
See [docs.md](./docs.md) for the full Rust-side operations guide.
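As an illustration, a minimal `myfsio.env` for a LAN-reachable instance with encryption and the background maintenance services enabled might look like this (every value here is an example, not a recommendation):
```bash
# Example myfsio.env (illustrative values only)
HOST=0.0.0.0
PORT=5000
UI_PORT=5100
STORAGE_ROOT=/srv/myfsio/data
API_BASE_URL=https://s3.example.internal
ENCRYPTION_ENABLED=true
GC_ENABLED=true
INTEGRITY_ENABLED=true
METRICS_HISTORY_ENABLED=true
```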
## Data Layout
```text
data/
├── <bucket>/ # User buckets with objects
└── .myfsio.sys/ # System metadata
├── config/
│ ├── iam.json # IAM users and policies
│ ├── bucket_policies.json # Bucket policies
├── replication_rules.json
└── connections.json # Remote S3 connections
├── buckets/<bucket>/
│ ├── meta/ # Object metadata (.meta.json)
│ ├── versions/ # Archived object versions
└── .bucket.json # Bucket config (versioning, CORS)
├── multipart/ # Active multipart uploads
└── keys/ # Encryption keys (SSE-S3/KMS)
<bucket>/
.myfsio.sys/
config/
iam.json
bucket_policies.json
connections.json
operation_metrics.json
metrics_history.json
buckets/<bucket>/
meta/
versions/
multipart/
keys/
```
## API Reference
All endpoints require AWS Signature Version 4 authentication unless using presigned URLs or public bucket policies.
### Bucket Operations
| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/` | List all buckets |
| `PUT` | `/<bucket>` | Create bucket |
| `DELETE` | `/<bucket>` | Delete bucket (must be empty) |
| `HEAD` | `/<bucket>` | Check bucket exists |
### Object Operations
| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/<bucket>` | List objects (supports `list-type=2`) |
| `PUT` | `/<bucket>/<key>` | Upload object |
| `GET` | `/<bucket>/<key>` | Download object |
| `DELETE` | `/<bucket>/<key>` | Delete object |
| `HEAD` | `/<bucket>/<key>` | Get object metadata |
| `POST` | `/<bucket>/<key>?uploads` | Initiate multipart upload |
| `PUT` | `/<bucket>/<key>?partNumber=N&uploadId=X` | Upload part |
| `POST` | `/<bucket>/<key>?uploadId=X` | Complete multipart upload |
| `DELETE` | `/<bucket>/<key>?uploadId=X` | Abort multipart upload |
### Bucket Policies (S3-compatible)
| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/<bucket>?policy` | Get bucket policy |
| `PUT` | `/<bucket>?policy` | Set bucket policy |
| `DELETE` | `/<bucket>?policy` | Delete bucket policy |
### Versioning
| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/<bucket>/<key>?versionId=X` | Get specific version |
| `DELETE` | `/<bucket>/<key>?versionId=X` | Delete specific version |
| `GET` | `/<bucket>?versions` | List object versions |
### Health Check
| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/myfsio/health` | Health check endpoint |
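The endpoint tables above map directly onto standard S3 SDK calls. A minimal end-to-end sketch with boto3 (bucket name and credentials are placeholders):
```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://127.0.0.1:5000",
    aws_access_key_id="YOUR_ACCESS_KEY",      # placeholder
    aws_secret_access_key="YOUR_SECRET_KEY",  # placeholder
)

# Bucket/object round-trip over the endpoints listed above.
s3.create_bucket(Bucket="demo")
s3.put_object(Bucket="demo", Key="hello.txt", Body=b"hello from MyFSIO")
print(s3.get_object(Bucket="demo", Key="hello.txt")["Body"].read())

# Presigned GET; the expiry is bounded by
# PRESIGNED_URL_MIN_EXPIRY_SECONDS / PRESIGNED_URL_MAX_EXPIRY_SECONDS.
print(s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "demo", "Key": "hello.txt"},
    ExpiresIn=3600,
))
```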
## IAM & Access Control
### Users and Access Keys
On first run, MyFSIO creates a default admin user (`localadmin`/`localadmin`). Use the IAM dashboard to:
- Create and delete users
- Generate and rotate access keys
- Attach inline policies to users
- Control IAM management permissions
### Bucket Policies
Bucket policies follow AWS policy grammar (Version `2012-10-17`) with support for:
- Principal-based access (`*` for anonymous, specific users)
- Action-based permissions (`s3:GetObject`, `s3:PutObject`, etc.)
- Resource patterns (`arn:aws:s3:::bucket/*`)
- Condition keys
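For instance, the public-read preset described below corresponds to a policy along these lines (the bucket name is a placeholder):
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ]
    }
  ]
}
```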
**Policy Presets:**
- **Public:** Grants anonymous read access (`s3:GetObject`, `s3:ListBucket`)
- **Private:** Removes bucket policy (IAM-only access)
- **Custom:** Manual policy editing with draft preservation
Policies hot-reload when the JSON file changes.
## Server-Side Encryption
MyFSIO supports two encryption modes:
- **SSE-S3:** Server-managed keys with automatic key rotation
- **SSE-KMS:** Customer-managed keys via built-in KMS
Enable encryption with:
```bash
ENCRYPTION_ENABLED=true python run.py
```
## Cross-Bucket Replication
Replicate objects to remote S3-compatible endpoints:
1. Configure remote connections in the UI
2. Create replication rules specifying source/destination
3. Objects are automatically replicated on upload
## Docker
Build the Rust image from the `rust/` directory:
```bash
docker build -t myfsio .
docker run -p 5000:5000 -p 5100:5100 -v ./data:/app/data myfsio
docker build -t myfsio ./rust
docker run --rm -p 5000:5000 -p 5100:5100 -v "${PWD}/data:/app/data" myfsio
```
If the instance sits behind a reverse proxy, set `API_BASE_URL` to the public S3 endpoint.
## Linux Installation
The repository includes `scripts/install.sh` for systemd-style Linux installs. Build the Rust binary first, then pass it to the installer:
```bash
cd rust/myfsio-engine
cargo build --release -p myfsio-server
cd ../..
sudo ./scripts/install.sh --binary ./rust/myfsio-engine/target/release/myfsio-server
```
The installer copies the binary into `/opt/myfsio/myfsio`, writes `/opt/myfsio/myfsio.env`, and can register a `myfsio.service` unit.
## Testing
Run the Rust test suite from the workspace:
```bash
# Run all tests
pytest tests/ -v
# Run specific test file
pytest tests/test_api.py -v
# Run with coverage
pytest tests/ --cov=app --cov-report=html
cd rust/myfsio-engine
cargo test
```
## References
## Health Check
- [Amazon S3 Documentation](https://docs.aws.amazon.com/s3/)
- [AWS Signature Version 4](https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html)
- [S3 Bucket Policy Examples](https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-bucket-policies.html)
`GET /myfsio/health` returns:
```json
{
"status": "ok",
"version": "0.5.0"
}
```
The `version` field comes from the Rust crate version in `rust/myfsio-engine/crates/myfsio-server/Cargo.toml`.


@@ -1,5 +0,0 @@
#!/bin/sh
set -e
# Run both services using the python runner in production mode
exec python run.py --prod

docs.md (2610 changes)

File diff suppressed because it is too large


@@ -11,3 +11,7 @@ htmlcov
logs
data
tmp
tests
myfsio_core/target
Dockerfile
.dockerignore


@@ -1,9 +1,9 @@
FROM python:3.14.3-slim
FROM python:3.14.3-slim AS builder
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1
WORKDIR /app
WORKDIR /build
RUN apt-get update \
&& apt-get install -y --no-install-recommends build-essential curl \
@@ -12,23 +12,34 @@ RUN apt-get update \
ENV PATH="/root/.cargo/bin:${PATH}"
RUN pip install --no-cache-dir maturin
COPY myfsio_core ./myfsio_core
RUN cd myfsio_core \
&& maturin build --release --out /wheels
FROM python:3.14.3-slim
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
COPY --from=builder /wheels/*.whl /tmp/
RUN pip install --no-cache-dir /tmp/*.whl && rm /tmp/*.whl
RUN pip install --no-cache-dir maturin \
&& cd myfsio_core \
&& maturin build --release \
&& pip install target/wheels/*.whl \
&& cd .. \
&& rm -rf myfsio_core/target \
&& pip uninstall -y maturin \
&& rustup self uninstall -y
COPY app ./app
COPY templates ./templates
COPY static ./static
COPY run.py ./
COPY docker-entrypoint.sh ./
RUN chmod +x docker-entrypoint.sh
RUN mkdir -p /app/data \
RUN chmod +x docker-entrypoint.sh \
&& mkdir -p /app/data \
&& useradd -m -u 1000 myfsio \
&& chown -R myfsio:myfsio /app

python/README.md (new file, 14 lines)

@@ -0,0 +1,14 @@
# Deprecated Python Implementation
The Python implementation of MyFSIO is deprecated as of 2026-04-21.
The supported server runtime now lives in `../rust/myfsio-engine` and serves the S3 API and web UI from the Rust `myfsio-server` binary. Keep this tree for migration reference, compatibility checks, and legacy tests only.
For normal development and operations, run:
```bash
cd ../rust/myfsio-engine
cargo run -p myfsio-server --
```
Do not add new product features to the Python implementation unless they are needed to unblock a migration or compare behavior with the Rust server.


@@ -1,6 +1,5 @@
from __future__ import annotations
import html as html_module
import itertools
import logging
import mimetypes
@@ -18,6 +17,8 @@ from flask_cors import CORS
from flask_wtf.csrf import CSRFError
from werkzeug.middleware.proxy_fix import ProxyFix
import io
from .access_logging import AccessLoggingService
from .operation_metrics import OperationMetricsCollector, classify_endpoint
from .compression import GzipMiddleware
@@ -30,6 +31,7 @@ from .extensions import limiter, csrf
from .iam import IamService
from .kms import KMSManager
from .gc import GarbageCollector
from .integrity import IntegrityChecker
from .lifecycle import LifecycleManager
from .notifications import NotificationService
from .object_lock import ObjectLockService
@@ -43,6 +45,64 @@ from .website_domains import WebsiteDomainStore
_request_counter = itertools.count(1)
class _ChunkedTransferMiddleware:
def __init__(self, app):
self.app = app
def __call__(self, environ, start_response):
if environ.get("REQUEST_METHOD") not in ("PUT", "POST"):
return self.app(environ, start_response)
transfer_encoding = environ.get("HTTP_TRANSFER_ENCODING", "")
content_length = environ.get("CONTENT_LENGTH")
if "chunked" in transfer_encoding.lower():
if content_length:
del environ["HTTP_TRANSFER_ENCODING"]
else:
raw = environ.get("wsgi.input")
if raw:
try:
if hasattr(raw, "seek"):
raw.seek(0)
body = raw.read()
except Exception:
body = b""
if body:
environ["wsgi.input"] = io.BytesIO(body)
environ["CONTENT_LENGTH"] = str(len(body))
del environ["HTTP_TRANSFER_ENCODING"]
content_length = environ.get("CONTENT_LENGTH")
if not content_length or content_length == "0":
sha256 = environ.get("HTTP_X_AMZ_CONTENT_SHA256", "")
decoded_len = environ.get("HTTP_X_AMZ_DECODED_CONTENT_LENGTH", "")
content_encoding = environ.get("HTTP_CONTENT_ENCODING", "")
if ("STREAMING" in sha256.upper() or decoded_len
or "aws-chunked" in content_encoding.lower()):
raw = environ.get("wsgi.input")
if raw:
try:
if hasattr(raw, "seek"):
raw.seek(0)
body = raw.read()
except Exception:
body = b""
if body:
environ["wsgi.input"] = io.BytesIO(body)
environ["CONTENT_LENGTH"] = str(len(body))
raw = environ.get("wsgi.input")
if raw and hasattr(raw, "seek"):
try:
raw.seek(0)
except Exception:
pass
return self.app(environ, start_response)
def _migrate_config_file(active_path: Path, legacy_paths: List[Path]) -> Path:
"""Migrate config file from legacy locations to the active path.
@@ -106,10 +166,11 @@ def create_app(
)
app.wsgi_app = ProxyFix(app.wsgi_app, x_for=num_proxies, x_proto=num_proxies, x_host=num_proxies, x_prefix=num_proxies)
# Enable gzip compression for responses (10-20x smaller JSON payloads)
if app.config.get("ENABLE_GZIP", True):
app.wsgi_app = GzipMiddleware(app.wsgi_app, compression_level=6)
app.wsgi_app = _ChunkedTransferMiddleware(app.wsgi_app)
_configure_cors(app)
_configure_logging(app)
@@ -122,6 +183,7 @@ def create_app(
object_cache_max_size=app.config.get("OBJECT_CACHE_MAX_SIZE", 100),
bucket_config_cache_ttl=app.config.get("BUCKET_CONFIG_CACHE_TTL_SECONDS", 30.0),
object_key_max_length_bytes=app.config.get("OBJECT_KEY_MAX_LENGTH_BYTES", 1024),
meta_read_cache_max=app.config.get("META_READ_CACHE_MAX", 2048),
)
if app.config.get("WARM_CACHE_ON_STARTUP", True) and not app.config.get("TESTING"):
@@ -231,9 +293,22 @@ def create_app(
multipart_max_age_days=app.config.get("GC_MULTIPART_MAX_AGE_DAYS", 7),
lock_file_max_age_hours=app.config.get("GC_LOCK_FILE_MAX_AGE_HOURS", 1.0),
dry_run=app.config.get("GC_DRY_RUN", False),
io_throttle_ms=app.config.get("GC_IO_THROTTLE_MS", 10),
)
gc_collector.start()
integrity_checker = None
if app.config.get("INTEGRITY_ENABLED", False):
integrity_checker = IntegrityChecker(
storage_root=storage_root,
interval_hours=app.config.get("INTEGRITY_INTERVAL_HOURS", 24.0),
batch_size=app.config.get("INTEGRITY_BATCH_SIZE", 1000),
auto_heal=app.config.get("INTEGRITY_AUTO_HEAL", False),
dry_run=app.config.get("INTEGRITY_DRY_RUN", False),
io_throttle_ms=app.config.get("INTEGRITY_IO_THROTTLE_MS", 10),
)
integrity_checker.start()
app.extensions["object_storage"] = storage
app.extensions["iam"] = iam
app.extensions["bucket_policies"] = bucket_policies
@@ -246,6 +321,7 @@ def create_app(
app.extensions["acl"] = acl_service
app.extensions["lifecycle"] = lifecycle_manager
app.extensions["gc"] = gc_collector
app.extensions["integrity"] = integrity_checker
app.extensions["object_lock"] = object_lock_service
app.extensions["notifications"] = notification_service
app.extensions["access_logging"] = access_logging_service
@@ -549,30 +625,57 @@ def _configure_logging(app: Flask) -> None:
is_encrypted = "x-amz-server-side-encryption" in metadata
except (StorageError, OSError):
pass
if request.method == "HEAD":
response = Response(status=200)
if is_encrypted and hasattr(storage, "get_object_data"):
try:
data, _ = storage.get_object_data(bucket, object_key)
response.headers["Content-Length"] = len(data)
file_size = len(data)
except (StorageError, OSError):
return _website_error_response(500, "Internal Server Error")
else:
data = None
try:
stat = obj_path.stat()
response.headers["Content-Length"] = stat.st_size
file_size = stat.st_size
except OSError:
return _website_error_response(500, "Internal Server Error")
if request.method == "HEAD":
response = Response(status=200)
response.headers["Content-Length"] = file_size
response.headers["Content-Type"] = content_type
response.headers["Accept-Ranges"] = "bytes"
return response
if is_encrypted and hasattr(storage, "get_object_data"):
try:
data, _ = storage.get_object_data(bucket, object_key)
from .s3_api import _parse_range_header
range_header = request.headers.get("Range")
if range_header:
ranges = _parse_range_header(range_header, file_size)
if ranges is None:
return Response(status=416, headers={"Content-Range": f"bytes */{file_size}"})
start, end = ranges[0]
length = end - start + 1
if data is not None:
partial_data = data[start:end + 1]
response = Response(partial_data, status=206, mimetype=content_type)
else:
def _stream_range(file_path, start_pos, length_to_read):
with file_path.open("rb") as f:
f.seek(start_pos)
remaining = length_to_read
while remaining > 0:
chunk = f.read(min(262144, remaining))
if not chunk:
break
remaining -= len(chunk)
yield chunk
response = Response(_stream_range(obj_path, start, length), status=206, mimetype=content_type, direct_passthrough=True)
response.headers["Content-Range"] = f"bytes {start}-{end}/{file_size}"
response.headers["Content-Length"] = length
response.headers["Accept-Ranges"] = "bytes"
return response
if data is not None:
response = Response(data, mimetype=content_type)
response.headers["Content-Length"] = len(data)
response.headers["Content-Length"] = file_size
response.headers["Accept-Ranges"] = "bytes"
return response
except (StorageError, OSError):
return _website_error_response(500, "Internal Server Error")
def _stream(file_path):
with file_path.open("rb") as f:
while True:
@@ -580,13 +683,10 @@ def _configure_logging(app: Flask) -> None:
if not chunk:
break
yield chunk
try:
stat = obj_path.stat()
response = Response(_stream(obj_path), mimetype=content_type, direct_passthrough=True)
response.headers["Content-Length"] = stat.st_size
response.headers["Content-Length"] = file_size
response.headers["Accept-Ranges"] = "bytes"
return response
except OSError:
return _website_error_response(500, "Internal Server Error")
def _serve_website_error(storage, bucket, error_doc_key, status_code):
if not error_doc_key:
@@ -619,9 +719,10 @@ def _configure_logging(app: Flask) -> None:
return _website_error_response(status_code, "Not Found")
def _website_error_response(status_code, message):
safe_msg = html_module.escape(str(message))
safe_code = html_module.escape(str(status_code))
body = f"<html><head><title>{safe_code} {safe_msg}</title></head><body><h1>{safe_code} {safe_msg}</h1></body></html>"
if status_code == 404:
body = "<h1>404 page not found</h1>"
else:
body = f"{status_code} {message}"
return Response(body, status=status_code, mimetype="text/html")
@app.after_request
@@ -641,6 +742,7 @@ def _configure_logging(app: Flask) -> None:
},
)
response.headers["X-Request-Duration-ms"] = f"{duration_ms:.2f}"
response.headers["Server"] = "MyFSIO"
operation_metrics = app.extensions.get("operation_metrics")
if operation_metrics:


@@ -15,6 +15,7 @@ from flask import Blueprint, Response, current_app, jsonify, request
from .connections import ConnectionStore
from .extensions import limiter
from .gc import GarbageCollector
from .integrity import IntegrityChecker
from .iam import IamError, Principal
from .replication import ReplicationManager
from .site_registry import PeerSite, SiteInfo, SiteRegistry
@@ -685,6 +686,107 @@ def _storage():
return current_app.extensions["object_storage"]
def _require_iam_action(action: str):
principal, error = _require_principal()
if error:
return None, error
try:
_iam().authorize(principal, None, action)
return principal, None
except IamError:
return None, _json_error("AccessDenied", f"Requires {action} permission", 403)
@admin_api_bp.route("/iam/users", methods=["GET"])
@limiter.limit(lambda: _get_admin_rate_limit())
def iam_list_users():
principal, error = _require_iam_action("iam:list_users")
if error:
return error
return jsonify({"users": _iam().list_users()})
@admin_api_bp.route("/iam/users/<identifier>", methods=["GET"])
@limiter.limit(lambda: _get_admin_rate_limit())
def iam_get_user(identifier):
principal, error = _require_iam_action("iam:get_user")
if error:
return error
try:
user_id = _iam().resolve_user_id(identifier)
return jsonify(_iam().get_user_by_id(user_id))
except IamError as exc:
return _json_error("NotFound", str(exc), 404)
@admin_api_bp.route("/iam/users/<identifier>/policies", methods=["GET"])
@limiter.limit(lambda: _get_admin_rate_limit())
def iam_get_user_policies(identifier):
principal, error = _require_iam_action("iam:get_policy")
if error:
return error
try:
return jsonify({"policies": _iam().get_user_policies(identifier)})
except IamError as exc:
return _json_error("NotFound", str(exc), 404)
@admin_api_bp.route("/iam/users/<identifier>/keys", methods=["POST"])
@limiter.limit(lambda: _get_admin_rate_limit())
def iam_create_access_key(identifier):
principal, error = _require_iam_action("iam:create_key")
if error:
return error
try:
result = _iam().create_access_key(identifier)
logger.info("Access key created for %s by %s", identifier, principal.access_key)
return jsonify(result), 201
except IamError as exc:
return _json_error("InvalidRequest", str(exc), 400)
@admin_api_bp.route("/iam/users/<identifier>/keys/<access_key>", methods=["DELETE"])
@limiter.limit(lambda: _get_admin_rate_limit())
def iam_delete_access_key(identifier, access_key):
principal, error = _require_iam_action("iam:delete_key")
if error:
return error
try:
_iam().delete_access_key(access_key)
logger.info("Access key %s deleted by %s", access_key, principal.access_key)
return "", 204
except IamError as exc:
return _json_error("InvalidRequest", str(exc), 400)
@admin_api_bp.route("/iam/users/<identifier>/disable", methods=["POST"])
@limiter.limit(lambda: _get_admin_rate_limit())
def iam_disable_user(identifier):
principal, error = _require_iam_action("iam:disable_user")
if error:
return error
try:
_iam().disable_user(identifier)
logger.info("User %s disabled by %s", identifier, principal.access_key)
return jsonify({"status": "disabled"})
except IamError as exc:
return _json_error("InvalidRequest", str(exc), 400)
@admin_api_bp.route("/iam/users/<identifier>/enable", methods=["POST"])
@limiter.limit(lambda: _get_admin_rate_limit())
def iam_enable_user(identifier):
principal, error = _require_iam_action("iam:disable_user")
if error:
return error
try:
_iam().enable_user(identifier)
logger.info("User %s enabled by %s", identifier, principal.access_key)
return jsonify({"status": "enabled"})
except IamError as exc:
return _json_error("InvalidRequest", str(exc), 400)
@admin_api_bp.route("/website-domains", methods=["GET"])
@limiter.limit(lambda: _get_admin_rate_limit())
def list_website_domains():
@@ -805,15 +907,11 @@ def gc_run_now():
if not gc:
return _json_error("InvalidRequest", "GC is not enabled", 400)
payload = request.get_json(silent=True) or {}
original_dry_run = gc.dry_run
if "dry_run" in payload:
gc.dry_run = bool(payload["dry_run"])
try:
result = gc.run_now()
finally:
gc.dry_run = original_dry_run
started = gc.run_async(dry_run=payload.get("dry_run"))
logger.info("GC manual run by %s", principal.access_key)
return jsonify(result.to_dict())
if not started:
return _json_error("Conflict", "GC is already in progress", 409)
return jsonify({"status": "started"})
@admin_api_bp.route("/gc/history", methods=["GET"])
@@ -829,3 +927,58 @@ def gc_history():
offset = int(request.args.get("offset", 0))
records = gc.get_history(limit=limit, offset=offset)
return jsonify({"executions": records})
def _integrity() -> Optional[IntegrityChecker]:
return current_app.extensions.get("integrity")
@admin_api_bp.route("/integrity/status", methods=["GET"])
@limiter.limit(lambda: _get_admin_rate_limit())
def integrity_status():
principal, error = _require_admin()
if error:
return error
checker = _integrity()
if not checker:
return jsonify({"enabled": False, "message": "Integrity checker is not enabled. Set INTEGRITY_ENABLED=true to enable."})
return jsonify(checker.get_status())
@admin_api_bp.route("/integrity/run", methods=["POST"])
@limiter.limit(lambda: _get_admin_rate_limit())
def integrity_run_now():
principal, error = _require_admin()
if error:
return error
checker = _integrity()
if not checker:
return _json_error("InvalidRequest", "Integrity checker is not enabled", 400)
payload = request.get_json(silent=True) or {}
override_dry_run = payload.get("dry_run")
override_auto_heal = payload.get("auto_heal")
started = checker.run_async(
auto_heal=override_auto_heal if override_auto_heal is not None else None,
dry_run=override_dry_run if override_dry_run is not None else None,
)
logger.info("Integrity manual run by %s", principal.access_key)
if not started:
return _json_error("Conflict", "A scan is already in progress", 409)
return jsonify({"status": "started"})
@admin_api_bp.route("/integrity/history", methods=["GET"])
@limiter.limit(lambda: _get_admin_rate_limit())
def integrity_history():
principal, error = _require_admin()
if error:
return error
checker = _integrity()
if not checker:
return jsonify({"executions": []})
limit = min(int(request.args.get("limit", 50)), 200)
offset = int(request.args.get("offset", 0))
records = checker.get_history(limit=limit, offset=offset)
return jsonify({"executions": records})


@@ -25,7 +25,7 @@ def _calculate_auto_connection_limit() -> int:
def _calculate_auto_backlog(connection_limit: int) -> int:
return max(64, min(connection_limit * 2, 4096))
return max(128, min(connection_limit * 2, 4096))
def _validate_rate_limit(value: str) -> str:
@@ -115,6 +115,7 @@ class AppConfig:
server_connection_limit: int
server_backlog: int
server_channel_timeout: int
server_max_buffer_size: int
server_threads_auto: bool
server_connection_limit_auto: bool
server_backlog_auto: bool
@@ -135,6 +136,7 @@ class AppConfig:
site_sync_clock_skew_tolerance_seconds: float
object_key_max_length_bytes: int
object_cache_max_size: int
meta_read_cache_max: int
bucket_config_cache_ttl_seconds: float
object_tag_limit: int
encryption_chunk_size_bytes: int
@@ -156,6 +158,13 @@ class AppConfig:
gc_multipart_max_age_days: int
gc_lock_file_max_age_hours: float
gc_dry_run: bool
gc_io_throttle_ms: int
integrity_enabled: bool
integrity_interval_hours: float
integrity_batch_size: int
integrity_auto_heal: bool
integrity_dry_run: bool
integrity_io_throttle_ms: int
@classmethod
def from_env(cls, overrides: Optional[Dict[str, Any]] = None) -> "AppConfig":
@@ -288,6 +297,7 @@ class AppConfig:
server_backlog_auto = False
server_channel_timeout = int(_get("SERVER_CHANNEL_TIMEOUT", 120))
server_max_buffer_size = int(_get("SERVER_MAX_BUFFER_SIZE", 1024 * 1024 * 128))
site_sync_enabled = str(_get("SITE_SYNC_ENABLED", "0")).lower() in {"1", "true", "yes", "on"}
site_sync_interval_seconds = int(_get("SITE_SYNC_INTERVAL_SECONDS", 60))
site_sync_batch_size = int(_get("SITE_SYNC_BATCH_SIZE", 100))
@@ -306,6 +316,7 @@ class AppConfig:
site_sync_clock_skew_tolerance_seconds = float(_get("SITE_SYNC_CLOCK_SKEW_TOLERANCE_SECONDS", 1.0))
object_key_max_length_bytes = int(_get("OBJECT_KEY_MAX_LENGTH_BYTES", 1024))
object_cache_max_size = int(_get("OBJECT_CACHE_MAX_SIZE", 100))
meta_read_cache_max = int(_get("META_READ_CACHE_MAX", 2048))
bucket_config_cache_ttl_seconds = float(_get("BUCKET_CONFIG_CACHE_TTL_SECONDS", 30.0))
object_tag_limit = int(_get("OBJECT_TAG_LIMIT", 50))
encryption_chunk_size_bytes = int(_get("ENCRYPTION_CHUNK_SIZE_BYTES", 64 * 1024))
@@ -331,6 +342,13 @@ class AppConfig:
gc_multipart_max_age_days = int(_get("GC_MULTIPART_MAX_AGE_DAYS", 7))
gc_lock_file_max_age_hours = float(_get("GC_LOCK_FILE_MAX_AGE_HOURS", 1.0))
gc_dry_run = str(_get("GC_DRY_RUN", "0")).lower() in {"1", "true", "yes", "on"}
gc_io_throttle_ms = int(_get("GC_IO_THROTTLE_MS", 10))
integrity_enabled = str(_get("INTEGRITY_ENABLED", "0")).lower() in {"1", "true", "yes", "on"}
integrity_interval_hours = float(_get("INTEGRITY_INTERVAL_HOURS", 24.0))
integrity_batch_size = int(_get("INTEGRITY_BATCH_SIZE", 1000))
integrity_auto_heal = str(_get("INTEGRITY_AUTO_HEAL", "0")).lower() in {"1", "true", "yes", "on"}
integrity_dry_run = str(_get("INTEGRITY_DRY_RUN", "0")).lower() in {"1", "true", "yes", "on"}
integrity_io_throttle_ms = int(_get("INTEGRITY_IO_THROTTLE_MS", 10))
return cls(storage_root=storage_root,
max_upload_size=max_upload_size,
@@ -384,6 +402,7 @@ class AppConfig:
server_connection_limit=server_connection_limit,
server_backlog=server_backlog,
server_channel_timeout=server_channel_timeout,
server_max_buffer_size=server_max_buffer_size,
server_threads_auto=server_threads_auto,
server_connection_limit_auto=server_connection_limit_auto,
server_backlog_auto=server_backlog_auto,
@@ -404,6 +423,7 @@ class AppConfig:
site_sync_clock_skew_tolerance_seconds=site_sync_clock_skew_tolerance_seconds,
object_key_max_length_bytes=object_key_max_length_bytes,
object_cache_max_size=object_cache_max_size,
meta_read_cache_max=meta_read_cache_max,
bucket_config_cache_ttl_seconds=bucket_config_cache_ttl_seconds,
object_tag_limit=object_tag_limit,
encryption_chunk_size_bytes=encryption_chunk_size_bytes,
@@ -424,7 +444,14 @@ class AppConfig:
gc_temp_file_max_age_hours=gc_temp_file_max_age_hours,
gc_multipart_max_age_days=gc_multipart_max_age_days,
gc_lock_file_max_age_hours=gc_lock_file_max_age_hours,
gc_dry_run=gc_dry_run)
gc_dry_run=gc_dry_run,
gc_io_throttle_ms=gc_io_throttle_ms,
integrity_enabled=integrity_enabled,
integrity_interval_hours=integrity_interval_hours,
integrity_batch_size=integrity_batch_size,
integrity_auto_heal=integrity_auto_heal,
integrity_dry_run=integrity_dry_run,
integrity_io_throttle_ms=integrity_io_throttle_ms)
def validate_and_report(self) -> list[str]:
"""Validate configuration and return a list of warnings/issues.
@@ -489,10 +516,12 @@ class AppConfig:
issues.append(f"CRITICAL: SERVER_THREADS={self.server_threads} is outside valid range (1-64). Server cannot start.")
if not (10 <= self.server_connection_limit <= 1000):
issues.append(f"CRITICAL: SERVER_CONNECTION_LIMIT={self.server_connection_limit} is outside valid range (10-1000). Server cannot start.")
if not (64 <= self.server_backlog <= 4096):
issues.append(f"CRITICAL: SERVER_BACKLOG={self.server_backlog} is outside valid range (64-4096). Server cannot start.")
if not (128 <= self.server_backlog <= 4096):
issues.append(f"CRITICAL: SERVER_BACKLOG={self.server_backlog} is outside valid range (128-4096). Server cannot start.")
if not (10 <= self.server_channel_timeout <= 300):
issues.append(f"CRITICAL: SERVER_CHANNEL_TIMEOUT={self.server_channel_timeout} is outside valid range (10-300). Server cannot start.")
if self.server_max_buffer_size < 1024 * 1024:
issues.append(f"WARNING: SERVER_MAX_BUFFER_SIZE={self.server_max_buffer_size} is less than 1MB. Large uploads will fail.")
if sys.platform != "win32":
try:
@@ -538,6 +567,7 @@ class AppConfig:
print(f" CONNECTION_LIMIT: {self.server_connection_limit}{_auto(self.server_connection_limit_auto)}")
print(f" BACKLOG: {self.server_backlog}{_auto(self.server_backlog_auto)}")
print(f" CHANNEL_TIMEOUT: {self.server_channel_timeout}s")
print(f" MAX_BUFFER_SIZE: {self.server_max_buffer_size // (1024 * 1024)}MB")
print("=" * 60)
issues = self.validate_and_report()
@@ -603,6 +633,7 @@ class AppConfig:
"SERVER_CONNECTION_LIMIT": self.server_connection_limit,
"SERVER_BACKLOG": self.server_backlog,
"SERVER_CHANNEL_TIMEOUT": self.server_channel_timeout,
"SERVER_MAX_BUFFER_SIZE": self.server_max_buffer_size,
"SITE_SYNC_ENABLED": self.site_sync_enabled,
"SITE_SYNC_INTERVAL_SECONDS": self.site_sync_interval_seconds,
"SITE_SYNC_BATCH_SIZE": self.site_sync_batch_size,
@@ -620,6 +651,7 @@ class AppConfig:
"SITE_SYNC_CLOCK_SKEW_TOLERANCE_SECONDS": self.site_sync_clock_skew_tolerance_seconds,
"OBJECT_KEY_MAX_LENGTH_BYTES": self.object_key_max_length_bytes,
"OBJECT_CACHE_MAX_SIZE": self.object_cache_max_size,
"META_READ_CACHE_MAX": self.meta_read_cache_max,
"BUCKET_CONFIG_CACHE_TTL_SECONDS": self.bucket_config_cache_ttl_seconds,
"OBJECT_TAG_LIMIT": self.object_tag_limit,
"ENCRYPTION_CHUNK_SIZE_BYTES": self.encryption_chunk_size_bytes,
@@ -641,4 +673,11 @@ class AppConfig:
"GC_MULTIPART_MAX_AGE_DAYS": self.gc_multipart_max_age_days,
"GC_LOCK_FILE_MAX_AGE_HOURS": self.gc_lock_file_max_age_hours,
"GC_DRY_RUN": self.gc_dry_run,
"GC_IO_THROTTLE_MS": self.gc_io_throttle_ms,
"INTEGRITY_ENABLED": self.integrity_enabled,
"INTEGRITY_INTERVAL_HOURS": self.integrity_interval_hours,
"INTEGRITY_BATCH_SIZE": self.integrity_batch_size,
"INTEGRITY_AUTO_HEAL": self.integrity_auto_heal,
"INTEGRITY_DRY_RUN": self.integrity_dry_run,
"INTEGRITY_IO_THROTTLE_MS": self.integrity_io_throttle_ms,
}


@@ -193,6 +193,9 @@ class EncryptedObjectStorage:
def list_objects_shallow(self, bucket_name: str, **kwargs):
return self.storage.list_objects_shallow(bucket_name, **kwargs)
def iter_objects_shallow(self, bucket_name: str, **kwargs):
return self.storage.iter_objects_shallow(bucket_name, **kwargs)
def search_objects(self, bucket_name: str, query: str, **kwargs):
return self.storage.search_objects(bucket_name, query, **kwargs)


@@ -21,6 +21,10 @@ if sys.platform != "win32":
try:
import myfsio_core as _rc
if not all(hasattr(_rc, f) for f in (
"encrypt_stream_chunked", "decrypt_stream_chunked",
)):
raise ImportError("myfsio_core is outdated, rebuild with: cd myfsio_core && maturin develop --release")
_HAS_RUST = True
except ImportError:
_rc = None


@@ -175,13 +175,21 @@ def handle_app_error(error: AppError) -> Response:
def handle_rate_limit_exceeded(e: RateLimitExceeded) -> Response:
g.s3_error_code = "SlowDown"
if request.path.startswith("/ui") or request.path.startswith("/buckets"):
wants_json = (
request.is_json or
request.headers.get("X-Requested-With") == "XMLHttpRequest" or
"application/json" in request.accept_mimetypes.values()
)
if wants_json:
return jsonify({"success": False, "error": {"code": "SlowDown", "message": "Please reduce your request rate."}}), 429
error = Element("Error")
SubElement(error, "Code").text = "SlowDown"
SubElement(error, "Message").text = "Please reduce your request rate."
SubElement(error, "Resource").text = request.path
SubElement(error, "RequestId").text = getattr(g, "request_id", "")
xml_bytes = tostring(error, encoding="utf-8")
return Response(xml_bytes, status=429, mimetype="application/xml")
return Response(xml_bytes, status="429 Too Many Requests", mimetype="application/xml")
def register_error_handlers(app):


@@ -162,6 +162,7 @@ class GarbageCollector:
lock_file_max_age_hours: float = 1.0,
dry_run: bool = False,
max_history: int = 50,
io_throttle_ms: int = 10,
) -> None:
self.storage_root = Path(storage_root)
self.interval_seconds = interval_hours * 3600.0
@@ -172,6 +173,9 @@ class GarbageCollector:
self._timer: Optional[threading.Timer] = None
self._shutdown = False
self._lock = threading.Lock()
self._scanning = False
self._scan_start_time: Optional[float] = None
self._io_throttle = max(0, io_throttle_ms) / 1000.0
self.history_store = GCHistoryStore(storage_root, max_records=max_history)
def start(self) -> None:
@@ -212,16 +216,30 @@ class GarbageCollector:
finally:
self._schedule_next()
def run_now(self) -> GCResult:
start = time.time()
def run_now(self, dry_run: Optional[bool] = None) -> GCResult:
if not self._lock.acquire(blocking=False):
raise RuntimeError("GC is already in progress")
effective_dry_run = dry_run if dry_run is not None else self.dry_run
try:
self._scanning = True
self._scan_start_time = time.time()
start = self._scan_start_time
result = GCResult()
original_dry_run = self.dry_run
self.dry_run = effective_dry_run
try:
self._clean_temp_files(result)
self._clean_orphaned_multipart(result)
self._clean_stale_locks(result)
self._clean_orphaned_metadata(result)
self._clean_orphaned_versions(result)
self._clean_empty_dirs(result)
finally:
self.dry_run = original_dry_run
result.execution_time_seconds = time.time() - start
@@ -240,21 +258,39 @@ class GarbageCollector:
result.orphaned_version_bytes_freed / (1024 * 1024),
result.empty_dirs_removed,
len(result.errors),
" (dry run)" if self.dry_run else "",
" (dry run)" if effective_dry_run else "",
)
record = GCExecutionRecord(
timestamp=time.time(),
result=result.to_dict(),
dry_run=self.dry_run,
dry_run=effective_dry_run,
)
self.history_store.add(record)
return result
finally:
self._scanning = False
self._scan_start_time = None
self._lock.release()
def run_async(self, dry_run: Optional[bool] = None) -> bool:
if self._scanning:
return False
t = threading.Thread(target=self.run_now, args=(dry_run,), daemon=True)
t.start()
return True
def _system_path(self) -> Path:
return self.storage_root / self.SYSTEM_ROOT
def _throttle(self) -> bool:
if self._shutdown:
return True
if self._io_throttle > 0:
time.sleep(self._io_throttle)
return self._shutdown
def _list_bucket_names(self) -> List[str]:
names = []
try:
@@ -271,6 +307,8 @@ class GarbageCollector:
return
try:
for entry in tmp_dir.iterdir():
if self._throttle():
return
if not entry.is_file():
continue
age = _file_age_hours(entry)
@@ -292,6 +330,8 @@ class GarbageCollector:
bucket_names = self._list_bucket_names()
for bucket_name in bucket_names:
if self._shutdown:
return
for multipart_root in (
self._system_path() / self.SYSTEM_MULTIPART_DIR / bucket_name,
self.storage_root / bucket_name / ".multipart",
@@ -300,6 +340,8 @@ class GarbageCollector:
continue
try:
for upload_dir in multipart_root.iterdir():
if self._throttle():
return
if not upload_dir.is_dir():
continue
self._maybe_clean_upload(upload_dir, cutoff_hours, result)
@@ -329,6 +371,8 @@ class GarbageCollector:
try:
for bucket_dir in buckets_root.iterdir():
if self._shutdown:
return
if not bucket_dir.is_dir():
continue
locks_dir = bucket_dir / "locks"
@@ -336,6 +380,8 @@ class GarbageCollector:
continue
try:
for lock_file in locks_dir.iterdir():
if self._throttle():
return
if not lock_file.is_file() or not lock_file.name.endswith(".lock"):
continue
age = _file_age_hours(lock_file)
@@ -356,6 +402,8 @@ class GarbageCollector:
bucket_names = self._list_bucket_names()
for bucket_name in bucket_names:
if self._shutdown:
return
legacy_meta = self.storage_root / bucket_name / ".meta"
if legacy_meta.exists():
self._clean_legacy_metadata(bucket_name, legacy_meta, result)
@@ -368,6 +416,8 @@ class GarbageCollector:
bucket_path = self.storage_root / bucket_name
try:
for meta_file in meta_root.rglob("*.meta.json"):
if self._throttle():
return
if not meta_file.is_file():
continue
try:
@@ -387,6 +437,8 @@ class GarbageCollector:
bucket_path = self.storage_root / bucket_name
try:
for index_file in meta_root.rglob("_index.json"):
if self._throttle():
return
if not index_file.is_file():
continue
try:
@@ -430,6 +482,8 @@ class GarbageCollector:
bucket_names = self._list_bucket_names()
for bucket_name in bucket_names:
if self._shutdown:
return
bucket_path = self.storage_root / bucket_name
for versions_root in (
self._system_path() / self.SYSTEM_BUCKETS_DIR / bucket_name / self.BUCKET_VERSIONS_DIR,
@@ -439,6 +493,8 @@ class GarbageCollector:
continue
try:
for key_dir in versions_root.iterdir():
if self._throttle():
return
if not key_dir.is_dir():
continue
self._clean_versions_for_key(bucket_path, versions_root, key_dir, result)
@@ -489,6 +545,8 @@ class GarbageCollector:
self._remove_empty_dirs_recursive(root, root, result)
def _remove_empty_dirs_recursive(self, path: Path, stop_at: Path, result: GCResult) -> bool:
if self._shutdown:
return False
if not path.is_dir():
return False
@@ -499,6 +557,8 @@ class GarbageCollector:
all_empty = True
for child in children:
if self._throttle():
return False
if child.is_dir():
if not self._remove_empty_dirs_recursive(child, stop_at, result):
all_empty = False
@@ -520,12 +580,17 @@ class GarbageCollector:
return [r.to_dict() for r in records]
def get_status(self) -> dict:
return {
status: Dict[str, Any] = {
"enabled": not self._shutdown or self._timer is not None,
"running": self._timer is not None and not self._shutdown,
"scanning": self._scanning,
"interval_hours": self.interval_seconds / 3600.0,
"temp_file_max_age_hours": self.temp_file_max_age_hours,
"multipart_max_age_days": self.multipart_max_age_days,
"lock_file_max_age_hours": self.lock_file_max_age_hours,
"dry_run": self.dry_run,
"io_throttle_ms": round(self._io_throttle * 1000),
}
if self._scanning and self._scan_start_time:
status["scan_elapsed_seconds"] = time.time() - self._scan_start_time
return status


@@ -10,7 +10,7 @@ import secrets
import threading
import time
from collections import deque
from dataclasses import dataclass
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Any, Deque, Dict, Iterable, List, Optional, Sequence, Set, Tuple
@@ -22,16 +22,37 @@ class IamError(RuntimeError):
"""Raised when authentication or authorization fails."""
S3_ACTIONS = {"list", "read", "write", "delete", "share", "policy", "replication", "lifecycle", "cors"}
S3_ACTIONS = {
"list", "read", "write", "delete", "share", "policy",
"replication", "lifecycle", "cors",
"create_bucket", "delete_bucket",
"versioning", "tagging", "encryption", "quota",
"object_lock", "notification", "logging", "website",
}
IAM_ACTIONS = {
"iam:list_users",
"iam:create_user",
"iam:delete_user",
"iam:rotate_key",
"iam:update_policy",
"iam:create_key",
"iam:delete_key",
"iam:get_user",
"iam:get_policy",
"iam:disable_user",
}
ALLOWED_ACTIONS = (S3_ACTIONS | IAM_ACTIONS) | {"iam:*"}
_V1_IMPLIED_ACTIONS = {
"write": {"create_bucket"},
"delete": {"delete_bucket"},
"policy": {
"versioning", "tagging", "encryption", "quota",
"object_lock", "notification", "logging", "website",
"cors", "lifecycle", "replication", "share",
},
}
ACTION_ALIASES = {
"list": "list",
"s3:listbucket": "list",
@@ -45,14 +66,11 @@ ACTION_ALIASES = {
"s3:getobjecttagging": "read",
"s3:getobjectversiontagging": "read",
"s3:getobjectacl": "read",
"s3:getbucketversioning": "read",
"s3:headobject": "read",
"s3:headbucket": "read",
"write": "write",
"s3:putobject": "write",
"s3:createbucket": "write",
"s3:putobjecttagging": "write",
"s3:putbucketversioning": "write",
"s3:createmultipartupload": "write",
"s3:uploadpart": "write",
"s3:completemultipartupload": "write",
@@ -61,8 +79,11 @@ ACTION_ALIASES = {
"delete": "delete",
"s3:deleteobject": "delete",
"s3:deleteobjectversion": "delete",
"s3:deletebucket": "delete",
"s3:deleteobjecttagging": "delete",
"create_bucket": "create_bucket",
"s3:createbucket": "create_bucket",
"delete_bucket": "delete_bucket",
"s3:deletebucket": "delete_bucket",
"share": "share",
"s3:putobjectacl": "share",
"s3:putbucketacl": "share",
@@ -88,11 +109,50 @@ ACTION_ALIASES = {
"s3:getbucketcors": "cors",
"s3:putbucketcors": "cors",
"s3:deletebucketcors": "cors",
"versioning": "versioning",
"s3:getbucketversioning": "versioning",
"s3:putbucketversioning": "versioning",
"tagging": "tagging",
"s3:getbuckettagging": "tagging",
"s3:putbuckettagging": "tagging",
"s3:deletebuckettagging": "tagging",
"encryption": "encryption",
"s3:getencryptionconfiguration": "encryption",
"s3:putencryptionconfiguration": "encryption",
"s3:deleteencryptionconfiguration": "encryption",
"quota": "quota",
"s3:getbucketquota": "quota",
"s3:putbucketquota": "quota",
"s3:deletebucketquota": "quota",
"object_lock": "object_lock",
"s3:getobjectlockconfiguration": "object_lock",
"s3:putobjectlockconfiguration": "object_lock",
"s3:putobjectretention": "object_lock",
"s3:getobjectretention": "object_lock",
"s3:putobjectlegalhold": "object_lock",
"s3:getobjectlegalhold": "object_lock",
"notification": "notification",
"s3:getbucketnotificationconfiguration": "notification",
"s3:putbucketnotificationconfiguration": "notification",
"s3:deletebucketnotificationconfiguration": "notification",
"logging": "logging",
"s3:getbucketlogging": "logging",
"s3:putbucketlogging": "logging",
"s3:deletebucketlogging": "logging",
"website": "website",
"s3:getbucketwebsite": "website",
"s3:putbucketwebsite": "website",
"s3:deletebucketwebsite": "website",
"iam:listusers": "iam:list_users",
"iam:createuser": "iam:create_user",
"iam:deleteuser": "iam:delete_user",
"iam:rotateaccesskey": "iam:rotate_key",
"iam:putuserpolicy": "iam:update_policy",
"iam:createaccesskey": "iam:create_key",
"iam:deleteaccesskey": "iam:delete_key",
"iam:getuser": "iam:get_user",
"iam:getpolicy": "iam:get_policy",
"iam:disableuser": "iam:disable_user",
"iam:*": "iam:*",
}
@@ -101,6 +161,7 @@ ACTION_ALIASES = {
class Policy:
bucket: str
actions: Set[str]
prefix: str = "*"
@dataclass
@@ -117,6 +178,16 @@ def _derive_fernet_key(secret: str) -> bytes:
_IAM_ENCRYPTED_PREFIX = b"MYFSIO_IAM_ENC:"
_CONFIG_VERSION = 2
def _expand_v1_actions(actions: Set[str]) -> Set[str]:
expanded = set(actions)
for action, implied in _V1_IMPLIED_ACTIONS.items():
if action in expanded:
expanded.update(implied)
return expanded
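
# Worked example of the expansion above: a legacy v1 policy that granted the
# coarse "write" and "policy" actions picks up the bucket-level actions that
# the v2 scheme splits out as separate permissions.
assert _expand_v1_actions({"write"}) == {"write", "create_bucket"}
assert "versioning" in _expand_v1_actions({"policy"})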
class IamService:
"""Loads IAM configuration, manages users, and evaluates policies."""
@@ -131,7 +202,10 @@ class IamService:
self.config_path.parent.mkdir(parents=True, exist_ok=True)
if not self.config_path.exists():
self._write_default()
self._users: Dict[str, Dict[str, Any]] = {}
self._user_records: Dict[str, Dict[str, Any]] = {}
self._key_index: Dict[str, str] = {}
self._key_secrets: Dict[str, str] = {}
self._key_status: Dict[str, str] = {}
self._raw_config: Dict[str, Any] = {}
self._failed_attempts: Dict[str, Deque[datetime]] = {}
self._last_load_time = 0.0
@@ -146,7 +220,6 @@ class IamService:
self._load_lockout_state()
def _maybe_reload(self) -> None:
"""Reload configuration if the file has changed on disk."""
now = time.time()
if now - self._last_stat_check < self._stat_check_interval:
return
@@ -183,11 +256,20 @@ class IamService:
raise IamError(
f"Access temporarily locked. Try again in {seconds} seconds."
)
record = self._users.get(access_key)
stored_secret = record["secret_key"] if record else secrets.token_urlsafe(24)
if not record or not hmac.compare_digest(stored_secret, secret_key):
user_id = self._key_index.get(access_key)
stored_secret = self._key_secrets.get(access_key, secrets.token_urlsafe(24))
if not user_id or not hmac.compare_digest(stored_secret, secret_key):
self._record_failed_attempt(access_key)
raise IamError("Invalid credentials")
key_status = self._key_status.get(access_key, "active")
if key_status != "active":
raise IamError("Access key is inactive")
record = self._user_records.get(user_id)
if not record:
self._record_failed_attempt(access_key)
raise IamError("Invalid credentials")
if not record.get("enabled", True):
raise IamError("User account is disabled")
self._check_expiry(access_key, record)
self._clear_failed_attempts(access_key)
return self._build_principal(access_key, record)
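
# The fallback to a random dummy secret above keeps the credential check
# constant-time: hmac.compare_digest always runs, whether or not the access
# key exists, so response timing does not leak key existence. A
# self-contained sketch of the same pattern (names here are illustrative):
import hmac
import secrets

def timing_safe_check(known_secrets: dict[str, str], access_key: str, secret_key: str) -> bool:
    # Unknown keys are compared against a throwaway secret so the digest
    # comparison executes on every call.
    stored = known_secrets.get(access_key, secrets.token_urlsafe(24))
    return hmac.compare_digest(stored, secret_key) and access_key in known_secrets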
@@ -215,7 +297,6 @@ class IamService:
return self.config_path.parent / "lockout_state.json"
def _load_lockout_state(self) -> None:
"""Load lockout state from disk."""
try:
if self._lockout_file().exists():
data = json.loads(self._lockout_file().read_text(encoding="utf-8"))
@@ -235,7 +316,6 @@ class IamService:
pass
def _save_lockout_state(self) -> None:
"""Persist lockout state to disk."""
data: Dict[str, Any] = {"failed_attempts": {}}
for key, attempts in self._failed_attempts.items():
data["failed_attempts"][key] = [ts.isoformat() for ts in attempts]
@@ -270,10 +350,9 @@ class IamService:
return int(max(0, self.auth_lockout_window.total_seconds() - elapsed))
def create_session_token(self, access_key: str, duration_seconds: int = 3600) -> str:
"""Create a temporary session token for an access key."""
self._maybe_reload()
record = self._users.get(access_key)
if not record:
user_id = self._key_index.get(access_key)
if not user_id or user_id not in self._user_records:
raise IamError("Unknown access key")
self._cleanup_expired_sessions()
token = secrets.token_urlsafe(32)
@@ -285,7 +364,6 @@ class IamService:
return token
def validate_session_token(self, access_key: str, session_token: str) -> bool:
"""Validate a session token for an access key (thread-safe, constant-time)."""
dummy_key = secrets.token_urlsafe(16)
dummy_token = secrets.token_urlsafe(32)
with self._session_lock:
@@ -304,7 +382,6 @@ class IamService:
return True
def _cleanup_expired_sessions(self) -> None:
"""Remove expired session tokens."""
now = time.time()
expired = [token for token, data in self._sessions.items() if now > data["expires_at"]]
for token in expired:
@@ -316,13 +393,20 @@ class IamService:
if cached:
principal, cached_time = cached
if now - cached_time < self._cache_ttl:
record = self._users.get(access_key)
user_id = self._key_index.get(access_key)
if user_id:
record = self._user_records.get(user_id)
if record:
self._check_expiry(access_key, record)
self._enforce_key_and_user_status(access_key)
return principal
self._maybe_reload()
record = self._users.get(access_key)
self._enforce_key_and_user_status(access_key)
user_id = self._key_index.get(access_key)
if not user_id:
raise IamError("Unknown access key")
record = self._user_records.get(user_id)
if not record:
raise IamError("Unknown access key")
self._check_expiry(access_key, record)
@@ -332,22 +416,27 @@ class IamService:
def secret_for_key(self, access_key: str) -> str:
self._maybe_reload()
record = self._users.get(access_key)
if not record:
self._enforce_key_and_user_status(access_key)
secret = self._key_secrets.get(access_key)
if not secret:
raise IamError("Unknown access key")
user_id = self._key_index.get(access_key)
if user_id:
record = self._user_records.get(user_id)
if record:
self._check_expiry(access_key, record)
return record["secret_key"]
return secret
def authorize(self, principal: Principal, bucket_name: str | None, action: str) -> None:
def authorize(self, principal: Principal, bucket_name: str | None, action: str, *, object_key: str | None = None) -> None:
action = self._normalize_action(action)
if action not in ALLOWED_ACTIONS:
raise IamError(f"Unknown action '{action}'")
bucket_name = bucket_name or "*"
normalized = bucket_name.lower() if bucket_name != "*" else bucket_name
if not self._is_allowed(principal, normalized, action):
if not self._is_allowed(principal, normalized, action, object_key=object_key):
raise IamError(f"Access denied for action '{action}' on bucket '{bucket_name}'")
def check_permissions(self, principal: Principal, bucket_name: str | None, actions: Iterable[str]) -> Dict[str, bool]:
def check_permissions(self, principal: Principal, bucket_name: str | None, actions: Iterable[str], *, object_key: str | None = None) -> Dict[str, bool]:
self._maybe_reload()
bucket_name = (bucket_name or "*").lower() if bucket_name != "*" else (bucket_name or "*")
normalized_actions = {a: self._normalize_action(a) for a in actions}
@@ -356,37 +445,53 @@ class IamService:
if canonical not in ALLOWED_ACTIONS:
results[original] = False
else:
results[original] = self._is_allowed(principal, bucket_name, canonical)
results[original] = self._is_allowed(principal, bucket_name, canonical, object_key=object_key)
return results
def buckets_for_principal(self, principal: Principal, buckets: Iterable[str]) -> List[str]:
return [bucket for bucket in buckets if self._is_allowed(principal, bucket, "list")]
def _is_allowed(self, principal: Principal, bucket_name: str, action: str) -> bool:
def _is_allowed(self, principal: Principal, bucket_name: str, action: str, *, object_key: str | None = None) -> bool:
bucket_name = bucket_name.lower()
for policy in principal.policies:
if policy.bucket not in {"*", bucket_name}:
continue
if "*" in policy.actions or action in policy.actions:
return True
if "iam:*" in policy.actions and action.startswith("iam:"):
action_match = "*" in policy.actions or action in policy.actions
if not action_match and "iam:*" in policy.actions and action.startswith("iam:"):
action_match = True
if not action_match:
continue
if object_key is not None and policy.prefix != "*":
prefix = policy.prefix.rstrip("*")
if not object_key.startswith(prefix):
continue
return True
return False
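
# Worked example of the prefix gate above, using the Policy dataclass from
# the top of this file: a grant scoped to prefix "logs/*" on bucket "data"
# only matches object keys under logs/.
policy = Policy(bucket="data", actions={"read"}, prefix="logs/*")
# object_key="logs/2026/app.log"           -> startswith("logs/") -> allowed
# object_key="private/note.txt"            -> prefix mismatch     -> policy skipped
# object_key=None (bucket-level operation) -> prefix check bypassed -> allowed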
def list_users(self) -> List[Dict[str, Any]]:
listing: List[Dict[str, Any]] = []
for access_key, record in self._users.items():
listing.append(
{
"access_key": access_key,
for user_id, record in self._user_records.items():
access_keys = []
for key_info in record.get("access_keys", []):
access_keys.append({
"access_key": key_info["access_key"],
"status": key_info.get("status", "active"),
"created_at": key_info.get("created_at"),
})
user_entry: Dict[str, Any] = {
"user_id": user_id,
"display_name": record["display_name"],
"enabled": record.get("enabled", True),
"expires_at": record.get("expires_at"),
"access_keys": access_keys,
"policies": [
{"bucket": policy.bucket, "actions": sorted(policy.actions)}
{**{"bucket": policy.bucket, "actions": sorted(policy.actions)}, **({"prefix": policy.prefix} if policy.prefix != "*" else {})}
for policy in record["policies"]
],
}
)
if access_keys:
user_entry["access_key"] = access_keys[0]["access_key"]
listing.append(user_entry)
return listing
def create_user(
@@ -397,20 +502,33 @@ class IamService:
access_key: str | None = None,
secret_key: str | None = None,
expires_at: str | None = None,
user_id: str | None = None,
) -> Dict[str, str]:
access_key = (access_key or self._generate_access_key()).strip()
if not access_key:
raise IamError("Access key cannot be empty")
if access_key in self._users:
if access_key in self._key_index:
raise IamError("Access key already exists")
if expires_at:
self._validate_expires_at(expires_at)
secret_key = secret_key or self._generate_secret_key()
sanitized_policies = self._prepare_policy_payload(policies)
user_id = user_id or self._generate_user_id()
if user_id in self._user_records:
raise IamError("User ID already exists")
now_iso = datetime.now(timezone.utc).isoformat()
record: Dict[str, Any] = {
"user_id": user_id,
"display_name": display_name or access_key,
"enabled": True,
"access_keys": [
{
"access_key": access_key,
"secret_key": secret_key,
"display_name": display_name or access_key,
"status": "active",
"created_at": now_iso,
}
],
"policies": sanitized_policies,
}
if expires_at:
@@ -418,12 +536,108 @@ class IamService:
self._raw_config.setdefault("users", []).append(record)
self._save()
self._load()
return {"access_key": access_key, "secret_key": secret_key}
return {"user_id": user_id, "access_key": access_key, "secret_key": secret_key}
def create_access_key(self, identifier: str) -> Dict[str, str]:
user_raw, _ = self._resolve_raw_user(identifier)
new_access_key = self._generate_access_key()
new_secret_key = self._generate_secret_key()
now_iso = datetime.now(timezone.utc).isoformat()
key_entry = {
"access_key": new_access_key,
"secret_key": new_secret_key,
"status": "active",
"created_at": now_iso,
}
user_raw.setdefault("access_keys", []).append(key_entry)
self._save()
self._load()
return {"access_key": new_access_key, "secret_key": new_secret_key}
def delete_access_key(self, access_key: str) -> None:
user_raw, _ = self._resolve_raw_user(access_key)
keys = user_raw.get("access_keys", [])
if len(keys) <= 1:
raise IamError("Cannot delete the only access key for a user")
remaining = [k for k in keys if k["access_key"] != access_key]
if len(remaining) == len(keys):
raise IamError("Access key not found")
user_raw["access_keys"] = remaining
self._save()
self._principal_cache.pop(access_key, None)
self._secret_key_cache.pop(access_key, None)
from .s3_api import clear_signing_key_cache
clear_signing_key_cache()
self._load()
def disable_user(self, identifier: str) -> None:
user_raw, _ = self._resolve_raw_user(identifier)
user_raw["enabled"] = False
self._save()
for key_info in user_raw.get("access_keys", []):
ak = key_info["access_key"]
self._principal_cache.pop(ak, None)
self._secret_key_cache.pop(ak, None)
from .s3_api import clear_signing_key_cache
clear_signing_key_cache()
self._load()
def enable_user(self, identifier: str) -> None:
user_raw, _ = self._resolve_raw_user(identifier)
user_raw["enabled"] = True
self._save()
self._load()
def get_user_by_id(self, user_id: str) -> Dict[str, Any]:
record = self._user_records.get(user_id)
if not record:
raise IamError("User not found")
access_keys = []
for key_info in record.get("access_keys", []):
access_keys.append({
"access_key": key_info["access_key"],
"status": key_info.get("status", "active"),
"created_at": key_info.get("created_at"),
})
return {
"user_id": user_id,
"display_name": record["display_name"],
"enabled": record.get("enabled", True),
"expires_at": record.get("expires_at"),
"access_keys": access_keys,
"policies": [
{"bucket": p.bucket, "actions": sorted(p.actions), "prefix": p.prefix}
for p in record["policies"]
],
}
def get_user_policies(self, identifier: str) -> List[Dict[str, Any]]:
_, user_id = self._resolve_raw_user(identifier)
record = self._user_records.get(user_id)
if not record:
raise IamError("User not found")
return [
{**{"bucket": p.bucket, "actions": sorted(p.actions)}, **({"prefix": p.prefix} if p.prefix != "*" else {})}
for p in record["policies"]
]
def resolve_user_id(self, identifier: str) -> str:
if identifier in self._user_records:
return identifier
user_id = self._key_index.get(identifier)
if user_id:
return user_id
raise IamError("User not found")
def rotate_secret(self, access_key: str) -> str:
user = self._get_raw_user(access_key)
user_raw, _ = self._resolve_raw_user(access_key)
new_secret = self._generate_secret_key()
user["secret_key"] = new_secret
for key_info in user_raw.get("access_keys", []):
if key_info["access_key"] == access_key:
key_info["secret_key"] = new_secret
break
else:
raise IamError("Access key not found")
self._save()
self._principal_cache.pop(access_key, None)
self._secret_key_cache.pop(access_key, None)
@@ -433,8 +647,8 @@ class IamService:
return new_secret
def update_user(self, access_key: str, display_name: str) -> None:
user = self._get_raw_user(access_key)
user["display_name"] = display_name
user_raw, _ = self._resolve_raw_user(access_key)
user_raw["display_name"] = display_name
self._save()
self._load()
@@ -442,32 +656,43 @@ class IamService:
users = self._raw_config.get("users", [])
if len(users) <= 1:
raise IamError("Cannot delete the only user")
remaining = [user for user in users if user["access_key"] != access_key]
if len(remaining) == len(users):
_, target_user_id = self._resolve_raw_user(access_key)
target_user_raw = None
remaining = []
for u in users:
if u.get("user_id") == target_user_id:
target_user_raw = u
else:
remaining.append(u)
if target_user_raw is None:
raise IamError("User not found")
self._raw_config["users"] = remaining
self._save()
self._principal_cache.pop(access_key, None)
self._secret_key_cache.pop(access_key, None)
for key_info in target_user_raw.get("access_keys", []):
ak = key_info["access_key"]
self._principal_cache.pop(ak, None)
self._secret_key_cache.pop(ak, None)
from .s3_api import clear_signing_key_cache
clear_signing_key_cache()
self._load()
def update_user_expiry(self, access_key: str, expires_at: str | None) -> None:
user = self._get_raw_user(access_key)
user_raw, _ = self._resolve_raw_user(access_key)
if expires_at:
self._validate_expires_at(expires_at)
user["expires_at"] = expires_at
user_raw["expires_at"] = expires_at
else:
user.pop("expires_at", None)
user_raw.pop("expires_at", None)
self._save()
self._principal_cache.pop(access_key, None)
self._secret_key_cache.pop(access_key, None)
for key_info in user_raw.get("access_keys", []):
ak = key_info["access_key"]
self._principal_cache.pop(ak, None)
self._secret_key_cache.pop(ak, None)
self._load()
def update_user_policies(self, access_key: str, policies: Sequence[Dict[str, Any]]) -> None:
user = self._get_raw_user(access_key)
user["policies"] = self._prepare_policy_payload(policies)
user_raw, _ = self._resolve_raw_user(access_key)
user_raw["policies"] = self._prepare_policy_payload(policies)
self._save()
self._load()
@@ -482,6 +707,52 @@ class IamService:
raise IamError("Cannot decrypt IAM config. SECRET_KEY may have changed. Use 'python run.py reset-cred' to reset credentials.")
return raw_bytes.decode("utf-8")
def _is_v2_config(self, raw: Dict[str, Any]) -> bool:
return raw.get("version", 1) >= _CONFIG_VERSION
def _migrate_v1_to_v2(self, raw: Dict[str, Any]) -> Dict[str, Any]:
migrated_users = []
now_iso = datetime.now(timezone.utc).isoformat()
for user in raw.get("users", []):
old_policies = user.get("policies", [])
expanded_policies = []
for p in old_policies:
raw_actions = p.get("actions", [])
if isinstance(raw_actions, str):
raw_actions = [raw_actions]
action_set: Set[str] = set()
for a in raw_actions:
canonical = self._normalize_action(a)
if canonical == "*":
action_set = set(ALLOWED_ACTIONS)
break
if canonical:
action_set.add(canonical)
action_set = _expand_v1_actions(action_set)
expanded_policies.append({
"bucket": p.get("bucket", "*"),
"actions": sorted(action_set),
"prefix": p.get("prefix", "*"),
})
migrated_user: Dict[str, Any] = {
"user_id": user["access_key"],
"display_name": user.get("display_name", user["access_key"]),
"enabled": True,
"access_keys": [
{
"access_key": user["access_key"],
"secret_key": user["secret_key"],
"status": "active",
"created_at": now_iso,
}
],
"policies": expanded_policies,
}
if user.get("expires_at"):
migrated_user["expires_at"] = user["expires_at"]
migrated_users.append(migrated_user)
return {"version": _CONFIG_VERSION, "users": migrated_users}
def _load(self) -> None:
try:
self._last_load_time = self.config_path.stat().st_mtime
@@ -500,35 +771,67 @@ class IamService:
raise IamError(f"Failed to load IAM config: {e}")
was_plaintext = not raw_bytes.startswith(_IAM_ENCRYPTED_PREFIX)
was_v1 = not self._is_v2_config(raw)
if was_v1:
raw = self._migrate_v1_to_v2(raw)
user_records: Dict[str, Dict[str, Any]] = {}
key_index: Dict[str, str] = {}
key_secrets: Dict[str, str] = {}
key_status_map: Dict[str, str] = {}
users: Dict[str, Dict[str, Any]] = {}
for user in raw.get("users", []):
user_id = user["user_id"]
policies = self._build_policy_objects(user.get("policies", []))
user_record: Dict[str, Any] = {
"secret_key": user["secret_key"],
"display_name": user.get("display_name", user["access_key"]),
access_keys_raw = user.get("access_keys", [])
access_keys_info = []
for key_entry in access_keys_raw:
ak = key_entry["access_key"]
sk = key_entry["secret_key"]
status = key_entry.get("status", "active")
key_index[ak] = user_id
key_secrets[ak] = sk
key_status_map[ak] = status
access_keys_info.append({
"access_key": ak,
"secret_key": sk,
"status": status,
"created_at": key_entry.get("created_at"),
})
record: Dict[str, Any] = {
"display_name": user.get("display_name", user_id),
"enabled": user.get("enabled", True),
"policies": policies,
"access_keys": access_keys_info,
}
if user.get("expires_at"):
user_record["expires_at"] = user["expires_at"]
users[user["access_key"]] = user_record
if not users:
raise IamError("IAM configuration contains no users")
self._users = users
raw_users: List[Dict[str, Any]] = []
for entry in raw.get("users", []):
raw_entry: Dict[str, Any] = {
"access_key": entry["access_key"],
"secret_key": entry["secret_key"],
"display_name": entry.get("display_name", entry["access_key"]),
"policies": entry.get("policies", []),
}
if entry.get("expires_at"):
raw_entry["expires_at"] = entry["expires_at"]
raw_users.append(raw_entry)
self._raw_config = {"users": raw_users}
record["expires_at"] = user["expires_at"]
user_records[user_id] = record
if was_plaintext and self._fernet:
if not user_records:
raise IamError("IAM configuration contains no users")
self._user_records = user_records
self._key_index = key_index
self._key_secrets = key_secrets
self._key_status = key_status_map
raw_users: List[Dict[str, Any]] = []
for user in raw.get("users", []):
raw_entry: Dict[str, Any] = {
"user_id": user["user_id"],
"display_name": user.get("display_name", user["user_id"]),
"enabled": user.get("enabled", True),
"access_keys": user.get("access_keys", []),
"policies": user.get("policies", []),
}
if user.get("expires_at"):
raw_entry["expires_at"] = user["expires_at"]
raw_users.append(raw_entry)
self._raw_config = {"version": _CONFIG_VERSION, "users": raw_users}
if was_v1 or (was_plaintext and self._fernet):
self._save()
def _save(self) -> None:
@@ -547,19 +850,30 @@ class IamService:
def config_summary(self) -> Dict[str, Any]:
return {
"path": str(self.config_path),
"user_count": len(self._users),
"user_count": len(self._user_records),
"allowed_actions": sorted(ALLOWED_ACTIONS),
}
def export_config(self, mask_secrets: bool = True) -> Dict[str, Any]:
payload: Dict[str, Any] = {"users": []}
payload: Dict[str, Any] = {"version": _CONFIG_VERSION, "users": []}
for user in self._raw_config.get("users", []):
access_keys = []
for key_info in user.get("access_keys", []):
access_keys.append({
"access_key": key_info["access_key"],
"secret_key": "\u2022\u2022\u2022\u2022\u2022\u2022\u2022\u2022\u2022\u2022" if mask_secrets else key_info["secret_key"],
"status": key_info.get("status", "active"),
"created_at": key_info.get("created_at"),
})
record: Dict[str, Any] = {
"access_key": user["access_key"],
"secret_key": "••••••••••" if mask_secrets else user["secret_key"],
"user_id": user["user_id"],
"display_name": user["display_name"],
"enabled": user.get("enabled", True),
"access_keys": access_keys,
"policies": user["policies"],
}
if access_keys:
record["access_key"] = access_keys[0]["access_key"]
if user.get("expires_at"):
record["expires_at"] = user["expires_at"]
payload["users"].append(record)
@@ -569,6 +883,7 @@ class IamService:
entries: List[Policy] = []
for policy in policies:
bucket = str(policy.get("bucket", "*")).lower()
prefix = str(policy.get("prefix", "*"))
raw_actions = policy.get("actions", [])
if isinstance(raw_actions, str):
raw_actions = [raw_actions]
@@ -581,7 +896,7 @@ class IamService:
if canonical:
action_set.add(canonical)
if action_set:
entries.append(Policy(bucket=bucket, actions=action_set))
entries.append(Policy(bucket=bucket, actions=action_set, prefix=prefix))
return entries
def _prepare_policy_payload(self, policies: Optional[Sequence[Dict[str, Any]]]) -> List[Dict[str, Any]]:
@@ -589,12 +904,14 @@ class IamService:
policies = (
{
"bucket": "*",
"actions": ["list", "read", "write", "delete", "share", "policy"],
"actions": ["list", "read", "write", "delete", "share", "policy",
"create_bucket", "delete_bucket"],
},
)
sanitized: List[Dict[str, Any]] = []
for policy in policies:
bucket = str(policy.get("bucket", "*")).lower()
prefix = str(policy.get("prefix", "*"))
raw_actions = policy.get("actions", [])
if isinstance(raw_actions, str):
raw_actions = [raw_actions]
@@ -608,7 +925,10 @@ class IamService:
action_set.add(canonical)
if not action_set:
continue
sanitized.append({"bucket": bucket, "actions": sorted(action_set)})
entry: Dict[str, Any] = {"bucket": bucket, "actions": sorted(action_set)}
if prefix != "*":
entry["prefix"] = prefix
sanitized.append(entry)
if not sanitized:
raise IamError("At least one policy with valid actions is required")
return sanitized
@@ -633,12 +953,23 @@ class IamService:
access_key = os.environ.get("ADMIN_ACCESS_KEY", "").strip() or secrets.token_hex(12)
secret_key = os.environ.get("ADMIN_SECRET_KEY", "").strip() or secrets.token_urlsafe(32)
custom_keys = bool(os.environ.get("ADMIN_ACCESS_KEY", "").strip())
user_id = self._generate_user_id()
now_iso = datetime.now(timezone.utc).isoformat()
default = {
"version": _CONFIG_VERSION,
"users": [
{
"user_id": user_id,
"display_name": "Local Admin",
"enabled": True,
"access_keys": [
{
"access_key": access_key,
"secret_key": secret_key,
"display_name": "Local Admin",
"status": "active",
"created_at": now_iso,
}
],
"policies": [
{"bucket": "*", "actions": list(ALLOWED_ACTIONS)}
],
@@ -660,6 +991,7 @@ class IamService:
else:
print(f"Access Key: {access_key}")
print(f"Secret Key: {secret_key}")
print(f"User ID: {user_id}")
print(f"{'='*60}")
if self._fernet:
print("IAM config is encrypted at rest.")
@@ -682,30 +1014,58 @@ class IamService:
def _generate_secret_key(self) -> str:
return secrets.token_urlsafe(24)
def _get_raw_user(self, access_key: str) -> Dict[str, Any]:
def _generate_user_id(self) -> str:
return f"u-{secrets.token_hex(8)}"
def _resolve_raw_user(self, identifier: str) -> Tuple[Dict[str, Any], str]:
for user in self._raw_config.get("users", []):
if user["access_key"] == access_key:
return user
if user.get("user_id") == identifier:
return user, identifier
for user in self._raw_config.get("users", []):
for key_info in user.get("access_keys", []):
if key_info["access_key"] == identifier:
return user, user["user_id"]
raise IamError("User not found")
def _get_raw_user(self, access_key: str) -> Dict[str, Any]:
user, _ = self._resolve_raw_user(access_key)
return user
def _enforce_key_and_user_status(self, access_key: str) -> None:
key_status = self._key_status.get(access_key, "active")
if key_status != "active":
raise IamError("Access key is inactive")
user_id = self._key_index.get(access_key)
if user_id:
record = self._user_records.get(user_id)
if record and not record.get("enabled", True):
raise IamError("User account is disabled")
def get_secret_key(self, access_key: str) -> str | None:
now = time.time()
cached = self._secret_key_cache.get(access_key)
if cached:
secret_key, cached_time = cached
if now - cached_time < self._cache_ttl:
record = self._users.get(access_key)
user_id = self._key_index.get(access_key)
if user_id:
record = self._user_records.get(user_id)
if record:
self._check_expiry(access_key, record)
self._enforce_key_and_user_status(access_key)
return secret_key
self._maybe_reload()
record = self._users.get(access_key)
secret = self._key_secrets.get(access_key)
if secret:
user_id = self._key_index.get(access_key)
if user_id:
record = self._user_records.get(user_id)
if record:
self._check_expiry(access_key, record)
secret_key = record["secret_key"]
self._secret_key_cache[access_key] = (secret_key, now)
return secret_key
self._enforce_key_and_user_status(access_key)
self._secret_key_cache[access_key] = (secret, now)
return secret
return None
def get_principal(self, access_key: str) -> Principal | None:
@@ -714,13 +1074,19 @@ class IamService:
if cached:
principal, cached_time = cached
if now - cached_time < self._cache_ttl:
record = self._users.get(access_key)
user_id = self._key_index.get(access_key)
if user_id:
record = self._user_records.get(user_id)
if record:
self._check_expiry(access_key, record)
self._enforce_key_and_user_status(access_key)
return principal
self._maybe_reload()
record = self._users.get(access_key)
self._enforce_key_and_user_status(access_key)
user_id = self._key_index.get(access_key)
if user_id:
record = self._user_records.get(user_id)
if record:
self._check_expiry(access_key, record)
principal = self._build_principal(access_key, record)

python/app/integrity.py Normal file

@@ -0,0 +1,995 @@
from __future__ import annotations
import hashlib
import json
import logging
import os
import threading
import time
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional
try:
import myfsio_core as _rc
if not hasattr(_rc, "md5_file"):
raise ImportError("myfsio_core is outdated, rebuild with: cd myfsio_core && maturin develop --release")
_HAS_RUST = True
except ImportError:
_HAS_RUST = False
logger = logging.getLogger(__name__)
def _compute_etag(path: Path) -> str:
if _HAS_RUST:
return _rc.md5_file(str(path))
checksum = hashlib.md5()
with path.open("rb") as handle:
for chunk in iter(lambda: handle.read(8192), b""):
checksum.update(chunk)
return checksum.hexdigest()
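
# Quick sanity check of the helper above: for a single-part object the
# stored ETag is simply the hex MD5 of the body, whichever backend computes
# it (the assertion below exercises whichever path is available).
if __name__ == "__main__":
    import tempfile
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello world")
    # Well-known MD5 of b"hello world":
    assert _compute_etag(Path(tmp.name)) == "5eb63bbbe01eeed093cb22bb8f5acdc3"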
@dataclass
class IntegrityIssue:
issue_type: str
bucket: str
key: str
detail: str
healed: bool = False
heal_action: str = ""
def to_dict(self) -> dict:
return {
"issue_type": self.issue_type,
"bucket": self.bucket,
"key": self.key,
"detail": self.detail,
"healed": self.healed,
"heal_action": self.heal_action,
}
@dataclass
class IntegrityResult:
corrupted_objects: int = 0
orphaned_objects: int = 0
phantom_metadata: int = 0
stale_versions: int = 0
etag_cache_inconsistencies: int = 0
legacy_metadata_drifts: int = 0
issues_healed: int = 0
issues: List[IntegrityIssue] = field(default_factory=list)
errors: List[str] = field(default_factory=list)
objects_scanned: int = 0
buckets_scanned: int = 0
execution_time_seconds: float = 0.0
def to_dict(self) -> dict:
return {
"corrupted_objects": self.corrupted_objects,
"orphaned_objects": self.orphaned_objects,
"phantom_metadata": self.phantom_metadata,
"stale_versions": self.stale_versions,
"etag_cache_inconsistencies": self.etag_cache_inconsistencies,
"legacy_metadata_drifts": self.legacy_metadata_drifts,
"issues_healed": self.issues_healed,
"issues": [i.to_dict() for i in self.issues],
"errors": self.errors,
"objects_scanned": self.objects_scanned,
"buckets_scanned": self.buckets_scanned,
"execution_time_seconds": self.execution_time_seconds,
}
@property
def total_issues(self) -> int:
return (
self.corrupted_objects
+ self.orphaned_objects
+ self.phantom_metadata
+ self.stale_versions
+ self.etag_cache_inconsistencies
+ self.legacy_metadata_drifts
)
@property
def has_issues(self) -> bool:
return self.total_issues > 0
@dataclass
class IntegrityExecutionRecord:
timestamp: float
result: dict
dry_run: bool
auto_heal: bool
def to_dict(self) -> dict:
return {
"timestamp": self.timestamp,
"result": self.result,
"dry_run": self.dry_run,
"auto_heal": self.auto_heal,
}
@classmethod
def from_dict(cls, data: dict) -> IntegrityExecutionRecord:
return cls(
timestamp=data["timestamp"],
result=data["result"],
dry_run=data.get("dry_run", False),
auto_heal=data.get("auto_heal", False),
)
class IntegrityHistoryStore:
def __init__(self, storage_root: Path, max_records: int = 50) -> None:
self.storage_root = storage_root
self.max_records = max_records
self._lock = threading.Lock()
def _get_path(self) -> Path:
return self.storage_root / ".myfsio.sys" / "config" / "integrity_history.json"
def load(self) -> List[IntegrityExecutionRecord]:
path = self._get_path()
if not path.exists():
return []
try:
with open(path, "r", encoding="utf-8") as f:
data = json.load(f)
return [IntegrityExecutionRecord.from_dict(d) for d in data.get("executions", [])]
except (OSError, ValueError, KeyError) as e:
logger.error("Failed to load integrity history: %s", e)
return []
def save(self, records: List[IntegrityExecutionRecord]) -> None:
path = self._get_path()
path.parent.mkdir(parents=True, exist_ok=True)
data = {"executions": [r.to_dict() for r in records[: self.max_records]]}
try:
with open(path, "w", encoding="utf-8") as f:
json.dump(data, f, indent=2)
except OSError as e:
logger.error("Failed to save integrity history: %s", e)
def add(self, record: IntegrityExecutionRecord) -> None:
with self._lock:
records = self.load()
records.insert(0, record)
self.save(records)
def get_history(self, limit: int = 50, offset: int = 0) -> List[IntegrityExecutionRecord]:
return self.load()[offset : offset + limit]
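
# Typical round-trip for the store above (the /tmp path is illustrative):
# add() prepends the new record, and save() truncates to max_records.
store = IntegrityHistoryStore(Path("/tmp/myfsio-demo"), max_records=5)
store.add(IntegrityExecutionRecord(
    timestamp=time.time(),
    result={"objects_scanned": 0},
    dry_run=True,
    auto_heal=False,
))
latest = store.get_history(limit=1)[0]  # newest record first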
class IntegrityCursorStore:
def __init__(self, storage_root: Path) -> None:
self.storage_root = storage_root
self._lock = threading.Lock()
def _get_path(self) -> Path:
return self.storage_root / ".myfsio.sys" / "config" / "integrity_cursor.json"
def load(self) -> Dict[str, Any]:
path = self._get_path()
if not path.exists():
return {"buckets": {}}
try:
with open(path, "r", encoding="utf-8") as f:
data = json.load(f)
if not isinstance(data.get("buckets"), dict):
return {"buckets": {}}
return data
except (OSError, ValueError, KeyError):
return {"buckets": {}}
def save(self, data: Dict[str, Any]) -> None:
path = self._get_path()
path.parent.mkdir(parents=True, exist_ok=True)
try:
with open(path, "w", encoding="utf-8") as f:
json.dump(data, f, indent=2)
except OSError as e:
logger.error("Failed to save integrity cursor: %s", e)
def update_bucket(
self,
bucket_name: str,
timestamp: float,
last_key: Optional[str] = None,
completed: bool = False,
) -> None:
with self._lock:
data = self.load()
entry = data["buckets"].get(bucket_name, {})
if completed:
entry["last_scanned"] = timestamp
entry.pop("last_key", None)
entry["completed"] = True
else:
entry["last_scanned"] = timestamp
if last_key is not None:
entry["last_key"] = last_key
entry["completed"] = False
data["buckets"][bucket_name] = entry
self.save(data)
def clean_stale(self, existing_buckets: List[str]) -> None:
with self._lock:
data = self.load()
existing_set = set(existing_buckets)
stale_keys = [k for k in data["buckets"] if k not in existing_set]
if stale_keys:
for k in stale_keys:
del data["buckets"][k]
self.save(data)
def get_last_key(self, bucket_name: str) -> Optional[str]:
data = self.load()
entry = data.get("buckets", {}).get(bucket_name)
if entry is None:
return None
return entry.get("last_key")
def get_bucket_order(self, bucket_names: List[str]) -> List[str]:
data = self.load()
buckets_info = data.get("buckets", {})
incomplete = []
complete = []
for name in bucket_names:
entry = buckets_info.get(name)
if entry is None:
incomplete.append((name, 0.0))
elif entry.get("last_key") is not None:
incomplete.append((name, entry.get("last_scanned", 0.0)))
else:
complete.append((name, entry.get("last_scanned", 0.0)))
incomplete.sort(key=lambda x: x[1])
complete.sort(key=lambda x: x[1])
return [n for n, _ in incomplete] + [n for n, _ in complete]
def get_info(self) -> Dict[str, Any]:
data = self.load()
buckets = data.get("buckets", {})
return {
"tracked_buckets": len(buckets),
"buckets": {
name: {
"last_scanned": info.get("last_scanned"),
"last_key": info.get("last_key"),
"completed": info.get("completed", False),
}
for name, info in buckets.items()
},
}
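
# How get_bucket_order prioritises work, as a worked example: buckets that
# were interrupted mid-scan (a saved last_key) or never scanned come first,
# oldest first, then completed buckets, so rotation eventually covers all.
#
# cursor state:
#   "a" -> {"last_scanned": 10.0, "last_key": "k3"}   (incomplete)
#   "b" -> {"last_scanned": 5.0}                      (complete)
#   "c" -> absent                                     (never scanned)
#
# get_bucket_order(["a", "b", "c"]) == ["c", "a", "b"]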
MAX_ISSUES = 500
class IntegrityChecker:
SYSTEM_ROOT = ".myfsio.sys"
SYSTEM_BUCKETS_DIR = "buckets"
BUCKET_META_DIR = "meta"
BUCKET_VERSIONS_DIR = "versions"
INTERNAL_FOLDERS = {".meta", ".versions", ".multipart"}
def __init__(
self,
storage_root: Path,
interval_hours: float = 24.0,
batch_size: int = 1000,
auto_heal: bool = False,
dry_run: bool = False,
max_history: int = 50,
io_throttle_ms: int = 10,
) -> None:
self.storage_root = Path(storage_root)
self.interval_seconds = interval_hours * 3600.0
self.batch_size = batch_size
self.auto_heal = auto_heal
self.dry_run = dry_run
self._timer: Optional[threading.Timer] = None
self._shutdown = False
self._lock = threading.Lock()
self._scanning = False
self._scan_start_time: Optional[float] = None
self._io_throttle = max(0, io_throttle_ms) / 1000.0
self.history_store = IntegrityHistoryStore(self.storage_root, max_records=max_history)
self.cursor_store = IntegrityCursorStore(self.storage_root)
def start(self) -> None:
if self._timer is not None:
return
self._shutdown = False
self._schedule_next()
logger.info(
"Integrity checker started: interval=%.1fh, batch_size=%d, auto_heal=%s, dry_run=%s",
self.interval_seconds / 3600.0,
self.batch_size,
self.auto_heal,
self.dry_run,
)
def stop(self) -> None:
self._shutdown = True
if self._timer:
self._timer.cancel()
self._timer = None
logger.info("Integrity checker stopped")
def _schedule_next(self) -> None:
if self._shutdown:
return
self._timer = threading.Timer(self.interval_seconds, self._run_cycle)
self._timer.daemon = True
self._timer.start()
def _run_cycle(self) -> None:
if self._shutdown:
return
try:
self.run_now()
except Exception as e:
logger.error("Integrity check cycle failed: %s", e)
finally:
self._schedule_next()
def run_now(self, auto_heal: Optional[bool] = None, dry_run: Optional[bool] = None) -> IntegrityResult:
if not self._lock.acquire(blocking=False):
raise RuntimeError("Integrity scan is already in progress")
try:
self._scanning = True
self._scan_start_time = time.time()
effective_auto_heal = auto_heal if auto_heal is not None else self.auto_heal
effective_dry_run = dry_run if dry_run is not None else self.dry_run
start = self._scan_start_time
result = IntegrityResult()
bucket_names = self._list_bucket_names()
self.cursor_store.clean_stale(bucket_names)
ordered_buckets = self.cursor_store.get_bucket_order(bucket_names)
for bucket_name in ordered_buckets:
if self._batch_exhausted(result):
break
result.buckets_scanned += 1
cursor_key = self.cursor_store.get_last_key(bucket_name)
key_corrupted = self._check_corrupted_objects(bucket_name, result, effective_auto_heal, effective_dry_run, cursor_key)
key_orphaned = self._check_orphaned_objects(bucket_name, result, effective_auto_heal, effective_dry_run, cursor_key)
key_phantom = self._check_phantom_metadata(bucket_name, result, effective_auto_heal, effective_dry_run, cursor_key)
self._check_stale_versions(bucket_name, result, effective_auto_heal, effective_dry_run)
self._check_etag_cache(bucket_name, result, effective_auto_heal, effective_dry_run)
self._check_legacy_metadata(bucket_name, result, effective_auto_heal, effective_dry_run)
returned_keys = [k for k in (key_corrupted, key_orphaned, key_phantom) if k is not None]
bucket_exhausted = self._batch_exhausted(result)
if bucket_exhausted and returned_keys:
self.cursor_store.update_bucket(bucket_name, time.time(), last_key=min(returned_keys))
else:
self.cursor_store.update_bucket(bucket_name, time.time(), completed=True)
result.execution_time_seconds = time.time() - start
if result.has_issues or result.errors:
logger.info(
"Integrity check completed in %.2fs: corrupted=%d, orphaned=%d, phantom=%d, "
"stale_versions=%d, etag_cache=%d, legacy_drift=%d, healed=%d, errors=%d%s",
result.execution_time_seconds,
result.corrupted_objects,
result.orphaned_objects,
result.phantom_metadata,
result.stale_versions,
result.etag_cache_inconsistencies,
result.legacy_metadata_drifts,
result.issues_healed,
len(result.errors),
" (dry run)" if effective_dry_run else "",
)
record = IntegrityExecutionRecord(
timestamp=time.time(),
result=result.to_dict(),
dry_run=effective_dry_run,
auto_heal=effective_auto_heal,
)
self.history_store.add(record)
return result
finally:
self._scanning = False
self._scan_start_time = None
self._lock.release()
def run_async(self, auto_heal: Optional[bool] = None, dry_run: Optional[bool] = None) -> bool:
if self._scanning:
return False
t = threading.Thread(target=self.run_now, args=(auto_heal, dry_run), daemon=True)
t.start()
return True
def _system_path(self) -> Path:
return self.storage_root / self.SYSTEM_ROOT
def _list_bucket_names(self) -> List[str]:
names = []
try:
for entry in self.storage_root.iterdir():
if entry.is_dir() and entry.name != self.SYSTEM_ROOT:
names.append(entry.name)
except OSError:
pass
return names
def _throttle(self) -> bool:
if self._shutdown:
return True
if self._io_throttle > 0:
time.sleep(self._io_throttle)
return self._shutdown
def _batch_exhausted(self, result: IntegrityResult) -> bool:
return self._shutdown or result.objects_scanned >= self.batch_size
def _add_issue(self, result: IntegrityResult, issue: IntegrityIssue) -> None:
if len(result.issues) < MAX_ISSUES:
result.issues.append(issue)
def _collect_index_keys(
self, meta_root: Path, cursor_key: Optional[str] = None,
) -> Dict[str, Dict[str, Any]]:
all_keys: Dict[str, Dict[str, Any]] = {}
if not meta_root.exists():
return all_keys
try:
for index_file in meta_root.rglob("_index.json"):
if not index_file.is_file():
continue
rel_dir = index_file.parent.relative_to(meta_root)
dir_prefix = "" if rel_dir == Path(".") else rel_dir.as_posix()
if cursor_key is not None and dir_prefix:
full_prefix = dir_prefix + "/"
if not cursor_key.startswith(full_prefix) and cursor_key > full_prefix:
continue
try:
index_data = json.loads(index_file.read_text(encoding="utf-8"))
except (OSError, json.JSONDecodeError):
continue
for key_name, entry in index_data.items():
full_key = (dir_prefix + "/" + key_name) if dir_prefix else key_name
if cursor_key is not None and full_key <= cursor_key:
continue
all_keys[full_key] = {
"entry": entry,
"index_file": index_file,
"key_name": key_name,
}
except OSError:
pass
return all_keys
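
# The directory layout the collector above assumes (inferred from its path
# arithmetic): one _index.json per metadata directory, keyed by the final
# path component, so entries recombine into full object keys.
#
#   <root>/.myfsio.sys/buckets/<bucket>/meta/_index.json       -> "a.txt", ...
#   <root>/.myfsio.sys/buckets/<bucket>/meta/logs/_index.json  -> "app.log", ...
#
# "app.log" under meta/logs/ yields full_key "logs/app.log".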
def _walk_bucket_files_sorted(
self, bucket_path: Path, cursor_key: Optional[str] = None,
):
def _walk(dir_path: Path, prefix: str):
try:
entries = list(os.scandir(dir_path))
except OSError:
return
def _sort_key(e):
if e.is_dir(follow_symlinks=False):
return e.name + "/"
return e.name
entries.sort(key=_sort_key)
for entry in entries:
if entry.is_dir(follow_symlinks=False):
if not prefix and entry.name in self.INTERNAL_FOLDERS:
continue
new_prefix = (prefix + "/" + entry.name) if prefix else entry.name
if cursor_key is not None:
full_prefix = new_prefix + "/"
if not cursor_key.startswith(full_prefix) and cursor_key > full_prefix:
continue
yield from _walk(Path(entry.path), new_prefix)
elif entry.is_file(follow_symlinks=False):
full_key = (prefix + "/" + entry.name) if prefix else entry.name
if cursor_key is not None and full_key <= cursor_key:
continue
yield full_key
yield from _walk(bucket_path, "")
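
# The walk above yields keys in stable lexicographic order (directories sort
# with a trailing "/"), which is what makes the cursor resumable: passing
# the last key seen continues strictly after it. Illustration on a small tree:
#
# bucket/
#   a.txt
#   logs/app.log
#   z.txt
#
# list(checker._walk_bucket_files_sorted(bucket_path))
#   -> ["a.txt", "logs/app.log", "z.txt"]
# list(checker._walk_bucket_files_sorted(bucket_path, cursor_key="a.txt"))
#   -> ["logs/app.log", "z.txt"]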
def _check_corrupted_objects(
self, bucket_name: str, result: IntegrityResult, auto_heal: bool, dry_run: bool,
cursor_key: Optional[str] = None,
) -> Optional[str]:
if self._batch_exhausted(result):
return None
bucket_path = self.storage_root / bucket_name
meta_root = self._system_path() / self.SYSTEM_BUCKETS_DIR / bucket_name / self.BUCKET_META_DIR
if not meta_root.exists():
return None
last_key = None
try:
all_keys = self._collect_index_keys(meta_root, cursor_key)
sorted_keys = sorted(all_keys.keys())
for full_key in sorted_keys:
if self._throttle():
return last_key
if self._batch_exhausted(result):
return last_key
info = all_keys[full_key]
entry = info["entry"]
index_file = info["index_file"]
key_name = info["key_name"]
object_path = bucket_path / full_key
if not object_path.exists():
continue
result.objects_scanned += 1
last_key = full_key
meta = entry.get("metadata", {}) if isinstance(entry, dict) else {}
stored_etag = meta.get("__etag__")
if not stored_etag:
continue
try:
actual_etag = _compute_etag(object_path)
except OSError:
continue
if actual_etag != stored_etag:
result.corrupted_objects += 1
issue = IntegrityIssue(
issue_type="corrupted_object",
bucket=bucket_name,
key=full_key,
detail=f"stored_etag={stored_etag} actual_etag={actual_etag}",
)
if auto_heal and not dry_run:
try:
stat = object_path.stat()
meta["__etag__"] = actual_etag
meta["__size__"] = str(stat.st_size)
meta["__last_modified__"] = str(stat.st_mtime)
try:
index_data = json.loads(index_file.read_text(encoding="utf-8"))
except (OSError, json.JSONDecodeError):
index_data = {}
index_data[key_name] = {"metadata": meta}
self._atomic_write_index(index_file, index_data)
issue.healed = True
issue.heal_action = "updated etag in index"
result.issues_healed += 1
except OSError as e:
result.errors.append(f"heal corrupted {bucket_name}/{full_key}: {e}")
self._add_issue(result, issue)
except OSError as e:
result.errors.append(f"check corrupted {bucket_name}: {e}")
return last_key
def _check_orphaned_objects(
self, bucket_name: str, result: IntegrityResult, auto_heal: bool, dry_run: bool,
cursor_key: Optional[str] = None,
) -> Optional[str]:
if self._batch_exhausted(result):
return None
bucket_path = self.storage_root / bucket_name
meta_root = self._system_path() / self.SYSTEM_BUCKETS_DIR / bucket_name / self.BUCKET_META_DIR
last_key = None
try:
for full_key in self._walk_bucket_files_sorted(bucket_path, cursor_key):
if self._throttle():
return last_key
if self._batch_exhausted(result):
return last_key
result.objects_scanned += 1
last_key = full_key
key_path = Path(full_key)
key_name = key_path.name
parent = key_path.parent
if parent == Path("."):
index_path = meta_root / "_index.json"
else:
index_path = meta_root / parent / "_index.json"
has_entry = False
if index_path.exists():
try:
index_data = json.loads(index_path.read_text(encoding="utf-8"))
has_entry = key_name in index_data
except (OSError, json.JSONDecodeError):
pass
if not has_entry:
result.orphaned_objects += 1
issue = IntegrityIssue(
issue_type="orphaned_object",
bucket=bucket_name,
key=full_key,
detail="file exists without metadata entry",
)
if auto_heal and not dry_run:
try:
object_path = bucket_path / full_key
etag = _compute_etag(object_path)
stat = object_path.stat()
meta = {
"__etag__": etag,
"__size__": str(stat.st_size),
"__last_modified__": str(stat.st_mtime),
}
index_data = {}
if index_path.exists():
try:
index_data = json.loads(index_path.read_text(encoding="utf-8"))
except (OSError, json.JSONDecodeError):
pass
index_data[key_name] = {"metadata": meta}
self._atomic_write_index(index_path, index_data)
issue.healed = True
issue.heal_action = "created metadata entry"
result.issues_healed += 1
except OSError as e:
result.errors.append(f"heal orphaned {bucket_name}/{full_key}: {e}")
self._add_issue(result, issue)
except OSError as e:
result.errors.append(f"check orphaned {bucket_name}: {e}")
return last_key
def _check_phantom_metadata(
self, bucket_name: str, result: IntegrityResult, auto_heal: bool, dry_run: bool,
cursor_key: Optional[str] = None,
) -> Optional[str]:
if self._batch_exhausted(result):
return None
bucket_path = self.storage_root / bucket_name
meta_root = self._system_path() / self.SYSTEM_BUCKETS_DIR / bucket_name / self.BUCKET_META_DIR
if not meta_root.exists():
return None
last_key = None
try:
all_keys = self._collect_index_keys(meta_root, cursor_key)
sorted_keys = sorted(all_keys.keys())
heal_by_index: Dict[Path, List[str]] = {}
for full_key in sorted_keys:
if self._batch_exhausted(result):
break
result.objects_scanned += 1
last_key = full_key
object_path = bucket_path / full_key
if not object_path.exists():
result.phantom_metadata += 1
info = all_keys[full_key]
issue = IntegrityIssue(
issue_type="phantom_metadata",
bucket=bucket_name,
key=full_key,
detail="metadata entry without file on disk",
)
if auto_heal and not dry_run:
index_file = info["index_file"]
heal_by_index.setdefault(index_file, []).append(info["key_name"])
issue.healed = True
issue.heal_action = "removed stale index entry"
result.issues_healed += 1
self._add_issue(result, issue)
if heal_by_index and auto_heal and not dry_run:
for index_file, keys_to_remove in heal_by_index.items():
try:
index_data = json.loads(index_file.read_text(encoding="utf-8"))
for k in keys_to_remove:
index_data.pop(k, None)
if index_data:
self._atomic_write_index(index_file, index_data)
else:
index_file.unlink(missing_ok=True)
except OSError as e:
result.errors.append(f"heal phantom {bucket_name}: {e}")
except OSError as e:
result.errors.append(f"check phantom {bucket_name}: {e}")
return last_key
def _check_stale_versions(
self, bucket_name: str, result: IntegrityResult, auto_heal: bool, dry_run: bool
) -> None:
if self._batch_exhausted(result):
return
versions_root = self._system_path() / self.SYSTEM_BUCKETS_DIR / bucket_name / self.BUCKET_VERSIONS_DIR
if not versions_root.exists():
return
try:
for key_dir in versions_root.rglob("*"):
if self._throttle():
return
if self._batch_exhausted(result):
return
if not key_dir.is_dir():
continue
bin_files = {f.stem: f for f in key_dir.glob("*.bin")}
json_files = {f.stem: f for f in key_dir.glob("*.json")}
for stem, bin_file in bin_files.items():
if self._batch_exhausted(result):
return
result.objects_scanned += 1
if stem not in json_files:
result.stale_versions += 1
issue = IntegrityIssue(
issue_type="stale_version",
bucket=bucket_name,
key=f"{key_dir.relative_to(versions_root).as_posix()}/{bin_file.name}",
detail="version data without manifest",
)
if auto_heal and not dry_run:
try:
bin_file.unlink(missing_ok=True)
issue.healed = True
issue.heal_action = "removed orphaned version data"
result.issues_healed += 1
except OSError as e:
result.errors.append(f"heal stale version {bin_file}: {e}")
self._add_issue(result, issue)
for stem, json_file in json_files.items():
if self._batch_exhausted(result):
return
result.objects_scanned += 1
if stem not in bin_files:
result.stale_versions += 1
issue = IntegrityIssue(
issue_type="stale_version",
bucket=bucket_name,
key=f"{key_dir.relative_to(versions_root).as_posix()}/{json_file.name}",
detail="version manifest without data",
)
if auto_heal and not dry_run:
try:
json_file.unlink(missing_ok=True)
issue.healed = True
issue.heal_action = "removed orphaned version manifest"
result.issues_healed += 1
except OSError as e:
result.errors.append(f"heal stale version {json_file}: {e}")
self._add_issue(result, issue)
except OSError as e:
result.errors.append(f"check stale versions {bucket_name}: {e}")
def _check_etag_cache(
self, bucket_name: str, result: IntegrityResult, auto_heal: bool, dry_run: bool
) -> None:
if self._batch_exhausted(result):
return
etag_index_path = self._system_path() / self.SYSTEM_BUCKETS_DIR / bucket_name / "etag_index.json"
if not etag_index_path.exists():
return
meta_root = self._system_path() / self.SYSTEM_BUCKETS_DIR / bucket_name / self.BUCKET_META_DIR
if not meta_root.exists():
return
try:
etag_cache = json.loads(etag_index_path.read_text(encoding="utf-8"))
except (OSError, json.JSONDecodeError):
return
found_mismatch = False
for full_key, cached_etag in etag_cache.items():
if self._batch_exhausted(result):
break
result.objects_scanned += 1
key_path = Path(full_key)
key_name = key_path.name
parent = key_path.parent
if parent == Path("."):
index_path = meta_root / "_index.json"
else:
index_path = meta_root / parent / "_index.json"
if not index_path.exists():
continue
try:
index_data = json.loads(index_path.read_text(encoding="utf-8"))
except (OSError, json.JSONDecodeError):
continue
entry = index_data.get(key_name)
if not entry:
continue
meta = entry.get("metadata", {}) if isinstance(entry, dict) else {}
stored_etag = meta.get("__etag__")
if stored_etag and cached_etag != stored_etag:
result.etag_cache_inconsistencies += 1
found_mismatch = True
issue = IntegrityIssue(
issue_type="etag_cache_inconsistency",
bucket=bucket_name,
key=full_key,
detail=f"cached_etag={cached_etag} index_etag={stored_etag}",
)
self._add_issue(result, issue)
if found_mismatch and auto_heal and not dry_run:
try:
etag_index_path.unlink(missing_ok=True)
for issue in result.issues:
if issue.issue_type == "etag_cache_inconsistency" and issue.bucket == bucket_name and not issue.healed:
issue.healed = True
issue.heal_action = "deleted etag_index.json"
result.issues_healed += 1
except OSError as e:
result.errors.append(f"heal etag cache {bucket_name}: {e}")
def _check_legacy_metadata(
self, bucket_name: str, result: IntegrityResult, auto_heal: bool, dry_run: bool
) -> None:
if self._batch_exhausted(result):
return
legacy_meta_root = self.storage_root / bucket_name / ".meta"
if not legacy_meta_root.exists():
return
meta_root = self._system_path() / self.SYSTEM_BUCKETS_DIR / bucket_name / self.BUCKET_META_DIR
try:
for meta_file in legacy_meta_root.rglob("*.meta.json"):
if self._throttle():
return
if self._batch_exhausted(result):
return
if not meta_file.is_file():
continue
result.objects_scanned += 1
try:
rel = meta_file.relative_to(legacy_meta_root)
except ValueError:
continue
full_key = rel.as_posix().removesuffix(".meta.json")
key_path = Path(full_key)
key_name = key_path.name
parent = key_path.parent
if parent == Path("."):
index_path = meta_root / "_index.json"
else:
index_path = meta_root / parent / "_index.json"
try:
legacy_data = json.loads(meta_file.read_text(encoding="utf-8"))
except (OSError, json.JSONDecodeError):
continue
index_entry = None
if index_path.exists():
try:
index_data = json.loads(index_path.read_text(encoding="utf-8"))
index_entry = index_data.get(key_name)
except (OSError, json.JSONDecodeError):
pass
if index_entry is None:
result.legacy_metadata_drifts += 1
issue = IntegrityIssue(
issue_type="legacy_metadata_drift",
bucket=bucket_name,
key=full_key,
detail="unmigrated legacy .meta.json",
)
if auto_heal and not dry_run:
try:
index_data = {}
if index_path.exists():
try:
index_data = json.loads(index_path.read_text(encoding="utf-8"))
except (OSError, json.JSONDecodeError):
pass
index_data[key_name] = {"metadata": legacy_data}
self._atomic_write_index(index_path, index_data)
meta_file.unlink(missing_ok=True)
issue.healed = True
issue.heal_action = "migrated to index and deleted legacy file"
result.issues_healed += 1
except OSError as e:
result.errors.append(f"heal legacy {bucket_name}/{full_key}: {e}")
self._add_issue(result, issue)
else:
index_meta = index_entry.get("metadata", {}) if isinstance(index_entry, dict) else {}
if legacy_data != index_meta:
result.legacy_metadata_drifts += 1
issue = IntegrityIssue(
issue_type="legacy_metadata_drift",
bucket=bucket_name,
key=full_key,
detail="legacy .meta.json differs from index entry",
)
if auto_heal and not dry_run:
try:
meta_file.unlink(missing_ok=True)
issue.healed = True
issue.heal_action = "deleted legacy file (index is authoritative)"
result.issues_healed += 1
except OSError as e:
result.errors.append(f"heal legacy drift {bucket_name}/{full_key}: {e}")
self._add_issue(result, issue)
except OSError as e:
result.errors.append(f"check legacy meta {bucket_name}: {e}")
@staticmethod
def _atomic_write_index(index_path: Path, data: Dict[str, Any]) -> None:
index_path.parent.mkdir(parents=True, exist_ok=True)
tmp_path = index_path.with_suffix(".tmp")
try:
with open(tmp_path, "w", encoding="utf-8") as f:
json.dump(data, f)
os.replace(str(tmp_path), str(index_path))
except BaseException:
try:
tmp_path.unlink(missing_ok=True)
except OSError:
pass
raise
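
# The helper above is the standard write-temp-then-rename pattern: os.replace
# is atomic on POSIX (and for same-volume paths on Windows), so concurrent
# readers of _index.json observe either the old or the new document, never a
# torn write; the temp file is unlinked on any failure. Illustrative call:
#
# IntegrityChecker._atomic_write_index(
#     Path("/tmp/demo/meta/_index.json"),
#     {"a.txt": {"metadata": {"__etag__": "5eb63bbbe01eeed093cb22bb8f5acdc3"}}},
# )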
def get_history(self, limit: int = 50, offset: int = 0) -> List[dict]:
records = self.history_store.get_history(limit, offset)
return [r.to_dict() for r in records]
def get_status(self) -> dict:
status: Dict[str, Any] = {
"enabled": not self._shutdown or self._timer is not None,
"running": self._timer is not None and not self._shutdown,
"scanning": self._scanning,
"interval_hours": self.interval_seconds / 3600.0,
"batch_size": self.batch_size,
"auto_heal": self.auto_heal,
"dry_run": self.dry_run,
"io_throttle_ms": round(self._io_throttle * 1000),
}
if self._scanning and self._scan_start_time is not None:
status["scan_elapsed_seconds"] = round(time.time() - self._scan_start_time, 1)
status["cursor"] = self.cursor_store.get_info()
return status
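
# Minimal one-shot usage of the checker above (the storage root here is
# illustrative): run_now() raises RuntimeError if a scan is already in
# flight, whereas run_async() returns False instead of raising.
checker = IntegrityChecker(Path("./data"), auto_heal=False, dry_run=True)
report = checker.run_now()
print(report.total_issues, report.objects_scanned, report.execution_time_seconds)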


@@ -19,6 +19,10 @@ from defusedxml.ElementTree import fromstring
try:
import myfsio_core as _rc
if not all(hasattr(_rc, f) for f in (
"verify_sigv4_signature", "derive_signing_key", "clear_signing_key_cache",
)):
raise ImportError("myfsio_core is outdated, rebuild with: cd myfsio_core && maturin develop --release")
_HAS_RUST = True
except ImportError:
_rc = None
@@ -201,6 +205,11 @@ _SIGNING_KEY_CACHE_LOCK = threading.Lock()
_SIGNING_KEY_CACHE_TTL = 60.0
_SIGNING_KEY_CACHE_MAX_SIZE = 256
_SIGV4_HEADER_RE = re.compile(
r"AWS4-HMAC-SHA256 Credential=([^/]+)/([^/]+)/([^/]+)/([^/]+)/aws4_request, SignedHeaders=([^,]+), Signature=(.+)"
)
_SIGV4_REQUIRED_HEADERS = frozenset({'host', 'x-amz-date'})
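
# What the precompiled pattern above matches, on a representative (made-up)
# Authorization header — wrapped here for readability, one line on the wire:
#
#   AWS4-HMAC-SHA256 Credential=AKIDEXAMPLE/20260422/us-east-1/s3/aws4_request,
#   SignedHeaders=host;x-amz-date, Signature=<64 hex chars>
#
# captured groups: access key, date stamp, region, service, signed-headers
# list, signature.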
def clear_signing_key_cache() -> None:
if _HAS_RUST:
@@ -259,10 +268,7 @@ def _get_canonical_uri(req: Any) -> str:
def _verify_sigv4_header(req: Any, auth_header: str) -> Principal | None:
match = re.match(
r"AWS4-HMAC-SHA256 Credential=([^/]+)/([^/]+)/([^/]+)/([^/]+)/aws4_request, SignedHeaders=([^,]+), Signature=(.+)",
auth_header,
)
match = _SIGV4_HEADER_RE.match(auth_header)
if not match:
return None
@@ -286,14 +292,9 @@ def _verify_sigv4_header(req: Any, auth_header: str) -> Principal | None:
if time_diff > tolerance:
raise IamError("Request timestamp too old or too far in the future")
required_headers = {'host', 'x-amz-date'}
signed_headers_set = set(signed_headers_str.split(';'))
if not required_headers.issubset(signed_headers_set):
if 'date' in signed_headers_set:
required_headers.remove('x-amz-date')
required_headers.add('date')
if not required_headers.issubset(signed_headers_set):
if not _SIGV4_REQUIRED_HEADERS.issubset(signed_headers_set):
if not ({'host', 'date'}.issubset(signed_headers_set)):
raise IamError("Required headers not signed")
canonical_uri = _get_canonical_uri(req)
@@ -301,7 +302,12 @@ def _verify_sigv4_header(req: Any, auth_header: str) -> Principal | None:
if _HAS_RUST:
query_params = list(req.args.items(multi=True))
header_values = [(h, req.headers.get(h) or "") for h in signed_headers_str.split(";")]
header_values = []
for h in signed_headers_str.split(";"):
val = req.headers.get(h) or ""
if h.lower() == "expect" and val == "":
val = "100-continue"
header_values.append((h, val))
if not _rc.verify_sigv4_signature(
req.method, canonical_uri, query_params, signed_headers_str,
header_values, payload_hash, amz_date, date_stamp, region,
@@ -390,7 +396,12 @@ def _verify_sigv4_query(req: Any) -> Principal | None:
if _HAS_RUST:
query_params = [(k, v) for k, v in req.args.items(multi=True) if k != "X-Amz-Signature"]
header_values = [(h, req.headers.get(h) or "") for h in signed_headers_str.split(";")]
header_values = []
for h in signed_headers_str.split(";"):
val = req.headers.get(h) or ""
if h.lower() == "expect" and val == "":
val = "100-continue"
header_values.append((h, val))
if not _rc.verify_sigv4_signature(
req.method, canonical_uri, query_params, signed_headers_str,
header_values, "UNSIGNED-PAYLOAD", amz_date, date_stamp, region,
@@ -488,7 +499,7 @@ def _authorize_action(principal: Principal | None, bucket_name: str | None, acti
iam_error: IamError | None = None
if principal is not None:
try:
_iam().authorize(principal, bucket_name, action)
_iam().authorize(principal, bucket_name, action, object_key=object_key)
iam_allowed = True
except IamError as exc:
iam_error = exc
@@ -523,21 +534,6 @@ def _authorize_action(principal: Principal | None, bucket_name: str | None, acti
raise iam_error or IamError("Access denied")
def _enforce_bucket_policy(principal: Principal | None, bucket_name: str | None, object_key: str | None, action: str) -> None:
if not bucket_name:
return
policy_context = _build_policy_context()
decision = _bucket_policies().evaluate(
principal.access_key if principal else None,
bucket_name,
object_key,
action,
policy_context,
)
if decision == "deny":
raise IamError("Access denied by bucket policy")
def _object_principal(action: str, bucket_name: str, object_key: str):
principal, error = _require_principal()
try:
@@ -546,121 +542,7 @@ def _object_principal(action: str, bucket_name: str, object_key: str):
except IamError as exc:
if not error:
return None, _error_response("AccessDenied", str(exc), 403)
if not _has_presign_params():
return None, error
try:
principal = _validate_presigned_request(action, bucket_name, object_key)
_enforce_bucket_policy(principal, bucket_name, object_key, action)
return principal, None
except IamError as exc:
return None, _error_response("AccessDenied", str(exc), 403)
def _has_presign_params() -> bool:
return bool(request.args.get("X-Amz-Algorithm"))
def _validate_presigned_request(action: str, bucket_name: str, object_key: str) -> Principal:
algorithm = request.args.get("X-Amz-Algorithm")
credential = request.args.get("X-Amz-Credential")
amz_date = request.args.get("X-Amz-Date")
signed_headers = request.args.get("X-Amz-SignedHeaders")
expires = request.args.get("X-Amz-Expires")
signature = request.args.get("X-Amz-Signature")
if not all([algorithm, credential, amz_date, signed_headers, expires, signature]):
raise IamError("Malformed presigned URL")
if algorithm != "AWS4-HMAC-SHA256":
raise IamError("Unsupported signing algorithm")
parts = credential.split("/")
if len(parts) != 5:
raise IamError("Invalid credential scope")
access_key, date_stamp, region, service, terminal = parts
if terminal != "aws4_request":
raise IamError("Invalid credential scope")
config_region = current_app.config["AWS_REGION"]
config_service = current_app.config["AWS_SERVICE"]
if region != config_region or service != config_service:
raise IamError("Credential scope mismatch")
try:
expiry = int(expires)
except ValueError as exc:
raise IamError("Invalid expiration") from exc
min_expiry = current_app.config.get("PRESIGNED_URL_MIN_EXPIRY_SECONDS", 1)
max_expiry = current_app.config.get("PRESIGNED_URL_MAX_EXPIRY_SECONDS", 604800)
if expiry < min_expiry or expiry > max_expiry:
raise IamError(f"Expiration must be between {min_expiry} second(s) and {max_expiry} seconds")
try:
request_time = datetime.strptime(amz_date, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
except ValueError as exc:
raise IamError("Invalid X-Amz-Date") from exc
now = datetime.now(timezone.utc)
tolerance = timedelta(seconds=current_app.config.get("SIGV4_TIMESTAMP_TOLERANCE_SECONDS", 900))
if request_time > now + tolerance:
raise IamError("Request date is too far in the future")
if now > request_time + timedelta(seconds=expiry):
raise IamError("Presigned URL expired")
signed_headers_list = [header.strip().lower() for header in signed_headers.split(";") if header]
signed_headers_list.sort()
canonical_headers = _canonical_headers_from_request(signed_headers_list)
canonical_query = _canonical_query_from_request()
payload_hash = request.args.get("X-Amz-Content-Sha256", "UNSIGNED-PAYLOAD")
canonical_request = "\n".join(
[
request.method,
_canonical_uri(bucket_name, object_key),
canonical_query,
canonical_headers,
";".join(signed_headers_list),
payload_hash,
]
)
hashed_request = hashlib.sha256(canonical_request.encode()).hexdigest()
scope = f"{date_stamp}/{region}/{service}/aws4_request"
string_to_sign = "\n".join([
"AWS4-HMAC-SHA256",
amz_date,
scope,
hashed_request,
])
secret = _iam().secret_for_key(access_key)
signing_key = _derive_signing_key(secret, date_stamp, region, service)
expected = hmac.new(signing_key, string_to_sign.encode(), hashlib.sha256).hexdigest()
if not hmac.compare_digest(expected, signature):
raise IamError("Signature mismatch")
return _iam().principal_for_key(access_key)
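For clients, generating a URL that passes this validation is ordinary SigV4 presigning; a sketch with boto3 (endpoint, credentials, and bucket are illustrative):

import boto3
from botocore.config import Config

s3 = boto3.client(
    "s3",
    endpoint_url="http://127.0.0.1:5000",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
    region_name="us-east-1",
    config=Config(signature_version="s3v4"),
)
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "report.pdf"},
    ExpiresIn=3600,  # must fall inside the configured min/max expiry window
)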
def _canonical_query_from_request() -> str:
parts = []
for key in sorted(request.args.keys()):
if key == "X-Amz-Signature":
continue
values = request.args.getlist(key)
encoded_key = quote(str(key), safe="-_.~")
for value in sorted(values):
encoded_value = quote(str(value), safe="-_.~")
parts.append(f"{encoded_key}={encoded_value}")
return "&".join(parts)
def _canonical_headers_from_request(headers: list[str]) -> str:
lines = []
for header in headers:
if header == "host":
api_base = current_app.config.get("API_BASE_URL")
if api_base:
value = urlparse(api_base).netloc
else:
value = request.host
else:
value = request.headers.get(header, "")
canonical_value = " ".join(value.strip().split()) if value else ""
lines.append(f"{header}:{canonical_value}")
return "\n".join(lines) + "\n"
def _canonical_uri(bucket_name: str, object_key: str | None) -> str:
@@ -726,8 +608,8 @@ def _generate_presigned_url(
host = parsed.netloc
scheme = parsed.scheme
else:
host = request.headers.get("X-Forwarded-Host", request.host)
scheme = request.headers.get("X-Forwarded-Proto", request.scheme or "http")
host = request.host
scheme = request.scheme or "http"
canonical_headers = f"host:{host}\n"
canonical_request = "\n".join(
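Dropping the X-Forwarded-* lookup closes a spoofing hole: any client can send those headers, so presigned URLs could be minted for an attacker-controlled host. When a trusted reverse proxy really does sit in front, the conventional Flask fix is ProxyFix with explicit hop counts rather than reading the headers in handlers; a sketch (app is a stand-in for the real factory):

from flask import Flask
from werkzeug.middleware.proxy_fix import ProxyFix

app = Flask(__name__)  # stand-in for the real app factory
# Trust exactly one proxy hop for Host and Proto.
app.wsgi_app = ProxyFix(app.wsgi_app, x_host=1, x_proto=1)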
@@ -1000,7 +882,7 @@ def _render_encryption_document(config: dict[str, Any]) -> Element:
return root
def _stream_file(path, chunk_size: int = 256 * 1024):
def _stream_file(path, chunk_size: int = 1024 * 1024):
with path.open("rb") as handle:
while True:
chunk = handle.read(chunk_size)
@@ -1135,7 +1017,7 @@ def _bucket_versioning_handler(bucket_name: str) -> Response:
if error:
return error
try:
_authorize_action(principal, bucket_name, "policy")
_authorize_action(principal, bucket_name, "versioning")
except IamError as exc:
return _error_response("AccessDenied", str(exc), 403)
storage = _storage()
@@ -1182,7 +1064,7 @@ def _bucket_tagging_handler(bucket_name: str) -> Response:
if error:
return error
try:
_authorize_action(principal, bucket_name, "policy")
_authorize_action(principal, bucket_name, "tagging")
except IamError as exc:
return _error_response("AccessDenied", str(exc), 403)
storage = _storage()
@@ -1347,7 +1229,7 @@ def _bucket_cors_handler(bucket_name: str) -> Response:
if error:
return error
try:
_authorize_action(principal, bucket_name, "policy")
_authorize_action(principal, bucket_name, "cors")
except IamError as exc:
return _error_response("AccessDenied", str(exc), 403)
storage = _storage()
@@ -1400,7 +1282,7 @@ def _bucket_encryption_handler(bucket_name: str) -> Response:
if error:
return error
try:
_authorize_action(principal, bucket_name, "policy")
_authorize_action(principal, bucket_name, "encryption")
except IamError as exc:
return _error_response("AccessDenied", str(exc), 403)
storage = _storage()
@@ -1475,7 +1357,7 @@ def _bucket_acl_handler(bucket_name: str) -> Response:
if error:
return error
try:
_authorize_action(principal, bucket_name, "policy")
_authorize_action(principal, bucket_name, "share")
except IamError as exc:
return _error_response("AccessDenied", str(exc), 403)
storage = _storage()
@@ -1723,7 +1605,7 @@ def _bucket_lifecycle_handler(bucket_name: str) -> Response:
if error:
return error
try:
_authorize_action(principal, bucket_name, "policy")
_authorize_action(principal, bucket_name, "lifecycle")
except IamError as exc:
return _error_response("AccessDenied", str(exc), 403)
@@ -1887,7 +1769,7 @@ def _bucket_quota_handler(bucket_name: str) -> Response:
if error:
return error
try:
_authorize_action(principal, bucket_name, "policy")
_authorize_action(principal, bucket_name, "quota")
except IamError as exc:
return _error_response("AccessDenied", str(exc), 403)
@@ -1964,7 +1846,7 @@ def _bucket_object_lock_handler(bucket_name: str) -> Response:
if error:
return error
try:
_authorize_action(principal, bucket_name, "policy")
_authorize_action(principal, bucket_name, "object_lock")
except IamError as exc:
return _error_response("AccessDenied", str(exc), 403)
@@ -2010,7 +1892,7 @@ def _bucket_notification_handler(bucket_name: str) -> Response:
if error:
return error
try:
_authorize_action(principal, bucket_name, "policy")
_authorize_action(principal, bucket_name, "notification")
except IamError as exc:
return _error_response("AccessDenied", str(exc), 403)
@@ -2106,7 +1988,7 @@ def _bucket_logging_handler(bucket_name: str) -> Response:
if error:
return error
try:
_authorize_action(principal, bucket_name, "policy")
_authorize_action(principal, bucket_name, "logging")
except IamError as exc:
return _error_response("AccessDenied", str(exc), 403)
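With the bucket-feature actions split out of the catch-all "policy" action, grants can be scoped per feature. A hypothetical policy fragment in the shape these handlers check:

operator_policy = {
    "bucket": "reports",
    # Configuration rights per feature, without the broader "policy" action.
    "actions": ["list", "read", "write", "versioning", "tagging", "cors", "logging"],
}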
@@ -2248,7 +2130,7 @@ def _object_retention_handler(bucket_name: str, object_key: str) -> Response:
if error:
return error
try:
_authorize_action(principal, bucket_name, "write" if request.method == "PUT" else "read", object_key=object_key)
_authorize_action(principal, bucket_name, "object_lock", object_key=object_key)
except IamError as exc:
return _error_response("AccessDenied", str(exc), 403)
@@ -2324,7 +2206,7 @@ def _object_legal_hold_handler(bucket_name: str, object_key: str) -> Response:
if error:
return error
try:
_authorize_action(principal, bucket_name, "write" if request.method == "PUT" else "read", object_key=object_key)
_authorize_action(principal, bucket_name, "object_lock", object_key=object_key)
except IamError as exc:
return _error_response("AccessDenied", str(exc), 403)
@@ -2657,7 +2539,7 @@ def bucket_handler(bucket_name: str) -> Response:
if error:
return error
try:
_authorize_action(principal, bucket_name, "write")
_authorize_action(principal, bucket_name, "create_bucket")
except IamError as exc:
return _error_response("AccessDenied", str(exc), 403)
try:
@@ -2674,7 +2556,7 @@ def bucket_handler(bucket_name: str) -> Response:
if error:
return error
try:
_authorize_action(principal, bucket_name, "delete")
_authorize_action(principal, bucket_name, "delete_bucket")
except IamError as exc:
return _error_response("AccessDenied", str(exc), 403)
try:
@@ -2951,9 +2833,12 @@ def object_handler(bucket_name: str, object_key: str):
is_encrypted = "x-amz-server-side-encryption" in metadata
cond_etag = metadata.get("__etag__")
_etag_was_healed = False
if not cond_etag and not is_encrypted:
try:
cond_etag = storage._compute_etag(path)
_etag_was_healed = True
storage.heal_missing_etag(bucket_name, object_key, cond_etag)
except OSError:
cond_etag = None
if cond_etag:
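cond_etag feeds the conditional-request checks that follow; a sketch of typical If-None-Match matching (this server's exact comparison may differ):

def if_none_match_hits(header_value: str | None, etag: str | None) -> bool:
    # A match (or "*") means the client's cached copy is current -> 304.
    if not header_value or not etag:
        return False
    if header_value.strip() == "*":
        return True
    candidates = {v.strip().strip('"') for v in header_value.split(",")}
    return etag.strip('"') in candidates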
@@ -2999,7 +2884,7 @@ def object_handler(bucket_name: str, object_key: str):
try:
stat = path.stat()
file_size = stat.st_size
etag = metadata.get("__etag__") or storage._compute_etag(path)
etag = cond_etag or storage._compute_etag(path)
except PermissionError:
return _error_response("AccessDenied", "Permission denied accessing object", 403)
except OSError as exc:
@@ -3047,7 +2932,7 @@ def object_handler(bucket_name: str, object_key: str):
try:
stat = path.stat()
response = Response(status=200)
etag = metadata.get("__etag__") or storage._compute_etag(path)
etag = cond_etag or storage._compute_etag(path)
except PermissionError:
return _error_response("AccessDenied", "Permission denied accessing object", 403)
except OSError as exc:
@@ -3229,7 +3114,7 @@ def _bucket_replication_handler(bucket_name: str) -> Response:
if error:
return error
try:
_authorize_action(principal, bucket_name, "policy")
_authorize_action(principal, bucket_name, "replication")
except IamError as exc:
return _error_response("AccessDenied", str(exc), 403)
storage = _storage()
@@ -3312,7 +3197,7 @@ def _bucket_website_handler(bucket_name: str) -> Response:
if error:
return error
try:
_authorize_action(principal, bucket_name, "policy")
_authorize_action(principal, bucket_name, "website")
except IamError as exc:
return _error_response("AccessDenied", str(exc), 403)
storage = _storage()
@@ -3432,9 +3317,13 @@ def head_object(bucket_name: str, object_key: str) -> Response:
return error
try:
_authorize_action(principal, bucket_name, "read", object_key=object_key)
path = _storage().get_object_path(bucket_name, object_key)
metadata = _storage().get_object_metadata(bucket_name, object_key)
etag = metadata.get("__etag__") or _storage()._compute_etag(path)
storage = _storage()
path = storage.get_object_path(bucket_name, object_key)
metadata = storage.get_object_metadata(bucket_name, object_key)
etag = metadata.get("__etag__")
if not etag:
etag = storage._compute_etag(path)
storage.heal_missing_etag(bucket_name, object_key, etag)
head_mtime = float(metadata["__last_modified__"]) if "__last_modified__" in metadata else None
if head_mtime is None:
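A quick way to sanity-check the healed ETag from the client side, assuming a single-part upload where the ETag is the body's MD5 hex digest (multipart ETags do not satisfy this); endpoint and names are illustrative:

import hashlib
import boto3

s3 = boto3.client("s3", endpoint_url="http://127.0.0.1:5000")
head = s3.head_object(Bucket="my-bucket", Key="report.pdf")
body = s3.get_object(Bucket="my-bucket", Key="report.pdf")["Body"].read()
assert head["ETag"].strip('"') == hashlib.md5(body).hexdigest()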

View File

@@ -2,6 +2,7 @@ from __future__ import annotations
import hashlib
import json
import logging
import os
import re
import shutil
@@ -20,12 +21,21 @@ from typing import Any, BinaryIO, Dict, Generator, List, Optional
try:
import myfsio_core as _rc
if not all(hasattr(_rc, f) for f in (
"validate_bucket_name", "validate_object_key", "md5_file",
"shallow_scan", "bucket_stats_scan", "search_objects_scan",
"stream_to_file_with_md5", "assemble_parts_with_md5",
"build_object_cache", "read_index_entry", "write_index_entry",
"delete_index_entry", "check_bucket_contents",
)):
raise ImportError("myfsio_core is outdated, rebuild with: cd myfsio_core && maturin develop --release")
_HAS_RUST = True
except ImportError:
_rc = None
_HAS_RUST = False
# Platform-specific file locking
logger = logging.getLogger(__name__)
if os.name == "nt":
import msvcrt
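The _HAS_RUST flag gates per-call dispatch between the extension and pure-Python fallbacks; a sketch of the pattern, where the _rc.md5_file(path) signature is an assumption:

import hashlib

def md5_file_dispatch(path: str) -> str:
    if _HAS_RUST:
        return _rc.md5_file(path)  # assumed: returns the hex digest
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()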
@@ -190,6 +200,7 @@ class ObjectStorage:
object_cache_max_size: int = 100,
bucket_config_cache_ttl: float = 30.0,
object_key_max_length_bytes: int = 1024,
meta_read_cache_max: int = 2048,
) -> None:
self.root = Path(root)
self.root.mkdir(parents=True, exist_ok=True)
@@ -208,7 +219,7 @@ class ObjectStorage:
self._sorted_key_cache: Dict[str, tuple[list[str], int]] = {}
self._meta_index_locks: Dict[str, threading.Lock] = {}
self._meta_read_cache: OrderedDict[tuple, Optional[Dict[str, Any]]] = OrderedDict()
self._meta_read_cache_max = 2048
self._meta_read_cache_max = meta_read_cache_max
self._cleanup_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="ParentCleanup")
self._stats_mem: Dict[str, Dict[str, int]] = {}
self._stats_serial: Dict[str, int] = {}
@@ -218,6 +229,7 @@ class ObjectStorage:
self._stats_flush_timer: Optional[threading.Timer] = None
self._etag_index_dirty: set[str] = set()
self._etag_index_flush_timer: Optional[threading.Timer] = None
self._etag_index_mem: Dict[str, tuple[Dict[str, str], float]] = {}
def _get_bucket_lock(self, bucket_id: str) -> threading.Lock:
with self._registry_lock:
@@ -406,6 +418,10 @@ class ObjectStorage:
self._stats_serial[bucket_id] = self._stats_serial.get(bucket_id, 0) + 1
self._stats_mem_time[bucket_id] = time.monotonic()
self._stats_dirty.add(bucket_id)
needs_immediate = data["objects"] == 0 and objects_delta < 0
if needs_immediate:
self._flush_stats()
else:
self._schedule_stats_flush()
def _schedule_stats_flush(self) -> None:
@@ -423,7 +439,7 @@ class ObjectStorage:
cache_path = self._system_bucket_root(bucket_id) / "stats.json"
try:
cache_path.parent.mkdir(parents=True, exist_ok=True)
self._atomic_write_json(cache_path, data)
self._atomic_write_json(cache_path, data, sync=False)
except OSError:
pass
@@ -598,14 +614,7 @@ class ObjectStorage:
is_truncated=False, next_continuation_token=None,
)
etag_index_path = self._system_bucket_root(bucket_id) / "etag_index.json"
meta_cache: Dict[str, str] = {}
if etag_index_path.exists():
try:
with open(etag_index_path, 'r', encoding='utf-8') as f:
meta_cache = json.load(f)
except (OSError, json.JSONDecodeError):
pass
meta_cache: Dict[str, str] = self._get_etag_index(bucket_id)
entries_files: list[tuple[str, int, float, Optional[str]]] = []
entries_dirs: list[str] = []
@@ -710,6 +719,73 @@ class ObjectStorage:
next_continuation_token=next_token,
)
def iter_objects_shallow(
self,
bucket_name: str,
*,
prefix: str = "",
delimiter: str = "/",
) -> Generator[tuple[str, ObjectMeta | str], None, None]:
bucket_path = self._bucket_path(bucket_name)
if not bucket_path.exists():
raise BucketNotFoundError("Bucket does not exist")
bucket_id = bucket_path.name
target_dir = bucket_path
if prefix:
safe_prefix_path = Path(prefix.rstrip("/"))
if ".." in safe_prefix_path.parts:
return
target_dir = bucket_path / safe_prefix_path
try:
resolved = target_dir.resolve()
bucket_resolved = bucket_path.resolve()
if not str(resolved).startswith(str(bucket_resolved) + os.sep) and resolved != bucket_resolved:
return
except (OSError, ValueError):
return
if not target_dir.exists() or not target_dir.is_dir():
return
etag_index_path = self._system_bucket_root(bucket_id) / "etag_index.json"
meta_cache: Dict[str, str] = {}
if etag_index_path.exists():
try:
with open(etag_index_path, 'r', encoding='utf-8') as f:
meta_cache = json.load(f)
except (OSError, json.JSONDecodeError):
pass
try:
with os.scandir(str(target_dir)) as it:
for entry in it:
name = entry.name
if name in self.INTERNAL_FOLDERS:
continue
if entry.is_dir(follow_symlinks=False):
yield ("folder", prefix + name + delimiter)
elif entry.is_file(follow_symlinks=False):
key = prefix + name
try:
st = entry.stat()
etag = meta_cache.get(key)
if etag is None:
safe_key = PurePosixPath(key)
meta = self._read_metadata(bucket_id, Path(safe_key))
etag = meta.get("__etag__") if meta else None
yield ("object", ObjectMeta(
key=key,
size=st.st_size,
last_modified=datetime.fromtimestamp(st.st_mtime, timezone.utc),
etag=etag,
metadata=None,
))
except OSError:
pass
except OSError:
return
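Hypothetical usage of the new shallow iterator to render one directory level, the way the UI's delimiter-based browser consumes it (storage is an ObjectStorage instance):

for kind, item in storage.iter_objects_shallow("photos", prefix="2026/", delimiter="/"):
    if kind == "folder":
        print("DIR ", item)                      # e.g. "2026/april/"
    else:
        print("OBJ ", item.key, item.size, item.etag)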
def _shallow_via_full_scan(
self,
bucket_name: str,
@@ -1008,6 +1084,30 @@ class ObjectStorage:
safe_key = self._sanitize_object_key(object_key, self._object_key_max_length_bytes)
return self._read_metadata(bucket_path.name, safe_key) or {}
def heal_missing_etag(self, bucket_name: str, object_key: str, etag: str) -> None:
"""Persist a computed ETag back to metadata (self-heal on read)."""
try:
bucket_path = self._bucket_path(bucket_name)
if not bucket_path.exists():
return
bucket_id = bucket_path.name
safe_key = self._sanitize_object_key(object_key, self._object_key_max_length_bytes)
existing = self._read_metadata(bucket_id, safe_key) or {}
if existing.get("__etag__"):
return
existing["__etag__"] = etag
self._write_metadata(bucket_id, safe_key, existing)
with self._obj_cache_lock:
cached = self._object_cache.get(bucket_id)
if cached:
obj = cached[0].get(safe_key.as_posix())
if obj and not obj.etag:
obj.etag = etag
self._etag_index_dirty.add(bucket_id)
self._schedule_etag_index_flush()
except Exception:
logger.warning("Failed to heal missing ETag for %s/%s", bucket_name, object_key)
def _cleanup_empty_parents(self, path: Path, stop_at: Path) -> None:
"""Remove empty parent directories in a background thread.
@@ -2017,6 +2117,7 @@ class ObjectStorage:
etag_index_path.parent.mkdir(parents=True, exist_ok=True)
with open(etag_index_path, 'w', encoding='utf-8') as f:
json.dump(raw["etag_cache"], f)
self._etag_index_mem[bucket_id] = (dict(raw["etag_cache"]), etag_index_path.stat().st_mtime)
except OSError:
pass
for key, size, mtime, etag in raw["objects"]:
@@ -2140,6 +2241,7 @@ class ObjectStorage:
etag_index_path.parent.mkdir(parents=True, exist_ok=True)
with open(etag_index_path, 'w', encoding='utf-8') as f:
json.dump(meta_cache, f)
self._etag_index_mem[bucket_id] = (dict(meta_cache), etag_index_path.stat().st_mtime)
except OSError:
pass
@@ -2253,6 +2355,25 @@ class ObjectStorage:
self._etag_index_dirty.add(bucket_id)
self._schedule_etag_index_flush()
def _get_etag_index(self, bucket_id: str) -> Dict[str, str]:
etag_index_path = self._system_bucket_root(bucket_id) / "etag_index.json"
try:
current_mtime = etag_index_path.stat().st_mtime
except OSError:
return {}
cached = self._etag_index_mem.get(bucket_id)
if cached:
cache_dict, cached_mtime = cached
if current_mtime == cached_mtime:
return cache_dict
try:
with open(etag_index_path, 'r', encoding='utf-8') as f:
data = json.load(f)
self._etag_index_mem[bucket_id] = (data, current_mtime)
return data
except (OSError, json.JSONDecodeError):
return {}
def _schedule_etag_index_flush(self) -> None:
if self._etag_index_flush_timer is None or not self._etag_index_flush_timer.is_alive():
self._etag_index_flush_timer = threading.Timer(5.0, self._flush_etag_indexes)
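The timer-based flush coalesces many dirty marks into one delayed write; the same shape in isolation:

import threading

class DebouncedFlush:
    def __init__(self, flush_fn, delay: float = 5.0) -> None:
        self._flush_fn = flush_fn
        self._delay = delay
        self._timer: threading.Timer | None = None

    def schedule(self) -> None:
        # Only schedule when no timer is pending, so repeated calls
        # within the window collapse into a single flush.
        if self._timer is None or not self._timer.is_alive():
            self._timer = threading.Timer(self._delay, self._flush_fn)
            self._timer.daemon = True
            self._timer.start()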
@@ -2271,11 +2392,10 @@ class ObjectStorage:
index = {k: v.etag for k, v in objects.items() if v.etag}
etag_index_path = self._system_bucket_root(bucket_id) / "etag_index.json"
try:
etag_index_path.parent.mkdir(parents=True, exist_ok=True)
with open(etag_index_path, 'w', encoding='utf-8') as f:
json.dump(index, f)
self._atomic_write_json(etag_index_path, index, sync=False)
self._etag_index_mem[bucket_id] = (index, etag_index_path.stat().st_mtime)
except OSError:
pass
logger.warning("Failed to flush etag index for bucket %s", bucket_id)
def warm_cache(self, bucket_names: Optional[List[str]] = None) -> None:
"""Pre-warm the object cache for specified buckets or all buckets.
@@ -2317,12 +2437,13 @@ class ObjectStorage:
path.mkdir(parents=True, exist_ok=True)
@staticmethod
def _atomic_write_json(path: Path, data: Any) -> None:
def _atomic_write_json(path: Path, data: Any, *, sync: bool = True) -> None:
path.parent.mkdir(parents=True, exist_ok=True)
tmp_path = path.with_suffix(".tmp")
try:
with tmp_path.open("w", encoding="utf-8") as f:
json.dump(data, f)
if sync:
f.flush()
os.fsync(f.fileno())
tmp_path.replace(path)

View File

@@ -225,10 +225,10 @@ def _policy_allows_public_read(policy: dict[str, Any]) -> bool:
def _bucket_access_descriptor(policy: dict[str, Any] | None) -> tuple[str, str]:
if not policy:
return ("IAM only", "text-bg-secondary")
return ("IAM only", "bg-secondary-subtle text-secondary-emphasis")
if _policy_allows_public_read(policy):
return ("Public read", "text-bg-warning")
return ("Custom policy", "text-bg-info")
return ("Public read", "bg-warning-subtle text-warning-emphasis")
return ("Custom policy", "bg-info-subtle text-info-emphasis")
def _current_principal():
@@ -618,20 +618,77 @@ def stream_bucket_objects(bucket_name: str):
prefix = request.args.get("prefix") or None
delimiter = request.args.get("delimiter") or None
storage = _storage()
try:
client = get_session_s3_client()
except (PermissionError, RuntimeError) as exc:
return jsonify({"error": str(exc)}), 403
versioning_enabled = get_versioning_via_s3(client, bucket_name)
versioning_enabled = storage.is_versioning_enabled(bucket_name)
except StorageError:
versioning_enabled = False
url_templates = build_url_templates(bucket_name)
display_tz = current_app.config.get("DISPLAY_TIMEZONE", "UTC")
def generate():
yield json.dumps({
"type": "meta",
"versioning_enabled": versioning_enabled,
"url_templates": url_templates,
}) + "\n"
yield json.dumps({"type": "count", "total_count": 0}) + "\n"
running_count = 0
try:
if delimiter:
for item_type, item in storage.iter_objects_shallow(
bucket_name, prefix=prefix or "", delimiter=delimiter,
):
if item_type == "folder":
yield json.dumps({"type": "folder", "prefix": item}) + "\n"
else:
last_mod = item.last_modified
yield json.dumps({
"type": "object",
"key": item.key,
"size": item.size,
"last_modified": last_mod.isoformat(),
"last_modified_display": _format_datetime_display(last_mod, display_tz),
"last_modified_iso": _format_datetime_iso(last_mod, display_tz),
"etag": item.etag or "",
}) + "\n"
running_count += 1
if running_count % 1000 == 0:
yield json.dumps({"type": "count", "total_count": running_count}) + "\n"
else:
continuation_token = None
while True:
result = storage.list_objects(
bucket_name,
max_keys=1000,
continuation_token=continuation_token,
prefix=prefix,
)
for obj in result.objects:
last_mod = obj.last_modified
yield json.dumps({
"type": "object",
"key": obj.key,
"size": obj.size,
"last_modified": last_mod.isoformat(),
"last_modified_display": _format_datetime_display(last_mod, display_tz),
"last_modified_iso": _format_datetime_iso(last_mod, display_tz),
"etag": obj.etag or "",
}) + "\n"
running_count += len(result.objects)
yield json.dumps({"type": "count", "total_count": running_count}) + "\n"
if not result.is_truncated:
break
continuation_token = result.next_continuation_token
except StorageError as exc:
yield json.dumps({"type": "error", "error": str(exc)}) + "\n"
return
yield json.dumps({"type": "count", "total_count": running_count}) + "\n"
yield json.dumps({"type": "done"}) + "\n"
return Response(
stream_objects_ndjson(
client, bucket_name, prefix, url_templates, display_tz, versioning_enabled,
delimiter=delimiter,
),
generate(),
mimetype='application/x-ndjson',
headers={
'Cache-Control': 'no-cache',
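A hypothetical consumer of this NDJSON stream (URL is illustrative; event types match the generator above):

import json
import requests

with requests.get("http://127.0.0.1:5100/buckets/photos/objects/stream", stream=True) as resp:
    for line in resp.iter_lines():
        if not line:
            continue
        event = json.loads(line)
        if event["type"] == "object":
            print(event["key"], event["size"])
        elif event["type"] in ("done", "error"):
            break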
@@ -1006,6 +1063,27 @@ def bulk_delete_objects(bucket_name: str):
return _respond(False, f"A maximum of {MAX_KEYS} objects can be deleted per request", status_code=400)
unique_keys = list(dict.fromkeys(cleaned))
folder_prefixes = [k for k in unique_keys if k.endswith("/")]
if folder_prefixes:
try:
client = get_session_s3_client()
for prefix in folder_prefixes:
unique_keys.remove(prefix)
paginator = client.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
for obj in page.get("Contents", []):
if obj["Key"] not in unique_keys:
unique_keys.append(obj["Key"])
except (ClientError, EndpointConnectionError, ConnectionClosedError) as exc:
if isinstance(exc, ClientError):
err, status = handle_client_error(exc)
return _respond(False, err["error"], status_code=status)
return _respond(False, "S3 API server is unreachable", status_code=502)
if not unique_keys:
return _respond(False, "No objects found under the selected folders", status_code=400)
try:
_authorize_ui(principal, bucket_name, "delete")
except IamError as exc:
@@ -1036,13 +1114,17 @@ def bulk_delete_objects(bucket_name: str):
else:
try:
client = get_session_s3_client()
objects_to_delete = [{"Key": k} for k in unique_keys]
deleted = []
errors = []
for i in range(0, len(unique_keys), 1000):
batch = unique_keys[i:i + 1000]
objects_to_delete = [{"Key": k} for k in batch]
resp = client.delete_objects(
Bucket=bucket_name,
Delete={"Objects": objects_to_delete, "Quiet": False},
)
deleted = [d["Key"] for d in resp.get("Deleted", [])]
errors = [{"key": e["Key"], "error": e.get("Message", e.get("Code", "Unknown error"))} for e in resp.get("Errors", [])]
deleted.extend(d["Key"] for d in resp.get("Deleted", []))
errors.extend({"key": e["Key"], "error": e.get("Message", e.get("Code", "Unknown error"))} for e in resp.get("Errors", []))
for key in deleted:
_replication_manager().trigger_replication(bucket_name, key, action="delete")
except (ClientError, EndpointConnectionError, ConnectionClosedError) as exc:
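The 1000-key batching exists because S3's DeleteObjects call rejects larger requests; the same loop as a standalone helper:

def delete_in_batches(client, bucket: str, keys: list[str]) -> tuple[list[str], list[dict]]:
    deleted: list[str] = []
    errors: list[dict] = []
    for i in range(0, len(keys), 1000):
        resp = client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": k} for k in keys[i:i + 1000]], "Quiet": False},
        )
        deleted.extend(d["Key"] for d in resp.get("Deleted", []))
        errors.extend(resp.get("Errors", []))
    return deleted, errors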
@@ -4041,6 +4123,182 @@ def get_peer_sync_stats(site_id: str):
return jsonify(stats)
@ui_bp.get("/system")
def system_dashboard():
principal = _current_principal()
try:
_iam().authorize(principal, None, "iam:*")
except IamError:
flash("Access denied: System page requires admin permissions", "danger")
return redirect(url_for("ui.buckets_overview"))
import platform as _platform
import sys
from app.version import APP_VERSION
try:
import myfsio_core as _rc
has_rust = True
except ImportError:
has_rust = False
gc = current_app.extensions.get("gc")
gc_status = gc.get_status() if gc else {"enabled": False}
gc_history_records = []
if gc:
raw = gc.get_history(limit=10, offset=0)
for rec in raw:
r = rec.get("result", {})
total_freed = r.get("temp_bytes_freed", 0) + r.get("multipart_bytes_freed", 0) + r.get("orphaned_version_bytes_freed", 0)
rec["bytes_freed_display"] = _format_bytes(total_freed)
rec["timestamp_display"] = _format_datetime_display(datetime.fromtimestamp(rec["timestamp"], tz=dt_timezone.utc))
gc_history_records.append(rec)
checker = current_app.extensions.get("integrity")
integrity_status = checker.get_status() if checker else {"enabled": False}
integrity_history_records = []
if checker:
raw = checker.get_history(limit=10, offset=0)
for rec in raw:
rec["timestamp_display"] = _format_datetime_display(datetime.fromtimestamp(rec["timestamp"], tz=dt_timezone.utc))
integrity_history_records.append(rec)
features = [
{"label": "Encryption (SSE-S3)", "enabled": current_app.config.get("ENCRYPTION_ENABLED", False)},
{"label": "KMS", "enabled": current_app.config.get("KMS_ENABLED", False)},
{"label": "Versioning Lifecycle", "enabled": current_app.config.get("LIFECYCLE_ENABLED", False)},
{"label": "Metrics History", "enabled": current_app.config.get("METRICS_HISTORY_ENABLED", False)},
{"label": "Operation Metrics", "enabled": current_app.config.get("OPERATION_METRICS_ENABLED", False)},
{"label": "Site Sync", "enabled": current_app.config.get("SITE_SYNC_ENABLED", False)},
{"label": "Website Hosting", "enabled": current_app.config.get("WEBSITE_HOSTING_ENABLED", False)},
{"label": "Garbage Collection", "enabled": current_app.config.get("GC_ENABLED", False)},
{"label": "Integrity Scanner", "enabled": current_app.config.get("INTEGRITY_ENABLED", False)},
]
return render_template(
"system.html",
principal=principal,
app_version=APP_VERSION,
storage_root=current_app.config.get("STORAGE_ROOT", "./data"),
platform=_platform.platform(),
python_version=sys.version.split()[0],
has_rust=has_rust,
features=features,
gc_status=gc_status,
gc_history=gc_history_records,
integrity_status=integrity_status,
integrity_history=integrity_history_records,
display_timezone=current_app.config.get("DISPLAY_TIMEZONE", "UTC"),
)
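Each /system endpoint below repeats the iam:* gate; a decorator could factor it out (a hypothetical refactor, not part of this change):

from functools import wraps

def admin_required(view):
    @wraps(view)
    def wrapped(*args, **kwargs):
        principal = _current_principal()
        try:
            _iam().authorize(principal, None, "iam:*")
        except IamError:
            return jsonify({"error": "Access denied"}), 403
        return view(*args, **kwargs)
    return wrapped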
@ui_bp.post("/system/gc/run")
def system_gc_run():
principal = _current_principal()
try:
_iam().authorize(principal, None, "iam:*")
except IamError:
return jsonify({"error": "Access denied"}), 403
gc = current_app.extensions.get("gc")
if not gc:
return jsonify({"error": "GC is not enabled"}), 400
payload = request.get_json(silent=True) or {}
started = gc.run_async(dry_run=payload.get("dry_run"))
if not started:
return jsonify({"error": "GC is already in progress"}), 409
return jsonify({"status": "started"})
@ui_bp.get("/system/gc/status")
def system_gc_status():
principal = _current_principal()
try:
_iam().authorize(principal, None, "iam:*")
except IamError:
return jsonify({"error": "Access denied"}), 403
gc = current_app.extensions.get("gc")
if not gc:
return jsonify({"error": "GC is not enabled"}), 400
return jsonify(gc.get_status())
@ui_bp.get("/system/gc/history")
def system_gc_history():
principal = _current_principal()
try:
_iam().authorize(principal, None, "iam:*")
except IamError:
return jsonify({"error": "Access denied"}), 403
gc = current_app.extensions.get("gc")
if not gc:
return jsonify({"executions": []})
limit = min(int(request.args.get("limit", 10)), 200)
offset = int(request.args.get("offset", 0))
records = gc.get_history(limit=limit, offset=offset)
return jsonify({"executions": records})
@ui_bp.post("/system/integrity/run")
def system_integrity_run():
principal = _current_principal()
try:
_iam().authorize(principal, None, "iam:*")
except IamError:
return jsonify({"error": "Access denied"}), 403
checker = current_app.extensions.get("integrity")
if not checker:
return jsonify({"error": "Integrity checker is not enabled"}), 400
payload = request.get_json(silent=True) or {}
started = checker.run_async(
auto_heal=payload.get("auto_heal"),
dry_run=payload.get("dry_run"),
)
if not started:
return jsonify({"error": "A scan is already in progress"}), 409
return jsonify({"status": "started"})
@ui_bp.get("/system/integrity/status")
def system_integrity_status():
principal = _current_principal()
try:
_iam().authorize(principal, None, "iam:*")
except IamError:
return jsonify({"error": "Access denied"}), 403
checker = current_app.extensions.get("integrity")
if not checker:
return jsonify({"error": "Integrity checker is not enabled"}), 400
return jsonify(checker.get_status())
@ui_bp.get("/system/integrity/history")
def system_integrity_history():
principal = _current_principal()
try:
_iam().authorize(principal, None, "iam:*")
except IamError:
return jsonify({"error": "Access denied"}), 403
checker = current_app.extensions.get("integrity")
if not checker:
return jsonify({"executions": []})
limit = min(int(request.args.get("limit", 10)), 200)
offset = int(request.args.get("offset", 0))
records = checker.get_history(limit=limit, offset=offset)
return jsonify({"executions": records})
@ui_bp.app_errorhandler(404)
def ui_not_found(error): # type: ignore[override]
prefix = ui_bp.url_prefix or ""

View File

@@ -1,6 +1,6 @@
from __future__ import annotations
APP_VERSION = "0.3.8"
APP_VERSION = "0.4.3"
def get_version() -> str:

View File

@@ -0,0 +1,4 @@
#!/bin/sh
set -e
exec python run.py --prod

View File

@@ -125,7 +125,7 @@ pub fn delete_index_entry(py: Python<'_>, path: &str, entry_name: &str) -> PyRes
fs::write(&path_owned, serialized)
.map_err(|e| PyIOError::new_err(format!("Failed to write index: {}", e)))?;
Ok(false)
Ok(true)
})
}

View File

@@ -1,4 +1,4 @@
Flask>=3.1.2
Flask>=3.1.3
Flask-Limiter>=4.1.1
Flask-Cors>=6.0.2
Flask-WTF>=1.2.2
@@ -6,8 +6,8 @@ python-dotenv>=1.2.1
pytest>=9.0.2
requests>=2.32.5
boto3>=1.42.14
waitress>=3.0.2
psutil>=7.1.3
cryptography>=46.0.3
granian>=2.7.2
psutil>=7.2.2
cryptography>=46.0.5
defusedxml>=0.7.1
duckdb>=1.4.4
duckdb>=1.5.1

View File

@@ -2,7 +2,9 @@
from __future__ import annotations
import argparse
import atexit
import os
import signal
import sys
import warnings
import multiprocessing
@@ -24,6 +26,12 @@ from typing import Optional
from app import create_api_app, create_ui_app
from app.config import AppConfig
from app.iam import IamService, IamError, ALLOWED_ACTIONS, _derive_fernet_key
from app.version import get_version
PYTHON_DEPRECATION_MESSAGE = (
"The Python MyFSIO runtime is deprecated as of 2026-04-21. "
"Use the Rust server in rust/myfsio-engine for supported development and production usage."
)
def _server_host() -> str:
@@ -40,24 +48,42 @@ def _is_frozen() -> bool:
return getattr(sys, 'frozen', False) or '__compiled__' in globals()
def serve_api(port: int, prod: bool = False, config: Optional[AppConfig] = None) -> None:
app = create_api_app()
if prod:
from waitress import serve
def _serve_granian(target: str, port: int, config: Optional[AppConfig] = None) -> None:
from granian import Granian
from granian.constants import Interfaces
from granian.http import HTTP1Settings
kwargs: dict = {
"target": target,
"address": _server_host(),
"port": port,
"interface": Interfaces.WSGI,
"factory": True,
"workers": 1,
}
if config:
serve(
app,
host=_server_host(),
port=port,
ident="MyFSIO",
threads=config.server_threads,
connection_limit=config.server_connection_limit,
backlog=config.server_backlog,
channel_timeout=config.server_channel_timeout,
kwargs["blocking_threads"] = config.server_threads
kwargs["backlog"] = config.server_backlog
kwargs["backpressure"] = config.server_connection_limit
kwargs["http1_settings"] = HTTP1Settings(
header_read_timeout=config.server_channel_timeout * 1000,
max_buffer_size=config.server_max_buffer_size,
)
else:
serve(app, host=_server_host(), port=port, ident="MyFSIO")
kwargs["http1_settings"] = HTTP1Settings(
max_buffer_size=1024 * 1024 * 128,
)
server = Granian(**kwargs)
server.serve()
def serve_api(port: int, prod: bool = False, config: Optional[AppConfig] = None) -> None:
if prod:
_serve_granian("app:create_api_app", port, config)
else:
app = create_api_app()
debug = _is_debug_enabled()
if debug:
warnings.warn("DEBUG MODE ENABLED - DO NOT USE IN PRODUCTION", RuntimeWarning)
@@ -65,23 +91,10 @@ def serve_api(port: int, prod: bool = False, config: Optional[AppConfig] = None)
def serve_ui(port: int, prod: bool = False, config: Optional[AppConfig] = None) -> None:
app = create_ui_app()
if prod:
from waitress import serve
if config:
serve(
app,
host=_server_host(),
port=port,
ident="MyFSIO",
threads=config.server_threads,
connection_limit=config.server_connection_limit,
backlog=config.server_backlog,
channel_timeout=config.server_channel_timeout,
)
else:
serve(app, host=_server_host(), port=port, ident="MyFSIO")
_serve_granian("app:create_ui_app", port, config)
else:
app = create_ui_app()
debug = _is_debug_enabled()
if debug:
warnings.warn("DEBUG MODE ENABLED - DO NOT USE IN PRODUCTION", RuntimeWarning)
@@ -126,6 +139,7 @@ def reset_credentials() -> None:
pass
if raw_config and raw_config.get("users"):
is_v2 = raw_config.get("version", 1) >= 2
admin_user = None
for user in raw_config["users"]:
policies = user.get("policies", [])
@@ -139,15 +153,39 @@ def reset_credentials() -> None:
if not admin_user:
admin_user = raw_config["users"][0]
if is_v2:
admin_keys = admin_user.get("access_keys", [])
if admin_keys:
admin_keys[0]["access_key"] = access_key
admin_keys[0]["secret_key"] = secret_key
else:
from datetime import datetime as _dt, timezone as _tz
admin_user["access_keys"] = [{
"access_key": access_key,
"secret_key": secret_key,
"status": "active",
"created_at": _dt.now(_tz.utc).isoformat(),
}]
else:
admin_user["access_key"] = access_key
admin_user["secret_key"] = secret_key
else:
from datetime import datetime as _dt, timezone as _tz
raw_config = {
"version": 2,
"users": [
{
"user_id": f"u-{secrets.token_hex(8)}",
"display_name": "Local Admin",
"enabled": True,
"access_keys": [
{
"access_key": access_key,
"secret_key": secret_key,
"display_name": "Local Admin",
"status": "active",
"created_at": _dt.now(_tz.utc).isoformat(),
}
],
"policies": [
{"bucket": "*", "actions": list(ALLOWED_ACTIONS)}
],
@@ -192,13 +230,16 @@ if __name__ == "__main__":
parser.add_argument("--mode", choices=["api", "ui", "both", "reset-cred"], default="both")
parser.add_argument("--api-port", type=int, default=5000)
parser.add_argument("--ui-port", type=int, default=5100)
parser.add_argument("--prod", action="store_true", help="Run in production mode using Waitress")
parser.add_argument("--prod", action="store_true", help="Run in production mode using Granian")
parser.add_argument("--dev", action="store_true", help="Force development mode (Flask dev server)")
parser.add_argument("--check-config", action="store_true", help="Validate configuration and exit")
parser.add_argument("--show-config", action="store_true", help="Show configuration summary and exit")
parser.add_argument("--reset-cred", action="store_true", help="Reset admin credentials and exit")
parser.add_argument("--version", action="version", version=f"MyFSIO {get_version()}")
args = parser.parse_args()
warnings.warn(PYTHON_DEPRECATION_MESSAGE, DeprecationWarning, stacklevel=1)
if args.reset_cred or args.mode == "reset-cred":
reset_credentials()
sys.exit(0)
@@ -235,7 +276,7 @@ if __name__ == "__main__":
pass
if prod_mode:
print("Running in production mode (Waitress)")
print("Running in production mode (Granian)")
issues = config.validate_and_report()
critical_issues = [i for i in issues if i.startswith("CRITICAL:")]
if critical_issues:
@@ -248,11 +289,22 @@ if __name__ == "__main__":
if args.mode in {"api", "both"}:
print(f"Starting API server on port {args.api_port}...")
api_proc = Process(target=serve_api, args=(args.api_port, prod_mode, config), daemon=True)
api_proc = Process(target=serve_api, args=(args.api_port, prod_mode, config))
api_proc.start()
else:
api_proc = None
def _cleanup_api():
if api_proc and api_proc.is_alive():
api_proc.terminate()
api_proc.join(timeout=5)
if api_proc.is_alive():
api_proc.kill()
if api_proc:
atexit.register(_cleanup_api)
signal.signal(signal.SIGTERM, lambda *_: sys.exit(0))
if args.mode in {"ui", "both"}:
print(f"Starting UI server on port {args.ui_port}...")
serve_ui(args.ui_port, prod_mode, config)

View File

@@ -15,6 +15,12 @@
--myfsio-hover-bg: rgba(59, 130, 246, 0.12);
--myfsio-accent: #3b82f6;
--myfsio-accent-hover: #2563eb;
--myfsio-tag-key-bg: #e0e7ff;
--myfsio-tag-key-text: #3730a3;
--myfsio-tag-value-bg: #f0f1fa;
--myfsio-tag-value-text: #4338ca;
--myfsio-tag-border: #c7d2fe;
--myfsio-tag-delete-hover: #ef4444;
}
[data-theme='dark'] {
@@ -34,6 +40,12 @@
--myfsio-hover-bg: rgba(59, 130, 246, 0.2);
--myfsio-accent: #60a5fa;
--myfsio-accent-hover: #3b82f6;
--myfsio-tag-key-bg: #312e81;
--myfsio-tag-key-text: #c7d2fe;
--myfsio-tag-value-bg: #1e1b4b;
--myfsio-tag-value-text: #a5b4fc;
--myfsio-tag-border: #4338ca;
--myfsio-tag-delete-hover: #f87171;
}
[data-theme='dark'] body,
@@ -2643,7 +2655,7 @@ pre code {
}
.objects-table-container {
max-height: none;
max-height: 60vh;
}
.preview-card {
@@ -3002,6 +3014,89 @@ body:has(.login-card) .main-wrapper {
padding: 0.375rem 1rem;
}
.tag-pill {
display: inline-flex;
border-radius: 9999px;
border: 1px solid var(--myfsio-tag-border);
overflow: hidden;
font-size: 0.75rem;
line-height: 1;
}
.tag-pill-key {
padding: 0.3rem 0.5rem;
background: var(--myfsio-tag-key-bg);
color: var(--myfsio-tag-key-text);
font-weight: 600;
}
.tag-pill-value {
padding: 0.3rem 0.5rem;
background: var(--myfsio-tag-value-bg);
color: var(--myfsio-tag-value-text);
font-weight: 400;
}
.tag-editor-card {
background: var(--myfsio-preview-bg);
border-radius: 0.5rem;
padding: 0.75rem;
}
.tag-editor-header,
.tag-editor-row {
display: grid;
grid-template-columns: 1fr 1fr 28px;
gap: 0.5rem;
align-items: center;
}
.tag-editor-header {
padding-bottom: 0.375rem;
border-bottom: 1px solid var(--myfsio-card-border);
margin-bottom: 0.5rem;
}
.tag-editor-header span {
font-size: 0.7rem;
font-weight: 600;
text-transform: uppercase;
color: var(--myfsio-muted);
letter-spacing: 0.05em;
}
.tag-editor-row {
margin-bottom: 0.375rem;
}
.tag-editor-delete {
display: inline-flex;
align-items: center;
justify-content: center;
width: 28px;
height: 28px;
border: none;
background: transparent;
color: var(--myfsio-muted);
border-radius: 0.375rem;
cursor: pointer;
transition: color 0.15s, background 0.15s;
}
.tag-editor-delete:hover {
color: var(--myfsio-tag-delete-hover);
background: rgba(239, 68, 68, 0.1);
}
.tag-editor-actions {
display: flex;
align-items: center;
gap: 0.5rem;
margin-top: 0.75rem;
padding-top: 0.5rem;
border-top: 1px solid var(--myfsio-card-border);
}
@media (prefers-reduced-motion: reduce) {
*,
*::before,

View File

Binary image replaced (size unchanged at 200 KiB).

View File

Binary image replaced (size unchanged at 872 KiB).

View File

@@ -98,6 +98,9 @@
const previewMetadata = document.getElementById('preview-metadata');
const previewMetadataList = document.getElementById('preview-metadata-list');
const previewPlaceholder = document.getElementById('preview-placeholder');
const previewPlaceholderDefault = previewPlaceholder ? previewPlaceholder.innerHTML : '';
const previewErrorAlert = document.getElementById('preview-error-alert');
const previewDetailsMeta = document.getElementById('preview-details-meta');
const previewImage = document.getElementById('preview-image');
const previewVideo = document.getElementById('preview-video');
const previewAudio = document.getElementById('preview-audio');
@@ -242,12 +245,12 @@
</svg>
</a>
<div class="dropdown d-inline-block">
<button class="btn btn-outline-secondary btn-icon dropdown-toggle" type="button" data-bs-toggle="dropdown" data-bs-auto-close="true" aria-expanded="false" title="More actions">
<button class="btn btn-outline-secondary btn-icon dropdown-toggle" type="button" data-bs-toggle="dropdown" data-bs-auto-close="true" data-bs-config='{"popperConfig":{"strategy":"fixed"}}' aria-expanded="false" title="More actions">
<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" fill="currentColor" viewBox="0 0 16 16">
<path d="M9.5 13a1.5 1.5 0 1 1-3 0 1.5 1.5 0 0 1 3 0zm0-5a1.5 1.5 0 1 1-3 0 1.5 1.5 0 0 1 3 0zm0-5a1.5 1.5 0 1 1-3 0 1.5 1.5 0 0 1 3 0z"/>
</svg>
</button>
<ul class="dropdown-menu dropdown-menu-end" style="position: fixed;">
<ul class="dropdown-menu dropdown-menu-end">
<li><button class="dropdown-item" type="button" onclick="openCopyMoveModal('copy', '${escapeHtml(obj.key)}')">
<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" fill="currentColor" class="me-2" viewBox="0 0 16 16"><path fill-rule="evenodd" d="M4 2a2 2 0 0 1 2-2h8a2 2 0 0 1 2 2v8a2 2 0 0 1-2 2H6a2 2 0 0 1-2-2V2Zm2-1a1 1 0 0 0-1 1v8a1 1 0 0 0 1 1h8a1 1 0 0 0 1-1V2a1 1 0 0 0-1-1H6ZM2 5a1 1 0 0 0-1 1v8a1 1 0 0 0 1 1h8a1 1 0 0 0 1-1v-1h1v1a2 2 0 0 1-2 2H2a2 2 0 0 1-2-2V6a2 2 0 0 1 2-2h1v1H2Z"/></svg>
Copy
@@ -849,6 +852,11 @@
selectCheckbox.checked = true;
row.classList.add('table-active');
}
if (activeRow && activeRow.dataset.key === row.dataset.key) {
row.classList.add('table-active');
activeRow = row;
}
});
const folderRows = document.querySelectorAll('.folder-row');
@@ -861,6 +869,11 @@
const checkbox = row.querySelector('[data-folder-select]');
checkbox?.addEventListener('change', (e) => {
e.stopPropagation();
if (checkbox.checked) {
selectedRows.set(folderPath, { key: folderPath, isFolder: true });
} else {
selectedRows.delete(folderPath);
}
const folderObjects = allObjects.filter(obj => obj.key.startsWith(folderPath));
folderObjects.forEach(obj => {
if (checkbox.checked) {
@@ -935,7 +948,7 @@
const row = e.target.closest('[data-object-row]');
if (!row) return;
if (e.target.closest('[data-delete-object]') || e.target.closest('[data-object-select]') || e.target.closest('a')) {
if (e.target.closest('[data-delete-object]') || e.target.closest('[data-object-select]') || e.target.closest('a') || e.target.closest('.dropdown')) {
return;
}
@@ -1345,8 +1358,11 @@
}
if (selectAllCheckbox) {
const filesInView = visibleItems.filter(item => item.type === 'file');
const total = filesInView.length;
const visibleSelectedCount = filesInView.filter(item => selectedRows.has(item.data.key)).length;
const foldersInView = visibleItems.filter(item => item.type === 'folder');
const total = filesInView.length + foldersInView.length;
const fileSelectedCount = filesInView.filter(item => selectedRows.has(item.data.key)).length;
const folderSelectedCount = foldersInView.filter(item => selectedRows.has(item.path)).length;
const visibleSelectedCount = fileSelectedCount + folderSelectedCount;
selectAllCheckbox.disabled = total === 0;
selectAllCheckbox.checked = visibleSelectedCount > 0 && visibleSelectedCount === total && total > 0;
selectAllCheckbox.indeterminate = visibleSelectedCount > 0 && visibleSelectedCount < total;
@@ -1368,8 +1384,12 @@
const keys = Array.from(selectedRows.keys());
bulkDeleteList.innerHTML = '';
if (bulkDeleteCount) {
const label = keys.length === 1 ? 'object' : 'objects';
bulkDeleteCount.textContent = `${keys.length} ${label} selected`;
const folderCount = keys.filter(k => k.endsWith('/')).length;
const objectCount = keys.length - folderCount;
const parts = [];
if (folderCount) parts.push(`${folderCount} folder${folderCount !== 1 ? 's' : ''}`);
if (objectCount) parts.push(`${objectCount} object${objectCount !== 1 ? 's' : ''}`);
bulkDeleteCount.textContent = `${parts.join(' and ')} selected`;
}
if (!keys.length) {
const empty = document.createElement('li');
@@ -1508,7 +1528,7 @@
};
const response = await fetch(endpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
headers: { 'Content-Type': 'application/json', 'X-CSRFToken': window.getCsrfToken ? window.getCsrfToken() : '' },
body: JSON.stringify(payload),
});
const data = await response.json();
@@ -1952,6 +1972,10 @@
[previewImage, previewVideo, previewAudio, previewIframe].forEach((el) => {
if (!el) return;
el.classList.add('d-none');
if (el.tagName === 'IMG') {
el.removeAttribute('src');
el.onload = null;
}
if (el.tagName === 'VIDEO' || el.tagName === 'AUDIO') {
el.pause();
el.removeAttribute('src');
@@ -1964,9 +1988,38 @@
previewText.classList.add('d-none');
previewText.textContent = '';
}
previewPlaceholder.innerHTML = previewPlaceholderDefault;
previewPlaceholder.classList.remove('d-none');
};
let previewFailed = false;
const handlePreviewError = () => {
previewFailed = true;
if (downloadButton) {
downloadButton.classList.add('disabled');
downloadButton.removeAttribute('href');
}
if (presignButton) presignButton.disabled = true;
if (generatePresignButton) generatePresignButton.disabled = true;
if (previewDetailsMeta) previewDetailsMeta.classList.add('d-none');
if (previewMetadata) previewMetadata.classList.add('d-none');
const tagsPanel = document.getElementById('preview-tags');
if (tagsPanel) tagsPanel.classList.add('d-none');
const versionPanel = document.getElementById('version-panel');
if (versionPanel) versionPanel.classList.add('d-none');
if (previewErrorAlert) {
previewErrorAlert.textContent = 'Unable to load object \u2014 it may have been deleted, or the server returned an error.';
previewErrorAlert.classList.remove('d-none');
}
};
const clearPreviewError = () => {
previewFailed = false;
if (previewErrorAlert) previewErrorAlert.classList.add('d-none');
if (previewDetailsMeta) previewDetailsMeta.classList.remove('d-none');
};
async function fetchMetadata(metadataUrl) {
if (!metadataUrl) return null;
try {
@@ -1988,6 +2041,7 @@
previewPanel.classList.remove('d-none');
activeRow = row;
renderMetadata(null);
clearPreviewError();
previewKey.textContent = row.dataset.key;
previewSize.textContent = formatBytes(Number(row.dataset.size));
@@ -2011,18 +2065,71 @@
const previewUrl = row.dataset.previewUrl;
const lower = row.dataset.key.toLowerCase();
if (previewUrl && lower.match(/\.(png|jpg|jpeg|gif|webp|svg|ico|bmp)$/)) {
previewImage.src = previewUrl;
previewPlaceholder.innerHTML = '<div class="spinner-border spinner-border-sm text-secondary" role="status"></div><div class="small mt-2">Loading preview\u2026</div>';
const currentRow = row;
fetch(previewUrl)
.then((r) => {
if (activeRow !== currentRow) return;
if (!r.ok) {
previewPlaceholder.innerHTML = '<div class="small text-muted">Failed to load preview</div>';
handlePreviewError();
return;
}
return r.blob();
})
.then((blob) => {
if (!blob || activeRow !== currentRow) return;
const url = URL.createObjectURL(blob);
previewImage.onload = () => {
if (activeRow !== currentRow) { URL.revokeObjectURL(url); return; }
previewImage.classList.remove('d-none');
previewPlaceholder.classList.add('d-none');
};
previewImage.onerror = () => {
if (activeRow !== currentRow) { URL.revokeObjectURL(url); return; }
URL.revokeObjectURL(url);
previewPlaceholder.innerHTML = '<div class="small text-muted">Failed to load preview</div>';
};
previewImage.src = url;
})
.catch(() => {
if (activeRow !== currentRow) return;
previewPlaceholder.innerHTML = '<div class="small text-muted">Failed to load preview</div>';
handlePreviewError();
});
} else if (previewUrl && lower.match(/\.(mp4|webm|ogv|mov|avi|mkv)$/)) {
const currentRow = row;
previewVideo.onerror = () => {
if (activeRow !== currentRow) return;
previewVideo.classList.add('d-none');
previewPlaceholder.classList.remove('d-none');
previewPlaceholder.innerHTML = '<div class="small text-muted">Failed to load preview</div>';
handlePreviewError();
};
previewVideo.src = previewUrl;
previewVideo.classList.remove('d-none');
previewPlaceholder.classList.add('d-none');
} else if (previewUrl && lower.match(/\.(mp3|wav|flac|ogg|aac|m4a|wma)$/)) {
const currentRow = row;
previewAudio.onerror = () => {
if (activeRow !== currentRow) return;
previewAudio.classList.add('d-none');
previewPlaceholder.classList.remove('d-none');
previewPlaceholder.innerHTML = '<div class="small text-muted">Failed to load preview</div>';
handlePreviewError();
};
previewAudio.src = previewUrl;
previewAudio.classList.remove('d-none');
previewPlaceholder.classList.add('d-none');
} else if (previewUrl && lower.match(/\.(pdf)$/)) {
const currentRow = row;
previewIframe.onerror = () => {
if (activeRow !== currentRow) return;
previewIframe.classList.add('d-none');
previewPlaceholder.classList.remove('d-none');
previewPlaceholder.innerHTML = '<div class="small text-muted">Failed to load preview</div>';
handlePreviewError();
};
previewIframe.src = previewUrl;
previewIframe.style.minHeight = '500px';
previewIframe.classList.remove('d-none');
@@ -2047,14 +2154,17 @@
})
.catch(() => {
if (activeRow !== currentRow) return;
previewText.textContent = 'Failed to load preview';
previewText.classList.add('d-none');
previewPlaceholder.classList.remove('d-none');
previewPlaceholder.innerHTML = '<div class="small text-muted">Failed to load preview</div>';
handlePreviewError();
});
}
const metadataUrl = row.dataset.metadataUrl;
if (metadataUrl) {
const metadata = await fetchMetadata(metadataUrl);
if (activeRow === row) {
if (activeRow === row && !previewFailed) {
renderMetadata(metadata);
}
}
@@ -3152,6 +3262,15 @@
}
});
const foldersInView = visibleItems.filter(item => item.type === 'folder');
foldersInView.forEach(item => {
if (shouldSelect) {
selectedRows.set(item.path, { key: item.path, isFolder: true });
} else {
selectedRows.delete(item.path);
}
});
document.querySelectorAll('[data-folder-select]').forEach(cb => {
cb.checked = shouldSelect;
});
@@ -3948,9 +4067,14 @@
const cancelTagsButton = document.getElementById('cancelTagsButton');
let currentObjectTags = [];
let isEditingTags = false;
let savedObjectTags = [];
const loadObjectTags = async (row) => {
if (!row || !previewTagsPanel) return;
if (previewFailed) {
previewTagsPanel.classList.add('d-none');
return;
}
const tagsUrl = row.dataset.tagsUrl;
if (!tagsUrl) {
previewTagsPanel.classList.add('d-none');
@@ -3976,17 +4100,26 @@
previewTagsEmpty.classList.remove('d-none');
} else {
previewTagsEmpty.classList.add('d-none');
previewTagsList.innerHTML = currentObjectTags.map(t => `<span class="badge bg-info-subtle text-info">${escapeHtml(t.Key)}=${escapeHtml(t.Value)}</span>`).join('');
previewTagsList.innerHTML = currentObjectTags.map(t => `<span class="tag-pill"><span class="tag-pill-key">${escapeHtml(t.Key)}</span><span class="tag-pill-value">${escapeHtml(t.Value)}</span></span>`).join('');
}
};
const syncTagInputs = () => {
previewTagsInputs?.querySelectorAll('.tag-editor-row').forEach((row, idx) => {
if (idx < currentObjectTags.length) {
currentObjectTags[idx].Key = row.querySelector(`[data-tag-key="${idx}"]`)?.value || '';
currentObjectTags[idx].Value = row.querySelector(`[data-tag-value="${idx}"]`)?.value || '';
}
});
};
const renderTagEditor = () => {
if (!previewTagsInputs) return;
previewTagsInputs.innerHTML = currentObjectTags.map((t, idx) => `
<div class="input-group input-group-sm mb-1">
<input type="text" class="form-control" placeholder="Key" value="${escapeHtml(t.Key)}" data-tag-key="${idx}">
<input type="text" class="form-control" placeholder="Value" value="${escapeHtml(t.Value)}" data-tag-value="${idx}">
<button class="btn btn-outline-danger" type="button" onclick="removeTagRow(${idx})">
<div class="tag-editor-row">
<input type="text" class="form-control form-control-sm" placeholder="e.g. Environment" value="${escapeHtml(t.Key)}" data-tag-key="${idx}">
<input type="text" class="form-control form-control-sm" placeholder="e.g. Production" value="${escapeHtml(t.Value)}" data-tag-value="${idx}">
<button class="tag-editor-delete" type="button" onclick="removeTagRow(${idx})">
<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="currentColor" viewBox="0 0 16 16"><path d="M4.646 4.646a.5.5 0 0 1 .708 0L8 7.293l2.646-2.647a.5.5 0 0 1 .708.708L8.707 8l2.647 2.646a.5.5 0 0 1-.708.708L8 8.707l-2.646 2.647a.5.5 0 0 1-.708-.708L7.293 8 4.646 5.354a.5.5 0 0 1 0-.708z"/></svg>
</button>
</div>
@@ -3994,20 +4127,29 @@
};
window.removeTagRow = (idx) => {
syncTagInputs();
currentObjectTags.splice(idx, 1);
renderTagEditor();
};
editTagsButton?.addEventListener('click', () => {
savedObjectTags = currentObjectTags.map(t => ({ Key: t.Key, Value: t.Value }));
isEditingTags = true;
previewTagsList.classList.add('d-none');
previewTagsEmpty.classList.add('d-none');
previewTagsEditor?.classList.remove('d-none');
const card = previewTagsEditor?.querySelector('.tag-editor-card');
if (card) {
card.style.opacity = '0';
card.style.transition = 'opacity 0.2s ease';
requestAnimationFrame(() => { card.style.opacity = '1'; });
}
renderTagEditor();
});
cancelTagsButton?.addEventListener('click', () => {
isEditingTags = false;
currentObjectTags = savedObjectTags.map(t => ({ Key: t.Key, Value: t.Value }));
previewTagsEditor?.classList.add('d-none');
previewTagsList.classList.remove('d-none');
renderObjectTags();
@@ -4018,6 +4160,7 @@
showMessage({ title: 'Limit reached', body: 'Maximum 10 tags allowed per object.', variant: 'warning' });
return;
}
syncTagInputs();
currentObjectTags.push({ Key: '', Value: '' });
renderTagEditor();
});
@@ -4026,7 +4169,7 @@
if (!activeRow) return;
const tagsUrl = activeRow.dataset.tagsUrl;
if (!tagsUrl) return;
const inputs = previewTagsInputs?.querySelectorAll('.input-group');
const inputs = previewTagsInputs?.querySelectorAll('.tag-editor-row');
const newTags = [];
inputs?.forEach((group, idx) => {
const key = group.querySelector(`[data-tag-key="${idx}"]`)?.value?.trim() || '';

View File

@@ -3,6 +3,8 @@ window.BucketDetailUpload = (function() {
const MULTIPART_THRESHOLD = 8 * 1024 * 1024;
const CHUNK_SIZE = 8 * 1024 * 1024;
const MAX_PART_RETRIES = 3;
const RETRY_BASE_DELAY_MS = 1000;
let state = {
isUploading: false,
@@ -204,6 +206,67 @@ window.BucketDetailUpload = (function() {
}
}
function uploadPartXHR(url, chunk, csrfToken, baseBytes, fileSize, progressItem, partNumber, totalParts) {
return new Promise((resolve, reject) => {
const xhr = new XMLHttpRequest();
xhr.open('PUT', url, true);
xhr.setRequestHeader('X-CSRFToken', csrfToken || '');
xhr.upload.addEventListener('progress', (e) => {
if (e.lengthComputable) {
updateProgressItem(progressItem, {
status: `Part ${partNumber}/${totalParts}`,
loaded: baseBytes + e.loaded,
total: fileSize
});
}
});
xhr.addEventListener('load', () => {
if (xhr.status >= 200 && xhr.status < 300) {
try {
resolve(JSON.parse(xhr.responseText));
} catch {
reject(new Error(`Part ${partNumber}: invalid response`));
}
} else {
try {
const data = JSON.parse(xhr.responseText);
reject(new Error(data.error || `Part ${partNumber} failed (${xhr.status})`));
} catch {
reject(new Error(`Part ${partNumber} failed (${xhr.status})`));
}
}
});
xhr.addEventListener('error', () => reject(new Error(`Part ${partNumber}: network error`)));
xhr.addEventListener('abort', () => reject(new Error(`Part ${partNumber}: aborted`)));
xhr.send(chunk);
});
}
async function uploadPartWithRetry(url, chunk, csrfToken, baseBytes, fileSize, progressItem, partNumber, totalParts) {
let lastError;
for (let attempt = 0; attempt <= MAX_PART_RETRIES; attempt++) {
try {
return await uploadPartXHR(url, chunk, csrfToken, baseBytes, fileSize, progressItem, partNumber, totalParts);
} catch (err) {
lastError = err;
if (attempt < MAX_PART_RETRIES) {
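// Exponential backoff: waits 1 s, 2 s, then 4 s before retries 1-3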
const delay = RETRY_BASE_DELAY_MS * Math.pow(2, attempt);
updateProgressItem(progressItem, {
status: `Part ${partNumber}/${totalParts} retry ${attempt + 1}/${MAX_PART_RETRIES}...`,
loaded: baseBytes,
total: fileSize
});
await new Promise(r => setTimeout(r, delay));
}
}
}
throw lastError;
}
async function uploadMultipart(file, objectKey, metadata, progressItem, urls) {
const csrfToken = document.querySelector('input[name="csrf_token"]')?.value;
@@ -233,26 +296,14 @@ window.BucketDetailUpload = (function() {
const end = Math.min(start + CHUNK_SIZE, file.size);
const chunk = file.slice(start, end);
updateProgressItem(progressItem, {
status: `Part ${partNumber}/${totalParts}`,
loaded: uploadedBytes,
total: file.size
});
const partData = await uploadPartWithRetry(
`${partUrl}?partNumber=${partNumber}`,
chunk, csrfToken, uploadedBytes, file.size,
progressItem, partNumber, totalParts
);
const partResp = await fetch(`${partUrl}?partNumber=${partNumber}`, {
method: 'PUT',
headers: { 'X-CSRFToken': csrfToken || '' },
body: chunk
});
if (!partResp.ok) {
const err = await partResp.json().catch(() => ({}));
throw new Error(err.error || `Part ${partNumber} failed`);
}
const partData = await partResp.json();
parts.push({ part_number: partNumber, etag: partData.etag });
uploadedBytes += chunk.size;
uploadedBytes += (end - start);
updateProgressItem(progressItem, {
loaded: uploadedBytes,


@@ -17,12 +17,20 @@ window.IAMManagement = (function() {
var currentDeleteKey = null;
var currentExpiryKey = null;
var ALL_S3_ACTIONS = ['list', 'read', 'write', 'delete', 'share', 'policy', 'replication', 'lifecycle', 'cors'];
var ALL_S3_ACTIONS = [
'list', 'read', 'write', 'delete', 'share', 'policy',
'replication', 'lifecycle', 'cors',
'create_bucket', 'delete_bucket',
'versioning', 'tagging', 'encryption', 'quota',
'object_lock', 'notification', 'logging', 'website'
];
var policyTemplates = {
full: [{ bucket: '*', actions: ['list', 'read', 'write', 'delete', 'share', 'policy', 'replication', 'lifecycle', 'cors', 'iam:*'] }],
full: [{ bucket: '*', actions: ['list', 'read', 'write', 'delete', 'share', 'policy', 'create_bucket', 'delete_bucket', 'replication', 'lifecycle', 'cors', 'versioning', 'tagging', 'encryption', 'quota', 'object_lock', 'notification', 'logging', 'website', 'iam:*'] }],
readonly: [{ bucket: '*', actions: ['list', 'read'] }],
writer: [{ bucket: '*', actions: ['list', 'read', 'write'] }]
writer: [{ bucket: '*', actions: ['list', 'read', 'write'] }],
operator: [{ bucket: '*', actions: ['list', 'read', 'write', 'delete', 'create_bucket', 'delete_bucket'] }],
bucketadmin: [{ bucket: '*', actions: ['list', 'read', 'write', 'delete', 'share', 'policy', 'create_bucket', 'delete_bucket', 'versioning', 'tagging', 'encryption', 'cors', 'lifecycle', 'quota', 'object_lock', 'notification', 'logging', 'website', 'replication'] }]
};
function isAdminUser(policies) {


@@ -110,6 +110,14 @@
<span>Domains</span>
</a>
{% endif %}
{% if can_manage_iam %}
<a href="{{ url_for('ui.system_dashboard') }}" class="sidebar-link {% if request.endpoint == 'ui.system_dashboard' %}active{% endif %}">
<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" fill="currentColor" viewBox="0 0 16 16">
<path d="M9.405 1.05c-.413-1.4-2.397-1.4-2.81 0l-.1.34a1.464 1.464 0 0 1-2.105.872l-.31-.17c-1.283-.698-2.686.705-1.987 1.987l.169.311c.446.82.023 1.841-.872 2.105l-.34.1c-1.4.413-1.4 2.397 0 2.81l.34.1a1.464 1.464 0 0 1 .872 2.105l-.17.31c-.698 1.283.705 2.686 1.987 1.987l.311-.169a1.464 1.464 0 0 1 2.105.872l.1.34c.413 1.4 2.397 1.4 2.81 0l.1-.34a1.464 1.464 0 0 1 2.105-.872l.31.17c1.283.698 2.686-.705 1.987-1.987l-.169-.311a1.464 1.464 0 0 1 .872-2.105l.34-.1c1.4-.413 1.4-2.397 0-2.81l-.34-.1a1.464 1.464 0 0 1-.872-2.105l.17-.31c.698-1.283-.705-2.686-1.987-1.987l-.311.169a1.464 1.464 0 0 1-2.105-.872l-.1-.34zM8 10.93a2.929 2.929 0 1 1 0-5.86 2.929 2.929 0 0 1 0 5.858z"/>
</svg>
<span>System</span>
</a>
{% endif %}
</div>
<div class="nav-section">
<span class="nav-section-title">Resources</span>
@@ -210,6 +218,14 @@
<span class="sidebar-link-text">Domains</span>
</a>
{% endif %}
{% if can_manage_iam %}
<a href="{{ url_for('ui.system_dashboard') }}" class="sidebar-link {% if request.endpoint == 'ui.system_dashboard' %}active{% endif %}" data-tooltip="System">
<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" fill="currentColor" viewBox="0 0 16 16">
<path d="M9.405 1.05c-.413-1.4-2.397-1.4-2.81 0l-.1.34a1.464 1.464 0 0 1-2.105.872l-.31-.17c-1.283-.698-2.686.705-1.987 1.987l.169.311c.446.82.023 1.841-.872 2.105l-.34.1c-1.4.413-1.4 2.397 0 2.81l.34.1a1.464 1.464 0 0 1 .872 2.105l-.17.31c-.698 1.283.705 2.686 1.987 1.987l.311-.169a1.464 1.464 0 0 1 2.105.872l.1.34c.413 1.4 2.397 1.4 2.81 0l.1-.34a1.464 1.464 0 0 1 2.105-.872l.31.17c1.283.698 2.686-.705 1.987-1.987l-.169-.311a1.464 1.464 0 0 1 .872-2.105l.34-.1c1.4-.413 1.4-2.397 0-2.81l-.34-.1a1.464 1.464 0 0 1-.872-2.105l.17-.31c.698-1.283-.705-2.686-1.987-1.987l-.311.169a1.464 1.464 0 0 1-2.105-.872l-.1-.34zM8 10.93a2.929 2.929 0 1 1 0-5.86 2.929 2.929 0 0 1 0 5.858z"/>
</svg>
<span class="sidebar-link-text">System</span>
</a>
{% endif %}
</div>
<div class="nav-section">
<span class="nav-section-title">Resources</span>


@@ -257,7 +257,8 @@
Share Link
</button>
</div>
<div class="p-3 rounded mb-3" style="background: var(--myfsio-preview-bg);">
<div id="preview-error-alert" class="alert alert-warning d-none py-2 px-3 mb-3 small" role="alert"></div>
<div id="preview-details-meta" class="p-3 rounded mb-3" style="background: var(--myfsio-preview-bg);">
<dl class="row small mb-0">
<dt class="col-5 text-muted fw-normal">Last modified</dt>
<dd class="col-7 mb-2 fw-medium" id="preview-modified"></dd>
@@ -292,19 +293,28 @@
Edit
</button>
</div>
<div id="preview-tags-list" class="d-flex flex-wrap gap-1"></div>
<div id="preview-tags-list" class="d-flex flex-wrap gap-2"></div>
<div id="preview-tags-empty" class="text-muted small p-2 bg-body-tertiary rounded">No tags</div>
<div id="preview-tags-editor" class="d-none mt-2">
<div id="preview-tags-inputs" class="mb-2"></div>
<div class="d-flex gap-2">
<button class="btn btn-sm btn-outline-secondary flex-grow-1" type="button" id="addTagRow">
<div class="tag-editor-card">
<div class="tag-editor-header">
<span>Key</span>
<span>Value</span>
<span></span>
</div>
<div id="preview-tags-inputs"></div>
<div class="tag-editor-actions">
<button class="btn btn-sm btn-outline-secondary" type="button" id="addTagRow">
<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="currentColor" class="me-1" viewBox="0 0 16 16">
<path d="M8 4a.5.5 0 0 1 .5.5v3h3a.5.5 0 0 1 0 1h-3v3a.5.5 0 0 1-1 0v-3h-3a.5.5 0 0 1 0-1h3v-3A.5.5 0 0 1 8 4z"/>
</svg>
Add Tag
</button>
<button class="btn btn-sm btn-primary" type="button" id="saveTagsButton">Save</button>
<div class="ms-auto d-flex gap-2">
<button class="btn btn-sm btn-outline-secondary" type="button" id="cancelTagsButton">Cancel</button>
<button class="btn btn-sm btn-primary" type="button" id="saveTagsButton">Save</button>
</div>
</div>
</div>
<div class="form-text mt-1">Maximum 10 tags. Keys and values up to 256 characters.</div>
</div>
@@ -911,14 +921,14 @@
<path d="M8 16A8 8 0 1 0 8 0a8 8 0 0 0 0 16zm.93-9.412-1 4.705c-.07.34.029.533.304.533.194 0 .487-.07.686-.246l-.088.416c-.287.346-.92.598-1.465.598-.703 0-1.002-.422-.808-1.319l.738-3.468c.064-.293.006-.399-.287-.47l-.451-.081.082-.381 2.29-.287zM8 5.5a1 1 0 1 1 0-2 1 1 0 0 1 0 2z"/>
</svg>
<div>
<strong>Storage quota enabled</strong>
<strong>Storage quota active</strong>
<p class="mb-0 small">
{% if max_bytes is not none and max_objects is not none %}
Limited to {{ max_bytes | filesizeformat }} and {{ max_objects }} objects.
This bucket is limited to {{ max_bytes | filesizeformat }} storage and {{ max_objects }} objects.
{% elif max_bytes is not none %}
Limited to {{ max_bytes | filesizeformat }} storage.
This bucket is limited to {{ max_bytes | filesizeformat }} storage.
{% else %}
Limited to {{ max_objects }} objects.
This bucket is limited to {{ max_objects }} objects.
{% endif %}
</p>
</div>
@@ -2048,7 +2058,7 @@
<div class="col-12">
<label class="form-label fw-medium">Select files</label>
<input class="form-control" type="file" name="object" id="uploadFileInput" multiple required />
<div class="form-text">Select one or more files from your device. Files ≥ 8&nbsp;MB automatically switch to multipart uploads.</div>
<div class="form-text">Select one or more files from your device. Files ≥ 8&nbsp;MB use multipart uploads with automatic retry.</div>
</div>
<div class="col-12">
<div class="upload-dropzone text-center" data-dropzone>


@@ -41,6 +41,7 @@
<li><a href="#encryption">Encryption</a></li>
<li><a href="#lifecycle">Lifecycle Rules</a></li>
<li><a href="#garbage-collection">Garbage Collection</a></li>
<li><a href="#integrity">Integrity Scanner</a></li>
<li><a href="#metrics">Metrics History</a></li>
<li><a href="#operation-metrics">Operation Metrics</a></li>
<li><a href="#troubleshooting">Troubleshooting</a></li>
@@ -83,7 +84,7 @@ pip install -r requirements.txt
# Run both API and UI (Development)
python run.py
# Run in Production (Waitress server)
# Run in Production (Granian server)
python run.py --prod
# Or run individually
@@ -219,7 +220,7 @@ python run.py --mode ui
<tr>
<td><code>SERVER_THREADS</code></td>
<td><code>0</code> (auto)</td>
<td>Waitress worker threads (1-64). 0 = auto (CPU cores × 2).</td>
<td>Granian blocking threads (1-64). 0 = auto (CPU cores × 2).</td>
</tr>
<tr>
<td><code>SERVER_CONNECTION_LIMIT</code></td>
@@ -1731,10 +1732,114 @@ curl "{{ api_base }}/admin/gc/history?limit=10" \
</div>
</div>
</article>
<article id="metrics" class="card shadow-sm docs-section">
<article id="integrity" class="card shadow-sm docs-section">
<div class="card-body">
<div class="d-flex align-items-center gap-2 mb-3">
<span class="docs-section-kicker">15</span>
<h2 class="h4 mb-0">Integrity Scanner</h2>
</div>
<p class="text-muted">Detect and optionally auto-repair data inconsistencies: corrupted objects, orphaned files, phantom metadata, stale versions, ETag cache drift, and unmigrated legacy metadata.</p>
<h3 class="h6 text-uppercase text-muted mt-4">Enabling Integrity Scanner</h3>
<p class="small text-muted">Disabled by default. Enable via environment variable:</p>
<pre class="mb-3"><code class="language-bash">INTEGRITY_ENABLED=true python run.py</code></pre>
<h3 class="h6 text-uppercase text-muted mt-4">Configuration</h3>
<div class="table-responsive mb-3">
<table class="table table-sm table-bordered small">
<thead class="table-light">
<tr>
<th>Variable</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr><td><code>INTEGRITY_ENABLED</code></td><td><code>false</code></td><td>Enable background integrity scanning</td></tr>
<tr><td><code>INTEGRITY_INTERVAL_HOURS</code></td><td><code>24</code></td><td>Hours between scan cycles</td></tr>
<tr><td><code>INTEGRITY_BATCH_SIZE</code></td><td><code>1000</code></td><td>Max objects to scan per cycle</td></tr>
<tr><td><code>INTEGRITY_AUTO_HEAL</code></td><td><code>false</code></td><td>Automatically repair detected issues</td></tr>
<tr><td><code>INTEGRITY_DRY_RUN</code></td><td><code>false</code></td><td>Log issues without healing</td></tr>
</tbody>
</table>
</div>
<h3 class="h6 text-uppercase text-muted mt-4">What Gets Checked</h3>
<div class="table-responsive mb-3">
<table class="table table-sm table-bordered small">
<thead class="table-light">
<tr>
<th>Check</th>
<th>Detection</th>
<th>Heal Action</th>
</tr>
</thead>
<tbody>
<tr><td><strong>Corrupted objects</strong></td><td>File MD5 does not match stored ETag</td><td>Update ETag in index (disk is authoritative)</td></tr>
<tr><td><strong>Orphaned objects</strong></td><td>File exists without metadata entry</td><td>Create index entry with computed MD5/size/mtime</td></tr>
<tr><td><strong>Phantom metadata</strong></td><td>Index entry exists but file is missing</td><td>Remove stale entry from index</td></tr>
<tr><td><strong>Stale versions</strong></td><td>Manifest without data or vice versa</td><td>Remove orphaned version file</td></tr>
<tr><td><strong>ETag cache</strong></td><td><code>etag_index.json</code> differs from metadata</td><td>Delete cache file (auto-rebuilt)</td></tr>
<tr><td><strong>Legacy metadata</strong></td><td>Legacy <code>.meta.json</code> differs or unmigrated</td><td>Migrate to index, delete legacy file</td></tr>
</tbody>
</table>
</div>
<h3 class="h6 text-uppercase text-muted mt-4">Admin API</h3>
<div class="table-responsive mb-3">
<table class="table table-sm table-bordered small">
<thead class="table-light">
<tr>
<th>Method</th>
<th>Route</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr><td><code>GET</code></td><td><code>/admin/integrity/status</code></td><td>Get scanner status and configuration</td></tr>
<tr><td><code>POST</code></td><td><code>/admin/integrity/run</code></td><td>Trigger manual scan</td></tr>
<tr><td><code>GET</code></td><td><code>/admin/integrity/history</code></td><td>Get scan history</td></tr>
</tbody>
</table>
</div>
<pre class="mb-3"><code class="language-bash"># Trigger a dry run with auto-heal preview
curl -X POST "{{ api_base }}/admin/integrity/run" \
-H "X-Access-Key: &lt;key&gt;" -H "X-Secret-Key: &lt;secret&gt;" \
-H "Content-Type: application/json" \
-d '{"dry_run": true, "auto_heal": true}'
# Trigger actual scan with healing
curl -X POST "{{ api_base }}/admin/integrity/run" \
-H "X-Access-Key: &lt;key&gt;" -H "X-Secret-Key: &lt;secret&gt;" \
-H "Content-Type: application/json" \
-d '{"auto_heal": true}'
# Check status
curl "{{ api_base }}/admin/integrity/status" \
-H "X-Access-Key: &lt;key&gt;" -H "X-Secret-Key: &lt;secret&gt;"
# View history
curl "{{ api_base }}/admin/integrity/history?limit=10" \
-H "X-Access-Key: &lt;key&gt;" -H "X-Secret-Key: &lt;secret&gt;"</code></pre>
<div class="alert alert-light border mb-0">
<div class="d-flex gap-2">
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-info-circle text-muted mt-1 flex-shrink-0" viewBox="0 0 16 16">
<path d="M8 15A7 7 0 1 1 8 1a7 7 0 0 1 0 14zm0 1A8 8 0 1 0 8 0a8 8 0 0 0 0 16z"/>
<path d="m8.93 6.588-2.29.287-.082.38.45.083c.294.07.352.176.288.469l-.738 3.468c-.194.897.105 1.319.808 1.319.545 0 1.178-.252 1.465-.598l.088-.416c-.2.176-.492.246-.686.246-.275 0-.375-.193-.304-.533L8.93 6.588zM9 4.5a1 1 0 1 1-2 0 1 1 0 0 1 2 0z"/>
</svg>
<div>
<strong>Dry Run:</strong> Use <code>INTEGRITY_DRY_RUN=true</code> or pass <code>{"dry_run": true}</code> to the API to preview detected issues without making any changes. Combine with <code>{"auto_heal": true}</code> to see what would be repaired.
</div>
</div>
</div>
</div>
</article>
<article id="metrics" class="card shadow-sm docs-section">
<div class="card-body">
<div class="d-flex align-items-center gap-2 mb-3">
<span class="docs-section-kicker">16</span>
<h2 class="h4 mb-0">Metrics History</h2>
</div>
<p class="text-muted">Track CPU, memory, and disk usage over time with optional metrics history. Disabled by default to minimize overhead.</p>
@@ -1818,7 +1923,7 @@ curl -X PUT "{{ api_base | replace('/api', '/ui') }}/metrics/settings" \
<article id="operation-metrics" class="card shadow-sm docs-section">
<div class="card-body">
<div class="d-flex align-items-center gap-2 mb-3">
<span class="docs-section-kicker">16</span>
<span class="docs-section-kicker">17</span>
<h2 class="h4 mb-0">Operation Metrics</h2>
</div>
<p class="text-muted">Track API request statistics including request counts, latency, error rates, and bandwidth usage. Provides real-time visibility into API operations.</p>
@@ -1925,7 +2030,7 @@ curl "{{ api_base | replace('/api', '/ui') }}/metrics/operations/history?hours=6
<article id="troubleshooting" class="card shadow-sm docs-section">
<div class="card-body">
<div class="d-flex align-items-center gap-2 mb-3">
<span class="docs-section-kicker">17</span>
<span class="docs-section-kicker">18</span>
<h2 class="h4 mb-0">Troubleshooting &amp; tips</h2>
</div>
<div class="table-responsive">
@@ -1976,7 +2081,7 @@ curl "{{ api_base | replace('/api', '/ui') }}/metrics/operations/history?hours=6
<article id="health-check" class="card shadow-sm docs-section">
<div class="card-body">
<div class="d-flex align-items-center gap-2 mb-3">
<span class="docs-section-kicker">18</span>
<span class="docs-section-kicker">19</span>
<h2 class="h4 mb-0">Health Check Endpoint</h2>
</div>
<p class="text-muted">The API exposes a health check endpoint for monitoring and load balancer integration.</p>
@@ -1998,7 +2103,7 @@ curl {{ api_base }}/myfsio/health
<article id="object-lock" class="card shadow-sm docs-section">
<div class="card-body">
<div class="d-flex align-items-center gap-2 mb-3">
<span class="docs-section-kicker">19</span>
<span class="docs-section-kicker">20</span>
<h2 class="h4 mb-0">Object Lock &amp; Retention</h2>
</div>
<p class="text-muted">Object Lock prevents objects from being deleted or overwritten for a specified retention period.</p>
@@ -2058,7 +2163,7 @@ curl "{{ api_base }}/&lt;bucket&gt;/&lt;key&gt;?legal-hold" \
<article id="access-logging" class="card shadow-sm docs-section">
<div class="card-body">
<div class="d-flex align-items-center gap-2 mb-3">
<span class="docs-section-kicker">20</span>
<span class="docs-section-kicker">21</span>
<h2 class="h4 mb-0">Access Logging</h2>
</div>
<p class="text-muted">Enable S3-style access logging to track all requests to your buckets for audit and analysis.</p>
@@ -2085,7 +2190,7 @@ curl "{{ api_base }}/&lt;bucket&gt;?logging" \
<article id="notifications" class="card shadow-sm docs-section">
<div class="card-body">
<div class="d-flex align-items-center gap-2 mb-3">
<span class="docs-section-kicker">21</span>
<span class="docs-section-kicker">22</span>
<h2 class="h4 mb-0">Notifications &amp; Webhooks</h2>
</div>
<p class="text-muted">Configure event notifications to trigger webhooks when objects are created or deleted.</p>
@@ -2148,7 +2253,7 @@ curl -X PUT "{{ api_base }}/&lt;bucket&gt;?notification" \
<article id="select-content" class="card shadow-sm docs-section">
<div class="card-body">
<div class="d-flex align-items-center gap-2 mb-3">
<span class="docs-section-kicker">22</span>
<span class="docs-section-kicker">23</span>
<h2 class="h4 mb-0">SelectObjectContent (SQL)</h2>
</div>
<p class="text-muted">Query CSV, JSON, or Parquet files directly using SQL without downloading the entire object.</p>
@@ -2193,7 +2298,7 @@ curl -X POST "{{ api_base }}/&lt;bucket&gt;/data.csv?select" \
<article id="advanced-ops" class="card shadow-sm docs-section">
<div class="card-body">
<div class="d-flex align-items-center gap-2 mb-3">
<span class="docs-section-kicker">23</span>
<span class="docs-section-kicker">24</span>
<h2 class="h4 mb-0">Advanced S3 Operations</h2>
</div>
<p class="text-muted">Copy, move, and partially download objects using advanced S3 operations.</p>
@@ -2267,7 +2372,7 @@ curl "{{ api_base }}/&lt;bucket&gt;/&lt;key&gt;" \
<article id="acls" class="card shadow-sm docs-section">
<div class="card-body">
<div class="d-flex align-items-center gap-2 mb-3">
<span class="docs-section-kicker">24</span>
<span class="docs-section-kicker">25</span>
<h2 class="h4 mb-0">Access Control Lists (ACLs)</h2>
</div>
<p class="text-muted">ACLs provide legacy-style permission management for buckets and objects.</p>
@@ -2321,7 +2426,7 @@ curl -X PUT "{{ api_base }}/&lt;bucket&gt;/&lt;key&gt;" \
<article id="tagging" class="card shadow-sm docs-section">
<div class="card-body">
<div class="d-flex align-items-center gap-2 mb-3">
<span class="docs-section-kicker">25</span>
<span class="docs-section-kicker">26</span>
<h2 class="h4 mb-0">Object &amp; Bucket Tagging</h2>
</div>
<p class="text-muted">Add metadata tags to buckets and objects for organization, cost allocation, or lifecycle rule filtering.</p>
@@ -2380,7 +2485,7 @@ curl -X PUT "{{ api_base }}/&lt;bucket&gt;?tagging" \
<article id="website-hosting" class="card shadow-sm docs-section">
<div class="card-body">
<div class="d-flex align-items-center gap-2 mb-3">
<span class="docs-section-kicker">26</span>
<span class="docs-section-kicker">27</span>
<h2 class="h4 mb-0">Static Website Hosting</h2>
</div>
<p class="text-muted">Host static websites directly from S3 buckets with custom index and error pages, served via custom domain mapping.</p>
@@ -2473,7 +2578,7 @@ server {
<article id="cors-config" class="card shadow-sm docs-section">
<div class="card-body">
<div class="d-flex align-items-center gap-2 mb-3">
<span class="docs-section-kicker">27</span>
<span class="docs-section-kicker">28</span>
<h2 class="h4 mb-0">CORS Configuration</h2>
</div>
<p class="text-muted">Configure per-bucket Cross-Origin Resource Sharing rules to control which origins can access your bucket from a browser.</p>
@@ -2540,7 +2645,7 @@ curl -X DELETE "{{ api_base }}/&lt;bucket&gt;?cors" \
<article id="post-object" class="card shadow-sm docs-section">
<div class="card-body">
<div class="d-flex align-items-center gap-2 mb-3">
<span class="docs-section-kicker">28</span>
<span class="docs-section-kicker">29</span>
<h2 class="h4 mb-0">PostObject (HTML Form Upload)</h2>
</div>
<p class="text-muted">Upload objects directly from an HTML form using browser-based POST uploads with policy-based authorization.</p>
@@ -2582,7 +2687,7 @@ curl -X DELETE "{{ api_base }}/&lt;bucket&gt;?cors" \
<article id="list-objects-v2" class="card shadow-sm docs-section">
<div class="card-body">
<div class="d-flex align-items-center gap-2 mb-3">
<span class="docs-section-kicker">29</span>
<span class="docs-section-kicker">30</span>
<h2 class="h4 mb-0">List Objects API v2</h2>
</div>
<p class="text-muted">Use the v2 list API for improved pagination with continuation tokens instead of markers.</p>
@@ -2626,7 +2731,7 @@ curl "{{ api_base }}/&lt;bucket&gt;?list-type=2&amp;start-after=photos/2025/" \
<article id="upgrading" class="card shadow-sm docs-section">
<div class="card-body">
<div class="d-flex align-items-center gap-2 mb-3">
<span class="docs-section-kicker">30</span>
<span class="docs-section-kicker">31</span>
<h2 class="h4 mb-0">Upgrading &amp; Updates</h2>
</div>
<p class="text-muted">How to safely update MyFSIO to a new version.</p>
@@ -2659,7 +2764,7 @@ cp -r logs/ logs-backup/</code></pre>
<article id="api-matrix" class="card shadow-sm docs-section">
<div class="card-body">
<div class="d-flex align-items-center gap-2 mb-3">
<span class="docs-section-kicker">31</span>
<span class="docs-section-kicker">32</span>
<h2 class="h4 mb-0">Full API Reference</h2>
</div>
<p class="text-muted">Complete list of all S3-compatible, admin, and KMS endpoints.</p>


@@ -235,7 +235,7 @@
{% set bucket_label = 'All Buckets' if policy.bucket == '*' else policy.bucket %}
{% if '*' in policy.actions %}
{% set perm_label = 'Full Access' %}
{% elif policy.actions|length >= 9 %}
{% elif policy.actions|length >= 19 %}
{% set perm_label = 'Full Access' %}
{% elif 'list' in policy.actions and 'read' in policy.actions and 'write' in policy.actions and 'delete' in policy.actions %}
{% set perm_label = 'Read + Write + Delete' %}
@@ -354,6 +354,8 @@
<button class="btn btn-outline-secondary btn-sm" type="button" data-create-policy-template="full">Full Control</button>
<button class="btn btn-outline-secondary btn-sm" type="button" data-create-policy-template="readonly">Read-Only</button>
<button class="btn btn-outline-secondary btn-sm" type="button" data-create-policy-template="writer">Read + Write</button>
<button class="btn btn-outline-secondary btn-sm" type="button" data-create-policy-template="operator">Operator</button>
<button class="btn btn-outline-secondary btn-sm" type="button" data-create-policy-template="bucketadmin">Bucket Admin</button>
</div>
</div>
<div class="modal-footer">
@@ -404,6 +406,8 @@
<button class="btn btn-outline-secondary btn-sm" type="button" data-policy-template="full">Full Control</button>
<button class="btn btn-outline-secondary btn-sm" type="button" data-policy-template="readonly">Read-Only</button>
<button class="btn btn-outline-secondary btn-sm" type="button" data-policy-template="writer">Read + Write</button>
<button class="btn btn-outline-secondary btn-sm" type="button" data-policy-template="operator">Operator</button>
<button class="btn btn-outline-secondary btn-sm" type="button" data-policy-template="bucketadmin">Bucket Admin</button>
</div>
</form>
</div>


@@ -210,9 +210,6 @@
<div class="fw-bold" data-metric="health_uptime">{{ app.uptime_days }}d</div>
<small class="opacity-75" style="font-size: 0.7rem;">Uptime</small>
</div>
<div class="text-center">
<span class="badge bg-white bg-opacity-25 fw-semibold px-2 py-1">v{{ app.version }}</span>
</div>
</div>
</div>
</div>


@@ -0,0 +1,750 @@
{% extends "base.html" %}
{% block title %}System - MyFSIO Console{% endblock %}
{% block content %}
<div class="page-header d-flex justify-content-between align-items-center mb-4">
<div>
<p class="text-uppercase text-muted small mb-1">Administration</p>
<h1 class="h3 mb-1 d-flex align-items-center gap-2">
<svg xmlns="http://www.w3.org/2000/svg" width="28" height="28" fill="currentColor" class="text-primary" viewBox="0 0 16 16">
<path d="M9.405 1.05c-.413-1.4-2.397-1.4-2.81 0l-.1.34a1.464 1.464 0 0 1-2.105.872l-.31-.17c-1.283-.698-2.686.705-1.987 1.987l.169.311c.446.82.023 1.841-.872 2.105l-.34.1c-1.4.413-1.4 2.397 0 2.81l.34.1a1.464 1.464 0 0 1 .872 2.105l-.17.31c-.698 1.283.705 2.686 1.987 1.987l.311-.169a1.464 1.464 0 0 1 2.105.872l.1.34c.413 1.4 2.397 1.4 2.81 0l.1-.34a1.464 1.464 0 0 1 2.105-.872l.31.17c1.283.698 2.686-.705 1.987-1.987l-.169-.311a1.464 1.464 0 0 1 .872-2.105l.34-.1c1.4-.413 1.4-2.397 0-2.81l-.34-.1a1.464 1.464 0 0 1-.872-2.105l.17-.31c.698-1.283-.705-2.686-1.987-1.987l-.311.169a1.464 1.464 0 0 1-2.105-.872l-.1-.34zM8 10.93a2.929 2.929 0 1 1 0-5.86 2.929 2.929 0 0 1 0 5.858z"/>
</svg>
System
</h1>
<p class="text-muted mb-0 mt-1">Server information, feature flags, and maintenance tools.</p>
</div>
<div class="d-none d-md-block">
<span class="badge bg-primary bg-opacity-10 text-primary fs-6 px-3 py-2">v{{ app_version }}</span>
</div>
</div>
<div class="row g-4 mb-4">
<div class="col-lg-6">
<div class="card shadow-sm border-0" style="border-radius: 1rem;">
<div class="card-header bg-transparent border-0 pt-4 pb-0 px-4">
<h5 class="fw-semibold d-flex align-items-center gap-2 mb-1">
<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" fill="currentColor" class="text-primary" viewBox="0 0 16 16">
<path d="M5 0a.5.5 0 0 1 .5.5V2h1V.5a.5.5 0 0 1 1 0V2h1V.5a.5.5 0 0 1 1 0V2h1V.5a.5.5 0 0 1 1 0V2A2.5 2.5 0 0 1 14 4.5h1.5a.5.5 0 0 1 0 1H14v1h1.5a.5.5 0 0 1 0 1H14v1h1.5a.5.5 0 0 1 0 1H14v1h1.5a.5.5 0 0 1 0 1H14a2.5 2.5 0 0 1-2.5 2.5v1.5a.5.5 0 0 1-1 0V14h-1v1.5a.5.5 0 0 1-1 0V14h-1v1.5a.5.5 0 0 1-1 0V14h-1v1.5a.5.5 0 0 1-1 0V14A2.5 2.5 0 0 1 2 11.5H.5a.5.5 0 0 1 0-1H2v-1H.5a.5.5 0 0 1 0-1H2v-1H.5a.5.5 0 0 1 0-1H2v-1H.5a.5.5 0 0 1 0-1H2A2.5 2.5 0 0 1 4.5 2V.5A.5.5 0 0 1 5 0zm-.5 3A1.5 1.5 0 0 0 3 4.5v7A1.5 1.5 0 0 0 4.5 13h7a1.5 1.5 0 0 0 1.5-1.5v-7A1.5 1.5 0 0 0 11.5 3h-7zM5 6.5A1.5 1.5 0 0 1 6.5 5h3A1.5 1.5 0 0 1 11 6.5v3A1.5 1.5 0 0 1 9.5 11h-3A1.5 1.5 0 0 1 5 9.5v-3zM6.5 6a.5.5 0 0 0-.5.5v3a.5.5 0 0 0 .5.5h3a.5.5 0 0 0 .5-.5v-3a.5.5 0 0 0-.5-.5h-3z"/>
</svg>
Server Information
</h5>
<p class="text-muted small mb-0">Runtime environment and configuration</p>
</div>
<div class="card-body px-4 pb-4">
<table class="table table-sm mb-0">
<tbody>
<tr><td class="text-muted" style="width:40%">Version</td><td class="fw-medium">{{ app_version }}</td></tr>
<tr><td class="text-muted">Storage Root</td><td><code>{{ storage_root }}</code></td></tr>
<tr><td class="text-muted">Platform</td><td>{{ platform }}</td></tr>
<tr><td class="text-muted">Python</td><td>{{ python_version }}</td></tr>
<tr><td class="text-muted">Rust Extension</td><td>
{% if has_rust %}
<span class="badge bg-success bg-opacity-10 text-success">Loaded</span>
{% else %}
<span class="badge bg-secondary bg-opacity-10 text-secondary">Not loaded</span>
{% endif %}
</td></tr>
</tbody>
</table>
</div>
</div>
</div>
<div class="col-lg-6">
<div class="card shadow-sm border-0" style="border-radius: 1rem;">
<div class="card-header bg-transparent border-0 pt-4 pb-0 px-4">
<h5 class="fw-semibold d-flex align-items-center gap-2 mb-1">
<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" fill="currentColor" class="text-primary" viewBox="0 0 16 16">
<path fill-rule="evenodd" d="M11.5 2a1.5 1.5 0 1 0 0 3 1.5 1.5 0 0 0 0-3zM9.05 3a2.5 2.5 0 0 1 4.9 0H16v1h-2.05a2.5 2.5 0 0 1-4.9 0H0V3h9.05zM4.5 7a1.5 1.5 0 1 0 0 3 1.5 1.5 0 0 0 0-3zM2.05 8a2.5 2.5 0 0 1 4.9 0H16v1H6.95a2.5 2.5 0 0 1-4.9 0H0V8h2.05zm9.45 4a1.5 1.5 0 1 0 0 3 1.5 1.5 0 0 0 0-3zm-2.45 1a2.5 2.5 0 0 1 4.9 0H16v1h-2.05a2.5 2.5 0 0 1-4.9 0H0v-1h9.05z"/>
</svg>
Feature Flags
</h5>
<p class="text-muted small mb-0">Features configured via environment variables</p>
</div>
<div class="card-body px-4 pb-4">
<table class="table table-sm mb-0">
<tbody>
{% for feat in features %}
<tr>
<td class="text-muted" style="width:55%">{{ feat.label }}</td>
<td class="text-end">
{% if feat.enabled %}
<span class="badge bg-success bg-opacity-10 text-success">Enabled</span>
{% else %}
<span class="badge bg-secondary bg-opacity-10 text-secondary">Disabled</span>
{% endif %}
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
</div>
</div>
</div>
<div class="row g-4 mb-4">
<div class="col-lg-6">
<div class="card shadow-sm border-0" style="border-radius: 1rem;">
<div class="card-header bg-transparent border-0 pt-4 pb-0 px-4">
<div class="d-flex justify-content-between align-items-start">
<div>
<h5 class="fw-semibold d-flex align-items-center gap-2 mb-1">
<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" fill="currentColor" class="text-primary" viewBox="0 0 16 16">
<path d="M2.5 1a1 1 0 0 0-1 1v1a1 1 0 0 0 1 1H3v9a2 2 0 0 0 2 2h6a2 2 0 0 0 2-2V4h.5a1 1 0 0 0 1-1V2a1 1 0 0 0-1-1H10a1 1 0 0 0-1-1H7a1 1 0 0 0-1 1H2.5zm3 4a.5.5 0 0 1 .5.5v7a.5.5 0 0 1-1 0v-7a.5.5 0 0 1 .5-.5zM8 5a.5.5 0 0 1 .5.5v7a.5.5 0 0 1-1 0v-7A.5.5 0 0 1 8 5zm3 .5v7a.5.5 0 0 1-1 0v-7a.5.5 0 0 1 1 0z"/>
</svg>
Garbage Collection
</h5>
<p class="text-muted small mb-0">Clean up temporary files, orphaned uploads, and stale locks</p>
</div>
<div>
{% if gc_status.enabled %}
<span class="badge bg-success bg-opacity-10 text-success">Active</span>
{% else %}
<span class="badge bg-secondary bg-opacity-10 text-secondary">Disabled</span>
{% endif %}
</div>
</div>
</div>
<div class="card-body px-4 pb-4">
{% if gc_status.enabled %}
<div class="d-flex gap-2 mb-3">
<button class="btn btn-primary btn-sm d-inline-flex align-items-center" id="gcRunBtn" onclick="runGC(false)">
<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" fill="currentColor" class="me-1 flex-shrink-0" viewBox="0 0 16 16">
<path fill-rule="evenodd" d="M8 3a5 5 0 1 0 4.546 2.914.5.5 0 0 1 .908-.417A6 6 0 1 1 8 2v1z"/>
<path d="M8 4.466V.534a.25.25 0 0 1 .41-.192l2.36 1.966c.12.1.12.284 0 .384L8.41 4.658A.25.25 0 0 1 8 4.466z"/>
</svg>
Run Now
</button>
<button class="btn btn-outline-secondary btn-sm" id="gcDryRunBtn" onclick="runGC(true)">
Dry Run
</button>
</div>
<div id="gcScanningBanner" class="mb-3 {% if not gc_status.scanning %}d-none{% endif %}">
<div class="alert alert-info mb-0 small d-flex align-items-center gap-2">
<div class="spinner-border spinner-border-sm text-info" role="status"></div>
<span>GC in progress<span id="gcScanElapsed"></span></span>
</div>
</div>
<div id="gcResult" class="mb-3 d-none">
<div class="alert mb-0 small" id="gcResultAlert">
<div class="d-flex justify-content-between align-items-start">
<div class="fw-semibold mb-1" id="gcResultTitle"></div>
<button type="button" class="btn-close btn-close-sm" style="font-size:0.65rem" onclick="document.getElementById('gcResult').classList.add('d-none')"></button>
</div>
<div id="gcResultBody"></div>
</div>
</div>
<div class="border rounded p-3 mb-3" style="background: var(--bs-tertiary-bg, #f8f9fa);">
<div class="d-flex align-items-center gap-2 mb-2">
<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" fill="currentColor" class="text-muted" viewBox="0 0 16 16">
<path d="M9.405 1.05c-.413-1.4-2.397-1.4-2.81 0l-.1.34a1.464 1.464 0 0 1-2.105.872l-.31-.17c-1.283-.698-2.686.705-1.987 1.987l.169.311c.446.82.023 1.841-.872 2.105l-.34.1c-1.4.413-1.4 2.397 0 2.81l.34.1a1.464 1.464 0 0 1 .872 2.105l-.17.31c-.698 1.283.705 2.686 1.987 1.987l.311-.169a1.464 1.464 0 0 1 2.105.872l.1.34c.413 1.4 2.397 1.4 2.81 0l.1-.34a1.464 1.464 0 0 1 2.105-.872l.31.17c1.283.698 2.686-.705 1.987-1.987l-.169-.311a1.464 1.464 0 0 1 .872-2.105l.34-.1c1.4-.413 1.4-2.397 0-2.81l-.34-.1a1.464 1.464 0 0 1-.872-2.105l.17-.31c.698-1.283-.705-2.686-1.987-1.987l-.311.169a1.464 1.464 0 0 1-2.105-.872l-.1-.34zM8 10.93a2.929 2.929 0 1 1 0-5.86 2.929 2.929 0 0 1 0 5.858z"/>
</svg>
<span class="small fw-semibold text-muted">Configuration</span>
</div>
<div class="row small">
<div class="col-6 mb-1"><span class="text-muted">Interval:</span> {{ gc_status.interval_hours }}h</div>
<div class="col-6 mb-1"><span class="text-muted">Dry run:</span> {{ "Yes" if gc_status.dry_run else "No" }}</div>
<div class="col-6 mb-1"><span class="text-muted">Temp max age:</span> {{ gc_status.temp_file_max_age_hours }}h</div>
<div class="col-6 mb-1"><span class="text-muted">Lock max age:</span> {{ gc_status.lock_file_max_age_hours }}h</div>
<div class="col-6"><span class="text-muted">Multipart max age:</span> {{ gc_status.multipart_max_age_days }}d</div>
</div>
</div>
<div id="gcHistoryContainer">
{% if gc_history %}
<h6 class="fw-semibold small text-muted mb-2 d-flex align-items-center gap-2">
<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" fill="currentColor" viewBox="0 0 16 16">
<path d="M8.515 1.019A7 7 0 0 0 8 1V0a8 8 0 0 1 .589.022l-.074.997zm2.004.45a7.003 7.003 0 0 0-.985-.299l.219-.976c.383.086.76.2 1.126.342l-.36.933zm1.37.71a7.01 7.01 0 0 0-.439-.27l.493-.87a8.025 8.025 0 0 1 .979.654l-.615.789a6.996 6.996 0 0 0-.418-.302zm1.834 1.79a6.99 6.99 0 0 0-.653-.796l.724-.69c.27.285.52.59.747.91l-.818.576zm.744 1.352a7.08 7.08 0 0 0-.214-.468l.893-.45a7.976 7.976 0 0 1 .45 1.088l-.95.313a7.023 7.023 0 0 0-.179-.483zm.53 2.507a6.991 6.991 0 0 0-.1-1.025l.985-.17c.067.386.106.778.116 1.17l-1 .025zm-.131 1.538c.033-.17.06-.339.081-.51l.993.123a7.957 7.957 0 0 1-.23 1.155l-.964-.267c.046-.165.086-.332.12-.501zm-.952 2.379c.184-.29.346-.594.486-.908l.914.405c-.16.36-.345.706-.555 1.038l-.845-.535zm-.964 1.205c.122-.122.239-.248.35-.378l.758.653a8.073 8.073 0 0 1-.401.432l-.707-.707z"/>
<path d="M8 1a7 7 0 1 0 4.95 11.95l.707.707A8.001 8.001 0 1 1 8 0v1z"/>
<path d="M7.5 3a.5.5 0 0 1 .5.5v5.21l3.248 1.856a.5.5 0 0 1-.496.868l-3.5-2A.5.5 0 0 1 7 8V3.5a.5.5 0 0 1 .5-.5z"/>
</svg>
Recent Executions
</h6>
<div class="table-responsive">
<table class="table table-sm small mb-0">
<thead class="table-light">
<tr>
<th>Time</th>
<th class="text-center">Cleaned</th>
<th class="text-center">Freed</th>
<th class="text-center">Mode</th>
</tr>
</thead>
<tbody>
{% for exec in gc_history %}
<tr>
<td class="text-nowrap">{{ exec.timestamp_display }}</td>
<td class="text-center">
{% set r = exec.result %}
{{ (r.temp_files_deleted|d(0)) + (r.multipart_uploads_deleted|d(0)) + (r.lock_files_deleted|d(0)) + (r.orphaned_metadata_deleted|d(0)) + (r.orphaned_versions_deleted|d(0)) + (r.empty_dirs_removed|d(0)) }}
</td>
<td class="text-center">{{ exec.bytes_freed_display }}</td>
<td class="text-center">
{% if exec.dry_run %}
<span class="badge bg-warning bg-opacity-10 text-warning">Dry run</span>
{% else %}
<span class="badge bg-primary bg-opacity-10 text-primary">Live</span>
{% endif %}
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
{% else %}
<div class="text-center py-2">
<p class="text-muted small mb-0">No executions recorded yet.</p>
</div>
{% endif %}
</div>
{% else %}
<div class="text-center py-4">
<svg xmlns="http://www.w3.org/2000/svg" width="40" height="40" fill="currentColor" class="text-muted mb-2 opacity-50" viewBox="0 0 16 16">
<path d="M2.5 1a1 1 0 0 0-1 1v1a1 1 0 0 0 1 1H3v9a2 2 0 0 0 2 2h6a2 2 0 0 0 2-2V4h.5a1 1 0 0 0 1-1V2a1 1 0 0 0-1-1H10a1 1 0 0 0-1-1H7a1 1 0 0 0-1 1H2.5zm3 4a.5.5 0 0 1 .5.5v7a.5.5 0 0 1-1 0v-7a.5.5 0 0 1 .5-.5zM8 5a.5.5 0 0 1 .5.5v7a.5.5 0 0 1-1 0v-7A.5.5 0 0 1 8 5zm3 .5v7a.5.5 0 0 1-1 0v-7a.5.5 0 0 1 1 0z"/>
</svg>
<p class="text-muted mb-1">Garbage collection is not enabled.</p>
<p class="text-muted small mb-0">Set <code>GC_ENABLED=true</code> to enable automatic cleanup.</p>
</div>
{% endif %}
</div>
</div>
</div>
<div class="col-lg-6">
<div class="card shadow-sm border-0" style="border-radius: 1rem;">
<div class="card-header bg-transparent border-0 pt-4 pb-0 px-4">
<div class="d-flex justify-content-between align-items-start">
<div>
<h5 class="fw-semibold d-flex align-items-center gap-2 mb-1">
<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" fill="currentColor" class="text-primary" viewBox="0 0 16 16">
<path d="M5.338 1.59a61.44 61.44 0 0 0-2.837.856.481.481 0 0 0-.328.39c-.554 4.157.726 7.19 2.253 9.188a10.725 10.725 0 0 0 2.287 2.233c.346.244.652.42.893.533.12.057.218.095.293.118a.55.55 0 0 0 .101.025.615.615 0 0 0 .1-.025c.076-.023.174-.061.294-.118.24-.113.547-.29.893-.533a10.726 10.726 0 0 0 2.287-2.233c1.527-1.997 2.807-5.031 2.253-9.188a.48.48 0 0 0-.328-.39c-.651-.213-1.75-.56-2.837-.855C9.552 1.29 8.531 1.067 8 1.067c-.53 0-1.552.223-2.662.524zM5.072.56C6.157.265 7.31 0 8 0s1.843.265 2.928.56c1.11.3 2.229.655 2.887.87a1.54 1.54 0 0 1 1.044 1.262c.596 4.477-.787 7.795-2.465 9.99a11.775 11.775 0 0 1-2.517 2.453 7.159 7.159 0 0 1-1.048.625c-.28.132-.581.24-.829.24s-.548-.108-.829-.24a7.158 7.158 0 0 1-1.048-.625 11.777 11.777 0 0 1-2.517-2.453C1.928 10.487.545 7.169 1.141 2.692A1.54 1.54 0 0 1 2.185 1.43 62.456 62.456 0 0 1 5.072.56z"/>
<path d="M10.854 5.146a.5.5 0 0 1 0 .708l-3 3a.5.5 0 0 1-.708 0l-1.5-1.5a.5.5 0 1 1 .708-.708L7.5 7.793l2.646-2.647a.5.5 0 0 1 .708 0z"/>
</svg>
Integrity Scanner
</h5>
<p class="text-muted small mb-0">Detect and heal corrupted objects, orphaned files, and metadata drift</p>
</div>
<div>
{% if integrity_status.enabled %}
<span class="badge bg-success bg-opacity-10 text-success">Active</span>
{% else %}
<span class="badge bg-secondary bg-opacity-10 text-secondary">Disabled</span>
{% endif %}
</div>
</div>
</div>
<div class="card-body px-4 pb-4">
{% if integrity_status.enabled %}
<div class="d-flex gap-2 flex-wrap mb-3">
<button class="btn btn-primary btn-sm d-inline-flex align-items-center" id="integrityRunBtn" onclick="runIntegrity(false, false)" {% if integrity_status.scanning %}disabled{% endif %}>
<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" fill="currentColor" class="me-1 flex-shrink-0" viewBox="0 0 16 16">
<path fill-rule="evenodd" d="M8 3a5 5 0 1 0 4.546 2.914.5.5 0 0 1 .908-.417A6 6 0 1 1 8 2v1z"/>
<path d="M8 4.466V.534a.25.25 0 0 1 .41-.192l2.36 1.966c.12.1.12.284 0 .384L8.41 4.658A.25.25 0 0 1 8 4.466z"/>
</svg>
Scan Now
</button>
<button class="btn btn-outline-warning btn-sm" id="integrityHealBtn" onclick="runIntegrity(false, true)" {% if integrity_status.scanning %}disabled{% endif %}>
Scan &amp; Heal
</button>
<button class="btn btn-outline-secondary btn-sm" id="integrityDryRunBtn" onclick="runIntegrity(true, false)" {% if integrity_status.scanning %}disabled{% endif %}>
Dry Run
</button>
</div>
<div id="integrityScanningBanner" class="mb-3 {% if not integrity_status.scanning %}d-none{% endif %}">
<div class="alert alert-info mb-0 small d-flex align-items-center gap-2">
<div class="spinner-border spinner-border-sm text-info" role="status"></div>
<span>Scan in progress<span id="integrityScanElapsed"></span></span>
</div>
</div>
<div id="integrityResult" class="mb-3 d-none">
<div class="alert mb-0 small" id="integrityResultAlert">
<div class="d-flex justify-content-between align-items-start">
<div class="fw-semibold mb-1" id="integrityResultTitle"></div>
<button type="button" class="btn-close btn-close-sm" style="font-size:0.65rem" onclick="document.getElementById('integrityResult').classList.add('d-none')"></button>
</div>
<div id="integrityResultBody"></div>
</div>
</div>
<div class="border rounded p-3 mb-3" style="background: var(--bs-tertiary-bg, #f8f9fa);">
<div class="d-flex align-items-center gap-2 mb-2">
<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" fill="currentColor" class="text-muted" viewBox="0 0 16 16">
<path d="M9.405 1.05c-.413-1.4-2.397-1.4-2.81 0l-.1.34a1.464 1.464 0 0 1-2.105.872l-.31-.17c-1.283-.698-2.686.705-1.987 1.987l.169.311c.446.82.023 1.841-.872 2.105l-.34.1c-1.4.413-1.4 2.397 0 2.81l.34.1a1.464 1.464 0 0 1 .872 2.105l-.17.31c-.698 1.283.705 2.686 1.987 1.987l.311-.169a1.464 1.464 0 0 1 2.105.872l.1.34c.413 1.4 2.397 1.4 2.81 0l.1-.34a1.464 1.464 0 0 1 2.105-.872l.31.17c1.283.698 2.686-.705 1.987-1.987l-.169-.311a1.464 1.464 0 0 1 .872-2.105l.34-.1c1.4-.413 1.4-2.397 0-2.81l-.34-.1a1.464 1.464 0 0 1-.872-2.105l.17-.31c.698-1.283-.705-2.686-1.987-1.987l-.311.169a1.464 1.464 0 0 1-2.105-.872l-.1-.34zM8 10.93a2.929 2.929 0 1 1 0-5.86 2.929 2.929 0 0 1 0 5.858z"/>
</svg>
<span class="small fw-semibold text-muted">Configuration</span>
</div>
<div class="row small">
<div class="col-6 mb-1"><span class="text-muted">Interval:</span> {{ integrity_status.interval_hours }}h</div>
<div class="col-6 mb-1"><span class="text-muted">Dry run:</span> {{ "Yes" if integrity_status.dry_run else "No" }}</div>
<div class="col-6"><span class="text-muted">Batch size:</span> {{ integrity_status.batch_size }}</div>
<div class="col-6"><span class="text-muted">Auto-heal:</span> {{ "Yes" if integrity_status.auto_heal else "No" }}</div>
</div>
</div>
<div id="integrityHistoryContainer">
{% if integrity_history %}
<h6 class="fw-semibold small text-muted mb-2 d-flex align-items-center gap-2">
<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" fill="currentColor" viewBox="0 0 16 16">
<path d="M8.515 1.019A7 7 0 0 0 8 1V0a8 8 0 0 1 .589.022l-.074.997zm2.004.45a7.003 7.003 0 0 0-.985-.299l.219-.976c.383.086.76.2 1.126.342l-.36.933zm1.37.71a7.01 7.01 0 0 0-.439-.27l.493-.87a8.025 8.025 0 0 1 .979.654l-.615.789a6.996 6.996 0 0 0-.418-.302zm1.834 1.79a6.99 6.99 0 0 0-.653-.796l.724-.69c.27.285.52.59.747.91l-.818.576zm.744 1.352a7.08 7.08 0 0 0-.214-.468l.893-.45a7.976 7.976 0 0 1 .45 1.088l-.95.313a7.023 7.023 0 0 0-.179-.483zm.53 2.507a6.991 6.991 0 0 0-.1-1.025l.985-.17c.067.386.106.778.116 1.17l-1 .025zm-.131 1.538c.033-.17.06-.339.081-.51l.993.123a7.957 7.957 0 0 1-.23 1.155l-.964-.267c.046-.165.086-.332.12-.501zm-.952 2.379c.184-.29.346-.594.486-.908l.914.405c-.16.36-.345.706-.555 1.038l-.845-.535zm-.964 1.205c.122-.122.239-.248.35-.378l.758.653a8.073 8.073 0 0 1-.401.432l-.707-.707z"/>
<path d="M8 1a7 7 0 1 0 4.95 11.95l.707.707A8.001 8.001 0 1 1 8 0v1z"/>
<path d="M7.5 3a.5.5 0 0 1 .5.5v5.21l3.248 1.856a.5.5 0 0 1-.496.868l-3.5-2A.5.5 0 0 1 7 8V3.5a.5.5 0 0 1 .5-.5z"/>
</svg>
Recent Scans
</h6>
<div class="table-responsive">
<table class="table table-sm small mb-0">
<thead class="table-light">
<tr>
<th>Time</th>
<th class="text-center">Scanned</th>
<th class="text-center">Issues</th>
<th class="text-center">Healed</th>
<th class="text-center">Mode</th>
</tr>
</thead>
<tbody>
{% for exec in integrity_history %}
<tr>
<td class="text-nowrap">{{ exec.timestamp_display }}</td>
<td class="text-center">{{ exec.result.objects_scanned|d(0) }}</td>
<td class="text-center">
{% set total_issues = (exec.result.corrupted_objects|d(0)) + (exec.result.orphaned_objects|d(0)) + (exec.result.phantom_metadata|d(0)) + (exec.result.stale_versions|d(0)) + (exec.result.etag_cache_inconsistencies|d(0)) + (exec.result.legacy_metadata_drifts|d(0)) %}
{% if total_issues > 0 %}
<span class="text-danger fw-medium">{{ total_issues }}</span>
{% else %}
<span class="text-success">0</span>
{% endif %}
</td>
<td class="text-center">{{ exec.result.issues_healed|d(0) }}</td>
<td class="text-center">
{% if exec.dry_run %}
<span class="badge bg-warning bg-opacity-10 text-warning">Dry</span>
{% elif exec.auto_heal %}
<span class="badge bg-success bg-opacity-10 text-success">Heal</span>
{% else %}
<span class="badge bg-primary bg-opacity-10 text-primary">Scan</span>
{% endif %}
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
{% else %}
<div class="text-center py-2">
<p class="text-muted small mb-0">No scans recorded yet.</p>
</div>
{% endif %}
</div>
{% else %}
<div class="text-center py-4">
<svg xmlns="http://www.w3.org/2000/svg" width="40" height="40" fill="currentColor" class="text-muted mb-2 opacity-50" viewBox="0 0 16 16">
<path d="M5.338 1.59a61.44 61.44 0 0 0-2.837.856.481.481 0 0 0-.328.39c-.554 4.157.726 7.19 2.253 9.188a10.725 10.725 0 0 0 2.287 2.233c.346.244.652.42.893.533.12.057.218.095.293.118a.55.55 0 0 0 .101.025.615.615 0 0 0 .1-.025c.076-.023.174-.061.294-.118.24-.113.547-.29.893-.533a10.726 10.726 0 0 0 2.287-2.233c1.527-1.997 2.807-5.031 2.253-9.188a.48.48 0 0 0-.328-.39c-.651-.213-1.75-.56-2.837-.855C9.552 1.29 8.531 1.067 8 1.067c-.53 0-1.552.223-2.662.524zM5.072.56C6.157.265 7.31 0 8 0s1.843.265 2.928.56c1.11.3 2.229.655 2.887.87a1.54 1.54 0 0 1 1.044 1.262c.596 4.477-.787 7.795-2.465 9.99a11.775 11.775 0 0 1-2.517 2.453 7.159 7.159 0 0 1-1.048.625c-.28.132-.581.24-.829.24s-.548-.108-.829-.24a7.158 7.158 0 0 1-1.048-.625 11.777 11.777 0 0 1-2.517-2.453C1.928 10.487.545 7.169 1.141 2.692A1.54 1.54 0 0 1 2.185 1.43 62.456 62.456 0 0 1 5.072.56z"/>
<path d="M10.854 5.146a.5.5 0 0 1 0 .708l-3 3a.5.5 0 0 1-.708 0l-1.5-1.5a.5.5 0 1 1 .708-.708L7.5 7.793l2.646-2.647a.5.5 0 0 1 .708 0z"/>
</svg>
<p class="text-muted mb-1">Integrity scanner is not enabled.</p>
<p class="text-muted small mb-0">Set <code>INTEGRITY_ENABLED=true</code> to enable automatic scanning.</p>
</div>
{% endif %}
</div>
</div>
</div>
</div>
{% endblock %}
{% block extra_scripts %}
<script>
(function () {
var csrfToken = document.querySelector('meta[name="csrf-token"]')?.getAttribute('content') || '';
function setLoading(btnId, loading, spinnerOnly) {
var btn = document.getElementById(btnId);
if (!btn) return;
btn.disabled = loading;
if (loading && !spinnerOnly) {
btn.dataset.originalHtml = btn.innerHTML;
btn.innerHTML = '<span class="spinner-border spinner-border-sm me-1" role="status"></span>Running...';
} else if (!loading && btn.dataset.originalHtml) {
btn.innerHTML = btn.dataset.originalHtml;
}
}
function formatBytes(bytes) {
if (!bytes || bytes === 0) return '0 B';
var units = ['B', 'KB', 'MB', 'GB'];
var i = 0;
var b = bytes;
while (b >= 1024 && i < units.length - 1) { b /= 1024; i++; }
return (i === 0 ? b : b.toFixed(1)) + ' ' + units[i];
}
var _displayTimezone = {{ display_timezone|tojson }};
function formatTimestamp(ts) {
var d = new Date(ts * 1000);
try {
var opts = {year: 'numeric', month: 'short', day: '2-digit', hour: '2-digit', minute: '2-digit', hour12: false, timeZone: _displayTimezone, timeZoneName: 'short'};
return d.toLocaleString('en-US', opts);
} catch (e) {
var pad = function (n) { return n < 10 ? '0' + n : '' + n; };
return d.getUTCFullYear() + '-' + pad(d.getUTCMonth() + 1) + '-' + pad(d.getUTCDate()) +
' ' + pad(d.getUTCHours()) + ':' + pad(d.getUTCMinutes()) + ' UTC';
}
}
var _gcHistoryIcon = '<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" fill="currentColor" viewBox="0 0 16 16">' +
'<path d="M8.515 1.019A7 7 0 0 0 8 1V0a8 8 0 0 1 .589.022l-.074.997zm2.004.45a7.003 7.003 0 0 0-.985-.299l.219-.976c.383.086.76.2 1.126.342l-.36.933zm1.37.71a7.01 7.01 0 0 0-.439-.27l.493-.87a8.025 8.025 0 0 1 .979.654l-.615.789a6.996 6.996 0 0 0-.418-.302zm1.834 1.79a6.99 6.99 0 0 0-.653-.796l.724-.69c.27.285.52.59.747.91l-.818.576zm.744 1.352a7.08 7.08 0 0 0-.214-.468l.893-.45a7.976 7.976 0 0 1 .45 1.088l-.95.313a7.023 7.023 0 0 0-.179-.483zm.53 2.507a6.991 6.991 0 0 0-.1-1.025l.985-.17c.067.386.106.778.116 1.17l-1 .025zm-.131 1.538c.033-.17.06-.339.081-.51l.993.123a7.957 7.957 0 0 1-.23 1.155l-.964-.267c.046-.165.086-.332.12-.501zm-.952 2.379c.184-.29.346-.594.486-.908l.914.405c-.16.36-.345.706-.555 1.038l-.845-.535zm-.964 1.205c.122-.122.239-.248.35-.378l.758.653a8.073 8.073 0 0 1-.401.432l-.707-.707z"/>' +
'<path d="M8 1a7 7 0 1 0 4.95 11.95l.707.707A8.001 8.001 0 1 1 8 0v1z"/>' +
'<path d="M7.5 3a.5.5 0 0 1 .5.5v5.21l3.248 1.856a.5.5 0 0 1-.496.868l-3.5-2A.5.5 0 0 1 7 8V3.5a.5.5 0 0 1 .5-.5z"/></svg>';
function _gcRefreshHistory() {
fetch('{{ url_for("ui.system_gc_history") }}?limit=10', {
headers: {'X-CSRFToken': csrfToken}
})
.then(function (r) { return r.json(); })
.then(function (hist) {
var container = document.getElementById('gcHistoryContainer');
if (!container) return;
var execs = hist.executions || [];
if (execs.length === 0) {
container.innerHTML = '<div class="text-center py-2"><p class="text-muted small mb-0">No executions recorded yet.</p></div>';
return;
}
var html = '<h6 class="fw-semibold small text-muted mb-2 d-flex align-items-center gap-2">' +
_gcHistoryIcon + ' Recent Executions</h6>' +
'<div class="table-responsive"><table class="table table-sm small mb-0">' +
'<thead class="table-light"><tr><th>Time</th><th class="text-center">Cleaned</th>' +
'<th class="text-center">Freed</th><th class="text-center">Mode</th></tr></thead><tbody>';
execs.forEach(function (exec) {
var r = exec.result || {};
var cleaned = (r.temp_files_deleted || 0) + (r.multipart_uploads_deleted || 0) +
(r.lock_files_deleted || 0) + (r.orphaned_metadata_deleted || 0) +
(r.orphaned_versions_deleted || 0) + (r.empty_dirs_removed || 0);
var freed = (r.temp_bytes_freed || 0) + (r.multipart_bytes_freed || 0) +
(r.orphaned_version_bytes_freed || 0);
var mode = exec.dry_run
? '<span class="badge bg-warning bg-opacity-10 text-warning">Dry run</span>'
: '<span class="badge bg-primary bg-opacity-10 text-primary">Live</span>';
html += '<tr><td class="text-nowrap">' + formatTimestamp(exec.timestamp) + '</td>' +
'<td class="text-center">' + cleaned + '</td>' +
'<td class="text-center">' + formatBytes(freed) + '</td>' +
'<td class="text-center">' + mode + '</td></tr>';
});
html += '</tbody></table></div>';
container.innerHTML = html;
})
.catch(function () {});
}
function _integrityRefreshHistory() {
fetch('{{ url_for("ui.system_integrity_history") }}?limit=10', {
headers: {'X-CSRFToken': csrfToken}
})
.then(function (r) { return r.json(); })
.then(function (hist) {
var container = document.getElementById('integrityHistoryContainer');
if (!container) return;
var execs = hist.executions || [];
if (execs.length === 0) {
container.innerHTML = '<div class="text-center py-2"><p class="text-muted small mb-0">No scans recorded yet.</p></div>';
return;
}
var html = '<h6 class="fw-semibold small text-muted mb-2 d-flex align-items-center gap-2">' +
_gcHistoryIcon + ' Recent Scans</h6>' +
'<div class="table-responsive"><table class="table table-sm small mb-0">' +
'<thead class="table-light"><tr><th>Time</th><th class="text-center">Scanned</th>' +
'<th class="text-center">Issues</th><th class="text-center">Healed</th>' +
'<th class="text-center">Mode</th></tr></thead><tbody>';
execs.forEach(function (exec) {
var r = exec.result || {};
var issues = (r.corrupted_objects || 0) + (r.orphaned_objects || 0) +
(r.phantom_metadata || 0) + (r.stale_versions || 0) +
(r.etag_cache_inconsistencies || 0) + (r.legacy_metadata_drifts || 0);
var issueHtml = issues > 0
? '<span class="text-danger fw-medium">' + issues + '</span>'
: '<span class="text-success">0</span>';
var mode = exec.dry_run
? '<span class="badge bg-warning bg-opacity-10 text-warning">Dry</span>'
: (exec.auto_heal
? '<span class="badge bg-success bg-opacity-10 text-success">Heal</span>'
: '<span class="badge bg-primary bg-opacity-10 text-primary">Scan</span>');
html += '<tr><td class="text-nowrap">' + formatTimestamp(exec.timestamp) + '</td>' +
'<td class="text-center">' + (r.objects_scanned || 0) + '</td>' +
'<td class="text-center">' + issueHtml + '</td>' +
'<td class="text-center">' + (r.issues_healed || 0) + '</td>' +
'<td class="text-center">' + mode + '</td></tr>';
});
html += '</tbody></table></div>';
container.innerHTML = html;
})
.catch(function () {});
}
var _gcPollTimer = null;
var _gcLastDryRun = false;
function _gcSetScanning(scanning) {
var banner = document.getElementById('gcScanningBanner');
var btns = ['gcRunBtn', 'gcDryRunBtn'];
if (scanning) {
banner.classList.remove('d-none');
btns.forEach(function (id) {
var el = document.getElementById(id);
if (el) el.disabled = true;
});
} else {
banner.classList.add('d-none');
document.getElementById('gcScanElapsed').textContent = '';
btns.forEach(function (id) {
var el = document.getElementById(id);
if (el) el.disabled = false;
});
}
}
function _gcShowResult(data, dryRun) {
var container = document.getElementById('gcResult');
var alert = document.getElementById('gcResultAlert');
var title = document.getElementById('gcResultTitle');
var body = document.getElementById('gcResultBody');
container.classList.remove('d-none');
var totalItems = (data.temp_files_deleted || 0) + (data.multipart_uploads_deleted || 0) +
(data.lock_files_deleted || 0) + (data.orphaned_metadata_deleted || 0) +
(data.orphaned_versions_deleted || 0) + (data.empty_dirs_removed || 0);
var totalFreed = (data.temp_bytes_freed || 0) + (data.multipart_bytes_freed || 0) +
(data.orphaned_version_bytes_freed || 0);
alert.className = totalItems > 0 ? 'alert alert-success mb-0 small' : 'alert alert-info mb-0 small';
title.textContent = (dryRun ? '[Dry Run] ' : '') + 'Completed in ' + (data.execution_time_seconds || 0).toFixed(2) + 's';
var lines = [];
if (data.temp_files_deleted) lines.push('Temp files: ' + data.temp_files_deleted + ' (' + formatBytes(data.temp_bytes_freed) + ')');
if (data.multipart_uploads_deleted) lines.push('Multipart uploads: ' + data.multipart_uploads_deleted + ' (' + formatBytes(data.multipart_bytes_freed) + ')');
if (data.lock_files_deleted) lines.push('Lock files: ' + data.lock_files_deleted);
if (data.orphaned_metadata_deleted) lines.push('Orphaned metadata: ' + data.orphaned_metadata_deleted);
if (data.orphaned_versions_deleted) lines.push('Orphaned versions: ' + data.orphaned_versions_deleted + ' (' + formatBytes(data.orphaned_version_bytes_freed) + ')');
if (data.empty_dirs_removed) lines.push('Empty directories: ' + data.empty_dirs_removed);
if (totalItems === 0) lines.push('Nothing to clean up.');
if (totalFreed > 0) lines.push('Total freed: ' + formatBytes(totalFreed));
if (data.errors && data.errors.length > 0) lines.push('Errors: ' + data.errors.join(', '));
body.innerHTML = lines.join('<br>');
}
function _gcPoll() {
fetch('{{ url_for("ui.system_gc_status") }}', {
headers: {'X-CSRFToken': csrfToken}
})
.then(function (r) { return r.json(); })
.then(function (status) {
if (status.scanning) {
var elapsed = status.scan_elapsed_seconds || 0;
document.getElementById('gcScanElapsed').textContent = ' (' + elapsed.toFixed(0) + 's)';
_gcPollTimer = setTimeout(_gcPoll, 2000);
} else {
_gcSetScanning(false);
_gcRefreshHistory();
fetch('{{ url_for("ui.system_gc_history") }}?limit=1', {
headers: {'X-CSRFToken': csrfToken}
})
.then(function (r) { return r.json(); })
.then(function (hist) {
if (hist.executions && hist.executions.length > 0) {
var latest = hist.executions[0];
_gcShowResult(latest.result, latest.dry_run);
}
})
.catch(function () {});
}
})
.catch(function () {
_gcPollTimer = setTimeout(_gcPoll, 3000);
});
}
window.runGC = function (dryRun) {
_gcLastDryRun = dryRun;
document.getElementById('gcResult').classList.add('d-none');
_gcSetScanning(true);
fetch('{{ url_for("ui.system_gc_run") }}', {
method: 'POST',
headers: {'Content-Type': 'application/json', 'X-CSRFToken': csrfToken},
body: JSON.stringify({dry_run: dryRun})
})
.then(function (r) { return r.json(); })
.then(function (data) {
if (data.error) {
_gcSetScanning(false);
var container = document.getElementById('gcResult');
var alert = document.getElementById('gcResultAlert');
var title = document.getElementById('gcResultTitle');
var body = document.getElementById('gcResultBody');
container.classList.remove('d-none');
alert.className = 'alert alert-danger mb-0 small';
title.textContent = 'Error';
body.textContent = data.error;
return;
}
_gcPollTimer = setTimeout(_gcPoll, 2000);
})
.catch(function (err) {
_gcSetScanning(false);
var container = document.getElementById('gcResult');
var alert = document.getElementById('gcResultAlert');
var title = document.getElementById('gcResultTitle');
var body = document.getElementById('gcResultBody');
container.classList.remove('d-none');
alert.className = 'alert alert-danger mb-0 small';
title.textContent = 'Error';
body.textContent = err.message;
});
};
{% if gc_status.scanning %}
_gcSetScanning(true);
_gcPollTimer = setTimeout(_gcPoll, 2000);
{% endif %}
var _integrityPollTimer = null;
var _integrityLastMode = {dryRun: false, autoHeal: false};
function _integritySetScanning(scanning) {
var banner = document.getElementById('integrityScanningBanner');
var btns = ['integrityRunBtn', 'integrityHealBtn', 'integrityDryRunBtn'];
if (scanning) {
banner.classList.remove('d-none');
btns.forEach(function (id) {
var el = document.getElementById(id);
if (el) el.disabled = true;
});
} else {
banner.classList.add('d-none');
document.getElementById('integrityScanElapsed').textContent = '';
btns.forEach(function (id) {
var el = document.getElementById(id);
if (el) el.disabled = false;
});
}
}
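// Render the integrity summary: issue counts per category (corrupted,
// orphaned, phantom, stale version, ETag cache, legacy drift), with a
// warning alert when any issue was found and a success alert otherwise.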
function _integrityShowResult(data, dryRun, autoHeal) {
var container = document.getElementById('integrityResult');
var alert = document.getElementById('integrityResultAlert');
var title = document.getElementById('integrityResultTitle');
var body = document.getElementById('integrityResultBody');
container.classList.remove('d-none');
var totalIssues = (data.corrupted_objects || 0) + (data.orphaned_objects || 0) +
(data.phantom_metadata || 0) + (data.stale_versions || 0) +
(data.etag_cache_inconsistencies || 0) + (data.legacy_metadata_drifts || 0);
var prefix = dryRun ? '[Dry Run] ' : (autoHeal ? '[Heal] ' : '');
alert.className = totalIssues > 0 ? 'alert alert-warning mb-0 small' : 'alert alert-success mb-0 small';
title.textContent = prefix + 'Completed in ' + (data.execution_time_seconds || 0).toFixed(2) + 's';
var lines = [];
lines.push('Scanned: ' + (data.objects_scanned || 0) + ' objects in ' + (data.buckets_scanned || 0) + ' buckets');
if (totalIssues === 0) {
lines.push('No issues found.');
} else {
if (data.corrupted_objects) lines.push('Corrupted objects: ' + data.corrupted_objects);
if (data.orphaned_objects) lines.push('Orphaned objects: ' + data.orphaned_objects);
if (data.phantom_metadata) lines.push('Phantom metadata: ' + data.phantom_metadata);
if (data.stale_versions) lines.push('Stale versions: ' + data.stale_versions);
if (data.etag_cache_inconsistencies) lines.push('ETag inconsistencies: ' + data.etag_cache_inconsistencies);
if (data.legacy_metadata_drifts) lines.push('Legacy metadata drifts: ' + data.legacy_metadata_drifts);
if (data.issues_healed) lines.push('Issues healed: ' + data.issues_healed);
}
if (data.errors && data.errors.length > 0) lines.push('Errors: ' + data.errors.join(', '));
body.innerHTML = lines.join('<br>');
}
function _integrityPoll() {
fetch('{{ url_for("ui.system_integrity_status") }}', {
headers: {'X-CSRFToken': csrfToken}
})
.then(function (r) { return r.json(); })
.then(function (status) {
if (status.scanning) {
var elapsed = status.scan_elapsed_seconds || 0;
document.getElementById('integrityScanElapsed').textContent = ' (' + elapsed.toFixed(0) + 's)';
_integrityPollTimer = setTimeout(_integrityPoll, 2000);
} else {
_integritySetScanning(false);
_integrityRefreshHistory();
fetch('{{ url_for("ui.system_integrity_history") }}?limit=1', {
headers: {'X-CSRFToken': csrfToken}
})
.then(function (r) { return r.json(); })
.then(function (hist) {
if (hist.executions && hist.executions.length > 0) {
var latest = hist.executions[0];
_integrityShowResult(latest.result, latest.dry_run, latest.auto_heal);
}
})
.catch(function () {});
}
})
.catch(function () {
_integrityPollTimer = setTimeout(_integrityPoll, 3000);
});
}
window.runIntegrity = function (dryRun, autoHeal) {
_integrityLastMode = {dryRun: dryRun, autoHeal: autoHeal};
document.getElementById('integrityResult').classList.add('d-none');
_integritySetScanning(true);
fetch('{{ url_for("ui.system_integrity_run") }}', {
method: 'POST',
headers: {'Content-Type': 'application/json', 'X-CSRFToken': csrfToken},
body: JSON.stringify({dry_run: dryRun, auto_heal: autoHeal})
})
.then(function (r) { return r.json(); })
.then(function (data) {
if (data.error) {
_integritySetScanning(false);
var container = document.getElementById('integrityResult');
var alert = document.getElementById('integrityResultAlert');
var title = document.getElementById('integrityResultTitle');
var body = document.getElementById('integrityResultBody');
container.classList.remove('d-none');
alert.className = 'alert alert-danger mb-0 small';
title.textContent = 'Error';
body.textContent = data.error;
return;
}
_integrityPollTimer = setTimeout(_integrityPoll, 2000);
})
.catch(function (err) {
_integritySetScanning(false);
var container = document.getElementById('integrityResult');
var alert = document.getElementById('integrityResultAlert');
var title = document.getElementById('integrityResultTitle');
var body = document.getElementById('integrityResultBody');
container.classList.remove('d-none');
alert.className = 'alert alert-danger mb-0 small';
title.textContent = 'Error';
body.textContent = err.message;
});
};
{% if integrity_status.scanning %}
_integritySetScanning(true);
_integrityPollTimer = setTimeout(_integrityPoll, 2000);
{% endif %}
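// If a scan was already in flight at render time, the two Jinja guards
// resume the matching poll loop immediately instead of waiting for a click.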
})();
</script>
{% endblock %}
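
Both poll loops above lean on the same server contract: the run endpoint starts a background scan and returns {"status": "started"} immediately, the status endpoint reports `scanning` plus elapsed seconds, and the history endpoint returns executions newest-first. A minimal Flask sketch of that contract for the GC side, assuming a `run_gc()` callable and omitting auth/CSRF (all names here are illustrative, not the project's actual implementation):

import threading
import time

from flask import Flask, jsonify, request

app = Flask(__name__)

_lock = threading.Lock()
_state = {"scanning": False, "started_at": 0.0}
_history = []  # newest first


def run_gc(dry_run: bool) -> dict:
    """Stand-in for the real sweep; returns the result payload the UI renders."""
    time.sleep(1)
    return {"temp_files_deleted": 0, "execution_time_seconds": 1.0}


def _worker(dry_run: bool) -> None:
    result = run_gc(dry_run)
    with _lock:
        _history.insert(0, {"dry_run": dry_run, "result": result})
        _state["scanning"] = False


@app.post("/admin/gc/run")
def gc_run():
    with _lock:
        if _state["scanning"]:
            return jsonify({"error": "scan already in progress"}), 409
        _state["scanning"] = True
        _state["started_at"] = time.time()
    dry_run = bool((request.get_json(silent=True) or {}).get("dry_run"))
    threading.Thread(target=_worker, args=(dry_run,), daemon=True).start()
    return jsonify({"status": "started"})  # reply now; the client polls


@app.get("/admin/gc/status")
def gc_status():
    with _lock:
        elapsed = time.time() - _state["started_at"] if _state["scanning"] else 0.0
        return jsonify({"scanning": _state["scanning"], "scan_elapsed_seconds": elapsed})


@app.get("/admin/gc/history")
def gc_history():
    limit = request.args.get("limit", default=10, type=int)
    with _lock:
        return jsonify({"executions": _history[:limit]})

Answering the POST immediately and letting the UI poll is what keeps reverse proxies from timing out on long-running sweeps.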

View File

@@ -27,7 +27,10 @@ def app(tmp_path: Path):
                 "access_key": "test",
                 "secret_key": "secret",
                 "display_name": "Test User",
-                "policies": [{"bucket": "*", "actions": ["list", "read", "write", "delete", "policy"]}],
+                "policies": [{"bucket": "*", "actions": ["list", "read", "write", "delete", "policy",
+                    "create_bucket", "delete_bucket", "share", "versioning", "tagging",
+                    "encryption", "cors", "lifecycle", "replication", "quota",
+                    "object_lock", "notification", "logging", "website"]}],
             }
         ]
     }

View File

@@ -1,3 +1,56 @@
+import hashlib
+import hmac
+from datetime import datetime, timezone
+from urllib.parse import quote
+
+
+def _build_presigned_query(path: str, *, access_key: str = "test", secret_key: str = "secret", expires: int = 60) -> str:
+    now = datetime.now(timezone.utc)
+    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
+    date_stamp = now.strftime("%Y%m%d")
+    region = "us-east-1"
+    service = "s3"
+    credential_scope = f"{date_stamp}/{region}/{service}/aws4_request"
+    query_items = [
+        ("X-Amz-Algorithm", "AWS4-HMAC-SHA256"),
+        ("X-Amz-Content-Sha256", "UNSIGNED-PAYLOAD"),
+        ("X-Amz-Credential", f"{access_key}/{credential_scope}"),
+        ("X-Amz-Date", amz_date),
+        ("X-Amz-Expires", str(expires)),
+        ("X-Amz-SignedHeaders", "host"),
+    ]
+    canonical_query = "&".join(
+        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}" for k, v in sorted(query_items)
+    )
+    canonical_request = "\n".join([
+        "GET",
+        quote(path, safe="/-_.~"),
+        canonical_query,
+        "host:localhost\n",
+        "host",
+        "UNSIGNED-PAYLOAD",
+    ])
+    hashed_request = hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()
+    string_to_sign = "\n".join([
+        "AWS4-HMAC-SHA256",
+        amz_date,
+        credential_scope,
+        hashed_request,
+    ])
+
+    def _sign(key: bytes, msg: str) -> bytes:
+        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()
+
+    k_date = _sign(("AWS4" + secret_key).encode("utf-8"), date_stamp)
+    k_region = _sign(k_date, region)
+    k_service = _sign(k_region, service)
+    signing_key = _sign(k_service, "aws4_request")
+    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
+    return canonical_query + f"&X-Amz-Signature={signature}"
+
+
 def test_bucket_and_object_lifecycle(client, signer):
     headers = signer("PUT", "/photos")
     response = client.put("/photos", headers=headers)
@@ -114,6 +167,45 @@ def test_missing_credentials_denied(client):
     assert response.status_code == 403

+def test_presigned_url_denied_for_disabled_user(client, signer):
+    headers = signer("PUT", "/secure")
+    assert client.put("/secure", headers=headers).status_code == 200
+    payload = b"hello"
+    headers = signer("PUT", "/secure/file.txt", body=payload)
+    assert client.put("/secure/file.txt", headers=headers, data=payload).status_code == 200
+    iam = client.application.extensions["iam"]
+    iam.disable_user("test")
+    query = _build_presigned_query("/secure/file.txt")
+    response = client.get(f"/secure/file.txt?{query}", headers={"Host": "localhost"})
+    assert response.status_code == 403
+    assert b"User account is disabled" in response.data
+
+def test_presigned_url_denied_for_inactive_key(client, signer):
+    headers = signer("PUT", "/secure2")
+    assert client.put("/secure2", headers=headers).status_code == 200
+    payload = b"hello"
+    headers = signer("PUT", "/secure2/file.txt", body=payload)
+    assert client.put("/secure2/file.txt", headers=headers, data=payload).status_code == 200
+    iam = client.application.extensions["iam"]
+    for user in iam._raw_config.get("users", []):
+        for key_info in user.get("access_keys", []):
+            if key_info.get("access_key") == "test":
+                key_info["status"] = "inactive"
+    iam._save()
+    iam._load()
+    query = _build_presigned_query("/secure2/file.txt")
+    response = client.get(f"/secure2/file.txt?{query}", headers={"Host": "localhost"})
+    assert response.status_code == 403
+    assert b"Access key is inactive" in response.data
+
 def test_bucket_policies_deny_reads(client, signer):
     import json
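
For reference, the `_build_presigned_query` helper above derives its signing key through the standard AWS4 HMAC chain (secret -> date -> region -> service -> "aws4_request") and signs a string-to-sign of this shape (timestamp and digest illustrative):

AWS4-HMAC-SHA256
20260101T000000Z
20260101/us-east-1/s3/aws4_request
<sha256 hex of the canonical GET request>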

View File

@@ -317,7 +317,7 @@ class TestAdminAPI:
         )
         assert resp.status_code == 200
         data = resp.get_json()
-        assert "temp_files_deleted" in data
+        assert data["status"] == "started"

     def test_gc_dry_run(self, gc_app):
         client = gc_app.test_client()
@@ -329,11 +329,17 @@
         )
         assert resp.status_code == 200
         data = resp.get_json()
-        assert "temp_files_deleted" in data
+        assert data["status"] == "started"

     def test_gc_history(self, gc_app):
+        import time
+
         client = gc_app.test_client()
         client.post("/admin/gc/run", headers={"X-Access-Key": "admin", "X-Secret-Key": "adminsecret"})
+        for _ in range(50):
+            time.sleep(0.1)
+            status = client.get("/admin/gc/status", headers={"X-Access-Key": "admin", "X-Secret-Key": "adminsecret"}).get_json()
+            if not status.get("scanning"):
+                break
         resp = client.get("/admin/gc/history", headers={"X-Access-Key": "admin", "X-Secret-Key": "adminsecret"})
         assert resp.status_code == 200
         data = resp.get_json()
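
The wait loop added to `test_gc_history` mirrors the `_wait_scan_done` helper in the integrity tests below: since `/admin/gc/run` now returns immediately with `{"status": "started"}`, the test must poll `/admin/gc/status` until `scanning` goes false before reading history.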

View File

@@ -0,0 +1,788 @@
import hashlib
import json
import os
import sys
import time
from pathlib import Path
import pytest
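# Make the repository root importable so `app.integrity` resolves from a checkout.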
sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
from app.integrity import IntegrityChecker, IntegrityCursorStore, IntegrityResult
def _wait_scan_done(client, headers, timeout=10):
deadline = time.time() + timeout
while time.time() < deadline:
resp = client.get("/admin/integrity/status", headers=headers)
data = resp.get_json()
if not data.get("scanning"):
return
time.sleep(0.1)
raise TimeoutError("scan did not complete")
def _md5(data: bytes) -> str:
return hashlib.md5(data).hexdigest()
def _setup_bucket(storage_root: Path, bucket_name: str, objects: dict[str, bytes]) -> None:
bucket_path = storage_root / bucket_name
bucket_path.mkdir(parents=True, exist_ok=True)
meta_root = storage_root / ".myfsio.sys" / "buckets" / bucket_name / "meta"
meta_root.mkdir(parents=True, exist_ok=True)
bucket_json = storage_root / ".myfsio.sys" / "buckets" / bucket_name / ".bucket.json"
bucket_json.write_text(json.dumps({"created": "2025-01-01"}))
for key, data in objects.items():
obj_path = bucket_path / key
obj_path.parent.mkdir(parents=True, exist_ok=True)
obj_path.write_bytes(data)
etag = _md5(data)
stat = obj_path.stat()
meta = {
"__etag__": etag,
"__size__": str(stat.st_size),
"__last_modified__": str(stat.st_mtime),
}
key_path = Path(key)
parent = key_path.parent
key_name = key_path.name
if parent == Path("."):
index_path = meta_root / "_index.json"
else:
index_path = meta_root / parent / "_index.json"
index_path.parent.mkdir(parents=True, exist_ok=True)
index_data = {}
if index_path.exists():
index_data = json.loads(index_path.read_text())
index_data[key_name] = {"metadata": meta}
index_path.write_text(json.dumps(index_data))
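
For orientation, this is the on-disk layout that `_setup_bucket` builds and the checker walks, reconstructed from this file's fixtures (placeholders in angle brackets are mine):

data/                                  # storage root
  mybucket/
    file.txt                           # object payload
    .meta/<key>.meta.json              # legacy per-object metadata (pre-migration)
  .myfsio.sys/
    buckets/
      mybucket/
        .bucket.json                   # bucket marker
        meta/<dir>/_index.json         # per-directory metadata index
        versions/<key>/v1.json         # version manifest
        versions/<key>/v1.bin          # version payload
        etag_index.json                # rebuildable ETag cache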
def _issues_of_type(result, issue_type):
return [i for i in result.issues if i.issue_type == issue_type]
@pytest.fixture
def storage_root(tmp_path):
root = tmp_path / "data"
root.mkdir()
(root / ".myfsio.sys" / "config").mkdir(parents=True, exist_ok=True)
return root
@pytest.fixture
def checker(storage_root):
return IntegrityChecker(
storage_root=storage_root,
interval_hours=24.0,
batch_size=1000,
auto_heal=False,
dry_run=False,
)
class TestCorruptedObjects:
def test_detect_corrupted(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello world"})
(storage_root / "mybucket" / "file.txt").write_bytes(b"corrupted data")
result = checker.run_now()
assert result.corrupted_objects == 1
issues = _issues_of_type(result, "corrupted_object")
assert len(issues) == 1
assert issues[0].bucket == "mybucket"
assert issues[0].key == "file.txt"
assert not issues[0].healed
def test_heal_corrupted(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello world"})
(storage_root / "mybucket" / "file.txt").write_bytes(b"corrupted data")
result = checker.run_now(auto_heal=True)
assert result.corrupted_objects == 1
assert result.issues_healed == 1
issues = _issues_of_type(result, "corrupted_object")
assert issues[0].healed
result2 = checker.run_now()
assert result2.corrupted_objects == 0
def test_valid_objects_pass(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello world"})
result = checker.run_now()
assert result.corrupted_objects == 0
assert result.objects_scanned >= 1
def test_corrupted_nested_key(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"sub/dir/file.txt": b"nested content"})
(storage_root / "mybucket" / "sub" / "dir" / "file.txt").write_bytes(b"bad")
result = checker.run_now()
assert result.corrupted_objects == 1
issues = _issues_of_type(result, "corrupted_object")
assert issues[0].key == "sub/dir/file.txt"
class TestOrphanedObjects:
def test_detect_orphaned(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {})
(storage_root / "mybucket" / "orphan.txt").write_bytes(b"orphan data")
result = checker.run_now()
assert result.orphaned_objects == 1
issues = _issues_of_type(result, "orphaned_object")
assert len(issues) == 1
def test_heal_orphaned(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {})
(storage_root / "mybucket" / "orphan.txt").write_bytes(b"orphan data")
result = checker.run_now(auto_heal=True)
assert result.orphaned_objects == 1
assert result.issues_healed == 1
issues = _issues_of_type(result, "orphaned_object")
assert issues[0].healed
result2 = checker.run_now()
assert result2.orphaned_objects == 0
assert result2.objects_scanned >= 1
class TestPhantomMetadata:
def test_detect_phantom(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
(storage_root / "mybucket" / "file.txt").unlink()
result = checker.run_now()
assert result.phantom_metadata == 1
issues = _issues_of_type(result, "phantom_metadata")
assert len(issues) == 1
def test_heal_phantom(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
(storage_root / "mybucket" / "file.txt").unlink()
result = checker.run_now(auto_heal=True)
assert result.phantom_metadata == 1
assert result.issues_healed == 1
result2 = checker.run_now()
assert result2.phantom_metadata == 0
class TestStaleVersions:
def test_manifest_without_data(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
versions_root = storage_root / ".myfsio.sys" / "buckets" / "mybucket" / "versions" / "file.txt"
versions_root.mkdir(parents=True)
(versions_root / "v1.json").write_text(json.dumps({"etag": "abc"}))
result = checker.run_now()
assert result.stale_versions == 1
issues = _issues_of_type(result, "stale_version")
assert "manifest without data" in issues[0].detail
def test_data_without_manifest(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
versions_root = storage_root / ".myfsio.sys" / "buckets" / "mybucket" / "versions" / "file.txt"
versions_root.mkdir(parents=True)
(versions_root / "v1.bin").write_bytes(b"old data")
result = checker.run_now()
assert result.stale_versions == 1
issues = _issues_of_type(result, "stale_version")
assert "data without manifest" in issues[0].detail
def test_heal_stale_versions(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
versions_root = storage_root / ".myfsio.sys" / "buckets" / "mybucket" / "versions" / "file.txt"
versions_root.mkdir(parents=True)
(versions_root / "v1.json").write_text(json.dumps({"etag": "abc"}))
(versions_root / "v2.bin").write_bytes(b"old data")
result = checker.run_now(auto_heal=True)
assert result.stale_versions == 2
assert result.issues_healed == 2
assert not (versions_root / "v1.json").exists()
assert not (versions_root / "v2.bin").exists()
def test_valid_versions_pass(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
versions_root = storage_root / ".myfsio.sys" / "buckets" / "mybucket" / "versions" / "file.txt"
versions_root.mkdir(parents=True)
(versions_root / "v1.json").write_text(json.dumps({"etag": "abc"}))
(versions_root / "v1.bin").write_bytes(b"old data")
result = checker.run_now()
assert result.stale_versions == 0
class TestEtagCache:
def test_detect_mismatch(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
etag_path = storage_root / ".myfsio.sys" / "buckets" / "mybucket" / "etag_index.json"
etag_path.write_text(json.dumps({"file.txt": "wrong_etag"}))
result = checker.run_now()
assert result.etag_cache_inconsistencies == 1
issues = _issues_of_type(result, "etag_cache_inconsistency")
assert len(issues) == 1
def test_heal_mismatch(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
etag_path = storage_root / ".myfsio.sys" / "buckets" / "mybucket" / "etag_index.json"
etag_path.write_text(json.dumps({"file.txt": "wrong_etag"}))
result = checker.run_now(auto_heal=True)
assert result.etag_cache_inconsistencies == 1
assert result.issues_healed == 1
assert not etag_path.exists()
class TestLegacyMetadata:
def test_detect_unmigrated(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
legacy_meta = storage_root / "mybucket" / ".meta" / "file.txt.meta.json"
legacy_meta.parent.mkdir(parents=True)
legacy_meta.write_text(json.dumps({"__etag__": "different_value"}))
meta_root = storage_root / ".myfsio.sys" / "buckets" / "mybucket" / "meta"
index_path = meta_root / "_index.json"
index_path.unlink()
result = checker.run_now()
assert result.legacy_metadata_drifts == 1
issues = _issues_of_type(result, "legacy_metadata_drift")
assert len(issues) == 1
assert issues[0].detail == "unmigrated legacy .meta.json"
def test_detect_drift(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
legacy_meta = storage_root / "mybucket" / ".meta" / "file.txt.meta.json"
legacy_meta.parent.mkdir(parents=True)
legacy_meta.write_text(json.dumps({"__etag__": "different_value"}))
result = checker.run_now()
assert result.legacy_metadata_drifts == 1
issues = _issues_of_type(result, "legacy_metadata_drift")
assert "differs from index" in issues[0].detail
def test_heal_unmigrated(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
legacy_meta = storage_root / "mybucket" / ".meta" / "file.txt.meta.json"
legacy_meta.parent.mkdir(parents=True)
legacy_data = {"__etag__": _md5(b"hello"), "__size__": "5"}
legacy_meta.write_text(json.dumps(legacy_data))
meta_root = storage_root / ".myfsio.sys" / "buckets" / "mybucket" / "meta"
index_path = meta_root / "_index.json"
index_path.unlink()
result = checker.run_now(auto_heal=True)
assert result.legacy_metadata_drifts == 1
legacy_issues = _issues_of_type(result, "legacy_metadata_drift")
assert len(legacy_issues) == 1
assert legacy_issues[0].healed
assert not legacy_meta.exists()
index_data = json.loads(index_path.read_text())
assert "file.txt" in index_data
assert index_data["file.txt"]["metadata"]["__etag__"] == _md5(b"hello")
def test_heal_drift(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
legacy_meta = storage_root / "mybucket" / ".meta" / "file.txt.meta.json"
legacy_meta.parent.mkdir(parents=True)
legacy_meta.write_text(json.dumps({"__etag__": "different_value"}))
result = checker.run_now(auto_heal=True)
assert result.legacy_metadata_drifts == 1
legacy_issues = _issues_of_type(result, "legacy_metadata_drift")
assert legacy_issues[0].healed
assert not legacy_meta.exists()
class TestDryRun:
def test_dry_run_no_changes(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
(storage_root / "mybucket" / "file.txt").write_bytes(b"corrupted")
(storage_root / "mybucket" / "orphan.txt").write_bytes(b"orphan")
result = checker.run_now(auto_heal=True, dry_run=True)
assert result.corrupted_objects == 1
assert result.orphaned_objects == 1
assert result.issues_healed == 0
meta_root = storage_root / ".myfsio.sys" / "buckets" / "mybucket" / "meta"
index_data = json.loads((meta_root / "_index.json").read_text())
assert "orphan.txt" not in index_data
class TestBatchSize:
def test_batch_limits_scan(self, storage_root):
objects = {f"file{i}.txt": f"data{i}".encode() for i in range(10)}
_setup_bucket(storage_root, "mybucket", objects)
checker = IntegrityChecker(
storage_root=storage_root,
batch_size=3,
)
result = checker.run_now()
assert result.objects_scanned <= 3
class TestHistory:
def test_history_recorded(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
checker.run_now()
history = checker.get_history()
assert len(history) == 1
assert "corrupted_objects" in history[0]["result"]
def test_history_multiple(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
checker.run_now()
checker.run_now()
checker.run_now()
history = checker.get_history()
assert len(history) == 3
def test_history_pagination(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
for _ in range(5):
checker.run_now()
history = checker.get_history(limit=2, offset=1)
assert len(history) == 2
AUTH_HEADERS = {"X-Access-Key": "admin", "X-Secret-Key": "adminsecret"}
class TestAdminAPI:
@pytest.fixture
def integrity_app(self, tmp_path):
from app import create_api_app
storage_root = tmp_path / "data"
iam_config = tmp_path / "iam.json"
bucket_policies = tmp_path / "bucket_policies.json"
iam_payload = {
"users": [
{
"access_key": "admin",
"secret_key": "adminsecret",
"display_name": "Admin",
"policies": [{"bucket": "*", "actions": ["list", "read", "write", "delete", "policy", "iam:*"]}],
}
]
}
iam_config.write_text(json.dumps(iam_payload))
flask_app = create_api_app({
"TESTING": True,
"SECRET_KEY": "testing",
"STORAGE_ROOT": storage_root,
"IAM_CONFIG": iam_config,
"BUCKET_POLICY_PATH": bucket_policies,
"API_BASE_URL": "http://testserver",
"INTEGRITY_ENABLED": True,
"INTEGRITY_AUTO_HEAL": False,
"INTEGRITY_DRY_RUN": False,
})
yield flask_app
storage = flask_app.extensions.get("object_storage")
if storage:
base = getattr(storage, "storage", storage)
if hasattr(base, "shutdown_stats"):
base.shutdown_stats()
ic = flask_app.extensions.get("integrity")
if ic:
ic.stop()
def test_status_endpoint(self, integrity_app):
client = integrity_app.test_client()
resp = client.get("/admin/integrity/status", headers=AUTH_HEADERS)
assert resp.status_code == 200
data = resp.get_json()
assert data["enabled"] is True
assert "interval_hours" in data
def test_run_endpoint(self, integrity_app):
client = integrity_app.test_client()
resp = client.post("/admin/integrity/run", headers=AUTH_HEADERS, json={})
assert resp.status_code == 200
data = resp.get_json()
assert data["status"] == "started"
_wait_scan_done(client, AUTH_HEADERS)
resp = client.get("/admin/integrity/history?limit=1", headers=AUTH_HEADERS)
hist = resp.get_json()
assert len(hist["executions"]) >= 1
assert "corrupted_objects" in hist["executions"][0]["result"]
assert "objects_scanned" in hist["executions"][0]["result"]
def test_run_with_overrides(self, integrity_app):
client = integrity_app.test_client()
resp = client.post(
"/admin/integrity/run",
headers=AUTH_HEADERS,
json={"dry_run": True, "auto_heal": True},
)
assert resp.status_code == 200
_wait_scan_done(client, AUTH_HEADERS)
def test_history_endpoint(self, integrity_app):
client = integrity_app.test_client()
client.post("/admin/integrity/run", headers=AUTH_HEADERS, json={})
_wait_scan_done(client, AUTH_HEADERS)
resp = client.get("/admin/integrity/history", headers=AUTH_HEADERS)
assert resp.status_code == 200
data = resp.get_json()
assert "executions" in data
assert len(data["executions"]) >= 1
def test_auth_required(self, integrity_app):
client = integrity_app.test_client()
resp = client.get("/admin/integrity/status")
assert resp.status_code in (401, 403)
def test_disabled_status(self, tmp_path):
from app import create_api_app
storage_root = tmp_path / "data2"
iam_config = tmp_path / "iam2.json"
bucket_policies = tmp_path / "bp2.json"
iam_payload = {
"users": [
{
"access_key": "admin",
"secret_key": "adminsecret",
"display_name": "Admin",
"policies": [{"bucket": "*", "actions": ["list", "read", "write", "delete", "policy", "iam:*"]}],
}
]
}
iam_config.write_text(json.dumps(iam_payload))
flask_app = create_api_app({
"TESTING": True,
"SECRET_KEY": "testing",
"STORAGE_ROOT": storage_root,
"IAM_CONFIG": iam_config,
"BUCKET_POLICY_PATH": bucket_policies,
"API_BASE_URL": "http://testserver",
"INTEGRITY_ENABLED": False,
})
c = flask_app.test_client()
resp = c.get("/admin/integrity/status", headers=AUTH_HEADERS)
assert resp.status_code == 200
data = resp.get_json()
assert data["enabled"] is False
storage = flask_app.extensions.get("object_storage")
if storage:
base = getattr(storage, "storage", storage)
if hasattr(base, "shutdown_stats"):
base.shutdown_stats()
class TestMultipleBuckets:
def test_scans_multiple_buckets(self, storage_root, checker):
_setup_bucket(storage_root, "bucket1", {"a.txt": b"aaa"})
_setup_bucket(storage_root, "bucket2", {"b.txt": b"bbb"})
result = checker.run_now()
assert result.buckets_scanned == 2
assert result.objects_scanned >= 2
assert result.corrupted_objects == 0
class TestGetStatus:
def test_status_fields(self, checker):
status = checker.get_status()
assert "enabled" in status
assert "running" in status
assert "interval_hours" in status
assert "batch_size" in status
assert "auto_heal" in status
assert "dry_run" in status
def test_status_includes_cursor(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
checker.run_now()
status = checker.get_status()
assert "cursor" in status
assert status["cursor"]["tracked_buckets"] == 1
assert "mybucket" in status["cursor"]["buckets"]
class TestUnifiedBatchCounter:
def test_orphaned_objects_count_toward_batch(self, storage_root):
_setup_bucket(storage_root, "mybucket", {})
for i in range(10):
(storage_root / "mybucket" / f"orphan{i}.txt").write_bytes(f"data{i}".encode())
checker = IntegrityChecker(storage_root=storage_root, batch_size=3)
result = checker.run_now()
assert result.objects_scanned <= 3
def test_phantom_metadata_counts_toward_batch(self, storage_root):
objects = {f"file{i}.txt": f"data{i}".encode() for i in range(10)}
_setup_bucket(storage_root, "mybucket", objects)
for i in range(10):
(storage_root / "mybucket" / f"file{i}.txt").unlink()
checker = IntegrityChecker(storage_root=storage_root, batch_size=5)
result = checker.run_now()
assert result.objects_scanned <= 5
def test_all_check_types_contribute(self, storage_root):
_setup_bucket(storage_root, "mybucket", {"valid.txt": b"hello"})
(storage_root / "mybucket" / "orphan.txt").write_bytes(b"orphan")
checker = IntegrityChecker(storage_root=storage_root, batch_size=1000)
result = checker.run_now()
assert result.objects_scanned > 2
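# Rotation order asserted below, as a minimal sketch inferred from these tests
# (never-scanned buckets first, then ascending last-scanned timestamp):
#
#     def get_bucket_order(buckets, cursor):
#         return sorted(buckets, key=lambda b: cursor.get(b, float("-inf")))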
class TestCursorRotation:
def test_oldest_bucket_scanned_first(self, storage_root):
_setup_bucket(storage_root, "bucket-a", {"a.txt": b"aaa"})
_setup_bucket(storage_root, "bucket-b", {"b.txt": b"bbb"})
_setup_bucket(storage_root, "bucket-c", {"c.txt": b"ccc"})
checker = IntegrityChecker(storage_root=storage_root, batch_size=5)
checker.cursor_store.update_bucket("bucket-a", 1000.0)
checker.cursor_store.update_bucket("bucket-b", 3000.0)
checker.cursor_store.update_bucket("bucket-c", 2000.0)
ordered = checker.cursor_store.get_bucket_order(["bucket-a", "bucket-b", "bucket-c"])
assert ordered[0] == "bucket-a"
assert ordered[1] == "bucket-c"
assert ordered[2] == "bucket-b"
def test_never_scanned_buckets_first(self, storage_root):
_setup_bucket(storage_root, "bucket-old", {"a.txt": b"aaa"})
_setup_bucket(storage_root, "bucket-new", {"b.txt": b"bbb"})
checker = IntegrityChecker(storage_root=storage_root, batch_size=1000)
checker.cursor_store.update_bucket("bucket-old", time.time())
ordered = checker.cursor_store.get_bucket_order(["bucket-old", "bucket-new"])
assert ordered[0] == "bucket-new"
def test_rotation_covers_all_buckets(self, storage_root):
for name in ["bucket-a", "bucket-b", "bucket-c"]:
_setup_bucket(storage_root, name, {f"{name}.txt": name.encode()})
checker = IntegrityChecker(storage_root=storage_root, batch_size=4)
result1 = checker.run_now()
assert result1.buckets_scanned >= 1
checker.run_now()
checker.run_now()
cursor_info = checker.cursor_store.get_info()
assert cursor_info["tracked_buckets"] == 3
def test_cursor_persistence(self, storage_root):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
checker1 = IntegrityChecker(storage_root=storage_root, batch_size=1000)
checker1.run_now()
cursor1 = checker1.cursor_store.get_info()
assert cursor1["tracked_buckets"] == 1
assert "mybucket" in cursor1["buckets"]
checker2 = IntegrityChecker(storage_root=storage_root, batch_size=1000)
cursor2 = checker2.cursor_store.get_info()
assert cursor2["tracked_buckets"] == 1
assert "mybucket" in cursor2["buckets"]
def test_stale_cursor_cleanup(self, storage_root):
_setup_bucket(storage_root, "bucket-a", {"a.txt": b"aaa"})
_setup_bucket(storage_root, "bucket-b", {"b.txt": b"bbb"})
checker = IntegrityChecker(storage_root=storage_root, batch_size=1000)
checker.run_now()
import shutil
shutil.rmtree(storage_root / "bucket-b")
meta_b = storage_root / ".myfsio.sys" / "buckets" / "bucket-b"
if meta_b.exists():
shutil.rmtree(meta_b)
checker.run_now()
cursor_info = checker.cursor_store.get_info()
assert "bucket-b" not in cursor_info["buckets"]
assert "bucket-a" in cursor_info["buckets"]
def test_cursor_updates_after_scan(self, storage_root):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
checker = IntegrityChecker(storage_root=storage_root, batch_size=1000)
before = time.time()
checker.run_now()
after = time.time()
cursor_info = checker.cursor_store.get_info()
entry = cursor_info["buckets"]["mybucket"]
assert before <= entry["last_scanned"] <= after
assert entry["completed"] is True
class TestIntraBucketCursor:
def test_resumes_from_cursor_key(self, storage_root):
objects = {f"file_{chr(ord('a') + i)}.txt": f"data{i}".encode() for i in range(10)}
_setup_bucket(storage_root, "mybucket", objects)
checker = IntegrityChecker(storage_root=storage_root, batch_size=3)
result1 = checker.run_now()
assert result1.objects_scanned == 3
cursor_info = checker.cursor_store.get_info()
entry = cursor_info["buckets"]["mybucket"]
assert entry["last_key"] is not None
assert entry["completed"] is False
result2 = checker.run_now()
assert result2.objects_scanned == 3
cursor_after = checker.cursor_store.get_info()["buckets"]["mybucket"]
assert cursor_after["last_key"] > entry["last_key"]
def test_cursor_resets_after_full_pass(self, storage_root):
objects = {f"file_{i}.txt": f"data{i}".encode() for i in range(3)}
_setup_bucket(storage_root, "mybucket", objects)
checker = IntegrityChecker(storage_root=storage_root, batch_size=100)
checker.run_now()
cursor_info = checker.cursor_store.get_info()
entry = cursor_info["buckets"]["mybucket"]
assert entry["last_key"] is None
assert entry["completed"] is True
def test_full_coverage_across_cycles(self, storage_root):
objects = {f"obj_{chr(ord('a') + i)}.txt": f"data{i}".encode() for i in range(10)}
_setup_bucket(storage_root, "mybucket", objects)
checker = IntegrityChecker(storage_root=storage_root, batch_size=4)
all_scanned = 0
for _ in range(10):
result = checker.run_now()
all_scanned += result.objects_scanned
if checker.cursor_store.get_info()["buckets"]["mybucket"]["completed"]:
break
assert all_scanned >= 10
def test_deleted_cursor_key_skips_gracefully(self, storage_root):
objects = {f"file_{chr(ord('a') + i)}.txt": f"data{i}".encode() for i in range(6)}
_setup_bucket(storage_root, "mybucket", objects)
checker = IntegrityChecker(storage_root=storage_root, batch_size=3)
checker.run_now()
cursor_info = checker.cursor_store.get_info()
cursor_key = cursor_info["buckets"]["mybucket"]["last_key"]
assert cursor_key is not None
obj_path = storage_root / "mybucket" / cursor_key
meta_root = storage_root / ".myfsio.sys" / "buckets" / "mybucket" / "meta"
key_path = Path(cursor_key)
if key_path.parent == Path("."):
index_path = meta_root / "_index.json"
else:
index_path = meta_root / key_path.parent / "_index.json"
if obj_path.exists():
obj_path.unlink()
if index_path.exists():
index_data = json.loads(index_path.read_text())
index_data.pop(key_path.name, None)
index_path.write_text(json.dumps(index_data))
result2 = checker.run_now()
assert result2.objects_scanned > 0
def test_incomplete_buckets_prioritized(self, storage_root):
_setup_bucket(storage_root, "bucket-a", {f"a{i}.txt": b"a" for i in range(5)})
_setup_bucket(storage_root, "bucket-b", {f"b{i}.txt": b"b" for i in range(5)})
checker = IntegrityChecker(storage_root=storage_root, batch_size=3)
checker.run_now()
cursor_info = checker.cursor_store.get_info()
incomplete = [
name for name, info in cursor_info["buckets"].items()
if info.get("last_key") is not None
]
assert len(incomplete) >= 1
result2 = checker.run_now()
assert result2.objects_scanned > 0
def test_cursor_skips_nested_directories(self, storage_root):
objects = {
"aaa/file1.txt": b"a1",
"aaa/file2.txt": b"a2",
"bbb/file1.txt": b"b1",
"bbb/file2.txt": b"b2",
"ccc/file1.txt": b"c1",
"ccc/file2.txt": b"c2",
}
_setup_bucket(storage_root, "mybucket", objects)
checker = IntegrityChecker(storage_root=storage_root, batch_size=4)
result1 = checker.run_now()
assert result1.objects_scanned == 4
cursor_info = checker.cursor_store.get_info()
cursor_key = cursor_info["buckets"]["mybucket"]["last_key"]
assert cursor_key is not None
assert cursor_key.startswith("aaa/") or cursor_key.startswith("bbb/")
result2 = checker.run_now()
assert result2.objects_scanned >= 2
all_scanned = result1.objects_scanned + result2.objects_scanned
for _ in range(10):
if checker.cursor_store.get_info()["buckets"]["mybucket"]["completed"]:
break
r = checker.run_now()
all_scanned += r.objects_scanned
assert all_scanned >= 6
def test_sorted_walk_order(self, storage_root):
objects = {
"bar.txt": b"bar",
"bar/inner.txt": b"inner",
"abc.txt": b"abc",
"zzz/deep.txt": b"deep",
}
_setup_bucket(storage_root, "mybucket", objects)
checker = IntegrityChecker(storage_root=storage_root, batch_size=100)
result = checker.run_now()
assert result.objects_scanned >= 4
assert result.total_issues == 0

Some files were not shown because too many files have changed in this diff.