27 Commits

Author SHA1 Message Date
476b9bd2e4 First porting of Python to Rust - update docs and bug fixes 2026-04-20 21:27:02 +08:00
c2ef37b84e Separate Python and Rust into python/ and rust/ with per-stack Dockerfiles 2026-04-19 14:01:05 +08:00
be8e030940 Migrate more Python functions to Rust 2026-04-19 13:53:55 +08:00
ad7b2a02cb Add missing endpoints for Rust S3 API 2026-04-05 15:22:24 +08:00
72ddd9822c Add docker support for rust integration 2026-04-03 12:31:11 +08:00
4c30efd802 Update myfsio rust engines - added more implementations 2026-04-02 21:57:16 +08:00
926a7e6366 Add Rust storage engine foundation 2026-04-02 17:00:58 +08:00
1eadc7b75c Fix more-actions dropdown positioning: use Popper fixed strategy instead of raw CSS position:fixed 2026-04-01 16:24:42 +08:00
4a224a127b Fix more-actions dropdown triggering row selection on object list 2026-04-01 16:17:29 +08:00
c498fe7aee Add self-heal missing ETags and harden ETag index persistence 2026-03-31 21:10:47 +08:00
3838aed954 Fix presigned URL security vulnerabilities: enforce key/user status in SigV4 paths, remove duplicate verification, remove X-Forwarded-Host trust 2026-03-31 20:27:18 +08:00
6a193dbb1c Add --version option for run.py 2026-03-31 17:21:33 +08:00
e94b341a5b Add robust myfsio_core staleness detection with Python fallback; document Rust extension build in README 2026-03-31 17:13:05 +08:00
2ad3736852 Add intra-bucket cursor tracking to integrity scanner for progressive full coverage; Optimize integrity scanner: early batch exit, lazy sorted walk, cursor-aware index reads 2026-03-31 17:04:28 +08:00
f05b2668c0 Reduce per-request overhead: pre-compile SigV4 regex, in-memory etag index cache, 1MB GET chunks, configurable meta cache, skip fsync for rebuildable caches 2026-03-25 13:44:34 +08:00
f7c1c1f809 Update requirements.txt 2026-03-25 13:26:42 +08:00
0e392e18b4 Hide ghost details in object panel when preview fails to load 2026-03-24 15:15:03 +08:00
8996f1ce06 Fix folder selection not showing delete button in bucket browser 2026-03-24 12:10:38 +08:00
f60dbaf9c9 Respect DISPLAY_TIMEZONE in GC and integrity scanner history tables 2026-03-23 18:36:13 +08:00
1a5a7aa9e1 Auto-refresh Recent Scans/Executions tables after GC and integrity scan completion 2026-03-23 18:31:13 +08:00
326367ae4c Fix integrity scanner batch limit and add cursor-based rotation 2026-03-23 17:46:27 +08:00
a7f9b0a22f Convert GC to async with polling to prevent proxy timeouts 2026-03-23 17:14:04 +08:00
0e525713b1 Fix missing CSRF token on presigned URL request 2026-03-23 16:48:25 +08:00
f43fad02fb Replace fetch with XHR for multipart upload progress and add retry logic 2026-03-23 16:27:28 +08:00
eff3e378f3 Fix mobile infinite scroll on object list and ghost preview on fast object swap 2026-03-23 11:55:46 +08:00
5e32cef792 Add I/O throttling to GC and integrity scanner to prevent HDD starvation 2026-03-23 11:36:38 +08:00
9898167f8d Make integrity scan async with progress indicator in UI 2026-03-22 14:17:43 +08:00
200 changed files with 59048 additions and 3195 deletions

.gitignore (vendored, 7 changes)

```diff
@@ -27,8 +27,11 @@ dist/
 .eggs/

 # Rust / maturin build artifacts
-myfsio_core/target/
-myfsio_core/Cargo.lock
+python/myfsio_core/target/
+python/myfsio_core/Cargo.lock
+
+# Rust engine build artifacts
+rust/myfsio-engine/target/

 # Local runtime artifacts
 logs/
```

README.md (388 changes)

````diff
@@ -1,250 +1,212 @@
 # MyFSIO

-A lightweight, S3-compatible object storage system built with Flask. MyFSIO implements core AWS S3 REST API operations with filesystem-backed storage, making it ideal for local development, testing, and self-hosted storage scenarios.
+MyFSIO is an S3-compatible object storage server with a Rust runtime and a filesystem-backed storage engine. The active server lives under `rust/myfsio-engine` and serves both the S3 API and the built-in web UI from a single process.
+
+The repository still contains a `python/` tree, but you do not need Python to run the current server.

 ## Features

-**Core Storage**
-- S3-compatible REST API with AWS Signature Version 4 authentication
-- Bucket and object CRUD operations
-- Object versioning with version history
-- Multipart uploads for large files
-- Presigned URLs (1 second to 7 days validity)
-
-**Security & Access Control**
-- IAM users with access key management and rotation
-- Bucket policies (AWS Policy Version 2012-10-17)
-- Server-side encryption (SSE-S3 and SSE-KMS)
-- Built-in Key Management Service (KMS)
-- Rate limiting per endpoint
-
-**Advanced Features**
-- Cross-bucket replication to remote S3-compatible endpoints
-- Hot-reload for bucket policies (no restart required)
-- CORS configuration per bucket
-
-**Management UI**
-- Web console for bucket and object management
-- IAM dashboard for user administration
-- Inline JSON policy editor with presets
-- Object browser with folder navigation and bulk operations
-- Dark mode support
-
-## Architecture
-
-```
-+------------------+         +------------------+
-|   API Server     |         |   UI Server      |
-|   (port 5000)    |         |   (port 5100)    |
-|                  |         |                  |
-| - S3 REST API    |<------->| - Web Console    |
-| - SigV4 Auth     |         | - IAM Dashboard  |
-| - Presign URLs   |         | - Bucket Editor  |
-+--------+---------+         +------------------+
-         |
-         v
-+------------------+         +------------------+
-|  Object Storage  |         | System Metadata  |
-|  (filesystem)    |         | (.myfsio.sys/)   |
-|                  |         |                  |
-| data/<bucket>/   |         | - IAM config     |
-|   <objects>      |         | - Bucket policies|
-|                  |         | - Encryption keys|
-+------------------+         +------------------+
-```
+- S3-compatible REST API with Signature Version 4 authentication
+- Browser UI for buckets, objects, IAM users, policies, replication, metrics, and site administration
+- Filesystem-backed storage rooted at `data/`
+- Bucket versioning, multipart uploads, presigned URLs, CORS, object and bucket tagging
+- Server-side encryption and built-in KMS support
+- Optional background services for lifecycle, garbage collection, integrity scanning, operation metrics, and system metrics history
+- Replication, site sync, and static website hosting support
+
+## Runtime Model
+
+MyFSIO now runs as one Rust process:
+
+- API listener on `HOST` + `PORT` (default `127.0.0.1:5000`)
+- UI listener on `HOST` + `UI_PORT` (default `127.0.0.1:5100`)
+- Shared state for storage, IAM, policies, sessions, metrics, and background workers
+
+If you want API-only mode, set `UI_ENABLED=false`. There is no separate "UI-only" runtime anymore.

 ## Quick Start

+From the repository root:
+
 ```bash
-# Clone and setup
-git clone https://gitea.jzwsite.com/kqjy/MyFSIO
-cd s3
-python -m venv .venv
-
-# Activate virtual environment
-# Windows PowerShell:
-.\.venv\Scripts\Activate.ps1
-# Windows CMD:
-.venv\Scripts\activate.bat
-# Linux/macOS:
-source .venv/bin/activate
-
-# Install dependencies
-pip install -r requirements.txt
-
-# Start both servers
-python run.py
-
-# Or start individually
-python run.py --mode api # API only (port 5000)
-python run.py --mode ui # UI only (port 5100)
+cd rust/myfsio-engine
+cargo run -p myfsio-server --
 ```

-**Credentials:** Generated automatically on first run and printed to the console. If missed, check the IAM config file at `<STORAGE_ROOT>/.myfsio.sys/config/iam.json`.
+Useful URLs:

-- **Web Console:** http://127.0.0.1:5100/ui
-- **API Endpoint:** http://127.0.0.1:5000
+- UI: `http://127.0.0.1:5100/ui`
+- API: `http://127.0.0.1:5000/`
+- Health: `http://127.0.0.1:5000/myfsio/health`
+
+On first boot, MyFSIO creates `data/.myfsio.sys/config/iam.json` and prints the generated admin access key and secret key to the console.
+
+### Common CLI commands
+
+```bash
+# Show resolved configuration
+cargo run -p myfsio-server -- --show-config
+
+# Validate configuration and exit non-zero on critical issues
+cargo run -p myfsio-server -- --check-config
+
+# Reset admin credentials
+cargo run -p myfsio-server -- --reset-cred
+
+# API only
+UI_ENABLED=false cargo run -p myfsio-server --
+```
+
+## Building a Binary
+
+```bash
+cd rust/myfsio-engine
+cargo build --release -p myfsio-server
+```
+
+Binary locations:
+
+- Linux/macOS: `rust/myfsio-engine/target/release/myfsio-server`
+- Windows: `rust/myfsio-engine/target/release/myfsio-server.exe`
+
+Run the built binary directly:
+
+```bash
+./target/release/myfsio-server
+```

 ## Configuration

+The server reads environment variables from the process environment and also loads, when present:
+
+- `/opt/myfsio/myfsio.env`
+- `.env`
+- `myfsio.env`
+
+Core settings:
+
 | Variable | Default | Description |
-|----------|---------|-------------|
-| `STORAGE_ROOT` | `./data` | Filesystem root for bucket storage |
-| `IAM_CONFIG` | `.myfsio.sys/config/iam.json` | IAM user and policy store |
-| `BUCKET_POLICY_PATH` | `.myfsio.sys/config/bucket_policies.json` | Bucket policy store |
-| `API_BASE_URL` | `http://127.0.0.1:5000` | API endpoint for UI calls |
-| `MAX_UPLOAD_SIZE` | `1073741824` | Maximum upload size in bytes (1 GB) |
-| `MULTIPART_MIN_PART_SIZE` | `5242880` | Minimum multipart part size (5 MB) |
-| `UI_PAGE_SIZE` | `100` | Default page size for listings |
-| `SECRET_KEY` | `dev-secret-key` | Flask session secret |
-| `AWS_REGION` | `us-east-1` | Region for SigV4 signing |
-| `AWS_SERVICE` | `s3` | Service name for SigV4 signing |
-| `ENCRYPTION_ENABLED` | `false` | Enable server-side encryption |
-| `KMS_ENABLED` | `false` | Enable Key Management Service |
-| `LOG_LEVEL` | `INFO` | Logging verbosity |
-| `SIGV4_TIMESTAMP_TOLERANCE_SECONDS` | `900` | Max time skew for SigV4 requests |
-| `PRESIGNED_URL_MAX_EXPIRY_SECONDS` | `604800` | Max presigned URL expiry (7 days) |
-| `REPLICATION_CONNECT_TIMEOUT_SECONDS` | `5` | Replication connection timeout |
-| `SITE_SYNC_ENABLED` | `false` | Enable bi-directional site sync |
-| `OBJECT_TAG_LIMIT` | `50` | Maximum tags per object |
+| --- | --- | --- |
+| `HOST` | `127.0.0.1` | Bind address for API and UI listeners |
+| `PORT` | `5000` | API port |
+| `UI_PORT` | `5100` | UI port |
+| `UI_ENABLED` | `true` | Disable to run API-only |
+| `STORAGE_ROOT` | `./data` | Root directory for buckets and system metadata |
+| `IAM_CONFIG` | `<STORAGE_ROOT>/.myfsio.sys/config/iam.json` | IAM config path |
+| `API_BASE_URL` | unset | Public API base used by the UI and presigned URL generation |
+| `AWS_REGION` | `us-east-1` | Region used in SigV4 scope |
+| `SIGV4_TIMESTAMP_TOLERANCE_SECONDS` | `900` | Allowed request time skew |
+| `PRESIGNED_URL_MIN_EXPIRY_SECONDS` | `1` | Minimum presigned URL expiry |
+| `PRESIGNED_URL_MAX_EXPIRY_SECONDS` | `604800` | Maximum presigned URL expiry |
+| `SECRET_KEY` | loaded from `.myfsio.sys/config/.secret` if present | Session signing key and IAM-at-rest encryption key |
+| `ADMIN_ACCESS_KEY` | unset | Optional first-run or reset access key |
+| `ADMIN_SECRET_KEY` | unset | Optional first-run or reset secret key |
+
+Feature toggles:
+
+| Variable | Default |
+| --- | --- |
+| `ENCRYPTION_ENABLED` | `false` |
+| `KMS_ENABLED` | `false` |
+| `GC_ENABLED` | `false` |
+| `INTEGRITY_ENABLED` | `false` |
+| `LIFECYCLE_ENABLED` | `false` |
+| `METRICS_HISTORY_ENABLED` | `false` |
+| `OPERATION_METRICS_ENABLED` | `false` |
+| `WEBSITE_HOSTING_ENABLED` | `false` |
+| `SITE_SYNC_ENABLED` | `false` |
+
+Metrics and replication tuning:
+
+| Variable | Default |
+| --- | --- |
+| `OPERATION_METRICS_INTERVAL_MINUTES` | `5` |
+| `OPERATION_METRICS_RETENTION_HOURS` | `24` |
+| `METRICS_HISTORY_INTERVAL_MINUTES` | `5` |
+| `METRICS_HISTORY_RETENTION_HOURS` | `24` |
+| `REPLICATION_CONNECT_TIMEOUT_SECONDS` | `5` |
+| `REPLICATION_READ_TIMEOUT_SECONDS` | `30` |
+| `REPLICATION_MAX_RETRIES` | `2` |
+| `REPLICATION_STREAMING_THRESHOLD_BYTES` | `10485760` |
+| `REPLICATION_MAX_FAILURES_PER_BUCKET` | `50` |
+| `SITE_SYNC_INTERVAL_SECONDS` | `60` |
+| `SITE_SYNC_BATCH_SIZE` | `100` |
+| `SITE_SYNC_CONNECT_TIMEOUT_SECONDS` | `10` |
+| `SITE_SYNC_READ_TIMEOUT_SECONDS` | `120` |
+| `SITE_SYNC_MAX_RETRIES` | `2` |
+| `SITE_SYNC_CLOCK_SKEW_TOLERANCE_SECONDS` | `1.0` |
+
+UI asset overrides:
+
+| Variable | Default |
+| --- | --- |
+| `TEMPLATES_DIR` | built-in crate templates directory |
+| `STATIC_DIR` | built-in crate static directory |
+
+See [docs.md](./docs.md) for the full Rust-side operations guide.

 ## Data Layout

-```
+```text
 data/
-├── <bucket>/              # User buckets with objects
-└── .myfsio.sys/           # System metadata
-    ├── config/
-    │   ├── iam.json               # IAM users and policies
-    │   ├── bucket_policies.json   # Bucket policies
-    │   ├── replication_rules.json
-    │   └── connections.json       # Remote S3 connections
-    ├── buckets/<bucket>/
-    │   ├── meta/                  # Object metadata (.meta.json)
-    │   ├── versions/              # Archived object versions
-    │   └── .bucket.json           # Bucket config (versioning, CORS)
-    ├── multipart/                 # Active multipart uploads
-    └── keys/                      # Encryption keys (SSE-S3/KMS)
+  <bucket>/
+  .myfsio.sys/
+    config/
+      iam.json
+      bucket_policies.json
+      connections.json
+      operation_metrics.json
+      metrics_history.json
+    buckets/<bucket>/
+      meta/
+      versions/
+    multipart/
+    keys/
 ```

-## API Reference
-
-All endpoints require AWS Signature Version 4 authentication unless using presigned URLs or public bucket policies.
-
-### Bucket Operations
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| `GET` | `/` | List all buckets |
-| `PUT` | `/<bucket>` | Create bucket |
-| `DELETE` | `/<bucket>` | Delete bucket (must be empty) |
-| `HEAD` | `/<bucket>` | Check bucket exists |
-
-### Object Operations
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| `GET` | `/<bucket>` | List objects (supports `list-type=2`) |
-| `PUT` | `/<bucket>/<key>` | Upload object |
-| `GET` | `/<bucket>/<key>` | Download object |
-| `DELETE` | `/<bucket>/<key>` | Delete object |
-| `HEAD` | `/<bucket>/<key>` | Get object metadata |
-| `POST` | `/<bucket>/<key>?uploads` | Initiate multipart upload |
-| `PUT` | `/<bucket>/<key>?partNumber=N&uploadId=X` | Upload part |
-| `POST` | `/<bucket>/<key>?uploadId=X` | Complete multipart upload |
-| `DELETE` | `/<bucket>/<key>?uploadId=X` | Abort multipart upload |
-
-### Bucket Policies (S3-compatible)
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| `GET` | `/<bucket>?policy` | Get bucket policy |
-| `PUT` | `/<bucket>?policy` | Set bucket policy |
-| `DELETE` | `/<bucket>?policy` | Delete bucket policy |
-
-### Versioning
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| `GET` | `/<bucket>/<key>?versionId=X` | Get specific version |
-| `DELETE` | `/<bucket>/<key>?versionId=X` | Delete specific version |
-| `GET` | `/<bucket>?versions` | List object versions |
-
-### Health Check
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| `GET` | `/myfsio/health` | Health check endpoint |
-
-## IAM & Access Control
-
-### Users and Access Keys
-
-On first run, MyFSIO creates a default admin user (`localadmin`/`localadmin`). Use the IAM dashboard to:
-
-- Create and delete users
-- Generate and rotate access keys
-- Attach inline policies to users
-- Control IAM management permissions
-
-### Bucket Policies
-
-Bucket policies follow AWS policy grammar (Version `2012-10-17`) with support for:
-
-- Principal-based access (`*` for anonymous, specific users)
-- Action-based permissions (`s3:GetObject`, `s3:PutObject`, etc.)
-- Resource patterns (`arn:aws:s3:::bucket/*`)
-- Condition keys
-
-**Policy Presets:**
-
-- **Public:** Grants anonymous read access (`s3:GetObject`, `s3:ListBucket`)
-- **Private:** Removes bucket policy (IAM-only access)
-- **Custom:** Manual policy editing with draft preservation
-
-Policies hot-reload when the JSON file changes.
-
-## Server-Side Encryption
-
-MyFSIO supports two encryption modes:
-
-- **SSE-S3:** Server-managed keys with automatic key rotation
-- **SSE-KMS:** Customer-managed keys via built-in KMS
-
-Enable encryption with:
-
-```bash
-ENCRYPTION_ENABLED=true python run.py
-```
-
-## Cross-Bucket Replication
-
-Replicate objects to remote S3-compatible endpoints:
-
-1. Configure remote connections in the UI
-2. Create replication rules specifying source/destination
-3. Objects are automatically replicated on upload
-
 ## Docker

+Build the Rust image from the `rust/` directory:
+
 ```bash
-docker build -t myfsio .
-docker run -p 5000:5000 -p 5100:5100 -v ./data:/app/data myfsio
+docker build -t myfsio ./rust
+docker run --rm -p 5000:5000 -p 5100:5100 -v "${PWD}/data:/app/data" myfsio
 ```

+If the instance sits behind a reverse proxy, set `API_BASE_URL` to the public S3 endpoint.
+
+## Linux Installation
+
+The repository includes `scripts/install.sh` for systemd-style Linux installs. Build the Rust binary first, then pass it to the installer:
+
+```bash
+cd rust/myfsio-engine
+cargo build --release -p myfsio-server
+cd ../..
+sudo ./scripts/install.sh --binary ./rust/myfsio-engine/target/release/myfsio-server
+```
+
+The installer copies the binary into `/opt/myfsio/myfsio`, writes `/opt/myfsio/myfsio.env`, and can register a `myfsio.service` unit.
+
 ## Testing

+Run the Rust test suite from the workspace:
+
 ```bash
-# Run all tests
-pytest tests/ -v
-
-# Run specific test file
-pytest tests/test_api.py -v
-
-# Run with coverage
-pytest tests/ --cov=app --cov-report=html
+cd rust/myfsio-engine
+cargo test
 ```

-## References
+## Health Check

-- [Amazon S3 Documentation](https://docs.aws.amazon.com/s3/)
-- [AWS Signature Version 4](https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html)
-- [S3 Bucket Policy Examples](https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-bucket-policies.html)
+`GET /myfsio/health` returns:
+
+```json
+{
+  "status": "ok",
+  "version": "0.5.0"
+}
+```
+
+The `version` field comes from the Rust crate version in `rust/myfsio-engine/crates/myfsio-server/Cargo.toml`.
````
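The new README keeps SigV4 authentication front and center. As a reference point for what any SigV4-compatible client or server has to compute, the signing-key derivation is a fixed HMAC-SHA256 chain over the date, region, and service. A minimal illustrative sketch in Python (not MyFSIO's actual code):

```python
import hashlib
import hmac


def sigv4_signing_key(secret_key: str, date: str, region: str, service: str = "s3") -> bytes:
    """Derive the AWS SigV4 signing key: an HMAC-SHA256 chain over the
    date (YYYYMMDD), region, service name, and the literal "aws4_request"."""
    def _hmac(key: bytes, msg: str) -> bytes:
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")
```

The derived key then signs the string-to-sign for each request; the `SIGV4_TIMESTAMP_TOLERANCE_SECONDS` setting above bounds how stale that signed timestamp may be.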

```diff
@@ -1,5 +0,0 @@
-#!/bin/sh
-set -e
-
-# Run both services using the python runner in production mode
-exec python run.py --prod
```

docs.md (2716 changes)

File diff suppressed because it is too large.

```diff
@@ -11,3 +11,7 @@ htmlcov
 logs
 data
 tmp
+tests
+myfsio_core/target
+Dockerfile
+.dockerignore
```

```diff
@@ -1,9 +1,9 @@
-FROM python:3.14.3-slim
+FROM python:3.14.3-slim AS builder

 ENV PYTHONDONTWRITEBYTECODE=1 \
     PYTHONUNBUFFERED=1

-WORKDIR /app
+WORKDIR /build

 RUN apt-get update \
     && apt-get install -y --no-install-recommends build-essential curl \
@@ -12,23 +12,34 @@ RUN apt-get update \
 ENV PATH="/root/.cargo/bin:${PATH}"

+RUN pip install --no-cache-dir maturin
+
+COPY myfsio_core ./myfsio_core
+RUN cd myfsio_core \
+    && maturin build --release --out /wheels
+
+FROM python:3.14.3-slim
+
+ENV PYTHONDONTWRITEBYTECODE=1 \
+    PYTHONUNBUFFERED=1
+
+WORKDIR /app
+
 COPY requirements.txt ./
 RUN pip install --no-cache-dir -r requirements.txt

-COPY . .
-
-RUN pip install --no-cache-dir maturin \
-    && cd myfsio_core \
-    && maturin build --release \
-    && pip install target/wheels/*.whl \
-    && cd .. \
-    && rm -rf myfsio_core/target \
-    && pip uninstall -y maturin \
-    && rustup self uninstall -y
+COPY --from=builder /wheels/*.whl /tmp/
+RUN pip install --no-cache-dir /tmp/*.whl && rm /tmp/*.whl
+
+COPY app ./app
+COPY templates ./templates
+COPY static ./static
+COPY run.py ./
+COPY docker-entrypoint.sh ./

-RUN chmod +x docker-entrypoint.sh
-
-RUN mkdir -p /app/data \
+RUN chmod +x docker-entrypoint.sh \
+    && mkdir -p /app/data \
     && useradd -m -u 1000 myfsio \
     && chown -R myfsio:myfsio /app
```

```diff
@@ -184,6 +184,7 @@ def create_app(
         object_cache_max_size=app.config.get("OBJECT_CACHE_MAX_SIZE", 100),
         bucket_config_cache_ttl=app.config.get("BUCKET_CONFIG_CACHE_TTL_SECONDS", 30.0),
         object_key_max_length_bytes=app.config.get("OBJECT_KEY_MAX_LENGTH_BYTES", 1024),
+        meta_read_cache_max=app.config.get("META_READ_CACHE_MAX", 2048),
     )

     if app.config.get("WARM_CACHE_ON_STARTUP", True) and not app.config.get("TESTING"):
@@ -293,6 +294,7 @@ def create_app(
             multipart_max_age_days=app.config.get("GC_MULTIPART_MAX_AGE_DAYS", 7),
             lock_file_max_age_hours=app.config.get("GC_LOCK_FILE_MAX_AGE_HOURS", 1.0),
             dry_run=app.config.get("GC_DRY_RUN", False),
+            io_throttle_ms=app.config.get("GC_IO_THROTTLE_MS", 10),
         )
         gc_collector.start()
@@ -304,6 +306,7 @@ def create_app(
             batch_size=app.config.get("INTEGRITY_BATCH_SIZE", 1000),
             auto_heal=app.config.get("INTEGRITY_AUTO_HEAL", False),
             dry_run=app.config.get("INTEGRITY_DRY_RUN", False),
+            io_throttle_ms=app.config.get("INTEGRITY_IO_THROTTLE_MS", 10),
         )
         integrity_checker.start()
```

```diff
@@ -907,15 +907,11 @@ def gc_run_now():
     if not gc:
         return _json_error("InvalidRequest", "GC is not enabled", 400)
     payload = request.get_json(silent=True) or {}
-    original_dry_run = gc.dry_run
-    if "dry_run" in payload:
-        gc.dry_run = bool(payload["dry_run"])
-    try:
-        result = gc.run_now()
-    finally:
-        gc.dry_run = original_dry_run
+    started = gc.run_async(dry_run=payload.get("dry_run"))
     logger.info("GC manual run by %s", principal.access_key)
-    return jsonify(result.to_dict())
+    if not started:
+        return _json_error("Conflict", "GC is already in progress", 409)
+    return jsonify({"status": "started"})

 @admin_api_bp.route("/gc/history", methods=["GET"])
@@ -961,12 +957,14 @@ def integrity_run_now():
     payload = request.get_json(silent=True) or {}
     override_dry_run = payload.get("dry_run")
     override_auto_heal = payload.get("auto_heal")
-    result = checker.run_now(
+    started = checker.run_async(
         auto_heal=override_auto_heal if override_auto_heal is not None else None,
         dry_run=override_dry_run if override_dry_run is not None else None,
     )
     logger.info("Integrity manual run by %s", principal.access_key)
-    return jsonify(result.to_dict())
+    if not started:
+        return _json_error("Conflict", "A scan is already in progress", 409)
+    return jsonify({"status": "started"})

 @admin_api_bp.route("/integrity/history", methods=["GET"])
```
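The diff above switches both admin endpoints from a blocking `run_now` to a `run_async` that returns `False` when a run is already in progress (mapped to HTTP 409). The underlying pattern can be sketched as a try-lock plus a background thread; this is an illustration of the pattern, not the project's actual `run_async` implementation, and `AsyncRunner` is a made-up name:

```python
import threading


class AsyncRunner:
    """Sketch of the non-blocking run_async pattern: a try-lock rejects
    concurrent runs, and the work itself happens on a daemon thread."""

    def __init__(self, work):
        self._work = work
        self._lock = threading.Lock()

    def run_async(self, **kwargs) -> bool:
        # Non-blocking acquire: if a run is active, report "busy" immediately.
        if not self._lock.acquire(blocking=False):
            return False  # caller maps this to a 409 Conflict

        def _target():
            try:
                self._work(**kwargs)
            finally:
                self._lock.release()  # always free the slot, even on error

        threading.Thread(target=_target, daemon=True).start()
        return True  # caller responds {"status": "started"}
```

The HTTP handler then returns immediately, and the UI polls a history or status endpoint for completion, which is what prevents the proxy timeouts the commit messages mention.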

```diff
@@ -136,6 +136,7 @@ class AppConfig:
     site_sync_clock_skew_tolerance_seconds: float
     object_key_max_length_bytes: int
     object_cache_max_size: int
+    meta_read_cache_max: int
     bucket_config_cache_ttl_seconds: float
     object_tag_limit: int
     encryption_chunk_size_bytes: int
@@ -157,11 +158,13 @@ class AppConfig:
     gc_multipart_max_age_days: int
     gc_lock_file_max_age_hours: float
     gc_dry_run: bool
+    gc_io_throttle_ms: int
     integrity_enabled: bool
     integrity_interval_hours: float
     integrity_batch_size: int
     integrity_auto_heal: bool
     integrity_dry_run: bool
+    integrity_io_throttle_ms: int

     @classmethod
     def from_env(cls, overrides: Optional[Dict[str, Any]] = None) -> "AppConfig":
@@ -313,6 +316,7 @@ class AppConfig:
         site_sync_clock_skew_tolerance_seconds = float(_get("SITE_SYNC_CLOCK_SKEW_TOLERANCE_SECONDS", 1.0))
         object_key_max_length_bytes = int(_get("OBJECT_KEY_MAX_LENGTH_BYTES", 1024))
         object_cache_max_size = int(_get("OBJECT_CACHE_MAX_SIZE", 100))
+        meta_read_cache_max = int(_get("META_READ_CACHE_MAX", 2048))
         bucket_config_cache_ttl_seconds = float(_get("BUCKET_CONFIG_CACHE_TTL_SECONDS", 30.0))
         object_tag_limit = int(_get("OBJECT_TAG_LIMIT", 50))
         encryption_chunk_size_bytes = int(_get("ENCRYPTION_CHUNK_SIZE_BYTES", 64 * 1024))
@@ -338,11 +342,13 @@ class AppConfig:
         gc_multipart_max_age_days = int(_get("GC_MULTIPART_MAX_AGE_DAYS", 7))
         gc_lock_file_max_age_hours = float(_get("GC_LOCK_FILE_MAX_AGE_HOURS", 1.0))
         gc_dry_run = str(_get("GC_DRY_RUN", "0")).lower() in {"1", "true", "yes", "on"}
+        gc_io_throttle_ms = int(_get("GC_IO_THROTTLE_MS", 10))
         integrity_enabled = str(_get("INTEGRITY_ENABLED", "0")).lower() in {"1", "true", "yes", "on"}
         integrity_interval_hours = float(_get("INTEGRITY_INTERVAL_HOURS", 24.0))
         integrity_batch_size = int(_get("INTEGRITY_BATCH_SIZE", 1000))
         integrity_auto_heal = str(_get("INTEGRITY_AUTO_HEAL", "0")).lower() in {"1", "true", "yes", "on"}
         integrity_dry_run = str(_get("INTEGRITY_DRY_RUN", "0")).lower() in {"1", "true", "yes", "on"}
+        integrity_io_throttle_ms = int(_get("INTEGRITY_IO_THROTTLE_MS", 10))

         return cls(storage_root=storage_root,
                    max_upload_size=max_upload_size,
@@ -417,6 +423,7 @@ class AppConfig:
                    site_sync_clock_skew_tolerance_seconds=site_sync_clock_skew_tolerance_seconds,
                    object_key_max_length_bytes=object_key_max_length_bytes,
                    object_cache_max_size=object_cache_max_size,
+                   meta_read_cache_max=meta_read_cache_max,
                    bucket_config_cache_ttl_seconds=bucket_config_cache_ttl_seconds,
                    object_tag_limit=object_tag_limit,
                    encryption_chunk_size_bytes=encryption_chunk_size_bytes,
@@ -438,11 +445,13 @@ class AppConfig:
                    gc_multipart_max_age_days=gc_multipart_max_age_days,
                    gc_lock_file_max_age_hours=gc_lock_file_max_age_hours,
                    gc_dry_run=gc_dry_run,
+                   gc_io_throttle_ms=gc_io_throttle_ms,
                    integrity_enabled=integrity_enabled,
                    integrity_interval_hours=integrity_interval_hours,
                    integrity_batch_size=integrity_batch_size,
                    integrity_auto_heal=integrity_auto_heal,
-                   integrity_dry_run=integrity_dry_run)
+                   integrity_dry_run=integrity_dry_run,
+                   integrity_io_throttle_ms=integrity_io_throttle_ms)

     def validate_and_report(self) -> list[str]:
         """Validate configuration and return a list of warnings/issues.
@@ -642,6 +651,7 @@ class AppConfig:
             "SITE_SYNC_CLOCK_SKEW_TOLERANCE_SECONDS": self.site_sync_clock_skew_tolerance_seconds,
             "OBJECT_KEY_MAX_LENGTH_BYTES": self.object_key_max_length_bytes,
             "OBJECT_CACHE_MAX_SIZE": self.object_cache_max_size,
+            "META_READ_CACHE_MAX": self.meta_read_cache_max,
             "BUCKET_CONFIG_CACHE_TTL_SECONDS": self.bucket_config_cache_ttl_seconds,
             "OBJECT_TAG_LIMIT": self.object_tag_limit,
             "ENCRYPTION_CHUNK_SIZE_BYTES": self.encryption_chunk_size_bytes,
@@ -663,9 +673,11 @@ class AppConfig:
             "GC_MULTIPART_MAX_AGE_DAYS": self.gc_multipart_max_age_days,
             "GC_LOCK_FILE_MAX_AGE_HOURS": self.gc_lock_file_max_age_hours,
             "GC_DRY_RUN": self.gc_dry_run,
+            "GC_IO_THROTTLE_MS": self.gc_io_throttle_ms,
             "INTEGRITY_ENABLED": self.integrity_enabled,
             "INTEGRITY_INTERVAL_HOURS": self.integrity_interval_hours,
             "INTEGRITY_BATCH_SIZE": self.integrity_batch_size,
             "INTEGRITY_AUTO_HEAL": self.integrity_auto_heal,
             "INTEGRITY_DRY_RUN": self.integrity_dry_run,
+            "INTEGRITY_IO_THROTTLE_MS": self.integrity_io_throttle_ms,
         }
```
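The config diff parses boolean environment variables with the idiom `str(value).lower() in {"1", "true", "yes", "on"}`. Extracted into a small standalone helper for clarity (the function name and `env` dict parameter are illustrative, not part of the codebase):

```python
_TRUTHY = {"1", "true", "yes", "on"}


def env_bool(env: dict, name: str, default: str = "0") -> bool:
    """Parse a boolean setting the way AppConfig.from_env does:
    case-insensitive membership in {"1", "true", "yes", "on"};
    anything else (including absence) is False."""
    return str(env.get(name, default)).lower() in _TRUTHY
```

This makes values like `GC_ENABLED=Yes` or `INTEGRITY_ENABLED=on` work, while typos such as `enabled` silently parse as `False`, which is why the `--check-config` style validation pass is useful.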

```diff
@@ -21,6 +21,10 @@ if sys.platform != "win32":
 try:
     import myfsio_core as _rc
+    if not all(hasattr(_rc, f) for f in (
+        "encrypt_stream_chunked", "decrypt_stream_chunked",
+    )):
+        raise ImportError("myfsio_core is outdated, rebuild with: cd myfsio_core && maturin develop --release")
     _HAS_RUST = True
 except ImportError:
     _rc = None
```
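The staleness check above treats a native extension that imports but lacks newer symbols as absent, so the caller falls back to the pure-Python path instead of crashing later. The general feature-detection shape can be sketched as follows (`select_backend` is an illustrative name, not the project's function):

```python
def select_backend(module, required=("encrypt_stream_chunked", "decrypt_stream_chunked")):
    """Sketch of the staleness guard: accept a native extension module only if
    it exposes every symbol this version of the app needs; otherwise return
    None so the caller uses the pure-Python fallback."""
    if module is not None and all(hasattr(module, name) for name in required):
        return module
    return None
```

Raising `ImportError` inside the `try` block, as the diff does, reuses the existing `except ImportError` fallback path while still surfacing a rebuild hint in the message.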

View File

@@ -175,13 +175,21 @@ def handle_app_error(error: AppError) -> Response:
 def handle_rate_limit_exceeded(e: RateLimitExceeded) -> Response:
     g.s3_error_code = "SlowDown"
+    if request.path.startswith("/ui") or request.path.startswith("/buckets"):
+        wants_json = (
+            request.is_json or
+            request.headers.get("X-Requested-With") == "XMLHttpRequest" or
+            "application/json" in request.accept_mimetypes.values()
+        )
+        if wants_json:
+            return jsonify({"success": False, "error": {"code": "SlowDown", "message": "Please reduce your request rate."}}), 429
     error = Element("Error")
     SubElement(error, "Code").text = "SlowDown"
     SubElement(error, "Message").text = "Please reduce your request rate."
     SubElement(error, "Resource").text = request.path
     SubElement(error, "RequestId").text = getattr(g, "request_id", "")
     xml_bytes = tostring(error, encoding="utf-8")
-    return Response(xml_bytes, status=429, mimetype="application/xml")
+    return Response(xml_bytes, status="429 Too Many Requests", mimetype="application/xml")

 def register_error_handlers(app):
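The handler above negotiates the 429 body: JSON for browser/AJAX callers on UI paths, S3-style XML for API clients. The rendering logic can be sketched framework-free (`slow_down_body` is an illustrative helper, not a function in the codebase):

```python
import json
from xml.etree.ElementTree import Element, SubElement, tostring

def slow_down_body(accepts_json: bool, path: str, request_id: str = ""):
    """Render a 429 SlowDown body: JSON for UI/AJAX callers,
    S3-style XML for API clients. Returns (body, mimetype)."""
    if accepts_json:
        body = json.dumps({
            "success": False,
            "error": {"code": "SlowDown",
                      "message": "Please reduce your request rate."},
        })
        return body, "application/json"
    error = Element("Error")
    SubElement(error, "Code").text = "SlowDown"
    SubElement(error, "Message").text = "Please reduce your request rate."
    SubElement(error, "Resource").text = path
    SubElement(error, "RequestId").text = request_id
    return tostring(error, encoding="unicode"), "application/xml"
```

Keeping the XML branch as the default matches S3 client expectations; only requests that explicitly look like browser traffic get JSON.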

View File

@@ -162,6 +162,7 @@ class GarbageCollector:
         lock_file_max_age_hours: float = 1.0,
         dry_run: bool = False,
         max_history: int = 50,
+        io_throttle_ms: int = 10,
     ) -> None:
         self.storage_root = Path(storage_root)
         self.interval_seconds = interval_hours * 3600.0
@@ -172,6 +173,9 @@ class GarbageCollector:
         self._timer: Optional[threading.Timer] = None
         self._shutdown = False
         self._lock = threading.Lock()
+        self._scanning = False
+        self._scan_start_time: Optional[float] = None
+        self._io_throttle = max(0, io_throttle_ms) / 1000.0
         self.history_store = GCHistoryStore(storage_root, max_records=max_history)

     def start(self) -> None:
@@ -212,16 +216,30 @@ class GarbageCollector:
         finally:
             self._schedule_next()

-    def run_now(self) -> GCResult:
-        start = time.time()
-        result = GCResult()
-        self._clean_temp_files(result)
-        self._clean_orphaned_multipart(result)
-        self._clean_stale_locks(result)
-        self._clean_orphaned_metadata(result)
-        self._clean_orphaned_versions(result)
-        self._clean_empty_dirs(result)
-        result.execution_time_seconds = time.time() - start
+    def run_now(self, dry_run: Optional[bool] = None) -> GCResult:
+        if not self._lock.acquire(blocking=False):
+            raise RuntimeError("GC is already in progress")
+        effective_dry_run = dry_run if dry_run is not None else self.dry_run
+        try:
+            self._scanning = True
+            self._scan_start_time = time.time()
+            start = self._scan_start_time
+            result = GCResult()
+            original_dry_run = self.dry_run
+            self.dry_run = effective_dry_run
+            try:
+                self._clean_temp_files(result)
+                self._clean_orphaned_multipart(result)
+                self._clean_stale_locks(result)
+                self._clean_orphaned_metadata(result)
+                self._clean_orphaned_versions(result)
+                self._clean_empty_dirs(result)
+            finally:
+                self.dry_run = original_dry_run
+            result.execution_time_seconds = time.time() - start
@@ -240,21 +258,39 @@ class GarbageCollector:
             result.orphaned_version_bytes_freed / (1024 * 1024),
             result.empty_dirs_removed,
             len(result.errors),
-            " (dry run)" if self.dry_run else "",
+            " (dry run)" if effective_dry_run else "",
         )
         record = GCExecutionRecord(
             timestamp=time.time(),
             result=result.to_dict(),
-            dry_run=self.dry_run,
+            dry_run=effective_dry_run,
         )
         self.history_store.add(record)
         return result
+        finally:
+            self._scanning = False
+            self._scan_start_time = None
+            self._lock.release()
+
+    def run_async(self, dry_run: Optional[bool] = None) -> bool:
+        if self._scanning:
+            return False
+        t = threading.Thread(target=self.run_now, args=(dry_run,), daemon=True)
+        t.start()
+        return True

     def _system_path(self) -> Path:
         return self.storage_root / self.SYSTEM_ROOT

+    def _throttle(self) -> bool:
+        if self._shutdown:
+            return True
+        if self._io_throttle > 0:
+            time.sleep(self._io_throttle)
+        return self._shutdown
+
     def _list_bucket_names(self) -> List[str]:
         names = []
         try:
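The `_throttle` helper added above is the idiom every scan loop in this diff relies on: one call both paces filesystem IO and observes shutdown, so each loop body reduces to `if self._throttle(): return`. A standalone sketch of the pattern (class and method names here are illustrative):

```python
import time

class ThrottledScanner:
    """Sketch of the GC/integrity throttle idiom: a single helper that
    sleeps between IO operations and doubles as a shutdown check."""

    def __init__(self, io_throttle_ms: int = 10) -> None:
        self._io_throttle = max(0, io_throttle_ms) / 1000.0
        self._shutdown = False

    def _throttle(self) -> bool:
        # True means "stop scanning now"; the sleep spreads IO load out.
        if self._shutdown:
            return True
        if self._io_throttle > 0:
            time.sleep(self._io_throttle)
        return self._shutdown

    def scan(self, entries):
        seen = []
        for entry in entries:
            if self._throttle():
                break
            seen.append(entry)
        return seen
```

Checking `_shutdown` both before and after the sleep matters: a shutdown requested mid-sleep is noticed one entry earlier than a single pre-sleep check would allow.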
@@ -271,6 +307,8 @@ class GarbageCollector:
             return
         try:
             for entry in tmp_dir.iterdir():
+                if self._throttle():
+                    return
                 if not entry.is_file():
                     continue
                 age = _file_age_hours(entry)
@@ -292,6 +330,8 @@ class GarbageCollector:
         bucket_names = self._list_bucket_names()
         for bucket_name in bucket_names:
+            if self._shutdown:
+                return
             for multipart_root in (
                 self._system_path() / self.SYSTEM_MULTIPART_DIR / bucket_name,
                 self.storage_root / bucket_name / ".multipart",
@@ -300,6 +340,8 @@ class GarbageCollector:
                 continue
             try:
                 for upload_dir in multipart_root.iterdir():
+                    if self._throttle():
+                        return
                     if not upload_dir.is_dir():
                         continue
                     self._maybe_clean_upload(upload_dir, cutoff_hours, result)
@@ -329,6 +371,8 @@ class GarbageCollector:
         try:
             for bucket_dir in buckets_root.iterdir():
+                if self._shutdown:
+                    return
                 if not bucket_dir.is_dir():
                     continue
                 locks_dir = bucket_dir / "locks"
@@ -336,6 +380,8 @@ class GarbageCollector:
                 continue
             try:
                 for lock_file in locks_dir.iterdir():
+                    if self._throttle():
+                        return
                     if not lock_file.is_file() or not lock_file.name.endswith(".lock"):
                         continue
                     age = _file_age_hours(lock_file)
@@ -356,6 +402,8 @@ class GarbageCollector:
         bucket_names = self._list_bucket_names()
         for bucket_name in bucket_names:
+            if self._shutdown:
+                return
             legacy_meta = self.storage_root / bucket_name / ".meta"
             if legacy_meta.exists():
                 self._clean_legacy_metadata(bucket_name, legacy_meta, result)
@@ -368,6 +416,8 @@ class GarbageCollector:
         bucket_path = self.storage_root / bucket_name
         try:
             for meta_file in meta_root.rglob("*.meta.json"):
+                if self._throttle():
+                    return
                 if not meta_file.is_file():
                     continue
                 try:
@@ -387,6 +437,8 @@ class GarbageCollector:
         bucket_path = self.storage_root / bucket_name
         try:
             for index_file in meta_root.rglob("_index.json"):
+                if self._throttle():
+                    return
                 if not index_file.is_file():
                     continue
                 try:
@@ -430,6 +482,8 @@ class GarbageCollector:
         bucket_names = self._list_bucket_names()
         for bucket_name in bucket_names:
+            if self._shutdown:
+                return
             bucket_path = self.storage_root / bucket_name
             for versions_root in (
                 self._system_path() / self.SYSTEM_BUCKETS_DIR / bucket_name / self.BUCKET_VERSIONS_DIR,
@@ -439,6 +493,8 @@ class GarbageCollector:
                 continue
             try:
                 for key_dir in versions_root.iterdir():
+                    if self._throttle():
+                        return
                     if not key_dir.is_dir():
                         continue
                     self._clean_versions_for_key(bucket_path, versions_root, key_dir, result)
@@ -489,6 +545,8 @@ class GarbageCollector:
         self._remove_empty_dirs_recursive(root, root, result)

     def _remove_empty_dirs_recursive(self, path: Path, stop_at: Path, result: GCResult) -> bool:
+        if self._shutdown:
+            return False
         if not path.is_dir():
             return False
@@ -499,6 +557,8 @@ class GarbageCollector:
         all_empty = True
         for child in children:
+            if self._throttle():
+                return False
             if child.is_dir():
                 if not self._remove_empty_dirs_recursive(child, stop_at, result):
                     all_empty = False
@@ -520,12 +580,17 @@ class GarbageCollector:
         return [r.to_dict() for r in records]

     def get_status(self) -> dict:
-        return {
+        status: Dict[str, Any] = {
             "enabled": not self._shutdown or self._timer is not None,
             "running": self._timer is not None and not self._shutdown,
+            "scanning": self._scanning,
             "interval_hours": self.interval_seconds / 3600.0,
             "temp_file_max_age_hours": self.temp_file_max_age_hours,
             "multipart_max_age_days": self.multipart_max_age_days,
             "lock_file_max_age_hours": self.lock_file_max_age_hours,
             "dry_run": self.dry_run,
+            "io_throttle_ms": round(self._io_throttle * 1000),
         }
+        if self._scanning and self._scan_start_time:
+            status["scan_elapsed_seconds"] = time.time() - self._scan_start_time
+        return status
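The `run_now`/`run_async` changes in this file implement single-flight execution: a non-blocking lock acquire rejects an overlapping run instead of queueing it. The pattern in isolation (`SingleFlight` is an illustrative name, not a class in the codebase):

```python
import threading
import time

class SingleFlight:
    """Sketch of the run_now/run_async guard: only one run at a time;
    a second synchronous call raises, a second async call returns False."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def run_now(self, duration: float = 0.0) -> str:
        if not self._lock.acquire(blocking=False):
            raise RuntimeError("already in progress")
        try:
            time.sleep(duration)  # stand-in for the actual scan work
            return "done"
        finally:
            self._lock.release()

    def run_async(self, duration: float = 0.0) -> bool:
        if self._lock.locked():
            return False
        threading.Thread(target=self.run_now, args=(duration,), daemon=True).start()
        return True
```

Releasing in `finally` is what makes the guard safe: even if the scan raises, the next run is not permanently locked out.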

View File

@@ -398,9 +398,11 @@ class IamService:
         record = self._user_records.get(user_id)
         if record:
             self._check_expiry(access_key, record)
+            self._enforce_key_and_user_status(access_key)
             return principal
         self._maybe_reload()
+        self._enforce_key_and_user_status(access_key)
         user_id = self._key_index.get(access_key)
         if not user_id:
             raise IamError("Unknown access key")
@@ -414,6 +416,7 @@ class IamService:
     def secret_for_key(self, access_key: str) -> str:
         self._maybe_reload()
+        self._enforce_key_and_user_status(access_key)
         secret = self._key_secrets.get(access_key)
         if not secret:
             raise IamError("Unknown access key")
@@ -1028,6 +1031,16 @@ class IamService:
         user, _ = self._resolve_raw_user(access_key)
         return user

+    def _enforce_key_and_user_status(self, access_key: str) -> None:
+        key_status = self._key_status.get(access_key, "active")
+        if key_status != "active":
+            raise IamError("Access key is inactive")
+        user_id = self._key_index.get(access_key)
+        if user_id:
+            record = self._user_records.get(user_id)
+            if record and not record.get("enabled", True):
+                raise IamError("User account is disabled")
+
     def get_secret_key(self, access_key: str) -> str | None:
         now = time.time()
         cached = self._secret_key_cache.get(access_key)
@@ -1039,6 +1052,7 @@ class IamService:
             record = self._user_records.get(user_id)
             if record:
                 self._check_expiry(access_key, record)
+            self._enforce_key_and_user_status(access_key)
             return secret_key
         self._maybe_reload()
@@ -1049,6 +1063,7 @@ class IamService:
             record = self._user_records.get(user_id)
             if record:
                 self._check_expiry(access_key, record)
+            self._enforce_key_and_user_status(access_key)
             self._secret_key_cache[access_key] = (secret, now)
             return secret
         return None
@@ -1064,9 +1079,11 @@ class IamService:
         record = self._user_records.get(user_id)
         if record:
             self._check_expiry(access_key, record)
+            self._enforce_key_and_user_status(access_key)
             return principal
         self._maybe_reload()
+        self._enforce_key_and_user_status(access_key)
         user_id = self._key_index.get(access_key)
         if user_id:
             record = self._user_records.get(user_id)
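The security fix above routes every credential lookup, including cached hits, through `_enforce_key_and_user_status`, so deactivating a key or disabling a user takes effect immediately. The check itself, sketched over plain dicts standing in for the service's internal indexes (the free function and dict shapes are illustrative):

```python
class IamStatusError(Exception):
    pass

def enforce_key_and_user_status(access_key, key_status, key_index, user_records):
    """Deny inactive keys and disabled users before any secret is released.
    key_status: access_key -> "active"/"inactive"; key_index: access_key ->
    user_id; user_records: user_id -> {"enabled": bool, ...}."""
    if key_status.get(access_key, "active") != "active":
        raise IamStatusError("Access key is inactive")
    user_id = key_index.get(access_key)
    if user_id:
        record = user_records.get(user_id)
        if record and not record.get("enabled", True):
            raise IamStatusError("User account is disabled")
```

The key point of the commit is *where* this runs: before the cached-secret fast paths, not only on the slow reload path, so a presigned URL signed with a freshly revoked key fails verification.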

View File

@@ -12,6 +12,8 @@ from typing import Any, Dict, List, Optional
 try:
     import myfsio_core as _rc
+    if not hasattr(_rc, "md5_file"):
+        raise ImportError("myfsio_core is outdated, rebuild with: cd myfsio_core && maturin develop --release")
     _HAS_RUST = True
 except ImportError:
     _HAS_RUST = False
@@ -162,6 +164,111 @@ class IntegrityHistoryStore:
         return self.load()[offset : offset + limit]

+
+class IntegrityCursorStore:
+    def __init__(self, storage_root: Path) -> None:
+        self.storage_root = storage_root
+        self._lock = threading.Lock()
+
+    def _get_path(self) -> Path:
+        return self.storage_root / ".myfsio.sys" / "config" / "integrity_cursor.json"
+
+    def load(self) -> Dict[str, Any]:
+        path = self._get_path()
+        if not path.exists():
+            return {"buckets": {}}
+        try:
+            with open(path, "r", encoding="utf-8") as f:
+                data = json.load(f)
+            if not isinstance(data.get("buckets"), dict):
+                return {"buckets": {}}
+            return data
+        except (OSError, ValueError, KeyError):
+            return {"buckets": {}}
+
+    def save(self, data: Dict[str, Any]) -> None:
+        path = self._get_path()
+        path.parent.mkdir(parents=True, exist_ok=True)
+        try:
+            with open(path, "w", encoding="utf-8") as f:
+                json.dump(data, f, indent=2)
+        except OSError as e:
+            logger.error("Failed to save integrity cursor: %s", e)
+
+    def update_bucket(
+        self,
+        bucket_name: str,
+        timestamp: float,
+        last_key: Optional[str] = None,
+        completed: bool = False,
+    ) -> None:
+        with self._lock:
+            data = self.load()
+            entry = data["buckets"].get(bucket_name, {})
+            if completed:
+                entry["last_scanned"] = timestamp
+                entry.pop("last_key", None)
+                entry["completed"] = True
+            else:
+                entry["last_scanned"] = timestamp
+                if last_key is not None:
+                    entry["last_key"] = last_key
+                entry["completed"] = False
+            data["buckets"][bucket_name] = entry
+            self.save(data)
+
+    def clean_stale(self, existing_buckets: List[str]) -> None:
+        with self._lock:
+            data = self.load()
+            existing_set = set(existing_buckets)
+            stale_keys = [k for k in data["buckets"] if k not in existing_set]
+            if stale_keys:
+                for k in stale_keys:
+                    del data["buckets"][k]
+                self.save(data)
+
+    def get_last_key(self, bucket_name: str) -> Optional[str]:
+        data = self.load()
+        entry = data.get("buckets", {}).get(bucket_name)
+        if entry is None:
+            return None
+        return entry.get("last_key")
+
+    def get_bucket_order(self, bucket_names: List[str]) -> List[str]:
+        data = self.load()
+        buckets_info = data.get("buckets", {})
+        incomplete = []
+        complete = []
+        for name in bucket_names:
+            entry = buckets_info.get(name)
+            if entry is None:
+                incomplete.append((name, 0.0))
+            elif entry.get("last_key") is not None:
+                incomplete.append((name, entry.get("last_scanned", 0.0)))
+            else:
+                complete.append((name, entry.get("last_scanned", 0.0)))
+        incomplete.sort(key=lambda x: x[1])
+        complete.sort(key=lambda x: x[1])
+        return [n for n, _ in incomplete] + [n for n, _ in complete]
+
+    def get_info(self) -> Dict[str, Any]:
+        data = self.load()
+        buckets = data.get("buckets", {})
+        return {
+            "tracked_buckets": len(buckets),
+            "buckets": {
+                name: {
+                    "last_scanned": info.get("last_scanned"),
+                    "last_key": info.get("last_key"),
+                    "completed": info.get("completed", False),
+                }
+                for name, info in buckets.items()
+            },
+        }
+
+
 MAX_ISSUES = 500
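The ordering policy in `get_bucket_order` is what gives the batched scanner progressive full coverage: interrupted buckets (those with a saved `last_key`) go first, oldest scan first, then fully scanned buckets, also oldest first. The policy in isolation, over a plain dict standing in for the persisted cursor file (`bucket_scan_order` is an illustrative name):

```python
def bucket_scan_order(bucket_names, cursor):
    """Order buckets for the next batch: never-scanned and interrupted
    buckets first (oldest scan first), then completed buckets (oldest
    first). `cursor` maps bucket name -> {"last_scanned": float,
    "last_key": str (present only if the scan was interrupted)}."""
    incomplete, complete = [], []
    for name in bucket_names:
        entry = cursor.get(name)
        if entry is None:
            incomplete.append((name, 0.0))          # never scanned
        elif entry.get("last_key") is not None:
            incomplete.append((name, entry.get("last_scanned", 0.0)))
        else:
            complete.append((name, entry.get("last_scanned", 0.0)))
    incomplete.sort(key=lambda x: x[1])
    complete.sort(key=lambda x: x[1])
    return [n for n, _ in incomplete] + [n for n, _ in complete]
```

With a fixed per-run `batch_size`, this ordering guarantees every object is eventually reached: a bucket that exhausted the batch resumes from its cursor on the very next run instead of being rescanned from the start.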
@@ -180,6 +287,7 @@ class IntegrityChecker:
         auto_heal: bool = False,
         dry_run: bool = False,
         max_history: int = 50,
+        io_throttle_ms: int = 10,
     ) -> None:
         self.storage_root = Path(storage_root)
         self.interval_seconds = interval_hours * 3600.0
@@ -189,7 +297,11 @@ class IntegrityChecker:
         self._timer: Optional[threading.Timer] = None
         self._shutdown = False
         self._lock = threading.Lock()
+        self._scanning = False
+        self._scan_start_time: Optional[float] = None
+        self._io_throttle = max(0, io_throttle_ms) / 1000.0
         self.history_store = IntegrityHistoryStore(storage_root, max_records=max_history)
+        self.cursor_store = IntegrityCursorStore(self.storage_root)

     def start(self) -> None:
         if self._timer is not None:
@@ -229,24 +341,40 @@ class IntegrityChecker:
             self._schedule_next()

     def run_now(self, auto_heal: Optional[bool] = None, dry_run: Optional[bool] = None) -> IntegrityResult:
+        if not self._lock.acquire(blocking=False):
+            raise RuntimeError("Integrity scan is already in progress")
+        try:
+            self._scanning = True
+            self._scan_start_time = time.time()
         effective_auto_heal = auto_heal if auto_heal is not None else self.auto_heal
         effective_dry_run = dry_run if dry_run is not None else self.dry_run
-        start = time.time()
+        start = self._scan_start_time
         result = IntegrityResult()
         bucket_names = self._list_bucket_names()
-        for bucket_name in bucket_names:
-            if result.objects_scanned >= self.batch_size:
+        self.cursor_store.clean_stale(bucket_names)
+        ordered_buckets = self.cursor_store.get_bucket_order(bucket_names)
+        for bucket_name in ordered_buckets:
+            if self._batch_exhausted(result):
                 break
             result.buckets_scanned += 1
-            self._check_corrupted_objects(bucket_name, result, effective_auto_heal, effective_dry_run)
-            self._check_orphaned_objects(bucket_name, result, effective_auto_heal, effective_dry_run)
-            self._check_phantom_metadata(bucket_name, result, effective_auto_heal, effective_dry_run)
+            cursor_key = self.cursor_store.get_last_key(bucket_name)
+            key_corrupted = self._check_corrupted_objects(bucket_name, result, effective_auto_heal, effective_dry_run, cursor_key)
+            key_orphaned = self._check_orphaned_objects(bucket_name, result, effective_auto_heal, effective_dry_run, cursor_key)
+            key_phantom = self._check_phantom_metadata(bucket_name, result, effective_auto_heal, effective_dry_run, cursor_key)
             self._check_stale_versions(bucket_name, result, effective_auto_heal, effective_dry_run)
             self._check_etag_cache(bucket_name, result, effective_auto_heal, effective_dry_run)
             self._check_legacy_metadata(bucket_name, result, effective_auto_heal, effective_dry_run)
+            returned_keys = [k for k in (key_corrupted, key_orphaned, key_phantom) if k is not None]
+            bucket_exhausted = self._batch_exhausted(result)
+            if bucket_exhausted and returned_keys:
+                self.cursor_store.update_bucket(bucket_name, time.time(), last_key=min(returned_keys))
+            else:
+                self.cursor_store.update_bucket(bucket_name, time.time(), completed=True)
         result.execution_time_seconds = time.time() - start
@@ -275,6 +403,17 @@ class IntegrityChecker:
         self.history_store.add(record)
         return result
+        finally:
+            self._scanning = False
+            self._scan_start_time = None
+            self._lock.release()
+
+    def run_async(self, auto_heal: Optional[bool] = None, dry_run: Optional[bool] = None) -> bool:
+        if self._scanning:
+            return False
+        t = threading.Thread(target=self.run_now, args=(auto_heal, dry_run), daemon=True)
+        t.start()
+        return True

     def _system_path(self) -> Path:
         return self.storage_root / self.SYSTEM_ROOT
@@ -289,45 +428,121 @@ class IntegrityChecker:
             pass
         return names

+    def _throttle(self) -> bool:
+        if self._shutdown:
+            return True
+        if self._io_throttle > 0:
+            time.sleep(self._io_throttle)
+        return self._shutdown
+
+    def _batch_exhausted(self, result: IntegrityResult) -> bool:
+        return self._shutdown or result.objects_scanned >= self.batch_size
+
     def _add_issue(self, result: IntegrityResult, issue: IntegrityIssue) -> None:
         if len(result.issues) < MAX_ISSUES:
             result.issues.append(issue)

-    def _check_corrupted_objects(
-        self, bucket_name: str, result: IntegrityResult, auto_heal: bool, dry_run: bool
-    ) -> None:
-        bucket_path = self.storage_root / bucket_name
-        meta_root = self._system_path() / self.SYSTEM_BUCKETS_DIR / bucket_name / self.BUCKET_META_DIR
-        if not meta_root.exists():
-            return
-        try:
-            for index_file in meta_root.rglob("_index.json"):
-                if result.objects_scanned >= self.batch_size:
-                    return
-                if not index_file.is_file():
-                    continue
-                try:
-                    index_data = json.loads(index_file.read_text(encoding="utf-8"))
-                except (OSError, json.JSONDecodeError):
-                    continue
-                for key_name, entry in list(index_data.items()):
-                    if result.objects_scanned >= self.batch_size:
-                        return
-                    rel_dir = index_file.parent.relative_to(meta_root)
-                    if rel_dir == Path("."):
-                        full_key = key_name
-                    else:
-                        full_key = rel_dir.as_posix() + "/" + key_name
-                    object_path = bucket_path / full_key
-                    if not object_path.exists():
-                        continue
-                    result.objects_scanned += 1
-                    meta = entry.get("metadata", {}) if isinstance(entry, dict) else {}
-                    stored_etag = meta.get("__etag__")
+    def _collect_index_keys(
+        self, meta_root: Path, cursor_key: Optional[str] = None,
+    ) -> Dict[str, Dict[str, Any]]:
+        all_keys: Dict[str, Dict[str, Any]] = {}
+        if not meta_root.exists():
+            return all_keys
+        try:
+            for index_file in meta_root.rglob("_index.json"):
+                if not index_file.is_file():
+                    continue
+                rel_dir = index_file.parent.relative_to(meta_root)
+                dir_prefix = "" if rel_dir == Path(".") else rel_dir.as_posix()
+                if cursor_key is not None and dir_prefix:
+                    full_prefix = dir_prefix + "/"
+                    if not cursor_key.startswith(full_prefix) and cursor_key > full_prefix:
+                        continue
+                try:
+                    index_data = json.loads(index_file.read_text(encoding="utf-8"))
+                except (OSError, json.JSONDecodeError):
+                    continue
+                for key_name, entry in index_data.items():
+                    full_key = (dir_prefix + "/" + key_name) if dir_prefix else key_name
+                    if cursor_key is not None and full_key <= cursor_key:
+                        continue
+                    all_keys[full_key] = {
+                        "entry": entry,
+                        "index_file": index_file,
+                        "key_name": key_name,
+                    }
+        except OSError:
+            pass
+        return all_keys
+
+    def _walk_bucket_files_sorted(
+        self, bucket_path: Path, cursor_key: Optional[str] = None,
+    ):
+        def _walk(dir_path: Path, prefix: str):
+            try:
+                entries = list(os.scandir(dir_path))
+            except OSError:
+                return
+            def _sort_key(e):
+                if e.is_dir(follow_symlinks=False):
+                    return e.name + "/"
+                return e.name
+            entries.sort(key=_sort_key)
+            for entry in entries:
+                if entry.is_dir(follow_symlinks=False):
+                    if not prefix and entry.name in self.INTERNAL_FOLDERS:
+                        continue
+                    new_prefix = (prefix + "/" + entry.name) if prefix else entry.name
+                    if cursor_key is not None:
+                        full_prefix = new_prefix + "/"
+                        if not cursor_key.startswith(full_prefix) and cursor_key > full_prefix:
+                            continue
+                    yield from _walk(Path(entry.path), new_prefix)
+                elif entry.is_file(follow_symlinks=False):
+                    full_key = (prefix + "/" + entry.name) if prefix else entry.name
+                    if cursor_key is not None and full_key <= cursor_key:
+                        continue
+                    yield full_key
+        yield from _walk(bucket_path, "")
+
+    def _check_corrupted_objects(
+        self, bucket_name: str, result: IntegrityResult, auto_heal: bool, dry_run: bool,
+        cursor_key: Optional[str] = None,
+    ) -> Optional[str]:
+        if self._batch_exhausted(result):
+            return None
+        bucket_path = self.storage_root / bucket_name
+        meta_root = self._system_path() / self.SYSTEM_BUCKETS_DIR / bucket_name / self.BUCKET_META_DIR
+        if not meta_root.exists():
+            return None
+        last_key = None
+        try:
+            all_keys = self._collect_index_keys(meta_root, cursor_key)
+            sorted_keys = sorted(all_keys.keys())
+            for full_key in sorted_keys:
+                if self._throttle():
+                    return last_key
+                if self._batch_exhausted(result):
+                    return last_key
+                info = all_keys[full_key]
+                entry = info["entry"]
+                index_file = info["index_file"]
+                key_name = info["key_name"]
+                object_path = bucket_path / full_key
+                if not object_path.exists():
+                    continue
+                result.objects_scanned += 1
+                last_key = full_key
+                meta = entry.get("metadata", {}) if isinstance(entry, dict) else {}
+                stored_etag = meta.get("__etag__")
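`_walk_bucket_files_sorted` above is the piece that makes resumable scanning work: directories sort with a trailing `/` so traversal order matches full-key order, and whole subtrees that fall at or before the saved cursor are pruned. The same logic over an in-memory tree instead of `os.scandir` (the `walk_sorted` helper and dict-as-directory encoding are illustrative):

```python
def walk_sorted(tree, prefix="", cursor_key=None):
    """Yield file keys of `tree` (dict = directory, None = file) in
    full-key order, skipping everything at or before cursor_key so a
    scan can resume mid-bucket. Mirrors _walk_bucket_files_sorted."""
    def sort_key(item):
        name, node = item
        # Directories compare as "name/" so their contents sort exactly
        # where their keys ("name/child") would.
        return name + "/" if isinstance(node, dict) else name

    for name, node in sorted(tree.items(), key=sort_key):
        if isinstance(node, dict):
            new_prefix = f"{prefix}/{name}" if prefix else name
            if cursor_key is not None:
                full_prefix = new_prefix + "/"
                if not cursor_key.startswith(full_prefix) and cursor_key > full_prefix:
                    continue  # whole subtree already scanned last run
            yield from walk_sorted(node, new_prefix, cursor_key)
        else:
            full_key = f"{prefix}/{name}" if prefix else name
            if cursor_key is not None and full_key <= cursor_key:
                continue
            yield full_key
```

The pruning test has two halves: a subtree still containing the cursor (`cursor_key.startswith(full_prefix)`) must be entered, and only subtrees wholly before the cursor (`cursor_key > full_prefix`) may be skipped.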
@@ -354,6 +569,10 @@ class IntegrityChecker:
                     meta["__etag__"] = actual_etag
                     meta["__size__"] = str(stat.st_size)
                     meta["__last_modified__"] = str(stat.st_mtime)
+                    try:
+                        index_data = json.loads(index_file.read_text(encoding="utf-8"))
+                    except (OSError, json.JSONDecodeError):
+                        index_data = {}
                     index_data[key_name] = {"metadata": meta}
                     self._atomic_write_index(index_file, index_data)
                     issue.healed = True
@@ -365,29 +584,30 @@ class IntegrityChecker:
             self._add_issue(result, issue)
         except OSError as e:
             result.errors.append(f"check corrupted {bucket_name}: {e}")
+        return last_key

     def _check_orphaned_objects(
-        self, bucket_name: str, result: IntegrityResult, auto_heal: bool, dry_run: bool
-    ) -> None:
+        self, bucket_name: str, result: IntegrityResult, auto_heal: bool, dry_run: bool,
+        cursor_key: Optional[str] = None,
+    ) -> Optional[str]:
+        if self._batch_exhausted(result):
+            return None
         bucket_path = self.storage_root / bucket_name
         meta_root = self._system_path() / self.SYSTEM_BUCKETS_DIR / bucket_name / self.BUCKET_META_DIR
+        last_key = None
         try:
-            for entry in bucket_path.rglob("*"):
-                if result.objects_scanned >= self.batch_size:
-                    return
-                if not entry.is_file():
-                    continue
-                try:
-                    rel = entry.relative_to(bucket_path)
-                except ValueError:
-                    continue
-                if rel.parts and rel.parts[0] in self.INTERNAL_FOLDERS:
-                    continue
-                full_key = rel.as_posix()
-                key_name = rel.name
-                parent = rel.parent
+            for full_key in self._walk_bucket_files_sorted(bucket_path, cursor_key):
+                if self._throttle():
+                    return last_key
+                if self._batch_exhausted(result):
+                    return last_key
+                result.objects_scanned += 1
+                last_key = full_key
+                key_path = Path(full_key)
+                key_name = key_path.name
+                parent = key_path.parent
                 if parent == Path("."):
                     index_path = meta_root / "_index.json"
@@ -413,8 +633,9 @@ class IntegrityChecker:
                     if auto_heal and not dry_run:
                         try:
-                            etag = _compute_etag(entry)
-                            stat = entry.stat()
+                            object_path = bucket_path / full_key
+                            etag = _compute_etag(object_path)
+                            stat = object_path.stat()
                             meta = {
                                 "__etag__": etag,
                                 "__size__": str(stat.st_size),
@@ -437,36 +658,38 @@ class IntegrityChecker:
             self._add_issue(result, issue)
         except OSError as e:
             result.errors.append(f"check orphaned {bucket_name}: {e}")
+        return last_key

     def _check_phantom_metadata(
-        self, bucket_name: str, result: IntegrityResult, auto_heal: bool, dry_run: bool
-    ) -> None:
+        self, bucket_name: str, result: IntegrityResult, auto_heal: bool, dry_run: bool,
+        cursor_key: Optional[str] = None,
+    ) -> Optional[str]:
+        if self._batch_exhausted(result):
+            return None
         bucket_path = self.storage_root / bucket_name
         meta_root = self._system_path() / self.SYSTEM_BUCKETS_DIR / bucket_name / self.BUCKET_META_DIR
         if not meta_root.exists():
-            return
+            return None
+        last_key = None
         try:
-            for index_file in meta_root.rglob("_index.json"):
-                if not index_file.is_file():
-                    continue
-                try:
-                    index_data = json.loads(index_file.read_text(encoding="utf-8"))
-                except (OSError, json.JSONDecodeError):
-                    continue
-                keys_to_remove = []
-                for key_name in list(index_data.keys()):
-                    rel_dir = index_file.parent.relative_to(meta_root)
-                    if rel_dir == Path("."):
-                        full_key = key_name
-                    else:
-                        full_key = rel_dir.as_posix() + "/" + key_name
+            all_keys = self._collect_index_keys(meta_root, cursor_key)
+            sorted_keys = sorted(all_keys.keys())
+            heal_by_index: Dict[Path, List[str]] = {}
+            for full_key in sorted_keys:
+                if self._batch_exhausted(result):
+                    break
+                result.objects_scanned += 1
+                last_key = full_key
                 object_path = bucket_path / full_key
                 if not object_path.exists():
                     result.phantom_metadata += 1
+                    info = all_keys[full_key]
                     issue = IntegrityIssue(
                         issue_type="phantom_metadata",
                         bucket=bucket_name,
@@ -474,14 +697,17 @@ class IntegrityChecker:
                         detail="metadata entry without file on disk",
                     )
                     if auto_heal and not dry_run:
-                        keys_to_remove.append(key_name)
+                        index_file = info["index_file"]
+                        heal_by_index.setdefault(index_file, []).append(info["key_name"])
                         issue.healed = True
                         issue.heal_action = "removed stale index entry"
                         result.issues_healed += 1
                     self._add_issue(result, issue)
-            if keys_to_remove and auto_heal and not dry_run:
+            if heal_by_index and auto_heal and not dry_run:
+                for index_file, keys_to_remove in heal_by_index.items():
                     try:
+                        index_data = json.loads(index_file.read_text(encoding="utf-8"))
                         for k in keys_to_remove:
                             index_data.pop(k, None)
                         if index_data:
@@ -492,10 +718,13 @@ class IntegrityChecker:
                 result.errors.append(f"heal phantom {bucket_name}: {e}")
         except OSError as e:
             result.errors.append(f"check phantom {bucket_name}: {e}")
+        return last_key

     def _check_stale_versions(
         self, bucket_name: str, result: IntegrityResult, auto_heal: bool, dry_run: bool
     ) -> None:
+        if self._batch_exhausted(result):
+            return
         versions_root = self._system_path() / self.SYSTEM_BUCKETS_DIR / bucket_name / self.BUCKET_VERSIONS_DIR
         if not versions_root.exists():
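The phantom-metadata heal above batches removals with `heal_by_index` so each `_index.json` is read and rewritten once per scan, not once per stale key. The grouping step in isolation (`plan_phantom_heals` and the string stand-ins for paths are illustrative):

```python
from collections import defaultdict

def plan_phantom_heals(all_keys, existing_files):
    """Group stale index entries by the index file that owns them, so
    each _index.json gets a single rewrite. `all_keys` maps full object
    key -> {"index_file": ..., "key_name": ...}; `existing_files` is the
    set of keys that actually exist on disk."""
    heal_by_index = defaultdict(list)
    for full_key, info in all_keys.items():
        if full_key not in existing_files:
            heal_by_index[info["index_file"]].append(info["key_name"])
    return dict(heal_by_index)
```

With N stale entries spread over M index files this turns N read-modify-write cycles into M, which matters when the healer runs inside an IO-throttled scan.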
@@ -503,6 +732,10 @@ class IntegrityChecker:
try: try:
for key_dir in versions_root.rglob("*"): for key_dir in versions_root.rglob("*"):
if self._throttle():
return
if self._batch_exhausted(result):
return
if not key_dir.is_dir(): if not key_dir.is_dir():
continue continue
@@ -510,6 +743,9 @@ class IntegrityChecker:
json_files = {f.stem: f for f in key_dir.glob("*.json")}
for stem, bin_file in bin_files.items():
+if self._batch_exhausted(result):
+return
+result.objects_scanned += 1
if stem not in json_files:
result.stale_versions += 1
issue = IntegrityIssue(
@@ -529,6 +765,9 @@ class IntegrityChecker:
self._add_issue(result, issue)
for stem, json_file in json_files.items():
+if self._batch_exhausted(result):
+return
+result.objects_scanned += 1
if stem not in bin_files:
result.stale_versions += 1
issue = IntegrityIssue(
@@ -552,6 +791,8 @@ class IntegrityChecker:
def _check_etag_cache(
self, bucket_name: str, result: IntegrityResult, auto_heal: bool, dry_run: bool
) -> None:
+if self._batch_exhausted(result):
+return
etag_index_path = self._system_path() / self.SYSTEM_BUCKETS_DIR / bucket_name / "etag_index.json"
if not etag_index_path.exists():
@@ -569,6 +810,9 @@ class IntegrityChecker:
found_mismatch = False
for full_key, cached_etag in etag_cache.items():
+if self._batch_exhausted(result):
+break
+result.objects_scanned += 1
key_path = Path(full_key)
key_name = key_path.name
parent = key_path.parent
@@ -618,6 +862,8 @@ class IntegrityChecker:
def _check_legacy_metadata(
self, bucket_name: str, result: IntegrityResult, auto_heal: bool, dry_run: bool
) -> None:
+if self._batch_exhausted(result):
+return
legacy_meta_root = self.storage_root / bucket_name / ".meta"
if not legacy_meta_root.exists():
return
@@ -626,9 +872,14 @@ class IntegrityChecker:
try:
for meta_file in legacy_meta_root.rglob("*.meta.json"):
+if self._throttle():
+return
+if self._batch_exhausted(result):
+return
if not meta_file.is_file():
continue
+result.objects_scanned += 1
try:
rel = meta_file.relative_to(legacy_meta_root)
except ValueError:
@@ -728,11 +979,17 @@ class IntegrityChecker:
return [r.to_dict() for r in records]
def get_status(self) -> dict:
-return {
+status: Dict[str, Any] = {
"enabled": not self._shutdown or self._timer is not None,
"running": self._timer is not None and not self._shutdown,
+"scanning": self._scanning,
"interval_hours": self.interval_seconds / 3600.0,
"batch_size": self.batch_size,
"auto_heal": self.auto_heal,
"dry_run": self.dry_run,
+"io_throttle_ms": round(self._io_throttle * 1000),
}
+if self._scanning and self._scan_start_time is not None:
+status["scan_elapsed_seconds"] = round(time.time() - self._scan_start_time, 1)
+status["cursor"] = self.cursor_store.get_info()
+return status
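The batch-exhausted checks and cursor bookkeeping added above implement a resumable scan: each run processes at most `batch_size` entries and records where to resume. A minimal standalone sketch of the idea (function and parameter names here are illustrative, not the project's API):

```python
def scan_in_batches(items, batch_size, start_cursor=0):
    """Process up to batch_size items starting at start_cursor.

    Returns (batch, next_cursor). next_cursor is None once the whole
    sequence has been covered, so the next cycle restarts from zero,
    which is the "progressive full coverage" behaviour a size-limited
    scanner aims for.
    """
    batch = items[start_cursor:start_cursor + batch_size]
    next_cursor = start_cursor + len(batch)
    if next_cursor >= len(items):
        next_cursor = None  # full coverage reached
    return batch, next_cursor
```

Persisting `next_cursor` between runs is what lets a size-limited scan still reach every object eventually.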

View File

@@ -19,6 +19,10 @@ from defusedxml.ElementTree import fromstring
try:
import myfsio_core as _rc
+if not all(hasattr(_rc, f) for f in (
+"verify_sigv4_signature", "derive_signing_key", "clear_signing_key_cache",
+)):
+raise ImportError("myfsio_core is outdated, rebuild with: cd myfsio_core && maturin develop --release")
_HAS_RUST = True
except ImportError:
_rc = None
@@ -201,6 +205,11 @@ _SIGNING_KEY_CACHE_LOCK = threading.Lock()
_SIGNING_KEY_CACHE_TTL = 60.0
_SIGNING_KEY_CACHE_MAX_SIZE = 256
+_SIGV4_HEADER_RE = re.compile(
+r"AWS4-HMAC-SHA256 Credential=([^/]+)/([^/]+)/([^/]+)/([^/]+)/aws4_request, SignedHeaders=([^,]+), Signature=(.+)"
+)
+_SIGV4_REQUIRED_HEADERS = frozenset({'host', 'x-amz-date'})
def clear_signing_key_cache() -> None:
if _HAS_RUST:
@@ -259,10 +268,7 @@ def _get_canonical_uri(req: Any) -> str:
def _verify_sigv4_header(req: Any, auth_header: str) -> Principal | None:
-match = re.match(
-r"AWS4-HMAC-SHA256 Credential=([^/]+)/([^/]+)/([^/]+)/([^/]+)/aws4_request, SignedHeaders=([^,]+), Signature=(.+)",
-auth_header,
-)
+match = _SIGV4_HEADER_RE.match(auth_header)
if not match:
return None
@@ -286,14 +292,9 @@ def _verify_sigv4_header(req: Any, auth_header: str) -> Principal | None:
if time_diff > tolerance:
raise IamError("Request timestamp too old or too far in the future")
-required_headers = {'host', 'x-amz-date'}
signed_headers_set = set(signed_headers_str.split(';'))
-if not required_headers.issubset(signed_headers_set):
-if 'date' in signed_headers_set:
-required_headers.remove('x-amz-date')
-required_headers.add('date')
-if not required_headers.issubset(signed_headers_set):
+if not _SIGV4_REQUIRED_HEADERS.issubset(signed_headers_set):
+if not ({'host', 'date'}.issubset(signed_headers_set)):
raise IamError("Required headers not signed")
canonical_uri = _get_canonical_uri(req)
@@ -533,21 +534,6 @@ def _authorize_action(principal: Principal | None, bucket_name: str | None, acti
raise iam_error or IamError("Access denied")
-def _enforce_bucket_policy(principal: Principal | None, bucket_name: str | None, object_key: str | None, action: str) -> None:
-if not bucket_name:
-return
-policy_context = _build_policy_context()
-decision = _bucket_policies().evaluate(
-principal.access_key if principal else None,
-bucket_name,
-object_key,
-action,
-policy_context,
-)
-if decision == "deny":
-raise IamError("Access denied by bucket policy")
def _object_principal(action: str, bucket_name: str, object_key: str):
principal, error = _require_principal()
try:
@@ -556,121 +542,7 @@ def _object_principal(action: str, bucket_name: str, object_key: str):
except IamError as exc:
if not error:
return None, _error_response("AccessDenied", str(exc), 403)
-if not _has_presign_params():
return None, error
-try:
-principal = _validate_presigned_request(action, bucket_name, object_key)
-_enforce_bucket_policy(principal, bucket_name, object_key, action)
-return principal, None
-except IamError as exc:
-return None, _error_response("AccessDenied", str(exc), 403)
-def _has_presign_params() -> bool:
-return bool(request.args.get("X-Amz-Algorithm"))
-def _validate_presigned_request(action: str, bucket_name: str, object_key: str) -> Principal:
-algorithm = request.args.get("X-Amz-Algorithm")
-credential = request.args.get("X-Amz-Credential")
-amz_date = request.args.get("X-Amz-Date")
-signed_headers = request.args.get("X-Amz-SignedHeaders")
-expires = request.args.get("X-Amz-Expires")
-signature = request.args.get("X-Amz-Signature")
-if not all([algorithm, credential, amz_date, signed_headers, expires, signature]):
-raise IamError("Malformed presigned URL")
-if algorithm != "AWS4-HMAC-SHA256":
-raise IamError("Unsupported signing algorithm")
-parts = credential.split("/")
-if len(parts) != 5:
-raise IamError("Invalid credential scope")
-access_key, date_stamp, region, service, terminal = parts
-if terminal != "aws4_request":
-raise IamError("Invalid credential scope")
-config_region = current_app.config["AWS_REGION"]
-config_service = current_app.config["AWS_SERVICE"]
-if region != config_region or service != config_service:
-raise IamError("Credential scope mismatch")
-try:
-expiry = int(expires)
-except ValueError as exc:
-raise IamError("Invalid expiration") from exc
-min_expiry = current_app.config.get("PRESIGNED_URL_MIN_EXPIRY_SECONDS", 1)
-max_expiry = current_app.config.get("PRESIGNED_URL_MAX_EXPIRY_SECONDS", 604800)
-if expiry < min_expiry or expiry > max_expiry:
-raise IamError(f"Expiration must be between {min_expiry} second(s) and {max_expiry} seconds")
-try:
-request_time = datetime.strptime(amz_date, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
-except ValueError as exc:
-raise IamError("Invalid X-Amz-Date") from exc
-now = datetime.now(timezone.utc)
-tolerance = timedelta(seconds=current_app.config.get("SIGV4_TIMESTAMP_TOLERANCE_SECONDS", 900))
-if request_time > now + tolerance:
-raise IamError("Request date is too far in the future")
-if now > request_time + timedelta(seconds=expiry):
-raise IamError("Presigned URL expired")
-signed_headers_list = [header.strip().lower() for header in signed_headers.split(";") if header]
-signed_headers_list.sort()
-canonical_headers = _canonical_headers_from_request(signed_headers_list)
-canonical_query = _canonical_query_from_request()
-payload_hash = request.args.get("X-Amz-Content-Sha256", "UNSIGNED-PAYLOAD")
-canonical_request = "\n".join(
-[
-request.method,
-_canonical_uri(bucket_name, object_key),
-canonical_query,
-canonical_headers,
-";".join(signed_headers_list),
-payload_hash,
-]
-)
-hashed_request = hashlib.sha256(canonical_request.encode()).hexdigest()
-scope = f"{date_stamp}/{region}/{service}/aws4_request"
-string_to_sign = "\n".join([
-"AWS4-HMAC-SHA256",
-amz_date,
-scope,
-hashed_request,
-])
-secret = _iam().secret_for_key(access_key)
-signing_key = _derive_signing_key(secret, date_stamp, region, service)
-expected = hmac.new(signing_key, string_to_sign.encode(), hashlib.sha256).hexdigest()
-if not hmac.compare_digest(expected, signature):
-raise IamError("Signature mismatch")
-return _iam().principal_for_key(access_key)
-def _canonical_query_from_request() -> str:
-parts = []
-for key in sorted(request.args.keys()):
-if key == "X-Amz-Signature":
-continue
-values = request.args.getlist(key)
-encoded_key = quote(str(key), safe="-_.~")
-for value in sorted(values):
-encoded_value = quote(str(value), safe="-_.~")
-parts.append(f"{encoded_key}={encoded_value}")
-return "&".join(parts)
-def _canonical_headers_from_request(headers: list[str]) -> str:
-lines = []
-for header in headers:
-if header == "host":
-api_base = current_app.config.get("API_BASE_URL")
-if api_base:
-value = urlparse(api_base).netloc
-else:
-value = request.host
-else:
-value = request.headers.get(header, "")
-canonical_value = " ".join(value.strip().split()) if value else ""
-lines.append(f"{header}:{canonical_value}")
-return "\n".join(lines) + "\n"
def _canonical_uri(bucket_name: str, object_key: str | None) -> str:
@@ -736,8 +608,8 @@ def _generate_presigned_url(
host = parsed.netloc
scheme = parsed.scheme
else:
-host = request.headers.get("X-Forwarded-Host", request.host)
-scheme = request.headers.get("X-Forwarded-Proto", request.scheme or "http")
+host = request.host
+scheme = request.scheme or "http"
canonical_headers = f"host:{host}\n"
canonical_request = "\n".join(
@@ -1010,7 +882,7 @@ def _render_encryption_document(config: dict[str, Any]) -> Element:
return root
-def _stream_file(path, chunk_size: int = 256 * 1024):
+def _stream_file(path, chunk_size: int = 1024 * 1024):
with path.open("rb") as handle:
while True:
chunk = handle.read(chunk_size)
@@ -2961,9 +2833,12 @@ def object_handler(bucket_name: str, object_key: str):
is_encrypted = "x-amz-server-side-encryption" in metadata
cond_etag = metadata.get("__etag__")
+_etag_was_healed = False
if not cond_etag and not is_encrypted:
try:
cond_etag = storage._compute_etag(path)
+_etag_was_healed = True
+storage.heal_missing_etag(bucket_name, object_key, cond_etag)
except OSError:
cond_etag = None
if cond_etag:
@@ -3009,7 +2884,7 @@ def object_handler(bucket_name: str, object_key: str):
try:
stat = path.stat()
file_size = stat.st_size
-etag = metadata.get("__etag__") or storage._compute_etag(path)
+etag = cond_etag or storage._compute_etag(path)
except PermissionError:
return _error_response("AccessDenied", "Permission denied accessing object", 403)
except OSError as exc:
@@ -3057,7 +2932,7 @@ def object_handler(bucket_name: str, object_key: str):
try:
stat = path.stat()
response = Response(status=200)
-etag = metadata.get("__etag__") or storage._compute_etag(path)
+etag = cond_etag or storage._compute_etag(path)
except PermissionError:
return _error_response("AccessDenied", "Permission denied accessing object", 403)
except OSError as exc:
@@ -3442,9 +3317,13 @@ def head_object(bucket_name: str, object_key: str) -> Response:
return error
try:
_authorize_action(principal, bucket_name, "read", object_key=object_key)
-path = _storage().get_object_path(bucket_name, object_key)
-metadata = _storage().get_object_metadata(bucket_name, object_key)
-etag = metadata.get("__etag__") or _storage()._compute_etag(path)
+storage = _storage()
+path = storage.get_object_path(bucket_name, object_key)
+metadata = storage.get_object_metadata(bucket_name, object_key)
+etag = metadata.get("__etag__")
+if not etag:
+etag = storage._compute_etag(path)
+storage.heal_missing_etag(bucket_name, object_key, etag)
head_mtime = float(metadata["__last_modified__"]) if "__last_modified__" in metadata else None
if head_mtime is None:

View File

@@ -2,6 +2,7 @@ from __future__ import annotations
import hashlib
import json
+import logging
import os
import re
import shutil
@@ -20,12 +21,21 @@ from typing import Any, BinaryIO, Dict, Generator, List, Optional
try:
import myfsio_core as _rc
+if not all(hasattr(_rc, f) for f in (
+"validate_bucket_name", "validate_object_key", "md5_file",
+"shallow_scan", "bucket_stats_scan", "search_objects_scan",
+"stream_to_file_with_md5", "assemble_parts_with_md5",
+"build_object_cache", "read_index_entry", "write_index_entry",
+"delete_index_entry", "check_bucket_contents",
+)):
+raise ImportError("myfsio_core is outdated, rebuild with: cd myfsio_core && maturin develop --release")
_HAS_RUST = True
except ImportError:
_rc = None
_HAS_RUST = False
-# Platform-specific file locking
+logger = logging.getLogger(__name__)
if os.name == "nt":
import msvcrt
@@ -190,6 +200,7 @@ class ObjectStorage:
object_cache_max_size: int = 100,
bucket_config_cache_ttl: float = 30.0,
object_key_max_length_bytes: int = 1024,
+meta_read_cache_max: int = 2048,
) -> None:
self.root = Path(root)
self.root.mkdir(parents=True, exist_ok=True)
@@ -208,7 +219,7 @@ class ObjectStorage:
self._sorted_key_cache: Dict[str, tuple[list[str], int]] = {}
self._meta_index_locks: Dict[str, threading.Lock] = {}
self._meta_read_cache: OrderedDict[tuple, Optional[Dict[str, Any]]] = OrderedDict()
-self._meta_read_cache_max = 2048
+self._meta_read_cache_max = meta_read_cache_max
self._cleanup_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="ParentCleanup")
self._stats_mem: Dict[str, Dict[str, int]] = {}
self._stats_serial: Dict[str, int] = {}
@@ -218,6 +229,7 @@ class ObjectStorage:
self._stats_flush_timer: Optional[threading.Timer] = None
self._etag_index_dirty: set[str] = set()
self._etag_index_flush_timer: Optional[threading.Timer] = None
+self._etag_index_mem: Dict[str, tuple[Dict[str, str], float]] = {}
def _get_bucket_lock(self, bucket_id: str) -> threading.Lock:
with self._registry_lock:
@@ -427,7 +439,7 @@ class ObjectStorage:
cache_path = self._system_bucket_root(bucket_id) / "stats.json"
try:
cache_path.parent.mkdir(parents=True, exist_ok=True)
-self._atomic_write_json(cache_path, data)
+self._atomic_write_json(cache_path, data, sync=False)
except OSError:
pass
@@ -602,14 +614,7 @@ class ObjectStorage:
is_truncated=False, next_continuation_token=None,
)
-etag_index_path = self._system_bucket_root(bucket_id) / "etag_index.json"
-meta_cache: Dict[str, str] = {}
-if etag_index_path.exists():
-try:
-with open(etag_index_path, 'r', encoding='utf-8') as f:
-meta_cache = json.load(f)
-except (OSError, json.JSONDecodeError):
-pass
+meta_cache: Dict[str, str] = self._get_etag_index(bucket_id)
entries_files: list[tuple[str, int, float, Optional[str]]] = []
entries_dirs: list[str] = []
@@ -1079,6 +1084,30 @@ class ObjectStorage:
safe_key = self._sanitize_object_key(object_key, self._object_key_max_length_bytes)
return self._read_metadata(bucket_path.name, safe_key) or {}
+def heal_missing_etag(self, bucket_name: str, object_key: str, etag: str) -> None:
+"""Persist a computed ETag back to metadata (self-heal on read)."""
+try:
+bucket_path = self._bucket_path(bucket_name)
+if not bucket_path.exists():
+return
+bucket_id = bucket_path.name
+safe_key = self._sanitize_object_key(object_key, self._object_key_max_length_bytes)
+existing = self._read_metadata(bucket_id, safe_key) or {}
+if existing.get("__etag__"):
+return
+existing["__etag__"] = etag
+self._write_metadata(bucket_id, safe_key, existing)
+with self._obj_cache_lock:
+cached = self._object_cache.get(bucket_id)
+if cached:
+obj = cached[0].get(safe_key.as_posix())
+if obj and not obj.etag:
+obj.etag = etag
+self._etag_index_dirty.add(bucket_id)
+self._schedule_etag_index_flush()
+except Exception:
+logger.warning("Failed to heal missing ETag for %s/%s", bucket_name, object_key)
def _cleanup_empty_parents(self, path: Path, stop_at: Path) -> None: def _cleanup_empty_parents(self, path: Path, stop_at: Path) -> None:
"""Remove empty parent directories in a background thread. """Remove empty parent directories in a background thread.
@@ -2088,6 +2117,7 @@ class ObjectStorage:
etag_index_path.parent.mkdir(parents=True, exist_ok=True)
with open(etag_index_path, 'w', encoding='utf-8') as f:
json.dump(raw["etag_cache"], f)
+self._etag_index_mem[bucket_id] = (dict(raw["etag_cache"]), etag_index_path.stat().st_mtime)
except OSError:
pass
for key, size, mtime, etag in raw["objects"]:
@@ -2211,6 +2241,7 @@ class ObjectStorage:
etag_index_path.parent.mkdir(parents=True, exist_ok=True)
with open(etag_index_path, 'w', encoding='utf-8') as f:
json.dump(meta_cache, f)
+self._etag_index_mem[bucket_id] = (dict(meta_cache), etag_index_path.stat().st_mtime)
except OSError:
pass
@@ -2324,6 +2355,25 @@ class ObjectStorage:
self._etag_index_dirty.add(bucket_id)
self._schedule_etag_index_flush()
+def _get_etag_index(self, bucket_id: str) -> Dict[str, str]:
+etag_index_path = self._system_bucket_root(bucket_id) / "etag_index.json"
+try:
+current_mtime = etag_index_path.stat().st_mtime
+except OSError:
+return {}
+cached = self._etag_index_mem.get(bucket_id)
+if cached:
+cache_dict, cached_mtime = cached
+if current_mtime == cached_mtime:
+return cache_dict
+try:
+with open(etag_index_path, 'r', encoding='utf-8') as f:
+data = json.load(f)
+self._etag_index_mem[bucket_id] = (data, current_mtime)
+return data
+except (OSError, json.JSONDecodeError):
+return {}
def _schedule_etag_index_flush(self) -> None:
if self._etag_index_flush_timer is None or not self._etag_index_flush_timer.is_alive():
self._etag_index_flush_timer = threading.Timer(5.0, self._flush_etag_indexes)
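The `_etag_index_mem` cache introduced above only trusts the in-memory copy while the index file's mtime is unchanged, so external rewrites invalidate it for free. A standalone sketch of the same pattern (the class name is illustrative):

```python
import json
import os

class MtimeValidatedCache:
    """Cache parsed JSON per key, invalidated when the file's mtime changes."""

    def __init__(self):
        self._mem = {}  # key -> (parsed_data, mtime)

    def get(self, key, path):
        try:
            mtime = os.stat(path).st_mtime
        except OSError:
            return {}  # missing file behaves like an empty index
        cached = self._mem.get(key)
        if cached is not None and cached[1] == mtime:
            return cached[0]  # still fresh; skip re-reading the file
        try:
            with open(path, encoding="utf-8") as f:
                data = json.load(f)
        except (OSError, json.JSONDecodeError):
            return {}
        self._mem[key] = (data, mtime)
        return data
```

One caveat of mtime validation: filesystems with coarse timestamp resolution can miss a rewrite that lands within the same tick.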
@@ -2342,11 +2392,10 @@ class ObjectStorage:
index = {k: v.etag for k, v in objects.items() if v.etag}
etag_index_path = self._system_bucket_root(bucket_id) / "etag_index.json"
try:
-etag_index_path.parent.mkdir(parents=True, exist_ok=True)
-with open(etag_index_path, 'w', encoding='utf-8') as f:
-json.dump(index, f)
+self._atomic_write_json(etag_index_path, index, sync=False)
+self._etag_index_mem[bucket_id] = (index, etag_index_path.stat().st_mtime)
except OSError:
-pass
+logger.warning("Failed to flush etag index for bucket %s", bucket_id)
def warm_cache(self, bucket_names: Optional[List[str]] = None) -> None:
"""Pre-warm the object cache for specified buckets or all buckets.
@@ -2388,12 +2437,13 @@ class ObjectStorage:
path.mkdir(parents=True, exist_ok=True)
@staticmethod
-def _atomic_write_json(path: Path, data: Any) -> None:
+def _atomic_write_json(path: Path, data: Any, *, sync: bool = True) -> None:
path.parent.mkdir(parents=True, exist_ok=True)
tmp_path = path.with_suffix(".tmp")
try:
with tmp_path.open("w", encoding="utf-8") as f:
json.dump(data, f)
+if sync:
f.flush()
os.fsync(f.fileno())
tmp_path.replace(path)
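The `sync` keyword added to `_atomic_write_json` keeps the temp-file-plus-rename atomicity but lets rebuildable caches skip the `fsync` cost. A standalone sketch of that helper outside the class:

```python
import json
import os
from pathlib import Path

def atomic_write_json(path: Path, data, *, sync: bool = True) -> None:
    """Write JSON to a temp file, then rename over the target.

    The rename is atomic on POSIX, so readers never observe a
    half-written file. sync=False skips fsync for data that can be
    rebuilt (caches), trading crash durability for write latency.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = path.with_suffix(".tmp")
    with tmp_path.open("w", encoding="utf-8") as f:
        json.dump(data, f)
        if sync:
            f.flush()
            os.fsync(f.fileno())
    tmp_path.replace(path)
```

Note that `with_suffix(".tmp")` replaces the final suffix (`stats.json` becomes `stats.tmp`), matching the diff's temp-file naming.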

View File

@@ -225,10 +225,10 @@ def _policy_allows_public_read(policy: dict[str, Any]) -> bool:
def _bucket_access_descriptor(policy: dict[str, Any] | None) -> tuple[str, str]:
if not policy:
-return ("IAM only", "text-bg-secondary")
+return ("IAM only", "bg-secondary-subtle text-secondary-emphasis")
if _policy_allows_public_read(policy):
-return ("Public read", "text-bg-warning")
+return ("Public read", "bg-warning-subtle text-warning-emphasis")
-return ("Custom policy", "text-bg-info")
+return ("Custom policy", "bg-info-subtle text-info-emphasis")
def _current_principal():
@@ -1063,6 +1063,27 @@ def bulk_delete_objects(bucket_name: str):
return _respond(False, f"A maximum of {MAX_KEYS} objects can be deleted per request", status_code=400)
unique_keys = list(dict.fromkeys(cleaned))
+folder_prefixes = [k for k in unique_keys if k.endswith("/")]
+if folder_prefixes:
+try:
+client = get_session_s3_client()
+for prefix in folder_prefixes:
+unique_keys.remove(prefix)
+paginator = client.get_paginator("list_objects_v2")
+for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
+for obj in page.get("Contents", []):
+if obj["Key"] not in unique_keys:
+unique_keys.append(obj["Key"])
+except (ClientError, EndpointConnectionError, ConnectionClosedError) as exc:
+if isinstance(exc, ClientError):
+err, status = handle_client_error(exc)
+return _respond(False, err["error"], status_code=status)
+return _respond(False, "S3 API server is unreachable", status_code=502)
+if not unique_keys:
+return _respond(False, "No objects found under the selected folders", status_code=400)
try:
_authorize_ui(principal, bucket_name, "delete")
except IamError as exc:
@@ -1093,13 +1114,17 @@ def bulk_delete_objects(bucket_name: str):
else:
try:
client = get_session_s3_client()
-objects_to_delete = [{"Key": k} for k in unique_keys]
+deleted = []
+errors = []
+for i in range(0, len(unique_keys), 1000):
+batch = unique_keys[i:i + 1000]
+objects_to_delete = [{"Key": k} for k in batch]
resp = client.delete_objects(
Bucket=bucket_name,
Delete={"Objects": objects_to_delete, "Quiet": False},
)
-deleted = [d["Key"] for d in resp.get("Deleted", [])]
-errors = [{"key": e["Key"], "error": e.get("Message", e.get("Code", "Unknown error"))} for e in resp.get("Errors", [])]
+deleted.extend(d["Key"] for d in resp.get("Deleted", []))
+errors.extend({"key": e["Key"], "error": e.get("Message", e.get("Code", "Unknown error"))} for e in resp.get("Errors", []))
for key in deleted:
_replication_manager().trigger_replication(bucket_name, key, action="delete")
except (ClientError, EndpointConnectionError, ConnectionClosedError) as exc:
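Batching the deletions, as the hunk above does, is required because the S3 DeleteObjects API accepts at most 1000 keys per request. The slicing can be factored into a small helper (the name is illustrative):

```python
def chunked(keys, size=1000):
    """Yield consecutive slices of at most `size` keys.

    1000 is the S3 DeleteObjects per-request limit.
    """
    for i in range(0, len(keys), size):
        yield keys[i:i + size]
```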
@@ -4126,7 +4151,7 @@ def system_dashboard():
r = rec.get("result", {})
total_freed = r.get("temp_bytes_freed", 0) + r.get("multipart_bytes_freed", 0) + r.get("orphaned_version_bytes_freed", 0)
rec["bytes_freed_display"] = _format_bytes(total_freed)
-rec["timestamp_display"] = datetime.fromtimestamp(rec["timestamp"], tz=dt_timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
+rec["timestamp_display"] = _format_datetime_display(datetime.fromtimestamp(rec["timestamp"], tz=dt_timezone.utc))
gc_history_records.append(rec)
checker = current_app.extensions.get("integrity")
@@ -4135,7 +4160,7 @@ def system_dashboard():
if checker:
raw = checker.get_history(limit=10, offset=0)
for rec in raw:
-rec["timestamp_display"] = datetime.fromtimestamp(rec["timestamp"], tz=dt_timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
+rec["timestamp_display"] = _format_datetime_display(datetime.fromtimestamp(rec["timestamp"], tz=dt_timezone.utc))
integrity_history_records.append(rec)
features = [ features = [
@@ -4163,6 +4188,7 @@ def system_dashboard():
gc_history=gc_history_records,
integrity_status=integrity_status,
integrity_history=integrity_history_records,
+display_timezone=current_app.config.get("DISPLAY_TIMEZONE", "UTC"),
)
@@ -4179,14 +4205,43 @@ def system_gc_run():
return jsonify({"error": "GC is not enabled"}), 400
payload = request.get_json(silent=True) or {}
-original_dry_run = gc.dry_run
-if "dry_run" in payload:
-gc.dry_run = bool(payload["dry_run"])
-try:
-result = gc.run_now()
-finally:
-gc.dry_run = original_dry_run
-return jsonify(result.to_dict())
+started = gc.run_async(dry_run=payload.get("dry_run"))
+if not started:
+return jsonify({"error": "GC is already in progress"}), 409
+return jsonify({"status": "started"})
+@ui_bp.get("/system/gc/status")
+def system_gc_status():
+principal = _current_principal()
+try:
+_iam().authorize(principal, None, "iam:*")
+except IamError:
+return jsonify({"error": "Access denied"}), 403
+gc = current_app.extensions.get("gc")
+if not gc:
+return jsonify({"error": "GC is not enabled"}), 400
+return jsonify(gc.get_status())
+@ui_bp.get("/system/gc/history")
+def system_gc_history():
+principal = _current_principal()
+try:
+_iam().authorize(principal, None, "iam:*")
+except IamError:
+return jsonify({"error": "Access denied"}), 403
+gc = current_app.extensions.get("gc")
+if not gc:
+return jsonify({"executions": []})
+limit = min(int(request.args.get("limit", 10)), 200)
+offset = int(request.args.get("offset", 0))
+records = gc.get_history(limit=limit, offset=offset)
+return jsonify({"executions": records})
@ui_bp.post("/system/integrity/run") @ui_bp.post("/system/integrity/run")
@@ -4202,11 +4257,46 @@ def system_integrity_run():
return jsonify({"error": "Integrity checker is not enabled"}), 400
payload = request.get_json(silent=True) or {}
-result = checker.run_now(
+started = checker.run_async(
auto_heal=payload.get("auto_heal"),
dry_run=payload.get("dry_run"),
)
-return jsonify(result.to_dict())
+if not started:
+return jsonify({"error": "A scan is already in progress"}), 409
+return jsonify({"status": "started"})
+@ui_bp.get("/system/integrity/status")
+def system_integrity_status():
+principal = _current_principal()
+try:
+_iam().authorize(principal, None, "iam:*")
+except IamError:
+return jsonify({"error": "Access denied"}), 403
+checker = current_app.extensions.get("integrity")
+if not checker:
+return jsonify({"error": "Integrity checker is not enabled"}), 400
+return jsonify(checker.get_status())
+@ui_bp.get("/system/integrity/history")
+def system_integrity_history():
+principal = _current_principal()
+try:
+_iam().authorize(principal, None, "iam:*")
+except IamError:
+return jsonify({"error": "Access denied"}), 403
+checker = current_app.extensions.get("integrity")
+if not checker:
+return jsonify({"executions": []})
+limit = min(int(request.args.get("limit", 10)), 200)
+offset = int(request.args.get("offset", 0))
+records = checker.get_history(limit=limit, offset=offset)
+return jsonify({"executions": records})
@ui_bp.app_errorhandler(404) @ui_bp.app_errorhandler(404)


@@ -1,6 +1,6 @@
 from __future__ import annotations
-APP_VERSION = "0.4.0"
+APP_VERSION = "0.4.3"
 def get_version() -> str:


@@ -0,0 +1,4 @@
+#!/bin/sh
+set -e
+exec python run.py --prod


@@ -125,7 +125,7 @@ pub fn delete_index_entry(py: Python<'_>, path: &str, entry_name: &str) -> PyRes
         fs::write(&path_owned, serialized)
             .map_err(|e| PyIOError::new_err(format!("Failed to write index: {}", e)))?;
-        Ok(false)
+        Ok(true)
     })
 }


@@ -1,4 +1,4 @@
-Flask>=3.1.2
+Flask>=3.1.3
 Flask-Limiter>=4.1.1
 Flask-Cors>=6.0.2
 Flask-WTF>=1.2.2
@@ -6,8 +6,8 @@ python-dotenv>=1.2.1
 pytest>=9.0.2
 requests>=2.32.5
 boto3>=1.42.14
-granian>=2.2.0
-psutil>=7.1.3
-cryptography>=46.0.3
+granian>=2.7.2
+psutil>=7.2.2
+cryptography>=46.0.5
 defusedxml>=0.7.1
-duckdb>=1.4.4
+duckdb>=1.5.1


@@ -26,6 +26,7 @@ from typing import Optional
 from app import create_api_app, create_ui_app
 from app.config import AppConfig
 from app.iam import IamService, IamError, ALLOWED_ACTIONS, _derive_fernet_key
+from app.version import get_version
 
 def _server_host() -> str:
@@ -229,6 +230,7 @@ if __name__ == "__main__":
     parser.add_argument("--check-config", action="store_true", help="Validate configuration and exit")
     parser.add_argument("--show-config", action="store_true", help="Show configuration summary and exit")
     parser.add_argument("--reset-cred", action="store_true", help="Reset admin credentials and exit")
+    parser.add_argument("--version", action="version", version=f"MyFSIO {get_version()}")
     args = parser.parse_args()
     if args.reset_cred or args.mode == "reset-cred":
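The new `--version` flag relies on argparse's built-in `version` action, which prints the supplied string and exits with status 0. A self-contained sketch of the behavior; the parser and version constant here are illustrative stand-ins for `run.py` and `app.version`:

```python
import argparse

APP_VERSION = "0.4.3"

def get_version() -> str:
    return APP_VERSION

parser = argparse.ArgumentParser(prog="run.py")
# action="version" prints the version string to stdout and raises
# SystemExit(0), so no other arguments are processed afterwards.
parser.add_argument("--version", action="version",
                    version=f"MyFSIO {get_version()}")
```

Invoking `python run.py --version` would then print `MyFSIO 0.4.3` and exit before any server startup code runs.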


@@ -2655,7 +2655,7 @@ pre code {
 }
 .objects-table-container {
-  max-height: none;
+  max-height: 60vh;
 }
 .preview-card {


[binary image updated: 200 KiB]


[binary image updated: 872 KiB]


@@ -98,6 +98,9 @@
 const previewMetadata = document.getElementById('preview-metadata');
 const previewMetadataList = document.getElementById('preview-metadata-list');
 const previewPlaceholder = document.getElementById('preview-placeholder');
+const previewPlaceholderDefault = previewPlaceholder ? previewPlaceholder.innerHTML : '';
+const previewErrorAlert = document.getElementById('preview-error-alert');
+const previewDetailsMeta = document.getElementById('preview-details-meta');
 const previewImage = document.getElementById('preview-image');
 const previewVideo = document.getElementById('preview-video');
 const previewAudio = document.getElementById('preview-audio');
@@ -242,12 +245,12 @@
   </svg>
 </a>
 <div class="dropdown d-inline-block">
-  <button class="btn btn-outline-secondary btn-icon dropdown-toggle" type="button" data-bs-toggle="dropdown" data-bs-auto-close="true" aria-expanded="false" title="More actions">
+  <button class="btn btn-outline-secondary btn-icon dropdown-toggle" type="button" data-bs-toggle="dropdown" data-bs-auto-close="true" data-bs-config='{"popperConfig":{"strategy":"fixed"}}' aria-expanded="false" title="More actions">
     <svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" fill="currentColor" viewBox="0 0 16 16">
       <path d="M9.5 13a1.5 1.5 0 1 1-3 0 1.5 1.5 0 0 1 3 0zm0-5a1.5 1.5 0 1 1-3 0 1.5 1.5 0 0 1 3 0zm0-5a1.5 1.5 0 1 1-3 0 1.5 1.5 0 0 1 3 0z"/>
     </svg>
   </button>
-  <ul class="dropdown-menu dropdown-menu-end" style="position: fixed;">
+  <ul class="dropdown-menu dropdown-menu-end">
     <li><button class="dropdown-item" type="button" onclick="openCopyMoveModal('copy', '${escapeHtml(obj.key)}')">
       <svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" fill="currentColor" class="me-2" viewBox="0 0 16 16"><path fill-rule="evenodd" d="M4 2a2 2 0 0 1 2-2h8a2 2 0 0 1 2 2v8a2 2 0 0 1-2 2H6a2 2 0 0 1-2-2V2Zm2-1a1 1 0 0 0-1 1v8a1 1 0 0 0 1 1h8a1 1 0 0 0 1-1V2a1 1 0 0 0-1-1H6ZM2 5a1 1 0 0 0-1 1v8a1 1 0 0 0 1 1h8a1 1 0 0 0 1-1v-1h1v1a2 2 0 0 1-2 2H2a2 2 0 0 1-2-2V6a2 2 0 0 1 2-2h1v1H2Z"/></svg>
       Copy
@@ -866,6 +869,11 @@
     const checkbox = row.querySelector('[data-folder-select]');
     checkbox?.addEventListener('change', (e) => {
       e.stopPropagation();
+      if (checkbox.checked) {
+        selectedRows.set(folderPath, { key: folderPath, isFolder: true });
+      } else {
+        selectedRows.delete(folderPath);
+      }
       const folderObjects = allObjects.filter(obj => obj.key.startsWith(folderPath));
       folderObjects.forEach(obj => {
         if (checkbox.checked) {
@@ -940,7 +948,7 @@
       const row = e.target.closest('[data-object-row]');
       if (!row) return;
-      if (e.target.closest('[data-delete-object]') || e.target.closest('[data-object-select]') || e.target.closest('a')) {
+      if (e.target.closest('[data-delete-object]') || e.target.closest('[data-object-select]') || e.target.closest('a') || e.target.closest('.dropdown')) {
         return;
       }
@@ -1350,8 +1358,11 @@
     }
     if (selectAllCheckbox) {
       const filesInView = visibleItems.filter(item => item.type === 'file');
-      const total = filesInView.length;
-      const visibleSelectedCount = filesInView.filter(item => selectedRows.has(item.data.key)).length;
+      const foldersInView = visibleItems.filter(item => item.type === 'folder');
+      const total = filesInView.length + foldersInView.length;
+      const fileSelectedCount = filesInView.filter(item => selectedRows.has(item.data.key)).length;
+      const folderSelectedCount = foldersInView.filter(item => selectedRows.has(item.path)).length;
+      const visibleSelectedCount = fileSelectedCount + folderSelectedCount;
       selectAllCheckbox.disabled = total === 0;
       selectAllCheckbox.checked = visibleSelectedCount > 0 && visibleSelectedCount === total && total > 0;
       selectAllCheckbox.indeterminate = visibleSelectedCount > 0 && visibleSelectedCount < total;
@@ -1373,8 +1384,12 @@
     const keys = Array.from(selectedRows.keys());
     bulkDeleteList.innerHTML = '';
     if (bulkDeleteCount) {
-      const label = keys.length === 1 ? 'object' : 'objects';
-      bulkDeleteCount.textContent = `${keys.length} ${label} selected`;
+      const folderCount = keys.filter(k => k.endsWith('/')).length;
+      const objectCount = keys.length - folderCount;
+      const parts = [];
+      if (folderCount) parts.push(`${folderCount} folder${folderCount !== 1 ? 's' : ''}`);
+      if (objectCount) parts.push(`${objectCount} object${objectCount !== 1 ? 's' : ''}`);
+      bulkDeleteCount.textContent = `${parts.join(' and ')} selected`;
     }
     if (!keys.length) {
       const empty = document.createElement('li');
@@ -1513,7 +1528,7 @@
     };
     const response = await fetch(endpoint, {
       method: 'POST',
-      headers: { 'Content-Type': 'application/json' },
+      headers: { 'Content-Type': 'application/json', 'X-CSRFToken': window.getCsrfToken ? window.getCsrfToken() : '' },
       body: JSON.stringify(payload),
     });
     const data = await response.json();
@@ -1957,6 +1972,10 @@
     [previewImage, previewVideo, previewAudio, previewIframe].forEach((el) => {
       if (!el) return;
       el.classList.add('d-none');
+      if (el.tagName === 'IMG') {
+        el.removeAttribute('src');
+        el.onload = null;
+      }
       if (el.tagName === 'VIDEO' || el.tagName === 'AUDIO') {
         el.pause();
         el.removeAttribute('src');
@@ -1969,9 +1988,38 @@
       previewText.classList.add('d-none');
       previewText.textContent = '';
     }
+    previewPlaceholder.innerHTML = previewPlaceholderDefault;
     previewPlaceholder.classList.remove('d-none');
   };
+  let previewFailed = false;
+  const handlePreviewError = () => {
+    previewFailed = true;
+    if (downloadButton) {
+      downloadButton.classList.add('disabled');
+      downloadButton.removeAttribute('href');
+    }
+    if (presignButton) presignButton.disabled = true;
+    if (generatePresignButton) generatePresignButton.disabled = true;
+    if (previewDetailsMeta) previewDetailsMeta.classList.add('d-none');
+    if (previewMetadata) previewMetadata.classList.add('d-none');
+    const tagsPanel = document.getElementById('preview-tags');
+    if (tagsPanel) tagsPanel.classList.add('d-none');
+    const versionPanel = document.getElementById('version-panel');
+    if (versionPanel) versionPanel.classList.add('d-none');
+    if (previewErrorAlert) {
+      previewErrorAlert.textContent = 'Unable to load object \u2014 it may have been deleted, or the server returned an error.';
+      previewErrorAlert.classList.remove('d-none');
+    }
+  };
+  const clearPreviewError = () => {
+    previewFailed = false;
+    if (previewErrorAlert) previewErrorAlert.classList.add('d-none');
+    if (previewDetailsMeta) previewDetailsMeta.classList.remove('d-none');
+  };
   async function fetchMetadata(metadataUrl) {
     if (!metadataUrl) return null;
     try {
async function fetchMetadata(metadataUrl) { async function fetchMetadata(metadataUrl) {
if (!metadataUrl) return null; if (!metadataUrl) return null;
try { try {
@@ -1993,6 +2041,7 @@
     previewPanel.classList.remove('d-none');
     activeRow = row;
     renderMetadata(null);
+    clearPreviewError();
     previewKey.textContent = row.dataset.key;
     previewSize.textContent = formatBytes(Number(row.dataset.size));
@@ -2016,18 +2065,71 @@
         const previewUrl = row.dataset.previewUrl;
         const lower = row.dataset.key.toLowerCase();
         if (previewUrl && lower.match(/\.(png|jpg|jpeg|gif|webp|svg|ico|bmp)$/)) {
-            previewImage.src = previewUrl;
-            previewImage.classList.remove('d-none');
-            previewPlaceholder.classList.add('d-none');
+            previewPlaceholder.innerHTML = '<div class="spinner-border spinner-border-sm text-secondary" role="status"></div><div class="small mt-2">Loading preview\u2026</div>';
+            const currentRow = row;
+            fetch(previewUrl)
+              .then((r) => {
+                if (activeRow !== currentRow) return;
+                if (!r.ok) {
+                  previewPlaceholder.innerHTML = '<div class="small text-muted">Failed to load preview</div>';
+                  handlePreviewError();
+                  return;
+                }
+                return r.blob();
+              })
+              .then((blob) => {
+                if (!blob || activeRow !== currentRow) return;
+                const url = URL.createObjectURL(blob);
+                previewImage.onload = () => {
+                  if (activeRow !== currentRow) { URL.revokeObjectURL(url); return; }
+                  previewImage.classList.remove('d-none');
+                  previewPlaceholder.classList.add('d-none');
+                };
+                previewImage.onerror = () => {
+                  if (activeRow !== currentRow) { URL.revokeObjectURL(url); return; }
+                  URL.revokeObjectURL(url);
+                  previewPlaceholder.innerHTML = '<div class="small text-muted">Failed to load preview</div>';
+                };
+                previewImage.src = url;
+              })
+              .catch(() => {
+                if (activeRow !== currentRow) return;
+                previewPlaceholder.innerHTML = '<div class="small text-muted">Failed to load preview</div>';
+                handlePreviewError();
+              });
         } else if (previewUrl && lower.match(/\.(mp4|webm|ogv|mov|avi|mkv)$/)) {
+            const currentRow = row;
+            previewVideo.onerror = () => {
+              if (activeRow !== currentRow) return;
+              previewVideo.classList.add('d-none');
+              previewPlaceholder.classList.remove('d-none');
+              previewPlaceholder.innerHTML = '<div class="small text-muted">Failed to load preview</div>';
+              handlePreviewError();
+            };
             previewVideo.src = previewUrl;
             previewVideo.classList.remove('d-none');
             previewPlaceholder.classList.add('d-none');
         } else if (previewUrl && lower.match(/\.(mp3|wav|flac|ogg|aac|m4a|wma)$/)) {
+            const currentRow = row;
+            previewAudio.onerror = () => {
+              if (activeRow !== currentRow) return;
+              previewAudio.classList.add('d-none');
+              previewPlaceholder.classList.remove('d-none');
+              previewPlaceholder.innerHTML = '<div class="small text-muted">Failed to load preview</div>';
+              handlePreviewError();
+            };
             previewAudio.src = previewUrl;
             previewAudio.classList.remove('d-none');
             previewPlaceholder.classList.add('d-none');
         } else if (previewUrl && lower.match(/\.(pdf)$/)) {
+            const currentRow = row;
+            previewIframe.onerror = () => {
+              if (activeRow !== currentRow) return;
+              previewIframe.classList.add('d-none');
+              previewPlaceholder.classList.remove('d-none');
+              previewPlaceholder.innerHTML = '<div class="small text-muted">Failed to load preview</div>';
+              handlePreviewError();
+            };
             previewIframe.src = previewUrl;
             previewIframe.style.minHeight = '500px';
             previewIframe.classList.remove('d-none');
@@ -2052,14 +2154,17 @@
             })
             .catch(() => {
               if (activeRow !== currentRow) return;
-              previewText.textContent = 'Failed to load preview';
+              previewText.classList.add('d-none');
+              previewPlaceholder.classList.remove('d-none');
+              previewPlaceholder.innerHTML = '<div class="small text-muted">Failed to load preview</div>';
+              handlePreviewError();
             });
         }
         const metadataUrl = row.dataset.metadataUrl;
         if (metadataUrl) {
             const metadata = await fetchMetadata(metadataUrl);
-            if (activeRow === row) {
+            if (activeRow === row && !previewFailed) {
                 renderMetadata(metadata);
             }
         }
@@ -3157,6 +3262,15 @@
       }
     });
+    const foldersInView = visibleItems.filter(item => item.type === 'folder');
+    foldersInView.forEach(item => {
+      if (shouldSelect) {
+        selectedRows.set(item.path, { key: item.path, isFolder: true });
+      } else {
+        selectedRows.delete(item.path);
+      }
+    });
     document.querySelectorAll('[data-folder-select]').forEach(cb => {
       cb.checked = shouldSelect;
     });
@@ -3957,6 +4071,10 @@
   const loadObjectTags = async (row) => {
     if (!row || !previewTagsPanel) return;
+    if (previewFailed) {
+      previewTagsPanel.classList.add('d-none');
+      return;
+    }
     const tagsUrl = row.dataset.tagsUrl;
     if (!tagsUrl) {
       previewTagsPanel.classList.add('d-none');


@@ -3,6 +3,8 @@ window.BucketDetailUpload = (function() {
   const MULTIPART_THRESHOLD = 8 * 1024 * 1024;
   const CHUNK_SIZE = 8 * 1024 * 1024;
+  const MAX_PART_RETRIES = 3;
+  const RETRY_BASE_DELAY_MS = 1000;
   let state = {
     isUploading: false,
@@ -204,6 +206,67 @@ window.BucketDetailUpload = (function() {
     }
   }
+  function uploadPartXHR(url, chunk, csrfToken, baseBytes, fileSize, progressItem, partNumber, totalParts) {
+    return new Promise((resolve, reject) => {
+      const xhr = new XMLHttpRequest();
+      xhr.open('PUT', url, true);
+      xhr.setRequestHeader('X-CSRFToken', csrfToken || '');
+      xhr.upload.addEventListener('progress', (e) => {
+        if (e.lengthComputable) {
+          updateProgressItem(progressItem, {
+            status: `Part ${partNumber}/${totalParts}`,
+            loaded: baseBytes + e.loaded,
+            total: fileSize
+          });
+        }
+      });
+      xhr.addEventListener('load', () => {
+        if (xhr.status >= 200 && xhr.status < 300) {
+          try {
+            resolve(JSON.parse(xhr.responseText));
+          } catch {
+            reject(new Error(`Part ${partNumber}: invalid response`));
+          }
+        } else {
+          try {
+            const data = JSON.parse(xhr.responseText);
+            reject(new Error(data.error || `Part ${partNumber} failed (${xhr.status})`));
+          } catch {
+            reject(new Error(`Part ${partNumber} failed (${xhr.status})`));
+          }
+        }
+      });
+      xhr.addEventListener('error', () => reject(new Error(`Part ${partNumber}: network error`)));
+      xhr.addEventListener('abort', () => reject(new Error(`Part ${partNumber}: aborted`)));
+      xhr.send(chunk);
+    });
+  }
+  async function uploadPartWithRetry(url, chunk, csrfToken, baseBytes, fileSize, progressItem, partNumber, totalParts) {
+    let lastError;
+    for (let attempt = 0; attempt <= MAX_PART_RETRIES; attempt++) {
+      try {
+        return await uploadPartXHR(url, chunk, csrfToken, baseBytes, fileSize, progressItem, partNumber, totalParts);
+      } catch (err) {
+        lastError = err;
+        if (attempt < MAX_PART_RETRIES) {
+          const delay = RETRY_BASE_DELAY_MS * Math.pow(2, attempt);
+          updateProgressItem(progressItem, {
+            status: `Part ${partNumber}/${totalParts} retry ${attempt + 1}/${MAX_PART_RETRIES}...`,
+            loaded: baseBytes,
+            total: fileSize
+          });
+          await new Promise(r => setTimeout(r, delay));
+        }
+      }
+    }
+    throw lastError;
+  }
   async function uploadMultipart(file, objectKey, metadata, progressItem, urls) {
     const csrfToken = document.querySelector('input[name="csrf_token"]')?.value;
@@ -233,26 +296,14 @@ window.BucketDetailUpload = (function() {
       const end = Math.min(start + CHUNK_SIZE, file.size);
       const chunk = file.slice(start, end);
-      updateProgressItem(progressItem, {
-        status: `Part ${partNumber}/${totalParts}`,
-        loaded: uploadedBytes,
-        total: file.size
-      });
-      const partResp = await fetch(`${partUrl}?partNumber=${partNumber}`, {
-        method: 'PUT',
-        headers: { 'X-CSRFToken': csrfToken || '' },
-        body: chunk
-      });
-      if (!partResp.ok) {
-        const err = await partResp.json().catch(() => ({}));
-        throw new Error(err.error || `Part ${partNumber} failed`);
-      }
-      const partData = await partResp.json();
+      const partData = await uploadPartWithRetry(
+        `${partUrl}?partNumber=${partNumber}`,
+        chunk, csrfToken, uploadedBytes, file.size,
+        progressItem, partNumber, totalParts
+      );
       parts.push({ part_number: partNumber, etag: partData.etag });
-      uploadedBytes += chunk.size;
+      uploadedBytes += (end - start);
       updateProgressItem(progressItem, {
         loaded: uploadedBytes,
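The retry helper above bounds each part at `MAX_PART_RETRIES` extra attempts with exponential backoff (1 s, 2 s, 4 s). The same idea sketched in Python, with `upload_fn` standing in for the actual part-upload call and `sleep` injectable for testing; this mirrors the JS constants, not the project's server code:

```python
import time

MAX_PART_RETRIES = 3
RETRY_BASE_DELAY_MS = 1000

def upload_part_with_retry(upload_fn, part_number, sleep=time.sleep):
    """Try the upload up to 1 + MAX_PART_RETRIES times, sleeping
    1s, 2s, 4s between failed attempts (exponential backoff)."""
    last_error = None
    for attempt in range(MAX_PART_RETRIES + 1):
        try:
            return upload_fn(part_number)
        except Exception as err:
            last_error = err
            if attempt < MAX_PART_RETRIES:
                # Backoff doubles each attempt: base * 2**attempt milliseconds.
                sleep(RETRY_BASE_DELAY_MS * (2 ** attempt) / 1000.0)
    raise last_error
```

Keeping the delay schedule in one place makes transient network failures recoverable without re-uploading parts that already succeeded, which is the point of the multipart split.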


@@ -257,7 +257,8 @@
         Share Link
       </button>
     </div>
-    <div class="p-3 rounded mb-3" style="background: var(--myfsio-preview-bg);">
+    <div id="preview-error-alert" class="alert alert-warning d-none py-2 px-3 mb-3 small" role="alert"></div>
+    <div id="preview-details-meta" class="p-3 rounded mb-3" style="background: var(--myfsio-preview-bg);">
       <dl class="row small mb-0">
         <dt class="col-5 text-muted fw-normal">Last modified</dt>
         <dd class="col-7 mb-2 fw-medium" id="preview-modified"></dd>
@@ -920,14 +921,14 @@
       <path d="M8 16A8 8 0 1 0 8 0a8 8 0 0 0 0 16zm.93-9.412-1 4.705c-.07.34.029.533.304.533.194 0 .487-.07.686-.246l-.088.416c-.287.346-.92.598-1.465.598-.703 0-1.002-.422-.808-1.319l.738-3.468c.064-.293.006-.399-.287-.47l-.451-.081.082-.381 2.29-.287zM8 5.5a1 1 0 1 1 0-2 1 1 0 0 1 0 2z"/>
     </svg>
     <div>
-      <strong>Storage quota enabled</strong>
+      <strong>Storage quota active</strong>
       <p class="mb-0 small">
         {% if max_bytes is not none and max_objects is not none %}
-        Limited to {{ max_bytes | filesizeformat }} and {{ max_objects }} objects.
+        This bucket is limited to {{ max_bytes | filesizeformat }} storage and {{ max_objects }} objects.
         {% elif max_bytes is not none %}
-        Limited to {{ max_bytes | filesizeformat }} storage.
+        This bucket is limited to {{ max_bytes | filesizeformat }} storage.
         {% else %}
-        Limited to {{ max_objects }} objects.
+        This bucket is limited to {{ max_objects }} objects.
         {% endif %}
       </p>
     </div>
@@ -2057,7 +2058,7 @@
   <div class="col-12">
     <label class="form-label fw-medium">Select files</label>
     <input class="form-control" type="file" name="object" id="uploadFileInput" multiple required />
-    <div class="form-text">Select one or more files from your device. Files ≥ 8&nbsp;MB automatically switch to multipart uploads.</div>
+    <div class="form-text">Select one or more files from your device. Files ≥ 8&nbsp;MB use multipart uploads with automatic retry.</div>
   </div>
   <div class="col-12">
     <div class="upload-dropzone text-center" data-dropzone>


@@ -122,6 +122,13 @@
       </button>
     </div>
+    <div id="gcScanningBanner" class="mb-3 {% if not gc_status.scanning %}d-none{% endif %}">
+      <div class="alert alert-info mb-0 small d-flex align-items-center gap-2">
+        <div class="spinner-border spinner-border-sm text-info" role="status"></div>
+        <span>GC in progress<span id="gcScanElapsed"></span></span>
+      </div>
+    </div>
     <div id="gcResult" class="mb-3 d-none">
       <div class="alert mb-0 small" id="gcResultAlert">
         <div class="d-flex justify-content-between align-items-start">
@@ -148,6 +155,7 @@
         </div>
       </div>
+    <div id="gcHistoryContainer">
     {% if gc_history %}
     <h6 class="fw-semibold small text-muted mb-2 d-flex align-items-center gap-2">
       <svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" fill="currentColor" viewBox="0 0 16 16">
@@ -193,6 +201,7 @@
       <p class="text-muted small mb-0">No executions recorded yet.</p>
     </div>
     {% endif %}
+    </div>
     {% else %}
     <div class="text-center py-4">
@@ -233,21 +242,28 @@
   <div class="card-body px-4 pb-4">
     {% if integrity_status.enabled %}
     <div class="d-flex gap-2 flex-wrap mb-3">
-      <button class="btn btn-primary btn-sm d-inline-flex align-items-center" id="integrityRunBtn" onclick="runIntegrity(false, false)">
+      <button class="btn btn-primary btn-sm d-inline-flex align-items-center" id="integrityRunBtn" onclick="runIntegrity(false, false)" {% if integrity_status.scanning %}disabled{% endif %}>
         <svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" fill="currentColor" class="me-1 flex-shrink-0" viewBox="0 0 16 16">
           <path fill-rule="evenodd" d="M8 3a5 5 0 1 0 4.546 2.914.5.5 0 0 1 .908-.417A6 6 0 1 1 8 2v1z"/>
           <path d="M8 4.466V.534a.25.25 0 0 1 .41-.192l2.36 1.966c.12.1.12.284 0 .384L8.41 4.658A.25.25 0 0 1 8 4.466z"/>
         </svg>
         Scan Now
       </button>
-      <button class="btn btn-outline-warning btn-sm" id="integrityHealBtn" onclick="runIntegrity(false, true)">
+      <button class="btn btn-outline-warning btn-sm" id="integrityHealBtn" onclick="runIntegrity(false, true)" {% if integrity_status.scanning %}disabled{% endif %}>
         Scan &amp; Heal
       </button>
-      <button class="btn btn-outline-secondary btn-sm" id="integrityDryRunBtn" onclick="runIntegrity(true, false)">
+      <button class="btn btn-outline-secondary btn-sm" id="integrityDryRunBtn" onclick="runIntegrity(true, false)" {% if integrity_status.scanning %}disabled{% endif %}>
         Dry Run
       </button>
     </div>
+    <div id="integrityScanningBanner" class="mb-3 {% if not integrity_status.scanning %}d-none{% endif %}">
+      <div class="alert alert-info mb-0 small d-flex align-items-center gap-2">
+        <div class="spinner-border spinner-border-sm text-info" role="status"></div>
+        <span>Scan in progress<span id="integrityScanElapsed"></span></span>
+      </div>
+    </div>
     <div id="integrityResult" class="mb-3 d-none">
       <div class="alert mb-0 small" id="integrityResultAlert">
         <div class="d-flex justify-content-between align-items-start">
@@ -273,6 +289,7 @@
         </div>
       </div>
+    <div id="integrityHistoryContainer">
     {% if integrity_history %}
     <h6 class="fw-semibold small text-muted mb-2 d-flex align-items-center gap-2">
       <svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" fill="currentColor" viewBox="0 0 16 16">
@@ -326,6 +343,7 @@
       <p class="text-muted small mb-0">No scans recorded yet.</p>
     </div>
     {% endif %}
+    </div>
     {% else %}
     <div class="text-center py-4">
@@ -369,30 +387,137 @@
     return (i === 0 ? b : b.toFixed(1)) + ' ' + units[i];
   }
-  window.runGC = function (dryRun) {
-    setLoading(dryRun ? 'gcDryRunBtn' : 'gcRunBtn', true);
-    setLoading(dryRun ? 'gcRunBtn' : 'gcDryRunBtn', true, true);
-    fetch('{{ url_for("ui.system_gc_run") }}', {
-      method: 'POST',
-      headers: {'Content-Type': 'application/json', 'X-CSRFToken': csrfToken},
-      body: JSON.stringify({dry_run: dryRun})
+  var _displayTimezone = {{ display_timezone|tojson }};
+
+  function formatTimestamp(ts) {
+    var d = new Date(ts * 1000);
+    try {
+      var opts = {year: 'numeric', month: 'short', day: '2-digit', hour: '2-digit', minute: '2-digit', hour12: false, timeZone: _displayTimezone, timeZoneName: 'short'};
+      return d.toLocaleString('en-US', opts);
+    } catch (e) {
+      var pad = function (n) { return n < 10 ? '0' + n : '' + n; };
+      return d.getUTCFullYear() + '-' + pad(d.getUTCMonth() + 1) + '-' + pad(d.getUTCDate()) +
+        ' ' + pad(d.getUTCHours()) + ':' + pad(d.getUTCMinutes()) + ' UTC';
+    }
+  }
+
+  var _gcHistoryIcon = '<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" fill="currentColor" viewBox="0 0 16 16">' +
+    '<path d="M8.515 1.019A7 7 0 0 0 8 1V0a8 8 0 0 1 .589.022l-.074.997zm2.004.45a7.003 7.003 0 0 0-.985-.299l.219-.976c.383.086.76.2 1.126.342l-.36.933zm1.37.71a7.01 7.01 0 0 0-.439-.27l.493-.87a8.025 8.025 0 0 1 .979.654l-.615.789a6.996 6.996 0 0 0-.418-.302zm1.834 1.79a6.99 6.99 0 0 0-.653-.796l.724-.69c.27.285.52.59.747.91l-.818.576zm.744 1.352a7.08 7.08 0 0 0-.214-.468l.893-.45a7.976 7.976 0 0 1 .45 1.088l-.95.313a7.023 7.023 0 0 0-.179-.483zm.53 2.507a6.991 6.991 0 0 0-.1-1.025l.985-.17c.067.386.106.778.116 1.17l-1 .025zm-.131 1.538c.033-.17.06-.339.081-.51l.993.123a7.957 7.957 0 0 1-.23 1.155l-.964-.267c.046-.165.086-.332.12-.501zm-.952 2.379c.184-.29.346-.594.486-.908l.914.405c-.16.36-.345.706-.555 1.038l-.845-.535zm-.964 1.205c.122-.122.239-.248.35-.378l.758.653a8.073 8.073 0 0 1-.401.432l-.707-.707z"/>' +
+    '<path d="M8 1a7 7 0 1 0 4.95 11.95l.707.707A8.001 8.001 0 1 1 8 0v1z"/>' +
+    '<path d="M7.5 3a.5.5 0 0 1 .5.5v5.21l3.248 1.856a.5.5 0 0 1-.496.868l-3.5-2A.5.5 0 0 1 7 8V3.5a.5.5 0 0 1 .5-.5z"/></svg>';
+
+  function _gcRefreshHistory() {
+    fetch('{{ url_for("ui.system_gc_history") }}?limit=10', {
+      headers: {'X-CSRFToken': csrfToken}
     })
     .then(function (r) { return r.json(); })
-    .then(function (data) {
+    .then(function (hist) {
+      var container = document.getElementById('gcHistoryContainer');
+      if (!container) return;
+      var execs = hist.executions || [];
+      if (execs.length === 0) {
+        container.innerHTML = '<div class="text-center py-2"><p class="text-muted small mb-0">No executions recorded yet.</p></div>';
+        return;
+      }
+      var html = '<h6 class="fw-semibold small text-muted mb-2 d-flex align-items-center gap-2">' +
+        _gcHistoryIcon + ' Recent Executions</h6>' +
+        '<div class="table-responsive"><table class="table table-sm small mb-0">' +
+        '<thead class="table-light"><tr><th>Time</th><th class="text-center">Cleaned</th>' +
+        '<th class="text-center">Freed</th><th class="text-center">Mode</th></tr></thead><tbody>';
+      execs.forEach(function (exec) {
+        var r = exec.result || {};
+        var cleaned = (r.temp_files_deleted || 0) + (r.multipart_uploads_deleted || 0) +
+          (r.lock_files_deleted || 0) + (r.orphaned_metadata_deleted || 0) +
+          (r.orphaned_versions_deleted || 0) + (r.empty_dirs_removed || 0);
+        var freed = (r.temp_bytes_freed || 0) + (r.multipart_bytes_freed || 0) +
+          (r.orphaned_version_bytes_freed || 0);
+        var mode = exec.dry_run
+          ? '<span class="badge bg-warning bg-opacity-10 text-warning">Dry run</span>'
+          : '<span class="badge bg-primary bg-opacity-10 text-primary">Live</span>';
+        html += '<tr><td class="text-nowrap">' + formatTimestamp(exec.timestamp) + '</td>' +
+          '<td class="text-center">' + cleaned + '</td>' +
+          '<td class="text-center">' + formatBytes(freed) + '</td>' +
+          '<td class="text-center">' + mode + '</td></tr>';
+      });
+      html += '</tbody></table></div>';
+      container.innerHTML = html;
+    })
+    .catch(function () {});
+  }
+
+  function _integrityRefreshHistory() {
+    fetch('{{ url_for("ui.system_integrity_history") }}?limit=10', {
+      headers: {'X-CSRFToken': csrfToken}
+    })
+    .then(function (r) { return r.json(); })
+    .then(function (hist) {
+      var container = document.getElementById('integrityHistoryContainer');
+      if (!container) return;
+      var execs = hist.executions || [];
+      if (execs.length === 0) {
+        container.innerHTML = '<div class="text-center py-2"><p class="text-muted small mb-0">No scans recorded yet.</p></div>';
+        return;
+      }
+      var html = '<h6 class="fw-semibold small text-muted mb-2 d-flex align-items-center gap-2">' +
_gcHistoryIcon + ' Recent Scans</h6>' +
'<div class="table-responsive"><table class="table table-sm small mb-0">' +
'<thead class="table-light"><tr><th>Time</th><th class="text-center">Scanned</th>' +
'<th class="text-center">Issues</th><th class="text-center">Healed</th>' +
'<th class="text-center">Mode</th></tr></thead><tbody>';
execs.forEach(function (exec) {
var r = exec.result || {};
var issues = (r.corrupted_objects || 0) + (r.orphaned_objects || 0) +
(r.phantom_metadata || 0) + (r.stale_versions || 0) +
(r.etag_cache_inconsistencies || 0) + (r.legacy_metadata_drifts || 0);
var issueHtml = issues > 0
? '<span class="text-danger fw-medium">' + issues + '</span>'
: '<span class="text-success">0</span>';
var mode = exec.dry_run
? '<span class="badge bg-warning bg-opacity-10 text-warning">Dry</span>'
: (exec.auto_heal
? '<span class="badge bg-success bg-opacity-10 text-success">Heal</span>'
: '<span class="badge bg-primary bg-opacity-10 text-primary">Scan</span>');
html += '<tr><td class="text-nowrap">' + formatTimestamp(exec.timestamp) + '</td>' +
'<td class="text-center">' + (r.objects_scanned || 0) + '</td>' +
'<td class="text-center">' + issueHtml + '</td>' +
'<td class="text-center">' + (r.issues_healed || 0) + '</td>' +
'<td class="text-center">' + mode + '</td></tr>';
});
html += '</tbody></table></div>';
container.innerHTML = html;
})
.catch(function () {});
}
var _gcPollTimer = null;
var _gcLastDryRun = false;
function _gcSetScanning(scanning) {
var banner = document.getElementById('gcScanningBanner');
var btns = ['gcRunBtn', 'gcDryRunBtn'];
if (scanning) {
banner.classList.remove('d-none');
btns.forEach(function (id) {
var el = document.getElementById(id);
if (el) el.disabled = true;
});
} else {
banner.classList.add('d-none');
document.getElementById('gcScanElapsed').textContent = '';
btns.forEach(function (id) {
var el = document.getElementById(id);
if (el) el.disabled = false;
});
}
}
function _gcShowResult(data, dryRun) {
var container = document.getElementById('gcResult');
var alert = document.getElementById('gcResultAlert');
var title = document.getElementById('gcResultTitle');
var body = document.getElementById('gcResultBody');
container.classList.remove('d-none');
if (data.error) {
alert.className = 'alert alert-danger mb-0 small';
title.textContent = 'Error';
body.textContent = data.error;
return;
}
var totalItems = (data.temp_files_deleted || 0) + (data.multipart_uploads_deleted || 0) +
(data.lock_files_deleted || 0) + (data.orphaned_metadata_deleted || 0) +
(data.orphaned_versions_deleted || 0) + (data.empty_dirs_removed || 0);
@@ -414,8 +539,67 @@
if (data.errors && data.errors.length > 0) lines.push('Errors: ' + data.errors.join(', '));
body.innerHTML = lines.join('<br>');
}
function _gcPoll() {
fetch('{{ url_for("ui.system_gc_status") }}', {
headers: {'X-CSRFToken': csrfToken}
})
.then(function (r) { return r.json(); })
.then(function (status) {
if (status.scanning) {
var elapsed = status.scan_elapsed_seconds || 0;
document.getElementById('gcScanElapsed').textContent = ' (' + elapsed.toFixed(0) + 's)';
_gcPollTimer = setTimeout(_gcPoll, 2000);
} else {
_gcSetScanning(false);
_gcRefreshHistory();
fetch('{{ url_for("ui.system_gc_history") }}?limit=1', {
headers: {'X-CSRFToken': csrfToken}
})
.then(function (r) { return r.json(); })
.then(function (hist) {
if (hist.executions && hist.executions.length > 0) {
var latest = hist.executions[0];
_gcShowResult(latest.result, latest.dry_run);
}
})
.catch(function () {});
}
})
.catch(function () {
_gcPollTimer = setTimeout(_gcPoll, 3000);
});
}
window.runGC = function (dryRun) {
_gcLastDryRun = dryRun;
document.getElementById('gcResult').classList.add('d-none');
_gcSetScanning(true);
fetch('{{ url_for("ui.system_gc_run") }}', {
method: 'POST',
headers: {'Content-Type': 'application/json', 'X-CSRFToken': csrfToken},
body: JSON.stringify({dry_run: dryRun})
})
.then(function (r) { return r.json(); })
.then(function (data) {
if (data.error) {
_gcSetScanning(false);
var container = document.getElementById('gcResult');
var alert = document.getElementById('gcResultAlert');
var title = document.getElementById('gcResultTitle');
var body = document.getElementById('gcResultBody');
container.classList.remove('d-none');
alert.className = 'alert alert-danger mb-0 small';
title.textContent = 'Error';
body.textContent = data.error;
return;
}
_gcPollTimer = setTimeout(_gcPoll, 2000);
})
.catch(function (err) {
_gcSetScanning(false);
var container = document.getElementById('gcResult');
var alert = document.getElementById('gcResultAlert');
var title = document.getElementById('gcResultTitle');
@@ -424,39 +608,43 @@
alert.className = 'alert alert-danger mb-0 small';
title.textContent = 'Error';
body.textContent = err.message;
});
};
{% if gc_status.scanning %}
_gcSetScanning(true);
_gcPollTimer = setTimeout(_gcPoll, 2000);
{% endif %}
var _integrityPollTimer = null;
var _integrityLastMode = {dryRun: false, autoHeal: false};
function _integritySetScanning(scanning) {
var banner = document.getElementById('integrityScanningBanner');
var btns = ['integrityRunBtn', 'integrityHealBtn', 'integrityDryRunBtn'];
if (scanning) {
banner.classList.remove('d-none');
btns.forEach(function (id) {
var el = document.getElementById(id);
if (el) el.disabled = true;
});
} else {
banner.classList.add('d-none');
document.getElementById('integrityScanElapsed').textContent = '';
btns.forEach(function (id) {
var el = document.getElementById(id);
if (el) el.disabled = false;
});
}
}
function _integrityShowResult(data, dryRun, autoHeal) {
var container = document.getElementById('integrityResult');
var alert = document.getElementById('integrityResultAlert');
var title = document.getElementById('integrityResultTitle');
var body = document.getElementById('integrityResultBody');
container.classList.remove('d-none');
if (data.error) {
alert.className = 'alert alert-danger mb-0 small';
title.textContent = 'Error';
body.textContent = data.error;
return;
}
var totalIssues = (data.corrupted_objects || 0) + (data.orphaned_objects || 0) +
(data.phantom_metadata || 0) + (data.stale_versions || 0) +
(data.etag_cache_inconsistencies || 0) + (data.legacy_metadata_drifts || 0);
@@ -481,8 +669,67 @@
if (data.errors && data.errors.length > 0) lines.push('Errors: ' + data.errors.join(', '));
body.innerHTML = lines.join('<br>');
}
function _integrityPoll() {
fetch('{{ url_for("ui.system_integrity_status") }}', {
headers: {'X-CSRFToken': csrfToken}
})
.then(function (r) { return r.json(); })
.then(function (status) {
if (status.scanning) {
var elapsed = status.scan_elapsed_seconds || 0;
document.getElementById('integrityScanElapsed').textContent = ' (' + elapsed.toFixed(0) + 's)';
_integrityPollTimer = setTimeout(_integrityPoll, 2000);
} else {
_integritySetScanning(false);
_integrityRefreshHistory();
fetch('{{ url_for("ui.system_integrity_history") }}?limit=1', {
headers: {'X-CSRFToken': csrfToken}
})
.then(function (r) { return r.json(); })
.then(function (hist) {
if (hist.executions && hist.executions.length > 0) {
var latest = hist.executions[0];
_integrityShowResult(latest.result, latest.dry_run, latest.auto_heal);
}
})
.catch(function () {});
}
})
.catch(function () {
_integrityPollTimer = setTimeout(_integrityPoll, 3000);
});
}
window.runIntegrity = function (dryRun, autoHeal) {
_integrityLastMode = {dryRun: dryRun, autoHeal: autoHeal};
document.getElementById('integrityResult').classList.add('d-none');
_integritySetScanning(true);
fetch('{{ url_for("ui.system_integrity_run") }}', {
method: 'POST',
headers: {'Content-Type': 'application/json', 'X-CSRFToken': csrfToken},
body: JSON.stringify({dry_run: dryRun, auto_heal: autoHeal})
})
.then(function (r) { return r.json(); })
.then(function (data) {
if (data.error) {
_integritySetScanning(false);
var container = document.getElementById('integrityResult');
var alert = document.getElementById('integrityResultAlert');
var title = document.getElementById('integrityResultTitle');
var body = document.getElementById('integrityResultBody');
container.classList.remove('d-none');
alert.className = 'alert alert-danger mb-0 small';
title.textContent = 'Error';
body.textContent = data.error;
return;
}
_integrityPollTimer = setTimeout(_integrityPoll, 2000);
})
.catch(function (err) {
_integritySetScanning(false);
var container = document.getElementById('integrityResult');
var alert = document.getElementById('integrityResultAlert');
var title = document.getElementById('integrityResultTitle');
@@ -491,13 +738,13 @@
alert.className = 'alert alert-danger mb-0 small';
title.textContent = 'Error';
body.textContent = err.message;
});
};
{% if integrity_status.scanning %}
_integritySetScanning(true);
_integrityPollTimer = setTimeout(_integrityPoll, 2000);
{% endif %}
})();
</script>
{% endblock %}

View File

@@ -1,3 +1,56 @@
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import quote
def _build_presigned_query(path: str, *, access_key: str = "test", secret_key: str = "secret", expires: int = 60) -> str:
now = datetime.now(timezone.utc)
amz_date = now.strftime("%Y%m%dT%H%M%SZ")
date_stamp = now.strftime("%Y%m%d")
region = "us-east-1"
service = "s3"
credential_scope = f"{date_stamp}/{region}/{service}/aws4_request"
query_items = [
("X-Amz-Algorithm", "AWS4-HMAC-SHA256"),
("X-Amz-Content-Sha256", "UNSIGNED-PAYLOAD"),
("X-Amz-Credential", f"{access_key}/{credential_scope}"),
("X-Amz-Date", amz_date),
("X-Amz-Expires", str(expires)),
("X-Amz-SignedHeaders", "host"),
]
canonical_query = "&".join(
f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}" for k, v in sorted(query_items)
)
canonical_request = "\n".join([
"GET",
quote(path, safe="/-_.~"),
canonical_query,
"host:localhost\n",
"host",
"UNSIGNED-PAYLOAD",
])
hashed_request = hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()
string_to_sign = "\n".join([
"AWS4-HMAC-SHA256",
amz_date,
credential_scope,
hashed_request,
])
def _sign(key: bytes, msg: str) -> bytes:
return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()
k_date = _sign(("AWS4" + secret_key).encode("utf-8"), date_stamp)
k_region = _sign(k_date, region)
k_service = _sign(k_region, service)
signing_key = _sign(k_service, "aws4_request")
signature = hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
return canonical_query + f"&X-Amz-Signature={signature}"
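The HMAC chain inside `_build_presigned_query` follows the standard SigV4 key-derivation sequence (date, then region, then service, then the literal `aws4_request`). A minimal standalone sketch of just that chain, with illustrative function names not taken from the codebase:

```python
import hashlib
import hmac


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """SigV4 signing key: chained HMAC-SHA256 over date, region, service, 'aws4_request'."""
    def _sign(key: bytes, msg: str) -> bytes:
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

    k_date = _sign(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _sign(k_date, region)
    k_service = _sign(k_region, service)
    return _sign(k_service, "aws4_request")


def sign(secret_key: str, date_stamp: str, region: str, service: str, string_to_sign: str) -> str:
    """Hex signature over a prepared string-to-sign, as in the helper above."""
    key = derive_signing_key(secret_key, date_stamp, region, service)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the derivation is deterministic for a given key, date, region, and service, the server can recompute the same signature from the query parameters alone.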
def test_bucket_and_object_lifecycle(client, signer):
headers = signer("PUT", "/photos")
response = client.put("/photos", headers=headers)
@@ -114,6 +167,45 @@ def test_missing_credentials_denied(client):
assert response.status_code == 403
def test_presigned_url_denied_for_disabled_user(client, signer):
headers = signer("PUT", "/secure")
assert client.put("/secure", headers=headers).status_code == 200
payload = b"hello"
headers = signer("PUT", "/secure/file.txt", body=payload)
assert client.put("/secure/file.txt", headers=headers, data=payload).status_code == 200
iam = client.application.extensions["iam"]
iam.disable_user("test")
query = _build_presigned_query("/secure/file.txt")
response = client.get(f"/secure/file.txt?{query}", headers={"Host": "localhost"})
assert response.status_code == 403
assert b"User account is disabled" in response.data
def test_presigned_url_denied_for_inactive_key(client, signer):
headers = signer("PUT", "/secure2")
assert client.put("/secure2", headers=headers).status_code == 200
payload = b"hello"
headers = signer("PUT", "/secure2/file.txt", body=payload)
assert client.put("/secure2/file.txt", headers=headers, data=payload).status_code == 200
iam = client.application.extensions["iam"]
for user in iam._raw_config.get("users", []):
for key_info in user.get("access_keys", []):
if key_info.get("access_key") == "test":
key_info["status"] = "inactive"
iam._save()
iam._load()
query = _build_presigned_query("/secure2/file.txt")
response = client.get(f"/secure2/file.txt?{query}", headers={"Host": "localhost"})
assert response.status_code == 403
assert b"Access key is inactive" in response.data
def test_bucket_policies_deny_reads(client, signer):
import json

View File

@@ -317,7 +317,7 @@ class TestAdminAPI:
)
assert resp.status_code == 200
data = resp.get_json()
assert data["status"] == "started"
def test_gc_dry_run(self, gc_app):
client = gc_app.test_client()
@@ -329,11 +329,17 @@ class TestAdminAPI:
)
assert resp.status_code == 200
data = resp.get_json()
assert data["status"] == "started"
def test_gc_history(self, gc_app):
import time
client = gc_app.test_client()
client.post("/admin/gc/run", headers={"X-Access-Key": "admin", "X-Secret-Key": "adminsecret"})
for _ in range(50):
time.sleep(0.1)
status = client.get("/admin/gc/status", headers={"X-Access-Key": "admin", "X-Secret-Key": "adminsecret"}).get_json()
if not status.get("scanning"):
break
resp = client.get("/admin/gc/history", headers={"X-Access-Key": "admin", "X-Secret-Key": "adminsecret"})
assert resp.status_code == 200
data = resp.get_json()

View File

@@ -2,13 +2,25 @@ import hashlib
import json
import os
import sys
import time
from pathlib import Path
import pytest
sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
from app.integrity import IntegrityChecker, IntegrityCursorStore, IntegrityResult
def _wait_scan_done(client, headers, timeout=10):
deadline = time.time() + timeout
while time.time() < deadline:
resp = client.get("/admin/integrity/status", headers=headers)
data = resp.get_json()
if not data.get("scanning"):
return
time.sleep(0.1)
raise TimeoutError("scan did not complete")
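`_wait_scan_done` is an instance of a generic poll-until-done pattern now that the admin endpoints return `{"status": "started"}` and run asynchronously. A self-contained sketch of that pattern (the name `wait_until` is illustrative, not part of the codebase):

```python
import time


def wait_until(predicate, timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Call predicate() every `interval` seconds until it returns True.

    Raises TimeoutError if the deadline passes first. Uses time.monotonic()
    so wall-clock adjustments cannot shorten or extend the wait.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    raise TimeoutError("condition not met within timeout")
```

The helper in the diff is the same loop specialized to `not status.get("scanning")`.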
def _md5(data: bytes) -> str:
@@ -106,7 +118,7 @@ class TestCorruptedObjects:
result = checker.run_now()
assert result.corrupted_objects == 0
assert result.objects_scanned >= 1
def test_corrupted_nested_key(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"sub/dir/file.txt": b"nested content"})
@@ -413,8 +425,13 @@ class TestAdminAPI:
resp = client.post("/admin/integrity/run", headers=AUTH_HEADERS, json={})
assert resp.status_code == 200
data = resp.get_json()
assert data["status"] == "started"
_wait_scan_done(client, AUTH_HEADERS)
resp = client.get("/admin/integrity/history?limit=1", headers=AUTH_HEADERS)
hist = resp.get_json()
assert len(hist["executions"]) >= 1
assert "corrupted_objects" in hist["executions"][0]["result"]
assert "objects_scanned" in hist["executions"][0]["result"]
def test_run_with_overrides(self, integrity_app):
client = integrity_app.test_client()
@@ -424,10 +441,12 @@ class TestAdminAPI:
json={"dry_run": True, "auto_heal": True},
)
assert resp.status_code == 200
_wait_scan_done(client, AUTH_HEADERS)
def test_history_endpoint(self, integrity_app):
client = integrity_app.test_client()
client.post("/admin/integrity/run", headers=AUTH_HEADERS, json={})
_wait_scan_done(client, AUTH_HEADERS)
resp = client.get("/admin/integrity/history", headers=AUTH_HEADERS)
assert resp.status_code == 200
data = resp.get_json()
@@ -484,7 +503,7 @@ class TestMultipleBuckets:
result = checker.run_now()
assert result.buckets_scanned == 2
assert result.objects_scanned >= 2
assert result.corrupted_objects == 0
@@ -497,3 +516,273 @@ class TestGetStatus:
assert "batch_size" in status
assert "auto_heal" in status
assert "dry_run" in status
def test_status_includes_cursor(self, storage_root, checker):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
checker.run_now()
status = checker.get_status()
assert "cursor" in status
assert status["cursor"]["tracked_buckets"] == 1
assert "mybucket" in status["cursor"]["buckets"]
class TestUnifiedBatchCounter:
def test_orphaned_objects_count_toward_batch(self, storage_root):
_setup_bucket(storage_root, "mybucket", {})
for i in range(10):
(storage_root / "mybucket" / f"orphan{i}.txt").write_bytes(f"data{i}".encode())
checker = IntegrityChecker(storage_root=storage_root, batch_size=3)
result = checker.run_now()
assert result.objects_scanned <= 3
def test_phantom_metadata_counts_toward_batch(self, storage_root):
objects = {f"file{i}.txt": f"data{i}".encode() for i in range(10)}
_setup_bucket(storage_root, "mybucket", objects)
for i in range(10):
(storage_root / "mybucket" / f"file{i}.txt").unlink()
checker = IntegrityChecker(storage_root=storage_root, batch_size=5)
result = checker.run_now()
assert result.objects_scanned <= 5
def test_all_check_types_contribute(self, storage_root):
_setup_bucket(storage_root, "mybucket", {"valid.txt": b"hello"})
(storage_root / "mybucket" / "orphan.txt").write_bytes(b"orphan")
checker = IntegrityChecker(storage_root=storage_root, batch_size=1000)
result = checker.run_now()
assert result.objects_scanned > 2
class TestCursorRotation:
def test_oldest_bucket_scanned_first(self, storage_root):
_setup_bucket(storage_root, "bucket-a", {"a.txt": b"aaa"})
_setup_bucket(storage_root, "bucket-b", {"b.txt": b"bbb"})
_setup_bucket(storage_root, "bucket-c", {"c.txt": b"ccc"})
checker = IntegrityChecker(storage_root=storage_root, batch_size=5)
checker.cursor_store.update_bucket("bucket-a", 1000.0)
checker.cursor_store.update_bucket("bucket-b", 3000.0)
checker.cursor_store.update_bucket("bucket-c", 2000.0)
ordered = checker.cursor_store.get_bucket_order(["bucket-a", "bucket-b", "bucket-c"])
assert ordered[0] == "bucket-a"
assert ordered[1] == "bucket-c"
assert ordered[2] == "bucket-b"
def test_never_scanned_buckets_first(self, storage_root):
_setup_bucket(storage_root, "bucket-old", {"a.txt": b"aaa"})
_setup_bucket(storage_root, "bucket-new", {"b.txt": b"bbb"})
checker = IntegrityChecker(storage_root=storage_root, batch_size=1000)
checker.cursor_store.update_bucket("bucket-old", time.time())
ordered = checker.cursor_store.get_bucket_order(["bucket-old", "bucket-new"])
assert ordered[0] == "bucket-new"
def test_rotation_covers_all_buckets(self, storage_root):
for name in ["bucket-a", "bucket-b", "bucket-c"]:
_setup_bucket(storage_root, name, {f"{name}.txt": name.encode()})
checker = IntegrityChecker(storage_root=storage_root, batch_size=4)
result1 = checker.run_now()
assert result1.buckets_scanned >= 1
result2 = checker.run_now()
result3 = checker.run_now()
cursor_info = checker.cursor_store.get_info()
assert cursor_info["tracked_buckets"] == 3
def test_cursor_persistence(self, storage_root):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
checker1 = IntegrityChecker(storage_root=storage_root, batch_size=1000)
checker1.run_now()
cursor1 = checker1.cursor_store.get_info()
assert cursor1["tracked_buckets"] == 1
assert "mybucket" in cursor1["buckets"]
checker2 = IntegrityChecker(storage_root=storage_root, batch_size=1000)
cursor2 = checker2.cursor_store.get_info()
assert cursor2["tracked_buckets"] == 1
assert "mybucket" in cursor2["buckets"]
def test_stale_cursor_cleanup(self, storage_root):
_setup_bucket(storage_root, "bucket-a", {"a.txt": b"aaa"})
_setup_bucket(storage_root, "bucket-b", {"b.txt": b"bbb"})
checker = IntegrityChecker(storage_root=storage_root, batch_size=1000)
checker.run_now()
import shutil
shutil.rmtree(storage_root / "bucket-b")
meta_b = storage_root / ".myfsio.sys" / "buckets" / "bucket-b"
if meta_b.exists():
shutil.rmtree(meta_b)
checker.run_now()
cursor_info = checker.cursor_store.get_info()
assert "bucket-b" not in cursor_info["buckets"]
assert "bucket-a" in cursor_info["buckets"]
def test_cursor_updates_after_scan(self, storage_root):
_setup_bucket(storage_root, "mybucket", {"file.txt": b"hello"})
checker = IntegrityChecker(storage_root=storage_root, batch_size=1000)
before = time.time()
checker.run_now()
after = time.time()
cursor_info = checker.cursor_store.get_info()
entry = cursor_info["buckets"]["mybucket"]
assert before <= entry["last_scanned"] <= after
assert entry["completed"] is True
class TestIntraBucketCursor:
def test_resumes_from_cursor_key(self, storage_root):
objects = {f"file_{chr(ord('a') + i)}.txt": f"data{i}".encode() for i in range(10)}
_setup_bucket(storage_root, "mybucket", objects)
checker = IntegrityChecker(storage_root=storage_root, batch_size=3)
result1 = checker.run_now()
assert result1.objects_scanned == 3
cursor_info = checker.cursor_store.get_info()
entry = cursor_info["buckets"]["mybucket"]
assert entry["last_key"] is not None
assert entry["completed"] is False
result2 = checker.run_now()
assert result2.objects_scanned == 3
cursor_after = checker.cursor_store.get_info()["buckets"]["mybucket"]
assert cursor_after["last_key"] > entry["last_key"]
def test_cursor_resets_after_full_pass(self, storage_root):
objects = {f"file_{i}.txt": f"data{i}".encode() for i in range(3)}
_setup_bucket(storage_root, "mybucket", objects)
checker = IntegrityChecker(storage_root=storage_root, batch_size=100)
checker.run_now()
cursor_info = checker.cursor_store.get_info()
entry = cursor_info["buckets"]["mybucket"]
assert entry["last_key"] is None
assert entry["completed"] is True
def test_full_coverage_across_cycles(self, storage_root):
objects = {f"obj_{chr(ord('a') + i)}.txt": f"data{i}".encode() for i in range(10)}
_setup_bucket(storage_root, "mybucket", objects)
checker = IntegrityChecker(storage_root=storage_root, batch_size=4)
all_scanned = 0
for _ in range(10):
result = checker.run_now()
all_scanned += result.objects_scanned
if checker.cursor_store.get_info()["buckets"]["mybucket"]["completed"]:
break
assert all_scanned >= 10
def test_deleted_cursor_key_skips_gracefully(self, storage_root):
objects = {f"file_{chr(ord('a') + i)}.txt": f"data{i}".encode() for i in range(6)}
_setup_bucket(storage_root, "mybucket", objects)
checker = IntegrityChecker(storage_root=storage_root, batch_size=3)
checker.run_now()
cursor_info = checker.cursor_store.get_info()
cursor_key = cursor_info["buckets"]["mybucket"]["last_key"]
assert cursor_key is not None
obj_path = storage_root / "mybucket" / cursor_key
meta_root = storage_root / ".myfsio.sys" / "buckets" / "mybucket" / "meta"
key_path = Path(cursor_key)
if key_path.parent == Path("."):
index_path = meta_root / "_index.json"
else:
index_path = meta_root / key_path.parent / "_index.json"
if obj_path.exists():
obj_path.unlink()
if index_path.exists():
index_data = json.loads(index_path.read_text())
index_data.pop(key_path.name, None)
index_path.write_text(json.dumps(index_data))
result2 = checker.run_now()
assert result2.objects_scanned > 0
def test_incomplete_buckets_prioritized(self, storage_root):
_setup_bucket(storage_root, "bucket-a", {f"a{i}.txt": b"a" for i in range(5)})
_setup_bucket(storage_root, "bucket-b", {f"b{i}.txt": b"b" for i in range(5)})
checker = IntegrityChecker(storage_root=storage_root, batch_size=3)
checker.run_now()
cursor_info = checker.cursor_store.get_info()
incomplete = [
name for name, info in cursor_info["buckets"].items()
if info.get("last_key") is not None
]
assert len(incomplete) >= 1
result2 = checker.run_now()
assert result2.objects_scanned > 0
def test_cursor_skips_nested_directories(self, storage_root):
objects = {
"aaa/file1.txt": b"a1",
"aaa/file2.txt": b"a2",
"bbb/file1.txt": b"b1",
"bbb/file2.txt": b"b2",
"ccc/file1.txt": b"c1",
"ccc/file2.txt": b"c2",
}
_setup_bucket(storage_root, "mybucket", objects)
checker = IntegrityChecker(storage_root=storage_root, batch_size=4)
result1 = checker.run_now()
assert result1.objects_scanned == 4
cursor_info = checker.cursor_store.get_info()
cursor_key = cursor_info["buckets"]["mybucket"]["last_key"]
assert cursor_key is not None
assert cursor_key.startswith("aaa/") or cursor_key.startswith("bbb/")
result2 = checker.run_now()
assert result2.objects_scanned >= 2
all_scanned = result1.objects_scanned + result2.objects_scanned
for _ in range(10):
if checker.cursor_store.get_info()["buckets"]["mybucket"]["completed"]:
break
r = checker.run_now()
all_scanned += r.objects_scanned
assert all_scanned >= 6
def test_sorted_walk_order(self, storage_root):
objects = {
"bar.txt": b"bar",
"bar/inner.txt": b"inner",
"abc.txt": b"abc",
"zzz/deep.txt": b"deep",
}
_setup_bucket(storage_root, "mybucket", objects)
checker = IntegrityChecker(storage_root=storage_root, batch_size=100)
result = checker.run_now()
assert result.objects_scanned >= 4
assert result.total_issues == 0
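`test_sorted_walk_order` relies on the scanner visiting keys in a deterministic lexicographic order so the intra-bucket cursor can resume from `last_key`. A minimal sketch of such a sorted walk, assuming a plain directory tree; this is illustrative, not the project's actual walker:

```python
import os


def sorted_walk(root):
    """os.walk with dirnames and filenames sorted in place.

    Sorting `dirnames` before yielding makes os.walk descend into
    subdirectories in lexicographic order, so the overall visit order
    is deterministic and a saved cursor key remains meaningful.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()
        filenames.sort()
        yield dirpath, dirnames, filenames
```

With a stable order, "resume after `last_key`" is simply "skip entries that sort at or before the cursor".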

Some files were not shown because too many files have changed in this diff.