59 Commits

Author SHA1 Message Date
2a0e77a754 MyFSIO v0.3.7 Release
Reviewed-on: #30
2026-03-09 06:25:50 +00:00
c6e368324a Update docs.md and docs.html for credential expiry, IAM encryption, admin key env vars, and --reset-cred 2026-03-08 13:38:44 +08:00
7b6c096bb7 Remove the "check out the documentation" paragraph from the login page 2026-03-08 13:18:03 +08:00
03353a0aec Add credential expiry support: per-user expires_at with UI management, presets, and badge indicators; Fix IAM card dropdown clipped by overflow: remove gradient bar, allow overflow visible 2026-03-08 13:08:57 +08:00
eb0e435a5a MyFSIO v0.3.6 Release
Reviewed-on: #29
2026-03-08 04:46:31 +00:00
72f5d9d70c Restore data integrity guarantees: Content-MD5 validation, fsync durability, atomic metadata writes, concurrent write protection 2026-03-07 17:54:00 +08:00
be63e27c15 Reduce per-request CPU overhead: eliminate double stat(), cache content type and policy context, gate logging, configurable stat intervals 2026-03-07 14:08:23 +08:00
7633007a08 MyFSIO v0.3.5 Release
Reviewed-on: #28
2026-03-07 05:53:02 +00:00
81ef0fe4c7 Fix stale object count in bucket header and metrics dashboard after deletes 2026-03-03 19:42:37 +08:00
5f24bd920d Reduce P99 tail latency: defer etag index writes, eliminate double cache rebuild, skip redundant stat() in bucket config 2026-03-02 22:39:37 +08:00
8552f193de Reduce CPU/lock contention under concurrent uploads: split cache lock, in-memory stats, dict copy, lightweight request IDs, defaultdict metrics 2026-03-02 22:05:54 +08:00
de0d869c9f Merge pull request 'MyFSIO v0.3.4 Release' (#27) from next into main
Reviewed-on: #27
2026-03-02 08:31:32 +00:00
5536330aeb Move performance-critical Python functions to Rust: streaming I/O, multipart assembly, and AES-256-GCM encryption 2026-02-27 22:55:20 +08:00
d4657c389d Fix misleading default credentials in README to match actual random generation behavior 2026-02-27 21:58:10 +08:00
3827235232 Reduce CPU usage on heavy uploads: skip SHA256 body hashing in SigV4, use Rust md5_file post-write instead of per-chunk _HashingReader 2026-02-27 21:57:13 +08:00
fdd068feee MyFSIO v0.3.3 Release
Reviewed-on: #26
2026-02-27 04:49:32 +00:00
dfc0058d0d Extend myfsio_core Rust extension with 7 storage hot paths (directory scanning, metadata I/O, object listing, search, bucket stats, cache building) 2026-02-27 12:22:39 +08:00
27aef84311 Fix rclone CopyObject SignatureDoesNotMatch caused by internal metadata leaking as X-Amz-Meta headers 2026-02-26 21:39:43 +08:00
66b7677d2c MyFSIO v0.3.2 Release
Reviewed-on: #25
2026-02-26 10:10:19 +00:00
5003514a3d Fix null ETags in shallow listing by updating etag index on store/delete 2026-02-26 18:09:08 +08:00
4d90ead816 Merge pull request 'Fix incorrect Upgrading & Updates section in Docs' (#24) from next into main
Reviewed-on: #24
2026-02-26 09:50:17 +00:00
20a314e030 Fix incorrect Upgrading & Updates section in Docs 2026-02-26 17:49:59 +08:00
b37a51ed1d MyFSIO v0.3.1 Release
Reviewed-on: #23
2026-02-26 09:42:37 +00:00
d8232340c3 Update docs 2026-02-26 17:38:44 +08:00
a356bb0c4e perf: shallow listing, os.scandir stats, server-side search for large buckets 2026-02-26 17:11:07 +08:00
1c328ee3af Fix list performance for large buckets: delimiter-aware shallow listing, cache TTL increase, UI delimiter streaming; header badge shows total bucket objects; fix status bar text concatenation 2026-02-26 16:29:28 +08:00
5bf7962c04 Fix UI: versioning modals and object browser panel showing 'null' 2026-02-24 20:41:39 +08:00
e06f653606 Fix version panel showing 'null' instead of timestamp, exclude current version from list, auto-refresh versions after upload 2026-02-24 17:19:12 +08:00
0462a7b62e MyFSIO v0.3.0 Release
Reviewed-on: #22
2026-02-22 10:22:35 +00:00
9c2809c195 Backwards compatibility for Proxy trust config 2026-02-22 18:03:38 +08:00
fb32ca0a7d Harden security: fail-closed policies, presigned URL time/expiry validation, SSRF DNS pinning, lockout cap, proxy trust config 2026-02-22 17:55:40 +08:00
6ab702a818 Use cached etag in HEAD instead of re-hashing entire file 2026-02-22 16:01:46 +08:00
550e7d435c Move SigV4 canonical request construction to Rust unified verify function 2026-02-22 14:03:12 +08:00
776967e80d Add Rust index reader, metadata read cache, and 256KB stream chunks 2026-02-19 23:01:40 +08:00
082a7fbcd1 Move index JSON read to Rust for GIL-released parsing (serde_json) 2026-02-19 22:43:28 +08:00
ff287cf67b Improve Sites page UI/UX: dropdown actions, collapsible forms, AJAX submissions, Check All Health, safer selectors 2026-02-16 22:04:46 +08:00
bddf36d52d Fix domain mapping cross-process staleness, filter bucket dropdown to website-enabled only 2026-02-16 17:48:21 +08:00
cf6cec9cab Add 5 missing S3 API operations: DeleteBucketEncryption, GetObjectAcl, PutObjectAcl, GetObjectAttributes, GetBucketPolicyStatus 2026-02-16 16:41:27 +08:00
d425839e57 Remove Rust build artifacts from tracking, update .gitignore 2026-02-16 16:06:42 +08:00
52660570c1 Merge pull request 'MyFSIO v0.2.9 Release' (#21) from next into main
Reviewed-on: #21
2026-02-15 14:24:14 +00:00
35f61313e0 MyFSIO v0.2.8 Release
Reviewed-on: #20
2026-02-10 14:16:22 +00:00
c470cfb576 MyFSIO v0.2.7 Release
Reviewed-on: #19
2026-02-09 12:22:37 +00:00
d96955deee MyFSIO v0.2.6 Release
Reviewed-on: #18
2026-02-05 16:18:03 +00:00
85181f0be6 Merge pull request 'MyFSIO v0.2.5 Release' (#17) from next into main
Reviewed-on: #17
2026-02-02 05:32:02 +00:00
d5ca7a8be1 Merge pull request 'MyFSIO v0.2.4 Release' (#16) from next into main
Reviewed-on: #16
2026-02-01 10:27:11 +00:00
476dc79e42 MyFSIO v0.2.3 Release
Reviewed-on: #15
2026-01-25 06:05:53 +00:00
bb6590fc5e Merge pull request 'MyFSIO v0.2.2 Release' (#14) from next into main
Reviewed-on: #14
2026-01-19 07:12:15 +00:00
899db3421b Merge pull request 'MyFSIO v0.2.1 Release' (#13) from next into main
Reviewed-on: #13
2026-01-12 08:03:29 +00:00
caf01d6ada Merge pull request 'MyFSIO v0.2.0 Release' (#12) from next into main
Reviewed-on: #12
2026-01-05 15:48:03 +00:00
bb366cb4cd Merge pull request 'MyFSIO v0.1.9 Release' (#10) from next into main
Reviewed-on: #10
2025-12-29 06:49:48 +00:00
a2745ff2ee Merge pull request 'MyFSIO v0.1.8 Release' (#9) from next into main
Reviewed-on: #9
2025-12-23 06:01:32 +00:00
28cb656d94 Merge pull request 'MyFSIO v0.1.7 Release' (#8) from next into main
Reviewed-on: #8
2025-12-22 03:10:35 +00:00
3c44152fc6 Merge pull request 'MyFSIO v0.1.6 Release' (#7) from next into main
Reviewed-on: #7
2025-12-21 06:30:21 +00:00
397515edce Merge pull request 'MyFSIO v0.1.5 Release' (#6) from next into main
Reviewed-on: #6
2025-12-13 15:41:03 +00:00
980fced7e4 Merge pull request 'MyFSIO v0.1.4 Release' (#5) from next into main
Reviewed-on: #5
2025-12-13 08:22:43 +00:00
bae5009ec4 Merge pull request 'Release v0.1.3' (#4) from next into main
Reviewed-on: #4
2025-12-03 04:14:57 +00:00
233780617f Merge pull request 'Release V0.1.2' (#3) from next into main
Reviewed-on: #3
2025-11-26 04:59:15 +00:00
fd8fb21517 Merge pull request 'Prepare for binary release' (#2) from next into main
Reviewed-on: #2
2025-11-22 12:33:38 +00:00
c6cbe822e1 Merge pull request 'Release v0.1.1' (#1) from next into main
Reviewed-on: #1
2025-11-22 12:31:27 +00:00
522 changed files with 4907 additions and 2648 deletions

.gitignore
View File

@@ -26,6 +26,10 @@ dist/
*.egg-info/
.eggs/
# Rust / maturin build artifacts
myfsio_core/target/
myfsio_core/Cargo.lock
# Local runtime artifacts
logs/
*.log

View File

@@ -80,7 +80,7 @@ python run.py --mode api # API only (port 5000)
python run.py --mode ui # UI only (port 5100)
```
**Default Credentials:** `localadmin` / `localadmin`
**Credentials:** Generated automatically on first run and printed to the console. If missed, check the IAM config file at `<STORAGE_ROOT>/.myfsio.sys/config/iam.json`.
- **Web Console:** http://127.0.0.1:5100/ui
- **API Endpoint:** http://127.0.0.1:5000

View File

@@ -1,12 +1,13 @@
from __future__ import annotations
import html as html_module
import itertools
import logging
import mimetypes
import os
import shutil
import sys
import time
import uuid
from logging.handlers import RotatingFileHandler
from pathlib import Path
from datetime import timedelta
@@ -38,6 +39,8 @@ from .storage import ObjectStorage, StorageError
from .version import get_version
from .website_domains import WebsiteDomainStore
_request_counter = itertools.count(1)
def _migrate_config_file(active_path: Path, legacy_paths: List[Path]) -> Path:
"""Migrate config file from legacy locations to the active path.
@@ -93,7 +96,14 @@ def create_app(
app.config.setdefault("WTF_CSRF_ENABLED", False)
# Trust X-Forwarded-* headers from proxies
app.wsgi_app = ProxyFix(app.wsgi_app, x_for=1, x_proto=1, x_host=1, x_prefix=1)
num_proxies = app.config.get("NUM_TRUSTED_PROXIES", 1)
if num_proxies:
if "NUM_TRUSTED_PROXIES" not in os.environ:
logging.getLogger(__name__).warning(
"NUM_TRUSTED_PROXIES not set, defaulting to 1. "
"Set NUM_TRUSTED_PROXIES=0 if not behind a reverse proxy."
)
app.wsgi_app = ProxyFix(app.wsgi_app, x_for=num_proxies, x_proto=num_proxies, x_host=num_proxies, x_prefix=num_proxies)
# Enable gzip compression for responses (10-20x smaller JSON payloads)
if app.config.get("ENABLE_GZIP", True):
@@ -107,7 +117,7 @@ def create_app(
storage = ObjectStorage(
Path(app.config["STORAGE_ROOT"]),
cache_ttl=app.config.get("OBJECT_CACHE_TTL", 5),
cache_ttl=app.config.get("OBJECT_CACHE_TTL", 60),
object_cache_max_size=app.config.get("OBJECT_CACHE_MAX_SIZE", 100),
bucket_config_cache_ttl=app.config.get("BUCKET_CONFIG_CACHE_TTL_SECONDS", 30.0),
object_key_max_length_bytes=app.config.get("OBJECT_KEY_MAX_LENGTH_BYTES", 1024),
@@ -120,6 +130,7 @@ def create_app(
Path(app.config["IAM_CONFIG"]),
auth_max_attempts=app.config.get("AUTH_MAX_ATTEMPTS", 5),
auth_lockout_minutes=app.config.get("AUTH_LOCKOUT_MINUTES", 15),
encryption_key=app.config.get("SECRET_KEY"),
)
bucket_policies = BucketPolicyStore(Path(app.config["BUCKET_POLICY_PATH"]))
secret_store = EphemeralSecretStore(default_ttl=app.config.get("SECRET_TTL_SECONDS", 300))
@@ -473,13 +484,9 @@ def _configure_logging(app: Flask) -> None:
@app.before_request
def _log_request_start() -> None:
g.request_id = uuid.uuid4().hex
g.request_id = f"{os.getpid():x}{next(_request_counter):012x}"
g.request_started_at = time.perf_counter()
g.request_bytes_in = request.content_length or 0
app.logger.info(
"Request started",
extra={"path": request.path, "method": request.method, "remote_addr": request.remote_addr},
)
@app.before_request
def _maybe_serve_website():
@@ -608,16 +615,17 @@ def _configure_logging(app: Flask) -> None:
duration_ms = 0.0
if hasattr(g, "request_started_at"):
duration_ms = (time.perf_counter() - g.request_started_at) * 1000
request_id = getattr(g, "request_id", uuid.uuid4().hex)
request_id = getattr(g, "request_id", f"{os.getpid():x}{next(_request_counter):012x}")
response.headers.setdefault("X-Request-ID", request_id)
app.logger.info(
"Request completed",
extra={
"path": request.path,
"method": request.method,
"remote_addr": request.remote_addr,
},
)
if app.logger.isEnabledFor(logging.INFO):
app.logger.info(
"Request completed",
extra={
"path": request.path,
"method": request.method,
"remote_addr": request.remote_addr,
},
)
response.headers["X-Request-Duration-ms"] = f"{duration_ms:.2f}"
operation_metrics = app.extensions.get("operation_metrics")
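The request-ID hunk above replaces `uuid.uuid4().hex` with a process-local counter. A minimal sketch contrasting the two schemes (standalone; names mirror the diff):

```python
import itertools
import os
import uuid

_request_counter = itertools.count(1)

def request_id_uuid() -> str:
    # Original scheme: 32 random hex chars; each call reads entropy.
    return uuid.uuid4().hex

def request_id_lightweight() -> str:
    # Scheme from the diff: pid in hex plus a monotonically increasing
    # counter zero-padded to 12 hex digits. No entropy reads, and IDs
    # from concurrently running workers differ in the pid component.
    return f"{os.getpid():x}{next(_request_counter):012x}"

if __name__ == "__main__":
    print(request_id_uuid())         # e.g. 3f2c9a...
    print(request_id_lightweight())  # e.g. 1a2f000000000001
```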

View File

@@ -2,6 +2,7 @@ from __future__ import annotations
import ipaddress
import json
import os
import re
import time
from dataclasses import dataclass, field
@@ -75,7 +76,7 @@ def _evaluate_condition_operator(
expected_null = condition_values[0].lower() in ("true", "1", "yes") if condition_values else True
return is_null == expected_null
return True
return False
ACTION_ALIASES = {
"s3:listbucket": "list",
@@ -268,7 +269,7 @@ class BucketPolicyStore:
self._last_mtime = self._current_mtime()
# Performance: Avoid stat() on every request
self._last_stat_check = 0.0
self._stat_check_interval = 1.0 # Only check mtime every 1 second
self._stat_check_interval = float(os.environ.get("BUCKET_POLICY_STAT_CHECK_INTERVAL_SECONDS", "2.0"))
def maybe_reload(self) -> None:
# Performance: Skip stat check if we checked recently
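The stat-throttling pattern used here (and in `IamService` further down) can be sketched on its own: skip the `stat()` entirely if one ran within the interval, and only re-read the file when the mtime actually changed. Class and field names below are illustrative, not the repo's:

```python
import time
from pathlib import Path

class ThrottledReloader:
    """Re-read a config file at most once per `interval` seconds of checks."""

    def __init__(self, path: Path, interval: float = 2.0) -> None:
        self.path = path
        self.interval = interval
        self._last_stat_check = 0.0
        self._last_mtime = 0.0
        self.content = None

    def maybe_reload(self) -> None:
        now = time.monotonic()
        if now - self._last_stat_check < self.interval:
            return  # checked recently: skip the stat() syscall
        self._last_stat_check = now
        try:
            mtime = self.path.stat().st_mtime
        except OSError:
            return  # file missing or unreadable: keep current content
        if mtime != self._last_mtime:
            self._last_mtime = mtime
            self.content = self.path.read_text(encoding="utf-8")
```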

View File

@@ -241,7 +241,7 @@ class AppConfig:
cors_expose_headers = _csv(str(_get("CORS_EXPOSE_HEADERS", "*")), ["*"])
session_lifetime_days = int(_get("SESSION_LIFETIME_DAYS", 30))
bucket_stats_cache_ttl = int(_get("BUCKET_STATS_CACHE_TTL", 60))
object_cache_ttl = int(_get("OBJECT_CACHE_TTL", 5))
object_cache_ttl = int(_get("OBJECT_CACHE_TTL", 60))
encryption_enabled = str(_get("ENCRYPTION_ENABLED", "0")).lower() in {"1", "true", "yes", "on"}
encryption_keys_dir = storage_root / ".myfsio.sys" / "keys"
@@ -314,7 +314,7 @@ class AppConfig:
site_region = str(_get("SITE_REGION", "us-east-1"))
site_priority = int(_get("SITE_PRIORITY", 100))
ratelimit_admin = _validate_rate_limit(str(_get("RATE_LIMIT_ADMIN", "60 per minute")))
num_trusted_proxies = int(_get("NUM_TRUSTED_PROXIES", 0))
num_trusted_proxies = int(_get("NUM_TRUSTED_PROXIES", 1))
allowed_redirect_hosts_raw = _get("ALLOWED_REDIRECT_HOSTS", "")
allowed_redirect_hosts = [h.strip() for h in str(allowed_redirect_hosts_raw).split(",") if h.strip()]
allow_internal_endpoints = str(_get("ALLOW_INTERNAL_ENDPOINTS", "0")).lower() in {"1", "true", "yes", "on"}

View File

@@ -189,7 +189,13 @@ class EncryptedObjectStorage:
def list_objects(self, bucket_name: str, **kwargs):
return self.storage.list_objects(bucket_name, **kwargs)
def list_objects_shallow(self, bucket_name: str, **kwargs):
return self.storage.list_objects_shallow(bucket_name, **kwargs)
def search_objects(self, bucket_name: str, query: str, **kwargs):
return self.storage.search_objects(bucket_name, query, **kwargs)
def list_objects_all(self, bucket_name: str):
return self.storage.list_objects_all(bucket_name)

View File

@@ -19,6 +19,13 @@ from cryptography.hazmat.primitives import hashes
if sys.platform != "win32":
import fcntl
try:
import myfsio_core as _rc
_HAS_RUST = True
except ImportError:
_rc = None
_HAS_RUST = False
logger = logging.getLogger(__name__)
@@ -338,6 +345,69 @@ class StreamingEncryptor:
output.seek(0)
return output
def encrypt_file(self, input_path: str, output_path: str) -> EncryptionMetadata:
data_key, encrypted_data_key = self.provider.generate_data_key()
base_nonce = secrets.token_bytes(12)
if _HAS_RUST:
_rc.encrypt_stream_chunked(
input_path, output_path, data_key, base_nonce, self.chunk_size
)
else:
with open(input_path, "rb") as stream:
aesgcm = AESGCM(data_key)
with open(output_path, "wb") as out:
out.write(b"\x00\x00\x00\x00")
chunk_index = 0
while True:
chunk = stream.read(self.chunk_size)
if not chunk:
break
chunk_nonce = self._derive_chunk_nonce(base_nonce, chunk_index)
encrypted_chunk = aesgcm.encrypt(chunk_nonce, chunk, None)
out.write(len(encrypted_chunk).to_bytes(self.HEADER_SIZE, "big"))
out.write(encrypted_chunk)
chunk_index += 1
out.seek(0)
out.write(chunk_index.to_bytes(4, "big"))
return EncryptionMetadata(
algorithm="AES256",
key_id=self.provider.KEY_ID if hasattr(self.provider, "KEY_ID") else "local",
nonce=base_nonce,
encrypted_data_key=encrypted_data_key,
)
def decrypt_file(self, input_path: str, output_path: str,
metadata: EncryptionMetadata) -> None:
data_key = self.provider.decrypt_data_key(metadata.encrypted_data_key, metadata.key_id)
base_nonce = metadata.nonce
if _HAS_RUST:
_rc.decrypt_stream_chunked(input_path, output_path, data_key, base_nonce)
else:
with open(input_path, "rb") as stream:
chunk_count_bytes = stream.read(4)
if len(chunk_count_bytes) < 4:
raise EncryptionError("Invalid encrypted stream: missing header")
chunk_count = int.from_bytes(chunk_count_bytes, "big")
aesgcm = AESGCM(data_key)
with open(output_path, "wb") as out:
for chunk_index in range(chunk_count):
size_bytes = stream.read(self.HEADER_SIZE)
if len(size_bytes) < self.HEADER_SIZE:
raise EncryptionError(f"Invalid encrypted stream: truncated at chunk {chunk_index}")
chunk_size = int.from_bytes(size_bytes, "big")
encrypted_chunk = stream.read(chunk_size)
if len(encrypted_chunk) < chunk_size:
raise EncryptionError(f"Invalid encrypted stream: incomplete chunk {chunk_index}")
chunk_nonce = self._derive_chunk_nonce(base_nonce, chunk_index)
try:
decrypted_chunk = aesgcm.decrypt(chunk_nonce, encrypted_chunk, None)
out.write(decrypted_chunk)
except Exception as exc:
raise EncryptionError(f"Failed to decrypt chunk {chunk_index}: {exc}") from exc
class EncryptionManager:
"""Manages encryption providers and operations."""

View File

@@ -1,5 +1,6 @@
from __future__ import annotations
import base64
import hashlib
import hmac
import json
@@ -14,6 +15,8 @@ from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Any, Deque, Dict, Iterable, List, Optional, Sequence, Set, Tuple
from cryptography.fernet import Fernet, InvalidToken
class IamError(RuntimeError):
"""Raised when authentication or authorization fails."""
@@ -107,13 +110,24 @@ class Principal:
policies: List[Policy]
def _derive_fernet_key(secret: str) -> bytes:
raw = hashlib.pbkdf2_hmac("sha256", secret.encode(), b"myfsio-iam-encryption", 100_000)
return base64.urlsafe_b64encode(raw)
_IAM_ENCRYPTED_PREFIX = b"MYFSIO_IAM_ENC:"
class IamService:
"""Loads IAM configuration, manages users, and evaluates policies."""
def __init__(self, config_path: Path, auth_max_attempts: int = 5, auth_lockout_minutes: int = 15) -> None:
def __init__(self, config_path: Path, auth_max_attempts: int = 5, auth_lockout_minutes: int = 15, encryption_key: str | None = None) -> None:
self.config_path = Path(config_path)
self.auth_max_attempts = auth_max_attempts
self.auth_lockout_window = timedelta(minutes=auth_lockout_minutes)
self._fernet: Fernet | None = None
if encryption_key:
self._fernet = Fernet(_derive_fernet_key(encryption_key))
self.config_path.parent.mkdir(parents=True, exist_ok=True)
if not self.config_path.exists():
self._write_default()
@@ -125,7 +139,7 @@ class IamService:
self._secret_key_cache: Dict[str, Tuple[str, float]] = {}
self._cache_ttl = float(os.environ.get("IAM_CACHE_TTL_SECONDS", "5.0"))
self._last_stat_check = 0.0
self._stat_check_interval = 1.0
self._stat_check_interval = float(os.environ.get("IAM_STAT_CHECK_INTERVAL_SECONDS", "2.0"))
self._sessions: Dict[str, Dict[str, Any]] = {}
self._session_lock = threading.Lock()
self._load()
@@ -145,6 +159,19 @@ class IamService:
except OSError:
pass
def _check_expiry(self, access_key: str, record: Dict[str, Any]) -> None:
expires_at = record.get("expires_at")
if not expires_at:
return
try:
exp_dt = datetime.fromisoformat(expires_at)
if exp_dt.tzinfo is None:
exp_dt = exp_dt.replace(tzinfo=timezone.utc)
if datetime.now(timezone.utc) >= exp_dt:
raise IamError(f"Credentials for '{access_key}' have expired")
except (ValueError, TypeError):
pass
def authenticate(self, access_key: str, secret_key: str) -> Principal:
self._maybe_reload()
access_key = (access_key or "").strip()
@@ -161,12 +188,18 @@ class IamService:
if not record or not hmac.compare_digest(stored_secret, secret_key):
self._record_failed_attempt(access_key)
raise IamError("Invalid credentials")
self._check_expiry(access_key, record)
self._clear_failed_attempts(access_key)
return self._build_principal(access_key, record)
_MAX_LOCKOUT_KEYS = 10000
def _record_failed_attempt(self, access_key: str) -> None:
if not access_key:
return
if access_key not in self._failed_attempts and len(self._failed_attempts) >= self._MAX_LOCKOUT_KEYS:
oldest_key = min(self._failed_attempts, key=lambda k: self._failed_attempts[k][0] if self._failed_attempts[k] else datetime.min.replace(tzinfo=timezone.utc))
del self._failed_attempts[oldest_key]
attempts = self._failed_attempts.setdefault(access_key, deque())
self._prune_attempts(attempts)
attempts.append(datetime.now(timezone.utc))
@@ -283,12 +316,16 @@ class IamService:
if cached:
principal, cached_time = cached
if now - cached_time < self._cache_ttl:
record = self._users.get(access_key)
if record:
self._check_expiry(access_key, record)
return principal
self._maybe_reload()
record = self._users.get(access_key)
if not record:
raise IamError("Unknown access key")
self._check_expiry(access_key, record)
principal = self._build_principal(access_key, record)
self._principal_cache[access_key] = (principal, now)
return principal
@@ -298,6 +335,7 @@ class IamService:
record = self._users.get(access_key)
if not record:
raise IamError("Unknown access key")
self._check_expiry(access_key, record)
return record["secret_key"]
def authorize(self, principal: Principal, bucket_name: str | None, action: str) -> None:
@@ -342,6 +380,7 @@ class IamService:
{
"access_key": access_key,
"display_name": record["display_name"],
"expires_at": record.get("expires_at"),
"policies": [
{"bucket": policy.bucket, "actions": sorted(policy.actions)}
for policy in record["policies"]
@@ -357,20 +396,25 @@ class IamService:
policies: Optional[Sequence[Dict[str, Any]]] = None,
access_key: str | None = None,
secret_key: str | None = None,
expires_at: str | None = None,
) -> Dict[str, str]:
access_key = (access_key or self._generate_access_key()).strip()
if not access_key:
raise IamError("Access key cannot be empty")
if access_key in self._users:
raise IamError("Access key already exists")
if expires_at:
self._validate_expires_at(expires_at)
secret_key = secret_key or self._generate_secret_key()
sanitized_policies = self._prepare_policy_payload(policies)
record = {
record: Dict[str, Any] = {
"access_key": access_key,
"secret_key": secret_key,
"display_name": display_name or access_key,
"policies": sanitized_policies,
}
if expires_at:
record["expires_at"] = expires_at
self._raw_config.setdefault("users", []).append(record)
self._save()
self._load()
@@ -409,17 +453,43 @@ class IamService:
clear_signing_key_cache()
self._load()
def update_user_expiry(self, access_key: str, expires_at: str | None) -> None:
user = self._get_raw_user(access_key)
if expires_at:
self._validate_expires_at(expires_at)
user["expires_at"] = expires_at
else:
user.pop("expires_at", None)
self._save()
self._principal_cache.pop(access_key, None)
self._secret_key_cache.pop(access_key, None)
self._load()
def update_user_policies(self, access_key: str, policies: Sequence[Dict[str, Any]]) -> None:
user = self._get_raw_user(access_key)
user["policies"] = self._prepare_policy_payload(policies)
self._save()
self._load()
def _decrypt_content(self, raw_bytes: bytes) -> str:
if raw_bytes.startswith(_IAM_ENCRYPTED_PREFIX):
if not self._fernet:
raise IamError("IAM config is encrypted but no encryption key provided. Set SECRET_KEY or use 'python run.py reset-cred'.")
try:
encrypted_data = raw_bytes[len(_IAM_ENCRYPTED_PREFIX):]
return self._fernet.decrypt(encrypted_data).decode("utf-8")
except InvalidToken:
raise IamError("Cannot decrypt IAM config. SECRET_KEY may have changed. Use 'python run.py reset-cred' to reset credentials.")
return raw_bytes.decode("utf-8")
def _load(self) -> None:
try:
self._last_load_time = self.config_path.stat().st_mtime
content = self.config_path.read_text(encoding='utf-8')
raw_bytes = self.config_path.read_bytes()
content = self._decrypt_content(raw_bytes)
raw = json.loads(content)
except IamError:
raise
except FileNotFoundError:
raise IamError(f"IAM config not found: {self.config_path}")
except json.JSONDecodeError as e:
@@ -428,34 +498,48 @@ class IamService:
raise IamError(f"Cannot read IAM config (permission denied): {e}")
except (OSError, ValueError) as e:
raise IamError(f"Failed to load IAM config: {e}")
was_plaintext = not raw_bytes.startswith(_IAM_ENCRYPTED_PREFIX)
users: Dict[str, Dict[str, Any]] = {}
for user in raw.get("users", []):
policies = self._build_policy_objects(user.get("policies", []))
users[user["access_key"]] = {
user_record: Dict[str, Any] = {
"secret_key": user["secret_key"],
"display_name": user.get("display_name", user["access_key"]),
"policies": policies,
}
if user.get("expires_at"):
user_record["expires_at"] = user["expires_at"]
users[user["access_key"]] = user_record
if not users:
raise IamError("IAM configuration contains no users")
self._users = users
self._raw_config = {
"users": [
{
"access_key": entry["access_key"],
"secret_key": entry["secret_key"],
"display_name": entry.get("display_name", entry["access_key"]),
"policies": entry.get("policies", []),
}
for entry in raw.get("users", [])
]
}
raw_users: List[Dict[str, Any]] = []
for entry in raw.get("users", []):
raw_entry: Dict[str, Any] = {
"access_key": entry["access_key"],
"secret_key": entry["secret_key"],
"display_name": entry.get("display_name", entry["access_key"]),
"policies": entry.get("policies", []),
}
if entry.get("expires_at"):
raw_entry["expires_at"] = entry["expires_at"]
raw_users.append(raw_entry)
self._raw_config = {"users": raw_users}
if was_plaintext and self._fernet:
self._save()
def _save(self) -> None:
try:
json_text = json.dumps(self._raw_config, indent=2)
temp_path = self.config_path.with_suffix('.json.tmp')
temp_path.write_text(json.dumps(self._raw_config, indent=2), encoding='utf-8')
if self._fernet:
encrypted = self._fernet.encrypt(json_text.encode("utf-8"))
temp_path.write_bytes(_IAM_ENCRYPTED_PREFIX + encrypted)
else:
temp_path.write_text(json_text, encoding='utf-8')
temp_path.replace(self.config_path)
except (OSError, PermissionError) as e:
raise IamError(f"Cannot save IAM config: {e}")
@@ -470,9 +554,14 @@ class IamService:
def export_config(self, mask_secrets: bool = True) -> Dict[str, Any]:
payload: Dict[str, Any] = {"users": []}
for user in self._raw_config.get("users", []):
record = dict(user)
if mask_secrets and "secret_key" in record:
record["secret_key"] = "••••••••••"
record: Dict[str, Any] = {
"access_key": user["access_key"],
"secret_key": "••••••••••" if mask_secrets else user["secret_key"],
"display_name": user["display_name"],
"policies": user["policies"],
}
if user.get("expires_at"):
record["expires_at"] = user["expires_at"]
payload["users"].append(record)
return payload
@@ -541,8 +630,9 @@ class IamService:
return candidate if candidate in ALLOWED_ACTIONS else ""
def _write_default(self) -> None:
access_key = secrets.token_hex(12)
secret_key = secrets.token_urlsafe(32)
access_key = os.environ.get("ADMIN_ACCESS_KEY", "").strip() or secrets.token_hex(12)
secret_key = os.environ.get("ADMIN_SECRET_KEY", "").strip() or secrets.token_urlsafe(32)
custom_keys = bool(os.environ.get("ADMIN_ACCESS_KEY", "").strip())
default = {
"users": [
{
@@ -555,16 +645,37 @@ class IamService:
}
]
}
self.config_path.write_text(json.dumps(default, indent=2))
json_text = json.dumps(default, indent=2)
if self._fernet:
encrypted = self._fernet.encrypt(json_text.encode("utf-8"))
self.config_path.write_bytes(_IAM_ENCRYPTED_PREFIX + encrypted)
else:
self.config_path.write_text(json_text)
print(f"\n{'='*60}")
print("MYFSIO FIRST RUN - ADMIN CREDENTIALS GENERATED")
print("MYFSIO FIRST RUN - ADMIN CREDENTIALS")
print(f"{'='*60}")
print(f"Access Key: {access_key}")
print(f"Secret Key: {secret_key}")
if custom_keys:
print(f"Access Key: {access_key} (from ADMIN_ACCESS_KEY)")
print(f"Secret Key: {'(from ADMIN_SECRET_KEY)' if os.environ.get('ADMIN_SECRET_KEY', '').strip() else secret_key}")
else:
print(f"Access Key: {access_key}")
print(f"Secret Key: {secret_key}")
print(f"{'='*60}")
print(f"Missed this? Check: {self.config_path}")
if self._fernet:
print("IAM config is encrypted at rest.")
print("Lost credentials? Run: python run.py reset-cred")
else:
print(f"Missed this? Check: {self.config_path}")
print(f"{'='*60}\n")
def _validate_expires_at(self, expires_at: str) -> None:
try:
dt = datetime.fromisoformat(expires_at)
if dt.tzinfo is None:
dt = dt.replace(tzinfo=timezone.utc)
except (ValueError, TypeError):
raise IamError(f"Invalid expires_at format: {expires_at}. Use ISO 8601 (e.g. 2026-12-31T23:59:59Z)")
def _generate_access_key(self) -> str:
return secrets.token_hex(8)
@@ -583,11 +694,15 @@ class IamService:
if cached:
secret_key, cached_time = cached
if now - cached_time < self._cache_ttl:
record = self._users.get(access_key)
if record:
self._check_expiry(access_key, record)
return secret_key
self._maybe_reload()
record = self._users.get(access_key)
if record:
self._check_expiry(access_key, record)
secret_key = record["secret_key"]
self._secret_key_cache[access_key] = (secret_key, now)
return secret_key
@@ -599,11 +714,15 @@ class IamService:
if cached:
principal, cached_time = cached
if now - cached_time < self._cache_ttl:
record = self._users.get(access_key)
if record:
self._check_expiry(access_key, record)
return principal
self._maybe_reload()
record = self._users.get(access_key)
if record:
self._check_expiry(access_key, record)
principal = self._build_principal(access_key, record)
self._principal_cache[access_key] = (principal, now)
return principal
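Taken together, the IAM changes derive a Fernet key from `SECRET_KEY` via PBKDF2, tag encrypted config files with the `MYFSIO_IAM_ENC:` prefix, and re-encrypt configs found in plaintext on the next save. A condensed round-trip sketch of that scheme (error types simplified):

```python
import base64
import hashlib
from pathlib import Path

from cryptography.fernet import Fernet, InvalidToken

PREFIX = b"MYFSIO_IAM_ENC:"

def derive_fernet_key(secret: str) -> bytes:
    # Same derivation as the diff: PBKDF2-HMAC-SHA256, fixed salt,
    # 100k iterations, urlsafe-base64 encoded for Fernet.
    raw = hashlib.pbkdf2_hmac("sha256", secret.encode(),
                              b"myfsio-iam-encryption", 100_000)
    return base64.urlsafe_b64encode(raw)

def save_config(path: Path, text: str, fernet: "Fernet | None") -> None:
    if fernet:
        path.write_bytes(PREFIX + fernet.encrypt(text.encode("utf-8")))
    else:
        path.write_text(text, encoding="utf-8")

def load_config(path: Path, fernet: "Fernet | None") -> str:
    raw = path.read_bytes()
    if raw.startswith(PREFIX):
        if not fernet:
            raise RuntimeError("config is encrypted but no key was provided")
        try:
            return fernet.decrypt(raw[len(PREFIX):]).decode("utf-8")
        except InvalidToken as exc:
            raise RuntimeError("SECRET_KEY changed; cannot decrypt") from exc
    return raw.decode("utf-8")  # plaintext: caller may re-save encrypted
```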

View File

@@ -15,29 +15,23 @@ from typing import Any, Dict, List, Optional
from urllib.parse import urlparse
import requests
from urllib3.util.connection import create_connection as _urllib3_create_connection
def _is_safe_url(url: str, allow_internal: bool = False) -> bool:
"""Check if a URL is safe to make requests to (not internal/private).
Args:
url: The URL to check.
allow_internal: If True, allows internal/private IP addresses.
Use for self-hosted deployments on internal networks.
"""
def _resolve_and_check_url(url: str, allow_internal: bool = False) -> Optional[str]:
try:
parsed = urlparse(url)
hostname = parsed.hostname
if not hostname:
return False
return None
cloud_metadata_hosts = {
"metadata.google.internal",
"169.254.169.254",
}
if hostname.lower() in cloud_metadata_hosts:
return False
return None
if allow_internal:
return True
return hostname
blocked_hosts = {
"localhost",
"127.0.0.1",
@@ -46,17 +40,46 @@ def _is_safe_url(url: str, allow_internal: bool = False) -> bool:
"[::1]",
}
if hostname.lower() in blocked_hosts:
return False
return None
try:
resolved_ip = socket.gethostbyname(hostname)
ip = ipaddress.ip_address(resolved_ip)
if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
return False
return None
return resolved_ip
except (socket.gaierror, ValueError):
return False
return True
return None
except Exception:
return False
return None
def _is_safe_url(url: str, allow_internal: bool = False) -> bool:
return _resolve_and_check_url(url, allow_internal) is not None
_dns_pin_lock = threading.Lock()
def _pinned_post(url: str, pinned_ip: str, **kwargs: Any) -> requests.Response:
parsed = urlparse(url)
hostname = parsed.hostname or ""
session = requests.Session()
original_create = _urllib3_create_connection
def _create_pinned(address: Any, *args: Any, **kw: Any) -> Any:
host, req_port = address
if host == hostname:
return original_create((pinned_ip, req_port), *args, **kw)
return original_create(address, *args, **kw)
import urllib3.util.connection as _conn_mod
with _dns_pin_lock:
_conn_mod.create_connection = _create_pinned
try:
return session.post(url, **kwargs)
finally:
_conn_mod.create_connection = original_create
logger = logging.getLogger(__name__)
@@ -344,16 +367,18 @@ class NotificationService:
self._queue.task_done()
def _send_notification(self, event: NotificationEvent, destination: WebhookDestination) -> None:
if not _is_safe_url(destination.url, allow_internal=self._allow_internal_endpoints):
raise RuntimeError(f"Blocked request to cloud metadata service (SSRF protection): {destination.url}")
resolved_ip = _resolve_and_check_url(destination.url, allow_internal=self._allow_internal_endpoints)
if not resolved_ip:
raise RuntimeError(f"Blocked request (SSRF protection): {destination.url}")
payload = event.to_s3_event()
headers = {"Content-Type": "application/json", **destination.headers}
last_error = None
for attempt in range(destination.retry_count):
try:
response = requests.post(
response = _pinned_post(
destination.url,
resolved_ip,
json=payload,
headers=headers,
timeout=destination.timeout_seconds,
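The pinning approach closes the check-then-use gap: the hostname is resolved once, vetted, and the POST is then forced onto that same IP by temporarily swapping urllib3's `create_connection`, so a second DNS answer cannot redirect the request (DNS rebinding). A trimmed sketch of the same idea, omitting the lock, retries, and the loopback/link-local/reserved checks the diff also applies:

```python
import ipaddress
import socket
from typing import Optional
from urllib.parse import urlparse

import requests
import urllib3.util.connection as conn_mod

def resolve_and_check(url: str) -> Optional[str]:
    host = urlparse(url).hostname
    if not host:
        return None
    try:
        ip = socket.gethostbyname(host)
        return None if ipaddress.ip_address(ip).is_private else ip
    except (socket.gaierror, ValueError):
        return None

def pinned_post(url: str, pinned_ip: str, **kwargs) -> requests.Response:
    host = urlparse(url).hostname or ""
    original = conn_mod.create_connection

    def create_pinned(address, *args, **kw):
        req_host, req_port = address
        if req_host == host:
            # Connect to the IP we already vetted, not a fresh lookup.
            return original((pinned_ip, req_port), *args, **kw)
        return original(address, *args, **kw)

    conn_mod.create_connection = create_pinned
    try:
        return requests.post(url, **kwargs)
    finally:
        conn_mod.create_connection = original
```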

View File

@@ -5,6 +5,7 @@ import logging
import random
import threading
import time
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
@@ -138,8 +139,8 @@ class OperationMetricsCollector:
self.interval_seconds = interval_minutes * 60
self.retention_hours = retention_hours
self._lock = threading.Lock()
self._by_method: Dict[str, OperationStats] = {}
self._by_endpoint: Dict[str, OperationStats] = {}
self._by_method: Dict[str, OperationStats] = defaultdict(OperationStats)
self._by_endpoint: Dict[str, OperationStats] = defaultdict(OperationStats)
self._by_status_class: Dict[str, int] = {}
self._error_codes: Dict[str, int] = {}
self._totals = OperationStats()
@@ -211,8 +212,8 @@ class OperationMetricsCollector:
self._prune_old_snapshots()
self._save_history()
self._by_method.clear()
self._by_endpoint.clear()
self._by_method = defaultdict(OperationStats)
self._by_endpoint = defaultdict(OperationStats)
self._by_status_class.clear()
self._error_codes.clear()
self._totals = OperationStats()
@@ -232,12 +233,7 @@ class OperationMetricsCollector:
status_class = f"{status_code // 100}xx"
with self._lock:
if method not in self._by_method:
self._by_method[method] = OperationStats()
self._by_method[method].record(latency_ms, success, bytes_in, bytes_out)
if endpoint_type not in self._by_endpoint:
self._by_endpoint[endpoint_type] = OperationStats()
self._by_endpoint[endpoint_type].record(latency_ms, success, bytes_in, bytes_out)
self._by_status_class[status_class] = self._by_status_class.get(status_class, 0) + 1
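The `defaultdict(OperationStats)` swap removes the membership test and double lookup that `record()` previously paid on every request. The before/after in isolation (`OperationStats` here is a stand-in for the repo's class):

```python
from collections import defaultdict

class OperationStats:
    def __init__(self) -> None:
        self.count = 0
        self.total_latency_ms = 0.0

    def record(self, latency_ms: float) -> None:
        self.count += 1
        self.total_latency_ms += latency_ms

# Before: explicit existence check, then a second lookup to record.
by_method = {}
def record_old(method: str, latency_ms: float) -> None:
    if method not in by_method:
        by_method[method] = OperationStats()
    by_method[method].record(latency_ms)

# After: defaultdict constructs the entry on first access.
by_method_dd = defaultdict(OperationStats)
def record_new(method: str, latency_ms: float) -> None:
    by_method_dd[method].record(latency_ms)
```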

View File

@@ -85,6 +85,9 @@ def _bucket_policies() -> BucketPolicyStore:
def _build_policy_context() -> Dict[str, Any]:
cached = getattr(g, "_policy_context", None)
if cached is not None:
return cached
ctx: Dict[str, Any] = {}
if request.headers.get("Referer"):
ctx["aws:Referer"] = request.headers.get("Referer")
@@ -98,6 +101,7 @@ def _build_policy_context() -> Dict[str, Any]:
ctx["aws:SecureTransport"] = str(request.is_secure).lower()
if request.headers.get("User-Agent"):
ctx["aws:UserAgent"] = request.headers.get("User-Agent")
g._policy_context = ctx
return ctx
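Memoizing the context on `flask.g` means request headers are inspected once no matter how many policy evaluations a request triggers; `g` is request-scoped, so the cache resets automatically. The pattern reduced to a runnable example (the context fields shown are a subset):

```python
from flask import Flask, g, jsonify, request

app = Flask(__name__)

def build_policy_context() -> dict:
    cached = getattr(g, "_policy_context", None)
    if cached is not None:
        return cached  # later calls in the same request land here
    ctx = {}
    if request.headers.get("User-Agent"):
        ctx["aws:UserAgent"] = request.headers["User-Agent"]
    ctx["aws:SecureTransport"] = str(request.is_secure).lower()
    g._policy_context = ctx
    return ctx

@app.get("/check")
def check():
    build_policy_context()                  # builds and caches
    return jsonify(build_policy_context())  # served from g
```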
@@ -267,39 +271,6 @@ def _verify_sigv4_header(req: Any, auth_header: str) -> Principal | None:
if not secret_key:
raise IamError("SignatureDoesNotMatch")
method = req.method
canonical_uri = _get_canonical_uri(req)
query_args = []
for key, value in req.args.items(multi=True):
query_args.append((key, value))
query_args.sort(key=lambda x: (x[0], x[1]))
canonical_query_parts = []
for k, v in query_args:
canonical_query_parts.append(f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}")
canonical_query_string = "&".join(canonical_query_parts)
signed_headers_list = signed_headers_str.split(";")
canonical_headers_parts = []
for header in signed_headers_list:
header_val = req.headers.get(header)
if header_val is None:
header_val = ""
if header.lower() == 'expect' and header_val == "":
header_val = "100-continue"
header_val = " ".join(header_val.split())
canonical_headers_parts.append(f"{header.lower()}:{header_val}\n")
canonical_headers = "".join(canonical_headers_parts)
payload_hash = req.headers.get("X-Amz-Content-Sha256")
if not payload_hash:
payload_hash = hashlib.sha256(req.get_data()).hexdigest()
canonical_request = f"{method}\n{canonical_uri}\n{canonical_query_string}\n{canonical_headers}\n{signed_headers_str}\n{payload_hash}"
amz_date = req.headers.get("X-Amz-Date") or req.headers.get("Date")
if not amz_date:
raise IamError("Missing Date header")
@@ -321,23 +292,50 @@ def _verify_sigv4_header(req: Any, auth_header: str) -> Principal | None:
if 'date' in signed_headers_set:
required_headers.remove('x-amz-date')
required_headers.add('date')
if not required_headers.issubset(signed_headers_set):
raise IamError("Required headers not signed")
credential_scope = f"{date_stamp}/{region}/{service}/aws4_request"
signing_key = _get_signature_key(secret_key, date_stamp, region, service)
canonical_uri = _get_canonical_uri(req)
payload_hash = req.headers.get("X-Amz-Content-Sha256") or "UNSIGNED-PAYLOAD"
if _HAS_RUST:
string_to_sign = _rc.build_string_to_sign(amz_date, credential_scope, canonical_request)
calculated_signature = _rc.compute_signature(signing_key, string_to_sign)
query_params = list(req.args.items(multi=True))
header_values = [(h, req.headers.get(h) or "") for h in signed_headers_str.split(";")]
if not _rc.verify_sigv4_signature(
req.method, canonical_uri, query_params, signed_headers_str,
header_values, payload_hash, amz_date, date_stamp, region,
service, secret_key, signature,
):
raise IamError("SignatureDoesNotMatch")
else:
method = req.method
query_args = sorted(req.args.items(multi=True), key=lambda x: (x[0], x[1]))
canonical_query_parts = []
for k, v in query_args:
canonical_query_parts.append(f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}")
canonical_query_string = "&".join(canonical_query_parts)
signed_headers_list = signed_headers_str.split(";")
canonical_headers_parts = []
for header in signed_headers_list:
header_val = req.headers.get(header)
if header_val is None:
header_val = ""
if header.lower() == 'expect' and header_val == "":
header_val = "100-continue"
header_val = " ".join(header_val.split())
canonical_headers_parts.append(f"{header.lower()}:{header_val}\n")
canonical_headers = "".join(canonical_headers_parts)
canonical_request = f"{method}\n{canonical_uri}\n{canonical_query_string}\n{canonical_headers}\n{signed_headers_str}\n{payload_hash}"
credential_scope = f"{date_stamp}/{region}/{service}/aws4_request"
signing_key = _get_signature_key(secret_key, date_stamp, region, service)
string_to_sign = f"AWS4-HMAC-SHA256\n{amz_date}\n{credential_scope}\n{hashlib.sha256(canonical_request.encode('utf-8')).hexdigest()}"
calculated_signature = hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
if not hmac.compare_digest(calculated_signature, signature):
if current_app.config.get("DEBUG_SIGV4"):
logger.warning("SigV4 signature mismatch for %s %s", method, req.path)
raise IamError("SignatureDoesNotMatch")
if not hmac.compare_digest(calculated_signature, signature):
raise IamError("SignatureDoesNotMatch")
session_token = req.headers.get("X-Amz-Security-Token")
if session_token:
@@ -366,14 +364,21 @@ def _verify_sigv4_query(req: Any) -> Principal | None:
req_time = datetime.strptime(amz_date, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
except ValueError:
raise IamError("Invalid Date format")
now = datetime.now(timezone.utc)
tolerance = timedelta(seconds=current_app.config.get("SIGV4_TIMESTAMP_TOLERANCE_SECONDS", 900))
if req_time > now + tolerance:
raise IamError("Request date is too far in the future")
try:
expires_seconds = int(expires)
if expires_seconds <= 0:
raise IamError("Invalid Expires value: must be positive")
except ValueError:
raise IamError("Invalid Expires value: must be an integer")
min_expiry = current_app.config.get("PRESIGNED_URL_MIN_EXPIRY_SECONDS", 1)
max_expiry = current_app.config.get("PRESIGNED_URL_MAX_EXPIRY_SECONDS", 604800)
if expires_seconds < min_expiry or expires_seconds > max_expiry:
raise IamError(f"Expiration must be between {min_expiry} second(s) and {max_expiry} seconds")
if now > req_time + timedelta(seconds=expires_seconds):
raise IamError("Request expired")
@@ -381,53 +386,58 @@ def _verify_sigv4_query(req: Any) -> Principal | None:
if not secret_key:
raise IamError("Invalid access key")
method = req.method
canonical_uri = _get_canonical_uri(req)
query_args = []
for key, value in req.args.items(multi=True):
if key != "X-Amz-Signature":
query_args.append((key, value))
query_args.sort(key=lambda x: (x[0], x[1]))
canonical_query_parts = []
for k, v in query_args:
canonical_query_parts.append(f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}")
canonical_query_string = "&".join(canonical_query_parts)
signed_headers_list = signed_headers_str.split(";")
canonical_headers_parts = []
for header in signed_headers_list:
val = req.headers.get(header, "").strip()
if header.lower() == 'expect' and val == "":
val = "100-continue"
val = " ".join(val.split())
canonical_headers_parts.append(f"{header.lower()}:{val}\n")
canonical_headers = "".join(canonical_headers_parts)
payload_hash = "UNSIGNED-PAYLOAD"
canonical_request = "\n".join([
method,
canonical_uri,
canonical_query_string,
canonical_headers,
signed_headers_str,
payload_hash
])
credential_scope = f"{date_stamp}/{region}/{service}/aws4_request"
signing_key = _get_signature_key(secret_key, date_stamp, region, service)
if _HAS_RUST:
string_to_sign = _rc.build_string_to_sign(amz_date, credential_scope, canonical_request)
calculated_signature = _rc.compute_signature(signing_key, string_to_sign)
query_params = [(k, v) for k, v in req.args.items(multi=True) if k != "X-Amz-Signature"]
header_values = [(h, req.headers.get(h) or "") for h in signed_headers_str.split(";")]
if not _rc.verify_sigv4_signature(
req.method, canonical_uri, query_params, signed_headers_str,
header_values, "UNSIGNED-PAYLOAD", amz_date, date_stamp, region,
service, secret_key, signature,
):
raise IamError("SignatureDoesNotMatch")
else:
method = req.method
query_args = []
for key, value in req.args.items(multi=True):
if key != "X-Amz-Signature":
query_args.append((key, value))
query_args.sort(key=lambda x: (x[0], x[1]))
canonical_query_parts = []
for k, v in query_args:
canonical_query_parts.append(f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}")
canonical_query_string = "&".join(canonical_query_parts)
signed_headers_list = signed_headers_str.split(";")
canonical_headers_parts = []
for header in signed_headers_list:
val = req.headers.get(header, "").strip()
if header.lower() == 'expect' and val == "":
val = "100-continue"
val = " ".join(val.split())
canonical_headers_parts.append(f"{header.lower()}:{val}\n")
canonical_headers = "".join(canonical_headers_parts)
payload_hash = "UNSIGNED-PAYLOAD"
canonical_request = "\n".join([
method,
canonical_uri,
canonical_query_string,
canonical_headers,
signed_headers_str,
payload_hash
])
credential_scope = f"{date_stamp}/{region}/{service}/aws4_request"
signing_key = _get_signature_key(secret_key, date_stamp, region, service)
hashed_request = hashlib.sha256(canonical_request.encode('utf-8')).hexdigest()
string_to_sign = f"AWS4-HMAC-SHA256\n{amz_date}\n{credential_scope}\n{hashed_request}"
calculated_signature = hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
if not hmac.compare_digest(calculated_signature, signature):
raise IamError("SignatureDoesNotMatch")
if not hmac.compare_digest(calculated_signature, signature):
raise IamError("SignatureDoesNotMatch")
session_token = req.args.get("X-Amz-Security-Token")
if session_token:
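The query-string path now rejects three classes of bad presigned URLs before any signature work: timestamps too far in the future, malformed or non-positive `Expires` values, and expiries outside the configured window. The validation sequence extracted into one function (defaults match the diff; `ValueError` stands in for `IamError`):

```python
from datetime import datetime, timedelta, timezone

def validate_presigned(amz_date: str, expires: str, tolerance_s: int = 900,
                       min_expiry: int = 1, max_expiry: int = 604800) -> None:
    try:
        req_time = datetime.strptime(amz_date, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    except ValueError:
        raise ValueError("Invalid Date format")
    now = datetime.now(timezone.utc)
    if req_time > now + timedelta(seconds=tolerance_s):
        raise ValueError("Request date is too far in the future")
    try:
        expires_seconds = int(expires)
    except ValueError:
        raise ValueError("Invalid Expires value: must be an integer")
    if expires_seconds <= 0:
        raise ValueError("Invalid Expires value: must be positive")
    if not (min_expiry <= expires_seconds <= max_expiry):
        raise ValueError(f"Expiration must be between {min_expiry} and "
                         f"{max_expiry} seconds")
    if now > req_time + timedelta(seconds=expires_seconds):
        raise ValueError("Request expired")
```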
@@ -586,7 +596,11 @@ def _validate_presigned_request(action: str, bucket_name: str, object_key: str)
request_time = datetime.strptime(amz_date, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
except ValueError as exc:
raise IamError("Invalid X-Amz-Date") from exc
if datetime.now(timezone.utc) > request_time + timedelta(seconds=expiry):
now = datetime.now(timezone.utc)
tolerance = timedelta(seconds=current_app.config.get("SIGV4_TIMESTAMP_TOLERANCE_SECONDS", 900))
if request_time > now + tolerance:
raise IamError("Request date is too far in the future")
if now > request_time + timedelta(seconds=expiry):
raise IamError("Presigned URL expired")
signed_headers_list = [header.strip().lower() for header in signed_headers.split(";") if header]
@@ -662,7 +676,7 @@ def _extract_request_metadata() -> Dict[str, str]:
for header, value in request.headers.items():
if header.lower().startswith("x-amz-meta-"):
key = header[11:]
if key:
if key and not (key.startswith("__") and key.endswith("__")):
metadata[key] = value
return metadata
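Both metadata ingestion paths (the `X-Amz-Meta-*` headers here and the POST form fields later in the diff) now drop dunder-style keys, which the server reserves for internal bookkeeping such as `__etag__` and `__content_type__`. The filter in isolation:

```python
def extract_user_metadata(headers: dict) -> dict:
    metadata = {}
    for header, value in headers.items():
        if header.lower().startswith("x-amz-meta-"):
            key = header[11:]  # strip the 11-char "x-amz-meta-" prefix
            # Reject __name__ keys so clients cannot spoof internal
            # fields like __etag__ or __content_type__.
            if key and not (key.startswith("__") and key.endswith("__")):
                metadata[key] = value
    return metadata

assert extract_user_metadata({
    "x-amz-meta-owner": "alice",
    "x-amz-meta-__etag__": "spoofed",
}) == {"owner": "alice"}
```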
@@ -986,7 +1000,7 @@ def _render_encryption_document(config: dict[str, Any]) -> Element:
return root
def _stream_file(path, chunk_size: int = 64 * 1024):
def _stream_file(path, chunk_size: int = 256 * 1024):
with path.open("rb") as handle:
while True:
chunk = handle.read(chunk_size)
@@ -1011,14 +1025,20 @@ def _apply_object_headers(
file_stat,
metadata: Dict[str, str] | None,
etag: str,
size_override: int | None = None,
mtime_override: float | None = None,
) -> None:
if file_stat is not None:
if response.status_code != 206:
response.headers["Content-Length"] = str(file_stat.st_size)
response.headers["Last-Modified"] = http_date(file_stat.st_mtime)
effective_size = size_override if size_override is not None else (file_stat.st_size if file_stat is not None else None)
effective_mtime = mtime_override if mtime_override is not None else (file_stat.st_mtime if file_stat is not None else None)
if effective_size is not None and response.status_code != 206:
response.headers["Content-Length"] = str(effective_size)
if effective_mtime is not None:
response.headers["Last-Modified"] = http_date(effective_mtime)
response.headers["ETag"] = f'"{etag}"'
response.headers["Accept-Ranges"] = "bytes"
for key, value in (metadata or {}).items():
if key.startswith("__") and key.endswith("__"):
continue
safe_value = _sanitize_header_value(str(value))
response.headers[f"X-Amz-Meta-{key}"] = safe_value
@@ -1039,6 +1059,7 @@ def _maybe_handle_bucket_subresource(bucket_name: str) -> Response | None:
"logging": _bucket_logging_handler,
"uploads": _bucket_uploads_handler,
"policy": _bucket_policy_handler,
"policyStatus": _bucket_policy_status_handler,
"replication": _bucket_replication_handler,
"website": _bucket_website_handler,
}
@@ -1321,8 +1342,8 @@ def _bucket_cors_handler(bucket_name: str) -> Response:
def _bucket_encryption_handler(bucket_name: str) -> Response:
if request.method not in {"GET", "PUT"}:
return _method_not_allowed(["GET", "PUT"])
if request.method not in {"GET", "PUT", "DELETE"}:
return _method_not_allowed(["GET", "PUT", "DELETE"])
principal, error = _require_principal()
if error:
return error
@@ -1343,6 +1364,13 @@ def _bucket_encryption_handler(bucket_name: str) -> Response:
404,
)
return _xml_response(_render_encryption_document(config))
if request.method == "DELETE":
try:
storage.set_bucket_encryption(bucket_name, None)
except StorageError as exc:
return _error_response("NoSuchBucket", str(exc), 404)
current_app.logger.info("Bucket encryption deleted", extra={"bucket": bucket_name})
return Response(status=204)
ct_error = _require_xml_content_type()
if ct_error:
return ct_error
@@ -1439,6 +1467,99 @@ def _bucket_acl_handler(bucket_name: str) -> Response:
return _xml_response(root)
def _object_acl_handler(bucket_name: str, object_key: str) -> Response:
from .acl import create_canned_acl, GRANTEE_ALL_USERS, GRANTEE_AUTHENTICATED_USERS
if request.method not in {"GET", "PUT"}:
return _method_not_allowed(["GET", "PUT"])
storage = _storage()
try:
path = storage.get_object_path(bucket_name, object_key)
except (StorageError, FileNotFoundError):
return _error_response("NoSuchKey", "Object not found", 404)
if request.method == "PUT":
principal, error = _object_principal("write", bucket_name, object_key)
if error:
return error
owner_id = principal.access_key if principal else "anonymous"
canned_acl = request.headers.get("x-amz-acl", "private")
acl = create_canned_acl(canned_acl, owner_id)
acl_service = _acl()
metadata = storage.get_object_metadata(bucket_name, object_key)
metadata.update(acl_service.create_object_acl_metadata(acl))
safe_key = storage._sanitize_object_key(object_key, storage._object_key_max_length_bytes)
storage._write_metadata(bucket_name, safe_key, metadata)
current_app.logger.info("Object ACL set", extra={"bucket": bucket_name, "key": object_key, "acl": canned_acl})
return Response(status=200)
principal, error = _object_principal("read", bucket_name, object_key)
if error:
return error
owner_id = principal.access_key if principal else "anonymous"
acl_service = _acl()
metadata = storage.get_object_metadata(bucket_name, object_key)
acl = acl_service.get_object_acl(bucket_name, object_key, metadata)
if not acl:
acl = create_canned_acl("private", owner_id)
root = Element("AccessControlPolicy")
owner_el = SubElement(root, "Owner")
SubElement(owner_el, "ID").text = acl.owner
SubElement(owner_el, "DisplayName").text = acl.owner
acl_el = SubElement(root, "AccessControlList")
for grant in acl.grants:
grant_el = SubElement(acl_el, "Grant")
grantee = SubElement(grant_el, "Grantee")
if grant.grantee == GRANTEE_ALL_USERS:
grantee.set("{http://www.w3.org/2001/XMLSchema-instance}type", "Group")
SubElement(grantee, "URI").text = "http://acs.amazonaws.com/groups/global/AllUsers"
elif grant.grantee == GRANTEE_AUTHENTICATED_USERS:
grantee.set("{http://www.w3.org/2001/XMLSchema-instance}type", "Group")
SubElement(grantee, "URI").text = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"
else:
grantee.set("{http://www.w3.org/2001/XMLSchema-instance}type", "CanonicalUser")
SubElement(grantee, "ID").text = grant.grantee
SubElement(grantee, "DisplayName").text = grant.grantee
SubElement(grant_el, "Permission").text = grant.permission
return _xml_response(root)
def _object_attributes_handler(bucket_name: str, object_key: str) -> Response:
if request.method != "GET":
return _method_not_allowed(["GET"])
principal, error = _object_principal("read", bucket_name, object_key)
if error:
return error
storage = _storage()
try:
path = storage.get_object_path(bucket_name, object_key)
file_stat = path.stat()
metadata = storage.get_object_metadata(bucket_name, object_key)
except (StorageError, FileNotFoundError):
return _error_response("NoSuchKey", "Object not found", 404)
requested = request.headers.get("x-amz-object-attributes", "")
attrs = {a.strip() for a in requested.split(",") if a.strip()}
root = Element("GetObjectAttributesResponse")
if "ETag" in attrs:
etag = metadata.get("__etag__") or storage._compute_etag(path)
SubElement(root, "ETag").text = etag
if "StorageClass" in attrs:
SubElement(root, "StorageClass").text = "STANDARD"
if "ObjectSize" in attrs:
SubElement(root, "ObjectSize").text = str(file_stat.st_size)
if "Checksum" in attrs:
SubElement(root, "Checksum")
if "ObjectParts" in attrs:
SubElement(root, "ObjectParts")
response = _xml_response(root)
response.headers["Last-Modified"] = http_date(file_stat.st_mtime)
return response
def _bucket_list_versions_handler(bucket_name: str) -> Response:
"""Handle ListObjectVersions (GET /<bucket>?versions)."""
if request.method != "GET":
@@ -2346,7 +2467,7 @@ def _post_object(bucket_name: str) -> Response:
for field_name, value in request.form.items():
if field_name.lower().startswith("x-amz-meta-"):
key = field_name[11:]
if key:
if key and not (key.startswith("__") and key.endswith("__")):
metadata[key] = value
try:
meta = storage.put_object(bucket_name, object_key, file.stream, metadata=metadata or None)
@@ -2360,6 +2481,10 @@ def _post_object(bucket_name: str) -> Response:
if success_action_redirect:
allowed_hosts = current_app.config.get("ALLOWED_REDIRECT_HOSTS", [])
if not allowed_hosts:
current_app.logger.warning(
"ALLOWED_REDIRECT_HOSTS not configured, falling back to request Host header. "
"Set ALLOWED_REDIRECT_HOSTS for production deployments."
)
allowed_hosts = [request.host]
parsed = urlparse(success_action_redirect)
if parsed.scheme not in ("http", "https"):
@@ -2546,54 +2671,43 @@ def bucket_handler(bucket_name: str) -> Response:
else:
effective_start = marker
fetch_keys = max_keys * 10 if delimiter else max_keys
try:
list_result = storage.list_objects(
bucket_name,
max_keys=fetch_keys,
continuation_token=effective_start or None,
prefix=prefix or None,
)
objects = list_result.objects
if delimiter:
shallow_result = storage.list_objects_shallow(
bucket_name,
prefix=prefix,
delimiter=delimiter,
max_keys=max_keys,
continuation_token=effective_start or None,
)
objects = shallow_result.objects
common_prefixes = shallow_result.common_prefixes
is_truncated = shallow_result.is_truncated
next_marker = shallow_result.next_continuation_token or ""
next_continuation_token = ""
if is_truncated and next_marker and list_type == "2":
next_continuation_token = base64.urlsafe_b64encode(next_marker.encode()).decode("utf-8")
else:
list_result = storage.list_objects(
bucket_name,
max_keys=max_keys,
continuation_token=effective_start or None,
prefix=prefix or None,
)
objects = list_result.objects
common_prefixes = []
is_truncated = list_result.is_truncated
next_marker = ""
next_continuation_token = ""
if is_truncated:
if objects:
next_marker = objects[-1].key
if list_type == "2" and next_marker:
next_continuation_token = base64.urlsafe_b64encode(next_marker.encode()).decode("utf-8")
except StorageError as exc:
return _error_response("NoSuchBucket", str(exc), 404)
common_prefixes: list[str] = []
filtered_objects: list = []
if delimiter:
seen_prefixes: set[str] = set()
for obj in objects:
key_after_prefix = obj.key[len(prefix):] if prefix else obj.key
if delimiter in key_after_prefix:
common_prefix = prefix + key_after_prefix.split(delimiter)[0] + delimiter
if common_prefix not in seen_prefixes:
seen_prefixes.add(common_prefix)
common_prefixes.append(common_prefix)
else:
filtered_objects.append(obj)
objects = filtered_objects
common_prefixes = sorted(common_prefixes)
total_items = len(objects) + len(common_prefixes)
is_truncated = total_items > max_keys or list_result.is_truncated
if len(objects) >= max_keys:
objects = objects[:max_keys]
common_prefixes = []
else:
remaining = max_keys - len(objects)
common_prefixes = common_prefixes[:remaining]
next_marker = ""
next_continuation_token = ""
if is_truncated:
if objects:
next_marker = objects[-1].key
elif common_prefixes:
next_marker = common_prefixes[-1].rstrip(delimiter) if delimiter else common_prefixes[-1]
if list_type == "2" and next_marker:
next_continuation_token = base64.urlsafe_b64encode(next_marker.encode()).decode("utf-8")
if list_type == "2":
root = Element("ListBucketResult")
@@ -2669,6 +2783,12 @@ def object_handler(bucket_name: str, object_key: str):
if "legal-hold" in request.args:
return _object_legal_hold_handler(bucket_name, object_key)
if "acl" in request.args:
return _object_acl_handler(bucket_name, object_key)
if "attributes" in request.args:
return _object_attributes_handler(bucket_name, object_key)
if request.method == "POST":
if "uploads" in request.args:
return _initiate_multipart_upload(bucket_name, object_key)
@@ -2708,6 +2828,8 @@ def object_handler(bucket_name: str, object_key: str):
if validation_error:
return _error_response("InvalidArgument", validation_error, 400)
metadata["__content_type__"] = content_type or mimetypes.guess_type(object_key)[0] or "application/octet-stream"
try:
meta = storage.put_object(
bucket_name,
@@ -2722,10 +2844,23 @@ def object_handler(bucket_name: str, object_key: str):
if "Bucket" in message:
return _error_response("NoSuchBucket", message, 404)
return _error_response("InvalidArgument", message, 400)
current_app.logger.info(
"Object uploaded",
extra={"bucket": bucket_name, "key": object_key, "size": meta.size},
)
content_md5 = request.headers.get("Content-MD5")
if content_md5 and meta.etag:
try:
expected_md5 = base64.b64decode(content_md5).hex()
except Exception:
storage.delete_object(bucket_name, object_key)
return _error_response("InvalidDigest", "Content-MD5 header is not valid base64", 400)
if expected_md5 != meta.etag:
storage.delete_object(bucket_name, object_key)
return _error_response("BadDigest", "The Content-MD5 you specified did not match what we received", 400)
if current_app.logger.isEnabledFor(logging.INFO):
current_app.logger.info(
"Object uploaded",
extra={"bucket": bucket_name, "key": object_key, "size": meta.size},
)
response = Response(status=200)
if meta.etag:
response.headers["ETag"] = f'"{meta.etag}"'
@@ -2759,7 +2894,7 @@ def object_handler(bucket_name: str, object_key: str):
except StorageError as exc:
return _error_response("NoSuchKey", str(exc), 404)
metadata = storage.get_object_metadata(bucket_name, object_key)
mimetype = mimetypes.guess_type(object_key)[0] or "application/octet-stream"
mimetype = metadata.get("__content_type__") or mimetypes.guess_type(object_key)[0] or "application/octet-stream"
is_encrypted = "x-amz-server-side-encryption" in metadata
@@ -2816,7 +2951,7 @@ def object_handler(bucket_name: str, object_key: str):
f.seek(start_pos)
remaining = length_to_read
while remaining > 0:
chunk_size = min(65536, remaining)
chunk_size = min(262144, remaining)
chunk = f.read(chunk_size)
if not chunk:
break
@@ -2851,10 +2986,7 @@ def object_handler(bucket_name: str, object_key: str):
response.headers["Content-Type"] = mimetype
logged_bytes = 0
try:
file_stat = path.stat() if not is_encrypted else None
except (PermissionError, OSError):
file_stat = None
file_stat = stat if not is_encrypted else None
_apply_object_headers(response, file_stat=file_stat, metadata=metadata, etag=etag)
if request.method == "GET":
@@ -2871,8 +3003,9 @@ def object_handler(bucket_name: str, object_key: str):
if value:
response.headers[header] = _sanitize_header_value(value)
action = "Object read" if request.method == "GET" else "Object head"
current_app.logger.info(action, extra={"bucket": bucket_name, "key": object_key, "bytes": logged_bytes})
if current_app.logger.isEnabledFor(logging.INFO):
action = "Object read" if request.method == "GET" else "Object head"
current_app.logger.info(action, extra={"bucket": bucket_name, "key": object_key, "bytes": logged_bytes})
return response
if "uploadId" in request.args:
@@ -2890,7 +3023,8 @@ def object_handler(bucket_name: str, object_key: str):
storage.delete_object(bucket_name, object_key)
lock_service.delete_object_lock_metadata(bucket_name, object_key)
current_app.logger.info("Object deleted", extra={"bucket": bucket_name, "key": object_key})
if current_app.logger.isEnabledFor(logging.INFO):
current_app.logger.info("Object deleted", extra={"bucket": bucket_name, "key": object_key})
principal, _ = _require_principal()
_notifications().emit_object_removed(
@@ -2993,6 +3127,32 @@ def _bucket_policy_handler(bucket_name: str) -> Response:
return Response(status=204)
def _bucket_policy_status_handler(bucket_name: str) -> Response:
if request.method != "GET":
return _method_not_allowed(["GET"])
principal, error = _require_principal()
if error:
return error
try:
_authorize_action(principal, bucket_name, "policy")
except IamError as exc:
return _error_response("AccessDenied", str(exc), 403)
storage = _storage()
if not storage.bucket_exists(bucket_name):
return _error_response("NoSuchBucket", "Bucket does not exist", 404)
store = _bucket_policies()
policy = store.get_policy(bucket_name)
is_public = False
if policy:
for statement in policy.get("Statement", []):
if statement.get("Effect") == "Allow" and statement.get("Principal") == "*":
is_public = True
break
root = Element("PolicyStatus")
SubElement(root, "IsPublic").text = "TRUE" if is_public else "FALSE"
return _xml_response(root)
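Clients can query this handler through the standard S3 API; a short sketch (endpoint and credentials are placeholders):

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:5000",
    aws_access_key_id="...",
    aws_secret_access_key="...",
)

status = s3.get_bucket_policy_status(Bucket="my-bucket")
# True when the bucket policy has an Allow statement with Principal "*".
print(status["PolicyStatus"]["IsPublic"])
```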
def _bucket_replication_handler(bucket_name: str) -> Response:
if request.method not in {"GET", "PUT", "DELETE"}:
return _method_not_allowed(["GET", "PUT", "DELETE"])
@@ -3205,12 +3365,20 @@ def head_object(bucket_name: str, object_key: str) -> Response:
_authorize_action(principal, bucket_name, "read", object_key=object_key)
path = _storage().get_object_path(bucket_name, object_key)
metadata = _storage().get_object_metadata(bucket_name, object_key)
stat = path.stat()
etag = _storage()._compute_etag(path)
response = Response(status=200)
_apply_object_headers(response, file_stat=stat, metadata=metadata, etag=etag)
response.headers["Content-Type"] = mimetypes.guess_type(object_key)[0] or "application/octet-stream"
etag = metadata.get("__etag__") or _storage()._compute_etag(path)
cached_size = metadata.get("__size__")
cached_mtime = metadata.get("__last_modified__")
if cached_size is not None and cached_mtime is not None:
size_val = int(cached_size)
mtime_val = float(cached_mtime)
response = Response(status=200)
_apply_object_headers(response, file_stat=None, metadata=metadata, etag=etag, size_override=size_val, mtime_override=mtime_val)
else:
stat = path.stat()
response = Response(status=200)
_apply_object_headers(response, file_stat=stat, metadata=metadata, etag=etag)
response.headers["Content-Type"] = metadata.get("__content_type__") or mimetypes.guess_type(object_key)[0] or "application/octet-stream"
return response
except (StorageError, FileNotFoundError):
return _error_response("NoSuchKey", "Object not found", 404)
@@ -3299,8 +3467,8 @@ def _copy_object(dest_bucket: str, dest_key: str, copy_source: str) -> Response:
if validation_error:
return _error_response("InvalidArgument", validation_error, 400)
else:
metadata = source_metadata
metadata = {k: v for k, v in source_metadata.items() if not (k.startswith("__") and k.endswith("__"))}
try:
with source_path.open("rb") as stream:
meta = storage.put_object(
@@ -3440,10 +3608,12 @@ def _initiate_multipart_upload(bucket_name: str, object_key: str) -> Response:
return error
metadata = _extract_request_metadata()
content_type = request.headers.get("Content-Type")
metadata["__content_type__"] = content_type or mimetypes.guess_type(object_key)[0] or "application/octet-stream"
try:
upload_id = _storage().initiate_multipart_upload(
bucket_name,
object_key,
bucket_name,
object_key,
metadata=metadata or None
)
except StorageError as exc:
@@ -3492,6 +3662,15 @@ def _upload_part(bucket_name: str, object_key: str) -> Response:
return _error_response("NoSuchUpload", str(exc), 404)
return _error_response("InvalidArgument", str(exc), 400)
content_md5 = request.headers.get("Content-MD5")
if content_md5 and etag:
try:
expected_md5 = base64.b64decode(content_md5).hex()
except Exception:
return _error_response("InvalidDigest", "Content-MD5 header is not valid base64", 400)
if expected_md5 != etag:
return _error_response("BadDigest", "The Content-MD5 you specified did not match what we received", 400)
response = Response(status=200)
response.headers["ETag"] = f'"{etag}"'
return response


@@ -245,6 +245,7 @@ def stream_objects_ndjson(
url_templates: dict[str, str],
display_tz: str = "UTC",
versioning_enabled: bool = False,
delimiter: Optional[str] = None,
) -> Generator[str, None, None]:
meta_line = json.dumps({
"type": "meta",
@@ -258,11 +259,20 @@ def stream_objects_ndjson(
kwargs: dict[str, Any] = {"Bucket": bucket_name, "MaxKeys": 1000}
if prefix:
kwargs["Prefix"] = prefix
if delimiter:
kwargs["Delimiter"] = delimiter
running_count = 0
try:
paginator = client.get_paginator("list_objects_v2")
for page in paginator.paginate(**kwargs):
for obj in page.get("Contents", []):
for cp in page.get("CommonPrefixes", []):
yield json.dumps({
"type": "folder",
"prefix": cp["Prefix"],
}) + "\n"
page_contents = page.get("Contents", [])
for obj in page_contents:
last_mod = obj["LastModified"]
yield json.dumps({
"type": "object",
@@ -273,6 +283,8 @@ def stream_objects_ndjson(
"last_modified_iso": format_datetime_iso(last_mod, display_tz),
"etag": obj.get("ETag", "").strip('"'),
}) + "\n"
running_count += len(page_contents)
yield json.dumps({"type": "count", "total_count": running_count}) + "\n"
except ClientError as exc:
error_msg = exc.response.get("Error", {}).get("Message", "S3 operation failed")
yield json.dumps({"type": "error", "error": error_msg}) + "\n"

File diff suppressed because it is too large.

app/ui.py

@@ -508,11 +508,15 @@ def bucket_detail(bucket_name: str):
can_manage_quota = is_replication_admin
website_config = None
website_domains = []
if website_hosting_enabled:
try:
website_config = storage.get_bucket_website(bucket_name)
except StorageError:
website_config = None
domain_store = current_app.extensions.get("website_domains")
if domain_store:
website_domains = domain_store.get_domains_for_bucket(bucket_name)
objects_api_url = url_for("ui.list_bucket_objects", bucket_name=bucket_name)
objects_stream_url = url_for("ui.stream_bucket_objects", bucket_name=bucket_name)
@@ -558,6 +562,7 @@ def bucket_detail(bucket_name: str):
site_sync_enabled=site_sync_enabled,
website_hosting_enabled=website_hosting_enabled,
website_config=website_config,
website_domains=website_domains,
can_manage_website=can_edit_policy,
)
@@ -611,6 +616,7 @@ def stream_bucket_objects(bucket_name: str):
return jsonify({"error": str(exc)}), 403
prefix = request.args.get("prefix") or None
delimiter = request.args.get("delimiter") or None
try:
client = get_session_s3_client()
@@ -624,6 +630,7 @@ def stream_bucket_objects(bucket_name: str):
return Response(
stream_objects_ndjson(
client, bucket_name, prefix, url_templates, display_tz, versioning_enabled,
delimiter=delimiter,
),
mimetype='application/x-ndjson',
headers={
@@ -634,6 +641,33 @@ def stream_bucket_objects(bucket_name: str):
)
@ui_bp.get("/buckets/<bucket_name>/objects/search")
@limiter.limit("30 per minute")
def search_bucket_objects(bucket_name: str):
principal = _current_principal()
try:
_authorize_ui(principal, bucket_name, "list")
except IamError as exc:
return jsonify({"error": str(exc)}), 403
query = request.args.get("q", "").strip()
if not query:
return jsonify({"results": [], "truncated": False})
try:
limit = max(1, min(int(request.args.get("limit", 500)), 1000))
except (ValueError, TypeError):
limit = 500
prefix = request.args.get("prefix", "").strip()
storage = _storage()
try:
return jsonify(storage.search_objects(bucket_name, query, prefix=prefix, limit=limit))
except StorageError as exc:
return jsonify({"error": str(exc)}), 404
@ui_bp.post("/buckets/<bucket_name>/upload")
@limiter.limit("30 per minute")
def upload_object(bucket_name: str):
@@ -738,7 +772,6 @@ def initiate_multipart_upload(bucket_name: str):
@ui_bp.put("/buckets/<bucket_name>/multipart/<upload_id>/parts")
@limiter.exempt
@csrf.exempt
def upload_multipart_part(bucket_name: str, upload_id: str):
principal = _current_principal()
@@ -1297,12 +1330,14 @@ def object_versions(bucket_name: str, object_key: str):
for v in resp.get("Versions", []):
if v.get("Key") != object_key:
continue
if v.get("IsLatest", False):
continue
versions.append({
"version_id": v.get("VersionId", ""),
"last_modified": v["LastModified"].isoformat() if v.get("LastModified") else None,
"size": v.get("Size", 0),
"etag": v.get("ETag", "").strip('"'),
"is_latest": v.get("IsLatest", False),
"is_latest": False,
})
return jsonify({"versions": versions})
except (ClientError, EndpointConnectionError, ConnectionClosedError) as exc:
@@ -1719,6 +1754,10 @@ def iam_dashboard():
users = iam_service.list_users() if not locked else []
config_summary = iam_service.config_summary()
config_document = json.dumps(iam_service.export_config(mask_secrets=True), indent=2)
from datetime import datetime as _dt, timedelta as _td, timezone as _tz
_now = _dt.now(_tz.utc)
now_iso = _now.isoformat()
soon_iso = (_now + _td(days=7)).isoformat()
return render_template(
"iam.html",
users=users,
@@ -1728,6 +1767,8 @@ def iam_dashboard():
config_summary=config_summary,
config_document=config_document,
disclosed_secret=disclosed_secret,
now_iso=now_iso,
soon_iso=soon_iso,
)
@@ -1747,6 +1788,8 @@ def create_iam_user():
return jsonify({"error": "Display name must be 64 characters or fewer"}), 400
flash("Display name must be 64 characters or fewer", "danger")
return redirect(url_for("ui.iam_dashboard"))
custom_access_key = request.form.get("access_key", "").strip() or None
custom_secret_key = request.form.get("secret_key", "").strip() or None
policies_text = request.form.get("policies", "").strip()
policies = None
if policies_text:
@@ -1757,8 +1800,21 @@ def create_iam_user():
return jsonify({"error": f"Invalid JSON: {exc}"}), 400
flash(f"Invalid JSON: {exc}", "danger")
return redirect(url_for("ui.iam_dashboard"))
expires_at = request.form.get("expires_at", "").strip() or None
if expires_at:
try:
from datetime import datetime as _dt, timezone as _tz
exp_dt = _dt.fromisoformat(expires_at)
if exp_dt.tzinfo is None:
exp_dt = exp_dt.replace(tzinfo=_tz.utc)
expires_at = exp_dt.isoformat()
except (ValueError, TypeError):
if _wants_json():
return jsonify({"error": "Invalid expiry date format"}), 400
flash("Invalid expiry date format", "danger")
return redirect(url_for("ui.iam_dashboard"))
try:
created = _iam().create_user(display_name=display_name, policies=policies)
created = _iam().create_user(display_name=display_name, policies=policies, access_key=custom_access_key, secret_key=custom_secret_key, expires_at=expires_at)
except IamError as exc:
if _wants_json():
return jsonify({"error": str(exc)}), 400
@@ -1932,6 +1988,45 @@ def update_iam_policies(access_key: str):
return redirect(url_for("ui.iam_dashboard"))
@ui_bp.post("/iam/users/<access_key>/expiry")
def update_iam_expiry(access_key: str):
principal = _current_principal()
try:
_iam().authorize(principal, None, "iam:update_policy")
except IamError as exc:
if _wants_json():
return jsonify({"error": str(exc)}), 403
flash(str(exc), "danger")
return redirect(url_for("ui.iam_dashboard"))
expires_at = request.form.get("expires_at", "").strip() or None
if expires_at:
try:
from datetime import datetime as _dt, timezone as _tz
exp_dt = _dt.fromisoformat(expires_at)
if exp_dt.tzinfo is None:
exp_dt = exp_dt.replace(tzinfo=_tz.utc)
expires_at = exp_dt.isoformat()
except (ValueError, TypeError):
if _wants_json():
return jsonify({"error": "Invalid expiry date format"}), 400
flash("Invalid expiry date format", "danger")
return redirect(url_for("ui.iam_dashboard"))
try:
_iam().update_user_expiry(access_key, expires_at)
if _wants_json():
return jsonify({"success": True, "message": f"Updated expiry for {access_key}", "expires_at": expires_at})
label = expires_at if expires_at else "never"
flash(f"Expiry for {access_key} set to {label}", "success")
except IamError as exc:
if _wants_json():
return jsonify({"error": str(exc)}), 400
flash(str(exc), "danger")
return redirect(url_for("ui.iam_dashboard"))
@ui_bp.post("/connections")
def create_connection():
principal = _current_principal()
@@ -2374,7 +2469,10 @@ def website_domains_dashboard():
store = current_app.extensions.get("website_domains")
mappings = store.list_all() if store else []
storage = _storage()
buckets = [b.name for b in storage.list_buckets()]
buckets = [
b.name for b in storage.list_buckets()
if storage.get_bucket_website(b.name)
]
return render_template(
"website_domains.html",
mappings=mappings,
@@ -3293,9 +3391,12 @@ def sites_dashboard():
@ui_bp.post("/sites/local")
def update_local_site():
principal = _current_principal()
wants_json = request.headers.get("X-Requested-With") == "XMLHttpRequest"
try:
_iam().authorize(principal, None, "iam:*")
except IamError:
if wants_json:
return jsonify({"error": "Access denied"}), 403
flash("Access denied", "danger")
return redirect(url_for("ui.sites_dashboard"))
@@ -3306,6 +3407,8 @@ def update_local_site():
display_name = request.form.get("display_name", "").strip()
if not site_id:
if wants_json:
return jsonify({"error": "Site ID is required"}), 400
flash("Site ID is required", "danger")
return redirect(url_for("ui.sites_dashboard"))
@@ -3327,6 +3430,8 @@ def update_local_site():
)
registry.set_local_site(site)
if wants_json:
return jsonify({"message": "Local site configuration updated"})
flash("Local site configuration updated", "success")
return redirect(url_for("ui.sites_dashboard"))
@@ -3334,9 +3439,12 @@ def update_local_site():
@ui_bp.post("/sites/peers")
def add_peer_site():
principal = _current_principal()
wants_json = request.headers.get("X-Requested-With") == "XMLHttpRequest"
try:
_iam().authorize(principal, None, "iam:*")
except IamError:
if wants_json:
return jsonify({"error": "Access denied"}), 403
flash("Access denied", "danger")
return redirect(url_for("ui.sites_dashboard"))
@@ -3348,9 +3456,13 @@ def add_peer_site():
connection_id = request.form.get("connection_id", "").strip() or None
if not site_id:
if wants_json:
return jsonify({"error": "Site ID is required"}), 400
flash("Site ID is required", "danger")
return redirect(url_for("ui.sites_dashboard"))
if not endpoint:
if wants_json:
return jsonify({"error": "Endpoint is required"}), 400
flash("Endpoint is required", "danger")
return redirect(url_for("ui.sites_dashboard"))
@@ -3362,10 +3474,14 @@ def add_peer_site():
registry = _site_registry()
if registry.get_peer(site_id):
if wants_json:
return jsonify({"error": f"Peer site '{site_id}' already exists"}), 409
flash(f"Peer site '{site_id}' already exists", "danger")
return redirect(url_for("ui.sites_dashboard"))
if connection_id and not _connections().get(connection_id):
if wants_json:
return jsonify({"error": f"Connection '{connection_id}' not found"}), 404
flash(f"Connection '{connection_id}' not found", "danger")
return redirect(url_for("ui.sites_dashboard"))
@@ -3379,6 +3495,11 @@ def add_peer_site():
)
registry.add_peer(peer)
if wants_json:
redirect_url = None
if connection_id:
redirect_url = url_for("ui.replication_wizard", site_id=site_id)
return jsonify({"message": f"Peer site '{site_id}' added", "redirect": redirect_url})
flash(f"Peer site '{site_id}' added", "success")
if connection_id:
@@ -3389,9 +3510,12 @@ def add_peer_site():
@ui_bp.post("/sites/peers/<site_id>/update")
def update_peer_site(site_id: str):
principal = _current_principal()
wants_json = request.headers.get("X-Requested-With") == "XMLHttpRequest"
try:
_iam().authorize(principal, None, "iam:*")
except IamError:
if wants_json:
return jsonify({"error": "Access denied"}), 403
flash("Access denied", "danger")
return redirect(url_for("ui.sites_dashboard"))
@@ -3399,6 +3523,8 @@ def update_peer_site(site_id: str):
existing = registry.get_peer(site_id)
if not existing:
if wants_json:
return jsonify({"error": f"Peer site '{site_id}' not found"}), 404
flash(f"Peer site '{site_id}' not found", "danger")
return redirect(url_for("ui.sites_dashboard"))
@@ -3406,7 +3532,10 @@ def update_peer_site(site_id: str):
region = request.form.get("region", existing.region).strip()
priority = request.form.get("priority", str(existing.priority))
display_name = request.form.get("display_name", existing.display_name).strip()
connection_id = request.form.get("connection_id", "").strip() or existing.connection_id
if "connection_id" in request.form:
connection_id = request.form["connection_id"].strip() or None
else:
connection_id = existing.connection_id
try:
priority_int = int(priority)
@@ -3414,6 +3543,8 @@ def update_peer_site(site_id: str):
priority_int = existing.priority
if connection_id and not _connections().get(connection_id):
if wants_json:
return jsonify({"error": f"Connection '{connection_id}' not found"}), 404
flash(f"Connection '{connection_id}' not found", "danger")
return redirect(url_for("ui.sites_dashboard"))
@@ -3430,6 +3561,8 @@ def update_peer_site(site_id: str):
)
registry.update_peer(peer)
if wants_json:
return jsonify({"message": f"Peer site '{site_id}' updated"})
flash(f"Peer site '{site_id}' updated", "success")
return redirect(url_for("ui.sites_dashboard"))
@@ -3437,16 +3570,23 @@ def update_peer_site(site_id: str):
@ui_bp.post("/sites/peers/<site_id>/delete")
def delete_peer_site(site_id: str):
principal = _current_principal()
wants_json = request.headers.get("X-Requested-With") == "XMLHttpRequest"
try:
_iam().authorize(principal, None, "iam:*")
except IamError:
if wants_json:
return jsonify({"error": "Access denied"}), 403
flash("Access denied", "danger")
return redirect(url_for("ui.sites_dashboard"))
registry = _site_registry()
if registry.delete_peer(site_id):
if wants_json:
return jsonify({"message": f"Peer site '{site_id}' deleted"})
flash(f"Peer site '{site_id}' deleted", "success")
else:
if wants_json:
return jsonify({"error": f"Peer site '{site_id}' not found"}), 404
flash(f"Peer site '{site_id}' not found", "danger")
return redirect(url_for("ui.sites_dashboard"))


@@ -1,6 +1,6 @@
from __future__ import annotations
APP_VERSION = "0.3.0"
APP_VERSION = "0.3.7"
def get_version() -> str:


@@ -35,13 +35,16 @@ class WebsiteDomainStore:
self.config_path = config_path
self._lock = threading.Lock()
self._domains: Dict[str, str] = {}
self._last_mtime: float = 0.0
self.reload()
def reload(self) -> None:
if not self.config_path.exists():
self._domains = {}
self._last_mtime = 0.0
return
try:
self._last_mtime = self.config_path.stat().st_mtime
with open(self.config_path, "r", encoding="utf-8") as f:
data = json.load(f)
if isinstance(data, dict):
@@ -51,19 +54,45 @@ class WebsiteDomainStore:
except (OSError, json.JSONDecodeError):
self._domains = {}
def _maybe_reload(self) -> None:
try:
if self.config_path.exists():
mtime = self.config_path.stat().st_mtime
if mtime != self._last_mtime:
self._last_mtime = mtime
with open(self.config_path, "r", encoding="utf-8") as f:
data = json.load(f)
if isinstance(data, dict):
self._domains = {k.lower(): v for k, v in data.items()}
else:
self._domains = {}
elif self._domains:
self._domains = {}
self._last_mtime = 0.0
except (OSError, json.JSONDecodeError):
pass
def _save(self) -> None:
self.config_path.parent.mkdir(parents=True, exist_ok=True)
with open(self.config_path, "w", encoding="utf-8") as f:
json.dump(self._domains, f, indent=2)
self._last_mtime = self.config_path.stat().st_mtime
def list_all(self) -> List[Dict[str, str]]:
with self._lock:
self._maybe_reload()
return [{"domain": d, "bucket": b} for d, b in self._domains.items()]
def get_bucket(self, domain: str) -> Optional[str]:
with self._lock:
self._maybe_reload()
return self._domains.get(domain.lower())
def get_domains_for_bucket(self, bucket: str) -> List[str]:
with self._lock:
self._maybe_reload()
return [d for d, b in self._domains.items() if b == bucket]
def set_mapping(self, domain: str, bucket: str) -> None:
with self._lock:
self._domains[domain.lower()] = bucket
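Read paths go through `_maybe_reload()`, so edits made to the JSON file outside the process are picked up on the next call via mtime comparison. A usage sketch, assuming the constructor takes the config path as shown (the path itself is an assumption):

```python
from pathlib import Path

# Domains are normalized to lowercase on write and lookup.
store = WebsiteDomainStore(Path("data/.myfsio.sys/config/website_domains.json"))
store.set_mapping("WWW.Example.com", "my-bucket")
print(store.get_bucket("www.example.com"))        # -> "my-bucket"
print(store.get_domains_for_bucket("my-bucket"))  # -> ["www.example.com"]
```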

docs.md

@@ -139,18 +139,21 @@ All configuration is done via environment variables. The table below lists every
| `API_BASE_URL` | `http://127.0.0.1:5000` | Internal S3 API URL used by the web UI proxy. Also used for presigned URL generation. Set to your public URL if running behind a reverse proxy. |
| `AWS_REGION` | `us-east-1` | Region embedded in SigV4 credential scope. |
| `AWS_SERVICE` | `s3` | Service string for SigV4. |
| `DISPLAY_TIMEZONE` | `UTC` | Timezone for timestamps in the web UI (e.g., `US/Eastern`, `Asia/Tokyo`). |
### IAM & Security
| Variable | Default | Notes |
| --- | --- | --- |
| `IAM_CONFIG` | `data/.myfsio.sys/config/iam.json` | Stores users, secrets, and inline policies. |
| `IAM_CONFIG` | `data/.myfsio.sys/config/iam.json` | Stores users, secrets, and inline policies. Encrypted at rest when `SECRET_KEY` is set. |
| `BUCKET_POLICY_PATH` | `data/.myfsio.sys/config/bucket_policies.json` | Bucket policy store (auto hot-reload). |
| `AUTH_MAX_ATTEMPTS` | `5` | Failed login attempts before lockout. |
| `AUTH_LOCKOUT_MINUTES` | `15` | Lockout duration after max failed attempts. |
| `SESSION_LIFETIME_DAYS` | `30` | How long UI sessions remain valid. |
| `SECRET_TTL_SECONDS` | `300` | TTL for ephemeral secrets (presigned URLs). |
| `UI_ENFORCE_BUCKET_POLICIES` | `false` | Whether the UI should enforce bucket policies. |
| `ADMIN_ACCESS_KEY` | (none) | Custom access key for the admin user on first run or credential reset. If unset, a random key is generated. |
| `ADMIN_SECRET_KEY` | (none) | Custom secret key for the admin user on first run or credential reset. If unset, a random key is generated. |
### CORS (Cross-Origin Resource Sharing)
@@ -170,6 +173,7 @@ All configuration is done via environment variables. The table below lists every
| `RATE_LIMIT_BUCKET_OPS` | `120 per minute` | Rate limit for bucket operations (PUT/DELETE/GET/POST on `/<bucket>`). |
| `RATE_LIMIT_OBJECT_OPS` | `240 per minute` | Rate limit for object operations (PUT/GET/DELETE/POST on `/<bucket>/<key>`). |
| `RATE_LIMIT_HEAD_OPS` | `100 per minute` | Rate limit for HEAD requests (bucket and object). |
| `RATE_LIMIT_ADMIN` | `60 per minute` | Rate limit for admin API endpoints (`/admin/*`). |
| `RATE_LIMIT_STORAGE_URI` | `memory://` | Storage backend for rate limits. Use `redis://host:port` for distributed setups. |
### Server Configuration
@@ -256,6 +260,12 @@ Once enabled, configure lifecycle rules via:
| `MULTIPART_MIN_PART_SIZE` | `5242880` (5 MB) | Minimum part size for multipart uploads. |
| `BUCKET_STATS_CACHE_TTL` | `60` | Seconds to cache bucket statistics. |
| `BULK_DELETE_MAX_KEYS` | `500` | Maximum keys per bulk delete request. |
| `BULK_DOWNLOAD_MAX_BYTES` | `1073741824` (1 GiB) | Maximum total size for bulk ZIP downloads. |
| `OBJECT_CACHE_TTL` | `60` | Seconds to cache object metadata. |
#### Gzip Compression
API responses for JSON, XML, HTML, CSS, and JavaScript are automatically gzip-compressed when the client sends `Accept-Encoding: gzip`. Compression activates for responses larger than 500 bytes and is handled by a WSGI middleware (`app/compression.py`). Binary object downloads and streaming responses are never compressed. No configuration is needed.
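One way to confirm this from a client (URL and credential values are placeholders; note that `requests` sends `Accept-Encoding: gzip` by default and decompresses transparently):

```python
import requests

resp = requests.get(
    "http://localhost:5000/my-bucket?list-type=2",
    headers={"X-Access-Key": "...", "X-Secret-Key": "..."},
)
# "gzip" for compressible responses larger than 500 bytes; absent otherwise.
print(resp.headers.get("Content-Encoding"))
```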
### Server Settings
@@ -269,13 +279,14 @@ Once enabled, configure lifecycle rules via:
Before deploying to production, ensure you:
1. **Set `SECRET_KEY`** - Use a strong, unique value (e.g., `openssl rand -base64 32`)
1. **Set `SECRET_KEY`** - Use a strong, unique value (e.g., `openssl rand -base64 32`). This also enables IAM config encryption at rest.
2. **Restrict CORS** - Set `CORS_ORIGINS` to your specific domains instead of `*`
3. **Configure `API_BASE_URL`** - Required for correct presigned URLs behind proxies
4. **Enable HTTPS** - Use a reverse proxy (nginx, Cloudflare) with TLS termination
5. **Review rate limits** - Adjust `RATE_LIMIT_DEFAULT` based on your needs
6. **Secure master keys** - Back up `ENCRYPTION_MASTER_KEY_PATH` if using encryption
7. **Use `--prod` flag** - Runs with Waitress instead of Flask dev server
8. **Set credential expiry** - Assign `expires_at` to non-admin users for time-limited access
### Proxy Configuration
@@ -285,6 +296,12 @@ If running behind a reverse proxy (e.g., Nginx, Cloudflare, or a tunnel), ensure
The application automatically trusts these headers to generate correct presigned URLs (e.g., `https://s3.example.com/...` instead of `http://127.0.0.1:5000/...`). Alternatively, you can explicitly set `API_BASE_URL` to your public endpoint.
| Variable | Default | Notes |
| --- | --- | --- |
| `NUM_TRUSTED_PROXIES` | `1` | Number of trusted reverse proxies for `X-Forwarded-*` header processing. |
| `ALLOWED_REDIRECT_HOSTS` | `""` | Comma-separated whitelist of safe redirect targets. Empty allows only same-host redirects. |
| `ALLOW_INTERNAL_ENDPOINTS` | `false` | Allow connections to internal/private IPs for webhooks and replication targets. **Keep disabled in production unless needed.** |
## 4. Upgrading and Updates
### Version Checking
@@ -619,9 +636,10 @@ MyFSIO implements a comprehensive Identity and Access Management (IAM) system th
### Getting Started
1. On first boot, `data/.myfsio.sys/config/iam.json` is created with a randomly generated admin user. The access key and secret key are printed to the console during first startup. If you miss it, check the `iam.json` file directly—credentials are stored in plaintext.
1. On first boot, `data/.myfsio.sys/config/iam.json` is created with a randomly generated admin user. The access key and secret key are printed to the console during first startup. You can set `ADMIN_ACCESS_KEY` and `ADMIN_SECRET_KEY` environment variables to use custom credentials instead of random ones. If `SECRET_KEY` is configured, the IAM config file is encrypted at rest using AES (Fernet). To reset admin credentials later, run `python run.py --reset-cred`.
2. Sign into the UI using the generated credentials, then open **IAM**:
- **Create user**: supply a display name and optional JSON inline policy array.
- **Create user**: supply a display name, optional JSON inline policy array, and optional credential expiry date.
- **Set expiry**: assign an expiration date to any user's credentials. Expired credentials are rejected at authentication time. The UI shows expiry badges and preset durations (1h, 24h, 7d, 30d, 90d).
- **Rotate secret**: generates a new secret key; the UI surfaces it once.
- **Policy editor**: select a user, paste an array of objects (`{"bucket": "*", "actions": ["list", "read"]}`), and submit. Alias support includes AWS-style verbs (e.g., `s3:GetObject`).
3. Wildcard action `iam:*` is supported for admin user definitions.
@@ -639,8 +657,11 @@ The API expects every request to include authentication headers. The UI persists
**Security Features:**
- **Lockout Protection**: After `AUTH_MAX_ATTEMPTS` (default: 5) failed login attempts, the account is locked for `AUTH_LOCKOUT_MINUTES` (default: 15 minutes).
- **Credential Expiry**: Each user can have an optional `expires_at` timestamp (ISO 8601). Once expired, all API requests using those credentials are rejected. Set or clear expiry via the UI or API (a timestamp sketch follows this list).
- **IAM Config Encryption**: When `SECRET_KEY` is set, the IAM config file (`iam.json`) is encrypted at rest using Fernet (AES-128-CBC with HMAC-SHA256). Existing plaintext configs are automatically encrypted on next load.
- **Session Management**: UI sessions remain valid for `SESSION_LIFETIME_DAYS` (default: 30 days).
- **Hot Reload**: IAM configuration changes take effect immediately without restart.
- **Credential Reset**: Run `python run.py --reset-cred` to reset admin credentials. Supports `ADMIN_ACCESS_KEY` and `ADMIN_SECRET_KEY` env vars for deterministic keys.
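The expiry endpoints parse `expires_at` with `datetime.fromisoformat` and assume UTC for naive values, so a compatible timestamp can be generated like this:

```python
from datetime import datetime, timedelta, timezone

# Seven days from now, timezone-aware ISO 8601 (always accepted by fromisoformat).
expires_at = (datetime.now(timezone.utc) + timedelta(days=7)).isoformat()
print(expires_at)  # e.g. "2026-03-16T06:25:50.123456+00:00"
```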
### Permission Model
@@ -800,7 +821,8 @@ curl -X POST http://localhost:5000/iam/users \
-H "X-Access-Key: ..." -H "X-Secret-Key: ..." \
-d '{
"display_name": "New User",
"policies": [{"bucket": "*", "actions": ["list", "read"]}]
"policies": [{"bucket": "*", "actions": ["list", "read"]}],
"expires_at": "2026-12-31T23:59:59Z"
}'
# Rotate user secret (requires iam:rotate_key)
@@ -813,6 +835,18 @@ curl -X PUT http://localhost:5000/iam/users/<access-key>/policies \
-H "X-Access-Key: ..." -H "X-Secret-Key: ..." \
-d '[{"bucket": "*", "actions": ["list", "read", "write"]}]'
# Update credential expiry (requires iam:update_policy)
curl -X POST http://localhost:5000/iam/users/<access-key>/expiry \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "X-Access-Key: ..." -H "X-Secret-Key: ..." \
-d 'expires_at=2026-12-31T23:59:59Z'
# Remove credential expiry (never expires)
curl -X POST http://localhost:5000/iam/users/<access-key>/expiry \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "X-Access-Key: ..." -H "X-Secret-Key: ..." \
-d 'expires_at='
# Delete a user (requires iam:delete_user)
curl -X DELETE http://localhost:5000/iam/users/<access-key> \
-H "X-Access-Key: ..." -H "X-Secret-Key: ..."
@@ -824,8 +858,9 @@ When a request is made, permissions are evaluated in this order:
1. **Authentication** Verify the access key and secret key are valid
2. **Lockout Check** Ensure the account is not locked due to failed attempts
3. **IAM Policy Check** Verify the user has the required action for the target bucket
4. **Bucket Policy Check** If a bucket policy exists, verify it allows the action
3. **Expiry Check** Reject requests if the user's credentials have expired (`expires_at`)
4. **IAM Policy Check** Verify the user has the required action for the target bucket
5. **Bucket Policy Check** If a bucket policy exists, verify it allows the action
A request is allowed only if:
- The IAM policy grants the action, AND
@@ -912,7 +947,7 @@ Objects with forward slashes (`/`) in their keys are displayed as a folder hiera
- Select multiple objects using checkboxes
- **Bulk Delete**: Delete multiple objects at once
- **Bulk Download**: Download selected objects as individual files
- **Bulk Download**: Download selected objects as a single ZIP archive (up to `BULK_DOWNLOAD_MAX_BYTES`, default 1 GiB)
#### Search & Filter
@@ -985,6 +1020,7 @@ MyFSIO supports **server-side encryption at rest** to protect your data. When en
|------|-------------|
| **AES-256 (SSE-S3)** | Server-managed encryption using a local master key |
| **KMS (SSE-KMS)** | Encryption using customer-managed keys via the built-in KMS |
| **SSE-C** | Server-side encryption with customer-provided keys (per-request) |
### Enabling Encryption
@@ -1083,6 +1119,44 @@ encrypted, metadata = ClientEncryptionHelper.encrypt_for_upload(plaintext, key)
decrypted = ClientEncryptionHelper.decrypt_from_download(encrypted, metadata, key)
```
### SSE-C (Customer-Provided Keys)
With SSE-C, you provide your own 256-bit AES encryption key with each request. The server encrypts/decrypts using your key but never stores it. You must supply the same key for both upload and download.
**Request headers:**
| Header | Value |
|--------|-------|
| `x-amz-server-side-encryption-customer-algorithm` | `AES256` |
| `x-amz-server-side-encryption-customer-key` | Base64-encoded 256-bit key |
| `x-amz-server-side-encryption-customer-key-MD5` | Base64-encoded MD5 of the key |
```bash
# Generate a 256-bit key
KEY=$(openssl rand -base64 32)
KEY_MD5=$(echo -n "$KEY" | base64 -d | openssl dgst -md5 -binary | base64)
# Upload with SSE-C
curl -X PUT "http://localhost:5000/my-bucket/secret.txt" \
-H "X-Access-Key: ..." -H "X-Secret-Key: ..." \
-H "x-amz-server-side-encryption-customer-algorithm: AES256" \
-H "x-amz-server-side-encryption-customer-key: $KEY" \
-H "x-amz-server-side-encryption-customer-key-MD5: $KEY_MD5" \
--data-binary @secret.txt
# Download with SSE-C (same key required)
curl "http://localhost:5000/my-bucket/secret.txt" \
-H "X-Access-Key: ..." -H "X-Secret-Key: ..." \
-H "x-amz-server-side-encryption-customer-algorithm: AES256" \
-H "x-amz-server-side-encryption-customer-key: $KEY" \
-H "x-amz-server-side-encryption-customer-key-MD5: $KEY_MD5"
```
**Key points:**
- SSE-C does not require `ENCRYPTION_ENABLED` or `KMS_ENABLED` — the key is provided per-request
- If you lose your key, the data is irrecoverable
- The MD5 header is optional but recommended for integrity verification
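The same headers can be produced in Python; a sketch equivalent to the curl example above (endpoint and auth header values are placeholders):

```python
import base64
import hashlib
import os

import requests

key = os.urandom(32)  # 256-bit key; keep it safe, a lost key means unrecoverable data
sse_headers = {
    "x-amz-server-side-encryption-customer-algorithm": "AES256",
    "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode(),
    "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
        hashlib.md5(key).digest()
    ).decode(),
}

requests.put(
    "http://localhost:5000/my-bucket/secret.txt",
    data=b"top secret",
    headers={"X-Access-Key": "...", "X-Secret-Key": "...", **sse_headers},
)
```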
### Important Notes
- **Existing objects are NOT encrypted** - Only new uploads after enabling encryption are encrypted
@@ -1959,6 +2033,20 @@ curl -X PUT "http://localhost:5000/my-bucket/file.txt" \
-H "x-amz-meta-newkey: newvalue"
```
### MoveObject (UI)
Move an object to a different key or bucket. This is a UI-only convenience operation that performs a copy followed by a delete of the source. Requires `read` and `delete` on the source, and `write` on the destination.
```bash
# Move via UI API
curl -X POST "http://localhost:5100/ui/buckets/my-bucket/objects/old-path/file.txt/move" \
-H "Content-Type: application/json" \
--cookie "session=..." \
-d '{"dest_bucket": "other-bucket", "dest_key": "new-path/file.txt"}'
```
The move is performed as copy-then-delete and is not atomic: if the copy succeeds but the delete fails, the object exists in both locations, so nothing is lost but the source may need manual cleanup.
### UploadPartCopy
Copy data from an existing object into a multipart upload part:

myfsio_core/Cargo.lock generated

@@ -1,421 +0,0 @@
# This file is automatically @generated by Cargo.
# It is not intended for manual editing.
version = 4
[[package]]
name = "aho-corasick"
version = "1.1.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ddd31a130427c27518df266943a5308ed92d4b226cc639f5a8f1002816174301"
dependencies = [
"memchr",
]
[[package]]
name = "allocator-api2"
version = "0.2.21"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "683d7910e743518b0e34f1186f92494becacb047c7b6bf616c96772180fef923"
[[package]]
name = "bitflags"
version = "2.11.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "843867be96c8daad0d758b57df9392b6d8d271134fce549de6ce169ff98a92af"
[[package]]
name = "block-buffer"
version = "0.10.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3078c7629b62d3f0439517fa394996acacc5cbc91c5a20d8c658e77abd503a71"
dependencies = [
"generic-array",
]
[[package]]
name = "cfg-if"
version = "1.0.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801"
[[package]]
name = "cpufeatures"
version = "0.2.17"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "59ed5838eebb26a2bb2e58f6d5b5316989ae9d08bab10e0e6d103e656d1b0280"
dependencies = [
"libc",
]
[[package]]
name = "crypto-common"
version = "0.1.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "78c8292055d1c1df0cce5d180393dc8cce0abec0a7102adb6c7b1eef6016d60a"
dependencies = [
"generic-array",
"typenum",
]
[[package]]
name = "digest"
version = "0.10.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9ed9a281f7bc9b7576e61468ba615a66a5c8cfdff42420a70aa82701a3b1e292"
dependencies = [
"block-buffer",
"crypto-common",
"subtle",
]
[[package]]
name = "equivalent"
version = "1.0.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "877a4ace8713b0bcf2a4e7eec82529c029f1d0619886d18145fea96c3ffe5c0f"
[[package]]
name = "foldhash"
version = "0.1.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d9c4f5dac5e15c24eb999c26181a6ca40b39fe946cbe4c263c7209467bc83af2"
[[package]]
name = "generic-array"
version = "0.14.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "85649ca51fd72272d7821adaf274ad91c288277713d9c18820d8499a7ff69e9a"
dependencies = [
"typenum",
"version_check",
]
[[package]]
name = "hashbrown"
version = "0.15.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9229cfe53dfd69f0609a49f65461bd93001ea1ef889cd5529dd176593f5338a1"
dependencies = [
"allocator-api2",
"equivalent",
"foldhash",
]
[[package]]
name = "heck"
version = "0.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2304e00983f87ffb38b55b444b5e3b60a884b5d30c0fca7d82fe33449bbe55ea"
[[package]]
name = "hex"
version = "0.4.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7f24254aa9a54b5c858eaee2f5bccdb46aaf0e486a595ed5fd8f86ba55232a70"
[[package]]
name = "hmac"
version = "0.12.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6c49c37c09c17a53d937dfbb742eb3a961d65a994e6bcdcf37e7399d0cc8ab5e"
dependencies = [
"digest",
]
[[package]]
name = "libc"
version = "0.2.182"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6800badb6cb2082ffd7b6a67e6125bb39f18782f793520caee8cb8846be06112"
[[package]]
name = "lock_api"
version = "0.4.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "224399e74b87b5f3557511d98dff8b14089b3dadafcab6bb93eab67d3aace965"
dependencies = [
"scopeguard",
]
[[package]]
name = "lru"
version = "0.14.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9f8cc7106155f10bdf99a6f379688f543ad6596a415375b36a59a054ceda1198"
dependencies = [
"hashbrown",
]
[[package]]
name = "md-5"
version = "0.10.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d89e7ee0cfbedfc4da3340218492196241d89eefb6dab27de5df917a6d2e78cf"
dependencies = [
"cfg-if",
"digest",
]
[[package]]
name = "memchr"
version = "2.8.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f8ca58f447f06ed17d5fc4043ce1b10dd205e060fb3ce5b979b8ed8e59ff3f79"
[[package]]
name = "myfsio_core"
version = "0.1.0"
dependencies = [
"hex",
"hmac",
"lru",
"md-5",
"parking_lot",
"pyo3",
"regex",
"sha2",
"unicode-normalization",
]
[[package]]
name = "once_cell"
version = "1.21.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "42f5e15c9953c5e4ccceeb2e7382a716482c34515315f7b03532b8b4e8393d2d"
[[package]]
name = "parking_lot"
version = "0.12.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "93857453250e3077bd71ff98b6a65ea6621a19bb0f559a85248955ac12c45a1a"
dependencies = [
"lock_api",
"parking_lot_core",
]
[[package]]
name = "parking_lot_core"
version = "0.9.12"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2621685985a2ebf1c516881c026032ac7deafcda1a2c9b7850dc81e3dfcb64c1"
dependencies = [
"cfg-if",
"libc",
"redox_syscall",
"smallvec",
"windows-link",
]
[[package]]
name = "portable-atomic"
version = "1.13.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c33a9471896f1c69cecef8d20cbe2f7accd12527ce60845ff44c153bb2a21b49"
[[package]]
name = "proc-macro2"
version = "1.0.106"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8fd00f0bb2e90d81d1044c2b32617f68fcb9fa3bb7640c23e9c748e53fb30934"
dependencies = [
"unicode-ident",
]
[[package]]
name = "pyo3"
version = "0.28.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "14c738662e2181be11cb82487628404254902bb3225d8e9e99c31f3ef82a405c"
dependencies = [
"libc",
"once_cell",
"portable-atomic",
"pyo3-build-config",
"pyo3-ffi",
"pyo3-macros",
]
[[package]]
name = "pyo3-build-config"
version = "0.28.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f9ca0864a7dd3c133a7f3f020cbff2e12e88420da854c35540fd20ce2d60e435"
dependencies = [
"target-lexicon",
]
[[package]]
name = "pyo3-ffi"
version = "0.28.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9dfc1956b709823164763a34cc42bbfd26b8730afa77809a3df8b94a3ae3b059"
dependencies = [
"libc",
"pyo3-build-config",
]
[[package]]
name = "pyo3-macros"
version = "0.28.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "29dc660ad948bae134d579661d08033fbb1918f4529c3bbe3257a68f2009ddf2"
dependencies = [
"proc-macro2",
"pyo3-macros-backend",
"quote",
"syn",
]
[[package]]
name = "pyo3-macros-backend"
version = "0.28.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e78cd6c6d718acfcedf26c3d21fe0f053624368b0d44298c55d7138fde9331f7"
dependencies = [
"heck",
"proc-macro2",
"pyo3-build-config",
"quote",
"syn",
]
[[package]]
name = "quote"
version = "1.0.44"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "21b2ebcf727b7760c461f091f9f0f539b77b8e87f2fd88131e7f1b433b3cece4"
dependencies = [
"proc-macro2",
]
[[package]]
name = "redox_syscall"
version = "0.5.18"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ed2bf2547551a7053d6fdfafda3f938979645c44812fbfcda098faae3f1a362d"
dependencies = [
"bitflags",
]
[[package]]
name = "regex"
version = "1.12.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e10754a14b9137dd7b1e3e5b0493cc9171fdd105e0ab477f51b72e7f3ac0e276"
dependencies = [
"aho-corasick",
"memchr",
"regex-automata",
"regex-syntax",
]
[[package]]
name = "regex-automata"
version = "0.4.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6e1dd4122fc1595e8162618945476892eefca7b88c52820e74af6262213cae8f"
dependencies = [
"aho-corasick",
"memchr",
"regex-syntax",
]
[[package]]
name = "regex-syntax"
version = "0.8.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a96887878f22d7bad8a3b6dc5b7440e0ada9a245242924394987b21cf2210a4c"
[[package]]
name = "scopeguard"
version = "1.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "94143f37725109f92c262ed2cf5e59bce7498c01bcc1502d7b9afe439a4e9f49"
[[package]]
name = "sha2"
version = "0.10.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a7507d819769d01a365ab707794a4084392c824f54a7a6a7862f8c3d0892b283"
dependencies = [
"cfg-if",
"cpufeatures",
"digest",
]
[[package]]
name = "smallvec"
version = "1.15.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "67b1b7a3b5fe4f1376887184045fcf45c69e92af734b7aaddc05fb777b6fbd03"
[[package]]
name = "subtle"
version = "2.6.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "13c2bddecc57b384dee18652358fb23172facb8a2c51ccc10d74c157bdea3292"
[[package]]
name = "syn"
version = "2.0.116"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3df424c70518695237746f84cede799c9c58fcb37450d7b23716568cc8bc69cb"
dependencies = [
"proc-macro2",
"quote",
"unicode-ident",
]
[[package]]
name = "target-lexicon"
version = "0.13.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "adb6935a6f5c20170eeceb1a3835a49e12e19d792f6dd344ccc76a985ca5a6ca"
[[package]]
name = "tinyvec"
version = "1.10.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bfa5fdc3bce6191a1dbc8c02d5c8bffcf557bafa17c124c5264a458f1b0613fa"
dependencies = [
"tinyvec_macros",
]
[[package]]
name = "tinyvec_macros"
version = "0.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1f3ccbac311fea05f86f61904b462b55fb3df8837a366dfc601a0161d0532f20"
[[package]]
name = "typenum"
version = "1.19.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "562d481066bde0658276a35467c4af00bdc6ee726305698a55b86e61d7ad82bb"
[[package]]
name = "unicode-ident"
version = "1.0.24"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e6e4313cd5fcd3dad5cafa179702e2b244f760991f45397d14d4ebf38247da75"
[[package]]
name = "unicode-normalization"
version = "0.1.25"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5fd4f6878c9cb28d874b009da9e8d183b5abc80117c40bbd187a1fde336be6e8"
dependencies = [
"tinyvec",
]
[[package]]
name = "version_check"
version = "0.9.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a"
[[package]]
name = "windows-link"
version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f0805222e57f7521d6a62e36fa9163bc891acd422f971defe97d64e70d0a4fe5"


@@ -14,6 +14,11 @@ sha2 = "0.10"
md-5 = "0.10"
hex = "0.4"
unicode-normalization = "0.1"
serde_json = "1"
regex = "1"
lru = "0.14"
parking_lot = "0.12"
percent-encoding = "2"
aes-gcm = "0.10"
hkdf = "0.12"
uuid = { version = "1", features = ["v4"] }

myfsio_core/src/crypto.rs Normal file

@@ -0,0 +1,192 @@
use aes_gcm::aead::Aead;
use aes_gcm::{Aes256Gcm, KeyInit, Nonce};
use hkdf::Hkdf;
use pyo3::exceptions::{PyIOError, PyValueError};
use pyo3::prelude::*;
use sha2::Sha256;
use std::fs::File;
use std::io::{Read, Seek, SeekFrom, Write};
const DEFAULT_CHUNK_SIZE: usize = 65536;
const HEADER_SIZE: usize = 4;
fn read_exact_chunk(reader: &mut impl Read, buf: &mut [u8]) -> std::io::Result<usize> {
let mut filled = 0;
while filled < buf.len() {
match reader.read(&mut buf[filled..]) {
Ok(0) => break,
Ok(n) => filled += n,
Err(ref e) if e.kind() == std::io::ErrorKind::Interrupted => continue,
Err(e) => return Err(e),
}
}
Ok(filled)
}
fn derive_chunk_nonce(base_nonce: &[u8], chunk_index: u32) -> Result<[u8; 12], String> {
let hkdf = Hkdf::<Sha256>::new(Some(base_nonce), b"chunk_nonce");
let mut okm = [0u8; 12];
hkdf.expand(&chunk_index.to_be_bytes(), &mut okm)
.map_err(|e| format!("HKDF expand failed: {}", e))?;
Ok(okm)
}
#[pyfunction]
#[pyo3(signature = (input_path, output_path, key, base_nonce, chunk_size=DEFAULT_CHUNK_SIZE))]
pub fn encrypt_stream_chunked(
py: Python<'_>,
input_path: &str,
output_path: &str,
key: &[u8],
base_nonce: &[u8],
chunk_size: usize,
) -> PyResult<u32> {
if key.len() != 32 {
return Err(PyValueError::new_err(format!(
"Key must be 32 bytes, got {}",
key.len()
)));
}
if base_nonce.len() != 12 {
return Err(PyValueError::new_err(format!(
"Base nonce must be 12 bytes, got {}",
base_nonce.len()
)));
}
let chunk_size = if chunk_size == 0 {
DEFAULT_CHUNK_SIZE
} else {
chunk_size
};
let inp = input_path.to_owned();
let out = output_path.to_owned();
let key_arr: [u8; 32] = key.try_into().unwrap();
let nonce_arr: [u8; 12] = base_nonce.try_into().unwrap();
py.detach(move || {
let cipher = Aes256Gcm::new(&key_arr.into());
let mut infile = File::open(&inp)
.map_err(|e| PyIOError::new_err(format!("Failed to open input: {}", e)))?;
let mut outfile = File::create(&out)
.map_err(|e| PyIOError::new_err(format!("Failed to create output: {}", e)))?;
outfile
.write_all(&[0u8; 4])
.map_err(|e| PyIOError::new_err(format!("Failed to write header: {}", e)))?;
let mut buf = vec![0u8; chunk_size];
let mut chunk_index: u32 = 0;
loop {
let n = read_exact_chunk(&mut infile, &mut buf)
.map_err(|e| PyIOError::new_err(format!("Failed to read: {}", e)))?;
if n == 0 {
break;
}
let nonce_bytes = derive_chunk_nonce(&nonce_arr, chunk_index)
.map_err(|e| PyValueError::new_err(e))?;
let nonce = Nonce::from_slice(&nonce_bytes);
let encrypted = cipher
.encrypt(nonce, &buf[..n])
.map_err(|e| PyValueError::new_err(format!("Encrypt failed: {}", e)))?;
let size = encrypted.len() as u32;
outfile
.write_all(&size.to_be_bytes())
.map_err(|e| PyIOError::new_err(format!("Failed to write chunk size: {}", e)))?;
outfile
.write_all(&encrypted)
.map_err(|e| PyIOError::new_err(format!("Failed to write chunk: {}", e)))?;
chunk_index += 1;
}
outfile
.seek(SeekFrom::Start(0))
.map_err(|e| PyIOError::new_err(format!("Failed to seek: {}", e)))?;
outfile
.write_all(&chunk_index.to_be_bytes())
.map_err(|e| PyIOError::new_err(format!("Failed to write chunk count: {}", e)))?;
Ok(chunk_index)
})
}
#[pyfunction]
pub fn decrypt_stream_chunked(
py: Python<'_>,
input_path: &str,
output_path: &str,
key: &[u8],
base_nonce: &[u8],
) -> PyResult<u32> {
if key.len() != 32 {
return Err(PyValueError::new_err(format!(
"Key must be 32 bytes, got {}",
key.len()
)));
}
if base_nonce.len() != 12 {
return Err(PyValueError::new_err(format!(
"Base nonce must be 12 bytes, got {}",
base_nonce.len()
)));
}
let inp = input_path.to_owned();
let out = output_path.to_owned();
let key_arr: [u8; 32] = key.try_into().unwrap();
let nonce_arr: [u8; 12] = base_nonce.try_into().unwrap();
py.detach(move || {
let cipher = Aes256Gcm::new(&key_arr.into());
let mut infile = File::open(&inp)
.map_err(|e| PyIOError::new_err(format!("Failed to open input: {}", e)))?;
let mut outfile = File::create(&out)
.map_err(|e| PyIOError::new_err(format!("Failed to create output: {}", e)))?;
let mut header = [0u8; HEADER_SIZE];
infile
.read_exact(&mut header)
.map_err(|e| PyIOError::new_err(format!("Failed to read header: {}", e)))?;
let chunk_count = u32::from_be_bytes(header);
let mut size_buf = [0u8; HEADER_SIZE];
for chunk_index in 0..chunk_count {
infile
.read_exact(&mut size_buf)
.map_err(|e| {
PyIOError::new_err(format!(
"Failed to read chunk {} size: {}",
chunk_index, e
))
})?;
let chunk_size = u32::from_be_bytes(size_buf) as usize;
let mut encrypted = vec![0u8; chunk_size];
infile.read_exact(&mut encrypted).map_err(|e| {
PyIOError::new_err(format!("Failed to read chunk {}: {}", chunk_index, e))
})?;
let nonce_bytes = derive_chunk_nonce(&nonce_arr, chunk_index)
.map_err(|e| PyValueError::new_err(e))?;
let nonce = Nonce::from_slice(&nonce_bytes);
let decrypted = cipher.decrypt(nonce, encrypted.as_ref()).map_err(|e| {
PyValueError::new_err(format!("Decrypt chunk {} failed: {}", chunk_index, e))
})?;
outfile.write_all(&decrypted).map_err(|e| {
PyIOError::new_err(format!("Failed to write chunk {}: {}", chunk_index, e))
})?;
}
Ok(chunk_count)
})
}
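The on-disk layout written above is a 4-byte big-endian chunk count, then for each chunk a 4-byte big-endian ciphertext length followed by the AES-256-GCM ciphertext. Each chunk nonce comes from HKDF-SHA256 with the base nonce as salt, `b"chunk_nonce"` as input keying material, and the big-endian chunk index as info. A Python sketch of the nonce derivation (using the `cryptography` package) that mirrors the Rust:

```python
from cryptography.hazmat.primitives.hashes import SHA256
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_chunk_nonce(base_nonce: bytes, chunk_index: int) -> bytes:
    # HKDF(salt=base_nonce, ikm=b"chunk_nonce", info=index_be) -> 12-byte nonce,
    # matching derive_chunk_nonce in crypto.rs.
    hkdf = HKDF(
        algorithm=SHA256(),
        length=12,
        salt=base_nonce,
        info=chunk_index.to_bytes(4, "big"),
    )
    return hkdf.derive(b"chunk_nonce")
```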


@@ -1,5 +1,9 @@
mod crypto;
mod hashing;
mod metadata;
mod sigv4;
mod storage;
mod streaming;
mod validation;
use pyo3::prelude::*;
@@ -10,6 +14,7 @@ mod myfsio_core {
#[pymodule_init]
fn init(m: &Bound<'_, PyModule>) -> PyResult<()> {
m.add_function(wrap_pyfunction!(sigv4::verify_sigv4_signature, m)?)?;
m.add_function(wrap_pyfunction!(sigv4::derive_signing_key, m)?)?;
m.add_function(wrap_pyfunction!(sigv4::compute_signature, m)?)?;
m.add_function(wrap_pyfunction!(sigv4::build_string_to_sign, m)?)?;
@@ -25,6 +30,22 @@ mod myfsio_core {
m.add_function(wrap_pyfunction!(validation::validate_object_key, m)?)?;
m.add_function(wrap_pyfunction!(validation::validate_bucket_name, m)?)?;
m.add_function(wrap_pyfunction!(metadata::read_index_entry, m)?)?;
m.add_function(wrap_pyfunction!(storage::write_index_entry, m)?)?;
m.add_function(wrap_pyfunction!(storage::delete_index_entry, m)?)?;
m.add_function(wrap_pyfunction!(storage::check_bucket_contents, m)?)?;
m.add_function(wrap_pyfunction!(storage::shallow_scan, m)?)?;
m.add_function(wrap_pyfunction!(storage::bucket_stats_scan, m)?)?;
m.add_function(wrap_pyfunction!(storage::search_objects_scan, m)?)?;
m.add_function(wrap_pyfunction!(storage::build_object_cache, m)?)?;
m.add_function(wrap_pyfunction!(streaming::stream_to_file_with_md5, m)?)?;
m.add_function(wrap_pyfunction!(streaming::assemble_parts_with_md5, m)?)?;
m.add_function(wrap_pyfunction!(crypto::encrypt_stream_chunked, m)?)?;
m.add_function(wrap_pyfunction!(crypto::decrypt_stream_chunked, m)?)?;
Ok(())
}
}


@@ -0,0 +1,71 @@
use pyo3::exceptions::PyValueError;
use pyo3::prelude::*;
use pyo3::types::{PyDict, PyList, PyString};
use serde_json::Value;
use std::fs;
const MAX_DEPTH: u32 = 64;
fn value_to_py(py: Python<'_>, v: &Value, depth: u32) -> PyResult<Py<PyAny>> {
if depth > MAX_DEPTH {
return Err(PyValueError::new_err("JSON nesting too deep"));
}
match v {
Value::Null => Ok(py.None()),
Value::Bool(b) => Ok((*b).into_pyobject(py)?.to_owned().into_any().unbind()),
Value::Number(n) => {
if let Some(i) = n.as_i64() {
Ok(i.into_pyobject(py)?.into_any().unbind())
} else if let Some(f) = n.as_f64() {
Ok(f.into_pyobject(py)?.into_any().unbind())
} else {
Ok(py.None())
}
}
Value::String(s) => Ok(PyString::new(py, s).into_any().unbind()),
Value::Array(arr) => {
let list = PyList::empty(py);
for item in arr {
list.append(value_to_py(py, item, depth + 1)?)?;
}
Ok(list.into_any().unbind())
}
Value::Object(map) => {
let dict = PyDict::new(py);
for (k, val) in map {
dict.set_item(k, value_to_py(py, val, depth + 1)?)?;
}
Ok(dict.into_any().unbind())
}
}
}
#[pyfunction]
pub fn read_index_entry(
py: Python<'_>,
path: &str,
entry_name: &str,
) -> PyResult<Option<Py<PyAny>>> {
let path_owned = path.to_owned();
let entry_owned = entry_name.to_owned();
let entry: Option<Value> = py.detach(move || -> PyResult<Option<Value>> {
let content = match fs::read_to_string(&path_owned) {
Ok(c) => c,
Err(_) => return Ok(None),
};
let parsed: Value = match serde_json::from_str(&content) {
Ok(v) => v,
Err(_) => return Ok(None),
};
match parsed {
Value::Object(mut map) => Ok(map.remove(&entry_owned)),
_ => Ok(None),
}
})?;
match entry {
Some(val) => Ok(Some(value_to_py(py, &val, 0)?)),
None => Ok(None),
}
}
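From Python this is a plain call into the compiled extension; it returns the parsed JSON value for the entry, or `None` when the file is missing, unparsable, or lacks the key. A sketch (the path, entry name, and `__etag__` field are assumptions for illustration):

```python
import myfsio_core

entry = myfsio_core.read_index_entry("data/my-bucket/.meta/index.json", "photo.jpg")
if isinstance(entry, dict):
    print(entry.get("__etag__"))
```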


@@ -1,6 +1,7 @@
use hmac::{Hmac, Mac};
use lru::LruCache;
use parking_lot::Mutex;
use percent_encoding::{percent_encode, AsciiSet, NON_ALPHANUMERIC};
use pyo3::prelude::*;
use sha2::{Digest, Sha256};
use std::num::NonZeroUsize;
@@ -19,14 +20,29 @@ static SIGNING_KEY_CACHE: LazyLock<Mutex<LruCache<(String, String, String, Strin
const CACHE_TTL_SECS: u64 = 60;
const AWS_ENCODE_SET: &AsciiSet = &NON_ALPHANUMERIC
.remove(b'-')
.remove(b'_')
.remove(b'.')
.remove(b'~');
fn hmac_sha256(key: &[u8], msg: &[u8]) -> Vec<u8> {
let mut mac = HmacSha256::new_from_slice(key).expect("HMAC key length is always valid");
mac.update(msg);
mac.finalize().into_bytes().to_vec()
}
#[pyfunction]
pub fn derive_signing_key(
fn sha256_hex(data: &[u8]) -> String {
let mut hasher = Sha256::new();
hasher.update(data);
hex::encode(hasher.finalize())
}
fn aws_uri_encode(input: &str) -> String {
percent_encode(input.as_bytes(), AWS_ENCODE_SET).to_string()
}
fn derive_signing_key_cached(
secret_key: &str,
date_stamp: &str,
region: &str,
@@ -68,18 +84,91 @@ pub fn derive_signing_key(
k_signing
}
fn constant_time_compare_inner(a: &[u8], b: &[u8]) -> bool {
if a.len() != b.len() {
return false;
}
let mut result: u8 = 0;
for (x, y) in a.iter().zip(b.iter()) {
result |= x ^ y;
}
result == 0
}
#[pyfunction]
pub fn verify_sigv4_signature(
method: &str,
canonical_uri: &str,
query_params: Vec<(String, String)>,
signed_headers_str: &str,
header_values: Vec<(String, String)>,
payload_hash: &str,
amz_date: &str,
date_stamp: &str,
region: &str,
service: &str,
secret_key: &str,
provided_signature: &str,
) -> bool {
let mut sorted_params = query_params;
sorted_params.sort_by(|a, b| a.0.cmp(&b.0).then_with(|| a.1.cmp(&b.1)));
let canonical_query_string = sorted_params
.iter()
.map(|(k, v)| format!("{}={}", aws_uri_encode(k), aws_uri_encode(v)))
.collect::<Vec<_>>()
.join("&");
let mut canonical_headers = String::new();
for (name, value) in &header_values {
let lower_name = name.to_lowercase();
let normalized = value.split_whitespace().collect::<Vec<_>>().join(" ");
let final_value = if lower_name == "expect" && normalized.is_empty() {
"100-continue"
} else {
&normalized
};
canonical_headers.push_str(&lower_name);
canonical_headers.push(':');
canonical_headers.push_str(final_value);
canonical_headers.push('\n');
}
let canonical_request = format!(
"{}\n{}\n{}\n{}\n{}\n{}",
method, canonical_uri, canonical_query_string, canonical_headers, signed_headers_str, payload_hash
);
let credential_scope = format!("{}/{}/{}/aws4_request", date_stamp, region, service);
let cr_hash = sha256_hex(canonical_request.as_bytes());
let string_to_sign = format!(
"AWS4-HMAC-SHA256\n{}\n{}\n{}",
amz_date, credential_scope, cr_hash
);
let signing_key = derive_signing_key_cached(secret_key, date_stamp, region, service);
let calculated = hmac_sha256(&signing_key, string_to_sign.as_bytes());
let calculated_hex = hex::encode(&calculated);
constant_time_compare_inner(calculated_hex.as_bytes(), provided_signature.as_bytes())
}
#[pyfunction]
pub fn derive_signing_key(
secret_key: &str,
date_stamp: &str,
region: &str,
service: &str,
) -> Vec<u8> {
derive_signing_key_cached(secret_key, date_stamp, region, service)
}
#[pyfunction]
pub fn compute_signature(signing_key: &[u8], string_to_sign: &str) -> String {
let sig = hmac_sha256(signing_key, string_to_sign.as_bytes());
hex::encode(sig)
}
fn sha256_hex(data: &[u8]) -> String {
let mut hasher = Sha256::new();
hasher.update(data);
hex::encode(hasher.finalize())
}
#[pyfunction]
pub fn build_string_to_sign(
amz_date: &str,
@@ -87,19 +176,15 @@ pub fn build_string_to_sign(
canonical_request: &str,
) -> String {
let cr_hash = sha256_hex(canonical_request.as_bytes());
format!("AWS4-HMAC-SHA256\n{}\n{}\n{}", amz_date, credential_scope, cr_hash)
format!(
"AWS4-HMAC-SHA256\n{}\n{}\n{}",
amz_date, credential_scope, cr_hash
)
}
#[pyfunction]
pub fn constant_time_compare(a: &str, b: &str) -> bool {
constant_time_compare_inner(a.as_bytes(), b.as_bytes())
}
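
For orientation, a minimal Python sketch of driving these bindings once the extension is built. The module name myfsio_core and all request values are assumptions for illustration, not taken from this diff:

import myfsio_core

# Low-level path: derive the key, build the string to sign, sign it.
# derive_signing_key returns the HMAC key as bytes.
key = myfsio_core.derive_signing_key("secret", "20260307", "us-east-1", "s3")
canonical = (
    "GET\n/\n\n"
    "host:localhost:9000\nx-amz-date:20260307T000000Z\n\n"
    "host;x-amz-date\nUNSIGNED-PAYLOAD"
)
sts = myfsio_core.build_string_to_sign(
    "20260307T000000Z", "20260307/us-east-1/s3/aws4_request", canonical
)
sig = myfsio_core.compute_signature(key, sts)

# One-shot path: verify_sigv4_signature rebuilds the same canonical request
# from (name, value) pairs and compares signatures in constant time.
ok = myfsio_core.verify_sigv4_signature(
    "GET", "/", [], "host;x-amz-date",
    [("host", "localhost:9000"), ("x-amz-date", "20260307T000000Z")],
    "UNSIGNED-PAYLOAD", "20260307T000000Z", "20260307",
    "us-east-1", "s3", "secret", sig,
)
assert ok  # both paths agree on the signature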

myfsio_core/src/storage.rs Normal file (817 lines)
View File

@@ -0,0 +1,817 @@
use pyo3::exceptions::PyIOError;
use pyo3::prelude::*;
use pyo3::types::{PyDict, PyList, PyString, PyTuple};
use serde_json::Value;
use std::collections::HashMap;
use std::fs;
use std::path::Path;
use std::time::SystemTime;
const INTERNAL_FOLDERS: &[&str] = &[".meta", ".versions", ".multipart"];
fn system_time_to_epoch(t: SystemTime) -> f64 {
t.duration_since(std::time::UNIX_EPOCH)
.map(|d| d.as_secs_f64())
.unwrap_or(0.0)
}
fn extract_etag_from_meta_bytes(content: &[u8]) -> Option<String> {
let marker = b"\"__etag__\"";
let idx = content.windows(marker.len()).position(|w| w == marker)?;
let after = &content[idx + marker.len()..];
let start = after.iter().position(|&b| b == b'"')? + 1;
let rest = &after[start..];
let end = rest.iter().position(|&b| b == b'"')?;
std::str::from_utf8(&rest[..end]).ok().map(|s| s.to_owned())
}
fn has_any_file(root: &str) -> bool {
let root_path = Path::new(root);
if !root_path.is_dir() {
return false;
}
let mut stack = vec![root_path.to_path_buf()];
while let Some(current) = stack.pop() {
let entries = match fs::read_dir(&current) {
Ok(e) => e,
Err(_) => continue,
};
for entry_result in entries {
let entry = match entry_result {
Ok(e) => e,
Err(_) => continue,
};
let ft = match entry.file_type() {
Ok(ft) => ft,
Err(_) => continue,
};
if ft.is_file() {
return true;
}
if ft.is_dir() && !ft.is_symlink() {
stack.push(entry.path());
}
}
}
false
}
#[pyfunction]
pub fn write_index_entry(
py: Python<'_>,
path: &str,
entry_name: &str,
entry_data_json: &str,
) -> PyResult<()> {
let path_owned = path.to_owned();
let entry_owned = entry_name.to_owned();
let data_owned = entry_data_json.to_owned();
py.detach(move || -> PyResult<()> {
let entry_value: Value = serde_json::from_str(&data_owned)
.map_err(|e| PyIOError::new_err(format!("Failed to parse entry data: {}", e)))?;
if let Some(parent) = Path::new(&path_owned).parent() {
let _ = fs::create_dir_all(parent);
}
let mut index_data: serde_json::Map<String, Value> = match fs::read_to_string(&path_owned)
{
Ok(content) => serde_json::from_str(&content).unwrap_or_default(),
Err(_) => serde_json::Map::new(),
};
index_data.insert(entry_owned, entry_value);
let serialized = serde_json::to_string(&Value::Object(index_data))
.map_err(|e| PyIOError::new_err(format!("Failed to serialize index: {}", e)))?;
fs::write(&path_owned, serialized)
.map_err(|e| PyIOError::new_err(format!("Failed to write index: {}", e)))?;
Ok(())
})
}
#[pyfunction]
pub fn delete_index_entry(py: Python<'_>, path: &str, entry_name: &str) -> PyResult<bool> {
let path_owned = path.to_owned();
let entry_owned = entry_name.to_owned();
py.detach(move || -> PyResult<bool> {
let content = match fs::read_to_string(&path_owned) {
Ok(c) => c,
Err(_) => return Ok(false),
};
let mut index_data: serde_json::Map<String, Value> =
match serde_json::from_str(&content) {
Ok(v) => v,
Err(_) => return Ok(false),
};
// The bool reports whether the index file itself was removed: a missing
// entry and a successful in-place rewrite both return false.
if index_data.remove(&entry_owned).is_none() {
return Ok(false);
}
if index_data.is_empty() {
let _ = fs::remove_file(&path_owned);
return Ok(true);
}
let serialized = serde_json::to_string(&Value::Object(index_data))
.map_err(|e| PyIOError::new_err(format!("Failed to serialize index: {}", e)))?;
fs::write(&path_owned, serialized)
.map_err(|e| PyIOError::new_err(format!("Failed to write index: {}", e)))?;
Ok(false)
})
}
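
A small usage sketch for the two index helpers (paths invented). write_index_entry upserts one key into the JSON index; per the implementation above, delete_index_entry returns True only when the last entry goes away and the file itself is removed:

import json
import myfsio_core

idx = "/data/.meta/photos/_index.json"  # hypothetical index path
myfsio_core.write_index_entry(
    idx, "cat.jpg", json.dumps({"metadata": {"__etag__": "abc123"}})
)
file_removed = myfsio_core.delete_index_entry(idx, "cat.jpg")  # True if file gone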
#[pyfunction]
pub fn check_bucket_contents(
py: Python<'_>,
bucket_path: &str,
version_roots: Vec<String>,
multipart_roots: Vec<String>,
) -> PyResult<(bool, bool, bool)> {
let bucket_owned = bucket_path.to_owned();
py.detach(move || -> PyResult<(bool, bool, bool)> {
let mut has_objects = false;
let bucket_p = Path::new(&bucket_owned);
if bucket_p.is_dir() {
let mut stack = vec![bucket_p.to_path_buf()];
'obj_scan: while let Some(current) = stack.pop() {
let is_root = current == bucket_p;
let entries = match fs::read_dir(&current) {
Ok(e) => e,
Err(_) => continue,
};
for entry_result in entries {
let entry = match entry_result {
Ok(e) => e,
Err(_) => continue,
};
let ft = match entry.file_type() {
Ok(ft) => ft,
Err(_) => continue,
};
if is_root {
if let Some(name) = entry.file_name().to_str() {
if INTERNAL_FOLDERS.contains(&name) {
continue;
}
}
}
if ft.is_file() && !ft.is_symlink() {
has_objects = true;
break 'obj_scan;
}
if ft.is_dir() && !ft.is_symlink() {
stack.push(entry.path());
}
}
}
}
let mut has_versions = false;
for root in &version_roots {
if has_versions {
break;
}
has_versions = has_any_file(root);
}
let mut has_multipart = false;
for root in &multipart_roots {
if has_multipart {
break;
}
has_multipart = has_any_file(root);
}
Ok((has_objects, has_versions, has_multipart))
})
}
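
The calling convention, sketched in Python (paths hypothetical); the three booleans mirror the (bool, bool, bool) return tuple, and one plausible use is guarding bucket deletion:

import myfsio_core

has_objects, has_versions, has_multipart = myfsio_core.check_bucket_contents(
    "/data/buckets/photos",        # bucket root; .meta/.versions/.multipart skipped
    ["/data/.versions/photos"],    # roots probed by has_any_file
    ["/data/.multipart/photos"],
)
if has_objects or has_versions or has_multipart:
    raise RuntimeError("bucket is not empty")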
#[pyfunction]
pub fn shallow_scan(
py: Python<'_>,
target_dir: &str,
prefix: &str,
meta_cache_json: &str,
) -> PyResult<Py<PyAny>> {
let target_owned = target_dir.to_owned();
let prefix_owned = prefix.to_owned();
let cache_owned = meta_cache_json.to_owned();
let result: (
Vec<(String, u64, f64, Option<String>)>,
Vec<String>,
Vec<(String, bool)>,
) = py.detach(move || -> PyResult<(
Vec<(String, u64, f64, Option<String>)>,
Vec<String>,
Vec<(String, bool)>,
)> {
let meta_cache: HashMap<String, String> =
serde_json::from_str(&cache_owned).unwrap_or_default();
let mut files: Vec<(String, u64, f64, Option<String>)> = Vec::new();
let mut dirs: Vec<String> = Vec::new();
let entries = match fs::read_dir(&target_owned) {
Ok(e) => e,
Err(_) => return Ok((files, dirs, Vec::new())),
};
for entry_result in entries {
let entry = match entry_result {
Ok(e) => e,
Err(_) => continue,
};
let name = match entry.file_name().into_string() {
Ok(n) => n,
Err(_) => continue,
};
if INTERNAL_FOLDERS.contains(&name.as_str()) {
continue;
}
let ft = match entry.file_type() {
Ok(ft) => ft,
Err(_) => continue,
};
if ft.is_dir() && !ft.is_symlink() {
let cp = format!("{}{}/", prefix_owned, name);
dirs.push(cp);
} else if ft.is_file() && !ft.is_symlink() {
let key = format!("{}{}", prefix_owned, name);
let md = match entry.metadata() {
Ok(m) => m,
Err(_) => continue,
};
let size = md.len();
let mtime = md
.modified()
.map(system_time_to_epoch)
.unwrap_or(0.0);
let etag = meta_cache.get(&key).cloned();
files.push((key, size, mtime, etag));
}
}
files.sort_by(|a, b| a.0.cmp(&b.0));
dirs.sort();
let mut merged: Vec<(String, bool)> = Vec::with_capacity(files.len() + dirs.len());
let mut fi = 0;
let mut di = 0;
while fi < files.len() && di < dirs.len() {
if files[fi].0 <= dirs[di] {
merged.push((files[fi].0.clone(), false));
fi += 1;
} else {
merged.push((dirs[di].clone(), true));
di += 1;
}
}
while fi < files.len() {
merged.push((files[fi].0.clone(), false));
fi += 1;
}
while di < dirs.len() {
merged.push((dirs[di].clone(), true));
di += 1;
}
Ok((files, dirs, merged))
})?;
let (files, dirs, merged) = result;
let dict = PyDict::new(py);
let files_list = PyList::empty(py);
for (key, size, mtime, etag) in &files {
let etag_py: Py<PyAny> = match etag {
Some(e) => PyString::new(py, e).into_any().unbind(),
None => py.None(),
};
let tuple = PyTuple::new(py, &[
PyString::new(py, key).into_any().unbind(),
size.into_pyobject(py)?.into_any().unbind(),
mtime.into_pyobject(py)?.into_any().unbind(),
etag_py,
])?;
files_list.append(tuple)?;
}
dict.set_item("files", files_list)?;
let dirs_list = PyList::empty(py);
for d in &dirs {
dirs_list.append(PyString::new(py, d))?;
}
dict.set_item("dirs", dirs_list)?;
let merged_list = PyList::empty(py);
for (key, is_dir) in &merged {
let bool_obj: Py<PyAny> = if *is_dir {
true.into_pyobject(py)?.to_owned().into_any().unbind()
} else {
false.into_pyobject(py)?.to_owned().into_any().unbind()
};
let tuple = PyTuple::new(py, &[
PyString::new(py, key).into_any().unbind(),
bool_obj,
])?;
merged_list.append(tuple)?;
}
dict.set_item("merged_keys", merged_list)?;
Ok(dict.into_any().unbind())
}
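
A sketch of consuming the returned dict (paths and ETag invented); the meta cache argument is the JSON-serialized key-to-ETag map the Rust side consults for each file:

import json
import myfsio_core

cache = json.dumps({"docs/readme.md": "d41d8cd98f00b204e9800998ecf8427e"})
listing = myfsio_core.shallow_scan("/data/buckets/b1/docs", "docs/", cache)
for key, size, mtime, etag in listing["files"]:
    print(key, size, etag)          # etag is None on a cache miss
print(listing["dirs"])              # child prefixes like "docs/sub/"
print(listing["merged_keys"])       # [(key, is_dir), ...] in sorted order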
#[pyfunction]
pub fn bucket_stats_scan(
py: Python<'_>,
bucket_path: &str,
versions_root: &str,
) -> PyResult<(u64, u64, u64, u64)> {
let bucket_owned = bucket_path.to_owned();
let versions_owned = versions_root.to_owned();
py.detach(move || -> PyResult<(u64, u64, u64, u64)> {
let mut object_count: u64 = 0;
let mut total_bytes: u64 = 0;
let bucket_p = Path::new(&bucket_owned);
if bucket_p.is_dir() {
let mut stack = vec![bucket_p.to_path_buf()];
while let Some(current) = stack.pop() {
let is_root = current == bucket_p;
let entries = match fs::read_dir(&current) {
Ok(e) => e,
Err(_) => continue,
};
for entry_result in entries {
let entry = match entry_result {
Ok(e) => e,
Err(_) => continue,
};
if is_root {
if let Some(name) = entry.file_name().to_str() {
if INTERNAL_FOLDERS.contains(&name) {
continue;
}
}
}
let ft = match entry.file_type() {
Ok(ft) => ft,
Err(_) => continue,
};
if ft.is_dir() && !ft.is_symlink() {
stack.push(entry.path());
} else if ft.is_file() && !ft.is_symlink() {
object_count += 1;
if let Ok(md) = entry.metadata() {
total_bytes += md.len();
}
}
}
}
}
let mut version_count: u64 = 0;
let mut version_bytes: u64 = 0;
let versions_p = Path::new(&versions_owned);
if versions_p.is_dir() {
let mut stack = vec![versions_p.to_path_buf()];
while let Some(current) = stack.pop() {
let entries = match fs::read_dir(&current) {
Ok(e) => e,
Err(_) => continue,
};
for entry_result in entries {
let entry = match entry_result {
Ok(e) => e,
Err(_) => continue,
};
let ft = match entry.file_type() {
Ok(ft) => ft,
Err(_) => continue,
};
if ft.is_dir() && !ft.is_symlink() {
stack.push(entry.path());
} else if ft.is_file() && !ft.is_symlink() {
if let Some(name) = entry.file_name().to_str() {
if name.ends_with(".bin") {
version_count += 1;
if let Ok(md) = entry.metadata() {
version_bytes += md.len();
}
}
}
}
}
}
}
Ok((object_count, total_bytes, version_count, version_bytes))
})
}
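
Usage is a single call per bucket (paths hypothetical); note that only *.bin payloads under the versions root count toward the version totals:

import myfsio_core

objects, live_bytes, versions, version_bytes = myfsio_core.bucket_stats_scan(
    "/data/buckets/photos", "/data/.versions/photos"
)
print(f"{objects} objects ({live_bytes} B), "
      f"{versions} versions ({version_bytes} B)")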
#[pyfunction]
#[pyo3(signature = (bucket_path, search_root, query, limit))]
pub fn search_objects_scan(
py: Python<'_>,
bucket_path: &str,
search_root: &str,
query: &str,
limit: usize,
) -> PyResult<Py<PyAny>> {
let bucket_owned = bucket_path.to_owned();
let search_owned = search_root.to_owned();
let query_owned = query.to_owned();
let result: (Vec<(String, u64, f64)>, bool) = py.detach(
move || -> PyResult<(Vec<(String, u64, f64)>, bool)> {
let query_lower = query_owned.to_lowercase();
let bucket_len = bucket_owned.len() + 1;
// Over-collect up to 4x the requested limit so the sorted result can be
// truncated and the `truncated` flag reported accurately.
let scan_limit = limit * 4;
let mut matched: usize = 0;
let mut results: Vec<(String, u64, f64)> = Vec::new();
let search_p = Path::new(&search_owned);
if !search_p.is_dir() {
return Ok((results, false));
}
let bucket_p = Path::new(&bucket_owned);
let mut stack = vec![search_p.to_path_buf()];
'scan: while let Some(current) = stack.pop() {
let is_bucket_root = current == bucket_p;
let entries = match fs::read_dir(&current) {
Ok(e) => e,
Err(_) => continue,
};
for entry_result in entries {
let entry = match entry_result {
Ok(e) => e,
Err(_) => continue,
};
if is_bucket_root {
if let Some(name) = entry.file_name().to_str() {
if INTERNAL_FOLDERS.contains(&name) {
continue;
}
}
}
let ft = match entry.file_type() {
Ok(ft) => ft,
Err(_) => continue,
};
if ft.is_dir() && !ft.is_symlink() {
stack.push(entry.path());
} else if ft.is_file() && !ft.is_symlink() {
let full_path = entry.path();
let full_str = full_path.to_string_lossy();
if full_str.len() <= bucket_len {
continue;
}
let key = full_str[bucket_len..].replace('\\', "/");
if key.to_lowercase().contains(&query_lower) {
if let Ok(md) = entry.metadata() {
let size = md.len();
let mtime = md
.modified()
.map(system_time_to_epoch)
.unwrap_or(0.0);
results.push((key, size, mtime));
matched += 1;
}
}
if matched >= scan_limit {
break 'scan;
}
}
}
}
results.sort_by(|a, b| a.0.cmp(&b.0));
let truncated = results.len() > limit;
results.truncate(limit);
Ok((results, truncated))
},
)?;
let (results, truncated) = result;
let dict = PyDict::new(py);
let results_list = PyList::empty(py);
for (key, size, mtime) in &results {
let tuple = PyTuple::new(py, &[
PyString::new(py, key).into_any().unbind(),
size.into_pyobject(py)?.into_any().unbind(),
mtime.into_pyobject(py)?.into_any().unbind(),
])?;
results_list.append(tuple)?;
}
dict.set_item("results", results_list)?;
dict.set_item("truncated", truncated)?;
Ok(dict.into_any().unbind())
}
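
A sketch of a search call (query and paths invented). Matching is case-insensitive on the full key, the scan stops after collecting 4x the limit, and the truncated flag tells the caller that more matches may exist:

import myfsio_core

hits = myfsio_core.search_objects_scan("/data/buckets/photos",
                                       "/data/buckets/photos", "report", 100)
for key, size, mtime in hits["results"]:
    print(key, size)
if hits["truncated"]:
    print("more matches beyond the limit")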
#[pyfunction]
pub fn build_object_cache(
py: Python<'_>,
bucket_path: &str,
meta_root: &str,
etag_index_path: &str,
) -> PyResult<Py<PyAny>> {
let bucket_owned = bucket_path.to_owned();
let meta_owned = meta_root.to_owned();
let index_path_owned = etag_index_path.to_owned();
let result: (HashMap<String, String>, Vec<(String, u64, f64, Option<String>)>, bool) =
py.detach(move || -> PyResult<(
HashMap<String, String>,
Vec<(String, u64, f64, Option<String>)>,
bool,
)> {
let mut meta_cache: HashMap<String, String> = HashMap::new();
let mut index_mtime: f64 = 0.0;
let mut etag_cache_changed = false;
let index_p = Path::new(&index_path_owned);
if index_p.is_file() {
if let Ok(md) = fs::metadata(&index_path_owned) {
index_mtime = md
.modified()
.map(system_time_to_epoch)
.unwrap_or(0.0);
}
if let Ok(content) = fs::read_to_string(&index_path_owned) {
if let Ok(parsed) = serde_json::from_str::<HashMap<String, String>>(&content) {
meta_cache = parsed;
}
}
}
let meta_p = Path::new(&meta_owned);
let mut needs_rebuild = false;
if meta_p.is_dir() && index_mtime > 0.0 {
fn check_newer(dir: &Path, index_mtime: f64) -> bool {
let entries = match fs::read_dir(dir) {
Ok(e) => e,
Err(_) => return false,
};
for entry_result in entries {
let entry = match entry_result {
Ok(e) => e,
Err(_) => continue,
};
let ft = match entry.file_type() {
Ok(ft) => ft,
Err(_) => continue,
};
if ft.is_dir() && !ft.is_symlink() {
if check_newer(&entry.path(), index_mtime) {
return true;
}
} else if ft.is_file() {
if let Some(name) = entry.file_name().to_str() {
if name.ends_with(".meta.json") || name == "_index.json" {
if let Ok(md) = entry.metadata() {
let mt = md
.modified()
.map(system_time_to_epoch)
.unwrap_or(0.0);
if mt > index_mtime {
return true;
}
}
}
}
}
}
false
}
needs_rebuild = check_newer(meta_p, index_mtime);
} else if meta_cache.is_empty() {
needs_rebuild = true;
}
if needs_rebuild && meta_p.is_dir() {
let meta_str = meta_owned.clone();
let meta_len = meta_str.len() + 1;
let mut index_files: Vec<String> = Vec::new();
let mut legacy_meta_files: Vec<(String, String)> = Vec::new();
fn collect_meta(
dir: &Path,
meta_len: usize,
index_files: &mut Vec<String>,
legacy_meta_files: &mut Vec<(String, String)>,
) {
let entries = match fs::read_dir(dir) {
Ok(e) => e,
Err(_) => return,
};
for entry_result in entries {
let entry = match entry_result {
Ok(e) => e,
Err(_) => continue,
};
let ft = match entry.file_type() {
Ok(ft) => ft,
Err(_) => continue,
};
if ft.is_dir() && !ft.is_symlink() {
collect_meta(&entry.path(), meta_len, index_files, legacy_meta_files);
} else if ft.is_file() {
if let Some(name) = entry.file_name().to_str() {
let full = entry.path().to_string_lossy().to_string();
if name == "_index.json" {
index_files.push(full);
} else if name.ends_with(".meta.json") {
if full.len() > meta_len {
let rel = &full[meta_len..];
let key = if rel.len() > 10 {
rel[..rel.len() - 10].replace('\\', "/")
} else {
continue;
};
legacy_meta_files.push((key, full));
}
}
}
}
}
}
collect_meta(
meta_p,
meta_len,
&mut index_files,
&mut legacy_meta_files,
);
meta_cache.clear();
for idx_path in &index_files {
if let Ok(content) = fs::read_to_string(idx_path) {
if let Ok(idx_data) = serde_json::from_str::<HashMap<String, Value>>(&content) {
let rel_dir = if idx_path.len() > meta_len {
let r = &idx_path[meta_len..];
r.replace('\\', "/")
} else {
String::new()
};
let dir_prefix = if rel_dir.ends_with("/_index.json") {
&rel_dir[..rel_dir.len() - "/_index.json".len()]
} else {
""
};
for (entry_name, entry_data) in &idx_data {
let key = if dir_prefix.is_empty() {
entry_name.clone()
} else {
format!("{}/{}", dir_prefix, entry_name)
};
if let Some(meta_obj) = entry_data.get("metadata") {
if let Some(etag) = meta_obj.get("__etag__") {
if let Some(etag_str) = etag.as_str() {
meta_cache.insert(key, etag_str.to_owned());
}
}
}
}
}
}
}
for (key, path) in &legacy_meta_files {
if meta_cache.contains_key(key) {
continue;
}
if let Ok(content) = fs::read(path) {
if let Some(etag) = extract_etag_from_meta_bytes(&content) {
meta_cache.insert(key.clone(), etag);
}
}
}
etag_cache_changed = true;
}
let bucket_p = Path::new(&bucket_owned);
let bucket_len = bucket_owned.len() + 1;
let mut objects: Vec<(String, u64, f64, Option<String>)> = Vec::new();
if bucket_p.is_dir() {
let mut stack = vec![bucket_p.to_path_buf()];
while let Some(current) = stack.pop() {
let entries = match fs::read_dir(&current) {
Ok(e) => e,
Err(_) => continue,
};
for entry_result in entries {
let entry = match entry_result {
Ok(e) => e,
Err(_) => continue,
};
let ft = match entry.file_type() {
Ok(ft) => ft,
Err(_) => continue,
};
if ft.is_dir() && !ft.is_symlink() {
let full = entry.path();
let full_str = full.to_string_lossy();
if full_str.len() > bucket_len {
let first_part: &str = if let Some(sep_pos) =
full_str[bucket_len..].find(|c: char| c == '\\' || c == '/')
{
&full_str[bucket_len..bucket_len + sep_pos]
} else {
&full_str[bucket_len..]
};
if INTERNAL_FOLDERS.contains(&first_part) {
continue;
}
} else if let Some(name) = entry.file_name().to_str() {
if INTERNAL_FOLDERS.contains(&name) {
continue;
}
}
stack.push(full);
} else if ft.is_file() && !ft.is_symlink() {
let full = entry.path();
let full_str = full.to_string_lossy();
if full_str.len() <= bucket_len {
continue;
}
let rel = &full_str[bucket_len..];
let first_part: &str =
if let Some(sep_pos) = rel.find(|c: char| c == '\\' || c == '/') {
&rel[..sep_pos]
} else {
rel
};
if INTERNAL_FOLDERS.contains(&first_part) {
continue;
}
let key = rel.replace('\\', "/");
if let Ok(md) = entry.metadata() {
let size = md.len();
let mtime = md
.modified()
.map(system_time_to_epoch)
.unwrap_or(0.0);
let etag = meta_cache.get(&key).cloned();
objects.push((key, size, mtime, etag));
}
}
}
}
}
Ok((meta_cache, objects, etag_cache_changed))
})?;
let (meta_cache, objects, etag_cache_changed) = result;
let dict = PyDict::new(py);
let cache_dict = PyDict::new(py);
for (k, v) in &meta_cache {
cache_dict.set_item(k, v)?;
}
dict.set_item("etag_cache", cache_dict)?;
let objects_list = PyList::empty(py);
for (key, size, mtime, etag) in &objects {
let etag_py: Py<PyAny> = match etag {
Some(e) => PyString::new(py, e).into_any().unbind(),
None => py.None(),
};
let tuple = PyTuple::new(py, &[
PyString::new(py, key).into_any().unbind(),
size.into_pyobject(py)?.into_any().unbind(),
mtime.into_pyobject(py)?.into_any().unbind(),
etag_py,
])?;
objects_list.append(tuple)?;
}
dict.set_item("objects", objects_list)?;
dict.set_item("etag_cache_changed", etag_cache_changed)?;
Ok(dict.into_any().unbind())
}
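
Putting it together, a hedged sketch of a cache refresh (paths and the persist step are assumptions about the Python caller, which isn't shown here):

import json
import myfsio_core

snap = myfsio_core.build_object_cache(
    "/data/buckets/photos",
    "/data/.meta/photos",
    "/data/.meta/photos/_etag_index.json",   # hypothetical index location
)
if snap["etag_cache_changed"]:
    # the caller presumably writes the rebuilt map back to the index file
    with open("/data/.meta/photos/_etag_index.json", "w") as f:
        json.dump(snap["etag_cache"], f)
for key, size, mtime, etag in snap["objects"]:
    pass  # feed the in-memory object cache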

View File

@@ -0,0 +1,112 @@
use md5::{Digest, Md5};
use pyo3::exceptions::{PyIOError, PyValueError};
use pyo3::prelude::*;
use std::fs::{self, File};
use std::io::{Read, Write};
use uuid::Uuid;
const DEFAULT_CHUNK_SIZE: usize = 262144;
#[pyfunction]
#[pyo3(signature = (stream, tmp_dir, chunk_size=DEFAULT_CHUNK_SIZE))]
pub fn stream_to_file_with_md5(
py: Python<'_>,
stream: &Bound<'_, PyAny>,
tmp_dir: &str,
chunk_size: usize,
) -> PyResult<(String, String, u64)> {
let chunk_size = if chunk_size == 0 {
DEFAULT_CHUNK_SIZE
} else {
chunk_size
};
fs::create_dir_all(tmp_dir)
.map_err(|e| PyIOError::new_err(format!("Failed to create tmp dir: {}", e)))?;
let tmp_name = format!("{}.tmp", Uuid::new_v4().as_hyphenated());
let tmp_path_buf = std::path::PathBuf::from(tmp_dir).join(&tmp_name);
let tmp_path = tmp_path_buf.to_string_lossy().into_owned();
let mut file = File::create(&tmp_path)
.map_err(|e| PyIOError::new_err(format!("Failed to create temp file: {}", e)))?;
let mut hasher = Md5::new();
let mut total_bytes: u64 = 0;
let result: PyResult<()> = (|| {
loop {
let chunk: Vec<u8> = stream.call_method1("read", (chunk_size,))?.extract()?;
if chunk.is_empty() {
break;
}
hasher.update(&chunk);
file.write_all(&chunk)
.map_err(|e| PyIOError::new_err(format!("Failed to write: {}", e)))?;
total_bytes += chunk.len() as u64;
py.check_signals()?;
}
file.sync_all()
.map_err(|e| PyIOError::new_err(format!("Failed to fsync: {}", e)))?;
Ok(())
})();
if let Err(e) = result {
drop(file);
let _ = fs::remove_file(&tmp_path);
return Err(e);
}
drop(file);
let md5_hex = format!("{:x}", hasher.finalize());
Ok((tmp_path, md5_hex, total_bytes))
}
#[pyfunction]
pub fn assemble_parts_with_md5(
py: Python<'_>,
part_paths: Vec<String>,
dest_path: &str,
) -> PyResult<String> {
if part_paths.is_empty() {
return Err(PyValueError::new_err("No parts to assemble"));
}
let dest = dest_path.to_owned();
let parts = part_paths;
py.detach(move || {
if let Some(parent) = std::path::Path::new(&dest).parent() {
fs::create_dir_all(parent)
.map_err(|e| PyIOError::new_err(format!("Failed to create dest dir: {}", e)))?;
}
let mut target = File::create(&dest)
.map_err(|e| PyIOError::new_err(format!("Failed to create dest file: {}", e)))?;
let mut hasher = Md5::new();
let mut buf = vec![0u8; 1024 * 1024];
for part_path in &parts {
let mut part = File::open(part_path)
.map_err(|e| PyIOError::new_err(format!("Failed to open part {}: {}", part_path, e)))?;
loop {
let n = part
.read(&mut buf)
.map_err(|e| PyIOError::new_err(format!("Failed to read part: {}", e)))?;
if n == 0 {
break;
}
hasher.update(&buf[..n]);
target
.write_all(&buf[..n])
.map_err(|e| PyIOError::new_err(format!("Failed to write: {}", e)))?;
}
}
target.sync_all()
.map_err(|e| PyIOError::new_err(format!("Failed to fsync: {}", e)))?;
Ok(format!("{:x}", hasher.finalize()))
})
}
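
Finally, a sketch of the upload path from Python (all paths invented): stream the request body to a temp file while hashing, then stitch multipart pieces into the final object:

import io
import myfsio_core

body = io.BytesIO(b"hello world")  # stands in for a request stream with .read()
tmp_path, md5_hex, size = myfsio_core.stream_to_file_with_md5(body, "/data/.tmp")
# md5_hex can be checked against a Content-MD5 header before the rename.

final_md5 = myfsio_core.assemble_parts_with_md5(
    [tmp_path], "/data/buckets/photos/hello.txt"
)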

View File

@@ -1 +0,0 @@
{"rustc_fingerprint":13172970000770725120,"outputs":{"7971740275564407648":{"success":true,"status":"","code":0,"stdout":"___.exe\nlib___.rlib\n___.dll\n___.dll\n___.lib\n___.dll\nC:\\Users\\jun\\.rustup\\toolchains\\stable-x86_64-pc-windows-msvc\npacked\n___\ndebug_assertions\npanic=\"unwind\"\nproc_macro\ntarget_abi=\"\"\ntarget_arch=\"x86_64\"\ntarget_endian=\"little\"\ntarget_env=\"msvc\"\ntarget_family=\"windows\"\ntarget_feature=\"cmpxchg16b\"\ntarget_feature=\"fxsr\"\ntarget_feature=\"sse\"\ntarget_feature=\"sse2\"\ntarget_feature=\"sse3\"\ntarget_has_atomic=\"128\"\ntarget_has_atomic=\"16\"\ntarget_has_atomic=\"32\"\ntarget_has_atomic=\"64\"\ntarget_has_atomic=\"8\"\ntarget_has_atomic=\"ptr\"\ntarget_os=\"windows\"\ntarget_pointer_width=\"64\"\ntarget_vendor=\"pc\"\nwindows\n","stderr":""},"17747080675513052775":{"success":true,"status":"","code":0,"stdout":"rustc 1.93.1 (01f6ddf75 2026-02-11)\nbinary: rustc\ncommit-hash: 01f6ddf7588f42ae2d7eb0a2f21d44e8e96674cf\ncommit-date: 2026-02-11\nhost: x86_64-pc-windows-msvc\nrelease: 1.93.1\nLLVM version: 21.1.8\n","stderr":""}},"successes":{}}

View File

@@ -1,3 +0,0 @@
Signature: 8a477f597d28d172789f06886806bc55
# This file is a cache directory tag created by cargo.
# For information about cache directory tags see https://bford.info/cachedir/

View File

@@ -1 +0,0 @@
This file has an mtime of when this was started.

View File

@@ -1 +0,0 @@
{"rustc":8323788817864214825,"features":"[\"perf-literal\", \"std\"]","declared_features":"[\"default\", \"logging\", \"perf-literal\", \"std\"]","target":7534583537114156500,"profile":2040997289075261528,"path":6364296192483896971,"deps":[[1363051979936526615,"memchr",false,11090220145123168660]],"local":[{"CheckDepInfo":{"dep_info":"release\\.fingerprint\\aho-corasick-45694771b543be75\\dep-lib-aho_corasick","checksum":false}}],"rustflags":[],"config":2069994364910194474,"compile_kind":0}

View File

@@ -1 +0,0 @@
This file has an mtime of when this was started.

View File

@@ -1 +0,0 @@
{"rustc":8323788817864214825,"features":"[\"alloc\"]","declared_features":"[\"alloc\", \"default\", \"fresh-rust\", \"nightly\", \"serde\", \"std\"]","target":5388200169723499962,"profile":4067574213046180398,"path":10654049299693593327,"deps":[],"local":[{"CheckDepInfo":{"dep_info":"release\\.fingerprint\\allocator-api2-db7934dbe96de5b4\\dep-lib-allocator_api2","checksum":false}}],"rustflags":[],"config":2069994364910194474,"compile_kind":0}

View File

@@ -1 +0,0 @@
This file has an mtime of when this was started.

View File

@@ -1 +0,0 @@
{"rustc":8323788817864214825,"features":"[]","declared_features":"[]","target":6962977057026645649,"profile":1369601567987815722,"path":9853093265219907461,"deps":[],"local":[{"CheckDepInfo":{"dep_info":"release\\.fingerprint\\autocfg-1c4fb7a37cc3df69\\dep-lib-autocfg","checksum":false}}],"rustflags":[],"config":2069994364910194474,"compile_kind":0}

View File

@@ -1 +0,0 @@
This file has an mtime of when this was started.

View File

@@ -1 +0,0 @@
{"rustc":8323788817864214825,"features":"[]","declared_features":"[]","target":4098124618827574291,"profile":2040997289075261528,"path":3658007358608479489,"deps":[[10520923840501062997,"generic_array",false,11555283918993371487]],"local":[{"CheckDepInfo":{"dep_info":"release\\.fingerprint\\block-buffer-95b0ac364bec72f9\\dep-lib-block_buffer","checksum":false}}],"rustflags":[],"config":2069994364910194474,"compile_kind":0}

View File

@@ -1 +0,0 @@
This file has an mtime of when this was started.

View File

@@ -1 +0,0 @@
{"rustc":8323788817864214825,"features":"[]","declared_features":"[\"core\", \"rustc-dep-of-std\"]","target":13840298032947503755,"profile":2040997289075261528,"path":4093486168504982869,"deps":[],"local":[{"CheckDepInfo":{"dep_info":"release\\.fingerprint\\cfg-if-be2711f84a777e73\\dep-lib-cfg_if","checksum":false}}],"rustflags":[],"config":2069994364910194474,"compile_kind":0}

View File

@@ -1 +0,0 @@
This file has an mtime of when this was started.

View File

@@ -1 +0,0 @@
{"rustc":8323788817864214825,"features":"[]","declared_features":"[]","target":2330704043955282025,"profile":2040997289075261528,"path":13200428550696548327,"deps":[],"local":[{"CheckDepInfo":{"dep_info":"release\\.fingerprint\\cpufeatures-980094f8735c42d1\\dep-lib-cpufeatures","checksum":false}}],"rustflags":[],"config":2069994364910194474,"compile_kind":0}

View File

@@ -1 +0,0 @@
This file has an mtime of when this was started.

View File

@@ -1 +0,0 @@
{"rustc":8323788817864214825,"features":"[\"std\"]","declared_features":"[\"getrandom\", \"rand_core\", \"std\"]","target":12082577455412410174,"profile":2040997289075261528,"path":14902376638882023040,"deps":[[857979250431893282,"typenum",false,7416411392359930020],[10520923840501062997,"generic_array",false,11555283918993371487]],"local":[{"CheckDepInfo":{"dep_info":"release\\.fingerprint\\crypto-common-289a508abdda3048\\dep-lib-crypto_common","checksum":false}}],"rustflags":[],"config":2069994364910194474,"compile_kind":0}

View File

@@ -1 +0,0 @@
This file has an mtime of when this was started.

View File

@@ -1 +0,0 @@
{"rustc":8323788817864214825,"features":"[\"alloc\", \"block-buffer\", \"core-api\", \"default\", \"mac\", \"std\", \"subtle\"]","declared_features":"[\"alloc\", \"blobby\", \"block-buffer\", \"const-oid\", \"core-api\", \"default\", \"dev\", \"mac\", \"oid\", \"rand_core\", \"std\", \"subtle\"]","target":7510122432137863311,"profile":2040997289075261528,"path":11503432597517024930,"deps":[[6039282458970808711,"crypto_common",false,11252724541433210505],[10626340395483396037,"block_buffer",false,17139625223017709343],[17003143334332120809,"subtle",false,8597342066671925934]],"local":[{"CheckDepInfo":{"dep_info":"release\\.fingerprint\\digest-a91458bfa5613332\\dep-lib-digest","checksum":false}}],"rustflags":[],"config":2069994364910194474,"compile_kind":0}

View File

@@ -1 +0,0 @@
This file has an mtime of when this was started.

View File

@@ -1 +0,0 @@
{"rustc":8323788817864214825,"features":"[]","declared_features":"[]","target":1524667692659508025,"profile":2040997289075261528,"path":17534356223679657546,"deps":[],"local":[{"CheckDepInfo":{"dep_info":"release\\.fingerprint\\equivalent-943ac856871c0988\\dep-lib-equivalent","checksum":false}}],"rustflags":[],"config":2069994364910194474,"compile_kind":0}

View File

@@ -1 +0,0 @@
This file has an mtime of when this was started.

View File

@@ -1 +0,0 @@
{"rustc":8323788817864214825,"features":"[]","declared_features":"[\"default\", \"std\"]","target":18077926938045032029,"profile":2040997289075261528,"path":9869209539952544870,"deps":[],"local":[{"CheckDepInfo":{"dep_info":"release\\.fingerprint\\foldhash-b8a92f8c10d550f7\\dep-lib-foldhash","checksum":false}}],"rustflags":[],"config":2069994364910194474,"compile_kind":0}

View File

@@ -1 +0,0 @@
{"rustc":8323788817864214825,"features":"[\"more_lengths\"]","declared_features":"[\"more_lengths\", \"serde\", \"zeroize\"]","target":12318548087768197662,"profile":1369601567987815722,"path":13853454403963664247,"deps":[[5398981501050481332,"version_check",false,16419025953046340415]],"local":[{"CheckDepInfo":{"dep_info":"release\\.fingerprint\\generic-array-2462daa120fe5936\\dep-build-script-build-script-build","checksum":false}}],"rustflags":[],"config":2069994364910194474,"compile_kind":0}

View File

@@ -1 +0,0 @@
This file has an mtime of when this was started.

View File

@@ -1 +0,0 @@
This file has an mtime of when this was started.

View File

@@ -1 +0,0 @@
{"rustc":8323788817864214825,"features":"[\"more_lengths\"]","declared_features":"[\"more_lengths\", \"serde\", \"zeroize\"]","target":13084005262763373425,"profile":2040997289075261528,"path":12463275850883329568,"deps":[[857979250431893282,"typenum",false,7416411392359930020],[10520923840501062997,"build_script_build",false,16977603856295925732]],"local":[{"CheckDepInfo":{"dep_info":"release\\.fingerprint\\generic-array-62216349963f3a3c\\dep-lib-generic_array","checksum":false}}],"rustflags":[],"config":2069994364910194474,"compile_kind":0}

View File

@@ -1 +0,0 @@
{"rustc":8323788817864214825,"features":"","declared_features":"","target":0,"profile":0,"path":0,"deps":[[10520923840501062997,"build_script_build",false,464306762232604144]],"local":[{"Precalculated":"0.14.7"}],"rustflags":[],"config":0,"compile_kind":0}

View File

@@ -1 +0,0 @@
This file has an mtime of when this was started.

View File

@@ -1 +0,0 @@
{"rustc":8323788817864214825,"features":"[\"allocator-api2\", \"default\", \"default-hasher\", \"equivalent\", \"inline-more\", \"raw-entry\"]","declared_features":"[\"alloc\", \"allocator-api2\", \"core\", \"default\", \"default-hasher\", \"equivalent\", \"inline-more\", \"nightly\", \"raw-entry\", \"rayon\", \"rustc-dep-of-std\", \"rustc-internal-api\", \"serde\"]","target":13796197676120832388,"profile":2040997289075261528,"path":12448322139402656924,"deps":[[5230392855116717286,"equivalent",false,6042941999404782907],[9150530836556604396,"allocator_api2",false,16398368410642502979],[10842263908529601448,"foldhash",false,10953695263156452023]],"local":[{"CheckDepInfo":{"dep_info":"release\\.fingerprint\\hashbrown-510d641b592c306b\\dep-lib-hashbrown","checksum":false}}],"rustflags":[],"config":2069994364910194474,"compile_kind":0}

View File

@@ -1 +0,0 @@
This file has an mtime of when this was started.

View File

@@ -1 +0,0 @@
{"rustc":8323788817864214825,"features":"[]","declared_features":"[]","target":17886154901722686619,"profile":1369601567987815722,"path":8608102977929876445,"deps":[],"local":[{"CheckDepInfo":{"dep_info":"release\\.fingerprint\\heck-b47c94fd2a7e00cb\\dep-lib-heck","checksum":false}}],"rustflags":[],"config":2069994364910194474,"compile_kind":0}

View File

@@ -1 +0,0 @@
This file has an mtime of when this was started.

View File

@@ -1 +0,0 @@
41890ebff4143fa5

View File

@@ -1 +0,0 @@
{"rustc":8323788817864214825,"features":"[\"alloc\", \"default\", \"std\"]","declared_features":"[\"alloc\", \"default\", \"serde\", \"std\"]","target":4242469766639956503,"profile":2040997289075261528,"path":6793865871540733919,"deps":[],"local":[{"CheckDepInfo":{"dep_info":"release\\.fingerprint\\hex-253414d2260adcdf\\dep-lib-hex","checksum":false}}],"rustflags":[],"config":2069994364910194474,"compile_kind":0}

View File

@@ -1 +0,0 @@
This file has an mtime of when this was started.

View File

@@ -1 +0,0 @@
{"rustc":8323788817864214825,"features":"[]","declared_features":"[\"reset\", \"std\"]","target":12991177224612424488,"profile":2040997289075261528,"path":17893893568771568113,"deps":[[17475753849556516473,"digest",false,15621022965039188625]],"local":[{"CheckDepInfo":{"dep_info":"release\\.fingerprint\\hmac-3297e61b9effb758\\dep-lib-hmac","checksum":false}}],"rustflags":[],"config":2069994364910194474,"compile_kind":0}

View File

@@ -1 +0,0 @@
This file has an mtime of when this was started.

View File

@@ -1 +0,0 @@
{"rustc":8323788817864214825,"features":"[]","declared_features":"[]","target":8726396592336845528,"profile":1369601567987815722,"path":18304219166357541938,"deps":[],"local":[{"CheckDepInfo":{"dep_info":"release\\.fingerprint\\indoc-0c686c3f403a2566\\dep-lib-indoc","checksum":false}}],"rustflags":[],"config":2069994364910194474,"compile_kind":0}

View File

@@ -1 +0,0 @@
This file has an mtime of when this was started.

Some files were not shown because too many files have changed in this diff.