commit cdc4a81e3ef787589bcddd60aa01506cb72f14e1 Author: kqjy Date: Wed Dec 3 04:25:42 2025 +0000 Add Home diff --git a/Home.md b/Home.md new file mode 100644 index 0000000..ef044b9 --- /dev/null +++ b/Home.md @@ -0,0 +1,585 @@ +# MyFSIO Documentation + +This document expands on the README to describe the full workflow for running, configuring, and extending MyFSIO. Use it as a playbook for local S3-style experimentation. + +## 1. System Overview + +MyFSIO ships two Flask entrypoints that share the same storage, IAM, and bucket-policy state: + +- **API server** – Implements the S3-compatible REST API, policy evaluation, and Signature Version 4 presign service. +- **UI server** – Provides the browser console for buckets, IAM, and policies. It proxies to the API for presign operations. + +Both servers read `AppConfig`, so editing JSON stores on disk instantly affects both surfaces. + +## 2. Quickstart + +```bash +python -m venv .venv +. .venv/Scripts/activate # Windows (Git Bash); Linux/macOS: . .venv/bin/activate; PowerShell: .\.venv\Scripts\Activate.ps1 +pip install -r requirements.txt + +# Run both API and UI +python run.py +``` + +Visit `http://127.0.0.1:5100/ui` to use the console and `http://127.0.0.1:5000/` (with IAM headers) for raw API calls. + +### Run modes + +You can run services individually if needed: + +```bash +python run.py --mode api # API only (port 5000) +python run.py --mode ui # UI only (port 5100) +``` + +### Docker quickstart + +The repo now ships a `Dockerfile` so you can run both services in one container: + +```bash +docker build -t myfsio . 
+docker run --rm -p 5000:5000 -p 5100:5100 \ + -v "$PWD/data:/app/data" \ + -v "$PWD/logs:/app/logs" \ + -e SECRET_KEY="change-me" \ + --name myfsio myfsio +``` + +PowerShell (Windows) example: + +```powershell +docker run --rm -p 5000:5000 -p 5100:5100 ` + -v ${PWD}\data:/app/data ` + -v ${PWD}\logs:/app/logs ` + -e SECRET_KEY="change-me" ` + --name myfsio myfsio +``` + +Key mount points: +- `/app/data` → persists buckets directly under `/app/data/` while system metadata (IAM config, bucket policies, versions, multipart uploads, etc.) lives under `/app/data/.myfsio.sys` (for example, `/app/data/.myfsio.sys/config/iam.json`). +- `/app/logs` → captures the rotating app log. +- `/app/tmp-storage` (optional) if you rely on the demo upload staging folders. + +With these volumes attached you can rebuild/restart the container without losing stored objects or credentials. + +### Versioning + +The repo now tracks a human-friendly release string inside `app/version.py` (see the `APP_VERSION` constant). Edit that value whenever you cut a release. The constant flows into Flask as `APP_VERSION` and is exposed via `GET /healthz`, so you can monitor deployments or surface it in UIs. + +## 3. Configuration Reference + +| Variable | Default | Notes | +| --- | --- | --- | +| `STORAGE_ROOT` | `/data` | Filesystem home for all buckets/objects. | +| `MAX_UPLOAD_SIZE` | `1073741824` | Bytes. Caps incoming uploads in both API + UI. | +| `UI_PAGE_SIZE` | `100` | `MaxKeys` hint shown in listings. | +| `SECRET_KEY` | `dev-secret-key` | Flask session key for UI auth. | +| `IAM_CONFIG` | `/data/.myfsio.sys/config/iam.json` | Stores users, secrets, and inline policies. | +| `BUCKET_POLICY_PATH` | `/data/.myfsio.sys/config/bucket_policies.json` | Bucket policy store (auto hot-reload). | +| `API_BASE_URL` | `None` | Used by the UI to hit API endpoints (presign/policy). If unset, the UI will auto-detect the host or use `X-Forwarded-*` headers. 
| +| `AWS_REGION` | `us-east-1` | Region embedded in SigV4 credential scope. | +| `AWS_SERVICE` | `s3` | Service string for SigV4. | +| `ENCRYPTION_ENABLED` | `false` | Enable server-side encryption support. | +| `KMS_ENABLED` | `false` | Enable KMS key management for encryption. | +| `KMS_KEYS_PATH` | `data/kms_keys.json` | Path to store KMS key metadata. | +| `ENCRYPTION_MASTER_KEY_PATH` | `data/master.key` | Path to the master encryption key file. | + +Set env vars (or pass overrides to `create_app`) to point the servers at custom paths. + +### Proxy Configuration + +If running behind a reverse proxy (e.g., Nginx, Cloudflare, or a tunnel), ensure the proxy sets the standard forwarding headers: +- `X-Forwarded-Host` +- `X-Forwarded-Proto` + +The application automatically trusts these headers to generate correct presigned URLs (e.g., `https://s3.example.com/...` instead of `http://127.0.0.1:5000/...`). Alternatively, you can explicitly set `API_BASE_URL` to your public endpoint. + +## 4. Authentication & IAM + +1. On first boot, `data/.myfsio.sys/config/iam.json` is seeded with `localadmin / localadmin` that has wildcard access. +2. Sign into the UI using those credentials, then open **IAM**: + - **Create user**: supply a display name and optional JSON inline policy array. + - **Rotate secret**: generates a new secret key; the UI surfaces it once. + - **Policy editor**: select a user, paste an array of objects (`{"bucket": "*", "actions": ["list", "read"]}`), and submit. Alias support includes AWS-style verbs (e.g., `s3:GetObject`). +3. Wildcard action `iam:*` is supported for admin user definitions. + +The API expects every request to include `X-Access-Key` and `X-Secret-Key` headers. The UI persists them in the Flask session after login. 
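As an illustration of the header-based authentication described above, here is a minimal sketch using only the Python standard library. The helper name `build_request` is hypothetical (it is not part of MyFSIO); the base URL is the default API port from the Quickstart.

```python
import urllib.request

API_BASE = "http://127.0.0.1:5000"  # default API port from the Quickstart


def build_request(path, access_key, secret_key, method="GET", base_url=API_BASE):
    """Return a urllib Request carrying the two IAM headers the API expects.

    Hypothetical helper for illustration; only the header names come from
    the documentation.
    """
    request = urllib.request.Request(base_url.rstrip("/") + path, method=method)
    request.add_header("X-Access-Key", access_key)
    request.add_header("X-Secret-Key", secret_key)
    return request
```

Against a running instance, `urllib.request.urlopen(build_request("/", "localadmin", "localadmin"))` should send an authenticated `GET /` using the seeded first-boot credentials, assuming they have not been rotated yet.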
+ +### Available IAM Actions + +| Action | Description | AWS Aliases | +| --- | --- | --- | +| `list` | List buckets and objects | `s3:ListBucket`, `s3:ListAllMyBuckets`, `s3:ListBucketVersions`, `s3:ListMultipartUploads`, `s3:ListParts` | +| `read` | Download objects | `s3:GetObject`, `s3:GetObjectVersion`, `s3:GetObjectTagging`, `s3:HeadObject`, `s3:HeadBucket` | +| `write` | Upload objects, create buckets | `s3:PutObject`, `s3:CreateBucket`, `s3:CreateMultipartUpload`, `s3:UploadPart`, `s3:CompleteMultipartUpload`, `s3:AbortMultipartUpload`, `s3:CopyObject` | +| `delete` | Remove objects and buckets | `s3:DeleteObject`, `s3:DeleteObjectVersion`, `s3:DeleteBucket` | +| `share` | Manage ACLs | `s3:PutObjectAcl`, `s3:PutBucketAcl`, `s3:GetBucketAcl` | +| `policy` | Manage bucket policies | `s3:PutBucketPolicy`, `s3:GetBucketPolicy`, `s3:DeleteBucketPolicy` | +| `replication` | Configure and manage replication | `s3:GetReplicationConfiguration`, `s3:PutReplicationConfiguration`, `s3:ReplicateObject`, `s3:ReplicateTags`, `s3:ReplicateDelete` | +| `iam:list_users` | View IAM users | `iam:ListUsers` | +| `iam:create_user` | Create IAM users | `iam:CreateUser` | +| `iam:delete_user` | Delete IAM users | `iam:DeleteUser` | +| `iam:rotate_key` | Rotate user secrets | `iam:RotateAccessKey` | +| `iam:update_policy` | Modify user policies | `iam:PutUserPolicy` | +| `iam:*` | All IAM actions (admin wildcard) | — | + +### Example Policies + +**Full Control (admin):** +```json +[{"bucket": "*", "actions": ["list", "read", "write", "delete", "share", "policy", "replication", "iam:*"]}] +``` + +**Read-Only:** +```json +[{"bucket": "*", "actions": ["list", "read"]}] +``` + +**Single Bucket Access (no listing other buckets):** +```json +[{"bucket": "user-bucket", "actions": ["read", "write", "delete"]}] +``` + +**Bucket Access with Replication:** +```json +[{"bucket": "my-bucket", "actions": ["list", "read", "write", "delete", "replication"]}] +``` + +## 5. 
Bucket Policies & Presets + +- **Storage**: Policies are persisted in `data/.myfsio.sys/config/bucket_policies.json` under `{"policies": {"bucket": {...}}}`. +- **Hot reload**: Both API and UI call `maybe_reload()` before evaluating policies. Editing the JSON on disk is immediately reflected—no restarts required. +- **UI editor**: Each bucket detail page includes: + - A preset selector: **Private** detaches the policy (delete mode), **Public** injects an allow policy granting anonymous `s3:ListBucket` + `s3:GetObject`, and **Custom** restores your draft. + - A read-only preview of the attached policy. + - Autosave behavior for custom drafts while you type. + +### Editing via CLI + +```bash +curl -X PUT http://127.0.0.1:5000/bucket-policy/test \ + -H "Content-Type: application/json" \ + -H "X-Access-Key: ..." -H "X-Secret-Key: ..." \ + -d '{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Principal": "*", + "Action": ["s3:ListBucket"], + "Resource": ["arn:aws:s3:::test"] + } + ] + }' +``` + +The UI will reflect this change as soon as the request completes thanks to the hot reload. + +## 6. Presigned URLs + +- Trigger from the UI using the **Presign** button after selecting an object. +- Or call `POST /presign//` with JSON `{ "method": "GET", "expires_in": 900 }`. +- Supported methods: `GET`, `PUT`, `DELETE`; expiration must be `1..604800` seconds. +- The service signs requests using the caller’s IAM credentials and enforces bucket policies both when issuing and when the presigned URL is used. +- Legacy share links have been removed; presigned URLs now handle both private and public workflows. 
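The presign call above can be sketched in Python as follows. This is a hedged illustration, not MyFSIO's own client: `build_presign_request` is a hypothetical helper, and the path shape `/presign/{bucket}/{key}` is an assumption about the route's bucket and key segments. The allowed methods and the `1..604800` expiry range are taken from the list above and are checked client-side before anything is sent.

```python
import json
import urllib.parse
import urllib.request

ALLOWED_METHODS = {"GET", "PUT", "DELETE"}
MAX_EXPIRES_IN = 604800  # seconds, the upper bound stated above


def build_presign_request(bucket, key, access_key, secret_key,
                          method="GET", expires_in=900,
                          base_url="http://127.0.0.1:5000"):
    """Build (but do not send) a POST request asking the API for a presigned URL."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method!r}")
    if not 1 <= expires_in <= MAX_EXPIRES_IN:
        raise ValueError("expires_in must be between 1 and 604800 seconds")

    # ASSUMPTION: the route takes the bucket and the object key as path segments.
    path = "/presign/{}/{}".format(
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key),  # keeps "/" inside the key, escapes spaces etc.
    )
    body = json.dumps({"method": method, "expires_in": expires_in}).encode("utf-8")

    request = urllib.request.Request(base_url.rstrip("/") + path, data=body, method="POST")
    request.add_header("Content-Type", "application/json")
    request.add_header("X-Access-Key", access_key)
    request.add_header("X-Secret-Key", secret_key)
    return request
```

Passing the returned object to `urllib.request.urlopen` would issue the call; the shape of the JSON response is not described in this document, so the sketch stops at building the request.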
+ +### Multipart Upload Example + +```python +import boto3 + +s3 = boto3.client('s3', endpoint_url='http://localhost:5000') + +# Initiate +response = s3.create_multipart_upload(Bucket='mybucket', Key='large.bin') +upload_id = response['UploadId'] + +# Upload parts +parts = [] +chunks = [b'chunk1', b'chunk2'] # Example data chunks +for part_number, chunk in enumerate(chunks, start=1): + response = s3.upload_part( + Bucket='mybucket', + Key='large.bin', + PartNumber=part_number, + UploadId=upload_id, + Body=chunk + ) + parts.append({'PartNumber': part_number, 'ETag': response['ETag']}) + +# Complete +s3.complete_multipart_upload( + Bucket='mybucket', + Key='large.bin', + UploadId=upload_id, + MultipartUpload={'Parts': parts} +) +``` + +## 7. Encryption + +MyFSIO supports **server-side encryption at rest** to protect your data. When enabled, objects are encrypted using AES-256-GCM before being written to disk. + +### Encryption Types + +| Type | Description | +|------|-------------| +| **AES-256 (SSE-S3)** | Server-managed encryption using a local master key | +| **KMS (SSE-KMS)** | Encryption using customer-managed keys via the built-in KMS | + +### Enabling Encryption + +#### 1. Set Environment Variables + +```powershell +# PowerShell +$env:ENCRYPTION_ENABLED = "true" +$env:KMS_ENABLED = "true" # Optional, for KMS key management +python run.py +``` + +```bash +# Bash +export ENCRYPTION_ENABLED=true +export KMS_ENABLED=true +python run.py +``` + +#### 2. Configure Bucket Default Encryption (UI) + +1. Navigate to your bucket in the UI +2. Click the **Properties** tab +3. Find the **Default Encryption** card +4. Click **Enable Encryption** +5. Choose algorithm: + - **AES-256**: Uses the server's master key + - **aws:kms**: Uses a KMS-managed key (select from dropdown) +6. Save changes + +Once enabled, all **new objects** uploaded to the bucket will be automatically encrypted. 
+ +### KMS Key Management + +When `KMS_ENABLED=true`, you can manage encryption keys via the KMS API: + +```bash +# Create a new KMS key +curl -X POST http://localhost:5000/kms/keys \ + -H "Content-Type: application/json" \ + -H "X-Access-Key: ..." -H "X-Secret-Key: ..." \ + -d '{"alias": "my-key", "description": "Production encryption key"}' + +# List all keys +curl http://localhost:5000/kms/keys \ + -H "X-Access-Key: ..." -H "X-Secret-Key: ..." + +# Get key details +curl http://localhost:5000/kms/keys/{key-id} \ + -H "X-Access-Key: ..." -H "X-Secret-Key: ..." + +# Rotate a key (creates new key material) +curl -X POST http://localhost:5000/kms/keys/{key-id}/rotate \ + -H "X-Access-Key: ..." -H "X-Secret-Key: ..." + +# Disable/Enable a key +curl -X POST http://localhost:5000/kms/keys/{key-id}/disable \ + -H "X-Access-Key: ..." -H "X-Secret-Key: ..." + +curl -X POST http://localhost:5000/kms/keys/{key-id}/enable \ + -H "X-Access-Key: ..." -H "X-Secret-Key: ..." + +# Schedule key deletion (30-day waiting period); the URL is quoted so the shell does not treat "?" as a glob +curl -X DELETE "http://localhost:5000/kms/keys/{key-id}?waiting_period_days=30" \ + -H "X-Access-Key: ..." -H "X-Secret-Key: ..." +``` + +### How It Works + +1. **Envelope Encryption**: Each object is encrypted with a unique Data Encryption Key (DEK) +2. **Key Wrapping**: The DEK is encrypted (wrapped) by the master key or KMS key +3. **Storage**: The encrypted DEK is stored alongside the encrypted object +4. **Decryption**: On read, the DEK is unwrapped and used to decrypt the object + +### Client-Side Encryption + +For additional security, you can use client-side encryption. 
The `ClientEncryptionHelper` class provides utilities: + +```python +from app.encryption import ClientEncryptionHelper + +# Generate a client-side key +key = ClientEncryptionHelper.generate_key() +key_b64 = ClientEncryptionHelper.key_to_base64(key) + +# Encrypt before upload +plaintext = b"sensitive data" +encrypted, metadata = ClientEncryptionHelper.encrypt_for_upload(plaintext, key) + +# Upload with metadata headers +# x-amz-meta-x-amz-key: +# x-amz-meta-x-amz-iv: +# x-amz-meta-x-amz-matdesc: + +# Decrypt after download +decrypted = ClientEncryptionHelper.decrypt_from_download(encrypted, metadata, key) +``` + +### Important Notes + +- **Existing objects are NOT encrypted** - Only new uploads after enabling encryption are encrypted +- **Master key security** - The master key file (`master.key`) should be backed up securely and protected +- **Key rotation** - Rotating a KMS key creates new key material; existing objects remain encrypted with the old material +- **Disabled keys** - Objects encrypted with a disabled key cannot be decrypted until the key is re-enabled +- **Deleted keys** - Once a key is deleted (after the waiting period), objects encrypted with it are permanently inaccessible + +### Verifying Encryption + +To verify an object is encrypted: +1. Check the raw file in `data//` - it should be unreadable binary +2. Look for `.meta` files containing encryption metadata +3. Download via the API/UI - the object should be automatically decrypted + +## 8. Bucket Quotas + +MyFSIO supports **storage quotas** to limit how much data a bucket can hold. Quotas are enforced on uploads and multipart completions. 
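To make the enforcement idea concrete, here is a small sketch of the kind of check that could run before an upload or a multipart completion is accepted. It is not MyFSIO's implementation: the `Quota` class and `would_exceed_quota` function are hypothetical, the field names mirror the `max_bytes` / `max_objects` JSON fields used by the quota API, and two behaviours are assumptions: landing exactly on a limit is allowed, and every upload counts as one new object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quota:
    max_bytes: Optional[int] = None    # None means unlimited
    max_objects: Optional[int] = None  # None means unlimited


def would_exceed_quota(quota, used_bytes, used_objects, incoming_bytes, incoming_objects=1):
    """Return True if accepting the incoming data would push the bucket past a limit.

    ASSUMPTION: usage figures already include archived versions, and reaching
    a limit exactly is still permitted.
    """
    if quota.max_bytes is not None and used_bytes + incoming_bytes > quota.max_bytes:
        return True
    if quota.max_objects is not None and used_objects + incoming_objects > quota.max_objects:
        return True
    return False
```

For example, with `Quota(max_bytes=104857600, max_objects=1000)` a 10 MB upload into a bucket already holding 95 MB would be rejected, while the same upload into an empty bucket would pass.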
+ +### Quota Types + +| Limit | Description | +|-------|-------------| +| **Max Size (MB)** | Maximum total storage in megabytes (includes current objects + archived versions) | +| **Max Objects** | Maximum number of objects (includes current objects + archived versions) | + +### Managing Quotas (Admin Only) + +Quota management is restricted to administrators (users with `iam:*` or `iam:list_users` permissions). + +#### Via UI + +1. Navigate to your bucket in the UI +2. Click the **Properties** tab +3. Find the **Storage Quota** card +4. Enter limits: + - **Max Size (MB)**: Leave empty for unlimited + - **Max Objects**: Leave empty for unlimited +5. Click **Update Quota** + +To remove a quota, click **Remove Quota**. + +#### Via API + +```bash +# Set quota (max 100MB, max 1000 objects) +curl -X PUT "http://localhost:5000/bucket/?quota" \ + -H "Content-Type: application/json" \ + -H "X-Access-Key: ..." -H "X-Secret-Key: ..." \ + -d '{"max_bytes": 104857600, "max_objects": 1000}' + +# Get current quota +curl "http://localhost:5000/bucket/?quota" \ + -H "X-Access-Key: ..." -H "X-Secret-Key: ..." + +# Remove quota +curl -X PUT "http://localhost:5000/bucket/?quota" \ + -H "Content-Type: application/json" \ + -H "X-Access-Key: ..." -H "X-Secret-Key: ..." \ + -d '{"max_bytes": null, "max_objects": null}' +``` + +### Quota Behavior + +- **Version Counting**: When versioning is enabled, archived versions count toward the quota +- **Enforcement Points**: Quotas are checked during `PUT` object and `CompleteMultipartUpload` operations +- **Error Response**: When quota is exceeded, the API returns `HTTP 400` with error code `QuotaExceeded` +- **Visibility**: All users can view quota usage in the bucket detail page, but only admins can modify quotas + +### Example Error + +```xml + + QuotaExceeded + Bucket quota exceeded: storage limit reached + my-bucket + +``` + +## 9. 
Site Replication + +### Permission Model + +Replication uses a two-tier permission system: + +| Role | Capabilities | +|------|--------------| +| **Admin** (users with `iam:*` permissions) | Create/delete replication rules, configure connections and target buckets | +| **Users** (with `replication` permission) | Enable/disable (pause/resume) existing replication rules | + +> **Note:** The Replication tab is hidden for users without the `replication` permission on the bucket. + +This separation allows administrators to pre-configure where data should replicate, while allowing authorized users to toggle replication on/off without accessing connection credentials. + +### Architecture + +- **Source Instance**: The MyFSIO instance where you upload files. It runs the replication worker. +- **Target Instance**: Another MyFSIO instance (or any S3-compatible service like AWS S3, MinIO) that receives the copies. + +Replication is **asynchronous** (happens in the background) and **one-way** (Source -> Target). + +### Setup Guide + +#### 1. Prepare the Target Instance + +If your target is another MyFSIO server (e.g., running on a different machine or port), you need to create a destination bucket and a user with write permissions. + +**Option A: Using the UI (Easiest)** +If you have access to the UI of the target instance: +1. Log in to the Target UI. +2. Create a new bucket (e.g., `backup-bucket`). +3. Go to **IAM**, create a new user (e.g., `replication-user`), and copy the Access/Secret keys. + +**Option B: Headless Setup (API Only)** +If the target server is only running the API (`run_api.py`) and has no UI access, you can bootstrap the credentials and bucket by running a Python script on the server itself. 
+ +Run this script on the **Target Server**: + +```python +# setup_target.py +from pathlib import Path +from app.iam import IamService +from app.storage import ObjectStorage + +# Initialize services (paths match default config) +data_dir = Path("data") +iam = IamService(data_dir / ".myfsio.sys" / "config" / "iam.json") +storage = ObjectStorage(data_dir) + +# 1. Create the bucket +bucket_name = "backup-bucket" +try: + storage.create_bucket(bucket_name) + print(f"Bucket '{bucket_name}' created.") +except Exception as e: + print(f"Bucket creation skipped: {e}") + +# 2. Create the user +try: + # Create user with full access (or restrict policy as needed) + creds = iam.create_user( + display_name="Replication User", + policies=[{"bucket": bucket_name, "actions": ["write", "read", "list"]}] + ) + print("\n--- CREDENTIALS GENERATED ---") + print(f"Access Key: {creds['access_key']}") + print(f"Secret Key: {creds['secret_key']}") + print("-----------------------------") +except Exception as e: + print(f"User creation failed: {e}") +``` + +Save and run: `python setup_target.py` + +#### 2. Configure the Source Instance + +Now, configure the primary instance to replicate to the target. + +1. **Access the Console**: + Log in to the UI of your Source Instance. + +2. **Add a Connection**: + - Navigate to **Connections** in the top menu. + - Click **Add Connection**. + - **Name**: `Secondary Site`. + - **Endpoint URL**: The URL of your Target Instance's API (e.g., `http://target-server:5002`). + - **Access Key**: The key you generated on the Target. + - **Secret Key**: The secret you generated on the Target. + - Click **Add Connection**. + +3. **Enable Replication** (Admin): + - Navigate to **Buckets** and select the source bucket. + - Switch to the **Replication** tab. + - Select the `Secondary Site` connection. + - Enter the target bucket name (`backup-bucket`). + - Click **Enable Replication**. 
+ + Once configured, users with `replication` permission on this bucket can pause/resume replication without needing access to connection details. + +### Verification + +1. Upload a file to the source bucket. +2. Check the target bucket (via UI, CLI, or API). The file should appear shortly. + +```bash +# Verify on target using AWS CLI +aws --endpoint-url http://target-server:5002 s3 ls s3://backup-bucket +``` + +### Pausing and Resuming Replication + +Users with the `replication` permission (but not admin rights) can pause and resume existing replication rules: + +1. Navigate to the bucket's **Replication** tab. +2. If replication is **Active**, click **Pause Replication** to temporarily stop syncing. +3. If replication is **Paused**, click **Resume Replication** to continue syncing. + +When paused, new objects uploaded to the source will not replicate until replication is resumed. Objects uploaded while paused will be replicated once resumed. + +> **Note:** Only admins can create new replication rules, change the target connection/bucket, or delete rules entirely. + +### Bidirectional Replication (Active-Active) + +To set up two-way replication (Server A ↔ Server B): + +1. Follow the steps above to replicate **A → B**. +2. Repeat the process on Server B to replicate **B → A**: + - Create a connection on Server B pointing to Server A. + - Enable replication on the target bucket on Server B. + +**Loop Prevention**: The system automatically detects replication traffic using a custom User-Agent (`S3ReplicationAgent`). This prevents infinite loops where an object replicated from A to B is immediately replicated back to A. + +**Deletes**: Deleting an object on one server will propagate the deletion to the other server. + +**Note**: Deleting a bucket will automatically remove its associated replication configuration. + +## 11. 
Running Tests + +```bash +pytest -q +``` + +The suite now includes a boto3 integration test that spins up a live HTTP server and drives the API through the official AWS SDK. If you want to skip it (for faster unit-only loops), run `pytest -m "not integration"`. + +The suite covers bucket CRUD, presigned downloads, bucket policy enforcement, and regression tests for anonymous reads when a Public policy is attached. + +## 12. Troubleshooting + +| Symptom | Likely Cause | Fix | +| --- | --- | --- | +| 403 from API despite Public preset | Policy didn’t save or bucket key path mismatch | Reapply Public preset, confirm bucket name in `Resource` matches `arn:aws:s3:::bucket/*`. | +| UI still shows old policy text | Browser cached view before hot reload | Refresh; JSON is already reloaded on server. | +| Presign modal errors with 403 | IAM user lacks `read/write/delete` for target bucket or bucket policy denies | Update IAM inline policies or remove conflicting deny statements. | +| Large upload rejected immediately | File exceeds `MAX_UPLOAD_SIZE` | Increase env var or shrink object. | + +## 13. API Matrix + +``` +GET / # List buckets +PUT / # Create bucket +DELETE / # Remove bucket +GET / # List objects +PUT // # Upload object +GET // # Download object +DELETE // # Delete object +POST /presign// # Generate SigV4 URL +GET /bucket-policy/ # Fetch policy +PUT /bucket-policy/ # Upsert policy +DELETE /bucket-policy/ # Delete policy +GET /?quota # Get bucket quota +PUT /?quota # Set bucket quota (admin only) +``` + +## 14. Next Steps + +- Tailor IAM + policy JSON files for team-ready presets. +- Wrap `run_api.py` with gunicorn or another WSGI server for long-running workloads. +- Extend `bucket_policies.json` to cover Deny statements that simulate production security controls.
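
As a starting point for the Deny-statement idea in the last bullet, the following sketch builds a policy document in the same AWS-style shape used by the `PUT /bucket-policy/test` example earlier. The function name `build_policy_with_deny` is hypothetical, and whether MyFSIO gives an explicit Deny precedence over an Allow (as AWS does) is an assumption that this document does not confirm.

```python
import json


def build_policy_with_deny(bucket):
    """Allow anonymous list/read on a bucket while explicitly denying object deletion."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": ["s3:ListBucket", "s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            },
            {
                # ASSUMPTION: Deny overrides Allow, following AWS semantics.
                "Effect": "Deny",
                "Principal": "*",
                "Action": ["s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


# Serialized body, suitable for the -d argument of the curl example in section 5.
policy_json = json.dumps(build_policy_with_deny("test"), indent=2)
```

Applying it and then attempting an anonymous delete is a quick way to check how the policy engine actually treats the Deny statement.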