SeaweedFS

SeaweedFS is a fast, scalable distributed storage system designed primarily for blobs, objects, files, and data lakes, and it can manage billions of files efficiently. Inspired by Facebook's Haystack design, it optimizes for two core goals: storing massive numbers of files and serving them at high speed through O(1) disk seeks. Whereas traditional systems centralize all file metadata on a master node, SeaweedFS distributes it across volume servers, which reduces concurrency bottlenecks and lets a file be served with just one disk read.

The architecture begins with a master server that manages volume locations across storage nodes, while volume servers handle the actual file data and its lightweight metadata, only 40 bytes per file. Users deploy it as a single binary executable named "weed", which makes setup straightforward: a command like "weed server -dir=/some/data/dir -s3" launches the master, volume, filer, and S3 gateway components together. Scaling out means adding more volume servers, each pointing to the master, with no mandatory data rebalancing. Rack-aware replication and cloud tiering for offloading warm data are also supported.
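The split between master and volume servers shows up in how a client writes a file: it first asks the master for a file id, then sends the bytes straight to the volume server named in the reply. The sketch below only parses such a reply and builds the target URL, without touching the network. The sample reply is modeled on the format shown in the project's README, and the helper names (parse_assign_response, split_fid, volume_url) are hypothetical, chosen for illustration.

```python
import json


def parse_assign_response(body):
    """Return (fid, volume server address) from a master assign reply."""
    reply = json.loads(body)
    return reply["fid"], reply["url"]


def split_fid(fid):
    """Split a file id such as '3,01637037d6' into (volume id, file key)."""
    volume_id, _, file_key = fid.partition(",")
    return int(volume_id), file_key


def volume_url(server, fid):
    """Build the URL on the volume server where the file is written or read."""
    return f"http://{server}/{fid}"


# Illustrative reply to a request such as GET http://localhost:9333/dir/assign
# (assumed shape; check the reply of your own deployment).
sample = (
    '{"count":1,"fid":"3,01637037d6",'
    '"url":"127.0.0.1:8080","publicUrl":"localhost:8080"}'
)

fid, server = parse_assign_response(sample)
print(split_fid(fid))           # volume id and file key
print(volume_url(server, fid))  # where the client sends the file bytes
```

The master only hands out the file id and the server address. The file content itself never passes through it, which is why the master stays lightweight.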

Features

It supports flexible replication strategies, from no replication for cost savings to multi-data-center options like "200", which keeps two extra copies in other data centers, alongside automatic gzip compression, TTL expiration, and optional image resizing. The optional Filer component adds POSIX-like directory structures and attributes, mountable via FUSE, with metadata backed by proven stores like PostgreSQL, Redis, or Cassandra for linear scalability.
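A replication setting such as "200" is a three-digit code. As described in the project's documentation, the digits count extra copies in other data centers, in other racks of the same data center, and on other servers in the same rack. A minimal sketch of decoding such a code follows. The function name describe_replication is hypothetical and not part of SeaweedFS.

```python
import re


def describe_replication(code):
    """Decode a three-digit replication code into copy counts."""
    if not re.fullmatch(r"[0-9]{3}", code):
        raise ValueError(f"expected three digits, got {code!r}")
    other_dc, other_rack, same_rack = (int(c) for c in code)
    return {
        "other_data_centers": other_dc,
        "other_racks": other_rack,
        "same_rack": same_rack,
        # The original copy plus every extra copy.
        "total_copies": 1 + other_dc + other_rack + same_rack,
    }


for code in ("000", "001", "200"):
    print(code, describe_replication(code))
```

Read this way, "000" means a single copy and "200" means three copies in total, one local and two in other data centers.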

Additional capabilities include S3-compatible APIs for integration with existing tools, Hadoop compatibility for analytics workloads, WebDAV for drive mapping, and Kubernetes CSI drivers for containerized environments. Erasure coding reduces storage costs for warm data, while features like active-active replication and AES256-GCM encryption enhance reliability and security. Transparent cloud integration allows hot data to stay local for speed while archiving to providers like AWS S3 or Google Cloud, minimizing API costs through O(1) access patterns. Benchmarks on modest hardware show it handling mixed workloads at over 369 MiB/s throughput, with reads for a million 1KB files averaging under 1ms latency.

When to Choose SeaweedFS Over Traditional Systems

Use SeaweedFS when dealing with high volumes of small files, such as media libraries, logs, or backups, where systems like HDFS falter because their chunk-based design adds overhead that is ill-suited to quick concurrent access. Compared to GlusterFS or Ceph, it offers a flatter architecture: clients talk to volume servers directly over HTTP, cache volume lookups for O(1) seeks, and get customizable metadata without the hashing complexities that trigger costly rebalancing. Its single-binary mode simplifies operations compared with the multi-node setups of competitors, and its SSD-friendliness prevents performance degradation from fragmentation.
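The effect of caching volume lookups can be sketched with a small client-side cache: the master is asked where a volume lives only once, and every later read for a file in that volume goes straight to the volume server. The class VolumeLocationCache and the fake lookup function below are hypothetical stand-ins for illustration. A real client would query the master instead.

```python
class VolumeLocationCache:
    """Remember which servers hold each volume so the master is asked once."""

    def __init__(self, lookup):
        # lookup: callable taking a volume id and returning a list of
        # "host:port" strings (here a stand-in for a call to the master).
        self._lookup = lookup
        self._locations = {}

    def locations(self, volume_id):
        if volume_id not in self._locations:
            self._locations[volume_id] = self._lookup(volume_id)
        return self._locations[volume_id]

    def url_for(self, fid):
        # The volume id is the part of the file id before the comma.
        volume_id = int(fid.split(",", 1)[0])
        return f"http://{self.locations(volume_id)[0]}/{fid}"


calls = []


def fake_master_lookup(volume_id):
    """Pretend master: records each call and returns one fixed server."""
    calls.append(volume_id)
    return ["127.0.0.1:8080"]


cache = VolumeLocationCache(fake_master_lookup)
print(cache.url_for("3,01637037d6"))
print(cache.url_for("3,02aabbccdd"))  # same volume, served from the cache
print(len(calls))                     # master lookups performed so far
```

A production client would also need to expire or refresh entries when volumes move. The sketch leaves that out.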

For large-scale deployments, SeaweedFS theoretically supports up to 128 exbibytes, with volumes capped at 32GiB for predictability, while rack and data-center awareness plus master failover provide fault tolerance without single points of failure. Developers appreciate being able to insert custom keys, chunk super-large files transparently, and use tiered storage that blends local speed with cloud elasticity: hot data is served instantly and warm data is archived cost-effectively. In Kubernetes or Docker, its operator and single-command S3 starts like "docker run -p 8333:8333 chrislusf/seaweedfs server -s3" cover the path from prototyping to production.
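The two capacity figures are consistent with each other if volume ids are 32-bit numbers. That id width is an assumption made here for the arithmetic and is not stated in the text above.

```python
GIB = 2**30
EIB = 2**60

volume_size = 32 * GIB   # per-volume size from the text
volume_count = 2**32     # assumption: volume ids are 32-bit integers

total_bytes = volume_size * volume_count
print(total_bytes // EIB)  # → 128
```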

The MinIO Controversy and SeaweedFS as an Alternative

MinIO, once a darling of open-source S3-compatible object storage, entered maintenance mode in late 2025 amid backlash over restrictive licensing changes that locked enterprise features behind exorbitant paid tiers, alienating users who had built workflows on its free version. Critics labeled it a "bait-and-switch," as core functionality like advanced versioning, object lock, and IAM integrations became inaccessible without subscriptions, pushing self-hosters toward vendor lock-in despite an S3 focus that suits cloud-native apps. The shift also exposed MinIO's limitations for small-file-heavy workloads, where a separate metadata file per object amplifies I/O and there are no O(1) optimizations.

SeaweedFS is an alternative that remains fully open source under the Apache 2.0 license and is actively developed. It recently added server-side encryption, object versioning, retention locks, and IAM, the features MinIO paywalled, without compromising simplicity. It outperforms MinIO in benchmarks for small objects and mixed operations, using volume-based metadata for true O(1) reads, whereas MinIO applies erasure coding to all data, which slows hot access. Users report that SeaweedFS handles massive concurrency better for logs or CDNs and scales more simply: just spin up more volume servers, versus MinIO's rigid layouts.

MinIO prioritizes S3 ecosystem familiarity and UI polish. SeaweedFS trades some of that for raw performance and flexibility, supporting POSIX FUSE alongside S3, erasure coding applied selectively to warm tiers, and cloud gateways without data migration pains. For teams escaping the MinIO controversy, it offers easier deployment, with no complex erasure setups, and proven results in small-file benchmarks, making it a good fit for media, analytics, or high-throughput apps.

If you are in this tough spot, you may want to have a look: https://github.com/seaweedfs/seaweedfs