Milvus with SolidFire and E-Series
source link: https://scaleoutsean.github.io/2022/07/07/milvus-with-solidfire-e-series.html
07 Jul 2022 -
6 minute read
WTF is Milvus
Milvus is a vector database built for scalable similarity search.
Storage-related stuff
To get Milvus up and running I first RTFM’d. One of the deployment options that fit my existing environment* was Milvus Standalone - a local Milvus instance that can be started with Docker Compose. (* I’m officially not working this week, so I didn’t want to go out of my way to try it out. I had three SolidFire volumes mounted from my recent Kafka efficiency testing, so I used those.) The volumes:
- etcd - as the name suggests, Milvus Standalone uses singleton etcd instance for cluster metadata
- standalone - location for local Milvus data when Milvus is deployed in stand-alone mode
- object store - volume for S3, currently must be Minio-based, where Milvus moves sealed segments when it’s done indexing them
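In the stock milvus-standalone docker-compose.yml these services keep their data under ./volumes/; a minimal sketch of the overrides that point them at pre-mounted SolidFire volumes instead (the /mnt/sf-* host mount points are assumptions for illustration, while the container-side paths follow the upstream compose file):

```yaml
# Sketch: docker-compose volume overrides for Milvus Standalone.
# /mnt/sf-etcd, /mnt/sf-minio and /mnt/sf-milvus are hypothetical
# mount points for the three SolidFire iSCSI volumes.
services:
  etcd:
    volumes:
      - /mnt/sf-etcd:/etcd
  minio:
    volumes:
      - /mnt/sf-minio:/minio_data
  standalone:
    volumes:
      - /mnt/sf-milvus:/var/lib/milvus
```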
Normally things are more complicated than the all-in-one Standalone deployment.
Source: Milvus v2.0 (documentation)
I don’t have the resources to do this easily right now, so for the time being I’ll stick with Milvus Standalone. Let’s look at those three volumes used by the Docker Compose file for Milvus Standalone.
Meta storage (etcd)
etcd I/O is small and the workload is not a novelty, since we know it from Kubernetes.
For that we’d just provision a volume (or volumes, for larger clusters) on SSD storage. SolidFire is all-flash, so we’d just set Min IOPS on each such volume to say 5,000 IOPS. E-Series has no QoS settings, so we’d simply create an SSD-backed volume for each instance of etcd.
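On SolidFire that Min IOPS setting is just a QoS change on the volume; a minimal sketch that builds the Element API ModifyVolume JSON-RPC payload (the volume ID and the Max/Burst figures are assumptions for illustration, and the request is built but not sent):

```python
import json

def modify_volume_qos_payload(volume_id, min_iops, max_iops, burst_iops):
    """Build an Element API JSON-RPC payload that pins Min IOPS on a volume."""
    return {
        "method": "ModifyVolume",
        "params": {
            "volumeID": volume_id,
            "qos": {
                "minIOPS": min_iops,
                "maxIOPS": max_iops,
                "burstIOPS": burst_iops,
            },
        },
        "id": 1,
    }

# Hypothetical etcd volume (ID 613) gets 5,000 Min IOPS.
payload = modify_volume_qos_payload(613, 5000, 15000, 15000)
print(json.dumps(payload))
```

The payload would be POSTed to the cluster’s JSON-RPC endpoint (https://&lt;MVIP&gt;/json-rpc/&lt;version&gt;) with admin credentials.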
Logs and queues
Milvus Standalone uses just one persistent volume for this, which I think holds only message logs. The workload could be similar to Kafka (in fact, Milvus supports Pulsar and Kafka for message storage, but Milvus Standalone uses RocksMQ).
For small-to-medium Milvus, SolidFire should be fine, but for large deployments check out the E-Series EF300 or EF600 - the same recipe we would use for S3-tiered Kafka.
Capacity-wise I expect &lt;20 GB should be enough for Milvus Standalone (more if S3 goes down, since some time to retry uploads should be allowed), but we need to remember that production clusters are different (there are more containers, some are not even stateful, and stateful volumes may need different sizes), so I’ll take another look when I build a larger Milvus cluster.
Object store
Object Store workload is 100% write when there’s no query/search workload, and because uploading data to S3 deals with entire segments, these are large (1MB+) writes. I’m not sure how reads work in terms of request sizes, but I expect smaller reads (index data) combined with full segment downloads, so large and medium read requests. It’d be wasteful to run this off SolidFire; it’s OK for up to perhaps 1 GB/s, but large Minio runs better on E-Series and NetApp StorageGRID does too (and for large Milvus clusters we’d use dedicated StorageGRID appliances that we can also use for Kafka).
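To make the write-size claim concrete: a sealed segment uploaded to S3 via multipart upload lands on the backend as a handful of large sequential writes. A minimal sketch of that arithmetic (the 512 MiB segment and 64 MiB part size are illustrative choices, not Milvus or Minio defaults):

```python
def multipart_layout(object_bytes, part_bytes):
    """Return (part_count, last_part_bytes) for an S3 multipart upload."""
    if object_bytes <= 0 or part_bytes <= 0:
        raise ValueError("sizes must be positive")
    full, rem = divmod(object_bytes, part_bytes)
    return (full + (1 if rem else 0), rem or part_bytes)

# Illustrative: a 512 MiB sealed segment uploaded in 64 MiB parts
# arrives as 8 large writes - nothing like etcd's small-block pattern.
parts, last = multipart_layout(512 * 2**20, 64 * 2**20)
print(parts, last)
```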
My SolidFire “cluster” at home is a small VM, which means I couldn’t properly benchmark Milvus with it, but even this environment provided some insights regarding possible I/O patterns.
In a small “INSERT” test I did, the first volume (ID 613; etcd) saw a mostly-write workload consisting of small request sizes, the second volume (ID 614; S3 service) had a similar pattern due to Milvus tiering data to it, while the third volume (ID 615) saw mostly large-size IO.
Workloads on S3 (ID 614) and Milvus Standalone (ID 615; chart below) were similar, which wasn’t unexpected because data first lands on Milvus data volume and after indexing, it’s moved to S3.
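Those per-volume charts boil down to average request size. SolidFire exposes cumulative readBytes/readOps and writeBytes/writeOps counters per volume (e.g. via the Element API’s GetVolumeStats), so the pattern falls out of two samples - a minimal sketch, with hypothetical sample values:

```python
def avg_io_size(bytes_t0, ops_t0, bytes_t1, ops_t1):
    """Average request size (bytes) between two cumulative-counter samples."""
    ops = ops_t1 - ops_t0
    return (bytes_t1 - bytes_t0) / ops if ops else 0.0

# Hypothetical samples from volume 615: 1 GiB written in 1,024 ops
# over the interval -> ~1 MiB average write, i.e. large sequential I/O.
print(avg_io_size(0, 0, 2**30, 1024))
```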
As I said above, normally we wouldn’t use SolidFire for the Minio back-end - we want less fancy storage for that - so the S3 workload would be the first to go (to E-Series or StorageGRID) if we wanted to deploy Milvus in production. As mentioned above, Milvus seems to currently support only Minio, which will probably change in the coming months (I don’t have any “inside info”, I only know what other enterprises who prototype with Minio eventually do).
The rest would then be similar to other databases that write to S3 (when they cool data), and read from S3 (to download, decompress and search).
Storage efficiency
Milvus can be very storage-efficient, so don’t expect much in terms of savings from storage array compression and deduplication.
After using random data to populate Milvus Standalone, observed SolidFire efficiency was only 1.04x (4% savings) from deduplication and compression. This may be better with real-life data and given that Milvus doesn’t need a lot of capacity on local tier it’s not a big deal, but be cautious when counting on storage efficiencies if your available space is very tight (< 1 TB). I may run additional tests with real-life data if need arises.
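For reference, SolidFire reports efficiency as a multiplier (logical capacity over physical capacity consumed); converting between the 1.04x factor and the ~4% savings figure is simple arithmetic - a minimal sketch:

```python
def savings_pct(efficiency_factor):
    """Convert a storage-efficiency multiplier (logical/physical) to % saved."""
    return (1 - 1 / efficiency_factor) * 100

# The observed 1.04x dedupe+compression factor is roughly 4% savings;
# a 2.0x factor would mean 50% of the raw capacity saved.
print(round(savings_pct(1.04), 1))
```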
E-Series has no compression or deduplication, so this section doesn’t apply to it.
High availability of block and S3 storage services
Production clusters would have multiple replicas for etcd, messaging, index, and data. We could place redundant copies on one E-Series array or SolidFire cluster (both have redundant components), but to get even better redundancy we’d deploy two or three storage back-ends across two or three sites. A lower cost version of this could probably use self-deployed Milvus in the public cloud (if the license allows it, I haven’t checked).
E-Series array capacity could be shared between Milvus and StorageGRID SDS, either all on one site or with one array per site. I’d recommend this for medium sites with geo-cluster requirements. Milvus microservices and StorageGRID both rely on software-based replication, so neither E-Series nor SolidFire replication would need to be used.
If dedicated StorageGRID appliances were used for the S3 service (in addition to E-Series for block storage), we’d need at least three StorageGRID appliances - either three per site for the highest availability, or three total (one per site) with limited site redundancy at a lower cost.
Next steps
I plan to run more extensive tests with Milvus, and I’ll probably do it with E-Series because I’d like to make sure I have several GB/s of sequential performance at my disposal in order to avoid having Minio slow down Milvus.
That should also give me some more detailed insight into the S3 workload and roughly determine at what level of Milvus performance Minio should be backed by all-flash disks.