Kacper-Lukawski
Kacper-Lukawski t1_j5yineq wrote
Reply to comment by keisukegoda3804 in [D] Efficient retrieval of research information for graduate research by [deleted]
I do not know of any benchmark that measures that. A comparison with a SaaS such as Pinecone would also be quite challenging, since both systems would have to run on the same infrastructure to give comparable results. When it comes to Milvus, as far as I know, they use prefiltering for filtered search (https://github.com/milvus-io/milvus/discussions/12927). So they need to store the ids of matching entries somewhere during the vector search phase, possibly even all the ids if your filtering criteria do not exclude anything.
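To make the prefiltering point concrete, here is a minimal sketch of the general idea in plain Python. It is not Milvus code: the function names, the toy `points` collection and the `year` payload field are all made up for illustration, and a brute-force scan stands in for a real ANN index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prefiltered_search(points, query, predicate, limit=3):
    """Hypothetical prefiltered search over {id: (vector, payload)}.

    Returns the top ids and the number of ids that had to be kept
    around for the vector search phase.
    """
    # Phase 1: evaluate the filter first and remember every matching id.
    allowed_ids = {pid for pid, (_, payload) in points.items() if predicate(payload)}
    # Phase 2: score only the allowed entries (brute force here, for simplicity).
    scored = sorted(
        ((cosine(points[pid][0], query), pid) for pid in allowed_ids),
        reverse=True,
    )
    return [pid for _, pid in scored[:limit]], len(allowed_ids)


# Toy collection: id -> (vector, payload). All values are illustrative.
points = {
    1: ([1.0, 0.0], {"year": 2021}),
    2: ([0.9, 0.1], {"year": 2019}),
    3: ([0.0, 1.0], {"year": 2022}),
    4: ([0.7, 0.7], {"year": 2023}),
}

recent_ids, recent_kept = prefiltered_search(
    points, [1.0, 0.0], lambda p: p["year"] >= 2021, limit=2
)
# A filter that excludes nothing forces every id to be stored.
all_ids, all_kept = prefiltered_search(points, [1.0, 0.0], lambda p: True, limit=2)
```

With the permissive filter, `all_kept` equals the size of the whole collection, which is the worst case mentioned above.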
Kacper-Lukawski t1_j5xp10a wrote
Reply to comment by keisukegoda3804 in [D] Efficient retrieval of research information for graduate research by [deleted]
Each vector may have a payload object: https://qdrant.tech/documentation/payload/ Payload attributes can be used to put additional constraints on the search results: https://qdrant.tech/documentation/filtering/ The unique feature is that filtering is built into the vector search phase, so there is no need to pre- or postfilter the results.
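For illustration, a filtered search request body could look roughly like the one below. This is a sketch based on my reading of the filtering docs linked above: the `year` payload key and the vector values are invented, and the exact field names should be checked against the documentation for your Qdrant version.

```python
import json

# Hypothetical request body for a filtered vector search.
# "year" is an assumed payload attribute, not something from the thread.
search_request = {
    "vector": [0.2, 0.1, 0.9, 0.7],  # query embedding (toy values)
    "filter": {
        "must": [
            # Only points whose payload has year == 2022 are considered.
            {"key": "year", "match": {"value": 2022}},
        ]
    },
    "limit": 3,  # number of results to return
}

body = json.dumps(search_request, indent=2)
print(body)
```

The filter travels in the same request as the query vector, so there is no separate filtering step on the client side.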
Kacper-Lukawski t1_j5um27o wrote
Reply to comment by qalis in [D] Efficient retrieval of research information for graduate research by [deleted]
Moreover, to run a semantic search at scale, you need a proper vector database to avoid kNN-like full scans for every query. Qdrant (https://qdrant.tech) is one of the options, probably the fastest according to benchmarks.
Kacper-Lukawski t1_j5itq5h wrote
Reply to Evaluation for similarity search [P] by silverstone1903
You need some ground truth labels to evaluate the quality of the semantic search. It might be a relevancy score or just binary information that a particular item is relevant. But you don't need to label all your data points.
There is a great article describing the metrics: https://neptune.ai/blog/recommender-systems-metrics I use that as a reference quite often. And if you are interested in a more step-by-step introduction, here is an article I wrote: https://qdrant.tech/articles/qa-with-cohere-and-qdrant/ It's an end-to-end solution, but some basic quality measurement is also included.
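As a rough sketch of what such a measurement can look like with binary labels, here are precision@k and mean reciprocal rank in plain Python. The query ids, item ids and labels are invented, and treating unlabeled items as not relevant is an assumption of this sketch, not something the linked articles prescribe.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved items that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for item in retrieved[:k] if item in relevant) / k


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant item, or 0.0 if none was retrieved."""
    for rank, item in enumerate(retrieved, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


# Ground truth: only a handful of queries are labeled (toy data).
labels = {"q1": {"a", "c"}, "q2": {"z"}}
# Ranked search results per query, best match first (toy data).
results = {"q1": ["a", "b", "c"], "q2": ["x", "z", "y"]}

# Average each metric over the labeled queries only.
mean_p_at_2 = sum(precision_at_k(results[q], labels[q], 2) for q in labels) / len(labels)
mrr = sum(reciprocal_rank(results[q], labels[q]) for q in labels) / len(labels)
```

Only the labeled queries enter the averages, so a small labeled sample is enough to get a first quality estimate.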
Kacper-Lukawski t1_j3mn2ji wrote
Reply to comment by leeliop in Image matching within database? [P] by Clarkmilo
It should be able to capture some transformations of the original images, but maybe I should think about measuring that. Thanks for the idea!
Kacper-Lukawski t1_j32skz3 wrote
Reply to Image matching within database? [P] by Clarkmilo
I wrote an article describing how to do it with NN models: https://medium.com/analytics-vidhya/how-to-implement-a-visual-search-at-no-time-5515270d27e3
Kacper-Lukawski t1_j725avq wrote
Reply to [D] Querying with multiple vectors during embedding nearest neighbor search? by mostlyhydrogen
Qdrant has a recommendation API that allows doing exactly what you want, I suppose: https://qdrant.tech/documentation/search/#recommendation-api
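If you want to try the idea without any database first, one common way to query with several vectors is to collapse them into a single query vector. The sketch below averages the positive examples and optionally subtracts the average of the negative ones. This is only an illustration of the general approach with made-up function names; I am not claiming it is how the recommendation API works internally.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def combined_query(positive, negative=()):
    """Build one query vector from several example vectors.

    Assumption of this sketch: positives are averaged, and the average
    of the negatives (if any) is subtracted from that.
    """
    pos = mean_vector(positive)
    if not negative:
        return pos
    neg = mean_vector(list(negative))
    return [p - n for p, n in zip(pos, neg)]


# Two examples of what we are looking for, one of what we want to avoid (toy values).
liked = [[1.0, 0.0], [0.0, 1.0]]
disliked = [[0.5, 0.0]]

query_only_positive = combined_query(liked)
query_with_negative = combined_query(liked, disliked)
```

The resulting vector can then be used as an ordinary single-vector query against whatever nearest neighbor index you have.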