Zestyclose-Check-751 t1_j4qcv02 wrote

Could someone explain how Data Scientists work as consultants?

I can imagine only a few cases:
* A company already has a DS team, but they lack depth in some domain and need help/consultation.
* The integration of the solution is simple enough that it can be delivered as an API.
* A company wants a PoC / demo, and after that they are going to hire someone to work on it.

But usually, DS needs insight into how the business works, and integrating the solution may be a really long-term effort, especially if it includes A/B tests, re-iterations over model training, dataset collection and so on. In that case, even onboarding may take a long time.

So, I'd be curious to hear about real cases that consultants have solved and how it generally works.

5

Zestyclose-Check-751 OP t1_ixym3ea wrote

>How to relate the input patch embeddings to one another s.t we can discriminate between the classes?

Hi, metric learning is an umbrella term, like self-supervised learning, detection, or tracking. So, nobody claims that the domain is new. But there are new approaches in this domain, which are also mentioned in the article (like Hyp-ViT). Finally, even though the domain is not new, people still need tools and tutorials to solve their problems.

0

Zestyclose-Check-751 OP t1_iri3e1m wrote

From the very beginning, we built the library so it can work with both CV and NLP. But right now there are only CV-related examples, since that is our main focus. Generally, you only need to implement your own dataset that returns encoded texts, and a model that knows how to handle them. We are going to add such an example in the future (it seems like it will be similar to the content at your link).
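To illustrate the idea, here is a minimal sketch of such a custom text dataset following the standard PyTorch Dataset protocol (`__len__` / `__getitem__`). The class and field names are hypothetical and not OML's actual API; the tokenization is deliberately toy-level:

```python
class TextMetricLearningDataset:
    """Toy sketch: returns encoded texts plus labels (hypothetical names)."""

    def __init__(self, texts, labels, vocab):
        self.texts = texts
        self.labels = labels
        self.vocab = vocab  # maps token -> integer id

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        # Encode the raw text as a list of token ids; a real NLP model
        # would consume these (padded and batched) instead of image tensors.
        token_ids = [self.vocab.get(tok, 0) for tok in self.texts[idx].split()]
        return {"input_ids": token_ids, "label": self.labels[idx]}


# Example usage with a tiny vocabulary:
vocab = {"hello": 1, "world": 2}
ds = TextMetricLearningDataset(["hello world", "hello"], [0, 1], vocab)
item = ds[0]  # {"input_ids": [1, 2], "label": 0}
```

The point is only that the dataset contract is format-agnostic: as long as `__getitem__` returns something the model understands, the rest of the pipeline does not care whether the inputs are images or texts.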

2

Zestyclose-Check-751 OP t1_ire4qd5 wrote

We don't have an implementation of this particular paper (SupCon). But I would say our current pipeline is somewhat similar, because you may also consider triplet loss and its variations a supervised contrastive method: we need supervised labels to form the triplets, and the triplet itself has a contrastive nature, since it works with positive and negative pairs.
So, you can try our models and see if they work better or not :)
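The "contrastive nature" above boils down to one formula: pull the anchor toward the positive and push it away from the negative, up to a margin. A plain-Python toy sketch of the classic triplet loss (not OML's actual implementation, which operates on batched tensors):

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Supervised labels decide which samples form the positive pair
    # (same class) and the negative pair (different class). The loss is
    # zero once the negative is farther than the positive by >= margin.
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)

# The negative is already far away relative to the positive, so loss is 0:
print(triplet_loss([0.0, 0.0], [0.0, 0.1], [1.0, 1.0]))  # -> 0.0
```

With equal positive and negative distances the loss degrades to the margin itself, which is exactly the "push apart" pressure that makes the method contrastive.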

1

Zestyclose-Check-751 OP t1_irc75zv wrote

Initially, in our project, we decided to use PML, but we found it inconvenient, and we were not even able to complete our pipeline because we struggled with the validation setup. The following is my IMHO:

  1. The design & documentation of PML are not clear and intuitive.
  2. PML provides a Trainer, but even the author does not use it in his examples (which is also a sign of imperfect design); he writes the train and test functions himself. As a side effect, I see no easy way to use it with DDP without rewriting the examples' code, which is a big issue.

I think it may be related to the fact that PML was not designed as an open-source project from the beginning, so there was no strict plan for the whole library. In contrast, when we started working on OML, we had already mapped out the whole structure, which helped us a lot. Anyway, I believe PML's author made a big contribution to the metric learning field, especially with his paper "A Metric Learning Reality Check".

UPD. I also forgot to mention that we have a zoo of pre-trained models with automatic checkpoint downloading (as in torchvision), which is a big advantage and lets you start experimenting immediately. I am talking about models like MoCo, CLIP, and DINO.

UPD. We added a comparison with PML to the FAQ and also added examples of using our library with losses from PML.

14