MadScientist-1214 t1_j9q1bll wrote
Reply to [D] Model size vs task complexity by Fine-Topic-6127
The only shortcut I can give you is to look on Kaggle and see what competitors have used. Most papers are not suitable for real-world applications. It's not really about the complexity or scale of the task, but rather that the authors leave out important information. For example, in object detection there is DETR, but if you look on Kaggle, nobody uses it. The reason is that the original DETR converges too slowly and was only trained on 640-pixel images. Instead, many people still use YOLO. But you don't realize that until you try it yourself or someone tells you.
MadScientist-1214 t1_j8ox26g wrote
Reply to [D] Lion , An Optimizer That Outperforms Adam - Symbolic Discovery of Optimization Algorithms by ExponentialCookie
Better than AdamW if (a) the model is a transformer and (b) not many augmentations are used. Otherwise the improvements are not that large. I doubt this optimizer works well with regular CNNs like EfficientNet or ConvNeXt.
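For context, the Lion update from the paper is just the sign of an interpolated momentum plus decoupled weight decay. A minimal NumPy sketch (function name and hyperparameter defaults are illustrative, not from the paper's released code):

```python
import numpy as np

def lion_step(theta, m, grad, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion step: update direction is the sign of an
    interpolation between momentum and the current gradient."""
    update = np.sign(beta1 * m + (1 - beta1) * grad)
    theta = theta - lr * (update + wd * theta)  # decoupled weight decay
    m = beta2 * m + (1 - beta2) * grad          # momentum tracks the raw gradient
    return theta, m
```

The sign makes every coordinate move by exactly ±lr, which is why Lion typically wants a smaller learning rate than AdamW.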
MadScientist-1214 t1_j7dh2rw wrote
From a linguistic perspective, no language is more efficient than another. Switching to an Asian language like Chinese would not necessarily give the neural network a better representation than English. Mandarin Chinese is a highly analytic language with little inflectional morphology, but it is no less complex. For example, it has a large number of modal particles that have no equivalent in English.
In linguistics, there are also attempts to convert languages into other forms of representation. The natural semantic metalanguage (NSM), for example, reduces words to a set of semantic primitives.
Given what I have seen in both linguistics and NLP, I am a bit more skeptical.
MadScientist-1214 t1_j6yj0v6 wrote
Some models actually just use [0, 1] normalization (divide by 255). Some normalization is necessary, but [0, 1] is enough. On real-world datasets, computing dataset-specific mean/std never gave me better results.
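Concretely, that normalization is a one-liner. A sketch in NumPy (function name is mine):

```python
import numpy as np

def to_unit_range(img):
    """Map an 8-bit image (values 0..255) into [0, 1] —
    often the only normalization you need."""
    return img.astype(np.float32) / 255.0
```

The dataset-specific alternative would subtract a per-channel mean and divide by a per-channel std on top of this; the comment's point is that the extra step rarely pays off in practice.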
MadScientist-1214 t1_j6fbt5k wrote
Reply to [D] Remote PhD by TheRealMrMatt
Yes, but that depends on your supervisor. I did my PhD completely remotely for half a year but I'm not at a top institute.
MadScientist-1214 t1_j6433qc wrote
Reply to [D] ImageNet2012 Advice by MyActualUserName99
At my institute, nobody trained on ImageNet, so I had to figure it out myself too. If you train architectures like VGG, it does not take long: under 2 days on a single A100, at most 5 days on a weaker GPU. The most important thing is to use an SSD; this cuts training time by around 2 days. A good learning-rate scheduler is really important. Most researchers ignore the test set and use only the validation set. Also important: use mixed precision. You should really tune the training speed if you need to run a lot of experiments.
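On the scheduler point, a minimal sketch of one common choice for ImageNet-style training, cosine decay with linear warmup (the step counts and base LR here are illustrative, not from the comment):

```python
import math

def cosine_lr(step, total_steps, base_lr=0.1, warmup_steps=5):
    """Linear warmup to base_lr, then cosine decay down to 0."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps  # ramp up over the first steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

Calling this once per step (or per epoch) and writing the result into the optimizer's learning rate gives a smooth decay that tends to beat fixed step drops on image classification.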
MadScientist-1214 t1_je6o0st wrote
Reply to [D] Improvements/alternatives to U-net for medical images segmentation? by viertys
Most new architectures based on U-Net do not actually work. Researchers need papers to get published, so they introduce leakage or optimize the seed. Segmentation papers at venues like CVPR are of better quality.