Far-Butterscotch-436 t1_j2pii5n wrote
I started my PhD when I was 34, I say go for it!
Far-Butterscotch-436 t1_j2nb2qb wrote
Reply to [D] Machine Learning Illustrations by fdis_
Maybe explain the gblinear booster in xgboost? You only explained gbtree.
Far-Butterscotch-436 t1_j2ixhun wrote
Reply to comment by Mathwins in [D] ML PhD vs. Master of Data Science by [deleted]
You can get fully funded for a PhD. My school paid my tuition and gave me a stipend.
Far-Butterscotch-436 t1_j1vqqp2 wrote
Reply to [D] analyst in a manufacturing company seeking to bring machine learning to the table. by turnip_markets
Why do all these posts get deleted? And where do they go?
Far-Butterscotch-436 t1_j0ajlfn wrote
Reply to comment by hopedallas in [D] Dealing with extremely imbalanced dataset by hopedallas
When you downsample, try to keep at least a 1:10 ratio (minority:majority).
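A quick sketch of what that downsampling looks like in practice (toy labels; the 1:10 target ratio is the one from the comment):

```python
import numpy as np

rng = np.random.default_rng(42)
# Toy labels: 100 positives (minority) among 100_000 rows.
y = np.zeros(100_000, dtype=int)
y[:100] = 1

minority_idx = np.flatnonzero(y == 1)
majority_idx = np.flatnonzero(y == 0)

# Keep every minority row; sample 10x as many majority rows,
# giving the suggested 1:10 minority:majority ratio.
keep_majority = rng.choice(majority_idx, size=10 * len(minority_idx),
                           replace=False)
keep = np.concatenate([minority_idx, keep_majority])
```

`keep` would then index into the feature matrix to build the downsampled training set.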
Far-Butterscotch-436 t1_j0a9083 wrote
5% imbalance isn't bad. Just use a cost function that accounts for the imbalance, e.g. a class-weighted binomial deviance, and you'll be fine.
You can also build a downsampling ensemble and compare its performance. Don't downsample to 50/50; aim for the minority class to be at least 10%.
You've got a good problem: lots of observations and few features.
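The class-weighted deviance idea can be sketched with sklearn (synthetic ~5%-positive data; `class_weight="balanced"` reweights the log-loss inversely to class frequency, so no downsampling is needed):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
y = (rng.random(n) < 0.05).astype(int)      # roughly 5% positives
X = rng.normal(size=(n, 4)) + y[:, None]    # shift positives so they're learnable

# "balanced" multiplies each class's loss term by n / (2 * class_count),
# which is the weighted binomial deviance idea without touching the data.
clf = LogisticRegression(class_weight="balanced").fit(X, y)
```

xgboost's `scale_pos_weight` plays the same role for boosted trees.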
Far-Butterscotch-436 t1_j0a8ny2 wrote
Reply to comment by biophysninja in [D] Dealing with extremely imbalanced dataset by hopedallas
Regarding 2, there are only 500 features, dimension reduction not needed.
1 and 3 are last resorts
Far-Butterscotch-436 t1_iy6s65v wrote
Reply to comment by [deleted] in Is coding from scratch a requirement to be able to do research? [D] by [deleted]
Yes
Far-Butterscotch-436 t1_iy3vwok wrote
It's more like a toolkit than one method that works better than the rest. Don't forget to add an autoencoder to your list.
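A minimal autoencoder sketch, using sklearn's `MLPRegressor` as a stand-in for a deep-learning framework (the network is simply trained to reconstruct its own input, and the narrow hidden layer forces a compressed representation; data and layer size are illustrative):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))

# Train the network to map X back onto itself through a 3-unit bottleneck.
ae = MLPRegressor(hidden_layer_sizes=(3,), max_iter=500, random_state=0)
ae.fit(X, X)

# Per-row reconstruction error; unusually large values can flag anomalies.
err = ((ae.predict(X) - X) ** 2).mean(axis=1)
```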
Far-Butterscotch-436 t1_ivuhdm4 wrote
Reply to [Discussion] Can we train with multiple sources of data, some very reliable, others less so? by DreamyPen
Easy: use all the training data, with smaller label weights for the uncertain data. But keep in mind, if the data is uncertain, how can you trust it? If you say a label is uncertain, is there a probability that it's incorrect? How will you measure performance on the uncertain data vs. the certain data? Boosting algorithms will certainly overfit; it will be difficult.
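The label-weighting idea maps directly onto the `sample_weight` argument most libraries accept at fit time. A sketch with sklearn's gradient boosting (the 800/200 split and the 0.3 weight are hypothetical):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] > 0).astype(int)

# Hypothetical split: first 800 rows reliably labeled, last 200 uncertain.
weights = np.ones(len(y))
weights[800:] = 0.3  # down-weight the uncertain labels in the loss

clf = GradientBoostingClassifier(n_estimators=50)
clf.fit(X, y, sample_weight=weights)
```

The weight directly scales each row's contribution to the loss, so uncertain labels pull the fit less without being discarded.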
Far-Butterscotch-436 t1_iudn2cw wrote
Reply to [D] I got this email after the first round of ML SWE round at Meta. Does it mean that I cleared the round but there's a hiring freeze in place? by Difficult-Big-3890
Why was the original post removed? In fact, most of the posts I get notified about end up removed.
Far-Butterscotch-436 t1_ir3uu4h wrote
Reply to [R] Self-Programming Artificial Intelligence Using Code-Generating Language Models by Ash3nBlue
Is that just an 'automatic' grid search?
Far-Butterscotch-436 t1_jad1g2v wrote
Reply to [D] What is the most "opaque" popular machine learning model in 2023? by fromnighttilldawn
Just about every discussion I get notifications for is deleted. What's up with that?