
Far-Butterscotch-436 t1_j0a9083 wrote

5% imbalance isn't bad. Just use a cost function with a metric that handles imbalance, e.g. the weighted average binomial deviance, and you'll be fine.
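A minimal sketch of what the comment suggests: a binomial deviance (log loss) where the minority class gets a larger weight. The `pos_weight` choice of roughly `n_neg / n_pos` is an assumption for illustration, not a rule from the comment.

```python
import numpy as np

def weighted_binomial_deviance(y_true, p_pred, pos_weight):
    """Binomial deviance with a larger weight on the positive
    (minority) class; pos_weight ~ n_neg / n_pos is one common
    balanced choice (an assumption here, not the only option)."""
    p = np.clip(p_pred, 1e-15, 1 - 1e-15)  # avoid log(0)
    w = np.where(y_true == 1, pos_weight, 1.0)
    return -np.mean(w * (y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

# Hypothetical 10-row sample with one positive: up-weighting the
# positive makes the loss penalize missing it much more heavily.
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
p = np.full(10, 0.1)
loss_unweighted = weighted_binomial_deviance(y, p, pos_weight=1.0)
loss_weighted = weighted_binomial_deviance(y, p, pos_weight=9.0)
```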

Also, you can build a downsampling ensemble and compare its performance. Don't downsample to 50/50; aim for at least 10% minority.
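A sketch of that downsampling ensemble, under assumed synthetic data (the ~5% positive rate, logistic-regression members, and five ensemble members are all illustrative choices, not from the comment): each member keeps all positives, downsamples negatives to roughly a 10% positive fraction, and the ensemble averages predicted probabilities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: many rows, few features, ~5% positives.
n, d = 5000, 8
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 1.8).astype(int)

pos_idx = np.where(y == 1)[0]
neg_idx = np.where(y == 0)[0]

# Downsample negatives so positives make up ~10% of each member's
# training set (not 50/50, per the comment's advice).
target_pos_frac = 0.10
n_neg_keep = int(len(pos_idx) * (1 - target_pos_frac) / target_pos_frac)

models = []
for seed in range(5):
    member_rng = np.random.default_rng(seed)
    keep_neg = member_rng.choice(
        neg_idx, size=min(n_neg_keep, len(neg_idx)), replace=False
    )
    idx = np.concatenate([pos_idx, keep_neg])
    models.append(LogisticRegression().fit(X[idx], y[idx]))

# Ensemble prediction: mean probability across members.
proba = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
```

Averaging probabilities (rather than hard votes) keeps the output usable for ranking and threshold tuning on the original imbalanced distribution.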

You've got a good problem: lots of observations and few features.


Far-Butterscotch-436 t1_ivuhdm4 wrote

Easy: use all the training data, with smaller label weights for the uncertain examples. But keep in mind: if the data is uncertain, how can you trust it? If you say a label is uncertain, is there a probability that the label is incorrect? How will you measure performance on your uncertain data vs. your certain data? Boosting algorithms will certainly overfit to the noisy labels, so it will be difficult.
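A sketch of the label-weighting idea with a boosting model. The data, the 30% flip rate on uncertain labels, and the 0.3 down-weight are all assumptions for illustration; evaluating on the certain rows only reflects the comment's point that uncertain labels can't serve as trustworthy ground truth.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Hypothetical setup: 1000 "certain" rows plus 500 "uncertain" rows
# whose labels may be wrong.
X = rng.normal(size=(1500, 4))
y = (X[:, 0] > 0).astype(int)
# Flip ~30% of the uncertain labels to simulate label noise.
flip = rng.random(500) < 0.3
y[1000:] = (y[1000:] ^ flip).astype(int)

# Smaller sample weight on the uncertain rows (0.3 is an assumed value).
weights = np.ones(1500)
weights[1000:] = 0.3

clf = GradientBoostingClassifier(n_estimators=50, max_depth=2)
clf.fit(X, y, sample_weight=weights)

# Evaluate on the certain rows only: the uncertain labels cannot be
# trusted as ground truth for measuring performance.
acc_certain = clf.score(X[:1000], y[:1000])
```

Keeping the trees shallow and the number of estimators modest is one way to limit the overfitting-to-noise risk the comment warns about.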
