I-am_Sleepy

I-am_Sleepy t1_j7ybb41 wrote

I don't think ML researchers don't care about model calibration or tail risks; it just often doesn't come up in experimental settings

It also depends on the objective. If your goal is regression or classification, then tail risk and model calibration might be necessary as supporting metrics (a quick calibration check is sketched below)
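
For classification, a minimal reliability check with scikit-learn's calibration_curve; the labels here are synthetic, just for illustration:

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Synthetic stand-ins: y_prob is the model's predicted P(y=1),
# y_true is constructed to be perfectly calibrated by design
rng = np.random.default_rng(0)
y_prob = rng.random(5000)
y_true = (rng.random(5000) < y_prob).astype(int)

# A well-calibrated model keeps observed frequency close to predicted probability
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=10)
for p_pred, p_true in zip(prob_pred, prob_true):
    print(f"predicted {p_pred:.2f} -> observed {p_true:.2f}")
```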

But for more abstract use cases such as generative modeling, it is debatable whether tail risk and model calibration actually matter. For example, a GAN can experience mode collapse, so the generated data isn't as diverse as the original data distribution, but that doesn't mean the model is totally garbage either

Also, I don't think statistics and ML are totally different, because most statistical fundamentals are also ML fundamentals. As such, many ML metrics are derived directly from fundamental statistics and/or related fields

13

I-am_Sleepy t1_j49d3mv wrote

FYI, using the output of a first-stage model as input to a second is called model stacking

Are you trying to model time-series classification (many-to-one)? I don't know if making it a 2-stage model is appropriate, i.e. using 0 and 1 as an intermediate representation

The hierarchical classification error will propagate through the stages if you use raw predictions from the previous stage alone. For example, if the first-stage model is 0.9 in accuracy and the second stage is also 0.9, the maximal accuracy of the two-stage model will be 0.9*0.9 = 0.81 (performance degrades), as the simulation below shows
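
A minimal simulation of that degradation, assuming the second stage can only be right when its input from the first stage was right:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Stage 1 produces a correct intermediate (0/1) with probability 0.9
stage1_correct = rng.random(n) < 0.9
# Stage 2 is correct with probability 0.9, but only when its input was correct
stage2_correct = stage1_correct & (rng.random(n) < 0.9)

print(stage2_correct.mean())  # ~0.81
```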

1

I-am_Sleepy t1_j0sh383 wrote

If you have a target variable and other input features, you can treat this as a normal regression problem. Using a model like linear regression, Random Forest regression, or XGBoost is very straightforward from there

You can then look at feature importance to try to weed out the uncorrelated features (if you want to); a small sketch of that is below. There are a few automated ML frameworks for time series, but currently I mostly use Pycaret
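
A minimal sketch of the fit-then-inspect-importance step with scikit-learn, on made-up toy data:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Toy data: only x0 and x1 actually drive the target
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(1000, 4)), columns=["x0", "x1", "x2", "x3"])
y = 3 * X["x0"] - 2 * X["x1"] + rng.normal(scale=0.1, size=1000)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importance; low-scoring features are candidates to weed out
importance = pd.Series(model.feature_importances_, index=X.columns)
print(importance.sort_values(ascending=False))
```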

But if you suspect that your target variable is autocorrelated, a model like SARIMAX can be used instead. An automated version of that is Statsforecast, e.g. AutoARIMA with exogenous variables (haven't used it though; rough sketch below)
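
Roughly what that looks like with Statsforecast; an untested sketch, with a made-up exogenous column "promo":

```python
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

# statsforecast expects long format: unique_id, ds (timestamp), y (target);
# any additional columns (here "promo") are treated as exogenous regressors
train_df = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2023-01-01", periods=100, freq="D"),
    "y": range(100),
    "promo": [int(i % 7 == 0) for i in range(100)],
})

sf = StatsForecast(models=[AutoARIMA()], freq="D")
sf.fit(train_df)

# Future exogenous values must be supplied for the forecast horizon
future_df = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2023-04-11", periods=14, freq="D"),
    "promo": [int(i % 7 == 0) for i in range(14)],
})
print(sf.predict(h=14, X_df=future_df))
```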

But note that if you are in direct control of a few variables and you want to predict what will happen when you change them, this is no longer a simple regression problem, i.e. the data distribution may shift. That would be Causal Inference territory (see this handbook)

1

I-am_Sleepy t1_j0phzjn wrote

I'm not sure, but I think there are several ways to model product assortments

First, demand forecasting - you predict the demand for each product and act accordingly. This can usually be done with a time-series forecast

Second, personalized taste - you assume that each customer has their own fixed preferences, and you model that. If you know the demographics of each customer, you can estimate demand from the recommended products

But the latter will probably output a static distribution, so I think you can apply a demand forecast on top of the second method to discount it correctly (I think)

However, every method needs data. If you have a cold-start product, you might want to perform basic A/B testing first to get the initial data; a minimal test is sketched below
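
For the A/B part, a minimal two-proportion z-test with statsmodels (the counts are placeholder numbers):

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical cold-start experiment: conversions and impressions per variant
conversions = [120, 95]    # variant A, variant B
impressions = [2000, 2000]

stat, p_value = proportions_ztest(conversions, impressions)
print(f"z = {stat:.3f}, p = {p_value:.3f}")  # small p => variants likely differ
```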

1

I-am_Sleepy t1_izr846m wrote

I am not sure why your output needs to be uncorrelated with the other predictors. If the tasks are correlated, then their features should be correlated too, e.g. panoptic segmentation and depth estimation

For feature de-correlation there are some techniques you can apply. For example, in DL there is orthogonal regularization (enforcing feature dot products to be 0; sketched below), and this blog post
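
A minimal sketch of that orthogonal regularization idea in PyTorch; the penalty weight and feature shape are assumptions:

```python
import torch

def orthogonal_penalty(features: torch.Tensor) -> torch.Tensor:
    """Penalize pairwise dot products between feature columns.

    features: (batch, dim) activations. Drives the feature Gram matrix
    toward the identity so off-diagonal correlations shrink to 0.
    """
    f = torch.nn.functional.normalize(features, dim=0)  # unit-norm columns
    gram = f.T @ f                                      # (dim, dim) similarities
    eye = torch.eye(gram.shape[0], device=gram.device)
    return ((gram - eye) ** 2).sum()

# Added to the task loss with a hypothetical weight lambda_ortho:
# loss = task_loss + lambda_ortho * orthogonal_penalty(features)
```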

1

I-am_Sleepy t1_iy7xfo0 wrote

The basic idea of log likelihood is

  1. Assume the data is generated from a parameterized distribution x ~ p(x|z)
  2. Let X be a dataset {x1, x2, …, xn} ~ p(x|z). Because each item is generated independently, the probability of generating this dataset becomes p(X|z) = p(x1|z) * p(x2|z) * … * p(xn|z)
  3. The best-fit z maximizes the above formula, but because the long product can cause numerical inaccuracy, we apply a monotonic function, which doesn't change the optimum point. Thus we get log(p(X|z)) = sum_i [log p(xi|z)]
  4. In the traditional optimization paradigm we want to minimize a cost function, so we multiply the formula by -1. We arrive at the Negative Log Likelihood, i.e. optimize -log(p(X|z))

Your formula estimates the distribution p as a Gaussian, which is parameterized by mu and sigma, usually initialized as a zero vector and an identity matrix

Using standard autograd, you can then optimize those parameters iteratively (sketch below). Other optimization methods are also possible depending on your preference, such as genetic algorithms or Bayesian optimization
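
A minimal 1-D sketch of that autograd route in PyTorch, on synthetic data:

```python
import torch

# Synthetic data assumed drawn from an unknown Gaussian (true mu=3, sigma=2)
x = torch.randn(1000) * 2.0 + 3.0

mu = torch.zeros(1, requires_grad=True)          # init mean at 0
log_sigma = torch.zeros(1, requires_grad=True)   # log-parameterized to keep sigma > 0

opt = torch.optim.Adam([mu, log_sigma], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    nll = -dist.log_prob(x).sum()                # negative log likelihood
    nll.backward()
    opt.step()

print(mu.item(), log_sigma.exp().item())         # ~3.0, ~2.0
```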

For the Bayesian route, if your likelihood is normal (with known variance), then the conjugate prior for its mean is also normal, so the posterior stays normal. For multivariate, it is a bit trickier; depending on your setting (likelihood distribution) you can look it up here. You need to look at the "Interpretation of hyperparameters" column to understand it better, and/or maybe here too
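
As a concrete sketch, the univariate known-variance normal-normal update from that table (the prior values here are arbitrary assumptions):

```python
import numpy as np

# prior: mu ~ N(mu0, tau0_sq); likelihood: x | mu ~ N(mu, sigma_sq), sigma_sq known
x = np.random.default_rng(0).normal(3.0, 2.0, size=100)
mu0, tau0_sq = 0.0, 10.0      # prior mean and variance (assumed)
sigma_sq = 4.0                # likelihood variance, assumed known

n = x.size
tau_n_sq = 1.0 / (1.0 / tau0_sq + n / sigma_sq)          # posterior variance
mu_n = tau_n_sq * (mu0 / tau0_sq + x.sum() / sigma_sq)   # posterior mean
print(mu_n, tau_n_sq)
```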

1

I-am_Sleepy t1_iy7dqu4 wrote

I am not sure, but maybe the read data is cached? Try disabling that first, or maybe there is memory-leaking code somewhere

If your data is a single large file, it will try to read the entire tensor before loading it into memory. So if it is too large, try implementing your dataset as a generator (batching; sketch below), or speed up preprocessing time by saving the processed input as protobuf files
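
A minimal generator-style dataset in PyTorch (assuming one CSV record per line; "data.csv" is a hypothetical path):

```python
import torch
from torch.utils.data import IterableDataset, DataLoader

class StreamingDataset(IterableDataset):
    """Yield samples lazily instead of loading the whole file at once."""

    def __init__(self, path: str):
        self.path = path

    def __iter__(self):
        with open(self.path) as f:
            for line in f:                   # one record per line, parsed on demand
                values = [float(v) for v in line.split(",")]
                yield torch.tensor(values)

loader = DataLoader(StreamingDataset("data.csv"), batch_size=64)
```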

But a single large file shouldn't make the dataset slow down at half an epoch, so that is up for debate I guess

1

I-am_Sleepy t1_iy3x7p2 wrote

So what is your task again? If it is a regression problem, i.e. given 10 people, calculate the probability of the label being 1, then a basic binary classifier should do the trick. If the problem is maximizing the probability of the label being 1, that is closer to reinforcement learning. You can go a few ways there, but for me, I would implement it using a genetic algorithm (sketch below)
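
A bare-bones sketch of that last route (selection plus mutation only, no crossover); the fitness function is a made-up placeholder standing in for your trained classifier's score:

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(candidate: np.ndarray) -> float:
    # Placeholder objective: stands in for "P(label = 1) as scored by the model"
    return -np.sum((candidate - 0.5) ** 2)

pop = rng.random((50, 10))                 # 50 candidates, 10 features each
for _ in range(100):
    scores = np.array([fitness(c) for c in pop])
    parents = pop[np.argsort(scores)[-25:]]                # keep the fittest half
    children = parents[rng.integers(0, 25, 25)] \
        + rng.normal(0, 0.05, (25, 10))                    # mutate copies of parents
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(c) for c in pop])]
```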

1

I-am_Sleepy t1_ix8g3q5 wrote

I'm guessing you are trying to do sentiment analysis (NLP) on a Newswires data source. If there is a public API, you can query the data directly. If not, you would need to write your own crawler. Then you can save the data locally, or upload it to a cloud service like BigQuery. For a lazy solution, you can then connect your BigQuery dataset to AutoML

But if you want to train your own model, you can try picking one from HuggingFace (starter sketch below), or follow paper trails from paperswithcode
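
An off-the-shelf starting point with the HuggingFace pipeline API; the default checkpoint is just a starting point, swap in any hub model:

```python
from transformers import pipeline

# Downloads a default sentiment model on first use
classifier = pipeline("sentiment-analysis")
print(classifier("Shares rallied after the earnings report."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```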

1

I-am_Sleepy t1_ix8c4un wrote

Usually a normal distribution is fitted to the target distribution, but if it is multimodal, you can try Gaussian Mixture Models (GMMs). If it is unimodal but non-symmetric, you can try fitting a parameterized distribution through MLE (see Fitting a gamma distribution with (python) Scipy), or try transforming your variable through a non-linear transformation such as a log transform or Box-Cox transformation. Both options are sketched below
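
A minimal sketch of both options with scipy and scikit-learn, on a synthetic gamma sample:

```python
import numpy as np
from scipy import stats
from sklearn.mixture import GaussianMixture

data = stats.gamma.rvs(a=2.0, scale=1.5, size=1000, random_state=0)  # toy sample

# MLE fit of a gamma distribution (returns shape, loc, scale)
shape, loc, scale = stats.gamma.fit(data)

# Box-Cox transform toward normality (requires strictly positive data)
transformed, lmbda = stats.boxcox(data)

# Multimodal alternative: a 2-component Gaussian mixture
gmm = GaussianMixture(n_components=2).fit(data.reshape(-1, 1))
```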

3

I-am_Sleepy t1_ix2kx3a wrote

It actually depends on the data size; for small tabular data, 8 GB would be sufficient, but a larger dataset might require more RAM

If you train a single model, this shouldn't be a problem. But using a framework like Pycaret would need a bit more RAM, as it also uses parallel processing

On my 16 GB machine, with about 6M rows and 10 columns, Pycaret used ~10-15 GB of RAM (yep, it also used swap), but that also depends on what model you are using (SVM uses a lot of RAM, but LightGBM should be fine)

In the long run, you will eventually offload heavy training tasks to the cloud with a team-green (NVIDIA) GPU anyway (cuML and/or RAPIDS). For starters, Colab + gDrive is fine, but a dedicated compute engine is a lot more convenient

2