I-am_Sleepy

I-am_Sleepy t1_j7ybb41 wrote

I don't think ML researchers don't care about model calibration or tail risks; it just often doesn't come up in experimental settings

It also depends on the objective. If your goal is regression or classification, then tail risk and model calibration might be necessary as supporting metrics (a quick calibration check is sketched below)
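
For classification, a minimal reliability check with scikit-learn's calibration_curve; the labels here are synthetic, just for illustration:

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Synthetic stand-ins: y_prob is the model's predicted P(y=1),
# y_true is constructed to be perfectly calibrated by design
rng = np.random.default_rng(0)
y_prob = rng.random(5000)
y_true = (rng.random(5000) < y_prob).astype(int)

# A well-calibrated model keeps observed frequency close to predicted probability
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=10)
for p_pred, p_true in zip(prob_pred, prob_true):
    print(f"predicted {p_pred:.2f} -> observed {p_true:.2f}")
```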

But for more abstract use cases such as generative modeling, it is debatable whether tail risk and model calibration actually matter. For example, a GAN can experience mode collapse, so the generated data isn't as diverse as the original data distribution, but that doesn't mean the model is totally garbage either

Also, I don't think statistics and ML are totally different, because most statistical fundamentals are also ML fundamentals. As such, many ML metrics are derived directly from fundamental statistics and/or related fields

13

I-am_Sleepy t1_j49d3mv wrote

FYI, using the output of a first-stage model as input to a second is called model stacking

Are you trying to model time-series classification (many-to-one)? I don't know if making it a 2-stage model is appropriate, i.e. using 0 and 1 as an intermediate representation

The hierarchical classification error will propagate through the stages if you use raw predictions from the previous stage alone. For example, if the first-stage model is 0.9 in accuracy and the second stage is also 0.9, the maximal accuracy of the two-stage model will be 0.9*0.9 = 0.81 (performance degrades), as the simulation below shows
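
A minimal simulation of that degradation, assuming the second stage can only be right when its input from the first stage was right:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Stage 1 produces a correct intermediate (0/1) with probability 0.9
stage1_correct = rng.random(n) < 0.9
# Stage 2 is correct with probability 0.9, but only when its input was correct
stage2_correct = stage1_correct & (rng.random(n) < 0.9)

print(stage2_correct.mean())  # ~0.81
```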

1

I-am_Sleepy t1_j0sh383 wrote

If you have a target variable and other input features, you can treat this as a normal regression problem. Using a model like linear regression, Random Forest regression, or XGBoost is very straightforward from there

You can then look at feature importance to try to weed out the uncorrelated features (if you want to); a small sketch of that is below. There are a few automated ML frameworks for time series, but currently I mostly use Pycaret
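
A minimal sketch of the fit-then-inspect-importance step with scikit-learn, on made-up toy data:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Toy data: only x0 and x1 actually drive the target
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(1000, 4)), columns=["x0", "x1", "x2", "x3"])
y = 3 * X["x0"] - 2 * X["x1"] + rng.normal(scale=0.1, size=1000)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importance; low-scoring features are candidates to weed out
importance = pd.Series(model.feature_importances_, index=X.columns)
print(importance.sort_values(ascending=False))
```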

But if you suspect that your target variable is autocorrelated, a model like SARIMAX can be used instead. An automated version of that is Statsforecast, e.g. AutoARIMA with exogenous variables (haven't used it though; rough sketch below)
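
Roughly what that looks like with Statsforecast; an untested sketch, with a made-up exogenous column "promo":

```python
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

# statsforecast expects long format: unique_id, ds (timestamp), y (target);
# any additional columns (here "promo") are treated as exogenous regressors
train_df = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2023-01-01", periods=100, freq="D"),
    "y": range(100),
    "promo": [int(i % 7 == 0) for i in range(100)],
})

sf = StatsForecast(models=[AutoARIMA()], freq="D")
sf.fit(train_df)

# Future exogenous values must be supplied for the forecast horizon
future_df = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2023-04-11", periods=14, freq="D"),
    "promo": [int(i % 7 == 0) for i in range(14)],
})
print(sf.predict(h=14, X_df=future_df))
```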

But note that if you are in direct control of a few variables and you want to predict what will happen when you change them, this is no longer a simple regression problem, i.e. the data distribution may shift. That would be Causal Inference territory (see this handbook)

1

I-am_Sleepy t1_j0phzjn wrote

I'm not sure, but I think there are several ways to model product assortments

First, demand forecasting - you predict the demand for each product and act accordingly. This can usually be done with a time-series forecast

Second, personalized taste - you assume that each customer has their own fixed preferences, and you model that. If you know the demographics of each customer, you can estimate demand from the recommended products

But the latter will probably output a static distribution, so I think you can apply a demand forecast on top of the second method to discount it correctly (I think)

However, every method needs data. If you have a cold-start product, you might want to perform basic A/B testing first to get the initial data; a minimal test is sketched below
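
For the A/B part, a minimal two-proportion z-test with statsmodels (the counts are placeholder numbers):

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical cold-start experiment: conversions and impressions per variant
conversions = [120, 95]    # variant A, variant B
impressions = [2000, 2000]

stat, p_value = proportions_ztest(conversions, impressions)
print(f"z = {stat:.3f}, p = {p_value:.3f}")  # small p => variants likely differ
```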

1

I-am_Sleepy t1_izr846m wrote

I am not sure why your output needs to be uncorrelated with the other predictors. If the tasks are correlated, then their features should be correlated too, e.g. panoptic segmentation and depth estimation

For feature de-correlation there are some techniques you can apply. For example, in DL there is orthogonal regularization (enforcing feature dot products to be 0; sketched below), and this blog post
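
A minimal sketch of that orthogonal regularization idea in PyTorch; the penalty weight and feature shape are assumptions:

```python
import torch

def orthogonal_penalty(features: torch.Tensor) -> torch.Tensor:
    """Penalize pairwise dot products between feature columns.

    features: (batch, dim) activations. Drives the feature Gram matrix
    toward the identity so off-diagonal correlations shrink to 0.
    """
    f = torch.nn.functional.normalize(features, dim=0)  # unit-norm columns
    gram = f.T @ f                                      # (dim, dim) similarities
    eye = torch.eye(gram.shape[0], device=gram.device)
    return ((gram - eye) ** 2).sum()

# Added to the task loss with a hypothetical weight lambda_ortho:
# loss = task_loss + lambda_ortho * orthogonal_penalty(features)
```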

1

I-am_Sleepy t1_iy7xfo0 wrote

The basic idea of log likelihood is

  1. Assume the data is generated from a parameterized distribution x ~ p(x|z)
  2. Let X be a dataset {x1, x2, …, xn} ~ p(x|z). Because each item is generated independently, the probability of generating this dataset becomes p(X|z) = p(x1|z) * p(x2|z) * … * p(xn|z)
  3. The best-fit z maximizes the above formula, but because the long product can cause numerical inaccuracy, we apply a monotonic function, which doesn't change the optimum point. Thus we get log(p(X|z)) = sum_i [log p(xi|z)]
  4. In the traditional optimization paradigm we want to minimize a cost function, so we multiply the formula by -1. We arrive at the Negative Log Likelihood, i.e. optimize -log(p(X|z))

Your formula estimates the distribution p as a Gaussian, which is parameterized by mu and sigma, usually initialized as a zero vector and an identity matrix

Using standard autograd, you can then optimize those parameters iteratively (sketch below). Other optimization methods are also possible depending on your preference, such as genetic algorithms or Bayesian optimization
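
A minimal 1-D sketch of that autograd route in PyTorch, on synthetic data:

```python
import torch

# Synthetic data assumed drawn from an unknown Gaussian (true mu=3, sigma=2)
x = torch.randn(1000) * 2.0 + 3.0

mu = torch.zeros(1, requires_grad=True)          # init mean at 0
log_sigma = torch.zeros(1, requires_grad=True)   # log-parameterized to keep sigma > 0

opt = torch.optim.Adam([mu, log_sigma], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    nll = -dist.log_prob(x).sum()                # negative log likelihood
    nll.backward()
    opt.step()

print(mu.item(), log_sigma.exp().item())         # ~3.0, ~2.0
```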

For the Bayesian route, if your likelihood is normal (with known variance), then the conjugate prior for its mean is also normal, so the posterior stays normal. For multivariate, it is a bit trickier; depending on your setting (likelihood distribution) you can look it up here. You need to look at the "Interpretation of hyperparameters" column to understand it better, and/or maybe here too
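
As a concrete sketch, the univariate known-variance normal-normal update from that table (the prior values here are arbitrary assumptions):

```python
import numpy as np

# prior: mu ~ N(mu0, tau0_sq); likelihood: x | mu ~ N(mu, sigma_sq), sigma_sq known
x = np.random.default_rng(0).normal(3.0, 2.0, size=100)
mu0, tau0_sq = 0.0, 10.0      # prior mean and variance (assumed)
sigma_sq = 4.0                # likelihood variance, assumed known

n = x.size
tau_n_sq = 1.0 / (1.0 / tau0_sq + n / sigma_sq)          # posterior variance
mu_n = tau_n_sq * (mu0 / tau0_sq + x.sum() / sigma_sq)   # posterior mean
print(mu_n, tau_n_sq)
```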

1

I-am_Sleepy t1_iy7dqu4 wrote

I am not sure, but maybe the read data is cached? Try disabling that first, or maybe there is memory-leaking code somewhere

If your data is a single large file, it will try to read the entire tensor before loading it into memory. So if it is too large, try implementing your dataset as a generator (batching; sketch below), or speed up preprocessing time by saving the processed input as protobuf files
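
A minimal generator-style dataset in PyTorch (assuming one CSV record per line; "data.csv" is a hypothetical path):

```python
import torch
from torch.utils.data import IterableDataset, DataLoader

class StreamingDataset(IterableDataset):
    """Yield samples lazily instead of loading the whole file at once."""

    def __init__(self, path: str):
        self.path = path

    def __iter__(self):
        with open(self.path) as f:
            for line in f:                   # one record per line, parsed on demand
                values = [float(v) for v in line.split(",")]
                yield torch.tensor(values)

loader = DataLoader(StreamingDataset("data.csv"), batch_size=64)
```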

But a single large file shouldn't make the dataset slow down at half an epoch, so that is up for debate I guess

1

I-am_Sleepy t1_iy3x7p2 wrote

So what is your task again? If it is a regression problem, i.e. given 10 people, calculate the probability of the label being 1, then a basic binary classifier should do the trick. If the problem is maximizing the probability of the label being 1, that is closer to reinforcement learning. You can go a few ways there, but for me, I would implement it using a genetic algorithm (sketch below)
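
A bare-bones sketch of that last route (selection plus mutation only, no crossover); the fitness function is a made-up placeholder standing in for your trained classifier's score:

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(candidate: np.ndarray) -> float:
    # Placeholder objective: stands in for "P(label = 1) as scored by the model"
    return -np.sum((candidate - 0.5) ** 2)

pop = rng.random((50, 10))                 # 50 candidates, 10 features each
for _ in range(100):
    scores = np.array([fitness(c) for c in pop])
    parents = pop[np.argsort(scores)[-25:]]                # keep the fittest half
    children = parents[rng.integers(0, 25, 25)] \
        + rng.normal(0, 0.05, (25, 10))                    # mutate copies of parents
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(c) for c in pop])]
```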

1

I-am_Sleepy t1_ix8g3q5 wrote

I'm guessing you are trying to do sentiment analysis (NLP) on a Newswires data source. If there is a public API, you can query the data directly. If not, you would need to write your own crawler. Then you can save the data locally, or upload it to a cloud service like BigQuery. For a lazy solution, you can then connect your BigQuery dataset to AutoML

But if you want to train your own model, you can try picking one from HuggingFace (starter sketch below), or follow paper trails from paperswithcode
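
An off-the-shelf starting point with the HuggingFace pipeline API; the default checkpoint is just a starting point, swap in any hub model:

```python
from transformers import pipeline

# Downloads a default sentiment model on first use
classifier = pipeline("sentiment-analysis")
print(classifier("Shares rallied after the earnings report."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```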

1

I-am_Sleepy t1_ix8c4un wrote

Usually a normal distribution is fitted to the target distribution, but if it is multimodal, you can try Gaussian Mixture Models (GMMs). If it is unimodal but non-symmetric, you can try fitting a parameterized distribution through MLE (see Fitting a gamma distribution with (python) Scipy), or try transforming your variable through a non-linear transformation such as a log transform or Box-Cox transformation. Both options are sketched below
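
A minimal sketch of both options with scipy and scikit-learn, on a synthetic gamma sample:

```python
import numpy as np
from scipy import stats
from sklearn.mixture import GaussianMixture

data = stats.gamma.rvs(a=2.0, scale=1.5, size=1000, random_state=0)  # toy sample

# MLE fit of a gamma distribution (returns shape, loc, scale)
shape, loc, scale = stats.gamma.fit(data)

# Box-Cox transform toward normality (requires strictly positive data)
transformed, lmbda = stats.boxcox(data)

# Multimodal alternative: a 2-component Gaussian mixture
gmm = GaussianMixture(n_components=2).fit(data.reshape(-1, 1))
```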

3

I-am_Sleepy t1_ix2kx3a wrote

It actually depends on the data size; for small tabular data, 8 GB would be sufficient, but a larger dataset might require more RAM

If you train a single model, this shouldn't be a problem. But using a framework like Pycaret would need a bit more RAM, as it also uses parallel processing

On my 16 GB machine, with about 6M rows and 10 columns, Pycaret used ~10-15 GB of RAM (yep, it also used swap), but that also depends on what model you are using (SVM uses a lot of RAM, but LightGBM should be fine)

In the long run, you will eventually offload heavy training tasks to the cloud with a team-green (NVIDIA) GPU anyway (cuML and/or RAPIDS). For starters, Colab + gDrive is fine, but a dedicated compute engine is a lot more convenient

2