Desperate-Whereas50 t1_iyow7w7 wrote
Reply to comment by kc3w in [D] PyTorch 2.0 Announcement by joshadel
I am no Rust expert, so convince me that I am wrong, but that is only true if you don't use unsafe blocks. That would exclude using CUDA, and as far as I know you need unsafe blocks in some cases to get C-like performance.
But even if I am wrong and no undefined behaviour is needed: even Rust has a pure-function attribute to improve optimizations.
It just makes sense to use these improvements in libraries like PyTorch/JAX, especially since they mainly perform mathematical operations, which are pure functions anyway.
Desperate-Whereas50 t1_iynukpa wrote
Reply to comment by gambs in [D] PyTorch 2.0 Announcement by joshadel
I only have the information from your link, so I don't know about the other issues you mention.
But if you opt for the functional paradigm, it is obvious that you need something like jax.numpy and that jax.numpy cannot implement every NumPy function. NumPy and some of its operations (like in-place updates) are inherently non-functional. I can't imagine another way to fix this.
Desperate-Whereas50 t1_iynp7vk wrote
Reply to comment by gambs in [D] PyTorch 2.0 Announcement by joshadel
IMHO you need functional programming or undefined behaviour (like in C/C++) to get highly optimized code. Undefined behaviour is more pain than functional programming, so I doubt it.
Edit: And even C/C++ compilers like GCC have attributes for pure functions to improve optimizations.
Desperate-Whereas50 t1_iye7hf3 wrote
Reply to comment by currentscurrents in [D] Other than data what are the common problems holding back machine learning/artificial intelligence by BadKarma-18
That's correct. But to define what the bare minimum is, you need a baseline. I just wanted to say that humans are a bad baseline because we have "training data" encoded in our DNA. Furthermore, for tabular data, ML systems often outperform humans without needing as much training data.
But of course, needing less data for good training results is always better. I would not argue about that.
Edit: Typos
Desperate-Whereas50 t1_iye5kfo wrote
Reply to comment by currentscurrents in [D] Other than data what are the common problems holding back machine learning/artificial intelligence by BadKarma-18
>I doubt the typical human hears more than a million words of english in their childhood, but they know the language much better than GPT-3 does after reading billions of pages of it.
But is this a fair comparison? I am far away from being an expert in evolution, but I assume we have some evolutionarily encoded bias that makes language easier to learn, whereas ML systems have to begin from zero.
Desperate-Whereas50 t1_iqwzlgc wrote
Reply to comment by DeepNonseNse in [D] - Why do Attention layers work so well? Don't weights in DNNs already tell the network how much weight/attention to give to a specific input? (High weight = lots of attention, low weight = little attention) by 029187
I am not a transformer expert, so maybe this is a stupid question, but is this also true for transformer-based architectures? For example, BERT uses 12/24 transformer blocks. That doesn't sound as deep as, for example, a ResNet-256.
Desperate-Whereas50 t1_iyqiqqi wrote
Reply to comment by Craksy in [D] PyTorch 2.0 Announcement by joshadel
>However, using unsafe does not necessarily mean UB. You preferably want to avoid that regardless.
>Unsafe code simply means that you are responsible for memory safety, not that it should be ignored.
Maybe I am wrong, but I think you misunderstand UB. Of course you want to avoid UB and have memory safety in your code/executable, because otherwise you cannot reason about the program anymore. But you want UB (at least in C/C++, the languages I work with) in your standard. UB is more like a promise from the programmer not to do specific things: the compiler assumes the code contains no UB and optimizes accordingly. See, for example, signed integer overflow. Because the compiler knows this is UB and the programmer promised not to trigger it, it can apply better optimizations. Rust does not have this "feature" in safe blocks and produces less optimal code.
>And UB is not the only way a compiler can optimize.
I would not disagree about that. But if you want the last 0.x% of performance, then you need it too, especially if you want your language to work on different systems, because even hardware can have undefined behaviour.
The only other option (as far as I know) to get performance comparable to what the UB assumption allows is to rely on other assumptions, such as functions having no side effects.
>I don't know, you're talking about UB as if it was a feature and not an unfortunate development of compilers over the years.
As a language-specification device it is like a feature; in the binary it is a bug. I have read enough C++ threads on UB to know that a lot of C++ developers do not see UB as an unfortunate development of compilers.
>In fact, Rust made it very clear that if you rely on UB that's your pain.
By the way, this sentence is why I think you misunderstand UB. As mentioned: you should never rely on UB; you promised the compiler to avoid it, and by avoiding it, the compiler can work better.