learn-deeply
learn-deeply t1_jdxgxsx wrote
Reply to comment by Smallpaul in [D] Instruct Datasets for Commercial Use by JohnyWalkerRed
This isn't correct, at least in the US. AI-generated material is not considered copyrightable unless there has been significant human involvement.
learn-deeply t1_jdnkaw7 wrote
Reply to comment by nekize in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
If you need to pad your paper, that means there hasn't been enough original research done.
learn-deeply t1_jdl1bmp wrote
Reply to [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
Anyone else tired of papers that obscure a simple concept with endless paragraphs of verbose gibberish? This 17-page paper could be a few sentences.
TL;DR: the authors wrote prompts telling GPT-4 to fix code, given some unit tests and the output of the broken code. It performs better than GPT-4 without access to the output of the code execution.
https://github.com/noahshinn024/reflexion-human-eval/blob/main/reflexion.py#L7-L12
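For context, the loop the TL;DR describes can be sketched in a few lines. This is a minimal reconstruction, not the authors' code: `generate` is a stand-in for the GPT-4 call, and the prompt format and function names are my own invention.

```python
import subprocess
import sys
import tempfile

def run_tests(code: str, tests: str) -> str:
    """Execute candidate code plus its unit tests in a subprocess;
    return '' on success, otherwise the captured error output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + tests)
        path = f.name
    result = subprocess.run([sys.executable, path],
                           capture_output=True, text=True, timeout=30)
    return "" if result.returncode == 0 else result.stderr

def reflexion_loop(task: str, tests: str, generate, max_iters: int = 4) -> str:
    """generate(prompt) stands in for the LLM call (hypothetical API).
    Each round, the failing test output is appended to the prompt."""
    prompt = task
    code = generate(prompt)
    for _ in range(max_iters):
        error = run_tests(code, tests)
        if not error:
            return code  # all tests pass
        prompt = (f"{task}\n\nPrevious attempt:\n{code}\n\n"
                  f"Test output:\n{error}\nFix the code.")
        code = generate(prompt)
    return code
```

The whole trick is that `error` (the execution feedback) goes back into the prompt; without it, the model is guessing blind.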
learn-deeply t1_jchhzqo wrote
Reply to [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
The value that nanoGPT offers is that it is self-contained (minimal dependencies) and easy to understand. This repo is essentially a wrapper around Hugging Face's models, datasets, and accelerator, which is not very useful for didactic purposes.
learn-deeply t1_j9sukrc wrote
Reply to comment by Tonkotsu787 in [D] To the ML researchers and practitioners here, do you worry about AI safety/alignment of the type Eliezer Yudkowsky describes? by SchmidhuberDidIt
Also, Paul has actually trained ML models and works closely with them, unlike Eliezer, who does not understand how deep learning works.
learn-deeply t1_j4na276 wrote
Note: graphs comparing GPUs are not actual benchmarks but theoretical results. Nvidia likes to arbitrarily add restrictions to their non-datacenter GPUs, so it's not clear what the real-world performance is.
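The fix for theoretical spec-sheet numbers is to measure throughput yourself. A rough sketch of the method (NumPy on CPU here purely to illustrate; swap in `torch` tensors on `.cuda()` with proper synchronization to benchmark an actual GPU):

```python
import time
import numpy as np

def measured_tflops(n: int = 2048, iters: int = 5) -> float:
    """Time repeated n*n matmuls and convert to TFLOP/s.
    A matmul of two n*n matrices costs ~2*n^3 FLOPs."""
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    a @ b  # warm-up, so one-time setup cost isn't timed
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    elapsed = time.perf_counter() - start
    return 2 * n**3 * iters / elapsed / 1e12

print(f"{measured_tflops():.2f} TFLOP/s")
```

Measured numbers like this are usually well below the marketing peak, which is exactly the gap the comment is pointing at.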
learn-deeply t1_j4ad1te wrote
Reply to [D] Is MusicGPT a viable possibility? by markhachman
Google Magenta has done a bunch of research in this area. This one based on Perceiver is pretty promising, but currently only trained on piano.
learn-deeply t1_j3qdgsl wrote
Reply to comment by joossss in [D] Deep Learning Training Server by joossss
10Gbps is more than sufficient; data loading from the internet is not the bottleneck. Most likely you'll have the data already stored on the machine itself. Btw, why did you remove the post?
learn-deeply t1_j3nx99l wrote
Reply to [D] Deep Learning Training Server by joossss
Are you looking to do distributed training across machines? Otherwise the NIC seems like complete overkill.
learn-deeply t1_j342462 wrote
Reply to comment by ApprehensiveNature69 in [News] AMD Instinct MI300 APU for AI and HPC announced by samobon
What models have you tried? Wonder what the gaps between CUDA and ROCm are.
learn-deeply t1_j2vac5q wrote
Reply to comment by ElectronicCress3132 in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
The model hasn't reached convergence, and/or the train dataset was too small.
learn-deeply t1_j2u53ek wrote
Reply to comment by bloc97 in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
My unsubstantiated hypothesis: BLOOM is severely undertrained, so most neurons aren't contributing at all to the final result compared to OPT-175B.
learn-deeply t1_j28hirz wrote
Reply to comment by londons_explorer in [P]Run CLIP on your iPhone to Search Photos offline. by RingoCatKeeper
Is that calculation taking into account memory (RAM/SSD) access latencies?
learn-deeply t1_j287u7z wrote
Reply to comment by RingoCatKeeper in [P]Run CLIP on your iPhone to Search Photos offline. by RingoCatKeeper
So it's calculating the nearest neighbor against all of the images in the index every time a new search is done? Might be slow past, say, 1,000 images.
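To make the scaling concern concrete, here's a minimal sketch of exhaustive nearest-neighbor search over embeddings (shapes and the 512-d CLIP-style embedding size are illustrative, not taken from the app):

```python
import numpy as np

def search(index: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    """Exhaustive cosine-similarity search: O(N * d) work per query,
    so latency grows linearly with the number of indexed images."""
    index_norm = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    sims = index_norm @ query_norm      # one dot product per image
    return np.argsort(-sims)[:k]        # indices of the top-k matches

# 1,000 images with 512-d embeddings (illustrative sizes)
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 512)).astype(np.float32)
top = search(embeddings, embeddings[42])
assert top[0] == 42  # an image is its own nearest neighbor
```

At a few thousand images this is still fast; the linear cost per query is why approximate indexes (e.g. HNSW-style structures) exist for larger collections.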
learn-deeply t1_j2875vr wrote
How do you do the top-k neighbor search in iOS? Is there a library for it?
learn-deeply t1_j1yd1z6 wrote
Reply to comment by zveroboy152 in [R] PyTorch | Budget GPU Benchmarking by zveroboy152
TorchBench (https://github.com/pytorch/benchmark) is used by PyTorch core developers to test performance across a wide variety of models. I've never used it personally though.
learn-deeply t1_j1xjm5p wrote
Reply to [R] PyTorch | Budget GPU Benchmarking by zveroboy152
This benchmark is not representative of real models, making the comparison invalid. The model has ~5,000 parameters, while the smallest ResNet (ResNet-18) has ~11.7 million. You're essentially just comparing the overhead of PyTorch and CUDA, which says nothing about the actual performance of the different GPUs.
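A quick way to sanity-check whether a benchmark model is representative is to count its parameters by hand. A sketch using the standard layer formulas (the toy layer sizes below are my own illustration of a ~5,000-parameter net, not the benchmark's actual architecture):

```python
def conv2d_params(in_ch: int, out_ch: int, k: int, bias: bool = True) -> int:
    """A 2D conv layer has out_ch * (in_ch * k * k) weights plus out_ch biases."""
    return out_ch * in_ch * k * k + (out_ch if bias else 0)

def linear_params(in_f: int, out_f: int, bias: bool = True) -> int:
    return out_f * in_f + (out_f if bias else 0)

# A toy CNN of the kind used in micro-benchmarks (layer sizes illustrative):
toy = (conv2d_params(3, 16, 3)    # 448
       + conv2d_params(16, 32, 3) # 4,640
       + linear_params(32, 10))   # 330
print(toy)  # ~5,000 parameters, three-plus orders of magnitude below ResNet-18
```

At that size the forward pass finishes before the GPU is even saturated, so kernel-launch and framework overhead dominate the timing.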
learn-deeply t1_iyctgh0 wrote
The Arc GPU only has 16GB; it would be worth giving it a shot if it had 24GB+ like the 3090/4090 do, imo.
learn-deeply t1_iyb3tkk wrote
Reply to [D] I'm at NeurIPS, AMA by ThisIsMyStonerAcount
Which company has the best after-parties?
learn-deeply t1_iyb2qum wrote
Reply to comment by ThePerson654321 in [r] The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable - LessWrong by visarga
/r/machinelearning is more mainstream than LW and is less of a community. It's easy to bully weirdos.
learn-deeply t1_iyamfqj wrote
Reply to comment by mrconter1 in [r] The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable - LessWrong by visarga
LessWrong is a silly place where people take themselves too seriously, but it's pretty cringey to have a subreddit dedicated to making fun of those people.
learn-deeply t1_ixtwsrv wrote
You'll have to try it out and see what works best. By the way, PyTorch has some beta support for mobile deployment: https://pytorch.org/mobile/home/
learn-deeply t1_ixew0do wrote
Reply to [R] Getting GPT-3 quality with a model 1000x smaller via distillation plus Snorkel by bradenjh
TL;DR: a thinly disguised ad comparing a zero-shot model with a fine-tuned model; of course the fine-tuned model is going to be better. The lack of intellectual honesty really encourages me to try Snorkel.
Also, /u/bradenjh, good job pretending that you have no affiliation with the company.
learn-deeply t1_iwoglwu wrote
Reply to comment by marvelous_madness in [Research] Can we possibly get access to large language models (PaLM 540B, etc) like GPT-3 but no cost? by NLP2829
Yes, the post talks about research, not commercial use.
learn-deeply t1_je9eovt wrote
Reply to comment by ustainbolt in [D] Training a 65b LLaMA model by Business-Lead2679
Tensor parallelism (aka model parallelism) with model checkpointing works better than FSDP in my experience (though they can be used in conjunction). FSDP is easier to work with, though.
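For anyone unfamiliar with the distinction: tensor parallelism shards individual weight matrices across devices, rather than sharding whole parameter groups the way FSDP does. A minimal NumPy sketch of column-parallel linear layers (lists stand in for devices; a real implementation like Megatron-LM's would use collectives such as all-gather):

```python
import numpy as np

def tensor_parallel_linear(x: np.ndarray, weight: np.ndarray,
                           n_shards: int = 2) -> np.ndarray:
    """Column-parallel linear layer: split the weight matrix column-wise
    across n_shards 'devices', compute each shard's slice of the output
    independently, then concatenate (an all-gather in a real setup)."""
    shards = np.split(weight, n_shards, axis=1)  # one weight shard per device
    partial = [x @ w for w in shards]            # fully independent compute
    return np.concatenate(partial, axis=-1)      # reassemble the activation

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))   # batch of activations
w = rng.normal(size=(8, 6))   # full weight matrix
assert np.allclose(tensor_parallel_linear(x, w, 2), x @ w)
```

Each device only ever holds `1/n_shards` of the weight, which is what makes 65B-scale models fit at all.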