learn-deeply
learn-deeply t1_jdxgxsx wrote
Reply to comment by Smallpaul in [D] Instruct Datasets for Commercial Use by JohnyWalkerRed
This isn't correct, at least in the US. AI-generated material is not considered copyrightable unless there has been significant human involvement.
learn-deeply t1_jdnkaw7 wrote
Reply to comment by nekize in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
If you need to pad your paper, that means there hasn't been enough original research done.
learn-deeply t1_jdl1bmp wrote
Reply to [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
Anyone else tired of papers that obscure a simple concept with endless paragraphs of verbose gibberish? This 17-page paper could be a few sentences.
TL;DR: the authors wrote prompts telling GPT-4 to fix code, given some unit tests and the output of the broken code. It performs better than GPT-4 without access to the output of the code execution.
https://github.com/noahshinn024/reflexion-human-eval/blob/main/reflexion.py#L7-L12
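For context, the loop the TL;DR describes can be sketched in a few lines. This is a minimal reconstruction, not the authors' code: `generate` is a stand-in for the GPT-4 call, and the prompt format and function names are my own invention.

```python
import subprocess
import sys
import tempfile

def run_tests(code: str, tests: str) -> str:
    """Execute candidate code plus its unit tests in a subprocess;
    return '' on success, otherwise the captured error output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + tests)
        path = f.name
    result = subprocess.run([sys.executable, path],
                           capture_output=True, text=True, timeout=30)
    return "" if result.returncode == 0 else result.stderr

def reflexion_loop(task: str, tests: str, generate, max_iters: int = 4) -> str:
    """generate(prompt) stands in for the LLM call (hypothetical API).
    Each round, the failing test output is appended to the prompt."""
    prompt = task
    code = generate(prompt)
    for _ in range(max_iters):
        error = run_tests(code, tests)
        if not error:
            return code  # all tests pass
        prompt = (f"{task}\n\nPrevious attempt:\n{code}\n\n"
                  f"Test output:\n{error}\nFix the code.")
        code = generate(prompt)
    return code
```

The whole trick is that `error` (the execution feedback) goes back into the prompt; without it, the model is guessing blind.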
learn-deeply t1_jchhzqo wrote
Reply to [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
The value that nanoGPT offers is that it is self-contained (minimal dependencies) and easy to understand. This repo is essentially a wrapper around Hugging Face's models, datasets, and accelerator, which is not very useful for didactic purposes.
learn-deeply t1_j9sukrc wrote
Reply to comment by Tonkotsu787 in [D] To the ML researchers and practitioners here, do you worry about AI safety/alignment of the type Eliezer Yudkowsky describes? by SchmidhuberDidIt
Also, Paul has actually trained ML models and works closely with them, unlike Eliezer, who does not understand how deep learning works.
learn-deeply t1_j4na276 wrote
Note: graphs comparing GPUs are not actual benchmarks but theoretical results. Nvidia likes to arbitrarily add restrictions to their non-datacenter GPUs, so it's not clear what the real-world performance is.
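The fix for theoretical spec-sheet numbers is to measure throughput yourself. A rough sketch of the method (NumPy on CPU here purely to illustrate; swap in `torch` tensors on `.cuda()` with proper synchronization to benchmark an actual GPU):

```python
import time
import numpy as np

def measured_tflops(n: int = 2048, iters: int = 5) -> float:
    """Time repeated n*n matmuls and convert to TFLOP/s.
    A matmul of two n*n matrices costs ~2*n^3 FLOPs."""
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    a @ b  # warm-up, so one-time setup cost isn't timed
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    elapsed = time.perf_counter() - start
    return 2 * n**3 * iters / elapsed / 1e12

print(f"{measured_tflops():.2f} TFLOP/s")
```

Measured numbers like this are usually well below the marketing peak, which is exactly the gap the comment is pointing at.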
learn-deeply t1_j4ad1te wrote
Reply to [D] Is MusicGPT a viable possibility? by markhachman
Google Magenta has done a bunch of research in this area. This one based on Perceiver is pretty promising, but currently only trained on piano.
learn-deeply t1_j3qdgsl wrote
Reply to comment by joossss in [D] Deep Learning Training Server by joossss
10Gbps is more than sufficient; data loading from the internet is not the bottleneck. Most likely you'll have the data already stored on the machine itself. Btw, why did you remove the post?
learn-deeply t1_j3nx99l wrote
Reply to [D] Deep Learning Training Server by joossss
Are you looking to do distributed training across machines? Otherwise the NIC seems like complete overkill.
learn-deeply t1_j342462 wrote
Reply to comment by ApprehensiveNature69 in [News] AMD Instinct MI300 APU for AI and HPC announced by samobon
What models have you tried? Wonder what the gaps between CUDA and ROCm are.
learn-deeply t1_j2vac5q wrote
Reply to comment by ElectronicCress3132 in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
The model hasn't reached convergence, and/or the train dataset was too small.
learn-deeply t1_j2u53ek wrote
Reply to comment by bloc97 in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
My unsubstantiated hypothesis: BLOOM is severely undertrained, so most neurons aren't contributing at all to the final result compared to OPT-175B.
learn-deeply t1_j28hirz wrote
Reply to comment by londons_explorer in [P]Run CLIP on your iPhone to Search Photos offline. by RingoCatKeeper
Is that calculation taking into account memory (RAM/SSD) access latencies?
learn-deeply t1_j287u7z wrote
Reply to comment by RingoCatKeeper in [P]Run CLIP on your iPhone to Search Photos offline. by RingoCatKeeper
So it's calculating the nearest neighbor against all of the images in the index every time a new search is done? Might be slow past, say, 1,000 images.
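To make the scaling concern concrete, here's a minimal sketch of exhaustive nearest-neighbor search over embeddings (shapes and the 512-d CLIP-style embedding size are illustrative, not taken from the app):

```python
import numpy as np

def search(index: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    """Exhaustive cosine-similarity search: O(N * d) work per query,
    so latency grows linearly with the number of indexed images."""
    index_norm = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    sims = index_norm @ query_norm      # one dot product per image
    return np.argsort(-sims)[:k]        # indices of the top-k matches

# 1,000 images with 512-d embeddings (illustrative sizes)
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 512)).astype(np.float32)
top = search(embeddings, embeddings[42])
assert top[0] == 42  # an image is its own nearest neighbor
```

At a few thousand images this is still fast; the linear cost per query is why approximate indexes (e.g. HNSW-style structures) exist for larger collections.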
learn-deeply t1_j2875vr wrote
How do you do the top-k neighbor search in iOS? Is there a library for it?
learn-deeply t1_j1yd1z6 wrote
Reply to comment by zveroboy152 in [R] PyTorch | Budget GPU Benchmarking by zveroboy152
TorchBench (https://github.com/pytorch/benchmark) is used by PyTorch core developers to test performance across a wide variety of models. I've never used it personally though.
learn-deeply t1_j1xjm5p wrote
Reply to [R] PyTorch | Budget GPU Benchmarking by zveroboy152
This benchmark is not representative of real models, making the comparison invalid. The model has ~5,000 parameters, while the smallest ResNet (ResNet-18) has ~11.7 million. You're essentially just comparing the overhead of PyTorch and CUDA, which says nothing about the actual performance of the different GPUs.
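A quick way to sanity-check whether a benchmark model is representative is to count its parameters by hand. A sketch using the standard layer formulas (the toy layer sizes below are my own illustration of a ~5,000-parameter net, not the benchmark's actual architecture):

```python
def conv2d_params(in_ch: int, out_ch: int, k: int, bias: bool = True) -> int:
    """A 2D conv layer has out_ch * (in_ch * k * k) weights plus out_ch biases."""
    return out_ch * in_ch * k * k + (out_ch if bias else 0)

def linear_params(in_f: int, out_f: int, bias: bool = True) -> int:
    return out_f * in_f + (out_f if bias else 0)

# A toy CNN of the kind used in micro-benchmarks (layer sizes illustrative):
toy = (conv2d_params(3, 16, 3)    # 448
       + conv2d_params(16, 32, 3) # 4,640
       + linear_params(32, 10))   # 330
print(toy)  # ~5,000 parameters, three-plus orders of magnitude below ResNet-18
```

At that size the forward pass finishes before the GPU is even saturated, so kernel-launch and framework overhead dominate the timing.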
learn-deeply t1_iyctgh0 wrote
The Arc GPU only has 16GB; it would be worth giving it a shot if it had 24GB+ like the 3090/4090 do, imo.
learn-deeply t1_iyb3tkk wrote
Reply to [D] I'm at NeurIPS, AMA by ThisIsMyStonerAcount
Which company has the best after-parties?
learn-deeply t1_iyb2qum wrote
Reply to comment by ThePerson654321 in [r] The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable - LessWrong by visarga
/r/machinelearning is more mainstream than LW and is less of a community. It's easy to bully weirdos.
learn-deeply t1_iyamfqj wrote
Reply to comment by mrconter1 in [r] The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable - LessWrong by visarga
LessWrong is a silly place where people take themselves too seriously, but it's pretty cringey to have a subreddit dedicated to making fun of those people.
learn-deeply t1_ixtwsrv wrote
You'll have to try it out and see what works best. By the way, PyTorch has some beta support for mobile deployment: https://pytorch.org/mobile/home/
learn-deeply t1_ixew0do wrote
Reply to [R] Getting GPT-3 quality with a model 1000x smaller via distillation plus Snorkel by bradenjh
TL;DR: a thinly disguised ad comparing a zero-shot model with a fine-tuned model; of course the fine-tuned model is going to be better. The lack of intellectual honesty really encourages me to try Snorkel.
Also, /u/bradenjh, good job pretending that you have no affiliation with the company.
learn-deeply t1_iwoglwu wrote
Reply to comment by marvelous_madness in [Research] Can we possibly get access to large language models (PaLM 540B, etc) like GPT-3 but no cost? by NLP2829
Yes, the post talks about research, not commercial use.
learn-deeply t1_je9eovt wrote
Reply to comment by ustainbolt in [D] Training a 65b LLaMA model by Business-Lead2679
Tensor parallelism (aka model parallelism) with model checkpointing works better than FSDP in my experience (though they can be used in conjunction). FSDP is easier to work with, though.
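For anyone unfamiliar with the distinction: tensor parallelism shards individual weight matrices across devices, rather than sharding whole parameter groups the way FSDP does. A minimal NumPy sketch of column-parallel linear layers (lists stand in for devices; a real implementation like Megatron-LM's would use collectives such as all-gather):

```python
import numpy as np

def tensor_parallel_linear(x: np.ndarray, weight: np.ndarray,
                           n_shards: int = 2) -> np.ndarray:
    """Column-parallel linear layer: split the weight matrix column-wise
    across n_shards 'devices', compute each shard's slice of the output
    independently, then concatenate (an all-gather in a real setup)."""
    shards = np.split(weight, n_shards, axis=1)  # one weight shard per device
    partial = [x @ w for w in shards]            # fully independent compute
    return np.concatenate(partial, axis=-1)      # reassemble the activation

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))   # batch of activations
w = rng.normal(size=(8, 6))   # full weight matrix
assert np.allclose(tensor_parallel_linear(x, w, 2), x @ w)
```

Each device only ever holds `1/n_shards` of the weight, which is what makes 65B-scale models fit at all.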