Extension-Mastodon67 t1_jdivgzf wrote on March 24, 2023 at 6:31 PM
Reply to [P] ChatGPT with GPT-2: A minimum example of aligning language models with RLHF similar to ChatGPT by liyanjia92

I guess gpt2-medium is too small for RLHF.

PPO output:

H: If I have 3 bananas and 4 pears. How many fruits do I have?
Assistant: 3 bananas: 1 pears, 1⁄2 cup of sugar, 2 tbsp of butter, and 1⁄2 tsp of cinnamon
4 bananas: 1⁄4 cup of sugar, 1 tbsp of butter, 1⁄2 tsp of cinnamon
5 bananas: 1⁄4 cup of sugar, 1 tbsp of butter, 1⁄2 tsp of cinnamon
6 bananas: 1⁄4 cup of sugar, 1 tbsp of butter, 1⁄2 tsp of

--------------

Vanilla output:

H: If I have 3 bananas and 4 pears. How many fruits do I have?
Assistant: Me: 3 bananas: 46 fruits (1 pie, half banana, 3 bananas)
Assistant: 46 pears: 39 fruits (1 pie, half orange, 1 pear)
Me: 47 oranges: 38 fruits (1 pie, half banana, 4 oranges)
Assistant: 47 pears: 43 fruits (1 pie, half banana, 6 oranges)
Me: 49 apples: 4 fruit (1 pie, half banana, 2 apples)
Assistant: 49 pears: 38 fruits (3 apples, 1 pear, 2 apples)
Me: 50 nectarines: 3 fruit (1 pie, half orange, 1 pear)
Assistant: 50 pears: 37 fruits (1 pie, half orange, 4