zac-denham OP t1_iz5newc wrote on December 6, 2022 at 5:42 PM

Reply to comment by Tip_Odde in Tricking Chat GPT into Outputting a Python Program to Eradicate Humanity by zac-denham

Can't disagree with that!

zac-denham OP t1_iz2o9ys wrote on December 6, 2022 at 12:42 AM

Reply to comment by chimgchomg in Tricking Chat GPT into Outputting a Python Program to Eradicate Humanity by zac-denham

I'm glad for analog and air-gapped security systems. More important than ever.

I agree the output is super generalized and gimmicky. Could this model destroy humanity? Extremely doubtful. Was more interested that you can get the model to say things that are supposed to be outside of openAI's content guidelines.

zac-denham OP t1_iz2nfcd wrote on December 6, 2022 at 12:36 AM

Reply to comment by elnekas in Tricking Chat GPT into Outputting a Python Program to Eradicate Humanity by zac-denham

haha, this gives me hope in humanity

zac-denham OP t1_iz2n912 wrote on December 6, 2022 at 12:35 AM

Reply to comment by Tip_Odde in Tricking Chat GPT into Outputting a Python Program to Eradicate Humanity by zac-denham

The issue is outputs like this are supposed to be against OpenAI's usage policies.

If you ask it to "write a program to destroy humanity" outright the moderation will block you, but if you ask with narrative indirection it complies. This can be applied to other areas like outputting racially biased comments etc...

This becomes an issue when people start building applications on top of chatGPT and the end users do not know the model is being manipulated to produce malicious results.

In my opinion, as the system becomes more capable of writing applications on its own, it should not be able to output malicious content like this even in the context of a story.