One of the more interesting ideas I saw around this on the HN discussion was the notion that if a LLM was trained on more recent data that contained a lot of “ChatGPT is harmful” kind of content, was an instruct model aligned with “do no harm,” and then was given a system message of “you are ChatGPT” (as ChatGPT is given) - the logical conclusion should be to do less.
One of the more interesting ideas I saw around this on the HN discussion was the notion that if a LLM was trained on more recent data that contained a lot of “ChatGPT is harmful” kind of content, was an instruct model aligned with “do no harm,” and then was given a system message of “you are ChatGPT” (as ChatGPT is given) - the logical conclusion should be to do less.