• tomalley8342@lemmy.world

    Makes sense that a properly balanced model with randomization turned down should be able to recognize when something is being done outside the acceptable parameters.

    I don’t know where you got that sense, when exactly the opposite has been the main laughing point of AI since its inception. Meta AI security and safety researcher Summer Yue’s “Nothing humbles you like telling your OpenClaw ‘confirm before acting’ and watching it speedrun deleting your inbox” was just last month, btw.