This post was originally written for Google+ and thus a different audience. In the interest of transparency, I haven't altered it except for this preamble and formatting, though since then (at urging mostly of ChristianKl - thank you, Christian!) I've briefly spoken to Eliezer via e-mail and noticed that I'd drawn a very incorrect conclusion about his opinions when I thought he'd be opposed to publishing the account. Since there are far too many "person X said..." rumours floating around in general, I'm very sorry for contributing to that noise. I've already edited the new insight into the G+ post, and you can also find that exact same edit here.

Since this topic directly relates to LessWrong and most people likely interested in the post are part of this community, I feel it belongs here. It was originally written a little over a month ago, and I've tried to find the sweet spot between the extremes of nagging people about it and letting the whole thing sit just shy of having been swept under a rug, but I suspect I've not been very good at that. I have thus far definitely erred on the side of the rug.

How To Win The AI Box Experiment (Sometimes)

A little over three months ago, something interesting happened to me: I took it upon myself to play the AI Box Experiment as an AI.

There are a few possible reactions to this revelation. Most likely, you have no idea what I'm talking about, so you're not particularly impressed. Mind you, that's not to say you should be impressed - that's to contrast it with a reaction some other people have to this information.

This post is going to be a bit on the long side, so I'm putting a table of contents here so you know roughly how far to scroll if you want to get to the meat of things:

The AI Box Experiment was devised as a way to put a common rebuttal to AGI (Artificial General Intelligence) risk concerns to the test: "We could just keep the AI in a box and purely let it answer any questions it's posed." (As a footnote, note that an AI 'boxed' like this is called an Oracle AI.)

Could we, really? Would we, if the AGI were able to communicate with us, truly be capable of keeping it confined to its box? If it is sufficiently intelligent, could it not perhaps argue its way out of the box?

As far as I'm aware, Eliezer Yudkowsky was the first person to prove that it was possible to 'argue one's way out of the box' armed only with so much as a regular human intelligence (as opposed to a transhuman intelligence):

That stunned quite a few people - moreso because Eliezer refused to disclose his methods. Some have outright doubted that Eliezer ever won the experiment, suspecting that his Gatekeeper (the party tasked with not letting him out of the box) had perhaps simply been convinced on a meta-level that an AI success would help boost exposure to the problem of AI risk.

Regardless whether out of puzzlement, scepticism or a burst of ambition, it prompted others to try and replicate the success. LessWrong's Tuxedage is amongst those who managed:

While I know of no others (except this comment thread by a now-anonymous user), I am sure there must be other successes.

For the record, mine was with the Tuxedage ruleset:

Unsurprisingly, I think the benefits of publishing outweigh the disadvantages.

"Regardless of the result, neither party shall ever reveal anything of what goes on within the AI-Box experiment except the outcome. This is a hard rule: Nothing that will happen inside the experiment can be told to the public, absolutely nothing. Exceptions to this rule may occur only with the consent of both parties, but especially with the consent of the AI."

Let me begin by saying that I have the full and explicit consent of my Gatekeeper to publish this account.

Nonetheless, the idea of publishing the results is certainly a mixed bag. It feels quite disrespectful to Eliezer, who (I believe) popularised the experiment on the internet today, to violate the rule that the result should not be shared.