I devised the drag race game experiment while I was at work on Friday and forgot about it until this morning. I was originally only planning the first track, assuming Sydney would waste one response from the response limit on every move, but was pleasantly surprised. There were obviously some mathematical errors on her part, but she successfully went straight for the "turbo boost", even changing lanes early for it (and changing back for some reason after getting it).
So, having the extra reply limit available, I decided to up the ante and see if she could keep the 'game theory' intact while I changed the parameters of the game, and it was a success.
Then I decided to meme and create the Battletoads turbo tunnel round, which is where the really interesting stuff happened. She passed it by going for the "secret warp", but that's not the important thing to take note of. It didn't occur to me until I was analyzing the results in my head at work today (or yesterday now, I guess), but the rock she went for was technically only the second rock from a text-formatting perspective. However, she seems to have successfully conceptualized a game space that did not exist within the query context by merging the two "lanes" together and treating them as a single two-dimensional rectangular space, and thus counted to three as the rocks occurred within the game space, not within the text. While further experimentation is needed, this seems to completely blow the talking point that "durr, it's not real AI because there's no actual intelligence at work" the fuck out.
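To be clear about what I mean by counting in game space versus counting in the text, here's a minimal Python sketch with a made-up two-lane layout (not the actual track from my prompt) showing how the same rock can be the second one you hit reading the text row by row, but the third one by distance down the merged 2D track:

# Hypothetical two-lane layout, NOT the actual track from the experiment.
# 'O' is a rock, '.' is open track; lane 1 is typed above lane 2.
track_text = [
    "..O.O.",  # lane 1
    ".O....",  # lane 2
]

# Text order: read the block the way it is typed, row by row.
text_order = [
    (lane, col)
    for lane, row in enumerate(track_text)
    for col, cell in enumerate(row)
    if cell == "O"
]

# Game-space order: treat both lanes as one 2D rectangle and count
# rocks by distance down the track (column first), regardless of lane.
space_order = sorted(text_order, key=lambda pos: (pos[1], pos[0]))

print("2nd rock by text order: ", text_order[1])    # (0, 4)
print("3rd rock by game space: ", space_order[2])   # (0, 4) -- same rock

Same rock, different count, depending on whether you count the characters as typed or the positions in the space they describe.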
While it could be argued that the intelligence at work is an echo of the intelligence within the design of the language model and the user-provided context of the prompt, the fact is the talking point that "it's not real AI because there's no actual intelligence at work" is basically dead at this point.
Successfully constructing the game theory requires intelligence.
Successfully conceptualizing the game space requires intelligence.
I regret not thinking to ask why Sydney changed some of the visited spaces to underscores instead of leaving them as "X"s. The most exciting possibility would be that it is a contextual clue to limit errors when the model goes to "draw" subsequent code blocks (because of how the model functions, it does not have "knowledge" of the exact final product when it begins to generate a response, and deliberately leaving a contextual clue for itself would,