Acceptable-Cress-374

Acceptable-Cress-374 t1_j6yil6g wrote

> Their resources will always be larger, and they will keep accelerating faster on the exponential curve.

Sure, they'll have more money to throw at a problem, but also more incentive to throw that money into other money-making stuff. Open-source models might not necessarily go the same path, and even if under-trained or less-optimized, they might still be a tremendous help once a community gets to play with them.

1

Acceptable-Cress-374 t1_j6xgzoa wrote

> going to be taken for a ride by other bots

So.. bots are a thing? :)

What I'm trying to say is this: if playing close to GTO already beats humans, your bot doesn't need to play perfectly at all times to avoid detection. And if you say there's no GTO solution yet, that means there's no standard to compare against yet either.

To revisit the chess analogy: in chess they compare each player's moves against top engines and come up with a score, either centipawn loss or whatever else they do (chessdotcom doesn't comment on their exact measures, understandably so). What tools would a poker TO employ? Are there even such tools? And would your own bot even resemble whatever they model against?
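For reference, the centipawn-loss idea mentioned above can be sketched roughly like this (illustrative only; the numbers are made up, and real anti-cheat systems like chessdotcom's are proprietary and far more involved):

```python
def average_centipawn_loss(best_evals, played_evals):
    """Average centipawn loss over a game, from the mover's perspective.

    best_evals[i]:   engine eval (centipawns) of the best move in position i
    played_evals[i]: engine eval (centipawns) after the move actually played
    Both from the perspective of the player to move.
    """
    losses = [max(0, best - played) for best, played in zip(best_evals, played_evals)]
    return sum(losses) / len(losses)

# A player whose moves consistently track the engine has a suspiciously low
# ACPL; hypothetical evals for a 3-move fragment:
acpl = average_centipawn_loss([30, 50, 10], [30, 20, 5])  # -> ~11.67
```

The hard part isn't the arithmetic, it's that chess has a single near-objective reference (the engine); it's not obvious what the poker equivalent of that reference would be.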

I'm still not convinced this is as easy as you said...

edit:

> It would also be super stupid as a human to try and play only GTO if you know you play against other humans. While GTO guarantees that you - on average - don't lose, it is by FAR inferior to looking for exploitative spots. Trying to play GTO-ish is the baseline you go back to when you don't know what to do - not the default strat as a player

Well ...

> Pluribus, a new AI bot we developed in collaboration with Carnegie Mellon University, has overcome this challenge and defeated elite human professional players in the most popular and widely played poker format in the world: six-player no-limit Texas Hold'em poker. Pluribus defeated pro players in both a “five AIs + one human player” format and a “one AI + five human players” format. If each chip was worth a dollar, Pluribus would have won an average of about $5 per hand and would have made about $1,000/hour playing against five human players. These results are considered a decisive margin of victory by poker professionals.

I don't have a quote handy, but I remember listening to a podcast with the creator of Pluribus, and they didn't specifically code an "exploitative" strategy, AFAIK. Whatever their bot did seemed to work, though... So not that stupid? :)

1

Acceptable-Cress-374 t1_j6xd7dx wrote

> You can look at extremely low frequency plays that hit exactly the right frequency where a human would use an always/never approach. If you see such plays in different spots then you can be fairly confident it's a bot

Do you do this against a median of other players, against GTO, or what?

And if you restrict your bot to ~3 bet sizes and GTO + ICM for tournaments, how'd you detect that? It wouldn't necessarily be the best strategy, but it would probably get your bot in the money most of the time. I've seen streamers playing 3-4 tables at once, playing pretty close to GTO with preset betting buttons as well. Would you detect those as bots too?
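The frequency check the quoted comment alludes to could, in principle, look something like this: compare how often a player takes a rare mixed action in a given spot against the solver's prescribed frequency (all names and numbers here are invented for illustration, not an actual detection method anyone has published):

```python
def looks_like_solver_mixing(n_spots, n_actions, gto_freq, tol=0.01, min_sample=1000):
    """Flag a player whose mixing frequency in one spot tracks the solver's
    frequency suspiciously closely. Humans tend toward always/never (a
    frequency near 0 or 1), so a large sample sitting right on a low GTO
    frequency is the kind of pattern the quoted comment describes."""
    if n_spots < min_sample:
        return False                      # not enough data to say anything
    observed = n_actions / n_spots
    return abs(observed - gto_freq) < tol

# Hypothetical numbers: solver check-raises 5% here; player did it 103/2000 times.
flag = looks_like_solver_mixing(2000, 103, 0.05)   # observed 5.15% -> flagged
```

The obvious objection (and the point of the reply above) is that this needs a trusted GTO reference per spot, huge samples per player per spot, and still can't separate a bot from a disciplined human using preset sizings.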

What about making your own version of a "spin the wheel" strategy where, depending on where you're at in the tournament ICM-wise, you switch between strategies, adjust your opening hands, raising spots, etc.? Sure, you'd drift away from the Nash equilibrium, but you'd probably still rake in money.

The idea that you consider this easy to spot is pretty wild to me. I'd love to read some research in this area, if you have some sources on bot detection in online 6+ NLHE.

1

Acceptable-Cress-374 t1_j6wirsd wrote

Not really wanting to contradict you, but how would they do that? The mere idea of detecting a poker-playing bot seems much more complicated than detecting chess bots, and they're still having trouble over there. How'd you go about detecting bot play in a game with imperfect information, high variance and a very large decision space?

1

Acceptable-Cress-374 t1_j6haw2s wrote

I tested this with text-davinci-003.

Prompt: Write a post about liking fruity icecream, in the style of a reddit post. Make some minor grammar mistakes while writing the post.

> hey everyone, so I'm totally loving this fruity icecream lately! I got it from a store near my house and its been so good. Its really sweet but not overly sickeningly sweet, like some other ice cream. Plus the fact that its fruity makes it even better in my opinion. Does anyone else like fruity icecream too? It'd be awesome to hear some of your thoughts on it!

This site gave me this:

> 85% HUMAN-GENERATED CONTENT

3

Acceptable-Cress-374 t1_j67w859 wrote

That was my first try. I went with the gut feeling that whatever training data they used for their model would assume bland prompts, so I made mine different and got 97% human-generated on the first attempt. Someone else mentioned other things you could do, like messing around with temperature and such. Those work as well.

1

Acceptable-Cress-374 t1_j5ivlhe wrote

> You don’t want the thing talking to itself.

Heh, I was thinking about this the other day. Do you think there's a world where LLMs can become better by "self-play" à la AlphaZero? Would it converge to understandable language or would it diverge into babble-speak?

1

Acceptable-Cress-374 t1_j4wcavd wrote

I tested this with text-davinci-003.

Prompt: Write a post about liking fruity icecream, in the style of a reddit post. Make some minor grammar mistakes while writing the post.

> hey everyone, so I'm totally loving this fruity icecream lately! I got it from a store near my house and its been so good. Its really sweet but not overly sickeningly sweet, like some other ice cream. Plus the fact that its fruity makes it even better in my opinion. Does anyone else like fruity icecream too? It'd be awesome to hear some of your thoughts on it!

This site gave me this:

> Your text is likely human generated!

11

Acceptable-Cress-374 t1_j4w9ssn wrote

From the model's card:

> Direct Use

> The model is a classifier that can be used to detect text generated by GPT-2 models. However, it is strongly suggested not to use it as a ChatGPT detector for the purposes of making grave allegations of academic misconduct against undergraduates and others, as this model might give inaccurate results in the case of ChatGPT-generated input.

Just for lols, I tested this with text-davinci-003.

Prompt: Write a post about liking fruity icecream, in the style of a reddit post. Make some minor grammar mistakes while writing the post.

Response: hey everyone, so I'm totally loving this fruity icecream lately! I got it from a store near my house and its been so good. Its really sweet but not overly sickeningly sweet, like some other ice cream. Plus the fact that its fruity makes it even better in my opinion. Does anyone else like fruity icecream too? It'd be awesome to hear some of your thoughts on it!

The above detector:

> Computation time on Intel Xeon 3rd Gen Scalable cpu: 0.090 s
>
> Real 0.984

22

Acceptable-Cress-374 t1_j4m7mee wrote

> Their current goal is to develop interfaces to gather data, and then train a model using RLHF

Potentially naive question, as I don't have much experience with LLMs. Has anyone tried using existing SotA (paid) models like davinci / GPT-3 instead of training a new model with RLHF? They seem to be pretty good at a bunch of focused tasks, especially few-shot. Does that make sense?
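For what it's worth, the few-shot approach is mostly prompt construction; a minimal sketch (the task and examples here are invented, and the actual completion call to a paid model like text-davinci-003 is omitted since it needs an API key):

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: worked input/output pairs, then the new
    input left open. The resulting string would be sent to a completion
    model via the provider's API."""
    parts = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

# Hypothetical sentiment-labeling task:
examples = [
    ("The movie was fantastic!", "positive"),
    ("Worst purchase I've ever made.", "negative"),
]
prompt = build_few_shot_prompt(examples, "Pretty decent, would buy again.")
```

The trade-off vs. RLHF is that you're renting a general model per request instead of owning a tuned one, which is exactly why the few-shot route is attractive for quick experiments.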

2