|
Hey all, It's been quite a long time since I last played here. I don't know for sure whether I'll return to old activity levels eventually, but that's not what I'm posting about. What I'm here to talk about is the ability of computer programs to play forum mafia. The details are quite technical, but for everyone else I will spare that and get straight to the results.
What it can do
Joke's on you, the results are also technical. This computer program (actually three of them, working together), henceforth titled Scumbot, reads the filters of the players and labels them either town or mafia. Assuming 25% of the other players are mafia (as it is from the perspective of a townie in a standard 13-person game) and the bot accuses 25% of people of being mafia, the bot's success rates are as follows:
Working with half a page of filter: 76% of townreads are correct, 29% of scumreads are correct (64% accurate overall)
With 1 page of filter: 77% on town, 33% on scum (66% overall)
With 2 pages: 78% on town, 33% on scum (67% overall)
With 4 pages: 78% on town, 34% on scum (67% overall)
With 8 pages: 79% on town, 37% on scum (69% overall)
All figures are accurate to within 1 percentage point. Accuracy metrics for more than 8 pages of filter are unreliable due to sample size issues.
Scumbot does not have godlike scumreading powers, but it is significantly better than blind guessing (which would have accuracy of 75% / 25% / 62.5%). I don't know exactly how it compares to real players, but research I did last year indicates that about 33% of votes land on mafia by deadline. I do not as of yet have statistics for how often Scumbot's #1 scumread in a game would be mafia, but comparing a computer's reads in a vacuum to human votes in a game situation is flawed anyway. In conclusion, Scumbot's reads are decent but not groundbreaking compared to human players. As I continue to refine it, its reads may improve slightly, but it's doubtful that its reads would improve by more than about 5 percentage points in scumread accuracy.
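For anyone who wants to check the arithmetic, here is a minimal sketch of how the overall figures follow from the town and scum figures, assuming 25% of players are accused and 75% are townread. The function name is made up for illustration and is not part of Scumbot.

```python
def overall_accuracy(town_precision, scum_precision, accuse_rate=0.25):
    """Share of all reads that are correct when accuse_rate of players are scumread."""
    return (1 - accuse_rate) * town_precision + accuse_rate * scum_precision

# Blind guessing: a random townread is right 75% of the time,
# a random scumread is right 25% of the time.
blind = overall_accuracy(0.75, 0.25)

# Figures reported above: pages of filter -> (townread accuracy, scumread accuracy).
reported = {0.5: (0.76, 0.29), 1: (0.77, 0.33), 2: (0.78, 0.33),
            4: (0.78, 0.34), 8: (0.79, 0.37)}

for pages, (town, scum) in reported.items():
    print(f"{pages} pages: {overall_accuracy(town, scum):.1%} overall")
print(f"blind guessing: {blind:.1%}")
```

The overall number is just a weighted average, which is why a large jump in scumread accuracy only moves the overall figure by a couple of points.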
How it works
The initial part of this process was downloading mass amounts of filters. The dataset was the filters of all players in all games from January 2013 to December 2016 (huge thanks to kita for compiling the database, it was a life saver). The full dataset was about 1500 filters and 267000 posts. Next, all posts were edited with some strategic regexing to be more consistent. All words were eventually replaced by numbers to convert the data into a csv format. Posts shorter than 50 words were padded and posts longer than 50 words were truncated to make all posts the same length. (More accuracy could potentially be eked out by raising the limit, but it would take so long for my computer to process it that it's not worth it at this stage.)
The original plan was to use a recurrent neural network to evaluate every post, but differences are so small on a post-by-post level that this approach had little success. Instead, I converted each post into a row describing how often each word was used in the post. These rows were aggregated back into filters, so each row became the average number of times each word was used in the first 10, 20, 30, etc. posts of a given filter.
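The aggregation step can be sketched roughly like this. It is a simplified illustration, not the actual Scumbot code: the function names are hypothetical, and it works on word strings rather than the numeric ids used in the real dataset.

```python
from collections import Counter

def post_counts(post_tokens, vocabulary):
    """One row per post: how many times each tracked word appears in it."""
    counts = Counter(post_tokens)
    return [counts[word] for word in vocabulary]

def filter_rows(posts, vocabulary, step=10):
    """Average per-post counts over the first 10, 20, 30, ... posts of a filter."""
    rows = []
    totals = [0] * len(vocabulary)
    for i, tokens in enumerate(posts, start=1):
        for j, count in enumerate(post_counts(tokens, vocabulary)):
            totals[j] += count
        if i % step == 0:
            rows.append([total / i for total in totals])
    return rows

# Toy filter: 10 posts asking a question, then 10 short posts.
vocabulary = ["?", "town", "you"]
posts = [["are", "you", "town", "?"]] * 10 + [["town", "."]] * 10
rows = filter_rows(posts, vocabulary)
```

Each row only uses posts up to that point in the filter, so a prediction made from the first 20 posts never sees post 21.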
These aggregated filters were then fed into a simple neural network (one hidden layer with 8 neurons). Extensive tweaking and study led me to isolate fifteen "words" that were useful in predicting players' alignments. Using totals of those 15 words, the machine would train on the dataset to learn how to predict new alignments.
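To give a rough idea of the size of the network, here is a sketch of a forward pass with 15 inputs, one hidden layer of 8 neurons, and a single output. The tanh and sigmoid activations, the random weights, and all names are assumptions for illustration; training is left out entirely.

```python
import math
import random

N_INPUTS, N_HIDDEN = 15, 8  # 15 word totals in, 8 hidden neurons

def init_network(seed=0):
    """Random, untrained weights (hypothetical initialisation)."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)]
                     for _ in range(N_HIDDEN)],
        "b_hidden": [0.0] * N_HIDDEN,
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)],
        "b_out": 0.0,
    }

def predict_scum_probability(net, features):
    """Forward pass: 15 features -> 8 tanh units -> one sigmoid output."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(net["w_hidden"], net["b_hidden"])]
    logit = sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))

net = init_network()
p = predict_scum_probability(net, [0.1] * N_INPUTS)
```

With only 8 hidden neurons there are few enough weights that a dataset of around 1500 filters has a chance of fitting them without memorising individual players.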
A machine's guide to mafia
According to Scumbot, some words make people mafia and some words make people town. I use "words" lightly, because it also includes pad characters, punctuation marks, and BBCode.
The scummiest thing you can do is, in fact, ask a question. Question marks were consistently rated the best indicator that a player was scum. Other words that predict scumminess in a player are: even, like, day, what, the, and you. Semicolons are also scummy, but most of them occur due to incorrect parsing of images and gifs.
And it seems to hold true that you are what you post: The towniest word, according to Scumbot, is “town” itself. Further town-indicative words are they, probably, bad, and really. Also townie were periods and “pad” characters. (Pads meaning the post was shorter than 50 words--brevity is apparently pro-town. Or maybe spamming is, depending on how you look at it.)
Each of these “words” allows a small insight into a player’s alignment. All fifteen of them together, however, allow for a medium-sized insight. The power of teamwork.
The future of forum mafia
Robots are not going to take over TL Mafia and drive out the human players. For one thing, Scumbot could be rendered totally useless by spamming the word “town” 50 times at the start of every post. It is also not even guaranteed to be as reliable on future games of mafia; the available dataset from 2014-2016 is only an approximation of the actual meta.
Furthermore, nobody else has the bot. I don’t plan on using it in games. (Obs threads, though, are fair game.) The main point of the creation of a mafia bot is to prove it can be done, and to see how far it can go. And to learn coding skills.
That’s all I really had to say. I hope you learned something. Have a nice day. I’ll be here to answer questions or to discuss the ethics of using bots in mafia.
-TW
|
They probably bad town. Fuck it was there all along...
|
Can we see some results of what it guessed for some games?
|
I would like to see in obs chat the scumbot prediction each eod. Could be fun.
|
Question marks were consistently rated the best indicator that a player was scum. Other words that predict scumminess in a player are: even, like, day, what, the, and you.
lol
So words everyone uses on a regular basis make people mafia.
Means that the bot is able to find correctly positives (can't miss mafia if everyone is mafia), but has a lot of false positives (ditto) and idk how it finds correctly negatives and false negatives.
It might be true for many that asking lots of questions is a way of getting by as scum, but then you need to find what makes people town and do another test on a false positive (for example) that checks if the person does town stuff and give back a probability from both outcomes.
Also you truncate where and pad posts with what?
|
And for the accuracy you should factor out the probability that it made a correct guess by chance. Interesting project though wouldn't mind some more insight as you work on it.
|
On July 20 2018 04:06 Vivax wrote: Question marks were consistently rated the best indicator that a player was scum. Other words that predict scumminess in a player are: even, like, day, what, the, and you. Also you truncate where and pad posts with what?
Asking questions is for scum...scum!
|
On July 20 2018 04:06 Vivax wrote: Question marks were consistently rated the best indicator that a player was scum. Other words that predict scumminess in a player are: even, like, day, what, the, and you. lol So words everyone uses on a regular basis make people mafia. Means that the bot is able to find correctly positives (can't miss mafia if everyone is mafia), but has a lot of false positives (ditto) and idk how it finds correctly negatives and false negatives.
the bot's output is normalized such that it would predict 25% of players to be mafia. it could make predictions as probabilities, too (i.e., "player x is 40% mafia and player y is 35% mafia"), but to avoid discussion of error metrics i listed the most scummy 25% of players as 100% mafia and the least scummy 75% as 100% town. this methodology also has its flaws but it is more intuitive to understand.
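A minimal sketch of that normalization step, assuming the network gives each player a scum score and the top quarter get labeled mafia. The function name, player names, and scores are invented for the example.

```python
def label_top_quarter(scum_scores, mafia_share=0.25):
    """Label the highest-scoring share of players 'mafia' and everyone else 'town'."""
    n_mafia = round(len(scum_scores) * mafia_share)
    ranked = sorted(scum_scores, key=scum_scores.get, reverse=True)
    accused = set(ranked[:n_mafia])
    return {player: ("mafia" if player in accused else "town")
            for player in scum_scores}

# 12 other players, as seen by a townie in a 13-person game (made-up scores).
scores = {f"player{i}": i / 100 for i in range(12)}
labels = label_top_quarter(scores)
```

Ranking throws away how confident the network was, so a player at 40% and a player at 26% can end up with the same label.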
It might be true for many that asking lots of questions is a way of getting by as scum, but then you need to find what makes people town and do another test on a false positive (for example) that checks if the person does town stuff and give back a probability from both outcomes.
note that in the OP some words that are also common everyday words were listed as town-indicative. also, i don't mean to give the impression that the words it looks for are hugely influential individually, or even all together. the way it works is more like this: assign a player with 1 question mark per post over 4 pages of filter a 30% chance of being scum and a player with 0.5 question marks per post a 23% chance of being scum, and then i make the same kind of adjustment for all fourteen other notably alignment-indicative words. (this is a bit of an oversimplification but more accurate to how the program works.)
Also you truncate where and pad posts with what?
each word or special character was represented by a number, and then if the post was shorter than 50 words the program would add 0s until it was. (for example, a post that read "i am town." would be converted to 3,95,23,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.) a post longer than 50 words would do the same replacement procedure except it would stop reading words after the fiftieth. this would also mean there is no need for 0s to pad it out.
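In code, the padding and truncation described here would look something like the sketch below. The ids for "i", "am", "town" and "." are taken from the example above; the function name and the rest of the setup are hypothetical, and the sketch assumes every token already has an id.

```python
MAX_LEN = 50  # every post ends up exactly this many ids long
PAD = 0       # id used to fill out short posts

def encode_post(tokens, word_ids, max_len=MAX_LEN):
    """Replace tokens with ids, stop after max_len tokens, pad short posts with 0s."""
    ids = [word_ids[token] for token in tokens[:max_len]]
    return ids + [PAD] * (max_len - len(ids))

word_ids = {"i": 3, "am": 95, "town": 23, ".": 2}
short_row = encode_post(["i", "am", "town", "."], word_ids)
long_row = encode_post(["town"] * 80, word_ids)
```

The short post comes out as its 4 ids followed by 46 zeros, and the 80-word post is cut to its first 50 ids with no padding.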
|
On July 20 2018 04:20 Vivax wrote: And for the accuracy you should factor out the probability that it made a correct guess by chance. Interesting project though wouldn't mind some more insight as you work on it.
the op said, if u were reading it, vivax: it is significantly better than blind guessing (which would have accuracy of 75% / 25% / 62.5%)
On July 20 2018 04:26 Ace wrote: On July 20 2018 04:06 Vivax wrote: Question marks were consistently rated the best indicator that a player was scum. Other words that predict scumminess in a player are: even, like, day, what, the, and you. Also you truncate where and pad posts with what? Asking questions is for scum...scum!
couldn't have said it better myself
|
On July 19 2018 20:07 Holyflare wrote: Can we see some results of what it guessed for some games?
not yet. i could probably get results for games going forwards, but it is too much work to predict games as they were at a previous point in time (because it requires me to go back and see how many posts each player had made by a certain point in time, plus other difficulties mentioned below). although i can tell you that a previous incarnation of this neural network predicted that ms paint game from august and had 3 notable scumreads, 2 of which were correct.
On July 20 2018 02:51 Koshi wrote: I would like to see in obs chat the scumbot prediction each eod. Could be fun.
yeah, that would be neat, and i do plan to do it for some games eventually. but as of now, to make predictions i have to reprocess every data point, not just the one filter i'm adding in. gimme a week and i'll probably be able to set up a pipeline to predict individual filters.
|
On July 20 2018 08:07 Tumblewood wrote: On July 20 2018 04:20 Vivax wrote: And for the accuracy you should factor out the probability that it made a correct guess by chance. Interesting project though wouldn't mind some more insight as you work on it. the op said, if u were reading it, vivax: it is significantly better than blind guessing (which would have accuracy of 75% / 25% / 62.5%) On July 20 2018 04:26 Ace wrote: On July 20 2018 04:06 Vivax wrote: Question marks were consistently rated the best indicator that a player was scum. Other words that predict scumminess in a player are: even, like, day, what, the, and you. Also you truncate where and pad posts with what? Asking questions is for scum...scum! couldn't have said it better myself
Oi, don't be rude >=( I read your shit.
What I meant to express is that the difference your bot makes isn't big enough to be distinguished from mere lucky guesses as long as you don't analyze games with say, at least 100 players. (fictive number).
So you'd need something like chance that your bot guessed correctly cause of the algo being put in relation to the chance it just got a lucky guess in mathematical ways I don't want to think about off the top of my head.
|
interesting!
is it possible to bucket players who have larger data sets and figure out if your model can determine a unique scum vs town meta based on each person’s filters and what they ended up being using your heuristic for past games?
|
so instead of a broad “is this general player mafia,” could you apply your model to say “Vivax is n% mafia based on my model and his past games and word usage weights when he was scum vs when he was town”
|
On July 20 2018 08:32 Vivax wrote: On July 20 2018 08:07 Tumblewood wrote: On July 20 2018 04:20 Vivax wrote: And for the accuracy you should factor out the probability that it made a correct guess by chance. Interesting project though wouldn't mind some more insight as you work on it. the op said, if u were reading it, vivax: it is significantly better than blind guessing (which would have accuracy of 75% / 25% / 62.5%) On July 20 2018 04:26 Ace wrote: On July 20 2018 04:06 Vivax wrote: Question marks were consistently rated the best indicator that a player was scum. Other words that predict scumminess in a player are: even, like, day, what, the, and you. Also you truncate where and pad posts with what? Asking questions is for scum...scum! couldn't have said it better myself Oi, don't be rude >=( I read your shit. What I meant to express is that the difference your bot makes isn't big enough to be distinguished from mere lucky guesses as long as you don't analyze games with say, at least 100 players. (fictive number). So you'd need something like chance that your bot guessed correctly cause of the algo being put in relation to the chance it just got a lucky guess in mathematical ways I don't want to think about off the top of my head.
sorry for being rude, that was not my intention. but there is more i don't think i've sufficiently explained. at each level of information, i trained 10 neural networks using the same parameters and randomly split the data into training and testing data every time. if this method only succeeded based on lucky guessing, each build of the neural net could vary, but would be between 56% and 68% accurate in predicting mafia in 95% of tests. the odds that the average accuracy after 10 tests on this size of dataset would be as extreme as it is at the lowest information level due purely to luck are about 0.02%. (at the highest information level they are virtually 0.)
taking into account all builds of the neural network i did (hundreds in total, each with sample size between 200 and 1000), i am 100% certain that the results the neural network achieves are not due to random chance. on a single, normal-sized game, i agree that 62.5% accuracy and 69% accuracy are basically indistinguishable, but over the sample i used and repeating this procedure over and over they are distinguishable. i hope that clears things up.
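The exact test behind those odds isn't spelled out here, so the following is only a sketch of the general idea, using a normal approximation to the binomial. The test-set size of 250 and the assumption that the 10 runs are independent are both hypothetical, and the sketch is not meant to reproduce the 0.02% figure.

```python
import math

def normal_tail(z):
    """One-sided upper-tail probability of a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def luck_p_value(observed_mean, chance=0.625, n_test=250, n_runs=10):
    """Chance that the mean accuracy of n_runs lucky-guess runs reaches observed_mean.

    n_test (reads per test set) and independent runs are illustrative assumptions.
    """
    se_single = math.sqrt(chance * (1 - chance) / n_test)  # spread of one run
    se_mean = se_single / math.sqrt(n_runs)                # spread of the 10-run average
    return normal_tail((observed_mean - chance) / se_mean)

p_low_info = luck_p_value(0.64)   # overall accuracy with half a page of filter
p_high_info = luck_p_value(0.69)  # overall accuracy with 8 pages of filter
```

Averaging over repeated runs shrinks the spread, which is how a gap that looks like noise in one game can become distinguishable over a large sample. Real re-splits of the same dataset overlap, so a careful test would need to account for that.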
|
On July 20 2018 23:27 Conversion wrote: interesting!
is it possible to bucket players who have larger data sets and figure out if your model can determine a unique scum vs town meta based on each person’s filters and what they ended up being using your heuristic for past games?
On July 20 2018 23:28 Conversion wrote: so instead of a broad “is this general player mafia,” could you apply your model to say “Vivax is n% mafia based on my model and his past games and word usage weights when he was scum vs when he was town”
sorta. an earlier version of this neural network did that and achieved even better results (up to 50% accuracy in calling players mafia for some players). players were grouped together based on what allowed the net to predict them most accurately. however, any individual player has such a small sample size that it is very easy to achieve misleading results by overfitting. grouping players together combats this issue, but i do not know for certain whether the model is valid. however, i can't trust the results of that model because it was effectively "cheating". it used a more rudimentary way of gathering filters, so instead of looking at a player's first X posts, it was only able to examine a player's entire filter. regular non-time-traveling mafia players do not have such an advantage because all posts have not been created until the very end of the game, so it is not proven to be more successful. eventually i will probably update the model so it can do that, but that will take time. stay posted i guess
|
I'd say my data set is pretty large.
|
kitaman27
That's really cool. Nice work Tumble!
A long time ago I remember someone from another site posting their research on analyzing patterns in posts in respect to alignment, though I don't remember what methods they used or if any significant conclusions were made.
Did you do anything special to parse out text from quotes? I imagine a lot of the text in a player's filter is probably made up of quotes from other players.
If you don't mind sharing, it would be awesome to play around with the dataset if it's available. ^_^
|
Tumble, if you are interested in taking this forward (or getting the best feedback afaik), send a PM to Xatalos. He likes this kinda stuff!!
|
I haven’t been doing much data analysis on TL Mafia (more like web tools/apps and the like), but a friend of mine attempted something similar to this some time ago. I linked this thread to him in case he’s interested.
His work didn’t come to a conclusion, unfortunately. I think the only real result was that townies post quite a bit more frequently on average than scum, but I doubt that’s a surprise to anyone! The thing about questions being scummy (in the OP) was a bit interesting, though.
|
I'm working on data analysis with a focus on time series. I'm unfamiliar with natural language processing but I understand how this is actually very challenging, especially if you're looking at the frequency of words as the core of your approach (as you said, it can be gamed). Is this the convention?
Currently the numbers aren't mindblowing but I'd be very interested if you could make this work!
How is your training set prepared? Do you label them as town/mafia only or are there other variables?
Have you considered something like what we do in actual forum mafia, which is that we classify by player in addition to faction?
PS: Hi Ray and Xatalos! It's been ages since our Diplomacy game :D
|