Standard for word censoring?

sethmachine · Jul 22, 2014

Hi,

What's the industry standard for censoring textual input that can be interpreted as language?

I imagine that it probably involves comparing each word against a list of profane/illegal words, but obviously that doesn't cut it, because people can spell such words with substitute characters (e.g. 0 instead of o) or insert spaces or other arbitrary separators.
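To make the evasion problem concrete, here is a minimal sketch of the "normalize, then compare against a list" idea. It is an illustration only, not a known industry standard: `LEET_MAP`, `BLOCKED`, `normalize` and `contains_blocked` are hypothetical names, the substitution table is deliberately tiny, and "dork" is a mild placeholder for a real blocked word.

```python
import re

# Hypothetical substitution table: look-alike characters mapped to letters.
LEET_MAP = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})

# Hypothetical block list; "dork" stands in for a real blocked word.
BLOCKED = {"dork"}


def normalize(text):
    """Lowercase, undo character substitutions, then drop every non-letter."""
    lowered = text.lower().translate(LEET_MAP)
    return re.sub(r"[^a-z]", "", lowered)


def contains_blocked(text):
    """Return True if any blocked word appears in the normalized text."""
    cleaned = normalize(text)
    return any(bad in cleaned for bad in BLOCKED)


print(normalize("D-0_r k"))             # separators and the 0 are undone
print(contains_blocked("d 0 r k"))      # evasion attempt is caught
print(contains_blocked("bad or kind"))  # also flagged: a false positive
```

The last call shows the cost: once separators are stripped, innocent text that happens to spell a blocked word across word boundaries gets flagged too.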

So I am guessing that they also include some heuristics or even use machine learning (i.e. spam vs non-spam classification).

Language censorship (for profane words) never came up once in 4 years of college, which leads me to guess it's considered very mundane / simple and not worth much research (the school I went to had a specialized department for natural language processing).

However, it seems that overlooking this area leads to very dumb results, e.g. in the newest Pokemon I am told these names can't be used to name a Pokemon

"Cofagrigus"

and

"Sharpedo"

simply because each has a substring "fag" and "pedo", respectively.

Now I know Nintendo has hit rough times, but that's the kind of program you'd expect from a novice. It doesn't seem to use contextual information or even exceptions to the list. And this is 2014.
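For illustration, the difference between a bare substring check and a check with whole-token matching plus an exception list can be sketched as below. This is a guess at the behaviour described above, not Nintendo's actual code; `BLOCKED`, `ALLOWED`, `naive_is_blocked` and `is_blocked` are hypothetical names, and the block list holds only the two substrings mentioned in this post.

```python
import re

# Hypothetical block list: the two substrings named in the post.
BLOCKED = {"fag", "pedo"}

# Hypothetical exception list: known-good names containing a blocked substring.
ALLOWED = {"cofagrigus", "sharpedo"}


def naive_is_blocked(name):
    """Bare substring check, which rejects the two Pokemon names."""
    lowered = name.lower()
    return any(bad in lowered for bad in BLOCKED)


def is_blocked(name):
    """Exception list first, then match blocked words as whole tokens only."""
    lowered = name.lower()
    if lowered in ALLOWED:
        return False
    tokens = re.findall(r"[a-z]+", lowered)
    return any(token in BLOCKED for token in tokens)


for name in ("Cofagrigus", "Sharpedo", "Pikachu"):
    print(name, naive_is_blocked(name), is_blocked(name))
```

Whole-token matching removes this kind of false positive but is easier to evade by gluing a blocked word onto other letters, so a real filter would have to trade the two off.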

noob · Jul 23, 2014

I remember a forum whose censor bot censored "45s" because it knew leetspeak and thought 4 meant A and 5 meant S, so it put "kitten" in place of "45s" (that is the origin of this unit of time).

Dr Super Good · Jul 23, 2014

Some form of AI is used for detecting obscured profanity. You measure how similar the input is to a blocked word, and if it is very similar (high probability) you block it. With the right sort of computation for efficiency, you can probably run this over sequences without spaces, giving a sort of profanity map where each character gets a probability that profanity is nearby. Such filters are never perfect, and usually they just block short sequences, with examples of obscured profanity added manually.
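A rough sketch of the "profanity map" idea follows. It swaps in plain string similarity from Python's standard `difflib` module for the AI or probability model described above, so the scores are similarity ratios, not real probabilities. `BLOCKED`, `profanity_map`, `is_probably_profane` and the 0.7 threshold are all hypothetical, and "dork" is again a mild placeholder.

```python
from difflib import SequenceMatcher

# Hypothetical block list; "dork" stands in for a real blocked word.
BLOCKED = ["dork"]


def similarity(a, b):
    """Similarity ratio in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def profanity_map(text, blocked=BLOCKED):
    """For each start index, the best similarity between any blocked word
    and the window of the same length starting at that index."""
    lowered = text.lower()
    scores = []
    for i in range(len(lowered)):
        best = 0.0
        for word in blocked:
            window = lowered[i:i + len(word)]
            best = max(best, similarity(window, word))
        scores.append(best)
    return scores


def is_probably_profane(text, threshold=0.7):
    """Block the text if any window is similar enough to a blocked word."""
    return max(profanity_map(text), default=0.0) >= threshold


print(profanity_map("xd0rkx"))
print(is_probably_profane("d0rk"), is_probably_profane("hello"))
```

The threshold is the tuning knob: lowering it catches more obscured spellings but also blocks more innocent text, which is one reason such filters are never perfect.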

The fact such filters are even in games is damn stupid. It is not like these nasty words will leap out the monitor and start strangling you or something. People need to learn to grow up and get a backbone. Sure naming something like a Pokémon "dienigger" is certainly not nice or mature, but so what? No one gets hurt or should even be offended as this is just a game for crying out loud. Or if I wanted to make a SC2 map called "Destroy The USA" I should be allowed to, since it is not like the map will kill anyone in real life or that the maker intends to harm anyone (not like I would make this map lol). Yes the map may be extremely poor taste, but so are a lot of maps that are allowed.

The best example of this going to the extreme is that people cannot make Risk or WW2 maps in SC2 as putting any country name or nationality will block the map from being published due to their profanity filter.

What is really pathetic is that Microsoft now classes user names that are similar to voice commands as profanity and bars you from using them. Why do they not just give us all a random number and be done with it, as clearly they do not want us being creative with names?

sethmachine said: "Now I know Nintendo has hit rough times, but that's the kind of program you'd expect from a novice. It doesn't seem to use contextual information or even exceptions to the list. And this is 2014."

The game is made in Japanese, so you will probably find the filters work very badly in English.
