Hi,
What's the industry standard for censoring textual input that can be interpreted as language?
I imagine it involves comparing each word against a list of profane/illegal words, but that alone obviously doesn't cut it, because people can write such words with substitute characters, e.g. 0 instead of o, or insert spaces or other arbitrary separators.
So I am guessing that they also include some heuristics or even use machine learning (e.g. something like spam vs. non-spam classification).
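To make the "heuristics" guess concrete, here is a minimal sketch of the normalization step I have in mind: undo common character substitutions and strip separators before checking the list. The names `LEET_MAP`, `BLOCKLIST`, `normalize` and `contains_blocked` are all made up for illustration, the blocklist uses mild placeholder words, and I am not claiming any real product works this way.

```python
import re

# Hypothetical substitution table; a real one would be much larger.
LEET_MAP = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})

# Placeholder blocklist with mild words, for illustration only.
BLOCKLIST = {"darn", "heck"}


def normalize(text: str) -> str:
    """Lowercase, undo common substitutions, and drop separators."""
    text = text.lower().translate(LEET_MAP)
    # Remove everything that is not a letter (spaces, dots, dashes, ...).
    return re.sub(r"[^a-z]", "", text)


def contains_blocked(text: str) -> bool:
    """Return True if any blocklisted word appears in the normalized text."""
    normalized = normalize(text)
    return any(word in normalized for word in BLOCKLIST)
```

With this, `contains_blocked("D 4 r n")` and `contains_blocked("h.3.c.k")` are both flagged, while `contains_blocked("hello world")` is not. The catch is that stripping separators and matching substrings makes false positives more likely, since an innocent word like "check" contains "heck".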
Language censorship (for profane words) never came up once in 4 years of college, which leads me to guess it's considered very mundane / simple and not worth much research (the school I went to had a specialized department for natural language processing).
However, it seems that overlooking this area leads to very dumb results, e.g. in the newest Pokemon I am told these names can't be used to name a Pokemon:
"Cofagrigus"
and
"Sharpedo"
simply because they contain the substrings "fag" and "pedo", respectively.
Now I know Nintendo has hit rough times, but that's the kind of program you'd expect from a novice. It doesn't seem to use contextual information or even exceptions to the list. And this is 2014.
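For what it's worth, the "exceptions to the list" idea would only take a few lines. This is a rough sketch under my own assumptions: `BLOCKED_SUBSTRINGS` holds just the two substrings mentioned above, `ALLOWED_NAMES` is a hypothetical exception list (e.g. the official species names), and `is_name_allowed` is a made-up function, not anything from the actual game.

```python
# The two substrings quoted above; a real filter would have many more.
BLOCKED_SUBSTRINGS = {"fag", "pedo"}

# Hypothetical exception list, e.g. populated from the official species names.
ALLOWED_NAMES = {"cofagrigus", "sharpedo"}


def is_name_allowed(name: str) -> bool:
    """Allow exact matches on the exception list, else reject blocked substrings."""
    candidate = name.strip().lower()
    if candidate in ALLOWED_NAMES:
        return True
    return not any(bad in candidate for bad in BLOCKED_SUBSTRINGS)
```

Here `is_name_allowed("Cofagrigus")` and `is_name_allowed("Sharpedo")` both pass, while a nickname like "xpedox" is still rejected. The exception is an exact match on the whole name, so appending extra characters to an allowed name does not bypass the substring check.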