Hi,
Does anyone have an idea or set of references for cognitive properties of regular languages?
I've got a problem here. Suppose you are given a finite number of strings from an infinite language, and after some finite amount of time you are supposed to have recovered that language completely (i.e., you can build the appropriate FSA). Furthermore, you are never told which strings are not in the language; your input consists only of strings that are in it.
Now, is this problem solvable at all? Can you come up with a learner which eventually converges on the infinite language given a finite amount of input and a finite amount of time? Or do you need some a priori assumptions about the language, since there are infinitely many FSAs that accept the input strings, yet possibly only a handful actually capture the generalizations of the actual target language? Remember, your final FSA cannot accept strings which are not in the language (otherwise you have failed), but it must also accept strings which are in the language that you never saw as input (since the language is infinite, you could not have seen every string).
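To make the ambiguity concrete, here is a minimal sketch (the target language and function names are my own illustration, not from the problem): given the same positive-only sample from the regular language (ab)+, a rote acceptor that accepts exactly the observed strings and a maximally general acceptor over the observed alphabet are both consistent with the sample, yet they disagree on unseen strings.

```python
# Two acceptors consistent with the same positive-only sample, illustrating
# that a finite sample cannot pick out the target language by itself.
# Assumed toy target language: (ab)+, i.e. "ab" repeated one or more times.

def rote_accepts(sample, s):
    # Rote acceptor: accepts exactly the strings seen so far
    # (maximal undergeneralization).
    return s in set(sample)

def universal_accepts(sample, s):
    # Universal acceptor: accepts every nonempty string over the observed
    # alphabet (maximal overgeneralization).
    alphabet = {ch for w in sample for ch in w}
    return s != "" and all(ch in alphabet for ch in s)

sample = ["ab", "abab", "ababab"]

# Both are consistent with every observed string...
assert all(rote_accepts(sample, w) for w in sample)
assert all(universal_accepts(sample, w) for w in sample)

# ...but they disagree on unseen strings: "abababab" is in (ab)+, "ba" is not.
print(rote_accepts(sample, "abababab"))   # False: undergeneralizes
print(universal_accepts(sample, "ba"))    # True: overgeneralizes
```

Both hypotheses (and infinitely many in between) fit the data, which is exactly why positive-only input underdetermines the target.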
Edit: If you believe the problem is not solvable without some clever inputs or a priori assumptions by the learner, please give a brief proof. Here's mine.
Since we can never sample every string in the language, it is possible there is a string containing symbols we have never seen. No FSA we build could accept it, since those symbols were never added to our alphabet. Even if we did see all the symbols, the input could have been very unlucky and never contained samples of a certain pattern. If we were given a uniform sampling of all the strings, would these two issues still arise?
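The unseen-symbol half of my argument can be sketched in a few lines (the target language here is my own toy example): if the finite draw happens to miss a symbol, any FSA built over the observed alphabet must reject every string containing it.

```python
# Sketch of the unseen-symbol argument.
# Assumed toy target language: all runs of a's, plus the single string "c".

def in_target(s):
    return (len(s) > 0 and set(s) == {"a"}) or s == "c"

# An unlucky finite sample: every draw happened to be a run of a's.
sample = ["a", "aa", "aaa", "aaaa"]
learned_alphabet = {ch for w in sample for ch in w}   # {'a'}

# "c" is in the target language, but no FSA over the learned
# alphabet can ever accept it.
print(in_target("c"))              # True
print("c" in learned_alphabet)     # False
```

Uniform sampling would make this particular draw improbable for large samples, but any fixed finite sample still has nonzero probability of missing a symbol or a pattern.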
If we're assuming the Chomsky hierarchy of languages, would it make sense for a learner to look first for regular patterns, and only move to more powerful classes (context-free, etc.) when the input provides evidence of non-regular patterns?
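One way to picture such a learner is a purely heuristic sketch (entirely my own toy construction): check a simple regular hypothesis first, and only climb to a context-free hypothesis when every example shows a counting regularity the regular hypothesis cannot express. Note the caveat that positive-only data with matched counts is still consistent with the regular language a*b*, so this "evidence" is only suggestive, never conclusive.

```python
# Toy "climb the hierarchy on evidence" learner (hypothetical heuristic).
# It first tests the regular hypothesis a*b*; if every example additionally
# has equal counts of a's and b's, it prefers the context-free hypothesis
# a^n b^n, which a*b* cannot enforce.
import re

def hypothesize(sample):
    if not all(re.fullmatch(r"a*b*", w) for w in sample):
        return "something else"
    if all(w.count("a") == w.count("b") for w in sample):
        # Counting regularity in every example: suggestive (not proof)
        # of a non-regular pattern.
        return "a^n b^n (context-free)"
    return "a*b* (regular)"

print(hypothesize(["ab", "aabb", "aaabbb"]))   # a^n b^n (context-free)
print(hypothesize(["ab", "aab", "abb"]))       # a*b* (regular)
```

The design choice here mirrors the question: the learner defaults to the weakest class that fits and only pays for more expressive power when the data keeps exhibiting structure the weaker class cannot capture.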
What does it mean for a language to be regular in terms of cognition, as compared to, say, a context-free or context-sensitive language? When would it be ideal to assume the input language is actually context-free or context-sensitive, and what kind of evidence would you need?