euler: The problem with existing Chinese-word algorithms

Friday, March 2, 2007

The problem with existing Chinese-word algorithms

In Chinese, many words are composed of more than one character, and their meaning cannot easily be guessed from the individual characters (the basic semantic units). Chinese input methods therefore include an algorithm that automatically suggests the next character based on the character you have just entered. Presumably this saves the user time when typing Chinese.
For instance, if you enter B, a list of choices pops up with BA, BAC, BDD, BFE, and so on. The user only needs to pick one of the options, which saves the time of entering the whole word (for example, BAC).

The problem with this algorithm is that it is based only on the first character and has no memory. It assumes that every character you enter is the first character of a word and lists the candidate words accordingly. For instance, assuming BAC is a word, entering BA does not bring up the option BAC; instead, the list shows all words that start with A. The challenge, then, is to make the algorithm sensitive to more than one character. In theory this could be done with a function that marks the end of the last complete semantic unit and stores every character entered afterward in memory, then uses that stored value to search for words starting with those characters.
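A minimal Python sketch of that idea, assuming a toy word list in which Latin letters stand in for Chinese characters (as in the BA / BAC example). The names `WORDS`, `candidates`, and `PrefixInput` are made up for illustration and do not come from any real input method:

```python
from bisect import bisect_left

# Hypothetical toy dictionary; letters stand in for Chinese characters.
WORDS = sorted(["A", "AB", "ACD", "BA", "BAC", "BDD", "BFE"])


def candidates(prefix, words=WORDS):
    """Return every word in the sorted list that starts with prefix."""
    start = bisect_left(words, prefix)  # first position where prefix could sit
    matches = []
    for word in words[start:]:
        if not word.startswith(prefix):
            break  # sorted order: no later word can match
        matches.append(word)
    return matches


class PrefixInput:
    """Remembers the characters typed since the last completed word."""

    def __init__(self, words=WORDS):
        self.words = words
        self.buffer = ""  # the "memory" the post asks for

    def type_char(self, ch):
        """Add one character and list words starting with the whole buffer."""
        self.buffer += ch
        return candidates(self.buffer, self.words)

    def commit(self):
        """Mark the end of the current semantic unit and clear the memory."""
        word, self.buffer = self.buffer, ""
        return word
```

With this sketch, typing B and then A narrows the list to words starting with BA, rather than restarting from A. Calling `commit()` plays the role of the end-of-unit marker, so the next character is treated as the start of a new word.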

We could extend this idea further to the case where people remember only part of a word, such as A?C?. Once the user types ?, the program would hold off on automatically guessing the character; only after the user marks the end of the word (i.e. the last ?) would the program start looking for words that match this combination.
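The partial-word case might be sketched in Python as follows. This is only an illustration under assumptions: `?` stands for exactly one unknown character, the end of the word is signalled by an explicit `end_word()` call, and the names `matches` and `WildcardInput` are hypothetical:

```python
def matches(pattern, word):
    """True if word has the pattern's length and agrees wherever pattern is not '?'."""
    return len(pattern) == len(word) and all(
        p == "?" or p == c for p, c in zip(pattern, word)
    )


class WildcardInput:
    """Holds back suggestions once a '?' has been typed."""

    def __init__(self, words):
        self.words = list(words)
        self.buffer = ""

    def type_char(self, ch):
        """Add one character; suggest by prefix only while no '?' is pending."""
        self.buffer += ch
        if "?" in self.buffer:
            return []  # on hold until the user marks the end of the word
        return [w for w in self.words if w.startswith(self.buffer)]

    def end_word(self):
        """The user marks the end of the word: match the whole pattern."""
        pattern, self.buffer = self.buffer, ""
        return [w for w in self.words if matches(pattern, w)]
```

For example, with the toy words ABCD, AXCY and ABC, typing A, ?, C, ? returns no suggestions after the first ?, and `end_word()` then returns ABCD and AXCY but not ABC, because the pattern fixes the word length at four characters.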

My suggestion could be implemented using any MySQL-like database, with all words entered in the dictionary. The point is to let people with a lower literacy level navigate the Internet easily.
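As a rough sketch of the database approach, the following uses Python's built-in `sqlite3` module as a stand-in for a MySQL-like database (an assumption; the post does not name a specific engine or schema). The table name `words` and the helpers `build_dictionary` and `lookup` are hypothetical. The SQL `LIKE` operator covers both cases: `%` matches any run of characters, and `_` matches exactly one character, so the user's `?` is translated to `_`.

```python
import sqlite3


def build_dictionary(words):
    """Create an in-memory table holding one dictionary word per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO words (word) VALUES (?)", [(w,) for w in words])
    return conn


def lookup(conn, typed, complete=False):
    """Find words matching what was typed so far.

    '?' in the input means one unknown character. While the word is still
    being typed (complete=False), anything may follow the typed characters.
    Note: this sketch does not escape literal '%' or '_' in the input.
    """
    pattern = typed.replace("?", "_")
    if not complete:
        pattern += "%"
    rows = conn.execute(
        "SELECT word FROM words WHERE word LIKE ? ORDER BY word", (pattern,)
    )
    return [row[0] for row in rows]
```

Here `lookup(conn, "BA")` would list the words starting with BA, while `lookup(conn, "A?C?", complete=True)` would list only four-character words of the form A?C?.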
