Showing posts with the label computer software.

Monday, April 20, 2009

A program to find all the factors of a number

This is yet another unfinished task dating from my primary school days. I had set myself the challenge of writing a program in DBASE to find all the factors of a number using heuristics such as 'if the sum of the digits of a number is divisible by 3, then the number itself is divisible by 3', or 'if the last digit of a number is 0 or 5, then it is divisible by 5', and so on. However, what I had not fully considered was that a composite number can contain repeated prime factors, i.e. 2^n, 3^n, etc., and I could not find a way to do recursion in DBASE, so I gave up. Instead I wrote an extremely simple program that takes the modulus of a number by a divisor: if there is a remainder it returns false, otherwise true.

Now I finally have the momentum to pick up where I left off and devise a method to find all the divisors, including repeated ones. (Actually, it is intended to find all the prime numbers within a range.)

Input the number X.
Initialize an array of zeros indexed from 2 to N, where N is the largest divisor to be tested. N = X always works; N = the square root of X is enough, provided that a quotient greater than 1 left over at the end is counted as one more prime factor.
For each number d, starting from 2 and going up to N:
'test whether X is divisible by d; if so, add 1 to position d of the array and divide X by d;
repeat until X is no longer divisible by d.'
The answer is those array elements with a value greater than 0.
(We can branch off at the divisibility test using any given criterion, such as the digit rules above.)

For example, if array(2) = 3, then 2, 4 and 8 are all factors of the number.

To use this method to find all the prime numbers in a range, we keep only those numbers for which every array value for a divisor smaller than the number itself is zero.
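For anyone who wants to try it, here is a minimal Python sketch of the procedure above. It is only one possible reading of it: a dictionary stands in for the array, the divisor test stops at the square root of X, and the function names are made up for this illustration.

```python
def prime_factor_counts(x):
    """Return {divisor: how many times it divides x}, by repeated trial division."""
    counts = {}
    d = 2
    while d * d <= x:
        # Keep dividing by d until x is no longer divisible by it.
        while x % d == 0:
            counts[d] = counts.get(d, 0) + 1
            x //= d
        d += 1
    if x > 1:
        # Whatever is left is a single prime factor larger than the square root.
        counts[x] = counts.get(x, 0) + 1
    return counts


def all_factors(x):
    """Build every factor of x from the counts, e.g. a count of 3 for 2 gives 2, 4, 8."""
    factors = [1]
    for p, e in prime_factor_counts(x).items():
        factors = [f * p ** k for f in factors for k in range(e + 1)]
    return sorted(factors)


def primes_in_range(lo, hi):
    """A number is prime when its only recorded divisor is itself, exactly once."""
    return [n for n in range(max(lo, 2), hi + 1)
            if prime_factor_counts(n) == {n: 1}]


print(prime_factor_counts(360))
print(all_factors(8))
print(primes_in_range(2, 20))
```

Only divisors that are prime ever receive a count here, because every smaller factor has already been divided out by the time a composite divisor is tested.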

Fun to try!

Friday, August 1, 2008

To have more fun with China's Great Firewall(3)

There are limitless ways to create even more trouble for the infamous Great Firewall of China, the tool used to suppress freedom of thought, freedom of information and freedom of expression. I would like to add to the complexity of the GFW's task. There are various methods for inputting Chinese characters, such as Simplex, Cantonese, four-corner, first three strokes, and so on. A Chinese character can therefore be reduced to the keystrokes of a particular input method rather than written as the output. For instance, democracy (民主) could be written as 口心(RP)2 卜土(YG)1, and the Chinese Communist Party (中國共產黨) as 中(L)1 田一(WM)1 廿金(TC)1 卜一(YM)7 火火(FF) X1 in Simplex, where the final number could be displayed as a diagram or picture. The GFW will not see anything abnormal about this in a webpage, since it can never understand the intention behind each type of data. We are then left with the task of writing a simple program that takes these codes as input and outputs Chinese text.
For a more sophisticated version, the clue to the input method should be given in anything but plain text.
To reverse-engineer this 'cipher', the Great Firewall would have to know every encoding and every Chinese input method, which would increase the time it takes a hundredfold.
(You can use it in combination with my other ideas to be more effective. I would prefer to add another layer of complexity by writing the letters as numbers: 18.16.2 25.7.1 20.2.8 26.8.6 or 十八.十六.二 廿五.七.一 二十.二.八 廿六.八.六!)
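The 'simple program' mentioned above could look roughly like the Python sketch below. It is an illustration only: the lookup table holds just the two codes quoted in this post (RP 2 for 民 and YG 1 for 主), whereas a real tool would need the full table of whichever input method is used, and all the function names are invented here.

```python
import string

# Sample table built only from the examples in this post (an assumption,
# not a real input-method table): (keys, candidate number) -> character.
SIMPLEX_SAMPLE = {
    ("RP", 2): "民",
    ("YG", 1): "主",
}


def encode_token(keys, pick):
    """Write each letter as its position in the alphabet, e.g. ('RP', 2) -> '18.16.2'."""
    numbers = [str(string.ascii_uppercase.index(c) + 1) for c in keys.upper()]
    return ".".join(numbers + [str(pick)])


def decode(tokens, table=SIMPLEX_SAMPLE):
    """Turn space-separated tokens such as '18.16.2 25.7.1' back into Chinese text."""
    out = []
    for token in tokens.split():
        *letters, pick = token.split(".")
        keys = "".join(string.ascii_uppercase[int(n) - 1] for n in letters)
        out.append(table[(keys, int(pick))])
    return "".join(out)


print(encode_token("RP", 2))
print(decode("18.16.2 25.7.1"))
```

A token that is missing from the table raises a `KeyError`, which is the simplest way of signalling that the sample table is incomplete.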

How to make the CCP's Golden Shield fail (2)

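Something that occurred to me just today is actually a very old way of passing messages in Chinese. The goal, of course, is to make the CCP's Golden Shield useless, or to force it to spend ten to a hundred times more resources on the firewall until the CCP's finances collapse. Suppressing freedom of thought carries a heavy economic price!
There are ten or more different Chinese input methods. They all work the same way: the user types two to six letters plus a digit, and the software outputs one Chinese character. For example, in the Simplex input method 民主革命 (democratic revolution) is 口心 (RP)2. 卜土(YG)1. 廿十(TJ)3. 人中(OL)6. If a whole Chinese article is written using only the input keys and the digits, the Golden Shield cannot tell whether 口心2. 卜土1. 廿十3. 人中6 or (RP)2. (YG)1. (TJ)3. (OL)6 is meant as a Chinese article with numbers mixed in, as a pile of data, or as a cipher. Furthermore, to make the analysis harder, the digits can be written as Chinese characters or shown as images of numbers. All that remains is a decoding tool that lets readers understand the article, which is quite easy software to develop. When such an article reaches the Golden Shield, extraordinary means are needed to recognize it as an 'encrypted' Chinese article. Because the Golden Shield does not know which input method was used or what the final digit is, automatic content-censorship software, unless the work is done by hand, must spend thousands or tens of thousands of times the usual computing resources. If an article has 100 characters, it has to try 10^100 times (not counting other Chinese input methods), and so automatic content censorship loses its economic viability.
(A more entertaining variation: 口心二. 卜土一. 廿十三. 人中六, or 18.16.2. 26.7.1. 20.10.3. 15.17.6, or 十八.十六.二 廿六.七.一 二十.十.三 十五.十七.六, or 18.心.二 廿六.土.1 廿.10.三 人.十七.6. The Golden Shield, like the Beijing Olympics, is all show and no substance!)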

Monday, July 28, 2008

To have more fun with China's Great Firewall(2)

Chinese characters confer a special advantage with respect to the Great Firewall of China that no other language can give. It is not only that Chinese is not native to the computer in the way English is: there are two sets of Chinese characters and more than one encoding for each of them, plus the UTF encodings. This implies that if the intended encoding of a webpage differs from what the censorship software 'expects' of it, no censorship can take place, since the software has no way of knowing the right encoding. Beyond that, since each Chinese character is represented by 2 bytes, we could reverse the order of the two bytes, and then censorship cannot happen. Or we could swap one byte with a byte of the next Chinese character (or swap the first 3 bits of every odd character with the last 3 bits of every even character). We could also treat the 16 bits as a number to be operated on. After we have 'encrypted' the Chinese message like this, all we need to do is leave a clue (in the webpage or in a separate place) on how to recover the original message; we could even write a simple program to automate the process. If all this is implemented and the software is available to every Netizen in China, we can blow a hole in China's Great Firewall by increasing the computing resources it needs at least a thousandfold.
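Two of these operations, reversing the two bytes of each character and treating the 16 bits as a number, can be sketched in Python as follows. This is a sketch under assumptions: the text is taken to be encoded in Big5 and to consist entirely of two-byte characters (no ASCII mixed in), and the function names and the offset value are invented for the illustration.

```python
def swap_byte_pairs(data):
    """Exchange the two bytes of every 2-byte unit; applying it twice restores the input."""
    if len(data) % 2:
        raise ValueError("expected an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]
    swapped[1::2] = data[0::2]
    return bytes(swapped)


def shift_pairs(data, offset):
    """Treat every 2-byte unit as a 16-bit number and add offset, wrapping at 65536."""
    if len(data) % 2:
        raise ValueError("expected an even number of bytes")
    out = bytearray()
    for i in range(0, len(data), 2):
        value = (int.from_bytes(data[i:i + 2], "big") + offset) % 65536
        out += value.to_bytes(2, "big")
    return bytes(out)


raw = "民主".encode("big5")            # assumed encoding, two bytes per character
scrambled = swap_byte_pairs(raw)
print(swap_byte_pairs(scrambled).decode("big5"))
shifted = shift_pairs(raw, 1234)       # 1234 is an arbitrary example offset
print(shift_pairs(shifted, -1234).decode("big5"))
```

The 'clue' left for the reader is then simply which operation was used and, for the shift, the offset.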
Hackers, let’s mark 2008/8/8 as the Global Hacker Insurgency against Censorship and Tyranny. Without the protection of the Great Firewall, the Chinese Communist Party, the greatest and last imperial empire, will fall apart. Let the Internet be free for all of humanity! Let humanity enter a New Age without any bondage!

Saturday, July 26, 2008

How to make the CCP's Golden Shield fail (1)

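Chinese has one advantage over other languages. Because Chinese is not the computer's own language, computers and software need mapping tables before they can 'see' Chinese, and the Golden Shield can only do its job if it can see the Chinese; otherwise even the most powerful content-censorship software is an idiot. Yet Chinese can be represented in several different character encodings, and censorship software of course does not know which encoding a page was meant to use. If it reads the page wrongly, it cannot censor it. Today I had a fun idea. Suppose there are 30 Chinese characters, all in the same encoding. If software swaps the last three bits of the code of every odd-positioned character with the last three bits of the code of every even-positioned character, the result still looks like an article to the censorship software, but the characters it sees are far from the original meaning, so its keyword search stops working. As long as the page leaves some hint or software, anyone who wants to read the real content can recover it in a short time, while the censorship software cannot try every possible combination in a limited time. Of course, this is only the simplest example. More complex versions need not stop at swapping the last three bits of neighbouring characters: the character code can be treated as a number and have some fixed or random value added or subtracted, or several methods can be combined. Either way, the Golden Shield project on which the CCP spent tens of millions collapses at the first blow from the simplest of methods and becomes an empty shell.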
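The swap described above can be tried out with a short Python sketch. Two assumptions are made here that the post does not settle: the 'last three places' are read as the lowest three bits, and the swap is applied to Unicode code points rather than to a particular Chinese encoding. The function name is made up for the illustration.

```python
def swap_low_bits(text, bits=3):
    """Swap the lowest `bits` bits between each odd-positioned character and the
    even-positioned character after it. Applying it twice restores the text."""
    mask = (1 << bits) - 1
    codes = [ord(ch) for ch in text]
    # Walk over the pairs (1st, 2nd), (3rd, 4th), ...; a trailing unpaired
    # character is left unchanged.
    for i in range(0, len(codes) - 1, 2):
        a, b = codes[i], codes[i + 1]
        codes[i] = (a & ~mask) | (b & mask)
        codes[i + 1] = (b & ~mask) | (a & mask)
    return "".join(chr(c) for c in codes)


original = "民主"
disguised = swap_low_bits(original)
print(disguised != original)
print(swap_low_bits(disguised) == original)
```

Because only the lowest three bits move, each disguised character stays within the same group of eight neighbouring code points as the original.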
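Hackers of the world, let us declare 8 o'clock on August 8, 2008 World Freedom of Information Day, thoroughly bring down the Internet censorship mechanisms that shackle the thinking of people in every country, build the Internet into the front line of freedom of thought, and give the Chinese people back their Internet freedom!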

Sunday, July 13, 2008

How to get around Communist China's firewall

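The line of thinking is:
1. Communist China's firewall team does not like to review each case individually, so it would rather let the software block wrongly than inspect every packet. Apart from Simplified Chinese, it cannot be bothered to look at English or other languages;
2. Censorship software naturally inspects online content by mechanical keyword search, but the software cannot understand an article's real meaning or message, so it often wrongly blocks websites that have nothing to do with politics;
3. Online content has to be inspected, but the CCP of course does not want China's external Internet connections to become too slow as a result, since slowness hurts Communist China's economy;
4. Censoring online content costs money, and the CCP of course does not want to spend so much on it that other IT projects suffer;
5. If the censorship software is not precise enough and frequently blocks unrelated websites, Communist China pays a considerable economic price;
6. Weaknesses of any software that censors online content:
A. It does not know which encoding the page owner intended for displaying the page, and it cannot blindly guess for every page and every character!
B. It identifies keywords by the combination and order of Chinese characters, not by meaning, and it does not understand Chinese grammar, so this can be targeted;
C. It cannot tell what role each kind of content (data type) on a page plays in conveying meaning. It only mechanically targets Chinese characters and does not realize that digits can stand in for Chinese characters, Chinese characters for digits, and images, symbols or Flash animations for text;
D. It cannot read text inside images, video, audio or multimedia files, and of course does not know what those files are meant to express. However, such files are larger and take longer to upload and download; compressing them may save time. Compression has another benefit: censorship software cannot decompress every compressed file before checking it.
E. Java/PHP can also generate content containing Chinese characters dynamically, and censorship software cannot learn the real content from the program code. Chinese has a further advantage: character codes can be split apart or treated as variables in all sorts of arithmetic. Even if the censorship software manages to find every variable, first, it does not know what those variables are for, and second, guessing would take disproportionately large computing resources!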

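Methods:
1. Start from the fact that Chinese can be written in 6 to 8 different encodings. A page need not declare the correct encoding, and software that cannot read a character or word cannot censor the content. If a page contains 3 different Chinese encodings, the software needs up to 512 times the usual effort, and because the page has deliberately broken the link between the characters of a keyword, there is nothing for it to censor. In theory, for a 300-character article in which 3 encodings are arranged at random, the software has no way of knowing the correct encoding, so it must try all 6^300 combinations! The problem is that this can be hard work for readers, who need a dedicated conversion program that converts according to set rules; with other people's help it becomes easy. Even if the censorship software knows the rules, it will exhaust the censor's computing resources!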

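2. Insert spaces, digits, symbols, foreign-language words or English into keywords. An ordinary person can sort this out with common sense, but censorship software cannot tell which symbols are meaningless and which are meaningful. For example, if a page normally has spaces between characters and a character or symbol suddenly appears in between, the software cannot tell whether it is part of the intended meaning or was put there deliberately to confuse it. As a further step, characters or symbols can be added with irregular spacing or paragraph breaks to deal with the censorship software.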

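3. Use images to display certain words on the page. These can be censored keywords, unrelated words, or parts of keywords, among other choices. Many pages already use images for characters that the character encoding cannot display, and turning a few more characters per article into images is little work. For example, if one character in every ten is randomly chosen and turned into an image, then in a 100-character article automated censorship becomes more than 1024 times harder, and since the software sees only part of the text, not even the most advanced system can guess the article's meaning!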

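4. Mix different kinds of data and use them in unconventional ways, for example 六4, 6四, 陸4. Keywords and non-keywords alike can also be written with homophones. A reader understands at a glance, while censorship software must try every possible phonetic combination, which is time-consuming and unproductive.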

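5. Express the meaning of keywords in English or another language. Dictionaries are everywhere on the web and using an online dictionary is not illegal, but having to translate before checking makes things much more troublesome for the software. A more interesting variant is to express parts of both censored keywords and uncensored words in English or another language. The page author needs only a simple dictionary and automated software that randomly translates different words into English or another language. The drawback is that readers will find it tiresome and will need software that detects the language and translates automatically.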

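6. Censorship software does not know the correct order of a page's content (text, images, video and so on), because it does not understand what the content means, and it cannot try every possible ordering of a page's content. For example, a 20-character message has about 2.4 × 10^18 (20!) possible orderings; even all the computers in China combined would need an hour to work through them, never mind an article of 100 characters! So one can 'encrypt messages' in the oldest Chinese way, for example by placing the intended message at every third character of an article, or by taking every third character of the first article plus the second character of the second article plus the first character of the third article. The scheme can of course differ for every article and be new every time, as long as readers are given hints for decoding that the censorship software cannot read. In fact, even if the software can read the hints, it cannot tell article content from hints, or which article should be read in which way. This forces the censorship process to be non-automatic, greatly raising its cost and lowering its efficiency!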
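The every-third-character scheme in method 6 is easy to sketch in Python. The sketch below is illustrative only: the helper names `hide` and `reveal` and the placeholder cover text are invented here, and the last line simply computes the number of orderings of 20 characters.

```python
import math


def hide(message, cover, step=3):
    """Put each message character at every `step`-th position, taking the
    characters in between from the cover text."""
    if len(cover) < len(message) * (step - 1):
        raise ValueError("cover text is too short")
    filler = iter(cover)
    out = []
    for ch in message:
        for _ in range(step - 1):
            out.append(next(filler))
        out.append(ch)
    return "".join(out)


def reveal(text, step=3):
    """Read back every `step`-th character."""
    return text[step - 1::step]


mixed = hide("AB", "wxyz")       # placeholder message and cover text
print(mixed)                     # wxAyzB
print(reveal(mixed))             # AB
print(math.factorial(20))        # 2432902008176640000 orderings of 20 characters
```

The hint given to readers is just the value of `step`, which can change from article to article.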
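Ideally, a piece of software would combine the methods above: it would automatically detect that one's page is being blocked by Communist China's firewall and immediately rewrite the page content at random using any of the methods above. That takes only a few minutes, while the censorship software may need hours to work it out. Unless the CCP can tolerate an Internet ten thousand times slower than usual during the Beijing Olympics, the firewall, the cornerstone of the CCP's thought control, will by then be an empty shell!
The Beijing Olympics is the time for hackers to show their skills. I am looking forward to a good show.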

Saturday, July 12, 2008

Suggestions to overcome China’s Great Firewall

This is intended for webmasters who would like to celebrate the natural beauty of freedom of information on the Internet during the 2008 Olympics. Since censorship technology in China is based partly on keywords and partly on URLs, my suggestions here are about exploiting the weaknesses of the former.

Suggested Direction:
1. Mischaracterization: mismatch the character set the webpage is actually written in with the one declared in its META data. Most Chinese read only websites with Chinese characters on them, and the software that does the censoring does NOT understand the meaning of Chinese words. There are five commonly adopted character sets for displaying the text, from Big 5 to UTF. If a webpage meant to be displayed in UTF is misread by the censor software as Big 5, all the censor program sees is gibberish. It is natural for a Netizen inside China to adjust the encoding setting if what is displayed doesn’t make sense, but it is NOT natural for censorship software to do that. That would increase the load on the censorship software 5 times.

2. A more advanced idea is to break the webpage into several character sets, for instance into partitions of Simplified Chinese, UTF and Traditional Chinese (Hong Kong style). It would be troublesome for the Netizen to adjust the encoding of individual partitions, but a single webpage divided into 3 parts would make the censorship software's task 125 times harder. To spare Netizens the frustration, we should develop software that automates the task of ‘deciphering’ the encoding of each webpage.

3. A related idea is to display the taboo word or phrase only as a picture. The censor software can’t make any decision about pictures; it can only deal with raw text. Many webmasters already do this to display Chinese characters that are not covered by the character set of the webpage. How difficult is it to change just a few more characters on a webpage?
Moreover, to make censorship harder, the webmaster shouldn’t just turn all the taboo items into pictures, but should do so randomly for taboo and non-taboo items alike. Doing so may require software that automatically turns the chosen characters into pictures. That is not difficult for the webmaster, but it would be VERY DIFFICULT for the censorship software to turn the pictures back into Chinese characters that may or may not be relevant to the censorship process.

4. Add meaningless numbers, symbols, characters from another language (like English), pictures or spaces in between the characters of the phrase that the censor software is looking for. Chinese makes sense only through particular combinations of characters, and this method is meant to disrupt that relationship for the censor software. For instance, democracy (民主) is made of two characters. The censorship software can’t block everything that starts with 民 or everything that ends with 主, nor anything like 民 主, since the software doesn’t UNDERSTAND; it works only by inflexible mechanical rules.

5. Purposely use wrong alignment or indentation to break up the taboo words that the censorship software targets. It is easy for a human to adjust the webpage in front of them, but it would be very difficult for a censorship program to try every possible indentation and alignment to get the intended reading of the webpage. Besides, the censorship program itself does NOT understand anything of the content, so it has to test every possible combination mechanically to look for taboo words. Even then, it can never tell which is the intended way of displaying the content of the webpage.

6. Translate part of a taboo word into another language like English, for instance democracy (民主) into people 主, or people master. Translation software is widely available on the Internet and it is not against the law in China to look for it, so it is easy for a Netizen to read the correct meaning but not for the censorship software.

7. Use only the pronunciation to represent the whole or part of a taboo item. The censorship software is NOT equipped to recognize a character through its pronunciation. The method does, however, require an intimate understanding of how different groups of Chinese speakers pronounce the characters. Moreover, the censorship software may be confused by different types of data. For instance, some Chinese Netizens use 1314 to represent ‘my whole life’ (一生一世); imagine a Chinese Netizen writing (1生1世) instead. How can the censorship software tell whether a number is intended to stand for a pronunciation or for a Chinese character?
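The (1生1世) example in suggestion 7 can be tried out with a few lines of Python. This is only a sketch of the idea: it shows that a plain substring search misses the mixed spelling and that the phrase comes back once someone decides the digits stand for Chinese numerals. The names are invented for the illustration.

```python
# Map Arabic digits to the corresponding Chinese numerals.
DIGIT_TO_HANZI = str.maketrans("0123456789", "零一二三四五六七八九")


def digits_to_hanzi(text):
    """Read every digit in the text as a Chinese numeral."""
    return text.translate(DIGIT_TO_HANZI)


def naive_match(text, keyword):
    """The kind of mechanical keyword check the post has in mind."""
    return keyword in text


mixed = "1生1世"
print(naive_match(mixed, "一生一世"))                    # False
print(naive_match(digits_to_hanzi(mixed), "一生一世"))   # True
```

Whether a digit is meant as a numeral, as a sound (as in 1314), or as plain data is exactly the decision that has to be made before such a conversion can be applied.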

If every webpage that Communist China wants to censor adopts all or some of the above techniques, the cost and time of censorship would go up more than 1000 times. Let’s see if it wants to slow down the Internet 1000 times during the Beijing Olympics!
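The multipliers quoted in suggestions 1 and 2 (5 times and 125 times) are what one gets by counting how many ways five candidate character sets can be assigned to the parts of a page. A small Python check, in which the particular five codec names are an assumption chosen for illustration (the post only says 'from Big 5 to UTF'):

```python
from itertools import product

# Five candidate character sets; this particular selection is an assumption.
CHARSETS = ["big5", "big5hkscs", "gb2312", "gbk", "utf-8"]


def assignments(parts, charsets=CHARSETS):
    """Every way of assigning one character set to each part of a page."""
    return list(product(charsets, repeat=parts))


print(len(assignments(1)))   # 5   (suggestion 1: one encoding for the whole page)
print(len(assignments(3)))   # 125 (suggestion 2: a page split into 3 parts)
```

The count grows as the number of character sets raised to the power of the number of parts, which is where figures such as 6^300 in the Chinese post come from.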

Suggestions to overcome China's Great Firewall(2)

Overall, my ideas are based on the following understanding of the censorship process:

1. The censorship body does NOT enjoy doing the censorship manually, word by word (especially when the text is not in the right type of Chinese characters), so it relies on automatic mechanisms;
2. Those mechanisms are prone to error, since computer software doesn’t understand what it is doing.
3. Communist China wouldn’t like to slow down the Internet, because it is necessary for commercial transactions.
4. Communist China doesn’t like to increase the cost of censorship at its own expense; no censor likes that.
5. There is an economic price to pay when the censorship software blocks much more than its intended target.
6. The weaknesses of all automatic mechanisms in general:
A. They fail to recognize the correct encoding of the webpage, and have no low-cost way to ascertain it;
B. They recognize the target word only by a combination of Chinese characters, not through an understanding of the sentence, so many attacks can target this;
C. They fail to recognize the intent of the different types of data in the webpage; they only check mechanically against a database of taboo words;
D. Censorship software can’t read pictures.
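Weakness A can be illustrated with a short Python sketch. It is a toy model, not a description of any real censorship system: the function name is invented, and the checker is simply assumed to decode the page bytes with one fixed character set and search for the keyword.

```python
def keyword_found(page_text, keyword, real_charset, assumed_charset):
    """Encode the page in its real character set, decode it the way a checker
    that assumes a different character set would, and search for the keyword."""
    raw = page_text.encode(real_charset)
    seen = raw.decode(assumed_charset, errors="replace")
    return keyword in seen


# The checker assumes the right character set: the keyword is visible.
print(keyword_found("民主", "民主", "utf-8", "utf-8"))   # True
# The page is UTF-8 but is read as Big5: the checker sees gibberish.
print(keyword_found("民主", "民主", "utf-8", "big5"))    # False
```

A human reader fixes this by switching the browser's encoding setting, which is the asymmetry the list above relies on.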

A more advanced method is a website with software that reliably detects the censorship mechanism. Once it recognizes that it is being blocked, it can automatically alter the disguise of its content according to my suggestions. It then tries again to see whether the method succeeded, and keeps altering its disguise until it is no longer blocked by China’s Great Firewall.
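The retry loop described above might be organised roughly as in the Python sketch below. Everything specific in it is hypothetical: `is_blocked` stands for whatever detection the website has (here a stand-in that just looks for one keyword), and the list of disguises holds two toy examples rather than the full set of suggestions.

```python
def first_unblocked(text, disguises, is_blocked):
    """Try each disguise in turn and return (name, disguised text) for the first
    one that is not blocked, or None if every disguise is blocked."""
    for name, disguise in disguises:
        candidate = disguise(text)
        if not is_blocked(candidate):
            return name, candidate
    return None


def toy_is_blocked(page):
    """Hypothetical stand-in for a real block detector: a plain keyword check."""
    return "民主" in page


# Two toy disguises: leave the text alone, or put a space between characters.
DISGUISES = [
    ("unchanged", lambda s: s),
    ("spaced", lambda s: " ".join(s)),
]

print(first_unblocked("民主", DISGUISES, toy_is_blocked))
```

Returning the name of the disguise matters because readers need the same information as the 'clue' for undoing it.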