mozdev.org

同文堂 (Tongwen)

Site resources:

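Choosing characters by word (phrase-based conversion) requires a Traditional/Simplified vocabulary table that serves as the index for character selection. Because the entries in this table depend on meaning, it has to be compiled by hand; machines are just too dumb for this!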

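Without this table, Traditional Chinese users viewing Simplified pages will notice some "wrong characters". For example, 皇后 (empress) and 后来 (later) both come out with 後, as 皇後 and 後來, although the correct Traditional form of the first is 皇后. This happens because some Simplified characters each correspond to several Traditional characters, so the correct conversion can only be determined from context. Big5, the most common encoding for Traditional Chinese, has the same problem in the other direction, with one Traditional character corresponding to several Simplified characters. Take 裤: Big5 has only one 裤, so Simplified users can only see 纨裤子弟 and never 纨绔子弟. Big5 also has no 着, only 著, so the same character is used in both 土著 and 着急. Still, the problem is far more serious when converting Simplified to Traditional than the other way round. Without this table, Traditional Chinese users simply have to put up with the wrong characters.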
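The word-based selection described above can be sketched roughly as follows. This is only an illustration, not the extension's actual code: the two tiny tables and the function name `convert` are assumptions made up for the example, and longest-match-first lookup is one common way to apply a phrase table.

```python
# Hypothetical miniature tables; a real table would hold thousands of entries.
# CHAR_MAP is the naive one-to-one fallback, PHRASE_MAP the hand-checked words.
CHAR_MAP = {"后": "後", "来": "來", "发": "發", "头": "頭", "财": "財"}
PHRASE_MAP = {"皇后": "皇后", "头发": "頭髮"}


def convert(text, phrase_map=PHRASE_MAP, char_map=CHAR_MAP):
    """Convert Simplified text to Traditional, preferring whole-word matches."""
    max_len = max((len(key) for key in phrase_map), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible phrase starting at position i first.
        for n in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in phrase_map:
                out.append(phrase_map[chunk])
                i += n
                break
        else:
            # No phrase matched: fall back to the per-character mapping.
            out.append(char_map.get(text[i], text[i]))
            i += 1
    return "".join(out)
```

With these sample tables, `convert("皇后后来发财")` keeps 皇后 intact and converts the rest character by character, giving 皇后後來發財.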

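Here we have generated a Traditional/Simplified vocabulary mapping table based on the word lists of two input methods, xcin and scim. The table contains only words whose Traditional and Simplified forms do not map one-to-one. Once the unsuitable entries have been removed, we will have a usable vocabulary table.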
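How the candidate table was actually produced is not described on this page. One plausible way to list only the words that do not map one-to-one, assuming a one-to-many character map (the `S2T_CHARS` table and the `candidates` function below are hypothetical), is to expand every ambiguous character into all of its Traditional options:

```python
from itertools import product

# Hypothetical one-to-many character map (Simplified -> Traditional options).
S2T_CHARS = {
    "后": ["后", "後"],
    "发": ["發", "髮"],
    "头": ["頭"],
    "皇": ["皇"],
}


def candidates(word, table=S2T_CHARS):
    """Return every Traditional candidate for a Simplified word.

    Words in which each character has exactly one Traditional form yield an
    empty list, because they need no human review.
    """
    options = [table.get(ch, [ch]) for ch in word]
    if all(len(opts) == 1 for opts in options):
        return []
    return ["".join(combo) for combo in product(*options)]
```

For instance, `candidates("皇后")` yields both 皇后 and 皇後, which a human reviewer then has to choose between, while a word with no ambiguous character yields nothing.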

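Because the author uses Simplified characters, the author can filter a Traditional-to-Simplified vocabulary table, but can do nothing about the Simplified-to-Traditional one. Getting a pile of characters wrong would only invite criticism, which is not worth it. So for one-to-many conversions, what Simplified users currently see is mostly correct, while Traditional users may see wrong characters and will have to guess.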

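Because the table has to represent both Traditional and Simplified characters, it is encoded in UTF-8. When filtering, you only need to add a special symbol as a marker at the beginning or end of each correct entry, for example / or ] :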

    皇后 皇后]
    皇后 皇後
    头发 头發
    头发 头髮]
    发财 發财]
    发财 髮财

Once filtering is complete, a program can extract the correct entries based on the marker. That part is the author's job, so no problem there.
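The author's extraction program is not shown. A minimal sketch of such a step, assuming each line holds a Simplified word and one Traditional candidate separated by whitespace, with the marker at the very start or end of the line (the name `extract_marked` and the marker set are assumptions), might look like this:

```python
MARKERS = "/]"  # assumed marker characters, taken from the examples above


def extract_marked(lines, markers=MARKERS):
    """Collect the entries a reviewer has marked as correct.

    Returns a dict mapping each Simplified word to its chosen Traditional form.
    Unmarked or malformed lines are skipped.
    """
    table = {}
    for raw in lines:
        line = raw.strip()
        if len(line) < 2:
            continue
        if line[-1] in markers:
            line = line[:-1]      # marker at the end of the entry
        elif line[0] in markers:
            line = line[1:]       # marker at the beginning of the entry
        else:
            continue              # not marked: this candidate was rejected
        parts = line.split()
        if len(parts) != 2:
            continue
        simplified, traditional = parts
        table[simplified] = traditional
    return table
```

Applied to the six sample lines above, this keeps exactly the three marked pairs. To process a downloaded file, the lines could be read with `open(path, encoding="utf-8")`.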

On Linux, gedit handles UTF-8 files well. On Windows, quite a few editors (UltraEdit?) can reportedly edit UTF-8 files. Mozilla's own editor, Composer, or Nvu can of course edit UTF-8 files too.

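Download the Traditional/Simplified vocabulary table that needs filtering: Simp2Trad.zip
(For convenience, the main file has been split into small files numbered 0-24, which can be edited separately.)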

Traditional Chinese users who are interested in helping are invited to do the filtering for us. Thank you.

Note: xcin and scim are free software for Unix.
xcin: http://xcin.linux.org.tw
scim: http://www.freedesktop.org/~suzhe/index_cn.html

User Notes:

If you do not get a response to a question posted in this forum, please try sending a message to the project's mailing list or to the project owner directly.

[1] Submitted by: on Tuesday June 1st 2004

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
[2] Submitted by: vertex on Monday June 14th 2004

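I have downloaded Simp2Trad.zip and
How can I send the marked files to you?
I have already marked s2t21 through s2t24.

I hope I am not duplicating work that other users have done....

Thank you for your effort and contribution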

ruan@seed.net.tw

Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.6) Gecko/20040211 Firefox/0.8
[3] Submitted by: vertex on Sunday June 20th 2004

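How can we get in touch with the author? I want to help but do not know where to start.
You ask users to help with the filtering, yet provide no contact channel.
And nobody responds on the message board...
I really hope this can be improved. Thanks!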

Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.6) Gecko/20040211 Firefox/0.8
[4] Submitted by: 哈少 on Saturday June 26th 2004

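Sorry, lately I have been busy and also not in the mood, so I have not touched this project.

You can send your results to me or to the mailing list.

It just occurred to me that by doing a reverse lookup against xcin's Traditional word list, a program could generate a Simplified-to-Traditional vocabulary table. However, xcin's vocabulary comes mainly from Taiwanese BBSes, whose usage differs from that of mainland China and Singapore, while the Simplified-to-Traditional table is meant for converting Simplified web pages, so filtering is still necessary.

Thanks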

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040604 Firefox/0.8.0+ (MozJF)
[5] Submitted by: YC on Tuesday 20th July 2004 at 16:02 -0400

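Actually, a big5gbk mapping table has existed for a long time. It is attached in the document below, and there are no copyright issues. So I will leave this in 哈少's hands. If possible, please provide a version for both Linux and Windows.

Instructions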

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
[6] Submitted by: YC on Tuesday 20th July 2004 at 16:04 -0400

Oops, I messed up.
(bgconv1035.exe)
search.cpatch.org/download/patchutil/bgconv/bgconv1035.exe
www2.tw.freebsd.org/cpatch/patchutil/bgconv/bgconv1033.txt

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040707 Firefox/0.8
[7] Submitted by: moxa on Wednesday 21st July 2004 at 05:12 -0400

一箇勁"儿"、一個勁"儿"

Both of these are wrong; it should be 一個勁"兒"
What should be done about this?

一個蘿"蔔"一個坑, not 一個蘿"卜"一個坑
This one has the same problem

Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.7) Gecko/20040626 Firefox/0.9.1
[8] Submitted by: on Tuesday 7th September 2004 at 02:35 -0400

I am

Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)
[9] Submitted by: dd on Saturday 23rd October 2004 at 01:49 -0400

呼吁 should probably be 呼籲.

Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1
[10] Submitted by: dd on Saturday 23rd October 2004 at 08:48 -0400

由于->由於

Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1
[11] Submitted by: ul on Monday 25th October 2004 at 12:48 -0400

制造->製造

Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1
[12] Submitted by: jk on Thursday 28th October 2004 at 22:33 -0400

游记->遊記
游览->遊覽
珠帘->珠簾
朴素->樸素
宁海->寧海

Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.3) Gecko/20040910
[13] Submitted by: 喜欢 on Saturday 6th November 2004 at 23:06 -0500

Like

Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
[14] Submitted by: 喜欢 on Saturday 6th November 2004 at 23:08 -0500

张张张

Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
[15] Submitted by: 喜欢 on Saturday 6th November 2004 at 23:10 -0500

喜欢->

Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
[16] Submitted by: hysd2zxx on Saturday 13th November 2004 at 01:44 -0500

I still do not know how

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
[17] Submitted by: 110 on Wednesday 17th November 2004 at 05:58 -0500

Promise

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; TencentTraveler )
[18] Submitted by: 110 on Wednesday 17th November 2004 at 06:01 -0500

Promise

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; TencentTraveler )
[19] Submitted by: 米诺 on Monday 29th November 2004 at 08:55 -0500

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
[20] Submitted by: 泡沫 on Wednesday 22nd December 2004 at 00:26 -0500

Foam

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
[21] Submitted by: MM on Friday 31st December 2004 at 06:12 -0500

I want to pray for you. I wish you happiness!

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
[22] Submitted by: Chris on Monday 10th January 2005 at 05:30 -0500

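Which files have already been annotated by other users? Or is everything finished and just awaiting a new release? Could 哈少 please post an announcement?
Last night I tried annotating the seventh file. In half an hour I got through only half of it, my eyes were going blurry, and there were still several words I could not be sure about... It is truly hard work; I can only offer my highest respect to the author and to the users who have already helped!
I also downloaded bgconv1035.exe and looked at it, but I really could not find its vocabulary translation table (I only found the comp part). Perhaps the two authors could try contacting each other and work directly from the vocabulary table that is already complete?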

Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.7.5) Gecko/20041119 Firefox/1.0
[23] Submitted by: Chris on Monday 10th January 2005 at 07:10 -0500

Also, how do I send the corrected result files to you, 哈少?
I must be blind; I cannot find your email...

Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.7.5) Gecko/20041119 Firefox/1.0
[24] Submitted by: news on Monday 10th January 2005 at 13:56 -0500

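moztw.org has already implemented a Simplified-to-Traditional "vocabulary conversion table"

forum.moztw.org/viewforum.php?f=11

Chris, you can head over to moztw.org, where "New Tongwen Tang" (新同文堂) is still alive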

Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.7.5) Gecko/20041119 Firefox/1.0
[25] Submitted by: 地方工业 on Wednesday 12th January 2005 at 14:02 -0500

冼剑辉

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Maxthon)
[26] Submitted by: Chris on Wednesday 12th January 2005 at 15:49 -0500

oops......
Many thanks!
I also hope 哈少 makes a comeback soon.
I will head over to

Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.7.5) Gecko/20041119 Firefox/1.0
[27] Submitted by: 傻了吧你 on Thursday 20th January 2005 at 20:05 -0500

Quit the Internet; love kindergarten

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
[28] Submitted by: wanpeebaw on Friday 18th February 2005 at 22:20 -0500

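Having individual contributors look up vocabulary entries one by one is inefficient, and since new words keep appearing it is probably not a long-term solution. I have an idea, though I do not know whether it is feasible: when users notice a wrong Simplified/Traditional conversion while browsing, let them conveniently submit it to a database for collation (note 1), possibly together with the surrounding text so that it can be analysed later.

Note 1: perhaps via text selection plus a right-click menu, or some other way; initially a simple web form could be provided for input.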

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041111 Firefox/1.0
[29] Submitted by: on Tuesday 8th March 2005 at 21:25 -0500

Meeting you was the most beautiful accident

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
[30] Submitted by: on Tuesday 8th March 2005 at 21:26 -0500

How on earth do I convert? I do not know how to use this

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
[31] Submitted by: on Sunday 13th March 2005 at 03:18 -0500

Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; (R1 1.5))
[32] Submitted by: on Sunday 13th March 2005 at 03:19 -0500

士大夫似的十分似的

Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; (R1 1.5))
[33] Submitted by: on Sunday 13th March 2005 at 03:39 -0500

Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; (R1 1.5))
[34] Submitted by: 无忌 on Saturday 16th April 2005 at 06:54 -0400

无忌

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
[35] Submitted by: 贺强 on Tuesday 19th April 2005 at 14:16 -0400

Warm and friendly

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
[36] Submitted by: 天目 on Wednesday 4th May 2005 at 13:33 -0400

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
[37] Submitted by: 我啊! on Tuesday 31st May 2005 at 12:48 -0400

情恋珈琐

Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
[37] Submitted by: 我啊! on Tuesday 31st May 2005 at 12:49 -0400

情恋珈琐

Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
[38] Submitted by: 赖波 on Sunday 21st August 2005 at 01:57 -0400

Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
[39] Submitted by: dd on Tuesday 4th October 2005 at 00:49 -0400

Press on in spite of the difficulties

Let us encourage one another

Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; TencentTraveler )
[40] Submitted by: 遗忘 on Friday 10th February 2006 at 17:10 -0500

Forgotten

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; TencentTraveler )
[41] Submitted by: 了解 on Thursday 2nd March 2006 at 20:16 -0500

What is this? Garbage

Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
[42] Submitted by: 兄弟 on Saturday 3rd February 2007 at 20:01 -0800

Brothers

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
[43] Submitted by: richie on Friday 8th June 2007 at 00:09 -0700

刘海

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
[44] Submitted by: 林林 on Wednesday 13th June 2007 at 00:06 -0700

Love for a lifetime

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
[45] Submitted by: 美女 on Saturday 27th October 2007 at 01:29 -0700

Beautiful woman

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
[46] Submitted by: 妍妍 on Tuesday 30th October 2007 at 11:08 -0700

I will always be myself

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
[47] Submitted by: sdad on Thursday 28th February 2008 at 07:25 -0800

dad

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; cafe8)


For questions or comments about tongwen, please send a message to the tongwen mailing list.
For questions or comments not about a specific project, please read our feedback page.
This page was last updated on Jul 24, 2008.
Copyright © 2008. All rights reserved.
