[ 2010 September,10, Friday ]
中国中文信息学会
Chinese Information Processing Society of China
Browse by year and issue (latest data: 1999, No. 2)
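A Technical Review of Automatic Spoken Language Translation Systems
[Authors] 宗成庆; 黄泰翼; 徐波;

[Abstract (from the Chinese)] In recent years, with the development of information technology, automatic spoken language translation has become a new research focus. Well-known universities, research institutes, and even companies around the world have joined the competition in this high-technology field, and researchers in China have also done fruitful work on related techniques. This paper surveys and analyzes the current state of the art in automatic spoken language translation and discusses several specific problems in depth. The authors hope that the analysis and the problems discussed here will provide a useful reference for research on automatic spoken language translation in China.

[Abstract] With the development of information technology, automatic spoken language translation has become a new research focus. Recently, many well-known universities and institutes around the world have been competing in this new technical field, and researchers in China have made great progress in related areas. In this paper, the techniques of automatic spoken language translation are summarized and analyzed, and some concrete problems are discussed in detail. The authors hope that the pap...
[Keywords] spoken language translation; speech translation; dialogue processing; machine translation; robustness;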



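A Continuous Invertible Mapping from GB2312-80 Chinese Characters to Integers
[Authors] 游荣彦;

[Abstract (from the Chinese)] This paper discusses why a mapping from GB2312-80 Chinese characters to integer data matters in Chinese information processing, proposes a continuous invertible mapping, and demonstrates the desirable properties of that mapping.

[Abstract] This paper discusses the significance of a mapping from GB2312-80 Chinese characters to integers, puts forward a continuous and invertible mapping, and demonstrates the merits of that mapping.
[Keywords] ASCII code; zone-position code (区位码); data compression; mapping; modulo;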
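
The abstract above does not give the mapping itself, so the following is only a minimal sketch of one mapping with the stated properties, not the author's construction. It assumes the usual GB2312-80 layout (hanzi in rows 16 to 87, 94 positions per row) and uses the zone-position code with integer division and modulo for the inverse. The function names `hanzi_to_int` and `int_to_hanzi` are hypothetical.

```python
# Sketch only: one possible continuous, invertible mapping for GB2312-80 hanzi.
# Assumption: hanzi occupy rows (区) 16..87 with 94 positions (位) per row.
FIRST_ROW = 16
LAST_ROW = 87
ROW_LEN = 94


def hanzi_to_int(ch: str) -> int:
    """Map one GB2312-80 hanzi to an integer in 0 .. 72*94 - 1."""
    encoded = ch.encode("gb2312")
    if len(encoded) != 2:
        raise ValueError("not a two-byte GB2312 character")
    # Subtracting 0xA0 from each byte yields the zone-position code (1..94).
    row, col = encoded[0] - 0xA0, encoded[1] - 0xA0
    if not (FIRST_ROW <= row <= LAST_ROW and 1 <= col <= ROW_LEN):
        raise ValueError("character is outside the hanzi rows")
    return (row - FIRST_ROW) * ROW_LEN + (col - 1)


def int_to_hanzi(n: int) -> str:
    """Inverse mapping: recover the hanzi with integer division and modulo."""
    if not 0 <= n < (LAST_ROW - FIRST_ROW + 1) * ROW_LEN:
        raise ValueError("index out of range")
    row, col = divmod(n, ROW_LEN)
    raw = bytes([row + FIRST_ROW + 0xA0, col + 1 + 0xA0])
    # Raises UnicodeDecodeError for the few unassigned slots (end of row 55).
    return raw.decode("gb2312")


if __name__ == "__main__":
    for ch in "中文信息":
        n = hanzi_to_int(ch)
        print(ch, n, int_to_hanzi(n))
```

The range is contiguous over code slots rather than over assigned characters: the last five positions of row 55 are unassigned in GB2312-80, so those five integers have no character. A table-free mapping like this is one way such integers could index arrays directly, which is the kind of use the keywords (data compression, modulo) suggest.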



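An Improved Chinese Character Segmentation Algorithm Based on Unit Amalgamation
[Authors] 周嫔; 马少平; 姜哲;

[Abstract (from the Chinese)] This paper describes an improvement to the Chinese character segmentation algorithm based on unit amalgamation. The improved algorithm modifies advanced amalgamation, the core part of the original algorithm: it searches all amalgamable units for the best amalgamation combination, avoiding certain amalgamation errors that the original algorithm can make during advanced amalgamation. Tests on many real samples show that the improvement eliminates errors the original algorithm produces in some cases without degrading any aspect of its performance, further raising segmentation accuracy.

[Abstract] This paper introduces a modification of the Chinese character segmentation method based on unit amalgamation. The modified method alters the advanced amalgamation step, which is the core of the original method. Because the modified method searches all units that can be amalgamated for the best amalgamation combination, it avoids some amalgamation errors that advanced amalgamation can cause in the original method. In many tests on actual samples, the modification does not decline th...
[Keywords] unit amalgamation; segmentation algorithm; advanced amalgamation; best amalgamation combination;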



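Searching for Chinese Personal Information on the Internet
[Authors] 沈达阳; 孙茂松;

[Abstract (from the Chinese)] This paper builds PersonIndexer, an experimental system for automatically generating an index of personal information on the Internet. Preliminary experiments on two CERNET sites show that PersonIndexer's average recall and precision are 97.8% and 61.9% for Chinese personal names, 100% and 64.5% for personal names in Pinyin, and 94.5% and 92.1% for Chinese organization names; recall and precision for e-mail addresses and for telephone and fax numbers are both 100%. Since Internet information retrieval and natural language processing each place demands on the other, we believe that combining Chinese analysis techniques for large-scale real text with the Internet will be a new research focus in Chinese information processing over the next few years.

[Abstract] PersonIndexer, a prototype system for automatically generating an index of Chinese personal information on the Internet, is proposed in this paper. Preliminary experimental results on all HTML texts under two CERNET web sites indicate that the average recall and precision for extracting Chinese names, Chinese names in Pinyin form, and Chinese organization names are 97.8% and 61.9%, 100% and 64.5%, and 94.5% and 92.1% respectively, and the recall and precision for extracting email addresses, telephone and fax numbers ...
[Keywords] automatic index generator; Chinese name recognition; personal information search; Internet;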
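
For readers unfamiliar with the two figures reported for PersonIndexer, recall is the share of true items that were extracted and precision is the share of extracted items that are correct. The sketch below shows the arithmetic under that standard definition; the function name `recall_precision` and the sample item names are hypothetical and are not taken from the paper.

```python
# Sketch only: standard recall/precision arithmetic for an extraction task.
def recall_precision(extracted, gold):
    """Return (recall, precision) for extracted items against a gold set."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    recall = correct / len(gold) if gold else 0.0
    precision = correct / len(extracted) if extracted else 0.0
    return recall, precision


if __name__ == "__main__":
    # Hypothetical data: 3 true names, 4 extracted, 2 of them correct.
    gold = {"name_a", "name_b", "name_c"}
    extracted = {"name_a", "name_b", "noise_1", "noise_2"}
    r, p = recall_precision(extracted, gold)
    print(f"recall={r:.3f} precision={p:.3f}")
```

A result like the reported 97.8% recall with 61.9% precision for Chinese names corresponds to a system that finds almost every true name but also extracts a sizable share of strings that are not names.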



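An Automatic Chinese Abstracting System Using Corpus Techniques
[Authors] 姜贤塔; 陈根才;

[Abstract (from the Chinese)] This paper focuses on using the Rear Character Tree (后邻字符树) method to generate a character-tree database from a domain corpus, which improves precision when selecting candidate sentences for automatic abstracting. It describes the construction of Rear Character Trees, the generation and optimization of the tree database, and the method for computing sentence weights.

[Abstract] A method of generating Rear Character Trees (RCT) from a domain corpus is presented in this paper. The construction of RCTs, the generation and optimization of the RCT database, and the computation of sentence weights are discussed in detail.
[Keywords] automatic abstracting; character tree; character frequency statistics; corpus;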



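Automatic Generation and Deformation of Curve-Outline Chinese Characters
[Authors] 马小虎; 刘玉龙; 潘志庚; 石教英;

[Abstract (from the Chinese)] This paper presents a new method for deforming Chinese characters based on curve-outline characters and a Fourier series description. Using the stroke-structure features of the glyph provided by the curve outline, and applying multi-level mathematical models in software, the method automatically deforms glyphs to generate a series of varied new glyphs. It is a dynamic Chinese font technique.

[Abstract] A new method is presented for automatically transforming Chinese characters based on outline fonts and a Fourier series description. Making full use of the structural features of the strokes of Chinese characters provided by the outline font, and applying several mathematical models, the method can automatically create various new fonts. This is a new kind of dynamic font technique.
[Keywords] outline font; Fourier series; Chinese character deformation; dynamic font;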
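
The abstract above names a Fourier series description of glyph outlines but gives no formulas, so the following is only an illustrative sketch of the general idea, not the authors' models. It assumes a closed contour sampled as complex points x + iy, takes a discrete Fourier transform, rescales each harmonic, and transforms back. The names `dft`, `inverse_dft`, `deform`, and the `gain` parameter are hypothetical.

```python
# Sketch only: deforming a closed outline by rescaling its Fourier harmonics.
import cmath
import math


def dft(points):
    """Fourier coefficients of a closed contour given as complex samples."""
    n = len(points)
    return [
        sum(p * cmath.exp(-2j * math.pi * k * t / n) for t, p in enumerate(points)) / n
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Rebuild the contour samples from the Fourier coefficients."""
    n = len(coeffs)
    return [
        sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(coeffs))
        for t in range(n)
    ]


def deform(points, gain):
    """Scale harmonic k by gain(k), where k is the signed frequency index."""
    coeffs = dft(points)
    n = len(coeffs)
    scaled = [
        c * gain(k if k <= n // 2 else k - n)  # upper half holds negative frequencies
        for k, c in enumerate(coeffs)
    ]
    return inverse_dft(scaled)


if __name__ == "__main__":
    # Hypothetical outline: a unit square sampled at its four corners.
    square = [0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j]
    # Damp higher harmonics to soften the shape; harmonic 0 keeps the centroid fixed.
    softened = deform(square, lambda k: 1.0 / (1 + abs(k)))
    print(softened)
```

Different `gain` functions give different families of shapes from one outline, which is one plausible reading of how a single stored glyph could yield many variants in a dynamic font.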



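A Segmentation Method for Touching Character Images Based on Skeleton Morphological Analysis
[Authors] 卢达; 谢铭培; 钱忆平; 浦炜; 常熟;

[Abstract (from the Chinese)] This paper presents a method for segmenting touching character images based on morphological analysis of skeletons. It locates the segmentation positions fairly accurately and easily. After a pruning algorithm processes each segmentation point, the separated character skeletons can be passed directly to a classifier without further processing.

[Abstract] A topographic segmentation method based on skeletonization is proposed for merged character images. The breaking positions can be located correctly and easily. A splitting algorithm is then applied at each breaking position. The separated skeletons can be passed directly to a classifier without any further processing.
[Keywords] character recognition; segmentation; touching characters; skeleton; separation algorithm;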



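Disambiguating Chinese Syntactic Structures Using Semantic Knowledge
[Authors] 苑春法; 黄锦辉; 李文捷;

[Abstract (from the Chinese)] At the part-of-speech level, Chinese contains many ambiguous structures, which poses a formidable obstacle to automatic syntactic parsing of Chinese. By identifying the syntactic relations that may hold between Chinese semantic classes, the authors build a Chinese semantic association network, which opens a way to resolve syntactic ambiguity with Chinese semantic knowledge. The paper studies specific solutions for specific ambiguous structures in Chinese, thereby reducing computational complexity.

[Abstract] There are many syntactic ambiguities in Chinese, and it is not easy to parse Chinese sentences automatically with Part of Speech (POS) tagging alone. To overcome this barrier, we looked for the possible syntactic relations between semantic classes and then built a Chinese Semantic Associate Net (CSAN). Using the CSAN, we find a way to resolve syntactic ambiguities. We do not try to resolve all ambiguities; we only try to resolve a certain kind of am...
[Keywords] semantic association network; dependency grammar; syntax tree;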



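Automatic Keyword Extraction and Fuzzy Classification of Chinese Text
[Authors] 何新贵; 彭甫阳;

[Abstract (from the Chinese)] This paper proposes two fuzzy methods for classifying Chinese text: one based on the semantic distance between fuzzy sets [2], the other based on the 'fuzzy classification network' proposed in this paper. Both require first extracting a set of keywords from the text. The paper gives a method for extracting fuzzy keyword sets that relies mainly on statistics combined with restricted natural language understanding techniques; combined with the fuzzy classification methods, it is expected to achieve automatic classification of text information. The proposed methods are also suitable for problems such as pattern recognition.

[Abstract] Two methods for text classification based on fuzzy techniques are presented in this paper. A concept of 'fuzzy classification network' is used in the text classification. To meet the needs of these two classification methods, an approach based on statistics and natural language understanding for automatically extracting keywords from the text is presented. It is pointed out that the methods are also suitable for pattern recognition and text retrieval.
[Keywords] text; classification; fuzzy methods; fuzzy classification network; semantic distance;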



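An Extended Internal Code System for Tibetan
[Authors] 于洪志;

[Abstract (from the Chinese)] To address the problems that arise because the basic set and the auxiliary set of the Tibetan coded character set are built on different planes with different coding systems, this paper proposes an extended internal code system for Tibetan and gives rules and methods for composing, generating, and decomposing Tibetan characters. Internal Tibetan characters (内字) are composed through an internal code conversion table, enabling information interchange between the basic set and the auxiliary set. Standard, well-formed external Tibetan characters (外字) are generated from a component set, meeting the need for an open Tibetan coded character set. Upward, the system is compatible with UCS at the repertoire level; downward, it is compatible with the de facto internal code standard of GB2312. It is a full Tibetan coding system. The author recommends establishing the extended internal code system in the alphabetic script zone of the UCS basic plane.

[Abstract] To address the existing problems caused by the basic sets and auxiliary sets of the Tibetan coded character sets being built on different planes and different systems, we present an extended Tibetan coded character system, the Full Tibetan Coded Character System. In this paper, we also give the rules and methods for composing and decomposing Tibetan characters, and implement information interchange between the basic sets and auxiliary sets by means of an internal code conversion table. This new system can produce the normal and standard...
[Keywords] extended internal code system for Tibetan; composite sequences of internal and external characters; combining characters; composing internal characters; generating external characters;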



© Chinese Information Processing Society of China 1981-2007
京ICP备05039057号