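Automatic Bracketing and Tagging of Chinese Phrases
[Authors] 周强 (Zhou Qiang);
[Summary] Traditional rule-based Chinese parsers run into difficulties when analyzing large-scale real text. This paper therefore explores statistical methods for automatic Chinese syntactic parsing and proposes a statistics-based algorithm for automatically bracketing and tagging Chinese phrases. The algorithm has three processing stages: predicting bracketing points, matching brackets, and generating the parse tree. Throughout these stages it uses statistics gathered from a manually annotated treebank to perform automatic syntactic disambiguation, and it finally obtains the best parse tree, from which the phrases of a sentence can be bracketed and tagged top-down. A closed test on more than one thousand sentences gave a phrase bracketing accuracy of about 86% and a phrase tagging accuracy of about 92%, a fairly satisfactory result.
[Abstract] In this paper, we describe work toward the construction of a probabilistic parsing system for Chinese phrases. The system is intended to bracket and tag Chinese phrases automatically in large-scale real-text corpora. The algorithm has three processing stages: predicting the bracketing points, matching the brackets, and generating the syntactic tree, using statistical information obtained from a supervised training treebank. Through syntactic disambiguation, the parser obtains the best syntactic tree. Using ...
[Keywords] automatic phrase bracketing and tagging; corpus processing;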
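The bracket-matching and tree-generation stages described above can be sketched in Python. This is a minimal illustration, not the paper's algorithm: it assumes the first stage has already produced, for each word, the number of opening brackets before it (`opens`) and of closing brackets after it (`closes`), and it replaces the treebank-based labelling with a hypothetical `label_of` function backed by a toy dictionary. All names (`Node`, `bracket_to_tree`, `LABELS`) are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Node:
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    """Return the words under a node, left to right."""
    out: List[str] = []
    for child in node.children:
        out.extend(leaves(child) if isinstance(child, Node) else [child])
    return out


def bracket_to_tree(tokens, opens, closes, label_of: Callable[[Node], str]) -> Node:
    """Match brackets and build the tree in one left-to-right pass.

    opens[i]:  number of '(' predicted before tokens[i] (assumed stage-1 output)
    closes[i]: number of ')' predicted after tokens[i]
    label_of:  hypothetical phrase-labelling function, called when a phrase closes
    """
    root = Node("ROOT")
    stack = [root]
    for tok, n_open, n_close in zip(tokens, opens, closes):
        for _ in range(n_open):           # open new phrases under the current one
            node = Node("?")
            stack[-1].children.append(node)
            stack.append(node)
        stack[-1].children.append(tok)    # attach the word itself
        for _ in range(n_close):          # each ')' closes the latest open phrase
            if len(stack) == 1:
                raise ValueError(f"unmatched ')' after {tok!r}")
            node = stack.pop()
            node.label = label_of(node)   # children are complete at this point
    if len(stack) != 1:
        raise ValueError("unmatched '('")
    return root


def to_string(node) -> str:
    """Render a tree in bracketed (label child ...) notation."""
    if isinstance(node, str):
        return node
    return "(" + node.label + " " + " ".join(to_string(c) for c in node.children) + ")"


# Toy stand-in for treebank statistics: the label is looked up by the exact word span.
LABELS = {("新", "书"): "NP", ("看", "新", "书"): "VP", ("他", "看", "新", "书"): "S"}


def label(n: Node) -> str:
    return LABELS.get(tuple(leaves(n)), "XP")


tree = bracket_to_tree(["他", "看", "新", "书"], [1, 1, 1, 0], [0, 0, 0, 3], label)
print(to_string(tree))  # (ROOT (S 他 (VP 看 (NP 新 书))))
```

A stack suffices because brackets must nest: each `)` closes the most recently opened phrase, and a leftover or missing `(` is reported as an error rather than repaired. Choosing among competing bracketings, which the paper does statistically, is outside the scope of this sketch.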
|
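A Method for Identifying Coordinate Constituents Based on Contextual Similarity
[Authors] 简幼良 (Jian Youliang); 高健 (Gao Jian); 王秀坤 (Wang Xiukun);
[Summary] Determining the dependency (kakari-uke) relations and the scope of coordinate constituents in long sentences is one of the difficult problems in Japanese processing. This paper introduces the method proposed by Makoto Nagao et al. for analyzing Japanese coordinate constituents on the basis of the contextual similarity around coordination keys. The method covers in detail the classification of Japanese coordination, the identification of coordination keys, the factors that determine similarity and their quantification, and the computation of the scope of coordinate structures, and it is given as an algorithm. We applied this algorithm, with some adjustments and additions, to "Sun Wukong", the Japanese-Chinese translation system developed at our institution, and obtained fairly satisfactory results.
[Abstract] Finding the dependency relations and scope of conjunctive structures in a long Japanese sentence is one of the difficult problems in Japanese processing. In this paper, a method based on contextual similarity proposed by M. Nagao is introduced. The paper discusses in detail the classification of conjunctive structures, conjunctive keys, the decisive factors of similarity, and the analysis of conjunctive scope, and gives the corresponding algorithm. This algorithm is applied to "Sun Wukong", the Japanese-Chinese translation system develope...
[Keywords] similarity; key; coordinate structure; coordinate constituent; translation system; degree of similarity; context;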
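The core idea of the similarity-based method, choosing the coordination scope whose left and right conjuncts are most alike, can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the quantification used by Nagao et al. or by the authors: the word similarity `sim` (identical word 2, same part of speech 1), the free-gap alignment, and the length normalisation are hypothetical choices, and dependency relations are not modelled.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (surface form, part of speech)


def sim(a: Token, b: Token) -> float:
    """Hypothetical word similarity: identical word 2, same POS 1, else 0."""
    if a[0] == b[0]:
        return 2.0
    if a[1] == b[1]:
        return 1.0
    return 0.0


def align_score(xs: List[Token], ys: List[Token]) -> float:
    """Best monotonic alignment of xs with ys (gaps cost nothing), by dynamic programming."""
    dp = [[0.0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            dp[i][j] = max(dp[i - 1][j],                               # skip xs[i-1]
                           dp[i][j - 1],                               # skip ys[j-1]
                           dp[i - 1][j - 1] + sim(xs[i - 1], ys[j - 1]))  # pair them
    return dp[-1][-1]


def best_conjuncts(tokens: List[Token], k: int):
    """Try every pre-conjunct ending just before the key tokens[k] and every
    post-conjunct starting just after it; return (score, pre_span, post_span)
    for the most similar pair, or None if one side is empty."""
    best = None
    for i in range(k):
        pre = tokens[i:k]
        for j in range(k + 1, len(tokens)):
            post = tokens[k + 1:j + 1]
            # Free gaps mean longer spans never score lower, so normalise by length.
            score = align_score(pre, post) / max(len(pre), len(post))
            if best is None or score > best[0]:
                best = (score, (i, k - 1), (k + 1, j))
    return best


sent = [("watashi", "N"), ("wa", "P"), ("akai", "A"), ("hon", "N"), ("to", "CONJ"),
        ("aoi", "A"), ("pen", "N"), ("o", "P"), ("katta", "V")]
print(best_conjuncts(sent, 4))  # (1.0, (2, 3), (5, 6))
```

In the romanised example "watashi wa akai hon to aoi pen o katta" ("I bought a red book and a blue pen"), the key is the particle "to" at index 4, and the sketch selects "akai hon" and "aoi pen" as the two conjuncts.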
|
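Design of CWP, a Practical Chinese Word Processing System
[Authors] 唐棠 (Tang Tang); 戎启俊 (Rong Qijun);
[Summary] This paper describes a Chinese word processing system for microcomputers and the design of its word segmentation algorithm. The system has good application value in Chinese information processing.
[Abstract] In this paper, we introduce a practical Chinese word processing system for personal computers and discuss some design issues of its word segmentation algorithm. The system is very valuable in the area of Chinese information processing.
[Keywords] lexicon; word segmentation algorithm; Chinese information processing;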
|
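Sentence-Level Pitch Patterns: Research and Implementation
[Authors] 吴岩 (Wu Yan); 刘挺 (Liu Ting); 李秀坤 (Li Xiukun); 王开铸 (Wang Kaizhu);
[Summary] This paper first discusses the regularities of human pronunciation, then presents a small system that improves speech synthesis quality at the sentence level and gives a formal description of its implementation, and finally shows examples that the system has already produced.
[Abstract] First, we discuss the patterns of human pronunciation; then we introduce a small system that improves the quality of speech synthesis at the sentence level and give a formal description of its implementation. Finally, we give an example produced by this system.
[Keywords] text-to-speech; speech synthesis; artificial intelligence;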
|
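Storage and Display of Skeletal Chinese Character Glyphs
[Authors] 赵恒 (Zhao Heng); 金通 (Jin Tong); 王国瑾 (Wang Guojin);
[Summary] Based on an analysis of skeleton techniques, this paper proposes a skeleton-based method for storing and displaying Chinese characters on computers, namely the skeletal Chinese character glyph storage and display technique. To control the stroke shapes of skeletal characters, the paper uses two control parameters, which give good control.
[Abstract] Using the skeleton technique, we give a flexible method to store and generate Chinese characters on computers. A character stored by this method is called a skeletal Chinese character. Its efficiency in interactive use makes it suitable as a basic data structure in Chinese character design systems. This paper introduces two parameters to control the stroke shape of skeletal Chinese characters, which are found to be satisfactory.
[Keywords] skeleton technique; skeletal Chinese characters; glyph storage and display;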
|
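Research on and Implementation of a Comprehensive Coding Scheme for Tibetan
[Authors] 彭寿全 (Peng Shouquan); 黄可 (Huang Ke); 张义刚 (Zhang Yigang);
[Summary] Tibetan processing systems are currently widely troubled by out-of-set characters (characters missing from the coded set). This paper implements, for the first time, an entirely new coding scheme that eliminates out-of-set characters completely: the comprehensive coding scheme, which consists of a normal coding scheme and an out-of-set character coding scheme. The normal coding scheme uses double-byte "11"-mode coding, while the out-of-set character coding scheme uses combined stacking coding. The paper studies combined stacking coding in depth; introduces the concepts of out-of-set character operators, out-of-set character descriptions, and sequential input with stacked output; designs an algorithm and program for automatic character composition; and removes a series of technical obstacles to out-of-set character processing. The comprehensive coding scheme has been implemented on WPS 5.0 under the Super-CCDOS Chinese character system, and the method can be extended to other Tibetan processing systems.
[Abstract] Many Tibetan information processing systems are currently troubled by the problem of new Tibetan characters. A new coding method for Tibetan characters, the comprehensive coding method (CCM), is put forward for the first time in this paper; it solves the problem of new Tibetan characters thoroughly. CCM consists of the Normal Coding Method (NCM) and the Tibetan New Character Coding Method (TNCCM). NCM is two-byte "11"-mode coding, and TNCCM is combined coding. In this paper, TNCCM is discussed thorough...
[Keywords] coding scheme; character elements; Tibetan; coded character set; dot-matrix font;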
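The idea of "sequential input, stacked output" for out-of-set characters can be sketched in Python. The sketch swaps in Unicode's Tibetan subjoined consonants (U+0F90 to U+0FB9, each 0x50 above its base form in U+0F40 to U+0F69) as the stacked representation; it does not reproduce the paper's double-byte scheme. The `NORMAL` table and its code values, and the `+` separator standing in for the out-of-set character operator, are hypothetical.

```python
# Tibetan base consonants U+0F40..U+0F69 have subjoined (below-base) forms
# exactly 0x50 code points higher, U+0F90..U+0FB9. U+0F48 is unassigned.
BASE_FIRST, BASE_LAST, UNASSIGNED = 0x0F40, 0x0F69, 0x0F48
SUBJOIN_OFFSET = 0x50

# Hypothetical "normal" table: clusters that already have a fixed two-byte code.
# The code values are invented for illustration only.
NORMAL = {
    "\u0F40": 0xB0A1,        # KA
    "\u0F66\u0F90": 0xB0A2,  # SA + subjoined KA (ska)
}


def stack(letters):
    """Sequential input -> stacked output: the first consonant is the head,
    every later consonant is replaced by its subjoined form."""
    if not letters:
        raise ValueError("empty sequence")
    for ch in letters:
        if (len(ch) != 1 or not BASE_FIRST <= ord(ch) <= BASE_LAST
                or ord(ch) == UNASSIGNED):
            raise ValueError(f"not a Tibetan base consonant: {ch!r}")
    return letters[0] + "".join(chr(ord(ch) + SUBJOIN_OFFSET) for ch in letters[1:])


def encode(sequence, op="+"):
    """Two-part scheme: a cluster listed in NORMAL gets its fixed code; any
    other cluster is kept as a composite of its components (out-of-set case)."""
    letters = sequence.split(op)
    cluster = stack(letters)
    if cluster in NORMAL:
        return ("normal", NORMAL[cluster])
    return ("composite", tuple(letters))


assert stack(["\u0F66", "\u0F40", "\u0F61"]) == "\u0F66\u0F90\u0FB1"  # SA+KA+YA -> skya
print(encode("\u0F66+\u0F40")[0])         # normal
print(encode("\u0F66+\u0F40+\u0F61")[0])  # composite
```

Composite entries need no code-table slot of their own, which is one plausible way a combined scheme can avoid running out of codes for rare stacks.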
|
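Determining a Tagset for Chinese Phrase Annotation
[Authors] 周强 (Zhou Qiang); 俞士汶 (Yu Shiwen);
[Summary] This paper proposes a basic tagset for annotating Chinese phrases and analyzes in depth the properties of different phrases in terms of syntactic function and constituent structure, with the aim of providing a unified standard for both the automatic processing and the manual proofreading of Chinese phrase bracketing and tagging.
[Abstract] In this paper, we propose a syntactic tagset for annotating Chinese phrases and discuss the syntactic functions and constituent structures of the different kinds of phrase tags. We hope that this work can be developed into a working standard for bracketing and annotating Chinese phrases.
[Keywords] Chinese phrases; syntactic function; phrase tags; coordinate structure; adverbial-head structure; modern Chinese; verb phrases; tagset; phrase annotation;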
|
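The Relation Stability Principle of Chinese Character Glyphs
[Authors] 王开铸 (Wang Kaizhu); 王英伟 (Wang Yingwei);
[Summary] This paper studies the description of Chinese character glyphs in depth and, on that basis, formulates the relation stability principle of Chinese character glyphs: in a glyph, the attributes of stroke primitives, such as direction, length, and position, are unstable, whereas the relations between stroke primitives are stable. The relations between primitives reflect the essence of a glyph and make up the main body of its glyph information. As an important principle reflecting the nature of Chinese character glyphs, the relation stability principle matters for the study of glyphs, and its most important application is to give directional guidance to research on Chinese character recognition.
[Abstract] This paper presents new research on the script description of Chinese characters. On the basis of this research, we sum up the Relations Stable Principle of Chinese Character Script as follows: in Chinese character scripts, the attributes (such as directions, lengths, and positions) of stroke primitives are not stable, but the relations between stroke primitives are stable. The relations between primitives, which are the main part of Chinese character script information, reflect the essential of Ch...
[Keywords] Chinese information processing; Chinese character glyph description; relation stability principle;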
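The relation stability principle can be illustrated with a small Python sketch: two renderings of the character 土 with different stroke lengths and positions yield the same relation-only description. The stroke model (straight segments), the two relations used (a coarse above/below/level comparison of midpoints and segment intersection), and the tolerance `tol` are hypothetical choices for illustration, not the authors' primitive set or relation vocabulary.

```python
import math


def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def on_segment(p, q, r):
    """For a point r collinear with p-q: does r lie within the segment's bounding box?"""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def intersects(s, t):
    """Standard segment-intersection test; touching at an endpoint counts."""
    (p1, p2), (p3, p4) = s, t
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    return ((d1 == 0 and on_segment(p3, p4, p1)) or (d2 == 0 and on_segment(p3, p4, p2))
            or (d3 == 0 and on_segment(p1, p2, p3)) or (d4 == 0 and on_segment(p1, p2, p4)))


def vertical(s, t, tol=5.0):
    """Hypothetical coarse relation: compare midpoint heights (y grows downward)."""
    ys, yt = (s[0][1] + s[1][1]) / 2, (t[0][1] + t[1][1]) / 2
    if ys < yt - tol:
        return "above"
    if ys > yt + tol:
        return "below"
    return "level"


def describe(glyph):
    """Relation-only description: one (vertical, intersects) pair per stroke pair."""
    return {(i, j): (vertical(glyph[i], glyph[j]), intersects(glyph[i], glyph[j]))
            for i in range(len(glyph)) for j in range(i + 1, len(glyph))}


def length(s):
    return math.hypot(s[1][0] - s[0][0], s[1][1] - s[0][1])


# Two renderings of 土: top horizontal, vertical, bottom horizontal.
tu_a = [((30, 40), (70, 40)), ((50, 20), (50, 85)), ((15, 85), (85, 85))]
tu_b = [((25, 45), (75, 45)), ((52, 15), (52, 90)), ((10, 90), (95, 90))]

print([length(s) for s in tu_a] == [length(s) for s in tu_b])  # False: attributes differ
print(describe(tu_a) == describe(tu_b))                          # True: relations agree
```

Because `describe` looks at coordinates only through these coarse comparisons, moderate changes in stroke length and position leave the description unchanged, while a structural change (for example, shortening the vertical so that it no longer reaches the bottom stroke) would alter it.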
|