中国中文信息学会
Chinese Information Processing Society of China
Journal of Chinese Information Processing (中文信息学报)
Browse by year and issue (latest data: 2001, No. 3)
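Example-Based Disambiguation in Chinese Syntactic Structure Analysis
[Authors] 杨晓峰; 李堂秋; 洪青阳;

[Summary] This paper describes an example-based disambiguation method for Chinese syntactic structure analysis. It first presents the overall idea of the method and briefly introduces its semantic knowledge resource, HowNet (《知网》). It then describes the main algorithm of the example-based disambiguation method in detail. Finally, sample experimental results show that the method is effective for disambiguation in Chinese structural analysis.

[Abstract] This paper describes an example-based method for Chinese syntactic structure disambiguation. First, we put forward the general idea of the method and give a brief introduction to its semantic knowledge resource, the HowNet dictionary. Then the main algorithm of the example-based disambiguation method is presented in detail. The experimental results given at the end show the method to be effective.
[Keywords] disambiguation; example-based; similarity; HowNet; dependency tree;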



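Design of a Corpus of Syntactic Positions of Chinese Nominal Phrases
[Authors] 王家钺;

[Summary] Chinese nominal phrases (NPs) are of significant value in Chinese information retrieval. Taking non-statistical information processing methods as its starting point, this paper introduces the design ideas behind a corpus of syntactic positions of Chinese nominal phrases, the tag set of syntactic positions it uses, and the annotation guidelines, and points out the potential value of such a corpus. A corpus of this kind is currently being built on this basis.

[Abstract] Chinese nominal phrases (NPs) are of vital importance for Chinese information retrieval. This paper introduces the idea of a corpus of Chinese NP syntactic positions, as well as the tag set used and the principles for tagging. The potential value of this corpus is also pointed out. A corpus of Chinese NP syntactic positions is being constructed.
[Keywords] syntactic position; corpus; nominal phrase; non-statistical information retrieval; Chinese information retrieval;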



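Character- and Word-Level Automatic Error Detection Based on Bi-Neighborship Checking
[Authors] 张仰森; 丁冰青;

[Summary] This paper explores the principles and algorithms of automatic error detection for Chinese text based on character-character co-occurrence, part-of-speech (POS) bi-neighborship, and semantic bi-neighborship. It presents models for judging character and word neighborship and discusses how to construct the error-detection knowledge bases associated with these models. Analysis and evaluation of the experimental results show that the method is feasible.

[Abstract] The principles and algorithms of automatic error detection for Chinese texts based on character-character co-occurrence, POS bi-neighborship, and semantic bi-neighborship are discussed in this article. Models for judging character and word neighborship are presented, and a method of constructing the error-detection knowledge bases related to these models is introduced. Analysis and evaluation of the experimental results show that the method given in this paper is workable.
[Keywords] automatic proofreading of Chinese text; automatic error detection; bi-neighborship relation;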
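
The character-level part of this idea can be illustrated with a minimal sketch. It assumes a small trusted corpus and flags any adjacent character pair that the corpus has not seen often enough. The function names, the `min_count` threshold, and the toy corpus are illustrative assumptions, not taken from the paper, and the POS and semantic neighborship checks are not modeled here.

```python
from collections import Counter

def train_bigrams(corpus):
    """Count adjacent character pairs in a list of trusted sentences."""
    counts = Counter()
    for sentence in corpus:
        for a, b in zip(sentence, sentence[1:]):
            counts[a + b] += 1
    return counts

def suspicious_positions(text, counts, min_count=1):
    """Return each index i where the pair text[i:i+2] was seen fewer than min_count times."""
    return [i for i in range(len(text) - 1) if counts[text[i:i + 2]] < min_count]

# Toy corpus (hypothetical); a real knowledge base would be far larger.
corpus = ["我们喜欢学习", "他们喜欢工作", "我们工作"]
counts = train_bigrams(corpus)

print(suspicious_positions("我们喜欢工作", counts))  # every pair is known
print(suspicious_positions("我们喜换工作", counts))  # the pairs around the wrong character are flagged
```

A wrong character typically breaks both the pair before it and the pair after it, so flagged positions tend to come in adjacent runs.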



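A Dictionary Mechanism for Chinese Automatic Word Segmentation Based on the PATRICIA Tree
[Authors] 杨文峰; 陈光英; 李星;

[Summary] The segmentation dictionary is a basic component of a Chinese information processing system, and its lookup and update efficiency directly affects the system's performance. Using the PATRICIA tree data structure, this paper designs a segmentation dictionary mechanism that supports fast lookup and update of dictionary entries, and gives a preliminary theoretical analysis of its performance. Finally, its time efficiency is compared experimentally with that of a dictionary mechanism based on character-by-character binary search. The results show that the PATRICIA tree based mechanism offers faster lookup and more efficient updates, and can meet the needs of large-scale, open-text processing systems.

[Abstract] The dictionary mechanism is a basic component of Chinese information processing systems, and its efficiency greatly affects the performance of those systems. Based on the PATRICIA tree data structure, this paper designs a new PATRICIA tree based dictionary mechanism. First, the paper presents a preliminary performance analysis of this dictionary mechanism. Then a comparison is given between the PATRICIA tree based mechanism and a binary-search-by-characters dictionary mechanism. All the results p...
[Keywords] information retrieval; PATRICIA tree; Chinese automatic word segmentation;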
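
As a rough illustration of why a path-compressed tree suits a segmentation dictionary, the sketch below implements a character-level radix tree (compressed trie). This is a simplification: a true PATRICIA tree branches on individual bits, and the paper's actual node layout is not given in the abstract. The class and method names are hypothetical.

```python
class RadixNode:
    def __init__(self):
        self.children = {}    # first character of edge label -> (label, child node)
        self.is_word = False  # True if the path from the root spells a dictionary word

def _common_len(a, b):
    """Length of the common prefix of strings a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

class RadixTree:
    def __init__(self):
        self.root = RadixNode()

    def insert(self, word):
        node = self.root
        while word:
            edge = node.children.get(word[0])
            if edge is None:
                leaf = RadixNode()
                leaf.is_word = True
                node.children[word[0]] = (word, leaf)
                return
            label, child = edge
            k = _common_len(label, word)
            if k < len(label):
                # Split the edge at the point where the word diverges from the label.
                mid = RadixNode()
                mid.children[label[k]] = (label[k:], child)
                node.children[word[0]] = (label[:k], mid)
                child = mid
            node, word = child, word[k:]
        node.is_word = True

    def contains(self, word):
        node = self.root
        while word:
            edge = node.children.get(word[0])
            if edge is None or not word.startswith(edge[0]):
                return False
            node, word = edge[1], word[len(edge[0]):]
        return node.is_word

    def longest_match(self, text, start=0):
        """Length of the longest dictionary word beginning at text[start] (0 if none)."""
        node, pos, best = self.root, start, 0
        while pos < len(text):
            edge = node.children.get(text[pos])
            if edge is None or not text.startswith(edge[0], pos):
                break
            pos += len(edge[0])
            node = edge[1]
            if node.is_word:
                best = pos - start
        return best

tree = RadixTree()
for w in ["中国", "中国人", "中文", "信息"]:
    tree.insert(w)
print(tree.contains("中国"), tree.contains("中"))
print(tree.longest_match("中国人民"))
```

`longest_match` is the primitive a forward maximum matching segmenter would call at each position; insertion touches only the nodes along one path, which is what keeps updates cheap.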



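Application of the Hough Transform to Skew Correction of Chinese Business Card Images
[Authors] 潘武模; 焦扬; 王庆人;

[Summary] Considerable progress has recently been made in automatic computer understanding of document images. However, understanding skewed images still presents many difficulties, and these are especially pronounced in systems for automatic recognition and understanding of Chinese business card images. The image must be deskewed effectively at the system's input to guarantee performance. Because Chinese business cards have complex layouts, with few text lines and few characters per line, existing deskewing algorithms perform poorly on business card images. The Hough transform can be used to deskew general document images, but its application to business card images still needs study. This paper proposes a two-level Hough transform algorithm and applies it in a business card image understanding system, exploiting the characteristics of business card images to improve the accuracy and speed of the Hough transform. Experimental results confirm the effectiveness of the method.

[Abstract] Automatic document understanding has made great progress in the past decade. Yet the difficulties due to image skew have not been overcome. Such difficulties are especially acute in Chinese business card understanding systems. Because of the complex card layout, none of the current deskew algorithms shows satisfactory performance on Chinese business card images. Although the Hough transform has been widely used to deskew general document images, its application to Chinese business card images needs mo...
[Keywords] document analysis; layout understanding; skew correction; Hough transform; Chinese business card;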
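
The abstract does not spell out the two-level algorithm, so the following is only one plausible reading: a coarse pass over a wide angle range followed by a fine pass around the coarse winner. It assumes the input is a list of (x, y) foreground pixel coordinates and scores each candidate angle by the sum of squared votes in that angle's column of the Hough accumulator, using the normal form rho = y*cos(theta) - x*sin(theta) for near-horizontal lines. All names and default parameters are hypothetical.

```python
import math
from collections import Counter

def column_score(points, angle_deg):
    """Vote every point into integer rho bins for one angle; concentrated votes score high."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    votes = Counter(round(y * c - x * s) for x, y in points)
    return sum(v * v for v in votes.values())

def estimate_skew(points, max_angle=15.0, coarse_step=1.0, fine_step=0.1):
    """Coarse-to-fine search for the text-line angle in degrees (slope dy/dx = tan(angle))."""
    def best(center, half_range, step):
        n = int(round(half_range / step))
        angles = [center + k * step for k in range(-n, n + 1)]
        return max(angles, key=lambda a: column_score(points, a))

    coarse = best(0.0, max_angle, coarse_step)   # level 1: wide range, 1 degree steps
    return best(coarse, coarse_step, fine_step)  # level 2: narrow range, 0.1 degree steps

# Synthetic test image: three text baselines skewed by 5 degrees.
slope = math.tan(math.radians(5.0))
points = [(x, y0 + x * slope) for y0 in (20, 60, 100) for x in range(200)]
print(estimate_skew(points))
```

Because votes are binned to integer rho values, several neighboring fine angles can tie on short lines, so the estimate is only accurate to within a few tenths of a degree on this synthetic input.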



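Improving the Usability of Kana-to-Kanji Conversion by Representing Connections Between Morphemes in a Network [Directed Graph]
[Authors] 中岛晃; 河野胜也;

[Summary] This paper describes how the results of morphological analysis by the extended minimum-phrase-segment-number method are registered on a network [directed graph], and how this network is kept in the computer's memory until the correction operation ends, so that the desired candidate can be retrieved from among the candidates produced by different phrase-segment boundaries. Searching the paths of the network according to the selected candidate then yields a conversion of the whole sentence that matches the operator's intent. This makes the kana-to-kanji conversion required for Japanese kanji input simple and improves usability.

[Abstract] Japanese sentences have recently been obtained by transforming a 'kana sequence' into an array of several syllables. We investigated a conversion algorithm and user interface with the following characteristics. We obtain other syllable boundaries without changing the operations on them. We fundamentally adopted a method of 'minimizing the sum of syllables in a sentence'. In this method, the results of dictionary search and of grammatical checking of the chain of morphemes are maintained in the computer's memory, so we...
[Keywords] network directed graph; extended minimum-phrase-segment-number method; kana-to-kanji conversion; kanji input method;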



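Current State and Proposals: On Chinese Information Processing and the Study of Modern Chinese
[Authors] 许嘉璐;

[Summary] This paper reviews the current state of Chinese information processing technology and the main difficulties it faces, and argues that the key problem is that research on modern Chinese lags behind. To date, Chinese information processing has relied mainly on statistics over large-scale corpora, defining the relations between words by probability. The fact that the technology has made little headway for years shows that this approach can hardly break through the "bottleneck". For computers to process modern Chinese automatically, that is, to become truly "intelligent", human linguistic knowledge must be "taught" to the computer. This requires strengthening research on modern Chinese, especially semantics, according to the needs of the computer. The paper introduces three schools that are working in this direction and have made considerable progress, and points out the shortcomings of each. Drawing on the author's experience leading the national Ninth Five-Year Plan key project "Research on Modern Chinese Vocabulary for Information Processing", it proposes unified use of resources, working hand in hand, and tackling key problems jointly.

[Abstract] The paper surveys the state of the art of Chinese information processing and the major obstacles currently faced, pointing out that the underlying factor blocking the development of Chinese information processing is the lag in systematic, in-depth study of the contemporary Chinese language. In recent years, the mainstream of the Chinese information processing community has depended heavily on corpus-based methods, making full use of the statistical relationships among words. The fact that the Chine...
[Keywords] Chinese information processing; modern Chinese research; strategic proposal;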



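Algebraic Structure and Deductive System of Syntactic Categories
[Authors] 于江生;

[Summary] This paper gives the algebraic structure of categorial grammar built on a monoid, defines categorial equations and their solutions, and classifies the solutions in terms of consistency and correlation. The theorem "for any solution X of a categorial equation, there exists a unique essential solution Y such that Y X" makes it possible to expand the essential categories of a word w by certain deductive rules to obtain all syntactic categories of w. Finally, the author gives a mathematical description of the deductive system of syntactic categories from the viewpoint of category theory.

[Abstract] In this article, we present the algebraic structure of syntactic categories based on a monoid and define the categorial equation, whose solutions are described by consistency and correlation. The result "If X is a solution of a categorial equation, then there exists a unique essential solution Y such that Y X" makes it possible for the essential categories of a word to generate all possible syntactic categories by some deductive rules. Finally, the author describes the deductive system of syntactic categories from t...
[Keywords] syntactic category; categorial equation; essential solution; type raising;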



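An Adaptive Hash Grouping Search Algorithm for Chinese Characters
[Authors] 王忠效; 范植华;

[Summary] Building on reference [1], this paper proposes an adaptive hash grouping search algorithm for Chinese characters suited to Chinese information processing. By introducing a dynamic forgetting mechanism and dynamically reordering characters by frequency, the algorithm shortens the average search length several-fold, and so better meets the time requirements of applications that involve large numbers of Chinese character retrieval operations. In addition, it proposes a hash function that requires less computation than the one in [1] while hashing comparably well.

[Abstract] Based on a previous algorithm proposed in [1], this paper presents an adaptive hashing algorithm for Chinese characters. By introducing an oblivious policy and sorting Chinese characters in accordance with their dynamic frequencies, the algorithm makes important improvements in the average search length of Chinese characters, which can better guarantee the strict time demands of any application driven by the dynamic statistics of Chinese texts. In addition, a simpler hash function is given which worked almost...
[Keywords] Chinese character search; hash search; hash function; adaptive hash search;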
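
A minimal sketch of the general idea follows, under stated assumptions: characters are hashed into groups by code point modulo the group count, each group is searched sequentially, a hit moves the character ahead of less frequent neighbors, and counts are periodically halved as a stand-in for the forgetting mechanism. The hash function, the decay rule, and all names here are hypothetical and are not the ones proposed in the paper or in its reference [1].

```python
class AdaptiveHashGroups:
    def __init__(self, num_groups=64, decay_every=1000):
        self.groups = [[] for _ in range(num_groups)]  # each entry is [char, count]
        self.decay_every = decay_every
        self.accesses = 0

    def _group(self, ch):
        # Placeholder hash: code point modulo the number of groups.
        return self.groups[ord(ch) % len(self.groups)]

    def _forget(self):
        # Halve every count so that old frequencies fade over time.
        for group in self.groups:
            for entry in group:
                entry[1] //= 2

    def lookup(self, ch):
        """Find ch, inserting it if absent; return the 1-based position where it was found or added."""
        self.accesses += 1
        if self.accesses % self.decay_every == 0:
            self._forget()
        group = self._group(ch)
        for i, entry in enumerate(group):
            if entry[0] == ch:
                position = i + 1
                entry[1] += 1
                # Move the entry forward past any less frequent predecessors.
                j = i
                while j > 0 and group[j - 1][1] < entry[1]:
                    group[j - 1], group[j] = group[j], group[j - 1]
                    j -= 1
                return position
        group.append([ch, 1])
        return len(group)

table = AdaptiveHashGroups(num_groups=1)  # a single group makes the reordering visible
print([table.lookup(ch) for ch in "中文文文"])
```

The returned position is the search length for that access, so averaging it over a text gives the average search length that the reordering is meant to reduce.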



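A Fast Dictionary-Free Algorithm for Extracting and Counting High-Frequency Strings
[Authors] 韩客松; 王永成; 陈桂林;

[Summary] This paper proposes a fast method for extracting and counting high-frequency character strings. Using hashing, the method needs no dictionary and no corpus training, performs no word segmentation, and relies on statistical information alone to extract high-frequency strings. After prefix and suffix processing based on linguistic knowledge, the resulting strings can serve as auxiliary information for unknown-word handling, disambiguation, and weighting. Experiments show that the method is fast and is not restricted by the text itself, and that it is highly usable on real texts such as novels.

[Abstract] In this paper we describe a fast algorithm for extracting high-frequency strings. Our approach uses hashing to avoid relying on a corpus or on word segmentation. To extract the high-frequency strings, we use only statistical information. After the prefixes and suffixes are processed, the high-frequency strings we obtain can serve as supplementary knowledge for unknown-word processing, word disambiguation, and word weighting. The experimental results show that it is fast and can work on arbitrary texts. O...
[Keywords] hashing; high-frequency string; statistics; algorithm;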
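
The counting step can be sketched as follows, assuming a plain hash table (a Python `Counter`) keyed by every substring of bounded length. The length bounds, the frequency threshold, and the final filter that drops a string wholly covered by a longer string of equal frequency are illustrative assumptions; the paper's linguistically motivated prefix and suffix processing is not reproduced.

```python
import re
from collections import Counter

def high_frequency_strings(text, min_len=2, max_len=4, min_count=2):
    """Count character n-grams without a dictionary and keep the frequent, maximal ones."""
    counts = Counter()
    # Work on runs of CJK ideographs so that no string spans punctuation.
    for run in re.findall(r"[\u4e00-\u9fff]+", text):
        for n in range(min_len, max_len + 1):
            for i in range(len(run) - n + 1):
                counts[run[i:i + n]] += 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    # Drop a string if a longer frequent string contains it with the same count.
    return {
        s: c for s, c in frequent.items()
        if not any(s != t and s in t and c == frequent[t] for t in frequent)
    }

text = "信息处理很重要。中文信息处理发展很快,信息处理技术。"
print(high_frequency_strings(text))
```

On this toy input the fragments of the repeated phrase are absorbed into the phrase itself, which is the kind of candidate that could then be passed on as an unknown-word hint.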


