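Extracting Role and Semanteme Features of Events Based on HowNet
[Authors]郝秀兰; 杨尔弘; 舒鑫柱;
[Summary]This paper proposes a method for linking the main features of events in HowNet to the main features of entities: defining a role and semanteme list for each event class, thereby connecting HowNet's event classes with semantic interpretation. It gives a formal description of the role and semanteme list and an algorithm for acquiring such lists, and illustrates their application with examples.
[Abstract]A method for associating the main features of events with ontology categories in HowNet is presented. By defining role and semanteme lists for events, we can link event classes in HowNet to semantic interpretation. A formal description of role and semanteme lists and an algorithm for acquiring them are given, and their applications are illustrated with examples.
[Keywords]HowNet; role and semanteme list; event class; entity class; feature extraction;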
|
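Construction of an Error-Correction Knowledge Base and an Algorithm for Generating Correction Suggestions in a Chinese Proofreading System
[Authors]张仰森;
[Summary]Based on the error types commonly found in text awaiting proofreading, this paper describes how to construct an error-correction knowledge base and presents an automatic correction algorithm built on it. By exploiting the features of erroneous strings together with heuristic information from the context, the algorithm can effectively offer correction suggestions for errors such as wrong characters, missing characters, superfluous characters, transposed characters, and multi-character substitutions. The ranking of correction suggestions is also discussed.
[Abstract]According to the common error types in text to be proofread, this paper introduces a method for constructing a correction knowledge base and an automatic correction algorithm based on it. The algorithm makes full use of the characteristics of erroneous strings and contextual heuristic information. It can provide correction suggestions for errors such as wrong characters, missing Chinese characters, superfluous Chinese characters, transposed Chinese characters and substituted Chinese characters. The metho...
[Keywords]error-correction knowledge base; correction suggestions; correction algorithm; likelihood matching;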
|
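Transforming Semantic Dependency Trees into SQL in the Natural Language Database Query System NChiql
[Authors]孟小峰; 王珊;
[Summary]This paper presents the algorithm for transforming semantic dependency trees into SQL in NChiql, a restricted natural language query system for relational databases. It first introduces the concept of a set block, how set blocks are partitioned, and the algorithm for converting a set block into SQL; it then gives a method for further partitioning maximal set blocks, and finally arrives at the complete transformation algorithm.
[Abstract]This paper introduces the method of transforming the semantic dependency tree into SQL in NChiql. First we describe the concept of the set block and its transformation method; then we give the complete SQL transformation algorithm based on the set block concept.
[Keywords]semantic dependency tree; SQL; natural language interface;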
|
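Text Digital Watermarking
[Authors]黄华; 齐春; 李俊; 朱伟芳;
[Summary]Current research and literature on digital watermarking focus mainly on protecting still images and video; text watermarking has received little study, and no related literature has yet appeared in China. Yet some text documents need protection more than images or video do, and protecting digital text also matters greatly for government work and e-commerce in the Internet era. This paper introduces the basic ideas of text digital watermarking and the current state of research: it first describes methods for embedding and detecting text watermarks, then analyzes research directions for watermarking Chinese text and possible application prospects.
[Abstract]At present, research and literature on watermarking technology focus mainly on the protection of still images and video documents, and few researchers are interested in the watermarking of text documents. In China, there is as yet no literature in this area. In fact, however, some text documents need protection more than images and video documents do, and protecting digital text documents is very important for government work and electronic business in the Internet era. In this paper ...
[Keywords]text digital watermarking; copyright protection;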
|
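A New Statistics-Based Method for Lexicon Expansion
[Authors]周正宇; 李宗葛;
[Summary]When building statistical language models, the vocabulary of the lexicon is often insufficient, and the problem is especially severe for corpora from specialized domains such as medicine. To address it, this paper proposes a new statistics-based method for identifying new words, the right-boundary expansion method. The method estimates association norms for the runs of consecutive single-character words produced in a segmented corpus and uses right-boundary expansion to determine word boundaries. In the experiments, right-boundary expansion is combined with a pairwise merging method based on Witten-Bell backoff to iteratively adjust the lexicon and optimize the language model. The results show that the algorithm achieves a high recognition accuracy and detection rate and can effectively identify new words in the corpus, especially technical terms.
[Abstract]The out-of-vocabulary problem is one of the bottlenecks in Chinese language modeling. The problem is especially serious for domain-specific training data sets. This paper presents a new statistical method for extracting new words from the training data. The method is based on association norm estimation and searches for word boundaries by right-boundary expansion. Combining the new method with another word-merging method, we can iteratively optimize the lexicon, segmentation and language model. And very en...
[Keywords]lexicon; association norm estimation; right-boundary expansion method; language model;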
|
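A Study of Chinese Character Input Methods for Information Devices
[Authors]倪小东; 李人厚; 余克艰; 庞宣明;
[Summary]Small electrical appliances and mobile communication products are moving toward digitization and networking; one hallmark is that they allow interactive information exchange between enterprises and users and among users. Chinese character input is therefore very important for the adoption of such products in China. This paper presents a Chinese input technology suited to numeric keypads, comprising keypad-based English, full-pinyin, and leading-pinyin (Qian Dao Pin Yin) input methods, which allows convenient and fast entry of large amounts of mixed Chinese and English text on all kinds of information devices. The paper first describes the design ideas behind the input methods and then analyzes their performance and characteristics.
[Abstract]Recently, small electronic products and mobile communication products have been developing toward digitization and networking. Through these information devices, interactive information exchange is enabled between enterprises and users and among users. Chinese input technology is very important for their adoption in China. The paper introduces a Chinese input technology comprising three input methods based on numeric keypads, namely English, PinYin and Qian Dao Pin Yin, for information device...
[Keywords]Chinese character input method; numeric keypad; information devices; Chinese information processing;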
|
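A Robust Learning Algorithm
[Authors]刘颖;
[Summary]Statistical methods can resolve the ambiguities that arise during part-of-speech tagging and syntactic and semantic analysis in Chinese-English machine translation. Estimation usually relies on the maximum likelihood method, but choosing the maximum is not correct in every case. To pick the correct result from all candidates, this paper uses a robust learning algorithm. When the correct candidate does not have the highest score, the algorithm adjusts the scores so that the correct candidate's score becomes the highest, while lowering the scores of incorrect candidates. Moreover, because the training set and the test set differ, minimizing the error rate on the training set does not guarantee a minimal error rate on the test set. A robust learning algorithm should therefore be used when statistical variation between the training corpus and the test corpus is taken into account.
[Abstract]Ambiguities in part-of-speech tagging and in syntactic and semantic analysis are resolved using statistical methods. The maximum likelihood principle is used for disambiguation, but it is not correct under all conditions. A robust learning algorithm is used in this paper in order to select the right result among all candidates. When the score of the right candidate is not maximal, it can be adjusted using the robust learning algorithm, so that the score of the right candidate becomes maximal and the score of the wrong candidate is ...
[Keywords]robust learning algorithm; machine translation; scoring;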
|
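A Chinese Database Query Language Based on the ER Model and Restricted Chinese
[Authors]崔宗军; 唐世渭; 杨冬青;
[Summary]This paper presents RChiQL (Restrictive Chinese Query Language), a computational model of a Chinese query language for relational databases based on the ER model and restricted Chinese, together with its implementation scheme. The system simulates the parallel mechanism by which the human brain processes language, dividing the processing of a Chinese query sentence into four interdependent, interleaved steps: word segmentation, grammatical parsing, semantic analysis, and SQL conversion. It introduces a new grammar, GWERSC (Grammar with ER Semantic Characteristics), whose embedded ER-model semantics both aid syntactic parsing and simplify semantic analysis; the authors report good results.
[Abstract]A computational model of a restrictive Chinese query language for relational databases based on the ER model is put forward. It simulates the human language-processing mechanism, and the process of communicating in natural language is divided into four mutually dependent and interleaved steps: word segmentation, parsing, semantic processing and SQL transformation. A new grammar, GWERSC (Grammar with ER Semantic Characteristics), is introduced, which could contribute to syntactic parsing and simplify semantic understanding...
[Keywords]relational database; natural language interface; ER model; restricted Chinese;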
|
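Optimizing a Full-Text Retrieval System Based on Single-Chinese-Character Indexing
[Authors]余海燕; 张仲义;
[Summary]For a full-text retrieval system that builds its inverted index on single Chinese characters, the most pressing problem is how to improve storage efficiency and processing speed. This paper proposes the following optimizations: first, compressing the inverted file with parameterized Golomb coding; second, improving the logical-AND algorithm used to compute set intersections; third, applying parallel computing and double buffering. Experimental results show that the optimized single-character full-text retrieval system has reached a practically usable level.
[Abstract]This paper discusses the optimization of a full-text retrieval system based on single-Chinese-character indexing from three aspects: compression of the inverted index file using Golomb coding, a bidirectional binary search intersection algorithm, and parallel computing with a double buffer cache. The experiments show that these optimizations reduce storage cost and improve system performance.
[Keywords]full-text retrieval; single-Chinese-character indexing; inverted file; Golomb coding;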
|
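Automatic Subject Extraction from Chinese Text at Three Levels
[Authors]韩客松; 王永成; 沈洲; 吴芳芳;
[Summary]To meet the needs of the Internet era and of large-scale document processing, this paper takes Chinese text as its object and studies methods for automatically extracting the subject of a text at three different levels: subject words, subject concepts, and subject sentences. It focuses on the weighting system and on how some empirical values are obtained. Experiments were run on news documents, followed by a brief performance analysis.
[Abstract]To meet the requirements of the Internet and large-scale text processing, this paper describes how to automatically extract the subject from Chinese texts. We extract the subject at three different levels: subject word, subject concept and subject sentence. We put the emphasis on how to form the weighting system and acquire the empirical coefficient values. Based on experimental results on news articles, we briefly analyze the performance.
[Keywords]subject word; subject concept; subject sentence; weighting;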
|