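Syntactic Scoring and Semantic Scoring
[Authors]Liu Ying (刘颖);
[Chinese abstract, translated]This paper uses syntactic scores and semantic scores for disambiguation during the syntactic and semantic analysis phases. The scores can be combined with conventional syntactic and semantic analysis to analyze natural language more effectively, which makes this a practical way of combining rule-based and statistical methods. Experiments on syntactic and semantic scoring use the maximum likelihood principle and the K-best method. The results show that, on both the training set and the test set, both methods are more accurate when one left-context item, or one left and one right context item, is taken into account than when no context is used. Accuracy on the training set is always higher than on the test set. On the training set, accuracy also rises gradually as the corpus grows.
[Abstract]Syntactic and semantic scores are used for disambiguation during the syntactic and semantic analysis phases. Syntactic and semantic scores can be combined with traditional syntactic and semantic analysis, so that natural languages can be analyzed effectively. This is an effective way of combining rule-based and statistical methods. Experiments on syntactic and semantic scoring are carried out using the maximum likelihood principle and the K-best method; experimental results show that for training se...
[Keywords]syntactic score; semantic score; disambiguation;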
|
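Compression Model and Pattern Matching Techniques in a Chinese Full-Text Retrieval System
[Authors]Liu Zubin (刘祖斌); Wang Yongcheng (王永成); Liu Chunnian (刘椿年);
[Chinese abstract, translated]This paper presents a compression model suited to Chinese full-text retrieval systems, which allows the traditional LZW model to be applied to source text in languages with large character sets. The key to the method is the introduction of cut markers that stop the nodes of the dictionary's multiway tree from growing without bound. Retrieval is performed directly on the compressed file, which can considerably improve retrieval efficiency.
[Abstract]We propose an efficient compression scheme for Chinese text based on the traditional LZW method. General-purpose compression utilities are not well suited to Chinese text because of its large alphabet. The key technique in our scheme is "Chinese word segment signs", which reduce the size of the tree dictionary. Documents are searched directly in the compressed file, which allows faster search.
[Keywords]data compression; pattern matching; full-text retrieval;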
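For readers unfamiliar with the LZW baseline that the abstract above builds on, the following is a minimal sketch of textbook LZW over UTF-8 bytes. It is not the authors' model: the cut markers that bound dictionary growth and the search over compressed text are not reproduced, and the function names lzw_compress and lzw_decompress are illustrative assumptions.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW: emit one dictionary code per longest known prefix."""
    # The dictionary starts with every single byte value (codes 0..255).
    table = {bytes([i]): i for i in range(256)}
    codes: list[int] = []
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = len(table)  # grows without bound in this sketch
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the same dictionary while decoding the code sequence."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


if __name__ == "__main__":
    text = "中文全文检索系统中的全文检索".encode("utf-8")
    codes = lzw_compress(text)
    print(len(text), "bytes ->", len(codes), "codes")
    print(lzw_decompress(codes).decode("utf-8"))
```

Working on bytes keeps the initial dictionary at 256 entries even for Chinese text, at the cost of splitting each character across several symbols. The unbounded table growth visible here is the kind of problem the abstract says its cut markers address.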
|
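Retrieval Based on the p-Norm Model
[Authors]Chi Chengying (迟呈英); Zhan Xuegang (战学刚); Yao Tianshun (姚天顺);
[Chinese abstract, translated]With the proliferation of electronic texts, users are demanding more of information retrieval tools. This paper introduces an extended Boolean retrieval model and its application in a Chinese information retrieval system, and uses relevance feedback to improve the system's performance.
[Abstract]With more and more electronic texts available, more efficient retrieval tools are required. This article describes an extended Boolean retrieval model and its application in Chinese information retrieval systems. It also discusses retrieval improvement through relevance feedback.
[Keywords]information retrieval; vector space model; Boolean model; p-norm model;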
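As background for the p-norm model named in the title and keywords above, here is a minimal sketch of the standard p-norm similarity for OR and AND queries with unweighted query terms. The abstract does not give the paper's formulas, so the general textbook form is assumed here; the function names pnorm_or and pnorm_and are illustrative.

```python
def pnorm_or(weights: list[float], p: float) -> float:
    """Similarity of a document to (t1 OR ... OR tn).

    weights holds the document's term weights in [0, 1]; p >= 1.
    """
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)


def pnorm_and(weights: list[float], p: float) -> float:
    """Similarity of a document to (t1 AND ... AND tn)."""
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)


if __name__ == "__main__":
    doc = [0.8, 0.2]  # hypothetical weights of two query terms in one document
    for p in (1, 2, 50):
        print(p, round(pnorm_or(doc, p), 3), round(pnorm_and(doc, p), 3))
```

With p = 1 both operators reduce to the plain average of the weights, which behaves like a vector-space score. As p grows, OR approaches the maximum weight and AND approaches the minimum, which is strict Boolean behaviour on graded weights.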
|
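Research on Document Transformation Techniques for Markup Documents
[Authors]Li Jingchun (李景春); Wu Gangshan (武港山); Wang Qiang (王强); Zhang Fuyan (张福炎);
[Chinese abstract, translated]Transformation between document systems is a necessary route to sharing document content and collaborating on it. Depending on the application, a transformation may be lossy (distortion), lossless (non-distortion), or value-adding (increment). A markup document is one whose structure is described with tags. This paper introduces a value-adding transformation technique for markup documents and presents a document transformation description language, with which users can define transformation information and thereby carry out complex transformations between documents.
[Abstract]Document transformation among different document systems is a necessary approach to content sharing and cooperation. Depending on the application background, transformations fall into three categories: distortion, non-distortion, and increment. A markup document uses tags to describe its structure. This paper introduces a markup-document-oriented transformation technology and presents a document transformation description language that can be used to define transformation information, with t...
[Keywords]document transformation; distortion; increment; markup document;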
|
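Image Extraction and Segmentation in License Plate Recognition (LPR)
[Authors]Liu Zhiyong (刘智勇); Liu Yingjian (刘迎建);
[Chinese abstract, translated]In implementing a license plate recognition (LPR) system, the most critical parts are extracting the plate image and segmenting the images of the plate characters. This paper describes in detail the image extraction and segmentation process in an LPR system in practical use. A transform function is designed to emphasize the inherent characteristics of license plates so that the plate can be extracted; for character image segmentation, several issues that need attention in practice are raised and resolved. Theoretical analysis and experimental results show that the proposed method is very effective. In our experiments, on a Pentium II 300 with 64 MB of memory, the average time from image input to output of the recognition result is about 0.6 seconds.
[Abstract]The key parts of Vehicle License Plate Recognition (LPR) are license plate image extraction and character image segmentation. This paper introduces these two parts of an applied LPR system in detail. The authors designed a function that accentuates the characteristics of vehicle license plates so as to extract the plate image from a complex background. For segmentation, some important problems encountered in practice are put forward and solved. Theoretical analyses and experimental results demonstrate that ...
[Keywords]license plate recognition (LPR); image restoration; image extraction; image segmentation;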
|
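A Hierarchical LSD Rule System and Its Parsing Algorithm
[Authors]Li Mu (李沐); Yao Tianshun (姚天顺);
[Chinese abstract, translated]This paper proposes a hierarchical LSD rule system based on lexical attribute structure descriptions and rule inheritance, discusses the rule search strategy and the implementation of lexicalized rule indexing under this system, and on that basis gives, for the first time, a non-deterministic parsing algorithm for LSD grammars. The rule system scales from traditional attribute grammars to modern lexicalized grammars, and it handles well the complex rule interactions that arise in a linear rule base.
[Abstract]This paper presents a rule system based on lexical attribute descriptions and rule inheritance. We discuss its searching and indexing strategies and propose, for the first time, a non-deterministic LSD parsing algorithm. The rule system is scalable from attribute grammar to lexicalized grammar and can eliminate the problems of rule interaction in a linear rule base.
[Keywords]LSD method; lexical semantics driven; rule inheritance; rule search strategy; hierarchical rule system;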
|
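Research on the Grammar Parsing System of RChiQL, an ER-Model-Based Restricted-Chinese Query Interface to Databases
[Authors]Cui Zongjun (崔宗军); Tang Shiwei (唐世渭); Yang Dongqing (杨冬青);
[Chinese abstract, translated]RChiQL is a computational model of a query-language interface to relational databases based on restricted Chinese, in which grammar parsing plays an important role. This paper introduces a new grammar, GWERSC (Grammar with ER Semantic Characteristics), and designs a parsing algorithm for it. The embedded ER semantic characteristics help rule out ambiguities in syntactic parsing and can simplify semantic analysis.
[Abstract]RChiQL is a computational model of a restricted-Chinese query interface to relational databases, in which grammar parsing plays a very important role. A new grammar named GWERSC (Grammar with ER Semantic Characteristics) is put forward, and its parsing algorithm is designed and implemented; the embedded semantic characteristics contribute to syntactic parsing and simplify semantic processing.
[Keywords]relational database; ER model; natural language interface; grammar parsing;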
|
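Applying Machine Learning to the Recognition of Chinese Discourse Markers
[Authors]Gao Weijun (高维君); Yao Tianshun (姚天顺); Li Bangyang (黎邦洋); Chen Weiguang (陈伟光); Zou Jiayan (邹嘉彦);
[Chinese abstract, translated]Discourse markers account for a large share of the text in some Chinese argumentative writing, so they can play a very important role in the analysis of such texts. This paper discusses how machine learning can be applied to disambiguating Chinese discourse markers: the motivation, the method, and the results. From 80 Chinese texts that had already been processed, we extracted training and test sets for machine learning and ran tests with C4.5; recognition accuracy was above 80%. At the end of the paper we also interpret and analyze the machine learning results from a linguistic point of view.
[Abstract]With their high occurrence rates in argumentative Chinese texts, discourse markers play a significant role in the automatic processing of such texts, for example in automatic summarization. This paper reports on an effort to apply machine learning to identifying discourse markers in Chinese. We have processed 80 Chinese texts, from which we selected subsets for training and testing. We used C4.5 in our experiments and obtained accuracies of the order of 80%. We also interpret and analyze ...
[Keywords]discourse markers; machine learning; C4.5; discourse analysis; corpus;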
|
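Reference-Based Inductive Learning of the Operational Semantics of Word Structures
[Authors]Wei Hui (危辉);
[Chinese abstract, translated]Psycholinguistic research and the course of cognitive development show that early language acquisition passes through a phase of autonomous inductive learning. This paper takes the regularities of language development as its starting point and considers the acquisition process and representational basis of the formal semantics of word structures within a language information processing model that has a unified mechanism for language understanding and language production. The paper discusses an instance-based machine learning system. To acquire the formal semantics of word structures, it adopts an operational-semantics definition and designs a reference-based discovery learning algorithm, with the aim that the semantics become more precise as the set of example sentences grows.
[Abstract]Research in psycholinguistics and cognitive development has shown that there is an independent inductive learning phase in the early course of language development. The starting point of this paper is the order of language development; the acquisition process of formal semantics and its representational basis are regarded as one part of a speech information processing model that has a consistent mechanism for language understanding and production. In this paper a case based machine learning syste...
[Keywords]inductive learning; operational semantics; computational linguistics;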
|
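Chinese Personal Name Recognition Based on Statistical Methods
[Authors]Liu Bingwei (刘秉伟); Huang Xuanjing (黄萱菁); Guo Yikun (郭以昆); Wu Lide (吴立德);
[Chinese abstract, translated]This paper introduces a system for automatically recognizing Chinese personal names. To improve recognition performance, the system uses a large amount of statistical data drawn from a name sample database and a corpus of real text. We randomly selected 100 articles from the 1994 People's Daily as test samples; the experimental results show that precision and recall can both exceed 90%.
[Abstract]This paper presents a system for automatically identifying Chinese names. The system makes use of a large amount of statistical data, extracted from a real name library and a real text corpus, to enhance its identification performance. The test sample, comprising 100 articles, is extracted from the People's Daily 1994 News Corpus. The experiment shows that recall and precision can both exceed 90%.
[Keywords]automatic word segmentation; unknown words; Chinese name recognition;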
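The precision and recall figures quoted above follow the usual set-based definitions, sketched below. The name lists are made-up placeholders rather than data from the paper, and the function name precision_recall is an illustrative assumption.

```python
def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a set of recognized names.

    precision = correct / number predicted; recall = correct / number in gold.
    An empty denominator yields 0.0 by convention in this sketch.
    """
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    gold = {"name_a", "name_b", "name_c"}       # placeholder reference names
    predicted = {"name_b", "name_c", "name_d"}  # placeholder system output
    print(precision_recall(predicted, gold))
```

Reporting both numbers matters for a name recognizer, since proposing more candidate names tends to raise recall while lowering precision.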
|