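A Study of the Correlation Between Speech Recognition Accuracy and Retrieval Performance
[Authors]周梁; 高鹏; 丁鹏; 徐波;
[Summary]Content-based retrieval over massive speech collections requires combining speech recognition with retrieval techniques. By adjusting the language model to obtain recognition transcripts with different recognition rates, this paper studies how keyword retrieval differs across those transcripts, and from this investigates the correlation between speech recognition performance and retrieval performance. Experiments on 114 hours of speech data show that speech recognition performance and retrieval performance are correlated to some degree, and that improved retrieval methods can eliminate part of the error introduced by speech recognition. The results provide a basis for further targeted improvements to the recognition engine, to the representation of speech recognition output, and to the corresponding fast retrieval methods.
[Abstract]Integrating speech recognition and information retrieval techniques is a paradigm for implementing content-based retrieval over massive speech data. The paper studies the relationship between speech recognition performance and retrieval performance by analyzing the differences in retrieval on recognition transcripts with different recognition rates, which are adjusted through the language models. The experiment on 114 hours of speech data indicates: speech recognition performance has some correlation with retrieval ...
[Keywords]Computer application; Chinese information processing; speech recognition; keyword retrieval; recall; precision;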
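The study above scores keyword retrieval on recognition transcripts by recall and precision. A minimal sketch of how those two figures could be computed, assuming a hit is a (document id, keyword) pair and that hits found in manual reference transcripts serve as ground truth; the function names and toy data are hypothetical, not the authors' evaluation code.

```python
def keyword_hits(transcripts: dict[str, str], keywords: list[str]) -> set[tuple[str, str]]:
    """Return every (doc_id, keyword) pair whose keyword occurs in the transcript text."""
    return {(doc_id, kw) for doc_id, text in transcripts.items() for kw in keywords if kw in text}


def recall_precision(retrieved: set, relevant: set) -> tuple[float, float]:
    """Recall and precision of retrieved hits measured against reference hits."""
    correct = len(retrieved & relevant)
    recall = correct / len(relevant) if relevant else 0.0
    precision = correct / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical toy data: reference transcripts vs. ASR output containing errors.
reference = {"d1": "今天北京天气晴朗", "d2": "上海股市上涨"}
asr_output = {"d1": "今天北京天气情朗", "d2": "上海谷市上涨"}
keywords = ["北京", "天气", "股市", "上海"]

print(recall_precision(keyword_hits(asr_output, keywords), keyword_hits(reference, keywords)))
# -> (0.75, 1.0): the misrecognised "股市" is missed, the other three hits are correct
```

Lowering recognition accuracy in such a setup mostly shows up as lost recall (missed hits), while false alarms lower precision.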
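A Feature Extraction Method Based on a Concept Shielded Level for Chinese Text Categorization
[Authors]廖莎莎; 江铭虎;
[Summary]This paper proposes a new feature selection method based on concept extraction and a shielded level. The method uses the concept trees of the HowNet concept dictionary, extracts concepts from the positions of sememes in the tree, and assigns each sememe an appropriate weight that reflects its descriptive power. Sememes whose weight is below the shielded level are not selected into the feature set; the original word is kept instead. For each word, we compute the weights in its DEF entry and decide whether to put the original word into the feature set or to perform concept extraction. The paper focuses on how to assign a suitable weight to each sememe, how to balance between selecting original words and concepts, and how to weight words that have no concept entry. Experiments show that an appropriate shielded level not only reduces the feature dimensionality and somewhat improves classification accuracy, but also narrows the differences in classification accuracy between categories.
[Abstract]In this paper, we propose a novel feature selection method based on concept extraction and a shielded level. In this method, we use HowNet as the semantic dictionary to extract concept attributes. Based on their positions in the concept tree, the attributes get proper weights, which represent their descriptive power. A concept attribute is not selected as a feature if its weight is lower than the shielded level, and the original word is retained instead. For each word, we calculate all the weights of the con...
[Keywords]Computer application; Chinese information processing; text categorization; feature extraction; concept extraction; attribute feature tree; shielded level; descriptive power;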
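The word-or-concept decision described above can be sketched as follows. This is a minimal illustration under loudly stated assumptions: the sememe hierarchy is a hypothetical toy, each word's DEF entry is reduced to one main sememe, and the weight is simply the sememe's relative depth in the tree (deeper, more specific sememes are treated as more descriptive). The paper's actual weighting scheme is not reproduced here.

```python
from collections import Counter

# Hypothetical toy fragment of a sememe hierarchy: child -> parent (None = root).
PARENT = {
    "entity": None,
    "thing": "entity",
    "physical": "thing",
    "animate": "physical",
    "human": "animate",
    "animal": "animate",
    "event": "entity",
}


def depth(sememe, parent=PARENT):
    """Number of edges from the sememe up to the root of the tree."""
    d = 0
    while parent[sememe] is not None:
        sememe = parent[sememe]
        d += 1
    return d


MAX_DEPTH = max(depth(s) for s in PARENT)


def sememe_weight(sememe):
    # Assumption: weight = relative depth, so general sememes near the root score low.
    return depth(sememe) / MAX_DEPTH


def extract_features(words, defs, shield):
    """Map each word to its main sememe if that sememe's weight reaches the
    shielded level; otherwise (or if the word has no DEF entry) keep the word."""
    feats = Counter()
    for w in words:
        main = defs.get(w)  # hypothetical: main sememe of the word's DEF entry
        if main is not None and sememe_weight(main) >= shield:
            feats[main] += 1   # concept extraction
        else:
            feats[w] += 1      # original word retained
    return feats


defs = {"教师": "human", "学生": "human", "猫": "animal", "会议": "event"}  # toy DEF entries
print(extract_features(["教师", "学生", "猫", "会议", "电脑"], defs, shield=0.5))
```

Here five words collapse to four features: 教师 and 学生 merge into `human`, while 会议 keeps its surface form because `event` sits too close to the root to pass the 0.5 threshold.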
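Query Expansion Based on Modeling the Relevant Document Pool
[Authors]吕碧波; 赵军;
[Summary]In information retrieval, relevance feedback is one of the effective ways to improve retrieval performance. Relevance feedback is the technique of selecting, according to some strategy, topic-related terms from the relevant documents already retrieved and using them to expand the query. This paper introduces common query expansion methods under the probabilistic model and the vector space model, and proposes a relevance feedback method based on the language model that considers two properties an expansion term should have: relevance and coverage. The algorithms are compared on a TREC test collection, and the results show that the new algorithm improves average precision over traditional methods.
[Abstract]In information retrieval, relevance feedback is an effective way to improve retrieval performance. It takes the user's judgements on previously retrieved documents as input and selects some terms for query expansion according to a certain strategy. This paper introduces some common query expansion approaches in relevance feedback based on the probabilistic model and the vector space model, and then introduces a new term selection method based on the language model, which takes into account two features of expanded terms: "relevance" and "co...
[Keywords]Computer application; Chinese information processing; information retrieval; relevance feedback; query expansion;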
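To make the two properties named above concrete, here is a hypothetical scoring sketch, not the paper's formula. It assumes relevance is the log ratio of a term's maximum-likelihood probability in the feedback pool to its probability in the whole collection, coverage is the fraction of feedback documents containing the term, and the final score is their product. Documents are token lists, and the collection must include the feedback documents so that every candidate has a nonzero collection count.

```python
import math
from collections import Counter


def expansion_terms(feedback_docs, collection_docs, query_terms, k=2):
    """Rank candidate expansion terms drawn from a pool of relevant documents.

    Hypothetical score = relevance * coverage, where
      relevance = log(P(t | pool) / P(t | collection))   (maximum-likelihood estimates)
      coverage  = fraction of feedback documents that contain t.
    collection_docs must include feedback_docs.
    """
    pool = Counter(t for d in feedback_docs for t in d)
    coll = Counter(t for d in collection_docs for t in d)
    pool_len, coll_len = sum(pool.values()), sum(coll.values())
    scores = {}
    for t, c in pool.items():
        if t in query_terms:
            continue  # original query terms are not re-added
        relevance = math.log((c / pool_len) / (coll[t] / coll_len))
        coverage = sum(t in d for d in feedback_docs) / len(feedback_docs)
        scores[t] = relevance * coverage
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


feedback = [["earthquake", "rescue", "tsunami", "tsunami"],
            ["earthquake", "rescue", "aid"],
            ["earthquake", "the"]]
collection = feedback + [["the", "market", "rises"], ["the", "aid", "budget"], ["the", "market", "falls"]]
print(expansion_terms(feedback, collection, {"earthquake"}, k=2))
```

In this toy pool, "rescue" and "tsunami" have identical relevance, but "rescue" ranks first because it appears in two feedback documents rather than one, which is the kind of distinction a coverage term is meant to capture.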
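A Chinese Text Categorization System Based on an n-gram Language Model and a Chain Augmented Naive Bayes Classifier
[Authors]毛伟; 徐蔚然; 郭军;
[Summary]This paper presents a Chinese text categorization system that represents texts with an n-gram language model and classifies them with a chain augmented naive Bayes classifier. It describes how to represent text with an n-gram language model, explains the advantages of combining the chain augmented naive Bayes classifier with the n-gram language model, analyzes how to choose the n-gram model parameters, discusses several important issues of the categorization system, and studies how the size and quality of the training set affect the system. The system was tested with the evaluation criteria, training set, and test set provided by the 863 Program text categorization evaluation group, and the experimental results show that it categorizes well.
[Abstract]An automatic Chinese text categorization method based on an n-gram language model and a chain augmented naïve Bayesian classifier is proposed. The paper introduces the representation of a text through an n-gram language model, argues for the advantages of combining the n-gram language model with the chain augmented naïve Bayesian classifier, analyzes how to choose the parameters of the n-gram language model, and discusses some crucial problems of the categorization system. The effect of quantity and quality of training corpus on classifi...
[Keywords]Computer application; Chinese information processing; Chinese text categorization; n-gram language model; chain augmented naive Bayes classifier;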
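The combination described above amounts to training one n-gram language model per category and choosing the category that maximizes prior times text likelihood. A minimal sketch, assuming character bigrams (n = 2), add-one smoothing, and a `<s>` start symbol; the paper's own n and smoothing method may differ, and the class name and training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class BigramNaiveBayes:
    """Chain augmented naive Bayes sketch: one character-bigram LM per class.

    Assumptions: n = 2, add-one (Laplace) smoothing, "<s>" before each text.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.bigrams = defaultdict(Counter)   # class -> counts of (prev, cur)
        self.contexts = defaultdict(Counter)  # class -> counts of prev
        self.vocab = set()
        for text, c in zip(texts, labels):
            chars = ["<s>"] + list(text)
            self.vocab.update(text)
            for prev, cur in zip(chars, chars[1:]):
                self.bigrams[c][(prev, cur)] += 1
                self.contexts[c][prev] += 1
        return self

    def log_score(self, text, c):
        """log P(c) + sum_i log P(char_i | char_{i-1}, c)."""
        v = len(self.vocab) + 1  # +1 reserves mass for unseen characters
        score = math.log(self.class_counts[c] / sum(self.class_counts.values()))
        chars = ["<s>"] + list(text)
        for prev, cur in zip(chars, chars[1:]):
            num = self.bigrams[c][(prev, cur)] + 1
            den = self.contexts[c][prev] + v
            score += math.log(num / den)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda c: self.log_score(text, c))


clf = BigramNaiveBayes().fit(["股市上涨", "股票下跌", "球队获胜", "球员进球"],
                             ["财经", "财经", "体育", "体育"])
print(clf.predict("股市下跌"))  # -> 财经
```

Because each class model conditions every character on the one before it, the classifier keeps local word-order information that a bag-of-words naive Bayes discards, and it needs no word segmentation.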
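Development of an Electronic Text Retrieval System for the Secret History of the Mongols (《元朝秘史》)
[Authors]江荻; 严海林; 孙伯君; 斯钦朝克图; 孟达来;
[Summary]This paper briefly introduces the documentary background of the 13th-century Secret History of the Mongols and the complex textual form unique to the original. Based on an analysis of the text's content and page layout, it designs a scheme for an electronic retrieval system for the work. The main problem solved is restoring the original display format, in which three lines form one unit; in addition, the system can search and compute statistics separately over the different components of the original: the Chinese-character phonetic transcription, the Chinese translation, the interlinear Chinese glosses, and the phonetic and grammatical annotations. Search results include the traditional scholarly section numbers and the volume and page numbers that researchers value most, as well as the exact position in the electronic text. The system also applies context retrieval to search terms, so the output includes part of the context around each hit. The system largely meets the needs of research in history, literature, and linguistics.
[Abstract]This paper first gives a brief introduction to the background of the Secret History of the Mongols, the great book published in the 13th century in the Yuan Dynasty, and to the special, complicated form of its original text. After an analysis of its content and page layout, a scheme for an electronic retrieval system is designed, which solves the problem of restoring the archaic original format in which three lines represent one unit of content. Furthermore, the retrieval system also provides the fun...
[Keywords]Computer application; Chinese information processing; Secret History of the Mongols; complex text; electronic retrieval system;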
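The layer-specific search with context output described above is essentially keyword-in-context (KWIC) retrieval over aligned text layers. A minimal sketch, assuming each three-line unit is stored as a record carrying its section number, a volume/page label, and one string per layer; the `Segment` structure, layer names, and placeholder text are hypothetical, not the system's actual data model or the text of the work.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One three-line unit of the text (hypothetical structure)."""
    section: int       # traditional section number
    volume_page: str   # volume and page label
    layers: dict       # layer name -> text, e.g. "transcription", "gloss", "translation"


def kwic(segments, term, layer, width=2):
    """Keyword-in-context search restricted to one layer.

    Returns (section, volume_page, offset, left, hit, right) for every match,
    keeping `width` characters of context on each side.
    """
    results = []
    for seg in segments:
        text = seg.layers.get(layer, "")
        start = text.find(term)
        while start != -1:
            end = start + len(term)
            results.append((seg.section, seg.volume_page, start,
                            text[max(0, start - width):start], term,
                            text[end:end + width]))
            start = text.find(term, start + 1)  # continue after this hit
    return results


# Placeholder text only, not the actual wording of the work.
segments = [Segment(1, "卷1 页1", {"gloss": "甲乙丙甲丁"}),
            Segment(2, "卷1 页2", {"gloss": "戊甲己"})]
for row in kwic(segments, "甲", "gloss", width=1):
    print(row)
```

Keeping each layer as a separate field of the same record is what lets a search in one layer report the section and page of the whole three-line unit.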
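Chinese Syntactic Parsing Based on Probabilistic Context-Free Grammar
[Authors]林颖; 史晓东; 郭锋;
[Summary]This paper studies the limitations of the PCFG independence assumption and, to address them, proposes the notion of syntactic structure co-occurrence to introduce context information, together with a method for computing it. To overcome the small size of the Chinese treebank, the parameters of the syntactic rules are obtained by iterating the Inside-Outside algorithm. Finally, a top-down Chinese parser based on a statistical model is presented. In a closed test, its labeled precision and labeled recall are 88.1% and 86.8%, respectively. The experimental results show that the method does improve labeled precision and recall and merits further study.
[Abstract]This paper studies the limitations of probabilistic context-free grammar and proposes a concept of co-occurrence in syntactic structure so as to use context information. To address the small scale of the Chinese Treebank, the Inside-Outside algorithm is used to obtain the parameters of syntactic rules. Finally, we present a probabilistic top-down Chinese parser. In the closed test, labeled precision and labeled recall are 88.1% and 86.8% respectively, showing that this method has potential to ...
[Keywords]Artificial intelligence; natural language processing; statistical parsing; probabilistic context-free grammar; automatic analysis of Chinese;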
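The Inside-Outside algorithm mentioned above re-estimates rule probabilities from inside and outside probabilities. A minimal sketch of the inside pass only, for a PCFG in Chomsky normal form; the outside pass and the re-estimation step are omitted, and the toy grammar is hypothetical rather than one learned from the Chinese treebank.

```python
from collections import defaultdict


def inside(words, lexical, binary, start="S"):
    """Inside probability P(words | grammar) for a PCFG in Chomsky normal form.

    lexical: {(A, word): P(A -> word)}
    binary:  {(A, B, C): P(A -> B C)}
    beta[i, j, A] is the probability that A derives words[i:j].
    """
    n = len(words)
    beta = defaultdict(float)
    for i, w in enumerate(words):              # spans of length 1: lexical rules
        for (a, word), p in lexical.items():
            if word == w:
                beta[i, i + 1, a] += p
    for span in range(2, n + 1):               # longer spans built from shorter ones
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for (a, b, c), p in binary.items():
                    beta[i, j, a] += p * beta[i, k, b] * beta[k, j, c]
    return beta[0, n, start]


lexical = {("NP", "我"): 0.5, ("NP", "书"): 0.5, ("V", "看"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
print(inside(["我", "看", "书"], lexical, binary))  # -> 0.25
```

In a full Inside-Outside iteration, these inside values are combined with outside values to obtain expected rule counts over an unannotated corpus, and the expected counts are normalized into new rule probabilities.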
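Chinese Question Classification Based on Syntactic Structure Analysis
[Authors]文勖; 张宇; 刘挺; 马金山;
[Summary]Question classification is an important component of a question answering system, and its quality directly affects the quality of the whole system. This paper proposes a new feature extraction method for question classification. The method mainly uses the output of syntactic parsing, extracting the trunk of the question together with the interrogative word and its attached constituents as classification features. This substantially reduces noise and highlights the main features for question classification; combined with a Bayesian classifier, it effectively improves classification accuracy. Experiments confirm the effectiveness of the method, with accuracy reaching 86.62% for coarse classes and 71.92% for fine classes.
[Abstract]Question classification is very important for question answering, and its result directly affects the quality of question answering. This paper presents a new feature extraction method for question classification. The method uses the output of syntactic parsing to extract the Subject-Predicate structure, as well as interrogative words and their adjunctive parts, as classification features, leading to a substantial reduction in noise and emphasis on the main features of quest...
[Keywords]Computer application; Chinese information processing; question answering system; question classification; feature extraction; syntactic parsing;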
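The feature extraction step described above can be sketched over a dependency parse. The parse format (word, 1-based head index, relation), the relation labels `SBV`/`VOB`, the interrogative word list, and the choice to include the interrogative's head word are all assumptions made for illustration; the resulting feature sets would then be passed to a Bayesian classifier as the abstract describes.

```python
# Hypothetical interrogative word list and trunk relation labels.
INTERROGATIVES = {"什么", "谁", "哪", "哪里", "哪个", "多少", "几", "怎么", "为什么", "何时"}
TRUNK_RELATIONS = {"SBV", "VOB"}  # subject and object of the root predicate


def question_features(parse):
    """parse: list of (word, head, relation); head is a 1-based index, 0 = root."""
    feats = set()
    root = next(i for i, (_, head, _) in enumerate(parse, 1) if head == 0)
    feats.add("TRUNK=" + parse[root - 1][0])           # root predicate
    for i, (word, head, rel) in enumerate(parse, 1):
        if head == root and rel in TRUNK_RELATIONS:
            feats.add("TRUNK=" + word)                 # subject / object of the trunk
        if word in INTERROGATIVES:
            feats.add("WH=" + word)
            if head > 0:
                feats.add("WH_HEAD=" + parse[head - 1][0])       # word the interrogative modifies
            feats.update("WH_DEP=" + w for w, h, _ in parse if h == i)  # its dependents
    return feats


# 中国的首都是哪个城市 with a hand-written toy parse.
parse = [("中国", 3, "ATT"), ("的", 1, "RAD"), ("首都", 4, "SBV"),
         ("是", 0, "HED"), ("哪个", 6, "ATT"), ("城市", 4, "VOB")]
print(sorted(question_features(parse)))
```

Modifiers such as 中国 and 的 never reach the feature set, which is the noise reduction the abstract attributes to using only the trunk and the interrogative phrase.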
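An Information Extraction System Based on Event Frames
[Authors]梁晗; 陈群秀; 吴平博;
[Summary]Information extraction technology can provide high-quality retrieval services. This paper proposes a frame-based information extraction model and builds a unified frame for disaster events. The inheritance and induction properties of the frame are used to simplify the system implementation and to summarize event information, and an output format is proposed that arranges the extracted documents as a thread in chronological order. Using this method, a disaster event information extraction system was built. Experiments show that the method is effective.
[Abstract]Information extraction technologies can provide high-quality retrieval services. In this paper we present an information extraction model based on event frames and build a unified calamity event frame. The extraction system can be easily implemented thanks to the inheritance and induction properties of the frame. We also use the frame to collect event information and then output the results in chronological order. A calamity event information extraction system is built using these methods. The experiment indicates t...
[Keywords]Computer application; Chinese information processing; information extraction; frame; inheritance; disaster event;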
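The frame inheritance and chronological output described above map naturally onto class inheritance. A minimal sketch, assuming a generic disaster-event frame whose shared slots are inherited by event-specific sub-frames; the frame names, slots, and event records are hypothetical, not the paper's frame definitions or extracted data.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class DisasterEvent:
    """Hypothetical top-level frame: slots shared by every disaster event."""
    time: date
    place: str
    deaths: int = 0


@dataclass
class Earthquake(DisasterEvent):
    magnitude: float = 0.0  # slot added by the sub-frame; others are inherited


@dataclass
class Flood(DisasterEvent):
    river: str = ""


def timeline(events):
    """Return one line per filled frame, ordered by the inherited `time` slot."""
    lines = []
    for e in sorted(events, key=lambda e: e.time):
        slots = ", ".join(f"{f.name}={getattr(e, f.name)}" for f in fields(e) if f.name != "time")
        lines.append(f"{e.time.isoformat()} {type(e).__name__}: {slots}")
    return lines


# Hypothetical extracted records.
events = [Earthquake(date(2004, 3, 2), "甲地", deaths=5, magnitude=5.1),
          Flood(date(2004, 1, 10), "乙地", river="丙河")]
for line in timeline(events):
    print(line)
```

Because `timeline` only relies on slots defined in the parent frame, a new event type needs nothing beyond a new sub-frame declaring its extra slots.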
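An Improved Wu-Manber Multiple-Pattern Matching Algorithm and Its Application
[Authors]孙晓山; 王强; 关毅; 王晓龙;
[Summary]To address the weakness of the Wu-Manber multiple-pattern matching algorithm in handling suffix patterns, this paper presents an improved suffix-pattern handling algorithm that reduces the number of character comparisons during matching and improves the algorithm's efficiency. Full-text retrieval experiments were run on 52,067 documents randomly selected from TREC2000, comparing the number of character comparisons made during matching by three algorithms: the original Wu-Manber algorithm, the improved algorithm using suffix patterns, and a simple improvement that does not use suffix patterns. The results show that the improvement consistently reduces the number of character comparisons and increases matching speed and efficiency.
[Abstract]The Wu-Manber multiple-pattern matching algorithm does not work well when some patterns are suffixes of other patterns. To solve the problem, an improved algorithm is introduced that reduces the number of comparisons during pattern matching and leads to faster matching. The text retrieval experiments use 52,067 passages randomly selected from TREC2000. Three algorithms, the Wu-Manber algorithm, the improved algorithm, and an algorithm that simply breaks off halfway, are compared and the resul...
[Keywords]Computer application; Chinese information processing; multiple-pattern matching; suffix pattern; string matching; full-text retrieval; information retrieval;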
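For reference, the baseline the paper improves on can be sketched as the basic Wu-Manber scheme with a SHIFT table and a HASH table keyed on blocks of B characters (the PREFIX table is omitted). The sketch also counts character comparisons during candidate verification, the quantity the experiments compare. It is the standard algorithm only; the paper's suffix-pattern improvement is not reproduced, and the block size B = 2 is an assumption.

```python
def wu_manber(text, patterns, B=2):
    """Basic Wu-Manber search (SHIFT + HASH tables, no PREFIX table).

    Returns (matches, comparisons): matches is a sorted list of (start, pattern)
    pairs; comparisons counts character comparisons made while verifying candidates.
    """
    m = min(len(p) for p in patterns)
    if m < B:
        raise ValueError("every pattern must be at least B characters long")
    default_shift = m - B + 1
    shift, table = {}, {}
    for p in patterns:
        for j in range(B, m + 1):                 # block p[j-B:j] ends at j in the m-prefix
            block = p[j - B:j]
            shift[block] = min(shift.get(block, default_shift), m - j)
        table.setdefault(p[m - B:m], []).append(p)  # HASH: last block of the m-prefix
    matches, comparisons = [], 0
    pos = m - 1                                   # index of the last char of the window
    while pos < len(text):
        block = text[pos - B + 1:pos + 1]
        s = shift.get(block, default_shift)
        if s > 0:                                 # no pattern can end here: skip ahead
            pos += s
            continue
        start = pos - m + 1
        for p in table.get(block, []):            # verify each candidate pattern
            if start + len(p) > len(text):
                continue
            k = 0
            while k < len(p):
                comparisons += 1
                if text[start + k] != p[k]:
                    break
                k += 1
            if k == len(p):
                matches.append((start, p))
        pos += 1
    return sorted(matches), comparisons


print(wu_manber("中文信息处理与信息检索", ["信息", "信息检索", "检索"]))
```

Every pattern sharing a HASH bucket is verified character by character, so bucket contents drive the comparison count; this is the cost the paper's suffix-pattern handling targets.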
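A Production-Rule Representation of Phonetic-Semantic Relations in Shuowen Jiezi (《说文解字》)
[Authors]宋继华; 李国玉; 王宁;
[Summary]Exploring semantic relations in Chinese depends on exploring the phonetic-semantic relations among Chinese characters, which fall into three types: homophony, synonymy, and cognation. Investigating the phonetic-semantic relations among characters, and using character pronunciations to infer relations between their meanings, is an important part of Shuowen Jiezi research. To support a more comprehensive computer-based exploration of these relations, especially the "similar sound" (音近) and "connected meaning" (义通) relations within cognation, this paper formalizes the rules of phonological alternation (音韵通转). In the Shuowen knowledge base, an alliteration (双声) rule base and a rhyme (叠韵) rule base, comprising 8 rule tables, were built. Through "rule slots", these combine with the "attribute slots" and "attribute bases" of the traditional frame representation to form production frames, which effectively represent the descriptive and rule-based knowledge in the Shuowen and lay a foundation for follow-up research.
[Abstract]The study of Chinese semantic relations relies on that of the phonetic-semantic relations among Chinese characters, which fall into three types: homophony, synonymy, and paronymy. It is one of the most important tasks for Shuowenjiezi (SWJZ) researchers to explore the phonetic-semantic relationships of Chinese characters and then reason out the semantic relations between the characters from their pronunciations. In order to better understand the phonetic-semantic relations based on computer technology, especially ...
[Keywords]Computer application; Chinese information processing; Shuowen Jiezi; phonetic-semantic relation; cognation; production rule; production frame;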
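The alliteration (双声, shared initial) and rhyme (叠韵, shared final) rule bases above are IF-THEN production rules over the phonological attributes of two characters. A minimal sketch, assuming each character is described by an (initial, final) pair; modern Pinyin is used purely to keep the example self-contained and is not the historical phonological system a Shuowen knowledge base would encode. The table, rule list, and function name are hypothetical.

```python
# Illustrative phonology table: character -> (initial, final), in modern Pinyin.
# This is NOT the historical sound system used in Shuowen research.
PHONOLOGY = {
    "考": ("k", "ao"),
    "老": ("l", "ao"),
    "玲": ("l", "ing"),
    "珑": ("l", "ong"),
}

# Each production rule: (conclusion, IF-part over two (initial, final) pairs).
RULES = [
    ("双声", lambda a, b: a[0] == b[0]),  # same initial -> alliteration
    ("叠韵", lambda a, b: a[1] == b[1]),  # same final   -> rhyme
]


def fire_rules(x, y, phonology=PHONOLOGY, rules=RULES):
    """Return the conclusion of every rule whose IF-part holds for characters x and y."""
    a, b = phonology[x], phonology[y]
    return [name for name, condition in rules if condition(a, b)]


print(fire_rules("考", "老"))  # -> ['叠韵']
print(fire_rules("玲", "珑"))  # -> ['双声']
```

Keeping rules as data rather than code means that, as in the paper's rule tables, new alternation conditions can be added to a rule list without changing the inference function.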