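User Modeling and Text Planning in a Question-Answering Discourse Generation System
[Authors]吴华; 黄泰翼;
[Chinese abstract]In a question-answering generation system, if the system first learns how well the user has mastered the domain knowledge involved in a question, it can organize the text around that knowledge and generate content suited to the user's needs, which makes human-computer interaction more effective. Building on a flower-knowledge query system, this paper explores how the user's knowledge affects the generated output and how the user model and the text planner interact. Experimental results show that the user knowledge model affects not only the content that is generated but also its style. The system uses two basic generation strategies, the Schema method and the Process method, and the paper examines how the two are combined.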
[Abstract]In a question answering system, if the system can get a view of the domain knowledge that the user has mastered, it can generate answers that are both informative and understandable to the user, which improves the interaction between human and computer. Based on the flower knowledge retrieval system, this paper discusses the effect of the user model on the generated contents and the relationship between the user model and the text planner. Experiments show that the user model affects not only the generated contents bu...
[Keywords]user model; text planning; Chinese generation;
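The sketch below is for illustration only. It shows one way a user model could steer the choice between a Schema-style (structural) description and a Process-style (how-it-works) description, and how the two can be mixed within a single answer. The knowledge-base entries, the `UserModel` class and the rule "known concept gives schema, unknown concept gives process" are all hypothetical assumptions. None of them come from the authors' flower-query system.

```python
from dataclasses import dataclass, field

# Hypothetical flower knowledge base (not the paper's data): each concept lists its
# parts (used for Schema-style description) and steps (used for Process-style description).
KB = {
    "rose": {"parts": ["petal", "stamen"], "steps": ["the bud forms", "the bud opens"]},
    "petal": {"parts": [], "steps": ["the petal unfolds", "the petal colours"]},
    "stamen": {"parts": ["anther", "filament"],
               "steps": ["the anther makes pollen", "the anther releases pollen"]},
}

@dataclass
class UserModel:
    # Hypothetical user model: the set of concepts the user is assumed to know.
    known: set = field(default_factory=set)

def plan(concept, user):
    """Return a list of (strategy, message) pairs describing `concept`."""
    entry = KB[concept]
    if concept in user.known:
        # Familiar concept: Schema-style description of its structure.
        parts = ", ".join(entry["parts"]) or "no listed parts"
        messages = [("schema", f"{concept} consists of {parts}")]
        for part in entry["parts"]:
            if part in KB:  # parts without a KB entry are only named, not expanded
                messages.extend(plan(part, user))
        return messages
    # Unfamiliar concept: Process-style description of how it works.
    return [("process", step) for step in entry["steps"]]
```

A user who knows only "rose" gets a schema at the top level and process descriptions for the unfamiliar parts. This is one simple way to combine the two strategies in a single text plan.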
|
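Design and Implementation of the Campus Navigation System EasyNav
[Authors]黄寅飞; 郑方; 燕鹏举; 徐明星; 吴文虎;
[Chinese abstract]This paper presents the design and implementation of EasyNav, a spoken dialogue system for campus navigation. After analyzing the characteristics and requirements of spoken dialogue systems, we propose a rule-based language understanding pipeline suited to them. In this pipeline, syntactic analysis uses a GLR parser to process a context-free grammar (CFG) and extract sentence-structure features for the semantic analysis stage, and the syntactic rules balance coverage against accuracy. Semantic analysis uses template matching that takes syntactic constraints into account. Its goal is to identify the speaker's intention and to eliminate ambiguities introduced by syntactic analysis. The advantage of this design is that the system is easy both to build and to extend.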
[Abstract]In this paper we present the design and the implementation of a Chinese spoken language dialogue system named EasyNav, which is for Tsinghua University campus navigation. By analyzing the features and requirements of spoken language dialogue systems, we design a rule-based language understanding procedure suited to them. The syntactic parser applies the GLR algorithm to a context-free grammar (CFG) in order to extract syntactic-structure features for use by the semantic parser. T...
[Keywords]spoken dialogue system; speech understanding; syntactic parsing; template matching;
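The following is a minimal sketch of the template-matching half of such a pipeline. It assumes the input is already tokenised and skips the GLR parsing stage entirely, so no syntactic constraints are checked. The gazetteer, templates and intent names are hypothetical, and so is the in-order matching that skips filler words. None of this reproduces EasyNav's actual rules.

```python
# Hypothetical gazetteer and templates for illustration; not EasyNav's actual rules.
GAZETTEER = {"PLACE": {"library", "main_building", "east_gate"}}
TEMPLATES = [
    ("route", ["from", "<PLACE>", "to", "<PLACE>"]),
    ("locate", ["where", "is", "<PLACE>"]),
]

def match(pattern, tokens):
    """Match template elements in order, skipping unmatched filler tokens.
    Returns the slot fillers, or None if some element is never found."""
    slots, i = [], 0
    for element in pattern:
        while i < len(tokens):
            token = tokens[i]
            i += 1
            if element.startswith("<"):
                # Typed slot: the token must belong to the gazetteer for that type.
                if token in GAZETTEER[element.strip("<>")]:
                    slots.append(token)
                    break
            elif token == element:
                break
        else:  # ran out of tokens without matching this element
            return None
    return slots

def understand(tokens):
    """Return (intent, slots) for the first template that matches, else None."""
    for intent, pattern in TEMPLATES:
        slots = match(pattern, tokens)
        if slots is not None:
            return intent, slots
    return None
```

Skipping unmatched tokens makes the matcher tolerant of the fillers and disfluencies common in speech. A real system would also need the syntactic checks this sketch leaves out.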
|
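An Algorithm for Extracting Stroke Segments from Modern Tibetan Characters
[Authors]王浩军; 赵南元; 邓钢轶;
[Chinese abstract]Targeting the geometric features and topological structure of the stroke segments in Tibetan characters, this paper proposes a stroke-segment extraction algorithm based on character contour information. The algorithm works in three steps:

1. Chain-code tracing yields the point sequence of each stroke-segment contour.
2. Feature points are extracted from that sequence and used to split out the stroke segments.
3. The contour lines of the stroke segments, rather than skeleton lines, are used to represent Tibetan stroke segments.

Applied to printed Tibetan, the algorithm achieved good results. It avoids the distortion caused by traditional thinning algorithms, makes stroke-segment extraction more robust to noise, reduces the amount of computation and speeds up feature extraction.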
[Abstract]A stroke segment extraction algorithm for Tibetan characters is presented in this paper. Based on the geometrical features and topological structures of Tibetan characters, this method utilizes contour information to extract the stroke segments of Tibetan characters. First, contour points are extracted by chain code following; then feature points are detected and used to separate strokes; finally, contour lines are used to represent strokes instead of skeleton lines. Experimental results show that the proposed a...
[Keywords]character recognition; Tibetan recognition; stroke segment extraction; contour information;
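To make the chain-code step concrete, here is a small sketch under stated assumptions. The contour is assumed to be already traced as an ordered, closed list of 8-connected pixel coordinates. It is encoded with standard Freeman 8-direction codes, and feature points are taken wherever the direction turns by at least a threshold number of 45-degree steps. That threshold rule and the `min_turn` parameter are hypothetical stand-ins: the paper's own feature-point criterion is not given in the abstract.

```python
# Freeman 8-direction codes with the y axis pointing up: 0=E, 1=NE, 2=N, ..., 7=SE.
DIRECTIONS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
              (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(contour):
    """Encode a closed, 8-connected contour (list of (x, y)) as Freeman codes.
    codes[i] is the step from contour[i] to the next point (wrapping around)."""
    return [DIRECTIONS[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(contour, contour[1:] + contour[:1])]

def feature_points(contour, min_turn=2):
    """Return points where the direction changes by >= min_turn steps of 45 degrees."""
    codes = chain_code(contour)
    points = []
    for i, point in enumerate(contour):
        turn = (codes[i] - codes[i - 1]) % 8  # outgoing minus incoming direction
        if min(turn, 8 - turn) >= min_turn:
            points.append(point)
    return points

def split_at(contour, corners):
    """Split the closed contour into runs between consecutive feature points."""
    idx = [contour.index(p) for p in corners]
    segments = []
    for a, b in zip(idx, idx[1:] + idx[:1]):
        segments.append(contour[a:b + 1] if b > a else contour[a:] + contour[:b + 1])
    return segments
```

For a 3x3 square outline, the four corners come out as feature points and the contour splits into four side segments. Real glyph contours are noisier and would need smoothing before a turn threshold like this is usable.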
|
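An HMM-Based On-line Chinese Character Recognition System and an Improved Training Method
[Authors]刘家锋; 黄健华; 唐降龙;
[Chinese abstract]This paper describes the design and implementation of an HMM-based on-line Chinese character recognition system. The system treats the stroke-segment sequence of an on-line character as the observation sequence. It uses a model structure with multiple skip (cross-over) transitions to cope with the redundant and missing stroke segments that occur in freely written characters. Training the HMMs is a major design issue, because parameter training for complex HMMs tends to converge to a local minimum. Exploiting the characteristics of on-line character recognition, the paper proposes an improved training method that uses a "guide model" and keeps training from converging to a local minimum. After training on a large number of samples, the system achieved fairly satisfactory results on both neatly written and freely written characters.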
[Abstract]This paper describes the design and implementation of an on-line Chinese character recognition system based on hidden Markov models. The strokes of an on-line Chinese character are regarded as the input observation sequence, and a multi-cross left-right model structure is employed to eliminate the influence of redundant or lost strokes. The training of HMM models is also an important problem for this system; in order to keep the training process from falling into a local minimum,...
[Keywords]hidden Markov model; on-line Chinese character recognition;
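Skip transitions can absorb missing strokes, and this sketch shows how. It assumes a discrete-observation left-right HMM in which each state may stay, move to the next state, or skip one state. It scores sequences with the standard forward algorithm, requiring the path to end in the final state. Self-loops absorb redundant strokes, and skips absorb missing ones. The transition probabilities and the identity emission matrix in the usage are hypothetical. The paper's guide-model training is not shown.

```python
def left_right_skip_transitions(n_states, p_stay=0.4, p_next=0.4, p_skip=0.2):
    """Left-right transition matrix with self-loops, next-state moves and one-state skips.
    Rows near the end are renormalised over the targets that exist."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        valid = [(j, p) for j, p in ((i, p_stay), (i + 1, p_next), (i + 2, p_skip))
                 if j < n_states]
        total = sum(p for _, p in valid)
        for j, p in valid:
            A[i][j] = p / total
    return A

def forward(obs, A, B, pi):
    """Forward algorithm: probability of `obs` over paths ending in the final state.
    A[i][j]: transition prob, B[i][k]: prob that state i emits symbol k,
    pi[i]: initial prob. No scaling is done, so long sequences may underflow."""
    n = len(A)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbol]
                 for j in range(n)]
    return alpha[-1]
```

Take a 3-state model where state i emits stroke type i. The sequence `[0, 2]`, with the middle stroke missing, gets probability 0.2 through the skip. With `p_skip=0.0` it gets probability 0.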
|
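Architecture Design of the “炎黄” (Yanhuang) Chinese Platform
[Authors]吴健; 孙玉芳; 李国华; 李祥凯;
[Chinese abstract]As computer use in China has matured and the Internet has spread rapidly, the 6763 Chinese characters in GB 2312-80 no longer meet application needs. The ISO 10646 standard provides ample code space for developing Chinese platforms that support large Chinese character sets. Our goal is to investigate a practical, efficient Chinese-platform solution for existing open systems, together with the techniques to implement it. The platform should:

- support the CJK large character set of ISO 10646 and multiple internal codes;
- be compatible with existing Chinese platforms;
- be independent of the underlying English system and its version;
- conform to international and national standards;
- offer a degree of cross-platform capability.

This paper describes in detail the platform's design goals, its module structure and how each subsystem is implemented.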
[Abstract]As computer applications deepen and the Internet becomes more and more popular, the 6763 Chinese characters defined in GB 2312-80 can no longer meet the needs. The ISO 10646 standard provides a broad code space for developing Chinese platforms that support large Chinese character sets. We have studied techniques for implementing Chinese platforms. Our Chinese platform supports the CJK large Chinese character set of ISO 10646 and multiple internal codes, and it is compatible with present Chi...
[Keywords]ISO 10646 standard; large character set; GBK; Chinese platform; cross-platform;
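Python's built-in `gb2312`, `gbk` and `utf-8` codecs can illustrate why multiple internal codes matter. They stand in here for a platform's own code converters, and this sketch is not the Yanhuang platform's code. It counts the hanzi decodable in the GB 2312 hanzi rows (EUC-CN lead bytes 0xB0 to 0xF7) and checks whether a character can be encoded in a given internal code. The example character 镕 (U+9555) is a choice made for illustration. It is absent from GB 2312, present in GBK, and representable in any Unicode/ISO 10646 encoding.

```python
def count_gb2312_hanzi():
    """Count two-byte EUC-CN codes in the hanzi rows (lead 0xB0-0xF7) that decode as GB2312."""
    count = 0
    for lead in range(0xB0, 0xF8):
        for trail in range(0xA1, 0xFF):
            try:
                bytes([lead, trail]).decode("gb2312")
            except UnicodeDecodeError:
                continue  # unassigned position in the GB 2312 table
            count += 1
    return count

def encodable(char, encoding):
    """True if `char` can be represented in the given internal code."""
    try:
        char.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

if __name__ == "__main__":
    print("GB 2312 hanzi:", count_gb2312_hanzi())
    for enc in ("gb2312", "gbk", "utf-8"):
        print(enc, encodable("镕", enc))
```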
|
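A Theoretical Study of the Computer Generation of Whole Words in Mongolian Script
[Authors]S·苏雅拉图;
[Chinese abstract]Using an object-oriented approach, this paper simulates the mechanisms by which the various forms of traditional Mongolian whole words are constructed and proposes several data models for generating Mongolian whole words by computer. Drawing mainly on three of these models, it examines the three basic factors in a theory of optimal whole-word generation for traditional Mongolian script: accuracy, time complexity and space complexity. It also examines what optimal generation must account for, namely the loading of complex whole-word features and the knowledge representation and computational structure for integrated unification-based computation. It proves that the "B-J-T=W" data model is the optimal object model for computing and generating traditional Mongolian whole words.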
[Abstract]The author simulates the construction mechanisms of the various forms of traditional Mongolian words and proposes several mathematical models for the computer construction of whole words. Based on these models, the author investigates accuracy, time complexity and space complexity, the three key elements of a theory of optimal computer word construction for the traditional Mongolian written language. It also gives a study on computational structure, parallel knowledge processing method an...
[Keywords]whole words in alphabetic scripts; generation model; accuracy; time complexity; space complexity; optimization;
|
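Call for Papers: Fifth National Members' Representative Congress of the Chinese Information Processing Society of China and Academic Conference Celebrating the Society's 20th Anniversary
[Authors]
[Chinese abstract]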
[Abstract]
[Keywords]annual academic conference; Chinese information;
|
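Learning Chinese Syntactic Rules from Part-of-Speech and Semantic Knowledge
[Authors]苑春法; 陈刚; 黄昌宁;
[Chinese abstract]This paper proposes a new method for learning Chinese syntactic rules. Its distinguishing feature is that part-of-speech, semantic and context-dependent information are used both in learning the rules and in representing them. The method automatically learns context-free binary rules. It also automatically discovers ambiguous structures in part-of-speech collocations and uses semantic and context-dependent information to eliminate the ambiguous rules before syntactic analysis. Experimental results show that the method handles the automatic acquisition and disambiguation of Chinese syntactic rules well and greatly reduces the difficulty of syntactic analysis, which indicates good prospects for application.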
[Abstract]In this paper, we put forward a new method for learning Chinese grammar rules. The key point of this method is using part-of-speech, semantic and contextual information together in the learning and expression of Chinese grammar rules. In this way, not only can some context-free rules be learnt, but the ambiguous structures in part-of-speech collocations can also be found automatically, and the ambiguity problem can be solved before parsing. Experimental results demonstrate that Chinese grammar rules could be acquired a...
[Keywords]syntactic parsing; binary semantic rules; binary part-of-speech rules; prohibition rules;
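The sketch below shows one way to collect binary POS rules and flag ambiguous part-of-speech collocations. It assumes a hypothetical treebank of binary-branching trees written as nested tuples. A pair of adjacent tags counts as "combined" only when the two leaves are direct sisters. Any tag pair seen both combined and not combined is flagged as ambiguous. The treebank format, this simplified notion of combining, and the example trees are all assumptions, not taken from the paper. The semantic and contextual features the authors use to resolve the ambiguity are not modelled.

```python
from collections import Counter, defaultdict

# Hypothetical treebank format: a leaf is a POS tag string; an internal node is
# (label, left_child, right_child), so every tree is binary-branching.

def label(node):
    """Category of a node: the tag for a leaf, the phrase label otherwise."""
    return node if isinstance(node, str) else node[0]

def tags(node):
    """POS tags of a tree, left to right."""
    return [node] if isinstance(node, str) else tags(node[1]) + tags(node[2])

def walk(node, start, rules, sisters):
    """Count binary rules and record positions i where leaves i and i+1 are sisters.
    Returns the end position of the node's span."""
    if isinstance(node, str):
        return start + 1
    lab, left, right = node
    mid = walk(left, start, rules, sisters)
    end = walk(right, mid, rules, sisters)
    rules[(label(left), label(right), lab)] += 1
    if isinstance(left, str) and isinstance(right, str):
        sisters.add(start)
    return end

def learn(treebank):
    rules = Counter()
    outcome = defaultdict(lambda: [0, 0])  # (tag_a, tag_b) -> [combined, not combined]
    for tree in treebank:
        sisters = set()
        walk(tree, 0, rules, sisters)
        seq = tags(tree)
        for i in range(len(seq) - 1):
            outcome[(seq[i], seq[i + 1])][0 if i in sisters else 1] += 1
    # A tag pair seen both combining and not combining is an ambiguous collocation.
    ambiguous = {pair for pair, (yes, no) in outcome.items() if yes and no}
    return rules, ambiguous

# Two readings of 咬死了猎人的狗 (v n u n): "the dog that bit the hunter to death"
# versus "bit the hunter's dog to death".
READING_A = ("NP", ("DE", ("VP", "v", "n"), "u"), "n")
READING_B = ("VP", "v", ("NP", ("DE", "n", "u"), "n"))
```

On these two readings, `(v, n)` and `(n, u)` are flagged as ambiguous, while `(u, n)` never combines directly and is not flagged.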
|
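Research on Methods for Automatic Semantic-Class Tagging
[Authors]齐璇; 王挺; 陈火旺;
[Chinese abstract]Syntactic analysis alone cannot meet the needs of Chinese analysis, whereas methods that combine syntax and semantics are well suited to it. Such analysis needs a syntactic-semantic dictionary. Most current machine-readable dictionaries are grammatical dictionaries, so semantic information about words has to be added to them. 《同义词词林》 (Tongyici Cilin) is a good semantic-class dictionary but contains no grammatical information. Its classification system can therefore be used to tag a grammatical dictionary with semantic classes, which yields a syntactic-semantic dictionary. This process produces inconsistencies; in particular, words not listed in Tongyici Cilin cannot be tagged directly. Using the Tongyici Cilin classification system, this paper designs an algorithm for automatically tagging Chinese words with semantic classes. It applies the algorithm to Peking University's 《现代汉语语法信息词典》 (Grammatical Knowledge-base of Contemporary Chinese). The experimental results are fairly satisfactory, with an accuracy of 91%.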
[Abstract]During the development of a machine translation system, we need a semantic class dictionary for semantic analysis. This paper designs an algorithm for automatic semantic class tagging, which tags words using “TongYiCiCiLin”. The idea of the algorithm is that the meaning of most words can be represented by their main component. We did an experiment tagging the words in “XianDaiHanYuYuFaXinXiCiDian” and obtained a machine-readable syntactic-semantic dictionary.
[Keywords]semantic class; semantic-class tagging; natural language processing;
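The English abstract says the meaning of most words can be represented by their main component. A minimal sketch of that idea tags a word directly when it is in the thesaurus. Otherwise it backs off to the longest proper suffix that is, on the assumption that the final morpheme heads the compound, as in typical modifier-head nouns. The lexicon entries and class names are hypothetical placeholders, not real Tongyici Cilin codes. The paper's handling of inconsistent entries is not modelled.

```python
# Hypothetical Cilin-style lexicon: word -> semantic class. The class names are
# illustrative placeholders, not real Tongyici Cilin codes.
CILIN = {"汽车": "vehicle", "车": "vehicle", "医": "person", "医生": "person", "花": "plant"}

def tag(word):
    """Return (semantic_class, how) for a word, where `how` records the tagging route."""
    if word in CILIN:
        return CILIN[word], "direct"
    # Back off to the longest proper suffix found in the lexicon, assuming the
    # final morpheme is the head of a modifier-head compound.
    for k in range(1, len(word)):
        suffix = word[k:]
        if suffix in CILIN:
            return CILIN[suffix], "head"
    return None, "untagged"
```

For example, 卡车 ("truck") is absent from the toy lexicon but inherits `vehicle` from its head 车. Words whose final morpheme is unknown stay untagged.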
|
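Construction and Implementation of a Chinese Lexicon for Categorial Grammar Parsing
[Authors]秦莉娟; 周昌乐;
[Chinese abstract]This work uses categorial grammar, built on Montague grammar, for the syntactic analysis of Chinese. It also carries out computational research on cross-level relaxed association to handle the uncertainty of dynamic category tagging in Chinese. Both require a correspondingly categorized machine lexicon. This paper builds a Chinese lexicon for categorial grammar parsing by starting from a basic lexicon and generating extensions from it. Besides the features of an ordinary lexicon, it records each word's category membership, predicative information (词谓), usage (词用) and related information, for selection during categorial parsing. Experimental results show that, under the assumption of completeness, the lexicon performed well in testing.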
[Abstract]Based on Montague semantics, a corresponding categorial lexicon is badly needed for Chinese analysis, with emphasis on determining the categories of words. In this article, a new way of building the lexicon is proposed: it is developed from a basic lexicon by extending that lexicon with its own entries. This Chinese lexicon can not only be used as a general dictionary, but also provides reference information for Chinese understanding. The experimental results show encouraging performance.
[Keywords]categorial grammar; Chinese lexicon; natural language understanding;
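The idea of a basic lexicon plus extension generation can be illustrated with a small sketch. It pairs a CKY-style parser using only forward and backward function application (classic AB categorial grammar) with a lexicon produced by an extension rule. The rule used here, "a bare noun N may also act as NP", is a hypothetical example of extension generation, and the three-word lexicon is also invented. The paper's dynamic category tagging and cross-level relaxed association are not modelled.

```python
from itertools import product

# Categories: atomic strings, or (slash, result, argument) tuples.
# ("/", X, Y)  = X/Y: takes a Y on its right to give X (forward application).
# ("\\", X, Y) = X\Y: takes a Y on its left to give X (backward application).
TV = ("/", ("\\", "S", "NP"), "NP")  # transitive verb: (S\NP)/NP

# Hypothetical basic lexicon.
BASIC = {"我": {"NP"}, "喜欢": {TV}, "花": {"N"}}

def extend(lexicon):
    """Hypothetical extension rule: a bare noun (N) may also act as a noun phrase (NP)."""
    return {word: set(cats) | ({"NP"} if "N" in cats else set())
            for word, cats in lexicon.items()}

def combine(left, right):
    """All categories obtainable from adjacent categories by function application."""
    results = set()
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        results.add(left[1])
    if isinstance(right, tuple) and right[0] == "\\" and right[2] == left:
        results.add(right[1])
    return results

def parse(words, lexicon):
    """CKY chart parse; returns the set of categories spanning the whole input."""
    n = len(words)
    chart = {(i, i + 1): set(lexicon.get(w, ())) for i, w in enumerate(words)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):
                for a, b in product(chart[(i, k)], chart[(k, j)]):
                    cell |= combine(a, b)
            chart[(i, j)] = cell
    return chart[(0, n)]
```

With only the basic lexicon, 我 喜欢 花 ("I like flowers") gets no analysis, because 花 is only N. After extension it parses as S.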
|