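Lexical-Knowledge-Driven Automatic Mapping of WordNet
[Authors]柯淑津;
[Summary]Machine-readable dictionaries contain a wealth of lexical semantic knowledge that can be extracted automatically and put to effective use in a range of natural language processing research. This study proposes a method that takes the English WordNet as its backbone and combines two techniques, matching genus terms and matching definition content, to map WordNet synsets to entries in the Longman Dictionary of Contemporary English (English-Chinese bilingual edition). The mapping is then used to label WordNet synsets with Chinese translations. In the experiments, words are divided by degree of ambiguity into monosemous and polysemous groups. For monosemous words, precision reaches 97.7% at 100% coverage. For polysemous words, precision is 85.4% with 63.4% coverage.
[Abstract]Machine-readable dictionaries have been regarded as a rich knowledge source from which various relations in lexical semantics can be effectively extracted. These semantic relations have been found useful for supporting a wide range of natural language processing tasks, from information retrieval to interpretation of noun sequences, and to machine translation. In this paper, we address issues related to problems in building a linkage between two existing lexical resources, WordNet and LDOCE E/C (English ...
[Keywords]lexical network; machine-readable dictionary; statistical processing; genus term; automatic mapping;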
|
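Automatic Sub-Article-Level Indexing and Alignment of Chinese-English Legal Documents
[Authors]吕学强; 李清隐; 陈文亮; 姚天顺;
[Summary]This paper presents a hierarchical structure model of legal documents based on structure identifiers. The model describes the hierarchical structure of Chinese-English legal documents and the continuity and correspondence of chapters, articles, and sub-articles. Based on the model, sub-article-level automatic indexing and alignment of Chinese-English legal documents is implemented, with error-correction and fault-tolerance capabilities. Experiments show an average indexing time of 3.31 ms per document and an alignment accuracy of 98.6%; combined with a lexicon-based method, alignment accuracy reaches 99.3%.
[Abstract]A hierarchical structure model of legal documents based on structure identifiers is presented in this paper. The model reflects the hierarchical characteristics and the continuity and correspondence of chapters, articles, and sub-articles. A system for indexing and aligning Chinese-English legal documents at the sub-article level is implemented on the basis of the model; it supports fault correction and fault tolerance. The test results show that the average indexing time is 3.31 ms, and the alignment precision is 9...
[Keywords]text indexing; text alignment; Chinese-English legal documents; structure identifier; hierarchical structure model;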
|
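A Hybrid-Analysis Approach to Syntactic and Semantic Analysis of Chinese Text
[Authors]尹凌; 姚天昉; 张冬茉; 李芳;
[Summary]This paper proposes a domain-specific method for syntactic and semantic analysis of Chinese text. In keeping with the characteristics of domain texts, the method combines shallow parsing with deep syntactic-semantic analysis. The shallow parsing component uses finite-state cascades to recognize the named entities in the text, greatly reducing the burden on the deep analysis component. The deep component combines semantic and syntactic analysis and relies mainly on word collocation information to determine sentence structure. The method achieves good experimental results in resolving phrase-structure ambiguity in domain-specific text.
[Abstract]This paper proposes a domain-specific method for analyzing Chinese text. In keeping with the characteristics of the domain texts, the method combines shallow parsing with deep parsing and semantic analysis. Drawing on the finite-state cascade method, its shallow parsing module recognizes named entities in the texts, which greatly eases the burden on the deep analysis module. Relying principally on word collocation information, its deep analysis module combines syntactic analysis and semantic analysis to determi...
[Keywords]shallow parsing; deep parsing; finite-state cascades; semantic subfields;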
|
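Statistical Analysis of Lexical Semantic Similarity in Tang and Song Poetry and Its Applications
[Authors]胡俊峰; 俞士汶;
[Summary]Context-based vector space models of words can approximately describe word meaning. Word similarity or clustering relations defined on this basis can be applied in many areas, such as dictionary compilation and the development of intelligent search engines. This study is based on a corpus of 6.40 million characters of Tang and Song poetry. Building on computer-assisted extraction of multi-character words, it defines a corresponding statistical representation of word meaning, constructs a semantic network of word similarity relations, and develops a concept-oriented search engine for Tang and Song poetry with word-sense association. Experiments indicate that the system approaches a practically usable level.
[Abstract]The contexts in which words occur can be used to describe the similarity of their meanings. Corpus-based extraction of similar words can be used in various fields such as lexicon compilation and intelligent search engines. Based on 6.4 million characters of classical Chinese poetry, a statistical model was defined to extract contextually similar words from the corpus. A concept-based intelligent search engine for classical Chinese poetry was developed on top of the word similarity relations. The result is encouraging.
[Keywords]word-sense similarity; word-sense association; concept retrieval; Tang and Song poetry;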
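The idea of a context-based word vector space can be illustrated with a minimal sketch. It is not the authors' statistical model: the window size, the raw co-occurrence counts, the cosine measure, and the function names `context_vectors` and `cosine` are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt

def context_vectors(lines, window=1):
    """Map each token to a Counter of the tokens seen within `window` positions of it.

    `lines` is an iterable of token sequences (hypothetical toy input).
    """
    vectors = {}
    for line in lines:
        for i, token in enumerate(line):
            lo, hi = max(0, i - window), min(len(line), i + window + 1)
            ctx = vectors.setdefault(token, Counter())
            for j in range(lo, hi):
                if j != i:
                    ctx[line[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

In this toy setting, two tokens that always appear between the same neighbours get a similarity close to 1, while tokens with no shared neighbours get 0.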
|
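Design and Construction of the AllBalanced (《全衡》) Dictionary
[Authors]张小衡; 张群显;
[Summary]AllBalanced (《全衡》) is the first online Chinese character input system to take fairly comprehensive account of both Hong Kong and international needs, and its core component is a dictionary. The dictionary has more than 60,000 entries, each describing one word, with information including its simplified form, traditional form, Hanyu Pinyin expression, Cantonese romanization (Jyutping) expression, Changjie input code, and Sucheng input code. Starting from any one of these items, the system's retrieval program makes it easy to look up the others. This not only strongly supports Chinese character input but is also helpful for learning Chinese. This paper briefly introduces the construction of the AllBalanced dictionary.
[Abstract]AllBalanced is the first Web-based Chinese character input system with substantial functions to meet the needs of Hong Kong in particular and of the international community in general. The primary knowledge base of the system is a dictionary of over 60,000 Chinese word entries encoded in Unicode. The contents of each word entry include the traditional characters of the word, the simplified characters, the Hanyu Pinyin expression, the Jyutping expression, the Changjie code and the Sucheng code. The presen...
[Keywords]online Chinese character input; dictionary compilation; Hanyu Pinyin; Cantonese romanization; simplified characters; traditional characters;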
|
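A Structural Compression Algorithm for Outline Fonts and Its Implementation
[Authors]宋晓丹; 罗予频;
[Summary]To address the large storage requirements of CJK outline fonts, this paper proposes a structural compression algorithm. The algorithm applies a set transformation to CJK fonts to obtain stroke-set elements, uses clustering to derive template strokes, and stores and invokes similar data in a unified way. The paper also proposes a stroke-extraction algorithm based on stroke segments, which realizes the set transformation from a graph-theoretic perspective. Results show that the algorithm performs well and is applicable to multiple fonts.
[Abstract]Because the ability to display and output high-quality characters for Chinese, Japanese, and Korean (CJK) languages is still limited, we propose a structure-based compression scheme. The approach uses "Stroke Extraction" to transform CJK outline fonts from a Contour Set to a Stroke Set. The strokes are then clustered to find appropriate templates, and similar strokes are saved as templates plus rebuilding information. We also present a topological stroke extraction algorithm based on stroke segments. The results...
[Keywords]structural outline fonts; font compression; stroke extraction;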
|
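A Study of Acoustic-Level Recognition Based on a Pinyin Model
[Authors]黄顺珍; 方棣棠;
[Summary]This paper introduces the principle and application of the pinyin model. The pinyin model is a trigram model obtained by accumulating the data for homophonous characters in the language model. It is a new stage inserted between the original acoustic model and language model, and can be used to obtain the prior probability of the relevant pinyin strings. Experimental results show that, used as post-processing for acoustic-level recognition, it raises the top-1 recognition rate by 13 percentage points and makes the top-5 recognition rate comparable to the top-10 rate of the original acoustic model output.
[Abstract]The principle and application of the Pinyin model are introduced in this paper. The Pinyin model is a trigram model obtained by summing the data for homophonous characters in the language model, and it is a new link between the original acoustic and language models. It can be used to obtain the prior probability of the relevant Pinyin strings. Experimental results show that, when the model is used for post-processing of acoustic-level recognition, the top-1 recognition rate increases by 13 percentage points, and the top-5 rate is similar to the...
[Keywords]acoustic model; pinyin model; language model; continuous speech recognition;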
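The accumulation step described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it assumes character-level trigram counts, exactly one pinyin reading per character, and a plain relative-frequency prior with no smoothing. The names `build_pinyin_trigrams` and `prior` are hypothetical.

```python
from collections import defaultdict

def build_pinyin_trigrams(char_trigram_counts, char_to_pinyin):
    """Sum character trigram counts over homophones to get pinyin trigram counts.

    Assumes every character in the counts has exactly one entry in
    `char_to_pinyin` (polyphonic characters are ignored in this sketch).
    """
    pinyin_counts = defaultdict(int)
    for (c1, c2, c3), n in char_trigram_counts.items():
        key = (char_to_pinyin[c1], char_to_pinyin[c2], char_to_pinyin[c3])
        pinyin_counts[key] += n
    return dict(pinyin_counts)

def prior(pinyin_counts, trigram):
    """Relative frequency of a pinyin trigram (unsmoothed, for illustration only)."""
    total = sum(pinyin_counts.values())
    return pinyin_counts.get(trigram, 0) / total if total else 0.0
```

Character trigrams that differ only in homophonous characters collapse onto the same pinyin trigram, so their counts add up before the prior is computed.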
|
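A Statistical Learning Method for Grapheme-to-Phoneme Conversion of Polyphonic Characters
[Authors]张子荣; 初敏;
[Summary]Grapheme-to-phoneme conversion is an important module in a speech synthesis system, and determining the pronunciation of polyphonic words and of polyphonic characters that occur as single-character words has long been a problem without a good solution. Based on statistics from a large corpus annotated with correct pinyin, this paper shows that focusing on 41 key polyphonic characters and 22 key polyphonic words is enough to largely solve grapheme-to-phoneme conversion. A stochastic decision list method based on extended stochastic complexity is used to extract pronunciation rules for polyphonic characters (words) automatically, reducing the conversion error rate from 8.8‰ to 4.4‰. Annotating the training and test material is labor- and time-intensive, while the quantity and quality of training material directly affect the final result. The paper proposes a semi-automatic corpus annotation workflow that saves nearly half of the labor and time.
[Abstract]Grapheme-to-phoneme conversion is a very important module in a TTS system. How to decide the pronunciation of polyphonic characters is a problem that has not been solved well in Chinese. By studying a large corpus with correct pinyin transcription, this paper points out that 41 key characters and 22 key words should be processed. This paper proposes to use the ESC-based stochastic decision list to learn pronunciation rules for words with multiple pronunciations. With the generated rules, the error rate for graphe...
[Keywords]stochastic decision list based on extended stochastic complexity (ESC); grapheme-to-phoneme conversion; polyphonic characters; polyphonic words;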
|
International Symposium on Information Processing of Ancient Scripts
[Authors]张再兴;
[Keywords]information processing of ancient scripts; collation of ancient scripts;
|
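Research on Large-Scale Personalized News Customization Based on NewsML
[Authors]朱友芹; 祁宁; 夏国平;
[Summary]This paper analyzes the composition and usage of NewsML, a new standard for multimedia news information, and introduces the knowledge-economy idea of mass customization of industrial products into news publishing systems. Applying advanced techniques such as intelligent computation, metadata, and multidimensional data cubes, it proposes a prototype system for large-scale personalized news customization based on NewsML.
[Abstract]This paper analyses the components and application of NewsML, a new standard for multimedia news information. The idea of mass customization, which arose in the knowledge economy, where industrial products are produced on a large scale yet personalized, is introduced into the news publishing system. By applying several advanced techniques such as intelligent computation, metadata and multidimensional data cubes, it puts forward a prototype system named Mass Customization News Bas...
[Keywords]NewsML; mass customization; news customization;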
|