Cluster Recognition of Handwritten Chinese Characters
[Authors] 姜珊; 孙玉方;
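[Summary] To reduce the error rate of single-character recognition, the paper analyzes the general model of Chinese character recognition and, on that basis, builds a cluster recognition model suited to multi-character recognition. The model is supported by both a theoretical proof and experiments. The experimental results show that cluster recognition based on the multi-character model reliably improves recognition of continuous text and is a promising direction for handwritten Chinese character recognition.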
[Abstract] In order to decrease the error rate of single handwritten Chinese character recognition, this paper discusses the general model of Chinese character recognition. Based on the discussion, the paper proposes a model of multi-character recognition. To fully support multi-character recognition, the paper supplies a theoretical proof and experimental results. Our experimental results show that multi-character recognition can increase the text recognition rate and is a promising direction for handwritten Chinese character reco...
[Keywords] handwritten Chinese character recognition; Chinese character recognition model; cluster recognition;
|
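Kana-to-Kanji Conversion by the Extended Minimum Phrase Count Method
[Authors] 中岛晃;
[Summary] This paper describes an improved version of the "minimum phrase count method" and proposes a new algorithm, called the "extended minimum phrase count method", which redefines how the phrases in a sentence are counted. To achieve this, the results of dictionary lookup and grammar checking are kept in a tree structure from the start of reading until all the kana of the sentence have been read in. With this algorithm, the conversion rate reaches 95.8% measured per kana character and 88.9% measured per phrase.
[Abstract] We tried to enhance the method of "minimizing the number of phrases in a sentence" and proposed a new algorithm, named "minimizing the number of phrases in a sentence in a broad sense". We redefined the way of counting phrases in a sentence. To realize this, we searched the dictionary, checked grammatical rules, and kept the results in 'tree' form until the analysis of all the 'Yomi' was finished. In evaluating the conversion accuracy of this algorithm, we obtained 95.8% based on the count of Kana characters and 88.9% on ...
[Keywords] lexical analysis; minimum phrase count method; kana-to-kanji conversion;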
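The baseline idea behind a minimum phrase count method can be sketched as a small dynamic program: among all ways of splitting a kana string into dictionary entries, pick one with the fewest pieces. The sketch below is only an illustration of that baseline, not the extended algorithm of the paper. The function name `fewest_phrases`, the toy lexicon, and the treatment of every dictionary entry as one "phrase" (with no grammar checking and no tree of candidates) are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def fewest_phrases(kana: str, lexicon: Dict[str, str]) -> Optional[List[str]]:
    """Split `kana` into the fewest lexicon entries and return their written forms.

    Returns None when the string cannot be covered by lexicon entries.
    Ties are broken in favour of the first split found.
    """
    n = len(kana)
    inf = float("inf")
    best = [inf] * (n + 1)  # best[i] = fewest entries covering kana[:i]
    back = [-1] * (n + 1)   # back[i] = start index of the last entry in that split
    best[0] = 0
    for end in range(1, n + 1):
        for start in range(end):
            if best[start] + 1 < best[end] and kana[start:end] in lexicon:
                best[end] = best[start] + 1
                back[end] = start
    if best[n] == inf:
        return None
    # Walk the back-pointers from the end of the string to recover the split.
    pieces: List[str] = []
    pos = n
    while pos > 0:
        start = back[pos]
        pieces.append(lexicon[kana[start:pos]])
        pos = start
    return pieces[::-1]


# Hypothetical toy lexicon: kana reading -> written form.
TOY_LEXICON = {"にわ": "庭", "に": "に", "わ": "輪", "はな": "花", "は": "は", "な": "名"}
```

With the toy lexicon, `fewest_phrases("にわにはな", TOY_LEXICON)` prefers the three-piece split over the five-piece one made of single kana.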
|
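Automatic Indexing of the Discourse Structure of a Class of Regular Texts
[Authors] 单永明;
[Summary] By describing and analyzing concepts such as the level of titles and paragraphs and the type of titles in Chinese text, this paper discusses the indexing of discourse structure in Chinese text, introduces the concept of regular text, and gives a tagging method for the discourse structure of regular text. On that basis, it discusses and implements automatic indexing of the discourse structure of regular text and gives the indexing algorithm.
[Abstract] Using definitions and descriptions of notions such as the level of titles and paragraphs and the type of titles in Chinese text, this paper discusses indexing the discourse structure of Chinese text. The concept of regular Chinese text and a tagging method for its discourse structure are defined. On this basis, an implementation technique and an algorithm for automatic indexing of the discourse structure of regular Chinese text are presented.
[Keywords] Chinese information processing; automatic text analysis; automatic indexing; discourse structure; indexing algorithm;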
|
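Research and Implementation of Word Segmentation for a Chinese Database Query Language
[Authors] 徐九韵; 仝兆岐; 向逐聪; 王新民;
[Summary] Taking the special characteristics of database querying into account, segmentation dictionaries are built in tiers according to how much the words in a query statement contribute to the data query. A stepwise forward single-scan word segmentation method (DSWS) is then proposed, and its time complexity is analyzed.
[Abstract] We consider the special characteristics of data retrieval and build word segmentation dictionaries based on the vocabulary of data retrieval. We then propose the Different step--Single scan Word Segmentation (DSWS) method and analyze the time complexity of the segmentation algorithm. Word segmentation based on the language environment in this way is useful for disambiguation and for segmentation efficiency.
[Keywords] Chinese word segmentation; segmentation dictionary; database query;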
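The abstract gives no details of DSWS beyond tiered dictionaries and a forward single scan, so the following is not that method. It is a minimal sketch of ordinary forward maximum matching that consults several dictionaries in priority order during one left-to-right pass. The function name `segment`, the `max_len` limit, the tier contents, and the rule that a higher tier wins before a longer match in a lower tier is considered are all assumptions made for illustration.

```python
from typing import List, Sequence, Set, Tuple


def segment(text: str, tiers: Sequence[Set[str]], max_len: int = 4) -> List[Tuple[str, int]]:
    """Forward single-pass segmentation over prioritized dictionaries.

    Returns (word, tier_index) pairs; tier_index is -1 for a character
    that no dictionary covers.
    """
    result: List[Tuple[str, int]] = []
    i = 0
    while i < len(text):
        word, tier_no = text[i], -1  # fallback: emit a single character
        found = False
        for n, words in enumerate(tiers):  # higher-priority tiers first
            # Within a tier, try the longest candidate first.
            for length in range(min(max_len, len(text) - i), 0, -1):
                candidate = text[i:i + length]
                if candidate in words:
                    word, tier_no = candidate, n
                    found = True
                    break
            if found:
                break
        result.append((word, tier_no))
        i += len(word)  # the scan never moves backwards
    return result


# Hypothetical tiers: tier 0 holds schema terms, tier 1 holds query function words.
TOY_TIERS = [{"学生", "成绩"}, {"查询", "的"}]
```

For the toy query `"查询学生的成绩"` ("query the students' grades"), the scan yields four words, each tagged with the tier that matched it.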
|
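An XOR Dynamic Hashing Grouped Search Algorithm for Chinese Characters
[Authors] 王忠效; 范植华;
[Summary] Based on the characteristics of the internal codes of Chinese characters, this paper proposes a dynamic hashing grouped search algorithm suited to Chinese information processing. The algorithm uses a simple XOR hash function to divide Chinese characters into groups and searches sequentially through a chained structure within each group. Because the hash is uniform, its asymptotic time complexity is O(1).
[Abstract] Based on an analysis of the machine codes of Chinese characters, this paper proposes a dynamic hashing algorithm for quick search of Chinese characters, which adopts a simple XOR operation to disperse all the possible Chinese characters evenly into 256 groups and follows with a linear search within each group. Experiments show that the algorithm is of practical value and that its asymptotic time complexity is O(1).
[Keywords] Chinese character search; hash search; hash function; adaptive hash search;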
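The structure described here (an XOR hash into 256 groups, with a chained sequential search inside each group) can be sketched as follows. The abstract does not give the hash function itself, so the one below is an assumption: it XORs the high and low bytes of the character's Unicode code point, which stands in for the internal code the paper analyzes. The names `xor_hash` and `HanziTable` are likewise hypothetical.

```python
from typing import Any, List, Optional, Tuple

NUM_GROUPS = 256  # group count stated in the abstract


def xor_hash(ch: str) -> int:
    """Assumed hash: XOR the high and low bytes of the code point, kept to 8 bits."""
    cp = ord(ch)
    return ((cp >> 8) ^ cp) & 0xFF


class HanziTable:
    """Characters are grouped by xor_hash; each group is a chain searched in order."""

    def __init__(self) -> None:
        self.groups: List[List[Tuple[str, Any]]] = [[] for _ in range(NUM_GROUPS)]

    def insert(self, ch: str, value: Any) -> None:
        chain = self.groups[xor_hash(ch)]
        for i, (key, _) in enumerate(chain):
            if key == ch:  # character already present: overwrite its value
                chain[i] = (ch, value)
                return
        chain.append((ch, value))

    def lookup(self, ch: str) -> Optional[Any]:
        for key, value in self.groups[xor_hash(ch)]:  # sequential search in the chain
            if key == ch:
                return value
        return None
```

Lookup cost is proportional to the length of one chain, so it stays roughly constant only while the hash spreads the stored characters evenly across the groups, which is the condition the summary attaches to its O(1) claim.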
|
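Identifying the Predicate Head of Chinese Simple Sentences for EBMT
[Authors] 穗志方; 俞士汶;
[Summary] In an example-based Chinese-English machine translation (EBMT) system, sentences must be analyzed to a suitable degree in order to compute sentence similarity. This paper first proposes a compromise method of Chinese sentence analysis, skeleton dependency analysis, which captures the overall structure of a sentence by determining its predicate head. It then proposes a strategy for identifying the predicate head of a Chinese example sentence from the predicate head of the English example sentence in a Chinese-English example set. The experimental results are satisfactory.
[Abstract] Appropriate parsing of sentences is necessary for sentence similarity calculation in EBMT. This paper puts forward a parsing approach, skeleton analysis, which captures the main structure of a sentence by determining its predicate head. It then discusses a strategy to recognize the predicate head of a Chinese sentence according to the predicate head of the corresponding English sentence. The experimental result is satisfactory.
[Keywords] example-based machine translation; sentence similarity; predicate head; knowledge acquisition; semantic matching;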
|
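A Study of Article Processing in Chinese-English Machine Translation
[Authors] 常宝宝; 刘颖; 刘群;
[Summary] In Chinese-English machine translation, Chinese has no linguistic category corresponding to English articles, and article usage rules written for people can hardly meet the needs of machine translation, so misuse of articles seriously affects the quality of the final translation. This paper proposes a strategy that applies transformation-based error-driven learning to article processing. Preliminary experiments show that this method can effectively improve the accuracy of article use in machine-translated text.
[Abstract] Because the Chinese language has no category corresponding to English articles, and the article usage rules written for humans are difficult to operationalize for machine translation, articles are often used incorrectly in Chinese-English machine translation systems, which severely degrades the quality of the output translation. In this paper, we propose a strategy for article selection based on the Transformation-Based Error-Driven Learning algorithm. An initial experimental result shows the...
[Keywords] machine translation; article selection; transformation-based error-driven learning;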
|
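A Discussion of Dictionaries in a Japanese-Chinese Machine Translation System
[Authors] 雍殿书; 胡海文; 陈家骏; 王启祥;
[Summary] This paper discusses several dictionary-related problems in a Japanese-Chinese machine translation system: homophones, homographs, words with multiple parts of speech, selection of the Chinese translation word, and the handling of idiomatic patterns. How these problems are solved directly affects the translation quality of the system.
[Abstract] In this paper, we discuss several problems, such as homonyms, polysemants, multi-category words, and how to process idioms. We have done much research on solving these problems, which greatly affect the quality of the generated language in a Japanese-Chinese machine translation system.
[Keywords] machine translation; dictionary; homophone; polysemous word; idiomatic pattern;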
|
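A Segmentation Method for Unconstrained Handwritten Numeral Strings
[Authors] 赵斌; 苏辉; 夏绍玮;
[Summary] For touching characters in unconstrained handwritten numeral strings, this paper proposes a numeral string segmentation method that relies mainly on recognition-based segmentation and combines it with other segmentation strategies such as the dissection method and the holistic recognition method. The method targets numeral strings directly but can also be applied to non-numeral character strings, and its segmentation approach offers some guidance for segmenting touching Chinese characters.
[Abstract] Aiming at segmenting the touching characters in unconstrained handwritten numeral strings, this paper proposes a multi-strategy segmentation method that is mainly based on the recognition-based method, combined with the holistic and dissection methods. The method can treat not only numeral strings but also alphabetic strings. The idea is a useful reference for segmenting Chinese character strings.
[Keywords] segmentation; character recognition; contour; projection; dynamic programming;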
|
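Pattern Analysis of the Tones of Chinese Trisyllabic Words
[Authors] 陶维青; 徐士林; 钟金宏;
[Summary] The tone patterns of Chinese trisyllabic words are complex. This paper analyzes the tones of 192 trisyllabic words, each spoken by 4 men and 4 women, taking the head-tail difference and the relative tone level ratio of each syllable as features. On the basis of feature extraction, statistics, and analysis, it studies the tone patterns and tone sandhi rules of trisyllabic words. The results are of significant value for tone recognition in trisyllabic words and continuous speech.
[Abstract] The tone patterns of Chinese trisyllabic words are rather complicated. This paper presents a tone pattern analysis of 192 trisyllabic words pronounced by each of 4 males and 4 females. The head-tail difference and the relative tone level ratio are chosen as the two main features. Based on these two features and statistical analysis, the behavior of the tone patterns and the rules of tone sandhi for Chinese trisyllabic words are studied. The results are very valuable for the further tone pattern classification of trisyllabi...
[Keywords] trisyllabic words; feature extraction; pattern analysis;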
|