Chinese Character Text Recognition Using Context-Dependent Information
[作者]夏莹; 常新功; 马少平; 朱小燕; 金奕江;
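[摘要]To improve the recognition rate for Chinese text, this paper proposes a post-processing method based on corpus-derived statistical probabilities. The method exploits contextual information that extends beyond the word level. For Chinese text recognition, a sequence of Chinese characters with definite boundaries (in most cases a sentence) is treated as one processing unit. Using statistically obtained character-to-character co-occurrence probabilities together with dynamic programming, the method achieves satisfactory results.
[Abstract]To improve the Chinese text recognition rate, this paper presents a post-processing method based on corpus-derived statistical probabilities. The method uses contextual information beyond lexical-level knowledge. For Chinese text recognition, a bounded sequence of Chinese characters (most often a sentence) is processed as a unit. Co-occurrence probabilities between characters and a dynamic programming strategy are employed to obtain satisfactory recognition results.
[关键字]Chinese character recognition; corpus linguistics; Markov model; post-processing;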
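The abstract above combines per-character recognition candidates with character co-occurrence probabilities through dynamic programming over a sentence. A minimal sketch of that general idea follows, written as a standard Viterbi-style search. It is an illustration under stated assumptions, not the authors' implementation: the function name `best_sequence`, the candidate lists, the recognizer scores, the bigram table, and the `floor` smoothing value are all hypothetical.

```python
import math


def best_sequence(candidates, bigram, floor=1e-6):
    """Pick the most probable character sequence for one sentence.

    candidates: one list per character position, each holding
                (char, recognizer_score) pairs (hypothetical input format).
    bigram:     dict mapping (previous_char, char) to a co-occurrence
                probability; unseen pairs fall back to `floor` (an assumption).
    """
    # scores[c] = best log-score of any path that ends in character c
    scores = {c: math.log(p) for c, p in candidates[0]}
    back = []  # one {char: best previous char} table per later position

    for position in candidates[1:]:
        new_scores, pointers = {}, {}
        for c, p in position:
            # Choose the predecessor that maximizes path score plus transition
            prev = max(
                scores,
                key=lambda q: scores[q] + math.log(bigram.get((q, c), floor)),
            )
            new_scores[c] = (
                scores[prev]
                + math.log(bigram.get((prev, c), floor))
                + math.log(p)
            )
            pointers[c] = prev
        scores = new_scores
        back.append(pointers)

    # Trace the best path backwards from the best final character
    path = [max(scores, key=scores.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


# Made-up example: the recognizer slightly prefers "园" at position 2,
# but the co-occurrence table favors "中" followed by "国".
cands = [[("中", 0.6), ("申", 0.4)], [("园", 0.55), ("国", 0.45)]]
print(best_sequence(cands, {("中", "国"): 0.9}))  # prints 中国
```

Working in log space avoids numerical underflow on long sentences. Processing a whole bounded sequence at once lets context override an isolated per-character top choice.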
|
Text Processing Techniques in the Chinese Page Description Language Interpreter CPDL2
[作者]徐福培; 张炜;
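[摘要]Page description languages (PDLs) are widely used in electronic publishing and related fields. This paper describes several key text processing techniques in the Chinese page description language interpreter CPDL2, including the organization of Chinese and Western font libraries, high-speed character rendering, the interpretation flow, and hinting.
[Abstract]This paper introduces the main text processing technologies in the Chinese Page Description Language interpreter CPDL2. These technologies include the organization of fonts, methods for high-speed character rendering, and hinting technology.
[关键字]text processing; interpreter; Chinese page description language;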
|
The Word Segmentation Problem in Written Chinese: An Informatization Issue Concerning the Whole Nation
[作者]陈力为;
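[摘要]陈力为(电子部计算机与微电子发展研究中心). Written Chinese is written continuously sentence by sentence, with no gaps between words. Consequently, in processing written Chinese (for example, in statistics, analysis, and understanding), the first problem we encounter is word segmentation: converting text written continuously by sentence into text written word by word. Therefore, the correct...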
[Abstract]
[关键字]Chinese information processing; informatization of the national economy; written Chinese; computational linguistics;
|
A Time Warping Algorithm Based on the Genetic Algorithm
[作者]贺前华; 韦岗; 徐秉铮;
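[摘要]This paper proposes a time warping algorithm based on the genetic algorithm and suited to isolated word recognition, and discusses several of its key techniques in detail, such as the encoding method, the fitness technique, and the design of the genetic operators. The algorithm compensates for some shortcomings of dynamic time warping (DTW): (1) it makes the distance normalization factor MΦ depend on the actual path, so that comparisons between different paths are more reasonable; (2) it provides multiple best warping paths in a natural way. An experimental database was built, and on the basis of the experimental results a model for analyzing algorithm performance is proposed: the distances between templates follow a normal distribution. A performance comparison with DTW and the serial multi-path search method shows that the genetic time warping algorithm has a clear advantage in recognition.
[Abstract]In this paper, a Genetic Time Warping (GTW) algorithm for isolated word recognition is proposed. The relative representation technique, the fitness technique, and the reproduction techniques are described, and the genetic operators are discussed in detail. Genetic time warping has some advantages over Dynamic Time Warping (DTW) and the Serial Multi-path Searching Algorithm (SMS): (1) The normalization factor is related to the actual warping path, which makes the comparison of the mean distortion of different warping path...
[关键字]time warping; genetic algorithm; chromosome;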
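For orientation, the DTW baseline that the abstract above compares against can be sketched as follows. This is a textbook dynamic time warping over one-dimensional sequences, not the genetic algorithm from the paper. The function name `dtw`, the absolute-difference local distance, and the step pattern are assumptions. The second return value divides the accumulated distance by the number of steps on the path found, which only illustrates what a path-dependent normalization factor means; it does not search for the path that minimizes the normalized distance.

```python
def dtw(a, b):
    """Classic dynamic time warping between two numeric sequences.

    Returns (total_distance, total_distance / path_length), where
    path_length is the number of aligned pairs on the path found.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = (accumulated distance, path length) for a[:i] vs b[:j]
    cost = [[(inf, 0)] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0.0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance (an assumption)
            # Tuples compare by distance first, then by path length
            prev = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = (prev[0] + d, prev[1] + 1)

    total, steps = cost[n][m]
    return total, total / steps


# The repeated 2 is absorbed by the warping, so both distances are zero.
print(dtw([1, 2, 3], [1, 2, 2, 3]))
```

Because plain DTW minimizes the accumulated distance only, two paths of different lengths are not directly comparable on their mean distortion, which is the shortcoming the abstract says a path-dependent normalization factor addresses.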
|
A Detailed Analysis of the Internal Code Architecture for a Universal Mongolian Information Processing System
[作者]拉西吉格木德;
[摘要]This paper summarises the special characteristics of Mongolian scripts and analyses existing Mongolian systems. It then points out that the key is to establish an internal code architecture for a Mongolian information processing system that has universal processing functions and is compatible with Latin and Hanzi. Several plans for the internal code architecture are then analysed and compared in detail, and their feasibility is investigated.
[Abstract]
[关键字]Mongolian script; Mongolian information processing; universal system; architecture; internal code;
|
Design of a Chinese Character Glyph CAD System Based on a Chinese Character Generator
[作者]温立新; 何尔恭; 薛开平; 温卫红;
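[摘要]This paper proposes a design for a computer-aided design system for Chinese character glyphs based on a Chinese character generator. Functionally, the system is divided into two parts: stroke shape design and rapid character creation. Once the stroke shapes have been set, characters of that typeface can be produced quickly and conveniently in all sizes and for the full character set. The system has been preliminarily validated in the development of an ISO 10646 CJK large font library; it offers high overall performance and is of practical significance.
[Abstract]This paper presents a design method for a Chinese character pattern CAD system based on the Chinese character generator. The system can be divided into two parts by function: stroke style design and quick character making. Once the style of the strokes is defined, the whole set of characters in every size can be easily made. The system has been used in the development of the ISO 10646 CJK Han character set and has proved to be quite practical.
[关键字]glyph; Chinese character generator; glyph library; computer-aided design;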
|
Performance-index Analysis of Chinese Character Coding Scheme Oriented to the Middle and Primary Schools
[作者]He Kekang(Beijing Normal University);
[摘要]In this paper, a performance-index system for Chinese character coding schemes oriented to middle and primary schools is proposed, based on an analysis of social requirements and the results of on-the-spot trials in middle and primary schools. The concrete measures and methods for realizing the performance-index system are analysed in depth.
[关键字]Chinese character coding; Performance-index system;
|
Free Code and Its Phrase Input Processing Method
[作者]何尔恭; 曾锡山;
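[摘要]This paper presents and explains the characteristics of the Free Input Method. It also presents a phrase input processing method for the Free Code input environment. The Free Input Method consists of a series of three input methods: Big Free (大自由), Small Free (小自由), and Super Free (超自由). Big Free is the simplest of the three and allows a character to be entered with more than one key sequence. Super Free is a fast, efficient input method for professional use. Phrases are entered using a word information library, which greatly reduces storage requirements. Multiple input methods can share the same information library.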
[Abstract]
[关键字]Chinese character information processing; input method; Chinese character coding; phrase input;
|
Chinese Character Prototypes and Handwritten Chinese Character Recognition
[作者]姜珊;
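[摘要]This paper reviews three current computer representations of Chinese characters and two traditional methods for analyzing Chinese character structure. Applying basic principles of topology and geometry, it analyzes the structure of Chinese characters and the constraints on that structure. It thereby identifies four classes of basic relations from which Chinese characters are composed, implements Chinese character prototypes on that basis, and gives an example of applying the prototypes to handwritten Chinese character recognition.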
[Abstract]
[关键字]Chinese character structure; Chinese character prototype; handwritten Chinese character recognition;
|
On the Potentiality of Ambiguous Structures
[作者]冯志伟;
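[摘要]This paper extends the Potential Ambiguity Theory (PA Theory), which the author proposed in his research on the structure of scientific and technical terms, to everyday language. It shows that potentially ambiguous structures are also widespread in everyday Chinese, while in concrete texts many potential ambiguities are resolved. Natural language has an ambiguous side and a non-ambiguous side, and the PA Theory reveals the law of the unity of opposites between the two. The theory points out that a potentially ambiguous structure itself contains the factors that resolve the ambiguity, so it can provide methods and means of disambiguation for natural language processing.
[Abstract]In this paper, the author extends his PA Theory (Potential Ambiguity Theory), proposed for Chinese scientific terminology, to the field of Chinese language for everyday use. The paper reveals the phenomenon of potential ambiguity: there is not only ambiguity but also non-ambiguity in natural language. The PA Theory precisely represents the law of the unity of opposites between ambiguity and non-ambiguity in natural language, and it also provides a useful means for disambiguating in natural language proc...
[关键字]Potential Ambiguity Theory (PA Theory); PT-structure; instantiation; ambiguity; non-ambiguity; disambiguation;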
|