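Computer-Based Measurement of Chinese and Japanese Scientific and Technical Vocabulary, with a Comparison of Chinese, Japanese, and English Writing
[Authors]王懋江; 吴振益;
[Summary]This paper measures the average word length and sorting time of industrial technical vocabulary in Chinese and Japanese and obtains a large body of data. The measurements show that when a foreign noun (an English noun or a noun from another language) is translated by meaning rather than transliterated, the resulting term is much shorter. The authors therefore advocate translating foreign nouns into Chinese by meaning and do not favor transliteration.
[Abstract]This paper describes the average length and sorting time of industrial technical words and phrases in Chinese and Japanese, which were measured by computer, yielding a large amount of data. These data show that free translations of words of foreign origin (words and phrases in English or other languages) are much shorter than transliterations. Therefore, the authors prefer the former.
[Keywords]kanji in Japanese text; Japanese kanji; katakana; Japanese words; single-character words; computer-based measurement; Chinese words; expressive power;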
|
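Two Grouped Search Algorithms for Chinese Characters
[Authors]周建钦; 马述杰; 李进忠;
[Summary]Comparison-based binary search for Chinese characters has complexity O(N log N). Drawing on probability theory, this paper proposes a random grouped search algorithm and a grouped hash search algorithm for Chinese characters, describes both algorithms, and proves that their complexity is O(N), which improves on binary search. Experimental results are given at the end.
[Abstract]The comparison-based binary search algorithm has complexity O(N log N). In this paper, we present a random grouped search algorithm and a grouped hash search algorithm for Chinese characters. We prove that their expected complexity is O(N) and report experimental results for these algorithms.
[Keywords]Chinese characters; binary search; random grouped search; grouped hash search; probability distribution;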
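The abstract above does not spell out either algorithm, so the following is only a minimal sketch of the general idea behind a grouped hash lookup, not the authors' method. It assumes (hypothetically) that characters are assigned to groups by code point modulo the number of groups, and that each group is kept sorted; the names `build_groups`, `grouped_find`, and `binary_find` are illustrative.

```python
from bisect import bisect_left


def build_groups(chars, num_groups):
    """Distribute characters into num_groups buckets keyed by code point.

    Assumption: the hash is simply ord(ch) % num_groups; the paper's
    actual grouping rule is not given in the abstract.
    """
    groups = [[] for _ in range(num_groups)]
    for ch in chars:
        groups[ord(ch) % num_groups].append(ch)
    for group in groups:
        group.sort()  # each small group is searched by bisection below
    return groups


def grouped_find(groups, ch):
    """Return True if ch is stored; only one group is inspected."""
    bucket = groups[ord(ch) % len(groups)]
    i = bisect_left(bucket, ch)
    return i < len(bucket) and bucket[i] == ch


def binary_find(sorted_chars, ch):
    """Baseline: plain binary search over the whole sorted table."""
    i = bisect_left(sorted_chars, ch)
    return i < len(sorted_chars) and sorted_chars[i] == ch
```

With a number of groups proportional to the table size and a reasonably uniform distribution, each group holds a constant number of characters on average, which is the intuition behind an expected O(N) cost for N lookups.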
|
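A Statistical Study of Attributes for Tibetan Information Processing
[Authors]江荻; 董颖红;
[Summary]This paper gives a statistical analysis of (1) the length of Tibetan characters and the frequency of their structural forms; (2) the structural patterns and frequencies of the initials and finals of Tibetan characters; and (3) the positional letters of Tibetan characters and their structural patterns. By measuring the number of structural patterns and of positional letters, the analysis reveals the qualitative character of Tibetan characters and provides basic data for Tibetan studies and for applications in Tibetan information processing.
[Abstract]This paper counts the lengths of Tibetan characters and the number of their structural modes, the number of initial clusters and finals of Tibetan characters, and the number of special letters of Tibetan characters. By measuring the quantity of structural modes and letters, we gain a real understanding of the nature of Tibetan characters.
[Keywords]Tibetan script; Central Institute for Nationalities (中央民族学院); ethnic publishing; structural pattern; attribute statistics; character combinations;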
|
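Several Problems in Scaling and Generating Curve-Outline Chinese Character Glyphs
[Authors]黄宜华; 袁春凤;
[Summary]This paper discusses two important technical problems in scaling and generating curve-outline glyphs. It first describes a new fast closed-region filling algorithm that speeds up glyph generation. It then presents a technique for adjusting stroke scaling errors, which keeps the strokes of scaled glyphs even and attractive. A complete glyph scaling and generation algorithm is also given.
[Abstract]This paper discusses two technical problems in font scaling and generation for a kind of curve-outlined Chinese character font. First, a new fast area filling algorithm based on the outlines of fonts is described, which increases generation speed. Then, an error adjusting method for scaled strokes is presented, which keeps the strokes of scaled fonts even and of high quality. A complete algorithm for font scaling and generation is also presented.
[Keywords]Chinese character glyphs; switch points; rounding error; outline glyphs; error adjustment; scaling; several problems; curve outline;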
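To make "scaling and generating" an outline glyph concrete, here is a minimal sketch using the textbook even-odd scanline fill. This is not the paper's fast filling algorithm or its error adjustment technique, neither of which is detailed in the abstract. It assumes the curved outline has already been flattened into a closed polygon; `scale_outline` and `scanline_fill` are hypothetical names.

```python
def scale_outline(points, factor):
    """Scale outline vertices uniformly about the origin."""
    return [(x * factor, y * factor) for x, y in points]


def scanline_fill(points, width, height):
    """Even-odd scanline fill of a closed polygon.

    Assumption: curves were flattened to straight segments beforehand.
    Returns a list of rows, each a list of 0/1 pixels.
    """
    bitmap = [[0] * width for _ in range(height)]
    n = len(points)
    for row in range(height):
        y = row + 0.5  # sample each scanline at the pixel centre
        crossings = []
        for i in range(n):
            (x0, y0), (x1, y1) = points[i], points[(i + 1) % n]
            # Half-open test skips horizontal edges and avoids
            # counting a shared vertex twice.
            if (y0 <= y < y1) or (y1 <= y < y0):
                crossings.append(x0 + (y - y0) * (x1 - x0) / (y1 - y0))
        crossings.sort()
        # Fill between successive pairs of crossings.
        for left, right in zip(crossings[0::2], crossings[1::2]):
            for col in range(width):
                if left <= col + 0.5 < right:
                    bitmap[row][col] = 1
    return bitmap
```

Sampling at pixel centres is one simple rounding policy. After scaling, stroke edges that fall between centres can make equal strokes come out at unequal pixel widths, which is the kind of rounding error a stroke adjustment step would need to correct.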
|
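A High-Accuracy Recognition System for Printed Chinese Text in Simplified and Traditional Characters
[Authors]张炘中; 沈兰生; 刘秀英; 李燕; 闫昌德;
[Summary]This paper describes a high-accuracy recognition system for printed Chinese text in simplified and traditional characters, based on an improved "feature-point method for Chinese character recognition". Introducing a direction attribute for feature points markedly raises the recognition rate of the method. The paper explains the principles behind the main stages of the system. In rigorous tests on a million Chinese characters of real printed text, the system reached a recognition rate of 97.84%. For real printed text of higher quality, the recognition rate exceeds 99%.
[Abstract]The system presented in this paper is based on an improved version of the feature-point method. The new version uses the direction properties of feature points, which greatly improves the recognition accuracy of the method. The main parts of the recognition system are discussed in this paper. The system has been tested on millions of characters from common printed materials, and the recognition rate reaches 97.84%. For text with higher printing quality, the recognition rate is above 99%.
[Keywords]high accuracy; feature-point method; recognition rate; printed Chinese characters; text recognition; automatic word segmentation;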
|
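Automatic Identification of Chinese Names
[Authors]孙茂松; 黄昌宁; 高海燕; 方捷;
[Summary]Identifying Chinese names is important for research on automatic Chinese word segmentation. This paper proposes an algorithm that automatically identifies Chinese names in Chinese text. We randomly drew 300 sentences containing Chinese names from the Xinhua News Agency news corpus as a test sample. The experimental results show a recall of 99.77%.
[Abstract]The processing of Chinese names is significant for Chinese word segmentation. This paper presents an effective algorithm for automatically identifying this sort of proper noun in Chinese texts. The test sample, consisting of 300 sentences each of which contains at least one Chinese name, is extracted at random from the Xinhua News Corpus. The preliminary experiment shows that the recall rate of this algorithm reaches 99.77%.
[Keywords]automatic identification of Chinese names; unknown-word handling; automatic Chinese word segmentation; Chinese information processing;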
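The recall figure quoted above is the share of names in the reference annotation that the system also finds. A minimal sketch of that measure follows. It assumes each name is recorded as a (sentence index, name) pair, and the toy data in the usage note is invented purely for illustration, not taken from the paper.

```python
def recall(gold, predicted):
    """Fraction of gold-standard names that the system also identified."""
    gold, predicted = set(gold), set(predicted)
    if not gold:
        return 0.0
    return len(gold & predicted) / len(gold)


def precision(gold, predicted):
    """Fraction of identified names that are in the gold standard."""
    gold, predicted = set(gold), set(predicted)
    if not predicted:
        return 0.0
    return len(gold & predicted) / len(predicted)
```

For example, with the hypothetical annotations `gold = {(0, "王小明"), (1, "李华"), (2, "张伟"), (3, "刘洋")}` and a system output that finds the first three plus one spurious item, both `recall` and `precision` evaluate to 0.75. The abstract reports recall only, so it does not by itself say how many non-names were wrongly marked.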
|
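Automatic Disambiguation of Duplicate Codes Based on a Compressed Lexicon
[Authors]黄希琛; 崔广才;
[Summary]Automatic disambiguation of duplicate codes (input codes shared by more than one character or word) is an important research direction in Chinese keyboard input and an effective way to resolve the tension between codes that are easy to learn and input that is fast. This paper first introduces the structure of the association character library and the principle of disambiguating duplicate codes by bidirectional association. It then describes the storage structure of the compressed lexicon and the query, generation, matching, and pruning techniques used for disambiguation.
[Abstract]Automatic disambiguation of overlapping codes is an important subject in keyboard input technology for Chinese characters. It is an efficient way to resolve the conflict between ease of learning and speed of input. This paper introduces the structure of the association library and the technique for distinguishing overlapping codes with bidirectional association. It then discusses the storage structure of the condensed word library and the principles of searching, generating, matching and cu...
[Keywords]Chinese character input and output; duplicate-code handling; information compression;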
|
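An Optimized Parallel Algorithm for Chinese Character/String Matching
[Authors]王素琴; 邹旭楷;
[Summary]String searching means finding all occurrences of a string Pat = p1…pm in a text Text = t1…tn. This paper gives an algorithm for searching Chinese characters/strings in parallel on the CREW/CRCW PRAM machine model. It uses n/m processors, with preprocessing time O(m+|Σ|) and parallel execution time O(m).
[Abstract]The string matching problem is to find all occurrences of a pattern of length m in a text of length n. This paper offers an optimal parallel algorithm for character/Chinese character string matching. The algorithm runs on CREW/CRCW PRAM machines and uses n/m processors; its precomputing time is O(m+|Σ|), and its parallel execution time is O(m).
[Keywords]parallel algorithm; text; pattern; string searching; search state vector; character-pattern matching vector;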
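The abstract gives the processor count and time bounds but not the algorithm itself. The sketch below is therefore an assumption-laden illustration rather than the authors' method: it swaps in the well-known Shift-And bit-vector matcher and simulates the n/m processors sequentially, with each simulated processor responsible for matches that start inside its own block of m text positions. The names `build_masks`, `match_block`, and `find_all` are hypothetical.

```python
def build_masks(pattern):
    """Per-character bit masks: bit i is set where pattern[i] == ch."""
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    return masks


def match_block(text, pattern, masks, start, stop):
    """Report occurrences whose first character lies in [start, stop)."""
    m = len(pattern)
    accept = 1 << (m - 1)
    state = 0  # bit i set: pattern[0..i] matches the text ending here
    hits = []
    # Read up to m - 1 characters past the block so that matches
    # beginning near its end can still complete.
    for j in range(start, min(len(text), stop + m - 1)):
        state = ((state << 1) | 1) & masks.get(text[j], 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits


def find_all(text, pattern):
    """Sequentially simulate one worker per block of m start positions."""
    m = len(pattern)
    if m == 0:
        return []
    masks = build_masks(pattern)  # preprocessing, independent of the text
    hits = []
    for start in range(0, len(text), m):  # each iteration = one processor
        hits.extend(match_block(text, pattern, masks, start, start + m))
    return hits
```

Each simulated processor reads at most 2m - 1 characters, so if the bit operations are treated as constant time its work is O(m), and the blocks are independent of one another. Because the state is reset at each block start, every occurrence is reported by exactly one block. Chinese characters need no special handling here, since Python strings index by code point.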
|
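A Hierarchical Implementation Model for the Font Mechanism of the Page Description Language PostScript
[Authors]胡长原; 张福炎;
[Summary]Extending the internationally popular page description language PostScript to support output of high-quality Chinese pages has been a focus of recent research, in China and abroad, on the PostScript language and its interpreters. This paper proposes a hierarchical implementation model for the PostScript font mechanism. From the descriptive power of the language down to the implementation of the font interpreter, the model reflects the authors' two goals. The first is to remain compatible with, and to strengthen, PostScript's flexible and efficient description of text output on a page, and to extend it to applications with large character sets and multiple scripts. The second is to move beyond the limits of Adobe Type 1 in the interpreter's font technology, so that the interpreter can support a variety of practical font technologies, in particular the several curve-outline Chinese font technologies developed in China. The paper also proposes a research and development approach that separates font technology from the page description language.
[Abstract]Although the PostScript language has been widely used in electronic printing and publishing, it still needs to be extended to handle high-quality Chinese characters on a printed page. In this paper, a hierarchical implementation model for PostScript's font mechanism is presented. This model has two main goals: one is to extend the language's description capability for Chinese characters compatibly; the other is to use various kinds of font techniques especially technique...
[Keywords]page description language; interpreter; font;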
|
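The Great China Coding Scheme for Chinese Character Input
[Authors]周建钦; 赵志远; 刘世民;
[Summary]This paper first argues that, for Chinese character input on a Western keyboard, a code derived entirely from attributes of the characters such as sound, shape, or meaning can hardly be optimal. We therefore propose coding by some attributes of the characters while also assigning fixed codes to certain characters, so that code length becomes optimal. The paper describes the Great China coding scheme for Chinese character input and its implementation in detail. The scheme has three main features: A. It is easier to learn and use than Pinyin codes. B. Codes are 1, 2, or 3 keystrokes long. C. It can input all characters of the national standard GB2312-80 and more than 10,000 common words and phrases.
[Abstract]On the basis of the Pinyin order of Chinese characters and information theory, this paper proposes the Great China coding scheme for Chinese character input. The scheme has the following three main advantages: A. Users can input Chinese characters without additional knowledge. B. A code string has 1, 2, or 3 English letters (the last one can be a non-English character). C. The coding range includes the Chinese characters of GB2312-80 and more than 10,000 common words.
[Keywords]Chinese character input; coding; Pinyin order;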
|