[ 2010 September,09, Thursday ]
中国中文信息学会
Chinese Information Processing Society of China
Journal of Chinese Information Processing (中文信息学报)
Browse by year and issue (latest data: 2000, Issue 1)
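A Parallel Recognition Algorithm Based on Pseudo-MMX Technology and Its Application
[Authors] 黎明刚; 郭军;

[Summary] This paper proposes a general parallel algorithm model that applies to many systems that process multiple data blocks. The algorithm can increase a system's processing speed several-fold. Its core is a pseudo-MMX technique that places no special requirements on the hardware, which keeps programs portable; the paper discusses this in detail. It also discusses an implementation of the model in a cosine shape-transformation system, where processing is several times faster than with the original algorithm.

[Abstract] A general parallel algorithm is proposed in this article. The algorithm can be used in many multi-data-block processing systems and can speed up a system several-fold. The core of the algorithm is pseudo-MMX ("fake MMX") technology, which places no restriction on the machine; this is illustrated in detail. The application of the algorithm to a pattern transformation method using the cosine function is also discussed.
[Keywords] parallel algorithm; handwritten Chinese character recognition; pseudo-MMX technology; shape transformation;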
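
The abstract does not spell out how the pseudo-MMX technique works. One common way to get MMX-like parallelism without special hardware is to pack several small values into one ordinary machine word and process all of them with a single integer operation. The sketch below illustrates that general idea only; the lane count, lane width, and every function name are assumptions and are not taken from the paper.

```python
LANES = 4                      # assumed number of values packed into one word
BITS = 16                      # assumed width of each lane in bits
LANE_MASK = (1 << BITS) - 1
WORD_MASK = (1 << (LANES * BITS)) - 1
# The top bit of every lane, 0x8000800080008000 for four 16-bit lanes.
HIGH = sum(1 << (i * BITS + BITS - 1) for i in range(LANES))


def pack(values):
    """Pack LANES small non-negative integers into one word, lane 0 lowest."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * BITS)
    return word


def unpack(word):
    """Split a packed word back into its LANES lane values."""
    return [(word >> (i * BITS)) & LANE_MASK for i in range(LANES)]


def packed_add(a, b):
    """Add all lanes of a and b at once; each lane wraps modulo 2**BITS."""
    # With the top bit of each lane cleared, a lane sum cannot carry
    # into the neighbouring lane.
    low = ((a & ~HIGH) + (b & ~HIGH)) & WORD_MASK
    # The top bits are then added without carry by XOR.
    return low ^ ((a ^ b) & HIGH)
```

For example, `unpack(packed_add(pack([1, 2, 3, 4]), pack([10, 20, 30, 40])))` returns `[11, 22, 33, 44]` from a single word-sized addition rather than four separate ones.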



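An Experimental Study of Dictionary Mechanisms for Chinese Automatic Word Segmentation
[Authors] 孙茂松; 左正平; 黄昌宁;

[Summary] The segmentation dictionary is a basic component of a Chinese automatic word segmentation system, and its lookup speed directly affects the system's processing speed. This paper designs and experimentally examines three typical dictionary mechanisms: binary search by whole word, the TRIE index tree, and character-by-character binary search, focusing on a comparison of their time and space efficiency. The experiments show that the mechanism based on character-by-character binary search is simple and efficient and meets the needs of practical Chinese word segmentation systems well.

[Abstract] The dictionary mechanism serves as one of the basic components in Chinese word segmentation systems. Its performance influences the segmentation speed significantly. In this paper, we design and implement three typical dictionary mechanisms, i.e., binary seek by word, TRIE indexing tree, and binary seek by characters, from a word segmentation point of view, and compare their space and time complexity experimentally. It can be seen that the binary seek by characters model is the most appropriate one being capabl...
[Keywords] Chinese information processing; Chinese automatic word segmentation; dictionary mechanism for Chinese automatic word segmentation;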
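
To make two of the lookup styles named in the abstract concrete, here is a minimal sketch of whole-word binary search over a sorted word list and of a trie used for longest-match lookup. The word list, function names, and data layout are illustrative assumptions; the paper's actual structures, including its character-by-character binary search, are not described in the abstract and are not reproduced here.

```python
from bisect import bisect_left

# A tiny illustrative dictionary, sorted by code point.
WORDS = sorted(["中国", "中文", "中文信息", "信息", "信息处理", "处理"])


def in_sorted_list(word, words=WORDS):
    """Whole-word lookup: binary search over the sorted word list."""
    i = bisect_left(words, word)
    return i < len(words) and words[i] == word


def build_trie(words):
    """Build a trie as nested dicts, one level per character."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$end"] = True    # marks that a dictionary word ends at this node
    return root


def longest_match(trie, text, start=0):
    """Return the longest dictionary word that begins at text[start]."""
    node, best = trie, 0
    for i in range(start, len(text)):
        node = node.get(text[i])
        if node is None:
            break
        if "$end" in node:
            best = i + 1 - start
    return text[start:start + best]
```

With this word list, `longest_match(build_trie(WORDS), "中文信息处理")` returns `"中文信息"`. The trie finds every dictionary word starting at a position in one pass over the text, whereas whole-word binary search needs a separate lookup for each candidate length.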



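A Corpus-Based Study of Chinese Name Recognition Methods
[Authors] 郑家恒; 李鑫; 谭红叶;

[Summary] Based on a large-scale corpus, this paper extracts and analyzes the usage frequencies of the characters used in Chinese surnames and given names, studies evaluation functions for Chinese name recognition, and dynamically builds a statistical data table and name thresholds for recognition. It proposes a method for recognizing Chinese names in raw text that has not been segmented. In an open test, recall is 95.23% and precision is 87.31%.

[Abstract] This paper extracts and analyzes the usage frequency of the characters in Chinese names from a large-scale corpus, dynamically builds a parameter table and thresholds, and studies evaluation functions for Chinese name recognition. It presents a method of Chinese name recognition that requires no text segmentation. In an open test, the recall and precision are 95.23% and 87.31%, respectively.
[Keywords] Chinese name recognition; surname usage frequency; automatic word segmentation;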
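
The abstract names the ingredients (character frequencies, an evaluation function, thresholds, and scanning of unsegmented text) but not the formulas. The sketch below shows one plausible shape such a recognizer could take. The frequency values are invented purely for illustration, and the scoring function, thresholds, and names are all assumptions rather than the paper's method.

```python
# Toy relative frequencies, invented for illustration only (not from the paper).
SURNAME_FREQ = {"王": 0.07, "李": 0.07, "张": 0.06}
GIVEN_FREQ = {"伟": 0.02, "芳": 0.015, "明": 0.012, "红": 0.01}
# Assumed per-length cut-offs; the paper derives its thresholds from corpus data.
THRESHOLD = {2: 1e-4, 3: 1e-6}


def name_score(candidate):
    """Surname frequency times the frequency of each given-name character."""
    score = SURNAME_FREQ.get(candidate[0], 0.0)
    for ch in candidate[1:]:
        score *= GIVEN_FREQ.get(ch, 0.0)
    return score


def find_names(text):
    """Scan raw, unsegmented text for 3- and 2-character name candidates."""
    found = []
    i = 0
    while i < len(text):
        for length in (3, 2):          # prefer the longer candidate
            cand = text[i:i + length]
            if len(cand) == length and name_score(cand) >= THRESHOLD[length]:
                found.append(cand)
                i += length
                break
        else:
            i += 1                     # no candidate starts here
    return found
```

With these toy tables, `find_names("昨天王伟和张明红开会")` returns `["王伟", "张明红"]`. A real system would also need context cues to reject common words that happen to begin with a surname character.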



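Chinese Syntactic Parsing Based on DOP
[Authors] 张玥杰; 朱靖波; 张跃; 姚天顺;

[Summary] This paper proposes a method for Chinese syntactic parsing that takes the DOP technique as its basic framework and uses similarity-based probability estimation. An input sentence first passes through two levels of initial selection, at the word level and at the part-of-speech level. The fragment combination forms of the input sentence are then obtained from the constructed knowledge source. Finally, the similarity between the input sentence and the initial selection results is evaluated, which completes the combination analysis of the input sentence. To demonstrate the method's effectiveness, the knowledge source was built from a real Chinese corpus of 1,000 sentences, and a real Chinese corpus of 100 sentences served as the test set. The experiments show that the parsing results are fairly satisfactory on every measure and that the method parses Chinese effectively.

[Abstract] This paper presents a Chinese parsing method that takes the DOP technique as its basic framework and uses a similarity-based probability estimation technique. In the implementation, every input sentence must be preprocessed through initial selection at the word level and the part-of-speech level. Then the fragment combination forms of the input sentence are acquired from the constructed knowledge source, which includes a treebank, a fragment bank, and a fragment combination bank. Finally, the similar...
[Keywords] data-oriented parsing; Chinese syntactic parsing; similarity evaluation; treebank; fragment bank; fragment combination bank;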



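Binarization and Row/Column Segmentation of Handwritten Chinese Manuscripts
[Authors] 蔡樱; 盛立;

[Summary] Binarization of gray-scale images and row/column segmentation are important preprocessing steps and strongly affect the recognition system. For manuscript images with frame lines, this paper proposes a double-threshold binarization method that removes the frame lines effectively. For character segmentation, it proposes a row/column segmentation method that processes three rows first and then single rows, which extracts the characters accurately.

[Abstract] This paper puts forward a binarization method, named the double-threshold method, which eliminates frame lines effectively. It also proposes a "three lines first, single line later" method to segment the characters accurately.
[Keywords] handwritten manuscript recognition; binarization; row/column segmentation;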



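Arithmetic Coding of Chinese Text with a Dynamic-Alphabet Order-0 Model
[Authors] 王忠效; 范植华;

[Summary] This paper explores how to construct an order-0 statistical model of Chinese text and proposes an effective Chinese text compression algorithm. With this most basic model alone, the coding efficiency for Chinese text already exceeds that of algorithms combining LZ and Huffman coding. Because the order-0 model is the basis of all higher-order statistical models, the work is an important reference for research on compressing text in Chinese and other languages with large character sets (such as Japanese and Korean).

[Abstract] This paper addresses the construction of a dynamic-alphabet order-0 model of Chinese text for arithmetic coding and provides an algorithm for Chinese text compression. The model is shown to perform well because the algorithm driven by it compresses Chinese texts more efficiently than algorithms that combine LZ and Huffman coding. Because the order-0 model lays the foundation for order-n models, what the paper discusses is important to the text compression of any large alphabet natural language,...
[Keywords] data compression; Chinese text compression; arithmetic coding; statistical model;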
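
As a rough illustration of what an adaptive order-0 model with a growing alphabet involves, the sketch below estimates the ideal arithmetic-coding cost of a text. The alphabet starts empty, and each new character is signalled by an escape symbol and then added. This is a generic escape-based scheme, not the paper's construction; the escape count, the assumed character-set size, and the function name are all assumptions.

```python
from math import log2


def order0_code_length(text, charset_size=65536):
    """Ideal cost in bits of coding text with an adaptive order-0 model.

    An arithmetic coder driven by the same probabilities would produce
    output close to this length.
    """
    counts = {}    # character -> occurrences seen so far
    total = 0      # sum of all counts
    escape = 1     # count reserved for the "new character" escape symbol
    bits = 0.0
    for ch in text:
        denom = total + escape
        if ch in counts:
            # Known character: cost follows from its current frequency.
            bits += -log2(counts[ch] / denom)
        else:
            # New character: pay for the escape, then for the character
            # itself, coded uniformly over the unseen part of the charset.
            bits += -log2(escape / denom) + log2(charset_size - len(counts))
        counts[ch] = counts.get(ch, 0) + 1
        total += 1
    return bits
```

Under these assumptions `order0_code_length("aa")` is 17 bits: 16 bits to introduce the first `a` and 1 bit for the repeat. Repeated characters become cheaper as their counts grow, which is why the model only pays the large per-character cost for the characters a given text actually uses.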



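On the Ordering Properties of Tibetan and a Method for Sorting It
[Authors] 江荻; 周季文;

[Summary] To solve the Tibetan sorting problem, this paper introduces the concepts of construction order and character order for Tibetan and, on that basis, proposes a computer solution for Tibetan dictionary order. The article analyzes the various Tibetan constructions and characters, assigns values to them, and gives a technical flowchart for sorting Tibetan by computer.

[Abstract] On the basis of coded character sets for Tibetan information processing, we need to discuss the sequence of Tibetan words and the method of sequencing them. First, we put forward the concept of a construction sequence for Tibetan words, which differs considerably from the sequence of Tibetan transliteration. There are two sequences in Tibetan: one is word order in a dictionary, the other is character order in words as well as in a dictionary, and both of them make the structure inconceivably complex with multi hiera...
[Keywords] Tibetan; dictionary order; construction order; character order;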



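Chinese Localization of a Java Virtual Machine on a Domestic Open System Platform
[Authors] 丁宇新; 梅嘉; 程虎;

[Summary] Based on an in-depth analysis of Java's internal encoding mechanism, this paper identifies problems with Chinese support in existing Java development tools and proposes a solution. We use encoding recognition to convert two different forms of Chinese character encoding into a unified internal JVM representation, which resolves the garbled Chinese text that occurs across versions of existing Java development tools. By converting class file names into a character encoding that the local operating system recognizes when class files are loaded, the virtual machine supports Chinese class names. Experimental results show that the design is feasible.

[Abstract] In this paper, the encoding technique of Java is analyzed and the Chinese-language problems in current Java development tools are discussed. A new method that adopts Chinese code recognition and conversion techniques is then presented to solve the Chinese incompatibility between different Sun JDK versions. Finally, experiments show that the method is feasible.
[Keywords] Java; virtual machine; Chinese localization;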



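On Sorting Algorithms for Chinese Character Strings
[Authors] 钟诚;

[Summary] This paper analyzes the grouping sort algorithm for Chinese character strings and, building on a discussion of base selection, gives improved and effective methods for mapping strings to integers and for handling data whose mappings collide.

[Abstract] Based on paper [1], the choice of base, the method of mapping Chinese character strings into integers, and the collision problem are studied, and an improved technique for sorting Chinese character strings is given.
[Keywords] Chinese character strings; sorting algorithm; mapping;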
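
The abstract mentions three pieces: a base, a mapping from strings to integers, and a way to handle strings whose mappings collide. The following sketch shows one generic way these pieces can fit together. It is not the paper's algorithm: the radix, the prefix length, the code-point ordering, and all names are assumptions.

```python
from collections import defaultdict

RADIX = 0x110000 + 1   # assumed base: one digit per Unicode code point, plus 0 for padding
PREFIX = 2             # assumed number of leading characters folded into the key


def key_of(s):
    """Map the first PREFIX characters of s to an order-preserving integer."""
    key = 0
    for i in range(PREFIX):
        # Short strings are padded with digit 0, so a prefix sorts
        # before any of its extensions.
        digit = ord(s[i]) + 1 if i < len(s) else 0
        key = key * RADIX + digit
    return key


def sort_strings(strings):
    """Group strings by integer key, then resolve collisions inside each group."""
    groups = defaultdict(list)
    for s in strings:
        groups[key_of(s)].append(s)
    result = []
    for k in sorted(groups):
        # Strings that share a key collide; compare them in full.
        result.extend(sorted(groups[k]))
    return result
```

This orders strings by Unicode code point, so `sort_strings(names)` equals `sorted(names)` for any list of strings. Sorting by pinyin or stroke count would need a different per-character digit table, and a smaller base would make keys more compact at the cost of more collisions.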



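Prosodic Phrase Segmentation Driven by Structural Auxiliary Words
[Authors] 应宏; 蔡莲红;

[Summary] Improving the naturalness of synthesized speech is the core task of a Chinese text-to-speech system (CTTS), and delimiting prosodic phrases plays an important role in it. By analyzing the characteristics of function words, this paper studies the features and status of structural auxiliary words in continuous speech and their role in delimiting prosodic phrases, and derives a corresponding set of rules and conclusions.

[Abstract] Improving the naturalness of synthesized speech is the core task of a TTS system, and the segmentation of prosodic phrases plays an important role in it. Through analysis of the characteristics of function words, and by studying the features and position of structural auxiliary words in continuous speech flow and their function in the segmentation of prosodic phrases, a set of corresponding rules and conclusions is obtained in this paper.
[Keywords] function words; structural auxiliary words; prosodic phrase; prosodic phrase segmentation; Chinese text-to-speech system (CTTS);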



© Chinese Information Processing Society of China 1981-2007