Chinese Information Processing Society of China (中国中文信息学会)
Journal of Chinese Information Processing (中文信息学报)
Browse by year and issue (latest data: 2005, No. 4)
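An Improved Vector Space Model Algorithm Based on Hyperlink Structure
[Authors] Yuan Fuyong (原福永); Chu Beibei (褚蓓蓓)

[Chinese Abstract] In information retrieval systems based on the vector space model, the TF-IDF algorithm is widely used for keyword-based retrieval. However, because Web pages have a distinctive hyperlink structure, a technique is needed that represents a page's content while also taking into account the content of the pages it is linked with. This paper analyzes the essence of the vector space model, identifies the cause of its low precision, and, building on the traditional model, proposes an improved vector space model algorithm based on the hyperlink structure of Web pages. Experiments show that the improved algorithm raises retrieval precision by 10% over the original algorithm, improving retrieval effectiveness to some extent.

[Abstract] In information retrieval systems based on the vector space model, the TF-IDF scheme is widely used to characterize documents. However, for documents with hyperlink structures such as Web pages, it is necessary to develop a technique that represents the contents of Web pages more accurately by using the contents of their hyperlinked neighboring pages. The VSM is analyzed to find the reason for its low precision, and an approach that uses the contents of hyperlinked neighboring pages is proposed. The experimen...
[Keywords] computer application; Chinese information processing; search engine; information retrieval; vector space model; hyperlink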
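The abstract does not say exactly how neighboring pages are folded into a page's representation. The Python sketch below assumes one simple scheme: each page's TF-IDF vector is augmented with the mean TF-IDF vector of its hyperlinked neighbors, scaled by a hypothetical weight `alpha`. The function names and the mixing rule are illustrative only, not the authors' algorithm.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def add_neighbor_content(vectors, links, alpha=0.5):
    """Hypothetical scheme: v_i' = v_i + alpha * mean(v_j over hyperlinked neighbors j)."""
    enriched = []
    for i, vec in enumerate(vectors):
        mixed = dict(vec)
        neighbors = links.get(i, [])
        for j in neighbors:
            for t, w in vectors[j].items():
                mixed[t] = mixed.get(t, 0.0) + alpha * w / len(neighbors)
        enriched.append(mixed)
    return enriched


def cosine(u, v):
    """Cosine similarity of two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


docs = [["search", "engine", "ranking"], ["hyperlink", "ranking", "web"], ["cooking", "recipe"]]
base = tf_idf(docs)
enriched = add_neighbor_content(base, {0: [1], 1: [0]})  # pages 0 and 1 link to each other
query = {"hyperlink": 1.0}
print(cosine(query, base[0]), cosine(query, enriched[0]))
```

In this toy run, page 0 never mentions "hyperlink", so its plain TF-IDF vector scores zero against that query; after its neighbor's vector is mixed in, it receives a nonzero score, while the unlinked page 2 is left unchanged.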



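An Effective Algorithm for Extracting and Recognizing Combined Features of Handwritten Chinese Characters
[Authors] Sun Quansen (孙权森); Jin Zhong (金忠); Wang Ping'an (王平安); Xia Deshen (夏德深)

[Chinese Abstract] Based on the idea of feature fusion and with a view to facilitating pattern classification, this paper generalizes the theory of canonical correlation analysis and establishes a theoretical framework of generalized canonical correlation analysis for image recognition. Within this framework, a generalized canonical correlation discriminant criterion function is first used to obtain sets of generalized projection vectors for two groups of feature vectors, forming a pair of transformation matrices; then two kinds of handwritten Chinese character features are fused according to a proposed new feature fusion strategy. The extracted correlation feature matrices of the patterns achieve good classification results with ordinary classifiers, outperforming existing feature fusion methods as well as the single-feature PCA and FLDA methods.

[Abstract] A new method of combined feature extraction, based on the idea of feature fusion, is proposed in this paper. The theory of canonical correlation analysis (CCA) is generalized with pattern classification in mind, and a framework of generalized canonical correlation analysis (GCCA) for pattern recognition is established. In this framework, first, based on the generalized canonical correlation discriminant criterion, the generalized projective vectors of the two groups of feature vectors are solved to com...
[Keywords] artificial intelligence; pattern recognition; handwritten Chinese character recognition; generalized canonical correlation analysis; feature fusion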
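For orientation, the following sketch implements ordinary (not generalized) CCA for two feature sets, followed by a serial fusion step that concatenates the canonical projections. It assumes that only the leading canonical pair is kept and adds a small ridge term `reg` for numerical stability. The GCCA discriminant criterion and the paper's own fusion strategy are not reproduced. Pure Python is used so the example has no dependencies.

```python
import math


def center(rows):
    """Subtract the column means from a list of sample rows."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def cov(A, B, reg=0.0):
    """Sample cross-covariance of centered A (n x p) and B (n x q); reg is added to the diagonal."""
    n = len(A)
    C = [[sum(a[i] * b[j] for a, b in zip(A, B)) / (n - 1) for j in range(len(B[0]))]
         for i in range(len(A[0]))]
    for i in range(min(len(C), len(C[0]))):
        C[i][i] += reg
    return C


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def top_canonical_pair(X, Y, reg=1e-6, iters=500):
    """Leading CCA directions via power iteration on Sxx^-1 Sxy Syy^-1 Syx."""
    Xc, Yc = center(X), center(Y)
    Sxx, Syy, Sxy = cov(Xc, Xc, reg), cov(Yc, Yc, reg), cov(Xc, Yc)
    Syx = [list(col) for col in zip(*Sxy)]
    wx = [1.0] * len(Sxx)
    for _ in range(iters):
        wx = solve(Sxx, matvec(Sxy, solve(Syy, matvec(Syx, wx))))
        norm = math.sqrt(sum(v * v for v in wx))
        wx = [v / norm for v in wx]
    return wx, solve(Syy, matvec(Syx, wx))


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def serial_fusion(x, y, wx, wy):
    """Serial strategy: concatenate the canonical projections of the two feature vectors."""
    return [dot(wx, x), dot(wy, y)]


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den


# Two toy feature sets that share one latent signal z.
z = [1, 2, 3, 4, 5, 6, 7, 8]
n1 = [2, -1, 3, 0, 1, -2, 4, -3]
n2 = [-1, 2, 0, 3, -2, 1, -3, 2]
X = [[a, b] for a, b in zip(z, n1)]
Y = [[a, b] for a, b in zip(n2, z)]
wx, wy = top_canonical_pair(X, Y)
rho = pearson([dot(wx, x) for x in X], [dot(wy, y) for y in Y])
fused = [serial_fusion(x, y, wx, wy) for x, y in zip(X, Y)]
print(round(rho, 4), fused[0])
```

Because the first column of `X` equals the second column of `Y`, the leading canonical correlation is essentially 1; keeping more canonical pairs would require deflation, which this sketch omits.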



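A Speaker Clustering Algorithm for Speech Recognition
[Authors] Xiao Shucai (肖述才); Ou Zhijian (欧智坚); Wang Zuoying (王作英)

[Chinese Abstract] This paper introduces a speaker clustering algorithm used in robust speech recognition, covering its role and concrete usage in speech recognition, the features and distance measures commonly used in clustering, and the concrete steps of the clustering procedure. We tested the algorithm's performance in two ways: first, by directly computing the accuracy of sentence clustering; second, by measuring how much it improves speaker adaptation, i.e., by comparing the system's performance after the algorithm is applied. Experiments show that with the GLR distance as the distance measure, the algorithm clusters sentences with an accuracy of 85.69%. In recognition experiments, the clustering algorithm makes more data available for speaker adaptation and improves the adaptation, bringing the system's error rate close to that obtained when adapting with known speaker information.

[Abstract] In this paper, we introduce a speaker clustering algorithm for speech recognition, including its role in the recognition system. Its usage, the features used, the distance measures and the procedure of the algorithm are also described. To evaluate the effectiveness of the algorithm, we conduct two kinds of experiments. One calculates the clustering accuracy directly, and the other compares the word error rate (WER) of the recognition system under two different conditions: whether usin...
[Keywords] computer application; Chinese information processing; speaker clustering; speaker adaptation; speech recognition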
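A minimal sketch of bottom-up speaker clustering with a GLR (generalized likelihood ratio) distance, assuming each sentence is modeled by a single diagonal-covariance Gaussian over its feature frames and that merging stops at a hypothetical `threshold`. The features, models and stopping rule used in the paper may differ.

```python
import math


def log_det_diag(frames, floor=1e-6):
    """Log-determinant of the ML diagonal-covariance Gaussian fitted to the frames."""
    n = len(frames)
    total = 0.0
    for column in zip(*frames):
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        total += math.log(max(var, floor))  # variance floor avoids log(0)
    return total


def glr_distance(a, b):
    """GLR distance: 0.5 * (N log|S| - Na log|Sa| - Nb log|Sb|), pooled vs separate models."""
    na, nb = len(a), len(b)
    return 0.5 * ((na + nb) * log_det_diag(a + b)
                  - na * log_det_diag(a) - nb * log_det_diag(b))


def cluster_segments(segments, threshold):
    """Agglomerative clustering: merge the closest pair until the smallest distance exceeds threshold."""
    clusters = [[i] for i in range(len(segments))]
    frames = [list(seg) for seg in segments]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = glr_distance(frames[i], frames[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] += clusters.pop(j)  # j > i, so index i is unaffected by the pop
        frames[i] += frames.pop(j)
    return clusters


# Three one-dimensional "sentences": A and B come from similar distributions, C does not.
A = [[-1.0], [0.0], [1.0], [-0.5], [0.5]]
B = [[v + 0.2 for v in f] for f in A]
C = [[v + 5.0 for v in f] for f in A]
print(cluster_segments([A, B, C], threshold=5.0))  # → [[0, 1], [2]]
```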



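Utterance Topic Analysis in Spoken Dialogue
[Authors] Xu Weiqun (徐为群); Xu Bo (徐波); Huang Taiyi (黄泰翼)

[Chinese Abstract] This paper studies how to determine the topics of utterances in spontaneous spoken dialogue on the basis of shallow semantic analysis. First, the topic of an utterance in a dialogue is defined as the salient semantic entity the speaker is focusing on. Two properties of such topics (their discourse-level nature and their continuity) are discussed, together with the relationship between utterance topics and (extended) sentence types, which also involves introducing sentence types, their extension, and the recognition of extended sentence types. On this basis, an utterance topic analysis algorithm is built and applied to real dialogue corpora. Experimental results show that the accuracy of utterance topic analysis ranges from 61.1% to 87.6%, depending on the extended sentence type and on how accuracy is defined.

[Abstract] This paper investigates how to identify utterance topics in spontaneous spoken dialogues based on shallow semantic analysis. First, the topic of an utterance is defined as the salient semantic entity on which its speaker focuses his/her attention. Then we discuss two features of such a topic (i.e., topic as a discourse construct and topic continuity) and the relationship between utterance topic and (extended) sentence type. On this basis, an algorithm is established to identify utterance topics and evaluated...
[Keywords] artificial intelligence; natural language processing; utterance topic; sentence type; dialogue; computational analysis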
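The abstract names topic continuity as a property of utterance topics but does not give the algorithm. As a toy illustration only, the sketch below assumes each utterance arrives with salience scores for its semantic entities (from some unspecified shallow semantic analysis), picks the most salient entity as the topic, gives the previous topic a hypothetical `continuity_bonus`, and carries the previous topic over when an utterance names no entity. It ignores the extended sentence types the paper relies on.

```python
def utterance_topics(utterances, continuity_bonus=0.3):
    """Pick one topic per utterance.

    utterances: list of utterances, each a list of (entity, salience) pairs.
    The salience scores and the bonus value are hypothetical inputs.
    """
    topics, previous = [], None
    for entities in utterances:
        if not entities:
            topic = previous  # no explicit entity (e.g. ellipsis): keep the previous topic
        else:
            # Favor the previous topic slightly to model topic continuity.
            topic = max(entities,
                        key=lambda e: e[1] + (continuity_bonus if e[0] == previous else 0.0))[0]
        topics.append(topic)
        previous = topic
    return topics


dialogue = [
    [("flight", 0.9), ("Beijing", 0.5)],
    [("Beijing", 0.6), ("flight", 0.4)],
    [],
    [("hotel", 0.8)],
]
print(utterance_topics(dialogue))  # → ['flight', 'flight', 'flight', 'hotel']
```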



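Ethnic-Minority Script Support in the X Window Core System
[Authors] Xie Qian (谢谦); Wu Jian (吴健); Sun Yufang (孙玉芳)

[Chinese Abstract] Support for ethnic-minority scripts in Linux must be built on the internationalization mechanism. After summarizing the layered structure of the existing Linux internationalization framework, this paper analyzes several key issues in internationalizing the X core system. Taking the addition of Tibetan support as an example, it systematically describes the changes to the X core system needed to support an ethnic-minority script, and evaluates how these changes were implemented in related projects and how well they worked. Finally, drawing on research into other ethnic-minority script systems, it analyzes the limitations of this work in depth and proposes directions for future work.

[Abstract] Ethnic language support in Linux should be based on the internationalization (I18N) mechanism. In this paper, after summarizing the hierarchical structure of the Linux I18N framework, several crucial issues related to the X Window core system are analyzed. The modifications to the X Window core system needed to add ethnic language support are systematically enumerated, using the addition of Tibetan support as an example. The implementation in a related project is evaluated. Along with the research on other ethnic languag...
[Keywords] computer application; Chinese information processing; X Window System; internationalization; Tibetan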



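A Generalized Matching Algorithm for Example Patterns in the Multi-Strategy Machine Translation System IHSMTS
[Authors] Zhang Xiaofei (张孝飞); Chen Zhaoxiong (陈肇雄); Huang Heyan (黄河燕); Hu Chunling (胡春玲)

[Chinese Abstract] EBMT based on exact matching has such low translation coverage that it is hard to apply in practice at large scale. This paper proposes a generalized matching algorithm for example patterns that aims to improve EBMT translation coverage: guided by the input sentence to be translated, it generalizes candidate translation examples selectively and in real time. The algorithm therefore meets the speed requirements of real-time document translation while making full use of the translation knowledge that users add or modify while using the system, improving overall translation coverage and translation quality. Experimental results show that with a corpus of 160,000 sentence pairs, the system's translation coverage reaches about 75%, demonstrating the effectiveness of the algorithm.

[Abstract] Example-based machine translation (EBMT) is currently difficult to apply at large scale because of its low translation coverage. In this paper, an algorithm for generalized matching of translation examples is proposed to improve the translation coverage of EBMT: candidate translation examples are generalized in real time under the control and guidance of the input sentence to be translated. The algorithm not only meets the speed requirements of real-time document translation but also can use the new language knowledge...
[Keywords] artificial intelligence; machine translation; example-based machine translation; generalized matching; translation coverage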
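A toy sketch of input-guided generalization, assuming a hypothetical lexicon that assigns each word a category and a translation, and an example stored with a source-to-target word alignment. Where the input differs from the example only in same-category words at aligned positions, those slots are generalized and filled from the lexicon; otherwise the example is rejected. Real EBMT matching, including that of IHSMTS, handles far more than equal-length substitutions.

```python
# Hypothetical lexicon: word -> (category, English translation).
LEXICON = {
    "两": ("NUM", "two"),
    "三": ("NUM", "three"),
    "书": ("NOUN", "books"),
    "杂志": ("NOUN", "magazines"),
}


def generalized_match(input_tokens, example, lexicon=LEXICON):
    """Adapt one translation example to the input sentence, or return None.

    example = (source_tokens, target_tokens, alignment), where alignment maps a
    source position to the target position that translates it. Only examples of
    the same length as the input are considered in this sketch.
    """
    source, target, alignment = example
    if len(source) != len(input_tokens):
        return None
    output = list(target)
    for pos, (src_word, in_word) in enumerate(zip(source, input_tokens)):
        if src_word == in_word:
            continue
        src_entry, in_entry = lexicon.get(src_word), lexicon.get(in_word)
        # Generalize this slot only if both words share a category and the slot is aligned.
        if not (src_entry and in_entry and src_entry[0] == in_entry[0] and pos in alignment):
            return None
        output[alignment[pos]] = in_entry[1]
    return output


example = (["我", "有", "两", "本", "书"], ["I", "have", "two", "books"], {0: 0, 1: 1, 2: 2, 4: 3})
print(generalized_match(["我", "有", "三", "本", "杂志"], example))  # → ['I', 'have', 'three', 'magazines']
print(generalized_match(["我", "看", "两", "本", "书"], example))    # → None
```

Generalizing only the slots where the input actually differs keeps the work proportional to the input, which is one way to reconcile real-time speed with broader coverage.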



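Rule-Weighted Associative Text Classification
[Authors] Chen Xiaoyun (陈晓云); Hu Yunfa (胡运发)

[Chinese Abstract] In recent years, text classification methods based on association rules have attracted wide attention. Although such methods generally achieve good classification results, when the feature words are distributed very unevenly across samples, the classification rules are also distributed unevenly across categories, which lowers classification accuracy. The association-rule text classification algorithm with rule weight adjustment designed and implemented in this paper effectively solves this problem. The algorithm defines rule strength according to the number of misclassified training samples. Strong rules have their weights reduced by multiplying by an adjustment factor less than 1, while weak rules have their weights increased by multiplying by an adjustment factor greater than 1. Experimental results show that classification quality improves significantly after the rule weights are adjusted.

[Abstract] Recently, categorization methods based on association rules have been given much attention. In general, associative classification achieves high accuracy and good performance. However, the classification accuracy drops rapidly when the distribution of feature words in the training set is uneven. Therefore, this paper proposes a text categorization algorithm, Weighted Association Rules Categorization (WARC). In this method, rule intensity is defined according to the number of misclassified training samp...
[Keywords] computer application; Chinese information processing; associative classification; rule strength; weight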
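The abstract defines rule strength through misclassified training samples and adjusts weights multiplicatively, but does not give the exact rule. The sketch below assumes one reading: a rule that fired for the wrongly predicted class on a misclassified sample is treated as "strong" and multiplied by `shrink` < 1, while a rule of the true class that fired but lost is treated as "weak" and multiplied by `boost` > 1. Factor values, rule format and stopping criterion are hypothetical.

```python
from collections import defaultdict


def classify(doc, rules, weights):
    """Sum the weights of matching rules per class and return the best class (or None)."""
    scores = defaultdict(float)
    for (condition, label), w in zip(rules, weights):
        if condition <= doc:  # rule fires if all of its words occur in the document
            scores[label] += w
    return max(scores, key=scores.get) if scores else None


def adjust_weights(train, rules, shrink=0.8, boost=1.25, rounds=10):
    weights = [1.0] * len(rules)
    for _ in range(rounds):
        wrong = [0] * len(rules)   # misclassified samples a rule pushed toward the wrong class
        missed = [0] * len(rules)  # misclassified samples where a true-class rule fired but lost
        for doc, label in train:
            predicted = classify(doc, rules, weights)
            if predicted == label:
                continue
            for i, (condition, rule_label) in enumerate(rules):
                if condition <= doc:
                    if rule_label == predicted:
                        wrong[i] += 1
                    elif rule_label == label:
                        missed[i] += 1
        if not any(wrong) and not any(missed):
            break
        for i in range(len(rules)):
            if wrong[i]:
                weights[i] *= shrink  # "strong" rule: factor < 1
            elif missed[i]:
                weights[i] *= boost   # "weak" rule: factor > 1
    return weights


rules = [
    (frozenset({"ball"}), "sports"),
    (frozenset({"game"}), "sports"),
    (frozenset({"vote"}), "politics"),
]
train = [
    ({"ball", "game", "vote"}, "politics"),
    ({"ball"}, "sports"),
    ({"vote"}, "politics"),
]
weights = adjust_weights(train, rules)
print(classify({"ball", "game", "vote"}, rules, [1.0] * 3))  # → sports
print(classify({"ball", "game", "vote"}, rules, weights))    # → politics
```

With equal weights the over-represented "sports" rules outvote the single "politics" rule; after two rounds of adjustment the first training document is classified correctly and the loop stops.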



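A Projection-Pursuit-Based Algorithm for Chinese Web Page Classification
[Authors] Wan Zhongying (万中英); Wang Mingwen (王明文); Liao Haibo (廖海波)

[Chinese Abstract] With the rapid growth of information on the Web, users increasingly need automatic Web page classifiers. To improve classification accuracy, this paper proposes a new Chinese Web page classification algorithm based on projection pursuit (PP). We first use a genetic algorithm to find the best projection direction, then project the Web pages, already represented as n-dimensional vectors, onto a one-dimensional space, and finally classify them with the KNN algorithm. This method can address the "curse of dimensionality". Experimental results show that the proposed algorithm is feasible and effective.

[Abstract] With the rapid growth of the World Wide Web (WWW), there is an increasing need to provide Web users with automated classifiers for Web page classification and categorization. In this paper, we propose a new Web page classification algorithm based on projection pursuit to improve accuracy. We first seek the best projection direction using a genetic algorithm, and then project each Web document (represented as an n-dimensional vector) onto a one-dimensional space. Then we classify the Web document using the classical KNN (k...
[Keywords] computer application; Chinese information processing; projection pursuit; Web page classification; genetic algorithm; KNN algorithm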
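A compact sketch of the pipeline, with two labeled assumptions: the projection index (which the abstract does not name) is taken to be a Fisher-style ratio of between-class to within-class scatter of the projected training data, and the genetic algorithm is a minimal elitist variant with arithmetic crossover and Gaussian mutation. All names and parameter values are illustrative.

```python
import math
import random
from collections import Counter


def project(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))


def normalize(a):
    n = math.sqrt(sum(v * v for v in a))
    return [v / n for v in a] if n else [1.0] + [0.0] * (len(a) - 1)


def fisher_index(a, X, y):
    """Hypothetical projection index: between-class over within-class scatter in 1-D."""
    z = [project(a, x) for x in X]
    overall = sum(z) / len(z)
    between = within = 0.0
    for c in set(y):
        zc = [v for v, label in zip(z, y) if label == c]
        mc = sum(zc) / len(zc)
        between += len(zc) * (mc - overall) ** 2
        within += sum((v - mc) ** 2 for v in zc)
    return between / (within + 1e-12)


def ga_direction(X, y, pop_size=30, generations=50, seed=0):
    """Search for a unit projection direction with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    dim = len(X[0])
    pop = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda a: fisher_index(a, X, y), reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            t = rng.random()  # arithmetic crossover, then Gaussian mutation
            children.append(normalize([t * u + (1 - t) * v + rng.gauss(0, 0.1)
                                       for u, v in zip(p1, p2)]))
        pop = elite + children
    return max(pop, key=lambda a: fisher_index(a, X, y))


def knn_1d(a, X, y, query, k=3):
    """Classify query by majority vote among the k nearest projected training points."""
    q = project(a, query)
    nearest = sorted(zip((abs(project(a, x) - q) for x in X), y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


X = [[0, 1, -1], [1, -1, 0], [0.5, 0, 1], [-0.5, 0.5, -0.5],
     [10, 1, 0], [9, -1, 1], [10.5, 0, -1], [9.5, 0.5, 0.5]]
y = ["A"] * 4 + ["B"] * 4
a = ga_direction(X, y)
print(knn_1d(a, X, y, [0.5, 0.2, -0.1]), knn_1d(a, X, y, [9.5, -0.3, 0.2]))
```

Once the direction is found, KNN only compares scalar projections, which is where the reduction in dimensionality pays off.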



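Research on Automatic Conversion from Phrase Structure Trees to Dependency Trees
[Authors] Dang Zhengfa (党政法); Zhou Qiang (周强)

[Chinese Abstract] Conversion between treebanks with different annotation schemes is an important topic in computational linguistics. After summarizing several treebank annotation schemes used in China and abroad and the practice of converting between them, and taking into account the characteristics of the annotation scheme of the Tsinghua Chinese Treebank (TCT), this paper proposes an algorithm that converts TCT from phrase structure to dependency structure. The algorithm makes full use of TCT's dual functional and structural tags; the resulting dependency trees contain not only the hierarchical dependency relations between nodes but also the specific dependency relation type between each pair of dependent nodes. A sample-based evaluation of the conversion shows an accuracy of 97.37%.

[Abstract] Automatic conversion between treebanks with different annotation schemes is an important subject in natural language processing. After briefly summarizing several treebank annotation schemes and the conversions between them, we propose a new algorithm to automatically convert the Tsinghua Chinese Treebank (TCT) from phrase structure to dependency structure. This algorithm makes full use of the syntactic constituent tags and grammatical relation tags of TCT and generates a dependency-structure treebank. The ou...
[Keywords] artificial intelligence; natural language processing; treebank; phrase structure tree; dependency tree; automatic conversion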
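Head-rule conversion is a common way to turn phrase-structure trees into dependency trees; the sketch below shows that general technique, not the authors' TCT-specific algorithm. The head-rule table and tag names are hypothetical, and relation labels are simply built from the parent and dependent labels as a stand-in for the functional tags TCT provides.

```python
# Hypothetical head rules: phrase label -> (search direction, preferred head-child labels).
HEAD_RULES = {
    "S": ("left", ["VP"]),
    "VP": ("left", ["V", "VP"]),
    "NP": ("right", ["N", "NP"]),
}


def pick_head(label, child_labels, rules):
    """Return the index of the head child according to the head-rule table."""
    direction, preferred = rules.get(label, ("left", []))
    order = list(range(len(child_labels)))
    if direction == "right":
        order.reverse()
    for wanted in preferred:
        for i in order:
            if child_labels[i] == wanted:
                return i
    return order[0]  # fallback: first child in the search direction


def to_dependencies(tree, rules=HEAD_RULES):
    """Convert a phrase-structure tree into (dependent, head, relation) triples.

    A node is (label, word) for a leaf or (label, [children]) for a phrase.
    """
    words, deps = [], []

    def visit(node):
        label, body = node
        if isinstance(body, str):  # leaf: record the word and return its index
            words.append(body)
            return len(words) - 1
        heads = [visit(child) for child in body]
        h = pick_head(label, [child[0] for child in body], rules)
        for i, child in enumerate(body):
            if i != h:
                deps.append((heads[i], heads[h], f"{label}:{child[0]}"))
        return heads[h]  # the phrase's lexical head percolates upward

    root = visit(tree)
    return words, deps, root


tree = ("S", [("NP", [("N", "I")]),
              ("VP", [("V", "like"), ("NP", [("N", "apples")])])])
words, deps, root = to_dependencies(tree)
print(words, deps, root)
```

Here "like" becomes the root, with "I" attached as `S:NP` and "apples" as `VP:NP`; a treebank with explicit functional tags can replace these synthesized labels with real relation types.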



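Research on a Korean Knowledge Base for Language Information Processing
[Authors] Bi Yude (毕玉德)

[Chinese Abstract] In natural language processing systems, including machine translation systems, dictionaries of grammatical and semantic information are indispensable components. Building on semantic engineering research in China and abroad, this paper constructs an information-processing-oriented Korean knowledge base through an integrated syntactic and semantic description of Korean predicates. The linguistic foundations of the work are argument structure theory and semantic field theory. We first classify the predicates semantically and then describe the attributes of each predicate sense in detail. In building the knowledge base, a struct-based representation is used to integrate the syntactic, semantic and other attributes of each predicate.

[Abstract] In any NLP system, including MT systems, a dictionary of syntactic and semantic information is an essential component. Based on the achievements of semantic engineering research both at home and abroad, the present paper provides an integrated description of the syntax and semantics of Korean predicates, with the aim of constructing an information-processing-oriented Korean knowledge base. The semantic framework is drawn from argument structure theory and semantic field theory. We begin with a semantic classification ...
[Keywords] computer application; Chinese information processing; Korean; knowledge base; integrated description; struct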
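To illustrate what a struct-based entry integrating syntactic and semantic attributes might look like, the sketch below uses Python dataclasses. The field names, semantic-field labels and selectional classes are hypothetical; the case particles shown (이/가 for the nominative, 을/를 for the accusative, 에 for a goal) are standard Korean grammar.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    role: str            # argument (thematic) role, e.g. "agent"
    case_marker: str     # Korean particle that typically marks this argument
    semantic_class: str  # selectional restriction on the filler (hypothetical labels)


@dataclass
class PredicateSense:
    lemma: str
    sense_id: int
    semantic_field: str  # class assigned in the semantic classification step
    arguments: List[Argument] = field(default_factory=list)


def senses_in_field(kb, semantic_field):
    """Return all predicate senses filed under one semantic field."""
    return [s for s in kb if s.semantic_field == semantic_field]


kb = [
    PredicateSense("먹다", 1, "ingestion", [  # "to eat"
        Argument("agent", "이/가", "animate"),
        Argument("patient", "을/를", "food"),
    ]),
    PredicateSense("가다", 1, "motion", [  # "to go"
        Argument("agent", "이/가", "animate"),
        Argument("goal", "에", "place"),
    ]),
]
print([s.lemma for s in senses_in_field(kb, "motion")])  # → ['가다']
```

Keeping syntax (case particles) and semantics (roles, selectional classes, semantic field) in one record lets a parser or MT component consult both with a single lookup.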



© Chinese Information Processing Society of China 1981-2007