Society Notices



Notice on Membership Development of the Chinese Information Processing Society of China

 


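        To advance reform of the Society, build a member-centered governance structure, and strengthen the membership management system, the Society is establishing an individual member registration system. The system follows the requirements and provisions of the China Association for Science and Technology (CAST) "Notice on Standardizing Individual Member Registration Numbers of National Societies", adapted to the Society's own circumstances.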

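Brief member registration procedure:

1. Applicants should download and complete the "Member Information Registration Form".

2. Send the completed "Member Information Registration Form" by e-mail to the Membership Department of the Society office (huangyi@iscas.ac.cn).

3. Pay the membership fee by any one of the following three methods:

    1) Bank transfer:

        Bank: Industrial and Commercial Bank of China, Beijing Branch, Haidian West District Sub-branch (工商行北京市分行海淀西区支行)

        Account name: 中国中文信息学会 (Chinese Information Processing Society of China)

        Account number: 0200004509014415619

        Note: please state the member's name in the remarks field

    2) Postal remittance:

        Address: P.O. Box 8718, Beijing, "中国中文信息学会"

        Payee: 中国中文信息学会 (Chinese Information Processing Society of China)

        Postal code: 100190

        Telephone: 010-62562916

        Note: please state the member's name in the remarks field

    3) Payment in person at the Society office:

        Address: Room 201, Building 7, No. 4 Courtyard, Zhongguancun South 4th Street, Haidian District, Beijing

        Telephone: 010-62562916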


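Individual membership fees of the Chinese Information Processing Society of China for 2013:

        Individual member: 120 yuan/year

        Student member: 60 yuan/year


        After registering and paying the fee, members receive a member registration number and a membership card. On presenting the card at academic events hosted by the Society, members receive discounted fees. Members also regularly receive the Society's member newsletter (electronic edition) free of charge.

        To encourage more scholars to join, all regular members and some student members (in order of payment, first come, first served, while supplies last) who complete membership registration for 2013 will receive a free full-year 2013 subscription to the Journal of Chinese Information Processing (《中文信息学报》, print edition).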


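Attached: Constitution of the Chinese Information Processing Society of China


Chinese Information Processing Society of China
March 15, 2013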

Academic Events



IJCAI Workshop - August 3-5, 2013, Beijing, China

 


About

This workshop will explore the novel use of techniques from machine learning, data mining, text mining, information retrieval, statistics, information security and privacy, and user modelling to identify patterns of potentially positive and negative activity in social media by examining online content, social interactions, and user behaviours. It will also study metrics for measuring the positive and negative impact of social media on individuals, business organizations, and government agencies. The analysis and mining of these patterns aims to promote positive activities in social media while revealing its harmful aspects and suggesting ways to tackle and overcome them.


Objectives

In recent years, social media has continued to grow in popularity and has become a powerful platform for people to unite around common interests. This explosive growth has turned it into a double-edged sword. On the one hand, the information revolution has had a positive impact on society. Social platforms offer a canvas for self-expression where users can create, manipulate, and share content. Their positive impacts include bringing information out of nations in conflict to the rest of the world. They have also proven to be an effective way of propagating information, often spreading the word before mainstream media publish a story. This has been particularly useful for word-of-mouth mobilisation in emergency response and crisis situations.


On the other hand, social media platforms have also acted as a catalyst in fuelling violent events. Insults and personal attacks online, along with socially disruptive patterns in online social behaviour, have become more and more common. Young people are becoming increasingly narcissistic and obsessed with self-image and shallow friendships, partly due to the use of Facebook and other social media platforms. Social media addiction can also lead to low self-esteem and even anti-social behaviour.


The aim of this workshop is to bring together researchers from various backgrounds including those from computer science, social science, and psychology, to discuss the current and emergent topics, and cutting-edge approaches to address issues relating to both positive and negative sides of social media.


Important Dates

  • April 20, 2013 – Paper submission deadline
  • May 20, 2013 – Paper acceptance notification
  • May 30, 2013 – Camera-ready copy due

Contact

E-mail: pansom13@easychair.org

Twitter hashtag: #pansom13


Details:

http://t.cn/zY3hYgz

Baidu Campus Movie Recommendation System Algorithm Innovation Contest Officially Launched

 




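        Online content is now so rich, and the volume of information so large, that it is hard to keep up with. This is where content recommendation engines come in: they recommend the content we want based on our preferences, and even by analyzing our behavior as users. Abroad, recommendation engines for specific verticals have sprung up in recent years, for example Pandora for music, Goodreads for books, and Netflix for video. With so many engines, users are overwhelmed and regularly receive all kinds of content recommendations. In China, meanwhile, the rapid growth of the Internet, the accelerating build-out of wireless cities, and ubiquitous WiFi hotspots have made laptops, tablets, and phones the main means of entertainment and leisure, and viewing habits have gradually shifted from television to the web. Yet because the volume of online video is growing exponentially, choices keep multiplying, and people who want to watch a film or video often do not know what to pick.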


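        As the world's largest Chinese-language search engine, Baidu has long worked to extend the scope and depth of search, with the most comfortable search experience for users as its constant goal. To address the problem that users cannot effectively find films that interest them, Baidu has set up a dedicated technical team that crawls videos from social networks the user has authorized, analyzes the user's interests from their social relationships and browsing history, recommends videos, and presents them to the user in the best order. Across China's Internet market, the number of online video users has reached 170 million. Extracting, analyzing, and computing over information for such a large user base is an enormous workload, so to expand its development team Baidu is holding the "Movie Recommendation Algorithm Innovation Contest" in the hope of discovering more top-tier technical talent. The contest went live on March 1, 2013. Whether you are an undergraduate, a practitioner working on data mining, a master's or doctoral student, or a student studying abroad, you are welcome to sign up as long as you have solid technical skills and plenty of enthusiasm.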
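The idea of recommending videos from a user's social relationships and browsing history can be illustrated with a minimal sketch. This is not Baidu's method, which the announcement does not describe. It assumes, purely for illustration, that each friend's viewing history votes for unseen videos with a weight equal to the Jaccard overlap between that friend's history and the user's own. The names `jaccard`, `recommend`, `history`, and `friends` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, history, friends, top_n=3):
    """Rank videos the user has not seen, scored by similar friends.

    history: dict mapping user -> set of watched video ids (assumed format)
    friends: dict mapping user -> list of friend ids (assumed format)
    """
    seen = history.get(user, set())
    scores = {}
    for friend in friends.get(user, ()):
        friend_seen = history.get(friend, set())
        # Friends whose viewing history overlaps more get a larger vote.
        weight = jaccard(seen, friend_seen)
        for video in friend_seen - seen:
            scores[video] = scores.get(video, 0.0) + weight
    # Highest score first; ties broken by video id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [video for video, _ in ranked[:top_n]]


history = {"ann": {"A", "B"}, "bob": {"A", "B", "C"}, "cat": {"B", "D"}}
friends = {"ann": ["bob", "cat"]}
print(recommend("ann", history, friends))
```

Here "bob" shares more history with "ann" than "cat" does, so the video only "bob" has watched ranks ahead of the one only "cat" has watched. A contest entry would more likely learn such weights from rating data than fix them by hand.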


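        Technical and data support for this Campus Brand Department event comes from the technical team of Baidu's vertical search department. The team is led by Dr. Wang Guanchun (汪冠春), recently returned from Princeton, and Hu Yichuan (胡一川) of the University of Pennsylvania, and its other members also come from Baidu's technical staff. To encourage participation, the organizers have set up generous prizes, with a first prize of up to 10,000 yuan. Participants will have the chance to match skills, exchange techniques, and discuss algorithms with the event's organizers, Wang Guanchun and Hu Yichuan, doctoral researchers from world-leading universities. In addition, standout participants will have the opportunity to intern in Baidu's technical departments and go on to join Baidu. Contest website: http://openresearch.baidu.com/topic/40.jspx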


Google Research Releases Wikilinks Corpus With 40M Mentions And 3M Entities

 


Google Research just launched its Wikilinks corpus, a massive new data set for developers and researchers that could make it easier to add smart disambiguation and cross-referencing to their applications. The data could, for example, make it easier to find out if two web sites are talking about the same person or concept, Google says. In total, the corpus features 40 million disambiguated mentions found within 10 million web pages. This, Google notes, makes it "over 100 times bigger than the next largest corpus," which features fewer than 100,000 mentions.


For Google, of course, disambiguation is something that is a core feature of the Knowledge Graph project, which allows you to tell Google whether you are looking for links related to the planet, car or chemical element when you search for 'mercury,' for example. It takes a large corpus like this one and the ability to understand what each web page is really about to make this happen.


To construct this data set, Google looked at links to Wikipedia pages "where the anchor text of the link closely matches the title of the target Wikipedia page." There is a high probability that such anchor text is a mention of the entity discussed in the target Wikipedia entry.
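A minimal sketch of this kind of filter is shown below. The article does not say how "closely matches" was defined, so the criterion here is an assumption: both strings are lowercased, a trailing parenthetical disambiguator in the title is dropped, and the pair is accepted when the standard-library `difflib.SequenceMatcher` ratio reaches a hypothetical threshold of 0.9. The function names are illustrative and do not come from Google's or UMass's tools.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import unquote, urlparse


def title_from_wikipedia_url(url):
    """Return the page title encoded in a Wikipedia URL, or None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host != "wikipedia.org" and not host.endswith(".wikipedia.org"):
        return None
    if not parsed.path.startswith("/wiki/"):
        return None
    title = unquote(parsed.path[len("/wiki/"):]).replace("_", " ")
    return title or None


def normalize(text):
    """Lowercase, collapse whitespace, drop a trailing '(...)' qualifier."""
    text = re.sub(r"\s*\([^)]*\)\s*$", "", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def is_probable_mention(anchor_text, url, threshold=0.9):
    """True if the anchor text closely matches the target page title.

    The threshold is an assumed value, not one taken from the article.
    """
    title = title_from_wikipedia_url(url)
    if title is None:
        return False
    ratio = SequenceMatcher(None, normalize(anchor_text), normalize(title)).ratio()
    return ratio >= threshold


print(is_probable_mention("Mercury", "https://en.wikipedia.org/wiki/Mercury_(planet)"))
print(is_probable_mention("click here", "https://en.wikipedia.org/wiki/Mercury_(planet)"))
```

The first call accepts the link because the anchor equals the title once the "(planet)" qualifier is removed. The second rejects a generic anchor that says nothing about the target entity.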


The 10 million annotated web pages, sadly, aren't part of the corpus because of copyright issues, but the UMass Wikilinks project features all the necessary tools to create this data from scratch. The UMass team also published a paper that explains the process that was used to create this data set in more detail (PDF).


Last year, Google released a similar data set when it launched a database with over 7.5 million concepts and 175 million unique text strings, which is similar to what Google itself uses to suggest targeted keywords for advertisers. That set, too, was built by looking at Wikipedia articles to identify concepts and the anchor links that other websites used to link to them.


Details:

http://t.cn/zYmoqft

A Data Set for Name Disambiguation

Xuezhi Wang, Jie Tang, Hong Cheng, Philip S. Yu

KEG Group, Tsinghua University, China

 


Introduction

Name ambiguity has long been viewed as a challenging problem in many applications, such as scientific literature management, people search, and social network analysis. When we search for a person's name in these systems, many documents (e.g., papers, webpages) containing that name may be returned. Which documents are about the person we care about? Although much research has been conducted, the problem remains largely unsolved, especially with the rapid growth of information about people available on the Web.


We share related data sets and our ideas for name disambiguation on this page. If you use the data for publication, please kindly cite the following papers:

@article{Tang:12TKDE,
    author = {Jie Tang and Alvis C.M. Fong and Bo Wang and Jing Zhang},
    title = {A Unified Probabilistic Framework for Name Disambiguation in Digital Library},
    journal = {IEEE Transactions on Knowledge and Data Engineering},
    volume = {24},
    number = {6},
    year = {2012},
}

@INPROCEEDINGS{wang:adana:,
    AUTHOR = "Xuezhi Wang and Jie Tang and Hong Cheng and Philip S. Yu",
    TITLE = "ADANA: Active Name Disambiguation",
    BOOKTITLE = "ICDM'11",
    PAGES = {794--803},
    YEAR = {2011},
}


Details:

http://t.cn/zYnThJz

 


Course Resources on Data Mining, Data Analysis, Artificial Intelligence, and Machine Learning



Easyhadoop Technical University: Professional Hands-On Big Data Training Course


Details: http://t.cn/zY8JjWq


WWW 2013 Accepted Papers


Details: http://www2013.org/2013/02/18/www2013-accepted-papers/


Stanford Deep Learning Wiki


Details: http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial


Popular-science article from the February 2013 issue of Programmer magazine (《程序员》): Deep Learning - The New Wave of Machine Learning


Details: http://t.cn/zY3iQdZ


Journal of Chinese Information Processing (《中文信息学报》): Table of Contents



Journal of Chinese Information Processing, Vol. 27, No. 1, January 2013: Contents

 


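Title | Authors | Page

A Method for Computing Chinese Word Sense Similarity Based on Large-Scale Corpora | 石静, 吴云芳, 邱立坤, 吕学强 | 1

A Collocation-Based Method for Computing Chinese Lexical Semantic Similarity | 王石, 曹存根, 裴亚军, 夏飞 | 7

Research on Chinese-English Lexicon Construction Based on Bilingual Dependency Relation Mapping | 徐华, 刘丹丹, 钱龙华, 周国栋 | 15

Research on Automatic Extraction of Product "Attribute-Value" Relations from Web Pages | 唐伟, 洪宇, 冯艳卉, 姚建民, 朱巧明 | 21

An Event Hypergraph Model and Event Type Recognition | 肖升, 何炎祥 | 30

An Information Retrieval Method Based on Social Tags | 李鹏, 王斌, 晋薇 | 39

Research on Multi-Aspect Topic Sentiment Analysis of Chinese Blogs | 傅向华, 刘国, 郭岩岩, 郭武彪 | 47

Construction and Analysis of the Corpus for the Third Chinese Opinion Analysis Evaluation (COAE2011) | 廖祥文, 许洪波, 孙乐, 姚天昉 | 56

A Comparative Analysis of Consensus Decoding Methods in Statistical Machine Translation | 段楠, 李沐, 周明 | 64

BFSCTC: A Chinese Corpus Annotated with Sentence Semantic Structure | 刘盈盈, 罗森林, 冯扬, 韩磊, 陈功, 王倩 | 72

A Statistical Study of the Distribution of Sentence Focus in Narrative Texts | 赵建军, 杨玉芳, 吕士楠 | 81

Protein Interaction Relation Extraction Based on Composite Kernels | 李丽双, 刘洋, 黄德根 | 86

Design and Implementation of Software for Automatically Generating "Dialect Homophone Syllabaries" | 程南昌, 侯敏 | 93

An Acoustic Model Optimization Algorithm for Pronunciation Quality Evaluation | 严可, 魏思, 戴礼荣 | 98

Design and Implementation of a Mongolian Variant-Form Rendering Model under the New Standard System | 王震, 刘汇丹, 吴健 | 108

A Method for Identifying Boundaries of Modern Tibetan Sentences Ending in Auxiliary Verbs | 赵维纳, 于新, 刘汇丹, 李琳, 王磊, 吴健 | 115

Research and Implementation of a Keyboard Input System for Shuishu (the Sui Script) | 陈笑蓉, 杨撼岳, 郑高山, 黄千 | 120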