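Automatic extraction of metadata from back issues of online journals using the Octopus Collector (八爪鱼采集器)

Cui Yujie; Liao Kun

Article abstract

Cui Yujie, Liao Kun. Automatic extraction of metadata from back issues of online journals using the Octopus Collector. Acta Editologica (编辑学报), 2016, 28(5): 485-487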

English title: Realization of automatic extraction of metadata in back issues of network journals by octopus collector

Submitted: 2016-03-06 Revised: 2016-03-06

DOI：

Keywords: collector; webzine; metadata; automatic extraction

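Funding: 2015 Special Project of the Society of China University Journals (CUJS2015-010); Fundamental Research Funds for the Central Universities (SWU1609165); 2016 Fund of the national association of social science journals of science, engineering, agricultural and medical universities (LGNY16B8)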

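Author	Affiliation
Cui Yujie	Journal Press of Southwest University, Chongqing 400715, China
Liao Kun	Journal Press of Southwest University, Chongqing 400715, China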

Abstract views: 1050

Full-text downloads: 1193

Abstract:

Existing metadata extraction methods rely on cumbersome extraction rules and adapt poorly to new sources. To address this, the article proposes a new method that uses the Octopus Collector to extract metadata from back issues of online journals (webzines). Taking the web pages of large databases as its object, the method builds a flowchart for metadata extraction, sets the corresponding rules through that flowchart, and configures the data-capture module. The method is then applied to the automatic extraction of online-journal metadata. In practical use, it effectively improved metadata extraction performance and showed strong adaptability.
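The Octopus Collector is configured through a graphical flowchart rather than code, and the abstract does not give the actual rules. As a rough illustration of what "setting rules and then capturing data" can mean, the sketch below applies a small rule table to one HTML page using only the Python standard library. Everything in it is an assumption made for illustration: the `RULES` table, the `MetaCollector` and `extract_metadata` names, the sample page, and the choice of `citation_*` meta tags as the fields to read. It is not the authors' workflow or the tool's interface.

```python
from html.parser import HTMLParser

# Hypothetical rule table: output field -> name of the <meta> tag to read.
RULES = {
    "title": "citation_title",
    "authors": "citation_author",
    "journal": "citation_journal_title",
    "year": "citation_date",
}


class MetaCollector(HTMLParser):
    """Collects the content of every <meta name=... content=...> tag."""

    def __init__(self):
        super().__init__()
        self.found = {}  # meta name -> list of content values, in page order

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name and content:
            self.found.setdefault(name, []).append(content.strip())


def extract_metadata(html, rules=RULES):
    """Apply the rule table to one page; return a field -> list of values dict."""
    parser = MetaCollector()
    parser.feed(html)
    return {field: parser.found.get(meta, []) for field, meta in rules.items()}


# Invented sample page, standing in for one article page of a database.
SAMPLE_PAGE = """
<html><head>
<meta name="citation_title" content="An example article">
<meta name="citation_author" content="Author A">
<meta name="citation_author" content="Author B">
<meta name="citation_journal_title" content="Example Journal">
<meta name="citation_date" content="2016">
</head><body></body></html>
"""

record = extract_metadata(SAMPLE_PAGE)
print(record)
```

Keeping the rules in a table separate from the parsing code mirrors the adaptability the abstract emphasizes: supporting a differently structured source would mean editing the table, not the extraction logic.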
