Article Abstract
Practice on automatic extraction of metadata from Founder typesetting file and pages from PDF file for webzine: Website of Nursing of Integrated Traditional Chinese and Western Medicine as an example
PAN Xin, TENG Fei, YAN Yan
Received: 2017-11-20  Revised: 2018-01-10
Keywords: webzine; metadata; automatic extraction; PDF file; automatic split-merge
Author    Affiliation    E-mail
PAN Xin    Shanghai Jiao Tong University    panxin2015@sjtu.edu.cn
Abstract:
      Taking the Founder typesetting files (fbd files) used for Nursing of Integrated Traditional Chinese and Western Medicine as an example, this paper introduces techniques for choosing the "quasi-tag pairs" that delimit each metadata field in an fbd file, and methods for handling character compatibility and format equivalence between fbd files and HTML files. On this basis, high-quality webzine metadata can be extracted automatically and efficiently, and an issue PDF can be split precisely and automatically, with articles that continue on later pages merged back into a single file. Practice shows that, for a particular journal, all of this work can readily be done in-house.
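The abstract names three steps: locating each field between a quasi-tag pair, making the fbd text HTML-compatible, and assembling each article's pages (including pages it continues on) from the issue PDF. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. All marker strings (〖TITLE〗, 〖AUTH〗, 〖END〗, 〖IT〗, 〖/IT〗), the character and format maps, and the helper names `extract_field`, `to_html` and `page_plan` are hypothetical. They are not taken from the paper or from Founder's documentation.

```python
import html
import re
from typing import List, Optional, Tuple

# Hypothetical quasi-tag pairs: each metadata field is assumed to lie between
# a start marker and the next end marker in the fbd source. Real markers must
# be chosen by inspecting the journal's own typesetting files.
QUASI_TAG_PAIRS = {
    "title": ("〖TITLE〗", "〖END〗"),
    "authors": ("〖AUTH〗", "〖END〗"),
}

# Hypothetical inline format commands and their HTML equivalents.
FORMAT_MAP = {"〖IT〗": "<i>", "〖/IT〗": "</i>"}

# Hypothetical character fixes for characters that do not carry over to HTML.
CHAR_MAP = {"\u3000": " "}  # ideographic space -> ordinary space

# Any remaining bracketed command is assumed to be layout-only and dropped.
COMMAND_RE = re.compile(r"〖[^〗]*〗")


def extract_field(source: str, field: str) -> Optional[str]:
    """Return the raw text between a field's quasi-tag pair, or None."""
    start, end = QUASI_TAG_PAIRS[field]
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)  # shortest match
    match = re.search(pattern, source, re.S)
    return match.group(1) if match else None


def to_html(raw: str) -> str:
    """Convert raw field text to an HTML-safe string with equivalent formatting."""
    text = html.escape(raw, quote=False)  # escape &, <, > first
    for command, tag in FORMAT_MAP.items():
        text = text.replace(command, tag)  # then insert real HTML tags
    text = COMMAND_RE.sub("", text)
    for char, replacement in CHAR_MAP.items():
        text = text.replace(char, replacement)
    return text.strip()


def page_plan(
    start: int,
    end: int,
    continued: Optional[Tuple[int, int]] = None,
    offset: int = 0,
) -> List[int]:
    """List the 0-based PDF page indices for one article.

    `start`/`end` are printed page numbers of the main run; `continued` is an
    optional (first, last) printed range where the article resumes later.
    `offset` is an assumed shift between printed numbers and PDF pages
    (for example, 1 if the PDF has one unnumbered cover page).
    """
    pages = list(range(start - 1 + offset, end + offset))
    if continued is not None:
        first, last = continued
        pages.extend(range(first - 1 + offset, last + offset))
    return pages


if __name__ == "__main__":
    sample = "〖TITLE〗Nursing 〖IT〗care〖/IT〗 & practice〖END〗〖AUTH〗PAN Xin\u3000TENG Fei〖END〗"
    print(to_html(extract_field(sample, "title")))
    print(to_html(extract_field(sample, "authors")))
    print(page_plan(3, 5, continued=(12, 12)))
```

Escaping runs before the format map so that only the deliberately inserted tags survive as markup. To produce the per-article PDFs, the index list from `page_plan` could be passed to a PDF library such as pypdf, copying `reader.pages[i]` into a `PdfWriter` with `add_page` and then calling `write`. How printed page numbers map to PDF pages must be checked for each issue.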