Topic Mining and Evolution Analysis of Information Science Journals Based on LDA and word2vec Models
First published: 2023-05-12
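Abstract: The LDA model is often used to explore research trends in information science and the research hotspots of a given period. This study addresses two problems in LDA-based topic mining, namely how to choose the number of topics and the failure to consider the semantics of topic words when constructing topic associations, in order to provide a reference for enriching and improving topic evolution analysis methods. Taking Journal of Intelligence (《情报杂志》) as an example, an LDA topic model is run on a corpus that combines the abstracts, titles, and keywords of the journal's papers. When setting the number of topics, perplexity and average topic similarity are used together to determine a preliminary number of topics, and information entropy is then used to further filter the identified topics. When establishing associations for topic evolution, a topic evolution analysis method based on LDA and word2vec is proposed. By incorporating semantic representations, the method can effectively discover emergence, extinction, inheritance, division, and merging relationships among topics, which is of significance for researchers judging disciplinary trends and for decision makers identifying research priorities.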
Keywords: Management Science and Engineering; topic mining; topic evolution; LDA; word2vec; Journal of Intelligence (《情报杂志》)
LDA and word2vec Based Topic Mining and Evolution Analysis of Chinese Information Science Journal Paper
Abstract: The LDA model is widely used to investigate research trends and hotspots in information science. This study refines two aspects of LDA-based topic mining: determining the number of topics, and accounting for the semantic meaning of topic words when constructing topic associations, so as to provide a reference for enriching and improving topic evolution analysis methods. Taking Journal of Intelligence as an example, the LDA model is applied to a corpus that combines the abstracts, titles, and keywords of the journal's papers. The number of topics is first determined from perplexity and average topic similarity, and information entropy is then used to filter the identified topics. To establish associations between topics, an LDA and word2vec based topic evolution analysis method is proposed. By using semantic representations, the proposed method can effectively identify key topic evolution patterns such as emergence, extinction, inheritance, division, and merging of topic content, which is of great significance for researchers investigating trends in discipline development.
Keywords: Management Science and Engineering; topic mining; topic evolution; LDA; word2vec; Journal of Intelligence
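The abstract does not spell out how topics in adjacent time windows are associated, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each topic is represented by the weight-averaged vectors of its top words, that two topics are linked when their cosine similarity reaches a threshold, and that the five evolution patterns are read off the number of links per topic. The function names, the 0.8 threshold, the labelling rules, and the toy two-dimensional embeddings (standing in for trained word2vec vectors) are all illustrative assumptions.

```python
import numpy as np

def topic_vector(top_words, embeddings):
    """Weighted mean of word vectors; top_words is [(word, weight), ...]."""
    vecs = np.array([embeddings[w] for w, _ in top_words])
    weights = np.array([p for _, p in top_words])
    return weights @ vecs / weights.sum()

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def link_topics(prev_topics, next_topics, embeddings, threshold=0.8):
    """Link topics of two adjacent periods and label evolution events.

    The threshold and the labelling rules are assumptions for illustration.
    """
    prev_vecs = {k: topic_vector(v, embeddings) for k, v in prev_topics.items()}
    next_vecs = {k: topic_vector(v, embeddings) for k, v in next_topics.items()}
    links = [(p, n) for p in prev_vecs for n in next_vecs
             if cosine(prev_vecs[p], next_vecs[n]) >= threshold]
    out_deg = {p: sum(1 for a, _ in links if a == p) for p in prev_vecs}
    in_deg = {n: sum(1 for _, b in links if b == n) for n in next_vecs}

    events = []
    for p, d in out_deg.items():
        if d == 0:
            events.append(("extinction", p))   # no successor topic
        elif d > 1:
            events.append(("division", p))     # splits into several topics
    for n, d in in_deg.items():
        if d == 0:
            events.append(("emergence", n))    # no predecessor topic
        elif d > 1:
            events.append(("merging", n))      # several topics converge
    for p, n in links:
        if out_deg[p] == 1 and in_deg[n] == 1:
            events.append(("inheritance", (p, n)))  # one-to-one continuation
    return links, events

# Toy 2-D vectors standing in for word2vec embeddings (hypothetical data).
embeddings = {
    "retrieval": [1.0, 0.0], "search": [0.9, 0.1],
    "patent": [0.0, 1.0], "citation": [0.1, 0.9],
    "privacy": [-1.0, 0.2],
}
prev_topics = {"A": [("retrieval", 0.6), ("search", 0.4)],
               "B": [("privacy", 1.0)]}
next_topics = {"C": [("search", 0.7), ("retrieval", 0.3)],
               "D": [("patent", 0.5), ("citation", 0.5)]}

links, events = link_topics(prev_topics, next_topics, embeddings)
print(links)
print(events)
```

In practice the top words and weights would come from the LDA model fitted on each time window, and the embeddings from a word2vec model trained on the same corpus; the threshold would need to be tuned on the data.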