Volume 7, Issue 1 (6-2020) | Human Information Interaction 2020, 7(1): 15-26




Mansouri A, Zarmehr F, Karshenas H. A review of text mining approaches and their function in discovering and extracting a topic. Human Information Interaction 2020; 7 (1)
URL: http://hii.khu.ac.ir/article-1-2909-en.html
Isfahan University
Abstract:
Background and aim: This study examines four text mining methods (LSA, PLSA, LDA, and CTM), focusing on their properties and limitations in topic discovery.
Methodology: This study is an analytical review of the literature on text mining and topic modeling.
Findings: Latent semantic analysis (LSA) can be used to classify specific, distinct topics in documents that each address only a single topic. The other three methods focus on topics and on the general tendencies of the text. Probabilistic latent semantic analysis (PLSA) is also applicable to documents dealing with a single topic but, unlike LSA, it is used to discover general themes and contexts. Latent Dirichlet allocation (LDA), by contrast, is better suited to documents that address several topics. The correlated topic model (CTM) can be used to identify relationships between different subject categories.
Conclusion: Text mining techniques are suitable for analyzing texts in order to discover and extract their subjects.
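The Findings characterize LSA as suited to documents that each address a single topic. As an illustration only, the sketch below shows the core of LSA as it is commonly implemented: a truncated singular value decomposition (SVD) of a term-document matrix, after which documents are compared in the reduced latent space. It is not the authors' code; the helper names `lsa` and `cosine`, the six-term toy matrix, the use of raw counts (rather than tf-idf weights), and the choice k = 2 are all hypothetical. The probabilistic models (PLSA, LDA, CTM) are not shown.

```python
import numpy as np


def lsa(term_doc: np.ndarray, k: int):
    """Latent semantic analysis (LSA) via truncated SVD (hypothetical helper).

    term_doc: (n_terms, n_docs) matrix of term counts or weights.
    k: number of latent dimensions to keep.
    Returns the term loadings U_k (n_terms, k), the k largest singular
    values s_k, and the document coordinates (k, n_docs) in latent space.
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, with s sorted in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Truncation: keep only the k strongest latent dimensions.
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    # Each column is one document expressed in the k-dimensional latent space.
    doc_coords = np.diag(s_k) @ Vt_k
    return U_k, s_k, doc_coords


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy corpus: rows are terms, columns are four documents.
# Documents 0-1 use only "math" terms and documents 2-3 only "biology"
# terms, i.e. each document addresses a single topic.
terms = ["matrix", "vector", "algebra", "gene", "protein", "cell"]
X = np.array([
    [3, 1, 0, 0],  # matrix
    [1, 2, 0, 0],  # vector
    [1, 1, 0, 0],  # algebra
    [0, 0, 2, 1],  # gene
    [0, 0, 1, 2],  # protein
    [0, 0, 0, 1],  # cell
], dtype=float)

U_k, s_k, D = lsa(X, k=2)

# Report the term with the largest absolute loading on each latent dimension.
for j in range(U_k.shape[1]):
    top = terms[int(np.argmax(np.abs(U_k[:, j])))]
    print(f"dimension {j}: top term = {top}")

print("doc0 vs doc1:", cosine(D[:, 0], D[:, 1]))  # close to 1: same topic
print("doc0 vs doc2:", cosine(D[:, 0], D[:, 2]))  # close to 0: different topics
```

Because every toy document draws on only one group of terms, the two retained dimensions separate cleanly into a "math" direction and a "biology" direction. Corpora in which single documents mix several topics tend not to separate this neatly, which is the situation the Findings associate with LDA.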
Type of Study: Applicable | Subject: Special



Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
