Informatics and Automation
Informatics and Automation, 2021, Volume 20, Issue 3, Pages 623–653
DOI: https://doi.org/10.15622/ia.2021.3.5
(Mi trspy1155)
 

This article is cited in 2 scientific papers (total in 2 papers)

Artificial Intelligence, Knowledge and Data Engineering

Efficient natural language classification algorithm for detecting duplicate unsupervised features

S. Altaf^a, S. Iqbal^b, M. Soomro^c

a Pir Mehr Ali Shah Arid Agriculture University
b Pakistan Space and Upper Atmosphere Research Commission (SUPARCO), Pakistan
c Manukau Institute of Technology
Abstract: This paper focuses on capturing the meaning of text through Natural Language Understanding (NLU) features in order to detect duplicate unsupervised features. The NLU features are compared with lexical approaches to identify the most suitable classification technique. A transfer-learning approach is used to train the feature extraction on the Semantic Textual Similarity (STS) task. All features are evaluated on two datasets, one of Bosch bug reports and one of Wikipedia articles. The study aims to structure recent research efforts by comparing NLU concepts for representing the semantics of text and applying them to information retrieval (IR).
The main contribution of this paper is a comparative study of semantic similarity measurements. The experimental results report the performance of the Term Frequency–Inverse Document Frequency (TF-IDF) feature on both datasets with a reasonable vocabulary size. They also indicate that a Bidirectional Long Short-Term Memory (BiLSTM) network can learn the structure of a sentence, which improves the classification.
Keywords: clustering, information retrieval, TF-IDF feature, Par2Vec, natural language texts, lexical approaches.
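As an illustration of the lexical baseline named in the abstract, the following is a minimal sketch of TF-IDF features combined with cosine similarity for flagging near-duplicate texts. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the function names (`tfidf_vectors`, `cosine`, `find_duplicates`) and the 0.5 threshold are all assumptions chosen for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's
    # preprocessing is not described in the abstract.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (cnt / len(toks)) * idf[t] for t, cnt in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def find_duplicates(docs, threshold=0.5):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold."""
    vecs = tfidf_vectors(docs)
    return [
        (i, j)
        for i in range(len(docs))
        for j in range(i + 1, len(docs))
        if cosine(vecs[i], vecs[j]) >= threshold
    ]
```

Called on a small list of report texts, `find_duplicates` returns the index pairs whose TF-IDF vectors are close. A purely lexical measure of this kind cannot match paraphrases that share no vocabulary, which is the gap the semantic (NLU) features discussed in the abstract are meant to address.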
Document Type: Article
UDC: 006.72
Language: English
Citation: S. Altaf, S. Iqbal, M. Soomro, “Efficient natural language classification algorithm for detecting duplicate unsupervised features”, Informatics and Automation, 20:3 (2021), 623–653
Citation in format AMSBIB
\Bibitem{AltIqbSoo21}
\by S.~Altaf, S.~Iqbal, M.~Soomro
\paper Efficient natural language classification algorithm for detecting duplicate unsupervised features
\jour Informatics and Automation
\yr 2021
\vol 20
\issue 3
\pages 623--653
\mathnet{http://mi.mathnet.ru/trspy1155}
\crossref{https://doi.org/10.15622/ia.2021.3.5}
Linking options:
  • https://www.mathnet.ru/eng/trspy1155
  • https://www.mathnet.ru/eng/trspy/v20/i3/p623