S. Altaf, S. Iqbal, M. Soomro, “Efficient natural language classification algorithm for detecting duplicate unsupervised features”, Informatics and Automation, 20:3 (2021), 623

Informatics and Automation


Informatics and Automation, 2021, Volume 20, Issue 3, Pages 623–653
DOI: https://doi.org/10.15622/ia.2021.3.5 (Mi trspy1155)

This article is cited in 2 scientific papers (total in 2 papers)

Artificial Intelligence, Knowledge and Data Engineering

Efficient natural language classification algorithm for detecting duplicate unsupervised features

S. Altaf^a, S. Iqbal^b, M. Soomro^c

^a Pir Mehr Ali Shah Arid Agriculture University
^b Pakistan Space and Upper Atmosphere Research Commission (SUPARCO), Pakistan
^c Manukau Institute of Technology


Abstract: This paper focuses on capturing the meaning of text with Natural Language Understanding (NLU) features in order to detect duplicates in an unsupervised setting. The NLU features are compared with lexical approaches to identify the most suitable classification technique. A transfer-learning approach is used to train the feature extraction on the Semantic Textual Similarity (STS) task. All features are evaluated on two datasets: Bosch bug reports and Wikipedia articles. The study aims to structure recent research efforts by comparing NLU concepts for representing the semantics of text and applying them to information retrieval (IR).
The main contribution of this paper is a comparative study of semantic similarity measurements. The experimental results show how Term Frequency–Inverse Document Frequency (TF-IDF) features perform on both datasets with a reasonable vocabulary size. They also indicate that a Bidirectional Long Short-Term Memory (BiLSTM) network can learn the structure of a sentence to improve the classification.
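
As an illustration of the lexical baseline named in the abstract, the following is a minimal sketch of duplicate detection with TF-IDF features and cosine similarity. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the function names, the 0.5 threshold, and the sample texts are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1.
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_duplicates(docs, threshold=0.5):
    """Return (i, j, similarity) for every pair at or above the threshold."""
    vectors = tfidf_vectors(docs)
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            similarity = cosine(vectors[i], vectors[j])
            if similarity >= threshold:
                pairs.append((i, j, similarity))
    return pairs


if __name__ == "__main__":
    # Hypothetical report texts, invented for illustration only.
    reports = [
        "engine fails to start when cold",
        "engine fails to start when cold outside",
        "wikipedia article about rivers",
    ]
    for i, j, similarity in find_duplicates(reports):
        print(f"possible duplicates: {i} and {j} (cosine {similarity:.2f})")
```

A purely lexical measure like this only rewards shared vocabulary, which is why the abstract contrasts it with learned representations such as a BiLSTM that can take sentence structure into account.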

Keywords: clustering, information retrieval, TF-IDF feature, Par2Vec, natural language texts, lexical approaches.

Document Type: Article

UDC: 006.72

Language: English

Citation: S. Altaf, S. Iqbal, M. Soomro, “Efficient natural language classification algorithm for detecting duplicate unsupervised features”, Informatics and Automation, 20:3 (2021), 623–653

Citation in AMSBIB format

\Bibitem{AltIqbSoo21}
\by S.~Altaf, S.~Iqbal, M.~Soomro
\paper Efficient natural language classification algorithm for detecting duplicate unsupervised features
\jour Informatics and Automation
\yr 2021
\vol 20
\issue 3
\pages 623--653
\mathnet{http://mi.mathnet.ru/trspy1155}
\crossref{https://doi.org/10.15622/ia.2021.3.5}

Linking options:

https://www.mathnet.ru/eng/trspy1155

https://www.mathnet.ru/eng/trspy/v20/i3/p623
