Proceedings of the Institute for System Programming of the RAS
Proceedings of the Institute for System Programming of the RAS, 2022, Volume 34, Issue 5, Pages 163–170
DOI: https://doi.org/10.15514/ISPRAS-2022-34(5)-10
(Mi tisp727)
 

Data mining methods to compare englishes

O. V. Donina

Voronezh State University
Abstract: The paper presents the results of corpus-based research on noun cryptotypes in 20 varieties of English (Englishes). The data, collected from Mark Davies’ corpora GloWbE and NOW, enabled us to focus on variation in the covert classification of nouns in modern Englishes. A noun cryptotype, a notion introduced by Whorf, is approached as ‘a covert type of classification of nouns, marked by lexical selection in a syntactical classifier rather than a morphological tag’. The purpose of the study is to compare and contrast the covert classification of 23 basic emotions in 20 Englishes (64,702 tokens). The 20 Englishes were clustered using Data Mining methods (such as k-means clustering and a self-organizing Kohonen map). The six resulting clusters correspond to geographic areas: an American cluster (American and Canadian Englishes); an Australian cluster (Australian and New Zealand Englishes); a European cluster (British and Irish Englishes); an Asian cluster (Indian, Pakistani, Singapore, Hong Kong, Malaysian, Bangladeshi, Sri Lankan, and Philippine Englishes); an African cluster (Kenyan, South African, Nigerian, Ghanaian, and Tanzanian Englishes); and a Caribbean cluster (Jamaican English). The correlation coefficients among Englishes in the Asian and African clusters (the Outer Circle in Braj B. Kachru’s World Englishes paradigm) range from 0.74 to 0.8, which is attributed to little contact among the varieties within these clusters. The correlation coefficients among Englishes in the American, Australian and European clusters (Kachru’s Inner Circle) range from 0.92 to 0.933, which indicates high consistency among these varieties owing to long-lasting linguistic contacts.
Keywords: Data Mining, computer modeling, corpora studies, cryptotype analysis, Englishes
Document Type: Article
Language: English
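Illustration (not part of the paper): the abstract names k-means clustering and pairwise correlation coefficients but gives no implementation details. The sketch below shows, under stated assumptions, how varieties could be grouped by k-means and compared by Pearson correlation. The variety names, the four-component profile vectors, the farthest-point initialization, and the helper names `pearson`, `sq_dist` and `kmeans` are all hypothetical; the numbers are toy values, not data from GloWbE or NOW.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100):
    """Plain k-means; returns one cluster label per point.

    Assumption: deterministic farthest-point initialization is used here
    for reproducibility; the paper does not state its initialization.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy profiles: each vector stands for the relative frequency
# of four classifier types for one variety. These are NOT corpus data.
profiles = {
    "variety_A": [0.50, 0.30, 0.15, 0.05],
    "variety_B": [0.48, 0.32, 0.14, 0.06],
    "variety_C": [0.20, 0.25, 0.35, 0.20],
    "variety_D": [0.22, 0.23, 0.33, 0.22],
    "variety_E": [0.05, 0.15, 0.30, 0.50],
    "variety_F": [0.07, 0.13, 0.32, 0.48],
}

names = list(profiles)
labels = kmeans([profiles[n] for n in names], k=3)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)

# Pairwise correlation between two varieties' profiles.
print(round(pearson(profiles["variety_A"], profiles["variety_B"]), 3))
```

In a real analysis each vector would hold the observed classifier frequencies for one variety, and the number of clusters would be chosen from the data rather than fixed in advance.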
Citation: O. V. Donina, “Data mining methods to compare englishes”, Proceedings of ISP RAS, 34:5 (2022), 163–170
Citation in format AMSBIB
\Bibitem{Don22}
\by O.~V.~Donina
\paper Data mining methods to compare englishes
\jour Proceedings of ISP RAS
\yr 2022
\vol 34
\issue 5
\pages 163--170
\mathnet{http://mi.mathnet.ru/tisp727}
\crossref{https://doi.org/10.15514/ISPRAS-2022-34(5)-10}
Linking options:
  • https://www.mathnet.ru/eng/tisp727
  • https://www.mathnet.ru/eng/tisp/v34/i5/p163