Proceedings of the Institute for System Programming of the RAS
Proceedings of the Institute for System Programming of the RAS, 2015, Volume 27, Issue 4, Pages 129–144
DOI: https://doi.org/10.15514/ISPRAS-2015-27(4)-7
(Mi tisp167)
 

This article is cited in 3 scientific papers (total in 3 papers)

Methods for construction of socio-demographic profile of Internet users

A. G. Gomzin (a, b), S. D. Kuznetsov (a, c, b)

a Lomonosov Moscow State University
b Institute for System Programming of the Russian Academy of Sciences
c Moscow Institute of Physics and Technology
Abstract: This paper surveys methods for constructing socio-demographic profiles of Internet users. Examples of demographic attributes include gender, age, political and religious views, region, and relationship status. The surveyed methods detect demographic attributes from a user's profile and messages. Most of the surveyed works address gender detection; age, political views, and region have also attracted researchers' interest.
The most popular data sources for extracting demographic attributes are social networks such as Facebook, Twitter, and YouTube.
Most solutions are based on supervised machine learning, which learns how the target values (demographic attributes) depend on the input data and uses these dependencies to predict the target attribute for new data. The paper surveys the following steps of the solution: feature extraction, feature selection, model training, and evaluation.
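As an illustration of the supervised setting described above, the following minimal sketch trains a multinomial Naive Bayes classifier on token lists and predicts a label for new data. The choice of Naive Bayes, the function names, the labels `group_a` and `group_b`, and the toy data are all assumptions made for this example; the abstract does not say which classifiers the paper covers.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """Collect the counts needed by a multinomial Naive Bayes model.

    samples: list of (tokens, label) pairs, where tokens is a list of strings.
    """
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in samples:
        label_counts[label] += 1
        token_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, token_counts, vocabulary


def predict(model, tokens):
    """Return the label with the highest posterior log-probability."""
    label_counts, token_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior of the label.
        score = math.log(label_counts[label] / total)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(token_counts[label].values()) + len(vocabulary)
        for token in tokens:
            if token in vocabulary:  # ignore tokens never seen in training
                score += math.log((token_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data with hypothetical labels.
TRAIN = [
    (["football", "beer", "match"], "group_a"),
    (["football", "goal"], "group_a"),
    (["recipe", "garden", "flowers"], "group_b"),
    (["garden", "tea"], "group_b"),
]
MODEL = train_naive_bayes(TRAIN)
```

Calling `predict(MODEL, ["football", "match"])` on this toy model selects the label whose training texts share the most vocabulary with the input.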
Researchers use different kinds of data to predict demographic attributes. The most popular data source is text. Word sequences (n-grams), parts of speech, emoticons, and resource-specific features (e.g., @mentions and #hashtags on Twitter) are extracted and used as input for machine learning algorithms. Social graphs are also used as source data: communities of users automatically extracted from the social graph are used as features for attribute prediction.
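The text features listed above can be sketched roughly as follows. This is an illustrative example only: the regular expressions, the feature naming scheme, and the function `extract_features` are assumptions, the emoticon pattern covers just a few common Western emoticons, and part-of-speech features are left out because they require a tagger.

```python
import re
from collections import Counter

# Simplified patterns (assumptions for this sketch, not from the paper).
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
EMOTICON_RE = re.compile(r"[:;=][-^]?[)(DPp]")
WORD_RE = re.compile(r"[a-z']+")


def extract_features(text, n=2):
    """Count word n-grams (up to length n), emoticons, mentions, and hashtags."""
    features = Counter()
    features["num_mentions"] = len(MENTION_RE.findall(text))
    features["num_hashtags"] = len(HASHTAG_RE.findall(text))
    for emoticon in EMOTICON_RE.findall(text):
        features["emoticon=" + emoticon] += 1
    # Remove Twitter-specific tokens and emoticons before word tokenization.
    cleaned = MENTION_RE.sub(" ", text)
    cleaned = HASHTAG_RE.sub(" ", cleaned)
    cleaned = EMOTICON_RE.sub(" ", cleaned)
    words = WORD_RE.findall(cleaned.lower())
    for size in range(1, n + 1):
        for i in range(len(words) - size + 1):
            features["ngram=" + " ".join(words[i:i + size])] += 1
    return features
```

For a message such as `"@anna great match today :) #football"`, the resulting counter holds one mention, one hashtag, one emoticon, three unigrams, and two bigrams.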
Text data produces a large number of features, so feature selection algorithms are needed to reduce the feature space.
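One common way to reduce a text feature space is to rank features by a chi-squared statistic and keep the top k. The sketch below assumes that approach purely for illustration; the abstract does not name the selection algorithms the paper surveys, and the function names and toy documents are hypothetical.

```python
from collections import Counter


def chi_squared(docs, feature):
    """Chi-squared statistic between presence of `feature` and the label.

    docs: list of (feature_set, label) pairs.
    """
    n = len(docs)
    label_totals = Counter(label for _, label in docs)
    present = Counter(label for feats, label in docs if feature in feats)
    n_present = sum(present.values())
    score = 0.0
    for label, n_label in label_totals.items():
        # Two cells per label: feature present and feature absent.
        for observed, row_total in (
            (present[label], n_present),
            (n_label - present[label], n - n_present),
        ):
            expected = row_total * n_label / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
    return score


def select_top_k(docs, k):
    """Return the k features with the highest chi-squared score."""
    vocabulary = set().union(*(feats for feats, _ in docs))
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(vocabulary, key=lambda f: (-chi_squared(docs, f), f))
    return ranked[:k]


# Toy, made-up documents with hypothetical labels.
DOCS = [
    ({"football", "the"}, "group_a"),
    ({"football", "the"}, "group_a"),
    ({"garden", "the"}, "group_b"),
    ({"garden"}, "group_b"),
]
```

On this toy data, a word that appears in nearly every document scores lower than words that appear in only one class, so `select_top_k(DOCS, 2)` discards it.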
The paper surveys feature selection, classification, and regression algorithms, as well as evaluation metrics.
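The abstract does not list the specific evaluation metrics. As a hedged illustration, the sketch below computes metrics that are commonly used for such tasks: accuracy, precision, recall, and F1 for a categorical attribute, and mean absolute error for a numeric attribute such as age. The function names and the choice of metrics are assumptions.

```python
def _check_lengths(y_true, y_pred):
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    _check_lengths(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for the class `positive`."""
    _check_lengths(y_true, y_pred)
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, e.g. for predicted age in years."""
    _check_lengths(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Precision and recall are reported per class here because class sizes in demographic data are often unbalanced, in which case accuracy alone can be misleading.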
Keywords: demographic attributes, social networks, text processing, machine learning.
Document Type: Article
Language: Russian
Citation: A. G. Gomzin, S. D. Kuznetsov, “Methods for construction of socio-demographic profile of Internet users”, Proceedings of ISP RAS, 27:4 (2015), 129–144
Citation in format AMSBIB
\Bibitem{GomKuz15}
\by A.~G.~Gomzin, S.~D.~Kuznetsov
\paper Methods for construction of socio-demographic profile of Internet users
\jour Proceedings of ISP RAS
\yr 2015
\vol 27
\issue 4
\pages 129--144
\mathnet{http://mi.mathnet.ru/tisp167}
\crossref{https://doi.org/10.15514/ISPRAS-2015-27(4)-7}
\elib{https://elibrary.ru/item.asp?id=24928726}
Linking options:
  • https://www.mathnet.ru/eng/tisp167
  • https://www.mathnet.ru/eng/tisp/v27/i4/p129