Using machine learning to study the population life quality: methodological aspects

E. V. Shchekotin; V. L. Goiko; P. A. Basina; V. V. Bakulin

doi:10.26425/2658-347X-2022-5-1-87-97

Abstract

Assessing the quality of life of the population is an important and relevant sociological task. Machine learning, used as a tool for classifying the digital traces of social network users, makes it possible to build a basis for calculating a subjective quality-of-life index. The article reviews, step by step, the application of machine learning algorithms to assessing the quality of life in the regions of the Russian Federation, as well as ways to improve neural network accuracy. To train the neural network, the authors compiled a labelled dataset extracted from regional communities of the social network “VKontakte”. They analysed various approaches to text vectorisation, publicly available neural network models pre-trained on large Russian-language text corpora, and metrics for evaluating the algorithms' results. Computational experiments with different algorithms were carried out, and on the basis of their results the Rubert-tiny model was selected for its high training and classification speed. After the model parameters were tuned, a macro-averaged F1 score (f1-macro) of 0.545 was achieved. The computational experiments were carried out using Python scripts. Typical errors that the neural network makes during automatic content classification were also considered. The results of the study can be used to calculate an index of online activity in the VKontakte social network for users from various Russian regions, on the basis of which the subjective quality-of-life index will be calculated in the future. Improving the neural network's accuracy will make it possible to obtain more reliable data for assessing the quality of life in Russian regions based on users' digital traces.
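
The f1-macro metric reported in the abstract is the unweighted mean of the per-class F1 scores, so rare classes count as much as frequent ones. The following is a minimal illustrative sketch in plain Python, not the authors' script: the function name `macro_f1` and the topic labels are hypothetical, and the inputs are assumed to be non-empty lists of equal length.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the macro-averaged F1: the unweighted mean of per-class F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class received a false positive
            fn[truth] += 1   # true class received a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # Per-class F1 = 2*TP / (2*TP + FP + FN); defined as 0 if the class never occurs.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for six posts (illustration only, not data from the study).
y_true = ["health", "health", "housing", "housing", "other", "other"]
y_pred = ["health", "housing", "housing", "housing", "other", "health"]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.3f}")
```

Because every class contributes equally, a model that handles frequent classes well but rare ones poorly gets a low macro score, which is why the metric suits unbalanced multi-class text data.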

Keywords

life quality, well-being, digital methods, non-reactive methods, digital traces, social networks, VKontakte, machine learning, text classification

About the Authors

E. V. Shchekotin

Novosibirsk State University of Economics and Management
Russian Federation

Evgeniy V. Shchekotin, Cand. Sci. (Philos.), Assoc. Prof., Head of the laboratory

Novosibirsk

V. L. Goiko

National Research Tomsk State University
Russian Federation

Vyacheslav L. Goiko, Head of the laboratory

Tomsk

P. A. Basina

National Research Tomsk State University
Russian Federation

Polina A. Basina, Analyst

Tomsk

V. V. Bakulin

National Research Tomsk State University
Russian Federation

Vyacheslav V. Bakulin, Analyst

Tomsk

For citations:

Shchekotin E.V., Goiko V.L., Basina P.A., Bakulin V.V. Using machine learning to study the population life quality: methodological aspects. Digital Sociology. 2022;5(1):87-97. (In Russ.) https://doi.org/10.26425/2658-347X-2022-5-1-87-97

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 2658-347X (Print)
ISSN 2713-1653 (Online)
