HTTP-request classification in automatic web application crawling
A. V. Lapkina, A. A. Petukhov Lomonosov Moscow State University
Abstract:
The problem of automatic request classification, as well as the problem of determining the routing rules for requests on the server side, is directly connected with the analysis of the user interface of dynamic web pages. This problem can be solved at the browser level, since the browser contains complete information about the possible requests arising from interaction between the user and the web application. In this paper, we suggest using data from the request execution context in the web client to extract classification features. A request context, or request trace, is a collection of additional identification data that can be obtained by observing the execution of the web page's JavaScript code and the changes in user interface elements resulting from their activation. Such data include, for example, the position and style of the element that caused the client request, the JavaScript function call stack, and the changes in the page's DOM tree after the request was initiated. In this study, the Chrome Developer Tools Protocol is used to solve the problem at the browser level and to automate the collection of request traces.
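The following is a minimal sketch, not the authors' implementation, of how part of such a request trace can be collected through the Chrome DevTools Protocol. It assumes Chrome was started with --remote-debugging-port=9222 and that the Python websockets package is available; the Network.requestWillBeSent event exposes an initiator field that carries the JavaScript call stack behind script-initiated requests, one of the trace features mentioned in the abstract.

```python
import asyncio
import json
import urllib.request

import websockets


async def collect_request_traces(debug_host: str = "http://localhost:9222") -> None:
    # Pick the first page target exposed by the DevTools HTTP endpoint.
    targets = json.loads(urllib.request.urlopen(f"{debug_host}/json").read())
    page = next(t for t in targets if t["type"] == "page")

    async with websockets.connect(page["webSocketDebuggerUrl"]) as ws:
        # Enable network instrumentation so request events are delivered.
        await ws.send(json.dumps({"id": 1, "method": "Network.enable"}))

        while True:
            message = json.loads(await ws.recv())
            if message.get("method") != "Network.requestWillBeSent":
                continue
            params = message["params"]
            # For script-initiated requests the initiator holds the JS call
            # stack that triggered the request.
            frames = params.get("initiator", {}).get("stack", {}).get("callFrames", [])
            print(params["request"]["url"])
            for frame in frames:
                print("  at", frame["functionName"] or "<anonymous>",
                      f'{frame["url"]}:{frame["lineNumber"]}')


if __name__ == "__main__":
    asyncio.run(collect_request_traces())
```

Other trace components described in the paper, such as element positions and DOM tree changes, would require additional protocol domains (e.g. DOM and Runtime) and are not shown here.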
Keywords:
request classification, application crawling, dynamic web application, Chrome DevTools.
Citation:
A. V. Lapkina, A. A. Petukhov, “HTTP-request classification in automatic web application crawling”, Proceedings of ISP RAS, 33:3 (2021), 77–86
Linking options:
https://www.mathnet.ru/eng/tisp600
https://www.mathnet.ru/eng/tisp/v33/i3/p77