To identify attack vectors, we systematically surveyed which potentially dangerous features exist in the PDF specification. We created a comprehensive list with all PDF Actions that can be called. This list contains 18 different actions which we carefully studied.
We selected eight actions – the ones that directly or indirectly allow access to a file handle and may therefore be abused for dangerous features such as URL invocation or writing to files. Having a list of security sensitive actions, we proceeded by investigating all objects and related events which can trigger these actions.
We identified four PDF objects which allow to call arbitrary actions (Page, Annotation, Field, and Catalog). For calling them, most objects offer multiple alternatives. The Catalog object, for example, defines the OpenAction or additional actions (AA) as events. Each event can launch any sequence of PDF actions, e.g., Launch, Thread, etc.. In addition, JavaScript actions can be embedded within documents, opening a new area for attacks. By using JavaScript, for example, new annotations can be created, which can have actions that once again lead to accessing file handles.
The goal of this class of attacks is to build a specially crafted PDF document which enforces processing applications to consume all available resources (i.e., computing time or memory) or causes them to crash.
Inducing an endless loop causes the program execution to get stuck. The PDF standard allows various elements of the document structure to reference to themselves, or to other elements of the same type.
PDF actions allow to specify a Next action to be performed, thereby resulting in "action cycles".
Object streams may extend other object streams allows the crafting of a document with cycles.
PDF documents may contain an outline. Its entries, however, can refer to themselves or each other.
PDF defines "Type 4" calculator functions, for example, to transform colors. Processing hard-to-solve mathematical formulas may lead to high demands of CPU.
Finally, in case the PDF application processes scripts within documents, infinite loops can be induced.
Data amplification attacks based on malicious zip archives are well-known. The first publicly documented DoS attack using a "zip bomb" was conducted in 1996 against a Fidonet BBS administrator. However, not only zip files but also stream objects within PDF documents can be compressed using various algorithms such as Deflate to reduce the overall file size.
The goal of this class of attacks is to track the usage of a document by silently invoking a connection to the attacker's server once the file is opened, or to leak PDF document form data, local files, or NTLM credentials to the attacker.
PDF documents that silently "phone home" should be considered as privacy-invasive. They can be used, for example, to deanonymize reviewers, journalists, or activists behind a shared mailbox. The goal of this attack is to open a backchannel to an attacker controlled server once the PDF file is opened by the victim.
The possibility of malicious URI resolving in PDF documents has been introduced by Hamon [1] who gave an evaluation for URI and SubmitForm actions in Acrobat Reader. We extend their analysis to all standard PDF features that allow to open a URL, such as ImportData, Launch, GoToR, and JavaScript.
Documents can contain forms to be filled out by the user – a feature introduced with PDF version 1.2 in 1996 and used on a daily basis for routine offices tasks, such as travel authorization or vacation requests. The idea of this attack is as follows: The victim downloads a form – a PDF document which contains form fields – from an attacker controlled source and fills it out on the screen, for example, in order to print it. The form is manipulated by the attacker in such a way that it silently, without the user noticing, sends input data to the attacker's server.
The PDF standard defines various methods to embed external files into a document or otherwise access files on the host's file system, as documented below.
If a malicious document managed to firstly read files from the victim’s disk and secondly, send them back to the attacker, such behavior would arguably be critical.
In 1997, Aaron Spangler posted a vulnerability in Windows NT on the Bugtraq mailing list [2]: Any client program can trigger a connection to a rogue SMB server. If the server requests authentication, Windows will automatically try to log in with a hash of the user's credentials. Such captured NTLM hashes allow for efficient offline cracking and can be re-used by applying pass-the-hash or relay attacks to authenticate under the user's identity. In April 2018, Check Point Research [3] showed that a similar attacks can be performed with malicious PDF files. They found that the target of GoToR and GoToE actions can be set to \\attacker.com\dummyfile, thereby leaking credentials in the form of NTLM hashes.
This attack class deals with the capabilities of malicious documents to silently modify form data, to write to local files on the host’s file system, or to show a different content based on the application that is used to open the document.
The idea of this attack is as follows: Similar to form data leakage, the victim obtains a harmlessly looking PDF document from an attacker controlled source, for example, a remittance slip or a tax form. The goal of the attacker is to dynamically, and without knowledge of the victim, manipulate form field data.
As previously described, the PDF standard enables documents to submit form data to external webservers. However, technically the webserver’s URL is defined using a PDF File Specification. This ambiguity in the standard may be interpreted by implementations in such a way that they enable documents to submit PDF form data to a local file, thereby writing to this file.
The goal of this attack is to craft a document that renders differently, depending on the applied PDF interpreter. This can be used, for example, to show different content to different reviewers, to trick content filters (AI-based machines as well as human content moderators), plagiarism detection software, or search engines, which index a different text than the one shown to users when opening the document.
The goal of this attack is to execute attacker controlled code. This can be achieved by silently launching an executable file, embedded within the document, to infect the host with malware. The PDF specification defines the Launch action, which allows documents to launch arbitrary applications. The file to be launched can either be specified by a local path, a network share, a URL, or a file embedded within the PDF document itself.
[1] V. Hamon. "Malicious URI resolving in PDF documents". In: Journal of Computer Virology and Hacking Techniques 9.2 (2013), pp. 65–76.
[2] Aaron Spangler. WinNT/Win95 Automatic Authentication Vulnerability (IE Bug #4). https://insecure.org/sploits/winnt.automatic.authentication.html. Mar. 1997.
[3] Check Point Research. NTLM Credentials Theft via PDF Files. https://research.checkpoint.com/ntlm-credentials-theft-via-pdf-files/. 2018.