I ended up with the following breakdown (originally presented as a pie graph). The sample size was roughly 300+ known APT samples that we have. It wasn't our whole sample set of PDFs, but for starters it was a decent size. The top-10 list looked like this (a sketch for reproducing the tally follows the list):
Acrobat Web Capture 8.0 (15%)
Adobe LiveCycle Designer ES 8.2 (15%)
Acrobat Web Capture 9.0 (8%)
Python PDF Library - http://pybrary.net/pyPdf/ (7%)
Acrobat Distiller 9.0.0 (Windows) (7%)
Acrobat Distiller 6.0.1 (Windows) (7%)
pdfeTeX-1.21a (7%)
Adobe Acrobat 9.2.0 (4%)
Adobe PDF Library 9.0 (4%)
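For anyone who wants to build the same tally against their own corpus, here is a minimal sketch in Python. It assumes the pypdf library (the maintained successor to the pyPdf project that appears in the list above); the directory name and error handling are illustrative placeholders, not the tooling we actually used.

    # Minimal sketch: tally the /Producer metadata string across a directory
    # of PDFs. Assumes pypdf (pip install pypdf), the maintained successor
    # to the pyPdf project mentioned above.
    from collections import Counter
    from pathlib import Path
    from pypdf import PdfReader

    def producer_counts(sample_dir):
        counts = Counter()
        for path in Path(sample_dir).glob("*.pdf"):
            try:
                meta = PdfReader(path).metadata  # the document info dictionary
                counts[meta.producer if meta and meta.producer else "(none)"] += 1
            except Exception:
                counts["(unparseable)"] += 1  # malicious PDFs often break parsers
        return counts

    for producer, n in producer_counts("apt_samples").most_common(10):
        print(producer, n)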
A number of things amazed me about this data. One was the attackers' lack of opsec, along with the old versions of software they are using. From the offensive perspective, if you are dealing with targets that have the resources to do deep-level forensics and operations, then every little bit of opsec is needed. It only takes a small amount of data to put together a large piece of the puzzle.
From the defensive position, this points to an opportunity for early detection. I doubt that most organizations are actually keeping track of, or analyzing, what types of clean, business-case PDFs come through their front doors. What do the normal, clean PDFs coming through your front doors actually look like? Are your legitimate business PDFs being created by the "Python PDF Library - http://pybrary.net/pyPdf/" software? That library is no longer maintained. If the standard set of PDFs coming through your front doors aren't built with strange libraries such as pyPdf, then it might be time to create a nice little Snort signature and alert on it (a hedged example follows this paragraph). I wouldn't recommend blocking at that level (unless you are up for it), but alerting on something that simple can pay extremely large dividends for response/defense teams. Imagine telling your CIO/CISO that you detected and remediated an APT* attack coming through the front door with a simple Snort sig.
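To make that concrete, here is a hedged sketch of what such a rule could look like. The sid, message, and rule variables are made up for illustration, and a raw content match like this only fires where the PDF crosses the wire unencoded (for example, an HTTP download); base64-encoded mail attachments would need file extraction in front of the sensor.

    # Illustration only: sid, msg, and variables are assumptions,
    # not a production rule.
    alert tcp $EXTERNAL_NET $HTTP_PORTS -> $HOME_NET any ( \
        msg:"Inbound PDF with pyPdf Producer string - unusual for this org"; \
        flow:to_client,established; \
        content:"%PDF-"; \
        content:"Python PDF Library - http|3A|//pybrary.net/pyPdf/"; \
        sid:1000001; rev:1;)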
Some honorable mentions that didn't make it into the top 10 are:
Advanced PDF Repair at http://www.pdf-repair.com
Acrobat Web Capture 6.0 (wow that is old)
¦ d o P D F V e r 6 . 2 B u i l d 2 8 8 ( W i n d o w s X P x 3 2 ) *Yes, that really is how it shows up; see the note after this list
alientools PDF Generator 1.52
PDFlib 7.0.3 (C++/Win32)
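As an aside, the wide spacing in that doPDF entry is the classic fingerprint of a UTF-16 metadata string dumped byte by byte: every other byte is a null, which renders as filler. A small Python sketch (the raw bytes are reconstructed here as an assumption, since PDF text strings commonly use UTF-16BE with a byte-order mark):

    # Reconstruct (as an assumption) a UTF-16BE Producer value with a BOM,
    # then show why it prints spaced out: every other byte is 0x00.
    raw = b"\xfe\xff" + "doPDF Ver 6.2 Build 288".encode("utf-16-be")
    print("".join(chr(b) if 32 <= b < 127 else " " for b in raw))
    # ->  d o P D F   V e r   6 . 2   B u i l d   2 8 8
    print(raw.decode("utf-16"))  # -> doPDF Ver 6.2 Build 288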
The point I am getting at is that you must look at your data sets and see what type of information you can glean from them. This idea might be feasible in your organization and it might not, but you as the defender have the ability to determine that for yourself.
At the end of April (25-26th) we are debuting Rapid Reverse Engineering in New York City with Trail of Bits: http://www.trailofbits.com/training/#rapidre. Rapid Reverse Engineering is a class designed to help students learn how to rapidly assess files in incident response scenarios.
Aren't most APT PDFs reused from other places/victims? I think the metadata might be related to the originator (another victim, maybe?) and not the attacker in this case. The tool they use to inject the exploit code most likely doesn't update most/any of those fields.
I would be interested to see whether the created date (in the metadata) fell during the period when pyPdf was still supported and when the exploit became known. That could tell you some interesting things.
I'm not confident that most of these PDFs are re-used, though it certainly does bear further investigation. In the investigations I've seen, they have generally been targeted material.
Yeah, almost all the APT1 payload PDFs were from other origins.
Given the pretty even distribution you see in your data, I believe recommending that people block or build signatures based on this metadata is just hyperbole to promote your training classes. I'm really sad to see this stuff when it happens in infosec.
There is definitely some potential in keying off this metadata. Of course, if you have better decoding/detection/magic working against your e-mail or otherwise inbound PDFs, then this might be a moot point. These artifacts, perhaps combined with other fields such as author or time, might help whittle down the false positives.
Source: been doing this exact crap for too long.