Downloading “Undownloadable” Web PDFs with Fiddler
I was once teaching a course in the area of backend software engineering. I didn’t own the course material. My duties included going over and presenting the slide deck that I had been provided by the course coordinator, answering any outstanding questions from the class, being on time, having lunch, and getting lost promptly at 5:30 pm.

At the end of the course, naturally, the students asked me to share the slide deck with them so they could go over it on their own. And that’s when the issue revealed itself: the course slides were provided to me via a secure document sharing platform, let’s call it PDFLord [I won’t mention the actual name for the sake of… reasons], which imposed downloading and printing restrictions on all the course PDFs. So, unfortunately, the students had to leave the class empty-handed.

However, something didn’t seem right in my mind. If you can see the document on your screen, surely its source is hiding somewhere in the files downloaded or cached by your browser, and consequently the download restriction is artificial in a sense. In this article I will show you a method to overcome these restrictions that I discovered in the two days following the course. My tutorial assumes a MacOS (High Sierra) development environment, the Chrome browser, and the PDFLord platform, but similar steps could be undertaken for other operating systems and other document sharing platforms.

To begin with, let’s list the reasons why PDFLord was the bane of my existence. As mentioned before, the PDFs had downloading and printing restrictions (as indicated by the grayed-out icons in the top right corner). The PDFs were copy-protected, meaning I could not select any text (as indicated by the “Protected File” pop-up on mouse click). The PDFs were unsearchable, meaning I had to memorize the page numbers of all sections in the course that I wanted to quickly navigate to.
There was no fullscreen or present button.

My first intuition was to examine the page source files. I will skip the parts where I was randomly clicking through all possible directories and folders while looking for the right files, and instead go straight to the ones relevant to this tutorial. You can press Command+Shift+C to bring up the developer tools in Chrome. Then open the Sources tab. There is a pdflord.com directory, with a plugins folder under assets. If you scroll down, you will find a folder called pdfjs, which contains two files: pdf.js and viewer.js. It turns out that PDFLord is using an open-source PDF rendering and parsing JavaScript library by Mozilla, which you can find here: https://mozilla.github.io/pdf.js/

Let’s dig through the viewer.js file a bit more. After some inspection we find a method which sounds like it deals with page rendering. Let’s add a breakpoint on line 2141 inside this method, right after the pageView variable, and reload the page. Our goal is to examine what the object pointed at by this variable represents. After clicking through a bunch of object members… voila! We finally stumble on what we have been looking for: an integer array that very likely represents the pixel data of the image of page 1 of the PDF. Surely, now we can just write a script to go over every page in the PDF, extract the image data arrays, convert them to JPEGs, and end up with a sequence of images of the PDF file.

To be honest, I wasn’t quite satisfied with this finding, since I would still not be able to select any text or search through the images. I was looking for a better way. If we examine the viewer.js file a bit more, we find another interesting function. In particular, there is a very intriguing line which looks like it deals with restricting downloads. And then we also find a sequence which deals with binding events to button click listeners.
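
As a rough illustration only, a binding sequence of this kind could look like the sketch below. It is not the PDFLord or pdf.js source: `EventBus`, `makeButton`, `bindButtons`, and the button and event names are all hypothetical, and the two commented-out entries are an assumption about how print and download might have been disabled.

```javascript
// Hypothetical sketch: none of these names are taken from PDFLord's viewer.js.

// Minimal publish/subscribe bus, standing in for the viewer's event system.
class EventBus {
  constructor() {
    this.listeners = new Map();
  }
  on(name, handler) {
    if (!this.listeners.has(name)) this.listeners.set(name, []);
    this.listeners.get(name).push(handler);
  }
  dispatch(name) {
    for (const handler of this.listeners.get(name) || []) handler();
  }
}

// Stand-in for a DOM button, so the sketch also runs outside a browser.
function makeButton() {
  const handlers = [];
  return {
    addEventListener(type, fn) {
      if (type === "click") handlers.push(fn);
    },
    click() {
      handlers.forEach((fn) => fn());
    },
  };
}

// Attach a click listener to each toolbar button that dispatches an event.
function bindButtons(buttons, eventBus) {
  const bindings = [
    { element: buttons.zoomIn, eventName: "zoomin" },
    { element: buttons.zoomOut, eventName: "zoomout" },
    // { element: buttons.print, eventName: "print" },
    // { element: buttons.download, eventName: "download" },
  ];
  for (const { element, eventName } of bindings) {
    element.addEventListener("click", () => eventBus.dispatch(eventName));
  }
}

// Usage: only the buttons that are still bound produce events.
const demoBus = new EventBus();
demoBus.on("zoomin", () => console.log("zoom in requested"));
demoBus.on("download", () => console.log("download requested"));
const demoButtons = {
  zoomIn: makeButton(),
  zoomOut: makeButton(),
  print: makeButton(),
  download: makeButton(),
};
bindButtons(demoButtons, demoBus);
demoButtons.zoomIn.click(); // logs "zoom in requested"
demoButtons.download.click(); // logs nothing: the binding is commented out
```

In this sketch, clicking a button whose binding is commented out does nothing, because no listener was ever attached to it.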
It’s amusing how the “print” and “download” events are very sloppily commented out, most likely to handle print and download logic in a different part of the code.

At this point our action plan is clear. We will rebind one of the buttons to serve as a download button (simply uncommenting the download event listener didn’t work, and I didn’t dig too much into why). We will change the download permissions logic to not require allowdownload. Then we will proceed to downloading the PDF.

To make changes to JavaScript files returned by a web page we need a man-in-the-middle proxy server. For this purpose, we will be using Fiddler, a free web debugging proxy by Telerik (https://www.telerik.com/fiddler). Fiddler was originally developed as a Windows application, and only recently got ported to Mac. On MacOS it runs using Mono, an open-source implementation of the .NET Framework. To install Mono and Fiddler, you can follow this tutorial: https://www.telerik.com/blogs/introducing-fiddler-for-os-x-beta-1 The only difference is that the 64-bit version of Fiddler doesn’t work on OS X, so you need to start Fiddler with a slightly different command to avoid errors.

Most websites nowadays use HTTPS, so we need to configure Fiddler to correctly capture and decrypt HTTPS traffic. Open Tools->Options->HTTPS, and check the Decrypt HTTPS Traffic checkbox. Since Fiddler acts as a proxy, browser traffic gets redirected to it. All browsers know how to protect user data from man-in-the-middle attacks, so they don’t let the traffic be delivered to actors whose certificates are not trusted. To get past this constraint, click on Actions->Export Root Certificate To Desktop. Next, open Keychain Access, the MacOS app that manages certificates, and drag and drop the generated certificate from your desktop to the Keychain window. The certificate will appear as DO_NOT_TRUST_FiddlerRoot. Double-click on it, and in the new window select Always Trust.

The final step is to actually redirect the traffic from Chrome to Fiddler.
Open System Preferences->Network->Advanced->Proxies. Check Web Proxy and Secure Web Proxy, and for both set the host to 127.0.0.1 and the port to 8888. Click OK, then Apply. You should now start seeing the traffic from your browser in the main Fiddler window. If you don’t see anything, try using an Incognito window.

Now the fun part: hacking the JavaScript files and serving them in place of the original files. Download (or copy and paste) the viewer.js file, open it in your favorite editor, and replace line 10279 so that the download event is bound to the zoom-in button. Next, remove `PDFViewerApplication.appConfig.allowdownload` from lines 1475 and 5067 (and anywhere else in the file, for that matter).

Our substitute viewer.js file is ready for deployment. Find and select the viewer.js resource in Fiddler (you might want to stop capturing traffic by disabling File->Capture Traffic, to prevent the window from refreshing). Then in the panel on the right select AutoResponder->Add Rule. In the bottom drop-down menu choose Find File, select your substitute viewer.js file and click Save. Make sure both Enable rules and Unmatched requests passthrough are checked.

Aaaaaand… drum roll… we are done! We are ready to download our PDF. Open your Chrome window with the PDF viewer. With the developer console open, right-click the refresh button and click on Empty Cache and Hard Reload. Don’t forget to re-enable Capture Traffic in Fiddler. Emptying the cache is necessary so that Chrome does not pick up the cached original version of viewer.js and instead requests it again from the web. The requested JavaScript file gets intercepted by Fiddler and replaced with our custom one. Now, whenever you click on the Zoom In button (“+”), your PDF will get downloaded. Great success!

Final thoughts and lessons learned: When any data reaches your computer, there is absolutely no way to guarantee its complete integrity.
Basing your business model on a premise that the data you share is fully secure and protected is a terrible idea. Hope y’all who got this far had as much fun with this tutorial as I did when fiddling with this challenge.

Disclaimer: use at your own risk. Make sure you are not breaching any contracts with your document providers. There is a very obvious potential harm to the business models of the secure document sharing companies.

How to Force Download Files from Google Drive

Google Drive is second only to Dropbox, and you can easily upload and share files using this cloud storage service. In Google Drive, you can upload and share almost any sort of file. Moreover, you can collaborate with your team members on editing and modifying these uploaded files. That being said, just like in Dropbox, there is no easy way to make files or docs download directly from Google Drive, as it handles most common file types and renders them directly in your browser. Though this is convenient in most cases, it can be a pain if you want people to download your files instead of viewing them right in their browser. So if you ever want that, here is a simple tip to force Google Drive to let users download the file instead of rendering it inside the browser.
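
One widely used form of this kind of tip is to rewrite the share link into a `uc?export=download` link. The sketch below does that rewrite. It assumes share links of the shape `https://drive.google.com/file/d/<FILE_ID>/view` or `https://drive.google.com/open?id=<FILE_ID>`. The function name `toDirectDownloadUrl` is hypothetical, and the link formats are an assumption about how Google Drive exposes shared files, which may change over time.

```javascript
// Hypothetical helper: turns a Google Drive share link into a link that
// asks Drive to download the file instead of rendering it in the browser.
// The URL shapes handled here are assumptions, not a documented contract.
function toDirectDownloadUrl(shareUrl) {
  const url = new URL(shareUrl);
  if (url.hostname !== "drive.google.com") {
    throw new Error("Not a Google Drive link: " + shareUrl);
  }

  // Shape 1: /file/d/<FILE_ID>/view
  const match = url.pathname.match(/^\/file\/d\/([^/]+)/);
  // Shape 2: /open?id=<FILE_ID>
  const fileId = match ? match[1] : url.searchParams.get("id");
  if (!fileId) {
    throw new Error("No file ID found in: " + shareUrl);
  }

  const direct = new URL("https://drive.google.com/uc");
  direct.searchParams.set("export", "download");
  direct.searchParams.set("id", fileId);
  return direct.toString();
}

// Usage with a made-up file ID:
console.log(
  toDirectDownloadUrl("https://drive.google.com/file/d/abc123/view?usp=sharing")
);
// prints "https://drive.google.com/uc?export=download&id=abc123"
```

The file still has to be shared with whoever opens the rewritten link, and for some files Drive may show an intermediate confirmation page before the download starts.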