PhantomJS is a headless WebKit browser, scriptable with a JavaScript API. It is multiplatform and available on the major operating systems: Windows, Mac OS X, Linux, and other Unices. It has fast, native support for various web standards: DOM handling, CSS selectors, JSON, Canvas, and SVG.
PhantomJS has many features for website testing: it allows you to run functional tests with frameworks such as Jasmine, QUnit, Mocha, Capybara, WebDriver, and many others. It also allows you to create screen captures, automate websites, manipulate the document, monitor the network, etc.
In this article we'll learn how to drive PhantomJS from the command line in Windows and test basic features such as screenshots, PDF generation, etc.
Requirements
- A PhantomJS distribution for Windows. You can get the latest version in the download area of the official website.
Note: there's no installation process, as you'll get a .zip file with two folders, examples and bin (which contains phantomjs.exe).
How PhantomJS works
Imagine a simple web browser like Google Chrome. Ready? Now remove the Graphical User Interface (GUI) and you'll get a headless browser; that's basically PhantomJS. Headless browsers are great for automating and testing web pages programmatically, and PhantomJS is one of the best available.
Start using PhantomJS from cmd.exe
After extracting the downloaded .zip file you'll get 2 folders: examples and bin. The PhantomJS executable is located in bin.
First, open the Windows terminal cmd.exe and navigate to the bin folder of PhantomJS using the cd command.
Note: you can simply add the folder containing phantomjs.exe to the PATH environment variable and then execute it from wherever you are in the console.
Now that you're located in the path of PhantomJS, you'll be able to execute commands easily with phantomjs.
To control PhantomJS you'll mainly use JavaScript: the phantomjs command expects the path of a JS file as its first parameter.
And that's all! Now you only need to learn how to write suitable JavaScript for PhantomJS.
For your first exercise, we'll take a screenshot of the Our Code World website. Create a screenshot.js file in the same location as the phantomjs executable.
Then include the following code in the screenshot.js file:
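A minimal sketch of such a script could look like the one below. The target URL, the viewport size, the output name screenshot.png and the helper name takeScreenshot are assumptions chosen for illustration; only the webpage module, page.open, page.render and phantom.exit come from the PhantomJS API.

```javascript
// screenshot.js - run with: phantomjs screenshot.js
// Assumed values: the URL and the output file name are illustrative.
var TARGET_URL = 'https://ourcodeworld.com';
var OUTPUT_FILE = 'screenshot.png';

// Opens `url` in `page` and renders it to `output` once loading succeeds.
// `done` receives true on success and false otherwise.
function takeScreenshot(page, url, output, done) {
  // Assumed viewport; PhantomJS uses a much smaller default.
  page.viewportSize = { width: 1280, height: 800 };
  page.open(url, function (status) {
    if (status !== 'success') {
      done(false);
      return;
    }
    page.render(output);
    done(true);
  });
}

// Only start when running inside PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
  takeScreenshot(require('webpage').create(), TARGET_URL, OUTPUT_FILE, function (ok) {
    console.log(ok ? 'Screenshot saved to ' + OUTPUT_FILE : 'Failed to load ' + TARGET_URL);
    phantom.exit(ok ? 0 : 1);
  });
}
```

Passing the page object into the helper keeps the script easy to reuse for other URLs; calling phantom.exit in every branch matters, because otherwise the process keeps running after the page has been handled.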
Finally, execute phantomjs screenshot.js in the command prompt.
Wait until the script has finished and you see the success message, then open the bin folder again.
Our screenshot of the website has been created. Awesome and really easy, isn't it?
Known Windows issues
If the data is not transferred correctly, check whether the network works as expected.
Specifically on Windows, the default proxy setting may cause massive network latency. The workaround is to disable the proxy completely, e.g. by launching PhantomJS with the --proxy-type=none command-line argument.
Conclusion
Now that you know the basics of how PhantomJS works, you'll be able to understand the documentation and discover all the awesome features that PhantomJS has to offer.
As always, we encourage you to check out the documentation to learn how to generate PDFs, use remote debugging, etc. Have fun!
When I browse website A using a normal browser (Chrome) and click on a link on website A, Chrome immediately downloads a report in the form of a CSV file.
When I checked the server response headers, I got the following results:
Now I want to download and parse this file using PhantomJS. I set a page onResourceReceived listener to see whether Phantom will receive/download the file.
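A listener of this kind could be sketched as follows. The helper name attachLogger and the log format are assumptions for illustration; the fields read from the response object (stage, status, contentType, url) are the ones PhantomJS passes to onResourceReceived.

```javascript
// Attaches a logging listener to a PhantomJS page object.
// `log` is any function taking a string; it defaults to console.log.
function attachLogger(page, log) {
  log = log || function (line) { console.log(line); };
  page.onResourceReceived = function (response) {
    // Skip intermediate notifications; log only when a resource has finished.
    if (response.stage !== 'end') {
      return;
    }
    log(response.status + ' ' + response.contentType + ' ' + response.url);
  };
}

// Only start when running inside PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  attachLogger(page);
  // 'URL OF THE FILE' is the placeholder used in the question.
  page.open('URL OF THE FILE', function () { phantom.exit(); });
}
```

Note that the response object only carries metadata such as headers and status, not the body of the resource, which is why the listener alone cannot hand over the CSV content.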
When I make the Phantom request to download the file (this is page.open('URL OF THE FILE')), I can see in the Phantom log that the file is downloaded. Here are the logs:
I received the file and its content, but how do I access the file data? When I print the current PhantomJS page object, I get the HTML of page A, and I don't want that. I want the CSV file, which I need to parse using JavaScript.
3 Answers
I found a solution for PhantomJS. Reading through this discussion I found a jsfiddle which downloads a url via jQuery's ajax method and encodes the file as base64.
The file I wanted to download was plain text (CSV) so I have removed the encoding functions. My target page also already had jQuery included so I didn't need to inject jQuery into the target page.
My code assumes you have already opened the page you want to download the file from using PhantomJS, and that the page has jQuery in it. In my case I first had to log in to the site in order to get the download link.
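The approach can be sketched roughly like this. The helper name fetchTextInPage is hypothetical, and the sketch assumes the opened page exposes jQuery as $ and that the file is served from an origin the page is allowed to request.

```javascript
// Fetches `url` as plain text from inside the already opened page and
// returns the content to the PhantomJS script (null if the request fails).
function fetchTextInPage(page, url) {
  // The function passed to page.evaluate runs inside the web page, so it
  // cannot see variables of this script; the URL is passed as an argument.
  return page.evaluate(function (downloadUrl) {
    var content = null;
    // `$` is the jQuery instance assumed to be present in the target page.
    $.ajax({
      async: false,     // block until the response arrives
      url: downloadUrl,
      dataType: 'text', // plain text, so no base64 step is needed
      success: function (data) { content = data; }
    });
    return content;
  }, url);
}

// Usage inside a page.open callback (csvUrl is a placeholder):
//   var csv = fetchTextInPage(page, csvUrl);
//   console.log(csv);
```

Because the request is made by the page itself, it carries the cookies of the logged-in session, which is what makes the download link reachable.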
Matthew Lock
After days and days of investigation, I have to say that there are some solutions:
- In your evaluate function you can make an AJAX call to download and encode your file, then return this content to the Phantom script
- You can use a custom Phantom library available on GitHub
If you need to download a file using PhantomJS, then run away from PhantomJS and use CasperJS. CasperJS is based on PhantomJS, but it has a much better and more intuitive syntax and program flow.
Here is a good post explaining 'Why CasperJS is better than PhantomJS'. In this post you can find a section about file download.
How to download a CSV file using CasperJS (this works even when the server sends the header Content-Disposition:attachment; filename='file.csv')
Here you can find a sample CSV file available for download: http://captaincoffee.com.au/dump/items.csv
In order to download this file using CasperJS, execute the following code:
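A sketch of such a script, not necessarily the listing from the original answer, might look like this. The helper name fetchCsvAsBase64 and the start page http://captaincoffee.com.au/ are assumptions; casper.start, casper.base64encode, casper.echo and casper.run are part of the CasperJS API.

```javascript
// download.js - run with: casperjs download.js
var CSV_URL = 'http://captaincoffee.com.au/dump/items.csv';
// Assumed start page on the same origin as the file.
var START_URL = 'http://captaincoffee.com.au/';

// Opens `startUrl`, then fetches `fileUrl` and hands the base64 encoded
// content to `onData`.
function fetchCsvAsBase64(casper, startUrl, fileUrl, onData) {
  casper.start(startUrl, function () {
    onData(casper.base64encode(fileUrl));
  });
  casper.run();
}

// Only start when running under CasperJS/PhantomJS.
if (typeof phantom !== 'undefined') {
  var casper = require('casper').create();
  fetchCsvAsBase64(casper, START_URL, CSV_URL, function (data) {
    casper.echo(data);
  });
}
```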
The code above will download the http://captaincoffee.com.au/dump/items.csv CSV file and print the result as a base64 string. This way you don't even have to save the data to a file; you have your data as a base64 string.
If you explicitly want to download the file to the file system, you can use the download function, which is available in CasperJS.
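A minimal sketch of the download variant is shown below. The helper name saveCsv, the start page and the target file name items.csv are assumptions; casper.download(url, target) is the CasperJS function referred to above.

```javascript
// save.js - run with: casperjs save.js
// Opens `startUrl`, then saves `fileUrl` to the local path `target`.
function saveCsv(casper, startUrl, fileUrl, target) {
  casper.start(startUrl, function () {
    casper.download(fileUrl, target);
  });
  casper.run();
}

// Only start when running under CasperJS/PhantomJS.
if (typeof phantom !== 'undefined') {
  saveCsv(
    require('casper').create(),
    'http://captaincoffee.com.au/',               // assumed start page
    'http://captaincoffee.com.au/dump/items.csv', // file from the answer
    'items.csv'                                   // assumed local file name
  );
}
```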
The previous 2 answers assume you can know in advance the URL of the final CSV file. That won't be the case if the link goes to an HTML page that does a Javascript-computed redirect to the file and you don't want to evaluate that Javascript outside of PhantomJS. Your options then are:
- put PhantomJS behind an upstream proxy, and use said upstream proxy to intercept the download URL (and its expected Cookie and Referer headers)—but you'd have to be careful to positively identify the real download URL and not some random data 'blob' if the page makes binary XMLHttpRequests as well;
- instead of PhantomJS use Headless Chrome which can automatically save downloaded files (or Firefox with PyVirtualDisplay, which can also be set to do this, or wait for Headless Firefox) and monitor the downloads directory—but you'd have to be able to figure out by yourself when the download has completed (or use an upstream proxy to monitor it for completion, but Headless Chrome/Firefox cannot currently be set to ignore SSL certificates, which means if the site goes 'secure' it's much more difficult to monitor the requests of Headless Chrome/Firefox than it is to monitor the requests of PhantomJS, at least until Chromium issue 721739 is fixed; you could watch a CONNECT request but if it's kept alive you will have no way of knowing for sure that a transfer has finished);
- put PhantomJS behind an upstream proxy that changes all unknown content types to text/plain and deletes Content-Disposition headers, so you can read the file from PhantomJS in the normal way; that should work for a CSV file but won't work for binaries with 0-bytes in them.
The first of these options (PhantomJS + upstream proxy) is made easier if the upstream proxy can monitor the Accept header that PhantomJS sends to the remote site. At least in PhantomJS version 2.1.1, main requests have Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8, stylesheet requests have Accept: text/css,*/*;q=0.1, and all other requests (images, scripts, XMLHttpRequest) default to Accept: */*, although this can be overridden by sites that use XMLHttpRequest.setRequestHeader(). Therefore, if the upstream proxy sees a request with an Accept header containing text/html, and passing this request on to the server results in a CSV file or other non-HTML document, then there's a good chance this is the one to save.
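The decision the proxy has to make can be sketched as a small predicate. This is only an illustration of the heuristic described above: the function name looksLikeDownload is hypothetical, how a given proxy exposes the request and response headers is left open, and treating application/xhtml+xml as HTML is an assumption.

```javascript
// Returns true when a proxied exchange looks like the file download:
// the request advertised text/html in its Accept header (a main request),
// but the server answered with something that is not an HTML document.
function looksLikeDownload(acceptHeader, responseContentType) {
  var wantsHtml = /(^|[,\s])text\/html\b/i.test(acceptHeader || '');
  // Drop parameters such as "; charset=utf-8" before comparing.
  var mediaType = String(responseContentType || '').split(';')[0].trim().toLowerCase();
  var isHtml = mediaType === 'text/html' || mediaType === 'application/xhtml+xml';
  // A missing Content-Type is treated as "unknown", not as a download.
  return wantsHtml && mediaType !== '' && !isHtml;
}

// Example with the Accept values quoted above:
var MAIN_ACCEPT = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
console.log(looksLikeDownload(MAIN_ACCEPT, 'text/csv; charset=utf-8')); // true
console.log(looksLikeDownload('*/*', 'text/csv'));                      // false
```

As noted above, a site can override Accept through XMLHttpRequest.setRequestHeader(), so a match is a strong hint and not a guarantee.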