Version 1.310709
AutoTools HTML Read function. First off, let's explain what this feature does. In a nutshell, it is a web scraping tool. In layman's terms, it is a feature that retrieves just the content you want from a webpage, say an image or the currently playing song on your favorite radio show.
Now let's go through the menus to explain what each bit does. That way, you will become a little more familiar with what it has to offer.
This field takes the URL of the webpage containing the information, a file on your local storage, or a variable containing HTML. If it is a URL, it needs to be the exact URL as you see it in a web browser. Some sites require authentication. This can be handled either with the Authenticate feature (explained below) or, if the site supports Basic Auth, with a URL in the form https://user:password@website/url/page.php. Any page that delivers HTML as its response can be used, whatever its extension (.html, .php, .js, .cgi, etc.).
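To make the user:password@ form a little less mysterious, here is a minimal Python sketch of what a generic HTTP client typically does with such a URL: it pulls the credentials out and sends them as a Basic Auth header. This is not AutoTools' own code. The function name basic_auth_header and the host example.com are made up for illustration.

```python
import base64
from urllib.parse import urlsplit


def basic_auth_header(url):
    """Split a user:password@ URL into (clean_url, Authorization value).

    Hypothetical helper for illustration only. Returns (url, None) when
    the URL carries no credentials.
    """
    parts = urlsplit(url)
    if parts.username is None:
        return url, None

    # Basic Auth is simply "user:password", base64-encoded.
    credentials = f"{parts.username}:{parts.password or ''}"
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")

    # Rebuild the URL without the user:password@ prefix.
    host = parts.hostname
    if parts.port:
        host = f"{host}:{parts.port}"
    clean_url = parts._replace(netloc=host).geturl()
    return clean_url, f"Basic {token}"


clean, header = basic_auth_header("https://user:password@example.com/url/page.php")
print(clean)
print(header)
```

Keep in mind that base64 is an encoding, not encryption, so this form is only sensible over https.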
This is where things get complex, both to learn and to explain. Using the selector syntax from jsoup, this field lets us easily search for specific tags, elements, ids and classes within a web page. When you use Easy Setup, you will notice that this box gets filled in for you. Understanding HTML is really important here. You do not have to be an expert web developer, but being able to understand some of it will greatly help.
The querying is done using the jsoup Selector syntax: https://jsoup.org/apidocs/org/jsoup/select/Selector.html
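If selectors are new to you, it may help to see what a query such as div.image-box img actually asks for: every img tag that sits somewhere inside a div whose class is image-box. The rough Python sketch below hard-codes just that one pattern using the standard library's html.parser. It is not jsoup and not how AutoTools is implemented. The names ImageBoxSrcs, image_box_srcs and SAMPLE are made up for illustration.

```python
from html.parser import HTMLParser


class ImageBoxSrcs(HTMLParser):
    """Collect the src of every <img> nested inside <div class="image-box">."""

    def __init__(self):
        super().__init__()
        self.div_stack = []  # one flag per open <div>: is it an image-box?
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            self.div_stack.append("image-box" in classes)
        elif tag == "img" and any(self.div_stack) and attrs.get("src"):
            # We are inside at least one image-box div, so keep this image.
            self.srcs.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()


def image_box_srcs(html):
    """Return the list of matching src values found in the given HTML."""
    parser = ImageBoxSrcs()
    parser.feed(html)
    return parser.srcs


SAMPLE = """
<div class="image-box"><img src="/a.png"><p><img src="/b.png"></p></div>
<div class="other"><img src="/c.png"></div>
"""

print(image_box_srcs(SAMPLE))  # ['/a.png', '/b.png']
```

The image in the second div is skipped because its parent does not carry the image-box class. A real selector engine handles far more than this one pattern.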
Our CSS Queries could end up becoming very long and daunting to look at. In turn, the variables AutoTools generates may be equally long and daunting. Using this area, we can give these Queries whatever names we want. So say we have one large search Query, div.image-box img()=:=src, we can set the name images here and have a simple Array called %images() to use instead. If we are doing multiple searches, comma separated, simply give the variables an equal number of comma separated names.
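The "equal number of names" rule is easiest to picture as pairing things up by position: the first name goes with the first Query's results, the second name with the second, and so on. Here is a small Python sketch of that idea. The function name name_results and the sample data are invented for illustration, and how AutoTools itself reacts to a mismatched count is not something this sketch claims to reproduce.

```python
def name_results(names_field, results_per_query):
    """Pair comma separated names with per-query result lists by position.

    Hypothetical helper: names_field is the text typed into the names box,
    results_per_query holds one list of matches for each Query.
    """
    names = [name.strip() for name in names_field.split(",")]
    if len(names) != len(results_per_query):
        raise ValueError(
            f"{len(names)} name(s) given for {len(results_per_query)} query result(s)"
        )
    return dict(zip(names, results_per_query))


arrays = name_results(
    "images, titles",
    [["/a.png", "/b.png"], ["Song A", "Song B"]],
)
print(arrays["images"])  # plays the role of %images()
print(arrays["titles"])  # plays the role of %titles()
```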
If you don't want an Array and would rather have your data in a single variable, set this field and the entries from the Array will be combined into one variable instead. Each entry will then be separated by whatever character you use here. Most commonly, the comma is used.
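In programming terms this is a plain "join". The short Python sketch below shows the idea, along with one thing worth watching out for: if an entry already contains the separator character, you cannot cleanly split the variable back into the original entries later. The function name join_entries and the sample values are made up for illustration.

```python
def join_entries(entries, joiner=","):
    """Combine array entries into one string, separated by the joiner."""
    return joiner.join(entries)


songs = ["Song A", "Song B", "Song C"]
print(join_entries(songs))         # Song A,Song B,Song C
print(join_entries(songs, " | "))  # Song A | Song B | Song C

# An entry that contains the joiner makes the result ambiguous:
genres = ["Rock, Pop", "Jazz"]
print(join_entries(genres).split(","))  # three pieces, not two
```

If your data is likely to contain commas, picking a less common separator avoids that ambiguity.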
Instead of just pulling the information you want, this pulls the entire HTML code. This can be handy for retrieving the source code of a page to paste into, or share with, a text editor for viewing.
Some websites depend on JavaScript to deliver their content. Sometimes it will be impossible to get the data you want without setting this option. Fortunately, you won't need to know JavaScript to use this feature. It simply allows AutoTools to render the page properly so you can extract the data you want.
The amount of time (in milliseconds) to wait after loading the page, to allow JavaScript to render content. Useful on sites that either take a while to load or render data after the rest of the page has loaded. A good starting value for most sites is 2000 (2 seconds).
Depending on which version of a webpage you wish to view, you may want to select Request Desktop Site in order to load the “full” version of the page. However, sometimes only the mobile version has the information. This setting allows you to choose which version of the page you want to extract from.
Used to log in to websites, so that extraction can work on sites requiring a login. It may be useful to tick “Remember me” on the login page of the service when Authenticating via AutoTools, otherwise you may need to authenticate again before being able to extract at a later time. AutoTools, like your browser, uses cookies to stay logged in for later requests.