Web Scraper: Web Tables

(1) Table Display - This window displays the current table, taken from either the set of recently downloaded and converted tables or the CDS, whichever is latest.  If you have just downloaded from a web site, you may need to use the options in the left-hand pane to adjust the file type or to assign a header row or delimiter.

(2) Use Edit Target URL to quickly update the table without going back to the Web Browser page.  Note that you may have to manually update the table settings (below) when editing the URL this way.  Use Choose Web Table to select the desired table when multiple tables have been downloaded.

(3) Table Settings - Use these options to adjust how the scraper reads the data into the table.  See detailed descriptions below.

(4) Create As CDS Table - Use this button to save the data to a CDS table.  See description below.

Next, go to the CDS Tables window to see the CDS tables.

Table Settings

Occasionally, the web scraper may have difficulty recognizing the file type on a web site, or there may be additional information (banners, titles, etc.) before the data of interest begins.  Use these Table Settings options to manually adjust how the data is scraped.

Set Web Filetype - Use this setting to manually select the file type.

Header Row - Use this setting to specify the header row.

Number of Columns - Typically, all columns will have headings in the header row.  However, if a column has a blank heading, the scraper may not align the data correctly.  Use the Specify option to tell the scraper exactly how many columns to expect.

Delimiter - Use this setting to specify the delimiter used on the web site.

Refresh Current File - Use this button to refresh the data download or the current file.  This is especially useful if you need to set the file type and then refresh the data.
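The effect of the Header Row, Number of Columns, and Delimiter settings can be illustrated with a short Python sketch.  This is not the scraper's own code; the function name read_table, its parameters, and the sample text are assumptions made purely for illustration.  It skips banner lines before the header row, splits on the chosen delimiter, and pads or trims each row to the expected number of columns.

```python
import csv
import io


def read_table(text, header_row=0, delimiter=",", num_columns=None):
    """Parse delimited text, ignoring any lines before the header row.

    header_row is a 0-based line index (an assumption of this sketch).
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header = rows[header_row]
    data = rows[header_row + 1:]
    if num_columns is not None:
        # Pad or trim every row to the expected width so that a missing
        # heading does not shift the data out of alignment.
        def fit(row):
            return (row + [""] * num_columns)[:num_columns]

        header = fit(header)
        data = [fit(row) for row in data]
    return header, data


# Hypothetical download: two banner lines, then a pipe-delimited table.
sample = (
    "Daily Prices Report\n"
    "Generated for illustration only\n"
    "Date|Hub|Price\n"
    "2012-11-02|North|31.5\n"
    "2012-11-02|South|29.8\n"
)

header, data = read_table(sample, header_row=2, delimiter="|", num_columns=3)
print(header)
for row in data:
    print(row)
```

In this sketch, leaving header_row at 0 would treat the banner line as the header, which is the kind of misread the Table Settings options are meant to correct.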

Create As Aurora CDS Table

Clicking this button opens a popup window.  Use this window to name the CDS table and, if desired, to schedule recurring data downloads.

Table Name - Use this field to give the table a unique name.

AutoUpdate Frequency - Use the dropdown to select how often to update the table.

Next Update Date - Use the dropdown to schedule the next update date.

Save Settings - Use this setting to specify whether the data will go to a new table or overwrite the existing table.

Base URL and Format Date - Use this section to define how the URL changes dynamically.  For instance, when the web site updates its data daily, the URL may change as well.  In that case, enter YYYYMMDD in place of the date in the URL (for example, in place of '20121102').  Use the check boxes to specify how often the URL changes.
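The idea behind the YYYYMMDD placeholder can be sketched in a few lines of Python.  This is only an illustration of the substitution, not how the scraper itself is implemented; the function build_url and the example.com address are hypothetical.

```python
from datetime import date


def build_url(template, when):
    """Replace the YYYYMMDD placeholder with the given date."""
    return template.replace("YYYYMMDD", when.strftime("%Y%m%d"))


# Hypothetical base URL, for illustration only.
template = "https://example.com/reports/prices_YYYYMMDD.csv"

url = build_url(template, date(2012, 11, 2))
print(url)  # https://example.com/reports/prices_20121102.csv
```

Each scheduled update would then request the URL built from that day's date rather than a fixed address.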

Get Page Link - This setting tells the scraper to get data from links nested within the base URL.  When selected, the first option scrapes the first link it encounters on the page.  The second option looks for specific text in a link.  The last option performs a custom search for the HTML code specified in the box.
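As a rough illustration of the first two Get Page Link options, the following Python sketch collects the links on a page and then picks either the first one or the first one whose text contains a given phrase.  It uses only the Python standard library and is not the scraper's own logic; the names LinkCollector, first_link, and link_with_text, and the sample page, are assumptions for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_links(html):
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def first_link(html):
    """Return the href of the first link on the page, or None."""
    links = find_links(html)
    return links[0][0] if links else None


def link_with_text(html, text):
    """Return the href of the first link whose text contains `text`."""
    for href, label in find_links(html):
        if text.lower() in label.lower():
            return href
    return None


# Hypothetical page content, for illustration only.
page = (
    "<html><body>"
    '<a href="/about.html">About</a>'
    '<a href="/data/latest.csv">Download latest data</a>'
    "</body></html>"
)

print(first_link(page))                     # /about.html
print(link_with_text(page, "latest data"))  # /data/latest.csv
```

The sample shows why the second option can matter: the first link on a page is often a navigation link rather than the data file.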

Then, go to the CDS Tables window to see the content in the CDS table.

For further assistance, please contact Aurora Support.

Copyright© 1997-2024 Energy Exemplar LLC. All rights reserved.