
WebDataExtractor

This is a non-trading expert that transforms unstructured web pages into structured data tables. It downloads an HTML file, parses it into a DOM (Document Object Model), and then applies the specified CSS (Cascading Style Sheets) selectors to extract the required fields. You can think of it as a powerful and highly customizable HTML-to-CSV (Comma-Separated Values) converter.
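
The parse, select and export steps can be illustrated with a minimal Python sketch. It is not the expert's code: the sample HTML, the class names and the helpers (Node, DomBuilder, matches, select, extract_table, to_csv) are all hypothetical. The matcher handles only 'tag', '.class' and 'tag.class', which is far less than the expert's selector support is likely to cover.

```python
import csv
import io
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """Minimal DOM element; children are Node objects or plain text strings."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []

    def descendants(self):
        for child in self.children:
            if isinstance(child, Node):
                yield child
                yield from child.descendants()

    def raw_text(self):
        return "".join(c if isinstance(c, str) else c.raw_text() for c in self.children)

    def inner_text(self):
        # Collapse runs of whitespace, similar to what a browser displays.
        return " ".join(self.raw_text().split())


class DomBuilder(HTMLParser):
    """Builds a Node tree from HTML text."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def matches(node, selector):
    """Simplified matcher: supports 'tag', '.class' and 'tag.class' only."""
    tag, _, cls = selector.partition(".")
    if tag and node.tag != tag:
        return False
    if cls and cls not in (node.attrs.get("class") or "").split():
        return False
    return True


def select(node, selector):
    return [n for n in node.descendants() if matches(n, selector)]


def extract_table(html, row_selector, column_selectors):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    rows = []
    for row in select(builder.root, row_selector):
        cells = []
        for column_selector in column_selectors:
            found = select(row, column_selector)
            cells.append(found[0].inner_text() if found else "")
        rows.append(cells)
    return rows


def to_csv(rows):
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical sample page: two event rows and one row that must be skipped.
SAMPLE = """
<table>
  <tr class="event"><td class="time">12:30</td><td class="name">CPI <b>y/y</b></td><td class="value">3.1%</td></tr>
  <tr class="header"><td>ignored</td></tr>
  <tr class="event"><td class="time">14:00</td><td class="name">Rate decision</td></tr>
</table>
"""

rows = extract_table(SAMPLE, "tr.event", ["td.time", "td.name", "td.value"])
print(to_csv(rows), end="")
```

Here the row selector picks the two "event" rows, and each column selector is evaluated relative to its row. A field with no match becomes an empty CSV cell, so the second row ends with a trailing comma.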

The EA can process web pages from remote sites (using MetaTrader's WebRequest) or from local files. For WebRequest to work, make sure you have added the corresponding domains in the Expert Advisors options dialog.

Because of a MetaTrader limitation, neither the demo version of the expert nor the full version running in the tester can use WebRequest. Only local files are accessible from the tester.

The expert is useful for reading economic news, trading reports, and other information that is available on the Internet but hard for programs to "understand" because of the loose formatting and rich, user-friendly decoration of web pages.

The demo version of the expert randomly picks some fields in the extracted data and replaces them with the string "demo". This is intentional: it prevents the demo version from being used to extract complete data sets in production.

Parameters

  • URL - web-page address (starts with http:// or https://) or local file name to load (must be in HTML format);
  • SaveName - output file name (will be in CSV format);
  • TimerSeconds - period in seconds to reload the URL and process it again; if 0 - the expert does the job only once and then exits;
  • TriggerVariable - optional global variable checked on every timer event; if specified, the expert checks whether the variable exists: if it does, the EA processes the page and then removes the variable; if it does not, the EA skips this timer event and waits for the next one;
  • RowSelector - CSS selector for table row;
  • ColumnSettingsFile - file in CSV format with selectors for columns (see comments for details);
  • SubstitutionSettingsFile - optional file in CSV format for data substitution rules (see comments for details);
  • TestQuery - test a CSS selector for a row;
  • TestSubQuery - test a CSS selector for a field in the row;
  • LogXXX - enable/disable the XXX logging option.
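
The TriggerVariable behaviour described above can be sketched in Python as follows. This only simulates the logic: the dictionary stands in for MetaTrader's global variables, and on_timer and the variable name "ReloadNews" are hypothetical, not taken from the expert.

```python
def on_timer(global_vars, trigger_name, process):
    """Handle one timer event; return True if the page was processed."""
    if trigger_name:
        if trigger_name not in global_vars:
            # Trigger is configured but not set: skip and wait for the next event.
            return False
        process()
        # Remove the variable so the next run needs a fresh trigger.
        del global_vars[trigger_name]
        return True
    # No trigger configured: process on every timer event.
    process()
    return True


# Simulated terminal state; "ReloadNews" is an illustrative variable name.
global_vars = {}
runs = []

results = [on_timer(global_vars, "ReloadNews", lambda: runs.append("processed"))]
global_vars["ReloadNews"] = 1.0  # another program or script sets the trigger
results.append(on_timer(global_vars, "ReloadNews", lambda: runs.append("processed")))
results.append(on_timer(global_vars, "ReloadNews", lambda: runs.append("processed")))

print(results)  # [False, True, False]
```

Only the second timer event does any work, because that is the only one where the variable exists. Removing the variable after processing makes it act as a one-shot request from another program.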


Main Principles

To start extracting data with this expert, you first need to examine the HTML source of the target web page. Any modern browser lets you do this with its built-in developer tools. For example, in Chrome you can open the "Developer tools" window by clicking the "Settings" button (in the upper right corner of the main window) and then choosing "Tools" -> "Developer Tools" in the dropdown menu (the menu items may vary between browser versions). More details can be found in the official DevTools overview; the most relevant section here is "Inspecting the DOM".

Locate all data fields of interest in the web page and look up the corresponding DOM elements in the HTML source. Based on the specific attributes of those elements, work out CSS selectors that unambiguously pin down the value of each data field. The developer tools window provides a console where you can try CSS selectors on the fly. Make sure you try only selectors that the expert supports.

Alternatively, you can enable DOM logging in the expert (LogDomElements) and study the output in the MetaTrader Experts log. This method lacks interactivity (you cannot try selectors right away), but it does not require any external tools. You can test selectors using the corresponding input parameters (TestQuery and TestSubQuery). It is advisable to run such experiments on a local file (URL should point to a file instead of an Internet address). To download a web page from a site into a file, use the following parameters: URL - Internet address of the page; SaveName - name of the local file to write (an existing file, if any, will be overwritten); RowSelector - leave empty.

Supported CSS selectors

Please find the list of supported selectors in the comments.

Examples

Examples are provided in the comments for this product. Each example includes an HTML fragment, with explanatory comments added and the formatting changed for readability.


