    • Type: Feature Request
    • Resolution: Obsolete
    • Priority: Major
    • Backlog
    • None
    • Misc. Connectors
    • None

          [TEIID-3733] Add support for web scraping

          Steven Hawkins added a comment - Bulk resolving older issues as out of date.

          Steven Hawkins added a comment - Need to evaluate whether these are for 9.3 or should be pushed to 10.x.

          Steven Hawkins added a comment - Pulling out of 8.12.x. There isn't sufficient time to make progress on this for the server and tooling.

          Steven Hawkins added a comment - Out of all the doc types from TEIID-3693, this is the only one where you could conceivably do something in the near-term 8.12.x release, but even then the scope would have to be sufficiently narrow, such as just the element extraction approach above, and of course it would carry a tech preview label.

          Steven Hawkins added a comment - Note that the jsoup example will extract elements based on the jsoup selector (http://jsoup.org/apidocs/org/jsoup/select/Selector.html), which uses a CSS-like selector syntax. This is somewhat idiomatic to jsoup: the results are simply the set of selected elements, with component information such as inner_text, tag name, id, etc. returned in the result. For any actual usage scenario, more logic would be needed to transform the result, and this approach would not handle tabular data well. At best, assuming you could easily identify a single HTML table to extract, you would read the rows, then for each row use the jsoup extraction again to extract the columns, and then a pivot would be needed. However, that may not work well in practice unless the table is regular; missing or spanning values would likely be an issue.
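
A minimal Java sketch of the pivot step described in the comment above, using only the JDK. It assumes the rows and their cells have already been extracted (for example, by selecting the table's `tr` elements and then the `td` elements within each row with jsoup), so each cell is reduced to its inner text and colspan value. The `TablePivotSketch` class, the `Cell` and `PivotResult` types, the `pivot` method, the sample data, and the policy of putting a spanning cell's text in its first column while leaving the other spanned columns null are all hypothetical illustrations, not Teiid or jsoup APIs. Rowspan is not handled at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not a Teiid or jsoup API): pivot HTML table cells that were
// already extracted row by row into fixed-width result rows, flagging irregular rows.
class TablePivotSketch {

    // One extracted cell: its inner text and its colspan attribute (1 when absent).
    record Cell(String text, int colspan) {}

    // Fixed-width rows plus the indexes of rows that did not map cleanly onto the columns.
    record PivotResult(List<String[]> rows, List<Integer> irregularRows) {}

    static PivotResult pivot(List<List<Cell>> extractedRows, int columnCount) {
        List<String[]> rows = new ArrayList<>();
        List<Integer> irregular = new ArrayList<>();
        for (int r = 0; r < extractedRows.size(); r++) {
            String[] out = new String[columnCount]; // columns never filled stay null
            int col = 0;                            // next output column to fill
            boolean regular = true;
            for (Cell cell : extractedRows.get(r)) {
                if (cell.colspan() != 1) {
                    regular = false; // spanning cell: no single column owns the value
                }
                if (col < columnCount) {
                    out[col] = cell.text(); // assumed policy: text goes in the first spanned column
                } else {
                    regular = false; // more cells than columns: the value is dropped
                }
                col += Math.max(1, cell.colspan());
            }
            if (col != columnCount) {
                regular = false; // missing or surplus cells
            }
            rows.add(out);
            if (!regular) {
                irregular.add(r);
            }
        }
        return new PivotResult(rows, irregular);
    }

    public static void main(String[] args) {
        List<List<Cell>> extracted = List.of(
                List.of(new Cell("id", 1), new Cell("name", 1), new Cell("price", 1)),
                List.of(new Cell("1", 1), new Cell("widget", 1), new Cell("9.99", 1)),
                List.of(new Cell("2", 1), new Cell("gadget", 1)), // missing price cell
                List.of(new Cell("3", 1), new Cell("n/a", 2)));   // colspan="2"
        PivotResult result = pivot(extracted, 3);
        for (String[] row : result.rows()) {
            System.out.println(Arrays.toString(row));
        }
        System.out.println("Irregular rows: " + result.irregularRows());
    }
}
```

Running `main` prints the four rows and reports `Irregular rows: [2, 3]`: the short row and the colspan row both come back with a null where a fixed-width pivot would have to guess a value, which is the irregular-table case the comment warns about.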

            Assignee: Unassigned
            Reporter: Van Halbert (Inactive)
            Votes: 1
            Watchers: 2