[TEIID-3733] Add support for web scraping

Type: Feature Request
Resolution: Obsolete
Priority: Major
Fix Version/s: Backlog
Affects Version/s: None
Component/s: Misc. Connectors
Labels: None

Add support for web scraping.

Here's one from CA using JSoup - https://github.com/rokhmanov/teiid-translators/blob/master/translator-scrape/src/main/java/com/rokhmanov/teiid/translator/scrape/

Relates to: TEIID-3693 Add data source support for reading documents (i.e., RTF, DOC, PDF) (Resolved)

Steven Hawkins added a comment - 2020/08/10 11:15 AM

Bulk resolving older issues as out of date.

Steven Hawkins added a comment - 2017/03/02 8:22 AM

Need to evaluate whether these are for 9.3 or should be pushed to 10.x.

Steven Hawkins added a comment - 2016/02/02 8:58 AM

Pulling out of 8.12.x. There isn't sufficient time to make progress on this for the server and tooling.

Steven Hawkins added a comment - 2015/10/07 10:01 AM

Of all the doc types from TEIID-3693, this is the only one where you could conceivably do something in the near term (8.12.x). Even then the scope would have to be sufficiently narrow, such as just the element extraction approach above, and it would of course carry a tech preview label.

Steven Hawkins added a comment - 2015/09/30 4:40 PM

Note that the jsoup example extracts based on the jsoup selector (http://jsoup.org/apidocs/org/jsoup/select/Selector.html), which is a CSS-like selector syntax. This is somewhat idiomatic to jsoup, and the result is simply the set of selected elements, with component information such as inner_text, tag name, id, etc. returned for each. Any real usage scenario would need more logic to transform the result, and this would not handle tabular data well. At best, assuming you could fairly easily identify a single HTML table to extract, you would read the rows, then for each row use the jsoup extraction again to extract the columns, and then a pivot would be needed. However, that may not work well in practice unless the table is regular; missing or spanning values would likely be an issue.
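To make the tabular concern concrete, here is a rough sketch of the rows-then-columns extraction followed by a pivot into records. It is not the linked translator and does not use jsoup: Python's standard-library html.parser stands in for the selector step so the sketch runs without dependencies. The names TableRows, extract_rows, to_records, and the sample HTML are all hypothetical, and the choices to repeat a spanning value across its columns and to pad missing cells with None are assumptions, not anything the translator is known to do.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cells of every <tr> as (text, colspan) pairs.

    Hypothetical helper; assumes a single, non-nested table.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the <tr> currently open
        self._cell = None   # [text parts, colspan] of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            colspan = int(dict(attrs).get("colspan") or 1)
            self._cell = [[], colspan]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, colspan = self._cell
            self._row.append(("".join(parts).strip(), colspan))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Step 1 and 2: read the rows, then the cells within each row."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_records(rows):
    """Step 3: pivot cells into one record per row, keyed by the header row.

    Assumption: a spanning cell repeats its value across the columns it
    covers, and a row with too few cells is padded with None.
    """
    expanded = [[text for text, span in row for _ in range(span)] for row in rows]
    header, body = expanded[0], expanded[1:]
    records = []
    for cells in body:
        padded = cells + [None] * (len(header) - len(cells))
        records.append(dict(zip(header, padded)))
    return records


# Made-up sample: one regular row, one spanning cell, one missing cell.
HTML = """
<table id="prices">
  <tr><th>item</th><th>price</th><th>stock</th></tr>
  <tr><td>apple</td><td>1.20</td><td>40</td></tr>
  <tr><td>pear</td><td colspan="2">n/a</td></tr>
  <tr><td>plum</td><td>0.80</td></tr>
</table>
"""

rows = extract_rows(HTML)
records = to_records(rows)
for record in records:
    print(record)
```

Only the "apple" row maps cleanly onto the header. The "pear" and "plum" rows come out right only because of the padding and repeat rules chosen above, which is the kind of per-table logic a generic element-extraction translator would not supply.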

Assignee: Unassigned
Reporter: Van Halbert (Inactive)
Votes: 1
Watchers: 2
Created: 2015/09/30 12:35 PM
Updated: 2020/09/14 5:32 AM
Resolved: 2020/08/10 11:15 AM
