Teiid Spring Boot / TEIIDSB-170

Automate materialization to JDG

This issue belongs to an archived project. You can view it, but you can't modify it.

    • Type: Enhancement
    • Resolution: Done
    • Priority: Major
    • Fix Version: 1.5.0
    • Component: OpenShift
    • Sprints: DV Sprint 60, DV Sprint 61, DV Sprint 62, DV Sprint 63
    • Story Points: 3

      Create an internal materialization replacement that provides turnkey materialization to JDG (little to no user setup required):

      • the operator may create the Infinispan cluster if needed
      • the status table and internal representation of the materialization target would be set up automatically

      For the user this would be as simple as marking a view as materialized; it would then be populated in JDG upon deployment. They would not have any concerns with cache naming, status tables, etc.

      For simplicity the initial version would make a similar assumption to the current internal logic: it is only for a specific vdb. If the vdb cr is modified, then it is expected that the cache would be recreated.


            Initial automation of the Infinispan cluster is finished; there is further work as defined in TEIIDSB-169.

            The current documentation can be found at https://github.com/teiid/teiid-openshift-examples/blob/master/materializing.adoc

            rhn-engineering-shawkins, can you review the above docs? We will change or enhance accordingly.

            Ramesh Reddy added a comment

            > Especially if there is a danger that this will get forked for standalone lsp efforts.

            Did you mean, since we are going to fork the parser for LSP?

            Ramesh Reddy added a comment

            > In other posts I believe you've advocated for not worrying about the efficiency of the operator,

            Yes, in other posts I talked about getting the functionality, but what you are asking for here is at a different level: it brings in a whole different model where we are leasing out the definition of functionality to other programs. Spinning up a Java process is by no means quick, and it may require more resource allocation, etc.

            > Can you elaborate what that means? You're trying to assign cache names, based upon the ddl, that will be associated to views - which could be a combination of vdb, schema, view name, deployment number - how does a prefix come into play?

            The cache names generated already have a prefix of vdb, schema, view name, and deployment number. I am saying that instead of looking up each individual cache, we can just look up by the vdb and deployment number as a prefix to collect all the matching names for destruction or metrics purposes. Right now Infinispan does not have this feature, but it will in the next revision. Even if we had the full name, we cannot destroy an individual cache in this version anyway, so no functionality is lost.
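A sketch of this prefix-based lookup in Go (the operator's language). The `cachesForDeployment` helper, the `<vdb>-<deployment>-...` naming convention, and the sample names are all hypothetical, chosen only to illustrate the "collect by prefix" idea:

```go
package main

import (
	"fmt"
	"strings"
)

// cachesForDeployment returns every cache whose name starts with the
// "<vdb>-<deployment>-" prefix, so all of a deployment's caches can be
// collected for destruction or metrics without knowing each full name.
func cachesForDeployment(all []string, vdb string, deployment int) []string {
	prefix := fmt.Sprintf("%s-%d-", vdb, deployment)
	var matched []string
	for _, name := range all {
		if strings.HasPrefix(name, prefix) {
			matched = append(matched, name)
		}
	}
	return matched
}

func main() {
	// Illustrative cache names only; the real composition is described above.
	all := []string{
		"portfolio-1-accounts-custview",
		"portfolio-1-accounts-custview-status",
		"portfolio-2-accounts-custview",
	}
	fmt.Println(cachesForDeployment(all, "portfolio", 1))
}
```

With a prefix match like this, the operator never needs to reconstruct each full cache name from the DDL; it only needs the vdb name and deployment number it already knows.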

            Ramesh Reddy added a comment

            > It is not the target name, it is internal cache name inside the cluster I am after, target name is defined on the View, cacheName defined on the Infinispan's foreign table's options.

            You are differentiating between the teiid fqn for the materialization source target table and the name in source for the target table. Since you are generating only a single materialization schema that is hitting the same materialization target for the whole vdb, these are nearly identical; the teiid fqn is just qualified by the materialization schema name. In any case I'm not saying that it has to be exactly the materialization target property - it can be something else that conveys it is a base name for automation, which will at least be augmented by the "deployment number".

            > sigh, I am not questioning if that can be done or not it is a model of execution. I would need to design a whole new process which may not be efficient at all, one good thing with the model is I can validate the DDL at the start and stop progress if DDL is invalid.

            In other posts I believe you've advocated for not worrying about the efficiency of the operator, which makes sense given that the build will far outweigh anything that we're doing otherwise. So if there are other benefits, why worry about efficiency for just this?

            > If you are saying older user's DDL fits to that paradigm then I am saying we should not make that as the decision to choose alternatives designs to accommodate it just yet.

            Why lock in to any legacy decisions with regards to DDL structure into the operator? Especially if there is a danger that this will get forked for standalone lsp efforts.

            > What do you think about ignoring all this and just going by the cache prefix?

            Can you elaborate what that means? You're trying to assign cache names, based upon the ddl, that will be associated to views - which could be a combination of vdb, schema, view name, deployment number - how does a prefix come into play?

            Steven Hawkins added a comment

            Ramesh Reddy added a comment (edited)

            > Recall that only in the internal materialization case did we not require the setting of a materialization target name.

            It is not the target name; it is the internal cache name inside the cluster I am after. The target name is defined on the view; cacheName is defined in the Infinispan foreign table's options.

            > In short yes. There should certainly be examples of spinning up a java process from go.

            Sigh, I am not questioning whether that can be done or not; it is the model of execution. I would need to design a whole new process, which may not be efficient at all. One good thing with the current model is that I can validate the DDL at the start and stop progress if the DDL is invalid.

            > I don't fully understand what you are advocating here.

            You mentioned that the schema variance comes from backward compatibility. I am saying that if the user is already writing DDL based on your strict schema parsing, where does this issue arise? If you are saying older users' DDL fits that paradigm, then I am saying we should not make that the basis for choosing alternative designs to accommodate it just yet.

            What do you think about ignoring all this and just going by the cache prefix?


            > I think it is bad to ask the user for more input because of your processing model. Let's say we are going to materialize to a database: do we need this? No. Then why should it be different for this data grid case?

            Whether we need the target name or not is an implementation decision - which isn't really based upon data grid. Recall that only in the internal materialization case did we not require the setting of a materialization target name. For all others we did.

            If for whatever reason it's too much work (and/or we want to differentiate between in-process internal) to automate around the cache name, regardless of the materialization source type, then it's not too much of a stretch to still require it.

            > Unless we want to start writing an Operator in Java, I do not know how to do that. You are asking to spin out an external Java process, execute it, and capture the results in the Go operator.

            In short yes. There should certainly be examples of spinning up a java process from go. If not it's really easy to imagine a service, probably serverless, that could be provided whatever input ddl and could provide an output of what you need to know.
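A minimal sketch of that idea in Go: shell out to a Java tool, feed it the DDL on stdin, and capture stdout. The jar name and flag below are hypothetical placeholders; only the `os/exec` plumbing is the point:

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// runTool spins up an external process, feeds it input on stdin, and captures
// stdout, surfacing stderr in the error. The operator would call this with a
// Java-based DDL parser.
func runTool(name string, args []string, stdin string) (string, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdin = bytes.NewBufferString(stdin)
	var out, errOut bytes.Buffer
	cmd.Stdout = &out
	cmd.Stderr = &errOut
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("%s failed: %v: %s", name, err, errOut.String())
	}
	return out.String(), nil
}

func main() {
	// Hypothetical invocation of a Java-based vdb/DDL parser; the jar name
	// and --stdin flag are assumptions, not an existing tool.
	info, err := runTool("java", []string{"-jar", "vdb-parser.jar", "--stdin"},
		"CREATE VIEW v AS SELECT 1")
	if err != nil {
		fmt.Println("parser unavailable:", err)
		return
	}
	fmt.Println(info)
}
```

The same plumbing would serve the "vdb service" variant, with the exec call replaced by an HTTP request to the service.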

            > I do not agree completely. I suggest we keep thinking of this as new development, and then how we can bring old customers, if any, into this model, not the other way around. We cannot compromise on the designs for it, IMO.

            I don't fully understand what you are advocating here. I'm not suggesting that stuff based upon the latest Teiids isn't somehow a "new/different", but rather that it drags along a lot of baggage because of the architecture - which you are proposing to base the operator on.

            Steven Hawkins added a comment

            > It's better to require slightly more user input, than to introduce a brittle solution.

            I think it is bad to ask the user for more input because of your processing model. Let's say we are going to materialize to a database: do we need this? No. Then why should it be different for this data grid case?

            > there are instances where it's better to have an in-process and in-memory cache, which we can't express here correct?

            Yes, you can express it; see the table I provided above. The create flag can be turned off, but it is not by default.

            > If you want foolproof, then we'd make an out of process call to the existing parser.

            Unless we want to start writing an Operator in Java, I do not know how to do that. You are asking to spin out an external Java process, execute it, and capture the results in the Go operator.

            The other thing we could do is start all the cache names with the "resource name" as a prefix or postfix, then delete all of those that match it. It cannot get any simpler than that.

            Ramesh Reddy added a comment

            > I am saying any of these require further user input, which takes away from its usability, as the user is expected to provide additional input.

            It's better to require slightly more user input than to introduce a brittle solution. It also, for example, allows the user to differentiate between internal and datagrid - there are instances where it's better to have an in-process and in-memory cache, which we can't express here, correct?

            > It is not like the language features keep extending; the three statements that we are interested in are, IMO, quite static in nature.

            That is not entirely correct and is building assumptions of context around artificial limitations we introduced for backwards compatibility - such as create schema cannot take a statement list (which are not semicolon delimited), create view, etc. cannot be schema qualified, option values being literals (which had changed from the initial incarnation to accept things other than strings), etc. Why start down the path of baking that into another place?

            > If we want a foolproof option, this is it.

            If you want foolproof, then we'd make an out of process call to the existing parser.

            Steven Hawkins added a comment

            > They simply are providing a materialization target (or similar) unique name, which is used to drive the base names of the caches created.

            I am saying any of these require further user input, which takes away from its usability, as the user is expected to provide additional input.

            > I'm not looking forward to maintaining two parsers. I'd prefer most of the other solutions to this one.

            IMO, one does not need to write a full parser, just enough for the commands, and this is the best option for complete automation. It is not like the language features keep extending; the three statements that we are interested in are, IMO, quite static in nature. The cache name problem is not the only one; the issue also exists for Maven pom.xml generation. If we want a foolproof option, this is it.
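As a rough illustration of "just enough for the commands", a recognizer that only pulls view names out of CREATE VIEW statements can stay very small. This Go sketch is a first approximation only: it handles optional double-quoted identifiers but none of the real Teiid grammar, and schema-qualified names (create view schema.name) would defeat it, which is exactly the fragility raised elsewhere in this thread:

```go
package main

import (
	"fmt"
	"regexp"
)

// viewRe recognizes CREATE [OR REPLACE] VIEW followed by either a
// double-quoted identifier or a bare one. Deliberately minimal.
var viewRe = regexp.MustCompile(`(?i)CREATE\s+(?:OR\s+REPLACE\s+)?VIEW\s+(?:"([^"]+)"|([A-Za-z_]\w*))`)

// viewNames extracts the view name from each CREATE VIEW statement found.
func viewNames(ddl string) []string {
	var names []string
	for _, m := range viewRe.FindAllStringSubmatch(ddl, -1) {
		if m[1] != "" {
			names = append(names, m[1]) // quoted identifier
		} else {
			names = append(names, m[2]) // bare identifier
		}
	}
	return names
}

func main() {
	fmt.Println(viewNames(`CREATE VIEW custview AS SELECT 1; CREATE VIEW "order view" AS SELECT 2;`))
}
```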

            Ramesh Reddy added a comment

            Steven Hawkins added a comment (edited)

            > Actually, this does not work, the scanning we are looking at is done before the maven code-gen is run, so most of the techniques fall into the user's lap to have the knowledge to enter these, thus failing the automation step.

            I don't follow you on this. They simply are providing a materialization target (or similar) unique name, which is used to drive the base names of the caches created.

            > I have started looking at yacc-based parsers in Go to see if I can write a parser for only the DDL statements that we need currently. Steven Hawkins can you give examples of inline comments in SQL for these as you mentioned above? Looking through the SQL grammar file in Teiid, I find only one place like

            I'm not looking forward to maintaining two parsers. I'd prefer most of the other solutions to this one.

            > I was not sure any others exist?

            End-of-line comments:

             -- I'm a comment 

            And inline/multi-line comments are supported - I even added comment nesting to further complicate things.

            /* I'm a big
                comment
            */
            
            /* I'm a big
                comment
                /* with a nested comment */
            */
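Nesting is exactly what pushes this past regular expressions: matching balanced `/* */` pairs needs a counter, and the language of balanced delimiters is not regular. A small Go sketch of a comment stripper that tracks nesting depth (string literals are ignored for brevity, so this is not a full lexer):

```go
package main

import "fmt"

// stripComments removes -- end-of-line comments and /* */ block comments,
// honoring nesting via a depth counter, which a single regex cannot track.
// Simplification: comment markers inside string literals are not protected.
func stripComments(sql string) string {
	runes := []rune(sql)
	var out []rune
	depth := 0
	for i := 0; i < len(runes); i++ {
		switch {
		case depth == 0 && i+1 < len(runes) && runes[i] == '-' && runes[i+1] == '-':
			// Skip to end of line, keeping the newline itself.
			for i < len(runes) && runes[i] != '\n' {
				i++
			}
			if i < len(runes) {
				out = append(out, '\n')
			}
		case i+1 < len(runes) && runes[i] == '/' && runes[i+1] == '*':
			depth++ // entering a (possibly nested) block comment
			i++
		case depth > 0 && i+1 < len(runes) && runes[i] == '*' && runes[i+1] == '/':
			depth-- // leaving one level of block comment
			i++
		case depth == 0:
			out = append(out, runes[i])
		}
	}
	return string(out)
}

func main() {
	fmt.Println(stripComments("SELECT a /* outer /* nested */ still outer */ FROM t -- trailing"))
}
```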
            


            >> by reusing the MATERIALIZATION_TARGET property
            >This will work actually, this directly represents the cache name. I am ok with this. This is the most simple one.

            Actually, this does not work: the scanning we are looking at is done before the Maven code-gen is run, so most of these techniques fall into the user's lap, requiring the knowledge to enter these values, thus failing the automation step.

            > but yes a full parser of these multiple statements is probably a better way to do this.

            I have started looking at yacc-based parsers in Go to see if I can write a parser for only the DDL statements that we currently need. rhn-engineering-shawkins can you give examples of inline comments in SQL for these, as you mentioned above? Looking through the SQL grammar file in Teiid, I find only one place like:

            CREATE VIEW {Name} ( columns ...) OPTIONS (...) AS /* comment */ SELECT ...
            

            I was not sure any others exist?

            Ramesh Reddy added a comment

            > introduce a fuller parser to the operator. Problems still exist as you are doing a lot of contextual parsing here - assumptions about what schema we're creating things under for example. If we add support for create view schema.name - then that would break this.

            Currently, with the regex, I did try to capture the context, but yes, a full parser of these multiple statements is probably a better way to do this.

            > Move the crd creation to Teiid Spring Boot where we are currently implicitly creating the caches. If we don't do this, we'll need to set the translator to not create if a cache does not exist.

            This is more to do with deletion, so IMO this does not help.

            > Use an explicit cr construct:

            I do not like this; IMO the regex has a better chance than this.

            > by reusing the MATERIALIZATION_TARGET property

            This will work actually, this directly represents the cache name. I am ok with this. This is the most simple one.

            > Maybe a table of who creates/destroys the secret, the cluster, and the caches.

            Secret Name       | Owner (who creates cluster) | "Create" Key in Secret | On VDB delete                  | Cluster Shared
            teiid-cache-store | User                        | No                     | caches removed (not currently) | Yes, across all vdbs and their versions
            vdb-cache-store   | User                        | No                     | caches removed (not currently) | No, only for the given versions of the VDB
            vdb-cache-store   | Operator                    | Yes                    | Operator removes the cluster   | No, only for the given versions of the VDB
            none              | Operator                    | Yes                    | cluster removed                | No, only for the given versions of the VDB
            • In all instances the Infinispan Operator is assumed to be available; if it is not available, the feature is turned off and the Operator will not generate the materialized model.

            Ramesh Reddy added a comment

            Continued from the pr about updating the regex for parsing comments and new lines:

            It's not really about a single comment, it's the complexity that it introduces to these regex expressions - anywhere you can have whitespace you can have a comment - and multi-line comments I believe would be pretty messy to capture.

            > need the FQN of the view names to manage the cache when the Infinispan moves to support Cache as CR, that is the reason for additional parsing. Otherwise, there are no other good options.

            Options:

            • introduce a fuller parser to the operator. Problems still exist as you are doing a lot of contextual parsing here - assumptions about what schema we're creating things under for example. If we add support for create view schema.name - then that would break this.
            • Move the crd creation to Teiid Spring Boot where we are currently implicitly creating the caches. If we don't do this, we'll need to set the translator to not create if a cache does not exist.
            • Use an explicit cr construct:
                datagrid:
                   - viewname1: basecachename
                   - viewname2 ...
              

              where the view name would need to be fully qualified and base cache name would be manipulated to account for multiple deployments.

            • Or require something similar in DDL - by reusing the MATERIALIZATION_TARGET property (as an unqualified name) or a new property that indicates the cache name to use. That simplifies the parsing to just that property key/value. This would require some additional logic in the codegen plugin, more than likely.
            • Have the operator run a Java "vdb service" pod that provides a rest interface for getting vdb information from submitted ddl.
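The property-based option above can be sketched as a single key/value match plus the deployment-number augmentation mentioned in the cr-construct bullet. The MATERIALIZATION_TARGET spelling follows this discussion, and the `<base>-<deployment>` composition is an assumption for illustration, not the agreed scheme:

```go
package main

import (
	"fmt"
	"regexp"
)

// targetRe matches only the single OPTIONS key/value we care about, so no
// other DDL structure needs to be understood.
var targetRe = regexp.MustCompile(`(?i)MATERIALIZATION_TARGET\s+'([^']+)'`)

// cacheNames returns a per-deployment cache name for each base name found in
// the DDL; the "<base>-<deployment>" composition is illustrative only.
func cacheNames(ddl string, deployment int) []string {
	var names []string
	for _, m := range targetRe.FindAllStringSubmatch(ddl, -1) {
		names = append(names, fmt.Sprintf("%s-%d", m[1], deployment))
	}
	return names
}

func main() {
	ddl := `CREATE VIEW custview (id integer) OPTIONS (MATERIALIZED 'TRUE', MATERIALIZATION_TARGET 'custcache') AS SELECT id FROM src`
	fmt.Println(cacheNames(ddl, 2))
}
```

Because only one property key/value is matched, this avoids the contextual-parsing problems of scanning whole CREATE VIEW statements.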

            A general thought is that we'll need to default to transactional caches.

            I'd also like to have it spelled out the various creation / deletion scenarios we are proposing to support. It looks like so far we have:

            teiid-cache-store indicates a cache store to be used by all dv in the namespace
            vdb-cache-store indicates a cache store to be used by all deployments of the given vdb
            ???
            Maybe a table of who creates/destroys the secret, the cluster, and the caches.

            Steven Hawkins added a comment

            However, I do see that in rollup deployments the caches get new names without collisions, but since the same Infinispan cluster is reused, the previous entries are not cleared. This is an issue that can only be solved when Infinispan supports CR-based cache creation and deletion.


            Ramesh Reddy added a comment - - edited

            The Operator will add the following environment properties to the Pod:

            • TEIID_NODENAME - OpenShift node name where the Pod exists
            • TEIID_PODNAME - Pod name

            The materialization tables will have a version name added to them, so that there will be no collisions in rollup deployments of VDBs.
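A minimal sketch of that versioned naming idea (the helper name and exact name layout are hypothetical; the actual generated names may differ):

```python
def materialization_table_name(vdb_name: str, vdb_version: str, view_fqn: str) -> str:
    """Build a per-version materialization target name so that rollup
    deployments of the same VDB never collide on the same table/cache.
    Dots in the view FQN are flattened since cache names are flat."""
    flat_fqn = view_fqn.replace(".", "_")
    return f"{vdb_name}_{vdb_version}_{flat_fqn}"

# Two rollup deployments (versions "1" and "2") get distinct targets:
print(materialization_table_name("sales", "2", "accounts.customer_summary"))
# -> sales_2_accounts_customer_summary
```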


            The linked issue covers Infinispan caches being manageable as Custom Resources. Teiid would need this to manage caches when working with shared clusters:

            https://github.com/infinispan/infinispan-operator/issues/356

            Caches can currently be removed using the REST API: https://infinispan.org/docs/dev/titles/rest/rest.html#rest_v2_remove_cache

            However, in OpenShift there is no route created to reach the HTTP endpoint.
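For reference, the call in question is a DELETE against the REST v2 cache endpoint. A sketch that only constructs the request (the host name is a placeholder; as noted above, the endpoint is typically only reachable in-cluster, so nothing is sent here):

```python
import urllib.request

def build_remove_cache_request(base_url: str, cache_name: str) -> urllib.request.Request:
    """Construct (but do not send) the Infinispan REST v2 cache removal
    call: DELETE /rest/v2/caches/{cacheName}. base_url is assumed to be
    the internal service address, e.g. http://<cluster>-infinispan:11222."""
    return urllib.request.Request(
        url=f"{base_url}/rest/v2/caches/{cache_name}",
        method="DELETE",
    )

req = build_remove_cache_request("http://example-infinispan:11222", "sales_customer_summary")
print(req.get_method(), req.full_url)
```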


            After thinking a little more, I think the user should not create a `CacheStore` in the YAML file at all; since all the materialization decoration/enhancements occur without user input, this should be no different.

            Instead the user just deploys a YAML file. However, if the Operator finds a secret named `{vdb-name}-cachestore` or `teiid-cacheStore`, it can read credentials from the secret and configure the rest as defined in the above comment. If not found, no changes will be made to the VDB.
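A sketch of that lookup precedence (the function is hypothetical; an all-lowercase `teiid-cachestore` is assumed here, since Kubernetes secret names must be lowercase DNS-1123 subdomains):

```python
def find_cachestore_secret(secret_names, vdb_name):
    """Pick the cache-store secret the Operator would use: the
    VDB-specific '{vdb-name}-cachestore' wins over the namespace-wide
    'teiid-cachestore'; None means the VDB is left unchanged."""
    for candidate in (f"{vdb_name}-cachestore", "teiid-cachestore"):
        if candidate in secret_names:
            return candidate
    return None

print(find_cachestore_secret({"sales-cachestore", "teiid-cachestore"}, "sales"))
# -> sales-cachestore
```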


            The approach I will be taking is using a shared Infinispan cluster, where the user is responsible for creating the cluster and providing the connection details in the form of `Datasource` properties in the YAML file.

            The DataSource MUST be named `CacheStore`. When the Operator runs, it will find this DataSource and generate and run the S2I build. The Maven build then uses the `vdb-codegen-plugin` to generate a new VDB, where all the views with the `materialized = true` flag and no external materialization target set are picked and copied/cloned into a separate schema called `materialized` under their fully qualified names. The status table also gets created in the same schema. A VDB named `materialized.ddl` gets generated and put on the classpath.

            In the next step, the Operator will adjust the `application.properties` to point to the new `materialized.ddl` VDB and load it.
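The view-selection step can be sketched as a simple filter (the field names below are invented for illustration; the plugin works on parsed DDL metadata, not dicts):

```python
def views_to_materialize(views):
    """Keep views flagged materialized=true that have no external
    materialization target; each is cloned into the 'materialized'
    schema under its fully qualified name."""
    return {
        f"materialized.{v['schema']}.{v['name']}"
        for v in views
        if v.get("materialized") and not v.get("materialization_target")
    }

views = [
    {"schema": "accounts", "name": "summary", "materialized": True},
    {"schema": "accounts", "name": "raw", "materialized": False},
    {"schema": "hr", "name": "emp", "materialized": True,
     "materialization_target": "pg.emp_mat"},  # already has an external target
]
print(sorted(views_to_materialize(views)))
# -> ['materialized.accounts.summary']
```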


            There are two approaches:

            1. Handle this as part of codegen based upon ddl/cr. This minimizes changes to the core.

            2. Push the handling into the core. This requires adding the concept of an implicit source, something that can be injected into the vdb like our system schema, but with a schema that is generated based upon the materialization targets.

            Either way we have an issue of discoverability - if/where the infinispan operator is installed.

            A lot depends on the assumption of an infinispan cluster per vdb, or a shared cluster for all vdbs.

            If shared, then we need to account for that in the cache naming strategy - in general it would be vdb_table_fqn (version would need to be accounted for in the core version). We would also only delete the needed caches on start-up/shut-down, rather than destroying the entire cluster. Note the need for the fqn, as in general we can have materialized views in multiple schemas in a vdb, which would all point to the same source schema/cluster.
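A sketch of the selective-deletion part, assuming the vdb_table_fqn prefix convention described above (function and naming are hypothetical):

```python
def caches_to_delete(all_caches, vdb_name):
    """In a shared cluster, only drop the caches belonging to this VDB
    (assumed convention: names prefixed '<vdb>_'), never the cluster or
    the caches of other VDBs."""
    prefix = vdb_name + "_"
    return [c for c in all_caches if c.startswith(prefix)]

caches = ["sales_accounts_summary", "sales_status", "hr_employees_mat"]
print(caches_to_delete(caches, "sales"))
# -> ['sales_accounts_summary', 'sales_status']
```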

            If we do create a cluster per vdb, then the operator will likely need to manage the creation / destruction of the cluster (using a cr like the one shown with the transactional cache example) upon the vdb being deployed / undeployed.

            There is also an issue of security. We can hide the materialization target schema, but we also need to consider if access is allowed at all. At least for syndesis org.teiid.hiddenMetadataResolvable is always set to false so there's not an issue there. In core/teiid spring boot we may need to drive that on a per schema basis.

            Also captured TEIID-5916 so that we can reach feature parity with internal materialization indexing. It appears we need to change how we are using the Indexed annotation.


              rhn-engineering-rareddy Ramesh Reddy
              rhn-engineering-shawkins Steven Hawkins
              Archiver:
              rhn-support-adandapa Aitik Dandapat (Inactive)

                Created:
                Updated:
                Resolved:
                Archived:

                  Estimated:
                  1w (Original Estimate - 1 week)
                  Remaining:
                  0m (Remaining Estimate - 0 minutes)
                  Logged:
                  2w 2d 2h (Time Spent - 2 weeks, 2 days, 2 hours)