  Teiid Spring Boot / TEIIDSB-170

Automate materialization to JDG

This issue belongs to an archived project. You can view it, but you can't modify it.

    • Type: Enhancement
    • Resolution: Done
    • Priority: Major
    • 1.5.0
    • None
    • OpenShift
    • None
    • DV Sprint 60, DV Sprint 61, DV Sprint 62, DV Sprint 63
    • 3

      Create an internal materialization replacement that provides turnkey materialization to JDG (little to no user setup required):

      • the operator may create the Infinispan cluster if needed (see the sketch after this description)
      • the status table and internal representation of the materialization target would be set up automatically

      For the user this would be as simple as marking a view as materialized; it would then be populated in JDG upon deployment. They would not have any concerns with cache naming, status tables, etc.

      For simplicity, the initial version would make a similar assumption to the current internal logic: it applies to only a specific VDB. If the VDB CR is modified, then it's expected that the cache would be recreated.
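
      A minimal sketch (illustrative only, not the actual operator code) of how the reconcile step could create the Infinispan cluster CR when one does not already exist; the package name, the CR name "teiid-cache-store", and the single-replica spec are assumptions:

      package cachestore

      import (
          "context"

          "k8s.io/apimachinery/pkg/api/errors"
          "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
          "k8s.io/apimachinery/pkg/runtime/schema"
          "k8s.io/apimachinery/pkg/types"
          "sigs.k8s.io/controller-runtime/pkg/client"
      )

      // ensureInfinispanCluster creates an Infinispan CR (infinispan.org/v1) in the
      // VDB's namespace when none exists, so the user never has to provision the
      // data grid themselves. The CR name and replica count are illustrative.
      func ensureInfinispanCluster(ctx context.Context, c client.Client, namespace string) error {
          ispn := &unstructured.Unstructured{}
          ispn.SetGroupVersionKind(schema.GroupVersionKind{
              Group:   "infinispan.org",
              Version: "v1",
              Kind:    "Infinispan",
          })
          key := types.NamespacedName{Namespace: namespace, Name: "teiid-cache-store"}
          if err := c.Get(ctx, key, ispn); err == nil {
              return nil // the cluster is already there, nothing to do
          } else if !errors.IsNotFound(err) {
              return err
          }
          // Not found: create a small cluster with default settings.
          ispn.SetNamespace(namespace)
          ispn.SetName("teiid-cache-store")
          ispn.Object["spec"] = map[string]interface{}{"replicas": int64(1)}
          return c.Create(ctx, ispn)
      }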

            [TEIIDSB-170] Automate materialization to JDG

            Ramesh Reddy added a comment

            Initial automation of the Infinispan cluster is finished; further work is defined in TEIIDSB-169.

            The current documentation can be found at https://github.com/teiid/teiid-openshift-examples/blob/master/materializing.adoc

            rhn-engineering-shawkins, can you review the above docs? We will change or enhance them accordingly.


            Ramesh Reddy added a comment

            > Especially if there is a danger that this will get forked for standalone lsp efforts.

            Did you mean, since we are going to fork the parser for LSP?


            Ramesh Reddy added a comment

            > In other posts I believe you've advocated for not worrying about the efficiency of the operator,

            Yes, in other posts I talked about getting the functionality, but what you are asking for here is at a different level: it brings in a whole different model where we lease out functionality to other programs. Spinning up a Java process is by no means quick, and it may require more resource allocation, etc.

            > Can you elaborate what that means? You're trying to, based upon the ddl, assign cache names that will be associated with views - which could be a combination of vdb, schema, view name, deployment number - how does a prefix come into play?

            The generated cache names already carry the vdb, schema, view name, and deployment number. I am saying that instead of looking up each individual cache, we can just look up by the vdb and deployment number as a prefix to collect all the matching names for destruction or metrics purposes. Right now Infinispan does not have this feature, but it will in the next revision. Even if we had the full name, we cannot destroy an individual cache in this version anyway, so no functionality is lost.

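            A rough sketch of that prefix idea, assuming for illustration that generated names look like "<vdb>-<schema>-<view>-<deployment>" and that the full list of cache names has already been fetched from the Infinispan server; the name layout and sample names are assumptions, not the actual scheme:

            package main

            import (
                "fmt"
                "strings"
            )

            // cachesForDeployment picks out the cache names that belong to one VDB
            // deployment by prefix/suffix matching, instead of recomputing each
            // per-view name. The "<vdb>-<schema>-<view>-<deployment>" layout is an
            // illustrative assumption.
            func cachesForDeployment(allCaches []string, vdb string, deployment int) []string {
                prefix := vdb + "-"
                suffix := fmt.Sprintf("-%d", deployment)
                var matched []string
                for _, name := range allCaches {
                    if strings.HasPrefix(name, prefix) && strings.HasSuffix(name, suffix) {
                        matched = append(matched, name)
                    }
                }
                return matched
            }

            func main() {
                names := []string{
                    "portfolio-accounts-custview-1",
                    "portfolio-accounts-custview-2",
                    "portfolio-status-1",
                }
                // Everything belonging to deployment 1 of the "portfolio" VDB, e.g.
                // for destruction or metrics collection.
                fmt.Println(cachesForDeployment(names, "portfolio", 1))
            }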

            Steven Hawkins added a comment

            > It is not the target name, it is internal cache name inside the cluster I am after, target name is defined on the View, cacheName defined on the Infinispan's foreign table's options.

            You are differentiating between the teiid fqn for the materialization source target table and the name in source for the target table. Since you are generating only a single materialization schema that is hitting the same materialization target for the whole vdb - these are nearly identical. The teiid fqn is just qualified by the materialization schema name. In any case I'm not saying that it has to exactly be the materialization target property - it can be something else that conveys it's a base name for automation, which will at least be augmented by the "deployment number".

            > sigh, I am not questioning if that can be done or not it is a model of execution. I would need to design a whole new process which may not be efficient at all, one good thing with the model is I can validate the DDL at the start and stop progress if DDL is invalid.

            In other posts I believe you've advocated for not worrying about the efficiency of the operator, which makes sense given that the build will far outweigh anything that we're doing otherwise. So if there are other benefits, why worry about efficiency for just this?

            > If you are saying older user's DDL fits to that paradigm then I am saying we should not make that as the decision to choose alternatives designs to accommodate it just yet.

            Why lock in to any legacy decisions with regards to DDL structure into the operator? Especially if there is a danger that this will get forked for standalone lsp efforts.

            > What do you think about ignoring all this and just going by the cache prefix?

            Can you elaborate what that means? You're trying to, based upon the ddl, assign cache names that will be associated with views - which could be a combination of vdb, schema, view name, deployment number - how does a prefix come into play?


            Ramesh Reddy added a comment - edited

            > Recall that only in the internal materialization case did we not require the setting of a materialization target name.

            It is not the target name; it is the internal cache name inside the cluster that I am after. The target name is defined on the view; cacheName is defined in the options of Infinispan's foreign table.

            > In short yes. There should certainly be examples of spinning up a java process from go.

            Sigh, I am not questioning whether that can be done; it is the model of execution. I would need to design a whole new process, which may not be efficient at all. One good thing with the current model is that I can validate the DDL at the start and stop progress if the DDL is invalid.

            > I don't fully understand what you are advocating here.

            You mentioned the schema variance that comes from backward compatibility. I am saying that if the user is already writing DDL that fits your strict schema parsing, where does this issue arise? If you are saying that older users' DDL fits that paradigm, then I am saying we should not make that the deciding factor in choosing alternative designs to accommodate it just yet.

            What do you think about ignoring all this and just going by the cache prefix?


            Steven Hawkins added a comment

            > I think it is bad to ask user for more input based because of the your processing model, let's say if we are going to materialize to database, do we need this? no. Then why it should be different for this data grid case.

            Whether we need the target name or not is an implementation decision - which isn't really based upon data grid. Recall that only in the internal materialization case did we not require the setting of a materialization target name. For all others we did.

            If for whatever reason it's too much work (and/or we want to differentiate between in-process internal) to automate around the cache name, regardless of the materialization source type, then it's not too much of a stretch to still require it.

            > Unless we want to start writing an Operator in Java I do not how to do that. You are asking spin an external process out to java and execute and capture the results in GO operator.

            In short, yes. There should certainly be examples of spinning up a Java process from Go. If not, it's really easy to imagine a service, probably serverless, that could be provided whatever input DDL and could provide an output of what you need to know.
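
            For what it's worth, shelling out to a JVM from Go is only a few lines with the standard library. A minimal sketch; the jar name and its command-line contract are hypothetical placeholders, not an existing tool:

            package main

            import (
                "fmt"
                "log"
                "os/exec"
            )

            func main() {
                // Spawn a JVM, hand it the DDL file, and capture whatever the tool
                // prints on stdout (e.g. the view/cache names it parsed out).
                cmd := exec.Command("java", "-jar", "ddl-inspector.jar", "--file", "vdb.ddl")
                out, err := cmd.Output()
                if err != nil {
                    log.Fatalf("ddl inspection failed: %v", err)
                }
                fmt.Printf("parsed metadata:\n%s", out)
            }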

            > I do not agree completely, I suggest we keep thinking this as new development, then how we can bring old customers if any into this model, not the other way around. We can not compromise on the designs for it IMO.

            I don't fully understand what you are advocating here. I'm not suggesting that stuff based upon the latest Teiids isn't somehow "new/different", but rather that it drags along a lot of baggage because of the architecture - which you are proposing to base the operator on.


            Ramesh Reddy added a comment

            > It's better to require slightly more user input, than to introduce a brittle solution.

            I think it is bad to ask the user for more input because of your processing model. Say we are going to materialize to a database: do we need this? No. Then why should it be different for the data grid case?

            > there are instances where it's better to have an in-process and in-memory cache, which we can't express here correct?

            Yes, you can express it; see the table I provided above. The create flag can be turned off, but it is on by default.

            > If you want foolproof, then we'd make an out of process call to the existing parser.

            Unless we want to start writing an Operator in Java, I do not know how to do that. You are asking to spin out an external process to Java, execute it, and capture the results in the Go operator.

            The other thing we could do: start all the caches with the "resource name" as a prefix or postfix, then delete all of them that match it. It cannot get any simpler than that.


            Steven Hawkins added a comment

            > I am saying any of these require further user input that that takes away the usability of it as user expected to additional input.

            It's better to require slightly more user input than to introduce a brittle solution. It also, for example, allows the user to differentiate between internal and datagrid - there are instances where it's better to have an in-process and in-memory cache, which we can't express here, correct?

            > It is not like language features keep extending, these 3 different statements that we are interested in IMO are much static in nature.

            That is not entirely correct, and it builds assumptions of context around artificial limitations we introduced for backwards compatibility - such as create schema cannot take a statement list (which are not semicolon delimited), create view, etc. cannot be schema qualified, option values being literals (which had changed from the initial incarnation to accept things other than strings), etc. Why start down the path of baking that into another place?

            > If we want a foolproof option, this is it.

            If you want foolproof, then we'd make an out of process call to the existing parser.


            Ramesh Reddy added a comment

            > They simply are providing a materialization target (or similar) unique name, which is used to drive the base names of the caches created.

            I am saying that any of these require further user input, and that takes away from the usability, since the user is expected to provide additional input.

            > I'm not looking forward to maintaining two parsers. I'd prefer most of the other solutions to this one.

            IMO, one does not need to write a full parser, just enough for the commands we care about, and this is the best option for complete automation. It is not as if the language features keep extending; the three statements we are interested in are, IMO, quite static in nature. The cache-name problem is not the only one; the issue also exists for Maven pom.xml generation. If we want a foolproof option, this is it.


            Steven Hawkins added a comment - edited

            > Actually, this does not work, the scanning we are looking at is done before the maven code-gen is run, so most of the techniques fall into the user's lap to have the knowledge to enter these, thus failing the automation step.

            I don't follow you on this. They simply are providing a materialization target (or similar) unique name, which is used to drive the base names of the caches created.

            > I have started looking at a yacc-based parser in Go to see if I can write a parser for only the DDL statements that we need currently. Steven Hawkins can you give examples of inline comments in SQL for these as you mentioned above? Looking through the SQL grammar file in Teiid, I find only one place like

            I'm not looking forward to maintaining two parsers. I'd prefer most of the other solutions to this one.

            > I was not sure any others exist?

            End of line comments

             -- I'm a comment 

            And inline/multi-line comments are supported - I even added comment nesting to further complicate things.

            /* I'm a big
                comment
            */
            
            /* I'm a big
                comment
                /* with a nested comment */
            */
            

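            Because block comments can nest, a plain regex or prefix scan over the DDL is not enough; any minimal parser would need at least a small state machine to strip comments first. A sketch in Go, illustrative only, which ignores string literals (a real implementation would have to handle those as well):

            package main

            import (
                "fmt"
                "strings"
            )

            // stripComments removes nested /* ... */ block comments and -- end-of-line
            // comments from a DDL string.
            func stripComments(ddl string) string {
                var out strings.Builder
                depth := 0
                for i := 0; i < len(ddl); i++ {
                    switch {
                    case i+1 < len(ddl) && ddl[i] == '/' && ddl[i+1] == '*':
                        depth++ // entering a (possibly nested) block comment
                        i++
                    case depth > 0 && i+1 < len(ddl) && ddl[i] == '*' && ddl[i+1] == '/':
                        depth-- // leaving one level of block comment
                        i++
                    case depth == 0 && i+1 < len(ddl) && ddl[i] == '-' && ddl[i+1] == '-':
                        // skip to end of line, keep the newline itself
                        for i < len(ddl) && ddl[i] != '\n' {
                            i++
                        }
                        if i < len(ddl) {
                            out.WriteByte('\n')
                        }
                    case depth == 0:
                        out.WriteByte(ddl[i])
                    }
                }
                return out.String()
            }

            func main() {
                ddl := "CREATE VIEW v /* outer /* nested */ comment */ AS SELECT 1; -- trailing note"
                fmt.Println(stripComments(ddl))
            }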

              rhn-engineering-rareddy Ramesh Reddy
              rhn-engineering-shawkins Steven Hawkins
              Archiver:
              rhn-support-adandapa Aitik Dandapat (Inactive)

                Created:
                Updated:
                Resolved:
                Archived:

                  Estimated:
                  Original Estimate - 1 week (1w)
                  Remaining:
                  Remaining Estimate - 0 minutes (0m)
                  Logged:
                  Time Spent - 2 weeks, 2 days, 2 hours (2w 2d 2h)