I have typical OLAP query that does a join, group by on a dimension, and a aggregation of a metric all on the same connected DB. The query returns ~100 rows.
But Trino is breaking up the query and doing the filtering on the connected DB, but then does the join and aggregation in the Trino Server, moving ~100M rows across the network.
Is there a way to “quote” a sub query, to tell the Trino planner just pushdown the query and don’t bother making a sub plan?
I have a star schema, with one huge table (>100M rows) and a dimension table that has static attributes of the huge table.
The dimension table is filtered to create a map, that is joined to the huge table. The result is group by on a dimension and finally some of the metrics from the huge table are aggregated to calculate stats.
trino> USE mysql.tiny;
trino:tiny> SELECT * FROM TABLE(system.query(query => 'SELECT 1 a'));
trino:tiny> SELECT * FROM TABLE(system.query(query => 'SELECT @@version'));
So this will run exactly the command on the underlying database (not exactly a pushdown but a pass-through) and return the results to Trino as a Table. SELECT @@version is MySQL specific syntax that returns MySQL output as a table that now Trino is able to further process.
Just know that this has the trade-off of you managing the underlying syntax yourself.
Did you happen to set the catalog and schema in the session? This is still very new syntax (so new that we haven’t set up the documentation yet) but from what I’ve deduced, is that the query function is registered to different catalogs, and you need to specify the catalog in order for it to know which function to use. Notice I set the session property in my example to set catalog/schema.
Could you give an example of an encapsulating query that joins between two quoted queries?
How can the catalog.datebase context be set in two different parts of the encapsulated query?
Is there a context parameter?