I’ve did a small showcase on how to ad-hoc copy data from one database/object store to another database/object store via Trino. This is definitely not the usecase for which Trino was initially conceived, but I was curious to see if such scenario basically works.
I know that Trino JDBC connectors can benefit of the write_batch_size session property (by default 1000) to speed up writing in the target database.
Can the writing on a JDBC database be done in parallel or only one worker can perform the writing?
In case that there’s parallelism for writing, can there be provided a hint to Trino to avoid doing too many parallel INSERT statements when executing a CTAS or INSERT ... SELECT statement on a larger Trino cluster ?
To add a little bit more information, here’s how writes work in Trino today (for almost all JDBC connectors).
The query plan has a TableWrite node in case of queries that write data back to a connector.
Depending on the number of tasks that push data into the TableWrite node there can be multiple table-writer tasks running on the cluster - maybe on all workers or maybe on a few of them.
Each writer task writes it’s own output to a “staging” table. Once the coordinator notices that all table writer tasks have finished it “commits” the operation by copying data from the “staging” table into the final table that you used in your query (mostly via SELECT INTO …)
That way it can provide atomicity - if any of the writer tasks fail the staging table can be dropped and the actual table will remain as it was.
It does have a cost though - copying data from the staging into final table takes time. If you want to trade-off that safety for performance you can use the non-transactional inserts feature as described by @Minnesota_Cold .
TL; DR: By default in most of the cases the writes to JDBC connectors happen in parallel. You can try to reduce the number of nodes that perform writes (and all other operations) by reducing hash_partition_count session property (but that affects all operations - not just the write so it may slow down the query).