add paper icde 2020 Signed-off-by: Bertty Contreras-Rojas <bertty@scalytics.io>

commit: 1e37df313d2ad8855e4da55541175d26f08258c4 [log] [tgz]
author: Bertty Contreras-Rojas <bertty@scalytics.io> Tue Mar 30 18:25:55 2021 -0300
committer: Bertty Contreras-Rojas <bertty@scalytics.io> Tue Mar 30 18:25:55 2021 -0300
tree: 4cb2486fd25c6d9bef7f7d997858e49ed4508215
parent: 2d7d3b40991cb109237b79416f53eccb41c4d76d [diff]
diff --git a/_publications/2020-04-20-icde.md b/_publications/2020-04-20-icde.md
new file mode 100644
index 0000000..042c773
--- /dev/null
+++ b/_publications/2020-04-20-icde.md

@@ -0,0 +1,31 @@
+---
+license: |
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+   
+           http://www.apache.org/licenses/LICENSE-2.0
+   
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+layout: publication
+title: Publication
+subtitle: >
+   ML-based Cross-Platform Query Optimization
+link-name: ICDE 2020
+img-thumb: assets/img/screenshot/rheem.png
+authors: Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz, Bertty Contreras-Rojas, Rodrigo Pardo-Meza, Anis Troudi and Sanjay Chawla
+year: 2020
+month: 04
+day: 20
+link-paper: assets/pdf/paper/icde20.pdf
+link-external: false
+---
+
+Cost-based optimization is widely known to suffer from a major weakness: administrators spend a significant amount of time to tune the associated cost models. This problem only gets exacerbated in cross-platform settings as there are many more parameters that need to be tuned. In the era of machine learning(ML), the first step to remedy this problem is to replace the cost model of the optimizer with an ML model. However, such a solution brings in two major challenges. First, the optimizer has to transform a query plan to a vector million times during plan enumeration incurring a very high overhead. Second, a lot of training data is required to effectively train the ML model.We overcome these challenges in Robopt, a novel vector-based optimizer we have built for Rheem, a cross-platform system. Robopt not only uses an ML model to prune the search space but also bases the entire plan enumeration on a set of algebraic operations that operate on vectors, which are a natural fit to the ML model. This leads to both speed-up and scale-up of the enumeration process by exploiting modern CPUs via vectorization. We also accompany Robopt with a scalable training data generator for building its ML model. Our evaluation shows that (i) the vector-based approach is more efficient and scalable than simply using an ML model and (ii) Robopt matches and, in some cases, improves Rheem’s cost-based optimizer in choosing good plans without requiring any tuning effort.Index  Terms—query optimization, machine learning, cross-platform data processing, polystores.

diff --git a/assets/pdf/paper/icde20.pdf b/assets/pdf/paper/icde20.pdf
new file mode 100644
index 0000000..d3f092e
--- /dev/null
+++ b/assets/pdf/paper/icde20.pdf
Binary files differ
commit	1e37df313d2ad8855e4da55541175d26f08258c4	[log] [tgz]
author	Bertty Contreras-Rojas <bertty@scalytics.io>	Tue Mar 30 18:25:55 2021 -0300
committer	Bertty Contreras-Rojas <bertty@scalytics.io>	Tue Mar 30 18:25:55 2021 -0300
tree	4cb2486fd25c6d9bef7f7d997858e49ed4508215
parent	2d7d3b40991cb109237b79416f53eccb41c4d76d [diff]