---
layout: global
title: ML Model Security
displayTitle: Spark ML Model Security
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---
In Apache Spark, loading a machine learning (ML) model is fundamentally equivalent to loading and executing code. Spark ML models often contain serialized objects, transformation logic, and execution graphs that are evaluated by the Spark runtime during model loading and inference. This principle is not unique to Spark; it applies equally to scikit-learn, PyTorch, TensorFlow, and other modern ML ecosystems. As a result, loading a model from an untrusted source introduces the same security risks as executing untrusted software.
Spark ML frameworks serialize not only data (such as weights and parameters) but also executable structures and behaviors. Because of this, model loading is not merely data parsing. It involves interpreting and executing instructions, which means a malicious model can:
In practice, loading a model from an untrusted source is equivalent to running a program downloaded from the internet.
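To make this concrete, the sketch below saves and reloads a small pipeline with the standard Spark ML persistence API. The training data, pipeline stages, and output path are illustrative only; the point is that a saved model directory records, among other things, the names of the classes Spark must instantiate when the model is loaded back, so loading interprets and runs framework code selected by the artifact's contents rather than merely parsing data.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.feature import Tokenizer, HashingTF
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.getOrCreate()

train = spark.createDataFrame(
    [(0, "spark ml security", 1.0), (1, "untrusted model", 0.0)],
    ["id", "text", "label"])

pipeline = Pipeline(stages=[
    Tokenizer(inputCol="text", outputCol="words"),
    HashingTF(inputCol="words", outputCol="features"),
    LogisticRegression(maxIter=5),
])
model = pipeline.fit(train)

# Persist the fitted pipeline. On disk this is not a single opaque blob:
# each stage is written with metadata that records which class implements
# it, alongside Parquet files holding its parameters and coefficients.
model.write().overwrite().save("/tmp/example-pipeline-model")  # hypothetical path

# Loading reads that metadata and instantiates the recorded classes, so the
# load path executes framework code chosen by the contents of the model
# directory rather than running a pure data-parsing routine.
reloaded = PipelineModel.load("/tmp/example-pipeline-model")
print([type(stage).__name__ for stage in reloaded.stages])
```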
Because Spark ML models can embed executable logic, loading untrusted models can lead to:
These risks are amplified in distributed environments, where a malicious model may execute across multiple cluster nodes.
Because loading ML models is equivalent to loading executable code, the responsibility for security ultimately lies with the end user or deploying organization. End users are responsible for ensuring that ML models are subject to the same security assessment, validation, and operational controls as any third-party software. This includes:
Frameworks and libraries can provide safeguards, but they cannot guarantee security when loading arbitrary third-party models.
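As an example of the kind of operational control this implies, the sketch below only hands a model directory to `PipelineModel.load` after checking it against a SHA-256 manifest recorded when the artifact was originally vetted. The manifest format, paths, and helper names are hypothetical, and the sketch assumes the model lives on a local filesystem; it illustrates treating model artifacts like any other third-party software rather than prescribing a complete supply-chain process.

```python
import hashlib
import json
import os

from pyspark.ml import PipelineModel


def digest_tree(root: str) -> dict:
    """Compute a SHA-256 digest for every file under a model directory."""
    digests = {}
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def load_verified_model(model_dir: str, manifest_path: str) -> PipelineModel:
    """Load a model only if it still matches the manifest made when it was vetted."""
    with open(manifest_path) as f:
        expected = json.load(f)  # {"relative/file/path": "sha256 hex digest", ...}
    if digest_tree(model_dir) != expected:
        raise ValueError(f"Model artifact at {model_dir} does not match its manifest")
    return PipelineModel.load(model_dir)


# Usage (hypothetical paths): the manifest is produced and stored separately
# when the model is reviewed, then checked again at deployment time.
# model = load_verified_model("/models/churn-v3", "/manifests/churn-v3.json")
```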