Open source machine learning systems are highly vulnerable to security threats

Open source machine learning systems are highly vulnerable to security threats

  • MLflow identified as most vulnerable open-source ML platform
  • Directory traversal flaws allow unauthorized file access in Weave
  • ZenML Cloud’s access control issues enable privilege escalation risks

Recent analysis of the security landscape of machine learning (ML) frameworks has revealed ML software is subject to more security vulnerabilities than more mature categories like DevOps or Web servers.

The growing adoption of machine learning across industries highlights the critical need to secure ML systems, as vulnerabilities can lead to unauthorized access, data breaches, and compromised operations.

The report from JFrog claims ML projects such as MLflow have seen an increase in critical vulnerabilities. Over the last few months, JFrog has uncovered 22 vulnerabilities across 15 open source ML projects. Among these vulnerabilities, two categories stand out: threats targeting server-side components and risks of privilege escalation within ML frameworks.

Critical vulnerabilities in ML frameworks

The vulnerabilities identified by JFrog affect key components often used in ML workflows, which could allow attackers to exploit tools which are often trusted by ML practitioners for their flexibility, to gain unauthorized access to sensitive files or to elevate privileges within ML environments.

One of the highlighted vulnerabilities involves Weave, a popular toolkit from Weights & Biases (W&B), which aids in tracking and visualizing ML model metrics. The WANDB Weave Directory Traversal vulnerability (CVE-2024-7340) enables low-privileged users to access arbitrary files across the filesystem.

This flaw arises due to improper input validation when handling file paths, potentially allowing attackers to view sensitive files that could include admin API keys or other privileged information. Such a breach could lead to privilege escalation, giving attackers unauthorized access to resources and compromising the security of the entire ML pipeline.

ZenML, an MLOps pipeline management tool, is also affected by a critical vulnerability that compromises its access control systems. This flaw allows attackers with minimal access privileges to elevate their permissions within ZenML Cloud, a managed deployment of ZenML, thereby accessing restricted information, including confidential secrets or model files.

The access control issue in ZenML exposes the system to significant risks, as escalated privileges could enable an attacker to manipulate ML pipelines, tamper with model data, or access sensitive operational data, potentially impacting production environments reliant on these pipelines.

Another serious vulnerability, known as the Deep Lake Command Injection (CVE-2024-6507), was found in the Deep Lake database – a data storage solution optimized for AI applications. This vulnerability permits attackers to execute arbitrary commands by exploiting how Deep Lake handles external dataset imports.

Due to improper command sanitization, an attacker could potentially achieve remote code execution, compromising the security of both the database and any connected applications.

A notable vulnerability was also found in Vanna AI, a tool designed for natural language SQL query generation and visualization. The Vanna.AI Prompt Injection (CVE-2024-5565) allows attackers to inject malicious code into SQL prompts, which the tool subsequently processes. This vulnerability, which could lead to remote code execution, allows malicious actors to target Vanna AI’s SQL-to-graph visualization feature to manipulate visualizations, execute SQL injections, or exfiltrate data.

Mage.AI, an MLOps tool for managing data pipelines, has been found to have multiple vulnerabilities, including unauthorized shell access, arbitrary file leaks, and weak path traversal checks.

These issues allow attackers to gain control over data pipelines, expose sensitive configurations, or even execute malicious commands. The combination of these vulnerabilities presents a high risk of privilege escalation and data integrity breaches, compromising the security and stability of ML pipelines.

By gaining admin access to ML databases or registries, attackers can embed malicious code in models, leading to backdoors that activate upon model load. This can compromise downstream processes as the models are utilized by various teams and CI/CD pipelines. The attackers can also exfiltrate sensitive data or conduct model poisoning attacks to degrade model performance or manipulate outputs.

JFrog’s findings highlight an operational gap in MLOps security. Many organizations lack robust integration of AI/ML security practices with broader cybersecurity strategies, leaving potential blind spots. As ML and AI continue to drive significant industry advancements, safeguarding the frameworks, datasets, and models that fuel these innovations becomes paramount.

You might also like

administrator

Related Articles