I am working on a Django-based web app that takes a Python file as input. The file contains a function; in the backend I have some lists that are passed as parameters to the user's function, which generates a single output value. That result is then used for further computation.

Here is what the function inside the user's file looks like:

def somefunctionname(lst):
    '''some computation performed on lst'''
    value = 0.0  # placeholder for the computed result
    return value  # a single float

At present I take the user's file as a normal file upload. Then, in my views.py, I import the file as a module and call the function through eval. The snippet is given below.

Here modulename is the name of the Python file that I took from the user and import as a module:

exec("import "+modulename)

result = eval(f"{modulename}.{somefunctionname}(arguments)")

This works absolutely fine, but I know it is not a secure approach.

My question: is there any other way to run the user's file securely? I know the proposed solutions can't be foolproof, but what other options are there (for example, if it can be solved with Dockerization, what would the approach be, or are there external tools that I can use through an API)? Or, if possible, can somebody tell me how I can simply sandbox this, or point me to a tutorial that can help?

Any reference or resource will be helpful.

2 Answers

It is an important question. Sandboxing in Python is not trivial.

It is one of the few cases where it matters which Python interpreter you are using. For example, Jython generates Java bytecode, and the JVM has its own mechanism for running code securely.

For CPython, the default interpreter, there were originally some attempts to provide a restricted execution mode, but they were abandoned a long time ago.

Currently, there is an unofficial project, RestrictedPython, that might give you what you need. It is not a full sandbox, i.e. it will not give you restricted filesystem access or the like, but for your needs it may be just enough.

Basically, its authors rewrote Python compilation in a more restricted way.

It lets you compile a piece of code and then execute it, all in a restricted mode. For example:

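from RestrictedPython import safe_builtins, compile_restricted

source_code = """
message = 'Hello world, but secure'
"""

byte_code = compile_restricted(
    source_code,
    filename='<string>',
    mode='exec'
)
# print() inside restricted code needs an extra _print_ hook (see the docs),
# so this example reads the value back from the namespace instead.
namespace = {"__builtins__": safe_builtins}
exec(byte_code, namespace)
print(namespace["message"])

>>> Hello world, but secure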

Running with __builtins__ set to safe_builtins disables dangerous functions such as open or import. There are other sets of builtins and further options; take some time to read the docs, they are pretty good.

EDIT:

Here is an example for your use case:

from RestrictedPython import safe_builtins, compile_restricted
from RestrictedPython.Eval import default_guarded_getitem


def execute_user_code(user_code, user_func, *args, **kwargs):
    """ Execute user code in a restricted env
        Args:
            user_code(str) - String containing the unsafe code
            user_func(str) - Function inside user_code to execute and return value
            *args, **kwargs - arguments passed to the user function
        Return:
            Return value of the user_func
    """

    def _apply(f, *a, **kw):
        return f(*a, **kw)

    try:
        # These are the variables we allow user code to see. @result will contain the return value.
        restricted_locals = {
            "result": None,
            "args": args,
            "kwargs": kwargs,
        }

        # If you want the user to be able to use some of your functions inside their code,
        # you should add those functions to this dictionary.
        # By default many standard actions are disabled. Here I add _apply_ to be able to access
        # args and kwargs and _getitem_ to be able to use arrays. Just think before you add
        # something else. I am not saying you shouldn't do it. You should understand what you
        # are doing, that's all.
        restricted_globals = {
            "__builtins__": safe_builtins,
            "_getitem_": default_guarded_getitem,
            "_apply_": _apply,
        }

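        # Add another line to user code that executes @user_func.
        # Reject anything that is not a plain identifier, otherwise the caller
        # could inject arbitrary code through the function name.
        if not user_func.isidentifier():
            raise ValueError("user_func must be a plain function name")
        user_code += "\nresult = {0}(*args, **kwargs)".format(user_func)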

        # Compile the user code
        byte_code = compile_restricted(user_code, filename="<user_code>", mode="exec")

        # Run it
        exec(byte_code, restricted_globals, restricted_locals)

        # User code has modified result inside restricted_locals. Return it.
        return restricted_locals["result"]

    except SyntaxError as e:
        # Do whatever you want if the user code does not compile
        raise
    except Exception as e:
        # The code did something that is not allowed. Add some nasty punishment to the user here.
        raise

Now you have a function execute_user_code, that receives some unsafe code as a string, a name of a function from this code, arguments, and returns the return value of the function with the given arguments.

Here is a very stupid example of some user code:

example = """
def test(x, name="Johny"):
    return name + " likes " + str(x*x)
"""
# Let's see how this works
print(execute_user_code(example, "test", 5))
# Result: Johny likes 25

But here is what happens when the user code tries to do something unsafe:

malicious_example = """
import sys
print("Now I have the access to your system, muhahahaha")
"""
# Let's see how this works
print(execute_user_code(malicious_example, "test", 5))
# Result - evil plan failed:
#  Traceback (most recent call last):
#    File "restr.py", line 69, in <module>
#      print(execute_user_code(malicious_example, "test", 5))
#    File "restr.py", line 45, in execute_user_code
#      exec(byte_code, restricted_globals, restricted_locals)
#    File "<user_code>", line 2, in <module>
#  ImportError: __import__ not found

Possible extension:

Note that the user code is compiled on each call to the function. However, you may want to compile the user code once and then execute it with different parameters. In that case, all you have to do is save the byte_code somewhere and call exec with a different set of restricted_locals each time.

EDIT2:

If you want to use import, you can write your own import function that allows only the modules you consider safe. Example:

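def _import(name, globals=None, locals=None, fromlist=(), level=0):
    safe_modules = ["math"]
    if name in safe_modules:
        module = __import__(name, globals, locals, fromlist, level)
        # Functions defined in user code look names up in globals, so bind it there too
        globals[name] = module
        # The import statement binds whatever __import__ returns
        return module
    else:
        raise ImportError("Don't you even think about it {0}".format(name))

# Work on a copy so the library-wide safe_builtins dict is not mutated
restricted_builtins = dict(safe_builtins)
restricted_builtins['__import__'] = _import  # Must be a part of builtins
restricted_globals = {
    "__builtins__": restricted_builtins,
    "_getitem_": default_guarded_getitem,
    "_apply_": _apply,
}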

....
i_example = """
import math
def myceil(x):
    return math.ceil(x)
"""
print(execute_user_code(i_example, "myceil", 1.5))

Note that this sample import function is VERY primitive; it will not work with things like from x import y. You can look here for a more complex implementation.
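To give an idea of what a slightly less primitive hook could look like, here is a minimal sketch using only the standard library. The names make_guarded_import and guarded_import are made up for illustration. It assumes that a single namespace dict is passed to exec, so that names bound by either import form are visible everywhere in the user code. The demonstration runs under plain exec, which is not a sandbox; it only shows how the hook behaves.

```python
import builtins


def make_guarded_import(allowed_modules):
    """Return an __import__ replacement that only admits allowed top-level modules."""
    allowed = frozenset(allowed_modules)

    def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
        # Relative imports make no sense for a user snippet, so refuse them.
        if level != 0:
            raise ImportError("relative imports are not allowed")
        # Check the top-level package, so "os.path" is judged as "os".
        if name.split(".")[0] not in allowed:
            raise ImportError("module {0!r} is not allowed".format(name))
        # Returning the module lets both "import x" and "from x import y" bind names.
        return builtins.__import__(name, globals, locals, fromlist, level)

    return guarded_import


# Demonstration with plain exec(): this shows the hook only and is NOT a sandbox.
namespace = {"__builtins__": {"__import__": make_guarded_import(["math"])}}
exec("from math import ceil\nresult = ceil(1.5)", namespace)
print(namespace["result"])
```

An allowed module can still expose things you did not intend, so keep the allow list short and look at what each module gives access to.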

EDIT3

Note that a lot of Python's built-in functionality is not available out of the box in RestrictedPython. That does not mean it is unavailable altogether; you may need to implement some function for it to become available.

Even some seemingly obvious things, like sum or the += operator, do not work out of the box in the restricted environment.

For example, the for loop uses a _getiter_ function that you must implement and provide yourself (in globals). Since you want to avoid infinite loops, you may want to limit the number of iterations allowed. Here is a sample implementation that limits the number of iterations to 100:

MAX_ITER_LEN = 100

class MaxCountIter:
    def __init__(self, dataset, max_count):
        self.i = iter(dataset)
        self.left = max_count

    def __iter__(self):
        return self

    def __next__(self):
        if self.left > 0:
            self.left -= 1
            return next(self.i)
        else:
            raise StopIteration()

def _getiter(ob):
    return MaxCountIter(ob, MAX_ITER_LEN)

....

restricted_globals = {
    "_getiter_": _getiter,

....

for_ex = """
def sum(x):
    y = 0
    for i in range(x):
        y = y + i
    return y
"""

print(execute_user_code(for_ex, "sum", 6))

If you don't want to limit the loop count, just use the identity function as _getiter_:

restricted_globals = {
    "_getiter_": lambda x: x,

Note that simply limiting the loop count does not guarantee security. First, loops can be nested. Second, you cannot limit the iteration count of a while loop this way. To make it secure, you have to execute unsafe code under some timeout.
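For the nested-loop case, one option is a single budget shared by every for loop instead of a separate limit per loop. Below is a minimal sketch of that idea; the class IterationBudget and its method names are made up for illustration, and it assumes you register budget.getiter as the _getiter_ entry in restricted_globals. Unlike the iterator above, it raises an error instead of silently cutting the loop short. While loops do not go through _getiter_ at all, so a timeout is still needed.

```python
class IterationBudget:
    """A counter shared by every loop that is started through getiter()."""

    def __init__(self, max_steps):
        self.remaining = max_steps

    def getiter(self, iterable):
        # Generator that charges one step to the shared budget per item.
        for item in iterable:
            if self.remaining <= 0:
                raise RuntimeError("iteration budget exhausted")
            self.remaining -= 1
            yield item


# Two nested loops draw from the same budget: 3 outer steps + 9 inner steps.
budget = IterationBudget(12)
total = 0
for i in budget.getiter(range(3)):
    for j in budget.getiter(range(3)):
        total += i * j
print(total, budget.remaining)
```

Create a fresh IterationBudget for every execution of user code, otherwise one call would eat into the budget of the next.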

Please take a moment to read the docs.

Note that not everything is documented (although many things are). For more advanced things you have to learn to read the project's source code. The best way to learn is to try running some code, see what kind of function is missing, and then look at the project's source code to understand how to implement it.

EDIT4

There is still another problem: restricted code may contain infinite loops. To avoid that, some kind of timeout is required on the code.

Unfortunately, since you are using Django, which is multi-threaded unless you explicitly specify otherwise, the simple trick of implementing timeouts with signals will not work here; you have to use multiprocessing.

The easiest way, in my opinion, is to use this library (timeout_decorator). Simply add a decorator to execute_user_code so that it looks like this:

import timeout_decorator

@timeout_decorator.timeout(5, use_signals=False)
def execute_user_code(user_code, user_func, *args, **kwargs):
    ...  # body as before

And you are done. The code will never run for more than 5 seconds. Pay attention to use_signals=False; without it you may get some unexpected behavior in Django.

Also note that this is relatively heavy on resources (and I don't really see a way around that). It is not crazy heavy, but it is an extra process spawn. You should keep that in mind in your web server configuration: an API that allows executing arbitrary user code is more vulnerable to DDoS.
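If you would rather not add a dependency, the same "separate process plus time limit" idea can be sketched with the standard library alone. This is a different technique from the decorator above: it starts a fresh interpreter through subprocess and passes the arguments and the result as JSON. The names run_with_timeout and _RUNNER are made up for illustration. The child uses plain exec, so this sketch adds a time limit only; the restricted compilation would still have to happen inside the child.

```python
import json
import subprocess
import sys

# Script executed in the child interpreter. It uses plain exec(), so it
# provides a time limit only and no restrictions of its own.
_RUNNER = """
import json, sys
payload = json.load(sys.stdin)
namespace = {}
exec(payload["code"], namespace)
result = namespace[payload["func"]](*payload["args"])
json.dump(result, sys.stdout)
"""


def run_with_timeout(user_code, user_func, args, timeout=5):
    """Call user_func(*args) from user_code in a child process with a time limit."""
    payload = json.dumps({"code": user_code, "func": user_func, "args": list(args)})
    completed = subprocess.run(
        [sys.executable, "-c", _RUNNER],
        input=payload,
        capture_output=True,
        text=True,
        timeout=timeout,  # on expiry the child is killed and TimeoutExpired is raised
        check=True,       # a crash in the child raises CalledProcessError
    )
    return json.loads(completed.stdout)


mean_code = "def mean(values):\n    return sum(values) / len(values)\n"
print(run_with_timeout(mean_code, "mean", [[1, 2, 3]]))
```

Arguments and return values have to be JSON-serializable here, which fits lists of numbers going in and a single float coming out. Anything the user code writes to stdout would break the result parsing, so a real version needs a cleaner channel for the result.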

  • Most code in this answer is awful: mutating global state (safe_builtins), misimplemented __import__ replacement, code injection vulnerability in execute_user_code, ability to propagate BaseException out of the sandbox… The first block does not even parse ({__builtins__ = safe_builtins} is invalid syntax). I had to basically discard it all in my own do-over: <stackoverflow.com/a/71911219/3840170>. Apr 18, 2022 at 18:41

For sure, with Docker you can sandbox the execution if you are careful. You can restrict CPU cycles and maximum memory, close all network ports, run as a user with read-only access to the file system, and so on.
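As a rough sketch of what those restrictions look like in practice, the helper below only builds the docker run command line; it does not start anything. The function name build_docker_command, the container path /sandbox/user_code.py, the numeric limits and the image name in the usage line are all assumptions chosen for illustration. The flags narrow what the code can do; they do not turn a container into a hard security boundary.

```python
def build_docker_command(image, host_script):
    """Build a locked-down `docker run` command for one user script.

    host_script must be an absolute path on the host.
    """
    container_script = "/sandbox/user_code.py"  # hypothetical path inside the container
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no network access
        "--memory", "128m",                     # memory cap (assumed value)
        "--cpus", "0.5",                        # CPU cap (assumed value)
        "--pids-limit", "64",                   # guards against fork bombs
        "--read-only",                          # read-only root file system
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "--user", "65534:65534",                # unprivileged uid:gid
        "--volume", "{0}:{1}:ro".format(host_script, container_script),
        image,
        "python", container_script,
    ]


print(" ".join(build_docker_command("python:3.12-slim", "/tmp/user_code.py")))
```

The resulting list can be handed to subprocess.run together with a timeout, so that a container that hangs gets stopped from the outside.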

Still, I think it would be extremely complex to get right. In my view, you should not allow a client to execute arbitrary code like that.

I would check whether an existing production-ready solution already does this and use that. I was thinking of the sites that let you submit some code (Python, Java, whatever) that is executed on the server.

  • Yes, it would be extremely complex to work with Docker. Moreover, I am still new to containerization. The project is in the development phase, so I haven't decided which solution I should go with. Yes, some sites allow us to execute code, but I have to run the user's function about 1000 times (I have 1000 lists to run through the user's function), and such sites have limits, say about 60 calls. Still, if you find a site that can help with my situation, please let me know. Thanks
    – WOZNIK
    Jul 30, 2020 at 6:33
  • Docker is 100% absolutely not meant to ever execute untrusted code, even if you "get it right" with locked down networking, not running as root, update frequently enough that there aren't any known containerization breakout vulnerabilities, and all that jazz. Jul 30, 2020 at 6:47
  • So what could be the solution (at least for an initial kick start) that I can come up with?
    – WOZNIK
    Jul 30, 2020 at 7:07
  • Docker is still a viable starting point. I use a combination of a Docker image, static analysis, and watchdog. The watchdog checks a database for any pending executions, grabs the next one and fires up the Docker image; if the container takes longer than X seconds, it manually shuts it down. The image runs a static analysis, then a code tester file to process the results, then stores them in the database for retrieval. There are still some vulnerabilities to it, but this is a good kick start and something my team is researching presently.
    – tsumnia
    Aug 4, 2020 at 21:13
  • Sadly not yet, the build is still very much in its infancy. However, I would say to dig into reading up on my old SO post on unit testing python then dig into making a Docker image run it. Basically, start from the assumption the code is not malicious, get that understood, then start to think about how to protect against potential vulnerabilities.
    – tsumnia
    Aug 5, 2020 at 22:11
