Disclaimer: Ideas stolen/borrowed from Andrej Karpathy's Twitter feed and Diogo Santiago's reply to it (both available at https://lnkd.in/ePUvM3gB).

For some time now, I have wondered why I, as a programmer, am always quick to hop on the bandwagon for the next shiny piece of tech. Be it a new library, a framework, or an entirely new paradigm in software development, I have often found myself underestimating the old and overestimating the new. Perhaps that is even seen positively in the community, as it may demonstrate an ability to keep learning new things.

However, just a little while ago, I had the shock of my life as I read Dr Karpathy's latest tweet: Python's built-in exponent operator (`x ** 0.5`) is claimed to be about 10x faster than the math library's sqrt() function, which in turn is about 10x faster than NumPy's sqrt() function! Claims like these beg to be tested, and that's what I did. Here's what I found:

1. Using Jupyter's %%timeit magic to time the square root of the same number does actually verify the results as claimed. However, note that here we are computing the square root of the same number (1337) over and over again (caching alert!).

2. So, to avoid any possible caching, I timed these functions using a different random number each time. The vanilla implementation's lead vanishes; in fact, math.sqrt() comes out just a little bit (15%) faster than the vanilla implementation.

3. For my final experiment, I timed these functions on a random 1-D array of size 1000. Here, the NumPy implementation actually beats the vanilla implementation by a sizeable gap of 25%!

I started writing this post as a way to glorify the old and the simple, but after looking at the results it seems we have reached a rather anti-climactic conclusion: "Using the right tool for the job usually provides the best results!"

Experiments are shown in the image.

#python #computing
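For anyone who wants to re-run the experiments outside a notebook, here is a rough sketch using the stdlib `timeit` module instead of Jupyter's %%timeit magic. The sizes and repeat counts are my own guesses, not the exact ones from the image:

```python
import math
import random
import timeit

import numpy as np

# Experiment 2: a fresh random number for every call, so nothing can
# be constant-folded or cached. (Input count is my own choice.)
xs = [random.random() * 1e6 for _ in range(10_000)]

t_vanilla = timeit.timeit(lambda: [x ** 0.5 for x in xs], number=100)
t_math = timeit.timeit(lambda: [math.sqrt(x) for x in xs], number=100)

# Experiment 3: a random 1-D array of size 1000, where NumPy can do
# the whole loop in C with a single vectorized call.
arr = np.random.rand(1000)
t_numpy = timeit.timeit(lambda: np.sqrt(arr), number=10_000)

print(f"vanilla ** 0.5  : {t_vanilla:.4f}s")
print(f"math.sqrt       : {t_math:.4f}s")
print(f"np.sqrt (array) : {t_numpy:.4f}s")
```

Note that the three timings cover different workloads, so compare within an experiment rather than across the rows.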
Python evaluates constant expressions at compile time, which is why `1337. ** .5` seems so much faster - the code you are timing is not really doing anything:

```python
In [28]: def f():
    ...:     return 1337. ** .5
    ...:

In [29]: dis.dis(f)
  2           0 LOAD_CONST               1 (36.565010597564445)
              2 RETURN_VALUE
```
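A quick way to check for this folding without reading the bytecode is to inspect the function's constants. A sketch (function names are mine, not from the thread; assumes CPython, where the compiler folds `1337. ** .5`):

```python
def folded():
    # The literal expression can be evaluated at compile time; the
    # bytecode just loads the precomputed constant.
    return 1337. ** .5

def not_folded(x):
    # With a runtime argument, the power is actually computed per call.
    return x ** .5

# On CPython, constant folding leaves the result among the code
# object's constants.
print(1337. ** .5 in folded.__code__.co_consts)      # True on CPython
print(1337. ** .5 in not_folded.__code__.co_consts)  # False
```

So timing `not_folded(1337.)` measures a real power computation, while timing `folded()` mostly measures the cost of a function call.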
Interesting. I have tried multiple times, but saw no significant performance difference between the vanilla implementation and np.sqrt, even for 1 million numbers. The vanilla version is even "faster" in some tries. 🤔 🤔 🤔
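One possible explanation (my guess, not the commenter's): np.sqrt only wins when it is applied to the whole array in a single vectorized call; iterating the elements in Python-level code pays per-element interpreter overhead that can erase the gap. A small sketch:

```python
import time

import numpy as np

arr = np.random.rand(1_000_000)

# One vectorized call: the loop over elements happens in C.
start = time.perf_counter()
vec = np.sqrt(arr)
t_vec = time.perf_counter() - start

# Python-level loop: per-element interpreter and boxing overhead.
start = time.perf_counter()
loop = [x ** 0.5 for x in arr]
t_loop = time.perf_counter() - start

print(f"vectorized np.sqrt: {t_vec:.4f}s")
print(f"Python-level loop : {t_loop:.4f}s")
```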
You really put the effort into it! Congrats! Tesla's DataLoaders will surely be faster by now.
The random benchmarks aren't good because random() is likely an expensive call compared to sqrt, so it's going to take the bulk of the measured time. Also, looping over the np array to call sqrt on each item has a lot of overhead compared to using a builtin list and iterating over it directly instead of fetching array[i] in a loop.
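The point about random() dominating the measurement can be checked by pre-generating the inputs outside the timed region, so only the square root itself is measured. A sketch (sizes are arbitrary):

```python
import math
import random
import timeit

# Pre-generate inputs so random() sits outside the timed region.
xs = [random.random() * 1e6 for _ in range(100_000)]

# Only the square roots are measured here...
t_sqrt = timeit.timeit(lambda: [math.sqrt(x) for x in xs], number=10)

# ...while this measures the cost of generating the inputs alone.
t_rand = timeit.timeit(
    lambda: [random.random() for _ in range(100_000)], number=10
)

print(f"math.sqrt over pre-generated inputs: {t_sqrt:.4f}s")
print(f"generating the random inputs alone : {t_rand:.4f}s")
```

Comparing the two timings shows how much of a naive "random number each time" benchmark is spent on input generation rather than on sqrt.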