Is there a discussion topic or faq that discusses best practices to optimize python performance in excel? I was hoping to get better performance by using python to filter a large dataset instead of sql queries, but so far it is disappointing. Also, the python code seems to be recalculating when upstream cells haven’t changed; inserting a column on a separate worksheet even triggers a python recalc.
Python code cells always recalculate when any Python cell changes. Python is not a purely functional language, so the reason for this forced recalculation is to guarantee consistent results. Microsoft is aware of the pain this causes and is working on a solution.
As for Python being slower than SQL, if you’re using the QUERY function, that will almost always be faster than calling out to Python. Python has to copy the data from your table out to Azure, load it into a pandas dataframe, compute the result, copy it back to your local spreadsheet, and display it. The QUERY function is using a local instance of sqlite (I believe), so everything stays local to your PC. In this scenario, the performance of Python isn’t the issue – it’s the data movement across the internet.
Thank you for your reply and the clarification. That is very helpful. Unfortunately, I don’t think Python in Excel offers the best solution in this case.
Python in Excel offers new capabilities beyond what Excel can offer – fancier images, machine learning, regular expression matching, etc. But it is meant to augment, not replace, the basic functionalities in Excel. If you are comfortable using QUERY, SUMIF, XLOOKUP and these meet your needs, keep using them.