I’m building a model that uses historical data that is then applied forward to future periods using monte carlo simulations - lets say for 3 future years worth of monthly data, based on ~8 years of historical data (organized into a large table hard-coded in my workbook).
The monthly calculations are somewhat adjustable, so I can tweak what percentiles represent outliers in the data, or even which years’ worth of historical data to include in the projection of future data- so I essentially have a large table with all of the months I’m projecting data for with various toggles/adjustment cells for each month.
Originally I built the code in the M$ implementation of Python in Excel to essentially work in one big script using global variables that would parse all the historical data, read in the parameters for the future months, and do all the calculations. Then my output tab would reference those global variables by year/month to display the statistics and outputs associated with it. This worked for a single month or two of data, but would time-out prior to completing the calculation if I extended it out further.
Next I tried splitting the code up so that each step was broken into individual functions and each month’s worth of data was filtered out and returned as python objects that got referenced in the next cell - so my table of monthly parameters now also included all of the outputs that would be used in subsequent steps. This worked, but was very slow, and even with manual calculation mode enabled, there were times where Excel, in its infinite wisdom, would decide to recalculate everything from the start even when only half the cells had completed their calculation. I also kept running out of Cloud Compute time - even when I upgraded to a paid license (apparently even paid licenses don’t have unlimited compute).
So that brought me to Anaconda Code and an attempt to move this workbook to a local computation. I tried simply migrating the code that I had working above into Anaconda.code, but I still kept running into issues with Excel causing a recalculation before all of the cells had completed calculating - and, in fact, unlike the Cloud version, Excel would eventually hang and the calculation would never fully complete no matter how long I left it. Now I’m debating trying to move back to my initial plan of having all the code in one cell with global variables, but before spinning my wheels on that, I thought I’d see if anyone here had any advice or suggestions on how best to approach this problem.
Am I best perhaps trying to break this down into independent blocks that run in isolated mode and simply reference each other? Should I continue trying to have a large set of cells running in linked mode all referencing each other? Does having everything calculate in a single cell and then referencing that one cell from all output cells make better sense?
Open to any thoughts or ideas!
Thanks