Virtual machine

Hello everyone,

Due to data protection guidelines, I would like to install Anaconda on a virtual machine. I want to use it to run machine learning algorithms on a dataset of about 10,000 cases.

What CPU and memory do I need to make this all work smoothly? Do you have any recommendations?

Thanks in advance!

Hey Julia, welcome to the Anaconda community!

Annoyingly, my only answer is that it really depends on a lot of factors. In theory a dataset with 10k cases isn’t that big, and most up-to-date ML algorithms should run fine on some pretty basic specs, e.g. 8 GB of RAM and 2 CPU cores. On AWS that is roughly a t3.large (2 vCPUs, 8 GiB), and a t3.medium (2 vCPUs, 4 GiB) may well be enough too.
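To put a rough number on "isn't that big", here is a back-of-the-envelope sketch. The 10,000 rows come from the question; the 50 columns and the 8 bytes per value (64-bit floats) are my assumptions, and the helper name is made up for illustration.

```python
def estimate_dataset_mib(n_rows: int, n_columns: int, bytes_per_value: int = 8) -> float:
    """Rough in-memory size of a dense, all-numeric table, in MiB."""
    return n_rows * n_columns * bytes_per_value / (1024 ** 2)


# 10,000 cases is from the question; 50 columns is an assumed placeholder.
size_mib = estimate_dataset_mib(n_rows=10_000, n_columns=50)
print(f"{size_mib:.1f} MiB")  # prints "3.8 MiB"
```

Even allowing for several working copies during training, that is a tiny fraction of 8 GB, so the raw data is unlikely to be the bottleneck. Text columns and some algorithms will need more than this estimate suggests.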

However, the “it depends” part is the choice of algorithm and how you specify the model before running it. I’ve seen poorly specified models take 24 hours to run, then take only a minute or two once specified properly. By poorly specified I mean either parameter ranges that are too narrow or too broad, or input data that is highly correlated or not relevant at all. Either can leave a model failing to converge or never finishing its calculation.
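As one way to catch the "highly correlated data" problem before a long run, here is a minimal sketch that flags pairs of numeric columns with a Pearson correlation near ±1. The column names, the toy values, and the 0.95 threshold are all made up for illustration. In practice you would probably do the same thing with pandas, but this version only needs the standard library.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (spread_x * spread_y)


def highly_correlated_pairs(columns, threshold=0.95):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Toy data: the same height recorded in two units, plus an unrelated column.
columns = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [41.0, 23.0, 35.0, 58.0, 29.0],
}
flagged = highly_correlated_pairs(columns)
for name_a, name_b, r in flagged:
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

Here only the two height columns get flagged. Dropping one column from each flagged pair is a common first step, though whether that is right depends on your model.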

So I guess my advice is to start with some modest resources, maybe create a smaller subset of your data just while you’re refining the code and model params, and only pay for bigger compute when you’re 100% confident that’s the only way to get the result you need.
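A minimal sketch of that workflow: take a reproducible random subset, time a run on it, and only then decide whether more compute is needed. The 10% fraction, the generated rows, and `fit_placeholder` are stand-ins I made up, so swap in your real data and training call.

```python
import random
import time


def sample_rows(rows, fraction, seed=0):
    """Return a reproducible random subset containing about `fraction` of the rows."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)


def time_call(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def fit_placeholder(data):
    """Stand-in for the real training step; replace with your model fit."""
    return sum(value for _, value in data) / len(data)


# Stand-in for the real 10,000-case dataset.
rows = [(i, i % 7) for i in range(10_000)]
subset = sample_rows(rows, fraction=0.1)

_, elapsed = time_call(fit_placeholder, subset)
print(f"{len(subset)} rows took {elapsed:.4f} s")
```

If the run on the subset already takes a long time, that usually points to the model specification rather than the hardware. Keep in mind that run time does not always scale linearly with the number of rows, so treat the subset timing as a rough guide only.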

Hope that helps and let us know how you get on!
Jack

Thank you very much for your kind reply!
Does setting up a virtual machine change the hardware requirements?

I don’t believe so… if anything it would reduce the hardware requirements slightly, as you’d likely have less stuff running on the virtual machine than on your laptop, where you’d probably have Slack/Teams/Zoom etc. always running in the background, each soaking up a small amount of RAM and CPU.

Okay, sounds logical! Thank you!