While deploying our internal A.I. service I have been looking at GPU options to speed up the local models I want to offer alongside the premium services. There is definitely no shortage of options, but they all come with a quite hefty price tag.
There are turnkey solutions where you can just select a model and deploy it to the cloud without worrying about the underlying infrastructure. That ease sure comes with a price, so these were the first options I ruled out. There are also somewhat cheaper middle-ground options where you still need to take care of some of the infrastructure yourself.
Going with the cheapest option does require quite a lot of work to get the setup right, but the benefit is the savings in running costs. It also makes it possible to serve more models, since you can run several of them on the same instance instead of having an individual instance for each model.
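As a rough sketch of what that consolidation could look like (the model names and the use of Hugging Face transformers here are my assumptions, not a fixed choice), two pipelines can share a single GPU instance:

```python
import torch
from transformers import pipeline

# Two hypothetical models loaded onto the same GPU (device=0)
# instead of paying for a separate instance per model.
chat = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-1.5B-Instruct",  # assumed model, swap in your own
    device=0,
    torch_dtype=torch.float16,  # half precision leaves VRAM for the other model
)
embed = pipeline(
    "feature-extraction",
    model="sentence-transformers/all-MiniLM-L6-v2",  # assumed embedding model
    device=0,
)
```

As long as the combined weights fit in the instance's VRAM, each extra model only costs memory, not another monthly bill.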
Setting up such a custom solution takes its own time. So, to avoid delaying the release of the service, the local models feature has to be postponed or launched with much slower CPU inference.
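For that CPU fallback, a minimal sketch (again assuming Hugging Face transformers and a small placeholder model) is to run the same kind of pipeline with device=-1, which keeps everything on the CPU:

```python
from transformers import pipeline

# device=-1 tells transformers to run on the CPU only.
# The model name is a placeholder; a small model keeps CPU latency tolerable.
generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",
    device=-1,
)

print(generator("Summarize our deployment options:", max_new_tokens=64)[0]["generated_text"])
```

The nice part is that moving to a GPU later is a one-line change, so launching with CPU inference does not lock the service into it.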