This document summarizes the discussions from Dagstuhl Seminar 24311, which was organized into four working groups:
- Resource-Efficient Data Selection;
- The Future of Portable, Extensible and Composable Machine Learning Systems;
- Hardware-Software Co-Design for Machine Learning;
- Workload-Aware Machine Learning Serving.
Regarding topic 4, the document highlights that model personalization, fine-tuning, and prompt engineering are growing trends, while weight pruning and sparsity exploitation are becoming more practical. However, most of the existing tools are applied in a heuristic manner, in isolation, and/or in a pre-deployment phase.
The authors point to an opportunity to design an algorithm that adaptively tunes these optimization variables on a per-request basis, in an online-optimization fashion.
With that, they envision deploying multiple variants of sparsified and specialized models, periodically profiling samples of client requests, and then tuning the deployment configuration to maximize throughput subject to accuracy, energy, and latency constraints.
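To make that optimization concrete, here is a minimal sketch in Python, assuming each variant comes with a profiled metrics record and the constraints are hard thresholds; the names (`VariantProfile`, `pick_deployment`) and all numbers are illustrative assumptions, not taken from the report.

```python
from dataclasses import dataclass

@dataclass
class VariantProfile:
    """Profiled metrics for one sparsified/specialized model variant."""
    name: str
    throughput_rps: float    # sustained requests per second
    accuracy: float          # e.g. accuracy on a held-out request sample
    latency_p99_ms: float    # tail latency in milliseconds
    energy_j_per_req: float  # energy cost per request in joules

def pick_deployment(variants, min_accuracy, max_latency_ms, max_energy_j):
    """Return the variant with the highest throughput among those that
    satisfy the accuracy, latency, and energy constraints; None if no
    variant is feasible (keep the current deployment in that case)."""
    feasible = [v for v in variants
                if v.accuracy >= min_accuracy
                and v.latency_p99_ms <= max_latency_ms
                and v.energy_j_per_req <= max_energy_j]
    return max(feasible, key=lambda v: v.throughput_rps, default=None)

variants = [
    VariantProfile("dense-base", 120.0, 0.81, 95.0, 2.4),
    VariantProfile("pruned-50",  210.0, 0.79, 60.0, 1.5),
    VariantProfile("pruned-80",  340.0, 0.72, 40.0, 0.9),
]
best = pick_deployment(variants, min_accuracy=0.75,
                       max_latency_ms=80.0, max_energy_j=2.0)
print(best.name)  # "pruned-50": fastest variant meeting all constraints
```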
The proposed methodology encompasses:
- Sharing a workload characterization of real serving traces, the potential of sparsification, and opportunities for adaptation;
- Developing a reconfiguration system with an optimization method, tuning hyperparameters and objectives (a minimal loop is sketched after this list);
- Conducting preliminary experiments on two different prototypes, one for LLMs and one for traditional ML models.
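A minimal sketch of how the reconfiguration loop in the second item might look, reusing the `pick_deployment` helper above; the `deploy` and `profile_variant` hooks, the sampling interval, and the constraint values are all assumptions, since the report does not specify a concrete system.

```python
import random
import time

def reconfiguration_loop(deploy, profile_variant, variants, recent_requests,
                         interval_s=300, sample_size=256):
    """Periodically re-profile every variant on a sample of recent client
    requests and hot-swap to the best feasible deployment.

    `deploy(variant)` and `profile_variant(variant, sample)` are
    caller-supplied hooks (hypothetical, not from the report); the latter
    should return a VariantProfile measured on the sampled workload.
    """
    while True:
        log = recent_requests()  # hook returning the recent request trace
        sample = random.sample(log, k=min(sample_size, len(log)))
        profiled = [profile_variant(v, sample) for v in variants]
        best = pick_deployment(profiled, min_accuracy=0.75,
                               max_latency_ms=80.0, max_energy_j=2.0)
        if best is not None:
            deploy(best)  # keep the current deployment if none is feasible
        time.sleep(interval_s)
```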