Due to increasingly flexible production schemes, production lines have to be adapted at ever shorter intervals. New configurations and parameter settings that have never been run on the production line before produce uncertain results, which may cause downtime or defective products. This challenge is especially prevalent in production lines with fast-changing production requirements and low output volumes. A mathematical approach to parameter setting has the potential to improve efficiency and to find alternative, optimal parameter configurations that deliver the best possible results from the start.
Most parameter settings are not well documented or are not saved in formats suitable for machine learning. In fact, manufacturing companies mostly rely on their employees' experience to set parameters. This tacit knowledge is not reproducible and may lead to inefficiencies once these employees leave the company. In addition, trying out different parameter settings is costly, so multiple options cannot be compared with each other, and in most cases the optimal parameter settings are never implemented.
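Reinforcement learning algorithms such as (deep) Q-learning offer a technical approach to optimizing parameters efficiently. In reinforcement learning, an agent chooses actions (e.g. parameter changes) in a given state of the system and receives a reward that reflects the outcome, such as product quality. Q-learning learns an action-value function Q(s, a) that estimates the expected cumulative reward of taking action a in state s; in deep Q-learning, this function is represented by a neural network. The function is approximated from observed outcomes and refined as additional training data becomes available, and the agent selects the actions with the highest estimated value. Training continues until the agent's performance reaches or even exceeds that of a human operator. With additional simulations, the algorithm can gain further experience of how strongly different parameters influence the outcome and can be optimized further.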
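To make the Q-learning idea more concrete, the following is a minimal tabular sketch rather than a production-ready implementation. It assumes a hypothetical toy setting that is not taken from the source: a single machine parameter with `N_LEVELS` discrete settings, an unknown best setting `OPTIMUM`, and a stand-in simulator `simulate_quality` whose score plays the role of the reward. The agent learns a table `q[s][a]` with the standard update Q(s, a) ← Q(s, a) + α(r + γ·max Q(s', ·) − Q(s, a)); deep Q-learning would replace this table with a neural network.

```python
import random

# Hypothetical toy setting (illustrative assumption, not from a real production line):
# one machine parameter with N_LEVELS discrete settings; the best setting is OPTIMUM.
N_LEVELS = 10
OPTIMUM = 6
ACTIONS = (-1, 0, +1)  # decrease, keep, or increase the parameter level


def simulate_quality(level: int) -> float:
    """Hypothetical stand-in for a production-line simulation (higher is better).

    A real setup would query a process simulation or measured scrap rates instead.
    """
    return -abs(level - OPTIMUM)


def step(level: int, action_idx: int) -> tuple[int, float]:
    """Apply an action, clamp to the valid range, and return (next_state, reward)."""
    next_level = min(max(level + ACTIONS[action_idx], 0), N_LEVELS - 1)
    return next_level, simulate_quality(next_level)


def greedy_action(q, state: int) -> int:
    """Pick the action with the highest estimated value Q(state, a)."""
    return max(range(len(ACTIONS)), key=lambda a: q[state][a])


def train_q_table(episodes=2000, horizon=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: learn q[s][a], the expected discounted quality of action a in state s."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_LEVELS)]
    for _ in range(episodes):
        state = rng.randrange(N_LEVELS)  # start from a random parameter setting
        for _ in range(horizon):
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = greedy_action(q, state)
            next_state, reward = step(state, action)
            # Move Q(s, a) towards the bootstrapped target r + gamma * max_a' Q(s', a').
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def rollout(q, start: int, steps: int) -> int:
    """Follow the learned greedy policy from `start` and return the final setting."""
    level = start
    for _ in range(steps):
        level, _ = step(level, greedy_action(q, level))
    return level


if __name__ == "__main__":
    q = train_q_table()
    for start in (0, 9):
        print(f"start at level {start} -> settles at level {rollout(q, start, N_LEVELS)}")
```

Because the simulated environment here is deterministic and noise-free, a constant learning rate is sufficient. With noisy real measurements, a decaying learning rate and more episodes would typically be needed.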
Reinforcement learning has been shown to exceed human-level performance in complex tasks, e.g. AlphaGo in the game of Go, and is already applied to train scheduling and routing.