Data augmentation is a technique for artificially expanding a dataset by creating new, varied data points from existing ones. It follows the data collection step and is especially useful when training machine learning (ML) models, particularly deep learning models, which typically need large and diverse datasets to make accurate predictions. For images, common transformations include cropping, rotation, scaling, and flipping; for text, common methods include synonym replacement and back-translation. Because the model sees many plausible variations of each example, it is better able to generalize across different scenarios.
Data augmentation is a natural fit for machine learning practitioners who follow a data-centric approach, in which model performance is improved primarily by improving the training data.
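
As a rough illustration of the image transformations mentioned above, the sketch below applies a random flip, a random rotation, and a random crop to a tiny grayscale image stored as a list of rows. The function names (`hflip`, `rot90`, `crop`, `augment`) and the 75% crop ratio are assumptions made for this example, not part of any particular library; in practice an image library would usually handle these operations.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale image: a list of rows of pixel values


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rot90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Cut out a height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def augment(img: Image, rng: random.Random) -> Image:
    """Return a randomly flipped, rotated, and cropped copy of img."""
    out = [row[:] for row in img]  # copy so the original is not modified
    # Flip horizontally with probability 0.5.
    if rng.random() < 0.5:
        out = hflip(out)
    # Rotate by a random multiple of 90 degrees (0, 1, 2, or 3 quarter turns).
    for _ in range(rng.randrange(4)):
        out = rot90(out)
    # Crop a random window covering 75% of each side (illustrative ratio).
    h, w = len(out), len(out[0])
    ch, cw = max(1, h * 3 // 4), max(1, w * 3 // 4)
    top = rng.randrange(h - ch + 1)
    left = rng.randrange(w - cw + 1)
    return crop(out, top, left, ch, cw)


# Usage: generate several augmented variants of one 4 x 4 image.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [augment(image, rng) for _ in range(5)]
```

Each call to `augment` yields a different 3 x 3 view of the same source image, so a single collected example can contribute several training examples. Whether a given transformation is appropriate depends on the task: flipping or rotating is only safe when it does not change the label.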