Train, Validation and Test Sets
Let's establish a common understanding of a few terms used in the machine learning world. The next few sections are for anyone who needs to know the difference between the various dataset splits used while training machine learning models. For a deeper understanding, refer to a great article.
A few key terms to be familiar with in the context of Jasper:
Training Dataset: The sample of data used to train an algorithm to build the model.
Validation Dataset: The sample of data used to tell how well the model performs its prediction or classification on data that it has not seen so far. Results from the validation set indicate whether the model needs to be trained on more data or not.
Test Dataset: The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset. The test set is well curated. It should contain sampled data that spans the various cases that the model would face when used in the real world.
In short, the Validation Set is used to fine-tune the model, whereas the Test Set is used to measure the performance of the final model.
Before building any model, once the data is prepared, it is split into Train and Test sets. After this, X% of the Train set is chosen at random to be the actual Train set and the remaining (100-X)% becomes the Validation set, where X is a fixed number (say 80).
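The two-step split described above can be sketched in a few lines of Python. This is only an illustration, not how Jasper does it: the function name `split_dataset`, its parameters, the 20% test fraction and the fixed seed are all assumptions made for the example. It also holds out the test set at random, whereas a curated test set would normally be prepared separately.

```python
import random


def split_dataset(samples, test_fraction=0.2, train_percent=80, seed=0):
    """Split samples into train, validation and test lists.

    Hypothetical helper for illustration only; names and defaults are assumptions.
    """
    shuffled = list(samples)
    # A seeded generator keeps the shuffle reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    # Step 1: hold out the test set first.
    n_test = round(len(shuffled) * test_fraction)
    test = shuffled[:n_test]
    remaining = shuffled[n_test:]

    # Step 2: X% of what is left becomes the actual train set,
    # the remaining (100-X)% becomes the validation set.
    n_train = round(len(remaining) * train_percent / 100)
    train = remaining[:n_train]
    validation = remaining[n_train:]
    return train, validation, test


train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))  # 64 16 20
```

With 100 samples and X = 80, 20 samples are held out for testing and the other 80 are divided into 64 for training and 16 for validation. The three lists never overlap, so the model is never evaluated on data it was trained on.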
Why is the Validation Set important?
While a model is being developed, its performance is computed by running it against the validation set. Those results are then used to tune the parameters of the model. See more about performance metrics in Jasper
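As a rough illustration of the difference between tuning on the validation set and reporting on the test set, the sketch below picks a decision threshold using validation data only, then measures accuracy once on the test data. The scores, labels, candidate thresholds and the `accuracy` helper are all made up for the example and are not part of Jasper.

```python
def accuracy(threshold, examples):
    """Fraction of (score, label) pairs classified correctly at this threshold."""
    correct = sum((score >= threshold) == label for score, label in examples)
    return correct / len(examples)


# Hypothetical (model_score, true_label) pairs, invented for this sketch.
validation = [(0.1, False), (0.4, False), (0.45, False),
              (0.8, True), (0.6, True), (0.55, True)]
test = [(0.2, False), (0.7, True), (0.45, True), (0.3, False)]

# Tune: choose the threshold that scores best on the validation set.
candidates = [0.3, 0.5, 0.7]
best_threshold = max(candidates, key=lambda t: accuracy(t, validation))

# Report: evaluate the chosen threshold once on the untouched test set.
test_accuracy = accuracy(best_threshold, test)

print(best_threshold)  # 0.5
print(test_accuracy)   # 0.75
```

The threshold looks perfect on the validation set because it was chosen to fit that data, while the test set gives a more honest figure of 0.75. This is why the test set should stay out of any tuning decision.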