Train, Validation and Test Sets

Before You Begin

Let's establish a common understanding of a few machine learning terms. The next few sections are for anyone who needs to know the difference between the various dataset splits used when training machine learning models. For a deeper understanding, refer to a great article here.

A few key terms to be familiar with in the context of Jasper:

Training Dataset: The sample of data used to train an algorithm to build the model.

Validation Dataset: The sample of data used to tell how well the model performs its prediction or classification on data that it has not seen so far. Results from the validation set indicate whether the model needs to be trained on more data.

Test Dataset: The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset. The test set is well curated. It should contain sampled data that spans the various cases that the model would face when used in the real world.

🧙‍♂️ In short, the Validation Set is used to fine-tune the model, whereas the Test Set is used to measure the performance of your model.

Data Split Ratio

Before building any model, once the data is prepared, it is split into Train and Test. After this, from the Train set, randomly choose X% to be the actual Train set and the remaining (100-X)% to be the Validation set, where X is a fixed number (say 80).
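The two-step split can be sketched in a few lines of Python. This is a minimal illustration, not Jasper's own implementation: the function name `split_dataset`, the 20% test fraction, and the random hold-out of the test set are all assumptions made for the example (in practice the test set is usually curated rather than sampled at random).

```python
import random

def split_dataset(samples, test_fraction=0.2, train_percent=80, seed=0):
    """Split samples into train, validation and test lists.

    test_fraction and train_percent (the X in the text) are illustrative
    defaults, not values prescribed by Jasper.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)

    # Step 1: hold out the test set (assumed here to be a random sample).
    n_test = int(len(shuffled) * test_fraction)
    test, remaining = shuffled[:n_test], shuffled[n_test:]

    # Step 2: X% of the remaining data becomes the train set,
    # and the other (100-X)% becomes the validation set.
    n_train = len(remaining) * train_percent // 100
    train, validation = remaining[:n_train], remaining[n_train:]
    return train, validation, test

train, validation, test = split_dataset(range(1000))
print(len(train), len(validation), len(test))  # prints: 640 160 200
```

With 1000 samples, 200 are held out for testing, and the remaining 800 are divided 80/20 into 640 training and 160 validation samples. Fixing the seed keeps the split reproducible between runs.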

Why is the Validation Set important?

Model performance is computed by running the model against the validation set. The validation set is also used for tuning the parameters of a model. See more about performance metrics in Jasper Build Your Model.
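As a rough sketch of what tuning against the validation set looks like, the example below picks the neighbour count `k` for a tiny nearest-neighbour classifier by comparing validation accuracy, then reports test accuracy once at the end. Everything here is hypothetical and chosen only for illustration: the toy data, the helper names `knn_predict` and `accuracy`, and the candidate values of `k`. It does not reflect how Jasper computes its metrics.

```python
import random

def knn_predict(train, x, k):
    """Predict a 0/1 label for x by majority vote of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train, dataset, k):
    """Fraction of examples in dataset that the k-NN model labels correctly."""
    correct = sum(knn_predict(train, x, k) == label for x, label in dataset)
    return correct / len(dataset)

# Hypothetical toy data: label is 1 when x > 0.5, with about 10% of labels flipped.
rng = random.Random(0)
data = []
for _ in range(300):
    x = rng.random()
    label = int(x > 0.5)
    if rng.random() < 0.1:
        label = 1 - label
    data.append((x, label))

train_set, validation_set, test_set = data[:200], data[200:250], data[250:]

# Tune k using the validation set only; the test set is not touched here.
candidate_ks = [1, 3, 5, 9, 15]
best_k = max(candidate_ks, key=lambda k: accuracy(train_set, validation_set, k))

# Report final performance once, on the held-out test set.
print("best k:", best_k)
print("test accuracy:", accuracy(train_set, test_set, best_k))
```

Because `k` is chosen to maximise validation accuracy, the validation score is an optimistic estimate of real-world performance. That is why the final number is taken from the test set, which played no part in the tuning.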


Last updated 5 years ago
