Annotation Do's & Don'ts
Your model is only as good as the data it is given during training. It's important to have the right quantity of data and to pick training data that represents real-world data well. It's equally important to label and annotate the data properly to get accurate results.
📌 All labels should be present in all samples for training.
Every label the model is to be trained on should be annotated across the entire training set. Let's say you need to extract 10 labels from W2 forms and you have 50 training files. All 50 documents should be annotated for all 10 labels. Inconsistent labels across the training data will bring down the model's accuracy.
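A quick way to catch gaps before training is to compare each document's annotated labels against the full list you expect. The sketch below assumes you can export your annotations as a mapping from file name to the set of label names used in that file; the function name, file names, and label names are hypothetical and not part of any particular annotation tool.

```python
from typing import Dict, List, Set


def find_missing_labels(
    annotations: Dict[str, Set[str]], required: Set[str]
) -> Dict[str, List[str]]:
    """Return, per document, the required labels that were not annotated."""
    missing: Dict[str, List[str]] = {}
    for doc_name, labels in annotations.items():
        # Labels that are required but absent from this document.
        gap = sorted(required - set(labels))
        if gap:
            missing[doc_name] = gap
    return missing


# Hypothetical export: two W2 files, one of them missing a label.
required_labels = {"ssn", "wages"}
exported = {
    "w2_001.pdf": {"ssn", "wages"},
    "w2_002.pdf": {"ssn"},
}
print(find_missing_labels(exported, required_labels))
```

An empty result means every document carries every required label; anything else lists the files to revisit.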


📌 The name of a label should be the same for similar fields across the training data
For example, if you need to label Social Security Number, you are free to pick any name for the label, but the same name should be used across all the documents.
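Naming drift often shows up as small spelling differences, such as different casing or separators. The sketch below is one possible check, assuming you can collect every label name used across your training documents into a list. It groups names that become identical once case, spaces, and punctuation are ignored; the helper names and sample labels are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def normalize(name: str) -> str:
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def find_label_variants(label_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group label names that differ only by case, spacing, or punctuation."""
    groups = defaultdict(set)
    for name in label_names:
        groups[normalize(name)].add(name)
    # Keep only groups where more than one spelling was used.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


used = ["SSN", "ssn", "Employer Name", "employer_name", "Wages"]
print(find_label_variants(used))
```

This only catches spelling variants. It will not notice that "SSN" and "Social Security Number" refer to the same field, so a manual review of the label list is still worthwhile.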

📌 Annotate only the text portion of the area of interest
Don't include too much white space around the text when you annotate it. This noise will reduce the overall model accuracy.
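If your annotations are stored as bounding boxes and you also have word-level boxes (for example from OCR), a loosely drawn box can be tightened automatically. The sketch below makes both of those assumptions and uses a hypothetical `(left, top, right, bottom)` pixel format. It shrinks the drawn box to the smallest box that covers the words whose centers fall inside it.

```python
from typing import List, Tuple

# Assumed box format: (left, top, right, bottom) in pixels.
Box = Tuple[float, float, float, float]


def tighten_box(drawn: Box, word_boxes: List[Box]) -> Box:
    """Shrink a drawn box to the union of the word boxes centered inside it."""
    left, top, right, bottom = drawn
    inside = [
        w for w in word_boxes
        if left <= (w[0] + w[2]) / 2 <= right and top <= (w[1] + w[3]) / 2 <= bottom
    ]
    if not inside:
        # No words found inside the drawn box: leave it unchanged.
        return drawn
    return (
        min(w[0] for w in inside),
        min(w[1] for w in inside),
        max(w[2] for w in inside),
        max(w[3] for w in inside),
    )


drawn_box = (0, 0, 200, 50)  # generous box with white space around the text
words = [(40, 10, 90, 30), (95, 10, 150, 30), (300, 10, 350, 30)]
print(tighten_box(drawn_box, words))
```

The third word lies outside the drawn box, so it is ignored and the result hugs only the two words that were actually selected.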
