Annotation Do's & Don'ts
Your model is only as good as the data it is given during training. It's important to have enough data and to pick training data that represents the real-world data the model will see. It's equally important to label and annotate that data properly to get accurate results.
Every label the model is trained on should be annotated across the entire training set. Let's say you need to extract 10 labels from W2 forms and you have 50 training files. All 50 documents should be annotated for all 10 labels. Inconsistent labels across the training data will bring down model accuracy.
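A completeness check like this can be automated before training. The following is a minimal sketch, assuming annotations have been exported as a mapping from file name to the set of label names found in that file; the file names, label names, and the `find_missing_labels` helper are hypothetical and not part of any product API.

```python
from typing import Dict, List, Set


def find_missing_labels(
    annotations: Dict[str, Set[str]], required: Set[str]
) -> Dict[str, List[str]]:
    """Return, for each document, the required labels that were never annotated."""
    missing = {}
    for doc, labels in annotations.items():
        gap = sorted(required - labels)
        if gap:
            missing[doc] = gap
    return missing


# Hypothetical label set and export; a real project would load these from its own files.
required_labels = {"employee_ssn", "employer_ein", "wages"}
annotations = {
    "w2_001.pdf": {"employee_ssn", "employer_ein", "wages"},
    "w2_002.pdf": {"employee_ssn", "wages"},  # employer_ein was skipped
}

report = find_missing_labels(annotations, required_labels)
print(report)  # {'w2_002.pdf': ['employer_ein']}
```

An empty report means every document carries every required label; any entry points to a file that should be revisited before training.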
For example, if you need to label a Social Security Number, you are free to pick any name for the label, but the same name should be used across all the documents.
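Accidental variants of a label name (different casing or punctuation) are easy to miss by eye. The sketch below groups names that collapse to the same key once case and punctuation are ignored; the `normalize` and `find_name_variants` helpers and the sample names are assumptions made for illustration only.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def normalize(name: str) -> str:
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def find_name_variants(label_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group label names that differ only by case or punctuation."""
    groups = defaultdict(set)
    for name in label_names:
        groups[normalize(name)].add(name)
    # Keep only the keys that were spelled more than one way.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Hypothetical label names collected from every annotated document.
names = ["SSN", "ssn", "Employer_EIN", "wages", "S.S.N"]
variants = find_name_variants(names)
print(variants)  # {'ssn': ['S.S.N', 'SSN', 'ssn']}
```

This only catches spelling variants. It will not recognize that "SSN" and "Social Security Number" mean the same field, so agreeing on one name up front is still the safer practice.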
Don't include too much white space around the label when you annotate text. This noise will reduce overall model accuracy.
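If annotations are stored as character offsets into the document text, extra white space can be trimmed after the fact. This is a minimal sketch under that assumption; the half-open `[start, end)` span convention, the `trim_span` helper, and the sample text are hypothetical, and annotations drawn as bounding boxes would need a different approach.

```python
from typing import Tuple


def trim_span(text: str, start: int, end: int) -> Tuple[int, int]:
    """Shrink the half-open span [start, end) so it excludes surrounding white space."""
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return start, end


# Placeholder text; the annotated span [4, 21) includes padding on both sides.
text = "SSN:   123-45-6789   Wages"
start, end = trim_span(text, 4, 21)
print((start, end), repr(text[start:end]))  # (7, 18) '123-45-6789'
```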