What do you understand by TF-IDF?
TF-IDF:
It stands for term frequency-inverse document frequency.
TF-IDF weight:
It is a statistical measure used to evaluate how important a word is to a document in a collection or corpus.
The importance increases proportionally to the number of times a word appears in the document, but is offset by how common the word is across the corpus.
- Term Frequency (TF):
It scores how often the word occurs in the current document.
Since documents differ in length, a term may appear many more times in a long document than in a short one. The term frequency is therefore often divided by the document length (the total number of terms in the document) to normalize it.
- Inverse Document Frequency (IDF):
It scores how rare the word is across the documents in the corpus: the rarer the term, the higher its IDF score.
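The two scores described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the common unsmoothed form IDF = log(N / df), where N is the number of documents and df is the number of documents containing the term, and it multiplies TF by IDF to get the weight. The function names and the toy corpus are hypothetical, and libraries often use smoothed variants of IDF that give slightly different numbers.

```python
import math
from collections import Counter


def term_frequency(term, doc_tokens):
    """Count of `term` in the document, divided by the document length."""
    if not doc_tokens:
        return 0.0
    return Counter(doc_tokens)[term] / len(doc_tokens)


def inverse_document_frequency(term, corpus_tokens):
    """log(N / df): N documents in total, df of them contain `term`."""
    df = sum(1 for doc in corpus_tokens if term in doc)
    if df == 0:
        return 0.0  # assumption: a term absent from the corpus scores 0
    return math.log(len(corpus_tokens) / df)


def tf_idf(term, doc_tokens, corpus_tokens):
    """TF-IDF weight of `term` in one document of the corpus."""
    return term_frequency(term, doc_tokens) * inverse_document_frequency(
        term, corpus_tokens
    )


# Hypothetical toy corpus, tokenized by whitespace.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]

for word in ("the", "cat", "mat"):
    print(word, round(tf_idf(word, corpus[0], corpus), 4))
```

In this toy corpus "the" occurs in every document, so its IDF is log(3/3) = 0 and its weight is zero even though it is the most frequent word in the first document. "mat" occurs in only one document, so it receives the highest weight of the three words.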
Thus,