I am new to information retrieval (IR) techniques.
I am looking for a Java-based API or tool that does the following:
- Download the given set of URLs
- Extract the tokens
- Remove the stop words
- Perform stemming
- Create an inverted index
- Calculate TF-IDF scores
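
To make the steps concrete, here is a rough plain-Java sketch of what I have in mind, without Lucene. It is only an illustration under my own assumptions: the class name MiniIndex, the tiny stop-word list, and the suffix-stripping "stemmer" are placeholders (it is not the Porter algorithm), and TF-IDF is taken as raw term count times ln(N / document frequency), which is just one common variant. Downloading is left out, so the documents are plain strings, and Set.of / List.of need Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class MiniIndex {
    // Placeholder stop-word list (assumption); real lists are much longer.
    static final Set<String> STOP_WORDS =
            Set.of("a", "an", "the", "is", "of", "and", "to", "in");

    // Lowercase the text, split on non-letters, drop stop words, stem the rest.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase().split("[^a-z]+")) {
            if (raw.isEmpty() || STOP_WORDS.contains(raw)) {
                continue;
            }
            tokens.add(stem(raw));
        }
        return tokens;
    }

    // Toy suffix stripper (assumption): NOT a real stemming algorithm.
    static String stem(String w) {
        if (w.length() > 5 && w.endsWith("ing")) {
            return w.substring(0, w.length() - 3);
        }
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    // Inverted index: term -> (document id -> number of occurrences).
    static Map<String, Map<Integer, Integer>> buildIndex(List<String> docs) {
        Map<String, Map<Integer, Integer>> index = new TreeMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyze(docs.get(id))) {
                index.computeIfAbsent(term, t -> new TreeMap<>())
                     .merge(id, 1, Integer::sum);
            }
        }
        return index;
    }

    // TF-IDF as raw count * ln(numDocs / documentFrequency); 0 if the term is absent.
    static double tfIdf(Map<String, Map<Integer, Integer>> index,
                        String term, int docId, int numDocs) {
        Map<Integer, Integer> postings = index.get(term);
        if (postings == null || !postings.containsKey(docId)) {
            return 0.0;
        }
        return postings.get(docId) * Math.log((double) numDocs / postings.size());
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "The cats are chasing the mouse",
                "A mouse is hiding",
                "Cats sleep");
        Map<String, Map<Integer, Integer>> index = buildIndex(docs);
        System.out.println(index);
        System.out.println(tfIdf(index, "cat", 0, docs.size()));
    }
}
```
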
Kindly let me know how Lucene can help me with this.
Regards Yuvi