Lucene / Solr for Academia: PhD Thesis Ideas
November 29, 2010
If you are a Lucene or Solr user or developer, please read on: we’d like to hear from you. If you use a different search tool, please also keep reading. And if you have 5 minutes of free time, we’d like to hear from you, too! ;)
We are looking for your suggestions for advanced features that tools like Lucene, Solr, etc. could or should have, but unfortunately don’t have today, and that could be good topics for a Master’s or PhD thesis. Some of us here at Sematext are PhD candidates and are looking for suggestions that could result in working code ready to be contributed to open source. Plus, we are trying to go beyond that and involve the academic community, as described below. Please add your suggestions to the Lucene / Solr Wishlist public spreadsheet, but please keep in mind that we are looking for advanced functionality, not features that would be too simple or small to serve as research/thesis topics. Feel free to pass the link on to friends and colleagues who you think would be interested in this or may want to make suggestions.
We are in the early stages of collaborating with academia in areas such as IR/IE/ML/NLP. What we’d like to do is involve the academic community, but with the explicit intention of producing research whose goal, from day one, is an implementation that will get integrated into a specific, non-academic system. Thus, we’d like to come up with very real, very practical problems or deficiencies in existing IR/IE/ML/NLP systems that are not simple, that call for academic research, and that then require real hacking in order to produce at least a working prototype/proof of concept (PoC). Our hope is that such a PoC could then be truly integrated, and maybe even improved upon, by industry people.
This may be too abstract and vague, so here is an example:
- Say the target is Lucene and IR.
- Say we identify that the ability to do X is missing from Lucene.
- Say that X is non-trivial, that it’s nobody’s immediate itch, and thus won’t be implemented by anyone in the Lucene community in the next N months.
- Say that X involves advanced functionality that could benefit from relatively advanced and/or new research coming out of academia, and is thus something that could be a part of someone’s PhD thesis.
- Say we find a PhD candidate with adequate background knowledge and interest in X.
- N months later we could have a working PoC of X.
We are hoping that by doing this we can help everyone:
- The future PhD will have a non-made-up, real-world problem to solve and existing code (Lucene) to hack on.
- The Lucene community will get X.
- The Lucene community may get a good contributor or committer down the road.
As facilitators of this, we will try hard to work with academia and teach them the “open-source ways”, which includes teaching how to work effectively with the specific open-source community (to the extent this is permitted by one’s academic institution), so that the research and the real-world needs are aligned.
So, at this point we are looking for suggestions of interesting and practical advanced topics that have both an academic and an industry facet. And, with this debut blog post, we are specifically turning to the IR/Lucene/Solr community at large to make suggestions. Please add them to the Lucene / Solr Wishlist public spreadsheet, and feel free to pass the link on to friends and colleagues who may be interested.