Why am I elasticsearch’ing
What does it mean to do IT? What is information? Information is not data. It is data that has been crunched and digested so that we can learn something from it. Do you have text PDFs or Word documents that make you sick every time you try to search them? No more! Behold the tool.
Elasticsearch is my hobby, and not only as part of the ELK stack. It can easily be adapted to work with any data. That data is in your files, and underneath it there is information.
In this repo you will find a doc searcher. You can run it locally and make your documents searchable. How does it work?
Step #1 Run engine – Elastic
The following applies to Elasticsearch versions before 5. From version 5 on there is a built-in ingest mechanism, which I have not tested yet.
Elasticsearch will accept any kind of document. There is a great library written in Java for handling attachments: Tika. Based on it, people built a handy plugin, mapper-attachments, that can be found here. Follow the instructions matching your Elasticsearch version and install it using good old bin\plugin install. After a successful installation, listing the plugins should show at least one:
>> bin\plugin.bat list
Installed plugins in D:\p\elasticsearch-2.4.0\plugins:
    - mapper-attachments
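Once the plugin is listed, an index needs a field of type attachment so that Tika gets a chance to parse the file. Below is a minimal sketch of such a mapping in Python. The type name doc and the field name file are my assumptions for illustration, not necessarily what the script in this repo uses.

```python
# Sketch of an index body for Elasticsearch 2.x with mapper-attachments.
# The type name "doc" and the field name "file" are assumed for illustration.

def build_attachment_mapping(doc_type="doc", field="file"):
    """Return an index-creation body with a single attachment field."""
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    # "attachment" is the field type added by the plugin;
                    # it expects base64-encoded file content.
                    field: {"type": "attachment"}
                }
            }
        }
    }


mapping = build_attachment_mapping()
# With a client at hand, this could be sent as (index name "docs" is assumed):
# es.indices.create(index="docs", body=mapping)
```

Keeping the mapping in a small function makes it easy to change the field name in one place if your documents are laid out differently.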
Step #2 Index docs – Python
You could pass all the document contents manually using Sense or Postman, but that would be a little troublesome. That’s why automation is good, so I wrote a Python script that does it. Python 3 is needed. You just run the Python app and by default it indexes all contents of the folder files_to_index into your local ES instance. The Python library for that is very cool: you just type the following and it opens a connection, happy and ready for any query.
from elasticsearch import Elasticsearch
HOST = 'http://localhost:9200'
es = Elasticsearch([HOST])
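To give an idea of what indexing a folder can look like, here is a rough sketch, not the actual script from the repo. It reads each file, base64-encodes it (the attachment field type expects that) and hands it to the client. The index name docs, the type name doc, the field names filename and file, and both helper functions are assumptions of mine. The es.index call with doc_type matches the Python client for Elasticsearch 2.x.

```python
import base64
from pathlib import Path


def build_document(path):
    """Build an index body for one file: its name plus base64 content."""
    path = Path(path)
    raw = path.read_bytes()
    return {
        "filename": path.name,
        # The attachment field wants base64 text, not raw bytes.
        "file": base64.b64encode(raw).decode("ascii"),
    }


def index_folder(es, folder="files_to_index", index="docs", doc_type="doc"):
    """Index every regular file in `folder`; return how many were sent."""
    count = 0
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            es.index(index=index, doc_type=doc_type, body=build_document(path))
            count += 1
    return count


# Usage, with the `es` client created as shown above:
# index_folder(es)
```

Passing the client in as an argument keeps the file handling separate from the connection, so the same function works against a local instance or a remote one.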
Step #3 Browse your docs – Browser
Let us show the indexed docs! I have made a little JS browser. You can also check the score of each document. Just open web/view.html and type anything into the input on this ultra-simple HTML page.
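If you would rather poke at the scores from Python than from the browser, the same kind of search can be sketched as follows. This is not the code behind the HTML page, only an illustration. It assumes the attachment field is called file (so the extracted text lives in file.content) and that each document carries a filename field; both names and both helper functions are mine.

```python
def build_query(text, field="file.content"):
    """Full-text match query against the text extracted from attachments."""
    return {"query": {"match": {field: text}}}


def summarize_hits(response):
    """Return (score, name) pairs from a search response, best score first."""
    hits = response.get("hits", {}).get("hits", [])
    pairs = [
        # Fall back to the document id when no filename was stored.
        (hit["_score"], hit.get("_source", {}).get("filename", hit["_id"]))
        for hit in hits
    ]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


# Usage, with an `es` client and an index named "docs" (assumed):
# response = es.search(index="docs", body=build_query("invoice"))
# for score, name in summarize_hits(response):
#     print(score, name)
```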
Right tool for the right task
If you only have a hammer, you tend to see every problem as a nail.
Elasticsearch is not a database but a search engine. It doesn’t come without cost: indexes with words from all documents can take some space. But whenever you run into a problem with search, please consider a search tool.