Be elastic. Search.

Why am I elasticsearch’ing

What does it mean to do IT? What is information? Information is not data. It is data that has been crunched and digested, so we can learn something from it. Do you have text-heavy PDFs or Word documents that make you sick every time you try to search them? No more! Behold the tool.

the tool

Elasticsearch is my hobby, and not only as part of the ELK stack. It can easily be wired up to work with any data, including the data in your files. Underneath that data there is information.


In this repo you will find a document searcher. You can run it locally and make your documents searchable. How does it work?

Step #1 Run engine – Elastic

The following applies to Elasticsearch 2.x (I used 2.4.0). From version 5 on there is a built-in ingest mechanism, which I have not tested yet.
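For version 5 and later, the ingest route would roughly look like the sketch below. I have not tested it against a cluster; it assumes the ingest-attachment plugin is installed, and the pipeline id `attachment` and field name `data` are hypothetical choices. It only builds the two request bodies: the pipeline definition with a single `attachment` processor, and a document carrying the file as base64.

```python
import base64
import json

# Hypothetical pipeline id and field name; choose your own.
PIPELINE_ID = "attachment"
FIELD = "data"


def attachment_pipeline(field=FIELD):
    """Body for PUT _ingest/pipeline/<PIPELINE_ID> with one attachment processor."""
    return {
        "description": "Extract text from base64-encoded documents",
        "processors": [{"attachment": {"field": field}}],
    }


def attachment_doc(raw_bytes, field=FIELD):
    """Document body: raw file bytes sent as a base64 string in `field`."""
    return {field: base64.b64encode(raw_bytes).decode("ascii")}


if __name__ == "__main__":
    print(json.dumps(attachment_pipeline(), indent=2))
    print(attachment_doc(b"hello"))
```

With the Python client you would register the pipeline via `es.ingest.put_pipeline(id=PIPELINE_ID, body=attachment_pipeline())` and index with `es.index(..., body=attachment_doc(data), pipeline=PIPELINE_ID)`; by default the extracted text ends up under `attachment.content`.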

Elasticsearch will accept all kinds of docs. There is a great Java library for extracting content from attachments: Apache Tika. On top of it, the mapper-attachments plugin was built. Follow the instructions matching your Elasticsearch version to install it with the good old bin\plugin install. After a successful installation, listing the plugins should show at least this one:

>> bin\plugin.bat list
Installed plugins in D:\p\elasticsearch-2.4.0\plugins:
- mapper-attachments
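With the plugin in place, the index needs a field of type `attachment`. Below is a minimal sketch of the index-creation body for Elasticsearch 2.x; the names `docs`, `doc` and `file` are hypothetical, not necessarily what the repo uses.

```python
import json

# Hypothetical names: pick your own index, mapping type and field.
INDEX, DOC_TYPE, FIELD = "docs", "doc", "file"


def attachment_mapping(doc_type=DOC_TYPE, field=FIELD):
    """Index-creation body declaring `field` as a mapper-attachments field.

    Elasticsearch 2.x still uses mapping types, hence the `doc_type` level.
    Values sent to this field must be base64-encoded file contents; the
    plugin runs Tika on them and indexes the extracted text.
    """
    return {"mappings": {doc_type: {"properties": {field: {"type": "attachment"}}}}}


if __name__ == "__main__":
    print(json.dumps(attachment_mapping(), indent=2))
```

With a client connection like the one in Step #2, creating the index would be `es.indices.create(index=INDEX, body=attachment_mapping())`.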

Step #2 Index docs – Python

You could pass all of the document contents manually using Sense or Postman, but that gets troublesome quickly. That's why automation is good, so I wrote a Python script that does it. It requires Python 3. You just run the Python app and, by default, it indexes the entire contents of the files_to_index folder into your local ES instance. The Python client library is very convenient: you just type the following and it opens a connection, happy and ready for any query.

from elasticsearch import Elasticsearch
HOST = 'http://localhost:9200'
es = Elasticsearch([HOST])
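What such a script has to do can be sketched as follows. This is not the code from the repo: it is a minimal sketch that walks files_to_index, base64-encodes each file for the mapper-attachments field, and yields one bulk action per file. The index, type and field names and the extra `filename` field are hypothetical.

```python
import base64
import os

FOLDER = "files_to_index"  # default folder named in the post
INDEX, DOC_TYPE, FIELD = "docs", "doc", "file"  # hypothetical names


def iter_actions(folder=FOLDER, index=INDEX, doc_type=DOC_TYPE, field=FIELD):
    """Yield one bulk action per file found under `folder` (recursively).

    Each file is read whole and base64-encoded, which is the format the
    mapper-attachments field expects. Large files are held in memory.
    """
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, "rb") as fh:
                encoded = base64.b64encode(fh.read()).decode("ascii")
            yield {
                "_index": index,
                "_type": doc_type,
                # Path relative to the folder keeps ids unique across subfolders.
                "_id": os.path.relpath(path, folder),
                "_source": {"filename": name, field: encoded},
            }
```

Feeding it to the client's bulk helper would be `from elasticsearch import helpers` followed by `helpers.bulk(es, iter_actions())`. A generator keeps only one file's payload alive at a time while the helper batches requests.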

Step #3 Browse your docs – Browser

Let's look at the indexed docs! I made a little JS browser, where you can also see the score of each document. Just open web/view.html and type anything into the input on this ultra-simple HTML page.
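The page itself is JavaScript, but the same kind of scored search can be sketched from Python. This assumes the mapper-attachments layout where the extracted text is searchable as `file.content`, and reuses the hypothetical `filename` field from the indexing sketch; neither is confirmed to match the repo.

```python
def build_query(text, field="file.content", size=10):
    """Full-text match query on the extracted attachment text."""
    return {
        "size": size,
        "_source": ["filename"],  # skip returning the large base64 payload
        "query": {"match": {field: text}},
    }


def scored_hits(response):
    """Return (id, score) pairs in the order Elasticsearch returned them.

    Without an explicit sort, hits come back by descending relevance score.
    """
    hits = response.get("hits", {}).get("hits", [])
    return [(hit["_id"], hit["_score"]) for hit in hits]
```

Usage would be along the lines of `scored_hits(es.search(index="docs", body=build_query("invoice")))`.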

right tool for the right task

If you only have a hammer, you tend to see every problem as a nail.

Elasticsearch is not a database but a search engine. It doesn't come without cost: the index holding the words from all documents can take up considerable space. But whenever you run into a search problem, please consider a search tool.
