Tuesday, May 31, 2016

How to set up an Elasticsearch Cluster on CentOS

I followed these steps to set up Elasticsearch in production.

# OS requirements: CentOS 6+ and Java 1.8+


Step 1:

Installing Java

Download the JDK from: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
tar xzvf jdk.tar.gz
sudo mkdir -p /usr/local/java
sudo mv jdk1.8.0_45 /usr/local/java/
sudo ln -s /usr/local/java/jdk1.8.0_45 /usr/local/java/jdk
export PATH="$PATH:/usr/local/java/jdk/bin"
export JAVA_HOME=/usr/local/java/jdk/jre
sudo sh -c "echo export JAVA_HOME=/usr/local/java/jdk/jre >> /etc/environment"


Step 2:

Installing Elasticsearch

wget https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/rpm/elasticsearch/2.3.3/elasticsearch-2.3.3.rpm
sudo rpm -ivh elasticsearch-2.3.3.rpm

Step 3:

Configure Elasticsearch

sudo vi /etc/elasticsearch/elasticsearch.yml

cluster.name: cluster_name # [cluster name: the common name shared by all Elasticsearch servers that should join the same cluster]
node.name: hostname-31 # [server name]
node.master: false # [set to true only on master nodes]
node.data: true # [set to true on all nodes except dedicated master nodes]

# [Path where all data is stored. Give one or multiple paths. The directories must be writable by the user elasticsearch]
# chown elasticsearch:elasticsearch /home/es/
path.data: ["/home/es/data/es"]

# [Path where all log files are stored. The directory must be writable by the user elasticsearch]
# chown elasticsearch:elasticsearch /var/logs/es
path.logs: /var/logs/es

path.work: /home/es/work

bootstrap.mlockall: true # [a must for production: locks the process memory in RAM so it is not swapped out]

network.host: "hostname.com"
#http.port: 9200 # [9200 is the default port; change it if needed]

# list all master-eligible nodes
discovery.zen.ping.unicast.hosts: ["hostname.com"]

# calculate by formula: (number of master-eligible nodes / 2) + 1
discovery.zen.minimum_master_nodes: 3

# required for production
node.max_local_storage_nodes: 1

# -------------------------------- Threads ------------------
threadpool.index.type: fixed
threadpool.index.size: 40
threadpool.index.queue_size: 10000

threadpool.search.type: fixed
threadpool.search.size: 40
threadpool.search.queue_size: 10000

threadpool.bulk.type: fixed
threadpool.bulk.size: 40
threadpool.bulk.queue_size: 30000

#------------------ allocate memory for write --------------
indices.memory.index_buffer_size: 30%

#------------------ shard --------------
cluster.routing.allocation.same_shard.host: true
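
The minimum_master_nodes formula in the config above can be sketched in shell. This is only an illustration: the function name is made up here, and the count passed in is assumed to be the number of master-eligible nodes (node.master: true), not every node in the cluster.

```shell
#!/bin/sh
# Quorum of master-eligible nodes: floor(n / 2) + 1.
# minimum_master_nodes is a hypothetical helper name, not an Elasticsearch tool.
minimum_master_nodes() {
  echo $(( $1 / 2 + 1 ))
}

# Example: a cluster with 5 master-eligible nodes.
minimum_master_nodes 5   # prints 3
```

With 5 master-eligible nodes this gives 3, which matches the value used in the config above.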

Step 4:

Configure Elasticsearch system-level settings
# (important setting)

sudo vi /etc/sysconfig/elasticsearch

# set the heap size to half of the RAM, but no more than 32GB
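
The post does not show the lines it added to this file. As a rough sketch, the script below works out half of the machine's RAM and prints candidate lines for /etc/sysconfig/elasticsearch. ES_HEAP_SIZE and MAX_LOCKED_MEMORY are the variable names used by the Elasticsearch 2.x packages; the helper name heap_mb and the cap of 31744 MB (31 GB, chosen to stay just under the 32GB limit) are assumptions of this example.

```shell
#!/bin/sh
# heap_mb is a hypothetical helper: half of the given RAM (in MB),
# capped at 31744 MB (31 GB) so the heap stays below 32 GB.
heap_mb() {
  half=$(( $1 / 2 ))
  cap=31744
  if [ "$half" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$half"
  fi
}

# Total RAM in MB, read from /proc/meminfo (the value there is in kB).
total_mb=$(awk '/^MemTotal/ { print int($2 / 1024) }' /proc/meminfo)

# Candidate lines for /etc/sysconfig/elasticsearch.
echo "ES_HEAP_SIZE=$(heap_mb "$total_mb")m"
# Needed so that bootstrap.mlockall: true can actually lock the heap.
echo "MAX_LOCKED_MEMORY=unlimited"
```

Review the printed values before copying them into the file; a machine that runs other services besides Elasticsearch may need a smaller heap.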

Step 4 (continued):

Change number of open files
Change ulimit Setting: [http://stackoverflow.com/a/36142698/453486]

sudo vi /etc/security/limits.conf

#add line: 
elasticsearch - nofile 65535
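
To check that the new limit is really applied to a running process, one option is to read /proc/<pid>/limits. The helper below is a sketch; the name open_files_limit is invented for this example, and it prints the soft limit of whatever PID it is given.

```shell
#!/bin/sh
# open_files_limit is a hypothetical helper: prints the soft
# "Max open files" limit of the process with the given PID.
open_files_limit() {
  awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Example: the limit of the current shell.
open_files_limit $$
```

For Elasticsearch itself, pass the PID of the running node (for example one found with pgrep -f elasticsearch, assuming a single node per machine) after the service has been restarted.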

Step 5:

Disable swap

sudo vi /etc/sysctl.conf
#add lines: 
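
The lines to add are missing from the post. A common choice, and only an assumption here, is vm.swappiness = 1, which tells the kernel to avoid swapping except in emergencies. The sketch below adds such a line to a sysctl-style file only if the key is not already set; the helper name set_sysctl_line is made up for this example.

```shell
#!/bin/sh
# set_sysctl_line is a hypothetical helper: appends "key = value" to the
# given file unless a line starting with that key already exists.
# Note: dots in the key are treated as regex wildcards by grep, which is
# good enough for this check.
set_sysctl_line() {
  file=$1
  key=$2
  value=$3
  grep -q "^$key[ =]" "$file" 2>/dev/null || echo "$key = $value" >> "$file"
}

# Usage (as root), assuming vm.swappiness = 1 is the setting you want:
#   set_sysctl_line /etc/sysctl.conf vm.swappiness 1
#   sysctl -p        # reload /etc/sysctl.conf
#   swapoff -a       # turn swap off right away, until the next boot
```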

Step 6:

Installing plugins for monitoring cluster

sudo /usr/share/elasticsearch/bin/plugin install mobz/elasticsearch-head
sudo /usr/share/elasticsearch/bin/plugin install lmenezes/elasticsearch-kopf

# Reboot the system for the system-level settings to take effect:
sudo /sbin/reboot

Step 7:

To Start/Stop/Restart Elasticsearch

sudo /etc/init.d/elasticsearch start
sudo /etc/init.d/elasticsearch stop
sudo /etc/init.d/elasticsearch restart

Tuesday, May 24, 2016

How to disable full-text search in Elasticsearch

By default, Elasticsearch indexes every field and every word within a string value.

For Example:

Document 1 has : "text": "Hello World"

Document 2 has : "text": "Hello Srikanth"

By default, Elasticsearch breaks these values into separate terms and indexes each one. Here the three indexed terms would be ["hello", "world", "srikanth"] (the default analyzer also lowercases them).

In some cases we want to disable full-text search on a field so that we can aggregate by its whole value.

For Example:

Document 1 has : "filepath": "/home/srikanth/1.c"
Document 2 has : "filepath": "/home/srikanth/2.c"

By default, Elasticsearch will index these documents by fragments such as "home" and "srikanth", so when aggregating by path these fragments distort the aggregated document counts.

So we have to tell Elasticsearch not to analyze the field, by setting "index": "not_analyzed" for it in the mapping.

This tells Elasticsearch that we will always search by the full string and not by substrings.
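
The mapping itself is not shown in the post. For Elasticsearch 2.x, a minimal sketch could look like the following. The index name "files", the type name "file", and the helper names are hypothetical; only the filepath field comes from the example above.

```shell
#!/bin/sh
# mapping_json prints an index definition in which "filepath" is stored
# as a single unanalyzed term. "file" is a hypothetical type name.
mapping_json() {
  cat <<'EOF'
{
  "mappings": {
    "file": {
      "properties": {
        "filepath": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
EOF
}

# create_index sends the definition to a node.
# $1 = base URL of the node, e.g. http://localhost:9200
# "files" is a hypothetical index name.
create_index() {
  mapping_json | curl -s -XPUT "$1/files" -d @-
}
```

Calling create_index http://localhost:9200 creates the index with this mapping. The mapping has to be in place before documents are indexed, because the way an existing field is indexed cannot be changed afterwards without reindexing.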