Tuesday, June 27, 2017

elasticsearch: distributing indices over multiple disk volumes


I have one index which is quite large (about 100 GB), so I had to extend the disk space on my DigitalOcean server by adding another volume (I run everything on a single node). I told elasticsearch to use two data locations by starting it with

/usr/share/elasticsearch/bin/elasticsearch -Epath.data=/var/lib/elasticsearch,/mnt/volume-sfo2-01/es_data 

elasticsearch does seem to have taken notice of this, since it wrote some files to the new location:

/mnt/volume-sfo2-01/es_data# cd nodes/
/mnt/volume-sfo2-01/es_data/nodes# ls
0
/mnt/volume-sfo2-01/es_data/nodes# cd 0/
/mnt/volume-sfo2-01/es_data/nodes/0# ls
indices  node.lock  _state
/mnt/volume-sfo2-01/es_data/nodes/0# cd indices
/mnt/volume-sfo2-01/es_data/nodes/0/indices# ls
DixLGLrJRXm1gSYcFzkzzw  nmZbce8wTayJC2s_eMC0-g  Qd-9ZnFIRoSM2z7AohKm-w  Sm_tyYTJTty0ImvDamFaQw
/mnt/volume-sfo2-01/es_data/nodes/0/indices# cd DixLGLrJRXm1gSYcFzkzzw/
/mnt/volume-sfo2-01/es_data/nodes/0/indices/DixLGLrJRXm1gSYcFzkzzw# ls
_state

which is identical to what I find in /var/lib/elasticsearch/data, except for the actual index data at the lowest level.

Reading the elasticsearch documentation, I got the impression that elasticsearch distributes a new index over the two disk locations but will not split a single shard between them. So I created the index with 5 shards so that the data can be split between the volumes.
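To make that expectation concrete, here is a toy model of whole-shard placement. It is only a sketch under an assumption: each new shard goes, unsplit, to whichever data path has the most free space at the moment of placement. The function name `place_shards` and the numbers are made up for illustration, and the real allocation logic in elasticsearch may differ between versions.

```python
from typing import Dict, List


def place_shards(free_space: Dict[str, int], shard_sizes: List[int]) -> Dict[int, str]:
    """Toy model: put each shard, whole, on the path with the most free space.

    free_space maps a data path to its free space (any unit);
    shard_sizes lists the space assumed for each shard at placement time.
    """
    remaining = dict(free_space)  # copy so the caller's dict is not modified
    placement = {}
    for shard_id, size in enumerate(shard_sizes):
        # Choose the path that currently has the most free space.
        best = max(remaining, key=remaining.get)
        if remaining[best] < size:
            raise RuntimeError("no data path can hold shard %d" % shard_id)
        placement[shard_id] = best  # a shard is never split across paths
        remaining[best] -= size
    return placement


# Hypothetical free space in GB on the two data paths from the question.
paths = {"/var/lib/elasticsearch": 50, "/mnt/volume-sfo2-01/es_data": 80}

# If the shard sizes were known up front, the shards would spread out.
print(place_shards(paths, [20] * 5))

# Shards of a freshly created index are empty, so in this model they all
# land on the same path, whichever is emptiest at creation time.
print(place_shards(paths, [0] * 5))
```

In this model the placement is decided once, when the shard is created, which is one possible reason why a larger shard count alone would not spread the data across the volumes.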

The server does seem to have detected the two data paths, since the log file shows

[2017-06-17T19:16:57,079][INFO ][o.e.e.NodeEnvironment    ] [WU6cQ-o] using [2] data paths, mounts [[/ (/dev/vda1), /mnt/volume-sfo2-01 (/dev/sda)]], net usable_space [29.6gb], net total_space [98.1gb], spins? [possibly], types [ext4] 

However, when I build the new index, it constantly uses the disk space on my original disk and eventually runs out of disk space with the error

raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.TransportError: TransportError(500, u'index_failed_engine_exception', u'Index failed for [pubmed_paper#25949809]')

It never moves any of the shards to the second volume. Am I missing something? Can I manually control the disk space usage?
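One way to see how the node itself views the two data paths is the node stats API (`GET /_nodes/stats/fs`), which lists each data path with its total and available bytes. The sketch below uses only the Python standard library; the helper names `fetch_fs_stats` and `summarize_data_paths` are my own, and the URL assumes the node listens on localhost:9200 as in the curl call further down.

```python
import json
from urllib.request import urlopen

GIB = 1024 ** 3


def fetch_fs_stats(base_url="http://localhost:9200"):
    """Fetch filesystem stats for all nodes (assumes an unauthenticated node)."""
    with urlopen(base_url + "/_nodes/stats/fs") as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_data_paths(stats):
    """Return (node name, data path, available GiB, total GiB) per data path."""
    rows = []
    for node in stats.get("nodes", {}).values():
        for entry in node.get("fs", {}).get("data", []):
            rows.append((
                node.get("name", "?"),
                entry.get("path", "?"),
                round(entry.get("available_in_bytes", 0) / GIB, 1),
                round(entry.get("total_in_bytes", 0) / GIB, 1),
            ))
    return rows
```

Calling `summarize_data_paths(fetch_fs_stats())` while the index is being built should show which of the two paths is actually filling up.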

Here are the elasticsearch version details:

# curl -XGET 'localhost:9200'
{
  "name" : "WU6cQ-o",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "hKc147QfQqCefLliStLNtw",
  "version" : {
    "number" : "5.1.1",
    "build_hash" : "5395e21",
    "build_date" : "2016-12-06T12:36:15.409Z",
    "build_snapshot" : false,
    "lucene_version" : "6.3.0"
  },
  "tagline" : "You Know, for Search"
}

and here is the file structure of the default path, where elasticsearch stores all the data (instead of sharing it with the second path)

/var/lib/elasticsearch/elasticsearch/nodes/0/indices/DixLGLrJRXm1gSYcFzkzzw# ls
0  1  2  3  4  _state

One question is probably whether I can just take one of these shards and move it to the other location.

0 Answers
