Tuesday, May 26, 2015

Elasticsearch Class Diagram for Shard Routing and Preference


RotationShardShuffler is initialized with a random seed. The seed is then incremented by one each time a new shard iteration order is requested, so the shard copies are picked in rotation (round-robin).
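The rotation can be sketched in a few lines of Java. This is a minimal, hypothetical stand-in for RotationShardShuffler, not the Elasticsearch source: the class name RotationShufflerSketch is made up, shard copies are plain strings, and it assumes the shuffle is a list rotation by (seed mod number of copies).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for RotationShardShuffler; shard copies are plain strings here. */
class RotationShufflerSketch {
    private final AtomicInteger seed;

    RotationShufflerSketch(int initialSeed) {
        this.seed = new AtomicInteger(initialSeed);
    }

    /** Starts from a random seed, as described above. */
    static RotationShufflerSketch withRandomSeed() {
        return new RotationShufflerSketch(ThreadLocalRandom.current().nextInt());
    }

    /** Returns the current seed and increments it by one (thread-safe). */
    int nextSeed() {
        return seed.getAndIncrement();
    }

    /** Returns a copy of the list rotated so that it starts at index (seed mod size). */
    static <T> List<T> rotate(List<T> shards, int seed) {
        List<T> rotated = new ArrayList<>(shards);
        if (rotated.isEmpty()) {
            return rotated;
        }
        int start = Math.floorMod(seed, rotated.size()); // the random seed may be negative
        Collections.rotate(rotated, -start);             // element at 'start' moves to index 0
        return rotated;
    }

    /** One iteration order: consume a seed, then rotate the copies by it. */
    <T> List<T> nextOrder(List<T> shards) {
        return rotate(shards, nextSeed());
    }

    public static void main(String[] args) {
        RotationShufflerSketch shuffler = new RotationShufflerSketch(0);
        List<String> copies = List.of("primary", "replica-1", "replica-2");
        for (int i = 0; i < 4; i++) {
            System.out.println(shuffler.nextOrder(copies));
        }
    }
}
```

With seed 0, successive calls start at primary, replica-1, replica-2 and then primary again, so consecutive requests begin at a different copy. An AtomicInteger keeps the increment safe when many search threads share one shuffler.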

Tuesday, May 5, 2015

Interview with LZ4 (Extremely Fast Compression algorithm)

Why is LZ4 so fast?
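  1. Fast scan strategy
    • Match finding hashes each 4-byte sequence with a cheap multiplicative hash and skips ahead faster when no matches are found
    • xxHash (Extremely fast non-cryptographic hash algorithm, used for the optional checksums of the LZ4 frame format)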
  2. Scales with multi-core CPUs (independent blocks can be compressed on separate threads)
  3. Small memory footprint: the default hash table fits into the Intel x86 L1 cache
    • #define LZ4_MEMORY_USAGE 14 (2^14 bytes = 16 KB by default, see lz4.h)
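The scan strategy and the small table can be illustrated with a simplified Java match finder. It is a sketch, not lz4-java's API and not a full compressor: the class and method names are hypothetical, it only reports the first match instead of emitting compressed sequences, and the constants (hash multiplier 2654435761, 2^(LZ4_MEMORY_USAGE - 2) int slots, skip trigger 6) follow my reading of lz4.c.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical, simplified LZ4-style match finder assuming LZ4_MEMORY_USAGE = 14. */
class Lz4MatchFinderSketch {
    static final int MEMORY_USAGE = 14;            // mirrors the lz4.h default
    static final int HASH_LOG = MEMORY_USAGE - 2;  // 4-byte int entries -> 2^12 slots
    static final int TABLE_SLOTS = 1 << HASH_LOG;  // 4096 slots * 4 bytes = 16 KB
    static final int MIN_MATCH = 4;
    static final int SKIP_TRIGGER = 6;             // step grows by 1 every 64 misses

    /** Reads 4 bytes at position i as a little-endian int. */
    static int read32(byte[] src, int i) {
        return (src[i] & 0xFF)
                | (src[i + 1] & 0xFF) << 8
                | (src[i + 2] & 0xFF) << 16
                | (src[i + 3] & 0xFF) << 24;
    }

    /** Multiplies by 2654435761 (0x9E3779B1, wraps mod 2^32) and keeps the top HASH_LOG bits. */
    static int hash(int sequence) {
        return (sequence * 0x9E3779B1) >>> (MIN_MATCH * 8 - HASH_LOG);
    }

    /** Returns {matchPosition, currentPosition, matchLength} for the first match, or null. */
    static int[] findFirstMatch(byte[] src) {
        int[] table = new int[TABLE_SLOTS];
        Arrays.fill(table, -1);                    // -1 marks an empty slot
        int attempts = 1 << SKIP_TRIGGER;
        int i = 0;
        while (i + MIN_MATCH <= src.length) {
            int sequence = read32(src, i);
            int h = hash(sequence);
            int candidate = table[h];
            table[h] = i;
            // One position per slot, so collisions are possible: verify the bytes really match.
            if (candidate >= 0 && read32(src, candidate) == sequence) {
                int length = MIN_MATCH;
                while (i + length < src.length && src[candidate + length] == src[i + length]) {
                    length++;
                }
                return new int[] {candidate, i, length};
            }
            i += attempts++ >>> SKIP_TRIGGER;      // step 1 for the first 64 misses, then 2, ...
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] data = "to be or not to be".getBytes(StandardCharsets.US_ASCII);
        int[] match = findFirstMatch(data);
        System.out.println(match == null ? "no match" : Arrays.toString(match));
    }
}
```

The growing step is what keeps incompressible input cheap: after many misses the scanner stops hashing every position. The whole table is a single 16 KB int array, which is the L1-cache point from item 3.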
Reference:
  1. LZ4 - Extremely fast compression
  2. LZ4 Java
    1. JNI (fastest)
    2. Pure Java
    3. Pure Java using the sun.misc.Unsafe API

Thursday, April 23, 2015

Performance tuning for Elasticsearch

Some important settings for tuning Elasticsearch performance:
  1. Linux
    1. max_file_descriptors (open files limit, ulimit -n): 65536
    2. vm.max_map_count (sysctl; maximum number of memory-map areas per process): 262144
  2. elasticsearch.yml
    1. bootstrap.mlockall: true
    2. discovery.zen.minimum_master_nodes
    3. gateway.recover_after_nodes
    4. gateway.expected_nodes
    5. gateway.recover_after_time
    6. index.number_of_shards
    7. index.number_of_replicas
    8. index.refresh_interval
    9. indices.fielddata.cache.size: 50%
    10. indices.breaker.fielddata.limit: 50%
  3. Create index
    1. index.number_of_shards
    2. index.number_of_replicas
    3. index.refresh_interval: 30s
    4. #index.merge.policy.type: tiered
    5. #index.translog.flush_threshold_size
    6. #index.search.slowlog.threshold.query
    7. #index.search.slowlog.threshold.fetch
    8. #index.routing.allocation.include.box_type
    9. #indices.memory.index_buffer_size
    10. #indices.memory.min_index_buffer_size
    11. #indices.memory.min_shard_index_buffer_size
    12. #indices.ttl.interval
  4. Search time tips
    1. _optimize?max_num_segments=1 (fewer segments, more efficient searches)
    2. Index per Time Frame
    3. Faking Index per User with Aliases
    4. shard_size & size for terms aggregations; by default, shard_size = size * shards
    5. Index Warmers (at the cost of longer refresh times)
    6. collect_mode: breadth_first
  5. Fielddata
    1. enable doc_values; they are read from disk through mmapfs by default instead of being loaded onto the JVM heap
    2. Fielddata Filtering
    3. Eagerly Loading Fielddata (at the cost of longer merge times)
    4. Global ordinals
    5. Eager Global ordinals (at the cost of longer refresh times)
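Two of the cluster-level settings in the elasticsearch.yml list follow simple rules that can be sketched in Java. discovery.zen.minimum_master_nodes is usually set to a quorum of master-eligible nodes, (n / 2) + 1, to avoid split brain, and the three gateway.* settings together decide when recovery starts after a full cluster restart. The class and method names are hypothetical, and the gateway logic is a simplified model of the documented behavior, not Elasticsearch's GatewayService.

```java
import java.time.Duration;

/** Hypothetical helpers modelling two elasticsearch.yml settings; not Elasticsearch code. */
class ClusterStartupSketch {

    /** Quorum of master-eligible nodes: (n / 2) + 1. */
    static int minimumMasterNodes(int masterEligibleNodes) {
        if (masterEligibleNodes < 1) {
            throw new IllegalArgumentException("need at least one master-eligible node");
        }
        return masterEligibleNodes / 2 + 1;
    }

    /**
     * Simplified gateway rule: wait until recover_after_nodes have joined, then recover
     * as soon as expected_nodes have joined or recover_after_time has elapsed.
     */
    static boolean shouldStartRecovery(int joinedNodes, int recoverAfterNodes, int expectedNodes,
                                       Duration waited, Duration recoverAfterTime) {
        if (joinedNodes < recoverAfterNodes) {
            return false;                               // too few nodes: keep waiting
        }
        if (joinedNodes >= expectedNodes) {
            return true;                                // every expected node is present
        }
        return waited.compareTo(recoverAfterTime) >= 0; // stop waiting for the rest
    }

    public static void main(String[] args) {
        System.out.println(minimumMasterNodes(3)); // prints 2
        System.out.println(shouldStartRecovery(8, 8, 10,
                Duration.ofMinutes(2), Duration.ofMinutes(5))); // prints false
    }
}
```

With recover_after_nodes 8, expected_nodes 10 and recover_after_time 5m, the cluster waits for eight nodes and then recovers after five minutes or as soon as all ten have joined, whichever comes first. This avoids needless shard copying while the last nodes are still starting.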
You can search for the terms above for more information.
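The "Index per Time Frame" tip can be sketched as a naming scheme: one index per day, so a query over a date range only touches the matching indices, and old data is dropped by deleting whole indices instead of individual documents. The logs prefix and the yyyy.MM.dd pattern are assumptions (a common convention, not something Elasticsearch requires), and the class name is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical naming scheme for one index per day, e.g. "logs-2015.04.23". */
class DailyIndexSketch {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy.MM.dd");

    static String indexFor(String prefix, LocalDate day) {
        return prefix + "-" + DAY.format(day);
    }

    /** All daily indices covering [from, to], inclusive; empty if from is after to. */
    static List<String> indicesBetween(String prefix, LocalDate from, LocalDate to) {
        List<String> names = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            names.add(indexFor(prefix, d));
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = indicesBetween("logs", LocalDate.of(2015, 4, 29), LocalDate.of(2015, 5, 2));
        // A comma-separated list of names can serve as the index part of a search URL.
        System.out.println(String.join(",", names));
        // prints logs-2015.04.29,logs-2015.04.30,logs-2015.05.01,logs-2015.05.02
    }
}
```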