白金之星札記 (Star Platinum Notes): 2015

Tuesday, May 26, 2015

Elasticsearch Class Diagram for Shard Routing and Preference

RotationShardShuffler is initialized with a random seed. Each time a new shuffle is requested, the seed is incremented by one, so successive requests rotate through the shard copies in round-robin order.
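The rotation idea can be sketched in a few lines. This is a toy model, not the Elasticsearch source: the class and method names below simply mirror the description above, and the fixed `seed=0` is an assumption made only so the example is reproducible.

```python
import itertools
import random
import threading


class RotationShardShuffler:
    """Toy model of a seed-based rotating shuffler (not the Elasticsearch code)."""

    def __init__(self, seed=None):
        # Start from a random seed unless one is supplied (handy for tests).
        start = random.randrange(1 << 31) if seed is None else seed
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_seed(self):
        # Return the current seed, then advance it by one.
        with self._lock:
            return next(self._counter)

    def shuffle(self, shards, seed=None):
        # Rotate the list by (seed mod length); relative order is preserved.
        if not shards:
            return []
        if seed is None:
            seed = self.next_seed()
        offset = seed % len(shards)
        return shards[offset:] + shards[:offset]


# Hypothetical shard copies, used only for illustration.
shuffler = RotationShardShuffler(seed=0)
copies = ["primary", "replica-1", "replica-2"]
first = shuffler.shuffle(copies)
second = shuffler.shuffle(copies)
third = shuffler.shuffle(copies)
fourth = shuffler.shuffle(copies)
```

Because the list is rotated rather than randomly permuted, each copy takes the first position once every `len(copies)` calls, which is what spreads the load evenly.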

Friday, May 8, 2015

Elasticsearch class diagram about Searching

Tuesday, May 5, 2015

Interview with LZ4 (Extremely Fast Compression algorithm)

Why is LZ4 so fast?

Fast scan strategy

xxHash (Extremely fast non-cryptographic hash algorithm)

Multi-threading
Reduced memory usage, so the hash table fits into the Intel x86 L1 cache

#define LZ4_MEMORY_USAGE 14 (the default, i.e. 2^14 bytes = 16 KB; see lz4.h)
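The arithmetic behind that setting is simple: the value is an exponent, and the table size is 2^N bytes. A small sketch follows; the 32 KB L1 data cache figure is an assumption about a typical Intel x86 core, not something taken from lz4.h.

```python
LZ4_MEMORY_USAGE = 14  # default value quoted above from lz4.h


def hash_table_bytes(memory_usage):
    # lz4.h documents the formula as N -> 2^N bytes.
    return 1 << memory_usage


table_bytes = hash_table_bytes(LZ4_MEMORY_USAGE)
table_kib = table_bytes // 1024

# Assumption: 32 KiB of L1 data cache per core, typical but not universal.
L1_DATA_CACHE_KIB = 32
fits_in_l1 = table_kib <= L1_DATA_CACHE_KIB
```

Under that assumption the default table takes half of the L1 data cache, which leaves room for the input and output buffers being touched during compression.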

Reference:

Official site:

LZ4 - Extremely fast compression
LZ4 Java

JNI (fastest)
Pure Java
Pure Java using the sun.misc.Unsafe API

Monday, May 4, 2015

Elasticsearch and Lucene class diagram about Indexing

Thursday, April 23, 2015

Performance tuning for Elasticsearch

Some important system and configuration settings for tuning Elasticsearch performance.

Linux
1. max_file_descriptors: 65536
2. vm.max_map_count: 262144 (per process)
elasticsearch.yml
1. bootstrap.mlockall: true
2. discovery.zen.minimum_master_nodes
3. gateway.recover_after_nodes
4. gateway.expected_nodes
5. gateway.recover_after_time
6. index.number_of_shards
7. index.number_of_replicas
8. index.refresh_interval
9. indices.fielddata.cache.size: 50%
10. indices.breaker.fielddata.limit: 50%
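The list above leaves discovery.zen.minimum_master_nodes without a value because it depends on the cluster. The usual recommendation is a quorum of the master-eligible nodes, (N / 2) + 1. A minimal sketch of that calculation, where the helper name is hypothetical:

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: (N / 2) + 1, using integer division."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


# Hypothetical cluster sizes, for illustration only.
quorums = {n: minimum_master_nodes(n) for n in (1, 2, 3, 5)}
```

With two master-eligible nodes the quorum is also two, so losing either node blocks master election; this is why three is the usual minimum for a fault-tolerant cluster.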
Create index
1. index.number_of_shards
2. index.number_of_replicas
3. index.refresh_interval: 30s
4. #index.merge.policy.type: tiered
5. #index.translog.flush_threshold_size
6. #index.search.slowlog.threshold.query
7. #index.search.slowlog.threshold.fetch
8. #index.routing.allocation.include.box_type
9. #indices.memory.index_buffer_size
10. #indices.memory.min_index_buffer_size
11. #indices.memory.min_shard_index_buffer_size
12. #indices.ttl.interval
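As a rough illustration, the three uncommented settings above could be assembled into a create-index request body like this. The shard and replica counts are placeholder values, and the helper name is hypothetical; only the setting names and the 30s refresh interval come from the list.

```python
import json


def create_index_body(shards, replicas, refresh_interval="30s"):
    # Build the JSON body for a create-index request.
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
                "refresh_interval": refresh_interval,
            }
        }
    }


# Placeholder values: 5 shards and 1 replica are illustrative only.
body = create_index_body(shards=5, replicas=1)
payload = json.dumps(body, indent=2, sort_keys=True)
```

The resulting `payload` string is what would be sent as the body of the PUT request that creates the index.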
Search time tips
1. _optimize?max_num_segments=1 (fewer segments, more efficient searches)
2. Index per Time Frame
3. Faking Index per User with Aliases
4. shard_size & size, by default, shard_size = size * shards
5. Index Warmer (at the cost of refresh time)
6. collect_mode: breadth_first
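Tips 4 and 6 both apply to the terms aggregation. A minimal sketch of a request body that sets size, shard_size, and collect_mode explicitly is shown below. The field name `actors`, the aggregation names, and the numbers are all hypothetical.

```python
import json


def terms_agg(field, size, shard_size, collect_mode=None, sub_aggs=None):
    # shard_size is how many buckets each shard returns; it should not be below size.
    if shard_size < size:
        raise ValueError("shard_size must be >= size")
    terms = {"field": field, "size": size, "shard_size": shard_size}
    if collect_mode is not None:
        terms["collect_mode"] = collect_mode
    agg = {"terms": terms}
    if sub_aggs:
        agg["aggs"] = sub_aggs
    return agg


# Hypothetical example: top 10 actors, each with their top 5 co-stars.
request = {
    "size": 0,  # return aggregation results only, no hits
    "aggs": {
        "top_actors": terms_agg(
            "actors", size=10, shard_size=50,
            collect_mode="breadth_first",
            sub_aggs={"costars": terms_agg("actors", size=5, shard_size=25)},
        )
    },
}
payload = json.dumps(request, sort_keys=True)
```

breadth_first prunes the outer buckets to the top ones before computing the nested aggregation, which keeps memory use down when the field has many distinct values.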
Fielddata

Enable doc_values (they use mmapfs by default)
Fielddata Filtering
Eagerly Loading Fielddata (at the cost of merge time)
Global ordinals
Eager Global ordinals (at the cost of refresh time)
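For orientation, the fielddata options above map onto field mappings roughly as follows. This is a sketch using Elasticsearch 1.x-era mapping syntax as I understand it; the field names and threshold values are hypothetical, so check the option names against the documentation for the version you run.

```python
import json


def not_analyzed_string(**extra):
    # Base mapping for an exact-value string field (1.x-style syntax).
    field = {"type": "string", "index": "not_analyzed"}
    field.update(extra)
    return field


# Hypothetical field names; each one illustrates a single option from the list.
mapping = {
    "properties": {
        # doc_values: column data stored on disk instead of on the heap
        "tag": not_analyzed_string(doc_values=True),
        # fielddata filtering: load only terms within a frequency range
        "body": {
            "type": "string",
            "fielddata": {
                "filter": {
                    "frequency": {"min": 0.001, "max": 0.1, "min_segment_size": 500}
                }
            },
        },
        # eager loading: populate fielddata before a segment becomes searchable
        "author": not_analyzed_string(fielddata={"loading": "eager"}),
        # eager global ordinals: rebuild global ordinals at refresh time
        "category": not_analyzed_string(
            fielddata={"loading": "eager_global_ordinals"}
        ),
    }
}
payload = json.dumps(mapping, indent=2, sort_keys=True)
```

Each eager option moves work from the first query that needs the data to refresh or merge time, which is the trade-off noted in the list.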

Search for the terms above for more information.
