LuaJIT String Interning

Why string interning?

String interning is a common optimization in many programming languages.

In LuaJIT it has the following advantages:

  • strings can be compared directly by pointer
  • strings are efficient table keys, because the hash is computed once at allocation time and the pointer itself identifies the string
  • duplicated strings are stored only once, which saves memory

Drawbacks

Interning also adds extra, time-consuming work:

  • hashing every new string
  • on a collision, running memcmp against the strings in the same hash slot
  • rehashing when collisions are frequent and/or the load factor is low
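
To make the advantages and costs above concrete, here is a minimal sketch of an intern table in C. It is not LuaJIT's implementation: the names IStr, toy_hash, intern and NSLOTS are hypothetical, the hash is FNV-1a chosen only for brevity, and the table has a fixed size, so the rehash step is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy interned string: the hash is computed once, at allocation time. */
typedef struct IStr {
    struct IStr *next;   /* collision chain within one hash slot */
    uint32_t hash;
    size_t len;
    char data[];         /* string bytes, NUL-terminated for convenience */
} IStr;

#define NSLOTS 64        /* fixed size; a real table would grow and rehash */
static IStr *slots[NSLOTS];

/* FNV-1a over the whole string (an assumption, not LuaJIT's hash). */
static uint32_t toy_hash(const char *s, size_t len) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Return the unique IStr for these bytes, creating it if necessary. */
static IStr *intern(const char *s, size_t len) {
    assert(s != NULL);
    uint32_t h = toy_hash(s, len);               /* cost: hashing */
    IStr **slot = &slots[h % NSLOTS];
    for (IStr *p = *slot; p != NULL; p = p->next) {
        /* cost: a candidate with the same hash and length needs a memcmp */
        if (p->hash == h && p->len == len && memcmp(p->data, s, len) == 0)
            return p;                            /* already interned */
    }
    IStr *n = malloc(sizeof(IStr) + len + 1);
    if (n == NULL) return NULL;
    n->hash = h;
    n->len = len;
    memcpy(n->data, s, len);
    n->data[len] = '\0';
    n->next = *slot;                             /* push onto the chain */
    *slot = n;
    return n;
}
```

Because intern returns the same object for equal bytes, equality between two interned strings is a plain pointer comparison, and the stored hash can be reused for every table lookup. The price is paid inside intern, on every string creation.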

Analyze the overhead of string interning

How expensive is memcmp?

On a single CPU (Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz), with 1 MB strings that differ only in the last character, memcmp could be run at most 90 times per second.
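
A measurement of this kind can be sketched in C as below. This is not the benchmark used for the number above; memcmp_rate is a hypothetical helper, it measures CPU time with clock(), and the result depends heavily on the hardware, the C library and the compiler flags.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Compare two len-byte buffers that differ only in the last byte,
 * `iters` times, and return comparisons per second of CPU time.
 * len must be at least 2. Returns a negative value on invalid input
 * or allocation failure, and 0.0 if the elapsed time was below the
 * clock resolution. */
static double memcmp_rate(size_t len, int iters) {
    if (len < 2 || iters <= 0) return -1.0;
    char *a = malloc(len), *b = malloc(len);
    if (a == NULL || b == NULL) { free(a); free(b); return -1.0; }
    memset(a, 'x', len);
    memset(b, 'x', len);
    b[len - 1] = 'y';               /* worst case: mismatch at the very end */

    volatile int sink = 0;          /* keeps the calls from being optimized away */
    clock_t start = clock();
    for (int i = 0; i < iters; i++) {
        a[0] = b[0] = (char)i;      /* modify both buffers identically each round */
        sink += memcmp(a, b, len) != 0;
    }
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    free(a);
    free(b);
    if (sink != iters) return -1.0; /* every comparison must report a difference */
    return secs > 0.0 ? iters / secs : 0.0;
}
```

Calling memcmp_rate(1 << 20, 1000) from a small main and printing the result gives the rate for 1 MB strings on the machine at hand.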

How expensive is hash collision?

We construct a workload that simulates a cache use case (only setting key-value pairs, never getting them).

Each HTTP request inserts one key-value pair into a Lua table. The key is an integer and the value is a 1 MB string. Note that although the key is not a string, LuaJIT still interns every string, including the value.

The HTTP client chooses to send either unique random strings or similar strings.

We use a SystemTap script to compare the time spent in lj_str_new in the two cases.

Could we do dynamic string interning?

We could intern a string only when it is:

  • used as a table key
  • compared with another string

Please check my blog post for details:
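
One way to picture this idea is the C sketch below. It is only an illustration under our own assumptions, not LuaJIT code and not necessarily the design in the blog post: LStr, lstr_new, lstr_intern, lstr_eq and intern_calls are hypothetical names, the hash is FNV-1a, and the table has a fixed size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct LStr {
    struct LStr *canon;  /* NULL until interned, then the canonical object */
    struct LStr *next;   /* collision chain, used only by canonical strings */
    uint32_t hash;       /* valid only for canonical strings */
    size_t len;
    char data[];
} LStr;

#define LAZY_NSLOTS 64
static LStr *lazy_slots[LAZY_NSLOTS];
static unsigned long intern_calls;  /* counts real interning work, for illustration */

/* Creating a string does no hashing and no table lookup. */
static LStr *lstr_new(const char *s, size_t len) {
    assert(s != NULL);
    LStr *n = malloc(sizeof(LStr) + len + 1);
    if (n == NULL) return NULL;
    n->canon = NULL; n->next = NULL; n->hash = 0; n->len = len;
    memcpy(n->data, s, len);
    n->data[len] = '\0';
    return n;
}

/* Intern on demand: called before use as a table key or in a comparison. */
static LStr *lstr_intern(LStr *s) {
    if (s->canon != NULL) return s->canon;      /* already resolved */
    intern_calls++;
    uint32_t h = 2166136261u;                   /* FNV-1a, an assumption */
    for (size_t i = 0; i < s->len; i++) {
        h ^= (unsigned char)s->data[i];
        h *= 16777619u;
    }
    LStr **slot = &lazy_slots[h % LAZY_NSLOTS];
    for (LStr *p = *slot; p != NULL; p = p->next) {
        if (p->hash == h && p->len == s->len &&
            memcmp(p->data, s->data, s->len) == 0) {
            s->canon = p;   /* duplicate; a real runtime could free its bytes */
            return p;
        }
    }
    s->hash = h;            /* first of its kind: s becomes canonical */
    s->canon = s;
    s->next = *slot;
    *slot = s;
    return s;
}

/* Equality interns both sides, then compares pointers. */
static bool lstr_eq(LStr *a, LStr *b) {
    return lstr_intern(a) == lstr_intern(b);
}
```

In this sketch a 1 MB value that is only stored in a table and never compared or used as a key is never hashed or memcmp'ed, which is the case the cache workload exercises.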

luajit.io/post/2022/luajit-string-interning