Ruby symbols and their garbage collection

2016, Jun 30    

Symbols are one of the most mysterious things for Ruby beginners. When I was starting with Ruby I also sometimes wondered what was going on and what these symbols were at all. I’m not a programming veteran, but today I know a little bit more and want to share some thoughts with you. This time it’s about Ruby symbols, which I called “things” above on purpose. We’ll start with the basics: what symbols are and when to use them. Then we’ll dive deeper and take a look at how they are managed by the Ruby GC.

Symbols vs. Strings

Bob: What is a Symbol? Is it a String?
Alice: No.
Bob: An Integer?
Alice: Nope.
Bob: So what is it?
Alice: It’s a… an object.

What a surprise! Yes, a Symbol is an Object, just like a String or an Integer. It is easy to point out the differences between strings and integers, but a Symbol sits somewhere in between, and that’s why it’s hard to define. We can say that symbols are a bit like strings and integers at the same time: they look like text, but Ruby does not create a new Object every time we refer to a Symbol. Instead, Ruby maps each Symbol to an Integer. Like this:

2.2.3 :028 > :ozim.object_id
 => 1112348
2.2.3 :029 > :ozim.object_id
 => 1112348
2.2.3 :030 > "ozim".to_sym.object_id
 => 1112348
2.2.3 :031 > "ozim".to_sym
 => :ozim
2.2.3 :032 > ozim = _
 => :ozim
2.2.3 :033 > ozim.object_id
 => 1112348
2.2.3 :034 > ozim.class
 => Symbol

So if you use the same Symbol, it always has the same object_id. Each Symbol maps to one Integer and it’s immutable. That’s why symbols are mostly not garbage collected (more info later). Imagine that a Symbol was removed from memory and recreated the next time you wanted to use it. It would then have a different object_id, which is not the behavior you would expect from Ruby. This fixed identity is what makes symbols so strange for newbies, and it is one of the most significant differences between a Symbol and a String.

2.2.3 :042 > "ozim".object_id
 => 16908140
2.2.3 :043 > "ozim".object_id
 => 16873480
2.2.3 :044 > "ozim".object_id
 => 16861900

You see? With symbols we got the same id every time. Here we have three strings, each with a different object_id.
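If you prefer a script over an irb session, the same comparison can be sketched like this. It is only an illustration, and it assumes the file does not use the frozen_string_literal magic comment, which would change how string literals behave:

```ruby
# Two references to the same symbol name point at the very same object.
sym_a = :ozim
sym_b = "ozim".to_sym
same_symbol = sym_a.equal?(sym_b)      # equal? compares object identity

# Two string literals with the same content are equal, but each literal
# builds its own object (assuming no `# frozen_string_literal: true`).
str_a = "ozim"
str_b = "ozim"
same_content = (str_a == str_b)
same_string  = str_a.equal?(str_b)

puts "same symbol object: #{same_symbol}"    # true
puts "same string content: #{same_content}"  # true
puts "same string object: #{same_string}"    # false
```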

Another huge difference is that symbols lack many of the methods you know from strings. Try to invoke to_i or + on a Symbol and you’ll get a NoMethodError.
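Here is a small sketch of what that looks like in practice. The raised_error helper is my own name, made up for this example, and is not part of Ruby:

```ruby
# Hypothetical helper: runs the block and returns the name of the error
# class it raised, or nil when nothing was raised.
def raised_error
  yield
  nil
rescue NoMethodError => e
  e.class.name
end

puts raised_error { :ozim.to_i }      # NoMethodError
puts raised_error { :ozim + :eu }     # NoMethodError

# Some String-like methods do exist on Symbol.
puts :ozim.length                     # 4
puts :ozim.upcase.inspect             # :OZIM

# Converting to a String first gives you the full String API.
puts :ozim.to_s + "eu"                # ozimeu
```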

Still not enough? Try to hardcode a Symbol that starts with a digit, which is perfectly fine for a String:

2.2.3 :077 > :1symbol
SyntaxError: (irb):77: syntax error, unexpected tINTEGER, expecting tSTRING_CONTENT or tSTRING_DBEG or tSTRING_DVAR or tSTRING_END
:1symbol
  ^
  from /home/ozim/.rvm/rubies/ruby-2.2.3/bin/irb:11:in `<main>'

Of course you can do that using quotes, but in my opinion wrapping symbols in quotes is not the best idea.
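For completeness, this is roughly what the quoted form looks like. It is just a sketch to show that it works, not a recommendation:

```ruby
# A quoted symbol literal may start with a digit.
quoted = :"1symbol"

# Converting the equivalent String gives the very same Symbol.
from_string = "1symbol".to_sym

puts quoted.inspect               # :"1symbol"
puts quoted.equal?(from_string)   # true
```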

When to use symbols

Hash keys are an excellent example of using symbols. Just take a look: hash keys are things that should remain unchanged in your app forever. Symbols are exactly what you need in this case: after you use a Symbol for the first time, using the same Symbol again does not need any more memory. Of course, premature optimization is usually a waste of time, but using symbols in certain situations is definitely a good choice.
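A minimal sketch of symbol keys in action. The settings hash and its values are invented for this example:

```ruby
# Hypothetical settings hash with symbol keys.
settings = { host: "localhost", port: 3000 }

puts settings[:port]                    # 3000
puts settings["port"].inspect           # nil, "port" and :port are different keys

# The key is the same object as any other :port in the program.
puts settings.keys.last.equal?(:port)   # true
```

The second lookup is worth remembering: a String key and a Symbol key never match each other in a plain Hash, so pick one style and stick to it.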

Symbols are obviously not a silver bullet and sometimes it’s better to use strings, especially when you want to use the String class’s instance methods. But that’s not the only disadvantage of symbols. If you have, e.g., a Rails app running on a Ruby version below 2.2, there’s one more important thing to notice.

The DoS problem using symbols in Ruby < 2.2

I assume you know what a Denial of Service attack is. Imagine that you have a method in your controller which takes the user params and converts all of their keys to symbols:

# code in a sample rails controller
# every param key the user sends becomes a new Symbol
params.keys.map(&:to_sym)

In Ruby versions below 2.2, none of the symbols created above are garbage collected. What does that mean? It means that if you allow users to flood your app with params, sooner or later you will run out of memory.

As you know, a live web app is a long-running process, so the created symbols will stay in memory for as long as it runs: a day, a month, maybe a few years. If someone doesn’t like you and purposely submits a million params to your app, it can be dead very fast. Read on if you’re still curious.
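One common way to stay safe on an old Ruby is to convert only the keys you already expect. The sketch below uses a plain Hash as a stand-in for the Rails params, and both ALLOWED_KEYS and symbolize_known_keys are names I made up for this example:

```ruby
# Hypothetical whitelist of param names the app actually understands.
ALLOWED_KEYS = %w[name email].freeze

# Returns a new Hash with symbol keys, skipping anything not whitelisted,
# so the number of symbols created from user input stays bounded.
def symbolize_known_keys(raw_params)
  raw_params.each_with_object({}) do |(key, value), result|
    result[key.to_sym] = value if ALLOWED_KEYS.include?(key)
  end
end

user_input = { "name" => "ozim", "evil_key_123" => "x" }
puts symbolize_known_keys(user_input).inspect
```

No matter how many different keys an attacker sends, only :name and :email can ever be created this way.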

What is this garbage collection?

Garbage Collection is a way to manage memory automatically, behind the scenes. You probably use it all the time, even if you don’t know it. It allows the programmer to forget about memory management, at least at the beginning of a project. In short, the mechanism called the Garbage Collector removes objects from memory when it decides they are no longer in use. It’s quite a complicated part of programming, so it’s better for you if you never have to dig into the dark depths of garbage collection.
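If you want to see the Garbage Collector doing its job, here is a rough sketch. It assumes the standard Ruby interpreter (MRI), version 2.2 or later, where GC.stat exposes a total_freed_objects counter. The allocate_garbage method is just a helper invented for this example, and the exact numbers will differ on every run:

```ruby
# Allocate many short-lived strings; nothing references them afterwards.
def allocate_garbage(count)
  count.times { |i| "temporary string #{i}" }
  nil
end

freed_before = GC.stat(:total_freed_objects)
allocate_garbage(100_000)
GC.start                                  # force a full collection
freed_after = GC.stat(:total_freed_objects)

puts "objects freed in between: #{freed_after - freed_before}"
```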

GC of symbols in Ruby >= 2.2

Hardcoded symbols keep their mapped Integer ID for the whole life of the process, so they’re never purged by the Garbage Collector. By hardcoded symbols I mean the ones that appear in the source code: symbol literals, as well as the names of variables, methods and constants. Example:

2.2.3 :001 > Symbol.all_symbols.size
 => 3382
2.2.3 :002 > :ozim
 => :ozim
2.2.3 :003 > Symbol.all_symbols.size
 => 3383
2.2.3 :004 > GC.start
 => nil
2.2.3 :005 > Symbol.all_symbols.size
 => 3383
2.2.3 :006 > :ozimeu
 => :ozimeu
2.2.3 :007 > Symbol.all_symbols.size
 => 3384
2.2.3 :008 > GC.start
 => nil
2.2.3 :009 > Symbol.all_symbols.size
 => 3384

Every time we run GC.start, Ruby flushes all the unnecessary things from memory. As you can see, the hardcoded :ozim and :ozimeu objects were not garbage collected: the number of symbols stays the same even after launching the Garbage Collector’s “cleaning” mechanism. Now take a look at dynamically created symbols:

2.2.3 :001 > Symbol.all_symbols.size
 => 3382
2.2.3 :002 > "ozim".to_sym
 => :ozim
2.2.3 :003 > Symbol.all_symbols.size
 => 3383
2.2.3 :004 > GC.start
 => nil
2.2.3 :005 > Symbol.all_symbols.size
 => 3382

We started with 3382 symbols, then we created one with to_sym and Symbol.all_symbols.size went up by 1. Next we ran the Garbage Collector and it removed the dynamically created Symbol from memory. That’s good news when it comes to the potential DoS attack problem, and I think it’s actually a nice improvement.
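The same experiment can be repeated on a bigger scale with a script. This is only a sketch, assuming MRI 2.2 or later. The create_dynamic_symbols helper is a made-up name, and a handful of symbols may survive because the collector is conservative, so don’t expect an exact zero:

```ruby
# Create `count` dynamic symbols inside a method so that no local variable
# keeps referencing them once the method returns.
def create_dynamic_symbols(count)
  count.times { |i| "dynamic_symbol_#{i}".to_sym }
  nil
end

before = Symbol.all_symbols.size
create_dynamic_symbols(10_000)
GC.start
after = Symbol.all_symbols.size

survivors = after - before
puts "symbols that survived GC: #{survivors} of 10000"
```

On a Ruby below 2.2 all ten thousand symbols would stay in memory for good.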

Summing up

Using a Symbol instead of a String is a good idea when you don’t plan to change the Object too often, preferably never :-) Symbols are immutable and can reduce memory usage. It’s “DoS-safe” to convert user input to symbols when you use Ruby 2.2 or later. The lack of some String instance methods on Symbol may be annoying, but quite often needing them means you want to change the Object, so you should consider using a String then.