This post has been imported from the old blog and has not yet been converted to the new syntax yet.
Over the years,
plenty has been written about string performance, lots of comparisons between String.Concat and StringBuilder. Today I decided to do some of my own research into the subject and
contribute to the knowledge already out there. More specifically, I'll be taking a look at the
memory usage for various concatenation methods and compiler optimizations used to generate the IL.
The test scenario I defined consists out of several methods, each returning the same string. The string I created is supposed to resemble a
real-life scenario. I identified five different ways of concatenating strings for my test. I will be taking a look at the numbers when calling each method once and inside a very small loop of 50 calls, which is another real-life number in my case.
Single line concatenation.The
easiest way of concatenating strings together, by simply putting a plus sign between them.
[csharp]
public string GetPlussedString()
{
string myString = "SELECT column1,"
+ " column2,"
+ " column3,"
+ " column4,"
+ " column5,"
+ " column6,"
+ " FROM table1 t1"
+ " JOIN table2 t2"
+ " ON t1.column1 = t2.column1";
return myString;
}
[/csharp]
Although it seems like we are creating 9 string instances, the compiler
optimizes this into the following IL:
[code]
.method public hidebysig instance string GetPlussedString() cil managed
{
.maxstack 1
.locals init (
[0] string myString)
L_0000: ldstr "SELECT column1, column2, column3, column4, column5, column6, FROM table1 t1 JOIN table2 t2 ON t1.column1 = t2.column1"
L_0005: stloc.0
L_0006: ldloc.0
L_0007: ret
}
[/code]
In reality, we created
one string instance and returned it, which is about the
most efficient way we can achieve.
When profiling the test application, I couldn't even find a call to GetPlussedString in the profiler, which makes me believe the runtime even optimized this.
In total, our application created 113 string instances and
barely used any memory.
Running this in the loop gives the following result:
Important to note is the fact that we still have 113 string instances. This is because .NET used
String Interning on my string and simply returns a reference to that instance over and over.
Variable concatenation.Another frequently used way of concatenating strings is by appending a variable with
the += operator for each line.
[csharp]
public string GetPlussedVarString()
{
string myString = "SELECT column1,";
myString += " column2,";
myString += " column3,";
myString += " column4,";
myString += " column5,";
myString += " column6,";
myString += " FROM table1 t1";
myString += " JOIN table2 t2";
myString += " ON t1.column1 = t2.column1";
return myString;
}
[/csharp]
Things become
messy here, take a look at the generated IL for this code:
[code]
.method public hidebysig instance string GetPlussedVarString() cil managed
{
.maxstack 2
.locals init (
[0] string myString)
L_0000: ldstr "SELECT column1,"
L_0005: stloc.0
L_0006: ldloc.0
L_0007: ldstr " column2,"
L_000c: call string [mscorlib]System.String::Concat(string, string)
L_0011: stloc.0
L_0012: ldloc.0
L_0013: ldstr " column3,"
L_0018: call string [mscorlib]System.String::Concat(string, string)
L_001d: stloc.0
L_001e: ldloc.0
L_001f: ldstr " column4,"
L_0024: call string [mscorlib]System.String::Concat(string, string)
L_0029: stloc.0
L_002a: ldloc.0
L_002b: ldstr " column5,"
L_0030: call string [mscorlib]System.String::Concat(string, string)
L_0035: stloc.0
L_0036: ldloc.0
L_0037: ldstr " column6,"
L_003c: call string [mscorlib]System.String::Concat(string, string)
L_0041: stloc.0
L_0042: ldloc.0
L_0043: ldstr " FROM table1 t1"
L_0048: call string [mscorlib]System.String::Concat(string, string)
L_004d: stloc.0
L_004e: ldloc.0
L_004f: ldstr " JOIN table2 t2"
L_0054: call string [mscorlib]System.String::Concat(string, string)
L_0059: stloc.0
L_005a: ldloc.0
L_005b: ldstr " ON t1.column1 = t2.column1"
L_0060: call string [mscorlib]System.String::Concat(string, string)
L_0065: stloc.0
L_0066: ldloc.0
L_0067: ret
}
[/code]
Every += operation translates into a call to String.Concat()
creating a new temporary string.
Looking at the profiler we end up with 129 string instances, which is 16 more than the our comparison base. These strings can be split up into 8 coming from the 8 calls to String.Concat and from having 8 more strings declared in code.
Calling this 50 times quickly shows the downside of this method. We end up with
408 additional string instances, 400 coming from 50*8 calls to String.Concat and our original 8 extra strings, which got Interned by the way.
Note the explosion in memory size used for this simple example, 73kB vs 16kB.
I
strongly discourage the use of the += operator for string concatenation in these scenarios.
String.Concat(array) concatenation.A less used way of concatenating strings is by using one of the String.Concat overloads which accept a
string array.
[csharp]
public string GetConcatedString()
{
string[] pieces = new string[] {
"SELECT column1,",
" column2,",
" column3,",
" column4,",
" column5,",
" column6,",
" FROM table1 t1",
" JOIN table2 t2",
" ON t1.column1 = t2.column1"
};
return String.Concat(pieces);
}
[/csharp]
This is a
more efficient variation of String.Concat by using it explicitly with a string array, as can be seen in the following IL:
[code]
.method public hidebysig instance string GetConcatedString() cil managed
{
.maxstack 3
.locals init (
[0] string[] pieces,
[1] string[] CS$0$0000)
L_0000: ldc.i4.s 9
L_0002: newarr string
L_0007: stloc.1
L_0008: ldloc.1
L_0009: ldc.i4.0
L_000a: ldstr "SELECT column1,"
L_000f: stelem.ref
L_0010: ldloc.1
L_0011: ldc.i4.1
L_0012: ldstr " column2,"
L_0017: stelem.ref
L_0018: ldloc.1
L_0019: ldc.i4.2
L_001a: ldstr " column3,"
L_001f: stelem.ref
L_0020: ldloc.1
L_0021: ldc.i4.3
L_0022: ldstr " column4,"
L_0027: stelem.ref
L_0028: ldloc.1
L_0029: ldc.i4.4
L_002a: ldstr " column5,"
L_002f: stelem.ref
L_0030: ldloc.1
L_0031: ldc.i4.5
L_0032: ldstr " column6,"
L_0037: stelem.ref
L_0038: ldloc.1
L_0039: ldc.i4.6
L_003a: ldstr " FROM table1 t1"
L_003f: stelem.ref
L_0040: ldloc.1
L_0041: ldc.i4.7
L_0042: ldstr " JOIN table2 t2"
L_0047: stelem.ref
L_0048: ldloc.1
L_0049: ldc.i4.8
L_004a: ldstr " ON t1.column1 = t2.column1"
L_004f: stelem.ref
L_0050: ldloc.1
L_0051: stloc.0
L_0052: ldloc.0
L_0053: call string [mscorlib]System.String::Concat(string[])
L_0058: ret
}
[/code]
This method uses 9 more string instances than our base, which is already better than using += resulting in 16.
These 9 come from the 8 additional strings defined in the code and 1 coming from the single call to String.Concat().
Calling this 50 times will result in 58 additional strings compared to our base, coming from 50 calls to String.Concat() and our 8 additional strings in code (again,
Interned @ work).
Internally the array overload for String.Concat() will first
count the needed length for the result and then create a temporary string variable of the correct length, where as the previous method could not use this optimization since it were 8 separate calls.
StringBuilder.Append() concatenation.Method number four uses a StringBuilder to create a string, as demonstrated in
plenty of tutorials.
[csharp]
public string GetBuildString()
{
StringBuilder builder = new StringBuilder();
builder.Append("SELECT column1,");
builder.Append(" column2,");
builder.Append(" column3,");
builder.Append(" column4,");
builder.Append(" column5,");
builder.Append(" column6,");
builder.Append(" FROM table1 t1");
builder.Append(" JOIN table2 t2");
builder.Append(" ON t1.column1 = t2.column1");
return builder.ToString();
}
[/csharp]
The not so interesting IL for this method simply shows the creation of the object and several method calls.
[code]
.method public hidebysig instance string GetBuildString() cil managed
{
.maxstack 2
.locals init (
[0] class [mscorlib]System.Text.StringBuilder builder)
L_0000: newobj instance void [mscorlib]System.Text.StringBuilder::.ctor()
L_0005: stloc.0
L_0006: ldloc.0
L_0007: ldstr "SELECT column1,"
L_000c: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_0011: pop
L_0012: ldloc.0
L_0013: ldstr " column2,"
L_0018: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_001d: pop
L_001e: ldloc.0
L_001f: ldstr " column3,"
L_0024: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_0029: pop
L_002a: ldloc.0
L_002b: ldstr " column4,"
L_0030: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_0035: pop
L_0036: ldloc.0
L_0037: ldstr " column5,"
L_003c: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_0041: pop
L_0042: ldloc.0
L_0043: ldstr " column6,"
L_0048: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_004d: pop
L_004e: ldloc.0
L_004f: ldstr " FROM table1 t1"
L_0054: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_0059: pop
L_005a: ldloc.0
L_005b: ldstr " JOIN table2 t2"
L_0060: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_0065: pop
L_0066: ldloc.0
L_0067: ldstr " ON t1.column1 = t2.column1"
L_006c: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_0071: pop
L_0072: ldloc.0
L_0073: callvirt instance string [mscorlib]System.Object::ToString()
L_0078: ret
}
[/code]
Note I am using a default StringBuilder, which
defaults to a size of 16 characters.
From the profiler we can see this approach created 13 more string instances than our base, from which 8 are again the extra strings in code, one is coming from the final ToString() call and 4 are coming from the internals of the StringBuilder, since it had to
increase its capacity 4 times (At 16 characters, 32, 64 and 128).
Interesting to note here is the fact that the usage of a StringBuilder already
uses less memory than concatenating with += when using 9 strings. Choosing a
good estimate of the target size upon constructing the StringBuilder would have made the difference even bigger.
This becomes even more obvious when comparing the results from the loop:
Using 258 more than our base, 8 Interned strings, 50 ToString() calls and 200 increases inside StringBuilder, we can clearly see the StringBuilder being more efficient than += even taking StringBuilder object creation into account. It is however
not as efficient than the String.Concat(array) method.
StringBuilder.AppendFormat() concatenation.And lastly, for my own personal curiosity, I wanted to see the effect of using Append
Format() versus Append().
[csharp]
public string GetBuildFormatString()
{
// AppendFormat will first parse your string to find {x} instances
// and then fill them in. Afterwards it calls .Append
// Better to simply call .Append several times.
StringBuilder builder = new StringBuilder();
builder.AppendFormat("SELECT {0},", "column1");
builder.AppendFormat(" {0},", "column2");
builder.AppendFormat(" {0},", "column3");
builder.AppendFormat(" {0},", "column4");
builder.AppendFormat(" {0},", "column5");
builder.AppendFormat(" {0},", "column6");
builder.AppendFormat(" FROM {0} t1", "table1");
builder.AppendFormat(" JOIN {0} t2", "table2");
builder.Append(" ON t1.column1 = t2.column1");
return builder.ToString();
}
[/csharp]
This method is the
most inefficient method of the pack. First a look at the IL:
[code]
.method public hidebysig instance string GetBuildFormatString() cil managed
{
.maxstack 3
.locals init (
[0] class [mscorlib]System.Text.StringBuilder builder)
L_0000: newobj instance void [mscorlib]System.Text.StringBuilder::.ctor()
L_0005: stloc.0
L_0006: ldloc.0
L_0007: ldstr "SELECT {0},"
L_000c: ldstr "column1"
L_0011: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::AppendFormat(string, object)
L_0016: pop
L_0017: ldloc.0
L_0018: ldstr " {0},"
L_001d: ldstr "column2"
L_0022: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::AppendFormat(string, object)
L_0027: pop
L_0028: ldloc.0
L_0029: ldstr " {0},"
L_002e: ldstr "column3"
L_0033: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::AppendFormat(string, object)
L_0038: pop
L_0039: ldloc.0
L_003a: ldstr " {0},"
L_003f: ldstr "column4"
L_0044: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::AppendFormat(string, object)
L_0049: pop
L_004a: ldloc.0
L_004b: ldstr " {0},"
L_0050: ldstr "column5"
L_0055: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::AppendFormat(string, object)
L_005a: pop
L_005b: ldloc.0
L_005c: ldstr " {0},"
L_0061: ldstr "column6"
L_0066: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::AppendFormat(string, object)
L_006b: pop
L_006c: ldloc.0
L_006d: ldstr " FROM {0} t1"
L_0072: ldstr "table1"
L_0077: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::AppendFormat(string, object)
L_007c: pop
L_007d: ldloc.0
L_007e: ldstr " JOIN {0} t2"
L_0083: ldstr "table2"
L_0088: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::AppendFormat(string, object)
L_008d: pop
L_008e: ldloc.0
L_008f: ldstr " ON t1.column1 = t2.column1"
L_0094: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_0099: pop
L_009a: ldloc.0
L_009b: callvirt instance string [mscorlib]System.Object::ToString()
L_00a0: ret
}
[/code]
Inside the AppendFormat() calls is where the
ugly stuff happens. The format is converted to a character array, after which it is scanned for occurrences of {x} and in the end it is being
passed to StringBuilder.Append() anyway.
I didn't spent much time trying to extract a conclusion out of the results of this test, since it's bound to
perform worse than the previous method anyway, since it's the same logic with extra operations.
Interesting to note are the results inside the loop, demonstrating it uses even
more memory than += concatenating.
ConclusionThe conclusions I made for myself and will be following in my
future development are as follows:
- If you can avoid concatenating, do it!
This is a no brainer, if you don't have to concatenate but want your source code to look nice, use the first method. It will get optimized as if it was a single string.
- Don't use += concatenating ever.
Too much changes are taking place behind the scene, which aren't obvious from my code in the first place. I advise to rather use String.Concat() explicitly with any overload (2 strings, 3 strings, string array). This will clearly show what your code does without any surprises, while allowing yourself to keep a check on the efficiency.
- Try to estimate the target size of a StringBuilder.
The more accurate you can estimate the needed size, the less temporary strings the StringBuilder will have to create to increase its internal buffer.
- Do not use any Format() methods when performance is an issue.
Too much overhead is involved in parsing the format, when you could construct an array out of pieces when all you are using are {x} replaces. Format() is good for readability, but one of the things to go when you are squeezing all possible performance out of your application.
One conclusion I am not 100 percent sure of is the difference between using String.Concat(array) and using a StringBuilder. It seems using an array incurs
less memory overhead than using a StringBuilder, unless the cost of array creation is big, which I couldn't determine in my tests. I'd be more than interested to know if someone could provide more detail on this.
The guidelines in
Jouni Heikniemi's article seem to be accurate when comparing between String.Concat(string, string) and StringBuilder, and will be the ones I'll be following, until I get a clear picture of the String.Concat(array) implementation.
Once again, I've uploaded the solution I used as a
.zip file, with an
additional HTML page displaying the results below each other.