Over the years, plenty has been written about string performance, including many comparisons between String.Concat and StringBuilder. Today I decided to do some research of my own and contribute to the knowledge already out there. More specifically, I'll be looking at the memory usage of various concatenation methods and at the compiler optimizations applied when generating the IL.

The test scenario I defined consists of several methods, each returning the same string, which is meant to resemble a real-life scenario. I identified five different ways of concatenating strings for my test. I will look at the numbers when calling each method once and when calling it inside a very small loop of 50 calls, which is another realistic number in my case.

Single line concatenation.

The easiest way of concatenating strings is simply putting a plus sign between them.

[csharp]
public string GetPlussedString()
{
    string myString = "SELECT column1,"
        + " column2,"
        + " column3,"
        + " column4,"
        + " column5,"
        + " column6,"
        + " FROM table1 t1"
        + " JOIN table2 t2"
        + " ON t1.column1 = t2.column1";
    return myString;
}
[/csharp]

Although it seems like we are creating 9 string instances, the compiler optimizes this into the following IL:

[code]
.method public hidebysig instance string GetPlussedString() cil managed
{
.maxstack 1
.locals init (
[0] string myString)
L_0000: ldstr "SELECT column1, column2, column3, column4, column5, column6, FROM table1 t1 JOIN table2 t2 ON t1.column1 = t2.column1"
L_0005: stloc.0
L_0006: ldloc.0
L_0007: ret
}
[/code]

In reality, we created one string instance and returned it, which is about the most efficient way we can achieve.

When profiling the test application, I couldn't even find a call to GetPlussedString in the profiler, which makes me believe the runtime optimized the call itself away as well.

GetPlussedString Single Call

In total, our application created 113 string instances and barely used any memory.

Running this in the loop gives the following result:

GetPlussedString Multiple Calls

Note that we still have 113 string instances. This is because .NET applies string interning to the literal and simply returns a reference to the same instance over and over.
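To make the interning behaviour a bit more tangible, here is a minimal sketch. The class and method names (InterningDemo, GetFolded, BuildAtRuntime, SameInstance) are made up for this illustration and are not part of my test solution.

[csharp]
using System;

public static class InterningDemo
{
    // Adjacent literals are folded into one constant by the compiler,
    // just like in GetPlussedString.
    public static string GetFolded()
    {
        return "SELECT column1," + " column2";
    }

    // The parameters are only known at run time, so this compiles to a
    // String.Concat(string, string) call instead of a folded literal.
    public static string BuildAtRuntime(string first, string second)
    {
        return first + second;
    }

    // True when both references point at the very same string object.
    public static bool SameInstance(string a, string b)
    {
        return object.ReferenceEquals(a, b);
    }
}
[/csharp]

SameInstance(GetFolded(), GetFolded()) returns true because both calls hand back the same literal. Comparing GetFolded() with BuildAtRuntime("SELECT column1,", " column2") should normally return false on the runtimes I know of, even though both strings have the same value.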

Variable concatenation.

Another frequently used way of concatenating strings is by appending a variable with the += operator for each line.

[csharp]
public string GetPlussedVarString()
{
    string myString = "SELECT column1,";
    myString += " column2,";
    myString += " column3,";
    myString += " column4,";
    myString += " column5,";
    myString += " column6,";
    myString += " FROM table1 t1";
    myString += " JOIN table2 t2";
    myString += " ON t1.column1 = t2.column1";
    return myString;
}
[/csharp]

Things become messy here. Take a look at the generated IL for this code:

[code]
.method public hidebysig instance string GetPlussedVarString() cil managed
{
.maxstack 2
.locals init (
[0] string myString)
L_0000: ldstr "SELECT column1,"
L_0005: stloc.0
L_0006: ldloc.0
L_0007: ldstr " column2,"
L_000c: call string [mscorlib]System.String::Concat(string, string)
L_0011: stloc.0
L_0012: ldloc.0
L_0013: ldstr " column3,"
L_0018: call string [mscorlib]System.String::Concat(string, string)
L_001d: stloc.0
L_001e: ldloc.0
L_001f: ldstr " column4,"
L_0024: call string [mscorlib]System.String::Concat(string, string)
L_0029: stloc.0
L_002a: ldloc.0
L_002b: ldstr " column5,"
L_0030: call string [mscorlib]System.String::Concat(string, string)
L_0035: stloc.0
L_0036: ldloc.0
L_0037: ldstr " column6,"
L_003c: call string [mscorlib]System.String::Concat(string, string)
L_0041: stloc.0
L_0042: ldloc.0
L_0043: ldstr " FROM table1 t1"
L_0048: call string [mscorlib]System.String::Concat(string, string)
L_004d: stloc.0
L_004e: ldloc.0
L_004f: ldstr " JOIN table2 t2"
L_0054: call string [mscorlib]System.String::Concat(string, string)
L_0059: stloc.0
L_005a: ldloc.0
L_005b: ldstr " ON t1.column1 = t2.column1"
L_0060: call string [mscorlib]System.String::Concat(string, string)
L_0065: stloc.0
L_0066: ldloc.0
L_0067: ret
}
[/code]

Every += operation translates into a call to String.Concat(), which creates a new temporary string.
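The cost of those temporaries can be estimated with a small helper. This is only a sketch: PlusEqualsDemo and CountTemporaries are hypothetical names, and the helper counts string objects and characters rather than the actual bytes the profiler reports.

[csharp]
using System;

public static class PlusEqualsDemo
{
    // Appends every piece with += and returns how many new string objects
    // that produced. charsAllocated receives the combined length of those
    // new strings, as a rough measure of how much gets copied.
    public static int CountTemporaries(string[] pieces, out int charsAllocated)
    {
        charsAllocated = 0;
        if (pieces == null || pieces.Length == 0)
        {
            return 0;
        }

        string result = pieces[0];
        int created = 0;
        for (int i = 1; i < pieces.Length; i++)
        {
            string before = result;
            result += pieces[i]; // compiles to String.Concat(result, pieces[i])

            // Appending null or an empty string does not change the value,
            // so only results that are a different object are counted.
            if (!object.ReferenceEquals(before, result))
            {
                created++;
                charsAllocated += result.Length;
            }
        }
        return created;
    }
}
[/csharp]

For the nine pieces of my query this gives 8 new strings holding 492 characters in total, while the final string is only 117 characters long. Every intermediate result is copied again by the next +=.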

Looking at the profiler, we end up with 129 string instances, which is 16 more than our comparison base. Of these, 8 come from the 8 calls to String.Concat and 8 from the additional string literals declared in code.

GetPlussedVarString Single Call

Calling this 50 times quickly shows the downside of this method. We end up with 408 additional string instances: 400 from the 50*8 calls to String.Concat, plus our original 8 extra strings, which got interned.

GetPlussedVarString Multiple Calls

Note the explosion in memory used for this simple example: 73 kB vs 16 kB.

I strongly discourage the use of the += operator for string concatenation in these scenarios.

String.Concat(array) concatenation.

A less common way of concatenating strings is to use one of the String.Concat overloads that accept an array.

[csharp]
public string GetConcatedString()
{
    string[] pieces = new string[] {
        "SELECT column1,",
        " column2,",
        " column3,",
        " column4,",
        " column5,",
        " column6,",
        " FROM table1 t1",
        " JOIN table2 t2",
        " ON t1.column1 = t2.column1"
    };
    return String.Concat(pieces);
}
[/csharp]

Calling String.Concat explicitly with a string array is a more efficient variation, as can be seen in the following IL:

[code]
.method public hidebysig instance string GetConcatedString() cil managed
{
.maxstack 3
.locals init (
[0] string[] pieces,
[1] string[] CS$0$0000)
L_0000: ldc.i4.s 9
L_0002: newarr string
L_0007: stloc.1
L_0008: ldloc.1
L_0009: ldc.i4.0
L_000a: ldstr "SELECT column1,"
L_000f: stelem.ref
L_0010: ldloc.1
L_0011: ldc.i4.1
L_0012: ldstr " column2,"
L_0017: stelem.ref
L_0018: ldloc.1
L_0019: ldc.i4.2
L_001a: ldstr " column3,"
L_001f: stelem.ref
L_0020: ldloc.1
L_0021: ldc.i4.3
L_0022: ldstr " column4,"
L_0027: stelem.ref
L_0028: ldloc.1
L_0029: ldc.i4.4
L_002a: ldstr " column5,"
L_002f: stelem.ref
L_0030: ldloc.1
L_0031: ldc.i4.5
L_0032: ldstr " column6,"
L_0037: stelem.ref
L_0038: ldloc.1
L_0039: ldc.i4.6
L_003a: ldstr " FROM table1 t1"
L_003f: stelem.ref
L_0040: ldloc.1
L_0041: ldc.i4.7
L_0042: ldstr " JOIN table2 t2"
L_0047: stelem.ref
L_0048: ldloc.1
L_0049: ldc.i4.8
L_004a: ldstr " ON t1.column1 = t2.column1"
L_004f: stelem.ref
L_0050: ldloc.1
L_0051: stloc.0
L_0052: ldloc.0
L_0053: call string [mscorlib]System.String::Concat(string[])
L_0058: ret
}
[/code]

This method uses 9 more string instances than our base, which is already better than the 16 that += resulted in.

These 9 consist of the 8 additional string literals defined in the code and 1 string coming from the single call to String.Concat().

GetConcatedString Single Call

Calling this 50 times results in 58 additional strings compared to our base: 50 from the calls to String.Concat() and our 8 additional string literals in code (again, interning at work).

GetConcatedString Multiple Calls

Internally, the array overload of String.Concat() first computes the length needed for the result and then creates a single string of the correct length, whereas the previous method could not use this optimization since it made 8 separate calls.
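The two-pass idea can be sketched as follows. This is my own simplified illustration and not the framework's implementation: TwoPassConcat and Join are made-up names, and the sketch goes through a char array where the real method can fill the result string directly.

[csharp]
using System;

public static class TwoPassConcat
{
    // Hypothetical helper that mimics the "measure first, copy once" approach.
    public static string Join(string[] pieces)
    {
        if (pieces == null)
        {
            throw new ArgumentNullException("pieces");
        }

        // Pass 1: add up the lengths so the result size is known up front.
        int total = 0;
        foreach (string piece in pieces)
        {
            if (piece != null)
            {
                total += piece.Length;
            }
        }

        // Pass 2: copy every piece into one buffer of exactly that size.
        char[] buffer = new char[total];
        int offset = 0;
        foreach (string piece in pieces)
        {
            if (string.IsNullOrEmpty(piece))
            {
                continue;
            }
            piece.CopyTo(0, buffer, offset, piece.Length);
            offset += piece.Length;
        }

        return new string(buffer);
    }
}
[/csharp]

No matter how many pieces there are, only one result is built, which is why the number of strings no longer grows with the number of pieces.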

StringBuilder.Append() concatenation.

Method number four uses a StringBuilder to create a string, as demonstrated in plenty of tutorials.

[csharp]
// Requires "using System.Text;" at the top of the file.
public string GetBuildString()
{
    StringBuilder builder = new StringBuilder();
    builder.Append("SELECT column1,");
    builder.Append(" column2,");
    builder.Append(" column3,");
    builder.Append(" column4,");
    builder.Append(" column5,");
    builder.Append(" column6,");
    builder.Append(" FROM table1 t1");
    builder.Append(" JOIN table2 t2");
    builder.Append(" ON t1.column1 = t2.column1");
    return builder.ToString();
}
[/csharp]

The not so interesting IL for this method simply shows the creation of the object and several method calls.

[code]
.method public hidebysig instance string GetBuildString() cil managed
{
.maxstack 2
.locals init (
[0] class [mscorlib]System.Text.StringBuilder builder)
L_0000: newobj instance void [mscorlib]System.Text.StringBuilder::.ctor()
L_0005: stloc.0
L_0006: ldloc.0
L_0007: ldstr "SELECT column1,"
L_000c: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_0011: pop
L_0012: ldloc.0
L_0013: ldstr " column2,"
L_0018: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_001d: pop
L_001e: ldloc.0
L_001f: ldstr " column3,"
L_0024: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_0029: pop
L_002a: ldloc.0
L_002b: ldstr " column4,"
L_0030: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_0035: pop
L_0036: ldloc.0
L_0037: ldstr " column5,"
L_003c: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_0041: pop
L_0042: ldloc.0
L_0043: ldstr " column6,"
L_0048: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_004d: pop
L_004e: ldloc.0
L_004f: ldstr " FROM table1 t1"
L_0054: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_0059: pop
L_005a: ldloc.0
L_005b: ldstr " JOIN table2 t2"
L_0060: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_0065: pop
L_0066: ldloc.0
L_0067: ldstr " ON t1.column1 = t2.column1"
L_006c: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_0071: pop
L_0072: ldloc.0
L_0073: callvirt instance string [mscorlib]System.Object::ToString()
L_0078: ret
}
[/code]

Note that I am using a default StringBuilder, which starts with a capacity of 16 characters.

GetBuildString Single Call

From the profiler we can see that this approach created 13 more string instances than our base. Of these, 8 are again the extra string literals in code, 1 comes from the final ToString() call, and 4 come from the internals of the StringBuilder, which allocated its buffer 4 times (at capacities of 16, 32, 64 and 128 characters).
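To double-check those buffer sizes, here is a small model of a buffer that starts at a given capacity and doubles whenever an append no longer fits. It is only a sketch of the doubling behaviour described above, not the actual StringBuilder code. BuilderGrowthModel and Allocations are names I made up.

[csharp]
using System;
using System.Collections.Generic;

public static class BuilderGrowthModel
{
    // Returns the capacity of every buffer the model allocates, including
    // the initial one, while appending pieces of the given lengths.
    public static List<int> Allocations(int initialCapacity, int[] pieceLengths)
    {
        List<int> allocations = new List<int>();
        int capacity = Math.Max(1, initialCapacity); // keep the doubling loop finite
        allocations.Add(capacity);

        int length = 0;
        foreach (int pieceLength in pieceLengths)
        {
            length += pieceLength;
            if (length > capacity)
            {
                // Assumption: the buffer doubles until the new content fits.
                while (capacity < length)
                {
                    capacity *= 2;
                }
                allocations.Add(capacity);
            }
        }
        return allocations;
    }
}
[/csharp]

With a start capacity of 16 and the piece lengths of my query (15, 9, 9, 9, 9, 9, 15, 15, 27), the model reports buffers of 16, 32, 64 and 128 characters, counting the initial one. Starting at 128 leaves just a single buffer.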

Interestingly, the StringBuilder already uses less memory than += when concatenating 9 strings. Choosing a good estimate of the target size when constructing the StringBuilder would have made the difference even bigger.

This becomes even more obvious when comparing the results from the loop:

GetBuildString Multiple Calls

With 258 more strings than our base (8 interned strings, 50 ToString() calls and 200 buffer allocations inside the StringBuilder), we can clearly see that the StringBuilder is more efficient than +=, even taking the creation of the StringBuilder object into account. It is, however, not as efficient as the String.Concat(array) method.

StringBuilder.AppendFormat() concatenation.

And lastly, for my own personal curiosity, I wanted to see the effect of using AppendFormat() versus Append().

[csharp]
public string GetBuildFormatString()
{
    // AppendFormat will first parse your string to find {x} instances
    // and then fill them in. Afterwards it calls .Append
    // Better to simply call .Append several times.
    StringBuilder builder = new StringBuilder();
    builder.AppendFormat("SELECT {0},", "column1");
    builder.AppendFormat(" {0},", "column2");
    builder.AppendFormat(" {0},", "column3");
    builder.AppendFormat(" {0},", "column4");
    builder.AppendFormat(" {0},", "column5");
    builder.AppendFormat(" {0},", "column6");
    builder.AppendFormat(" FROM {0} t1", "table1");
    builder.AppendFormat(" JOIN {0} t2", "table2");
    builder.Append(" ON t1.column1 = t2.column1");
    return builder.ToString();
}
[/csharp]

This method is the least efficient of the pack. First, a look at the IL:

[code]
.method public hidebysig instance string GetBuildFormatString() cil managed
{
.maxstack 3
.locals init (
[0] class [mscorlib]System.Text.StringBuilder builder)
L_0000: newobj instance void [mscorlib]System.Text.StringBuilder::.ctor()
L_0005: stloc.0
L_0006: ldloc.0
L_0007: ldstr "SELECT {0},"
L_000c: ldstr "column1"
L_0011: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::AppendFormat(string, object)
L_0016: pop
L_0017: ldloc.0
L_0018: ldstr " {0},"
L_001d: ldstr "column2"
L_0022: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::AppendFormat(string, object)
L_0027: pop
L_0028: ldloc.0
L_0029: ldstr " {0},"
L_002e: ldstr "column3"
L_0033: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::AppendFormat(string, object)
L_0038: pop
L_0039: ldloc.0
L_003a: ldstr " {0},"
L_003f: ldstr "column4"
L_0044: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::AppendFormat(string, object)
L_0049: pop
L_004a: ldloc.0
L_004b: ldstr " {0},"
L_0050: ldstr "column5"
L_0055: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::AppendFormat(string, object)
L_005a: pop
L_005b: ldloc.0
L_005c: ldstr " {0},"
L_0061: ldstr "column6"
L_0066: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::AppendFormat(string, object)
L_006b: pop
L_006c: ldloc.0
L_006d: ldstr " FROM {0} t1"
L_0072: ldstr "table1"
L_0077: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::AppendFormat(string, object)
L_007c: pop
L_007d: ldloc.0
L_007e: ldstr " JOIN {0} t2"
L_0083: ldstr "table2"
L_0088: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::AppendFormat(string, object)
L_008d: pop
L_008e: ldloc.0
L_008f: ldstr " ON t1.column1 = t2.column1"
L_0094: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
L_0099: pop
L_009a: ldloc.0
L_009b: callvirt instance string [mscorlib]System.Object::ToString()
L_00a0: ret
}
[/code]

Inside the AppendFormat() calls is where the ugly stuff happens. The format string is converted to a character array, which is then scanned for occurrences of {x}, and in the end the result is passed to StringBuilder.Append() anyway.
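For a plain {0} replacement, the same text can be produced with Append() calls alone, which skips the format parsing. A minimal sketch, with the hypothetical names FormatVersusAppend, WithAppendFormat and WithAppend:

[csharp]
using System.Text;

public static class FormatVersusAppend
{
    // Lets AppendFormat parse the format string and fill in {0}.
    public static string WithAppendFormat(string column)
    {
        StringBuilder builder = new StringBuilder();
        builder.AppendFormat("SELECT {0},", column);
        return builder.ToString();
    }

    // Produces the same text by appending the pieces directly.
    public static string WithAppend(string column)
    {
        StringBuilder builder = new StringBuilder();
        builder.Append("SELECT ").Append(column).Append(',');
        return builder.ToString();
    }
}
[/csharp]

Both methods return the same string for the same column name. The difference lies only in the work done to get there.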

I didn't spend much time trying to draw a conclusion from the results of this test: it is bound to perform worse than the previous method, since it is the same logic with extra operations.

GetBuildFormatString Single Call

The results inside the loop are worth noting: this method uses even more memory than += concatenation.

GetBuildFormatString Multiple Calls

Conclusion

The conclusions I made for myself and will be following in my future development are as follows:


  • If you can avoid concatenating, do it!
    This is a no-brainer: if you don't have to concatenate but want your source code to look nice, use the first method. It will get optimized as if it were a single string.

  • Don't use += concatenating ever.
    Too many changes take place behind the scenes that aren't obvious from the code. I advise using String.Concat() explicitly with whichever overload fits (2 strings, 3 strings, string array). This clearly shows what your code does, without any surprises, and lets you keep an eye on efficiency.

  • Try to estimate the target size of a StringBuilder.
    The more accurately you can estimate the needed size, the fewer temporary strings the StringBuilder will have to create to increase its internal buffer.

  • Do not use any Format() methods when performance is an issue.
    Too much overhead is involved in parsing the format when you could construct an array out of the pieces, if all you are using are {x} replacements. Format() is good for readability, but it is one of the first things to go when you are squeezing every bit of performance out of your application.
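As a rough sketch of what the second and third guideline could look like in code (ConclusionSketch, WithConcat and WithPresizedBuilder are hypothetical names, and adding up the piece lengths is just one possible way of estimating the size):

[csharp]
using System;
using System.Text;

public static class ConclusionSketch
{
    // Explicit String.Concat with the four-string overload instead of +=.
    public static string WithConcat(string table, string column)
    {
        return String.Concat("SELECT ", column, " FROM ", table);
    }

    // Gives the StringBuilder its target size up front, so the internal
    // buffer does not have to grow while appending.
    public static string WithPresizedBuilder(string[] pieces)
    {
        int estimate = 0;
        foreach (string piece in pieces)
        {
            if (piece != null)
            {
                estimate += piece.Length;
            }
        }

        StringBuilder builder = new StringBuilder(estimate);
        foreach (string piece in pieces)
        {
            builder.Append(piece); // appending null is a no-op
        }
        return builder.ToString();
    }
}
[/csharp]

When the pieces are already sitting in an array, String.Concat(pieces) does the same job in a single call. The presized builder pays off when the pieces arrive one at a time and only a rough size is known.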



One conclusion I am not 100 percent sure of is the difference between using String.Concat(array) and using a StringBuilder. It seems using an array incurs less memory overhead than using a StringBuilder, unless the cost of array creation is big, which I couldn't determine in my tests. I'd be more than interested to know if someone could provide more detail on this.

The guidelines in Jouni Heikniemi's article seem to be accurate when comparing between String.Concat(string, string) and StringBuilder, and will be the ones I'll be following, until I get a clear picture of the String.Concat(array) implementation.

Once again, I've uploaded the solution I used as a .zip file, with an additional HTML page displaying the results below each other.
 
Comments: 15
 
  • GeekkFromIndia

    Thanks David. This blog was really helpful.

     
     
  • Reflecting String.Concat(String[]) reveals that internally, another array is allocated with the same length as the argument. The total size of the strings is summed and a new string is created with the actual length.

    This means that two copies of your array would be allocated. I think this can be problematic for large arrays. The only benefit seems to be a pass whereby the final string size is known before final string allocation.

    Contrast this with StringBuilder, where if you knew the approximate size of the resulting string, you could initialize the capacity to something large enough to hold it. Combine this with the fact that array concatenation would seem strange to me as a code reader and the solution becomes clear: Use a stringbuilder and guess the initial capacity. If you are unsure of the capacity, trace the actual length of your resulting strings and adjust.

     
     
  • I came to the same conclusion today, Brandon. When having to concatenate strings, I thought to myself 'when would I ever be using the array overload?', and indeed, it'll be a very rare occurrence.

    In the future I'll be using StringBuilder according to Jouni Heikniemi's guidelines.

    Quoted:
    "If you have no idea on the resulting string size, use StringBuilder if you have at least 7 concatenations.

    If you can roughly (with 30% accuracy) estimate the resulting string size, use StringBuilder if you have at least 5 concatenations.

    If you can estimate the resulting string size with good accuracy, use StringBuilder if you have at least 3 concatenations.

    Under no conditions is StringBuilder faster for less than 3 concatenations.

    StringBuilder beats strings for 10+ concatenations in every practical situation.

    The longer the strings are, the more final string size estimations will help you (but accuracy becomes more critical)."

    I've also come to the conclusion that simply estimating the minimum size is a good start as well.

    E.g.: I know I'm going to be appending stack traces, so the StringBuilder I'm creating starts with an initial size of 256 or 512. (I'm using those values since the StringBuilder has 16 as default and would always be doubling it, so this saves me some of the initial doubling.)

     
     
  • So what happens if you have a scenario 1 but instead you have a dynamic variable in the concats? I suppose that'll be the same as scenario 2 then?

    As a general rule, if I do 3-4 concats I tend to use + operators, if only for readability. For anything more I use a StringBuilder.

    While this stuff makes sense in real performance critical scenarios, I would consider this the sort of micro profiling that yields the least performance gains in most applications.

     
     
  • If I remember correctly (not verified right now), doing: return "String" + var + "string"; will result in one String.Concat call under the hood (using the string, string, string overload).

    Normally it'll do this till 4 string overloads before creating scenario 3 internally. (That's what I've come to understand from the references on top of my article).

     
     
  • Hi David,

    I do have an older blog post on the use of StringBuilder at http://community.bartdesmet.net/blogs/bart/archive/2005/10/04/3583.aspx - it might be an interesting read as well. Notice it's written in a pre-.NET 2.0 manner without the use of the Stopwatch class to measure perf results.

    -Bart

     
     
  • Very informative and well researched article. I also did some research into StringBuilder vs "+" but this sheds much better light into the generated IL and memory usage.

    By the way, what was the application you used to profile your test code and to produce those profiling charts? Chinh

     
     
  • I used the CLR Profiler 2.0, a (free) tool from Microsoft. You can download it at:

    http://www.microsoft.com/downloads/details.aspx?FamilyID=A362781C-3870-43BE-8926-862B40AA0CD0&displaylang=en

     
     
  • Jose Delli Gatti

    Hi,

    I'm really impressed by the depth of your research and the quality of this article.

    It's clear and concise.

    Thanks for the contribution!

     
     
  • bob scola

    Your test case, GetPlussedString(), is bogus. With it you have only discovered that the compiler is smart enough to pre-concatenate literals and consts (which is great to know for code reviews), but it has nothing to do with runtime concatenation of strings. Therefore you’ve learned nothing in GetPlussedString(). About GetPlussedString(), "is about the most efficient way we can achieve". No, as you haven’t achieved anything, other than breaking up literals for readability in your source without incurring a performance penalty. Perhaps if you had used actual string variables instead, you would have learned something from your first example.

     
     
  • Great article! Thanks. I knew about the differences between String and StringBuilder, but I'm really impressed by the depth of the explanations. As for StringBuilder.AppendFormat(), the mechanism is a big surprise to me.

     
     
  • Alan J

    Just done a load of benchmarking from a time perspective on operation, variable, array, builder and format based concatenation and found the following "rules" (in relative terms, since absolute is too situationally dependent):

    - for a determinable end string length, the fastest method is the operation method (#1 above) for = 5

    - for an undeterminable end string length, the fastest method is the operation method (#1 above) for = 7. However, since dynamically sizing an array has its own overhead, for almost all practical purposes a null instantiated string builder is more efficient

    - string.format is an interesting case because it allows you to effectively concatenate fewer strings than you would have to using another method (cf. building a string to match the format "a{0}b{1}c"), practically though it's still the least efficient method for any situation where you aren't formatting about 7 fewer strings than a builder would concatenate. I can think of no situation where that could reasonably occur.

    This essentially matches David and Jouni Heikniemi's findings: think about using stringbuilder at around 5-7 concats

     
     
  • Sumesh

    Is string.Format() as nasty as StringBuilder.AppendFormat()? CLR Profiler shows it as the best option in the following case:


    //;Way1

    //string s1 = string.Format("Data Source={0}; Initial Catalog={1}", "HA", "HA");

    //;Way2
    //string s = "Data Source=";
    //s+="HA";
    //s += "Initial Catalog=";
    //s += "HA";

    //;Way3
    string[] pieces = new string[] {"Data Source={0}; ", "Initial Catalog={1}", "HA", "HA"};
    string s1 = string.Concat(pieces);

     
     
  • Ed

    A good article, but I think what needs mentioning is that string.Format is not for concatenating strings, it's for formatting them, typically in a culture-sensitive manner.

    String.Format is powerful, but powerful tools often sacrifice raw speed for robust features.

    String.Format is very readable, and for localization issues it's extremely powerful. If you have a resource file that has a literal called MyFormat being:
    en-US = "There are {0} cars in {1} garages."
    fr-FR = "Dans {1} garage, il est {0} auto."
    (pardon my French, just for illustration)

    String.Format is also good in this manner because the culture-specific format string can have date/time, precision and other formatting which will vary culture to culture. String.Format allows the code to move from culture-hardcoded to culture agnostic and is very portable and robust at the same time.

    String.Concat on the other hand is raw speed. If you can get away with it, use it.

    If you have a variable number of concatenations, then a StringBuilder is the way to go.

    If you need to format strings and address internationalization issues, there's nothing better than string.format.

     
     
  • @Sumesh
    Way1: // 4 string instances, 3 pieces, 1 format
    Way2: // 7 string instances, 4 pieces and 3 concat operations
    Way3: // 4 instances, 4 array pieces, 1 concat

    Way4 however would just replace Way2 by leaving out the "=" signs and your result is a single string in IL code ;)

    You'd have to run your samples in a loop (100?) and look at the profiler in that case. If interning works, it doesn't really matter much, since you'd have 4/5 strings at most, if it's not at work you should see ending up with 400+ ish strings

     
     