Thursday 20 February 2014

Understanding yield return, yield break in C# for C/C++ programmers

Introduction

yield return and yield break are two simple C# statements that have no counterpart in C, or in C++ as of C++11. I am writing this article for anybody learning C# with a C/C++ background.

Basics

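IEnumerator: An iterator over a sequence, offering the members below. Think of POSITION in MFC or an iterator in STL.
  • MoveNext(): Advance to the next value; returns false when there are no more values
  • Reset(): Return to the initial position (iterators written with yield do not support this and throw NotSupportedException)
  • Current: The value at the current position
IEnumerable: A sequence that hands out an IEnumerator through GetEnumerator(). One great advantage of this type is the "foreach" statement, which has counterparts in std::for_each in STL and the range-based for loop in C++11. It also supports much more, such as data queries and conversions.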
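
For readers used to STL iterators, it may help to see roughly what a foreach loop does with these two interfaces. The sketch below is an approximation of the expansion, not the exact code the compiler emits; the names ForeachSketch, SumWithForeach and SumByHand are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachSketch
{
    // Sums a sequence with the foreach statement.
    public static int SumWithForeach(IEnumerable<int> source)
    {
        int sum = 0;
        foreach (int value in source)
        {
            sum += value;
        }
        return sum;
    }

    // Roughly the same loop written by hand against the interfaces.
    public static int SumByHand(IEnumerable<int> source)
    {
        int sum = 0;
        // The IEnumerable<int> hands out an IEnumerator<int>.
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            // MoveNext() advances and returns false at the end of the sequence.
            while (e.MoveNext())
            {
                sum += e.Current;
            }
        }
        return sum;
    }

    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4 };
        Console.WriteLine(SumWithForeach(numbers)); // prints 10
        Console.WriteLine(SumByHand(numbers));      // prints 10
    }
}
```

The using block mirrors the fact that foreach disposes the enumerator when the loop ends.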

Example of IEnumerator & yield return / break 

(Code 1 - a function to return IEnumerator) 
    private static readonly int _threshold = 3;

    /// <summary>
    /// Iterates natural numbers until the threshold is exceeded
    /// </summary>
    public static IEnumerator<int> GetNaturalEnumerator()
    {
        for (int i = 1; i < 99999999; i++)
        {
            Console.WriteLine("GetNaturalEnumerator - Before YieldReturn");
            yield return i;
            Console.WriteLine("GetNaturalEnumerator - After YieldReturn");
            if (i > _threshold)     // _threshold is 3
            {
                Console.WriteLine("GetNaturalEnumerator - Before YieldBreak");
                yield break;
                Console.WriteLine("GetNaturalEnumerator - After YieldBreak"); // unreachable code
            }
        }
    }

    static void Main(string[] args)
    {
        // The enumerator is created once; each MoveNext() drives the function body
        for (var j = GetNaturalEnumerator(); j.MoveNext(); )
        {
            Console.WriteLine(j.Current);
            Console.ReadLine();
        }
    }
  
Look at the for loop in Main(): it initialises "j" as an IEnumerator<int>, but calling the function does not run its body. If I set a breakpoint inside the function body, the initialiser does not hit it; the first j.MoveNext() does. In other words, a function that returns IEnumerator (or IEnumerable, covered next) and contains yield statements works differently from an ordinary function. The 'var j = GetNaturalEnumerator()' statement only creates an enumerator object bound to the function body; the body itself does not run until somebody calls MoveNext() on that enumerator.
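
If you have no debugger at hand, the same deferred start can be observed with a log. This is a minimal sketch; DeferredDemo, Log and Count are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredDemo
{
    // Records which parts of the iterator body have run, in order.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerator<int> Count()
    {
        Log.Add("body started");
        yield return 1;
        Log.Add("resumed after first yield");
        yield return 2;
    }

    public static void Main()
    {
        IEnumerator<int> e = Count();
        Console.WriteLine(Log.Count);   // prints 0: the body has not started yet

        e.MoveNext();
        Console.WriteLine(Log[0]);      // prints "body started"
        Console.WriteLine(e.Current);   // prints 1
    }
}
```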


At the first stage of execution the process is waiting for a line of console input at Console.ReadLine(). The console shows "GetNaturalEnumerator - Before YieldReturn" followed by 1, so the function has run up to the "yield return i" statement.
After I hit the Enter key, the console shows "GetNaturalEnumerator - After YieldReturn", then "GetNaturalEnumerator - Before YieldReturn" and 2. This means the function body resumed just after the "yield return i" statement. This repeats until execution reaches the end of the function or a "yield break" statement.
As the threshold was 3, "yield break" was hit when the number was 4. The final Console.WriteLine("...After YieldBreak") never ran because it is unreachable.
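
Coming from C/C++, you may find it easier to picture this as a small class that keeps the loop variable alive between calls. The class below is a hand-written sketch of that idea, not the code the C# compiler actually generates; NaturalEnumerator, StateMachineDemo and Collect are made-up names, the Console output of Code 1 is left out, and the upper bound of 99999999 is ignored.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written stand-in for an iterator that yields 1, 2, 3, ... and
// stops once a value above the threshold has been returned.
public sealed class NaturalEnumerator : IEnumerator<int>
{
    private readonly int _threshold;
    private int _i;          // the loop variable, kept alive between calls
    private bool _finished;  // set once "yield break" would have run

    public NaturalEnumerator(int threshold)
    {
        _threshold = threshold;
    }

    public int Current { get; private set; }

    object IEnumerator.Current { get { return Current; } }

    public bool MoveNext()
    {
        if (_finished)
        {
            return false;
        }
        // _i is 0 before the first call, when there is no previous
        // "yield return" to resume from. On later calls we resume just
        // after it and check the threshold.
        if (_i >= 1 && _i > _threshold)
        {
            _finished = true;   // corresponds to "yield break"
            return false;
        }
        _i++;
        Current = _i;           // corresponds to "yield return i"
        return true;
    }

    public void Reset() { throw new NotSupportedException(); }

    public void Dispose() { }
}

public static class StateMachineDemo
{
    // Drains the enumerator into a list so the result is easy to inspect.
    public static List<int> Collect(int threshold)
    {
        var result = new List<int>();
        using (var e = new NaturalEnumerator(threshold))
        {
            while (e.MoveNext())
            {
                result.Add(e.Current);
            }
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Collect(3))); // prints 1, 2, 3, 4
    }
}
```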

Example of IEnumerable & yield return / break 

(Code 2 - a function to return IEnumerable) 
    /// <summary>
    /// Iterates natural numbers until the threshold is exceeded
    /// </summary>
    public static IEnumerable<int> GetNaturalNumbers()
    {
        for (int i = 1; i < 99999999; i++)
        {
            Console.WriteLine("GetNaturalNumbers - Before YieldReturn");
            yield return i;
            Console.WriteLine("GetNaturalNumbers - After YieldReturn");
            if (i > _threshold)
            {
                Console.WriteLine("GetNaturalNumbers - Before YieldBreak");
                yield break;
                Console.WriteLine("GetNaturalNumbers - After YieldBreak"); // unreachable code
            }
        }
    }

    static void Main(string[] args)
    {
        foreach (var i in GetNaturalNumbers())
        {
            Console.WriteLine(i);
            Console.ReadLine();
        }
    }
  
Notice that the function body of GetNaturalNumbers() is identical to that of GetNaturalEnumerator() in Code 1; only the return type differs. Main(), however, now uses the "foreach" keyword. In this example the 'foreach' block does the same thing as the 'for' block in Code 1, and 'yield return' and 'yield break' behave exactly the same way.
You may now ask, "do we have to write every iterator twice?" No, because IEnumerator and IEnumerable can be converted to each other. Let me show you the third example.

Example of converting IEnumerable to IEnumerator and vice versa 

(Code 3 - modified GetNaturalEnumerator)

    /// <summary>
    /// Iterates natural numbers by delegating to GetNaturalNumbers()
    /// </summary>
    public static IEnumerator<int> GetNaturalEnumerator()
    {
        foreach (var i in GetNaturalNumbers())
        {
            yield return i;
        }
    }

    /// <summary>
    /// Iterates natural numbers until the threshold is exceeded
    /// </summary>
    public static IEnumerable<int> GetNaturalNumbers()
    {
        for (int i = 1; i < 99999999; i++)
        {
            Console.WriteLine("GetNaturalNumbers - Before YieldReturn");
            yield return i;
            Console.WriteLine("GetNaturalNumbers - After YieldReturn");
            if (i > _threshold)
            {
                Console.WriteLine("GetNaturalNumbers - Before YieldBreak");
                yield break;
                Console.WriteLine("GetNaturalNumbers - After YieldBreak"); // unreachable code
            }
        }
    }
  
Look at the changed body of GetNaturalEnumerator(). Because IEnumerable supports 'foreach', the function can loop over GetNaturalNumbers() and yield each member in turn.
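
For this direction there is also a shorter route, since IEnumerable<T> itself declares GetEnumerator(). A minimal sketch, with the made-up names ConversionDemo, FirstThree and FirstThreeEnumerator:

```csharp
using System;
using System.Collections.Generic;

public static class ConversionDemo
{
    public static IEnumerable<int> FirstThree()
    {
        yield return 1;
        yield return 2;
        yield return 3;
    }

    // IEnumerable<int> to IEnumerator<int>: simply ask the sequence for one.
    public static IEnumerator<int> FirstThreeEnumerator()
    {
        return FirstThree().GetEnumerator();
    }

    public static void Main()
    {
        for (var e = FirstThreeEnumerator(); e.MoveNext(); )
        {
            Console.WriteLine(e.Current); // prints 1, then 2, then 3
        }
    }
}
```

Note that FirstThreeEnumerator() here is an ordinary function with no yield statement, so its own body runs immediately; the body of FirstThree() is still deferred until the first MoveNext().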

(Code 4 - modified GetNaturalNumbers) 
    /// <summary>
    /// Iterates natural numbers until the threshold is exceeded
    /// </summary>
    public static IEnumerator<int> GetNaturalEnumerator()
    {
        for (int i = 1; i < 99999999; i++)
        {
            Console.WriteLine("GetNaturalEnumerator - Before YieldReturn");
            yield return i;
            Console.WriteLine("GetNaturalEnumerator - After YieldReturn");
            if (i > _threshold)
            {
                Console.WriteLine("GetNaturalEnumerator - Before YieldBreak");
                yield break;
                Console.WriteLine("GetNaturalEnumerator - After YieldBreak"); // unreachable code
            }
        }
    }

    /// <summary>
    /// Iterates natural numbers by delegating to GetNaturalEnumerator()
    /// </summary>
    public static IEnumerable<int> GetNaturalNumbers()
    {
        for (var j = GetNaturalEnumerator(); j.MoveNext(); )
        {
            yield return j.Current;
        }
    }
  
This direction works too. So the two kinds of function are convertible to each other, and the numbers produced are the same in every case.
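
One practical difference remains after the conversion: an IEnumerator is a single pass, whereas an IEnumerable built as in Code 4 starts its body afresh for every foreach and so fetches a new enumerator each time. The sketch below illustrates this under made-up names (ReuseDemo, OneToThreeEnumerator, OneToThree, Sum).

```csharp
using System;
using System.Collections.Generic;

public static class ReuseDemo
{
    public static IEnumerator<int> OneToThreeEnumerator()
    {
        yield return 1;
        yield return 2;
        yield return 3;
    }

    // Same wrapping pattern as Code 4: the enumerator is requested inside
    // the iterator body, so every enumeration gets a fresh one.
    public static IEnumerable<int> OneToThree()
    {
        for (var e = OneToThreeEnumerator(); e.MoveNext(); )
        {
            yield return e.Current;
        }
    }

    public static int Sum(IEnumerable<int> source)
    {
        int sum = 0;
        foreach (int value in source)
        {
            sum += value;
        }
        return sum;
    }

    public static void Main()
    {
        IEnumerable<int> numbers = OneToThree();
        Console.WriteLine(Sum(numbers)); // prints 6
        Console.WriteLine(Sum(numbers)); // prints 6 again: the second pass restarts
    }
}
```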

Just for fun 

You might also wonder what happens if the IEnumerable function delegates to the IEnumerator function while the IEnumerator function delegates back to the IEnumerable function. Seeing is believing.

(Code 5 - just for fun) 
    public static IEnumerator<int> GetNaturalEnumerator()
    {
        foreach (var i in GetNaturalNumbers())
        {
            yield return i;
        }
    }

    public static IEnumerable<int> GetNaturalNumbers()
    {
        for (var j = GetNaturalEnumerator(); j.MoveNext(); )
        {
            yield return j.Current;
        }
    }

    static void Main(string[] args)
    {
        for (var j = GetNaturalEnumerator(); j.MoveNext(); )
        {
            Console.WriteLine(j.Current);
            Console.ReadLine();
        }

        foreach (var i in GetNaturalNumbers())
        {
            Console.WriteLine(i);
            Console.ReadLine();
        }
    }
  
Guess what? 

LOL - neither function does any real work: each keeps delegating to the other until the program crashes with a stack overflow, which is really common in real life too. :)

Conclusion 
  • 'yield return' and 'yield break' statements let functions that return IEnumerator or IEnumerable iterate over a data collection easily.
  • Such functions work quite differently from ordinary functions: calling them does not run the body.
  • The code after 'yield return' resumes when MoveNext() is called on the enumerator again.
  • The code after 'yield break' never runs.
  • Functions returning IEnumerator and IEnumerable can usually be written in terms of each other, but don't forget to put the real implementation in one of them. :)

