Thursday, 20 February 2014

Providing unique ID on managed object using ConditionalWeakTable (C#)

Introduction

I was asked how to get a unique ID, or something similar, to distinguish managed objects, and I answered that object.GetHashCode() would do that. However, that answer is not correct: GetHashCode() can return the same value for different object instances. It exists to produce hash codes for hash-based collections such as Dictionary<TKey, TValue> and HashSet<T>, and those hash codes are not guaranteed to be unique.

Every managed object is represented by its reference. That reference is unique and can be used to check identity (e.g. with the Object.ReferenceEquals method). However, references are sometimes hard to use for this, because holding a reference prevents the garbage collector (GC) from collecting the object.
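Here is a minimal sketch of the GetHashCode() problem, using only the base class library. Two boxed integers with the same value are two different objects, yet they report the same hash code, because Int32.GetHashCode() is derived from the value. The class and method names are hypothetical.

```csharp
using System;

static class HashVsIdentity
{
    // Returns true when two distinct objects share the same hash code.
    public static bool SameHashDifferentObjects()
    {
        object p = 42;   // boxing allocates a new heap object
        object q = 42;   // a second, separate heap object with the same value

        bool sameHash = p.GetHashCode() == q.GetHashCode(); // Int32's hash is based on the value
        bool sameObject = object.ReferenceEquals(p, q);     // identity comparison
        return sameHash && !sameObject;
    }
}
```

Because equal hash codes do not imply the same object, a hash code cannot serve as an identity.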

Luckily, .NET 4.0 introduced a new collection, ConditionalWeakTable<TKey, TValue> (in System.Runtime.CompilerServices). It holds its keys weakly: an entry does not keep its key alive, and the entry is removed automatically once the key has been garbage-collected. This makes it a good fit for providing unique IDs for managed objects. We can keep an internal ConditionalWeakTable<object, A_Counter_Class> that maps each object to its ID. The table was originally designed for attaching extra data to objects you do not own, but the same characteristics also let us track managed objects.
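To illustrate that original purpose, here is a minimal sketch that attaches a list of notes to arbitrary objects without modifying their classes. It uses ConditionalWeakTable's GetOrCreateValue method, which creates the value through its parameterless constructor the first time a key is seen. The Notes and NoteAttacher types are hypothetical names for this example.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical side-car data attached to objects we do not own.
sealed class Notes
{
    public readonly List<string> Items = new List<string>();
}

static class NoteAttacher
{
    // Keys are held weakly; each Notes value lives only as long as its key.
    static readonly ConditionalWeakTable<object, Notes> _notes =
        new ConditionalWeakTable<object, Notes>();

    public static void Attach(object target, string note)
    {
        // Creates an empty Notes instance the first time a target is seen.
        _notes.GetOrCreateValue(target).Items.Add(note);
    }

    public static IReadOnlyList<string> Get(object target)
    {
        Notes notes;
        return _notes.TryGetValue(target, out notes)
            ? (IReadOnlyList<string>)notes.Items
            : new string[0];
    }
}
```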

Code

* I implemented and tested this myself, so please note that the code might not be perfect and can (or should) be improved further.

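    using System;
    using System.Runtime.CompilerServices;
    using System.Threading;

    class ObjectRecorder
    {
        // Keys are held weakly, so recording an object does not keep it alive.
        // The value must be a reference type, so each int ID is boxed as object.
        ConditionalWeakTable<object, object> _cwt = new ConditionalWeakTable<object, object>();

        // Shared by all ObjectRecorder instances, so IDs are unique process-wide.
        private static int _uniqueId = 0;

        // Assigns a new ID to x; does nothing if x has already been recorded.
        public void Add(object x)
        {
            try
            {
                _cwt.Add(x, (object)Interlocked.Increment(ref _uniqueId));
            }
            catch (ArgumentException)
            {
                // x is already in the table, so it keeps its original ID.
                // The counter was still incremented, so that ID value is skipped.
            }
        }

        // Returns the ID recorded for x, or -1 if x has never been added.
        public int GetValue(object x)
        {
            object id;
            if (_cwt.TryGetValue(x, out id))
            {
                return (int)id;
            }
            return -1;
        }

        public ConditionalWeakTable<object, object> Cwt
        {
            get { return _cwt; }
        }
    }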

The essential part of this class is _cwt, a ConditionalWeakTable that maps each object reference to its unique ID. If the same object instance is added again, ConditionalWeakTable.Add throws an ArgumentException. Add() catches it, so the call does nothing and the original ID is kept. The GetValue() method retrieves the ID associated with the given reference, or returns -1 if the reference has never been recorded.
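Here is an alternative sketch that differs from the code above. The same idea can be written without exception-driven control flow by using ConditionalWeakTable.GetValue(key, createValueCallback), which returns the existing value or invokes the callback to create one. The callback only runs for keys that are not yet in the table, so on a single thread repeated calls do not use up counter values. The IdHolder and ObjectIds names are hypothetical, and unlike ObjectRecorder the counter here is per instance.

```csharp
using System.Runtime.CompilerServices;
using System.Threading;

// Wrapper, because ConditionalWeakTable requires a reference-type value.
sealed class IdHolder
{
    public readonly int Id;
    public IdHolder(int id) { Id = id; }
}

sealed class ObjectIds
{
    readonly ConditionalWeakTable<object, IdHolder> _ids =
        new ConditionalWeakTable<object, IdHolder>();
    int _next; // last ID handed out by this instance

    // Returns x's ID, assigning the next one the first time x is seen.
    // Under thread contention the callback may run more than once for the
    // same key; only one result is stored, so IDs stay unique but may skip.
    public int GetId(object x)
    {
        return _ids.GetValue(x, _ => new IdHolder(Interlocked.Increment(ref _next))).Id;
    }
}
```

With this design there is no separate registration step. Asking for an object's ID registers it.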

Usage & Examples

    static void Main(string[] args)
    {
        object a = new object();
        object b = new object();
        object bb = b;                 // same reference as b

        List<int> lista = new List<int>();
        List<int> listb = new List<int>();
        List<int> listbb = listb;      // same reference as listb

        object c = (object)0;          // boxing creates a new object...
        object d = (object)0;          // ...so c and d are different objects with equal values
        object e = d;                  // same reference as d

        ObjectRecorder x = new ObjectRecorder();
        x.Add(a);
        x.Add(a);                      // duplicate: ignored
        x.Add(b);
        x.Add(bb);                     // same object as b: ignored
        x.Add(lista);
        x.Add(listb);
        x.Add(listbb);                 // same object as listb: ignored
        x.Add(c);
        x.Add(d);
        // e is never added directly, and neither is x itself

        Console.WriteLine("a:" + x.GetValue(a).ToString());
        Console.WriteLine("a:" + x.GetValue(a).ToString());
        Console.WriteLine("b:" + x.GetValue(b).ToString());
        Console.WriteLine("bb:" + x.GetValue(bb).ToString());
        Console.WriteLine("lista:" + x.GetValue(lista).ToString());
        Console.WriteLine("listb:" + x.GetValue(listb).ToString());
        Console.WriteLine("listbb:" + x.GetValue(listbb).ToString());
        Console.WriteLine("c:" + x.GetValue(c).ToString());
        Console.WriteLine("d:" + x.GetValue(d).ToString());
        Console.WriteLine("e:" + x.GetValue(e).ToString());
        Console.WriteLine("x:" + x.GetValue(x).ToString());
    }

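As expected, the IDs follow references, not values. a keeps a single ID however many times it is added. b and bb share one ID. listb and listbb share one ID, while lista gets its own. c and d get different IDs even though both are a boxed 0. e shares d's ID. x, which was never added, returns -1. The IDs are not consecutive, because each ignored duplicate Add() still increments the counter.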


It displayed the same IDs only for the same references, so it actually works!

Conclusion
  • Different GetHashCode() values mean different objects, but equal values do not mean the same reference. GetHashCode() is a hash function, and its result is not guaranteed to be unique per object.
  • The truly unique value is the reference itself, but it is not always convenient for identity comparison, because keeping references interferes with garbage collection.
  • ConditionalWeakTable, introduced in .NET 4.0, is a good solution for this purpose: it does not keep its keys alive, yet it can hold additional information for each managed object.

Understanding yield return and yield break in C# for C/C++ programmers

Introduction

yield return and yield break are two very simple C# statements with no direct counterpart in C or classic C++. I am writing this article for anyone learning C# with a C/C++ background.

Basics

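IEnumerator: an iterator over a sequence, comparable to POSITION in MFC or an iterator in the STL. It provides the following members:
  • MoveNext(): advances to the next element and returns false when there are no more elements
  • Reset(): returns to the initial position (iterators generated with yield return do not support it and throw NotSupportedException)
  • Current: the element at the current position
IEnumerable: a sequence that hands out an enumerator through its GetEnumerator() method. One great advantage of this type is the "foreach" statement, similar to std::for_each in the STL and the range-based for loop in C++11. It also supports much more, such as data queries and conversions through LINQ.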
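For C/C++ readers, it may help to see roughly what foreach does with these members. The sketch below sums a sequence twice: once with foreach, and once with the enumerator calls that foreach roughly expands into (GetEnumerator(), MoveNext(), Current, and Dispose() through using). ForeachDemo is a hypothetical name.

```csharp
using System.Collections.Generic;

static class ForeachDemo
{
    public static int SumWithForeach(IEnumerable<int> source)
    {
        int sum = 0;
        foreach (var item in source)
            sum += item;
        return sum;
    }

    // Approximately what the compiler generates for the foreach loop above.
    public static int SumByHand(IEnumerable<int> source)
    {
        int sum = 0;
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            while (e.MoveNext())      // false once the sequence is exhausted
                sum += e.Current;     // element at the current position
        }
        return sum;
    }
}
```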

Example of IEnumerator & yield return / break 

(Code 1 - a function to return IEnumerator) 
    private static int _threshold = 3;

    /// <summary>
    /// Iterates natural numbers until the threshold is exceeded
    /// </summary>
    public static IEnumerator<int> GetNaturalEnumerator()
    {
        for (int i = 1; i < 99999999; i++)
        {
            Console.WriteLine("GetNaturalEnumerator - Before YieldReturn");
            yield return i;
            Console.WriteLine("GetNaturalEnumerator - After YieldReturn");
            if (i > _threshold)
            {
                Console.WriteLine("GetNaturalEnumerator - Before YieldBreak");
                yield break;
                Console.WriteLine("GetNaturalEnumerator - After YieldBreak"); // unreachable code (compiler warning CS0162)
            }
        }
    }

    static void Main(string[] args)
    {
        for (var j = GetNaturalEnumerator(); j.MoveNext(); )
        {
            Console.WriteLine(j.Current);
            Console.ReadLine();
        }
    }
  
Look at the for loop in Main(). It initialises "j" as an IEnumerator<int>, but calling GetNaturalEnumerator() runs none of the function body. If I set a breakpoint inside the body, it is not hit when the function is called; j.MoveNext() is what triggers it. In other words, a function that returns IEnumerator (or IEnumerable, covered next) and contains yield return behaves differently from a normal function, because the compiler turns it into a state machine. 'var j = GetNaturalEnumerator()' only creates the enumerator, and the body does not start running until somebody calls MoveNext() on it.
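The deferred start can also be checked without a debugger. In the sketch below, the iterator writes to a list instead of the console, so we can inspect what has run after each step. The LazyDemo class and its Log list are hypothetical names.

```csharp
using System.Collections.Generic;

static class LazyDemo
{
    public static readonly List<string> Log = new List<string>();

    public static IEnumerator<int> Numbers()
    {
        Log.Add("body started");
        yield return 1;
        Log.Add("resumed after 1");
        yield return 2;
    }

    // Returns the number of log entries after each step.
    public static int[] Run()
    {
        Log.Clear();
        var counts = new List<int>();
        IEnumerator<int> e = Numbers();  // no body code runs here
        counts.Add(Log.Count);           // 0
        e.MoveNext();                    // runs up to 'yield return 1'
        counts.Add(Log.Count);           // 1
        e.MoveNext();                    // resumes and runs up to 'yield return 2'
        counts.Add(Log.Count);           // 2
        return counts.ToArray();
    }
}
```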


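In the first stage of execution, the process was waiting for a line of console input at Console.ReadLine(). The console showed "GetNaturalEnumerator - Before YieldReturn" followed by 1, so the function body had run only up to the "yield return i" statement.
When I pressed the Enter key, "GetNaturalEnumerator - After YieldReturn" appeared, followed by the next "Before YieldReturn" line and 2. This means the function body resumed just after the "yield return i" statement. This repeats until the body reaches the end of the function or a "yield break" statement.
As the threshold was 3, "yield break" was hit on the step after 4 had been returned, because 4 is the first value greater than 3. MoveNext() then returned false and the for loop in Main() ended. The final Console.WriteLine("...After YieldBreak") never ran because it is unreachable.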
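To see how the body can pause and resume, here is a hand-written enumerator that behaves like Code 1 without the console output. It is only a sketch of the idea, not the code the C# compiler actually generates, which is more involved. NaturalEnumerator and its fields are hypothetical names. The loop variable i becomes a field, and a state field records where to resume on the next MoveNext().

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

sealed class NaturalEnumerator : IEnumerator<int>
{
    readonly int _threshold;
    int _state;   // 0 = not started, 1 = suspended after 'yield return', 2 = finished
    int _i;       // the loop variable, hoisted into a field
    int _current;

    public NaturalEnumerator(int threshold) { _threshold = threshold; }

    public int Current { get { return _current; } }
    object IEnumerator.Current { get { return _current; } }

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:                       // first call: start the loop
                _i = 1;
                break;
            case 1:                       // resume just after 'yield return i'
                if (_i > _threshold)      // 'yield break'
                {
                    _state = 2;
                    return false;
                }
                _i++;
                if (_i >= 99999999)       // loop condition 'i < 99999999' failed
                {
                    _state = 2;
                    return false;
                }
                break;
            default:                      // already finished
                return false;
        }
        _current = _i;                    // 'yield return i'
        _state = 1;
        return true;
    }

    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { _state = 2; }
}
```

With a threshold of 3 this produces 1, 2, 3 and 4, and the next MoveNext() returns false, which matches Code 1.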

Example of IEnumerable & yield return / break 

(Code 2 - a function to return IEnumerable) 
    /// <summary>
    /// Iterates natural numbers until the threshold is exceeded
    /// </summary>
    public static IEnumerable<int> GetNaturalNumbers()
    {
        for (int i = 1; i < 99999999; i++)
        {
            Console.WriteLine("GetNaturalNumbers - Before YieldReturn");
            yield return i;
            Console.WriteLine("GetNaturalNumbers - After YieldReturn");
            if (i > _threshold)
            {
                Console.WriteLine("GetNaturalNumbers - Before YieldBreak");
                yield break;
                Console.WriteLine("GetNaturalNumbers - After YieldBreak"); // unreachable code
            }
        }
    }

    static void Main(string[] args)
    {
        foreach (var i in GetNaturalNumbers())
        {
            Console.WriteLine(i);
            Console.ReadLine();
        }
    }
  
Did you notice? The function body of GetNaturalNumbers() is identical to that of GetNaturalEnumerator() in Code 1. However, Main() now uses the "foreach" statement. In this example, the 'foreach' block does the same thing as the 'for' block in Code 1, and 'yield return' and 'yield break' behave exactly the same way.
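One practical difference is worth noting. An IEnumerable can be enumerated many times, and with an iterator method each foreach gets a fresh enumerator that runs the body again from the start. An IEnumerator, by contrast, can be walked only once. Here is a minimal sketch with a hypothetical counter:

```csharp
using System.Collections.Generic;

static class ReplayDemo
{
    public static int BodyStarts; // how many times the iterator body has started

    public static IEnumerable<int> Numbers()
    {
        BodyStarts++;
        yield return 1;
        yield return 2;
    }

    public static int SumTwice()
    {
        IEnumerable<int> seq = Numbers(); // nothing runs yet
        int sum = 0;
        foreach (var n in seq) sum += n;  // the body starts (1st time)
        foreach (var n in seq) sum += n;  // the body starts again (2nd time)
        return sum;
    }
}
```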
You may now ask, "Why should we have a duplicate?" We don't need one, because functions returning IEnumerator and IEnumerable can be converted into each other. Let me show you a third example.

Example of converting IEnumerable to IEnumerator and vice versa 

(Code 3 - modified GetNaturalEnumerator)

    /// <summary>
    /// Iterates natural numbers until the threshold is exceeded
    /// </summary>
    public static IEnumerator<int> GetNaturalEnumerator()
    {
        foreach (var i in GetNaturalNumbers())
        {
            yield return i;
        }
    }

    /// <summary>
    /// Iterates natural numbers until the threshold is exceeded
    /// </summary>
    public static IEnumerable<int> GetNaturalNumbers()
    {
        for (int i = 1; i < 99999999; i++)
        {
            Console.WriteLine("GetNaturalNumbers - Before YieldReturn");
            yield return i;
            Console.WriteLine("GetNaturalNumbers - After YieldReturn");
            if (i > _threshold)
            {
                Console.WriteLine("GetNaturalNumbers - Before YieldBreak");
                yield break;
                Console.WriteLine("GetNaturalNumbers - After YieldBreak"); // unreachable code
            }
        }
    }
  
Check out the changed body of GetNaturalEnumerator(). Because an IEnumerable can be iterated with 'foreach', the function can simply 'yield return' each element it receives.
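As a side note, when you already have an IEnumerable, the shortest way to get an IEnumerator is to call its GetEnumerator() method directly. The foreach/yield wrapper in Code 3 does the same job through an extra iterator. The sketch below leaves out the threshold logic and console output of Code 2 and simply yields 1 through 4, the values Code 2 produces with a threshold of 3. The Conversions class name is hypothetical.

```csharp
using System.Collections.Generic;

static class Conversions
{
    static IEnumerable<int> GetNaturalNumbers()
    {
        for (int i = 1; i <= 4; i++)
            yield return i;
    }

    // Same result as Code 3, without a second iterator.
    public static IEnumerator<int> GetNaturalEnumerator()
    {
        return GetNaturalNumbers().GetEnumerator();
    }
}
```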

(Code 4 - modified GetNaturalNumbers) 
    /// <summary>
    /// Iterates natural numbers until the threshold is exceeded
    /// </summary>
    public static IEnumerator<int> GetNaturalEnumerator()
    {
        for (int i = 1; i < 99999999; i++)
        {
            Console.WriteLine("GetNaturalEnumerator - Before YieldReturn");
            yield return i;
            Console.WriteLine("GetNaturalEnumerator - After YieldReturn");
            if (i > _threshold)
            {
                Console.WriteLine("GetNaturalEnumerator - Before YieldBreak");
                yield break;
                Console.WriteLine("GetNaturalEnumerator - After YieldBreak"); // unreachable code
            }
        }
    }

    /// <summary>
    /// Iterates natural numbers until the threshold is exceeded
    /// </summary>
    public static IEnumerable<int> GetNaturalNumbers()
    {
        for (var j = GetNaturalEnumerator(); j.MoveNext(); )
        {
            yield return j.Current;
        }
    }
  
How about this? I have shown that they are convertible in both directions, and the results were the same in every case.
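The Code 4 direction can also be written once as a general helper. The sketch below is a hypothetical extension method, not part of the .NET base class library. There is one caveat: the resulting IEnumerable wraps a single enumerator, so it can effectively be iterated only once. A second foreach finds the enumerator already exhausted.

```csharp
using System.Collections.Generic;

static class EnumeratorExtensions
{
    // Hypothetical helper: exposes an existing enumerator as an IEnumerable.
    // The caller remains responsible for disposing the source enumerator.
    public static IEnumerable<T> ToEnumerable<T>(this IEnumerator<T> source)
    {
        while (source.MoveNext())
            yield return source.Current;
    }
}
```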

Just for fun 

I also bet you might wonder what happens if the IEnumerable function delegates to the IEnumerator function while the IEnumerator function delegates back to the IEnumerable one. Seeing is believing.

(Code 5 - just for fun) 
    public static IEnumerator<int> GetNaturalEnumerator()
    {
        foreach (var i in GetNaturalNumbers())
        {
            yield return i;
        }
    }

    public static IEnumerable<int> GetNaturalNumbers()
    {
        for (var j = GetNaturalEnumerator(); j.MoveNext(); )
        {
            yield return j.Current;
        }
    }

    static void Main(string[] args)
    {
        for (var j = GetNaturalEnumerator(); j.MoveNext(); )
        {
            Console.WriteLine(j.Current);
            Console.ReadLine();
        }

        foreach (var i in GetNaturalNumbers())
        {
            Console.WriteLine(i);
            Console.ReadLine();
        }
    }
  
Guess what? 

LOL - neither function does any real work. Each one just delegates to the other, so the nested MoveNext() calls recurse until the program crashes with a StackOverflowException. That is really common in real life, too. :)

Conclusion 
  • 'yield return' and 'yield break' let functions that return IEnumerator or IEnumerable produce a sequence easily.
  • Functions that return them using yield work quite differently from ordinary functions: the body does not run until MoveNext() is called.
  • The code after 'yield return' resumes when MoveNext() is called again on the enumerator.
  • The code after 'yield break' never runs; the iteration ends there.
  • Functions returning IEnumerator and IEnumerable can be converted into each other in most cases, but don't forget that at least one of them has to do the real work. :)

