Category: code snippets

Framework Tips IV: Check if character exists for given Encoding (CodePage)

In a project I’m currently working on, I needed to check whether a particular character is part of a given CodePage. The problem with .NET’s Encoding class is that although it maintains a table mapping Unicode characters to codes in a particular CodePage, it keeps that table in a private field. Moreover, it does its best to replace characters it does not contain with some fallback character.

One might use this fact and compare the character that comes back from the Encoding instance with the original character, assuming that if they differ, the character is not part of that CodePage. This is not an elegant solution, though, and it involves a lot of overhead: the char is first converted to byte[] and then back again.

Another solution is to use an overload of Encoding’s static GetEncoding method, like this:

Encoding.GetEncoding(1252, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

This way, when the user tries to convert a character that is not part of the given Encoding’s character set, the encoder fallback throws an exception. So one might use try/catch and be happy with it, but this too is an awful solution. It is also limiting: you have to create the Encoding instance yourself, so you’re helpless when you receive an arbitrary encoding.
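For reference, the two approaches dismissed so far could be sketched roughly like this. The class and method names (NaiveEncodingChecks, SurvivesRoundTrip, CanEncodeStrict) are made up for illustration and are not part of the framework.

```csharp
using System;
using System.Text;

// Illustrative helper; the names here are assumptions, not framework API.
public static class NaiveEncodingChecks
{
    // Approach 1: encode, decode again, and compare with the original character.
    public static bool SurvivesRoundTrip(Encoding encoding, char c)
    {
        byte[] bytes = encoding.GetBytes(new char[] { c });
        string decoded = encoding.GetString(bytes);
        return decoded.Length == 1 && decoded[0] == c;
    }

    // Approach 2: build a strict Encoding and let the exception fallback complain.
    public static bool CanEncodeStrict(int codePage, char c)
    {
        Encoding strict = Encoding.GetEncoding(codePage,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetByteCount(new char[] { c });
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }
}
```

Both work, but the first pays for two conversions per character and the second uses an exception for ordinary control flow, which is why neither felt right.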

After a little bit of poking around I came up with yet another solution that seems to be better, faster and more elegant than those two. However, I didn’t test it thoroughly, so it may have flaws as well (or may not even work at all in some cases). First, let the code speak:

using System;
using System.Text;

namespace ConsoleApplication3
{
    public class Program
    {
        static void Main(string[] args)
        {
            char s = '\x015f';  //exists in 1250 but not in 1252
            char a = 'a';       //exists in both
            char t = '\x00fe';  //exists in 1252 but not in 1250
            Encoding ce = Encoding.GetEncoding(1250);
            Encoding we = Encoding.GetEncoding(1252);
            Print(ce, we, s);
            Print(ce, we, a);
            Print(ce, we, t);
            Console.ReadKey();
        }

        private static void Print(Encoding ce, Encoding we, char c)
        {
            Console.WriteLine("{0}: {3}: {1,-6} {4}: {2,-6}",
                c, ce.Contains(c), we.Contains(c), ce.WebName, we.WebName);
        }
    }

    public static class EncodingExtensions
    {
        public static bool Contains(this Encoding encoding, char character)
        {
            Encoding enc = encoding;
            if (!(enc.EncoderFallback is EncoderFallbackCheckExists))
            {
                //clone whenever our fallback is not installed: the clone is
                //writable, and the caller's encoding stays untouched.
                //you might want to cache these, in order to avoid having to
                //clone given encoding every time.
                enc = (Encoding)encoding.Clone();
                enc.EncoderFallback = new EncoderFallbackCheckExists();
            }
            int result = enc.GetByteCount(new char[] { character }, 0, 1);
            return result > 0;
        }
    }

    internal class EncoderFallbackCheckExists : EncoderFallback
    {
        public override int MaxCharCount { get { return 1; } }

        public override EncoderFallbackBuffer CreateFallbackBuffer()
        { return new FallbackBufferCheckExists(); }
    }

    internal class FallbackBufferCheckExists : EncoderFallbackBuffer
    {
        public override int Remaining { get { return 0; } }

        public override bool Fallback(char charUnknown, int index)
        { return false; }

        public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
        { return false; }

        public override char GetNextChar() { return '\0'; }

        public override bool MovePrevious() { return false; }
    }
}

I created two classes: one inheriting from EncoderFallback, and one inheriting from EncoderFallbackBuffer. My idea was to give the Encoding instance a fake encoder fallback that never provides any fallback character. That way the Encoding fails silently (and fast), and its GetBytes and GetByteCount methods return an empty array and 0, respectively.

The only problem I had was injecting the actual EncoderFallbackCheckExists instance into the Encoding’s EncoderFallback property. Although this property has a setter, trying to set it when IsReadOnly is true raises an exception. Encoding, however, implements ICloneable, and cloning it does not preserve its read-only state. So after it’s cloned, you can safely assign its EncoderFallback.

I also created a simple EncodingExtensions class with a single extension method that wraps the whole logic and attaches it to the Encoding class, so that you can write:

Encoding encoding = Encoding.GetEncoding(1256);
bool b = encoding.Contains('ź');

Looks good to me, and as far as I’ve checked, it works. However, if you have a better idea how to accomplish this, please leave a comment.


Framework Tips II: How to get default ANSI Encoding for given culture

It’s sometimes useful to know the default ANSI CodePage for a given culture. It’s quite easy to find out, thanks to the System.Globalization namespace.

CultureInfo cultureInfo = CultureInfo.GetCultureInfo("pl-PL");
Encoding encoding = Encoding.GetEncoding(cultureInfo.TextInfo.ANSICodePage);

However, this code will not always work correctly. The problem is that not every culture has a default ANSI CodePage (ANSICodePage is then 0), so for those cultures you need a plan B, like using UTF-8.

CultureInfo cultureInfo = CultureInfo.CurrentUICulture;
int codePage = cultureInfo.TextInfo.ANSICodePage;
Encoding encoding = codePage.Equals(0) ?
                        Encoding.UTF8 :
                        Encoding.GetEncoding(codePage);

To see all such cultures, you can use the following code:

var cultures = from c in CultureInfo.GetCultures(CultureTypes.AllCultures)
               where c.TextInfo.ANSICodePage.Equals(0)
               select c;

foreach (CultureInfo info in cultures)
    Console.WriteLine("{0,-10}{1}", info.Name, info.EnglishName);

Nullable<bool> GetHashCode() – bug or a feature?

Today I stumbled upon a strange bug that seems to be a feature of the .NET Framework. I had a method that performed some action upon an instance of a class, let’s say Customer, based on the hash value of that record. Seems plain and simple; however, my unit test exhibited strange behavior: in some cases, although the Customer record had been updated, it acted as if it had not changed.

A short investigation pointed to a field of type bool? (Nullable<bool>) that returned the same hash code although its value had changed.

The problem is that the generic struct Nullable<T> implements GetHashCode like this:

public override int GetHashCode()
{
    if(this.HasValue)
        return Value.GetHashCode();
    return 0;
}

System.Boolean implements the same method like this:

public override int GetHashCode()
{
    if(this)
        return 1;
    return 0;
}

It boils down to the fact that for both null and false we get a hash value of 0. That’s why, although Customer changed the value of that field from false to null, or the other way around, its hash value was still the same.

I consider this a bug, but on the other hand, returning -1 or 2 might save the day for Nullable<bool>, but how about Nullable<Int32>?
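A minimal sketch of the collision and of one possible workaround for change detection. The NullableBoolHash class and its Distinct method are hypothetical names introduced here; they only map the three states of bool? to three different numbers.

```csharp
using System;

// Hypothetical helper, not part of the framework.
public static class NullableBoolHash
{
    // True when the framework hashes of the two values are equal.
    public static bool FrameworkHashesCollide(bool? left, bool? right)
    {
        return left.GetHashCode() == right.GetHashCode();
    }

    // Maps null, false and true to three distinct values,
    // so a change between null and false is no longer invisible.
    public static int Distinct(bool? value)
    {
        if (!value.HasValue)
            return -1;
        return value.Value ? 1 : 0;
    }
}
```

FrameworkHashesCollide(null, false) is true, while Distinct(null) and Distinct(false) differ. As the question above already hints, this trick has no equivalent for Nullable<Int32>, where every int value is already taken.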


Using ‘using’

Using classes (and structures, for that matter) that implement IDisposable has one implication: when you’re done using an instance, you should call its Dispose() method as soon as possible.

Like in the example:

Pies item1 = new Pies("Pies 1");
Console.WriteLine("Accessing {0}", item1.Name);
item1.Dispose();

To make life easier and not have to remember to call this method directly, you can use some ‘syntactic sugar’ in the form of the ‘using’ keyword and rewrite this example as (more or less, as we’ll see in just a second) equivalent code:

using(Pies item1 = new Pies("Pies 1"))
{
    Console.WriteLine("Accessing {0}",item1.Name);
}

This approach has quite a few advantages: you are freed from calling Dispose() explicitly (it gets called after the last line of code between ‘{‘ and ‘}’ finishes executing), it’s more readable, and it limits the visibility of item1, which may be desirable in some cases (for example, you will not be able to use item1 after it’s disposed of, which might cause some nasty run-time errors).

As I said, the ‘using’ keyword is just syntactic sugar. Beneath it is code that looks like this (via Reflector):

Pies item1;
bool CS$4$0000;
item1 = new Pies("Pies 1");
Label_000C:
try
{
    Console.WriteLine("Accessing {0}", item1.Name);
    goto Label_0031;
}
finally
{
Label_0021:
    if ((item1 == null) != false)
    {
        goto Label_0030;
    }
    item1.Dispose();
Label_0030:;
}
Label_0031:
    return;

It wraps the code we put inside the ‘using’ block in try/finally, just to make sure that Dispose() gets called even if something goes wrong.

You can nest ‘using’ statements one within another, like this:

using (Pies item1 = new Pies("Pies 1"))
{
    using (Pies item2 = new Pies("Pies 2"))
    {
        Console.WriteLine("Accessing {0}", item1.Name);
        Console.WriteLine("Accessing {0}", item2.Name);
    }
}

or in a shorter form:

using (Pies item1 = new Pies("Pies 1"))
using (Pies item2 = new Pies("Pies 2"))
{
    Console.WriteLine("Accessing {0}", item1.Name);
    Console.WriteLine("Accessing {0}", item2.Name);
}

What I didn’t know was that when all the variables you’re ‘using’ are of the same type (as in the example, where both item1 and item2 are of type Pies), you can shorten it even further to this form:

using (Pies item1 = new Pies("Pies 1"), item2 = new Pies("Pies 2"))
{
    Console.WriteLine("Accessing {0}", item1.Name);
    Console.WriteLine("Accessing {0}", item2.Name);
}

All three snippets result in identical IL. I’d most often opt for the second form. It’s concise, readable, and lets me use variables of more than one type (like StreamReader/StreamWriter).
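One consequence of the nested try/finally blocks is worth seeing in action: resources are disposed in reverse order of declaration. Here is a small sketch, where Tracked and UsingOrderDemo are made-up types that only record the order of the Dispose() calls.

```csharp
using System;
using System.Collections.Generic;

// Made-up disposable that records its name when disposed.
public sealed class Tracked : IDisposable
{
    private readonly string _name;
    private readonly List<string> _log;

    public Tracked(string name, List<string> log)
    {
        _name = name;
        _log = log;
    }

    public void Dispose()
    {
        _log.Add(_name);
    }
}

public static class UsingOrderDemo
{
    // Returns the names in the order in which Dispose() was called.
    public static List<string> Run()
    {
        List<string> log = new List<string>();
        using (Tracked first = new Tracked("first", log))
        using (Tracked second = new Tracked("second", log))
        {
            // the work with both resources would go here
        }
        return log;
    }
}
```

Run() returns the names with "second" before "first": the inner resource is released before the outer one, just like with explicitly nested blocks.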


Convert int to string as hex number

Today I needed to parse colors encoded as strings in the form 0xRRGGBB, where RR, GG and BB are the red, green and blue values of a given color encoded in hexadecimal.

The problem I stumbled upon was: what if I have a number like:

0x008000

There was no problem converting it to System.Drawing.Color, but converting it back to a string was another matter.

I used code like below:

string colorString = string.Format("0x{0:X}{1:X}{2:X}",
                       color.R,
                       color.G,
                       color.B);

This would yield:

0x0800

Which is definitely not what I wanted. I needed a way to force a number to be emitted with a leading zero when it’s small enough to fit in one digit.

After a much too long search and a lot of trial and error, I found a solution that was so obvious once I discovered it that I felt it should have been the first thing to try: you just put a number right after ‘X’, indicating how many characters you want the number to have. Considering that I tried ‘X,2’, ‘X:2’, ‘X;2’, ‘X,00’ and several more before I tried this, I now feel… well, not very good about myself 😉

string string1 = @"0x008000";
Color color = (Color) new ColorConverter().ConvertFromString(string1);
string colorString = string.Format("0x{0:X2}{1:X2}{2:X2}",
                               color.R,
                               color.G,
                               color.B);

Took me like 30 minutes to figure it out.
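The same round trip can be sketched without System.Drawing, working on the raw channel bytes. HexColor and its two methods are illustrative names, not part of any library.

```csharp
using System;

// Illustrative helper for 0xRRGGBB strings; the names are assumptions.
public static class HexColor
{
    // X2 pads every channel to two hex digits, so 0 becomes "00" and 128 becomes "80".
    public static string ToHex(byte r, byte g, byte b)
    {
        return string.Format("0x{0:X2}{1:X2}{2:X2}", r, g, b);
    }

    // Splits a 0xRRGGBB string back into its three channels.
    public static void Parse(string text, out byte r, out byte g, out byte b)
    {
        // With base 16, Convert.ToInt32 accepts an optional "0x" prefix.
        int value = Convert.ToInt32(text, 16);
        r = (byte)((value >> 16) & 0xFF);
        g = (byte)((value >> 8) & 0xFF);
        b = (byte)(value & 0xFF);
    }
}
```

With this, Parse("0x008000", ...) yields 0, 128, 0, and ToHex(0, 128, 0) gives back "0x008000" instead of the truncated "0x0800".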

How do You regionerate your code?

I’ve been using Regionerate for some time, and I’m addicted to it. Literally: when I have to write code on a computer that doesn’t have Regionerate installed, I feel odd. This tool is simply pure honey and nuts. The only thing I would change is its default keyboard mapping (ctrl+R to run it), because it collides with the Visual Studio/ReSharper “Refactor” shortcut. So every time I install it, I have to go to the VS settings and change it to something else (alt+3 at the moment).

The main reason for this post, however, is not to praise Rauchy and his tool, but to talk a little bit about its customization capabilities. Regionerate is XML driven, that is, its regioneration (strange word, huh?) settings are kept in an XML file. It comes with an XSD, so when you edit the file in VS you get IntelliSense, which is pretty sweet and will save you a lot of time.

The simplest possible Regionerate settings file would look like this:

<CodeLayout xmlns="http://regionerate.net/schemas/0.6.3.8/CodeLayout.xsd">
    <ForEachClass>
        <CreateRegion>
            <PutFields>
            </PutFields>
        </CreateRegion>
    </ForEachClass>
</CodeLayout>

It creates a region and puts all fields into it, like below:

namespace Xtoff.Tmx.Helpers
{
    public class TmxLanguage
    {
        
        #region [rgn] Unnamed Region (1)
 
        private readonly string _value;
 
        #endregion [rgn]
 
        public TmxLanguage(string value)
        {
            _value = value;
        }
        public string Value
        {
            get { return _value; }
        }
    }
}

All fields were put in a single region, and all other members were left below. Hooray! However, I guess very few would be satisfied at this point.

Before we move on, however, there are a few facts to note.

First of all, the region’s name: [rgn] Unnamed Region (1)

[rgn] is the standard prefix Regionerate uses to mark its regions. It was introduced because, without some kind of differentiator, Regionerate would break your manually created regions when regionerating your file. Thanks to the prefix, it only looks at the parts of your class that are not inside any region or are inside a Regionerate-created region. You can change this prefix or remove it. Keep in mind, however, that without it every region will be treated as a Regionerate-created region.

The next thing is the region’s title. We didn’t set it, so Regionerate used the default. I don’t have to tell you that you DO want to name your regions :).

And finally, (1) indicates how many elements are in the region. VERY useful when dealing with large files.

The next step would then probably be setting a name and looking at the other options we have.

If you go back to CreateRegion, hit space and wait for IntelliSense to come up, you’ll be presented with 4 options:

Separating lines: allows you to specify how many blank lines you want Regionerate to leave between members in a region.

ShowCount: a flag that lets you turn off showing the count of members inside a region. It defaults to true, and I don’t recommend changing it.

 

Style: this is one of the best and least known features.

The three valid options are Visible, Comment and Invisible. Visible is the default, and it wraps your code in a region as seen above.

Comment will clean up your code, but instead of enclosing it within a region it only puts a comment on top of all the fields, like this:

        // [rgn] Unnamed Region (1)
 
        private readonly string _value;

Invisible will clean up your code, but it won’t put in any regions or comments.

Title: sets the title for region 🙂

Going down the XML tree, we can define what we want to put in our region. In our example we chose fields, but you can put in basically any class member (field, property, method, event and so on) or an inner region. You can have multiple Put* elements in a region.

Now we’re getting into the really interesting stuff, that is, defining filters for the specific elements we want to put in a region. In the example above we chose to keep all fields in this region, but we could have come up with something much more sophisticated, like a region for non-serialized public fields with names matching a certain regular expression.

I won’t explain every single option in detail, because there are so many that it would take too long. There are also differences between types of elements (for example, for properties you can filter by accessors). In 9 cases out of 10 you will be able to create the rules you want. You can’t create rules like “Region for methods that subscribed to some events” unless you have a naming convention for those, because that would require analysis at a higher level of abstraction, but nonetheless it’s pretty sweet.

And for those interested, I attach my Regionerate settings file.


Fun with ?: operator

First of all, take a look at the following code:

        private string _targetText;
        private int _maxLines;
        private int _maxSize;
 
        public int Lines
        {
            get
            {
                if (_targetText == null)
                    return 0;
                return _targetText.Split('\n').Length;
            }
        }
 
        public bool IsValid
        {
            get
            {
                return  _maxSize == 0 ?
                    true :
                    Size <= _maxSize
                    &&
                    _maxLines == 0 ?
                    true :
                    Lines <= _maxLines;
            }
        }

It’s fairly simple. The most important piece is the IsValid property, which checks whether _targetText meets certain length and line-count limits.

Now, let’s say that max size is 5, max lines is 1 and the target text is “Some incredibly long piece of text“. The million dollar question is: what would IsValid return for these parameters?

 

It would actually return true, because there is a subtle bug in this code. It may not be apparent, and it’s a tricky beast, because you have to know how to look at it to see what’s actually going on. The reason it returns true, when all signs on earth and in heaven say it should return false, is operator precedence and the way ?: gets translated by the compiler.

Thinking logically, we would expect the code to check whether _maxSize is 0 and, if it is, set the left-hand flag to true, and otherwise set it to whether Size is less than or equal to _maxSize; then do a similar thing with _maxLines and Lines to set the right-hand flag; and finally return true if both flags are true, and false otherwise. By thinking this way we assume that it will first evaluate both ?: operators and then && the results. In other words, we assume that the ?: operator has higher precedence than the && operator, which turns out not to be true: && binds tighter, so everything after the first ‘:’ becomes the else branch of the first ?:.

That’s because people think of the ?: operator as shorthand for if/else, whereas mighty Reflector reveals what the compiler really made of it. When we compile the code above and then open it in Reflector, we’ll see code like this (line breaks and indents added to make it easier to read):

        public bool IsValid
        {
            get
            {
                return 
                    (
                        (this._maxSize == 0) 
                        || 
                        (
                            (
                                (this.Size <= this._maxSize) 
                                && 
                                (this._maxLines == 0)
                            ) 
                            || 
                            (this.Lines <= this._maxLines)
                        )
                    );
            }
        }

I suspect that this code looks slightly different from what you expected. No if/else, only logical ands and ors. And that’s the reason for the unexpected output. If you examine the code closely, you’ll notice that no matter what Size and _maxSize are, if the number of lines is not greater than _maxLines it returns true. So how to fix the code? Either by wrapping each ?: expression in parentheses, or by moving them to separate properties/methods like this:

        public bool IsValid
        {
            get
            {
                return HasValidSize
                    && HasValidLineCount;
            }
        }
 
        public bool HasValidLineCount
        {
            get
            {
                return _maxLines == 0 ?
                    true :
                    Lines <= _maxLines;
            }
        }
 
        public bool HasValidSize
        {
            get
            {
                return _maxSize == 0 ?
                    true :
                    Size <= _maxSize;
            }
        }
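The other fix, plain parentheses, can be sketched as follows. TextLimits is a made-up wrapper class, and Size is assumed to be the length of the text, since the original snippet does not show how it is computed. The buggy expression is kept next to the fixed one so the two can be compared on the same input.

```csharp
using System;

// Made-up class for illustration; Size is an assumption (length of the text).
public class TextLimits
{
    private readonly string _targetText;
    private readonly int _maxSize;
    private readonly int _maxLines;

    public TextLimits(string targetText, int maxSize, int maxLines)
    {
        _targetText = targetText;
        _maxSize = maxSize;
        _maxLines = maxLines;
    }

    public int Size
    {
        get { return _targetText == null ? 0 : _targetText.Length; }
    }

    public int Lines
    {
        get { return _targetText == null ? 0 : _targetText.Split('\n').Length; }
    }

    // Original expression: the second ?: swallows the && and becomes
    // the else branch of the first ?:.
    public bool IsValidBuggy
    {
        get
        {
            return _maxSize == 0 ? true : Size <= _maxSize && _maxLines == 0 ? true : Lines <= _maxLines;
        }
    }

    // Parenthesized: each ?: is evaluated on its own, then the results are combined.
    public bool IsValid
    {
        get
        {
            return (_maxSize == 0 ? true : Size <= _maxSize)
                && (_maxLines == 0 ? true : Lines <= _maxLines);
        }
    }
}
```

For new TextLimits("Some incredibly long piece of text", 5, 1), IsValidBuggy is true while IsValid is false.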

I hope that was informative.

Replace text elements with Regular Expressions

I love/hate regular expressions. I love them for their flexibility and the amount of time you can save using RegEx as opposed to manipulating strings manually. I hate them because writing them is such a pain in the… you get the point. Today I had to quickly assemble a small tool that would replace certain elements in a text file. To be more precise, it had to read lots of small text files that were kind of bilingual, meaning English/Chinese, and change them to true Unicode bilingual. I say kind of, because the files were written in plain ASCII, with English text written normally and Chinese encoded like this: #$2536#$5231#$AFF, that is, #$, then one or two chars denoting the older (high-order) byte’s code, and two chars denoting the younger (low-order) byte’s code. It would be quite hard to do manually, especially since the files were a little more complicated than I presented here.

I used the Regex class’s Replace method, which is specifically designed to help you replace elements in a string. It takes the string that you want to modify and a MatchEvaluator delegate. The MatchEvaluator gets called every time a match occurs in the input string; it receives a Match object representing that match and returns the string that substitutes the matched element. It may seem complicated, but the actual code is plain and simple:

public string Decode(string encodedString, Regex pattern)
{
    return pattern.Replace(encodedString, Replace);
}

private string Replace(Match match)
{
    string older = match.Groups["Older"].Value;
    string younger = match.Groups["Younger"].Value;
    char character = Convert(older, younger);
    return character.ToString();
}

private char Convert(string first, string second)
{
    if (second.Length == 1)
        second = "0" + second;
    //fully qualified, because this method hides the System.Convert class name here
    return (char)(System.Convert.ToInt32(first + second, 16));
}

Groups ‘Older’ and ‘Younger’ denote the location of the older (high-order) and younger (low-order) byte of the Unicode character code.

The ‘Replace’ method simply takes the byte codes from the matched string and calls Convert, which returns the single character represented by that code; that character is put in place of the matched string. Using this simple approach I can easily substitute those codes with the actual characters they represent.
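The pattern itself is not shown above. A self-contained sketch could look like the one below, where the regular expression is an assumption reconstructed from the description of the format (#$, one or two hex digits, then two hex digits) and the name EscapedTextDecoder is made up.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Illustrative decoder; the pattern is an assumption based on the format described above.
public static class EscapedTextDecoder
{
    private static readonly Regex Pattern = new Regex(
        @"#\$(?<Older>[0-9A-Fa-f]{1,2})(?<Younger>[0-9A-Fa-f]{2})");

    public static string Decode(string encoded)
    {
        // The evaluator is invoked once per match and returns the replacement text.
        return Pattern.Replace(encoded, new MatchEvaluator(ReplaceMatch));
    }

    private static string ReplaceMatch(Match match)
    {
        string hex = match.Groups["Older"].Value + match.Groups["Younger"].Value;
        int code = int.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        return ((char)code).ToString();
    }
}
```

With this pattern, "#$4E2D" is replaced by the single character U+4E2D, and the three-digit form "#$AFF" by U+0AFF; text outside the escapes is left untouched.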

Converting custom Strings to DateTime

One of the projects I’m currently working on involves reading a file produced by another tool that has a rather unusual way of storing date and time. For example, 1st of July 2007, 14:00:00 would be stored as 20070701T140000Z.

Using Convert.ToDateTime(string) or DateTime.Parse(string) throws a FormatException. Parsing manually (splitting the string and parsing its substrings to ints to create a DateTime object from them) is not a very elegant solution. There is, however, the static method DateTime.ParseExact(string, string, IFormatProvider). The first parameter is the encoded DateTime, the second is the pattern, and the third can be null. How to create the pattern you can learn from this MSDN article.

Putting it all together, here’s how I parsed those strings to DateTime and DateTime to strings:

DateTime dt = DateTime.ParseExact("20070622T142203Z", "yyyyMMdd'T'HHmmss'Z'", null);
string s = dt.ToString("yyyyMMdd'T'HHmmss'Z'");
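One thing to keep in mind: because 'Z' is quoted in the pattern, it is matched as a plain literal and the resulting DateTime has Kind set to Unspecified. If the trailing Z really means UTC in your files (an assumption about the producing tool), a variant like the sketch below keeps that information. CompactTimestamp is a made-up name.

```csharp
using System;
using System.Globalization;

// Illustrative helper; assumes the trailing 'Z' in the files means UTC.
public static class CompactTimestamp
{
    private const string Format = "yyyyMMdd'T'HHmmss'Z'";

    public static DateTime Parse(string text)
    {
        // AssumeUniversal treats the parsed value as UTC,
        // AdjustToUniversal keeps it in UTC instead of converting to local time.
        return DateTime.ParseExact(text, Format, CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
    }

    public static string ToText(DateTime value)
    {
        return value.ToUniversalTime().ToString(Format, CultureInfo.InvariantCulture);
    }
}
```

Passing CultureInfo.InvariantCulture instead of null also makes the result independent of the current thread culture.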

Compare paths from the end in C#

Today at work, a colleague came to me with quite an interesting problem. He needed to find the first common directory of two given paths, starting from the end.

For example, for given paths like:

c:\documents and settings\some user\my files\projects\project1\initialFiles\somefiles\ and

d:\ My Projects\project1\ChangedFiles\MyFiles\

It would return ‘project1‘.

I was surprised to find out that neither System.IO.Path nor System.IO.Directory allows you to do that. Here’s a simple solution I created for him.

public static string FindLastCommonParentFolder(string path1, string path2)
{
    if (string.IsNullOrEmpty(path1))
        throw new ArgumentException();
    if (string.IsNullOrEmpty(path2))
        throw new ArgumentException();
    try
    {
        //ensures that paths are valid
        path1 = Path.GetFullPath(path1);
        path2 = Path.GetFullPath(path2);
    }
    catch (PathTooLongException ex)
    { /* handle exception */ }
    catch (ArgumentException ex)
    { /* handle exception */ }
    catch (NotSupportedException ex)
    { /* handle exception */ }
    catch (SecurityException ex)
    { /* handle exception */ }

    //identical paths: return the name of the last folder
    if (path1 == path2)
        return Path.GetFileName(path1.TrimEnd(Path.DirectorySeparatorChar));

    //drop empty entries, otherwise a trailing separator yields an empty
    //"folder" at the end of both arrays that would match immediately
    char[] separator = { Path.DirectorySeparatorChar };
    string[] folders1 = path1.Split(separator, StringSplitOptions.RemoveEmptyEntries);
    string[] folders2 = path2.Split(separator, StringSplitOptions.RemoveEmptyEntries);
    if (folders1.Length < 1 || folders2.Length < 1)
        return string.Empty;

    //walk path1 from the end and look for the same folder name anywhere in path2
    for (int i = folders1.Length - 1; i >= 0; i--)
    {
        for (int j = folders2.Length - 1; j >= 0; j--)
        {
            if (folders1[i] == folders2[j])
                return folders1[i];
        }
    }
    return string.Empty;
}