Storing long text and binary blobs with NHibernate

There comes a time when you have to store in the data­base not only pretty objects but also some not so pretty data. For exam­ple, let’s take the clas­si­cal entity – the invoice.

 

Let’s say your appli­ca­tion can receive invoice in three ways: by its own fron­tend, by email in rtf file, and by fax. In two later cases you may have a require­ment to store the rtf file and the scanned fax as a proof in the data­base. How to approach that?

Let’s start by sketch­ing our Invoice class:

public class Invoice
{
    public virtual Guid Id { get; protected set; }
    public virtual string InvoiceRtfString { get; set; }
    public virtual byte[] ScannedInvoiceJpg { get; set; }
    
    //a lot more stuff...
}

For the sake of exam­ple let’s say we want to keep the rtf file as text (well, that’s what it is under­neath), and we want to keep the scanned invoice as byte array.

Let’s cre­ate the sim­plest map­ping and see what NHiber­nate will pro­duce out of it.

<?xml version="1.0" encoding="utf-8" ?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2" 
                   namespace="ConsoleApplication1" 
                   assembly="ConsoleApplication1">
  <class name="Invoice">
    <id name="Id">
      <generator class="guid.comb" />
    </id>
    <property name="InvoiceRtfString"/>
    <property name="ScannedInvoiceJpg"/>
  </class>
</hibernate-mapping>

If we now ask NHiber­nate to gen­er­ate the schema for us, that is what it will produce:

create table Invoice (
    Id UNIQUEIDENTIFIER not null,
   InvoiceRtfString NVARCHAR(255) null,
   ScannedInvoiceJpg VARBINARY(8000) null,
   primary key (Id)
)

As you can see, we get quite lit­tle space for our Rtf file – hardly any file will fit in 255 char­ac­ters, and cer­tainly not one cre­ated by Microsoft Word.

If you check NHiber­nate doc­u­men­ta­tion it sug­gests using String­Clob for this, so let’s do just that. It also sug­gests we used Bina­ry­Blob type for the scanned jpg so let us apply this change as well, and see how it affects the gen­er­ated schema.

If we update the mapping:

<?xml version="1.0" encoding="utf-8" ?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2" 
                   namespace="ConsoleApplication1" 
                   assembly="ConsoleApplication1">
  <class name="Invoice">
    <id name="Id">
      <generator class="guid.comb" />
    </id>
    <property name="InvoiceRtfString" type="StringClob"/>
    <property name="ScannedInvoiceJpg" type="BinaryBlob"/>
  </class>
</hibernate-mapping>

…and gen­er­ate schema…

create table Invoice (
    Id UNIQUEIDENTIFIER not null,
   InvoiceRtfString NVARCHAR(255) null,
   ScannedInvoiceJpg VARBINARY(8000) null,
   primary key (Id)
)

…it will sur­pris­ingly be exactly the same. I’m gen­uinely sur­prised. Is there any­thing else we can do?

Quick googling leads to this old post by Ayende, where he sug­gests the solu­tion: spec­i­fy­ing sql-type explic­itly as NTEXT for the string column.

How­ever, as Scott White noticed, NTEXT is dep­re­cated since SQL Server 2008, we should use nvarchar(max) instead. Since we also expect size of seri­al­ized jpg images to be higher than 8kb, we’ll use varbinary(max) instead here as well.

So our final map­ping looks like this:

<?xml version="1.0" encoding="utf-8" ?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2" 
                   namespace="ConsoleApplication1" 
                   assembly="ConsoleApplication1">
  <class name="Invoice">
    <id name="Id">
      <generator class="guid.comb" />
    </id>
    <property name="InvoiceRtfString" type="StringClob">
      <column name ="RtfText" sql-type="nvarchar(MAX)"/>
    </property>
    <property name="ScannedInvoiceJpg" type="BinaryBlob">
      <column name ="JpgData" sql-type="varbinary(max)" />
    </property>
  </class>
</hibernate-mapping>

And NHiber­nate pro­duces the fol­low­ing DDL out of it:

create table Invoice (
    Id UNIQUEIDENTIFIER not null,
   RtfText nvarchar(MAX) null,
   JpgData varbinary(max) null,
   primary key (Id)
)

Which is exactly what we needed.

DNK Tags:
  • dario-g

    I never save blobs (images) in db. It's belongs to file sys­tem. In db I have only pointer (path, file­name) to that file. :)

  • http://michaelstechnical.blogspot.com/ Michael Hedg­peth

    Just what I was look­ing for; thanks.

  • http://www.tigraine.at/ Daniel Höl­bling

    @dario-g Actu­ally, I believe putting it in the DB is the most effi­cient way to go about this.

    Once you save stuff to the filesys­tem, you get all sorts of prob­lems and headaches that you oth­er­wise would avoid.

    Like:
    Backup/Recovery/Replication –Data­bases do this very well.
    Filesys­tem rights — Deploy­ment gets yet another thing you need to do right. And once IT-Admins start mess­ing around with ACLs you're usu­ally get­ting a call from a angry cus­tomer.
    Orga­ni­za­ton — Fold­ers with huge num­bers of files put a heavy strain on your sys­tem per­for­mance after time. I usu­ally have a set of sub­fold­ers that I dis­trib­ute files around to avoid per­for­mance issues fold­er­name = (FileId % 1000)

    greet­ings Daniel

  • aaron

    @daniel hol­bling When you have to the db you also "get all sorts of prob­lems and headaches that you oth­er­wise would avoid." Such as virus scan­ning, read/write per­for­mance or db dri­vers, all files require an appli­ca­tion to view them, and often data­base fea­tures become lim­ited when you use blob binary fea­tures such as filestream in sql200