Storing long text and binary blobs with NHibernate

There comes a time when you have to store in the database not only pretty objects but also some not so pretty data. For example, let’s take the classical entity – the invoice.

 

Let’s say your application can receive invoice in three ways: by its own frontend, by email in rtf file, and by fax. In two later cases you may have a requirement to store the rtf file and the scanned fax as a proof in the database. How to approach that?

Let’s start by sketching our Invoice class:

public class Invoice
{
    public virtual Guid Id { get; protected set; }
    public virtual string InvoiceRtfString { get; set; }
    public virtual byte[] ScannedInvoiceJpg { get; set; }
    
    //a lot more stuff...
}

For the sake of example let’s say we want to keep the rtf file as text (well, that’s what it is underneath), and we want to keep the scanned invoice as byte array.

Let’s create the simplest mapping and see what NHibernate will produce out of it.

<?xml version="1.0" encoding="utf-8" ?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2" 
                   namespace="ConsoleApplication1" 
                   assembly="ConsoleApplication1">
  <class name="Invoice">
    <id name="Id">
      <generator class="guid.comb" />
    </id>
    <property name="InvoiceRtfString"/>
    <property name="ScannedInvoiceJpg"/>
  </class>
</hibernate-mapping>

If we now ask NHibernate to generate the schema for us, that is what it will produce:

create table Invoice (
    Id UNIQUEIDENTIFIER not null,
   InvoiceRtfString NVARCHAR(255) null,
   ScannedInvoiceJpg VARBINARY(8000) null,
   primary key (Id)
)

As you can see, we get quite little space for our Rtf file – hardly any file will fit in 255 characters, and certainly not one created by Microsoft Word.

If you check NHibernate documentation it suggests using StringClob for this, so let’s do just that. It also suggests we used BinaryBlob type for the scanned jpg so let us apply this change as well, and see how it affects the generated schema.

If we update the mapping:

<?xml version="1.0" encoding="utf-8" ?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2" 
                   namespace="ConsoleApplication1" 
                   assembly="ConsoleApplication1">
  <class name="Invoice">
    <id name="Id">
      <generator class="guid.comb" />
    </id>
    <property name="InvoiceRtfString" type="StringClob"/>
    <property name="ScannedInvoiceJpg" type="BinaryBlob"/>
  </class>
</hibernate-mapping>

…and generate schema…

create table Invoice (
    Id UNIQUEIDENTIFIER not null,
   InvoiceRtfString NVARCHAR(255) null,
   ScannedInvoiceJpg VARBINARY(8000) null,
   primary key (Id)
)

…it will surprisingly be exactly the same. I’m genuinely surprised. Is there anything else we can do?

Quick googling leads to this old post by Ayende, where he suggests the solution: specifying sql-type explicitly as NTEXT for the string column.

However, as Scott White noticed, NTEXT is deprecated since SQL Server 2008, we should use nvarchar(max) instead. Since we also expect size of serialized jpg images to be higher than 8kb, we’ll use varbinary(max) instead here as well.

So our final mapping looks like this:

<?xml version="1.0" encoding="utf-8" ?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2" 
                   namespace="ConsoleApplication1" 
                   assembly="ConsoleApplication1">
  <class name="Invoice">
    <id name="Id">
      <generator class="guid.comb" />
    </id>
    <property name="InvoiceRtfString" type="StringClob">
      <column name ="RtfText" sql-type="nvarchar(MAX)"/>
    </property>
    <property name="ScannedInvoiceJpg" type="BinaryBlob">
      <column name ="JpgData" sql-type="varbinary(max)" />
    </property>
  </class>
</hibernate-mapping>

And NHibernate produces the following DDL out of it:

create table Invoice (
    Id UNIQUEIDENTIFIER not null,
   RtfText nvarchar(MAX) null,
   JpgData varbinary(max) null,
   primary key (Id)
)

Which is exactly what we needed.

DNK Tags:
  • dario-g

    I never save blobs (images) in db. It’s belongs to file system. In db I have only pointer (path, filename) to that file. :)

  • http://michaelstechnical.blogspot.com/ Michael Hedgpeth

    Just what I was looking for; thanks.

  • http://www.tigraine.at/ Daniel Hölbling

    @dario-g Actually, I believe putting it in the DB is the most efficient way to go about this.

    Once you save stuff to the filesystem, you get all sorts of problems and headaches that you otherwise would avoid.

    Like:
    Backup/Recovery/Replication -Databases do this very well.
    Filesystem rights – Deployment gets yet another thing you need to do right. And once IT-Admins start messing around with ACLs you’re usually getting a call from a angry customer.
    Organizaton – Folders with huge numbers of files put a heavy strain on your system performance after time. I usually have a set of subfolders that I distribute files around to avoid performance issues foldername = (FileId % 1000)

    greetings Daniel

  • aaron

    @daniel holbling When you have to the db you also "get all sorts of problems and headaches that you otherwise would avoid." Such as virus scanning, read/write performance or db drivers, all files require an application to view them, and often database features become limited when you use blob binary features such as filestream in sql200