At work we’re currently in the process of migrating our larger API project from a Windows host running under IIS to a Linux host running under Mono. The project is written using ServiceStack, which means we can use its built-in self-hosting mechanism instead of having to run under XSP, which is really nice. We already successfully made the transition with a smaller API project several weeks ago, so we were able to apply the things we learned during that migration to avoid many of the trials and tribulations involved with getting an existing .NET project to compile and run under Mono this time around. I may write more about that process some time in the future, but that’s not what this post is about.

This post is about something much more specific and unlikely to be of interest to almost anyone except me, and maybe some hypothetical programmer who finds themselves in the exact same scenario and happens to stumble across this post after tearing their hair out all night. To that hypothetical programmer, you are welcome. To everyone else, I am sorry. Unless you actually do find this post interesting for some reason, in which case you are also welcome.

Working in the healthcare industry means having to be HIPAA compliant; HIPAA is basically a set of rules and guidelines intended to keep personal health information protected from prying eyes (that’s probably grossly oversimplifying it, but you get the idea). Part of our service involves storing sensitive information in S3, which, in order to be HIPAA compliant, has to be encrypted. We also have to decrypt it when an authorized user wishes to retrieve it. Fortunately, Amazon has an official AWSSDK library on NuGet which makes communicating with S3 a breeze, and .NET has a ton of standard crypto functionality built in which makes encrypting and decrypting just as easy. Here’s the cool part:

The Cool Part

Amazon’s S3 SDK deals in streams when you upload and download, and .NET has this thing called a CryptoStream which lets you encrypt and decrypt streams on the fly. Nifty! An astute programmer might put two and two together and realize that uploads could be handled by passing the stream from the client directly into a CryptoStream and handing that to the S3 SDK, encrypting the data as the user uploads it and sending the encrypted bytes straight to S3 for storage. That way the whole process completes without ever storing the sensitive information on the server’s disk, or even holding it all at once in memory. Similarly, downloads could be handled by passing the encrypted bytes from S3 through a CryptoStream and straight on to the client, again without ever touching the server’s disk or holding everything in memory.

Here’s some example code to show just how slick the upload could look:

public void EncryptAndUpload(Stream stream, AmazonS3Client s3Client,
  string bucket, string key)
{
  using(var aes = new AesManaged
  {
    BlockSize = 128,
    KeySize = 256,
    Mode = CipherMode.CBC
  })
  {
    aes.GenerateKey();
    aes.GenerateIV();

    using(var encryptor = aes.CreateEncryptor())
    {
      using(var cstream = new CryptoStream(stream, encryptor, CryptoStreamMode.Read))
      {
        s3Client.PutObject(new PutObjectRequest
        {
          BucketName = bucket,
          Key = key,
          InputStream = cstream
        });
      }
    }
  }
}

Sweet, right? You pass it your stream and the S3 client, and it encrypts the stream and stores it in S3 for you. Instead of reading all the unencrypted bytes into memory, encrypting them in memory, then sending all the encrypted bytes to S3 like some noob, we’re getting all streamed up in that shit. Stream-style, baby.

Wrong.

If you try running this, you’ll get an exception: the AmazonS3Client wants to know the length of the stream before uploading it, and CryptoStream.Length throws a NotSupportedException (it has no way of knowing the length of the encrypted stream before the data has actually been encrypted). Another problem, even if this had worked, is that you’d have no way of verifying that the encrypted content in S3 hasn’t been tampered with when you come back to decrypt it later. Well, wipe away those tears, my friend, we’re not finished yet.

Two Birds with One Stone

CryptoStream may not know how to calculate the final size of its encrypted bytes, but we do! We are specifying an AesManaged.BlockSize value of 128 bits, which means the final stream will be encoded in chunks of 128 bits (16 bytes). With the default PKCS7 padding there is always at least one byte of padding, so the encrypted length is the original length rounded up to the next 16-byte boundary, and an original that is already a multiple of 16 bytes picks up a whole extra block. You can calculate it from the original stream using the formula streamLength + blockSize - (streamLength % blockSize), where everything is measured in bytes. Trust me, I got that off StackOverflow so it must be right.
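
Here’s that formula as a tiny helper, for the record. This is just a sketch: GetEncryptedLength is a name I made up, and it takes the block size in bits to match AesManaged.BlockSize.

public static long GetEncryptedLength(long plaintextLength, int blockSizeBits)
{
  // AesManaged.BlockSize is in bits; the arithmetic below is in bytes
  var blockSizeBytes = blockSizeBits / 8;

  // PKCS7 padding always adds at least one byte, so a plaintext that's
  // already block-aligned still grows by a full block
  return plaintextLength + blockSizeBytes - (plaintextLength % blockSizeBytes);
}

For example, GetEncryptedLength(1000, 128) returns 1008, while GetEncryptedLength(1024, 128) returns 1040.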

Another nifty feature of Amazon’s S3 is that it stores the MD5 hash of the data at a particular key as metadata—namely the ETag value (at least for objects uploaded in a single, non-multipart PUT, which is what we’re doing). If we calculate the MD5 hash of our encrypted data locally, we can check it against the ETag in S3 later to make sure they still match before we decrypt.

So, if you’ve been paying attention, you’ve probably figured out that what we want is a way to wrap our CryptoStream with a facade that can a) report the length of the final encrypted stream and b) calculate the MD5 of the encrypted data as it’s being read by the AmazonS3Client.

using System;
using System.IO;
using System.Security.Cryptography;

public class MD5CalculatorStream : Stream
{
  private readonly Stream _stream; // the stream to wrap
  private long? _length; // stream.Length override
  private bool _firstPass; // we only need to calculate the MD5 once
  private readonly MD5CryptoServiceProvider _md5 = new MD5CryptoServiceProvider();

  // constructor takes a stream and an optional pre-calculated length override
  public MD5CalculatorStream(Stream stream, long? length = null)
  {
    _stream = stream;
    _length = length;
    _firstPass = true;
    _md5.Initialize();
  }

  // property to access the MD5 after it's been calculated
  public byte[] MD5
  {
    get { return _md5.Hash; }
  }

  // pass SetLength calls to our override if we have one
  public override void SetLength(long value)
  {
    if(_length.HasValue)
      _length = value;
    else
      _stream.SetLength(value);
  }

  // here's the meat and potatoes, calculate the MD5 as the stream is read
  public override int Read(byte[] buffer, int offset, int count)
  {
    // calculate the MD5 in blocks as the stream is read
    var bytesRead = _stream.Read(buffer, offset, count);
    if(_firstPass)
    {
      _md5.TransformBlock(buffer, offset, bytesRead, null, 0);

      // if that was the last block, finalize the MD5 hash (a short read
      // from CryptoStream means it has hit the end of the underlying data)
      if(bytesRead < count)
      {
        _md5.TransformFinalBlock(new byte[0], 0, 0);
        _firstPass = false;
      }
    }
    return bytesRead;
  }

  // return our length override if it exists
  public override long Length
  {
    get { return _length ?? _stream.Length; }
  }

  // amazon also calls this for some reason so we need it to not throw errors
  // doesn't seem to matter that the values we return are incorrect
  public override long Position
  {
    get { return _length.HasValue ? 0 : _stream.Position; }
    set { if (!_length.HasValue) _stream.Position = value; }
  }

  // override the rest of Stream's members, passing the calls directly to _stream
  // ...
}

The above class does what we want—it can report a stream length that you give it beforehand, and it uses the MD5CryptoServiceProvider to calculate the MD5 of the stream as it’s being read. I didn’t include the complete implementation, which involves creating simple wrapper functions for the other members of Stream such as Stream.Flush, Stream.Write, etc. For those you simply call the same function on _stream in order to pass the call through. The exception may be the Seek-related methods: since seeking would throw off the MD5 calculation, you probably don’t want to let anyone do it on one of these streams. Up to you. AmazonS3Client doesn’t do any seeking, so it doesn’t matter either way for the purposes of this article.
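
For the curious, here’s roughly what those remaining overrides might look like. This is only a sketch under the assumptions above: everything passes straight through to _stream, and refusing to seek outright is my call rather than a hard requirement.

// simple pass-throughs for the capability flags; reporting CanSeek as
// false is a choice, since seeking would break the MD5 calculation
public override bool CanRead { get { return _stream.CanRead; } }
public override bool CanWrite { get { return _stream.CanWrite; } }
public override bool CanSeek { get { return false; } }

public override void Flush()
{
  _stream.Flush();
}

public override void Write(byte[] buffer, int offset, int count)
{
  _stream.Write(buffer, offset, count);
}

// a seek mid-read would corrupt the running MD5, so just say no
public override long Seek(long offset, SeekOrigin origin)
{
  throw new NotSupportedException();
}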

Here’s what our new upload function looks like:

public void EncryptAndUpload(Stream stream, AmazonS3Client s3Client,
  string bucket, string key)
{
  using(var aes = new AesManaged
  {
    BlockSize = 128,
    KeySize = 256,
    Mode = CipherMode.CBC
  })
  {
    aes.GenerateKey();
    aes.GenerateIV();

    using(var encryptor = aes.CreateEncryptor())
    {
      // calculate the future length of the encrypted stream
      // (BlockSize is in bits, so shift right by 3 to convert to bytes)
      var clen = stream.Length + (aes.BlockSize >> 3)
        - (stream.Length % (aes.BlockSize >> 3));

      using(var cstream = new MD5CalculatorStream(
        new CryptoStream(stream, encryptor, CryptoStreamMode.Read),
        clen))
      {
        s3Client.PutObject(new PutObjectRequest
        {
          BucketName = bucket,
          Key = key,
          InputStream = cstream
        });
      }
    }
  }
}
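
One thing this example glosses over: the download function below needs the key, the IV, and the MD5 handed back to it, so in real life you’d have to stash those right after the PutObject call, while aes and cstream are still in scope. Something like the following, where EncryptionRecord and SaveRecord are hypothetical stand-ins for whatever persistence you actually use:

// a hypothetical record holding everything decryption will need later
public class EncryptionRecord
{
  public byte[] Key { get; set; }
  public byte[] IV { get; set; }
  public byte[] MD5 { get; set; }
}

// inside EncryptAndUpload, immediately after s3Client.PutObject(...):
SaveRecord(key, new EncryptionRecord
{
  Key = aes.Key,
  IV = aes.IV,
  // the MD5 is only valid once the S3 client has read the whole stream
  MD5 = cstream.MD5
});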

Voila. Shit just works now. Here’s what the download function looks like:

public Stream DownloadAndDecrypt(AmazonS3Client s3Client,
  string bucket, string s3key,
  byte[] key, byte[] iv, byte[] md5)
{
  var get = s3Client.GetObject(new GetObjectRequest
  {
    BucketName = bucket,
    Key = s3key
  });

  // make sure the ETag matches our MD5
  // if not, the data in S3 changed since we encrypted it!
  var md5str = BitConverter.ToString(md5).Replace("-", string.Empty);
  if(!get.ETag.Trim('"').Equals(md5str, StringComparison.OrdinalIgnoreCase))
    throw new Exception("OH CRAP WE'VE BEEN HACKED ALERT THE CEO!!");

  using(var aes = new AesManaged
  {
    BlockSize = 128,
    KeySize = 256,
    Mode = CipherMode.CBC,
    Key = key,
    IV = iv
  })
  {
    var decryptor = aes.CreateDecryptor();
    return new CryptoStream(get.ResponseStream, decryptor, CryptoStreamMode.Read);
  }
}

Bam! This function first verifies that our MD5 matches the ETag from S3, then sets up a CryptoStream to decrypt the data which can then be passed directly to the client. No problemo! Done, right?

Wrong.

If you’re on Windows, then yeah, you’re done. On Mono, as of the date of this post, you are not. There’s a bug somewhere in either the AWSSDK or the Mono runtime that causes the CryptoStream to throw an exception on the final block as it decrypts.

Why I Cried Myself to Sleep that Night

I spent a long time trying to figure out a clean way to work around this problem, but in the end I had to move on in to Noobsville.

public Stream DownloadAndDecrypt(AmazonS3Client s3Client,
  string bucket, string s3key,
  byte[] key, byte[] iv, byte[] md5)
{
  var get = s3Client.GetObject(new GetObjectRequest
  {
    BucketName = bucket,
    Key = s3key
  });

  // make sure the ETag matches our MD5
  // if not, the data in S3 changed since we encrypted it!
  var md5str = BitConverter.ToString(md5).Replace("-", string.Empty);
  if(!get.ETag.Trim('"').Equals(md5str, StringComparison.OrdinalIgnoreCase))
    throw new Exception("OH CRAP WE'VE BEEN HACKED ALERT THE CEO!!");

  using(var aes = new AesManaged
  {
    BlockSize = 128,
    KeySize = 256,
    Mode = CipherMode.CBC,
    Key = key,
    IV = iv
  })
  {
    // lame crappy workaround hack that makes me sad
    var dstream = new MemoryStream();
    get.ResponseStream.CopyTo(dstream);
    dstream.Position = 0;

    var decryptor = aes.CreateDecryptor();
    return new CryptoStream(dstream, decryptor, CryptoStreamMode.Read);
  }
}

As you can see, to get it working, all that’s required is to read all of the encrypted bytes from S3 into memory first and then pass that to the CryptoStream. But my vision of an all-streaming solution has been shattered. While I’m happy that this is finally working, the inelegance of having to store the whole thing in memory has left me a broken man. I have failed.

Moral of the Story

Never give up. Never surrender. Unless there’s a bug in a third-party SDK or the framework, in which case you’re shit-outta-luck. I suppose you could track down the actual bug, fix it, and try to open a pull request into the official repo, but ain’t nobody got time for that.