Using AWS Lambda to Transcode Music Files on The Fly

Converting analogue sound to digital storage requires the amplitude of the original wave to be sampled at many points. To reduce the number of data points, some of them can be thrown away if you think that they contribute little or nothing to the listening experience. Deciding how many points to record and keep is the job of the audio codec, and different requirements have led to many different formats with very different file sizes.

Working in digital music, one of our basic tasks is to serve music in all of these different formats, from large, high-quality lossless formats like FLAC or MQA to small, short MP3 clips that can be streamed to mobile devices with crappy data plans. A single track could have a dozen different formats of varying quality. For a catalogue of 60 million tracks, this can add up to a hefty storage problem, not to mention a hefty bill from your cloud provider.

Given the economics of the long tail, most tracks will be required only occasionally if at all. That makes it attractive to keep only a single, high-quality format for a track and generate other formats if and when they are needed, a process known as transcoding.
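The "generate it if and when it is needed" idea can be sketched in a few lines. This is only an illustration, not the code from this post: the OnDemandFormats class, its in-memory dictionary (standing in for a bucket of derived files) and the transcode delegate are all hypothetical names of my own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: keeps derived formats in memory and only
// calls the (expensive) transcode delegate the first time a
// particular track/format pair is requested.
public class OnDemandFormats
{
    private readonly Dictionary<string, byte[]> _derived = new Dictionary<string, byte[]>();
    private readonly Func<string, string, byte[]> _transcode;

    // Counts how many times the transcode delegate was actually invoked.
    public int TranscodeCount { get; private set; }

    public OnDemandFormats(Func<string, string, byte[]> transcode)
    {
        _transcode = transcode;
    }

    // Returns the requested format, transcoding from the master copy
    // only if this format has not been generated before.
    public byte[] Get(string trackId, string format)
    {
        var key = trackId + "." + format;
        if (!_derived.TryGetValue(key, out var audio))
        {
            audio = _transcode(trackId, format);
            _derived[key] = audio;
            TranscodeCount++;
        }
        return audio;
    }
}
```

Asking for the same track and format twice should hit the transcoder only once; rarely requested tracks in the long tail never hit it at all.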

Given the inherently functional nature of this task, it seemed a natural fit for a serverless function. That is, a function that acts as an event handler in the sky, or at least the cloud, taking an input and reacting to it without you having to specify any of the hardware needed to run it. The attractions of scaling to user actions as fast as users can generate them, without needing to wait for hardware to spin up, are obvious. Google Cloud Functions is one such platform, but I am going to take you through an implementation in AWS Lambda.

I should point out that this is proof-of-concept stuff. It's not running in production anywhere, least of all at 7digital. But it's a neat demo of a C# function that leverages the AWS infrastructure.

Remember, when I say C# in the cloud, I mean .NET Core running on Linux. A beautiful thing.

The added attraction of using AWS is its Elastic Transcoder, which has the magic to convert from one format to another. Or at least from one open format to another.

All that the transcoder requires is one S3 bucket for the input, one for the output and a pre-defined pipeline to take audio files between them. All that your Lambda function requires is an IAM role with the AmazonElasticTranscoderRole policy attached. Once you have these, simply create a job that specifies what you want:

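// transcoder, pipeline and sourceFile are set up elsewhere in the function
var response = await transcoder.CreateJobAsync(new CreateJobRequest
{
    PipelineId = pipeline.Id,
    Input = new JobInput
    {
        Container = "auto", // let Elastic Transcoder detect the input format
        Key = s3event.Records.First().Sns.Message // the SNS message is used as the S3 key of the source file
    },
    Output = new CreateJobOutput
    {
        Key = String.Format("{0}.mp3", sourceFile),
        PresetId = "1351620000001-300020" // a system preset for MP3 output
    }
});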

The strange-looking PresetId selects one of Elastic Transcoder's presets for MP3 output; the full list can be seen here.
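The job above builds its output key from sourceFile, which the snippet does not define. As a minimal sketch, assuming sourceFile is simply the incoming S3 key with its extension removed, it could be derived like this. The TranscodeKeys class and its method names are hypothetical, not part of the AWS SDK.

```csharp
using System;

// Hypothetical helper for naming transcoded files.
public static class TranscodeKeys
{
    // Strips the final extension from an S3 key, leaving any
    // folder-style prefix ("albums/...") intact.
    public static string WithoutExtension(string key)
    {
        int slash = key.LastIndexOf('/');
        int dot = key.LastIndexOf('.');
        // Only treat the dot as an extension separator if it sits in the
        // last path segment and is not the first character of the name.
        return dot > slash + 1 ? key.Substring(0, dot) : key;
    }

    // Builds the output key in the same "{name}.{extension}" shape
    // that the job request uses.
    public static string OutputKey(string inputKey, string extension)
    {
        return String.Format("{0}.{1}", WithoutExtension(inputKey), extension);
    }
}
```

With this, a source key such as albums/story.flac would be written to the output bucket as albums/story.mp3.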

Using the C# await functionality makes the response nicely asynchronous. All that is required to make this a serverless function is to mark the containing method with LambdaSerializerAttribute

[LambdaSerializerAttribute(typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))]
public async Task<string> Transcode(SNSEvent s3event, ILambdaContext context)  
{
    // more magic here
}

and deploy it in the Lambda console. It can be triggered by any of the usual serverless events: a write to a bucket, a Kinesis stream or, really nicely, Amazon API Gateway, which is a whole other bag of spanners.
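Putting the two snippets together, a complete handler might look something like the sketch below. It is not the code from my repo. It assumes the AWSSDK.ElasticTranscoder, Amazon.Lambda.Core, Amazon.Lambda.SNSEvents and Amazon.Lambda.Serialization.Json NuGet packages, a pipeline ID supplied through an environment variable that I have called PIPELINE_ID (a made-up name), and an SNS message whose body is nothing more than the S3 key of the source file.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Amazon.ElasticTranscoder;
using Amazon.ElasticTranscoder.Model;
using Amazon.Lambda.Core;
using Amazon.Lambda.SNSEvents;

public class Function
{
    // Created once per container so that warm invocations reuse the client.
    private static readonly AmazonElasticTranscoderClient Transcoder =
        new AmazonElasticTranscoderClient();

    [LambdaSerializer(typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))]
    public async Task<string> Transcode(SNSEvent snsEvent, ILambdaContext context)
    {
        // PIPELINE_ID is a hypothetical environment variable name.
        var pipelineId = Environment.GetEnvironmentVariable("PIPELINE_ID");

        // Assumption: the SNS message body is just the S3 key of the source file.
        var inputKey = snsEvent.Records.First().Sns.Message;

        var response = await Transcoder.CreateJobAsync(new CreateJobRequest
        {
            PipelineId = pipelineId,
            Input = new JobInput { Container = "auto", Key = inputKey },
            Output = new CreateJobOutput
            {
                // Assumption: simply append ".mp3" to the source key.
                Key = inputKey + ".mp3",
                PresetId = "1351620000001-300020"
            }
        });

        // The job ID identifies the transcoding job that was just created.
        context.Logger.LogLine("Created job " + response.Job.Id);
        return response.Job.Id;
    }
}
```

Reading the pipeline ID from configuration rather than hard-coding it keeps the same build usable against different pipelines.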

The performance is really pretty good. Specifying a large amount of memory helps, but this code could transcode The Sundays' superb Here's Where The Story Ends from 16-bit FLAC to MP3 in less than a second. This is the kind of thing you could build a streaming service around, especially with a decent CDN for the popular stuff.

The full code is available on my GitHub repo.