This blog post explains the code you can use to upload multiple large files to Azure blob storage (specifically block blobs) in parallel.
What am I doing, and what am I not doing?
1. The user provides a list of large files and the corresponding blob names, to be uploaded in one go in parallel.
2. The code uses the TPL (a Parallel.ForEach loop, to be precise) to perform simultaneous uploads of the Azure blobs.
3. The code chops each large file into multiple blocks.
4. Every block of a file is uploaded synchronously. I am not performing any async operation while uploading the individual blocks to the Azure blob (I will be using the PutBlock and PutBlockList methods).
5. However, the UI project (in my case a console application; it could also be a worker role) calls the blob upload method asynchronously with the help of BeginInvoke and EndInvoke.
Applicable technology –
I am using VS2013, Azure SDK 2.3, and the storage library "Windows Azure Storage 4.1.0", released on 23rd June 2014.
Implementation
In a real-world scenario we usually perform large file uploads to Azure blob storage from a worker role. For simplicity, I will show the code in a console application; don't worry, it can easily be converted to worker-role-specific code.
Alright, let's start!
Again, this might be a long post due to the heavy code blocks, so be prepared.
Reference – about 70% of this code is based on the CodePlex solution at http://azurelargeblobupload.codeplex.com/SourceControl/latest#AzurePutBllockExample/
It is a great solution for uploading large files to Azure blob storage. I will change it a bit so that we can perform parallel uploads of multiple large files.
First I created a simple console application named AzureBlobUploadParallelSample, as shown below –
Then I added a class library project to the same solution, named AzureBlobOperationsManager, as shown below –
This class library will perform the upload of large files to Azure blob storage and therefore needs a reference to the Azure storage libraries. NuGet is the best way to get the latest DLLs, so I opened Tools -> Library Package Manager -> Package Manager Console and typed the following command to install the storage library –
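Assuming the NuGet package ID that corresponds to the "Windows Azure Storage 4.1.0" library mentioned above, the Package Manager Console command is along these lines –
Install-Package WindowsAzure.Storage -Version 4.1.0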
Also add a reference to the latest version of Microsoft.WindowsAzure.ServiceRuntime from the Add Reference dialog box.
I am defining a class named FileBlobNameMapper. It has two properties, BlobName and FilePath, which the user sets to specify the name of the blob and the path of the file to be uploaded. A list of these objects lets the user provide multiple large files to be uploaded to Azure blob storage.
/// <summary>
/// Class to be used for holding the file-blobname mapping.
/// </summary>
public class FileBlobNameMapper
{
    public FileBlobNameMapper(string blobName, string filePath)
    {
        BlobName = blobName;
        FilePath = filePath;
    }

    public string BlobName { get; set; }
    public string FilePath { get; set; }
}
After invoking the async upload of multiple blobs to Azure storage, we need to know which uploads succeeded and which failed. To capture this status information I have defined another class named BlobOperationStatus. It is as follows –
public class BlobOperationStatus
{
    public string Name { get; set; }
    public Uri BlobUri { get; set; }
    public OperationStatus OperationStatus { get; set; }
    public Exception ExceptionDetails { get; set; }
}

public enum OperationStatus
{
    Failed, Succeeded
}
Now we need a class which will actually perform the upload of the large files to blobs in parallel. Therefore I added a class named AsyncBlockBlobUpload. Into this class I copied the GetFileBlocks method and the internal FileBlock class from the CodePlex link mentioned above.
I defined the MaxBlockSize class variable as 2 MB, as follows. This means every file block will be of size 2 MB.
private const int MaxBlockSize = 2097152; // Approx. 2MB chunk size
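The GetFileBlocks method and FileBlock class themselves are not reproduced in this post; a minimal sketch of what they look like, based on the CodePlex approach of slicing the byte array into MaxBlockSize chunks with equal-length base64-encoded block ids (the exact member names here are my assumptions), is –
// Sketch of the helpers copied from the CodePlex sample; names and details are assumptions.
internal class FileBlock
{
    public string Id { get; set; }
    public byte[] Content { get; set; }
}

private IEnumerable<FileBlock> GetFileBlocks(byte[] fileContent)
{
    int blockId = 0;
    int offset = 0;
    while (offset < fileContent.Length)
    {
        // The last block may be smaller than MaxBlockSize.
        int blockSize = Math.Min(MaxBlockSize, fileContent.Length - offset);
        byte[] blockContent = new byte[blockSize];
        Array.Copy(fileContent, offset, blockContent, 0, blockSize);

        yield return new FileBlock
        {
            // Block ids must be base64 strings of equal length; a fixed-size counter works.
            Id = Convert.ToBase64String(BitConverter.GetBytes(blockId)),
            Content = blockContent
        };

        offset += blockSize;
        blockId++;
    }
}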
Next I defined a method which uses Parallel.ForEach to start the upload of all blobs in parallel, meaning each blob is uploaded on a different thread and the overall upload is therefore faster.
public List<BlobOperationStatus> UploadBlockBlobsInParallel(List<FileBlobNameMapper> fileBlobNameMapperList, string containerName)
As you can see, this is where I use the previously defined FileBlobNameMapper class. The method's containerName parameter means that all the files in the FileBlobNameMapper list will be uploaded as blobs into the specified container. Whether you wish to upload a single file or multiple files, this method serves the purpose. The full method code is as follows –
public List<BlobOperationStatus> UploadBlockBlobsInParallel(List<FileBlobNameMapper> fileBlobNameMapperList, string containerName)
{
    //create list of blob operation status
    List<BlobOperationStatus> blobOperationStatusList = new List<BlobOperationStatus>();
    object statusListLock = new object();

    //upload every file from the list to a blob in parallel (multitasking)
    Parallel.ForEach(fileBlobNameMapperList, fileBlobNameMapper =>
    {
        string blobName = fileBlobNameMapper.BlobName;

        //read file contents into a byte array
        byte[] fileContent = File.ReadAllBytes(fileBlobNameMapper.FilePath);

        //call private method to actually perform upload of the file to blob storage
        BlobOperationStatus blobStatus = UploadBlockBlobInternal(fileContent, containerName, blobName);

        //add the status of every blob upload operation to the list.
        //List<T> is not thread-safe, so the add is done under a lock.
        lock (statusListLock)
        {
            blobOperationStatusList.Add(blobStatus);
        }
    });
    return blobOperationStatusList;
}
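Note that Parallel.ForEach decides the degree of parallelism on its own. If you want to cap how many blobs are uploaded simultaneously (each file is read fully into memory, so many large files at once can get expensive), an optional tweak that is not part of the original sample is to pass ParallelOptions, for example –
//optional: cap the number of simultaneous blob uploads (the value 4 here is arbitrary)
ParallelOptions parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(fileBlobNameMapperList, parallelOptions, fileBlobNameMapper =>
{
    //...same loop body as above...
});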
Let's have a look at the private method where I actually perform the blob upload using PutBlock and PutBlockList.
private BlobOperationStatus UploadBlockBlobInternal(byte[] fileContent, string containerName, string blobName)
This method will be called once for each record in the FileBlobNameMapper list. Let's look at its complete code.
private BlobOperationStatus UploadBlockBlobInternal(byte[] fileContent, string containerName, string blobName)
{
    BlobOperationStatus blobStatus = new BlobOperationStatus();
    try
    {
        // Create the blob client. (storageAccount is a class-level CloudStorageAccount field.)
        CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

        // Retrieve reference to container and create it if it does not exist
        CloudBlobContainer container = blobClient.GetContainerReference(containerName);
        container.CreateIfNotExists();

        // Retrieve reference to a blob and set the stream read and write size to the minimum
        CloudBlockBlob blockBlob = container.GetBlockBlobReference(blobName);
        blockBlob.StreamWriteSizeInBytes = 1048576;
        blockBlob.StreamMinimumReadSizeInBytes = 1048576;

        //set the blob upload timeout and retry strategy
        BlobRequestOptions options = new BlobRequestOptions();
        options.ServerTimeout = new TimeSpan(0, 180, 0);
        options.RetryPolicy = new ExponentialRetry(TimeSpan.Zero, 20);

        //get the file blocks of 2MB size each and perform upload of each block.
        //PutBlockList commits blocks in the order they are listed, so use a List<string> to preserve order.
        List<string> blocklist = new List<string>();
        List<FileBlock> blocks = GetFileBlocks(fileContent).ToList();
        foreach (FileBlock block in blocks)
        {
            blockBlob.PutBlock(
                block.Id,
                new MemoryStream(block.Content, true),
                null, null, options, null);
            blocklist.Add(block.Id);
        }

        //commit the blocks that were uploaded in the above loop
        blockBlob.PutBlockList(blocklist, null, options, null);

        //set the status of the blob upload operation as succeeded, as there was no exception
        blobStatus.BlobUri = blockBlob.Uri;
        blobStatus.Name = blockBlob.Name;
        blobStatus.OperationStatus = OperationStatus.Succeeded;
        return blobStatus;
    }
    catch (Exception ex)
    {
        //set the status of the blob upload as failed along with the exception details
        blobStatus.Name = blobName;
        blobStatus.OperationStatus = OperationStatus.Failed;
        blobStatus.ExceptionDetails = ex;
        return blobStatus;
    }
}
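One thing the snippets do not show is where storageAccount comes from; it is presumably a class-level CloudStorageAccount field of AsyncBlockBlobUpload. Assuming the connection string is kept in the app.config appSettings under a key like "StorageConnectionString" (the key name is my assumption), it can be initialized like this –
//class-level field of AsyncBlockBlobUpload (needs using Microsoft.WindowsAzure.Storage; and using System.Configuration;)
//the "StorageConnectionString" appSettings key name is an assumption; use whichever key holds your connection string
private static readonly CloudStorageAccount storageAccount =
    CloudStorageAccount.Parse(ConfigurationManager.AppSettings["StorageConnectionString"]);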
The comments in the above method are self-explanatory. This completes our library classes for uploading large files to blob storage. Build the class library project and add a reference to it from the console application project.
Now, in the client project (my console app, or a worker role app) I need to invoke these blob upload methods asynchronously and, after the upload operation completes, retrieve the result in a callback method and take the necessary action.
Alright, now let's look at the console application code from which we call the upload operation asynchronously. I highly recommend going through this link - http://msdn.microsoft.com/en-us/library/2e08f6yc(v=vs.110).aspx - to understand how we can call any method asynchronously from C#. Based on this approach, I have defined a delegate AsyncBlockBlobUploadCaller with the same signature as the actual blob upload method, and I will use an instance of this delegate to call BeginInvoke and EndInvoke.
I declared the delegate in the Program class of the console application as a class-level member –
public delegate List<BlobOperationStatus> AsyncBlockBlobUploadCaller(List<FileBlobNameMapper> blobFileMapperList, string containerName);
The Main method code is as follows –
static void Main(string[] args)
{
    //define file paths
    string file1 = @"C:\Kunal_Apps\Sample hours1.xlsx";//5MB
    string file2 = @"C:\Kunal_Apps\Sample hours2.xlsx";//1MB
    string file3 = @"C:\Kunal_Apps\Sample hours3.xlsx";//6MB
    string file4 = @"C:\Kunal_Apps\Boot Camp 14.zip";//100MB

    //map the file names to blob names
    List<FileBlobNameMapper> blobFileMapperList = new List<FileBlobNameMapper>();
    blobFileMapperList.Add(new FileBlobNameMapper("blob1", file1));
    blobFileMapperList.Add(new FileBlobNameMapper("blob2", file2));
    blobFileMapperList.Add(new FileBlobNameMapper("blob3", file3));
    blobFileMapperList.Add(new FileBlobNameMapper("blob4", file4));

    //specify the container name
    string containerName = "mycontainer";

    AsyncBlockBlobUpload blobUploadManager = new AsyncBlockBlobUpload();
    AsyncBlockBlobUploadCaller caller = new AsyncBlockBlobUploadCaller(blobUploadManager.UploadBlockBlobsInParallel);
    caller.BeginInvoke(blobFileMapperList, containerName, new AsyncCallback(OnUploadBlockBlobsInParallelCompleted), null);

    //To keep the main thread alive I am using while(true). The async operations here run on
    //ThreadPool threads, and if the main thread ends then those child threads end as well.
    //Note: if you are using a worker role, its Run method usually runs a while(true) loop anyway,
    //keeping the main thread alive always.
    while (true)
    {
        Console.WriteLine("continue the main thread work...");
        Thread.Sleep(90000);
    }
}
As you can see, I have added a while(true) loop. It serves no real purpose here other than simulating that the main thread of the console application is doing some work while, in the background, the async upload to Azure blob storage happens at the same time. If you are using a worker role you will not need it. In the above code, change the file paths to your own file paths (the files can be of different sizes). You may also change the container name and blob names as you like.
Now it is time to define the callback method, which gets called automatically when the async blob upload operation fails or succeeds.
/// <summary>
/// Callback method for the upload to Azure blob operation
/// </summary>
/// <param name="result">async result</param>
public static void OnUploadBlockBlobsInParallelCompleted(IAsyncResult result)
{
    // Retrieve the delegate.
    AsyncResult asyncResult = (AsyncResult)result;
    AsyncBlockBlobUploadCaller caller = (AsyncBlockBlobUploadCaller)asyncResult.AsyncDelegate;

    //retrieve the blob upload operation status list to take the necessary action
    List<BlobOperationStatus> operationStatusList = caller.EndInvoke(asyncResult);

    //print the status of the upload operation for each blob
    foreach (BlobOperationStatus blobStatus in operationStatusList)
    {
        Console.WriteLine("Blob name:" + blobStatus.Name + Environment.NewLine);
        Console.WriteLine("Blob operation status:" + blobStatus.OperationStatus + Environment.NewLine);
        if (blobStatus.ExceptionDetails != null)
        {
            Console.WriteLine("Blob operation exception if any:" + blobStatus.ExceptionDetails.Message + Environment.NewLine);
        }
        //Note: this is where you can write the failed blob operation entry to a table/queue
        //and have the worker role pick it up to perform the upload again.
    }
}
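As the note in the callback suggests, a failed upload can be recorded somewhere durable so a worker role can retry it later. A minimal sketch of pushing the failed entry to an Azure storage queue (the queue name and message format are my assumptions, and it presumes a CloudStorageAccount instance named storageAccount is available in the client) could look like this –
//sketch only: enqueue a failed upload so a worker role can retry it later
//requires using Microsoft.WindowsAzure.Storage.Queue; queue name is an assumption
if (blobStatus.OperationStatus == OperationStatus.Failed)
{
    CloudQueueClient queueClient = storageAccount.CreateCloudQueueClient();
    CloudQueue retryQueue = queueClient.GetQueueReference("failed-blob-uploads");
    retryQueue.CreateIfNotExists();
    //the message carries the blob name so the retry worker knows what to upload again
    retryQueue.AddMessage(new CloudQueueMessage(blobStatus.Name));
}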
That's it. If you run the application the output will be as follows –
Observe that the main thread work starts and keeps running; when the whole blob upload operation succeeds, the status messages for those blobs appear, and after that the "continue the main thread work" message appears again.
Hence my entire upload of large files to Azure blob storage was async and in parallel.
Let's check whether the sample behaves correctly and returns the right result when an async Azure blob upload fails. The easiest way to make a blob upload fail is to give the blob a name longer than 1024 characters. So I wrote some random sentences in a Word file, made sure their length was greater than 1024 characters (mine was 2019 characters), and then in debug mode changed the name of my blob to this overly long random name.
As expected the upload failed, and I got the correct failure result as shown below –
Enhancements –
Right now the code uploads multiple files in parallel, but the blocks of each file are uploaded synchronously. A possible enhancement is to upload the blocks of a single file in parallel as well; a sketch of that idea follows below.
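A minimal sketch of that enhancement, under the same assumptions as the code above (the GetFileBlocks helper and the options from UploadBlockBlobInternal, plus using System.Linq;), would replace the sequential foreach loop and the PutBlockList call with something like this –
//sketch only: upload the blocks of one file in parallel, then commit them in order
List<FileBlock> blocks = GetFileBlocks(fileContent).ToList();
Parallel.ForEach(blocks, block =>
{
    blockBlob.PutBlock(block.Id, new MemoryStream(block.Content, true), null, null, options, null);
});
//PutBlockList must receive the block ids in file order, so build the committed list
//from the original ordered block list, not from inside the parallel loop
List<string> orderedBlockIds = blocks.Select(b => b.Id).ToList();
blockBlob.PutBlockList(orderedBlockIds, null, options, null);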
If you are looking for a REST API based upload of large files to Azure blob storage using SAS (including SAS renewal), refer to the following link – http://sanganakauthority.blogspot.com/2014/08/using-sas-renew-sas-and-rest-api-to.html
Important – please suggest your feedback, changes, or comments on the article so I can improve it.
To download the full source code, refer to this link - http://code.msdn.microsoft.com/Upload-large-file-to-azure-fd1ac46d
Cheers…
Happy Uploading!!