Tuesday, March 22, 2011

Centripetal Product Review of CloudBerry S3 Explorer Pro

Building, running and maintaining a Software as a Service application like ours means having great tools at your disposal to make things easier. I want to highlight one tool that I've become particularly dependent on in my day-to-day work to keep things running smoothly: CloudBerry S3 Explorer Pro by CloudBerry Labs.

Our complete backend logging, monitoring and reporting systems are built with Amazon S3 at the core. We run lots of tools on top of that, from custom-built applications to Hadoop and everything in between. We currently have tens of millions of individual files ranging in size from 100 bytes to over 1 GB. In Amazon's S3 environment we manage over 100 buckets spanning all of Amazon's worldwide regions, each containing complex directory structures that define the content and date partitioning of the data in the files. We also run a lot of different portions of our applications through Amazon's CDN, CloudFront, which is seamlessly integrated with S3. With all the applications and code we have dedicated to S3-specific functionality, you'd think we would never need to just look at the raw S3 structure and browse around for files, but it has become a daily need for one reason or another. When I need to go directly to S3 for something, CloudBerry S3 Explorer Pro has been my tool of choice. It is indispensable for finding individual files when debugging, doing copy or move jobs, or scripting more complex file jobs.
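To make the idea of content and date partitioning concrete, here is a minimal Python sketch of how such object keys might be built. The layout (content type, then year/month/day), the `transaction-logs` prefix and both function names are assumptions for illustration only, not the actual structure described above.

```python
from datetime import date


def partitioned_key(content_type: str, day: date, filename: str) -> str:
    """Build an object key partitioned by content type, then by date.

    The content/YYYY/MM/DD/filename layout is a hypothetical example.
    """
    return f"{content_type}/{day:%Y/%m/%d}/{filename}"


def month_prefix(content_type: str, year: int, month: int) -> str:
    """Return the key prefix covering one month of one content type."""
    return f"{content_type}/{year:04d}/{month:02d}/"


key = partitioned_key("transaction-logs", date(2011, 3, 22), "server01.log")
print(key)  # transaction-logs/2011/03/22/server01.log
```

With keys laid out this way, everything for a given month shares one prefix, which is what lets a browsing tool present the bucket like an ordinary folder tree.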

Debugging

A typical day finds me debugging an issue for one of our business team members. Many times I need to go directly to the transaction logs of some server to determine exactly what happened. We store all of these in S3 so that we can run transaction reports across them with Hadoop and other tools. CloudBerry Explorer has made the job so much easier because I can look at my different buckets through the lens of a typical filesystem that I am used to. I can browse around, download and open files with a click of the mouse, and even make quick updates when I need to. CloudBerry gives me a beautiful user interface for working with some of the more advanced features of S3 like ACLs, Bucket Policies, CloudFront distributions, External Buckets and more, and it makes these features much more accessible than the straight Amazon S3 API for the quick tasks I need to do. If I had to work within the confines of the Amazon API or even the Amazon web interface for this debugging, I would be entirely hamstrung and my life would be in shambles from the craziness of things.

Copy or Move Jobs

Another thing that I find myself doing a lot is moving files around within S3. We use specific naming conventions for S3 files to denote the working state of a file. We also use different buckets in each Amazon region to reduce cross-datacenter chatter. But there are often times when I just need to copy or move a whole slew of files from one place to another. Recently I had to move over 1 million files between buckets. For this I use CloudBerry. Moving and copying is a drag-and-drop task within CloudBerry, and for some of those bigger jobs (like the million files) I can use up to 100 threads to get the job done more quickly. The ease of using CloudBerry for these types of tasks has made me a little too dependent on the tool. I've actually spun up Amazon EC2 instances just to install CloudBerry on them for large copy/move jobs and then torn them down afterwards. That gave me more CPU power as well as even more threads working on my job.
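For readers curious why more threads speed up a big copy job, here is a minimal Python sketch of the general technique: a pool of worker threads, each copying one object at a time. It is not how CloudBerry works internally. The two "buckets" are plain dictionaries and `copy_object` is a hypothetical stand-in for a real S3 copy call, so the example runs without any AWS account.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_object(source: dict, dest: dict, key: str) -> str:
    """Hypothetical stand-in for an S3 copy request: copy one object by key."""
    dest[key] = source[key]
    return key


def copy_all(source: dict, dest: dict, keys: list, max_workers: int = 100) -> list:
    """Copy every key using a pool of worker threads.

    A real S3 copy spends most of its time waiting on the network, which is
    why many threads help; the in-memory copy here only shows the structure.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda k: copy_object(source, dest, k), keys))


# Simulated buckets: the key names and contents are made up for illustration.
source_bucket = {f"logs/2011/03/22/file{i:04d}.log": b"data" for i in range(1000)}
dest_bucket = {}

copied = copy_all(source_bucket, dest_bucket, list(source_bucket))
print(len(copied), "objects copied")
```

The `max_workers=100` default simply mirrors the 100-thread figure mentioned above. In a real job the useful thread count depends on network bandwidth and the CPU of the machine doing the work, which is why running the job on a larger machine can help.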

Scripting 

One thing that we've recently discovered is that everything available in CloudBerry S3 Explorer Pro is also available through Windows PowerShell snap-ins. We've used these extensively to script out tasks that we find ourselves doing over and over again. While we have our own tools that use the Amazon S3 API to interact with S3 from within our applications, I've found that the CloudBerry PowerShell snap-ins are more reliable and much easier to use thanks to the scripting capabilities of PowerShell. Now each time I find myself doing something in CloudBerry, I ask whether it is something I should script out for future use. Often a few minutes spent adding new capabilities to my script toolbox with these snap-ins ends up saving countless hours down the road.

Summary

If you're using Amazon S3 for anything in your business, you need to go out and get a license for CloudBerry S3 Explorer Pro. It is one of the most useful tools I have found. Not only do I use it daily, but I bought licenses for everyone on our engineering team, and they all use it pretty much daily as well. CloudBerry also makes tools for many of the other cloud-based storage solutions. Pretty cool. Thanks a lot to the guys over at CloudBerry Labs for such a great tool!


Mike Davis