Google Cloud Storage – Initial setup, cp, and rsync

February 20, 2015

During the past few days, I’ve been diving into Google Cloud Storage (GCS). You might assume that this is a boring corner of Google Cloud Platform because cloud storage has been around for a while, but I found some pleasant surprises in my exploration. Let’s start with a brief overview:

  • Google Cloud Storage is not Google Drive. Google Drive is a consumer-oriented file storage and synchronization service that lets users store documents, share files, and edit documents with collaborators, and it has APIs that allow developers to extend it. Google Cloud Storage is a file storage solution intended for developers and system admins, with APIs that allow apps to store and retrieve objects. It features global edge-caching, versioning, OAuth and granular access controls, configurable security, and more.
  • Google Cloud Storage is built on Google’s massive planet-wide infrastructure.  Think about all of the files that Google serves, including YouTube videos, etc.  The size and scale are beyond most people’s comprehension.  I’ve recently seen the inside of one of the many Google data centers (a very rare treat, even for Googlers!), and it gave me a tiny glimpse of the enormous scale that is Google, and it radically changed my definition of “scalable” and “fast”.
  • Google Cloud Storage has a web interface for the basic functionality, but it’s the command-line tool, gsutil, that I prefer to use, and it exposes many more features. There are also XML and JSON APIs that we’ll cover in a later post.

Initial Setup:

  • If you don’t already have a Google Cloud Platform account, you’ll want to create one at http://cloud.google.com. If you’ve never signed up before, you can get a free trial that includes $300 in credit for 60 days (as of this writing).
  • Install the Google Cloud Platform SDK from http://cloud.google.com/sdk.
  • Create a project, and activate Google Cloud Storage for that project (a project is a logical grouping of Google Cloud resources: storage, services, users, instances, and so on). Creating a project requires a web browser, but once it’s created, you can do almost everything else using the command-line tools. Make a note of your project ID; I’ll use “gwprojectid” for the remainder of this article.
  • We need to set our new project as the default for the Google Cloud command-line tools; otherwise, we’d have to pass “-p gwprojectid” to every command. To set the default project, type: gcloud config set project gwprojectid
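
For example, the initial command-line setup might look like this end to end (a minimal sketch, assuming the SDK is already installed; gwprojectid is the placeholder project ID from above):

#Authenticate the SDK tools (opens a browser to complete OAuth):
gcloud auth login

#Set the default project:
gcloud config set project gwprojectid

#Verify the active configuration:
gcloud config list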

Working with the gsutil tool

Below is a simple example of storing and retrieving files. First we need to create a bucket – the basic container used to store files. Since all bucket names share a single global namespace, the name must be unique across all of Google Cloud Storage. You’ll see why this is the case in a later post when we start sharing files publicly via HTTP. For this example, I’ll use a bucket named “gwbucket”. When referencing a bucket, you specify its URL, e.g. gs://[bucketname].

#Create a bucket:
gsutil mb gs://gwbucket

#Copy some files to the bucket:
gsutil cp *.jpg gs://gwbucket

#List the files in the bucket:
gsutil ls -l gs://gwbucket

#Copy a specific file from our bucket back to the local /tmp directory
gsutil cp gs://gwbucket/sunset.jpg /tmp

#Delete the files:
gsutil rm gs://gwbucket/*

#Remove the bucket:
gsutil rb gs://gwbucket

Nothing too exciting yet, but you can quickly see that the commands are familiar. As expected, you can add “-r” to most commands to make them recursive. Detailed help is available for every command at https://cloud.google.com/storage/docs/gsutil or from the command line, e.g. gsutil help cp.
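
For example, a recursive copy of a local directory might look like this (the photos directory name is just an illustrative assumption):

#Recursively copy a local directory, including subdirectories, to the bucket:
gsutil cp -r photos gs://gwbucket

Here’s another example using rsync and versioning: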

#Create a new bucket:
gsutil mb gs://gwbucket

#Turn on versioning for the bucket
gsutil versioning set on gs://gwbucket

#rsync the current directory to our new bucket
#Adding -m to run multiple parallel processes (speed boost)
gsutil -m rsync -r -d . gs://gwbucket

#List all of the files in the bucket:
gsutil ls -lr gs://gwbucket

  • Learn more about object versioning
  • Learn more about the -m option
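
With versioning on, overwriting or deleting an object keeps the old copy around as an archived generation. As a quick sanity check, gsutil ls -a lists those archived versions. A sketch (sunset.jpg and the generation number are illustrative placeholders):

#List every object, including archived (noncurrent) generations:
gsutil ls -la gs://gwbucket

#Restore a specific archived generation (use a generation number from the ls output):
gsutil cp gs://gwbucket/sunset.jpg#1360887697105000 /tmp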

Note: The above examples use the default storage class (“standard”) and the default bucket location (“US”). To learn about other options, see these docs.
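
Both defaults can be overridden when the bucket is created. A hedged sketch (DRA, i.e. Durable Reduced Availability, and the EU location were among the available values as of this writing; check the docs for the current list):

#Create a Durable Reduced Availability bucket located in the EU:
gsutil mb -c DRA -l EU gs://gwbucket

I used the last rsync example above (minus the versioning line) to back up my 120,000+ image library to Google Cloud Storage. Any time I add, modify, or delete images, I simply repeat the gsutil rsync.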

Be very careful using the -d option with rsync, as it deletes any files from the destination that have been removed from the source. I suggest using the -n option if you have any doubts. The -n option causes rsync to run in “dry run” mode, i.e., it just prints what would be copied or deleted without actually copying or deleting anything.
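
For example, to preview what the earlier rsync would do without changing anything:

#Dry run: print what would be copied or deleted, but make no changes:
gsutil -m rsync -r -d -n . gs://gwbucket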

In my next blog post, I’ll show how to set up bucket object lifecycle management, which can automatically delete objects once they reach a certain age, or keep only the last n versions of a specific object. This becomes a key feature when doing regular backups, such as archiving log files each night. Then I’ll show you how to share objects publicly via HTTP and how to use Google’s worldwide edge caching to serve your files very quickly – I now have all images on my blog being served this way. I’ll also cover how I configured WordPress to use GCS as the file host.
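
As a small preview (a minimal sketch, assuming a local config file named lifecycle.json; the next post will cover this properly), a lifecycle policy is a JSON document applied with gsutil lifecycle set:

#lifecycle.json - delete objects older than 365 days, and delete an
#archived version once at least 3 newer versions of that object exist:
{
  "rule": [
    {"action": {"type": "Delete"}, "condition": {"age": 365}},
    {"action": {"type": "Delete"}, "condition": {"numNewerVersions": 3}}
  ]
}

#Apply the policy to the bucket:
gsutil lifecycle set lifecycle.json gs://gwbucket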


Filed in: Google Cloud Platform, Google Cloud Storage

About the Author

Greg is a developer advocate for the Google Cloud Platform, based in San Francisco, California.

  • Paul Mackinney

    I’m using a service where we pay a flat fee for 5T of storage, accessible by an rsync/ssh tunnel, but updates are unmetered. Two questions: a) Are you able to get a billing statement and see how much you’d have been billed if you weren’t on the free trial? b) How did you handle seeding? My current service let me send them a couple of large SATA drives to seed the offsite backup.

  • blog_egoactive

    Hello Greg,

    Tried some things with Nearline. It looks promising.

    What I am trying to achieve is combining rsnapshot & gsutil: making snapshots on a server and copying the results to Nearline. Any thoughts/ideas about that, or do you have some examples?