Monthly Archives: May 2012

Google App Engine

My ongoing project is deployed on Google App Engine.

The setup I'm using includes:

  • Python 2.7
  • HRD (High Replication Datastore)
  • Pipeline API
  • Mechanize
  • urlfetch
  • BeautifulSoup

I've been mentioning this for a while now, but with a couple of months of experience under my belt, I can say more about Google App Engine.

I'm using the free version, which has a daily quota. Once the quota is used up, the deployment goes down and there isn't much you can do except wait until the next day, when the quota resets and you can start fresh.

What consumes my quota?

Mostly HRD write and read operations, that is, anything that touches the database. This includes the Pipeline tasks, which also generate write operations on the HRD. Since the nature of my project requires pipelines, that works against me too, but it is better handled than the regular write ops on my entities.

What's one of the challenges here?

Creating an instance with the proper indexes to reduce write operation costs.

I saw a huge drop in write operations once I defined the indexes, though getting there took time. Finding the information and applying it was slow, mostly because I was still learning what my client wants. In the iterative process I have with him, we learn from our mistakes and improve on the next run.

The entities I'm working with aren't complex and don't use parent/child relations; they are straightforward entities. Why?

The workflow I'm working on relies heavily on the datastore to store data scraped from the web. A badly defined index increases the write operations per record, which burns through your quota faster.
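To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It assumes the datastore billing rule as I recall it from the documentation of that time: putting a new entity costs 2 writes, plus 2 writes per indexed property value, plus 1 write per composite index value. The function name and the 10-property record are hypothetical, only there for illustration, and the constants should be checked against the current billing docs.

```python
def new_entity_put_writes(indexed_values, composite_index_values=0):
    """Estimate the write ops billed for putting one brand-new entity.

    Assumed rule (from the billing docs as I remember them):
    2 writes + 2 per indexed property value + 1 per composite index value.
    """
    return 2 + 2 * indexed_values + composite_index_values


# Hypothetical scraped record with 10 single-valued properties.
all_indexed = new_entity_put_writes(indexed_values=10)
# Same record, but only the 2 properties I actually query on stay indexed.
only_two_indexed = new_entity_put_writes(indexed_values=2)

print("%d vs %d" % (all_indexed, only_two_indexed))  # prints: 22 vs 6
```

Under those assumptions, leaving indexes on only the properties you filter or sort by cuts the cost per record to roughly a quarter. In the Python datastore APIs, a property can be kept out of the built-in indexes by declaring it with indexed=False.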

Some stuff to keep in mind

There's a lot of information and opinions going around about Google App Engine, some of them very negative, but after using it for a while, it's clear that you have to choose the proper tool for the job.

Yes, sometimes it's a bit frustrating to find out that your application isn't working, or that you suddenly hit your quota and can't operate in production.

So far, I haven't had any downtime due to Google errors. I mention this because I saw a lot of people saying you will have a lot of downtime. That hasn't happened to me so far.

The development version may behave differently from what you have in production; I noticed this with the indexes.

One of my workflows was working perfectly locally, but when I uploaded it, I started getting a lot of index errors.

I'm still trying to learn how to stub services. Not the datastore, since that is properly documented; I'm mostly talking about the User stub. I couldn't make it work with testbed.setenv(), so I had to use os.environ['var']. I also have some complex things to test, which puts me in the position of weighing how much a test would actually solve and let me move forward against spending my work hours on a testing task that won't cover much of the whole application.
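The environment-variable workaround mentioned above can be sketched roughly like this. It is a minimal sketch, assuming the Users API reads the signed-in user from the USER_EMAIL, USER_ID and USER_IS_ADMIN environment variables, and that AUTH_DOMAIN is already set by the test setup. The helper names login_as and logout, and the example address, are hypothetical.

```python
import os

# Environment variables the Users API is assumed to read.
_USER_KEYS = ("USER_EMAIL", "USER_ID", "USER_IS_ADMIN")


def login_as(email, user_id, is_admin=False):
    """Simulate a signed-in user for a test by setting the env vars directly."""
    os.environ["USER_EMAIL"] = email
    os.environ["USER_ID"] = user_id
    os.environ["USER_IS_ADMIN"] = "1" if is_admin else "0"


def logout():
    """Remove the simulated user so later tests start signed out."""
    for key in _USER_KEYS:
        os.environ.pop(key, None)


# Hypothetical usage inside a test's setUp:
login_as("tester@example.com", "42", is_admin=True)
```

Calling logout() in tearDown keeps the fake user from leaking into other tests, since os.environ is shared by the whole process.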

After all, it's a nice exercise for the mind and I'm enjoying it so far. It's something different from what I was doing at my previous company, which was boring and dull.