04. March 2020
Migrating to Python 3 on Google App Engine - Part 1 - Overview and planning
My blog runs on my simple, single-author blogging engine, written in Python on the Google App Engine platform. I originally deployed it in 2014 and have only made minor updates since. As support for Python 2 has ended and the App Engine Python 2 environment is in its sunset phase, it was time to migrate to the Python 3 Standard Environment. This week I concluded the migration of my tiny blog engine to the new environment, and I thought I would share my experience and the issues I encountered along the way. This was originally planned as a single post, but even such a small application needs quite a few kinds and stages of changes, so it will be a miniseries of blog entries. This post gives a high-level overview of the migration process and the big-picture plan for my migration scenario.
The high level process
Google Cloud has a pretty good document giving a bird's eye view of the steps needed to migrate from the Python 2 Standard Environment to the Python 3 Standard Environment. This process can be distilled into the following steps:
- Remove any dependencies on the service client APIs baked into the Python 2 Standard Environment.
- Update your code to make it Python 3 compatible.
- Update your configuration files controlling the runtime environment (e.g. app.yaml, appengine_config.py, requirements.txt).
- Test your application in the Python 3 environment.
- Deploy to App Engine.
When Google App Engine was born, it provided access to the most common services a web application requires right inside the runtime environment. This was part of the core concept. The client API modules just had to be imported and you could start communicating with, for example, the Datastore for database storage. You didn't have to worry about authentication or any boilerplate code to set up the communication with the services; they were available out of the box. As time went by and the Google Cloud product portfolio grew, some of those services were spun off into separate cloud products, while new ones offering very similar services appeared as well. Because of this, and the effort to make the runtime more generic, the majority of these services have been removed from the Python 3 Standard Environment. So, before you can move your app into the new environment, you have to change a few things in your code: it either has to talk to the spun-off and generalized Google Cloud version of those services, or you can choose to use a third-party service. Fortunately, you can do this part of the migration while still running in the Python 2 environment, so when you are ready to switch runtimes, it should be a smooth ride.
I won't spend too many words on how to make an existing Python 2 code base Python 3 compatible; fortunately, there are plenty of resources and utility libraries out there to facilitate this. The code base of my blog is very small, so this wasn't a big issue for me.
Next, there are some additions, deletions, and changes you have to make in the configuration files controlling the runtime. To name a few, you have to update the runtime element in app.yaml to reflect the correct version and remove any elements related to deprecated features. Also, your dependencies need to be specified in the requirements.txt file; the library imports specified in appengine_config.py are no longer invoked or necessary.
Once you have checked all the boxes for the steps above, you are ready to test your application. With the Python 2 Standard Environment, you had to use the dev_appserver.py local development server, which emulated the production App Engine runtime, including the bundled services. With the Python 3 Standard Environment, the bundled services have been decoupled, so in theory you can use any WSGI-compatible web server implementation, and this is the local testing method the official documentation recommends. However, there is still a small set of services that only exist in the runtime, static file serving for instance. If you need one of those, you are stuck with dev_appserver.py, which in my experience is still in a half-baked state for the Python 3 environment: for one, the script that bootstraps the server environment still uses Python 2, even though the runtime environment it runs your app in is Python 3.7. This surprised me after I had set up a virtual environment with Python 3.
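For an app that does not depend on those runtime-only services, local testing with a plain WSGI server could look roughly like the sketch below. It uses only wsgiref from the standard library; the placeholder app and the serve_locally helper are names made up for this illustration, not part of App Engine or of my blog engine.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Placeholder WSGI app; in a real project this would be your own app object."""
    body = b"Hello from the local WSGI server\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve_locally(wsgi_app, host="127.0.0.1", port=8080):
    """Serve wsgi_app until interrupted. Meant for local testing only."""
    with make_server(host, port, wsgi_app) as httpd:
        print(f"Serving on http://{host}:{port}")
        httpd.serve_forever()
```

Calling serve_locally(app) from a script or an interactive shell starts the server. Static files declared in app.yaml would not be served this way, which is exactly the limitation described above.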
If your application is running fine locally and you are ready to deploy to App Engine, there is one additional change you have to make. The skip_files element in app.yaml has been replaced by a separate .gcloudignore file, whose syntax borrows heavily from the .gitignore file of Git. Any paths matching the patterns specified in this file are excluded from the deployment.
This is the high level process in a nutshell.
My migration process
After this general overview, I would like to give you a more specific account of my own migration journey. I want to emphasize that this is a process that has worked for me, but that doesn't mean you need to follow the same steps or order. As always, it depends on your concrete situation. Hopefully there will still be some good takeaways for those planning a similar migration.
Datastore access
As far as App Engine is concerned, there are three primary ways in which you can access the Google Cloud Datastore:
- The built-in google.appengine.ext.db module.
- The built-in google.appengine.ext.ndb module.
- The external google.cloud.ndb module.
Of these three options, only the last one is supported by the Python 3 Standard Environment, so I made the changes in two stages. First, I migrated to the built-in NDB API; for this step, the client code had to be changed. Then, once the application was working fine with the NDB API, I switched over to the external library module. This is what the documentation calls Cloud NDB. The difference between the NDB API built into the App Engine runtime and the external Cloud NDB module is very small, although there are some limitations you need to keep in mind for Cloud NDB. It is possible that you won't have to change your client code at all, apart from the way Cloud NDB is imported and the client context is created.
This was the stage where I encountered the most glitches and where the most tweaking was required compared to the officially documented steps, but I always managed to find a solution eventually. The GitHub issues of the Google API projects were a great help with these.
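To illustrate what "creating the client context" means in practice: with Cloud NDB, every Datastore call has to run inside a context obtained from a client object. One common way to arrange this is a small WSGI middleware, sketched below. The function name ndb_context_middleware is my own choice for this example, and the sketch assumes the app finishes its Datastore work before it returns the response iterable.

```python
def ndb_context_middleware(wsgi_app, client):
    """Wrap wsgi_app so that every request runs inside client.context().

    client is expected to behave like google.cloud.ndb.Client, i.e. its
    context() method returns a context manager.
    """
    def middleware(environ, start_response):
        # The context is entered before the app is called and left
        # once the app has returned its response.
        with client.context():
            return wsgi_app(environ, start_response)

    return middleware


# In the real application (requires the google-cloud-ndb package):
#   from google.cloud import ndb
#   app = ndb_context_middleware(app, ndb.Client())
```

Passing the client in as a parameter keeps the wrapper easy to exercise in tests with a stand-in object, without touching the Datastore.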
The Pyramid Web Framework
When App Engine launched with the Python runtime, the recommended framework to get going was Webapp2. Webapp2 is a pretty simple but well thought out framework, which is not surprising given Google's involvement. It suited the needs of my little weblog project perfectly at the time, so I started using it. Time went by, and today pretty much every popular Python web framework that supports WSGI can run on App Engine. This may be one of the reasons why I don't see much movement in the Webapp2 project, although it is still being maintained by the community.
Hence, I decided to seize the opportunity and pick a widespread framework that is lightweight but has a good basic set of importable services and a large extension library. I wanted something lighter and less opinionated than Django that is, at the same time, relatively easy to scale. Pyramid fit the bill in my book. While moving off Webapp2 is not necessary for migrating to the Python 3 Standard Environment, I have done it, so it will be covered in this series.
Authentication
In the Python 2 Standard Environment, I had a separate route configured for the admin interface in the app.yaml configuration file, and I used the login element for this admin route. Apart from the admin interface, my blog is entirely public, and I am the only administrator, so this was very convenient for me. I didn't have to worry about setting up authentication and authorization policies and sessions; I could simply use my Google account to authenticate. All the complexity was hidden, and Google guaranteed that the blog administration interface was secure. Unfortunately, this option is also unavailable in the Python 3 Standard Environment, so I had to build the authentication and authorization in the web application layer.
Fortunately, Pyramid has a pretty solid authentication and authorization feature built in, and you can easily extend it or plug into the components you wish to customize. In my case, I have wired Google Sign-in into the Pyramid Authentication Policy, so that I can continue to use my Google account, which I already use on a daily basis. As a result, I almost never have to type in my credentials to access the blog's admin interface.
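The core decision behind such a setup can be sketched independently of Pyramid: verify the ID token issued by Google Sign-in, then check that the verified email belongs to an administrator. The helper below is a simplified illustration and not the code of my blog; the function name, the verify parameter and the admin_emails set are assumptions made for this example.

```python
def admin_email_from_token(token, verify, admin_emails):
    """Return the verified admin email for token, or None if access is denied.

    verify is a callable that returns the token's claims as a dict and
    raises ValueError when the token is invalid.
    """
    try:
        claims = verify(token)
    except ValueError:
        # Invalid, expired or malformed token: treat as not authenticated.
        return None
    email = claims.get("email")
    if claims.get("email_verified") and email in admin_emails:
        return email
    return None


# With the google-auth package, verify could be built like this
# (CLIENT_ID is a placeholder for your own OAuth client ID):
#   from google.auth.transport import requests
#   from google.oauth2 import id_token
#   verify = lambda t: id_token.verify_oauth2_token(t, requests.Request(), CLIENT_ID)
```

The returned email could then serve as the user id handed to whichever Pyramid authentication policy you use.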
Routing
In the Python 2 Standard Environment, App Engine allows you to specify in the app.yaml configuration which module (application) to run for various URL patterns. This way you can dispatch requests to the application of your choice, and you only have to configure the application-specific routing for each of them. In the Python 3 Standard Environment, it is no longer possible to dispatch requests this way. You can only have a single entrypoint for your application, and you have to take care of all the routing using the application's routing engine.
Thus, I had to merge my routes and views, which were distributed across separate applications, into a single application and routing configuration. The single entrypoint of the application became a main.py script that sets up the configuration and instantiates the WSGI application object in compliance with App Engine's requirements.
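The idea of a single entrypoint can be shown with a framework-agnostic sketch. In my blog, Pyramid's routing does this job; the example below uses only plain WSGI to show the shape of a main.py that exposes one app object and dispatches every request itself. The handler names and the ROUTES table are hypothetical.

```python
# main.py: one WSGI object named "app" handles every request.

def blog_index(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"public blog\n"]


def admin_index(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"admin interface\n"]


def not_found(environ, start_response):
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"not found\n"]


# Routes that used to be spread over several scripts in app.yaml
# now live in one table inside the application.
ROUTES = {
    "/": blog_index,
    "/admin": admin_index,
}


def app(environ, start_response):
    """Dispatch on the exact request path; unknown paths get a 404."""
    handler = ROUTES.get(environ.get("PATH_INFO", "/"), not_found)
    return handler(environ, start_response)
```

A real framework adds pattern matching, view configuration and much more on top, but the single dispatching object is the part the new runtime cares about.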
Make the code Python 3 compatible
Fortunately, nothing in the blog's code base relied heavily on Python 2 specific APIs, so I had to make one, maybe two, changes altogether.
Make the project configuration compatible with the new runtime
As many services have been removed from the runtime, their entries had to be deleted from the app.yaml configuration as well. Other settings had to be updated or moved into separate configuration files.
Briefly:
- The runtime version had to be updated in app.yaml.
- Routes to specific scripts had to be removed from app.yaml, only a single catchall route remained to invoke the WSGI app.
- Libraries were removed from app.yaml.
- Paths to skip on deployment were removed from app.yaml and added to the .gcloudignore file.
- The (pip) requirements.txt file was added listing the project's package dependencies.
- appengine_config.py can be removed as it is no longer invoked during the bootstrap.
Testing locally
For each stage of the migration, I used the dev_appserver.py development runtime to run the application locally and test the changes. Since I am using the Google Cloud Datastore service, I also had to set up the Datastore Emulator locally. The latter required some additional configuration beyond the steps covered in the documentation. The guide contains a pretty good generic summary on setting up your application with the emulator, but that procedure doesn't work with the App Engine dev server as is; some extra steps are needed.
Deploy to App Engine
Since the application is pretty small in my case, I only deployed twice during the migration process:
- After the migration to Cloud NDB was done, which was a major change.
- After the rest of the changes were done and tested locally.
If you need to migrate a larger application, or you have more service dependencies in the App Engine runtime, it is highly recommended to do a test deployment after each stage to verify the functionality in the production App Engine environment as well. You will want to run a separate instance for that to avoid unexpected interruptions on the production site.
The next part
In part 2, I will share an in-depth account of my Datastore client API migration to Cloud NDB.