Using Amazon AWS to serve up OpenStreetMap data
Posted by Diarmuid on December 9, 2007
OpenStreetMap is a FOSS project that allows anyone to help create a fully open street map of the world. Users upload GPS data to a central server, where they or others can annotate it and create street maps. The resulting street maps are as good as Google Maps, Yahoo or Microsoft. The coverage is great in the dense urban centers and not so good in the countryside. As part of this project they have created some great tools and leveraged some others.
But, you ask, why bother when Google and co. are so generous with their data? Well, nothing is free: while Google can afford to let you use this data at home for nothing, they do restrict its use in commercial applications. They specifically do not allow a system that connects a GPS to their map. Yahoo and, AFAICT, Microsoft are the same.
OSM, on the other hand, allows you to do pretty much anything you want with its data. It provides a great interface using OpenLayers that gives the now commonplace AJAX “slippy map”. It uses a great map renderer called Mapnik that actually creates the little images that make up the web map. While these images can be used freely, even in commercial applications, they are served from hardware that is under a lot of strain, so OSM encourages the use of caches on your own servers.
I am developing a GPS tracking system for cars, trucks and boats and I want to do this as cheaply as possible. While I could use Google et al., I don’t want to get into licensing, etc. So I am going to run my own OSM servers and I am going to do it on Amazon AWS.
Technical Details follow!! You’ve been warned.
I used an Ubuntu 6.06 LTS (Dapper Drake) server AMI with all the usual build tools. I downloaded Mapnik 0.4.0 and compiled it. I downloaded World Boundaries and extracted it to /mnt/spatialdata. I copied generate_tiles.py and osm.xml to /mnt. I modified osm.xml to reflect the new path locations and removed all the PostgreSQL calls (OSM stores all the road data in SQL, but I had trouble getting it going so I have left that for now). I then modified generate_tiles.py to only render an area around the South of Ireland and to write all data to /mnt/tiles. I added S3.py to /usr/src and modified generate_tiles.py to add that directory to the module search path (sys.path.append('/usr/src')). I then added code to upload each new tile to Amazon S3 in the osmtiles bucket.
I added the following lines above the “def minmax (a,b,c):” line. They assume that you have set the environment variables properly.
os.sys.path.append('/usr/src')
import S3
BUCKET_NAME = "osmtiles"
conn = S3.AWSAuthConnection()
Then, after the file is created locally, I add these lines:
try:
    filedata = open(tile_uri, 'rb').read()
    content_type = 'image/png'
    response = conn.put(BUCKET_NAME, '1.0.0/basic/' + tile_shortname, S3.S3Object(filedata),
        {'x-amz-acl': 'public-read', 'Content-Type': content_type, 'title': tile_shortname})
    print response.http_response.status
except Exception, ex:
    print ex
    print tile_uri
where tile_shortname = zoom + '/' + str_x + '/' + str_y + '.png'. Thus files are created on S3 with a path like http://osmtiles.s3.amazonaws.com/1.0.0/basic2/10/485/340.png.
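To make the zoom/x/y part of that path less mysterious, here is a minimal sketch of how a latitude and longitude map to a tile number. It assumes the standard slippy-map (XYZ) numbering, which is what the sample path above appears to follow; deg2num and tile_key are names made up for this illustration and are not part of generate_tiles.py or S3.py.

```python
import math

def deg2num(lat_deg, lon_deg, zoom):
    """Return the (x, y) slippy-map tile that contains a WGS84 point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    # Web Mercator: y counts downwards from the northern edge of the map
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def tile_key(zoom, x, y, prefix="1.0.0/basic/"):
    """Build an S3 key shaped like prefix + tile_shortname."""
    return "%s%d/%d/%d.png" % (prefix, zoom, x, y)

# A point in the south-west of Ireland (coordinates chosen for illustration)
x, y = deg2num(51.5, -9.3, 10)
print(tile_key(10, x, y))
```

Under that assumption the point lands in tile 10/485/340, the same tile as in the sample path.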
These files can then be used in an OpenLayers map: http://www.lukulu.com/map/tilesv2.htm
I also wrote a script to store the World Boundaries file on S3. I used a file splitter and created 20 segments. I did the same for the planet.osm file (which contains the road data), which is 2 GB. There is no cost for data transfer between EC2 and S3, so I won’t have to pay to download these again for a while.
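The splitting step can be sketched roughly as follows. This is not the file splitter I actually used, just a minimal stand-in using only the Python standard library; split_file, join_files and the ".partNN" naming are assumptions made for the example.

```python
import os

def split_file(path, parts, out_dir):
    """Split `path` into at most `parts` roughly equal segments; return their paths."""
    size = os.path.getsize(path)
    chunk = max(1, -(-size // parts))  # ceiling division, never less than 1 byte
    names = []
    with open(path, "rb") as src:
        index = 0
        while True:
            data = src.read(chunk)  # one whole segment is held in memory at a time
            if not data:
                break
            name = os.path.join(out_dir, "%s.part%02d" % (os.path.basename(path), index))
            with open(name, "wb") as dst:
                dst.write(data)
            names.append(name)
            index += 1
    return names

def join_files(part_paths, out_path):
    """Concatenate the segments, in order, back into a single file."""
    with open(out_path, "wb") as dst:
        for name in part_paths:
            with open(name, "rb") as src:
                dst.write(src.read())
```

Each segment would then be uploaded as its own S3 object, and a fresh instance would download them all and call join_files before use.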
So What?
OK, so you ask, what is the big deal? Well, you may be right, it is not such a big deal, but what this means is that I can use Amazon EC2 to create my maps, then use Amazon S3 to store them. The only load on my servers is dishing up the map holding page. The JavaScript for the map comes from the OpenLayers site. I do have to pay for storage of the files and transmission costs, but it means I have an app that can scale and is being served by one of the fastest pipes in the business.
Where to from here
I only ran my scripts for a few minutes as I just wanted the coastlines, but if necessary one could build a more sophisticated system. You could have a local script that creates messages on the Simple Queue Service to tell the running scripts which tiles to render. Thus popular locations (particular cities) could be prioritised. For my own requirements, I only want certain areas to be rendered in high detail. I would develop a tool to let me create the bounding boxes easily and create the messages, and then my script would render just those areas. I could see this also being useful when changing the format of osm.xml. Of course, you could launch 20 instances of the AMI to do the processing and get the images rendered quickly.
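As a rough sketch of the bounding-box idea, the code below turns a box into one message per tile and puts the messages on a queue. A local queue.PriorityQueue stands in for the Simple Queue Service here; how priorities would map onto the real service is left open. The function names and the JSON message format are assumptions for illustration, and the tile numbering again assumes the standard slippy-map scheme.

```python
import json
import math
import queue

def deg2num(lat_deg, lon_deg, zoom):
    """Return the (x, y) slippy-map tile that contains a WGS84 point."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n)
    return x, y

def tiles_for_bbox(min_lon, min_lat, max_lon, max_lat, zoom):
    """List every (zoom, x, y) tile that touches the bounding box."""
    x0, y0 = deg2num(max_lat, min_lon, zoom)  # top-left corner; y grows southwards
    x1, y1 = deg2num(min_lat, max_lon, zoom)  # bottom-right corner
    return [(zoom, x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]

def enqueue_bbox(jobs, bbox, zooms, priority):
    """Queue one JSON message per tile; a lower number is rendered sooner."""
    for zoom in zooms:
        for z, x, y in tiles_for_bbox(*bbox, zoom):
            jobs.put((priority, json.dumps({"z": z, "x": x, "y": y})))

jobs = queue.PriorityQueue()  # local stand-in for the real message queue
# Rough box around Ireland at low detail, low priority (illustrative numbers)
enqueue_bbox(jobs, (-10.7, 51.3, -5.9, 55.5), zooms=[6], priority=5)
# Small area wanted in high detail first
enqueue_bbox(jobs, (-9.4, 51.45, -9.2, 51.55), zooms=[10], priority=1)

priority, body = jobs.get()  # a rendering script would loop over this
tile = json.loads(body)
print(priority, tile)
```

Each rendering instance would pull a message, render that one tile with Mapnik and upload it, so adding instances simply drains the queue faster.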
Also, while serving images from S3 is easy, there is a cost per request on top of the cost for storage. As each pan of a map loads new images, this could be an issue. Running instances on Amazon have up to 160 GB of “storage”. I say “storage” because if you turn off the instance, the data in /mnt is gone (it is preserved through reboots). When an instance starts, it could download a tar’d file of a standard set of tiles from S3. The web server would then serve the files. Dynamic DNS could be used, but this area is in flux at the moment with Amazon (NATting, etc.).
I am going to implement most of the above and see how well it works. I’m quite enthusiastic that this could work very well for mapping in other areas too, e.g. mining, geophysics, etc.
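The tar step could look something like the sketch below, which uses only Python’s standard tarfile module. The function names and the “tiles” directory name inside the archive are assumptions for the example; the download from S3 itself is not shown.

```python
import os
import tarfile

def pack_tiles(tile_dir, archive_path):
    """Bundle a tile directory tree into one gzip-compressed tar file."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store paths relative to "tiles/" rather than the absolute source path
        tar.add(tile_dir, arcname="tiles")
    return archive_path

def unpack_tiles(archive_path, dest_dir):
    """Extract the archive on a freshly started instance, e.g. under /mnt."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return os.path.join(dest_dir, "tiles")
```

PNG tiles are already compressed, so the gzip layer saves little; the main gain is fetching one object from S3 instead of thousands.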
deelkar said
The license for OpenStreetMap data is CC-BY-SA by the way.
Kelvin Nicholson said
Great write-up! I’m quite enjoying playing with OSM/Mapnik, although I haven’t even come close to producing anything of substance. Yet. I like your idea of using AWS to offload the processing — I’ve considered doing the same thing, especially since my little dev VPS doesn’t have the memory to complete osm2pgsql, and doesn’t have the space to unzip the entire planet.osm — sigh.
Maybe I will cave in and create something automated and leverage AWS. Either way, thanks for sharing your writeup!
Kelvin