Monday, May 20, 2013

Making my webserver run on App Engine

So I had my two Python scripts serving my tax visualization webpage: a JSON server on localhost:9888 and a trivial server for files on disk (such as the HTML page) on localhost:9889. How do I make that available on the web?

Of course I could just put my static pages on some free web hosting service, but that misses the point, since I'd like to maintain my own JSON server. So my friends at work suggested using EC2 (Amazon) or App Engine (Google). Since the EC2 free tier seems to expire after one year, I decided to go for GAE.

GAE provides a development kit that supports local development and contains some upload scripts. But I wanted to do everything on my Chromebook Pixel, which does not have a real shell (I can't enable developer mode because it's a corporate laptop). So I looked around and came across codenvy.com (formerly eXo Cloud IDE), a free service that offers online coding and integration with many services, among them GAE. Cloud9, another popular online dev platform, does not seem to offer that integration.

Here are the things I needed to adapt in order to make it work:

  • Write a configuration file (app.yaml) that specifies the handlers, i.e. what to do when a certain URL is requested. In my application, /json.* URLs are handled by a Python script, whereas all other URLs are served as static files.
  • Adapt my Python script to use the webapp2 framework instead of the native Python web server modules.
There are a lot of small details that can go wrong with either of them, but in 7-8 hours' time I managed to make it work.
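For reference, a minimal app.yaml along those lines for the Python 2.7 runtime of that time might look like the sketch below. The application ID is taken from the appspot URL; the module name main.py (exposing a webapp2 WSGIApplication called app) and the static/ directory are assumptions, not necessarily the real project layout.

```yaml
application: swisstaxvisualization
version: 1
runtime: python27
api_version: 1
threadsafe: true

handlers:
# /json... requests go to the WSGI app "app" in main.py (assumed names).
- url: /json.*
  script: main.app

# The bare URL serves the HTML page (assumed location).
- url: /
  static_files: static/index.html
  upload: static/index.html

# Everything else is served straight from static/ (assumed directory).
- url: /(.*)
  static_files: static/\1
  upload: static/(.*)
```

Handlers are matched top to bottom, so the /json.* entry has to come before the catch-all static pattern.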

The final result is at
http://swisstaxvisualization.appspot.com/

Sunday, May 12, 2013

Making data files UTF8-compatible

An MS Excel CSV export containing Swiss town names has umlauts, accents etc. How to replace these? For example, to replace ü by ue, use

sed -i 's/\xFC/ue/g' my.file


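  • -i (in-place) means that the input file is overwritten with the result instead of the result being printed to the terminal.
  • \xFC is the hexadecimal byte value of ü in the Latin-1/Windows-1252 encoding of the Excel export (in UTF-8, ü would be the two bytes \xC3\xBC).
  • Don't forget the single quotes; otherwise the shell may strip the backslash before sed sees it.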

Character   Hex code
ü           \xFC
ö           \xF6
ä           \xE4
é           \xE9
â           \xE2
è           \xE8
ê           \xEA
ë           \xEB
ô           \xF4

sed -i 's/\xFC/ue/g' ledig.csv
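Running one sed command per character gets tedious. A minimal Python 3 sketch that applies all the replacements from the table in one pass, assuming the export is Windows-1252 (cp1252) encoded, which matches the single-byte codes above. REPLACEMENTS, transliterate and convert_file are hypothetical names; with transliterate_accents=False the accents are kept and the file is simply re-encoded as UTF-8.

```python
# Hypothetical replacement table built from the characters listed above:
# umlauts become two letters, other accented vowels simply lose the accent.
REPLACEMENTS = {
    '\xfc': 'ue',  # ü
    '\xf6': 'oe',  # ö
    '\xe4': 'ae',  # ä
    '\xe9': 'e',   # é
    '\xe2': 'a',   # â
    '\xe8': 'e',   # è
    '\xea': 'e',   # ê
    '\xeb': 'e',   # ë
    '\xf4': 'o',   # ô
}
TABLE = str.maketrans(REPLACEMENTS)


def transliterate(raw, encoding='cp1252'):
    """Decode bytes from the Excel export and replace the accented characters."""
    return raw.decode(encoding).translate(TABLE)


def convert_file(src, dst, transliterate_accents=True, encoding='cp1252'):
    """Write src to dst as UTF-8, optionally replacing accented characters."""
    with open(src, 'rb') as f:
        raw = f.read()
    if transliterate_accents:
        text = transliterate(raw, encoding)
    else:
        text = raw.decode(encoding)
    # newline='' keeps Excel's original line endings untouched.
    with open(dst, 'w', encoding='utf-8', newline='') as f:
        f.write(text)
```

For example, transliterate(b'Z\xfcrich') gives 'Zuerich'.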

Creating a webpage and a JSON data server


Had a lot of spare time this weekend, so I decided to tackle the problem of understanding tax rates in Switzerland.
Switzerland has the unique feature that the amount of tax you pay depends on the commune (town) you live in. There are calculators on the net that estimate the tax owed, but to compare communes one needs to submit a form N times and compare the numbers. Time for some visualization.

Since our intern at work is currently also creating web-based visualizations with a framework called D3, I figured it would be nice to use that for my little project as well. Also, I did not want to hard-code the data into the webpage (besides, sending the data for all ~3000 communes to the client up front is far from efficient). Instead, I wanted to use a classic client-server model, and it appears that JSON is the standard format for sending data around. What follows are a few lessons I learned during the ~10 hours it took me to set it up.

Servers

I decided to use Python for the web servers because I knew that its standard library has some easy-to-use modules.
Python has the SocketServer module, which contains TCPServer (derived from BaseServer), UDPServer (derived from TCPServer) and two Unix-domain-socket servers. One would subclass one of them (here the threading variant of TCPServer):

import json
import SocketServer

class MyTCPServer(SocketServer.ThreadingTCPServer):
  allow_reuse_address = True

and define a handler to process requests:

class MyTCPHandler(SocketServer.BaseRequestHandler):
  def handle(self):
    try:
      # Assumes the whole message arrives in a single 1024-byte recv().
      data = json.loads(self.request.recv(1024).strip())
      # Do something with data.
      self.request.sendall(json.dumps({'foo': 'server says hi.'}))
    except Exception as e:
      print "Exception while receiving message:", e

Then instantiate the server and run it indefinitely:

server = MyTCPServer(('my.host.com', 9888), MyTCPHandler)
server.serve_forever()
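To try the raw TCP server, one needs a client that writes a bare JSON string to the socket. A minimal Python 3 sketch (SocketServer is called socketserver there) that runs the same kind of server on a background thread and talks to it; the echo field, start_tcp_server and send_json are hypothetical additions for illustration.

```python
import json
import socket
import socketserver
import threading


class MyTCPServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True  # handler threads won't keep the process alive


class MyTCPHandler(socketserver.BaseRequestHandler):
    def handle(self):
        try:
            # Like the original, assumes the message fits in one recv() call.
            data = json.loads(self.request.recv(1024).decode('utf-8').strip())
            reply = {'foo': 'server says hi.', 'echo': data}
            self.request.sendall(json.dumps(reply).encode('utf-8'))
        except Exception as e:
            print('Exception while receiving message:', e)


def start_tcp_server(host='127.0.0.1', port=0):
    """Run the server on a background thread; port 0 lets the OS pick a free port."""
    server = MyTCPServer((host, port), MyTCPHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_json(host, port, message):
    """Hypothetical client: send one JSON message and return the decoded reply."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(json.dumps(message).encode('utf-8'))
        sock.shutdown(socket.SHUT_WR)  # signal that we are done sending
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            chunks.append(chunk)
    return json.loads(b''.join(chunks).decode('utf-8'))
```

A browser never sends such a bare JSON string; it sends an HTTP request line and headers, which json.loads rejects.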
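But this doesn't work with requests from "normal" web pages: a browser speaks HTTP, so what arrives on the socket is an HTTP request line plus headers (GET / HTTP/1.1 ...) rather than a bare JSON string, and json.loads fails on it. After all, how would you even send raw JSON from within a webpage?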
The simpler option seems to be to run an ordinary HTTP server where the type of request is parsed from the URL parameters.
import json
import urlparse
import BaseHTTPServer

class MyHTTPServer(BaseHTTPServer.HTTPServer):
  allow_reuse_address = True

class MyHTTPHandler(BaseHTTPServer.BaseHTTPRequestHandler):
  def do_GET(self):
    if 'favicon.ico' in self.path:  # Requested by browsers.
      self.send_error(404)
      return
    # Get request specifics from the URL parameters (?key=value&...).
    params = urlparse.parse_qs(urlparse.urlparse(self.path).query)
    # Prepare the result (placeholder: echo the parameters back).
    result = {'params': params}
    serialized_result = json.dumps(result)
    self.send_response(200)
    # The body is a JavaScript function call (JSONP), not bare JSON.
    self.send_header('Content-type', 'application/javascript')
    self.end_headers()
    # Wrap a callback around the result (JSONP).
    self.wfile.write('parseResponse(' + serialized_result + ')')

server = MyHTTPServer(('my.host.com', 9888), MyHTTPHandler)
server.serve_forever()
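For Python 3, where BaseHTTPServer became http.server and urlparse became urllib.parse, the same JSONP server might be sketched as follows. It binds to 127.0.0.1 on a free port and runs on a background thread so it can be tried with a small client; the echoed-parameters result and the helper names start_http_server and fetch are placeholders, not the real tax computation.

```python
import json
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class MyHTTPServer(HTTPServer):
    allow_reuse_address = True


class MyHTTPHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urllib.parse.urlsplit(self.path)
        if url.path == '/favicon.ico':  # Requested by browsers.
            self.send_error(404)
            return
        # parse_qs maps each parameter name to a list of values.
        params = urllib.parse.parse_qs(url.query)
        result = {'params': params}  # placeholder result
        body = 'parseResponse(' + json.dumps(result) + ')'
        self.send_response(200)
        self.send_header('Content-Type', 'application/javascript')
        self.end_headers()
        self.wfile.write(body.encode('utf-8'))

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_http_server(host='127.0.0.1', port=0):
    """Run the server on a background thread; port 0 lets the OS pick a free port."""
    server = MyHTTPServer((host, port), MyHTTPHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(url):
    """Return (Content-Type, body) for a GET request to url."""
    with urllib.request.urlopen(url) as response:
        return response.headers['Content-Type'], response.read().decode('utf-8')
```

Fetching /?commune=Zurich from this server returns parseResponse({"params": {"commune": ["Zurich"]}}).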
 

JSON

JSON is a text-based data interchange format that is used to pass data between servers and clients. A JSON object looks like a Python dictionary, except that strings must use double quotes:
{"key1": "hello", "key2": 42, "key3": ["a", "b", "c"]}
Python has the json module that provides serialization from and to strings.
data = json.loads(s)
s = json.dumps(data)
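A short Python 3 round trip with the example object above shows both directions, and that single-quoted "JSON" is rejected; parses_as_json is a hypothetical helper for illustration.

```python
import json

data = {'key1': 'hello', 'key2': 42, 'key3': ['a', 'b', 'c']}

# dumps: Python objects -> JSON text (note the double quotes).
s = json.dumps(data)
print(s)  # {"key1": "hello", "key2": 42, "key3": ["a", "b", "c"]}

# loads: JSON text -> Python objects.
assert json.loads(s) == data


def parses_as_json(text):
    """Return True if text is valid JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


print(parses_as_json(s))                    # True
print(parses_as_json("{'key1': 'hello'}"))  # False: single quotes aren't JSON
```

The round trip is not always exact: a Python tuple, for instance, comes back as a list.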
JSONP is a simple convention of wrapping the serialized JSON data in a function call (the server returns 'myFunction(' + s + ')' instead of s), with the effect that a script tag on a webpage that loads this server result

<script type="text/javascript" src="http://my.host.com:9888/?jsonp=myFunction"></script>
will actually execute myFunction as a callback with the received JSON data. myFunction needs to be defined on the page beforehand. Note that jsonp=myFunction is only there for the server to determine what it should do; it is a convention, not a requirement, that the URL has this form.
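If the server honors the jsonp parameter instead of hard-coding the callback name, a minimal Python 3 sketch could look like this. jsonp_body and the identifier check are hypothetical additions: since the callback name is copied into executable JavaScript, restricting it to (optionally dotted) identifiers avoids echoing arbitrary script back to the page.

```python
import json
import re
import urllib.parse

# A JavaScript identifier, optionally dotted (e.g. "app.handle").
_CALLBACK_RE = re.compile(r'[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*')


def jsonp_body(path, payload, default_callback='parseResponse'):
    """Wrap payload as JSONP, taking the callback name from ?jsonp=... if present."""
    query = urllib.parse.urlsplit(path).query
    callback = urllib.parse.parse_qs(query).get('jsonp', [default_callback])[0]
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError('invalid JSONP callback name: %r' % callback)
    return '%s(%s)' % (callback, json.dumps(payload))
```

For the script tag above, jsonp_body('/?jsonp=myFunction', {'a': 1}) returns 'myFunction({"a": 1})'.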

Security

This seems to be taken seriously in Chrome.
  • There is a strictly enforced same-origin policy that prevents a page's scripts from reading responses (e.g. via XMLHttpRequest) from a different host or port; my HTML page on port 9889 and my JSON server on port 9888 count as different origins. JSONP circumvents that, because a <script> tag may load and execute code from any host.
  • There is also a strictly enforced policy that HTTPS pages cannot load active content such as scripts over plain HTTP (mixed content). Setting up an HTTPS Python server seems to have become easier in Python 3, but my installation is 2.7, where it doesn't seem straightforward. Moreover, my "auto" HTML page server, which lets me view any page in a certain directory, always uses HTTPS. So I decided to create a second server that serves the HTML pages over plain HTTP, just so they can communicate with my insecure JSONP server.

More random insights


  • Browsers seem to always request favicon.ico (the little icon that appears in the browser tab), so the HTML page server must handle that request.
  • When serving a CSS file, the MIME type of the response MUST be set to text/css, or else Chrome won't apply the stylesheet.
    The correct MIME types for JSON, JS and HTML are application/json, text/javascript and text/html.
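Both points can be handled in the HTML page server itself. A minimal Python 3 sketch based on http.server.SimpleHTTPRequestHandler: it forces the MIME types listed above, leaves all other types to the standard library's guess, and answers favicon.ico with 204 No Content. The CONTENT_TYPES table, the 204 choice and the helper names are assumptions for illustration.

```python
import functools
import os
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Forced MIME types for the file kinds mentioned above.
CONTENT_TYPES = {
    '.html': 'text/html',
    '.css': 'text/css',
    '.js': 'text/javascript',
    '.json': 'application/json',
}


def content_type_for(filename):
    """Return the forced MIME type for filename, or None if it isn't in the table."""
    return CONTENT_TYPES.get(os.path.splitext(filename)[1].lower())


class StaticHandler(SimpleHTTPRequestHandler):
    def guess_type(self, path):
        # Use the table first; let the standard library guess everything else.
        return content_type_for(path) or super().guess_type(path)

    def do_GET(self):
        if self.path.split('?', 1)[0] == '/favicon.ico':
            # No icon available: reply "204 No Content" instead of an error page.
            self.send_response(204)
            self.end_headers()
            return
        super().do_GET()

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_static_server(directory, host='127.0.0.1', port=0):
    """Serve directory over plain HTTP on a background thread."""
    handler = functools.partial(StaticHandler, directory=directory)
    server = HTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The directory keyword of SimpleHTTPRequestHandler requires Python 3.7 or later.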