Google’s S2, geometry on the sphere, cells and Hilbert curve

Update – 05 Dec 2017: Google just announced that it will be commited to the development of a new released version of the S2 library, amazing news, repository can be found here.

Google’s S2 library is a real treasure, not only due to its capabilities for spatial indexing but also because it is a library that was released more than 4 years ago and it didn’t get the attention it deserved. The S2 library is used by Google itself on Google Maps, MongoDB engine and also by Foursquare, but you’re not going to find any documentation or articles about the library anywhere except for a paper by Foursquare, a Google presentation and the source code comments. You’ll also struggle to find bindings for the library, the official repository has missing Swig files for the Python library and thanks to some forks we can have a partial binding for the Python language (I’m going to it use for this post). I heard that Google is actively working on the library right now and we are probably soon going to get more details about it when they release this work, but I decided to share some examples about the library and the reasons why I think that this library is so cool.

The way to the cells

You’ll see this “cell” concept all around the S2 code. The cells are an hierarchical decomposition of the sphere (the Earth on our case, but you’re not limited to it) into compact representations of regions or points. Regions can also be approximated using these same cells, that have some nice features:

  • They are compact (represented by 64-bit integers)
  • They have resolution for geographical features
  • They are hierarchical (thay have levels, and similar levels have similar areas)
  • The containment query for arbitrary regions are really fast

The S2 library starts by projecting the points/regions of the sphere into a cube, and each face of the cube has a quad-tree where the sphere point is projected into. After that, some transformation occurs (for more details on why, see the Google presentation) and the space is discretized, after that the cells are enumerated on a Hilbert Curve, and this is why this library is so nice, the Hilbert curve is a space-filling curve that converts multiple dimensions into one dimension that has an special spatial feature: it preserves the locality.

Hilbert Curve

Hilbert Curve

The Hilbert curve is space-filling curve, which means that its range covers the entire n-dimensional space. To understand how this works, you can imagine a long string that is arranged on the space in a special way such that the string passes through each square of the space, thus filling the entire space. To convert a 2D point along to the Hilbert curve, you just need select the point on the string where the point is located. An easy way to understand it is to use this iterative example where you can click on any point of the curve and it will show where in the string the point is located and vice-versa.

In the image below, the point in the very beggining of the Hilbert curve (the string) is located also in the very beginning along curve (the curve is represented by a long string in the bottom of the image):

Hilbert Curve
Hilbert Curve

Now in the image below where we have more points, it is easy to see how the Hilbert curve is preserving the spatial locality. You can note that points closer to each other in the curve (in the 1D representation, the line in the bottom) are also closer in the 2D dimensional space (in the x,y plane). However, note that the opposite isn’t quite true because you can have 2D points that are close to each other in the x,y plane that aren’t close in the Hilbert curve.

Hilbert Curve
Hilbert Curve

Since S2 uses the Hilbert Curve to enumerate the cells, this means that cell values close in value are also spatially close to each other. When this idea is combined with the hierarchical decomposition, you have a very fast framework for indexing and for query operations. Before we start with the pratical examples, let’s see how the cells are represented in 64-bit integers.

If you are interested in Hilbert Curves, I really recommend this article, it is very intuitive and show some properties of the curve.

The cell representation

As I already mentioned, the cells have different levels and different regions that they can cover. In the S2 library you’ll find 30 levels of hierachical decomposition. The cell level and the area range that they can cover is shown in the Google presentation in the slide that I’m reproducing below:

Cell areas
Cell areas

As you can see, a very cool result of the S2 geometry is that every cm² of the earth can be represented using a 64-bit integer.

The cells are represented using the following schema:

Cell Representation Schema
Cell Representation Schema (images from the original Google presentation)

The first one is representing a leaf cell, a cell with the minimum area usually used to represent points. As you can see, the 3 initial bits are reserved to store the face of the cube where the point of the sphere was projected, then it is followed by the position of the cell in the Hilbert curve always followed by a “1” bit that is a marker that will identify the level of the cell.

So, to check the level of the cell, all that is required is to check where the last “1” bit is located in the cell representation. The checking of containment, to verify if a cell is contained in another cell, all you just have to do is to do a prefix comparison. These operations are really fast and they are possible only due to the Hilbert Curve enumeration and the hierarchical decomposition method used.

Covering regions

So, if you want to generate cells to cover a region, you can use a method of the library where you specify the maximum number of the cells, the maximum cell level and the minimum cell level to be used and an algorithm will then approximate this region using the specified parameters. In the example below, I’m using the S2 library to extract some Machine Learning binary features using level 15 cells:

Cells at level 15 - binary features for Machine Learning
Cells at level 15 – binary features for Machine Learning

The cells regions are here represented in the image above using transparent polygons over the entire region of interest of my city. Since I used the level 15 both for minimum and maximum level, the cells are all covering similar region areas. If I change the minimum level to 8 (thus allowing the possibility of using larger cells), the algorithm will approximate the region in a way that it will provide the smaller number of cells and also trying to keep the approximation precise like in the example below:

Covering using range from 8 to 15 (levels)
Covering using range from 8 to 15 (levels)

As you can see, we have now a covering using larger cells in the center and to cope with the borders we have an approximation using smaller cells (also note the quad-trees).

Examples

* In this tutorial I used the Python 2.7 bindings from the following repository. The instructions to compile and install it are present in the readme of the repository so I won’t repeat it here.

The first step to convert Latitude/Longitude points to the cell representation are shown below:

>>> import s2
>>> latlng = s2.S2LatLng.FromDegrees(-30.043800, -51.140220)
>>> cell = s2.S2CellId.FromLatLng(latlng)
>>> cell.level()
30
>>> cell.id()
10743750136202470315
>>> cell.ToToken()
951977d377e723ab

As you can see, we first create an object of the class S2LatLng to represent the lat/lng point and then we feed it into the S2CellId class to build the cell representation. After that, we can get the level and id of the class. There is also a method called ToToken that converts the integer representation to a compact alphanumerical representation that you can parse it later using FromToken method.

You can also get the parent cell of that cell (one level above it) and use containment methods to check if a cell is contained by another cell:

>>> parent = cell.parent()
>>> print parent.level()
29
>>> parent.id()
10743750136202470316
>>> parent.ToToken()
951977d377e723ac
>>> cell.contains(parent)
False
>>> parent.contains(cell)
True

As you can see, the level of the parent is one above the children cell (in our case, a leaf cell). The ids are also very similar except for the level of the cell and the containment checking is really fast (it is only checking the range of the children cells of the parent cell).

These cells can be stored on a database and they will perform quite well on a BTree index.  In order to create a collection of cells that will cover a region, you can use the S2RegionCoverer class like in the example below:

>>> region_rect = S2LatLngRect(
        S2LatLng.FromDegrees(-51.264871, -30.241701),
        S2LatLng.FromDegrees(-51.04618, -30.000003))
>>> coverer = S2RegionCoverer()
>>> coverer.set_min_level(8)
>>> coverer.set_max_level(15)
>>> coverer.set_max_cells(500)
>>> covering = coverer.GetCovering(region_rect)

First of all, we defined a S2LatLngRect which is a rectangle delimiting the region that we want to cover. There are also other classes that you can use (to build polygons for instance), the S2RegionCoverer works with classes that uses the S2Region class as base class. After defining the rectangle, we instantiate the S2RegionCoverer and then set the aforementioned min/max levels and the max number of the cells that we want the approximation to generate.

If you wish to plot the covering, you can use Cartopy, Shapely and matplotlib, like in the example below:

import matplotlib.pyplot as plt

from s2 import *

from shapely.geometry import Polygon

import cartopy.crs as ccrs
import cartopy.io.img_tiles as cimgt

proj = cimgt.MapQuestOSM()
plt.figure(figsize=(20,20), dpi=200)
ax = plt.axes(projection=proj.crs)

ax.add_image(proj, 12)
ax.set_extent([-51.411886, -50.922470,
               -30.301314, -29.94364])

region_rect = S2LatLngRect(
    S2LatLng.FromDegrees(-51.264871, -30.241701),
    S2LatLng.FromDegrees(-51.04618, -30.000003))

coverer = S2RegionCoverer()
coverer.set_min_level(8)
coverer.set_max_level(15)
coverer.set_max_cells(500)
covering = coverer.GetCovering(region_rect)

geoms = []
for cellid in covering:
    new_cell = S2Cell(cellid)
    vertices = []
    for i in xrange(0, 4):
        vertex = new_cell.GetVertex(i)
        latlng = S2LatLng(vertex)
        vertices.append((latlng.lat().degrees(),
                         latlng.lng().degrees()))

    geo = Polygon(vertices)
    geoms.append(geo)

print "Total Geometries: {}".format(len(geoms))
    
ax.add_geometries(geoms, ccrs.PlateCarree(), facecolor='coral',
                  edgecolor='black', alpha=0.4)
plt.show()

And the result will be the one below:

The covering cells.
The covering cells.

There are a lot of stuff in the S2 API, and I really recommend you to explore and read the source-code, it is really helpful. The S2 cells can be used for indexing and in key-value databases, it can be used on B Trees with really good efficiency and also even for Machine Learning purposes (which is my case), anyway, it is a very useful tool that you should keep in your toolbox. I hope you enjoyed this little tutorial !

– Christian S. Perone

Cite this article as: Christian S. Perone, "Google’s S2, geometry on the sphere, cells and Hilbert curve," in Terra Incognita, 14/08/2015, https://blog.christianperone.com/2015/08/googles-s2-geometry-on-the-sphere-cells-and-hilbert-curve/.

68 thoughts to “Google’s S2, geometry on the sphere, cells and Hilbert curve”

  1. Great post! Thank you for digging this up and articulating it so well. Definitely sharing this one around the office. Cheers!

  2. It’s a quadtree projected onto the unit sphere, plus using the Hilbert curve for a fast mapping between integer indices and cells.

  3. Thanks for this post. Helped me quite a lot. You had posted this link on my question in stack exchange.

    I have run the same examples of getting cellIDs in Java. It works well. However, I have some question which you might be able to answer or give direction for the same:

    1) How is cell ID different from geo-hash? Or is it same?

    2) Can routing be done with S2?

    3) I came across S2 library when I was trying to do ETA(time of arrival) analysis with vehicle/bike data I had. I read that Google, foursquare, Uber use this library. I had posted the same question : gis. stackexchange. com /questions/152719/postgis-for-gpx-analysis. few days back. Can you make out any approach to solve the same?

    1. Hi, I would say that S2 is a kind of a geohash with different properties. Routing cannot be done with S2, S2 isn’t a algorithm for routing but for geometry on sphere, it can be used to generate a covering for a route but not for routing itself, for that I suggest you to look at the traditional routing algorithms like A-star, etc.
      Your problem is a wide open problem, you can use Machine Learning methods to approximate the time it will take to go to a certain destination at certain time of the day using historical records, there are a lot of methods you can use, but I don’t really know any benchmark (of results performance) between different models for that, try searching for papers about that, you should certainly find something related with your problem.

      1. Thats great what you said – “it can be used to generate a covering for a route but not for routing itself,”
        It means, if I have a set of points along a route, I can create a covering with S2 cells.
        I have to pre-calculate from historical data, what time it takes for the cells to move at a particular time of the day. I can sum them up and create the ETA.

        But, what is best use of Hilbert curve here? I am still confused w.r.t usecases of the same. Can S2 cells be enriched with more information, like in my case with time? And Hilbert curve might give me faster lookup of this information?

        1. Not sure if the questions are still on your mind, but if they are :
          GeoHash and S2Cells are made for the same purpose of dividing a spherical geospatial plane into partitions / cells while preserving locality (meaning 2 geospatially adjacent cells should have a “similar” hash code / cell id)

          GeoHash does that by taking the sphere and dividing it to 2 parts, one is coded 1 the other is 0 (lvl1), taking one of the parts and continuing the same process will result in a new lvl with finer granularity and longer GeoHash Code.
          This method as you see is linear in embedding (lon,lat -> geohash code) which makes it super fast but lacks accuracy (mostly near equator lines) and therefore has large variance on cell sized (2 adjacent cells can vary largely in cell area)

          S2Cell embedding is more complex, bounding the sphere with a cube, dividing each cube face to 4 parts with a quad-tree (and each part to smaller parts, therefrom comes the notion of s2 cell levels which are the level in the tree) and finally enumerating tree leaves to s2cell IDs (long codes) using a hilbert space filling curve.
          Embedding is therefore quadratic (quad-tree) which is a bit slower then geohash but taking into account projection from a sphere to a cube which trades off with less variance regarding to cell area (adjacent cells are closer in size, so you get less errors when aggregating appearances in every cell and comparing them to one another – is this a “hotspot”?)

          Hope this helps!
          S2 Cells are awesome, and this was a great piece!
          (all the data provided here is “public” meaning you could with some great effort deduce it from the presentation above and the source code :))

          One more resource that might help is this presentation :
          http://www.slideshare.net/DanielMarcous/geo-data-analytics

          Regards,
          Daniel Marcous.

  4. “However, note that the opposite isn’t quite true because you can have 2D points that are close to each other in the x,y plane that aren’t close in the Hilbert curve.”

    I noticed this and wonder what the implications are. It seems that it would make some of the operations “asymmetric”. Is this a problem in practice?

  5. “you’re not going to find any documentation or articles about the library anywhere except for a paper by Foursquare , a Google presentation and the source code comments.”.

    Hi! Where can i find the paper from Foursquare? I’v got confuse in the class S2CellId in the java version, how dose it generate a lookup table recursively, It’s too much bit operations makes the code hard to read.

  6. Hi Christian,

    Really informative post.
    Were you able to create a regionCoverer using polygons? Do you have any example to do the same?

  7. Hi Christian — I think the library from the micolous repository is mapping coordinates to the wrong cube face, causing the more diamond-like cell shapes. I did a cover using some coordinates in Washington DC. The micolous library put them on face 101 while s2map put them on 100. I posted a comparison slide here: http://imgur.com/ymvw3o2

  8. Christian, this is a great article. I have a question- Does the S2 cell lie on the idealized sphere, or does it mirror the contour of the actual surface? Also since it assumes a sphere, is there an issue because the actual shape of the earth is an oblate ellipsoid?

  9. This is Fantastic. Thanks Christian and I am sorry for not able to find a word other than fantastic which really could compliment you sufficiently for this.

  10. Can S2 be used to make geo-radial query? For example, say, I have a set of lat/long, and I want to find out which among those are with in a given distance from a given location. Basically combination of GeoAdd and GeoRadis Redis commands.

  11. Really awesome post! The better summary that I’ve found on my study about GeoQuery and GeoIndex methods.

    Thanks a lot for that and for awesome Pingbacks!

  12. Great post.

    If you had a load of lat/long data in a database – how would you go about creating s2 functions to go from lat/long to a cell_id at a given level.

    I have lat/long data in google bigquery and am wondering if i could use UDF’s to get s2 functionality in BQ to avoid having to send the data back out to python or somewhere else for processing.

    Cheers
    Andy

  13. Great post, updates for Python 3.5 and s2sphere (python)

    import s2sphere as s2

    import matplotlib.pyplot as plt
    from shapely.geometry import Polygon
    import cartopy.crs as ccrs
    import cartopy.io.img_tiles as cimgt

    proj = cimgt.OSM()
    plt.figure(figsize=(20, 20), dpi=200)
    ax = plt.axes(projection=proj.crs)
    ax.add_image(proj, 12)

    ax.set_extent([-51.411886, -50.922470,
    -30.301314, -29.94364])

    region_rect = s2.LatLngRect(
    s2.LatLng.from_degrees(-51.264871, -30.241701),
    s2.LatLng.from_degrees(-51.04618, -30.000003))
    coverer = s2.RegionCoverer()
    coverer.min_level=8
    coverer.max_level=15
    coverer.max_cells=500
    covering = coverer.get_covering(region_rect)
    geoms = []

    for cellid in covering:
    new_cell = s2.Cell(cellid)
    vertices = []
    for i in range(0, 4):
    vertex = new_cell.get_vertex(i)
    latlng = s2.LatLng.from_point(vertex)
    vertices.append((latlng.lat().degrees,latlng.lng().degrees))
    geo = Polygon(vertices)
    geoms.append(geo)
    print("Total Geometries: {}".format(len(geoms)))

    ax.add_geometries(geoms, ccrs.PlateCarree(), facecolor='coral', edgecolor='black', alpha=0.4)
    plt.show()

  14. Hi Christian,

    Thanks so much for this blog post !

    As there is no documentation for the S2Polygon object, I wonder what arguments it needs to be fed.
    Could you give me a pointer ?
    Thank you !

  15. Everything posted made a bunch of sense. However, consider this, suppose you were to write a awesome title?
    I am not suggesting your information isn’t good, but what if you added a title that grabbed folk’s attention? I mean Google’
    s S2, geometry on the sphere, cells and Hilbert curve | Terra Incognita is a little vanilla.

    You could look at Yahoo’s home page and note how they create news
    headlines to get people to click. You might add a video or a related picture or
    two to get people interested about what you’ve got to say.

    Just my opinion, it would bring your website a little livelier.

  16. Chris….I don’t know why it took me so long to find this wonderful post, but it has been a tremendous benefit to me!!!

    Thank you very much!

  17. Thanks Christian S. Perone for this article. It has helped to pursued to use the S2 Google library to do our geo spatial queries for a fleet tracking system we are developing. Those who are interested in a very cool node.js implementation please look at Andrea’s typescript implementation of the Google S2 which plays nicely in node on github (nodes2ts). https://github.com/vekexasia/nodes2-ts

  18. Thank you for the auspicious writeup. It in truth used to be a enjoyment account it.

    Glance complicated to more delivered agreeable from you! By the
    way, how can we be in contact?

  19. Thanks a lot for this amazing tutorial Christian!

    I have a lot of doubts though. Would be great if you can help. 🙂

    In the example, you have created a rect region from two points. How does that work?
    I want to create a region from a thousand points(latlong). And then divide it region in the cells. How can I do that?

    I can see there is a method addPoint for rect in c++ library, but nothing in python.
    Can you help? Thanks a lot!

  20. Hi Christian,
    Many thanks for the article, really comprehensive! My only question would be how the appropriate face (out of the 6 faces of the cube) is decided to project a given (lat,lng) region on.
    Best,
    – ZP

    1. Edit:
      My original impression was that each region (or point is potentially projected linearly on more than one cube faces (the ones where it is “visible from). However, it seems that each region (or point) has a unique projection onto the cube, as the projection of the line segment starting from the Earth center, passing through the region (or point) and ending at a point in one of the six cube faces (as shown in Google’s presentation: https://docs.google.com/presentation/d/1Hl4KapfAENAOf4gv-pSngKwvS_jwNVHRPZTTDzXXn6Q/view#slide=id.i15).

  21. It’s very effortles to find out any matter on web as compared to textbooks, as I found this post aat this web page.

  22. Hi Christian,
    Thanks a lot for this amazing blog. It is just awesome.

    I have one use case where I have geo location of users and a point in the city and I need to find users who are within 5kms radius of that point. I could represent locations of user and the point with S2 cell id. But how would I query that?
    Do I need to load data for all the users inmemory to my application and then use library to find my target users. Or can I store data for all the users in a database and make query on S2 cell ids as S2 cell id is using Hilbert curve so we know that geo points who have closer S2 cell ids are closer in distance with each other.
    Or is there any other method which I am missing
    Thanks for the help in advance

  23. Hi Christian,
    Thanks a lot for this amazing blog. It is just awesome.

    I have one use case where I have geo location of users and a point in the city and I need to find users who are within 5kms radius of that point. I could represent locations of user and the point with S2 cell id. But how would I query that?
    Do I need to load data for all the users inmemory to my application and then use library to find my target users. Or can I store data for all the users in a database and make query on S2 cell ids as S2 cell id is using Hilbert curve so we know that geo points who have closer S2 cell ids are closer in distance with each other.
    Or is there any other method which I am missing
    Thanks for the help in advance

    1. Hi,
      Saw this blog very late. I am also waiting for reply to the question you posted. By any chance did you solve the problem.

  24. Hi Christian, thanks for this article. I just wanted to mention that I had to use `OSM` instead of `MapQuestOSM ` and switch the order of lat and lng when appending to `geoms`

    So instead of:
    “`
    vertices.append((latlng.lat().degrees(),
    latlng.lng().degrees()))
    “`
    I did:
    “`
    vertices.append((latlng.lng().degrees(),
    latlng.lat().degrees()))
    “`

  25. Everything composed wass very logical. However, what about this?
    what if you added a little information? I ain’t suggesting your information is not good, but
    suppose you added a post title that grabbed folk’s attention?
    I mean Google's S2, geometry on thhe sphere, cells and Hilbert cuurve | Terra Incognita is kinda plain. You could glance at Yahoo’s front page
    and see how they create news headlines to get
    people interested. You mignt add a video orr a related
    picture or two to get readers interested about what you’ve got
    to say. In my opinion, it would make your blog a little bit more interesting.

  26. Hello. I see that you don’t update your site too often. I know that writing articles is time consuming and boring.
    But did you know that there is a tool that allows you to create
    new articles using existing content (from article directories or other websites from your niche)?
    And it does it very well. The new articles are unique and pass the copyscape test.
    You should try miftolo’s tools

  27. To some individuals, junk food appears inevitable even though they are aware of the inevitable risk fast food
    diet regimens could possibly trigger.

  28. Anyone got a javascript version? I’m just a reg guy with a reg map that I’d like to implement this to.

  29. Magnificent goods from you, man. I’ve understand your stuff previous to
    and you are just too great. I actually like what you have acquired here, really like what you
    are stating and the way in which you say it. You make it entertaining and you still care for to keep it wise.
    I can not wait to read far more from you. This
    is really a tremendous site.

  30. I have been exploring for a bit for any high-quality articles or weblog posts in this sort of house .
    Exploring in Yahoo I ultimately stumbled upon this web site.
    Reading this info So i am happy to show that I’ve a very excellent uncanny feeling I came upon just what
    I needed. I most indisputably will make certain to don?t omit this website and give
    it a look regularly.

  31. Helpful information. Fortunate me I found your site by chance, and I am stunned why this accident didn’t took
    place earlier! I bookmarked it.

  32. Christian,

    Thanks for pulling this all together, it’s super useful 😀

    Note that, you need to be wary of the order that lats/lons are provided to each of the packages i.e.,
    – s2 takes (lat, lon) order
    – shapely takes (x, y) or (lon, lat) order
    – cartopy’s set_extent takes (x0, x1, y0, y1) or (lon0, lon1, lat0, lat1) order

    Your example code seems to follow this, but in lines 35-36 you’ve given s2 (lat, lon) order to specify each geometry vertex to a shapely Polygon, which should be in (lon, lat) order instead.

    Also, I’m keen to expose the whole s2geometry API to Python, rather than be limited to the restricted API coverage provided by the google/s2geometry SWIG Python bindings.

    Inspired by you blog, I’ve started that journey at https://github.com/bjlittle/docker-s2geometry

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.