Talk given at Datapalooza Denver 2016 titled "(Your) Data as a Service: The Easy Way to Build an API for Your Data".
# The Server Demo

This demo is inspired by the work in the [Connexion Example Service](https://github.com/hjacobs/connexion-example). The goal of this demo is to show how to quickly stand up an API over data that you've been holding on to -- and to impress upon you how easy it is to liberate your data, no matter what form it may be in. This example will create endpoints over an MS Excel file!

Sound interesting? Read on!

## The Data Source

The data for the first part of the demo comes from the [National Center for Education Statistics](http://nces.ed.gov). In particular, we're going to make an API over some of the data in the [Integrated Postsecondary Education Data System](http://nces.ed.gov/ipeds) (IPEDS), which keeps track of a number of interesting statistics on higher education.

## The API Problem

There is a wealth of information in the data that we'd like to expose, but the files are all in Excel. What's more, we don't exactly have time (or energy just yet) to convert the files to CSV, import them into a database, etc. For the AccessDB folks, this might be a simple problem to solve, but I'd rather just take the data and expose it as fast as possible.

Though it'd be nice to demo exposing all of this, I've only got an hour (remember), so the data I'd really like to start with is:

* average in-state tuition and fees from 2006 (or so) to 2014,
* average room and board (same time frame), and
* average books and supplies cost.

Seems simple enough, but there is a little hitch -- this data lives in two separate files.

## Finding the Data

We're going to use the survey data from IPEDS in this example, and you can check out the [methodology for the data here]. In particular, there's a data set for the "Student charges for academic year programs", which contains a number of data points on the costs students pay to attend 4-year colleges. These costs include the amount of tuition and fees, the amount of room and board, and so on. We'd like to expose just the averages of a few of these amounts to get the exercise started. While the raw data needed to calculate these averages is in the core data files, a bit of browsing shows that the "Dictionary" already has a summary of the averages we're looking for.

We're going to work with two files:

* 2010 dictionary: [http://nces.ed.gov/ipeds/datacenter/data/IC2010_AY_Dict.zip](http://nces.ed.gov/ipeds/datacenter/data/IC2010_AY_Dict.zip), which includes the data we need for 2007-2011, and
* 2014 dictionary: [http://nces.ed.gov/ipeds/datacenter/data/IC2014_AY_Dict.zip](http://nces.ed.gov/ipeds/datacenter/data/IC2014_AY_Dict.zip), which includes the summary data for 2011-2014.

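
Because the two dictionaries overlap at 2011, whatever reads them has to decide which file wins for that year. The sketch below is one way to stitch two per-file series together. It assumes each file has already been reduced to a `{year: value}` mapping and that the newer file should take precedence on overlap. The function name `merge_series` and the numbers are made up for illustration and are not IPEDS figures.

```python
def merge_series(older, newer):
    """Combine two {year: value} mappings; on overlap, keep the newer file's value."""
    merged = dict(older)
    merged.update(newer)  # assumption: the newer dictionary wins for shared years
    return dict(sorted(merged.items()))


# Made-up placeholder values, NOT real IPEDS figures.
from_2010_dictionary = {2010: 1.0, 2011: 2.0}
from_2014_dictionary = {2011: 2.5, 2012: 3.0}

combined = merge_series(from_2010_dictionary, from_2014_dictionary)
print(combined)  # {2010: 1.0, 2011: 2.5, 2012: 3.0}
```

Preferring the newer file is only a guess at sensible behavior; if the older file is considered authoritative for 2011, swap the argument order.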
## Getting the Data

Examining these files a little more closely, we can see that the following table maps each piece of data we're looking for to where it lives:

| Data | Location Notes |
|------|-------------------|
| Average published in-state tuition and fees (2007-10) | [ic2010_ay.xls](./code/server/data/ic2010_ay.xls) StatisticsRV sheet; cells E87-E98 |
| Average published in-state tuition and fees (2010-14) | [ic2014_ay.xls](./code/server/data/ic2014_ay.xls) Statistics sheet; cells E72-E81 |
| Average books and supplies costs (2007-10) | [ic2010_ay.xls](./code/server/data/ic2010_ay.xls) StatisticsRV sheet; cells E101-E104 |
| Average books and supplies costs (2011-14) | [ic2014_ay.xls](./code/server/data/ic2014_ay.xls) Statistics sheet; cells E98-E101 |
| Average room and board (2007-10) | [ic2010_ay.xls](./code/server/data/ic2010_ay.xls) StatisticsRV sheet; cells E105-E108 |
| Average room and board (2011-14) | [ic2014_ay.xls](./code/server/data/ic2014_ay.xls) Statistics sheet; cells E102-E105 |

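
The cell ranges above are written Excel-style (`E87`), while Python readers for `.xls` files typically address cells by zero-based row and column (for example, xlrd's `sheet.cell_value(rowx, colx)`). The sketch below shows one way to translate between the two. The helper names `cell_to_index` and `range_to_indices` are hypothetical, and whether the demo's server does the translation this way is an assumption.

```python
import re

_CELL = re.compile(r"^([A-Z]+)(\d+)$")


def cell_to_index(ref):
    """Convert an A1-style reference such as 'E87' to zero-based (row, col)."""
    match = _CELL.match(ref.upper())
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    for letter in letters:  # column letters are base-26 with A=1
        col = col * 26 + (ord(letter) - ord("A") + 1)
    return int(digits) - 1, col - 1


def range_to_indices(start, end):
    """List the (row, col) pairs for a single-column range, both ends inclusive."""
    first_row, col = cell_to_index(start)
    last_row, last_col = cell_to_index(end)
    if col != last_col:
        raise ValueError("expected a single-column range")
    return [(row, col) for row in range(first_row, last_row + 1)]


tuition_cells = range_to_indices("E87", "E98")
print(len(tuition_cells), tuition_cells[0], tuition_cells[-1])  # 12 (86, 4) (97, 4)
```

Each `(row, col)` pair can then be handed to the reader of your choice to pull the value out of the sheet named in the table.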
# The Server Code

The API we're going to develop will return JSON (by default and as a nicety of the Connexion library), and we will wrap the Excel file to demonstrate how it can be done. This is done only out of convenience, and is _not_ necessarily recommended practice, except in special cases that might actually warrant it.

## Endpoints

For the sake of this example, we are going to have three endpoints, listed in the table below:

| Endpoint | Description |
|------|-------------------|
| `/` | Provides the server root information. |
| `/summary/costs` | Provides links to the endpoints that return all the cost data in the API. |
| `/summary/costs/{year}` | Returns the cost data for the supplied year (e.g. tuition and fees, room and board, books and supplies). |

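
Connexion routes each operation in the spec to a plain Python function named by its `operationId`, and a handler can return a dict (serialized to JSON) or a `(body, status)` tuple. The sketch below shows what handlers for the three endpoints might look like. The function names, response shapes, and the `COSTS_BY_YEAR` store are hypothetical, the values are placeholders rather than IPEDS figures, and the real handlers in `api_server.py` may differ.

```python
# Placeholder store keyed by year; the values are made up, NOT IPEDS figures.
COSTS_BY_YEAR = {
    2013: {"tuition_and_fees": 1.0, "room_and_board": 2.0, "books_and_supplies": 3.0},
    2014: {"tuition_and_fees": 1.5, "room_and_board": 2.5, "books_and_supplies": 3.5},
}


def get_root():
    """Handler for `/`: basic server information plus a pointer to the data."""
    return {"service": "costs-demo", "links": {"costs": "/summary/costs"}}


def list_costs():
    """Handler for `/summary/costs`: one link per year that has data."""
    return {"links": [f"/summary/costs/{year}" for year in sorted(COSTS_BY_YEAR)]}


def get_costs(year):
    """Handler for `/summary/costs/{year}`: the costs for one year, or a 404."""
    costs = COSTS_BY_YEAR.get(year)
    if costs is None:
        return {"error": f"no cost data for {year}"}, 404
    return {"year": year, **costs}
```

Keeping the handlers free of framework imports means they can be exercised directly, e.g. `get_costs(2014)`, before any server is running.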
## Implementation

The core implementation can be found in the [server](./code/server/api_server.py) file.

The API specification can be found in [apispec](./code/server/apispec/data_api.yaml). For more information about YAML, go [here](http://www.yaml.org/), and of course, [here for the OpenAPI Specification](https://github.com/OAI/OpenAPI-Specification/).
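
For orientation, wiring a spec file into a Connexion app usually takes only a few lines. The following is a minimal sketch, not a copy of `api_server.py`. It assumes Connexion is installed, that the code runs from the `code/server` directory so that `apispec/` resolves, and that port 8080 is free. The `create_app` helper is a hypothetical name.

```python
def create_app():
    """Build a Connexion app from the spec file (requires `pip install connexion`)."""
    import connexion  # imported lazily so this module loads without Connexion present

    # specification_dir is assumed to be relative to the server directory.
    app = connexion.App(__name__, specification_dir="apispec/")
    app.add_api("data_api.yaml")
    return app


# To serve the API locally (the port number is an assumption):
# create_app().run(port=8080)
```

Connexion reads the paths and `operationId` values from the YAML file, so the spec, not the Python code, is where the routes are declared.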