Making a spreadsheet with addresses into data on a map (Part 1, Geocoding)

Goal of this post

This is the first post in a series on doing something interesting with a larger geospatial dataset. At the beginning of this post, you will retrieve a CSV dataset containing addresses of varying quality. By the end of this post, you will have a map on GitHub that looks like this:

This post is not about making a good map or a good visualization. The point of the map in this post is to do a visual sanity check on your work, while building towards an eventual goal (in a post yet to come) of creating an interactive map in Leaflet that lets you visually discern and interact with the data on it. But first things first. Let’s get some data and geocode it.

Geocoding

Geocoding is a basic data cleaning task. In general terms, the procedure is:

  1. Take a file without spatial data, but with address info in some form.
  2. Use a service to append that spatial information to the dataset.
  3. Save the dataset back in a format that is spatially aware.
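
As a rough sketch of those three steps, the snippet below swaps the real geocoding service for a tiny hard-coded lookup table and writes the result as GeoJSON. The `FAKE_GEOCODER` table, its address and coordinates, and the `geocode_rows` helper are all made up for illustration.

```python
import json

# Stand-in for a geocoding service: address -> (latitude, longitude).
# Both the address and the coordinates are invented for this sketch.
FAKE_GEOCODER = {
    "1 EXAMPLE PLAZA, NEW YORK, NY": (40.70, -74.01),
}

def geocode_rows(rows, lookup):
    """Turn rows with an 'address' field into a GeoJSON FeatureCollection."""
    features = []
    for row in rows:
        coords = lookup.get(row["address"])
        if coords is None:
            continue  # the "service" could not place this address
        lat, lon = coords
        features.append({
            "type": "Feature",
            # GeoJSON stores positions as [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": row,
        })
    return {"type": "FeatureCollection", "features": features}

# Step 1: rows without spatial data.
rows = [
    {"owner": "EXAMPLE OWNER", "address": "1 EXAMPLE PLAZA, NEW YORK, NY"},
    {"owner": "NOT ON FILE", "address": "BAD LOCATION ADDRESS"},
]
# Step 2: append coordinates. Step 3: serialize in a spatially aware format.
collection = geocode_rows(rows, FAKE_GEOCODER)
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```

Rows the lookup cannot place are simply skipped here; in practice you would probably want to keep a list of them so you can see how much of the file failed to geocode.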

This is one of those things that people ask me to do fairly often. They have a file with addresses, zipcodes, country names, cities, counties, whatever. And they want it and the data inside it put on a map. The task I’ll be covering today won’t make a good map, but it will give you one that lets you check that your data is correctly coded, and it provides the basis for later building a map that serves the data and its intended use well.

First, let’s map out the tools I’ll be using:

If you are going to geocode a lot of addresses, you’ll want to set up your own geocoder, or else find a service with little to no rate limiting and have a chunk of cash ready. Here’s what I use:

First let’s grab a dataset. I found this one: Liberating Data from NYC Property Tax Bills

The blog post about the dataset goes into good detail about what it is. For our purposes, it is a good dataset because it is:

Go ahead and download the dataset. Assuming you’re using Anaconda, the first thing you’ll want to do is install Shapely, geopy, and usaddress (all three are used below). Shapely can be installed with conda; geopy and usaddress come from pip:

$ conda install shapely
$ pip install geopy
$ pip install usaddress

Make sure that you use Anaconda’s pip if you have more than one Python environment on your machine.
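
If you’re not sure which environment you ended up installing into, a quick check like the one below can help. It is only a sketch: `missing_packages` is a helper name I made up, and it just asks the running interpreter whether it can find each package.

```python
import importlib.util
import sys

def missing_packages(names):
    """Return the names that the running interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# The interpreter path shows which environment (and so which pip) is active.
print(sys.executable)
print(missing_packages(["shapely", "geopy", "usaddress"]))
```

An empty list means all three packages are visible to this interpreter. Running pip as `python -m pip install ...` with the same interpreter is one way to make sure the package lands in the environment you are actually using.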

Now that we have the dataset, let’s load it in Pandas and take a look.

import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim
import usaddress
import re
import json

The first thing we need to do is load the dataset. Pandas conveniently understands URLs, so we can load the files directly from the website.

# If you're running this on your own on a fast connection, this will work.
# I recommend writing out the dataframe later.
# tax_bills_june15_bbls = pd.read_csv("http://taxbills.nyc/tax_bills_june15_bbls.csv")
# tax_bills_june15_exab = pd.read_csv("http://taxbills.nyc/tax_bills_june15_exab.csv")

# loading these over HTTP turns out to be an arduous task
tax_bills_june15_bbls = pd.read_csv("/Users/jeff/Downloads/tax_bills_june15_bbls.csv", index_col='bbl')
tax_bills_june15_exab = pd.read_csv("/Users/jeff/Downloads/tax_bills_june15_exab.csv", index_col='bbl')

tax_bills_june15_bbls
Output
ownername address taxclass taxrate emv tbea bav tba propertytax condonumber condo
bbl
1000010010 GOVERNORS ISLAND CORPORATION GOVERNORS ISLAND CORPORATION\nC/O TRUST FOR. G... 4 - commercial property 10.6840% 337672000.0 15749050.0 147407802.0 NaN 0.0 NaN NaN
1000010101 U. S. GOVT LAND & BLDGS BEDLOES ISLAND\n1 LIBERTY ISLAND\nELLIS ISLAND, 4 - commercial property 10.6840% 25607000.0 1106496.0 10356570.0 NaN 0.0 NaN NaN
1000010201 U. S. GOVT LAND & BLDGS ELLIS ISLAND\n1 LIBERTY ISLAND\nELLIS ISLAND, 4 - commercial property 10.6840% 233982000.0 10366655.0 97029720.0 NaN 0.0 NaN NaN
1000020001 NYC DSBS NYC DSBS\n110 WILLIAM ST. FL. 7\nNEW YORK , NY... 4 - commercial property 10.6840% 69458000.0 3163690.0 29611473.0 NaN 0.0 NaN NaN
1000020002 10 SSA LANDLORD, LLC 10 SSA LANDLORD, LLC\n729 7TH AVE. FL. 15\nNEW... 4 - commercial property 10.6840% 55592000.0 2672762.0 25016491.0 654246.0 654246.0 NaN NaN
1000020003 NOT ON FILE \nBAD LOCATION ADDRESS\n, 4 - commercial property 10.6840% 1774000.0 83277.0 779458.0 83277.0 83277.0 NaN NaN
1000020023 NYC DSBS NYC DSBS\n110 WILLIAM ST. FL. 7\nNEW YORK , NY... 4 - commercial property 10.6840% 36968000.0 1824996.0 17081581.0 NaN 0.0 NaN NaN
1000030001 PARKS AND RECREATION (GENERAL) PARKS AND RECREATION (GENERAL)\nARSENAL WEST\n... 4 - commercial property 10.6840% 285745000.0 13749587.0 128693250.0 NaN 0.0 NaN NaN
1000030002 PARKS AND RECREATION (GENERAL) PARKS AND RECREATION (GENERAL)\nARSENAL WEST\n... 4 - commercial property 10.6840% 10918000.0 524916.0 4913100.0 NaN 0.0 NaN NaN
1000030003 PARKS AND RECREATION (GENERAL) PARKS AND RECREATION (GENERAL)\nARSENAL WEST\n... 4 - commercial property 10.6840% 9484000.0 432000.0 4043430.0 NaN 0.0 NaN NaN
1000030010 UNITED STATES AMERICA UNITED STATES AMERICA\n26 FEDERAL PLZ. STE 30-... 4 - commercial property 10.6840% 32516000.0 1440628.0 13483980.0 NaN 0.0 NaN NaN
1000041001 ONE NY PLAZA CO. LLC ONE NY PLAZA CO. LLC\nRYAN, LLC\n16220 N. SCOT... 4 - commercial property 10.6840% 5351542.0 254361.0 2380764.0 254361.0 251840.0 835.0 unit
1000041002 ONE NY PLAZA CO. LLC ONE NY PLAZA CO. LLC\nRYAN, LLC\n16220 N. SCOT... 4 - commercial property 10.6840% 7733995.0 367600.0 3440656.0 367600.0 342763.0 835.0 unit
1000041003 ONE NY PLAZA CO. LLC ONE NY PLAZA CO. LLC\nRYAN, LLC\n16220 N. SCOT... 4 - commercial property 10.6840% 15960040.0 713000.0 6673528.0 713000.0 713000.0 835.0 unit
1000041004 ONE NY PLAZA CO. LLC ONE NY PLAZA CO. LLC\nRYAN, LLC\n16220 N. SCOT... 4 - commercial property 10.6840% 1372802.0 65249.0 610721.0 65249.0 53869.0 835.0 unit
1000041005 ONE NY PLAZA CO. LLC ONE NY PLAZA CO. LLC\nRYAN, LLC\n16220 N. SCOT... 4 - commercial property 10.6840% 3144677.0 149467.0 1398982.0 149467.0 148035.0 835.0 unit
1000041006 ONE NY PLAZA CO. LLC ONE NY PLAZA CO. LLC\nRYAN, LLC\n16220 N. SCOT... 4 - commercial property 10.6840% 2973077.0 141311.0 1322645.0 141311.0 141311.0 835.0 unit
1000041007 ONE NY PLAZA CO. LLC ONE NY PLAZA CO. LLC\nRYAN, LLC\n16220 N. SCOT... 4 - commercial property 10.6840% 10487584.0 498479.0 4665659.0 498479.0 384952.0 835.0 unit
1000041008 ONE NY PLAZA CO. LLC ONE NY PLAZA CO. LLC\nRYAN, LLC\n16220 N. SCOT... 4 - commercial property 10.6840% 10208233.0 485201.0 4541380.0 485201.0 374697.0 835.0 unit
1000041009 ONE NY PLAZA CO. LLC ONE NY PLAZA CO. LLC\nRYAN LLC\n16220 N. SCOTT... 4 - commercial property 10.6840% 10208233.0 485201.0 4541380.0 485201.0 374697.0 835.0 unit
1000041010 ONE NY PLAZA CO. LLC ONE NY PLAZA CO. LLC\nC/O RYAN DEPT. 113\nP.O.... 4 - commercial property 10.6840% 10208233.0 485201.0 4541380.0 485201.0 374697.0 835.0 unit
1000041011 ONE NY PLAZA CO. LLC ONE NY PLAZA CO. LLC\nC/O RYAN DEPT. 113\nP.O.... 4 - commercial property 10.6840% 10208233.0 485201.0 4541380.0 485201.0 374697.0 835.0 unit
1000041012 ONE NY PLAZA CO. LLC ONE NY PLAZA CO. LLC\nC/O RYAN DEPT. 113\nP.O.... 4 - commercial property 10.6840% 10208233.0 485201.0 4541380.0 485201.0 374697.0 835.0 unit
1000041013 ONE NY PLAZA CO. LLC ONE NY PLAZA CO. LLC\nC/O RYAN DEPT. 113\nP.O.... 4 - commercial property 10.6840% 10208233.0 485201.0 4541380.0 485201.0 374697.0 835.0 unit
1000041014 ONE NY PLAZA CO. LLC ONE NY PLAZA CO. LLC\nC/O RYAN DEPT. 113\nP.O.... 4 - commercial property 10.6840% 10208233.0 485201.0 4541380.0 485201.0 374697.0 835.0 unit
1000041015 ONE NY PLAZA CO. LLC ONE NY PLAZA CO. LLC\nC/O RYAN DEPT. 113\nP.O.... 4 - commercial property 10.6840% 10204251.0 485012.0 4539612.0 485012.0 374508.0 835.0 unit
1000041016 ONE NY PLAZA CO. LLC ONE NY PLAZA CO. LLC\nC/O RYAN DEPT. 113\nP.O.... 4 - commercial property 10.6840% 10072551.0 478752.0 4481020.0 478752.0 369737.0 835.0 unit
1000041017 ONE NY PLAZA CO. LLC ONE NY PLAZA CO. LLC\nC/O RYAN DEPT. 113\nP.O.... 4 - commercial property 10.6840% 10072551.0 478752.0 4481020.0 478752.0 478752.0 835.0 unit
1000041018 ONE NY PLAZA CO. LLC ONE NY PLAZA CO. LLC\nC/O RYAN DEPT. 113\nP.O.... 4 - commercial property 10.6840% 10399795.0 494306.0 4626605.0 494306.0 494306.0 835.0 unit
1000041019 ONE NY PLAZA CO. LLC ONE NY PLAZA CO. LLC\nC/O RYAN DEPT. 113\nP.O.... 4 - commercial property 10.6840% 10399795.0 494306.0 4626605.0 494306.0 494306.0 835.0 unit
... ... ... ... ... ... ... ... ... ... ... ...
5080500001 NUSSER- MEANY, SUSAN M. NUSSER- MEANY, SUSAN M.\nSTATEN ISLAND, NY 103... 1 - small home, less than 4 families 19.1570% 591000.0 5754.0 30038.0 5754.0 5754.0 NaN NaN
5080500004 JEANNE GIORLANDO JEANNE GIORLANDO\n7631 AMBOY RD.\nSTATEN ISLAN... 1 - small home, less than 4 families 19.1570% 533000.0 5420.0 28290.0 5118.0 5118.0 NaN NaN
5080500007 A. MARKOWITZ A. MARKOWITZ\n7635 AMBOY RD.\nSTATEN ISLAND, N... 1 - small home, less than 4 families 19.1570% 572000.0 5656.0 29527.0 4975.0 4975.0 NaN NaN
5080500010 CHARLES BEARDSLEY CHARLES BEARDSLEY\n7639 AMBOY RD.\nSTATEN ISLA... 1 - small home, less than 4 families 19.1570% 465000.0 3432.0 17914.0 2926.0 2926.0 NaN NaN
5080500013 JOHN ALLIDA SCOTTI JOHN ALLIDA SCOTTI\n7647 AMBOY RD.\nSTATEN ISL... 1 - small home, less than 4 families 19.1570% 633000.0 6177.0 32246.0 5875.0 5875.0 NaN NaN
5080500017 NUSSER, STACEY NUSSER, STACEY\n7688 AMBOY RD.\nSTATEN ISLAND,... 1 - small home, less than 4 families 19.1570% 474000.0 5448.0 28440.0 5448.0 5448.0 NaN NaN
5080500019 BECKETT, JOSEPH BECKETT, JOSEPH\nSTATEN ISLAND, NY 10307-1418\... 1 - small home, less than 4 families 19.1570% 497000.0 5282.0 27570.0 4980.0 4980.0 NaN NaN
5080500022 PAUL A. MANDILE PAUL A. MANDILE\n7663 AMBOY RD.\nSTATEN ISLAND... 1 - small home, less than 4 families 19.1570% 867000.0 9345.0 48781.0 9043.0 9043.0 NaN NaN
5080500025 ALAN J. OLSEN ALAN J. OLSEN\n7671 AMBOY RD.\nSTATEN ISLAND, ... 1 - small home, less than 4 families 19.1570% 457000.0 4847.0 25303.0 4545.0 4545.0 NaN NaN
5080500028 CLAIRE DOGERY CLAIRE DOGERY\nSTATEN ISLAND, NY 10307-1418\nO... 1 - small home, less than 4 families 19.1570% 400000.0 4512.0 23554.0 1501.0 1501.0 NaN NaN
5080500031 STEFANIE DIMINO STEFANIE DIMINO\nSTATEN ISLAND, NY 10307-1418\... 1 - small home, less than 4 families 19.1570% 490000.0 5324.0 27793.0 5324.0 5324.0 NaN NaN
5080500034 V. SCHNURR V. SCHNURR\n590 CRAIG AVE.\nSTATEN ISLAND, NY ... 1 - small home, less than 4 families 19.1570% 497000.0 5148.0 26872.0 4222.0 4222.0 NaN NaN
5080500037 ANTONIELLO, FRANK ANTONIELLO, FRANK\nSTATEN ISLAND, NY 10307-123... 1 - small home, less than 4 families 19.1570% 570000.0 6552.0 34200.0 6552.0 6552.0 NaN NaN
5080500050 VAUGHAN, BRIAN VAUGHAN, BRIAN\n582 CRAIG AVE.\nSTATEN ISLAND,... 1 - small home, less than 4 families 19.1570% 613000.0 5979.0 31213.0 5677.0 5677.0 NaN NaN
5080500053 OGNO, CHRISTOPHER E. OGNO, CHRISTOPHER E.\nSTATEN ISLAND, NY 10307-... 1 - small home, less than 4 families 19.1570% 553000.0 6324.0 33009.0 6324.0 6324.0 NaN NaN
5080500055 NOT ON FILE \nBAD LOCATION ADDRESS\n, 1b - vacant land, zoned residential 19.1570% 6000.0 25.0 129.0 25.0 25.0 NaN NaN
5080500056 WYSOCZNSKI, ANDRE MR. & MRS. ANDRZEJ WYSOCZANSKI\nMONROE, NJ 088... 1 - small home, less than 4 families 19.1570% 472000.0 4463.0 23296.0 4463.0 4463.0 NaN NaN
5080500058 CATHERINE MARINO CATHERINE MARINO\nSTATEN ISLAND, NY 10307-1237... 1 - small home, less than 4 families 19.1570% 488000.0 5273.0 27523.0 4971.0 4971.0 NaN NaN
5080500060 COMO, PIETRO COMO, PIETRO\nSTATEN ISLAND, NY 10307-1237\nOu... 1 - small home, less than 4 families 19.1570% 622000.0 5424.0 28315.0 5424.0 5424.0 NaN NaN
5080500062 JOHN K. CHAPMAN JOHN K. CHAPMAN\n560 CRAIG AVE.\nSTATEN ISLAND... 1 - small home, less than 4 families 19.1570% 486000.0 5201.0 27147.0 4899.0 4899.0 NaN NaN
5080500065 BROWN, ELIZABETH BROWN, ELIZABETH\nLONG BRANCH, NJ 07740-4932\n... 1 - small home, less than 4 families 19.1570% 492000.0 5417.0 28279.0 5417.0 5417.0 NaN NaN
5080500068 MARIO AMOROSO MARIO AMOROSO\nSTATEN ISLAND, NY 10307-1237\nO... 1 - small home, less than 4 families 19.1570% 550000.0 5974.0 31185.0 5672.0 5672.0 NaN NaN
5080500072 JOSEPH R. GLORIA JOSEPH R. GLORIA\n536 CRAIG AVE.\nSTATEN ISLAN... 1 - small home, less than 4 families 19.1570% 668000.0 7258.0 37887.0 6956.0 6956.0 NaN NaN
5080500076 DENNIS EMPEROR DENNIS EMPEROR\n534 CRAIG AVE.\nSTATEN ISLAND,... 1 - small home, less than 4 families 19.1570% 442000.0 4701.0 24538.0 3801.0 3801.0 NaN NaN
5080500078 CHIU, ANNE CHIU, ANNE\n532 CRAIG AVE.\nSTATEN ISLAND, NY ... 1 - small home, less than 4 families 19.1570% 1033000.0 10413.0 54355.0 10413.0 10413.0 NaN NaN
5080500083 TOBIN, GALE TOBIN, GALE\n142 BENTLEY ST.\nSTATEN ISLAND, N... 1 - small home, less than 4 families 19.1570% 475000.0 5361.0 27986.0 5059.0 5059.0 NaN NaN
5080500086 ARLOTTA, THOMAS ARLOTTA, THOMAS\nSTATEN ISLAND, NY 10307-1235\... 1 - small home, less than 4 families 19.1570% 585000.0 3432.0 17914.0 3130.0 3130.0 NaN NaN
5080500089 JOHN GERVASI JOHN GERVASI\nSTATEN ISLAND, NY 10307-1235\nOu... 1 - small home, less than 4 families 19.1570% 507000.0 5282.0 27570.0 2020.0 2020.0 NaN NaN
5080500092 RITA M. MOOG WILLIAM P. MOOG\nSTATEN ISLAND, NY 10307-1235\... 1 - small home, less than 4 families 19.1570% 484000.0 5296.0 27644.0 4994.0 4994.0 NaN NaN
5080500094 EDWARD DONOHUE EDWARD DONOHUE\n162 BENTLEY ST.\nSTATEN ISLAND... 1 - small home, less than 4 families 19.1570% 448000.0 5148.0 26872.0 4846.0 4846.0 NaN NaN

1081624 rows × 11 columns

And the other sheet:

tax_bills_june15_exab
Output
type detail amount units
bbl
4015180009 abatement j51 abatement -4075.0 NaN
4046020125 abatement j51 abatement -11794.0 NaN
4001570040 abatement j51 abatement -6942.0 NaN
4004740010 exemption icip -3548.0 NaN
4012820175 abatement j51 abatement -8735.0 NaN
4096480024 abatement j51 abatement -9421.0 NaN
4008811001 exemption icip -9702.0 NaN
4066880010 exemption clergy -287.0 NaN
4022201059 abatement j51 abatement -69.0 NaN
4114171507 abatement j51 abatement -106.0 NaN
5064220040 exemption park -1218.0 NaN
5077020043 exemption icip -9906.0 NaN
5009530307 exemption park -1139.0 NaN
5065770038 exemption park -1483.0 NaN
3025970001 exemption icip -72473.0 NaN
3002751203 abatement j51 abatement -1286.0 NaN
3068020036 abatement j51 abatement -2567.0 NaN
3067160075 abatement j51 abatement -1438.0 NaN
3078780009 exemption icip -684.0 NaN
3021111197 exemption icip -10772.0 NaN
3008050016 exemption icip -12288.0 NaN
3024770001 exemption icip -97241.0 NaN
3067820050 abatement j51 abatement -2253.0 NaN
3002451578 exemption icip -503.0 NaN
3073970001 abatement j51 abatement -8575.0 NaN
3067210070 abatement j51 abatement -1875.0 NaN
3065030030 abatement j51 abatement -10109.0 NaN
3053390001 abatement j51 abatement -3833.0 NaN
3006650071 exemption clergy -287.0 NaN
2039441082 abatement j51 abatement -775.0 NaN
... ... ... ... ...
2057630556 exemption basic star - school tax relief -302.0 NaN
2057630554 exemption basic star - school tax relief -302.0 NaN
2057630546 exemption enhanced star - school tax relief -621.0 NaN
2057630540 exemption senior citizens homeowners’ exemption -3338.0 NaN
2057630540 exemption enhanced star - school tax relief -621.0 NaN
2057630565 exemption basic star - school tax relief -302.0 NaN
2057630563 exemption disabled homeowner -3003.0 NaN
2057630563 exemption basic star - school tax relief -302.0 NaN
2057630533 exemption basic star - school tax relief -302.0 NaN
2057630525 exemption basic star - school tax relief -302.0 NaN
2058060703 exemption faculty student hsg -12139.0 NaN
2058060721 exemption basic star - school tax relief -302.0 NaN
2058060708 exemption basic star - school tax relief -302.0 NaN
2058060681 exemption school-elem,hs,acad -196908.0 NaN
2058060698 exemption school-elem,hs,acad -16195.0 NaN
2058060723 exemption basic star - school tax relief -302.0 NaN
2023280032 exemption icip -2337.0 NaN
2023280017 exemption icip -14241.0 NaN
2023280035 exemption icip -84983.0 NaN
1014440041 abatement j51 abatement -4792.0 NaN
1018381116 abatement j51 abatement -59.0 NaN
1018333209 abatement j51 abatement -138.0 NaN
1012061087 abatement j51 abatement -417.0 NaN
1001971110 exemption icip -8178.0 NaN
1020901029 abatement j51 abatement -20.0 NaN
1014201524 abatement j51 abatement -294.0 NaN
1018261122 abatement j51 abatement -951.0 NaN
1016021056 abatement j51 abatement -448.0 NaN
1020421124 abatement j51 abatement -3547.0 NaN
1012500006 abatement j51 abatement -7042.0 NaN

752599 rows × 4 columns

Note that the address column isn’t quite usable as it stands. We’ll have to normalize it a bit before we go further. The first thing to do is get rid of the pesky newlines. It turns out they’re not really newlines, since we read a CSV, but rather literal escapes: a backslash followed by the letter n.
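
To make the difference concrete, here is a small sketch with a made-up address. The value contains the two characters backslash and `n` rather than a real line break, so checks and splits have to look for that two-character sequence.

```python
# A value as it might arrive from the CSV: the separator is the two
# characters backslash + "n", not a real line break. (Made-up address.)
raw = "EXAMPLE OWNER\\n123 EXAMPLE ST.\\nNEW YORK, NY 10001-0001"

has_real_newline = "\n" in raw       # looks for an actual line break
has_literal_escape = "\\n" in raw    # looks for backslash followed by "n"
print(has_real_newline, has_literal_escape)

lines = raw.split("\\n")             # split on the literal two characters
flattened = " ".join(lines)
print(lines)
print(flattened)
```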

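My first thought was to simply get rid of them, as below:

# regex=True makes pandas treat the pattern as a regex matching a literal backslash followed by "n"
corrected_addresses = tax_bills_june15_bbls['address'].str.replace(r"\\n", ' ', regex=True)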
next(iter(corrected_addresses))
Output
'GOVERNORS ISLAND CORPORATION C/O TRUST FOR. GOVERNORS ISLAN 10 SOUTH ST. APT. FRNT SLIP7'

But it turns out that a lot of geocoders don’t respond well to recipient names at the head of the address, zip4s, and borough names instead of city names. Our solution won’t be perfect, but it’ll be decent enough to show as an example and leave “perfection” for the reader if it’s really necessary for the application.
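
To show what those problems look like on a single value, here is a naive sketch on a made-up address. It assumes the first line is always the recipient, which is exactly the kind of assumption that breaks on real data, so treat it as an illustration rather than the approach used below. The `simplify` helper and `ZIP4` pattern are my own names.

```python
import re

# Matches a ZIP+4 such as 10001-0001 and captures the five-digit part.
ZIP4 = re.compile(r"\b(\d{5})-\d{4}\b")

def simplify(raw):
    """Drop the leading recipient line and shorten ZIP+4 to five digits."""
    lines = raw.split("\\n")
    if len(lines) > 1:
        lines = lines[1:]  # assumes the first line is always the recipient
    return ZIP4.sub(r"\1", " ".join(lines))

simplified = simplify("EXAMPLE OWNER\\n123 EXAMPLE ST.\\nNEW YORK, NY 10001-0001")
print(simplified)
```

Rows in the real file that carry a "C/O" line, or no street line at all, would come out wrong here, which is why a proper address parser does the heavy lifting later on.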

Nominatim in particular is fragile, but we use it because it’s free and requires no API key. We also use it because in order to do bulk geocoding you’re going to have to set up Nominatim yourself or another equally fragile bulk geocoder (there are several).

So let’s go ahead and do the corrections.

First we’ll correct some common quirks. We’ll drop empty addresses and replace “One” with 1, which is common in commercial districts.

# \b keeps the pattern from matching longer words that merely start with "ONE"
starts_with_one = re.compile(r'^ONE\b')

def scrub_addr(addr_lines):
    # Missing addresses arrive as NaN rather than a list; pass them through.
    if not isinstance(addr_lines, list):
        return addr_lines

    # Guard against an empty list before touching the first line.
    if not addr_lines:
        return np.nan

    # A leading "ONE" is a common pattern in addresses; replace only that word.
    addr_lines[0] = starts_with_one.sub('1', addr_lines[0])

    return ' '.join(addr_lines)

This next function will take the addresses we have and make sure they at least have a state and zipcode attached.

# The five-digit zipcode is required; the "-1234" zip4 suffix is optional.
state_and_zip = re.compile(r'^.*NY\s+(?:[0-9]{5})(?:-[0-9]{4})?\s*(?:USA)?$')

def append_state_info(addr):
    if not state_and_zip.match(addr):
        return addr + ', NY 00000'  # we append a dummy zipcode because it helps the address tagger work better.
    else:
        return addr
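
A pattern like this is easy to get subtly wrong, so it can be worth running it against a few hand-written addresses before trusting it on a million rows. The sketch below uses a variant that requires the five-digit zipcode and treats the zip4 suffix as optional; that reading of the intent is my assumption, and the addresses and the `state_and_zip_check` name are made up.

```python
import re

# Variant pattern: five-digit zipcode required, "-1234" suffix optional.
state_and_zip_check = re.compile(r"^.*NY\s+[0-9]{5}(?:-[0-9]{4})?\s*(?:USA)?$")

# Each sample maps to whether we expect the pattern to accept it.
samples = {
    "123 EXAMPLE ST. NEW YORK, NY 10001": True,
    "123 EXAMPLE ST. NEW YORK, NY 10001-0001": True,
    "123 EXAMPLE ST. NEW YORK, NY 10001 USA": True,
    "123 EXAMPLE ST.": False,
}
results = {addr: bool(state_and_zip_check.match(addr)) for addr in samples}
for addr, matched in results.items():
    print(matched, addr)
```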

Now that we’ve cleaned up common quirks, let’s tag the address. Rule number 1: addresses, despite being ubiquitous, are messy. There is a (likely open-source) address parser for your country. Find it and use it. Don’t try to create your own. Writing your own address parser will cause damage to your monitor, keyboard, desk, and face. We don’t want that, do we?

My code is using usaddress. It’s slow but accurate, so we’ll not actually be doing the entire dataset in this notebook. You can of course do the whole dataset yourself without trouble, but you should really go make some pancakes from scratch and eat them while you’re waiting for it to finish.

def tag_addr(addr):
    if not isinstance(addr, str) or len(addr) == 0:
        return np.nan
    else:
        try:
            return usaddress.tag(addr)
        except usaddress.RepeatedLabelError:  # some addresses turn out to have bits repeated
            return tag_addr(' '.join(addr.split(' ')[1:]))
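
The retry idea in that function is worth spelling out: when the parser chokes, drop the leading token and try again on what is left. Here is the same idea written as a loop, with a stub standing in for the real parser so the sketch runs on its own. `RepeatedLabel`, `stub_tagger`, and `tag_with_retry` are names I made up for illustration.

```python
class RepeatedLabel(Exception):
    """Stand-in for the parser's 'repeated label' error in this sketch."""

def stub_tagger(addr):
    # Hypothetical parser: refuses input whose first token is repeated later.
    tokens = addr.split(" ")
    if tokens[0] in tokens[1:]:
        raise RepeatedLabel(addr)
    return tokens

def tag_with_retry(addr, tagger):
    """Drop leading tokens one at a time until the tagger accepts the rest."""
    tokens = addr.split(" ")
    while tokens:
        try:
            return tagger(" ".join(tokens))
        except RepeatedLabel:
            tokens = tokens[1:]  # same move as the recursive call in tag_addr
    return None  # nothing taggable was left

tagged = tag_with_retry("ACME ACME 123 EXAMPLE ST.", stub_tagger)
print(tagged)
```

A loop avoids deep recursion on long, badly repeated inputs, but otherwise behaves the same way: it gives up only once every token has been dropped.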

A final pass will re-merge the address into a single string for geocoding, keeping only the parts of the address that the geocoder understands.

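def join_addr(addr):
    # tag_addr returns NaN when nothing taggable is left; pass that through.
    if not isinstance(addr, tuple):
        return np.nan
    addr = addr[0]  # usaddress.tag returns (tagged_parts, address_type)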
    if 'ZipCode' in addr and '-' in addr['ZipCode']:
        addr['ZipCode'] = addr['ZipCode'].split('-')[0]   # Nominatim hates zip4

    if 'PlaceName' not in addr:
        addr['PlaceName'] = 'New York'  # we already know we're in new york, some addresses omit it.

    if all((
        'AddressNumber' in addr,
        'StreetName' in addr,
        'ZipCode' in addr and addr['ZipCode'] != '00000'
    )):
        return ' '.join((
                addr.get('AddressNumberPrefix', ''),
                addr.get('AddressNumber', ''),
                addr.get('AddressNumberSuffix', ''),
                addr.get('StreetNamePreModifier', ''),
                addr.get('StreetNamePreDirectional', ''),
                addr.get('StreetNamePreType', ''),
                addr.get('StreetName', ''),
                addr.get('StreetNamePostType', ''),
                addr.get('StreetNamePostDirectional', ''),
                addr.get('StateName', 'NY'),  # assume New York if no state was tagged
                addr['ZipCode']
            ))
    elif all((
        'AddressNumber' in addr,
        'StreetName' in addr
    )):
        return ' '.join((
            addr.get('AddressNumberPrefix', ''),
            addr.get('AddressNumber', ''),
            addr.get('AddressNumberSuffix', ''),
            addr.get('StreetNamePreModifier', ''),
            addr.get('StreetNamePreDirectional', ''),
            addr.get('StreetNamePreType', ''),
            addr.get('StreetName', ''),
            addr.get('StreetNamePostType', ''),
            addr.get('StreetNamePostDirectional', ''),
            addr.get('StateName', 'NY'),  # assume New York if no state was tagged
        ))
    else:
        return np.nan
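
One side effect of joining with an empty-string default is that every missing part still contributes a separator, so the result can contain doubled spaces. The sketch below shows that on a hand-written dictionary (a stand-in for the parser output, with made-up values) next to a variant that skips missing parts. `join_parts` is a hypothetical helper, not part of the code above.

```python
def join_parts(tagged, keys):
    """Join the tagged values for `keys`, skipping parts that are missing."""
    return " ".join(tagged[key] for key in keys if tagged.get(key))

# Hand-written stand-in for the parser's output; values are made up.
tagged = {
    "AddressNumber": "123",
    "StreetName": "EXAMPLE",
    "StreetNamePostType": "ST.",
    "StateName": "NY",
    "ZipCode": "10001",
}
keys = ("AddressNumber", "StreetNamePreDirectional", "StreetName",
        "StreetNamePostType", "StateName", "ZipCode")

# Joining with '' for missing parts leaves a doubled space behind.
naive = " ".join(tagged.get(key, "") for key in keys)
compact = join_parts(tagged, keys)
print(repr(naive))
print(repr(compact))
```

Whether the extra spaces matter depends on the geocoder, so this is a tidiness choice more than a correctness fix.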

scrubbed_addresses = tax_bills_june15_bbls['address']\
    .sample(n=150)\
    .str.split("\\\\n")\
    .map(scrub_addr)\
    .dropna()\
    .map(append_state_info)\
    .map(tag_addr)\
    .map(join_addr)\
    .dropna()

len(scrubbed_addresses)
Output
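104

# .loc replaces the removed .ix indexer; .copy() avoids writing into a view of the original frame
geocodable_tax_bills_june15_bbls = tax_bills_june15_bbls.loc[scrubbed_addresses.index].copy()
geocodable_tax_bills_june15_bbls['address'] = scrubbed_addresses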
geocodable_tax_bills_june15_bbls
Output
ownername address taxclass taxrate emv tbea bav tba propertytax condonumber condo
bbl
4050221346 WINSTON TOWER, LLC 315 CENTRAL PARK W. NY 2 - residential, more than 10 units 12.8550% 47791.0 2532.0 19698.0 2532.0 2532.0 162.0 unit
5022680027 ALAN NANCY EILENBERG 228 LONDON RD. NY 10306 1 - small home, less than 4 families 19.1570% 714000.0 8066.0 42103.0 7764.0 7764.0 NaN NaN
4011690028 TZIKAS, NICK (TRUSTEE) 3242 74TH ST. NY 11370 1 - small home, less than 4 families 19.1570% 591000.0 5902.0 30811.0 5600.0 5600.0 NaN NaN
1000151174 NEKKAH, KEVIN ERWIN 20 WEST ST. NY 2 - residential, more than 10 units 12.8550% 165429.0 8485.0 66004.0 3072.0 2074.0 1557.0 unit
4001391039 AMINOV, BARNO 16427 75TH RD. NY 11366 2 - residential, more than 10 units 12.8550% 43144.0 2349.0 18274.0 2349.0 2349.0 646.0 unit
3053400039 CHAMBERS, DESMOND 246 E. 8TH ST. NY 11218 1 - small home, less than 4 families 19.1570% 877000.0 5058.0 26403.0 4756.0 4756.0 NaN NaN
4072510031 WOISLAVSKY, IRWIN 18442 TUDOR RD. NY 11432 1 - small home, less than 4 families 19.1570% 976000.0 9572.0 49968.0 9270.0 9270.0 NaN NaN
4132000004 PATEL, JYOTI 13560 234TH PL. NY 11422 1 - small home, less than 4 families 19.1570% 446000.0 4298.0 22438.0 4298.0 4298.0 NaN NaN
2038730049 GUERRA, LOUIS J. 1345 ROSEDALE AVE. NY 10472 2a - 4-6 unit residential building 12.8550% 448000.0 6261.0 48708.0 5645.0 5645.0 NaN NaN
5040330032 GARY S. SCALESCI 69 CUBA AVE. NY 10306 1 - small home, less than 4 families 19.1570% 257000.0 2954.0 15420.0 2652.0 2652.0 NaN NaN
2038060039 HAGANS, DARRYL 2162 CHATTERTON AVE. NY 10472 1 - small home, less than 4 families 19.1570% 395000.0 4242.0 22144.0 3940.0 3940.0 NaN NaN
2054230006 SCALA PROPERTIES INC. 40 GRANDVIEW CIR. NY 11030 2a - 4-6 unit residential building 12.8550% 667000.0 7991.0 62162.0 7991.0 7991.0 NaN NaN
4093920028 KHAN JAMALODEEN 9419 108TH ST. NY 11419 1 - small home, less than 4 families 19.1570% 460000.0 4162.0 21728.0 3860.0 3860.0 NaN NaN
5054940083 YNCLINO CYNTHIA 42 CORTELYOU AVE. NY 10312 1 - small home, less than 4 families 19.1570% 392000.0 4380.0 22863.0 4078.0 4078.0 NaN NaN
3047270062 PAUL BASTIEN 329 E. 55TH ST. NY 1 - small home, less than 4 families 19.1570% 365000.0 4090.0 21349.0 1424.0 1424.0 NaN NaN
5000480018 YORK-VANDUZER, LLC 150 GREAVES LN. NY 10308 1 - small home, less than 4 families 19.1570% 290000.0 2245.0 11721.0 2245.0 2245.0 NaN NaN
2042481003 2013 COLONIAL LLC 2013 COLONIAL LLC 69 BOLTON AVE. NY 10605 2 - residential, more than 10 units 12.8550% 61169.0 3587.0 27903.0 72.0 72.0 124.0 unit
1011651703 CARTUS FINANCIAL CORPORATION 2109 BROADWAY NY 2 - residential, more than 10 units 12.8550% 426152.0 21451.0 166867.0 21451.0 17697.0 799.0 unit
3032490012 NAVARRO, LEONOR 1638 DEKALB AVE. NY 11237 1 - small home, less than 4 families 19.1570% 568000.0 3457.0 18047.0 3155.0 3155.0 NaN NaN
4114441016 LOZITO, MARIANA 15614 76TH ST. NY 11414 1a - condo unit in 1-3 story building 19.1570% 299841.0 2258.0 11786.0 2258.0 2176.0 59.0 unit
3048730036 BERTRAND MERISE 3615 CHURCH AVE. NY 11203 1 - small home, less than 4 families 19.1570% 556000.0 6391.0 33360.0 6089.0 6089.0 NaN NaN
2039442458 DENNIS MOHABIR 1972 POWELL AVE. NY 10472 2 - residential, more than 10 units 12.8550% 65086.0 3150.0 24504.0 1204.0 142.0 2.0 unit
3057940014 EVERBRIGHT BROOKLYN LLC 714 61ST ST. NY 11220 4 - commercial property 10.6840% 327000.0 14203.0 132935.0 14203.0 14203.0 NaN NaN
5036420053 ZARRILLI DANIEL 136 BACHE AVE. NY 10306 1 - small home, less than 4 families 19.1570% 491000.0 5239.0 27348.0 4937.0 4937.0 NaN NaN
2047520059 RODRIQUES, VALERIE ROSE 3211 WICKHAM AVE. NY 10469 1 - small home, less than 4 families 19.1570% 453000.0 5207.0 27180.0 4905.0 4905.0 NaN NaN
3071010006 CHEN, WEN NAN 232 AVENUE T. NY 11223 1 - small home, less than 4 families 19.1570% 961000.0 7270.0 37951.0 3014.0 3014.0 NaN NaN
4045800027 SPYRO AVDOULOS 16012 11TH AVE. NY 11357 1 - small home, less than 4 families 19.1570% 999000.0 8647.0 45136.0 8647.0 8647.0 NaN NaN
4097270072 YEH, SUMI CHUANG 15015 87TH AVE. NY 11432 1 - small home, less than 4 families 19.1570% 603000.0 5879.0 30688.0 5577.0 5577.0 NaN NaN
3043050052 COLE-WRIGHT, DORIAN 713 VAN SICLEN AVE. NY 11207 2a - 4-6 unit residential building 12.8550% 130000.0 6747.0 52488.0 6747.0 6747.0 NaN NaN
5053630040 VALERIO CHIRONNA 188 SEIDMAN AVE. NY 10312 1 - small home, less than 4 families 19.1570% 494000.0 5678.0 29640.0 2060.0 2060.0 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ...
1006401015 BONN INVESTMENT, INC. 400 W. 12TH ST. NY 2 - residential, more than 10 units 12.8550% 786133.0 42910.0 333797.0 18214.0 18214.0 2060.0 unit
5017150001 GOETHALS SOUTH LLC TX 75088-5526 NY 4 - commercial property 10.6840% 1130000.0 53703.0 502650.0 49376.0 49376.0 NaN NaN
5047130031 WILLIAM C. MEANEY 205 FAIRBANKS AVE. NY 10306 1 - small home, less than 4 families 19.1570% 529000.0 6019.0 31418.0 5717.0 5717.0 NaN NaN
4033110022 YU, LING 11729 UNION TPKE. NY 11375 1 - small home, less than 4 families 19.1570% 1069000.0 7097.0 37047.0 6795.0 6795.0 NaN NaN
5026200037 CARMELITA DE JESUS 68 TOWERS LN. NY 10314 1 - small home, less than 4 families 19.1570% 362000.0 4161.0 21720.0 3859.0 3859.0 NaN NaN
4155950024 EVERTON ANGUS 611 JARVIS AVE. NY 11691 1 - small home, less than 4 families 19.1570% 431000.0 4905.0 25603.0 4905.0 4905.0 NaN NaN
3007730065 323 LUOS PROPERTY LLC 133 MOTT ST. NY 10013 1 - small home, less than 4 families 19.1570% 658000.0 2178.0 11370.0 2178.0 2178.0 NaN NaN
3074590059 1517 VOORHIES AVE. LLC 1517 VOORHIES AVE. NY 11235 4 - commercial property 10.6840% 2156000.0 77646.0 726750.0 6423.0 6423.0 NaN NaN
4163000004 ATHENA BENDO 14309 NEWPORT AVE. NY 11694 1 - small home, less than 4 families 19.1570% 1006000.0 11563.0 60360.0 11563.0 11563.0 NaN NaN
3010271092 ZHENG, YAN XIA 110 MADISON ST. NY 10002 2 - residential, more than 10 units 12.8550% 65055.0 3235.0 25164.0 91.0 91.0 2786.0 unit
1021080039 515 EDGECOMBE OWNERS CP 343 SAINT NICHOLAS AVE. NY 2 - residential, more than 10 units 12.8550% 1720000.0 81248.0 632035.0 79738.0 64497.0 NaN NaN
3048900015 LEESEAP REID 270 E. 37TH ST. NY 11203 1 - small home, less than 4 families 19.1570% 360000.0 3902.0 20366.0 3600.0 3600.0 NaN NaN
3067961005 WHITMAN PUISANG CHOI 1720 E. 14TH ST. NY 11229 2 - residential, more than 10 units 12.8550% 58500.0 2906.0 22604.0 2604.0 2604.0 250.0 unit
3033440152 RODRIGUEZ, LORENZO 305 PALMETTO ST. NY 11237 1 - small home, less than 4 families 19.1570% 540000.0 4485.0 23414.0 2329.0 2329.0 NaN NaN
4125230072 SHARPLIS-ESPRIT, LUCIA 17451 128TH AVE. NY 11434 1 - small home, less than 4 families 19.1570% 433000.0 4736.0 24720.0 4736.0 4736.0 NaN NaN
4032410055 VAY MIN HOM 7107 NANSEN ST. NY 11375 1 - small home, less than 4 families 19.1570% 648000.0 4871.0 25428.0 1814.0 1814.0 NaN NaN
1001421296 MARLINCA PROPERTIES LIMITED 200 CHAMBERS ST. NY 2 - residential, more than 10 units 12.8550% 169209.0 9515.0 74015.0 6142.0 6142.0 1629.0 unit
2039443852 FERNANDES, DANIELLE LOFF 1641 METROPOLITAN AVE. NY 2 - residential, more than 10 units 12.8550% 55677.0 2714.0 21110.0 1024.0 119.0 2.0 unit
2047370021 CHARLES ALLEN 3339 FENTON AVE. NY 10469 1 - small home, less than 4 families 19.1570% 389000.0 4471.0 23340.0 4169.0 4169.0 NaN NaN
4122740011 EARL MERCURIUS 16107 137TH AVE. NY 11434 1 - small home, less than 4 families 19.1570% 582000.0 5909.0 30846.0 5607.0 5607.0 NaN NaN
4011690008 JANILY LEE TRUST 7316 32ND AVE. NY 11370 1 - small home, less than 4 families 19.1570% 594000.0 6011.0 31380.0 5709.0 5709.0 NaN NaN
4068970003 TSANG, WINNIE 16915 65TH AVE. NY 11365 1 - small home, less than 4 families 19.1570% 708000.0 6781.0 35395.0 6479.0 6479.0 NaN NaN
2039090001 PMC LLC 320 WEST ST. NY 10528 4 - commercial property 10.6840% 527000.0 24251.0 226980.0 5433.0 5433.0 NaN NaN
3082850069 THOMAS, JULIUS, RUTHY 1360 E. 102ND ST. NY 11236 1 - small home, less than 4 families 19.1570% 424000.0 4874.0 25440.0 4572.0 4572.0 NaN NaN
3078570043 MASSIE, OLYMPHISE 1209 E. 59TH ST. NY 11234 1 - small home, less than 4 families 19.1570% 316000.0 3632.0 18960.0 1195.0 1195.0 NaN NaN
4150100052 145-47 155TH ST. REALTY 1392 BEECH ST. NY 11509 4 - commercial property 10.6840% 590000.0 22529.0 210870.0 5847.0 5847.0 NaN NaN
4082200097 RIZO, HERMAN 5110 MARATHON PKWY. NY 11362 1 - small home, less than 4 families 19.1570% 699000.0 6905.0 36045.0 6603.0 6603.0 NaN NaN
4069860022 JEAN WONG 7513 167TH ST. NY 11366 1 - small home, less than 4 families 19.1570% 584000.0 5165.0 26960.0 5165.0 5165.0 NaN NaN
4023720143 ROSE ALGOZINI 5315 62ND ST. NY 11378 1 - small home, less than 4 families 19.1570% 549000.0 5323.0 27786.0 5323.0 5323.0 NaN NaN
4090470017 KHAIR, SIKDAR M. 9722 76TH ST. NY 11416 1 - small home, less than 4 families 19.1570% 503000.0 4559.0 23797.0 4559.0 4559.0 NaN NaN

104 rows × 11 columns

Now these look good. However, geocoding the full dataset with geopy against a public service will prove impractical. Not only will it take until sometime next year to complete because of rate limiting, but you will find that geocoding services like to charge a fee beyond a certain number of records. In a while we’ll show you how to load your own geocoder and use it.

For now, we’ll take a subset. Some services rate limit you to one call a second. To avoid taking forever, we’ll just use a few records at first.
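
If you do need to stay under a one-call-per-second limit, a small wrapper that spaces out calls is enough. This is a minimal sketch, not the only way to do it: `throttled` is my own helper name, and `str.upper` stands in for the real geocoder call so the example runs without a network.

```python
import time

def throttled(func, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Wrap `func` so consecutive calls start at least `min_interval` seconds apart."""
    last_call = [None]  # a list so the inner function can update it

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_interval - (clock() - last_call[0])
            if wait > 0:
                sleep(wait)
        last_call[0] = clock()
        return func(*args, **kwargs)

    return wrapper

# Usage with a stand-in lookup; a real geocoder call would go here instead.
slow_upper = throttled(str.upper, min_interval=0.01)
results = [slow_upper(a) for a in ("1 example st", "2 example st")]
print(results)
```

Recent geopy releases also ship a similar helper, `RateLimiter` in `geopy.extra.rate_limiter`, which is probably the better choice if your installed version has it.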

# grab a random sample of records
tax_bills_bbls_sample = geocodable_tax_bills_june15_bbls.sample(n=15)

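# grab the same records from the key-linked dataframe.
# .ix has been removed from pandas; isin() keeps only the bbls that actually
# appear in the exemptions sheet (not every property has one).
tax_bills_exab_sample = tax_bills_june15_exab[tax_bills_june15_exab.index.isin(tax_bills_bbls_sample.index)]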
tax_bills_bbls_sample
Output
ownername address taxclass taxrate emv tbea bav tba propertytax condonumber condo
bbl
4006480018 RABOS, CONSTANTINE 3157 35TH ST. NY 11106 2a - 4-6 unit residential building 12.8550% 818000.0 12720.0 98946.0 12720.0 12720.0 NaN NaN
4134860062 WILSON, MARJORIE 14536 230TH ST. NY 11413 1 - small home, less than 4 families 19.1570% 494000.0 5020.0 26203.0 4718.0 4718.0 NaN NaN
4081360040 DENNIS DELORENZO 4132 WESTMORELAND ST. NY 11363 1 - small home, less than 4 families 19.1570% 1350000.0 9973.0 52058.0 9671.0 9671.0 NaN NaN
3042680035 JAMES OXLEY FAMILY IRREVOCABLE TRUST 664 HEMLOCK ST. NY 11208 1 - small home, less than 4 families 19.1570% 440000.0 4735.0 24715.0 4433.0 4433.0 NaN NaN
3053990052 MELVIN BRICKMAN 507 F. NY 11218 1 - small home, less than 4 families 19.1570% 929000.0 7885.0 41158.0 7583.0 7583.0 NaN NaN
3001070024 PARKS AND RECREATION (GENERAL) 16 61ST ST. New York NY 4 - commercial property 10.6840% 182000.0 8471.0 79290.0 NaN 0.0 NaN NaN
3073330067 FRANKIE KAFAI LAU 2053 E. 28TH ST. NY 11229 1 - small home, less than 4 families 19.1570% 408000.0 4690.0 24480.0 4388.0 4388.0 NaN NaN
4104270034 PATRICK, GLORIA 18833 KEESEVILLE AVE. NY 11412 1 - small home, less than 4 families 19.1570% 450000.0 3761.0 19634.0 3459.0 3459.0 NaN NaN
1006440063 FAIRFAX & SAMMONS PROPERTIES, LLC 67 GANSEVOORT ST. NY 10014 4 - commercial property 10.6840% 4157000.0 157150.0 1470890.0 157150.0 157150.0 NaN NaN
4082470008 ALICE PONEROS 25123 THEBES AVE. NY 11362 1 - small home, less than 4 families 19.1570% 841000.0 7523.0 39269.0 6902.0 6902.0 NaN NaN
4157090005 GRETEL JOSEPH 2210 LORETTA RD. NY 11691 1 - small home, less than 4 families 19.1570% 371000.0 4264.0 22260.0 3962.0 3962.0 NaN NaN
1008701214 CHEN, ADRIAN 1 IRVING PLACE New York NY 2 - residential, more than 10 units 12.8550% 286232.0 13722.0 106748.0 13722.0 13722.0 449.0 unit
3032750036 JANICE GEIGER WATSON 96 HIMROD ST. NY 11221 1 - small home, less than 4 families 19.1570% 588000.0 1782.0 9304.0 1480.0 1480.0 NaN NaN
2033630067 ROBERT MAUCH 4222 HERKIMER PL. NY 10470 1 - small home, less than 4 families 19.1570% 457000.0 5253.0 27420.0 4320.0 4320.0 NaN NaN
4114174401 GIAQUINTO GINA 8710 149TH AVE. NY 11414 2 - residential, more than 10 units 12.8550% 55425.0 3033.0 23596.0 2731.0 1964.0 12.0 unit

Now a quick sanity check to make sure our addresses work.

print(next(iter(tax_bills_bbls_sample['address'])))
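# recent geopy versions require a user_agent; the name below is just a placeholder.
geocoder = Nominatim(user_agent="tax-bills-geocoding-example")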
geocoder.geocode(next(iter(tax_bills_bbls_sample['address'])))
Output
3157  35TH ST. NY 11106
Location(35th Street, Astoria, Queens County, NYC, New York, 11101, United States of America, (40.7628959, -73.9202263, 0.0))

geocoded_addresses = tax_bills_bbls_sample['address'].map(lambda addr: geocoder.geocode(addr))
geocoded_addresses
Output
bbl
4006480018    (35th Street, Astoria, Queens County, NYC, New...
4134860062    (230th Street, Laurelton, Queens County, NYC, ...
4081360040    (Westmoreland Street, Douglaston, Queens Count...
3042680035    (664, Hemlock Street, East New York, Kings Cou...
3053990052    (507, Avenue F, Parkville, BK, Kings County, N...
3001070024    (60-16, 61st Street, Fresh Pond, Queens County...
3073330067    (2053, East 28th Street, Sheepshead Bay, BK, K...
4104270034    (Keeseville Avenue, Saint Albans, Queens Count...
1006440063    (67, Gansevoort Street, Chelsea, Manhattan, Ne...
4082470008    (Thebes Avenue, Little Neck, Queens County, NY...
4157090005    (Loretta Road, Roy Reuther Houses, Far Rockawa...
1008701214    (72 1/2, Irving Place, Flatiron, Manhattan, Ne...
3032750036    (96, Himrod Street, Bushwick, Kings County, NY...
2033630067    (4222, Herkimer Place, Woodlawn, Bronx, Bronx ...
4114174401    (149th Avenue, Ozone Park, Kings, NYC, New Yor...
Name: address, dtype: object
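
Because some services cap you at roughly one call per second, it can help to space the calls out rather than firing them back to back. Here is a minimal sketch of a throttling wrapper in plain Python. Both `throttled` and `fake_geocode` are hypothetical names made up for this illustration, and `fake_geocode` only stands in for `geocoder.geocode` so the snippet runs without a network connection. (geopy also ships a `RateLimiter` helper in `geopy.extra.rate_limiter` that serves the same purpose.)

```python
import time

def throttled(func, min_delay_seconds=1.0):
    """Wrap func so that consecutive calls start at least min_delay_seconds apart."""
    last_call = [None]  # mutable cell holding the start time of the previous call

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
        last_call[0] = time.monotonic()
        return func(*args, **kwargs)

    return wrapper

def fake_geocode(address):
    # Stand-in for geocoder.geocode (assumption: no real lookup happens here).
    return "geocoded: " + address

# A short delay keeps the demo quick; a real service might need 1.0 or more.
slow_geocode = throttled(fake_geocode, min_delay_seconds=0.05)

addresses = ["664 HEMLOCK ST. NY 11208", "96 HIMROD ST. NY 11221", "67 GANSEVOORT ST. NY 10014"]
start = time.monotonic()
results = [slow_geocode(a) for a in addresses]
elapsed = time.monotonic() - start
```

With a real geocoder, you would pass `throttled(geocoder.geocode)` to `.map` in place of the lambda.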

Now we have some geocoded addresses! We got lucky here in that every address returned a result, though several matched only at the street level, without a house number. In a larger sample, some results will be None; a .dropna() or a filter should suffice to get rid of them. Each location has a number of fields, but the most useful are the canonical address, longitude, and latitude:

loc = next(iter(geocoded_addresses))  # let's inspect the first one.
print((loc.address, (loc.longitude, loc.latitude)))
Output
('35th Street, Astoria, Queens County, NYC, New York, 11101, United States of America', (-73.9202263, 40.7628959))

Now the only thing left is to add these to our sample as new columns. I like to canonicalize the address, too, although you may want to keep the original by first copying it to a field named original_address or some such:

tax_bills_bbls_sample['address'] = geocoded_addresses.map(lambda l: l.address)
tax_bills_bbls_sample['latitude'] = geocoded_addresses.map(lambda l: l.latitude)
tax_bills_bbls_sample['longitude'] = geocoded_addresses.map(lambda l: l.longitude)
tax_bills_bbls_sample
Output
ownername address taxclass taxrate emv tbea bav tba propertytax condonumber condo latitude longitude
bbl
4006480018 RABOS, CONSTANTINE 35th Street, Astoria, Queens County, NYC, New ... 2a - 4-6 unit residential building 12.8550% 818000.0 12720.0 98946.0 12720.0 12720.0 NaN NaN 40.762896 -73.920226
4134860062 WILSON, MARJORIE 230th Street, Laurelton, Queens County, NYC, N... 1 - small home, less than 4 families 19.1570% 494000.0 5020.0 26203.0 4718.0 4718.0 NaN NaN 40.659682 -73.750916
4081360040 DENNIS DELORENZO Westmoreland Street, Douglaston, Queens County... 1 - small home, less than 4 families 19.1570% 1350000.0 9973.0 52058.0 9671.0 9671.0 NaN NaN 40.772826 -73.738065
3042680035 JAMES OXLEY FAMILY IRREVOCABLE TRUST 664, Hemlock Street, East New York, Kings Coun... 1 - small home, less than 4 families 19.1570% 440000.0 4735.0 24715.0 4433.0 4433.0 NaN NaN 40.672613 -73.868584
3053990052 MELVIN BRICKMAN 507, Avenue F, Parkville, BK, Kings County, NY... 1 - small home, less than 4 families 19.1570% 929000.0 7885.0 41158.0 7583.0 7583.0 NaN NaN 40.633838 -73.973400
3001070024 PARKS AND RECREATION (GENERAL) 60-16, 61st Street, Fresh Pond, Queens County,... 4 - commercial property 10.6840% 182000.0 8471.0 79290.0 NaN 0.0 NaN NaN 40.714971 -73.902534
3073330067 FRANKIE KAFAI LAU 2053, East 28th Street, Sheepshead Bay, BK, Ki... 1 - small home, less than 4 families 19.1570% 408000.0 4690.0 24480.0 4388.0 4388.0 NaN NaN 40.601201 -73.943775
4104270034 PATRICK, GLORIA Keeseville Avenue, Saint Albans, Queens County... 1 - small home, less than 4 families 19.1570% 450000.0 3761.0 19634.0 3459.0 3459.0 NaN NaN 40.698718 -73.766175
1006440063 FAIRFAX & SAMMONS PROPERTIES, LLC 67, Gansevoort Street, Chelsea, Manhattan, New... 4 - commercial property 10.6840% 4157000.0 157150.0 1470890.0 157150.0 157150.0 NaN NaN 40.739595 -74.007467
4082470008 ALICE PONEROS Thebes Avenue, Little Neck, Queens County, NYC... 1 - small home, less than 4 families 19.1570% 841000.0 7523.0 39269.0 6902.0 6902.0 NaN NaN 40.766735 -73.734254
4157090005 GRETEL JOSEPH Loretta Road, Roy Reuther Houses, Far Rockaway... 1 - small home, less than 4 families 19.1570% 371000.0 4264.0 22260.0 3962.0 3962.0 NaN NaN 40.603080 -73.756490
1008701214 CHEN, ADRIAN 72 1/2, Irving Place, Flatiron, Manhattan, New... 2 - residential, more than 10 units 12.8550% 286232.0 13722.0 106748.0 13722.0 13722.0 449.0 unit 40.736727 -73.986599
3032750036 JANICE GEIGER WATSON 96, Himrod Street, Bushwick, Kings County, NYC... 1 - small home, less than 4 families 19.1570% 588000.0 1782.0 9304.0 1480.0 1480.0 NaN NaN 40.696093 -73.923497
2033630067 ROBERT MAUCH 4222, Herkimer Place, Woodlawn, Bronx, Bronx C... 1 - small home, less than 4 families 19.1570% 457000.0 5253.0 27420.0 4320.0 4320.0 NaN NaN 40.896346 -73.875535
4114174401 GIAQUINTO GINA 149th Avenue, Ozone Park, Kings, NYC, New York... 2 - residential, more than 10 units 12.8550% 55425.0 3033.0 23596.0 2731.0 1964.0 12.0 unit 40.668572 -73.856841
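
The three `.map` calls above assume every result is a Location. If any lookup had come back as None, `l.address` would raise an AttributeError. Here is a minimal sketch of a guard, where `attr_or_none` is a hypothetical helper and the namedtuple `Loc` is only a stand-in for geopy's Location so the example runs on its own:

```python
from collections import namedtuple

# Stand-in for a geocoding result (assumption: only these three fields matter here).
Loc = namedtuple("Loc", ["address", "latitude", "longitude"])

def attr_or_none(loc, name):
    """Return the named attribute of loc, or None when the geocoder found nothing."""
    return None if loc is None else getattr(loc, name)

results = {
    "3042680035": Loc("664, Hemlock Street, East New York", 40.672613, -73.868584),
    "0000000000": None,  # made-up key representing a lookup that failed
}

# Extract one field safely, and separately keep only the successful lookups.
latitudes = {bbl: attr_or_none(loc, "latitude") for bbl, loc in results.items()}
found = {bbl: loc for bbl, loc in results.items() if loc is not None}
```

In the dataframe version, this would look like `geocoded_addresses.map(lambda l: attr_or_none(l, 'latitude'))`, after which rows with missing coordinates can be dropped with `.dropna(subset=['latitude', 'longitude'])`.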

The last thing is to write this out to a file and push the resulting file to GitHub.

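import json

features = []
# iterrows() yields (index, row) pairs; the index here is the bbl.
for bbl, row in tax_bills_bbls_sample.iterrows():
    feature = {
        "type": "Feature",
        # GeoJSON coordinates are ordered [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [row['longitude'], row['latitude']]},
        "properties": {}
    }
    # copy every non-null column into the feature's properties.
    for key, value in row.items():
        if not pd.isnull(value):
            feature['properties'][key] = value
    features.append(feature)

# write all features out as a single FeatureCollection.
with open('SampleFeatures.geojson', 'w') as output:
    json.dump({"type": "FeatureCollection", "features": features}, output, indent=2)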
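
Before pushing, it can be worth a quick programmatic check that the coordinates are in GeoJSON's [longitude, latitude] order and fall roughly inside New York City. The sketch below is one way to do that. `check_points` is a hypothetical helper, and the bounding box is an approximate assumption rather than an official boundary.

```python
import json

# Approximate bounding box for New York City (assumption, not an official boundary).
NYC_BBOX = {"min_lon": -74.3, "max_lon": -73.6, "min_lat": 40.4, "max_lat": 41.0}

def check_points(feature_collection, bbox=NYC_BBOX):
    """Return (index, coordinates) for every Point feature that falls outside bbox."""
    problems = []
    for n, feature in enumerate(feature_collection["features"]):
        lon, lat = feature["geometry"]["coordinates"][:2]
        inside = (bbox["min_lon"] <= lon <= bbox["max_lon"]
                  and bbox["min_lat"] <= lat <= bbox["max_lat"])
        if not inside:
            problems.append((n, [lon, lat]))
    return problems

# Two features: the first is ordered correctly, the second has lat/lon swapped.
sample = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [-73.9202263, 40.7628959]}, "properties": {}},
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [40.7628959, -73.9202263]}, "properties": {}}
]}
""")
problems = check_points(sample)
```

To check the file written above, load `SampleFeatures.geojson` with `json.load` and pass the result to `check_points`; an empty list means every point landed inside the box.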

And now assuming you have a new repository on GitHub, the push:

$ git add .
$ git commit -m 'initial commit'
$ git remote add origin https://github.com/jeffersonheard/DataSciBlogGeoJsonSample.git
$ git push origin master
