Ditch the Cursor

mc_pandas.jpg

Editing Feature Classes with Spatialy-Enabled DataFrames

ArcPy Is Great, And...

arcpy.png

Problem one: row[0]

def _update_year_built(layer, year_fields):
    with arcpy.da.UpdateCursor(layer, year_fields) as cursor:
        for row in cursor:
            if row[0] is None or row[0] < 1 or row[0] == '':
                row[0] = f'{row[1]}_{row[2]}' 

            cursor.updateRow(row)

Problem 2: Transfering data between feature classes and tables

with arcpy.da.SearchCursor(new_data_fc, fields) as new_data_cursor, \
     arcpy.da.InsertCursor(current_data_fc, fields) as current_data_cursor:
    for row in new_data_cursor:
        current_data_cursor.insertRow(row)
        copied_records += 1

Problem 3: Renaming/reordering fields

fieldmappings = arcpy.FieldMappings()

fieldmappings.addTable(energov_parcels_fc)
fieldmappings.addTable(tville_parcels_fc)

fields_list = [
    ('PIN', 'parcel_id'),
    ('own_cityst', 'own_citystate'),
    ('own_zip_fo', 'own_zip_four'),
    ('prop_locat', 'prop_location'),
    ('property_t', 'property_type'),
    ('neighborho', 'neighborhood_code'),
    ('adjusted_p', 'adjusted_prcl_total'),
    #: ...
]

for field_map in fields_list:
    field_to_map_index = fieldmappings.findFieldMapIndex(field_map[0])
    field_to_map = fieldmappings.getFieldMap(field_to_map_index)
    field_to_map.addInputField(tville_parcels_fc, field_map[1])
    fieldmappings.replaceFieldMap(field_to_map_index, field_to_map)

Problem 4: Intermediate feature classes

ssa_summarized_roads = fr'{output_gdb}\ssa_bike_lanes_roads'
ssa_summarized_paths = fr'{output_gdb}\ssa_bike_lanes_paths'
ssa_summarized_lengths = fr'{output_gdb}b\SmallStatisticalAreas_2018_bike_lane_lengths'
tract_summarized_roads = fr'{output_gdb}\tract_bike_lanes_roads'
tract_summarized_paths = fr'{output_gdb}\tract_bike_lanes_paths'
tract_summarized_lengths = fr'{output_gdb}\census_tracts_2020_bike_lane_lengths'
buffered_tracts = fr'{output_gdb}\census_tracts_2020_buffered_30ft'
buffered_areas = fr'{output_gdb}\small_areas_buffered_200ft'
bike_lanes = fr'{output_gdb}\bike_lanes_20220111'
major_paths = fr'{output_gdb}\major_paths'

Enter the Pandas!

pandas.jpg

pandas gives you the tools to work with tables of data defined by rows and columns, called a DataFrame

In [1]:
import pandas as pd

medians_df = pd.read_csv('assets/median_age.csv')
medians_df.head()
Out[1]:
County Median_age educationHighSchoolGraduate educationBachelorOrGreater Median HH income (dollars) Mean HH income (dollars) Mean Per Capita income Avg_MonthlyIncome TI_1 TI_2 TI_3 TI_4 TI_5 TI_6 TI_7 TI_8 TI_9 TI_10
0 BEAVER 33.4 90.3 20.4 54,212 64,003 22,558 2805 Gasoline stations Elementary and secondary schools Traveler accommodation Executive, legislative and general government Cattle ranching and farming Building equipment contractors Offices of physicians Residential building construction Depository credit intermediation Other crop farming
1 BOX ELDER 32.5 93.2 23.3 59,937 73,085 23,998 3344 Restaurants and other eating places Architectural and structural metals mfg. General freight trucking Building foundation and exterior contractors Utility system construction Justice, public order, and safety activities Grain and oilseed milling Gasoline stations Automobile dealers Offices of physicians
2 CACHE 25.1 93.1 37.8 56,840 72,148 22,666 3049 Restaurants and other eating places Elementary and secondary schools Dairy product manufacturing Accounting and bookkeeping services Other miscellaneous manufacturing General merchandise stores, including warehous... Offices of physicians Scientific research and development services Grocery stores Electronic instrument manufacturing
3 CARBON 37.2 90.8 16.4 50,278 60,125 23,473 3436 Justice, public order, and safety activities Machinery and supply merchant wholesalers Executive, legislative and general government Commercial machinery repair and maintenance Offices of physicians Gasoline stations Building equipment contractors Traveler accommodation Offices of dentists Home health care services
4 DAGGET 46.1 96.2 12.4 81,250 78,210 27,698 2252 Elementary and secondary schools Administration of environmental programs Postal service Support activities for mining Offices of real estate agents and brokers Justice, public order, and safety activities Executive, legislative and general government Personal care services Commercial machinery repair and maintenance Special food services

We can access individual rows and columns using .loc (with index labels) or .iloc (with indices)

medians_df.loc[row labels, column labels]
medians_df.iloc[row indices, column indices]
In [2]:
medians_df.loc[[0, 1, 2, 5], 'County']
Out[2]:
0       BEAVER
1    BOX ELDER
2        CACHE
5        DAVIS
Name: County, dtype: object
In [3]:
medians_df.iloc[10:15, :4]
Out[3]:
County Median_age educationHighSchoolGraduate educationBachelorOrGreater
10 IRON 29.1 92.8 29.1
11 JUAB 30.4 92.2 17.1
12 KANE 41.7 92.8 27.3
13 MILLARD 35.8 88.7 22.4
14 MORGAN 32.4 97.6 39.5

We can also get just a few columns from all rows

In [4]:
medians_df[['Median_age', 'Avg_MonthlyIncome']].head()
Out[4]:
Median_age Avg_MonthlyIncome
0 33.4 2805
1 32.5 3344
2 25.1 3049
3 37.2 3436
4 46.1 2252

Extending pandas Spatially

The ArcGIS API for Python provides Spatially Enabled DataFrames, which include geometry information.

python_api_logo.png

In [5]:
from arcgis.features import GeoAccessor, GeoSeriesAccessor

counties_fc_path = r'C:\Users\jdadams\AppData\Roaming\Esri\ArcGISPro\Favorites\opensgid.agrc.utah.gov.sde\opensgid.boundaries.county_boundaries'
counties_df = pd.DataFrame.spatial.from_featureclass(counties_fc_path)
counties_df.head()
Out[5]:
xid countynbr entitynbr entityyr name fips stateplane pop_lastcensus pop_currestimate globalid fips_str color4 SHAPE
0 1 20 2.010201e+09 2010.0 SANPETE 39.0 Central 28437 None 02C0C074-657F-44AE-A886-44ADB97263BD 49039 2 {"rings": [[[448347.6200000001, 4407163.6], [4...
1 2 11 2.010111e+09 2010.0 IRON 21.0 South 57289 None 2ACA2EB9-31B5-4D6A-A858-A0F6B16C64E2 49021 3 {"rings": [[[292688.5800000001, 4224956.960000...
2 3 13 2.010131e+09 2010.0 KANE 25.0 South 7667 None A250C849-8914-4E00-A80F-C1B45CD8F0A7 49025 4 {"rings": [[[425313.88999999966, 4154648.07000...
3 4 29 2.010291e+09 2010.0 WEBER 57.0 North 262223 None 1757A80B-1895-4975-81DA-3AD2EBBA64C9 49057 1 {"rings": [[[422712.16000000015, 4554559.60999...
4 5 19 2.006191e+09 2006.0 SAN JUAN 37.0 South 14518 None EC858EAC-D7E7-4748-B7A5-B2B744668178 49037 3 {"rings": [[[609343.4100000001, 4095382.08], [...

pandas lets you work on rows that meet a certain condition

In [6]:
counties_df.loc[counties_df['stateplane'] == 'Central', ['name', 'stateplane', 'fips_str']]
Out[6]:
name stateplane fips_str
0 SANPETE Central 49039
8 TOOELE Central 49045
12 UINTAH Central 49047
13 GRAND Central 49019
15 MILLARD Central 49027
16 WASATCH Central 49051
17 JUAB Central 49023
18 UTAH Central 49049
19 DUCHESNE Central 49013
25 EMERY Central 49015
26 SEVIER Central 49041
27 CARBON Central 49007
28 SALT LAKE Central 49035

You can easily add new columns

In [7]:
counties_df['emperor'] = 'Jake'
counties_df.head()
Out[7]:
xid countynbr entitynbr entityyr name fips stateplane pop_lastcensus pop_currestimate globalid fips_str color4 SHAPE emperor
0 1 20 2.010201e+09 2010.0 SANPETE 39.0 Central 28437 None 02C0C074-657F-44AE-A886-44ADB97263BD 49039 2 {"rings": [[[448347.6200000001, 4407163.6], [4... Jake
1 2 11 2.010111e+09 2010.0 IRON 21.0 South 57289 None 2ACA2EB9-31B5-4D6A-A858-A0F6B16C64E2 49021 3 {"rings": [[[292688.5800000001, 4224956.960000... Jake
2 3 13 2.010131e+09 2010.0 KANE 25.0 South 7667 None A250C849-8914-4E00-A80F-C1B45CD8F0A7 49025 4 {"rings": [[[425313.88999999966, 4154648.07000... Jake
3 4 29 2.010291e+09 2010.0 WEBER 57.0 North 262223 None 1757A80B-1895-4975-81DA-3AD2EBBA64C9 49057 1 {"rings": [[[422712.16000000015, 4554559.60999... Jake
4 5 19 2.006191e+09 2006.0 SAN JUAN 37.0 South 14518 None EC858EAC-D7E7-4748-B7A5-B2B744668178 49037 3 {"rings": [[[609343.4100000001, 4095382.08], [... Jake

pandas provides powerful built in grouping and aggregation tools, along with Spatially Enabled DataFrames' geometry operations

In [8]:
counties_df.groupby('stateplane').count()
Out[8]:
xid countynbr entitynbr entityyr name fips pop_lastcensus pop_currestimate globalid fips_str color4 SHAPE emperor
stateplane
Central 13 13 13 13 13 13 13 0 13 13 13 13 13
North 8 8 8 8 8 8 8 0 8 8 8 8 8
South 8 8 8 8 8 8 8 0 8 8 8 8 8
In [9]:
counties_df['acres'] = counties_df['SHAPE'].apply(lambda shape: shape.area / 4046.8564)
counties_df.groupby('stateplane')['acres'].sum()
Out[9]:
stateplane
Central    2.725686e+07
North      8.633955e+06
South      1.842508e+07
Name: acres, dtype: float64

pandas Solutions to our Arcpy Problems

different.jpg

row[0] Solution: Field Names

Let's make Erik the emperor of the small counties that use State Plane North

In [10]:
counties_df.loc[(counties_df['pop_lastcensus'] < 100000) & (counties_df['stateplane'] == 'North'), 'emperor'] = 'Erik'
counties_df[['name', 'pop_lastcensus', 'stateplane', 'emperor']].sort_values('name').head()
Out[10]:
name pop_lastcensus stateplane emperor
9 BEAVER 7072 South Jake
10 BOX ELDER 57666 North Erik
11 CACHE 133154 North Jake
27 CARBON 20412 Central Jake
20 DAGGETT 935 North Erik

Joined Tables Solution: Merged DataFrames

Let's add census data to our counties

In [11]:
census_fc_path = r'C:\Users\jdadams\AppData\Roaming\Esri\ArcGISPro\Favorites\opensgid.agrc.utah.gov.sde\opensgid.demographic.census_counties_2020'
census_df = pd.DataFrame.spatial.from_featureclass(census_fc_path)
counties_with_census_df = counties_df.merge(census_df[['geoid20', 'aland20']], left_on='fips_str', right_on='geoid20')
counties_with_census_df.head()
Out[11]:
xid countynbr entitynbr entityyr name fips stateplane pop_lastcensus pop_currestimate globalid fips_str color4 SHAPE emperor acres geoid20 aland20
0 1 20 2.010201e+09 2010.0 SANPETE 39.0 Central 28437 None 02C0C074-657F-44AE-A886-44ADB97263BD 49039 2 {'rings': [[[448347.6200000001, 4407163.6], [4... Jake 1.024680e+06 49039 4.117901e+09
1 2 11 2.010111e+09 2010.0 IRON 21.0 South 57289 None 2ACA2EB9-31B5-4D6A-A858-A0F6B16C64E2 49021 3 {'rings': [[[292688.5800000001, 4224956.960000... Jake 2.112861e+06 49021 8.537474e+09
2 3 13 2.010131e+09 2010.0 KANE 25.0 South 7667 None A250C849-8914-4E00-A80F-C1B45CD8F0A7 49025 4 {'rings': [[[425313.88999999966, 4154648.07000... Jake 2.627117e+06 49025 1.033391e+10
3 4 29 2.010291e+09 2010.0 WEBER 57.0 North 262223 None 1757A80B-1895-4975-81DA-3AD2EBBA64C9 49057 1 {'rings': [[[422712.16000000015, 4554559.60999... Jake 4.220563e+05 49057 1.492537e+09
4 5 19 2.006191e+09 2006.0 SAN JUAN 37.0 South 14518 None EC858EAC-D7E7-4748-B7A5-B2B744668178 49037 3 {'rings': [[[609343.4100000001, 4095382.08], [... Jake 5.075154e+06 49037 2.025317e+10

Renaming/Reordering Fields Solution: df.rename() and df.reindex()

"Emperor" is too bold; let's use "Benevolent Dictator for Life" instead.

In [12]:
renames = {
    'name': 'County Name',
    'pop_lastcensus': 'Last Census Population',
    'emperor': 'Benevolent Dictator for Life',
    'acres': 'Acres',
    'aland20': 'Land Area',
}
counties_with_census_df.rename(columns=renames, inplace=True)
counties_with_census_df.head()
Out[12]:
xid countynbr entitynbr entityyr County Name fips stateplane Last Census Population pop_currestimate globalid fips_str color4 SHAPE Benevolent Dictator for Life Acres geoid20 Land Area
0 1 20 2.010201e+09 2010.0 SANPETE 39.0 Central 28437 None 02C0C074-657F-44AE-A886-44ADB97263BD 49039 2 {'rings': [[[448347.6200000001, 4407163.6], [4... Jake 1.024680e+06 49039 4.117901e+09
1 2 11 2.010111e+09 2010.0 IRON 21.0 South 57289 None 2ACA2EB9-31B5-4D6A-A858-A0F6B16C64E2 49021 3 {'rings': [[[292688.5800000001, 4224956.960000... Jake 2.112861e+06 49021 8.537474e+09
2 3 13 2.010131e+09 2010.0 KANE 25.0 South 7667 None A250C849-8914-4E00-A80F-C1B45CD8F0A7 49025 4 {'rings': [[[425313.88999999966, 4154648.07000... Jake 2.627117e+06 49025 1.033391e+10
3 4 29 2.010291e+09 2010.0 WEBER 57.0 North 262223 None 1757A80B-1895-4975-81DA-3AD2EBBA64C9 49057 1 {'rings': [[[422712.16000000015, 4554559.60999... Jake 4.220563e+05 49057 1.492537e+09
4 5 19 2.006191e+09 2006.0 SAN JUAN 37.0 South 14518 None EC858EAC-D7E7-4748-B7A5-B2B744668178 49037 3 {'rings': [[[609343.4100000001, 4095382.08], [... Jake 5.075154e+06 49037 2.025317e+10

Now that we've got it all looking good, let's reorder the fields and get rid of the ones we don't want

In [13]:
field_order = [
    'County Name',
    'Benevolent Dictator for Life',
    'Acres',
    'Land Area',
    'Last Census Population',
    'SHAPE'
]
final_counties_df = counties_with_census_df.reindex(columns=field_order)
final_counties_df.head()
Out[13]:
County Name Benevolent Dictator for Life Acres Land Area Last Census Population SHAPE
0 SANPETE Jake 1.024680e+06 4.117901e+09 28437 {'rings': [[[448347.6200000001, 4407163.6], [4...
1 IRON Jake 2.112861e+06 8.537474e+09 57289 {'rings': [[[292688.5800000001, 4224956.960000...
2 KANE Jake 2.627117e+06 1.033391e+10 7667 {'rings': [[[425313.88999999966, 4154648.07000...
3 WEBER Jake 4.220563e+05 1.492537e+09 262223 {'rings': [[[422712.16000000015, 4554559.60999...
4 SAN JUAN Jake 5.075154e+06 2.025317e+10 14518 {'rings': [[[609343.4100000001, 4095382.08], [...

Intermediate Feature Classes: New DataFrame Variables

With everything we've done, we've not written a single feature class to either disk or in_memory

counties_df
counties_with_census_df
final_counties_df

Finally, Write It All To Disk

In [14]:
final_counties_df.spatial.to_featureclass(r'C:\gis\Projects\HousingInventory\HousingInventory.gdb\counties_ugic')
Out[14]:
'C:\\gis\\Projects\\HousingInventory\\HousingInventory.gdb\\counties_ugic'

results.png

but_wait.png

Spatial Joins

centroids_df = pd.DataFrame.spatial.from_featureclass(centroids_fc)
walksheds_df = pd.DataFrame.spatial.from_featureclass(walksheds_fc)

walk_centroids_df = centroids_df.spatial.join(walksheds_df, 'left', 'within')

Update an AGOL Hosted Feature Layer

feature_layer_item = gis.content.get(feature_layer_itemid)
feature_layer = arcgis.features.FeatureLayer.fromitem(feature_layer_item)
live_dataframe = pd.DataFrame.spatial.from_layer(feature_layer)

#: maniuplate/transform the existing data
cleaned_dataframe = do_stuff(live_dataframe)

feature_layer.edit_features(updates=cleaned_dataframe.spatial.to_featureset())

Resources?

resources.png