Improving a Deep Learning Model with Transfer Learning in ArcGIS Pro

Intro

This project demonstrates how an existing deep learning model can be refined to perform better on local data. It walks through how transfer learning allows a pretrained model to adapt to new imagery and conditions, using ArcGIS Pro as a complete workspace for deep learning.

Preparing the Project

The lab begins with the Seattle_Building_Detection project, which contains NAIP aerial imagery of the Seattle area. The goal is to detect and extract building footprints using a deep learning model. The pretrained model used is Building Footprint Extraction: USA from the ArcGIS Living Atlas, which was trained on high-resolution imagery collected from several parts of the country.

We first examine the properties of the Seattle imagery. The pixel size, color band combination, and data type are checked against the requirements of the pretrained model. Since the imagery is one-meter resolution while the model was trained on 10-to-40-centimeter imagery, the model does not perform well initially. Many small buildings are missed, and some roof structures are detected incorrectly. This mismatch between training and test data is the reason transfer learning is needed.
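This compatibility check can be expressed as a quick sketch. The helper below is plain Python; the pixel sizes are hypothetical placeholders standing in for the values read from the imagery properties and from the model's Living Atlas item description:

```python
# Hypothetical values for illustration: the Seattle NAIP pixel size and the
# resolution range the pretrained model expects (per its item description).
imagery_pixel_size_cm = 100           # 1 m NAIP imagery
model_min_cm, model_max_cm = 10, 40   # model trained on 10-40 cm imagery

def resolution_compatible(pixel_cm, lo_cm, hi_cm):
    """Return True if the imagery resolution falls inside the model's trained range."""
    return lo_cm <= pixel_cm <= hi_cm

if not resolution_compatible(imagery_pixel_size_cm, model_min_cm, model_max_cm):
    print("Resolution mismatch: fine-tuning with local samples is advisable.")
```

A check like this makes the decision to fine-tune explicit rather than something discovered only after a disappointing first detection run.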

In this stage, we also prepare the workspace by confirming that the deep learning libraries are installed and that ArcGIS Pro recognizes the GPU. Model performance and training speed depend heavily on these resources.

Creating Local Training Samples

We then create training samples to help the model understand how buildings appear in Seattle’s NAIP imagery. A new feature class is created and stored within the project’s geodatabase. Using the Create Training Samples tool, we manually draw polygons over visible buildings in the imagery. Each feature is labeled with a numeric class value of 1, representing the single class “Building.”

To ensure that the model learns accurately, we avoid labeling ambiguous or partially obscured structures. Edges affected by shadows, vegetation overlap, and complex roof corners are traced carefully to keep the labels clean. The quality of these polygons directly affects how the model learns during training.

The imagery is then clipped to the extent of the labeled samples using the Clip Raster tool. This step ensures that the exported chips will not include unlabeled buildings that could confuse the model. Next, we run the Export Training Data for Deep Learning tool with the following settings:

  • Input Raster: Clipped Seattle imagery
  • Input Feature Class: Training samples feature class
  • Tile Size: 256 pixels
  • Stride: 64 pixels
  • Metadata Format: RCNN Masks
  • Class Value Field: Class

This process exports small image tiles called “chips.” Each chip contains both imagery and the corresponding labeled building area. These chips form the dataset used to retrain the model.
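Assuming a simple sliding window with no edge padding (the actual Export Training Data for Deep Learning tool may handle raster edges differently and can skip chips that contain no labels), the number of exported chips can be estimated from the tile size and stride:

```python
def chip_count(raster_w, raster_h, tile=256, stride=64):
    """Number of sliding-window positions along each axis, assuming no edge padding
    and that every window position produces a chip."""
    nx = (raster_w - tile) // stride + 1
    ny = (raster_h - tile) // stride + 1
    return nx * ny

# A hypothetical 1024 x 1024 pixel clipped raster with the settings above:
print(chip_count(1024, 1024))  # 13 positions per axis -> 169 chips
```

Because the stride (64) is much smaller than the tile size (256), adjacent chips overlap heavily, which multiplies the effective number of training examples drawn from a small labeled area.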

Retraining the Model

After exporting the training data, we use the Train Deep Learning Model tool. The exported chips folder serves as the input training data, and the pretrained model Building Footprint Extraction: USA is selected under the “Pretrained Model” parameter.

Key settings are then configured:

  • Model Type: MaskRCNN
  • Freeze Model: Checked (keeps the early layers fixed to preserve general knowledge)
  • Batch Size: 4
  • Learning Rate: 0.0001
  • Max Epochs: 20
  • Output Model: Building_Footprint_Seattle.dlpk

By freezing most of the pretrained layers, the model retains its ability to identify general shapes and textures while retraining the final layers to adapt to the specific features of Seattle buildings. Training runs faster because only a small portion of the model’s parameters are updated.
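The effect of Freeze Model can be illustrated with a toy sketch. The layer names and parameter counts below are invented for the example; the point is only that the backbone stays fixed while the head receives updates:

```python
# Toy illustration of "Freeze Model": backbone layers are frozen, head layers
# are trainable. Names and sizes are made up for this example.
layers = {
    "backbone.conv1":      {"params": 9_408,      "frozen": True},
    "backbone.layer4":     {"params": 14_964_736, "frozen": True},
    "head.box_predictor":  {"params": 12_546,     "frozen": False},
    "head.mask_predictor": {"params": 66_049,     "frozen": False},
}

trainable = sum(l["params"] for l in layers.values() if not l["frozen"])
total = sum(l["params"] for l in layers.values())
print(f"Updating {trainable:,} of {total:,} parameters "
      f"({100 * trainable / total:.1f}%)")
```

With only a small fraction of parameters receiving gradient updates, each epoch is cheaper to compute and the model is less likely to overfit a small local training set.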

During training, we monitor the progress and observe the loss value decreasing with each epoch. Once training is complete, ArcGIS Pro generates a new model package (.dlpk file), which is the refined version of the original model.

Applying and Evaluating Results

To test the fine-tuned model, we use the Detect Objects Using Deep Learning tool. The tool is configured with the following parameters:

  • Input Raster: Seattle imagery
  • Model Definition: Fine-tuned .dlpk model
  • Padding: 64
  • Tile Size: 256
  • Confidence Threshold: 0.9
  • Non-Maximum Suppression: Enabled
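The Confidence Threshold and Non-Maximum Suppression parameters can be understood through the standard greedy NMS algorithm. The sketch below is a generic illustration of that algorithm, not the tool's internal implementation, and it uses axis-aligned boxes for simplicity:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(detections, conf_threshold=0.9, iou_threshold=0.5):
    """Greedy NMS: drop low-confidence boxes, then keep the highest-scoring box
    and discard any lower-scoring box that overlaps it too strongly."""
    boxes = [d for d in detections if d["score"] >= conf_threshold]
    boxes.sort(key=lambda d: d["score"], reverse=True)
    kept = []
    for d in boxes:
        if all(iou(d["box"], k["box"]) < iou_threshold for k in kept):
            kept.append(d)
    return kept
```

The high 0.9 confidence threshold trades a few missed buildings for far fewer false positives, while suppression prevents one building from being reported as several overlapping footprints.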

The tool processes the imagery and generates a feature class of detected buildings. We compare these results with both the pretrained model’s output and the ground truth data.

The difference is clear. The fine-tuned model identifies smaller buildings that the pretrained model missed, reduces false positives, and more accurately outlines roof structures. The overall quality of the detected features improves, particularly in shaded areas and neighborhoods with dense structures.

Quantitatively, the fine-tuned model achieves a detection accuracy of approximately 90 percent, compared to about 70 percent for the pretrained model. The training time is also efficient, taking roughly one hour on a mid-range GPU workstation.
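Accuracy figures like these can be computed by matching detected footprints to ground-truth polygons at an IoU threshold. The sketch below is a simplified stand-in for the actual evaluation: it uses bounding boxes rather than polygons and a recall-style score, with each prediction allowed to match at most one ground-truth building:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def detection_accuracy(pred, truth, iou_threshold=0.5):
    """Fraction of ground-truth buildings matched by a prediction at the IoU threshold."""
    matched = 0
    remaining = list(pred)
    for t in truth:
        hit = next((p for p in remaining if iou(p, t) >= iou_threshold), None)
        if hit is not None:
            matched += 1
            remaining.remove(hit)   # each prediction may match only one building
    return matched / len(truth) if truth else 1.0
```

Running the same matching against both model outputs makes the before-and-after comparison concrete rather than visual-only.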

Outro

This project shows how artificial intelligence becomes more effective when trained with local data. Transfer learning in ArcGIS Pro is a practical method that allows a model trained on broad, national scale imagery to adapt to specific regional conditions. It bridges the gap between global knowledge and local application.

A model that was designed to detect buildings across the country underperforms in Seattle because of differences in lighting, roof materials, and image resolution. By retraining the model using local examples, the model learns the visual characteristics that are unique to the area. It does not discard what it already knows but adjusts its understanding to a new context.

This reflects how human learning works. People carry forward what they already know and adapt it to new experiences. Transfer learning applies the same principle to machines. The model starts with existing knowledge and improves it by learning from the specific environment it will operate in.

The process also highlights the importance of data quality. In deep learning, a model is only as reliable as the samples it is trained on. Poorly labeled data or inconsistent imagery will lead to weak results. We must carefully prepare and verify training samples to ensure that the model learns accurately.

The efficiency of transfer learning is another important takeaway. Full model training requires thousands of images and extensive computing resources. Transfer learning reduces both the data requirement and the training time while still producing a model that performs well. This makes artificial intelligence more accessible to GIS professionals and local government offices that may not have advanced hardware or large datasets.

ArcGIS Pro’s deep learning tools simplify this process. They provide a consistent and visual interface for every stage of the workflow, from preparing samples to applying the final model. The analyst can focus on understanding patterns in spatial data instead of managing code or writing scripts. This integration allows deep learning to become a standard part of GIS analysis rather than a specialized task.

The broader message of this project is that intelligence, whether human or artificial, becomes meaningful when it is tied to context. Transfer learning is not just about model performance. It is about understanding that every place has its own visual language and that models must learn to speak it. By combining human expertise with computational efficiency, GIS professionals can build models that see the world more accurately, one region at a time.
