SHELDON
  • Insights
  • About

Table of contents

  • Data Rights
  • Load Packages
  • Part I: Product Service Codes [PSC]
    • Table to Explore All Active PSCs
    • What is the Breakdown Between Products and Services?
    • Frequency of PSC Additions
  • Part II: FPDS CSV
    • Acquire All Procurement of Visible and Invisible Light Communication Equipment
    • What is the Total Spend?
    • Spend Over Time?
    • Explore Total Outlays by Day
    • Cumulative Spend?
    • Spend by Department
    • Procurement Seasonality?
    • Deepive: Top Visible and Invisible Light Communication Equipment Vendors
  • Part III: FPDS Atom
    • Anduril Procurement
    • What Can We See from FPDS Atom?
    • Entity Resolution
    • Anduril Agency Funding
    • Anduril Award Description Analysis
    • Anduril Trigram Award Totals
  • Part IV: Analyzing Phase III SBIRs
    • Acquire All Phase III SBIR Procurements
    • Aggregate Summary Data
    • Which Departments Award SBIR Phase IIIs
    • Phase IIIs by Agency
    • Explore Every SBIR Awarding Office
    • Top 20 SBIR Phase III Awardees
  • Part V: The 2023 Defense Budget
    • Acquire 2023 Budget Data
    • Macro Exploration of the Budget
      • Breakdown by Agency
    • Which Items Are We Buying
    • 2023 Budget Request

govtrackR

Using govtrackR

govtrackR
data analysis
Author

SHELDON

Published

September 12, 2023

govtrackR
Introduction to govtrackR and government data analysis.

Data Rights

Contract Number: N6833522C0500
Contractor Name: P.W. Communications, Inc.
Contractor Address: 11200 Rockville Pike Suite 130 Rockville, MD 20852
Expiration of Data Rights Period: January 16, 2029

The Government’s rights to use, modify, reproduce, release, perform, display, or disclose technical data or computer software marked with this legend are restricted during the period shown as provided in paragraph (b)(4) of the Rights in Noncommercial Technical Data and Computer Software Small Business Innovation Research (SBIR) Program clause contained in the above identified contract. No restrictions apply after the expiration date shown above. Any reproduction of technical data, computer software, or portions thereof marked with this legend must also reproduce the markings.

Load Packages

Code
library(lubridate)
library(tidyverse)
library(tidytext)
library(govtrackR)
library(scales)
library(ggtext)
library(gganimate)
library(hrbrthemes)
library(rtemis)
library(viridis)
library(ggrepel)
library(highcharter)
library(tidylo)
library(widyr)
library(gt)
library(tidygraph)
library(ggraph)
library(igraph)
library(d3r)
library(treemap)
library(sunburstR)
library(reactable)
library(skimr)
library(glue)
options(highcharter.theme = hc_theme_hcrt(tooltip = list(valueDecimals = 2)))

Part I: Product Service Codes [PSC]

Lets explore PSC’s to help us better understand the products and services procured by the government.

Code
tbl_psc <- dictionary_psc_active(only_active = T, snake_names = T)

Table to Explore All Active PSCs

Here is an interactive table that lets us engage with the active PSCs.

Code
psc_tbl <- tbl_psc |>
  select(
    is_active_psc,
    type_product_service,
    name_solicitation_group,
    code_product_service,
    name_product_service,
    date_start,
    date_end,
    details_product_service_includes,
    details_product_service_notes
  ) |>
  reactable(
    filterable = T,
    resizable = T,
    searchable = T,
    showPageSizeOptions = T,
    defaultPageSize = 4,
    defaultColDef = colDef(
        align = "center",
        vAlign = "center",
        minWidth = 175
      ),
    pageSizeOptions = c(5, 10, 20),
    sortable = T,
    compact = T,
    columns = list(
      name_product_service = colDef(minWidth = 300),
      details_product_service_includes = colDef(minWidth = 500),
      details_product_service_notes = colDef(minWidth = 500)
    )
  )

Here it is

What is the Breakdown Between Products and Services?

What is the breakdown between these two groups?

Code
gg_psc_bkd <-
  tbl_psc |>
  count(type_product_service, sort = T, name = "count") |>
  mutate(type_product_service = fct_reorder(type_product_service, count)) |>
  ggplot(aes(x = type_product_service, y = count, fill = type_product_service)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_viridis(discrete = TRUE, name = "") +
  theme_ipsum() +
  ylab("Number of Product Service Codes") +
  xlab("") +
  ggtitle("Products Versus Services") +
  coord_flip()

Interesting. There are 739 total product codes and 1795 service codes.

Another way to think about this is that for every product there are 2.43. Is this a meaningful insight?

Frequency of PSC Additions

Accurately describing products and services is important for transparency, vendor discovery, understanding who inside of government does what and so much more. As economies and technologies change it is important that procurement dictonaries keep up with the times.

Lets try to understand this a bit more.

The first step in doing this is seeing the dates where the GSA updated the PSC dictionary.

Code
tbl_new_codes <-
  tbl_psc|> 
  group_by(type_product_service, date_start)|> 
  summarise(count_added = n(), .groups = "drop")
Code
tbl_new_codes |>
  reactable(
    filterable = T,
    resizable = T,
    searchable = T,
    showPageSizeOptions = T,
    defaultPageSize = 4,
    pageSizeOptions = c(5, 10, 20),
    sortable = T,
    compact = T
  )

Lets explore this changes interactively interactively

Code
hc_new_psc <-
  hchart(tbl_new_codes,
         "line",
         hcaes(x = date_start, y = count_added, group = type_product_service))|> 
  hc_title(text = "New Product Service Codes by Date Added")|> 
  hc_yAxis(title = list(text = "# Product Services Codes Added"))|> 
  hc_xAxis(title = list(text = "Date Added")) |>
  hc_credits(title = list(text = "Data from GSA/acquisition.gov via govtrackR"))
  hc_add_theme(hc_theme_hcrt())

Part II: FPDS CSV

This section explores fpds_csv which provides real time access into the FPDS csv interface.

Code
args(fpds_csv)

function (fpds_type = “award”, decode_contract_ids = F, vendor_doing_business_as_name = NA, global_vendor_name = NA, vendor_name = NA, parent_vendor_name = NA, vendor_uei = NA, uei_name = NA, base_exercised_options_value = NA, current_contract_value = NA, dollars_obligated = NA, contract_value = NA, fee_range_lower_value = NA, fee_range_upper_value = NA, fixed_fee_value = NA, obligated_amount = NA, total_current_contract_value = NA, total_dollars_obligated = NA, total_non_government_value = NA, total_ultimate_contract_value = NA, ultimate_contract_value = NA, contract_fiscal_year = NA, vendor_duns_number = NA, parent_duns_number = NA, agency_name = NA, award_type = NA, solicitation_procedure = NA, contracting_office_name = NA, contracting_agency_name = NA, principal_naics_code = NA, award_status = NA, contract_type = NA, contract_type_description = NA, type_of_contract_pricing = NA, contract_id = NA, subcontract_plan = NA, ref_idv_contract_id = NA, ref_idv_agency_id = NA, contracting_agency_id = NA, contracting_office_id = NA, funding_agency_id = NA, funding_office_id = NA, funding_office_name = NA, agency_code = NA, department_id = NA, department_name = NA, research = NA, last_mod_date = NA, last_modified_by = NA, award_completion_date = NA, created_date = NA, signed_date = NA, effective_date = NA, estimated_completion_date = NA, cancellation_date = NA, destroy_date = NA, final_invoice_paid_date = NA, funded_through_date = NA, last_modified_date = NA, physical_completion_date = NA, reveal_date = NA, solicitation_issue_date = NA, sys_last_modified_date = NA, vendor_registration_date = NA, vendor_renewal_date = NA, created_by = NA, description_of_requirement = NA, reason_for_modification = NA, legislative_mandates = NA, local_area_set_aside = NA, socio_economic_indicators = NA, multiyear_contract = NA, national_interest_code = NA, product_or_service_code = NA, performance_district_code = NA, performance_country = NA, performance_state_name = NA, vendor_address_city = NA, vendor_congress_district_code = NA, vendor_address_country_code = NA, vendor_address_country_name = NA, vendor_address_state_code = NA, vendor_address_state_name = NA, vendor_address_zip_code = NA, extent_competed = NA, number_of_offers_received = NA, sort_item = “Signed Date”, unformat = F, exclude_bloat = T, clean_entity_column = F, sort_descending = T, use_future = T, return_message = T, snake_names = F, …) NULL

Acquire All Procurement of Visible and Invisible Light Communication Equipment

Lets take advantage of the function to acquire all these historic product procurement.

Code
tbl_crypto <-
  fpds_csv(product_or_service_code = "5850", snake_names = T)

What is the Total Spend?

How much has been spent on this product?

Code
tbl_crypto |> summarise(amount_obligation = sum(amount_obligation))
amount_obligation
$1,782,934,804

Spend Over Time?

Code
tbl_by_day <- 
  tbl_crypto|> 
  group_by(date_obligation)|> 
  summarise(amount = sum(amount_obligation), .groups = "drop")

Lets take a look at the data

Code
tbl_by_day|>  sample_n(3)|>  munge_data()|>  gt()
date_obligation amount
2005-03-15 $7,677
2003-06-23 $1,949,414
2007-12-13 $69,960

Explore Total Outlays by Day

Code
tbl_by_day|> 
  hchart("line",
         hcaes(x = date_obligation, y = amount))|> 
  hc_title(text = "Visible and Invisible Light Communication Equipment Procurement Total Daily Spend")|> 
  hc_yAxis(title = list(text = "Amount Obligated"))|> 
  hc_xAxis(title = list(text = "Date Added"))|> 
  hc_add_theme(hc_theme_elementary())

Cumulative Spend?

Lets add a cumulative total

Code
tbl_by_day <-
  tbl_by_day |>
  mutate(amount_cumulative = cumsum(amount))

Now lets explore it statically

Code
gg_area_pyro <-
  tbl_by_day |>
  ggplot(aes(date_obligation, amount_cumulative)) +
  geom_area(fill = "#22908C", alpha = .5) +
  scale_fill_viridis(discrete = TRUE) +
  scale_y_continuous(labels = scales::dollar) +
  scale_x_date() +
  theme(legend.position = "none") +
  theme_ipsum() +
  labs(title = "Cumulative Federal Procurement on Visible and Invisible Light Communication Equipments", x = "Date", y = "Cumulative Procurement Spend")

Spend by Department

Lets take a look at spend by department.

Code
tbl_depts <-
  tbl_crypto |>
  group_by(name_department_award) |> 
  summarise(amount = sum(amount_obligation),
            .groups = "drop") |> 
  arrange(desc(amount))
name_department_award amount
DEPARTMENT OF DEFENSE $1,651,586,944
GENERAL SERVICES ADMINISTRATION $63,126,942
DEPARTMENT OF TRANSPORTATION $14,115,735
DEPARTMENT OF JUSTICE $13,826,441
DEPARTMENT OF HOMELAND SECURITY $10,694,143
DEPARTMENT OF THE INTERIOR $9,874,243
DEPARTMENT OF STATE $6,556,371
NATIONAL AERONAUTICS AND SPACE ADMINISTRATION $6,288,416
DEPARTMENT OF VETERANS AFFAIRS $1,502,817
DEPARTMENT OF COMMERCE $1,482,782
DEPARTMENT OF THE TREASURY $1,143,140
DEPARTMENT OF AGRICULTURE $1,043,279
DEPARTMENT OF ENERGY $895,548
BROADCASTING BOARD OF GOVERNORS $191,499
DEPARTMENT OF HEALTH AND HUMAN SERVICES $176,271
UNKNOWN AWARDING DEPARTMENT $154,000
CONSUMER PRODUCT SAFETY COMMISSION $110,000
ENVIRONMENTAL PROTECTION AGENCY $50,935
SMALL BUSINESS ADMINISTRATION $44,774
FEDERAL COMMUNICATIONS COMMISSION $21,486
SMITHSONIAN INSTITUTION $18,935
PENSION BENEFIT GUARANTY CORPORATION $13,199
DEPARTMENT OF LABOR $6,610
INTERNATIONAL BOUNDARY AND WATER COMMISSION: U.S.-MEXICO $5,574
AGENCY FOR INTERNATIONAL DEVELOPMENT $4,720

Now we can visualize it.

Code
tbl_depts |>
  mutate(
    name_department_award = fct_reorder(name_department_award, amount),
    amount_millions = amount / 1000000
  ) |>
  ggplot(aes(x = name_department_award, y = amount_millions)) +
  geom_segment(
    aes(
      x = name_department_award ,
      xend = name_department_award,
      y = 0,
      yend = amount_millions
    ),
    color = "grey"
  ) +
  geom_point(size = 3, color = "#69b3a2") +
  coord_flip() +
  theme_ipsum() +
  theme(
    legend.position = "none",
    panel.grid.minor.y = element_blank(),
    panel.grid.major.y = element_blank(),
    text = element_text(size = 7)
  ) +
  xlab("") +
  scale_y_log10(labels = scales::dollar, n.breaks = 10) +
  labs(title = "Which Departments Purchased the Visible and Invisible Light Communication Equipment?",
       x = "",
       y = "Procurement $ in milions (log10 transformed)")

Procurement Seasonality?

Often products and services tend to be procured in September at the end of the budget year, does this product exhibit that trend?

Lets build a table that gives us the inputs

Code
govt_months <- 
  c(
    "Oct",
    "Nov",
    "Dec",
    "Jan",
    "Feb",
    "Mar",
    "Apr",
    "May",
    "Jun",
    "Jul",
    "Aug",
    "Sep"
  )

tbl_monthly_pyro <-
  tbl_crypto|> 
  mutate(month_obligation = lubridate::month(date_obligation, label = T))|> 
  count(year_fiscal_obligation,
        month_obligation,
        wt = amount_obligation,
        name = "amount")

## Set the factor levels to budget months

tbl_monthly_pyro <- 
  tbl_monthly_pyro|> 
  mutate(month_obligation = factor(
    month_obligation,
    levels = govt_months,
    ordered = T
  ))

Lets turn this into an interactive heatmap.

Code
fntltp <- JS("function(){
  return this.point.x + ' ' +  this.series.yAxis.categories[this.point.y] + ': ' +
  Highcharts.numberFormat(this.point.value, 2);
}")


hc_pyro_hm <-
  hchart(
    tbl_monthly_pyro,
    "heatmap",
    hcaes(x = year_fiscal_obligation,
          y = month_obligation,
          value = amount)
  )|> 
  hc_colorAxis(
    stops = color_stops(20, colors = scales::viridis_pal(option = "B")(20)),
    # fuerza a utilzar mejor el espectro de colores para que HJS no amplie el
    # eje para tener numero "redondos
    startOnTick = FALSE,

    endOnTick =  FALSE,
    reversed = T
  )|> 
  hc_yAxis(
    title = list(text = ""),
    reversed = TRUE,
    offset = -20,
    tickLength = 0,
    gridLineWidth = 0,
    minorGridLineWidth = 0,
    labels = list(style = list(fontSize = "9px"))
  )|> 
  hc_tooltip(formatter = fntltp)|> 
  hc_title(text = "Spend by Month and Fiscal Year")|> 
  hc_legend(
    layout = "horizontal",
    verticalAlign = "top",
    align = "left",
    valueDecimals = 0
  )|> 
  hc_add_theme(hc_theme_darkunica())

Deepive: Top Visible and Invisible Light Communication Equipment Vendors

Next lets take a look at the companies that supply this product.

First thing we want to do is build a summary table that lets us explore the data.

Code
tbl_vendors <-
  tbl_crypto |>
  group_by(id_unique_entity) |>
  summarise(
    name_vendor = name_vendor[which.max(amount_obligation)],
    date_first_award = min(date_obligation, na.rm = T),
    date_recent_award = max(date_obligation, na.rm = T),
    count_actions = n(),
    count_contracts = n_distinct(id_contract_analysis, na.rm = T),
    amount_contracts = sum(amount_obligation),
    count_departments = n_distinct(name_department_award, na.rm = T),
    count_agencies = n_distinct(name_agency_cgac_award, na.rm = T)
  ) |>
  arrange(desc(amount_contracts))

Who are the top 10 vendors?

Code
tbl_vendors|> 
  slice(1:10) 
id_unique_entity name_vendor date_first_award date_recent_award count_actions count_contracts amount_contracts count_departments count_agencies
HEBCLD22EJD1 RAYTHEON COMPANY 2007-09-27 2023-06-01 86 8 $720,421,860 1 3
F11UCKNJJHX8 FLIR SYSTEMS, INC 1991-09-15 2018-08-15 747 404 $295,177,656 16 19
H7PNSVNN5827 LOCKHEED MARTIN CORPORATION 1995-02-15 2017-08-25 58 6 $119,123,268 1 3
F9AVUZ8YBGQ6 BAE SYSTEMS INFORMATION AND ELECTRONIC SYSTEMS INTEGRATION I 2003-12-24 2014-10-06 24 5 $84,681,947 1 3
FA38CTVQM319 CANADIAN COMMERCIAL CORPORATION 1986-08-15 2019-04-30 17 3 $69,764,248 1 3
LG6WVXVEEJN3 INSIGHT TECHNOLOGY INC 1996-09-15 2022-11-17 157 28 $69,098,102 3 5
G8YTUPGGAJN5 THE BOEING COMPANY 2004-12-02 2021-07-27 127 3 $59,863,370 1 2
LMKDJA7B7NV6 BOOZ ALLEN HAMILTON ENGINEERING SERVICES LLC 2013-09-29 2019-12-12 57 10 $41,033,029 1 1
L88SRK33JSR6 S.N.C. SCIONTI 1978-10-15 1996-03-15 142 129 $40,159,000 5 8
RRFJZGASZJ41 FEDERAL ELECT INTL 1980-12-15 1985-02-15 51 51 $26,394,000 1 1

Lets filter our data to include only vendors with over $500,000 in obligations.

Code
tbl_vendors <- tbl_vendors|>  filter(amount_contracts > 500000)

Now lets take a look at the top vendors visually.

Code
gg_top_vendors_pyro <-
  tbl_vendors |>
  filter(amount_contracts > 0) |>
  mutate(
    name_vendor_lumped = name_vendor |>  fct_lump(10, w = amount_contracts, other_level = "ALL OTHER VENDORS")
  ) |>
  count(name_vendor_lumped, wt = amount_contracts, name = "amount") |>
  mutate(name_vendor_lumped = fct_reorder(name_vendor_lumped, amount)) |>
  ggplot(aes(x = name_vendor_lumped, y = amount)) +
  geom_bar(stat = "identity", fill = "#B71212") +
  coord_flip() +
  theme_ipsum() +
  theme(
    panel.grid.minor.y = element_blank(),
    panel.grid.major.y = element_blank(),
    legend.position = "none",
  ) +
  xlab("") +
  labs(title = "Top Visible and Invisible Light Communication Equipment Vendors") +
  scale_y_continuous(labels = scales::dollar, n.breaks = 10)

Part III: FPDS Atom

Next we will explore FPDS atom, this provides the full interface into all the data contained in FPDS going back to 1978.

This provides access to significantly more data but is extremely compute heavy and exponentially slower than fpds_csv

Code
args(fpds_atom)

function (global_vendor_name = NA, vendor_name = NA, parent_vendor_name = NA, department_name = NA, award_type = NA, research = NA, vendor_uei = NA, uei_name = NA, vendor_duns_number = NA, parent_duns_number = NA, vendor_doing_business_as_name = NA, agency_name = NA, contracting_office_name = NA, contracting_agency_name = NA, principal_naics_code = NA, award_status = NA, subcontract_plan = NA, solicitation_procedure = NA, contract_type = NA, contract_type_description = NA, type_of_contract_pricing = NA, contract_id = NA, ref_idv_contract_id = NA, ref_idv_agency_id = NA, contracting_agency_id = NA, contracting_office_id = NA, funding_agency_id = NA, funding_office_id = NA, funding_office_name = NA, agency_code = NA, department_id = NA, last_mod_date = NA, last_modified_by = NA, award_completion_date = NA, created_date = NA, signed_date = NA, effective_date = NA, estimated_completion_date = NA, cancellation_date = NA, destroy_date = NA, final_invoice_paid_date = NA, funded_through_date = NA, last_modified_date = NA, physical_completion_date = NA, reveal_date = NA, solicitation_issue_date = NA, sys_last_modified_date = NA, vendor_registration_date = NA, vendor_renewal_date = NA, base_exercised_options_value = NA, current_contract_value = NA, dollars_obligated = NA, contract_value = NA, fee_range_lower_value = NA, fee_range_upper_value = NA, fixed_fee_value = NA, obligated_amount = NA, total_current_contract_value = NA, total_dollars_obligated = NA, total_non_government_value = NA, total_ultimate_contract_value = NA, ultimate_contract_value = NA, contract_fiscal_year = NA, created_by = NA, description_of_requirement = NA, reason_for_modification = NA, legislative_mandates = NA, local_area_set_aside = NA, socio_economic_indicators = NA, multiyear_contract = NA, national_interest_code = NA, product_or_service_code = NA, performance_district_code = NA, performance_country = NA, performance_state_name = NA, vendor_address_city = NA, vendor_congress_district_code = NA, vendor_address_country_code = NA, vendor_address_country_name = NA, vendor_address_state_code = NA, vendor_address_state_name = NA, vendor_address_zip_code = NA, extent_competed = NA, number_of_offers_received = NA, sort_item = “Signed Date”, use_future = F, show_progress = T, clean_address = T, clean_entity_column = T, sort_descending = T, parse_contracts = F, snake_names = F, keep_key_columns = F, exclude_bloat = F, unformat = T, return_message = T, …) NULL

Anduril Procurement

Lets explore a vendor. To do this you can be specific and use a known Unique Entity Identifier [UEI] or input vendor name text.

Code
tbl_anduril <-
  fpds_atom(
    vendor_name = "ANDURIL",
    parse_contracts = T,
    snake_names = T
  )

Removed 0 duplicates

Lets explore the data

Code
skim(tbl_anduril)
Data summary
Name tbl_anduril
Number of rows 261
Number of columns 357
_______________________
Column type frequency:
character 197
Date 6
logical 126
numeric 23
POSIXct 5
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
id_contract_analysis 0 1.00 13 17 0 107 0
id_contract 0 1.00 13 17 0 107 0
id_contract_idv 146 0.44 13 17 0 6 0
name_contract 0 1.00 64 106 0 145 0
description_obligation 0 1.00 7 232 0 168 0
name_vendor 0 1.00 22 23 0 2 0
address_street1vendor 0 1.00 16 24 0 5 0
cage_vendor_vendor 1 1.00 5 5 0 2 0
city_vendor 0 1.00 6 10 0 4 0
code_congressional_district_vendor 3 0.99 2 2 0 5 0
code_country_vendor 0 1.00 3 3 0 1 0
code_site_alternate_vendor_vendor 0 1.00 5 9 0 4 0
code_state_vendor 0 1.00 2 2 0 3 0
details_entity_identifiers_vendor 0 1.00 70 76 0 5 0
details_uei_information_vendor 0 1.00 69 71 0 5 0
id_unique_entity_parent 0 1.00 12 12 0 4 0
id_unique_entity 0 1.00 12 12 0 3 0
name_data_source_vendor 1 1.00 3 3 0 2 0
name_unique_entity_parent 0 1.00 23 23 0 2 0
name_unique_entity 0 1.00 22 24 0 3 0
telephone_vendor 0 1.00 10 10 0 3 0
zipcode_vendor 0 1.00 5 5 0 4 0
name_state_vendor 0 1.00 8 20 0 3 0
fax_vendor 260 0.00 10 10 0 1 0
code_consolidated_contract 78 0.70 1 1 0 2 0
type_consolidated_contract 78 0.70 2 16 0 2 0
id_agency_funding 4 0.98 4 4 0 7 0
name_agency_award 0 1.00 22 41 0 8 0
name_agency_award_idv 146 0.44 21 32 0 2 0
name_agency_funding 4 0.98 22 44 0 7 0
id_agency_award 0 1.00 4 4 0 8 0
id_agency_award_idv 146 0.44 4 4 0 2 0
code_inter_agency_contracting_authority 78 0.70 1 1 0 2 0
type_inter_agency_contracting_authority 78 0.70 11 14 0 2 0
name_office_award 0 1.00 8 43 0 25 0
name_office_funding 4 0.98 4 43 0 34 0
id_office_award 0 1.00 6 6 0 25 0
id_office_funding 4 0.98 6 6 0 36 0
code_contracting_officer_business_size_determination 78 0.70 1 1 0 2 0
type_contracting_officer_business_size_determination 78 0.70 14 25 0 2 0
name_naics 78 0.70 19 103 0 7 0
code_country_origin_product_service 108 0.59 3 3 0 1 0
code_product_service 0 1.00 4 4 0 25 0
name_country_origin_product_service 108 0.59 13 13 0 1 0
name_product_service 0 1.00 12 90 0 21 0
id_federal_account 253 0.03 8 8 0 4 0
url_fpds_contract_atom 0 1.00 107 111 0 107 0
code_action 0 1.00 1 1 0 5 0
code_additional_reporting_value 79 0.70 1 4 0 3 0
code_commercial_acquisition_procedures 78 0.70 1 1 0 2 0
code_congressional_district_performance 30 0.89 2 2 0 16 0
code_contract_bundling 78 0.70 1 1 0 2 0
code_contract_competition 1 1.00 1 1 0 6 0
code_contract_modification 106 0.59 1 1 0 8 0
code_cost_pricing_data 78 0.70 1 1 0 4 0
code_country_incorporation 1 1.00 3 3 0 1 0
code_country_performance 30 0.89 3 3 0 1 0
code_domestic_foreign_entity 78 0.70 1 1 0 1 0
code_epa_designated_product 108 0.59 1 1 0 2 0
code_evaluated_preference 79 0.70 4 4 0 1 0
code_humanitarian_peackeeping_operation 78 0.70 1 1 0 1 0
code_idv_set_aside 178 0.32 3 4 0 2 0
code_inherently_government_function 84 0.68 2 2 0 2 0
code_list_of_additional_reporting_values 79 0.70 1 4 0 4 0
code_manufacturing_organization 78 0.70 1 1 0 1 0
code_materials_supplies_articles_equipment 78 0.70 1 1 0 3 0
code_modification 0 1.00 1 6 0 16 0
code_modification_idv 146 0.44 1 6 0 3 0
code_multiple_or_single_idv 178 0.32 1 1 0 2 0
code_national_interest_action 78 0.70 4 4 0 1 0
code_offers_source 80 0.69 1 1 0 2 0
code_performance_based_service_contract 78 0.70 1 1 0 3 0
code_place_of_manufacture 108 0.59 1 1 0 2 0
code_recovered_materials 79 0.70 1 1 0 1 0
code_referenced_idv 178 0.32 1 1 0 1 0
code_research 163 0.38 3 3 0 4 0
code_set_aside_source 80 0.69 1 1 0 2 0
code_solicitation_procedures 1 1.00 2 4 0 7 0
code_state_incorporation 1 1.00 2 2 0 1 0
code_state_performance 30 0.89 2 2 0 13 0
code_status_transaction 0 1.00 1 1 0 1 0
code_sub_contract_plan 79 0.70 1 1 0 2 0
code_undefinitized_action 78 0.70 1 1 0 1 0
generic_string02 0 1.00 12 12 0 3 0
generic_string06 134 0.49 1 4 0 3 0
transaction_approved_by 0 1.00 7 41 0 51 0
transaction_created_by 0 1.00 7 41 0 55 0
transaction_last_modified_by 0 1.00 7 41 0 51 0
type_organization 0 1.00 11 24 0 2 0
zipcode_performance 30 0.89 5 5 0 17 0
name_country_incorporation 1 1.00 13 13 0 1 0
name_country_performance 30 0.89 13 13 0 1 0
name_state_incorporation 1 1.00 8 8 0 1 0
name_state_performance 30 0.89 5 20 0 13 0
type_a76fair_act_action 80 0.69 2 2 0 1 0
type_action 0 1.00 3 27 0 8 0
type_additional_reporting_value 79 0.70 17 35 0 3 0
type_clinger_cohen_act_planning 78 0.70 2 3 0 2 0
type_commercial_acquisition_procedures 78 0.70 15 48 0 4 0
type_commercial_item_test_program 79 0.70 2 3 0 2 0
type_construction_wage_requirements 78 0.70 2 14 0 2 0
type_contract_bundling 78 0.70 11 25 0 2 0
type_contract_competition 1 1.00 8 52 0 6 0
type_contract_modification 106 0.59 12 44 0 8 0
type_cost_or_pricing_data 95 0.64 2 3 0 2 0
type_cost_pricing_data 78 0.70 11 19 0 4 0
type_credit_card_purchase_method 108 0.59 2 3 0 2 0
type_domestic_foreign_entity 78 0.70 14 14 0 1 0
type_epa_designated_product 108 0.59 12 18 0 2 0
type_evaluated_preference 79 0.70 18 18 0 1 0
type_fbo_sourced 78 0.70 2 14 0 3 0
type_gfe_or_gfp 78 0.70 24 32 0 2 0
type_humanitarian_peackeeping_operation 78 0.70 14 14 0 1 0
type_idv_set_aside 178 0.32 18 32 0 2 0
type_inherently_government_function 84 0.68 15 18 0 2 0
type_labor_standards_clause 78 0.70 2 14 0 3 0
type_local_area_set_aside 79 0.70 2 2 0 1 0
type_manufacturing_organization 78 0.70 19 19 0 1 0
type_materials_supplies_articles_equipment 78 0.70 2 14 0 3 0
type_multiple_or_single_idv 178 0.32 12 14 0 2 0
type_multi_year_contract 95 0.64 2 3 0 2 0
type_national_interest_action 78 0.70 4 4 0 1 0
type_offers_source 80 0.69 3 11 0 2 0
type_performance_based_service_contract 78 0.70 14 35 0 3 0
type_place_of_manufacture 108 0.59 11 30 0 2 0
type_recovered_materials 79 0.70 50 50 0 1 0
type_referenced_idv 178 0.32 3 3 0 1 0
type_research 163 0.38 52 59 0 4 0
type_set_aside_source 80 0.69 3 11 0 2 0
type_solicitation_procedures 1 1.00 14 42 0 7 0
type_status_transaction 0 1.00 5 5 0 1 0
type_sub_contract_plan 79 0.70 17 27 0 2 0
type_undefinitized_action 78 0.70 2 2 0 1 0
code_cost_accounting_clause 157 0.40 1 1 0 2 0
type_cost_accounting_clause 157 0.40 25 30 0 2 0
code_idc 231 0.11 1 1 0 2 0
code_idc_contract 232 0.11 1 1 0 1 0
code_set_aside 162 0.38 3 4 0 2 0
code_who_can_use 231 0.11 13 14 0 3 0
id_solicitation 155 0.41 13 17 0 14 0
type_idc 231 0.11 12 14 0 2 0
type_idc_contract 232 0.11 41 41 0 1 0
type_set_aside 162 0.38 18 32 0 2 0
code_contract_financing 123 0.53 1 1 0 1 0
code_equipment_system 123 0.53 3 3 0 1 0
code_technology_item 133 0.49 1 1 0 3 0
type_contract_financing 123 0.53 14 14 0 1 0
type_equipment_system 123 0.53 4 4 0 1 0
type_sea_transported 122 0.53 2 7 0 2 0
type_technology_item 133 0.49 19 27 0 4 0
url_fpds_atom 0 1.00 131 133 0 27 0
code_program_claimant 134 0.49 2 3 0 5 0
type_program_claimant 134 0.49 8 60 0 5 0
code_non_traditional_government_contractor_participation 183 0.30 2 3 0 2 0
type_agreement 183 0.30 9 10 0 2 0
year_fiscal_contract 183 0.30 4 4 0 5 0
type_non_traditional_government_contractor_participation 183 0.30 12 40 0 2 0
code_not_compete_reason 200 0.23 3 3 0 3 0
generic_boolean02 260 0.00 5 5 0 1 0
generic_boolean03 260 0.00 5 5 0 1 0
generic_boolean04 260 0.00 5 5 0 1 0
generic_booleans 260 0.00 20 20 0 1 0
type_ordering_procedure 258 0.01 326 326 0 1 0
type_not_compete_reason 200 0.23 19 23 0 3 0
code_fee_for_use_of_service 246 0.06 2 2 0 1 0
type_fee_for_use_of_service 246 0.06 6 6 0 1 0
transaction_closed_by 249 0.05 12 12 0 1 0
email_contract 254 0.03 23 25 0 2 0
url_website 254 0.03 20 20 0 1 0
code_fair_opportunity_exception 260 0.00 4 4 0 1 0
type_fair_opportunity_exception 260 0.00 22 22 0 1 0
code_account_omb 253 0.03 4 4 0 4 0
code_treasury_account_symbol 253 0.03 6 9 0 4 0
id_cgac 253 0.03 2 2 0 3 0
code_treasury_sub_account_symbol 259 0.01 3 3 0 2 0
id_initiative 260 0.00 4 4 0 1 0
name_initiative 260 0.00 38 38 0 1 0
slug_agency_funding 184 0.30 3 7 0 3 0
slug_office_funding 259 0.01 6 6 0 1 0
location_vendor 0 1.00 34 45 0 5 0
city_state_vendor 0 1.00 10 14 0 4 0
location_performance 30 0.89 8 8 0 17 0
id_department_award 0 1.00 4 4 0 2 0
name_department_award 0 1.00 21 31 0 2 0
id_department_funding 4 0.98 4 4 0 2 0
name_department_funding 46 0.82 21 31 0 2 0
type_dod_award 26 0.90 1 2 0 2 0
type_contract_id_analysis 0 1.00 8 8 0 1 0
name_agency_cgac_funding 4 0.98 21 31 0 4 0
slug_cgac_funding 4 0.98 3 3 0 4 0
name_agency_cgac_award 0 1.00 21 31 0 5 0
slug_cgac_award 0 1.00 3 3 0 5 0
name_cgac_agency_idv 146 0.44 21 31 0 2 0
slug_cgac_agency_idv 146 0.44 3 3 0 2 0
id_federal_account_resolved 253 0.03 8 8 0 4 0
code_agency_award 0 1.00 4 4 0 5 0
code_agency_funding 4 0.98 4 4 0 4 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
date_obligation 0 1.00 2012-07-16 2023-08-03 2022-04-20 211
date_contract_completion_current 30 0.89 2012-11-01 2026-06-06 2022-11-11 125
date_contract_completion_ultimate 30 0.89 2012-11-01 2028-09-29 2022-12-01 125
date_contract_effective 0 1.00 2012-08-01 2023-08-02 2021-09-23 95
date_registration_sam 0 1.00 2002-08-28 2020-10-21 2018-07-19 3
date_renewal_sam 0 1.00 2013-01-11 2024-05-07 2022-01-28 16

Variable type: logical

skim_variable n_missing complete_rate mean count
is_federal_government_agency 0 1.00 0.00 FAL: 261
has_a76fair_act_action 80 0.69 0.00 FAL: 181
has_clinger_cohen_act_planning 78 0.70 0.01 FAL: 182, TRU: 1
has_construction_wage_requirements 231 0.11 0.00 FAL: 30
has_contracts 0 1.00 0.00 FAL: 260, TRU: 1
has_contracts_and_grants 0 1.00 1.00 TRU: 260, FAL: 1
has_cost_or_pricing_data 95 0.64 0.02 FAL: 163, TRU: 3
has_credit_card_purchase_method 108 0.59 0.01 FAL: 151, TRU: 2
has_gfe_or_gfp 78 0.70 0.08 FAL: 169, TRU: 14
has_grants 0 1.00 0.00 FAL: 261
has_labor_standards_clause 242 0.07 0.63 TRU: 12, FAL: 7
is1862land_grant_college 0 1.00 0.00 FAL: 261
is1890land_grant_college 0 1.00 0.00 FAL: 261
is1994land_grant_college 0 1.00 0.00 FAL: 261
is_airport_authority 0 1.00 0.00 FAL: 261
is_alaskan_native_owned_corporation_or_firm 0 1.00 0.00 FAL: 261
is_alaskan_native_servicing_institution 0 1.00 0.00 FAL: 261
is_american_indian_owned 0 1.00 0.00 FAL: 261
is_asian_pacific_american_owned_business 0 1.00 0.00 FAL: 261
is_black_american_owned_business 0 1.00 0.00 FAL: 261
is_city_local_government 0 1.00 0.00 FAL: 261
is_commercial_item_test_program 79 0.70 0.01 FAL: 180, TRU: 2
is_community_developed_corporation_owned 0 1.00 0.00 FAL: 261
is_community_developed_corporation_owned_firm 0 1.00 0.00 FAL: 261
is_community_development_corporation 0 1.00 0.00 FAL: 261
is_corporate_entity_not_tax_exempt 0 1.00 1.00 TRU: 260, FAL: 1
is_corporate_entity_tax_exempt 0 1.00 0.00 FAL: 261
is_council_of_governments 0 1.00 0.00 FAL: 261
is_county_local_government 0 1.00 0.00 FAL: 261
is_domestic_shelter 0 1.00 0.00 FAL: 261
is_dot_certified_disadvantaged_business_enterprise 0 1.00 0.00 FAL: 261
is_economically_disadvantaged_women_owned_small_business 0 1.00 0.00 FAL: 261
is_educational_institution 0 1.00 0.00 FAL: 261
is_fbo_sourced 236 0.10 0.40 FAL: 15, TRU: 10
is_federal_government 0 1.00 0.00 FAL: 261
is_federally_funded_research_and_development_corp 0 1.00 0.00 FAL: 261
is_foreign_government 0 1.00 0.00 FAL: 261
is_foreign_owned_and_located 0 1.00 0.00 FAL: 261
is_for_profit_organization 0 1.00 1.00 TRU: 261
is_foundation 0 1.00 0.00 FAL: 261
is_hispanic_american_owned_business 0 1.00 0.00 FAL: 261
is_hispanic_servicing_institution 0 1.00 0.00 FAL: 261
is_historically_black_college_or_university 0 1.00 0.00 FAL: 261
is_hospital 0 1.00 0.00 FAL: 261
is_housing_authorities_public_or_tribal 0 1.00 0.00 FAL: 261
is_indian_tribe_owned 0 1.00 0.00 FAL: 261
is_inter_municipal_local_government 0 1.00 0.00 FAL: 261
is_international_organization 0 1.00 0.00 FAL: 261
is_interstate_entity 0 1.00 0.00 FAL: 261
is_joint_venture_economically_disadvantaged_women_owned_small_business 0 1.00 0.00 FAL: 261
is_joint_venture_women_owned_small_business 0 1.00 0.00 FAL: 261
is_labor_surplus_area_firm 0 1.00 0.00 FAL: 261
is_limited_liability_corporation 0 1.00 0.00 FAL: 261
is_local_area_set_aside 79 0.70 0.00 FAL: 182
is_local_government 0 1.00 0.00 FAL: 261
is_local_government_owned 0 1.00 0.00 FAL: 261
is_manufacturer_of_goods 0 1.00 0.49 FAL: 132, TRU: 129
is_minority_institution 0 1.00 0.00 FAL: 261
is_minority_owned 0 1.00 0.00 FAL: 261
is_multi_year_contract 95 0.64 0.05 FAL: 158, TRU: 8
is_municipality_local_government 0 1.00 0.00 FAL: 261
is_native_american_owned_business 0 1.00 0.00 FAL: 261
is_native_hawaiian_owned_organization_or_firm 0 1.00 0.00 FAL: 261
is_native_hawaiian_servicing_institution 0 1.00 0.00 FAL: 261
is_nonprofit_organization 0 1.00 0.00 FAL: 261
is_other_minority_owned 0 1.00 0.00 FAL: 261
is_other_not_for_profit_organization 0 1.00 0.00 FAL: 261
is_partnership_or_limited_liability_partnership 0 1.00 0.00 FAL: 260, TRU: 1
is_planning_commission 0 1.00 0.00 FAL: 261
is_port_authority 0 1.00 0.00 FAL: 261
is_private_university_or_college 0 1.00 0.00 FAL: 261
is_sba_certified8a_joint_venture 0 1.00 0.00 FAL: 261
is_sba_certified8a_program_participant 0 1.00 0.00 FAL: 261
is_sba_certified_hub_zone 0 1.00 0.00 FAL: 261
is_sba_certified_small_disadvantaged_business 0 1.00 0.00 FAL: 261
is_school_district_local_government 0 1.00 0.00 FAL: 261
is_school_of_forestry 0 1.00 0.00 FAL: 261
is_self_certified_hub_zone_joint_venture 0 1.00 0.00 FAL: 261
is_self_certified_small_disadvantaged_business 0 1.00 0.00 FAL: 261
is_service_related_disabled_veteran_owned_business 0 1.00 0.00 FAL: 261
is_sheltered_workshop 0 1.00 0.00 FAL: 261
is_small_agricultural_cooperative 0 1.00 0.00 FAL: 261
is_small_business 0 1.00 0.00 FAL: 261
is_sole_propreitorship 0 1.00 0.00 FAL: 261
is_state_controlled_institution_of_higher_learning 0 1.00 0.00 FAL: 261
is_state_government 0 1.00 0.00 FAL: 261
is_subchapter_s_corporation 0 1.00 0.00 FAL: 261
is_sub_continent_asian_american_owned_business 0 1.00 0.00 FAL: 261
is_township_local_government 0 1.00 0.00 FAL: 261
is_transaction_closed 1 1.00 0.05 FAL: 248, TRU: 12
is_transit_authority 0 1.00 0.00 FAL: 261
is_tribal_college 0 1.00 0.00 FAL: 261
is_tribal_government 0 1.00 0.00 FAL: 261
is_tribally_owned_firm 0 1.00 0.00 FAL: 261
is_us_government_entity 0 1.00 0.00 FAL: 261
is_very_small_business 0 1.00 0.00 FAL: 261
is_veteran_owned 0 1.00 0.00 FAL: 261
is_veterinary_college 0 1.00 0.00 FAL: 261
is_veterinary_hospital 0 1.00 0.00 FAL: 261
is_women_owned 0 1.00 0.00 FAL: 261
is_women_owned_small_business 0 1.00 0.00 FAL: 261
is_sea_transported 227 0.13 0.00 FAL: 34
has_consortia_flag 248 0.05 0.00 FAL: 13
has_consortia_information 248 0.05 0.00 FAL: 13
has_generic_string04 248 0.05 0.00 FAL: 13
has_small_business_competitiveness_demonstration_program 260 0.00 0.00 FAL: 1
is_idv 0 1.00 0.44 FAL: 146, TRU: 115
has_federal_account 0 1.00 0.03 FAL: 253, TRU: 8
is_possible_sbirsttr 0 1.00 0.07 FAL: 243, TRU: 18
is_filtered_sbir 0 1.00 0.10 FAL: 235, TRU: 26
has_unknown_awarding_department 0 1.00 0.00 FAL: 261
has_unknown_funding_department 0 1.00 0.02 FAL: 257, TRU: 4
has_unknown_awarding_agency 0 1.00 0.00 FAL: 261
has_unknown_funding_agency 0 1.00 0.02 FAL: 257, TRU: 4
has_unknown_awarding_office 0 1.00 0.00 FAL: 261
has_unknown_funding_office 0 1.00 0.02 FAL: 257, TRU: 4
has_different_funding_department 4 0.98 0.00 FAL: 257
has_different_funding_agency 4 0.98 0.29 FAL: 183, TRU: 74
has_different_funding_office 4 0.98 0.83 TRU: 214, FAL: 43
is_idv_contract 0 1.00 0.44 FAL: 146, TRU: 115
is_inherently_government_function 0 1.00 0.00 FAL: 261
is_closely_associated_function 0 1.00 0.00 FAL: 261
is_critical_function 0 1.00 0.00 FAL: 261
is_other_function 0 1.00 0.00 FAL: 261
is_closely_associated_and_critical_function 0 1.00 0.00 FAL: 261
is_critical_function_and_closely_associated 0 1.00 0.00 FAL: 261

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
amount_base_all_option 0 1.00 12728397.13 85678581.15 -759999.36 0.0 0 2120219 967599957 ▇▁▁▁▁
amount_base_all_option_total 0 1.00 67909952.53 197073479.24 0.00 1000000.0 4260000 18000000 967599957 ▇▁▁▁▁
amount_base_and_exercised_option 30 0.89 3296381.52 9931555.96 -0.08 0.0 65000 2072610 99905000 ▇▁▁▁▁
amount_base_and_exercised_option_total 30 0.89 7571406.78 12643919.58 3800.00 749999.9 2500000 9375000 99905000 ▇▁▁▁▁
amount_fee_paid_for_service 108 0.59 0.00 0.00 0.00 0.0 0 0 0 ▁▁▇▁▁
amount_obligation 0 1.00 2484562.21 7264145.71 -11800000.00 0.0 3800 1620000 67211616 ▇▁▁▁▁
amount_obligation_total 0 1.00 5689278.99 8783014.18 0.00 300000.0 1940000 6770000 67211616 ▇▂▁▁▁
amount_estimated_total 231 0.11 413211655.20 409278538.15 0.00 5000000.0 249550000 963199968 967599957 ▆▇▁▁▇
amount_non_governmental_dollars 183 0.30 0.00 0.00 0.00 0.0 0 0 0 ▁▁▇▁▁
amount_individual_order_limit 248 0.05 688103819.69 437746487.71 0.00 249550000.0 967599957 967599957 967599957 ▂▁▁▁▇
id_naics 78 0.70 477617.22 96608.96 334511.00 334511.0 541519 541715 561621 ▃▁▁▁▇
count_actions 108 0.59 1.00 0.00 1.00 1.0 1 1 1 ▁▁▇▁▁
count_offers_received_idv 178 0.32 8.28 9.36 1.00 1.0 12 12 77 ▇▁▁▁▁
count_offers_received 161 0.38 18.94 83.20 1.00 1.0 1 2 600 ▇▁▁▁▁
number_transaction 108 0.59 0.00 0.00 0.00 0.0 0 0 0 ▁▁▇▁▁
pct_price_difference_price_evaluation 205 0.21 0.00 0.00 0.00 0.0 0 0 0 ▁▁▇▁▁
year_budget 0 1.00 2021.64 1.36 2012.00 2021.0 2022 2023 2023 ▁▁▁▂▇
code_department_award 0 1.00 94.31 8.10 70.00 97.0 97 97 97 ▁▁▁▁▇
code_department_funding 46 0.82 93.73 8.82 70.00 97.0 97 97 97 ▁▁▁▁▇
id_cgac_funding 4 0.98 74.04 24.13 17.00 57.0 70 97 97 ▁▁▆▂▇
id_cgac_award 0 1.00 74.88 24.34 17.00 57.0 70 97 97 ▁▁▆▂▇
id_cgac_agency_idv 146 0.44 93.48 9.13 70.00 97.0 97 97 97 ▁▁▁▁▇
amount_obligation_allocated 0 1.00 2484562.21 7264145.71 -11800000.00 0.0 3800 1620000 67211616 ▇▁▁▁▁

Variable type: POSIXct

skim_variable n_missing complete_rate min max median n_unique
datetime_contract_modified 0 1.00 2012-07-27 15:20:59 2023-08-30 07:47:49 2022-06-17 09:56:45 215
datetime_transaction_approved 0 1.00 2012-07-27 15:20:59 2023-08-24 11:11:29 2022-04-20 14:22:51 261
datetime_transaction_created 0 1.00 2012-07-27 15:20:45 2023-08-24 11:11:28 2022-04-20 10:58:59 261
datetime_transaction_last_modified 0 1.00 2012-07-27 15:20:59 2023-08-30 07:47:49 2022-06-17 09:56:45 215
datetime_transaction_closed 249 0.05 2020-09-15 23:02:53 2023-07-31 18:05:46 2023-04-11 23:39:10 12

What Can We See from FPDS Atom?

Here is a sample of the sorts of information that you can explore from FPDS Atom

Entity Resolution

Lets make sure we only have data from the Anduril we are looking for, Anduril Industries.

Code
tbl_count <- 
  tbl_anduril |>
  group_by(name_vendor, id_unique_entity) |>
  summarise(
    count_actions = n(),
    distinct_contracts = n_distinct(id_contract_analysis),
    .groups = "drop"
  )
name_vendor id_unique_entity count_actions distinct_contracts
ANDURIL ENGINEERING LLC MMLFZGBVYMB8 1 1
ANDURIL INDUSTRIES INC JA9NV8H3L2T9 9 2
ANDURIL INDUSTRIES INC KC3CH2MSK7Q3 251 104

Looks like there is another vendor that isn’t who we are interested in. Lets exclude them.

Code
tbl_anduril <-
  tbl_anduril|> 
  filter(id_unique_entity != "MMLFZGBVYMB8")

Anduril Agency Funding

Code
tbl_anduril_agency_awards <- tbl_anduril |>
  group_by(name_department_award, name_agency_award) |>
  summarise(amount = sum(amount_obligation, na.rm = T),
            distinct_contracts = n_distinct(id_contract_analysis), 
            .groups = "drop") |> 
  arrange(desc(amount))
name_department_award name_agency_award amount distinct_contracts
DEPARTMENT OF HOMELAND SECURITY US CUSTOMS AND BORDER PROTECTION $273,821,035 15
DEPARTMENT OF DEFENSE US SPECIAL OPERATIONS COMMAND $162,448,816 35
DEPARTMENT OF DEFENSE WASHINGTON HEADQUARTERS SERVICES $86,773,748 20
DEPARTMENT OF DEFENSE DEPARTMENT OF THE AIR FORCE $52,590,777 27
DEPARTMENT OF DEFENSE DEPARTMENT OF THE NAVY $40,501,818 5
DEPARTMENT OF DEFENSE DEPARTMENT OF THE ARMY $23,000,000 1
DEPARTMENT OF DEFENSE DEFENSE ADVANCED RESEARCH PROJECTS AGENCY $9,318,215 2
DEPARTMENT OF DEFENSE DEFENSE CONTRACT MANAGEMENT AGENCY $0 3

Anduril Award Description Analysis

One of the biggest benefits of fpds_atom is that it gives access to procurement text descriptions. Lets take a look at them for Anduril and see what 2 sets of words show up together and visualize them in a network graph.

Code
stop_words <- get_stopwords(source = "smart")

tidy_ngram <-
  tbl_anduril|> 
  select(id_contract_analysis,
         description_obligation,
         amount_obligation)|> 
  unnest_tokens(bigram,
                description_obligation,
                token = "ngrams",
                n = 2)

Next we can filter out stop words and build our bigram count table.

Code
bigram_counts <-
  tidy_ngram|> 
  separate(bigram, c("word1", "word2"), sep = " ")|> 
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word)|> 
  count(word1, word2, sort = TRUE)

Lets take a look

Code
bigram_counts |> 
  slice(1:12) |> 
  gt::gt()
word1 word2 n
counter unmanned 54
unmanned aerial 52
aerial systems 51
battle management 20
transaction order 20
advanced battle 17
management system 16
include sentry 15
installing counter 15
interceptor prototypes 15
sentry towers 15
sensing network 11

Lets explore this visuall with all incidences over 5.

Code
bigram_counts|> 
  filter(!is.na(word1))|> 
  filter(n >= 5) |> 
  graph_from_data_frame()|> 
  ggraph(layout = "nicely") +
  geom_edge_link(
    aes(edge_alpha = n),
    show.legend = FALSE,
    arrow = arrow(length = unit(1.5, 'mm')),
    start_cap = circle(3, 'mm'),
    end_cap = circle(3, 'mm')
  ) +
  geom_node_text(aes(label = name)) +
  theme_graph() +
  labs(title = "Anduril FPDS Description Bigram (Minimum of 5 Bigram Counts)")

Anduril Trigram Award Totals

Next lets take a look at which 3 words show up together and how many contracts this and obligations they correspond to.

First we build the ngram table

Code
tbl_ngrams <-
  tbl_anduril|> 
  select(id_contract_analysis,
         description_obligation,
         amount_obligation)|> 
  unnest_tokens(ngram, description_obligation, token = "ngrams", n = 3)

Next we build the summary

Code
 tbl_ngrams <- 
  tbl_ngrams|> 
  group_by(ngram)|> 
  summarise(
    amount = sum(amount_obligation),
    count_contracts = n_distinct(id_contract_analysis)
  )|> 
  arrange(desc(amount))|>  
  munge_data()

Lets look at the top 15

Code
tbl_ngrams|>  
  slice(1:15) |>  
  gt()
ngram amount count_contracts
AUTONOMOUS SURVEILLANCE TOWERS $98,391,963 5
COUNTER UNMANNED AERIAL $75,082,167 16
UNMANNED AERIAL SYSTEMS $68,682,167 16
AST DO 8 $67,211,616 1
SURVEILLANCE TOWERS DELIVERY $60,731,513 2
TOWERS DELIVERY ORDER $60,731,513 2
PHASE III AUTONOMOUS $42,473,449 2
SBIR PHASE III $42,473,449 2
FOR COUNTER UNMANNED $39,063,500 9
ORDER FOR COUNTER $39,063,500 8
OTHER TRANSACTION ORDER $39,063,500 8
PRODUCTION OTHER TRANSACTION $39,063,500 9
TRANSACTION ORDER FOR $39,063,500 8
13 PURCHASE OF $38,030,175 1

Part IV: Analyzing Phase III SBIRs

This next section shows you how to take advantage of a special fpds_csv wrapper that acquires SBIR/STTR procurement.

Acquire All Phase III SBIR Procurements

Code
df_all <-
  fpds_research_csv(research_codes = c("SR3"),
                    snake_names = T)

SR3: $26,021,110,224 procured between 1992-07-29 and 2023-09-11 across 66,268 actions amongst 16,026 contracts allocated to 2,562 distinct UEI and 2,561 distinct parent UEI

Lets look at the data

Code
skim(df_all)
Data summary
Name df_all
Number of rows 66268
Number of columns 49
_______________________
Column type frequency:
character 35
Date 2
logical 3
numeric 9
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
code_research 0 1.00 3 3 0 1 0
id_contract 0 1.00 5 22 0 16030 0
id_contract_analysis 0 1.00 8 22 0 16026 0
type_procurement 0 1.00 5 5 0 1 0
id_contract_idv 42991 0.35 8 20 0 1152 0
code_modification 0 1.00 1 7 0 483 0
type_award 0 1.00 17 35 0 8 0
id_agency_award 0 1.00 4 4 0 78 0
name_agency_award 3261 0.95 10 59 0 67 0
name_office_award 0 1.00 4 63 0 802 0
type_product_or_service 0 1.00 1 1 0 2 0
code_product_service 7 1.00 4 4 0 1101 0
name_naics 811 0.99 7 118 0 655 0
city_vendor 0 1.00 3 20 0 1060 0
state_vendor 45 1.00 2 17 0 64 0
zip_vendor 33 1.00 4 9 0 2983 0
id_unique_entity 0 1.00 12 12 0 2562 0
id_unique_entity_parent 17 1.00 12 12 0 2560 0
name_vendor_parent 29 1.00 4 71 0 2585 0
name_vendor 0 1.00 3 72 0 3050 0
url_csv 0 1.00 190 192 0 25 0
id_department_award 14 1.00 4 4 0 29 0
name_department_award 0 1.00 11 45 0 29 0
type_contract_id_analysis 0 1.00 3 8 0 2 0
name_agency_cgac_award 4 1.00 19 45 0 30 0
slug_cgac_award 4 1.00 3 3 0 30 0
type_product_service 7 1.00 7 7 0 2 0
name_product_service 7 1.00 3 100 0 1023 0
id_solicitation_group 7 1.00 1 2 0 110 0
name_solicitation_group 7 1.00 6 75 0 110 0
cage_vendor 43816 0.34 5 5 0 1328 0
code_additional_reporting 51578 0.22 1 4 0 4 0
description_additonal_reporting 51578 0.22 17 82 0 4 0
names_vendor_listed 0 1.00 4 204 0 2537 0
names_vendor_parent_listed 264 1.00 4 289 0 2497 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
date_obligation 0 1.00 1992-07-29 2023-09-11 2013-05-06 6314
date_solicitation 55542 0.16 1989-05-28 2023-09-07 2020-06-24 1122

Variable type: logical

skim_variable n_missing complete_rate mean count
is_idv 0 1 0.35 FAL: 42991, TRU: 23277
has_multiple_vendors 0 1 0.67 TRU: 44433, FAL: 21835
has_multiple_parent_vendors 264 1 0.09 FAL: 60008, TRU: 5996

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
number_transaction 0 1.00 0.40 3.99 0 0 0.00 0.0 786 ▇▁▁▁▁
amount_obligation 0 1.00 392664.79 5296886.53 -663800111 0 9792.67 198002.5 669175315 ▁▁▇▁▁
id_naics 811 0.99 481139.63 110862.03 111140 423410 541511.00 541712.0 928120 ▁▂▇▁▁
year_fiscal_obligation 0 1.00 2014.20 4.90 1992 2011 2013.00 2018.0 2023 ▁▁▂▇▅
code_department_award 14 1.00 91.87 14.74 11 97 97.00 97.0 97 ▁▁▁▁▇
id_cgac_award 0 1.00 35.70 24.75 11 21 21.00 57.0 97 ▇▁▂▁▂
year_budget 0 1.00 2014.40 4.90 1992 2012 2013.00 2018.0 2023 ▁▁▂▇▅
count_vendors_listed 0 1.00 1.90 0.85 1 1 2.00 2.0 6 ▇▁▁▁▁
count_vendors_parent_listed 264 1.00 1.20 0.97 1 1 1.00 1.0 12 ▇▁▁▁▁

Aggregate Summary Data

Code
df_all |>
  summarise(
    amount = sum(amount_obligation, na.rm = T),
    count_distinct_contracts = n_distinct(id_contract_analysis, na.rm = T),
    count_contract_actions = n(),
    count_distinct_vendors = n_distinct(id_unique_entity, na.rm = T),
    .groups = "drop"
  )
amount count_distinct_contracts count_contract_actions count_distinct_vendors
$26,021,110,224 16,026 66,268 2,562

Which Departments Award SBIR Phase IIIs

Lets explore which departments have awarded FPDS Phase IIIs since the year 2000.

Code
tbl_by_dept <-
  df_all|> 
  group_by(
    year_budget,
    name_department_award
  ) |> 
  summarise(amount = sum(amount_obligation),
            .groups = "drop") |> 
  filter(amount > 0, year_budget >= 2000)|> 
  mutate(
    department_group = fct_lump(
      name_department_award,
      n = 5,
      w = amount,
      other_level = "ALL OTHER DEPARTMENTS"
    )
  )|> 
  group_by(year_budget, department_group) |> 
  summarise(amount = sum(amount),
            .groups = "drop") |> 
  mutate(department_group = fct_reorder(department_group, -amount))

Now we can build an interactive streamgraph with this information.

Code
tbl_by_dept |>
  hchart("streamgraph",
         hcaes(year_budget, amount, group = department_group)) |>
  hc_yAxis(visible = T,
           startOnTick = FALSE,
           endOnTick = FALSE) |>
  hc_title(text = "SBIR Phase III Award $ by Department Group Since 2000") |> 
  hc_subtitle(text ="Top 5 Awarding Departments, Remainder Lumped.") |> 
  hc_caption(text = "Data from fpds via govtrackR")

Phase IIIs by Agency

Lets explore this data by agency.

First lets see the raw breakdown by agency.

Code
tbl_agency_amt <-
  df_all |>
  filter(!is.na(name_agency_cgac_award)) |>
  group_by(name_agency_cgac_award) |>
  summarise(amount = sum(amount_obligation), .groups = "drop") |>
  arrange(desc(amount))

Lets Look at the top 15

Code
tbl_agency_amt|> 
  slice(1:15)
name_agency_cgac_award amount
DEPARTMENT OF THE NAVY $10,582,311,589
DEPARTMENT OF THE ARMY $5,122,678,149
DEPARTMENT OF THE AIR FORCE $5,090,398,250
GENERAL SERVICES ADMINISTRATION $2,575,627,991
DEPARTMENT OF DEFENSE $1,174,204,112
NATIONAL AERONAUTICS AND SPACE ADMINISTRATION $737,191,474
DEPARTMENT OF HOMELAND SECURITY $413,793,698
AGENCY FOR INTERNATIONAL DEVELOPMENT $97,210,029
DEPARTMENT OF THE INTERIOR $73,277,220
DEPARTMENT OF VETERANS AFFAIRS $34,299,592
DEPARTMENT OF COMMERCE $28,883,544
DEPARTMENT OF ENERGY $23,781,359
DEPARTMENT OF THE TREASURY $16,584,530
DEPARTMENT OF AGRICULTURE $15,090,561
DEPARTMENT OF TRANSPORTATION $12,450,970

Next lets lump together low frequency agencies (any agency not in the top 15) and visualize the totals.

Code
tbl_agency_amt |>
  mutate(
    agency_group = fct_lump(
      name_agency_cgac_award,
      n = 15,
      w = amount,
      other_level = "ALL OTHER AGENCIES"
    )
  ) |>
  group_by(agency_group) |>
  summarise(amount = sum(amount), .groups = "drop") |>
  arrange(desc(amount)) |> 
  mutate(agency_group = fct_reorder(agency_group, amount))|> 
  ggplot(aes(x = agency_group, y = amount)) +
  geom_bar(
    stat = "identity",
    fill = "#f68060",
    alpha = .6,
    width = .4
  ) +
  coord_flip() +
  xlab("") +
  scale_y_continuous(labels = scales::dollar, n.breaks = 5) +
  geom_text(
    aes(label = paste("$", round(amount / 1000000, digits = 2), "M")),
    position = position_dodge(width = 3.9),

    check_overlap = T,
    size = 2.5,
  ) +
  ggtitle("Top SBIR Phase III Awarding Agencies") +
  labs(subtitle = "By Lumped Government-wide Accounting Agency [CGAC] - Top 15",
       x = "") +
  theme_ipsum() +
  theme(
    panel.grid.minor.y = element_blank(),
    panel.grid.major.y = element_blank(),
    legend.position = "none",
    text = element_text(size = 8,  family = "serif")
  ) 

Explore Every SBIR Awarding Office

Lets take a look at all the awarding offices of SBIR Phase III and put it into an interactive table.

Code
tbl_office <-
  df_all|> 
  filter(!is.na(name_office_award))|> 
  group_by(name_department_award,
           name_agency_cgac_award,
           name_office_award)|> 
  summarise(
    date_first = min(date_obligation, na.rm = T),
    date_recent = max(date_obligation, na.rm = T),
    distinct_vendors = n_distinct(id_unique_entity, na.rm = T),
    amount = sum(amount_obligation, na.rm = T),
    amount_mean = mean(amount_obligation),
    actions = n(),
    contracts = n_distinct(id_contract_analysis, na.rm = T),
    .groups = "drop"
  )|> 
  filter(amount > 0)|> 
  arrange(desc(amount))
Code
table_office <- tbl_office|> 
  reactable(
    filterable = F,
    resizable = T,
    searchable = T,
    showPageSizeOptions = T,
    pageSizeOptions = c(5, 10, 20),
    sortable = T,
    compact = T,
    columns = list(
      date_first = colDef(name = "First Award"),
      date_recent = colDef(name = "Recent Award"),
      name_department_award = colDef(
        name = "Department",
        sortable = T,
        filterable = T
      ),
      name_agency_cgac_award = colDef(
        name = "Agency",
        sortable = T,
        filterable = T
      ),
      name_office_award = colDef(
        name = "Office",
        sortable = T,
        filterable = T
      ),
      distinct_vendors = colDef(name = "Unique Vendors"),
      amount = colDef(
        name = "$ Total",
        format = colFormat(
          prefix = "$",
          separators = TRUE,
          digits = 0
        )
      ),
      amount_mean = colDef(
        name = "$ Mean",
        format = colFormat(
          prefix = "$",
          separators = TRUE,
          digits = 0
        )
      ),
      contracts = colDef(name = "Unique Contracts"),
      actions = colDef(name = "Contract Actions")
    )
  )

Top 20 SBIR Phase III Awardees

Who are the top 20 SBIR Phase III awardees by amount?

Code
tbl_vendors <-
  df_all|> 
  filter(!is.na(id_unique_entity))|> 
  group_by(id_unique_entity)|> 
  summarise(
    vendor = name_vendor[which.max(amount_obligation)],
    amount = sum(amount_obligation, na.rm = T),
    count_distinct_contracts = n_distinct(id_contract_analysis, na.rm = T),
    amount_mean = mean(amount_obligation),
    date_first = min(date_obligation, na.rm = T),
    date_recent = max(date_obligation, na.rm = T),
    count_distinct_agencies = n_distinct(id_cgac_award, na.rm = T),
    count_departments = n_distinct(id_department_award),
    count_offices = n_distinct(name_office_award, na.rm = T),
    count_contract_actions = n(),
    .groups = "drop"
  )|> 
  filter(amount > 0)|> 
  arrange(desc(amount))|> 
  mutate(
    year_recent = year(date_recent),
    year_first = year(date_first),
    .before = "date_first"
  )

Here are the top 20

Code
tbl_vendors|> 
  slice(1:20)|> 
  munge_data()|> 
  gt()
id_unique_entity vendor amount count_distinct_contracts amount_mean year_recent year_first date_first date_recent count_distinct_agencies count_departments count_offices count_contract_actions
LN9PU5M2YZN5 MISCELLANEOUS FOREIGN CONTRACTORS $2,317,178,629 8,195 $106,054 2022 2008 2008-04-22 2022-09-27 3 3 61 21,849
MU3CM1145E44 PROGENY SYSTEMS CORPORATION $1,393,719,855 103 $1,075,401 2023 2000 2000-03-10 2023-06-13 4 2 23 1,296
HBCXZGZJ83M4 ALLIANT TECHSYSTEMS OPERATIONS LLC $1,042,435,288 8 $6,949,569 2023 2007 2007-01-04 2023-06-05 2 1 4 150
M7L6QRM22UN4 NAVMAR APPLIED SCIENCES CORPORATION $847,301,622 30 $1,200,144 2023 1999 1999-05-10 2023-05-23 2 1 4 706
TNMNYWRNFDK9 GATR TECHNOLOGIES, INC. $838,338,146 131 $1,046,614 2023 2008 2008-08-22 2023-05-26 3 1 7 801
P7SFAJZ6M9J6 FRONTIER TECHNOLOGY INC. $747,807,039 79 $733,864 2023 2009 2009-09-22 2023-08-17 5 2 22 1,019
NY98VECEW8H6 TECHNOLOGY SERVICE CORPORATION $584,008,611 23 $2,085,745 2023 2003 2003-05-27 2023-06-06 4 1 11 280
DG3AB4E8R4D8 PHYSICAL OPTICS CORPORATION $437,834,020 82 $1,020,592 2023 1998 1998-06-12 2023-06-12 7 4 27 429
R35KDAL1YFJ6 DIGITAL SYSTEM RESOURCES, INC $318,833,971 14 $1,423,366 2016 1998 1998-08-27 2016-06-22 2 1 11 224
GJEUJECUQNN6 CHESAPEAKE SCIENCES CORPORATION $311,607,886 30 $484,616 2023 2000 2000-06-02 2023-04-29 2 1 9 643
KC3CH2MSK7Q3 ANDURIL INDUSTRIES, INC. $309,541,465 26 $5,159,024 2023 2019 2019-09-24 2023-08-03 3 2 9 60
GDMPJ7GMBH13 3 PHOENIX, INC. $294,943,589 12 $1,100,536 2019 2006 2006-07-10 2019-12-13 2 1 5 268
QEHMNH93DYH6 INNOVATIVE DEFENSE TECHNOLOGIES, LLC $273,741,101 50 $600,309 2023 2007 2007-09-12 2023-06-02 3 1 13 456
XDBPX4PNJE57 AEPTEC MICROSYSTEMS, INC $268,566,325 12 $860,790 2023 2000 2000-09-29 2023-01-12 3 1 14 312
WYFZJPX5J537 SONALYSTS, INC. $240,162,699 8 $863,895 2023 2008 2008-08-28 2023-06-22 4 2 7 278
QWM1XWLUFQM8 TRIDENT SYSTEMS INCORPORATED $230,484,114 45 $391,314 2023 1999 1999-06-16 2023-05-23 5 2 16 589
QXMUHJ7XTF89 INSITU, INC. $222,591,199 5 $3,974,843 2023 2011 2011-05-12 2023-03-15 2 1 3 56
KVNLHF7DEVX9 L3 ADAPTIVE METHODS, INC. $220,035,459 14 $540,628 2023 2005 2005-03-08 2023-04-17 2 1 5 407
H6PJK2QBB986 PERDUCO GROUP, INC., THE $205,576,958 25 $747,553 2023 2019 2019-01-16 2023-08-31 2 2 5 275
DMKEXJ8NQ4C5 ARETE ASSOCIATES $201,235,016 35 $824,734 2023 2003 2003-05-22 2023-06-13 4 2 14 244

Notice anything interesting?

What would some possible next steps be to better understand strange entity we discovered?

Part V: The 2023 Defense Budget

Finally lets explore the 2023 Department of Defense Budget.

Acquire 2023 Budget Data

Here is how to acquire the data.

Code
df_budget <- dod_years_budgets(budget_years = 2023, snake_names = T)

[1] “FY 2021” “FY 2022” “FY 2023” [1] “EXHIBIT M-1” “FY 2021 (BASE + OCO)”
[3] “FY 2022 LESS SUPPLEMENTALS ENAC” “FY 2022 DIVISION B DIVISION C P” [5] “FY 2022 DIVISION B P.L.117-70 E” “FY 2022 DIVISION A P.L. 117-86” [7] “FY 2022 DIVISION N P.L. 117-103” “FY 2022 TOTAL SUPPLEMENTAL ENAC” [9] “FY 2022 TOTAL ENACTMENT” “FY 2023 REQUEST”
[1] “OM TITLE PLUS INDEFINITE” “OM TITLE”
[3] “INDEFINITE ACCOUNTS” “FY 2021 (BASE + OCO)”
[5] “FY 2022 LESS SUPPLEMENTALS ENAC” “FY 2022 DIVISION B DIVISION C P” [7] “FY 2022 DIVISION B P.L.117-70 E” “FY 2022 DIVISION A P.L. 117-86” [9] “FY 2022 DIVISION N P.L. 117-103” “FY 2022 TOTAL SUPPLEMENTAL ENAC” [11] “FY 2022 TOTAL ENACTMENT” “FY 2023 REQUEST”
[1] “EXHIBIT P-1” “FY 2021 (BASE + OCO)” “FY 2022 ENACTMENT”
[4] “FY 2023 REQUEST”
[1] “EXHIBIT P-1R” “FY 2021 (BASE + OCO)” “FY 2022 ENACTMENT”
[4] “FY 2023 REQUEST”
[1] “EXHIBIT R-1” “FY 2021 (BASE + OCO)” “FY 2022 ENACTMENT”
[4] “FY 2023 REQUEST”
[1] “RF TITLE” “FY 2021 (BASE + OCO)”
[3] “FY 2022 LESS SUPPLEMENTALS ENAC” “FY 2022 DIVISION B DIVISION C P” [5] “FY 2022 DIVISION B P.L.117-70 E” “FY 2022 DIVISION A P.L. 117-86” [7] “FY 2022 DIVISION N P.L. 117-103” “FY 2022 TOTAL SUPPLEMENTAL ENAC” [9] “FY 2022 TOTAL ENACTMENT” “FY 2023 REQUEST”

Lets take a look at it.

Code
skim(df_budget)
Data summary
Name df_budget
Number of rows 2900
Number of columns 43
_______________________
Column type frequency:
character 36
logical 3
numeric 4
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
type_budget 0 1.00 4 4 0 1 0
slug_dod_budget_group 0 1.00 2 3 0 7 0
name_dod_budget_group 0 1.00 4 29 0 7 0
force_organization 0 1.00 4 12 0 5 0
slug_classification 0 1.00 1 1 0 1 0
slug_organization 0 1.00 2 12 0 40 0
name_account_omb_clean 0 1.00 4 69 0 36 0
name_account_omb 0 1.00 4 47 0 47 0
code_program_element 274 0.91 1 12 0 2154 0
name_budget_parent_actual 0 1.00 5 48 0 114 0
name_budget_parent 0 1.00 5 48 0 108 0
name_budget_activity 0 1.00 2 80 0 1171 0
name_program_element_actual 0 1.00 2 80 0 1981 0
name_program_element 0 1.00 2 80 0 1979 0
type_budget_sub 203 0.93 7 7 0 1 0
slug_budget_parent 4 1.00 2 2 0 11 0
code_account_omb 0 1.00 4 6 0 101 0
slug_organization_account 0 1.00 2 20 0 21 0
name_budget_parent_detail 2764 0.05 2 18 0 9 0
name_budget_activity_actual 0 1.00 2 80 0 1213 0
name_budget_activity_detail 2705 0.07 3 55 0 74 0
name_location 2697 0.07 4 37 0 94 0
name_construction_project 2697 0.07 8 12 0 202 0
name_state_country 2697 0.07 4 21 0 48 0
url_budget_dod_file 0 1.00 77 78 0 7 0
id_line_number 748 0.74 1 3 0 321 0
slug_cost 1797 0.38 1 2 0 16 0
type_cost 1797 0.38 1 35 0 19 0
slug_treasury_agency 0 1.00 1 1 0 4 0
slug_state_country 2697 0.07 2 2 0 48 0
detail_program_element 2896 0.00 1 3 0 3 0
id_federal_account 0 1.00 8 8 0 103 0
name_account 136 0.95 23 87 0 89 0
name_financial_reporting_entity 136 0.95 21 25 0 2 0
name_function_budget 136 0.95 16 33 0 2 0
name_agency_cgac 0 1.00 21 27 0 4 0

Variable type: logical

skim_variable n_missing complete_rate mean count
is_budget_current_year 0 1.00 1.00 TRU: 2900
is_added 1764 0.39 0.00 FAL: 1136
is_toa 1314 0.55 0.98 TRU: 1561, FAL: 25

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
year_budget 0 1.00 2023.00 0.00 2023 2023.0 2023 2023 2023 ▁▁▇▁▁
amount_item 0 1.00 272048787.93 1112103765.86 -2716395000 8234500.0 39073500 141525750 21702225000 ▇▁▁▁▁
count_item 2706 0.07 120.32 510.78 1 4.0 11 35 4674 ▇▁▁▁▁
amount_unit_cost 2706 0.07 80001542.93 475005948.07 -2103748000 165364.1 2925357 24404625 3625289500 ▁▇▁▁▁

Macro Exploration of the Budget

Breakdown by Agency
Code
df_budget|> 
  group_by(name_agency_cgac) |> 
  summarise(amount = sum(amount_item, na.rm = T),
            pct_total = amount / sum(amount),
            count_distinct_elements = n_distinct(code_program_element), .groups = "drop") |> 
  mutate(pct_total = amount / sum(amount)) |> 
  arrange(desc(amount))
name_agency_cgac amount count_distinct_elements pct_total
DEPARTMENT OF THE NAVY $239,573,277,000 657 30.37%
DEPARTMENT OF THE AIR FORCE $235,892,349,000 575 29.90%
DEPARTMENT OF THE ARMY $181,648,110,000 529 23.02%
DEPARTMENT OF DEFENSE $131,827,749,000 406 16.71%

Now let’s take a look at breakdown by Agency and Budget Group:

Code
df_budget |>
  group_by(name_agency_cgac, name_dod_budget_group) |>
  summarise(
    amount = sum(amount_item, na.rm = T),
    count_distinct_elements = n_distinct(code_program_element),
    .groups = "drop"
  ) |>
  mutate(pct_total = amount / sum(amount)) |> 
  arrange(desc(amount))
name_agency_cgac name_dod_budget_group amount count_distinct_elements pct_total
DEPARTMENT OF DEFENSE OPERATION AND MAINTENANCE $88,444,532,000 72 11.21%
DEPARTMENT OF THE NAVY OPERATION AND MAINTENANCE $77,717,886,000 67 9.85%
DEPARTMENT OF THE NAVY PROCUREMENT $74,030,027,000 305 9.38%
DEPARTMENT OF THE AIR FORCE OPERATION AND MAINTENANCE $73,106,970,000 57 9.27%
DEPARTMENT OF THE ARMY OPERATION AND MAINTENANCE $70,256,581,000 57 8.91%
DEPARTMENT OF THE ARMY MILITARY PERSONNEL $69,073,043,000 1 8.76%
DEPARTMENT OF THE AIR FORCE RDTE $59,953,673,000 281 7.60%
DEPARTMENT OF THE NAVY MILITARY PERSONNEL $58,484,305,000 1 7.41%
DEPARTMENT OF THE AIR FORCE PROCUREMENT $52,888,981,000 176 6.70%
DEPARTMENT OF THE AIR FORCE MILITARY PERSONNEL $46,325,984,000 1 5.87%
DEPARTMENT OF DEFENSE RDTE $33,266,604,000 224 4.22%
DEPARTMENT OF THE NAVY RDTE $24,093,583,000 238 3.05%
DEPARTMENT OF THE ARMY PROCUREMENT $22,609,292,000 222 2.87%
DEPARTMENT OF THE ARMY RDTE $14,753,043,000 202 1.87%
DEPARTMENT OF DEFENSE PROCUREMENT $5,957,011,000 57 0.76%
DEPARTMENT OF THE NAVY MILITARY CONSTRUCTION $4,594,913,000 47 0.58%
DEPARTMENT OF THE ARMY PROCUREMENT RESERVE $3,010,037,000 86 0.38%
DEPARTMENT OF THE AIR FORCE MILITARY CONSTRUCTION $2,956,283,000 59 0.37%
DEPARTMENT OF DEFENSE MILITARY CONSTRUCTION $2,686,592,000 50 0.34%
DEPARTMENT OF THE ARMY MILITARY CONSTRUCTION $1,916,177,000 46 0.24%
DEPARTMENT OF DEFENSE REVOLVING AND MANAGEMENT FUND $1,473,010,000 4 0.19%
DEPARTMENT OF THE NAVY PROCUREMENT RESERVE $652,563,000 19 0.08%
DEPARTMENT OF THE AIR FORCE PROCUREMENT RESERVE $580,010,000 35 0.07%
DEPARTMENT OF THE AIR FORCE REVOLVING AND MANAGEMENT FUND $80,448,000 1 0.01%
DEPARTMENT OF THE ARMY REVOLVING AND MANAGEMENT FUND $29,937,000 2 0.00%

Which Items Are We Buying

Now lets take a look at what specific items we are looking to acquire.

Code
tbl_items <-
  df_budget|> 
  filter(amount_unit_cost > 0, count_item > 0)|> 
  select(
    count_item,
    amount_unit_cost,
    name_program_element_actual,
    name_agency_cgac,
    slug_organization_account
  )|> 
  distinct()

Lets build an interactive chart that does that.

Code
hchart(
  tbl_items,
  "scatter",
  hcaes(
    x = count_item,
    y = amount_unit_cost,
    group = name_agency_cgac,
    name = name_program_element_actual
  ),
  marker = list(radius = 3, symbol = 'circle'),
  regression = TRUE
) |>
  hc_title(text = "2023 Defense Budget Requested Items") |>
  hc_xAxis(title = list(text = "# Items(log10 transformed)"),
           type = "logarithmic") |>
  hc_yAxis(title = list(text = "$ Amount Per Unit (log 10 transformed)"),
           type = "logarithmic") |>
  hc_tooltip(
    table = TRUE,
    outside = TRUE,
    shared = TRUE,
    useHTML = TRUE,
    headerFormat = "<small>{point.key}</small><table>",
    pointFormat = str_c(
      "<tr><td style=\"color: {series.color}\">{series.name}: </td>",
      "<tr><td style=\"text-align: right\"><b>Items: {point.x:,.0f}</b></td></tr>",
      "<tr><td style=\"text-align: right\"><b>$ Per item: {point.y:,0.f}</b></td></tr>",
      "<td style=\"text-align: right\"></td>"
    ),
    style = list(fontSize = "0.7em")
  ) |>
  hc_colors(c("#d35400", "#2980b9", "#2ecc71", "black")) |>
  hc_add_dependency("plugins/highcharts-regression.js") |>
  hc_credits(text = "Data from 2023 Defense Budget Request via govrtrackr")

2023 Budget Request

Lets see if we can explore the whole budget interactively.

To do that we will build a hierarchical sunburst visualization.

First lets get the data we need.

Code
treemap_columns <-
  c(
    "name_dod_budget_group",
    "name_agency_cgac",
    "slug_organization_account",
    "name_account_omb_clean",
    "name_budget_parent",
    "name_budget_activity",
    "name_program_element"
  )

tbl_treemap <- df_budget|> 
  count(
    !!!syms(treemap_columns),
    wt = amount_item,
    name = "amount",
    sort = T
  )|> 
  filter(amount > 0)

Now we can build the interactive sunburst chart.

Code
tm <-
  tbl_treemap|> 
  treemap(
    index = treemap_columns,
    vSize = "amount",
    draw = F,
    type = "index"
  )

tm_nest <- d3_nest(tm$tm[, c(treemap_columns,
                             "vSize",
                             "color")],
                   value_cols = c("vSize", "color"))

sun <- sund2b(
  tm_nest,
  colors = htmlwidgets::JS(# yes this is a little different, so please pay attention
    #  "function(d) {return d.color}" will not work
    "function(name, d){return d.color || '#ccc';}"),
  valueField = "vSize",
  elementId = "my-sunburst"
)

Lets See how it looks:

2023 Defense Budget Request Sunburst Chart

Source Code
---
title: "govtrackR"
author: SHELDON
date: "`r  Sys.Date()`"
editor: source
toc: true
toc-depth: 5
code-copy: true
code-tools: true
code-fold: show
fig-align: center
categories: [govtrackR, data analysis]
search: false
page-layout: full
title-block-banner: false
self-contained: false
execute: 
  echo: false
  eval: true
  warning: false
  message: false

format: 
  html:
    categories: [govtrackR, data analysis]
    subtitle: "Using govtrackR"
    abstract: "Introduction to govtrackR and government data analysis."
    abstract-title: "govtrackR"
    anchor-sections: true
    smooth-scroll: true
---

### Data Rights

```{r}
#| label: sheldon-data-rights
htmltools::HTML(sheldon::sbir_data_rights())
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  eval = T,
  results = 'asis',
  fig.retina = 4,
  warning = F,
  message = F,
  fig.width = 12,
  fig.height = 8
)
```

### Load Packages

```{r load_libs}
library(lubridate)
library(tidyverse)
library(tidytext)
library(govtrackR)
library(scales)
library(ggtext)
library(gganimate)
library(hrbrthemes)
library(rtemis)
library(viridis)
library(ggrepel)
library(highcharter)
library(tidylo)
library(widyr)
library(gt)
library(tidygraph)
library(ggraph)
library(igraph)
library(d3r)
library(treemap)
library(sunburstR)
library(reactable)
library(skimr)
library(glue)
options(highcharter.theme = hc_theme_hcrt(tooltip = list(valueDecimals = 2)))
```

### Part I: Product Service Codes [PSC]

Lets explore PSC's to help us better understand the products and services procured by the government.

```{r pull_psc}
tbl_psc <- dictionary_psc_active(only_active = T, snake_names = T)
```

#### Table to Explore All Active PSCs

Here is an interactive table that lets us engage with the active PSCs.

```{r}
psc_tbl <- tbl_psc |>
  select(
    is_active_psc,
    type_product_service,
    name_solicitation_group,
    code_product_service,
    name_product_service,
    date_start,
    date_end,
    details_product_service_includes,
    details_product_service_notes
  ) |>
  reactable(
    filterable = T,
    resizable = T,
    searchable = T,
    showPageSizeOptions = T,
    defaultPageSize = 4,
    defaultColDef = colDef(
        align = "center",
        vAlign = "center",
        minWidth = 175
      ),
    pageSizeOptions = c(5, 10, 20),
    sortable = T,
    compact = T,
    columns = list(
      name_product_service = colDef(minWidth = 300),
      details_product_service_includes = colDef(minWidth = 500),
      details_product_service_notes = colDef(minWidth = 500)
    )
  )
```

Here it is

```{r psc_table}
#| echo: false
psc_tbl
```

#### What is the Breakdown Between Products and Services?

What is the breakdown between these two groups?

```{r psc_bkdown}
gg_psc_bkd <-
  tbl_psc |>
  count(type_product_service, sort = T, name = "count") |>
  mutate(type_product_service = fct_reorder(type_product_service, count)) |>
  ggplot(aes(x = type_product_service, y = count, fill = type_product_service)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_viridis(discrete = TRUE, name = "") +
  theme_ipsum() +
  ylab("Number of Product Service Codes") +
  xlab("") +
  ggtitle("Products Versus Services") +
  coord_flip()
```

```{r munge_psc_count}
#| echo: false

tbl_count_psc <- tbl_psc |>
  count(type_product_service, sort = T, name = "count")

prod <- tbl_count_psc |> filter(type_product_service == "PRODUCT") |> pull(count)

service <- tbl_count_psc |> filter(type_product_service != "PRODUCT") |> pull(count)

service_ratio <- round(service/prod, 2)

```

```{r show_psc_bk}
#| echo: false

gg_psc_bkd
```

Interesting.  There are `r prod` total product codes and `r service` service codes.

Another way to think about this is that for every product there are `r service_ratio`.  Is this a meaningful insight?

#### Frequency of PSC Additions

Accurately describing products and services is important for transparency, vendor discovery, understanding who inside of government does what and so much more.  As economies and technologies change it is important that procurement dictonaries keep up with the times.

Lets try to understand this a bit more.

The first step in doing this is seeing the dates where the GSA updated the PSC dictionary.

```{r new_psc_dates}
tbl_new_codes <-
  tbl_psc|> 
  group_by(type_product_service, date_start)|> 
  summarise(count_added = n(), .groups = "drop")

```

```{r}
tbl_new_codes |>
  reactable(
    filterable = T,
    resizable = T,
    searchable = T,
    showPageSizeOptions = T,
    defaultPageSize = 4,
    pageSizeOptions = c(5, 10, 20),
    sortable = T,
    compact = T
  )
```

Lets explore this changes interactively interactively

```{r hc_new_psc, eval=FALSE}
hc_new_psc <-
  hchart(tbl_new_codes,
         "line",
         hcaes(x = date_start, y = count_added, group = type_product_service))|> 
  hc_title(text = "New Product Service Codes by Date Added")|> 
  hc_yAxis(title = list(text = "# Product Services Codes Added"))|> 
  hc_xAxis(title = list(text = "Date Added")) |>
  hc_credits(title = list(text = "Data from GSA/acquisition.gov via govtrackR"))
  hc_add_theme(hc_theme_hcrt())
```

```{r}
#| echo: false
#| warning: false
#| message: false
#| results: hide

hc_new_codes <- tbl_new_codes |> asbviz::hc_xy(
  type = "spline",
  x = "date_start",
  y = "count_added",
  group = "type_product_service",
  use_navigator = TRUE,
  override_x_text = list(text = "Date Added"),
  override_y_text = list(text = "# Product Services Codes Added"),
  override_y_label =
    list(
      formatter = JS(
        "function(){
return '' + Highcharts.numberFormat(this.value,0, '.', ',');
}"
      )
    ),
credits = "Data from GSA/acquisition.gov via govtrackR",
theme_name = "better_unica",
color_palette = "pals::coolwarm",
title = "New Product Service Codes by Date Added",
use_range_selector = T
)
```

```{r hc_new_codes}
#| echo: false  
#| fig-align: center
#| fig-width: 12
#| fig-height: 12

hc_new_codes
```

### Part II: FPDS CSV

This section explores `fpds_csv` which provides real time access into the <a href="https://www.fpds.gov/ezsearch/search.do?indexName=awardfull&templateName=1.5.1&s=FPDS.GOV&q=Google-like+search+to+help+you+find+federal+contracts..." target="_blank">FPDS</a> csv interface.

```{r eval = T}
#| tidy: true
#| sanitize: true
args(fpds_csv)
```

#### Acquire All Procurement of Visible and Invisible Light Communication Equipment

Lets take advantage of the function to acquire all these historic product procurement.

```{r fpds_crypto}
tbl_crypto <-
  fpds_csv(product_or_service_code = "5850", snake_names = T)
```

```{r fpds_fix}
#| include: false
tbl_crypto <- tbl_crypto |> 
  mutate(name_department_award = coalesce(name_department_award, "UNKNOWN AWARDING DEPARTMENT"),
         name_agency_award =  coalesce(name_agency_award, "UNKNOWN AWARDING AGENCY"),
         name_agency_cgac_award = coalesce(name_agency_cgac_award, "UNKNOWN AWARDING AGENCY"))

total_spend <- tbl_crypto |> pull(amount_obligation) |> sum() |> formattable::comma()

```

#### What is the Total Spend?

How much has been spent on this product?

```{r}
#| eval: false
tbl_crypto |> summarise(amount_obligation = sum(amount_obligation))
```

```{r}
#| echo: false
tbl_crypto |> summarise(amount_obligation = sum(amount_obligation)) |> munge_data() |> gt::gt()
```

#### Spend Over Time?

```{r pyro_tbl}
tbl_by_day <- 
  tbl_crypto|> 
  group_by(date_obligation)|> 
  summarise(amount = sum(amount_obligation), .groups = "drop")
```

Lets take a look at the data

```{r}
tbl_by_day|>  sample_n(3)|>  munge_data()|>  gt()
```

#### Explore Total Outlays by Day

```{r eval = F}
tbl_by_day|> 
  hchart("line",
         hcaes(x = date_obligation, y = amount))|> 
  hc_title(text = "Visible and Invisible Light Communication Equipment Procurement Total Daily Spend")|> 
  hc_yAxis(title = list(text = "Amount Obligated"))|> 
  hc_xAxis(title = list(text = "Date Added"))|> 
  hc_add_theme(hc_theme_elementary())
```

```{r}
#| echo: false
#| warning: false
#| message: false
#| results: hide

hc_day_line <- tbl_by_day |>
  asbviz::hc_xy(
    type = "line",
    x = "date_obligation",
    y = "amount",
    override_x_text = list(text = "Date"),
    override_y_text = list(text = "Total Procurement $"),
    override_y_label = list(
      formatter = JS(
        "function(){
return '$' + Highcharts.numberFormat(this.value, 0, '.', ',');
}"
      )
    ),
use_navigator = TRUE,
credits = "Data from fpds via govtrackR",
theme_name = "clean_unica",
color_palette = "pals::coolwarm",
point_size = 1,
disable_legend = TRUE,
title = "Visible and Invisible Light Communication Equipment Procurement Total Daily Spend"
  )
```

```{r hc_day_line}
#| echo: false  
#| fig-align: center
#| fig-width: 12
#| fig-height: 12

hc_day_line
```

#### Cumulative Spend?

Lets add a cumulative total

```{r}
tbl_by_day <-
  tbl_by_day |>
  mutate(amount_cumulative = cumsum(amount))
```

Now lets explore it statically

```{r}
gg_area_pyro <-
  tbl_by_day |>
  ggplot(aes(date_obligation, amount_cumulative)) +
  geom_area(fill = "#22908C", alpha = .5) +
  scale_fill_viridis(discrete = TRUE) +
  scale_y_continuous(labels = scales::dollar) +
  scale_x_date() +
  theme(legend.position = "none") +
  theme_ipsum() +
  labs(title = "Cumulative Federal Procurement on Visible and Invisible Light Communication Equipments", x = "Date", y = "Cumulative Procurement Spend")
```

```{r}
#| echo: false
gg_area_pyro
```

#### Spend by Department

Lets take a look at spend by department.

```{r}
tbl_depts <-
  tbl_crypto |>
  group_by(name_department_award) |> 
  summarise(amount = sum(amount_obligation),
            .groups = "drop") |> 
  arrange(desc(amount))
```

```{r echo=FALSE}
tbl_depts|>  munge_data()|>  gt()
```

Now we can visualize it.

```{r}
tbl_depts |>
  mutate(
    name_department_award = fct_reorder(name_department_award, amount),
    amount_millions = amount / 1000000
  ) |>
  ggplot(aes(x = name_department_award, y = amount_millions)) +
  geom_segment(
    aes(
      x = name_department_award ,
      xend = name_department_award,
      y = 0,
      yend = amount_millions
    ),
    color = "grey"
  ) +
  geom_point(size = 3, color = "#69b3a2") +
  coord_flip() +
  theme_ipsum() +
  theme(
    legend.position = "none",
    panel.grid.minor.y = element_blank(),
    panel.grid.major.y = element_blank(),
    text = element_text(size = 7)
  ) +
  xlab("") +
  scale_y_log10(labels = scales::dollar, n.breaks = 10) +
  labs(title = "Which Departments Purchased the Visible and Invisible Light Communication Equipment?",
       x = "",
       y = "Procurement $ in milions (log10 transformed)")
```

#### Procurement Seasonality?

Often products and services tend to be procured in September at the end of the budget year, does this product exhibit that trend?

Lets build a table that gives us the inputs

```{r}
govt_months <- 
  c(
    "Oct",
    "Nov",
    "Dec",
    "Jan",
    "Feb",
    "Mar",
    "Apr",
    "May",
    "Jun",
    "Jul",
    "Aug",
    "Sep"
  )

tbl_monthly_pyro <-
  tbl_crypto|> 
  mutate(month_obligation = lubridate::month(date_obligation, label = T))|> 
  count(year_fiscal_obligation,
        month_obligation,
        wt = amount_obligation,
        name = "amount")

## Set the factor levels to budget months

tbl_monthly_pyro <- 
  tbl_monthly_pyro|> 
  mutate(month_obligation = factor(
    month_obligation,
    levels = govt_months,
    ordered = T
  ))

```

Lets turn this into an interactive heatmap.

```{r eval = FALSE}

fntltp <- JS("function(){
  return this.point.x + ' ' +  this.series.yAxis.categories[this.point.y] + ': ' +
  Highcharts.numberFormat(this.point.value, 2);
}")


hc_pyro_hm <-
  hchart(
    tbl_monthly_pyro,
    "heatmap",
    hcaes(x = year_fiscal_obligation,
          y = month_obligation,
          value = amount)
  )|> 
  hc_colorAxis(
    stops = color_stops(20, colors = scales::viridis_pal(option = "B")(20)),
    # fuerza a utilzar mejor el espectro de colores para que HJS no amplie el
    # eje para tener numero "redondos
    startOnTick = FALSE,

    endOnTick =  FALSE,
    reversed = T
  )|> 
  hc_yAxis(
    title = list(text = ""),
    reversed = TRUE,
    offset = -20,
    tickLength = 0,
    gridLineWidth = 0,
    minorGridLineWidth = 0,
    labels = list(style = list(fontSize = "9px"))
  )|> 
  hc_tooltip(formatter = fntltp)|> 
  hc_title(text = "Spend by Month and Fiscal Year")|> 
  hc_legend(
    layout = "horizontal",
    verticalAlign = "top",
    align = "left",
    valueDecimals = 0
  )|> 
  hc_add_theme(hc_theme_darkunica())
```

```{r}
#| echo: false
#| message: false
#| eval: true
#| results: hide

hc_pyro_hm_final <- tbl_monthly_pyro |>
  asbviz::hc_xy(
    x = "year_fiscal_obligation",
    y = "month_obligation",
    group = "amount",
    type = "heatmap",
    theme_name = "darkunica",
    override_x_text = list(text = ""),
    override_y_text = list(text = ""),
    title = "Visible and Invisible Light Communication Equipment Procurement<br>Spend by Month and Budget Year"
  )
```


```{r}
#| echo: false
#| fig-align: center
#| fig-width: 12
#| fig-height: 12

hc_pyro_hm_final
```

#### Deepive: Top Visible and Invisible Light Communication Equipment Vendors

Next lets take a look at the companies that supply this product.

First thing we want to do is build a summary table that lets us explore the data.

```{r}
tbl_vendors <-
  tbl_crypto |>
  group_by(id_unique_entity) |>
  summarise(
    name_vendor = name_vendor[which.max(amount_obligation)],
    date_first_award = min(date_obligation, na.rm = T),
    date_recent_award = max(date_obligation, na.rm = T),
    count_actions = n(),
    count_contracts = n_distinct(id_contract_analysis, na.rm = T),
    amount_contracts = sum(amount_obligation),
    count_departments = n_distinct(name_department_award, na.rm = T),
    count_agencies = n_distinct(name_agency_cgac_award, na.rm = T)
  ) |>
  arrange(desc(amount_contracts))
```

Who are the top 10 vendors?

```{r eval = F}
tbl_vendors|> 
  slice(1:10) 
```

```{r}
#| echo: false
#| eval: true
tbl_vendors|> 
  slice(1:10) |> 
  munge_data() |> 
  gt()
```

Lets filter our data to include only vendors with over $500,000 in obligations.

```{r}
tbl_vendors <- tbl_vendors|>  filter(amount_contracts > 500000)
```

Now lets take a look at the top vendors visually.

```{r}
gg_top_vendors_pyro <-
  tbl_vendors |>
  filter(amount_contracts > 0) |>
  mutate(
    name_vendor_lumped = name_vendor |>  fct_lump(10, w = amount_contracts, other_level = "ALL OTHER VENDORS")
  ) |>
  count(name_vendor_lumped, wt = amount_contracts, name = "amount") |>
  mutate(name_vendor_lumped = fct_reorder(name_vendor_lumped, amount)) |>
  ggplot(aes(x = name_vendor_lumped, y = amount)) +
  geom_bar(stat = "identity", fill = "#B71212") +
  coord_flip() +
  theme_ipsum() +
  theme(
    panel.grid.minor.y = element_blank(),
    panel.grid.major.y = element_blank(),
    legend.position = "none",
  ) +
  xlab("") +
  labs(title = "Top Visible and Invisible Light Communication Equipment Vendors") +
  scale_y_continuous(labels = scales::dollar, n.breaks = 10)
```

```{r}
#| echo: false
gg_top_vendors_pyro
```

### Part III: FPDS Atom

Next we will explore FPDS atom, this provides the full interface into all the data contained in FPDS going back to 1978.

This provides access to significantly more data but is extremely compute heavy and exponentially slower than `fpds_csv`

```{r}
#| tidy: true
#| sanitize: true
args(fpds_atom)
```


#### Anduril Procurement

Lets explore a vendor.  To do this you can be specific and use a known Unique Entity Identifier [UEI] or input vendor name text.

```{r}
tbl_anduril <-
  fpds_atom(
    vendor_name = "ANDURIL",
    parse_contracts = T,
    snake_names = T
  )
```

Lets explore the data

```{r}
#| tidy: true
skim(tbl_anduril)
```

#### What Can We See from FPDS Atom?

Here is a sample of the sorts of information that you can explore from FPDS Atom

```{r}
#| label: obj_sample
#| echo: false
#| fig-width: 10
#| fig-height: 8

out <- tbl_anduril |>
  filter(amount_obligation == max(amount_obligation)) |> 
  janitor::remove_empty("cols") |> 
  jsonview::json_tree_view(scroll = T)

widgetframe::frameWidget(out)
```

#### Entity Resolution

Lets make sure we only have data from the Anduril we are looking for, Anduril Industries.

```{r}
tbl_count <- 
  tbl_anduril |>
  group_by(name_vendor, id_unique_entity) |>
  summarise(
    count_actions = n(),
    distinct_contracts = n_distinct(id_contract_analysis),
    .groups = "drop"
  )
```

```{r}
#| echo: false
gt(tbl_count)
```

Looks like there is another vendor that isn't who we are interested in.  Lets exclude them.

```{r}
tbl_anduril <-
  tbl_anduril|> 
  filter(id_unique_entity != "MMLFZGBVYMB8")
```

#### Anduril Agency Funding

```{r}
tbl_anduril_agency_awards <- tbl_anduril |>
  group_by(name_department_award, name_agency_award) |>
  summarise(amount = sum(amount_obligation, na.rm = T),
            distinct_contracts = n_distinct(id_contract_analysis), 
            .groups = "drop") |> 
  arrange(desc(amount))
```

```{r}
#| echo: false
tbl_anduril_agency_awards |> 
  munge_data() |> 
  gt()
```

#### Anduril Award Description Analysis

One of the biggest benefits of `fpds_atom` is that it gives access to procurement text descriptions.  Lets take a look at them for Anduril and see what 2 sets of words show up together and visualize them in a network graph.

```{r}
stop_words <- get_stopwords(source = "smart")

tidy_ngram <-
  tbl_anduril|> 
  select(id_contract_analysis,
         description_obligation,
         amount_obligation)|> 
  unnest_tokens(bigram,
                description_obligation,
                token = "ngrams",
                n = 2)
```

Next we can filter out stop words and build our bigram count table.

```{r}
bigram_counts <-
  tidy_ngram|> 
  separate(bigram, c("word1", "word2"), sep = " ")|> 
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word)|> 
  count(word1, word2, sort = TRUE)
```

Lets take a look

```{r}
bigram_counts |> 
  slice(1:12) |> 
  gt::gt()
```

Lets explore this visuall with all incidences over 5.

```{r}
bigram_counts|> 
  filter(!is.na(word1))|> 
  filter(n >= 5) |> 
  graph_from_data_frame()|> 
  ggraph(layout = "nicely") +
  geom_edge_link(
    aes(edge_alpha = n),
    show.legend = FALSE,
    arrow = arrow(length = unit(1.5, 'mm')),
    start_cap = circle(3, 'mm'),
    end_cap = circle(3, 'mm')
  ) +
  geom_node_text(aes(label = name)) +
  theme_graph() +
  labs(title = "Anduril FPDS Description Bigram (Minimum of 5 Bigram Counts)")
```


#### Anduril Trigram Award Totals

Next lets take a look at which 3 words show up together and how many contracts this and obligations they correspond to.

First we build the ngram table

```{r}
tbl_ngrams <-
  tbl_anduril|> 
  select(id_contract_analysis,
         description_obligation,
         amount_obligation)|> 
  unnest_tokens(ngram, description_obligation, token = "ngrams", n = 3)
```

Next we build the summary

```{r}
 tbl_ngrams <- 
  tbl_ngrams|> 
  group_by(ngram)|> 
  summarise(
    amount = sum(amount_obligation),
    count_contracts = n_distinct(id_contract_analysis)
  )|> 
  arrange(desc(amount))|>  
  munge_data()
```

Lets look at the top 15

```{r eval = F}
tbl_ngrams|>  
  slice(1:15) |>  
  gt()
```

```{r echo = FALSE}
tbl_ngrams|>  
  slice(1:15) |>  
  filter(!is.na(ngram)) |> 
  munge_data() |> 
  gt()
```

### Part IV: Analyzing Phase III SBIRs

This next section shows you how to take advantage of a special `fpds_csv` wrapper that acquires SBIR/STTR procurement.

#### Acquire All Phase III SBIR Procurements

```{r eval = T}
df_all <-
  fpds_research_csv(research_codes = c("SR3"),
                    snake_names = T)
```

Lets look at the data 

```{r}
#| tidy: true
skim(df_all)
```

#### Aggregate Summary Data

```{r eval = F}
df_all |>
  summarise(
    amount = sum(amount_obligation, na.rm = T),
    count_distinct_contracts = n_distinct(id_contract_analysis, na.rm = T),
    count_contract_actions = n(),
    count_distinct_vendors = n_distinct(id_unique_entity, na.rm = T),
    .groups = "drop"
  )
```

```{r eval = T, echo=FALSE}
df_all |>
  summarise(
    amount = sum(amount_obligation, na.rm = T),
    count_distinct_contracts = n_distinct(id_contract_analysis, na.rm = T),
    count_contract_actions = n(),
    count_distinct_vendors = n_distinct(id_unique_entity, na.rm = T),
    .groups = "drop"
  ) |>
  munge_data() |>
  gt::gt()
```

#### Which Departments Award SBIR Phase IIIs

Lets explore which departments have awarded FPDS Phase IIIs since the year 2000.

```{r}
tbl_by_dept <-
  df_all|> 
  group_by(
    year_budget,
    name_department_award
  ) |> 
  summarise(amount = sum(amount_obligation),
            .groups = "drop") |> 
  filter(amount > 0, year_budget >= 2000)|> 
  mutate(
    department_group = fct_lump(
      name_department_award,
      n = 5,
      w = amount,
      other_level = "ALL OTHER DEPARTMENTS"
    )
  )|> 
  group_by(year_budget, department_group) |> 
  summarise(amount = sum(amount),
            .groups = "drop") |> 
  mutate(department_group = fct_reorder(department_group, -amount))
```

Now we can build an interactive streamgraph with this information.

```{r eval = FALSE}
tbl_by_dept |>
  hchart("streamgraph",
         hcaes(year_budget, amount, group = department_group)) |>
  hc_yAxis(visible = T,
           startOnTick = FALSE,
           endOnTick = FALSE) |>
  hc_title(text = "SBIR Phase III Award $ by Department Group Since 2000") |> 
  hc_subtitle(text ="Top 5 Awarding Departments, Remainder Lumped.") |> 
  hc_caption(text = "Data from fpds via govtrackR")
```

```{r}
#| echo: false
#| eval: true
#| message: false
#| warning: false
#| results: hide

hc_stream_p3 <- tbl_by_dept |>
  mutate(department_group = as.character(department_group)) |> 
  asbtools::tbl_ordered_factor(columns = "department_group", weight = "amount", reverse = T) |> 
  asbviz::hc_xy(
    x = "year_budget",
    y = "amount",
    group = "department_group",
    type = "streamgraph",
    theme_name = "clean_unica",
    point_size = 0,
    point_width = 0,
    disable_y = T,
    relevel_group = T,
    override_x_text = list(text = ""),
    disable_x = F,
    invert_chart = F,
    color_palette = "viridis::viridis",
    title = "SBIR Phase III Award $ by Department Group Since 2000",
    subtitle = "Top 5 Awarding Departments, Remainder Lumped.",
    caption = "Data from fpds via govtrackR"
  )


```

```{r hc_stream_p3}
#| echo: false  
#| fig-align: center
#| fig-width: 12
#| fig-height: 12
hc_stream_p3
```

#### Phase IIIs by Agency

Lets explore this data by agency.

First lets see the raw breakdown by agency.

```{r}
tbl_agency_amt <-
  df_all |>
  filter(!is.na(name_agency_cgac_award)) |>
  group_by(name_agency_cgac_award) |>
  summarise(amount = sum(amount_obligation), .groups = "drop") |>
  arrange(desc(amount))
```

Lets Look at the top 15

```{r eval=FALSE}
tbl_agency_amt|> 
  slice(1:15)
```

```{r}
#| eval: true
#| echo: false
tbl_agency_amt|> 
  slice(1:15) |> 
  munge_data() |> 
  gt()
```

Next lets lump together low frequency agencies (any agency not in the top 15) and visualize the totals.

```{r}
tbl_agency_amt |>
  mutate(
    agency_group = fct_lump(
      name_agency_cgac_award,
      n = 15,
      w = amount,
      other_level = "ALL OTHER AGENCIES"
    )
  ) |>
  group_by(agency_group) |>
  summarise(amount = sum(amount), .groups = "drop") |>
  arrange(desc(amount)) |> 
  mutate(agency_group = fct_reorder(agency_group, amount))|> 
  ggplot(aes(x = agency_group, y = amount)) +
  geom_bar(
    stat = "identity",
    fill = "#f68060",
    alpha = .6,
    width = .4
  ) +
  coord_flip() +
  xlab("") +
  scale_y_continuous(labels = scales::dollar, n.breaks = 5) +
  geom_text(
    aes(label = paste("$", round(amount / 1000000, digits = 2), "M")),
    position = position_dodge(width = 3.9),

    check_overlap = T,
    size = 2.5,
  ) +
  ggtitle("Top SBIR Phase III Awarding Agencies") +
  labs(subtitle = "By Lumped Government-wide Accounting Agency [CGAC] - Top 15",
       x = "") +
  theme_ipsum() +
  theme(
    panel.grid.minor.y = element_blank(),
    panel.grid.major.y = element_blank(),
    legend.position = "none",
    text = element_text(size = 8,  family = "serif")
  ) 
```

#### Explore Every SBIR Awarding Office

Lets take a look at all the awarding offices of SBIR Phase III and put it into an interactive table.

```{r}
tbl_office <-
  df_all|> 
  filter(!is.na(name_office_award))|> 
  group_by(name_department_award,
           name_agency_cgac_award,
           name_office_award)|> 
  summarise(
    date_first = min(date_obligation, na.rm = T),
    date_recent = max(date_obligation, na.rm = T),
    distinct_vendors = n_distinct(id_unique_entity, na.rm = T),
    amount = sum(amount_obligation, na.rm = T),
    amount_mean = mean(amount_obligation),
    actions = n(),
    contracts = n_distinct(id_contract_analysis, na.rm = T),
    .groups = "drop"
  )|> 
  filter(amount > 0)|> 
  arrange(desc(amount))
```

```{r}
table_office <- tbl_office|> 
  reactable(
    filterable = F,
    resizable = T,
    searchable = T,
    showPageSizeOptions = T,
    pageSizeOptions = c(5, 10, 20),
    sortable = T,
    compact = T,
    columns = list(
      date_first = colDef(name = "First Award"),
      date_recent = colDef(name = "Recent Award"),
      name_department_award = colDef(
        name = "Department",
        sortable = T,
        filterable = T
      ),
      name_agency_cgac_award = colDef(
        name = "Agency",
        sortable = T,
        filterable = T
      ),
      name_office_award = colDef(
        name = "Office",
        sortable = T,
        filterable = T
      ),
      distinct_vendors = colDef(name = "Unique Vendors"),
      amount = colDef(
        name = "$ Total",
        format = colFormat(
          prefix = "$",
          separators = TRUE,
          digits = 0
        )
      ),
      amount_mean = colDef(
        name = "$ Mean",
        format = colFormat(
          prefix = "$",
          separators = TRUE,
          digits = 0
        )
      ),
      contracts = colDef(name = "Unique Contracts"),
      actions = colDef(name = "Contract Actions")
    )
  )
```

```{r}
#| echo: false
table_office
```

#### Top 20 SBIR Phase III Awardees

Who are the top 20 SBIR Phase III awardees by amount?

```{r}
tbl_vendors <-
  df_all|> 
  filter(!is.na(id_unique_entity))|> 
  group_by(id_unique_entity)|> 
  summarise(
    vendor = name_vendor[which.max(amount_obligation)],
    amount = sum(amount_obligation, na.rm = T),
    count_distinct_contracts = n_distinct(id_contract_analysis, na.rm = T),
    amount_mean = mean(amount_obligation),
    date_first = min(date_obligation, na.rm = T),
    date_recent = max(date_obligation, na.rm = T),
    count_distinct_agencies = n_distinct(id_cgac_award, na.rm = T),
    count_departments = n_distinct(id_department_award),
    count_offices = n_distinct(name_office_award, na.rm = T),
    count_contract_actions = n(),
    .groups = "drop"
  )|> 
  filter(amount > 0)|> 
  arrange(desc(amount))|> 
  mutate(
    year_recent = year(date_recent),
    year_first = year(date_first),
    .before = "date_first"
  )
```

Here are the top 20

```{r}
#| eval: false
tbl_vendors|> 
  slice(1:20)|> 
  munge_data()|> 
  gt()
```

```{r}
#| echo: false
tbl_vendors|> 
  slice(1:20)|> 
  munge_data()|> 
  gt()
```

Notice anything interesting?

What would some possible next steps be to better understand strange entity we discovered?

### Part V: The 2023 Defense Budget

Finally lets explore the 2023 Department of Defense Budget.

#### Acquire 2023 Budget Data

Here is how to acquire the data.

```{r}
#| tidy: true
df_budget <- dod_years_budgets(budget_years = 2023, snake_names = T)
```

```{r}
#| include: false

df_budget <- df_budget |> mutate(name_agency_cgac = case_when(
  slug_organization %in% c("SPACE FORCE", "AIR FORCE", "AF") ~ "DEPARTMENT OF THE AIR FORCE",
  slug_organization %in% c("NAVY") ~ "DEPARTMENT OF THE NAVY",
  slug_organization %in% c("ARMY") ~ "DEPARTMENT OF THE ARMY",
  slug_organization %in% c("DEFW") ~ "DEPARTMENT OF DEFENSE",
  TRUE ~ name_agency_cgac
))

```

Lets take a look at it.

```{r}
skim(df_budget)
```

#### Macro Exploration of the Budget

##### Breakdown by Agency

```{r eval = F}
df_budget|> 
  group_by(name_agency_cgac) |> 
  summarise(amount = sum(amount_item, na.rm = T),
            pct_total = amount / sum(amount),
            count_distinct_elements = n_distinct(code_program_element), .groups = "drop") |> 
  mutate(pct_total = amount / sum(amount)) |> 
  arrange(desc(amount))
```

```{r echo=FALSE}
df_budget |>
  group_by(name_agency_cgac) |>
  summarise(
    amount = sum(amount_item, na.rm = T),
    count_distinct_elements = n_distinct(code_program_element),
    .groups = "drop"
  ) |>
  mutate(pct_total = amount / sum(amount)) |> 
  arrange(desc(amount)) |>
  munge_data() |>
  gt::gt()
```

Now let's take a look at breakdown by Agency and Budget Group:

```{r eval = FALSE}
df_budget |>
  group_by(name_agency_cgac, name_dod_budget_group) |>
  summarise(
    amount = sum(amount_item, na.rm = T),
    count_distinct_elements = n_distinct(code_program_element),
    .groups = "drop"
  ) |>
  mutate(pct_total = amount / sum(amount)) |> 
  arrange(desc(amount))
```

```{r echo = FALSE}
df_budget |>
  group_by(name_agency_cgac, name_dod_budget_group) |>
  summarise(
    amount = sum(amount_item, na.rm = T),
    count_distinct_elements = n_distinct(code_program_element),
    .groups = "drop"
  ) |>
  mutate(pct_total = amount / sum(amount)) |> 
  arrange(desc(amount)) |>
  munge_data() |>
  gt::gt()

```

#### Which Items Are We Buying

Now lets take a look at what specific items we are looking to acquire.

```{r}
tbl_items <-
  df_budget|> 
  filter(amount_unit_cost > 0, count_item > 0)|> 
  select(
    count_item,
    amount_unit_cost,
    name_program_element_actual,
    name_agency_cgac,
    slug_organization_account
  )|> 
  distinct()
```

Lets build an interactive chart that does that.

```{r eval=F}
hchart(
  tbl_items,
  "scatter",
  hcaes(
    x = count_item,
    y = amount_unit_cost,
    group = name_agency_cgac,
    name = name_program_element_actual
  ),
  marker = list(radius = 3, symbol = 'circle'),
  regression = TRUE
) |>
  hc_title(text = "2023 Defense Budget Requested Items") |>
  hc_xAxis(title = list(text = "# Items(log10 transformed)"),
           type = "logarithmic") |>
  hc_yAxis(title = list(text = "$ Amount Per Unit (log 10 transformed)"),
           type = "logarithmic") |>
  hc_tooltip(
    table = TRUE,
    outside = TRUE,
    shared = TRUE,
    useHTML = TRUE,
    headerFormat = "<small>{point.key}</small><table>",
    pointFormat = str_c(
      "<tr><td style=\"color: {series.color}\">{series.name}: </td>",
      "<tr><td style=\"text-align: right\"><b>Items: {point.x:,.0f}</b></td></tr>",
      "<tr><td style=\"text-align: right\"><b>$ Per item: {point.y:,0.f}</b></td></tr>",
      "<td style=\"text-align: right\"></td>"
    ),
    style = list(fontSize = "0.7em")
  ) |>
  hc_colors(c("#d35400", "#2980b9", "#2ecc71", "black")) |>
  hc_add_dependency("plugins/highcharts-regression.js") |>
  hc_credits(text = "Data from 2023 Defense Budget Request via govrtrackr")
```

```{r}
#| echo: false
#| warning: false
#| message: false
#| results: hide

hc_items <- tbl_items |> asbviz::hc_xy(
  x = "count_item",
  y = "amount_unit_cost",
  group = "name_agency_cgac",
  name = "name_program_element_actual",
  transformations = c("log_x", "log_y", "mean_x", "mean_y"),
  theme_name = "clean_unica",
  color_palette = "pals::coolwarm",
  fits = c("lm"),
  title = "2023 Defense Budget Requested Items by Agency",
  credits = "Data from 2023 Defense Budget Request via govrtrackr",
  override_x_text = list(text = "# Items(log10 transformed)"),
  override_y_text = list(text = "$ Amount Per Unit (log 10 transformed)"),
  override_y_label = list(formatter = asbviz::hc_axis_dollar_format_millions())
)
```

```{r hc_items}
#| echo: false  
#| fig-align: center
#| fig-width: 12
#| fig-height: 12

hc_items
```

#### 2023 Budget Request

Lets see if we can explore the whole budget interactively.

To do that we will build a hierarchical sunburst visualization.

First lets get the data we need.

```{r, eval=FALSE}
treemap_columns <-
  c(
    "name_dod_budget_group",
    "name_agency_cgac",
    "slug_organization_account",
    "name_account_omb_clean",
    "name_budget_parent",
    "name_budget_activity",
    "name_program_element"
  )

tbl_treemap <- df_budget|> 
  count(
    !!!syms(treemap_columns),
    wt = amount_item,
    name = "amount",
    sort = T
  )|> 
  filter(amount > 0)
```

Now we can build the interactive sunburst chart.

```{r eval = F}
tm <-
  tbl_treemap|> 
  treemap(
    index = treemap_columns,
    vSize = "amount",
    draw = F,
    type = "index"
  )

tm_nest <- d3_nest(tm$tm[, c(treemap_columns,
                             "vSize",
                             "color")],
                   value_cols = c("vSize", "color"))

sun <- sund2b(
  tm_nest,
  colors = htmlwidgets::JS(# yes this is a little different, so please pay attention
    #  "function(d) {return d.color}" will not work
    "function(name, d){return d.color || '#ccc';}"),
  valueField = "vSize",
  elementId = "my-sunburst"
)
```

Lets See how it looks:

```{r}
#| eval: true
#| echo: false
#| message: false
#| results: hide

sun <- df_budget |> 
  asbtools::tbl_summarise(
    group_variables = c(
      "name_dod_budget_group",
      "name_agency_cgac",
      "slug_organization_account",
      "name_account_omb_clean",
      "name_budget_parent",
      "name_program_element"
    ), 
    amount_variables = "amount_item"
  ) |> 
  rename(amount = amount_item_total) |> 
  filter(amount > 0) |> 
  asbviz::d3_sunburst(
    index_columns =
      c(
        "name_dod_budget_group",
      "name_agency_cgac",
      "slug_organization_account",
      "name_account_omb_clean",
      "name_budget_parent",
      "name_program_element"
      ),
   size = "amount" ,
   plot_title = "2023 Defense Budget Request Sunburst Chart",
   color_palette = "viridis::magma",
    extra_css = T,
    font_size = 8,
    height = 800,
    content_css = "
   text-overflow: ellipsis;
   font-size: 10px;
   font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;
 font-kerning: auto;
 kerning: auto;
   ",
 width = 800,
 label_css = "
   font-size: 10px;
   text-justify: inter-word;
 text-overflow: clip;
 text-justify: none;
 white-space: pre;
 font-weight: 900;
   ",
 chart_advanced_css = "
 width: 100%;
 height: 100%;
 align-content: center;

 ",
 bread_tooltip = NULL,
 bread_html = NULL
  )

```


```{r}
#| echo: false
#| fig-align: center
#| fig-width: 10
#| fig-height: 12
sun
```

2025, SHELDON