Introduction to govtrackR and government data analysis.
Data Rights
Contract Number: N6833522C0500 Contractor Name: P.W. Communications, Inc. Contractor Address: 11200 Rockville Pike Suite 130 Rockville, MD 20852 Expiration of Data Rights Period: January 16, 2029
The Government’s rights to use, modify, reproduce, release, perform, display, or disclose technical data or computer software marked with this legend are restricted during the period shown as provided in paragraph (b)(4) of the Rights in Noncommercial Technical Data and Computer Software Small Business Innovation Research (SBIR) Program clause contained in the above identified contract. No restrictions apply after the expiration date shown above. Any reproduction of technical data, computer software, or portions thereof marked with this legend must also reproduce the markings.
What is the Breakdown Between Products and Services?
What is the breakdown between these two groups?
Code
gg_psc_bkd<-tbl_psc|>count(type_product_service, sort =T, name ="count")|>mutate(type_product_service =fct_reorder(type_product_service, count))|>ggplot(aes(x =type_product_service, y =count, fill =type_product_service))+geom_bar(stat ="identity", position ="dodge")+scale_fill_viridis(discrete =TRUE, name ="")+theme_ipsum()+ylab("Number of Product Service Codes")+xlab("")+ggtitle("Products Versus Services")+coord_flip()
Interesting. There are 739 total product codes and 1795 service codes.
Another way to think about this is that for every product there are 2.43. Is this a meaningful insight?
Frequency of PSC Additions
Accurately describing products and services is important for transparency, vendor discovery, understanding who inside of government does what and so much more. As economies and technologies change it is important that procurement dictonaries keep up with the times.
Lets try to understand this a bit more.
The first step in doing this is seeing the dates where the GSA updated the PSC dictionary.
Lets explore this changes interactively interactively
Code
hc_new_psc<-hchart(tbl_new_codes,"line",hcaes(x =date_start, y =count_added, group =type_product_service))|>hc_title(text ="New Product Service Codes by Date Added")|>hc_yAxis(title =list(text ="# Product Services Codes Added"))|>hc_xAxis(title =list(text ="Date Added"))|>hc_credits(title =list(text ="Data from GSA/acquisition.gov via govtrackR"))hc_add_theme(hc_theme_hcrt())
Part II: FPDS CSV
This section explores fpds_csv which provides real time access into the FPDS csv interface.
gg_area_pyro<-tbl_by_day|>ggplot(aes(date_obligation, amount_cumulative))+geom_area(fill ="#22908C", alpha =.5)+scale_fill_viridis(discrete =TRUE)+scale_y_continuous(labels =scales::dollar)+scale_x_date()+theme(legend.position ="none")+theme_ipsum()+labs(title ="Cumulative Federal Procurement on Visible and Invisible Light Communication Equipments", x ="Date", y ="Cumulative Procurement Spend")
INTERNATIONAL BOUNDARY AND WATER COMMISSION: U.S.-MEXICO
$5,574
AGENCY FOR INTERNATIONAL DEVELOPMENT
$4,720
Now we can visualize it.
Code
tbl_depts|>mutate( name_department_award =fct_reorder(name_department_award, amount), amount_millions =amount/1000000)|>ggplot(aes(x =name_department_award, y =amount_millions))+geom_segment(aes( x =name_department_award , xend =name_department_award, y =0, yend =amount_millions), color ="grey")+geom_point(size =3, color ="#69b3a2")+coord_flip()+theme_ipsum()+theme( legend.position ="none", panel.grid.minor.y =element_blank(), panel.grid.major.y =element_blank(), text =element_text(size =7))+xlab("")+scale_y_log10(labels =scales::dollar, n.breaks =10)+labs(title ="Which Departments Purchased the Visible and Invisible Light Communication Equipment?", x ="", y ="Procurement $ in milions (log10 transformed)")
Procurement Seasonality?
Often products and services tend to be procured in September at the end of the budget year, does this product exhibit that trend?
Lets build a table that gives us the inputs
Code
govt_months<-c("Oct","Nov","Dec","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep")tbl_monthly_pyro<-tbl_crypto|>mutate(month_obligation =lubridate::month(date_obligation, label =T))|>count(year_fiscal_obligation,month_obligation, wt =amount_obligation, name ="amount")## Set the factor levels to budget monthstbl_monthly_pyro<-tbl_monthly_pyro|>mutate(month_obligation =factor(month_obligation, levels =govt_months, ordered =T))
Lets turn this into an interactive heatmap.
Code
fntltp<-JS("function(){ return this.point.x + ' ' + this.series.yAxis.categories[this.point.y] + ': ' + Highcharts.numberFormat(this.point.value, 2);}")hc_pyro_hm<-hchart(tbl_monthly_pyro,"heatmap",hcaes(x =year_fiscal_obligation, y =month_obligation, value =amount))|>hc_colorAxis( stops =color_stops(20, colors =scales::viridis_pal(option ="B")(20)),# fuerza a utilzar mejor el espectro de colores para que HJS no amplie el# eje para tener numero "redondos startOnTick =FALSE, endOnTick =FALSE, reversed =T)|>hc_yAxis( title =list(text =""), reversed =TRUE, offset =-20, tickLength =0, gridLineWidth =0, minorGridLineWidth =0, labels =list(style =list(fontSize ="9px")))|>hc_tooltip(formatter =fntltp)|>hc_title(text ="Spend by Month and Fiscal Year")|>hc_legend( layout ="horizontal", verticalAlign ="top", align ="left", valueDecimals =0)|>hc_add_theme(hc_theme_darkunica())
Deepive: Top Visible and Invisible Light Communication Equipment Vendors
Next lets take a look at the companies that supply this product.
First thing we want to do is build a summary table that lets us explore the data.
One of the biggest benefits of fpds_atom is that it gives access to procurement text descriptions. Lets take a look at them for Anduril and see what 2 sets of words show up together and visualize them in a network graph.
Code
stop_words<-get_stopwords(source ="smart")tidy_ngram<-tbl_anduril|>select(id_contract_analysis,description_obligation,amount_obligation)|>unnest_tokens(bigram,description_obligation, token ="ngrams", n =2)
Next we can filter out stop words and build our bigram count table.
SR3: $26,021,110,224 procured between 1992-07-29 and 2023-09-11 across 66,268 actions amongst 16,026 contracts allocated to 2,562 distinct UEI and 2,561 distinct parent UEI
Lets explore which departments have awarded FPDS Phase IIIs since the year 2000.
Code
tbl_by_dept<-df_all|>group_by(year_budget,name_department_award)|>summarise(amount =sum(amount_obligation), .groups ="drop")|>filter(amount>0, year_budget>=2000)|>mutate( department_group =fct_lump(name_department_award, n =5, w =amount, other_level ="ALL OTHER DEPARTMENTS"))|>group_by(year_budget, department_group)|>summarise(amount =sum(amount), .groups ="drop")|>mutate(department_group =fct_reorder(department_group, -amount))
Now we can build an interactive streamgraph with this information.
Code
tbl_by_dept|>hchart("streamgraph",hcaes(year_budget, amount, group =department_group))|>hc_yAxis(visible =T, startOnTick =FALSE, endOnTick =FALSE)|>hc_title(text ="SBIR Phase III Award $ by Department Group Since 2000")|>hc_subtitle(text ="Top 5 Awarding Departments, Remainder Lumped.")|>hc_caption(text ="Data from fpds via govtrackR")
hchart(tbl_items,"scatter",hcaes( x =count_item, y =amount_unit_cost, group =name_agency_cgac, name =name_program_element_actual), marker =list(radius =3, symbol ='circle'), regression =TRUE)|>hc_title(text ="2023 Defense Budget Requested Items")|>hc_xAxis(title =list(text ="# Items(log10 transformed)"), type ="logarithmic")|>hc_yAxis(title =list(text ="$ Amount Per Unit (log 10 transformed)"), type ="logarithmic")|>hc_tooltip( table =TRUE, outside =TRUE, shared =TRUE, useHTML =TRUE, headerFormat ="<small>{point.key}</small><table>", pointFormat =str_c("<tr><td style=\"color: {series.color}\">{series.name}: </td>","<tr><td style=\"text-align: right\"><b>Items: {point.x:,.0f}</b></td></tr>","<tr><td style=\"text-align: right\"><b>$ Per item: {point.y:,0.f}</b></td></tr>","<td style=\"text-align: right\"></td>"), style =list(fontSize ="0.7em"))|>hc_colors(c("#d35400", "#2980b9", "#2ecc71", "black"))|>hc_add_dependency("plugins/highcharts-regression.js")|>hc_credits(text ="Data from 2023 Defense Budget Request via govrtrackr")
2023 Budget Request
Lets see if we can explore the whole budget interactively.
To do that we will build a hierarchical sunburst visualization.
First lets get the data we need.
Code
treemap_columns<-c("name_dod_budget_group","name_agency_cgac","slug_organization_account","name_account_omb_clean","name_budget_parent","name_budget_activity","name_program_element")tbl_treemap<-df_budget|>count(!!!syms(treemap_columns), wt =amount_item, name ="amount", sort =T)|>filter(amount>0)
Now we can build the interactive sunburst chart.
Code
tm<-tbl_treemap|>treemap( index =treemap_columns, vSize ="amount", draw =F, type ="index")tm_nest<-d3_nest(tm$tm[, c(treemap_columns,"vSize","color")], value_cols =c("vSize", "color"))sun<-sund2b(tm_nest, colors =htmlwidgets::JS(# yes this is a little different, so please pay attention# "function(d) {return d.color}" will not work"function(name, d){return d.color || '#ccc';}"), valueField ="vSize", elementId ="my-sunburst")
Lets See how it looks:
2023 Defense Budget Request Sunburst Chart
Source Code
---title: "govtrackR"author: SHELDONdate: "`r Sys.Date()`"editor: sourcetoc: truetoc-depth: 5code-copy: truecode-tools: truecode-fold: showfig-align: centercategories: [govtrackR, data analysis]search: falsepage-layout: fulltitle-block-banner: falseself-contained: falseexecute: echo: false eval: true warning: false message: falseformat: html: categories: [govtrackR, data analysis] subtitle: "Using govtrackR" abstract: "Introduction to govtrackR and government data analysis." abstract-title: "govtrackR" anchor-sections: true smooth-scroll: true---### Data Rights```{r}#| label: sheldon-data-rightshtmltools::HTML(sheldon::sbir_data_rights())``````{r setup, include=FALSE}knitr::opts_chunk$set(echo =TRUE,eval = T,results ='asis',fig.retina =4,warning = F,message = F,fig.width =12,fig.height =8)```### Load Packages```{r load_libs}library(lubridate)library(tidyverse)library(tidytext)library(govtrackR)library(scales)library(ggtext)library(gganimate)library(hrbrthemes)library(rtemis)library(viridis)library(ggrepel)library(highcharter)library(tidylo)library(widyr)library(gt)library(tidygraph)library(ggraph)library(igraph)library(d3r)library(treemap)library(sunburstR)library(reactable)library(skimr)library(glue)options(highcharter.theme =hc_theme_hcrt(tooltip =list(valueDecimals =2)))```### Part I: Product Service Codes [PSC]Lets explore PSC's to help us better understand the products and services procured by the government.```{r pull_psc}tbl_psc <-dictionary_psc_active(only_active = T, snake_names = T)```#### Table to Explore All Active PSCsHere is an interactive table that lets us engage with the active PSCs.```{r}psc_tbl <- tbl_psc |>select( is_active_psc, type_product_service, name_solicitation_group, code_product_service, name_product_service, date_start, date_end, details_product_service_includes, details_product_service_notes ) |>reactable(filterable = T,resizable = T,searchable = T,showPageSizeOptions = T,defaultPageSize =4,defaultColDef =colDef(align ="center",vAlign ="center",minWidth =175 ),pageSizeOptions =c(5, 10, 20),sortable = T,compact = T,columns =list(name_product_service =colDef(minWidth =300),details_product_service_includes =colDef(minWidth =500),details_product_service_notes =colDef(minWidth =500) ) )```Here it is```{r psc_table}#| echo: falsepsc_tbl```#### What is the Breakdown Between Products and Services?What is the breakdown between these two groups?```{r psc_bkdown}gg_psc_bkd <- tbl_psc |>count(type_product_service, sort = T, name ="count") |>mutate(type_product_service =fct_reorder(type_product_service, count)) |>ggplot(aes(x = type_product_service, y = count, fill = type_product_service)) +geom_bar(stat ="identity", position ="dodge") +scale_fill_viridis(discrete =TRUE, name ="") +theme_ipsum() +ylab("Number of Product Service Codes") +xlab("") +ggtitle("Products Versus Services") +coord_flip()``````{r munge_psc_count}#| echo: falsetbl_count_psc <- tbl_psc |>count(type_product_service, sort = T, name ="count")prod <- tbl_count_psc |>filter(type_product_service =="PRODUCT") |>pull(count)service <- tbl_count_psc |>filter(type_product_service !="PRODUCT") |>pull(count)service_ratio <-round(service/prod, 2)``````{r show_psc_bk}#| echo: falsegg_psc_bkd```Interesting. There are `r prod` total product codes and `r service` service codes.Another way to think about this is that for every product there are `r service_ratio`. Is this a meaningful insight?#### Frequency of PSC AdditionsAccurately describing products and services is important for transparency, vendor discovery, understanding who inside of government does what and so much more. As economies and technologies change it is important that procurement dictonaries keep up with the times.Lets try to understand this a bit more.The first step in doing this is seeing the dates where the GSA updated the PSC dictionary.```{r new_psc_dates}tbl_new_codes <- tbl_psc|>group_by(type_product_service, date_start)|>summarise(count_added =n(), .groups ="drop")``````{r}tbl_new_codes |>reactable(filterable = T,resizable = T,searchable = T,showPageSizeOptions = T,defaultPageSize =4,pageSizeOptions =c(5, 10, 20),sortable = T,compact = T )```Lets explore this changes interactively interactively```{r hc_new_psc, eval=FALSE}hc_new_psc <-hchart(tbl_new_codes,"line",hcaes(x = date_start, y = count_added, group = type_product_service))|>hc_title(text ="New Product Service Codes by Date Added")|>hc_yAxis(title =list(text ="# Product Services Codes Added"))|>hc_xAxis(title =list(text ="Date Added")) |>hc_credits(title =list(text ="Data from GSA/acquisition.gov via govtrackR"))hc_add_theme(hc_theme_hcrt())``````{r}#| echo: false#| warning: false#| message: false#| results: hidehc_new_codes <- tbl_new_codes |> asbviz::hc_xy(type ="spline",x ="date_start",y ="count_added",group ="type_product_service",use_navigator =TRUE,override_x_text =list(text ="Date Added"),override_y_text =list(text ="# Product Services Codes Added"),override_y_label =list(formatter =JS("function(){return '' + Highcharts.numberFormat(this.value,0, '.', ',');}" ) ),credits ="Data from GSA/acquisition.gov via govtrackR",theme_name ="better_unica",color_palette ="pals::coolwarm",title ="New Product Service Codes by Date Added",use_range_selector = T)``````{r hc_new_codes}#| echo: false #| fig-align: center#| fig-width: 12#| fig-height: 12hc_new_codes```### Part II: FPDS CSVThis section explores `fpds_csv` which provides real time access into the <a href="https://www.fpds.gov/ezsearch/search.do?indexName=awardfull&templateName=1.5.1&s=FPDS.GOV&q=Google-like+search+to+help+you+find+federal+contracts..." target="_blank">FPDS</a> csv interface.```{r eval = T}#| tidy: true#| sanitize: trueargs(fpds_csv)```#### Acquire All Procurement of Visible and Invisible Light Communication EquipmentLets take advantage of the function to acquire all these historic product procurement.```{r fpds_crypto}tbl_crypto <-fpds_csv(product_or_service_code ="5850", snake_names = T)``````{r fpds_fix}#| include: falsetbl_crypto <- tbl_crypto |>mutate(name_department_award =coalesce(name_department_award, "UNKNOWN AWARDING DEPARTMENT"),name_agency_award =coalesce(name_agency_award, "UNKNOWN AWARDING AGENCY"),name_agency_cgac_award =coalesce(name_agency_cgac_award, "UNKNOWN AWARDING AGENCY"))total_spend <- tbl_crypto |>pull(amount_obligation) |>sum() |> formattable::comma()```#### What is the Total Spend?How much has been spent on this product?```{r}#| eval: falsetbl_crypto |>summarise(amount_obligation =sum(amount_obligation))``````{r}#| echo: falsetbl_crypto |>summarise(amount_obligation =sum(amount_obligation)) |>munge_data() |> gt::gt()```#### Spend Over Time?```{r pyro_tbl}tbl_by_day <- tbl_crypto|>group_by(date_obligation)|>summarise(amount =sum(amount_obligation), .groups ="drop")```Lets take a look at the data```{r}tbl_by_day|>sample_n(3)|>munge_data()|>gt()```#### Explore Total Outlays by Day```{r eval = F}tbl_by_day|>hchart("line",hcaes(x = date_obligation, y = amount))|>hc_title(text ="Visible and Invisible Light Communication Equipment Procurement Total Daily Spend")|>hc_yAxis(title =list(text ="Amount Obligated"))|>hc_xAxis(title =list(text ="Date Added"))|>hc_add_theme(hc_theme_elementary())``````{r}#| echo: false#| warning: false#| message: false#| results: hidehc_day_line <- tbl_by_day |> asbviz::hc_xy(type ="line",x ="date_obligation",y ="amount",override_x_text =list(text ="Date"),override_y_text =list(text ="Total Procurement $"),override_y_label =list(formatter =JS("function(){return '$' + Highcharts.numberFormat(this.value, 0, '.', ',');}" ) ),use_navigator =TRUE,credits ="Data from fpds via govtrackR",theme_name ="clean_unica",color_palette ="pals::coolwarm",point_size =1,disable_legend =TRUE,title ="Visible and Invisible Light Communication Equipment Procurement Total Daily Spend" )``````{r hc_day_line}#| echo: false #| fig-align: center#| fig-width: 12#| fig-height: 12hc_day_line```#### Cumulative Spend?Lets add a cumulative total```{r}tbl_by_day <- tbl_by_day |>mutate(amount_cumulative =cumsum(amount))```Now lets explore it statically```{r}gg_area_pyro <- tbl_by_day |>ggplot(aes(date_obligation, amount_cumulative)) +geom_area(fill ="#22908C", alpha = .5) +scale_fill_viridis(discrete =TRUE) +scale_y_continuous(labels = scales::dollar) +scale_x_date() +theme(legend.position ="none") +theme_ipsum() +labs(title ="Cumulative Federal Procurement on Visible and Invisible Light Communication Equipments", x ="Date", y ="Cumulative Procurement Spend")``````{r}#| echo: falsegg_area_pyro```#### Spend by DepartmentLets take a look at spend by department.```{r}tbl_depts <- tbl_crypto |>group_by(name_department_award) |>summarise(amount =sum(amount_obligation),.groups ="drop") |>arrange(desc(amount))``````{r echo=FALSE}tbl_depts|>munge_data()|>gt()```Now we can visualize it.```{r}tbl_depts |>mutate(name_department_award =fct_reorder(name_department_award, amount),amount_millions = amount /1000000 ) |>ggplot(aes(x = name_department_award, y = amount_millions)) +geom_segment(aes(x = name_department_award ,xend = name_department_award,y =0,yend = amount_millions ),color ="grey" ) +geom_point(size =3, color ="#69b3a2") +coord_flip() +theme_ipsum() +theme(legend.position ="none",panel.grid.minor.y =element_blank(),panel.grid.major.y =element_blank(),text =element_text(size =7) ) +xlab("") +scale_y_log10(labels = scales::dollar, n.breaks =10) +labs(title ="Which Departments Purchased the Visible and Invisible Light Communication Equipment?",x ="",y ="Procurement $ in milions (log10 transformed)")```#### Procurement Seasonality?Often products and services tend to be procured in September at the end of the budget year, does this product exhibit that trend?Lets build a table that gives us the inputs```{r}govt_months <-c("Oct","Nov","Dec","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep" )tbl_monthly_pyro <- tbl_crypto|>mutate(month_obligation = lubridate::month(date_obligation, label = T))|>count(year_fiscal_obligation, month_obligation,wt = amount_obligation,name ="amount")## Set the factor levels to budget monthstbl_monthly_pyro <- tbl_monthly_pyro|>mutate(month_obligation =factor( month_obligation,levels = govt_months,ordered = T ))```Lets turn this into an interactive heatmap.```{r eval = FALSE}fntltp <-JS("function(){ return this.point.x + ' ' + this.series.yAxis.categories[this.point.y] + ': ' + Highcharts.numberFormat(this.point.value, 2);}")hc_pyro_hm <-hchart( tbl_monthly_pyro,"heatmap",hcaes(x = year_fiscal_obligation,y = month_obligation,value = amount) )|>hc_colorAxis(stops =color_stops(20, colors = scales::viridis_pal(option ="B")(20)),# fuerza a utilzar mejor el espectro de colores para que HJS no amplie el# eje para tener numero "redondosstartOnTick =FALSE,endOnTick =FALSE,reversed = T )|>hc_yAxis(title =list(text =""),reversed =TRUE,offset =-20,tickLength =0,gridLineWidth =0,minorGridLineWidth =0,labels =list(style =list(fontSize ="9px")) )|>hc_tooltip(formatter = fntltp)|>hc_title(text ="Spend by Month and Fiscal Year")|>hc_legend(layout ="horizontal",verticalAlign ="top",align ="left",valueDecimals =0 )|>hc_add_theme(hc_theme_darkunica())``````{r}#| echo: false#| message: false#| eval: true#| results: hidehc_pyro_hm_final <- tbl_monthly_pyro |> asbviz::hc_xy(x ="year_fiscal_obligation",y ="month_obligation",group ="amount",type ="heatmap",theme_name ="darkunica",override_x_text =list(text =""),override_y_text =list(text =""),title ="Visible and Invisible Light Communication Equipment Procurement<br>Spend by Month and Budget Year" )``````{r}#| echo: false#| fig-align: center#| fig-width: 12#| fig-height: 12hc_pyro_hm_final```#### Deepive: Top Visible and Invisible Light Communication Equipment VendorsNext lets take a look at the companies that supply this product.First thing we want to do is build a summary table that lets us explore the data.```{r}tbl_vendors <- tbl_crypto |>group_by(id_unique_entity) |>summarise(name_vendor = name_vendor[which.max(amount_obligation)],date_first_award =min(date_obligation, na.rm = T),date_recent_award =max(date_obligation, na.rm = T),count_actions =n(),count_contracts =n_distinct(id_contract_analysis, na.rm = T),amount_contracts =sum(amount_obligation),count_departments =n_distinct(name_department_award, na.rm = T),count_agencies =n_distinct(name_agency_cgac_award, na.rm = T) ) |>arrange(desc(amount_contracts))```Who are the top 10 vendors?```{r eval = F}tbl_vendors|>slice(1:10) ``````{r}#| echo: false#| eval: truetbl_vendors|>slice(1:10) |>munge_data() |>gt()```Lets filter our data to include only vendors with over $500,000 in obligations.```{r}tbl_vendors <- tbl_vendors|>filter(amount_contracts >500000)```Now lets take a look at the top vendors visually.```{r}gg_top_vendors_pyro <- tbl_vendors |>filter(amount_contracts >0) |>mutate(name_vendor_lumped = name_vendor |>fct_lump(10, w = amount_contracts, other_level ="ALL OTHER VENDORS") ) |>count(name_vendor_lumped, wt = amount_contracts, name ="amount") |>mutate(name_vendor_lumped =fct_reorder(name_vendor_lumped, amount)) |>ggplot(aes(x = name_vendor_lumped, y = amount)) +geom_bar(stat ="identity", fill ="#B71212") +coord_flip() +theme_ipsum() +theme(panel.grid.minor.y =element_blank(),panel.grid.major.y =element_blank(),legend.position ="none", ) +xlab("") +labs(title ="Top Visible and Invisible Light Communication Equipment Vendors") +scale_y_continuous(labels = scales::dollar, n.breaks =10)``````{r}#| echo: falsegg_top_vendors_pyro```### Part III: FPDS AtomNext we will explore FPDS atom, this provides the full interface into all the data contained in FPDS going back to 1978.This provides access to significantly more data but is extremely compute heavy and exponentially slower than `fpds_csv````{r}#| tidy: true#| sanitize: trueargs(fpds_atom)```#### Anduril ProcurementLets explore a vendor. To do this you can be specific and use a known Unique Entity Identifier [UEI] or input vendor name text.```{r}tbl_anduril <-fpds_atom(vendor_name ="ANDURIL",parse_contracts = T,snake_names = T )```Lets explore the data```{r}#| tidy: trueskim(tbl_anduril)```#### What Can We See from FPDS Atom?Here is a sample of the sorts of information that you can explore from FPDS Atom```{r}#| label: obj_sample#| echo: false#| fig-width: 10#| fig-height: 8out <- tbl_anduril |>filter(amount_obligation ==max(amount_obligation)) |> janitor::remove_empty("cols") |> jsonview::json_tree_view(scroll = T)widgetframe::frameWidget(out)```#### Entity ResolutionLets make sure we only have data from the Anduril we are looking for, Anduril Industries.```{r}tbl_count <- tbl_anduril |>group_by(name_vendor, id_unique_entity) |>summarise(count_actions =n(),distinct_contracts =n_distinct(id_contract_analysis),.groups ="drop" )``````{r}#| echo: falsegt(tbl_count)```Looks like there is another vendor that isn't who we are interested in. Lets exclude them.```{r}tbl_anduril <- tbl_anduril|>filter(id_unique_entity !="MMLFZGBVYMB8")```#### Anduril Agency Funding```{r}tbl_anduril_agency_awards <- tbl_anduril |>group_by(name_department_award, name_agency_award) |>summarise(amount =sum(amount_obligation, na.rm = T),distinct_contracts =n_distinct(id_contract_analysis), .groups ="drop") |>arrange(desc(amount))``````{r}#| echo: falsetbl_anduril_agency_awards |>munge_data() |>gt()```#### Anduril Award Description AnalysisOne of the biggest benefits of `fpds_atom` is that it gives access to procurement text descriptions. Lets take a look at them for Anduril and see what 2 sets of words show up together and visualize them in a network graph.```{r}stop_words <-get_stopwords(source ="smart")tidy_ngram <- tbl_anduril|>select(id_contract_analysis, description_obligation, amount_obligation)|>unnest_tokens(bigram, description_obligation,token ="ngrams",n =2)```Next we can filter out stop words and build our bigram count table.```{r}bigram_counts <- tidy_ngram|>separate(bigram, c("word1", "word2"), sep =" ")|>filter(!word1 %in% stop_words$word,!word2 %in% stop_words$word)|>count(word1, word2, sort =TRUE)```Lets take a look```{r}bigram_counts |>slice(1:12) |> gt::gt()```Lets explore this visuall with all incidences over 5.```{r}bigram_counts|>filter(!is.na(word1))|>filter(n >=5) |>graph_from_data_frame()|>ggraph(layout ="nicely") +geom_edge_link(aes(edge_alpha = n),show.legend =FALSE,arrow =arrow(length =unit(1.5, 'mm')),start_cap =circle(3, 'mm'),end_cap =circle(3, 'mm') ) +geom_node_text(aes(label = name)) +theme_graph() +labs(title ="Anduril FPDS Description Bigram (Minimum of 5 Bigram Counts)")```#### Anduril Trigram Award TotalsNext lets take a look at which 3 words show up together and how many contracts this and obligations they correspond to.First we build the ngram table```{r}tbl_ngrams <- tbl_anduril|>select(id_contract_analysis, description_obligation, amount_obligation)|>unnest_tokens(ngram, description_obligation, token ="ngrams", n =3)```Next we build the summary```{r} tbl_ngrams <- tbl_ngrams|>group_by(ngram)|>summarise(amount =sum(amount_obligation),count_contracts =n_distinct(id_contract_analysis) )|>arrange(desc(amount))|>munge_data()```Lets look at the top 15```{r eval = F}tbl_ngrams|>slice(1:15) |>gt()``````{r echo = FALSE}tbl_ngrams|>slice(1:15) |>filter(!is.na(ngram)) |>munge_data() |>gt()```### Part IV: Analyzing Phase III SBIRsThis next section shows you how to take advantage of a special `fpds_csv` wrapper that acquires SBIR/STTR procurement.#### Acquire All Phase III SBIR Procurements```{r eval = T}df_all <-fpds_research_csv(research_codes =c("SR3"),snake_names = T)```Lets look at the data ```{r}#| tidy: trueskim(df_all)```#### Aggregate Summary Data```{r eval = F}df_all |>summarise(amount =sum(amount_obligation, na.rm = T),count_distinct_contracts =n_distinct(id_contract_analysis, na.rm = T),count_contract_actions =n(),count_distinct_vendors =n_distinct(id_unique_entity, na.rm = T),.groups ="drop" )``````{r eval = T, echo=FALSE}df_all |>summarise(amount =sum(amount_obligation, na.rm = T),count_distinct_contracts =n_distinct(id_contract_analysis, na.rm = T),count_contract_actions =n(),count_distinct_vendors =n_distinct(id_unique_entity, na.rm = T),.groups ="drop" ) |>munge_data() |> gt::gt()```#### Which Departments Award SBIR Phase IIIsLets explore which departments have awarded FPDS Phase IIIs since the year 2000.```{r}tbl_by_dept <- df_all|>group_by( year_budget, name_department_award ) |>summarise(amount =sum(amount_obligation),.groups ="drop") |>filter(amount >0, year_budget >=2000)|>mutate(department_group =fct_lump( name_department_award,n =5,w = amount,other_level ="ALL OTHER DEPARTMENTS" ) )|>group_by(year_budget, department_group) |>summarise(amount =sum(amount),.groups ="drop") |>mutate(department_group =fct_reorder(department_group, -amount))```Now we can build an interactive streamgraph with this information.```{r eval = FALSE}tbl_by_dept |>hchart("streamgraph",hcaes(year_budget, amount, group = department_group)) |>hc_yAxis(visible = T,startOnTick =FALSE,endOnTick =FALSE) |>hc_title(text ="SBIR Phase III Award $ by Department Group Since 2000") |>hc_subtitle(text ="Top 5 Awarding Departments, Remainder Lumped.") |>hc_caption(text ="Data from fpds via govtrackR")``````{r}#| echo: false#| eval: true#| message: false#| warning: false#| results: hidehc_stream_p3 <- tbl_by_dept |>mutate(department_group =as.character(department_group)) |> asbtools::tbl_ordered_factor(columns ="department_group", weight ="amount", reverse = T) |> asbviz::hc_xy(x ="year_budget",y ="amount",group ="department_group",type ="streamgraph",theme_name ="clean_unica",point_size =0,point_width =0,disable_y = T,relevel_group = T,override_x_text =list(text =""),disable_x = F,invert_chart = F,color_palette ="viridis::viridis",title ="SBIR Phase III Award $ by Department Group Since 2000",subtitle ="Top 5 Awarding Departments, Remainder Lumped.",caption ="Data from fpds via govtrackR" )``````{r hc_stream_p3}#| echo: false #| fig-align: center#| fig-width: 12#| fig-height: 12hc_stream_p3```#### Phase IIIs by AgencyLets explore this data by agency.First lets see the raw breakdown by agency.```{r}tbl_agency_amt <- df_all |>filter(!is.na(name_agency_cgac_award)) |>group_by(name_agency_cgac_award) |>summarise(amount =sum(amount_obligation), .groups ="drop") |>arrange(desc(amount))```Lets Look at the top 15```{r eval=FALSE}tbl_agency_amt|>slice(1:15)``````{r}#| eval: true#| echo: falsetbl_agency_amt|>slice(1:15) |>munge_data() |>gt()```Next lets lump together low frequency agencies (any agency not in the top 15) and visualize the totals.```{r}tbl_agency_amt |>mutate(agency_group =fct_lump( name_agency_cgac_award,n =15,w = amount,other_level ="ALL OTHER AGENCIES" ) ) |>group_by(agency_group) |>summarise(amount =sum(amount), .groups ="drop") |>arrange(desc(amount)) |>mutate(agency_group =fct_reorder(agency_group, amount))|>ggplot(aes(x = agency_group, y = amount)) +geom_bar(stat ="identity",fill ="#f68060",alpha = .6,width = .4 ) +coord_flip() +xlab("") +scale_y_continuous(labels = scales::dollar, n.breaks =5) +geom_text(aes(label =paste("$", round(amount /1000000, digits =2), "M")),position =position_dodge(width =3.9),check_overlap = T,size =2.5, ) +ggtitle("Top SBIR Phase III Awarding Agencies") +labs(subtitle ="By Lumped Government-wide Accounting Agency [CGAC] - Top 15",x ="") +theme_ipsum() +theme(panel.grid.minor.y =element_blank(),panel.grid.major.y =element_blank(),legend.position ="none",text =element_text(size =8, family ="serif") ) ```#### Explore Every SBIR Awarding OfficeLets take a look at all the awarding offices of SBIR Phase III and put it into an interactive table.```{r}tbl_office <- df_all|>filter(!is.na(name_office_award))|>group_by(name_department_award, name_agency_cgac_award, name_office_award)|>summarise(date_first =min(date_obligation, na.rm = T),date_recent =max(date_obligation, na.rm = T),distinct_vendors =n_distinct(id_unique_entity, na.rm = T),amount =sum(amount_obligation, na.rm = T),amount_mean =mean(amount_obligation),actions =n(),contracts =n_distinct(id_contract_analysis, na.rm = T),.groups ="drop" )|>filter(amount >0)|>arrange(desc(amount))``````{r}table_office <- tbl_office|>reactable(filterable = F,resizable = T,searchable = T,showPageSizeOptions = T,pageSizeOptions =c(5, 10, 20),sortable = T,compact = T,columns =list(date_first =colDef(name ="First Award"),date_recent =colDef(name ="Recent Award"),name_department_award =colDef(name ="Department",sortable = T,filterable = T ),name_agency_cgac_award =colDef(name ="Agency",sortable = T,filterable = T ),name_office_award =colDef(name ="Office",sortable = T,filterable = T ),distinct_vendors =colDef(name ="Unique Vendors"),amount =colDef(name ="$ Total",format =colFormat(prefix ="$",separators =TRUE,digits =0 ) ),amount_mean =colDef(name ="$ Mean",format =colFormat(prefix ="$",separators =TRUE,digits =0 ) ),contracts =colDef(name ="Unique Contracts"),actions =colDef(name ="Contract Actions") ) )``````{r}#| echo: falsetable_office```#### Top 20 SBIR Phase III AwardeesWho are the top 20 SBIR Phase III awardees by amount?```{r}tbl_vendors <- df_all|>filter(!is.na(id_unique_entity))|>group_by(id_unique_entity)|>summarise(vendor = name_vendor[which.max(amount_obligation)],amount =sum(amount_obligation, na.rm = T),count_distinct_contracts =n_distinct(id_contract_analysis, na.rm = T),amount_mean =mean(amount_obligation),date_first =min(date_obligation, na.rm = T),date_recent =max(date_obligation, na.rm = T),count_distinct_agencies =n_distinct(id_cgac_award, na.rm = T),count_departments =n_distinct(id_department_award),count_offices =n_distinct(name_office_award, na.rm = T),count_contract_actions =n(),.groups ="drop" )|>filter(amount >0)|>arrange(desc(amount))|>mutate(year_recent =year(date_recent),year_first =year(date_first),.before ="date_first" )```Here are the top 20```{r}#| eval: falsetbl_vendors|>slice(1:20)|>munge_data()|>gt()``````{r}#| echo: falsetbl_vendors|>slice(1:20)|>munge_data()|>gt()```Notice anything interesting?What would some possible next steps be to better understand strange entity we discovered?### Part V: The 2023 Defense BudgetFinally lets explore the 2023 Department of Defense Budget.#### Acquire 2023 Budget DataHere is how to acquire the data.```{r}#| tidy: truedf_budget <-dod_years_budgets(budget_years =2023, snake_names = T)``````{r}#| include: falsedf_budget <- df_budget |>mutate(name_agency_cgac =case_when( slug_organization %in%c("SPACE FORCE", "AIR FORCE", "AF") ~"DEPARTMENT OF THE AIR FORCE", slug_organization %in%c("NAVY") ~"DEPARTMENT OF THE NAVY", slug_organization %in%c("ARMY") ~"DEPARTMENT OF THE ARMY", slug_organization %in%c("DEFW") ~"DEPARTMENT OF DEFENSE",TRUE~ name_agency_cgac))```Lets take a look at it.```{r}skim(df_budget)```#### Macro Exploration of the Budget##### Breakdown by Agency```{r eval = F}df_budget|>group_by(name_agency_cgac) |>summarise(amount =sum(amount_item, na.rm = T),pct_total = amount /sum(amount),count_distinct_elements =n_distinct(code_program_element), .groups ="drop") |>mutate(pct_total = amount /sum(amount)) |>arrange(desc(amount))``````{r echo=FALSE}df_budget |>group_by(name_agency_cgac) |>summarise(amount =sum(amount_item, na.rm = T),count_distinct_elements =n_distinct(code_program_element),.groups ="drop" ) |>mutate(pct_total = amount /sum(amount)) |>arrange(desc(amount)) |>munge_data() |> gt::gt()```Now let's take a look at breakdown by Agency and Budget Group:```{r eval = FALSE}df_budget |>group_by(name_agency_cgac, name_dod_budget_group) |>summarise(amount =sum(amount_item, na.rm = T),count_distinct_elements =n_distinct(code_program_element),.groups ="drop" ) |>mutate(pct_total = amount /sum(amount)) |>arrange(desc(amount))``````{r echo = FALSE}df_budget |>group_by(name_agency_cgac, name_dod_budget_group) |>summarise(amount =sum(amount_item, na.rm = T),count_distinct_elements =n_distinct(code_program_element),.groups ="drop" ) |>mutate(pct_total = amount /sum(amount)) |>arrange(desc(amount)) |>munge_data() |> gt::gt()```#### Which Items Are We BuyingNow lets take a look at what specific items we are looking to acquire.```{r}tbl_items <- df_budget|>filter(amount_unit_cost >0, count_item >0)|>select( count_item, amount_unit_cost, name_program_element_actual, name_agency_cgac, slug_organization_account )|>distinct()```Lets build an interactive chart that does that.```{r eval=F}hchart( tbl_items,"scatter",hcaes(x = count_item,y = amount_unit_cost,group = name_agency_cgac,name = name_program_element_actual ),marker =list(radius =3, symbol ='circle'),regression =TRUE) |>hc_title(text ="2023 Defense Budget Requested Items") |>hc_xAxis(title =list(text ="# Items(log10 transformed)"),type ="logarithmic") |>hc_yAxis(title =list(text ="$ Amount Per Unit (log 10 transformed)"),type ="logarithmic") |>hc_tooltip(table =TRUE,outside =TRUE,shared =TRUE,useHTML =TRUE,headerFormat ="<small>{point.key}</small><table>",pointFormat =str_c("<tr><td style=\"color: {series.color}\">{series.name}: </td>","<tr><td style=\"text-align: right\"><b>Items: {point.x:,.0f}</b></td></tr>","<tr><td style=\"text-align: right\"><b>$ Per item: {point.y:,0.f}</b></td></tr>","<td style=\"text-align: right\"></td>" ),style =list(fontSize ="0.7em") ) |>hc_colors(c("#d35400", "#2980b9", "#2ecc71", "black")) |>hc_add_dependency("plugins/highcharts-regression.js") |>hc_credits(text ="Data from 2023 Defense Budget Request via govrtrackr")``````{r}#| echo: false#| warning: false#| message: false#| results: hidehc_items <- tbl_items |> asbviz::hc_xy(x ="count_item",y ="amount_unit_cost",group ="name_agency_cgac",name ="name_program_element_actual",transformations =c("log_x", "log_y", "mean_x", "mean_y"),theme_name ="clean_unica",color_palette ="pals::coolwarm",fits =c("lm"),title ="2023 Defense Budget Requested Items by Agency",credits ="Data from 2023 Defense Budget Request via govrtrackr",override_x_text =list(text ="# Items(log10 transformed)"),override_y_text =list(text ="$ Amount Per Unit (log 10 transformed)"),override_y_label =list(formatter = asbviz::hc_axis_dollar_format_millions()))``````{r hc_items}#| echo: false #| fig-align: center#| fig-width: 12#| fig-height: 12hc_items```#### 2023 Budget RequestLets see if we can explore the whole budget interactively.To do that we will build a hierarchical sunburst visualization.First lets get the data we need.```{r, eval=FALSE}treemap_columns <-c("name_dod_budget_group","name_agency_cgac","slug_organization_account","name_account_omb_clean","name_budget_parent","name_budget_activity","name_program_element" )tbl_treemap <- df_budget|>count(!!!syms(treemap_columns),wt = amount_item,name ="amount",sort = T )|>filter(amount >0)```Now we can build the interactive sunburst chart.```{r eval = F}tm <- tbl_treemap|>treemap(index = treemap_columns,vSize ="amount",draw = F,type ="index" )tm_nest <-d3_nest(tm$tm[, c(treemap_columns,"vSize","color")],value_cols =c("vSize", "color"))sun <-sund2b( tm_nest,colors = htmlwidgets::JS(# yes this is a little different, so please pay attention# "function(d) {return d.color}" will not work"function(name, d){return d.color || '#ccc';}"),valueField ="vSize",elementId ="my-sunburst")```Lets See how it looks:```{r}#| eval: true#| echo: false#| message: false#| results: hidesun <- df_budget |> asbtools::tbl_summarise(group_variables =c("name_dod_budget_group","name_agency_cgac","slug_organization_account","name_account_omb_clean","name_budget_parent","name_program_element" ), amount_variables ="amount_item" ) |>rename(amount = amount_item_total) |>filter(amount >0) |> asbviz::d3_sunburst(index_columns =c("name_dod_budget_group","name_agency_cgac","slug_organization_account","name_account_omb_clean","name_budget_parent","name_program_element" ),size ="amount" ,plot_title ="2023 Defense Budget Request Sunburst Chart",color_palette ="viridis::magma",extra_css = T,font_size =8,height =800,content_css =" text-overflow: ellipsis; font-size: 10px; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif; font-kerning: auto; kerning: auto; ",width =800,label_css =" font-size: 10px; text-justify: inter-word; text-overflow: clip; text-justify: none; white-space: pre; font-weight: 900; ",chart_advanced_css =" width: 100%; height: 100%; align-content: center; ",bread_tooltip =NULL,bread_html =NULL )``````{r}#| echo: false#| fig-align: center#| fig-width: 10#| fig-height: 12sun```