Getting Started

Before running this notebook, select “Session > Restart R and Clear Output” in the menu above to start a new R session. This will clear any old data sets and give us a blank slate to start with.

After starting a new session, run the following code chunk to load the libraries and data that we will be working with today.

Exploring the Richmond, Virginia Page

Start by constructing the URL to grab the text for the Wikipedia page named “Richmond,_Virginia” from the MediaWiki API.

# Question 01
url <- modify_url(
  "https://en.wikipedia.org/w/api.php",
  query = list(
    action = "parse", format = "json", redirects = TRUE,
    page = utils::URLdecode("Richmond,_Virginia")
  )
)
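
As a quick sanity check, the constructed URL can be split back into its parts. This is a minimal sketch that assumes the httr package is available (modify_url and parse_url both come from httr). It passes the page name directly rather than through utils::URLdecode, which makes no difference here because the name contains no percent-escapes.

```r
library(httr)

# Build the same request URL as above.
url <- modify_url(
  "https://en.wikipedia.org/w/api.php",
  query = list(
    action = "parse", format = "json", redirects = TRUE,
    page = "Richmond,_Virginia"
  )
)

# Split the URL back into components to confirm the query parameters.
parts <- parse_url(url)
print(parts$hostname)
print(names(parts$query))
print(parts$query$page)
```

If a parameter is misspelled or missing, it shows up here before any request is sent.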

Now, make the API request and store the output as an R object called obj.

# Question 02
res <- dsst_cache_get(url, cache_dir = "cache")
obj <- content(res, type = "application/json")

Next, parse the XML/HTML text using the function read_html.

# Question 03
tree <- xml2::read_html(obj$parse$text[[1]])
tree
## {html_document}
## <html>
## [1] <body><div class="mw-parser-output">\n<div class="shortdescription no ...

Use xml_find_all to find all of the nodes with the tag “h2”. Look at the Wikipedia page in a browser and work out what these correspond to.

# Question 04
xml_find_all(tree, xpath = ".//h2")
## {xml_nodeset (17)}
##  [1] <h2>\n<span class="mw-headline" id="History">History</span><span cla ...
##  [2] <h2>\n<span class="mw-headline" id="Geography">Geography</span><span ...
##  [3] <h2>\n<span class="mw-headline" id="Demographics">Demographics</span ...
##  [4] <h2>\n<span class="mw-headline" id="Economy">Economy</span><span cla ...
##  [5] <h2>\n<span class="mw-headline" id="Arts_and_culture">Arts and cultu ...
##  [6] <h2>\n<span class="mw-headline" id="Sports">Sports</span><span class ...
##  [7] <h2>\n<span class="mw-headline" id="Parks_and_recreation">Parks and  ...
##  [8] <h2>\n<span class="mw-headline" id="Government">Government</span><sp ...
##  [9] <h2>\n<span class="mw-headline" id="Education">Education</span><span ...
## [10] <h2>\n<span class="mw-headline" id="Media">Media</span><span class=" ...
## [11] <h2>\n<span class="mw-headline" id="Infrastructure">Infrastructure</ ...
## [12] <h2>\n<span class="mw-headline" id="Sister_cities">Sister cities</sp ...
## [13] <h2>\n<span class="mw-headline" id="See_also">See also</span><span c ...
## [14] <h2>\n<span class="mw-headline" id="Notes">Notes</span><span class=" ...
## [15] <h2>\n<span class="mw-headline" id="References">References</span><sp ...
## [16] <h2>\n<span class="mw-headline" id="Further_reading">Further reading ...
## [17] <h2>\n<span class="mw-headline" id="External_links">External links</ ...

Try to use a new xpath argument below to extract just the name of each section. Turn it into an R string object using xml_text:

# Question 05
xml_text(xml_find_all(tree, xpath = ".//h2/span[@class='mw-headline']"))
##  [1] "History"              "Geography"            "Demographics"        
##  [4] "Economy"              "Arts and culture"     "Sports"              
##  [7] "Parks and recreation" "Government"           "Education"           
## [10] "Media"                "Infrastructure"       "Sister cities"       
## [13] "See also"             "Notes"                "References"          
## [16] "Further reading"      "External links"
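
To see why the span predicate matters, the same two XPath expressions can be tried on a toy fragment. The HTML below is made up for illustration and only mimics the structure of the parsed page; the mw-editsection span stands in for the edit link that follows each headline.

```r
library(xml2)

# A small, invented fragment with two h2 headings and one h3 heading.
html <- paste0(
  "<div class='mw-parser-output'>",
  "<h2><span class='mw-headline' id='History'>History</span>",
  "<span class='mw-editsection'>[edit]</span></h2>",
  "<h3><span class='mw-headline' id='Colonial_era'>Colonial era</span></h3>",
  "<p>First paragraph.</p>",
  "<h2><span class='mw-headline' id='Geography'>Geography</span></h2>",
  "</div>"
)
toy <- read_html(html)

# Text of the whole h2 node also picks up the edit-link text.
h2_all <- xml_text(xml_find_all(toy, ".//h2"))

# Restricting to the headline span returns only the section name.
h2_names <- xml_text(xml_find_all(toy, ".//h2/span[@class='mw-headline']"))

print(h2_all)
print(h2_names)
```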

Repeat the previous question for the tag “h3”. What are these?

# Question 06
xml_text(xml_find_all(tree, xpath = ".//h3/span[@class='mw-headline']"))
##  [1] "Colonial era"                                      
##  [2] "Revolution"                                        
##  [3] "Early United States"                               
##  [4] "American Civil War"                                
##  [5] "Postbellum"                                        
##  [6] "20th century"                                      
##  [7] "Cityscape"                                         
##  [8] "Climate"                                           
##  [9] "2020 census"                                       
## [10] "2010 Census"                                       
## [11] "Crime"                                             
## [12] "Religion"                                          
## [13] "Fortune 500 companies and other large corporations"
## [14] "Poverty"                                           
## [15] "Museums and monuments"                             
## [16] "Visual and performing arts"                        
## [17] "Literary arts"                                     
## [18] "Architecture"                                      
## [19] "Historic districts"                                
## [20] "Food"                                              
## [21] "Public schools"                                    
## [22] "Private schools"                                   
## [23] "Colleges and universities"                         
## [24] "Transportation"                                    
## [25] "Major highways"                                    
## [26] "Utilities"

Now, extract the paragraphs using the “p” tag.

# Question 07
xml_find_all(tree, xpath = ".//p")
## {xml_nodeset (127)}
##  [1] <p class="mw-empty-elt">\n\n</p>
##  [2] <p><b>Richmond</b> (<span class="rt-commentedText nowrap"><span clas ...
##  [3] <p>Richmond is located at the <a href="/wiki/Atlantic_Seaboard_fall_ ...
##  [4] <p>Richmond was an important village of the <a href="/wiki/Powhatan_ ...
##  [5] <p>Law, finance, and government primarily drive Richmond’s economy,  ...
##  [6] <p>After the first permanent English-speaking settlement was establi ...
##  [7] <p>In 1611, the first European settlement in Central Virginia was es ...
##  [8] <p>In early 1737, planter <a href="/wiki/William_Byrd_II" title="Wil ...
##  [9] <p>In 1775, <a href="/wiki/Patrick_Henry" title="Patrick Henry">Patr ...
## [10] <p>Richmond recovered quickly from the war, and by 1782 was once aga ...
## [11] <p>After the <a href="/wiki/American_Revolutionary_War" title="Ameri ...
## [12] <p>On April 17, 1861, five days after the Confederate attack on <a h ...
## [13] <p>Richmond held local, state and national government offices. hospi ...
## [14] <p>Three years later, in March 1865, Richmond became indefensible af ...
## [15] <p>The Confederate Army began the evacuation of Richmond on April 2, ...
## [16] <p>President <a href="/wiki/Abraham_Lincoln" title="Abraham Lincoln" ...
## [17] <p>\nRichmond emerged a decade after the smoldering rubble of the Ci ...
## [18] <p>By the beginning of the 20th century the city's population had re ...
## [19] <p>Several major performing arts venues were constructed during the  ...
## [20] <p>Between 1963 and 1965 there was a "downtown boom" that led to the ...
## ...

Continue by using xml_text to extract the text from each paragraph. Take a few minutes to look through the results to see why we need some special logic to turn the output into something we can parse as text.

# Question 08
xml_find_all(tree, xpath = ".//p") %>%
  xml_text() %>%
  head(n = 10)
##  [1] "\n\n"
##  [2] "Richmond (/ˈrɪtʃmənd/) is the capital city of the Commonwealth of Virginia in the United States. It is the center of the Richmond Metropolitan Statistical Area and the Greater Richmond Region. Incorporated in 1742, Richmond has been an independent city since 1871. The city’s population in the 2020 census was 226,610, up from 204,214 in 2010,[6] making Richmond Virginia’s fourth-most populous city. The Richmond Metropolitan Area, with 1,260,029 people, is the Commonwealth’s third-most populous.\n"                                                                                                                                                                                                                                                                                                                             
##  [3] "Richmond is located at the James River’s fall line, 44 mi (71 km) west of Williamsburg, 66 mi (106 km) east of Charlottesville, 91 mi (146 km) east of Lynchburg and 92 mi (148 km) south of Washington, D.C. Surrounded by Henrico and Chesterfield counties, Richmond is at the intersection of Interstate 95 and Interstate 64 and encircled by Interstate 295, Virginia State Route 150 and Virginia State Route 288. Major suburbs include Midlothian to the southwest, Chesterfield to the south, Varina to the southeast, Sandston to the east, Glen Allen to the north and west, Short Pump to the west, and Mechanicsville to the northeast.[7][8]"                                                                                                                                                                                      
##  [4] "Richmond was an important village of the Powhatan Confederacy and was briefly settled by English colonists from Jamestown from 1609 to 1611. Founded in 1737, it replaced Williamsburg as the capital of the Colony and Dominion of Virginia in 1780. During the Revolutionary War period, several notable events occurred in the city, including Patrick Henry's \"Give me liberty, or give me death!\" speech in 1775 at St. John's Church and the passage of the Virginia Statute for Religious Freedom written by Thomas Jefferson. During the American Civil War, Richmond was the Confederacy’s capital. Nonetheless, the Jackson Ward neighborhood is the city’s traditional hub of African-American commerce and culture. At the beginning of the 20th century, it had one of the world's first successful electric streetcar systems. \n"
##  [5] "Law, finance, and government primarily drive Richmond’s economy, and the downtown area has federal, state, and local governmental agencies, notable legal and banking firms, and several Fortune 500 companies, including Dominion Energy, WestRock, Performance Food Group, CarMax, ARKO, and Altria. [9][10][11] The city is home to the U.S. Court of Appeals for the 4th Circuit, one of 13 such courts and a Federal Reserve Bank, one of 12 such banks. The metropolitan area has other large corporations, like Markel.\n"                                                                                                                                                                                                                                                                                                                 
##  [6] "After the first permanent English-speaking settlement was established at Jamestown, Virginia, in April 1607, Captain Christopher Newport led explorers northwest up the James River to an inhabited area in the Powhatan Nation.[12]"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
##  [7] "In 1611, the first European settlement in Central Virginia was established at Henricus, where the Falling Creek empties into the James River. In 1619, early Virginia Company settlers established the Falling Creek Ironworks there. Decades of conflicts between the Powhatan and the settlers followed, including the Battle of Bloody Run, fought near Richmond in 1656, after an influx of Manahoacs and Nahyssans from the North. Nonetheless, the James Falls area saw more White settlement in the late 1600s and early 1700s.[13]"                                                                                                                                                                                                                                                                                                       
##  [8] "In early 1737, planter William Byrd II commissioned Major William Mayo to lay out the original town grid, completed in April. Byrd named the city after the English town of Richmond near (and now part of) London, because the view of the James River’s bend at the fall line was similar to that of the River Thames from Richmond Hill, named after Henry VII's ancestral home in Richmond, North Yorkshire.[14] In 1742, the settlement was incorporated as a town.[15]"                                                                                                                                                                                                                                                                                                                                                                     
##  [9] "In 1775, Patrick Henry delivered his famous \"Give me liberty, or give me death\" speech in Richmond’s St. John's Church, greatly influencing Virginia's participation in the First Continental Congress and the course of the revolution and independence.[16] On April 18, 1780, the state capital was moved fromWilliamsburg to Richmond, providing a more centralized location for Virginia's increasing western population and isolating the capital from British attack from the coast.[17] Nonetheless, the British, commanded by Benedict Arnold, burned Richmond in 1781, causing Governor Thomas Jefferson to flee while the Virginia militia, led by Sampson Mathews, defended the city.[18]"                                                                                                                                          
## [10] "Richmond recovered quickly from the war, and by 1782 was once again a thriving city.[19] In 1786 the Virginia Statute for Religious Freedom (drafted by Thomas Jefferson, 1743–1826) was passed at the temporary capitol in Richmond, providing the basis for the separation of church and state, a key element in the development of freedom of religion in the United States.[20] A permanent home for the new government, the Greek Revival style of the Virginia State Capitol building, was designed by Jefferson with the assistance of Charles-Louis Clérisseau and completed in 1788.\n"
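
The output above shows the main problems: empty paragraphs, trailing newlines, and bracketed footnote markers such as [6]. The sketch below shows one possible cleanup on an invented fragment. It is not the logic used by dsst_wiki_make_data, which is not shown in this notebook; the regular expressions are only an assumption about what a reasonable cleanup could look like.

```r
library(xml2)

# Invented fragment with an empty paragraph and footnote markers.
html <- paste0(
  "<div>",
  "<p class='mw-empty-elt'>\n\n</p>",
  "<p><b>Richmond</b> is a city.<sup class='reference'>[6]</sup>\n</p>",
  "<p>It lies on the James River.<sup class='reference'>[7]</sup>",
  "<sup class='reference'>[8]</sup></p>",
  "</div>"
)
toy <- read_html(html)

raw <- xml_text(xml_find_all(toy, ".//p"))

# Strip markers like "[6]", collapse whitespace, trim the ends,
# and drop paragraphs that end up empty.
clean <- gsub("\\[[0-9]+\\]", "", raw)
clean <- trimws(gsub("[[:space:]]+", " ", clean))
clean <- clean[nchar(clean) > 0]

print(clean)
```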

Now, create a tibble object with one column called links and only one row with the entry “Richmond,_Virginia” for the variable links. This is the format required in the next code block. Make sure to save the result.

# Question 10
links <- tibble(links = "Richmond,_Virginia")
links
## # A tibble: 1 × 1
##   links             
##   <chr>             
## 1 Richmond,_Virginia

Pass your data from the last code block to the function dsst_wiki_make_data. Use the code from the notes to post-process the results and look through the text a bit to see how it corresponds with the result from your own usage of xml_text.

# Question 11
docs <- dsst_wiki_make_data(links, cache_dir = "cache")
docs <- mutate(docs, doc_id = stri_replace_all(doc_id, "", regex = "<[^>]+>"))
docs <- mutate(docs, text = stri_replace_all(text, " ", regex = "[\n]+"))
docs <- filter(docs, !duplicated(doc_id))
docs
## # A tibble: 1 × 2
##   doc_id             text                                                   
##   <chr>              <chr>                                                  
## 1 Richmond, Virginia Richmond is the capital city of the Commonwealth of Vi…
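
The three post-processing steps are easier to follow on plain vectors. The values below are invented for illustration; only the two regular expressions and the duplicated() check are taken from the code above.

```r
library(stringi)

# Invented inputs: a title containing markup, and one repeated title.
doc_id <- c("<i>Richmond</i>, Virginia", "Abingdon, Virginia", "Abingdon, Virginia")
text <- c("Line one.\n\nLine two.", "A town.\n", "A town.\n")

# Step 1: remove anything that looks like an HTML tag from the id.
doc_id <- stri_replace_all(doc_id, "", regex = "<[^>]+>")

# Step 2: replace each run of newlines in the text with a single space.
text <- stri_replace_all(text, " ", regex = "[\n]+")

# Step 3: flag the first occurrence of each id so that repeats can be dropped.
keep <- !duplicated(doc_id)

print(doc_id[keep])
print(text[keep])
```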

Building a Corpus of Virginia Cities

Let’s build a large corpus. Start by constructing the URL for the page called “List_of_towns_in_Virginia”.

# Question 12
url <- modify_url(
  "https://en.wikipedia.org/w/api.php",
  query = list(
    action = "parse", format = "json", redirects = TRUE,
    page = utils::URLdecode("List_of_towns_in_Virginia")
  )
)

Now, call the API to grab the results and create an object tree that contains the parsed text of the page.

# Question 13
res <- dsst_cache_get(url, cache_dir = "cache")
obj <- content(res, type = "application/json")
tree <- xml2::read_html(obj$parse$text[[1]])
tree
## {html_document}
## <html>
## [1] <body><div class="mw-parser-output">\n<p class="mw-empty-elt">\n</p>\ ...

Try to use the function dsst_wiki_get_links_table to get links to each of the cities. This will take a bit of trial and error and looking at the actual Wikipedia page.

# Question 14
links <- dsst_wiki_get_links_table(obj, table_num = 3, column_num = 1)
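
One way to shorten the trial and error is to look at the tables directly with xml2. The sketch below runs on an invented fragment. It assumes that table_num picks a table by position and column_num picks a cell by position within each row; the internals of dsst_wiki_get_links_table are not shown here, so treat this as a guess at the idea rather than the function's actual code.

```r
library(xml2)

# Invented page with a small legend table followed by the table of interest.
html <- paste0(
  "<div>",
  "<table><tr><td>Legend</td></tr></table>",
  "<table>",
  "<tr><th>Town</th><th>County</th></tr>",
  "<tr><td><a href='/wiki/Abingdon,_Virginia'>Abingdon</a></td>",
  "<td><a href='/wiki/Washington_County,_Virginia'>Washington</a></td></tr>",
  "<tr><td><a href='/wiki/Accomac,_Virginia'>Accomac</a></td>",
  "<td><a href='/wiki/Accomack_County,_Virginia'>Accomack</a></td></tr>",
  "</table>",
  "</div>"
)
toy <- read_html(html)

# Count the rows in each table; the largest one is usually the list we want.
tables <- xml_find_all(toy, ".//table")
n_rows <- vapply(
  seq_along(tables),
  function(i) length(xml_find_all(tables[[i]], ".//tr")),
  integer(1)
)

# Collect the links found in the first cell of each row of the second table.
first_col <- xml_find_all(tables[[2]], ".//tr/td[1]//a")
pages <- sub("^/wiki/", "", xml_attr(first_col, "href"))

print(n_rows)
print(pages)
```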

Once you have the data, use dsst_wiki_make_data and the code from the notes to construct a full docs table for the cities.

# Question 15
docs <- dsst_wiki_make_data(links)
docs <- mutate(docs, doc_id = stri_replace_all(doc_id, "", regex = "<[^>]+>"))
docs <- mutate(docs, text = stri_replace_all(text, " ", regex = "[\n]+"))
docs <- filter(docs, !duplicated(doc_id))
docs <- mutate(docs, train_id = "train")
docs
## # A tibble: 199 × 3
##    doc_id                                 text                       train…¹
##    <chr>                                  <chr>                      <chr>  
##  1 Abingdon, Virginia                     Abingdon is a town in Was… train  
##  2 Accomac, Virginia                      Accomac is a town in and … train  
##  3 Alberta, Virginia                      Alberta is a town in Brun… train  
##  4 Altavista, Virginia                    Altavista is an incorpora… train  
##  5 Amherst, Virginia                      Amherst is a town in Amhe… train  
##  6 Appalachia, Virginia                   Appalachia is a town in W… train  
##  7 Appomattox, Virginia                   Appomattox is a town in A… train  
##  8 Ashland, Virginia                      Ashland is a town in Hano… train  
##  9 Bedford, Virginia                      Bedford is an incorporate… train  
## 10 Belle Haven, Accomack County, Virginia Belle Haven is a town in … train  
## # … with 189 more rows, and abbreviated variable name ¹​train_id

Now, parse the text using the cnlp_annotate function:

# Question 16
library(cleanNLP)
cnlp_init_udpipe("english")

docs <- filter(docs, stringi::stri_length(text) > 0)
anno <- cnlp_annotate(docs)$token
## Processed document 10 of 199
## Processed document 20 of 199
## Processed document 30 of 199
## Processed document 40 of 199
## Processed document 50 of 199
## Processed document 60 of 199
## Processed document 70 of 199
## Processed document 80 of 199
## Processed document 90 of 199
## Processed document 100 of 199
## Processed document 110 of 199
## Processed document 120 of 199
## Processed document 130 of 199
## Processed document 140 of 199
## Processed document 150 of 199
## Processed document 160 of 199
## Processed document 170 of 199
## Processed document 180 of 199
## Processed document 190 of 199

And finally, display the top 6 nouns and verbs from each town’s page (Hint: you should be able to copy the code from the end of Notebook13).
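
The grouping and formatting steps can be tried on a toy table first. This is a minimal sketch with invented labels and tokens. It uses base substr in place of stri_sub and a narrower column width, and it skips dsst_metrics entirely.

```r
library(dplyr)

# Invented table with one row per (label, token) pair, already ranked.
toy <- tibble(
  label = c("A, Virginia", "A, Virginia", "A, Virginia", "B, Virginia", "B, Virginia"),
  token = c("river", "mill", "fort", "coal", "mine")
)

out <- toy %>%
  group_by(label) %>%
  slice_head(n = 2L) %>%                                    # keep the first 2 rows per label
  summarize(terms = paste0(token, collapse = "; ")) %>%     # join the tokens into one string
  mutate(out = sprintf("%15s => %s", substr(label, 1, 15), terms)) %>%
  getElement("out")

cat(out, sep = "\n")
```

The width in "%15s" right-justifies the label by padding it with spaces on the left, which is what lines up the arrows in the printed output.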

# Question 17
anno %>%
  filter(upos %in% c("NOUN", "VERB")) %>%
  dsst_metrics(docs, label_var = "doc_id") %>%
  filter(count > expected) %>%
  group_by(label) %>%
  slice_head(n = 6L) %>%
  summarize(terms = paste0(token, collapse = "; ")) %>%
  mutate(out = sprintf("%30s => %s", stri_sub(label, 1, 30), terms)) %>%
  getElement("out") %>%
  cat(sep = "\n")
##             Abingdon, Virginia => degree; settler; attack; territory; log; theatre
##              Accomac, Virginia => Accomac; preaching; referendum; creation; ferry; petition
##              Alberta, Virginia => much; abandoned; merger; portion; access; Exit
##            Altavista, Virginia => plant; company; encourage; furniture; form; section
##              Amherst, Virginia => seal; proposal; aircraft; courthouse; service; route
##           Appalachia, Virginia => celebration; festival; chief; communities; week; surround
##           Appomattox, Virginia => surrender; slave; free; depot; tribe; encounter
##              Ashland, Virginia => track; train; congregation; racetrack; clay; denomination
##              Bedford, Virginia => city; statue; memorial; status; nation; revert
## Belle Haven, Accomack County,  => person; age; household; mile; income; family
##           Berryville, Virginia => tavern; intersection; cabin; academy; sell; estate
##        Big Stone Gap, Virginia => film; movie; tornado; state; museum; consolidation
##           Blacksburg, Virginia => campus; open; Blacksburg; build; apartment; phase
##           Blackstone, Virginia => training; centers; giant; attract; visitor; affair
##               Bloxom, Virginia => age; 0.3square; population; household; live; be
##            Bluefield, Virginia => city; football; name; stadium; park; celebrate
##          Boones Mill, Virginia => foot; crest; watersh; woman; rise; miles
##        Bowling Green, Virginia => plantation; horse; 1.6square; northsouth; build; stage
##                Boyce, Virginia => station; want; railroad; building; article; cattle
##              Boydton, Virginia => boater; dirt; struggle; data; headquartere; plank
##              Boykins, Virginia => roof; support; tract; inch; portico; house
##          Branchville, Virginia => 0.4square; person; eighteen; household; live; be
##          Bridgewater, Virginia => hole; golf; rank; course; skating; crime
##             Broadway, Virginia => connect; routing; 1.8square; distance; climate; person
##              Brodnax, Virginia => mile; ad; lease; improvement; person; age
##            Brookneal, Virginia => textile; waterway; river; cross; navigation; transportation
##             Buchanan, Virginia => Exit; councilmanager; gazetteer; rock; zoned; landowner
##           Burkeville, Virginia => woman; surrender; appoint; banjo; Southside; teller
##         Cape Charles, Virginia => Cape; Charles; beach; harbor; dredge; mouth
##               Capron, Virginia => 0.2square; none; household; live; be; age
##          Cedar Bluff, Virginia => bluff; information; geology; ruin; grave; grist
## Charlotte Court House, Virgini => change; courthouse; filming; grade; contain; speech
##           Chase City, Virginia => 2.2square; age; crop; population; household; live
##              Chatham, Virginia => prison; term; soldier; boarding; chapter; walking
##             Cheriton, Virginia => designat; climate; squaremile; person; Cfa; junction
##            Chilhowie, Virginia => 1760; 2.6square; fort; sustain; compare; expedition
##         Chincoteague, Virginia => island; swim; storm; coast; horse; childrens
##       Christiansburg, Virginia => bus; discover; adjustment; provide; debt; instruction
##            Claremont, Virginia => marker; gauge; ad; commemorat; terminus; memorial
##          Clarksville, Virginia => tobacco; bass; export; lake; title; claim
##            Cleveland, Virginia => 0.1square; person; eighteen; 0.04square; household; mile
##        Clifton Forge, Virginia => city; stream; elevate; revert; service; fuel
##              Clifton, Virginia => siding; station; train; development; railroad; construct
##             Clinchco, Virginia => mile; roads; mountains; tip; triangle; winding
##           Clinchport, Virginia => populate; municipality; household; person; component; tri-
##            Clintwood, Virginia => eliminate; cleanup; desire; star; legacy; attention
##              Coeburn, Virginia => camp; boating; neighboring; Station; surveyor; elect
##       Colonial Beach, Virginia => mus; beach; automobile; peninsula; summer; 1960
##            Courtland, Virginia => courthouse; peanut; style; tavern; trustee; build
##          Craigsville, Virginia => farming; inherit; sewer; store; move; bakery
##                Crewe, Virginia => repair; Homecoming; missionary; rolling; decline; museum
##             Culpeper, Virginia => Culpeper; connection; 6th; downtown; earthquake; conservation
##             Damascus, Virginia => festival; trail; excess; tourists; helicopter; ambulance
##               Dayton, Virginia => group; video; brother; depict; homestead; leadership
##              Dendron, Virginia => company; sawmill; buy; locomotive; tree; leave
##              Dillwyn, Virginia => drama; mile; person; life; series; age
##        Drakes Branch, Virginia => branch; person; mile; lead; intersect; southeast
##               Dublin, Virginia => plant; model; truck; battery; duty; trucks
##             Duffield, Virginia => edge; component; tri-; person; eighteen; write
##             Dumfries, Virginia => commodity; port; frame; shield; element; indicate
##            Dungannon, Virginia => person; component; tri-; age; household; income
##            Eastville, Virginia => quarter; court; century; peninsulas; landscape; secure
##             Edinburg, Virginia => 0.7square; Hispanics; distribute; age; describe; pass
##               Elkton, Virginia => school; grade; renovation; build; add; 1.4square
##               Exmore, Virginia => theory; culture; interpretation; stations; tenth; museum
##            Farmville, Virginia => coal; water; gallon; plant; test; mine
##            Fincastle, Virginia => rebuild; select; erect; style; donate; courthouse
##                Floyd, Virginia => music; band; night; bluegrass; string; focus
##      Fluvanna County, Virginia => canal; county; navigation; citizen; batteaux; towpath
##                Fries, Virginia => mill; music; fry; cotton; organize; bluegrass
##          Front Royal, Virginia => front; license; stadium; theory; radio; newspaper
##            Gate City, Virginia => manufacturing; formation; goods; hero; honor; auto
##         Glade Spring, Virginia => spring; hit; growth; accident; cannon; closure
##              Glasgow, Virginia => flooding; flood; dam; record; confluence; remove
##             Glen Lyn, Virginia => mile; Pearisburg; rivers; climate; confluence; eighteen
##         Gordonsville, Virginia => tavern; time; rail; prosperity; traveler; rebuild
##               Goshen, Virginia => flame; hotel; combat; corn; few; entertainment
##               Gretna, Virginia => football; championship; person; claim; program; age
##             Grottoes, Virginia => cave; access; provide; Interstate; portion; connect
##               Grundy, Virginia => flood; relocate; parking; wall; pharmacy; disaster
##              Halifax, Virginia => mile; watersh; climate; join; Cfa; lead
##             Hallwood, Virginia => climate; person; 0.2square; Cfa; elevation; terminus
##             Hamilton, Virginia => store; tourism; boarding; butcher; clothing; mens
##            Haymarket, Virginia => court; Interstate; path; lake; lynch; crime
##                Haysi, Virginia => say; worker; store; business; ferry; laurel
##              Herndon, Virginia => caboose; center; police; head; agency; theatre
## Hillsboro, Loudoun County, Vir => refer; maker; management; infrastructure; structure; play
##           Hillsville, Virginia => deputy; trial; pardon; cousin; heriff; jail
##              Honaker, Virginia => project; celebrat; restoration; undergo; World; fort
##                 Hurt, Virginia => go; 2.6square; road; spur; age; bypass
##         Independence, Virginia => Courthouse; stage; no; remain; brick; design
##            Iron Gate, Virginia => Cdp; airports; arise; cease; operate; reside
##            Irvington, Virginia => contributing; schoolhouse; winemaking; locate; cannery; clothing
##                 Ivor, Virginia => household; live; income; family; mile; have
##              Jarratt, Virginia => category; acquire; exception; hous; counties; execution
##           Jonesville, Virginia => founder; end; nineteenth; surviving; drug; concentrate
##               Keller, Virginia => person; 0.3square; elevation; household; lie; age
##            Kenbridge, Virginia => folk; rock; no; artist; person; age
##            Keysville, Virginia => bring; lead; effort; link; fund; accomplish
##           Kilmarnock, Virginia => 2.9square; appellation; Area; winemaking; mouth; locate
##            La Crosse, Virginia => 1.2square; fact; company; caboose; person; derive
##        Lawrenceville, Virginia => wood; concrete; nomination; describe; building; direct
##              Lebanon, Virginia => founde; breeding; Places; chill; pastor; cave
##             Leesburg, Virginia => airport; Leesburg; radio; northeast; club; enter
##               Louisa, Virginia => mainline; merge; transport; produce; communication; feel
##         Lovettsville, Virginia => immigrant; drain; vote; establish; buffer; crash
##                Luray, Virginia => neighborhood; engagement; cite; victory; elementary; Luray
##              Madison, Virginia => entirety; extend; highway; alignment; 0.2square; trip
##               Marion, Virginia => drink; position; label; mailing; motorcycle; enthusiast
##             McKenney, Virginia => pass; educator; abandoned; mainline; resource; exit
##   Mecklenburg County, Virginia => county; administrator; install; officer; department; category
##                Melfa, Virginia => 0.3square; person; rest; age; elevation; household
##           Middleburg, Virginia => comprise; continue; activist; press; research; steeplechase
##           Middletown, Virginia => extend; antebellum; manor; piece; watershed; 1600
##              Mineral, Virginia => mining; depth; fatality; magnitude; evacuation; heyday
##             Monterey, Virginia => southeast; 1840; cate; king; selection; Summers
##             Montross, Virginia => resolution; artifact; collection; visit; preserve; bear
##       Mount Crawford, Virginia => 0.3square; Zip; instruction; facility; limit; Area
##        Mount Jackson, Virginia => hospital; cavalry; trooper; retreat; skirmish; artillery
##              Narrows, Virginia => count; rebel; affiliate; 0.1square; person; baseball
##           Nassawadox, Virginia => category; bayside; seaside; counties; bound; coast
##           New Castle, Virginia => paddle; cliff; feature; dispute; landowner; 0.2square
##           New Market, Virginia => cadet; site; commander; number; finish; governor
##              Newsoms, Virginia => 0.5square; age; household; live; be; income
##         Nickelsville, Virginia => age; component; tri-; population; household; live
##   Northampton County, Virginia => county; court; lifetime; planter; rule; slavery
## Northumberland County, Virgini => county; fishing; tribe; industry; fish; recognition
##             Occoquan, Virginia => grist; tribe; 0.2square; neighbor; colonist; mill
##             Onancock, Virginia => barge; seek; surrender; action; marker; flagship
##                Onley, Virginia => climate; attorney; person; estate; consist; age
##               Orange, Virginia => courthouse; roadway; stand; manager; property; act
##              Painter, Virginia => church; shift; erect; chapel; Church; coming
##         Pamplin City, Virginia => library; holiday; pipe; occasion; revitalization; renovate
##             Parksley, Virginia => removal; approve; shore; airfield; harvest; invasion
##           Pearisburg, Virginia => descend; foot; Pearisburg; pass; southwest; landowner
##             Pembroke, Virginia => climate; marine; age; difference; high; rainfall
##       Pennington Gap, Virginia => 1.5square; consiste; consolidation; climate; person; locate
##               Phenix, Virginia => age; none; household; mile; live; be
##           Pocahontas, Virginia => coal; Pocahontas; mine; miner; worker; explosion
##           Port Royal, Virginia => cash; barn; port; export; crossroad; crop
##                Pound, Virginia => pound; revoke; bill; charter; crowd; miner
## Prince Edward County, Virginia => school; student; percent; rule; county; case
##              Pulaski, Virginia => university; locate; nestle; college; tornadoe; bike
##         Purcellville, Virginia => staff; career; terminate; apparatus; organization; support
##             Quantico, Virginia => base; series; investigation; flight; headquarters; security
##            Remington, Virginia => flag; seal; battle; absorb; fell; shoulder
##           Rich Creek, Virginia => mile; Pearisburg; mouth; person; lead; terminus
##            Richlands, Virginia => worker; man; crash; rob; Southside; satellite
##             Ridgeway, Virginia => survey; drain; side; tributary; not; passing
##    Rockingham County, Virginia => county; agency; peak; stretch; status; valley
##          Rocky Mount, Virginia => build; dealership; administration; furniture; adapt; grocery
## Round Hill, Loudoun County, Vi => supplement; lie; estimate; protection; terminate; sewer
##        Rural Retreat, Virginia => 2.2square; age; list; population; household; live
##            Saltville, Virginia => saltwork; Saltville; salt; attack; force; rocket
##                Saxis, Virginia => seafood; supplies; water; pier; blockade; vessel
##           Scottsburg, Virginia => person; eighteen; center; flow; tributary; household
##          Scottsville, Virginia => levee; flood; canal; prevent; rivers; serve
##           Shenandoah, Virginia => iron; furnace; flood; fire; destroy; river
##           Smithfield, Virginia => ham; peanut; pork; offer; prove; plat
##         South Boston, Virginia => south; cancel; decrease; leading; producer; suspend
##           South Hill, Virginia => candidate; election; council; favor; voter; mayor
##          St. Charles, Virginia => rug; straddle; official; hill; northwest; mining
##             St. Paul, Virginia => age; 0.04square; power; program; newspaper; population
##        Stanardsville, Virginia => landholder; extend; alignment; select; connection; survey
##              Stanley, Virginia => grade; serve; bus; rejoin; primary; remainder
##        Stephens City, Virginia => wagon; mus; committee; limit; force; member
##          Stony Creek, Virginia => exit; frontage; 0.6square; climate; interchange; age
##            Strasburg, Virginia => championship; team; boast; state; accommodation; mural
##               Stuart, Virginia => buy; pave; rank; railroad; occupation; sidewalk
##                Surry, Virginia => 0.8square; derive; age; population; household; live
##        Sussex County, Virginia => sussex; county; prison; row; category; exclude
##              Tangier, Virginia => island; call; sea; ridge; patent; resident
##         Tappahannock, Virginia => Rappahannock; Tappahannock; estuary; ship; port; mean
##             Tazewell, Virginia => car; motorcycle; enthusiast; headwater; hundred; cave
##           The Plains, Virginia => lodge; call; Association; blacksmit; corn; exhibit
##          Timberville, Virginia => head; Dutch; sr; constitute; 0.9square; migrate
##           Toms Brook, Virginia => dub; Exit; climate; 0.2square; person; interchange
##            Troutdale, Virginia => mayor; count; circle; figure; entertainment; parade
##           Troutville, Virginia => trail; operate; park; shipping; cannery; cyclist
##              Urbanna, Virginia => festival; craft; oyster; day; display; event
##             Victoria, Virginia => rail; steam; employment; division; coal; caboose
##               Vienna, Virginia => program; team; school; coach; send; title
##               Vinton, Virginia => lake; cover; border; consist; elevation; consider
##            Virgilina, Virginia => lead; copper; border; watersh; mile; 1950
##         Wachapreague, Virginia => marsh; seaside; centurie; ocean; fisherman; wind
##            Wakefield, Virginia => category; race; 1.2square; alert; appear; weather
##            Warrenton, Virginia => government; lynch; mob; northwest; man; drain
##               Warsaw, Virginia => change; Courthouse; array; inmate; vineyard; addition
##    Washington County, Virginia => county; part; category; combine; region; form
##           Washington, Virginia => town; restaurant; acre; church; establish; inn
##              Waverly, Virginia => senator; lynching; storm; tree; man; category
##           Weber City, Virginia => gap; skit; hear; pioner; welcome; read
##           West Point, Virginia => point; port; paper; reservation; continue; recognize
##          White Stone, Virginia => stone; category; derive; stagnate; story; heyday
##              Windsor, Virginia => school; enrollment; students; color; grade; student
##          Wise County, Virginia => county; wise; category; hunt; feature; part
##                 Wise, Virginia => wise; climate; Register; add; defend; raider
##            Woodstock, Virginia => Woodstock; cavalry; regiment; time; frontier; rail
##           Wytheville, Virginia => bird; halfway; signer; assault; ambulance; sister