# Class 24: More Network Types

### 2017-11-21

Today, we’ll look at three additional examples of networks. Pay attention, as you’ll be selecting from among these for the third and final project. Notice that all of these, like the Supreme Court citations, are too large to view all at once, so you’ll need to subset the edges or nodes.

## Baseball data

For those of you interested in sports data, I have two datasets constructed from Major League Baseball. The nodes in the first are the various MLB franchises:

And the edges indicate, in a given year, how many players on one team came from another team:

To do something interesting with this, you’ll need to take a subset of the years and (likely) truncate to only those edges with a large enough count. Here, I’ll look at 2010 and counts above 10:
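Here is a minimal sketch of that subsetting step. It is written in Python rather than whatever tooling the class uses. It assumes each edge is a record with the hypothetical fields `team_from`, `team_to`, `year`, and `count`. The real column names in the dataset may differ, and the team codes below are placeholders, not real data.

```python
from typing import NamedTuple


class TeamEdge(NamedTuple):
    # Hypothetical field names; check them against the actual edge table.
    team_from: str
    team_to: str
    year: int
    count: int


def subset_edges(edges: list[TeamEdge], year: int, min_count: int) -> list[TeamEdge]:
    """Keep edges from a single year whose player count is strictly above min_count."""
    return [e for e in edges if e.year == year and e.count > min_count]


# Toy edges with placeholder team codes.
toy = [
    TeamEdge("A", "B", 2010, 12),
    TeamEdge("A", "C", 2010, 3),
    TeamEdge("B", "C", 2009, 15),
]
print(subset_edges(toy, year=2010, min_count=10))
```

Changing `year` and `min_count` is all it takes to try other seasons or cut-offs.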

Interesting questions include local effects (how does the graph change over a specific decade?) and long-term effects (how does it change over a much longer time period?). For example, here is the graph from before the modern free-agency era:
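One possible way to put a number on how much the graph changes between two periods is the Jaccard similarity of their edge sets. This is the count of shared edges divided by the count of all distinct edges. The technique is a swapped-in suggestion, not something taken from the notes. The sketch treats edges as unordered team pairs, which is an assumption. If direction matters in the data, use ordered tuples instead. Team codes are placeholders.

```python
from collections.abc import Iterable


def edge_set(pairs: Iterable[tuple[str, str]]) -> set[frozenset[str]]:
    """Convert (team, team) pairs into a set of unordered edges."""
    return {frozenset(p) for p in pairs}


def jaccard(a: set, b: set) -> float:
    """Share of edges common to both graphs; 1.0 means identical edge sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


period_1 = edge_set([("A", "B"), ("B", "C"), ("C", "D")])
period_2 = edge_set([("B", "A"), ("C", "D"), ("D", "E")])
print(jaccard(period_1, period_2))  # 2 shared of 4 distinct edges -> 0.5
```

If you compute this for consecutive years, you can see when the league's trading patterns shift most sharply.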

The second baseball dataset is similar, but includes links between MLB teams and college teams for a given year:

Take a look at the data from 1950:

I can see some regional effects in the 1950s graph (the Red Sox have the only players from Providence and UConn, for example). Richmond even has a player on the Yankees roster!
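You can check observations like this one in code. Here is a minimal sketch. It assumes the college data for one year can be reduced to hypothetical `(college, mlb_team)` pairs. It lists the colleges whose players all went to a single MLB team. The names below are placeholders.

```python
from collections import defaultdict


def single_team_colleges(pairs: list[tuple[str, str]]) -> dict[str, str]:
    """Map each college linked to exactly one MLB team onto that team."""
    teams_by_college: dict[str, set[str]] = defaultdict(set)
    for college, team in pairs:
        teams_by_college[college].add(team)
    return {
        college: next(iter(teams))
        for college, teams in teams_by_college.items()
        if len(teams) == 1
    }


toy = [("College X", "Team A"), ("College X", "Team A"),
       ("College Y", "Team A"), ("College Y", "Team B")]
print(single_team_colleges(toy))  # {'College X': 'Team A'}
```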

## RFID Tags

The second set of graph data concerns RFID tags from a French hospital system over the course of 8 days. The nodes consist of patients, nurses, administrators, and physicians:

Each edge indicates that two entities came into contact with one another during a given 20-second time interval:
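Each record covers one 20-second interval, so summing the records for a pair gives an approximate total contact time. Here is a minimal sketch. It assumes hypothetical records of the form `(interval_start_seconds, id_1, id_2)`, with at most one record per pair per interval. Pairs are treated as unordered.

```python
from collections import Counter

INTERVAL_SECONDS = 20  # each contact record spans one 20-second window


def contact_durations(records: list[tuple[int, str, str]]) -> dict[tuple[str, str], int]:
    """Approximate total seconds of contact for each unordered pair of IDs.

    Assumes (hypothetically) at most one record per pair per interval.
    """
    intervals = Counter()
    for _start, a, b in records:
        pair = tuple(sorted((a, b)))  # so (x, y) and (y, x) count together
        intervals[pair] += 1
    return {pair: n * INTERVAL_SECONDS for pair, n in intervals.items()}


toy = [(0, "p1", "n1"), (20, "n1", "p1"), (40, "p1", "d1")]
print(contact_durations(toy))  # {('n1', 'p1'): 40, ('d1', 'p1'): 20}
```

The resulting weights could serve as edge weights or as a cut-off for keeping only sustained contacts.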

Interesting relationships emerge when you look at the graph over different time periods:
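One way to study a given period is to keep only the contacts inside a time window and tally them by the roles involved, for example how often nurses meet patients. This is a sketch under assumptions. The records are hypothetical `(time_seconds, id_1, id_2)` tuples, and a separate mapping from ID to role is assumed to be available. The role labels here are placeholders.

```python
from collections import Counter


def role_contacts(
    records: list[tuple[int, str, str]],
    roles: dict[str, str],
    start: int,
    end: int,
) -> Counter:
    """Count contacts by unordered role pair for times in [start, end)."""
    counts = Counter()
    for t, a, b in records:
        if start <= t < end:
            pair = tuple(sorted((roles[a], roles[b])))
            counts[pair] += 1
    return counts


roles = {"p1": "patient", "n1": "nurse", "d1": "physician"}
toy = [(0, "p1", "n1"), (20, "n1", "p1"), (3600, "p1", "d1")]
print(role_contacts(toy, roles, start=0, end=3600))
```

Sliding the window across a day shows how the mix of contacts shifts between, say, night and morning shifts.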

A particularly interesting approach would be to collect summary statistics for particular hours and then plot them. There is a lot of potential here, though it will take some digging into the dataset to find it.
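Here is a minimal sketch of such an hourly summary. It assumes times are hypothetical seconds since the start of data collection. For each hour, it reports the number of contact records and the number of distinct individuals involved. You could then pass the resulting rows to any plotting tool.

```python
from collections import defaultdict


def hourly_summary(records: list[tuple[int, str, str]]) -> list[tuple[int, int, int]]:
    """Return (hour, contact_count, distinct_people) rows sorted by hour."""
    contacts: dict[int, int] = defaultdict(int)
    people: dict[int, set[str]] = defaultdict(set)
    for t, a, b in records:
        hour = t // 3600  # hours elapsed since time zero
        contacts[hour] += 1
        people[hour].update((a, b))
    return [(h, contacts[h], len(people[h])) for h in sorted(contacts)]


toy = [(0, "p1", "n1"), (20, "n1", "p1"), (3700, "p1", "d1"), (3720, "n2", "d1")]
for row in hourly_summary(toy):
    print(row)
# (0, 2, 2)
# (1, 2, 3)
```

Suppose time zero falls at midnight. Then using `(t // 3600) % 24` as the key would pool the same hour of the day across all 8 days.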

## Shakespeare characters

The final dataset comes from character relationships in Shakespeare’s plays. There are two different sets of edges (there is no separate nodes table), depending on whether a link indicates that two characters talk to one another or that they appear within a fixed number of words of one another:
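There is no separate nodes table, but you can derive one from either edge list. Here is a minimal sketch. It assumes hypothetical edges of the form `(character_1, character_2, score)`. It collects every character that appears and records their degree, meaning the number of distinct neighbors.

```python
from collections import defaultdict


def nodes_from_edges(edges: list[tuple[str, str, float]]) -> dict[str, int]:
    """Build a node table mapping each character to its number of distinct neighbors."""
    neighbors: dict[str, set[str]] = defaultdict(set)
    for a, b, _score in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {name: len(nbrs) for name, nbrs in sorted(neighbors.items())}


toy = [("Char A", "Char B", 5.0), ("Char B", "Char C", 2.0), ("Char A", "Char B", 1.0)]
print(nodes_from_edges(toy))  # {'Char A': 1, 'Char B': 2, 'Char C': 1}
```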

There is a separate element for each play. As you can see, connections have scores that you could use to filter down to only the strongest relationships. Here is the network from “A Midsummer Night’s Dream”:
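Besides a fixed cut-off, you could keep only the k highest-scoring connections in each play. This may make plays with different score ranges easier to compare. This is a sketch under assumptions. The input is a hypothetical dict mapping each play title to a list of `(character_1, character_2, score)` edges, and higher scores are assumed to mean stronger relationships.

```python
import heapq


def strongest_edges(
    plays: dict[str, list[tuple[str, str, float]]], k: int
) -> dict[str, list[tuple[str, str, float]]]:
    """For each play, keep the k edges with the largest scores (highest first)."""
    return {
        title: heapq.nlargest(k, edges, key=lambda e: e[2])
        for title, edges in plays.items()
    }


toy = {"Play 1": [("A", "B", 3.0), ("B", "C", 9.0), ("A", "C", 1.0)]}
print(strongest_edges(toy, k=2))  # {'Play 1': [('B', 'C', 9.0), ('A', 'B', 3.0)]}
```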

The clusters line up well with the different aspects of the play. And here is the same set of characters using the speech network:
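A simple numeric counterpart to the visual clusters is to drop weak edges and list the connected components that remain. This is not necessarily how the figure was made. The sketch uses a hypothetical score threshold and edges of the form `(character_1, character_2, score)`. Characters whose edges all fall below the threshold drop out entirely.

```python
from collections import defaultdict


def components(edges: list[tuple[str, str, float]], min_score: float) -> list[set[str]]:
    """Connected components of the graph built from edges with score >= min_score."""
    adj: dict[str, set[str]] = defaultdict(set)
    for a, b, score in edges:
        if score >= min_score:
            adj[a].add(b)
            adj[b].add(a)
    seen: set[str] = set()
    result = []
    for start in adj:
        if start in seen:
            continue
        # Depth-first search from a character not yet assigned to a component.
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        result.append(comp)
    return result


toy = [("A", "B", 5.0), ("B", "C", 4.0), ("D", "E", 6.0), ("C", "D", 1.0)]
print([sorted(c) for c in components(toy, min_score=2.0)])  # [['A', 'B', 'C'], ['D', 'E']]
```

Raising the threshold splits the cast into more, smaller groups. Comparing where those splits happen between the speech and proximity networks is one way to use the cut-off.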

You can study and compare multiple plays; here is “Romeo and Juliet”:

An interesting project would be to study the differences and similarities between the plays. Do the plays themselves relate to one another in any way?
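A small table of summary statistics per play is one place to start a comparison. This is a sketch under assumptions. Each play's edges are hypothetical `(character_1, character_2, score)` tuples. Density is the number of distinct undirected edges divided by the number of possible pairs, n(n - 1)/2.

```python
def play_summary(edges: list[tuple[str, str, float]]) -> dict[str, float]:
    """Node count, distinct undirected edge count, and density for one play."""
    pairs = {frozenset((a, b)) for a, b, _ in edges if a != b}  # ignore self-loops
    nodes = {name for pair in pairs for name in pair}
    n = len(nodes)
    possible = n * (n - 1) / 2
    return {
        "nodes": n,
        "edges": len(pairs),
        "density": len(pairs) / possible if possible else 0.0,
    }


plays = {
    "Play 1": [("A", "B", 1.0), ("B", "C", 2.0), ("A", "C", 0.5)],
    "Play 2": [("A", "B", 1.0), ("C", "D", 1.0)],
}
for title, edges in plays.items():
    print(title, play_summary(edges))
```

Grouping these rows by genre (comedy, tragedy, history) would be one way to ask whether the plays relate to one another.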

## Project III

The third project requires you to do a data analysis based on network data. Your analysis should not focus on just a single network, but should consist of comparing multiple networks to find more interesting meta-patterns. I am open to other suggestions, but generally I recommend that you use one of the following datasets:

• Wikipedia links data (double hops; text vs. citation vs. co-citation; contrast different starting points)
• Baseball datasets (compare college vs. pro; look across years; apply different cut-offs)
• Shakespeare plays (compare speech vs. time; play with the cut-off; compare plays, perhaps clustered on type: comedy, tragedy, history)
• RFID data (look at the graph over time; compare across days, hours, types, and individuals)
• Supreme Court citations (look at the graph for various issues, perhaps over time, using different cut-offs and looking at citation and co-citation graphs)
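Several of these suggestions mention co-citation graphs. In a co-citation graph, two documents are linked when a third document cites both of them. The weight of the link is the number of such citing documents. Here is a minimal sketch. It assumes hypothetical directed `(citing, cited)` pairs.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cocitation(citations: list[tuple[str, str]]) -> Counter:
    """Weight each unordered pair of cited documents by how many documents cite both."""
    cited_by: dict[str, set[str]] = defaultdict(set)
    for citing, cited in citations:
        cited_by[citing].add(cited)
    weights: Counter = Counter()
    for refs in cited_by.values():
        for pair in combinations(sorted(refs), 2):
            weights[pair] += 1
    return weights


toy = [("X", "A"), ("X", "B"), ("Y", "A"), ("Y", "B"), ("Y", "C")]
print(cocitation(toy)[("A", "B")])  # 2
```

The same function applies to Wikipedia links if pages play the role of citing and cited documents.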

The end goal is to find something interesting and relay those ideas through graphics and/or models in your data analysis report. This should more closely resemble the first data analysis than the second one (i.e., there should be a thesis rather than a hypothesis). We will have presentations on these reports during the final week of the term.