The third and fourth projects require you to build a corpus of pages and apply the Wiki Text Explorer code to this corpus. Today I will show you a few other details for putting together your projects. Start by loading the course modules and checking that they are up to date:
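# Load the course modules (names taken from the version checks below)
import wiki
import wikitext
import iplot

# Stop early if any module is older than this lesson expects
assert wiki.__version__ >= 6
assert wikitext.__version__ >= 2
assert iplot.__version__ >= 3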
The wikitext module includes a function, get_internal_links, that
takes a link to a Wikipedia page and returns a dictionary with three elements.
links_us = wikitext.get_internal_links('History_of_the_United_States')
links_us.keys()
dict_keys(['ilinks', 'ilinks_p', 'ilinks_li'])
The element 'ilinks' includes all links given on the page. The element 'ilinks_p' includes only the links contained within paragraph tags, and 'ilinks_li' includes only the links given inside list items (that is, bullet points). You can use any of these to build a corpus of interest for your projects.
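If you want a corpus built from more than one of these elements, one option is to merge the lists and drop repeated pages first. The sketch below assumes each element is a list of page-name strings; the helper merge_links and the sample dictionary links_demo are hypothetical stand-ins for illustration and are not part of wikitext.

```python
def merge_links(*link_lists):
    """Combine several lists of page names, keeping first-seen order
    and dropping duplicates."""
    seen = set()
    merged = []
    for links in link_lists:
        for link in links:
            if link not in seen:
                seen.add(link)
                merged.append(link)
    return merged


# Hypothetical stand-in for the dictionary returned by get_internal_links
links_demo = {
    'ilinks': ['Thirteen_Colonies', 'American_Revolution',
               'American_Civil_War', 'New_Deal'],
    'ilinks_p': ['Thirteen_Colonies', 'American_Revolution'],
    'ilinks_li': ['American_Revolution', 'New_Deal'],
}

# Pages linked from paragraphs first, then any extra pages from list items
page_names = merge_links(links_demo['ilinks_p'], links_demo['ilinks_li'])
print(page_names)
```

With the real dictionary you would pass links_us['ilinks_p'] and links_us['ilinks_li'] instead of the sample lists.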
Now, let's say that you've created a WikiCorpus object, such as this one:
wcorp = wikitext.WikiCorpus(links_us['ilinks_p'])
The wcorp object provides a method, json_meta_template, that produces a dictionary with
template names for all of the topics and clusters in our corpus. Here we will
save that dictionary to a file named 'history-us.json':
import json

# Write the template dictionary to disk as indented, human-readable JSON
with open('history-us.json', 'w') as fout:
    json.dump(wcorp.json_meta_template(), fout, indent=2)
Now, open the file in Jupyter (or your favorite text editor). Change the name of the first topic, and then create the Text Explorer output:
You should see that the page now names the first topic whatever you renamed it to. A large part of the next project involves constructing names for all of your topics.
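Editing JSON by hand makes it easy to leave behind a stray comma or an unclosed quote, so it can help to confirm that the file still parses before rebuilding the output. The following is a minimal sketch using only the standard json module; the helper check_json_file is a hypothetical name, and the two small files it writes are made-up examples rather than the real template.

```python
import json
import os
import tempfile


def check_json_file(path):
    """Return (True, data) if the file parses as JSON,
    otherwise (False, a short message locating the error)."""
    try:
        with open(path, 'r', encoding='utf-8') as fin:
            return True, json.load(fin)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"


# Demonstrate on two throwaway files: one valid, one with a trailing comma
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, 'good.json')
    bad = os.path.join(tmp, 'bad.json')
    with open(good, 'w', encoding='utf-8') as fout:
        fout.write('{"name": "Colonial era"}')
    with open(bad, 'w', encoding='utf-8') as fout:
        fout.write('{"name": "Colonial era",}')
    good_ok, good_result = check_json_file(good)
    bad_ok, bad_result = check_json_file(bad)

print(good_ok, bad_ok)
print(bad_result)
```

On your own file you would call check_json_file('history-us.json') after each round of renaming and fix the reported line if it fails.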