Module 1 covered the fundamentals of data, databases, and database management systems (DBMS). Databases are everywhere, and whether we realize it or not, we use them in almost all of our daily activities. We know that some companies collect terabytes of user data every single day. But did you know that some organizations have collected massive amounts of data and make that data available to the general public for free? One benefit of these public databases is that you can often query them with nothing more than a simple input form or two! As you learn more about databases throughout this course, you will begin to appreciate the skill and knowledge required to develop and populate databases like these.
Below are some large public databases. For this discussion, you may use one of these or any large public database you find.
Annotated Human Genome Data: This is a large database (310 GB) of genome information for humans and about 50 other species. It offers several ways to access the data, including a simple web interface, export via FTP, queries against a MySQL server, a Perl API, and even a data-mining tool.
Historical Weather Datasets: The National Oceanic and Atmospheric Administration (NOAA) provides time series data and climatological data in several different databases, each covering a different series of data. The data can be accessed through a browser interface as well as through specialized tools provided by NOAA.
The Library of Congress: This massive database contains 133 terabytes of compressed data, and a single search could take up to 24 hours to complete. The Library of Congress also maintains numerous databases on a variety of subjects. Although the information is extensive, it is available through website interfaces rather than through specialized tools or APIs.
The CIA World Factbook: This massive database provides information on history, people, societies, government, economy, energy, geography, communications, transportation, military, and transnational issues for 267 world entities. It includes a variety of world, regional, country, oceanic, and time zone maps; flags of the world; and a country comparison function that ranks country information and data in more than 75 Factbook fields.
Pew Research Center: Pew Research Center is a nonpartisan fact tank that informs the public about the issues, attitudes, and trends shaping the world. It conducts public opinion polling, demographic research, content analysis, and other data-driven social science research.
U.S. Labor, Economic, and Census Data: Various departments of the U.S. government provide large datasets on the U.S. economy, as well as on labor, commerce, housing, and the census. These databases can be accessed through a browser as well as through customized APIs and downloads. Sources include:
The U.S. Bureau of Labor Statistics website
The United States Census Bureau website
Instructions
For this discussion, you will select a large public database and do some research. Your initial posting should include at least the following, but feel free to add anything else interesting you discover. (Please note that the examples are just to get you started; your answers should be more substantial.)
The name and link to the database (e.g., The U.S. Bureau of Labor Statistics at https://www.bls.gov/data/ )
What kind of data does this database collect? (e.g., This database collects data about employment, compensation, living and working conditions….)
Why did you choose this public database? (e.g., I am thinking about a job change and …)
What were you trying to find out? (e.g., I wanted to find out the average salary of a private archivist.)
Provide the keywords or query text you used in your search
OR, if you navigated by clicking links,
Provide a breadcrumb trail showing where you found the information (an example breadcrumb appears below):
Bureau of Labor Statistics > Publications > Occupational Outlook Handbook > Education, Training, and Library
Provide the results of your search, or an explanation of why you think you were not able to get the results you expected. (e.g., A private archivist makes an average of $60,050 per year.)