Faculty Publications By Year

Approaching the Largest 'API': Extracting Information From the Internet With Python

Publication Title

Code4Lib Journal

Jonathan E. Germann, Georgia State University

Document Type

Article

Publication Date

2-5-2018

Abstract

This article explores the need for libraries to algorithmically access and manipulate the world’s largest API: the Internet. The billions of pages on the ‘Internet API’ (HTTP, HTML, CSS, XPath, DOM, etc.) are easily accessible and manipulable. Libraries can assist in creating meaning through the datafication of information on the World Wide Web. Because most information is created for human consumption, some programming is required for automated extraction. Python is an easy-to-learn programming language with extensive packages and community support for web page automation. Four packages (Urllib, Selenium, BeautifulSoup, Scrapy) in Python can automate almost any web page for projects of any size. An example warrant data project is explained to illustrate how well Python packages can manipulate web pages to create meaning through assembling custom datasets.

Recommended Citation

Jonathan E. Germann, Approaching the Largest 'API': Extracting Information From the Internet With Python, Code4Lib J. (Feb. 5, 2018), http://journal.code4lib.org/articles/13197.

Institutional Repository Citation

Jonathan E. Germann, Approaching the Largest 'API': Extracting Information From the Internet With Python, Faculty Publications By Year 2601 (2018)
https://readingroom.law.gsu.edu/faculty_pub/2601
