Glossary (Module 4: API and JSON Data)

API (Application Programming Interface)

An interface that lets one program request data or services from another. In this project, the API is a web service that returns structured text data in JSON format.

Endpoint

A specific URL used to access data from an API.

HTTP Request

A message sent from a client (your Python script) to a server (the API) to request data.

HTTP Request Headers

Metadata sent with an HTTP request that provides additional information about the client or the requested data format.
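As a rough illustration of how an endpoint, a request, and its headers fit together, the sketch below builds (but does not send) an HTTP request using Python's standard library. The URL and header values are hypothetical placeholders, not the ones used in this project.

```python
from urllib.request import Request

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "https://api.example.com/posts"

# Build the request object; nothing is sent over the network here.
request = Request(
    ENDPOINT,
    headers={
        "Accept": "application/json",      # ask the server for a JSON response
        "User-Agent": "glossary-example",  # identify the client (assumed name)
    },
)

print(request.get_method())          # HTTP method of the request
print(request.full_url)              # the endpoint the request targets
print(request.get_header("Accept"))  # one of the request headers
```

Sending the request would require a real endpoint, for example by passing the object to urllib.request.urlopen.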

JSON (JavaScript Object Notation)

A structured data format commonly used by web APIs. JSON data is organized using:

  • objects (key-value pairs, like Python dictionaries)
  • arrays (ordered lists, like Python lists)

JSON Object

A collection of key-value pairs. In Python, this is represented as a dictionary.

JSON Array

An ordered collection of values. In Python, this is represented as a list.
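A minimal sketch of how JSON objects and arrays map to Python types when parsed with the standard json module. The JSON text here is a made-up sample, not data from the project's API.

```python
import json

# Hypothetical JSON text, shaped like something an API might return.
raw_text = '{"id": 1, "title": "Hello", "tags": ["intro", "api"]}'

# json.loads parses JSON text into Python objects.
data = json.loads(raw_text)

print(type(data).__name__)          # the JSON object becomes a dict
print(type(data["tags"]).__name__)  # the JSON array becomes a list
print(data["tags"][0])              # list items are accessed by position
```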

Record

A single unit of data within a JSON structure. In this project, each record typically represents one item (e.g., one post).

Key

A name used to identify a value within a JSON object.

Value

The data associated with a key in a JSON object. A value can be a string, number, boolean, null, object, or array.
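To make the terms record, key, and value concrete, the sketch below walks through a small list of records that have already been parsed into Python. The field names ("id", "title") are assumed for illustration only.

```python
# Hypothetical records, already parsed from JSON into Python dictionaries.
records = [
    {"id": 1, "title": "First post"},
    {"id": 2, "title": "Second post"},
]

# Each dictionary is one record; each entry in it is a key-value pair.
for record in records:
    for key, value in record.items():
        print(f"{key} = {value}")
```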

Nested Structure

A JSON structure that contains objects or arrays inside other objects or arrays.
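A short sketch of reaching into a nested structure by chaining keys and positions. The sample data and its field names are invented for this example.

```python
import json

# Hypothetical nested JSON: objects inside objects, arrays inside objects.
raw_text = """
{
  "post": {
    "id": 7,
    "author": {"name": "Ada", "roles": ["editor", "admin"]},
    "comments": [
      {"id": 1, "body": "Nice"},
      {"id": 2, "body": "Thanks"}
    ]
  }
}
"""

data = json.loads(raw_text)

# Chain keys (for objects) and indexes (for arrays) to reach inner values.
author_name = data["post"]["author"]["name"]
first_role = data["post"]["author"]["roles"][0]

# Collect one field from every object in a nested array.
comment_bodies = [comment["body"] for comment in data["post"]["comments"]]

# dict.get returns None instead of raising KeyError when a key is absent.
email = data["post"]["author"].get("email")

print(author_name, first_role, comment_bodies, email)
```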

Pipeline

A sequence of processing stages where data flows from a source to a sink.

EVTL (Extract, Validate, Transform, Load)

A pipeline model used in this project:

  • Extract: acquire data from a source
  • Validate: inspect structure and confirm the data is usable
  • Transform: reshape the data into a structured format
  • Load: write the data to a destination
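The four stages above can be sketched as four small functions chained together. This is only an illustration under stated assumptions: the source is a hard-coded JSON string rather than a live API, the sink is an in-memory text buffer rather than a file, the standard csv module stands in for whatever tools the project uses, and the function and field names are hypothetical.

```python
import csv
import io
import json

# Hypothetical source data; a real pipeline would fetch this from an API.
SAMPLE_JSON = (
    '[{"id": 1, "title": "First", "body": "a"},'
    ' {"id": 2, "title": "Second", "body": "b"}]'
)


def extract(raw_text):
    """Extract: convert raw JSON text into Python objects."""
    return json.loads(raw_text)


def validate(records):
    """Validate: confirm the data has the structure later stages rely on."""
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    for record in records:
        if not isinstance(record, dict) or "id" not in record or "title" not in record:
            raise ValueError(f"unusable record: {record!r}")
    return records


def transform(records):
    """Transform: keep only the fields wanted in the output."""
    return [{"id": r["id"], "title": r["title"]} for r in records]


def load(rows, sink):
    """Load: write the rows to the sink as CSV."""
    writer = csv.DictWriter(sink, fieldnames=["id", "title"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


sink = io.StringIO()  # in-memory stand-in for a CSV file
load(transform(validate(extract(SAMPLE_JSON))), sink)
print(sink.getvalue())
```

Keeping each stage in its own function means a failure can be traced to one stage, and a stage can be swapped (for example, a different sink) without touching the others.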

Source

The origin of data in a pipeline stage. Examples include an API endpoint or an input file.

Sink

The destination where data is written after processing. Examples include a CSV file or a database.

Extract

The stage of the pipeline that retrieves data from an external source and converts it into Python objects.

Validate

The stage of the pipeline that inspects the structure of the data and checks that it meets expectations before use.

Transform

The stage of the pipeline that reshapes data into a structured, analysis-ready format.

Load

The stage of the pipeline that writes the processed data to a chosen destination.

DataFrame

A tabular data structure with rows and columns. In this project, Polars DataFrames are used to store structured data.

Schema

The structure of data, including field names and expected data types.

Inspection

The process of examining data to understand its structure, including types, keys, and organization.

Validation

The process of confirming that data meets expected structure, types, and required fields.
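One simple way to express a schema and validate a record against it is sketched below. The schema, field names, and the find_problems helper are hypothetical; the project may check structure differently.

```python
# Hypothetical schema: required field names mapped to expected Python types.
EXPECTED_SCHEMA = {"id": int, "title": str}


def find_problems(record, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(record, dict):
        return [f"expected an object, got {type(record).__name__}"]
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return problems


good = {"id": 1, "title": "Hello"}
bad = {"id": "1"}  # wrong type for id, and title is missing

print(find_problems(good, EXPECTED_SCHEMA))
print(find_problems(bad, EXPECTED_SCHEMA))
```

Returning a list of problems, rather than stopping at the first one, makes it easier to report everything wrong with a record in a single pass.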

Normalization

The process of converting data into a consistent, structured format where each record follows the same schema.
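As a small sketch of normalization, the code below forces every record into the same set of fields, filling gaps with None and dropping fields that are not part of the schema. The field list and the normalize helper are assumptions made for this example.

```python
# Hypothetical target schema: every normalized record has exactly these fields.
FIELDS = ("id", "title", "author")


def normalize(record):
    """Return a copy of the record with exactly the fields in FIELDS."""
    # Missing fields become None; fields outside FIELDS are dropped.
    return {field: record.get(field) for field in FIELDS}


raw_records = [
    {"id": 1, "title": "First", "author": "Ada", "extra": True},
    {"id": 2, "title": "Second"},  # no author
]

normalized = [normalize(record) for record in raw_records]
for row in normalized:
    print(row)
```

Once every record shares the same fields in the same order, the list can be loaded into a tabular structure such as a DataFrame without per-record special cases.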

Reproducibility

The ability to run the same pipeline and obtain consistent results, given the same inputs and configuration.