Learn Javascript

Javascript – Working with Data

"Above all else, show the data."– Edward R. Tufte

Data Collections

In addition to primitive types (boolean, number, and string), Javascript can store collections (e.g. arrays, objects, etc.)

For more on arrays, objects, and other collections see: Data Structures or Learn Computing > Data Types
// array
let primaries = ["red", "green", "blue"];
// object
var singer = {
    "name": "Bruce Springsteen",
    "albums": [{
            "name": "Born to Run",
            "year": 1975
        }, {
            "name": "Born in the U.S.A.",
            "year": 1984
        }, {
            "name": "The Ghost of Tom Joad",
            "year": 1995
        }]
};

Data Exchange Formats

  • Data collections can be stored (serialized) in code, or imported from external files, databases, or API servers.
  • Exchange formats standardize how data is stored and transmitted. These are generally tabular or hierarchical.
"make", "model", "year"
"Pontiac", "Vibe", 2009
"Honda", "Fit", 2007
"Chevrolet", "Blazer", 2000
"Toyota", "RAV4", 1999
Tabular data exchange formats (e.g. CSV, TSV, etc.)
{
    "cars": [{
        "make": "Pontiac",
        "model": "Vibe",
        "year": 2009
    }, {
        "make": "Honda",
        "model": "Fit",
        "year": 2007
    }, {
        "make": "Chevrolet",
        "model": "Blazer",
        "year": 2000
    },
    ...]
}
Hierarchical data exchange formats (e.g. JSON or XML)

Using External Data

Important concepts to know when importing external data into your program:

  • External data sent over a network is serialized (as a string) and must be deserialized (stored as data in computer memory) to use in your program.
  • The most common method to retrieve external data in Javascript is the fetch(), which can deserialize using the .json() method.
  • Your program must have access to retrieve external data by the CORS policy on the remote server.

Serialization

  • When you create a variable in your program it stores data in the memory of the computer. If you restart your computer that data is lost.
  • To save data that can be accessed again or sent across a network you must serialize it using a data exchange format

Deserialization

  • Likewise, when you load data from an external source it arrives as plain text. For example, data returned from this random cat facts API (docs) arrives as one large string:
[{"type":"cat", "text":"In the original Italian version of Cinderella, the benevolent fairy godmother figure was a cat."}, {"type":"cat","text":"When a cats rubs up against you, the cat is marking you with it's scent claiming ownership."}];

Deserialization

  • Before you can use external data in your code, you must conversely deserialize the string into a data structure that Javascript can read and use.
  • JSON.parse() can be used to convert a string of serialized JSON data into a Javascript Object.
let obj = JSON.parse(str);
console.log(obj[1].text);
// -> "When a cats rubs up against you, the cat is marking you with it's scent claiming ownership."

Deserialization

After the above, the JSON string has now been deserialized, and is an object that Javascript can use.

[{
	"type": "cat",
	"text": "In the original Italian version of Cinderella, the benevolent fairy godmother figure was a cat."
}, {
	"type": "cat",
	"text": "When a cats rubs up against you, the cat is marking you with it's scent claiming ownership."
}]

You can also convert a Javascript Object back to a string with JSON.stringify().

console.log(JSON.stringify(str)); // -> "[{"type":"cat","text":"In the original Italian version ... ]"

Load data with Fetch

  • Javascript's Fetch() is the most common method to load external data over HTTP.
  • Fetch is asynchronous, and return a Promise object, which is resolved once the asynchronous response has been received.
// fetch requests an external resource from endpoint
fetch('https://catfact.ninja/fact')
    // then passes response to next item in chain
	.then(response => { 
        // log the entire response object stored in variable
        console.log(response);
        // -> 200, "OK"
	    console.log(response.status, response.statusText);     
    });

Load data with Fetch

fetch('https://catfact.ninja/fact') 
    .then(response => {
		// parse response.body (convert to JSON), pass to next
        return response.json();  
    })
    // data = deserialized response body
    .then(data => { 
        console.log(data); 
    });

Fetch examples

  1. Fetch (simple) => catfact.ninja
  2. Fetch in a function (randomuser.me)
  3. Fetch in a function - Asynchronous code demo
  4. More fetch() examples

Cross-Origin Resource Sharing

  • Cross-Origin Resource Sharing (a.k.a. CORS policies) control how Javascript in the browser can access external resources.
  • Without CORS, any website you visit could access your local filesystem! file:///Users/username/allYourPersonalData
  • Unless explicitly enabled by the host defaults are...

CORS error message

If you see the following error it likely means you are trying to load data:

  • from a server or API that doesn't allow cross-origin requests
  • from your local file system file:/// or C:
Access to fetch at 'remote-url' from origin 'your-url' has been blocked by CORS policy
No 'Access-Control-Allow-Origin' header is present on the requested resource.

CORS Rules

  • You can import files from the same origin as the file you are importing it into.
    • ✅ script at https://foo.com can access https://foo.com/data.json (they have the same origin)
  • You cannot import data across origins without permission.
    • 🚫 script at https://foo.com cannot access a resource at a different origin https://bar.com/data.json
  • You cannot import data using the file:// protocol, even with the same origin.

How to fix CORS (simple)

If you are loading a .json file from within your own project, just change the file type and store the data as an object in a Javascript variable:

e.g. data.json

{
    "key": "value"
}

...becomes data.js with a variable

let data = {
    "key": "value"
}

Test for CORS

  1. Open a page that uses remote data (e.g. Bootstrap or Google Fonts)
  2. Open DevTools, click on the Network tab, and refresh the page
  3. Select a Bootstrap file in the resources list.
  4. Under Headers, look for Access-Control-Allow-Origin:
    • ✅ A wildcard * (asterisk) = all requests are accepted.
    • 🚫 If not set, then cross-origin is not allowed
  5. Run these in the console and repeat #4 for each resource. Also see this tester
fetch("https://github.com").catch(e => console.log(e)); // 🚫
fetch("https://cdn.jsdelivr.net/").catch(e => console.log(e)); // 🚫
fetch("https://cdn.jsdelivr.net/npm/").catch(e => console.log(e)); // ✅

How to fix CORS

Unless the remote host has explicitely banned cross-origin requests

you are using remote data you will need to

To get around this while you are developing locally, you can do one of the following:

  • Upload your files to a web server => Commit and push your code to Github.io each time you make a change. This can be very tedious.
  • Run a web server on your own computer and test locally (see next slide)

How to run a web server

There are several ways to do this.

Quick:

Long term solutions:

Load data with D3 (.json, .csv, ...)

D3's d3-fetch module provides several methods to load and deserialize external data via HTTP request. For more information see the documentation or this Data Loading in D3 tutorial.

Install D3 in your HTML

<script src="https://d3js.org/d3-dsv.v2.min.js"></script>
<script src="https://d3js.org/d3-fetch.v2.min.js"></script>

Load data with D3 (.json, .csv, ...)

d3.json() return a Javascript object from JSON

// as of v5 D3 uses a promise-based syntax
d3.csv("/path/to/file.csv").then(function(data) {
	console.log(data); // use data here
});

d3.csv() return a Javascript object from CSV

d3.csv("/path/to/file.csv").then(function(data) {
	console.log(data); // use data here
});

Several other CSV parsers exist, including:

  • jquery-csv - a jQuery CSV parser (returns a 2D array from CSV string)
  • CSV - a universal JavaScript CSV parser (returns a 2D array from CSV string)

Loading data from Google Sheets

I cover this in a video How to export JSON data from google sheets (34:00)

Method 1 - Download it manually

While the easiest to start, it will quickly become tedious if you plan to update your spreadsheet often.

  1. In Google Sheets, choose File > Download > Comma-separated values
  2. Move the CSV file into your project and import it directly using one of the above methods, or convert it permanently to JSON.

Method 2 – Make it publicly accessible

Fairly simple to implement, this method allows you to reimport your data instantly with a single command line script. See sample-node-tools/node-projects/export-google-sheets for a working example.

  1. Install code
  2. Make your Google Sheet publicly accessible and publish it to the web
  3. Define the spreadsheet URL (see this guide) or optionally, the query documentation. Also see this post)
  4. Fetch and return JSON data

Method 3 - Use the Google Sheets API

This is the most involved method. See their tutorial and documentation.

  1. Enable the Google Sheets API
    1. Give it an obvious name
    2. Choose Desktop as the client
    3. Download the client configuration and save the file credentials.json to your working directory.
  2. Install the client library
  3. Set up the sample and name it quickstart.js
  4. Run the sample

How to get data from an API

See https://omundy.github.io/learn-computing/slides/data-apis.html

Tips for writing code using APIs

Using remote data adds latency and points where problems can occur:

  • Start with a tool that makes it easy to see what is returned. For example, test that your requests work in the browser or Postman
  • Save sample responses (test data) locally to develop your application without bumping into a rate limit.
  • APIs introduce latency so use asynchronous programming (async/await, promises, etc.)
  • Many APIs use rate limiting to prevent abuse. They identify your requests using your key.

Data Storage

Considerations for using data in a project:

  1. Is your data static (unchanging) or dynamic (and how often does it changes)?
  2. Do you need to insert or update data (from users, a scraper, or other means)?
  3. Will you need to query the data (by searching, sorting, etc.)?
  4. How much data will you need to store or request across a network?
  5. What data formats do you have / will you need?
  6. Do you need to clean or transform your data?
  7. Do all of your datasets need to be treated the same?

Example storage situations:

  • I have a spreadsheet (static), with many columns I don't need (not clean)
  • I want users to be able to (dynamic) add (insert) or edit (update) content
  • I want to use live data (dynamic) via a remote API (request)

Static vs. Dynamic

  • Static data is content that you download into your own project and doesn't change. You can clean, modify, or improve it to make it easier to use in your project.
  • Dynamic data is always changing. It could be due to users editing content, or sensor readings from a weather station. To use dynamic data you'll need to evaluate how often you need to update your own content, whether you'll save data, based on the needs of your project.

Flat files vs. Databases

  • A "flat file" is any single or collection of plain text files that stores data (e.g. CSV, TSV, JSON, etc.) They are easy to write, transform, or transmit, but since they must be read from memory they are slow and difficult to query.
  • A database requires some additional work to set up the software and write the code that uses it. The benefit is that it is easy to insert, update, and query your data.
  • How to install MySQL on MacOS

Browser-based storage

  • localStorage - For Storing data locally in your users' browser. Can be used by extensions or websites.
  • cookies - Another form of storage, typically used to identify users who are logged-in to an application.

See https://omundy.github.io/learn-javascript/topics/data-persistence/slides.html

Data Cleaning

See https://omundy.github.io/learn-computing/slides/data-cleaning.html

Data Conversion

Convert CSV to JSON

  1. Download a CSV from your spreadsheet
  2. Use Node and csvtojson to convert the file
  3. Save as a file using writeFile

For example, to convert this file ../datasets/temps.csv

C, F, description
-273.15, -459.67, absolute zero temperature
0, 32.0, freezing/melting point of water
21, 69.8, room temperature
37, 98.6, average body temperature
100, 212, boiling point of water

to deserialized Javascript code, two-dimensional array would be the closest equivalent data structure in Javascript for tabular data.

const tempsArr = [
	['C', 'F', 'description']
	['-273.15', '-459.67', 'absolute zero temperature'],
	['0', '32.0', 'freezing/melting point of water'],
	['21', '69.8', 'room temperature'],
	...
];

Convert data to a JS object can make it easier to use. Below, each row can be referenced using their their keys, and numbers have been retyped.

const tempsObjectsArr = [
	{
		C: -273.15,
		F: -459.67,
		description: 'absolute zero temperature'
	},{
		C: 0,
		F: 32.0,
		description: 'freezing/melting point of water'
	},{
		C: 21,
		F: 69.8,
		description: 'room temperature'
	}
	...
];
console.log(`${tempsObjectsArr[2].description} is about ${tempsObjectsArr[2].F} F`);
// -> 'room temperature is about 69.8 F'

Resources

- [Data Collections](#data-collections) - [Data Exchange Formats](#external-data) - [Using External Data](#using-external-data) - [Cross-Origin Resource Sharing](#cross-origin-resource-sharing) - [Serialization and Deserialization](#serialization-and-deserialization) - [Loading External Data](#loading-external-data) - [Fetch](#fetch) - [D3 (.json, .csv, ...)](#d3-json-csv-) - [Loading data from Google Sheets](#loading-data-from-google-sheets) - [Method 1 - Download it manually](#method-1-download-it-manually) - [Method 2 – Make it publicly accessible](#method-2-make-it-publicly-accessible) - [Method 3 - Use the Google Sheets API](#method-3-use-the-google-sheets-api) - [How to get data from an API](#how-to-get-data-from-an-api) - [Data Storage](#data-storage) - [Static vs. Dynamic](#static-vs-dynamic) - [Flat files vs. Databases](#flat-files-vs-databases) - [Browser-based storage](#browser-based-storage) - [Data Cleaning](#data-cleaning) - [Data Conversion](#data-conversion) - [Convert CSV to JSON](#convert-csv-to-json)

- Advanced: [Create a proxy server and run it on a live server (e.g. Heroku)](https://github.com/omundy/sample-node-proxy-server)