Interactive Data Visualization using D3.js, DC.js, Nodejs and MongoDB

Hello All,
The aim behind this blog post is to introduce open source business intelligence and to explore a dataset using D3.js, DC.js, Node.js and MongoDB.
Over the course of this post we will see what each component contributes, and we will do some code-based customization as well.

The Need for Visualization:

Visualization is the front end of modern business intelligence systems. I have been around in quite a few big data architecture discussions, and to my surprise I found that most of them focus on the backend components: the repository, the ingestion framework, the data mart, the ETL engine, the data pipelines, and then some visualization.

I might be biased in favor of visualization technologies, as I have been working on them for a long time. Needless to say, visualization is as important as any other component of a system, and I hope most of you will agree with me on that. Visualization is instrumental in inferring trends from the data, spotting outliers and making sense of the data points.
What they say is right: a picture is indeed worth a thousand words.

The components of our analysis and their function:

D3.js: A JavaScript-based visualization engine which renders interactive charts and graphs from data.
DC.js: A JavaScript-based wrapper library for D3.js which makes plotting the charts a lot easier.
Crossfilter.js: A JavaScript-based data manipulation library. Works splendidly with DC.js and enables two-way data binding.
Node.js: Our powerful server platform, which serves data to the visualization engine and also hosts the webpages and JavaScript libraries.
MongoDB: The resident NoSQL database, which will serve as a fantastic data repository for our project.

The Steps to Success:

  1. Identifying what our analysis will do
  2. Fetching the data and storing it in MongoDB
  3. Creating a Node.js server to get data from MongoDB and host it as an API
  4. Building our frontend using D3.js, DC.js and some good old JavaScript

Here’s what the end result will look like (might take a couple seconds to load):

(Screenshot: the finished dashboard)

STEP 1: Identifying our Analysis

We will be analyzing data from DonorsChoose.org, a US-based nonprofit organization that allows individuals to donate money directly to public school classroom projects. We will get the dataset and try to build an informative analysis from its attributes. We will have more clarity on this once we actually see the dataset. I have taken a subset of the original dataset for our analysis purposes; it contains nearly 9,000 rows.

Step 2: Fetching the data and storing it in MongoDB

The original dataset from DonorsChoose.org is available here. For our project we will be using a portion of the “Project” dataset. I have chosen this data as I am somewhat familiar with it. The “Project” dataset contains datapoints for the classroom projects listed with DonorsChoose.org. I have intentionally used a small subset of the data so that we can focus on getting the charts done quickly rather than waiting for the data to be fetched each time we refresh. That said, you can always ingest the original dataset once you have mastered this post. To download the required dataset click here.

Extract the archive to your desktop or a suitable directory.

After unzipping the downloaded file, we get a file named sampledata.csv with a size of around 3 megabytes. The data is in CSV (comma-separated values) format. CSV files are a popular option for storing data in a tabular format, and are easy to interpret and parse. The file contains nearly 9,000 records and 44 attributes which we can use for our analysis.
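To make the format concrete, here is a rough sketch in plain JavaScript of how the header line and one data row combine into a record. The field names below are three of the actual columns; the values are made up for illustration:

```javascript
// A naive sketch of CSV-to-record mapping (illustrative only; a real CSV
// parser must also handle quoted fields that contain commas).
function parseCsvLine(header, line) {
  var keys = header.split(",");
  var values = line.split(",");
  var record = {};
  keys.forEach(function (key, i) { record[key] = values[i]; });
  return record;
}

// The header line gives the field names; each data row becomes one record.
var doc = parseCsvLine(
  "school_state,resource_type,total_donations",
  "CA,Books,350"
);
console.log(doc.school_state);    // "CA"
console.log(doc.total_donations); // "350" (still a string!)
```

Note that every value comes out of a CSV as a string, which is why we will coerce numbers and dates later on in our d3 code.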

Let's install MongoDB. I am using a Windows environment, but the process is much the same on all platforms. The installation manuals can be found here. For the Windows environment, open your command prompt, go to the folder where you installed MongoDB and locate the bin folder. In my case the path is:

C:\Program Files\MongoDB\Server\3.0\bin>

Fire up your MongoDB by running mongod.exe. Leave that prompt running as it is, open another command prompt, navigate to the bin directory again and enter this command:

C:\Program Files\MongoDB\Server\3.0\bin>mongoimport -d donorschoose -c projects --type csv --headerline --file C:\Users\Anmol\Desktop\sampledata\sampledata.csv

You will see MongoDB importing the datapoints from our dataset into the database. This process might take some time.

While we are at it, I would strongly recommend using Robomongo in case you are going to work with MongoDB on a regular basis. Robomongo is a GUI-based MongoDB management tool. Install Robomongo and open it. Create a new connection, enter localhost as the address and 27017 as the port, and click save. Connect to this instance in the next screen that you get. The Robomongo shell will look something like this:

(Screenshot: the Robomongo shell)

Navigate to the projects collection and double-click on it to see a list of the datapoints. Click on the tiny expand arrow on the datapoints list to see the full list of attributes for the document. These are the attributes (columns) that we just imported from our dataset. We will be using the following attributes for our analysis:

  1. school_state
  2. resource_type
  3. poverty_level
  4. date_posted
  5. total_donations
  6. funding_status
  7. grade_level

Step 3: Setting Up the Node.js Server

Now to one of the most happening server platforms out there: Node.js, or Node JS, or Nodejs; point is, you can't ignore it!
I am quoting their website here:

Node.js® is a platform built on Chrome’s JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.

I find Node.js to be very fast, and it works really well as far as connecting to MongoDB is concerned.

To begin with, we need to install Node.js and npm. npm is a package manager for Node and helps us easily deploy modules for added functionality. Here is a cool guide on installing Node and npm for Windows.

Our Node.js platform is broadly classified into the following folders:

  1. App folder: Contains the models for data connections and API serving. Includes the files:
    1. SubjectViews.js: Holds the data model. Specify queries (if any) and the collection name to fetch data from
    2. routes.js: Fetches data from the collection and serves it as an API
  2. Config folder:
    1. DB.js: Contains the database information, i.e. the address and the port to connect to
  3. node_modules folder: Contains the various node modules which are installed to enhance the functionality of the server
  4. Public folder: Will contain our HTML, JavaScript and CSS files. We will use these three to code our charts utilizing the visualization and aggregation libraries.
  5. package.json file: Contains a list of all the modules which need to be installed for the server to run
  6. server.js file: Utilizes the node modules to initialize data fetching from MongoDB and host our webpages over the network.
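To give a feel for what routes.js does, here is a stripped-down sketch of the handler logic (illustrative only; the repository's actual code wires this up through Express and Mongoose, and the names here are mine): given a collection object with a node-style find(query, callback), return a request handler that sends every document back as JSON.

```javascript
// Illustrative sketch of the routes.js idea (not the repo's exact code):
// fetch all documents from a collection and serve them as JSON.
function makeDataRoute(collection) {
  return function handler(req, res) {
    collection.find({}, function (err, docs) {
      if (err) {
        res.status(500);
        res.json({ error: err.message });
        return;
      }
      res.json(docs); // this is what the frontend fetches from /api/data
    });
  };
}
```

In an Express app this would be mounted with something like app.get('/api/data', makeDataRoute(projects)).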

The folder structure for our project will look like this:

(Screenshot: the project folder structure)

You can copy the full code and folder structure from the GitHub repository here.

Navigate to the home folder and run the npm install command:
C:\Users\Anmol\Desktop\blog>npm install

NPM will read the dependencies from the package.json file and install them.
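For reference, a minimal package.json for a setup like this might look roughly as follows (treat the module list and versions as placeholders; the authoritative list is the package.json in the repository):

```json
{
  "name": "node-dc-mongo",
  "version": "1.0.0",
  "scripts": {
    "start": "node server.js"
  },
  "dependencies": {
    "express": "^4.0.0",
    "mongoose": "^4.0.0"
  }
}
```

This is also why npm start works later on: npm runs whatever command is registered under scripts.start.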

Tip for Ubuntu users installing Node.js:

To make Node.js apps that reference “node” work, create a symbolic link:

sudo ln -s /usr/bin/nodejs /usr/bin/node

We now have our folder structure ready. Run this command:
npm start
You will see a message from Node saying that the magic is happening on port 8080. (You can change the port number in the server.js file.)

Open your browser and go to localhost:8080/api/data (Defined in the routes.js file)

Awesome. Now our API is all set! On to the final part: creating visualizations for the data.

Step 4: Building our frontend using D3.js, DC.js and some good old JavaScript

A lot of great business intelligence (BI) tools exist in the current landscape: Qlik, Spotfire, Tableau and MicroStrategy, to name a few. Along came D3.js, an awesome open source visualization library which utilized the power of the omnipresent JavaScript to make charting cool and put control of the visualization design in the user's hands.

We will be using a responsive HTML template for our design needs, as our main aim is to get the charts up and running rather than code the responsiveness and styling of the various divs. I have used a nice ready-made template for this project.

If you take a look at the Node.js code, you will observe that static content is served from the public folder. Hence we place our HTML files there.

We will be utilizing the following libraries for visualization:

  1. D3.js: Renders our charts. D3 creates SVG-based charts which are easily placed into our HTML blocks
  2. DC.js: Used as a wrapper for D3.js, meaning we don't need to code each and every thing about the charts, just the basic parameters
  3. Crossfilter.js: Used for exploring large multivariate datasets in the browser. Really great for slicing and dicing data, and enables drill-down based analysis
  4. queue.js: An asynchronous helper library for data ingestion involving multiple APIs
  5. dc.css: Contains the styling directives for our dc charts
  6. Dashboard.js: Will contain the code for our charts and graphs

You can always refer to the code repository for the placement of these libraries. We need to include them in our HTML page (index.html). Now, to the main task at hand: coding the charts!

In our Dashboard.js file we have the following:

A queue() function which utilizes the queue library for asynchronous loading. It is helpful when you are trying to get data from multiple APIs for a single analysis. In our current project we don't need the queue functionality, but it's good to have code that can be reused as the need arises. The queue function processes the data hosted at the API and passes it into the apiData variable.

queue()
.defer(d3.json, "/api/data")
.await(makeGraphs);
function makeGraphs(error, apiData) {

Then we do some basic transformations on our data using d3 functions. We pass the data inside the apiData variable into our dataSet variable, parse the date attribute to suit our charting needs, and coerce total_donations to a number using the unary + operator.

var dataSet = apiData;
var dateFormat = d3.time.format("%m/%d/%Y");
dataSet.forEach(function(d) {
d.date_posted = dateFormat.parse(d.date_posted);
d.date_posted.setDate(1);
d.total_donations = +d.total_donations;
});
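If the d3 helpers look opaque, here is the same coercion written in plain JavaScript (a sketch, without d3), showing what the parse-plus-setDate(1) combination and the unary + actually produce:

```javascript
// Plain-JS equivalent of the d3 transformation above (illustrative).
function coerceRecord(d) {
  // "%m/%d/%Y" -> Date object; setting the day to 1 buckets records by month
  var parts = d.date_posted.split("/").map(Number); // [month, day, year]
  d.date_posted = new Date(parts[2], parts[0] - 1, 1);
  d.total_donations = +d.total_donations; // string -> number
  return d;
}

var record = coerceRecord({ date_posted: "04/17/2014", total_donations: "350" });
console.log(record.date_posted.getMonth()); // 3 (April; JS months are zero-indexed)
console.log(record.total_donations + 1);    // 351, so arithmetic now works
```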

The next steps are ingesting the data into a crossfilter instance and creating dimensions based on it. Crossfilter acts as a two-way data binding pipeline: whenever you make a selection on the data, it is automatically applied to the other charts as well, enabling our drill-down functionality.

var ndx = crossfilter(dataSet);


var datePosted = ndx.dimension(function(d) { return d.date_posted; });
var gradeLevel = ndx.dimension(function(d) { return d.grade_level; });
var resourceType = ndx.dimension(function(d) { return d.resource_type; });
var fundingStatus = ndx.dimension(function(d) { return d.funding_status; });
var povertyLevel = ndx.dimension(function(d) { return d.poverty_level; });
var state = ndx.dimension(function(d) { return d.school_state; });
var totalDonations = ndx.dimension(function(d) { return d.total_donations; });

Now we create groups for counting and aggregating our data.

var projectsByDate = datePosted.group();
var projectsByGrade = gradeLevel.group();
var projectsByResourceType = resourceType.group();
var projectsByFundingStatus = fundingStatus.group();
var projectsByPovertyLevel = povertyLevel.group();
var stateGroup = state.group();
var all = ndx.groupAll();

// Compute the date range for the time axis of the line chart below
var minDate = datePosted.bottom(1)[0].date_posted;
var maxDate = datePosted.top(1)[0].date_posted;


//Calculate Groups
var totalDonationsState = state.group().reduceSum(function(d) {
return d.total_donations;
});
var totalDonationsGrade = gradeLevel.group().reduceSum(function(d) {
return d.total_donations; // sum donations, not the grade_level string
});
var totalDonationsFundingStatus = fundingStatus.group().reduceSum(function(d) {
return d.total_donations; // sum donations, not the funding_status string
});
var netTotalDonations = ndx.groupAll().reduceSum(function(d) {return d.total_donations;});
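If reduceSum feels like magic, it is conceptually just a running total per dimension key. Here is a plain-JavaScript sketch of what state.group().reduceSum(...) computes, using made-up sample rows:

```javascript
// Conceptual sketch of crossfilter's group().reduceSum (illustrative):
// a per-key running total over the records of a dimension.
function reduceSumByKey(records, keyFn, valueFn) {
  var totals = {};
  records.forEach(function (d) {
    var key = keyFn(d);
    totals[key] = (totals[key] || 0) + valueFn(d);
  });
  return totals;
}

var sample = [
  { school_state: "CA", total_donations: 100 },
  { school_state: "NY", total_donations: 250 },
  { school_state: "CA", total_donations: 50 }
];
var byState = reduceSumByKey(
  sample,
  function (d) { return d.school_state; },
  function (d) { return d.total_donations; }
);
console.log(byState.CA); // 150
console.log(byState.NY); // 250
```

The real crossfilter group does the same bookkeeping incrementally, which is why the totals update instantly when a filter is applied on another dimension.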

Now we define the charts using the DC.js library. DC.js makes it easy to code good-looking charts, and the dc library has enough chart types to suit the majority of analyses. Check out the GitHub page for dc here.

var dateChart = dc.lineChart("#date-chart");
var gradeLevelChart = dc.rowChart("#grade-chart");
var resourceTypeChart = dc.rowChart("#resource-chart");
var fundingStatusChart = dc.pieChart("#funding-chart");
var povertyLevelChart = dc.rowChart("#poverty-chart");
var totalProjects = dc.numberDisplay("#total-projects");
var netDonations = dc.numberDisplay("#net-donations");
var stateDonations = dc.barChart("#state-donations");

And now the final part, where we configure our charts. We are using a combination of charts and widgets here. You may notice that we are essentially supplying basic information to the chart definitions: dimension, group, axes properties and so on.

// A dropdown widget
var selectField = dc.selectMenu('#menuselect')
.dimension(state)
.group(stateGroup);
// Widget for seeing the rows selected and rows available in the dataset
dc.dataCount("#row-selection")
.dimension(ndx)
.group(all);
//A number chart
totalProjects
.formatNumber(d3.format("d"))
.valueAccessor(function(d){return d; })
.group(all);
//Another number chart
netDonations
.valueAccessor(function(d){return d; })
.group(netTotalDonations)
.formatNumber(d3.format(".3s"));
//A line chart
dateChart
//.width(600)
.height(220)
.margins({top: 10, right: 50, bottom: 30, left: 50})
.dimension(datePosted)
.group(projectsByDate)
.renderArea(true)
.transitionDuration(500)
.x(d3.time.scale().domain([minDate, maxDate]))
.elasticY(true)
.renderHorizontalGridLines(true)
.renderVerticalGridLines(true)
.xAxisLabel("Year")
.yAxis().ticks(6);
//A row chart
resourceTypeChart
//.width(300)
.height(220)
.dimension(resourceType)
.group(projectsByResourceType)
.elasticX(true)
.xAxis().ticks(5);
//Another row chart
povertyLevelChart
//.width(300)
.height(220)
.dimension(povertyLevel)
.group(projectsByPovertyLevel)
.xAxis().ticks(4);
//Another row chart
gradeLevelChart
//.width(300)
.height(220)
.dimension(gradeLevel)
.group(projectsByGrade)
.xAxis().ticks(4);
//A pie chart
fundingStatusChart
.height(220)
//.width(350)
.radius(90)
.innerRadius(40)
.transitionDuration(1000)
.dimension(fundingStatus)
.group(projectsByFundingStatus);
//A bar chart
stateDonations
//.width(800)
.height(220)
.transitionDuration(1000)
.dimension(state)
.group(totalDonationsState)
.margins({top: 10, right: 50, bottom: 30, left: 50})
.centerBar(false)
.gap(5)
.elasticY(true)
.x(d3.scale.ordinal())
.xUnits(dc.units.ordinal)
.renderHorizontalGridLines(true)
.renderVerticalGridLines(true)
.ordering(function(d){return d.value;})
.yAxis().tickFormat(d3.format("s"));

And finally we call the dc render function, which renders our charts. Don't forget the closing brace of makeGraphs:

dc.renderAll();
}

Mission Accomplished!

Open your browser and go to localhost:8080/index.html to see your dashboard in action.

There is a lot of customization that can be done to the charts, which I did not delve into at this stage. We can format the axes, the colors, the labels, the titles and a whole lot of other things using DC.js, D3.js and CSS. Moving on, I will take up one chart at a time and provide additional examples of what we can customize.

By the end we have gained some working knowledge of MongoDB, Node.js and D3. You can use this project as a boilerplate for exploring and analysing new datasets. All the source code can be found in this GitHub repository.

I will be most happy to answer your questions and queries. Please leave them in the comments.

Do share the post and spread the good word. It may help folks out there get up to speed on this open source visualization stack.

Cheers!


Author: anmolkoul

I am a pretty decent BI guy who has learnt quite a bit about the different tools and technologies in the market. This blog will serve as an instrument for spreading what I have learnt, and for learning more from its visitors. That apart, my hobbies include playing basketball and badminton, and contemplating whether the Avengers are better than the Justice League.

233 thoughts on “Interactive Data Visualization using D3.js, DC.js, Nodejs and MongoDB”

  1. The link to the required data set is no longer valid. Can you post a new link? I’d like to complete the tutorial. Thank you!!


  2. Hi, I am unable to download the sample file uploaded in dropbox, maybe its removed. is there a possibility to get the sample data please?

    -Sai


  3. What should we be looking for when we go to localhost:8080/api/data … I don’t get any response back (am using Mac). I also got the error “js-bson: Failed to load c++ bson exstension, using pure JS version” . It was fixed using the suggestion at https://stackoverflow.com/questions/21656420/failed-to-load-c-bson-extension by —

    In the file node_modules/mongoose/node_modules/mongodb/node_modules/bson/ext/index.js, change the line

    bson = require(‘../build/Release/bson’);
    to

    bson = require(‘bson’);


  4. I also get an error — Error: ENOENT: no such file or directory, stat ‘./public/login.html’ —

    npm start

    > starter-node-angular@1.0.1 start mongodb/node-dc-mongo
    > node server.js

    Magic happens on port 8080
    Error: ENOENT: no such file or directory, stat ‘./public/login.html’


  5. Hello sir,
    I am using dc series chart and i have provided all mandatory attributes but still it randomly throws error like “chart.seriesAccessor missing in chart” . Please can u help me regarding this.
    Thanks


  6. Hi There. I’ve been trying to get the code and data you have pointed me to to work. Seemed straight forward enough. The data has been loaded into MongoDB and the code has been pulled and installed (Steps 1- 4).

    There was 1 error related to the bson. Of which i solved by repointing the bson path.

    After running npm start. The data is not showing. The chrome inspect shows.

    api/data Failed to load resource: net::ERR_EMPTY_RESPONSE

    I think that means that the query returned no results.. is that right? Do i need to create a user ID on mongodb’s db to make this work? the query can run if i test it on Robo 3T and there aren’t any more error on the node console.

    Is there any other steps required that i’m missing.. ? help . this example looks really interesting.. just that i can’t get it work..


  7. Hi Anmol.

    I followed the tutorial and tried accessing the page on localhost:8080/api/data. The wheel keeps spinning in the tab but nothing is displayed. And when I tried accessing localhost:8080/index.html, none of the charts are rendered expect for the chart headers and html contents being displayed. Can you please help me out?

    Thanks,
    RR


  8. Hi,
    I tried to use another dataset in my local mongodb but apiData is always in Dashboard.js. Can you please explain how it is working. Even the function makeGraph is not getting executed. I can see that /api/data is fetching the records by putting a console.log statement. But nothing is happening in makeGraphs. Can you please help?


  9. Hi, just wondered if it’s possible to run this in another web service. I’m currently running Bitnami Wampstack and would like to serve this up within. I’m also a bit confused on where the /api/data points to. I have installed Mongo and node in their own folders to Bitnami\wampstack-5.6.32-1\apache2\htdocs and imported data to Mongo. Both Mongo and node are running. Any advise would be gratefully received. Thanks


  10. Really cool stuff. I have a working version with my own data. Do you have any ideas on how to expand this to use the Mongo $lookup function or Mongoose Aggregate? I would like to pull in data from 2 collections into my dashboard. Thanks.


  11. great tutorial! I reduced the csv file into an even smaller file. I would like to get rid of the mongoDB entirely and have the script read the csv file as the data source. can you suggest how I can easily edit the code to do this?

