Wednesday, September 9, 2015

The Internship Experiment - Top 5 Challenging Scala and MongoDB concepts

Overview

In the internship I recently offered, I wanted the interns to experience new and emerging technologies that they were less likely to encounter in their school curriculum, and that would therefore add to the overall challenge. Scala, Play Framework, and MongoDB were chosen for this reason. To counter the steep learning curve, I kept the internship project goal simple: the interns were to develop a web-based survey creation and collection application using the technologies mentioned above. To help the interns learn these technologies, I shared the source code of another web-based application I had created with the same stack, and we went through a code walkthrough to familiarize them with the core concepts of Scala, Play, and MongoDB. As with learning any new technology, there were challenges. This article lists, and briefly describes, the top 5 challenges the interns ran into while learning and using these technologies for their project.

5. MongoDB: Joins and Aggregations

Schools teach RDBMS concepts, and those involve JOINing. Lots of it. Which is why NoSQL databases can be very confusing to understand at first, and the interns were no different. The concept of rich documents excited them, and they seemed to believe that they understood it well. They were also successful in writing CRUD code for simple documents. However, when they started work on slightly more involved use cases, they began to miss the INNER JOINs and COUNTs from their Advanced Databases class. After a couple of hours revisiting document modeling in light of the use cases that triggered their SQL withdrawal, they became more familiar and comfortable with using rich documents.
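
As a concrete illustration, here is a minimal sketch in Scala of how a one-to-many relationship that would call for an INNER JOIN and a COUNT in SQL can instead live inside a single rich document. The Survey, Response, and Answer names are invented for this example, not taken from the interns' project.

// Rich-document modeling sketch: the "child rows" are embedded in the parent.
case class Answer(questionId: Int, value: String)
case class Response(respondent: String, answers: List[Answer])
case class Survey(title: String, responses: List[Response])

object RichDocumentDemo extends App {
  val survey = Survey(
    title = "Lunch preferences",
    responses = List(
      Response("alice", List(Answer(1, "pizza"))),
      Response("bob", List(Answer(1, "salad")))))

  // What SELECT COUNT(*) ... INNER JOIN ... would compute is now a lookup inside the document.
  println(s"Responses collected: ${survey.responses.size}")
}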

4. Play/Reactive MongoDB Driver: Writing code for asynchronous execution

Coding involves solving a problem, and so we ought to code the way we think. This means anticipating, and planning ahead of time for, the possible decisions we might encounter. Many of us relied on debugging and stepping through code early on as a learning tool. This, however, has an adverse consequence: instead of teaching us to code the way we think, it teaches us to code the way we'd like the code to execute, which results in code written for sequential, real-time execution. That makes it much more difficult to learn asynchronous programming, which involves coding for what might happen in the future rather than in real time.

Things got even more complicated for the interns when they had to write code involving multiple Futures and tie their results together in a single code block. We spent a few hours revisiting this issue on more than a couple of occasions, where we emphasized the need to "code like we think" rather than "code like we debug".
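
The pattern we kept coming back to looks roughly like the following sketch: a minimal, self-contained example of composing two Futures in one block. The findSurvey and countResponses functions are invented stand-ins for the project's actual service calls.

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureComposition extends App {
  // Stand-ins for asynchronous driver or service calls.
  def findSurvey(id: Int): Future[String] = Future(s"survey-$id")
  def countResponses(survey: String): Future[Int] = Future(42)

  // The for-comprehension describes what should happen once both results
  // arrive; nothing here blocks while the Futures are in flight.
  val summary: Future[String] =
    for {
      survey <- findSurvey(1)
      count <- countResponses(survey)
    } yield s"$survey has $count responses"

  summary.foreach(println)
  Await.ready(summary, 5.seconds) // only so this demo app does not exit early
}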

3. Scala: Implicit Conversions

Implicit type conversion is a very handy feature of the Scala language. It lets you avoid writing intermediate conversion code every time you have an object of type A but need an object of type B: writing an implicit conversion from type A to type B lets you use an object of type A anywhere an object of type B is expected. In my example code for the interns, I had provided an implicit conversion of a model case class to, and from, the BSONDocument type, so that I could pass the model case class directly to the MongoDB collection functions instead of creating BSONDocuments for them every time. The interns soon forgot the concept and assumed that the MongoDB driver worked directly on the model classes. Many of their MongoDB driver related errors were, in fact, the result of copying and pasting the implicit conversion code from one model class to another.
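
The mechanism looks roughly like the sketch below. It assumes the ReactiveMongo BSON API of that era (reactivemongo.bson); the Survey model, its fields, and the store function are invented for illustration and are not the interns' actual code.

import scala.language.implicitConversions
import reactivemongo.bson.{BSONDocument, BSONObjectID}

case class Survey(id: Option[BSONObjectID], title: String)

object SurveyConversions {
  // Model -> BSONDocument: a Survey can now be passed wherever a BSONDocument is expected.
  implicit def surveyToBSON(survey: Survey): BSONDocument =
    BSONDocument(
      "_id" -> survey.id.getOrElse(BSONObjectID.generate),
      "title" -> survey.title)

  // BSONDocument -> model: query results can be turned back into Surveys.
  implicit def bsonToSurvey(doc: BSONDocument): Survey =
    Survey(doc.getAs[BSONObjectID]("_id"), doc.getAs[String]("title").getOrElse(""))
}

object ConversionDemo {
  import SurveyConversions._

  // Stand-in for a driver call that expects a BSONDocument.
  def store(doc: BSONDocument): Unit =
    println(s"storing document titled ${doc.getAs[String]("title")}")

  def run(): Unit = store(Survey(None, "Lunch preferences")) // conversion applied implicitly
}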

2. MongoDB: Accessing the _id for an inserted document

MongoDB automatically creates an _id key, and sets it to a new ObjectID, whenever you insert a document into a collection without one. In the example code for a model class, I had also provided the implicit conversions between the model case class and the BSONDocument type. The conversion to BSONDocument accounted for a missing id member of the model class: in that case, a new BSONObjectID was generated and assigned to the id member as part of the conversion.

However, since the interns had forgotten all about the purpose of the implicit conversions, they were confused when they encountered a use case where they had to use the id of a freshly inserted document as a key in another document in a different collection. They wanted to know how they could retrieve the id of a newly inserted document. The solution was simple: supply the id yourself before inserting the document the first time. Since you control the generation of the id in code, you can use it however you like.
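
A minimal sketch of that approach, again assuming the ReactiveMongo BSON API of that era; the field names are invented, and the actual insert calls are left as comments rather than real driver invocations.

import reactivemongo.bson.{BSONDocument, BSONObjectID}

object OwnIdDemo {
  def run(): Unit = {
    // The id is generated by our code, so it is known before the insert ever happens.
    val surveyId = BSONObjectID.generate

    val survey = BSONDocument("_id" -> surveyId, "title" -> "Lunch preferences")
    val response = BSONDocument(
      "_id" -> BSONObjectID.generate,
      "surveyId" -> surveyId, // reference back to the freshly inserted survey
      "answer" -> "pizza")

    // surveysCollection.insert(survey)
    // responsesCollection.insert(response)
    println(s"survey ${surveyId.stringify} is referenced by a response")
  }
}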

1. Scala: Type Inference

Type inference is a very powerful feature of the Scala language and compiler. If a type can be inferred, chances are you can get by without specifying the type of the object you are creating. Used carelessly, however, this can result in unexpected behavior and compile-time errors that are confusing to interpret.

The most common complaint and frustration I heard from the interns had to do with this feature. The easiest way to troubleshoot type-inference-related issues is to not let the compiler infer the type of the problem code block. Explicitly specifying the type of the object created, or the return type of the code block, helps isolate the issue to the specific code causing the problem behavior. Once the problem is fixed, you can omit the type information again.
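
For example, in a sketch like the following (all names invented), the inferred type lets a mistake surface far from where it was made, while an explicit annotation moves the compile error to the offending expression:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object InferenceDemo {
  def findTitles(): Future[List[String]] = Future(List("Survey A", "Survey B"))

  // Compiles fine, but the inferred type is Future[List[Int]], not the
  // Future[List[String]] the rest of the code expects.
  val titles = findTitles().map(_.map(_.length))

  // With an explicit annotation, the compiler reports the mismatch right here
  // instead of at some later use of titles:
  // val titles: Future[List[String]] = findTitles().map(_.map(_.length))

  def report(ts: Future[List[String]]): Unit = ts.foreach(_.foreach(println))
  // report(titles) // this is where the inferred version would finally fail to compile
}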

Conclusion

The above challenges were in no way a reflection on the interns' aptitude or ability to solve problems. I remember encountering similar challenges when I was first introduced to Scala, Play, and the MongoDB driver for Scala; my academic training had not prepared me sufficiently for them either. That was the primary reason I chose these technologies for the internship. The interns admitted their frustration with my decision, but in the end were grateful for it, as it pushed them out of their comfort zones and gave them a safe platform to explore these technologies and concepts.

Saturday, December 31, 2011

RavenDB on Linux - Source Code

I've created a new GitHub repository to temporarily host the updated code that enables RavenDB to run under Linux. I say temporarily because a few things could happen.

The worst-case scenario, and I don't believe this will happen, though there is a slight possibility, is that I am asked to shut down the repository because I unknowingly violated some terms of use. For an open source project, this is a highly unlikely scenario.

The normal-case scenario is that I hear nothing from the creators of RavenDB, in which case this code will continue to exist under its own repository.

The best-case scenario, and I would really like this to be the case, is that my changes are, in some shape, accommodated upstream in RavenDB, making RavenDB a cross-platform tool.

I did not investigate why the OutputStream.Flush() call was causing an exception. Also, this was really my first attempt at MEF and .Net 4.0, and I don't know why the exports were not automagically loaded; to work around that, I had to load them manually using reflection. A better fix would be to identify and resolve these issues.

I am glad, however, that I was able to fulfill a personal quest of learning about RavenDB and, in the process, making it run under Linux. This opens up the possibility of making RavenDB a serious contender against MongoDB on non-Windows platforms.

RavenDB, along with my source code changes, is available at https://github.com/jimmy00784/ravendb (previously https://github.com/jimmy00784/RavenDB-for-Linux).

Note: Source code url updated.


This article is part of the series NoSQL - RavenDB on Linux. The series contains the following articles:
NoSQL - RavenDB on Linux
Open Source Shines - RavenDB on Linux
RavenDB on Linux - Source Code
RavenDB on Linux - Update

Saturday, December 24, 2011

NoSQL - RavenDB on Linux

For a while now I've been on a quest to find a NoSQL database that met the following criteria:
  1. Document Oriented Database
  2. Supports Transactions across multiple Documents
  3. .Net/Mono compatible drivers
  4. Runs under Linux
And the journey has been anything but easy.

Ever since I first read about Document Oriented NoSQL Databases, I've been fascinated by them. I started looking into CouchDB [apache.org] at first, but MongoDB [mongodb.org] soon became my favorite.
MongoDB is a Document Oriented Database written in C++ with performance in mind. MongoDB allowed me to write applications using my favorite languages and ran on many operating systems. After doing some research, I also found a port of MongoDB for ARM. It now runs on a Debian server that I have set up on a PogoPlug, which runs on an ARM chipset. The JSON-style BSON representation of data was simple to follow, and the ability to store any object in MongoDB without having to implement special interfaces or inherit from special classes made it that much more appealing to me. The .Net drivers were readily available on their website, along with plenty of documentation on how to use them.

MongoDB met three of the four criteria above. There was one it didn't satisfy: MongoDB doesn't yet support transactions [mongodb.org], and there is no word out yet promising the availability of transactions any time soon.

I continued my research and came across RavenDB [ravendb.net]. RavenDB is a Document Oriented Database written completely in .Net, and it is also Mono compatible. Unlike MongoDB, RavenDB does support transactions across multiple documents; however, it only runs under Windows. So even RavenDB doesn't meet all four criteria, but since it is written completely in .Net, it gives me something to tinker with to see if it can be made to run under Linux.

My first attempt was a few months ago, when I first read about RavenDB. I downloaded the code from their GitHub repository and fired up MonoDevelop [monodevelop.com]. The issues became immediately apparent: heavy reliance on Silverlight, and beyond that, .Net 4.0 code that my Mono compiler could not make sense of. After spending a few minutes on it, I gave up.
A few days ago, with an update of Ubuntu, I got a near-latest version of Mono installed on my laptop. I decided to give RavenDB another try.

From my konsole, I issued the commands to get the latest code for RavenDB and created a new branch, linux.

git clone git://github.com/ravendb/ravendb.git
cd ravendb
git branch linux
git checkout linux

As soon as I fired up MonoDevelop and loaded the solution, I was greeted with an error message about projects that failed to load.
Also, the projects Raven.Backup and Raven.Smuggler were not set to build under my current configuration. I noticed, via Right Click -> Options -> Build -> Configuration on the Raven.Backup project, that the only configurations available were for x86. I was able to select Debug, click the Copy button, and create another configuration for All CPU. I repeated this for Release, and then the same steps for the other project as well.
Then, from the Right Click -> Options -> Build -> Configurations -> Configuration Mapping tab, I selected the correct configuration for the two projects. Now the projects were no longer marked as "unable to build under current configuration." Next, on to the issue of the three projects failing to load: Raven.Client.Silverlight, Raven.Studio, and Raven.Tests.Silverlight.
While anything Silverlight under Linux (using Moonlight) was not a promising prospect to begin with, I still wanted to give it a shot before starting to remove non-compilable projects. I opened the project files as plain text and searched for the project type GUIDs reported in the error message. On finding them, I commented them out and attempted to reload the projects. The solution was now ready for the first compile.

First attempt:
95 build errors. I also noticed that at least a few of those errors were really warnings. This meant one thing - some or all of the projects were set up to treat warnings as errors. That was an easy fix. On all of the projects, I unchecked the "Treat warnings as errors" box on the Right click -> Options -> Build -> Compiler screen. Time for the second compile attempt.


Second attempt:
100 build errors. One of the first errors was in the Raven.Client.Debug project, on classes from the Microsoft.VisualStudio.DebuggerVisualizers namespace. A dependency on Visual Studio would be a problem under Linux. I decided not to dwell on this error and chose to remove the project altogether from the solution. All the dependencies on Raven.Client.Debug would also have to be removed.


Third attempt:
95 build errors. This time it was the NLog namespace under the Raven.Tryouts project. Reviewing the error revealed that the NLog dll was not compatible with the current Mono runtime. Fortunately, NLog had a Mono-compatible binary available for download on CodePlex. I downloaded it and replaced the reference in the project.


Fourth attempt:
94 build errors. Upon closer inspection, I observed that Raven.Web relied on System.Web.Entity, which is not available under Mono. Raven.Client.Silverlight relied on System.Windows and System.Windows.Browser, which are also not included with Mono. I decided to remove these two projects from compilation, along with the Raven.Tests.Silverlight project. The Raven.Studio project was the next one to go, since it was a WPF application - lots of XAML files - and WPF is not fully implemented under Mono.


Fifth attempt:
0 build errors. Are we there yet? Let's give it a try. I ran the Raven.Server project. Bummer! Runtime exception: DllNotFoundException. It turns out that the Raven.Storage.Esent project implements Microsoft's ISAM Esent storage, which is proprietary and requires Windows in order to run. Since there was also a Raven.Storage.Managed project, I decided to modify the application's configuration to use the managed storage library, Munin, instead of the unmanaged Esent. I modified the App.config under Raven.Server and changed the value of "Raven/StorageEngine" from "Esent" to "Munin".

Sixth attempt:
0 build errors (expected). Yet another runtime error, however. This time it was a MissingManifestResourceException with a very vague description and a long stack trace. Right before the stack trace took me into obscurity, there was a hint - ravendb/Raven.Database/Server/HttpServer.cs:110. The SatisfyImportsOnce call was failing. After doing some reading, I found out that RavenDB was built on MEF, a feature of .Net 4.0 that is supposed to provide easy plugin/extension functionality to .Net programs. That line was supposed to initialize the extensions once; somewhere in the code below, a code block would expect non-null values. I commented that line out and proceeded.

Seventh attempt:
0 build errors. As expected, a NullReferenceException at Raven.Database/Server/HttpServer.cs:112. RequestResponders was supposed to be non-null. I did some more research to understand how MEF was supposed to implement the extensibility: I had to find classes that inherited from AbstractRequestResponder. Maybe manually loading objects into RequestResponders would help. And indeed, I found plenty of classes that inherited from the AbstractRequestResponder and RequestResponder classes. I added a few lines of code to HttpServer.cs to instantiate those classes via reflection and load them into RequestResponders manually.


Eighth attempt:
0 build errors. Another exception, this time a NullReferenceException at Server/HttpServer.cs:193. This seemed similar to the previous issue, but with a different object - ConfigureHttpListeners. Time to repeat the exercise with a different base class - an interface this time.


Ninth attempt:
0 build errors. I started seeing some output repeating, and it didn't look like error messages:

Available commands: cls, reset, gc, q
Could not understand:

I decided to run the compiled application from konsole to see how it would behave there.

Raven is ready to process requests. Build 13, Version 1.0.0.0 / abcdef0
Server started in 847 ms
Data directory: /home/karim/Projects/RavenDB/ravendb/Raven.Server/bin/Debug/Data
HostName: <any> Port: 8080, Storage: Munin
Server Url: http://karim-laptop:8080/
Available commands: cls, reset, gc, q

Time to run some tests. I created a small console .Net application and fired it up. It worked! Some issues did surface, but nonetheless, it worked: I was able to store and retrieve simple documents in the database. Since I had removed some of the critical projects, such as Raven.Web and Raven.Studio, the web interface was gone with them, and dynamic indices did not work either.
Now that I have a semi-functional database, I'll put some time in and try to create a simple Raven Studio replacement, as well as work the other kinks out.

Overall, it helps that the application was written completely in .Net, without any dependencies on the OS that could not be satisfied through framework-level abstraction. I am certain that RavenDB could be made to work under Linux with a level of confidence similar to what it enjoys under Windows.

This article is part of the series NoSQL - RavenDB on Linux. The series contains the following articles:
NoSQL - RavenDB on Linux
Open Source Shines - RavenDB on Linux
RavenDB on Linux - Source Code
RavenDB on Linux - Update

Tuesday, September 21, 2010

SQL or NOSQL

Over the past few days, I have been reading about and experimenting with MongoDB, a non-RDBMS database, aka a NOSQL database. NOSQL stands for "Not Only SQL." NOSQL databases offer an alternative for software development scenarios where a traditional RDBMS might be too cumbersome to work with. MongoDB falls under the "Document Oriented Database" category. Surprisingly enough, NOSQL has been around for at least a few years now and is already a favorite among the social network crowd: Facebook, Twitter, etc. However, regardless of how simplified the definitions on the Internet may seem, it takes a while to completely grasp the concept. Take as much time as you can to read about these databases before you decide to take the plunge.
I was looking at CouchDB and MongoDB at the same time, and decided, for no particular reason, to use MongoDB for my training exercises. What better way to learn a new technology than to use it in one's own project (when feasible)?
My development environment comprises the following: the Ubuntu Linux operating system; PHP set up with the Zend Framework and the h2o template engine; git for source control; the Eclipse IDE with the PHP PDT and EGit plugins; and MongoDB with the native PHP driver. The idea is to train myself on PHP, the Zend Framework, h2o, MongoDB, and git at the same time, in one big comprehensive development project.