I acknowledge that this blog post may be a TL;DR-kind of post. If it feels like that or begins to feel like that … by all means, click away.
However, I was asked to share some of the personal experience that led me to DeepSee, along with the career and historical background that has shaped my perspective on the innovation areas we see and on our emphasis on building the industry’s best, most extendable, and most capable platform for Deep Learning and Machine Learning (ML) models for Natural Language Processing (NLP) and text analytics, with the many insights and efficiencies that can be derived from the input to these models: unstructured data. I am a bit apprehensive about publishing this post as it certainly dates me, and it is really just a view into my historical experiences, which may be uninteresting, I acknowledge. However, I have enjoyed writing it, as it offered me some permission to reflect on some of my many experiences. My joy in writing this, and in thinking back over my meandering career, will far outweigh the labor required to read it, I am sure.
Personal Career History and Platform Influence
In the late 1980s and early 1990s I was an engineer with a software firm in Utah, Novell. I was with Novell for 8+ years and left in late 1994. I learned a great deal during this time about networking, APIs, remote services, abstraction layers, and interfaces for extension, all of which led to some early “platform” ideas.
In the early ’90s, the Internet was very unlike what we see or think of today. Interacting and sharing information happened through Usenet newsgroups, forums, anonymous FTP sites, email, and email lists. I was so influenced by these early Internet movements that, during this time (with Novell’s permission), I started an Internet Service Provider (ISP) called Infonaut.com, with a rack of ZyXEL modems and a “co-location” (COLO) offering that was literally in my closet. I sold this ISP a few years later to another ISP.
During this time of change and Internet adoption there were various technical conferences, like USENIX, where information and trends were shared. At one USENIX conference I attended, I was sitting at a small table eating lunch, surrounded by lots of people doing the same, and I saw Dennis Ritchie (creator of the C programming language and co-author, with Brian Kernighan, of the book The C Programming Language) walking toward, then past, me. Cool, at least to me at that age. He was one smart dude, no doubt. Regardless of what you know of him, in high-tech software and platforms he is clearly a demi-god; or at least the trio of him, Ken Thompson, and Brian Kernighan are demi-gods together. This trio of collaborators created the very influential UNIX operating system while at Bell Labs back in the ’70s. It would be hard to argue against this creation being the most valuable and influential software platform invented to date. The simplicity, approach, technical implementation, and architecture were ahead of their time and, though many find UNIX cryptic or difficult, it was easily extendable. There may have been better OS approaches around at the time, or even since, but none have had the influence of what they created.
In early 1991 I had a “pizza box” NeXT computer on my desk, one of the few NeXT computers in the company. I was a huge fan of the NeXT: its MACH microkernel and BSD UNIX operating system, its use of Objective-C, and its clever user interface and collective OS experience, called NeXTSTEP. To this day, I still keep the “dock” on my MacBook on the left-hand side, mimicking what NeXTSTEP did. There is a cool history to this platform: Steve Jobs at Apple; then, after “leaving” (or being asked to leave) Apple, starting NeXT; finding and working with Avie Tevanian, who was very bright out of CMU working on MACH; Apple then buying NeXT; Steve becoming CEO again; and Apple adopting NeXTSTEP as the basis for macOS. But, I already digress.
Around this time, during some research, I found an app out of CERN in Europe. It was the first browser produced by Tim Berners-Lee (now Sir Tim Berners-Lee) for what he called the WorldWideWeb. The first “app” he built was on the NeXT. I downloaded and ran it. At the time, I showed this simple app to a good friend of mine, describing it as a “hypercard over the net” app, and indicated that I thought this was “cool” and something that a networking company, like Novell, ought to investigate for document and information exchange. What a missed opportunity. Even back then, I had a feeling that information exchange and access was intensely valuable, as I was an active participant in the many then-current methods for such exchanges. I’m not saying that I saw the current Internet/Web in this nascent beginning, though I did see that one of its appeals was an easier form of, and easier methods for, information exchange. Tim’s approach of using a defined visualization language methodology of an SGML DTD, that being HTML, as the markup language for his app was smart, building on what came before it. Why reinvent when you don’t have to? Tim invented HTTP (Hypertext Transfer Protocol, the protocol of the “web”) and it was brilliant in its simplicity; the only action of the first version of HTTP was “GET”, as in “get a page.” This simple beginning has led to an amazing “platform” for interaction and exchange of information and, now, services. Wow! I wish I had seen all that coming at the time, but I didn’t.
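That original simplicity is easy to demonstrate. Below is a minimal sketch, in Python, of an HTTP/0.9-style exchange like the one described above: the client’s only verb is GET, and the server answers with the raw page bytes and closes the connection. The loopback server here is, of course, my stand-in for illustration, not the CERN software.

```python
import socket
import threading

# A tiny loopback "server" standing in for the original idea: the client's
# only verb is GET, and the reply is raw page bytes, then close.
def tiny_server(listener):
    conn, _ = listener.accept()
    request = conn.recv(1024)
    if request.startswith(b"GET"):
        conn.sendall(b"<h1>Hello, WorldWideWeb</h1>")
    conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=tiny_server, args=(listener,), daemon=True).start()

# The whole HTTP/0.9-style request: "GET <path>" and nothing else.
client = socket.socket()
client.connect(("127.0.0.1", port))
client.sendall(b"GET /page.html\r\n")
chunks = []
while True:
    data = client.recv(1024)
    if not data:                          # server closed: end of the "page"
        break
    chunks.append(data)
client.close()
page = b"".join(chunks)
print(page.decode())
```

No headers, no status codes, no content types; all of that convention was layered on later, which is rather the point of the platform story that follows.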
The platforms mentioned above, and more generally all platforms, have their roots in convention and protocol: in other words, a language or definition that defines interactions and what is expected in response to a request. This also sets up the concepts of “server” and “client”, a platform that provides a set of services and users (clients) that request service from servers. It is quite striking that a set of response conventions, even very simple ones, can lead to such innovation and expansion, but it can, partly because of the “contract” nature of the request and response forms and the expectations surrounding these request/response interactions. Much of the maturation of the platform concept over the years has been to start with these base conventions, definitions, and contracts and to build “upper layer” conventions that provide additional, or higher-level, services to requesters, and then even more conventions and interfaces that build on these, and so forth. As these layers of conventions become popular and are built upon, a kind of abstraction happens (actually, it is inherent in the system) where a requester can make a very “high level” request of a service with no knowledge of the many lower-level services that the service was built upon. What a brilliant concept! This abstraction leads to focused services and a greater ability to perform very complex things without all the knowledge of the underlying parts. This, the many layers of abstraction with each layer adding value to the layers below it, is part of the reason the rest of the world is perplexed by why software engineers aren’t more fungible. They aren’t because they may specialize in a very rich “layer” of the many abstraction layers and become expert “in that layer,” and asking them to move up or down the “stack” sometimes isn’t a simple ask.
Good, experienced, productive, “full stack” engineers are hard to find and sometimes not even desired as a “jack of all trades” might not have the expertise desired on a particular and valuable “layer” of service.
Another major contributor to platforms, generally, has been the use and popularity of open source software. In the early ’90s, open source software, and platforms using open source, were in their early days. A possible exception was the GNU C Compiler (gcc), which had somewhat wide adoption and use. I did a smallish project extending gcc to produce apps that would run on Novell’s NetWare operating system (called NLMs, or NetWare Loadable Modules). I was only able to do this project because the software was available and I could envision an extension to it to provide such a feature. Why would I do this project? Because I wanted to continue to use my NeXT to produce NLMs, as was my job at the time, as opposed to doing this work on DOS, which was the norm. This driving desire to answer the question of “why can’t I do something?” leads to interesting pathways, particularly in light of open source software, and sometimes adds platform capabilities that solve real problems in easier ways; thus this anecdote’s inclusion here.
My exposure to platforms and open source was only beginning. During my tenure at Novell, I led the effort to write a version of NetWare that we hosted on various operating systems. This project was called Portable NetWare and a couple of years later was renamed NetWare for Unix. As to platform, my exposure was greatly enhanced by discussions with various candidate and eventual customers (minicomputer manufacturers, mainframe suppliers, and what were termed at the time “workstations,” including many hardware/software vendors running on RISC CPUs; e.g., Sun Microsystems, HP, DEC). I was given an extensive view into a wide array of platforms and operating systems (some UNIX variants, but not all) for Portable NetWare. One aspect of this “porting” of Portable NetWare to a hosted OS, for instance, was the need to bring to a candidate OS the transport protocol stack that NetWare required; namely, IPX (all based upon the early work at Xerox PARC and their XNS protocols). We had a version of IPX, and the other adjacent protocols needed for full NetWare functionality, written in STREAMS, a somewhat popular communication framework at the time. The idea was that if we could get STREAMS on a candidate OS then we could get the protocol stacks moved. In most encountered operating systems we extended the underlying operating system “kernel,” but on a few we ran the protocol stack (STREAMS and our IPX protocol stack) in “user” space. This approach took a performance “hit,” but it worked.
So, how do you extend a functional platform (in this case an OS) while preserving capabilities and performance? In other words, how do you extend the platform to have additional capabilities that are needed? There are “good” ways and “not so good” ways, but usually there IS a way.
After this, I was given the opportunity to explore the use of the MACH microkernel out of CMU, and more particularly a version positioned as a cluster OS (NORMA MACH, widely believed at the time to be in use on US Navy battleships), to host a clustered version of NetWare on HP workstations in a proof of concept. At the time, MACH was a beautiful piece of software (at least in my opinion of MACH version 3.0) where operating systems ran as “personalities” on top of the microkernel (“personalities” is a term used in a competing microkernel, ChorusOS, that I was able to play with at the time; I liked the terminology and the ChorusOS language for the architecture). A microkernel abstracted the hardware away from the operating system and in so doing allowed various operating systems to run, sometimes in parallel with each other, on the same microkernel, either on the same hardware or on a cluster of hardware platforms. During this time, with MACH and other advancements, I was able to explore the notion of large and distributed systems (operating systems and file systems) and concepts around concurrency, remote consistency (caching, which led to today’s thinking about “eventually consistent” protocols used in Blockchains and elsewhere), checkpointing, remote procedure call infrastructures (RPC), and Object Request Brokers (ORBs) for managing remote procedure services, along with many other infrastructure and distributed-systems ideas. This time was rich in my personal journey, expanding my exposure to capabilities, approaches, and architectures in software and in hardware, and more particularly in distributed systems and platforms. These early experiences in “platforms” were instrumental to my belief that it is these platform concepts that provide fertile ground and sustainability for ideas to grow. Mostly, it is the contract nature of the services provided and the ability to extend, or build upon, a platform to add an additional service or service layer.
The platforms available to me to this point were many: NetWare; NeXT, with the ease of app development using Objective-C and NeXTSTEP; UNIX and its many variants; various mainframe operating environments; file systems and distributed file systems; open source software and tools (e.g., gcc, BSD UNIX running on the open source MACH microkernel); OMG CORBA ORBs (you can look this up if you wish); Domain Directory Services; POSIX (where I learned that, generally, committees are good for input but bad for output); and many others.
The last project I worked on at Novell was using very early Linux (’93 and ’94) as the OS base for a product that Novell intended to give away as a desktop OS offering. We called it, internally, Corsair. To my knowledge, the first time Linus Torvalds, creator of Linux, came to the US from his native Finland was at my invitation, and at our expense, to come to Utah to see me and our team. We did a great deal with Linux at this time, and this added, for me, additional weight to the benefits of useful platforms and how they can be successful. Linus was a brilliant young man, and I’m sure is still a brilliant somewhat older man now, with just the right mixture of technical prowess, an easy-going demeanor, and also stern personality traits that played an almost indescribable role in the advancement of not only Linux but open source and platforms. He changed the world in ways that would take me volumes to describe from even my very limited view. I was lucky to have met him early in his, and my, career. Just a bit more on Linus and Linux. There are many, many attributes that contributed to the success of Linux, but one of these, early on, was Linus’ decision to not only open-source Linux but to do so under the GNU General Public License (GPL). A discussion of the genesis of GNU (the recursive acronym of GNU’s Not Unix), the GPL, and Richard Stallman is beyond the scope of this blog post, though it would be long, interesting, and entertaining, I believe; there are many wonderful articles that give context on why the GPL license was so important, and they are interesting reads nonetheless. Publishing Linux under the GPL was a great move, in my opinion, and that, combined with Linus’ knowledge and personality, was pivotal to Linux’s success early on and even until now.
This Novell Corsair project, built upon Linux, came with a NetWare client built in; a DOS emulator running DR DOS (which Novell owned at the time and which, somewhat surprisingly as to history, I purchased in the mid-’90s with a friend and still own, including the original PC operating system, CP/M, but, again, that’s another story); the ability to run off-the-shelf Microsoft Windows apps in a free-of-Microsoft-code platform, a “Windows emulator” whose technical genesis was acquired; and a very attractive Mac-like user interface, Looking Glass, licensed from a wonderfully creative company in Reston, Virginia, that no longer exists, called Visix. I hold in great esteem two wonderful and bright people at Visix who influenced me early on: George Hoyem, now a partner at In-Q-Tel, and Jeff Barr, a longtime employee of Amazon working on AWS and now an evangelist of AWS services. This Corsair operating system had a great set of tools and capabilities, could run the newly acquired (by Novell) Windows-based WordPerfect and associated apps, and many others, and had great network infrastructures. We also built, from scratch, a full-featured Web browser we called Ferret, before Netscape existed, that was fast and capable while also being able to run on Novell networks (running IPX and not TCP/IP, the transport protocol of the Internet). We did this with protocol gateways on the IPX network to route from IPX networks to TCP/IP networks (the Internet) and back again, so you could browse the Internet from a machine on a Novell network without having TCP/IP installed on your machine. It was very cool. Ultimately, and unfortunately, Novell decided not to ship this product and stopped its progress.
When that happened, I quit and started a Linux company, Caldera, to recreate much of what I could “on the outside.” I wasn’t able to recreate and build out all I desired in Caldera, and I made many mistakes along that journey, it being my first entrepreneurial adventure, but my belief in the value of platforms and their potential impact continued. We saw possibilities, at that time, with Linux (as a generally more available and useful UNIX) and its many standard interfaces, HTTP for the exchange of information and services, client/server infrastructures, transport stacks, and the ability to extend or add to most every “layer” in this rich suite of services.
At this point, I was hooked on the idea of platforms, and much of my career since has been influenced by these early days. After my experience at Caldera, I started (with a few others) an embedded Linux system software platform company, Lineo, working with device manufacturers all over the world embedding Linux and Linux services in a wide array of small and large embedded systems. This was clearly a platform play, and we offered a wide and deep set of services and SDKs (software developer kits) while extending what Linux inherently offered at the time (e.g., hard real time, a suite of capabilities in and on top of Linux to solve a wide array of embedded-market challenges, and help with ports to other CPU types and architectures). Our customers, and investors, were some of the world’s largest device manufacturers (e.g., Samsung, Motorola, Mitsubishi, Acer, Mitac, France Telecom, Hitachi, Sun Microsystems, and many others). Based on the concepts of protocols and definitions, we were able to leverage and extend Linux in ways that were meaningful and valuable to our customers. The ultimate penetration of Linux into these types of devices, as we sit here today, is a testament to service layers built on other reliable layers. Linux is the basis for many “embedded” devices: routers and other network infrastructure devices, and handheld devices running Android (based on Linux), Samsung’s Tizen, LG’s webOS, and many others. The list of devices, large and small, that use Linux is too long to enumerate. Lineo was ultimately purchased by Motorola’s semiconductor group.
After Lineo, I started another Linux platform company, Solera Networks, based on the idea of network packet capture and the ability to do forensic reconstruction of past events from large and very capable network packet capture and storage. We could capture and store (write to disk) at incredible rates. So how might one use the platform to navigate the many millions of packets captured to date to reconstruct some HTTP session, a Voice-over-IP (VOIP) telephone call, or something else? We wrote our own filesystem for Linux that was prejudiced toward “write” performance, whereas most other filesystems are prejudiced toward “reads,” and we could scale our filesystem to many, many terabytes, all while capturing fully saturated 10Gb network packet streams. For navigation, we decided to extend the Linux filesystem interface with new capabilities. For instance, say that you wanted to track all VOIP packets from a particular IP address during some time window, perhaps due to some legal “wire tap” of VOIP phone calls. We allowed the UNIX file “creat()” call to be used, with the name of the file “created” serving as guidance to our filesystem for the filter to satisfy. For instance, on the command line we used the UNIX “touch” command, or utility, to simply create a file with a name. The name created might be something like touch -f 07.04.2020.13:15-07.04.2020.15:00_srcIP-10.10.10.2_prot:VOIP, which would create a file (more particularly, a standard Packet Capture, PCAP, file, if you want to look that up) populated with TCP/IP packets from the source IP address 10.10.10.2 during the time window of 4 July 2020 at 1:15pm through 4 July 2020 at 3pm, but only including VOIP packets.
This building upon a well-known interface allowed for rapid development of “upper layer” applications, like reconstruction apps, without having to “learn” a new interface to gain access to and filter through the underlying packets. This is a very cool, though nerdy and perhaps esoteric, “platform” idea. You extend a known interface to provide new functionality and follow all the conventions and protocols that users of this interface would expect from their historic use of it.
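To make the idea concrete, here is a small Python sketch of how such a filter-encoded file name could be decomposed on the filesystem side into a capture query. The naming grammar below is my reconstruction from the single example above, for illustration only, not Solera’s actual format.

```python
import re
from datetime import datetime

# Pattern for a filter-encoded capture-file name. The grammar is reconstructed
# from the example in the text (illustrative, not the shipped format).
NAME_RE = re.compile(
    r"(?P<start>\d{2}\.\d{2}\.\d{4}\.\d{2}:\d{2})-"  # window start
    r"(?P<end>\d{2}\.\d{2}\.\d{4}\.\d{2}:\d{2})"     # window end
    r"_srcIP-(?P<src>[\d.]+)"                        # source IP filter
    r"_prot:(?P<prot>\w+)"                           # protocol filter
)

def parse_filter_name(name):
    """Decompose a filter-encoded file name into a capture query."""
    m = NAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a filter-encoded name: {name!r}")
    fmt = "%m.%d.%Y.%H:%M"                           # e.g. 07.04.2020.13:15
    return {
        "start": datetime.strptime(m.group("start"), fmt),
        "end": datetime.strptime(m.group("end"), fmt),
        "src_ip": m.group("src"),
        "protocol": m.group("prot"),
    }

query = parse_filter_name(
    "07.04.2020.13:15-07.04.2020.15:00_srcIP-10.10.10.2_prot:VOIP"
)
print(query["protocol"], query["src_ip"])  # which packets to materialize
```

The filesystem would then satisfy a read of that file by materializing only the packets matching the decoded query, so any PCAP-aware tool could consume it through the ordinary file interface.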
This technology found wide adoption in enterprise, education, and government. Ultimately, this company was acquired by Blue Coat Systems, with Steve Shillingford, our DeepSee CEO, as the then CEO of Solera Networks.
Prior to DeepSee I spent a few years helping a successful educational technology company, Imagine Learning, move their offering to the cloud and then extend their offerings in many ways, including the introduction and use of machine learning. IL uses gamified activities to teach young kids, particularly those that don’t have English as their first language, how to read, write, and speak academic English, and also offers activities and instruction to teach math skills and concepts to K-12 markets. I was their CTO for the first 4+ years of my time there and then held the title of President, which I shared with another, for the last 1.5 years. Early on, I helped this company move from approximately 4500 on-premises servers to a full cloud offering. I worked with great people in this effort. We served millions of kids and hundreds of thousands of teachers, and the amount of data we captured and analyzed gave me, and us, unique views into scalable systems, large data collections and data lakes, OLAP cubes, microservices, machine learning, high-volume event systems (e.g., streaming analytics), and many other “platform” concepts. We were heavy users of both Amazon AWS and Microsoft Azure cloud platforms.
Suffice it to say, the idea of platform has been highly transformed over the years and is now, in my opinion, a basis for untold innovations to come.
The API’ification of the Internet of the past few years is only the beginning. There really aren’t words to describe the level of abstraction and capability in what we would call platforms and platform services when compared to the early days. The abstraction of services, the democratization of access to methods and data and capabilities to extend make for a future that is very bright. There is almost a “perfect storm” of democratization of two concepts that will change and enable untold insights. The first is platforms, and all their benefits lightly described here and more, and the other is machine learning and our primary area of interest, Natural Language Processing.
Our view at DeepSee, and my personal interest in this venture, is based on the idea that Natural Language Processing (NLP) using Deep Learning and Machine Learning (ML) models and infrastructures is in its early days of innovation. Whereas more structured-data ML concepts and frameworks have seen high innovation curves over the past few years, unstructured data and NLP are just beginning, with perhaps an inflection point that coincides with the release of the Transformer ML model, by Google, just a couple of years ago. The Google project BERT was the first of many variants that set a path for high innovation curves surrounding NLP in the years to come. One offshoot of Transformer models was picked up by the group OpenAI with their GPT-2, and now GPT-3, pre-trained models. These models are amazingly capable and further the potential innovation to build with. NLP innovations, in our opinion, will lead to Natural Language Understanding. This next level of innovation has the potential to answer the question, “what was meant by the text just viewed or read?” It is fascinating and exciting on many levels. Further, this innovation around unstructured data, and the many insights to be gained from it, is meaningful and doable when coupled with a full-featured platform to access data and insights, extend data availability to offer new insights, and introduce and enhance ML models that yield even further insights to solve real problems. The Robotic Process Automation (RPA) efficiency opportunities alone are large, while meaningful and actionable insights open new actions or mitigations learned from documents that until now have been unseen or unrealized. Additionally, the ingestion of documents of many types, combined with AutoML layers to select the best NLP models for the text just given, will allow for easier adoption of these technologies.
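The AutoML routing idea can be sketched very simply: inspect an incoming document and pick an NLP model suited to it. The model names and heuristics below are hypothetical placeholders for illustration, not DeepSee’s actual AutoML layer, which would use far richer signals than word counts.

```python
# Purely illustrative router: choose an NLP model based on coarse features
# of the incoming text. Model names and thresholds are hypothetical.
def pick_model(text: str) -> str:
    words = text.split()
    if any(w.isdigit() for w in words) and len(words) < 50:
        return "table-extraction-model"    # short, number-heavy: likely a form
    if len(words) > 2000:
        return "long-document-summarizer"  # too long for a standard context window
    return "general-bert-classifier"       # default transformer-based classifier

print(pick_model("Invoice 1234 total 99"))  # → table-extraction-model
```

A production router would consider document type, language, layout, and model confidence, but the value is the same: the caller never needs to know which model ran, only that the platform picked a suitable one.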
In other words, if we are successful in our platform build out, it will bring a democratization of a wide array of capabilities with meaningful abstractions to hide away the more mundane tasks so that “towers” of innovative solutions can be built on “top” of the platform that we will continue to improve.
All this is why I, on a more personal basis, am here at DeepSee and why I am surrounded by bright and capable people, who all have a different mixture of experiences that brings them here, to build out this vision. We are optimistic for where we are now and where we can and will go.
Please reach out to us to learn more of how we can help you. Or reach out to join us in this venture if I have in some way piqued your interest.
If you have made it here, I still think this is a TL;DR-kind of post, but thanks for reading.