Mashgin: The Future of Computer Vision

About a year ago I invested in and joined a startup called Mashgin. In this post I want to talk a little about what we’re working on.

Mashgin is building a self-checkout kiosk that uses cameras to recognize multiple items at once without needing barcodes.

The current version of the kiosk is designed for cafeterias, where customers can slide their tray up and everything on it is recognized instantly. Excluding payment, the process takes around 2 seconds. No more waiting for a single line held up by a price check!

But retail checkout is just a package around Mashgin’s core technology. We believe there is an opportunity to apply recent technical advancements to many fields. Advancements such as:

  • Smartphone dividends — cheap sensors and ubiquitous, miniaturized electronic components
  • Cheap parallel processing power including low-cost GPUs
  • An explosion in collaborative, open-source software tools
  • Machine learning methods, in particular convolutional neural networks (a byproduct of the 2 preceding trends)
  • Cheap cloud infrastructure

Chris Dixon talks more about some of these trends in his post What’s Next in Computing?

So how is Mashgin applying this technology?

Adaptive Visual Automation

Face swap: billionaire edition

Computer Vision transforms images into usable data (descriptions) using software. If cameras are the “eyes” of a machine, computer vision would be the brain’s visual cortex–processing and making sense of what it sees.

When computers know what they’re looking at, it opens up a world of potential. You can see it in existing use cases from facial recognition in Facebook photos (…or face swap apps) to Google Image Search and OCR. Newer, much more sophisticated applications include driverless cars, autonomous drones, and augmented reality.

Gradient Descent
A visual example of using gradient descent (the reverse of hill climbing in a fitness landscape) as part of the learning process of a neural network
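
To make the caption concrete, here is a minimal sketch of gradient descent on a single variable. The loss function, learning rate, and step count are illustrative assumptions. A real neural network applies the same kind of update to a very large number of weights at once.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to walk downhill."""
    x = start
    for _ in range(steps):
        # Move a small distance in the direction that lowers the loss.
        x = x - learning_rate * grad(x)
    return x


# Hypothetical loss for illustration: f(x) = (x - 3)^2.
# Its gradient is 2 * (x - 3), so the lowest point sits at x = 3.
def loss_gradient(x):
    return 2 * (x - 3)


minimum = gradient_descent(loss_gradient, start=0.0)
print(round(minimum, 4))  # very close to 3
```

Hill climbing would add the gradient instead of subtracting it, which is why the caption calls descent its reverse.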

These recent applications tend to be more complex, and as a result use machine learning in addition to traditional image processing methods. Machine learning, and in particular deep learning through neural networks, has changed the game in many areas of computer science, and we are just beginning to see its potential. ML can simplify a large amount of data into a single algorithm. As the name implies, it can learn and adapt to new information over time with little or no “teaching” from engineers.

Both CV and ML can be applied to many fields, but one of the biggest immediate needs is in automation. There are a surprising number of simple (to humans) visual tasks ripe for automation. This includes industrial use cases in manufacturing and distribution, and consumer use cases in household robotics and relief of everyday bottlenecks.

I call the above combination adaptive visual automation: using machine learning to automate vision-based tasks. Although relatively new, this combination covers a large and quickly growing class of real-world problems. Autonomous cars (and especially trucks) are a good up-and-coming example that will have huge ramifications.

Mashgin’s future

Mashgin uses adaptive visual automation to improve the speed, accuracy and cost of applications in recognition, measurement, and counting in a closed environment. That was a bit of a mouthful, so here’s the short version: Mashgin wants to make visual automation intelligent.

There’s a broader category of AI vision companies whose purpose is giving computers the ability to understand what they see. Mashgin is a subset of this group, focusing on automating well defined real-world problems.

There are further subsets such as eliminating bottlenecks in everyday circumstances — speeding up checkout lines being one example. In many of the activities you do on a daily basis, intelligent automation has the ability to save a huge amount of time and money.

Retail checkout is a big market (even for just cafeterias) but it only scratches the surface of the value Mashgin will eventually be capable of. We have already established a foundation for applying recent advancements to these problems and it will only get better from here.

Atlastory: Mapping the history of the world

Certain ideas are “inevitable” over time. Paul Graham calls them “[squares] in the periodic table” — if they don’t exist now, they’ll be created shortly. It’s only a matter of when, not if.

I believe that Atlastory is one of those ideas. The following is a long post about a project I’ve been passionate about for some time now and am currently in the process of winding down.

The Idea

Atlastory is an open source project to create an interactive map that chronicles the history of life on earth. It’s a “Google Maps” for history. The ultimate goal is the ability to see what the world looked like 50, 200, 1000+ years ago. It was inspired by OpenStreetMap & Wikipedia: combining historic maps with cultural & statistical data.

Atlastory map in action

I started Atlastory at first because I’m a fan of both history and good data visualizations. I was surprised something like this didn’t already exist and thought that it would be an amazing educational tool.

Maps are one of the best ways to clearly show an enormous amount of information. Since everything in the past took place at a certain time and location, maps are an obvious choice to visualize that knowledge. Understanding history requires seeing changes and interactions over time, and a four-dimensional map allows this.

“To envision information—and what bright and splendid visions can result—is to work at the intersection of image, word, number, art.” — Edward Tufte

Good design will be a key aspect of the final product. Good information design can communicate a huge amount of knowledge in a small window of time or space. Great information design has a high amount of density and complexity while remaining completely understandable.

The Vision (version ∞)

Atlastory’s purpose is to improve understanding of the past by organizing and visualizing historic knowledge.

My vision for Atlastory was that one day it would become a tool like Wikipedia that’s used regularly around the world. A journalist could use it to go back 20 years to see the geography and timeline of a major world event. A student could use it to go back 20,000 years to see the expansion of human culture across the globe. A climatologist could use it to visualize the historic overlap of population growth with changes in global climate patterns.

Wikipedia organizes information by creating a searchable network of interconnected articles that combine text and other multimedia. Atlastory can be the first medium that allows completely visual navigation, displaying information at a much higher density and level of interactivity.


Imagine students in a classroom learning about World War II. You’d be able to see the country borders of Europe as they existed in 1942. Drag the timeline, and see the borders change as the years go on. Turn on an overlay of population density or GDP per capita and see the flow of activity throughout the war. Zoom in and see the troop movements of a pivotal battle.

The visual interactivity would make it much more enticing for people, young and old. Almost game-like in terms of exploration and discovery.

Eventually, the timeline could go back far enough that you’re able to see continental drift and other pre-historic geographic or environmental changes.

Map content

Maps can be broken down into a few different types:

  • Physical — shows the physical landscape including mountains, rivers, lakes.
  • Political — sovereign, national and state boundaries, with cities of all sizes. The typical world map you see will be political with some physical features.
  • Road — shows roads of various sizes along with destinations and points of interest. Google Maps & other navigation apps fall into this category.
  • Statistical — shows statistics about human populations such as economic stats, population density, etc.
  • Scientific — thematic maps that can show climate, ecological regions, etc. (see the climate map below)
  • Events — shows how a specific event played out geographically, like WWII or Alexander the Great’s conquests.

Climate patterns

Any map type that has enough data to span long periods could eventually go into the Atlastory system. Event, thematic, statistical, and scientific maps could all seamlessly layer on top of the main “base map”.

Base map

The Atlastory base map should be an elegant combination of three map types: physical (basic landscape features), political (sovereign and administrative boundaries), and cultural (see below). Major roads and infrastructure would be added only after a worldwide “structure” of the base map was created.

Importantly, map creation should be top down, from global to local. The purpose of an Atlastory map is not navigation, it is understanding of history. Creating a global structure will also provide context and make it easier to interest other users/contributors.

Cultural cartography

Most world maps made today (of the present time or of the last few hundred years or so) are of the political variety. But what happens when you go back a few thousand years? What about areas of the world that, even now, aren’t necessarily defined by geopolitical boundaries?

The solution is mapping cultural regions. Culture, in this case, being human societies with common language, belief systems, and norms. “A cultural boundary (also cultural border) in ethnology is a geographical boundary between two identifiable ethnic or ethno-linguistic cultures.”

A cultural map would have different levels, just like political maps: from dominant cultural macroregions to local divisions between subcultures or classes within a society (blue collar vs. white collar, etc.).

Combining cultural cartography with typical map types allows for a much better understanding of both modern and ancient history. Culture plays a major role in world events & limiting the map to only defined borders paints an inaccurate view of history.

Cultural regions

(Notice any overlap between cultural regions and the climate regions in the map above it?)

The Tech

The technical infrastructure behind Atlastory has a few basic components:

  1. A database of nodes (latitude/longitude points) organized into shapes, layers, types, and time periods.
  2. An API that manages, imports and exports data from the database.
  3. A crowdsourced map editor interface (like iD for OpenStreetMap, but designed specifically for top-down time-based editing).
  4. A map rendering service that turns raw map data from the database into vector tiles that can be styled for viewing.
  5. The map itself: a web interface to view and navigate the maps.
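
As a rough illustration of the first component, here is one way nodes, shapes, layers, types, and time periods might fit together. This is a sketch only, not the actual Atlastory schema: every class name, field name, and example record below is a hypothetical placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A single latitude/longitude point."""
    lat: float
    lon: float


@dataclass
class Shape:
    """A group of nodes that is valid for a span of years (hypothetical model)."""
    name: str
    layer: str                 # e.g. "political", "physical", "cultural"
    shape_type: str            # e.g. "polygon", "line", "point"
    start_year: int            # first year the shape applies (negative = BCE)
    end_year: Optional[int]    # last year it applies, or None if still current
    nodes: List[Node] = field(default_factory=list)

    def exists_in(self, year: int) -> bool:
        """True if the shape is valid in the given year."""
        return self.start_year <= year and (
            self.end_year is None or year <= self.end_year
        )


def shapes_at(shapes: List[Shape], year: int, layer: Optional[str] = None) -> List[Shape]:
    """Return the shapes to draw for one year, optionally limited to one layer."""
    return [
        s for s in shapes
        if s.exists_in(year) and (layer is None or s.layer == layer)
    ]


# Made-up placeholder records, not real historical data.
example_shapes = [
    Shape("Region A", "political", "polygon", 1900, 1950, [Node(0.0, 0.0)]),
    Shape("Region B", "political", "polygon", 1951, None, [Node(1.0, 1.0)]),
    Shape("River C", "physical", "line", -5000, None, [Node(2.0, 2.0)]),
]

visible_1942 = [s.name for s in shapes_at(example_shapes, 1942)]
print(visible_1942)  # ['Region A', 'River C']
```

Dragging the timeline then amounts to calling `shapes_at` with a different year and re-rendering the result.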

Most of the components would be built from existing open-source tools created by organizations like OpenStreetMap, MapBox, and CartoDB. There has been a lot of technical innovation in this field over the past few years, which is one of the main reasons something like Atlastory is now possible to build. (Although given what I know about the requirements, it is still very challenging.)

Read more about the technical requirements…

The current status and future of Atlastory

I’ve been working on this as a side project for more than 3 years now. Originally I imagined being able to quickly find a way to profit from the service. But as development dragged on and other commitments began taking up more of my time, I realized I’d never be able to finish it alone.

Earlier this year I joined Mashgin, a startup in the Bay Area, as a full-time “Generalist.” My spare time completely dried up and I decided everything needed to be completely open sourced and distributed to anyone interested in the project.

Due to personal time constraints, I can’t continue with it so I’m looking for others who are interested. This could mean taking over / adapting the codebase or using other means to pursue the idea. See below for more details on what’s currently done. Although many of the back-end components are functional, the infrastructure is in a rather unusable state right now.

Please contact me or leave a comment below if this strikes your curiosity or you know anyone else who would be interested. I’m happy to answer any questions.


Education & Elon Musk’s School Startup

One of my “later in life” goals has always been to start my own school. A “School Startup” rather than a Startup School, if you will. The school would be radically different than traditional education. Charter schools, Montessori education, and AltSchool are steps in the right direction but don’t go far enough. (See my post a year ago on mental model education as an example.)

Once again, Elon Musk has stolen my idea. Running two revolutionary billion-dollar companies just wasn’t enough. But of all the people who have and will try something like this, I think Musk has much higher odds of pulling it off. (Then again, the same could be said about a lot of undertakings…)

I hadn’t seen this news before, but Eric Jorgensen linked to a video from a May interview with Beijing News where Musk discusses creating Ad Astra, a private school in L.A. for his 5 kids and 15 or so others (primarily kids of SpaceX employees).

“I created a little school,” Musk began. “It’s small, it’s only got 14 kids now and it will have 20 kids in September. It’s called ‘Ad Astra’ which means ‘to the stars.’ ” He continues:

What’s a bit different from most other schools is that there aren’t any grades — there’s no grade 1, grade 2, grade 3 type of thing — and not making all the children go in the same grade at the same time like an assembly line. Because some people love English, or languages, some people love math, some people like music, and have different abilities at different times. It makes more sense to cater the education to match their aptitudes and abilities. So that’s one principle.

Another is that it’s important to teach problem solving — or teach to the problem, not to the tools. . . . Let’s say you’re trying to teach people about how engines work. A more traditional approach would be to say we’re going to teach all about screwdrivers and wrenches, and you’re going to have a course on screwdrivers, a course on wrenches . . . A much better way would be like “Here’s the engine, now let’s take it apart. How are we going to take it apart? Oh, you need a screwdriver. That’s what the screwdriver is for. You need a wrench — that’s what the wrench is for.” And then a very important thing happens which is that the relevance of the tools becomes apparent.

I think the problem solving aspect is key. Having kids work alone or in teams to solve problems, with a teacher to guide and review after. As they get older, using a “case study” approach to learning can supplement this as a way to learn multidisciplinary mental models.

Imagine learning about the space race in the 1960s. For 1 or 2 weeks students can learn (through a mixture of lectures, media, internet research, experimentation, etc.) about a whole range of disciplines: history of the space race & cold war tensions, politics, math & science of getting to space, reading chapters of “The Right Stuff”, engineering by building model rockets, and so on.

I’m sure there are schools/teachers who already do this but I wish it was the norm rather than the exception.

Found Quotes 3

Let’s start with a test: Do you have any opinions that you would be reluctant to express in front of a group of your peers? If the answer is no, you might want to stop and think about that. If everything you believe is something you’re supposed to believe, could that possibly be a coincidence? Odds are it isn’t. Odds are you just think whatever you’re told. ― Paul Graham

If you’ll laugh about something one day, you may as well start now. ― Paul Graham

Be patient, calm, compassionate. Know that existence is fleeting. ― Ettore Sottsass

A person’s success in life can usually be measured by the number of uncomfortable conversations he or she is willing to have. — Tim Ferriss

We start from the presumption that our people are talented and want to contribute. We accept that, without meaning to, our company is stifling that talent in myriad unseen ways. Finally, we try to identify those impediments and fix them. — Ed Catmull

Adventurous men enjoy shipwrecks, mutinies, earthquakes, conflagrations, and all kinds of unpleasant experiences. They say to themselves, for example, ‘So this is what an earthquake is like,’ and it gives them pleasure to have their knowledge of the world increased by this new item. — Bertrand Russell

Reality provides us with facts so romantic that imagination itself could add nothing to them. — Jules Verne

1) Don’t sell anything you wouldn’t buy yourself, 2) Don’t work for anyone you don’t respect, 3) Work only with people you enjoy. — Charlie Munger

The game of life is the game of everlasting learning. At least it is if you want to win. – Charlie Munger

Berkshire’s Best Investments + Poster Now Available

[This is a cross post from the Explorist Productions blog. Explorist is a media company I founded that publishes content related to business, innovation, and discovery.]

The Berkshire Hathaway limited hardcover letters book and “50 Years of Berkshire” wall print are now available for purchase online. Both of these items were available at the meeting a month ago and I’ve received lots of praise about them from other shareholders, so I’m glad to finally make them available to everyone.

In the process of doing research for the visualization, I collected a lot of data on Berkshire’s financial history — much more data than could fit in the charts on the print.

So in addition to the wall print, I hope to release a few more posts further exploring the story of how Warren Buffett transformed Berkshire over the years. Once I reformat and clean it up, I’ll eventually release the raw data so that others can do their own analysis.

Berkshire Hathaway’s Best and Most Notable Investments

The following chart shows the cumulative contribution to book value* of selected investments over 50 years. This is a good yardstick for comparing how successful investments were over time. It doesn’t include insurance companies other than GEICO, as it’s too difficult to separate individual performance given available data.



  • See’s Candy: Income for some years after year 23 is estimated.
  • Buffalo News: No data available after year 23.
  • BNSF: Post-acquisition performance only (pre-2009 stock return not included).
  • Dividend income for stock holdings calculated in most cases on average shares held during year.

Some interesting tidbits:

  • One-third of Coca-Cola’s total gain to Berkshire is in dividends paid over the 27-year holding period. One-quarter of the Washington Post gains are from dividends, the remainder from realized gains in the 2014 sale/transfer.
  • With underwriting gains, GEICO has added 7,119% to book value since purchase in 1976. This means that had the rest of Berkshire’s investments returned 0% over those 38 years, annual book value growth would still have been 12%.
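
The GEICO arithmetic can be checked with a quick sketch. It assumes the 12% figure is a compound annual rate: a 7,119% cumulative contribution spread evenly over 38 years on the starting equity base. The function name is made up for illustration.

```python
def implied_annual_growth(cumulative_gain_pct, years):
    """Compound annual rate implied by a cumulative percentage gain."""
    growth_multiple = 1 + cumulative_gain_pct / 100  # 7,119% gain -> 72.19x
    return growth_multiple ** (1 / years) - 1


geico_rate = implied_annual_growth(7119, 38)
print(f"{geico_rate:.1%}")  # roughly 12% a year
```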

* A simple example to show the calculation: ABC Corp. is purchased in year 1, adding $100 (either in net income for subs, or change in unrealized gains + dividends for investments) that year to an initial equity base of $1,000. So contribution after year 1 would be 10%. In year 2, ABC Corp. adds another $100 to a starting equity base of $1,300. Contribution for that individual year would be 100/1300 = 7.7%, but cumulative contribution would be 20%, as ABC Corp. has contributed $200 to an initial equity base of $1,000.

This measurement puts investments on an equal footing, allowing comparison across different timeframes. It implicitly accounts for both individual return and capital allocated to the investment. What is not accounted for is excess capital reinvestment — in other words, contribution is based on GAAP net income, not true free cash flow.
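
The footnote’s ABC Corp. example can be written out as a short sketch. The function names are hypothetical, and the inputs are the footnote’s own numbers ($100 added in each of two years, a $1,000 initial equity base, and a $1,300 starting base in year 2).

```python
def cumulative_contribution(yearly_gains, initial_equity):
    """Running total of gains divided by the equity base at purchase."""
    total = 0.0
    running = []
    for gain in yearly_gains:
        total += gain
        running.append(total / initial_equity)
    return running


def single_year_contribution(gain, starting_equity):
    """One year's gain divided by that year's starting equity base."""
    return gain / starting_equity


abc_cumulative = cumulative_contribution([100, 100], initial_equity=1000)
abc_year_2 = single_year_contribution(100, starting_equity=1300)

print([f"{c:.0%}" for c in abc_cumulative])  # ['10%', '20%']
print(f"{abc_year_2:.1%}")                   # 7.7%
```

Dividing by the initial base rather than each year’s base is what keeps early and late investments comparable on the chart.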

A mental model education

Based off a previous #tweetstorm.

See: So Bill Gates Has This Idea for a History Class . . . by Andrew Ross Sorkin. Some relevant quotes:

Christian’s aim was not to offer discrete accounts of each period so much as to integrate them all into vertiginous conceptual narratives, sweeping through billions of years in the span of a single semester. . . . In the worldview of “Big History,” a discussion about the formation of stars cannot help including Einstein and the hydrogen bomb; a lesson on the rise of life will find its way to Jane Goodall and Dian Fossey.

“Most kids experience school as one damn course after another; there’s nothing to build connections between the courses that they take,” says Bob Bain.

“This course is a fundamental shift in how you deliver something. But there’s so many factors in American education that work against it.”

If any of this interests you, check out David Christian’s Big History Project and watch his TED talk The history of our world in 18 minutes.

Christian’s view of teaching is what I call a mental models approach that weaves narratives from all disciplines. It’s not only more interesting, but a more accurate portrayal.

Unfortunately, it is hard to introduce this into current curriculum. There are bureaucracies and “kingdoms” to protect and people set in their ways.

The easiest way is to build a new education system from the ground up: rethink everything including:

  • The concept of “classes” and the compartmentalization of subjects (aka the mental model approach).
  • Scheduling — length and timing of the school day, length of the school year, a rigid “period” schedule vs. a more free-flowing approach…
  • Grade levels — why should kids born within a defined 365 day period be taught together? How could this be adjusted?
  • Range of subjects — what else should be taught other than the typical math, science, language, history?
  • Self-motivation policy — is homework useful? What should students do outside of class? How much should the school be involved in this?
  • . . .

There are tools that aid this kind of learning, both in and out of class. Big History is one. I believe something like Atlastory, a project I started, is another.

The dawn of immersive storytelling

From a previous #tweetstorm:

Immersive storytelling will be a big industry in the near future: movies viewed on Oculus Rift, dome-like cinemas, or interactive games. We have co’s like Jaunt, Condition One & (consumer) making 360 cameras that will be used for filming.

A new visual “grammar” will have to be discovered by filmmakers through trial and error (i.e. no fast cuts, super close-ups, etc.). Parts of the legacy film industry will rebel at first, as they have over the last 100 years since storytelling evolved from live performances to filmed, pre-recorded stories.

Just like audiences were frightened at the sight of a train barreling towards them in early theaters, there will be a learning curve for immersive experiences. Early players of demo games for the Oculus Rift have been scared to the point of ripping their headsets off. Dome cinemas could be the social alternative to VR headsets. (If you’ve ever been on Disney’s Soarin’ Over California ride, that’s an example.)

Technology-wise, I feel a complete 360 field-of-view (FOV) like this Jaunt setup won’t be the way to go, at least not yet. There has to be some direction to the audience’s attention, and a complete FOV is too immersive and incompatible with users’ prior experiences. A better fit for now would be something like a 180-220 degree FOV + 180 up and down, which allows some freedom of motion (immersion) but still gives a directed view with surround sound.

There is lots of experimentation ahead in the near future in both technology and storytelling grammar. I look forward to both observing and participating.