Equijoin


Published on 12/31/2024 00:00 by Brian 'bits' Olsen

Why I founded Equijoin

Hey friends,

I’m happy to share this strange but exciting journey of starting a business, project, and community that I have called Equijoin. The name comes literally from the relational algebra operation equijoin, which speaks to the foundational strategy this company will follow: building out small, sustainable open source businesses in the data space. The deeper meaning of Equijoin lies in the long-term vision of making open source a sustainable model for global and social change that spreads from software and technology into a humanitarian movement. This movement works in tandem with other open and altruistic ideas like effective altruism, open knowledge, and open movements in general. I know the term “open” can be quite nebulous these days, but there are special lessons we can learn about how society can mimic models of cooperation from open source that improve the lives of stakeholders, not just shareholders.

The Equijoin Vision

Understanding the way forward shouldn’t be driven by a single person, agency, or company. It will take gathering the awareness and insights of people all across the globe to understand how we collectively define this new approach to collaboration. Rather than bringing all the answers, I want to present raw ideas to play with, drawn from the interdisciplinary scope of ideas Equijoin proposes. I have particular talents and perspectives that come from being raised in a poor family, experiencing child abuse, having ADHD, becoming a U.S. Marine, spending over a decade in software and data engineering, and learning various aspects of business in small startups that expand open source tools. This arms me with the drive and a somewhat informed perspective on where to begin this journey. But that’s the point: nobody has the full context to lead and align 350 million people, much less 7 billion people globally. There are so many areas where I know little to nothing and have blind spots that building a community around core collaborative and philosophical ideas in open source is a necessity if we want to expand these principles beyond their narrow application to technology decisions. That said, using the internet to build open source software that challenges large businesses provides the first global-scale unionization mechanism to truly bring power to the masses.

By combining and sharing the lessons of history and using political and economic strategies, we can form a concrete set of economic initiatives driven by the voices of anyone who believes in hearing all voices. We need to distance ourselves from the ideologies and narratives that the current power-centralized system uses to isolate us, and focus on how we collectively give everyone a voice. We need leadership from all political and social ideologies. We need all demographics: academics, the impoverished, white and blue collar workers, 0.01% elites who detest the system they operate in, and anyone who believes in collective voices that enable diverse ethics and morality and avoid cultural homogenization at the federal scale. We’ve slowly crept back to a second Gilded Age in the United States, where the elite class makes sweeping decisions over a population that feels unheard and powerless.

I refer to this raising of awareness and collective action as a modern incarnation of syndicalism. Syndicalism was the foundational movement that led to workers’ unions in the early days of the industrial revolution. Workers came together to demand better working conditions when they realized that the only way to get companies to listen was through direct action that hit companies in their pocketbooks. Although strikes at times devolved into violence, the kind of direct action we see in open source communities offers a non-violent alternative that is legal and less likely to devolve into violence or be disbanded.

I’ve named this movement open syndicalism, a portmanteau of open source and syndicalism. Open syndicalism is a new way to peacefully organize and protest by building collaborative small businesses (worker cooperatives) around open source to challenge the centralization of wealth. These worker cooperatives are small but mighty as a whole, and they borrow open source development, licensing, and governing models as a blueprint to organically combat anti-competitive practices. A collective will start with consumers or underdog competitors, who can band together to create open source standards and build multiple variations of interoperable, consumer-driven alternatives. Consumers then come to realize that shared standards organically encourage the corporate behavior of a true laissez-faire market. Picture trying to sell a computer today that only offers specialized peripheral connections instead of USB. Consumers learn that avoiding the standard is suspect, and they prefer more degrees of freedom between the products they buy. This enables consumers to push back on monopolies or oligopolies. Once a battery of open standards becomes popular enough, it eliminates the methods by which corporations can lock you in and removes the financial incentive to do so. If you think this sounds great in theory but is impossible, you should watch Revolution OS to see early examples of open source succeeding in operating systems against large corporations through networks of friends collaborating online. I have experienced this first-hand as well, which led me to a realization that I needed to share with others (more on this below).


Open syndicalism logo, combining the Open Source Initiative logo with the syndicalism black cat.

The vision of Equijoin, then, is to first enlighten others in the local data and analytics space and prove out this small-business model, and then grow this idea beyond data and tech. The step after will grow to other industries that overlap with software but have resource. One large part of this vision is to avoid promoting change through “othering” others. Instead, we need to realize that this movement is about addressing a system that has historically subjugated portions of society, that aims to systemically increase the wealth gap, that polarizes individuals through fear orchestrated by those currently in power, and that moves us closer to global catastrophes. We need to recognize that these systems pose a threat to our very survival, and use that as motivation to come together as humans to fend off these threats with a new aligned narrative. This is much larger than nation states, ideologies, or land disputes. This is about rewriting the script on how we collaborate to avoid the imminent threat to ourselves.

What led to Equijoin LLC

Over the last five years, I have had the privilege of working with the most talented individuals in tech who have a curious desire to open source all the software they write. I myself was bitten by the open source bug back when I was provided a way to experience connection with an online community where I could grow my economic value and work on projects that interested me. I could also showcase this work in public as all accounts of my contributions were viewable forever online. As my involvement grew, I was eventually approached to work at Starburst, a company built around an open source project I contributed to called Trino, and do so full time!

Through working with the Trino project, I also grew fond of an open SQL table format called Apache Iceberg that overlapped with the functions of Trino. As I expanded into more open source communities, I learned a lot about the complex incentive structures and variations of systems that govern how open source projects function. Open source projects generally provide more than just the ability to view and run the software. They have learned that transparent and participatory collaboration with anyone using the software provides positive-sum benefits to all the users. Around healthy open source projects you’ll see ecosystems of full-scale products, exterior tooling, and services created to centralize a great deal of effort between individuals and groups of individuals (e.g. companies). In my more recent occupation at Tabular, a company built by the founders of Apache Iceberg, I learned the valuable open source philosophy of keeping the community focused on standardizing an open specification as opposed to standardizing on a shared implementation. This has changed the way I believe we should approach open source projects.

As opposed to many of the open source projects that centralize innovation on an implementation, Iceberg centralized around standardizing technical protocols, policies, and definitions rather than expecting multiple companies to combine their intellectual property. Keeping the ideas and protocols standardized kept the commodities consistent for the consumer market, which lets consumers avoid vendor lock-in while trialing different products for different use cases.


An example of open standards that uses the fable of the three little pigs: the standard defines how these companies may build the houses, without requiring them to share trade secrets about how they might incentivize the pigs to buy from them given different use cases, like the absence or presence of a big bad wolf.

At Tabular, we, along with other vendors and community members, were encouraged to build our own implementations. While the community-driven Apache Iceberg standard that all query engines and catalog providers must align on is available to all, the implementation created by Tabular could remain closed-source, while still enabling any customer to move on or off of our platform without concern for the expense of migrating to a competitor. The open community specification provides transparency to consumers and companies on how the spec is governed and why decisions are made. Even further, it opens the discussion of how the specification changes to individual consumers and to individuals representing the interests of their company. This preserves the incentives for companies to cooperate in a harmonious way and optimizes fairness to control for anti-competitive strategies by standardizing the consumer model. But how does this work?

When Apache Iceberg started, the two primary query engines that supported the format were Apache Spark and Trino. This support came from the common shared architecture that was also used at Netflix, which birthed the Apache Iceberg format. Netflix explicitly wanted to avoid using the existing table format, Delta Lake, because it retained technical issues from its predecessor, Apache Hive, and because of the sway Databricks held over the technical roadmap of Delta Lake. Upon Netflix open sourcing Apache Iceberg, the Delta Lake project quickly gained features that were originally only available within Databricks, which put the open formats at near parity. As time went on, a few other query engines added support for Iceberg, but awareness really exploded when Databricks’ competitor Snowflake announced their initial support for Apache Iceberg due to high customer demand and the competitive advantage gained against Databricks. This spurred a domino effect of Iceberg adoption, with AWS Redshift, Google BigQuery, and many others soon following. Snowflake’s adoption made competitors and consumers aware that Apache Iceberg could become the standard way of storing larger datasets, not just among new query engines, but also among the established data warehouses.

Following Metcalfe’s Law, as more query engines and databases added Iceberg support, adding support in other systems became much more valuable to customers, as this grew the flexibility consumers had to choose between engines without incurring the upfront costs. However, the real shocker came when Databricks also announced their initial support for Apache Iceberg in the year following the initial Snowflake announcement. This culminated in the events of this year, when Databricks acquired Tabular, which signaled that they were all in on supporting Apache Iceberg and on minimizing the upfront decision consumers have to make in choosing a table format. Once the acquisition was announced, there were many strong concerns that the open governance that characterized Iceberg would be at risk, as many of the Iceberg community leaders would join Databricks as part of the acquisition. Knowing how the Apache Software Foundation, open source licensing, and open source culture work, I knew there was little to be concerned about. This made me realize that I wanted to educate the community about the incentive structure around open source and specifically open standards. I also wanted to show how attending community syncs and reviewing the features that are getting attention are key to understanding the realities of how a community is truly being governed.

More importantly, as I reflected on the events that had taken place from the time I had learned about Iceberg to this incredibly validating event, I also realized I had gained a much deeper insight. Open source communities and open standards have the power not only to revolutionize and empower consumer markets in tech, but also to create open and transparent consensus for any standard or policy-making body. Not only does this pattern provide opportunities for laid off tech workers to one day challenge the largest companies who removed them to improve profit margins, but perhaps it could also be a blueprint to assemble and organize cooperatives that can slowly replace policy-making bodies within governing institutions.

Moving back to the present, this is where Equijoin begins its journey: advising on and driving awareness of open source models, improving open source sustainability, and driving incentives so that we prune the winner-takes-all paths, while improving the collaboration, camaraderie, and innovation that we so deeply need in the economy and as humans sharing this earth.

I look forward to keeping you all informed as this journey progresses. Please join the Equijoin Community on Matrix.

Brian “bits” Olsen, Founder and Open Source Freelancer @ Equijoin LLC

