
Launch HN: Activepieces (YC S22) – Open-Source Zapier Alternative

source link: https://news.ycombinator.com/item?id=34723989

127 points by ashrafsam 6 hours ago | hide | past | favorite | 39 comments
Hi HN, I'm one of the creators of Activepieces, an open source (MIT) no-code business automation tool. We're excited to share it with HN! Our GitHub is https://github.com/activepieces/activepieces, our website is https://www.activepieces.com/ and there's a video that shows how to build a Pipedrive + Slack + Email flow in 2 minutes at https://www.youtube.com/watch?v=IY4TI6jGBwM

When we used automation tools like Zapier at my previous job, we found that they became incredibly expensive very quickly, and there were too few options for self-hosting business automation when data had to reside on premises.

There are other open-source automation tools, but we think they are either too technical, like Huginn and Node-RED, or developed under less permissive open source licenses, like N8n.

So we decided to build an open source automation tool under a permissive license (MIT) with a simple user experience that doesn’t require technical knowledge, and can be self-hosted. We plan to make money from the cloud version and a future enterprise edition with advanced features - maybe advanced roles and permissions.

The current version includes a visual designer for automation flows, which can run on schedules (cron), via webhooks, or on triggers from external apps (25 apps and counting, including Stripe, Calendly, Google Sheets and others; we're building these rapidly).

The app is customizable: you can add custom steps using HTTP requests, or you can write Node.js code and bring in your npm packages.
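
For example, a custom code step could look roughly like this. This is a sketch only; the exact export shape and parameter names here are assumptions, so check the docs below for the real API.

    // Hypothetical custom code step; the exact signature may differ.
    // Any npm package you add to the flow can be imported here.
    import dayjs from 'dayjs';

    export const code = async (params: { dueDate: string }) => {
      // Enrich the incoming data before it is passed to the next step.
      return {
        dueInDays: dayjs(params.dueDate).diff(dayjs(), 'day'),
      };
    };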

If you’re curious about how it works, here are the docs: https://www.activepieces.com/docs

We’d love to hear HN’s thoughts on what we’re building! Thanks!

Congrats on the launch! Huge TAM, so there's lots of room for a healthy ecosystem of competitors in no-code/low-code. My suggestions for succeeding in this ecosystem:

* The value is in the long tail. Major services will integrate with other major services on their own. Make it easy to integrate with the smaller services.

* Enterprise features are where the revenue is. You want to be able to land corporate customers willing to spend $50k-$150k/year with you because you offer automation their internal IT or dev teams can't, along with RBAC, audit logs, and the usual trimmings an enterprise offering entails. (Individual and SMB personas are fine, but they are price sensitive and have higher churn.)

* UX is important. Spend the resources as you scale to understand how your users are leveraging your product for their workflows; it should be magical to them. The easier it is to use, the more it’ll be used, which translates to more revenue (assuming revenue tied to tasks executed).

* Integrations will break frequently; instrument to know when this happens and to rapidly roll out fixes.

* It is crucial to be able to pause your workers as well as replay data from webhooks and polling. Also, log all the things (while redacting secrets) as data is processed; see the sketch below. This will make troubleshooting integration issues and edge cases (which pop up often at scale) less painful as data structures flow through your code paths.
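
On the logging point specifically, a minimal sketch of a redacting logger (the key names are assumptions; extend the set to match your payloads):

    const SECRET_KEYS = new Set(['authorization', 'api_key', 'apikey', 'token', 'password', 'secret']);

    function redact(value: unknown): unknown {
      if (Array.isArray(value)) return value.map(redact);
      if (value !== null && typeof value === 'object') {
        return Object.fromEntries(
          Object.entries(value as Record<string, unknown>).map(([k, v]) =>
            SECRET_KEYS.has(k.toLowerCase()) ? [k, '[REDACTED]'] : [k, redact(v)]
          )
        );
      }
      return value;
    }

    function logStep(step: string, payload: unknown): void {
      console.log(JSON.stringify({ ts: new Date().toISOString(), step, payload: redact(payload) }));
    }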

Are there any programmatic methods to detect integration breakages? Simply relying on unit/integration testing with mock endpoints is not effective, as it cannot capture changes made by third parties. Being notified of breakages after they happen is a solution, but not an ideal one.

I've considered sourcing OpenAPI specs from sites like APIsguru.com to scan for changes, but I was wondering if you have any other suggestions.
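
For illustration, the kind of spec-watching I have in mind (a sketch; it assumes the provider publishes a spec at a stable URL, e.g. one indexed by APIs.guru):

    // Detect upstream drift by hashing the provider's OpenAPI spec and
    // comparing against the hash we stored on the previous run.
    import { createHash } from 'node:crypto';

    async function specHash(url: string): Promise<string> {
      const res = await fetch(url); // global fetch, Node 18+
      if (!res.ok) throw new Error(`HTTP ${res.status} fetching spec`);
      return createHash('sha256').update(await res.text()).digest('hex');
    }

    async function hasDrifted(url: string, lastSeenHash: string): Promise<boolean> {
      return (await specHash(url)) !== lastSeenHash; // true => diff the specs and alert a human
    }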

It has been almost a decade since I was last involved at such an org, so I cannot speak to the state of the art; folks currently doing the schlep will be more knowledgeable than me. With that said, you are likely to be alerted to trouble with an integration in a few ways:

1. Exception handling. Whatever code you have polling API endpoints or processing inbound webhooks is likely to throw an exception if the structure of the API it consumes, or the inbound webhook message format, has changed materially. I recommend handling these via Sentry or a similar application error reporting mechanism for triage by your SRE or platform team (see the sketch after this list).

2. API responses. There is some peril here, as every API is different. Some APIs will behave as you'd expect with respect to error codes, error messages, and request allowances, while some will reply with code 200 and the error message in the body. Again, this is the value incumbents offer: they know what failure looks like for each API, and they also have a good idea of what success and health look like at steady state. Build relationships with API partners (do you have a partner team? you eventually should) so that you have open comms with them regarding breaking changes, and code defensively in general. Tangentially, ensure you have robust logic around deduplication of polling data.

3. User reporting. If your unit tests didn't catch something, nor did your application error mechanisms, your users will absolutely let you know when a JSON element lands where it shouldn't in a target integration.
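
To make points 1 and 2 concrete, a sketch (parseRecords is a hypothetical stand-in for your own schema validation, and field names like body.error are assumptions, since every API spells failure differently):

    import * as Sentry from '@sentry/node';

    Sentry.init({ dsn: process.env.SENTRY_DSN });

    const seen = new Set<string>(); // persist this in production

    async function poll(integration: string, url: string): Promise<Array<{ id: string }>> {
      try {
        const res = await fetch(url);
        if (!res.ok) throw new Error(`HTTP ${res.status} from ${integration}`);
        const body = (await res.json()) as Record<string, unknown>;
        // Point 2: some APIs return 200 with the real error in the body.
        if (body.error) throw new Error(`${integration} API error: ${JSON.stringify(body.error)}`);
        // Tangent from point 2: dedupe polled records by a stable ID.
        const fresh = parseRecords(body).filter((r) => !seen.has(r.id));
        fresh.forEach((r) => seen.add(r.id));
        return fresh;
      } catch (err) {
        // Point 1: schema drift and transport failures land in Sentry for triage.
        Sentry.captureException(err, { tags: { integration } });
        return [];
      }
    }

    function parseRecords(body: Record<string, unknown>): Array<{ id: string }> {
      // Throws on material schema changes, which is exactly what we want.
      if (!Array.isArray(body.data)) throw new Error('unexpected payload shape');
      return body.data as Array<{ id: string }>;
    }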

I'd encourage you to ask around to others in this space, as their recent knowledge will be more relevant for avoiding sharp edges. Also, once you've built whatever you're building, you'll be able to observe at scale what optimal and suboptimal look like (at least, you should, if you've approached this from a systems-thinking perspective and wrapped the necessary telemetry and observability tools around the machine).

Thank you! There is a huge similarity between how we think and your comment.

How do you define long tail? Fewer users?

I'm admittedly not at all familiar with TypeORM (not really great at Node either, for that matter), but a few questions:

What kind of testing have you done on app/DB performance under heavy load?

Why are you indexing nearly the same thing twice in the AppConnection table? [0]

What kind of column is your pkey, as defined in BaseEntity? [1] It says String, but AFAIK that's not a Postgres type.

Excited to see more work in this space!

Disclaimer: I'm a DBRE at Zapier.

[0]: https://github.com/activepieces/activepieces/blob/main/packa...

[1]: https://github.com/activepieces/activepieces/blob/main/packa...

Development began two months ago and we have not encountered any scaling issues yet, as the majority of users self-host. Our priority has therefore been building apps. Is there anything you believe we should consider?

You are correct, repeating the same index twice is a mistake. Thanks for pointing that out.

We are using nanoid (https://www.npmjs.com/package/nanoid) for all entities; it's stored as varchar in the database.
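
Roughly like this in our TypeORM entities (a simplified sketch, not our exact code; see the BaseEntity link in your comment for the real thing):

    import { Entity, PrimaryColumn, BeforeInsert } from 'typeorm';
    import { nanoid } from 'nanoid';

    @Entity()
    class AppConnection {
      @PrimaryColumn('varchar', { length: 21 }) // nanoid's default length
      id!: string;

      @BeforeInsert()
      generateId() {
        this.id = this.id ?? nanoid();
      }
    }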

> nanoid

There are pros and cons to using random IDs as a PK. For an RDBMS that clusters on the PK (InnoDB), it's a terrible idea. If you're going to sort by the PK, it's usually a terrible idea too (UUIDv1 isn't as bad since it includes the timestamp, but that assumes your access pattern is based on insertion time). There is ULID [0] if you'd like something that's sortable, or you could just add a secondary index. An advantage is that a random ID can be a good way to do sharding (again, this depends heavily on your access patterns).
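
A quick sketch of the sortability difference, using the ulid and nanoid npm packages:

    import { ulid } from 'ulid';
    import { nanoid } from 'nanoid';

    const a = ulid(); // e.g. 01GRW2... : millisecond timestamp prefix, then randomness
    const b = ulid();
    console.log(a < b); // true across different milliseconds; use monotonicFactory() within the same ms

    console.log(nanoid()); // e.g. V1StGXR8_Z5jdHi6B-myT : pure randomness, no ordering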

My other concern with nanoid is twofold, both around its PRNG. They mention using Node's crypto.randomBytes(), but their source code instead references crypto.randomFill() [1]. Node's docs mention that this can have "surprising and negative performance implications for some applications" [2], related to libuv's thread pool; see my later comment about libuv and containers. Also, Node's docs for crypto.randomBytes() say that it "will not complete until there is sufficient entropy available." That sounds suspiciously like they're using `/dev/random` instead of `/dev/urandom`, which, at least for this application, would be an odd decision. I did note that elsewhere nanoid creates its own entropy pool, so it may not matter either way.
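
For reference, the two flavors of randomBytes and where the threadpool caveat bites (nothing nanoid-specific here, just Node core):

    import { randomBytes } from 'node:crypto';

    // Sync form: blocks the event loop briefly, no threadpool involved.
    const id = randomBytes(16).toString('hex');

    // Async form: offloaded to libuv's threadpool, which is where the
    // "surprising and negative performance implications" can come from
    // if the pool is saturated (e.g. inside a CPU-limited container).
    randomBytes(16, (err, buf) => {
      if (err) throw err;
      console.log(buf.toString('hex'));
    });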

With that out of the way:

If the plan is only for self-hosting, then yeah, you don't really need to consider schema design that carefully. Databases are really good at their job. Also, honestly nearly none of this matters until you have significant scale.

If you plan on starting a SaaS, there's a lot to consider. An incomplete list, in no particular order:

* Foreign keys. They're very handy, but they can introduce performance problems with some access patterns. Consider indexing child-table FKs (but not always; benchmark first, and see the sketch after this list).

* DDL like ALTER TABLE. I suggest getting intimately familiar with Postgres' locks [3]. The good news is that instant ADD COLUMN with {DEFAULT, NOT NULL} is safer now. The bad news is that it does so by lazy-loading, so if your queries are doing silly things like SELECT *, you're still going to end up with a ton of contention.

* Connection pooling. You don't want to eat up RAM dealing with connections. PgBouncer [4] and Pgpool-II [5] are two that come to mind, but there are others as well. The latter also handles replication and load balancing which is nice. If you aren't using that, you'll need to handle replication and load balancing on your own.

* Load balancing. HAProxy [6] is good for load balancing, but has its own huge set of footguns. Read their docs [7]. A few things that come to mind are:

  * Any kind of abstraction away from the CPU, like containers, may cause contention. Same with VMs (i.e. EC2), for that matter, since a noisy neighbor can drop the single-core turbo of Xeons A LOT. Look into CPU pinning if possible.

  * HAProxy really likes fast clocks over anything else, for x86. Xeons will beat Epyc. ARM can beat x86 if tuned correctly.

  * If you're using Kubernetes, look into Intel's CPU Management [8], which is also now native in K8s v1.26 [9].

* Overall for containers, learn about cgroups. Specifically, how they (both v1 and v2) expose the `/proc` filesystem to applications, and then how your application detects that for any kind of multiprocessing. Hint: Node [10] uses libuv, which reads `/proc/cpuinfo` [11].

* If you have access to the disk (e.g. you're running bare metal or VMs with this capability), think carefully about the filesystem you use and its block size (and record size, if you use ZFS).
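
A sketch of the FK-indexing point in TypeORM terms, since that's your stack (the entity names are hypothetical):

    import { Entity, PrimaryColumn, Column, Index, ManyToOne, JoinColumn } from 'typeorm';

    @Entity()
    class Flow {
      @PrimaryColumn('varchar')
      id!: string;
    }

    @Entity()
    class FlowRun {
      @PrimaryColumn('varchar')
      id!: string;

      // Postgres indexes the referenced key, not the referencing column,
      // so without this index "all runs for flow X" is a sequential scan.
      @Index()
      @Column('varchar')
      flowId!: string;

      @ManyToOne(() => Flow)
      @JoinColumn({ name: 'flowId' })
      flow!: Flow;
    }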

Good luck!

[0]: https://github.com/ulid/spec

[1]: https://github.com/ai/nanoid/blob/main/async/index.js#L5

[2]: https://github.com/nodejs/node/blob/main/doc/api/crypto.md#c...

[3]: https://www.postgresql.org/docs/current/explicit-locking.htm...

[4]: https://www.pgbouncer.org/

[5]: https://www.pgpool.net/mediawiki/index.php/Main_Page

[6]: https://www.haproxy.org/

[7]: https://cbonte.github.io/haproxy-dconv/2.4/configuration.htm...

[8]: https://networkbuilders.intel.com/solutionslibrary/cpu-pin-a...

[9]: https://kubernetes.io/docs/tasks/administer-cluster/cpu-mana...

[10]: https://github.com/nodejs/node/blob/main/src/node_os.cc#L100

[11]: https://github.com/libuv/libuv/blob/v1.x/src/unix/linux.c#L8...

> There are pros and cons to using random IDs as a PK. For an RDBMS that clusters on the PK (InnoDB), it's a terrible idea. If you're going to sort by the PK, it's usually a terrible idea too (UUIDv1 isn't as bad since it includes the timestamp, but that assumes your access pattern is based on insertion time).

In which use cases would you want to sort by the PK? Isn't sorting by fields like created_at sufficient?

It depends on whether you're using an autoincrementing int as your PK. If so, it may be a reasonable decision. It may have gaps, of course, but it's still monotonic.

Relatedly, PG15 has an improved sort algorithm for some data types: most integers, timestamps, and a few others. So the column type could matter here.

I wouldn't judge the project on how correctly they are indexing tables.

Rather, judge them on feature parity and the overall thesis. All the scaling issues can be fixed in the future as they come up. You don't have to be ready for cloud scale from day 1.

I'm not, to be clear; I just wrote a giant referenced comment where I clarify that most things don't matter until you're at scale.

But for something small like that, may as well fix it now.

We appreciate you sharing your thoughts and welcome them. We learn as we go and your comments are inspiring. Thank you.

Hey HN, I'm one of the co-founders of Activepieces and I just wanted to give a shoutout to everyone who's been showing us love on GitHub by giving us stars.

To show my appreciation, I created a flow using Activepieces to thank everyone on our Discord.

Screenshot: https://imgur.com/a/W5p60le

Considering that another open source alternative [0][1] was posted just a few days ago, how does this compare?

[0] https://news.ycombinator.com/item?id=34610686

[1] https://trigger.dev/

And Automatisch a few days before that. In September, CRDT databases were all the rage; this month, it's no-code workflow tools. I guess it makes sense that once someone launches, the rest want to follow quickly.

https://news.ycombinator.com/item?id=34519639

https://automatisch.io/

I believe Trigger is more focused on devs, while this is more focused on non-developers.

Yes, this! Trigger.dev lives in your code; we offer a visual flow builder, more like Zapier.

If someone needs something between Zapier and Trigger, check out windmill.dev. I've been using it for a few days and I'm really impressed. I think Activepieces is targeting the same user as Zapier, while Trigger is targeting developers. Windmill gives you GitHub sync, and you start off by writing code rather than working in the GUI; then you add a GUI on top of the script, or connect scripts together into workflows. As a developer I found it amazing, but it's for people who can (or would like to) program. I think they found some kind of magical middle ground. (No affiliation, just a happy user.)

As a Zapier alternative this looks great. Congratulations on the launch and good luck. A lot is happening in this space right now.

MIT license builds the trust to use it and contribute to it, or at least use it freely.

But that also enables someone to use your software as a starting point for their own competing SaaS solution.

That is what encourages companies to shift to a BSL license at some point, as you might too.

The goal, of course, is to build enterprise features which are hard for others to replicate, but these imply solving more complex engineering challenges.

Those could be plugins that integrate with other enterprise tools like Snowflake or Salesforce, for example.

Another interesting observation is that in the next 4-5 years we could expect a robust open source MIT-licensed stack for pretty much everything.

But it will only feel like that, because we will start to have quantum computing, and then all software will need to be rebuilt.

There are already quite a few options with a BSL; I think it's wise to stick with MIT as a clear differentiator.

We saw some evidence that companies prefer to work with the original developers for hosting their product, but I'll keep working on improving my understanding of the open source ecosystem. Very interesting thoughts.

When I evaluate software, I see having multiple vendors offering to support/host the software as a good thing. After selecting a piece of software, I then begin to evaluate vendor offerings. The vendor being deeply involved in the project, such as being the original creator or otherwise a major contributor, is a huge plus.

Having 100% of a small pie is often worse than having a good percent of a larger pie. Fostering a large Open Source ecosystem increases the size of the pie, even if you have to share some of it.

Of course, any business is a lot of work, whether your software is Open Source or proprietary. Open Source is a great strategy, but it doesn't guarantee success.

Thanks for sharing your thoughts; we are indeed seeing this pattern. I agree that it doesn't guarantee success. We're inspired by open source apps that compete on being better software rather than relying only on being open source.

Could I use this to embed inside my app, to allow my customers to create integration flows to other products from within my app?

Some of our users ship 1-click in-app integrations to their users with us, but we haven't worked on cases where the whole builder is embedded. Feel free to open an issue about it and we'll discuss it there: https://github.com/activepieces/activepieces/issues

Very cool and congrats on the launch! Love the MIT license as well. Can you talk about some of the differences with n8n, other than the license?

I actually forgot to mention a very important differentiator. If you self-host N8n and you'd like to connect, say, your Asana or Gmail account, you have to bring your own OAuth app or get an API key.

We have a cloud auth service: even if you self-host us, you have the option to automatically connect your account through our predefined OAuth 2.0 apps, which makes the experience seamless if you don't want to take care of auth yourself. We found that this is a very important feature for simplifying the UX!
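
To make the difference concrete: bringing your own app means registering with each provider to obtain credentials like these before any connection works (an illustrative sketch; the values are placeholders, though the endpoint shown is Google's real OAuth 2.0 one):

    // What "bring your own OAuth app" entails: you create and manage these
    // credentials per provider. Our cloud auth service does this for you.
    const authUrl = new URL('https://accounts.google.com/o/oauth2/v2/auth');
    authUrl.searchParams.set('client_id', process.env.GOOGLE_CLIENT_ID!); // yours to register
    authUrl.searchParams.set('redirect_uri', 'https://your-host.example/oauth/callback');
    authUrl.searchParams.set('response_type', 'code');
    authUrl.searchParams.set('scope', 'https://www.googleapis.com/auth/gmail.readonly');
    console.log(authUrl.toString());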

We think our UX is simpler for users who don't want to get too deep into technical details. We heard from many users that N8n was a bit too technical for them. Thank you!

Excited about this open source alternative!

I think no-code crons have a huge opportunity to capture no-code makers who like Framer or Webflow.

I think this is also quite economical when compared with competitors:

Zapier: free / 100 tasks / month
n8n: 20 euros / 5000 workflows / month
Make: free / 1000 ops / month
Activepieces: free / 5000 tasks / month

I think it would be great if your landing page described how you differentiate from, or provide a similar experience to, your competitors.

I agree that we should display how our features compare to existing apps in the market.

Thank you for the cost comparison. I'd like to highlight that some apps price on whole-flow execution and others on step execution. We chose to price on step execution so it's easy to compare with Zapier and Make. We'll see how this develops.

Looks really cool.

How did you build those nice canvas-style UIs with dragging/dropping/connecting etc.? I'd like to use something for a personal project.

Maybe you can reuse them for your project from our codebase; we're an Angular project on the frontend.

My perfect workflow is something like Config as code.

So basically I could declare a JSON/TOML config, and when deployed, the visual diagram is set.
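
Something like this, a purely hypothetical shape (written as a TypeScript object here, but the same idea as a JSON/TOML file):

    // Not a real Activepieces format; just what I'd like to declare.
    const flow = {
      name: 'new-stripe-customer-to-slack',
      trigger: { piece: 'stripe', event: 'customer.created' },
      steps: [
        {
          piece: 'slack',
          action: 'send_message',
          input: { channel: '#sales', text: 'New customer!' },
        },
      ],
    } as const;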

That's a decent feature request. The main reasons we know of are versioning and the IDE experience.

We were designed API-first. The APIs are clean but not yet documented; we currently have a draft PR that generates an OpenAPI specification.

I think the next step for us would be a CLI to bring the code experience into the IDE.

What are your reasons?

Is there a good way to navigate the code base and understand the layout and structure?

Yup, take a look here: https://activepieces.com/docs/contributing/repo-structure

Most of the current documentation is about the pieces package (see the Building Pieces section); the rest will follow soon.
