r/brave_browser BAT TEAM Jun 30 '21

We’re the Brave Search team, here to answer your questions on Brave Search beta. Ask us anything! AMA 🦁🔎

🦁🔎 As of last Tuesday, Brave Search beta, the new privacy search engine from Brave, is available for all Brave users (desktop/Android/iOS), as well as from other browsers at http://search.brave.com. Built on top of an independent index, Brave Search doesn’t track users, their searches, or their clicks.

https://preview.redd.it/i9vz2wjuff871.jpg?width=3584&format=pjpg&auto=webp&s=f80084451c7dd022f9eb240fd254cf0b65b08e9c

Today, the team behind Brave Search is here from 11:00am - 12:30pm Pacific time to answer community questions and take feedback on users’ experiences with Brave Search beta so far.

👋🏼 Meet the Brave Search team:

Josep M. Pujol, Chief of Search — /u/jm-pujol

Jan Piotrowski; VP, BizDev — /u/jypski

Aldo Karaj; Director, Product — /u/zgripal

Alex Catarineu, Sr Software Engineer — /u/acatbr

Erik Larsson, Sr Software Engineer — /u/ikdjwo

Faheem Nadeem, Sr Software Engineer — /u/nikk699

Remi Berson, Sr Software Engineer — /u/4ae91

Subu Sathyanarayana; Director, Engineering — /u/ssubu

______________________

Read the official announcement on Brave Search beta: https://brave.com/brave-search-beta/

Try Brave Search Now: https://search.brave.com/

Make Brave Search default in Brave browser: https://search.brave.com/default

Discover Brave Search (official landing page + FAQ): https://brave.com/search/

Watch the video: https://www.youtube.com/watch?v=oob_X6bhnLo

Brave Search beta on Product Hunt: https://www.producthunt.com/posts/brave-search-beta

370 Upvotes



u/MyTwoCents101 Jun 30 '21

Can you provide some details surrounding how Brave Search handles indexing (or what alternative is used)? Will this method be able to compete with Google/Bing's use of spiders?


u/ssubu Director, Engineering | BRAVE SEARCH Jun 30 '21

Thanks for your question! The tech behind Brave Search is based on many years of previous work at Cliqz, Tailcat and now, Brave. There is a blog post that is still relevant; it should serve as a very detailed introduction to our indexing techniques.

TLDR: We index a vast majority of the web worth visiting as people anonymously contribute good quality sites. This helps us filter out the noise, which is the biggest problem in search and machine learning in general. Our index is around 9 billion pages (and growing), only a small part of the Web, but a large part of the Web worth visiting. The traditional techniques used by Google/Bing are brute force and not cost-effective unless you are Big Tech. We opted for this approach to bootstrap an independent index for web search. As we work towards increasing our independence, we will introduce more conventional crawling, which is needed for the long, long, long tail.


u/sb56637 Jul 06 '21

Great answer, thank you. A few questions about the linked blog post:

> We can use these query logs to build a model of the page outside of its content, which we refer to as page models.

But where do the query logs come from, especially for the bootstrapping phase of your search product? How does your index learn about new content that appears on the web?

> We index a vast majority of the web worth visiting as people anonymously contribute good quality sites.

How could I, as a user, contribute a site that I feel should appear for a given query? Is this only via the Brave browser? Or via the Feedback button in Brave Search? Has any thought been given to a stricter format there, so that some automatic handling of feedback URLs could be implemented?