
Our MongoDB Atlas hackathon idea: news headlines and NLP

 2 years ago
source link: https://dev.to/yactouat/our-mongodb-atlas-hackathon-idea-news-headlines-and-nlp-2n83

The idea: a news headlines search engine with sentiment analysis

After browsing possible subjects and the MongoDB technologies we might use for the hackathon, two friends and I came up with an idea that leverages the text search capabilities of Atlas Search on a massive amount of news headline data.

We want our news headlines search engine to be able to do these kinds of searches:

  • "War in Iraq" => would return all headlines related to that subject, even if the title does not match exactly
  • "sentiment about war in Iraq in news headlines from date 1 to date 2" => would output the main sentiment related to that subject, using NLP
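
As a rough sketch of what the search side of these queries could look like with Atlas Search, the Python below only builds the aggregation pipeline as plain dictionaries, so it runs without a cluster. The index name `headlines_search` and the fields `title` and `published_at` are assumptions on our part (we have not defined our data structure yet), and `build_headline_search` is a hypothetical helper; `$search`, `compound`, `text`, `fuzzy` and `range` are Atlas Search operators.

```python
from datetime import datetime
from typing import Optional


def build_headline_search(query: str,
                          start: Optional[datetime] = None,
                          end: Optional[datetime] = None,
                          limit: int = 20) -> list:
    """Build an aggregation pipeline for a fuzzy headline search.

    Assumed names (nothing is decided yet): index "headlines_search",
    fields "title" and "published_at".
    """
    compound = {
        "must": [{
            "text": {
                "query": query,
                "path": "title",
                # Tolerate one character edit per term (typos).
                "fuzzy": {"maxEdits": 1},
            }
        }]
    }

    # Only add the date filter when at least one bound is given.
    if start is not None or end is not None:
        date_range = {"path": "published_at"}
        if start is not None:
            date_range["gte"] = start
        if end is not None:
            date_range["lte"] = end
        compound["filter"] = [{"range": date_range}]

    return [
        {"$search": {"index": "headlines_search", "compound": compound}},
        {"$limit": limit},
        {"$project": {"_id": 0, "title": 1, "published_at": 1,
                      "score": {"$meta": "searchScore"}}},
    ]
```

With pymongo, the result could be passed to `collection.aggregate(...)`. The sentiment part of the second query is not covered here; it would run on whatever headlines such a search returns.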

More query filters and capabilities could be added to the app later, but if we get a text box working that outputs relevant results and shows that the two features above work, we would be very pleased :)

We have no prior formal knowledge of Data Science, Atlas Search, or NLP, so I guess it's gonna be a hell of a ride ^^

Initial steps

We came up with a few major steps to create our app:

  1. ✅ gather as much news headline data as we can, as CSV or JSON files, from various sources: so far our list contains https://github.com/yactouat/dev.to_mongodbatlas_hackathon_2022
  2. define a common data structure for the news headline entity (or entities) we'll use in the app
  3. write an I/O algorithm to format the data from all sources into one or more files with the same format
  4. fill MongoDB with the formatted data
  5. implement full-text search with Atlas Search
  6. add sentiment analysis to the headline text search feature
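
To make steps 2, 3 and 6 a bit more concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption at this stage: the `title` / `published_at` / `source` structure, the helper names, and the idea that the "main sentiment" is simply the most frequent label among the matching headlines. The labels themselves would come from an NLP model we have not picked yet.

```python
import csv
import io
import json
from collections import Counter
from datetime import datetime
from typing import Optional


def make_headline(title: str, published_at: str, source: str) -> dict:
    """Step 2: one hypothetical common structure for every headline."""
    return {
        "title": title.strip(),
        # Keep only the date part, as an ISO 8601 string.
        "published_at": datetime.fromisoformat(published_at).date().isoformat(),
        "source": source,
    }


def from_csv(text: str, source: str, title_col: str, date_col: str) -> list:
    """Step 3: map the rows of a CSV export onto the common structure."""
    reader = csv.DictReader(io.StringIO(text))
    return [make_headline(row[title_col], row[date_col], source)
            for row in reader]


def from_json(text: str, source: str, title_key: str, date_key: str) -> list:
    """Step 3: same mapping for a JSON array of objects."""
    return [make_headline(item[title_key], item[date_key], source)
            for item in json.loads(text)]


def main_sentiment(labels: list) -> Optional[str]:
    """Step 6: most frequent label, or None when there are no labels."""
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]
```

Each source only needs to say which of its columns or keys hold the title and the date. For step 4, the resulting list of dictionaries could then be loaded with something like pymongo's `insert_many`.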

These are the rough steps we have thought of so far; I guess they will be split into multiple sub-tasks as we go along.

If you want to follow our progress, check out => https://github.com/yactouat/dev.to_mongodbatlas_hackathon_2022/projects/2

Stay tuned!

