Who, what, when, where and why

Get the latest news about Sourcefabric software, solutions and ideas.


Why open source search with Solr is good for news organisations

Quality news sites need to build a reputation for quality content, and visitors must be able to find that content in a variety of ways. Navigation menus and tools don't work for everyone, and sometimes people don't know exactly what they are looking for. Search gives them a way to find relevant information quickly. Happy, returning visitors make a site a more attractive proposition for advertisers and content partners.

For its latest release, Newscoop has focussed on integrating better search functionality, and chose Solr, which is used by AOL, Netflix, Instagram, SourceForge, Internet Archive, Discogs, CISCO, MTV, NASA, WhiteHouse.gov, Apple and many more. Beyond its use by these large institutions, there were many reasons why it suited Newscoop and independent news organisations.

What is Solr?

Solr is an open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication, Solr is highly scalable.

Solr delivers a rich feature set out of the box, including but not limited to:

  • Faceted navigation
  • Hit highlighting / dynamic teaser
  • Geo search: filter & sort by distance
  • Spellcheck & auto suggest
  • Advanced ranking and sorting
  • Distributed and replicated search
  • Structured/unstructured search
  • Rich plugin architecture, extensible
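To make the first two features more concrete, here is a minimal Python sketch, using only the standard library, of a query that asks Solr for facet counts and highlighted snippets. The server URL, the core name `newscoop` and the field names `section` and `body` are assumptions for illustration; `facet`, `facet.field`, `facet.mincount`, `hl` and `hl.fl` are standard Solr query parameters. The sample response is invented, but follows the shape of Solr's JSON output, which by default returns facet counts as a flat list of alternating values and counts.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; "newscoop" is an assumed core name.
SOLR_SELECT = "http://localhost:8983/solr/newscoop/select"


def build_faceted_query(text, facet_field="section", highlight_field="body"):
    """Build a select URL asking for facet counts and highlighted snippets."""
    params = [
        ("q", text),
        ("wt", "json"),
        ("facet", "true"),             # turn on faceting
        ("facet.field", facet_field),  # count matches per value of this field
        ("facet.mincount", "1"),       # skip values with no matching articles
        ("hl", "true"),                # turn on hit highlighting
        ("hl.fl", highlight_field),    # field(s) to build snippets from
    ]
    return SOLR_SELECT + "?" + urlencode(params)


def facet_pairs(flat):
    """Turn Solr's flat [value, count, value, count, ...] list into pairs."""
    return list(zip(flat[0::2], flat[1::2]))


# Invented sample, shaped like the "facet_counts" part of a JSON response.
sample = {"facet_counts": {"facet_fields": {"section": ["politics", 12, "sport", 4]}}}

print(build_faceted_query("election"))
print(facet_pairs(sample["facet_counts"]["facet_fields"]["section"]))
# [('politics', 12), ('sport', 4)]
```

The value/count pairs map directly onto a navigation sidebar such as "Politics (12), Sport (4)".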

How does it work?

Apache Solr can be hosted on its own server or in the cloud, which allows for fine-tuned, dedicated search hardware. In a typical web setup, data from the site is sent to Solr over HTTP in XML format. Solr builds its own index of the site's content, which can be updated on a schedule or whenever changes are made.
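As a rough illustration of this indexing step, the sketch below serialises articles into Solr's XML update format (`<add><doc><field name="id">`) and prepares an HTTP POST to the update handler. The endpoint, the core name and the field names (`id`, `title`) are hypothetical and would depend on the site's schema. Adding `commit=true` makes the changes searchable straight away; a scheduled job might instead batch updates and commit less often. Nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request

# Hypothetical update endpoint; the core name "newscoop" is an assumption.
SOLR_UPDATE = "http://localhost:8983/solr/newscoop/update"


def build_add_xml(articles):
    """Serialise articles (dicts of field name -> value) into a Solr <add> message."""
    add = ET.Element("add")
    for article in articles:
        doc = ET.SubElement(add, "doc")
        for name, value in article.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_update_request(articles, commit=True):
    """Prepare (but do not send) an HTTP POST that adds the articles to the index."""
    url = SOLR_UPDATE + ("?commit=true" if commit else "")
    body = build_add_xml(articles).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


articles = [{"id": "article-42", "title": "Budget vote"}]
print(build_add_xml(articles))
# <add><doc><field name="id">article-42</field><field name="title">Budget vote</field></doc></add>
req = build_update_request(articles)
# urllib.request.urlopen(req) would send it to a running Solr server.
```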

The web server can then query this index using HTTP GET requests. Solr uses the Lucene Java search library to search its index, and its response format is configurable: it can return results as XML, JSON, PHP and other formats. The results tell the web server what to push to the browser.
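A query is simply an HTTP GET against Solr's `select` handler. This sketch builds such a URL with the standard `q`, `wt`, `rows`, `start` and `fl` parameters and pulls the hit count and documents out of a JSON response. The core name and fields are assumptions, and the sample body is invented for illustration, though it mirrors the `response.numFound` / `response.docs` structure Solr returns when `wt=json`.

```python
import json
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/newscoop/select"  # assumed core


def build_search_url(text, rows=10, start=0, fields="id,title"):
    """Build a GET URL for the select handler, asking for JSON output."""
    params = {"q": text, "wt": "json", "rows": rows, "start": start, "fl": fields}
    return SOLR_SELECT + "?" + urlencode(params)


def parse_results(raw_json):
    """Return (total number of hits, documents on this page) from a JSON body."""
    response = json.loads(raw_json)["response"]
    return response["numFound"], response["docs"]


# Invented response body in the shape Solr produces with wt=json.
sample_body = (
    '{"responseHeader": {"status": 0, "QTime": 3},'
    ' "response": {"numFound": 1, "start": 0,'
    ' "docs": [{"id": "article-42", "title": "Budget vote"}]}}'
)

print(build_search_url("budget vote"))
total, docs = parse_results(sample_body)
print(total, docs[0]["title"])  # 1 Budget vote
```

Changing `wt` (for example to `xml`) is all it takes to request a different response format; `rows` and `start` handle paging through long result lists.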

Why news organisations benefit from Solr

Quicker search keeps readers happy

Audiences find content quicker, enjoy the site more and are happy to stay and revisit.

Better indexing of content

News sites produce huge amounts of new content. The site should be able to index this and offer it quickly to visitors for a more relevant browsing experience.

Extend search beyond articles

Adding new blogs or other content is not a problem: Solr can adapt to it and index it. It can even search inside attachments such as PDFs and Word documents.
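Searching inside attachments is typically handled by Solr Cell, the `ExtractingRequestHandler`, which uses Apache Tika to pull text out of PDFs and Office files. It usually has to be enabled in `solrconfig.xml`, so treat the `/update/extract` path and core name below as assumptions about a particular setup. The sketch only prepares the POST: `literal.id` sets the document's ID field and `commit=true` makes it searchable right away.

```python
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint: Solr Cell's ExtractingRequestHandler on a core named "newscoop".
SOLR_EXTRACT = "http://localhost:8983/solr/newscoop/update/extract"

# MIME types for the attachment formats mentioned above.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".doc": "application/msword",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def build_extract_request(filename, data, doc_id):
    """Prepare (but do not send) a POST asking Solr to extract and index a file.

    Raises KeyError for file types not listed in CONTENT_TYPES.
    """
    content_type = CONTENT_TYPES[Path(filename).suffix.lower()]
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    return Request(
        SOLR_EXTRACT + "?" + params,
        data=data,  # raw file bytes; Tika extracts the text on the Solr side
        headers={"Content-Type": content_type},
        method="POST",
    )


req = build_extract_request("annual-report.PDF", b"%PDF-1.4 ...", "attachment-7")
print(req.full_url)
print(req.get_header("Content-type"))  # application/pdf
```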

Grows with you

Solr, just like Newscoop, is open source and therefore extendable. As a site grows, the possibilities of search can grow.

Auto-complete and filtering

Google set the standard for search on the web, and people now expect intelligent features such as auto-complete and filtering. Offering them is an easy way to open up access to your archives.
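Filtering and auto-complete can both be expressed as ordinary query parameters. The sketch below narrows results with `fq` filter queries (including Solr date math such as `NOW-7DAYS`) and shows one common auto-complete technique: faceting on a tokenized, lowercased field with `facet.prefix` set to whatever the reader has typed so far. The field names `section`, `published` and `title_terms` are hypothetical. Because the facet runs over individual indexed terms, it suggests single words; Solr also offers a dedicated suggester component, which may suit some setups better.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/newscoop/select"  # assumed core


def build_filtered_query(text, section=None, days=None):
    """Search for text, optionally narrowed by section and by publication date."""
    params = [("q", text), ("wt", "json")]
    if section is not None:
        # Escape backslashes and quotes so the value stays one quoted phrase.
        escaped = section.replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", 'section:"%s"' % escaped))
    if days is not None:
        # Solr date math: everything published in the last `days` days.
        params.append(("fq", "published:[NOW-%dDAYS TO NOW]" % days))
    return SOLR_SELECT + "?" + urlencode(params)


def build_autocomplete_query(prefix, field="title_terms", limit=5):
    """Suggest completions by faceting on indexed terms that start with prefix."""
    params = [
        ("q", "*:*"),       # match everything; only the facet counts matter
        ("rows", "0"),      # no documents needed in the response
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.prefix", prefix.lower()),  # assumes the field is lowercased at index time
        ("facet.limit", str(limit)),
        ("facet.mincount", "1"),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


print(build_filtered_query("budget", section="politics", days=7))
print(build_autocomplete_query("Elec"))
```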

What's popular?

Search 'top hits' act as a barometer of community opinion and interest. Surfacing them is a great traffic driver and provides important insight for advertisers, writers and content syndicators.

  • Newscoop 4.1 with Solr search is out now.