How to filter fake news?

No specific Bitcoin bounty has been announced by the author. Still, anyone could send Bitcoin tips to those who provide a good answer.

How does one filter fake news without hampering information dissemination through alternative media? Reputable media outfits are not without their biases. The internet provided an avenue for information to flow unrestricted, yet with such freedom came the proliferation of "fake news" sites. It wouldn't be an issue if people were more discerning about the information they believe, but such is not the case. If people are ill-equipped to filter on their own, would technology be capable of doing so without bias? Or would this be a futile act, because whoever owns the filtering code would have the power to choose which information can reach us?

1 Answer


Here are my few cents on the "technology vs. fake news" topic:

Is it blood or red paint?

I think technology should be able to catch SOME of the obvious low-quality fakes just by looking at their formal content quality signals. With higher-quality fakes (like "staged" news with real, authentic pictures, but where the people in those pictures use red paint to pretend it's blood), all the formal content quality signals could be "high", so the historical reputation of that particular news source should come into play.

  • Formal content quality signals
  • Historical reputation of the news source (reporter and website)
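These two signal categories could be combined into a single trust score. A minimal sketch, assuming both inputs are already normalized to 0..1 (the function name, signal names, and the 60/40 weighting are all invented for illustration):

```python
def trust_score(quality_signals, source_reputation, quality_weight=0.6):
    """Combine formal content-quality signals with historical source
    reputation into one 0..1 trust score (hypothetical weighting)."""
    # quality_signals: dict mapping signal name -> score in 0..1
    quality = sum(quality_signals.values()) / len(quality_signals)
    return quality_weight * quality + (1 - quality_weight) * source_reputation

score = trust_score(
    {"image_metadata": 0.9, "text_originality": 0.8},
    source_reputation=0.7,
)
```

A real system would learn the weights from labeled examples rather than hard-coding them; the point is only that neither signal alone is sufficient.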

Content quality

As the first layer of defense, one could use a machine learning approach similar to what Google is using to filter out SEO spam. Such an ML system should be able to catch obvious low-quality fakes with stolen, old, non-relevant images, re-used content, and other obvious giveaways. There are multiple articles published by SEO researchers that describe which signals Google uses to identify low-quality content, and most of them also apply to identifying low-quality news content that is more likely to be fake. Since images are often found in news articles, it is probably a good idea to pay close attention to image metadata:

  • metadata on images (JPEG Exif attributes, etc.) - real images taken on the spot are very likely to have Exif headers with relevant GPS coordinates, to come from the same camera or the same mobile phone, and to have been taken within a relevant time-frame. Real reporters would use a modern digital camera or mobile phone, and in both cases there should be plenty of metadata in the Exif attributes. Fake reporters would be more likely to copy some three-year-old image, and most likely would not bother to produce a realistic-looking combination of Exif attributes. So with enough training, an ML system should be able to learn the proper distance between real, authentic, recent images taken by news reporters and non-genuine copied images.
  • besides images, there are 200+ other signals that Google is using that could also be used to measure formal content quality
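The Exif idea can be sketched as a simple plausibility heuristic. This is an illustrative toy, not Google's system: the tag names follow Exif conventions (`GPSLatitude`, `DateTimeOriginal`, `Model`), but the scores, the two-day freshness window, and the function name are all invented assumptions:

```python
from datetime import datetime, timedelta, timezone

def exif_plausibility(exif, reported_at):
    """Score (0..1) how plausible it is that an image was taken on the
    spot, based on Exif-style metadata. Illustrative heuristic only."""
    score = 0.0
    if "GPSLatitude" in exif and "GPSLongitude" in exif:
        score += 0.4  # geotag present: consistent with on-the-spot shooting
    taken = exif.get("DateTimeOriginal")
    if taken is not None and timedelta(0) <= reported_at - taken <= timedelta(days=2):
        score += 0.4  # taken within the news event's time-frame
    if exif.get("Model"):
        score += 0.2  # camera / phone model recorded
    return score

now = datetime(2017, 6, 1, tzinfo=timezone.utc)
fresh = {"GPSLatitude": 48.85, "GPSLongitude": 2.35,
         "DateTimeOriginal": now - timedelta(hours=3), "Model": "iPhone 7"}
stale = {"DateTimeOriginal": now - timedelta(days=1100)}  # a 3-year-old copy
print(exif_plausibility(fresh, now))  # 1.0
print(exif_plausibility(stale, now))  # 0.0
```

In practice these hand-written rules would be replaced by learned features, since Exif data is trivial to forge once fakers know it is being checked.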

Historical reputation of the news author

If a given author has a track record of publishing unique content on a regular basis for the past 10 years, and his public "reputation score" has remained stellar for all these years, then his news story today is more trustworthy than one from someone new who is publishing his first article.

  • identifying the original content author
  • a feedback system that allows readers to provide feedback to increase / decrease the reputation of the author
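A reader-feedback reputation could be as simple as a smoothed up/down ratio. A toy sketch (class and field names invented; real systems would also weight voters by their own reputation and decay old votes):

```python
class AuthorReputation:
    """Toy reader-feedback reputation score. Illustrative only."""

    def __init__(self):
        self.upvotes = 0
        self.downvotes = 0

    def feedback(self, positive):
        if positive:
            self.upvotes += 1
        else:
            self.downvotes += 1

    @property
    def score(self):
        # Laplace-smoothed ratio: a brand-new author starts at 0.5,
        # and a long stellar track record converges toward 1.0.
        return (self.upvotes + 1) / (self.upvotes + self.downvotes + 2)

veteran = AuthorReputation()
for _ in range(100):
    veteran.feedback(positive=True)
newcomer = AuthorReputation()
print(round(veteran.score, 2), newcomer.score)  # 0.99 0.5
```

The smoothing captures the point above: a first-time publisher is neither trusted nor distrusted, while a consistent track record earns trust gradually.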

Such a reputation system applies not only to "fake news" but to web content in general, and there are some interesting developments in this area that would make a very interesting conversation on their own.

