The WikiAnswers corpus contains clusters of questions tagged by WikiAnswers users as paraphrases. Each cluster optionally contains an answer provided by WikiAnswers users. There are 30,370,994 clusters containing an average of 25 questions per cluster. 3,386,256 (11%) of the clusters have an answer.

The data can be downloaded from: http://knowitall.cs.washington.edu/oqa/data/wikianswers/. The corpus is split into 40 gzip-compressed files. The total compressed file size is 8 GB; the total decompressed file size is 40 GB. Each file contains one cluster per line. Each cluster is a tab-separated list of questions and answers. Questions are prefixed by q: and answers are prefixed by a:. Here is an example cluster (tabs replaced with newlines):
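
The line format described above (one cluster per line, tab-separated fields, q: and a: prefixes) could be read along the following lines. This is only a minimal sketch: the function names parse_cluster and iter_clusters are hypothetical, the UTF-8 encoding is an assumption, and the sample line is made up for illustration rather than taken from the corpus.

```python
import gzip


def parse_cluster(line):
    """Split one corpus line into (questions, answers).

    Fields are tab-separated; questions start with "q:" and
    answers start with "a:". Fields with any other prefix are skipped.
    """
    questions, answers = [], []
    for field in line.rstrip("\n").split("\t"):
        if field.startswith("q:"):
            questions.append(field[2:].strip())
        elif field.startswith("a:"):
            answers.append(field[2:].strip())
    return questions, answers


def iter_clusters(path):
    """Yield (questions, answers) for each line of one gzip-compressed file.

    Assumes the file decodes as UTF-8 (an assumption, not confirmed
    by the corpus description).
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield parse_cluster(line)


# Made-up sample line for illustration only; not taken from the corpus.
sample = (
    "q:how tall is mount everest?\t"
    "q:what is the height of mount everest?\t"
    "a:about 8,849 metres\n"
)
questions, answers = parse_cluster(sample)
print(len(questions), "questions,", len(answers), "answers")
```

Reading line by line keeps memory use low, which matters given the 40 GB decompressed size; a cluster with no answer simply yields an empty answers list.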

Corpus Online

Reference: https://github.com/afader/oqa#wikianswers-corpus
Related Corpus: Paralex: Paraphrase-Driven Learning for Open Question Answering