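How can I read the contents of a PDF file? The PDF is plain English text, with no images or anything like that.
How can I read all of the text and save it into a string?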
------Solution--------------------
Download PDFBox-0.7.3.zip
PDFBox-0.7.3.dll
lucene-demos-2.0.0.dll
lucene-core-2.0.0.dll
bcmail-jdk14-132.dll
bcprov-jdk14-132.dll
FontBox-0.1.0-dev.dll
ICSharpCode.SharpZipLib.dll
IKVM.AWT.WinForms.dll
IKVM.GNU.Classpath.dll
IKVM.Runtime.dll
ikvm-native.dll
Put the files listed above into the Bin folder.
C# code
<%@ Page Language="C#" %>
<%@ Import Namespace="System" %>
<%@ Import Namespace="org.pdfbox.pdmodel" %>
<%@ Import Namespace="org.pdfbox.util" %>
<script language="C#" runat="server">
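protected void Page_Load(object sender, System.EventArgs e)
{
    // Resolve index.pdf relative to this page's directory.
    string pdfPath = Server.MapPath("index.pdf");
    PDDocument doc = PDDocument.load(pdfPath);
    try
    {
        // Extract the text of the whole document into one string.
        PDFTextStripper stripper = new PDFTextStripper();
        string txt = stripper.getText(doc);
        Response.Write(txt);
    }
    finally
    {
        // Always close the document so the file is released.
        doc.close();
    }
}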
</script>
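Note: I found this online and don't know whether it works. Give it a try; once you can read the text out, the rest of the processing is much easier.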