Thursday, October 27, 2011

20111027 python-nltk study notes


python-nltk
  • NLTK -- the Natural Language Toolkit -- is a suite of open source Python modules, data sets and tutorials supporting research and development in Natural Language Processing.
    Official website
    http://www.nltk.org/

    For engineering calculations, first run: from math import *

    Getting Started with NLTK
    • >>> from __future__ import division
      • Makes / return a floating-point result (in Python 2, / between two integers otherwise truncates)
      >>> import nltk
      >>> nltk.download()

      • Download the Book package listed under the Collections tab
      • It is downloaded to ~/nltk_data
      The NLTK Book (O'Reilly)
      http://www.nltk.org/book

      >>> from nltk.book import *
      This says "from NLTK's book module, load all items."
      • After exiting Python,
        every new Python session needs:
        import nltk
        from nltk.book import *
      Finding a word in context
      • .concordance()
        • text1.concordance("monstrous")
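To make the idea behind a concordance concrete, here is a minimal sketch in plain Python. It is not NLTK's implementation: `concordance_lines` and its `width` parameter are hypothetical names chosen for illustration, and matching case-insensitively is an assumption of this sketch.

```python
def concordance_lines(tokens, word, width=3):
    """Return each occurrence of `word` with up to `width` tokens of context per side.

    Hypothetical helper for illustration only; not part of NLTK.
    """
    lines = []
    target = word.lower()
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = tokens[max(0, i - width):i]      # context before the match
            right = tokens[i + 1:i + 1 + width]     # context after the match
            lines.append(' '.join(left + [tok] + right))
    return lines

tokens = 'I want to go to school'.split()
for line in concordance_lines(tokens, 'to', width=1):
    print(line)
# prints:
# want to go
# go to school
```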
      Counting the words (or tokens) in a sentence (or text)
      • len()
        • >>> stt=['I', 'want', 'to', 'go', 'home', '.']
          >>> len(stt)
          6

      Showing the distinct tokens
      • set()
        • >>> stt=['I', 'want', 'to', 'go', 'to','school']
          >>> set(stt)
          set(['I', 'to', 'school', 'go', 'want'])
      Computing how often each token is used on average (total tokens / distinct tokens)
      • >>> len(stt)/len(set(stt))
        1.2
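The same ratio can be wrapped in a small function so it is reusable on any token list. This is only a sketch: `tokens_per_type` is a hypothetical name, and returning 0.0 for an empty list is an assumption made here to avoid dividing by zero.

```python
from __future__ import division  # no effect in Python 3; needed for true division in Python 2

def tokens_per_type(tokens):
    """Average number of times each distinct token is used (hypothetical helper)."""
    if not tokens:
        return 0.0  # assumption: treat an empty text as 0 rather than raising
    return len(tokens) / len(set(tokens))

stt = ['I', 'want', 'to', 'go', 'to', 'school']
ratio = tokens_per_type(stt)
print(ratio)  # 6 tokens / 5 distinct tokens = 1.2
```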
      Sorting
      • sorted()
        • >>> sorted(set(stt))
          ['I', 'go', 'school', 'to', 'want']
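Note that 'I' sorts first above because uppercase letters compare lower than lowercase ones. If a case-insensitive order is wanted, one option is to pass a key function, as in this short sketch:

```python
stt = ['I', 'want', 'to', 'go', 'to', 'school']

default_order = sorted(set(stt))                    # uppercase sorts before lowercase
ignore_case = sorted(set(stt), key=str.lower)       # compare lowercased copies instead

print(default_order)  # ['I', 'go', 'school', 'to', 'want']
print(ignore_case)    # ['go', 'I', 'school', 'to', 'want']
```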
      Counting how many times a token occurs
      • .count()
        • >>> st='I want to go home.'
          >>> st
          'I want to go home.'
          >>> st.count('t')
          2

          >>> stt
          ['I', 'want', 'to', 'go', 'to', 'school']
          >>> stt.count('to')
          2
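Calling .count() once per token rescans the list each time. To count every token in a single pass, the standard library's collections.Counter is one option; a minimal sketch:

```python
from collections import Counter

stt = ['I', 'want', 'to', 'go', 'to', 'school']
freq = Counter(stt)          # maps each token to its number of occurrences

print(freq['to'])            # 2
print(freq['home'])          # 0 (missing tokens count as zero)
print(freq.most_common(1))   # [('to', 2)]
```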
      Finding a token's position in a list
      • .index()
        • >>> stt
          ['I', 'want', 'to', 'go', 'to', 'school']
          >>> stt.index('to')
          2
          >>> stt.index('I')
          0
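.index() reports only the first match ('to' also appears at position 4) and raises ValueError when the token is absent. A small sketch for collecting every position; `find_all` is a hypothetical helper name:

```python
def find_all(tokens, word):
    """Return the positions of every occurrence of `word` (hypothetical helper)."""
    return [i for i, tok in enumerate(tokens) if tok == word]

stt = ['I', 'want', 'to', 'go', 'to', 'school']
print(find_all(stt, 'to'))    # [2, 4]
print(find_all(stt, 'home'))  # [] (no exception for a missing token)
```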
      Retrieving tokens at specific positions in a list
      • >>> stt
        ['I', 'want', 'to', 'go', 'to', 'school']
        >>> stt[1]
        'want'
        >>> stt[1:3]
        ['want', 'to']
        >>> stt[1:]
        ['want', 'to', 'go', 'to', 'school']
        >>> stt[:3]
        ['I', 'want', 'to']

        • 1:3 takes only positions 1 and 2
          1: takes from position 1 to the end
          :3 takes from position 0 to 2
      Converting a string to a list
      • .split()
        • >>> st='I want to go home.'
          >>> st
          'I want to go home.'
          >>> st.split()
          ['I', 'want', 'to', 'go', 'home.']
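.split() cuts only on whitespace, so the period stays attached to 'home.'. One way to separate punctuation is a regular expression; the pattern below is an illustrative assumption, not NLTK's tokenizer (NLTK ships its own tokenizers, which are not used here).

```python
import re

st = 'I want to go home.'

# \w+ matches a run of word characters; [^\w\s] matches one punctuation mark
tokens = re.findall(r"\w+|[^\w\s]", st)

print(tokens)       # ['I', 'want', 'to', 'go', 'home', '.']
print(len(tokens))  # 6
```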
      Converting a list to a string
      • .join()
        • >>> stt
          ['I', 'want', 'to', 'go', 'to', 'school']
          >>> ''.join(stt)
          'Iwanttogotoschool'
          >>> ' '.join(stt)
          'I want to go to school'
          >>> ','.join(stt)
          'I,want,to,go,to,school'

          • You can specify the separator string
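Two details are worth checking when using .join(): joining with a space and then splitting gives back the original list, and every item must already be a string. A brief sketch (the `counts` list is made-up example data):

```python
stt = ['I', 'want', 'to', 'go', 'to', 'school']

sentence = ' '.join(stt)
round_trip = sentence.split()      # splitting on whitespace undoes the join
print(round_trip == stt)           # True

counts = [1, 1, 2]                 # made-up numbers for illustration
# ','.join(counts) would raise TypeError, so convert each item to str first
joined_counts = ','.join(str(n) for n in counts)
print(joined_counts)               # 1,1,2
```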
