Friday, February 24, 2012

PostgreSQL Conference 2012

PostgreSQL Conference 2012 was held at here.

大きな地図で見る
I attended these sessions:

  • Opening
  • Keynote 1 : How a large organisation moved its critical application toward PostgreSQL
    • By Philippe Beaudoin
    • Caisse nationale des allocations familiales (CNAF)
      • The Organization of the French social security system
      • Benefit welfare costs for 33 million families/11 million households
      • 123 CAF branches
    • Migration
      • From RFM on mainframe
      • To PostgreSQL on RHEL
      • I was the manager of all of migration team - develop, test, production and so forth.
      • It takes 18 month. There were 8 or 10 people who were assigned all over the period.
      • We use BYTEA data type to store COBOL-friendly data.
      • WE use DBSP as data transfer tool.
      • So that, we done both transferring and checking data within 24 hours.
      • There are less changed programs than expected.
    • Tests after migration are rather hard.
      • It's important to test enough and enough times.
      • 2 DBs on 2 clusters per 1 CAF branch.
      • 4 TB in gross. The largest size is 250 GB.
      • pgAdmin3 : Database management tool for developers
      • phpPgAdmin : Database management tool for testers
      • Even If you keep in mind to query simple SQL, Performance tuning is also important. So that, we can handle 33,000 SQL statement per second.
    • Customer satisfaction after migration
      • I'm proud that this project has not delayed
      • All players (from developers to end users) are satisfying.
      • There are zero problems caused by PostgreSQL itself.
    • Now we are refactoring HA system
    • And now we are migrating from PostgreSQL 8.4 to PostgreSQL 9.0.
    • Q&A
      • Q) Why do you choose PostgreSQL?
      • ->You all ask this question when we speak this session :-)
      • ->We(Bull Inc.) suggested the migration to CNAF using DBSP that was supporting PostgreSQL or Oracle Database.
      • ->CNAF chose PostgreSQL, because CNAF had done hurt in Oracle Database.
      • ->So that we researched and re-suggested using both DBSP and PostgreSQL.
      • Q) Training PostgreSQL to CNAF was difficult?
      • ->No, It gone smoothly becuase there were many people familiar with PostgreSQL.
  • Keynote 2 : An Overview of PostgreSQL 9.2
    • By Robert Haas
    • 4 Overviews
      • High-End Servers
      • Larger Data Sets
      • More Copies of PostgreSQL
      • More Replicas
    • Major Features
      • Scalability (for many cores)
        • Until 9.1, there are limitation that the scalability works fine only up to 8-cores because of lock contention.
        • Since 9.2, the scalability works fine up to 32-cores because of lock contention free.
      • Index only scan (for larger data sets)
        • Until 9.1, accessing to index file brings accessing to table unexpectedly.
        • Since 9.2, accessing to index file does not bring accessing to table if never needed.
        • For most case, 9.2 will only access to index file, so that 9.2 will speed up.
      • Reduced Power Consumption (for hosting provides)
        • In the Cloud (like Heroku), there are huge amount of PostgreSQL copies
        • In such environment, power consumption becomes the leverage.
        • Until 9.1, 11.5 auxiliary processes per second
        • Since current 9.2 branch, 7.5 auxiliary processes per second. We try to reduce more in 9.2 final.
      • New Backup and replications (for ease to scale out)
        • Cascading replication
        • Remote mode
    • Other features
      • JSON data type
      • RANGE data type
      • Change the column type (like change the size of VARCHAR) without any ALTER TABLE statements.
    • Future
      • Remove bottlenecks caused by buffer replacement in single thread
      • Lock contention free more than 32-cores.
    • Q&A
      • Q) Why the threshold of lock contention is around 32-cores.
      • ->Because we could only prepare machines up to 32-cores.
      • ->Please give us 64-cores machine :-)
      • Q) Why do you describe power consumption as auxiliary processes per second? Why not Volts or Watts?
      • ->Volt and Watt are not consistent value in each machines.
      • Q) Are you planning any SSD optimized solutions, like write scalability?
      • ->No.
      • Q) Are there any limitations when we use cascading replication since 9.2?
      • ->I don't know.
      • Q) When 9.2 will be released?
      • ->9.0 and 9.1 had been released in September. So I hope 9.2 will be released in this September, but not yet fixed.
  • To implement on-memory caching with pgpool-Ⅱ
    • By Toshihiro Kitagawa
    • What is pgpool-Ⅱ?
      • A middleware for PostgreSQL
      • BSD License
      • Provide useful features between PostgreSQL tier and Application tier
      • We keep on implementing on-memory query caching
      • An example of adoption : JTB Tabi Card
    • What is on-memory query caching (in the case of 3-tier web architecture)?
      • At the DB tier, Disk I/O becomes a bottleneck easily and hard to multiplex. So that there is a need to cache in App tier or in Web tier.
      • While some DBMS has caching feature, but this needs to access in Db Tier too, anyway, DB tier causes a bottleneck.
    • Current query caching of pgpool-Ⅱ is not good because this stores query caches on DB, and these query caches are not updated automatically while original data are updated. So now is the time to fix the implementation of query caching of pgpool-Ⅱ.
    • Features of renewal on-memory query caching
      • Storing on shared memory or on memcached
      • Reuse caches across any sessions
      • Define to use caches or not to use for each tables
      • Don't create caches when "SELECT INTO", "SELECT CURRENT_TIMESTAMP" or "SELECT FOR UPDATE" queries are called.
      • Don't create caches when updated by any triggers or any cascading operations.
    • Key of cache
      • statement + DB name + table name + user name
      • So if these all are same value, key is also same despite schema name is NOT same.
      • Key doesn't match another user's one.
    • This feature will be released in this May(not yet fixed).
    • Q&A
      • Q) The query cache of MySQL is case sensitive. How about pgpool-Ⅱ?
      • ->pgpool-Ⅱ's one is case sensitive too.
  • Introduction for PostgreSQL Architecture
    • By Satoshi Nagayasu
    • Processes
      • e.g. writer process
        • The process to write data of shared buffer to the disk
      • e.g. pg_statinfo / pg_reporter
        • Analysis tools
    • Memories
      • e.g. shared buffer
        • Caching some blocks on the disk to reduce disk I/O.
    • Files
      • e.g. Transaction log file (a.k.a WAL file)
        • Record information before writing data of shared buffer to the disk
        • 16MB per file
      • e.g. Table file
        • Table information
        • 8KB per file
      • e.g. B-Tree index file
        • Index information
        • 8KB per file
      • PostgreSQL has no overwrite I/O.
        • Logical delete with marking as deleted
    • Query plan
      • PostgreSQL has a policy not to use any hint clauses like some proprietary DBMSs.
      • When sequential access occurs?
        • Query with aggregate functions
        • Query with LIKE clause
        • In these case, access time slows down related with growing data amounts
      • When random access occurs?
        • Query using Index
        • In these case, access time is not related with data amounts, if index files are used effectively.
    • no overwrite I/O
      • INSERT : append row
      • DELETE : mark as deleted to corresponding row
      • UPDATE : A combination of INSERT and DELETE
      • The header of record data has transaction IDs both for creation and for deletion.
      • If you want to make all records marked as delete empty, do "VACUUM".
    • Performance management
      • How to measure performance : Response time of one query X concurrent queries
      • Stay within the capacity of the hardware
    • Backup, restore and recovery
      • It's hard to back-up perfectly because Data exist not only in files but also memories.
      • Cold back-up
        • Backing-up with stopping all processes
        • It's useful if file system of your server can be ready for snapshot.
      • Hot back-up (pg_dump/pg_restore)
        • Backing-up without stopping any processes
        • It's useful if you want to back-up database units or table units.
      • Back-up with archive log and PITR
        • Back-up at base point : pg_start_backup / pg_stop_backup
        • Back-up differences from base point : archive log
      • Let's rehearsal to backup, restore and recovery
    • Redundancy
      • PostgreSQL has several types of redundancy.
      • How to choose : Load balancing, single point of failure, operation, data consistency
      • shared disk
      • pgpool
      • streaming replication
      • single master - multi slave
        • Streaming transaction logs from master to slave
  • PostgreSQL operation techniques ~level up edition~
    • By Masahiko Sakamoto
    • Operations in necessary and sufficient conditions
    • Why we fail to operate?
      • Too much to make operation tool uniquely.
      • So that we create bugs when we make that tools.
      • We will encounter unexpected cases.
    • Using OSS tools
      • OSS tools are mature
      • OSS tools had been encountered plenty of type of cases.
      • e.g. : back-up : PostgreSQL has many strategies
    • Introduction tools
      • pg_statsinfo : Graphical stats via browser
      • pg_reorg : do VACUUM FULL or re-index with keeping on table accesses
      • pg_rman : prepare PITR
      • pg_bulkload : Fast data load
    • demo
  • How to Use SQL Calls to Secure
    • By Hiroshi Tokumaru
    • This session is based on the IPA's textbook : "How to Use SQL Calls to Secure Your Web Site" (en, ja)
    • Literals and SQL injection
    • How to use SQL calls
      • Embed parameters into SQL dynbamically
        • concatinate strings (e.g. sprintf)
      • Create SQL with placeholders
        • static placeholder : Parsing query in server side
        • dynamic placeholder : Parsing query in client side
        • PostgreSQL has static placeholder.
        • MySQL and SQLite have dynamic placeholder and make dynamic placeholder default.
        • "static placeholder" is the perfect answer.
        • If you wnat to use "dynamic placeholder", there is an ability of SQL injection if connection library has bugs despite your application has no bugs.
    • To escape concatinated strings is difficult and annoying.
      • We must consider the differences about DBMS
      • We must consider the differences about start-up options
      • We must consider the differences about the version
      • Since PostgreSQL 9.1, "standard_conforming_strings = on" is default option
      • backslash_quote option : '' and \\ are only valid escapes
      • PHP and Perl have quote method that can escapy safely.
      • But, Perl + MySQL occurs lacking some escapes.
    • "Java + Oracle" is the most reliable combination for enterprise use.
    • "PHP + PostgreSQL" is the most reliable combination for LL use.
    • "Perl + MySQL" has several traps.
    • 0x5C problem
      • the problem when the bsecond byte of Shift_JIS character is "0x5C".
      • because the backslash's code is aso "0x5C".
      • So that, we should use UTF-8 in client program.
    • U+00A5 problem
      • While we make client program with UTF-8.
      • Database connection is under Shift_JIS or EUC-JP encoding.
      • U+00A5 is recognized backslash in Shift_JIS/EUC-JP database connection.
      • So that, we also use UTF-8 for database connection encoding.
  • Closing and Lightning talks
N.B. These summaries may be imprecise.

No comments:

Post a Comment